ES2909446A1

ES2909446A1 - Automatic detection method of systole and diastole (Machine-translation by Google Translate, not legally binding)

Info

Publication number: ES2909446A1
Application number: ES202130971A
Authority: ES
Inventors: Perez David Moratal; Pelegri Manuel Perez; Menadas José Vicente Monmeneu; Lereu María Pilar Lopez; Gomez José Manuel Santabarbara; Gonzalez Alicia M Maceira
Original assignee: Universidad Politecnica de Valencia; Exploraciones Radiologicas Especiales SA ERESA
Current assignee: Ecg Medica SL
Priority date: 2021-10-18
Filing date: 2021-10-18
Publication date: 2022-05-06
Anticipated expiration: 2041-10-18
Also published as: WO2023067212A1; ES2909446B2

Abstract

Automatic detection method of systole and diastole. The invention describes a method for the automatic detection of systole and diastole in a cardiac magnetic resonance cine sequence, comprising a stage of preparing a neural network by preprocessing training sequences, designing the network with convolutional layers to extract spatial patterns and dilated convolutions to encode temporal information, and training the network; and a stage of applying the trained network to the detection of systole and diastole in the cine sequence under study by preprocessing the sequence to normalize its input to the network, extracting spatial patterns and encoding temporal information through the network, providing 3 values for each time instant that indicate the probability that it corresponds to systole, diastole or neither, and classifying the time instants into systole and diastole.

Description

DESCRIPTION

Automatic detection method of systole and diastole

FIELD OF THE INVENTION

The present invention relates generally to the field of medicine and, more specifically, to cardiology. In particular, the invention relates to a method for the automatic detection of systole and diastole in a cardiac magnetic resonance cine sequence using a purely convolutional neural network.

BACKGROUND OF THE INVENTION

In the clinical setting, the state of the heart can be characterized by means of magnetic resonance cine sequences. These sequences comprise volumes of the heart at different time instants, allowing the contraction of the tissues to be observed and analyzed. For a correct characterization, the clinician must extract, as the main parameters, the volumes of the ventricles in systole (maximum cardiac contraction) and diastole (state of cardiac relaxation), as well as the ejection fraction derived from them. However, the first step in this analysis is the detection of systole and diastole in the sequence, which can be time-consuming. It is therefore desirable to have a method for the automatic detection of systole and diastole that facilitates and expedites the subsequent analysis and diagnosis by the medical professional.
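As a brief illustration of how the ejection fraction is derived from the two ventricular volumes, the following minimal sketch uses the standard definition (end-diastolic volume minus end-systolic volume, divided by end-diastolic volume). The function name and the example volumes are hypothetical and are not taken from this document.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction (%) from end-diastolic and end-systolic volumes."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml

# Hypothetical ventricular volumes in millilitres.
ef = ejection_fraction(120.0, 50.0)
```

Both volumes depend on the diastole and systole instants having been located correctly first, which is why their detection is the first step of the analysis.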

Few studies have described methods for automating the detection of systole and diastole in this type of sequence. Some have proposed segmenting the left ventricle by means of neural networks at all time instants (also called "frames") and deriving the location of systole and diastole from the volume at each of them (see, for example, Hsin, C., and Danner, C. (2016). Convolutional Neural Networks for Left Ventricle Volume Estimation). This method has the drawback of requiring prior segmentation at all time instants in order to train the neural network.

In another case, neural networks have been trained to segment the left ventricle and then determine the systole and diastole points by locating the center of the left ventricle relative to a reference (see, for example, Yang, F., He, Y., Hussain, M., Xie, H., and Lei, P. (2017). Convolutional neural network for the detection of end-diastole and end-systole frames in free-breathing cardiac magnetic resonance imaging. Computational and Mathematical Methods in Medicine, Article ID 1640835, 2017). Finally, another described method has used convolutional networks to extract the spatial patterns in the images and then recurrent networks with LSTM ("Long Short-Term Memory") layers to encode the information at the temporal level, before finally applying the classification at each time instant of the sequence (see, for example, Kong, B., Zhan, Y., Shin, M., Denny, T., and Zhang, S. (October 2016). Recognizing end-diastole and end-systole frames via deep temporal regression network. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 264-272). Springer, Cham). In the latter two cases, the studies were carried out using only individual slices of the cine sequence and a fixed number of time instants.

The article "Convolutional neural networks for semantic segmentation of FIB-SEM volumetric image data" (Master's thesis in Mathematical Statistics, Skarberg, Fredrik, Department of Mathematical Sciences, UNIVERSITY OF GOTHENBURG) describes a method for averaging volume inputs (input data) along the z axis in a segmentation application on high-resolution microscopy images. The data set consists of large 3D images that are to be segmented. Specifically, the author describes these data as having about 200 slices per case; however, the neural network used does not take the entire data item, but rather takes fragments of a few slices and applies the segmentation at the level of the chosen image fragment, going from a 3D image fragment (that is, in 3 dimensions) to a 2D image fragment (that is, in 2 dimensions). This is possible in that case because the images of the slices described are very similar to one another, so applying a segmentation to the average image and then expanding it to the rest of the slices of the block makes sense in that application, in which an average image incorporates all the volumetric information and, at the same time, is extremely similar to all the slices of the volume given the nature of the images. However, this technique would not be applicable to the automatic detection of systole and diastole in a cardiac magnetic resonance cine sequence since, for example, due to the very nature of the heart (heart tissue can vary considerably from one region to another), the image of a slice can be very different depending on the region in which the slice is located.

The articles by X. Lei, H. Pan and X. Huang ("A Dilated CNN Model for Image Classification", in IEEE Access, vol. 7, pp. 124087-124095, 2019, doi: 10.1109/ACCESS.2019.2927169) and by Ming Li, Chengjia Wang, Heye Zhang, Guang Yang ("MV-RAN: Multiview recurrent aggregation network for echocardiographic sequences segmentation and full cardiac cycle analysis", Computers in Biology and Medicine, volume 120, 2020) disclose the application of dilated convolutions specifically for the extraction of spatial information. In particular, the second reference describes an initial block in which dilated convolutions are applied to extract spatial patterns and to assist in segmentation, and the text mentions throughout that said module allows the extraction of spatial patterns at each time instant. Another module is then described in which temporal information is extracted by means of LSTM layers. All this is summarized in the discussion section of the article, which states that "the proposed framework is based on a hybrid approach of pyramidal dilated dense convolution for precise spatial feature extraction, hierarchical convolution with recurrent LSTM units for temporal information retrieval, (...)".

In the articles mentioned in the previous paragraph, the problem addressed is image classification, in which a single image is available to be classified (for example, in the first reference mentioned, "A Dilated CNN Model for Image Classification", images of handwritten digits from 0 to 9 are to be classified according to the digit they represent). However, this technique cannot be applied directly to the problem of classifying volumetric images (3D images) within a temporal sequence of volumetric images (specifically, cardiac magnetic resonance cine sequences). In this case the input is a sequence of N volumes that are temporally related to one another (similar to the frames of a video), and each of the N volumes must be classified into the categories of systole, diastole or background, where only one of the volumes is to be classified as systole and only one as diastole.

Document US10586531 discloses a speech recognition system based on a neural network that uses dilated convolutions to process the temporal relationship of the input sequence. In this case, the inputs are audio in 1D format (that is, a 1-dimensional sequence over time: 1D+t). Dilated convolutions are used in a neural network to process information with a temporal relationship; however, the application is completely different from that of the present invention and is therefore not applicable here since, for example, the present invention requires the processing of a sequence of volumes (that is, 3 dimensions over time: 3D+t).

Document US8345984B2 discloses a neural-network-based method for classifying human actions in video sequences (that is, 2 dimensions over time: 2D+t). It relies on 3D convolutions in the neural network, such that the third component of the convolutions is used to extract the temporal information from the sequence of images in the video. The objective is therefore to classify an action present in the complete video, and the method is not applicable to classifying each time instant of the sequence in order to determine the specific instant corresponding to an event, such as systole and diastole.

Document US10147193 describes a neural-network-based system for segmenting objects in images by means of an architecture that combines different levels of 2D dilated convolutions.

Therefore, it remains desirable to devise a method that allows the automatic detection of systole and diastole in a cine sequence using the complete volumes of said sequence, the sequence comprising a variable number of time instants and the volumes comprising a variable number of slices.

SUMMARY OF THE INVENTION

The present invention solves the aforementioned problem by proposing a method as described in claim 1. In particular, a computer-implemented method is disclosed for the automatic detection of systole and diastole in a cardiac magnetic resonance cine sequence (also referred to herein as the "cine sequence under study") using a purely convolutional neural network, the cine sequence comprising a plurality of time instants each of which corresponds to a volume, and each volume comprising at least one slice. The method comprises the steps of:

a) preparing the neural network, according to the following substeps:

a1) preprocessing one or more training cardiac magnetic resonance cine sequences (also referred to herein as "training cine sequences") to normalize their input to the neural network;

a2) designing the neural network, including convolutional layers to extract spatial patterns and dilated convolutions to encode temporal information;

a3) training the neural network using the normalized training cine sequence or sequences;

b) applying the trained neural network to the automatic detection of systole and diastole in the cine sequence under study, according to the following substeps:

b1) preprocessing the cine sequence under study to normalize its input to the neural network;

b2) extracting spatial patterns and encoding temporal information by means of the neural network, yielding a sequence of 3 values for each time instant, each value indicating the probability that the time instant corresponds to systole, to diastole or to neither;

b3) classifying the time instants to detect one time instant corresponding to systole and one time instant corresponding to diastole.
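To illustrate the kind of output described in substep b2), the following minimal sketch converts hypothetical raw network scores into three probabilities per time instant with a softmax. The function name, the class order (systole, diastole, neither) and the example scores are assumptions made for illustration only; they are not taken from the claimed method.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for 4 time instants; assumed class order:
# (systole, diastole, neither).
logits = [
    [0.1, 3.0, 0.2],
    [0.0, 0.5, 2.5],
    [2.8, 0.1, 0.4],
    [0.3, 0.2, 2.0],
]

# One row of 3 probabilities per time instant.
probabilities = [softmax(row) for row in logits]
```

Each row sums to 1, so the three values can be read directly as the probabilities used in substep b3).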

Preferred embodiments of the present invention are described in the dependent claims.

BRIEF DESCRIPTION OF THE DRAWINGS

A preferred, non-limiting embodiment of the present invention is described below with reference to the following figures:

Figure 1 is a diagram of the conversion of a cine sequence volume. The complete sequence is made up of several time instants, each of which corresponds to a volume of the cardiac region over time (3D+t inputs). The figure shows the slices of one of these volumes. The preprocessing substep transforms each volume into a single image by applying the median operation along the z axis. The end result is a sequence of images (2D+t) instead of a sequence of volumes.

Figure 2 is a diagram of the convolutional neural network used. The first section receives as input a sequence of images of size X * Y * n (where n is the number of time instants) and applies 2D convolutions and 2D pooling operations to extract spatial information from the images and reduce their dimension in the 2D plane. The second section is a block of 3D convolutions with different paths that include different convolution sizes and dilations along the time axis (the third dimension of the convolution). Finally, the network applies a convolution with the softmax operation in 2D to generate, for each time instant, the probabilities of belonging to each category.
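The effect of dilation along the time axis can be sketched with a simplified one-dimensional example. This is not the 3D convolution block of figure 2: it only shows that a dilated kernel covers a longer span of time instants with the same number of weights. The function name, the kernel and the feature values are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1D 'valid' convolution (cross-correlation form) with a dilation factor."""
    # Number of consecutive time instants spanned by the dilated kernel.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out

# Hypothetical per-instant feature over 8 time instants.
features = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 0.0, -1.0]  # 3 weights regardless of the dilation

dense = dilated_conv1d(features, kernel, dilation=1)    # spans 3 instants
dilated = dilated_conv1d(features, kernel, dilation=2)  # spans 5 instants
```

With a dilation of 2, the same three weights compare instants that are four positions apart instead of two, which is how parallel paths with different dilations can capture temporal context at several scales.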

Figure 3 is a diagram of the final classification substep used. The figure shows several time instants close to systole together with their associated probabilities of being systole. The time instant finally selected is the one occupying the central position among those with a probability greater than 90%. In this case it is time instant n+4, which in turn corresponds to the actual systole of the sequence in the figure.
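The selection rule described for figure 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the example probabilities, the choice of the upper middle candidate when the number of candidates is even, and the fallback to the most probable instant when no candidate exceeds the threshold are all hypothetical and are not specified in this document.

```python
def pick_central_instant(probs, threshold=0.9):
    """Return the index in the central position among instants above threshold."""
    candidates = [i for i, p in enumerate(probs) if p > threshold]
    if not candidates:
        # Fallback (an assumption, not described in the source): most probable instant.
        return max(range(len(probs)), key=probs.__getitem__)
    # For an even number of candidates this picks the upper middle one (assumption).
    return candidates[len(candidates) // 2]

# Hypothetical systole probabilities for 6 consecutive time instants.
systole_probs = [0.10, 0.40, 0.93, 0.97, 0.95, 0.30]
systole_index = pick_central_instant(systole_probs)
```

Here instants 2, 3 and 4 exceed 90%, so the central one (index 3) is selected. The same function could be applied to the diastole probabilities to obtain the second instant.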

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

According to the preferred embodiment of the present invention, a computer-implemented method is disclosed for the automatic detection of systole (maximum cardiac contraction) and diastole (state of cardiac relaxation) in cardiac magnetic resonance cine sequences using a purely convolutional neural network. A cardiac magnetic resonance cine sequence comprises a plurality of time instants acquired per cardiac cycle, each of which corresponds to a volume (3D image). It is therefore a 3D+t sequence. Each of the volumes can have any number of slices (more or fewer sections of the heart).

In general, the method according to the preferred embodiment of the present invention comprises the following steps:

a) preparing the neural network, according to the following substeps:

a1) preprocessing one or more training cine sequences to normalize their input to the neural network;

According to the preferred embodiment of the present invention, substeps a1) and a2) can be performed in any temporal order with respect to one another. That is, substep a1) may be performed first, followed by substep a2); substep a2) may be performed first, followed by substep a1); or substeps a1) and a2) may be performed simultaneously or substantially simultaneously, without thereby altering the result of the method disclosed herein.

In this document, the term "z axis" refers to the axis along which the slice or slices (corresponding to respective sections of the heart) that make up a volume are arranged orthogonally.

Figure 1 shows an example of preprocessing according to the aforementioned substeps a1) and b1). Specifically, according to a preferred embodiment, each of substeps a1) and b1) includes, first, the numerical normalization of the pixel signal in each volume of the corresponding cine sequence (the training cine sequence in substep a1, and the cine sequence under study in substep b1), since the signal values may vary depending on the machine used to acquire them. Second, the images are resized to a fixed in-plane size in terms of the number of pixels. Finally, the entire volume is collapsed into a single image. To do this, the median value of each pixel along the z-axis of each volume is calculated, thus generating, for each time instant, a single image representative of the entire volume. The end result is a sequence of images (2D+t sequence) instead of a sequence of volumes (3D+t sequence). This last step allows a network to be trained using fewer hardware resources (training these neural networks requires high-performance graphics cards), while incorporating the information of the entire volume into the final image. For sequences in which each time instant corresponds to a single image (a single slice per volume), the last step is not necessary (applying it would not modify the original sequence), since the sequence already has the dimensions required by the network (2D+t sequence).
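The three preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the min-max scaling, the nearest-neighbour resizing, the 64-pixel target size and all function names are assumptions introduced here, since the description does not specify them.

```python
import numpy as np

def normalize_intensity(volume):
    """Rescale one volume to [0, 1]. Min-max scaling is an assumption."""
    v = volume.astype(np.float64)
    lo, hi = v.min(), v.max()
    return (v - lo) / (hi - lo) if hi > lo else np.zeros_like(v)

def resize_nearest(image, size):
    """Nearest-neighbour resize of a 2D image to (size, size) (assumed method)."""
    rows = np.arange(size) * image.shape[0] // size
    cols = np.arange(size) * image.shape[1] // size
    return image[np.ix_(rows, cols)]

def preprocess(sequence, size=64):
    """Convert a 3D+t sequence (t, z, y, x) into a 2D+t sequence (t, size, size)."""
    frames = []
    for volume in sequence:
        volume = normalize_intensity(volume)              # step 1: signal normalization
        slices = np.stack([resize_nearest(s, size) for s in volume])  # step 2: fixed size
        frames.append(np.median(slices, axis=0))          # step 3: median along z
    return np.stack(frames)
```

For a sequence with a single slice per volume, the median along z returns that slice unchanged, which matches the remark that the last step is then unnecessary.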

This is a substantial difference with respect to, for example, the previously mentioned article "Convolutional neural networks for semantic segmentation of FIB-SEM volumetric image data" (Master's thesis in Mathematical Statistics, Skarberg, Fredrik, Department of Mathematical Sciences, University of Gothenburg). In the present case, the averaging operation is not applied to a region of the input, as is done in said article, but to the entire input, which consists of 3D+t medical images, in order to convert them into 2D+t images. Another important factor is that said article describes the use of the mean, whereas the method according to the present invention applies the median. Both are statistics that can be extracted from a distribution of values, but their properties differ notably. The most notable difference is that the median, unlike the mean, is insensitive to the presence of outliers. In the case described in the article, the mean is probably more appropriate than the median, since the averaging is performed over slices that are very similar to each other. In the present invention, by contrast, the cardiac image volumes show notable differences between slices, since the tissue can vary considerably from one region of the heart to another. With this type of image, the objective of averaging is to obtain a global representation of the state of tissue contraction that is as unaltered as possible, so suppressing outliers along the z-axis of the volume is important for avoiding possible artifacts in the resulting image. For this reason the median function is used: applying other averaging functions such as the mean (as described in the aforementioned article) in the method of the present invention could produce artifacts and aberrations in the resulting images, given their nature.
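The difference between the two statistics can be seen in a small numerical example. The values below are hypothetical and were chosen only for illustration: they stand for the normalized signal of one pixel position across nine slices along the z-axis, with one aberrant slice.

```python
import numpy as np

# Hypothetical signal of one pixel position across 9 slices along z.
# The seventh value plays the role of an aberrant slice.
pixel_along_z = np.array([0.30, 0.31, 0.29, 0.32, 0.30, 0.31, 0.95, 0.30, 0.29])

mean_value = pixel_along_z.mean()         # pulled upwards by the aberrant slice
median_value = np.median(pixel_along_z)   # stays at the typical value

print(f"mean = {mean_value:.3f}, median = {median_value:.3f}")
```

The mean is shifted towards the aberrant value, while the median remains at the value shared by most slices.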

Figure 2 shows an example of the aforementioned substep a2). The design of the network includes 2D convolutions together with activation functions and pooling operations to extract spatial patterns from the images of the cine sequence and progressively reduce their size. This is followed by a module composed of 3D convolutions, in which different convolution sizes and dilation factors are applied to encode the temporal information of the cine sequence. In this way, according to the preferred embodiment of the present invention, convolutions with a dilation factor are applied only along the time axis, to extract temporal information. This differs from previously known techniques such as those disclosed in the aforementioned articles "A Dilated CNN Model for Image Classification" and "MV-RAN: Multiview recurrent aggregation network for echocardiographic sequences segmentation and full cardiac cycle analysis", in which dilated convolutions are applied to extract spatial information.

The main characteristic of convolutions with a dilation factor is that they enlarge the receptive field of the operation without increasing the number of parameters. This way of handling temporal sequences has the advantage of requiring less hardware memory and of being easier to train than the more widespread recurrent networks such as LSTMs. The final result provided by the network is a sequence of 3 values for each time instant, where each value indicates the probability that the time instant corresponds to systole, to diastole, or to neither of them.
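The effect of the dilation factor can be illustrated with a one-dimensional convolution along the time axis. This is a simplified sketch, not the 3D module of the network: the function names are hypothetical, and the kernel sizes and dilation factors used in the invention are not stated in this section.

```python
import numpy as np

def receptive_field(kernel_size, dilation):
    """Number of time instants spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1

def dilated_conv1d(x, kernel, dilation):
    """Valid 1D convolution (cross-correlation form) with gaps of
    `dilation` positions between consecutive kernel taps."""
    span = receptive_field(len(kernel), dilation)
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        # Take every `dilation`-th sample inside the span: the kernel
        # still has len(kernel) parameters, but covers `span` instants.
        out[i] = np.dot(x[i : i + span : dilation], kernel)
    return out
```

With a kernel of 3 parameters, a dilation of 1 covers 3 time instants and a dilation of 4 covers 9, while the parameter count stays at 3.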

The network training substep (substep a3) according to the method of the preferred embodiment of the present invention comprises applying data augmentation to generate new modified sequences, which includes rotations and translations of the images as well as adding factors that increase the noise in the image. This substep also includes modifications of the temporal sequence by means of a delay that changes the position of the time instants within the sequence and, therefore, the position at which systole and diastole are located.
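The temporal part of this augmentation can be sketched as a shift applied jointly to the images and to their labels. The circular shift used here is an assumption (the description only mentions a delay), and the label encoding (0 = neither, 1 = systole, 2 = diastole) and the function name are hypothetical.

```python
import numpy as np

def shift_sequence(frames, labels, delay):
    """Shift a 2D+t sequence (t, y, x) and its per-instant labels (t,)
    by `delay` positions. A circular shift is assumed here."""
    shifted_frames = np.roll(frames, delay, axis=0)
    shifted_labels = np.roll(labels, delay, axis=0)
    return shifted_frames, shifted_labels
```

Because the same shift is applied to both arrays, the positions of systole and diastole move with the images they label.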

A cost function based on the weighted Dice coefficient metric is also preferably used to train the network. This metric is widely used in image segmentation. The Dice coefficient is usually applied to classify elements within an input whose categories are unbalanced (for example, the pixels of an image to be segmented, which is its most common use). In the present invention, however, the input is a cine sequence of cardiac magnetic resonance volumes in which the elements to be classified are the volumes, or time instants, within the sequence. Therefore, applying the Dice coefficient to the specific case of the present invention is not, in principle, evident from the known art. Here the result can be interpreted as the segmentation of a vector, so the use of the coefficient, although unconventional, is equally valid. In the field of sequence classification, this cost function has been described in natural language processing for classifying the words within a sentence (see, for example, Li, X., Sun, X., Meng, Y., Liang, J., Wu, F., and Li, J. (2019). Dice loss for data-imbalanced NLP tasks. arXiv preprint arXiv:1911.02855), where it gave better results than other cost functions. However, it does not appear to have been used before on sequences outside that field. The present case concerns the classification of time instants within video sequences, not text. For this cost function, a weight is associated with each classification category in order to give greater importance to the correct classification of the systole and diastole time instants. In this way, a greater weight is assigned to the systole and diastole vector than to the rest of the time instants of the sequence, which balances the cost function and lets the network focus more on correctly classifying systole and diastole. In any case, the weights must sum to 1.
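A weighted Dice cost function of this kind could be written as below. This is a sketch under stated assumptions rather than the formulation used in the invention: the soft Dice form, the smoothing term `eps`, the class order (systole, diastole, neither) and the example weights are all hypothetical. Only the requirement that the weights sum to 1 comes from the description.

```python
import numpy as np

def weighted_dice_loss(probs, targets, weights, eps=1e-7):
    """probs:   (t, 3) predicted probabilities per time instant.
    targets: (t, 3) one-hot ground truth.
    weights: (3,) class weights that must sum to 1."""
    weights = np.asarray(weights, dtype=np.float64)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("class weights must sum to 1")
    # Per-class soft Dice computed along the time axis.
    intersection = (probs * targets).sum(axis=0)
    denominator = probs.sum(axis=0) + targets.sum(axis=0)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - float((weights * dice).sum())
```

Giving systole and diastole a larger weight than the background class, for example `(0.4, 0.4, 0.2)`, makes a missed systole or diastole cost more than a misclassified background instant.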

Once the neural network has been trained, it can be used to automatically detect systole and diastole in new cardiac magnetic resonance cine sequences according to step b). First, in substep b1) of the method according to the preferred embodiment of the present invention, the cine sequence under study is normalized. Next, according to the preferred embodiment, in substep b2) the trained neural network automatically predicts the probabilities associated with each time instant of the sequence. Finally, in substep b3) of the method according to the preferred embodiment, the results provided by the neural network in substep b2) are processed in order to detect systole and diastole correctly. The neural network may produce results in which several contiguous time instants have a high probability of being systole (or, likewise, of being diastole). To optimize the final classification, the time instants with a probability greater than 90% of being systole or diastole, respectively, are preferably selected; then, only the time instant that occupies the average position among those selected instants is classified as systole or diastole, respectively. An example of this processing is shown in the attached Figure 3. This last part of classification substep b3) therefore consists of applying a threshold (in this case, a probability greater than 90%) followed by selecting the central, or average, element. A priori, it might seem more obvious and direct to simply choose, for the final classification, the time instant with the highest probability of belonging to the systole category (and likewise for diastole). However, experimental studies showed that the images near systole and near diastole tend to be very similar (that is, the state of cardiac contraction is very similar between images that are very close to each other in cardiac magnetic resonance cine sequences). The neural network therefore tended to produce extremely high probabilities in the regions close to systole and diastole, which makes it difficult to determine the most correct time instant when the associated probabilities differ so little.

Therefore, the method disclosed herein applies a very high initial selection threshold to the probabilities of systole and diastole, respectively (as mentioned above, only values with a probability greater than 90% are selected). This differs substantially from other methods known in the prior art, which cover a wider range of probabilities, including probabilities that are hard to decide a priori, such as values close to 50%, and which can therefore produce a larger classification error.

In the specific application disclosed herein, the problem is that no unequivocal peak can be located in the probabilities obtained (they span a range of about 95 to 100% in the time instants surrounding the true one, without a clear peak pattern), and for this reason the central value among those with very high probabilities is selected. This final classification step follows from the nature of the images used and from the proposed neural network design. In a different design applied to another problem, selecting the point of highest probability could be the optimal choice, as disclosed in other methods known in the art; selecting the central element from a set of values with high probabilities is not evident a priori.
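The final classification step (threshold followed by selection of the central element) can be sketched as follows. The function name, the rounding of a non-integer average position and the handling of the case in which no instant exceeds the threshold are assumptions, since the description does not cover them. Candidates that wrap around the end of the sequence are not handled in this sketch.

```python
import numpy as np

def select_instant(probabilities, threshold=0.9):
    """Return the index at the average position of the time instants
    whose probability exceeds `threshold`, or None if there are none."""
    candidates = np.flatnonzero(np.asarray(probabilities) > threshold)
    if candidates.size == 0:
        return None  # this case is not described in the source
    # Rounding of a fractional average position is an assumption.
    return int(round(float(candidates.mean())))

# Hypothetical systole probabilities for a 7-instant sequence.
systole_probs = [0.10, 0.99, 0.95, 0.97, 0.96, 0.98, 0.20]
central = select_instant(systole_probs)       # centre of the high-probability plateau
naive = int(np.argmax(systole_probs))         # instant of maximum probability
```

In this hypothetical plateau the maximum-probability rule picks the first instant of the plateau, whereas the central selection picks its middle, which is the behaviour the description argues for.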

A specific example of a practical application of the method disclosed herein is described below. This study used a database of cardiac magnetic resonance cine sequences (training cine sequences) from a total of 399 patients. Among these patients, the number of slices per volume ranged from 8 to 14, and the number of volumes ranged from 14 to 35, with the vast majority having 35. The temporal resolution was 52.92 ms.

Given that this database of cardiac magnetic resonance cine sequences is available, the aim is to apply the method of the present invention in a clinical setting, for the automatic determination of systole and diastole in new cine sequences (cine sequences under study).

First, in step a):

a1) The cine sequences of the available database are normalized.

a2) The neural network described above is designed and implemented in a program.

a3) The implemented neural network is trained with the normalized cine sequences of the available database, following the methodology described above.

Once the trained neural network is available, it can be used in the clinical setting. To do this, each time the systole and diastole of a new cine sequence are to be detected, step b) is applied according to the following substeps:

b1) The cine sequence under study is normalized.

b2) The normalized cine sequence is passed to the neural network, which provides the list of probabilities associated with the time instants of the cine sequence.

b3) The final classification method is applied to the probabilities obtained by the neural network to generate the final classification.

A comparative study follows between the results obtained with the method according to the preferred embodiment of the present invention and those obtained with other methods. Specifically, the results are evaluated directly on 99 cases, which include cases with variable numbers of time instants and of volumes. Four combinations of training method and final classification are compared:

- Cross-entropy cost function (the standard function for classification problems in deep learning) and classification of the time instants as systole and diastole according to the one with the highest associated probability (naive method in the following table).

- Cross-entropy cost function and classification of the time instants as systole and diastole by selecting the central point of high probability (average method in the following table).

- Weighted Dice cost function and classification of the time instants as systole and diastole according to the one with the highest associated probability (naive method in the following table).

- Weighted Dice cost function and classification of the time instants as systole and diastole by selecting the central point of high probability (average method in the following table).

The comparative measure used is the mean distance between the classified time instant and the true one, for systole and for diastole.

The table above shows that the error of the method proposed by the present invention ("weighted Dice" training method and final classification by the "average method") is the smallest, both for diastole and for systole. A mean error of 0 is obtained for diastole and of 1.242 ± 1.45 for systole. If the errors of each case are normalized by the number of time instants of each sequence, the normalized errors presented in the following table are obtained:

A value of 1 would mean that the time instant classified as systole (or diastole, as the case may be) is as far as possible from the true one. In the case of the present invention, the error rate in systole is 0.036 on average. In diastole, the result obtained with the method of the present invention is perfect.
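The two error measures can be sketched as below. The description states only that the errors are normalized according to the number of time instants of each sequence; the divisor used here (number of time instants minus one, so that a value of 1 corresponds to the furthest possible instant in a non-cyclic sequence) is an assumption, as are the function name and the example values.

```python
import numpy as np

def instant_errors(predicted, actual, n_instants):
    """Per-case distance between predicted and true instants, and the
    same distance normalized by (n_instants - 1) (assumed divisor)."""
    predicted = np.asarray(predicted, dtype=np.float64)
    actual = np.asarray(actual, dtype=np.float64)
    n = np.asarray(n_instants, dtype=np.float64)
    distance = np.abs(predicted - actual)
    return distance, distance / (n - 1.0)

# Hypothetical cases: predicted index, true index, sequence length.
dist, norm = instant_errors([10, 12, 0], [10, 11, 34], [35, 26, 35])
mean_error, std_error = dist.mean(), dist.std()
```

The mean and standard deviation of `dist` over all evaluated cases correspond to the kind of "mean ± deviation" figures reported in the text.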

Some of the novel aspects of the method of the present invention are presented below, without this list being intended as exhaustive in any way:

- First, the method of the invention detects the time instants of cardiac systole and diastole automatically in cardiac magnetic resonance cine sequences, using the complete volumes of the sequence.

- To detect systole and diastole, each volume of cardiac images is converted into a single image representative of the entire volume. Specifically, this makes it possible to incorporate more information from the temporal sequence and to reduce the amount of data to be processed.

- The temporal analysis of the sequence is implemented with deep learning algorithms, specifically with 3D convolutions that include different convolution sizes and dilation factors. This allows a simpler network with fewer parameters, which makes the process faster and lowers the hardware requirements.

- The training uses the weighted Dice coefficient as the cost function for classifying a video sequence. This cost function has been used in image segmentation applications, but not in classification tasks such as the one in the proposed method. It has been shown to give better results than more traditional cost functions when processing other types of sequences, specifically text sequences. For the problem addressed here, other cost functions are normally used (cross-entropy being the standard choice), although it has been shown that the Dice coefficient works better in various applications where the categories are imbalanced. That is the case here: the systole and diastole categories each correspond to a single time instant, and the rest of the sequence corresponds to the background category. The method therefore applies this cost function in a new setting, namely the classification of time instants in video sequences.
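
As an illustration of the cost function described above, the following is a minimal sketch of one common way to write a class-weighted soft Dice loss for per-instant classification. This passage does not give the exact weighting used in the method, so the function name, the class order (systole, diastole, background), the weight values and the smoothing term `eps` are all assumptions made for the example.

```python
def weighted_dice_loss(probs, targets, weights, eps=1e-6):
    """Class-weighted soft Dice loss over a sequence of time instants.

    probs   -- list of T rows; each row holds the predicted probabilities
               for (systole, diastole, background). Class order is assumed.
    targets -- list of T one-hot rows in the same class order.
    weights -- one weight per class (hypothetical values).
    eps     -- smoothing term that avoids division by zero (assumption).
    """
    weighted_dice = 0.0
    for c in range(len(weights)):
        p = [row[c] for row in probs]
        t = [row[c] for row in targets]
        intersection = sum(pi * ti for pi, ti in zip(p, t))
        denominator = sum(p) + sum(t)
        dice_c = (2.0 * intersection + eps) / (denominator + eps)
        weighted_dice += weights[c] * dice_c
    # A loss of 0 means perfect overlap for every class.
    return 1.0 - weighted_dice / sum(weights)


# Four time instants: one systole, one diastole, two background.
targets = [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
# A degenerate prediction that labels every instant as background.
all_background = [[0.0, 0.0, 1.0]] * 4

print(weighted_dice_loss(all_background, targets, weights=[1, 1, 1]))
print(weighted_dice_loss(all_background, targets, weights=[5, 5, 1]))
```

With equal weights, the degenerate "always background" prediction still earns partial credit from the dominant class. Giving the two rare classes a larger weight penalizes it more heavily, which is the motivation for weighting when the categories are imbalanced.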

The aforementioned features and novel aspects of the method according to the present invention give it several advantages over alternatives currently known in the art. Some of these advantages have already been explained or follow from the preceding description, and include, but are not limited to:

- The method according to the present invention allows the use of sequences with complete volumes, incorporating all the volumetric information of the sequence.

- The method according to the present invention does not require prior segmentations to train the neural network.

- The sequence preprocessing substep reduces the amount of data passed to the network while maximizing the global information available. This makes both the training of the neural network and the prediction of the probabilities that a given time instant corresponds to systole, to diastole or to neither faster and more feasible. In addition, the neural network design uses dilated convolutions for temporal encoding, which reduces the number of parameters needed compared with recurrent networks. This makes the training substep more stable and faster, and also speeds up inference. These features allow the use of less powerful hardware, since fewer memory resources are required. This lowers the cost of deploying an application that uses the method disclosed in this document, and the resulting network also runs more quickly.

- The network design allows systole and diastole to be detected in sequences of arbitrary duration, unlike other studies in which the number of time instants in the sequence is fixed. This broadens the applicability of the proposed method to different environments in which magnetic resonance imaging equipment may provide cine sequences of variable duration.

- The design of the dilated convolution block can be scaled up depending on the available hardware. The design proposes a general implementation that can be extended by increasing the number of convolutions in each block and their number of channels. It should be borne in mind that the memory required on the graphics cards used will increase with these changes.
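
The role of dilation in the temporal encoding, and the reason the design is not tied to a fixed number of time instants, can be illustrated with a simplified one-dimensional sketch. The method itself uses 3D convolutions; the code below is only a toy convolution along the time axis. The function names, the zero padding, the odd kernel size and the dilation values are assumptions chosen for the example.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded ('same') 1D dilated convolution along the time axis.

    Assumes an odd kernel length so the output has one value per input
    time instant, whatever the sequence length.
    """
    half = (len(kernel) - 1) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - half) * dilation  # taps are 'dilation' steps apart
            if 0 <= idx < len(signal):       # positions outside the sequence count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Time instants seen by one output after stacking the given layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# The same three-weight kernel handles sequences of any length.
kernel = [1.0, 1.0, 1.0]
for length in (20, 30):
    sequence = [0.0] * length
    print(length, len(dilated_conv1d(sequence, kernel, dilation=4)))

# Three stacked layers with 3 weights each, dilations 1, 2 and 4.
print(receptive_field(3, [1, 2, 4]))
```

In this sketch, the three stacked layers hold nine weights in total yet each output depends on 15 consecutive time instants, which is the sense in which dilation widens temporal context without adding parameters. Extending the block, as the last item in the list above suggests, would correspond to adding layers or channels.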

As mentioned above, the method of the present invention makes it possible to automate the detection of systole and diastole in cardiac magnetic resonance imaging cine sequences. This speeds up the analysis and diagnosis of cardiac pathologies and reduces user-dependent variability in the diagnosis.

However, the method of the present invention may also find applications in fields other than the one mentioned above, by training a neural network to classify time instants in sequences of a different type from the one described. Some examples are: classification of scenes in films into categories (humorous scene, violent scene, scene with appropriate/inappropriate content, etc.); classification of medical image acquisitions after injection of a contrast agent into categories (no contrast present, increasing contrast, maximum contrast, decreasing contrast, etc.); classification of human actions in video for security systems (no action, violent action, robbery, breaking and entering, possession of weapons, etc.); and classification of acquisitions of fMRI ("functional magnetic resonance imaging") medical image sequences to analyze brain activity in a region of the brain (no brain activation (rest), low brain activation, high brain activation, etc.). In general, the method disclosed in this document may find application in the classification of frames or time instants within a sequence of images in video format. Furthermore, it allows detection in sequences of any duration, as well as in sequences with volumes of arbitrary size.

Moreover, the method of the present invention can be implemented with lower hardware requirements than other designs proposed for the same purpose. This makes its deployment more economically viable.

Finally, thanks to the neural network design proposed in this document, the method can be extended to different protocols of different magnetic resonance imaging equipment in which the duration of the acquired cine sequence and the number of slices in its volumes may vary. This increases its applicability within the market.

The present invention having been described with reference to a preferred embodiment thereof, the person skilled in the art will be able to make obvious modifications and variations to said embodiment without thereby departing from the scope of protection defined by the following claims.

Claims

1. Method of automatic detection of systole and diastole in a cardiac magnetic resonance cine sequence by using a purely convolutional neural network, the cine sequence comprising a plurality of time instants each of which corresponds to a volume, each volume comprising at least one slice, the method comprising the steps of:

a) preparing the neural network, according to the following substeps:

a1) preprocessing one or more cine training sequences to normalize their input to the neural network;

a2) designing the neural network including convolutional layers to extract spatial patterns, and dilated convolutions to encode temporal information;

a3) training the neural network using the normalized cine training sequence(s);

b) applying the trained neural network to the automatic detection of systole and diastole in the cardiac magnetic resonance cine sequence under study, according to the following substeps:

b1) preprocessing the cine sequence under study to normalize its input to the neural network;

b2) extracting spatial patterns and encoding temporal information using the neural network, providing as a result a sequence of 3 values for each time instant, each value indicating the probability that the time instant corresponds to systole, diastole or none of the above;

b3) classifying the time instants to detect a time instant corresponding to systole and a time instant corresponding to diastole.

2. Method according to claim 1, characterized in that the substeps a1) and a2) can be carried out in any temporal order with respect to one another.

3. Method according to any of the preceding claims, characterized in that substeps a1) and b1) each comprise:

- performing a numerical normalization of the pixel signal in each volume of the sequence;

- normalizing the in-plane size of the images to a fixed size in terms of the number of pixels;

- converting the entire volume into a single image;

thus generating for each time instant a single representative image of the complete volume, and thereby obtaining a sequence of images instead of a sequence of volumes.

4. Method according to claim 3, characterized in that converting the entire volume into a single image comprises calculating the median value of each pixel along the z axis of each volume.

5. Method according to any of the preceding claims, characterized in that substeps a2) and b2) each comprise:

- 2D convolutions together with activation functions and pooling operations to extract spatial patterns from the images of the cine sequence and progressively reduce their size;

- a module composed of 3D convolutions, where different convolution and dilation sizes are applied to encode the temporal information of the cine sequence.

6. Method according to any of the preceding claims, characterized in that substep a3) comprises:

- applying a data augmentation methodology to generate new modified sequences, which includes rotations and translations of the images, as well as adding factors that increase noise in the image;

- using the weighted Dice coefficient metric as the cost function for the classification of time instants within the sequence.

7. Method according to claim 6, characterized in that applying the data augmentation methodology comprises modifications of the temporal sequence through a delay that modifies the position of the time instants within the sequence and therefore the position in which systole and diastole are found.

8. Method according to any of the preceding claims, characterized in that substep b3) comprises:

- choosing the time instants with a probability greater than 90% of being systole or diastole, respectively; and

- classifying as systole or diastole, respectively, only the time instant that occupies the average position among these time instants with a probability greater than 90% of being systole or diastole, respectively.
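
By way of illustration only, and without limiting the claims, substep b3) as characterized in claim 8 could be sketched as follows. The function names and the class order (systole, diastole, neither) are hypothetical. Two further assumptions are made: "average position" is interpreted as the mean index of the selected instants rounded to the nearest instant, and nothing is returned when no instant exceeds the threshold, a case the claim does not address.

```python
def pick_instant(probabilities, threshold=0.9):
    """Return the index at the average position among the time instants
    whose probability exceeds the threshold, or None if there are none.

    Assumption: 'average position' is taken as the mean index, rounded
    half up to the nearest time instant.
    """
    candidates = [t for t, p in enumerate(probabilities) if p > threshold]
    if not candidates:
        return None  # behaviour not specified by the claim (assumption)
    mean_position = sum(candidates) / len(candidates)
    return int(mean_position + 0.5)


def detect_systole_diastole(outputs, threshold=0.9):
    """outputs holds one row of 3 probabilities per time instant,
    in the assumed order (systole, diastole, neither)."""
    systole = pick_instant([row[0] for row in outputs], threshold)
    diastole = pick_instant([row[1] for row in outputs], threshold)
    return systole, diastole


# Hypothetical network output for a 10-instant cine sequence.
outputs = [[0.02, 0.03, 0.95] for _ in range(10)]
for t in (3, 4, 5):
    outputs[t] = [0.95, 0.01, 0.04]   # three consecutive systole candidates
outputs[8] = [0.01, 0.97, 0.02]       # a single diastole candidate

print(detect_systole_diastole(outputs))
```

Selecting a single instant from each group of high-probability candidates ensures that exactly one systole and one diastole are reported even when the network is confident over several neighbouring instants.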