WO2023067212A1 - Method for automatically detecting systole and diastole - Google Patents


Info

Publication number
WO2023067212A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
diastole
systole
neural network
cine
Prior art date
Application number
PCT/ES2022/070645
Other languages
Spanish (es)
French (fr)
Inventor
David MORATAL PÉREZ
Manuel PÉREZ PELEGRÍ
José Vicente MONMENEU MENADAS
María Pilar LÓPEZ LEREU
José Manuel SANTABÁRBARA GÓMEZ
Alicia M. MACEIRA GONZÁLEZ
Original Assignee
Universitat Politècnica De València
Ecg Médica S.L.
Priority date
Filing date
Publication date
Application filed by Universitat Politècnica De València and Ecg Médica S.L.
Publication of WO2023067212A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/05 Detecting, measuring or recording for diagnosis by means of electric currents or magnetic fields; Measuring using microwaves or radio waves
    • A61B5/055 Detecting, measuring or recording for diagnosis by means of electric currents or magnetic fields; Measuring using microwaves or radio waves involving electronic [EMR] or nuclear [NMR] magnetic resonance, e.g. magnetic resonance imaging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition

Definitions

  • the present invention relates generally to the field of medicine and more specifically to that of cardiology.
  • the invention specifically relates to a method for the automatic detection of systole and diastole by using a purely convolutional neural network in a cardiac magnetic resonance cine sequence.
  • the state of the heart can be characterized through magnetic resonance cine sequences. These sequences make up volumes of the heart at different moments in time, allowing us to see the contraction of the tissues for analysis.
  • the clinician must extract as main parameters the volumes of the ventricles in systole (maximum cardiac contraction) and diastole (state of cardiac relaxation), as well as the ejection fraction derived from the above.
  • the first step in performing this analysis is the detection of systole and diastole in the sequence, which can be time-consuming; it is therefore desirable to have an automatic systole and diastole detection method that facilitates and expedites subsequent analysis and diagnosis by the medical professional.
  • neural networks have been used that are trained to segment the left ventricle and then determine the points of systole and diastole by locating the center of the left ventricle with respect to a reference (see, for example, Yang, F., He, Y., Hussain, M., Xie, H., and Lei, P. (2017). Convolutional neural network for the detection of end-diastole and end-systole frames in free-breathing cardiac magnetic resonance imaging. Computational and Mathematical Methods in Medicine, Article ID 1640835, 2017).
  • another described method has used convolutional networks to extract the spatial patterns in the images, then recurrent networks with LSTM ("Long Short-Term Memory") layers to encode the information at the temporal level, and finally applied the classification at each time instant of the sequence (see, for example, Kong, B., Zhan, Y., Shin, M., Denny, T., and Zhang, S. (October 2016). Recognizing end-diastole and end-systole frames via deep temporal regression network. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 264-272). Springer, Cham). In the last two cases, the studies were carried out using only individual slices of the cine sequence and with a fixed number of time instants.
  • LSTM: Long Short-Term Memory
  • in the article on FIB-SEM volumetric image data mentioned in the description, the author describes these data as having around 200 slices per case;
  • the neural network used there does not take the complete data, but takes fragments of a few slices and applies the segmentation at the level of the selected image fragment, going from a 3D (three-dimensional) image fragment to a 2D (two-dimensional) image fragment.
  • this is possible in that case because the images of each slice described are extremely similar, so applying a segmentation to the average image and then expanding it to the rest of the slices in the block makes sense in that application, in which an average image incorporates all the volumetric information and, given the nature of the images, is at the same time extremely similar to every slice of the volume.
  • in the present invention, by contrast, N volumes are received as input with a temporal relationship to each other (similar to frames in a video), and each of the N volumes must be classified into the categories of systole, diastole or background, where exactly one of the volumes is to be classified as systole and exactly one as diastole.
  • Document US10586531 discloses a system for speech recognition based on the use of a neural network that uses dilated convolutions to process the temporal relationship of the sequence of inputs.
  • in that case the inputs are audio in 1D format, i.e., a one-dimensional sequence over time (1D+t).
  • Dilated convolutions are used in a neural network for information processing with a temporal relationship; however, the application is completely different from that foreseen in the present invention and therefore it is not applicable in this case, since, for example, in the present invention the treatment of a sequence of volumes is required (that is, 3 dimensions over time: 3D+t).
  • Document US8345984B2 discloses a method based on a neural network for the classification of human actions in video sequences (that is, 2 dimensions over time: 2D+t). For this, it relies on the use of 3D convolutions in the neural network, in such a way that the third component of the convolutions is used to extract the temporal information from the sequence of images in the video.
  • the objective is therefore the classification of an action present in the complete video, the method not being applicable to the classification of each temporal instant of the sequence to determine the specific instant corresponding to an event, such as systole and diastole.
  • Document US10147193 describes a system based on a neural network for the segmentation of objects in images by means of an architecture that combines different levels of 2D dilated convolutions.
  • the present invention solves the aforementioned problem by proposing a method as described in claim 1: specifically, a computer-implemented method for the automatic detection of systole and diastole in a cardiac magnetic resonance cine sequence (also called "cine sequence under study" in this document) by using a purely convolutional neural network, the cine sequence comprising a plurality of time instants, each of which corresponds to a volume, each volume comprising at least one slice.
  • the method comprises the steps of: a) preparing the neural network, according to the following substeps: a1) preprocessing one or more cardiac magnetic resonance training cine sequences (also called "training cine sequences" in this document) to normalize their input to the neural network; a2) designing the neural network, including convolutional layers to extract spatial patterns and dilated convolutions to encode temporal information; a3) training the neural network using the normalized training cine sequence(s); b) applying the trained neural network to the automatic detection of systole and diastole in the cine sequence under study, according to the following substeps: b1) preprocessing the cine sequence under study to normalize its input to the neural network; b2) extracting spatial patterns and encoding temporal information by means of the neural network, providing as a result a sequence of 3 values for each time instant, each value indicating the probability that the time instant corresponds to systole, to diastole or to neither of the above; b3) classifying the time instants to detect the time instant corresponding to systole and the time instant corresponding to diastole.
  • Figure 1 is a conversion schematic of a cine sequence volume.
  • the entire sequence is made up of several temporal instants, each of which corresponds to a volume of the cardiac region over time (3D+t inputs).
  • the figure shows the sections of one of these volumes.
  • the preprocessing substage transforms each volume into a single image using the median operation applied on the z-axis.
  • the final result is a sequence of images (2D + t) instead of a sequence of volumes.
  • Figure 2 is a schematic of the convolutional neural network used.
  • the first section receives as input a sequence of images of size X x Y x n (where n corresponds to the number of time instants) and applies 2D convolutions and 2D pooling operations to extract spatial information from the images and reduce their dimension in the 2D plane.
  • the second section corresponds to a block of 3D convolutions with different paths that include different convolution sizes and dilations in the time axis (the third dimension of the convolution). Finally, the network applies a 2D convolution with the softmax operation to generate the probabilities, associated with each time instant, of belonging to each category.
  • Figure 3 is a schematic of the final classification substep used.
  • the time instant finally classified is the one that occupies the central position among those with a probability greater than 90%. In this case, it corresponds to time instant n+4, which in turn corresponds to the real systole of the sequence in the figure.
  • a computer-implemented method for the automatic detection of systole (maximum cardiac contraction) and diastole (state of cardiac relaxation) in cardiac magnetic resonance cine sequences through the use of a purely convolutional neural network.
  • a cardiac magnetic resonance cine sequence comprises a plurality of temporal instants acquired per cardiac cycle, each of which corresponds to a volume (3D image). Therefore, it is a 3D+t sequence.
  • Each of the volumes can feature any number of slices (more or less heart sections).
  • the method according to the preferred embodiment of the present invention comprises the following steps: a) preparing the neural network, according to the following substeps: a1) preprocessing one or several training cine sequences to normalize their input to the neural network; a2) designing the neural network, including convolutional layers to extract spatial patterns (1) and dilated convolutions to encode temporal information (2); a3) training the neural network using the normalized training cine sequence(s); b) applying the trained neural network to the automatic detection of systole and diastole in the cine sequence under study, according to the following substeps: b1) preprocessing the cine sequence under study to normalize its input to the neural network; b2) extracting spatial patterns (1) and encoding temporal information (2) through the neural network, providing as a result a sequence of 3 values for each time instant (3), each value indicating the probability that the time instant corresponds to systole, to diastole or to neither of the above; b3) classifying the time instants to detect the time instant corresponding to systole and the time instant corresponding to diastole.
  • substeps a1) and a2) can be performed in any temporal order with respect to each other. That is, substep a1) can be performed first, followed by substep a2); substep a2) can be performed first, followed by substep a1); or both substeps can be performed simultaneously or substantially simultaneously, without thereby altering the result of the method disclosed in this document.
  • z-axis is used to refer to the axis along which the cut or cuts (corresponding to respective sections of the heart) that make up a volume are presented orthogonally.
  • each of the substeps a1) and b1) includes, firstly, the numerical normalization of the pixel signal in each volume of the corresponding cine sequence (the training cine sequence in substep a1), and the cine sequence under study in substep b1)), since the values may vary depending on the machine used to acquire them. Secondly, the size of the images is normalized to a fixed in-plane size in terms of the number of pixels. Finally, each volume is converted into a single image.
  • unlike the mean, the median is insensitive to the presence of outliers.
  • in that prior-art application, the mean is probably more appropriate than the median, since it is computed between slices that are very similar to each other.
  • the present invention, by contrast, deals with cardiac image volumes in which there are notable differences between slices, since the tissue can vary significantly in some regions of the heart with respect to others.
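The two normalization substeps described above can be sketched as follows. This is a minimal numpy illustration, not the patent's implementation: the function names, the [0, 1] signal range, the 128 x 128 target size, and the crop/pad strategy are all assumptions for demonstration purposes.

```python
import numpy as np

def normalize_signal(volume):
    """Numerical normalization of the pixel signal to the [0, 1] range,
    so that sequences from different scanners share a common scale."""
    vmin, vmax = volume.min(), volume.max()
    return (volume - vmin) / (vmax - vmin)

def pad_or_crop(image, size=(128, 128)):
    """Normalize the in-plane size to a fixed number of pixels by
    cropping and zero-padding (a simple stand-in for resampling)."""
    out = np.zeros(size, dtype=image.dtype)
    h = min(size[0], image.shape[0])
    w = min(size[1], image.shape[1])
    out[:h, :w] = image[:h, :w]
    return out

# Toy volume: 10 slices of 100 x 150 pixels with scanner-dependent values.
volume = np.random.default_rng(2).integers(0, 4096, size=(10, 100, 150)).astype(float)
norm = normalize_signal(volume)
fixed = pad_or_crop(np.median(norm, axis=0))  # volume collapsed to one image
print(norm.min(), norm.max(), fixed.shape)
```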
  • the objective of averaging is to try to obtain a global representation of the contraction state of the tissue that is as unaltered as possible, so the suppression of aberrant data from the z axis of the volume is an important factor in order to avoid possible artifacts in the resulting image.
  • the median function is used, since the application of other averaging functions such as the average (as described in the aforementioned article) in the method of the present invention could produce the presence of artifacts and aberrations in the resulting images, given their nature.
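The robustness argument above can be checked with a minimal numpy sketch (illustrative only; the array shapes and the simulated artifact are invented): collapsing a toy volume along the z-axis with the mean is badly distorted by one aberrant slice, while the median is not.

```python
import numpy as np

# Hypothetical cine volume: 8 slices (z-axis) of 4 x 4 pixels,
# with typical tissue signal around 100.
rng = np.random.default_rng(0)
volume = rng.normal(loc=100.0, scale=5.0, size=(8, 4, 4))

# Simulate one aberrant slice (artifact), as can occur in cardiac MRI.
volume[3] += 1000.0

# Collapse the volume to a single representative image along z.
mean_image = volume.mean(axis=0)
median_image = np.median(volume, axis=0)

# The median projection stays near the typical tissue signal,
# while the mean is pulled far away by the outlier slice.
print(mean_image.mean(), median_image.mean())
```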
  • the network design used includes 2D convolutions together with activation functions and pooling operations to extract spatial patterns (1) from the cine sequence images and progressively reduce their size. It also includes a module composed of 3D convolutions, where different convolution and dilation sizes are applied to encode the temporal information (2) of the cine sequence. Thus, according to the preferred embodiment of the present invention, convolutions with a dilation factor only in the temporal axis are used for the extraction of temporal information, unlike previously known techniques such as those disclosed in the articles "A Dilated CNN Model for Image Classification" and "MV-RAN: Multiview recurrent aggregation network for echocardiographic sequences segmentation and full cardiac cycle analysis" mentioned above, in which dilated convolutions are applied for the extraction of spatial information.
  • the main characteristic of convolutions with a dilation factor is that they make it possible to increase the field of view of the operation without increasing the number of parameters.
  • This method of dealing with temporal sequences has the advantage of requiring less memory consumption by the hardware and of being easier to train compared to the more widespread recurrent networks such as LSTMs.
  • the final result offered by the network is a sequence of 3 values for each time instant (3), where each value indicates the probability that the time instant corresponds to systole, diastole, or none of the above.
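The dilation property described above can be illustrated with a minimal numpy sketch (illustrative only; the patent does not disclose this code): a 1-D dilated convolution along the temporal axis enlarges the receptive field without adding kernel parameters.

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation):
    """1-D convolution along the temporal axis with a dilation factor.
    The kernel taps are spaced `dilation` steps apart, so the receptive
    field grows to (len(kernel) - 1) * dilation + 1 time instants while
    the number of parameters (kernel taps) stays the same."""
    k = len(kernel)
    span = (k - 1) * dilation + 1
    out = np.zeros(len(signal) - span + 1)
    for i in range(len(out)):
        taps = signal[i : i + span : dilation]  # dilated sampling in time
        out[i] = np.dot(taps, kernel)
    return out

x = np.arange(10, dtype=float)    # toy temporal sequence of 10 instants
w = np.array([1.0, 1.0, 1.0])     # 3 parameters in both cases

y1 = dilated_conv1d(x, w, dilation=1)  # receptive field: 3 instants
y2 = dilated_conv1d(x, w, dilation=3)  # receptive field: 7 instants, same 3 parameters
print(len(y1), len(y2))
```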
  • the network training substep (substep a3) comprises applying data augmentation methodology to generate new modified sequences, which include rotations and translations of the images, as well as adding factors that increase the noise in the image.
  • they also include modifications of the temporal sequence through a delay that changes the position of the time instants within the sequence and, therefore, the position in which systole and diastole are located.
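The temporal-delay augmentation can be sketched with numpy (a minimal illustration; the sequence shape, label encoding and event positions are invented, and a circular shift is assumed as the delay mechanism):

```python
import numpy as np

# Hypothetical training example: a 2D+t image sequence (after the volume
# has been collapsed to one image per instant) and its per-instant labels
# (0 = background, 1 = systole, 2 = diastole).
n_t = 12
sequence = np.random.default_rng(1).random((n_t, 64, 64))
labels = np.zeros(n_t, dtype=int)
labels[4] = 1   # systole at instant 4
labels[9] = 2   # diastole at instant 9

def temporal_shift(sequence, labels, delay):
    """Augmentation by a circular temporal delay: the images and their
    labels are rolled together, so systole and diastole move to new
    positions within the sequence."""
    return np.roll(sequence, delay, axis=0), np.roll(labels, delay)

aug_seq, aug_labels = temporal_shift(sequence, labels, delay=3)
print(int(np.argmax(aug_labels == 1)))  # systole moved from instant 4 to 7
```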
  • Dice's coefficient is usually applied when trying to classify elements within an input with unbalanced categories (see, for example, the pixels within an image to segment, which is the method in which it is commonly used).
  • the input is a cine sequence of cardiac magnetic resonance volumes in which the internal elements to be classified are the volumes or time instants within the sequence. Therefore, the application of Dice's coefficient to the specific case of the present invention is not, in principle, evident from the known technique. In this case, the result can be interpreted as a segmentation of a vector, so its use, although unconventional, is equally valid.
  • a weight is associated with the classification categories to give greater importance to the correct classification of the systole and diastole temporal moments. In this way, a greater weight is assigned to the systole and diastole vector with respect to the rest of the temporal moments of the sequence, thus allowing the cost function to be balanced and allowing the network to focus more on the correct classification of systole and diastole. In any case, the sum of the weights must be 1.
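A soft weighted Dice score over the temporal classification can be sketched as follows (a minimal numpy illustration; the specific weight values and the use of the score as `1 - dice` for the loss are assumptions, with the weights summing to 1 as stated above):

```python
import numpy as np

def weighted_dice(probs, onehot, weights, eps=1e-6):
    """Soft Dice coefficient per category, combined with per-category
    weights summing to 1. `probs` and `onehot` have shape
    (n_instants, 3): background, systole, diastole."""
    inter = (probs * onehot).sum(axis=0)
    denom = probs.sum(axis=0) + onehot.sum(axis=0)
    dice_per_cat = (2.0 * inter + eps) / (denom + eps)
    return float((weights * dice_per_cat).sum())

# Toy sequence of 5 instants; systole at instant 1, diastole at instant 3.
onehot = np.zeros((5, 3))
onehot[:, 0] = 1.0
onehot[1] = [0.0, 1.0, 0.0]
onehot[3] = [0.0, 0.0, 1.0]

# Illustrative weights: systole and diastole dominate the cost function.
weights = np.array([0.2, 0.4, 0.4])

perfect = weighted_dice(onehot.copy(), onehot, weights)
print(perfect)  # 1.0 for a perfect prediction; the loss would be 1 - dice
```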
  • the neural network can be used to automatically detect systole and diastole in new cardiac MRI cine sequences according to step b).
  • the trained neural network automatically makes a prediction of probabilities associated with each time instant of the sequence.
  • the results offered by the neural network in substep b2) are processed in order to carry out the correct detection of systole and diastole.
  • the neural network can offer results in which several contiguous moments in time have a high probability of being systole (or equally of being diastole).
  • those time instants with a greater than 90% probability of being systole or diastole, respectively, are preferably chosen; then, only the time instant that occupies the central position among these instants is classified as systole or diastole, respectively.
  • An example of this processing is shown in the accompanying figure 3 .
  • this last part of the classification substep b3) includes the application of a threshold (in this case, a probability greater than 90%) followed by the selection of the central or average element.
  • the neural network tended to obtain extremely high associated probabilities in the areas close to systole and diastole, making it difficult to determine the most correct point in time when the associated probabilities differ so little.
  • a very high initial selection threshold is applied to the probabilities of systole and diastole respectively (as mentioned above, only values with a probability greater than 90% are selected). This differs substantially from other methods known in the prior art, in which a greater range of probabilities is covered, including probabilities that are difficult to decide a priori, such as values close to 50%, which can cause a greater error in the classification.
  • the problem is that it is not possible to locate an unambiguous peak in the obtained probabilities (which cover an interval of roughly 95 to 100% in the time instants surrounding the real one, without a clear peak pattern), and therefore the selection of the central value among those with very high probabilities is applied.
  • the application of this final classification step is dictated by the very nature of the images used and by the proposed neural network design. It is considered that, in a different design applied to another problem, selecting the point with the highest probability could be the optimal choice, as disclosed in other methods known in the art, the selection of the central element from a set of values with high probabilities not being evident a priori.
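The threshold-then-central-element rule can be sketched in a few lines of numpy (illustrative only; the probability values and the handling of a single contiguous high-probability run are assumptions based on the behavior described above):

```python
import numpy as np

def classify_event(probabilities, threshold=0.90):
    """Final classification substep: keep the instants whose probability
    exceeds the threshold and return the one occupying the central
    position among them (assuming, as described, a contiguous run of
    very high probabilities around the true event with no clear peak)."""
    candidates = np.flatnonzero(probabilities > threshold)
    return int(candidates[len(candidates) // 2])

# Toy systole probabilities over 10 instants: instants 3-7 are all >90%.
p_systole = np.array([0.01, 0.02, 0.30, 0.95, 0.97, 0.99, 0.97, 0.93, 0.20, 0.01])
print(classify_event(p_systole))  # central instant of the run: 5
```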
  • stage b) is applied according to the following substeps: b1) the cine sequence under study is normalized; b2) the normalized cine sequence is passed to the neural network, which provides the list of probabilities associated with the time instants of the cine sequence; b3) the final classification method is applied to the probabilities obtained by the neural network to generate the final classification.
  • the mean of the distance between the classified time instant and the real one, for systole and diastole respectively, is used.
  • the method of the invention makes it possible to automatically detect the temporal instant of cardiac systole and diastole in cardiac magnetic resonance cine sequences using the entire sequence volumes.
  • each volume of cardiac images is converted to a single representative image of the entire volume. This specifically makes it possible to incorporate more time sequence information and reduce the number of data to be processed.
  • the temporal analysis of the sequence is implemented through deep learning algorithms, specifically through the use of 3D convolutions including different sizes of convolution and dilation. This allows the implementation of a simpler network with less number of parameters, making the process faster and with lower hardware requirements.
  • the training uses the weighted Dice coefficient as a cost function in the classification of a video sequence.
  • This cost function has been used in image segmentation applications, but not in classifications as in the proposed method.
  • the use of this cost function has been shown to offer better results than other more traditional cost functions in the treatment of other types of sequences, specifically text sequences.
  • other cost functions are usually used (cross-entropy being the paradigm), although it has been shown that in various applications where there is an imbalance of categories (as in the present case, in which the systole and diastole categories each correspond to a single time instant and the rest of the sequence corresponds to the background category) the Dice coefficient works better.
  • the method makes use of this cost function in a new application such as the classification of temporal instants in video sequences.
  • the method according to the present invention allows the use of sequences with complete volumes, incorporating all the volumetric information of the sequence.
  • the method according to the present invention does not require previous segmentations to train the neural network.
  • the sequence preprocessing substep makes it possible to reduce the amount of data passed to the network and, at the same time, to maximize the global information available. All of this makes training the neural network, and predicting the probabilities that a given time instant corresponds to systole, diastole, or neither, faster and more feasible.
  • the design of the neural network makes use of dilated convolutions for temporal coding, thus reducing the number of necessary parameters in contrast to recurrent networks.
  • This makes the training substage more stable and faster, and also allows inferences to be made at a higher speed.
  • the design of the network allows the detection of systole and diastole in sequences of arbitrary duration, unlike other studies carried out in which the number of time points in the sequence is fixed. This broadens the applicability of the proposed method for different environments in which magnetic resonance imaging equipment can offer cine sequences of variable duration.
  • the dilated convolution block design can be extended depending on the available hardware.
  • the design proposes a general implementation that can be extended depending on the number of convolutions and the number of channels presented in each block. It must be considered that the memory required by the graphics cards used will increase with these variations.
  • the method of the present invention can find applications in fields other than the one mentioned above, for the training of a neural network for the classification of time instants in sequences of a type different from the one described.
  • some examples may be: classification of scenes in movie videos into categories (humorous scene, violent scene, scene with appropriate/inappropriate content, etc.); classification of medical imaging acquisitions after contrast injection into categories (no contrast present, increasing presence of contrast, maximum presence of contrast, decreasing presence of contrast, etc.); classification of human actions in a video in security systems (no action, violent action, robbery action, burglary action, action with possession of weapons, etc.); and classification into categories of acquisitions of fMRI ("functional magnetic resonance imaging") image sequences to analyze brain activity in a region of the brain (no brain activation (rest), low brain activation, high brain activation, etc.).
  • the method disclosed in the present document can find application in the classification of frames or temporal moments within a sequence of images in video format.
  • the method of the present invention can be implemented with lower hardware requirements than other designs proposed for the same purpose. This makes its implementation economically more viable.


Abstract

The invention discloses a method for automatically detecting systole and diastole in a cardiac magnetic resonance cine sequence using a purely convolutional neural network, the method comprising a step of preparing the network by preprocessing training cine sequences to normalise their input into the network, designing the network with convolutional layers to extract spatial patterns (1) and dilated convolutions to encode time information (2), and training the network; and a step of applying the trained network to detect systole and diastole in the cine sequence under study by preprocessing the sequence to normalise its input into the network, extracting spatial patterns (1) and encoding time information (2) using the network, providing three values for each instant (3) which indicate the probability that it corresponds to systole, diastole or none of them, and classifying the time instants into systole and diastole.

Description

DESCRIPCIÓN DESCRIPTION
Método de detección automática de sístole y diástole Automatic systole and diastole detection method
CAMPO DE LA INVENCIÓN FIELD OF THE INVENTION
La presente invención se refiere de manera general al campo de la medicina y más concretamente al de la cardiología. La invención se refiere específicamente a un método de detección automática de sístole y diástole mediante el uso de una red neuronal puramente convolucional en una secuencia cine de resonancia magnética cardíaca. The present invention relates generally to the field of medicine and more specifically to that of cardiology. The invention specifically relates to a method of automatic detection of systole and diastole by using a pure convolutional neural network in a cardiac magnetic resonance cine sequence.
ANTECEDENTES DE LA INVENCIÓN BACKGROUND OF THE INVENTION
En el ámbito clínico se puede caracterizar el estado del corazón a través de secuencias cine de resonancia magnética. Estas secuencias conforman volúmenes del corazón en distintos instantes temporales permitiendo ver la contracción de los tejidos para su análisis. Para una correcta caracterización el profesional clínico deberá extraer como principales parámetros los volúmenes de los ventrículos en sístole (máxima contracción cardíaca) y diástole (estado de relajación cardíaca), así como la fracción de eyección que se deriva de los anteriores. Sin embargo, el primer paso para realizar este análisis es la detección de la sístole y la diástole en la secuencia, lo cual puede requerir mucho tiempo y por tanto es deseable disponer de un método de detección automática de la sístole y la diástole que facilite y agilice el análisis y diagnóstico posteriores por parte del profesional médico. In the clinical field, the state of the heart can be characterized through magnetic resonance cine sequences. These sequences make up volumes of the heart at different moments in time, allowing us to see the contraction of the tissues for analysis. For a correct characterization, the clinician must extract as main parameters the volumes of the ventricles in systole (maximum cardiac contraction) and diastole (state of cardiac relaxation), as well as the ejection fraction derived from the above. However, the first step in performing this analysis is the detection of systole and diastole in the sequence, which can be time consuming and therefore it is desirable to have an automatic systole and diastole detection method that facilitates and expedite further analysis and diagnosis by the medical professional.
Se han realizado pocos estudios describiendo métodos que permitan automatizar la detección de la sístole y la diástole en este tipo de secuencias. Algunos casos han propuesto la segmentación del ventrículo izquierdo a través de redes neuronales en todos los instantes temporales (también denominados “frames") y derivar la localización en función del volumen de cada uno de ellos (véase, por ejemplo, Hsin, C., y Danner, C. (2016). Convolutional Neural Networks for Left Ventricle Volume Estimation). Este método tiene el inconveniente de requerir la segmentación previa en todos los instantes temporales para poder entrenar la red neuronal. Few studies have been carried out describing methods that allow automating the detection of systole and diastole in this type of sequences. Some cases have proposed the segmentation of the left ventricle through neural networks at all time points (also called "frames") and derive the location based on the volume of each one of them (see, for example, Hsin, C., and Danner, C. (2016). Convolutional Neural Networks for Left Ventricle Volume Estimation.) This method has the drawback of requiring prior segmentation at all time points in order to train the neural network.
In another case, neural networks were trained to segment the left ventricle, and the systole and diastole points were then determined by locating the center of the left ventricle with respect to a reference (see, for example, Yang, F., He, Y., Hussain, M., Xie, H., and Lei, P. (2017). Convolutional neural network for the detection of end-diastole and end-systole frames in free-breathing cardiac magnetic resonance imaging. Computational and Mathematical Methods in Medicine, article ID 1640835, 2017). Finally, another reported method used convolutional networks to extract the spatial patterns in the images, then recurrent networks with LSTM ("Long Short-Term Memory") layers to encode the information at the temporal level, and finally applied the classification at each time point of the sequence (see, for example, Kong, B., Zhan, Y., Shin, M., Denny, T., and Zhang, S. (October 2016). Recognizing end-diastole and end-systole frames via deep temporal regression network. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 264-272). Springer, Cham). In the last two cases, the studies used only individual slices of the cine sequence and a fixed number of time points.
The article "Convolutional neural networks for semantic segmentation of FIB-SEM volumetric image data" (Master's thesis in Mathematical Statistics, Skärberg, Fredrik, Department of Mathematical Sciences, University of Gothenburg) describes a method for averaging volume inputs along the z-axis in a segmentation application on high-resolution microscopy images. The data set consists of very large 3D images that are to be segmented. Specifically, the author describes these data as having about 200 slices per case; however, the neural network used does not take the complete data: it takes fragments of a few slices and applies the segmentation at the level of the selected image fragment, going from a 3D (i.e., 3-dimensional) image fragment to a 2D (i.e., 2-dimensional) image fragment. This is possible in that case because the images of the slices described are highly similar, so applying a segmentation to the average image and then expanding it to the rest of the slices in the block makes sense in an application where an average image incorporates all the volumetric information and, given the nature of the images, is at the same time extremely similar to every slice of the volume. However, this technique would not be applicable to the automatic detection of systole and diastole in a cardiac magnetic resonance cine sequence since, for example, owing to the very nature of the heart (heart tissue can vary markedly from one region to another), the image of a slice can be very different depending on the region in which that slice is located.
The articles by X. Lei, H. Pan and X. Huang ("A Dilated CNN Model for Image Classification", IEEE Access, vol. 7, pp. 124087-124095, 2019, doi: 10.1109/ACCESS.2019.2927169) and by Ming Li, Chengjia Wang, Heye Zhang and Guang Yang ("MV-RAN: Multiview recurrent aggregation network for echocardiographic sequences segmentation and full cardiac cycle analysis", Computers in Biology and Medicine, vol. 120, 2020) disclose the application of dilated convolutions specifically for the extraction of spatial information. In particular, the second reference describes an initial block in which the dilated convolutions are applied to extract spatial patterns and to assist segmentation, and the text mentions that this module allows the extraction of spatial patterns at each time point. Another module is then described in which the temporal information is extracted by means of LSTM layers. All of this is summarized in the discussion section of the article, which states that "the proposed framework is based on a hybrid approach of pyramidal dilated dense convolution for precise extraction of spatial features, hierarchical convolution with recurrent units of LSTM for the recovery of temporal information, (...)".
In the articles mentioned in the previous paragraph, the problem addressed is image classification, in which a single image is available to classify (for example, in the first reference, "A Dilated CNN Model for Image Classification", handwritten images of the digits 0 to 9 are available and are to be classified according to the digit they represent). However, this technique cannot be applied directly to the problem of classifying volumetric images (3D images) within a temporal sequence of volumetric images (specifically, cardiac magnetic resonance cine sequences), since in that case the input is a sequence of N volumes with a temporal relationship to one another (similar to the frames of a video), and each of the N volumes must be classified into the categories of systole, diastole or background, where only one of the volumes is to be classified as systole and only one of the volumes is to be classified as diastole.
Document US10586531 discloses a speech recognition system based on a neural network that uses dilated convolutions to process the temporal relationship of the sequence of inputs. In this case the inputs are audio in 1D format (that is, a 1-dimensional sequence over time: 1D+t). Dilated convolutions are thus used in a neural network to process information with a temporal relationship; however, the application is completely different from that of the present invention and is therefore not applicable here, since, for example, the present invention requires the processing of a sequence of volumes (that is, 3 dimensions over time: 3D+t).
Document US8345984B2 discloses a method based on a neural network for the classification of human actions in video sequences (that is, 2 dimensions over time: 2D+t). It relies on the use of 3D convolutions in the neural network, such that the third component of the convolutions is used to extract the temporal information from the sequence of images in the video. The objective is therefore the classification of an action present in the complete video, and the method is not applicable to the classification of each time point of the sequence in order to determine the specific instant corresponding to an event such as systole or diastole.
Document US10147193 describes a system based on a neural network for the segmentation of objects in images by means of an architecture that combines different levels of 2D dilated convolutions.
Therefore, it remains desirable to devise a method that automatically detects systole and diastole in a cine sequence using the complete volumes of said sequence, the sequence comprising a variable number of time points and the volumes comprising a variable number of slices.
SUMMARY OF THE INVENTION
The present invention solves the aforementioned problem by proposing a method as described in claim 1. Specifically, a computer-implemented method is disclosed for the automatic detection of systole and diastole in a cardiac magnetic resonance cine sequence (also called the "cine sequence under study" in this document) by means of a purely convolutional neural network, the cine sequence comprising a plurality of time points, each of which corresponds to a volume, and each volume comprising at least one slice. The method comprises the steps of: a) preparing the neural network, according to the following sub-steps: a1) preprocessing one or more training cardiac magnetic resonance cine sequences (also called "training cine sequences" in this document) to normalize their input to the neural network; a2) designing the neural network, including convolutional layers to extract spatial patterns and dilated convolutions to encode the temporal information; a3) training the neural network with the normalized training cine sequence(s); b) applying the trained neural network to the automatic detection of systole and diastole in the cine sequence under study, according to the following sub-steps: b1) preprocessing the cine sequence under study to normalize its input to the neural network; b2) extracting spatial patterns and encoding temporal information by means of the neural network, providing as a result a sequence of 3 values for each time point, each value indicating the probability that the time point corresponds to systole, to diastole or to neither of the two; b3) classifying the time points in order to detect the time point corresponding to systole and the time point corresponding to diastole.
Preferred embodiments of the present invention are described in the dependent claims.
BRIEF DESCRIPTION OF THE DRAWINGS
A preferred, non-limiting embodiment of the present invention will now be described with reference to the following figures:
Figure 1 is a diagram of the conversion of a cine sequence volume. The complete sequence is made up of several time points, each of which corresponds to a volume of the cardiac region over time (3D+t inputs). The figure shows the slices of one of these volumes. The preprocessing sub-step transforms each volume into a single image by means of the median operation applied along the z-axis. The final result is a sequence of images (2D+t) instead of a sequence of volumes.
Figure 2 is a diagram of the convolutional neural network used. The first section receives as input a sequence of images of size X × Y × n (where n corresponds to the number of time points) and applies 2D convolutions and 2D pooling operations to extract spatial information from the images and reduce their dimension in the 2D plane. The second section corresponds to a block of 3D convolutions with different paths that include different convolution sizes and dilation factors along the temporal axis (the third dimension of the convolution). Finally, the network applies a convolution with the 2D softmax operation to generate, for each time point, the probabilities of belonging to each category.
Figure 3 is a diagram of the final classification sub-step used. The figure shows several time points close to systole together with their associated probabilities of being systole. The time point finally classified is the one occupying the central position among those with a probability above 90%. In this case it corresponds to time point n+4, which in turn corresponds to the actual systole of the sequence in the figure.
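The selection rule described for Figure 3 can be sketched in a few lines. This is a minimal illustration, not the claimed implementation: the function name and input format are assumptions, while the 90% threshold and the "central position" rule come from the description above.

```python
# Minimal sketch of the final classification rule of Figure 3:
# among the time points whose probability for a given category
# (e.g., systole) exceeds 90%, the central one is selected.
# The function name and the list-of-probabilities input are illustrative.

def select_time_point(probabilities, threshold=0.9):
    """Return the index of the central time point among those whose
    probability exceeds the threshold, or None if none does."""
    candidates = [i for i, p in enumerate(probabilities) if p > threshold]
    if not candidates:
        return None
    return candidates[len(candidates) // 2]

# Example: probabilities of being systole at consecutive time points.
probs = [0.01, 0.05, 0.40, 0.93, 0.97, 0.99, 0.95, 0.91, 0.30]
print(select_time_point(probs))  # indices 3..7 exceed 0.9; central one -> 5
```

The same rule would be applied independently to the diastole probabilities to obtain the single diastole time point.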
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
According to the preferred embodiment of the present invention, a computer-implemented method is disclosed for the automatic detection of systole (maximum cardiac contraction) and diastole (state of cardiac relaxation) in cardiac magnetic resonance cine sequences through the use of a purely convolutional neural network. A cardiac magnetic resonance cine sequence comprises a plurality of time points acquired per cardiac cycle, each of which corresponds to a volume (3D image). It is therefore a 3D+t sequence. Each of the volumes may have any number of slices (more or fewer sections of the heart).
In general, the method according to the preferred embodiment of the present invention comprises the following steps: a) preparing the neural network, according to the following sub-steps: a1) preprocessing one or more training cine sequences to normalize their input to the neural network; a2) designing the neural network, including convolutional layers to extract spatial patterns (1) and dilated convolutions to encode the temporal information (2); a3) training the neural network with the normalized training cine sequence(s); b) applying the trained neural network to the automatic detection of systole and diastole in the cine sequence under study, according to the following sub-steps: b1) preprocessing the cine sequence under study to normalize its input to the neural network; b2) extracting spatial patterns (1) and encoding temporal information (2) by means of the neural network, providing as a result a sequence of 3 values for each time point (3), each value indicating the probability that the time point corresponds to systole, to diastole or to neither of the two; b3) classifying the time points in order to detect the time point corresponding to systole and the time point corresponding to diastole.
According to the preferred embodiment of the present invention, sub-steps a1) and a2) may be performed in any temporal order with respect to each other. That is, sub-step a1) may be performed first, followed by sub-step a2); sub-step a2) may be performed first, followed by sub-step a1); or sub-steps a1) and a2) may be performed simultaneously or substantially simultaneously, without thereby altering the result of the method disclosed in this document.
In this document, the term "z-axis" is used to refer to the axis along which the slice or slices (corresponding to respective sections of the heart) that make up a volume are arranged orthogonally.
An example of preprocessing according to the aforementioned sub-steps a1) and b1) is shown in Figure 1. Specifically, according to a preferred embodiment, each of sub-steps a1) and b1) includes, first, the numerical normalization of the pixel signal in each volume of the corresponding cine sequence (the training cine sequence in sub-step a1, and the cine sequence under study in sub-step b1), since the values may vary depending on the machine used to acquire them. Second, the image size is normalized to a fixed in-plane size in terms of the number of pixels. Finally, the complete volume is converted into a single image. To do this, the median value of each pixel along the z-axis of each volume is calculated, thus generating, for each time point, a single image representative of the complete volume. The final result is a sequence of images (2D+t sequence) instead of a sequence of volumes (3D+t sequence). This last step makes it possible to train a network using fewer hardware resources (training these neural networks requires high-performance graphics cards), while still incorporating the information of the entire volume into the final image. In the case of sequences in which each time point corresponds to a single image (a single slice per volume), this last step is not necessary (applying it would not modify the original sequence), since the sequence already has the dimensions required by the network (2D+t sequence).
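The three preprocessing operations above can be sketched with NumPy. This is an illustrative sketch under stated assumptions: the cine sequence is assumed to be stored as an array of shape (t, z, y, x), the signal normalization is simplified to a min-max rescaling, and the in-plane resampling step is only indicated in a comment.

```python
import numpy as np

# Illustrative sketch of sub-steps a1)/b1), assuming a cine sequence
# stored as an array of shape (t, z, y, x). The normalization choice
# (rescaling to [0, 1]) and the function name are assumptions; only the
# median projection along the z-axis is shown in full.

def preprocess(cine):
    """Convert a 3D+t cine sequence into a 2D+t sequence of images."""
    cine = cine.astype(np.float64)
    # 1) Numerical normalization of the pixel signal (here: min-max to [0, 1]).
    cine = (cine - cine.min()) / (cine.max() - cine.min())
    # 2) In-plane size normalization (resampling to a fixed number of
    #    pixels) would be applied here; omitted in this sketch.
    # 3) Median of each pixel along the z-axis of every volume:
    #    one representative image per time point (2D+t instead of 3D+t).
    return np.median(cine, axis=1)

sequence = np.random.rand(25, 10, 128, 128)   # 25 time points, 10 slices
images = preprocess(sequence)
print(images.shape)  # (25, 128, 128)
```

Note that when each volume has a single slice (z = 1), the median projection simply returns that slice, which matches the observation that the last step would not modify such a sequence.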
This is a substantial difference with respect to, for example, the aforementioned article "Convolutional neural networks for semantic segmentation of FIB-SEM volumetric image data" (Master's thesis in Mathematical Statistics, Skärberg, Fredrik, Department of Mathematical Sciences, University of Gothenburg). In the present case, an averaging operation is applied not to a region of the input, as is done in that article, but to the entire input, which corresponds to 3D+t medical images that are converted into 2D+t images. Another important factor is that that article describes the use of the mean, whereas the method according to the present invention applies the median. Both are statistics that can be extracted from a distribution of values, but there are notable differences between their properties. Most notably, unlike the mean, the median is insensitive to the presence of aberrant data. In the case described in the article, the mean value is probably more appropriate than the median, since the averaging is performed over slices that are very similar to one another. In contrast, the present invention deals with cardiac image volumes in which there is a notable difference between slices, since the tissue can vary markedly from some regions of the heart to others. With this type of image, the aim of averaging is to obtain a global representation of the contraction state of the tissue that is as unaltered as possible, so the suppression of aberrant data along the z-axis of the volume is an important factor in avoiding possible artifacts in the resulting image. For this reason the median function is used: given the nature of the images, applying other averaging functions such as the mean (as described in the aforementioned article) in the method of the present invention could produce artifacts and aberrations in the resulting images.
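The robustness of the median to aberrant data can be seen with a toy numerical example. The values below are invented purely for illustration; they stand for the intensities of one pixel across the slices of a volume, with one slice corrupted by an artifact.

```python
import numpy as np

# Toy illustration of why the median is preferred over the mean for the
# z-axis projection: a single aberrant slice barely shifts the median
# but markedly distorts the mean. Values are invented for illustration.

slices = np.array([100.0, 102.0, 101.0, 99.0, 100.0])    # one pixel across 5 slices
aberrant = np.array([100.0, 102.0, 101.0, 99.0, 500.0])  # same pixel, one artifact

print(np.mean(slices), np.median(slices))      # 100.4 100.0
print(np.mean(aberrant), np.median(aberrant))  # 180.4 101.0
```

With the artifact present, the mean jumps from 100.4 to 180.4, while the median moves only from 100.0 to 101.0, which is the behavior the method relies on when projecting each volume onto a single image.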
An example of the aforementioned substep a2) is shown in figure 2. The network design used includes 2D convolutions together with activation functions and pooling operations to extract spatial patterns (1) from the cine sequence images and to progressively reduce their size.
Next, a module composed of 3D convolutions is included, in which different convolution kernel and dilation sizes are applied to encode the temporal information (2) of the cine sequence. Thus, according to the preferred embodiment of the present invention, convolutions with a dilation factor are applied only along the temporal axis for the extraction of temporal information, unlike previously known techniques such as those disclosed in the aforementioned articles "A Dilated CNN Model for Image Classification" and "MV-RAN: Multiview recurrent aggregation network for echocardiographic sequences segmentation and full cardiac cycle analysis", in which dilated convolutions are applied for the extraction of spatial information.
The main characteristic of convolutions with a dilation factor is that they make it possible to enlarge the field of view (receptive field) of the operation without increasing the number of parameters. This way of handling temporal sequences has the advantage of requiring less hardware memory and of being easier to train than the more widespread recurrent networks such as LSTMs. The final result offered by the network is a sequence of 3 values for each time instant (3), where each value indicates the probability that the time instant corresponds to systole, to diastole, or to neither of the two.
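The effect of dilation on the receptive field can be illustrated with a schematic 1D temporal convolution (a numpy sketch under assumed names; the patented network uses 3D convolutions, but the parameter-count argument is the same):

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation):
    """Valid 1D convolution along the temporal axis with a dilation factor:
    the kernel taps are spaced `dilation` time instants apart, so the span
    covered grows with dilation while the parameter count stays fixed."""
    k = len(kernel)
    span = (k - 1) * dilation  # receptive field extent of this single layer
    return np.array([
        sum(kernel[j] * signal[t + j * dilation] for j in range(k))
        for t in range(len(signal) - span)
    ])

kernel = np.array([0.25, 0.5, 0.25])        # 3 parameters regardless of dilation
x = np.arange(20, dtype=float)
y1 = dilated_conv1d(x, kernel, dilation=1)  # sees 3 consecutive instants
y4 = dilated_conv1d(x, kernel, dilation=4)  # sees a 9-instant span, same 3 weights
print(len(y1), len(y4))  # 18 12
```

Stacking several such layers with increasing dilation factors lets the network cover an entire cine sequence with far fewer parameters than a recurrent network would need.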
The network training substep (substep a3) according to the preferred embodiment of the present invention comprises applying a data augmentation methodology to generate new modified sequences, including rotations and translations of the images, as well as adding factors that increase the noise in the images. This substep also includes modifications of the temporal sequence through a delay that shifts the position of the time instants within the sequence, thereby changing the positions at which systole and diastole are found.
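One way such a temporal-delay augmentation could be realized (an illustrative sketch only; the cyclic shift via `np.roll` and the label encoding are assumptions) is to shift the sequence and its per-instant labels together:

```python
import numpy as np

def temporal_shift(sequence, labels, delay):
    """Cyclically shift a cine sequence (t, y, x) and its per-instant labels
    by `delay` time instants, so systole and diastole land at new positions
    while the image content of each instant is preserved."""
    return np.roll(sequence, delay, axis=0), np.roll(labels, delay, axis=0)

seq = np.random.rand(30, 64, 64)
labels = np.zeros(30, dtype=int)
labels[10], labels[24] = 1, 2  # e.g. 1 = systole, 2 = diastole
aug_seq, aug_labels = temporal_shift(seq, labels, delay=5)
print(int(np.argmax(aug_labels == 1)), int(np.argmax(aug_labels == 2)))  # 15 29
```

Shifting the labels together with the images keeps the supervision consistent, so each augmented sequence is a valid new training example with systole and diastole at different positions.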
For training the network, a cost function based on the weighted Dice coefficient metric is also preferably used. This metric is widely used in the field of image segmentation. The Dice coefficient is usually applied when classifying elements within an input with unbalanced categories (for example, the pixels within an image to be segmented, which is the setting in which it is commonly used). In the case of the present invention, however, the input is a cine sequence of cardiac magnetic resonance volumes in which the internal elements to be classified are the volumes or time instants within the sequence. The application of the Dice coefficient to the specific case of the present invention is therefore not, in principle, evident from the known technique. In this case, the result can be interpreted as a segmentation of a vector, so this use, although unconventional, is equally valid. In the field of sequence classification, the use of this cost function has been described in natural language processing for the classification of words within a sentence (see, for example, Li, X., Sun, X., Meng, Y., Liang, J., Wu, F., & Li, J. (2019). Dice loss for data-imbalanced NLP tasks. arXiv preprint arXiv:1911.02855), yielding results superior to other cost functions. However, this cost function does not appear to have been used before on sequences outside that field; the present case concerns the classification of time instants within video sequences, not text. For this cost function, a weight is associated with each classification category in order to give greater importance to the correct classification of the systole and diastole time instants.
In this way, a greater weight is assigned to the systole and diastole classes with respect to the rest of the time instants of the sequence, thus balancing the cost function and allowing the network to focus more on the correct classification of systole and diastole. In any case, the sum of the weights must be 1.
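A weighted Dice loss over a sequence of per-instant class probabilities could be sketched as follows (a hedged numpy illustration; the class weights shown are invented for the example, with the only constraint from the text being that they sum to 1 and favor the systole and diastole classes):

```python
import numpy as np

def weighted_dice_loss(probs, onehot, weights, eps=1e-7):
    """probs, onehot: (T, 3) arrays over the classes (background, systole,
    diastole) for each of the T time instants of the sequence.  A Dice
    coefficient is computed per class along the time axis and the classes
    are combined with weights summing to 1; the loss is 1 - weighted Dice."""
    inter = (probs * onehot).sum(axis=0)
    denom = probs.sum(axis=0) + onehot.sum(axis=0)
    dice_per_class = (2.0 * inter + eps) / (denom + eps)
    return 1.0 - float((weights * dice_per_class).sum())

# Toy sequence of 25 instants: systole at t=8, diastole at t=20, rest background
T = 25
onehot = np.zeros((T, 3))
onehot[:, 0] = 1.0
onehot[8] = [0.0, 1.0, 0.0]
onehot[20] = [0.0, 0.0, 1.0]
weights = np.array([0.2, 0.4, 0.4])  # illustrative weights summing to 1
print(abs(weighted_dice_loss(onehot, onehot, weights)) < 1e-6)  # True: perfect prediction
```

Because each of systole and diastole occupies a single instant while the background covers the rest, the per-class Dice terms prevent the dominant background class from swamping the loss, which is exactly the imbalance argument made in the text.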
Once the neural network has been trained, it can be used to automatically detect systole and diastole in new cardiac magnetic resonance cine sequences according to step b). First, in substep b1) of the method according to the preferred embodiment of the present invention, the cine sequence under study is normalized. Next, according to the preferred embodiment, in substep b2) the trained neural network automatically predicts the probabilities associated with each time instant of the sequence. Finally, in substep b3) of the method according to the preferred embodiment, the results offered by the neural network in substep b2) are processed in order to carry out the correct detection of systole and diastole. The neural network can produce results in which several contiguous time instants have a high probability of being systole (or, equally, of being diastole). In order to optimize the final classification, those time instants with a probability greater than 90% of being systole or diastole, respectively, are preferably chosen; then, only the time instant occupying the average position among these time instants is classified as systole or diastole, respectively. An example of this processing is shown in the accompanying figure 3.
Therefore, this last part of the classification substep b3) includes the application of a threshold (in this case, a probability greater than 90%) followed by the selection of the central or average element. A priori, it might seem more obvious and direct for the final classification to simply choose the time instant with the highest probability of belonging to the systole category (and likewise for diastole). However, experimental studies showed that the images in the zones close to systole and to diastole tend to be very similar (that is, the state of cardiac contraction is very similar between time instants very close to each other in the images of cardiac magnetic resonance cine sequences). As a consequence, the neural network tended to produce extremely high associated probabilities in the zones close to systole and diastole, making it difficult to determine the most correct time instant when the associated probabilities differ so little.
Therefore, in the method disclosed herein, a very high initial selection threshold is applied to the systole and diastole probabilities, respectively (as mentioned above, only values with a probability greater than 90% are selected). This differs substantially from other methods known in the prior art, which cover a wider range of probabilities, including probabilities that are difficult to decide a priori, such as values close to 50%, which can lead to a greater classification error.
In the specific application disclosed herein, the problem is that it is not possible to locate an unambiguous peak in the obtained probabilities (which span an interval of around 95 to 100% in the time instants surrounding the real one, with no clear peak pattern), and for this reason the central value among those values with very high probabilities is selected. This final classification step follows from the very nature of the images used and from the proposed neural network design. It is considered that, in a different design applied to another problem, selecting the point of highest probability could be the optimal choice, as disclosed in other methods known in the art; the selection of the central element from among a set of high-probability values is therefore not evident a priori.
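The threshold-then-centre selection described above could be sketched as follows (a minimal illustration; the 0.9 threshold is the one stated in the text, while the function name and example probabilities are invented):

```python
import numpy as np

def pick_instant(probs, threshold=0.9):
    """Given the per-instant probabilities for one class (e.g. systole),
    keep the instants above the threshold and return the one occupying
    the central (average) position among them; None if none qualify."""
    candidates = np.flatnonzero(probs > threshold)
    if candidates.size == 0:
        return None
    return int(round(candidates.mean()))

# A flat near-certain plateau around the true systole, with no clear peak:
systole_probs = np.array([0.01, 0.05, 0.2, 0.95, 0.99, 0.96, 0.3, 0.02, 0.01])
print(pick_instant(systole_probs))  # 4 (centre of the plateau at indices 3..5)
```

Note that simply taking `argmax` would latch onto whichever plateau value happens to be marginally highest, whereas the centre of the high-probability plateau is stable against the tiny probability differences the text describes.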
A specific example of a practical application of the method disclosed herein is described below. In this study, a database of cardiac magnetic resonance cine sequences (training cine sequences) corresponding to a total of 399 patients was used. Among these patients, the number of slices per volume varied between 8 and 14, and the number of volumes varied between 14 and 35, with the vast majority being 35. The temporal resolution was 52.92 ms.
Given the availability of such a database of cardiac magnetic resonance cine sequences, it is desired to apply the method of the present invention in a clinical setting, for use in the automatic calculation of systole and diastole in new cine sequences (cine sequences under study).
First, in step a):
a1) The cine sequences of the available database are normalized.
a2) The neural network described above is designed and implemented in a program.
a3) The implemented neural network is trained using the normalized cine sequences of the available database, according to the methodology described above.
Once the trained neural network is available, it can be used in the clinical setting. To do so, each time it is desired to detect the systole and diastole of a new cine sequence, step b) is applied according to the following substeps:
b1) The cine sequence under study is normalized.
b2) The normalized cine sequence is passed to the neural network, which provides the list of probabilities associated with the time instants of the cine sequence.
b3) The final classification method is applied to the probabilities obtained by the neural network to generate the final classification.
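The inference-time substeps b1) to b3) could be wired together as in the following sketch (the exact normalization is not detailed in the text, so the min-max scaling here is an assumption, and `predict` is a hypothetical stand-in for the trained network):

```python
import numpy as np

def normalize(cine):
    """Substep b1) sketch: min-max intensity normalization of the sequence.
    (The patent does not detail the normalization; this choice is assumed.)"""
    lo, hi = cine.min(), cine.max()
    return (cine - lo) / (hi - lo + 1e-8)

def detect_systole_diastole(cine, predict, threshold=0.9):
    """Substeps b1)-b3): normalize, obtain per-instant class probabilities
    from a trained network (`predict` is a stand-in callable returning a
    (T, 3) array of background/systole/diastole probabilities), then apply
    the 90% threshold and keep the central qualifying instant per class."""
    probs = predict(normalize(cine))                   # substep b2)
    result = {}
    for name, c in (("systole", 1), ("diastole", 2)):  # substep b3)
        idx = np.flatnonzero(probs[:, c] > threshold)
        result[name] = int(round(idx.mean())) if idx.size else None
    return result
```

Any callable with the `(T, 3)` output signature can be plugged in as `predict`, which also makes the classification step testable independently of the trained weights.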
A comparative study between the results obtained by applying the method according to the preferred embodiment of the present invention and the results obtained by other methods is included below. Specifically, the results are evaluated directly on 99 cases, with variable numbers of time instants and volumes. Four combinations of training method and final classification are compared:
- Cross-entropy cost function (the standard function for classification problems in deep learning) and classification of the systole and diastole time instants according to the one with the highest associated probability ("naive" method in the table below).
- Cross-entropy cost function and classification of the systole and diastole time instants using the selection of the central high-probability point ("average" method in the table below).
- Weighted Dice cost function and classification of the systole and diastole time instants according to the one with the highest associated probability ("naive" method in the table below).
- Weighted Dice cost function and classification of the systole and diastole time instants using the selection of the central high-probability point ("average" method in the table below).
As a comparative measure, the mean distance between the classified time instant and the real one is used, for systole and for diastole.
Figure imgf000014_0001
In the above table it can be seen that the error of the method proposed by the present invention (the "weighted Dice" training method with final classification by the "average" method) is the smallest, both for diastole and for systole: a mean error of 0 is obtained for diastole and of 1.242 ± 1.45 for systole. If the errors of each case are normalized by the number of time instants of each sequence, the normalized errors presented in the following table are obtained:
Figure imgf000014_0002
A value of 1 would imply that the time instant classified as systole (or diastole, as the case may be) is as far as possible from the real one. In the case of the present invention, the error rate is 0.036 on average for systole; for diastole, the result obtained with the method of the present invention is perfect.
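One consistent reading of this normalization, taken from the statement that a value of 1 means the classified instant is as far as possible from the real one (the formula itself is an assumption, not given explicitly in the text), is:

```python
def normalized_error(predicted, true, num_instants):
    """Distance from the true instant, divided by the greatest distance
    achievable within the sequence, so that 1.0 means the classified
    instant is as far away from the real one as possible."""
    max_dist = max(true, num_instants - 1 - true)
    return abs(predicted - true) / max_dist

# e.g. a 1-instant miss in a 35-instant sequence with the true systole at t=12
print(round(normalized_error(13, 12, 35), 3))  # 0.045
```

On this reading, a one-instant miss in a typical 35-instant sequence yields a normalized error on the same order as the 0.036 average reported for systole.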
Some of the novel aspects of the method of the present invention are presented below, without claiming that this list is in any way exhaustive:
- First, the method of the invention makes it possible to automatically detect the time instants of cardiac systole and diastole in cardiac magnetic resonance cine sequences using the complete volumes of the sequence.
- For the detection of systole and diastole, each volume of cardiac images is converted into a single image representative of the entire volume. Specifically, this makes it possible to incorporate more information from the temporal sequence while reducing the amount of data to be processed.
- The temporal analysis of the sequence is implemented by means of deep learning algorithms, specifically through 3D convolutions with different convolution kernel and dilation sizes. This allows the implementation of a simpler network with fewer parameters, making the process faster and reducing the hardware requirements.
- The training uses the weighted Dice coefficient as the cost function for the classification of a video sequence. This cost function has been used in image segmentation applications, but not in classification tasks such as the proposed method. It has been shown to offer better results than other, more traditional cost functions in the treatment of other types of sequences, specifically text sequences. For the problem posed, other cost functions are usually employed (cross entropy being the paradigm), although it has been shown that in various applications with an imbalance of categories (as in the present case, in which the systole and diastole categories each correspond to a single time instant, while the rest of the sequence corresponds to the background category) the Dice coefficient works better. In this way, the method makes use of this cost function in a new application: the classification of time instants in video sequences.
The aforementioned features and novel aspects of the method according to the present invention give it various advantages over alternatives currently known in the art. Some of these advantages have already been explained or follow from the above description, and include, but are not limited to:
- The method according to the present invention allows the use of sequences with complete volumes, incorporating all the volumetric information of the sequence.
- The method according to the present invention does not require prior segmentations to train the neural network.
- The sequence preprocessing substep reduces the amount of data passed to the network while maximizing the global information available. All of this makes the training of the neural network, and the prediction of the probabilities that a given time instant corresponds to systole, to diastole, or to neither, faster and more feasible. In turn, the design of the neural network makes use of dilated convolutions for temporal encoding, thereby reducing the number of parameters required in contrast to recurrent networks. This makes the training substep more stable and faster, and also allows inferences to be made at a higher speed. These characteristics permit the use of less powerful hardware, since fewer memory resources are required. This makes it cheaper to deploy an application using the method disclosed herein, and the resulting network can also be executed more quickly.
- The design of the network allows the detection of systole and diastole in sequences of arbitrary duration, unlike other published studies in which the number of time instants in the sequence is fixed. This broadens the applicability of the proposed method to different environments in which magnetic resonance imaging equipment may produce cine sequences of variable duration.
- The dilated convolution block design can be scaled according to the available hardware. The design proposes a general implementation that can be extended in terms of the number of convolutions per block and the number of channels in each. It should be borne in mind that the memory required on the graphics cards used will increase with these variations.
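As an illustration of these points only, a minimal plain-Python sketch (not the patent's actual network, which combines 2D and 3D convolutional layers) shows how a dilated temporal convolution with "same" padding serves sequences of any duration with a fixed parameter count, and how the temporal receptive field of a stacked block grows as layers or dilations are added; all function names and dilation values below are illustrative, not taken from the patent:

```python
def dilated_conv1d(seq, kernel, dilation=1):
    """1-D convolution along the time axis with zero 'same' padding:
    the output always has the same length as the input, so the same
    fixed set of kernel weights serves sequences of any duration."""
    k = len(kernel)
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for j in range(k):
            idx = t + (j - k // 2) * dilation
            if 0 <= idx < len(seq):  # positions outside the sequence count as zero
                acc += kernel[j] * seq[idx]
        out.append(acc)
    return out

def receptive_field(kernel_sizes, dilations):
    """Temporal receptive field of a stack of dilated convolutions:
    RF = 1 + sum((k - 1) * d) over the layers."""
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))

# A 3-tap kernel (3 parameters) applied to sequences of different durations:
kernel = [0.25, 0.5, 0.25]
assert len(dilated_conv1d([1.0] * 20, kernel, dilation=2)) == 20
assert len(dilated_conv1d([1.0] * 35, kernel, dilation=2)) == 35

# Doubling the dilations layer by layer (illustrative values only; the
# patent does not fix them) makes the temporal coverage grow quickly
# while the parameter count grows only with the number of layers:
assert receptive_field([3, 3, 3], [1, 2, 4]) == 15
assert receptive_field([3, 3, 3, 3], [1, 2, 4, 8]) == 31
```

This is also why extending the block mainly costs memory: adding layers or channels multiplies the stored weights and activations, while the per-sequence cost of a recurrent hidden state is avoided entirely.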
As mentioned above, the method of the present invention makes it possible to automate the detection of systole and diastole in cardiac magnetic resonance cine sequences. This speeds up the analysis and diagnosis of cardiac pathologies and reduces user-dependent diagnostic variability.
However, the method of the present invention can find applications in fields other than the one mentioned above, by training a neural network to classify time points in sequences of a type different from the one described. Some examples are: classification of movie scenes into categories (humorous scene, violent scene, scene with appropriate/inappropriate content, etc.); classification of medical image acquisitions after contrast injection into categories (no contrast present, increasing contrast, maximum contrast, decreasing contrast, etc.); classification of human actions in video for security systems (no action, violent action, robbery, breaking and entering, possession of weapons, etc.); and classification of fMRI ("functional magnetic resonance imaging") sequence acquisitions into categories to analyze the activity in a region of the brain (no brain activation (rest), low brain activation, high brain activation, etc.). In general, the method disclosed herein can find application in the classification of frames or time points within a sequence of images in video format. Furthermore, it allows detection in sequences of any duration, as well as in sequences with volumes of arbitrary size.
Moreover, the method of the present invention can be implemented with lower hardware requirements than other designs proposed for the same purpose. This makes its deployment more economically viable.
Finally, thanks to the neural network design proposed herein, its applicability extends to the different protocols of different magnetic resonance imaging systems, in which the duration of the cine sequence obtained and the number of slices in its volumes may vary. This increases its applicability within the market.
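Part of this robustness to a variable number of slices comes from the preprocessing described in the claims below, which collapses each volume into one representative image via a per-pixel median along the z axis. A minimal plain-Python sketch follows; the function name and toy volume are illustrative, min-max scaling is only one possible reading of the claimed "numerical normalization", and the resizing to a fixed in-plane size is omitted:

```python
from statistics import median

def preprocess_volume(volume):
    """Collapse one time point's volume (a list of z slices, each a 2-D
    list of pixel values) into a single representative image via the
    per-pixel median along z, then min-max normalize the result to [0, 1]."""
    n_rows, n_cols = len(volume[0]), len(volume[0][0])
    # One image instead of a volume: the median over however many slices exist.
    image = [[median(sl[r][c] for sl in volume) for c in range(n_cols)]
             for r in range(n_rows)]
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    span = (hi - lo) or 1.0  # avoid division by zero on flat images
    return [[(v - lo) / span for v in row] for row in image]

# A toy volume with 3 slices of 2x2 pixels; any number of slices works.
vol = [
    [[0.0, 2.0], [4.0, 6.0]],
    [[1.0, 3.0], [5.0, 7.0]],
    [[2.0, 4.0], [6.0, 8.0]],
]
img = preprocess_volume(vol)  # a single 2x2 image with values in [0, 1]
```

Because the median is taken over whatever slices are present, the network's input shape per time point is the same regardless of the acquisition protocol.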
Having described the present invention with reference to a preferred embodiment thereof, the person skilled in the art will be able to make obvious modifications and variations to said embodiment without departing from the scope of protection defined by the following claims.

Claims

1. Method for automatically detecting systole and diastole in a cardiac magnetic resonance cine sequence by means of a purely convolutional neural network, the cine sequence comprising a plurality of time points each of which corresponds to a volume, each volume comprising at least one slice, the method being characterized in that it comprises the steps of:
a) preparing the neural network, according to the following substeps:
a1) preprocessing one or more training cine sequences to normalize their input to the neural network;
a2) designing the neural network, including convolutional layers to extract spatial patterns (1) and dilated convolutions to encode temporal information (2);
a3) training the neural network with the normalized training cine sequence or sequences;
b) applying the trained neural network to the automatic detection of systole and diastole in the cardiac magnetic resonance cine sequence under study, according to the following substeps:
b1) preprocessing the cine sequence under study to normalize its input to the neural network;
b2) extracting spatial patterns (1) and encoding temporal information (2) by means of the neural network, providing as a result a sequence of 3 values for each time point (3), each value indicating the probability that the time point corresponds to systole, to diastole, or to neither;
b3) classifying the time points in order to detect one time point corresponding to systole and one time point corresponding to diastole;
with the particularity that substeps a2) and b2) each comprise:
- 2D convolutions together with activation functions and pooling operations, to extract spatial patterns (1) from the images of the cine sequence and progressively reduce their size in the 2D plane; and
- subsequently, a module composed of 3D convolutions, in which different convolution and dilation sizes are applied along a temporal axis to encode the temporal information (2) of the cine sequence, the temporal axis being the third dimension of the 3D convolutions.
2. Method according to claim 1, characterized in that substeps a1) and a2) may be carried out in any temporal order with respect to each other.
3. Method according to any of the preceding claims, characterized in that substeps a1) and b1) each comprise:
- performing a numerical normalization of the pixel signal in each volume of the sequence;
- normalizing the size of the images to a fixed in-plane size in terms of number of pixels;
- converting the complete volume into a single image, thus generating for each time point a single image representative of the complete volume, thereby obtaining a sequence of images instead of a sequence of volumes.
4. Method according to claim 3, characterized in that converting the complete volume into a single image comprises computing the median value of each pixel along the z axis of each volume.
5. Method according to any of the preceding claims, characterized in that substep a3) comprises:
- applying a data augmentation methodology to generate new modified sequences, including rotations and translations of the images, as well as adding factors that increase the noise in the image;
- using a cost function based on the weighted Dice coefficient metric for the classification of time points within the sequence.
6. Method according to claim 5, characterized in that applying the data augmentation methodology comprises modifying the temporal sequence through a delay that shifts the position of the time points within the sequence, thereby modifying the positions at which systole and diastole occur.
7. Method according to any of the preceding claims, characterized in that substep b3) comprises:
- selecting the time points with a probability greater than 90% of being systole or diastole, respectively; and
- classifying as systole or diastole, respectively, only the time point occupying the average position among those time points with a probability greater than 90% of being systole or diastole, respectively.
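For illustration, the classification rule of substep b3) as recited above can be sketched in a few lines; the function name is ours, and rounding the average position to the nearest frame is an assumption, since the claim does not specify how a fractional average position is resolved:

```python
def pick_phase_frame(probs, threshold=0.9):
    """Select the frames whose probability of being systole (or
    diastole) exceeds the threshold, then classify only the frame
    at the average position among them."""
    candidates = [t for t, p in enumerate(probs) if p > threshold]
    if not candidates:
        return None  # no frame was confident enough
    return round(sum(candidates) / len(candidates))

# Toy per-frame systole probabilities for a 10-frame sequence:
systole_probs = [0.1, 0.2, 0.85, 0.93, 0.97, 0.95, 0.6, 0.4, 0.1, 0.0]
frame = pick_phase_frame(systole_probs)  # frame 4, the mean of frames 3, 4 and 5
```

The same function would be applied twice, once to the systole probabilities and once to the diastole probabilities produced by substep b2).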
PCT/ES2022/070645 2021-10-18 2022-10-13 Method for automatically detecting systole and diastole WO2023067212A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ES202130971A ES2909446B2 (en) 2021-10-18 2021-10-18 AUTOMATIC DETECTION METHOD OF SYSTOLE AND DIASTOLE
ESP202130971 2021-10-18

Publications (1)

Publication Number Publication Date
WO2023067212A1 2023-04-27

Family

ID=81387393

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/ES2022/070645 WO2023067212A1 (en) 2021-10-18 2022-10-13 Method for automatically detecting systole and diastole

Country Status (2)

Country Link
ES (1) ES2909446B2 (en)
WO (1) WO2023067212A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150356750A1 (en) * 2014-06-05 2015-12-10 Siemens Medical Solutions Usa, Inc. Systems and Methods for Graphic Visualization of Ventricle Wall Motion
CN110543912A (en) * 2019-09-02 2019-12-06 李肯立 Method for automatically acquiring cardiac cycle video in fetal key section ultrasonic video


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CIUSDEL, C. ET AL.: "Deep neural networks for ECG-free cardiac phase and end-diastolic frame detection on coronary angiographies", COMPUTERIZED MEDICAL IMAGING AND GRAPHICS, vol. 84, 9 January 2020 (2020-01-09), pages 101749, ISSN: 0895-6111, DOI: https://doi.org/10.1016/j.compmedimag.2020.101749 *
KHENED MAHENDRA; KOLLERATHU VARGHESE ALEX; KRISHNAMURTHI GANAPATHY: "Fully convolutional multi-scale residual DenseNets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers", MEDICAL IMAGE ANALYSIS, OXFORD UNIVERSITY PRESS, OXOFRD, GB, vol. 51, 19 October 2018 (2018-10-19), GB , pages 21 - 45, XP085544958, ISSN: 1361-8415, DOI: 10.1016/j.media.2018.10.004 *
KREBS JULIAN; DELINGETTE HERVE; AYACHE NICHOLAS; MANSI TOMMASO: "Learning a Generative Motion Model From Image Sequences Based on a Latent Motion Matrix", IEEE TRANSACTIONS ON MEDICAL IMAGING, IEEE, USA, vol. 40, no. 5, 2 February 2021 (2021-02-02), USA, pages 1405 - 1416, XP011851738, ISSN: 0278-0062, DOI: 10.1109/TMI.2021.3056531 *
SARMIENTO EVERSON; PICO JEAN; MARTINEZ FABIO: "Cardiac disease prediction from spatio-temporal motion patterns in cine-MRI", 2018 IEEE 15TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2018), IEEE, 4 April 2018 (2018-04-04), pages 1305 - 1308, XP033348389, DOI: 10.1109/ISBI.2018.8363811 *

Also Published As

Publication number Publication date
ES2909446B2 (en) 2023-03-02
ES2909446A1 (en) 2022-05-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22883023

Country of ref document: EP

Kind code of ref document: A1