WO2023067212A1 - Method for automatically detecting systole and diastole - Google Patents


Info

Publication number
WO2023067212A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
diastole
systole
neural network
cine
Prior art date
Application number
PCT/ES2022/070645
Other languages
Spanish (es)
French (fr)
Inventor
David MORATAL PÉREZ
Manuel PÉREZ PELEGRÍ
José Vicente MONMENEU MENADAS
María Pilar LÓPEZ LEREU
José Manuel SANTABÁRBARA GÓMEZ
Alicia M. MACEIRA GONZÁLEZ
Original Assignee
Universitat Politècnica De València
Ecg Médica S.L.
Priority date
Filing date
Publication date
Application filed by Universitat Politècnica De València and Ecg Médica S.L.
Publication of WO2023067212A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/05 Detecting, measuring or recording for diagnosis by means of electric currents or magnetic fields; Measuring using microwaves or radio waves
    • A61B5/055 Detecting, measuring or recording for diagnosis by means of electric currents or magnetic fields; Measuring using microwaves or radio waves involving electronic [EMR] or nuclear [NMR] magnetic resonance, e.g. magnetic resonance imaging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition

Definitions

  • the present invention relates generally to the field of medicine and more specifically to that of cardiology.
  • the invention specifically relates to a method for the automatic detection of systole and diastole by using a purely convolutional neural network in a cardiac magnetic resonance cine sequence.
  • the state of the heart can be characterized through magnetic resonance cine sequences. These sequences make up volumes of the heart at different moments in time, allowing us to see the contraction of the tissues for analysis.
  • the clinician must extract as main parameters the volumes of the ventricles in systole (maximum cardiac contraction) and diastole (state of cardiac relaxation), as well as the ejection fraction derived from the above.
  • the first step in performing this analysis is the detection of systole and diastole in the sequence, which can be time-consuming; it is therefore desirable to have an automatic systole and diastole detection method that facilitates and expedites subsequent analysis and diagnosis by the medical professional.
  • neural networks have been used that are trained to segment the left ventricle and then determine the points of systole and diastole by locating the center of the left ventricle with respect to a reference (see, for example, Yang, F., He, Y., Hussain, M., Xie, H., and Lei, P. (2017). Convolutional neural network for the detection of end-diastole and end-systole frames in free-breathing cardiac magnetic resonance imaging. Computational and Mathematical Methods in Medicine, Article ID 1640835, 2017).
  • another described method has used convolutional networks to extract the spatial patterns in the images, then recurrent networks with LSTM ("Long Short-Term Memory") layers to encode the information at the temporal level, and finally applied the classification at each time instant of the sequence (see, for example, Kong, B., Zhan, Y., Shin, M., Denny, T., and Zhang, S. (October 2016). Recognizing end-diastole and end-systole frames via deep temporal regression network. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 264-272). Springer, Cham). In the last two cases, the studies were carried out using only individual slices of the cine sequence and with a fixed number of time instants.
  • LSTM: Long Short-Term Memory
  • in the article on FIB-SEM volumetric image data mentioned in the description, the author describes these data as having around 200 slices per case;
  • the neural network used there does not take the complete data, but takes fragments of a few slices and applies the segmentation at the level of the selected image fragment, going from a 3D (three-dimensional) image fragment to a 2D (two-dimensional) image fragment.
  • this is possible in that case because the images of each slice described are extremely similar, so applying a segmentation to the average image and then expanding it to the rest of the slices in the block makes sense in that application, in which an average image incorporates all the volumetric information and, given the nature of the images, is at the same time extremely similar to every slice of the volume.
  • in the present invention, by contrast, N volumes are received as input with a temporal relationship to each other (similar to frames in a video), and each of the N volumes must be classified into the categories of systole, diastole or background, where exactly one of the volumes is to be classified as systole and exactly one as diastole.
  • Document US10586531 discloses a system for speech recognition based on the use of a neural network that uses dilated convolutions to process the temporal relationship of the sequence of inputs.
  • in that case the inputs are audio in 1D format, i.e., a one-dimensional sequence over time (1D+t).
  • Dilated convolutions are used in a neural network for information processing with a temporal relationship; however, the application is completely different from that foreseen in the present invention and therefore it is not applicable in this case, since, for example, in the present invention the treatment of a sequence of volumes is required (that is, 3 dimensions over time: 3D+t).
  • Document US8345984B2 discloses a method based on a neural network for the classification of human actions in video sequences (that is, 2 dimensions over time: 2D+t). For this, it relies on the use of 3D convolutions in the neural network, in such a way that the third component of the convolutions is used to extract the temporal information from the sequence of images in the video.
  • the objective is therefore the classification of an action present in the complete video, the method not being applicable to the classification of each temporal instant of the sequence to determine the specific instant corresponding to an event, such as systole and diastole.
  • Document US10147193 describes a system based on a neural network for the segmentation of objects in images by means of an architecture that combines different levels of 2D dilated convolutions.
  • the present invention solves the aforementioned problem by proposing a method as described in claim 1: specifically, a computer-implemented method for the automatic detection of systole and diastole in a cardiac magnetic resonance cine sequence (also called "cine sequence under study" in this document) by using a purely convolutional neural network, the cine sequence comprising a plurality of time instants, each of which corresponds to a volume, each volume comprising at least one slice.
  • the method comprises the steps of: a) preparing the neural network, according to the following substeps: a1) preprocessing one or more cardiac magnetic resonance training cine sequences (also called "training cine sequences" in this document) to normalize their input to the neural network; a2) designing the neural network, including convolutional layers to extract spatial patterns and dilated convolutions to encode temporal information; a3) training the neural network using the normalized training cine sequence(s); b) applying the trained neural network to the automatic detection of systole and diastole in the cine sequence under study, according to the following substeps: b1) preprocessing the cine sequence under study to normalize its input to the neural network; b2) extracting spatial patterns and encoding temporal information by means of the neural network, providing as a result a sequence of 3 values for each time instant, each value indicating the probability that the time instant corresponds to systole, to diastole or to neither of the above; b3) classifying the time instants to detect the time instant corresponding to systole and the time instant corresponding to diastole.
  • Figure 1 is a conversion schematic of a cine sequence volume.
  • the entire sequence is made up of several temporal instants, each of which corresponds to a volume of the cardiac region over time (3D+t inputs).
  • the figure shows the sections of one of these volumes.
  • the preprocessing substage transforms each volume into a single image using the median operation applied on the z-axis.
  • the final result is a sequence of images (2D + t) instead of a sequence of volumes.
  • Figure 2 is a schematic of the convolutional neural network used.
  • the first section receives as input a sequence of images of size X x Y x n (where n corresponds to the number of time instants) and applies 2D convolutions and 2D pooling operations to extract spatial information from the images and reduce their dimension in the 2D plane.
  • the second section corresponds to a block of 3D convolutions with different paths that include different convolution sizes and dilations in the time axis (the third dimension of the convolution). Finally, the network applies a 2D convolution with the softmax operation to generate the probabilities, associated with each time instant, of belonging to each category.
  • Figure 3 is a schematic of the final classification substep used.
  • the time instant finally classified is the one that occupies the central position among those with a probability greater than 90%. In this case, it corresponds to time instant n+4, which in turn corresponds to the real systole of the sequence in the figure.
  • a computer-implemented method for the automatic detection of systole (maximum cardiac contraction) and diastole (state of cardiac relaxation) in cardiac magnetic resonance cine sequences through the use of a purely convolutional neural network.
  • a cardiac magnetic resonance cine sequence comprises a plurality of temporal instants acquired per cardiac cycle, each of which corresponds to a volume (3D image). Therefore, it is a 3D+t sequence.
  • Each of the volumes can feature any number of slices (more or less heart sections).
  • the method according to the preferred embodiment of the present invention comprises the following steps: a) preparing the neural network, according to the following substeps: a1) preprocessing one or several training cine sequences to normalize their input to the neural network; a2) designing the neural network, including convolutional layers to extract spatial patterns (1) and dilated convolutions to encode temporal information (2); a3) training the neural network using the normalized training cine sequence(s); b) applying the trained neural network to the automatic detection of systole and diastole in the cine sequence under study, according to the following substeps: b1) preprocessing the cine sequence under study to normalize its input to the neural network; b2) extracting spatial patterns (1) and encoding temporal information (2) through the neural network, providing as a result a sequence of 3 values for each time instant (3), each value indicating the probability that the time instant corresponds to systole, to diastole or to neither of the above; b3) classifying the time instants to detect the time instant corresponding to systole and the time instant corresponding to diastole.
  • substeps a1) and a2) can be performed in any temporal order with respect to each other. That is, substep a1) can be performed first, followed by substep a2); substep a2) can be performed first, followed by substep a1); or both substeps can be performed simultaneously or substantially simultaneously, without thereby altering the result of the method disclosed in this document.
  • z-axis is used to refer to the axis along which the cut or cuts (corresponding to respective sections of the heart) that make up a volume are presented orthogonally.
  • each of the substeps a1) and b1) includes, firstly, the numerical normalization of the pixel signal in each volume of the corresponding cine sequence (the training cine sequence in substep a1), and the cine sequence under study in substep b1)), since the values may vary depending on the machine used to acquire them. Secondly, the size of the images is normalized to a fixed in-plane size in terms of the number of pixels. Finally, each volume is converted into a single image.
  • unlike the mean, the median is insensitive to the presence of outliers.
  • in that prior-art application, the mean is probably more appropriate than the median, since it is computed between slices that are very similar to each other.
  • the present invention, by contrast, deals with cardiac image volumes in which there are notable differences between slices, since the tissue can vary significantly in some regions of the heart with respect to others.
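The two normalization substeps described above can be sketched as follows. This is a minimal numpy illustration, not the patent's implementation: the function names, the [0, 1] signal range, the 128 x 128 target size, and the crop/pad strategy are all assumptions for demonstration purposes.

```python
import numpy as np

def normalize_signal(volume):
    """Numerical normalization of the pixel signal to the [0, 1] range,
    so that sequences from different scanners share a common scale."""
    vmin, vmax = volume.min(), volume.max()
    return (volume - vmin) / (vmax - vmin)

def pad_or_crop(image, size=(128, 128)):
    """Normalize the in-plane size to a fixed number of pixels by
    cropping and zero-padding (a simple stand-in for resampling)."""
    out = np.zeros(size, dtype=image.dtype)
    h = min(size[0], image.shape[0])
    w = min(size[1], image.shape[1])
    out[:h, :w] = image[:h, :w]
    return out

# Toy volume: 10 slices of 100 x 150 pixels with scanner-dependent values.
volume = np.random.default_rng(2).integers(0, 4096, size=(10, 100, 150)).astype(float)
norm = normalize_signal(volume)
fixed = pad_or_crop(np.median(norm, axis=0))  # volume collapsed to one image
print(norm.min(), norm.max(), fixed.shape)
```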
  • the objective of averaging is to try to obtain a global representation of the contraction state of the tissue that is as unaltered as possible, so the suppression of aberrant data from the z axis of the volume is an important factor in order to avoid possible artifacts in the resulting image.
  • the median function is used, since the application of other averaging functions such as the average (as described in the aforementioned article) in the method of the present invention could produce the presence of artifacts and aberrations in the resulting images, given their nature.
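The robustness argument above can be checked with a minimal numpy sketch (illustrative only; the array shapes and the simulated artifact are invented): collapsing a toy volume along the z-axis with the mean is badly distorted by one aberrant slice, while the median is not.

```python
import numpy as np

# Hypothetical cine volume: 8 slices (z-axis) of 4 x 4 pixels,
# with typical tissue signal around 100.
rng = np.random.default_rng(0)
volume = rng.normal(loc=100.0, scale=5.0, size=(8, 4, 4))

# Simulate one aberrant slice (artifact), as can occur in cardiac MRI.
volume[3] += 1000.0

# Collapse the volume to a single representative image along z.
mean_image = volume.mean(axis=0)
median_image = np.median(volume, axis=0)

# The median projection stays near the typical tissue signal,
# while the mean is pulled far away by the outlier slice.
print(mean_image.mean(), median_image.mean())
```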
  • the network design used includes 2D convolutions together with activation functions and pooling operations to extract spatial patterns (1) from the cine sequence images and progressively reduce their size. It also includes a module composed of 3D convolutions, where different convolution and dilation sizes are applied to encode the temporal information (2) of the cine sequence. Thus, according to the preferred embodiment of the present invention, convolutions with a dilation factor only in the temporal axis are used for the extraction of temporal information, unlike previously known techniques such as those disclosed in the articles "A Dilated CNN Model for Image Classification" and "MV-RAN: Multiview recurrent aggregation network for echocardiographic sequences segmentation and full cardiac cycle analysis" mentioned above, in which dilated convolutions are applied for the extraction of spatial information.
  • the main characteristic of convolutions with a dilation factor is that they make it possible to increase the field of view of the operation without increasing the number of parameters.
  • This method of dealing with temporal sequences has the advantage of requiring less memory consumption by the hardware and of being easier to train compared to the more widespread recurrent networks such as LSTMs.
  • the final result offered by the network is a sequence of 3 values for each time instant (3), where each value indicates the probability that the time instant corresponds to systole, diastole, or none of the above.
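The dilation property described above can be illustrated with a minimal numpy sketch (illustrative only; the patent does not disclose this code): a 1-D dilated convolution along the temporal axis enlarges the receptive field without adding kernel parameters.

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation):
    """1-D convolution along the temporal axis with a dilation factor.
    The kernel taps are spaced `dilation` steps apart, so the receptive
    field grows to (len(kernel) - 1) * dilation + 1 time instants while
    the number of parameters (kernel taps) stays the same."""
    k = len(kernel)
    span = (k - 1) * dilation + 1
    out = np.zeros(len(signal) - span + 1)
    for i in range(len(out)):
        taps = signal[i : i + span : dilation]  # dilated sampling in time
        out[i] = np.dot(taps, kernel)
    return out

x = np.arange(10, dtype=float)    # toy temporal sequence of 10 instants
w = np.array([1.0, 1.0, 1.0])     # 3 parameters in both cases

y1 = dilated_conv1d(x, w, dilation=1)  # receptive field: 3 instants
y2 = dilated_conv1d(x, w, dilation=3)  # receptive field: 7 instants, same 3 parameters
print(len(y1), len(y2))
```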
  • the network training substep (substep a3) comprises applying data augmentation methodology to generate new modified sequences, which include rotations and translations of the images, as well as adding factors that increase the noise in the image.
  • they also include modifications of the temporal sequence through a delay that changes the position of the time instants within the sequence and, therefore, the position in which systole and diastole are located.
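The temporal-delay augmentation can be sketched with numpy (a minimal illustration; the sequence shape, label encoding and event positions are invented, and a circular shift is assumed as the delay mechanism):

```python
import numpy as np

# Hypothetical training example: a 2D+t image sequence (after the volume
# has been collapsed to one image per instant) and its per-instant labels
# (0 = background, 1 = systole, 2 = diastole).
n_t = 12
sequence = np.random.default_rng(1).random((n_t, 64, 64))
labels = np.zeros(n_t, dtype=int)
labels[4] = 1   # systole at instant 4
labels[9] = 2   # diastole at instant 9

def temporal_shift(sequence, labels, delay):
    """Augmentation by a circular temporal delay: the images and their
    labels are rolled together, so systole and diastole move to new
    positions within the sequence."""
    return np.roll(sequence, delay, axis=0), np.roll(labels, delay)

aug_seq, aug_labels = temporal_shift(sequence, labels, delay=3)
print(int(np.argmax(aug_labels == 1)))  # systole moved from instant 4 to 7
```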
  • Dice's coefficient is usually applied when trying to classify elements within an input with unbalanced categories (see, for example, the pixels within an image to segment, which is the method in which it is commonly used).
  • the input is a cine sequence of cardiac magnetic resonance volumes in which the internal elements to be classified are the volumes or time instants within the sequence. Therefore, the application of Dice's coefficient to the specific case of the present invention is not, in principle, evident from the known technique. In this case, the result can be interpreted as a segmentation of a vector, so its use, although unconventional, is equally valid.
  • a weight is associated with the classification categories to give greater importance to the correct classification of the systole and diastole temporal moments. In this way, a greater weight is assigned to the systole and diastole vector with respect to the rest of the temporal moments of the sequence, thus allowing the cost function to be balanced and allowing the network to focus more on the correct classification of systole and diastole. In any case, the sum of the weights must be 1.
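A soft weighted Dice score over the temporal classification can be sketched as follows (a minimal numpy illustration; the specific weight values and the use of the score as `1 - dice` for the loss are assumptions, with the weights summing to 1 as stated above):

```python
import numpy as np

def weighted_dice(probs, onehot, weights, eps=1e-6):
    """Soft Dice coefficient per category, combined with per-category
    weights summing to 1. `probs` and `onehot` have shape
    (n_instants, 3): background, systole, diastole."""
    inter = (probs * onehot).sum(axis=0)
    denom = probs.sum(axis=0) + onehot.sum(axis=0)
    dice_per_cat = (2.0 * inter + eps) / (denom + eps)
    return float((weights * dice_per_cat).sum())

# Toy sequence of 5 instants; systole at instant 1, diastole at instant 3.
onehot = np.zeros((5, 3))
onehot[:, 0] = 1.0
onehot[1] = [0.0, 1.0, 0.0]
onehot[3] = [0.0, 0.0, 1.0]

# Illustrative weights: systole and diastole dominate the cost function.
weights = np.array([0.2, 0.4, 0.4])

perfect = weighted_dice(onehot.copy(), onehot, weights)
print(perfect)  # 1.0 for a perfect prediction; the loss would be 1 - dice
```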
  • the neural network can be used to automatically detect systole and diastole in new cardiac MRI cine sequences according to step b).
  • the trained neural network automatically makes a prediction of probabilities associated with each time instant of the sequence.
  • the results offered by the neural network in substep b2) are processed in order to carry out the correct detection of systole and diastole.
  • the neural network can offer results in which several contiguous moments in time have a high probability of being systole (or equally of being diastole).
  • those time instants with a greater than 90% probability of being systole or diastole, respectively, are preferably chosen; then, only the time instant that occupies the central position among these instants is classified as systole or diastole, respectively.
  • An example of this processing is shown in the accompanying figure 3 .
  • this last part of the classification substep b3) includes the application of a threshold (in this case, a probability greater than 90%) followed by the selection of the central or average element.
  • the neural network tended to obtain extremely high associated probabilities in the areas close to systole and diastole, making it difficult to determine the most correct point in time when the associated probabilities differ so little.
  • a very high initial selection threshold is applied to the probabilities of systole and diastole respectively (as mentioned above, only values with a probability greater than 90% are selected). This differs substantially from other methods known in the prior art, in which a greater range of probabilities is covered, including probabilities that are difficult to decide a priori, such as values close to 50%, which can cause a greater error in the classification.
  • the problem is that it is not possible to locate an unambiguous peak in the obtained probabilities (which cover an interval of roughly 95 to 100% in the time instants surrounding the real one, without a clear peak pattern), and therefore the selection of the central value among those with very high probabilities is applied.
  • the application of this final classification step is dictated by the very nature of the images used and by the proposed neural network design. It is considered that, in a different design applied to another problem, selecting the point with the highest probability could be the optimal choice, as disclosed in other methods known in the art, the selection of the central element from a set of values with high probabilities not being evident a priori.
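The threshold-then-central-element rule can be sketched in a few lines of numpy (illustrative only; the probability values and the handling of a single contiguous high-probability run are assumptions based on the behavior described above):

```python
import numpy as np

def classify_event(probabilities, threshold=0.90):
    """Final classification substep: keep the instants whose probability
    exceeds the threshold and return the one occupying the central
    position among them (assuming, as described, a contiguous run of
    very high probabilities around the true event with no clear peak)."""
    candidates = np.flatnonzero(probabilities > threshold)
    return int(candidates[len(candidates) // 2])

# Toy systole probabilities over 10 instants: instants 3-7 are all >90%.
p_systole = np.array([0.01, 0.02, 0.30, 0.95, 0.97, 0.99, 0.97, 0.93, 0.20, 0.01])
print(classify_event(p_systole))  # central instant of the run: 5
```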
  • stage b) is applied according to the following substeps: b1) the cine sequence under study is normalized; b2) the normalized cine sequence is passed to the neural network, which provides the list of probabilities associated with the time instants of the cine sequence; b3) the final classification method is applied to the probabilities obtained by the neural network to generate the final classification.
  • the mean of the distance between the classified time instant and the real one, for systole and diastole respectively, is used.
  • the method of the invention makes it possible to automatically detect the temporal instant of cardiac systole and diastole in cardiac magnetic resonance cine sequences using the entire sequence volumes.
  • each volume of cardiac images is converted to a single representative image of the entire volume. This specifically makes it possible to incorporate more time sequence information and reduce the number of data to be processed.
  • the temporal analysis of the sequence is implemented through deep learning algorithms, specifically through the use of 3D convolutions including different sizes of convolution and dilation. This allows the implementation of a simpler network with less number of parameters, making the process faster and with lower hardware requirements.
  • the training uses the weighted Dice coefficient as a cost function in the classification of a video sequence.
  • This cost function has been used in image segmentation applications, but not in classifications as in the proposed method.
  • the use of this cost function has been shown to offer better results than other more traditional cost functions in the treatment of other types of sequences, specifically text sequences.
  • other cost functions are usually used (cross-entropy being the paradigm), although it has been shown that in various applications where there is an imbalance of categories (as in the present case, in which the systole and diastole categories each correspond to a single time instant and the rest of the sequence corresponds to the background category) the Dice coefficient works better.
  • the method makes use of this cost function in a new application such as the classification of temporal instants in video sequences.
  • the method according to the present invention allows the use of sequences with complete volumes, incorporating all the volumetric information of the sequence.
  • the method according to the present invention does not require previous segmentations to train the neural network.
  • the sequence preprocessing substep makes it possible to reduce the amount of data passed to the network and, at the same time, to maximize the global information available. All of this makes training the neural network, and predicting the probabilities that a given time instant corresponds to systole, diastole, or neither, faster and more feasible.
  • the design of the neural network makes use of dilated convolutions for temporal coding, thus reducing the number of necessary parameters in contrast to recurrent networks.
  • This makes the training substage more stable and faster, and also allows inferences to be made at a higher speed.
  • the design of the network allows the detection of systole and diastole in sequences of arbitrary duration, unlike other studies carried out in which the number of time points in the sequence is fixed. This broadens the applicability of the proposed method for different environments in which magnetic resonance imaging equipment can offer cine sequences of variable duration.
  • the dilated convolution block design can be extended depending on the available hardware.
  • the design proposes a general implementation that can be extended depending on the number of convolutions and the number of channels presented in each block. It must be considered that the memory required by the graphics cards used will increase with these variations.
  • the method of the present invention can find applications in fields other than the one mentioned above, for the training of a neural network for the classification of time instants in sequences of a type different from the one described.
  • some examples may be: classification of scenes in movie videos into categories (humorous scene, violent scene, scene with appropriate/inappropriate content, etc.); classification of medical imaging acquisitions after contrast injection into categories (no contrast present, increasing presence of contrast, maximum presence of contrast, decreasing presence of contrast, etc.); classification of human actions in a video in security systems (no action, violent action, robbery action, burglary action, action with possession of weapons, etc.); and classification into categories of acquisitions of fMRI ("functional magnetic resonance imaging") image sequences to analyze brain activity in a region of the brain (no brain activation (rest), low brain activation, high brain activation, etc.).
  • the method disclosed in the present document can find application in the classification of frames or temporal moments within a sequence of images in video format.
  • the method of the present invention can be implemented with lower hardware requirements than other designs proposed for the same purpose. This makes its implementation economically more viable.


Abstract

The invention discloses a method for automatically detecting systole and diastole in a cardiac magnetic resonance cine sequence using a purely convolutional neural network, the method comprising a step of preparing the network by preprocessing training cine sequences to normalise their input into the network, designing the network with convolutional layers to extract spatial patterns (1) and dilated convolutions to encode time information (2), and training the network; and a step of applying the trained network to detect systole and diastole in the cine sequence under study by preprocessing the sequence to normalise its input into the network, extracting spatial patterns (1) and encoding time information (2) using the network, providing three values for each instant (3) which indicate the probability that it corresponds to systole, diastole or none of them, and classifying the time instants into systole and diastole.

Description

DESCRIPCIÓN DESCRIPTION
Método de detección automática de sístole y diástole Automatic systole and diastole detection method
CAMPO DE LA INVENCIÓN FIELD OF THE INVENTION
La presente invención se refiere de manera general al campo de la medicina y más concretamente al de la cardiología. La invención se refiere específicamente a un método de detección automática de sístole y diástole mediante el uso de una red neuronal puramente convolucional en una secuencia cine de resonancia magnética cardíaca. The present invention relates generally to the field of medicine and more specifically to that of cardiology. The invention specifically relates to a method of automatic detection of systole and diastole by using a pure convolutional neural network in a cardiac magnetic resonance cine sequence.
ANTECEDENTES DE LA INVENCIÓN BACKGROUND OF THE INVENTION
En el ámbito clínico se puede caracterizar el estado del corazón a través de secuencias cine de resonancia magnética. Estas secuencias conforman volúmenes del corazón en distintos instantes temporales permitiendo ver la contracción de los tejidos para su análisis. Para una correcta caracterización el profesional clínico deberá extraer como principales parámetros los volúmenes de los ventrículos en sístole (máxima contracción cardíaca) y diástole (estado de relajación cardíaca), así como la fracción de eyección que se deriva de los anteriores. Sin embargo, el primer paso para realizar este análisis es la detección de la sístole y la diástole en la secuencia, lo cual puede requerir mucho tiempo y por tanto es deseable disponer de un método de detección automática de la sístole y la diástole que facilite y agilice el análisis y diagnóstico posteriores por parte del profesional médico. In the clinical field, the state of the heart can be characterized through magnetic resonance cine sequences. These sequences make up volumes of the heart at different moments in time, allowing us to see the contraction of the tissues for analysis. For a correct characterization, the clinician must extract as main parameters the volumes of the ventricles in systole (maximum cardiac contraction) and diastole (state of cardiac relaxation), as well as the ejection fraction derived from the above. However, the first step in performing this analysis is the detection of systole and diastole in the sequence, which can be time consuming and therefore it is desirable to have an automatic systole and diastole detection method that facilitates and expedite further analysis and diagnosis by the medical professional.
Se han realizado pocos estudios describiendo métodos que permitan automatizar la detección de la sístole y la diástole en este tipo de secuencias. Algunos casos han propuesto la segmentación del ventrículo izquierdo a través de redes neuronales en todos los instantes temporales (también denominados “frames") y derivar la localización en función del volumen de cada uno de ellos (véase, por ejemplo, Hsin, C., y Danner, C. (2016). Convolutional Neural Networks for Left Ventricle Volume Estimation). Este método tiene el inconveniente de requerir la segmentación previa en todos los instantes temporales para poder entrenar la red neuronal. Few studies have been carried out describing methods that allow automating the detection of systole and diastole in this type of sequences. Some cases have proposed the segmentation of the left ventricle through neural networks at all time points (also called "frames") and derive the location based on the volume of each one of them (see, for example, Hsin, C., and Danner, C. (2016). Convolutional Neural Networks for Left Ventricle Volume Estimation.) This method has the drawback of requiring prior segmentation at all time points in order to train the neural network.
In another case, neural networks were trained to segment the left ventricle, and the systole and diastole points were then determined by locating the center of the left ventricle with respect to a reference (see, for example, Yang, F., He, Y., Hussain, M., Xie, H., and Lei, P. (2017). Convolutional neural network for the detection of end-diastole and end-systole frames in free-breathing cardiac magnetic resonance imaging. Computational and Mathematical Methods in Medicine, article ID 1640835, 2017). Finally, another reported method used convolutional networks to extract the spatial patterns in the images, then recurrent networks with LSTM ("Long Short-Term Memory") layers to encode the information at the temporal level, and finally applied the classification at each time point of the sequence (see, for example, Kong, B., Zhan, Y., Shin, M., Denny, T., and Zhang, S. (October 2016). Recognizing end-diastole and end-systole frames via deep temporal regression network. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 264-272). Springer, Cham). In the last two cases, the studies used only individual slices of the cine sequence and a fixed number of time points.
The article "Convolutional neural networks for semantic segmentation of FIB-SEM volumetric image data" (Master's thesis in Mathematical Statistics, Skärberg, Fredrik, Department of Mathematical Sciences, University of Gothenburg) describes a method for averaging volume inputs along the z-axis in a segmentation application on high-resolution microscopy images. The data set consists of very large 3D images that are to be segmented. Specifically, the author describes these data as having about 200 slices per case; however, the neural network used does not take the complete data: it takes fragments of a few slices and applies the segmentation at the level of the selected image fragment, going from a 3D (i.e., 3-dimensional) image fragment to a 2D (i.e., 2-dimensional) image fragment. This is possible in that case because the images of the slices described are highly similar, so applying a segmentation to the average image and then expanding it to the rest of the slices in the block makes sense in an application where an average image incorporates all the volumetric information and, given the nature of the images, is at the same time extremely similar to every slice of the volume. However, this technique would not be applicable to the automatic detection of systole and diastole in a cardiac magnetic resonance cine sequence since, for example, owing to the very nature of the heart (heart tissue can vary markedly from one region to another), the image of a slice can be very different depending on the region in which that slice is located.
The articles by X. Lei, H. Pan and X. Huang ("A Dilated CNN Model for Image Classification", IEEE Access, vol. 7, pp. 124087-124095, 2019, doi: 10.1109/ACCESS.2019.2927169) and by Ming Li, Chengjia Wang, Heye Zhang and Guang Yang ("MV-RAN: Multiview recurrent aggregation network for echocardiographic sequences segmentation and full cardiac cycle analysis", Computers in Biology and Medicine, vol. 120, 2020) disclose the application of dilated convolutions specifically for the extraction of spatial information. In particular, the second reference describes an initial block in which the dilated convolutions are applied to extract spatial patterns and to assist segmentation, and the text mentions that this module allows the extraction of spatial patterns at each time point. Another module is then described in which the temporal information is extracted by means of LSTM layers. All of this is summarized in the discussion section of the article, which states that "the proposed framework is based on a hybrid approach of pyramidal dilated dense convolution for precise extraction of spatial features, hierarchical convolution with recurrent units of LSTM for the recovery of temporal information, (...)".
In the articles mentioned in the previous paragraph, the problem addressed is image classification, in which a single image is available to classify (for example, in the first reference, "A Dilated CNN Model for Image Classification", handwritten images of the digits 0 to 9 are available and are to be classified according to the digit they represent). However, this technique cannot be applied directly to the problem of classifying volumetric images (3D images) within a temporal sequence of volumetric images (specifically, cardiac magnetic resonance cine sequences), since in that case the input is a sequence of N volumes with a temporal relationship to one another (similar to the frames of a video), and each of the N volumes must be classified into the categories of systole, diastole or background, where only one of the volumes is to be classified as systole and only one of the volumes is to be classified as diastole.
Document US10586531 discloses a speech recognition system based on a neural network that uses dilated convolutions to process the temporal relationship of the sequence of inputs. In this case the inputs are audio in 1D format (that is, a 1-dimensional sequence over time: 1D+t). Dilated convolutions are thus used in a neural network to process information with a temporal relationship; however, the application is completely different from that of the present invention and is therefore not applicable here, since, for example, the present invention requires the processing of a sequence of volumes (that is, 3 dimensions over time: 3D+t).
Document US8345984B2 discloses a method based on a neural network for the classification of human actions in video sequences (that is, 2 dimensions over time: 2D+t). It relies on the use of 3D convolutions in the neural network, such that the third component of the convolutions is used to extract the temporal information from the sequence of images in the video. The objective is therefore the classification of an action present in the complete video, and the method is not applicable to the classification of each time point of the sequence in order to determine the specific instant corresponding to an event such as systole or diastole.
Document US10147193 describes a system based on a neural network for the segmentation of objects in images by means of an architecture that combines different levels of 2D dilated convolutions.
Therefore, it remains desirable to devise a method that automatically detects systole and diastole in a cine sequence using the complete volumes of said sequence, the sequence comprising a variable number of time points and the volumes comprising a variable number of slices.
SUMMARY OF THE INVENTION
The present invention solves the aforementioned problem by proposing a method as described in claim 1. Specifically, a computer-implemented method is disclosed for the automatic detection of systole and diastole in a cardiac magnetic resonance cine sequence (also called the "cine sequence under study" in this document) by means of a purely convolutional neural network, the cine sequence comprising a plurality of time points, each of which corresponds to a volume, and each volume comprising at least one slice. The method comprises the steps of: a) preparing the neural network, according to the following sub-steps: a1) preprocessing one or more training cardiac magnetic resonance cine sequences (also called "training cine sequences" in this document) to normalize their input to the neural network; a2) designing the neural network, including convolutional layers to extract spatial patterns and dilated convolutions to encode the temporal information; a3) training the neural network with the normalized training cine sequence(s); b) applying the trained neural network to the automatic detection of systole and diastole in the cine sequence under study, according to the following sub-steps: b1) preprocessing the cine sequence under study to normalize its input to the neural network; b2) extracting spatial patterns and encoding temporal information by means of the neural network, providing as a result a sequence of 3 values for each time point, each value indicating the probability that the time point corresponds to systole, to diastole or to neither of the two; b3) classifying the time points in order to detect the time point corresponding to systole and the time point corresponding to diastole.
Preferred embodiments of the present invention are described in the dependent claims.
BRIEF DESCRIPTION OF THE DRAWINGS
A preferred, non-limiting embodiment of the present invention will now be described with reference to the following figures:
Figure 1 is a diagram of the conversion of a cine sequence volume. The complete sequence is made up of several time points, each of which corresponds to a volume of the cardiac region over time (3D+t inputs). The figure shows the slices of one of these volumes. The preprocessing sub-step transforms each volume into a single image by means of the median operation applied along the z-axis. The final result is a sequence of images (2D+t) instead of a sequence of volumes.
Figure 2 is a diagram of the convolutional neural network used. The first section receives as input a sequence of images of size X × Y × n (where n corresponds to the number of time points) and applies 2D convolutions and 2D pooling operations to extract spatial information from the images and reduce their dimension in the 2D plane. The second section corresponds to a block of 3D convolutions with different paths that include different convolution sizes and dilation factors along the temporal axis (the third dimension of the convolution). Finally, the network applies a convolution with the 2D softmax operation to generate, for each time point, the probabilities of belonging to each category.
Figure 3 is a diagram of the final classification sub-step used. The figure shows several time points close to systole together with their associated probabilities of being systole. The time point finally classified is the one occupying the central position among those with a probability above 90%. In this case it corresponds to time point n+4, which in turn corresponds to the actual systole of the sequence in the figure.
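The selection rule described for Figure 3 can be sketched in a few lines. This is a minimal illustration, not the claimed implementation: the function name and input format are assumptions, while the 90% threshold and the "central position" rule come from the description above.

```python
# Minimal sketch of the final classification rule of Figure 3:
# among the time points whose probability for a given category
# (e.g., systole) exceeds 90%, the central one is selected.
# The function name and the list-of-probabilities input are illustrative.

def select_time_point(probabilities, threshold=0.9):
    """Return the index of the central time point among those whose
    probability exceeds the threshold, or None if none does."""
    candidates = [i for i, p in enumerate(probabilities) if p > threshold]
    if not candidates:
        return None
    return candidates[len(candidates) // 2]

# Example: probabilities of being systole at consecutive time points.
probs = [0.01, 0.05, 0.40, 0.93, 0.97, 0.99, 0.95, 0.91, 0.30]
print(select_time_point(probs))  # indices 3..7 exceed 0.9; central one -> 5
```

The same rule would be applied independently to the diastole probabilities to obtain the single diastole time point.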
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
According to the preferred embodiment of the present invention, a computer-implemented method is disclosed for the automatic detection of systole (maximum cardiac contraction) and diastole (state of cardiac relaxation) in cardiac magnetic resonance cine sequences through the use of a purely convolutional neural network. A cardiac magnetic resonance cine sequence comprises a plurality of time points acquired per cardiac cycle, each of which corresponds to a volume (3D image). It is therefore a 3D+t sequence. Each of the volumes may have any number of slices (more or fewer sections of the heart).
In general, the method according to the preferred embodiment of the present invention comprises the following steps: a) preparing the neural network, according to the following sub-steps: a1) preprocessing one or more training cine sequences to normalize their input to the neural network; a2) designing the neural network, including convolutional layers to extract spatial patterns (1) and dilated convolutions to encode the temporal information (2); a3) training the neural network with the normalized training cine sequence(s); b) applying the trained neural network to the automatic detection of systole and diastole in the cine sequence under study, according to the following sub-steps: b1) preprocessing the cine sequence under study to normalize its input to the neural network; b2) extracting spatial patterns (1) and encoding temporal information (2) by means of the neural network, providing as a result a sequence of 3 values for each time point (3), each value indicating the probability that the time point corresponds to systole, to diastole or to neither of the two; b3) classifying the time points in order to detect the time point corresponding to systole and the time point corresponding to diastole.
According to the preferred embodiment of the present invention, sub-steps a1) and a2) may be performed in any temporal order with respect to each other. That is, sub-step a1) may be performed first, followed by sub-step a2); sub-step a2) may be performed first, followed by sub-step a1); or sub-steps a1) and a2) may be performed simultaneously or substantially simultaneously, without thereby altering the result of the method disclosed in this document.
In this document, the term "z-axis" is used to refer to the axis along which the slice or slices (corresponding to respective sections of the heart) that make up a volume are arranged orthogonally.
An example of preprocessing according to the aforementioned sub-steps a1) and b1) is shown in Figure 1. Specifically, according to a preferred embodiment, each of sub-steps a1) and b1) includes, first, the numerical normalization of the pixel signal in each volume of the corresponding cine sequence (the training cine sequence in sub-step a1, and the cine sequence under study in sub-step b1), since the values may vary depending on the machine used to acquire them. Second, the image size is normalized to a fixed in-plane size in terms of the number of pixels. Finally, the complete volume is converted into a single image. To do this, the median value of each pixel along the z-axis of each volume is calculated, thus generating, for each time point, a single image representative of the complete volume. The final result is a sequence of images (2D+t sequence) instead of a sequence of volumes (3D+t sequence). This last step makes it possible to train a network using fewer hardware resources (training these neural networks requires high-performance graphics cards), while still incorporating the information of the entire volume into the final image. In the case of sequences in which each time point corresponds to a single image (a single slice per volume), this last step is not necessary (applying it would not modify the original sequence), since the sequence already has the dimensions required by the network (2D+t sequence).
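The three preprocessing operations above can be sketched with NumPy. This is an illustrative sketch under stated assumptions: the cine sequence is assumed to be stored as an array of shape (t, z, y, x), the signal normalization is simplified to a min-max rescaling, and the in-plane resampling step is only indicated in a comment.

```python
import numpy as np

# Illustrative sketch of sub-steps a1)/b1), assuming a cine sequence
# stored as an array of shape (t, z, y, x). The normalization choice
# (rescaling to [0, 1]) and the function name are assumptions; only the
# median projection along the z-axis is shown in full.

def preprocess(cine):
    """Convert a 3D+t cine sequence into a 2D+t sequence of images."""
    cine = cine.astype(np.float64)
    # 1) Numerical normalization of the pixel signal (here: min-max to [0, 1]).
    cine = (cine - cine.min()) / (cine.max() - cine.min())
    # 2) In-plane size normalization (resampling to a fixed number of
    #    pixels) would be applied here; omitted in this sketch.
    # 3) Median of each pixel along the z-axis of every volume:
    #    one representative image per time point (2D+t instead of 3D+t).
    return np.median(cine, axis=1)

sequence = np.random.rand(25, 10, 128, 128)   # 25 time points, 10 slices
images = preprocess(sequence)
print(images.shape)  # (25, 128, 128)
```

Note that when each volume has a single slice (z = 1), the median projection simply returns that slice, which matches the observation that the last step would not modify such a sequence.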
This is a substantial difference with respect to, for example, the aforementioned article "Convolutional neural networks for semantic segmentation of FIB-SEM volumetric image data" (Master's thesis in Mathematical Statistics, Skärberg, Fredrik, Department of Mathematical Sciences, University of Gothenburg). In the present case, an averaging operation is applied not to a region of the input, as is done in that article, but to the entire input, which corresponds to 3D+t medical images that are converted into 2D+t images. Another important factor is that that article describes the use of the mean, whereas the method according to the present invention applies the median. Both are statistics that can be extracted from a distribution of values, but there are notable differences between their properties. Most notably, unlike the mean, the median is insensitive to the presence of aberrant data. In the case described in the article, the mean value is probably more appropriate than the median, since the averaging is performed over slices that are very similar to one another. In contrast, the present invention deals with cardiac image volumes in which there is a notable difference between slices, since the tissue can vary markedly from some regions of the heart to others. With this type of image, the aim of averaging is to obtain a global representation of the contraction state of the tissue that is as unaltered as possible, so the suppression of aberrant data along the z-axis of the volume is an important factor in avoiding possible artifacts in the resulting image. For this reason the median function is used: given the nature of the images, applying other averaging functions such as the mean (as described in the aforementioned article) in the method of the present invention could produce artifacts and aberrations in the resulting images.
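The robustness of the median to aberrant data can be seen with a toy numerical example. The values below are invented purely for illustration; they stand for the intensities of one pixel across the slices of a volume, with one slice corrupted by an artifact.

```python
import numpy as np

# Toy illustration of why the median is preferred over the mean for the
# z-axis projection: a single aberrant slice barely shifts the median
# but markedly distorts the mean. Values are invented for illustration.

slices = np.array([100.0, 102.0, 101.0, 99.0, 100.0])    # one pixel across 5 slices
aberrant = np.array([100.0, 102.0, 101.0, 99.0, 500.0])  # same pixel, one artifact

print(np.mean(slices), np.median(slices))      # 100.4 100.0
print(np.mean(aberrant), np.median(aberrant))  # 180.4 101.0
```

With the artifact present, the mean jumps from 100.4 to 180.4, while the median moves only from 100.0 to 101.0, which is the behavior the method relies on when projecting each volume onto a single image.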
An example of the aforementioned substep a2) is shown in figure 2. The network design used includes 2D convolutions together with activation functions and pooling operations to extract spatial patterns (1) from the cine sequence images and to progressively reduce their size.
Next, a module composed of 3D convolutions is included, in which different convolution kernel and dilation sizes are applied to encode the temporal information (2) of the cine sequence. Thus, according to the preferred embodiment of the present invention, convolutions with a dilation factor are applied only along the temporal axis for the extraction of temporal information, unlike previously known techniques such as those disclosed in the aforementioned articles "A Dilated CNN Model for Image Classification" and "MV-RAN: Multiview recurrent aggregation network for echocardiographic sequences segmentation and full cardiac cycle analysis", in which dilated convolutions are applied for the extraction of spatial information.
The main characteristic of convolutions with a dilation factor is that they make it possible to enlarge the field of view (receptive field) of the operation without increasing the number of parameters. This way of handling temporal sequences has the advantage of requiring less hardware memory and of being easier to train than the more widespread recurrent networks such as LSTMs. The final result offered by the network is a sequence of 3 values for each time instant (3), where each value indicates the probability that the time instant corresponds to systole, to diastole, or to neither of the two.
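The effect of dilation on the receptive field can be illustrated with a schematic 1D temporal convolution (a numpy sketch under assumed names; the patented network uses 3D convolutions, but the parameter-count argument is the same):

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation):
    """Valid 1D convolution along the temporal axis with a dilation factor:
    the kernel taps are spaced `dilation` time instants apart, so the span
    covered grows with dilation while the parameter count stays fixed."""
    k = len(kernel)
    span = (k - 1) * dilation  # receptive field extent of this single layer
    return np.array([
        sum(kernel[j] * signal[t + j * dilation] for j in range(k))
        for t in range(len(signal) - span)
    ])

kernel = np.array([0.25, 0.5, 0.25])        # 3 parameters regardless of dilation
x = np.arange(20, dtype=float)
y1 = dilated_conv1d(x, kernel, dilation=1)  # sees 3 consecutive instants
y4 = dilated_conv1d(x, kernel, dilation=4)  # sees a 9-instant span, same 3 weights
print(len(y1), len(y4))  # 18 12
```

Stacking several such layers with increasing dilation factors lets the network cover an entire cine sequence with far fewer parameters than a recurrent network would need.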
The network training substep (substep a3) according to the preferred embodiment of the present invention comprises applying a data augmentation methodology to generate new modified sequences, including rotations and translations of the images, as well as adding factors that increase the noise in the images. This substep also includes modifications of the temporal sequence through a delay that shifts the position of the time instants within the sequence, thereby changing the positions at which systole and diastole are found.
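One way such a temporal-delay augmentation could be realized (an illustrative sketch only; the cyclic shift via `np.roll` and the label encoding are assumptions) is to shift the sequence and its per-instant labels together:

```python
import numpy as np

def temporal_shift(sequence, labels, delay):
    """Cyclically shift a cine sequence (t, y, x) and its per-instant labels
    by `delay` time instants, so systole and diastole land at new positions
    while the image content of each instant is preserved."""
    return np.roll(sequence, delay, axis=0), np.roll(labels, delay, axis=0)

seq = np.random.rand(30, 64, 64)
labels = np.zeros(30, dtype=int)
labels[10], labels[24] = 1, 2  # e.g. 1 = systole, 2 = diastole
aug_seq, aug_labels = temporal_shift(seq, labels, delay=5)
print(int(np.argmax(aug_labels == 1)), int(np.argmax(aug_labels == 2)))  # 15 29
```

Shifting the labels together with the images keeps the supervision consistent, so each augmented sequence is a valid new training example with systole and diastole at different positions.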
For training the network, a cost function based on the weighted Dice coefficient metric is also preferably used. This metric is widely used in the field of image segmentation. The Dice coefficient is usually applied when classifying elements within an input with unbalanced categories (for example, the pixels within an image to be segmented, which is the setting in which it is commonly used). In the case of the present invention, however, the input is a cine sequence of cardiac magnetic resonance volumes in which the internal elements to be classified are the volumes or time instants within the sequence. The application of the Dice coefficient to the specific case of the present invention is therefore not, in principle, evident from the known technique. In this case, the result can be interpreted as a segmentation of a vector, so this use, although unconventional, is equally valid. In the field of sequence classification, the use of this cost function has been described in natural language processing for the classification of words within a sentence (see, for example, Li, X., Sun, X., Meng, Y., Liang, J., Wu, F., & Li, J. (2019). Dice loss for data-imbalanced NLP tasks. arXiv preprint arXiv:1911.02855), yielding results superior to other cost functions. However, this cost function does not appear to have been used before on sequences outside that field; the present case concerns the classification of time instants within video sequences, not text. For this cost function, a weight is associated with each classification category in order to give greater importance to the correct classification of the systole and diastole time instants.
In this way, a greater weight is assigned to the systole and diastole classes with respect to the rest of the time instants of the sequence, thus balancing the cost function and allowing the network to focus more on the correct classification of systole and diastole. In any case, the sum of the weights must be 1.
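A weighted Dice loss over a sequence of per-instant class probabilities could be sketched as follows (a hedged numpy illustration; the class weights shown are invented for the example, with the only constraint from the text being that they sum to 1 and favor the systole and diastole classes):

```python
import numpy as np

def weighted_dice_loss(probs, onehot, weights, eps=1e-7):
    """probs, onehot: (T, 3) arrays over the classes (background, systole,
    diastole) for each of the T time instants of the sequence.  A Dice
    coefficient is computed per class along the time axis and the classes
    are combined with weights summing to 1; the loss is 1 - weighted Dice."""
    inter = (probs * onehot).sum(axis=0)
    denom = probs.sum(axis=0) + onehot.sum(axis=0)
    dice_per_class = (2.0 * inter + eps) / (denom + eps)
    return 1.0 - float((weights * dice_per_class).sum())

# Toy sequence of 25 instants: systole at t=8, diastole at t=20, rest background
T = 25
onehot = np.zeros((T, 3))
onehot[:, 0] = 1.0
onehot[8] = [0.0, 1.0, 0.0]
onehot[20] = [0.0, 0.0, 1.0]
weights = np.array([0.2, 0.4, 0.4])  # illustrative weights summing to 1
print(abs(weighted_dice_loss(onehot, onehot, weights)) < 1e-6)  # True: perfect prediction
```

Because each of systole and diastole occupies a single instant while the background covers the rest, the per-class Dice terms prevent the dominant background class from swamping the loss, which is exactly the imbalance argument made in the text.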
Once the neural network has been trained, it can be used to automatically detect systole and diastole in new cardiac magnetic resonance cine sequences according to step b). First, in substep b1) of the method according to the preferred embodiment of the present invention, the cine sequence under study is normalized. Next, according to the preferred embodiment, in substep b2) the trained neural network automatically predicts the probabilities associated with each time instant of the sequence. Finally, in substep b3) of the method according to the preferred embodiment, the results offered by the neural network in substep b2) are processed in order to carry out the correct detection of systole and diastole. The neural network can produce results in which several contiguous time instants have a high probability of being systole (or, equally, of being diastole). In order to optimize the final classification, those time instants with a probability greater than 90% of being systole or diastole, respectively, are preferably chosen; then, only the time instant occupying the average position among these time instants is classified as systole or diastole, respectively. An example of this processing is shown in the accompanying figure 3.
Therefore, this last part of the classification substep b3) includes the application of a threshold (in this case, a probability greater than 90%) followed by the selection of the central or average element. A priori, it might seem more obvious and direct for the final classification to simply choose the time instant with the highest probability of belonging to the systole category (and likewise for diastole). However, experimental studies showed that the images in the zones close to systole and to diastole tend to be very similar (that is, the state of cardiac contraction is very similar between time instants very close to each other in the images of cardiac magnetic resonance cine sequences). As a consequence, the neural network tended to produce extremely high associated probabilities in the zones close to systole and diastole, making it difficult to determine the most correct time instant when the associated probabilities differ so little.
Therefore, in the method disclosed herein, a very high initial selection threshold is applied to the systole and diastole probabilities, respectively (as mentioned above, only values with a probability greater than 90% are selected). This differs substantially from other methods known in the prior art, which cover a wider range of probabilities, including probabilities that are difficult to decide a priori, such as values close to 50%, which can lead to a greater classification error.
In the specific application disclosed herein, the problem is that it is not possible to locate an unambiguous peak in the obtained probabilities (which span an interval of around 95 to 100% in the time instants surrounding the real one, with no clear peak pattern), and for this reason the central value among those values with very high probabilities is selected. This final classification step follows from the very nature of the images used and from the proposed neural network design. It is considered that, in a different design applied to another problem, selecting the point of highest probability could be the optimal choice, as disclosed in other methods known in the art; the selection of the central element from among a set of high-probability values is therefore not evident a priori.
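The threshold-then-centre selection described above could be sketched as follows (a minimal illustration; the 0.9 threshold is the one stated in the text, while the function name and example probabilities are invented):

```python
import numpy as np

def pick_instant(probs, threshold=0.9):
    """Given the per-instant probabilities for one class (e.g. systole),
    keep the instants above the threshold and return the one occupying
    the central (average) position among them; None if none qualify."""
    candidates = np.flatnonzero(probs > threshold)
    if candidates.size == 0:
        return None
    return int(round(candidates.mean()))

# A flat near-certain plateau around the true systole, with no clear peak:
systole_probs = np.array([0.01, 0.05, 0.2, 0.95, 0.99, 0.96, 0.3, 0.02, 0.01])
print(pick_instant(systole_probs))  # 4 (centre of the plateau at indices 3..5)
```

Note that simply taking `argmax` would latch onto whichever plateau value happens to be marginally highest, whereas the centre of the high-probability plateau is stable against the tiny probability differences the text describes.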
A specific example of a practical application of the method disclosed herein is described below. In this study, a database of cardiac magnetic resonance cine sequences (training cine sequences) corresponding to a total of 399 patients was used. Among these patients, the number of slices per volume varied between 8 and 14, and the number of volumes varied between 14 and 35, with the vast majority being 35. The temporal resolution was 52.92 ms.
Given the availability of such a database of cardiac magnetic resonance cine sequences, it is desired to apply the method of the present invention in a clinical setting, for use in the automatic calculation of systole and diastole in new cine sequences (cine sequences under study).
First, in step a):
a1) The cine sequences of the available database are normalized.
a2) The neural network described above is designed and implemented in a program.
a3) The implemented neural network is trained using the normalized cine sequences of the available database, according to the methodology described above.
Once the trained neural network is available, it can be used in the clinical setting. To do so, each time it is desired to detect the systole and diastole of a new cine sequence, step b) is applied according to the following substeps:
b1) The cine sequence under study is normalized.
b2) The normalized cine sequence is passed to the neural network, which provides the list of probabilities associated with the time instants of the cine sequence.
b3) The final classification method is applied to the probabilities obtained by the neural network to generate the final classification.
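The inference-time substeps b1) to b3) could be wired together as in the following sketch (the exact normalization is not detailed in the text, so the min-max scaling here is an assumption, and `predict` is a hypothetical stand-in for the trained network):

```python
import numpy as np

def normalize(cine):
    """Substep b1) sketch: min-max intensity normalization of the sequence.
    (The patent does not detail the normalization; this choice is assumed.)"""
    lo, hi = cine.min(), cine.max()
    return (cine - lo) / (hi - lo + 1e-8)

def detect_systole_diastole(cine, predict, threshold=0.9):
    """Substeps b1)-b3): normalize, obtain per-instant class probabilities
    from a trained network (`predict` is a stand-in callable returning a
    (T, 3) array of background/systole/diastole probabilities), then apply
    the 90% threshold and keep the central qualifying instant per class."""
    probs = predict(normalize(cine))                   # substep b2)
    result = {}
    for name, c in (("systole", 1), ("diastole", 2)):  # substep b3)
        idx = np.flatnonzero(probs[:, c] > threshold)
        result[name] = int(round(idx.mean())) if idx.size else None
    return result
```

Any callable with the `(T, 3)` output signature can be plugged in as `predict`, which also makes the classification step testable independently of the trained weights.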
A comparative study between the results obtained by applying the method according to the preferred embodiment of the present invention and the results obtained by other methods is included below. Specifically, the results are evaluated directly on 99 cases, with variable numbers of time instants and volumes. Four combinations of training method and final classification are compared:
- Cross-entropy cost function (the standard function for classification problems in deep learning) and classification of the systole and diastole time instants according to the one with the highest associated probability ("naive" method in the table below).
- Cross-entropy cost function and classification of the systole and diastole time instants using the selection of the central high-probability point ("average" method in the table below).
- Weighted Dice cost function and classification of the systole and diastole time instants according to the one with the highest associated probability ("naive" method in the table below).
- Weighted Dice cost function and classification of the systole and diastole time instants using the selection of the central high-probability point ("average" method in the table below).
As a comparative measure, the mean distance between the classified time instant and the real one is used, for systole and for diastole.
Figure imgf000014_0001
In the above table it can be seen that the error of the method proposed by the present invention (the "weighted Dice" training method with final classification by the "average" method) is the smallest, both for diastole and for systole: a mean error of 0 is obtained for diastole and of 1.242 ± 1.45 for systole. If the errors of each case are normalized by the number of time instants of each sequence, the normalized errors presented in the following table are obtained:
Figure imgf000014_0002
A value of 1 would imply that the time instant classified as systole (or diastole, as the case may be) is as far as possible from the real one. In the case of the present invention, the error rate is 0.036 on average for systole; for diastole, the result obtained with the method of the present invention is perfect.
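One consistent reading of this normalization, taken from the statement that a value of 1 means the classified instant is as far as possible from the real one (the formula itself is an assumption, not given explicitly in the text), is:

```python
def normalized_error(predicted, true, num_instants):
    """Distance from the true instant, divided by the greatest distance
    achievable within the sequence, so that 1.0 means the classified
    instant is as far away from the real one as possible."""
    max_dist = max(true, num_instants - 1 - true)
    return abs(predicted - true) / max_dist

# e.g. a 1-instant miss in a 35-instant sequence with the true systole at t=12
print(round(normalized_error(13, 12, 35), 3))  # 0.045
```

On this reading, a one-instant miss in a typical 35-instant sequence yields a normalized error on the same order as the 0.036 average reported for systole.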
Some of the novel aspects of the method of the present invention are presented below, without claiming that this list is in any way exhaustive:
- First, the method of the invention makes it possible to automatically detect the time instants of cardiac systole and diastole in cardiac magnetic resonance cine sequences using the complete volumes of the sequence.
- For the detection of systole and diastole, each volume of cardiac images is converted into a single image representative of the entire volume. Specifically, this makes it possible to incorporate more information from the temporal sequence while reducing the amount of data to be processed.
- The temporal analysis of the sequence is implemented by means of deep learning algorithms, specifically through 3D convolutions with different convolution kernel and dilation sizes. This allows the implementation of a simpler network with fewer parameters, making the process faster and reducing the hardware requirements.
- The training uses the weighted Dice coefficient as the cost function for the classification of a video sequence. This cost function has been used in image segmentation applications, but not in classification tasks such as the proposed method. It has been shown to offer better results than other, more traditional cost functions in the treatment of other types of sequences, specifically text sequences. For the problem posed, other cost functions are usually employed (cross entropy being the paradigm), although it has been shown that in various applications with an imbalance of categories (as in the present case, in which the systole and diastole categories each correspond to a single time instant, while the rest of the sequence corresponds to the background category) the Dice coefficient works better. In this way, the method makes use of this cost function in a new application: the classification of time instants in video sequences.
The aforementioned features and novel aspects of the method according to the present invention give it various advantages over alternatives currently known in the art. Some of these advantages have already been explained or follow from the above description, and include, but are not limited to:
- The method according to the present invention allows the use of sequences with complete volumes, incorporating all the volumetric information of the sequence.
- The method according to the present invention does not require prior segmentations to train the neural network.
- The sequence preprocessing substep reduces the amount of data passed to the network while maximizing the global information available. All of this makes the training of the neural network, and the prediction of the probabilities that a given time instant corresponds to systole, to diastole, or to neither, faster and more feasible. In turn, the design of the neural network makes use of dilated convolutions for temporal encoding, thereby reducing the number of parameters required in contrast to recurrent networks. This makes the training substep more stable and faster, and also allows inferences to be made at a higher speed. These characteristics permit the use of less powerful hardware, since fewer memory resources are required. This makes it cheaper to deploy an application using the method disclosed herein, and the resulting network can also be executed more quickly.
- The design of the network allows the detection of systole and diastole in sequences of arbitrary duration, unlike other published studies in which the number of time instants in the sequence is fixed. This broadens the applicability of the proposed method to different environments in which magnetic resonance imaging equipment may produce cine sequences of variable duration.
- The dilated convolution block design can be scaled according to the available hardware. The design proposes a general implementation that can be extended in terms of the number of convolutions per block and the number of channels in each. It should be borne in mind that the memory required on the graphics cards used will increase with these variations.
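As an illustration of these points only, a minimal plain-Python sketch (not the patent's actual network, which combines 2D and 3D convolutional layers) shows how a dilated temporal convolution with "same" padding serves sequences of any duration with a fixed parameter count, and how the temporal receptive field of a stacked block grows as layers or dilations are added; all function names and dilation values below are illustrative, not taken from the patent:

```python
def dilated_conv1d(seq, kernel, dilation=1):
    """1-D convolution along the time axis with zero 'same' padding:
    the output always has the same length as the input, so the same
    fixed set of kernel weights serves sequences of any duration."""
    k = len(kernel)
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for j in range(k):
            idx = t + (j - k // 2) * dilation
            if 0 <= idx < len(seq):  # positions outside the sequence count as zero
                acc += kernel[j] * seq[idx]
        out.append(acc)
    return out

def receptive_field(kernel_sizes, dilations):
    """Temporal receptive field of a stack of dilated convolutions:
    RF = 1 + sum((k - 1) * d) over the layers."""
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))

# A 3-tap kernel (3 parameters) applied to sequences of different durations:
kernel = [0.25, 0.5, 0.25]
assert len(dilated_conv1d([1.0] * 20, kernel, dilation=2)) == 20
assert len(dilated_conv1d([1.0] * 35, kernel, dilation=2)) == 35

# Doubling the dilations layer by layer (illustrative values only; the
# patent does not fix them) makes the temporal coverage grow quickly
# while the parameter count grows only with the number of layers:
assert receptive_field([3, 3, 3], [1, 2, 4]) == 15
assert receptive_field([3, 3, 3, 3], [1, 2, 4, 8]) == 31
```

This is also why extending the block mainly costs memory: adding layers or channels multiplies the stored weights and activations, while the per-sequence cost of a recurrent hidden state is avoided entirely.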
As mentioned above, the method of the present invention makes it possible to automate the detection of systole and diastole in cardiac magnetic resonance cine sequences. This speeds up the analysis and diagnosis of cardiac pathologies and reduces user-dependent diagnostic variability.
However, the method of the present invention can find applications in fields other than the one mentioned above, by training a neural network to classify time points in sequences of a type different from the one described. Some examples are: classification of movie scenes into categories (humorous scene, violent scene, scene with appropriate/inappropriate content, etc.); classification of medical image acquisitions after contrast injection into categories (no contrast present, increasing contrast, maximum contrast, decreasing contrast, etc.); classification of human actions in video for security systems (no action, violent action, robbery, breaking and entering, possession of weapons, etc.); and classification of fMRI ("functional magnetic resonance imaging") sequence acquisitions into categories to analyze the activity in a region of the brain (no brain activation (rest), low brain activation, high brain activation, etc.). In general, the method disclosed herein can find application in the classification of frames or time points within a sequence of images in video format. Furthermore, it allows detection in sequences of any duration, as well as in sequences with volumes of arbitrary size.
Moreover, the method of the present invention can be implemented with lower hardware requirements than other designs proposed for the same purpose. This makes its deployment more economically viable.
Finally, thanks to the neural network design proposed herein, its applicability extends to the different protocols of different magnetic resonance imaging systems, in which the duration of the cine sequence obtained and the number of slices in its volumes may vary. This increases its applicability within the market.
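Part of this robustness to a variable number of slices comes from the preprocessing described in the claims below, which collapses each volume into one representative image via a per-pixel median along the z axis. A minimal plain-Python sketch follows; the function name and toy volume are illustrative, min-max scaling is only one possible reading of the claimed "numerical normalization", and the resizing to a fixed in-plane size is omitted:

```python
from statistics import median

def preprocess_volume(volume):
    """Collapse one time point's volume (a list of z slices, each a 2-D
    list of pixel values) into a single representative image via the
    per-pixel median along z, then min-max normalize the result to [0, 1]."""
    n_rows, n_cols = len(volume[0]), len(volume[0][0])
    # One image instead of a volume: the median over however many slices exist.
    image = [[median(sl[r][c] for sl in volume) for c in range(n_cols)]
             for r in range(n_rows)]
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    span = (hi - lo) or 1.0  # avoid division by zero on flat images
    return [[(v - lo) / span for v in row] for row in image]

# A toy volume with 3 slices of 2x2 pixels; any number of slices works.
vol = [
    [[0.0, 2.0], [4.0, 6.0]],
    [[1.0, 3.0], [5.0, 7.0]],
    [[2.0, 4.0], [6.0, 8.0]],
]
img = preprocess_volume(vol)  # a single 2x2 image with values in [0, 1]
```

Because the median is taken over whatever slices are present, the network's input shape per time point is the same regardless of the acquisition protocol.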
Having described the present invention with reference to a preferred embodiment thereof, the person skilled in the art will be able to make obvious modifications and variations to said embodiment without departing from the scope of protection defined by the following claims.

Claims

1. Method for automatically detecting systole and diastole in a cardiac magnetic resonance cine sequence by means of a purely convolutional neural network, the cine sequence comprising a plurality of time points each of which corresponds to a volume, each volume comprising at least one slice, the method being characterized in that it comprises the steps of:
a) preparing the neural network, according to the following substeps:
a1) preprocessing one or more training cine sequences to normalize their input to the neural network;
a2) designing the neural network, including convolutional layers to extract spatial patterns (1) and dilated convolutions to encode temporal information (2);
a3) training the neural network with the normalized training cine sequence or sequences;
b) applying the trained neural network to the automatic detection of systole and diastole in the cardiac magnetic resonance cine sequence under study, according to the following substeps:
b1) preprocessing the cine sequence under study to normalize its input to the neural network;
b2) extracting spatial patterns (1) and encoding temporal information (2) by means of the neural network, providing as a result a sequence of 3 values for each time point (3), each value indicating the probability that the time point corresponds to systole, to diastole, or to neither;
b3) classifying the time points in order to detect one time point corresponding to systole and one time point corresponding to diastole;
with the particularity that substeps a2) and b2) each comprise:
- 2D convolutions together with activation functions and pooling operations, to extract spatial patterns (1) from the images of the cine sequence and progressively reduce their size in the 2D plane; and
- subsequently, a module composed of 3D convolutions, in which different convolution and dilation sizes are applied along a temporal axis to encode the temporal information (2) of the cine sequence, the temporal axis being the third dimension of the 3D convolutions.
2. Method according to claim 1, characterized in that substeps a1) and a2) may be carried out in any temporal order with respect to each other.
3. Method according to any of the preceding claims, characterized in that substeps a1) and b1) each comprise:
- performing a numerical normalization of the pixel signal in each volume of the sequence;
- normalizing the size of the images to a fixed in-plane size in terms of number of pixels;
- converting the complete volume into a single image, thus generating for each time point a single image representative of the complete volume, thereby obtaining a sequence of images instead of a sequence of volumes.
4. Method according to claim 3, characterized in that converting the complete volume into a single image comprises computing the median value of each pixel along the z axis of each volume.
5. Method according to any of the preceding claims, characterized in that substep a3) comprises:
- applying a data augmentation methodology to generate new modified sequences, including rotations and translations of the images, as well as adding factors that increase the noise in the image;
- using a cost function based on the weighted Dice coefficient metric for the classification of time points within the sequence.
6. Method according to claim 5, characterized in that applying the data augmentation methodology comprises modifying the temporal sequence through a delay that shifts the position of the time points within the sequence, thereby modifying the positions at which systole and diastole occur.
7. Method according to any of the preceding claims, characterized in that substep b3) comprises:
- selecting the time points with a probability greater than 90% of being systole or diastole, respectively; and
- classifying as systole or diastole, respectively, only the time point occupying the average position among those time points with a probability greater than 90% of being systole or diastole, respectively.
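For illustration, the classification rule of substep b3) as recited above can be sketched in a few lines; the function name is ours, and rounding the average position to the nearest frame is an assumption, since the claim does not specify how a fractional average position is resolved:

```python
def pick_phase_frame(probs, threshold=0.9):
    """Select the frames whose probability of being systole (or
    diastole) exceeds the threshold, then classify only the frame
    at the average position among them."""
    candidates = [t for t, p in enumerate(probs) if p > threshold]
    if not candidates:
        return None  # no frame was confident enough
    return round(sum(candidates) / len(candidates))

# Toy per-frame systole probabilities for a 10-frame sequence:
systole_probs = [0.1, 0.2, 0.85, 0.93, 0.97, 0.95, 0.6, 0.4, 0.1, 0.0]
frame = pick_phase_frame(systole_probs)  # frame 4, the mean of frames 3, 4 and 5
```

The same function would be applied twice, once to the systole probabilities and once to the diastole probabilities produced by substep b2).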
PCT/ES2022/070645 2021-10-18 2022-10-13 Method for automatically detecting systole and diastole WO2023067212A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ES202130971A ES2909446B2 (en) 2021-10-18 2021-10-18 AUTOMATIC DETECTION METHOD OF SYSTOLE AND DIASTOLE
ESP202130971 2021-10-18

Publications (1)

Publication Number Publication Date
WO2023067212A1 2023-04-27

Family

ID=81387393

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/ES2022/070645 WO2023067212A1 (en) 2021-10-18 2022-10-13 Method for automatically detecting systole and diastole

Country Status (2)

Country Link
ES (1) ES2909446B2 (en)
WO (1) WO2023067212A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150356750A1 (en) * 2014-06-05 2015-12-10 Siemens Medical Solutions Usa, Inc. Systems and Methods for Graphic Visualization of Ventricle Wall Motion
CN110543912A (en) * 2019-09-02 2019-12-06 李肯立 Method for automatically acquiring cardiac cycle video in fetal key section ultrasonic video


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CIUSDEL, C. ET AL.: "Deep neural networks for ECG-free cardiac phase and end-diastolic frame detection on coronary angiographies", COMPUTERIZED MEDICAL IMAGING AND GRAPHICS, vol. 84, 9 January 2020 (2020-01-09), pages 101749, ISSN: 0895-6111, DOI: https://doi.org/10.1016/j.compmedimag.2020.101749 *
KHENED MAHENDRA; KOLLERATHU VARGHESE ALEX; KRISHNAMURTHI GANAPATHY: "Fully convolutional multi-scale residual DenseNets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers", MEDICAL IMAGE ANALYSIS, OXFORD UNIVERSITY PRESS, OXOFRD, GB, vol. 51, 19 October 2018 (2018-10-19), GB , pages 21 - 45, XP085544958, ISSN: 1361-8415, DOI: 10.1016/j.media.2018.10.004 *
KREBS JULIAN; DELINGETTE HERVE; AYACHE NICHOLAS; MANSI TOMMASO: "Learning a Generative Motion Model From Image Sequences Based on a Latent Motion Matrix", IEEE TRANSACTIONS ON MEDICAL IMAGING, IEEE, USA, vol. 40, no. 5, 2 February 2021 (2021-02-02), USA, pages 1405 - 1416, XP011851738, ISSN: 0278-0062, DOI: 10.1109/TMI.2021.3056531 *
SARMIENTO EVERSON; PICO JEAN; MARTINEZ FABIO: "Cardiac disease prediction from spatio-temporal motion patterns in cine-MRI", 2018 IEEE 15TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2018), IEEE, 4 April 2018 (2018-04-04), pages 1305 - 1308, XP033348389, DOI: 10.1109/ISBI.2018.8363811 *

Also Published As

Publication number Publication date
ES2909446B2 (en) 2023-03-02
ES2909446A1 (en) 2022-05-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22883023

Country of ref document: EP

Kind code of ref document: A1