CN111667459A - Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion - Google Patents

Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion

Info

Publication number
CN111667459A
CN111667459A (application number CN202010360486.5A; granted as CN111667459B)
Authority
CN
China
Prior art keywords
medical
variable
fusion
neural network
time sequence
Prior art date
Legal status
Granted
Application number
CN202010360486.5A
Other languages
Chinese (zh)
Other versions
CN111667459B (en)
Inventor
马杰超
张树
俞益洲
Current Assignee
Beijing Shenrui Bolian Technology Co Ltd
Shenzhen Deepwise Bolian Technology Co Ltd
Original Assignee
Beijing Shenrui Bolian Technology Co Ltd
Shenzhen Deepwise Bolian Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Shenrui Bolian Technology Co Ltd and Shenzhen Deepwise Bolian Technology Co Ltd
Priority to CN202010360486.5A
Publication of CN111667459A
Application granted
Publication of CN111667459B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00 Image analysis
            • G06T 7/0002 Inspection of images, e.g. flaw detection
              • G06T 7/0012 Biomedical image inspection
          • G06T 2207/00 Indexing scheme for image analysis or image enhancement
            • G06T 2207/10 Image acquisition modality
              • G06T 2207/10028 Range image; Depth image; 3D point clouds
              • G06T 2207/10072 Tomographic images
                • G06T 2207/10081 Computed x-ray tomography [CT]
                • G06T 2207/10088 Magnetic resonance imaging [MRI]
            • G06T 2207/20 Special algorithmic details
              • G06T 2207/20081 Training; Learning
              • G06T 2207/20084 Artificial neural networks [ANN]
            • G06T 2207/30 Subject of image; Context of image processing
              • G06T 2207/30004 Biomedical image processing
                • G06T 2207/30061 Lung
                • G06T 2207/30096 Tumor; Lesion
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/04 Architecture, e.g. interconnection topology
                • G06N 3/045 Combinations of networks
              • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
          • Y02T 10/00 Road transport of goods or passengers
            • Y02T 10/10 Internal combustion engine [ICE] based vehicles
              • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a medical sign detection method, system, terminal and storage medium based on 3D variable (deformable) convolution and time sequence feature fusion, wherein the method comprises the following steps: acquiring 3D input data or pseudo 3D input data of a medical image; constructing a variable convolution network model, and inputting standard medical sign detection data into the variable convolutional neural network for training to obtain a trained variable convolutional neural network model; inputting the 3D input data or the pseudo 3D input data of the medical image into the trained model; modeling the prediction data output by the model with time sequence fusion to form prediction data based on fused time sequence features of different scales; and carrying out multi-scale pyramid progressive fusion on the prediction data of fused time sequence features at different resolutions to obtain candidate frames of the medical sign. The method and the device address two problems in the prior art: the low computing efficiency of 3D networks, and the loss and under-utilization of multi-layer correlated information in pseudo 3D networks.

Description

Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion
Technical Field
The application relates to the technical field of medical imaging and computer-aided diagnosis, and in particular to a medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion.
Background
The detection of medical signs is an important problem in disease diagnosis. Traditionally, a definitive diagnosis of a suspected malignant area requires a biopsy, in which tissue is excised from the patient's lesion and examined. However, this procedure places high demands on the position and angle of the excised sections, and the trauma to the patient must also be taken into account.
The development of medical imaging technologies and the improvement of medical equipment provide an opportunity to address this problem. Computer vision and artificial intelligence have developed rapidly over the last two decades, and many computer-aided diagnosis systems are now used to assist doctors. Chest CT offers high spatial resolution, fast scanning, clear images, and the ability to reconstruct lesions in three dimensions, and is therefore widely favored by patients and doctors. For CT images, medical image processing has its own special emphases compared with image processing in the general sense. For natural images, general deep learning techniques solve the related problems on the basis of 2D images. For medical images, however, different slices of the same medical sign carry highly correlated and complementary information; if detection is performed on a single slice only, most of the information in the scan is not used effectively, which not only wastes information but can also bias the final diagnosis.
With the development of deep learning, a series of methods for accurate 3D object detection have become available. Existing medical sign detection generally falls into two common approaches. The first directly crops the 3D input into patches and feeds them into a 3D network for judgment, but this approach demands substantial computing resources. The second feeds a pseudo 3D input, formed by stacking consecutive slices, into a 2D network. This method fuses the input slices at an early stage of the network, but it models the spatial x and y axes of each 2D slice together with the time sequence z axis across slices. Because the in-plane (x, y) resolution of a reconstructed CT image differs from its z-axis resolution, the two kinds of dimension should be modeled separately; applying the same weight parameters to the spatial x and y axes and the time sequence z axis makes it difficult for the network to learn the effective differences between resolutions. In addition, although the pseudo 3D structure learns a weight for each slice before input, this weight information exists only at the input layer of the network, so the multi-layer correlated information of the pseudo 3D input (such as characteristic patterns specific to high-dimensional sequence data) may be lost or under-utilized in the later stages of the network.
Therefore, a medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion are needed to solve the low computational efficiency of 3D networks and the loss and under-utilization of multi-layer correlated information in pseudo 3D networks in existing 3D medical sign detection.
Disclosure of Invention
Aiming at the defects of the prior art, the application provides a medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion, solving problems of the prior art such as the low computational efficiency of 3D networks and the loss and under-utilization of multi-layer correlated information in pseudo 3D networks.
In order to solve the above technical problem, in a first aspect, the present application provides a medical sign detection method based on 3D variable convolution and time sequence feature fusion, including:
acquiring 3D input data of a medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers;
constructing a variable convolution network model, inputting standard medical sign detection data into a variable convolution neural network for training to obtain a trained variable convolution neural network model;
inputting the 3D input data of the medical image or the pseudo 3D input data formed by splicing a plurality of continuous 2D layers into a trained variable convolution neural network model;
modeling prediction data output by the variable convolutional neural network model by using time sequence fusion to form prediction data based on fusion time sequence characteristics of different scales;
and carrying out multi-scale pyramid progressive fusion on the prediction data of the fusion time sequence features with different resolutions to obtain candidate frames of the medical sign.
Optionally, the acquiring of the 3D input data of the medical image or the pseudo 3D input data formed by splicing a plurality of continuous 2D layers includes:
preprocessing the medical image to obtain 3D input data;
and slicing the medical scan into 2D slice images, and stacking a plurality of consecutive 2D slice images to obtain the pseudo 3D input data.
Optionally, the inputting the 3D input data of the medical image or the pseudo 3D input data formed by splicing a plurality of continuous 2D layers into the trained variable convolutional neural network model includes:
constructing a variable convolution network model;
inputting standard medical sign detection data into a variable convolutional neural network for training;
and calculating by a parallel standard convolutional neural network to obtain an offset parameter, and learning the variable convolutional neural network end to end by gradient back propagation to obtain a trained variable convolutional neural network model.
Optionally, the modeling the prediction data output by the variable convolutional neural network model by using time sequence fusion to form prediction data based on fusion time sequence features of different scales includes:
acquiring an image frame sequence with time-series prediction data;
extracting the features of the medical signs from the image frame sequence to obtain a medical sign feature map;
and fusing the medical sign feature images according to the time sequence to obtain medical sign prediction data based on fused time sequence features of different scales.
Optionally, the performing multi-scale pyramid progressive fusion on the prediction data of the fusion time sequence features with different resolutions to obtain a candidate frame of the medical sign includes:
and upsampling the higher-level features, unifying channel numbers through 1 × 1 convolution kernels, and adding the features to obtain the candidate frames of the medical sign.
Optionally, the method further includes:
and setting a threshold corresponding to the medical sign candidate frame according to the category of the medical sign candidate frame, and outputting the candidate frame with the detection score exceeding the threshold to obtain a medical sign detection result.
Optionally, the method further includes:
comparing the size of the candidate frame of the medical sign and the output prediction data with the corresponding standard medical sign detection data, and calculating to obtain a difference value between the prediction data and the actual data of the medical sign;
and back-propagating the loss of the variable convolutional neural network model through the difference value, thereby optimizing the variable convolutional neural network model.
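The difference-value computation above can be sketched as follows. The application does not name a specific loss function, so smooth L1, a common choice for bounding-box regression, is assumed here purely for illustration:

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss over box coordinates (an assumed example; the patent
    only says a 'difference value' between prediction and ground truth)."""
    diff = np.abs(pred - target)
    # Quadratic near zero, linear for large errors (robust to outliers).
    return np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta).sum()

pred_box = np.array([10.0, 10.0, 24.0, 26.0])   # predicted (x1, y1, x2, y2)
gt_box = np.array([10.0, 10.0, 24.0, 24.0])     # standard (ground-truth) box
print(smooth_l1(pred_box, gt_box))  # 1.5: only the last coordinate differs by 2
```

In training, this scalar would be back-propagated through the variable convolutional neural network to update its weights and offset parameters.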
In a second aspect, the present application further provides a medical sign detection system based on 3D variable convolution and time sequence feature fusion, comprising:
the data acquisition unit is configured for acquiring 3D input data of the medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers;
the model construction unit is configured for constructing a variable convolution network model, and inputting standard medical sign detection data into a variable convolution neural network for training to obtain a trained variable convolution neural network model;
the data input unit is configured to input 3D input data of the medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers into a trained variable convolution neural network model;
the time sequence feature fusion unit is configured for modeling the prediction data output by the variable convolutional neural network model by using time sequence fusion to form prediction data based on fusion time sequence features of different scales;
and the pyramid progressive fusion unit is configured to perform multi-scale pyramid progressive fusion on the prediction data of the fusion time sequence features with different resolutions to obtain candidate frames of the medical sign.
Optionally, the data obtaining unit is specifically configured to:
preprocessing the medical image to obtain 3D input data;
and slicing the medical scan into 2D slice images, and stacking a plurality of consecutive 2D slice images to obtain the pseudo 3D input data.
Optionally, the model building unit is specifically configured to:
constructing a variable convolution network model;
inputting standard medical sign detection data into a variable convolutional neural network for training;
and calculating by a parallel standard convolutional neural network to obtain an offset parameter, and learning the variable convolutional neural network end to end by gradient back propagation to obtain a trained variable convolutional neural network model.
Optionally, the time sequence feature fusion unit is specifically configured to:
acquiring an image frame sequence with time-series prediction data;
extracting the features of the medical signs from the image frame sequence to obtain a medical sign feature map;
and fusing the medical sign feature images according to the time sequence to obtain medical sign prediction data based on fused time sequence features of different scales.
Optionally, the pyramid progressive fusion unit is specifically configured to:
and upsampling the higher-level features, unifying channel numbers through 1 × 1 convolution kernels, and adding the features to obtain the candidate frames of the medical sign.
Optionally, the system further includes:
and the candidate frame extraction unit is configured to set a threshold corresponding to the medical sign candidate frame according to its category, and to output candidate frames whose detection scores exceed the threshold to obtain the medical sign detection result.
Optionally, the system further includes:
the model loss calculation unit is configured to compare the candidate frames of the medical sign and the output prediction data with the corresponding standard medical sign detection data, calculate the difference between the predicted and actual data of the medical sign, back-propagate the loss of the variable convolutional neural network model through the difference value, and optimize the model.
In a third aspect, the present application provides a terminal, comprising:
a processor, a memory, wherein,
the memory is used for storing a computer program which,
the processor is used for calling and running the computer program from the memory, so that the terminal performs the method described above.
In a fourth aspect, the present application provides a computer storage medium having instructions stored thereon, which when executed on a computer, cause the computer to perform the method of the above aspects.
Compared with the prior art, the method has the following beneficial effects:
1. By combining 3D variable convolution with time sequence feature fusion, the method and the device construct a model framework that handles both 3D input and pseudo 3D input, and the framework can be applied to different body parts, different data types and different task scenarios.
2. The 3D variable convolution can adaptively accommodate the different resolutions of the x, y and z axes in the spatial dimensions, and can also adaptively adjust the parameters the network learns according to the geometric variation of medical signs, thereby learning irregular shape information in the spatial dimensions, learning correlation information between different slices in the time sequence dimension, and adaptively learning the weight relationship between slices.
3. The method and the device introduce time sequence feature fusion and perform feature fusion in the later stages of the network, so that the model can learn high-dimensional feature representations across the image information; the information of different slices is modeled in the later stages according to the time sequence, which makes effective use of the temporal character of the data and further improves the detection performance of the model. This also matches the habit of radiologists, who read the slices in time sequence order, so the constructed model is more reasonable.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a medical sign detection method based on 3D variable convolution and time sequence feature fusion according to an embodiment of the present application;
fig. 2 is a flowchart of another medical sign detection method based on 3D variable convolution and time sequence feature fusion according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a medical sign detection system based on 3D variable convolution and time sequence feature fusion according to another embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal system according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a medical sign detection method based on 3D variable convolution and time sequence feature fusion according to an embodiment of the present application, where the method 100 includes:
S101: acquiring 3D input data of a medical image or pseudo 3D input data formed by stacking a plurality of consecutive 2D slices;
S102: constructing a variable convolution network model, inputting standard medical sign detection data into the variable convolutional neural network for training to obtain a trained variable convolutional neural network model;
S103: inputting the 3D input data of the medical image or the pseudo 3D input data formed by stacking a plurality of consecutive 2D slices into the trained variable convolutional neural network model;
S104: modeling the prediction data output by the variable convolutional neural network model with time sequence fusion to form prediction data based on fused time sequence features of different scales;
S105: carrying out multi-scale pyramid progressive fusion on the prediction data of the fusion time sequence features with different resolutions to obtain candidate frames of the medical sign.
Based on the foregoing embodiment, as an optional embodiment, the S101 acquiring 3D input data of a medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D slices includes:
preprocessing the medical image to obtain 3D input data;
and slicing the medical scan into 2D slice images, and stacking a plurality of consecutive 2D slice images to obtain the pseudo 3D input data.
Specifically, the 3D input data of the medical image has dimensions B × D × H × W, where B denotes the batch size at model input, D denotes the number of slices along the time sequence dimension of the data, and H and W denote the height and width of the image, respectively. The pseudo 3D input data of the medical image has dimensions B × C × H × W, where B again denotes the batch size and C denotes the channels formed by the multiple stacked slices.
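The slice-stacking step above can be sketched as follows. This is a minimal NumPy illustration; the helper name `make_pseudo_3d`, the context width, and the edge padding by border-slice repetition are assumptions, not details given in the application:

```python
import numpy as np

def make_pseudo_3d(volume, num_context=1):
    """Stack each slice with its neighbors into channels.

    volume: (D, H, W) array of consecutive 2D slices.
    Returns (D, C, H, W) with C = 2 * num_context + 1; edge slices are
    padded by repeating the border slice (an assumed convention).
    """
    d = volume.shape[0]
    padded = np.concatenate(
        [volume[:1]] * num_context + [volume] + [volume[-1:]] * num_context, axis=0
    )
    # For slice i, the channels are slices i-k .. i+k of the original volume.
    return np.stack(
        [padded[i : i + 2 * num_context + 1] for i in range(d)], axis=0
    )

vol = np.arange(5 * 4 * 4, dtype=np.float32).reshape(5, 4, 4)  # D=5, H=W=4
pseudo = make_pseudo_3d(vol, num_context=1)
print(pseudo.shape)  # (5, 3, 4, 4): each slice carries its two neighbors as channels
```

A batch dimension B would simply be prepended, giving the B × C × H × W layout described above; the middle channel of each stack is the slice being predicted.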
It should be noted that the present application accommodates both the commonly used 3D input and the pseudo 3D input formed by stacking a plurality of consecutive 2D slices; for the two input structures, the difference is that the final outputs take the form of 3D and 2D images, respectively. For 3D input, the weights of all slices are the same and must be learned, while for pseudo 3D input, typically only the middle slice is predicted and the consecutive slices above and below serve only as auxiliary information for predicting that slice.
Based on the foregoing embodiment, as an optional embodiment, the S102 constructs a variable convolutional network model, and inputs the standard medical sign detection data to the variable convolutional neural network for training, so as to obtain a trained variable convolutional neural network model, including:
constructing a variable convolution network model;
inputting standard medical sign detection data into a variable convolutional neural network for training;
and calculating by a parallel standard convolutional neural network to obtain an offset parameter, and learning the variable convolutional neural network end to end by gradient back propagation to obtain a trained variable convolutional neural network model.
Specifically, a conventional 3D convolution kernel of size 3 × 3 × 3 arranges its sampling points in a regular grid, but many medical signs have irregular shapes. A 3D variable (deformable) convolution kernel can adapt itself to the structure of a medical sign and effectively reduce artifacts caused by respiration and other motion. A learnable offset parameter is added to each sampling point, so the kernel can learn irregular signs, and the size and position of the deformable kernel are adjusted dynamically according to the image content currently being recognized. The visual effect is that the sampling points of kernels at different positions shift adaptively with the image content, accommodating geometric variation in the shape and size of different objects.
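The offset-based sampling described above can be sketched as follows, in 2D for brevity. This is a minimal single-position NumPy illustration; the function names and the border handling in `bilinear` are assumptions, and in the actual model the offsets come from the parallel standard convolution branch and are learned end to end by back-propagation:

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly sample feat (H, W) at fractional coordinates (y, x)."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    y0, x0 = max(y0, 0), max(x0, 0)
    wy, wx = y - np.floor(y), x - np.floor(x)
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def deformable_conv_at(feat, weights, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution at (cy, cx).

    offsets: (9, 2) learned (dy, dx) shifts, one per kernel tap, so the
    sampling grid deforms to follow irregular structures.
    """
    out, k = 0.0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            oy, ox = offsets[k]
            out += weights[dy + 1, dx + 1] * bilinear(feat, cy + dy + oy, cx + dx + ox)
            k += 1
    return out

feat = np.arange(25, dtype=np.float64).reshape(5, 5)
w = np.full((3, 3), 1.0 / 9.0)   # averaging kernel, for illustration only
zero_off = np.zeros((9, 2))
# With all offsets zero this reduces to an ordinary 3x3 convolution.
print(deformable_conv_at(feat, w, zero_off, 2, 2))  # 12.0 = mean of the 3x3 patch
```

Because the bilinear weights are differentiable in (y, x), gradients flow back into the offsets, which is what allows the parallel branch to be trained end to end as the description states.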
Based on the foregoing embodiment, as an optional embodiment, the S104 modeling the prediction data output by the variable convolutional neural network model by using time series fusion to form prediction data based on fusion time series features of different scales, including:
acquiring an image frame sequence with time-series prediction data;
extracting the features of the medical signs from the image frame sequence to obtain a medical sign feature map;
and fusing the medical sign feature images according to the time sequence to obtain medical sign prediction data based on fused time sequence features of different scales.
Specifically, the feature map at each stage consists of multiple 2D channel slices, and the channel axis is regarded as the time axis. The time sequence is then modeled with an LSTM: the medical sign feature map is fed into the LSTM time sequence model to obtain medical sign prediction data based on fused time sequence features of different scales.
For both 3D and pseudo 3D images, the input is by nature a stack of consecutive slices, and there is temporal correlation between the slices: just as the frames of a video are correlated and continuous, so are the slices of a CT scan. In the time sequence feature fusion stage, the input tensor is (B, T, H, W), so the fusion applies to both input modes. Each input slice is treated as a frame of a video. A corresponding output is produced at each time point, taking into account both the information from the previous time point and the input slice at the current one; the output at each time node is also recurrently connected to the hidden unit of the next node, influencing the output state at the next time point. To better model the temporal relationships, convolutional structures are used in both the input-to-state and state-to-state transitions, and the time sequence feature fusion structure is formed by stacking several such layers. An important concept in this structure is the forgetting mechanism: by modeling it, the network decides, for the information at the current time point, what to forget from the state and what to pass on to the next time point. Recall how a doctor reads a scan: if a slice shows no abnormality, the doctor quickly moves to the next one, but if a suspicious sign appears at some slice, the doctor analyzes it carefully. This mechanism is realized by a gating layer that reads the information of h(t-1) and x(t) and outputs a value between 0 and 1 for each entry of the state C(t-1), where 1 means "retain completely" and 0 means "discard completely".
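The gating layer described above can be sketched as follows. This is a minimal NumPy illustration of the forget gate alone, with toy random weights; the shapes and names are assumptions, and the full model uses convolutional rather than fully connected transitions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, w_f, b_f):
    """f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f), elementwise in (0, 1).

    Values near 1 keep the corresponding entry of the cell state C_{t-1};
    values near 0 discard it, mirroring a reader skipping normal slices.
    """
    concat = np.concatenate([h_prev, x_t])
    return sigmoid(w_f @ concat + b_f)

rng = np.random.default_rng(0)
h_prev, x_t = rng.normal(size=4), rng.normal(size=4)  # toy hidden state / slice feature
w_f, b_f = rng.normal(size=(4, 8)), np.zeros(4)
f_t = forget_gate(h_prev, x_t, w_f, b_f)
c_prev = rng.normal(size=4)
c_kept = f_t * c_prev  # gated cell state carried forward to the next slice
print(f_t.min() > 0 and f_t.max() < 1)  # True: gate values lie strictly in (0, 1)
```

The input and output gates (not shown) follow the same pattern, together deciding what new slice information enters the state and what is emitted at each time point.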
For each output of the variable convolutional neural network model, modeling with time sequence fusion yields fused features at different scales.
Based on the foregoing embodiment, as an optional embodiment, the S105 performing multi-scale pyramid progressive fusion on the prediction data of the fusion time sequence features with different resolutions to obtain candidate frames of the medical sign includes:
and upsampling the higher-level features, unifying channel numbers through 1 × 1 convolution kernels, and adding the features to obtain the candidate frames of the medical sign.
Specifically, the feature pyramid (FPN) fusion can be divided into five stages (F1 to F5, each F layer at a different resolution). After the 3D feature layers in each stage undergo time sequence fusion, the five sets of candidate features at different resolutions are fused across scales to obtain the candidate frames of the medical sign, addressing the multi-scale problem in medical sign detection. The general multi-scale idea is to shrink or enlarge an input picture into images of different sizes, process each with the same model, and finally combine the resulting features into a feature set that reflects multi-scale information. In the pyramid progressive fusion here, the higher-level features are upsampled and the channel numbers of the two feature layers are unified through a 1 × 1 convolution kernel before the features are added, so that richer semantic information is obtained.
During detection of multiple signs in the lung, pulmonary nodules of extremely small size may appear alongside emphysema that spreads throughout the chest. With essentially no increase in the computation of the original model, this simple change of network connectivity can greatly improve the detection of medical signs of different sizes. Specifically, in a convolutional neural network, different depths correspond to different levels of semantic features: the shallow layers have high resolution and learn mostly detail features, while the deep layers have low resolution and learn mostly semantic features. As the network propagates forward in the variable convolution modeling stage, each computation downsamples the features, i.e. each feature map is 1/2 the size of the previous one.
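The progressive top-down fusion described above can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the patent's exact network: nearest-neighbour 2x upsampling and per-level 1 × 1 lateral convolutions are assumed, and the common channel count is arbitrary.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: maps (C_in, H, W) to (C_out, H, W) with w of shape (C_out, C_in)."""
    return np.tensordot(w, x, axes=1)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_topdown(features, lateral_w):
    """Top-down pyramid fusion: upsample the higher-level features, unify
    channel numbers with a 1x1 convolution on the lateral path, then add.

    features: list [F1..Fn], F1 the highest resolution; each level halves H and W.
    lateral_w: per-level 1x1 weights mapping that level's channels to a
    common channel count.
    """
    merged = conv1x1(features[-1], lateral_w[-1])       # start at the coarsest level
    outputs = [merged]
    for feat, w in zip(reversed(features[:-1]), reversed(lateral_w[:-1])):
        merged = upsample2x(merged) + conv1x1(feat, w)  # progressive fusion step
        outputs.append(merged)
    return outputs[::-1]                                # finest to coarsest
```

Because each backbone stage halves the resolution, upsampling by exactly 2x aligns each higher-level map with the lateral map below it before the element-wise addition.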
Based on the foregoing embodiment, as an optional embodiment, the method 100 further includes:
setting a threshold corresponding to the medical sign candidate frame according to the category of the candidate frame, and outputting candidate frames whose detection scores exceed the threshold to obtain the medical sign detection result.
Specifically, suspicious-region candidate frames are extracted from features containing both time series information and semantic information, and a group of candidate frames of different sizes and aspect ratios is generated for each point in the feature map. Each candidate frame has a score that measures how likely the region is to contain the object to be detected. When the score of a candidate frame is greater than the given threshold, it is kept as a medical sign detection result.
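A minimal sketch of this per-category thresholding step might look like the following; the category labels, the box layout, and the default threshold of 0.5 are hypothetical illustrations, not values from the patent.

```python
from dataclasses import dataclass

@dataclass
class CandidateBox:
    category: str   # sign category, e.g. "nodule", "emphysema" (hypothetical labels)
    score: float    # detection confidence in [0, 1]
    box: tuple      # (z, y, x, depth, height, width) in voxel coordinates

def filter_by_class_threshold(candidates, thresholds, default=0.5):
    """Keep candidate boxes whose score exceeds the threshold set for
    their category; categories without an entry use the default."""
    return [c for c in candidates
            if c.score > thresholds.get(c.category, default)]
```

This reflects the idea that different sign categories may warrant different operating points, e.g. a stricter threshold for small nodules than for diffuse emphysema.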
Based on the foregoing embodiment, as an optional embodiment, the method 100 further includes:
comparing the size of the medical sign candidate frame and the output prediction data with the corresponding standard medical sign detection data, and calculating the difference between the predicted data and the actual data of the medical sign;
back-propagating the loss of the variable convolutional neural network model through this difference, so as to optimize the variable convolutional neural network model.
Referring to fig. 2, fig. 2 is a flowchart of another medical sign detection method based on 3D variable convolution and time series feature fusion according to an embodiment of the present disclosure. The method mainly includes six stages: a 3D/pseudo-3D data input stage, a variable convolutional neural network modeling stage, a time series feature fusion stage, a pyramid progressive fusion stage, a candidate frame extraction stage, and a model loss calculation stage. The time series feature fusion module can be applied after the variable convolutional neural network modeling stage or after the pyramid progressive fusion stage (only one of these is shown in the figure).
It should also be noted that the present application is applicable to 3D or pseudo-3D related tasks such as CT images and multi-modality MRI images. In addition, the network model framework can be applied to various signs, body parts, data types, and tasks. It can be used to detect chest-based signs and diseases in the lung, such as pulmonary nodules and masses, ground-glass opacities, the air crescent sign, consolidation, streak shadows, emphysema, bullae, pleural effusion, bronchiectasis, and pleural thickening. It can also be used for instance segmentation and semantic segmentation of lesion areas, such as delineation of brain tumor regions and of lung lobes and segments, and for classification of lesions, such as judging the malignancy of pulmonary nodules or grading the degree of infiltration of squamous carcinoma and adenocarcinoma.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a medical image detection system based on 3D variable convolution and temporal feature fusion according to an embodiment of the present application, where the system 300 includes:
a data acquisition unit 301 configured to acquire 3D input data of a medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers;
the model building unit 302 is configured to build a variable convolutional network model, and input standard medical sign detection data into a variable convolutional neural network for training to obtain a trained variable convolutional neural network model;
a data input unit 303 configured to input 3D input data of the medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers to a trained variable convolutional neural network model;
the time series feature fusion unit 304 is configured to model the prediction data output by the variable convolutional neural network model using time series fusion, to form prediction data based on fused time series features of different scales;
the pyramid progressive fusion unit 305 is configured to perform multi-scale pyramid progressive fusion on the prediction data of the fused time series features at different resolutions to obtain the candidate frame of the medical sign.
Based on the foregoing embodiment, as an optional embodiment, the data obtaining unit 301 is specifically configured to:
preprocessing the medical image to obtain 3D input data;
cutting the scanned medical image into 2D slice images, and splicing a plurality of continuous 2D slice images to obtain pseudo-3D input data.
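The splicing of consecutive 2D slices into pseudo-3D input can be sketched as a sliding window over the slice axis, with edge slices padded by repetition. The window size of 3 and the repetition padding are assumed examples, not values specified by the patent.

```python
import numpy as np

def make_pseudo3d(slices, num_context=3):
    """Splice each 2D slice with its neighbours into pseudo-3D input.

    slices: (N, H, W) stack of consecutive 2D slice images.
    Returns (N, num_context, H, W): each sample is a slice together with
    its neighbouring slices, edge slices padded by repetition.
    """
    half = num_context // 2
    padded = np.concatenate([slices[:1].repeat(half, axis=0),   # repeat first slice
                             slices,
                             slices[-1:].repeat(half, axis=0)],  # repeat last slice
                            axis=0)
    return np.stack([padded[i:i + num_context] for i in range(len(slices))])
```

Each (num_context, H, W) sample can then be fed to the model in place of a true 3D volume, which is the pseudo-3D input mode described above.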
Based on the foregoing embodiment, as an optional embodiment, the model building unit 302 is specifically configured to:
constructing a variable convolution network model;
inputting standard medical sign detection data into a variable convolutional neural network for training;
calculating offset parameters through a parallel standard convolutional neural network, and learning the variable convolutional neural network end to end by gradient back-propagation to obtain the trained variable convolutional neural network model.
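The variable (deformable) convolution samples the input at positions shifted by learned offsets, using bilinear interpolation so that gradients can flow back to the parallel offset branch. The single-location 3 × 3 sketch below is illustrative, assuming one input channel and offsets already predicted by the parallel branch; with zero offsets it reduces to an ordinary convolution.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Bilinearly sample img (H, W) at fractional coordinates (y, x),
    clamping to the image border."""
    H, W = img.shape
    y = float(np.clip(y, 0, H - 1)); x = float(np.clip(x, 0, W - 1))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * img[y0, x0] + (1 - wy) * wx * img[y0, x1]
            + wy * (1 - wx) * img[y1, x0] + wy * wx * img[y1, x1])

def deformable_conv_point(img, weights, offsets, cy, cx):
    """Deformable 3x3 convolution at one output location (cy, cx).

    weights: 9 kernel weights in row-major order.
    offsets: (9, 2) fractional (dy, dx) offsets predicted by the parallel
    standard convolution branch; zero offsets give an ordinary 3x3 conv.
    """
    out, k = 0.0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            oy, ox = offsets[k]
            out += weights[k] * bilinear_sample(img, cy + dy + oy, cx + dx + ox)
            k += 1
    return out
```

Because bilinear interpolation is differentiable in (y, x), the offset-predicting branch receives gradients and the whole network can be learned end to end, as stated above.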
Based on the foregoing embodiment, as an optional embodiment, the time sequence feature fusion unit 304 is specifically configured to:
acquiring an image frame sequence with time-series prediction data;
extracting the features of the medical signs from the image frame sequence to obtain a medical sign feature map;
fusing the medical sign feature maps in time order to obtain medical sign prediction data based on fused time series features of different scales.
Based on the foregoing embodiment, as an optional embodiment, the pyramid progressive fusion unit 305 is specifically configured to:
upsampling the higher-level features and performing feature addition through 1 × 1 convolution kernels to obtain the candidate frame of the medical sign.
Based on the foregoing embodiment, as an optional embodiment, the system 300 further includes:
the candidate frame extraction unit is configured to set a threshold corresponding to the medical sign candidate frame according to the category of the candidate frame, and to output candidate frames whose detection scores exceed the threshold to obtain the medical sign detection result.
Based on the foregoing embodiment, as an optional embodiment, the system 300 further includes:
the model loss calculation unit is configured to compare the size of the medical sign candidate frame and the output prediction data with the corresponding standard medical sign detection data, to calculate the difference between the predicted data and the actual data of the medical sign, and to back-propagate the loss of the variable convolutional neural network model through this difference so as to optimize the model.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a terminal system 400 according to an embodiment of the present disclosure, where the terminal system 400 can be used to execute the medical sign detection method based on 3D variable convolution and time series feature fusion according to the embodiment of the present disclosure.
The terminal system 400 may include: a processor 401, a memory 402, and a communication unit 403. These components communicate via one or more buses. Those skilled in the art will appreciate that the architecture of the server shown in the figure is not limiting: it may be a bus architecture or a star architecture, and may include more or fewer components than shown, or a different arrangement of components.
The memory 402 may be used to store instructions executed by the processor 401, and may be implemented by any type of volatile or non-volatile storage terminal or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. When executed by the processor 401, the execution instructions in the memory 402 enable the terminal system 400 to perform some or all of the steps of the method embodiments described above.
The processor 401 is a control center of the storage terminal, connects various parts of the entire electronic terminal using various interfaces and lines, and performs various functions of the electronic terminal and/or processes data by operating or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory. The processor may be composed of an Integrated Circuit (IC), for example, a single packaged IC, or a plurality of packaged ICs connected with the same or different functions. For example, the processor 401 may only include a Central Processing Unit (CPU). In the embodiment of the present invention, the CPU may be a single operation core, or may include multiple operation cores.
A communication unit 403, configured to establish a communication channel so that the storage terminal can communicate with other terminals. And receiving user data sent by other terminals or sending the user data to other terminals.
The present application also provides a computer storage medium, wherein the computer storage medium may store a program, and the program may include some or all of the steps in the embodiments provided by the present invention when executed. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a Random Access Memory (RAM).
According to the present application, 3D variable convolution and time series feature fusion are combined to construct a model framework that can handle both 3D input and pseudo-3D input and can be applied to different body parts, different data types, and different task scenarios. The 3D-based variable convolution can adaptively adjust to the different resolutions of the x, y, and z axes in the spatial dimensions, and can also adaptively adjust the parameters learned by the network according to the geometric variation of medical signs, thereby learning irregular shape information in the spatial dimensions, learning the correlation between different slices in the time series dimension, and adaptively learning the weight relationship between slices. By introducing time series feature fusion and performing feature fusion in the later stages of the network, the model can learn high-dimensional feature representations of the image information, model the information of different slices in time order, and effectively exploit the time series characteristics of the data, further improving detection performance. This also matches the habit of radiologists, who read the slices in time order, making the constructed model more reasonable.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system provided by the embodiment, the description is relatively simple because the system corresponds to the method provided by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. The terms "comprises", "comprising", and any variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises it.

Claims (10)

1. A medical sign detection method based on 3D variable convolution and time sequence feature fusion is characterized by comprising the following steps:
acquiring 3D input data of a medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers;
constructing a variable convolution network model, inputting standard medical sign detection data into a variable convolution neural network for training to obtain a trained variable convolution neural network model;
inputting the 3D input data of the medical image or the pseudo 3D input data formed by splicing a plurality of continuous 2D layers into a trained variable convolution neural network model;
modeling prediction data output by the variable convolutional neural network model by using time sequence fusion to form prediction data based on fusion time sequence characteristics of different scales;
and carrying out multi-scale pyramid progressive fusion on the prediction data of the fusion time sequence characteristics with different resolutions to obtain a candidate frame of the medical symptom.
2. The medical sign detection method based on 3D variable convolution and time series feature fusion according to claim 1, wherein the acquiring of 3D input data of a medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers comprises:
preprocessing the medical image to obtain 3D input data;
cutting the scanned medical image into 2D slice images, and splicing a plurality of continuous 2D slice images to obtain pseudo-3D input data.
3. The medical sign detection method based on 3D variable convolution and time series feature fusion according to claim 1, wherein the inputting of the 3D input data of the medical image or the pseudo 3D input data formed by splicing a plurality of continuous 2D slices into the trained variable convolutional neural network model comprises:
constructing a variable convolution network model;
inputting standard medical sign detection data into a variable convolutional neural network for training;
calculating offset parameters through a parallel standard convolutional neural network, and learning the variable convolutional neural network end to end by gradient back-propagation to obtain the trained variable convolutional neural network model.
4. The medical sign detection method based on 3D variable convolution and time-series feature fusion according to claim 1, wherein the modeling of the prediction data output by the variable convolution neural network model by using time-series fusion to form prediction data based on fused time-series features of different scales comprises:
acquiring an image frame sequence with time-series prediction data;
extracting the features of the medical signs from the image frame sequence to obtain a medical sign feature map;
fusing the medical sign feature maps in time order to obtain medical sign prediction data based on fused time series features of different scales.
5. The method for detecting medical signs based on 3D variable convolution and time series feature fusion according to claim 1, wherein the multi-scale pyramid progressive fusion of the prediction data of the fused time series features with different resolutions to obtain the candidate frame of the medical signs comprises:
upsampling the higher-level features and performing feature addition through 1 × 1 convolution kernels to obtain the candidate frame of the medical sign.
6. The medical sign detection method based on 3D variable convolution and time series feature fusion according to claim 1, further comprising:
setting a threshold corresponding to the medical sign candidate frame according to the category of the candidate frame, and outputting candidate frames whose detection scores exceed the threshold to obtain the medical sign detection result.
7. The medical sign detection method based on 3D variable convolution and time series feature fusion according to claim 1, further comprising:
comparing the size of the medical sign candidate frame and the output prediction data with the corresponding standard medical sign detection data, and calculating the difference between the predicted data and the actual data of the medical sign;
back-propagating the loss of the variable convolutional neural network model through this difference, and optimizing the variable convolutional neural network model.
8. A medical sign detection system based on 3D variable convolution and time series feature fusion, comprising:
the data acquisition unit is configured for acquiring 3D input data of the medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers;
the model construction unit is configured for constructing a variable convolution network model, and inputting standard medical sign detection data into a variable convolution neural network for training to obtain a trained variable convolution neural network model;
the data input unit is configured to input 3D input data of the medical image or pseudo 3D input data formed by splicing a plurality of continuous 2D layers into a trained variable convolution neural network model;
the time sequence feature fusion unit is configured for modeling the prediction data output by the variable convolutional neural network model by using time sequence fusion to form prediction data based on fusion time sequence features of different scales;
and the pyramid progressive fusion unit is configured to perform multi-scale pyramid progressive fusion on the prediction data of the fused time series features at different resolutions to obtain the candidate frame of the medical sign.
9. A terminal, comprising:
a processor;
a memory for storing instructions for execution by the processor;
wherein the processor is configured to perform the method of any one of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202010360486.5A 2020-04-30 2020-04-30 Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion Active CN111667459B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010360486.5A CN111667459B (en) 2020-04-30 2020-04-30 Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion

Publications (2)

Publication Number Publication Date
CN111667459A true CN111667459A (en) 2020-09-15
CN111667459B CN111667459B (en) 2023-08-29

Family

ID=72383052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010360486.5A Active CN111667459B (en) 2020-04-30 2020-04-30 Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion

Country Status (1)

Country Link
CN (1) CN111667459B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08265562A (en) * 1995-03-23 1996-10-11 Ricoh Co Ltd Picture magnification device
WO2017210690A1 (en) * 2016-06-03 2017-12-07 Lu Le Spatial aggregation of holistically-nested convolutional neural networks for automated organ localization and segmentation in 3d medical scans
CN110427807A (en) * 2019-06-21 2019-11-08 诸暨思阔信息科技有限公司 A kind of temporal events motion detection method
US20190392267A1 (en) * 2018-06-20 2019-12-26 International Business Machines Corporation Framework for integrating deformable modeling with 3d deep neural network segmentation
CN110879874A (en) * 2019-11-15 2020-03-13 北京工业大学 Astronomical big data optical variation curve abnormity detection method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DEVINEAU G, ET AL: "Convolutional neural networks for multivariate time series classification using both inter-and intra-channel parallel convolutions" *
WANG K, ET AL: "Multiple convolutional neural networks for multivariate time series prediction" *
YU, Yizhou et al.: "Progress in the Application of Artificial Intelligence in Medical Image Analysis" *
YU, Yizhou et al.: "A Review of Applications of Deep Learning in Medical Image Analysis" *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446862A (en) * 2020-11-25 2021-03-05 北京医准智能科技有限公司 Dynamic breast ultrasound video full-focus real-time detection and segmentation device and system based on artificial intelligence and image processing method
CN113517046A (en) * 2021-04-15 2021-10-19 中南大学 Heterogeneous data feature fusion method in electronic medical record, prediction method and system based on fusion features and readable storage medium
CN113517046B (en) * 2021-04-15 2023-11-07 中南大学 Heterogeneous data feature fusion method in electronic medical record, fusion feature-based prediction method, fusion feature-based prediction system and readable storage medium
CN115187805A (en) * 2022-02-22 2022-10-14 数坤(北京)网络科技股份有限公司 Symptom identification method and device, electronic equipment and storage medium
CN115187805B (en) * 2022-02-22 2023-05-05 数坤(北京)网络科技股份有限公司 Sign recognition method and device, electronic equipment and storage medium
CN115222688A (en) * 2022-07-12 2022-10-21 广东技术师范大学 Medical image classification method based on graph network time sequence
CN115439423A (en) * 2022-08-22 2022-12-06 北京医准智能科技有限公司 CT image-based identification method, device, equipment and storage medium
CN115439423B (en) * 2022-08-22 2023-09-12 北京医准智能科技有限公司 CT image-based identification method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111667459B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
US11887311B2 (en) Method and apparatus for segmenting a medical image, and storage medium
CN110599448B (en) Migratory learning lung lesion tissue detection system based on MaskScoring R-CNN network
CN112017189B (en) Image segmentation method and device, computer equipment and storage medium
CN111667459B (en) Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion
TWI715117B (en) Method, device and electronic apparatus for medical image processing and storage mdeium thereof
US20220198230A1 (en) Auxiliary detection method and image recognition method for rib fractures based on deep learning
CN112446302B (en) Human body posture detection method, system, electronic equipment and storage medium
CN112767468A (en) Self-supervision three-dimensional reconstruction method and system based on collaborative segmentation and data enhancement
CN110276408B (en) 3D image classification method, device, equipment and storage medium
CN109035261A (en) Medical imaging processing method and processing device, electronic equipment and storage medium
CN111814768B (en) Image recognition method, device, medium and equipment based on AI composite model
KR20230113386A (en) Deep learning-based capsule endoscopic image identification method, device and media
CN109671055B (en) Pulmonary nodule detection method and device
CN111402217A (en) Image grading method, device, equipment and storage medium
CN114612832A (en) Real-time gesture detection method and device
CN117710760B (en) Method for detecting chest X-ray focus by using residual noted neural network
CN113158970B (en) Action identification method and system based on fast and slow dual-flow graph convolutional neural network
CN112884702A (en) Polyp identification system and method based on endoscope image
CN116434303A (en) Facial expression capturing method, device and medium based on multi-scale feature fusion
CN116258756A (en) Self-supervision monocular depth estimation method and system
CN111626972B (en) CT image reconstruction method, model training method and equipment
CN110570417B (en) Pulmonary nodule classification device and image processing equipment
CN117392137B (en) Intracranial aneurysm image detection method, system, equipment and medium
CN117690128B (en) Embryo cell multi-core target detection system, method and computer readable storage medium
CN115359046B (en) Organ blood vessel segmentation method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant