CN116392260A - Control device and method for vascular intervention operation - Google Patents

Control device and method for vascular intervention operation

Info

Publication number
CN116392260A
Authority
CN
China
Prior art keywords
target
model
sample data
image data
function
Prior art date
Legal status
Pending
Application number
CN202310233812.XA
Other languages
Chinese (zh)
Inventor
周小虎
李�浩
谢晓亮
刘市祺
奉振球
侯增广
姚泊先
黄德兴
于喆
项天宇
桂美将
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202310233812.XA
Publication of CN116392260A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B - DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B34/00 - Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B34/30 - Surgical robots
    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B - DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B34/00 - Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B34/30 - Surgical robots
    • A61B34/37 - Master-slave robots
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/0002 - Inspection of images, e.g. flaw detection
    • G06T7/0012 - Biomedical image inspection
    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B - DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B34/00 - Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B34/30 - Surgical robots
    • A61B2034/303 - Surgical robots specifically adapted for manipulations within body lumens, e.g. within lumen of gut, spine, or blood vessels
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30004 - Biomedical image processing
    • G06T2207/30101 - Blood vessel; Artery; Vein; Vascular
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Surgery (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Robotics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Veterinary Medicine (AREA)
  • Public Health (AREA)
  • Animal Behavior & Ethology (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Apparatus For Radiation Diagnosis (AREA)

Abstract

The invention provides a control device and method for vascular interventional surgery, and relates to the technical field of control. The device comprises: an acquisition module for acquiring first image data at the previous moment, second image data at the current moment, and first action information at the previous moment, where the first image data and the second image data comprise blood vessel image data and instrument image data corresponding to the vascular interventional surgery; a prediction module for obtaining the probability of selecting each action at the current moment output by a target surgical operation model, where the target surgical operation model is trained based on an offline reinforcement learning method; a determining module for determining a target control instruction based on the probability of selecting each action; and a control module for controlling the instrument to move in the blood vessel based on the target control instruction. The invention improves the accuracy of the trained target surgical operation model, and thereby improves the accuracy of the instrument's movement in the blood vessel.

Description

Control device and method for vascular intervention operation
Technical Field
The invention relates to the technical field of control, in particular to a control device and method for vascular intervention operation.
Background
Vascular intervention is a minimally invasive treatment modality that can be performed using robotic systems. Under the guidance of an imaging system, a doctor operates the robotic system to steer interventional instruments such as a guide wire through the vascular lumen to the lesion site, where treatments such as thrombolysis or dilation of stenotic blood vessels are performed.
In the related art, a vascular interventional surgery model is generally trained by an imitation learning method or a statistical learning method on examples of physician operations, so that the vascular interventional surgical robot can deliver instruments autonomously.
However, in the above related art, imitation learning and statistical learning optimize the model parameters to reproduce the physician operation examples, so the accuracy of the model depends on the quality of those examples; if the quality of the examples is poor, the accuracy of the trained model decreases, and the accuracy of the instrument's movement in the blood vessel decreases accordingly.
Disclosure of Invention
Aiming at the problems existing in the prior art, the embodiment of the invention provides a control device and a control method for vascular intervention operation.
The invention provides a control device for vascular intervention operation, comprising:
the acquisition module is used for acquiring the first image data at the previous moment, the second image data at the current moment and the first action information at the previous moment; the first image data and the second image data comprise blood vessel image data and instrument image data corresponding to blood vessel interventional operation;
The prediction module is used for inputting the first image data, the second image data and the first action information into a target operation model to obtain the probability of selecting each action at the current moment output by the target operation model; the target surgical operation model is obtained by training based on an offline reinforcement learning method by using first image sample data, second image sample data and action sample information corresponding to the first image sample data; the first image sample data is data acquired at a previous time of the second image sample data;
a determining module for determining a target control instruction based on a probability of selecting each of the actions;
a control module for controlling movement of the instrument in the blood vessel based on the target control instructions.
According to the control device for vascular interventional surgery provided by the invention, the target surgical operation model is trained in the following manner:
inputting the first image sample data and the second image sample data in the sample data into an encoder of an initial operation model to obtain a coded information sample output by the encoder;
Inputting the coded information samples and the action sample information in the sample data into a strategy estimation sub-model and a function estimation sub-model of the initial operation model to obtain a prediction probability of each action selected at a first moment output by the strategy estimation sub-model and a function estimation value output by the function estimation sub-model; the first moment is the moment corresponding to the second image sample data;
determining a target action based on the predicted probability of selecting each action at the first moment, and determining a return value of the first moment based on the target action;
updating model parameters of the initial operation model based on the return value and the function estimated value to obtain the target operation model; and the accumulated return value corresponding to the target operation model is the largest.
According to the control device for vascular intervention operation provided by the invention, the updating of the model parameters of the initial operation model based on the return value and the function estimation value to obtain the target operation model comprises the following steps:
determining a function estimation loss function based on the return value, the function estimation value, an estimation value of an objective function, and the advantage of the sample data relative to an agent policy;
Determining a policy mimicking loss function based on the agent policy and the sample data;
and optimizing model parameters of the initial operation model based on the function estimation loss function and the policy mimicking loss function to obtain the target operation model.
According to the control device for vascular intervention operation provided by the invention, the model parameters of the initial operation model are optimized based on the function estimated loss function and the strategy simulated loss function to obtain the target operation model, and the control device comprises the following steps:
optimizing model parameters of the initial surgical operation model based on the function estimated loss function and the strategy simulated loss function to obtain a simulated surgical operation model;
determining a policy optimization loss function based on the agent policy and an estimated value of the objective function;
and optimizing model parameters of the simulation-based operation model based on the policy optimization loss function and the function estimated loss function to obtain the target operation model.
According to the control device for vascular intervention operation provided by the invention, the determining the return value at the first moment based on the target action comprises the following steps:
Executing the target action to obtain the position of the instrument in the blood vessel at a first moment;
and determining a return value of the first moment based on whether the position deviates from a target path, whether the contact force of the instrument is greater than or equal to a preset threshold, and a difference value between the position of the first moment and the position of the second moment when the instrument moves from the position of the first moment to the position of the second moment.
According to the control device for vascular intervention operation provided by the invention, optimizing the model parameters of the simulation-based operation model based on the policy optimization loss function and the function estimation loss function to obtain the target operation model comprises the following steps:
determining a sampling probability corresponding to each sample data based on the weight of the sample data;
determining target sample data based on the sampling probability of each sample data;
inputting the target sample data into the simulation-based operation model, optimizing model parameters of the simulation-based operation model based on the strategy optimization loss function and the function estimation loss function, and obtaining the target operation model.
According to the control device for vascular intervention operation provided by the invention, the device further comprises:
and the updating module is used for updating the weight of the target sample data based on the return value, the estimated value of the target function and the estimated value of the function after optimizing the model parameters of the simulated operation model in each round.
The invention also provides a control method of the vascular intervention operation, which comprises the following steps: acquiring first image data of a previous moment, second image data of a current moment and first action information of the previous moment; the first image data and the second image data comprise blood vessel image data and instrument image data corresponding to blood vessel interventional operation;
inputting the first image data, the second image data and the first action information into a target operation model to obtain the probability of selecting each action at the current moment output by the target operation model; the target surgical operation model is obtained by training based on an offline reinforcement learning method by using first image sample data, second image sample data and action sample information corresponding to the first image sample data; the first image sample data is data acquired at a previous time of the second image sample data;
Determining a target control instruction based on the probability of selecting each of the actions;
the instrument is controlled to move in the blood vessel based on the target control instruction.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the control method of the vascular intervention operation according to any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of controlling vascular interventional procedures as described in any of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a method of controlling a vascular interventional procedure as described in any one of the above.
According to the control device and method for vascular intervention operation, the acquired first image data at the previous moment, the acquired second image data at the current moment and the acquired first action information at the previous moment are input into the target operation model, and the target operation model can output the probability of selecting the action of the instrument, so that the target control instruction corresponding to the action of the instrument can be determined, and the instrument is controlled to move in the blood vessel according to the target control instruction. The target operation model is obtained by training based on an offline reinforcement learning method, and the offline reinforcement learning method optimizes model parameters based on a reward mechanism and does not depend on the quality of a doctor operation example, so that the accuracy of the target operation model obtained by training is improved, and the accuracy of movement of an instrument in a blood vessel is further improved.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural view of a control device for vascular interventional operation provided by the invention;
FIG. 2 is a schematic diagram of an exemplary acquisition device according to the present invention;
FIG. 3 is a schematic view of the structure of a surgical operation model provided by the present invention;
FIG. 4 is a schematic diagram of the structure of an encoder in a surgical operation model provided by the present invention;
FIG. 5 is a schematic illustration of the interaction of a surgical model with a surgical environment provided by the present invention;
FIG. 6 is a schematic diagram of a process for optimizing model parameters of a model of a simulation-based surgical procedure provided by the present invention;
FIG. 7 is a flow chart of a method for controlling vascular intervention provided by the present invention;
fig. 8 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The control device for vascular intervention of the present invention is described below with reference to fig. 1-6.
Fig. 1 is a schematic structural diagram of a control device for vascular intervention provided by an embodiment of the present invention, and as shown in fig. 1, a control device 100 for vascular intervention includes an acquisition module 101, a prediction module 102, a determination module 103, and a control module 104; wherein:
an acquiring module 101, configured to acquire first image data at a previous time, second image data at a current time, and first action information at the previous time; the first image data and the second image data comprise blood vessel image data and instrument image data corresponding to blood vessel interventional operation.
By way of example, the first image data and the second image data are described below in connection with a schematic structural diagram of the operation example acquisition device of fig. 2.
The operational example acquisition device in fig. 2 includes a vascular interventional procedure robot master 21, a vascular interventional procedure robot slave 22, a guidewire 23, a catheter 24, a vascular model 25, a camera 26, a control handle 27, and a display device 28. The blood vessel model 25 may be made by three-dimensional printing, and the camera 26 is located above the blood vessel model 25 and fixed in position. The vascular interventional procedure robot slave 22 may receive control instructions from the vascular interventional procedure robot master 21 and manipulate the guide wire 23 to move in the vascular model 25 according to the control instructions. In the following embodiment, the vessel intervention robot master end 21 is simply referred to as a master end, and the vessel intervention robot slave end 22 is simply referred to as a slave end.
The camera 26 is used for photographing the blood vessel and the instrument. The master end can obtain the photographed pictures from the camera 26, perform binarization processing on them, and send the binarized images to the display device 28. The first image data and the second image data in the invention comprise the blood vessel image data and the instrument image data after binarization processing, namely the binarized blood vessel image data and the binarized instrument image data. The instrument may comprise a guide wire; the invention does not particularly limit the type of instrument.
The binarized image data of the blood vessel can be obtained as follows: the color of the blood vessel portion and the color of the background portion in the three-dimensionally printed blood vessel model 25 are set to be different, so that their pixel values differ; the master end can set a pixel threshold range for the blood vessel portion, judge whether pixels of the photographed picture fall within this range, and extract the pixels within the range as the binarized image data of the blood vessel. The binarized image data of the instrument is derived from the difference between the image containing both the instrument and the blood vessel and the image containing only the blood vessel. After the binarized image data of the instrument is obtained, the master end can also apply a morphological closing operation to it, which eliminates possible breaks of the instrument in the binarized image. The master end can further center the binarized image data of the instrument, so that the obtained model is applicable to instruments of different thicknesses.
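A minimal sketch of the preprocessing described above is given below; it is an illustration under assumed thresholds and function names, not code from the patent: a color-range threshold for the vessel, an image difference for the instrument, and a morphological closing to remove breaks in the instrument mask.

```python
# Illustrative sketch (an assumption, not code from the patent) of the master-end
# preprocessing: threshold-based vessel binarization, instrument extraction from the
# difference of two binarized images, and a closing operation to bridge small breaks.
import cv2
import numpy as np

def binarize_vessel(frame_bgr, lower=(0, 0, 100), upper=(80, 80, 255)):
    """Keep pixels whose values fall inside the preset range for the vessel color."""
    mask = cv2.inRange(frame_bgr, np.array(lower, np.uint8), np.array(upper, np.uint8))
    return (mask > 0).astype(np.uint8)

def binarize_instrument(vessel_only_mask, vessel_with_instrument_mask, kernel_size=5):
    """Instrument mask = difference of the two binarized images, then a closing."""
    diff = cv2.absdiff(vessel_with_instrument_mask, vessel_only_mask)
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    return cv2.morphologyEx(diff, cv2.MORPH_CLOSE, kernel)
```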
Further, the first action information may be the action that the vascular interventional surgical robot controls the instrument to perform in the blood vessel based on a control instruction. The control instructions cover two degrees of freedom, axial motion and rotation. The axial control instructions may include a constant-speed forward instruction and a constant-speed backward instruction; the rotation control instructions may include a no-rotation instruction, a clockwise rotation instruction, and a counterclockwise rotation instruction, whose corresponding speeds can be set as required. The constant-speed forward instruction corresponds to the instrument advancing in the blood vessel at a constant speed, the constant-speed backward instruction corresponds to the instrument retreating at a constant speed, the no-rotation instruction corresponds to the instrument not rotating, and the clockwise and counterclockwise rotation instructions correspond to the instrument rotating clockwise or counterclockwise in the blood vessel at the set speed. The duration of a control instruction may be set according to actual requirements; in the present invention, the duration of a control instruction may be 0.5 seconds.
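The discrete instruction set described above can be enumerated as a small action space; the sketch below is only an illustration, and the action names, ordering, and rotation speed are assumptions.

```python
# Minimal sketch of the discrete control instructions: two axial commands and three
# rotation commands, each lasting 0.5 s. Names and ordering are assumptions.
ACTIONS = ("forward", "backward", "no_rotation", "rotate_cw", "rotate_ccw")
COMMAND_DURATION_S = 0.5

def action_to_command(action_name):
    axial = action_name if action_name in ("forward", "backward") else None
    rotation = action_name if action_name in ("no_rotation", "rotate_cw", "rotate_ccw") else None
    return {"axial": axial, "rotation": rotation, "duration_s": COMMAND_DURATION_S}
```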
The prediction module 102 is configured to input the first image data, the second image data, and the first motion information into a target surgical operation model, so as to obtain a probability of selecting each motion at the current time output by the target surgical operation model; the target surgical operation model is obtained by training based on an offline reinforcement learning method by using first image sample data, second image sample data and action sample information corresponding to the first image sample data; the first image sample data is data acquired at a previous time of the second image sample data.
For example, the first image data, the second image data, and the first action information are input into the target surgical operation model, and the target surgical operation model outputs the probability that each action is selected at the current moment, for instance a probability of 97% for the forward action, 10% for the backward action, 70% for the clockwise rotation action, and 5% for the counterclockwise rotation action.
In the process of training the target surgical operation model based on the offline reinforcement learning method by using the first image sample data, the second image sample data and the motion sample information corresponding to the first image sample data, the first image sample data may refer to the description of the first image data in the above embodiment, the second image sample data may refer to the description of the second image data in the above embodiment, and the motion sample information corresponding to the first image sample data may refer to the first motion information in the above embodiment acquisition module. The first image sample data is distinguished from the first image data in that the first image sample data is model input data when training a surgical operation model, and the first image data is model input data when using the surgical operation model. The second image sample data is distinguished from the second image data in that the second image sample data is model input data when training a surgical operation model, and the second image data is model input data when using the surgical operation model.
It should be noted that the target surgical operation model is trained based on an offline reinforcement learning method, and the offline reinforcement learning method (offline reinforcement learning) is a machine learning method that improves the agent policy according to given data to maximize the cumulative return obtained by the agent.
A determining module 103, configured to determine a target control instruction based on a probability of selecting each of the actions.
For example, after the target surgical operation model outputs the probability of each action being selected, the control instruction corresponding to the action with the highest probability may be determined as the target control instruction. For example, if the action with the highest probability is advancing the instrument, the target control instruction corresponding to the forward action is the forward instruction.
A control module 104 for controlling movement of the instrument in the blood vessel based on the target control instructions.
For example, after determining the target control command corresponding to the selection action in the determining module, the vascular interventional surgical robot may control the instrument to move in the blood vessel based on the target control command.
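The acquisition, prediction, decision, and control modules together form one control step; a hedged end-to-end sketch is given below, where the camera, policy model, and robot slave objects and their methods are placeholders assumed for illustration only.

```python
# Hedged sketch of one acquisition -> prediction -> decision -> control step.
# `camera`, `policy_model` and `robot_slave` are placeholder objects; their
# interfaces are assumptions of this illustration.
import torch

ACTION_NAMES = ("forward", "backward", "no_rotation", "rotate_cw", "rotate_ccw")

def control_step(camera, policy_model, robot_slave, prev_obs, prev_action):
    curr_obs = camera.capture_binarized()            # second image data (current moment)
    with torch.no_grad():
        action_probs = policy_model(prev_obs, curr_obs, prev_action)
    action_index = int(torch.argmax(action_probs))   # action with the highest probability
    robot_slave.execute(ACTION_NAMES[action_index], duration_s=0.5)  # target control instruction
    return curr_obs, action_index
```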
The control device for vascular intervention operation provided by the invention inputs the acquired first image data at the previous moment, the acquired second image data at the current moment and the acquired first action information at the previous moment into the target operation model, and the target operation model can output the probability of selecting the action of the instrument, so that the target control instruction corresponding to the action of the instrument can be determined, and the instrument is controlled to move in the blood vessel according to the target control instruction. The target operation model is obtained by training based on an offline reinforcement learning method, and the offline reinforcement learning method optimizes model parameters based on a reward mechanism and does not depend on the quality of a doctor operation example, so that the accuracy of the target operation model obtained by training is improved, and the accuracy of movement of an instrument in a blood vessel is further improved.
In one embodiment, the target surgical operation model in the prediction module 102 is trained based on the following manner:
inputting the first image sample data and the second image sample data in the sample data into an encoder of an initial operation model to obtain a coded information sample output by the encoder;
inputting the coded information samples and the action sample information in the sample data into a strategy estimation sub-model and a function estimation sub-model of the initial operation model to obtain a prediction probability of each action selected at a first moment output by the strategy estimation sub-model and a function estimation value output by the function estimation sub-model; the first moment is the moment corresponding to the second image sample data;
determining a target action based on the predicted probability of selecting each action at the first moment, and determining a return value of the first moment based on the target action;
updating model parameters of the initial operation model based on the return value and the function estimated value to obtain the target operation model; and the accumulated return value corresponding to the target operation model is the largest.
By way of example, the surgical procedure model training process of the present invention is described below in connection with the schematic structural diagram of the surgical procedure model of fig. 3.
The surgical operation model in fig. 3 includes an encoder 301, a strategy estimation sub-model 302, and a function estimation sub-model 303.
The encoder 301 is configured to receive the first image sample data and the second image sample data in the sample data, and compress the first image sample data and the second image sample data to obtain encoded sample information.
The policy estimation sub-model 302 is configured to receive the encoded sample information and the action sample information in the sample data, and to predict the probability of selecting each action at the first moment based on the encoded sample information, the action sample information, and the set agent policy. A target action can then be determined based on the predicted probability of selecting each action at the first moment, and a return value of the first moment is determined based on the target action; the determination of the action performed by the instrument from the predicted probabilities can refer to the corresponding description in the above embodiment and is not repeated here.
The function estimation sub-model 303 is used to receive the encoded sample information and the action sample information in the sample data, and to calculate a function estimate based on them. Optionally, the function estimation sub-model may include two local Q-function estimators with the same model structure but different parameters, so that the two Q-function estimates they output also differ; the two estimates can be denoted $Q_1(o_t \mid o_{t-1}, a_{t-1})$ and $Q_2(o_t \mid o_{t-1}, a_{t-1})$, respectively. The function estimate is taken as the minimum of $Q_1$ and $Q_2$ and satisfies the following formula (1):

$$Q(o_t, a_t \mid o_{t-1}, a_{t-1}) = \min_{i=1,2} Q_i(o_t, a_t \mid o_{t-1}, a_{t-1}) \quad (1)$$
and updating the model parameters of the initial operation model based on the return value and the function estimated value, thereby obtaining the target operation model.
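A minimal sketch of the clipped double-Q estimate of formula (1) is given below; the feature dimension, number of actions, and use of linear heads are illustrative assumptions.

```python
# Sketch of formula (1): the function estimate is the element-wise minimum of two
# local Q-function estimators with identical structure but separate parameters.
import torch
import torch.nn as nn

class TwinQHead(nn.Module):
    def __init__(self, feature_dim=256, num_actions=5):
        super().__init__()
        self.q1 = nn.Linear(feature_dim, num_actions)
        self.q2 = nn.Linear(feature_dim, num_actions)

    def forward(self, encoded_obs_and_action):
        # Q(o_t, a_t | o_{t-1}, a_{t-1}) = min_{i=1,2} Q_i(...)
        return torch.minimum(self.q1(encoded_obs_and_action),
                             self.q2(encoded_obs_and_action))
```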
The structure of the encoder in the surgical operation model of the present invention is shown in fig. 4. The encoder includes convolution layers, nonlinear layers, and a regularization layer (Adaptive Local Signal Mixing, A-LIX). A convolution layer and the following nonlinear layer may be regarded as one layer, and the A-LIX layer performs a random translation-like operation on the feature map obtained from the convolution layer in order to smooth the gradient of the convolution layer.

When training the surgical operation model, the A-LIX layer takes the feature map $z$ output by the convolution layer as its input, which can be written as $z \in \mathbb{R}^{C \times H \times W}$, where $C$ denotes the number of channels of the convolution layer, $H$ the height of the feature map, and $W$ the width of the feature map. The A-LIX layer samples offsets $\delta_h$ and $\delta_w$ from the uniform distribution $U[-S, S]$, where $S$ denotes the translation range, $\delta_h$ the offset in the height direction, and $\delta_w$ the offset in the width direction.

The output of the A-LIX layer, $\tilde{z} \in \mathbb{R}^{C \times H \times W}$, is then obtained by bilinear interpolation according to formulas (2) to (4), whose combined form is:

$$\tilde{z}(c,h,w)=\sum_{h'\in\{\lfloor h+\delta_h\rfloor,\,\lceil h+\delta_h\rceil\}}\;\sum_{w'\in\{\lfloor w+\delta_w\rfloor,\,\lceil w+\delta_w\rceil\}} z(c,h',w')\,\big(1-|h+\delta_h-h'|\big)\big(1-|w+\delta_w-w'|\big) \quad (2)-(4)$$

where $z(c,h,w)$ denotes an element of the input feature map $z$, $c$ denotes the coordinate in the channel direction, $h$ the coordinate in the height direction, $w$ the coordinate in the width direction, $\lfloor\cdot\rfloor$ denotes rounding down, and $\lceil\cdot\rceil$ denotes rounding up. During training, the A-LIX layer smooths the gradient of the convolution layer, so that the trained surgical operation model is better. This gradient smoothing is only needed during training, not during testing, so at test time the output of the A-LIX layer is equal to its input.
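The sketch below illustrates the random translation-like operation described above; it assumes a single shared sub-pixel shift per feature map and uses grid-sample bilinear interpolation, and at test time the layer acts as the identity. Function and parameter names are assumptions of this illustration.

```python
# Hedged sketch of an A-LIX-style shift: one (delta_h, delta_w) ~ U[-S, S] applied
# to the whole feature map via bilinear interpolation; identity at test time.
import torch
import torch.nn.functional as F

def alix_shift(z, S, training=True):
    """z: feature map of shape (B, C, H, W); S: translation range in pixels."""
    if not training or S <= 0:
        return z
    delta_h = (torch.rand((), device=z.device) * 2 - 1) * S
    delta_w = (torch.rand((), device=z.device) * 2 - 1) * S
    B, C, H, W = z.shape
    # Sampling grid shifted by (delta_h, delta_w), in grid_sample's [-1, 1] coordinates.
    ys = torch.linspace(-1, 1, H, device=z.device) + 2 * delta_h / max(H - 1, 1)
    xs = torch.linspace(-1, 1, W, device=z.device) + 2 * delta_w / max(W - 1, 1)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack((grid_x, grid_y), dim=-1).expand(B, H, W, 2)
    return F.grid_sample(z, grid, mode="bilinear", padding_mode="border",
                         align_corners=True)
```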
The process of determining the target action and determining the return value will be described below in connection with the schematic interaction diagram of the surgical operation model and the surgical environment of fig. 5.
The process of determining the target action may be modeled as a Markov decision process, which can be described by the six-tuple $\langle S, O, A, P, R, \gamma \rangle$, where $S$ denotes the state space, $O$ the observation space, $A$ the action space, $P: S \times A \times S \rightarrow [0,1]$ the state-transition function, $R: S \times A \rightarrow \mathbb{R}$ the return function, $\gamma$ the decay factor, and $\mathbb{R}$ the real number domain. At the first moment $t$, the encoded sample information obtained by encoding the second image sample data $o_t$ and the first image sample data $o_{t-1}$, together with the action sample information $a_{t-1}$ corresponding to the first image sample data, is input into the policy estimation sub-model, which outputs the predicted probability of selecting each action at the first moment; the action with the highest probability is selected as the target action $a_t$, where $o_{t-1} \in O$, $o_t \in O$, $a_{t-1} \in A$, $a_t \in A$.
After the target action $a_t$ is executed, the state of the surgical environment transitions from $s_t$ to $s_{t+1}$ with state-transition probability $P(s_t, a_t, s_{t+1})$, where $s_t \in S$ and $s_{t+1} \in S$.
After the target action $a_t$ is executed, the return value $r_t$ of the current target action can also be determined, $r_t = R(s_t, a_t)$. The specific process of determining the return value $r_t$ can be as follows:
executing the target action to obtain the position of the instrument in the blood vessel at a first moment;
and determining a return value of the first moment based on whether the position deviates from a target path, whether the contact force of the instrument is greater than or equal to a preset threshold, and a difference value between the position of the first moment and the position of the second moment when the instrument moves from the position of the first moment to the position of the second moment.
For example, the return value of the target action may be determined based on whether the position of the instrument in the blood vessel deviates from a correct path after the target action is performed at the first moment, whether the instrument is closer to the end point on the correct path, and whether the contact force of the instrument is within a preset range. The method specifically comprises the following steps:
After performing the target action, the position of the instrument in the vessel at the first time may be determined,
and whether the instrument deviates from the correct path at the first moment can be determined from that position. If the instrument deviates from the correct path and enters an incorrect blood vessel branch, a preset penalty is given; conversely, if the instrument leaves the wrong vessel branch and returns to the correct path, a preset reward is given. The correct path is the shortest path of the instrument from the start position to the end position of the vessel, generated automatically by the Dijkstra algorithm. For example, if the instrument deviates from the correct path and enters the wrong vascular branch after performing the target action at the first moment, the preset penalty value may be -2 and the return value is -2; if the instrument returns to the correct vessel after executing the target action at the first moment, the preset reward value may be +2 and the return value is +2.
When the instrument moves from the position at the first moment to the position at the second moment while staying on the correct path, the instrument can be considered to advance or retreat along the correct path, and the return value of the first moment can be determined based on the difference between the two positions relative to the target position. For example, if the distance between the instrument and the target position is 9 pixels at the first moment and 10 pixels at the second moment, the difference relative to the target position is 1 pixel, the reward may be +1, and the return value is +1. Conversely, if the distance to the target position is 9 pixels at the first moment and 8 pixels at the second moment, the difference relative to the target position is -1 pixel, the penalty value may be -1, and the return value is -1.
On the basis of the above return value, the return value can be further determined in combination with whether the contact force of the instrument is greater than or equal to a preset threshold. The contact force of the instrument can be reflected by the motor current; when the motor current exceeds its threshold value, the contact force of the instrument is judged to exceed the preset threshold and a penalty is applied, for example a penalty value of -1. When the contact force of the instrument does not exceed the threshold, neither a reward nor a penalty is applied. The reward and penalty values can be set according to actual requirements, and the invention is not limited in this respect.
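The reward design described above can be summarized in a single function; the sketch below follows the numeric values given in the text, while the threshold comparison and argument names are assumptions of this illustration.

```python
# Hedged sketch of the return-value design: path-deviation penalty/reward,
# pixel-wise progress toward the target along the correct path, and a
# contact-force penalty inferred from the motor current.
def compute_reward(on_correct_path, was_on_correct_path,
                   dist_to_target_px, prev_dist_to_target_px,
                   motor_current, current_threshold):
    reward = 0.0
    if was_on_correct_path and not on_correct_path:
        reward -= 2.0                       # entered a wrong vessel branch
    elif not was_on_correct_path and on_correct_path:
        reward += 2.0                       # returned to the correct path
    else:
        # progress along the correct path, in pixels toward the target
        reward += float(prev_dist_to_target_px - dist_to_target_px)
    if motor_current >= current_threshold:  # proxy for excessive contact force
        reward -= 1.0
    return reward
```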
It should be noted that, during model training, the return value of each target action can be determined, so that the cumulative return value corresponding to the target surgical operation model obtained after training,

$$\sum_{t} \gamma^{t} r_t,$$

is maximized, where $0 < \gamma < 1$, so that $\gamma^{t}$ decreases as $t$ increases and increases as $t$ decreases.
According to the control device for vascular interventional surgery provided by the embodiment of the invention, the model is trained based on an offline reinforcement learning method, so that the cumulative return value of the trained target surgical operation model is maximized. Compared with the prior art, which trains the model only by imitation learning, the model trained by the invention has higher precision; with the target surgical operation model configured in the vascular interventional surgical robot, the accuracy with which the robot controls the instrument to move in the blood vessel is higher.
In an embodiment, updating the model parameters of the initial surgical operation model based on the return value and the function estimation value to obtain the target surgical operation model may be specifically implemented by the following ways:
determining a function estimation loss function based on the return value, the function estimation value, an estimation value of an objective function, and the advantage of the sample data relative to an agent policy;
determining a policy mimicking loss function based on the agent policy and the sample data;
and optimizing model parameters of the initial operation model based on the function estimation loss function and the policy mimicking loss function to obtain the target operation model.
For example, the function estimation loss function $J_Q$ may be determined based on the following formula (5):

$$J_Q=\mathbb{E}_{(o_{t-1},a_{t-1},o_t,a_t,r_t)\sim\mathcal{D}}\Big[\big(r_t+\gamma V(o_{t+1}\mid o_{t-1},a_{t-1})-Q_i(o_t,a_t\mid o_{t-1},a_{t-1})\big)^{2}-\beta\,\hat{A}(o_t,a_t)\Big],\quad i=1,2 \quad (5)$$

where $\mathbb{E}_{\mathcal{D}}$ denotes the expectation over the sample data, $\beta$ is a hyper-parameter, $r_t$ is the return value, $Q_i(o_t,a_t\mid o_{t-1},a_{t-1})$ denotes the element of $Q_i(o_t\mid o_{t-1},a_{t-1})$ corresponding to action $a_t$, i.e., the function estimate, $V(o_{t+1}\mid o_{t-1},a_{t-1})$ is the estimate of the objective function, and $\hat{A}(o_t,a_t)$ denotes the advantage of the sample data relative to the agent policy.
The estimate of the objective function can be expressed by the following formula (6):

$$V(o_{t+1}\mid o_{t-1},a_{t-1})=\mathbb{E}_{a\sim\pi(\cdot\mid o_{t+1})}\big[\bar{Q}(o_{t+1},a\mid o_{t-1},a_{t-1})\big] \quad (6)$$

The parameter $\bar{\theta}$ of the target-estimation branch $\bar{Q}(\cdot)$, which produces the estimate of the objective function, is the exponential moving average of the parameter $\theta$ of the function-estimation branch $Q(\cdot)$. After each update of $\theta$, $\bar{\theta}$ is updated using the following formula (7):

$$\bar{\theta} \leftarrow \tau\,\theta + (1-\tau)\,\bar{\theta} \quad (7)$$

where $\tau$ denotes the exponential averaging speed.
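A minimal sketch of the exponential-moving-average update of formula (7) is shown below; the default value of tau is an assumption.

```python
# Sketch of formula (7): after each update of the function-estimation parameters
# theta, the target-estimation parameters theta_bar follow their exponential
# moving average with speed tau.
import torch

@torch.no_grad()
def soft_update(q_net, target_q_net, tau=0.005):
    for p, p_bar in zip(q_net.parameters(), target_q_net.parameters()):
        p_bar.mul_(1.0 - tau).add_(p, alpha=tau)
```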
The policy mimicking loss function $J_{im}$ can be determined based on the following formula (8):

$$J_{im}=\mathbb{E}_{(o_{t-1},a_{t-1},o_t,a_t)\sim\mathcal{D}}\big[-\log\pi(a_t\mid o_t,o_{t-1},a_{t-1})\big] \quad (8)$$

where $\pi$ denotes the agent policy. After the function estimation loss function and the policy mimicking loss function are obtained, they can be optimized by gradient descent to update the model parameters of the initial surgical operation model and obtain the target surgical operation model.
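The sketch below illustrates one gradient-descent update on the two losses. It is a simplified assumption rather than the patent's exact procedure: the critic loss is taken as the plain squared TD error of both Q heads (the beta-weighted advantage term of formula (5) is omitted), the policy head is assumed to output logits over discrete actions, and the encoder, heads, and batch layout are placeholder interfaces.

```python
# Schematic sketch of one update on the function-estimation and policy-imitation
# losses; interfaces (encoder, policy_head, q_head with .q1/.q2, target_q_head)
# and hyper-parameters are assumptions of this illustration.
import torch
import torch.nn.functional as F

def imitation_update(encoder, policy_head, q_head, target_q_head, optimizer,
                     batch, gamma=0.99):
    obs_prev, act_prev, obs_curr, act_curr, reward, obs_next = batch
    feat = encoder(obs_prev, obs_curr, act_prev)

    # Function-estimation loss: squared TD error for both local Q estimators.
    with torch.no_grad():
        next_feat = encoder(obs_curr, obs_next, act_curr)
        next_probs = policy_head(next_feat).softmax(dim=-1)
        v_next = (next_probs * target_q_head(next_feat)).sum(dim=-1)
        td_target = reward + gamma * v_next
    q1 = q_head.q1(feat).gather(1, act_curr.unsqueeze(1)).squeeze(1)
    q2 = q_head.q2(feat).gather(1, act_curr.unsqueeze(1)).squeeze(1)
    loss_q = F.mse_loss(q1, td_target) + F.mse_loss(q2, td_target)

    # Policy-imitation loss: negative log-likelihood of the recorded action.
    log_probs = policy_head(feat).log_softmax(dim=-1)
    loss_im = -log_probs.gather(1, act_curr.unsqueeze(1)).mean()

    optimizer.zero_grad()
    (loss_q + loss_im).backward()
    optimizer.step()
    return loss_q.item(), loss_im.item()
```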
According to the control device for vascular interventional surgery provided by the embodiment of the invention, the model parameters of the initial surgical operation model are optimized based on the function estimation loss function and the policy mimicking loss function. The function estimation loss function reflects the error between the function estimate and the estimate of the objective function as well as the advantage of the sample data relative to the agent policy, and the policy mimicking loss function reflects the relation between the agent policy and the actions in the sample data at the previous and current moments, so the accuracy of the optimized model is higher.
In an embodiment, optimizing the model parameters of the initial surgical operation model based on the function estimation loss function and the policy mimicking loss function to obtain the target surgical operation model can be specifically implemented as follows:
optimizing model parameters of the initial surgical operation model based on the function estimation loss function and the policy mimicking loss function to obtain a simulated surgical operation model;
determining a policy optimization loss function based on the agent policy and an estimate of the objective function;
and optimizing model parameters of the simulated surgical operation model based on the policy optimization loss function and the function estimation loss function to obtain the target surgical operation model.
By way of example, the model parameters of the initial surgical operation model are first optimized based on the function estimation loss function and the policy mimicking loss function; the overall idea of this stage is to optimize imitation of the recorded operations, so that a simulated surgical operation model is obtained.
On the basis of the simulated surgical operation model, the weight of each sample data is initialized to $\varepsilon_2$, so that all sample data initially have the same weight. After the sample data are randomly sampled, the model parameters of the simulated surgical operation model can be optimized using the policy optimization loss function and the function estimation loss function $J_Q$ of the above embodiment, to obtain the target surgical operation model.
The function estimation loss function is described in detail in the above embodiments, and is not described herein.
The policy optimization loss function $J_{op}$ may be determined based on the following formula (9):

$$J_{op}=\mathbb{E}_{o_t\sim\mathcal{D}}\Big[\mathbb{E}_{a\sim\pi(\cdot\mid o_t)}\big[-Q(o_t,a\mid o_{t-1},a_{t-1})\big]-\alpha\,\mathcal{H}\big(\pi(\cdot\mid o_t)\big)\Big] \quad (9)$$

where $\mathcal{H}$ denotes entropy and $\alpha$ denotes a non-negative parameter.

$\alpha$ can be adjusted automatically through the following formula (10):

$$J_{\alpha}=\mathbb{E}_{a\sim\pi}\big[-\alpha\big(\log\pi(a\mid o_t)+\bar{\mathcal{H}}\big)\big] \quad (10)$$

where $J_{\alpha}$ is the loss function of $\alpha$ and the hyper-parameter $\bar{\mathcal{H}}$ is the target entropy.
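The sketch below shows how losses of the form of (9) and (10) could be computed for a discrete-action policy; the names log_alpha and target_entropy, and the use of logits, are assumptions of this illustration.

```python
# Hedged sketch of the policy-optimization loss and the automatic adjustment of
# the non-negative parameter alpha for a discrete-action policy.
import torch

def policy_optimization_losses(policy_logits, q_values, log_alpha, target_entropy):
    probs = policy_logits.softmax(dim=-1)
    log_probs = policy_logits.log_softmax(dim=-1)
    entropy = -(probs * log_probs).sum(dim=-1)                  # H(pi(.|o_t))
    alpha = log_alpha.exp().detach()
    # J_op: expected negative Q under the policy minus the entropy bonus.
    loss_policy = (-(probs * q_values).sum(dim=-1) - alpha * entropy).mean()
    # J_alpha: drive the policy entropy toward the target entropy.
    loss_alpha = (log_alpha.exp() * (entropy.detach() - target_entropy)).mean()
    return loss_policy, loss_alpha
```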
Furthermore, during training, the translation range $S$ of the A-LIX layer in the encoder can be adjusted automatically using Lagrangian relaxation, so that the smoothness of the convolution-layer gradient stays in a proper range. The smoothness of the gradient $g=\partial\mathcal{L}/\partial\tilde{z}$ of the A-LIX layer output can be measured by the modified normalized discontinuity score (Modified Normalized Discontinuity Score) $\hat{D}(g)$ expressed by the following formula (11):

$$\hat{D}(g)=\frac{1}{C\,H\,W}\sum_{c=1}^{C}\sum_{h=1}^{H}\sum_{w=1}^{W}\mathrm{ESLD}(g)(c,h,w) \quad (11)$$

where $\hat{D}(g)$ measures the smoothness of the gradient, $C$ denotes the number of channels of the convolution layer, $H$ the height of the feature map, $W$ the width of the feature map, and $\mathrm{ESLD}(g)$ is the expected squared local discontinuity (Expected Squared Local Discontinuity) of $g$, which can be expressed by the following formula (12):

$$\mathrm{ESLD}(g)(c,h,w)=\mathbb{E}_{v}\Big[\big(g(c,h,w)-g\big(c,(h,w)+v\big)\big)^{2}\Big] \quad (12)$$

where the gradient $g$ is arranged as a matrix of rows and columns and $v$ ranges over the row and column directions of this matrix, i.e., over the neighbouring positions in the height and width directions. The smaller $\hat{D}(g)$ is, the smoother $g$ is.

In addition, the translation range $S$ can be updated using the following formula (13):

$$J_S=S\,\big(\bar{D}-\hat{D}(g)\big) \quad (13)$$

where $J_S$ is the loss function of $S$ and the hyper-parameter $\bar{D}$ is the target value of $\hat{D}(g)$.
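A small sketch of a discontinuity measure in the spirit of (11) and (12) is given below; the normalization by the gradient's second moment and the averaging over only the height and width neighbours are assumptions of this illustration.

```python
# Hedged sketch: mean squared difference between neighbouring elements of the
# gradient map flowing through the A-LIX output, normalized by its second moment.
import torch

def discontinuity_score(grad_map, eps=1e-8):
    """grad_map: gradient w.r.t. the A-LIX output, shape (C, H, W)."""
    dh = grad_map[:, 1:, :] - grad_map[:, :-1, :]    # local discontinuity along height
    dw = grad_map[:, :, 1:] - grad_map[:, :, :-1]    # local discontinuity along width
    esld = 0.5 * (dh.pow(2).mean() + dw.pow(2).mean())
    return esld / (grad_map.pow(2).mean() + eps)     # assumed normalization
```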
According to the control device for vascular interventional surgery provided by the invention, on the basis of the imitation-optimized model, the model parameters of the simulated surgical operation model are further optimized based on the function estimation loss function and the policy optimization loss function, so that the accuracy of the finally trained target surgical operation model is higher and the accuracy of the movement of the instrument in the blood vessel is further improved.
In an embodiment, optimizing the model parameters of the model of the simulated surgical operation based on the policy-optimized loss function and the function-estimated loss function to obtain the target surgical operation model may be implemented specifically by:
determining a sampling probability corresponding to each sample data based on the weight of the sample data;
determining target sample data based on the sampling probability of each sample data;
inputting the target sample data into the simulation-based operation model, optimizing model parameters of the simulation-based operation model based on the strategy optimization loss function and the function estimation loss function, and obtaining the target operation model.
Illustratively, the sampling probability of the i-th sample data may be determined based on the following formula (14):

$$p_i=\frac{f_i}{\sum_{j=1}^{N} f_j} \quad (14)$$

where $p_i$ denotes the probability that the i-th sample data is sampled, $N$ denotes the total number of samples, and $f_i$ denotes the weight of the i-th sample data.
The sampling probability of each sample data is proportional to its weight: the larger the weight of the i-th sample data, the larger the probability that it is sampled. Therefore, after the sampling probability of each sample data is obtained, it can be compared with a preset probability, and the sample data whose sampling probability is larger than the preset probability are determined to be the target sample data. The target sample data are then used as the training data of this round for the simulation-based operation model, and the model parameters of the simulation-based operation model are optimized based on the policy optimization loss function and the function estimation loss function.
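A minimal sketch of weight-proportional sampling according to formula (14) is shown below; drawing a fixed-size batch without replacement is an assumption of this illustration.

```python
# Sketch of formula (14): each sample's probability is its weight divided by the
# sum of all weights; indices are drawn with those probabilities to form the
# target sample data for the next training round.
import numpy as np

def sample_indices(weights, batch_size, rng=None):
    rng = rng or np.random.default_rng()
    weights = np.asarray(weights, dtype=np.float64)
    probs = weights / weights.sum()          # p_i = f_i / sum_j f_j
    return rng.choice(len(weights), size=batch_size, replace=False, p=probs)
```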
According to the control device for vascular interventional surgery provided by the invention, when the model parameters of the simulation-based operation model are optimized based on the policy optimization loss function and the function estimation loss function, the sampling probability of each sample data can be updated based on its weight, so that the training data input to the simulation-based operation model in each round are more accurate.
In an embodiment, the control device for vascular interventional surgery may further include:
and the updating module is used for updating the weight of the target sample data based on the return value, the estimated value of the target function and the estimated value of the function after optimizing the model parameters of the simulated operation model in each round.
Fig. 6 is a schematic diagram of the process of optimizing the model parameters of the simulated surgical operation model provided by the present invention. After the model parameters of the simulated surgical operation model are optimized in each round, the weight of each sampled target sample data is updated, and the sampling probability of the corresponding target sample data is updated based on the updated weight. A new round of sample data is then re-determined based on the updated sampling probabilities, and the model parameters of the simulated surgical operation model are further optimized with the new round of sample data based on the policy optimization loss function and the function estimation loss function. This cycle continues until a convergence condition is reached, and the target surgical operation model is obtained.
Specifically, the weight of the target sample data may be updated based on the following formula (15):

$$f_i \leftarrow (1-\tau)\,f_i+\tau\,\operatorname{clip}\big(|\delta_i|,\ \varepsilon_1,\ \varepsilon_2\big) \quad (15)$$

where $\delta_i$ is the temporal-difference error of the i-th target sample data, the hyper-parameter $\tau$ controls the update speed of the weight, and $\varepsilon_1$ and $\varepsilon_2$ limit the weight range to prevent the weight from becoming too small or too large.
According to the vascular intervention operation control device, the sampling weight of each sample data is updated based on the return value, the estimated value of the objective function and the estimated value of the function, so that the sample data input in each round are more accurate, and the accuracy of the finally trained objective operation model is higher.
The control method of the vascular intervention provided by the invention is described below, and the control method of the vascular intervention described below and the control device of the vascular intervention described above can be referred to correspondingly.
Fig. 7 is a flow chart of a control method of a vascular intervention operation according to an embodiment of the present invention, as shown in fig. 7, the control method of the vascular intervention operation includes the following steps:
step 701, acquiring first image data of a previous moment, second image data of a current moment and first action information of the previous moment; the first image data and the second image data comprise blood vessel image data and instrument image data corresponding to blood vessel interventional operation.
Step 702, inputting the first image data, the second image data and the first motion information into a target operation model, and obtaining a probability of selecting each motion at the current time output by the target operation model; the target surgical operation model is obtained by training based on an offline reinforcement learning method by using first image sample data, second image sample data and action sample information corresponding to the first image sample data; the first image sample data is data acquired at a previous time of the second image sample data.
Step 703, determining a target control instruction based on the probability of selecting each of the actions.
Step 704, controlling the instrument to move in the blood vessel based on the target control instruction.
According to the control method for vascular interventional surgery provided by the invention, the acquired first image data at the previous moment, the acquired second image data at the current moment and the acquired first action information at the previous moment are input into the target surgical operation model, and the target surgical operation model can output the probability of selecting each action of the instrument, so that the target control instruction corresponding to the action of the instrument can be determined and the instrument is controlled to move in the blood vessel according to the target control instruction. The target surgical operation model is obtained by training based on an offline reinforcement learning method, and the offline reinforcement learning method optimizes model parameters based on a reward mechanism and does not depend on the quality of physician operation examples, so the accuracy of the trained target surgical operation model is improved, and the accuracy of the movement of the instrument in the blood vessel is further improved.

Fig. 8 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 8, the electronic device may include: a processor 810, a communication interface (Communications Interface) 820, a memory 830, and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with each other through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform the control method for vascular interventional surgery, the method comprising: acquiring first image data of a previous moment, second image data of a current moment and first action information of the previous moment; the first image data and the second image data comprise blood vessel image data and instrument image data corresponding to the vascular interventional surgery;
inputting the first image data, the second image data and the first action information into a target operation model to obtain the probability of selecting each action at the current moment output by the target operation model; the target surgical operation model is obtained by training based on an offline reinforcement learning method by using first image sample data, second image sample data and action sample information corresponding to the first image sample data; the first image sample data is data acquired at a previous time of the second image sample data;
determining a target control instruction based on the probability of selecting each of the actions;
the instrument is controlled to move in the blood vessel based on the target control instruction.
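As stated above, the target operation model is trained with an offline reinforcement learning method that optimizes model parameters through a reward mechanism. As a rough, non-authoritative sketch of one such training step, the code below combines a function (value) estimation loss with a policy imitation loss on demonstration actions; the module names, the advantage weighting and the exact loss forms are assumptions, not the patented formulation.

```python
import torch
import torch.nn.functional as F

def offline_training_step(encoder, policy_head, value_head, optimizer, batch, gamma=0.99):
    """One illustrative offline-RL update step (names and loss forms are assumptions)."""
    # batch: image sample pair, demonstrated action (class index), return value and
    # an estimate of the objective (target) function for the next state.
    s_prev, s_curr, action, reward, objective_estimate = batch

    # Encoder: turn the image sample pair into an encoded information sample.
    z = encoder(s_prev, s_curr)

    # Strategy (policy) estimation sub-model: probability of selecting each action.
    logits = policy_head(z)
    # Function estimation sub-model: scalar function estimated value per sample.
    value = value_head(z).squeeze(-1)

    # Function estimation loss: regress the value toward return + discounted
    # objective estimate, weighted here by the positive part of the advantage.
    td_target = reward + gamma * objective_estimate
    advantage = (td_target - value).detach()
    value_loss = (advantage.clamp(min=0.0) * (td_target - value) ** 2).mean()

    # Policy imitation (mimicking) loss: match the action taken in the demonstration.
    imitation_loss = F.cross_entropy(logits, action)

    loss = value_loss + imitation_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```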
Furthermore, the logic instructions in the memory 830 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as an independent product. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program which may be stored on a non-transitory computer-readable storage medium and which, when executed by a processor, is capable of performing the control method for vascular interventional procedures provided by the methods described above, the method comprising: acquiring first image data of a previous moment, second image data of a current moment and first action information of the previous moment; the first image data and the second image data comprise blood vessel image data and instrument image data corresponding to blood vessel interventional operation;
inputting the first image data, the second image data and the first action information into a target operation model to obtain the probability of selecting each action at the current moment output by the target operation model; the target surgical operation model is obtained by training based on an offline reinforcement learning method by using first image sample data, second image sample data and action sample information corresponding to the first image sample data; the first image sample data is data acquired at a previous time of the second image sample data;
determining a target control instruction based on the probability of selecting each of the actions;
the instrument is controlled to move in the blood vessel based on the target control instruction.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the control method for vascular interventional procedures provided by the methods described above, the method comprising: acquiring first image data of a previous moment, second image data of a current moment and first action information of the previous moment; the first image data and the second image data comprise blood vessel image data and instrument image data corresponding to blood vessel interventional operation;
inputting the first image data, the second image data and the first action information into a target operation model to obtain the probability of selecting each action at the current moment output by the target operation model; the target surgical operation model is obtained by training based on an offline reinforcement learning method by using first image sample data, second image sample data and action sample information corresponding to the first image sample data; the first image sample data is data acquired at a previous time of the second image sample data;
determining a target control instruction based on the probability of selecting each of the actions;
the instrument is controlled to move in the blood vessel based on the target control instruction.
The apparatus embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the above technical solution, in essence, or the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disk, and which comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in the respective embodiments or in parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A control device for vascular interventional procedures, the device comprising:
an acquisition module for acquiring first image data of a previous moment, second image data of a current moment and first action information of the previous moment; the first image data and the second image data comprise blood vessel image data and instrument image data corresponding to blood vessel interventional operation;
a prediction module for inputting the first image data, the second image data and the first action information into a target operation model to obtain the probability of selecting each action at the current moment output by the target operation model; the target surgical operation model is obtained by training based on an offline reinforcement learning method by using first image sample data, second image sample data and action sample information corresponding to the first image sample data; the first image sample data is data acquired at a previous time of the second image sample data;
a determining module for determining a target control instruction based on the probability of selecting each of the actions;
a control module for controlling the instrument to move in the blood vessel based on the target control instruction.
2. The vascular interventional procedure control device according to claim 1, wherein the target surgical operation model is obtained by training based on the following steps:
inputting the first image sample data and the second image sample data in the sample data into an encoder of an initial operation model to obtain a coded information sample output by the encoder;
inputting the coded information samples and the action sample information in the sample data into a strategy estimation sub-model and a function estimation sub-model of the initial operation model to obtain a prediction probability of selecting each action at a first moment, output by the strategy estimation sub-model, and a function estimation value, output by the function estimation sub-model; the first moment is the moment corresponding to the second image sample data;
determining a target action based on the prediction probability of selecting each action at the first moment, and determining a return value of the first moment based on the target action;
updating model parameters of the initial operation model based on the return value and the function estimated value to obtain the target operation model, wherein the accumulated return value corresponding to the target operation model is maximized.
3. The vascular interventional procedure control device according to claim 2, wherein the updating model parameters of the initial operation model based on the return value and the function estimated value to obtain the target operation model comprises:
determining a function estimation loss function based on the return value, the function estimation value, an estimation value of an objective function, and the advantage of the sample data relative to an agent policy;
determining a policy mimicking loss function based on the agent policy and the sample data;
optimizing model parameters of the initial operation model based on the function estimation loss function and the policy mimicking loss function to obtain the target operation model.
4. The vascular interventional procedure control device according to claim 3, wherein the optimizing model parameters of the initial operation model based on the function estimation loss function and the policy mimicking loss function to obtain the target operation model comprises:
optimizing model parameters of the initial operation model based on the function estimation loss function and the policy mimicking loss function to obtain an imitation operation model;
determining a policy optimization loss function based on the agent policy and an estimated value of the objective function;
optimizing model parameters of the imitation operation model based on the policy optimization loss function and the function estimation loss function to obtain the target operation model.
5. The vascular interventional procedure control device according to claim 2, wherein the determining the return value of the first moment based on the target action comprises:
executing the target action to obtain the position of the instrument in the blood vessel at a first moment;
and determining a return value of the first moment based on whether the position deviates from a target path, whether the contact force of the instrument is greater than or equal to a preset threshold, and a difference value between the position of the first moment and the position of the second moment when the instrument moves from the position of the first moment to the position of the second moment.
6. The vascular interventional procedure control device according to claim 5, wherein the optimizing model parameters of the imitation operation model based on the policy optimization loss function and the function estimation loss function to obtain the target operation model comprises:
determining a sampling probability corresponding to each sample data based on the weight of the sample data;
determining target sample data based on the sampling probability of each sample data;
inputting the target sample data into the imitation operation model, and optimizing model parameters of the imitation operation model based on the policy optimization loss function and the function estimation loss function to obtain the target operation model.
7. The vascular interventional procedure control device according to claim 6, wherein the device further comprises:
an updating module for updating the weight of the target sample data based on the return value, the estimated value of the objective function and the function estimated value after each round of optimizing the model parameters of the imitation operation model.
8. A method for controlling vascular interventional procedures, comprising:
acquiring first image data of a previous moment, second image data of a current moment and first action information of the previous moment; the first image data and the second image data comprise blood vessel image data and instrument image data corresponding to blood vessel interventional operation;
Inputting the first image data, the second image data and the first action information into a target operation model to obtain the probability of selecting each action at the current moment output by the target operation model; the target surgical operation model is obtained by training based on an offline reinforcement learning method by using first image sample data, second image sample data and action sample information corresponding to the first image sample data; the first image sample data is data acquired at a previous time of the second image sample data;
determining a target control instruction based on the probability of selecting each of the actions;
the instrument is controlled to move in the blood vessel based on the target control instruction.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the control method of vascular interventional procedure as claimed in claim 8 when executing the program.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the control method of vascular intervention as claimed in claim 8.
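Claim 5 above recites that the return value depends on whether the position deviates from the target path, whether the contact force of the instrument reaches a preset threshold, and the displacement between the two moments. Purely for illustration, and not as the claimed formulation, one hypothetical way to combine these criteria is sketched below; the function name, weights and thresholds are assumptions.

```python
import numpy as np

def return_value(position, prev_position, target_path, contact_force,
                 force_threshold=0.3, off_path_tolerance=1.0):
    """Hypothetical return value combining the criteria recited in claim 5."""
    # Deviation from the target path: distance to the nearest path point.
    deviation = np.min(np.linalg.norm(target_path - position, axis=1))
    reward = 0.0
    if deviation > off_path_tolerance:        # penalize leaving the target path
        reward -= 1.0
    if contact_force >= force_threshold:      # penalize excessive contact force
        reward -= 1.0
    # Reward the displacement of the instrument between the two moments.
    reward += float(np.linalg.norm(position - prev_position))
    return reward
```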
CN202310233812.XA 2023-03-03 2023-03-03 Control device and method for vascular intervention operation Pending CN116392260A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310233812.XA CN116392260A (en) 2023-03-03 2023-03-03 Control device and method for vascular intervention operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310233812.XA CN116392260A (en) 2023-03-03 2023-03-03 Control device and method for vascular intervention operation

Publications (1)

Publication Number Publication Date
CN116392260A true CN116392260A (en) 2023-07-07

Family

ID=87016928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310233812.XA Pending CN116392260A (en) 2023-03-03 2023-03-03 Control device and method for vascular intervention operation

Country Status (1)

Country Link
CN (1) CN116392260A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117338436A (en) * 2023-12-06 2024-01-05 鸡西鸡矿医院有限公司 Manipulator and control method thereof
CN117338436B (en) * 2023-12-06 2024-02-27 鸡西鸡矿医院有限公司 Manipulator and control method thereof

Similar Documents

Publication Publication Date Title
CN107403446B (en) Method and system for image registration using intelligent artificial agents
EP3719744A1 (en) Method and system for machine learning based segmentation of contrast filled coronary artery vessels on medical images
CN116392260A (en) Control device and method for vascular intervention operation
CN114897780B (en) MIP sequence-based mesenteric artery blood vessel reconstruction method
CN112172813A (en) Car following system and method for simulating driving style based on deep inverse reinforcement learning
CN116450788A (en) Medical visual question and answer implementation method, device and storage medium
CN117094963A (en) Fundus image focus segmentation method, system, equipment and storage medium
CN114467094A (en) Physical environment interaction with equal variation strategy
CN116502699A (en) Method for sampling from diffusion model through energy function guidance based on contrast learning
CN111915622A (en) Training method and device of image segmentation network model and image segmentation method and device
CN116993788A (en) Multi-frame intelligent agent-based multi-mode medical image flexible registration method
CN115147357A (en) Automatic navigation method, device, equipment and medium for vascular intervention guide wire
CN116597943A (en) Forward track prediction method and equipment for instrument operation in minimally invasive surgery
EP4102405A1 (en) Demonstration-conditioned reinforcement learning for few-shot imitation
CN116047902A (en) Method, device, equipment and storage medium for navigating robots in crowd
EP4104785A1 (en) Method and apparatus for training machine learning model for determining operation of medical tool control device
CN113066111B (en) Automatic positioning method for cardiac mitral valve vertex based on CT image
KR102473037B1 (en) Method and system for automatically controlling navigation of surgical tool based on reinforcement learning
EP4053800A1 (en) Autonomous reconstruction of vessels on computed tomography images
CN115356923A (en) Dual-cycle application method and system for partially observable Markov decision problem
CN116942313B (en) Surgical robot registration method and device based on reinforcement learning and electronic equipment
CN108520237A (en) A kind of risk behavior recognition methods
CN113012036B (en) Human motion style migration method and system based on generative flow model
Mei et al. Transferring Virtual Surgical Skills to Reality: AI Agents Mastering Surgical Decision Making in Vascular Interventional Robotics
CN118192219A (en) Apparatus and method for controlling robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination