CN115116590B - Deep reinforcement learning method and device, and pulmonary nodule patient follow-up procedure planning method, system, medium and equipment - Google Patents

Deep reinforcement learning method and device, and pulmonary nodule patient follow-up procedure planning method, system, medium and equipment

Info

Publication number
CN115116590B
CN115116590B (application CN202210749794.6A)
Authority
CN
China
Prior art keywords
follow
examination
behavior
decision
nodule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210749794.6A
Other languages
Chinese (zh)
Other versions
CN115116590A (en)
Inventor
Wang Zixing (王子兴)
Xue Fang (薛芳)
Hu Yaoda (胡耀达)
Jiang Jingmei (姜晶梅)
Han Wei (韩伟)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Basic Medical Sciences of CAMS
Original Assignee
Institute of Basic Medical Sciences of CAMS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Basic Medical Sciences of CAMS filed Critical Institute of Basic Medical Sciences of CAMS
Priority to CN202210749794.6A priority Critical patent/CN115116590B/en
Publication of CN115116590A publication Critical patent/CN115116590A/en
Application granted granted Critical
Publication of CN115116590B publication Critical patent/CN115116590B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00: ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20: ICT specially adapted for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Biomedical Technology (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep reinforcement learning method and device, and a pulmonary nodule patient follow-up procedure planning method, system, medium and equipment. The deep reinforcement learning method comprises: training a deep Q-network based on lung nodule morphology, follow-up examination behaviors and feedback scores; and calculating future morphological features of the lung nodule with a dynamic predictor for lung nodule morphology, so as to fill in the state information required for deciding the timing of follow-up examination behaviors. The invention solves the problem that, when reinforcement learning is used for timing decisions on clinical examination behaviors such as pulmonary nodule follow-up planning, state information is not updated in time because examination behaviors are sparse on the time axis, which degrades the timing decisions.

Description

Deep reinforcement learning method and device, and pulmonary nodule patient follow-up procedure planning method, system, medium and equipment
Technical Field
The invention relates to the field of computer technology, and in particular to a deep reinforcement learning method and device, and a pulmonary nodule patient follow-up procedure planning method, system, medium and equipment.
Background
Lung cancer is the leading malignant tumor; early diagnosis and early treatment are key to its prevention and control, and low-dose CT is the standard method for detecting early cases. In practice, nearly half of screened subjects are found to have at least one lung nodule, yet whether a nodule is benign or malignant is difficult to determine from the baseline CT examination alone, so follow-up management of lung nodule patients constitutes the main work of lung cancer screening. The longitudinal follow-up of a lung nodule patient involves complex clinical decisions such as whether to proceed to definitive diagnosis and when to schedule the next visit, and its effectiveness bears directly on key issues such as the timeliness of lung cancer detection and over-diagnosis. A review in a top journal focused on this issue states explicitly that the corresponding follow-up protocols in current guidelines generally lack an evidence base (see: Doubeni CA, Gabler NB, Wheeler CM, et al. Timely follow-up of positive cancer screening results: A systematic review and recommendations from the PROSPR consortium. CA: A Cancer Journal for Clinicians, 2018;68(3):199-216).
The classical view holds that the solidity and diameter of a lung nodule are the main bases for judging whether it is benign or malignant, and the nodule management schemes in current guidelines rely mainly on these indices. However: (1) nodule solidity is a semi-quantitative assessment, so the stability, accuracy and standardization of the result are hard to guarantee; (2) the diameter index exploits morphological information only to a limited degree, although nodule morphology is central to clinical decision-making; (3) the rules for follow-up procedures are based on expert opinion and lack matching analytical techniques and methods.
In the field of computer technology, researchers have applied machine learning methods such as random forests, support vector machines and artificial neural networks to lung nodule image analysis, covering nodule detection, image segmentation, and benign-malignant classification. The common feature of this class of techniques is that they treat the lung nodule, or its constituent pixels, as the object of a classification task. Further, similar analysis techniques built on deep learning extend to histopathology, staging and genotyping, but they remain classification tasks in essence. Constrained by this static classification paradigm, the prior art can hardly meet the practical needs of ongoing screening: even an accurate single prediction used clinically cannot completely eliminate misdiagnosis and missed diagnosis, and a large share of patients must instead be assessed from how their nodules change over time. Tracking this change requires repeated decisions about time intervals: intervals that are too short expose the patient to frequent CT radiation, while intervals that are too long may delay treatment. This series of decisions along the time axis is therefore critical, and it constrains, and is constrained by, changes in the patient's condition.
Pulmonary nodule patient management is essentially a dynamic decision process. For decision optimization, reinforcement learning, a branch of machine learning, optimizes a strategy through the interaction between an environment and an agent's behaviors so as to advance a target. In medicine, such approaches have mainly been used for treatment and other intervention decisions, whose problem structure is broadly consistent with the original (intelligent robot) prototype (see parts a and b of fig. 7). For medical examination decisions, however, the technique must be adapted to the problem structure shown in part c of fig. 7, where "no examination applied, hence no corresponding state information updated" is the core difficulty. No existing reinforcement learning technique for planning the follow-up procedure of pulmonary nodule patients addresses this issue.
The present invention is provided in view of the above.
Disclosure of Invention
The invention aims to provide a deep reinforcement learning method and device, a pulmonary nodule patient follow-up procedure planning method and system, and a storage medium and equipment, solving the problem that, when reinforcement learning is used for timing decisions on clinical examination behaviors such as pulmonary nodule follow-up planning, state information is not updated in time because examination behaviors are sparse on the time axis, which degrades the timing decisions.
In order to solve the above problem, in a first aspect, an embodiment of the present invention provides a deep reinforcement learning method for planning the follow-up procedure of pulmonary nodule patients, the method comprising:
training a deep Q-network based on lung nodule morphology, follow-up examination behaviors and feedback scores;
and calculating future morphological features of the lung nodule through a dynamic predictor for predicting lung nodule morphological features, and filling in the state information required for deciding the timing of follow-up examination behaviors.
Optionally, the dynamic predictor comprises a mixed-effects model determined from a fixed-effect term related to predictors that act identically on all patients, a random-effect term related to between-patient differences, and a noise-induced error term.
Optionally, when the dynamic predictor is used to calculate future morphological features of the lung nodule, the calculation may be based only on the baseline examination result at the baseline time point, or on multiple examination results at repeat examination time points.
Optionally, when the deep Q-network is trained, the feedback scoring rules take avoiding or reducing over-diagnosis and delayed diagnosis as the optimal-strategy objective.
In a second aspect, an embodiment of the present invention provides a pulmonary nodule patient follow-up procedure planning method, including:
acquiring pulmonary nodule clinical information of a patient;
planning an individual's follow-up procedure using the deep Q-network trained by the deep reinforcement learning method of the first aspect;
and outputting the planned follow-up procedure.
In a third aspect, an embodiment of the present invention provides a deep reinforcement learning apparatus for planning a follow-up procedure of a pulmonary nodule patient, including:
a deep Q-network module, configured to train a deep Q-network based on lung nodule morphology, follow-up examination behaviors and feedback scores;
and a dynamic prediction and information filling module, configured to calculate future morphological features of the lung nodule through a dynamic predictor for predicting lung nodule morphological features, and to fill in the state information required for deciding the timing of follow-up examination behaviors.
In a fourth aspect, an embodiment of the present invention provides a pulmonary nodule patient follow-up procedure planning system, including:
the acquisition module is used for acquiring the clinical information of pulmonary nodules of the patient;
a planning module, configured to plan a follow-up procedure of an individual by using a deep Q network obtained by training in the deep reinforcement learning method according to the first aspect;
and the output module is used for outputting the planned follow-up procedure.
In a fifth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, the program, when executed by a processor, implementing the methods of the first and second aspects.
In a sixth aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes:
one or more processors; and
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the methods of the first and second aspects described above.
According to the deep reinforcement learning method and device, the pulmonary nodule patient follow-up procedure planning method and system, and the storage medium and equipment, a dynamic predictor is added on top of a deep Q-network. The dynamic predictor predicts lung nodule morphology and fills in the state information that would otherwise be missing, because follow-up decision behaviors are sparse, when such a behavior is selected. Thus, when the deep Q-network obtained by reinforcement learning is used for clinical examination timing decisions such as pulmonary nodule follow-up planning, the problem of state information not being updated in time due to the sparsity of examination behaviors on the time axis is solved, and the timing decisions improve.
Drawings
FIG. 1 shows a flow diagram of a method of deep reinforcement learning according to an embodiment of the invention;
FIG. 2 illustrates a flow diagram of a lung nodule patient follow-up procedure planning method according to an embodiment of the present invention;
FIG. 3 shows a block diagram of a deep reinforcement learning apparatus according to an embodiment of the invention;
FIG. 4 illustrates a block diagram of a pulmonary nodule patient follow-up procedure planning system, in accordance with an embodiment of the present invention;
FIG. 5 illustrates a block diagram of a pulmonary nodule patient follow-up procedure planning system, according to another embodiment of the present invention;
FIG. 6 illustrates a block diagram of a computing device capable of implementing various embodiments of the invention;
FIG. 7 shows a comparison of the structural similarities and differences between the classical reinforcement learning problem and one oriented to examination-timing decisions;
FIG. 8 shows a schematic diagram of the convolutional neural network serving as the deep Q-network function approximator in an embodiment of the present invention; and
FIG. 9 shows a schematic diagram of the role of a dynamic predictor in a deep Q network, according to an embodiment of the invention.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments shown in the drawings. It should be understood that these embodiments are described only to enable those skilled in the art to better understand and to implement the present invention, and are not intended to limit the scope of the present invention in any way.
In describing embodiments of the present invention, the term "include" and its derivatives should be interpreted as open-ended, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first", "second", and the like may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
Referring to fig. 1, in order to solve the above problem, an embodiment of the present invention provides a deep reinforcement learning method 1 for planning the follow-up procedure of pulmonary nodule patients, the method 1 comprising:
step 10, training a deep Q-network based on lung nodule morphology, follow-up examination behaviors and feedback scores; and calculating future morphological features of the lung nodule through a dynamic predictor for predicting lung nodule morphological features, and filling in the state information required for deciding the timing of follow-up examination behaviors.
To better illustrate the technical effects of the embodiments of the present invention, the deep Q-network used in step 10 is described below as an example.
The deep Q-network is a form of reinforcement learning that combines a network algorithm, such as a convolutional neural network, with the Q-learning algorithm. Q-learning, a value-based reinforcement learning algorithm, is a method for selecting a policy; its basic idea is to obtain the optimal policy by continuously estimating and updating a Q value function, given the notions of state, action and return. Specifically, Q-learning initializes a Q value for each state-action pair, selects the action to execute based on the state S, observes the reward and the new state, and updates the Q value according to the Q-function update formula.
In this embodiment, an optimal action-value function q*(s, a) is defined as the expected feedback obtained by starting from state s (i.e., the lung nodule morphology at the baseline examination), taking an arbitrary action a (i.e., a decision on the timing of the follow-up examination, such as 0 days for immediate definitive examination, a review in 3 months, or a review in 1 year), and thereafter continuously following an optimal strategy (i.e., the follow-up procedure that minimizes patient negative events); feedback is expressed in scores (see below), whose value is inversely related to negative patient events. That is:
$$q^*(s,a)=\max_{\tau\in\pi}\mathbb{E}\left[R(\tau)\mid S_0=s,\ A_0=a\right]$$
the Q-type reinforcement learning is to search the strategy set pi for the strategy tau which maximizes the expected feedback R (tau). In the decision-making process of the time of the follow-up examination, the selection of the behavior A in each step can be realized by an epsilon greedy method (random behavior selection is carried out from all behaviors with small probability epsilon), or the selection can also be realized by adopting Boltzmann exploration. As an example, in the greedy method of ε, ε is typically set to a very small value, so 1- ε may be a large value, i.e., when the decision-making behavior of the follow-up exam occasion is chosen, the current best behavior is chosen with a high probability of 1- ε, but the probability of ε is still random. In general, ε will decrease over time. This is because, at the very beginning. Because it is unknown which decision-making behavior of the following diagnosis interval is a better strategy, but with more and more training times, the better strategy is determined, so that exploration is reduced, the value of epsilon is reduced, the behavior is determined mainly according to a Q function, and less behaviors are determined randomly.
The updating rule of the Q function in each step is as follows:
$$Q(S_t,A_t)\leftarrow Q(S_t,A_t)+\alpha\left[R_t+\gamma\max_{A_{t+1}}Q(S_{t+1},A_{t+1})-Q(S_t,A_t)\right]$$
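For illustration, a minimal tabular rendering of this update rule is sketched below; the invention fits Q with a network rather than a table, and the learning-rate and discount values here are assumptions:

```python
# Hedged sketch: tabular version of the Q-function update above.
from collections import defaultdict

Q = defaultdict(float)  # Q values keyed by (state, action) pairs

def q_update(s_t, a_t, r_t, s_next, actions, alpha=0.1, gamma=0.99):
    """Move Q(s_t, a_t) toward r_t + gamma * max over next actions of Q(s_next, a)."""
    best_next = max(Q[(s_next, a)] for a in actions)
    Q[(s_t, a_t)] += alpha * (r_t + gamma * best_next - Q[(s_t, a_t)])
```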
the space formed by the morphological state of the pulmonary nodule and the possible time behaviors of the follow-up examination is very large, and a function fitting device is adopted to fit the Q function at the moment. The deep Q network employed by the present invention is based on a convolutional neural network as shown in fig. 8 as a function fitter.
Adopting the deep Q-network for optimizing the follow-up procedure of pulmonary nodule patients has the following advantages: (1) the rich biological information contained in lung nodule images can be mined deeply, without feature screening or domain knowledge; (2) the generalization requirement from conditions seen during training to unseen conditions in the follow-up procedure can be met; (3) combined with a replay cache (Q-type reinforcement learning on small batches sampled from a cache of past experience) and a target network (an independent network standing in for the Q network being learned), the robustness of the network performance can be guaranteed.
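A condensed sketch of these three ingredients follows, assuming PyTorch: a small CNN Q-network over nodule image patches, a replay cache sampled in mini-batches, and a separate target network. The patch size, layer widths, action count and sync policy are illustrative assumptions, not the architecture of fig. 8:

```python
# Hedged sketch of a deep Q-network training step with replay cache and target network.
import random
from collections import deque

import torch
import torch.nn as nn

N_ACTIONS = 3  # assumed action set: diagnose now / review in 3 months / review in 1 year

class QNet(nn.Module):
    """Small CNN Q-network over a 1-channel 32x32 nodule patch (size is an assumption)."""
    def __init__(self, n_actions=N_ACTIONS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2), nn.ReLU(),   # 32x32 -> 15x15
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),  # 15x15 -> 7x7
            nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(32 * 7 * 7, 128), nn.ReLU(),
                                  nn.Linear(128, n_actions))

    def forward(self, x):  # x: (batch, 1, 32, 32)
        return self.head(self.features(x))

policy_net, target_net = QNet(), QNet()
target_net.load_state_dict(policy_net.state_dict())  # target starts as a frozen copy
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-4)
replay = deque(maxlen=10_000)  # cache of (s, a, r, s_next, done) tensor tuples

def train_step(batch_size=32, gamma=0.99):
    if len(replay) < batch_size:
        return
    s, a, r, s_next, done = map(torch.stack, zip(*random.sample(replay, batch_size)))
    q_sa = policy_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # the target network supplies a stable bootstrap target
        target = r + gamma * target_net(s_next).max(1).values * (1 - done)
    loss = nn.functional.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target():
    """Copy policy weights into the target network, e.g., every few hundred steps."""
    target_net.load_state_dict(policy_net.state_dict())
```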
However, the inventors found through research that directly applying the above deep Q-network to the pulmonary nodule follow-up planning problem faces the following technical difficulty: the examinations a patient undergoes during follow-up are limited, i.e., the actions A_t are sparse on the time axis, whereas the lung nodule morphological state in theory changes continuously, so the states S_t form a continuum. Because of the sparsity of A_t, most of the S_t are unobservable, yet effective observation of the state is the basis of decision-making. This is an important obstacle when the above reinforcement learning is applied to medical-examination decisions like those of the present invention.
To overcome this obstacle, the present invention adds a dynamic predictor to the deep Q-network framework to estimate future lung nodule morphological features, thereby dealing with the missing decision basis (the role of the dynamic predictor in the reinforcement learning system is shown in fig. 9). In step 10, accordingly, the dynamic predictor calculates the future morphological features of the lung nodule and fills in the state information required for the follow-up-timing decision, overcoming the loss of decision basis caused by the sparsity of follow-up examination behaviors on the time axis.
As an example, the dynamic predictor may be implemented by a mixed-effects model, which in one embodiment may be expressed as:
$$S_{ij}=\mu_{ij}+\varepsilon_{ij}=\alpha V_{ij}+\beta_i W_{ij}+\varepsilon_{ij}\qquad(3)$$
where S_{ij} is the value of a state feature of patient i at time point j, μ_{ij} is the theoretical value of that feature, αV_{ij} and β_iW_{ij} denote the fixed-effect and random-effect terms respectively, and ε_{ij} is the error caused by image noise.
The mixed-effects model (which extends naturally to a multivariate mixed-effects model) enables continuous prediction of lung nodule image features, i.e., of the state trajectory, and is used to fill in the state information required to compute the Q function at a time point j and to select the action A_j. Moreover, the prediction may be based only on the baseline result at the baseline time point, or on the multiple results accumulated at repeat examination time points, i.e., a dynamic prediction function is realized.
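Under stated assumptions (a single morphological feature, nodule diameter, following a linear time trend with a per-patient random intercept and slope; toy data; statsmodels as the fitting library), a minimal sketch of such a dynamic predictor is:

```python
# Hedged sketch of the mixed-effects dynamic predictor: fixed trend + per-patient random effects.
import pandas as pd
import statsmodels.formula.api as smf

# Toy longitudinal data: one row per patient per examination time point.
df = pd.DataFrame({
    "patient":  ["p1"] * 3 + ["p2"] * 3 + ["p3"] * 3,
    "months":   [0, 3, 12, 0, 6, 12, 0, 3, 9],
    "diameter": [6.1, 6.2, 6.4, 7.9, 8.8, 10.1, 5.0, 5.1, 5.1],
})

# S_ij = alpha*V_ij (fixed time effect) + beta_i*W_ij (random effect) + eps_ij
model = smf.mixedlm("diameter ~ months", df, groups=df["patient"], re_formula="~months")
fit = model.fit()

def predict_trajectory(patient, future_months):
    """Fill unobserved states: fixed trend plus this patient's estimated random effects."""
    fe = fit.fe_params                    # fixed effects: Intercept, months
    re = fit.random_effects[patient]      # random effects: "Group" (intercept), months
    return [fe["Intercept"] + re.get("Group", 0.0)
            + (fe["months"] + re.get("months", 0.0)) * m
            for m in future_months]

print(predict_trajectory("p2", [18, 24]))  # predicted diameters fill the missing state info
```

As more repeat examinations accumulate for a patient, the estimated random effects sharpen, which is exactly the dynamic-prediction behavior described above.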
Another advantage of introducing a statistical model such as the mixed-effects model concerns noise: lung cancer screening produces low-dose CT image data, which carries more image noise than conventional CT and makes the image features extracted by deep learning unstable. The model greatly reduces this interference, so the subsequent analysis results are more robust.
As described above, when training the deep Q-network the aim is to maximize the feedback, which is expressed in scores. As an example, the scoring rules are as follows:
for the pulmonary nodule follow-up planning problem, the goal of the optimal strategy is to avoid or reduce the following two main categories of situations:
over-diagnosis, i.e., the lung nodule is not malignant but the patient received excessive additional examinations, including follow-up CT examinations and invasive definitive examinations (e.g., puncture biopsy, surgical pathology); the latter case is defined as misdiagnosis;
delayed diagnosis, i.e., the lung nodule is malignant and is diagnosed at a high clinical stage (stages I, II, IIIa, IIIb, IVa and IVb, from low to high), including death due to lung cancer; the latter case is defined as missed diagnosis.
Based on this, the feedback R is further defined in the following consensus manner:
the initial score R_t is set to 100 points;
10 points are deducted for each increase in the clinical stage at diagnosis, e.g., a patient diagnosed with lung cancer at stage IIIa, II or I scores 70, 80 or 90 points respectively, down to 0 points;
misdiagnosis deducts 50 points;
missed diagnosis (death) is scored directly as 0 points.
Since the final feedback of the follow-up process depends on both the prediction performance of the mixed-effects model and the learning performance of the deep Q-network, this feedback scoring method can be used to optimize the dynamic predictor and the Q-function approximator jointly.
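Read as code, one plausible rendering of these consensus rules is sketched below; the exact stage-to-score mapping is an interpretation of the rules above, not an authoritative reproduction:

```python
# Hedged sketch of the feedback score R_t defined above.
STAGES = ["I", "II", "IIIa", "IIIb", "IVa", "IVb"]  # clinical stages, low to high

def feedback_score(malignant, stage=None, misdiagnosed=False, missed=False):
    """Score one completed follow-up episode; higher means fewer negative patient events."""
    if missed:                  # missed diagnosis (death) is scored 0 directly
        return 0
    score = 100                 # R_t starts at 100 points
    if malignant and stage is not None:
        score -= 10 * (STAGES.index(stage) + 1)  # e.g., I -> 90, II -> 80, IIIa -> 70
    if misdiagnosed:            # unnecessary invasive definitive examination
        score -= 50
    return max(score, 0)        # floor at 0 points
```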
As shown in fig. 3, an embodiment of the present invention further provides a deep reinforcement learning apparatus 3 for planning a follow-up procedure of a pulmonary nodule patient, including:
a deep Q-network module 31, configured to train a deep Q-network based on lung nodule morphology, follow-up examination behaviors and feedback scores;
and a dynamic prediction and information filling module 32, configured to calculate future morphological features of the lung nodule through a dynamic predictor for predicting lung nodule morphological features, and to fill in the state information required for deciding the timing of follow-up examination behaviors.
It should be understood that the above program modules correspond one-to-one with the steps described in the method embodiment, and that the technical solutions described there also apply to the specific configuration of each program module; to avoid repetition, the details are not described here again.
With reference to fig. 2, an embodiment of the present invention further provides a pulmonary nodule patient follow-up procedure planning method 2, including:
and step 20, acquiring the clinical information of the pulmonary nodule of the patient.
And step 21, planning the follow-up flow of the patient by using the deep Q network obtained by training in the step 1.
In step 20, clinical information and lung images of the patient are input, so that in step 21, the patient is subjected to individual follow-up procedure planning according to the input information and the trained deep Q-network.
And step 22, outputting the planned follow-up procedure.
The depth Q network used in the planning method utilizes the dynamic predictor to calculate the future morphological characteristics of the pulmonary nodules and fills the state information required for the decision-making of the time of the follow-up examination behaviors, so that the problem of decision-making basis loss caused by the sparsity of the follow-up examination behaviors on the time axis is solved.
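As a hedged sketch of how the trained pieces could be composed at planning time ("policy_net" and "predict_state" refer to the hypothetical objects from the sketches above, and the action set is an assumption):

```python
# Hedged sketch of individualized follow-up planning by greedy roll-out.
import torch

ACTION_LABELS = {0: "immediate definitive examination",
                 1: "review in 3 months",
                 2: "review in 1 year"}

def plan_follow_up(baseline_state, policy_net, predict_state, max_steps=5):
    """Pick actions greedily; the dynamic predictor fills each unobserved next state."""
    plan, state = [], baseline_state            # state: (1, 32, 32) image tensor
    for _ in range(max_steps):
        with torch.no_grad():
            action = int(policy_net(state.unsqueeze(0)).argmax())
        plan.append(ACTION_LABELS[action])
        if action == 0:                          # definitive examination ends the episode
            break
        state = predict_state(state, action)     # hypothetical predictor call
    return plan
```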
With reference to fig. 4, an embodiment of the present invention further provides a pulmonary nodule patient follow-up procedure planning system 4, including:
an obtaining module 41, configured to obtain clinical information of a pulmonary nodule of a patient;
a planning module 42, configured to plan a follow-up procedure of an individual by using the deep Q network obtained through training in the deep reinforcement learning method;
and an output module 43 for outputting the planned follow-up procedure.
It should be understood that the above program modules correspond one-to-one with the steps described in the method embodiment, and that the technical solutions described there also apply to the specific configuration of each program module; to avoid repetition, the details are not described here again.
As an embodiment, and with reference to fig. 5, the pulmonary nodule patient follow-up procedure planning system may also organize its modules according to the following logic, specifically including:
the data acquisition module is used for acquiring lung images and clinical information of the patient;
the storage module is used for storing corresponding data, so that the device has the capability of continuously improving the performance by self through the accumulation of the data;
the analysis module is used for training and testing the deep Q network and planning the follow-up process of the individual;
and the output module is used for outputting the analysis result of the analysis module.
The invention also provides an electronic device, a readable storage medium and a computer program product according to the embodiment of the invention.
FIG. 6 is a block diagram of a computing device 600 capable of implementing multiple embodiments of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 6, the device 600 comprises a computing unit 601, which can perform various suitable actions and processes according to a computer program stored in a read-only memory (ROM) 602 or loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602 and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller or microcontroller. The computing unit 601 performs the respective methods and processes described above, such as methods 1 and 2. For example, in some embodiments, methods 1 and 2 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the methods 1 and 2 described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform methods 1 and 2 in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The inventive concept is explained in detail herein using specific examples, which are given only to aid in understanding the core concepts of the invention. It should be understood that any obvious modifications, equivalents and other improvements made by those skilled in the art without departing from the spirit of the present invention are all included in the scope of the present invention.

Claims (8)

1. A deep reinforcement learning method for planning the follow-up procedure of pulmonary nodule patients, comprising the following steps:
training a deep Q-network based on lung nodule morphology, follow-up examination behaviors and feedback scores;
calculating future morphological features of the lung nodule through a dynamic predictor for predicting lung nodule morphological features, and filling in the state information required for deciding the timing of follow-up examination behaviors; the dynamic predictor comprises a mixed-effects model determined from a fixed-effect term, a random-effect term and a noise-induced error term, wherein the fixed-effect term relates to predictors acting identically on all patients and the random-effect term relates to between-patient differences;
wherein training the deep Q-network based on lung nodule morphology, follow-up examination behaviors and feedback scores comprises:
defining an optimal action-value function q*(s, a), where s is the starting state, i.e., the lung nodule morphology at the baseline examination, and a is an arbitrary action, i.e., a decision on the timing of the follow-up examination, e.g., 0 days for immediate definitive examination, a review in 3 months, or a review in 1 year; after taking a, an optimal strategy is continuously followed, the optimal strategy being the follow-up procedure that minimizes patient negative events; that is:
$$q^*(s,a)=\max_{\tau\in\pi}\mathbb{E}\left[R(\tau)\mid S_0=s,\ A_0=a\right];$$
where τ denotes a strategy, π denotes the strategy set, R(τ) denotes the expected feedback, S_0 denotes the initial state, and A_0 denotes the decision behavior of the initial follow-up examination timing;
the update rule of the Q function is:
$$Q(S_t,A_t)\leftarrow Q(S_t,A_t)+\alpha\left[R_t+\gamma\max_{A_{t+1}}Q(S_{t+1},A_{t+1})-Q(S_t,A_t)\right];$$
where S_t denotes the lung nodule morphological state at time point t and A_t the follow-up examination timing decision behavior at time point t; S_{t+1} and A_{t+1} denote the morphological state and the decision behavior at time point t+1; α denotes the learning rate; and R_t denotes the feedback score;
the dynamic predictor may be expressed as:
$$S_{ij}=\mu_{ij}+\varepsilon_{ij}=\alpha V_{ij}+\beta_i W_{ij}+\varepsilon_{ij}$$
where S_{ij} is the value of a state feature of patient i at time point j, μ_{ij} is the theoretical value of that feature, αV_{ij} and β_iW_{ij} denote the fixed-effect and random-effect terms respectively, and ε_{ij} is the error caused by image noise.
2. The deep reinforcement learning method of claim 1, wherein, when the dynamic predictor is used to calculate the future morphological features of the lung nodule, the calculation may be based only on the baseline examination result at the baseline time point, or on multiple examination results at repeat examination time points, thereby realizing a dynamic prediction function.
3. The deep reinforcement learning method of claim 2, wherein, when the deep Q-network is trained, the feedback scoring rules take avoiding or reducing over-diagnosis and delayed diagnosis as the optimal-strategy objective.
4. A follow-up procedure planning method for pulmonary nodule patients, characterized by comprising the following steps:
acquiring pulmonary nodule clinical information of a patient;
planning the individual's follow-up procedure using the deep Q-network trained by the deep reinforcement learning method of any one of claims 1 to 3;
and outputting the planned follow-up procedure.
5. A deep reinforcement learning device for planning the follow-up procedure of pulmonary nodule patients, characterized by comprising:
a deep Q-network module, configured to train a deep Q-network based on lung nodule morphology, follow-up examination behaviors and feedback scores;
a dynamic prediction and information filling module, configured to calculate future morphological features of the lung nodule through a dynamic predictor for predicting lung nodule morphological features, and to fill in the state information required for deciding the timing of follow-up examination behaviors;
the dynamic predictor comprises a mixed-effects model determined from a fixed-effect term, a random-effect term and a noise-induced error term, wherein the fixed-effect term relates to predictors acting identically on all patients and the random-effect term relates to between-patient differences;
wherein training the deep Q-network based on lung nodule morphology, follow-up examination behaviors and feedback scores comprises:
defining an optimal action-value function q*(s, a), where s is the starting state, i.e., the lung nodule morphology at the baseline examination, and a is an arbitrary action, i.e., a decision on the timing of the follow-up examination, e.g., 0 days for immediate definitive examination, a review in 3 months, or a review in 1 year; after taking a, an optimal strategy is continuously followed, the optimal strategy being the follow-up procedure that minimizes patient negative events; that is:
$$q^*(s,a)=\max_{\tau\in\pi}\mathbb{E}\left[R(\tau)\mid S_0=s,\ A_0=a\right];$$
where τ denotes a strategy, π denotes the strategy set, R(τ) denotes the expected feedback, S_0 denotes the initial state, and A_0 denotes the decision behavior of the initial follow-up examination timing;
the update rule of the Q function is:
$$Q(S_t,A_t)\leftarrow Q(S_t,A_t)+\alpha\left[R_t+\gamma\max_{A_{t+1}}Q(S_{t+1},A_{t+1})-Q(S_t,A_t)\right];$$
where S_t denotes the lung nodule morphological state at time point t and A_t the follow-up examination timing decision behavior at time point t; S_{t+1} and A_{t+1} denote the morphological state and the decision behavior at time point t+1; α denotes the learning rate; and R_t denotes the feedback score;
the dynamic predictor may be expressed as:
$$S_{ij}=\mu_{ij}+\varepsilon_{ij}=\alpha V_{ij}+\beta_i W_{ij}+\varepsilon_{ij}$$
where S_{ij} is the value of a state feature of patient i at time point j, μ_{ij} is the theoretical value of that feature, αV_{ij} and β_iW_{ij} denote the fixed-effect and random-effect terms respectively, and ε_{ij} is the error caused by image noise.
6. A pulmonary nodule patient follow-up procedure planning system, characterized by comprising:
the acquisition module is used for acquiring the clinical information of pulmonary nodules of the patient;
a planning module, configured to plan a follow-up procedure of an individual by using the deep Q network obtained through training in the deep reinforcement learning method according to any one of claims 1 to 3;
and the output module is used for outputting the planned follow-up procedure.
7. A storage medium, characterized in that a computer program is stored thereon, which, when executed by a processor, implements the method of any one of claims 1 to 4.
8. An electronic device, the electronic device comprising:
one or more processors; and
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method of any one of claims 1-4.
CN202210749794.6A 2022-06-29 2022-06-29 Deep reinforcement learning method and device, and pulmonary nodule patient follow-up procedure planning method, system, medium and equipment Active CN115116590B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210749794.6A CN115116590B (en) 2022-06-29 2022-06-29 Deep reinforcement learning method and device, and pulmonary nodule patient follow-up procedure planning method, system, medium and equipment

Publications (2)

Publication Number Publication Date
CN115116590A (en), published 2022-09-27
CN115116590B (en), granted 2023-04-07

Family

ID=83329996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210749794.6A Active CN115116590B (en) 2022-06-29 2022-06-29 Deep reinforcement learning method and device, and pulmonary nodule patient follow-up procedure planning method, system, medium and equipment

Country Status (1)

Country Link
CN (1) CN115116590B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110544526A (en) * 2019-09-06 2019-12-06 广州佰迈起生物科技有限公司 intelligent diagnosis guide processing method, system and server
CN111863119A (en) * 2020-05-28 2020-10-30 上海朴岱生物科技合伙企业(有限合伙) Adjoint diagnosis model based on PDC/PDX drug sensitivity experiment and multigroup chemical detection analysis and application
CN112771621A (en) * 2018-08-28 2021-05-07 皇家飞利浦有限公司 Selecting a treatment for a patient

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103187B (en) * 2017-04-10 2020-12-29 四川省肿瘤医院 Lung nodule detection grading and management method and system based on deep learning
CN110021430A (en) * 2019-04-09 2019-07-16 科大讯飞股份有限公司 A kind of the attribute information prediction technique and device of lesion
US11358004B2 (en) * 2019-05-10 2022-06-14 Duke University Systems and methods for radiation treatment planning based on a model of planning strategies knowledge including treatment planning states and actions
CN114388095A (en) * 2021-12-22 2022-04-22 中山大学 Sepsis treatment strategy optimization method, system, computer device and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112771621A (en) * 2018-08-28 2021-05-07 皇家飞利浦有限公司 Selecting a treatment for a patient
CN110544526A (en) * 2019-09-06 2019-12-06 广州佰迈起生物科技有限公司 intelligent diagnosis guide processing method, system and server
CN111863119A (en) * 2020-05-28 2020-10-30 上海朴岱生物科技合伙企业(有限合伙) Adjoint diagnosis model based on PDC/PDX drug sensitivity experiment and multigroup chemical detection analysis and application

Also Published As

Publication number Publication date
CN115116590A (en) 2022-09-27

Similar Documents

Publication Publication Date Title
Pathan et al. Analyzing the impact of feature selection on the accuracy of heart disease prediction
CN109659033B (en) Chronic disease state of an illness change event prediction device based on recurrent neural network
CN107103201B (en) Medical navigation path generation method and device and medical path navigation method
Gao et al. Chest X-ray image analysis and classification for COVID-19 pneumonia detection using Deep CNN
CN102405473A (en) A point-of-care enactive medical system and method
Mehmood et al. Systematic Framework to Predict Early‐Stage Liver Carcinoma Using Hybrid of Feature Selection Techniques and Regression Techniques
Ma et al. Using the shapes of clinical data trajectories to predict mortality in ICUs
Sirjani et al. Automatic cardiac evaluations using a deep video object segmentation network
CN110570425B (en) Pulmonary nodule analysis method and device based on deep reinforcement learning algorithm
Xiao et al. Boosting and rectifying few-shot learning prototype network for skin lesion classification based on the internet of medical things
Kaya Feature fusion-based ensemble CNN learning optimization for automated detection of pediatric pneumonia
Ghffar et al. Usefulness of semisupervised machine-learning-based phenogrouping to improve risk assessment for patients undergoing transcatheter aortic valve implantation
CN114942947A (en) Follow-up visit data processing method and system based on intelligent medical treatment
Kumar et al. Deep-learning-enabled multimodal data fusion for lung disease classification
Huang et al. Automatic surgery and anesthesia emergence Duration prediction using artificial neural Networks
Nayebi et al. WindowSHAP: An efficient framework for explaining time-series classifiers based on Shapley values
Yuan et al. Pulmonary arteries segmentation from CT images using PA‐Net with attention module and contour loss
CN115116590B (en) Deep reinforcement learning method and device, and pulmonary nodule patient follow-up procedure planning method, system, medium and equipment
Shen et al. Expert-guided knowledge distillation for semi-supervised vessel segmentation
CN115658877B (en) Medicine recommendation method and device based on reinforcement learning, electronic equipment and medium
CN115472257A (en) Method and device for recruiting users, electronic equipment and storage medium
Chen et al. Cross-Image siamese graph convolutional network for Fine-Grained image retrieval in diabetic retinopathy
CN115082742A (en) Training method and device for image classification model, electronic equipment and storage medium
CN114398497A (en) Information generation method, device, equipment and storage medium
CN115018616A (en) Credit investigation intelligent evaluation method and system based on batch-flow integrated computing engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant