CN117423426A - Method, device and equipment for constructing treatment plan dose prediction model - Google Patents


Publication number
CN117423426A
CN117423426A (application number CN202311280970.7A)
Authority
CN
China
Prior art keywords: matrix, module, result, attention, dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311280970.7A
Other languages
Chinese (zh)
Inventor
张云
龚长飞
袁星星
王少彬
陈颀
王小平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Cancer Hospital Jiangxi Second People's Hospital Jiangxi Cancer Center
Original Assignee
Jiangxi Cancer Hospital Jiangxi Second People's Hospital Jiangxi Cancer Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Cancer Hospital Jiangxi Second People's Hospital Jiangxi Cancer Center filed Critical Jiangxi Cancer Hospital Jiangxi Second People's Hospital Jiangxi Cancer Center
Priority to CN202311280970.7A priority Critical patent/CN117423426A/en
Publication of CN117423426A publication Critical patent/CN117423426A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage


Abstract

This application provides a method, device, and equipment for constructing a treatment plan dose prediction model. The dose prediction model is trained using artificial intelligence techniques, so that the trained model can detect and assess dose errors more accurately, with stronger objectivity and sensitivity than traditional dose verification methods and judgments based on physician experience. In addition, the dose prediction model uses a multi-head self-attention mechanism for feature extraction and processing, which strengthens the model's representation and generalization capacity and yields more accurate and effective prediction results.

Description

Method, device and equipment for constructing treatment plan dose prediction model
Technical Field
The application relates to a method, a device and equipment for constructing a treatment plan dose prediction model, and belongs to the technical field of radiotherapy and artificial intelligence.
Background
Precise radiation therapy is a continual goal in the field of radiotherapy: ensuring that the tumor region receives an accurate and sufficient radiation dose while preventing surrounding organs at risk from being over-irradiated. To achieve precise radiation treatment, it is important to perform a quality assurance (QA) assessment of the treatment plan before it is delivered. Pre-treatment dose verification is a key step in the treatment plan quality assurance process.
However, due to objective limitations, existing dose verification methods often fail to detect dose errors sensitively and/or effectively, and the quality of plan QA results is uneven, which has become a bottleneck in the development of precise radiation therapy.
Disclosure of Invention
This application provides a method, device, and equipment for constructing a treatment plan dose prediction model, to address the problems that existing dose verification methods cannot detect dose errors sensitively and/or effectively and that plan QA results are of uneven quality, which have become a bottleneck in the development of precise radiotherapy.
In order to achieve the above object, the present application provides the following technical solutions:
in a first aspect, embodiments of the present application provide a method for constructing a treatment plan dose prediction model, including:
acquiring training samples based on a treatment planning system and CT equipment, wherein the training samples comprise RTDose distribution data and CT image data;
performing tensor conversion processing on each training sample to obtain a training sample in a three-dimensional tensor form;
performing a preset number of training passes on the training samples in three-dimensional tensor form based on a pre-constructed neural network model, and saving the trained neural network model parameters to obtain a treatment plan dose prediction model;
the neural network model comprises an encoder and a decoder, the encoder is used for extracting three-dimensional features and three-dimensional dose distribution information of a CT image, the decoder is used for realizing regression fit of the three-dimensional features to the three-dimensional dose distribution, and the training process comprises the following steps:
performing 2× downsampling on the training samples in three-dimensional tensor form, and feature-encoding the downsampled result with a CNN-Transformer hybrid encoder to obtain an encoding result; the CNN-Transformer hybrid encoder comprises encoding modules and a Transformer module, where each encoding module comprises several convolutional encoding blocks and residual blocks, the convolutional encoding blocks extracting features from the input training samples and the residual blocks reducing errors in the feature extraction process; the processing result of the encoding module is patch-embedded and then input to the Transformer module in vector form; the Transformer module adopts a multi-head self-attention mechanism and comprises several self-attention modules; in each self-attention module, the input vector is converted into a query matrix, a key matrix, and a value matrix; a similarity matrix between the query matrix and the key matrix is obtained by multiplying the query matrix with the transpose of the key matrix; the similarity matrix is normalized with a softmax function to obtain a weight matrix; and the weight matrix is multiplied by the value matrix to obtain the attention of the input vector, with the formula:

Attention(Q, K, V) = softmax(QK^T/√d_k)V

where Attention(Q, K, V) denotes the attention of the input vector, Q the query matrix, K the key matrix, V the value matrix, and d_k the dimension of the query matrix Q or key matrix K;
decoding the encoding result with a decoder to obtain a prediction result; the decoder comprises several sequentially connected decoding modules, each of which performs 2× upsampling on the output of the preceding module and splices in the corresponding encoder features, until the output is restored to the pre-encoding size, yielding the dose prediction result;
calculating L1 loss based on the obtained prediction result and the true value to obtain model loss;
based on the model loss, a back propagation algorithm is used to update the gradient of the neural network model.
Based on the above method, optionally, the patch embedding process includes:
dividing an input image into a plurality of non-overlapping small blocks, and flattening the small blocks to obtain a vector sequence;
and applying a convolutional projection to the vector sequence to tokenize it, obtaining the patch embedding result.
Based on the above method, optionally, the multi-head self-attention mechanism includes several processing stages; in each stage, the input tokens are first reshaped into a two-dimensional token map, a convolutional projection is then applied via a depthwise separable convolution layer, and the projection result is flattened to obtain the corresponding query, key, and value matrices.
Based on the above method, optionally, the CNN-Transformer hybrid encoder includes 3 encoding modules and 1 Transformer module.
Based on the above method, optionally, the outputs of all convolution layers in the encoder and the decoder are processed by batch normalization and a rectified linear unit.
Based on the above method, optionally, the Transformer encoder includes L layers of the multi-head self-attention mechanism and multi-layer perceptron, and the output of the l-th layer is:

Z'_l = MSA(LN(Z_{l-1})) + Z_{l-1}
Z_l = MLP(LN(Z'_l)) + Z'_l

where Z'_l denotes the output of the l-th layer's multi-head self-attention mechanism, LN the layer normalization operator, MSA the processing of the multi-head self-attention mechanism, Z_l the output of the l-th layer, and MLP the processing of the multi-layer perceptron.
In a second aspect, embodiments of the present application further provide a device for constructing a treatment plan dose prediction model, including:
the acquisition module is used for acquiring training samples based on the treatment planning system and the CT equipment, wherein the training samples comprise RTDose distribution data and CT image data;
the conversion module is used for carrying out tensor conversion processing on each training sample to obtain a training sample in a three-dimensional tensor form;
the training module is used for performing a preset number of training passes on the training samples in three-dimensional tensor form based on a pre-constructed neural network model, and saving the trained neural network model parameters to obtain a treatment plan dose prediction model;
the neural network model comprises an encoder and a decoder, the encoder is used for extracting three-dimensional features and three-dimensional dose distribution information of a CT image, the decoder is used for realizing regression fit of the three-dimensional features to the three-dimensional dose distribution, and the training process comprises the following steps:
performing 2× downsampling on the training samples in three-dimensional tensor form, and feature-encoding the downsampled result with a CNN-Transformer hybrid encoder to obtain an encoding result; the CNN-Transformer hybrid encoder comprises encoding modules and a Transformer module, where each encoding module comprises several convolutional encoding blocks and residual blocks, the convolutional encoding blocks extracting features from the input training samples and the residual blocks reducing errors in the feature extraction process; the processing result of the encoding module is patch-embedded and then input to the Transformer module in vector form; the Transformer module adopts a multi-head self-attention mechanism and comprises several self-attention modules; in each self-attention module, the input vector is converted into a query matrix, a key matrix, and a value matrix; a similarity matrix between the query matrix and the key matrix is obtained by multiplying the query matrix with the transpose of the key matrix; the similarity matrix is normalized with a softmax function to obtain a weight matrix; and the weight matrix is multiplied by the value matrix to obtain the attention of the input vector, with the formula:

Attention(Q, K, V) = softmax(QK^T/√d_k)V

where Attention(Q, K, V) denotes the attention of the input vector, Q the query matrix, K the key matrix, V the value matrix, and d_k the dimension of the query matrix Q or key matrix K;
decoding the encoding result with a decoder to obtain a prediction result; the decoder comprises several sequentially connected decoding modules, each of which performs 2× upsampling on the output of the preceding module and splices in the corresponding encoder features, until the output is restored to the pre-encoding size, yielding the dose prediction result;
calculating L1 loss based on the obtained prediction result and the true value to obtain model loss;
based on the model loss, a back propagation algorithm is used to update the gradient of the neural network model.
In a third aspect, an embodiment of the present application further provides an electronic device including a memory and a processor, where the memory stores a computer program and the processor invokes and executes the computer program to implement the method for constructing a treatment plan dose prediction model according to any one of the first aspect.
In the method, device, and equipment for constructing a treatment plan dose prediction model provided by this application, the dose prediction model is trained using artificial intelligence techniques, so that the trained model can detect and assess dose errors more accurately, with stronger objectivity and sensitivity than traditional dose verification methods and judgments based on physician experience. In addition, the dose prediction model uses a multi-head self-attention mechanism for feature extraction and processing, which strengthens the model's representation and generalization capacity and yields more accurate and effective prediction results.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. Furthermore, these drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
FIG. 1 is a flow chart of a method of constructing a treatment plan dose prediction model according to one embodiment of the present application;
FIG. 2 is a schematic structural diagram of a neural network model according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a Transformer module according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a self-attention module according to one embodiment of the present application;
FIG. 5 is a schematic structural diagram of a device for constructing a treatment plan dose prediction model according to one embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below in conjunction with those embodiments. The described embodiments are evidently some, but not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art from the present disclosure without inventive effort fall within the scope of the present disclosure. The following embodiments and their features may be combined with each other where no conflict arises.
Precise radiation therapy is a continual goal in the field of radiotherapy: ensuring that the tumor region receives an accurate and sufficient radiation dose while preventing surrounding organs at risk from being over-irradiated. Emerging precision radiotherapy techniques such as intensity-modulated radiation therapy (IMRT) achieve this well in terms of dose distribution. However, because of the high conformality of IMRT and the complexity of its planning process, and particularly with the advent of volumetric modulated arc therapy (VMAT), plans involve more dynamically changing parameters during design and execution, increasing the uncertainty of plan delivery. Pre-treatment dose verification has therefore become a key step in patient-specific plan quality assurance (QA) for precision radiotherapy techniques such as IMRT.
Because of individual differences among patients, plan complexity and plan quality vary from patient to patient, and plans produce dose deviations with different characteristics in clinical practice. Ideally, patient-specific IMRT plan QA should ensure that, for each patient, the difference between the clinically delivered dose and the planned dose meets the relevant clinical tolerance criteria. However, due to objective limitations, existing dose verification methods often fail to detect dose errors sensitively and/or effectively, and plan QA results are of uneven quality, creating a bottleneck in the development of precise radiation therapy. The main reasons are as follows. First, the two principal evaluation indicators of current dose verification, the gamma index and the dose-volume histogram (DVH), both have significant shortcomings. Second, the clinical deliverability of planning parameters is unknown. Because plan optimization is an opaque process, when dose cold spots or hot spots are found during dose verification and the plan parameters must be readjusted, error sources are usually excluded step by step through manual operation and calculation. This approach depends heavily on the physicist's experience, and each dose verification occupies an accelerator, reducing the number of patients each accelerator can treat, increasing the physicist's workload, and lowering overall radiotherapy efficiency.
Beyond these two factors, related studies also show that dose verification procedures differ across hospitals and physicists, the process is complex, and the time and cost spent on a single treatment plan vary, all of which ultimately affects the quality of dose verification and leads to the current clinical situation of uneven treatment plan QA quality across patients.
Artificial intelligence (AI) is a technical science concerned with the theory, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. With the development of medicine and the wide adoption of digitization in recent years, traditional statistical analysis alone is no longer sufficient to reveal the latent patterns and complex associations in clinical big data. Artificial intelligence greatly relaxes the limitations of traditional analysis methods and makes large-scale, comprehensive analysis of clinical big data possible. The medical field currently applies AI to radiotherapy plan optimization, dose prediction, big-data prediction of clinical outcomes, and other areas, with exciting results.
On this basis, this application provides a scheme for constructing a treatment plan dose prediction model: by means of artificial intelligence and the clinical data of precision radiotherapy treatment plans of tumor patients, a new objective dose prediction model is developed to improve the accuracy and objectivity of current dose verification. Specific implementations are described below, without limitation, through several examples or embodiments.
Some embodiments of the present application provide a method for constructing a treatment plan dose prediction model; see fig. 1, which is a flow chart of the method according to one embodiment of the present application. In practice, the solution of this embodiment may be implemented as a software system configured in an electronic device such as a computer or a server.
As shown in fig. 1, the method for constructing a treatment plan dose prediction model of the present embodiment includes the following steps:
step S101: training samples are acquired based on the treatment planning system and the CT device. Wherein, the training sample comprises RTDose distribution data and CT image data.
Specifically, RTDose (radiotherapy dose) is a DICOM (Digital Imaging and Communications in Medicine) object commonly produced by radiation therapy treatment planning systems (e.g., Pinnacle or Monaco). An RTDose file records the distribution of the radiation dose the patient receives during irradiation and is typically represented as a three-dimensional voxel matrix in which each voxel holds a received dose value. The CT image data are the patient images acquired by the CT device.
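As a hedged illustration of this format, the following sketch converts an RTDose dataset (such as `pydicom.dcmread` would return) into a physical dose grid; the attribute names `pixel_array` and `DoseGridScaling` follow the DICOM RT Dose object, while the function name is our own.

```python
import numpy as np

def dose_grid(ds):
    """Return the dose grid in Gy from an RTDose dataset-like object.

    DICOM stores dose as integer pixel data plus a DoseGridScaling
    factor; multiplying the two recovers the physical dose values on
    the (frames x rows x columns) voxel grid described in the text.
    """
    return np.asarray(ds.pixel_array, dtype=np.float32) * float(ds.DoseGridScaling)
```

In practice `ds` would come from `pydicom.dcmread(path)` on an RTDose file exported by the planning system.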
Step S102: and performing tensor conversion processing on each training sample to obtain the training sample in a three-dimensional tensor form.
Specifically, the process may be expressed as:
X_n = H(x_n), n ∈ {1, 2, ..., N}

where x_n denotes a single sample of the input training data, N the number of samples, and H(·) the tensor conversion process.
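A minimal sketch of the tensor conversion H(·), assuming PyTorch as the training framework (the function name is illustrative):

```python
import numpy as np
import torch

def to_tensor(sample):
    """H(.) above: convert one training sample (a 3-D numpy volume,
    e.g. a CT or RTDose array) into a float32 torch tensor with a
    leading channel axis, ready for 3-D convolutions."""
    arr = np.asarray(sample, dtype=np.float32)
    return torch.from_numpy(arr).unsqueeze(0)  # shape (1, D, H, W)
```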
Step S103: based on a pre-constructed neural network model, executing a training process for a preset number of times on a training sample in a three-dimensional tensor form, and storing parameters of the trained neural network model to obtain a treatment plan dose prediction model.
Specifically, the pre-constructed neural network model comprises an encoder and a decoder; the encoder extracts three-dimensional features of the CT image and three-dimensional dose distribution information, and the decoder performs the regression fit of the three-dimensional features to the three-dimensional dose distribution. The encoder is a CNN-Transformer hybrid encoder which, as shown in fig. 2, includes 3 encoding modules (Encoding blocks in fig. 2) and 1 Transformer module (Transformer block in fig. 2), and the decoder includes 3 decoding modules (Decoding blocks in fig. 2).
Training the pre-constructed neural network model with the training samples comprises the following steps:
step S1031: and carrying out 2 times downsampling on the training samples in the three-dimensional tensor form, and carrying out feature coding on the downsampled result by utilizing a CNN-converter hybrid coder to obtain a coding result.
As shown in fig. 2, a Downsampling module performs the 2× downsampling to reduce the dimensionality of the input image. The result is fed to the first encoding module for feature extraction, that output to the second encoding module for further extraction, and that output to the third encoding module, giving the final encoding result. The processing of each encoding module is effectively also a downsampling step, gradually converting the high-resolution input image into a medium/low-resolution one.
Further, each encoding module comprises several convolutional encoding blocks, i.e., processing modules based on a convolutional neural network (CNN), and residual blocks (Class Residual). The convolutional encoding blocks extract features from the input training samples, and the residual blocks reduce errors in the feature extraction process. As a deep network grows deeper, the training error grows and, in severe cases, the gradient vanishes; residual blocks are introduced in this embodiment to address this problem. Their principle is prior art and is not repeated here.
The processing result of the encoding module is patch-embedded and then input to the Transformer module in vector form. Patch embedding comprises: dividing the input image into several non-overlapping patches and flattening them into a one-dimensional vector sequence; then applying a convolutional projection to tokenize the vector sequence, giving the patch embedding result. This can be expressed as:

Z_0 = [x_p^1 E; x_p^2 E; ...; x_p^M E]

where D denotes the processing result of the encoding module, i.e., the low-resolution image, with height h and width w; x_p^i denotes the i-th patch, each patch having resolution (p, p); M denotes the number of patches, M = hw/p^2; E denotes the convolutional projection in the patch embedding process; and Z_0 denotes the tokenized result.
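The patch-splitting step can be sketched in NumPy; this illustrates only the division into non-overlapping (p, p) patches and the flattening, with the learned convolutional projection E omitted:

```python
import numpy as np

def split_patches(img, p):
    """Split an (h, w) feature map into M = h*w / p**2 non-overlapping
    p x p patches and flatten each into a length-p*p vector."""
    h, w = img.shape
    assert h % p == 0 and w % p == 0
    return (img.reshape(h // p, p, w // p, p)
               .transpose(0, 2, 1, 3)
               .reshape(-1, p * p))  # (M, p*p) vector sequence
```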
As shown in fig. 3, the Transformer module adopts a multi-head self-attention (MSA) mechanism and comprises several self-attention modules. As shown in fig. 4, in each self-attention module the input vector is converted into a query matrix Q, a key matrix K, and a value matrix V; the similarity matrix between Q and K is obtained by multiplying Q with the transpose of K; the similarity matrix is normalized with a softmax function to obtain a weight matrix; and the weight matrix is multiplied by the value matrix V to obtain the attention of the input vector, with the formula:

Attention(Q, K, V) = softmax(QK^T/√d_k)V

where Attention(Q, K, V) denotes the attention of the input vector, Q the query matrix, K the key matrix, V the value matrix, and d_k the dimension of the query matrix Q or key matrix K.
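The scaled dot-product attention described here can be written directly in NumPy (a sketch for a single head, without batching):

```python
import numpy as np

def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity matrix
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> weight matrix
    return weights @ V                              # weights times value matrix
```

When all keys are identical the weights are uniform, so the output is simply the mean of the value rows.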
The final output of the multi-head self-attention mechanism MSA can be expressed as:

MSA(Q, K, V) = Concat(head_1, ..., head_h)W_O
head_i = Attention(X_i W_i^Q, X_i W_i^K, X_i W_i^V)

where head_i denotes the output of the i-th self-attention module; W_i^Q, W_i^K, and W_i^V denote the linear transformation matrices of the i-th self-attention module; X_i denotes the input vector; and W_O denotes the output linear transformation matrix.
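The multi-head combination can be sketched as follows, with the projection matrices passed in explicitly (the argument names mirror the symbols above; this is an illustration, not the patent's exact implementation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(X, W_q, W_k, W_v, W_o):
    """MSA(X) = Concat(head_1, ..., head_h) W_O, each head being
    scaled dot-product attention over its own Q/K/V projections."""
    heads = []
    for Wq, Wk, Wv in zip(W_q, W_k, W_v):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # per-head weight matrix
        heads.append(A @ V)
    return np.concatenate(heads, axis=-1) @ W_o      # concat, then project
```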
In some embodiments, the multi-head self-attention mechanism includes several processing stages. In each stage, the input tokens are first reshaped into a two-dimensional token map, a convolutional projection is then applied via a depthwise separable convolution layer, and the projection result is flattened to obtain the corresponding query, key, and value matrices. The process can be expressed as:

x_{Q/K/V} = Flatten(Conv2d(Reshape2D(x_i), s))

where x_{Q/K/V} denotes the Q/K/V input of the i-th layer, Reshape2D the two-dimensional token mapping operation, Conv2d the convolutional projection based on the depthwise separable convolution layer, s the convolution kernel size, and Flatten the flattening operation.
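In PyTorch, the token to 2-D map to depthwise-separable convolution to flatten pipeline can be sketched as follows (module and argument names are illustrative):

```python
import torch
import torch.nn as nn

class ConvProjection(nn.Module):
    """Reshape a token sequence to a 2-D map, apply a depthwise-separable
    convolutional projection, and flatten back to tokens, as described
    for the Q/K/V projections above."""
    def __init__(self, dim, s=3):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, s, padding=s // 2, groups=dim)  # depthwise
        self.pw = nn.Conv2d(dim, dim, 1)                              # pointwise
    def forward(self, tokens, h, w):
        b, m, c = tokens.shape                          # (batch, M tokens, dim)
        x = tokens.transpose(1, 2).reshape(b, c, h, w)  # Reshape2D
        x = self.pw(self.dw(x))                         # Conv2d projection
        return x.flatten(2).transpose(1, 2)             # Flatten back to tokens
```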
Further, the Transformer encoder comprises L layers of the multi-head self-attention mechanism and multi-layer perceptron, and the output of the l-th layer is:

Z'_l = MSA(LN(Z_{l-1})) + Z_{l-1}
Z_l = MLP(LN(Z'_l)) + Z'_l

where Z'_l denotes the output of the l-th layer's multi-head self-attention mechanism, LN the layer normalization operator, MSA the processing of the multi-head self-attention mechanism, Z_l the output of the l-th layer, and MLP the processing of the multi-layer perceptron.
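One encoder layer with the pre-norm residual structure of the two equations above can be sketched in PyTorch (the head count and hidden sizes are illustrative):

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """Z'_l = MSA(LN(Z_{l-1})) + Z_{l-1};  Z_l = MLP(LN(Z'_l)) + Z'_l."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
    def forward(self, z):
        y = self.ln1(z)
        z = self.msa(y, y, y, need_weights=False)[0] + z  # Z'_l
        return self.mlp(self.ln2(z)) + z                  # Z_l
```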
Step S1032: decoding the encoding result by using a decoder to obtain a prediction result. The decoder comprises a plurality of sequentially connected decoding modules; each decoding module performs 2× up-sampling on the output of the previous module and splices the corresponding features, until the output result is restored to its pre-encoding size, so as to obtain the dose prediction result.
In particular, the decoding module may be implemented using a deconvolution network. As shown in fig. 2, the three decoding modules sequentially perform up-sampling operations to regression-fit the feature codes output by the encoding modules. Finally, a 2× up-sampling through an Upsampling module yields a dose prediction result (PDose) with the same size as the original image.
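One decoding step — 2× up-sampling followed by splicing the corresponding encoder features — can be sketched as follows (nearest-neighbour up-sampling stands in for the deconvolution here; shapes are illustrative):

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x up-sampling of an (H, W, C) feature map.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def decode_block(x, skip):
    # Up-sample the previous module's output and splice (concatenate)
    # the matching encoder features along the channel axis.
    up = upsample2x(x)
    return np.concatenate([up, skip], axis=-1)
```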
Furthermore, in some embodiments, the processing results of all convolution layers in the encoder and decoder are processed by batch normalization (Batch Normalization, BN) and a rectified linear unit (Rectified Linear Unit, ReLU), which can accelerate the convergence of the neural network during training.
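A minimal sketch of the BN + ReLU post-processing (batch statistics computed on the fly; the gamma and beta defaults are illustrative):

```python
import numpy as np

def bn_relu(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Batch normalization over the batch axis, then a rectified linear unit.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)        # zero mean, unit variance per feature
    return np.maximum(gamma * x_hat + beta, 0.0)  # ReLU clips negatives to zero
```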
Step S1033: calculating the L1 loss based on the obtained prediction result and the true value to obtain the model loss.
Wherein, the true value refers to the actually measured dose distribution (MDose) corresponding to the training sample. The L1 loss, i.e., the mean absolute error (MAE) loss, measures the deviation of the prediction result from the true value; the smaller the deviation, the better the performance of the model.
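The L1 (MAE) loss is a one-liner; the PDose/MDose naming follows the description above:

```python
import numpy as np

def l1_loss(pred, target):
    # Mean absolute error between the predicted dose (PDose)
    # and the measured dose distribution (MDose).
    return np.abs(pred - target).mean()
```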
Step S1034: based on the model loss, a back propagation algorithm is used to update the gradient of the neural network model.
Specifically, the back propagation (BP) algorithm is a learning algorithm suitable for multi-layer neural networks and is based on the gradient descent method. The input-output relationship of a BP network is essentially a mapping: an n-input, m-output BP neural network performs a continuous mapping from n-dimensional Euclidean space to a finite field in m-dimensional Euclidean space, and this mapping is highly nonlinear. Its information processing capability derives from the repeated composition of simple nonlinear functions, which gives it a strong function approximation capability. By adopting the BP algorithm, the gradients of the neural network model can be computed from the model loss and the parameters updated accordingly, so that the model loss is gradually reduced.
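Gradient descent under the L1 loss can be illustrated on a toy one-parameter linear model (a sketch of the general idea only, not the patented training procedure):

```python
import numpy as np

def train_step(w, x, y, lr=0.1):
    # One gradient-descent update of the linear model pred = w * x under the L1 loss.
    pred = w * x
    grad = np.mean(np.sign(pred - y) * x)   # subgradient of mean |w*x - y| w.r.t. w
    return w - lr * grad

x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x          # "true values" generated by w = 2
w = 0.0              # initial parameter
for _ in range(20):  # repeated updates drive the loss down
    w = train_step(w, x, y)
loss = np.abs(w * x - y).mean()
```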
Based on the schemes of the embodiments, the dose prediction model is trained by using an artificial intelligence technology, so that more accurate detection and assessment of dose errors can be realized based on the trained dose prediction model, and compared with a traditional dose verification method and doctor experience judgment, the dose prediction model has stronger objectivity and sensitivity. In addition, the multi-head self-attention mechanism is adopted in the dose prediction model for feature extraction and processing, so that the representation capacity and generalization capacity of the model can be enhanced, and the obtained prediction result is more accurate and effective.
In addition, an embodiment of the present application provides a device for constructing a treatment plan dose prediction model, referring to fig. 5, the device for constructing a treatment plan dose prediction model includes:
an acquisition module 51 for acquiring training samples based on the treatment planning system and the CT apparatus, the training samples including RTDose distribution data and CT image data;
the conversion module 52 is configured to perform tensor conversion on each training sample to obtain a training sample in a three-dimensional tensor form;
the training module 53 is configured to perform a training process on the training sample in the three-dimensional tensor form for a preset number of times based on a pre-constructed neural network model, and save parameters of the trained neural network model to obtain a treatment plan dose prediction model.
For the specific implementation method of each module of the above-mentioned construction device of the treatment plan dose prediction model, reference may be made to the corresponding content in the foregoing method embodiment, which is not described herein again.
Further, the present embodiment provides an electronic device, as shown in fig. 6, including a memory 61 and a processor 62; the memory 61 stores a computer program, and the processor 62 invokes and executes the computer program to implement the method for constructing a treatment plan dose prediction model according to any of the above embodiments.
The electronic device may be a desktop computer, a notebook computer, a server, or the like.
It is to be understood that the same or similar parts in the above embodiments may be referred to each other, and that in some embodiments, the same or similar parts in other embodiments may be referred to.
It should be noted that in the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present invention, unless otherwise indicated, the meaning of "plurality" means at least two.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and the like.
Those of ordinary skill in the art will appreciate that all or part of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, and the program may be stored in a computer readable storage medium, where the program when executed includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented as software functional modules and sold or used as a stand-alone product. The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims (8)

1. A method of constructing a treatment plan dose prediction model, comprising:
acquiring training samples based on a treatment planning system and CT equipment, wherein the training samples comprise RTDose distribution data and CT image data;
performing tensor conversion processing on each training sample to obtain a training sample in a three-dimensional tensor form;
based on a pre-constructed neural network model, executing a training process for preset times on the training sample in the form of the three-dimensional tensor, and storing the parameters of the trained neural network model to obtain a treatment plan dose prediction model;
the neural network model comprises an encoder and a decoder, the encoder is used for extracting three-dimensional features and three-dimensional dose distribution information of a CT image, the decoder is used for realizing regression fit of the three-dimensional features to the three-dimensional dose distribution, and the training process comprises the following steps:
performing 2 times downsampling on the training sample in the three-dimensional tensor form, and performing feature coding on the downsampled result by using a CNN-Transformer hybrid encoder to obtain a coding result; the CNN-Transformer hybrid encoder comprises an encoding module and a Transformer module, wherein the encoding module comprises a plurality of convolution encoding blocks and residual blocks, the convolution encoding blocks are used for extracting characteristics of input training samples, and the residual blocks are used for reducing errors in the characteristic extraction process; the processing result of the encoding module is subjected to patch embedding processing and then is input into the Transformer module in a vector form, and the Transformer module adopts a multi-head self-attention mechanism and comprises a plurality of self-attention modules; in each self-attention module, an input vector is converted into a query matrix, a key value matrix and a value matrix, a similarity matrix between the query matrix and the key value matrix is obtained by multiplying the query matrix by the transpose of the key value matrix, the similarity matrix is normalized by using a softmax function to obtain a weight matrix, and the weight matrix is multiplied by the value matrix to obtain the attention of the input vector, wherein the formula is as follows:

Attention(Q, K, V) = softmax(QK^T / √d_k) · V

where Attention(Q, K, V) represents the attention of the input vector, Q represents the query matrix, K represents the key value matrix, V represents the value matrix, and d_k represents the dimension of the query matrix Q or the key value matrix K;
decoding the coding result by using a decoder to obtain a prediction result; the decoder comprises a plurality of decoding modules which are sequentially connected, each decoding module sequentially performs up-sampling for 2 times on the output of the previous module, and corresponding features are spliced until the output result is restored to the size before encoding, so that a dosage prediction result is obtained;
calculating L1 loss based on the obtained prediction result and the true value to obtain model loss;
based on the model loss, a back propagation algorithm is used to update the gradient of the neural network model.
2. The method of claim 1, wherein the patch embedding process comprises:
dividing an input image into a plurality of non-overlapping small blocks, and flattening the small blocks to obtain a vector sequence;
and carrying out convolution projection on the vector sequence to realize tokenization of the vector sequence, thereby obtaining a patch embedding processing result.
3. The method of claim 2, wherein the multi-headed self-attention mechanism comprises a plurality of processing stages, each processing stage first reshaping an input token into a two-dimensional token map, then performing convolution projection based on a depth separable convolution layer, and then flattening the projection result to obtain a corresponding query matrix, key value matrix and value matrix.
4. The method of claim 1, wherein the CNN-Transformer hybrid encoder comprises 3 encoding modules and 1 Transformer module.
5. The method of claim 1, wherein the processing results of all convolution layers in the encoder and the decoder are processed by batch normalization and a rectified linear unit.
6. The method of claim 1, wherein the Transformer encoder comprises L layers of a multi-head self-attention mechanism and a multi-layer perceptron, the output of the l-th layer being:
Z'_l = MSA(LN(Z_{l-1})) + Z_{l-1}

Z_l = MLP(LN(Z'_l)) + Z'_l
wherein Z'_l represents the output of the multi-head self-attention mechanism of the l-th layer, LN represents the layer normalization operator, MSA represents the processing procedure of the multi-head self-attention mechanism, and MLP represents the processing procedure of the multi-layer perceptron.
7. A treatment plan dose prediction model constructing apparatus, comprising:
the acquisition module is used for acquiring training samples based on the treatment planning system and the CT equipment, wherein the training samples comprise RTDose distribution data and CT image data;
the conversion module is used for carrying out tensor conversion processing on each training sample to obtain a training sample in a three-dimensional tensor form;
the training module is used for executing a training process for the preset times on the training sample in the three-dimensional tensor form based on a pre-constructed neural network model, and storing the parameters of the trained neural network model to obtain a treatment plan dose prediction model;
the neural network model comprises an encoder and a decoder, the encoder is used for extracting three-dimensional features and three-dimensional dose distribution information of a CT image, the decoder is used for realizing regression fit of the three-dimensional features to the three-dimensional dose distribution, and the training process comprises the following steps:
performing 2 times downsampling on the training sample in the three-dimensional tensor form, and performing feature coding on the downsampled result by using a CNN-Transformer hybrid encoder to obtain a coding result; the CNN-Transformer hybrid encoder comprises an encoding module and a Transformer module, wherein the encoding module comprises a plurality of convolution encoding blocks and residual blocks, the convolution encoding blocks are used for extracting characteristics of input training samples, and the residual blocks are used for reducing errors in the characteristic extraction process; the processing result of the encoding module is subjected to patch embedding processing and then is input into the Transformer module in a vector form, and the Transformer module adopts a multi-head self-attention mechanism and comprises a plurality of self-attention modules; in each self-attention module, an input vector is converted into a query matrix, a key value matrix and a value matrix, a similarity matrix between the query matrix and the key value matrix is obtained by multiplying the query matrix by the transpose of the key value matrix, the similarity matrix is normalized by using a softmax function to obtain a weight matrix, and the weight matrix is multiplied by the value matrix to obtain the attention of the input vector, wherein the formula is as follows:

Attention(Q, K, V) = softmax(QK^T / √d_k) · V

where Attention(Q, K, V) represents the attention of the input vector, Q represents the query matrix, K represents the key value matrix, V represents the value matrix, and d_k represents the dimension of the query matrix Q or the key value matrix K;
decoding the coding result by using a decoder to obtain a prediction result; the decoder comprises a plurality of decoding modules which are sequentially connected, each decoding module sequentially performs up-sampling for 2 times on the output of the previous module, and corresponding features are spliced until the output result is restored to the size before encoding, so that a dosage prediction result is obtained;
calculating L1 loss based on the obtained prediction result and the true value to obtain model loss;
based on the model loss, a back propagation algorithm is used to update the gradient of the neural network model.
8. An electronic device comprising a memory and a processor, the memory storing a computer program, the processor implementing a method of constructing a treatment plan dose prediction model as claimed in any one of claims 1 to 6 when the computer program is invoked and executed.
CN202311280970.7A 2023-09-21 2023-09-21 Method, device and equipment for constructing treatment plan dose prediction model Pending CN117423426A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311280970.7A CN117423426A (en) 2023-09-21 2023-09-21 Method, device and equipment for constructing treatment plan dose prediction model

Publications (1)

Publication Number Publication Date
CN117423426A true CN117423426A (en) 2024-01-19



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination