CN117437365B - Medical three-dimensional model generation method and device, electronic equipment and storage medium
- Publication number: CN117437365B (application CN202311756147.9A)
- Authority: CN (China)
- Prior art keywords: medical, dimensional, model, image, text
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three-dimensional [3D] modelling, e.g. data description of 3D objects
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/04815—Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
Abstract
The present application provides a method and an apparatus for generating a medical three-dimensional model, an electronic device, and a storage medium, and relates to the field of computer technologies. The method for generating a medical three-dimensional model comprises the following steps: acquiring medical text in response to an input operation in an interactive interface; invoking a first generation network and, under the guidance of the medical text, learning a first tensor input into the first generation network to obtain a medical two-dimensional image conforming to the description of the medical text, the first generation network being a trained deep learning model with the capability of generating medical two-dimensional images from medical text; inputting the medical two-dimensional image into a second generation network to generate a medical three-dimensional model, the second generation network being a trained deep learning model with the capability of generating medical three-dimensional models from medical two-dimensional images; and displaying the medical three-dimensional model generated by the second generation network in the interactive interface. The method and apparatus address the poor realism of medical three-dimensional models in the related art.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for generating a medical three-dimensional model, an electronic device, and a storage medium.
Background
Medical training and medical teaching currently require large numbers of medical three-dimensional models, which are usually created manually by three-dimensional model content companies. This not only limits the realism of the virtual content, but also consumes a great deal of time in content editing, and the resulting models cannot truly reflect the changes a patient undergoes in different situations and states.
In view of the above, how to improve the realism of medical three-dimensional models remains an open problem.
Disclosure of Invention
The embodiments of the present application provide a method, an apparatus, an electronic device, and a storage medium for generating a medical three-dimensional model, which can address the poor realism of medical three-dimensional models in the related art. The technical solution is as follows:
According to one aspect of the present application, a method of generating a medical three-dimensional model includes: acquiring medical text in response to an input operation in an interactive interface; invoking a first generation network and, under the guidance of the medical text, learning a first tensor input into the first generation network to obtain a medical two-dimensional image conforming to the description of the medical text, the first generation network being a trained deep learning model with the capability of generating medical two-dimensional images from medical text; inputting the medical two-dimensional image into a second generation network to generate a medical three-dimensional model, the second generation network being a trained deep learning model with the capability of generating medical three-dimensional models from medical two-dimensional images; and displaying the medical three-dimensional model generated by the second generation network in the interactive interface.
According to one aspect of the present application, a medical three-dimensional model generating apparatus includes: a text acquisition module for acquiring medical text in response to an input operation in an interactive interface; an image generation module for invoking a first generation network and, under the guidance of the medical text, learning a first tensor input into the first generation network to obtain a medical two-dimensional image conforming to the description of the medical text, the first generation network being a trained deep learning model with the capability of generating medical two-dimensional images from medical text; a model generation module for inputting the medical two-dimensional image into a second generation network to generate a medical three-dimensional model, the second generation network being a trained deep learning model with the capability of generating medical three-dimensional models from medical two-dimensional images; and a model display module for displaying the medical three-dimensional model generated by the second generation network in the interactive interface.
According to one aspect of the present application, an electronic device comprises at least one processor and at least one memory having computer readable instructions stored thereon; the computer readable instructions are executed by one or more of the processors to cause the electronic device to implement the method of generating a medical three-dimensional model as described above.
According to one aspect of the present application, a storage medium has computer readable instructions stored thereon that are executed by one or more processors to implement the method of generating a medical three-dimensional model as described above.
According to one aspect of the present application, a computer program product includes computer readable instructions stored in a storage medium; one or more processors of an electronic device read the computer readable instructions from the storage medium and load and execute them, causing the electronic device to implement the method of generating a medical three-dimensional model as described above.
The technical solutions provided by the present application bring the following beneficial effects:
In the above technical solutions, after the medical text is obtained, a first generation network with the capability of generating medical two-dimensional images from medical text can be invoked so that, under the guidance of the medical text, the first tensor input into the first generation network is learned to obtain a medical two-dimensional image conforming to the description of the medical text. The medical two-dimensional image is then input into a second generation network with the capability of generating medical three-dimensional models from medical two-dimensional images, and the medical three-dimensional model thus generated is finally displayed in the interactive interface. On the one hand, the two generation networks automatically and quickly generate the medical three-dimensional models required for medical training or medical teaching; on the other hand, simple medical text entered in the interactive interface according to the actual requirements of different medical training or teaching tasks can guide the generated medical three-dimensional model to truly reflect a patient's changes in different situations and states, thereby effectively addressing the poor realism of medical three-dimensional models in the related art.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic illustration of an implementation environment in accordance with the teachings of the present application;
FIG. 2 is a flowchart illustrating a method of generating a medical three-dimensional model, according to an exemplary embodiment;
FIG. 3 is a flowchart illustrating a first generation network generation process from medical text to medical two-dimensional images, according to an example embodiment;
FIG. 4 is a schematic diagram of a medical text-to-medical three-dimensional model shown according to an exemplary embodiment;
FIG. 5 is a flowchart illustrating a process of constructing a first training set and a second training set, according to an example embodiment;
FIG. 6 is a schematic diagram of a first training set and a second training set in the corresponding embodiment of FIG. 5;
FIG. 7 is a detailed interactive schematic diagram of a method for generating a medical three-dimensional model in an application scenario;
FIG. 8 is a block diagram illustrating a medical three-dimensional model generation apparatus according to an exemplary embodiment;
FIG. 9 is a hardware block diagram of an electronic device shown in accordance with an exemplary embodiment;
FIG. 10 is a block diagram of an electronic device, according to an exemplary embodiment.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of illustrating the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in the specification of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes all or any elements and all combinations of one or more of the associated listed items.
The following is an introduction and explanation of several terms involved in this application:
AIGC: short for AI-Generated Content, i.e., generative artificial intelligence.
Stable Diffusion: a pre-trained text-to-image diffusion model.
VR: short for Virtual Reality.
MR: short for Mixed Reality.
As described above, current medical three-dimensional models are mainly created manually by three-dimensional model content companies, usually produced with 3D software or generated after medical image segmentation, reconstruction, and rendering. These approaches suffer from low realism, complex operation steps, and high production cost; they provide no means of freely changing and editing the content style through a simple text description, and are difficult to adapt to the different content-style requirements of the many different scenarios of medical training or medical teaching.
With the rapid development of deep learning models, generative artificial intelligence (AIGC) has emerged. The core idea of AIGC is to use artificial intelligence algorithms to generate content with a certain degree of creativity and quality. By training models on large amounts of data, AIGC can generate related content according to input conditions or instructions. For example, given keywords, descriptions, or samples, AIGC can generate matching articles, images, audio, and the like.
Current AIGC performs well on natural image models and natural scenes. Limited by the scarcity of medical content datasets, however, it cannot be trained on large-scale medical content data, so it still cannot achieve satisfactory results in generating medical-related content.
In view of the above, the related art still suffers from poor realism of medical three-dimensional models, which makes accurate quantitative evaluation difficult in application scenarios such as medical training or medical teaching.
Therefore, the present application provides a method for generating a medical three-dimensional model that can effectively improve the realism of the medical three-dimensional model, together with a corresponding apparatus for generating a medical three-dimensional model. The apparatus may be deployed in an electronic device, which may be a computer device with a von Neumann architecture, for example a desktop computer, a notebook computer, a server, and the like.
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an implementation environment involved in a method of generating a medical three-dimensional model. It should be noted that this implementation environment is only one example adapted to the present application, and should not be considered as providing any limitation on the scope of use of the present application.
The implementation environment includes a user terminal 110 and a server 130.
Specifically, the user terminal 110 may be an electronic device that provides a display function, for example, the electronic device may be a desktop computer, a notebook computer, a server, or the like configured with a display screen, or may be a smart phone, a tablet computer, or the like configured with a touch screen.
The server 130 may be an electronic device such as a desktop computer, a notebook computer, or a server, or may be a computer cluster formed by multiple servers, or even a cloud computing center formed by multiple servers. The server 130 is configured to provide background services, including but not limited to a medical three-dimensional model generation service.
A communication connection is pre-established between the server 130 and the user terminal 110 in a wired or wireless manner, and data transmission between the two is realized through this connection. The transmitted data include, but are not limited to: the first generation network, the second generation network, and so on.
In an application scenario, the server 130 provides the medical three-dimensional model generation service by deploying to the user terminal 110 a first generation network and a second generation network for generating medical three-dimensional models. Specifically, the first generation network is trained on a pre-constructed first training set, the second generation network is trained on a pre-constructed second training set, and both networks are sent to the user terminal 110.
Through the interaction between the user terminal 110 and the server 130, the user terminal 110 receives the first generation network and the second generation network, completing their deployment. After the user inputs medical text via the interactive interface of the user terminal 110, the two generation networks can be invoked to generate a medical three-dimensional model under the guidance of the medical text, so that the model truly reflects a patient's changes in different situations and states. The medical three-dimensional model is finally displayed in the interactive interface, effectively addressing the poor realism of medical three-dimensional models in the related art.
Referring to fig. 2, an embodiment of the present application provides a method for generating a medical three-dimensional model, which is applicable to an electronic device, and the electronic device may be the user terminal 110 in the implementation environment shown in fig. 1.
In the following method embodiments, for convenience of description, the execution subject of each step is described as the electronic device, but this does not constitute a specific limitation.
As shown in fig. 2, the method may include the steps of:
In step 310, medical text is acquired in response to an input operation in the interactive interface.
First, the interactive interface is essentially the page through which the user interacts during the generation of the medical three-dimensional model. In some embodiments, the interactive interface is provided by a client running on the electronic device. It will be appreciated that, as the client runs on the electronic device, the interactive interface can be displayed on a screen configured on the electronic device; the client may take the form of an application or a web application, and accordingly the page may take the form of a program window or a web page, which is not limited here.
Secondly, the medical text describes a content style, specifically the content style of the medical two-dimensional image or medical three-dimensional model desired by the user; in other words, the medical text essentially reflects a patient's changes in different situations. For example, the medical text may read "a lung with a tumor", or more precisely "a 20 mm diameter tumor in the right lung".
In order to be able to understand the content style of the medical two-dimensional image or medical three-dimensional model desired by the user, in some embodiments, an input portal is provided in the interactive interface, so that the user can input medical text by means of the input portal.
For example, an input box is displayed in the interactive interface, and the user can input medical text in the input box, so that the electronic device can detect the input operation in the input box, and further learn the content style of the medical two-dimensional image or the medical three-dimensional model expected by the user. Wherein the input box is regarded as an input entry provided in the interactive interface, and the input operation is regarded as an input operation in the interactive interface.
Of course, in other embodiments, the input operation may differ depending on the input components of the electronic device. For example, if the electronic device is a desktop computer configured with a keyboard, the input operation may be a mechanical operation such as pressing keys on the keyboard; if the electronic device is a tablet computer configured with a touch screen, the input operation may be a gesture operation such as tapping or sliding on the touch screen, which is not specifically limited here.
Step 330, invoking the first generation network, and learning the first tensor input into the first generation network under the guidance of the medical text to obtain a medical two-dimensional image conforming to the description of the medical text.
The first generation network is a trained deep learning model with the capability of generating medical two-dimensional images from medical text. That is, training a deep learning model on a first training set, constructed from a large number of medical two-dimensional images and their corresponding medical texts, yields a first generation network with this generation capability. The first generation network thus essentially captures the mathematical mapping between medical text and medical two-dimensional images; by invoking it, the medical text can be mapped to a corresponding medical two-dimensional image based on this mapping.
In some embodiments, the first generation network includes a text encoder, a diffusion model, and an image decoder. The initial diffusion model may, for example, be a pre-trained Stable Diffusion model.
FIG. 3 shows how the first generation network generates a medical two-dimensional image from medical text. In FIG. 3, the medical text is first converted into a second tensor by the text encoder, and a randomly generated first tensor is acquired; then the second tensor guides the diffusion model to perform diffusion learning on the first tensor according to the content style described by the medical text, yielding a third tensor; finally, the third tensor is converted into a medical two-dimensional image by the image decoder.
The diffusion learning process specifically refers to the following: as shown in FIG. 3, the first tensor is input into the diffusion model and denoised through the reverse process of the diffusion model; the second tensor serves as a guiding condition introduced into the denoising of the first tensor, yielding the third tensor.
In the above process, since the medical text essentially describes the content style of the medical two-dimensional image desired by the user, the second tensor obtained from the medical text reflects that content style. Introducing the second tensor into the denoising of the first tensor instructs the diffusion model to steer diffusion learning toward the desired content style, finally yielding a medical two-dimensional image conforming to the description of the medical text, i.e., to the content style desired by the user. For example, if the medical text is "a lung with a tumor", the resulting medical two-dimensional image contains not only a lung but also a tumor within it; if the medical text is "a 20 mm diameter tumor in the right lung", the resulting image contains both the left and right lungs and presents a 20 mm diameter tumor in the right lung. Under the guidance of the medical text, the medical two-dimensional image can thus truly reflect a patient's changes in different situations and states.
It is worth mentioning that the first tensor is an image tensor of a set size; correspondingly, the third tensor is an image tensor of the same size. The size of the first tensor therefore controls the size of the third tensor and, in turn, the image size of the medical two-dimensional image. In other words, the set size may be flexibly chosen according to the image-size requirements of the application scenario, which is not limited here.
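By way of illustration, the following is a minimal sketch of this text-guided generation step using the Hugging Face diffusers API. It assumes a Stable Diffusion checkpoint parameter-tuned on medical image/text pairs (as described later in the training embodiments); the checkpoint path and the prompt are hypothetical.

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical checkpoint fine-tuned on medical text/image pairs.
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/medical-stable-diffusion", torch_dtype=torch.float16
).to("cuda")

# "First tensor": a randomly generated latent of a set size; its spatial size
# controls the image size of the generated medical two-dimensional image.
generator = torch.Generator("cuda").manual_seed(0)
latents = torch.randn(
    (1, pipe.unet.config.in_channels, 64, 64),  # 64x64 latent -> 512x512 image
    generator=generator, device="cuda", dtype=torch.float16,
)

# The prompt is encoded internally into the "second tensor", which guides the
# reverse (denoising) diffusion applied to the latents; the denoised latent
# (the "third tensor") is then decoded into the image.
image = pipe(
    prompt="a 20 mm diameter tumor in the right lung",
    latents=latents,
    guidance_scale=7.5,
    num_inference_steps=50,
).images[0]
image.save("medical_2d.png")
```

In this sketch the random latent plays the role of the first tensor, the encoded prompt the second tensor, and the denoised latent handed to the image decoder the third tensor.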
Step 350, inputting the medical two-dimensional image into a second generation network for generating a medical three-dimensional model.
Wherein the second generation network is a trained deep learning model having generation capabilities from a medical two-dimensional image to a medical three-dimensional model. In some embodiments, the deep learning model may be a pre-trained end-to-end deep learning model Pixel2Mesh.
That is, training a deep learning model on a second training set, constructed from a large number of medical two-dimensional images and corresponding medical three-dimensional models at the same or different viewing angles, yields a second generation network with the capability of generating medical three-dimensional models from medical two-dimensional images. Here, "corresponding" means that the medical two-dimensional images and the medical three-dimensional model at the same or different viewing angles share matching medical text; in other words, both conform to the content style described by that matching medical text.
The second generation network thus essentially captures the mathematical mapping between medical two-dimensional images and medical three-dimensional models; by inputting a medical two-dimensional image into it, the image can be mapped to a corresponding medical three-dimensional model based on this mapping.
FIG. 4 shows a schematic diagram from medical text to a medical three-dimensional model. In FIG. 4, the medical text is "a lung with a tumor"; a picture of a lung with a tumor, i.e., a medical two-dimensional image, is obtained by means of the first generation network, and on the basis of that image a three-dimensional mesh model of the lung with the tumor, i.e., a medical three-dimensional model, is obtained by means of the second generation network.
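As a sketch of this step: Pixel2Mesh has public research implementations but no standard packaged API, so the weight file, model call, and output convention below are hypothetical stand-ins for a network fine-tuned on the second training set.

```python
import torch
import torchvision.transforms as T
from PIL import Image
import trimesh

# Hypothetical weights for a Pixel2Mesh-style second generation network.
model = torch.load("medical_pixel2mesh.pt")
model.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),   # Pixel2Mesh commonly uses 224x224 RGB inputs
    T.ToTensor(),
])
image = preprocess(Image.open("medical_2d.png").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    # The network progressively deforms an initial ellipsoid mesh, guided by
    # perceptual features pooled from the 2D image, toward the target anatomy.
    vertices, faces = model(image)   # hypothetical output convention

mesh = trimesh.Trimesh(vertices[0].cpu().numpy(), faces[0].cpu().numpy())
mesh.export("medical_3d.obj")        # hand the mesh to the display step
```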
And step 370, displaying the medical three-dimensional model generated by the second generation network in the interactive interface.
After the medical three-dimensional model is obtained, it can be presented to the user in the interactive interface. It should be noted that display capabilities differ across electronic devices, for example in the resolution of their display screens. Before the medical three-dimensional model is displayed in the interactive interface, the generated model therefore needs to be encoded in a data format compatible with the electronic device, so that a medical three-dimensional model adapted to the device can be output.
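As a small illustration of this re-encoding step, the generated mesh might be converted to a device-friendly format such as binary glTF; trimesh is one option here, and the file names are hypothetical.

```python
import trimesh

# Hypothetical output of the 2D-to-3D step above.
mesh = trimesh.load("medical_3d.obj")
mesh.export("medical_3d.glb")   # binary glTF, widely supported by 3D viewers
```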
Through the process, on one hand, the medical three-dimensional model required by medical training or medical teaching can be automatically and quickly generated by utilizing the two generation networks, and on the other hand, simple medical texts can be input into the interactive interface according to the actual requirements of different medical training or medical teaching, and the generated medical three-dimensional model can be guided to truly reflect the changes of different situations and different states of a patient, so that the problem of poor reality of the medical three-dimensional model in the related technology can be effectively solved.
Referring to fig. 5, in an exemplary embodiment, the method may further include the steps of:
step 410, obtaining a medical original image, and labeling the medical text on the medical original image to obtain a medical labeling image.
The medical annotation image refers to a medical original image carrying medical text.
The medical original images may be obtained from medical images published on the Internet, from the proprietary medical image data of institutions such as hospitals and medical schools, or from publicly released medical competition data of various departments. Further, the obtained medical original images may be segmented according to different organs, tissues, lesion sites, and so on; for example, a medical original image containing both the left lung and the right lung is segmented by organ into two medical original images, thereby forming a large-scale medical original image dataset covering various organs, tissues, and lesion sites.
Secondly, annotation refers to adding medical text to the medical original image. In some embodiments, the medical text may be added to the medical original image as a text label, or encoded in the image's file name, which is not limited here.
Step 430, performing three-dimensional image reconstruction calculation on the medical annotation image to obtain a three-dimensional annotation model.
The three-dimensional annotation model carries the medical text of the corresponding medical annotation image.
Specifically, the medical original image and the medical text it carries are determined from the medical annotation image; three-dimensional image reconstruction calculation is performed on the medical original image using a three-dimensional image reconstruction technique to obtain a medical original three-dimensional model; and the medical original three-dimensional model is annotated with the medical text carried by the medical original image to obtain the three-dimensional annotation model.
That is, with the three-dimensional image reconstruction technique, each medical annotation image yields a corresponding three-dimensional annotation model. Here, "corresponding" means that the medical annotation image and the three-dimensional annotation model share matching medical text; in other words, both the medical original image and the medical original three-dimensional model conform to the content style described by that matching medical text.
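One common reconstruction route, assuming the medical original image is a CT or MR volume (a stack of slices), is isosurface extraction; the patent does not fix the reconstruction technique, so marching cubes is used below purely as an example, and the file names and threshold are hypothetical.

```python
import numpy as np
from skimage import measure
import trimesh

volume = np.load("lung_volume.npy")   # hypothetical (D, H, W) intensity volume

# Extract an isosurface at a chosen intensity threshold.
verts, faces, normals, values = measure.marching_cubes(volume, level=0.5)

mesh = trimesh.Trimesh(vertices=verts, faces=faces)
mesh.metadata["medical_text"] = "a lung with a tumor"   # carry the annotation
mesh.export("lung_annotated.obj")
```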
Step 450, decomposing the three-dimensional annotation model into a plurality of two-dimensional annotation images according to different viewing angles.
Each two-dimensional annotation image corresponds to a different viewing angle and carries the medical text of the three-dimensional annotation model.
Specifically, the medical original three-dimensional model and the medical text it carries are determined from the three-dimensional annotation model; the medical original three-dimensional model is decomposed into a plurality of medical original two-dimensional images at different viewing angles; and these images are annotated with the medical text carried by the medical original three-dimensional model to obtain a plurality of two-dimensional annotation images at different viewing angles.
In other words, each three-dimensional annotation model has a plurality of two-dimensional annotation images at respective different viewing angles, and the three-dimensional annotation model and its two-dimensional annotation images share matching medical text; the medical original three-dimensional model and its medical original two-dimensional images all conform to the content style described by that matching medical text.
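A sketch of this decomposition via off-screen rendering at several orbiting camera positions follows; pyrender is one rendering option, and the camera radius, view count, and file names are illustrative only.

```python
import numpy as np
import trimesh
import pyrender
from PIL import Image

mesh = trimesh.load("lung_annotated.obj")
scene = pyrender.Scene()
scene.add(pyrender.Mesh.from_trimesh(mesh))
scene.add(pyrender.DirectionalLight(color=np.ones(3), intensity=3.0))
camera = pyrender.PerspectiveCamera(yfov=np.pi / 3.0)
renderer = pyrender.OffscreenRenderer(512, 512)

for i, a in enumerate(np.linspace(0, 2 * np.pi, 8, endpoint=False)):
    # Orbit the camera around the model's vertical axis, looking at the origin.
    pose = np.array([
        [np.cos(a),  0, np.sin(a), 2 * np.sin(a)],
        [0,          1, 0,         0            ],
        [-np.sin(a), 0, np.cos(a), 2 * np.cos(a)],
        [0,          0, 0,         1            ],
    ])
    node = scene.add(camera, pose=pose)
    color, _ = renderer.render(scene)
    # Each rendered view inherits the same medical text as the 3D model.
    Image.fromarray(color).save(f"lung_view_{i}.png")
    scene.remove_node(node)
```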
Step 470, constructing a first training set based on each two-dimensional annotation image and the medical text carried by the two-dimensional annotation images, and constructing a second training set based on the three-dimensional annotation model and each two-dimensional annotation image carrying the medical text corresponding to the three-dimensional annotation model.
The two-dimensional annotation images are medical original two-dimensional images carrying medical text; the three-dimensional annotation model is a medical original three-dimensional model carrying medical text.
As shown in FIG. 6, the first training set consists of a large number of medical original two-dimensional images and the medical texts they carry, i.e., it is a dataset from medical text to medical two-dimensional images. The second training set consists of a number of medical original three-dimensional models, the corresponding medical original two-dimensional images at different viewing angles, and the medical texts they carry, i.e., it is a dataset from medical two-dimensional images to medical three-dimensional models.
After the first training set is obtained, the first generation network can be trained on it, which specifically includes the following steps: acquiring an initial diffusion model, the initial diffusion model being a pre-trained Stable Diffusion model; performing parameter-tuning training of the initial diffusion model on the first training set, and obtaining the trained diffusion model once the parameter-tuning training meets a set condition; and constructing the first generation network from the text encoder, the trained diffusion model, and the image decoder.
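A highly simplified sketch of such parameter tuning with the diffusers and transformers libraries follows. Production fine-tuning (e.g., diffusers' text-to-image training script) additionally handles learning-rate scheduling, EMA, and mixed precision; the base checkpoint choice and `first_training_loader`, a DataLoader yielding (image batch, caption list) pairs, are assumptions.

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

repo = "runwayml/stable-diffusion-v1-5"   # assumed base checkpoint
vae = AutoencoderKL.from_pretrained(repo, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(repo, subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained(repo, subfolder="text_encoder")
tokenizer = CLIPTokenizer.from_pretrained(repo, subfolder="tokenizer")
scheduler = DDPMScheduler.from_pretrained(repo, subfolder="scheduler")
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

for images, texts in first_training_loader:   # hypothetical DataLoader
    with torch.no_grad():
        latents = vae.encode(images).latent_dist.sample() * 0.18215
        ids = tokenizer(texts, padding="max_length", truncation=True,
                        return_tensors="pt").input_ids
        cond = text_encoder(ids)[0]            # the medical text's "second tensor"
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps, (latents.size(0),))
    noisy = scheduler.add_noise(latents, noise, t)
    pred = unet(noisy, t, encoder_hidden_states=cond).sample
    loss = F.mse_loss(pred, noise)             # standard denoising objective
    loss.backward(); optimizer.step(); optimizer.zero_grad()
```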
After the second training set is obtained, the second generation network can be trained on it, which specifically includes the following steps: acquiring a deep learning model pre-trained on a natural image training set; performing parameter-tuning training of the deep learning model on the second training set; and obtaining the second generation network once the parameter-tuning training of the deep learning model meets a set condition.
The set condition may be chosen flexibly according to the actual needs of the application scenario. For example, in one application scenario, the set condition may mean that the parameters reach an optimum, improving the training accuracy of the model; in another, it may mean that the number of iterations reaches a threshold, improving training efficiency. Neither is limited here. Whether the parameters are optimal may be judged via a loss function or the like, which is likewise not limited.
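The following sketch illustrates one way such a set condition could be implemented: stop when a validation loss stops improving ("parameters reach the optimum") or when the iteration count reaches a threshold. `model`, `chamfer_loss`, `evaluate`, and the loaders are hypothetical stand-ins for the second generation network's training setup.

```python
best, stale = float("inf"), 0
PATIENCE, MAX_ITERS = 10, 100_000

for step, (image, target_mesh) in enumerate(second_training_loader):
    pred_mesh = model(image)
    loss = chamfer_loss(pred_mesh, target_mesh)   # Pixel2Mesh-style geometry loss
    loss.backward(); optimizer.step(); optimizer.zero_grad()

    if step % 500 == 0:                           # periodic validation
        val = evaluate(model, val_loader)         # hypothetical validation routine
        best, stale = (val, 0) if val < best else (best, stale + 1)

    if stale >= PATIENCE or step >= MAX_ITERS:    # the "set condition" is met
        break
```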
With this embodiment, a dataset from medical text to medical three-dimensional models is constructed and used to train pre-trained models, realizing an end-to-end network from medical text to medical three-dimensional model. The approach suits multiple clinical scenarios and categories in medicine, improves on the performance of existing text-prompted three-dimensional AIGC generation models for medical content, and provides new techniques and tools for subsequent VR/MR clinical practice training and assessment.
At present, the traditional medical education system, based mainly on animal specimens, human specimens, and teaching aids, faces problems such as insufficient resources and high risk of harm to patients in the training of medical talent. Virtual Reality (VR) is a technique that uses a computer to create a virtual three-dimensional scene and gives the user an immersive visual, auditory, and tactile experience. With the recent development of virtual reality and Mixed Reality (MR) technologies, applying VR and MR to medical education, training, and assessment has become a new trend. Compared with traditional teaching, VR and MR have very notable advantages: by acquiring multidimensional data from a real human body and constructing a digital human body or target tissue model through simulation modeling, digital teaching can be conducted at low cost, repeatably, and with quantifiable assessment; students can learn and practice in an environment that allows repetition; and simulation materials with rich cases and scientific standards can be provided, effectively alleviating the problems of insufficient teaching resources and difficult quantitative assessment. Meanwhile, in clinical teaching and preoperative planning, mixed reality technology can achieve even better results.
On the one hand, the mainstream approach to traditional medical three-dimensional virtual models is direct production with geometric modeling software; some schemes obtain a three-dimensional model from medical images through segmentation, reconstruction, rendering, and similar steps, but the operation steps are complex, the production cost is high, and the content style cannot be freely changed and edited through a simple text description. On the other hand, current large AIGC three-dimensional generation models work for natural image models and natural scenes, but cannot yet satisfactorily generate medical-related content, and are difficult to adapt to the different content-style requirements of the many scenarios of medical training.
FIG. 7 shows a detailed interaction schematic of the medical three-dimensional model generation method in an application scenario. As shown in FIG. 7, in this application scenario the server side may be a server, and the first user side and the second user side may be electronic devices capable of interacting with users; for example, the first user side may be a desktop computer and the second user side a notebook computer. The first user side can thus generate a medical three-dimensional model from medical text entered by the user through interaction, and the second user side can conduct medical training, medical teaching, or other assessments of the user through interaction. It should be noted that the first user side and the second user side may also be deployed on the same electronic device, which is not specifically limited here.
Specifically, the server builds the first training set and the second training set, trains the first generation network on the first training set and the second generation network on the second training set, and deploys both networks to the first user side.
After the first user side completes the deployment of the two generation networks, the user can enter medical text on the first user side according to the content style of the medical three-dimensional model they wish to obtain. The first user side then invokes the first generation network to generate a medical two-dimensional image from the medical text, and invokes the second generation network to generate a medical three-dimensional model from the medical two-dimensional image, so that the resulting model conforms to the content style desired by the user; the medical three-dimensional model is then transmitted to the second user side.
As the client runs on the second user side, the second user side displays to the user a virtual scene into which the medical three-dimensional model has been imported. The virtual scene is a digital scene constructed with computer technology for medical training, medical teaching, and other assessments; it simulates the environment required for such activities (for example, a medical operating laboratory), and the user's simulated operations in the simulated environment are captured to assess their medical training or medical teaching.
For example, when a user wishes to take a medical training assessment, the client can be launched to enter the corresponding virtual scene, e.g., a simulated medical operating laboratory, where the imported medical three-dimensional model may be a patient's lung with a tumor. The simulated operations the user performs on the lung with the tumor include, but are not limited to: simulated operation of surgical instruments, replies to the assessment content, and so on. During the user's simulated operations, the second user side can also assess the user's medical training based on the operation video captured by an image acquisition device and the sensing data collected by a mixed reality device.
In this application scenario, applying AIGC technology to the medical field, and specifically to the generation of medical mixed reality models, fills an application gap; it also compensates for AIGC's weakness in medical content model generation caused by the lack of medical training data. The generation scheme can meet the three-dimensional model requirements of different scenarios in mixed reality medical simulation training and teaching; in addition, through text-driven generation, medical three-dimensional content for a variety of conditions can be obtained quickly, reducing the cost of developing three-dimensional models for mixed reality medicine.
The following is an embodiment of the apparatus of the present application, which may be used to execute the method for generating a medical three-dimensional model according to the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to an embodiment of a method for generating a medical three-dimensional model according to the present application.
Referring to FIG. 8, an embodiment of the present application provides a medical three-dimensional model generating apparatus 900, including but not limited to: a text acquisition module 910, an image generation module 930, a model generation module 950, and a model display module 970.
The text acquisition module 910 is configured to acquire medical text in response to an input operation in the interactive interface.
The image generation module 930 is configured to invoke the first generation network and, under the guidance of the medical text, learn the first tensor input into the first generation network to obtain a medical two-dimensional image conforming to the description of the medical text. The first generation network is a trained deep learning model with the capability of generating medical two-dimensional images from medical text.
The model generation module 950 is configured to input the medical two-dimensional image into the second generation network to generate a medical three-dimensional model. The second generation network is a trained deep learning model with the capability of generating medical three-dimensional models from medical two-dimensional images.
The model display module 970 is configured to display the medical three-dimensional model generated by the second generation network in the interactive interface.
In an exemplary embodiment, the first generation network includes a text encoder, a diffusion model, and an image decoder.
Wherein the image generation module 930 is further configured to convert the medical text into a second tensor by using the text encoder, and obtain the first tensor that is randomly generated; controlling the second tensor to guide the diffusion model to conduct diffusion learning on the first tensor according to the content style of the medical text description to obtain a third tensor; the third tensor is converted into the medical two-dimensional image using the image decoder.
In an exemplary embodiment, the image generating module 930 is further configured to input the first tensor into the diffusion model, and perform denoising through a reverse process of the diffusion model; and taking the second tensor as a guiding condition, and introducing the guiding condition into a denoising process of the first tensor to obtain the third tensor.
In an exemplary embodiment, the apparatus 900 further includes a training set construction module.
The training set construction module is used for acquiring a medical original image and annotating it with medical text to obtain a medical annotation image, the medical annotation image being a medical original image carrying medical text; performing three-dimensional image reconstruction calculation on the medical annotation image to obtain a three-dimensional annotation model, the three-dimensional annotation model carrying the medical text of the corresponding medical annotation image; decomposing the three-dimensional annotation model into a plurality of two-dimensional annotation images according to different viewing angles, each two-dimensional annotation image corresponding to a different viewing angle and carrying the medical text of the three-dimensional annotation model; and constructing a first training set based on each two-dimensional annotation image and the medical text it carries, and a second training set based on the three-dimensional annotation model and each two-dimensional annotation image carrying its corresponding medical text. The first training set is used for training the first generation network, and the second training set is used for training the second generation network.
In an exemplary embodiment, the apparatus 900 further includes a first training module.
The first training module is used for acquiring an initial diffusion model; the initial Diffusion model is a pre-trained Stable Diffusion model; based on the first training set, performing parameter tuning training on the initial diffusion model to obtain the diffusion model after training; the first generation network is constructed based on a text encoder, the diffusion model that completes training, and an image decoder.
In an exemplary embodiment, the apparatus 900 further includes a second training module.
The second training module is used for acquiring a deep learning model which is obtained by pre-training a natural image training set; based on the second training set, performing parameter tuning training on the deep learning model; and if the parameter tuning training of the deep learning model meets the set condition, obtaining the second generation network.
In an exemplary embodiment, the apparatus 900 further includes an assessment module.
The assessment module is configured to import the medical three-dimensional model into a constructed virtual scene, the virtual scene being constructed for medical training or medical teaching, and to assess the medical training or medical teaching of a target object based on the target object's simulated operations on the medical three-dimensional model in the virtual scene.
It should be noted that, when the medical three-dimensional model generating device provided in the foregoing embodiment generates a medical three-dimensional model, only the division of the functional modules is illustrated, and in practical application, the above-mentioned function allocation may be performed by different functional modules according to needs, that is, the internal structure of the medical three-dimensional model generating device is divided into different functional modules to perform all or part of the functions described above.
In addition, the apparatus for generating a medical three-dimensional model provided in the foregoing embodiments belongs to the same concept as the embodiment of the method for generating a medical three-dimensional model, where the specific manner in which each module performs the operation has been described in detail in the method embodiment, which is not described herein again.
Fig. 9 shows a schematic structure of an electronic device according to an exemplary embodiment. The electronic device is suitable for use at a user terminal 110 in the implementation environment shown in fig. 1.
It should be noted that the electronic device is just one example adapted to the present application, and should not be construed as providing any limitation to the scope of use of the present application. Nor should the electronic device be construed as necessarily relying on or necessarily having one or more of the components of the exemplary electronic device 2000 illustrated in fig. 9.
The hardware structure of the electronic device 2000 may vary widely depending on configuration or performance. As shown in FIG. 9, the electronic device 2000 includes: a power supply 210, an interface 230, at least one memory 250, and at least one central processing unit (CPU) 270.
Specifically, the power supply 210 is configured to provide an operating voltage for each hardware device on the electronic device 2000.
The interface 230 includes at least one wired or wireless network interface 231 for interacting with external devices. Of course, in other examples of adaptation of the present application, the interface 230 may further include at least one serial-parallel conversion interface 233, at least one input-output interface 235, and at least one USB interface 237, as shown in fig. 9, which is not specifically limited herein.
The memory 250 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, where the resources stored include an operating system 251, application programs 253, and data 255, and the storage mode may be transient storage or permanent storage.
The operating system 251 is used for managing and controlling the hardware devices and the applications 253 on the electronic device 2000, so that the central processing unit 270 can operate on and process the mass data 255 in the memory 250; it may be Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
The application 253, based on computer readable instructions, performs at least one specific task on top of the operating system 251 and may include at least one module (not shown in FIG. 9), each of which may contain computer readable instructions for the electronic device 2000. For example, the medical three-dimensional model generation apparatus may be regarded as an application 253 deployed on the electronic device 2000.
The data 255 may be photographs, pictures, and the like stored on a disk, or may be the first generation network, the second generation network, and the like stored in the memory 250.
The central processor 270 may include one or more processors and is configured to communicate with the memory 250 through at least one communication bus, so as to read the computer readable instructions stored in the memory 250 and thereby operate on and process the mass data 255 in the memory 250. For example, the method of generating a medical three-dimensional model is accomplished by the central processor 270 reading a series of computer readable instructions stored in the memory 250.
Furthermore, the present application can be realized by hardware circuitry or by a combination of hardware circuitry and software, and thus, the implementation of the present application is not limited to any specific hardware circuitry, software, or combination of the two.
Referring to fig. 10, in an embodiment of the present application, an electronic device 4000 is provided; the electronic device 4000 may be a desktop computer, a notebook computer, a server, or the like.
In fig. 10, the electronic device 4000 includes at least one processor 4001 and at least one memory 4003.
Data interaction between the processor 4001 and the memory 4003 may be achieved through at least one communication bus 4002, which includes a path for transferring data between them. The communication bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 10, but this does not mean that there is only one bus or only one type of bus.
Optionally, the electronic device 4000 may further include a transceiver 4004, which may be used for data interaction between this electronic device and other electronic devices, such as sending and/or receiving data. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not limit the embodiments of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functionality, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The memory 4003 may be, but is not limited to, a ROM (Read-Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, and the like), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program instructions or code in the form of instructions or data structures and that can be accessed by the electronic device 4000.
The memory 4003 has computer readable instructions stored thereon, and the processor 4001 can read the computer readable instructions stored in the memory 4003 through the communication bus 4002.
The computer readable instructions are executed by the one or more processors 4001 to implement the method of generating a medical three-dimensional model in the above embodiments.
Furthermore, in an embodiment of the present application, a storage medium is provided, on which computer readable instructions are stored, which are executed by one or more processors to implement a method for generating a medical three-dimensional model as described above.
In an embodiment of the present application, a computer program product is provided, where the computer program product includes computer readable instructions, where the computer readable instructions are stored in a storage medium, and where one or more processors of an electronic device read the computer readable instructions from the storage medium, load and execute the computer readable instructions, so that the electronic device implements a method for generating a medical three-dimensional model as described above.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of the steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and whose order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps, or with at least a part of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that a person skilled in the art can make several improvements and modifications without departing from the principle of the present application, and such improvements and modifications should also be regarded as falling within the protection scope of the present application.
Claims (9)
1. A method of generating a medical three-dimensional model, the method comprising:
acquiring a medical original image, and labeling the medical original image with medical text to obtain a medical annotation image; the medical annotation image is a medical original image carrying the medical text;
performing three-dimensional image reconstruction calculation on the medical annotation image to obtain a three-dimensional annotation model; the three-dimensional annotation model carries medical texts corresponding to the medical annotation images;
decomposing the three-dimensional annotation model into a plurality of two-dimensional annotation images according to different visual angles; each two-dimensional annotation image corresponds to a different visual angle and carries a medical text corresponding to the three-dimensional annotation model;
constructing a first training set based on each two-dimensional annotation image and the medical text carried by the two-dimensional annotation images, and constructing a second training set based on the three-dimensional annotation model and each two-dimensional annotation image carrying the medical text corresponding to the three-dimensional annotation model; the first training set is used for training to obtain a first generation network, and the second training set is used for training to obtain a second generation network;
responding to an input operation in the interactive interface, and acquiring a medical text;
invoking a first generation network, and under the guidance of the medical text, learning a first tensor input into the first generation network to obtain a medical two-dimensional image conforming to the description of the medical text; the first generation network is a trained deep learning model with generation capabilities from medical text to medical two-dimensional images;
inputting the medical two-dimensional image into a second generation network to generate a medical three-dimensional model; the second generation network is a trained deep learning model with generation capabilities from a medical two-dimensional image to a medical three-dimensional model;
and displaying the medical three-dimensional model generated by the second generation network in the interactive interface.
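By way of illustration only, the two-stage flow of claim 1 can be sketched as below. The `FirstGenerationNetwork`-style and `SecondGenerationNetwork`-style objects, their `generate` methods, and the latent shape are all hypothetical stand-ins; the patent does not prescribe these interfaces.

```python
import torch

def generate_medical_3d_model(medical_text: str,
                              first_net,   # trained: medical text -> 2D medical image
                              second_net,  # trained: 2D medical image -> 3D model
                              latent_shape=(1, 4, 64, 64)):
    """Sketch of the claimed flow: text prompt -> 2D image -> 3D model.
    `first_net` and `second_net` stand in for the first and second
    generation networks of claim 1; their interfaces are assumptions."""
    # A randomly generated first tensor, as recited in claim 1.
    first_tensor = torch.randn(latent_shape)

    # First generation network: learn on the first tensor under the
    # guidance of the medical text to obtain a 2D medical image.
    medical_2d_image = first_net.generate(first_tensor, guidance_text=medical_text)

    # Second generation network: lift the 2D image to a 3D model.
    medical_3d_model = second_net.generate(medical_2d_image)

    return medical_3d_model  # then displayed in the interactive interface
```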
2. The method of claim 1, wherein the first generation network comprises a text encoder, a diffusion model, and an image decoder;
wherein the invoking a first generation network and, under the guidance of the medical text, learning a first tensor input into the first generation network to obtain a medical two-dimensional image conforming to the medical text description comprises:
converting the medical text into a second tensor by using the text encoder, and acquiring the randomly generated first tensor;
controlling the second tensor to guide the diffusion model to conduct diffusion learning on the first tensor according to the content style of the medical text description to obtain a third tensor;
the third tensor is converted into the medical two-dimensional image using the image decoder.
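A minimal sketch of the three-component structure recited in claim 2, written in the style of a latent diffusion model. The callables `text_encoder`, `diffusion_model`, and `image_decoder` (for example a CLIP-style encoder and a VAE decoder) are assumed stand-ins; none of the names below are mandated by the patent.

```python
import torch

def first_generation_network(medical_text, text_encoder, diffusion_model,
                             image_decoder, latent_shape=(1, 4, 64, 64)):
    """Claim 2 flow: the second tensor guides diffusion over the first tensor."""
    # Convert the medical text into the second tensor (a text embedding).
    second_tensor = text_encoder(medical_text)

    # Acquire the randomly generated first tensor (the initial noise latent).
    first_tensor = torch.randn(latent_shape)

    # The second tensor guides diffusion learning on the first tensor,
    # yielding the third tensor (a denoised latent in the text's style).
    third_tensor = diffusion_model.denoise(first_tensor, condition=second_tensor)

    # Convert the third tensor into the medical two-dimensional image.
    return image_decoder(third_tensor)
```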
3. The method of claim 2, wherein the controlling the second tensor to guide the diffusion model to perform diffusion learning on the first tensor according to the content style of the medical text description to obtain a third tensor comprises:
inputting the first tensor into the diffusion model, and denoising through the reverse process of the diffusion model;
and taking the second tensor as a guiding condition, and introducing the guiding condition into a denoising process of the first tensor to obtain the third tensor.
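One plausible reading of claim 3 is sketched below using the Hugging Face diffusers API style (`unet(...).sample`, `scheduler.step(...).prev_sample`): the first tensor is denoised through the reverse process while the second tensor acts as the guiding condition. Classifier-free guidance and the guidance scale are assumptions, not recited in the claim.

```python
import torch

@torch.no_grad()
def guided_reverse_process(unet, scheduler, first_tensor, second_tensor,
                           uncond_embedding, num_steps=50, guidance_scale=7.5):
    """Denoise the first tensor through the reverse diffusion process,
    introducing the second tensor as the guiding condition (claim 3)."""
    scheduler.set_timesteps(num_steps)
    latents = first_tensor
    for t in scheduler.timesteps:
        # Noise prediction with and without the medical-text condition.
        eps_cond = unet(latents, t, encoder_hidden_states=second_tensor).sample
        eps_uncond = unet(latents, t, encoder_hidden_states=uncond_embedding).sample

        # Introduce the guiding condition into the denoising step.
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

        # One step of the reverse (denoising) process.
        latents = scheduler.step(eps, t, latents).prev_sample
    return latents  # the third tensor
```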
4. The method of claim 1, wherein before the first generation network is invoked to learn, under the guidance of the medical text, the first tensor input into the first generation network and obtain a medical two-dimensional image conforming to the medical text description, the method further comprises:
acquiring an initial diffusion model; the initial diffusion model is a pre-trained Stable Diffusion (StableDiffuse) model;
based on the first training set, performing parameter tuning training on the initial diffusion model to obtain the diffusion model after training;
the first generation network is constructed based on a text encoder, the diffusion model that completes training, and an image decoder.
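Claim 4 tunes a pre-trained Stable Diffusion model on the first training set (two-dimensional annotation images paired with medical texts). The condensed sketch below follows the standard Hugging Face diffusers text-to-image fine-tuning recipe; the checkpoint name, learning rate, and data handling are assumptions, and only the UNet is tuned here.

```python
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline, DDPMScheduler

# Assumed pre-trained checkpoint; any Stable Diffusion weights would do.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
noise_scheduler = DDPMScheduler.from_config(pipe.scheduler.config)
optimizer = torch.optim.AdamW(pipe.unet.parameters(), lr=1e-5)

def tuning_step(pixel_values, input_ids):
    """One parameter-tuning step on a (2D annotation image, medical text) pair."""
    # Encode the image into latents with the frozen VAE (SD scaling factor).
    latents = pipe.vae.encode(pixel_values).latent_dist.sample() * 0.18215
    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                              (latents.shape[0],), device=latents.device)
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    # Medical-text embedding from the frozen text encoder.
    text_embeds = pipe.text_encoder(input_ids)[0]

    # Predict the added noise and regress it (standard diffusion objective).
    pred = pipe.unet(noisy_latents, timesteps,
                     encoder_hidden_states=text_embeds).sample
    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```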
5. The method of claim 1, wherein before the medical two-dimensional image is input into the second generation network to generate the medical three-dimensional model, the method further comprises:
obtaining a deep learning model pre-trained on a natural image training set;
based on the second training set, performing parameter tuning training on the deep learning model;
and if the parameter tuning training of the deep learning model meets the set condition, obtaining the second generation network.
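Claim 5 does not name the 2D-to-3D architecture, so the sketch below only shows the shape of the tuning loop: start from a model pre-trained on natural images, tune it on the second training set (multi-view 2D annotation images paired with the 3D annotation model), and stop when a set condition is met. The model, loss function, and threshold are placeholders.

```python
import torch

def tune_second_network(model, train_loader, loss_fn,
                        max_epochs=50, loss_threshold=0.01):
    """Parameter-tuning loop for the second generation network (claim 5).
    `model` is assumed pre-trained on a natural image training set; the
    loss threshold stands in for the claim's unspecified 'set condition'."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for views_2d, target_3d in train_loader:   # second-training-set pairs
            pred_3d = model(views_2d)              # 2D views -> 3D prediction
            loss = loss_fn(pred_3d, target_3d)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        # The "set condition": here, mean epoch loss below a threshold.
        if epoch_loss / len(train_loader) < loss_threshold:
            break
    return model  # the trained second generation network
```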
6. The method of any one of claims 1 to 5, wherein after the medical three-dimensional model generated by the second generation network is displayed in the interactive interface, the method further comprises:
importing the medical three-dimensional model into the constructed virtual scene; the virtual scene is constructed for medical training or medical teaching;
and based on the simulated operations of the target object on the medical three-dimensional model in the virtual scene, assessing the medical training or medical teaching of the target object.
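The patent does not specify how the simulated operations are checked, so the snippet below is only an illustrative stand-in: compare the sequence of operations a trainee performs on the imported model against a reference checklist and report a score. All names, operation labels, and the passing threshold are hypothetical.

```python
def assess_training(performed_ops, reference_ops):
    """Illustrative assessment of simulated operations (claim 6).
    `performed_ops` / `reference_ops` are ordered lists of operation names;
    the scoring rule is an assumption, not part of the patent."""
    correct = sum(1 for done, expected in zip(performed_ops, reference_ops)
                  if done == expected)
    score = correct / len(reference_ops) if reference_ops else 0.0
    return {"score": score, "passed": score >= 0.8}

# Example: two of three expected steps performed in order.
result = assess_training(["locate_lesion", "measure_tumor"],
                         ["locate_lesion", "measure_tumor", "mark_margin"])
```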
7. A device for generating a medical three-dimensional model, the device comprising:
the image acquisition module is used for acquiring a medical original image, and labeling the medical original image with medical text to obtain a medical annotation image; the medical annotation image is a medical original image carrying the medical text;
the computing module is used for carrying out three-dimensional image reconstruction computation on the medical annotation image to obtain a three-dimensional annotation model; the three-dimensional annotation model carries medical texts corresponding to the medical annotation images;
the decomposition module is used for decomposing the three-dimensional annotation model into a plurality of two-dimensional annotation images according to different visual angles; each two-dimensional annotation image corresponds to a different visual angle and carries a medical text corresponding to the three-dimensional annotation model;
the training set construction module is used for constructing a first training set based on each two-dimensional annotation image and the medical text carried by the two-dimensional annotation image, and constructing a second training set based on the three-dimensional annotation model and each two-dimensional annotation image carrying the medical text corresponding to the three-dimensional annotation model; the first training set is used for training to obtain a first generation network, and the second training set is used for training to obtain a second generation network;
the text acquisition module is used for responding to an input operation in the interactive interface and acquiring a medical text;
the image generation module is used for calling a first generation network, and under the guidance of the medical text, learning a first tensor input into the first generation network to obtain a medical two-dimensional image conforming to the description of the medical text; the first generation network is a trained deep learning model with generation capabilities from medical text to medical two-dimensional images;
the model generation module is used for inputting the medical two-dimensional image into a second generation network to generate a medical three-dimensional model; the second generation network is a trained deep learning model with generation capabilities from a medical two-dimensional image to a medical three-dimensional model;
and the model display module is used for displaying the medical three-dimensional model generated by the second generation network in the interactive interface.
8. An electronic device, comprising: at least one processor, and at least one memory, wherein,
the memory has computer readable instructions stored thereon;
the computer readable instructions are executed by one or more of the processors to cause an electronic device to implement the method of generating a medical three-dimensional model as claimed in any one of claims 1 to 6.
9. A storage medium having stored thereon computer readable instructions, the computer readable instructions being executable by one or more processors to implement the method of generating a medical three-dimensional model according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311756147.9A CN117437365B (en) | 2023-12-20 | 2023-12-20 | Medical three-dimensional model generation method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117437365A CN117437365A (en) | 2024-01-23 |
CN117437365B true CN117437365B (en) | 2024-04-12 |
Family
ID=89552058
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311756147.9A Active CN117437365B (en) | 2023-12-20 | 2023-12-20 | Medical three-dimensional model generation method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117437365B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117808976B (en) * | 2024-03-01 | 2024-05-24 | 之江实验室 | Three-dimensional model construction method and device, storage medium and electronic equipment |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111311592A (en) * | 2020-03-13 | 2020-06-19 | 中南大学 | Three-dimensional medical image automatic segmentation method based on deep learning |
WO2020177348A1 (en) * | 2019-03-07 | 2020-09-10 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating three-dimensional model |
CN112242185A (en) * | 2020-09-09 | 2021-01-19 | 山东大学 | Medical image report automatic generation method and system based on deep learning |
CN115699088A (en) * | 2020-02-17 | 2023-02-03 | 斯纳普公司 | Generating three-dimensional object models from two-dimensional images |
CN115769234A (en) * | 2020-04-24 | 2023-03-07 | 罗布乐思公司 | Template-based generation of 3D object mesh from 2D images |
CN116228959A (en) * | 2022-11-30 | 2023-06-06 | 华为技术有限公司 | Object generation method, device and system |
CN116580156A (en) * | 2023-06-06 | 2023-08-11 | 南京威布三维科技有限公司 | Text generation 3D printing model method based on big data deep learning |
CN117152363A (en) * | 2023-10-30 | 2023-12-01 | 浪潮电子信息产业股份有限公司 | Three-dimensional content generation method, device and equipment based on pre-training language model |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11270523B2 (en) * | 2017-11-29 | 2022-03-08 | Sdc U.S. Smilepay Spv | Systems and methods for constructing a three-dimensional model from two-dimensional images |
2023-12-20: application CN202311756147.9A filed in China; granted as patent CN117437365B, status Active.
Non-Patent Citations (1)
Title |
---|
Application of deep learning in single-image-based three-dimensional object reconstruction; 陈加; 张玉麒; 宋鹏; 魏艳涛; 王煜; Acta Automatica Sinica; 2018-11-28 (04); pp. 23-34 *
Also Published As
Publication number | Publication date |
---|---|
CN117437365A (en) | 2024-01-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Margulies et al. | Visualizing the human connectome | |
CN117437365B (en) | Medical three-dimensional model generation method and device, electronic equipment and storage medium | |
CN112734910B (en) | Real-time human face three-dimensional image reconstruction method and device based on RGB single image and electronic equipment | |
CN111524216B (en) | Method and device for generating three-dimensional face data | |
CN112785712B (en) | Three-dimensional model generation method and device and electronic equipment | |
JP2022172173A (en) | Image editing model training method and device, image editing method and device, electronic apparatus, storage medium and computer program | |
CN109408834A (en) | Auxiliary machinery interpretation method, device, equipment and storage medium | |
Jin et al. | GAN-based pencil drawing learning system for art education on large-scale image datasets with learning analytics | |
Stefanov et al. | Opensense: A platform for multimodal data acquisition and behavior perception | |
Schroeder et al. | An object-oriented approach to 3D graphics | |
Fadzli et al. | Real-time 3D reconstruction method for holographic telepresence | |
Cong et al. | Design and Development of Virtual Medical System Interface Based on VR‐AR Hybrid Technology | |
CN115775300B (en) | Human body model reconstruction method, human body model reconstruction training method and device | |
McGhee | 3‐D visualization and animation technologies in anatomical imaging | |
CN110135583A (en) | The generation method of markup information, the generating means of markup information and electronic equipment | |
CN116310028A (en) | Style migration method and system of three-dimensional face model | |
Spalvieri et al. | Design–a new way to look at old molecules | |
Prajapati et al. | An ai-based pedagogical tool for creating sketched representation of emotive product forms in the conceptual design stages | |
Isaias et al. | Interactivity and the future of the human-computer interface | |
Lee et al. | Generative early architectural visualizations: incorporating architect’s style-trained models | |
Johnston et al. | Employing WebGL to develop interactive stereoscopic 3D content for use in biomedical visualization | |
Al Hashimi et al. | Epsilon Interactive Virtual User Manual (VUM) | |
Selvamanikkam et al. | High-Resolution Stereoscopic Visualization of Pediatric Echocardiography Data on Microsoft HoloLens 2 | |
Basak et al. | C3I-SynFace: A synthetic head pose and facial depth dataset using seed virtual human models. | |
Ladeinde et al. | Using automated state space planning for effective management of visual information and learner’s attention in virtual reality |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |