WO2023224350A2

WO2023224350A2 - Method and device for detecting landmark from 3d volume image

Info

Publication number: WO2023224350A2
Application number: PCT/KR2023/006595
Authority: WO
Inventors: 김선경; 장정열
Original assignee: 주식회사 애마슈
Priority date: 2022-05-17
Filing date: 2023-05-16
Publication date: 2023-11-23
Also published as: WO2023224350A3; KR20230160995A

Abstract

A method for detecting a landmark from a three-dimensional volume image performed by a computing device, according to one embodiment of the present disclosure, may comprise the steps of: acquiring input data from a three-dimensional volume image in which a target site of an object is photographed; inputting the input data into a first landmark detection model; detecting a plurality of landmarks for a first part of the target site by using the first landmark detection model; inputting the three-dimensional volume image into a second landmark detection model; detecting a plurality of landmarks for a second part of the target site that is different from the first part of the target site by using the second landmark detection model; and generating a plurality of landmarks for the target site on the basis of the plurality of landmarks for the first part of the target site and the plurality of landmarks for the second part of the target site. The representative drawing may be Figure 4.

Description

Method and device for detecting landmarks from 3D volume images

The present disclosure relates to a method and device for detecting landmarks from a 3D volume image.

Recently, as interest in oral health, including aesthetic and/or orthodontic treatment, has grown, demand for such treatment has been increasing not only among teenagers and young adults but also among the elderly.

In order to perform aesthetic and/or orthodontic treatment, practitioners in dentistry, oral and maxillofacial surgery, plastic surgery, and the like photograph the patient's oral skeleton, face, teeth, etc. using X-ray and CT (Computed Tomography) devices. Two-dimensional images such as X-ray and CT images are acquired, anatomical locations (landmarks) are extracted from the two-dimensional images using orthodontic software, and diagnosis and treatment planning are performed on that basis.

However, conventional orthodontic software only extracts landmarks from two-dimensional images and provides information for diagnosis and treatment planning based on them. Two-dimensional images alone are technically insufficient for comprehensively analyzing the oral skeleton, face, teeth, etc.

Furthermore, digital correction technology based on 3D volume imaging technology is being introduced for improved aesthetic treatment and/or orthodontic treatment, but this digital correction technology requires more measurement operations than correction technology using 2D images. Therefore, it takes more time to analyze the 3D image, and there is a problem in that the accuracy of the 3D image analysis is reduced due to the complexity and volume of the 3D volume image.

Therefore, a method for quickly and accurately detecting landmarks from a 3D volume image is required.

In relation to this, Republic of Korea Patent No. 10-2334480 has been issued.

The present disclosure has been made in response to the above-described background technology, and seeks to provide a method and user device for detecting landmarks from a 3D volume image.

The technical problems of the present disclosure are not limited to the technical problems mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the description below.

According to an embodiment of the present disclosure for solving the above-described problem, a method for detecting a landmark from a 3D volume image, performed by a computing device, may include: obtaining input data from a 3D volume image capturing a target area of an object; inputting the input data into a first landmark detection model; detecting a plurality of landmarks for a first part of the target area using the first landmark detection model; inputting the 3D volume image into a second landmark detection model; detecting a plurality of landmarks for a second part of the target area that is different from the first part using the second landmark detection model; and generating a plurality of landmarks for the target area based on the plurality of landmarks for the first part and the plurality of landmarks for the second part.
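By way of non-limiting illustration, the flow of the method described above may be sketched as follows. The function name, the callables standing in for the two landmark detection models, and the choice to merge the two results as a simple union keyed by landmark name are all assumptions made for this sketch; the disclosure states only that the final landmarks are generated based on both results.

```python
from typing import Any, Callable, Dict, Tuple

Point3D = Tuple[float, float, float]
Landmarks = Dict[str, Point3D]


def detect_target_landmarks(
    volume: Any,
    make_input_data: Callable[[Any], Any],
    first_model: Callable[[Any], Landmarks],
    second_model: Callable[[Any], Landmarks],
) -> Landmarks:
    """Run both branches and merge their landmarks (hypothetical helper)."""
    # Branch 1: input data derived from the volume goes to the first model.
    input_data = make_input_data(volume)
    first_part = first_model(input_data)

    # Branch 2: the 3D volume itself goes to the second model.
    second_part = second_model(volume)

    # Assumption: the two parts use distinct landmark names, so a union
    # is enough to generate the landmarks for the whole target area.
    overlap = first_part.keys() & second_part.keys()
    if overlap:
        raise ValueError(f"landmark named in both parts: {sorted(overlap)}")
    return {**first_part, **second_part}
```

In this sketch the two branches are independent of each other, so an implementation could run them in either order or concurrently.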

Additionally, the target area may include the head, the first part may mean a skin area of the head, and the second part may mean a bone area of the head.

Additionally, the input data may include a first two-dimensional image representing the front of the head, a depth image corresponding to the first image, and a second two-dimensional image representing the side of the head.
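As a purely illustrative sketch, the three items of input data could be derived from the volume by a first-hit projection. The axis convention (`volume[z][y][x]` with `y` running from the front of the head to the back and `x` from one side to the other), the intensity threshold, and the function names are assumptions of this example and are not specified at this point in the disclosure.

```python
from typing import List, Optional, Tuple

Volume = List[List[List[float]]]  # assumed layout: volume[z][y][x]


def front_view(
    volume: Volume, threshold: float
) -> Tuple[List[List[float]], List[List[Optional[int]]]]:
    """Project along y: return (front image, depth image)."""
    image, depth = [], []
    for plane in volume:  # plane[y][x] at a fixed height z
        img_row, depth_row = [], []
        for x in range(len(plane[0])):
            # Index of the first voxel, front to back, at or above threshold.
            hit = next(
                (y for y, row in enumerate(plane) if row[x] >= threshold), None
            )
            img_row.append(0 if hit is None else plane[hit][x])
            depth_row.append(hit)  # None where the ray hits nothing
        image.append(img_row)
        depth.append(depth_row)
    return image, depth


def side_view(volume: Volume, threshold: float) -> List[List[float]]:
    """Project along x: return the side image."""
    return [
        [next((v for v in row if v >= threshold), 0) for row in plane]
        for plane in volume
    ]
```

The depth image here stores, for each pixel of the front image, how far the projection ray travelled before reaching the surface.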

Additionally, the first landmark detection model may include a first detection model trained to detect landmarks for the front of the head and a second detection model trained to detect landmarks for the side of the head.

Additionally, detecting a plurality of landmarks for the first portion using the first landmark detection model may include: inputting the first image and the depth image into the first detection model; detecting a plurality of landmarks of a first group using the first detection model; inputting the second image into the second detection model; detecting a plurality of landmarks of a second group using the second detection model; and generating a plurality of landmarks for the first portion based on the plurality of landmarks of the first group and the plurality of landmarks of the second group.

In addition, the first detection model may be configured to predict the plurality of landmarks of the first group using the first image and the depth image as input and to output two-dimensional coordinate values for the predicted landmarks of the first group, and the second detection model may be configured to predict the plurality of landmarks of the second group using the second image as input and to output two-dimensional coordinate values for the predicted landmarks of the second group.

In addition, the step of generating a plurality of landmarks for the first portion may be a step of generating three-dimensional coordinate values by combining at least some of the two-dimensional coordinate values for the plurality of landmarks of the first group and the two-dimensional coordinate values for the plurality of landmarks of the second group.
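One possible way of combining the two sets of two-dimensional coordinates is sketched below. It assumes that the front image yields (x, y) pairs, that the side image yields (z, y) pairs, that both images share the same scale and vertical axis, and that the shared vertical coordinate is averaged; the function name and these conventions are assumptions of this example only.

```python
from typing import Dict, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def combine_views(
    front: Dict[str, Point2D], side: Dict[str, Point2D]
) -> Dict[str, Point3D]:
    """Merge front (x, y) and side (z, y) coordinates into (x, y, z)."""
    combined = {}
    # Only landmarks detected in both views can be lifted to 3D this way.
    for name in front.keys() & side.keys():
        x, y_front = front[name]
        z, y_side = side[name]
        # Assumption: average the vertical coordinate seen in both views.
        combined[name] = (x, (y_front + y_side) / 2.0, z)
    return combined
```

For a landmark seen at (10.0, 20.0) in the front image and at (5.0, 22.0) in the side image, this sketch yields (10.0, 21.0, 5.0).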

In addition, the second landmark detection model may include a third detection model trained to detect, from the 3D volume image, a plurality of cropped images in which the plurality of landmarks for the second part are located, and a plurality of fourth detection models trained to detect the plurality of landmarks for the second part from the detected plurality of cropped images.

Additionally, detecting a plurality of landmarks for the second portion of the target area using the second landmark detection model may include: inputting the three-dimensional volume image into the third detection model; detecting the plurality of cropped images using the third detection model; inputting the detected plurality of cropped images into the plurality of fourth detection models; and detecting a plurality of landmarks for the second portion using the plurality of fourth detection models.

In addition, the third detection model may be configured to take the 3D volume image as input, detect from the 3D volume image a plurality of candidate regions predicted to contain the plurality of landmarks for the second part, and output the plurality of cropped images corresponding to the plurality of candidate regions.

Additionally, the plurality of candidate regions may at least partially overlap one another, and each may include at least one position predicted for a landmark among the plurality of landmarks for the second part.

Additionally, each of the plurality of fourth detection models may be configured to correspond to each of the plurality of cropped images.

In addition, each of the plurality of fourth detection models may be configured to take its corresponding cropped image as input, predict at least one landmark from the candidate region corresponding to that cropped image, and output three-dimensional coordinate values for each of the predicted at least one landmark.

Additionally, each of the plurality of cropped images may be a 3D volume image corresponding to each of the plurality of candidate regions.
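The crop-then-detect arrangement described above may be sketched, under stated assumptions, as follows. The helper names, the `(start, size)` representation of a candidate region, the `volume[z][y][x]` layout, and the convention that each per-crop model returns coordinates local to its crop (which are then shifted back into the full volume) are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Volume = List[List[List[float]]]  # assumed layout: volume[z][y][x]
Index3D = Tuple[int, int, int]    # (z, y, x)
Region = Tuple[Index3D, Index3D]  # (start corner, size)


def crop_volume(volume: Volume, start: Index3D, size: Index3D) -> Volume:
    """Cut a 3D sub-volume out of the full volume."""
    z0, y0, x0 = start
    dz, dy, dx = size
    return [
        [row[x0:x0 + dx] for row in plane[y0:y0 + dy]]
        for plane in volume[z0:z0 + dz]
    ]


def to_global(local: Index3D, start: Index3D) -> Index3D:
    """Shift a coordinate inside a crop back into full-volume coordinates."""
    return tuple(l + s for l, s in zip(local, start))


def detect_in_crops(
    volume: Volume,
    regions: Sequence[Region],
    crop_models: Sequence[Callable[[Volume], Dict[str, Index3D]]],
) -> Dict[str, Index3D]:
    """Apply one model per candidate region and collect global coordinates."""
    landmarks = {}
    for (start, size), model in zip(regions, crop_models):
        crop = crop_volume(volume, start, size)
        for name, local in model(crop).items():
            landmarks[name] = to_global(local, start)
    return landmarks
```

Pairing one model with one region keeps each model's input small, which is consistent with each fourth detection model corresponding to one cropped image.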

According to an embodiment of the present disclosure, there is provided a computer program stored in a computer-readable storage medium, wherein the computer program, when executed on one or more processors, performs the following operations for detecting a landmark from a 3D volume image, the operations including: obtaining input data from a 3D volume image capturing a target area of an object; inputting the input data into a first landmark detection model; detecting a plurality of landmarks for a first part of the target area using the first landmark detection model; inputting the 3D volume image into a second landmark detection model; detecting a plurality of landmarks for a second part of the target area using the second landmark detection model; and generating a plurality of landmarks for the target area based on the first group of landmarks and the second group of landmarks.

According to an embodiment of the present disclosure, there is provided a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a computing device, causes the computing device to perform a method for detecting a landmark from a 3D volume image. The method may include: obtaining input data from a 3D volume image capturing a target area of an object; inputting the input data into a first landmark detection model; detecting a plurality of landmarks for a first part of the target area using the first landmark detection model; inputting the 3D volume image into a second landmark detection model; detecting a plurality of landmarks for a second part of the target area using the second landmark detection model; and generating a plurality of landmarks for the target area based on the first group of landmarks and the second group of landmarks.

According to an embodiment of the present disclosure, a computing device for detecting a landmark from a 3D volume image includes: at least one processor; and a memory, wherein the at least one processor may be configured to acquire input data from a 3D volume image capturing a target area of an object, input the input data into a first landmark detection model, detect a plurality of landmarks for a first part of the target area using the first landmark detection model, input the 3D volume image into a second landmark detection model, detect a plurality of landmarks for a second part of the target area using the second landmark detection model, and generate a plurality of landmarks for the target area based on the first group of landmarks and the second group of landmarks.

The technical solutions obtainable from this disclosure are not limited to the solutions mentioned above, and other solutions not mentioned above will be clearly understood by those skilled in the art from the description below.

According to some embodiments of the present disclosure, the 3D landmark detection time can be minimized by dividing the user's head into a skin area and a bone area and detecting landmarks using each artificial intelligence-based model.

In addition, by using an artificial intelligence-based model specialized for each landmark, it is possible to obtain optimized 3D landmark coordinate values for each landmark, thereby improving landmark detection accuracy.

Additionally, the hassle of converting a 3D image to 2D and then converting it back to 3D for landmark detection can be reduced.

The effects that can be obtained from the present disclosure are not limited to the effects mentioned above, and other effects not mentioned can be clearly understood by those skilled in the art from the description below.

Various aspects will now be described with reference to the drawings, where like reference numerals are used to collectively refer to like elements. In the following examples, for purposes of explanation, numerous specific details are set forth to provide a comprehensive understanding of one or more aspects. However, it will be clear that such aspect(s) may be practiced without these specific details.

Figure 1 is a configuration diagram of an example system for detecting landmarks from a 3D volume image according to an embodiment of the present disclosure.

Figure 2 is a block diagram of a server for detecting landmarks from a 3D volume image according to an embodiment of the present disclosure.

Figure 3 is a schematic diagram showing a network function according to an embodiment of the present disclosure.

FIG. 4 is a flowchart illustrating an example of a method for detecting a landmark from a 3D volume image according to an embodiment of the present disclosure.

Figure 5 is a schematic illustration for explaining a first landmark detection model according to an embodiment of the present disclosure.

Figure 6 is an example diagram for explaining the schematic structure of a detection model according to an embodiment of the present disclosure.

Figure 7 is a schematic illustration for explaining a second landmark detection model according to an embodiment of the present disclosure.

Figure 8 is an example diagram for explaining the schematic structure of a detection model according to an embodiment of the present disclosure.

Figure 9 is an example diagram for explaining the learning process of the first landmark detection model according to an embodiment of the present disclosure.

Figure 10 is an example diagram for explaining the inference process of the first landmark detection model according to an embodiment of the present disclosure.

Figure 11 is an example diagram for explaining the learning process of a second landmark detection model according to an embodiment of the present disclosure.

Figure 12 is an example diagram for explaining the inference process of a second landmark detection model according to an embodiment of the present disclosure.

Figure 13 is a brief, general schematic diagram of an example computing environment in which embodiments of the present disclosure may be implemented.

Various embodiments are now described with reference to the drawings. In this specification, various descriptions are presented to provide an understanding of the disclosure. However, it is clear that these embodiments may be practiced without these specific descriptions.

As used herein, the terms “component,” “module,” “system,” and the like refer to a computer-related entity, hardware, firmware, software, a combination of software and hardware, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, a thread of execution, a program, and/or a computer. For example, both an application running on a computing device and the computing device can be a component. One or more components may reside within a processor and/or thread of execution. A component may be localized within one computer. A component may be distributed between two or more computers. Additionally, these components can execute from various computer-readable media having various data structures stored thereon. Components may communicate through local and/or remote processes, for example in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system or a distributed system, and/or data transmitted to other systems over a network such as the Internet by way of the signal).

Additionally, the term “or” is intended to mean an inclusive “or” and not an exclusive “or.” That is, unless otherwise specified or clear from context, “X uses A or B” is intended to mean any of the natural inclusive permutations. That is, if X uses A, X uses B, or X uses both A and B, then “X uses A or B” applies in any of these cases. Additionally, the term “and/or” as used herein should be understood to refer to and include all possible combinations of one or more of the related listed items.

Additionally, the terms “comprise” and/or “comprising” should be understood to mean that the corresponding feature and/or element is present. However, the terms “comprise” and/or “comprising” should be understood as not excluding the presence or addition of one or more other features, elements and/or groups thereof. Additionally, unless otherwise specified or the context is clear to indicate a singular form, the singular terms herein and in the claims should generally be construed to mean “one or more.”

And, the term “at least one of A or B” should be interpreted to mean “a case containing only A,” “a case containing only B,” and “a case of combining A and B.”

Those skilled in the art will additionally recognize that the various illustrative logical blocks, components, modules, circuits, means, logic, and algorithm steps described in connection with the embodiments disclosed herein may be implemented using electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, various illustrative components, blocks, configurations, means, logics, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented in hardware or software will depend on the specific application and design constraints imposed on the overall system. A skilled technician can implement the described functionality in a variety of ways for each specific application. However, such implementation decisions should not be construed as causing a departure from the scope of the present disclosure.

The description of the presented embodiments is provided to enable anyone skilled in the art to use or practice the present invention. Various modifications to these embodiments will be apparent to those skilled in the art. The general principles defined herein may be applied to other embodiments without departing from the scope of the disclosure. Therefore, the present invention is not limited to the embodiments presented herein. The present invention is to be interpreted in the broadest scope consistent with the principles and novel features presented herein.

In this disclosure, network function, artificial neural network, and neural network may be used interchangeably.

When the terms “about” or “approximately” are used in this disclosure in connection with a numerical value, that numerical value is intended to include a deviation of ±10% around the stated numerical value.

Referring to FIG. 1, a system for detecting landmarks from a 3D volume image may include an imaging device 10 that captures a 3D volume image of a target area of an object, a computing device 20 that acquires the 3D volume image and requests landmark detection, and a server 100 that detects landmarks from the 3D volume image. The components shown in FIG. 1 are exemplary, and additional components may exist or some of the components may be omitted.

According to some embodiments of the present disclosure, the imaging device 10, the computing device 20, and the server 100 may exchange, over a communication network, data for detecting landmarks from a 3D volume image.

The imaging device 10 is a device for capturing a 3D volume image of a target area of an object (e.g., a patient) and may include, for example, a computed tomography (CT) device, a cone beam CT (CBCT) device, a multi-detector CT (MDCT) device, and/or a magnetic resonance imaging (MRI) device. For example, the target area may include, but is not limited to, the head including the oral cavity.

Specifically, the imaging device 10 may provide a 3D volume image of a target area including the patient's oral cavity to the computing device 20 or the server 100 through a communication network (not shown).

The computing device 20 is a device for acquiring a 3D volume image from the imaging device 10 and requesting landmark detection for the 3D volume image, and may include a PC, a laptop computer, a terminal, and/or any electronic device having network connectivity.

Specifically, the computing device 20 may receive a 3D volume image of a target area of an object from the imaging device 10 and transmit to the server 100 a request to detect landmarks for the target area based on the 3D volume image. Additionally, the computing device 20 may display an output screen presenting data about the landmarks detected by the server 100. For example, when the server 100 operates as a web server, the computing device 20 may access a website related to landmark detection hosted by the server 100 through an application such as a web browser and display a web page (i.e., an interface screen) of that website.

According to some embodiments of the present disclosure, the website may include a first web page for uploading a 3D volume image and a second web page for outputting data on landmarks detected by the server. For landmark detection, the 3D volume image may be uploaded through the first web page and transmitted to the server 100.

According to some embodiments of the present disclosure, the computing device 20 may be any entity capable of processing, storing, and outputting any data, including a processor, a storage unit (memory and persistent storage media), and a display unit.

The processor in the present disclosure may consist of one or more cores and may include any type of processor of the computing device 20, such as a central processing unit (CPU), a general purpose graphics processing unit (GPGPU), or a tensor processing unit (TPU), that requests landmark detection and displays an output screen by executing instructions stored in memory. The processor may read the computer program stored in the memory, transmit a landmark detection request according to an embodiment of the present disclosure to the server 100, and output data about the landmarks provided by the server 100.

The memory in the present disclosure may store a program for the operation of a processor, and may temporarily or permanently store input/output data. Memory may include at least one type of storage medium among flash memory type, hard disk type, multimedia card micro type, card type memory (e.g., SD or XD memory), RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), magnetic memory, magnetic disk, and optical disk. These memories can be operated under processor control. Additionally, memory and storage may be used interchangeably with each other in the present disclosure.

The server 100 is a device that detects landmarks for a target area based on a three-dimensional volume image provided from the computing device 20, and may include any type of computer system or computer device, such as a computer, digital processor, portable device, or device controller.

Specifically, the server 100 may train an artificial intelligence-based model and detect landmarks from a 3D volume image using the trained model. More specifically, the server 100 may detect landmarks for the user's target area by performing at least one of object detection, classification, or segmentation on a 3D volume image using an artificial intelligence-based model.

According to some embodiments of the present disclosure, the server 100 may be any entity capable of processing and storing any data, including a processor and a storage unit (memory and persistent storage media).

A processor in the present disclosure may consist of one or more cores and may include any type of processor of a computing device that performs landmark detection by executing instructions stored in memory, such as a central processing unit (CPU), a general purpose graphics processing unit (GPGPU), a tensor processing unit (TPU), a graphics processing unit (GPU), or a neural processing unit (NPU). The processor may read a computer program stored in a memory and perform landmark detection according to an embodiment of the present disclosure.

The memory in the present disclosure may store a program for the operation of a processor, and may temporarily or permanently store input/output data. Memory may include at least one type of storage medium among flash memory type, hard disk type, multimedia card micro type, card type memory (e.g., SD or XD memory), RAM, SRAM, ROM, EEPROM, PROM, magnetic memory, magnetic disk, and optical disk. These memories can be operated under processor control. Additionally, memory and storage may be used interchangeably with each other in the present disclosure.

Figure 2 is a block diagram of a server for detecting landmarks from a 3D volume image according to an embodiment of the present disclosure. The configuration of the server 100 shown in FIG. 2 is only a simplified example. In one embodiment of the present disclosure, the server 100 may include other components for implementing the computing environment of the server 100, and only some of the disclosed components may constitute the server 100.

The server 100 may include a communication unit 110, a memory 120, and a processor 130. However, the above-described components are not essential for implementing the server 100, so the server 100 may have more or fewer components than those listed above. Here, each component may be composed of a separate chip, module, or device, or may be included in one device.

The communication unit 110 according to an embodiment of the present disclosure may include any type of wired/wireless Internet module for network connection. Additionally, the communication unit 110 may include a short range communication module. Short-distance communication technologies include Bluetooth, Radio Frequency Identification (RFID), infrared data association (IrDA), Ultra-Wideband (UWB), and ZigBee.

The techniques described herein can be used in the networks mentioned above, as well as other networks.

According to some embodiments of the present disclosure, the communication unit 110 connects the server 100 to enable communication with an external device. The communication unit 110 can be connected to the imaging device 10 using wired/wireless communication and receive a 3D volume image.

According to an embodiment of the present disclosure, the storage unit 120 may store any type of information generated or determined by the processor 130 and any type of information received by the communication unit 110. According to some embodiments of the present disclosure, the storage unit 120 may store various data for detecting landmarks from a 3D volume image.

Storage unit 120 may include memory and/or persistent storage media. The storage unit 120 includes flash memory type, hard disk type, multimedia card micro type, card type memory (e.g. SD or XD memory, etc.), RAM, SRAM, ROM, EEPROM, PROM, It may include at least one type of storage medium among magnetic memory, magnetic disk, and optical disk. The server 100 may operate in relation to web storage that performs the storage function of the storage unit 120 on the Internet. The description of the storage unit described above is merely an example, and the present disclosure is not limited thereto.

The processor 130 may be composed of one or more cores and may include a processor for data analysis and deep learning, such as a central processing unit (CPU), a general purpose graphics processing unit (GPGPU), a tensor processing unit (TPU), or a neural processing unit (NPU) of a computing device.

The processor 130 may read the computer program stored in the memory 120 and perform data processing to detect landmarks from the 3D volume image according to an embodiment of the present disclosure.

Specifically, in order to detect landmarks from a 3D volume image, the processor 130 may use separate artificial intelligence-based models, each trained to detect a plurality of landmarks for a first part of the target area or for a second part different from the first part. Hereinafter, a method for detecting a plurality of landmarks using an artificial intelligence-based model will be described in detail with reference to FIGS. 4 to 12.

According to various embodiments of the present disclosure, the processor 130 may perform operations for learning a neural network. The processor 130 may perform calculations for learning a neural network, such as processing input data for learning in deep learning (DL), extracting features from input data, calculating errors, and updating the weights of the neural network using backpropagation. At least one of the CPU, GPGPU, TPU, and NPU of the processor 130 may process learning of the network function. For example, the CPU and GPGPU can work together to process learning of network functions and data classification using network functions. Additionally, in an embodiment of the present disclosure, the processors of a plurality of computing devices can be used together to process learning of network functions and data classification using network functions. Additionally, a computer program executed in the server 100 according to an embodiment of the present disclosure may be a CPU, GPGPU, TPU, or NPU executable program.
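As a minimal illustration of the learning calculations mentioned above (computing an error and updating weights from its gradient), a single training step for a one-input linear unit may be sketched as follows. The squared-error loss, the learning rate, and the function name are assumptions of this sketch and are not taken from the disclosure.

```python
from typing import Tuple


def train_step(
    w: float, b: float, x: float, target: float, lr: float = 0.1
) -> Tuple[float, float, float]:
    """One gradient-descent update for the unit pred = w * x + b."""
    # Forward pass: compute the prediction and its error.
    pred = w * x + b
    error = pred - target
    loss = error ** 2  # loss measured before the update

    # Backward pass: gradients of the squared error with respect to w and b.
    grad_w = 2.0 * error * x
    grad_b = 2.0 * error

    # Weight update: step against the gradient.
    return w - lr * grad_w, b - lr * grad_b, loss
```

Repeating this step on the same sample drives the loss toward zero; a real network applies the same chain-rule computation layer by layer.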

Throughout this specification, computational model, neural network, and network function may be used interchangeably. A neural network can generally consist of a set of interconnected computational units, which can be referred to as nodes. These nodes may also be referred to as neurons. A neural network consists of at least one node. Nodes (or neurons) that make up neural networks may be interconnected by one or more links. In one embodiment, the detection model and/or extraction model herein may include the neural network described above.

Within a neural network, one or more nodes connected through a link may form a relative input node and output node relationship. The concepts of input node and output node are relative, and any node in an output node relationship with one node may be in an input node relationship with another node, and vice versa. As described above, input node to output node relationships can be created around links. One or more output nodes can be connected to one input node through a link, and vice versa.

In a relationship between an input node and an output node connected through one link, the value of the data of the output node may be determined based on the data input to the input node. Here, the link connecting the input node and the output node may have a weight. Weights may be variable and may be varied by the user or an algorithm in order for the neural network to perform the desired function. For example, when one or more input nodes are connected to one output node by respective links, the value of the output node may be determined based on the values input to the input nodes connected to the output node and the weights set on the links corresponding to the respective input nodes.
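One common way of determining an output node value from input values and link weights is a weighted sum, optionally followed by an activation function. The sketch below assumes that form; the disclosure itself states only that the value is determined based on the inputs and the weights, and the function name is hypothetical.

```python
from typing import Callable, Optional, Sequence


def output_node_value(
    inputs: Sequence[float],
    weights: Sequence[float],
    activation: Optional[Callable[[float], float]] = None,
) -> float:
    """Weighted sum of input node values, one weight per incoming link."""
    if len(inputs) != len(weights):
        raise ValueError("each input node needs exactly one link weight")
    total = sum(x * w for x, w in zip(inputs, weights))
    # The activation is optional here; without it the node is purely linear.
    return activation(total) if activation is not None else total
```

For inputs (1.0, 2.0) with weights (0.5, 0.25), the weighted sum is 0.5 + 0.5 = 1.0.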

As described above, in a neural network, one or more nodes are interconnected through one or more links to form an input node and output node relationship within the neural network. The characteristics of the neural network can be determined according to the number of nodes and links within the neural network, the correlation between the nodes and links, and the value of the weight assigned to each link. For example, if two neural networks have the same number of nodes and links but different link weight values, the two neural networks may be recognized as different from each other.

A neural network may consist of a set of one or more nodes. A subset of the nodes that make up a neural network can form a layer. Some of the nodes constituting the neural network may form one layer based on their distances from the initial input node. For example, the set of nodes at a distance n from the initial input node may constitute the n-th layer. The distance from the initial input node can be defined by the minimum number of links that must be passed to reach the node from the initial input node. However, this definition of a layer is arbitrary and given for explanation purposes, and the order of a layer within a neural network may be defined in a different way than described above. For example, a layer of nodes may be defined by distance from the final output node.
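
The layer definition above (minimum number of links from the initial input node) can be sketched with a breadth-first search. The toy network, the node names, and the function `layers_by_distance` are hypothetical and serve only to illustrate the definition.

```python
from collections import deque


def layers_by_distance(links, initial_inputs):
    """Group nodes by the minimum number of links from any initial input node."""
    distance = {node: 0 for node in initial_inputs}
    queue = deque(initial_inputs)
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, []):
            if nxt not in distance:  # first visit in BFS is the shortest path
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    layers = {}
    for node, d in distance.items():
        layers.setdefault(d, []).append(node)
    return {d: sorted(nodes) for d, nodes in sorted(layers.items())}


# Hypothetical toy network: two input nodes, two hidden nodes, one output node.
links = {"in1": ["h1", "h2"], "in2": ["h2"], "h1": ["out"], "h2": ["out"]}
layers = layers_by_distance(links, ["in1", "in2"])
print(layers)
```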

The initial input node may refer to one or more nodes in the neural network through which data is directly input without going through links in relationships with other nodes. Alternatively, in a neural network, in the relationship between nodes based on links, it may mean nodes that do not have other input nodes connected by links. Similarly, the final output node may refer to one or more nodes that do not have an output node in their relationship with other nodes among the nodes in the neural network. Additionally, hidden nodes may refer to nodes constituting a neural network other than the initial input node and the final output node.

The neural network according to an embodiment of the present disclosure may be a neural network in which the number of nodes in the input layer is the same as the number of nodes in the output layer, and the number of nodes decreases and then increases again as it progresses from the input layer to the hidden layer. In addition, the neural network according to another embodiment of the present disclosure may be a neural network in which the number of nodes in the input layer is less than the number of nodes in the output layer, and the number of nodes decreases as it progresses from the input layer to the hidden layer. In addition, the neural network according to another embodiment of the present disclosure may be a neural network in which the number of nodes in the input layer is greater than the number of nodes in the output layer, and the number of nodes increases as it progresses from the input layer to the hidden layer. A neural network according to another embodiment of the present disclosure may be a neural network that is a combination of the above-described neural networks.

A deep neural network (DNN) may refer to a neural network that includes multiple hidden layers in addition to the input layer and output layer. Deep neural networks make it possible to identify latent structures in data. In other words, it is possible to identify the latent structure of a photo, text, video, voice, or music (e.g., what object is in the photo, what the content and emotion of the text are, what the content and emotion of the voice are, etc.). Deep neural networks may include convolutional neural networks (CNN), recurrent neural networks (RNN), autoencoders, generative adversarial networks (GAN), restricted Boltzmann machines (RBM), deep belief networks (DBN), Q networks, U networks, Siamese networks, etc. The description of the deep neural network described above is only an example and the present disclosure is not limited thereto.

In one embodiment of the present disclosure, the network function may include an autoencoder. An autoencoder may be a type of artificial neural network for outputting output data similar to the input data. The autoencoder may include at least one hidden layer, and an odd number of hidden layers may be placed between the input and output layers. The number of nodes in each layer may be reduced from the number of nodes in the input layer down to an intermediate layer called the bottleneck layer (encoding), and then expanded from the bottleneck layer to the output layer (symmetrical to the input layer), mirroring the reduction. Autoencoders can perform nonlinear dimensionality reduction. The numbers of nodes in the input layer and the output layer may correspond to the dimension of the input data after preprocessing. In an autoencoder structure, the number of nodes in the hidden layers included in the encoder may decrease as the distance from the input layer increases. If the number of nodes in the bottleneck layer (the layer with the fewest nodes, located between the encoder and decoder) is too small, not enough information may be conveyed, so the number may be maintained above a certain number (e.g., more than half of the number of nodes in the input layer, etc.).
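
The symmetric narrowing and widening of an autoencoder can be sketched by listing the node count of each layer. The linear shrinking rule, the concrete sizes, and the function name `autoencoder_layer_sizes` are assumptions chosen for illustration; the disclosure does not prescribe them.

```python
def autoencoder_layer_sizes(input_dim, bottleneck_dim, encoder_steps):
    """Return node counts per layer: input -> bottleneck -> output (mirrored).

    Assumption: the encoder shrinks linearly; the decoder mirrors the encoder.
    """
    if not 0 < bottleneck_dim < input_dim:
        raise ValueError("bottleneck must be smaller than the input layer")
    if encoder_steps < 1:
        raise ValueError("encoder_steps must be at least 1")
    step = (input_dim - bottleneck_dim) / encoder_steps
    encoder = [round(input_dim - step * i) for i in range(encoder_steps)]
    # Mirroring the encoder makes the output layer symmetrical to the input layer.
    return encoder + [bottleneck_dim] + encoder[::-1]


# Bottleneck kept above half of the input size, as the text suggests.
sizes = autoencoder_layer_sizes(input_dim=64, bottleneck_dim=40, encoder_steps=2)
hidden_layers = sizes[1:-1]
print(sizes, len(hidden_layers))
```

With this construction the number of hidden layers is always odd, because the bottleneck sits alone in the middle of two mirrored halves.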

A neural network may be trained in at least one of supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning. Learning of a neural network may be a process of applying knowledge for the neural network to perform a specific operation to the neural network.

Neural networks can be trained to minimize output errors. In neural network learning, learning data is repeatedly input into the neural network, the error between the output of the neural network and the target for the learning data is calculated, and the error is backpropagated from the output layer of the neural network toward the input layer in the direction of reducing the error, thereby updating the weight of each node in the neural network. In the case of supervised learning, learning data in which the correct answer is labeled for each learning data is used (i.e., labeled learning data), while in the case of unsupervised learning, the correct answer may not be labeled in each learning data. That is, for example, in the case of supervised learning on data classification, the training data may be data in which each training data is labeled with a category. Labeled training data is input to the neural network, and the error can be calculated by comparing the output (category) of the neural network with the label of the training data. As another example, in the case of unsupervised learning on data classification, the error can be calculated by comparing the input training data with the neural network output. The calculated error is backpropagated in the reverse direction (i.e., from the output layer to the input layer) in the neural network, and the connection weight of each node in each layer of the neural network can be updated according to backpropagation. The amount of change in the connection weight of each updated node may be determined according to the learning rate. The neural network's calculation of input data and backpropagation of errors can constitute a learning cycle (epoch). The learning rate may be applied differently depending on the number of repetitions of the learning cycle of the neural network.
For example, in the early stages of neural network training, a high learning rate can be used to increase efficiency by allowing the neural network to quickly achieve a certain level of performance, and in the later stages of training, a low learning rate can be used to increase accuracy.
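
A learning rate that is high early in training and lower later can be sketched as a step-decay schedule applied to a single gradient-descent weight update. The decay rule, the numeric values, and the function names are illustrative assumptions and are not taken from the disclosure.

```python
def learning_rate(epoch, initial_rate=0.1, decay=0.5, step=10):
    """Step decay: multiply the rate by `decay` every `step` epochs.

    Assumption: all numeric values here are illustrative only.
    """
    return initial_rate * (decay ** (epoch // step))


def update_weight(weight, gradient, epoch):
    """One gradient-descent update whose step size follows the schedule."""
    return weight - learning_rate(epoch) * gradient


early = update_weight(1.0, 2.0, epoch=0)   # large step early in training
late = update_weight(1.0, 2.0, epoch=30)   # smaller step late in training
print(early, late)
```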

In the learning of neural networks, the training data can generally be a subset of real data (i.e., the data to be processed using the learned neural network), and thus there may be a learning cycle in which the error on the training data decreases but the error on the real data increases. Overfitting is a phenomenon in which errors on actual data increase due to excessive learning on training data. For example, a phenomenon in which a neural network that learned what a cat is by being shown yellow cats fails to recognize a non-yellow cat as a cat may be a type of overfitting. Overfitting can cause errors in machine learning algorithms to increase. To prevent such overfitting, various optimization methods can be used. For example, methods such as increasing the training data, regularization, dropout that disables some of the network nodes during the learning process, and use of a batch normalization layer can be applied.

Hereinafter, a method for detecting landmarks from a 3D volume image will be described with reference to FIGS. 4 to 12.

FIG. 4 is a flowchart illustrating an example of a method for detecting a landmark from a 3D volume image according to an embodiment of the present disclosure. In the presented embodiment, the operations of FIG. 4 may be performed by the processor 130 of the server 100.

Referring to FIG. 4, the processor 130 obtains input data from a 3D volume image in which a target area of an object is captured (S400). Here, the target area of the object may refer to the head including the patient's oral region, but is not limited thereto.

Specifically, the processor 130 may extract a 2D skin image representing the frontal (coronal) and lateral (sagittal) skin regions of the patient's head from a 3D volume image of the patient's head. Furthermore, the processor 130 may extract a depth image. The two-dimensional skin image and depth image extracted in this way can be used as input data.

To extract a 2D skin image, the processor 130 may use an extraction algorithm, but is not limited to this. For example, the processor 130 may extract a 2D skin image and a depth image from a 3D volume image through software rendering.

Referring again to FIG. 4, the processor 130 inputs the input data obtained in this way to the first landmark detection model (S410), and detects a plurality of landmarks for the first part of the target area using the first landmark detection model (S420). Here, the first landmark detection model may be an artificial intelligence-based model learned to detect a plurality of landmarks from each skin area on the front of the head and the side of the head based on input data.

In order to detect a plurality of landmarks from each of the skin areas on the front of the head and the side of the head, the first landmark detection model may include a first detection model learned to detect a plurality of landmarks from the skin area on the front of the head and a second detection model learned to detect a plurality of landmarks from the skin area on the side of the head. Hereinafter, the first landmark detection model will be described in detail with reference to FIG. 5.

Referring to FIG. 5, the first landmark detection model 500 may include a first detection model 530 learned to detect a plurality of landmarks of the first group from the first two-dimensional image 510 corresponding to the front of the head, and a second detection model 540 learned to detect a plurality of landmarks of the second group from the second two-dimensional image 520 corresponding to the side of the head. The learning process of the first landmark detection model 500 will be described in detail with reference to FIG. 9 below.

Specifically, the processor 130 may input the first two-dimensional image 510 and the depth image into the first detection model 530 and detect a plurality of landmarks 550 of the first group using the first detection model 530. Subsequently, the processor 130 may input the second two-dimensional image 520 into the second detection model 540 and detect a plurality of landmarks 560 of the second group using the second detection model 540.

The first detection model 530 may output two-dimensional coordinate values for a plurality of landmarks 550 of the first group by using the first two-dimensional image 510 and the depth image as input. Additionally, the second detection model 540 may receive the second two-dimensional image 520 as input and output two-dimensional coordinate values for the plurality of landmarks 560 of the second group.

For example, the two-dimensional coordinate values output for the plurality of landmarks 550 of the first group may consist of x-axis coordinate values and z-axis coordinate values, and the two-dimensional coordinate values output for the plurality of landmarks 560 of the second group may consist of y-axis coordinate values and z-axis coordinate values.

The processor 130 may combine at least some of the detected two-dimensional coordinate values for the plurality of landmarks 550 of the first group and the two-dimensional coordinate values for the plurality of landmarks 560 of the second group to generate three-dimensional coordinate values as a plurality of landmarks for the first portion. Through this, the processor 130 can detect 19 landmarks (e.g., Glabella, soft tissue Nasion, Nasal Dorsum, etc.) for the skin area using the first landmark detection model 500.

Hereinafter, the schematic structure of the first detection model 530 and the second detection model 540 will be described with reference to FIG. 6.

Figure 6 is an example diagram for explaining the schematic structure of a detection model according to an embodiment of the present disclosure. In the presented embodiment, each of the first detection model 530 and the second detection model 540 of FIG. 5 may have the structure of the detection model 600 of FIG. 6.

Referring to FIG. 6, the detection model 600 may include a convolutional neural network (CNN) 620, which includes at least one convolution layer 610 that performs a convolution operation to extract features of the input image (e.g., through a convolution channel, a pooling channel, etc.) and at least one pooling layer 630 that performs a pooling operation, and at least one fully connected layer 640 that performs classification, but is not limited thereto. Here, the CNN 620 may preferably be a CNN such as MobileNet V1/V2, which has sufficiently high accuracy and low computational complexity, uses low energy, and has a small model size, but is not limited thereto. Since the structure of the detection model 600 described in FIG. 6 is an example, additional layers may exist or some of the layers may be omitted.

Referring again to FIG. 4, the processor 130 inputs the 3D volume image into the second landmark detection model (S430), and detects a plurality of landmarks for the second part of the target area using the second landmark detection model (S440). Here, the second landmark detection model may include a third detection model for detecting, based on the 3D volume image, a plurality of 3D cropped images in which a plurality of landmarks for the second part are located, and a fourth detection model for detecting a plurality of landmarks for the second part from the plurality of detected 3D cropped images. For example, the second landmark detection model may be a C2F (Coarse to Fine) 3D CNN, but is not limited thereto. For example, a 3D cropped image in the present disclosure may mean an image edited to include at least a portion of the target image. In the present disclosure, a 3D cropped image including a landmark corresponding to the second part in a 3D volume image may be detected.

Hereinafter, the second landmark detection model will be described in detail with reference to FIG. 7.

Referring to FIG. 7, the second landmark detection model 700 may include a third detection model 720 learned to detect, from the 3D volume image 710, cropped images 730 corresponding to candidate areas where each of a plurality of landmarks for the second part is located, and a plurality of fourth detection models 740 learned to detect a plurality of landmarks for the second part from the plurality of cropped images 730 detected by the third detection model 720. Here, the second part refers to the bone area of the head, and the cropped image refers to a 3D cropped image.

The learning process of the second landmark detection model 700 will be described in detail with reference to FIG. 11 below.

In an embodiment of the present disclosure, the plurality of fourth detection models are configured to correspond to each of the plurality of cropped images, and they may have a parallel structure. Additionally, the candidate area may include at least one position at which each of the plurality of landmarks for the second part is predicted to be located, and each candidate area may overlap at least partially with another candidate area. Specifically, the processor 130 may convert the 3D volume image 710 into a 3D tensor and input the 3D tensor into the third detection model 720 as input data. The processor 130 may use the third detection model 720 to detect a candidate area including the position at which each of the plurality of landmarks for the second part is predicted to be located and generate a cropped image corresponding to the detected candidate area. In various embodiments, the processor 130 may detect a plurality of cropped images corresponding to a plurality of candidate areas using the third detection model 720.

That is, the third detection model 720 according to an embodiment of the present disclosure may take a 3D tensor as input and output a cropped image corresponding to a candidate area for each of a plurality of landmarks for the second part, or may output a 3D volume image representing the candidate area of each of the plurality of landmarks.
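
One way to picture the step from a coarse predicted position to a cropped candidate area is to compute the index range of a fixed-size crop that stays inside the volume. This is only a sketch under assumptions: the crop size, the clamping rule, and the function name `crop_bounds` are hypothetical, and the disclosure does not state how the third detection model 720 derives its crops.

```python
def crop_bounds(center, crop_size, volume_shape):
    """Return per-axis (start, stop) indices of a crop clamped to the volume."""
    bounds = []
    for c, size, dim in zip(center, crop_size, volume_shape):
        if size > dim:
            raise ValueError("crop is larger than the volume")
        start = c - size // 2
        start = max(0, min(start, dim - size))  # keep the crop inside the volume
        bounds.append((start, start + size))
    return bounds


# Hypothetical coarse position close to two borders of a 256^3 volume.
bounds = crop_bounds(center=(10, 100, 250),
                     crop_size=(64, 64, 64),
                     volume_shape=(256, 256, 256))
print(bounds)
```

Because each crop is computed independently per landmark, neighbouring candidate areas can overlap, which matches the description above.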

Next, the processor 130 may input the plurality of cropped images 730 into the respective fourth detection models 740 corresponding to each of the plurality of cropped images 730, and detect a plurality of landmarks 750 for the second part using the plurality of fourth detection models 740. Each fourth detection model may use each cropped image as an input and output at least one landmark detected in each candidate area as output data. For example, the output data may include the 3D coordinate values (x1, y1, z1) for the first landmark detected from the candidate area of the first cropped image 730a, the 3D coordinate values (x2, y2, z2) for the second landmark detected from the candidate area of the second cropped image 730b, the 3D coordinate values (x3, y3, z3) for the third landmark detected from the candidate area of the third cropped image 730c, the 3D coordinate values (x4, y4, z4) for the fourth landmark detected from the candidate area of the fourth cropped image 730d, ..., and the 3D coordinate values (xn, yn, zn) for the nth landmark detected from the candidate area of the nth cropped image 730n. Here, n may be a natural number greater than 4. That is, each of the plurality of fourth detection models 740 according to an embodiment of the present disclosure may take each cropped image as an input and output a three-dimensional coordinate value 750 for the landmark detected from the candidate area of each cropped image.

The processor 130 may generate a 3D landmark 760 for the second portion including the 3D coordinate values detected in this way. Specifically, the processor 130 may convert the 3D coordinate values 750 for the first landmark, the second landmark, the third landmark, ..., and the n-th landmark into 3D landmark coordinate values in the 3D volume image to generate a plurality of 3D landmarks for the bone area of the target region. Through this, the processor 130 can use the second landmark detection model 700 to detect 49 landmarks for the bone area (e.g., Nasion, sella, R, L Po, R, L Or, R, L Pterygoid, etc.).
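
The conversion of per-crop coordinate values into coordinate values in the 3D volume image can be sketched as adding each crop's origin to the coordinates predicted inside that crop. This offset rule is an assumption (the disclosure does not describe the conversion), and the landmark names, origins, and predicted values below are hypothetical.

```python
def to_volume_coordinates(local_point, crop_origin):
    """Shift a crop-local coordinate by the crop's origin in the volume.

    Assumption: crops are axis-aligned and share the volume's voxel spacing.
    """
    return tuple(p + o for p, o in zip(local_point, crop_origin))


# Hypothetical crop origins and per-crop predictions for two bone landmarks.
crop_origins = {"Nasion": (0, 68, 192), "Sella": (96, 96, 96)}
local_predictions = {"Nasion": (12.5, 30.0, 40.0), "Sella": (31.0, 33.5, 20.0)}

landmarks = {
    name: to_volume_coordinates(local_predictions[name], crop_origins[name])
    for name in local_predictions
}
print(landmarks)
```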

In various embodiments, the plurality of fourth detection models 740 may be artificial intelligence-based models learned to detect each landmark in response to each of the plurality of landmarks. For example, if the number of landmarks for the bone area is 49, each of the plurality of fourth detection models 740 may be an artificial intelligence-based model learned to detect each of the 49 landmarks. In this case, the processor 130 inputs each of the plurality of 3D cropped images into each of the 49 fourth detection models, and uses each of the 49 fourth detection models to determine 49 3D landmark coordinates from the plurality of 3D cropped images. The value can be detected. In this way, by providing a detection model for each landmark, high-accuracy detection of 3D landmarks is possible.

Hereinafter, the schematic structure of the third detection model 720 and the plurality of fourth detection models 740 will be described with reference to FIG. 8.

Figure 8 is an example diagram for explaining the schematic structure of a detection model according to an embodiment of the present disclosure. In the presented embodiment, each of the third detection model 720 and the plurality of fourth detection models 740 of FIG. 7 may have the structure of the detection model 800 of FIG. 8.

Referring to FIG. 8, the detection model 800 may include at least one convolution layer 805 that performs a convolution operation, a batch normalization (BN) layer 810, an activation function (ReLU) layer 815, a max pooling layer 820, a CNN 825, a BN layer 830, and a fully connected (FC) layer 835. Here, the CNN 825 may preferably be a CNN such as DenseNet121, which mitigates the problem that weights are not properly updated in layers close to the input layer (the vanishing-gradient problem), provides enhanced feature propagation, and can reduce the number of parameters by reusing feature values, but is not limited thereto. Since the structure of the detection model 800 described in FIG. 8 is exemplary, additional layers may exist or some of the layers may be omitted.

Referring again to FIG. 4, the processor 130 generates a plurality of landmarks for the target area based on the plurality of landmarks for the first part and the plurality of landmarks for the second part (S450). Specifically, the processor 130 may combine the 3D coordinate values of the plurality of landmarks for the first part detected using the first landmark detection model 500 with the 3D coordinate values of the plurality of landmarks for the second part detected using the second landmark detection model 700 to generate the 3D coordinate values of the plurality of landmarks for the target area.

Furthermore, the processor 130 may provide a web page indicating 3D coordinate values of a plurality of landmarks for the target area.

Through this, the landmark detection method according to an embodiment of the present disclosure extracts landmarks by distinguishing the skin area and the bone area of the head in a 3D volume image using artificial intelligence-based models, thereby minimizing landmark detection time and improving detection accuracy.

Figure 9 is an example diagram for explaining the learning process of the first landmark detection model according to an embodiment of the present disclosure. In the presented embodiment, the following operations may be performed by the processor 130 of the server 100.

Referring to FIG. 9, the processor 130 may perform a first preprocessing 905 to convert the learning data 900 into a three-dimensional tensor. Here, the learning data 900 is a 3D volume image obtained by photographing a target area corresponding to the head, and may be 200 to 300 3D CT images. For example, the first preprocessing 905 may include, but is not limited to, a linear transform that converts a 3D volume image into a 3D tensor and resampling that converts the 3D tensor into a 3D tensor with a voxel size of (1, 1, 1).
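
The effect of resampling to a voxel size of (1, 1, 1) on the tensor shape and on landmark coordinates can be sketched as follows. The source spacing values, the rounding rule, and the function names are assumptions for illustration; the actual interpolation of voxel intensities is not shown.

```python
def resampled_shape(shape, spacing, target_spacing=(1.0, 1.0, 1.0)):
    """Shape of the tensor after resampling to the target voxel size."""
    return tuple(
        max(1, round(n * s / t)) for n, s, t in zip(shape, spacing, target_spacing)
    )


def resample_point(index, spacing, target_spacing=(1.0, 1.0, 1.0)):
    """Map a voxel index (e.g., a labeled landmark) into the resampled grid."""
    return tuple(i * s / t for i, s, t in zip(index, spacing, target_spacing))


# Hypothetical CT volume: 512 x 512 x 300 voxels with 0.5 x 0.5 x 1.0 spacing.
new_shape = resampled_shape((512, 512, 300), (0.5, 0.5, 1.0))
new_point = resample_point((100, 200, 50), (0.5, 0.5, 1.0))
print(new_shape, new_point)
```

Landmark labels have to be rescaled with the same factors as the image so that they still point at the same anatomy after resampling.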

Subsequently, the processor 130 may perform first data augmentation 910 on the 3D tensor to increase the amount of learning data. Here, the first data augmentation 910 may use a data augmentation method that generates a plurality of 3D tensors and a plurality of 3D coordinate values by rotating (shifting) the 3D tensor and each 3D coordinate value for the 3D tensor by a preset angle. For example, the processor 130 may generate a plurality of 3D tensors by rotating a 3D tensor about the y-axis and the z-axis. In the case of CT images, the imaging height may be different for each user and the user may move the head during imaging, so the processor 130 can rotate the 3D tensor by various angles and use it as learning data. Through this, for example, augmented images amounting to approximately 21 times the number of existing images can be generated.
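
Rotating a labeled 3D coordinate value by a set of preset angles can be sketched as below. Only the coordinate side of the augmentation is shown; the 3D tensor itself would have to be rotated by the same angle about the same center with an image-processing library. The angle list, the rotation center, and the function names are assumptions for illustration.

```python
import math


def rotate_about_z(point, angle_deg, center=(0.0, 0.0, 0.0)):
    """Rotate a 3D point about an axis parallel to z passing through `center`."""
    theta = math.radians(angle_deg)
    x, y, z = (p - c for p, c in zip(point, center))
    xr = x * math.cos(theta) - y * math.sin(theta)
    yr = x * math.sin(theta) + y * math.cos(theta)
    return (xr + center[0], yr + center[1], z + center[2])


def augment_landmark(point, angles_deg, center):
    """Produce one rotated copy of the landmark per preset angle."""
    return [rotate_about_z(point, a, center) for a in angles_deg]


# Hypothetical landmark and preset angles (degrees).
augmented = augment_landmark((110.0, 100.0, 50.0),
                             [-10, -5, 0, 5, 10],
                             center=(100.0, 100.0, 100.0))
print(len(augmented))
```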

Next, the processor 130 may perform first learning data acquisition 915 for extracting a two-dimensional image of the front of the head from the three-dimensional tensor and second learning data acquisition 920 for extracting a two-dimensional image of the side of the head. First learning data acquisition 915 and second learning data acquisition 920 may be performed through algorithms for this purpose.

Specifically, to obtain the first learning data 915, the processor 130 may render a 3D tensor and extract a 2D image and a depth image of the front of the head. To acquire second learning data 920, the processor 130 may extract a 2D image of the side of the head by rendering a 3D tensor. Here, the extracted two-dimensional image may be, for example, a two-dimensional tensor.

Subsequently, the processor 130 may perform a second preprocessing 925 on the first learning data and a second preprocessing 930 on the second learning data. Here, the second preprocessing 925 for the first learning data and the second preprocessing 930 for the second learning data may each include, but are not limited to, normalization of the two-dimensional image and padding that fills empty space with a preset value so that the two-dimensional image has a preset size.
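
Normalization followed by padding to a preset size can be sketched on a small two-dimensional image held as nested lists. The min-max normalization, the bottom and right placement of the padding, and the fill value are assumptions; the disclosure only states that the image is normalized and padded with a preset value.

```python
def normalize(image):
    """Min-max normalize pixel values to the range [0, 1] (assumed scheme)."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) or 1.0  # avoid division by zero for a constant image
    return [[(v - lo) / scale for v in row] for row in image]


def pad_to_size(image, height, width, fill=0.0):
    """Pad on the right and bottom with `fill` until the preset size is reached."""
    if len(image) > height or any(len(row) > width for row in image):
        raise ValueError("image is larger than the preset size")
    padded = [list(row) + [fill] * (width - len(row)) for row in image]
    padded += [[fill] * width for _ in range(height - len(image))]
    return padded


image = [[0, 50], [100, 200]]
result = pad_to_size(normalize(image), height=3, width=4)
print(result)
```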

Next, the processor 130 may perform secondary data augmentation 935 on the second preprocessed first learning data and secondary data augmentation 940 on the second preprocessed second learning data. For each secondary data augmentation, the data augmentation method described above may be used. For example, the processor 130 may rotate the second preprocessed first learning data, that is, the second preprocessed two-dimensional image for the front of the head, the second preprocessed depth image, and the two-dimensional coordinate values (x, z) for each image, by a preset angle to generate a plurality of two-dimensional images, a plurality of depth images, and a plurality of two-dimensional coordinate values, so that the two-dimensional image, depth image, and two-dimensional coordinate values rotated by various angles can be used as first learning data. In addition, the processor 130 may rotate the second preprocessed second learning data, that is, the second preprocessed two-dimensional image for the side of the head and the two-dimensional coordinate values (y, z) for each image, by a preset angle to generate a plurality of two-dimensional images and a plurality of two-dimensional coordinate values, so that the two-dimensional image and two-dimensional coordinate values rotated by various angles can be used as second learning data. Through this, for example, augmented images amounting to approximately 10 to 16 times the number of existing images can be generated.

Subsequently, the processor 130 may perform first model learning 945 and second model learning 950. Here, the first model learning 945 may be an operation of training the first detection model 530 to detect a plurality of landmarks from the skin area on the front of the head by using the first learning data augmented through the secondary data augmentation 935 as input. The second model learning 950 may be an operation of training the second detection model 540 to detect a plurality of landmarks from the skin area on the side of the head by using the second learning data augmented through the secondary data augmentation 940 as input.

Hereinafter, the inference process of the first landmark detection model 500 will be described in detail with reference to FIG. 10.

Figure 10 is an example diagram for explaining the inference process of the first landmark detection model according to an embodiment of the present disclosure. In the presented embodiment, the following operations may be performed by the processor 130 of the server 100.

Referring to FIG. 10, the processor 130 may perform first preprocessing 1000 to convert the 3D volume image 710 into a 3D tensor, first input data acquisition 1010 to extract from the 3D tensor a 2D image of the skin area at the front of the head, and second input data acquisition 1020 to extract a 2D image of the skin area at the side of the head. Here, the first preprocessing 1000, the first input data acquisition 1010, and the second input data acquisition 1020 may be substantially the same operations as the first preprocessing 905, the first learning data acquisition 915, and the second learning data acquisition 920, respectively, described above with reference to FIG. 9.

Specifically, the processor 130 may acquire a first two-dimensional image and a depth image for the front of the head from the 3D tensor as first input data through first input data acquisition 1010, and may acquire a second two-dimensional image of the side of the head as second input data through second input data acquisition 1020.

Subsequently, the processor 130 may perform a second preprocessing 1030 on the first input data and a second preprocessing 1040 on the second input data. Here, the second preprocessing 1030 and 1040 may be substantially the same operations as the second preprocessing 925 and 930 described above with reference to FIG. 9.

Next, the processor 130 may perform two-dimensional landmark prediction 1050 from the second preprocessed first input data and perform two-dimensional landmark prediction 1060 from the second preprocessed second input data. Here, the two-dimensional landmark prediction 1050 and 1060 may mean an operation of using the first landmark detection model 500 described above with reference to FIG. 5 to detect a plurality of landmarks 550 of the first group from the first two-dimensional image 510 and the depth image and to detect a plurality of landmarks 560 of the second group from the second two-dimensional image 520. That is, the processor 130 may obtain two-dimensional landmark coordinate values (x, z) predicted for the front of the head from the second preprocessed first input data, and two-dimensional landmark coordinate values (y, z) predicted for the side of the head from the second preprocessed second input data.

Subsequently, the processor 130 may perform 3D landmark prediction 1070 and perform post-processing 1080 to generate a 3D landmark for the target area. Here, 3D landmark prediction 1070 may refer to the operation of generating 3D coordinate values by combining at least some of the 2D landmark coordinate values obtained through 2D landmark prediction 1050 and 1060. In addition, post-processing 1080 refers to the operation of acquiring modified 3D coordinate values (i.e., 3D coordinate values in a 3D volume image) based on the 3D tensor and the 3D coordinate values, and an algorithm for this purpose may be used.

Specifically, in 3D landmark prediction 1070, the processor 130 may combine the 2D landmark coordinate values (x, z) predicted for the front of the head with the 2D landmark coordinate values (y, z) predicted for the side of the head to generate 3D landmark coordinate values (x, y, z). For example, the processor 130 may combine the x coordinate predicted for the front of the head with the y and z coordinates predicted for the side of the head, or combine the x and z coordinates predicted for the front of the head with the y coordinate predicted for the side of the head, to generate 3D landmark coordinate values (x, y, z) for the skin area.
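
The two combination options described above can be sketched as a small function that takes the frontal (x, z) prediction and the lateral (y, z) prediction and selects which view supplies the z coordinate. The function name `combine_views`, the `z_from` parameter, and the sample values are hypothetical.

```python
def combine_views(front_xz, side_yz, z_from="side"):
    """Build (x, y, z) from a frontal (x, z) and a lateral (y, z) prediction.

    `z_from` selects which view supplies z; both options appear in the text.
    """
    x, z_front = front_xz
    y, z_side = side_yz
    if z_from == "side":
        z = z_side
    elif z_from == "front":
        z = z_front
    else:
        raise ValueError("z_from must be 'side' or 'front'")
    return (x, y, z)


# Hypothetical predictions for one skin landmark.
print(combine_views((10, 30), (20, 32)))                  # x from front; y, z from side
print(combine_views((10, 30), (20, 32), z_from="front"))  # x, z from front; y from side
```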

In post-processing 1080, the processor 130 may search the frontal image in the 3D tensor obtained through the first pre-processing 1000 and detect an edge. Subsequently, the processor 130 finds the coordinate value corresponding to the skin closest to the detected edge and converts the corresponding landmark coordinate value into the found coordinate value, thereby moving the predicted 3D coordinate value to a 3D coordinate value in the skin area, so that the difference between the 3D coordinate value in the skin area and the predicted 3D coordinate value can be minimized. The processor 130 may output the post-processed 3D landmark coordinate values as a plurality of landmarks for the first portion.
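As a hedged illustration of moving a predicted coordinate onto the skin, the following Python sketch snaps each predicted point to the nearest surface voxel of a binary skin mask. This is not the edge detection procedure of the disclosure, which is not specified in detail; a simple nearest-surface search is substituted. The names `surface_voxels` and `snap_to_surface` are hypothetical, and coordinates are assumed to be in array index order.

```python
import numpy as np

def surface_voxels(mask):
    """Return indices of mask voxels that touch the background (6-neighbourhood)."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = np.ones_like(mask)
    for axis in range(3):
        for shift in (-1, 1):
            # Neighbour along one axis; the padding keeps borders as background.
            neighbour = np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
            interior &= neighbour
    return np.argwhere(mask & ~interior)

def snap_to_surface(points, mask):
    """Move each predicted point to the closest surface voxel of the mask."""
    surface = surface_voxels(mask)
    if len(surface) == 0:
        raise ValueError("mask contains no surface voxels")
    points = np.asarray(points, dtype=float)
    # Squared distance from every point to every surface voxel.
    d2 = ((points[:, None, :] - surface[None, :, :]) ** 2).sum(axis=2)
    return surface[d2.argmin(axis=1)]

mask = np.zeros((5, 5, 5), dtype=bool)
mask[1:4, 1:4, 1:4] = True                      # toy "skin" volume
snapped = snap_to_surface([[2.0, 2.0, 4.2], [0.0, 0.0, 0.0]], mask)
```

The brute-force distance matrix is adequate for a few dozen landmarks; for large surfaces a spatial index would likely be preferable.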

Figure 11 is an example diagram for explaining the learning process of a second landmark detection model according to an embodiment of the present disclosure. In the presented embodiment, the following operations may be performed by the processor 130 of the server 100.

Referring to FIG. 11, the processor 130 may perform preprocessing 1110 to convert the learning data 1100 into a three-dimensional tensor. Here, the preprocessing 1110 may use a masking operation method to remove the remaining areas other than the bone area from the head area of the 3D volume image. Specifically, the processor 130 may remove the remaining area excluding the bone area from the head area of the 3D volume image and convert the 3D volume image including the bone area into a 3D tensor.
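The masking operation can be sketched as follows, assuming that bone voxels are brighter than soft tissue, as in CT intensities. The function `mask_bone`, the threshold of 300, and the background value of 0 are hypothetical placeholders and are not taken from the disclosure; any segmentation that yields a bone mask could be substituted.

```python
import numpy as np

def mask_bone(volume, threshold=300.0, background=0.0):
    """Keep voxels at or above the threshold and overwrite the rest.

    Returns the masked volume as a float32 3D tensor and the boolean bone mask.
    """
    volume = np.asarray(volume, dtype=np.float32)
    bone = volume >= threshold
    masked = np.where(bone, volume, np.float32(background))
    return masked, bone

# Toy 1 x 2 x 2 volume: air, soft tissue, and two bone-like intensities.
volume = [[[-1000.0, 40.0], [350.0, 1200.0]]]
tensor, bone_mask = mask_bone(volume)
```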

Subsequently, the processor 130 may perform data augmentation 1120 on the 3D tensor to augment the number of learning data. Here, the data augmentation 1120, like the first data augmentation 910 described above with reference to FIG. 9, may use a data augmentation method that rotates the 3D tensor and each 3D coordinate value for the 3D tensor by a preset angle to generate a plurality of 3D tensors and a plurality of 3D coordinate values.
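The coordinate half of such a rotation augmentation can be sketched in Python as below. The names `rotate_points_about_axis` and `augment_landmarks`, the choice of rotation axis, and the angle set are assumptions made for illustration. The 3D tensor itself would have to be resampled with the same rotation about the same center, which is not shown.

```python
import numpy as np

def rotate_points_about_axis(points, angle_deg, center, axis=2):
    """Rotate (N, 3) points by angle_deg in the plane of the two other axes."""
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    i, j = [a for a in range(3) if a != axis]
    rot = np.eye(3)
    rot[i, i], rot[i, j] = c, -s
    rot[j, i], rot[j, j] = s, c
    # Rotate about the given center rather than about the origin.
    return (points - center) @ rot.T + center

def augment_landmarks(points, angles_deg, center):
    """Generate one rotated copy of the landmark set per preset angle."""
    return [rotate_points_about_axis(points, a, center) for a in angles_deg]

landmarks = np.array([[1.0, 0.0, 0.0], [2.0, 3.0, 4.0]])
copies = augment_landmarks(landmarks, angles_deg=(-10, 10, 90), center=(0, 0, 0))
```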

Next, the processor 130 may perform third model learning 1130 on the augmented 3D tensor. Here, the third model learning 1130 may be an operation of training the third detection model 730, using the training data increased through data augmentation as input, to detect a plurality of candidate regions predicted as the positions of a plurality of 3D landmarks existing in the bone region.

Subsequently, the processor 130 may perform a plurality of fourth model training 1140 on a plurality of cropped images. Here, the plurality of fourth model training 1140 may be an operation of training a plurality of fourth detection models 740 corresponding to a plurality of cropped images to detect a plurality of 3D landmarks from the plurality of cropped images.

Hereinafter, the inference process of the second landmark detection model 700 will be described in detail with reference to FIG. 12.

Figure 12 is an example diagram for explaining the inference process of a second landmark detection model according to an embodiment of the present disclosure. In the presented embodiment, the following operations may be performed by the processor 130 of the server 100.

Referring to FIG. 12, the processor 130 may perform preprocessing 1200 on the 3D volume image 710 to convert it into a 3D tensor, and may perform 3D landmark candidate area prediction 1210 on the 3D tensor. Here, the preprocessing 1200 may be the same operation as the preprocessing 1110 described above with reference to FIG. 11. Specifically, in the 3D landmark candidate area prediction 1210, the processor 130 inputs a 3D tensor and 3D coordinate values into the third detection model 730, and uses the third detection model 730 to generate a plurality of 3D cropped images corresponding to a plurality of candidate areas predicted as the positions of a plurality of 3D landmarks existing in the bone area. That is, the third detection model 730 may output a plurality of 3D cropped images corresponding to a plurality of candidate regions by using the augmented 3D tensor and the augmented 3D coordinate values as input.

Next, the processor 130 may perform a plurality of 3D landmark predictions 1220 on a plurality of 3D cropped images. In the plurality of 3D landmark predictions 1220, the processor 130 inputs a plurality of cropped images into a plurality of fourth detection models 740, and uses the plurality of fourth detection models 740 to detect a plurality of 3D landmark coordinate values from the candidate areas of the cropped images. That is, the plurality of fourth detection models 740 may receive a plurality of 3D cropped images as input and output a plurality of 3D landmark coordinate values (x, y, z) predicted from a plurality of candidate areas.

Subsequently, the processor 130 may perform 3D landmark generation 1230 for the bone area based on the 3D landmark coordinate values detected from each of the plurality of 3D cropped images. Specifically, the processor 130 can generate 3D landmark coordinate values for the bone region by converting the 3D landmark coordinate values predicted from each candidate region into 3D landmark coordinate values in the 3D volume image.

Furthermore, the processor 130 may combine the 3D landmark coordinate values for the skin area and the 3D landmark coordinate values for the bone area to generate 3D landmark coordinate values for the target area.
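The conversion from cropped-image coordinates to volume coordinates, followed by the combination with the skin landmarks, can be sketched as follows. The sketch assumes that each cropped image is an axis-aligned sub-block of the volume that has not been rescaled, so that the conversion reduces to adding the crop origin; the names `crop_to_volume_coords` and `merge_landmarks` are hypothetical.

```python
import numpy as np

def crop_to_volume_coords(local_points, crop_origin):
    """Shift crop-local (x, y, z) coordinates by the crop's origin in the volume."""
    return np.asarray(local_points, dtype=float) + np.asarray(crop_origin, dtype=float)

def merge_landmarks(skin_points, bone_crops):
    """Stack skin landmarks with bone landmarks converted from each crop.

    bone_crops is a sequence of (local_points, crop_origin) pairs.
    """
    bone = [crop_to_volume_coords(points, origin) for points, origin in bone_crops]
    return np.vstack([np.asarray(skin_points, dtype=float)] + bone)

skin = [[1.0, 2.0, 3.0]]
crops = [
    ([[5.0, 5.0, 5.0]], (10, 20, 30)),
    ([[0.0, 1.0, 2.0], [3.0, 3.0, 3.0]], (100, 0, 0)),
]
all_landmarks = merge_landmarks(skin, crops)
```

If the cropped images were resampled to a fixed size before being input to the fourth detection models, a per-axis scale factor would also have to be applied before adding the origin.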

In this way, the landmark detection method according to an embodiment of the present disclosure divides the head into a skin area and a bone area and detects the landmarks through an artificial intelligence-based model specialized for each landmark, thereby minimizing the time required for landmark detection, increasing detection accuracy, and obtaining more accurate coordinate values for each landmark. Specifically, when 68 landmarks are detected using an artificial intelligence-based model according to an embodiment of the present disclosure, the successful detection rate (SDR) may be 93% or more within an error range of 3 mm, and the mean distance error (MDE) may be within 1.938 mm.
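For reference, the two metrics named above can be computed as in the following Python sketch, under the common definitions in which the error of a landmark is the Euclidean distance between the predicted and reference positions. The function names are hypothetical, coordinates are assumed to be expressed in millimetres, and counting an error exactly equal to the threshold as a success is an assumption.

```python
import numpy as np

def radial_errors(predicted, reference):
    """Euclidean distance between each predicted and reference landmark."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    return np.linalg.norm(diff, axis=-1)

def mean_distance_error(predicted, reference):
    """MDE: mean of the per-landmark distances."""
    return float(radial_errors(predicted, reference).mean())

def successful_detection_rate(predicted, reference, threshold_mm=3.0):
    """SDR: percentage of landmarks whose error is within the threshold."""
    within = radial_errors(predicted, reference) <= threshold_mm
    return float(within.mean() * 100.0)

reference = np.zeros((4, 3))
predicted = np.array([[1.0, 0, 0], [0, 2.0, 0], [0, 0, 4.0], [3.0, 0, 0]])
mde = mean_distance_error(predicted, reference)
sdr = successful_detection_rate(predicted, reference, threshold_mm=3.0)
```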

According to an embodiment of the present disclosure, a computer-readable medium storing a data structure is disclosed.

Data structure can refer to the organization, management, and storage of data to enable efficient access and modification of data. Data structure can refer to the organization of data to solve a specific problem (e.g., retrieving data, storing data, or modifying data in the shortest possible time). A data structure may be defined as a physical or logical relationship between data elements designed to support a specific data processing function. Logical relationships between data elements may include connection relationships between user-defined data elements. Physical relationships between data elements may include actual relationships between data elements that are physically stored in a computer-readable storage medium (e.g., a persistent storage device). A data structure may specifically include a set of data, relationships between data, and functions or instructions applicable to the data. Effectively designed data structures allow computing devices to perform computations while minimizing the use of the computing device's resources. Specifically, computing devices can increase the efficiency of operations, reading, insertion, deletion, comparison, exchange, and search through effectively designed data structures.

Data structures can be divided into linear data structures and non-linear data structures depending on the type of data structure. A linear data structure may be a structure in which only one piece of data is connected after another piece of data. Linear data structures may include a list, a stack, a queue, and a deque. A list can refer to a set of data that has an internal order. The list may include a linked list. A linked list may be a data structure in which each piece of data has a pointer and the data are connected in a single line. In a linked list, a pointer may contain connection information to the next or previous data. Depending on its form, a linked list can be expressed as a singly linked list, a doubly linked list, or a circularly linked list. A stack may be a data listing structure that allows limited access to data. A stack can be a linear data structure in which data can be processed (for example, inserted or deleted) at only one end of the data structure. Data stored in the stack may follow a last in, first out (LIFO) order, in which data that enters later comes out sooner. A queue is a data listing structure that allows limited access to data. Unlike the stack, it can follow a first in, first out (FIFO) order, in which data stored later is released later. A deque can be a data structure that can process data at both ends of the data structure.
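The access orders of the stack, queue, and deque described above can be illustrated with a short Python sketch using the standard library. The helper names `lifo_order`, `fifo_order`, and `both_ends` are hypothetical.

```python
from collections import deque

def lifo_order(items):
    """Push every item onto a stack, then pop until empty (last in, first out)."""
    stack = []
    for item in items:
        stack.append(item)
    return [stack.pop() for _ in range(len(stack))]

def fifo_order(items):
    """Enqueue every item, then dequeue until empty (first in, first out)."""
    queue = deque(items)
    return [queue.popleft() for _ in range(len(queue))]

def both_ends(items, front_item, back_item):
    """Insert at both ends of a deque, which a stack or queue does not allow."""
    d = deque(items)
    d.appendleft(front_item)
    d.append(back_item)
    return list(d)

stack_result = lifo_order([1, 2, 3])
queue_result = fifo_order([1, 2, 3])
deque_result = both_ends([2], front_item=1, back_item=3)
```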

A non-linear data structure may be a structure in which multiple pieces of data are connected behind one piece of data. Nonlinear data structures may include graph data structures. A graph data structure can be defined by vertices and edges, and an edge can include a line connecting two different vertices. Graph data structure may include a tree data structure. A tree data structure may be a data structure in which there is only one path connecting two different vertices among a plurality of vertices included in the tree. In other words, it may be a data structure that does not form a loop in the graph data structure.
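The tree property described above, namely exactly one path between any two different vertices, is equivalent for an undirected graph to being connected with one fewer edge than vertices. A minimal Python sketch of this check follows; the function name `is_tree` and the representation of vertices as integers 0 to n-1 are assumptions for illustration.

```python
def is_tree(num_vertices, edges):
    """Return True if the undirected graph is connected and has no loop."""
    if num_vertices < 1 or len(edges) != num_vertices - 1:
        return False
    adjacency = {v: [] for v in range(num_vertices)}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    # Depth-first search from vertex 0; a tree must reach every vertex.
    seen = {0}
    stack = [0]
    while stack:
        vertex = stack.pop()
        for neighbour in adjacency[vertex]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == num_vertices

star = is_tree(4, [(0, 1), (1, 2), (1, 3)])
triangle = is_tree(3, [(0, 1), (1, 2), (2, 0)])
loop_plus_isolated = is_tree(4, [(0, 1), (1, 2), (2, 0)])
```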

Throughout this specification, computational model, neural network, and network function may be used interchangeably. Below, they are described in a unified manner as a neural network. Data structures may include neural networks, and the data structure including the neural network may be stored in a computer-readable medium. A data structure including a neural network may also include data preprocessed for processing by the neural network, data input to the neural network, weights of the neural network, hyperparameters of the neural network, data acquired from the neural network, activation functions associated with each node or layer of the neural network, and a loss function for training the neural network. A data structure containing a neural network may include any of the components disclosed above. In other words, the data structure including the neural network may be configured to include all or any combination of the preprocessed data for processing by the neural network, data input to the neural network, weights of the neural network, hyperparameters of the neural network, data acquired from the neural network, activation functions associated with each node or layer of the neural network, and the loss function for training the neural network. In addition to the configurations described above, a data structure containing a neural network may include any other information that determines the characteristics of the neural network. Additionally, the data structure may include all types of data used or generated in the computational process of a neural network and is not limited to the above. Computer-readable media may include computer-readable recording media and/or computer-readable transmission media. A neural network can generally consist of a set of interconnected computational units, which can be referred to as nodes. These nodes may also be referred to as neurons. A neural network includes at least one node.

The data structure may include data input to the neural network. A data structure containing data input to a neural network may be stored in a computer-readable medium. Data input to the neural network may include learning data input during the neural network learning process and/or input data input to the neural network on which training has been completed. Data input to the neural network may include data that has undergone pre-processing and/or data subject to pre-processing. Preprocessing may include a data processing process to input data into a neural network. Therefore, the data structure may include data subject to preprocessing and data generated by preprocessing. The above-described data structure is only an example and the present disclosure is not limited thereto.

The data structure may include the weights of the neural network. (In this specification, weights and parameters may be used with the same meaning.) The data structure including the weights of the neural network may be stored in a computer-readable medium. A neural network may include multiple weights. Weights may be variable and may be varied by the user or an algorithm in order for the neural network to perform the desired function. For example, when one or more input nodes are connected to one output node by respective links, the output node can determine the data value it outputs based on the values input to the input nodes connected to the output node and the weights set on the links corresponding to each input node. The above-described data structure is only an example and the present disclosure is not limited thereto.
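The relationship between input values, link weights, and the output value of a node can be illustrated with the following Python sketch. The function name `output_node_value` is hypothetical, and the optional bias term and activation function are assumptions added for illustration rather than features stated in the preceding paragraph.

```python
def output_node_value(inputs, weights, bias=0.0, activation=None):
    """Weighted sum of the input node values, optionally passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("each input node needs exactly one link weight")
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(total) if activation is not None else total

def relu(value):
    """Rectified linear unit, used here only as an example activation."""
    return max(0.0, value)

linear_output = output_node_value([1.0, 2.0], [0.5, -1.0])
relu_output = output_node_value([1.0, 2.0], [0.5, -1.0], activation=relu)
```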

As an example and not a limitation, the weights may include weights that are changed during the neural network learning process and/or weights for which neural network learning has been completed. Weights that change during the neural network learning process may include weights at the start of a learning cycle and/or weights that change during the learning cycle. Weights for which neural network training has been completed may include weights for which a learning cycle has been completed. Therefore, the data structure including the weights of the neural network may include a data structure including weights that are changed during the neural network learning process and/or weights for which neural network learning has been completed. Accordingly, the above-mentioned weights and/or combinations of the weights are included in the data structure including the weights of the neural network. The above-described data structure is only an example and the present disclosure is not limited thereto.

The data structure including the weights of the neural network may be stored in a computer-readable storage medium (e.g., memory, hard disk) after going through a serialization process. Serialization can be the process of converting a data structure into a form that can be stored on the same or a different computing device and later reconstructed and used. Computing devices can transmit and receive data over a network by serializing data structures. A serialized data structure containing the weights of a neural network can be reconstructed on the same computing device or on a different computing device through deserialization. The data structure including the weights of the neural network is not limited to serialization. Furthermore, the data structure including the weights of the neural network may include a data structure for increasing computational efficiency while minimizing the use of computing device resources (e.g., among non-linear data structures, a B-Tree, Trie, m-way search tree, AVL tree, or Red-Black Tree). The foregoing is merely an example and the present disclosure is not limited thereto.
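A minimal sketch of serialization and deserialization of neural network weights is given below, assuming the weights are held as named NumPy arrays. The NumPy `.npz` archive format is used purely as one possible serialized form; the function names and the layer names `dense_w` and `dense_b` are hypothetical.

```python
import io
import numpy as np

def serialize_weights(weights):
    """Convert a dict of named weight arrays into bytes that can be stored or sent."""
    buffer = io.BytesIO()
    np.savez(buffer, **weights)
    return buffer.getvalue()

def deserialize_weights(payload):
    """Reconstruct the dict of named weight arrays from serialized bytes."""
    with np.load(io.BytesIO(payload)) as archive:
        return {name: archive[name] for name in archive.files}

weights = {
    "dense_w": np.array([[0.5, -1.0], [0.25, 2.0]], dtype=np.float32),
    "dense_b": np.array([0.1, -0.2], dtype=np.float32),
}
payload = serialize_weights(weights)       # bytes, e.g. written to disk or a socket
restored = deserialize_weights(payload)    # may run on a different computing device
```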

The data structure may include hyper-parameters of a neural network. The data structure including the hyperparameters of the neural network can be stored in a computer-readable medium. A hyperparameter may be a variable that can be changed by the user. Hyperparameters may include, for example, the learning rate, the cost function, the number of learning cycle repetitions, weight initialization (e.g., setting the range of weight values subject to weight initialization), and the number of hidden units (e.g., the number of hidden layers, the number of nodes in hidden layers). The above-described data structure is only an example and the present disclosure is not limited thereto.

Although the present disclosure has generally been described above as being capable of being implemented by a computing device, those skilled in the art will appreciate that the present disclosure can be implemented in combination with computer-executable instructions and/or other program modules that can be executed on one or more computers, and/or as a combination of hardware and software.

Typically, program modules include routines, programs, components, data structures, etc. that perform specific tasks or implement specific abstract data types. Additionally, those skilled in the art will appreciate that the methods of the present disclosure may be practiced with other computer system configurations, including single-processor or multiprocessor computer systems, minicomputers, mainframe computers, as well as personal computers, handheld computing devices, microprocessor-based or programmable consumer electronics, and the like, each of which may operate in conjunction with one or more associated devices.

The described embodiments of the disclosure can also be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

Computers typically include a variety of computer-readable media. Computer-readable media can be any medium that can be accessed by a computer, and such computer-readable media includes volatile and non-volatile media, transitory and non-transitory media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may include computer-readable storage media and computer-readable transmission media. Computer-readable storage media includes volatile and non-volatile media, transitory and non-transitory media, and removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital video disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be accessed by a computer and used to store desired information.

A computer-readable transmission medium typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes all information delivery media. The term modulated data signal refers to a signal in which one or more of the characteristics of the signal have been set or changed to encode information within the signal. By way of example, and not limitation, computer-readable transmission media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above are also intended to be included within the scope of computer-readable transmission media.

An example environment 1300 that implements various aspects of the present disclosure is shown, including a computer 1302, which includes a processing unit 1304, a system memory 1306, and a system bus 1308. System bus 1308 couples system components, including but not limited to system memory 1306, to processing unit 1304. Processing unit 1304 may be any of a variety of commercially available processors. Dual processors and other multiprocessor architectures may also be used as processing unit 1304.

System bus 1308 may be any of several types of bus structures that may further be interconnected to a memory bus, peripheral bus, and local bus using any of a variety of commercial bus architectures. System memory 1306 includes read only memory (ROM) 1310 and random access memory (RAM) 1312. The basic input/output system (BIOS) is stored in non-volatile memory 1310, such as ROM, EPROM, or EEPROM, and contains the basic routines that help transfer information between components within the computer 1302, such as during startup. RAM 1312 may also include high-speed RAM, such as static RAM, for caching data.

Computer 1302 may also include an internal hard disk drive (HDD) 1314 (e.g., EIDE, SATA), which may also be configured for external use within a suitable chassis (not shown), a magnetic floppy disk drive (FDD) 1316 (e.g., for reading from or writing to a removable diskette 1318), and an optical disk drive 1320 (e.g., for reading a CD-ROM disk 1322 or for reading from or writing to other high-capacity optical media such as a DVD). Hard disk drive 1314, magnetic disk drive 1316, and optical disk drive 1320 can be connected to system bus 1308 by hard disk drive interface 1324, magnetic disk drive interface 1326, and optical drive interface 1328, respectively. The interface 1324 for implementing an external drive includes at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies.

These drives and their associated computer-readable media provide non-volatile storage of data, data structures, computer-executable instructions, and the like. For computer 1302, the drives and media accommodate the storage of any data in a suitable digital format. Although the description of computer-readable media above refers to HDDs, removable magnetic disks, and removable optical media such as CDs or DVDs, those skilled in the art will appreciate that other types of computer-readable media, such as zip drives, magnetic cassettes, flash memory cards, cartridges, and the like, may also be used in the example operating environment, and that any such media may contain computer-executable instructions for performing the methods of the present disclosure.

A number of program modules may be stored in drives and RAM 1312, including an operating system 1330, one or more application programs 1332, other program modules 1334, and program data 1336. All or portions of the operating system, applications, modules and/or data may also be cached in RAM 1312. It will be appreciated that the present disclosure may be implemented on various commercially available operating systems or combinations of operating systems.

A user may enter commands and information into computer 1302 through one or more wired/wireless input devices, such as a keyboard 1338 and a pointing device such as mouse 1340. Other input devices (not shown) may include microphones, IR remote controls, joysticks, game pads, stylus pens, touch screens, etc. These and other input devices are often connected to the processing unit 1304 through an input device interface 1342 that is coupled to the system bus 1308, but may also be connected by other interfaces, such as a parallel port, an IEEE 1394 serial port, a game port, a USB port, an IR interface, and the like.

A monitor 1344 or other type of display device is also connected to system bus 1308 through an interface, such as a video adapter 1346. In addition to monitor 1344, computers typically include other peripheral output devices (not shown) such as speakers, printers, etc.

Computer 1302 may operate in a networked environment using logical connections to one or more remote computers, such as remote computer(s) 1348, via wired and/or wireless communications. Remote computer(s) 1348 may be a workstation, computing device computer, router, personal computer, portable computer, microprocessor-based entertainment device, peer device, or other conventional network node, and generally includes many or all of the components described with respect to computer 1302, although, for simplicity, only memory storage device 1350 is shown. The logical connections depicted include wired/wireless connections to a local area network (LAN) 1352 and/or a larger network, such as a wide area network (WAN) 1354. These LAN and WAN networking environments are common in offices and companies and facilitate enterprise-wide computer networks, such as intranets, all of which can be connected to a worldwide computer network, such as the Internet.

When used in a LAN networking environment, computer 1302 is connected to local network 1352 through wired and/or wireless communications network interfaces or adapters 1356. Adapter 1356 may facilitate wired or wireless communication to LAN 1352, which also includes a wireless access point installed thereon for communicating with wireless adapter 1356. When used in a WAN networking environment, the computer 1302 may include a modem 1358, be connected to a communication computing device on the WAN 1354, or have other means for establishing communications over the WAN 1354, such as over the Internet. Modem 1358, which may be internal or external and a wired or wireless device, is coupled to system bus 1308 via serial port interface 1342. In a networked environment, program modules described for computer 1302, or portions thereof, may be stored in remote memory/storage device 1350. It will be appreciated that the network connections shown are exemplary and that other means of establishing a communications link between computers may be used.

Computer 1302 operates to communicate with any wireless device or object deployed and operating in wireless communications, such as a printer, scanner, desktop and/or portable computer, portable data assistant (PDA), communications satellite, any device or location associated with a wirelessly detectable tag, and a telephone. This includes at least Wi-Fi and Bluetooth wireless technologies. Accordingly, communication may have a predefined structure as in a conventional network or may simply be ad hoc communication between at least two devices.

Wi-Fi (Wireless Fidelity) allows connection to the Internet and the like without wires. Wi-Fi is a wireless technology, like that used in cell phones, that allows devices such as computers to send and receive data indoors and outdoors, anywhere within the coverage area of a base station. Wi-Fi networks use wireless technology called IEEE 802.11 (a, b, g, etc.) to provide secure, reliable, and high-speed wireless connections. Wi-Fi can be used to connect computers to each other, to the Internet, and to wired networks (using IEEE 802.3 or Ethernet). Wi-Fi networks can operate in the unlicensed 2.4 and 5 GHz wireless bands, for example, at data rates of 11 Mbps (802.11b) or 54 Mbps (802.11a), or in products that include both bands (dual band).

Those skilled in the art will understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced in the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.

Those skilled in the art will understand that the various illustrative logical blocks, modules, processors, means, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented by electronic hardware, by various forms of program or design code (referred to herein, for convenience, as software), or by a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally with respect to their functionality. Whether this functionality is implemented as hardware or software depends on the specific application and design constraints imposed on the overall system. A person skilled in the art of this disclosure may implement the described functionality in various ways for each specific application, but such implementation decisions should not be construed as departing from the scope of this disclosure.

The various embodiments presented herein may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques. The term article of manufacture includes a computer program, carrier, or media accessible from any computer-readable storage device. For example, computer-readable storage media include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips, etc.), optical disks (e.g., CDs, DVDs, etc.), smart cards, and flash memory devices (e.g., EEPROM, cards, sticks, key drives, etc.). Additionally, various storage media presented herein include one or more devices and/or other machine-readable media for storing information.

It is to be understood that the specific order or hierarchy of steps in the processes presented is an example of illustrative approaches. It is to be understood that the specific order or hierarchy of steps in processes may be rearranged within the scope of the present disclosure, based on design priorities. The appended method claims present elements of the various steps in a sample order but are not meant to be limited to the particular order or hierarchy presented.

The description of the presented embodiments is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments without departing from the scope of the disclosure. Thus, the present disclosure is not limited to the embodiments presented herein but is to be interpreted in the broadest scope consistent with the principles and novel features presented herein.

As described above, the relevant content has been described in the best form for carrying out the invention.

The present disclosure can be used in devices and systems for detecting landmarks from 3D volume images for oral health.

Claims

A method for detecting landmarks from a three-dimensional volume image, performed by a computing device, comprising:

Obtaining input data from a 3D volume image of a target area of an object;

Inputting the input data into a first landmark detection model;

detecting a plurality of landmarks for a first portion of the target area using the first landmark detection model;

Inputting the 3D volume image into a second landmark detection model;

detecting a plurality of landmarks for a second part of the target area that is different from the first part using the second landmark detection model; and

generating a plurality of landmarks for the target area based on a plurality of landmarks for the first part and a plurality of landmarks for the second part;

Including,

method.
The method of claim 1, wherein the target area includes the head,

The first portion refers to the skin area of the head,

The second part refers to the bony region of the head,

method.
The method of claim 2, wherein the input data is:

Comprising a first two-dimensional image representing the front of the head, a depth image corresponding to the first image, and a second two-dimensional image representing the side of the head,

method.
The method of claim 3, wherein the first landmark detection model is:

Comprising a first detection model learned to detect landmarks on the front of the head and a second detection model learned to detect landmarks on the side of the head,

method.
The method of claim 4, wherein detecting a plurality of landmarks for the first portion using the first landmark detection model comprises:

Inputting the first image and the depth image into the first detection model;

Detecting a plurality of landmarks of a first group using the first detection model;

Inputting the second image into the second detection model;

detecting a plurality of landmarks of a second group using the second detection model; and

generating a plurality of landmarks for the first portion based on the plurality of landmarks of the first group and the plurality of landmarks of the second group;

Including,

method.
The method of claim 5, wherein the first detection model is:

configured to predict a plurality of landmarks of the first group using the first image and the depth image as input, and output two-dimensional coordinate values for the predicted plurality of landmarks of the first group,

The second detection model is,

Configured to predict a plurality of landmarks of the second group using the second image as an input, and output two-dimensional coordinate values for the predicted plurality of landmarks of the second group,

method.
The method of claim 6, wherein generating a plurality of landmarks for the first portion comprises:

Combining at least some of the two-dimensional coordinate values for the plurality of landmarks in the first group and the two-dimensional coordinate values for the plurality of landmarks in the second group to generate a three-dimensional coordinate value,

method.
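As a hedged illustration of the combining step, the sketch below assumes that the front-view model yields (x, y) coordinates, that the side-view model yields (z, y) coordinates, and that landmarks are matched by name. None of these conventions is fixed by the claim, and the function name `combine_views` is hypothetical.

```python
from typing import Dict, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def combine_views(front: Dict[str, Point2D], side: Dict[str, Point2D]) -> Dict[str, Point3D]:
    """Combine front-view (x, y) and side-view (z, y) coordinates into (x, y, z).

    Only landmarks present in both views are combined; this reflects the
    "at least some" wording under the stated assumptions.
    """
    combined: Dict[str, Point3D] = {}
    for name in front.keys() & side.keys():
        x, y = front[name]          # x and y are taken from the front view
        z, _side_y = side[name]     # only z is taken from the side view
        combined[name] = (x, y, z)
    return combined
```

Taking y from the front view alone is a design choice of this sketch; averaging the two y values would be an equally plausible alternative.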
The method of claim 2, wherein the second landmark detection model is:

Comprising a third detection model trained to detect, from the 3D volume image, a plurality of cropped images in which the plurality of landmarks for the second part are located, and a plurality of fourth detection models trained to detect the plurality of landmarks for the second part from the detected plurality of cropped images,

method.
The method of claim 8, wherein the step of detecting a plurality of landmarks for the second portion of the target area using the second landmark detection model comprises:

Inputting the 3D volume image into the third detection model;

detecting the plurality of cropped images using the third detection model;

Inputting the detected plurality of cropped images into the plurality of fourth detection models; and

detecting a plurality of landmarks for the second portion using the plurality of fourth detection models;

Including,

method.
The method of claim 9, wherein the third detection model is:

Configured to take the 3D volume image as an input, detect from the 3D volume image a plurality of candidate regions predicted to contain the plurality of landmarks for the second part, and output the plurality of cropped images corresponding to the detected plurality of candidate regions,

method.
The method of claim 10, wherein the plurality of candidate regions overlap at least in part, and

contain at least one predicted location for each of the plurality of landmarks for the second portion,

method.
The method of claim 9, wherein each of the plurality of fourth detection models is configured to correspond to each of the plurality of cropped images,

method.
The method of claim 12, wherein each of the plurality of fourth detection models is:

Configured to predict at least one landmark from the candidate region corresponding to each cropped image using each cropped image as an input, and output a three-dimensional coordinate value for each of the predicted at least one landmark,

method.
The method of claim 9, wherein each of the plurality of cropped images is a 3D volume image corresponding to each of the plurality of candidate regions,

method.
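The relationship between a candidate region, its cropped 3D volume image, and the coordinates predicted inside that crop can be sketched as below. The box format `(z0, y0, x0, z1, y1, x1)` with exclusive end indices, the `volume[z][y][x]` layout, and the helper names `crop_volume` and `to_volume_coords` are assumptions for illustration. Whether the fourth detection models output crop-local or volume-level coordinates is not stated in the claims; the second helper applies only if they are crop-local.

```python
from typing import List, Tuple

Volume = List[List[List[float]]]           # assumed layout: volume[z][y][x]
Box = Tuple[int, int, int, int, int, int]  # (z0, y0, x0, z1, y1, x1), end-exclusive
Point3D = Tuple[float, float, float]       # (z, y, x)


def crop_volume(volume: Volume, box: Box) -> Volume:
    """Cut the sub-volume covered by a candidate region out of the full volume."""
    z0, y0, x0, z1, y1, x1 = box
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in volume[z0:z1]]


def to_volume_coords(local: Point3D, box: Box) -> Point3D:
    """Shift a crop-local (z, y, x) coordinate back into full-volume coordinates."""
    z0, y0, x0 = box[:3]
    lz, ly, lx = local
    return (lz + z0, ly + y0, lx + x0)
```

Because candidate regions may overlap, the same voxel can appear in more than one crop; the offset in `to_volume_coords` keeps the results from different crops in one common coordinate frame.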
A computer program stored on a computer-readable storage medium, wherein the computer program, when executed on one or more processors, performs the following operations for detecting a landmark from a three-dimensional volume image, the operations being:

An operation of acquiring input data from a 3D volume image of a target area of an object;

Inputting the input data into a first landmark detection model;

detecting a plurality of landmarks for a first portion of the target area using the first landmark detection model;

Inputting the 3D volume image into a second landmark detection model;

detecting a plurality of landmarks for a second portion of the target area using the second landmark detection model; and

generating a plurality of landmarks for the target area based on the plurality of landmarks for the first portion and the plurality of landmarks for the second portion;

Including,

A computer program stored on a computer-readable storage medium.
A computer-readable storage medium storing a computer program, the computer program, when executed by a computing device, causing the computing device to perform a method for detecting a landmark from a three-dimensional volume image, the method comprising:

Obtaining input data from a 3D volume image of a target area of an object;

Inputting the input data into a first landmark detection model;

detecting a plurality of landmarks for a first portion of the target area using the first landmark detection model;

Inputting the 3D volume image into a second landmark detection model;

detecting a plurality of landmarks for a second portion of the target area using the second landmark detection model; and

generating a plurality of landmarks for the target area based on the plurality of landmarks for the first portion and the plurality of landmarks for the second portion;

Including,

Computer readable storage medium.
A computing device for detecting landmarks from a three-dimensional volume image, comprising:

at least one processor; and

a memory,

wherein the at least one processor is configured to:

obtain input data from a 3D volume image of a target area of an object;

input the input data into a first landmark detection model;

detect a plurality of landmarks for a first portion of the target area using the first landmark detection model;

input the 3D volume image into a second landmark detection model;

detect a plurality of landmarks for a second portion of the target area using the second landmark detection model; and

generate a plurality of landmarks for the target area based on the plurality of landmarks for the first portion and the plurality of landmarks for the second portion,

Computing device.