CN116747026B - Intelligent robot bone cutting method, device and equipment based on deep reinforcement learning - Google Patents


Info

Publication number
CN116747026B
CN116747026B (application CN202310656264.1A)
Authority
CN
China
Prior art keywords
mechanical arm
osteotomy
reinforcement learning
strategy
plane
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310656264.1A
Other languages
Chinese (zh)
Other versions
CN116747026A (en)
Inventor
张逸凌
刘星宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Longwood Valley Medtech Co Ltd
Original Assignee
Longwood Valley Medtech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Longwood Valley Medtech Co Ltd filed Critical Longwood Valley Medtech Co Ltd
Priority to CN202310656264.1A priority Critical patent/CN116747026B/en
Publication of CN116747026A publication Critical patent/CN116747026A/en
Application granted granted Critical
Publication of CN116747026B publication Critical patent/CN116747026B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B34/00 Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B34/20 Surgical navigation systems; Devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B34/00 Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B34/30 Surgical robots
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B34/00 Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B34/70 Manipulators specially adapted for use in surgery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B34/00 Computer-aided surgery; Manipulators or robots specially adapted for use in surgery
    • A61B34/20 Surgical navigation systems; Devices for tracking or guiding surgical instruments, e.g. for frameless stereotaxis
    • A61B2034/2046 Tracking techniques
    • A61B2034/2065 Tracking using image or pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Veterinary Medicine (AREA)
  • Robotics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Manipulator (AREA)

Abstract

The application provides a robot intelligent osteotomy method, device, and equipment based on deep reinforcement learning, and a computer-readable storage medium. The robot intelligent osteotomy method based on deep reinforcement learning includes: controlling the mechanical arm to move to the vicinity of the planned osteotomy plane; controlling the mechanical arm so that the saw blade is adjusted into the same plane as the planned osteotomy plane; and, when osteotomy starts, controlling the mechanical arm to move within the plane of the planned osteotomy plane according to a preset mechanical arm path movement strategy, where the path movement strategy is obtained through model training based on a reinforcement learning strategy. Embodiments of the application can improve the efficiency and accuracy of knee joint osteotomy.

Description

Intelligent robot bone cutting method, device and equipment based on deep reinforcement learning
Technical Field
The application belongs to the technical field of deep learning intelligent recognition, and particularly relates to a robot intelligent osteotomy method, device, and equipment based on deep reinforcement learning, and a computer-readable storage medium.
Background
At present, knee joint osteotomy is mainly performed manually by the surgeon based on experience, so its efficiency and accuracy are low.
Therefore, how to improve the efficiency and accuracy of knee joint osteotomy is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The embodiment of the application provides a robot intelligent osteotomy method, device and equipment based on deep reinforcement learning and a computer readable storage medium, which can improve the efficiency and accuracy of knee joint osteotomy.
In a first aspect, an embodiment of the present application provides a robot intelligent osteotomy method based on deep reinforcement learning, including:
controlling the mechanical arm to move to the vicinity of the planned osteotomy plane;
controlling the mechanical arm so that the saw blade is adjusted into the same plane as the planned osteotomy plane;
when osteotomy starts, controlling the mechanical arm to move within the plane of the planned osteotomy plane according to a preset mechanical arm path movement strategy, where the mechanical arm path movement strategy is obtained through model training based on a reinforcement learning strategy.
Optionally, the reinforcement learning strategy includes:
parameter initialization, where the parameters include environmental parameters and network parameters;
action execution;
reward acquisition; and
network training.
Optionally, data acquisition is performed before parameter initialization and includes:
acquiring osteotomy plane data, the relative coordinates of the mechanical arm's movement during osteotomy, the bone data after knee joint segmentation, and the instantaneous speed of the mechanical arm.
Optionally, the action execution includes:
environment detection and environment interaction, so as to learn state parameters in real time during the osteotomy phase.
Optionally, the mechanical arm path movement strategy is obtained through model training based on a reinforcement learning strategy as follows:
each piece of state information is sequentially input into a long short-term memory (LSTM) network, a recurrent neural network structure; the forget gate selects how much previous information to retain, the input gate stores the effective part of the current information, and the output gate writes the effective information out into the hidden state; the mechanical arm path movement strategy is then obtained through network model training.
Optionally, during model training the batch size is 32 and the initial learning rate is set to 1e-4 with a learning rate decay strategy in which the learning rate is multiplied by 0.9 every 5000 iterations. The optimizer is Adam and the loss function is the mean squared error loss. Every 1000 iterations, one validation pass is performed on the training set and the validation set, and the point at which to stop network training is determined by an early stopping method, yielding the final model.
Optionally, the reward mechanism includes:
the mechanical arm learns the correct strategy from feedback signals obtained by interacting with the environment;
a round of learning ends when the mechanical arm goes out of bounds or fails to reach the destination within a specified number of steps;
the penalty value is set between -1 and 0 and the reward value between 0 and 1: a penalty is given when the mechanical arm goes out of bounds, and a reward is given when it stays within the specified range. To speed up network training, a small negative reward of -0.0002 is given for each movement step of the mechanical arm.
In a second aspect, an embodiment of the present application provides a robotic intelligent osteotomy device based on deep reinforcement learning, the device comprising:
a movement control module, used to control the mechanical arm to move to the vicinity of the planned osteotomy plane;
an adjustment control module, used to control the mechanical arm so that the saw blade is adjusted into the same plane as the planned osteotomy plane;
an osteotomy plane movement control module, used to control the mechanical arm, when osteotomy starts, to move within the plane of the planned osteotomy plane according to a preset mechanical arm path movement strategy, where the mechanical arm path movement strategy is obtained through model training based on a reinforcement learning strategy.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory storing computer program instructions;
The processor, when executing the computer program instructions, implements the intelligent osteotomy method for a robot based on deep reinforcement learning as in the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement a deep reinforcement learning based robotic intelligent osteotomy method as in the first aspect.
The robot intelligent osteotomy method, device, equipment, and computer-readable storage medium based on deep reinforcement learning provided by the application can improve the efficiency and accuracy of knee osteotomy.
The robot intelligent osteotomy method based on deep reinforcement learning includes: controlling the mechanical arm to move to the vicinity of the planned osteotomy plane; controlling the mechanical arm so that the saw blade is adjusted into the same plane as the planned osteotomy plane; and, when osteotomy starts, controlling the mechanical arm to move within the plane of the planned osteotomy plane according to a preset mechanical arm path movement strategy obtained through model training based on a reinforcement learning strategy.
In this way, when osteotomy starts, the mechanical arm is controlled to move within the plane of the planned osteotomy plane according to a preset mechanical arm path movement strategy, and because that strategy is obtained through model training based on a reinforcement learning strategy, the efficiency and accuracy of knee joint osteotomy can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow diagram of a robotic intelligent osteotomy method based on deep reinforcement learning provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a reinforcement learning strategy provided by one embodiment of the present application;
FIG. 3 is a schematic diagram of a long short-term memory (LSTM) network according to an embodiment of the present application;
FIG. 4 is a schematic view of a tibial osteotomy provided in accordance with an embodiment of the present application;
FIG. 5 is a schematic representation of a femoral resection provided in one embodiment of the present application;
FIG. 6 is a schematic structural view of a robotic intelligent osteotomy device based on deep reinforcement learning according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings and the detailed embodiments. It should be understood that the particular embodiments described herein are meant to be illustrative of the application only and not limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the application by showing examples of the application.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
At present, knee joint osteotomy is mainly performed manually by the surgeon based on experience, so its efficiency and accuracy are low.
In order to solve the problems in the prior art, the embodiment of the application provides a method, a device, equipment and a computer-readable storage medium for intelligent osteotomy of a robot based on deep reinforcement learning. The following first describes a robot intelligent osteotomy method based on deep reinforcement learning provided by the embodiment of the application.
Fig. 1 shows a flow diagram of a robot intelligent osteotomy method based on deep reinforcement learning according to an embodiment of the present application. As shown in fig. 1, the robot intelligent osteotomy method based on deep reinforcement learning includes the following steps:
S101, controlling the mechanical arm to move to the vicinity of the planned osteotomy plane;
S102, controlling the mechanical arm so that the saw blade is adjusted into the same plane as the planned osteotomy plane;
S103, when osteotomy starts, controlling the mechanical arm to move within the plane of the planned osteotomy plane according to a preset mechanical arm path movement strategy, where the mechanical arm path movement strategy is obtained through model training based on a reinforcement learning strategy.
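As a rough sketch, the three steps S101-S103 can be expressed as a single control routine. The controller interface below (`move_near`, `align_blade_to_plane`, `observe`, `step`, `cut_finished`) and the `Plane` container are hypothetical names chosen for illustration, not APIs from the patent:

```python
from dataclasses import dataclass

@dataclass
class Plane:
    point: tuple    # a point on the planned osteotomy plane
    normal: tuple   # unit normal of the plane

def run_osteotomy(arm, plane, policy):
    """S101: approach the planned plane; S102: align the saw blade into it;
    S103: follow the learned path-movement policy within the plane."""
    arm.move_near(plane)             # S101
    arm.align_blade_to_plane(plane)  # S102
    state = arm.observe()
    while not arm.cut_finished():    # S103: in-plane motion from the RL policy
        action = policy(state)
        state = arm.step(action)
    return state
```

The loop delegates all in-plane motion decisions to `policy`, which stands in for the trained reinforcement-learning model described below.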
Reinforcement learning strategy for the mechanical arm's osteotomy movement:
1) To address the limited number of training samples caused by the high cost of collecting mechanical arm samples during reinforcement learning training, the natural trajectories obtained from the arm's interaction with the environment are duplicated and augmented, improving sample efficiency; the environment is modified synchronously while trajectories are duplicated, improving the arm's generalization in complex environments.
2) Expert path-planning experience is used as prior knowledge for designing the reward function, improving the mechanical arm's exploration efficiency during training.
As shown in fig. 2, in one embodiment, the reinforcement learning strategy includes:
parameter initialization, where the parameters include environmental parameters and network parameters;
action execution;
reward acquisition; and
network training.
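The four stages above (parameter initialization, action execution, reward acquisition, network training) can be illustrated with a toy tabular Q-learning loop on a 1-D track. This stands in for the patent's deep network purely to show the cycle; the environment, reward values other than the -0.0002 step cost, and all hyperparameters here are illustrative:

```python
import random

def q_learning_1d(n_states=6, episodes=300, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a 1-D track: start at state 0, goal at n_states-1.
    Actions: 0 = move left, 1 = move right."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]       # parameter initialization
    for _ in range(episodes):
        s = 0                                       # reset the environment
        while s != n_states - 1:
            # epsilon-greedy action execution
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
            # reward acquisition: goal reward, small per-step cost otherwise
            r = 1.0 if s2 == n_states - 1 else -0.0002
            # "network training" step: temporal-difference update
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

# Greedy policy after training: 1 means "move right toward the goal".
greedy = [0 if row[0] > row[1] else 1 for row in q_learning_1d()]
```

After training, the greedy policy should move right in every non-terminal state, i.e. head straight for the goal.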
In one embodiment, the action execution includes:
environment detection and environment interaction, so as to learn state parameters in real time during the osteotomy phase.
Optimization points of the reinforcement learning strategy:
1) Movement data of the mechanical arm are collected and input into the neural network, together with the reward function, as feature vectors for training; the optimal action is finally selected according to the exploration strategy and output, leading to the next visual observation.
2) The three stages of action, reward, and training decision are executed iteratively until training is complete.
3) An environment interaction module is added to learn state parameters in real time during the osteotomy phase.
In one embodiment, data acquisition is performed before parameter initialization and includes:
acquiring osteotomy plane data, the relative coordinates of the mechanical arm's movement during osteotomy, the bone data after knee joint segmentation, and the instantaneous speed of the mechanical arm.
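The four acquired quantities could be grouped into one state container before being flattened into the network's input feature vector. A minimal sketch follows; the field names and the flattening order are assumptions, not taken from the patent:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class OsteotomyState:
    """Illustrative container for the four acquired quantities."""
    osteotomy_plane: List[Tuple[float, float, float]]  # points sampled on the planned plane
    arm_rel_coords: Tuple[float, float, float]         # arm's relative coordinates during the cut
    segmented_bone: List[Tuple[float, float, float]]   # bone points after knee-joint segmentation
    arm_speed: float                                   # instantaneous speed of the arm

    def as_feature_vector(self) -> List[float]:
        # Flatten everything into a single feature vector for the network input.
        flat = [c for p in self.osteotomy_plane for c in p]
        flat += list(self.arm_rel_coords)
        flat += [c for p in self.segmented_bone for c in p]
        flat.append(self.arm_speed)
        return flat
```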
Based on the point coordinates on the preoperatively planned prosthesis, a registration transformation is applied: the tibial registration matrix is used on the tibial side, and the femoral registration matrix is used for the osteotomy plane on the femoral side. A plane is then computed from three points, and that plane is the osteotomy plane.
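The "plane from three points" step can be sketched as a cross-product computation. This sketch assumes the registration transform that maps the planned prosthesis points into the robot frame has already been applied to the three input points:

```python
import math

def plane_from_points(p1, p2, p3):
    """Return the plane through three points as (point, unit normal)."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    # Cross product u x v gives a vector normal to the plane.
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0:
        raise ValueError("points are collinear; no unique plane")
    return p1, [c / norm for c in n]
```

For example, three points in the z = 0 plane yield the unit normal (0, 0, 1).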
In one embodiment, the mechanical arm path movement strategy is obtained through model training based on a reinforcement learning strategy as follows:
each piece of state information is sequentially input into a long short-term memory (LSTM) network, a recurrent neural network structure; the forget gate selects how much previous information to retain, the input gate stores the effective part of the current information, and the output gate writes the effective information out into the hidden state; the mechanical arm path movement strategy is then obtained through network model training.
Fig. 3 is a schematic diagram of the LSTM structure according to an embodiment of the present application. Because of the complexity of the motion environment, an LSTM network is introduced at the environment-sensing end; the internal structure of the LSTM network is shown in the solid frame. The LSTM is a recurrent neural network structure that can process sequential data. The reinforcement learning network inputs each piece of state information into the LSTM network in turn and selects, through the forget gate, how much previous information to memorize. Next, the effective part of the current information is stored through the input gate. Then, the effective information is output through the output gate and stored in the hidden state. Finally, the mechanical arm movement strategy is obtained through network training.
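The gate behavior described above can be illustrated with one scalar LSTM step. This is a didactic sketch with hand-supplied weights, not the patent's trained network; the weight layout (`w` as a dict of input/hidden/bias triples) is an assumption made for readability:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step on scalar values. `w` maps gate name -> (w_x, w_h, b)."""
    f = sigmoid(w['f'][0] * x + w['f'][1] * h_prev + w['f'][2])    # forget gate: how much old memory to keep
    i = sigmoid(w['i'][0] * x + w['i'][1] * h_prev + w['i'][2])    # input gate: how much new info to store
    g = math.tanh(w['g'][0] * x + w['g'][1] * h_prev + w['g'][2])  # candidate cell content
    c = f * c_prev + i * g                                         # updated cell (long-term) state
    o = sigmoid(w['o'][0] * x + w['o'][1] * h_prev + w['o'][2])    # output gate
    h = o * math.tanh(c)                                           # hidden state written out
    return h, c
```

With all weights and biases at zero, each gate evaluates to 0.5, so the old cell state is halved and the new hidden state is 0.5·tanh(c).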
In one embodiment, during model training the batch size is 32 and the initial learning rate is set to 1e-4 with a learning rate decay strategy in which the learning rate is multiplied by 0.9 every 5000 iterations. The optimizer is Adam and the loss function is the mean squared error loss. Every 1000 iterations, one validation pass is performed on the training set and the validation set, and the point at which to stop network training is determined by an early stopping method, yielding the final model.
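The stated schedule (initial learning rate 1e-4, decay factor 0.9 every 5000 iterations, early stopping driven by a validation check every 1000 iterations) can be sketched in plain Python. The patience value below is an assumption, since the text does not state one:

```python
def learning_rate(step, base_lr=1e-4, decay=0.9, every=5000):
    """Learning rate at a given iteration: multiply by 0.9 every 5000 steps."""
    return base_lr * (decay ** (step // every))

class EarlyStopping:
    """Stop when the validation loss (checked every 1000 iterations) has not
    improved for `patience` consecutive checks."""
    def __init__(self, patience=5):
        self.patience = patience
        self.best = float('inf')
        self.bad_checks = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:       # improvement: reset the counter
            self.best = val_loss
            self.bad_checks = 0
        else:                          # no improvement this check
            self.bad_checks += 1
        return self.bad_checks >= self.patience
```

In a full training loop these would sit alongside the Adam optimizer and MSE loss named in the text; they are shown standalone here to keep the sketch dependency-free.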
In one embodiment, the reward mechanism includes:
the mechanical arm learns the correct strategy from feedback signals obtained by interacting with the environment;
a round of learning ends when the mechanical arm goes out of bounds or fails to reach the destination within a specified number of steps;
the penalty value is set between -1 and 0 and the reward value between 0 and 1: a penalty is given when the mechanical arm goes out of bounds, and a reward is given when it stays within the specified range. To speed up network training, a small negative reward of -0.0002 is given for each movement step of the mechanical arm.
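The reward mechanism above maps directly onto a small function. Only the -0.0002 per-step cost is stated explicitly in the text; the exact out-of-bounds and goal values below (-1.0 and 1.0, the endpoints of the stated intervals) are assumptions for illustration:

```python
STEP_COST = -0.0002  # stated per-step negative reward to speed up training

def reward(in_bounds, reached_goal):
    """Sketch of the reward mechanism: out of bounds -> penalty, destination
    reached -> reward, otherwise a small negative per-step cost."""
    if not in_bounds:
        return -1.0       # penalty for going out of bounds (assumed endpoint of [-1, 0])
    if reached_goal:
        return 1.0        # reward at the destination (assumed endpoint of [0, 1])
    return STEP_COST      # small negative reward for each movement step
```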
In one embodiment, a tibial osteotomy diagram and a femoral osteotomy diagram are shown in fig. 4 and 5, respectively.
Fig. 6 is a schematic structural diagram of a deep reinforcement learning-based intelligent osteotomy device of a robot according to an embodiment of the present application, the device includes:
a movement control module 601, used to control the mechanical arm to move to the vicinity of the planned osteotomy plane;
an adjustment control module 602, used to control the mechanical arm so that the saw blade is adjusted into the same plane as the planned osteotomy plane;
an osteotomy plane movement control module 603, used to control the mechanical arm, when osteotomy starts, to move within the plane of the planned osteotomy plane according to a preset mechanical arm path movement strategy, where the mechanical arm path movement strategy is obtained through model training based on a reinforcement learning strategy.
Fig. 7 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
The electronic device may include a processor 701 and a memory 702 storing computer program instructions.
In particular, the processor 701 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing embodiments of the present application.
Memory 702 may include mass storage for data or instructions. By way of example, and not limitation, memory 702 may include a hard disk drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, magnetic tape, or Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory 702 may include removable or non-removable (or fixed) media, where appropriate. The memory 702 may be internal or external to the electronic device, where appropriate. In a particular embodiment, the memory 702 may be a non-volatile solid-state memory.
In one embodiment, memory 702 may be read-only memory (ROM). In one embodiment, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
The processor 701 reads and executes the computer program instructions stored in the memory 702 to implement any of the robot intelligent osteotomy methods based on deep reinforcement learning in the above embodiments.
In one example, the electronic device may also include a communication interface 703 and a bus 710. As shown in fig. 7, the processor 701, the memory 702, and the communication interface 703 are connected by a bus 710 and perform communication with each other.
The communication interface 703 is mainly used for implementing communication between each module, device, unit and/or apparatus in the embodiment of the present application.
Bus 710 includes hardware, software, or both that couple components of the electronic device to one another. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or another suitable bus, or a combination of two or more of the above. Bus 710 may include one or more buses, where appropriate. Although embodiments of the application have been described and illustrated with respect to a particular bus, the application contemplates any suitable bus or interconnect.
In addition, in combination with the robot intelligent osteotomy method based on deep reinforcement learning in the above embodiments, embodiments of the application may provide a computer-readable storage medium. The computer-readable storage medium has computer program instructions stored thereon; the computer program instructions, when executed by the processor, implement any of the deep reinforcement learning based robot intelligent osteotomy methods of the above embodiments.
It should be understood that the application is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. The method processes of the present application are not limited to the specific steps described and shown, but various changes, modifications and additions, or the order between steps may be made by those skilled in the art after appreciating the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio Frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the internet, intranets, etc.
It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. The present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, or may be performed in a different order from the order in the embodiments, or several steps may be performed simultaneously.
Aspects of the present application are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to being, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware which performs the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In the foregoing, only the specific embodiments of the present application are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present application is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present application, and they should be included in the scope of the present application.

Claims (3)

1. A robotic intelligent osteotomy device based on deep reinforcement learning, the device comprising:
a movement control module, used to control the mechanical arm to move to the vicinity of the planned osteotomy plane;
an adjustment control module, used to control the mechanical arm so that the saw blade is adjusted into the same plane as the planned osteotomy plane;
an osteotomy plane movement control module, used to control the mechanical arm, when osteotomy starts, to move within the plane of the planned osteotomy plane according to a preset mechanical arm path movement strategy, where the mechanical arm path movement strategy is obtained through model training based on a reinforcement learning strategy;
wherein the robot intelligent osteotomy method based on deep reinforcement learning performed by the robot intelligent osteotomy device based on deep reinforcement learning comprises the following steps:
controlling the mechanical arm to move to the vicinity of the planned osteotomy plane;
controlling the mechanical arm so that the saw blade is adjusted into the same plane as the planned osteotomy plane;
when osteotomy starts, controlling the mechanical arm to move within the plane of the planned osteotomy plane according to a preset mechanical arm path movement strategy, where the mechanical arm path movement strategy is obtained through model training based on a reinforcement learning strategy;
wherein the reinforcement learning strategy comprises:
initializing parameters, wherein the parameters include environment parameters and network parameters;
performing actions;
obtaining rewards; and
training the network;
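The four stages above (initialize parameters, perform actions, obtain rewards, train) form a standard reinforcement learning loop. As an illustration only, the following plain-Python sketch runs that cycle with tabular Q-learning on a hypothetical grid environment; the lookup table stands in for the network, and the environment, reward values and hyperparameters are assumptions, not the patented configuration.

```python
import random

def train_policy(grid_w=5, grid_h=5, goal=(4, 4), episodes=500,
                 alpha=0.5, gamma=0.95, eps=0.2, max_steps=50, seed=0):
    """Toy initialize -> act -> reward -> train loop (tabular Q-learning).
    All names and values here are illustrative assumptions."""
    rng = random.Random(seed)
    actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    q = {}  # parameter initialization: empty value table

    def best(state):
        return max(range(len(actions)), key=lambda a: q.get((state, a), 0.0))

    for _ in range(episodes):
        s = (0, 0)
        for _ in range(max_steps):
            # perform action: epsilon-greedy choice
            a = rng.randrange(len(actions)) if rng.random() < eps else best(s)
            nx, ny = s[0] + actions[a][0], s[1] + actions[a][1]
            if not (0 <= nx < grid_w and 0 <= ny < grid_h):
                r, done, ns = -1.0, True, s        # obtain reward: out of bounds
            elif (nx, ny) == goal:
                r, done, ns = 1.0, True, (nx, ny)  # obtain reward: destination reached
            else:
                r, done, ns = -0.0002, False, (nx, ny)
            # train: temporal-difference update of the value table
            target = r if done else r + gamma * q.get((ns, best(ns)), 0.0)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (target - old)
            s = ns
            if done:
                break
    return q
```

After enough episodes the greedy policy read off the table reaches the goal; the per-step penalty of -0.0002 mirrors the value stated later in the claim.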
wherein data acquisition is performed before parameter initialization and comprises:
acquiring osteotomy plane data, the relative movement coordinates of the mechanical arm during osteotomy, the bone data after knee joint segmentation, and the instantaneous speed of the mechanical arm;
wherein performing actions comprises:
environment detection and environment interaction, so as to learn the state parameters in real time during the osteotomy stage;
wherein obtaining the mechanical arm path movement strategy through model training based on the reinforcement learning strategy comprises:
sequentially inputting each piece of state information into a long short-term memory (LSTM) network with a recurrent neural network structure, in which a forget gate selects how much of the previous information to retain, an input gate stores the effective part of the current information, and an output gate emits the effective information and saves it into the hidden state; the mechanical arm path movement strategy is obtained through training of this network model;
during model training, the batch size is 32 and the initial learning rate is set to 1e-4 with a learning rate decay strategy in which the learning rate is multiplied by 0.9 every 5000 iterations; the optimizer is Adam and the loss function is the mean square error loss; every 1000 iterations, one round of validation is performed on the training set and the validation set, and the stopping point of network training is determined by early stopping, yielding the final model;
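To make the gating and the training schedule concrete, here is a minimal plain-Python sketch: a scalar LSTM cell showing the forget, input and output gates, plus the stated learning-rate decay rule (multiply by 0.9 every 5000 iterations). The weight layout `w` is a hypothetical illustration, not the patented network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h_prev, c_prev, w):
    """One scalar LSTM step. `w` maps each gate name to a hypothetical
    (input weight, recurrent weight, bias) triple, for illustration only."""
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate: how much old cell state to keep
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate: how much new info to store
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate cell state
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    c = f * c_prev + i * g   # updated cell state
    h = o * math.tanh(c)     # hidden state carries the effective information
    return h, c

def decayed_lr(initial_lr, iteration, decay_every=5000, gamma=0.9):
    """Learning-rate schedule as stated in the claim: x0.9 every 5000 iterations."""
    return initial_lr * gamma ** (iteration // decay_every)
```

For example, `decayed_lr(1e-4, 12000)` applies the decay twice, giving 8.1e-5; in a framework such as PyTorch the same schedule corresponds to a step decay with step size 5000 and factor 0.9.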
wherein the reward mechanism comprises:
the mechanical arm learns the correct strategy through feedback signals obtained by interacting with the environment;
a round of learning ends when the mechanical arm goes out of bounds or fails to reach the destination within a specified number of steps;
a penalty value is set between -1 and 0 and a reward value between 0 and 1; a penalty is given when the mechanical arm goes out of bounds, and a reward is given when it stays within the specified range; to speed up network training, a small negative reward of -0.0002 is given for each movement step of the mechanical arm.
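The reward mechanism of claim 1 can be sketched as a single step-reward function. The exact out-of-bounds penalty (-1.0), goal reward (+1.0) and goal tolerance used below are assumptions chosen within the ranges the claim states; only the per-step penalty of -0.0002 is given explicitly.

```python
def osteotomy_step_reward(pos, bounds, goal, step, max_steps,
                          goal_tol=1.0, step_penalty=-0.0002):
    """Return (reward, episode_done) for one move on the planned osteotomy
    plane. `bounds` = ((xmin, xmax), (ymin, ymax)); the -1.0 penalty, +1.0
    reward and `goal_tol` are illustrative assumptions within the claimed
    ranges [-1, 0] and [0, 1]."""
    x, y = pos
    (xmin, xmax), (ymin, ymax) = bounds
    if not (xmin <= x <= xmax and ymin <= y <= ymax):
        return -1.0, True                 # out of bounds: penalty, round of learning ends
    if abs(x - goal[0]) <= goal_tol and abs(y - goal[1]) <= goal_tol:
        return 1.0, True                  # destination reached: reward, round ends
    if step + 1 >= max_steps:
        return step_penalty, True         # step budget exhausted: round ends
    return step_penalty, False            # small per-step penalty to speed up training
```

The per-step penalty makes shorter paths accumulate less negative reward, which is the stated rationale for speeding up network training.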
2. An electronic device, the electronic device comprising: a processor and a memory storing computer program instructions;
The processor, when executing the computer program instructions, implements the robotic intelligent osteotomy method based on deep reinforcement learning as recited in claim 1.
3. A computer readable storage medium, wherein computer program instructions are stored on the computer readable storage medium, which when executed by a processor, implement the deep reinforcement learning based robotic intelligent osteotomy method as in claim 1.
CN202310656264.1A 2023-06-05 2023-06-05 Intelligent robot bone cutting method, device and equipment based on deep reinforcement learning Active CN116747026B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310656264.1A CN116747026B (en) 2023-06-05 2023-06-05 Intelligent robot bone cutting method, device and equipment based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310656264.1A CN116747026B (en) 2023-06-05 2023-06-05 Intelligent robot bone cutting method, device and equipment based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN116747026A CN116747026A (en) 2023-09-15
CN116747026B true CN116747026B (en) 2024-06-25

Family

ID=87960024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310656264.1A Active CN116747026B (en) 2023-06-05 2023-06-05 Intelligent robot bone cutting method, device and equipment based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN116747026B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117860382B (en) * 2024-01-02 2024-06-25 北京长木谷医疗科技股份有限公司 Navigation surgery mechanical arm vision servo pose prediction PD control method based on LSTM
CN117860380A (en) * 2024-03-11 2024-04-12 北京壹点灵动科技有限公司 Data processing method and device for knee joint replacement, storage medium and electronic equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN107028659A (en) * 2017-01-23 2017-08-11 新博医疗技术有限公司 Operation guiding system and air navigation aid under a kind of CT images guiding
CN109191465A (en) * 2018-08-16 2019-01-11 青岛大学附属医院 A kind of system for being determined based on deep learning network, identifying human body or so the first rib cage

Family Cites Families (21)

Publication number Priority date Publication date Assignee Title
US9754221B1 (en) * 2017-03-09 2017-09-05 Alphaics Corporation Processor for implementing reinforcement learning operations
CN109567942B (en) * 2018-10-31 2020-04-14 上海盼研机器人科技有限公司 Craniomaxillofacial surgical robot auxiliary system adopting artificial intelligence technology
CN111035454B (en) * 2019-12-26 2021-09-10 苏州微创畅行机器人有限公司 Readable storage medium and surgical robot
CN111515961B (en) * 2020-06-02 2022-06-21 南京大学 Reinforcement learning reward method suitable for mobile mechanical arm
CN113017829B (en) * 2020-08-22 2023-08-29 张逸凌 Preoperative planning method, system, medium and device for total knee arthroplasty based on deep learning
CN112370163B (en) * 2020-11-11 2022-05-31 上海交通大学医学院附属第九人民医院 Fibula transplantation surgical robot for mandible reconstruction
CN113326872A (en) * 2021-05-19 2021-08-31 广州中国科学院先进技术研究所 Multi-robot trajectory planning method
KR102622932B1 (en) * 2021-06-16 2024-01-10 코넥티브 주식회사 Appartus and method for automated analysis of lower extremity x-ray using deep learning
CA3225127A1 (en) * 2021-07-08 2023-01-12 Riaz Jan Kjell Khan Robot-assisted laser osteotomy
WO2023003912A1 (en) * 2021-07-20 2023-01-26 Carlsmed, Inc. Systems for predicting intraoperative patient mobility and identifying mobility-related surgical steps
CN113633377B (en) * 2021-08-13 2024-02-20 天津大学 Tibia optimization registration system and method for tibia high osteotomy
CN113962927B (en) * 2021-09-01 2022-07-12 北京长木谷医疗科技有限公司 Acetabulum cup position adjusting method and device based on reinforcement learning and storage medium
CN113842213B (en) * 2021-09-03 2022-10-11 北京长木谷医疗科技有限公司 Surgical robot navigation positioning method and system
EP4152339A1 (en) * 2021-09-20 2023-03-22 Universität Zürich Method for determining a surgery plan by means of a reinforcement learning method
WO2023056614A1 (en) * 2021-10-09 2023-04-13 大连理工大学 Method for predicting rotating stall of axial flow compressor on the basis of stacked long short-term memory network
CN114404047B (en) * 2021-12-24 2024-06-14 苏州微创畅行机器人有限公司 Positioning method, system, device, computer equipment and storage medium
CN114246635B (en) * 2021-12-31 2023-06-30 杭州三坛医疗科技有限公司 Osteotomy plane positioning method, system and device
CN114603564B (en) * 2022-04-28 2024-04-12 中国电力科学研究院有限公司 Mechanical arm navigation obstacle avoidance method, system, computer equipment and storage medium
CN114939870B (en) * 2022-05-30 2023-05-09 兰州大学 Model training method and device, strategy optimization method, strategy optimization equipment and medium
CN116100539A (en) * 2022-11-29 2023-05-12 国网安徽省电力有限公司淮南供电公司 Mechanical arm autonomous dynamic obstacle avoidance method and system based on deep reinforcement learning
CN115946133B (en) * 2023-03-16 2023-06-02 季华实验室 Mechanical arm plug-in control method, device, equipment and medium based on reinforcement learning

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN107028659A (en) * 2017-01-23 2017-08-11 新博医疗技术有限公司 Operation guiding system and air navigation aid under a kind of CT images guiding
CN109191465A (en) * 2018-08-16 2019-01-11 青岛大学附属医院 A kind of system for being determined based on deep learning network, identifying human body or so the first rib cage

Also Published As

Publication number Publication date
CN116747026A (en) 2023-09-15

Similar Documents

Publication Publication Date Title
CN116747026B (en) Intelligent robot bone cutting method, device and equipment based on deep reinforcement learning
KR101961421B1 (en) Method, controller, and computer program product for controlling a target system by separately training a first and a second recurrent neural network models, which are initially trained using oparational data of source systems
CN107392125A (en) Training method/system, computer-readable recording medium and the terminal of model of mind
CN112632860B (en) Power transmission system model parameter identification method based on reinforcement learning
Karg et al. Learning-based approximation of robust nonlinear predictive control with state estimation applied to a towing kite
EP3418822A1 (en) Control device, control program, and control system
CN103955136B (en) Electromagnetism causes to drive position control method and application thereof
CN116650110B (en) Automatic knee joint prosthesis placement method and device based on deep reinforcement learning
CN116747016A (en) Intelligent surgical robot navigation and positioning system and method
CN116543221A (en) Intelligent detection method, device and equipment for joint pathology and readable storage medium
CN110018722B (en) Machine learning apparatus, system, and method for thermal control
CN116597002B (en) Automatic femoral stem placement method, device and equipment based on deep reinforcement learning
CN111223141A (en) Automatic assembly line work efficiency optimization system and method based on reinforcement learning
CN116898574B (en) Preoperative planning method, system and equipment for artificial intelligent knee joint ligament reconstruction
CN116309636A (en) Knee joint segmentation method, device and equipment based on multi-task neural network model
CN117350992A (en) Multi-task segmentation network metal implant identification method based on self-guiding attention mechanism
CN110287924A (en) A kind of soil parameters classification method based on GRU-RNN model
CN113156961B (en) Driving control model training method, driving control method and related device
CN112734923B (en) Automatic driving three-dimensional virtual scene construction method, device, equipment and storage medium
CN117860382B (en) Navigation surgery mechanical arm vision servo pose prediction PD control method based on LSTM
CN109614999A (en) A kind of data processing method, device, equipment and computer readable storage medium
CN113406574A (en) Online clustering method for multifunctional radar working mode sequence
Beleznay et al. Comparing value-function estimation algorithms in undiscounted problems
Elimelech et al. Introducing PIVOT: Predictive incremental variable ordering tactic for efficient belief space planning
CN115115790B (en) Training method of prediction model, map prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant