CN114663432B - Skeleton model correction method and device - Google Patents

Skeleton model correction method and device

Info

Publication number: CN114663432B
Application number: CN202210568530.0A
Authority: CN (China)
Original language: Chinese (zh)
Other versions: CN114663432A (application publication)
Prior art keywords: model, bone, vector, points, dimensional
Legal status: Active (granted)
Inventor: 曾承
Current and original assignee: Wuhan Talent Information Technology Co., Ltd.
Application filed by Wuhan Talent Information Technology Co., Ltd.; priority to CN202210568530.0A. Application published as CN114663432A; grant published as CN114663432B.


Classifications

    • G06T7/0012: Image analysis; biomedical image inspection
    • G06F18/214: Pattern recognition; generating training patterns, bootstrap methods (e.g. bagging or boosting)
    • G06F18/23213: Non-hierarchical clustering with a fixed number of clusters (e.g. K-means)
    • G06N3/04: Neural networks; architecture (e.g. interconnection topology)
    • G06N3/08: Neural networks; learning methods
    • G06T17/00: Three-dimensional [3D] modelling (e.g. data description of 3D objects)
    • G06T7/75: Determining position or orientation of objects or cameras using feature-based methods involving models
    • G06T2207/20081: Training; learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/30008: Bone (biomedical image processing)


Abstract

The invention provides a skeleton model correction method and device. A three-dimensional action model of a medical expert is constructed and decomposed to obtain training data for the different steps of a clinical operation, and a spatial attention mechanism and a temporal attention mechanism are added so that the model's correction result is closer to the expert's three-dimensional action model. Correction of an intern's clinical operation is thereby automated: the medical expert only needs to comment on the final correction result, which saves the expert's time.

Description

Skeleton model correction method and device
Technical Field
The invention relates to the field of artificial intelligence, in particular to a skeleton model correction method and device.
Background
In order to strengthen the training of interns' clinical skills and improve their overall quality, those skills are regularly examined. At present, assessment relies mainly on a medical expert observing an intern's clinical operation on site and correcting it, which consumes the expert's valuable time. A correction model for clinical operation skills is therefore urgently needed.
Disclosure of Invention
The invention mainly aims to provide a skeleton model correction method and device, so as to solve the problem that a medical expert's time is consumed by on-site observation and correction of an intern's clinical operation.
The invention provides a skeleton model correction method, which comprises the following steps:
collecting a plurality of clinical operation videos, and marking the bone points of medical experts in the clinical operation videos to obtain a plurality of first marks;
constructing a three-dimensional action model of the medical expert according to the first mark and the clinical operation video, and performing step decomposition on the three-dimensional action model according to a preset clinical step and a time sequence to obtain a plurality of three-dimensional action sub-models;
inputting the three-dimensional action submodel into an initial model, and acquiring the distance from each bone point in the three-dimensional action submodel to a target object in the initial model;
marking skeleton points within a preset distance as effective skeleton points, extracting a local submodel of the effective skeleton points corresponding to the three-dimensional action submodel, and acquiring first position information of the effective skeleton points in each frame of the local submodel and second position information of the target object;
calculating a first vector of each of the valid bone points to remaining valid bone points based on the first location information and a second vector of each of the valid bone points to the target object based on the second location information;
setting the space attention value of each first vector based on the modulus of the second vector, and extracting the same first vector in each frame of the local sub-model to form a first vector set; two effective bone points corresponding to any two first vectors in the first vector set are the same;
arranging first vectors in the first vector set in chronological order;
calculating the position difference Δv(n, m) = v(n+1, m) − v(n, m) between the m-th first vector in the (n+1)-th-frame local sub-model and the same first vector in the n-th frame;
setting the position attention score of the m-th first vector in the (n+1)-th-frame local sub-model according to a softmax function over these position differences, and setting the position attention score of the first vectors in the first-frame local sub-model to a constant;
carrying out weighted summation according to each first vector and the corresponding space attention score and position attention score to obtain the weighted sum of each local submodel;
inputting the weighted sum of each frame of local submodel of the local submodel into a corresponding judgment network input layer, training by taking a preset score corresponding to the clinical operation as the output of the judgment network, and obtaining a skeleton correction model after the training is finished;
and acquiring a clinical operation video of an intern, inputting the clinical operation video into the skeleton correction model to obtain a skill score of the intern, and correcting the action of the intern based on the skill score.
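The attention-weighted summation step in the claim above can be sketched as follows; a minimal NumPy illustration, assuming a frame's first vectors and both attention scores are already available (all names are hypothetical):

```python
import numpy as np

def weighted_sum(first_vectors, spatial_scores, position_scores):
    """Weighted sum of one frame's first vectors, each vector scaled by the
    product of its spatial and position attention scores."""
    v = np.asarray(first_vectors, dtype=float)                    # (m, 3)
    w = np.asarray(spatial_scores) * np.asarray(position_scores)  # (m,)
    return (w[:, None] * v).sum(axis=0)                           # (3,)

frame_vectors = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
s = weighted_sum(frame_vectors, [0.5, 0.5], [1.0, 1.0])
```

The per-frame weighted sums would then be fed to the discrimination network's input layer, one per frame, as the claim describes.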
Further, the step of decomposing the three-dimensional motion model according to a preset clinical procedure in a time sequence to obtain a plurality of three-dimensional motion submodels includes:
acquiring third position information of the skeleton point of each frame of the three-dimensional action model;
obtaining the moving distance of each bone point in each frame of the three-dimensional action model relative to the previous frame according to the third position information, and summing to obtain the total displacement of the adjacent frames of the medical expert;
judging whether the total displacement of the continuous adjacent frames is larger than a total displacement threshold value;
if yes, taking the frames with the total displacement quantity of the continuous adjacent frames larger than the total displacement quantity threshold value as truncation points;
and decomposing the three-dimensional action model in sequence according to the truncation points to obtain a plurality of three-dimensional action submodels.
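The displacement-based step decomposition above can be sketched as follows, assuming bone points are given as per-frame (joints × 3) coordinate arrays (function and variable names are illustrative):

```python
import numpy as np

def split_by_displacement(frames, threshold):
    """Split a sequence of per-frame bone-point arrays into sub-models,
    cutting wherever the total inter-frame displacement exceeds threshold."""
    frames = np.asarray(frames, dtype=float)      # (T, J, 3)
    # total displacement of all joints between consecutive frames
    total = np.linalg.norm(np.diff(frames, axis=0), axis=2).sum(axis=1)  # (T-1,)
    cuts = np.flatnonzero(total > threshold) + 1  # frame indices acting as truncation points
    return np.split(frames, cuts)

# two joints; the jump between frames 1 and 2 marks a step boundary
seq = [[[0, 0, 0], [1, 0, 0]],
       [[0, 0, 0], [1, 0, 0]],
       [[5, 0, 0], [6, 0, 0]]]
parts = split_by_displacement(seq, threshold=2.0)
```

Each element of `parts` corresponds to one three-dimensional action sub-model.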
Further, the step of extracting a local sub-model of a corresponding valid bone point in the three-dimensional motion sub-model includes:
acquiring first position information of the effective bone point in each frame of the three-dimensional action sub-model and second position information of the target object;
determining a target space according to the first position information and the second position information, wherein the target space comprises all positions corresponding to the first position information and positions corresponding to the second position information;
and acquiring the local submodel from the target space.
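The target-space determination above amounts to an axis-aligned bounding region over the valid bone points and the target object; a minimal sketch (names are illustrative):

```python
import numpy as np

def target_space_bounds(valid_points, target_point, margin=0.0):
    """Axis-aligned target space enclosing all valid bone points and the
    target object; the local sub-model is then read out of this region."""
    pts = np.vstack([np.asarray(valid_points, dtype=float),
                     np.asarray(target_point, dtype=float)[None, :]])
    return pts.min(axis=0) - margin, pts.max(axis=0) + margin

lo, hi = target_space_bounds([[0, 0, 0], [1, 2, 0]], [3, 1, 1])
```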
Further, the step of constructing a three-dimensional motion model of the medical expert from the first label and the video of clinical operations comprises:
acquiring a two-dimensional picture containing each bone point from each clinical operation video;
setting K aggregation centers in the two-dimensional picture according to a K-means algorithm;
determining a minimum central rectangle according to the k aggregation centers; wherein the k aggregation centers are contained in the minimum center rectangle;
calculating the area of each minimum center rectangle, recording the clinical operation video corresponding to the minimum center rectangle with the minimum area as a main video, and recording the rest clinical operation videos as auxiliary videos;
establishing a space coordinate system based on the main video, and obtaining coordinate points on two coordinate axes of each skeleton point;
obtaining coordinate points on the remaining coordinate axis of the skeleton points according to the auxiliary video, so as to obtain a complete coordinate value of each skeleton point;
and constructing a three-dimensional action model of the medical expert based on each complete coordinate value.
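The main-video selection above can be sketched with a plain Lloyd-style k-means (no external library; `center_rectangle_area` and the camera data are illustrative): the video whose k aggregation centers fit into the smallest center rectangle is taken as the main video.

```python
import numpy as np

def center_rectangle_area(points, k, seed=0):
    """Cluster 2-D bone points into k aggregation centers (plain Lloyd k-means)
    and return the area of the minimal axis-aligned rectangle enclosing them."""
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = pts[rng.choice(len(pts), k, replace=False)]
    for _ in range(20):
        labels = np.linalg.norm(pts[:, None] - centers[None], axis=2).argmin(axis=1)
        centers = np.array([pts[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    span = centers.max(axis=0) - centers.min(axis=0)
    return float(span[0] * span[1])

# the camera whose centers pack tightest (smallest area) becomes the main video
areas = {name: center_rectangle_area(p, k=2)
         for name, p in {"cam0": [[0, 0], [0, 2], [10, 8], [10, 10]],
                         "cam1": [[0, 0], [0, 2], [2, 3], [2, 5]]}.items()}
main = min(areas, key=areas.get)
```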
Further, the step of constructing a three-dimensional motion model of the medical expert from the first label and the video of clinical operations comprises:
acquiring a first dominant hand of a medical expert;
judging whether the first dominant hand is the same as a preset second dominant hand;
if not, performing mirror image processing on the marked clinical operation video to obtain a mirror image operation video;
and constructing a three-dimensional action model of the medical expert according to the mirror image operation video and the first mark.
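The mirror processing above can be sketched as a left-right flip of the video frames and of the marked 2-D bone-point x-coordinates; a minimal NumPy sketch (names are illustrative):

```python
import numpy as np

def mirror_video(frames):
    """Flip each frame left-right; frames shaped (T, H, W[, C])."""
    return np.asarray(frames)[:, :, ::-1]

def mirror_points(points, width):
    """Mirror marked 2-D bone points across the vertical centre of a frame
    of the given pixel width."""
    pts = np.asarray(points, dtype=float).copy()
    pts[:, 0] = (width - 1) - pts[:, 0]
    return pts

p = mirror_points([[0, 5], [9, 2]], width=10)
```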
Further, the step of inputting the weighted sum of each frame of local submodel of the local submodel into a corresponding discrimination network input layer, training by using a preset score corresponding to the clinical operation as the output of the discrimination network, and obtaining a bone correction model after the training is completed further includes:
acquiring a test set;
inputting the test video in the test set into the skeleton correction model to obtain prediction output;
calculating a model loss value according to the formula

loss = (1/N) * Σ_{i=1}^{N} (y_i − ŷ_i)²,

where y_i is the prediction output and ŷ_i is the corresponding actual output in the test set;
judging whether the model loss value is smaller than a preset loss value or not;
if so, the bone correction model is judged usable for correcting interns.
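Assuming the loss is a mean squared error over the test set (the original formula is not legible in this text, so this choice is an assumption), the availability check can be sketched as:

```python
import numpy as np

def model_usable(pred, actual, max_loss):
    """Mean-squared-error check: the trained model is accepted for correcting
    interns only if its loss on the test set stays below max_loss."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    loss = float(np.mean((pred - actual) ** 2))  # assumed MSE form of the loss
    return loss, loss < max_loss

loss, ok = model_usable([0.9, 0.8], [1.0, 0.8], max_loss=0.05)
```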
The present invention also provides a bone model correction apparatus, including:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a plurality of clinical operation videos and marking bone points of medical experts in the clinical operation videos to obtain a plurality of first marks;
the construction module is used for constructing a three-dimensional action model of the medical expert according to the first mark and the clinical operation video, and performing step decomposition on the three-dimensional action model according to a preset clinical step and a time sequence to obtain a plurality of three-dimensional action submodels;
the distance acquisition module is used for inputting the three-dimensional action submodel into an initial model and acquiring the distance from each bone point in the three-dimensional action submodel to a target object in the initial model;
a marking module, configured to mark a bone point within a preset distance as an effective bone point, extract a local sub-model of the effective bone point corresponding to the three-dimensional action sub-model, and obtain first position information of the effective bone point in each frame of the local sub-model and second position information of the target object;
a vector calculation module for calculating a first vector of each of the valid bone points to the remaining valid bone points based on the first location information and a second vector of each of the valid bone points to the target object based on the second location information;
the setting module is used for setting the spatial attention value of each first vector based on the modulus of the second vector, extracting the same first vector in each frame of the local sub-model and forming a first vector set; two effective bone points corresponding to any two first vectors in the first vector set are the same;
the arrangement module is used for arranging the first vectors in the first vector set according to a time sequence;
a position difference calculating module, configured to calculate the position difference Δv(n, m) = v(n+1, m) − v(n, m) between the m-th first vector in adjacent frames of the local sub-model;
the setting module is further configured to set the position attention score of the m-th first vector in the (n+1)-th-frame local sub-model according to a softmax function, and to set the position attention score of the first vectors in the first-frame local sub-model to a constant;
the weighting module is used for carrying out weighted summation according to each first vector and the corresponding space attention score and position attention score to obtain the weighted sum of each local sub-model;
the training module is used for inputting the weighted sum of each frame of local submodel of the local submodel into a corresponding judgment network input layer, training by taking a preset score corresponding to the clinical operation as the output of the judgment network, and obtaining a skeleton correction model after the training is finished;
and the correction module is used for acquiring a clinical operation video of an intern, inputting the clinical operation video into the bone correction model to obtain a skill score of the intern, and correcting the action of the intern based on the skill score.
Further, the building module includes:
the third position information acquisition submodule is used for acquiring third position information of the skeleton point of each frame of the three-dimensional action model;
the adjacent frame displacement total quantity obtaining submodule is used for obtaining the moving distance of each bone point in each frame of the three-dimensional action model relative to the previous frame according to the third position information, and summing the moving distances to obtain the adjacent frame displacement total quantity of the medical expert;
the judgment sub-module is used for judging whether the total displacement of the adjacent continuous frames is greater than a total displacement threshold value;
the truncation point setting submodule is used for taking, if so, the frames whose total displacement relative to the adjacent frame is greater than the total displacement threshold as truncation points;
and the decomposition submodule is used for sequentially decomposing the three-dimensional action model according to the truncation points to obtain a plurality of three-dimensional action submodels.
Further, the marking module includes:
the position information acquisition submodule is used for acquiring first position information of the effective bone point in each frame of the three-dimensional action sub-model and second position information of the target object;
a space determining submodule, configured to determine a target space according to the first location information and the second location information, where the target space includes locations corresponding to all the first location information and locations corresponding to the second location information;
and the local submodel acquisition submodule is used for acquiring the local submodel from the target space.
Further, the building module includes:
the two-dimensional picture acquisition sub-module is used for acquiring a two-dimensional picture containing each bone point from each clinical operation video;
the aggregation center setting submodule is used for setting K aggregation centers in the two-dimensional picture according to a K-means algorithm;
a minimum center rectangle determining submodule for determining a minimum center rectangle according to the k aggregation centers; wherein the k aggregation centers are contained in the minimum center rectangle;
the area calculation submodule is used for calculating the area of each minimum central rectangle, recording the clinical operation video corresponding to the minimum central rectangle with the minimum area as a main video, and recording the rest clinical operation videos as auxiliary videos;
the coordinate point acquisition submodule is used for establishing a space coordinate system based on the main video and obtaining coordinate points on two coordinate axes of each skeleton point;
the complete coordinate value acquisition submodule is used for obtaining coordinate points on the remaining coordinate axis of the bone points according to the auxiliary video so as to obtain the complete coordinate value of each bone point;
and the three-dimensional action model establishing submodule is used for establishing the three-dimensional action model of the medical expert based on each complete coordinate value.
The invention has the beneficial effects that: the three-dimensional action model of the medical expert is constructed and decomposed to obtain training data of different steps, and a space attention mechanism and a time attention mechanism are added to make the correction result of the model closer to the three-dimensional action model of the medical expert. Therefore, the automation of correcting the clinical operation of the intern is realized, so that the medical expert only needs to comment according to the final correction result, and the time of the medical expert is further saved.
Drawings
FIG. 1 is a schematic flow chart of a skeleton model correction method according to an embodiment of the present invention;
fig. 2 is a block diagram schematically illustrating the structure of a skeleton model correction apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that all directional indicators (such as up, down, left, right, front, back, etc.) in the embodiments of the present invention are only used to explain the relative position relationship between the components, the motion situation, etc. in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indicator is changed accordingly, and the connection may be a direct connection or an indirect connection.
The term "and/or" herein is only one kind of association relationship describing the association object, and means that there may be three kinds of relationships, for example, a and B, and may mean: a exists alone, A and B exist simultaneously, and B exists alone.
In addition, the descriptions related to "first", "second", etc. in the present invention are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a skeleton model correction method, including:
s1: collecting a plurality of clinical operation videos, and marking the bone points of medical experts in the clinical operation videos to obtain a plurality of first marks;
s2: constructing a three-dimensional action model of the medical expert according to the first mark and the clinical operation video, and performing step decomposition on the three-dimensional action model according to a preset clinical step and a time sequence to obtain a plurality of three-dimensional action sub-models;
s3: inputting the three-dimensional action submodel into an initial model, and acquiring the distance from each bone point in the three-dimensional action submodel to a target object in the initial model;
s4: marking skeleton points within a preset distance as effective skeleton points, extracting a local submodel of the effective skeleton points corresponding to the three-dimensional action submodel, and acquiring first position information of the effective skeleton points in each frame of the local submodel and second position information of the target object;
s5: calculating a first vector of each of the valid bone points to remaining valid bone points based on the first location information and a second vector of each of the valid bone points to the target object based on the second location information;
s6: setting the space attention value of each first vector based on the modulus of the second vector, and extracting the same first vector in each frame of the local sub-model to form a first vector set; two effective bone points corresponding to any two first vectors in the first vector set are the same;
s7: arranging first vectors in the first vector set in chronological order;
s8: calculating the position difference Δv(n, m) = v(n+1, m) − v(n, m) between the m-th first vector in the (n+1)-th-frame local sub-model and the same first vector in the n-th frame;
S9: setting the position attention score of the m-th first vector in the (n+1)-th-frame local sub-model according to a softmax function over these position differences, and setting the position attention score of the first vectors in the first-frame local sub-model to a constant;
s10: carrying out weighted summation according to each first vector and the corresponding space attention score and position attention score to obtain the weighted sum of each local submodel;
s11: inputting the weighted sum of each frame of local submodel of the local submodel into a corresponding judgment network input layer, training by taking a preset score corresponding to the clinical operation as the output of the judgment network, and obtaining a skeleton correction model after the training is finished;
s12: and acquiring a clinical operation video of an intern, inputting the video into the bone correction model to obtain a skill score of the intern, and correcting the action of the intern based on the skill score.
As described in the above step S1, a plurality of videos of the medical expert's clinical operation on the target object are collected through preset cameras. At least four cameras are used, distributed around the medical expert; more can be added to capture the expert from every position, which facilitates the subsequent construction of the expert's three-dimensional action model. The expert's bone points in the clinical operation videos are marked to obtain a plurality of first marks. The bone points of the target object may likewise be marked to obtain corresponding second marks, which are used later when calculating the distance between each bone point and the target object. Marking may be done automatically in the video through feature recognition of each bone point, or manually, to obtain the corresponding first and second marks.
As described in the above step S2, the three-dimensional action model of the medical expert is constructed from the first marks and the clinical operation videos. A rectangular spatial coordinate system can be established from the camera positions and the videos they collect, forming a virtual space; the coordinates of each bone point in that space then yield the expert's three-dimensional action model. The model is decomposed step by step, in time order, according to the preset clinical steps, giving a plurality of three-dimensional action sub-models. The decomposition may be manual: a clinical operation skill generally comprises several steps. For example, basic surgical operation includes the four common techniques of incision, suturing, knot tying, and hemostasis, working skills every front-line doctor must possess. Alternatively, the video can be decomposed at pause actions, as described in detail later.
As described in step S3, the three-dimensional action sub-model is input into an initial model, and the distance from each bone point in the sub-model to the target object is acquired in the initial model. The initial model is a neural-network discrimination model; since the three-dimensional coordinates of each bone point and of the target object are known, the distance from each bone point to the target object can be acquired directly.
As described in step S4 above, bone points within the preset distance are recorded as valid bone points. For most clinical operation skills, the operating subject is only the dominant hand or both hands, so the bone points corresponding to the finger joints matter for each finger, whereas points such as the head and eyes are unimportant. The preset distance can therefore be chosen according to the clinical skill under consideration, yielding the valid bone points. A local sub-model of the corresponding valid bone points is then extracted from the three-dimensional action sub-model; the extraction method is detailed later. First position information of the valid bone points and second position information of the target object are acquired in each frame of the local sub-model; since the coordinates of the valid bone points and of the target object are known, both can be acquired directly. Note that because the clinical operation video consists of individual frames, the local sub-model likewise consists of one three-dimensional map per frame.
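The valid-bone-point selection described above can be sketched as a simple distance filter (names and data are illustrative):

```python
import numpy as np

def valid_bone_points(bone_points, target, preset_distance):
    """Mark bone points within preset_distance of the target object as valid
    and return them together with the boolean mask."""
    pts = np.asarray(bone_points, dtype=float)
    d = np.linalg.norm(pts - np.asarray(target, dtype=float), axis=1)
    mask = d <= preset_distance
    return pts[mask], mask

pts, mask = valid_bone_points([[0, 0, 0], [5, 0, 0]],
                              target=[1, 0, 0], preset_distance=2.0)
```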
As described in step S5 above, a first vector from each valid bone point to each remaining valid bone point is calculated based on the first position information, and a second vector from each valid bone point to the target object is calculated based on the second position information. Both are obtained by coordinate subtraction: the first vector between two valid bone points is the difference of their coordinate values, and the second vector is the difference between the coordinates of the target object and those of the valid bone point.
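The coordinate subtraction of step S5 can be sketched as below; a minimal illustration, assuming the valid bone points and the target object are given as coordinate arrays (all names are hypothetical):

```python
import numpy as np

def first_and_second_vectors(valid_points, target):
    """Vectors of step S5, obtained by plain coordinate subtraction.

    valid_points: (K, 3) coordinates of the valid bone points.
    target: (3,) coordinate of the target object.
    Returns:
      first_vectors[i][j] — vector from valid bone point i to valid bone point j,
      second_vectors[i]   — vector from valid bone point i to the target object.
    """
    pts = np.asarray(valid_points, dtype=float)
    # Broadcasting: element [i, j] is pts[j] - pts[i].
    first_vectors = pts[None, :, :] - pts[:, None, :]        # (K, K, 3)
    second_vectors = np.asarray(target, dtype=float) - pts   # (K, 3)
    return first_vectors, second_vectors

fv, sv = first_and_second_vectors([[0, 0, 0], [1, 2, 2]], [1, 0, 0])
```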
As described in step S6 above, the positions between the valid bone points carry the operation information of the operator. A spatial attention score is set for each first vector based on the moduli of the second vectors, according to the formula

S_i = σ(g(|u_i|, |v_i|))

where a_i denotes the i-th first vector, S_i denotes the spatial attention score corresponding to the i-th first vector, and |u_i| and |v_i| denote the moduli of the two second vectors corresponding to the two valid bone points that form a_i. The closer these two bone points are to the target object, the greater their degree of influence on the target object, so more attention needs to be allocated to the features of that part. Here g denotes a preset calculation function of |u_i| and |v_i|; it may be a linear function, a polynomial function or a composite function, provided it satisfies: the smaller the values of |u_i| and |v_i|, the larger the resulting value of S_i. σ is a normalization function whose output lies in the range [0, 1].
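A spatial attention score of this kind — a decreasing function of the two second-vector moduli followed by a normalization into [0, 1] — can be sketched as follows. The choice of the decreasing function as a negated sum and of the normalization as a softmax is an assumption for illustration; the patent only constrains their monotonicity and range:

```python
import numpy as np

def spatial_attention_scores(second_vector_pairs):
    """second_vector_pairs: (N, 2, 3) — for each of the N first vectors, the two
    second vectors of its endpoint bone points.
    Returns N scores in [0, 1]: the smaller the moduli (i.e. the closer the bone
    points are to the target object), the larger the score.
    """
    v = np.asarray(second_vector_pairs, dtype=float)
    moduli_sum = np.linalg.norm(v, axis=2).sum(axis=1)  # g(|u_i|, |v_i|): linear sum
    logits = -moduli_sum                                # smaller modulus -> larger score
    e = np.exp(logits - logits.max())                   # normalization: softmax into [0, 1]
    return e / e.sum()

scores = spatial_attention_scores([
    [[0.1, 0, 0], [0.2, 0, 0]],   # endpoint bone points very close to the target
    [[3.0, 0, 0], [4.0, 0, 0]],   # endpoint bone points far from the target
])
```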
As described in step S7 above, the same first vector in each frame of the local sub-model is extracted to form a first vector set. "Same" here means formed by the same two valid bone points: the vector generally differs numerically between the local sub-models of different frames, and the first vectors formed by the same two bone points across all frames are collected into one first vector set. The first vectors of the first vector set are arranged in chronological order, i.e., in the order of the frames of the local sub-model.
As described in step S8 above, the position difference Δ between two adjacent first vectors is calculated. The skill of the medical expert also includes dynamic information, i.e., the expert's movement, which is embodied in the position difference of two adjacent first vectors; the action information of the medical expert can therefore be acquired from it. The calculation formula is

Δ(m, n) = h(P1(m, n), P2(m, n), P1(m, n+1), P2(m, n+1))

where P1(m, n) and P2(m, n) denote the two valid bone points corresponding to the m-th first vector in the n-th-frame local sub-model, P1(m, n+1) and P2(m, n+1) denote the two valid bone points corresponding to the m-th first vector in the (n+1)-th-frame local sub-model, and h denotes a preset position-difference calculation function.
as described in step S9, the position attention score is set according to the acquired position difference, and it should be noted that the position differenceThe larger the magnitude of the motion is, and therefore the position attention score can be set to obtain a better recognition effect, it should be noted that, since no first vector in the first frame local submodel can obtain a position difference before the first vector, the position attention score corresponding to the first vector in the first frame local submodel can be set to be a constant, which can be set to 0 or 1, depending on the importance of the step of the first step of the training expert on the clinical operation skill correction. The specific setting mode is according to the formula
Figure DEST_PATH_IMAGE053
Setting the position attention score corresponding to the mth first vector in the n +1 th frame local sub-model, and setting the position attention score corresponding to the first vector in the first frame local sub-model as a constant.
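Steps S8 and S9 together can be sketched as below. The concrete position-difference function (norm of the frame-to-frame change of the vector) is an assumption, since the patent leaves h preset; the first-frame constant follows the text above:

```python
import numpy as np

def position_attention_scores(vector_track, first_frame_constant=1.0):
    """vector_track: (F, 3) — the same first vector in each of F frames of the
    local sub-model, arranged in chronological order.
    Returns F scores: a softmax over the adjacent-frame position differences for
    frames 2..F, and a constant for the first frame, which has no preceding frame.
    """
    track = np.asarray(vector_track, dtype=float)
    # Position difference h: norm of the change between adjacent frames.
    diffs = np.linalg.norm(track[1:] - track[:-1], axis=1)
    e = np.exp(diffs - diffs.max())
    scores = e / e.sum()          # softmax: larger motion -> larger attention
    return np.concatenate([[first_frame_constant], scores])

scores = position_attention_scores([[0, 0, 0], [0, 0, 0], [5, 0, 0]])
```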
As described in step S10 above, the spatial attention score and the position attention score both carry a large amount of information, so each first vector is weighted by its corresponding spatial attention score and position attention score: the position attention score and spatial attention score of each first vector are summed, the first vector is multiplied by that sum to obtain its weighted result, and the weighted results of all first vectors of each frame of the local sub-model are summed to obtain the weighted sum of that frame, i.e., the training data. That is, the weighted sum of the n-th-frame local sub-model is obtained by the formula

W_n = Σ_{i=1..M_n} (S(i, n) + T(i, n)) · a(i, n)

where W_n denotes the weighted sum of the n-th-frame local sub-model, M_n denotes the number of first vectors in the n-th-frame local sub-model, a(i, n) denotes the i-th first vector in the n-th-frame local sub-model, and S(i, n) + T(i, n) denotes the sum of the spatial attention score and the position attention score corresponding to the i-th first vector in the n-th-frame local sub-model.
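The per-frame weighting of step S10 can be sketched as a few lines of array arithmetic; the function and argument names are illustrative:

```python
import numpy as np

def frame_weighted_sum(first_vectors, spatial_scores, position_scores):
    """Weighted sum of one frame of the local sub-model.

    first_vectors: (M, 3) — the M first vectors of the frame.
    spatial_scores, position_scores: (M,) attention scores per first vector.
    Each first vector is multiplied by the sum of its two attention scores,
    and the weighted results are summed over the frame.
    """
    a = np.asarray(first_vectors, dtype=float)
    w = np.asarray(spatial_scores, dtype=float) + np.asarray(position_scores, dtype=float)
    return (w[:, None] * a).sum(axis=0)

ws = frame_weighted_sum([[1, 0, 0], [0, 1, 0]], [0.5, 0.25], [0.5, 0.25])
```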
As described in step S11 above, the weighted sums of the frames of each local sub-model are input into the input layer of the corresponding discrimination network, the preset score corresponding to the clinical operation is used as the output of the discrimination network for training, and after training is completed the bone correction model is obtained. The bone correction model comprises a plurality of discrimination networks, one per three-dimensional action sub-model, so that each step is recognized separately and the resulting bone correction model is more accurate.
As described in step S12 above, a clinical operation video of an intern is acquired and input into the bone correction model to obtain the intern's skill scores, and the intern's actions are corrected based on those scores. Because a plurality of discrimination networks evaluate each step of different interns, a score value is obtained for each step. When a score is low, the intern's action is pointed out: the score may be sent to the intern, who summarizes the information, or sent to the medical expert, who reviews it. The intern then repeats the clinical operation, and the new operation video is input into the bone correction model again, until every score reaches the preset requirement. The intern's actions are thus corrected and shortcomings identified, realizing automated correction of the intern's clinical operation: the medical expert only needs to guide improvement according to the final correction result, which further saves the expert's time.
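The repeat-until-passing loop of step S12 can be sketched as follows; a minimal illustration assuming the bone correction model is exposed as a per-step scoring function (all names, the pass score, and the round limit are hypothetical):

```python
def correct_intern(score_steps_fn, record_operation_fn, pass_score=80.0, max_rounds=10):
    """Correction loop of step S12.

    score_steps_fn(video) -> list of per-step scores from the bone correction model
                             (one score per discrimination network).
    record_operation_fn() -> a new clinical-operation video of the intern.
    The intern repeats the operation until every step reaches the preset score.
    """
    for round_no in range(1, max_rounds + 1):
        video = record_operation_fn()
        scores = score_steps_fn(video)
        weak_steps = [i for i, s in enumerate(scores) if s < pass_score]
        if not weak_steps:            # every discrimination network is satisfied
            return round_no, scores
    return None, scores               # preset requirement not reached in time

# Toy stand-ins: every step improves by 10 points per repeated operation.
state = {"scores": [90.0, 60.0, 85.0]}
def fake_record():
    return "video"
def fake_score(video):
    state["scores"] = [min(100.0, s + 10.0) for s in state["scores"]]
    return list(state["scores"])

rounds, final = correct_intern(fake_score, fake_record)
```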
In one embodiment, the step S2 of decomposing the three-dimensional motion model according to a predetermined clinical procedure to obtain a plurality of three-dimensional motion submodels includes:
s201: acquiring third position information of the skeleton point of each frame of the three-dimensional action model;
s202: obtaining the moving distance of each bone point in each frame of the three-dimensional action model relative to the previous frame according to the third position information, and summing to obtain the total displacement of the adjacent frames of the medical expert;
s203: judging whether the total displacement of the continuous adjacent frames is larger than a total displacement threshold value;
s204: if yes, taking the frames with the total displacement quantity of the continuous adjacent frames larger than the total displacement quantity threshold value as truncation points;
s205: and decomposing the three-dimensional action model in sequence according to the interception points to obtain a plurality of three-dimensional action submodels.
As described in steps S201 to S205 above, the three-dimensional action sub-models are obtained. Specifically, in actual operation, at the end of a step — for example, at the end of suturing — the medical expert puts down the sterilized tool, so the corresponding bone points move a considerable distance. The total displacement of each frame of the three-dimensional action model relative to the previous, adjacent frame is therefore obtained. Since incidental large movements may also occur while executing a step, it must be judged whether the total displacement of consecutive adjacent frames exceeds the total displacement threshold. The threshold is a preset value that can be derived comprehensively from multiple recordings of video data; a total displacement above it suggests that a step may have ended, and requiring the condition over consecutive adjacent frames confirms it, yielding the truncation points. The three-dimensional action model is then decomposed in sequence at the truncation points into a plurality of three-dimensional action sub-models, realizing automatic decomposition of the three-dimensional action model.
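Steps S201 to S205 can be sketched as below, assuming per-frame bone-point coordinates are available as arrays; the per-frame displacement sum and the consecutive-frame rule follow the text, while the function name and the number of required consecutive frames are illustrative:

```python
import numpy as np

def split_by_displacement(frames, threshold, min_consecutive=2):
    """Cut the action model where the total bone-point displacement stays above
    a threshold for several consecutive adjacent frames.

    frames: (F, N, 3) bone-point coordinates per frame.
    Returns a list of (start, end) frame-index ranges (end exclusive), i.e. the
    three-dimensional action sub-models in sequence.
    """
    f = np.asarray(frames, dtype=float)
    # Total displacement of each frame relative to the previous frame (S202).
    totals = np.linalg.norm(f[1:] - f[:-1], axis=2).sum(axis=1)
    cuts, run = [], 0
    for i, t in enumerate(totals, start=1):
        run = run + 1 if t > threshold else 0    # S203: compare with threshold
        if run == min_consecutive:               # S204: enough consecutive frames
            cuts.append(i)
    bounds = [0] + cuts + [len(f)]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]

# One bone point that jumps between frames 1-3 and then stays still.
segs = split_by_displacement(
    [[[0, 0, 0]], [[0, 0, 0]], [[10, 0, 0]], [[20, 0, 0]], [[20, 0, 0]], [[20, 0, 0]]],
    threshold=5.0,
)
```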
In one embodiment, the step S4 of extracting a local sub-model of a corresponding valid bone point in the three-dimensional motion sub-model includes:
s401: acquiring first position information of the effective bone point in each frame of the three-dimensional action submodel and second position information of the target object;
s402: determining a target space according to the first position information and the second position information, wherein the target space comprises positions corresponding to all the first position information and positions corresponding to the second position information;
s403: and acquiring the local submodel from the target space.
As described in steps S401 to S403 above, the local sub-model is obtained. Since the coordinate values of the valid bone points are obtained in advance, the target space can be determined from the first position information and the second position information. The size of the target space is not limited, but the smallest space containing all valid bone points and the target object is preferably taken as the target space, and the local sub-model is then acquired from the target space.
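The preferred "smallest space containing all valid bone points and the target object" can be sketched as an axis-aligned bounding box; the box shape is an assumption for illustration, since the patent does not restrict the space's geometry:

```python
import numpy as np

def target_space_bounds(valid_points, target):
    """Smallest axis-aligned box containing all valid bone points and the target.

    valid_points: (K, 3) valid bone-point coordinates; target: (3,).
    Returns the (lower, upper) corners of the target space.
    """
    pts = np.vstack([np.asarray(valid_points, dtype=float),
                     np.asarray(target, dtype=float)])
    # Component-wise min/max over all points gives the tightest box.
    return pts.min(axis=0), pts.max(axis=0)

lo, hi = target_space_bounds([[1, 2, 3], [4, 0, 2]], [0, 5, 1])
```

The local sub-model of a frame would then be whatever falls inside this box.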
In one embodiment, the step S2 of constructing a three-dimensional motion model of the medical expert based on the first label and the video of clinical operations includes:
s211: acquiring a two-dimensional picture containing each bone point from each clinical operation video;
s212: setting K aggregation centers in the two-dimensional picture according to a K-means algorithm;
s213: determining a minimum central rectangle according to the k aggregation centers; wherein the k aggregation centers are contained in the minimum center rectangle;
s214: calculating the area of each minimum center rectangle, recording the clinical operation video corresponding to the minimum center rectangle with the minimum area as a main video, and recording the rest clinical operation videos as auxiliary videos;
s215: establishing a space coordinate system based on the main video, and obtaining coordinate points on two coordinate axes of each skeleton point;
s216: obtaining coordinate points on the remaining coordinate axis of the skeleton points according to the auxiliary video, so as to obtain a complete coordinate value of each skeleton point;
s217: and constructing a three-dimensional action model of the medical expert based on each complete coordinate value.
As described in steps S211 to S217 above, the construction of the three-dimensional action model is realized. Since each camera shoots two-dimensional pictures, each frame of two-dimensional picture in the video can be acquired; note that the frames in question describe the pictures shot by the different cameras at the same time point. The picture with the best shooting effect is generally the one in which the bone points are most dispersed, so that the action and position of every bone point can be seen clearly without confusion; the K-means algorithm is therefore used to obtain the aggregation centers. The K-means algorithm proceeds as follows: calculate the distance (generally the Euclidean or cosine distance) from each sample point to each of the K cluster cores (aggregation centers), find the nearest cluster core, and assign the point to the corresponding cluster; once all points are assigned, the M points are divided into K clusters. Then recalculate the center of gravity (average distance center) of each cluster as the new cluster core, and repeat until an abort condition is reached. A main video is then selected, a spatial coordinate system is established based on it, and the coordinate points on the remaining coordinate axis of the bone points are obtained from the auxiliary videos, so as to obtain complete coordinate values of all bone points and thereby construct the three-dimensional action model of the medical expert.
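Given the k aggregation centers per camera (e.g. from K-means), the main-video selection of steps S213 and S214 can be sketched as follows; the function name and input layout are illustrative:

```python
import numpy as np

def choose_main_video(per_video_centers):
    """For each camera's two-dimensional picture, box the k aggregation centers
    with their minimal enclosing axis-aligned rectangle; the video whose
    rectangle has the smallest area is recorded as the main video (S214).

    per_video_centers: list of (k, 2) arrays of aggregation-center coordinates.
    Returns (index_of_main_video, list_of_rectangle_areas).
    """
    areas = []
    for centers in per_video_centers:
        c = np.asarray(centers, dtype=float)
        w, h = c.max(axis=0) - c.min(axis=0)   # side lengths of the minimal rectangle
        areas.append(w * h)
    return int(np.argmin(areas)), areas

main, areas = choose_main_video([
    [[0, 0], [4, 2]],   # rectangle 4 x 2 -> area 8
    [[1, 1], [2, 2]],   # rectangle 1 x 1 -> area 1
])
```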
In one embodiment, the step S2 of constructing a three-dimensional motion model of the medical expert based on the first label and the video of clinical operations includes:
s321: acquiring a first dominant hand of a medical expert;
s322: judging whether the first dominant hand is the same as a preset second dominant hand;
s323: if not, carrying out mirror image processing on the marked clinical operation video to obtain a mirror image operation video;
s324: and constructing a three-dimensional action model of the medical expert according to the mirror image operation video and the first mark.
As described in steps S321 to S324 above, data processing for certain special medical experts is realized, so as to reduce errors in subsequent training. For most people the dominant hand is the right hand, but some medical experts are left-handed; since the number of medical experts is limited, more expert data cannot simply be acquired. The first dominant hand of the medical expert is therefore determined — for example, acquired in advance and uploaded manually — and when the first dominant hand differs from the preset second dominant hand, the marked clinical operation video is mirror-processed to obtain a mirror operation video. The three-dimensional action model of the medical expert is then constructed from the mirror operation video and the first marks, which widens the source of training data and makes up for its insufficiency.
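For marked bone points, the mirror processing of step S323 amounts to flipping the horizontal coordinate about the picture's vertical midline; a minimal sketch (names are illustrative):

```python
import numpy as np

def mirror_bone_points(points_2d, picture_width):
    """Mirror marked bone points of one frame left-right.

    points_2d: (N, 2) marked bone-point coordinates (x, y) in a frame of the
    given width; the vertical coordinate is unchanged.
    """
    p = np.asarray(points_2d, dtype=float).copy()
    p[:, 0] = picture_width - p[:, 0]   # reflect x about the vertical midline
    return p

mirrored = mirror_bone_points([[10.0, 5.0], [90.0, 7.0]], picture_width=100.0)
```

Applied frame by frame to a left-handed expert's marked video, this yields the mirror operation video whose actions match a right-handed preset.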
In one embodiment, the step S11 of inputting the weighted sum of the local submodels of each frame of the local submodels into the corresponding discriminant network input layer, training a preset score corresponding to the clinical operation as the output of the discriminant network, and obtaining a bone correction model after the training is completed further includes:
s1201: acquiring a test set;
s1202: inputting the test video in the test set into the skeleton correction model to obtain a prediction output;
s1203: according to a preset loss function Loss = ℓ(y, y*), calculating the model loss value, where y is the prediction output and y* is the actual output in the test set;
s1204: judging whether the model loss value is smaller than a preset loss value or not;
s1205: if so, the bone correction model is judged to be available for correcting the intern.
As described in steps S1201 to S1205 above, accuracy detection of the bone correction model is realized. A test set is acquired — generally an artificially uploaded one; a test set could also be extracted from the training set, but the resulting loss value carries a larger error, so an artificially uploaded test set is generally adopted. The model loss value Loss = ℓ(y, y*) is then calculated, where y is the prediction output, y* is the actual output in the test set, and Loss is the loss value, and it is judged whether the model loss value is smaller than the preset loss value. If so, the model is considered to have a good recognition effect and can be used; if not, the model's precision is insufficient and training must continue. Detection of the model's precision is thereby realized, the recognition effect of the bone correction model is judged, and the obtained bone correction model is ensured to have good recognition precision.
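The usability check of steps S1203 to S1205 can be sketched as below. The mean-squared-error form of ℓ and the preset loss value are assumptions for illustration, since the patent only names a preset loss function and threshold:

```python
def model_is_usable(predictions, actuals, preset_loss=0.05):
    """Compare the model's predicted outputs on the test set with the actual
    outputs and judge whether the loss is below the preset loss value.

    predictions, actuals: equal-length sequences of score outputs.
    Returns (usable, loss).
    """
    assert len(predictions) == len(actuals)
    # ℓ taken here as mean squared error over the test set (an assumption).
    loss = sum((y - t) ** 2 for y, t in zip(predictions, actuals)) / len(predictions)
    return loss < preset_loss, loss

usable, loss = model_is_usable([0.9, 0.8], [0.92, 0.81])
```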
Referring to fig. 2, the present invention also provides a bone model correction apparatus, including:
the system comprises an acquisition module 10, a display module and a display module, wherein the acquisition module is used for acquiring a plurality of clinical operation videos and marking the bone points of medical experts in the clinical operation videos to obtain a plurality of first marks;
a building module 20, configured to build a three-dimensional motion model of the medical expert according to the first marker and the clinical operation video, and perform step decomposition on the three-dimensional motion model according to a preset clinical step and a time sequence to obtain a plurality of three-dimensional motion sub-models;
a distance obtaining module 30, configured to input the three-dimensional motion sub-model into an initial model, and obtain, in the initial model, a distance from each bone point in the three-dimensional motion sub-model to a target object;
a marking module 40, configured to mark a bone point within a preset distance as an effective bone point, extract a local sub-model of the effective bone point corresponding to the three-dimensional action sub-model, and obtain first position information of the effective bone point in each frame of the local sub-model and second position information of the target object;
a vector calculation module 50 for calculating a first vector of each of the valid bone points to the remaining valid bone points based on the first position information and a second vector of each of the valid bone points to the target object based on the second position information;
a setting module 60, configured to set a spatial attention score of each first vector based on a modulus of the second vector, and extract a same first vector in each frame of the local sub-model to form a first vector set; two effective bone points corresponding to any two first vectors in the first vector set are the same;
a ranking module 70 configured to rank the first vectors in the first set of vectors in chronological order;
a position difference calculating module 80, configured to calculate the position difference between two adjacent first vectors;
A setting module 90, configured to set, according to a softmax function, a position attention score corresponding to an mth first vector in the n +1 th frame local sub-model, and set, as a constant, a position attention score corresponding to the first vector in the first frame local sub-model;
the weighting module 100 is configured to perform weighted summation according to each first vector and the corresponding spatial attention score and the corresponding position attention score to obtain a weighted sum of each local sub-model;
the training module 110 is configured to input the weighted sum of each frame of the local submodel to a corresponding discrimination network input layer, train a preset score corresponding to the clinical operation as an output of the discrimination network, and obtain a bone correction model after the training is completed;
and the correcting module 120 is configured to acquire a clinical operation video of a intern, input the clinical operation video into the bone correction model to obtain a skill score of the intern, and correct the motion of the intern based on the skill score.
In one embodiment, the building module 20 includes:
the third position information acquisition submodule is used for acquiring third position information of the skeleton point of each frame of the three-dimensional action model;
the adjacent frame displacement total quantity obtaining submodule is used for obtaining the moving distance of each bone point in each frame of the three-dimensional action model relative to the previous frame according to the third position information, and summing the moving distances to obtain the adjacent frame displacement total quantity of the medical expert;
the judgment submodule is used for judging whether the total displacement of the continuous adjacent frames is greater than a total displacement threshold value;
the truncation point setting submodule is used for, if so, taking the frames of which the total displacement amounts of the continuous adjacent frames are greater than the total displacement amount threshold value as the truncation points;
and the decomposition submodule is used for sequentially decomposing the three-dimensional action model according to the truncation points to obtain a plurality of three-dimensional action submodels.
In one embodiment, the recording module 40 includes:
the position information acquisition sub-module is used for acquiring first position information of the effective bone point in each frame of the three-dimensional action sub-model and second position information of the target object;
a space determining submodule, configured to determine a target space according to the first location information and the second location information, where the target space includes locations corresponding to all the first location information and locations corresponding to the second location information;
and the local submodel acquisition submodule is used for acquiring the local submodel from the target space.
In one embodiment, the building module 20 includes:
the two-dimensional picture acquisition sub-module is used for acquiring a two-dimensional picture containing each bone point from each clinical operation video;
the aggregation center setting submodule is used for setting K aggregation centers in the two-dimensional picture according to a K-means algorithm;
a minimum center rectangle determining submodule for determining a minimum center rectangle according to the k aggregation centers; wherein the k aggregation centers are contained in the minimum center rectangle;
the area calculation submodule is used for calculating the area of each minimum central rectangle, recording the clinical operation video corresponding to the minimum central rectangle with the minimum area as a main video, and recording the rest clinical operation videos as auxiliary videos;
the coordinate point acquisition submodule is used for establishing a space coordinate system based on the main video and obtaining coordinate points on two coordinate axes of each skeleton point;
the complete coordinate value acquisition submodule is used for obtaining coordinate points on the remaining coordinate axis of the bone points according to the auxiliary video so as to obtain the complete coordinate value of each bone point;
and the three-dimensional action model establishing submodule is used for establishing the three-dimensional action model of the medical expert based on each complete coordinate value.
The invention has the beneficial effects that: the three-dimensional action model of the medical expert is constructed and decomposed to obtain training data of different steps, and a space attention mechanism and a time attention mechanism are added to make the correction result of the model closer to the three-dimensional action model of the medical expert. Therefore, the automation of correcting the clinical operation of the intern is realized, so that the medical expert only needs to improve according to the final correction result, and the time of the medical expert is further saved.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence base technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (10)

1. A bone model modification method, comprising:
collecting a plurality of clinical operation videos, and marking skeleton points of medical experts in the clinical operation videos to obtain a plurality of first marks;
constructing a three-dimensional action model of the medical expert according to the first mark and the clinical operation video, and performing step decomposition on the three-dimensional action model according to a preset clinical step and a time sequence to obtain a plurality of three-dimensional action sub-models;
inputting the three-dimensional action submodel into an initial model, and acquiring the distance from each bone point in the three-dimensional action submodel to a target object in the initial model;
marking skeleton points within a preset distance as effective skeleton points, extracting a local submodel of the effective skeleton points corresponding to the three-dimensional action submodel, and acquiring first position information of the effective skeleton points in each frame of the local submodel and second position information of the target object;
calculating a first vector of each of the valid bone points to remaining valid bone points based on the first location information and a second vector of each of the valid bone points to the target object based on the second location information;
setting the space attention value of each first vector based on the modulus of the second vector, and extracting the same first vector in each frame of the local sub-model to form a first vector set; two effective bone points corresponding to any two first vectors in the first vector set are the same;
arranging first vectors in the first vector set according to a time sequence;
calculating the position difference of two adjacent first vectors;
Setting a position attention score corresponding to an mth first vector in the (n + 1) th frame local sub-model according to a softmax function, and setting the position attention score corresponding to the first vector in the first frame local sub-model as a constant;
carrying out weighted summation according to each first vector and the corresponding space attention value and position attention value to obtain the weighted sum of each local submodel;
inputting the weighted sum of each frame of local submodel of the local submodel into a corresponding judgment network input layer, training by taking a preset score corresponding to the clinical operation as the output of the judgment network, and obtaining a skeleton correction model after the training is finished;
and acquiring a clinical operation video of a intern, inputting the video into the bone correction model to obtain a skill score of the intern, and correcting the action of the intern based on the skill score.
2. The bone model modification method of claim 1, wherein the step of performing step decomposition of the three-dimensional motion model in a time sequence according to a predetermined clinical step to obtain a plurality of three-dimensional motion sub-models comprises:
acquiring third position information of the skeleton point of each frame of the three-dimensional action model;
obtaining the moving distance of each bone point in each frame of the three-dimensional action model relative to the previous frame according to the third position information, and summing to obtain the total displacement of the adjacent frames of the medical expert;
judging whether the total displacement of the continuous adjacent frames is larger than a total displacement threshold value;
if yes, taking the frames of which the total displacement amounts are greater than the total displacement amount threshold value as truncation points;
and decomposing the three-dimensional action model in sequence according to the truncation points to obtain a plurality of three-dimensional action submodels.
3. The bone model modification method of claim 1, wherein the step of extracting a local sub-model of a corresponding valid bone point in the three-dimensional action sub-model comprises:
acquiring first position information of the effective bone point in each frame of the three-dimensional action submodel and second position information of the target object;
determining a target space according to the first position information and the second position information, wherein the target space comprises all positions corresponding to the first position information and positions corresponding to the second position information;
and acquiring the local submodel from the target space.
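The "target space" of claim 3 can be realised, for example, as an axis-aligned bounding box over the effective bone-point positions and the target-object position. The claim does not prescribe the shape of the space, so the box is an assumption.

```python
def target_space(first_positions, second_position):
    """Axis-aligned 3-D bounding box covering every effective bone-point
    position (first_positions) plus the target object (second_position).

    Returns (lower_corner, upper_corner) tuples.
    """
    pts = list(first_positions) + [second_position]
    lo = tuple(min(p[i] for p in pts) for i in range(3))
    hi = tuple(max(p[i] for p in pts) for i in range(3))
    return lo, hi
```

The local sub-model would then be extracted as the portion of each frame falling inside this box.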
4. The bone model revision method of claim 1, wherein said step of constructing a three-dimensional motion model of said medical professional based on said first markers and said video of clinical operations comprises:
acquiring a two-dimensional picture containing each bone point from each clinical operation video;
setting k aggregation centers in the two-dimensional picture according to a k-means algorithm;
determining a minimum center rectangle from the k aggregation centers, wherein the k aggregation centers are contained in the minimum center rectangle;
calculating the area of each minimum center rectangle, recording the clinical operation video corresponding to the minimum center rectangle with the minimum area as a main video, and recording the rest clinical operation videos as auxiliary videos;
establishing a space coordinate system based on the main video, and obtaining coordinate points on two coordinate axes of each skeleton point;
obtaining coordinate points on the remaining coordinate axis of the skeleton points according to the auxiliary video, so as to obtain a complete coordinate value of each skeleton point;
and constructing a three-dimensional action model of the medical expert based on each complete coordinate value.
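The main-video selection of claim 4 can be sketched as below, assuming the k aggregation centers have already been computed for each video (e.g. by a k-means routine). Treating the minimum center rectangle as axis-aligned is an assumption the claim does not state explicitly.

```python
def min_center_rectangle_area(centers):
    """Area of the smallest axis-aligned rectangle containing all
    k aggregation centers (2-D points)."""
    xs = [c[0] for c in centers]
    ys = [c[1] for c in centers]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

def pick_main_video(centers_per_video):
    """Index of the video whose aggregation centers are most tightly
    grouped, i.e. whose minimum center rectangle has the smallest area.
    The remaining videos would serve as auxiliary videos."""
    areas = [min_center_rectangle_area(c) for c in centers_per_video]
    return areas.index(min(areas))
```

The main video fixes two coordinate axes of each bone point; the auxiliary videos supply the remaining axis.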
5. The bone model revision method of claim 1, wherein the step of constructing a three-dimensional motion model of the medical professional based on the first markers and the video of clinical operations comprises:
acquiring a first dominant hand of the medical expert;
judging whether the first dominant hand is the same as a preset second dominant hand;
if not, mirroring the marked clinical operation video to obtain a mirror-image operation video;
and constructing a three-dimensional action model of the medical expert according to the mirror image operation video and the first mark.
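The mirror processing of claim 5 amounts to reflecting the image about its vertical centre line. A minimal sketch on 2-D bone-point coordinates (the claim mirrors whole video frames, so operating on marked points is a simplification):

```python
def mirror_bone_points(points, frame_width):
    """Mirror 2-D bone points about the vertical centre line of a frame
    of width `frame_width`, mapping a left-handed operation onto the
    preset (e.g. right-handed) reference orientation."""
    return [(frame_width - x, y) for (x, y) in points]
```

Applying this to every marked frame yields the mirror-image operation video's bone-point track.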
6. The bone model modification method of claim 1, wherein after the step of inputting the weighted sum of each frame of the local sub-model into the corresponding input layer of a discriminant network, training with the preset score corresponding to the clinical operation as the output of the discriminant network, and obtaining the bone correction model when training is complete, the method further comprises:
acquiring a test set;
inputting the test video in the test set into the skeleton correction model to obtain a prediction output;
calculating a model loss value according to a loss formula (given only as an image in the source), where y is the prediction output and the corresponding actual output is taken from the test set;
judging whether the model loss value is smaller than a preset loss value or not;
if so, judging that the bone correction model is available for correcting interns.
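Claim 6 gives the loss formula only as an image, so its exact form is not recoverable from the text. A sketch of the availability check, using mean squared error as an assumed stand-in loss:

```python
def model_available(predictions, actuals, loss_threshold):
    """True when the model's loss on the test set is below the preset
    threshold, i.e. the bone correction model may be used on interns.

    Mean squared error is an assumption; the original loss formula
    appears only as an image in the source.
    """
    n = len(predictions)
    loss = sum((y - t) ** 2 for y, t in zip(predictions, actuals)) / n
    return loss < loss_threshold
```

A well-fitted model passes the check; a badly mismatched one does not.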
7. A bone model revision device, comprising:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a plurality of clinical operation videos and marking bone points of medical experts in the clinical operation videos to obtain a plurality of first marks;
the construction module is used for constructing a three-dimensional action model of the medical expert according to the first mark and the clinical operation video, and performing step decomposition on the three-dimensional action model according to a preset clinical step and a time sequence to obtain a plurality of three-dimensional action submodels;
the distance acquisition module is used for inputting the three-dimensional action submodel into an initial model and acquiring the distance from each bone point in the three-dimensional action submodel to a target object in the initial model;
a marking module, configured to mark a bone point within a preset distance as an effective bone point, extract a local sub-model of the effective bone point corresponding to the three-dimensional action sub-model, and obtain first position information of the effective bone point in each frame of the local sub-model and second position information of the target object;
a vector calculation module for calculating a first vector of each of the valid bone points to the remaining valid bone points based on the first location information and a second vector of each of the valid bone points to the target object based on the second location information;
the setting module is used for setting a spatial attention score for each first vector based on the modulus of the second vector, extracting the same first vector from each frame of the local sub-model, and forming a first vector set; wherein the two effective bone points corresponding to any two first vectors in the first vector set are the same;
the arrangement module is used for arranging the first vectors in the first vector set according to the time sequence;
the position difference calculating module is used for calculating the position difference between two adjacent first vectors (the symbol for the position difference appears only as an image in the source);
the setting module is used for setting, according to a softmax function, the position attention score corresponding to the mth first vector in the (n+1)th frame of the local sub-model, and setting the position attention score corresponding to the first vector in the first frame of the local sub-model to a constant;
the weighting module is used for performing weighted summation of each first vector with its corresponding spatial attention score and position attention score to obtain the weighted sum of each frame of the local sub-model;
the training module is used for inputting the weighted sum of each frame of the local sub-model into the corresponding input layer of a discriminant network, training with the preset score corresponding to the clinical operation as the output of the discriminant network, and obtaining a bone correction model when training is complete;
and the correction module is used for acquiring a clinical operation video of an intern, inputting the clinical operation video into the bone correction model to obtain the intern's skill score, and correcting the intern's actions based on the skill score.
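The softmax-based position attention used by the setting module can be sketched as follows. Interpreting the softmax as running over the per-frame position differences of one first vector, with the first frame pinned to a constant score, is a reading of the claim rather than a quoted formula.

```python
import math

def position_attention_scores(position_diffs, first_frame_score=1.0):
    """Softmax over the position differences of the same first vector
    across consecutive frames: larger inter-frame movement receives a
    larger attention score. The first frame, which has no predecessor,
    is assigned a constant score as stated in the claim."""
    exps = [math.exp(d) for d in position_diffs]
    total = sum(exps)
    return [first_frame_score] + [e / total for e in exps]
```

With equal position differences the later frames share weight evenly.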
8. The bone model revision device of claim 7, wherein the construction module comprises:
the third position information acquisition submodule is used for acquiring third position information of the skeleton point of each frame of the three-dimensional action model;
the adjacent frame displacement total quantity obtaining submodule is used for obtaining the moving distance of each bone point in each frame of the three-dimensional action model relative to the previous frame according to the third position information, and summing the moving distances to obtain the adjacent frame displacement total quantity of the medical expert;
the judgment submodule is used for judging whether the total displacement of consecutive adjacent frames is greater than a total displacement threshold;
the truncation point setting submodule is used for, if so, taking each frame whose total displacement is greater than the total displacement threshold as a truncation point;
and the decomposition submodule is used for sequentially decomposing the three-dimensional action model at the truncation points to obtain a plurality of three-dimensional action sub-models.
9. The bone model revision device of claim 7, wherein the marking module comprises:
the position information acquisition submodule is used for acquiring first position information of the effective bone point in each frame of the three-dimensional action sub-model and second position information of the target object;
the space determining submodule is used for determining a target space according to the first position information and the second position information, and the target space comprises positions corresponding to all the first position information and positions corresponding to the second position information;
and the local submodel acquisition submodule is used for acquiring the local submodel from the target space.
10. The bone model revision device of claim 7, wherein the construction module comprises:
the two-dimensional picture acquisition sub-module is used for acquiring a two-dimensional picture containing each bone point from each clinical operation video;
the aggregation center setting submodule is used for setting K aggregation centers in the two-dimensional picture according to a K-means algorithm;
a minimum center rectangle determining submodule for determining a minimum center rectangle according to the k aggregation centers; wherein the k aggregation centers are contained in the minimum center rectangle;
the area calculation submodule is used for calculating the area of each minimum central rectangle, recording the clinical operation video corresponding to the minimum central rectangle with the minimum area as a main video, and recording the rest clinical operation videos as auxiliary videos;
the coordinate point acquisition submodule is used for establishing a space coordinate system based on the main video and obtaining coordinate points on two coordinate axes of each skeleton point;
the complete coordinate value acquisition submodule is used for obtaining coordinate points on the remaining coordinate axis of the bone points according to the auxiliary video so as to obtain the complete coordinate value of each bone point;
and the three-dimensional action model establishing submodule is used for establishing the three-dimensional action model of the medical expert based on each complete coordinate value.
CN202210568530.0A 2022-05-24 2022-05-24 Skeleton model correction method and device Active CN114663432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210568530.0A CN114663432B (en) 2022-05-24 2022-05-24 Skeleton model correction method and device


Publications (2)

Publication Number Publication Date
CN114663432A CN114663432A (en) 2022-06-24
CN114663432B true CN114663432B (en) 2022-08-16

Family

ID=82036572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210568530.0A Active CN114663432B (en) 2022-05-24 2022-05-24 Skeleton model correction method and device

Country Status (1)

Country Link
CN (1) CN114663432B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0554596A1 (en) * 1992-02-07 1993-08-11 International Business Machines Corporation Interactive multidimensional data interpretation tool
CN111639531A (en) * 2020-04-24 2020-09-08 中国人民解放军总医院 Medical model interaction visualization method and system based on gesture recognition
CN112370161A (en) * 2020-10-12 2021-02-19 珠海横乐医学科技有限公司 Operation navigation method and medium based on ultrasonic image characteristic plane detection
CN114093488A (en) * 2022-01-20 2022-02-25 武汉泰乐奇信息科技有限公司 Doctor skill level judging method and device based on bone recognition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9678662B2 (en) * 2010-04-23 2017-06-13 Handscape Inc. Method for detecting user gestures from alternative touchpads of a handheld computerized device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Deep Neural Networks Using Capsule Networks and Skeleton-Based Attentions for Action Recognition;Manh-Hung Ha等;《IEEE Access》;20210101;第9卷;6164-6178 *
Multi-modal sign language recognition fusing an attention mechanism and connectionist temporal classification; Wang Jun et al.; Journal of Signal Processing; 20200930; Vol. 36, No. 9; 1429-1439 *

Also Published As

Publication number Publication date
CN114663432A (en) 2022-06-24

Similar Documents

Publication Publication Date Title
CN112184705B (en) Human body acupuncture point identification, positioning and application system based on computer vision technology
US12062297B2 (en) System and method for three-dimensional augmented reality guidance for use of equipment
KR102014377B1 (en) Method and apparatus for surgical action recognition based on learning
JPWO2020056086A5 (en)
CN111222486B (en) Training method, device and equipment for hand gesture recognition model and storage medium
CN109806004A (en) A kind of surgical robot system and operating method based on cloud data technique
CN111652974A (en) Method, device and equipment for constructing three-dimensional face model and storage medium
CN114663432B (en) Skeleton model correction method and device
CN113066111A (en) Automatic positioning method for cardiac mitral valve vertex based on CT image
CN112464895A (en) Posture recognition model training method and device, posture recognition method and terminal equipment
Peng et al. Single shot state detection in simulation-based laparoscopy training
Chen et al. Visual modelling and evaluation of surgical skill
CN112099330B (en) Holographic human body reconstruction method based on external camera and wearable display control equipment
CN114886459A (en) System, method, device, equipment and medium for collecting ultrasonic operation manipulation data
CN112699879A (en) Attention-guided real-time minimally invasive surgical tool detection method and system
CN113171118A (en) Ultrasonic inspection operation guiding method based on generating type countermeasure network
CN117423166B (en) Motion recognition method and system according to human body posture image data
Weede et al. Movement analysis for surgical skill assessment and measurement of ergonomic conditions
CN117894057B (en) Three-dimensional digital face processing method and device for emotion disorder auxiliary diagnosis
Wadhwa et al. Yoga Posture Analysis using Deep Learning
CN114305471A (en) Processing method and device for determining posture and pose, surgical system, surgical equipment and medium
Painuly Investigation Of Motion Analysis Techniques For Animation Evaluation And Improvement
CN116993981A (en) Target object segmentation method, device, electronic equipment and storage medium
CN117173242A (en) Method, device, equipment and medium for delivering cooperative objects of human and robot
CN112906653A (en) Multi-person interactive exercise training and evaluation system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Skeletal model correction methods and devices

Effective date of registration: 20230914

Granted publication date: 20220816

Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.

Pledgor: WUHAN TALENT INFORMATION TECHNOLOGY CO.,LTD.

Registration number: Y2023980056665