CN113742527A - Method and system for retrieving and extracting operation video clips based on artificial intelligence - Google Patents
- Publication number
- CN113742527A (application CN202111310650.2A)
- Authority
- CN
- China
- Prior art keywords
- video
- pictures
- identification model
- extracting
- operation stage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F16/00—Information retrieval; Database structures therefor; File system structures therefor; G06F16/70—Information retrieval of video data
- G06F16/7867—Retrieval characterised by using metadata generated manually, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
- G06F16/74—Browsing; Visualisation therefor
- G06F16/784—Retrieval characterised by using metadata automatically derived from the content, the detected or recognised objects being people
- G06F16/7844—Retrieval using original textual content or text extracted from visual content or transcript of audio data
- G—PHYSICS; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/02—Neural networks
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/08—Learning methods
Abstract
The invention relates to a method and system for retrieving and extracting surgical video clips based on artificial intelligence. The method comprises: dividing a video into clips and inputting pictures extracted at equal intervals from each clip into a surgical-stage recognition model and a surgical-event recognition model for identification; mapping the recognition results for the surgical stage and the surgical event from the start and stop times of each clip to the corresponding start and stop times in the complete video, and storing the recognition results with their corresponding times in the video retrieval and video extraction system; and loading the recognition results and the corresponding time data in the video retrieval and video extraction system and displaying them on the progress bar of the video playing system. By identifying the surgical stages and surgical events in a surgical video and marking the corresponding time points on the playback progress bar, the invention lets medical personnel quickly locate the surgical stage or surgical event they need to attend to while watching a surgical video, greatly saving time and improving learning efficiency.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a system for retrieving and extracting operation video clips based on artificial intelligence.
Background
According to China's annual statistical reports, between 61.7158 million and 69.3044 million surgical operations of various kinds are performed. As public demands on medical quality grow, the demand for skilled surgical practitioners also expands. On the one hand, medical students need to learn surgical skills and the handling of intraoperative emergencies as quickly as possible; on the other hand, to improve surgical quality, surgeons repeatedly review their own or others' surgical videos to refine their operative technique.
However, watching full-length surgical videos costs doctors a great deal of time, and prolonged viewing reduces the attention of doctors and students, so important steps or operative moments of interest may be missed. This is especially true of ultra-long surgical videos (for example, a laparoscopic pancreaticoduodenectomy commonly lasts 6-8 hours), which place a heavy burden on doctors reviewing them and make learning extremely inefficient. Because the occurrence of a specific operative procedure, a particular surgical event, or a given class of surgical events cannot be located quickly, viewers cannot jump directly to the times of interest for focused browsing. In addition, for management or business-application needs, hospitals and administrative bodies must extract key information from the surgical process, that is, automatically extract key video clips or pictures, for teaching and research, knowledge-base construction, safety-behavior evaluation samples, and the selection of samples for evaluating the lead surgeon's technical competence. Existing technology cannot meet these requirements.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method and system for retrieving and extracting surgical video clips based on artificial intelligence, solving the problems medical personnel currently encounter when watching surgical videos.
The purpose of the invention is realized by the following technical scheme: a method for retrieving and extracting surgical video clips based on artificial intelligence, the method comprising:
dividing a video into a plurality of video clips, extracting a plurality of pictures from each clip at equal intervals, inputting the pictures into a surgical-stage recognition model and a surgical-event recognition model, and, after image feature extraction, respectively identifying the surgical stage and the surgical event in the pictures;
mapping the recognition results for the surgical stage and the surgical event from the start and stop times of each video clip to the corresponding start and stop times in the complete video, and storing the recognition results together with their corresponding times in the video retrieval system and the video extraction system;
and loading the recognition results and the corresponding time data in the video retrieval system and the video extraction system, displaying them on the progress bar of the video playing system, and marking the time periods in which each surgical stage and surgical event occurs.
The method further comprises extracting the corresponding videos or pictures, according to the occurrence times of the surgical events and surgical stages in the video, to build a surgical knowledge base, so that doctors or experts can browse it quickly for safety evaluation and evaluation of the lead surgeon's skill.
The step of dividing a video into a plurality of video clips, extracting a plurality of pictures from each clip at equal intervals, inputting the pictures into the surgical-stage recognition model and the surgical-event recognition model, and respectively identifying the surgical stage and the surgical event in the pictures after image feature extraction comprises:
dividing a video into a plurality of video clips, extracting N pictures from each clip at equal intervals, and storing the pictures as a four-dimensional tensor in (N, C, H, W) format, where N represents the number of frames extracted from each video clip, C the number of channels of each picture, H the height of the picture, and W the width of the picture;
feeding the pictures represented by the four-dimensional tensor into a ResNet network composed of a number of 2D convolutions, ReLU activation layers, batch normalization layers, and a fully connected layer to extract image features, stored in the format (M, S), where M represents the number of input pictures and S the length of the preset feature vector;
and feeding the image feature vectors of format (M, S) into an LSTM network over M time steps to identify the surgical stage and the surgical event across consecutive video frames.
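The (N, C, H, W) tensor → ResNet features (M, S) → LSTM data flow described above can be sketched in PyTorch. This is an illustrative sketch, not the patent's implementation: a small conv + batch norm + ReLU + fully connected stack stands in for the full ResNet backbone, and the hidden size and class count are assumed values.

```python
import torch
import torch.nn as nn

N, C, H, W = 8, 3, 224, 224   # frames per clip, channels, height, width
S = 128                        # preset feature-vector length
NUM_STAGES = 5                 # assumed number of surgical stages

# Stand-in feature extractor built from the layer types the patent lists:
# 2D convolution, batch normalization, ReLU, fully connected layer.
features = nn.Sequential(
    nn.Conv2d(C, 16, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, S),
)
lstm = nn.LSTM(input_size=S, hidden_size=64, batch_first=True)
stage_head = nn.Linear(64, NUM_STAGES)  # per-frame surgical-stage logits

clip = torch.randn(N, C, H, W)          # the four-dimensional tensor
feats = features(clip)                  # (M, S) feature matrix, M == N
out, _ = lstm(feats.unsqueeze(0))       # feed the M vectors as one sequence
logits = stage_head(out.squeeze(0))     # (M, NUM_STAGES) stage predictions
print(feats.shape, logits.shape)
```

A second, identically shaped head would give the surgical-event predictions in the same way.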
The video retrieval system and the video extraction system are constructed by the following method:
using the surgical-stage recognition model and the surgical-event recognition model to perform deep-learning inference on the input pictures, obtaining surgical-stage information and surgical-event information;
displaying the surgical stage and the occurrence period of each surgical event on a progress bar for the user through customized video playing software, thereby constructing the video retrieval system;
and using customized extraction software, based on the start and stop times of the key operative procedures identified by the surgical-stage recognition model and the surgical-event recognition model, to extract the corresponding video clips from the video or to extract still pictures at a given frame rate, thereby constructing the video extraction system.
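Clip and still-picture extraction by start/stop time is typically done by driving ffmpeg. A minimal sketch of the command lines such extraction software could issue is below; the file names, time stamps, and frame rate are placeholders, not values from the patent.

```python
def clip_command(src: str, start: str, end: str, dst: str) -> list[str]:
    """ffmpeg argv to copy out the clip between start and end without re-encoding."""
    return ["ffmpeg", "-ss", start, "-to", end, "-i", src, "-c", "copy", dst]

def stills_command(src: str, start: str, end: str, fps: int, pattern: str) -> list[str]:
    """ffmpeg argv to dump still pictures at a given frame rate from the same span."""
    return ["ffmpeg", "-ss", start, "-to", end, "-i", src,
            "-vf", f"fps={fps}", pattern]

# e.g. extract a recognised stage as a clip, and one picture per second
print(clip_command("surgery.mp4", "01:20:00", "01:47:30", "stage_clip.mp4"))
print(stills_command("surgery.mp4", "01:20:00", "01:47:30", 1, "frame_%05d.jpg"))
```

The argv lists would be passed to `subprocess.run`; stream copy (`-c copy`) keeps clip extraction fast because no re-encoding is performed.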
The method also comprises the step of constructing an operation stage identification model and an operation event identification model before dividing the video; the construction steps of the surgery stage identification model and the surgery event identification model comprise:
establishing a theoretical model of surgical stages and a theoretical model of surgical events according to expert experience, guidelines, and theory, and dividing the collected surgical videos along the stage and event boundaries these theoretical models define;
collecting a large amount of video data as pictures that meet the resolution requirement, and labeling the collected pictures with surgical-stage time periods and surgical-event time periods;
and randomly dividing the labeled surgical-stage and surgical-event data into a training set, a validation set, and a test set in the corresponding proportions, and training, validating, and testing on the pictures of these sets through a ResNet network and an LSTM network to complete the construction of the surgical-stage recognition model and the surgical-event recognition model.
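The random split into training, validation, and test sets can be sketched as follows. The 8:1:1 ratio from the embodiment is used as the default, and the fixed seed is an illustrative choice for reproducibility:

```python
import random

def split_dataset(items, ratios=(8, 1, 1), seed=42):
    """Shuffle labeled samples and split them into train/validation/test sets."""
    items = list(items)
    rng = random.Random(seed)
    rng.shuffle(items)                     # random assignment, as the patent requires
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # → 800 100 100
```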
Collecting a large amount of video data as pictures that meet the resolution requirement, and labeling the collected pictures with surgical-stage and surgical-event time periods, comprises:
collecting surgical video data such that each video's resolution is not lower than a preset value and its rate is not lower than a preset number of frames per second, and storing it in picture form;
uniformly transcoding the collected videos into the same format with the ffmpeg software, and completing preliminary labeling of the surgical-stage and surgical-event time periods with the annotation software Anvil;
and having professionals manually annotate the preliminarily labeled video data, correcting any pictures whose preliminary labels are unqualified to obtain qualified annotations.
Training, validating, and testing on the training-, validation-, and test-set pictures through the ResNet network and the LSTM network to complete the construction of the surgical-stage recognition model and the surgical-event recognition model comprises:
storing the pictures of the training, validation, and test sets as four-dimensional tensors in (N, C, H, W) format, where N represents the number of frames of each video clip, C the number of channels of each picture, H the height of the picture, and W the width of the picture;
feeding the pictures represented by the four-dimensional tensor into a ResNet network composed of a number of 2D convolutions, ReLU activation layers, batch normalization layers, and a fully connected layer to extract image features, stored in the format (M, S);
feeding the image feature vectors of format (M, S) into an LSTM network over M time steps, identifying the surgical stage and the surgical event across consecutive video frames and recording their start and end times;
and putting the recognition results into a cross-entropy loss function to compute the loss, updating the model parameters by gradient descent, and thereby constructing the surgical-stage recognition model and the surgical-event recognition model.
A system for retrieving and extracting operation video clips based on artificial intelligence comprises an identification module, a video retrieving and extracting module and a video playing module;
the identification module is used for dividing the video into a plurality of video clips, extracting a plurality of pictures from each clip at equal intervals, inputting the pictures into the surgical-stage recognition model and the surgical-event recognition model, and respectively identifying the surgical stage and the surgical event in the pictures after image feature extraction;
the video retrieval and extraction module is used for mapping the recognition results for the surgical stage and the surgical event from the start and stop times of each video clip to the corresponding start and stop times in the complete video, and then storing the recognition results with their corresponding times in the video retrieval and extraction unit;
the video playing module is used for loading the recognition results and the corresponding time data in the video retrieval system and the video extraction system, displaying them on the progress bar of the video playing system, and marking the surgical stage and the time periods in which surgical events occur.
The system further comprises a construction module, wherein the construction module is used for constructing the video retrieval and extraction unit, the operation stage identification model and the operation event identification model.
The system further comprises a video collection and labeling module, which is used for collecting a large amount of video data as pictures that meet the resolution requirement and labeling the collected pictures with surgical-stage and surgical-event time periods.
The invention has the following advantages: in this method and system for retrieving and extracting surgical video clips based on artificial intelligence, the surgical events and associated surgical stages in a surgical video are identified and the corresponding time points are displayed on the playback progress bar, so that medical personnel can quickly locate the surgical stage or surgical event they need to attend to while watching a surgical video, greatly reducing viewing time and improving learning efficiency; at the same time, the surgical knowledge base built from the results serves medical students and surgeons, yielding considerable social and economic benefit.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
fig. 2 is a schematic diagram of a display effect of a progress bar in a video playing system.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application provided below in connection with the appended drawings is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application. The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention relates to a method for retrieving and extracting a surgical video clip based on artificial intelligence, which specifically includes the following steps:
S1, constructing a surgical-stage recognition model and a surgical-event recognition model, specifically comprising:
S11, establishing a theoretical model of surgical stages and a theoretical model of surgical events according to expert experience, guidelines, and theory, and dividing the collected surgical videos along the stage and event boundaries these theoretical models define;
S12, collecting a large amount of video data as pictures that meet the resolution requirement, and labeling the collected pictures with surgical-stage and surgical-event time periods, specifically:
collecting surgical video data such that each video's resolution is not lower than 720 x 560 at not fewer than 21 frames per second, and storing it in picture form;
uniformly transcoding the collected videos into the same MPEG-4 format with the ffmpeg software, and completing preliminary labeling of the surgical-stage and surgical-event time periods with the Anvil Video Annotation Research Tool;
and having six qualified surgical specialists, trained in advance and responsible for quality control, manually annotate the preliminarily labeled video data, correcting pictures whose preliminary labels are unqualified to obtain qualified annotations.
S13, randomly dividing the labeled surgical-stage and surgical-event data into a training set, a validation set, and a test set in the ratio 8:1:1, and training, validating, and testing on the pictures of these sets through a ResNet network and an LSTM network to complete the construction of the surgical-stage recognition model and the surgical-event recognition model.
All models are developed on the Anaconda and Qt Creator platforms, and image processing uses an NVIDIA Tesla V100 graphics processor.
S2, dividing the video into a plurality of video clips, extracting a plurality of pictures from each clip at equal intervals, inputting the pictures into the surgical-stage recognition model and the surgical-event recognition model, and respectively identifying the surgical stage and the surgical event in the pictures after image feature extraction;
further, it specifically includes:
S21, dividing a video into a plurality of video clips, extracting N pictures from each clip at equal intervals, and storing the pictures as a four-dimensional tensor in (N, C, H, W) format, where N represents the number of frames extracted from each video clip, C the number of channels of each picture, H the height of the picture, and W the width of the picture;
S22, feeding the pictures represented by the four-dimensional tensor into a ResNet network composed of a number of 2D convolutions, ReLU activation layers, batch normalization layers, and a fully connected layer to extract image features, stored in the format (M, S), where M represents the number of input pictures and S the length of the preset feature vector;
further, the convolution layer or the full link layer has a calculation formula as follows:where y represents the computational output, n represents the number of neurons,represents the weight of the ith neuron,representing input data of the ith neuron, b adding an offset to the result of the computation, when convolution is performedAndis a two-dimensional matrix when calculated for full connectivityAndis a one-dimensional vector.
The batch normalization layer is computed as

$$\mathrm{BN}(x) = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta$$

where BN represents the batch-normalized output, $x$ the input data, $\mathrm{E}[x]$ the mean of the $x$ tensor, $\mathrm{Var}[x]$ the variance of the $x$ tensor, $\epsilon$ a very small parameter ensuring the denominator is not 0, and $\gamma$ and $\beta$ learnable coefficients.
The ReLU activation function is

$$\mathrm{ReLU}(z) = \max(0, z)$$

where ReLU represents the computed output, $z$ the input tensor, and $\max()$ takes the larger of its arguments.
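The batch-normalization and ReLU computations described above can be checked numerically. In this sketch the choices γ = 1, β = 0 and the ε value are illustrative defaults:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """BN(x) = (x - E[x]) / sqrt(Var[x] + eps) * gamma + beta."""
    return (x - x.mean()) / np.sqrt(x.var() + eps) * gamma + beta

def relu(z):
    """ReLU(z) = max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

x = np.array([1.0, 2.0, 3.0, 4.0])
bn = batch_norm(x)
print(bn)                              # normalised to (near) zero mean, unit variance
print(relu(np.array([-2.0, 0.5])))     # negative inputs clamped to zero
```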
S23, feeding the image feature vectors of format (M, S) into the LSTM network over M time steps to identify the surgical stage and the surgical event across consecutive video frames. The LSTM network is composed of a number of cells; it forgets information too far removed from the current time and updates its current state as each new video frame is input, completing the recognition.
Further, one cell of the LSTM network is computed as

$$i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})$$
$$f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf})$$
$$g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg})$$
$$o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho})$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$
$$h_t = o_t \odot \tanh(c_t)$$

where $h_t$ represents the hidden state at time t; $c_t$ the LSTM cell state computed at time t; $x_t$ the input data at time t; $h_{t-1}$ the hidden state at time t-1; $i_t$ the result of the input operation; $f_t$ the result of the forget operation; $g_t$ the result of the candidate-output operation; $o_t$ the output of the temporal-information fusion operation on the cell; $W_{ii}$, $W_{hi}$, $b_{ii}$, $b_{hi}$ the weight parameters and biases for extracting features of the input data in the LSTM cell; $W_{if}$, $W_{hf}$, $b_{if}$, $b_{hf}$ the weight parameters and biases of the forget-gate computation; $W_{ig}$, $W_{hg}$, $b_{ig}$, $b_{hg}$ the weight parameters and biases of the candidate computation; $W_{io}$, $W_{ho}$, $b_{io}$, $b_{ho}$ the weight parameters and biases of the output-gate computation; $\sigma()$ the sigmoid function; and $\odot$ the element-wise (Hadamard) product.
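A single LSTM cell implementing the recurrence described above can be written directly in NumPy. The hidden size, input size, and random initialization here are illustrative, and `*` between vectors plays the role of the element-wise product:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, p):
    """One step of the LSTM recurrence; p maps gate names to weights/biases."""
    i_t = sigmoid(p["W_ii"] @ x_t + p["b_ii"] + p["W_hi"] @ h_prev + p["b_hi"])
    f_t = sigmoid(p["W_if"] @ x_t + p["b_if"] + p["W_hf"] @ h_prev + p["b_hf"])
    g_t = np.tanh(p["W_ig"] @ x_t + p["b_ig"] + p["W_hg"] @ h_prev + p["b_hg"])
    o_t = sigmoid(p["W_io"] @ x_t + p["b_io"] + p["W_ho"] @ h_prev + p["b_ho"])
    c_t = f_t * c_prev + i_t * g_t        # element-wise (Hadamard) products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

S_len, Hn = 4, 3  # input feature length and hidden size (illustrative)
rng = np.random.default_rng(0)
p = {}
for gate in ("i", "f", "g", "o"):
    p[f"W_i{gate}"] = rng.normal(size=(Hn, S_len)) * 0.1
    p[f"W_h{gate}"] = rng.normal(size=(Hn, Hn)) * 0.1
    p[f"b_i{gate}"] = np.zeros(Hn)
    p[f"b_h{gate}"] = np.zeros(Hn)

h, c = lstm_cell(rng.normal(size=S_len), np.zeros(Hn), np.zeros(Hn), p)
print(h.shape, c.shape)
```

With all-zero input, hidden state, cell state, and biases, the candidate $g_t$ is $\tanh(0) = 0$, so both the new cell state and hidden state remain zero, which is a quick sanity check of the equations.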
S3, mapping the recognition results for the surgical stage and the surgical event from the start and stop times of each video clip to the corresponding start and stop times in the complete video, and storing the recognition results with their corresponding times in the video retrieval system and the video extraction system;
S4, loading the recognition results and the corresponding time data in the video retrieval system and the video extraction system, displaying them on the progress bar of the video playing system, and marking the surgical stage and the time periods in which surgical events occur.
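The clip-to-full-video time mapping of step S3 amounts to shifting each clip-relative interval by the clip's offset in the complete video. A minimal sketch, with illustrative result tuples and labels:

```python
def to_video_time(clip_start: float, clip_results: list[tuple[float, float, str]]):
    """Shift recognition results from clip-relative times to full-video times.

    clip_start: offset (in seconds) of the clip within the complete video
    clip_results: (start, end, label) tuples relative to the clip
    """
    return [(clip_start + s, clip_start + e, label) for s, e, label in clip_results]

# A clip beginning 300 s (05:00) into the full video
print(to_video_time(300.0, [(12.0, 48.0, "bleeding event")]))
# → [(312.0, 348.0, 'bleeding event')]
```

The shifted intervals are what the retrieval and extraction systems store and what the player marks on its progress bars.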
As shown in fig. 2, surgical-stage and surgical-event information is presented on the playback time axis so that the surgeon can quickly retrieve the video clip to watch; three progress bars beneath the playback time axis are aligned with it, and if an event occurs, or a given stage is under way, during some period, the corresponding surgical stage and surgical event are marked at the corresponding time positions.
Meanwhile, according to the occurrence times of the surgical events and surgical stages in the video, the corresponding videos or pictures are extracted with the ffmpeg software to build a surgical knowledge base, so that doctors or experts can browse it quickly for safety evaluation and evaluation of the lead surgeon's skill.
Further, the video retrieval system and the video extraction system are constructed by the following method:
using the surgical-stage recognition model and the surgical-event recognition model to perform deep-learning inference on the input pictures, obtaining surgical-stage information and surgical-event information;
displaying the surgical stage and the occurrence period of each surgical event on a progress bar for the user through customized video playing software, thereby constructing the video retrieval system;
and using customized extraction software, based on the start and stop times of the key operative procedures identified by the surgical-stage recognition model and the surgical-event recognition model, to extract the corresponding video clips from the video or to extract still pictures at a given frame rate, thereby constructing the video extraction system.
Further, training, validating, and testing on the training-, validation-, and test-set pictures through the ResNet network and the LSTM network to complete the construction of the surgical-stage recognition model and the surgical-event recognition model comprises:
storing the pictures of the training, validation, and test sets as four-dimensional tensors in (N, C, H, W) format, where N represents the number of frames of each video clip, C the number of channels of each picture, H the height of the picture, and W the width of the picture;
feeding the pictures represented by the four-dimensional tensor into a ResNet network composed of a number of 2D convolutions, ReLU activation layers, batch normalization layers, and a fully connected layer to extract image features, stored in the format (M, S);
and feeding the image feature vectors of format (M, S) into an LSTM network over M time steps to identify the surgical stage and the surgical event across consecutive video frames. The LSTM network is composed of a number of cells; it forgets information too far removed from the current time and updates its current state as each new video frame is input, completing the recognition.
The recognition results are put into the cross-entropy loss function

$$\mathrm{CELoss} = -\sum_{c=1}^{M} y_{ic} \log(p_{ic})$$

to compute the loss, and the model parameters are updated by gradient descent, thereby constructing the surgical-stage recognition model and the surgical-event recognition model. Here CELoss represents the computed output and M the number of classes; $y_{ic}$ is an indicator variable (0 or 1) equal to 1 if class c is the true class of observed sample i and 0 otherwise; $p_{ic}$ is the predicted probability that observed sample i belongs to class c.
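The cross-entropy loss described above can be evaluated numerically; the toy one-hot labels and predicted probabilities below are illustrative, and averaging over samples is an assumed convention:

```python
import numpy as np

def cross_entropy(y, p):
    """CELoss = -sum over classes of y_ic * log(p_ic), averaged over samples.

    y: one-hot indicator matrix, shape (samples, classes)
    p: predicted class probabilities, same shape
    """
    return float(-(y * np.log(p)).sum(axis=1).mean())

y = np.array([[1.0, 0.0], [0.0, 1.0]])   # true classes of two samples
p = np.array([[0.9, 0.1], [0.2, 0.8]])   # predicted probabilities
print(round(cross_entropy(y, p), 4))     # → 0.1643
```

The loss approaches zero as the predicted probability of each true class approaches one, which is what gradient descent drives the model parameters toward.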
The invention also relates to a system for searching and extracting operation video clips based on artificial intelligence, which comprises an identification module, a video searching and extracting module and a video playing module;
the identification module is used for dividing the video into a plurality of video clips, extracting a plurality of pictures from each clip at equal intervals, inputting the pictures into the surgical-stage recognition model and the surgical-event recognition model, and respectively identifying the surgical stage and the surgical event in the pictures after image feature extraction;
the video retrieval and extraction module is used for mapping the recognition results for the surgical stage and the surgical event from the start and stop times of each video clip to the corresponding start and stop times in the complete video, and then storing the recognition results with their corresponding times in the video retrieval and extraction unit;
the video playing module is used for loading the recognition results and the corresponding time data in the video retrieval system and the video extraction system, displaying them on the progress bar of the video playing system, and marking the surgical stage and the time periods in which surgical events occur.
Further, the system comprises a construction module, wherein the construction module is used for constructing the video retrieval and extraction unit, the operation stage identification model and the operation event identification model.
The system further comprises a video collecting and labeling module, which is used for collecting a large amount of video data meeting a resolution requirement, converting it into pictures, and labeling the collected pictures with operation stage time periods and operation event time periods.
The foregoing describes preferred embodiments of the invention. It is to be understood that the invention is not limited to the precise forms disclosed herein, and that various other combinations, modifications, and environments falling within the scope of the inventive concept, whether described above or apparent to those skilled in the relevant art, may be resorted to. Modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. A method for retrieving and extracting surgical video clips based on artificial intelligence, characterized by comprising the following steps:
dividing a video into a plurality of video segments, extracting a plurality of pictures of the video segments at equal intervals, inputting the pictures into an operation stage identification model and an operation event identification model, and respectively identifying an operation stage and an operation event in the pictures after image feature extraction;
mapping the identification results of the operation stage and the operation event from the start-stop times of the video clips to the corresponding start-stop times of the complete video, and storing the identification results together with their corresponding times in a video retrieval system and a video extraction system;
and loading the recognition results and the corresponding time data in the video retrieval system and the video extraction system, displaying the recognition results and the corresponding time data in a progress bar of the video playing system, and marking the operation stage and the time period of the operation event.
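An illustrative sketch (not part of the claims) of the time mapping in the step above, assuming fixed-length segments and recognition times expressed relative to each segment:

```python
def to_full_video_time(segment_index, segment_len_s, rel_start, rel_end):
    """Convert a recognition result's start/stop times, expressed relative
    to one fixed-length video segment, to absolute times in the full video."""
    offset = segment_index * segment_len_s
    return offset + rel_start, offset + rel_end

# Segment 3 (0-based) of 30 s segments; an event at 5-12 s within that segment.
start, end = to_full_video_time(3, 30, 5, 12)  # → (95, 102)
```

The resulting absolute times are what would be stored for retrieval and shown on the player's progress bar.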
2. The method for retrieving and extracting the surgical video clip based on artificial intelligence of claim 1, wherein: the method further comprises extracting corresponding videos or pictures according to the surgical events and the occurrence times of the surgical stages in the videos to build a surgical knowledge base, so that doctors or experts can quickly browse it for safety evaluation and chief surgeon skill evaluation.
3. The method for retrieving and extracting the surgical video clip based on artificial intelligence of claim 1, wherein: the step of dividing a video into a plurality of video segments, extracting a plurality of pictures of the video segments at equal intervals, inputting the pictures into an operation stage identification model and an operation event identification model, and respectively identifying the operation stage and the operation event in the pictures after image feature extraction comprises:
dividing a video into a plurality of video clips, extracting N pictures of each video clip at equal intervals, and storing the pictures as a four-dimensional tensor in the (N, C, H, W) format, wherein N represents the number of frames extracted from each video clip, C represents the number of channels of each picture, H represents the height of the picture, and W represents the width of the picture;
putting the pictures represented by the four-dimensional tensor into a ResNet network consisting of a plurality of 2D convolutions, a ReLU activation layer, a batch normalization layer and a full connection layer to extract image characteristics, and storing the format of the image characteristics as (M, S), wherein M represents the number of input pictures, and S represents the length of a preset characteristic vector;
and inputting the image feature vectors in the (M, S) format into an LSTM network over M time steps, so as to identify the operation stage and the operation event of consecutive video frames.
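As an illustrative sketch (not part of the claims), the equal-interval frame sampling and the (N, C, H, W) tensor layout of this step can be outlined as follows; the frame count N = 16, the 3 channels, and the 224×224 resolution are assumed values, and the shape tuple stands in for real pixel data:

```python
def sample_timestamps(segment_len_s, n_frames):
    """Pick N timestamps at equal intervals across one video segment."""
    step = segment_len_s / n_frames
    return [round(i * step, 3) for i in range(n_frames)]

def frame_tensor_shape(n_frames, channels=3, height=224, width=224):
    """The sampled frames are stacked into the (N, C, H, W) layout; the CNN
    backbone then maps them to an (M, S) feature matrix with M = N rows,
    each a feature vector of a preset length S, which the LSTM consumes
    one row at a time."""
    return (n_frames, channels, height, width)

ts = sample_timestamps(30, 4)      # → [0.0, 7.5, 15.0, 22.5]
shape = frame_tensor_shape(16)     # → (16, 3, 224, 224)
```

A real implementation would hold the decoded pixels in such a tensor (e.g. in a deep learning framework) rather than just its shape.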
4. The method for retrieving and extracting the surgical video clip based on artificial intelligence of claim 1, wherein: the video retrieval system and the video extraction system are constructed by the following method:
performing deep learning-based reasoning on the input picture by adopting an operation stage identification model and an operation event identification model to obtain operation stage information and operation event information;
displaying, to the user, the operation stages and the time periods in which operation events occur on a progress bar through customized video playing software, thereby realizing the construction of the video retrieval system;
and, with customized extraction software, using the start and stop time points of the key operation processes identified by the operation stage identification model and the operation event identification model to extract the corresponding video clips from the video, or to extract the video into static pictures at a certain frame rate, thereby realizing the construction of the video extraction system.
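As an illustrative sketch of the extraction step, the clip-cutting and frame-dumping commands that such customized extraction software might issue to ffmpeg can be assembled as follows (the file names are hypothetical; `-ss`, `-t`, `-c copy`, and the `fps` video filter are standard ffmpeg options):

```python
def clip_cmd(src, start_s, end_s, dst):
    """Cut [start_s, end_s) out of the source video using stream copy
    (no re-encoding), seeking to the identified start time first."""
    return ["ffmpeg", "-ss", str(start_s), "-i", src,
            "-t", str(end_s - start_s), "-c", "copy", dst]

def frames_cmd(src, fps, dst_pattern):
    """Dump a clip to still pictures at the given frame rate."""
    return ["ffmpeg", "-i", src, "-vf", f"fps={fps}", dst_pattern]

# Cut the 95-102 s span identified for one operation stage.
cmd = clip_cmd("surgery.mp4", 95, 102, "phase_clip.mp4")
```

The command lists could then be executed with `subprocess.run`; they are shown unexecuted here so the sketch stays self-contained.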
5. The method for retrieving and extracting the surgical video clip based on artificial intelligence of claim 1, wherein: the method also comprises the step of constructing an operation stage identification model and an operation event identification model before dividing the video; the construction steps of the surgery stage identification model and the surgery event identification model comprise:
establishing an operation stage theoretical model and an operation event theoretical model according to expert experience, guidelines and theories, and carrying out boundary division on the operation stage and the operation event according to the operation stage theoretical model and the operation event theoretical model on the collected operation video;
collecting a large amount of video data meeting a resolution requirement, converting it into pictures, and labeling the collected pictures with operation stage time periods and operation event time periods;
and randomly distributing the labeled surgical phase data and surgical event data into a training set, a verification set and a test set according to corresponding proportions, and training, verifying and testing images of the training set, the verification set and the test set through a ResNet network and an LSTM network to complete construction of a surgical phase identification model and a surgical event identification model.
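An illustrative sketch (not part of the claims) of the random train/validation/test split in the step above; the 70/15/15 ratio and the fixed seed are assumed choices, since the claim only says "corresponding proportions":

```python
import random

def split_dataset(items, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle the labeled samples and split them into training,
    verification, and test sets by the given ratios."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = items[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, val, test = split_dataset(list(range(100)))
```

In practice the split would be done over whole videos rather than single frames, so that frames from one surgery never appear in both training and test sets.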
6. The method for retrieving and extracting the surgical video clip based on artificial intelligence as claimed in claim 5, wherein: the collecting a large amount of video data meeting a resolution requirement, converting it into pictures, and labeling the collected pictures with operation stage time periods and operation event time periods comprises the following steps:
collecting operation video data under the requirement that each operation video has a resolution not lower than a preset value and a frame rate not lower than a preset number of frames per second, and storing the operation video data in picture form;
uniformly transcoding the collected pictures into the same format with the ffmpeg software, and completing preliminary labeling of the operation stage time periods and the operation event time periods with the Anvil labeling software;
and having a professional manually review the preliminarily labeled video data and revise any pictures whose preliminary labels are unqualified, so as to obtain qualified labeled pictures.
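As a minimal sketch of how the annotated time periods could be turned into per-frame training labels (the interval format, the 1 fps sampling, and the "background" default label are assumptions for illustration):

```python
def label_frames(n_frames, fps, intervals, default="background"):
    """Assign each extracted frame the phase/event label of the annotated
    time interval it falls in; intervals are (label, start_s, end_s)."""
    labels = []
    for i in range(n_frames):
        t = i / fps  # timestamp of frame i
        label = default
        for name, start, end in intervals:
            if start <= t < end:
                label = name
                break
        labels.append(label)
    return labels

# Frames at 1 fps over 10 s; a "clipping" phase annotated from 2 s to 6 s.
labels = label_frames(10, 1, [("clipping", 2, 6)])
```

These per-frame labels are what the ResNet/LSTM training described in claim 7 would consume.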
7. The method for retrieving and extracting operation video clip based on artificial intelligence as claimed in claim 5, wherein: the training, verifying and testing of the training set, the verifying set and the testing set pictures through the ResNet network and the LSTM network to complete the construction of the operation stage identification model and the operation event identification model comprises the following steps:
storing the pictures of the training set, the verification set, and the test set as four-dimensional tensors in the (N, C, H, W) format, wherein N represents the number of frames of each video clip, C represents the number of channels of each picture, H represents the height of the picture, and W represents the width of the picture;
putting the pictures represented by the four-dimensional tensor into a ResNet network consisting of a plurality of 2D convolutions, a ReLU activation layer, a batch normalization layer and a full connection layer to extract image characteristics, and storing the format of the image characteristics as (M, S), wherein M represents the number of input pictures, and S represents the length of a preset characteristic vector;
inputting the image feature vectors in the (M, S) format into the LSTM network over M time steps, identifying the operation stage and the operation event of consecutive video frames, and recording their start and end times.
8. A system for retrieving and extracting operation video clips based on artificial intelligence is characterized in that: the system comprises an identification module, a video retrieval and extraction module and a video playing module;
the identification module is used for dividing the video into a plurality of video segments, extracting a plurality of pictures of the video segments at equal intervals, inputting the pictures into the operation stage identification model and the operation event identification model, and respectively identifying the operation stage and the operation event in the pictures after image feature extraction;
the video retrieval and extraction module is used for mapping the identification results of the operation stage and the operation event from the start-stop times of the video clips to the corresponding start-stop times of the complete video, and then storing the identification results together with their corresponding times in the video retrieval and extraction unit;
the video playing module is used for loading the identification results and the corresponding time data in the video retrieval system and the video extraction system, displaying the identification results and the corresponding time data in a progress bar of the video playing system, and marking an operation stage and a time period when an operation event occurs.
9. The system for retrieving and extracting video clip of surgery based on artificial intelligence as claimed in claim 8, wherein: the system further comprises a construction module, wherein the construction module is used for constructing the video retrieval and extraction unit, the operation stage identification model and the operation event identification model.
10. The system for retrieving and extracting the surgical video clip based on artificial intelligence as claimed in claim 8, wherein: the system further comprises a video collecting and labeling module, which is used for collecting a large amount of video data meeting a resolution requirement, converting it into pictures, and labeling the collected pictures with operation stage time periods and operation event time periods.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111310650.2A CN113742527A (en) | 2021-11-08 | 2021-11-08 | Method and system for retrieving and extracting operation video clips based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113742527A true CN113742527A (en) | 2021-12-03 |
Family
ID=78727661
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114339299A (en) * | 2021-12-27 | 2022-04-12 | 司法鉴定科学研究院 | Video evidence obtaining method for automobile driving recorder |
TWI778900B (en) * | 2021-12-28 | 2022-09-21 | 慧術科技股份有限公司 | Marking and teaching of surgical procedure system and method thereof |
CN116193231A (en) * | 2022-10-24 | 2023-05-30 | 成都与睿创新科技有限公司 | Method and system for handling minimally invasive surgical field anomalies |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200237452A1 (en) * | 2018-08-13 | 2020-07-30 | Theator inc. | Timeline overlay on surgical video |
US10729502B1 (en) * | 2019-02-21 | 2020-08-04 | Theator inc. | Intraoperative surgical event summary |
CN112932663A (en) * | 2021-03-02 | 2021-06-11 | 成都与睿创新科技有限公司 | Intelligent auxiliary method and system for improving safety of laparoscopic cholecystectomy |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20211203 |