CN113742527A - Method and system for retrieving and extracting operation video clips based on artificial intelligence - Google Patents

Method and system for retrieving and extracting operation video clips based on artificial intelligence

Info

Publication number
CN113742527A
Authority
CN
China
Prior art keywords
video
pictures
identification model
extracting
operation stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111310650.2A
Other languages
Chinese (zh)
Inventor
刘杰
王玉贤
刘润文
吴少南
沈小江
王昕�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Yurui Innovation Technology Co ltd
Original Assignee
Chengdu Yurui Innovation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Yurui Innovation Technology Co ltd filed Critical Chengdu Yurui Innovation Technology Co ltd
Priority to CN202111310650.2A
Publication of CN113742527A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74 Browsing; Visualisation therefor
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G06F16/783 Retrieval characterised by using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata automatically derived from the content, using objects detected or recognised in the video content
    • G06F16/784 Retrieval characterised by using metadata automatically derived from the content, using objects detected or recognised in the video content, the detected or recognised objects being people
    • G06F16/7844 Retrieval characterised by using metadata automatically derived from the content, using original textual content or text extracted from visual content or transcript of audio data
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/08 Learning methods

Abstract

The invention relates to a method and a system for retrieving and extracting surgical video clips based on artificial intelligence. The method comprises: dividing a video into clips and inputting pictures extracted from each clip at equal intervals into an operation stage identification model and an operation event identification model for identification; mapping the identification results of the operation stages and operation events from the start-stop times of each clip to the start-stop times of the complete video, and storing the identification results together with their corresponding times in a video retrieval and video extraction system; and loading the identification results and corresponding time data from the video retrieval and video extraction system and displaying them on a progress bar of the video playing system. By identifying the operation stages and operation events in a surgical video and displaying the corresponding time points on the progress bar of the playback timeline, the invention enables medical personnel to quickly locate the operation stage or operation event of interest when watching a surgical video, which greatly saves time and improves learning efficiency.

Description

Method and system for retrieving and extracting operation video clips based on artificial intelligence
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a system for retrieving and extracting operation video clips based on artificial intelligence.
Background
According to China's annual health statistics, roughly 61.72 million to 69.30 million surgical operations are performed nationwide per year. As expectations for medical quality keep rising, so does the demand for skilled surgical practitioners. On the one hand, medical students need to learn surgical skills and how to handle intraoperative emergencies as quickly as possible; on the other hand, to improve surgical quality, surgeons repeatedly review their own or others' surgical videos in order to refine their technique.
However, watching surgical videos for long periods takes a great deal of a doctor's time, and prolonged viewing reduces the attention of doctors and students, so important steps or operation events of interest may be missed. This is especially true for ultra-long surgical videos (for example, laparoscopic pancreaticoduodenectomy typically lasts 6-8 hours), which place a heavy burden on doctors reviewing them and make learning extremely inefficient. Moreover, when a specific surgical procedure, a particular operation event, or a certain category of operation events cannot be quickly located, the video cannot be searched efficiently to find the moments of interest for focused browsing. In addition, for management or business purposes, hospitals and administrative bodies need to extract key information from the surgical process, that is, to automatically extract key video clips or pictures for learning and research, knowledge base construction, safety behavior evaluation samples, and the selection of samples for evaluating the lead surgeon's technical competence; the existing technology, however, cannot meet these requirements.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method and a system for retrieving and extracting surgical video clips based on artificial intelligence, thereby solving the problems medical personnel currently face when watching surgical videos.
The purpose of the invention is realized by the following technical scheme: a method for retrieving and extracting surgical video clips based on artificial intelligence, the method comprising:
dividing a video into a plurality of video clips, extracting a plurality of pictures from each clip at equal intervals, inputting the pictures into an operation stage identification model and an operation event identification model, and, after image feature extraction, identifying the operation stage and the operation event in the pictures respectively;
mapping the identification results of the operation stages and operation events from the start-stop times of the video clips to the start-stop times of the complete video, and storing the identification results and their corresponding times in a video retrieval system and a video extraction system;
and loading the identification results and the corresponding time data from the video retrieval system and the video extraction system, displaying them on a progress bar of the video playing system, and marking the operation stages and the time periods in which operation events occur.
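As an illustration of the time-mapping step above, the following is a minimal Python sketch (the data structure and the fixed 30-second clip length are illustrative assumptions, not part of the claimed method) that converts clip-relative recognition results into start-stop times on the complete video:

```python
from dataclasses import dataclass

@dataclass
class ClipResult:
    clip_index: int   # position of the clip within the full video
    label: str        # recognized operation stage or operation event
    start_s: float    # start time within the clip, in seconds
    end_s: float      # end time within the clip, in seconds

def to_absolute_times(results, clip_length_s):
    """Map clip-relative recognition results to start-stop times on the complete video."""
    index = []
    for r in results:
        offset = r.clip_index * clip_length_s
        index.append({"label": r.label,
                      "start": offset + r.start_s,
                      "end": offset + r.end_s})
    return index

# Example: two recognized items found in 30-second clips
clips = [ClipResult(0, "stage: dissection", 5.0, 30.0),
         ClipResult(7, "event: bleeding", 2.5, 12.0)]
print(to_absolute_times(clips, clip_length_s=30.0))
```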
The method further comprises extracting the corresponding videos or pictures according to the operation events and the occurrence times of the operation stages in the video to build a surgical knowledge base, so that doctors or experts can quickly browse it for safety evaluation and evaluation of the lead surgeon's skill.
The steps of dividing a video into a plurality of video clips, extracting a plurality of pictures from each clip at equal intervals, inputting the pictures into the operation stage identification model and the operation event identification model, and, after image feature extraction, identifying the operation stage and the operation event in the pictures respectively, comprise:
dividing a video into a plurality of video clips, extracting N pictures from each clip at equal intervals, and storing the pictures as a four-dimensional tensor in the (N, C, H, W) format, wherein N represents the number of frames extracted from each video clip, C represents the number of channels of each picture, H represents the height of each picture, and W represents the width of each picture;
feeding the pictures represented by the four-dimensional tensor into a ResNet network consisting of a plurality of 2D convolutions, ReLU activation layers, batch normalization layers and a fully connected layer to extract image features, and storing the image features in the (M, S) format, wherein M represents the number of input pictures and S represents the length of a preset feature vector;
and feeding the M image feature vectors of the (M, S) format into an LSTM network one time step at a time, so as to identify the operation stage and the operation event across consecutive video frames.
The video retrieval system and the video extraction system are constructed by the following method:
performing deep-learning-based inference on the input pictures with the operation stage identification model and the operation event identification model to obtain operation stage information and operation event information;
displaying the operation stages and the time periods in which operation events occur on a progress bar for the user through customized video playing software, thereby constructing the video retrieval system;
and using customized extraction software to extract the corresponding video clips from the video, or to export the video as still pictures at a certain frame rate, based on the start and stop time points of the key surgical processes identified by the operation stage identification model and the operation event identification model, thereby constructing the video extraction system.
The method further comprises constructing the operation stage identification model and the operation event identification model before dividing the video; the construction steps of the operation stage identification model and the operation event identification model comprise:
establishing an operation stage theoretical model and an operation event theoretical model according to expert experience, guidelines and theory, and dividing the collected surgical videos into operation stages and operation events along the boundaries defined by the two theoretical models;
collecting a large amount of video data, converting it into pictures that meet the resolution requirement, and labeling the collected pictures with operation stage time periods and operation event time periods;
and randomly splitting the labeled operation stage data and operation event data into a training set, a validation set and a test set in corresponding proportions, and training, validating and testing on the pictures of these sets with a ResNet network and an LSTM network to complete the construction of the operation stage identification model and the operation event identification model.
The collecting of a large amount of video data as pictures that meet the resolution requirement, and the labeling of the collected pictures with operation stage time periods and operation event time periods, comprise the following steps:
collecting surgical video data such that the resolution of each surgical video is not lower than a preset value and the frame rate is not lower than a preset number of frames per second, and storing the data in the form of pictures;
uniformly transcoding the collected data into the same format with the ffmpeg software, and completing the preliminary labeling of the operation stage time periods and operation event time periods with the Anvil labeling software;
and having professionals manually label the preliminarily labeled video data and correct the pictures whose preliminary labels are unqualified, so as to obtain qualified labeled pictures.
The training, validating and testing on the pictures of the training set, the validation set and the test set with the ResNet network and the LSTM network to complete the construction of the operation stage identification model and the operation event identification model comprises the following steps:
storing the pictures of the training set, the validation set and the test set as four-dimensional tensors in the (N, C, H, W) format, wherein N represents the number of frames of each video clip, C represents the number of channels of each picture, H represents the height of each picture, and W represents the width of each picture;
feeding the pictures represented by the four-dimensional tensors into a ResNet network consisting of a plurality of 2D convolutions, ReLU activation layers, batch normalization layers and a fully connected layer to extract image features, and storing the image features in the (M, S) format;
feeding the image feature vectors of the (M, S) format into an LSTM network one time step at a time, identifying the operation stage and the operation event of consecutive video frames, and recording their start and end times;
and putting the recognition results into the cross entropy loss function

CELoss = -\sum_{c=1}^{M} y_{ic} \log(p_{ic})

to calculate the loss, and updating the model parameters by gradient descent, thereby completing the construction of the operation stage identification model and the operation event identification model.
A system for retrieving and extracting operation video clips based on artificial intelligence comprises an identification module, a video retrieving and extracting module and a video playing module;
the identification module is used for dividing the video into a plurality of video segments, extracting a plurality of pictures of the video segments at equal intervals, inputting the pictures into the operation stage identification model and the operation event identification model, and respectively identifying the operation stage and the operation event in the pictures after image feature extraction;
the video retrieval and extraction module is used for mapping the identification results of the operation stages and operation events from the start-stop times of the video clips to the start-stop times of the complete video, and then storing the identification results and their corresponding times in the video retrieval and extraction unit;
the video playing module is used for loading the identification results and the corresponding time data in the video retrieval system and the video extraction system, displaying the identification results and the corresponding time data in a progress bar of the video playing system, and marking an operation stage and a time period when an operation event occurs.
The system further comprises a construction module, wherein the construction module is used for constructing the video retrieval and extraction unit, the operation stage identification model and the operation event identification model.
The system also comprises a video collection and labeling module, wherein the video collection and labeling module is used for collecting a large amount of video data, converting it into pictures that meet the resolution requirement, and labeling the collected pictures with operation stage time periods and operation event time periods.
The invention has the following advantages: in the disclosed method and system for retrieving and extracting surgical video clips based on artificial intelligence, the operation stages and the associated operation events in a surgical video are identified and the corresponding time points are displayed on the progress bar of the playback timeline, so that medical staff can quickly locate the operation stage or operation event they need to focus on while watching a surgical video, greatly reducing viewing time and improving learning efficiency; at the same time, a surgical knowledge base and related resources are built to serve medical students and surgeons, yielding considerable social and economic benefits.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
fig. 2 is a schematic diagram of a display effect of a progress bar in a video playing system.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application provided below in connection with the appended drawings is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application. The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention relates to a method for retrieving and extracting a surgical video clip based on artificial intelligence, which specifically includes the following steps:
s1, constructing an operation stage identification model and an operation event identification model; the method specifically comprises the following steps:
s11, establishing an operation stage theoretical model and an operation event theoretical model according to expert experience, guidelines and theory, and dividing the collected surgical videos into operation stages and operation events along the boundaries defined by the two theoretical models;
and S12, collecting a large amount of video data, converting it into pictures that meet the resolution requirement, and labeling the collected pictures with operation stage time periods and operation event time periods. This step specifically comprises:
collecting surgical video data with a resolution of not lower than 720 x 560 and a frame rate of not lower than 21 frames per second, and storing the data in the form of pictures;
uniformly transcoding the collected data into the same MPEG-4 format with the ffmpeg software, and completing the preliminary labeling of the operation stage time periods and operation event time periods with the Anvil Video Annotation Research Tool;
and 6 qualified surgical specialists who received prior training are responsible for quality control: they manually label the preliminarily labeled video data and correct the pictures whose preliminary labels are unqualified, so as to obtain qualified labeled pictures.
And S13, randomly splitting the labeled operation stage data and operation event data into a training set, a validation set and a test set in a ratio of 8:1:1, and training, validating and testing on the pictures of these sets with a ResNet network and an LSTM network to complete the construction of the operation stage identification model and the operation event identification model.
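A minimal sketch of the 8:1:1 random split described in S13, assuming the labeled samples are simply identifiers in a Python list (function and variable names are illustrative):

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Randomly split labeled samples into training, validation and test sets."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Example with dummy sample identifiers
train, val, test = split_dataset([f"clip_{i:04d}" for i in range(1000)])
print(len(train), len(val), len(test))  # 800 100 100
```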
All models are developed on the Anaconda and Qt Creator platforms, and image processing uses an NVIDIA Tesla V100 graphics processor.
S2, dividing the video into a plurality of video clips, extracting a plurality of pictures from each clip at equal intervals, inputting the pictures into the operation stage identification model and the operation event identification model, and, after image feature extraction, identifying the operation stage and the operation event in the pictures respectively;
further, it specifically includes:
s21, dividing a video into a plurality of video clips, extracting N pictures from each clip at equal intervals, and storing the pictures as a four-dimensional tensor in the (N, C, H, W) format, wherein N represents the number of frames extracted from each video clip, C represents the number of channels of each picture, H represents the height of each picture, and W represents the width of each picture;
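One possible realization of step S21, assuming OpenCV and PyTorch are used (the description does not prescribe specific libraries): N pictures are sampled at equal intervals from a clip and stacked into an (N, C, H, W) tensor.

```python
import cv2
import numpy as np
import torch

def sample_clip_frames(video_path, n_frames=16):
    """Extract n_frames pictures at equal intervals and return an (N, C, H, W) float tensor."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), n_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            continue
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frames.append(torch.from_numpy(frame).permute(2, 0, 1))  # (C, H, W)
    cap.release()
    return torch.stack(frames).float() / 255.0  # (N, C, H, W)
```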
s22, putting the pictures represented by the four-dimensional tensor into a ResNet network consisting of a plurality of 2D convolutions, a ReLU active layer, a batch normalization layer and a full connection layer to extract image features, and storing the format of the image features as (M, S), wherein M represents the number of input pictures, and S represents the length of a preset feature vector;
further, the convolution layer or the full link layer has a calculation formula as follows:
Figure 336071DEST_PATH_IMAGE002
where y represents the computational output, n represents the number of neurons,
Figure 842139DEST_PATH_IMAGE003
represents the weight of the ith neuron,
Figure 120674DEST_PATH_IMAGE004
representing input data of the ith neuron, b adding an offset to the result of the computation, when convolution is performed
Figure 737600DEST_PATH_IMAGE003
And
Figure 621242DEST_PATH_IMAGE004
is a two-dimensional matrix when calculated for full connectivity
Figure 360528DEST_PATH_IMAGE005
And
Figure 532883DEST_PATH_IMAGE004
is a one-dimensional vector.
The batch normalization layer is computed as:

BN(x) = \gamma \cdot \frac{x - E[x]}{\sqrt{Var[x] + \epsilon}} + \beta

where BN represents the batch normalization output, x represents the input data, E[x] represents the mean of the x tensor, Var[x] represents the variance of the x tensor, \epsilon is a very small constant that ensures the denominator is not 0, and \gamma and \beta are learnable coefficients.
The ReLU activation function is:

ReLU(z) = \max(0, z)

where ReLU represents the computed output, z represents the input tensor, and \max() takes the larger of its arguments.
And S23, feeding the image feature vectors of the (M, S) format into the LSTM network one time step at a time, so as to identify the operation stage and the operation event across consecutive video frames. The LSTM network is composed of a plurality of cells; it forgets information that lies too far from the current time and updates its current state as new video frames are input, thereby completing the recognition.
Further, one cell of the LSTM network is computed as:

i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})
f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf})
g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg})
o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho})
c_t = f_t \odot c_{t-1} + i_t \odot g_t
h_t = o_t \odot \tanh(c_t)

wherein h_t represents the hidden state at time t; c_t represents the cell calculation result of the LSTM at time t; x_t represents the input data at time t; h_{t-1} represents the hidden state at time t-1; i_t represents the result of the input operation; f_t represents the result of the forget operation; g_t represents the result of the output operation; o_t represents the output of the temporal information fusion operation on the cell; W_{ii}, W_{hi}, b_{ii}, b_{hi} respectively represent the weight parameters and biases for extracting features from the input data in the LSTM cell; W_{if}, W_{hf}, b_{if}, b_{hf} respectively represent the weight parameters and biases of the forget gate in the LSTM cell; W_{ig}, W_{hg}, b_{ig}, b_{hg} respectively represent the weight parameters and biases of the output gate in the LSTM cell; W_{io}, W_{ho}, b_{io}, b_{ho} respectively represent the weight parameters and biases of the update gate in the LSTM cell; \sigma() denotes the sigmoid function, and \odot denotes element-wise multiplication.
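Steps S22-S23 could be sketched in PyTorch roughly as follows; ResNet-18, the feature length S = 512, the hidden size and the number of classes are assumptions made for illustration, since the description only requires a ResNet backbone followed by an LSTM:

```python
import torch
import torch.nn as nn
from torchvision import models

class PhaseEventRecognizer(nn.Module):
    """ResNet image features of shape (M, S), fed frame by frame into an LSTM."""

    def __init__(self, feature_dim=512, hidden_dim=256, num_classes=10):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, feature_dim)
        self.backbone = backbone
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames):            # frames: (N, C, H, W) pictures of one clip
        feats = self.backbone(frames)     # (M, S) with M = N, S = feature_dim
        feats = feats.unsqueeze(0)        # (1, M, S): one sequence of M time steps
        out, _ = self.lstm(feats)         # (1, M, hidden_dim)
        return self.head(out.squeeze(0))  # per-frame class scores, (M, num_classes)

model = PhaseEventRecognizer()
scores = model(torch.randn(16, 3, 224, 224))  # 16 pictures sampled from a clip
print(scores.shape)                           # torch.Size([16, 10])
```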
S3, mapping the identification results of the operation stages and operation events from the start-stop times of the video clips to the start-stop times of the complete video, and storing the identification results and their corresponding times in the video retrieval system and the video extraction system;
S4, loading the identification results and the corresponding time data from the video retrieval system and the video extraction system, displaying them on a progress bar of the video playing system, and marking the operation stages and the time periods in which operation events occur.
As shown in fig. 2, the operation stage information and the operation event information are presented on the playback timeline so that the surgeon can quickly retrieve the video clip to be watched; three progress bars are arranged below and aligned with the playback timeline, and when a certain event occurs or a certain stage is in progress during a period of time, the corresponding operation stage and operation event are marked at the corresponding time positions.
Meanwhile, according to the occurrence times of the operation events and operation stages in the video, the corresponding videos or pictures are extracted with the ffmpeg software to build a surgical knowledge base, so that doctors or experts can quickly browse it for safety evaluation and evaluation of the lead surgeon's skill.
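A sketch of such extraction with ffmpeg, using standard command-line options (paths, times and output names are placeholders):

```python
import subprocess

def extract_clip(src, start_s, end_s, dst):
    """Cut the video between start_s and end_s into a separate clip without re-encoding."""
    subprocess.run(["ffmpeg", "-y", "-i", src, "-ss", str(start_s), "-to", str(end_s),
                    "-c", "copy", dst], check=True)

def extract_frames(src, start_s, end_s, fps, pattern):
    """Export still pictures from the same interval at the given frame rate."""
    subprocess.run(["ffmpeg", "-y", "-i", src, "-ss", str(start_s), "-to", str(end_s),
                    "-vf", f"fps={fps}", pattern], check=True)

# Placeholders: an event recognized between 1325.0 s and 1410.5 s of surgery.mp4
extract_clip("surgery.mp4", 1325.0, 1410.5, "event_bleeding.mp4")
extract_frames("surgery.mp4", 1325.0, 1410.5, fps=1, pattern="event_%04d.jpg")
```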
Further, the video retrieval system and the video extraction system are constructed by the following method:
performing deep-learning-based inference on the input pictures with the operation stage identification model and the operation event identification model to obtain operation stage information and operation event information;
displaying the operation stages and the time periods in which operation events occur on a progress bar for the user through customized video playing software, thereby constructing the video retrieval system;
and using customized extraction software to extract the corresponding video clips from the video, or to export the video as still pictures at a certain frame rate, based on the start and stop time points of the key surgical processes identified by the operation stage identification model and the operation event identification model, thereby constructing the video extraction system.
Further, training, verifying and testing the images of the training set, the verifying set and the testing set through a ResNet network and an LSTM network, and completing the construction of the operation stage identification model and the operation event identification model comprises the following steps:
storing the pictures of the training set, the validation set and the test set as four-dimensional tensors in the (N, C, H, W) format, wherein N represents the number of frames of each video clip, C represents the number of channels of each picture, H represents the height of each picture, and W represents the width of each picture;
putting the picture represented by the four-dimensional tensor into a ResNet network consisting of a plurality of 2D convolutions, a ReLU activation layer, a batch normalization layer and a full connection layer to extract image characteristics, and storing the format of the image characteristics as (M, S);
and feeding the image feature vectors of the (M, S) format into an LSTM network one time step at a time, so as to identify the operation stage and the operation event of consecutive video frames. The LSTM network is composed of a plurality of cells; it forgets information that lies too far from the current time and updates its current state as new video frames are input, thereby completing the recognition.
The recognition results are put into the cross entropy loss function

CELoss = -\sum_{c=1}^{M} y_{ic} \log(p_{ic})

to calculate the loss, and the model parameters are updated by gradient descent, thereby completing the construction of the operation stage identification model and the operation event identification model, wherein CELoss represents the computed loss; M represents the number of classes; y_{ic} is an indicator variable (0 or 1) that equals 1 if class c is the true class of the sample and 0 otherwise; and p_{ic} represents the predicted probability that the observed sample belongs to class c.
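Continuing the assumptions of the earlier PyTorch sketch (a frame-level classifier returning (M, num_classes) scores), one training step with the cross entropy loss and gradient descent might look like:

```python
import torch
import torch.nn as nn

def train_one_clip(model, optimizer, frames, labels):
    """One gradient-descent update on a single clip.

    frames: (N, C, H, W) tensor of sampled pictures
    labels: (N,) tensor of per-frame stage/event class indices
    """
    model.train()
    criterion = nn.CrossEntropyLoss()   # CELoss = -sum_c y_ic * log(p_ic)
    optimizer.zero_grad()
    scores = model(frames)              # (M, num_classes) per-frame scores
    loss = criterion(scores, labels)
    loss.backward()                     # compute gradients
    optimizer.step()                    # gradient-descent parameter update
    return loss.item()

# Example usage with the recognizer sketched earlier (names are assumptions)
# model = PhaseEventRecognizer()
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# loss = train_one_clip(model, optimizer,
#                       torch.randn(16, 3, 224, 224), torch.randint(0, 10, (16,)))
```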
The invention also relates to a system for searching and extracting operation video clips based on artificial intelligence, which comprises an identification module, a video searching and extracting module and a video playing module;
the identification module is used for dividing the video into a plurality of video segments, extracting a plurality of pictures of the video segments at equal intervals, inputting the pictures into the operation stage identification model and the operation event identification model, and respectively identifying the operation stage and the operation event in the pictures after image feature extraction;
the video retrieval and extraction module is used for mapping the identification results of the operation stages and operation events from the start-stop times of the video clips to the start-stop times of the complete video, and then storing the identification results and their corresponding times in the video retrieval and extraction unit;
the video playing module is used for loading the identification results and the corresponding time data in the video retrieval system and the video extraction system, displaying the identification results and the corresponding time data in a progress bar of the video playing system, and marking an operation stage and a time period when an operation event occurs.
Further, the system comprises a construction module, wherein the construction module is used for constructing the video retrieval and extraction unit, the operation stage identification model and the operation event identification model.
The system further comprises a video collection and labeling module, wherein the video collection and labeling module is used for collecting a large amount of video data, converting it into pictures that meet the resolution requirement, and labeling the collected pictures with operation stage time periods and operation event time periods.
The foregoing is illustrative of the preferred embodiments of this invention, and it is to be understood that the invention is not limited to the precise form disclosed herein; various other combinations, modifications, and environments may be resorted to within the scope of the inventive concept described herein, whether through the teachings above or the skill and knowledge of the relevant art. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall fall within the protection scope of the appended claims.

Claims (10)

1. A method for searching and extracting operation video clips based on artificial intelligence is characterized by comprising the following steps: the method comprises the following steps:
dividing a video into a plurality of video segments, extracting a plurality of pictures of the video segments at equal intervals, inputting the pictures into an operation stage identification model and an operation event identification model, and respectively identifying an operation stage and an operation event in the pictures after image feature extraction;
mapping the identification results of the operation stage and the operation event from the start-stop time of the video clip to the start-stop time of the complete video, and storing the identification results and their corresponding times in a video retrieval system and a video extraction system;
and loading the recognition results and the corresponding time data in the video retrieval system and the video extraction system, displaying the recognition results and the corresponding time data in a progress bar of the video playing system, and marking the operation stage and the time period of the operation event.
2. The method for retrieving and extracting the surgical video clip based on artificial intelligence of claim 1, wherein: the method further comprises extracting the corresponding videos or pictures according to the operation events and the occurrence times of the operation stages in the video to build a surgical knowledge base, so that doctors or experts can quickly browse it for safety evaluation and evaluation of the lead surgeon's skill.
3. The method for retrieving and extracting the surgical video clip based on artificial intelligence of claim 1, wherein: the method comprises the following steps of dividing a video into a plurality of video segments, extracting a plurality of pictures of the video segments at equal intervals, inputting the pictures into an operation stage identification model and an operation event identification model, and respectively identifying the operation stage and the operation event in the pictures after image feature extraction, wherein the steps comprise:
dividing a video into a plurality of video clips, extracting N pictures from each clip at equal intervals, and storing the pictures as a four-dimensional tensor in the (N, C, H, W) format, wherein N represents the number of frames extracted from each video clip, C represents the number of channels of each picture, H represents the height of each picture, and W represents the width of each picture;
putting the pictures represented by the four-dimensional tensor into a ResNet network consisting of a plurality of 2D convolutions, a ReLU activation layer, a batch normalization layer and a full connection layer to extract image characteristics, and storing the format of the image characteristics as (M, S), wherein M represents the number of input pictures, and S represents the length of a preset characteristic vector;
and feeding the M image feature vectors of the (M, S) format into an LSTM network one time step at a time, so as to identify the operation stage and the operation event across consecutive video frames.
4. The method for retrieving and extracting the surgical video clip based on artificial intelligence of claim 1, wherein: the video retrieval system and the video extraction system are constructed by the following method:
performing deep learning-based reasoning on the input picture by adopting an operation stage identification model and an operation event identification model to obtain operation stage information and operation event information;
outputting an operation stage and an operation event occurrence time period for a user through customized video playing software to be displayed on a progress bar, and realizing the construction of a video retrieval system;
and using customized extraction software to extract the corresponding video clips from the video, or to export the video as still pictures at a certain frame rate, based on the start and stop time points of the key surgical processes identified by the operation stage identification model and the operation event identification model, thereby constructing the video extraction system.
5. The method for retrieving and extracting the surgical video clip based on artificial intelligence of claim 1, wherein: the method also comprises the step of constructing an operation stage identification model and an operation event identification model before dividing the video; the construction steps of the surgery stage identification model and the surgery event identification model comprise:
establishing an operation stage theoretical model and an operation event theoretical model according to expert experience, guidelines and theories, and carrying out boundary division on the operation stage and the operation event according to the operation stage theoretical model and the operation event theoretical model on the collected operation video;
collecting a large amount of video data into pictures according to the requirement of resolution, and labeling the collected pictures with operation stage time periods and operation event time periods;
and randomly distributing the labeled surgical phase data and surgical event data into a training set, a verification set and a test set according to corresponding proportions, and training, verifying and testing images of the training set, the verification set and the test set through a ResNet network and an LSTM network to complete construction of a surgical phase identification model and a surgical event identification model.
6. The method for retrieving and extracting operation video clip based on artificial intelligence as claimed in claim 5, wherein: the collecting a large amount of video data into pictures according to the requirement of resolution, and labeling the collected pictures in the operation stage time period and the operation event time period comprises the following steps:
collecting surgical video data such that the resolution of each surgical video is not lower than a preset value and the frame rate is not lower than a preset number of frames per second, and storing the data in the form of pictures;
uniformly transcoding the collected pictures into the same format through ffmpeg software, and completing primary labeling of the operation stage time period and the operation event time period through labeling software Anvil;
and manually labeling the video data subjected to preliminary labeling by a professional, and modifying the picture with unqualified preliminary labeling to obtain a picture with qualified labeling.
7. The method for retrieving and extracting operation video clip based on artificial intelligence as claimed in claim 5, wherein: the training, verifying and testing of the training set, the verifying set and the testing set pictures through the ResNet network and the LSTM network to complete the construction of the operation stage identification model and the operation event identification model comprises the following steps:
storing the pictures of the training set, the validation set and the test set as four-dimensional tensors in the (N, C, H, W) format, wherein N represents the number of frames of each video clip, C represents the number of channels of each picture, H represents the height of each picture, and W represents the width of each picture;
putting the pictures represented by the four-dimensional tensor into a ResNet network consisting of a plurality of 2D convolutions, a ReLU activation layer, a batch normalization layer and a full connection layer to extract image characteristics, and storing the format of the image characteristics as (M, S), wherein M represents the number of input pictures, and S represents the length of a preset characteristic vector;
feeding the image feature vectors of the (M, S) format into an LSTM network one time step at a time, identifying the operation stage and the operation event of consecutive video frames, and recording their start and end times;
and putting the recognition results into the cross entropy loss function

CELoss = -\sum_{c=1}^{M} y_{ic} \log(p_{ic})

to calculate the loss, and updating the model parameters by gradient descent, thereby completing the construction of the operation stage identification model and the operation event identification model.
8. A system for retrieving and extracting operation video clips based on artificial intelligence is characterized in that: the system comprises an identification module, a video retrieval and extraction module and a video playing module;
the identification module is used for dividing the video into a plurality of video segments, extracting a plurality of pictures of the video segments at equal intervals, inputting the pictures into the operation stage identification model and the operation event identification model, and respectively identifying the operation stage and the operation event in the pictures after image feature extraction;
the video retrieval and extraction module is used for mapping the identification results of the operation stage and the operation event from the start-stop time of the video clip to the start-stop time of the complete video, and then storing the identification results and their corresponding times in the video retrieval and extraction unit;
the video playing module is used for loading the identification results and the corresponding time data in the video retrieval system and the video extraction system, displaying the identification results and the corresponding time data in a progress bar of the video playing system, and marking an operation stage and a time period when an operation event occurs.
9. The system for retrieving and extracting video clip of surgery based on artificial intelligence as claimed in claim 8, wherein: the system further comprises a construction module, wherein the construction module is used for constructing the video retrieval and extraction unit, the operation stage identification model and the operation event identification model.
10. The system for retrieving and extracting video clip of surgery based on artificial intelligence as claimed in claim 8, wherein: the system also comprises a video collection and labeling module, wherein the video collection and labeling module is used for collecting a large amount of video data into pictures according to the requirement of resolution, and labeling the collected pictures in the operation stage time period and the operation event time period.
CN202111310650.2A 2021-11-08 2021-11-08 Method and system for retrieving and extracting operation video clips based on artificial intelligence Pending CN113742527A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111310650.2A CN113742527A (en) 2021-11-08 2021-11-08 Method and system for retrieving and extracting operation video clips based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111310650.2A CN113742527A (en) 2021-11-08 2021-11-08 Method and system for retrieving and extracting operation video clips based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN113742527A true CN113742527A (en) 2021-12-03

Family

ID=78727661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111310650.2A Pending CN113742527A (en) 2021-11-08 2021-11-08 Method and system for retrieving and extracting operation video clips based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN113742527A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200237452A1 (en) * 2018-08-13 2020-07-30 Theator inc. Timeline overlay on surgical video
US10729502B1 (en) * 2019-02-21 2020-08-04 Theator inc. Intraoperative surgical event summary
CN112932663A (en) * 2021-03-02 2021-06-11 成都与睿创新科技有限公司 Intelligent auxiliary method and system for improving safety of laparoscopic cholecystectomy

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114339299A (en) * 2021-12-27 2022-04-12 司法鉴定科学研究院 Video evidence obtaining method for automobile driving recorder
TWI778900B (en) * 2021-12-28 2022-09-21 慧術科技股份有限公司 Marking and teaching of surgical procedure system and method thereof
CN116193231A (en) * 2022-10-24 2023-05-30 成都与睿创新科技有限公司 Method and system for handling minimally invasive surgical field anomalies

Similar Documents

Publication Publication Date Title
CN113742527A (en) Method and system for retrieving and extracting operation video clips based on artificial intelligence
CN110464366A (en) A kind of Emotion identification method, system and storage medium
WO2021047237A1 (en) Uploader matching method and device
JPH10326286A (en) Similarity retrieval device and recording medium where similarity retrival program is recorded
CN113395578B (en) Method, device, equipment and storage medium for extracting video theme text
CN112309215A (en) Demonstration system for clinical medicine internal medicine teaching and control method thereof
CN107292103A (en) A kind of prognostic chart picture generation method and device
JP2021108146A (en) Information processing device, information processing method and information processing program
CN116862931A (en) Medical image segmentation method and device, storage medium and electronic equipment
CN116016869A (en) Campus safety monitoring system based on artificial intelligence and Internet of things
CN107707940A (en) Video sequencing method, device, server and system
Bärmann et al. Where did i leave my keys?-episodic-memory-based question answering on egocentric videos
Beriwal et al. Techniques for suicidal ideation prediction: a qualitative systematic review
Song RETRACTED: Image processing technology in American football teaching
CN113313254B (en) Deep learning model unbiasing method for memory enhancement element learning
CN112183108B (en) Inference method, system, computer equipment and storage medium for short text topic distribution
CN114862141A (en) Method, device and equipment for recommending courses based on portrait relevance and storage medium
CN112396114A (en) Evaluation system, evaluation method and related product
CN113822389B (en) Digestive tract disease classification system based on endoscope picture
JP7180921B1 (en) Program, information processing device and information processing method
WO2021131762A1 (en) Exercise menu evaluating device, method, and computer-readable medium
KR102648225B1 (en) Method of providing emotional intelligence training for the mentally disadvantaged based on virtual reality and device using the same
CN110457507B (en) Picture identification processing method and device, electronic equipment and storage medium
CN117743407A (en) Method, device, equipment and readable storage medium for recommending sports item
CN113948210A (en) Data evaluation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20211203)