WO2023116351A1 - Responsibility frame extraction method, video classification method, device and medium - Google Patents

Responsibility frame extraction method, video classification method, device and medium

Info

Publication number
WO2023116351A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
frame
image
responsible
matrix
Prior art date
Application number
PCT/CN2022/134699
Other languages
French (fr)
Chinese (zh)
Inventor
蒋逸韬
石思远
崔晨
Original Assignee
上海微创卜算子医疗科技有限公司
Priority date
Filing date
Publication date
Priority claimed from CN202111572826.1A external-priority patent/CN116343073A/en
Priority claimed from CN202210639251.9A external-priority patent/CN117237263A/en
Application filed by 上海微创卜算子医疗科技有限公司
Publication of WO2023116351A1 publication Critical patent/WO2023116351A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis

Definitions

  • the present application relates to the technical field of image processing, in particular to a responsible frame extraction method, a video classification method, electronic equipment and a storage medium.
  • Ultrasound is a common medical imaging examination modality that can be used for the diagnosis of diseases of various tissues and organs.
  • ultrasound hardware has been continuously improved in terms of portability, and handheld ultrasound devices have achieved a balance of functionality and portability, making them suitable for grassroots (primary-care) disease screening scenarios.
  • due to the high granularity of ultrasound images, they contain a large amount of speckle noise, artifacts, attenuation and other problems; ultrasound diagnosis is therefore difficult to standardize and relies heavily on the clinical experience of sonographers.
  • Grassroots medical institutions such as primary hospitals, community hospitals, and township clinics lack experienced sonographers, and it is difficult to make accurate benign and malignant judgments on ultrasound videos.
  • the purpose of this application is to provide a responsible frame extraction method, video classification method, electronic device and storage medium that can automatically find, within a video, the responsible frames that contribute different important features to video classification (such as the classification of benign and malignant nodule videos), thereby improving the accuracy of video classification, such as the classification of benign and malignant nodule videos.
  • the application provides a method for extracting responsible frames, including: obtaining the video to be extracted; using the skeleton network of a static image classification neural network model to perform feature extraction on each frame image in the video to be extracted, so as to obtain the feature matrix of each frame image; performing a maximum pooling operation on the feature matrices of all frame images to obtain the video feature matrix of the video to be extracted; and extracting a preset number of responsible frames according to the feature matrix of each frame image and the video feature matrix.
  • extracting a preset number of responsible frames according to the feature matrix of each frame image and the video feature matrix includes: multiplying the feature value of each feature dimension in the video feature matrix by the importance value of the feature dimension to obtain the video feature importance matrix; for each frame image, multiplying the feature value of each feature dimension in the feature matrix of the frame image by the importance value of the feature dimension to obtain the feature importance matrix of the frame image; and extracting a preset number of responsible frames according to the video feature importance matrix and the feature importance matrix of each frame image.
  • extracting a preset number of responsible frames according to the video feature importance matrix and the feature importance matrix of each frame image includes: step A1, using the video feature importance matrix as the current video feature importance matrix; step B1, for each frame image, subtracting the feature importance matrix of the frame image from the current video feature importance matrix to obtain the remaining feature importance matrix corresponding to the frame image; step C1, for each frame image, adding the eigenvalues of each feature dimension in the remaining feature importance matrix corresponding to the frame image to obtain the remaining information entropy corresponding to the frame image; step D1, taking the image with the smallest remaining information entropy as the current responsible frame; step E1, using the remaining feature importance matrix corresponding to the current responsible frame as the new current video feature importance matrix; and repeating steps B1 to E1 until a preset number of responsible frames are extracted.
  • subtracting the feature importance matrix of the frame image from the current video feature importance matrix to obtain the remaining feature importance matrix corresponding to the frame image includes: subtracting the eigenvalue of the corresponding feature dimension in the feature importance matrix of the frame image from the eigenvalue of each feature dimension in the current video feature importance matrix to obtain the eigenvalue difference of each feature dimension; for each feature dimension, if the eigenvalue difference of the feature dimension is less than 0, using 0 as the eigenvalue of the corresponding feature dimension in the remaining feature importance matrix corresponding to the frame image; and if the eigenvalue difference of the feature dimension is greater than or equal to 0, using the eigenvalue difference of the feature dimension as the eigenvalue of the corresponding feature dimension in the remaining feature importance matrix corresponding to the frame image.
  • extracting a preset number of responsible frames according to the feature matrix of each frame image includes: for each frame image, multiplying the feature value of each feature dimension in the feature matrix of the frame image by the contribution weight value of the feature dimension to obtain the feature entropy matrix of the frame image; performing a maximum pooling operation on the feature entropy matrices of all frame images to obtain the video feature entropy matrix of the video to be extracted; and extracting a preset number of responsible frames according to the feature entropy matrix of each frame image and the video feature entropy matrix.
  • extracting a preset number of responsible frames according to the feature entropy matrix of each frame image and the video feature entropy matrix includes: for each frame image, adding the eigenvalues of all feature dimensions in the feature entropy matrix of the frame image to obtain the evaluation score of the frame image; adding the eigenvalues of all feature dimensions in the video feature entropy matrix to obtain the evaluation score of the video to be extracted; and extracting a preset number of responsible frames according to the evaluation score of each frame image and the evaluation score of the video to be extracted, wherein the difference between the evaluation score of the video to be extracted and the evaluation score of the image set formed by the preset number of responsible frames is the smallest.
  • extracting a preset number of responsible frames according to the evaluation score of each frame image and the evaluation score of the video to be extracted includes: step A2, for each frame image, calculating the difference between the evaluation score of the video to be extracted and the evaluation score of the frame image to obtain the feature entropy difference of the frame image; step B2, determining the image with the smallest feature entropy difference as a responsible frame; step C2, forming an image set from all responsible frames together with each non-responsible frame, respectively, and calculating the evaluation score of each image set; step D2, for each image set, calculating the difference between the evaluation score of the video to be extracted and the evaluation score of the image set to obtain the feature entropy difference of the image set; step E2, determining all images in the image set with the smallest feature entropy difference as responsible frames; and repeating steps C2 to E2 until the preset number of responsible frames are extracted.
  • the responsible frame extraction method further includes: using a target detection neural network model to perform region of interest extraction on each frame image in the acquired video to be extracted, so as to obtain the region of interest image corresponding to each frame image; using the skeleton network of the static image classification neural network model to perform feature extraction on each frame of region of interest image to obtain the feature matrix of each frame of region of interest image; and, according to the feature matrix of each frame of region of interest image, extracting malignant responsible frames until the malignant feature entropy corresponding to the malignant responsible frame set formed by all the malignant responsible frames reaches a minimum value, and/or extracting benign responsible frames until the benign feature entropy corresponding to the benign responsible frame set formed by all the benign responsible frames reaches a minimum value.
  • the present application also provides a video classification method, including: using the responsible frame extraction method described above to extract a preset number of responsible frames from the acquired video; and classifying the video according to the feature matrices of the preset number of responsible frames.
  • classifying the video according to the feature matrices of the preset number of responsible frames includes: performing a maximum pooling operation on the feature matrices of the preset number of responsible frames to obtain the feature matrix of the responsible frame set; and performing video classification according to the feature matrix of the responsible frame set.
  • the performing video classification according to the feature matrix of the responsible frame set includes: inputting the feature matrix of the responsible frame set into a video classification model to perform video classification.
  • the video classification model is a random forest classification model.
  • the video classification method further includes displaying the classification result of the video and the extracted preset number of responsible frames.
  • the present application also provides an electronic device, including a processor and a memory, where a computer program is stored on the memory, and when the computer program is executed by the processor, the responsible frame extraction method described above or the video classification method described above is implemented.
  • the present application also provides a readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the responsible frame extraction method described above or the video classification method described above is implemented.
  • the responsible frame extraction method, video classification method, electronic equipment and storage medium provided by this application have the following advantages:
  • in the responsible frame extraction method, electronic device and storage medium provided by this application, the video to be extracted is first obtained; then the skeleton network of the static image classification neural network model is used to perform feature extraction on each frame image in the video to be extracted to obtain the feature matrix of each frame image; and finally a preset number of responsible frames is extracted according to the feature matrix of each frame image.
  • the extracted responsible frames can lay a good foundation for subsequent video classification and effectively eliminate the interference caused by noise frame images during video classification.
  • the video classification method provided by this application extracts a preset number of responsible frames by using the above-mentioned responsible frame extraction method, and classifies the video according to the feature matrices of the extracted preset number of responsible frames. Since the video classification method provided by this application uses the above-mentioned responsible frame extraction method to extract a preset number of responsible frames, the video classification method provided by this application has all the advantages of that responsible frame extraction method. In addition, since the video classification method provided by this application classifies the video based on the extracted preset number of responsible frames, it can effectively reduce the interference of noise frames in the video and effectively improve the accuracy of video classification.
  • FIG. 1 is a schematic flow diagram of a responsibility frame extraction method in an embodiment of the present application
  • Fig. 2 is a schematic diagram of an adjusted single frame image in the video to be extracted in a specific example
  • Fig. 3 is a schematic diagram of obtaining the feature matrix of each frame image in the video to be extracted in a specific example of the present application
  • Fig. 4 is a schematic diagram of acquiring video feature importance matrix and feature importance matrix of each frame image in a specific example of the present application
  • FIG. 5 is a schematic diagram of a specific flow of extracting a responsibility frame in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of obtaining the remaining feature importance matrix in a specific example of the present application.
  • FIG. 7 is a schematic diagram of a specific flow of extracting a responsibility frame in another embodiment of the present application.
  • Figure 8a is a schematic diagram of obtaining a video feature entropy matrix in a specific example of the present application.
  • Fig. 8b is a schematic diagram of selecting the first frame responsibility frame in a specific example of the present application.
  • Fig. 8c is a schematic diagram of selecting the second frame responsibility frame in a specific example of the present application.
  • FIG. 9 is a flowchart of a responsibility frame extraction method provided in an embodiment of the present application.
  • Fig. 10 is a schematic diagram of a medical image provided by a specific example of the present application.
  • Fig. 11 is a schematic diagram of the region of interest image extracted from Fig. 10;
  • FIG. 12 is a schematic flowchart of extracting malignant responsible frames provided by an embodiment of the present application.
  • Fig. 13 is a schematic diagram of the relationship between the feature entropy of the responsible frame image set and the number of responsible frames provided by an embodiment of the present application;
  • FIG. 14 is a schematic diagram of a specific flow for extracting benign responsibility frames provided by an embodiment of the present application.
  • FIG. 15 is a schematic diagram of video classification using a random forest classifier provided in an embodiment of the present application.
  • FIG. 16 is a schematic diagram of an adjustment responsibility frame provided by an embodiment of the present application.
  • FIG. 17 is a schematic flow diagram of a video classification method in an embodiment of the present application.
  • FIG. 18 is a schematic block diagram of an electronic device in an embodiment of the present application.
  • the core idea of this application is to provide a responsible frame extraction method, video classification method, electronic device and storage medium that can automatically find, within a video, the responsible frames that contribute different important features to video classification (such as the classification of benign and malignant nodules), so as to improve the accuracy of video classification (such as the classification of benign and malignant nodule videos).
  • the responsible frame extraction method and the video classification method of the embodiments of the present application can be applied to the electronic device of the embodiments of the present application, where the electronic device can be a personal computer, a mobile terminal, etc., and the mobile terminal can be a mobile phone, a tablet computer, or another hardware device with any of various operating systems.
  • although this document takes as an example the extraction of a preset number of responsible frames that can contribute different features to the classification of medical videos, as those skilled in the art can understand, this application can also extract a preset number of responsible frames that can contribute different features to the classification of videos in fields other than medical video, and this application does not limit this.
  • the present application provides a responsibility frame extraction method, please refer to FIG. 1 , which schematically shows a flow diagram of the responsibility frame extraction method provided by an embodiment of the present application.
  • the responsibility frame extraction method includes the following steps:
  • Step S110 acquiring the video to be extracted.
  • Step S120 using the skeleton network of the static image classification neural network model to perform feature extraction on each frame of image in the video to be extracted, so as to obtain a feature matrix of each frame of image.
  • Step S130 extracting a preset number of responsible frames according to the feature matrix of each frame of image.
  • the video to be extracted can be an ultrasound scan video (such as scan data of breast cancer, thyroid nodules, etc.); of course, as those skilled in the art can understand, the video to be extracted can also be a medical video collected by another imaging device, for example, a medical video collected by an endoscope.
  • the video to be extracted may also be a non-medical video, which is not limited in this application. The responsible frame extraction method provided by this application can automatically extract multiple responsible frames whose contributed features do not repeat, that is, responsible frames with diverse features can be extracted, which lays a good foundation for subsequent video classification and effectively eliminates the interference caused by noise frame images during video classification. For example, by extracting a preset number of responsible frames from an ultrasound video of a thyroid nodule, a good foundation can be laid for subsequently judging accurately whether the thyroid nodule in the ultrasound video is benign or malignant.
  • the method also includes adjusting the size of each frame image in the video to be extracted, so as to adjust the size of each frame image in the video to be extracted to a preset size.
  • the preset size can be set according to specific conditions, which is not limited in this application.
  • the size of the adjusted video to be extracted is 100 ⁇ 224 ⁇ 224 ⁇ 3 (number of frames ⁇ width ⁇ height ⁇ number of channels).
  • FIG. 2 schematically shows a schematic diagram of an adjusted single-frame image in a video to be extracted in a specific example.
  • the static image classification neural network model includes a skeleton network for feature extraction and a classification network for classification.
  • the skeleton network can use different convolutional neural networks, such as MobileNet network, DenseNet121 network, Xception network and so on.
  • the classification network includes at least one fully connected layer, and the fully connected layer is used to perform nonlinear mapping regression on the features extracted by the classification network to obtain classification results.
  • FIG. 3 schematically shows a schematic diagram of acquiring the feature matrix of each frame of image in the video to be extracted in a specific example of the present application.
  • the skeleton network in the static image classification neural network model performs multiple convolution operations (for example, N convolutions) on each frame image in the video to be extracted to obtain the feature matrix of each frame image. The feature matrix of each frame image can be represented by a 1×k matrix, where k represents the feature dimension, which is determined by the structure of the static image classification neural network model.
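  • As an illustrative sketch only (the patent does not prescribe a specific framework), the per-frame feature extraction described above can be performed with a standard pretrained backbone; the use of TensorFlow/Keras and MobileNetV2, the 224×224 input size, and the variable names below are assumptions.

```python
# Illustrative sketch: extract a 1 x k feature matrix for every frame of a video
# using the backbone (skeleton network) of an image classification model.
import tensorflow as tf

# Backbone with global average pooling, so each frame yields a 1 x k feature vector.
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, pooling="avg")

def extract_frame_features(video):
    """video: array of shape (num_frames, 224, 224, 3) with pixel values in [0, 255]."""
    frames = tf.keras.applications.mobilenet_v2.preprocess_input(video)
    return backbone.predict(frames, verbose=0)  # shape: (num_frames, k)
```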
  • the static image classification neural network model is obtained through the following steps of training:
  • obtaining an original training sample, the original training sample including an original sample image and a classification label corresponding to the original sample image;
  • the pre-built static image classification neural network model is trained according to the expanded training samples and the initial values of the model parameters of the static image classification neural network model until the preset training end condition is satisfied.
  • a data amplification operation is required to increase the performance of the static image classification neural network model.
  • a random rigid transformation may be performed on the original sample image, specifically including rotation, scaling, translation, flipping, and grayscale transformation. More specifically, the original sample image can be translated by -10 to 10 pixels, rotated by -10° to 10°, flipped horizontally, flipped vertically, scaled by 0.9 to 1.1 times, subjected to a grayscale transformation, etc., to complete the augmentation of the training sample data.
  • the classification label does not need to be transformed when performing sample expansion; that is, since each expanded sample image is obtained by a different transformation of the same original sample image, the classification labels corresponding to the obtained expanded sample images are all consistent with the classification label corresponding to the original sample image.
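  • A minimal sketch of the random rigid-transformation augmentation described above, assuming SciPy/NumPy for the transforms; the ranges follow the text (±10 px translation, ±10° rotation, flips, 0.9-1.1 scaling), and the label is returned unchanged.

```python
# Illustrative augmentation sketch; library choices are assumptions.
import random
import numpy as np
from scipy import ndimage

def augment(image, label):
    """image: (H, W, C) array; label is unchanged by augmentation."""
    img = ndimage.shift(image, shift=(random.uniform(-10, 10), random.uniform(-10, 10), 0))
    img = ndimage.rotate(img, angle=random.uniform(-10, 10), reshape=False)
    if random.random() < 0.5:
        img = np.fliplr(img)                      # horizontal flip
    if random.random() < 0.5:
        img = np.flipud(img)                      # vertical flip
    s = random.uniform(0.9, 1.1)                  # scale by 0.9-1.1 times
    img = ndimage.zoom(img, zoom=(s, s, 1), order=1)
    img = np.clip(img * random.uniform(0.9, 1.1), 0, 255)  # simple grayscale transformation
    return img, label
```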
  • the model parameters of the static image classification neural network model include two categories: feature parameters and hyperparameters.
  • the feature parameter is a parameter for learning image features.
  • Feature parameters include weight parameters and bias parameters.
  • Hyperparameters are parameters that are artificially set during training. Only by setting appropriate hyperparameters can feature parameters be learned from samples. Hyperparameters can include learning rate, number of hidden layers, convolution kernel size, number of training iterations, and batch size for each iteration. The learning rate can be thought of as a step size. For example, in this application, the learning rate can be set to 0.001, and the number of training iterations is 100.
  • the preset training end condition is that the error value between the predicted classification result of the sample image in the expanded training sample and the corresponding classification label converges to a preset error value.
  • the training process of the static image classification neural network model is a multi-cycle iterative process. Therefore, the training can be ended by setting the number of iterations, that is, the preset training end condition can also be that the number of iterations reaches the preset number of iterations.
  • the training of the pre-built static image classification neural network model according to the expanded training samples and the initial values of the model parameters of the static image classification neural network model includes:
  • the pre-built static image classification neural network model is trained using a stochastic gradient descent method.
  • this method of obtaining derivatives of the loss and updating the parameters along them is the gradient descent method. Therefore, using the gradient descent method to train the static image classification neural network model allows the training of the static image classification neural network model to be realized quickly and simply.
  • the gradient descent method is mainly used to train the static image classification neural network model, and then the back propagation algorithm is used to update and optimize the weight parameters and bias parameters in the static image classification neural network model.
  • the gradient descent method treats the direction in which the slope of the curve is largest as the direction in which the optimal value can be reached fastest.
  • the backpropagation method uses the chain rule to calculate the partial derivatives used to update the weights, and updates the parameters through continuous iterative training so as to learn image features.
  • the method by which the backpropagation algorithm updates the weight parameters and bias parameters is as follows:
  • where y is the real value of the sample, ŷ is the predicted value of the output layer, and δ^L denotes the partial derivative (sensitivity) with respect to the output layer parameters;
  • W^l represents the weight parameter of the l-th layer;
  • δ^(l+1) represents the sensitivity value of the (l+1)-th layer;
  • f'(z^l) represents the partial derivative of the l-th layer;
  • W^l and b^l represent the weight parameter and bias parameter of layer l, respectively;
  • a^l represents the output value of layer l;
  • δ^(l+1) represents the sensitivity value of layer l+1.
  • the stochastic gradient descent method is used to train the pre-built static image classification neural network model, including:
  • Step 1 using the expanded training sample as the input of the static image classification neural network model, and obtaining the predicted classification result of the expanded sample image according to the initial value of the model parameter of the static image classification neural network model;
  • Step 2 Calculate a loss function value according to the predicted classification result of the expanded sample image and the classification label corresponding to the expanded sample image;
  • Step 3: judging whether the loss function value converges to the preset error value; if yes, the training ends; if not, adjusting the model parameters of the static image classification neural network model, updating the initial values of the model parameters of the static image classification neural network model to the adjusted model parameters, and returning to step 1.
  • the loss function value does not converge to the preset error value, it means that the static image classification neural network model is not accurate, and it is necessary to continue training the static image classification neural network model.
  • the loss function is the objective function used to optimize the neural network, and the neural network can learn better by minimizing the loss function. Because the static image classification neural network model needs to learn image features in a certain situation, that is, it needs to define a suitable loss function to learn effective features. This application uses the binary classification network loss function L(W,b) as the loss function.
  • the binary classification network loss function L(W, b) is as follows:
  • where W and b represent the weight parameters and bias parameters of the static image classification neural network model;
  • m is the number of training samples, and m is a positive integer;
  • x_i represents the input of the i-th training sample;
  • f_{W,b}(x_i) represents the predicted classification result of the i-th training sample;
  • y_i represents the classification label of the i-th training sample.
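  • The loss expression itself is not reproduced in this text; written out with the symbols defined above, the standard binary cross-entropy form it describes is (a reconstruction, not a verbatim quotation of the patent):

```latex
L(W,b) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y_i \log f_{W,b}(x_i) + (1 - y_i) \log\big(1 - f_{W,b}(x_i)\big) \Big]
```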
  • the extraction of a preset number of responsible frames according to the feature matrix of each frame image includes:
  • performing the maximum pooling operation on the feature matrices of all frame images means taking, along the column direction (that is, the direction of the feature dimension), the largest eigenvalue across the feature matrices of all frame images (for example, 100 frames) in the video to be extracted, so as to obtain a 1×k video feature matrix in which the eigenvalue of each feature dimension is the largest eigenvalue of that feature dimension over the feature matrices of all frame images. The obtained video feature matrix integrates the important information that each frame image can contribute. Since the video is essentially a superposition of multiple frames of images, the feature information of the video is scattered across the frames, so the video feature matrix obtained by performing the maximum pooling operation on the feature matrices of all frame images represents the features of the video to be extracted.
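  • A minimal sketch of this max-pooling step, assuming the per-frame feature matrices have already been stacked into a (num_frames, k) NumPy array:

```python
import numpy as np

def video_feature_matrix(frame_features: np.ndarray) -> np.ndarray:
    """frame_features: (num_frames, k) per-frame feature matrices.
    Returns the 1 x k video feature matrix: the largest eigenvalue of each
    feature dimension over all frames."""
    return frame_features.max(axis=0, keepdims=True)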
  • extracting a preset number of responsible frames according to the feature matrix of each frame image and the video feature matrix including:
  • the importance value of each feature dimension can represent the importance of the feature of this feature dimension in the random forest classification model described below; the importance values are defined by the random forest classification model and are all positive numbers. Of course, as those skilled in the art can understand, in some other implementations the importance value of each feature dimension can also represent the importance of the features of this feature dimension in a classification model other than the random forest classification model, and this application does not limit this.
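  • Where the importance values come from a random forest, they can for example be read from scikit-learn's feature_importances_ attribute; the training data, the feature dimension k = 1280 and the variable names below are placeholders, not values from the patent.

```python
# Illustrative sketch: obtain per-dimension importance values from a random forest
# and use them to build the feature importance matrices.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

k = 1280                                             # assumed feature dimension
train_video_features = np.random.rand(200, k)        # placeholder training data
train_labels = np.random.randint(0, 2, size=200)     # placeholder benign/malignant labels

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(train_video_features, train_labels)
importance = rf.feature_importances_                 # shape (k,), all non-negative

frame_features = np.random.rand(100, k)              # per-frame feature matrices (placeholder)
video_feature = frame_features.max(axis=0)           # video feature matrix (max pooling)
video_importance_matrix = video_feature * importance     # video feature importance matrix
frame_importance_matrices = frame_features * importance  # feature importance matrix per frame
```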
  • FIG. 4 schematically shows a schematic diagram of acquiring video feature importance matrix and feature importance matrix of each frame image in a specific example of the present application.
  • FIG. 5 schematically shows a specific flowchart of extracting a responsibility frame provided by an embodiment of the present application.
  • according to the video feature importance matrix and the feature importance matrix of each frame image, a preset number of responsible frames is extracted, including:
  • Step A1 using the video feature importance matrix as the current video feature importance matrix
  • Step B1 for each frame of image, subtracting the feature importance matrix of the frame image from the current video feature importance matrix to obtain the remaining feature importance matrix corresponding to the frame image;
  • Step C1 For each frame of image, add the eigenvalues of each feature dimension in the remaining feature importance matrix corresponding to the frame of image to obtain the remaining information entropy corresponding to the frame of image;
  • Step D1 taking the image with the smallest remaining information entropy as the current responsible frame
  • Step E1 using the remaining feature importance matrix corresponding to the current responsible frame as a new current video feature importance matrix
  • the eigenvalue of each feature dimension in a feature importance matrix can be regarded as an amount of information; correspondingly, the eigenvalue of each feature dimension in the video feature importance matrix can be regarded as the total amount of information contributed by the entire video under this feature dimension, and the eigenvalue of each feature dimension in the feature importance matrix of each frame image can be regarded as the single-frame amount of information contributed by that frame image under this feature dimension.
  • the feature importance matrix of a frame image is subtracted from the video feature importance matrix, and the obtained matrix is the remaining feature importance matrix corresponding to that frame image.
  • the amounts of information (i.e., feature values) of all feature dimensions in the remaining feature importance matrix are added, and the obtained sum is the residual information entropy after subtracting that frame image from the video; finding the frame that produces the smallest residual information entropy means finding the most important responsible frame.
  • after the most important responsible frame is found, its corresponding remaining feature importance matrix is regarded as the new video feature importance matrix, and the same method is used to find the second most important responsible frame; the remaining feature importance matrix corresponding to the second most important responsible frame is in turn regarded as the new video feature importance matrix, and the same method is used to find the next important responsible frame, until a preset number of responsible frames are found.
  • with the responsible frame extraction method provided by this embodiment, responsible frames with diverse features can be extracted without defining an extraction distance between frames.
  • the responsible frames in the video that contribute different important features to video classification (such as the classification of benign and malignant nodules) are automatically found.
  • the responsibility frame extraction method provided has strong versatility, can be applied to various CNN (convolutional neural network) models, and has good applicability and transferability.
  • the subtracting the feature importance matrix of the frame image from the current video feature importance matrix to obtain the remaining feature importance matrix corresponding to the frame image includes:
  • the eigenvalue of the corresponding feature dimension in the feature importance matrix of the frame image is subtracted from the eigenvalue of each feature dimension in the current video feature importance matrix to obtain the eigenvalue difference of each feature dimension;
  • for the eigenvalue difference of each feature dimension, if the eigenvalue difference of the feature dimension is less than 0, 0 is used as the eigenvalue of the corresponding feature dimension in the remaining feature importance matrix corresponding to the frame image; if the eigenvalue difference of the feature dimension is greater than or equal to 0, the eigenvalue difference of the feature dimension is used as the eigenvalue of the corresponding feature dimension in the remaining feature importance matrix corresponding to the frame image.
  • FIG. 6 schematically shows a schematic diagram of obtaining the remaining feature importance matrix in a specific example of the present application.
  • as shown in Figure 6, by subtracting the eigenvalue of the corresponding feature dimension in the feature importance matrix of a frame image from the eigenvalue of each feature dimension in the video feature importance matrix, the remaining feature importance matrix corresponding to that frame image can be obtained.
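  • Putting steps A1 to E1 and the clamped subtraction together, a compact sketch of the greedy extraction could look as follows; the array shapes, the names, and the exclusion of already selected frames are assumptions.

```python
import numpy as np

def extract_responsible_frames(frame_imp: np.ndarray, video_imp: np.ndarray, num: int):
    """frame_imp: (num_frames, k) feature importance matrices of all frames.
    video_imp: (k,) video feature importance matrix.
    Returns the indices of `num` responsible frames."""
    current = video_imp.copy()
    chosen = []
    for _ in range(num):
        # Step B1: remaining importance = max(current - frame, 0) for every frame
        remaining = np.maximum(current - frame_imp, 0.0)   # (num_frames, k)
        # Step C1: remaining information entropy = sum over feature dimensions
        remaining_entropy = remaining.sum(axis=1)          # (num_frames,)
        remaining_entropy[chosen] = np.inf                 # assumption: never pick a frame twice
        # Step D1: the frame with the smallest remaining entropy is the next responsible frame
        idx = int(np.argmin(remaining_entropy))
        chosen.append(idx)
        # Step E1: its remaining matrix becomes the new current video importance matrix
        current = remaining[idx]
    return chosen
```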
  • the extraction of a preset number of responsible frames according to the feature matrix of each frame image includes:
  • a video can be regarded as a collection of a series of frames, and the information of the entire video is scattered across the frames; the contribution of each frame image in each feature dimension is represented by its feature matrix, where the number of feature dimensions is determined by the skeleton network, and each feature dimension represents an image feature in a depth space (such as a feature of malignant nodules or benign nodules). The feature matrix is multiplied by contribution weight values, where the contribution weight values can be determined by the channel weight difference of the classification network (such as the fully connected layer) of the static image classification neural network model.
  • the classification network is used to classify benign and malignant; one channel of the classification network corresponds to the malignant category, and the other channel corresponds to the benign category, where the weight of the channel corresponding to the malignant category is W_1 and the weight of the channel corresponding to the benign category is W_0.
  • the output Y_pred predicted by the model in the basic CNN architecture can be expressed in terms of the following quantities:
  • Sigmoid represents the activation function;
  • X represents the feature matrix;
  • Y_0 represents the benign probability;
  • Y_1 represents the malignant probability;
  • MaxPooling represents the maximum pooling operation.
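  • The expression itself is not legible in this text; a plausible reconstruction from the symbols listed above (bias terms omitted; W_0 and W_1 are the benign and malignant channel weights defined earlier) is:

```latex
Y_{pred} = [\,Y_0,\ Y_1\,] = \mathrm{Sigmoid}\big(\, [\,\mathrm{MaxPooling}(X)\cdot W_0,\ \ \mathrm{MaxPooling}(X)\cdot W_1\,] \,\big)
```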
  • the feature value of each feature dimension in the video feature matrix above can also be directly multiplied by the contribution weight value of the feature dimension, to get the video feature entropy matrix.
  • extracting a preset number of responsible frames according to the feature entropy matrix of each frame image and the video feature entropy matrix including:
  • a preset number of responsible frames is extracted, wherein the difference between the evaluation score of the video to be extracted and the evaluation score of the image set formed by the preset number of responsible frames is the smallest.
  • where FScore represents the evaluation score and A = [frame_a, frame_b, ..., frame_n] represents an image set composed of several frame images; the evaluation score FScore of the image set A satisfies the following relationship:
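  • The relationship itself is not legible in this text; a reconstruction consistent with the description below (max-pool the feature entropy matrices of the frames in A, then sum over the k feature dimensions; FE_i^j is an assumed symbol for the feature entropy value of frame i in dimension j) is:

```latex
FScore_{A} = \sum_{j=1}^{k} \max_{i \in A} FE_{i}^{\,j}
```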
  • since the difference between the evaluation score of the video to be extracted and the evaluation score of the image set composed of the finally extracted responsible frames is the smallest, it can not only ensure that the information contained in the image set composed of the responsible frames is as close as possible to that of the entire video, but also ensure that the selected responsible frames form complementary features.
  • FIG. 7 schematically shows a specific flowchart of extracting the responsibility frame provided by another embodiment of the present application.
  • according to the evaluation score of each frame image and the evaluation score of the video to be extracted, a preset number of responsible frames is extracted, including:
  • Step A2 For each frame of image, calculate the difference between the evaluation score of the video to be extracted and the evaluation score of the frame image, so as to obtain the feature entropy difference of the frame image;
  • Step B2 determining the image with the smallest feature entropy difference as the responsible frame
  • Step C2. Composing all responsible frames and each non-responsible frame into an image set respectively, and calculating the evaluation score of each image set respectively;
  • Step D2 For each image set, calculate the difference between the evaluation score of the video to be extracted and the evaluation score of the image set to obtain the feature entropy difference of the image set;
  • Step E2 determining all images in the image set with the smallest feature entropy difference as responsible frames
  • the maximum pooling operation can be performed on the feature entropy matrices of all frame images in the image set to obtain the feature entropy matrix of the image set; the eigenvalues of all feature dimensions of the resulting matrix are then added, and the obtained sum is the evaluation score of the image set.
  • all determined responsible frames top1, top2, ..., top(i-1) can be combined with each remaining image frame (that is, each frame image except the responsible frames) to form an image set, respectively.
  • the feature entropy difference of each image set is obtained by subtracting the evaluation score of the image set from the evaluation score of the video to be extracted. In this way, the feature entropy difference of every image set can be obtained; all images in the image set with the smallest feature entropy difference are determined as responsible frames, that is, the remaining frame image in the image set with the smallest feature entropy difference is the i-th responsible frame.
  • the responsible frame extraction method provided in this embodiment can realize the extraction of multiple responsible frames whose contribution features are not repeated for video classification (such as the classification of benign and malignant nodule videos) without adding additional training parameters.
  • This embodiment can be applied to various CNN models, and has good applicability and portability.
  • FIG. 8a schematically shows a schematic diagram of obtaining a video feature entropy matrix in a specific example of the present application
  • FIG. 8b schematically shows a schematic diagram of selecting the first responsible frame in a specific example of the present application.
  • FIG. 8c schematically shows a schematic diagram of selecting a second frame responsibility frame in a specific example of the present application.
  • the video to be extracted includes 3 frames of images, the total number of depth feature dimensions is 3, and the number of responsible frames to be extracted is 2.
  • the evaluation score FScore_video of the video to be extracted is 24, the evaluation score FScore_frame1 of the first frame image is 16, the evaluation score FScore_frame2 of the second frame image is 14, and the evaluation score FScore_frame3 of the third frame image is 11.
  • the difference between the evaluation score FScore_video of the video to be extracted and the evaluation score FScore_frame1 of the first frame image is 8 (that is, the feature entropy difference of the first frame image is 8); the difference between FScore_video and the evaluation score FScore_frame2 of the second frame image is 10 (that is, the feature entropy difference of the second frame image is 10); and the difference between FScore_video and the evaluation score FScore_frame3 of the third frame image is 13 (that is, the feature entropy difference of the third frame image is 13).
  • the first frame image is therefore determined as the first responsible frame.
  • the first responsible frame (i.e., the first frame image) and the second frame image form an image set [frame1, frame2]; performing the maximum pooling operation on the feature entropy matrix of the first responsible frame and the feature entropy matrix of the second frame image yields the feature entropy matrix of the image set [frame1, frame2], and the evaluation score FScore_[frame1, frame2] of the image set [frame1, frame2] is 16.
  • the difference between the evaluation score FScore_video of the video to be extracted and the evaluation score FScore_[frame1, frame2] of the image set [frame1, frame2] is therefore 8 (that is, the feature entropy difference of the image set [frame1, frame2] is 8). Similarly, the first responsible frame and the third frame image form the image set [frame1, frame3], and the difference between FScore_video and the evaluation score FScore_[frame1, frame3] of the image set [frame1, frame3] is 0 (that is, the feature entropy difference of the image set [frame1, frame3] is 0). Since the feature entropy difference of the image set [frame1, frame3] composed of the first responsible frame and the third frame image is smaller than the feature entropy difference of the image set [frame1, frame2] composed of the first responsible frame and the second frame image, the third frame image is determined as the second responsible frame.
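  • The numbers in this example can be reproduced with a few lines; the three 3-dimensional feature entropy matrices below are assumed values chosen only so that the per-frame and per-set scores match the figures quoted above.

```python
import numpy as np

# Assumed per-frame feature entropy matrices (3 frames x 3 feature dimensions).
FE = np.array([[8, 6, 2],    # frame 1 -> FScore 16
               [7, 5, 2],    # frame 2 -> FScore 14
               [0, 1, 10]])  # frame 3 -> FScore 11

def fscore(frames):
    """Evaluation score of an image set: max-pool over its frames, then sum."""
    return FE[list(frames)].max(axis=0).sum()

fscore_video = fscore([0, 1, 2])                         # 24
diffs = [fscore_video - fscore([i]) for i in range(3)]   # [8, 10, 13]
first = int(np.argmin(diffs))                            # frame 1 is the first responsible frame
second = min((i for i in range(3) if i != first),
             key=lambda i: fscore_video - fscore([first, i]))
print(first + 1, second + 1)                             # 1 3 -> frames 1 and 3 are responsible frames
```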
  • the present application also provides a responsibility frame extraction method, please refer to FIG. 9, which schematically shows a flow chart of the responsibility frame extraction method provided by an embodiment of the present application, as shown in FIG. 9, the The method for extracting the responsibility frame comprises the following steps:
  • Step S210 using the object detection neural network model to extract the region of interest for each frame of medical image in the acquired medical video, so as to obtain the region of interest image corresponding to each frame of medical image.
  • Step S220 using the skeleton network of the static image classification neural network model to perform feature extraction on each frame of the ROI image, so as to obtain a feature matrix of each frame of the ROI image.
  • Step S230: according to the feature matrix of each frame of region of interest image, extracting malignant responsible frames until a first preset end condition is met; and/or, according to the feature matrix of each frame of region of interest image, extracting benign responsible frames until a second preset end condition is met.
  • the responsible frame extraction method provided by this application first uses the target detection neural network model to extract the region of interest image from each frame of medical image in the acquired medical video, and then extracts the malignant responsible frames and/or benign responsible frames according to the feature matrix of each frame of region of interest image. This effectively reduces the interference of image noise in the process of extracting the malignant responsible frames and/or benign responsible frames, and further improves the efficiency and accuracy of extracting the malignant responsible frames and/or benign responsible frames.
  • using the target detection neural network model to perform region of interest extraction on each frame of medical image in the acquired medical video, so as to obtain the region of interest image corresponding to each frame of medical image, includes:
  • the corresponding region is cut out on each frame of medical image, so as to obtain the image of the region of interest corresponding to each frame of medical image.
  • according to the position information of the region of interest (that is, the ultrasound window), the medical image can be cropped to obtain the corresponding region of interest image.
  • before using the skeleton network of the static image classification neural network model to perform feature extraction on each frame of region of interest image, the method further includes: adjusting the size of the region of interest image, so as to adjust the size of the region of interest image to a preset size.
  • the preset size can be set according to specific conditions, which is not limited in this application.
  • in the preset size, the height dimension of the image is consistent with the width dimension; that is, the region of interest image adjusted to the preset size is a square image, for example, the preset size is 448×448.
  • the ROI image may be filled with a "zero pixel" filling method, so as to adjust the width and height dimensions of the ROI image to be consistent.
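  • A small sketch of the "zero pixel" padding described above; the function name and the centering choice are assumptions.

```python
import numpy as np

def pad_to_square(roi: np.ndarray) -> np.ndarray:
    """Pad an (H, W, ...) region of interest image with zero pixels so that H == W."""
    h, w = roi.shape[:2]
    size = max(h, w)
    top, left = (size - h) // 2, (size - w) // 2
    out = np.zeros((size, size) + roi.shape[2:], dtype=roi.dtype)
    out[top:top + h, left:left + w] = roi
    return out  # afterwards the image is resized to the preset size, e.g. 448 x 448
```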
  • this improves the extraction efficiency of the responsible frame extraction method provided by this application. It should be noted that, as those skilled in the art can understand, the total number of frames of region of interest images that can be processed in parallel each time is determined by the computing power of the GPU of the computer; the stronger the computing power of the GPU, the more frames of region of interest images can be processed in parallel.
  • extracting the malignant responsible frames according to the feature matrix of each frame of region of interest image until the first preset end condition is met includes: acquiring the malignant feature matrix of the region of interest image according to the feature matrix of the region of interest image and the difference between the malignant feature weight parameter and the benign feature weight parameter corresponding to the static image classification neural network model; and extracting the malignant responsible frames according to the malignant feature matrix of each frame of region of interest image until the first preset end condition is met.
  • the benign or malignant judgment of each frame of region of interest image is based on the feature matrix of the region of interest image, and the output probability Y_pred predicted by the static image classification neural network model can be expressed in terms of the following quantities:
  • Y_0 represents the probability that the region of interest image belongs to the benign category;
  • Y_1 represents the probability that the region of interest image belongs to the malignant category;
  • W_1 represents the malignant feature weight parameter corresponding to the static image classification neural network model, and W_0 represents the benign feature weight parameter corresponding to the static image classification neural network model;
  • B_0 and B_1 represent the bias parameters corresponding to the static image classification neural network model.
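  • With the garbled symbols restored, the per-frame prediction described above takes a form such as the following (a reconstruction consistent with the symbol list, not a verbatim quotation), where X is the feature matrix of the region of interest image:

```latex
Y_{pred} = [\,Y_0,\ Y_1\,] = \mathrm{Sigmoid}\big(\, [\,X \cdot W_0 + B_0,\ \ X \cdot W_1 + B_1\,] \,\big)
```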
  • the present application obtains the malignant feature matrix of the region of interest image according to the feature matrix of the region of interest image and the difference between the malignant feature weight parameter and the benign feature weight parameter corresponding to the static image classification neural network model, and then extracts the malignant responsible frames according to the malignant feature matrix of each frame of region of interest image until the first preset end condition is met, so that malignant responsible frames carrying a large amount of malignant contribution information can be accurately extracted.
  • the malignant feature weight parameter W_1 is a matrix with k malignant feature weights, and the benign feature weight parameter W_0 is a matrix with k benign feature weights; that is, each feature dimension corresponds to one malignant feature weight and one benign feature weight.
  • obtaining the malignant feature matrix of the region of interest image according to the feature matrix of the region of interest image and the difference between the malignant feature weight parameter and the benign feature weight parameter corresponding to the static image classification neural network model includes:
  • obtaining the malignant feature matrix of the region of interest image according to the following formula (3):
  • [FM]_i^j = X_i^j × (W_1^j − W_0^j)  (3)
  • where [FM]_i represents the malignant feature matrix of the i-th frame region of interest image; X_i^j represents the eigenvalue of the j-th feature dimension in the feature matrix of the i-th frame region of interest image; W_1^j represents the malignant feature weight of the j-th feature dimension corresponding to the static image classification neural network model; W_0^j represents the benign feature weight of the j-th feature dimension corresponding to the static image classification neural network model; and [FM]_i^j represents the malignant feature value of the j-th feature dimension in the malignant feature matrix of the i-th frame region of interest image.
  • extracting the malignant responsible frames according to the malignant feature matrix of each frame of region of interest image until the first preset end condition is met includes:
  • for each frame of region of interest image, the malignant feature values of all feature dimensions in the malignant feature matrix of the region of interest image are added to obtain the total malignant feature value of the region of interest image; the malignant responsible frames are then extracted according to the total malignant feature value of each frame of region of interest image until the first preset end condition is satisfied, including:
  • Step A10 sorting the total malignant feature values of the ROI images of each frame, and determining the ROI image with the largest total malignant feature value as the malignant responsible frame;
  • Step A20: forming a first image set from all malignant responsible frames together with each non-malignant responsible frame, and calculating the total malignant feature value of each first image set, respectively, wherein the total malignant feature value of a first image set is equal to the sum of the malignant feature values of all feature dimensions in the malignant feature matrix obtained after performing the maximum pooling operation on the malignant feature matrices of all frames of region of interest images in the first image set, and a non-malignant responsible frame is a region of interest image that has not been determined to be a malignant responsible frame;
  • Step A30: judging whether the malignant feature entropy corresponding to the first image set with the largest total malignant feature value is greater than the malignant feature entropy corresponding to the malignant responsible frame set composed of all malignant responsible frames;
  • if not, performing step A40; if so, performing step A50;
  • Step A40: determining all frames of region of interest images in the first image set with the largest total malignant feature value as malignant responsible frames, and returning to step A20;
  • Step A50: ending the extraction of malignant responsible frames.
  • performing the maximum pooling operation on the malignant feature matrices of all frames of region of interest images in the first image set means taking, along the column direction (that is, the direction of the feature dimension), the maximum malignant eigenvalue across the malignant feature matrices of all frames of region of interest images in the first image set, so that the malignant eigenvalue of each feature dimension is the maximum value of that feature dimension over the malignant feature matrices of all frames of region of interest images in the first image set.
  • the responsible frame extraction method first determines the region of interest image with the largest total malignant feature value as the first malignant responsible frame in the malignant responsible frame set, and then forms a first image set from each remaining frame of region of interest image that has not been determined as a malignant responsible frame together with the first malignant responsible frame (at this time, each first image set includes the first malignant responsible frame and one region of interest image not determined as a malignant responsible frame), and calculates the total malignant feature value of each first image set; the region of interest image not determined as a malignant responsible frame in the first image set with the largest total malignant feature value is then the second malignant responsible frame in the malignant responsible frame set.
  • next, first image sets are formed again (each now including the first malignant responsible frame, the second malignant responsible frame and one region of interest image not determined as a malignant responsible frame); by calculating the total malignant feature value of each first image set, the first image set with the largest total malignant feature value can be found. If the malignant feature entropy of the first image set with the largest total malignant feature value is greater than the malignant feature entropy of the malignant responsible frame set composed of the first malignant responsible frame and the second malignant responsible frame, the extraction of malignant responsible frames ends, and the extracted first and second malignant responsible frames are taken as the final malignant responsible frames; if the malignant feature entropy of that first image set is less than or equal to the malignant feature entropy of the malignant responsible frame set composed of the first and second malignant responsible frames, all region of interest images in the first image set with the largest total malignant feature value are determined as malignant responsible frames, and the process continues in the same way.
  • the malignant feature entropy of an image set is calculated according to the following formulas (6) and (7):
  • H_1(A) = −p_1(A) × log_2 p_1(A)  (6)
  • p_1(A) = MScore_A / (MScore_A + BScore_A)  (7)
  • where H_1(A) represents the malignant feature entropy of image set A, p_1(A) represents the malignant probability of image set A, MScore_A represents the total malignant feature value of image set A, and BScore_A represents the total benign feature value of image set A.
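  • A condensed sketch of steps A10 to A50, combining the malignant feature matrix of formula (3), the max-pooled total malignant feature value, and the entropy stopping criterion of formulas (6) and (7); the candidate set is chosen by the largest total malignant feature value as in the detailed description above, and all names and the clipping of p_1 are assumptions.

```python
import numpy as np

def set_entropy(m_feat, b_feat, idx):
    """Malignant feature entropy H1(A) of the image set given by the indices `idx`."""
    mscore = m_feat[idx].max(axis=0).sum()      # total malignant feature value of the set
    bscore = b_feat[idx].max(axis=0).sum()      # total benign feature value of the set
    p1 = np.clip(mscore / (mscore + bscore + 1e-9), 1e-9, 1.0)  # assumed form of formula (7)
    return -p1 * np.log2(p1)                    # formula (6)

def extract_malignant_frames(X, W1, W0):
    """X: (num_frames, k) ROI feature matrices; W1 / W0: (k,) malignant / benign weights."""
    m_feat = X * (W1 - W0)                      # formula (3): malignant feature matrices
    b_feat = X * (W0 - W1)                      # benign counterpart (cf. formula (9) below)
    chosen = [int(np.argmax(m_feat.sum(axis=1)))]          # step A10
    while True:
        rest = [i for i in range(len(X)) if i not in chosen]
        if not rest:
            return chosen
        # step A20: candidate first image sets = chosen frames plus one remaining frame
        best = max(rest, key=lambda i: m_feat[chosen + [i]].max(axis=0).sum())
        # steps A30/A50: stop when adding a frame would increase the malignant feature entropy
        if set_entropy(m_feat, b_feat, chosen + [best]) > set_entropy(m_feat, b_feat, chosen):
            return chosen
        chosen.append(best)                     # step A40
```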
  • by judging whether the feature entropy has increased in order to decide whether to stop the extraction of malignant responsible frames, the present application can automatically extract, based on the content of the acquired medical video, the required number of malignant responsible frames that can contribute important features to the classification of the medical video. Please refer to FIG. 13, which schematically shows the relationship between the feature entropy of the responsible frame image set and the number of responsible frames provided by an embodiment of the present application.
  • the extraction of benign responsible frames according to the feature matrix of each frame of the region-of-interest image until the second preset end condition is met includes:
  • the benign feature matrix of the region of interest image is acquired according to the feature matrix of the region of interest image and the difference between the benign feature weight parameter and the malignant feature weight parameter corresponding to the static image classification neural network model; and the benign responsible frames are extracted according to the benign feature matrix of each frame of region of interest image until the second preset end condition is met.
  • the present application obtains the benign feature matrix of the image of the region of interest according to the feature matrix of the image of the region of interest and the difference between the benign feature weight parameter and the malignant feature weight parameter corresponding to the static image classification neural network model , and then extract the benign responsible frames according to the benign feature matrix of the ROI image in each frame, until the second preset end condition is met, so that the benign responsible frames with a large amount of benign contribution information can be accurately extracted.
  • Further, acquiring the benign feature matrix of a region-of-interest image according to the feature matrix of the region-of-interest image and the difference between the benign feature weight parameters and the malignant feature weight parameters corresponding to the static image classification neural network model includes calculating formulas (8) and (9):
  • [FB]i = [fb_i^1, fb_i^2, ..., fb_i^k]  (8)
  • fb_i^j = f_i^j × (wb^j - wm^j)  (9)
  • where [FB]i represents the benign feature matrix of the i-th frame of region-of-interest image, f_i^j represents the feature value of the j-th feature dimension in the feature matrix of the i-th frame of region-of-interest image, wb^j represents the benign feature weight of the j-th feature dimension corresponding to the static image classification neural network model, wm^j represents the malignant feature weight of the j-th feature dimension corresponding to the static image classification neural network model, and fb_i^j represents the benign feature value of the j-th feature dimension in the benign feature matrix of the i-th frame of region-of-interest image.
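  • Purely as an illustration (array names and shapes are assumptions of this sketch, not taken from the patent), the per-dimension weighting described above can be written as one vectorized operation:

```python
import numpy as np

def benign_feature_matrix(frame_feats: np.ndarray,
                          w_benign: np.ndarray,
                          w_malignant: np.ndarray) -> np.ndarray:
    """frame_feats: (n_frames, k) feature matrices from the backbone network.
    w_benign, w_malignant: (k,) benign / malignant feature weight parameters of the classification model.
    Each feature value is scaled by the margin between its benign and malignant weights."""
    return frame_feats * (w_benign - w_malignant)
```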
  • Further, the extraction of benign responsible frames according to the benign feature matrix of each frame of region-of-interest image, until the second preset end condition is met, includes: calculating the total benign feature value of each frame of region-of-interest image according to its benign feature matrix, and extracting benign responsible frames according to the total benign feature value of each frame of region-of-interest image until the second preset end condition is satisfied.
  • FIG. 14 schematically shows a specific flowchart of extracting benign responsibility frames provided by an embodiment of the present application.
  • the extraction of benign responsible frames is carried out according to the total benign feature value of the region-of-interest image of each frame until the second preset end condition is met, including:
  • Step B10 sorting the total benign feature values of the ROI images of each frame, and determining the ROI image with the largest total benign feature value as the benign responsible frame;
  • Step B20 Composing all benign responsible frames and each non-benign responsible frame into a second image set, and calculating the total benign feature value of each second image set, wherein the total benign feature of the second image set The value is equal to the sum of the benign eigenvalues of all the feature dimensions in the benign feature matrix obtained after performing the maximum pooling operation on the benign feature matrices of all frame ROI images in the second image collection, and the non-benign responsible frame is an image of a region of interest that has not been determined to be a benign responsible frame;
  • Step B30 judging whether the benign feature entropy corresponding to the second image set with the smallest total benign feature value is greater than the benign feature entropy corresponding to the benign responsible frame set composed of all benign responsible frames;
  • If not, perform Step B40; if so, perform Step B50;
  • Step B40 determine all ROI images in the second image set with the smallest total benign feature value as benign responsible frames, and return to step B20;
  • Step B50 ending the extraction of benign responsibility frames.
  • Performing the maximum pooling operation on the benign feature matrices of all frames of region-of-interest images in the second image set means taking, along the column direction (i.e., per feature dimension), the maximum benign feature value over the benign feature matrices of all frames of region-of-interest images in the set, so that the benign feature value of each feature dimension in the pooled matrix is the maximum value of that feature dimension over all frames of region-of-interest images in the second image set.
  • That is, the responsible frame extraction method first determines the region-of-interest image with the largest total benign feature value as the first benign responsible frame in the benign responsible frame set, and then combines each remaining frame of region-of-interest image that has not been determined to be a benign responsible frame with the first benign responsible frame to form a second image set (each second image set at this stage includes the first benign responsible frame and one region-of-interest image not yet determined to be a benign responsible frame), and calculates the total benign feature value of each second image set; the region-of-interest image that has not been determined to be a benign responsible frame in the second image set with the largest total benign feature value is then taken as the second benign responsible frame in the benign responsible frame set.
  • Next, the first benign responsible frame, the second benign responsible frame and each frame of region-of-interest image not yet determined to be a benign responsible frame form a second image set respectively (each second image set at this stage includes the first benign responsible frame, the second benign responsible frame and one region-of-interest image not yet determined to be a benign responsible frame). By calculating the total benign feature value of each second image set, the second image set with the largest total benign feature value can be found. If the benign feature entropy of the second image set with the largest total benign feature value is greater than the benign feature entropy of the benign responsible frame set composed of the first benign responsible frame and the second benign responsible frame, the extraction of benign responsible frames ends, and the extracted first and second benign responsible frames are taken as the final benign responsible frames; if the benign feature entropy of the second image set with the largest total benign feature value is less than or equal to the benign feature entropy of the benign responsible frame set composed of the first and second benign responsible frames, all region-of-interest images in that second image set are determined to be benign responsible frames, and the above procedure is repeated until the benign feature entropy of the second image set with the largest total benign feature value is greater than the benign feature entropy of the benign responsible frame set composed of all benign responsible frames.
  • Since visually identical region-of-interest images usually share similar benign feature matrices, adding a similar region-of-interest image will not have a significant impact on the total benign feature value of the image set; therefore, this benign responsible frame extraction method will not repeatedly select similar benign responsible frames.
  • The benign feature entropy of an image set is calculated according to the following formulas (12) and (13):
  • H0(A) = -p0(A) × log2 p0(A)  (12)
  • p0(A) = BScoreA / (MScoreA + BScoreA)  (13)
  • where H0(A) represents the benign feature entropy of image set A, p0(A) represents the benign score proportion of image set A, MScoreA represents the total malignant feature value of image set A, and BScoreA represents the total benign feature value of image set A.
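  • For illustration only (the frame feature arrays and score names are assumptions of this sketch), the total scores and the benign feature entropy of a candidate image set A could be evaluated as:

```python
import numpy as np

def set_scores_and_benign_entropy(benign_feats: np.ndarray, malig_feats: np.ndarray):
    """benign_feats / malig_feats: (m, k) benign / malignant feature matrices of the m frames in set A."""
    bscore = float(benign_feats.max(axis=0).sum())   # column-wise max pooling, then sum: BScoreA
    mscore = float(malig_feats.max(axis=0).sum())    # MScoreA, total malignant feature value of the set
    p0 = bscore / (bscore + mscore + 1e-12)          # formula (13)
    h0 = 0.0 if p0 <= 0.0 else -p0 * np.log2(p0)     # formula (12), benign feature entropy H0(A)
    return bscore, mscore, h0
```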
  • In this way, by judging whether to stop the extraction of benign responsible frames according to whether the feature entropy has increased, the present application can automatically extract, according to the content of the acquired medical video, the required number of benign responsible frames that can contribute important features to the classification of medical videos.
  • Please refer to FIG. 16, which schematically shows a software interface for a doctor to adjust responsible frames provided by an embodiment of the present application.
  • The extracted malignant responsible frames and/or benign responsible frames can be displayed in the responsible frame recommendation window of the software interface, and the acquired medical video to be classified can also be displayed in the video playback window of the software interface.
  • The doctor can browse the adjacent frames (that is, region-of-interest images) near a responsible frame through the "previous frame" and "next frame" buttons, and can choose to accept or reject the current frame (that is, the region-of-interest image currently being viewed) as a responsible frame; the system automatically records the responsible frames confirmed by the doctor.
  • In addition, the inventors of the present application collected a total of 13702 2D ultrasound breast nodule images (including 9177 images from 2457 patients with benign pathology and 4545 images from 991 patients with malignant pathology) and 2141 breast ultrasound videos (including 1227 videos from 560 patients with benign pathology and 914 videos from 412 patients with malignant pathology) for the training and validation of the static image classification neural network model and the video classification model.
  • The performance of the video classification method provided by this application was evaluated using the AUROC (area under the receiver operating characteristic curve), accuracy, sensitivity and specificity indicators.
  • The verification results of the 5-fold cross-validation (dividing the data set into 5 equal parts, taking one part in each round as the test set and the rest as the training set) are shown in Table 1 below, and the results on the test set are shown in Table 2 below.
  • The AUROC, accuracy, sensitivity and specificity obtained in this way are significantly better than the AUROC, accuracy, sensitivity and specificity of benign-malignant classification of breast nodules based on responsible frames manually selected by doctors.
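  • A hedged sketch of such a 5-fold evaluation (the feature array, label array and classifier factory are placeholders; the patent does not disclose the actual splitting code):

```python
import numpy as np
from sklearn.model_selection import KFold

def five_fold_accuracy(features: np.ndarray, labels: np.ndarray, build_model):
    """features: (n_videos, k) responsible-frame-set feature matrices; labels: benign/malignant labels.
    build_model: factory returning a fresh, untrained classifier for each round."""
    fold_scores = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(features):
        model = build_model()
        model.fit(features[train_idx], labels[train_idx])
        fold_scores.append(model.score(features[test_idx], labels[test_idx]))  # per-round accuracy
    return float(np.mean(fold_scores)), float(np.std(fold_scores))
```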
  • FIG. 17 schematically shows a flowchart of the video classification method provided by an embodiment of the present application.
  • The video classification method comprises the following steps:
  • Step S310 using the method for extracting responsible frames described above, extracting a preset number of responsible frames from the acquired medical video.
  • Step S320 classify the video according to the feature matrix of the preset number of responsible frames.
  • Since the video classification method provided by this application uses the above-described responsible frame extraction method to extract a preset number of responsible frames, the video classification method provided by this application has all the advantages of the above-described responsible frame extraction method.
  • In addition, since the video classification method provided by the present application classifies the video according to the extracted preset number of responsible frames, it can effectively reduce the interference of noise frames in the medical video and effectively improve the accuracy of video classification (such as the classification of benign and malignant nodule videos).
  • Specifically, performing video classification according to the feature matrices of the preset number of responsible frames includes:
  • performing a maximum pooling operation on the feature matrices of the preset number of responsible frames to obtain the feature matrix of the responsible frame set, and performing video classification according to the feature matrix of the responsible frame set.
  • In this way, the feature matrix contributed by all responsible frames, that is, the feature matrix of the responsible frame set, can be obtained, and video classification can then be performed accurately according to the obtained feature matrix of the responsible frame set.
  • Further, performing video classification according to the feature matrix of the responsible frame set includes:
  • inputting the feature matrix of the responsible frame set into a video classification model, so that the final video classification can be performed.
  • the video classification model is a random forest classification model.
  • the random forest classification model consists of multiple classification trees, and each classification tree classifies the input feature matrix.
  • the random forest classification model votes according to the classification results of all classification trees, and finally makes a judgment of benign and malignant lesions.
  • the video classification model may also be other classification models than the random forest classification model, which is not limited in this application.
  • The random forest classification model is obtained through pre-training; specifically, a video training set (which includes the feature matrix of the responsible frame set of each video and the corresponding classification label) can be used to train a pre-built random forest classification model to obtain the video classification model.
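  • The following is a minimal sketch of this training and inference pipeline, assuming scikit-learn's RandomForestClassifier as the classifier and assuming the feature matrices of each video's extracted responsible frames are already available (names and hyperparameters are illustrative only, not those of the patent):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def responsible_set_feature(frame_feats: np.ndarray) -> np.ndarray:
    """frame_feats: (n_responsible_frames, k) feature matrices of one video's responsible frames.
    Column-wise max pooling yields the (k,) feature matrix of the responsible frame set."""
    return frame_feats.max(axis=0)

def train_video_classifier(videos_frame_feats, labels):
    """videos_frame_feats: list of (n_i, k) arrays, one per training video; labels: 0 = benign, 1 = malignant."""
    X = np.stack([responsible_set_feature(f) for f in videos_frame_feats])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, labels)  # each classification tree learns on the pooled features; the forest votes at prediction time
    return clf

def classify_video(clf, frame_feats: np.ndarray):
    pooled = responsible_set_feature(frame_feats).reshape(1, -1)
    return int(clf.predict(pooled)[0]), clf.predict_proba(pooled)[0]
```

  • In this sketch each video contributes a single pooled feature vector, which mirrors the way the feature matrix of the responsible frame set is obtained by max pooling before classification.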
  • ROC-AUC is 0.885 (95% CI: 0.830-0.939)
  • PR-AUC is 0.876 (95% CI: 0.831-0.927)
  • Accuracy is 0.82
  • F1-Score is 0.819
  • all evaluation indicators are better than those when directly using video to predict benign and malignant.
  • Here, ROC denotes the receiver operating characteristic curve, AUC denotes the area under the curve, ROC-AUC represents the area under the ROC curve, CI denotes the confidence interval, and PR-AUC represents the area under the precision-recall curve.
  • ROC-AUC is 0.891 (95% CI: 0.835-0.947)
  • PR-AUC is 0.908 (95% CI: 0.876-0.940)
  • Accuracy is 0.85
  • F1-Score is 0.838.
  • The ROC-AUC and PR-AUC are basically the same (a difference of 0.002), the accuracy is 0.01 higher, and the F1-Score improves significantly from 0.819 to 0.838. It can be seen that using different skeleton networks for feature extraction has different effects on the prediction performance of the classification model, so an appropriate network model can be selected as the skeleton network according to the specific conditions of the classification task.
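  • For reference, these indicators could be computed with scikit-learn as sketched below (`clf`, `X_test` and `y_test` denote any trained classifier and held-out data; they are not objects defined in this application):

```python
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             accuracy_score, f1_score)

def evaluate(clf, X_test, y_test):
    prob = clf.predict_proba(X_test)[:, 1]   # predicted probability of the malignant class
    pred = clf.predict(X_test)
    return {
        "ROC-AUC": roc_auc_score(y_test, prob),
        "PR-AUC": average_precision_score(y_test, prob),  # average precision, an estimate of the PR-curve area
        "Accuracy": accuracy_score(y_test, pred),
        "F1-Score": f1_score(y_test, pred),
    }
```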
  • Optionally, the video classification method further includes displaying the classification result of the video and the extracted preset number of responsible frames.
  • In this way, the responsible frames on which the video classification is based can be presented, so that doctors can judge, based on the extracted responsible frames, whether the obtained video classification result is accurate, thereby further improving the accuracy of video classification.
  • For example, when the video is an ultrasound video, outputting the preset number of responsible frames extracted from the ultrasound video may help to further reduce the missed diagnosis rate and misdiagnosis rate during ultrasound screening.
  • the present application also provides an electronic device.
  • FIG. 18 schematically shows a block structure diagram of the electronic device provided in an embodiment of the present application.
  • the electronic device includes a processor 101 and a memory 103, and a computer program is stored on the memory 103.
  • When the computer program is executed by the processor 101, the above-described responsible frame extraction method or video classification method is implemented. Since the electronic device provided by this application and the responsible frame extraction method described above belong to the same inventive concept, the electronic device provided by this application has all the advantages of the responsible frame extraction method described above, and no further details are given here.
  • the electronic device further includes a communication interface 102 and a communication bus 104 , wherein the processor 101 , the communication interface 102 , and the memory 103 communicate with each other through the communication bus 104 .
  • the communication bus 104 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA) bus or the like.
  • the communication bus 104 can be divided into address bus, data bus, control bus and so on. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface 102 is used for communication between the electronic device and other devices.
  • the processor 101 mentioned in this application can be a central processing unit (Central Processing Unit, CPU), and can also be other general processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or any conventional processor, etc.
  • the processor 101 is the control center of the electronic device, connecting various parts of the entire electronic device with various interfaces and lines.
  • the memory 103 can be used to store the computer program, and the processor 101 implements various functions of the electronic device by running or executing the computer program stored in the memory 103 and calling the data stored in the memory 103. Function.
  • the memory 103 may include non-volatile and/or volatile memory.
  • Nonvolatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
  • the present application also provides a readable storage medium, wherein a computer program is stored in the readable storage medium, and when the computer program is executed by a processor, the method for extracting responsible frames or the video classification method described above can be implemented. Since the storage medium provided by this application and the method for extracting responsible frames described above belong to the same inventive concept, the storage medium provided by this application has all the advantages of the method for extracting responsible frames described above, so no further details are given here.
  • the readable storage medium in the embodiments of the present application may use any combination of one or more computer-readable media.
  • the readable medium may be a computer readable signal medium or a computer readable storage medium.
  • a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination thereof. More specific examples (non-exhaustive list) of computer readable storage media include: electrical connection with one or more wires, portable computer hard disk, hard disk, random access memory (RAM), read only memory (ROM), Erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer readable signal medium may include a data signal carrying computer readable program code in baseband or as part of a carrier wave. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Computer program code for carrying out the operations of the present application may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • The remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • the responsible frame extraction method, video classification method, electronic equipment and storage medium provided by this application have the following advantages:
  • The responsible frame extraction method, electronic device and storage medium provided by the application first obtain the video to be extracted; then use the skeleton network of the static image classification neural network model to perform feature extraction on each frame of image in the video to be extracted, so as to obtain the feature matrix of each frame of image; and finally extract a preset number of responsible frames according to the feature matrix of each frame of image.
  • The extracted responsible frames can lay a good foundation for subsequent video classification, effectively eliminating the interference caused by noise frame images on video classification during the video classification process.
  • The video classification method provided by this application extracts a preset number of responsible frames by using the above-described responsible frame extraction method, and classifies the video according to the feature matrices of the extracted preset number of responsible frames. Since the video classification method provided by this application uses the above-described responsible frame extraction method to extract the preset number of responsible frames, it has all the advantages of the above-described responsible frame extraction method. In addition, since the video classification method provided by this application classifies videos based on the extracted preset number of responsible frames, it can effectively reduce the interference of noise frames in the video and effectively improve the accuracy of video classification.
  • It should be noted that each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, and the module, program segment or portion of code contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures.
  • Each block in the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or actions, or by a combination of special-purpose hardware and computer instructions.
  • the functional modules in the various embodiments herein can be integrated together to form an independent part, or each module can exist independently, or two or more modules can be integrated to form an independent part.

Abstract

Provided in the present application are a responsibility frame extraction method, a video classification method, an electronic device and a storage medium. The responsibility frame extraction method comprises: acquiring a video to be subjected to extraction; performing feature extraction on each frame of image in said video by using a backbone network of a static image classification neural network model, so as to acquire a feature matrix of each frame of image; performing a maximum pooling operation on the feature matrices of all the frames of image, so as to acquire a video feature matrix of said video; and extracting a preset number of responsibility frames according to the feature matrix of each frame of image and the video feature matrix.

Description

责任帧提取方法、视频分类方法、设备和介质Responsible frame extraction method, video classification method, device and medium 技术领域technical field
本申请涉及图像处理技术领域,特别涉及一种责任帧提取方法、视频分类方法、电子设备和存储介质。The present application relates to the technical field of image processing, in particular to a responsible frame extraction method, a video classification method, electronic equipment and a storage medium.
背景技术Background technique
超声是疾病医学影像检查的常用手段,可用于各类组织和脏器的疾病诊断,具有适用疾病广泛、成本较CT、MRI等大型影像设备低的特点。同时超声硬件在便携性上不断升级,掌上超声的产品形态实现了功能与便携性的统一,适用于基层疾病筛查场景。然而因为超声图像颗粒度高,存在大量散斑噪声、伪影、衰减等问题,超声诊断难以规范化、标准化,十分依赖超声医师的临床经验。一级医院、社区医院、乡镇诊所等基层医疗机构缺少有经验的超声医师,难以对超声视频做出准确的良恶性判断。Ultrasound is a common means of disease medical imaging examination, which can be used for disease diagnosis of various tissues and organs. At the same time, ultrasound hardware has been continuously upgraded in terms of portability, and the product form of handheld ultrasound has realized the unity of function and portability, which is suitable for grassroots disease screening scenarios. However, due to the high granularity of ultrasound images, there are a large number of speckle noise, artifacts, attenuation and other problems, it is difficult to standardize and standardize ultrasound diagnosis, and it relies heavily on the clinical experience of sonographers. Grassroots medical institutions such as primary hospitals, community hospitals, and township clinics lack experienced sonographers, and it is difficult to make accurate benign and malignant judgments on ultrasound videos.
临床上,在超声医生进行初诊、复核、向主治医师传达诊断建议时,均会使用到医师从视频中抽取的责任帧(具有明显良恶性指征的影像图片)。一个理想的人工智能超声系统应当可以自动地给出判断视频良恶性所依据的责任帧,该功能一方面可以进一步降低医师的工作量,另一方面可以支持医师来判断是否采用AI判断的结果。因此,如何在视频中提取责任帧显的尤为重要。Clinically, when the sonographer conducts the initial diagnosis, review, and conveys diagnostic suggestions to the attending physician, the physician will use the responsibility frames (images with obvious benign and malignant indications) extracted from the video. An ideal artificial intelligence ultrasound system should be able to automatically provide the responsible frame for judging whether the video is benign or malignant. On the one hand, this function can further reduce the workload of doctors, and on the other hand, it can support doctors to judge whether to use the results of AI judgment. Therefore, how to extract responsible frames in video is particularly important.
发明内容Contents of the invention
本申请的目的在于提供一种责任帧提取方法、视频分类方法、电子设备和存储介质,可以自动地在视频中找出为视频分类(例如良恶性结节视频的分类)贡献出不同重要特征的责任帧,以提高视频分类(例如良恶性结节视频的分类)的准确性。The purpose of this application is to provide a responsible frame extraction method, video classification method, electronic equipment and storage medium, which can automatically find out in the video that contributes different important features to video classification (such as the classification of benign and malignant nodule videos). Responsibility frames to improve the accuracy of video classification, such as the classification of benign and malignant nodule videos.
为达到上述目的,本申请提供一种责任帧提取方法,包括:获取待提取视频;采用静态图像分类神经网络模型的骨架网络对所述待提取视频中的每一帧图像进行特征提取,以获取每一帧图像的特征矩阵;对所有帧图像的特征矩阵进行最大池化操作,以获取所述待提取视频的视频特征矩阵;根据每一帧图像的特征矩阵和所述视频特征矩阵,提取出预设数量的责任帧。In order to achieve the above object, the application provides a method for extracting responsible frames, including: obtaining the video to be extracted; using the skeleton network of the static image classification neural network model to perform feature extraction on each frame image in the video to be extracted, to obtain The feature matrix of each frame image; the maximum pooling operation is performed on the feature matrix of all frame images to obtain the video feature matrix of the video to be extracted; according to the feature matrix of each frame image and the video feature matrix, extract Preset number of responsible frames.
可选的,所述根据每一帧图像的特征矩阵和所述视频特征矩阵,提取出预设数量的责任帧,包括:将所述视频特征矩阵中的每个特征维度的特征值乘以该特征维度的重要性值,以获取视频特征重要性矩阵;针对每一帧图像,将该帧图像的特征矩阵中的每个特征维度的特征值乘以该特征维度的重要性值,以获取该帧图像的特征重要性矩阵;根据所述视频特征重要性矩阵和每一帧图像的特征重要性矩阵,提取出预设数量的责任帧。Optionally, extracting a preset number of responsible frames according to the feature matrix of each frame image and the video feature matrix includes: multiplying the feature value of each feature dimension in the video feature matrix by the The importance value of the feature dimension to obtain the video feature importance matrix; for each frame image, the feature value of each feature dimension in the feature matrix of the frame image is multiplied by the importance value of the feature dimension to obtain the A feature importance matrix of a frame image; extracting a preset number of responsible frames according to the video feature importance matrix and the feature importance matrix of each frame image.
可选的,所述根据所述视频特征重要性矩阵和每一帧图像的特征重要性矩阵,提取出预设数量的责任帧,包括:步骤A1、以所述视频特征重要性矩阵作为当前视频特征重要性矩阵;步骤B1、针对每一帧图像,将所述当前视频特征重要性矩阵减去该帧图像的特征重要性矩阵,以获取该帧图像所对应的剩余特征重要性矩阵;步骤C1、针对每一帧图像,将该帧图像所对应的剩余特征重要性矩阵中的各个特征维度的特征值相加,以获取该帧图像所对应的剩余信息熵;步骤D1、将剩余信息熵最小的图像作为当前责任帧;步骤E1、将所述当前责任帧所对应的剩余特征重要性矩阵作为新的当前视频特征重要性矩阵;重复上述步骤B1至步骤E1,直至提取出预设数量的责任帧。Optionally, extracting a preset number of responsible frames according to the video feature importance matrix and the feature importance matrix of each frame image includes: step A1, using the video feature importance matrix as the current video Feature importance matrix; step B1, for each frame of image, subtracting the feature importance matrix of the frame image from the current video feature importance matrix to obtain the remaining feature importance matrix corresponding to the frame image; step C1 1. For each frame image, add the eigenvalues of each feature dimension in the remaining feature importance matrix corresponding to the frame image to obtain the remaining information entropy corresponding to the frame image; step D1, minimize the remaining information entropy image as the current responsibility frame; step E1, using the remaining feature importance matrix corresponding to the current responsibility frame as a new current video feature importance matrix; repeat the above steps B1 to step E1 until a preset number of responsibility is extracted frame.
可选的,所述将所述当前视频特征重要性矩阵减去该帧图像的特征重要性矩阵,以获取该帧图像所对应的剩余特征重要性矩阵,包括:将所述当前视频特征重要性矩阵中的每一特征维度的特征值减去该帧图像的特征重要性矩阵中的对应特征维度的特征值,以获得每一特征维度的特征值差;针对每一特征维度的特征值差,若该特征维度的特征值差小于0,则将0作为该帧图像所对应的剩余特征重要性矩阵中的对应特征维度的特征值;若该特征维度的特征值差大于或等于0,则将该特征维度的特征值差作为该帧图像所对应的剩余特征重要性矩阵中的对应特征维度的特征值。Optionally, the subtracting the feature importance matrix of the frame image from the feature importance matrix of the current video to obtain the remaining feature importance matrix corresponding to the frame image includes: dividing the feature importance matrix of the current video The eigenvalue of each feature dimension in the matrix is subtracted from the eigenvalue of the corresponding feature dimension in the feature importance matrix of the frame image to obtain the eigenvalue difference of each feature dimension; for the eigenvalue difference of each feature dimension, If the eigenvalue difference of this feature dimension is less than 0, then use 0 as the eigenvalue of the corresponding feature dimension in the remaining feature importance matrix corresponding to the frame image; if the eigenvalue difference of this feature dimension is greater than or equal to 0, then set The eigenvalue difference of the feature dimension is used as the eigenvalue of the corresponding feature dimension in the remaining feature importance matrix corresponding to the frame image.
可选的,所述根据每一帧图像的特征矩阵,提取出预设数量的责任帧,包括:针对每一帧图像,将该帧图像的特征矩阵中的每个特征维度的特征值乘以该特征维度的贡献权重值,以获取该帧图像的特征熵矩阵;对所有帧图像的特征熵矩阵进行最大池化操作,以获取所述待提取视频的视频特征熵矩阵;根据每一帧图像的特征熵矩阵和所述视频特征熵矩阵,提取出预设数量的责任帧。Optionally, extracting a preset number of responsible frames according to the feature matrix of each frame image includes: for each frame image, multiplying the feature value of each feature dimension in the feature matrix of the frame image by The contribution weight value of the feature dimension to obtain the feature entropy matrix of the frame image; perform a maximum pooling operation on the feature entropy matrix of all frame images to obtain the video feature entropy matrix of the video to be extracted; according to each frame of image The feature entropy matrix and the video feature entropy matrix are used to extract a preset number of responsible frames.
可选的,所述根据每一帧图像的特征熵矩阵和所述视频特征熵矩阵,提取出预设数量的责任帧,包括:针对每一帧图像,将该帧图像的特征熵矩阵中的所有特征维度的特征值相加,以获取该帧图像的 评估分值;将所述视频特征熵矩阵中的所有特征维度的特征值相加,以获取所述待提取视频的评估分值;根据每一帧图像的评估分值和所述待提取视频的评估分值,提取出预设数量的责任帧,其中,所述待提取视频的评估分值与由所述预设数量的责任帧所构成的图像集合的评估分值的差值最小。Optionally, the extracting a preset number of responsible frames according to the feature entropy matrix of each frame image and the video feature entropy matrix includes: for each frame image, the The eigenvalues of all feature dimensions are added to obtain the evaluation score of the frame image; the eigenvalues of all the feature dimensions in the video feature entropy matrix are added to obtain the evaluation score of the video to be extracted; The evaluation score of each frame image and the evaluation score of the video to be extracted extract a preset number of responsible frames, wherein the evaluation score of the video to be extracted is the same as that determined by the preset number of responsible frames The resulting set of images has the smallest difference in evaluation scores.
可选的,所述根据每一帧图像的评估分值和所述待提取视频的评估分值,提取出预设数量的责任帧,包括:步骤A2、针对每一帧图像,计算所述待提取视频的评估分值与该帧图像的评估分值的差值,以获取该帧图像的特征熵差;步骤B2、将特征熵差最小的图像确定为责任帧;步骤C2、将所有的责任帧与每一非责任帧分别组成一图像集合,并分别计算每一图像集合的评估分值;步骤D2、针对每一图像集合,计算所述待提取视频的评估分值与该图像集合的评估分值的差值,以获取该图像集合的特征熵差;步骤E2、将特征熵差最小的图像集合中的所有图像确定为责任帧;重复上述步骤C2至步骤E2,直至提取出预设数量的责任帧。Optionally, the extraction of a preset number of responsible frames according to the evaluation score of each frame of image and the evaluation score of the video to be extracted includes: step A2, for each frame of image, calculating the Extract the difference between the evaluation score of the video and the evaluation score of the frame image to obtain the feature entropy difference of the frame image; step B2, determine the image with the smallest feature entropy difference as the responsibility frame; step C2, set all responsibility The frame and each non-responsible frame form an image set respectively, and calculate the evaluation score of each image set respectively; step D2, for each image set, calculate the evaluation score of the video to be extracted and the evaluation of the image set score difference to obtain the feature entropy difference of the image set; step E2, determine all the images in the image set with the smallest feature entropy difference as responsible frames; repeat the above steps C2 to step E2 until the preset number is extracted responsibility frame.
可选的,所述责任帧提取方法还包括:采用目标检测神经网络模型对所获取的待提取视频中的每一帧图像进行感兴趣区域的提取,以获取每一帧图像所对应的感兴趣区域图像;采用静态图像分类神经网络模型的骨架网络对每一帧感兴趣区域图像进行特征提取,以获取每一帧感兴趣区域图像的特征矩阵;根据各帧感兴趣区域图像的特征矩阵,进行恶性责任帧的提取,直至由所有的所述恶性责任帧所构成的恶性责任帧集合所对应的恶性特征熵达到最小值;和/或者根据各帧感兴趣区域图像的特征矩阵,进行良性责任帧的提取,直至由所有的所述良性责任帧所构成的良性责任帧集合所对应的良性特征熵达到最小值。Optionally, the responsible frame extraction method further includes: using a target detection neural network model to extract the region of interest for each frame of image in the acquired video to be extracted, so as to obtain the region of interest corresponding to each frame of image Region image; use the skeleton network of the static image classification neural network model to extract the features of each frame of the region of interest image to obtain the feature matrix of each frame of the region of interest image; according to the feature matrix of each frame of the region of interest image, perform The extraction of malicious responsible frames until the malignant feature entropy corresponding to the set of malicious responsible frames formed by all the malicious responsible frames reaches a minimum value; until the benign feature entropy corresponding to the benign responsibility frame set composed of all the benign responsibility frames reaches the minimum value.
为达到上述目的,本申请还提供一种视频分类方法,包括:采用上文所述的责任帧提取方法,从所获取的视频中提取出预设数量的责任帧;根据所述预设数量的责任帧的特征矩阵,进行视频的分类。In order to achieve the above purpose, the present application also provides a video classification method, including: using the method for extracting responsible frames described above to extract a preset number of responsible frames from the acquired video; The feature matrix of the responsible frame is used to classify the video.
可选的,所述根据所述预设数量的责任帧的特征矩阵,进行视频的分类,包括:对所述预设数量的责任帧的特征矩阵进行最大池化操作,以获取责任帧集合的特征矩阵;根据所述责任帧集合的特征矩阵进行视频的分类。Optionally, classifying the video according to the feature matrix of the preset number of responsible frames includes: performing a maximum pooling operation on the feature matrix of the preset number of responsible frames to obtain a set of responsible frames. Feature matrix: video classification is performed according to the feature matrix of the responsible frame set.
可选的,所述根据所述责任帧集合的特征矩阵进行视频的分类,包括:将所述责任帧集合的特征矩阵输入视频分类模型中,以进行视频的分类。Optionally, the performing video classification according to the feature matrix of the responsible frame set includes: inputting the feature matrix of the responsible frame set into a video classification model to perform video classification.
可选的,所述视频分类模型为随机森林分类模型。Optionally, the video classification model is a random forest classification model.
可选的,所述视频分类方法还包括对所述视频的分类结果以及所提取出的预设数量的责任帧进行显示。Optionally, the video classification method further includes displaying the classification result of the video and the extracted preset number of responsible frames.
为达到上述目的,本申请还提供一种电子设备,包括处理器和存储器,所述存储器上存储有计算机程序,所述计算机程序被所述处理器执行时,实现上文所述的责任帧提取方法或上文所述的视频分类方法。In order to achieve the above purpose, the present application also provides an electronic device, including a processor and a memory, and a computer program is stored on the memory, and when the computer program is executed by the processor, the above-mentioned responsibility frame extraction is realized method or the video classification method described above.
为达到上述目的,本申请还提供一种可读存储介质,所述可读存储介质内存储有计算机程序,所述计算机程序被处理器执行时,实现上文所述的责任帧提取方法或视频分类方法。In order to achieve the above purpose, the present application also provides a readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the method for extracting responsible frames or video Classification.
与现有技术相比,本申请提供的责任帧提取方法、视频分类方法、电子设备和存储介质具有以下优点:Compared with the prior art, the responsible frame extraction method, video classification method, electronic equipment and storage medium provided by this application have the following advantages:
(1)本申请提供的责任帧提取方法、电子设备和存储介质,通过先获取待提取视频;再采用静态图像分类神经网络模型的骨架网络对所述待提取视频中的每一帧图像进行特征提取,以获取每一帧图像的特征矩阵;最后根据每一帧图像的特征矩阵,提取出预设数量的责任帧。由此,可以自动提取出贡献特征不重复的多张责任帧,实现了不需人为定义隔帧提取距离即可以提取出特征多样化的责任帧,提取出的责任帧可以为后续的视频分类奠定良好的基础,有效消除了在视频分类过程中,噪声帧图像对视频分类所造成的干扰。(1) The responsible frame extraction method, electronic equipment and storage medium provided by the application, by first obtaining the video to be extracted; then using the skeleton network of the static image classification neural network model to perform features on each frame of the image in the video to be extracted Extract to obtain the feature matrix of each frame of image; finally, extract a preset number of responsible frames according to the feature matrix of each frame of image. As a result, multiple responsible frames with non-repetitive contribution features can be automatically extracted, and it is possible to extract responsible frames with diverse features without manually defining the extraction distance between frames. The extracted responsible frames can lay the foundation for subsequent video classification. A good foundation effectively eliminates the interference caused by noise frame images on video classification during the video classification process.
(2)本申请提供的视频分类方法通过采用上文所述的责任帧提取方法提取出预设数量的责任帧;并根据所提取出的预设数量的责任帧的特征矩阵,进行视频的分类。由于本申请提供的视频分类方法是采用上文所述的责任帧提取方法提取出预设数量的责任帧,由此,本申请提供的视频分类方法具有上文所述的责任帧提取方法的所有优点。此外,由于本申请提供的视频分类方法是根据所提取出的预设数量的责任帧进行视频的分类,由此可以有效减少所述视频中的噪声帧的干扰,有效提高了视频分类的准确率。(2) The video classification method provided by this application extracts a preset number of responsible frames by using the above-mentioned responsible frame extraction method; and classifies the video according to the feature matrix of the extracted preset number of responsible frames . Since the video classification method provided by this application uses the above-mentioned responsible frame extraction method to extract a preset number of responsible frames, thus, the video classification method provided by this application has all the above-mentioned responsible frame extraction methods. advantage. In addition, since the video classification method provided by this application classifies videos based on the extracted preset number of responsible frames, it can effectively reduce the interference of noise frames in the video and effectively improve the accuracy of video classification .
附图说明Description of drawings
图1为本申请一实施方式中的责任帧提取方法的流程示意图;FIG. 1 is a schematic flow diagram of a responsibility frame extraction method in an embodiment of the present application;
图2为一具体示例中的调整后的待提取视频中的单帧图像的示意图;Fig. 2 is a schematic diagram of an adjusted single frame image in the video to be extracted in a specific example;
图3为本申请一具体示例中的获取待提取视频中的每一帧图像的特征矩阵的示意图;Fig. 3 is a schematic diagram of obtaining the feature matrix of each frame image in the video to be extracted in a specific example of the present application;
图4为本申请一具体示例中的获取视频特征重要性矩阵和每一帧图像的特征重要性矩阵的示意图;Fig. 4 is a schematic diagram of acquiring video feature importance matrix and feature importance matrix of each frame image in a specific example of the present application;
图5为本申请一实施方式中的提取责任帧的具体流程示意图;FIG. 5 is a schematic diagram of a specific flow of extracting a responsibility frame in an embodiment of the present application;
图6为本申请一具体示例中的获取剩余特征重要性矩阵的示意图;6 is a schematic diagram of obtaining the remaining feature importance matrix in a specific example of the present application;
图7为本申请另一实施方式中的提取责任帧的具体流程示意图;FIG. 7 is a schematic diagram of a specific flow of extracting a responsibility frame in another embodiment of the present application;
图8a为本申请一具体示例中的生产视频特征熵矩阵的示意图;Figure 8a is a schematic diagram of a production video feature entropy matrix in a specific example of the present application;
图8b为本申请一具体示例中的选取第一帧责任帧的示意图;Fig. 8b is a schematic diagram of selecting the first frame responsibility frame in a specific example of the present application;
图8c为本申请一具体示例中的选取第二帧责任帧的示意图;Fig. 8c is a schematic diagram of selecting the second frame responsibility frame in a specific example of the present application;
图9为本申请一实施方式提供的责任帧提取方法的流程图;FIG. 9 is a flowchart of a responsibility frame extraction method provided in an embodiment of the present application;
图10为本申请一具体示例提供的医学图像的示意图;Fig. 10 is a schematic diagram of a medical image provided by a specific example of the present application;
图11为从图10中提取出的感兴趣区域图像的示意图;Fig. 11 is a schematic diagram of the region of interest image extracted from Fig. 10;
图12为本申请一实施方式提供的提取恶性责任帧的具体流程示意图;FIG. 12 is a schematic flowchart of extracting malicious responsibility frames provided by an embodiment of the present application;
图13为本申请一实施方式提供的责任帧图像集合的特征熵与责任帧数量之间的关系示意图;Fig. 13 is a schematic diagram of the relationship between the feature entropy of the responsible frame image set and the number of responsible frames provided by an embodiment of the present application;
图14为本申请一实施方式提供的提取良性责任帧的具体流程示意图;FIG. 14 is a schematic diagram of a specific flow for extracting benign responsibility frames provided by an embodiment of the present application;
图15为本申请一实施方式提供的采用随机森林分类器进行视频分类的示意图;FIG. 15 is a schematic diagram of video classification using a random forest classifier provided in an embodiment of the present application;
图16为本申请一实施方式提供的调整责任帧的示意图;FIG. 16 is a schematic diagram of an adjustment responsibility frame provided by an embodiment of the present application;
图17为本申请一实施方式中的视频分类方法的流程示意图;FIG. 17 is a schematic flow diagram of a video classification method in an embodiment of the present application;
图18为本申请一实施方式中的电子设备的方框结构示意图。FIG. 18 is a schematic block diagram of an electronic device in an embodiment of the present application.
其中,附图标记如下:Wherein, the reference signs are as follows:
处理器-101;通信接口-102;存储器-103;通信总线104。Processor-101; communication interface-102; memory-103; communication bus 104.
具体实施方式Detailed ways
以下结合附图和具体实施方式对本申请提出的责任帧提取方法、视频分类方法、电子设备和存储介质作进一步详细说明。根据下面说明,本申请的优点和特征将更清楚。需要说明的是,附图采用非常简化的形式且均使用非精准的比例,仅用以方便、明晰地辅助说明本申请实施方式的目的。为了使本申请的目的、特征和优点能够更加明显易懂,请参阅附图。须知,本说明书所附图式所绘示的结构、比例、大小等,均仅用以配合说明书所揭示的内容,以供熟悉此技术的人士了解与阅读,并非用以限定本申请实施的限定条件,任何结构的修饰、比例关系的改变或大小的调整,在与本申请所能产生的功效及所能达成的目的相同或近似的情况下,均应仍落在本申请所揭示的技术内容能涵盖的范围内。The responsible frame extraction method, video classification method, electronic equipment, and storage medium proposed in this application will be further described in detail below in conjunction with the accompanying drawings and specific embodiments. The advantages and features of the present application will become clearer from the following description. It should be noted that the drawings are in a very simplified form and all use imprecise scales, which are only used to facilitate and clearly assist the purpose of illustrating the embodiments of the present application. In order to make the object, features and advantages of the present application more comprehensible, please refer to the accompanying drawings. It should be noted that the structures, proportions, sizes, etc. shown in the drawings attached to this specification are only used to match the content disclosed in the specification, for those who are familiar with this technology to understand and read, and are not used to limit the implementation of this application. Conditions, any modification of structure, change of proportional relationship or adjustment of size, under the same or similar situation as the effect and purpose that this application can produce, should still fall within the technical content disclosed in this application. within the range that can be covered.
需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。It should be noted that in this article, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply that there is a relationship between these entities or operations. There is no such actual relationship or order between them. Furthermore, the term "comprises", "comprises" or any other variation thereof is intended to cover a non-exclusive inclusion such that a process, method, article, or apparatus comprising a set of elements includes not only those elements, but also includes elements not expressly listed. other elements of or also include elements inherent in such a process, method, article, or device. Without further limitations, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus comprising said element.
本申请的核心思想在于提供一种责任帧提取方法、视频分类方法、电子设备和存储介质,可以自动地在视频中找出为视频分类(例如良恶性结节视频的分类)贡献出不同重要特征的责任帧,以提高视频分类(例如良恶性结节视频的分类)的准确性。The core idea of this application is to provide a responsible frame extraction method, video classification method, electronic equipment and storage medium, which can automatically find out in the video that contributes to different important features for video classification (such as the classification of benign and malignant nodules) frame of responsibility to improve the accuracy of video classification (such as the classification of benign and malignant nodule videos).
需要说明的是,本申请实施方式的责任帧提取方法和视频分类方法可应用于本申请实施方式的电子设备上,其中,该电子设备可以是个人计算机、移动终端等,该移动终端可以是手机、平板电脑等具有各种操作系统的硬件设备。此外,需要说明的是,虽然本文是以从医学视频中提取出预设数量的能够为医学视频的分类贡献不同特征的责任帧为例进行说明,但是如本领域技术人员所能理解的,本申请还可以从除医学视频以外的其它领域的视频中提取出预设数量的能够为除医学视频的分类以外的其它领域的视频的分类贡献不同特征的责任帧,本申请对此并不进行限定。It should be noted that the responsible frame extraction method and the video classification method of the embodiments of the present application can be applied to the electronic device of the embodiment of the present application, wherein the electronic device can be a personal computer, a mobile terminal, etc., and the mobile terminal can be a mobile phone , Tablet PC and other hardware devices with various operating systems. In addition, it should be noted that although this article takes the example of extracting a preset number of responsible frames from medical videos that can contribute different features to the classification of medical videos, as those skilled in the art can understand, this The application can also extract a preset number of responsible frames that can contribute different features to the classification of videos in other fields than medical videos, and this application does not limit this .
为实现上述思想,本申请提供一种责任帧提取方法,请参考图1,其示意性地给出了本申请一实施方式提供的责任帧提取方法的流程示意图。如图1所示,所述责任帧提取方法包括如下步骤:In order to realize the above idea, the present application provides a responsibility frame extraction method, please refer to FIG. 1 , which schematically shows a flow diagram of the responsibility frame extraction method provided by an embodiment of the present application. As shown in Figure 1, the responsibility frame extraction method includes the following steps:
步骤S110、获取待提取视频。Step S110, acquiring the video to be extracted.
步骤S120、采用静态图像分类神经网络模型的骨架网络对所述待提取视频中的每一帧图像进行特征提取,以获取每一帧图像的特征矩阵。Step S120, using the skeleton network of the static image classification neural network model to perform feature extraction on each frame of image in the video to be extracted, so as to obtain a feature matrix of each frame of image.
步骤S130、根据每一帧图像的特征矩阵,提取出预设数量的责任帧。Step S130, extracting a preset number of responsible frames according to the feature matrix of each frame of image.
具体地,所述待提取视频可以为超声扫查视频(例如乳腺癌、甲状腺结节等扫查数据),当然,如本领域技术人员所能理解的,所述待提取视频还可以为其它医学影像设备采集的医学视频,例如,内窥镜采集的医学视频等,此外,所述待提取视频还可以为非医学视频,本申请对此并不进行限定。由此,本申请提供的责任帧提取方法可以自动提取出贡献特征不重复的多张责任帧,实现了不需人为定义隔帧提取距离(即每隔多少帧提取一张责任帧)即可以提取出特征多样化的责任帧,提取出的责任帧可以为后续的视频分类奠定良好的基础,有效消除了在视频分类过程中,噪声帧图像对视频分类所造成的干扰。例如,通过在甲状腺结节的超声视频中提取出预设数量的责任帧,可以为后续准确的判断该超声视频中的甲状腺结节是良性的还是恶性的奠定良好的基础。Specifically, the video to be extracted can be an ultrasound scan video (such as scan data of breast cancer, thyroid nodule, etc.), of course, as those skilled in the art can understand, the video to be extracted can also be other medical The medical video collected by an imaging device, for example, the medical video collected by an endoscope, etc. In addition, the video to be extracted may also be a non-medical video, which is not limited in this application. Therefore, the responsibility frame extraction method provided by this application can automatically extract multiple responsibility frames with non-repetitive contribution features, and realizes that it can extract Responsibility frames with diverse features can be extracted, which can lay a good foundation for subsequent video classification, and effectively eliminate the interference caused by noise frame images on video classification during the video classification process. For example, by extracting a preset number of responsible frames from an ultrasound video of a thyroid nodule, a good foundation can be laid for subsequent accurate judgment of whether the thyroid nodule in the ultrasound video is benign or malignant.
进一步地,由于所述神经网络模型需要统一大小的图像作为输入,因此,在采用静态图像分类神经网络模型的骨架网络对所述待提取视频中的每一帧图像进行特征提取之前,所述方法还包括对所述待提取视频中的每一帧图像的尺寸进行调整,以将所述待提取视频中的每一帧图像的尺寸调整至预设尺寸。所述预设尺寸可以根据具体情况进行设置,本申请对此并不进行限定。作为一种示例,调整后的待提取视频的尺寸为100×224×224×3(帧数×宽×高×通道数)。请参考图2,其示意性地给出了一具体示例中的调整后的待提取视频中的单帧图像的示意图。Further, since the neural network model requires an image of uniform size as input, before using the skeleton network of the static image classification neural network model to perform feature extraction on each frame image in the video to be extracted, the method It also includes adjusting the size of each frame of image in the video to be extracted, so as to adjust the size of each frame of image in the video to be extracted to a preset size. The preset size can be set according to specific conditions, which is not limited in this application. As an example, the size of the adjusted video to be extracted is 100×224×224×3 (number of frames×width×height×number of channels). Please refer to FIG. 2 , which schematically shows a schematic diagram of an adjusted single-frame image in a video to be extracted in a specific example.
在一种示范性的实施方式中,所述静态图像分类神经网络模型包括用于进行特征提取的骨架网络和用于进行分类的分类网络。其中,所述骨架网络可以选用不同的卷积神经网络,例如MobileNet网络、DenseNet121网络、Xception网络等。关于MobileNet网络、DenseNet121网络、Xception网络的更多内容可以参考现有技术,故本文对此不再进行赘述。所述分类网络包括至少一个全连接层,所述全连接层用于对所述分类网络所提取出的特征进行非线性映射回归,以获取分类结果。请参考图3,其示意性地给出了本申请一具体示例中的获取待提取视频中的每一帧图像的特征矩阵的示意图。如图3所示,通过所述静态图像分类神经网络模型中的骨架网络对所述待提取视频中的每一帧图像进行多次(例如N次卷积)卷积操作,即可获取每一帧图像的特征矩阵,每一帧图像的特征矩阵可以用一个1×k的矩阵表示,其中k表示特征维度,由静态图像分类神经网络模型的结构决定。In an exemplary implementation, the static image classification neural network model includes a skeleton network for feature extraction and a classification network for classification. Wherein, the skeleton network can use different convolutional neural networks, such as MobileNet network, DenseNet121 network, Xception network and so on. For more information about the MobileNet network, DenseNet121 network, and Xception network, you can refer to the existing technology, so this article will not repeat them. The classification network includes at least one fully connected layer, and the fully connected layer is used to perform nonlinear mapping regression on the features extracted by the classification network to obtain classification results. Please refer to FIG. 3 , which schematically shows a schematic diagram of acquiring the feature matrix of each frame of image in the video to be extracted in a specific example of the present application. As shown in Figure 3, the skeleton network in the static image classification neural network model performs multiple (for example N convolution) convolution operations on each frame image in the video to be extracted to obtain each The feature matrix of the frame image, the feature matrix of each frame image can be represented by a 1×k matrix, where k represents the feature dimension, which is determined by the structure of the static image classification neural network model.
具体地,所述静态图像分类神经网络模型通过以下步骤训练得到:Specifically, the static image classification neural network model is obtained through the following steps of training:
获取原始训练样本,所述原始训练样本包括原始样本图像和与所述原始样本图像对应的分类标签;Obtaining an original training sample, the original training sample including an original sample image and a classification label corresponding to the original sample image;
对所述原始训练样本进行扩展,以获取扩展后的训练样本;expanding the original training samples to obtain expanded training samples;
设置静态图像分类神经网络模型的模型参数的初始值;Set the initial values of the model parameters of the static image classification neural network model;
根据所述扩展后的训练样本和所述静态图像分类神经网络模型的模型参数的初始值对预先搭建的静态图像分类神经网络模型进行训练,直至满足预设训练结束条件。The pre-built static image classification neural network model is trained according to the expanded training samples and the initial values of the model parameters of the static image classification neural network model until the preset training end condition is satisfied.
由于原始训练样本的数据有限,而深度学习需要在一定数据上进行学习才能具有一定的鲁棒性,为了增加鲁棒性,需要做数据扩增操作,以增加所述静态图像分类神经网络模型的泛化能力。具体地,可以通过对所述原始样本图像进行随机刚性变换,具体包括:旋转、缩放、平移、翻转和灰度变换。更具体地,可以对所述原始样本图像平移-10到10个像素、旋转-10°到10°、水平翻转、垂直翻转、缩放0.9到1.1倍、灰度变换等以完成对训练样本的数据扩增。需要说明的是,由于对原始样本图像所进行的随机刚性变换不会对其分类结果造成影响,因此在进行样本扩展时,分类标签不需进行变换,即由同一个原始样本图像进行不同变换所得到的多个扩展后的样本图像所对应的分类标签都是与该原始样本图像所对应的分类标签一致的。Due to the limited data of the original training samples, deep learning needs to learn on certain data to have a certain robustness. In order to increase the robustness, a data amplification operation is required to increase the performance of the static image classification neural network model. Generalization. Specifically, a random rigid transformation may be performed on the original sample image, specifically including: rotation, scaling, translation, flipping, and grayscale transformation. More specifically, the original sample image can be translated by -10 to 10 pixels, rotated by -10° to 10°, horizontally flipped, vertically flipped, scaled by 0.9 to 1.1 times, grayscale transformation, etc. to complete the training sample data Amplify. It should be noted that since the random rigid transformation performed on the original sample image will not affect its classification results, the classification label does not need to be transformed when performing sample expansion, that is, it is obtained by different transformations of the same original sample image. The classification labels corresponding to the obtained expanded sample images are all consistent with the classification labels corresponding to the original sample image.
The model parameters of the static image classification neural network model fall into two categories: feature parameters and hyperparameters. Feature parameters are the parameters used to learn image features and include weight parameters and bias parameters. Hyperparameters are parameters set manually before training; only with suitable hyperparameters can the feature parameters be learned from the samples. The hyperparameters may include the learning rate, the number of hidden layers, the convolution kernel size, the number of training iterations, and the batch size of each iteration. The learning rate can be regarded as a step size. For example, in the present application the learning rate may be set to 0.001 and the number of training iterations to 100.
The preset training end condition is that the error between the predicted classification results of the sample images in the expanded training samples and the corresponding classification labels converges to a preset error value. In addition, since the training of the static image classification neural network model is an iterative process with multiple cycles, the training can also be ended after a set number of iterations; that is, the preset training end condition may alternatively be that the number of iterations reaches a preset number of iterations.
Further, training the pre-built static image classification neural network model according to the expanded training samples and the initial values of the model parameters of the static image classification neural network model includes:
training the pre-built static image classification neural network model with a stochastic gradient descent method according to the expanded training samples and the initial values of the model parameters of the static image classification neural network model.
Since the model training process is essentially a process of minimizing a loss function, and computing derivatives achieves this goal quickly and simply, the derivative-based method used is the gradient descent method. Training the static image classification neural network model with the gradient descent method therefore allows the training to be accomplished quickly and simply.
In the deep learning of the present application, the gradient descent method is mainly used to train the static image classification neural network model, and the backpropagation algorithm is then used to update and optimize the weight parameters and bias parameters of the model. The gradient descent method takes the direction of steepest slope as the direction that reaches the optimal value fastest, while the backpropagation method uses the chain rule of differentiation to compute partial derivatives and update the weights; the parameters are updated through repeated iterative training so that the image features are learned. The backpropagation algorithm updates the weight parameters and bias parameters as follows:
1. First, perform forward propagation: the parameters are updated through repeated iterative training so that the image is learned, and the activation values of all layers (convolutional layers and deconvolutional layers) are computed, that is, the activation maps obtained after the image has passed through the convolution operations;
2. For the output layer (layer n_l), compute the sensitivity value δ^(n_l):

δ^(n_l) = -(y - ŷ) · f'(z^(n_l))

where y is the ground-truth value of the sample, ŷ is the predicted value of the output layer, and f'(z^(n_l)) denotes the partial derivative with respect to the output layer parameters;
3. For each of the layers l = n_l - 1, n_l - 2, ..., compute the sensitivity value δ^(l):

δ^(l) = ((W^(l))^T · δ^(l+1)) · f'(z^(l))
where W^(l) denotes the weight parameters of layer l, δ^(l+1) denotes the sensitivity value of layer l+1, and f'(z^(l)) denotes the partial derivative of layer l;
4. Update the weight parameters and bias parameters of each layer:

W^(l) = W^(l) - α · δ^(l+1) · (a^(l))^T

b^(l) = b^(l) - α · δ^(l+1)

where W^(l) and b^(l) denote the weight parameters and bias parameters of layer l, respectively, α is the learning rate, a^(l) denotes the output value of layer l, and δ^(l+1) denotes the sensitivity value of layer l+1.
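As a rough illustration of the update rule above (itself reconstructed from the surrounding definitions), the following NumPy sketch performs one gradient-descent update of a single layer; all variable names are assumptions introduced for illustration, and the sensitivity values are assumed to have been computed by the backward pass.

```python
import numpy as np


def update_layer(W, b, a_l, delta_next, lr=0.001):
    """One gradient-descent update of the weights and biases of layer l.

    W          : weight matrix of layer l, shape (n_next, n_l)
    b          : bias vector of layer l, shape (n_next,)
    a_l        : output value of layer l, shape (n_l,)
    delta_next : sensitivity value of layer l+1, shape (n_next,)
    lr         : learning rate (the step size alpha above)
    """
    W_new = W - lr * np.outer(delta_next, a_l)   # W <- W - alpha * delta^(l+1) (a^(l))^T
    b_new = b - lr * delta_next                  # b <- b - alpha * delta^(l+1)
    return W_new, b_new
```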
Still further, training the pre-built static image classification neural network model with the stochastic gradient descent method according to the expanded training samples and the initial values of the model parameters of the static image classification neural network model includes:
Step 1: taking the expanded training samples as the input of the static image classification neural network model, and obtaining the predicted classification results of the expanded sample images according to the initial values of the model parameters of the static image classification neural network model;
Step 2: calculating a loss function value according to the predicted classification results of the expanded sample images and the classification labels corresponding to the expanded sample images; and
Step 3: judging whether the loss function value has converged to the preset error value; if so, ending the training; if not, adjusting the model parameters of the static image classification neural network model, updating the initial values of the model parameters of the static image classification neural network model to the adjusted model parameters, and returning to step 1.
When the loss function value has not converged to the preset error value, the static image classification neural network model is not yet accurate and needs further training; in that case the model parameters are adjusted, the initial values of the model parameters are updated to the adjusted model parameters, and the process returns to step 1 to enter the next iteration.
It can be seen that the loss function is the objective function used to optimize the neural network, and minimizing it makes the neural network learn better. Since the static image classification neural network model can only learn image features within an appropriate setting, a suitable loss function must be defined for effective features to be learned. The present application uses a binary classification network loss function L(W, b) as the loss function.
The binary classification network loss function L(W, b) is as follows:

L(W, b) = -(1/m) Σ_{i=1}^{m} [ y_i · log f_{W,b}(x_i) + (1 - y_i) · log(1 - f_{W,b}(x_i)) ]
where W and b denote the weight parameters and bias parameters of the static image classification neural network model, m is the number of training samples (a positive integer), x_i denotes the i-th input training sample, f_{W,b}(x_i) denotes the predicted classification result of the i-th training sample, and y_i denotes the classification label of the i-th training sample.
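A minimal NumPy sketch of this loss together with the iterative steps 1 to 3 above might look as follows; model_forward, model_backward, and the stopping threshold are placeholders introduced for illustration, and the loss shown is the standard binary cross-entropy form consistent with the symbols defined above.

```python
import numpy as np


def binary_classification_loss(y_pred, y_true):
    """L(W, b): mean binary cross-entropy over the m training samples."""
    eps = 1e-12  # numerical guard against log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))


def train(model_forward, model_backward, params, x, y, lr=0.001, max_iter=100, tol=1e-3):
    """Steps 1-3: predict, compute the loss, and update until convergence."""
    for _ in range(max_iter):
        y_pred = model_forward(params, x)              # step 1: predicted classification results
        loss = binary_classification_loss(y_pred, y)   # step 2: loss function value
        if loss <= tol:                                # step 3: converged to the preset error value
            break
        grads = model_backward(params, x, y)           # gradients from backpropagation (see above)
        params = {k: v - lr * grads[k] for k, v in params.items()}
    return params
```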
In an exemplary embodiment, extracting a preset number of responsible frames according to the feature matrix of each frame image includes:
performing a maximum pooling operation on the feature matrices of all frame images to obtain the video feature matrix of the video to be extracted; and
extracting a preset number of responsible frames according to the feature matrix of each frame image and the video feature matrix.
Specifically, performing the maximum pooling operation on the feature matrices of all frame images means taking, along the column direction (that is, along the feature dimensions), the maximum feature value of the feature matrices of all frame images in the video to be extracted (for example, 100 frames), so as to obtain a 1×k video feature matrix in which the value of each feature dimension is the maximum value of that feature dimension over the feature matrices of all frame images. The video feature matrix thus obtained combines the important information that each frame image can contribute. Since a video is essentially a superposition of multiple frame images and the feature information of the video is scattered over the individual frames, the video feature matrix obtained by max pooling the feature matrices of all frame images represents the features of the video to be extracted.
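As a concrete illustration, this max pooling of the per-frame 1×k feature matrices into a single video feature matrix takes only a couple of lines of NumPy; the array shapes below are illustrative assumptions.

```python
import numpy as np

# frame_features: one 1×k feature matrix per frame, stacked into shape (n_frames, k)
frame_features = np.random.rand(100, 512)   # e.g. 100 frames, k = 512 (illustrative values)

# Max pooling along the frame axis: each feature dimension keeps its maximum
# value over all frames, giving the 1×k video feature matrix.
video_feature_matrix = frame_features.max(axis=0, keepdims=True)   # shape (1, 512)
```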
Further, extracting a preset number of responsible frames according to the feature matrix of each frame image and the video feature matrix includes:
multiplying the feature value of each feature dimension in the video feature matrix by the importance value of that feature dimension to obtain a video feature importance matrix;
for each frame image, multiplying the feature value of each feature dimension in the feature matrix of the frame image by the importance value of that feature dimension to obtain the feature importance matrix of the frame image; and
extracting a preset number of responsible frames according to the video feature importance matrix and the feature importance matrix of each frame image.
Specifically, the importance value of each feature dimension may represent the importance of the features of that dimension in a random forest classification model described below; these values are defined by the random forest classification model and are all positive. Of course, as can be understood by those skilled in the art, in some other embodiments the importance value of each feature dimension may instead represent the importance of the features of that dimension in a classification model other than the random forest classification model, and the present application is not limited in this respect. Please refer to FIG. 4, which schematically shows how the video feature importance matrix and the feature importance matrix of each frame image are obtained in a specific example of the present application.
Further, please refer to FIG. 5, which schematically shows a specific flowchart of extracting responsible frames provided by an embodiment of the present application. As shown in FIG. 5, extracting a preset number of responsible frames according to the video feature importance matrix and the feature importance matrix of each frame image includes:
Step A1: taking the video feature importance matrix as the current video feature importance matrix;
Step B1: for each frame image, subtracting the feature importance matrix of the frame image from the current video feature importance matrix to obtain the remaining feature importance matrix corresponding to the frame image;
Step C1: for each frame image, adding up the feature values of all feature dimensions in the remaining feature importance matrix corresponding to the frame image to obtain the remaining information entropy corresponding to the frame image;
Step D1: taking the image with the smallest remaining information entropy as the current responsible frame;
Step E1: taking the remaining feature importance matrix corresponding to the current responsible frame as the new current video feature importance matrix; and
repeating steps B1 to E1 until the preset number of responsible frames have been extracted.
Specifically, the feature value of each feature dimension in a feature importance matrix can be regarded as an amount of information. Accordingly, the feature value of each dimension in the video feature importance matrix is regarded as the total amount of information contributed by the entire video in that dimension, and the feature value of each dimension in the feature importance matrix of a frame image is regarded as the single-frame amount of information contributed by that frame in that dimension. For each frame image, the remaining feature importance matrix corresponding to that frame is obtained by subtracting the feature importance matrix of the frame from the video feature importance matrix. For each frame image, the amounts of information (that is, the feature values) of all feature dimensions in its remaining feature importance matrix are added up, and the resulting sum is the remaining information entropy of the video after that frame has been removed; finding the frame that produces the smallest remaining information entropy therefore finds the most important responsible frame. After the most important responsible frame has been found, its remaining feature importance matrix is treated as the new video feature importance matrix, and the same method is used to find the second most important responsible frame; the remaining feature importance matrix of that frame is then treated as the new video feature importance matrix, and the same method is used to find the next most important responsible frame, and so on, until the preset number of responsible frames have been found.
Because the feature importance matrix of the most important responsible frame has been subtracted from the video feature importance matrix, the feature dimension that once contributed the most information has had the large amount of information contributed by that responsible frame removed, so the remaining information in that dimension becomes very small. Consequently, when the second most important responsible frame is selected, the feature dimensions to which it contributes a large amount of information will differ from those of the most important responsible frame selected first. It can thus be seen that the responsible frame extraction method provided by this embodiment can extract responsible frames with diverse features without needing to manually define an inter-frame extraction distance. By performing a reverse feature contribution calculation combined with the idea of decreasing information entropy, this embodiment automatically finds the responsible frames in the video that contribute different important features to video classification (for example, the classification of benign and malignant nodule videos). The responsible frame extraction method provided by this embodiment is highly versatile, can be applied to various CNN (convolutional neural network) models, and has good applicability and transferability.
In an exemplary embodiment, subtracting the feature importance matrix of the frame image from the current video feature importance matrix to obtain the remaining feature importance matrix corresponding to the frame image includes:
subtracting the feature value of the corresponding feature dimension in the feature importance matrix of the frame image from the feature value of each feature dimension in the current video feature importance matrix to obtain the feature value difference of each feature dimension; and
for the feature value difference of each feature dimension, if the difference is less than 0, taking 0 as the feature value of the corresponding feature dimension in the remaining feature importance matrix of the frame image; and if the difference is greater than or equal to 0, taking the difference as the feature value of the corresponding feature dimension in the remaining feature importance matrix of the frame image.
Please refer to FIG. 6, which schematically shows how the remaining feature importance matrix is obtained in a specific example of the present application. As shown in FIG. 6, the remaining feature importance matrix corresponding to a frame image can be obtained by subtracting, for each feature dimension, the feature value in the feature importance matrix of that frame image from the feature value of the corresponding feature dimension in the video feature importance matrix.
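The following NumPy sketch puts steps A1 to E1 together with the clamped subtraction just described. It is a minimal illustration under the assumption that the per-frame feature matrices and the per-dimension importance values (for example, from a random forest classifier) are already available; all names are introduced for illustration only.

```python
import numpy as np


def extract_responsible_frames(frame_features, importance, num_frames):
    """Iteratively select responsible frames by minimizing the remaining information entropy.

    frame_features : (n_frames, k) feature matrices of all frame images
    importance     : (k,) positive importance value of each feature dimension
    num_frames     : preset number of responsible frames to extract
    """
    frame_importance = frame_features * importance        # per-frame feature importance matrices
    current = frame_features.max(axis=0) * importance     # step A1: video feature importance matrix

    selected = []
    for _ in range(num_frames):
        best_frame, best_entropy, best_remaining = None, None, None
        for i in range(len(frame_features)):
            if i in selected:
                continue
            # Step B1: clamped subtraction gives the remaining feature importance matrix.
            remaining = np.maximum(current - frame_importance[i], 0.0)
            # Step C1: the remaining information entropy is the sum over all feature dimensions.
            entropy = remaining.sum()
            if best_entropy is None or entropy < best_entropy:
                best_frame, best_entropy, best_remaining = i, entropy, remaining
        selected.append(best_frame)   # step D1: smallest remaining information entropy wins
        current = best_remaining      # step E1: its remaining matrix becomes the new current matrix
    return selected
```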
In another exemplary embodiment, extracting a preset number of responsible frames according to the feature matrix of each frame image includes:
for each frame image, multiplying the feature value of each feature dimension in the feature matrix of the frame image by the contribution weight value of that feature dimension to obtain the feature entropy matrix of the frame image;
performing a maximum pooling operation on the feature entropy matrices of all frame images to obtain the video feature entropy matrix of the video to be extracted; and
extracting a preset number of responsible frames according to the feature entropy matrix of each frame image and the video feature entropy matrix.
Specifically, a video can be regarded as a collection of frames, and the information of the entire video is scattered over each frame. The contribution of each frame image in each feature dimension is represented by its feature matrix, where the number of feature dimensions is determined by the skeleton network and each feature dimension represents an image feature in the deep feature space (for example, a feature of a malignant nodule or of a benign nodule). The feature matrix is multiplied by a contribution weight value, which can be determined by the channel weight difference of the classification network (for example, the fully connected layer) in the static image classification neural network model. For example, if the classification network is used for benign/malignant classification, one channel of the classification network corresponds to the malignant category and the other channel corresponds to the benign category, where the weight of the channel corresponding to the malignant category is W_1 and the weight of the channel corresponding to the benign category is W_0. In the basic CNN architecture, the output Y_pred predicted by the model can be expressed as:
Y_pred = Sigmoid([W_0, W_1]^T · X + B) = [Y_0, Y_1]
where Sigmoid denotes the activation function, X denotes the feature matrix, Y_0 denotes the benign probability, and Y_1 denotes the malignant probability.
(W_1^i - W_0^i) · x_i represents the malignant contribution of a single feature dimension i, where (W_1^i - W_0^i) represents the strength with which feature dimension i contributes to the final malignancy and x_i represents the amount of information contributed by the image in feature dimension i. If attention is to be focused on the deep spatial features that characterize malignancy, the contribution of the i-th frame image in the j-th feature dimension can be described by the following equation:

FE_i^j = max(W_1^j - W_0^j, 0) · x_i^j

Therefore, the entire feature entropy matrix of the i-th frame image can be expressed as:

[FE]_i = [FE_i^1, FE_i^2, ..., FE_i^k]

In order to focus on the most representative deep features of the video, MaxPooling (maximum pooling) is used to process the feature entropy matrices of all frame images so as to construct the video feature entropy matrix [FE]_video:

[FE]_video = MaxPooling([FE]_1, [FE]_2, ..., [FE]_n)
It should be noted that, as can be understood by those skilled in the art, in some other embodiments the feature value of each feature dimension in the video feature matrix described above may also be multiplied directly by the contribution weight value of that feature dimension to obtain the video feature entropy matrix.
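Under the reconstruction of the formulas above, a NumPy sketch of the per-frame and video feature entropy matrices might look as follows; w_malignant and w_benign stand for the two channel weights W_1 and W_0 of the classification network, and the exact form of the contribution weight is an assumption consistent with the description rather than a verbatim reproduction of the original formulas.

```python
import numpy as np


def feature_entropy_matrices(frame_features, w_malignant, w_benign):
    """Per-frame feature entropy matrices [FE]_i and the video feature entropy matrix [FE]_video.

    frame_features : (n_frames, k) per-frame feature matrices
    w_malignant    : (k,) channel weights W_1 of the classification network
    w_benign       : (k,) channel weights W_0 of the classification network
    """
    # Contribution weight of each feature dimension (only positive malignant contributions kept).
    contribution = np.maximum(w_malignant - w_benign, 0.0)   # (k,)
    frame_fe = frame_features * contribution                 # (n_frames, k): [FE]_i for every frame
    video_fe = frame_fe.max(axis=0)                          # (k,): [FE]_video via max pooling
    return frame_fe, video_fe
```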
Further, extracting a preset number of responsible frames according to the feature entropy matrix of each frame image and the video feature entropy matrix includes:
for each frame image, adding up the feature values of all feature dimensions in the feature entropy matrix of the frame image to obtain the evaluation score of the frame image;
adding up the feature values of all feature dimensions in the video feature entropy matrix to obtain the evaluation score of the video to be extracted; and
extracting a preset number of responsible frames according to the evaluation score of each frame image and the evaluation score of the video to be extracted, wherein the difference between the evaluation score of the video to be extracted and the evaluation score of the image set formed by the preset number of responsible frames is the smallest.
Specifically, FScore (the evaluation score) is defined as the sum of the feature values of all feature dimensions in a feature entropy matrix; the FScore of the i-th frame image then satisfies the following relation:

FScore_i = Σ_{j=1}^{k} FE_i^j

FScore can be extended from a single frame image to a set of images, and a video can likewise be regarded as a set of images. For an image set A (A = [frame_a, frame_b, ..., frame_n]), the evaluation score FScore of the image set A satisfies the following relation:

FScore_A = Σ_{j=1}^{k} max(FE_a^j, FE_b^j, ..., FE_n^j)
Since the difference between the evaluation score of the video to be extracted and the evaluation score of the image set formed by the finally extracted responsible frames is the smallest, this not only ensures that the information contained in the image set formed by the multiple responsible frames is as close as possible to that of the entire video, but also ensures that the selected responsible frames are complementary to one another in features.
Please continue to refer to FIG. 7, which schematically shows a specific flowchart of extracting responsible frames provided by another embodiment of the present application. As shown in FIG. 7, extracting a preset number of responsible frames according to the evaluation score of each frame image and the evaluation score of the video to be extracted includes:
Step A2: for each frame image, calculating the difference between the evaluation score of the video to be extracted and the evaluation score of the frame image to obtain the feature entropy difference of the frame image;
Step B2: determining the image with the smallest feature entropy difference as a responsible frame;
Step C2: forming an image set from all the responsible frames together with each non-responsible frame, and calculating the evaluation score of each image set;
Step D2: for each image set, calculating the difference between the evaluation score of the video to be extracted and the evaluation score of the image set to obtain the feature entropy difference of the image set;
Step E2: determining all the images in the image set with the smallest feature entropy difference as responsible frames; and
repeating steps C2 to E2 until the preset number of responsible frames have been extracted.
Specifically, for each image set, the maximum pooling operation may first be performed on the feature entropy matrices of all frame images in the image set to obtain the feature entropy matrix of the image set; the feature values of all feature dimensions in the feature entropy matrix of the image set are then added up, and the resulting sum is the evaluation score of the image set. Thus, when the i-th responsible frame top_i is being selected, all the responsible frames already determined (top_1, top_2, ..., top_{i-1}) are combined with each remaining frame image (that is, each frame other than the responsible frames) to form an image set. For example, taking a remaining frame image a as an example, the image set formed by all the determined responsible frames together with the frame image a is [top_1, top_2, ..., top_{i-1}, a], and the difference between the evaluation score of the video to be extracted and the evaluation score of this image set is calculated as:

FScore_video - FScore_[top_1, top_2, ..., top_{i-1}, a]
Thus, by calculating the difference between the evaluation score of the video to be extracted and the evaluation score of each image set, the feature entropy difference of each image set can be obtained, wherein all the images in the image set with the smallest feature entropy difference are responsible frames, that is, the remaining frame image in the image set with the smallest feature entropy difference is the i-th responsible frame. The responsible frame extraction method provided by this embodiment can extract multiple responsible frames whose contributed features for video classification (for example, the classification of benign and malignant nodule videos) do not overlap, without adding extra training parameters; it can be applied to various CNN models and has good applicability and transferability.
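A minimal sketch of steps A2 to E2, building on the feature entropy matrices above, might look as follows; all names are illustrative, and the evaluation score of an image set is computed as described, by max pooling followed by a sum over the feature dimensions.

```python
import numpy as np


def select_by_fscore(frame_fe, num_frames):
    """Greedy responsible-frame selection that minimizes the feature entropy difference.

    frame_fe   : (n_frames, k) feature entropy matrices of all frame images
    num_frames : preset number of responsible frames to extract
    """
    fscore_video = frame_fe.max(axis=0).sum()       # evaluation score of the whole video
    selected = []
    while len(selected) < num_frames:
        best_frame, best_diff = None, None
        for i in range(len(frame_fe)):
            if i in selected:
                continue
            candidate = selected + [i]
            # Evaluation score of the image set: max pooling, then sum over feature dimensions.
            fscore_set = frame_fe[candidate].max(axis=0).sum()
            diff = fscore_video - fscore_set        # feature entropy difference of the image set
            if best_diff is None or diff < best_diff:
                best_frame, best_diff = i, diff
        selected.append(best_frame)                 # the set with the smallest difference wins
    return selected
```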
Please refer to FIG. 8a to FIG. 8c, where FIG. 8a schematically shows how the video feature entropy matrix is obtained in a specific example of the present application, FIG. 8b schematically shows the selection of the first responsible frame in this example, and FIG. 8c schematically shows the selection of the second responsible frame in this example. As shown in FIG. 8a to FIG. 8c, in this specific example the video to be extracted includes 3 frame images, the total number of deep feature dimensions is 3, and the number of responsible frames to be extracted is 2. First, the video feature entropy matrix is obtained by performing the maximum pooling operation on the feature entropy matrices of the 3 frame images. Calculation shows that the evaluation score FScore_video of the video to be extracted is 24, the evaluation score FScore_frame1 of the first frame image is 16, the evaluation score FScore_frame2 of the second frame image is 14, and the evaluation score FScore_frame3 of the third frame image is 11. Further calculation shows that the difference between FScore_video and FScore_frame1 is 8 (that is, the feature entropy difference of the first frame image is 8), the difference between FScore_video and FScore_frame2 is 10 (that is, the feature entropy difference of the second frame image is 10), and the difference between FScore_video and FScore_frame3 is 13 (that is, the feature entropy difference of the third frame image is 13). Since the feature entropy difference of the first frame image is the smallest, the first frame image is determined as the first responsible frame. The first responsible frame (that is, the first frame image) and the second frame image then form an image set [frame1, frame2]; by performing the maximum pooling operation on the feature entropy matrix of the first responsible frame and the feature entropy matrix of the second frame image, the feature entropy matrix of the image set [frame1, frame2] is obtained, and calculation shows that its evaluation score FScore_[frame1,frame2] is 16, so the difference between FScore_video and FScore_[frame1,frame2] is 8 (that is, the feature entropy difference of this image set is 8). The first responsible frame and the third frame image form another image set [frame1, frame3]; by performing the maximum pooling operation on the feature entropy matrix of the first responsible frame and the feature entropy matrix of the third frame image, the feature entropy matrix of the image set [frame1, frame3] is obtained, and calculation shows that its evaluation score FScore_[frame1,frame3] is 24, so the difference between FScore_video and FScore_[frame1,frame3] is 0 (that is, the feature entropy difference of this image set is 0). Since the feature entropy difference of the image set [frame1, frame3] formed by the first responsible frame and the third frame image is smaller than that of the image set [frame1, frame2] formed by the first responsible frame and the second frame image, the third frame image is determined as the second responsible frame.
In another aspect, the present application further provides a responsible frame extraction method. Please refer to FIG. 9, which schematically shows a flowchart of the responsible frame extraction method provided by an embodiment of the present application. As shown in FIG. 9, the responsible frame extraction method includes the following steps:
Step S210: using a target detection neural network model to extract a region of interest from each frame of medical image in the acquired medical video, so as to obtain the region-of-interest image corresponding to each frame of medical image.
Step S220: using the skeleton network of the static image classification neural network model to perform feature extraction on each frame of region-of-interest image, so as to obtain the feature matrix of each frame of region-of-interest image.
Step S230: extracting malignant responsible frames according to the feature matrices of the region-of-interest images of the frames until a first preset end condition is satisfied; and/or extracting benign responsible frames according to the feature matrices of the region-of-interest images of the frames until a second preset end condition is satisfied.
Please refer to FIG. 10 and FIG. 11. Because devices and examination modes differ, the style of the information bar outside the imaging window varies. The responsible frame extraction method provided by the present application therefore first uses a target detection neural network model to extract the region-of-interest image from each frame of medical image in the acquired medical video, and then extracts the malignant responsible frames and/or benign responsible frames according to the feature matrix of each frame of region-of-interest image. This effectively reduces the interference of image noise in the process of extracting malignant and/or benign responsible frames, and further improves the efficiency and accuracy of their extraction.
In an exemplary embodiment, using the target detection neural network model to extract the region of interest from each frame of medical image in the acquired medical video so as to obtain the region-of-interest image corresponding to each frame of medical image includes:
using the target detection neural network model to extract the region of interest from each frame of medical image in the acquired medical video, so as to obtain the position information of the region of interest corresponding to each frame of medical image; and
cropping the corresponding region from each frame of medical image according to the position information of the region of interest corresponding to that frame, so as to obtain the region-of-interest image corresponding to each frame of medical image.
In this way, the position information of the region of interest (that is, the ultrasound window) in each frame of medical image can be accurately obtained by using the target detection neural network model, so that for each frame of medical image the corresponding region-of-interest image can be cropped from the medical image according to the position information of its region of interest.
In an exemplary embodiment, before the skeleton network of the static image classification neural network model is used to perform feature extraction on each frame of region-of-interest image, the method further includes:
for each frame of region-of-interest image:
taking the larger of the width dimension and the height dimension of the region-of-interest image as the target side length;
padding the region-of-interest image so as to adjust the smaller of the width dimension and the height dimension of the region-of-interest image to the target side length; and
enlarging or reducing the region-of-interest image that has been adjusted to the target side length, so as to adjust the size of the region-of-interest image to a preset size.
Since the static image classification neural network model requires images of a uniform size as input, before the skeleton network of the static image classification neural network model is used to perform feature extraction on each frame of region-of-interest image, the size of the region-of-interest image needs to be adjusted to a preset size. The preset size may be set according to specific circumstances and is not limited by the present application. Preferably, the height and width of the preset size are identical, that is, the region-of-interest image adjusted to the preset size is a square image, for example 448×448; setting the height and width of the preset size to be identical makes it easier to adjust the size of the region-of-interest image to the preset size. Specifically, a "zero-pixel" padding method may be used to pad the region-of-interest image so that its width and height become identical. It should be noted that, as can be understood by those skilled in the art, since the target detection neural network model also requires images of a uniform size as input, before the target detection neural network model is used to extract the region of interest from each frame of medical image in the acquired medical video, the size of each frame of medical image also needs to be adjusted to a target size to meet the input requirements of the target detection neural network model.
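A possible sketch of this padding-and-resizing step, assuming a single-channel (grayscale) NumPy image and using scikit-image for the resizing (an assumed choice; any image resizing routine would do), is shown below.

```python
import numpy as np
from skimage.transform import resize


def pad_and_resize(roi: np.ndarray, preset: int = 448) -> np.ndarray:
    """Zero-pad the ROI image to a square and then rescale it to preset x preset."""
    h, w = roi.shape[:2]
    target = max(h, w)                      # target side length = larger of height and width
    padded = np.zeros((target, target), dtype=roi.dtype)
    padded[:h, :w] = roi                    # "zero-pixel" padding of the shorter side
    return resize(padded, (preset, preset), preserve_range=True)   # enlarge or reduce to the preset size
```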
In this way, by performing feature extraction on multiple frames of region-of-interest images simultaneously in a parallel manner, so as to obtain the feature matrices of multiple frames of region-of-interest images at the same time (one frame of region-of-interest image corresponds to one feature matrix), the extraction efficiency of the responsible frame extraction method provided by the present application can be further improved. It should be noted that, as can be understood by those skilled in the art, the total number of frames of region-of-interest images that can be processed in parallel at a time is determined by the computing power of the computer's GPU; the stronger the GPU's computing power, the more frames of region-of-interest images can be processed in parallel at a time.
In an exemplary embodiment, extracting malignant responsible frames according to the feature matrices of the region-of-interest images of the frames until the first preset end condition is satisfied includes:
for each frame of region-of-interest image, obtaining the malignant feature matrix of the region-of-interest image according to the feature matrix of the region-of-interest image and the difference between the malignant feature weight parameter and the benign feature weight parameter corresponding to the static image classification neural network model; and
extracting malignant responsible frames according to the malignant feature matrices of the region-of-interest images of the frames until the first preset end condition is satisfied.
Specifically, in the static image classification neural network model, the benign/malignant judgment of each frame of region-of-interest image is made on the basis of the feature matrix of the region-of-interest image, and the output probability Y_pred predicted by the static image classification neural network model can be expressed as:

Y_pred = Sigmoid([W_0, W_1]^T · X + [B_0, B_1]^T) = [Y_0, Y_1]    (1)

In formula (1), Y_0 denotes the probability that the region-of-interest image belongs to the benign category, Y_1 denotes the probability that it belongs to the malignant category, W_1 denotes the malignant feature weight parameter corresponding to the static image classification neural network model, W_0 denotes the benign feature weight parameter corresponding to the static image classification neural network model, and B_0 and B_1 denote the bias parameters corresponding to the static image classification neural network model.
It can be seen from formula (1) above that the probability Y_1 of the region-of-interest image belonging to the malignant category is determined only by the relative difference between the malignant feature weight parameter and the benign feature weight parameter and by the feature matrix X of the region-of-interest image. Therefore, the present application obtains the malignant feature matrix of the region-of-interest image according to the feature matrix of the region-of-interest image and the difference between the malignant feature weight parameter and the benign feature weight parameter corresponding to the static image classification neural network model, and then extracts malignant responsible frames according to the malignant feature matrices of the region-of-interest images of the frames until the first preset end condition is satisfied, so that malignant responsible frames carrying a large amount of malignancy-related information can be accurately extracted. It should be noted that, as can be understood by those skilled in the art, the malignant feature weight parameter W_1 is a matrix of k malignant feature weights and the benign feature weight parameter W_0 is a matrix of k benign feature weights; that is, each feature dimension corresponds to one malignant feature weight and one benign feature weight.
In an exemplary embodiment, obtaining the malignant feature matrix of the region-of-interest image according to the feature matrix of the region-of-interest image and the difference between the malignant feature weight parameter and the benign feature weight parameter corresponding to the static image classification neural network model includes:
obtaining the malignant feature matrix of the region-of-interest image according to the following formulas (2) and (3):

[FM]_i = [FM_i^1, FM_i^2, ..., FM_i^k]    (2)

FM_i^j = max((W_1^j - W_0^j) · x_i^j, 0)    (3)
In formula (2), [FM]_i denotes the malignant feature matrix of the i-th frame of region-of-interest image. In formula (3), x_i^j denotes the feature value of the j-th feature dimension in the feature matrix of the i-th frame of region-of-interest image, W_1^j denotes the malignant feature weight of the j-th feature dimension corresponding to the static image classification neural network model, W_0^j denotes the benign feature weight of the j-th feature dimension corresponding to the static image classification neural network model, and FM_i^j denotes the malignant feature value of the j-th feature dimension in the malignant feature matrix of the i-th frame of region-of-interest image.
It should be noted that, as can be understood by those skilled in the art, max((W_1^j - W_0^j) · x_i^j, 0) means taking the larger of 0 and (W_1^j - W_0^j) · x_i^j; that is, if (W_1^j - W_0^j) · x_i^j is greater than 0, then FM_i^j takes the value (W_1^j - W_0^j) · x_i^j, and if (W_1^j - W_0^j) · x_i^j is less than 0, then FM_i^j takes 0.
Further, in an exemplary embodiment, extracting malignant responsible frames according to the malignant feature matrices of the region-of-interest images of the frames until the first preset end condition is satisfied includes:
for each frame of region-of-interest image, adding up the malignant feature values of all feature dimensions in the malignant feature matrix of the region-of-interest image to obtain the total malignant feature value of the region-of-interest image; and
extracting malignant responsible frames according to the total malignant feature values of the region-of-interest images of the frames until the first preset end condition is satisfied.
Specifically, combining formulas (2) and (3) above, the total malignant feature value of the i-th frame of region-of-interest image can be expressed as:

MScore_i = Σ_{j=1}^{k} FM_i^j = Σ_{j=1}^{k} max((W_1^j - W_0^j) · x_i^j, 0)    (4)
Thus, by extracting malignant responsible frames according to the total malignant feature values of the region-of-interest images of the frames, the extraction efficiency of the malignant responsible frames can be further improved, and the extraction of malignant responsible frames with similar features can be effectively prevented.
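Following formulas (2) to (4) above, the per-frame malignant feature matrices and total malignant feature values can be sketched as follows; the names are illustrative and the weights are assumed to be the two channel-weight vectors of the classification network.

```python
import numpy as np


def malignant_scores(roi_features, w_malignant, w_benign):
    """Per-frame malignant feature matrices [FM]_i and total malignant feature values MScore_i.

    roi_features : (n_frames, k) feature matrices of the region-of-interest images
    w_malignant  : (k,) malignant feature weights W_1 of the classification network
    w_benign     : (k,) benign feature weights W_0 of the classification network
    """
    # Formula (3): keep only the positive malignant contribution of each feature dimension.
    fm = np.maximum((w_malignant - w_benign) * roi_features, 0.0)   # (n_frames, k)
    # Formula (4): total malignant feature value of each frame.
    mscore = fm.sum(axis=1)                                         # (n_frames,)
    return fm, mscore
```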
Please continue to refer to FIG. 12. As shown in FIG. 12, in an exemplary embodiment, extracting malignant responsible frames according to the total malignant feature values of the region-of-interest images of the frames until the first preset end condition is satisfied includes:
Step A10: ranking the total malignant feature values of the region-of-interest images of the frames, and determining the region-of-interest image with the largest total malignant feature value as a malignant responsible frame;
Step A20: forming a first image set from all the malignant responsible frames together with each non-malignant-responsible frame, and calculating the total malignant feature value of each first image set, where the total malignant feature value of a first image set is equal to the sum, over all feature dimensions, of the malignant feature values in the malignant feature matrix obtained by performing the maximum pooling operation on the malignant feature matrices of all the frames of region-of-interest images in the first image set, and a non-malignant-responsible frame is a region-of-interest image that has not been determined as a malignant responsible frame;
Step A30: judging whether the malignant feature entropy corresponding to the first image set with the largest total malignant feature value is greater than the malignant feature entropy corresponding to the malignant responsible frame set formed by all the malignant responsible frames;
if not, executing step A40; if so, executing step A50;
Step A40: determining all the frames of region-of-interest images in the first image set with the largest total malignant feature value as malignant responsible frames, and returning to step A20; and
Step A50: ending the extraction of malignant responsible frames.
Specifically, performing the maximum pooling operation on the malignant feature matrices of all the frames of region-of-interest images in the first image set means taking, along the column direction (that is, along the feature dimensions), the maximum malignant feature value over the malignant feature matrices of all the frames of region-of-interest images in the first image set, so as to obtain a 1×k malignant feature matrix in which the value of each feature dimension is the maximum malignant feature value of that dimension over all the frames in the set; the malignant feature matrix obtained by the maximum pooling operation combines the malignant information that each frame of region-of-interest image in the first image set can contribute. That is, for an image set A (A = [frame_a, frame_b, ..., frame_n]), the total malignant feature value of the image set A satisfies the following relation:

MScore_A = Σ_{j=1}^{k} max(FM_a^j, FM_b^j, ..., FM_n^j)    (5)
Thus, in the responsible frame extraction method provided by the present application, the region-of-interest image with the largest total malignant feature value is first identified as the first malignant responsible frame in the malignant responsible frame set. Each remaining region-of-interest image that has not been determined to be a malignant responsible frame is then combined with the first malignant responsible frame to form a first image set (at this point every first image set includes the first malignant responsible frame and one region-of-interest image not yet determined to be a malignant responsible frame), and the total malignant feature value of each first image set is calculated; the region-of-interest image that is not yet a malignant responsible frame in the first image set with the largest total malignant feature value is then the second malignant responsible frame in the malignant responsible frame set. Next, the first malignant responsible frame and the second malignant responsible frame are combined with each remaining region-of-interest image that has not been determined to be a malignant responsible frame to form new first image sets (every first image set now includes the first malignant responsible frame, the second malignant responsible frame and one region-of-interest image not yet determined to be a malignant responsible frame), and the first image set with the largest total malignant feature value is found by calculating the total malignant feature value of each first image set. If the malignant feature entropy of the first image set with the largest total malignant feature value is greater than the malignant feature entropy of the malignant responsible frame set composed of the first malignant responsible frame and the second malignant responsible frame, the extraction of malignant responsible frames ends, and the extracted first and second malignant responsible frames are taken as the final malignant responsible frames; if the malignant feature entropy of the first image set with the largest total malignant feature value is less than or equal to the malignant feature entropy of the malignant responsible frame set composed of the first malignant responsible frame and the second malignant responsible frame, the region-of-interest image that is not yet a malignant responsible frame in that first image set is determined to be the third malignant responsible frame in the malignant responsible frame set. The above steps are repeated until the malignant feature entropy of the first image set with the largest total malignant feature value is greater than the malignant feature entropy of the malignant responsible frame set composed of all the malignant responsible frames.
Since visually identical region-of-interest images usually share similar malignant feature matrices, adding a similar region-of-interest image does not significantly change the total malignant feature value of an image set; therefore, the malignant responsible frame extraction method described above does not repeatedly select similar malignant responsible frames.
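In an illustrative, non-limiting sketch (the helper functions set_total_malignant_value and set_malignant_entropy stand for the set-level quantities defined in this application, and all names are illustrative assumptions), the greedy selection described above can be written as:

```python
def select_malignant_responsible_frames(n_frames,
                                         set_total_malignant_value,
                                         set_malignant_entropy):
    """Greedy extraction of malignant responsible frames.

    set_total_malignant_value(indices) -> total malignant feature value
    set_malignant_entropy(indices)     -> malignant feature entropy
    of the image set formed by the given frame indices.
    """
    # first responsible frame: the frame with the largest total malignant feature value
    selected = [max(range(n_frames),
                    key=lambda i: set_total_malignant_value([i]))]
    while len(selected) < n_frames:
        remaining = [i for i in range(n_frames) if i not in selected]
        # candidate first image sets: current responsible frames + one remaining frame
        best = max(remaining,
                   key=lambda i: set_total_malignant_value(selected + [i]))
        # stop as soon as adding the best candidate would increase the entropy
        if set_malignant_entropy(selected + [best]) > set_malignant_entropy(selected):
            break
        selected.append(best)
    return selected
```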
Further, in an exemplary embodiment, the malignant feature entropy of an image set is calculated according to the following formulas (6) and (7):
H_1(A) = -p_1(A) × log_2 p_1(A)    (6)
[Formula (7), defining p_1(A) in terms of MScore_A and BScore_A, is given as an image in the original.]
In the formulas, H_1(A) denotes the malignant feature entropy of image set A, MScore_A denotes the total malignant feature value of image set A, and BScore_A denotes the total benign feature value of image set A.
Specifically, when the malignant feature entropy increases, the uncertainty of the prediction result begins to rise; therefore, when the malignant feature entropy increases, the extraction of new malignant responsible frames should stop. Thus, by judging whether the extraction of malignant responsible frames should stop according to whether the feature entropy rises, the present application can automatically extract, based on the content of the acquired medical video, the required number of malignant responsible frames that contribute important features to the classification of the medical video. Please refer to FIG. 13, which schematically shows the relationship between the feature entropy of a responsible frame image set and the number of responsible frames according to an embodiment of the present application. As shown in FIG. 13, when the number of image frames in the malignant responsible frame set is less than 5, the malignant feature entropy of the malignant responsible frame set keeps decreasing; once the number of image frames in the malignant responsible frame set exceeds 5, the malignant feature entropy begins to increase, which means that the uncertainty of the prediction result increases. Therefore, for the example shown in FIG. 13, after the number of malignant responsible frames in the malignant responsible frame set reaches 5, no new malignant responsible frames need to be extracted.
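For illustration only, the stopping criterion can be sketched as follows; because formula (7) is reproduced only as an image, the definition of p_1(A) used below (the share of the combined score attributed to the malignant class) is an assumption made for this sketch, not the expression of the original formula:

```python
import math

def malignant_feature_entropy(mscore_a, bscore_a, eps=1e-12):
    # assumed p1(A): fraction of the combined score attributed to the malignant class
    p1 = mscore_a / (mscore_a + bscore_a + eps)
    return -p1 * math.log2(p1 + eps)

def should_stop_extracting(entropy_with_candidate, entropy_current):
    # per the text: stop extracting new malignant responsible frames
    # as soon as the malignant feature entropy starts to increase
    return entropy_with_candidate > entropy_current
```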
In an exemplary embodiment, extracting benign responsible frames according to the feature matrices of the frames of region-of-interest images until a second preset end condition is met includes:
for each frame of region-of-interest image, obtaining a benign feature matrix of the region-of-interest image according to the feature matrix of the region-of-interest image and the difference between the benign feature weight parameters and the malignant feature weight parameters corresponding to the static image classification neural network model; and
extracting benign responsible frames according to the benign feature matrices of the frames of region-of-interest images until the second preset end condition is met.
It can be seen from formula (1) above that the probability Y_0 that a region-of-interest image belongs to the benign category is determined only by the relative difference between the benign feature weight parameters and the malignant feature weight parameters and by the feature matrix X of the region-of-interest image. Therefore, the present application obtains the benign feature matrix of the region-of-interest image according to the feature matrix of the region-of-interest image and the difference between the benign feature weight parameters and the malignant feature weight parameters corresponding to the static image classification neural network model, and then extracts benign responsible frames according to the benign feature matrices of the frames of region-of-interest images until the second preset end condition is met, so that benign responsible frames that contribute a large amount of benign information can be accurately extracted.
In an exemplary embodiment, obtaining the benign feature matrix of the region-of-interest image according to the feature matrix of the region-of-interest image and the difference between the benign feature weight parameters and the malignant feature weight parameters corresponding to the static image classification neural network model includes:
obtaining the benign feature matrix of the region-of-interest image according to the following formulas (8) and (9):
[FB]_i = [fb_i^1, fb_i^2, …, fb_i^k]    (8)
fb_i^j = max(0, x_i^j × (w_0^j - w_1^j))    (9)
In formula (8), [FB]_i denotes the benign feature matrix of the i-th frame of region-of-interest image; in formula (9), x_i^j denotes the feature value of the j-th feature dimension in the feature matrix of the i-th frame of region-of-interest image, w_0^j denotes the benign feature weight of the j-th feature dimension corresponding to the static image classification neural network model, w_1^j denotes the malignant feature weight of the j-th feature dimension corresponding to the static image classification neural network model, and fb_i^j denotes the benign feature value of the j-th feature dimension in the benign feature matrix of the i-th frame of region-of-interest image.
It should be noted that, as can be understood by those skilled in the art, max(0, x_i^j × (w_0^j - w_1^j)) means taking the larger of 0 and x_i^j × (w_0^j - w_1^j); that is, if x_i^j × (w_0^j - w_1^j) is greater than 0, fb_i^j takes the value x_i^j × (w_0^j - w_1^j), and if x_i^j × (w_0^j - w_1^j) is less than 0, fb_i^j takes the value 0.
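A minimal sketch of formulas (8) and (9) as reconstructed above (assuming the per-frame feature matrix and the class weight parameters are NumPy vectors of length k; names are illustrative):

```python
import numpy as np

def benign_feature_matrix(x_i, w_benign, w_malignant):
    """fb_i^j = max(0, x_i^j * (w0^j - w1^j)) for every feature dimension j.

    x_i: (k,) feature matrix of the i-th region-of-interest image.
    w_benign, w_malignant: (k,) benign / malignant feature weights of the
    static image classification neural network model.
    """
    return np.maximum(0.0, x_i * (w_benign - w_malignant))
```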
Further, in an exemplary embodiment, extracting benign responsible frames according to the benign feature matrices of the frames of region-of-interest images until the second preset end condition is met includes:
for each frame of region-of-interest image, adding up the benign feature values of all the feature dimensions in the benign feature matrix of the region-of-interest image to obtain a total benign feature value of the region-of-interest image; and
extracting benign responsible frames according to the total benign feature values of the frames of region-of-interest images until the second preset end condition is met.
Specifically, in combination with formulas (8) and (9) above, the total benign feature value of the i-th frame of region-of-interest image can be expressed as:
BScore_i = Σ_{j=1}^{k} fb_i^j = Σ_{j=1}^{k} max(0, x_i^j × (w_0^j - w_1^j))
Thus, by extracting benign responsible frames according to the total benign feature values of the frames of region-of-interest images, not only can the efficiency of extracting benign responsible frames be further improved, but the extraction of benign responsible frames with similar features can also be effectively prevented.
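For example (continuing the sketch above; names are illustrative), the total benign feature value of a frame is simply the sum of its benign feature values over the k feature dimensions:

```python
import numpy as np

def total_benign_value(fb_i: np.ndarray) -> float:
    # total benign feature value of a single region-of-interest image:
    # sum over all k feature dimensions of its benign feature matrix
    return float(np.sum(fb_i))
```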
Please continue to refer to FIG. 14, which schematically shows a flowchart of extracting benign responsible frames according to an embodiment of the present application. As shown in FIG. 14, in an exemplary embodiment, extracting benign responsible frames according to the total benign feature values of the frames of region-of-interest images until the second preset end condition is met includes:
Step B10: sorting the total benign feature values of the frames of region-of-interest images, and determining the region-of-interest image with the largest total benign feature value as a benign responsible frame;
Step B20: combining all the benign responsible frames with each non-benign-responsible frame to form second image sets, and calculating the total benign feature value of each second image set, where the total benign feature value of a second image set is equal to the sum of the benign feature values of all the feature dimensions in the benign feature matrix obtained by performing a maximum pooling operation on the benign feature matrices of all the frames of region-of-interest images in the second image set, and a non-benign-responsible frame is a region-of-interest image that has not yet been determined to be a benign responsible frame;
Step B30: judging whether the benign feature entropy corresponding to the second image set with the largest total benign feature value is greater than the benign feature entropy corresponding to the benign responsible frame set composed of all the benign responsible frames;
if not, performing step B40; if so, performing step B50;
Step B40: determining all the region-of-interest images in the second image set with the largest total benign feature value as benign responsible frames, and returning to step B20;
Step B50: ending the extraction of benign responsible frames.
Specifically, performing a maximum pooling operation on the benign feature matrices of all the frames of region-of-interest images in the second image set means taking, in the column direction (that is, along the feature dimensions), the maximum benign feature value of the benign feature matrices of all the frames of region-of-interest images in the second image set, so as to obtain a 1×k benign feature matrix in which the benign feature value of each feature dimension is the maximum benign feature value of that feature dimension among the benign feature matrices of all the frames of region-of-interest images in the second image set. The benign feature matrix obtained by the maximum pooling operation integrates the benign information that every frame of region-of-interest image in the second image set can contribute. That is, for an image set A (A = [frame_a, frame_b, …, frame_n]), the total benign feature value of the image set A satisfies the following relationship:
BScore_A = Σ_{j=1}^{k} max(fb_a^j, fb_b^j, …, fb_n^j)
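A minimal sketch of this relationship (assuming fb_frames stacks the benign feature matrices of the frames in image set A as rows; names are illustrative):

```python
import numpy as np

def set_total_benign_value(fb_frames: np.ndarray) -> float:
    """fb_frames: (n_frames, k) benign feature matrices of the frames in set A.

    Column-wise (feature-dimension-wise) max pooling yields a 1 x k benign
    feature matrix; summing it over the feature dimensions gives BScore_A.
    """
    pooled = fb_frames.max(axis=0)
    return float(pooled.sum())
```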
Thus, in the responsible frame extraction method provided by the present application, the region-of-interest image with the largest total benign feature value is first identified as the first benign responsible frame in the benign responsible frame set. Each remaining region-of-interest image that has not been determined to be a benign responsible frame is then combined with the first benign responsible frame to form a second image set (at this point every second image set includes the first benign responsible frame and one region-of-interest image not yet determined to be a benign responsible frame), and the total benign feature value of each second image set is calculated; the region-of-interest image that is not yet a benign responsible frame in the second image set with the largest total benign feature value is then the second benign responsible frame in the benign responsible frame set. Next, the first benign responsible frame and the second benign responsible frame are combined with each remaining region-of-interest image that has not been determined to be a benign responsible frame to form new second image sets (every second image set now includes the first benign responsible frame, the second benign responsible frame and one region-of-interest image not yet determined to be a benign responsible frame), and the second image set with the largest total benign feature value is found by calculating the total benign feature value of each second image set. If the benign feature entropy of the second image set with the largest total benign feature value is greater than the benign feature entropy of the benign responsible frame set composed of the first benign responsible frame and the second benign responsible frame, the extraction of benign responsible frames ends, and the extracted first and second benign responsible frames are taken as the final benign responsible frames; if the benign feature entropy of the second image set with the largest total benign feature value is less than or equal to the benign feature entropy of the benign responsible frame set composed of the first benign responsible frame and the second benign responsible frame, the region-of-interest image that is not yet a benign responsible frame in that second image set is determined to be the third benign responsible frame in the benign responsible frame set. The above steps are repeated until the benign feature entropy of the second image set with the largest total benign feature value is greater than the benign feature entropy of the benign responsible frame set composed of all the benign responsible frames.
Since visually identical region-of-interest images usually share similar benign feature matrices, adding a similar region-of-interest image does not significantly change the total benign feature value of an image set; therefore, the benign responsible frame extraction method described above does not repeatedly select similar benign responsible frames.
Specifically, in an exemplary embodiment, the benign feature entropy of an image set is calculated according to the following formulas (12) and (13):
H_0(A) = -p_0(A) × log_2 p_0(A)    (12)
[Formula (13), defining p_0(A) in terms of MScore_A and BScore_A, is given as an image in the original.]
In the formulas, H_0(A) denotes the benign feature entropy of image set A, MScore_A denotes the total malignant feature value of image set A, and BScore_A denotes the total benign feature value of image set A.
Specifically, when the benign feature entropy increases, the uncertainty of the prediction result begins to rise; therefore, when the benign feature entropy increases, the extraction of new benign responsible frames should stop. Thus, by judging whether the extraction of benign responsible frames should stop according to whether the feature entropy rises, the present application can automatically extract, based on the content of the acquired medical video, the required number of benign responsible frames that contribute important features to the classification of the medical video.
Please continue to refer to FIG. 16, which schematically shows a software interface through which a doctor adjusts responsible frames according to an embodiment of the present application. As shown in FIG. 16, the extracted malignant responsible frames and/or benign responsible frames can be displayed in a responsible-view recommendation window of the software interface, and the acquired medical video to be classified can also be displayed in a video playback window of the software interface. The doctor can access the frames adjacent to a responsible frame (that is, the neighboring region-of-interest images) through the "previous frame" and "next frame" buttons, and can choose to accept or reject the current frame (that is, the currently accessed region-of-interest image) as a responsible frame; the system automatically records the responsible frames confirmed by the doctor.
The inventors of the present application collected a total of 13,702 2D ultrasound breast nodule images (including 9,177 images from 2,457 patients with benign pathology and 4,545 images from 991 patients with malignant pathology) and 2,141 breast ultrasound videos (including 1,227 videos from 560 patients with benign pathology and 914 videos from 412 patients with malignant pathology) for training and validation of the static image classification neural network model and the video classification model. The performance of the video classification method provided by the present application was evaluated using AUROC (area under the receiver operating characteristic curve), accuracy, sensitivity and specificity. The results of five-fold cross-validation (the data set is evenly divided into five parts; in each round one part is used as the test set and the rest as the training set) are shown in Table 1 below, and the results on the test set are shown in Table 2 below.
Table 1. Five-fold cross-validation results for the benign/malignant classification of breast nodules [table provided as an image in the original]
Table 2. Test set results for the benign/malignant classification of breast nodules [table provided as an image in the original]
As can be seen from Tables 1 and 2, the AUROC, accuracy, sensitivity and specificity of the benign/malignant classification of breast nodules based on the responsible frames (including malignant responsible frames and/or benign responsible frames) extracted by the responsible frame extraction method provided by the present application are all significantly better than the AUROC, accuracy, sensitivity and specificity of the benign/malignant classification of breast nodules based on responsible frames manually selected by doctors.
Corresponding to the responsible frame extraction method described above, the present application further provides a video classification method. Please refer to FIG. 17, which schematically shows a flowchart of the video classification method according to an embodiment of the present application. As shown in FIG. 17, the video classification method includes the following steps:
Step S310: extracting a preset number of responsible frames from an acquired medical video by using the responsible frame extraction method described above.
Step S320: classifying the video according to the feature matrices of the preset number of responsible frames.
Since the video classification method provided by the present application extracts the preset number of responsible frames by using the responsible frame extraction method described above, it has all the advantages of the responsible frame extraction method described above. In addition, since the video classification method provided by the present application classifies the video according to the extracted preset number of responsible frames, the interference of noise frames in the medical video can be effectively reduced, and the accuracy of video classification (for example, the classification of benign and malignant nodule videos) is effectively improved.
Further, classifying the video according to the feature matrices of the preset number of responsible frames includes:
performing a maximum pooling operation on the feature matrices of the preset number of responsible frames to obtain a feature matrix of the responsible frame set; and
classifying the video according to the feature matrix of the responsible frame set.
Specifically, by performing maximum pooling in the column direction over all the responsible frames, the feature matrix contributed by all the responsible frames, that is, the feature matrix of the responsible frame set, can be obtained; the video can then be classified accurately according to the obtained feature matrix of the responsible frame set.
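A minimal sketch of this pooling step (assuming each responsible frame contributes a 1 × k feature matrix stacked as a row; names are illustrative):

```python
import numpy as np

def responsible_set_feature_matrix(frame_features: np.ndarray) -> np.ndarray:
    # frame_features: (n_responsible_frames, k) feature matrices of the responsible frames;
    # column-wise max pooling gives the 1 x k feature matrix of the responsible frame set
    return frame_features.max(axis=0)
```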
In an exemplary embodiment, classifying the video according to the feature matrix of the responsible frame set includes:
inputting the feature matrix of the responsible frame set into a video classification model to classify the video.
Thus, by inputting the feature matrix contributed by all the responsible frames into a pre-trained video classification model, the final video classification can be performed.
Further, the video classification model is a random forest classification model. The random forest classification model is composed of a plurality of classification trees, each of which classifies the input feature matrix; the random forest classification model votes according to the classification results of all the classification trees and finally makes a judgment on whether the lesion is benign or malignant. It should be noted that, as can be understood by those skilled in the art, in some other embodiments the video classification model may also be a classification model other than the random forest classification model, which is not limited in the present application. In addition, as can be understood by those skilled in the art, the random forest classification model is obtained through pre-training; specifically, a pre-built random forest classification model may be trained with a video training set (the video training set includes the feature matrices of the responsible frame sets of videos and the corresponding classification labels) to obtain the video classification model.
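For illustration, a sketch of training and applying such a random forest classifier with scikit-learn is given below; the file names and hyperparameters are assumptions made for this sketch:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# X_train: (n_videos, k) responsible-frame-set feature matrices of the training videos
# y_train: (n_videos,) classification labels (e.g. 0 = benign, 1 = malignant)
X_train = np.load("train_set_features.npy")   # illustrative file names
y_train = np.load("train_set_labels.npy")

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# classify a new video from the feature matrix of its responsible frame set
video_feature = np.load("new_video_responsible_set_feature.npy")  # shape (k,)
prediction = clf.predict(video_feature.reshape(1, -1))[0]
```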
When the information-entropy-reduction method is used to extract responsible frames, the number of responsible frames that can be extracted when the information entropy decreases to 0 differs for different skeleton networks (for example, MobileNet, DenseNet121 and Xception); therefore, performing feature extraction with different skeleton networks also affects the performance of the classification model differently. For example, when the skeleton network adopts the MobileNet model and the number of extracted responsible frames is 5, the evaluation metrics of the classification model are: ROC-AUC of 0.885 (95% CI: 0.830-0.939), PR-AUC of 0.876 (95% CI: 0.831-0.927), accuracy of 0.82 and F1-Score of 0.819, all of which are better than the metrics obtained when the benign/malignant prediction is made directly on the video. Here, ROC (receiver operating characteristic) denotes the receiver operating characteristic curve, AUC (area under the curve) denotes the area under a curve, ROC-AUC denotes the area under the ROC curve, CI (confidence interval) denotes the confidence interval, and PR-AUC denotes the area under the precision-recall curve. When the skeleton network adopts the DenseNet121 model and the number of extracted responsible frames is 10, the evaluation metrics of the classification model are: ROC-AUC of 0.891 (95% CI: 0.835-0.947), PR-AUC of 0.908 (95% CI: 0.876-0.940), accuracy of 0.85 and F1-Score of 0.838; compared with making the benign/malignant judgment directly on the video, ROC-AUC and PR-AUC are essentially unchanged (a difference of 0.002), accuracy is 0.01 higher, and F1-Score is substantially improved, rising from 0.819 to 0.838. It can be seen that feature extraction with different skeleton networks has different effects on the prediction performance of the classification model, so an appropriate network model can be selected as the skeleton network according to the specific situation of the classification model.
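As a sketch of how a skeleton network such as MobileNet can produce the per-frame feature matrix (assuming TensorFlow/Keras is available; in practice the backbone would be the trained skeleton network of the static image classification neural network model rather than generic ImageNet weights):

```python
import numpy as np
import tensorflow as tf

# include_top=False with global average pooling yields a 1 x k feature matrix per frame
backbone = tf.keras.applications.MobileNet(
    include_top=False, weights="imagenet", pooling="avg", input_shape=(224, 224, 3)
)

def frame_feature_matrix(frame_rgb: np.ndarray) -> np.ndarray:
    """frame_rgb: (H, W, 3) video frame; returns a (k,) feature vector (k = 1024 for MobileNet)."""
    x = tf.image.resize(frame_rgb.astype("float32"), (224, 224))
    x = tf.keras.applications.mobilenet.preprocess_input(x)
    return backbone(tf.expand_dims(x, 0), training=False).numpy()[0]
```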
In an exemplary embodiment, the video classification method further includes:
displaying the classification result of the video and the extracted preset number of responsible frames.
Thus, by displaying the extracted preset number of responsible frames, the responsible frames on which the video classification is based can be presented, so that the doctor can judge from the extracted responsible frames whether the obtained video classification result is accurate, thereby further improving the accuracy of video classification. For example, when the video is an ultrasound video, outputting the preset number of responsible frames extracted from the ultrasound video can help further reduce the missed-diagnosis rate and misdiagnosis rate in the ultrasound screening process.
Based on the same inventive concept, the present application further provides an electronic device. Please refer to FIG. 18, which schematically shows a block diagram of the electronic device according to an embodiment of the present application. As shown in FIG. 18, the electronic device includes a processor 101 and a memory 103, and a computer program is stored in the memory 103; when the computer program is executed by the processor 101, the responsible frame extraction method or the video classification method described above is implemented. Since the electronic device provided by the present application and the responsible frame extraction method described above belong to the same inventive concept, the electronic device provided by the present application has all the advantages of the responsible frame extraction method described above, which will not be repeated here.
As shown in FIG. 18, the electronic device further includes a communication interface 102 and a communication bus 104, wherein the processor 101, the communication interface 102 and the memory 103 communicate with one another through the communication bus 104. The communication bus 104 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 104 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface 102 is used for communication between the electronic device and other devices.
The processor 101 referred to in the present application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor 101 is the control center of the electronic device and connects the various parts of the entire electronic device through various interfaces and lines.
The memory 103 may be used to store the computer program, and the processor 101 implements the various functions of the electronic device by running or executing the computer program stored in the memory 103 and calling the data stored in the memory 103.
The memory 103 may include non-volatile and/or volatile memory. The non-volatile memory may include a read-only memory (ROM), a programmable ROM (PROM), an electrically programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM) or a flash memory. The volatile memory may include a random access memory (RAM) or an external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The present application further provides a readable storage medium in which a computer program is stored; when the computer program is executed by a processor, the responsible frame extraction method or the video classification method described above can be implemented. Since the storage medium provided by the present application and the responsible frame extraction method described above belong to the same inventive concept, the storage medium provided by the present application has all the advantages of the responsible frame extraction method described above, which will not be repeated here.
The readable storage medium in the embodiments of the present application may adopt any combination of one or more computer-readable media. The readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection with one or more wires, a portable computer hard disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, the computer-readable storage medium may be any tangible medium that contains or stores a program which can be used by or in combination with an instruction execution system, apparatus or device.
The computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and the computer-readable medium can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device.
The computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet by using an Internet service provider).
In summary, compared with the prior art, the responsible frame extraction method, the video classification method, the electronic device and the storage medium provided by the present application have the following advantages:
(1) In the responsible frame extraction method, the electronic device and the storage medium provided by the present application, the video to be extracted is first acquired; the skeleton network of the static image classification neural network model is then used to perform feature extraction on each frame of image in the video to be extracted, so as to obtain the feature matrix of each frame of image; finally, a preset number of responsible frames are extracted according to the feature matrices of the frames of images. In this way, a plurality of responsible frames whose contributed features do not repeat can be automatically extracted, so that responsible frames with diverse features can be extracted without manually defining an inter-frame extraction distance; the extracted responsible frames lay a good foundation for subsequent video classification and effectively eliminate the interference caused by noise frame images to video classification during the video classification process.
(2) The video classification method provided by the present application extracts a preset number of responsible frames by using the responsible frame extraction method described above, and classifies the video according to the feature matrices of the extracted preset number of responsible frames. Since the video classification method provided by the present application extracts the preset number of responsible frames by using the responsible frame extraction method described above, it has all the advantages of the responsible frame extraction method described above. In addition, since the video classification method provided by the present application classifies the video according to the extracted preset number of responsible frames, the interference of noise frames in the video can be effectively reduced, and the accuracy of video classification is effectively improved.
It should be noted that the apparatus and methods disclosed in the embodiments herein may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the possible architecture, functions and operations of the apparatus, methods and computer program products according to the embodiments herein. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a part of code, and the module, program segment or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the various embodiments herein may be integrated together to form an independent part, each module may exist independently, or two or more modules may be integrated to form an independent part.
The above description is merely a description of the preferred embodiments of the present application and does not limit the scope of the present application in any way. Any changes or modifications made by those of ordinary skill in the art based on the above disclosure fall within the protection scope of the present application. Obviously, those skilled in the art may make various changes and modifications to the present application without departing from the spirit and scope of the present application. Thus, provided that these modifications and variations fall within the scope of the present application and its equivalent technology, the present application is also intended to include them.

Claims (20)

1. A responsible frame extraction method, characterized by comprising:
    acquiring a video to be extracted;
    performing feature extraction on each frame of image in the video to be extracted by using a skeleton network of a static image classification neural network model, to obtain a feature matrix of each frame of image;
    performing a maximum pooling operation on the feature matrices of all the frames of images to obtain a video feature matrix of the video to be extracted; and
    extracting a preset number of responsible frames according to the feature matrix of each frame of image and the video feature matrix.
2. The responsible frame extraction method according to claim 1, wherein extracting a preset number of responsible frames according to the feature matrix of each frame of image and the video feature matrix comprises:
    multiplying the feature value of each feature dimension in the video feature matrix by the importance value of that feature dimension to obtain a video feature importance matrix;
    for each frame of image, multiplying the feature value of each feature dimension in the feature matrix of the frame of image by the importance value of that feature dimension to obtain a feature importance matrix of the frame of image; and
    extracting the preset number of responsible frames according to the video feature importance matrix and the feature importance matrix of each frame of image.
3. The responsible frame extraction method according to claim 2, wherein extracting the preset number of responsible frames according to the video feature importance matrix and the feature importance matrix of each frame of image comprises:
    step A1: taking the video feature importance matrix as a current video feature importance matrix;
    step B1: for each frame of image, subtracting the feature importance matrix of the frame of image from the current video feature importance matrix to obtain a remaining feature importance matrix corresponding to the frame of image;
    step C1: for each frame of image, adding up the feature values of the feature dimensions in the remaining feature importance matrix corresponding to the frame of image to obtain a remaining information entropy corresponding to the frame of image;
    step D1: taking the image with the smallest remaining information entropy as a current responsible frame;
    step E1: taking the remaining feature importance matrix corresponding to the current responsible frame as a new current video feature importance matrix; and
    repeating steps B1 to E1 until the preset number of responsible frames are extracted.
4. The responsible frame extraction method according to claim 3, wherein subtracting the feature importance matrix of the frame of image from the current video feature importance matrix to obtain the remaining feature importance matrix corresponding to the frame of image comprises:
    subtracting the feature value of the corresponding feature dimension in the feature importance matrix of the frame of image from the feature value of each feature dimension in the current video feature importance matrix to obtain a feature value difference of each feature dimension; and
    for the feature value difference of each feature dimension, if the feature value difference of the feature dimension is less than 0, taking 0 as the feature value of the corresponding feature dimension in the remaining feature importance matrix corresponding to the frame of image; and if the feature value difference of the feature dimension is greater than or equal to 0, taking the feature value difference of the feature dimension as the feature value of the corresponding feature dimension in the remaining feature importance matrix corresponding to the frame of image.
5. The responsible frame extraction method according to claim 1, wherein extracting a preset number of responsible frames according to the feature matrix of each frame of image comprises:
    for each frame of image, multiplying the feature value of each feature dimension in the feature matrix of the frame of image by the contribution weight value of that feature dimension to obtain a feature entropy matrix of the frame of image;
    performing a maximum pooling operation on the feature entropy matrices of all the frames of images to obtain a video feature entropy matrix of the video to be extracted; and
    extracting the preset number of responsible frames according to the feature entropy matrix of each frame of image and the video feature entropy matrix.
6. The responsible frame extraction method according to claim 5, wherein extracting the preset number of responsible frames according to the feature entropy matrix of each frame of image and the video feature entropy matrix comprises:
    for each frame of image, adding up the feature values of all the feature dimensions in the feature entropy matrix of the frame of image to obtain an evaluation score of the frame of image;
    adding up the feature values of all the feature dimensions in the video feature entropy matrix to obtain an evaluation score of the video to be extracted; and
    extracting the preset number of responsible frames according to the evaluation score of each frame of image and the evaluation score of the video to be extracted, wherein the difference between the evaluation score of the video to be extracted and the evaluation score of the image set composed of the preset number of responsible frames is the smallest.
7. The responsible frame extraction method according to claim 6, wherein extracting the preset number of responsible frames according to the evaluation score of each frame of image and the evaluation score of the video to be extracted comprises:
    step A2: for each frame of image, calculating the difference between the evaluation score of the video to be extracted and the evaluation score of the frame of image to obtain a feature entropy difference of the frame of image;
    step B2: determining the image with the smallest feature entropy difference as a responsible frame;
    step C2: combining all the responsible frames with each non-responsible frame to form image sets, and calculating the evaluation score of each image set;
    step D2: for each image set, calculating the difference between the evaluation score of the video to be extracted and the evaluation score of the image set to obtain a feature entropy difference of the image set;
    step E2: determining all the images in the image set with the smallest feature entropy difference as responsible frames; and
    repeating steps C2 to E2 until the preset number of responsible frames are extracted.
8. The responsible frame extraction method according to claim 1, characterized by further comprising:
    extracting a region of interest from each frame of image in the acquired video to be extracted by using a target detection neural network model, to obtain a region-of-interest image corresponding to each frame of image;
    performing feature extraction on each frame of region-of-interest image by using the skeleton network of the static image classification neural network model, to obtain a feature matrix of each frame of region-of-interest image;
    extracting malignant responsible frames according to the feature matrices of the frames of region-of-interest images until the malignant feature entropy corresponding to the malignant responsible frame set composed of all the malignant responsible frames reaches a minimum value; and/or
    extracting benign responsible frames according to the feature matrices of the frames of region-of-interest images until the benign feature entropy corresponding to the benign responsible frame set composed of all the benign responsible frames reaches a minimum value.
9. The responsible frame extraction method according to claim 8, wherein extracting malignant responsible frames according to the feature matrices of the frames of region-of-interest images until a first preset end condition is met comprises:
    for each frame of region-of-interest image, obtaining a malignant feature matrix of the region-of-interest image according to the feature matrix of the region-of-interest image and the difference between the malignant feature weight parameters and the benign feature weight parameters corresponding to the static image classification neural network model; and
    extracting malignant responsible frames according to the malignant feature matrices of the frames of region-of-interest images until the first preset end condition is met; and/or
    wherein extracting benign responsible frames according to the feature matrices of the frames of region-of-interest images until a second preset end condition is met comprises:
    for each frame of region-of-interest image, obtaining a benign feature matrix of the region-of-interest image according to the feature matrix of the region-of-interest image and the difference between the benign feature weight parameters and the malignant feature weight parameters corresponding to the static image classification neural network model; and
    extracting benign responsible frames according to the benign feature matrices of the frames of region-of-interest images until the second preset end condition is met.
10. The responsible frame extraction method according to claim 9, wherein obtaining the malignant feature matrix of the region-of-interest image according to the feature matrix of the region-of-interest image and the difference between the malignant feature weight parameters and the benign feature weight parameters corresponding to the static image classification neural network model comprises:
    obtaining the malignant feature matrix of the region-of-interest image according to the following formulas:
    [FM]_i = [fm_i^1, fm_i^2, …, fm_i^k]
    fm_i^j = max(0, x_i^j × (w_1^j - w_0^j))
    where x_i^j denotes the feature value of the j-th feature dimension in the feature matrix of the i-th frame of region-of-interest image, w_1^j denotes the malignant feature weight of the j-th feature dimension corresponding to the static image classification neural network model, w_0^j denotes the benign feature weight of the j-th feature dimension corresponding to the static image classification neural network model, fm_i^j denotes the malignant feature value of the j-th feature dimension in the malignant feature matrix of the i-th frame of region-of-interest image, and [FM]_i denotes the malignant feature matrix of the i-th frame of region-of-interest image; and/or
    所述根据所述感兴趣区域图像的特征矩阵以及所述静态图像分类神经网络模型所对应的良性特征权重参数和恶性特征权重参数之差,获取所述感兴趣区域图像的良性特征矩阵,包括:The acquisition of the benign feature matrix of the ROI image according to the feature matrix of the ROI image and the difference between the benign feature weight parameters and the malignant feature weight parameters corresponding to the static image classification neural network model includes:
    按照如下公式,获取所述感兴趣区域图像的良性特征矩阵:Obtain the benign feature matrix of the ROI image according to the following formula:
    Figure PCTCN2022134699-appb-100007
    Figure PCTCN2022134699-appb-100007
    Figure PCTCN2022134699-appb-100008
    Figure PCTCN2022134699-appb-100008
    式中,
    Figure PCTCN2022134699-appb-100009
    表示第i帧感兴趣区域图像的特征矩阵中的第j个特征维度的特征值,
    Figure PCTCN2022134699-appb-100010
    表示所述静态图像分类神经网络模型所对应的第j个特征维度的良性特征权重,
    Figure PCTCN2022134699-appb-100011
    表示所述静态图像分类神经网络模型所对应的第j个特征维度的恶性特征权重,
    Figure PCTCN2022134699-appb-100012
    表示第i帧感兴趣区域图像的良性特征矩阵中的第j个特征维度的良性特征值,[FB] i表示第i帧感兴趣区域图像的良性特征矩阵。
    In the formula,
    Figure PCTCN2022134699-appb-100009
    Represents the eigenvalue of the jth feature dimension in the feature matrix of the i-th frame region of interest image,
    Figure PCTCN2022134699-appb-100010
    Indicates the benign feature weight of the jth feature dimension corresponding to the static image classification neural network model,
    Figure PCTCN2022134699-appb-100011
    Indicates the malignant feature weight of the jth feature dimension corresponding to the static image classification neural network model,
    Figure PCTCN2022134699-appb-100012
    Indicates the benign eigenvalue of the jth feature dimension in the benign feature matrix of the i-th frame ROI image, and [FB] i indicates the benign feature matrix of the i-th frame ROI image.
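For illustration only (not part of the claims), the weight-difference computation of claim 10 can be sketched in Python as follows. The per-frame feature matrix is assumed here to be a one-dimensional vector with one value per feature dimension, and w_malignant / w_benign stand in for the malignant and benign classification weights of the static image classification neural network model; all names are illustrative.

```python
import numpy as np

def malignant_benign_feature_matrices(frame_features, w_malignant, w_benign):
    """Per-frame malignant/benign feature matrices, in the spirit of claim 10.

    frame_features: array of shape (num_frames, num_dims), one feature vector
        per region-of-interest image (assumed flattened for this sketch).
    w_malignant, w_benign: arrays of shape (num_dims,), the per-dimension
        malignant and benign weights of the static image classifier.
    """
    fm = frame_features * (w_malignant - w_benign)  # FM_i^j
    fb = frame_features * (w_benign - w_malignant)  # FB_i^j
    return fm, fb
```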
  11. The responsible frame extraction method according to claim 9, wherein the extracting malignant responsible frames according to the malignant feature matrix of each frame of region-of-interest image until the first preset end condition is met comprises:
    for each frame of region-of-interest image, summing the malignant feature values of all feature dimensions in the malignant feature matrix of the region-of-interest image, to obtain a total malignant feature value of the region-of-interest image;
    extracting malignant responsible frames according to the total malignant feature value of each frame of region-of-interest image, until the first preset end condition is met; and/or
    the extracting benign responsible frames according to the benign feature matrix of each frame of region-of-interest image until the second preset end condition is met comprises:
    for each frame of region-of-interest image, summing the benign feature values of all feature dimensions in the benign feature matrix of the region-of-interest image, to obtain a total benign feature value of the region-of-interest image;
    extracting benign responsible frames according to the total benign feature value of each frame of region-of-interest image, until the second preset end condition is met.
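Continuing the sketch above (same assumptions and array shapes), the total malignant and benign feature values of claim 11 would simply be per-frame sums over the feature dimensions:

```python
def total_scores(fm, fb):
    """Total malignant / benign feature value per frame (claim 11): the sum of
    that frame's per-dimension malignant / benign feature values."""
    m_score = fm.sum(axis=1)  # shape (num_frames,)
    b_score = fb.sum(axis=1)
    return m_score, b_score
```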
  12. The responsible frame extraction method according to claim 11, wherein the extracting malignant responsible frames according to the total malignant feature value of each frame of region-of-interest image until the first preset end condition is met comprises:
    step A10: sorting the total malignant feature values of all frames of region-of-interest images, and determining the region-of-interest image with the largest total malignant feature value as a malignant responsible frame;
    step A20: for each non-malignant responsible frame, forming a first image set consisting of all the malignant responsible frames and that non-malignant responsible frame, and calculating the total malignant feature value of each first image set, wherein the total malignant feature value of a first image set equals the sum of the malignant feature values of all feature dimensions in the malignant feature matrix obtained by performing a maximum pooling operation on the malignant feature matrices of all frames of region-of-interest images in the first image set, and a non-malignant responsible frame is a region-of-interest image that has not been determined as a malignant responsible frame;
    step A30: judging whether the malignant feature entropy corresponding to the first image set with the smallest total malignant feature value is greater than the malignant feature entropy corresponding to the malignant responsible frame set composed of all the malignant responsible frames;
    if not, performing step A40; if yes, performing step A50;
    step A40: determining all frames of region-of-interest images in the first image set with the smallest total malignant feature value as malignant responsible frames, and returning to step A20;
    step A50: ending the extraction of malignant responsible frames; and/or
    the extracting benign responsible frames according to the total benign feature value of each frame of region-of-interest image until the second preset end condition is met comprises:
    step B10: sorting the total benign feature values of all frames of region-of-interest images, and determining the region-of-interest image with the largest total benign feature value as a benign responsible frame;
    step B20: for each non-benign responsible frame, forming a second image set consisting of all the benign responsible frames and that non-benign responsible frame, and calculating the total benign feature value of each second image set, wherein the total benign feature value of a second image set equals the sum of the benign feature values of all feature dimensions in the benign feature matrix obtained by performing a maximum pooling operation on the benign feature matrices of all frames of region-of-interest images in the second image set, and a non-benign responsible frame is a region-of-interest image that has not yet been determined as a benign responsible frame;
    step B30: judging whether the benign feature entropy corresponding to the second image set with the smallest total benign feature value is greater than the benign feature entropy corresponding to the benign responsible frame set composed of all the benign responsible frames;
    if not, performing step B40; if yes, performing step B50;
    step B40: determining all region-of-interest images in the second image set with the smallest total benign feature value as benign responsible frames, and returning to step B20;
    step B50: ending the extraction of benign responsible frames.
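A minimal sketch of the greedy loop of steps A10–A50 follows (the benign loop B10–B50 is symmetric). It reuses the per-frame matrices fm / fb from the earlier sketch, and feature_entropy is the malignant feature entropy of claim 13, for which a sketch is given after that claim below. This is one interpretation of the claimed procedure, not a reference implementation.

```python
import numpy as np

def extract_malignant_responsible_frames(fm, fb, feature_entropy):
    """Greedy extraction of malignant responsible frames (steps A10-A50)."""
    num_frames = fm.shape[0]
    # Step A10: the frame with the largest total malignant feature value starts the set.
    responsible = {int(np.argmax(fm.sum(axis=1)))}

    def set_scores(frames):
        idx = sorted(frames)
        pooled_fm = fm[idx].max(axis=0)  # max pooling over the frames in the set
        pooled_fb = fb[idx].max(axis=0)
        return pooled_fm.sum(), pooled_fb.sum()

    while len(responsible) < num_frames:
        # Step A20: pair the current responsible frames with each remaining frame.
        candidates = [f for f in range(num_frames) if f not in responsible]
        scored = [(set_scores(responsible | {f}), f) for f in candidates]
        (m_best, b_best), f_best = min(scored, key=lambda s: s[0][0])  # smallest total malignant value
        # Step A30: stop once adding a frame would raise the malignant feature entropy.
        if feature_entropy(m_best, b_best) > feature_entropy(*set_scores(responsible)):
            break  # Step A50
        responsible.add(f_best)  # Step A40
    return sorted(responsible)
```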
  13. The responsible frame extraction method according to claim 8, wherein the malignant feature entropy of an image set is calculated according to the following formulas:
    H_1(A) = −p_1(A) × log_2 p_1(A)
    p_1(A) = MScoreA / (MScoreA + BScoreA)
    where H_1(A) denotes the malignant feature entropy of the image set A, MScoreA denotes the total malignant feature value of the image set A, and BScoreA denotes the total benign feature value of the image set A; and/or
    the benign feature entropy of an image set is calculated according to the following formulas:
    H_0(A) = −p_0(A) × log_2 p_0(A)
    p_0(A) = BScoreA / (MScoreA + BScoreA)
    where H_0(A) denotes the benign feature entropy of the image set A, MScoreA denotes the total malignant feature value of the image set A, and BScoreA denotes the total benign feature value of the image set A.
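Assuming p_1 and p_0 are the malignant and benign shares of the set's total feature value as written above, the two entropies can be computed directly; the guard against non-positive proportions is a design choice of this sketch, not part of the claim.

```python
import numpy as np

def malignant_feature_entropy(m_score, b_score):
    """H1(A) = -p1(A) * log2 p1(A), with p1(A) = MScoreA / (MScoreA + BScoreA)."""
    p1 = m_score / (m_score + b_score)
    return -p1 * np.log2(p1) if p1 > 0 else 0.0

def benign_feature_entropy(m_score, b_score):
    """H0(A) = -p0(A) * log2 p0(A), with p0(A) = BScoreA / (MScoreA + BScoreA)."""
    p0 = b_score / (m_score + b_score)
    return -p0 * np.log2(p0) if p0 > 0 else 0.0
```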
  14. The responsible frame extraction method according to claim 8, wherein the using a target detection neural network model to extract a region of interest from each frame of image in the acquired video to be extracted, so as to obtain the region-of-interest image corresponding to each frame of image, comprises:
    using the target detection neural network model to extract a region of interest from each frame of image in the acquired video to be extracted, so as to obtain position information of the region of interest corresponding to each frame of image;
    cropping the corresponding region from each frame of image according to the position information of the region of interest corresponding to that frame of image, so as to obtain the region-of-interest image corresponding to each frame of image.
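As a rough illustration of claim 14, with a hypothetical detector callback that returns one bounding box per frame in pixel coordinates (the detector itself is outside the scope of this sketch):

```python
def crop_rois(frames, detector):
    """Crop one region-of-interest image per frame (claim 14 sketch).

    frames: list of H x W (x C) numpy arrays.
    detector: assumed to return (x_min, y_min, x_max, y_max) for a frame.
    """
    rois = []
    for frame in frames:
        x_min, y_min, x_max, y_max = detector(frame)
        rois.append(frame[y_min:y_max, x_min:x_max])  # crop the detected region
    return rois
```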
  15. A video classification method, comprising:
    extracting a preset number of responsible frames from an acquired video by using the responsible frame extraction method according to any one of claims 1 to 14;
    performing a maximum pooling operation on the feature matrices of the preset number of responsible frames, to obtain a feature matrix of the responsible frame set; and
    classifying the video according to the feature matrix of the responsible frame set.
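Claim 15 can be pictured as max-pooling the responsible-frame feature matrices into a single set-level feature vector and handing it to a classifier; a minimal sketch, with illustrative names, assuming per-frame features are stacked into a 2-D array:

```python
def classify_video(responsible_features, classifier):
    """responsible_features: array of shape (num_responsible, num_dims).
    classifier: any fitted model exposing predict(), e.g. a random forest."""
    set_features = responsible_features.max(axis=0, keepdims=True)  # max pooling
    return classifier.predict(set_features)[0]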
  16. The video classification method according to claim 15, wherein the classifying the video according to the feature matrix of the responsible frame set comprises:
    inputting the feature matrix of the responsible frame set into a video classification model, to classify the video.
  17. The video classification method according to claim 16, wherein the video classification model is a random forest classification model.
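If the video classification model is a random forest as in claim 17, scikit-learn's RandomForestClassifier is one possible realization; the data below is synthetic and only shows the call pattern, with assumed feature dimensionality and labels (0 = benign, 1 = malignant):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 512))      # one pooled feature vector per labelled video
y_train = rng.integers(0, 2, size=200)     # synthetic benign/malignant labels

video_classifier = RandomForestClassifier(n_estimators=100, random_state=0)
video_classifier.fit(X_train, y_train)

new_video_features = rng.normal(size=(1, 512))  # feature matrix of a responsible frame set
predicted_label = video_classifier.predict(new_video_features)[0]
```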
  18. The video classification method according to claim 15, wherein the video classification method further comprises:
    displaying the classification result of the video and the extracted preset number of responsible frames.
  19. An electronic device, comprising a processor and a memory, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the responsible frame extraction method according to any one of claims 1 to 14 or the video classification method according to any one of claims 15 to 18 is implemented.
  20. A readable storage medium, wherein a computer program is stored in the readable storage medium, and when the computer program is executed by a processor, the responsible frame extraction method according to any one of claims 1 to 14 or the video classification method according to any one of claims 15 to 18 is implemented.
PCT/CN2022/134699 2021-12-21 2022-11-28 Responsibility frame extraction method, video classification method, device and medium WO2023116351A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202111572826.1 2021-12-21
CN202111572826.1A CN116343073A (en) 2021-12-21 2021-12-21 Responsibility frame extraction method, video classification method, equipment and medium
CN202210639251.9 2022-06-07
CN202210639251.9A CN117237263A (en) 2022-06-07 2022-06-07 Responsibility frame extraction method, medical video classification method, equipment and medium

Publications (1)

Publication Number Publication Date
WO2023116351A1 (en) 2023-06-29

Family

ID=86901197

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/134699 WO2023116351A1 (en) 2021-12-21 2022-11-28 Responsibility frame extraction method, video classification method, device and medium

Country Status (1)

Country Link
WO (1) WO2023116351A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120283569A1 (en) * 2011-05-04 2012-11-08 Boston Scientific Scimed, Inc. Systems and methods for navigating and visualizing intravascular ultrasound sequences
CN110569702A (en) * 2019-02-14 2019-12-13 阿里巴巴集团控股有限公司 Video stream processing method and device
CN111160191A (en) * 2019-12-23 2020-05-15 腾讯科技(深圳)有限公司 Video key frame extraction method and device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Huijuan Xu, Subhashini Venugopalan, Vasili Ramanishka, Marcus Rohrbach, Kate Saenko: "A Multi-scale Multiple Instance Video Description Network", 19 March 2016 (2016-03-19), XP055462088, retrieved from the Internet <URL:https://arxiv.org/pdf/1505.05914.pdf> [retrieved on 2018-03-22] *

Similar Documents

Publication Publication Date Title
US20200250497A1 (en) Image classification method, server, and computer-readable storage medium
JP7086336B2 (en) Tissue nodule detection method and its model Training method, equipment, equipment, system, and its computer program
Lee et al. Detection and classification of intracranial haemorrhage on CT images using a novel deep-learning algorithm
Al-Antari et al. Deep learning computer-aided diagnosis for breast lesion in digital mammogram
US11182894B2 (en) Method and means of CAD system personalization to reduce intraoperator and interoperator variation
WO2021164306A1 (en) Image classification model training method, apparatus, computer device, and storage medium
TWI754195B (en) Image processing method and device, electronic device and computer-readable storage medium
WO2018120942A1 (en) System and method for automatically detecting lesions in medical image by means of multi-model fusion
Murakami et al. Automatic identification of bone erosions in rheumatoid arthritis from hand radiographs based on deep convolutional neural network
US10726948B2 (en) Medical imaging device- and display-invariant segmentation and measurement
Byra et al. Impact of ultrasound image reconstruction method on breast lesion classification with deep learning
Shia et al. Classification of malignant tumours in breast ultrasound using unsupervised machine learning approaches
Wankhade et al. A novel hybrid deep learning method for early detection of lung cancer using neural networks
Zhao et al. Bascnet: Bilateral adaptive spatial and channel attention network for breast density classification in the mammogram
CN111524109A (en) Head medical image scoring method and device, electronic equipment and storage medium
Hu et al. A multi-instance networks with multiple views for classification of mammograms
Sendra-Balcells et al. Generalisability of fetal ultrasound deep learning models to low-resource imaging settings in five African countries
Younas et al. An ensemble framework of deep neural networks for colorectal polyp classification
WO2023116351A1 (en) Responsibility frame extraction method, video classification method, device and medium
Nemade et al. Deep learning-based ensemble model for classification of breast cancer
Lee et al. Computational discrimination of breast cancer for Korean women based on epidemiologic data only
CN116416221A (en) Ultrasonic image analysis method
Kim et al. Prediction of locations in medical images using orthogonal neural networks
CN116343073A (en) Responsibility frame extraction method, video classification method, equipment and medium
CN117237263A (en) Responsibility frame extraction method, medical video classification method, equipment and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22909666

Country of ref document: EP

Kind code of ref document: A1