CN110599421A - Model training method, video fuzzy frame conversion method, device and storage medium - Google Patents


Info

Publication number
CN110599421A
Authority
CN
China
Prior art keywords
sample
frame
video
generator
optical flow
Prior art date
Legal status
Granted
Application number
CN201910865820.XA
Other languages
Chinese (zh)
Other versions
CN110599421B (en)
Inventor
陈思宏 (Chen Sihong)
郑冶枫 (Zheng Yefeng)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910865820.XA
Publication of CN110599421A
Application granted
Publication of CN110599421B
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/73 - Deblurring; Sharpening
    • G06T7/00 - Image analysis
    • G06T7/0002 - Inspection of images, e.g. flaw detection
    • G06T7/0012 - Biomedical image inspection
    • G06T7/20 - Analysis of motion
    • G06T7/269 - Analysis of motion using gradient-based methods
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G06T2207/10132 - Ultrasound image
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30004 - Biomedical image processing
    • G06T2207/30068 - Mammography; Breast
    • G06T2207/30168 - Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a model training method, a video fuzzy frame conversion method, a device and a storage medium, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring a sample video; extracting a training data pair from the sample video, wherein the training data pair comprises a sample clear frame and a sample fuzzy frame, and the sample clear frame and the sample fuzzy frame belong to the same video frame interval; inputting the training data pair into a generator of an image generation model to obtain a sample generation frame and sample optical flow information output by the generator; inputting the sample generation frame into a discriminator of the image generation model to obtain a discrimination result output by the discriminator; and training the image generation model according to the training data pair, the sample generation frame, the sample optical flow information and the discrimination result. When the image generation model obtained by the training in the embodiment of the application is used for converting the fuzzy video frames in a video, the time sequence consistency among the video frames can be kept, the effect of converting fuzzy video frames into clear video frames is improved, and the quality of the converted video is improved.

Description

Model training method, video fuzzy frame conversion method, device and storage medium
Technical Field
The embodiment of the application relates to the field of artificial intelligence, in particular to a model training method, a video fuzzy frame conversion method, equipment and a storage medium.
Background
Medical imaging is a technique for imaging a human body or a designated part of the human body in a non-invasive manner, and common medical imaging methods include Ultrasound (Ultrasound) imaging, Computed Tomography (CT) imaging, Nuclear Magnetic Resonance (NMR) imaging, and the like.
Under the influence of various factors, a large number of artifacts may exist in a medical image obtained through medical imaging, so that the imaging quality is poor; moreover, the contrast of the medical image is low and lesion edges are blurred, which adversely affects subsequent lesion segmentation and diagnosis. In the related art, a pre-trained image generation model is usually adopted for sharpness conversion: a blurred medical image is converted into a clear medical image, so that the quality of the medical image is improved and the accuracy of subsequent lesion segmentation and diagnosis is further improved. The image generation model is usually trained based on a Cycle-Consistent Generative Adversarial Network (Cycle-GAN).
However, for a medical video including blurred video frames and clear video frames (both being medical images obtained by medical imaging), Cycle-GAN is not specially designed for video data; therefore, the effect of converting blurred video frames into clear video frames by using the image generation model in the related art is poor, and the quality of the converted medical video cannot be guaranteed.
Disclosure of Invention
The embodiment of the application provides a model training method, a video fuzzy frame conversion method, equipment and a storage medium, and can solve the problems that the conversion effect of converting a fuzzy video frame into a clear video frame is poor and the quality of a converted medical video cannot be guaranteed by using an image generation model in the related technology. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a model training method, where the method includes:
obtaining a sample video, wherein the sample video is obtained through medical imaging;
extracting a training data pair from the sample video, wherein the training data pair comprises a sample clear frame and a sample fuzzy frame, and the sample clear frame and the sample fuzzy frame belong to the same video frame interval;
inputting the training data pair into a generator of an image generation model to obtain a sample generation frame and sample optical flow information output by the generator, wherein the sample generation frame is obtained by performing style conversion on the sample sharp frame and the sample fuzzy frame, and the sample optical flow information is used for indicating the optical flow transformation in the style conversion process;
inputting the sample generating frame into a discriminator of the image generating model to obtain a discrimination result output by the discriminator;
and training the image generation model according to the training data pair, the sample generation frame, the sample optical flow information and the discrimination result.
In another aspect, an embodiment of the present application provides a video blurred frame conversion method, where the method includes:
acquiring a target video, wherein the target video is obtained through medical imaging;
determining a blurred frame in the target video;
inputting the blurred frame into a generator of an image generation model to obtain a sharp generated frame output by the generator, wherein the image generation model comprises the generator and a discriminator, and the image generation model is obtained by training according to a training data pair, a sample generated frame, sample optical flow information and a discrimination result; the training data pair comprises a sample sharp frame and a sample blurred frame which belong to the same video frame interval in a sample video, the sample generated frame is obtained by the generator performing style conversion on the sample sharp frame and the sample blurred frame, the sample optical flow information is output when the generator performs style conversion and is used for indicating the optical flow transformation in the style conversion process, and the discrimination result is obtained by the discriminator discriminating the sample generated frame.
In another aspect, an embodiment of the present application provides a model training apparatus, where the apparatus includes:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a sample video, and the sample video is obtained through medical imaging;
the extraction module is used for extracting a training data pair from the sample video, wherein the training data pair comprises a sample clear frame and a sample fuzzy frame, and the sample clear frame and the sample fuzzy frame belong to the same video frame interval;
the first generation module is used for inputting the training data pair into a generator of an image generation model to obtain a sample generation frame and sample optical flow information output by the generator, wherein the sample generation frame is obtained by performing style conversion on the sample clear frame and the sample fuzzy frame, and the sample optical flow information is used for indicating the optical flow transformation in the style conversion process;
the judging module is used for inputting the sample generating frame into a discriminator of the image generating model to obtain a judging result output by the discriminator;
and the training module is used for training the image generation model according to the training data pair, the sample generation frame, the sample optical flow information and the discrimination result.
In another aspect, an embodiment of the present application provides a video blurred frame conversion apparatus, where the apparatus includes:
the second acquisition module is used for acquiring a target video, and the target video is obtained through medical imaging;
a determining module, configured to determine a blurred frame in the target video;
the second generation module is used for inputting the blurred frame into a generator of an image generation model to obtain a clear generated frame output by the generator, wherein the image generation model comprises the generator and a discriminator, and the image generation model is obtained by training according to a training data pair, a sample generated frame, sample optical flow information and a discrimination result; the training data pair comprises a sample clear frame and a sample blurred frame which belong to the same video frame interval in a sample video, the sample generated frame is obtained by the generator performing style conversion on the sample clear frame and the sample blurred frame, the sample optical flow information is output when the generator performs style conversion and is used for indicating the optical flow transformation in the style conversion process, and the discrimination result is obtained by the discriminator discriminating the sample generated frame.
In another aspect, embodiments of the present application provide a computer device, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the model training method according to the above aspect, or to implement the video blurred frame conversion method according to the above aspect.
In another aspect, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the model training method as described in the above aspect or to implement the video blur frame conversion method as described in the above aspect.
In another aspect, a computer program product is provided, which, when run on a computer, causes the computer to perform the model training method as described in the above aspect or to perform the video blur frame conversion method as described in the above aspect.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
in the embodiment of the application, after a training data pair comprising a sample clear frame and a sample fuzzy frame is extracted from a sample video, the training data pair is input into a generator of an image generation model to obtain a sample generation frame and sample optical flow information output by the generator, the sample generation frame is further input into a discriminator of the image generation model to obtain a discrimination result output by the discriminator, and therefore the image generation model is trained according to the training data pair, the sample generation frame, the sample optical flow information and the discrimination result; by taking the sample fuzzy frame and the sample clear frame with the time sequence relation as the training data pair and adding the optical flow information to constrain the training process of the image generation model, the time sequence consistency among the video frames can be kept when the fuzzy video frames in the video are converted by the image generation model obtained by the training in the follow-up process, the conversion effect of converting the fuzzy video frames into the clear video frames is improved, and the quality of the converted video is further improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of a model training method provided in an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of an implementation of a blur frame restoration process for a medical video using an image generation model;
FIG. 3 illustrates a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application;
FIG. 4 illustrates a flow chart of a model training method provided by an exemplary embodiment of the present application;
FIG. 5 illustrates a flow chart of a model training method provided by another exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of a video frame interval provided by an exemplary embodiment;
FIG. 7 is a flow chart of a process for extracting training data pairs in the model training method of FIG. 5;
FIG. 8 is a schematic diagram of an implementation of the model training method shown in FIG. 5;
FIG. 9 is a block diagram of a residual block provided by an exemplary embodiment;
FIG. 10 is a flow chart illustrating a video blur frame conversion method provided by an exemplary embodiment of the present application;
FIG. 11 is a block diagram of a model training apparatus according to an exemplary embodiment of the present application;
fig. 12 is a block diagram illustrating a structure of a video blur frame conversion apparatus according to an exemplary embodiment of the present application;
fig. 13 shows a schematic structural diagram of a computer device provided in an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
In the related art, in order to convert a blurred video frame in a video into a clear video frame and improve the video quality, the Cycle-GAN is generally adopted to perform style conversion on the blurred video frame to generate a corresponding clear video frame, and the generated clear video frame is used to replace the blurred video frame, so that the video quality is improved.
However, Cycle-GAN is not specially designed for video data, and only the image style is taken as a constraint when training Cycle-GAN, so Cycle-GAN focuses only on the change of image style when it performs style conversion on blurred video frames. For a video obtained by medical imaging, the image styles of any two video frames in the same video are similar and the frames are temporally continuous; when the Cycle-GAN provided in the related art is directly used to convert the blurred video frames in such a video into clear video frames, the temporal continuity of the converted clear video frames cannot be maintained, situations such as a deviation of the lesion position between the video frames before and after conversion easily occur, and the quality of the converted video is finally affected.
In order to solve the problems in the related art, an embodiment of the present application provides a model training method, the model training principle of which is shown in fig. 1. For the sample video 11 obtained through medical imaging, in order to enable the model to learn the timing information between video frames during training, sample video frames belonging to the same video frame interval are first extracted from the sample video 11, and a sample training pair 12 is generated by pairing a sample clear frame 121 and a sample fuzzy frame 122 among these sample video frames. Further, the sample training pair 12 is input into the image generation model 13 (based on Cycle-GAN) to be trained, and the generator 131 performs style conversion on the sample clear frame 121 or the sample fuzzy frame 122 to obtain a sample generation frame 14.
Unlike the generator in the related art that can only perform the style conversion, the generator 131 in the embodiment of the present application can predict the optical flow information during the style conversion at the same time of performing the style conversion, thereby outputting the sample optical flow information 15 corresponding to the sample generation frame 14.
For the sample generation frame 14 output by the generator 131, the image generation model 13 determines the sample generation frame 14 through the discriminator 132 (i.e. determines whether the sample generation frame 14 is a real video frame or a generated video frame after style conversion), and obtains a corresponding discrimination result 16.
In order to enable the image generation model 13 to learn the timing information between the video frames, the sample optical flow information 15 is added in the process of training the image generation model 13; that is, optical flow information is added as a timing constraint in the process of training the generator, so that the video frames output by the generator maintain timing continuity, and a large deviation of the lesion position between the video frames before and after conversion is avoided.
The image generation model trained by the model training method provided by the embodiment of the application can be used in scenarios where blurred video frames need to be restored. In one possible embodiment, the image generation model may be implemented as all or part of a video processing application; the application may be installed on a medical imaging device to directly process the medical video output by the medical imaging device, or may be installed on a terminal used by medical personnel so that videos can be selectively processed with the video processing application.
Illustratively, as shown in fig. 2, when a video processing application is installed on a terminal used by a medical staff, medical videos corresponding to respective patients are displayed on a user interface 21 of the video processing application, and the medical staff can select to perform blur restoration processing on a certain medical video, or even a specified video segment in the certain medical video. When medical staff select to carry out fuzzy reduction processing on the medical video corresponding to the patient D, the medical video is input into the image generation model, and the image generation model converts the fuzzy frame in the medical video into a clear frame. After the blur restoration process is completed, the terminal displays the processed medical video 22 for the medical staff to view.
Further, medical staff can use the processed medical video to perform subsequent diagnosis such as lesion classification, lesion image segmentation, lesion detection and the like. Because the blurred frames in the medical video are restored to the clear frames, diagnosis based on the processed medical video helps to improve the diagnosis accuracy.
Of course, besides being applied to the above-mentioned scenes, the method provided in the embodiment of the present application may also be applied to other scenes that need to restore the blurred frame in the video, and the embodiment of the present application does not limit a specific application scene.
The model training method provided by the embodiment of the application can be applied to computer equipment with stronger data processing capacity, and the computer equipment can be a personal computer or a server. The image generation model obtained by training through the model training method can be realized as an application program or a part of the application program and is installed in the terminal, so that the terminal has the function of restoring the fuzzy frame in the video into a clear frame; or the method can be applied to a background server of the application program, so that the server performs fuzzy frame restoration on the video provided by the application program in the terminal.
Referring to fig. 3, a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application is shown. The implementation environment includes a terminal 310 and a server 320, where the terminal 310 and the server 320 perform data communication through a communication network, optionally, the communication network may be a wired network or a wireless network, and the communication network may be at least one of a local area network, a metropolitan area network, and a wide area network.
The terminal 310 is installed with an application program having a video blurred frame restoration requirement. The application program may be a medical video processing application program, a video-based lesion detection application program, and the like, which is not limited in this application. Optionally, the terminal 310 may be a mobile phone, a tablet computer, a laptop portable notebook computer, or a terminal such as a desktop computer, a projection computer, and the like, which is not limited in this embodiment of the application.
The server 320 may be implemented as one server, or may be implemented as a server cluster formed by a group of servers, which may be physical servers or cloud servers. In one possible implementation, server 320 is a backend server for applications in terminal 310.
As shown in fig. 3, in the embodiment of the present application, a blurred frame extraction module 321 and a pre-trained image generation model 322 are disposed in the server 320. In a possible application scenario, the terminal 310 uploads a video to be processed to the server 320; for the received video to be processed, the server 320 first extracts the blurred video frames from the video to be processed through the blurred frame extraction module 321, then inputs the extracted blurred video frames into the image generation model 322, and the blurred video frames are converted into clear video frames by the image generation model 322. After each blurred video frame in the video to be processed is converted into a clear video frame, the server 320 generates a video after blur restoration and sends the video to the terminal 310 for the terminal 310 to play.
In other possible embodiments, the blurred frame extraction module 321 and the image generation model 322 may also be implemented as part or all of an application program, and accordingly, the terminal 310 may perform blurred frame restoration locally without using the server 320, which is not limited in this embodiment.
For convenience of description, the following embodiments are described as examples of a model training method performed by a computer device.
Referring to fig. 4, a flowchart of a model training method provided in an exemplary embodiment of the present application is shown. The embodiment is described by taking the method as an example for a computer device, and the method comprises the following steps.
Step 401, a sample video is obtained, and the sample video is obtained through medical imaging.
Wherein the sample video may be a video obtained by medical imaging, and the sample video does not contain annotation information. For example, the sample video may be an ultrasound video obtained by ultrasound imaging, a CT video obtained by CT imaging, or an NMR video obtained by NMR imaging, or the like. For convenience of description, the following embodiments are described by taking the sample video as the ultrasound video as an example, but the embodiments are not limited thereto.
Step 402, extracting a training data pair from the sample video, wherein the training data pair comprises a sample clear frame and a sample fuzzy frame, and the sample clear frame and the sample fuzzy frame belong to the same video frame interval.
In order to enable the image generation model to learn timing information between video frames, the computer device first needs to extract a sample sharp frame and a sample fuzzy frame with continuous timing from the sample video, so as to generate a training data pair by pairing the two frames.
In one possible implementation, because the video frames within the same video frame interval of a video vary little over time and remain temporally continuous, the computer device extracts the sample sharp frame and the sample blurred frame from consecutive video frames within the same video frame interval.
For example, the video frame interval is 7 frames, i.e., the sample sharp frame and the sample blurred frame are two video frames in the continuous 7 video frames.
Step 403, inputting the training data pair into a generator of the image generation model, and obtaining a sample generation frame and sample optical flow information output by the generator, where the sample generation frame is obtained by performing style conversion on the sample sharp frame and the sample blurred frame, and the sample optical flow information is used for indicating the optical flow transformation in the style conversion process.
The image generation model in the embodiment of the application adopts Cycle-GAN, which comprises a generator and a discriminator. Wherein the generator is for performing a style conversion on an input video frame, the generator may comprise a first generator for converting an input sample blurred frame into a sharp frame, and a second generator for converting an input sample sharp frame into a blurred frame.
Different from the related art, in which the generator in Cycle-GAN can only output the image after style conversion, the generator in the embodiment of the application can, in addition to performing style conversion on the video frame, predict the optical flow transformation of the video frame during the style conversion, so that the time-sequence change of the video frame is reflected by the optical flow transformation.
In one possible implementation, the computer device inputs the sample blurred frame into the first generator, and obtains the sample sharp generation frame and the first sample optical flow information output by the first generator; and inputting the sample sharp frame into a second generator to obtain a sample fuzzy generation frame and second sample optical flow information output by the second generator. The sample optical flow information may be in the form of an optical flow transformation displacement map, which is not limited in this embodiment.
Step 404, inputting the sample generation frame into the discriminator of the image generation model to obtain the discrimination result output by the discriminator.
The discriminator is used for discriminating the probability that an input video frame is a real (ground truth) frame or a generated frame, and the discriminator may include a first discriminator used for discriminating blurred frames (i.e. discriminating whether an input blurred frame is a real frame or a generated frame) and a second discriminator used for discriminating sharp frames (i.e. discriminating whether an input sharp frame is a real frame or a generated frame).
In a possible implementation mode, the computer equipment inputs the sample clear generation frame output by the first generator into the second discriminator to obtain a discrimination result output by the second discriminator; and inputting the sample fuzzy generation frame output by the second generator into the first discriminator to obtain a discrimination result output by the first discriminator.
Step 405, training an image generation model according to the training data pair, the sample generation frame, the sample optical flow information and the discrimination result.
In order to enable the image generation model to learn the time sequence change of the video frame in the training process, on the basis of the training data pair, the sample generation frame and the judgment result, the computer equipment adds the sample optical flow information to constrain the training process.
In one possible implementation, the computer device takes the sample fuzzy frame as the supervision information of the sample fuzzy generation frame in the sample generation frames, and takes the sample sharp frame as the supervision information of the sample sharp generation frame in the sample generation frames; takes the labels of the sample fuzzy frame and the sample clear frame as the supervision information of the discrimination result; and takes the optical flow information between the sample clear frame and the sample fuzzy frame as the supervision information of the sample optical flow information, so as to train the generator and the discriminator in the image generation model. The generator and the discriminator may be trained by a gradient descent algorithm or a back propagation algorithm, which is not limited in the embodiment of the present application.
Optionally, in the process of using the trained image generation model, the blurred frame in the target video is input into the generator, and the (first) generator generates a sharp generation frame according to the blurred frame, so as to implement blurred frame restoration.
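As an illustration of how the trained (first) generator might be applied at inference time, the following Python sketch restores a single blurred frame; the generator interface (returning a frame and an optical flow map), the preprocessing and the image size are assumptions for illustration and are not specified by the embodiment.

    import torch
    from torchvision import transforms
    from PIL import Image

    # Assumed preprocessing: single-channel ultrasound frame resized to 256x256, scaled to [-1, 1].
    preprocess = transforms.Compose([
        transforms.Grayscale(num_output_channels=1),
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5], std=[0.5]),
    ])

    def restore_blurred_frame(generator_g, blurred_frame: Image.Image) -> torch.Tensor:
        """Convert one blurred video frame into a sharp generated frame."""
        generator_g.eval()
        x = preprocess(blurred_frame).unsqueeze(0)       # shape (1, 1, 256, 256)
        with torch.no_grad():
            sharp_frame, _flow = generator_g(x)          # the optical flow output is unused at inference
        return sharp_frame.squeeze(0)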
In summary, in the embodiment of the present application, after a training data pair including a sample sharp frame and a sample fuzzy frame is extracted from a sample video, the training data pair is input to a generator of an image generation model to obtain a sample generation frame and sample optical flow information output by the generator, and the sample generation frame is further input to a discriminator of the image generation model to obtain a discrimination result output by the discriminator, so that the image generation model is trained according to the training data pair, the sample generation frame, the sample optical flow information, and the discrimination result; by taking the sample fuzzy frame and the sample clear frame with the time sequence relation as the training data pair and adding the optical flow information to constrain the training process of the image generation model, the time sequence consistency among the video frames can be kept when the fuzzy video frames in the video are converted by the image generation model obtained by the training in the follow-up process, the conversion effect of converting the fuzzy video frames into the clear video frames is improved, and the quality of the converted video is further improved.
In one possible implementation, since the generator in the image generation model simultaneously outputs the sample generation frame and the sample optical flow information, the computer device needs to use the optical flow transformation loss as a part of the cyclic consistency loss in addition to the style transformation loss as the cyclic consistency loss of the generator in the process of training the image generation model, so that the generator can learn the time-series transformation condition of the video frame. The following description will be made by using exemplary embodiments.
Referring to fig. 5, a flowchart of a model training method provided in another exemplary embodiment of the present application is shown. The embodiment is described by taking the method as an example for a computer device, and the method comprises the following steps.
Step 501, a sample video is obtained, and the sample video is obtained through medical imaging.
The step 401 may be referred to in the implementation manner of this step, and this embodiment is not described herein again.
Step 502, acquiring n sample video frames belonging to the same video frame interval in a sample video.
In one possible implementation, the computer device divides the sample video into different video frame intervals according to a predetermined number of frames, where each video frame interval includes n consecutive sample video frames, and n may be, for example, 7, 10, 15, or the like.
In one illustrative example, as shown in FIG. 6, a computer device obtains 7 sample video frames in a video frame interval.
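As a minimal sketch of how a sample video might be split into video frame intervals of n consecutive frames (n = 7 here), assuming the frames have already been decoded into a list:

    from typing import List, TypeVar

    Frame = TypeVar('Frame')

    def split_into_intervals(frames: List[Frame], n: int = 7) -> List[List[Frame]]:
        """Group consecutive sample video frames into video frame intervals of n frames each.

        Trailing frames that cannot fill a whole interval are dropped here; the embodiment
        does not specify how a partial final interval is handled.
        """
        return [frames[i:i + n] for i in range(0, len(frames) - n + 1, n)]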
Step 503, for each sample video frame in the n sample video frames, inputting the sample video frame into the lesion detection model, and obtaining a lesion prediction result output by the lesion detection model.
When the sample video frame is a sharp frame, the outline of each lesion in the sample video frame is clear; correspondingly, when lesion detection is performed on the sample video frame, the completeness and accuracy of the detected lesions are high. When the sample video frame is a blurred frame, lesion detection on the sample video frame suffers from missed lesions and low lesion accuracy. Therefore, in a possible implementation manner, the computer device performs lesion prediction on each sample video frame through a lesion detection model obtained through pre-training, so as to determine the sample sharp frame and the sample blurred frame in the video frame interval according to the lesion prediction results.
In one possible embodiment, the computer device trains the lesion detection model using pictures containing labeling information (used for marking the positions of lesions in the pictures) as sample data, and the lesion detection model employs a convolutional neural network.
In one illustrative example, a computer device acquires 5000 breast ultrasound tumor pictures containing labeling information as sample data, and trains a breast tumor detection model for predicting the position and probability of a breast tumor in the pictures.
Step 504, determining a sample clear frame and a sample fuzzy frame in the n sample video frames according to the focus prediction results corresponding to the n sample video frames.
In a possible implementation manner, the lesion prediction result includes the number of predicted lesions and the prediction probability corresponding to each predicted lesion, and the computer device determines the sample sharp frame and the sample blurred frame among the sample video frames according to the number of lesions and the prediction probabilities. As shown in fig. 7, this step may include the following steps.
Step 504A, determining a lesion number threshold according to the number of lesions corresponding to each of the n sample video frames, where the lesion number threshold is the average or the median of these lesion numbers.
Optionally, the computer device determines the number of lesions corresponding to each sample video frame according to the lesion prediction result of that sample video frame, and determines the lesion number threshold by averaging or taking the median. When the number of lesions corresponding to a sample video frame is smaller than the lesion number threshold, the sample video frame is determined as a sample blurred frame; when the number of lesions corresponding to a sample video frame is not less than the lesion number threshold, whether the sample video frame is a sample sharp frame needs to be further determined according to the prediction probability of each predicted lesion.
Schematically, as shown in fig. 6, the computer device obtains the number of lesions (i.e., the predicted lesions in the white dotted boxes in fig. 6) corresponding to each of the 7 video frames, namely 2, 1, 1, 2, 1, 2 and 2, and determines that the lesion number threshold is 2 by taking the median.
Step 504B, if the number of lesions corresponding to the sample video frame is smaller than the lesion number threshold, determining the sample video frame as a sample blurred frame.
When a sample video frame is a blurred frame, the lesion detection model cannot detect the lesions with blurred edges, so the number of predicted lesions is small; correspondingly, the computer device can determine a sample video frame whose lesion number is smaller than the lesion number threshold as a sample blurred frame.
Illustratively, as shown in fig. 6, the computer device determines the 2nd, 3rd and 5th video frames as sample blurred frames according to the lesion number threshold of 2.
Step 504C, if the number of lesions corresponding to the sample video frame is not less than the lesion number threshold, obtaining the prediction probability corresponding to each predicted lesion in the sample video frame.
When the blur degree of a sample video frame is low, all the lesions can still be predicted by the lesion detection model, but the prediction probabilities corresponding to the predicted lesions are low (that is, the confidence is low); therefore, for a sample video frame whose lesion number is not less than the lesion number threshold, the computer device further obtains the prediction probability corresponding to each predicted lesion.
Illustratively, as shown in fig. 6, the computer device obtains prediction probabilities of 86.3% and 91.1% for the predicted lesions in the 1st sample video frame, 75.3% and 70.1% for the predicted lesions in the 4th sample video frame, 72.5% and 77.4% for the predicted lesions in the 6th sample video frame, and 74.1% and 70.5% for the predicted lesions in the 7th sample video frame.
Step 504D, if the prediction probabilities corresponding to the respective predicted lesions in the sample video frame are all greater than the probability threshold, determining the sample video frame as a sample sharp frame.
In a possible implementation manner, a probability threshold is preset in the computer device, and if the prediction probabilities corresponding to the respective predicted lesions in the sample video frame are all greater than the probability threshold, the computer device determines the sample video frame as a sample sharp frame. For example, the probability threshold is 85%, 90%, or the like.
Step 504E, if the prediction probability corresponding to any predicted lesion in the sample video frame is smaller than the probability threshold, determining the sample video frame as a sample blurred frame.
In contrast to step 504D, if there is a predicted lesion in the sample video frame whose prediction probability is smaller than the probability threshold, the computer device determines the sample video frame as a sample blurred frame.
Illustratively, as shown in fig. 6, since the 4th, 6th and 7th sample video frames contain predicted lesions whose prediction probabilities are smaller than the probability threshold (85%), the computer device determines the 1st sample video frame as a sample sharp frame and determines the 2nd to 7th sample video frames as sample blurred frames.
Of course, in addition to determining the sample blurred frames and the sample sharp frames by the above method, the sample blurred frames and the sample sharp frames in the sample video may also be marked manually; accordingly, the computer device does not need to perform the sharp/blurred determination frame by frame, which is not limited in this embodiment.
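Putting steps 504A to 504E together, the selection logic can be sketched as follows; the format of the lesion prediction results (a list of per-lesion probabilities for each frame) and the use of the median as the lesion number threshold are assumptions made for illustration.

    from statistics import median
    from typing import List

    def classify_frames(lesion_predictions: List[List[float]], prob_threshold: float = 0.85) -> List[str]:
        """Label each sample video frame in one interval as 'sharp' or 'blurred'.

        lesion_predictions holds, for every frame, the prediction probability of each
        predicted lesion, e.g. [[0.863, 0.911], [0.72], ...].
        """
        # Step 504A: lesion number threshold = median (or average) of the per-frame lesion counts.
        counts = [len(probs) for probs in lesion_predictions]
        count_threshold = median(counts)

        labels = []
        for probs in lesion_predictions:
            if len(probs) < count_threshold:
                labels.append('blurred')                         # step 504B
            elif all(p > prob_threshold for p in probs):
                labels.append('sharp')                           # step 504D
            else:
                labels.append('blurred')                         # step 504E
        return labels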
And 505, pairing the sample clear frame and the sample fuzzy frame to generate a training data pair.
After the sample clear frame and the sample fuzzy frame in the same video frame interval are determined, pairwise matching is carried out on the sample clear frame and the sample fuzzy frame by the computer equipment, and a plurality of training data pairs are generated.
Illustratively, as shown in fig. 6, the training data pairs generated by the computer device include: (1 st frame, 2 nd frame), (1 st frame, 3 rd frame), (1 st frame, 4 th frame), (1 st frame, 5 th frame), (1 st frame, 6 th frame), and (1 st frame, 7 th frame).
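A corresponding sketch of step 505, pairing every sample sharp frame with every sample blurred frame in the same interval; the frame objects and labels are assumed to come from the classification sketch above.

    def build_training_pairs(frames, labels):
        """Pair each sample sharp frame with each sample blurred frame in one video frame interval."""
        sharp = [f for f, lab in zip(frames, labels) if lab == 'sharp']
        blurred = [f for f, lab in zip(frames, labels) if lab == 'blurred']
        return [(s, b) for s in sharp for b in blurred]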
Step 506, inputting the training data pair into a generator of the image generation model, and obtaining a sample generation frame and sample optical flow information output by the generator.
In one possible embodiment, the image generation model comprises a first generator and a second generator, and accordingly, after the computer device inputs the sample fuzzy frame into the first generator, the first generator outputs the sample generation clear frame and the first sample optical flow information; and after the sample sharp frame is input into the second generator, the second generator outputs a sample generation fuzzy frame and second sample optical flow information.
Illustratively, as shown in FIG. 8, after the computer device inputs the sample blurred frame X_GT into the generator G, the sample sharp generation frame Y_G and the first sample optical flow information Y_OF output by the generator G are obtained; after the sample sharp frame Y_GT is input into the generator F, the sample blur generation frame X_G and the second sample optical flow information X_OF output by the generator F are obtained.
With respect to the network structure adopted by the generator in the above embodiments, in one possible implementation, the generator adopts a U-shaped network (UNet). The network structure adopted by the generator is not limited in the embodiments of the present application.
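For illustration, a PyTorch sketch of a small U-shaped generator with two output heads, one for the style-converted frame and one for a two-channel optical flow displacement map; the channel counts and depth are assumptions, since the embodiment only states that a UNet is used and does not give the exact configuration.

    import torch
    import torch.nn as nn

    class FlowUNetGenerator(nn.Module):
        """U-shaped generator that outputs a style-converted frame and a (dx, dy) optical flow map."""

        def __init__(self, in_ch: int = 1, base_ch: int = 64):
            super().__init__()
            self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base_ch, 4, 2, 1), nn.ReLU(inplace=True))
            self.enc2 = nn.Sequential(nn.Conv2d(base_ch, base_ch * 2, 4, 2, 1), nn.ReLU(inplace=True))
            self.dec1 = nn.Sequential(nn.ConvTranspose2d(base_ch * 2, base_ch, 4, 2, 1), nn.ReLU(inplace=True))
            self.dec2 = nn.ConvTranspose2d(base_ch * 2, base_ch, 4, 2, 1)   # takes the skip connection
            self.frame_head = nn.Conv2d(base_ch, in_ch, 3, 1, 1)            # style-converted frame
            self.flow_head = nn.Conv2d(base_ch, 2, 3, 1, 1)                 # per-pixel (dx, dy) displacement

        def forward(self, x):
            e1 = self.enc1(x)
            e2 = self.enc2(e1)
            d1 = self.dec1(e2)
            d2 = torch.relu(self.dec2(torch.cat([d1, e1], dim=1)))          # U-Net style skip connection
            return torch.tanh(self.frame_head(d2)), self.flow_head(d2)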
And 507, inputting the sample generation frame into a discriminator of the image generation model to obtain a discrimination result output by the discriminator.
In a possible implementation mode, the image generation model comprises a first discriminator and a second discriminator; correspondingly, the sample sharp generation frame output by the first generator is input into the second discriminator, and the second discriminator outputs a discrimination result; the sample blur generation frame output by the second generator is input into the first discriminator, and the first discriminator outputs a discrimination result.
Illustratively, as shown in FIG. 8, the computer device inputs the sample sharp generation frame Y_G into the discriminator D_Y to obtain the discrimination result output by the discriminator D_Y, and inputs the sample blur generation frame X_G into the discriminator D_X to obtain the discrimination result output by the discriminator D_X.
With respect to the network structure adopted by the discriminator in the above embodiment, in a possible implementation, the discriminator adopts a residual network (ResNet). In one illustrative example, the discriminator employs ResNet-50, whose network structure is shown in Table 1.
Table 1
The stride of the first convolutional layer in convolutional layer group 3 and convolutional layer group 4 is 2, and each convolutional layer is followed by a ReLU activation layer and a Batch Normalization (BN) layer; the structure of the residual block is shown in fig. 9. The input of the residual block (a 256-dimensional feature) is convolved in sequence by 64 1x1 convolution kernels, 64 3x3 convolution kernels and 256 1x1 convolution kernels, and the result is then combined with the original input to form the output.
Of course, the discriminator may also adopt other network structures; the embodiment of the present application is only schematically illustrated with ResNet-50, but is not limited thereto.
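A sketch of the residual block of fig. 9 in PyTorch, assuming the standard additive shortcut of ResNet (1x1 with 64 kernels, 3x3 with 64 kernels, 1x1 with 256 kernels, then combined with the 256-dimensional input):

    import torch
    import torch.nn as nn

    class Bottleneck(nn.Module):
        """ResNet-50 style residual block: 1x1 (64) -> 3x3 (64) -> 1x1 (256), added back to the input."""

        def __init__(self, channels: int = 256, mid: int = 64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, mid, kernel_size=1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
                nn.Conv2d(mid, mid, kernel_size=3, padding=1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
                nn.Conv2d(mid, channels, kernel_size=1), nn.BatchNorm2d(channels),
            )
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(self.body(x) + x)   # residual shortcut back to the 256-channel input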
The sample generation clear frame output by the first generator is also input into the second generator, the second generator performs inverse transformation on the sample generation clear frame, and the first discriminator discriminates the sample generation clear frame after the inverse transformation; the sample generated blurred frame output by the second generator is also input into the first generator, the first generator performs inverse transformation on the sample generated blurred frame, and the second discriminator discriminates the sample generated blurred frame after the inverse transformation.
After the sample generation frames, the sample optical flow information and the discrimination results are obtained through the above steps, the computer device performs adversarial training on the generators and the discriminators according to the loss functions. In this embodiment, there is no strict order between the following steps 508 and 509, that is, steps 508 and 509 may be executed simultaneously, which is not limited in this embodiment.
Step 508, determining a loss of cyclic consistency for the generator based on the training data pairs, the sample generation frame, and the sample optical flow information.
The loss function of the image generation model comprises two parts: the cycle consistency loss function of the generator and the discriminator loss function of the discriminator. In one possible implementation, the computer device calculates the cycle consistency loss of the generator through the cycle consistency loss function according to the training data pair, the sample generation frame and the sample optical flow information.
In the related art, since the generator is used only for performing the style transformation, the computer device treats only the style transformation loss as the cycle consistency loss; in the embodiment of the present application, the generator can also predict the optical flow information in the style transformation process, and therefore, the computer device needs to add the optical flow transformation loss to the cyclic consistency loss. In one possible embodiment, the step may include the steps of:
firstly, determining the style transformation loss of a generator according to training data pairs and sample generation frames.
In a possible implementation mode, the sample generation frame obtained through the style transformation of one generator needs to be subjected to inverse style transformation through the other generator, so as to obtain a video frame whose style is similar to that of the original sample video frame. In order to prevent distortion during the transformation and the inverse transformation, the computer device determines the style transformation loss of the generator by calculating the difference between the original sample video frames in the training data pair (including the sample sharp frame and the sample blurred frame) and the corresponding sample generation frames (obtained after the original video frames are transformed and inversely transformed). This part of the style transformation loss can be expressed as: ||F(G(X_GT)) - X_GT||_1 + ||G(F(Y_GT)) - Y_GT||_1, where X_GT is the sample blurred frame, Y_GT is the sample sharp frame, G denotes style conversion by the first generator (blurred frame to sharp frame), and F denotes style conversion by the second generator (sharp frame to blurred frame).
And secondly, determining the optical-flow transformation loss of the generator according to the training data pair and the sample optical-flow information.
In one possible embodiment, in order to enable the generator to learn timing information before and after the video frame style conversion, the computer device calculates an optical-flow transformation loss using optical-flow information between the sample sharp frame and the sample blurred frame in the training data as supervision information of the sample optical-flow information (output by the generator).
Optionally, the step may include the following steps:
1. first optical flow information is calculated for transforming the sample blurred frame to the sample sharp frame.
In the direction of transforming the blurred frame into a sharp frame, in a possible embodiment the computer device calculates, by means of an optical flow algorithm, the optical flow transformation displacement map for transforming the sample blurred frame X_GT into the sample sharp frame Y_GT, and determines the displacement map as the first optical flow information. The optical flow transformation displacement map contains, for each pixel, a displacement component along the x-axis direction and a displacement component along the y-axis direction (a sketch of such a displacement-map computation is given after step 4 below).
2. A first optical-flow transformation loss is determined from the first optical-flow information and the first sample optical-flow information.
Optionally, the first generator generates a sample sharpness generation frame from the sample blur frame and outputs first sample optical flow information, and further, the computer device determines the first optical flow transformation loss by calculating a difference between the first optical flow information and the first sample optical flow information.
3. Second optical flow information is computed for transforming the sample sharp frame to the sample blurred frame.
In the direction of transforming the sharp frame into a blurred frame, in a possible embodiment the computer device calculates, by means of an optical flow algorithm, the optical flow transformation displacement map for transforming the sample sharp frame Y_GT into the sample blurred frame X_GT, and determines the displacement map as the second optical flow information. As above, the displacement map contains, for each pixel, displacement components along the x-axis and y-axis directions.
4. A second optical-flow transformation loss is determined from the second optical-flow information and the second sample optical-flow information.
Optionally, the second generator generates a sample blur generation frame from the sample sharp frame and outputs second sample optical flow information, and further, the computer device determines a second optical flow transformation loss by calculating a difference between the second optical flow information and the second sample optical flow information.
In addition to using the optical flow information between the sample sharp frame and the sample blurred frame as the supervision information, in another possible implementation, the computer device may perform optical flow transformation on the corresponding sample video frame in the training data pair according to the sample optical flow information, so as to calculate the optical flow transformation loss between the two using the other sample video frame in the training data pair as the supervision information.
Optionally, this step may include the following steps.
1. And carrying out optical flow transformation on the sample fuzzy frame through the first sample optical flow information to obtain a first optical flow generation frame.
In one possible embodiment, the computer device performs displacement transformation on the pixels in the sample blurred frame according to the first sample optical flow information to obtain the first optical flow generation frame; because the optical flow information from the sample blurred frame to the sample sharp frame serves as the supervision information during training, the first optical flow generation frame approximates an optical-flow-generated sharp frame (see the warping sketch after step 4 below).
Illustratively, as shown in FIG. 8, the computer device generates the first optical flow generation frame Y_G^OF based on the first sample optical flow information Y_OF and the sample blurred frame X_GT.
2. A third optical-flow transformation loss is determined from the first optical-flow generation frame and the sample-sharp frame.
Further, the computer device takes the sample sharp frame as the supervision information of the first optical flow generation frame and determines the third optical flow transformation loss by calculating the difference between the two, where the computer device can obtain the third optical flow transformation loss by calculating the difference at each pixel.
3. And carrying out optical flow transformation on the sample clear frame through the second sample optical flow information to obtain a second optical flow generation frame.
In one possible embodiment, the computer device performs displacement transformation on the pixels in the sample sharp frame according to the second sample optical flow information to obtain the second optical flow generation frame; because the optical flow information from the sample sharp frame to the sample blurred frame serves as the supervision information during training, the second optical flow generation frame approximates an optical-flow-generated blurred frame.
Illustratively, as shown in FIG. 8, the computer device generates a second optical flow generation frame X_G^OF according to the second sample optical flow information X_OF and the sample sharp frame Y_GT.
4. A fourth optical-flow transform loss is determined from the second optical-flow generation frame and the sample-blurred frame.
Further, the computer device uses the sample blurred frame as the supervision information of the second optical flow generation frame and determines the fourth optical flow transformation loss by calculating the difference between the two. The fourth optical flow transformation loss is calculated in the same manner as the third optical flow transformation loss, which is not repeated here. (An illustrative sketch of the optical flow transformation in steps 1 and 3 above follows this list.)
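In one illustrative, non-limiting example, the optical flow transformation and the third/fourth optical flow transformation losses may be sketched as follows. Backward warping with cv2.remap is assumed as the warping operator; the patent does not prescribe a specific one, and all names are illustrative.

```python
# Hedged sketch: a frame of the training data pair is backward-warped with the
# sample flow map output by the generator, and the other frame of the pair
# supervises the result through an L1 difference.
import cv2
import numpy as np

def warp_by_flow(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp `frame` with an (H, W, 2) displacement map."""
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR)

def warp_losses(x_gt, y_gt, first_sample_flow, second_sample_flow):
    y_g_of = warp_by_flow(x_gt, first_sample_flow)    # first optical flow generation frame
    x_g_of = warp_by_flow(y_gt, second_sample_flow)   # second optical flow generation frame
    third_loss = np.abs(y_g_of.astype(np.float32) - y_gt.astype(np.float32)).mean()
    fourth_loss = np.abs(x_g_of.astype(np.float32) - x_gt.astype(np.float32)).mean()
    return third_loss, fourth_loss
```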
And thirdly, determining the cycle consistency loss of the generator according to the style transformation loss and the optical flow transformation loss.
Having determined the style transformation loss and the optical flow transformation loss through the above steps, the computer device further determines the cycle consistency loss of the generator. In one illustrative example, the cycle consistency loss function of the generator is as follows:
L_con = ||F(G(X_GT)) - X_GT||_1 + ||f_OF(g_OF(X_GT)) - X_GT||_1 + ||G(F(Y_GT)) - Y_GT||_1 + ||g_OF(f_OF(Y_GT)) - Y_GT||_1
where X_GT is the sample blurred frame, Y_GT is the sample sharp frame, G denotes style conversion using the first generator, F denotes style conversion using the second generator, g_OF denotes optical flow conversion using the sample optical flow information corresponding to the sample blurred frame, and f_OF denotes optical flow conversion using the sample optical flow information corresponding to the sample sharp frame.
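In one illustrative, non-limiting example, the cycle consistency loss above may be sketched with PyTorch tensors as follows. G, F, g_of and f_of are placeholders for the two style-conversion generators and the two optical flow transformation operations; they are not APIs defined in this application.

```python
# Hedged sketch of L_con, following the formula above.
import torch
import torch.nn.functional as F_nn

def cycle_consistency_loss(x_gt, y_gt, G, F, g_of, f_of):
    loss = F_nn.l1_loss(F(G(x_gt)), x_gt)                 # blurred -> sharp -> blurred
    loss = loss + F_nn.l1_loss(f_of(g_of(x_gt)), x_gt)    # optical flow round trip on X_GT
    loss = loss + F_nn.l1_loss(G(F(y_gt)), y_gt)          # sharp -> blurred -> sharp
    loss = loss + F_nn.l1_loss(g_of(f_of(y_gt)), y_gt)    # optical flow round trip on Y_GT
    return loss
```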
Step 509, determining the discriminator loss of the discriminator according to the discrimination result.
Similar to the discriminator loss determination process in the related art, the computer device calculates the difference between the discrimination result and a label (used to identify whether the video frame is a real frame or a generated frame), and determines the discriminator loss from the difference.
In one illustrative example, the first discriminator has a discriminator loss function as follows:
where X_GT is the sample blurred frame, Y_GT is the sample sharp frame, and D_X denotes discrimination of the input video frame.
The discriminator loss function of the second discriminator is as follows:
where X_GT is the sample blurred frame, Y_GT is the sample sharp frame, and D_Y denotes discrimination of the input video frame.
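In one illustrative, non-limiting example, the discriminator losses may be sketched as follows. The concrete formulas of the first and second discriminator loss functions are not reproduced in the text above; a least-squares GAN objective, as is common in CycleGAN-style training, is assumed here purely for illustration, and D_X, D_Y, G and F are placeholder modules.

```python
# Hedged sketch of the discriminator losses under an assumed LSGAN objective.
# D_X judges blurred frames (real: X_GT, fake: F(Y_GT)); D_Y judges sharp
# frames (real: Y_GT, fake: G(X_GT)).
import torch

def lsgan_discriminator_loss(discriminator, real_frame, fake_frame):
    real_term = torch.mean((discriminator(real_frame) - 1.0) ** 2)   # real frames -> label 1
    fake_term = torch.mean(discriminator(fake_frame.detach()) ** 2)  # generated frames -> label 0
    return 0.5 * (real_term + fake_term)

# Example usage (placeholder modules assumed to exist):
# loss_d_x = lsgan_discriminator_loss(D_X, x_gt, F(y_gt))
# loss_d_y = lsgan_discriminator_loss(D_Y, y_gt, G(x_gt))
```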
Step 510, training an image generation model according to the cycle consistency loss and the discriminator loss.
Further, the computer device performs co-training on the image generation model by using the cycle consistency loss and the discriminator loss until the loss of the image generation model reaches a convergence condition. When the convergence condition is reached, the video frame generated by the generator is close to the real video frame, and the discriminator cannot discriminate the real video frame from the generated video frame (i.e. the discrimination probability is 50%).
Optionally, to further ensure the uniqueness of the generated transformation and prevent transformation distortion, the computer device may also train the image generation model in combination with a transformation uniqueness loss function as follows.
L_spe = ||G(X_GT) - Y_GT||_1 + ||g_OF(X_GT) - Y_GT||_1 + ||F(Y_GT) - X_GT||_1 + ||f_OF(Y_GT) - X_GT||_1
The meaning of each parameter may refer to the above cyclic consistency loss function, which is not described herein again.
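In one illustrative, non-limiting example, one joint training step of step 510 may be sketched as follows, reusing the helpers sketched above. The optimizer settings and loss weights are assumptions, the g_OF / f_OF terms of L_spe are omitted for brevity, and G, F, D_X, D_Y, g_of and f_of are placeholder modules rather than elements defined in this application.

```python
# Hedged sketch of one joint training step combining the adversarial, cycle
# consistency and transformation uniqueness losses.
import torch
import torch.nn.functional as F_nn

def train_step(G, F, g_of, f_of, D_X, D_Y, g_opt, d_opt, x_gt, y_gt):
    # --- generator update: adversarial + cycle consistency + transformation uniqueness ---
    l_cyc = cycle_consistency_loss(x_gt, y_gt, G, F, g_of, f_of)
    l_spe = F_nn.l1_loss(G(x_gt), y_gt) + F_nn.l1_loss(F(y_gt), x_gt)
    l_adv = torch.mean((D_Y(G(x_gt)) - 1.0) ** 2) + torch.mean((D_X(F(y_gt)) - 1.0) ** 2)
    g_loss = l_adv + 10.0 * l_cyc + 5.0 * l_spe   # weighting factors are assumptions
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

    # --- discriminator update ---
    d_loss = (lsgan_discriminator_loss(D_X, x_gt, F(y_gt))
              + lsgan_discriminator_loss(D_Y, y_gt, G(x_gt)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    return g_loss.item(), d_loss.item()
```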
In the embodiment, the computer device determines the optical flow transformation loss of the generator according to the training data pair and the sample optical flow information, and integrates the optical flow transformation loss into the cycle consistency loss, so that the generator can learn the time sequence change information among the video frames in the training process, and the time sequence consistency of the generated video frames output by the generator is improved.
In addition, in this embodiment, the computer device performs focus prediction on the sample video frame by using the focus detection model, so as to automatically mark a clear sample frame and a fuzzy sample frame in the sample video frame according to a focus prediction result, thereby avoiding manual labeling and contributing to improving the training speed of the model.
In a possible embodiment, in order to improve robustness, after the computer device extracts the training data pair from the sample video, data enhancement may be performed on the sample blurred frame and the sample sharp frame in the training data pair, where the data enhancement includes at least one of the following: pixel random offset, video frame random scaling, or video frame random angular rotation.
For example, the computer device may perform random offset within 15 pixels on a sample video frame in the training data pair, may perform random scaling of 0.9 to 1.1 times on the sample video frame, and may perform random rotation within 10 ° on the sample video frame, thereby obtaining a training data pair after data enhancement, and then perform model training using the training data pair after data enhancement.
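In one illustrative, non-limiting example, the data enhancement described above may be sketched as follows: a random offset within 15 pixels, random scaling between 0.9x and 1.1x, and random rotation within 10 degrees, applied identically to both frames of a training data pair. The single affine transform used here is one convenient realisation and is not mandated by this application.

```python
# Hedged sketch of the data enhancement applied to a training data pair.
import random
import cv2
import numpy as np

def augment_pair(x_gt: np.ndarray, y_gt: np.ndarray):
    h, w = x_gt.shape[:2]
    dx, dy = random.uniform(-15, 15), random.uniform(-15, 15)   # pixel random offset
    scale = random.uniform(0.9, 1.1)                            # random scaling
    angle = random.uniform(-10.0, 10.0)                         # random rotation (degrees)

    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)   # rotation + scaling about the centre
    m[:, 2] += (dx, dy)                                         # add the translation

    warp = lambda img: cv2.warpAffine(img, m, (w, h), flags=cv2.INTER_LINEAR)
    return warp(x_gt), warp(y_gt)
```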
Referring to fig. 10, a flowchart of a video blur frame conversion method according to an exemplary embodiment of the present application is shown. The embodiment is described by taking the method as an example for a computer device, and the method comprises the following steps.
Step 1001, a target video is obtained, and the target video is obtained through medical imaging.
After the image generation model is obtained by training with the model training method provided by the above embodiments, the computer device can convert the blurred frames in the target video into sharp frames by using the image generation model, thereby improving the accuracy of subsequent diagnosis. The target video may be an ultrasound video, a CT video, an NMR video, or the like (the type of the target video is consistent with that of the sample video used in model training), which is not limited in this embodiment.
Step 1002, determine a blurred frame in the target video.
In a possible implementation manner, the blurred frame in the target video may be marked manually, or a focus prediction may be performed on each video frame by using a focus detection model, so as to determine the blurred frame according to a focus prediction result (refer to a process of determining a clear sample frame and a blurred sample frame in a training process), which is not limited in this embodiment.
Step 1003, inputting the blurred frame into a generator of an image generation model to obtain a sharp generated frame output by the generator, where the image generation model comprises the generator and a discriminator, and the image generation model is obtained by training according to a training data pair, a sample generated frame, sample optical flow information and a discrimination result; the training data pair comprises a sample sharp frame and a sample blurred frame belonging to the same video frame interval in a sample video, the sample generated frame is obtained by the generator performing style conversion on the sample sharp frame and the sample blurred frame, the sample optical flow information is output when the generator performs style conversion and is used for indicating the optical flow transformation condition in the style conversion process, and the discrimination result is obtained by the discriminator discriminating the sample generated frame.
When the trained image generation model is used for restoring a video blurred frame, the blurred frame is input into a generator (for converting the blurred frame into a clear frame, such as the first generator in the above embodiment) of the image generation model, and the generator performs style conversion on the blurred frame to obtain a clear generated frame.
Optionally, the computer device replaces the blurred frame in the target video with the clear generated frame output by the generator, so as to obtain a clear target video, and then performs medical diagnosis based on the clear target video.
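In one illustrative, non-limiting example, the conversion flow of steps 1001 to 1003 may be sketched as follows, using the focus detection model as one of the two options described above for locating blurred frames. The function names, model objects and prediction fields are hypothetical, not interfaces defined in this application.

```python
# Hedged sketch: blurred frames of the target video are located and replaced by
# the sharp generated frames output by the generator.
def restore_video(frames, lesion_detector, generator, count_threshold, prob_threshold=0.5):
    restored = []
    for frame in frames:
        prediction = lesion_detector(frame)            # focus (lesion) prediction result
        is_sharp = (len(prediction.lesions) >= count_threshold
                    and all(l.probability > prob_threshold for l in prediction.lesions))
        restored.append(frame if is_sharp else generator(frame))
    return restored
```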
In an exemplary application scenario, when blurred frames in a breast ultrasound video need to be restored, the computer device first trains a breast tumor detection model according to a large number (e.g., 5000) of breast ultrasound pictures (sharp pictures) containing tumor marking information, so as to train a breast ultrasound image generation model by using the breast tumor detection model and a large number (e.g., 90) of unlabeled breast ultrasound videos.
When a breast ultrasound image generation model is trained, a computer device divides a sample breast ultrasound video into different video frame intervals, and inputs sample video frames in the same video frame interval into a breast tumor detection model, so that a sample clear frame and a sample fuzzy frame in the sample video frames are determined according to a breast tumor prediction result output by the breast tumor detection model.
Further, the computer device pairs the determined sample sharp frames and sample blurred frames to generate a plurality of training data pairs. For each training data pair, the computer device inputs the training data pair into the breast ultrasound image generation model, and trains the generator and the discriminator of the breast ultrasound image generation model according to the sample generation frame, the sample optical flow information and the discrimination result output by the model, until the training convergence condition is reached.
After the breast ultrasound image generation model is obtained through the above training process, the computer device can use the model to perform blurred frame restoration on a breast ultrasound video obtained through actual ultrasound imaging to obtain a sharp breast ultrasound video, so that medical staff can make a diagnosis based on the sharp breast ultrasound video.
Of course, the above embodiment only takes the breast ultrasound video as an example for schematic illustration, and in other possible application scenarios, the method may also perform blurred frame restoration on the ultrasound video of other human organs (such as the lung, the stomach, and the like), which is not limited by this embodiment.
Fig. 11 is a block diagram of a model training apparatus according to an exemplary embodiment of the present application, which may be disposed in the computer device according to the foregoing embodiment, as shown in fig. 11, the apparatus includes:
a first obtaining module 1110, configured to obtain a sample video, where the sample video is obtained through medical imaging;
an extracting module 1120, configured to extract a training data pair from the sample video, where the training data pair includes a sample sharp frame and a sample blurred frame, and the sample sharp frame and the sample blurred frame belong to a same video frame interval;
a first generating module 1130, configured to input the training data pair into a generator of an image generation model, and obtain a sample generation frame and sample optical flow information output by the generator, where the sample generation frame is obtained by performing style conversion on the sample sharp frame and the sample blurred frame, and the sample optical flow information is used to indicate an optical flow transformation condition in the style conversion process;
a judging module 1140, configured to input the sample generation frame into a discriminator of the image generation model, and obtain a discrimination result output by the discriminator;
a training module 1150, configured to train the image generation model according to the training data pair, the sample generation frame, the sample optical flow information, and the discrimination result.
Optionally, the training module 1150 includes:
a first determining unit for determining a cyclic consistency loss of the generator from the training data pair, the sample generation frame and the sample optical flow information;
a second determination unit configured to determine a discriminator loss of the discriminator according to the discrimination result;
and the training unit is used for training the image generation model according to the cycle consistency loss and the discriminator loss.
Optionally, the first determining unit is configured to:
determining a loss of style transformation for the generator from the training data pairs and the sample generation frames;
determining an optical-flow transformation loss for the generator from the training data pairs and the sample optical-flow information;
determining the cyclic consistency loss of the generator from the stylistic transformation loss and the optical-flow transformation loss.
Optionally, the sample optical flow information includes first sample optical flow information and second sample optical flow information, the first sample optical flow information is output when a first generator generates a sample sharp generation frame according to the sample blurred frame, and the second sample optical flow information is output when a second generator generates a sample blur generation frame according to the sample sharp frame;
the first determining unit is configured to:
calculating first optical flow information of the sample blurred frame transformed to the sample sharp frame;
determining a first optical-flow transformation loss from the first optical-flow information and the first sample optical-flow information;
calculating second optical flow information for transforming the sample sharp frame to the sample blurred frame;
determining a second optical-flow transformation loss from the second optical-flow information and the second sample optical-flow information.
Optionally, the first determining unit is further configured to:
carrying out optical flow transformation on the sample fuzzy frame through the first sample optical flow information to obtain a first optical flow generating frame;
determining a third optical-flow transformation loss from the first optical-flow generation frame and the sample-sharp frame;
performing optical flow transformation on the sample clear frame through the second sample optical flow information to obtain a second optical flow generation frame;
determining a fourth optical-flow transform loss from the second optical-flow generation frame and the sample-blurred frame.
Optionally, the extracting module 1120 includes:
the acquisition unit is used for acquiring n sample video frames belonging to the same video frame interval in the sample video;
the prediction unit is used for, for each sample video frame in the n sample video frames, inputting the sample video frame into a focus detection model to obtain a focus prediction result output by the focus detection model;
a third determining unit, configured to determine the sample sharp frame and the sample blurred frame in the n sample video frames according to the focus prediction results corresponding to the n sample video frames respectively;
and the generating unit is used for pairing the sample clear frame and the sample fuzzy frame to generate the training data pair.
Optionally, the lesion prediction result includes the number of lesions of the predicted lesion and the prediction probability corresponding to each predicted lesion;
the third determining unit is configured to:
determining a focus number threshold according to the focus number corresponding to each of the n sample video frames, wherein the focus number threshold is an average value or a median of the focus number;
if the number of the focuses corresponding to the sample video frame is smaller than the focus number threshold, determining the sample video frame as the sample fuzzy frame;
if the number of the focuses corresponding to the sample video frame is larger than the focus number threshold, obtaining the prediction probability corresponding to each prediction focus in the sample video frame;
and if the prediction probability corresponding to each prediction focus in the sample video frame is greater than a probability threshold, determining the sample video frame as the sample sharp frame (an illustrative sketch of this selection logic is given below).
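In one illustrative, non-limiting example, the selection logic performed by the third determining unit may be sketched as follows: the focus count threshold is the mean (or median) of the counts over the n frames of the interval, and a frame becomes a sample sharp frame only if every predicted focus exceeds the probability threshold. The prediction fields and function names are hypothetical.

```python
# Hedged sketch of splitting an interval's frames into sample sharp and sample blurred frames.
import statistics

def split_sharp_and_blurred(predictions, prob_threshold=0.5):
    counts = [len(p.lesions) for p in predictions]
    count_threshold = statistics.mean(counts)          # or statistics.median(counts)

    sharp, blurred = [], []
    for idx, p in enumerate(predictions):
        if len(p.lesions) < count_threshold:
            blurred.append(idx)                        # sample blurred frame
        elif all(l.probability > prob_threshold for l in p.lesions):
            sharp.append(idx)                          # sample sharp frame
    return sharp, blurred
```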
Optionally, the apparatus further comprises:
a data enhancement module, configured to perform data enhancement on the sample blurred frame and the sample sharp frame in the training data pair, where a data enhancement mode includes at least one of the following: pixel random offset, video frame random scaling, or video frame random angular rotation.
Optionally, the generator uses UNet, and the discriminator uses ResNet.
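In one illustrative, non-limiting example, a minimal UNet-style generator with two output heads may be sketched as follows: one head for the style-converted frame and one for the two-channel optical flow map the generator is described as outputting. The depth and channel sizes are assumptions; this application only states that UNet is used for the generator and ResNet for the discriminator.

```python
# Hedged sketch of a tiny UNet-style generator with a frame head and a flow head.
import torch
import torch.nn as nn

class TinyUNetGenerator(nn.Module):
    def __init__(self, in_ch: int = 1, base: int = 32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(base * 2, base, 2, stride=2), nn.ReLU())
        self.frame_head = nn.Conv2d(base * 2, in_ch, 3, padding=1)  # style-converted frame
        self.flow_head = nn.Conv2d(base * 2, 2, 3, padding=1)       # x/y displacement map

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        d1 = self.dec1(e2)
        feat = torch.cat([d1, e1], dim=1)   # skip connection (the "U" in UNet)
        return self.frame_head(feat), self.flow_head(feat)
```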
Optionally, the sample video is an ultrasound video.
In summary, in the embodiment of the present application, after a training data pair including a sample sharp frame and a sample fuzzy frame is extracted from a sample video, the training data pair is input to a generator of an image generation model to obtain a sample generation frame and sample optical flow information output by the generator, and the sample generation frame is further input to a discriminator of the image generation model to obtain a discrimination result output by the discriminator, so that the image generation model is trained according to the training data pair, the sample generation frame, the sample optical flow information, and the discrimination result; by taking the sample fuzzy frame and the sample clear frame with the time sequence relation as the training data pair and adding the optical flow information to constrain the training process of the image generation model, the time sequence consistency among the video frames can be kept when the fuzzy video frames in the video are converted by the image generation model obtained by the training in the follow-up process, the conversion effect of converting the fuzzy video frames into the clear video frames is improved, and the quality of the converted video is further improved.
In the embodiment, the computer device determines the optical flow transformation loss of the generator according to the training data pair and the sample optical flow information, and integrates the optical flow transformation loss into the cycle consistency loss, so that the generator can learn the time sequence change information among the video frames in the training process, and the time sequence consistency of the generated video frames output by the generator is improved.
In addition, in this embodiment, the computer device performs focus prediction on the sample video frame by using the focus detection model, so as to automatically mark a clear sample frame and a fuzzy sample frame in the sample video frame according to a focus prediction result, thereby avoiding manual labeling and contributing to improving the training speed of the model.
Fig. 12 is a block diagram of a video blur frame conversion apparatus according to an exemplary embodiment of the present application, which may be disposed in the computer device according to the foregoing embodiment, as shown in fig. 12, and the apparatus includes:
a second obtaining module 1210, configured to obtain a target video, where the target video is obtained through medical imaging;
a determining module 1220, configured to determine a blurred frame in the target video;
a second generating module 1230, configured to input the blurred frame into a generator of an image generation model, to obtain a sharp generated frame output by the generator, where the image generation model includes the generator and a discriminator, and the image generation model is obtained by training according to a training data pair, a sample generated frame, sample optical flow information, and a discrimination result; the training data pair includes a sample sharp frame and a sample blurred frame belonging to the same video frame interval in a sample video, the sample generated frame is obtained by the generator performing style conversion on the sample sharp frame and the sample blurred frame, the sample optical flow information is output when the generator performs style conversion and is used to indicate an optical flow transformation condition in the style conversion process, and the discrimination result is obtained by the discriminator discriminating the sample generated frame.
It should be noted that: the device provided in the above embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the embodiment of the model training device and the embodiment of the model training method provided by the above embodiment belong to the same concept, the embodiment of the video blurred frame conversion device and the embodiment of the video blurred frame conversion method provided by the above embodiment belong to the same concept, and the specific implementation process thereof is described in the embodiment of the method, and is not described herein again.
Referring to fig. 13, a schematic structural diagram of a computer device according to an exemplary embodiment of the present application is shown. Specifically, the computer device 1300 includes a Central Processing Unit (CPU) 1301, a system memory 1304 including a Random Access Memory (RAM) 1302 and a Read Only Memory (ROM) 1303, and a system bus 1305 connecting the system memory 1304 and the central processing unit 1301. The computer device 1300 also includes a basic input/output system (I/O system) 1306, which facilitates transfer of information between devices within the computer, and a mass storage device 1307 for storing an operating system 1313, application programs 1314, and other program modules 1315.
The basic input/output system 1306 includes a display 1308 for displaying information and an input device 1309, such as a mouse, keyboard, etc., for a user to input information. Wherein the display 1308 and input device 1309 are connected to the central processing unit 1301 through an input-output controller 1310 connected to the system bus 1305. The basic input/output system 1306 may also include an input/output controller 1310 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 1310 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 1307 is connected to the central processing unit 1301 through a mass storage controller (not shown) connected to the system bus 1305. The mass storage device 1307 and its associated computer-readable media provide non-volatile storage for the computer device 1300. That is, the mass storage device 1307 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 1304 and mass storage device 1307 described above may be collectively referred to as memory.
The memory stores one or more programs configured to be executed by the one or more central processing units 1301, the one or more programs containing instructions for implementing the methods described above, and the central processing unit 1301 executes the one or more programs to implement the methods provided by the various method embodiments described above.
According to various embodiments of the present application, the computer device 1300 may also operate by being connected to a remote computer on a network through a network such as the Internet. That is, the computer device 1300 may be connected to the network 1312 through the network interface unit 1311 connected to the system bus 1305, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 1311.
The memory also includes one or more programs, stored in the memory, that include instructions for performing the steps performed by the computer device in the methods provided by the embodiments of the present application.
The present application further provides a computer-readable storage medium, where at least one instruction, at least one program, a code set, or an instruction set is stored in the computer-readable storage medium, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the model training method according to any of the foregoing embodiments, or to implement the video blurred frame conversion method according to any of the foregoing embodiments.
The present application further provides a computer program product, which when running on a computer, causes the computer to execute the model training method provided in the above embodiments of the method, or execute the video fuzzy frame conversion method described in the above embodiments.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, which may be a computer readable storage medium contained in a memory of the above embodiments; or it may be a separate computer-readable storage medium not incorporated in the terminal. The computer readable storage medium has stored therein at least one instruction, at least one program, set of codes, or set of instructions that is loaded and executed by the processor to implement the method of any of the above method embodiments.
Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method of model training, the method comprising:
obtaining a sample video, wherein the sample video is obtained through medical imaging;
extracting a training data pair from the sample video, wherein the training data pair comprises a sample clear frame and a sample fuzzy frame, and the sample clear frame and the sample fuzzy frame belong to the same video frame interval;
inputting the training data pair into a generator of an image generation model to obtain a sample generation frame and sample optical flow information output by the generator, wherein the sample generation frame is obtained by performing style conversion on the sample clear frame and the sample fuzzy frame, and the sample optical flow information is used for indicating the optical flow transformation condition in the style conversion process;
inputting the sample generating frame into a discriminator of the image generating model to obtain a discrimination result output by the discriminator;
and training the image generation model according to the training data pair, the sample generation frame, the sample optical flow information and the discrimination result.
2. The method of claim 1, wherein said training the image generation model based on the training data pairs, the sample generation frames, the sample optical flow information, and the discrimination results comprises:
determining a loss of cyclic consistency for the generator from the training data pairs, the sample generation frames, and the sample optical flow information;
determining the loss of the discriminator according to the discrimination result;
and training the image generation model according to the cycle consistency loss and the discriminator loss.
3. The method of claim 2, wherein said determining a loss of cyclic consistency for the generator from the training data pairs, the sample generation frames, and the sample optical flow information comprises:
determining a loss of style transformation for the generator from the training data pairs and the sample generation frames;
determining an optical-flow transformation loss for the generator from the training data pairs and the sample optical-flow information;
determining the cyclic consistency loss of the generator from the stylistic transformation loss and the optical-flow transformation loss.
4. The method according to claim 3, wherein the sample optical flow information includes first sample optical flow information and second sample optical flow information, the first sample optical flow information being output by a first generator when generating a sample sharpness generation frame from the sample blur frame, the second sample optical flow information being output by a second generator when generating a sample blur generation frame from the sample sharpness frame;
said determining optical-flow transform losses for said generator from said training data pairs and said sample optical-flow information, comprising:
calculating first optical flow information of the sample blurred frame transformed to the sample sharp frame;
determining a first optical-flow transformation loss from the first optical-flow information and the first sample optical-flow information;
calculating second optical flow information for transforming the sample sharp frame to the sample blurred frame;
determining a second optical-flow transformation loss from the second optical-flow information and the second sample optical-flow information.
5. The method of claim 4, wherein determining optical-flow transformation losses for the generator based on the training data pairs and the sample optical-flow information further comprises:
carrying out optical flow transformation on the sample fuzzy frame through the first sample optical flow information to obtain a first optical flow generating frame;
determining a third optical-flow transformation loss from the first optical-flow generation frame and the sample-sharp frame;
performing optical flow transformation on the sample clear frame through the second sample optical flow information to obtain a second optical flow generation frame;
determining a fourth optical-flow transform loss from the second optical-flow generation frame and the sample-blurred frame.
6. The method of any one of claims 1 to 5, wherein said extracting training data pairs from said sample video comprises:
acquiring n sample video frames belonging to the same video frame interval in the sample video;
for each sample video frame in the n sample video frames, inputting the sample video frame into a focus detection model to obtain a focus prediction result output by the focus detection model;
determining the sample clear frame and the sample fuzzy frame in the n sample video frames according to the focus prediction results corresponding to the n sample video frames respectively;
and matching the sample clear frame and the sample fuzzy frame to generate the training data pair.
7. The method according to claim 6, wherein the lesion prediction result comprises the number of lesions of the predicted lesion and the prediction probability corresponding to each predicted lesion;
the determining the sample sharp frame and the sample fuzzy frame in the n sample video frames according to the focus prediction results corresponding to the n sample video frames respectively comprises:
determining a focus number threshold according to the focus number corresponding to each of the n sample video frames, wherein the focus number threshold is an average value or a median of the focus number;
if the number of the focuses corresponding to the sample video frame is smaller than the focus number threshold, determining the sample video frame as the sample fuzzy frame;
if the number of the focuses corresponding to the sample video frame is larger than the focus number threshold, obtaining the prediction probability corresponding to each prediction focus in the sample video frame;
and if the prediction probability corresponding to each prediction focus in the sample video frame is greater than a probability threshold, determining the sample video frame as the sample clear frame.
8. The method of any of claims 1 to 5, wherein after extracting the training data pairs from the sample video, the method further comprises:
performing data enhancement on the sample blurred frame and the sample sharp frame in the training data pair, wherein the data enhancement mode comprises at least one of the following modes: pixel random offset, video frame random scaling, or video frame random angular rotation.
9. A method according to any one of claims 1 to 5, wherein the generator employs a U-network UNet and the arbiter employs a residual network ResNet.
10. The method of any one of claims 1 to 5, wherein the sample video is an ultrasound video.
11. A method for video blur frame conversion, the method comprising:
acquiring a target video, wherein the target video is obtained through medical imaging;
determining a blurred frame in the target video;
inputting the blurred frame into a generator of an image generation model to obtain a sharp generated frame output by the generator, wherein the image generation model comprises the generator and a discriminator, the image generation model is obtained by training according to a training data pair, a sample generated frame, sample optical flow information and a discrimination result, the training data pair comprises a sample sharp frame and a sample blurred frame which belong to the same video frame interval in a sample video, the sample generated frame is obtained by performing style conversion on the sample sharp frame and the sample blurred frame by the generator, the sample optical flow information is output when the generator performs style conversion and is used for indicating the optical flow conversion condition in the style conversion process, and the discrimination result is obtained by discriminating the sample generated frame by the discriminator.
12. A model training apparatus, the apparatus comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a sample video, and the sample video is obtained through medical imaging;
the extraction module is used for extracting a training data pair from the sample video, wherein the training data pair comprises a sample clear frame and a sample fuzzy frame, and the sample clear frame and the sample fuzzy frame belong to the same video frame interval;
the first generation module is used for inputting the training data pair into a generator of an image generation model to obtain a sample generation frame and sample optical flow information output by the generator, wherein the sample generation frame is obtained by performing style conversion on the sample clear frame and the sample fuzzy frame, and the sample optical flow information is used for indicating the optical flow transformation condition in the style conversion process;
the judging module is used for inputting the sample generating frame into a discriminator of the image generating model to obtain a judging result output by the discriminator;
and the training module is used for training the image generation model according to the training data pair, the sample generation frame, the sample optical flow information and the discrimination result.
13. An apparatus for video blur frame conversion, the apparatus comprising:
the second acquisition module is used for acquiring a target video, and the target video is obtained through medical imaging;
a determining module, configured to determine a blurred frame in the target video;
the second generation module is used for inputting the blurred frame into a generator of an image generation model to obtain a clear generated frame output by the generator, the image generation model comprises the generator and a discriminator, the image generation model is obtained by training according to a training data pair, a sample generated frame, sample optical flow information and a discrimination result, the training data pair comprises a sample clear frame and a sample blurred frame which belong to the same video frame interval in a sample video, the sample generated frame is obtained by performing style conversion on the sample clear frame and the sample blurred frame by the generator, the sample optical flow information is output when the generator performs style conversion and is used for indicating the optical flow conversion condition in the style conversion process, and the discrimination result is obtained by discriminating the sample generated frame by the discriminator.
14. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the model training method of any one of claims 1 to 10 or to implement the video blur frame conversion method of claim 11.
15. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the model training method according to any one of claims 1 to 10 or to implement the video blurred frame conversion method according to claim 11.
CN201910865820.XA 2019-09-12 2019-09-12 Model training method, video fuzzy frame conversion method, device and storage medium Active CN110599421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910865820.XA CN110599421B (en) 2019-09-12 2019-09-12 Model training method, video fuzzy frame conversion method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910865820.XA CN110599421B (en) 2019-09-12 2019-09-12 Model training method, video fuzzy frame conversion method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110599421A true CN110599421A (en) 2019-12-20
CN110599421B CN110599421B (en) 2023-06-09

Family

ID=68859228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910865820.XA Active CN110599421B (en) 2019-09-12 2019-09-12 Model training method, video fuzzy frame conversion method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110599421B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402121A (en) * 2020-03-20 2020-07-10 上海眼控科技股份有限公司 Image style conversion method and device, computer equipment and storage medium
CN111402302A (en) * 2020-04-28 2020-07-10 上海依图网络科技有限公司 Optical flow generating device and method
CN111726621A (en) * 2020-04-24 2020-09-29 中国科学院微电子研究所 Video conversion method and device
CN112508097A (en) * 2020-12-08 2021-03-16 深圳市优必选科技股份有限公司 Image conversion model training method and device, terminal equipment and storage medium
CN113159108A (en) * 2021-02-26 2021-07-23 华为技术有限公司 Method, apparatus, device, storage medium and program product for generating sample image
CN113269682A (en) * 2021-04-21 2021-08-17 青岛海纳云科技控股有限公司 Non-uniform motion blur video restoration method combined with interframe information
CN113361510A (en) * 2021-08-11 2021-09-07 腾讯科技(深圳)有限公司 Hyper-distributed network model training method and device, electronic equipment and storage medium
CN113610713A (en) * 2021-08-13 2021-11-05 北京达佳互联信息技术有限公司 Training method of video super-resolution model, video super-resolution method and device
CN113706414A (en) * 2021-08-26 2021-11-26 荣耀终端有限公司 Training method of video optimization model and electronic equipment
CN114363702A (en) * 2021-12-28 2022-04-15 上海网达软件股份有限公司 Method, device and equipment for converting SDR video into HDR video and storage medium
CN114936998A (en) * 2022-03-17 2022-08-23 腾讯科技(深圳)有限公司 Model optimization method, device, equipment, storage medium and computer program product


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5765029A (en) * 1996-05-08 1998-06-09 Xerox Corporation Method and system for fuzzy image classification
CN103460248A (en) * 2011-04-07 2013-12-18 富士胶片株式会社 Image processing method and device
CN107493477A (en) * 2016-06-10 2017-12-19 Arm有限公司 Video data processing system
CN108304755A (en) * 2017-03-08 2018-07-20 腾讯科技(深圳)有限公司 The training method and device of neural network model for image procossing
US20190080148A1 (en) * 2017-09-08 2019-03-14 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating image
US20190130530A1 (en) * 2017-10-31 2019-05-02 Disney Enterprises Inc. Video Super-Resolution Using An Artificial Neural Network
CN108022213A (en) * 2017-11-29 2018-05-11 天津大学 Video super-resolution algorithm for reconstructing based on generation confrontation network
CN108416752A (en) * 2018-03-12 2018-08-17 中山大学 A method of image is carried out based on production confrontation network and removes motion blur
CN109359687A (en) * 2018-10-19 2019-02-19 百度在线网络技术(北京)有限公司 Video style conversion process method and device
CN110163809A (en) * 2019-03-31 2019-08-23 东南大学 Confrontation network DSA imaging method and device are generated based on U-net

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN S et al.: "Tan: temporal affine network for real-time left ventricle anatomical structure analysis based on 2d ultrasound videos", ARXIV, pages 1 - 11 *
梁毓明; 张路遥; 卢明建; 杨国亮: "Image dehazing algorithm based on conditional generative adversarial network", Acta Photonica Sinica (光子学报), no. 05, pages 120 - 128 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402121A (en) * 2020-03-20 2020-07-10 上海眼控科技股份有限公司 Image style conversion method and device, computer equipment and storage medium
CN111726621B (en) * 2020-04-24 2022-12-30 中国科学院微电子研究所 Video conversion method and device
CN111726621A (en) * 2020-04-24 2020-09-29 中国科学院微电子研究所 Video conversion method and device
CN111402302A (en) * 2020-04-28 2020-07-10 上海依图网络科技有限公司 Optical flow generating device and method
CN111402302B (en) * 2020-04-28 2023-06-06 上海依图网络科技有限公司 Optical flow generating device and method
CN112508097A (en) * 2020-12-08 2021-03-16 深圳市优必选科技股份有限公司 Image conversion model training method and device, terminal equipment and storage medium
CN112508097B (en) * 2020-12-08 2024-01-19 深圳市优必选科技股份有限公司 Image conversion model training method and device, terminal equipment and storage medium
CN113159108A (en) * 2021-02-26 2021-07-23 华为技术有限公司 Method, apparatus, device, storage medium and program product for generating sample image
CN113269682A (en) * 2021-04-21 2021-08-17 青岛海纳云科技控股有限公司 Non-uniform motion blur video restoration method combined with interframe information
CN113361510B (en) * 2021-08-11 2021-11-19 腾讯科技(深圳)有限公司 Hyper-distributed network model training method and device, electronic equipment and storage medium
CN113361510A (en) * 2021-08-11 2021-09-07 腾讯科技(深圳)有限公司 Hyper-distributed network model training method and device, electronic equipment and storage medium
CN113610713A (en) * 2021-08-13 2021-11-05 北京达佳互联信息技术有限公司 Training method of video super-resolution model, video super-resolution method and device
CN113610713B (en) * 2021-08-13 2023-11-28 北京达佳互联信息技术有限公司 Training method of video super-resolution model, video super-resolution method and device
CN113706414B (en) * 2021-08-26 2022-09-09 荣耀终端有限公司 Training method of video optimization model and electronic equipment
CN113706414A (en) * 2021-08-26 2021-11-26 荣耀终端有限公司 Training method of video optimization model and electronic equipment
CN114363702A (en) * 2021-12-28 2022-04-15 上海网达软件股份有限公司 Method, device and equipment for converting SDR video into HDR video and storage medium
CN114363702B (en) * 2021-12-28 2023-09-08 上海网达软件股份有限公司 Method, device, equipment and storage medium for converting SDR video into HDR video
CN114936998A (en) * 2022-03-17 2022-08-23 腾讯科技(深圳)有限公司 Model optimization method, device, equipment, storage medium and computer program product

Also Published As

Publication number Publication date
CN110599421B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN110599421B (en) Model training method, video fuzzy frame conversion method, device and storage medium
US10810735B2 (en) Method and apparatus for analyzing medical image
CN109919928B (en) Medical image detection method and device and storage medium
US10664979B2 (en) Method and system for deep motion model learning in medical images
EP3961500A1 (en) Medical image detection method based on deep learning, and related device
CN111369576B (en) Training method of image segmentation model, image segmentation method, device and equipment
CN110889005B (en) Searching medical reference images
CN110276411B (en) Image classification method, device, equipment, storage medium and medical electronic equipment
US20190034800A1 (en) Learning method, image recognition device, and computer-readable storage medium
Segal et al. Evaluating the clinical realism of synthetic chest x-rays generated using progressively growing gans
US20220262105A1 (en) Systems, methods, and apparatuses for the generation of source models for transfer learning to application specific models used in the processing of medical imaging
CN112149615B (en) Face living body detection method, device, medium and electronic equipment
US20130136322A1 (en) Image-Based Detection Using Hierarchical Learning
CN108388889B (en) Method and device for analyzing face image
CN114339409B (en) Video processing method, device, computer equipment and storage medium
CN108229375B (en) Method and device for detecting face image
CN112396605B (en) Network training method and device, image recognition method and electronic equipment
CN111445457B (en) Network model training method and device, network model identification method and device, and electronic equipment
CN113592769B (en) Abnormal image detection and model training method, device, equipment and medium
CN113822792A (en) Image registration method, device, equipment and storage medium
EP4260295A1 (en) Self-supervised machine learning for medical image analysis
CN110738643A (en) Method for analyzing cerebral hemorrhage, computer device and storage medium
McCullough et al. Convolutional neural network models for automatic preoperative severity assessment in unilateral cleft lip
JP2015041293A (en) Image recognition device and image recognition method
CN116664730B (en) Method and device for generating perception model, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40019498

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant