CN113327265A - Optical flow estimation method and system based on guiding learning strategy - Google Patents
- Publication number
- CN113327265A (application number CN202110649574.1A)
- Authority
- CN
- China
- Prior art keywords
- network
- optical flow
- teacher
- student
- decoder
- Prior art date
- Legal status: Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Abstract
The invention provides an optical flow estimation method and system based on a guiding learning strategy. The method comprises: respectively sending images into a teacher network and a student network for feature extraction to obtain corresponding feature maps; calculating and minimizing the Euclidean distance between the feature maps acquired by the student network and the teacher network; and minimizing the difference between the optical flow estimation value of the student network and the ground-truth label value by using the loss function, while guiding the training of the student network with the features of the teacher network's decoder. The method and system yield a student network with far fewer parameters yet still good performance; the guiding learning strategy achieves competitive performance on multiple data sets and compresses the model to a great extent.
Description
Technical Field
The invention relates to the technical field of computer image analysis, in particular to an optical flow estimation method and system based on a guiding learning strategy.
Background
Optical flow estimation takes two frames of images, discriminates the difference of each pixel between the next frame and the previous frame, and estimates the amount of movement. At present, running existing optical flow estimation models consumes a large amount of computing resources, which makes them difficult to deploy on various mobile terminal devices. Although neural network compression technology can effectively reduce network parameters and save computing resources, such compression methods also reduce model accuracy as they reduce model scale.
The estimation of dense optical flow is a basic and key task in computer vision and is widely applied to tracking, video segmentation, video object detection and motion recognition. However, optical flow estimation remains an open challenge due to illumination, occlusion, large displacements, and real-time requirements. Many studies have attempted to reduce the size of optical flow estimation networks by optimizing the network structure while maintaining performance. The main network compression methods include network pruning, network quantization and knowledge distillation. Although network pruning can greatly reduce network parameters, the original network usually has to be modified and retrained to provide the basis for pruning. Network quantization, in turn, relies on a customized hardware environment to achieve full performance, offering no verifiable efficiency improvement on general hardware. Meanwhile, pioneering studies in knowledge distillation have shown that training shallow or compact models under the guidance of deep or highly complex models as teachers can maintain the good performance of the small models. However, no study has yet applied knowledge distillation to optical flow estimation networks.
Disclosure of Invention
In order to solve the technical problems that the optical flow estimation model in the prior art is too large and that model accuracy drops after compression, the invention provides an optical flow estimation method and system based on a guiding learning strategy.
According to one aspect of the invention, an optical flow estimation method based on a guide learning strategy is provided, which comprises the following steps:
s1: respectively sending the images into a teacher network and a student network for feature extraction to obtain corresponding feature maps;
s2: calculating and minimizing Euclidean distances of feature maps acquired by a student network and a teacher network; and
s3: and minimizing the difference between the optical flow estimation value of the student network and the ground-truth label value by using the loss function of the optical flow estimation network, and guiding the training of the student network by using the features of a decoder of the teacher network.
In some embodiments, the number of convolution kernel channels of the student network is half that of the teacher network.
In some specific embodiments, a transformation layer is introduced in the training stage to transform the number of convolution kernel channels of the student network to be consistent with the number of convolution kernel channels of the teacher network.
In some specific embodiments, the transformation layer employs a 3 x 3 convolution kernel.
In some specific embodiments, the Euclidean distance is calculated by the formula D = Σ_i ||T_i − S_i||², where T_i denotes the feature maps obtained by the teacher network and S_i the feature maps obtained by the student network.
In some specific embodiments, the difference between the optical flow estimation value and the ground-truth label value of the student network is minimized in step S3 using the loss function EPE = √((M_x − G_x)² + (M_y − G_y)²), where M(M_x, M_y) represents the optical flow estimate and G(G_x, G_y) represents the ground-truth label value.
In some specific embodiments, step S3 specifically includes using the feature map of the decoder of the teacher network to facilitate training of the decoder of the student network, and minimizing a difference between the feature map of the decoder side of the teacher network and the feature map of the decoder side of the student network.
In some specific embodiments, the loss function of the overall network structure is L(ω) = L_f(ω) + L_EPE(ω) + γ||ω||², and the student network is optimized under the constraint of the teacher network so as to minimize the loss function L(ω), where L_f(ω) is the guiding loss function of the feature maps in the guiding strategy, L_EPE(ω) is the loss function of the optical flow estimation network, ω represents the parameters of the whole network training, and γ represents the weight decay coefficient.
In some specific embodiments, the guiding loss function is L_f(ω) = (λ_f / 2) Σ_i ||T_i − φ(S_i)||², and the loss function of the optical flow estimation network is L_EPE(ω) = Σ_i α_i · EPE(ŷ_i, y_i), where λ_f is a hyperparameter, T_i and S_i are the feature maps of the teacher network decoder and the student network decoder respectively, φ(·) represents the transformation layer, ŷ_i represents the optical flow estimate predicted by the i-th decoding module, y_i is the corresponding supervisory signal, i.e. the feature map of the corresponding decoder module of the teacher network, and α_i is a hyperparameter.
According to a second aspect of the invention, a computer-readable storage medium is proposed, on which one or more computer programs are stored, which when executed by a computer processor implement the method of any of the above.
According to a third aspect of the present invention, there is provided an optical flow estimation system based on a guided learning strategy, the system comprising:
a feature extraction unit: the unit is configured to send the images into a teacher network and a student network respectively for feature extraction to obtain corresponding feature maps;
a distance minimizing unit: the unit is configured to calculate and minimize the Euclidean distance between the feature maps acquired by the student network and the teacher network; and
a guiding training unit: the unit is configured to minimize the difference between the optical flow estimation value of the student network and the ground-truth label value by using the loss function of the optical flow estimation network, and to guide the training of the student network by using the features of the decoder of the teacher network.
In some specific embodiments, the number of convolution kernel channels of the student network is half of that of the teacher network, and a transformation layer is introduced in the training stage to transform the number of convolution kernel channels of the student network to be consistent with that of the teacher network, wherein the transformation layer adopts 3 × 3 convolution kernels.
In some specific embodiments, the guiding training unit is specifically configured to minimize the difference between the optical flow estimation value and the ground-truth label value of the student network using the loss function EPE = √((M_x − G_x)² + (M_y − G_y)²), where M(M_x, M_y) represents the optical flow estimate and G(G_x, G_y) represents the ground-truth label value, and to utilize the feature maps of the decoder of the teacher network to promote the training of the decoder of the student network, minimizing the difference between the feature maps at the decoder side of the teacher network and those at the decoder side of the student network.
In some embodiments, the loss function of the entire network in the system is L(ω) = L_f(ω) + L_EPE(ω) + γ||ω||², and the student network is optimized under the constraint of the teacher network so as to minimize the loss function L(ω), where ω represents the parameters of the whole network training and γ represents the weight decay coefficient. The guiding loss function of the feature maps in the guiding strategy is L_f(ω) = (λ_f / 2) Σ_i ||T_i − φ(S_i)||², and the loss function of the optical flow estimation network is L_EPE(ω) = Σ_i α_i · EPE(ŷ_i, y_i), where λ_f is a hyperparameter, T_i and S_i are the feature maps of the teacher network decoder and the student network decoder respectively, φ(·) represents the transformation layer, ŷ_i represents the optical flow estimate predicted by the i-th decoding module, y_i is the corresponding supervisory signal, i.e. the feature map of the corresponding decoder module of the teacher network, and α_i is a hyperparameter.
The invention provides an optical flow estimation method and system based on a guiding learning strategy. Two different frame images are input into the teacher network and the student network, and the feature information of the teacher network is used to guide the feature learning of the student network, so that a lightweight student network is obtained through training. Under the supervision of the teacher network, the invention obtains a student network with fewer parameters and good performance. Such a guiding learning strategy achieves competitive performance across multiple data sets and compresses the model to a large extent.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the invention. Other embodiments and many of the intended advantages of embodiments will be readily appreciated as they become better understood by reference to the following detailed description. Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow diagram of a guided learning strategy-based optical flow estimation method according to an embodiment of the present application;
FIG. 2 is a network framework diagram of guided learning strategy based optical flow estimation according to a specific embodiment of the present application;
FIG. 3 is a block diagram of an optical flow estimation system based on a guided learning strategy according to an embodiment of the present application;
FIG. 4 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows a flowchart of an optical flow estimation method based on a guiding learning strategy according to an embodiment of the present application. In conjunction with the network framework diagram of fig. 2, the overall framework comprises a teacher network and a student network. The teacher network is an optical flow estimation network trained with ground-truth labels, and its weights are fixed while the guiding learning strategy is executed. In the present invention, two effective teacher networks are used: FlowNetS and PWC-Net. The student network has fewer parameters and a faster inference speed than the teacher network. The weights of the student network are randomly initialized and trained by guided learning. To demonstrate the effectiveness of the guiding learning framework, no special design tricks are used for the student network: the number of convolution kernel channels in the teacher networks (FlowNetS and PWC-Net) is simply halved, yielding the corresponding student networks with fewer parameters (Minor-FlowNetS and Minor-PWC-Net).
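As a rough, illustrative calculation (not part of the patent), halving every convolutional layer's channel count shrinks each interior layer's parameter count to about a quarter, since both input and output channel dimensions are halved. The layer widths below are hypothetical, not the actual FlowNetS/PWC-Net architecture:

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a k x k convolution layer (weights + biases)."""
    return c_out * (c_in * k * k + 1)

# Hypothetical three-layer encoder; a real FlowNetS/PWC-Net encoder differs.
teacher_layers = [(3, 64), (64, 128), (128, 256)]
student_layers = [(3, 32), (32, 64), (64, 128)]   # every channel count halved

teacher_total = sum(conv_params(ci, co, 3) for ci, co in teacher_layers)
student_total = sum(conv_params(ci, co, 3) for ci, co in student_layers)
ratio = teacher_total / student_total
```

The first layer only halves (the 3 input image channels are fixed), so the overall reduction lands a little under 4×.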
As shown in fig. 1, the method includes:
s101: and respectively sending the images into a teacher network and a student network for feature extraction to obtain corresponding feature maps. Two images { I1,I2The teacher network (TN1) and the student network (SN2) are respectively sent to carry outExtracting features to obtain a corresponding feature map: { TiAnd { S }iI denotes the i-th volume block.
In a particular embodiment, since the number of convolution kernel channels of the student network is half that of the teacher network, the number of channels of S_i is half that of T_i. For the loss function calculation in the following step, a transformation layer is introduced to convert the number of channels of S_i to match that of T_i. The transformation layer uses an n × n convolution kernel and is only used during training; it is not needed for testing. Preferably, the present application uses a 3 × 3 convolution kernel.
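A minimal sketch of such a transformation layer, assuming a plain 3 × 3 convolution with zero padding implemented in NumPy (the patent does not specify the implementation); it lifts a student feature map with half the channels to the teacher's channel count while preserving spatial size:

```python
import numpy as np

def transform_layer(s_feat, w):
    """3x3 convolution, stride 1, zero padding 1.
    s_feat: student feature map, shape (C_s, H, W)
    w: kernel, shape (C_t, C_s, 3, 3) -- maps C_s student channels to C_t teacher channels
    returns: array of shape (C_t, H, W)"""
    c_t = w.shape[0]
    c_s, H, W = s_feat.shape
    xp = np.pad(s_feat, ((0, 0), (1, 1), (1, 1)))   # zero padding of 1 pixel
    out = np.zeros((c_t, H, W))
    for co in range(c_t):
        for i in range(3):
            for j in range(3):
                # contribution of kernel tap (i, j), summed over all input channels
                out[co] += np.tensordot(w[co, :, i, j], xp[:, i:i + H, j:j + W], axes=(0, 0))
    return out

# Hypothetical sizes: 32 student channels lifted to the teacher's 64 channels.
s = np.ones((32, 8, 8))
phi_s = transform_layer(s, np.random.default_rng(0).normal(size=(64, 32, 3, 3)))
```

With only the center tap of the kernel set to 1, each output pixel is the sum over input channels, which makes the layer easy to sanity-check.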
S102: the Euclidean distance between the feature maps acquired by the student network and the teacher network is calculated and minimized. The Euclidean distance is calculated by the formula D = Σ_i ||T_i − S_i||², where T_i denotes the feature maps obtained by the teacher network and S_i the feature maps obtained by the student network.
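Step S102 can be sketched as below, assuming the channel counts have already been matched (e.g. by the transformation layer); the shapes are illustrative:

```python
import numpy as np

def feature_distance(T_i, S_i):
    """Squared Euclidean (L2) distance between one teacher and one student feature map."""
    return float(np.sum((T_i - S_i) ** 2))

def total_feature_distance(teacher_maps, student_maps):
    """D = sum_i ||T_i - S_i||^2 over all compared blocks."""
    return sum(feature_distance(T, S) for T, S in zip(teacher_maps, student_maps))
```

During training this quantity is minimized, pulling the student's features toward the teacher's.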
S103: the difference between the optical flow estimation value of the student network and the ground-truth label value is minimized by using the loss function, and the training of the student network is guided by using the features of the decoder of the teacher network.
In particular embodiments, the student network is trained by minimizing the difference between its optical flow estimate and the ground-truth label value via the end-point error (EPE) loss function EPE = √((M_x − G_x)² + (M_y − G_y)²), where M(M_x, M_y) represents the optical flow estimate and G(G_x, G_y) represents the ground-truth label value.
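The EPE above can be sketched as follows, averaged over all pixels of a two-channel flow field (the averaging over pixels is an assumption; the patent states only the per-pixel formula):

```python
import numpy as np

def epe(M, G):
    """Average end-point error between estimated flow M and ground-truth flow G.
    M, G: arrays of shape (2, H, W) holding the x- and y-components of the flow."""
    return float(np.mean(np.sqrt((M[0] - G[0]) ** 2 + (M[1] - G[1]) ** 2)))
```

For example, a flow that is off by a constant (3, 4) pixels everywhere has an EPE of exactly 5.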
In a specific embodiment, the optical flow estimation network mainly performs convolution, pooling and nonlinear activation on the input images to obtain feature maps, which are then sent to a decoder for deconvolution to produce the optical flow estimation map. The guiding strategy is applied at the decoder: during training, the feature maps of the teacher network decoder are used to promote the training of the student network decoder, and the difference between the feature maps at the decoder side of the teacher network and those at the decoder side of the student network is minimized, so that the student network can learn the same feature information as the teacher network with fewer parameters.
In a specific embodiment, the loss function of the entire network structure is as follows: L(ω) = L_f(ω) + L_EPE(ω) + γ||ω||²,
where ω represents the parameters of the whole network training, γ represents the weight decay coefficient, and L(ω) represents the overall loss function; the student network is optimized under the constraint of the teacher network, and L(ω) is minimized. L_f(ω) = (λ_f / 2) Σ_i ||T_i − φ(S_i)||² represents the guiding loss function of the feature maps in the guiding strategy, where λ_f is a hyperparameter, S_i represents the features extracted by the i-th decoding module, T_i and S_i represent the feature maps of the teacher network and student network decoders, and φ(·) represents the transformation layer. L_EPE(ω) = Σ_i α_i · EPE(ŷ_i, y_i) is defined as the loss function of the optical flow estimation network, where ŷ_i is the optical flow estimate predicted by the i-th decoding module, y_i is the corresponding supervisory signal, i.e. the feature map of the corresponding decoder module of the teacher network, and α_i is a hyperparameter.
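The three terms of L(ω) can be sketched as below; the λ_f, α_i and γ values are illustrative, and the weight-norm term stands in for the usual weight decay over all trainable parameters:

```python
import numpy as np

def guidance_loss(teacher_feats, transformed_student_feats, lam_f=1.0):
    """L_f = (lam_f / 2) * sum_i ||T_i - phi(S_i)||^2 over decoder modules."""
    return 0.5 * lam_f * sum(float(np.sum((T - S) ** 2))
                             for T, S in zip(teacher_feats, transformed_student_feats))

def weighted_epe_loss(flow_preds, targets, alphas):
    """L_EPE = sum_i alpha_i * (mean end-point error of the i-th decoding module)."""
    return sum(a * float(np.mean(np.sqrt((f[0] - g[0]) ** 2 + (f[1] - g[1]) ** 2)))
               for f, g, a in zip(flow_preds, targets, alphas))

def total_loss(l_f, l_epe, omega, gamma=4e-4):
    """L(w) = L_f(w) + L_EPE(w) + gamma * ||w||^2."""
    return l_f + l_epe + gamma * float(np.sum(omega ** 2))
```

In an actual training loop, total_loss would be evaluated on the decoder outputs of both networks each iteration and backpropagated through the student and transformation layers only.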
The guiding learning provided by the method can obtain competitive network performance without modifying or retraining the original network as network pruning or without specific hardware environment support as network quantification. The high-level semantic feature information in the original optical flow estimation network is used for supervision, so that the network with smaller parameter number is trained, and the student network can effectively learn the robust and efficient features. The compression framework is realized by extracting knowledge from a feature map of a decoder in an original network, and a feature guidance strategy is proposed to transmit the knowledge into a small network.
The inventors of the present application performed performance verification on three data sets. The experimental results are verified in data sets of Flying Chairs, Sintel Clean and Sintel Final respectively, the teacher network is FlowNet and PWC-Net respectively, the corresponding student networks are Minor-FlowNet and Minor-PWC-Net, and the performance verification in the following table 1 shows the precision improvement of the student networks in guiding the learning strategy.
TABLE 1 Performance verification
The model size of Minor-FlowNetS is only 9.2M, and it runs nearly 2.3 times faster on a GTX 1080 graphics card: FlowNetS takes 20 ms, while Minor-FlowNetS takes only 8 ms. In terms of parameters, the model size of Minor-PWC-Net is only 2.7M compared with 8.9M for PWC-Net, and it runs 1.4 times faster on the GTX 1080. The results show that the optical flow estimation network achieves a much better accuracy-speed trade-off.
With continued reference to FIG. 3, FIG. 3 illustrates a framework diagram of a guided learning strategy-based optical flow estimation system according to an embodiment of the present application. The system specifically comprises a feature extraction unit 301, a distance minimization unit 302 and a guiding training unit 303.
In a specific embodiment, the feature extraction unit 301 is configured to send the images into the teacher network and the student network respectively for feature extraction to obtain corresponding feature maps, where the number of convolution kernel channels of the student network is half that of the teacher network, and a transformation layer is introduced in the training stage to convert the number of convolution kernel channels of the student network to match that of the teacher network, the transformation layer adopting a 3 × 3 convolution kernel. The distance minimizing unit 302 is configured to calculate and minimize the Euclidean distance between the feature maps acquired by the student network and the teacher network, the Euclidean distance being calculated by the formula D = Σ_i ||T_i − S_i||², where T_i denotes the feature maps obtained by the teacher network and S_i the feature maps obtained by the student network. The guiding training unit 303 is configured to minimize the difference between the optical flow estimate and the ground-truth label value of the student network using the loss function EPE = √((M_x − G_x)² + (M_y − G_y)²), where M(M_x, M_y) represents the optical flow estimate and G(G_x, G_y) represents the ground-truth label value, and to utilize the feature maps of the decoder of the teacher network to promote the training of the decoder of the student network, minimizing the difference between the feature maps at the decoder side of the teacher network and those at the decoder side of the student network.
In a specific embodiment, the loss function of the entire network in the system is L(ω) = L_f(ω) + L_EPE(ω) + γ||ω||², and the student network is optimized under the constraint of the teacher network so as to minimize the loss function L(ω), where ω represents the parameters of the whole network training and γ represents the weight decay coefficient. The guiding loss function of the feature maps in the guiding strategy is L_f(ω) = (λ_f / 2) Σ_i ||T_i − φ(S_i)||², and the loss function of the optical flow estimation network is L_EPE(ω) = Σ_i α_i · EPE(ŷ_i, y_i), where λ_f is a hyperparameter, T_i and S_i are the feature maps of the teacher network decoder and the student network decoder respectively, φ(·) represents the transformation layer, ŷ_i represents the optical flow estimate predicted by the i-th decoding module, y_i is the corresponding supervisory signal, i.e. the feature map of the corresponding decoder module of the teacher network, and α_i is a hyperparameter.
The system provides a new compression framework for optical flow estimation networks, namely guided learning. The framework is composed of two optical flow networks, and the compressed optical flow estimation network can be trained more effectively than with ground-truth supervision alone. The basic idea is that an existing network that achieves satisfactory performance in optical flow estimation can serve as the teacher network, supervising the training of another lightweight network as the student network for optical flow estimation. As to how a complete optical flow estimation network can effectively supervise the lightweight network while preserving accuracy: optical flow estimation networks generally first extract feature maps and then decode them by various methods to obtain the optical flow estimation result, so an optical flow estimation network can be divided into a feature encoder and a decoder. The supervision is applied at the feature decoder, because the feature decoder fuses the feature maps extracted by the encoder, and each decoder block can also directly predict an optical flow map, which the low-level feature maps of the encoder cannot do. On this basis, the feature decoder contains rich information and effective knowledge, so the teacher network should guide the student network in training the decoder. Such a guiding learning strategy achieves competitive performance across multiple data sets and compresses the model to a large extent.
Referring now to FIG. 4, shown is a block diagram of a computer system 400 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU)401 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the system 400 are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. A driver 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read out therefrom is mounted into the storage section 408 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 401. It should be noted that the computer readable storage medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable storage medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + +, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented by software or hardware.
As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: respectively send the images into a teacher network and a student network for feature extraction to obtain corresponding feature maps; calculate and minimize the Euclidean distance between the feature maps acquired by the student network and the teacher network; and minimize the difference between the optical flow estimation value of the student network and the ground-truth label value by using the loss function, and guide the training of the student network by using the features of a decoder of the teacher network.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
Claims (14)
1. An optical flow estimation method based on a guiding learning strategy is characterized by comprising the following steps:
S1: sending the images into a teacher network and a student network respectively for feature extraction to obtain corresponding feature maps;
S2: calculating and minimizing the Euclidean distance between the feature maps acquired by the student network and the teacher network; and
S3: minimizing the difference between the optical flow estimate of the student network and the true label value using a loss function of the optical flow estimation network, and guiding the training of the student network using the features of a decoder of the teacher network.
2. The guided learning strategy-based optical flow estimation method of claim 1, wherein the number of convolution kernel channels of the student network is half of that of the teacher network.
3. The guided learning strategy-based optical flow estimation method of claim 2, wherein a transformation layer is introduced in a training phase to transform the number of convolution kernel channels of the student network to be consistent with the number of convolution kernel channels of the teacher network.
4. The guided learning strategy-based optical flow estimation method of claim 3, wherein the transformation layer employs a 3 x 3 convolution kernel.
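To make claims 2–4 concrete, the sketch below maps a student feature map with half the teacher's channel count back to the teacher's width using a naive 3 × 3 convolution. The shapes (32 student channels mapped to 64 teacher channels), the zero padding, and the identity-like kernel are assumptions for the example only.

```python
import numpy as np

def transform_layer(student_feat, weights):
    # naive 3x3 convolution with zero padding: maps (C_s, H, W) -> (C_t, H, W)
    c_t, c_s, kh, kw = weights.shape
    _, h, w = student_feat.shape
    padded = np.pad(student_feat, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_t, h, w))
    for o in range(c_t):            # each output (teacher-width) channel
        for i in range(c_s):        # accumulate over student channels
            for dy in range(kh):
                for dx in range(kw):
                    out[o] += weights[o, i, dy, dx] * padded[i, dy:dy + h, dx:dx + w]
    return out

student_feat = np.ones((32, 8, 8))   # student: half the teacher's 64 channels
weights = np.zeros((64, 32, 3, 3))
weights[:, :, 1, 1] = 1.0            # center-tap-only kernel for the sketch
teacher_shaped = transform_layer(student_feat, weights)  # shape (64, 8, 8)
```

After the transform the student features have the teacher's channel count, so the Euclidean distance of claim 1 can be computed element-wise; the layer is only needed during training and is discarded at inference.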
6. The optical flow estimation method based on the guided learning strategy according to claim 1, wherein in the step S3 the difference between the optical flow estimate of the student network and the true label value is minimized using the endpoint-error loss function EPE = √((M_x − G_x)² + (M_y − G_y)²), wherein M(M_x, M_y) represents the optical flow estimate and G(G_x, G_y) represents the true label value.
7. The method for optical flow estimation based on guided learning strategy according to claim 1, wherein the step S3 specifically comprises using the feature map of the decoder of the teacher network to facilitate the training of the decoder of the student network, and minimizing the difference between the feature map of the decoder side of the teacher network and the feature map of the decoder side of the student network.
8. The guided learning strategy-based optical flow estimation method of claim 1, wherein the loss function of the entire network structure is L(ω) = L_f(ω) + L_EPE(ω) + γ‖ω‖², and the student network is optimized under the constraint of the teacher network by minimizing the loss function L(ω), wherein L_f(ω) is the guiding loss function of the feature maps in the guiding strategy, L_EPE(ω) is the loss function of the optical flow estimation network, ω represents the parameters of the whole network training, and γ represents the weight decay coefficient.
9. The guided learning strategy-based optical flow estimation method of claim 8, wherein the guiding loss function is L_f(ω) = λ_f Σ_i ‖T_i − h(S_i)‖₂ and the loss function of the optical flow estimation network is L_EPE(ω) = Σ_i α_i ‖F_i − F̂_i‖₂, wherein λ_f is a hyper-parameter, T_i and S_i are the feature maps of the i-th decoder module of the teacher network and the student network respectively, h(·) represents the transformation layer, F_i represents the optical flow estimate predicted by the i-th decoding module, F̂_i is the corresponding supervisory signal, i.e. the feature map of the corresponding decoder module of the teacher network, and α_i is a hyper-parameter.
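On the reading of claims 8 and 9 above, the total objective combines the feature-guiding term, the per-module flow loss, and weight decay. The NumPy sketch below assumes illustrative values for λ_f, α_i, and γ, and treats the transformation layer as already applied to the student features; none of these choices come from the patent.

```python
import numpy as np

def guiding_loss(teacher_feats, student_feats, lam_f=1.0):
    # L_f(ω): summed L2 distance between each teacher decoder feature map T_i
    # and the (already transformed) student feature map h(S_i)
    return lam_f * sum(float(np.sqrt(np.sum((t - s) ** 2)))
                       for t, s in zip(teacher_feats, student_feats))

def flow_loss(flow_preds, supervision, alphas):
    # L_EPE(ω): α_i-weighted endpoint error over the decoder modules
    return sum(a * float(np.mean(np.sqrt(np.sum((f - g) ** 2, axis=-1))))
               for a, f, g in zip(alphas, flow_preds, supervision))

def total_loss(teacher_feats, student_feats, flow_preds, supervision,
               alphas, params, lam_f=1.0, gamma=4e-4):
    # L(ω) = L_f(ω) + L_EPE(ω) + γ‖ω‖²
    return (guiding_loss(teacher_feats, student_feats, lam_f)
            + flow_loss(flow_preds, supervision, alphas)
            + gamma * float(np.sum(params ** 2)))

feats = [np.ones((16, 8, 8)) for _ in range(3)]    # three decoder modules
flows = [np.zeros((8, 8, 2)) for _ in range(3)]
params = np.ones(10)
# identical teacher/student features and flows leave only the decay term
l = total_loss(feats, feats, flows, flows, [0.32, 0.08, 0.02], params)
```

With the student matching its supervision exactly, only the γ‖ω‖² regularizer remains, which makes the role of each term easy to check in isolation.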
10. A computer-readable storage medium having one or more computer programs stored thereon, which when executed by a computer processor perform the method of any one of claims 1 to 9.
11. An optical flow estimation system based on a guided learning strategy, the system comprising:
a feature extraction unit: configured to send the images into a teacher network and a student network respectively for feature extraction to obtain corresponding feature maps;
a distance minimizing unit: configured to calculate and minimize the Euclidean distance between the feature maps acquired by the student network and the teacher network; and
a guiding training unit: configured to minimize the difference between the optical flow estimate of the student network and the true label value using a loss function of the optical flow estimation network, and to guide the training of the student network using the features of a decoder of the teacher network.
12. The guided learning strategy-based optical flow estimation system of claim 11, wherein the number of convolution kernel channels of the student network is half of the number of convolution kernel channels of the teacher network, and a transformation layer is introduced in a training phase to transform the number of convolution kernel channels of the student network to be consistent with the number of convolution kernel channels of the teacher network, wherein the transformation layer employs a 3 x 3 convolution kernel.
13. The guided learning strategy-based optical flow estimation system of claim 11, wherein the guiding training unit is specifically configured to minimize the difference between the optical flow estimate of the student network and the true label value using the endpoint-error loss function EPE = √((M_x − G_x)² + (M_y − G_y)²), wherein M(M_x, M_y) represents the optical flow estimate and G(G_x, G_y) represents the true label value; the feature map of the decoder of the teacher network is used to promote the training of the decoder of the student network, minimizing the difference between the feature map at the decoder side of the teacher network and that of the student network.
14. The guided learning strategy-based optical flow estimation system of claim 11, wherein the loss function of the entire network in the system is L(ω) = L_f(ω) + L_EPE(ω) + γ‖ω‖², and the student network is optimized under the constraint of the teacher network by minimizing the loss function L(ω), wherein ω represents the parameters of the whole network training and γ represents the weight decay coefficient; the guiding loss function of the feature maps in the guiding strategy is L_f(ω) = λ_f Σ_i ‖T_i − h(S_i)‖₂ and the loss function of the optical flow estimation network is L_EPE(ω) = Σ_i α_i ‖F_i − F̂_i‖₂, wherein λ_f is a hyper-parameter, T_i and S_i are the feature maps of the i-th decoder module of the teacher network and the student network respectively, h(·) represents the transformation layer, F_i represents the optical flow estimate predicted by the i-th decoding module, F̂_i is the corresponding supervisory signal, i.e. the feature map of the corresponding decoder module of the teacher network, and α_i is a hyper-parameter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110649574.1A CN113327265B (en) | 2021-06-10 | 2021-06-10 | Optical flow estimation method and system based on guiding learning strategy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113327265A true CN113327265A (en) | 2021-08-31 |
CN113327265B CN113327265B (en) | 2022-07-15 |
Family
ID=77420479
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110649574.1A Active CN113327265B (en) | 2021-06-10 | 2021-06-10 | Optical flow estimation method and system based on guiding learning strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113327265B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110268342A1 (en) * | 2006-11-09 | 2011-11-03 | Drvision Technologies Llc | Method for moving cell detection from temporal image sequence model estimation |
CN110880036A (en) * | 2019-11-20 | 2020-03-13 | 腾讯科技(深圳)有限公司 | Neural network compression method and device, computer equipment and storage medium |
CN111401406A (en) * | 2020-02-21 | 2020-07-10 | 华为技术有限公司 | Neural network training method, video frame processing method and related equipment |
CN111523410A (en) * | 2020-04-09 | 2020-08-11 | 哈尔滨工业大学 | Video saliency target detection method based on attention mechanism |
CN111667399A (en) * | 2020-05-14 | 2020-09-15 | 华为技术有限公司 | Method for training style migration model, method and device for video style migration |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113920574A (en) * | 2021-12-15 | 2022-01-11 | 深圳市视美泰技术股份有限公司 | Training method and device for picture quality evaluation model, computer equipment and medium |
CN113920574B (en) * | 2021-12-15 | 2022-03-18 | 深圳市视美泰技术股份有限公司 | Training method and device for picture quality evaluation model, computer equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111930992B (en) | Neural network training method and device and electronic equipment | |
CN110119757B (en) | Model training method, video category detection method, device, electronic equipment and computer readable medium | |
CN107481717B (en) | Acoustic model training method and system | |
CN110622176B (en) | Video partitioning | |
CN111079532B (en) | Video content description method based on text self-encoder | |
CN110929780B (en) | Video classification model construction method, video classification device, video classification equipment and medium | |
US20200104640A1 (en) | Committed information rate variational autoencoders | |
CN111916067A (en) | Training method and device of voice recognition model, electronic equipment and storage medium | |
CN111523640B (en) | Training method and device for neural network model | |
CN111210446B (en) | Video target segmentation method, device and equipment | |
CN110781413B (en) | Method and device for determining interest points, storage medium and electronic equipment | |
CN111597961B (en) | Intelligent driving-oriented moving target track prediction method, system and device | |
CN112634296A (en) | RGB-D image semantic segmentation method and terminal for guiding edge information distillation through door mechanism | |
CN113327599B (en) | Voice recognition method, device, medium and electronic equipment | |
CN111653270B (en) | Voice processing method and device, computer readable storage medium and electronic equipment | |
CN116050496A (en) | Determination method and device, medium and equipment of picture description information generation model | |
CN115239593A (en) | Image restoration method, image restoration device, electronic device, and storage medium | |
CN117475038B (en) | Image generation method, device, equipment and computer readable storage medium | |
CN110472673B (en) | Parameter adjustment method, fundus image processing device, fundus image processing medium and fundus image processing apparatus | |
CN113327265B (en) | Optical flow estimation method and system based on guiding learning strategy | |
CN113850012B (en) | Data processing model generation method, device, medium and electronic equipment | |
CN111161724B (en) | Method, system, equipment and medium for Chinese audio-visual combined speech recognition | |
CN117291232A (en) | Image generation method and device based on diffusion model | |
CN112364933A (en) | Image classification method and device, electronic equipment and storage medium | |
US11501759B1 (en) | Method, system for speech recognition, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||