CN113327265A - Optical flow estimation method and system based on guiding learning strategy - Google Patents

Optical flow estimation method and system based on guiding learning strategy

Info

Publication number
CN113327265A
Authority
CN
China
Prior art keywords
network
optical flow
teacher
student
decoder
Prior art date
Legal status
Granted
Application number
CN202110649574.1A
Other languages
Chinese (zh)
Other versions
CN113327265B (en)
Inventor
吴俊毅
姚灿荣
高志鹏
赵建强
杜新胜
Current Assignee
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd filed Critical Xiamen Meiya Pico Information Co Ltd
Priority to CN202110649574.1A priority Critical patent/CN113327265B/en
Publication of CN113327265A publication Critical patent/CN113327265A/en
Application granted granted Critical
Publication of CN113327265B publication Critical patent/CN113327265B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning

Abstract

The invention provides an optical flow estimation method and system based on a guiding learning strategy. The method comprises: feeding the images into a teacher network and a student network respectively for feature extraction to obtain the corresponding feature maps; calculating and minimizing the Euclidean distance between the feature maps acquired by the student network and the teacher network; and minimizing, with a loss function, the difference between the student network's optical flow estimate and the ground-truth label, while guiding the training of the student network with the features of the teacher network's decoder. The method and system yield a student network that has far fewer parameters yet still performs well; the guiding learning strategy achieves competitive performance on multiple data sets and compresses the model to a great extent.

Description

Optical flow estimation method and system based on guiding learning strategy
Technical Field
The invention relates to the technical field of computer image analysis, in particular to an optical flow estimation method and system based on a guiding learning strategy.
Background
Optical flow estimation takes two frames of an image sequence and determines, for each pixel, the displacement between the previous frame and the next frame, i.e. the amount of motion. At present, existing optical flow estimation models consume a large amount of computing resources, which makes them difficult to deploy on mobile devices. Although neural network compression can effectively reduce network parameters and save computing resources, compression methods also reduce model accuracy as they shrink the model.
Dense optical flow estimation is a fundamental and key task in computer vision, widely applied to tracking, video segmentation, video object detection and action recognition. However, optical flow estimation remains an open challenge because of illumination changes, occlusion, large displacements and real-time requirements. Many studies have attempted to reduce the size of optical flow estimation networks by optimizing the network structure while maintaining performance. The main network compression methods are network pruning, network quantization and knowledge distillation. Although network pruning can greatly reduce network parameters, the original network usually has to be modified and retrained to obtain the criteria for pruning. Network quantization, in turn, relies on a customized hardware environment and offers no verifiable efficiency improvement on general-purpose hardware. Pioneering studies in knowledge distillation have shown that training shallow or compact models under the supervision of deep or highly complex models can preserve the good performance of the small models; however, no prior study has applied knowledge distillation to optical flow estimation networks.
Disclosure of Invention
In order to solve the technical problems that prior-art optical flow estimation models are too large and that model accuracy drops after compression, the invention provides an optical flow estimation method and system based on a guiding learning strategy.
According to one aspect of the invention, an optical flow estimation method based on a guide learning strategy is provided, which comprises the following steps:
s1: respectively sending the images into a teacher network and a student network for feature extraction to obtain corresponding feature maps;
s2: calculating and minimizing Euclidean distances of feature maps acquired by a student network and a teacher network; and
s3: and minimizing the optical flow estimation value and the real label value of the student network by using the loss function of the optical flow estimation network, and guiding the training of the student network by using the characteristics of a decoder of the teacher network.
In some embodiments, the number of convolution kernel channels of the student network is half that of the teacher network.
In some specific embodiments, a transformation layer is introduced in the training stage to transform the number of convolution kernel channels of the student network to be consistent with the number of convolution kernel channels of the teacher network.
In some specific embodiments, the transformation layer employs a 3 x 3 convolution kernel.
In some specific embodiments, the Euclidean distance is calculated as

d_i = ||T_i - S_i||_2

where T_i is the feature map obtained by the teacher network and S_i is the feature map obtained by the student network.
In some specific embodiments, in step S3 the difference between the student network's optical flow estimate and the ground-truth label value is minimized using the end-point error (EPE) loss

L_EPE = sqrt((M_x - G_x)^2 + (M_y - G_y)^2)

where M(M_x, M_y) denotes the optical flow estimate and G(G_x, G_y) denotes the ground-truth label value.
In some specific embodiments, step S3 specifically includes using the feature map of the decoder of the teacher network to facilitate training of the decoder of the student network, and minimizing a difference between the feature map of the decoder side of the teacher network and the feature map of the decoder side of the student network.
In some specific embodiments, the loss function of the overall network structure is

L(ω) = L_f(ω) + L_EPE(ω) + γ||ω||^2

and the student network is optimized under the constraint of the teacher network so as to minimize L(ω), where L_f(ω) is the guidance loss of the feature maps in the guiding strategy, L_EPE(ω) is the loss function of the optical flow estimation network, ω denotes the parameters trained by the whole network, and γ denotes the weight-decay coefficient.
In some specific embodiments, the guidance loss is

L_f(ω) = λ_f Σ_i ||r(S_i) - T_i||_2^2

and the loss function of the optical flow estimation network is

L_EPE(ω) = Σ_i α_i ||f̂_i - f_i||_2

where λ_f is a hyperparameter, T_i and S_i are the feature maps of the teacher network decoder and the student network decoder respectively, r(·) denotes the transform layer, f̂_i denotes the optical flow estimate predicted by the i-th decoding module, f_i is the corresponding supervisory signal, i.e. the feature map of the teacher network's corresponding decoder module, and α_i is a hyperparameter.
According to a second aspect of the invention, a computer-readable storage medium is proposed, on which one or more computer programs are stored, which when executed by a computer processor implement the method of any of the above.
According to a third aspect of the present invention, there is provided an optical flow estimation system based on a guided learning strategy, the system comprising:
a feature extraction unit: the system is configured for respectively sending the images into a teacher network and a student network for feature extraction to obtain corresponding feature maps;
a distance minimizing unit: configured to calculate and minimize the Euclidean distance between the feature maps acquired by the student network and the teacher network;
a guiding training unit: configured to minimize, by using the loss function of the optical flow estimation network, the difference between the student network's optical flow estimate and the ground-truth label value, and to guide the training of the student network by using the features of the teacher network's decoder.
In some specific embodiments, the number of convolution kernel channels of the student network is half of that of the teacher network, and a transformation layer is introduced in the training stage to transform the number of convolution kernel channels of the student network to be consistent with that of the teacher network, wherein the transformation layer adopts 3 × 3 convolution kernels.
In some specific embodiments, the guiding training unit is specifically configured to minimize, using the end-point error (EPE) loss

L_EPE = sqrt((M_x - G_x)^2 + (M_y - G_y)^2)

the difference between the student network's optical flow estimate M(M_x, M_y) and the ground-truth label value G(G_x, G_y); the feature maps of the teacher network's decoder are used to facilitate the training of the student network's decoder, minimizing the difference between the feature maps at the teacher network's decoder end and the student network's decoder end.
In some embodiments, the loss function of the entire network in the system is

L(ω) = L_f(ω) + L_EPE(ω) + γ||ω||^2

and the student network is optimized under the constraint of the teacher network so as to minimize L(ω), where ω denotes the parameters trained by the whole network and γ denotes the weight-decay coefficient. The guidance loss of the feature maps in the guiding strategy is

L_f(ω) = λ_f Σ_i ||r(S_i) - T_i||_2^2

and the loss function of the optical flow estimation network is

L_EPE(ω) = Σ_i α_i ||f̂_i - f_i||_2

where λ_f is a hyperparameter, T_i and S_i are the feature maps of the teacher network decoder and the student network decoder respectively, r(·) denotes the transform layer, f̂_i denotes the optical flow estimate predicted by the i-th decoding module, f_i is the corresponding supervisory signal, i.e. the feature map of the teacher network's corresponding decoder module, and α_i is a hyperparameter.
The invention provides an optical flow estimation method and system based on a guiding learning strategy. Two image frames are input into a teacher network and a student network, and the feature information of the teacher network guides the feature learning of the student network, so that a lightweight student network is obtained by training. Under the supervision of the teacher network, a student network with far fewer parameters and good performance can be obtained. This guided learning strategy achieves competitive performance across multiple data sets and compresses the model to a large extent.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the invention. Other embodiments and many of the intended advantages of embodiments will be readily appreciated as they become better understood by reference to the following detailed description. Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow diagram of a guided learning strategy-based optical flow estimation method according to an embodiment of the present application;
FIG. 2 is a network framework diagram of guided learning strategy based optical flow estimation according to a specific embodiment of the present application;
FIG. 3 is a block diagram of an optical flow estimation system based on a guided learning strategy according to an embodiment of the present application;
FIG. 4 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows a flowchart of an optical flow estimation method based on a guided learning strategy according to an embodiment of the present application. As shown in the network framework diagram of fig. 2, the overall framework comprises a teacher network and a student network. The teacher network is an optical flow estimation network trained with ground-truth labels, and its weights are fixed while the guiding learning strategy is executed. Two effective teacher networks are used in the present invention: FlowNetS and PWC-Net. The student network has fewer parameters and faster inference than the teacher network; its weights are randomly initialized and trained by guided learning. To demonstrate the effectiveness of the guided learning framework, no special design tricks are used for the student networks: the number of convolution kernel channels of the teacher networks (FlowNetS and PWC-Net) is simply halved, yielding the corresponding student networks with fewer parameters (Minor-FlowNetS and Minor-PWC-Net).
As shown in fig. 1, the method includes:
s101: and respectively sending the images into a teacher network and a student network for feature extraction to obtain corresponding feature maps. Two images { I1,I2The teacher network (TN1) and the student network (SN2) are respectively sent to carry outExtracting features to obtain a corresponding feature map: { TiAnd { S }iI denotes the i-th volume block.
In a particular embodiment, since the number of convolution kernel channels of the student network is half that of the teacher network, S_i has half the channels of T_i. For the subsequent loss calculation, a transform layer is introduced to convert the channels of S_i to the same number as T_i. The transform layer uses an n × n convolution kernel and is only used during training; it is not needed at test time. Preferably, the present application uses a 3 × 3 convolution kernel.
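As a minimal illustration, such a transform layer amounts to a 3 × 3 same-padding convolution that doubles the channel count. The NumPy sketch below is illustrative only; the function name, tensor layout and loop-based convolution are assumptions, not taken from the patent:

```python
import numpy as np

def transform_layer(s, weight, bias):
    """3x3 same-padding convolution mapping a student feature map with
    C_s channels to the teacher's channel count C_t (here C_t = 2 * C_s).
    s: (C_s, H, W); weight: (C_t, C_s, 3, 3); bias: (C_t,).
    Used only during training; discarded at test time."""
    c_s, h, w = s.shape
    c_t = weight.shape[0]
    padded = np.pad(s, ((0, 0), (1, 1), (1, 1)))  # zero padding keeps H, W
    out = np.empty((c_t, h, w))
    for o in range(c_t):
        acc = np.zeros((h, w))
        for c in range(c_s):
            for di in range(3):
                for dj in range(3):
                    acc += weight[o, c, di, dj] * padded[c, di:di + h, dj:dj + w]
        out[o] = acc + bias[o]
    return out
```

In practice this layer would be a learned convolution in the training graph; the loop form above only makes the channel-matching arithmetic explicit.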
S102: and calculating and minimizing Euclidean distances of the feature maps acquired by the student network and the teacher network. The calculation formula of the Euclidean distance is
Figure BDA0003111209200000041
Wherein, TiCharacteristic diagrams obtained for the teacher' S network, SiCharacteristic maps obtained for the student network.
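Assuming S_i has already passed through the transform layer so that the shapes match, the distance being minimized can be sketched in NumPy as follows (function names are illustrative):

```python
import numpy as np

def feature_distance(t_i, s_i):
    """Euclidean (L2) distance between a teacher feature map T_i and the
    channel-matched student feature map S_i."""
    return float(np.linalg.norm(t_i - s_i))

def feature_match_loss(teacher_feats, student_feats):
    """Sum of squared distances over all convolution blocks; this is the
    quantity driven toward zero during guided training."""
    return float(sum(np.sum((t - s) ** 2)
                     for t, s in zip(teacher_feats, student_feats)))
```

Minimizing this quantity over the student's weights pulls each student feature map toward the corresponding teacher feature map.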
S103: and minimizing the optical flow estimation value and the real label value of the student network by using the loss function, and guiding the training of the student network by using the characteristics of a decoder of the teacher network.
In particular embodiments, the student network is trained by minimizing, through the end-point error (EPE) loss, the difference between the student network's optical flow estimate and the ground-truth label value:

L_EPE = sqrt((M_x - G_x)^2 + (M_y - G_y)^2)

where M(M_x, M_y) denotes the optical flow estimate and G(G_x, G_y) denotes the ground-truth label value.
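The EPE term can be sketched as below, averaging the per-pixel end-point error over the image; the (2, H, W) array layout is an assumption for illustration:

```python
import numpy as np

def epe_loss(flow_pred, flow_gt):
    """Mean end-point error between the estimated flow M = (M_x, M_y)
    and the ground-truth flow G = (G_x, G_y), both shaped (2, H, W)."""
    return float(np.mean(np.sqrt(np.sum((flow_pred - flow_gt) ** 2, axis=0))))
```

For each pixel this is the Euclidean length of the error vector between the two flow components, which matches the formula above.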
In a specific embodiment, the optical flow estimation network first applies convolution, pooling and nonlinear activation to the input images to obtain feature maps, which are then sent to the decoder and deconvolved into the optical flow estimation map. The guiding strategy is applied in the decoder: during training, the feature maps of the teacher network's decoder are used to facilitate the training of the student network's decoder, and the difference between the feature maps at the teacher network's decoder end and the student network's decoder end is minimized, so that the student network learns the same feature information as the teacher network with fewer parameters.
In a specific embodiment, the loss function of the entire network structure is as follows:

L(ω) = L_f(ω) + L_EPE(ω) + γ||ω||^2

where ω denotes the parameters trained by the whole network, γ denotes the weight-decay coefficient, and L(ω) is the overall loss; the student network is optimized under the constraint of the teacher network so that L(ω) is minimized. L_f(ω) is the guidance loss of the feature maps in the guiding strategy:

L_f(ω) = λ_f Σ_i ||r(S_i) - T_i||_2^2

where λ_f is a hyperparameter, S_i denotes the features extracted by the i-th decoding module of the student network, T_i and S_i are the feature maps of the teacher network and student network decoders, and r(·) denotes the transform layer. L_EPE(ω) is defined as the loss function of the optical flow estimation network:

L_EPE(ω) = Σ_i α_i ||f̂_i - f_i||_2

where f̂_i is the optical flow estimate predicted by the i-th decoding module, f_i is the corresponding supervisory signal, i.e. the feature map of the teacher network's corresponding decoder module, and α_i is a hyperparameter.
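Putting the pieces together, the overall objective can be sketched as follows. Everything in this sketch beyond the form L(ω) = L_f(ω) + L_EPE(ω) + γ||ω||^2 is an illustrative assumption (default uniform α_i, an explicit parameter list for the weight-decay term, and the (2, H, W) flow layout):

```python
import numpy as np

def total_loss(teacher_feats, student_feats_t, student_flows, guide_flows,
               params, lam_f=1.0, alphas=None, gamma=1e-4):
    """Overall objective L = L_f + L_EPE + gamma * ||w||^2.
    teacher_feats / student_feats_t: decoder feature maps per block, with
    the student maps already passed through the transform layer r(.).
    student_flows / guide_flows: per-decoder flow predictions, (2, H, W),
    and their supervisory signals.  params: trainable weight arrays."""
    if alphas is None:
        alphas = [1.0] * len(student_flows)
    # feature guidance term L_f
    l_f = lam_f * sum(np.sum((s - t) ** 2)
                      for t, s in zip(teacher_feats, student_feats_t))
    # multi-scale flow term L_EPE
    l_epe = sum(a * np.mean(np.sqrt(np.sum((f - g) ** 2, axis=0)))
                for a, f, g in zip(alphas, student_flows, guide_flows))
    # weight decay gamma * ||w||^2
    l2 = gamma * sum(np.sum(p ** 2) for p in params)
    return float(l_f + l_epe + l2)
```

A training step would then differentiate this scalar with respect to the student parameters while holding the teacher fixed.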
Unlike network pruning, the guided learning provided by the method does not require modifying or retraining the original network, and unlike network quantization it does not depend on a specific hardware environment, yet it achieves competitive network performance. By supervising with the high-level semantic feature information of the original optical flow estimation network, a network with far fewer parameters is trained, and the student network effectively learns robust and efficient features. The compression framework extracts knowledge from the feature maps of the decoder in the original network, and a feature guidance strategy is proposed to transfer this knowledge into the small network.
The inventors of the present application verified performance on three data sets: Flying Chairs, Sintel Clean and Sintel Final. The teacher networks are FlowNetS and PWC-Net, and the corresponding student networks are Minor-FlowNetS and Minor-PWC-Net. Table 1 below shows the accuracy improvement of the student networks under the guided learning strategy.
TABLE 1 Performance verification
(The contents of Table 1 appear as images in the original publication.)
The Minor-FlowNetS model is only 9.2M and runs nearly 2.3 times faster on a GTX 1080 GPU: FlowNetS takes 20 ms while Minor-FlowNetS takes only 8 ms. The Minor-PWC-Net model is only 2.7M, versus 8.9M for PWC-Net, and runs 1.4 times faster on the GTX 1080. The results show a substantial improvement in the accuracy/speed trade-off of the optical flow estimation network.
With continued reference to FIG. 3, FIG. 3 illustrates a framework diagram of a guided learning strategy-based optical flow estimation system according to an embodiment of the present application. The system specifically comprises a feature extraction unit 301, a distance minimization unit 302 and a guiding training unit 303.
In a specific embodiment, the feature extraction unit 301 is configured to feed the images into the teacher network and the student network respectively for feature extraction to obtain the corresponding feature maps. The number of convolution kernel channels of the student network is half that of the teacher network, and a transform layer using 3 × 3 convolution kernels is introduced in the training stage to convert the student network's channel count to match the teacher network's. The distance minimizing unit 302 is configured to calculate and minimize the Euclidean distance between the feature maps acquired by the student network and the teacher network,

d_i = ||T_i - S_i||_2

where T_i is the feature map obtained by the teacher network and S_i is the feature map obtained by the student network. The guiding training unit 303 is configured to minimize, using the EPE loss

L_EPE = sqrt((M_x - G_x)^2 + (M_y - G_y)^2)

the difference between the student network's optical flow estimate M(M_x, M_y) and the ground-truth label value G(G_x, G_y), and to use the feature maps of the teacher network's decoder to facilitate the training of the student network's decoder by minimizing the difference between the feature maps at the two decoder ends.
In a specific embodiment, the loss function of the entire network in the system is

L(ω) = L_f(ω) + L_EPE(ω) + γ||ω||^2

and the student network is optimized under the constraint of the teacher network so as to minimize L(ω), where ω denotes the parameters trained by the whole network and γ denotes the weight-decay coefficient. The guidance loss of the feature maps in the guiding strategy is

L_f(ω) = λ_f Σ_i ||r(S_i) - T_i||_2^2

and the loss function of the optical flow estimation network is

L_EPE(ω) = Σ_i α_i ||f̂_i - f_i||_2

where λ_f is a hyperparameter, T_i and S_i are the feature maps of the teacher network decoder and the student network decoder respectively, r(·) denotes the transform layer, f̂_i denotes the optical flow estimate predicted by the i-th decoding module, f_i is the corresponding supervisory signal, i.e. the feature map of the teacher network's corresponding decoder module, and α_i is a hyperparameter.
The system provides a new compression framework for optical flow estimation networks, namely guided learning. The framework consists of two flow networks and trains the compressed optical flow estimation network more effectively than supervision with ground-truth labels alone. The basic idea is that an existing network that already achieves satisfactory optical flow estimation performance can serve as the teacher network, supervising the training of another lightweight network, the student network, for optical flow estimation. As to how a full optical flow estimation network can effectively supervise the lightweight network while preserving accuracy: optical flow estimation networks generally first extract feature maps and then decode them, by various methods, to obtain the optical flow estimation result, so such a network can be divided into a feature encoder and a decoder. The guidance is applied at the feature decoder because the decoder fuses the feature maps extracted by the encoder, and each decoder block can also directly predict an optical flow map, which the low-level feature maps of the encoder cannot do. On this basis, the feature decoder contains rich information and effective knowledge, so the teacher network should guide the student network in training the decoder. This guided learning strategy achieves competitive performance across multiple data sets and compresses the model to a large extent.
Referring now to FIG. 4, shown is a block diagram of a computer system 400 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU)401 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the system 400 are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. A driver 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read out therefrom is mounted into the storage section 408 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 401. It should be noted that the computer readable storage medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable storage medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented by software or hardware.
As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: send the images into a teacher network and a student network respectively for feature extraction to obtain the corresponding feature maps; calculate and minimize the Euclidean distance between the feature maps acquired by the student network and the teacher network; and minimize the difference between the optical flow estimate of the student network and the ground-truth label value using the loss function, guiding the training of the student network with the features of the decoder of the teacher network.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (14)

1. An optical flow estimation method based on a guiding learning strategy is characterized by comprising the following steps:
s1: respectively sending the images into a teacher network and a student network for feature extraction to obtain corresponding feature maps;
s2: calculating and minimizing Euclidean distances of the feature maps acquired by the student network and the teacher network; and
s3: minimizing the optical flow estimate and the true tag values of the student network using a loss function of the optical flow estimation network, and guiding training of the student network using features of a decoder of the teacher network.
2. The guided learning strategy-based optical flow estimation method of claim 1, wherein the number of convolution kernel channels of the student network is half of that of the teacher network.
3. The guided learning strategy-based optical flow estimation method of claim 2, wherein a transformation layer is introduced in a training phase to transform the number of convolution kernel channels of the student network to be consistent with the number of convolution kernel channels of the teacher network.
4. The guided learning strategy-based optical flow estimation method of claim 3, wherein the transformation layer employs a 3 x 3 convolution kernel.
5. The method of claim 1, wherein the Euclidean distance is calculated by the following formula:

d = ||Ti − Si||2

wherein Ti is the feature map obtained by the teacher network and Si is the feature map obtained by the student network.
6. The optical flow estimation method based on the guided learning strategy according to claim 1, wherein in step S3 the optical flow estimate and the true label value of the student network are minimized using a loss function EPE, the loss function EPE being

EPE = √((Mx − Gx)² + (My − Gy)²)

wherein M(Mx, My) represents the optical flow estimate and G(Gx, Gy) represents the true label value.
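The end-point error (EPE) of claim 6 can be computed directly from its standard definition; the toy two-pixel flow field below is illustrative only:

```python
import math

def epe(M, G):
    # End-point error between an estimated flow vector M = (Mx, My)
    # and the ground-truth flow vector G = (Gx, Gy)
    return math.sqrt((M[0] - G[0]) ** 2 + (M[1] - G[1]) ** 2)

# average EPE over a toy 2-pixel flow field
pred = [(1.0, 0.0), (0.0, 2.0)]
gt = [(1.0, 0.0), (0.0, 0.0)]
avg = sum(epe(m, g) for m, g in zip(pred, gt)) / len(pred)
# avg == 1.0: the first pixel is exact, the second is off by 2
```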
7. The method for optical flow estimation based on guided learning strategy according to claim 1, wherein the step S3 specifically comprises using the feature map of the decoder of the teacher network to facilitate the training of the decoder of the student network, and minimizing the difference between the feature map of the decoder side of the teacher network and the feature map of the decoder side of the student network.
8. The guided learning strategy-based optical flow estimation method of claim 1, wherein the loss function of the entire network structure is L(ω) = Lf(ω) + LEPE(ω) + γ||ω||², the student network is optimized under the constraint of the teacher network by minimizing the loss function L(ω), wherein Lf(ω) is the guiding loss function of the feature maps in the guiding strategy, LEPE(ω) is the loss function of the optical flow estimation network, ω represents the parameters of the entire network training, and γ represents the weight decay coefficient.
9. The guided learning strategy-based optical flow estimation method of claim 8, wherein the guiding loss function is

Lf(ω) = λf Σi ||Ti − r(Si)||²

and the loss function of the optical flow estimation network is

LEPE(ω) = Σi αi · EPE(fi, f̂i)

wherein λf is a hyper-parameter, Ti and Si are the feature maps of the teacher network decoder and the student network decoder respectively, r(·) represents the transformation layer, fi represents the optical flow estimate predicted by the i-th decoding module, f̂i is the corresponding supervisory signal, i.e. the feature map of the corresponding decoder module of the teacher network, and αi is a hyper-parameter.
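A hedged sketch of the combined objective of claims 8 and 9, assuming the guiding term is a squared distance between teacher features and transformed student features and the flow term is a weighted EPE over the decoder modules; all function and variable names are hypothetical:

```python
import math

def epe(M, G):
    # End-point error between flow vectors M = (Mx, My) and G = (Gx, Gy)
    return math.sqrt((M[0] - G[0]) ** 2 + (M[1] - G[1]) ** 2)

def guiding_loss(T, S_transformed, lam_f=1.0):
    # Lf: squared distance between teacher features and transformed
    # student features, scaled by the hyper-parameter lambda_f
    return lam_f * sum((t - s) ** 2 for t, s in zip(T, S_transformed))

def flow_loss(flows, targets, alphas):
    # LEPE: alpha_i-weighted EPE over the decoder modules' predictions
    return sum(a * epe(m, g) for a, m, g in zip(alphas, flows, targets))

def total_loss(T, S, flows, targets, alphas, omega, gamma=1e-4, lam_f=1.0):
    # L(w) = Lf(w) + LEPE(w) + gamma * ||w||^2
    reg = gamma * sum(w * w for w in omega)
    return guiding_loss(T, S, lam_f) + flow_loss(flows, targets, alphas) + reg
```

With matching teacher and transformed-student features, the guiding term vanishes and only the flow error and weight-decay terms remain, so the student is penalized exactly where it deviates from the teacher or the labels.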
10. A computer-readable storage medium having one or more computer programs stored thereon, which when executed by a computer processor perform the method of any one of claims 1 to 9.
11. An optical flow estimation system based on a guided learning strategy, the system comprising:
a feature extraction unit: configured to send the images into a teacher network and a student network respectively for feature extraction to obtain the corresponding feature maps;
a distance minimizing unit: configured to calculate and minimize the Euclidean distance between the feature maps acquired by the student network and the teacher network; and
a guiding training unit: configured to minimize the difference between the optical flow estimate of the student network and the ground-truth label value using a loss function of the optical flow estimation network, and to guide the training of the student network using the features of the decoder of the teacher network.
12. The guided learning strategy-based optical flow estimation system of claim 11, wherein the number of convolution kernel channels of the student network is half of the number of convolution kernel channels of the teacher network, and a transformation layer is introduced in a training phase to transform the number of convolution kernel channels of the student network to be consistent with the number of convolution kernel channels of the teacher network, wherein the transformation layer employs a 3 x 3 convolution kernel.
13. The guided learning strategy-based optical flow estimation system of claim 11, wherein the guiding training unit is specifically configured to minimize the difference between the optical flow estimate and the true label value of the student network using a loss function EPE, the loss function EPE being

EPE = √((Mx − Gx)² + (My − Gy)²)

wherein M(Mx, My) represents the optical flow estimate and G(Gx, Gy) represents the true label value; the feature map of the decoder of the teacher network is used to facilitate the training of the decoder of the student network, minimizing the difference between the feature map at the decoder side of the teacher network and the feature map at the decoder side of the student network.
14. The guided learning strategy-based optical flow estimation system of claim 11, wherein the loss function of the entire network in the system is L(ω) = Lf(ω) + LEPE(ω) + γ||ω||², the student network is optimized under the constraint of the teacher network by minimizing the loss function L(ω), wherein ω represents the parameters of the entire network training and γ represents the weight decay coefficient; the guiding loss function of the feature maps in the guiding strategy is

Lf(ω) = λf Σi ||Ti − r(Si)||²

and the loss function of the optical flow estimation network is

LEPE(ω) = Σi αi · EPE(fi, f̂i)

wherein λf is a hyper-parameter, Ti and Si are the feature maps of the teacher network decoder and the student network decoder respectively, r(·) represents the transformation layer, fi represents the optical flow estimate predicted by the i-th decoding module, f̂i is the corresponding supervisory signal, i.e. the feature map of the corresponding decoder module of the teacher network, and αi is a hyper-parameter.
CN202110649574.1A 2021-06-10 2021-06-10 Optical flow estimation method and system based on guiding learning strategy Active CN113327265B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110649574.1A CN113327265B (en) 2021-06-10 2021-06-10 Optical flow estimation method and system based on guiding learning strategy

Publications (2)

Publication Number Publication Date
CN113327265A true CN113327265A (en) 2021-08-31
CN113327265B CN113327265B (en) 2022-07-15

Family

ID=77420479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110649574.1A Active CN113327265B (en) 2021-06-10 2021-06-10 Optical flow estimation method and system based on guiding learning strategy

Country Status (1)

Country Link
CN (1) CN113327265B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110268342A1 (en) * 2006-11-09 2011-11-03 Drvision Technologies Llc Method for moving cell detection from temporal image sequence model estimation
CN110880036A (en) * 2019-11-20 2020-03-13 腾讯科技(深圳)有限公司 Neural network compression method and device, computer equipment and storage medium
CN111401406A (en) * 2020-02-21 2020-07-10 华为技术有限公司 Neural network training method, video frame processing method and related equipment
CN111523410A (en) * 2020-04-09 2020-08-11 哈尔滨工业大学 Video saliency target detection method based on attention mechanism
CN111667399A (en) * 2020-05-14 2020-09-15 华为技术有限公司 Method for training style migration model, method and device for video style migration

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113920574A (en) * 2021-12-15 2022-01-11 深圳市视美泰技术股份有限公司 Training method and device for picture quality evaluation model, computer equipment and medium
CN113920574B (en) * 2021-12-15 2022-03-18 深圳市视美泰技术股份有限公司 Training method and device for picture quality evaluation model, computer equipment and medium

Also Published As

Publication number Publication date
CN113327265B (en) 2022-07-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant