CN111311646B - Optical flow neural network training method and device - Google Patents
Optical flow neural network training method and device
- Publication number
- CN111311646B CN201811518482.4A
- Authority
- CN
- China
- Prior art keywords
- optical flow
- neural network
- dense optical
- dense
- deviation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/269—Analysis of motion using gradient-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
Embodiments of the present application provide an optical flow neural network training method and device. The method comprises: inputting a sample image frame and an adjacent image frame of the sample image frame into a first neural network and a second neural network to obtain a first dense optical flow output by the first neural network and a second dense optical flow output by the second neural network, wherein the first neural network is a pre-trained optical flow neural network, and the second neural network is an optical flow neural network with a model complexity lower than that of the first neural network; and adjusting network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow. With embodiments of the present application, the second neural network can learn the mapping relation from image pairs to dense optical flow already learned by the first neural network, so that the performance of the second neural network approaches that of the first neural network with higher model complexity.
Description
Technical Field
The application relates to the technical field of computer vision, in particular to an optical flow neural network training method and device.
Background
When performing computer vision tasks, dense optical flow can be used to represent the motion state of each pixel in an image frame. In the prior art, an optical flow neural network can be trained; the image frame to be processed and an adjacent image frame of the image frame to be processed are input into the optical flow neural network, and the output of the optical flow neural network is taken as the dense optical flow of the image frame to be processed.
Because a dense optical flow comprises the motion states of all pixel points in the image frame, it is computationally difficult to obtain, and a relatively accurate dense optical flow can be output only by an optical flow neural network with high model complexity. However, an optical flow neural network with higher model complexity operates less efficiently and may occupy more computational resources.
Disclosure of Invention
An object of the embodiments of the present application is to provide an optical flow neural network training method and apparatus, so as to enable an optical flow neural network with a low model complexity to approach an optical flow neural network with a high model complexity in terms of performance. The specific technical scheme is as follows:
in a first aspect of embodiments of the present application, there is provided an optical flow neural network training method, including:
inputting a sample image frame and an adjacent image frame of the sample image frame into a first neural network and a second neural network to obtain a first dense optical flow output by the first neural network and a second dense optical flow output by the second neural network, wherein the first neural network is a pre-trained optical flow neural network, and the second neural network is an optical flow neural network with a model complexity lower than that of the first neural network;
adjusting network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow.
With reference to the first aspect, in a first possible implementation manner, before the adjusting the network parameters of the second neural network based on the deviation between the first dense optical flow and the second dense optical flow, the method further includes:
for each pixel point in the sample image frame, calculating a difference value between the optical flow value of the pixel point in the first dense optical flow and the optical flow value of the pixel point in the second dense optical flow as the optical flow difference value of the pixel point;
and accumulating the optical flow difference values of all pixel points in the sample image frame to obtain the deviation between the first dense optical flow and the second dense optical flow.
With reference to the first aspect, in a second possible implementation manner, the adjusting network parameters of the second neural network based on the deviation between the first dense optical flow and the second dense optical flow includes:
adjusting network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow and a loss between the second dense optical flow and the labeled dense optical flow for the sample image frame.
With reference to the first aspect, in a third possible implementation manner, the first dense optical flow is a plurality of dense optical flows output by the first neural network at a plurality of different down-sampling magnifications;
the second dense optical flow is a plurality of dense optical flows output by the second neural network at the plurality of different down-sampling magnifications;
the adjusting network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow comprises:
for each of the plurality of different down-sampling magnifications, adjusting network parameters of the second neural network based on a deviation between the dense optical flow output at the down-sampling magnification in the first dense optical flow and the dense optical flow output at the down-sampling magnification in the second dense optical flow.
With reference to the first aspect, in a fourth possible implementation manner, the first neural network is a plurality of optical flow neural networks trained in advance;
the adjusting network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow comprises:
carrying out weighted average on the first dense optical flows output by the first neural networks to obtain an average dense optical flow;
adjusting network parameters of the second neural network based on a deviation between the average dense optical flow and the second dense optical flow.
With reference to the first aspect, in a fifth possible implementation manner, a depth of the second neural network is smaller than a depth of the first neural network, and a width of the second neural network is smaller than a width of the first neural network.
In a second aspect of embodiments of the present application, there is provided an optical flow neural network training apparatus, the apparatus including:
the system comprises a sample input module, configured to input a sample image frame and an adjacent image frame of the sample image frame into a first neural network and a second neural network to obtain a first dense optical flow output by the first neural network and a second dense optical flow output by the second neural network, wherein the first neural network is a pre-trained optical flow neural network, and the second neural network is an optical flow neural network with a model complexity lower than that of the first neural network;
a parameter adjustment module to adjust network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow.
With reference to the second aspect, in a first possible implementation manner, the parameter adjusting module is further configured to, before the network parameter of the second neural network is adjusted based on the deviation between the first dense optical flow and the second dense optical flow, calculate, for each pixel point in the sample image frame, a difference value between an optical flow value of the pixel point in the first dense optical flow and an optical flow value of the pixel point in the second dense optical flow as an optical flow difference value of the pixel point;
and accumulating the optical flow difference values of all pixel points in the sample image frame to obtain the deviation between the first dense optical flow and the second dense optical flow.
With reference to the second aspect, in a second possible implementation manner, the parameter adjustment module is specifically configured to adjust the network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow and a loss between the second dense optical flow and the labeled dense optical flow of the sample image frame.
With reference to the second aspect, in a third possible implementation manner, the first dense optical flow is a plurality of dense optical flows output by the first neural network at a plurality of different down-sampling magnifications;
the second dense optical flow is a plurality of dense optical flows output by the second neural network at the plurality of different down-sampling magnifications;
the parameter adjusting module is specifically configured to, for each of the plurality of different down-sampling magnifications, adjust network parameters of the second neural network based on a deviation between a dense optical flow output at the down-sampling magnification in the first dense optical flow and a dense optical flow output at the down-sampling magnification in the second dense optical flow.
With reference to the second aspect, in a fourth possible implementation manner, the first neural network is a plurality of optical flow neural networks trained in advance;
the parameter adjusting module is specifically configured to perform weighted averaging on the first dense optical flows output by the plurality of first neural networks to obtain an average dense optical flow;
adjusting network parameters of the second neural network based on a deviation between the average dense optical flow and the second dense optical flow.
With reference to the second aspect, in a fifth possible implementation manner, a depth of the second neural network is smaller than a depth of the first neural network, and a width of the second neural network is smaller than a width of the first neural network.
In a third aspect of embodiments of the present application, there is provided an electronic device, comprising a memory for storing a computer program;
and the processor is used for realizing any optical flow neural network training method when executing the program stored in the memory.
In a fourth aspect of embodiments of the present application, a computer-readable storage medium is provided, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the optical flow neural network training methods described above.
According to the optical flow neural network training method and device provided by the embodiments of the present application, by training the second neural network with the first dense optical flow output by the trained first neural network, the second neural network can learn the mapping relation from image pairs to dense optical flow already learned by the first neural network, so that the performance of the second neural network approaches that of the first neural network with higher model complexity. Of course, not all of the advantages described above need to be achieved at the same time in practicing any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; other drawings can be obtained by those skilled in the art from these drawings without creative effort.
FIG. 1 is a schematic flowchart of an optical flow neural network training method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an optical flow neural network according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a deviation calculation method according to an embodiment of the present application;
FIG. 4 is another schematic flowchart of a deviation calculation method according to an embodiment of the present application;
FIG. 5a is a schematic diagram of a principle of the optical flow neural network training method according to an embodiment of the present application;
FIG. 5b is a schematic diagram of another principle of the optical flow neural network training method according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an optical flow neural network training device according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
In a single image frame, the image of an object is static, but the object itself may be in motion. For example, when a running athlete is photographed with a camera, the position of the pixel area where the athlete's image is located does not change within the captured image frame, yet the athlete is actually in motion; the athlete's motion state is therefore difficult to infer from that pixel area alone. In some computer vision tasks, however, the athlete's motion state may be needed, for example for athlete trajectory prediction. The optical flow of the image frame may be obtained through optical flow estimation, where the optical flow includes optical flow values of some or all pixel points in the image frame, and the optical flow value of each pixel point may represent the motion state of that pixel point. If the optical flow includes the optical flow values of all pixel points in the image frame, it is called a dense optical flow.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of an optical flow neural network training method according to an embodiment of the present application. The method includes:
s101, inputting the sample image frame and the adjacent image frame of the sample image frame into a first neural network and a second neural network to obtain a first dense optical flow output by the first neural network and a second dense optical flow output by the second neural network.
The first neural network is a pre-trained optical flow neural network, and the second neural network is an optical flow neural network with a model complexity lower than that of the first neural network. The model complexity of an optical flow neural network may include the depth of the network and/or the width of the network: the depth may represent the number of convolution layers the network includes, and the width may represent the number and size of the convolution kernels in each convolution layer. For a given width, the smaller the depth of an optical flow neural network, the lower its model complexity; for a given depth, the smaller the width, the lower the model complexity. The second neural network may therefore be an optical flow neural network whose width equals that of the first neural network but whose depth is smaller, or one whose depth equals that of the first neural network but whose width is smaller. It may also be an optical flow neural network that is both narrower and shallower than the first neural network, in which case the second neural network is maximally simplified.
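To make the depth/width notion concrete, the following is a minimal PyTorch sketch; the function name, layer counts, and channel widths are illustrative assumptions, not the actual architectures of the first or second neural network.

```python
import torch.nn as nn

def make_flow_backbone(depth: int, width: int) -> nn.Sequential:
    """Toy optical flow backbone: `depth` counts the convolution layers,
    `width` is the number of convolution kernels per layer."""
    layers, in_ch = [], 6  # an image pair of two RGB frames -> 6 input channels
    for _ in range(depth):
        layers += [nn.Conv2d(in_ch, width, kernel_size=3, padding=1), nn.ReLU()]
        in_ch = width
    layers.append(nn.Conv2d(in_ch, 2, kernel_size=3, padding=1))  # 2-channel (u, v) flow
    return nn.Sequential(*layers)

first_net = make_flow_backbone(depth=12, width=128)   # higher model complexity
second_net = make_flow_backbone(depth=6, width=64)    # smaller depth and width
```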
The sample image frame may be an image frame labeled with a dense optical flow or an image frame not labeled with a dense optical flow. In the embodiments of the present application, the adjacent image frame of the sample image frame may be the next image frame adjacent to the sample image frame in the time domain. For example, if three image frames are captured successively at t = 0 ms, t = 10 ms, and t = 20 ms, and the image frame captured at t = 10 ms is selected as the sample image frame, the adjacent image frame may be the image frame captured at t = 20 ms.
In one embodiment, the second neural network may or may not have been trained before this step is performed. Because the model complexity of the second neural network is lower than that of the first neural network, the accuracy of the second dense optical flow output by the second neural network is theoretically lower than that of the first dense optical flow output by the first neural network. If the accuracy of the second dense optical flow were not lower, the second neural network could simply replace the first neural network for optical flow estimation, improving efficiency and reducing occupied computing resources with virtually no impact on the accuracy of the output dense optical flow; the technical problem this application seeks to solve would then not exist, so that case is not discussed here. The following discussion addresses only the case where the accuracy of the second dense optical flow is lower than that of the first dense optical flow.
Further, each of the first neural network and the second neural network may have the network structure shown in FIG. 2, comprising a feature encoding (Encoder) module 201 and a feature decoding (Decoder) module 202. The feature encoding module 201 may include convolution layers and pooling layers, and the feature decoding module 202 may include convolution layers and deconvolution layers. The feature encoding module 201 may be configured to extract image features from the image pair input to the neural network (the target image frame whose optical flow is to be calculated and an adjacent image frame of the target image frame), and the feature decoding module 202 may be configured to calculate a dense optical flow based on the extracted image features.
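A minimal PyTorch sketch of the encoder-decoder split of FIG. 2 follows; the class names, channel sizes, and layer counts are assumptions for illustration, and a practical optical flow network would be considerably deeper.

```python
import torch.nn as nn

class FeatureEncoder(nn.Module):
    """Feature encoding module 201: convolution and pooling layers that
    extract image features from the concatenated input image pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )

    def forward(self, image_pair):   # (B, 6, H, W)
        return self.net(image_pair)  # (B, 64, H/4, W/4)

class FeatureDecoder(nn.Module):
    """Feature decoding module 202: convolution and deconvolution layers
    that compute a dense optical flow from the extracted image features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),  # (u, v) per pixel
        )

    def forward(self, features):
        return self.net(features)    # (B, 2, H, W)

optical_flow_net = nn.Sequential(FeatureEncoder(), FeatureDecoder())
```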
S102, adjusting network parameters of a second neural network based on the deviation between the first dense optical flow and the second dense optical flow.
The first dense optical flow may be one dense optical flow or a plurality of dense optical flows. For example, there may be a plurality of first neural networks, the first dense optical flow being the one dense optical flow output by each of the plurality of first neural networks: if there are a first neural network A and a first neural network B, the first dense optical flow may include the dense optical flow output by first neural network A and the dense optical flow output by first neural network B.
It is also possible that there is only one first neural network, the first dense optical flow being a plurality of dense optical flows output by that first neural network at a plurality of different down-sampling magnifications; for example, the first dense optical flow includes a dense optical flow output by the first neural network at a 64-times down-sampling magnification and a dense optical flow output at a 4-times down-sampling magnification. There may also be a plurality of first neural networks, the first dense optical flow being the dense optical flows that each of them outputs at a plurality of different down-sampling magnifications; for example, with first neural networks A and B, the first dense optical flow may include the dense optical flows output by A at the 64-times and 4-times down-sampling magnifications and the dense optical flows output by B at the 64-times and 4-times down-sampling magnifications. If the first dense optical flow includes a dense optical flow output by the first neural network at a certain down-sampling magnification, the second dense optical flow includes at least a dense optical flow output by the second neural network at that down-sampling magnification, and the network parameters of the second neural network are adjusted by stochastic gradient descent according to the deviation between the two dense optical flows. With this embodiment, the second neural network can be trained at a plurality of different down-sampling magnifications, so that its performance can better approach that of the first neural network.
For convenience of discussion, assume the pixel resolution of the sample image frame is 1920 × 1080. The dense optical flow output by the first or second neural network may then be an optical flow map with a resolution of 1920 × 1080 when output without down-sampling, or with a resolution of (1920/n) × (1080/n) when output at an n-times down-sampling magnification. For example, at a 4-times down-sampling magnification the output is an optical flow map with a resolution of 480 × 270, where one point in the optical flow map may represent the overall motion speed of a 16-pixel region composed of 4 × 4 pixels in the sample image frame.
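The resolution arithmetic can be checked directly; `flow_map_shape` below is a hypothetical helper for illustration only (sizes that do not divide evenly are floored here).

```python
def flow_map_shape(width: int, height: int, n: int) -> tuple:
    """Resolution of a dense optical flow map output at an n-times
    down-sampling magnification; each point of the map then summarizes
    an n x n pixel region of the input frame."""
    return width // n, height // n

print(flow_map_shape(1920, 1080, 1))   # (1920, 1080): no down-sampling
print(flow_map_shape(1920, 1080, 4))   # (480, 270): one point per 4 x 4 = 16 pixels
print(flow_map_shape(1920, 1080, 64))  # (30, 16): coarsest flow map
```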
The input of the optical flow neural network is an image pair, and the output of the optical flow neural network is dense optical flow, so that the optical flow neural network can be regarded as end-to-end mapping from the image pair to the dense optical flow, and the training process of the optical flow neural network can be regarded as a process for making the mapping approximate to a real situation. It can be considered that the better the mapping is to the real situation after training, the better the performance of the optical flow neural network is.
If an optical flow neural network is trained with sample image pairs labeled with dense optical flow (each pair comprising a sample image frame and an adjacent image frame of the sample image frame), then even if the labeled dense optical flow contains errors, the network can in theory accurately learn the mapping relation from the sample image pairs to the labeled dense optical flow; the mapping relation from image pairs other than the sample image pairs to dense optical flow is generalized from this learned mapping and may not be accurate enough. If the generalization capability of the optical flow neural network is strong, the mapping relation obtained by generalization may be closer to the real situation, i.e., the performance of the network is good; conversely, if the generalization capability is poor, the generalized mapping relation differs greatly from the real situation, i.e., the performance of the network is poor.
For convenience of discussion, assume the network structure of the optical flow neural network is as shown in FIG. 2. If the model complexity of the optical flow neural network is higher, the model complexity of its feature encoding module and/or feature decoding module is higher, so the performance of those modules may be higher; the image features extracted from the input image pair may therefore be more accurate, and/or the dense optical flow calculated based on the image features may be more accurate, so the generalization capability of the network tends to be better. Conversely, if the model complexity of the optical flow neural network is low, its generalization capability is often poor.
Therefore, if sample image pairs labeled with dense optical flow are used to train an optical flow neural network of higher model complexity and one of lower model complexity respectively, the latter may, after training, perform worse than the former because of its poorer generalization capability. In this embodiment, because the first neural network is a trained neural network, the first dense optical flow is obtained by mapping the sample image pair through the end-to-end mapping relation the first neural network has learned; that is, the first dense optical flow already embodies the mapping relation learned by the first neural network. With this embodiment, the second neural network can therefore learn the mapping relation learned by the first neural network, drawing on the first neural network's generalization capability, and the performance degradation caused by the second neural network's poorer generalization capability can be effectively avoided, so that the performance of the trained second neural network can approach that of the first neural network. Further, since in this embodiment the second neural network learns from the first neural network the mapping relation from image pairs to dense optical flow, the first neural network may be regarded as a teacher neural network and the second neural network as a student neural network.
Because of its lower model complexity, the second neural network operates more efficiently and may occupy fewer computing resources. In other words, this embodiment improves the performance of the optical flow neural network without increasing its model complexity.
If there is only one first neural network, refer to FIG. 3, which is a schematic flowchart of the deviation calculation method provided by an embodiment of the present application. The method may include:
s301, aiming at each pixel point in the sample image frame, calculating the optical flow value of the pixel point in the first dense optical flow and the difference value of the optical flow value of the pixel point in the second dense optical flow to be used as the optical flow difference value of the pixel point.
Assume that the optical flow value of a pixel point (i, j) of the sample image frame is $F^{(1)}_{(i,j)}$ in the first dense optical flow and $F^{(2)}_{(i,j)}$ in the second dense optical flow. The optical flow difference value of the pixel point (i, j) can then be calculated by the following formula:

$L_{(i,j)} = \left\| F^{(1)}_{(i,j)} - F^{(2)}_{(i,j)} \right\|_2$

where $L_{(i,j)}$ is the optical flow difference value of the pixel point (i, j). In other alternative embodiments, the optical flow difference value of a pixel point may also be calculated with a distance formula other than the Euclidean distance.
S302, accumulating the optical flow difference values of all pixel points in the sample image frame to obtain the deviation between the first dense optical flow and the second dense optical flow.
For example, assuming that the pixel resolution of the sample image frame is 1920 × 1080 and that pixel point (i, j) denotes the j-th pixel point of the i-th pixel row in the sample image frame, the deviation between the first dense optical flow and the second dense optical flow may be calculated according to the following formula:

$L = \sum_{i=1}^{1080} \sum_{j=1}^{1920} L_{(i,j)}$

where $L$ is the deviation between the first dense optical flow and the second dense optical flow. With this embodiment, since the deviation is calculated from the optical flow difference value of every pixel point, it can comprehensively reflect the difference between the first dense optical flow and the second dense optical flow.
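Steps S301 and S302 amount to summing the per-pixel Euclidean distances between the two flow fields. The sketch below assumes each dense optical flow is a tensor whose (u, v) channels lie on dimension -3, so both (2, H, W) and batched (B, 2, H, W) layouts work; this layout is an assumption, not part of the embodiment.

```python
import torch

def flow_deviation(first_flow: torch.Tensor, second_flow: torch.Tensor) -> torch.Tensor:
    """Deviation between two dense optical flows per S301-S302.

    S301: Euclidean distance between the optical flow values of each
          pixel point in the first and second dense optical flows.
    S302: accumulate the optical flow difference values of all pixel
          points in the sample image frame.
    """
    per_pixel = torch.linalg.vector_norm(first_flow - second_flow, dim=-3)  # (..., H, W)
    return per_pixel.sum()
```

As the embodiment notes, swapping the Euclidean distance for another distance formula only changes the per_pixel line.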
If there are a plurality of first neural networks, refer to FIG. 4, which is a schematic flowchart of a deviation calculation method provided by an embodiment of the present application. The method may include:
s401, carrying out weighted average on the first dense optical flows output by the first neural networks to obtain an average dense optical flow.
Assuming that there are n first neural networks whose output first dense optical flows are $F_1, F_2, \ldots, F_n$ respectively, the average dense optical flow can be calculated as follows:

$F_{ave} = \sum_{i=1}^{n} a_i F_i$

where $F_{ave}$ is the average dense optical flow and $a_i$ is the normalized weighting coefficient of the first dense optical flow output by the i-th first neural network. The specific value of $a_i$ may be positively correlated with the performance of the i-th first neural network: if the i-th first neural network performs better than the other first neural networks, $a_i$ may take a larger value; if it performs worse, $a_i$ may take a smaller value.
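A sketch of this weighted averaging under the same tensor-layout assumption; the weights are re-normalized defensively in case the caller passes raw performance scores rather than coefficients that already sum to one.

```python
import torch

def average_dense_flow(first_flows, weights):
    """F_ave = sum_i a_i * F_i, where a_i is the normalized weighting
    coefficient of the i-th first neural network's dense optical flow,
    larger for better-performing networks."""
    a = torch.tensor(weights, dtype=torch.float32)
    a = a / a.sum()                           # enforce sum(a_i) = 1
    stacked = torch.stack(list(first_flows))  # (n, 2, H, W)
    return (a.view(-1, 1, 1, 1) * stacked).sum(dim=0)
```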
S402, aiming at each pixel point in the sample image frame, calculating the difference value between the optical flow value of the pixel point in the average dense optical flow and the optical flow value of the pixel point in the second dense optical flow to be used as the optical flow difference value of the pixel point.
The average dense optical flow is obtained by weighted averaging of a plurality of first dense optical flows, and the optical flow values of each pixel point in the sample image frame are also included in the average dense optical flow, so the step can refer to the related calculation in S301, and details are not repeated here.
And S403, accumulating the optical flow difference values of all pixel points in the sample image frame to obtain the deviation between the average dense optical flow and the second dense optical flow as the deviation between the first dense optical flow and the second dense optical flow.
The mapping relation learned by the single first neural network may have a certain error, the average dense optical flow can reflect the average mapping relation obtained by weighting the mapping relations learned by the multiple first neural networks, and the average mapping relation is more accurate and has stronger generalization capability compared with the mapping relation learned by the single first neural network due to the integration of the mapping relations learned by the multiple first neural networks. Therefore, the performance of the second neural network trained by the embodiment can be further improved.
Referring to FIG. 5a, FIG. 5a is a schematic structural diagram of a training framework provided in an embodiment of the present application. The framework includes a first neural network 510 and a second neural network 520, where the first neural network 510 includes a first feature encoding module 511, a first feature decoding module 512, and a second feature decoding module 513, and the second neural network 520 includes a second feature encoding module 521, a third feature decoding module 522, and a fourth feature decoding module 523. The first feature encoding module 511 and the second feature encoding module 521 are configured to extract image features from the input image pair; the first feature decoding module 512 and the third feature decoding module 522 are configured to calculate a dense optical flow at a 64-times down-sampling magnification (other magnifications are possible in other embodiments) based on the extracted image features; and the second feature decoding module 513 and the fourth feature decoding module 523 are configured to calculate the dense optical flow of the input image pair based on the calculated dense optical flow at the 64-times down-sampling magnification and the image features.
When training the second neural network, the sample image pair may be input to the first feature encoding module 511 and the second feature encoding module 521 respectively, and the network parameters of the second neural network may be adjusted based on the deviation between the dense optical flow at the 64-times down-sampling magnification output by the first feature decoding module 512 and that output by the third feature decoding module 522, and the deviation between the dense optical flow output by the second feature decoding module 513 and that output by the fourth feature decoding module 523.
Referring to FIG. 5b, FIG. 5b is another schematic structural diagram of the training framework provided in an embodiment of the present application. When adjusting the network parameters of the second neural network, in addition to the deviation between the dense optical flow at the 64-times down-sampling magnification output by the first feature decoding module 512 and that output by the third feature decoding module 522, and the deviation between the dense optical flow output by the second feature decoding module 513 and that output by the fourth feature decoding module 523, the adjustment is also based on the loss between the dense optical flow at the 64-times down-sampling magnification output by the third feature decoding module 522 and a first truth value, and the loss between the dense optical flow output by the fourth feature decoding module 523 and a second truth value. The first truth value is the dense optical flow of the input image pair at the 64-times down-sampling magnification, and the second truth value is the dense optical flow of the input image pair without down-sampling. With this embodiment, the second neural network can be trained using both the first neural network and labeled truth values, further improving its performance.
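One parameter update under the FIG. 5b objective might look like the sketch below, reusing flow_deviation from the earlier sketch. The two-scale network interfaces (teacher_net for the first neural network, student_net for the second), the learning rate, and the equal weighting of the four terms are assumptions; stochastic gradient descent itself is the method the embodiment names. The embodiment also does not fix the form of the loss against the truth values, so the same deviation is reused for it here.

```python
import torch

# Assumed interfaces: teacher_net (the pre-trained first neural network) and
# student_net (the second neural network) each map an image pair tensor to a
# pair (flow at 64-times down-sampling, full-resolution flow).
optimizer = torch.optim.SGD(student_net.parameters(), lr=1e-4)

def train_step(image_pair, truth_64x, truth_full):
    with torch.no_grad():              # the first neural network is not updated
        t64, tfull = teacher_net(image_pair)
    s64, sfull = student_net(image_pair)

    # deviations between the first and second dense optical flows (FIG. 5a)
    distill = flow_deviation(t64, s64) + flow_deviation(tfull, sfull)
    # losses against the first and second truth values (added in FIG. 5b)
    supervised = flow_deviation(s64, truth_64x) + flow_deviation(sfull, truth_full)

    loss = distill + supervised
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```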
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of an optical flow neural network training device provided in an embodiment of the present application, which may include:
the sample input module 601 is configured to input a sample image frame and an adjacent image frame of the sample image frame into a first neural network and a second neural network to obtain a first dense optical flow output by the first neural network and a second dense optical flow output by the second neural network, where the first neural network is an optical flow neural network trained in advance, and the second neural network is an optical flow neural network whose model complexity is lower than that of the first neural network;
a parameter adjustment module 602 to adjust network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow.
Further, the parameter adjusting module 602 is further configured to calculate, for each pixel point in the sample image frame, a difference between an optical flow value of the pixel point in the first dense optical flow and an optical flow value of the pixel point in the second dense optical flow, as an optical flow difference value of the pixel point;
and accumulating the optical flow difference values of all pixel points in the sample image frame to obtain the deviation between the first dense optical flow and the second dense optical flow.
Further, the parameter adjusting module 602 is specifically configured to adjust the network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow and a loss between the second dense optical flow and the labeled dense optical flow of the sample image frame.
Further, the first dense optical flow is a plurality of dense optical flows output by the first neural network at a plurality of different down-sampling magnifications;
the second dense optical flow is a plurality of dense optical flows output by the second neural network at a plurality of different down-sampling magnifications;
the parameter adjusting module 602 is specifically configured to, for each of a plurality of different down-sampling magnifications, adjust network parameters of the second neural network based on a deviation between a dense optical flow output at the down-sampling magnification in the first dense optical flow and a dense optical flow output at the down-sampling magnification in the second dense optical flow.
Further, the first neural network is a plurality of optical flow neural networks trained in advance;
a parameter adjusting module 602, specifically configured to perform weighted average on the first dense optical flows output by the multiple first neural networks, so as to obtain an average dense optical flow;
network parameters of a second neural network are adjusted based on a deviation between the average dense optical flow and the second dense optical flow.
Further, the depth of the second neural network is smaller than that of the first neural network, and the width of the second neural network is smaller than that of the first neural network.
An embodiment of the present application further provides an electronic device, as shown in FIG. 7, including a memory 701 configured to store a computer program;
the processor 702 is configured to implement the following steps when executing the program stored in the memory 701:
inputting a sample image frame and an adjacent image frame of the sample image frame into a first neural network and a second neural network to obtain a first dense optical flow output by the first neural network and a second dense optical flow output by the second neural network, wherein the first neural network is an optical flow neural network trained in advance, and the second neural network is an optical flow neural network with a model complexity lower than that of the first neural network;
network parameters of a second neural network are adjusted based on a deviation between the first dense optical flow and the second dense optical flow.
Further, the deviation between the first dense optical flow and the second dense optical flow is obtained by:
for each pixel point in the sample image frame, calculating a difference value between the optical flow value of the pixel point in the first dense optical flow and the optical flow value of the pixel point in the second dense optical flow as the optical flow difference value of the pixel point;
and accumulating the optical flow difference values of all pixel points in the sample image frame to obtain the deviation between the first dense optical flow and the second dense optical flow.
Further, adjusting network parameters of a second neural network based on a deviation between the first dense optical flow and the second dense optical flow, comprising:
network parameters of the second neural network are adjusted based on a deviation between the first dense optical flow and the second dense optical flow, and a loss between the second dense optical flow and the labeled dense optical flow of the sample image frame.
Further, the first dense optical flow is a plurality of dense optical flows output by the first neural network at a plurality of different down-sampling magnifications;
the second dense optical flow is a plurality of dense optical flows output by the second neural network at the plurality of different down-sampling magnifications;
adjusting network parameters of a second neural network based on a deviation between the first dense optical flow and the second dense optical flow, comprising:
for each of a plurality of different down-sampling magnifications, network parameters of a second neural network are adjusted based on a deviation between a dense optical flow output at the down-sampling magnification in the first dense optical flow and a dense optical flow output at the down-sampling magnification in the second dense optical flow.
Further, the first neural network is a plurality of optical flow neural networks trained in advance;
adjusting network parameters of a second neural network based on a deviation between the first dense optical flow and the second dense optical flow, comprising:
carrying out weighted average on the first dense optical flows output by the first neural networks to obtain an average dense optical flow;
network parameters of a second neural network are adjusted based on a deviation between the average dense optical flow and the second dense optical flow.
Further, the depth of the second neural network is smaller than that of the first neural network, and the width of the second neural network is smaller than that of the first neural network.
The memory mentioned in the above electronic device may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment provided by the present application, a computer-readable storage medium is further provided, which stores instructions that, when executed on a computer, cause the computer to execute any of the optical flow neural network training methods in the above embodiments.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the optical flow neural network training methods of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on differences from other embodiments. In particular, for the embodiments of the apparatus, the electronic device, the computer-readable storage medium, and the computer program product, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.
Claims (10)
1. An optical flow neural network training method, the method comprising:
inputting a sample image frame and an adjacent image frame of the sample image frame into a first neural network and a second neural network to obtain a first dense optical flow output by the first neural network and a second dense optical flow output by the second neural network, wherein the first neural network is a pre-trained optical flow neural network, and the second neural network is an optical flow neural network with a model complexity lower than that of the first neural network;
adjusting network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow;
the first dense optical flow is a plurality of dense optical flows output by the first neural network at a plurality of different down-sampling magnifications;
the second dense optical flow is a plurality of dense optical flows output by the second neural network at the plurality of different down-sampling magnifications;
the adjusting network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow comprises:
for each of the plurality of different down-sampling magnifications, adjusting network parameters of the second neural network based on a deviation between a dense optical flow output at the down-sampling magnification in the first dense optical flow and a dense optical flow output at the down-sampling magnification in the second dense optical flow.
2. The method of claim 1, wherein prior to the adjusting network parameters of the second neural network based on the deviation between the first dense optical flow and the second dense optical flow, the method further comprises:
for each pixel point in the sample image frame, calculating a difference value between the optical flow value of the pixel point in the first dense optical flow and the optical flow value of the pixel point in the second dense optical flow as the optical flow difference value of the pixel point;
and accumulating the optical flow difference values of all pixel points in the sample image frame to obtain the deviation between the first dense optical flow and the second dense optical flow.
3. The method of claim 1, wherein the adjusting network parameters of the second neural network based on the deviation between the first dense optical flow and the second dense optical flow comprises:
adjusting network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow and a loss between the second dense optical flow and the labeled dense optical flow of the sample image frame.
4. The method of claim 1, wherein the first neural network is a plurality of pre-trained optical flow neural networks;
the adjusting network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow comprises:
performing weighted average on the first dense optical flows output by the first neural networks to obtain an average dense optical flow;
adjusting network parameters of the second neural network based on a deviation between the average dense optical flow and the second dense optical flow.
5. The method of claim 1, wherein the depth of the second neural network is less than the depth of the first neural network, and wherein the width of the second neural network is less than the width of the first neural network.
6. An optical flow neural network training apparatus, the apparatus comprising:
the device comprises a sample input module, a first neural network and a second neural network, wherein the sample image frame and adjacent image frames of the sample image frame are input into the first neural network and the second neural network to obtain a first dense optical flow output by the first neural network and a second dense optical flow output by the second neural network, the first neural network is a pre-trained optical flow neural network, and the second neural network is an optical flow neural network with a model complexity lower than that of the first neural network;
a parameter adjustment module to adjust network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow;
the first dense optical flow is a plurality of dense optical flows output by the first neural network at a plurality of different down-sampling magnifications;
the second dense optical flow is a plurality of dense optical flows output by the second neural network at the plurality of different down-sampling magnifications;
the parameter adjusting module is specifically configured to, for each of the plurality of different down-sampling magnifications, adjust network parameters of the second neural network based on a deviation between a dense optical flow output at the down-sampling magnification in the first dense optical flow and a dense optical flow output at the down-sampling magnification in the second dense optical flow.
7. The apparatus of claim 6, wherein the parameter adjustment module is further configured to, before the adjusting the network parameters of the second neural network based on the deviation between the first dense optical flow and the second dense optical flow, calculate, for each pixel point in the sample image frame, a difference between an optical flow value of the pixel point in the first dense optical flow and an optical flow value of the pixel point in the second dense optical flow as an optical flow difference value for the pixel point;
and accumulating the optical flow difference values of all pixel points in the sample image frame to obtain the deviation between the first dense optical flow and the second dense optical flow.
8. The apparatus of claim 6, wherein the parameter adjustment module is specifically configured to adjust the network parameters of the second neural network based on a deviation between the first dense optical flow and the second dense optical flow and a loss between the second dense optical flow and the labeled dense optical flow of the sample image frame.
9. The apparatus of claim 6, wherein the first neural network is a plurality of pre-trained optical flow neural networks;
the parameter adjusting module is specifically configured to perform weighted averaging on the first dense optical flows output by the plurality of first neural networks to obtain an average dense optical flow;
adjusting network parameters of the second neural network based on a deviation between the average dense optical flow and the second dense optical flow.
10. The apparatus of claim 6, wherein a depth of the second neural network is less than a depth of the first neural network, and wherein a width of the second neural network is less than a width of the first neural network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811518482.4A CN111311646B (en) | 2018-12-12 | 2018-12-12 | Optical flow neural network training method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811518482.4A CN111311646B (en) | 2018-12-12 | 2018-12-12 | Optical flow neural network training method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111311646A CN111311646A (en) | 2020-06-19 |
CN111311646B true CN111311646B (en) | 2023-04-07 |
Family
ID=71148841
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811518482.4A Active CN111311646B (en) | 2018-12-12 | 2018-12-12 | Optical flow neural network training method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111311646B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1363005A2 (en) * | 2002-05-15 | 2003-11-19 | Caterpillar Inc. | Engine control system using a cascaded neural network |
CN107944375A (en) * | 2017-11-20 | 2018-04-20 | 北京奇虎科技有限公司 | Automatic Pilot processing method and processing device based on scene cut, computing device |
CN107944376A (en) * | 2017-11-20 | 2018-04-20 | 北京奇虎科技有限公司 | The recognition methods of video data real-time attitude and device, computing device |
CN107958284A (en) * | 2017-11-20 | 2018-04-24 | 北京奇虎科技有限公司 | The training method and device of neutral net, computing device |
CN107993255A (en) * | 2017-11-29 | 2018-05-04 | 哈尔滨工程大学 | A kind of dense optical flow method of estimation based on convolutional neural networks |
CN108229673A (en) * | 2016-12-27 | 2018-06-29 | 北京市商汤科技开发有限公司 | Processing method, device and the electronic equipment of convolutional neural networks |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106778867B (en) * | 2016-12-15 | 2020-07-07 | 北京旷视科技有限公司 | Target detection method and device, and neural network training method and device |
-
2018
- 2018-12-12 CN CN201811518482.4A patent/CN111311646B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1363005A2 (en) * | 2002-05-15 | 2003-11-19 | Caterpillar Inc. | Engine control system using a cascaded neural network |
CN108229673A (en) * | 2016-12-27 | 2018-06-29 | 北京市商汤科技开发有限公司 | Processing method, device and the electronic equipment of convolutional neural networks |
CN107944375A (en) * | 2017-11-20 | 2018-04-20 | 北京奇虎科技有限公司 | Automatic Pilot processing method and processing device based on scene cut, computing device |
CN107944376A (en) * | 2017-11-20 | 2018-04-20 | 北京奇虎科技有限公司 | The recognition methods of video data real-time attitude and device, computing device |
CN107958284A (en) * | 2017-11-20 | 2018-04-24 | 北京奇虎科技有限公司 | The training method and device of neutral net, computing device |
CN107993255A (en) * | 2017-11-29 | 2018-05-04 | 哈尔滨工程大学 | A kind of dense optical flow method of estimation based on convolutional neural networks |
Also Published As
Publication number | Publication date |
---|---|
CN111311646A (en) | 2020-06-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111402130B (en) | Data processing method and data processing device | |
US11587356B2 (en) | Method and device for age estimation | |
US11610289B2 (en) | Image processing method and apparatus, storage medium, and terminal | |
WO2020228522A1 (en) | Target tracking method and apparatus, storage medium and electronic device | |
CN112446380A (en) | Image processing method and device | |
CN113076685B (en) | Training method of image reconstruction model, image reconstruction method and device thereof | |
CN112241976A (en) | Method and device for training model | |
CN111080699B (en) | Monocular vision odometer method and system based on deep learning | |
TWI770432B (en) | Method, device and electronic apparatus for image restoration and storage medium thereof | |
US11704563B2 (en) | Classifying time series image data | |
CN112348828A (en) | Example segmentation method and device based on neural network and storage medium | |
US20140279771A1 (en) | Novel Quadratic Regularization For Neural Network With Skip-Layer Connections | |
CN112749726B (en) | Training method and device for target detection model, computer equipment and storage medium | |
WO2022062345A1 (en) | Zoom tracking method and apparatus, device and storage medium | |
WO2024002211A1 (en) | Image processing method and related apparatus | |
CN111553477A (en) | Image processing method, device and storage medium | |
CN111079507A (en) | Behavior recognition method and device, computer device and readable storage medium | |
Panda et al. | Kernel density estimation and correntropy based background modeling and camera model parameter estimation for underwater video object detection | |
CN114202648B (en) | Text image correction method, training device, electronic equipment and medium | |
Niu et al. | Boundary-aware RGBD salient object detection with cross-modal feature sampling | |
WO2022221205A1 (en) | Video super-resolution using deep neural networks | |
Won et al. | Learning depth from focus in the wild | |
JP2022537542A (en) | Dynamic image resolution evaluation | |
CN111311646B (en) | Optical flow neural network training method and device | |
WO2020093996A1 (en) | Deep neural network operation method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |