WO2023116117A1 - Training method and apparatus for optical flow estimation model - Google Patents

Training method and apparatus for optical flow estimation model Download PDF

Info

Publication number
WO2023116117A1
WO2023116117A1 (PCT/CN2022/123230)
Authority
WO
WIPO (PCT)
Prior art keywords
optical flow
sample image
flow estimation
estimation model
image
Prior art date
Application number
PCT/CN2022/123230
Other languages
French (fr)
Chinese (zh)
Inventor
于雷
隋伟
张骞
黄畅
Original Assignee
北京地平线信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京地平线信息技术有限公司
Publication of WO2023116117A1 publication Critical patent/WO2023116117A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/11: Region-based segmentation
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20112: Image segmentation details
    • G06T 2207/20132: Image cropping

Definitions

  • The present disclosure relates to the technical fields of image processing and artificial intelligence (AI), and in particular to a training method and apparatus for an optical flow estimation model.
  • Dense optical flow estimation calculates the offset of every point on an image to form a dense optical flow field; based on the dense optical flow field, pixel-level image registration can then be performed.
  • Dense optical flow estimation has a wide range of applications in the fields of autonomous driving and autonomous robots. In recent years, with the development of deep learning technology, dense optical flow estimation based on deep learning has achieved good results. Supervised optical flow estimation methods based on deep learning usually require a large number of labels for model training, but optical flow labels for real scenes are very difficult to obtain, and models trained on virtual data often generalize poorly to real scenes.
  • Therefore, self-supervised methods are generally used for model training, in which a photometric error loss function is used to train the optical flow estimation model. However, the accuracy of this self-supervised model training method is low.
  • Embodiments of the present disclosure provide a method and device for training an optical flow estimation model.
  • a method for training an optical flow estimation model including:
  • Semantic segmentation is performed on the first sample image and the second sample image to obtain a first semantic segmentation result and a second semantic segmentation result respectively;
  • a training device for an optical flow estimation model including:
  • a semantic segmentation module configured to perform semantic segmentation on the first sample image and the second sample image, to obtain the first semantic segmentation result and the second semantic segmentation result respectively;
  • a static area determining module configured to determine a first static area in the first sample image and a second static area in the second sample image based on the first semantic segmentation result and the second semantic segmentation result;
  • a mapping relationship determination module configured to determine the pixel point mapping relationship between the first sample image and the second sample image based on the inter-frame pose information of the first sample image and the second sample image and the point cloud data of the first sample image;
  • a first optical flow determination module configured to determine a first optical flow between the first static area and the second static area based on the pixel point mapping relationship
  • the constraint training module is configured to constrain the training of the first optical flow estimation model based on the first optical flow.
  • a computer-readable storage medium storing a computer program, the computer program being used to execute the optical flow estimation model training method described in the first aspect above.
  • an electronic device includes:
  • the processor is configured to read the executable instructions from the memory, and execute the instructions to implement the optical flow estimation model training method described in the first aspect above.
  • In the embodiments of the present disclosure, semantic segmentation is performed on the first sample image and the second sample image to obtain the first semantic segmentation result and the second semantic segmentation result respectively; the first static region in the first sample image and the second static region in the second sample image are determined based on these results; the pixel point mapping relationship between the first sample image and the second sample image is determined based on their inter-frame pose information and the point cloud data of the first sample image; the first optical flow between the first static area and the second static area is determined based on the pixel point mapping relationship; and the training of the first optical flow estimation model is constrained based on the first optical flow.
  • Since the first optical flow is obtained from point cloud data, it is a true value of optical flow, and this true value is used as supervision information during model training, so that the trained first optical flow estimation model significantly outperforms self-supervised methods in accuracy.
  • In self-supervised training, spurious optical flow is generated by the movement of shadows during the optical flow estimation process. Since true optical flow values are available for the image areas covered by shadows here, the influence of shadows on the optical flow estimation model can be reduced and the model guided to learn correctly.
  • FIG. 1 is a schematic flowchart of a training method of an optical flow estimation model according to an embodiment of the present disclosure
  • FIG. 2 is a schematic flow diagram of step S3 in an embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart of determining a third optical flow in an embodiment of the present disclosure
  • FIG. 4 is a schematic flow diagram of step D in an embodiment of the present disclosure.
  • FIG. 5 is a schematic flow diagram of training a second optical flow estimation model in an embodiment of the present disclosure
  • FIG. 6 is a structural block diagram of a training device for an optical flow estimation model according to an embodiment of the present disclosure
  • FIG. 7 is a structural block diagram of a mapping relationship determination module 300 according to an embodiment of the present disclosure.
  • Fig. 8 is a structural block diagram of a training device for an optical flow estimation model in another embodiment of the present disclosure.
  • FIG. 9 is a structural block diagram of a third optical flow determination module 1100 in an embodiment of the present disclosure.
  • Fig. 10 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
  • "plural" may refer to two or more than two, and "at least one" may refer to one, two, or more than two.
  • The term "and/or" in the present disclosure merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, A and B exist at the same time, or B exists alone.
  • the character "/" in the present disclosure generally indicates that the contextual objects are an "or" relationship.
  • Embodiments of the present disclosure may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the foregoing.
  • Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by the computer system.
  • program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computing system storage media including storage devices.
  • FIG. 1 is a schematic flowchart of a training method for an optical flow estimation model according to an embodiment of the present disclosure. This embodiment can be applied to electronic equipment, as shown in Figure 1, including the following steps:
  • S1 Perform semantic segmentation on the first sample image and the second sample image to obtain a first semantic segmentation result and a second semantic segmentation result respectively.
  • For example, the vehicle-mounted camera device captures video of the scene in front of the vehicle, and the first sample image and the second sample image, at an interval of N frames, are obtained from the video collected by the vehicle-mounted camera device.
  • N is an integer greater than or equal to 1.
  • Alternatively, the vehicle-mounted camera device captures a frame of image at preset time intervals, and the first sample image and the second sample image are obtained from the images captured by the vehicle-mounted camera device.
  • After obtaining the first sample image and the second sample image, a pre-trained semantic segmentation model is used to perform semantic segmentation on the first sample image to obtain the first semantic segmentation result, and on the second sample image to obtain the second semantic segmentation result.
  • the first semantic segmentation result may include sky area, road area, pedestrian area, vehicle area and other image areas appearing in the first sample image.
  • the second semantic segmentation result may include sky area, road area, pedestrian area, vehicle area and other image areas appearing in the second sample image.
  • S2 Determine the first static area in the first sample image according to the first semantic segmentation result, for example, static areas such as the road surface area, mountain peak area, and roadside fixed objects (such as utility poles and traffic lights) in the first sample image.
  • Similarly, determine the second static area in the second sample image according to the second semantic segmentation result, such as static areas of the road surface area, mountain peak area, and roadside fixed objects (such as utility poles and traffic lights) in the second sample image.
  • S3 Determine the pixel point mapping relationship between the first sample image and the second sample image based on the pose information between the frames of the first sample image and the second sample image and the point cloud data of the first sample image.
  • The point cloud data of the first sample image has a mapping relationship with the pixel points of the target object in the first sample image. Based on the inter-frame pose information of the first sample image and the second sample image, the pixel displacement of the target object between the two images can be calculated. Therefore, based on the inter-frame pose information and the point cloud data of the first sample image, the pixel point mapping relationship between the first sample image and the second sample image can be determined.
  • S4 Determine a first optical flow between the first static area and the second static area based on the pixel point mapping relationship.
  • For example, a static sub-area is extracted from the first static area (such as the area of utility pole A), and the corresponding static sub-area (i.e., the area of utility pole A) is extracted from the second static area; combined with the pixel point mapping relationship between the first sample image and the second sample image, the true value of the optical flow of this static sub-region between the two images can be obtained. The true value of the optical flow is determined in the same manner for all static sub-regions of the first static region and the corresponding static sub-regions of the second static region.
  • the first sample image and the second sample image are combined into a sample image pair, and the first optical flow is used as supervision information when the first optical flow estimation model is trained based on the sample image pair. It should be noted that when training the first optical flow estimation model, multiple sample image pairs are required, and for each sample image pair, the corresponding first optical flow is obtained in the same manner as steps S1 to S4.
  • the existing optical flow estimation model training method can be used for training.
  • When the training of the first optical flow estimation model meets a preset termination condition (for example, the number of model iterations reaches a predetermined number, or the model prediction accuracy exceeds a preset threshold), the training is terminated to obtain the trained first optical flow estimation model.
  • step S3 includes:
  • S3-1 Project the point cloud data of the first sample image to the first sample image and the second sample image respectively based on the inter-frame pose information of the first sample image and the second sample image.
  • The pose information of the camera device when shooting the first sample image and when shooting the second sample image is obtained through a pose sensor (such as a gyroscope), from which the inter-frame pose information of the first sample image and the second sample image is obtained.
  • the point cloud data of the first sample image can be projected onto the first sample image.
  • Based on the inter-frame pose information, the displacement relationship of pixels between the first sample image and the second sample image can be obtained, and the point cloud data of the first sample image can then be projected onto the second sample image.
  • S3-2 Based on the projection point positions of the point cloud data of the first sample image on the first sample image and on the second sample image, determine the pixel point mapping relationship between the first sample image and the second sample image.
  • That is, the mapping of the target object's projection points between the first sample image and the second sample image is obtained, and by calculating the position correspondence of these projection points, the pixel point mapping relationship between the two images is determined. In this way, by projecting the point cloud data of the first sample image onto the first sample image and the second sample image respectively, and using the position correspondence of the projected points between the two images, the pixel point mapping relationship between the first sample image and the second sample image can be accurately obtained.
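The projection in steps S3-1 and S3-2 can be sketched as below. This is a minimal illustration, assuming a pinhole camera model with intrinsics `K` and an inter-frame pose given as rotation `R` and translation `t`; the function names and the camera model are illustrative assumptions, not taken from the patent text.

```python
import numpy as np

def project_points(points_cam, K):
    """Project 3D points in camera coordinates onto the image plane (pinhole model)."""
    uvw = (K @ points_cam.T).T          # (N, 3) homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]     # (N, 2) pixel coordinates

def static_flow_from_pose(points_cam1, K, R, t):
    """Project the first frame's point cloud into both frames using the
    inter-frame pose (R, t); the difference of the projection positions is
    the ground-truth optical flow of static points."""
    uv1 = project_points(points_cam1, K)      # projections on the first sample image
    points_cam2 = (R @ points_cam1.T).T + t   # transform points into the second frame
    uv2 = project_points(points_cam2, K)      # projections on the second sample image
    return uv2 - uv1                          # per-point optical flow (true value)

# usage: identity rotation and a pure sideways camera translation
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
pts = np.array([[0., 0., 10.], [1., 1., 5.]])   # two static 3D points
flow = static_flow_from_pose(pts, K, np.eye(3), np.array([0.1, 0., 0.]))
```

Note how the nearer point (depth 5) receives a larger flow than the farther point (depth 10) for the same camera motion, which is exactly the parallax that the point cloud makes recoverable.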
  • In an embodiment, before step S5 (preferably before step S2), the method further includes: determining a first target noise region based on the first semantic segmentation result; determining a second target noise region based on the second semantic segmentation result; and setting the second optical flow between the first target noise area and the second target noise area to 0.
  • step S5 includes: constraining the training of the first optical flow estimation model based on the first optical flow and the second optical flow.
  • The first target noise area may be the sky area in the first sample image, and the second target noise area may be the sky area in the second sample image. Since the optical flow of the sky is meaningless in practical applications, setting the optical flow of the sky area to 0 when training the first optical flow estimation model reduces the influence of image noise on the model and produces clearer boundaries and better visualization.
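The noise-region step above amounts to masking the supervision flow with the segmentation outputs. A minimal sketch, assuming a hypothetical class id `SKY` for the sky label (the actual label set of the segmentation model is not specified in the patent):

```python
import numpy as np

SKY = 2  # hypothetical class id for "sky" in the segmentation output

def zero_noise_region_flow(flow, seg1, seg2, noise_class=SKY):
    """Set the supervision (second) optical flow to 0 wherever both frames'
    semantic segmentations mark the noise class (e.g. the sky area)."""
    noise_mask = (seg1 == noise_class) & (seg2 == noise_class)
    out = flow.copy()
    out[noise_mask] = 0.0   # zero both flow components at noise positions
    return out, noise_mask

# usage: a 2x2 image where only the top-left pixel is sky in both frames
seg1 = np.array([[2, 2], [0, 1]])
seg2 = np.array([[2, 0], [0, 1]])
flow = np.ones((2, 2, 2))
flow2, mask = zero_noise_region_flow(flow, seg1, seg2)
```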
  • Fig. 3 is a schematic flowchart of determining a third optical flow in an embodiment of the present disclosure. As shown in Figure 3, before step S5, it also includes:
  • According to a preset cropping method, the first sample image and the second sample image are respectively cropped to obtain a first cropped image and a second cropped image.
  • The first cropped image and the second cropped image have the same size, and the position area of the first cropped image within the first sample image corresponds to the position area of the second cropped image within the second sample image.
  • an initial first optical flow estimation model may be used to perform optical flow estimation on the first cropped image and the second cropped image to obtain a first optical flow estimation result.
  • step S5 includes: constraining the training of the first optical flow estimation model based on the first optical flow, the second optical flow and the third optical flow.
  • The first optical flow estimation result is obtained by using the first optical flow estimation model to predict the cropped image pair, while the second optical flow estimation result is obtained by processing the original sample image pair with the pre-trained second optical flow estimation model. Comparing the first optical flow estimation result with the second optical flow estimation result determines the third optical flow, which serves as supervision information for the first optical flow estimation model; constraining the first optical flow estimation model based on the third optical flow can effectively improve its prediction accuracy after training.
  • step D includes:
  • D-1 Based on the first optical flow estimation result, determine an occlusion area between the first cropped image and the second cropped image.
  • The forward optical flow and backward optical flow of the first cropped image and the second cropped image are obtained from the first optical flow estimation result, and verification is performed based on the forward optical flow and backward optical flow, so as to determine the occluded area between the first cropped image and the second cropped image.
  • D-2 Determine a non-occluded area between the first sample image and the second sample image based on the second optical flow estimation result.
  • The forward optical flow and backward optical flow of the first sample image and the second sample image are obtained from the second optical flow estimation result, and verification is performed based on the forward optical flow and backward optical flow, so as to determine the non-occluded area between the first sample image and the second sample image.
  • D-3 Determine the target area between the first sample image and the second sample image based on the occluded area between the first cropped image and the second cropped image and the non-occluded area between the first sample image and the second sample image.
  • That is, the target area is the area that is occluded between the first cropped image and the second cropped image but non-occluded between the first sample image and the second sample image.
  • D-4 Determining the optical flow of the target area between the first sample image and the second sample image in the second optical flow estimation result as the third optical flow.
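Steps D-3 and D-4 reduce to a boolean combination of the two masks followed by a masked copy of the second estimation result. A minimal sketch (mask and flow shapes are illustrative assumptions):

```python
import numpy as np

def third_flow(occluded_crop, non_occluded_full, flow_full):
    """D-3: the target area is where the cropped-pair prediction sees occlusion
    but the full-image pair does not. D-4: the third optical flow is the
    full-image (second) estimate restricted to that target area."""
    target = occluded_crop & non_occluded_full
    flow3 = np.where(target[..., None], flow_full, 0.0)  # keep flow only in target area
    return flow3, target

# usage on a 2x2 example
occ = np.array([[True, False], [True, True]])       # occluded in cropped prediction
nonocc = np.array([[True, True], [False, True]])    # non-occluded in full prediction
flow_full = np.full((2, 2, 2), 3.0)                 # second estimation result
flow3, target = third_flow(occ, nonocc, flow_full)
```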
  • The cropped image removes part of the image area relative to the original sample image. Since the first optical flow estimation result is obtained by using the as-yet-untrained first optical flow estimation model on the cropped images, while the second optical flow estimation result is obtained by using the pre-trained second optical flow estimation model on the full sample images, the second optical flow estimation result shall prevail for the target area. That is, obtaining the third optical flow for the target area from the second optical flow estimation result, and using it as supervision information for the first optical flow estimation model during model training, can effectively improve the prediction accuracy of the trained first optical flow estimation model.
  • the above-mentioned preset cropping mode is cropping by removing an outer frame. That is, the first cropped image removes the image range of the preset outer frame size relative to the first sample image, and the second cropped image removes the image range of the preset outer frame size relative to the second sample image.
  • That is, the cropped image removes an outer frame region relative to the original sample image. Performing optical flow estimation on the cropped image can thus effectively avoid pixels being misjudged as occluded merely because they move beyond the image acquisition range.
  • Accordingly, the second optical flow estimation result shall prevail: the third optical flow for the target area is obtained from the second optical flow estimation result and used as supervision information for the first optical flow estimation model during training, which effectively improves the prediction accuracy of the trained first optical flow estimation model.
  • FIG. 5 is a schematic flowchart of training a second optical flow estimation model in an embodiment of the present disclosure. As shown in Figure 5, including:
  • The previous image and the subsequent image between the third sample image and the fourth sample image can be determined according to their acquisition times; alternatively, one of the third sample image and the fourth sample image may be selected as the previous image and the other used as the subsequent image.
  • the optical flow of the subsequent image relative to the previous image is calculated as a backward optical flow between the third sample image and the fourth sample image.
  • P Verify based on the forward optical flow between the third sample image and the fourth sample image and the backward optical flow between the third sample image and the fourth sample image, and determine the non-occluded area between the third sample image and the fourth sample image.
  • Specifically, an affine transformation operation is performed on the backward optical flow based on the forward optical flow between the third sample image and the fourth sample image, the sum of the affine-transformed backward optical flow and the forward optical flow is calculated, and its absolute value is taken. If the absolute value at a pixel position is less than a preset threshold, that pixel position is determined to be a non-occlusion position; if the absolute value is greater than or equal to the preset threshold, the pixel position is determined to be an occlusion position.
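The forward-backward check above can be sketched as follows. This is an illustrative implementation, assuming dense flow fields of shape (H, W, 2) and using nearest-neighbour lookup in place of the patent's affine transformation (warping) step; real pipelines typically use bilinear sampling.

```python
import numpy as np

def occlusion_mask(fwd, bwd, thresh=1.0):
    """Warp the backward flow to the first frame along the forward flow
    (nearest-neighbour here for simplicity), add it to the forward flow, and
    mark positions where the absolute residual reaches the threshold as occluded."""
    h, w = fwd.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # positions each pixel maps to under the forward flow, clipped to the image
    xt = np.clip(np.rint(xs + fwd[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.rint(ys + fwd[..., 1]).astype(int), 0, h - 1)
    bwd_warped = bwd[yt, xt]                        # backward flow sampled at targets
    residual = np.abs(fwd + bwd_warped).sum(axis=-1)  # consistency residual
    return residual >= thresh                       # True = occlusion position

# usage: pixel (0,0) moves one pixel right and its target maps straight back
fwd = np.zeros((2, 2, 2)); fwd[0, 0] = [1.0, 0.0]
bwd = np.zeros((2, 2, 2)); bwd[0, 1] = [-1.0, 0.0]
mask = occlusion_mask(fwd, bwd, thresh=0.5)
```

Pixel (0, 0) is consistent (forward and warped backward flow cancel), so it is a non-occlusion position; pixels whose flows do not cancel exceed the threshold and are marked occluded.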
  • the photometric error is calculated based on the photometric error loss function, and then model training is constrained based on the photometric error calculation result.
  • The photometric error loss function can adopt a formulation in which Lp is the photometric loss, SSIM(Ii, Ij) represents the structural similarity parameter between the sample image Ii and the sample image Ij, a weight parameter balances the terms, and a small constant is included.
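The exact formula is not reproduced in this text, so the sketch below uses the common SSIM-plus-L1 photometric error from the self-supervised optical flow literature; the weight `alpha` and the single-window `ssim` helper are assumptions, not the patent's definitions (practical SSIM uses local windows).

```python
import numpy as np

def ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Global (single-window) SSIM between two images; kept minimal for the sketch."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2*mx*my + c1) * (2*cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def photometric_loss(Ii, Ij, alpha=0.85):
    """Hypothetical photometric error: weighted SSIM dissimilarity plus L1 term.
    alpha plays the role of the weight and c1/c2 of the small constants."""
    return alpha * (1.0 - ssim(Ii, Ij)) / 2.0 + (1.0 - alpha) * np.abs(Ii - Ij).mean()

# usage: identical images incur (near) zero loss, mismatched images a positive loss
img = np.linspace(0.0, 1.0, 16).reshape(4, 4)
loss_same = photometric_loss(img, img)
loss_diff = photometric_loss(img, 1.0 - img)
```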
  • Some sample image pairs, together with the optical flow of their corresponding non-occluded regions, are taken from all sample image pairs to form a training set, and the remaining sample image pairs with their corresponding non-occluded-region optical flow constitute the verification set; the training set is used for training and the verification set for verification.
  • When the training of the second optical flow estimation model meets a preset termination condition (for example, the number of model iterations reaches a predetermined number, or the model prediction accuracy exceeds a preset threshold), the training is terminated to obtain the trained second optical flow estimation model.
  • By training the second optical flow estimation model on non-occluded areas, the model attains very high prediction accuracy when predicting optical flow for non-occluded areas, which improves the accuracy of the third optical flow values and in turn the prediction accuracy of the first optical flow estimation model trained under the third-optical-flow constraint.
  • Any optical flow estimation model training method provided in the embodiments of the present disclosure may be executed by any appropriate device with data processing capability, including but not limited to: a terminal device, a server, and the like.
  • The training method of any optical flow estimation model provided by the embodiments of the present disclosure may be executed by a processor; for example, the processor executes any optical flow estimation model training method mentioned in the embodiments of the present disclosure by calling corresponding instructions stored in a memory. Details are not repeated below.
  • Fig. 6 is a structural block diagram of an optical flow estimation model training device according to an embodiment of the present disclosure.
  • the training device of the optical flow estimation model includes: a semantic segmentation module 100 , a static region determination module 200 , a mapping relation determination module 300 , a first optical flow determination module 400 and a constraint training module 500 .
  • the semantic segmentation module 100 is used to perform semantic segmentation on the first sample image and the second sample image to obtain the first semantic segmentation result and the second semantic segmentation result respectively;
  • the static region determination module 200 is used to determine the first static area in the first sample image and the second static area in the second sample image based on the first semantic segmentation result and the second semantic segmentation result;
  • the mapping relationship determination module 300 is used to determine the pixel point mapping relationship between the first sample image and the second sample image based on the inter-frame pose information of the first sample image and the second sample image and the point cloud data of the first sample image;
  • the first optical flow determination module 400 is used to determine the first optical flow between the first static region and the second static region based on the pixel point mapping relationship;
  • the constraint training module 500 is used to constrain the training of the first optical flow estimation model based on the first optical flow.
  • Fig. 7 is a structural block diagram of a mapping relationship determining module 300 according to an embodiment of the present disclosure. As shown in Figure 7, the mapping relationship determination module 300 includes:
  • a projection unit 301 configured to project the point cloud data of the first sample image onto the first sample image and the second sample image respectively, based on the inter-frame pose information of the first sample image and the second sample image;
  • a mapping relationship determining unit 302 configured to determine the pixel point mapping relationship based on the projection point positions of the point cloud data of the first sample image on the first sample image and on the second sample image.
  • Fig. 8 is a structural block diagram of a training device for an optical flow estimation model in another embodiment of the present disclosure. As shown in Figure 8, the training device of the optical flow estimation model also includes:
  • a noise region determination module 600 configured to determine a first target noise region based on the first semantic segmentation result, and determine a second target noise region based on the second semantic segmentation result;
  • the second optical flow determination module 700 is configured to set the second optical flow between the first target noise area and the second target noise area to 0;
  • the constraint training module 500 is specifically configured to constrain the training of the first optical flow estimation model based on the first optical flow and the second optical flow.
  • the training device of the optical flow estimation model also includes:
  • a cropping module 800 configured to respectively crop the first sample image and the second sample image according to a preset cropping method to obtain a first cropped image and a second cropped image;
  • the first optical flow estimation module 900 is configured to process the first cropped image and the second cropped image based on the first optical flow estimation model to obtain a first optical flow estimation result;
  • the second optical flow estimation module 1000 is configured to process the first sample image and the second sample image using a pre-trained second optical flow estimation model to obtain a second optical flow estimation result;
  • a third optical flow determination module 1100 configured to determine a third optical flow based on the first optical flow estimation result and the second optical flow estimation result;
  • the constraint training module 500 is specifically configured to constrain the training of the first optical flow estimation model based on the first optical flow, the second optical flow and the third optical flow.
  • Fig. 9 is a structural block diagram of a third optical flow determination module 1100 in an embodiment of the present disclosure. As shown in Figure 9, the third optical flow determination module 1100 includes:
  • an occlusion area determination unit 1101 configured to determine an occlusion area between the first cropped image and the second cropped image based on the first optical flow estimation result;
  • a non-occlusion area determination unit 1102 configured to determine a non-occlusion area between the first sample image and the second sample image based on the second optical flow estimation result;
  • a target area determining unit 1103, configured to determine the target area based on the occluded area between the first cropped image and the second cropped image, and the non-occluded area between the first sample image and the second sample image.
  • the third optical flow determination unit 1104 is configured to determine, as the third optical flow, the optical flow of the target area between the first sample image and the second sample image in the second optical flow estimation result.
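The target-area logic of units 1101–1104 can be sketched as a boolean-mask operation: a pixel contributes to the third optical flow when it is occluded in the cropped-image estimate but non-occluded in the full-image estimate, and its flow value is taken from the second (full-image) estimate. A minimal NumPy sketch, where all array and function names are illustrative assumptions rather than terms from the disclosure:

```python
import numpy as np

def third_optical_flow(flow_full, occluded_cropped, non_occluded_full):
    """Select, from the full-image (second) flow estimate, the flow of
    pixels that are occluded in the cropped-image estimate but visible
    in the full-image estimate; all other pixels are left as NaN."""
    target_area = occluded_cropped & non_occluded_full
    third_flow = np.full_like(flow_full, np.nan)
    third_flow[target_area] = flow_full[target_area]
    return target_area, third_flow
```

`flow_full` is an (H, W, 2) array and the two masks are (H, W) boolean arrays; the NaN fill marks pixels where the third optical flow is undefined.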
  • the preset cropping method is cropping by removing the outer border; wherein the first cropped image is the image range of the first sample image with a preset outer border size removed, and the second cropped image is the image range of the second sample image with the preset outer border size removed.
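The border-removal cropping described above can be illustrated with a short sketch; the function name and the `border` parameter are assumptions for illustration, not names from the disclosure:

```python
import numpy as np

def crop_outer_border(image, border):
    """Remove a preset outer border of `border` pixels on every side,
    keeping only the interior of the image."""
    h, w = image.shape[:2]
    assert border * 2 < h and border * 2 < w, "border too large for image"
    return image[border:h - border, border:w - border]
```

Applying the same `border` to both sample images keeps the two cropped images aligned, which is what allows the cropped-image flow estimate to be compared with the full-image estimate.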
  • the training device of the optical flow estimation model also includes:
  • An optical flow acquisition module 1200 configured to acquire the forward optical flow between the third sample image and the fourth sample image, and the backward optical flow between the third sample image and the fourth sample image;
  • a non-occlusion area determination module 1300 configured to perform verification based on the forward optical flow and the backward optical flow, and determine a non-occlusion area between the third sample image and the fourth sample image;
  • the second optical flow estimation model training module 1400 is configured to train the second optical flow estimation model using a photometric error loss function, based on the positions of the non-occluded area in the third sample image and the fourth sample image.
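Modules 1200–1400 rely on a forward-backward consistency check: a pixel is treated as non-occluded when its forward flow and the backward flow sampled at the forward-displaced position roughly cancel out. A hedged NumPy sketch, in which nearest-neighbour sampling and the threshold value are simplifying assumptions:

```python
import numpy as np

def non_occluded_mask(flow_fwd, flow_bwd, thresh=1.0):
    """Forward-backward verification. flow_fwd and flow_bwd are
    (H, W, 2) arrays of (dx, dy) displacements; a pixel passes the
    check when |flow_fwd + flow_bwd(warped)| is below the threshold."""
    h, w = flow_fwd.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Positions reached in the other image, rounded and clipped to bounds.
    x2 = np.clip(np.round(xs + flow_fwd[..., 0]).astype(int), 0, w - 1)
    y2 = np.clip(np.round(ys + flow_fwd[..., 1]).astype(int), 0, h - 1)
    flow_bwd_warped = flow_bwd[y2, x2]
    err = np.linalg.norm(flow_fwd + flow_bwd_warped, axis=-1)
    return err < thresh
```

The photometric error loss is then evaluated only at positions where this mask is true, so that occluded pixels, for which photometric consistency does not hold, do not corrupt the training signal.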
  • The specific implementation of the training device for the optical flow estimation model in the embodiments of the present disclosure is similar to the specific implementation of the training method for the optical flow estimation model in the embodiments of the present disclosure; to reduce redundancy, it is not repeated here.
  • the electronic device includes one or more processors 10 and memory 20 .
  • Processor 10 may be a central processing unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
  • Memory 20 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache).
  • the non-volatile memory may include, for example, a read-only memory (ROM), a hard disk, a flash memory, and the like.
  • One or more computer program instructions can be stored on the computer-readable storage medium, and the processor 10 can execute the program instructions to implement the training method of the optical flow estimation model of the various embodiments of the present disclosure described above and/or other desired functionality.
  • Various contents such as input signal, signal component, noise component, etc. may also be stored in the computer-readable storage medium.
  • the electronic device may further include: an input device 30 and an output device 40, and these components are interconnected through a bus system and/or other forms of connection mechanisms (not shown).
  • the input device 30 can be, for example, a keyboard, a mouse, and the like.
  • the output device 40 may include, for example, a display, a speaker, a printer, and a communication network and its connected remote output devices, among others.
  • the electronic device may also include any other suitable components according to specific applications.
  • the computer readable storage medium may utilize any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • a readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • the methods and apparatus of the present disclosure may be implemented in many ways.
  • the methods and apparatuses of the present disclosure may be implemented by software, hardware, firmware or any combination of software, hardware, and firmware.
  • the above sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise.
  • the present disclosure can also be implemented as programs recorded in recording media, the programs including machine-readable instructions for realizing the method according to the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
  • each component or each step can be decomposed and/or reassembled. These decompositions and/or recombinations should be considered equivalents of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A training method and apparatus for an optical flow estimation model. The training method comprises: S1: performing semantic segmentation on a first sample image and a second sample image to respectively obtain a first semantic segmentation result and a second semantic segmentation result; S2: determining a first static region in the first sample image and a second static region in the second sample image on the basis of the first semantic segmentation result and the second semantic segmentation result; S3: determining a pixel point mapping relationship between the first sample image and the second sample image on the basis of inter-frame pose information of the first sample image and the second sample image, and point cloud data of the first sample image; S4: determining a first optical flow between the first static region and the second static region on the basis of the pixel point mapping relationship; and S5: constraining the training of a first optical flow estimation model on the basis of the first optical flow. The precision of the trained optical flow estimation model is significantly higher than that of self-supervised methods, and the shadow problem can be solved.

Description

Training Method and Apparatus for Optical Flow Estimation Model

This disclosure claims priority to the Chinese patent application with application number 202111572711.2, titled "Training Method and Apparatus for Optical Flow Estimation Model", filed on December 21, 2021, the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates to the technical fields of image processing and artificial intelligence (AI), and in particular to a training method and apparatus for an optical flow estimation model.
Background Art

Dense optical flow estimation computes the offset of every point on an image to form a dense optical flow field, based on which pixel-level image registration can be performed. Dense optical flow estimation is widely used in autonomous driving and autonomous robotics. In recent years, with the development of deep learning, dense optical flow estimation based on deep learning has achieved good results. Supervised optical flow estimation methods based on deep learning usually require a large number of labels for model training, but optical flow labels for real scenes are very difficult to obtain, and models trained on virtual data often generalize poorly when applied to real scenes.

In the related art, in the absence of optical flow labels, a self-supervised method is generally used for model training: based on the assumption of inter-frame photometric consistency, a photometric error loss function is used to train the optical flow estimation model. However, the accuracy of such self-supervised model training methods is low.
Summary of the Invention

The present disclosure is proposed to solve the above technical problems. Embodiments of the present disclosure provide a training method and apparatus for an optical flow estimation model.
According to a first aspect of the embodiments of the present disclosure, a training method for an optical flow estimation model is provided, including:

performing semantic segmentation on a first sample image and a second sample image to obtain a first semantic segmentation result and a second semantic segmentation result, respectively;

determining a first static region in the first sample image and a second static region in the second sample image based on the first semantic segmentation result and the second semantic segmentation result;

determining a pixel point mapping relationship between the first sample image and the second sample image based on inter-frame pose information of the first sample image and the second sample image, and point cloud data of the first sample image;

determining a first optical flow between the first static region and the second static region based on the pixel point mapping relationship;

constraining the training of a first optical flow estimation model based on the first optical flow.
According to a second aspect of the embodiments of the present disclosure, a training apparatus for an optical flow estimation model is provided, including:

a semantic segmentation module, configured to perform semantic segmentation on a first sample image and a second sample image to obtain a first semantic segmentation result and a second semantic segmentation result, respectively;

a static region determination module, configured to determine a first static region in the first sample image and a second static region in the second sample image based on the first semantic segmentation result and the second semantic segmentation result;

a mapping relationship determination module, configured to determine a pixel point mapping relationship between the first sample image and the second sample image based on inter-frame pose information of the first sample image and the second sample image, and point cloud data of the first sample image;

a first optical flow determination module, configured to determine a first optical flow between the first static region and the second static region based on the pixel point mapping relationship;

a constraint training module, configured to constrain the training of a first optical flow estimation model based on the first optical flow.
According to a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, the storage medium storing a computer program for executing the training method for an optical flow estimation model described in the first aspect above.

According to a fourth aspect of the embodiments of the present disclosure, an electronic device is provided, including:

a processor;

a memory for storing instructions executable by the processor;

the processor being configured to read the executable instructions from the memory and execute the instructions to implement the training method for an optical flow estimation model described in the first aspect above.
Based on the training method and apparatus for an optical flow estimation model provided by the above embodiments of the present disclosure, semantic segmentation is performed on a first sample image and a second sample image to obtain a first semantic segmentation result and a second semantic segmentation result, respectively; a first static region in the first sample image and a second static region in the second sample image are determined based on the first semantic segmentation result and the second semantic segmentation result; a pixel point mapping relationship between the first sample image and the second sample image is determined based on inter-frame pose information of the first sample image and the second sample image, and point cloud data of the first sample image; a first optical flow between the first static region and the second static region is determined based on the pixel point mapping relationship; and the training of a first optical flow estimation model is constrained based on the first optical flow. In the embodiments of the present disclosure, since the first optical flow is obtained by processing point cloud data, it is a ground-truth optical flow; using the ground-truth optical flow as supervision information during model training makes the trained first optical flow estimation model significantly more accurate than self-supervised methods.

In addition, since a vehicle casts shadows under lighting or sunlight, optical flow may be produced by shadow movement during optical flow estimation. In the embodiments of the present disclosure, since partial ground-truth optical flow exists in the image region where the shadow is located, the influence of shadows on the optical flow estimation model can be reduced, guiding the model to learn correctly.
The technical solutions of the present disclosure are described in further detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent from the more detailed description of the embodiments of the present disclosure with reference to the accompanying drawings. The accompanying drawings provide a further understanding of the embodiments of the present disclosure, constitute a part of the specification, serve together with the embodiments to explain the present disclosure, and do not limit the present disclosure. In the drawings, the same reference numerals generally denote the same components or steps.

Fig. 1 is a schematic flowchart of a training method for an optical flow estimation model according to an embodiment of the present disclosure;

Fig. 2 is a schematic flowchart of step S3 in an embodiment of the present disclosure;

Fig. 3 is a schematic flowchart of determining a third optical flow in an embodiment of the present disclosure;

Fig. 4 is a schematic flowchart of step D in an embodiment of the present disclosure;

Fig. 5 is a schematic flowchart of training a second optical flow estimation model in an embodiment of the present disclosure;

Fig. 6 is a structural block diagram of a training apparatus for an optical flow estimation model according to an embodiment of the present disclosure;

Fig. 7 is a structural block diagram of a mapping relationship determination module 300 according to an embodiment of the present disclosure;

Fig. 8 is a structural block diagram of a training apparatus for an optical flow estimation model according to another embodiment of the present disclosure;

Fig. 9 is a structural block diagram of a third optical flow determination module 1100 in an embodiment of the present disclosure;

Fig. 10 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
具体实施方式Detailed ways
下面,将参考附图详细地描述根据本公开的示例实施例。显然,所描述的实施例仅仅是本公开的一部分实施例,而不是本公开的全部实施例,应理解,本公开不受这里描述的示例实施例的限制。Hereinafter, exemplary embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. Apparently, the described embodiments are only some of the embodiments of the present disclosure, rather than all the embodiments of the present disclosure, and it should be understood that the present disclosure is not limited by the exemplary embodiments described here.
应注意到:除非另外具体说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本公开的范围。It should be noted that relative arrangements of components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
本领域技术人员可以理解,本公开实施例中的“第一”、“第二”等术语仅用于区别不同步骤、设备或模块等,既不代表任何特定技术含义,也不表示它们之间的必然逻辑顺序。Those skilled in the art can understand that terms such as "first" and "second" in the embodiments of the present disclosure are only used to distinguish different steps, devices or modules, etc. necessary logical sequence.
还应理解,在本公开实施例中,“多个”可以指两个或两个以上,“至少一个”可以指一个、两个或两个以上。It should also be understood that in the embodiments of the present disclosure, "plurality" may refer to two or more than two, and "at least one" may refer to one, two or more than two.
还应理解,对于本公开实施例中提及的任一部件、数据或结构,在没有明确限定或者在前后文给出相反启示的情况下,一般可以理解为一个或多个。It should also be understood that any component, data or structure mentioned in the embodiments of the present disclosure can generally be understood as one or more unless there is a clear limitation or a contrary suggestion is given in the context.
另外,本公开中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本公开中字符“/”,一般表示前后关联对象是一种“或”的关系。In addition, the term "and/or" in the present disclosure is only an association relationship describing associated objects, indicating that there may be three relationships, for example, A and/or B may indicate: A exists alone, and A and B exist at the same time , there are three cases of B alone. In addition, the character "/" in the present disclosure generally indicates that the contextual objects are an "or" relationship.
还应理解,本公开对各个实施例的描述着重强调各个实施例之间的不同之处,其相同或相似之处可以相互参考,为了简洁,不再一一赘述。It should also be understood that the description of the various embodiments in the present disclosure emphasizes the differences between the various embodiments, and the same or similar points can be referred to each other, and for the sake of brevity, details are not repeated here.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present disclosure or its application or use.

Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods and devices should be considered part of the specification.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.

Embodiments of the present disclosure may be applied to electronic devices such as terminal devices, computer systems and servers, which may operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments and/or configurations suitable for use with electronic devices such as terminal devices, computer systems and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the foregoing, and the like.

Electronic devices such as terminal devices, computer systems and servers may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, object programs, components, logic, data structures, and the like, which perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in distributed cloud computing environments in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
Exemplary Method

Fig. 1 is a schematic flowchart of a training method for an optical flow estimation model according to an embodiment of the present disclosure. This embodiment may be applied to an electronic device and, as shown in Fig. 1, includes the following steps:

S1: Perform semantic segmentation on a first sample image and a second sample image to obtain a first semantic segmentation result and a second semantic segmentation result, respectively.
In an optional manner, while the vehicle is driving, a vehicle-mounted camera captures video of the scene in front of the vehicle, and the first sample image and the second sample image, N frames apart, are obtained from the captured video, where N is an integer greater than or equal to 1.

In another optional manner, while the vehicle is driving, the vehicle-mounted camera captures one frame at preset time intervals, and the first sample image and the second sample image are obtained from the captured images.

After the first sample image and the second sample image are obtained, a pre-trained semantic segmentation model is used to perform semantic segmentation on the first sample image to obtain the first semantic segmentation result, and the same semantic segmentation model is used to perform semantic segmentation on the second sample image to obtain the second semantic segmentation result. The first semantic segmentation result may include a sky region, a road surface region, a pedestrian region, a vehicle region, and other image regions appearing in the first sample image. The second semantic segmentation result may include a sky region, a road surface region, a pedestrian region, a vehicle region, and other image regions appearing in the second sample image.
S2: Determine a first static region in the first sample image and a second static region in the second sample image based on the first semantic segmentation result and the second semantic segmentation result.

The first static region in the first sample image is determined according to the first semantic segmentation result, for example, static regions such as the road surface region, the mountain region and roadside fixtures (e.g., utility poles and traffic lights) in the first sample image.

The second static region in the second sample image is determined according to the second semantic segmentation result, for example, static regions such as the road surface region, the mountain region and roadside fixtures (e.g., utility poles and traffic lights) in the second sample image.
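The determination of a static region from a semantic segmentation result can be sketched as selecting the pixels whose predicted class belongs to a set of static classes. In the NumPy sketch below, the class ids are illustrative assumptions, since the actual ids depend on the segmentation model used:

```python
import numpy as np

# Illustrative class ids; real ids depend on the segmentation model.
STATIC_CLASSES = {0, 1, 2}  # e.g. road surface, mountain, roadside fixtures

def static_region_mask(seg_map, static_classes=STATIC_CLASSES):
    """Build a boolean mask of the static region from a per-pixel
    class-id map produced by the semantic segmentation model."""
    return np.isin(seg_map, list(static_classes))
```

Applying this to the first and second semantic segmentation results yields the first and second static regions as boolean masks over the respective sample images.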
S3: Determine a pixel point mapping relationship between the first sample image and the second sample image based on inter-frame pose information of the first sample image and the second sample image, and point cloud data of the first sample image.

There is a mapping relationship between the point cloud data of the first sample image and the pixel points of the target objects in the first sample image. Based on the inter-frame pose information of the first sample image and the second sample image, the pixel displacement of a target object between the first sample image and the second sample image can be calculated. Therefore, based on the inter-frame pose information of the first sample image and the second sample image, and the point cloud data of the first sample image, the pixel point mapping relationship between the first sample image and the second sample image can be determined.
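One way to realize this step, under the assumption of a pinhole camera with intrinsic matrix `K` and a 4x4 homogeneous inter-frame pose `T_2_from_1` (both assumed inputs, not symbols from the disclosure), is to project the first frame's point cloud into both frames and take the displacement of the projected pixels as the per-point mapping. A hedged NumPy sketch:

```python
import numpy as np

def project(points_cam, K):
    """Pinhole projection of Nx3 camera-frame points to Nx2 pixels."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def static_flow_from_points(points_cam1, K, T_2_from_1):
    """Project the first frame's point cloud into both frames using the
    inter-frame pose; the displacement of the projected pixels gives the
    per-point mapping (and hence the ground-truth flow at those pixels)."""
    p1 = project(points_cam1, K)
    pts_h = np.hstack([points_cam1, np.ones((len(points_cam1), 1))])
    points_cam2 = (pts_h @ T_2_from_1.T)[:, :3]
    p2 = project(points_cam2, K)
    return p1, p2, p2 - p1  # pixels in frame 1, frame 2, per-point flow
```

Restricting the resulting per-point flow to pixels inside the static regions of step S2 yields the first optical flow of step S4.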
S4: Determine a first optical flow between the first static region and the second static region based on the pixel point mapping relationship.

Each time, a static sub-region (for example, the region of utility pole A) is extracted from the first static region, and the corresponding static sub-region (i.e., the region of utility pole A) is extracted from the second static region; combined with the pixel point mapping relationship between the first sample image and the second sample image, the ground-truth optical flow of this static sub-region between the first sample image and the second sample image can be obtained.

The ground-truth optical flow is determined in the same manner for all static sub-regions of the first static region and the corresponding static sub-regions of the second static region.

All ground-truth optical flows of the first static region and the second static region are aggregated and denoted as the first optical flow.
S5:基于第一光流,约束第一光流估计模型的训练。S5: Based on the first optical flow, constrain the training of the first optical flow estimation model.
将第一样本图像和第二样本图像组成一个样本图像对,以第一光流作为基于该样本图像对进行第一光流估计模型训练时的监督信息。需要说明的是,训练第一光流估计模型时需要多个样本图像对,对于每个样本图像对,均按照与步骤S1至S4的相同方式获取对应的第一光流。The first sample image and the second sample image are combined into a sample image pair, and the first optical flow is used as supervision information when the first optical flow estimation model is trained based on the sample image pair. It should be noted that when training the first optical flow estimation model, multiple sample image pairs are required, and for each sample image pair, the corresponding first optical flow is obtained in the same manner as steps S1 to S4.
在训练第一光流估计模型时,从所有样本图像对中获取部分样本图像对和与之对应的第一光流组成训练集,将剩余的样本图像对和与之对应的第一光流组成验证集,用训练集进行训练,用验证集进行验证。在使用训练集进行训练时,对于训练集中包括的样本图像对中除了静态区域以外的其他区域,可以采用现有光流估计模型的训练方式进行训练。当第一光流估计模型的训练满足预设终止条件(例如模型迭代次数达到预定迭代次数,或者超过预设模型预测精度阈值)时,终止第一光流估计模型的训练,得到第一光流估计模型。When training the first optical flow estimation model, some sample image pairs and their corresponding first optical flows are taken from all sample image pairs to form a training set, and the remaining sample image pairs and their corresponding first optical flows form a validation set; the training set is used for training and the validation set for validation. During training, for the areas of the sample image pairs in the training set other than the static areas, the training method of an existing optical flow estimation model may be used. When the training of the first optical flow estimation model meets a preset termination condition (for example, the number of model iterations reaches a predetermined number, or a preset model prediction accuracy threshold is exceeded), the training is terminated and the first optical flow estimation model is obtained.
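The supervised part of the training described above can be sketched with a loss that is evaluated only where ground-truth flow exists (the static regions). This is a minimal illustration under assumed array shapes, not the patent's implementation; the function names and toy data are hypothetical.

```python
import numpy as np

def masked_epe_loss(pred_flow, gt_flow, static_mask):
    # End-point error averaged only over pixels that carry a
    # ground-truth label (the static regions); other pixels are ignored.
    err = np.linalg.norm(pred_flow - gt_flow, axis=-1)   # (H, W) per-pixel EPE
    n = static_mask.sum()
    return float((err * static_mask).sum() / max(n, 1))

# Toy check: the prediction is wrong only at a pixel outside the mask,
# so the supervised loss is exactly zero.
gt = np.zeros((4, 4, 2))
pred = gt.copy()
pred[0, 0] = [5.0, 5.0]
mask = np.ones((4, 4), dtype=bool)
mask[0, 0] = False
loss = masked_epe_loss(pred, gt, mask)
```

In a real training loop this loss would be combined with whatever unsupervised objective is used for the non-static areas, and iterated until the termination condition described above is met.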
在本公开实施例中,由于第一光流是基于点云数据进行处理后得到的,是光流真值,将光流真值作为模型训练时的监督信息,可以让训练出的第一光流估计模型在精度上明显高于自监督的方法。此外,由于车辆在灯光或日光下会有影子,在光流估计的过程中会出现因为影子移动而产生光流,在本公开的实施例中,由于影子所在的图像区域存在部分光流真值,从而可以降低影子对光流估计模型的影响,引导模型正确学习。In the embodiment of the present disclosure, since the first optical flow is obtained by processing point cloud data, it is a ground-truth optical flow. Using the ground-truth optical flow as supervision information during model training makes the trained first optical flow estimation model significantly more accurate than self-supervised methods. In addition, since vehicles cast shadows under lighting or sunlight, optical flow caused by shadow movement may appear during optical flow estimation. In the embodiment of the present disclosure, because partial ground-truth optical flow exists in the image area where the shadow lies, the influence of shadows on the optical flow estimation model can be reduced, guiding the model to learn correctly.
图2是本公开一个实施例中步骤S3的流程示意图。如图2所示,步骤S3包括:Fig. 2 is a schematic flowchart of step S3 in an embodiment of the present disclosure. As shown in Figure 2, step S3 includes:
S3-1:基于第一样本图像与第二样本图像的帧间姿态信息,将第一样本图像的点云数据分别投影至第一样本图像和第二样本图像。S3-1: Project the point cloud data of the first sample image to the first sample image and the second sample image respectively based on the inter-frame pose information of the first sample image and the second sample image.
在一种可选的方式中,通过位姿传感器(例如陀螺仪)获取拍摄第一样本图像时摄像装置的位姿信息,并通过位姿传感器获取拍摄第二样本图像时摄像装置的位姿信息,进而得到第一样本图像与第二样本图像的帧间姿态信息。In an optional manner, the pose information of the camera device when the first sample image is captured is obtained through a pose sensor (for example, a gyroscope), and the pose information of the camera device when the second sample image is captured is likewise obtained through the pose sensor, thereby obtaining the inter-frame pose information between the first sample image and the second sample image.
第一样本图像的点云数据与第一样本图像中目标物的像素点之间存在映射关系。基于第一样本图像的点云数据,以及该点云数据与像素点之间的映射关系,可以将第一样本图像的点云数据投影到第一样本图像上。There is a mapping relationship between the point cloud data of the first sample image and the pixel points of the target object in the first sample image. Based on the point cloud data of the first sample image and the mapping relationship between the point cloud data and the pixel points, the point cloud data of the first sample image can be projected onto the first sample image.
基于第一样本图像与第二样本图像的帧间姿态信息,可以得到第一样本图像与第二样本图像之间像素点的位移关系,进而基于该位移关系,可以将第一样本图像的点云数据投影到第二样本图像上。Based on the inter-frame pose information between the first sample image and the second sample image, the displacement relationship of pixels between the two images can be obtained, and then, based on this displacement relationship, the point cloud data of the first sample image can be projected onto the second sample image.
S3-2:基于第一样本图像的点云数据在第一样本图像上的投影点位置,以及第一样本图像的点云数据在第二样本图像上的投影点位置,确定第一样本图像与第二样本图像间的像素点映射关系。S3-2: Determine the pixel point mapping relationship between the first sample image and the second sample image based on the projection point positions of the point cloud data of the first sample image on the first sample image and on the second sample image.
通过计算代表目标物的同一投影点在第一样本图像与第二样本图像之间的位置对应关系,即可得到该目标物的投影点在第一样本图像与第二样本图像间的像素点的映射关系。对第一样本图像的点云数据的所有投影点,均计算在第一样本图像与第二样本图像之间的位置对应关系,即可得到第一样本图像与第二样本图像间的像素点映射关系。By calculating the positional correspondence of the same projection point representing the target object between the first sample image and the second sample image, the pixel mapping of this projection point between the two images can be obtained. By calculating this positional correspondence for all projection points of the point cloud data of the first sample image, the pixel point mapping relationship between the first sample image and the second sample image is obtained.
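Steps S3-1 and S3-2 can be sketched as follows, assuming a pinhole camera with intrinsics K, point cloud data already expressed in the first camera's coordinate frame, and the inter-frame pose given as a rotation R and translation t of the second camera relative to the first; all names, shapes, and values are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def flow_from_points(points_cam1, K, R, t):
    # Project 3-D points (N, 3), given in camera-1 coordinates, into both
    # frames; the displacement of each projection is the ground-truth flow.
    def project(pts):
        uvw = (K @ pts.T).T
        return uvw[:, :2] / uvw[:, 2:3]
    uv1 = project(points_cam1)            # pixel positions in frame 1
    points_cam2 = points_cam1 @ R.T + t   # same points in camera-2 coordinates
    uv2 = project(points_cam2)            # pixel positions in frame 2
    return uv1, uv2 - uv1                 # (positions, flow)

# Pure sideways translation: a point at depth Z shifts by fx * tx / Z pixels.
K = np.array([[100.0, 0.0, 64.0],
              [0.0, 100.0, 64.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 10.0],
                [1.0, -1.0, 5.0]])
uv1, flow = flow_from_points(pts, K, np.eye(3), np.array([0.5, 0.0, 0.0]))
```

The pair (uv1, uv1 + flow) is exactly the pixel point mapping between the two sample images described in step S3-2.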
在本实施例中,将第一样本图像的点云数据分别投影至第一样本图像和第二样本图像,基于投影点在第一样本图像与第二样本图像之间的位置对应关系,即可准确地得到第一样本图像与第二样本图像间的像素点映射关系。In this embodiment, the point cloud data of the first sample image is projected onto the first sample image and the second sample image respectively; based on the positional correspondence of the projection points between the two images, the pixel point mapping relationship between the first sample image and the second sample image can be obtained accurately.
在本公开的一个实施例中,在步骤S5之前,优选在步骤S2之前,还包括:基于第一语义分割结果,确定第一目标噪声区域;基于第二语义分割结果,确定第二目标噪声区域;将第一目标噪声区域和第二目标噪声区域间的第二光流设为0。相应地,步骤S5包括:基于第一光流和第二光流,约束第一光流估计模型的训练。In one embodiment of the present disclosure, before step S5, and preferably before step S2, the method further includes: determining a first target noise area based on the first semantic segmentation result; determining a second target noise area based on the second semantic segmentation result; and setting the second optical flow between the first target noise area and the second target noise area to 0. Correspondingly, step S5 includes: constraining the training of the first optical flow estimation model based on the first optical flow and the second optical flow.
在本实施例中,第一目标噪声区域可以是第一样本图像中的天空区域,第二目标噪声区域可以是第二样本图像中的天空区域。由于天空中的光流在实际应用中没有意义,因此在训练第一光流估计模型时,将天空区域的光流置0,能够减少图像噪点对模型效果的影响,并产生更加清晰的边界,可视化效果好。In this embodiment, the first target noise area may be the sky area in the first sample image, and the second target noise area may be the sky area in the second sample image. Since optical flow in the sky is meaningless in practical applications, when training the first optical flow estimation model, setting the optical flow of the sky area to 0 can reduce the influence of image noise on the model and produce clearer boundaries with a better visualization effect.
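The sky-masking step can be sketched as follows; the label id SKY_CLASS and the requirement that both frames agree on the sky label are illustrative assumptions, since the patent does not fix a particular segmentation label scheme.

```python
import numpy as np

SKY_CLASS = 10  # hypothetical label id for "sky" in the segmentation output

def zero_sky_flow(gt_flow, seg1, seg2):
    # Zero the supervised flow wherever both frames are labelled sky,
    # so the model is not penalised for noise in the sky region.
    sky = (seg1 == SKY_CLASS) & (seg2 == SKY_CLASS)
    out = gt_flow.copy()
    out[sky] = 0.0
    return out, sky

seg = np.full((3, 3), SKY_CLASS)
seg[2, :] = 0                  # bottom row: road, not sky
flow = np.ones((3, 3, 2))
flow_out, sky_mask = zero_sky_flow(flow, seg, seg)
```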
图3为本公开一个实施例中确定第三光流的流程示意图。如图3所示,在步骤S5之前,还包括:Fig. 3 is a schematic flowchart of determining a third optical flow in an embodiment of the present disclosure. As shown in Figure 3, before step S5, it also includes:
A:按照预设的裁剪方式,分别对第一样本图像和第二样本图像进行裁剪,得到第一裁剪图像和第二裁剪图像。其中,第一裁剪图像和第二裁剪图像的尺寸相同,且第一裁剪图像在第一样本图像中的位置区域,与第二裁剪图像在第二样本图像中的位置区域,两者位置区域相对应。A: Crop the first sample image and the second sample image respectively according to a preset cropping method to obtain a first cropped image and a second cropped image. The first cropped image and the second cropped image have the same size, and the position area of the first cropped image in the first sample image corresponds to the position area of the second cropped image in the second sample image.
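One concrete choice of the preset cropping method, removing a fixed outer border as a later embodiment describes, can be sketched as follows; the margin value is an assumption. Applying the same crop to both sample images keeps their position areas in correspondence.

```python
import numpy as np

def crop_border(img, margin):
    # Remove a fixed outer border of `margin` pixels on every side,
    # keeping the crop's position area identical for both images.
    return img[margin:-margin, margin:-margin]

img = np.arange(36).reshape(6, 6)
crop = crop_border(img, 1)
```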
B:基于第一光流估计模型对第一裁剪图像和第二裁剪图像进行处理,得到第一光流估计结果。在一个实现方式中,可利用初始的第一光流估计模型对第一裁剪图像和第二裁剪图像进行光流估计,得到第一光流估计结果。B: Process the first cropped image and the second cropped image based on the first optical flow estimation model to obtain a first optical flow estimation result. In an implementation manner, an initial first optical flow estimation model may be used to perform optical flow estimation on the first cropped image and the second cropped image to obtain a first optical flow estimation result.
C:利用预训练的第二光流估计模型对第一样本图像和第二样本图像进行处理,得到第一样本图像与第二样本图像间的第二光流估计结果。其中,第二光流估计模型是预先训练好的,其模型预测精度要高于未训练完成的第一光流估计模型的精度。C: Using the pre-trained second optical flow estimation model to process the first sample image and the second sample image to obtain a second optical flow estimation result between the first sample image and the second sample image. Wherein, the second optical flow estimation model is pre-trained, and its model prediction accuracy is higher than that of the untrained first optical flow estimation model.
D:基于第一光流估计结果和第二光流估计结果,确定第三光流。即利用预训练且精度高的第二光流估计模型预测出的第二光流估计结果,与第一光流估计结果进行对比处理,根据对比处理结果可以确定作为第一光流估计模型的监督信息的第三光流。D: Determine a third optical flow based on the first optical flow estimation result and the second optical flow estimation result. That is, the second optical flow estimation result predicted by the pre-trained, high-accuracy second optical flow estimation model is compared with the first optical flow estimation result, and from the comparison result the third optical flow, which serves as supervision information for the first optical flow estimation model, can be determined.
相应地,步骤S5包括:基于第一光流、第二光流和第三光流,约束第一光流估计模型的训练。Correspondingly, step S5 includes: constraining the training of the first optical flow estimation model based on the first optical flow, the second optical flow and the third optical flow.
在本实施例中,将利用第一光流估计模型对裁剪后图像对进行预测得到的第一光流估计结果,与利用预训练的第二光流估计模型对原始的样本图像对进行处理得到的第二光流估计结果进行比较,即可确定作为第一光流估计模型的监督信息的第三光流;基于第三光流对第一光流估计模型进行约束,可以有效提升训练后第一光流估计模型的预测准确度。In this embodiment, the first optical flow estimation result is obtained by predicting on the cropped image pair with the first optical flow estimation model, and the second optical flow estimation result is obtained by processing the original sample image pair with the pre-trained second optical flow estimation model. By comparing the first optical flow estimation result with the second optical flow estimation result, the third optical flow serving as supervision information for the first optical flow estimation model can be determined. Constraining the first optical flow estimation model based on the third optical flow can effectively improve the prediction accuracy of the trained first optical flow estimation model.
图4是本公开一个实施例中步骤D的流程示意图。如图4所示,步骤D包括:Fig. 4 is a schematic flowchart of step D in an embodiment of the present disclosure. As shown in Figure 4, step D includes:
D-1:基于第一光流估计结果,确定第一裁剪图像与第二裁剪图像间的遮挡区域。其中,从第一光流估计结果中获取第一裁剪图像与第二裁剪图像的前向光流和后向光流,基于前向光流和后向光流进行校验,从而确定第一裁剪图像与第二裁剪图像间的遮挡区域。D-1: Determine an occlusion area between the first cropped image and the second cropped image based on the first optical flow estimation result. Specifically, the forward optical flow and the backward optical flow between the first cropped image and the second cropped image are obtained from the first optical flow estimation result, and a consistency check is performed based on them, thereby determining the occlusion area between the first cropped image and the second cropped image.
D-2:基于第二光流估计结果,确定第一样本图像和第二样本图像间的非遮挡区域。其中,从第二光流估计结果中获取第一样本图像和第二样本图像的前向光流和后向光流,基于前向光流和后向光流进行校验,从而确定第一样本图像和第二样本图像间的非遮挡区域。D-2: Determine a non-occluded area between the first sample image and the second sample image based on the second optical flow estimation result. Specifically, the forward optical flow and the backward optical flow between the first sample image and the second sample image are obtained from the second optical flow estimation result, and a consistency check is performed based on them, thereby determining the non-occluded area between the first sample image and the second sample image.
D-3:基于第一裁剪图像与第二裁剪图像间的遮挡区域,以及第一样本图像和第二样本图像间的非遮挡区域,确定第一样本图像与第二样本图像间的目标区域。其中,目标区域为在第一裁剪图像与第二裁剪图像间为遮挡区域,且在第一样本图像与第二样本图像间为非遮挡区域的区域。D-3: Determine a target area between the first sample image and the second sample image based on the occlusion area between the first cropped image and the second cropped image and the non-occluded area between the first sample image and the second sample image. The target area is the area that is an occlusion area between the first cropped image and the second cropped image, and a non-occluded area between the first sample image and the second sample image.
D-4:将第二光流估计结果中,第一样本图像与第二样本图像间目标区域的光流确定为第三光流。D-4: Determining the optical flow of the target area between the first sample image and the second sample image in the second optical flow estimation result as the third optical flow.
在本实施例中,裁剪图像相对于原始的样本图像裁剪掉了部分的图像区域。由于第一光流估计结果是利用未训练完成的第一光流估计模型对裁剪图像进行光流估计得到的结果,且第二光流估计结果是利用预先训练好的第二光流估计模型对样本图像的全图进行光流估计得到的结果,因此当第一光流估计结果与第二光流估计结果对遮挡区域和非遮挡区域的判定不一致时,以第二光流估计结果为准,即从第二光流估计结果中获取针对目标区域的第三光流,作为第一光流估计模型在模型训练时的监督信息,可以有效提升第一光流估计模型训练后的预测精度。In this embodiment, the cropped image has part of the image area cropped away relative to the original sample image. Since the first optical flow estimation result is obtained by performing optical flow estimation on the cropped images with the not-yet-trained first optical flow estimation model, and the second optical flow estimation result is obtained by performing optical flow estimation on the full sample images with the pre-trained second optical flow estimation model, when the two results disagree on whether an area is occluded or non-occluded, the second optical flow estimation result prevails. That is, the third optical flow for the target area is obtained from the second optical flow estimation result and used as supervision information when training the first optical flow estimation model, which can effectively improve the prediction accuracy of the trained first optical flow estimation model.
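Steps D-3 and D-4 can be sketched as the following mask logic, assuming the occlusion mask from the cropped pair has already been resampled onto the same grid as the full-image mask; all names are illustrative.

```python
import numpy as np

def third_flow(occ_crop, nonocc_full, teacher_flow):
    # Keep the teacher's flow where the cropped pair says "occluded" but
    # the full pair says "visible": such pixels are likely occluded only
    # because their match left the crop. NaN marks "no label" elsewhere.
    target = occ_crop & nonocc_full
    supervision = np.where(target[..., None], teacher_flow, np.nan)
    return target, supervision

occ = np.array([[True, False],
                [True, True]])
vis = np.array([[True, True],
                [False, True]])
teacher = np.full((2, 2, 2), 3.0)
target, sup = third_flow(occ, vis, teacher)
```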
在本公开的一个实施例中,上述预设的裁剪方式为去除外边框的方式裁剪。即第一裁剪图像相对于第一样本图像去除预设外边框尺寸的图像范围,第二裁剪图像相对于第二样本图像去除预设外边框尺寸的图像范围。In an embodiment of the present disclosure, the above-mentioned preset cropping mode is cropping by removing an outer frame. That is, the first cropped image removes the image range of the preset outer frame size relative to the first sample image, and the second cropped image removes the image range of the preset outer frame size relative to the second sample image.
在本实施例中,由于采用外边框裁剪,使得裁剪图像相对于原始的样本图像裁剪掉了外边框图像区域,当样本图像间的采集时间间隔较短时,对裁剪图像进行光流估计可以有效避免因超出图像采集范围而误判为遮挡区域的情况。在此基础上,当第一光流估计结果与第二光流估计结果对遮挡区域和非遮挡区域的判定不一致时,以第二光流估计结果为准,从第二光流估计结果中获取针对目标区域的第三光流,作为第一光流估计模型在模型训练时的监督信息,可以有效提升第一光流估计模型训练后的预测精度。In this embodiment, because outer-frame cropping is used, the cropped image has the outer-frame image area removed relative to the original sample image. When the acquisition time interval between the sample images is short, performing optical flow estimation on the cropped images can effectively avoid areas being misjudged as occluded merely because they move out of the image acquisition range. On this basis, when the first optical flow estimation result and the second optical flow estimation result disagree on whether an area is occluded or non-occluded, the second optical flow estimation result prevails: the third optical flow for the target area is obtained from the second optical flow estimation result and used as supervision information when training the first optical flow estimation model, which can effectively improve the prediction accuracy of the trained first optical flow estimation model.
图5是本公开一个实施例中训练第二光流估计模型的流程示意图。如图5所示,包括:Fig. 5 is a schematic flowchart of training a second optical flow estimation model in an embodiment of the present disclosure. As shown in Figure 5, including:
O:获取第三样本图像与第四样本图像间的前向光流,以及第三样本图像与第四样本图像间的后向光流。O: Obtain the forward optical flow between the third sample image and the fourth sample image, and the backward optical flow between the third sample image and the fourth sample image.
当可以获取到第三样本图像和第四样本图像的采集时间时,可以根据图像采集时间的前后关系,确定第三样本图像和第四样本图像在采集时间上的在先图像和在后图像;当无法获取到第三样本图像和第四样本图像的采集时间时,可以在第三样本图像和第四样本图像中选择一个图像作为在先图像,将剩余的图像作为在后图像。When the acquisition times of the third sample image and the fourth sample image are available, which of the two images is the earlier image and which is the later image can be determined according to the order of the acquisition times; when the acquisition times are not available, one of the third sample image and the fourth sample image may be selected as the earlier image and the other used as the later image.
计算在先图像相对于在后图像的光流,作为第三样本图像与第四样本图像间的前向光流。Calculate the optical flow of the previous image relative to the subsequent image as the forward optical flow between the third sample image and the fourth sample image.
计算在后图像相对于在先图像的光流,作为第三样本图像与第四样本图像间的后向光流。The optical flow of the subsequent image relative to the previous image is calculated as a backward optical flow between the third sample image and the fourth sample image.
P:基于第三样本图像与第四样本图像间的前向光流,和第三样本图像与第四样本图像间的后向光流进行校验,确定第三样本图像与第四样本图像间的非遮挡区域。P: Perform a consistency check based on the forward optical flow between the third sample image and the fourth sample image and the backward optical flow between them, and determine the non-occluded area between the third sample image and the fourth sample image.
基于第三样本图像与第四样本图像间的前向光流对后向光流进行仿射变换操作,计算仿射变换后的后向光流和前向光流之和。Perform a warping (affine transformation) operation on the backward optical flow based on the forward optical flow between the third sample image and the fourth sample image, and calculate the sum of the warped backward optical flow and the forward optical flow.
对于第三样本图像与第四样本图像间的某个像素位置,计算仿射变换后的后向光流和前向光流之和的绝对值。For a certain pixel position between the third sample image and the fourth sample image, the absolute value of the sum of the backward optical flow and the forward optical flow after the affine transformation is calculated.
如果该绝对值小于预设阈值,则判定该像素位置为非遮挡位置;如果该绝对值大于等于预设阈值,则判定该像素位置为遮挡位置。If the absolute value is smaller than the preset threshold, it is determined that the pixel position is a non-occlusion position; if the absolute value is greater than or equal to the preset threshold, it is determined that the pixel position is an occlusion position.
对第三样本图像与第四样本图像间的所有像素位置采用相同方式判断是否为遮挡位置,将所有像素的非遮挡位置进行区域合并,即得到第三样本图像与第四样本图像间的非遮挡区域。The same method is used to determine whether each pixel position between the third sample image and the fourth sample image is an occlusion position, and the non-occlusion positions of all pixels are merged into regions, thereby obtaining the non-occluded area between the third sample image and the fourth sample image.
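The forward-backward check described above can be sketched as follows. The nearest-neighbour warp and the threshold value are simplifying assumptions (the patent leaves the warping and threshold unspecified); flows are assumed to be (dx, dy) per pixel.

```python
import numpy as np

def occlusion_mask(fwd, bwd, thresh=1.0):
    # Warp the backward flow to frame 1 along the forward flow
    # (nearest-neighbour lookup), then flag pixels where the round trip
    # |fwd + warped_bwd| does not come back close to zero.
    h, w = fwd.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    x2 = np.clip(np.rint(xs + fwd[..., 0]).astype(int), 0, w - 1)
    y2 = np.clip(np.rint(ys + fwd[..., 1]).astype(int), 0, h - 1)
    warped_bwd = bwd[y2, x2]
    diff = np.linalg.norm(fwd + warped_bwd, axis=-1)
    return diff >= thresh       # True = judged occluded

# Perfectly consistent flows (bwd == -fwd) -> nothing is occluded.
fwd = np.ones((4, 4, 2))
bwd = -np.ones((4, 4, 2))
occ = occlusion_mask(fwd, bwd)
```

The complement of this mask, merged over all pixels, is the non-occluded area used in step P.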
Q:基于非遮挡区域在第三样本图像和第四样本图像中位置,使用光度误差损失函数,训练第二光流估计模型。Q: Based on the position of the non-occluded area in the third sample image and the fourth sample image, use the photometric error loss function to train the second optical flow estimation model.
对第三样本图像与第四样本图像间的非遮挡区域的像素位置,基于光度误差损失函数计算光度误差,再基于光度误差的计算结果约束模型训练。For the pixel position of the non-occlusion area between the third sample image and the fourth sample image, the photometric error is calculated based on the photometric error loss function, and then model training is constrained based on the photometric error calculation result.
在本公开的一个实施例中,光度误差损失函数可以采用如下公式:In one embodiment of the present disclosure, the photometric error loss function can adopt the following formula:
光度损失误差=Lp(Ii,Ij)Photometric loss error = Lp(Ii, Ij)

Lp(Ii,Ij) = α·(1 − SSIM(Ii,Ij))/2 + (1 − α)·|Ii − Ij|
其中,Lp为光度损失系数,SSIM(Ii,Ij)表示样本图像Ii与样本图像Ij间的结构相似参数,α表示权重,且α为常数。Among them, Lp is the photometric loss coefficient, SSIM(Ii, Ij) represents the structural similarity parameter between the sample image Ii and the sample image Ij, α represents the weight, and α is a constant.
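A runnable sketch of an α-weighted SSIM-plus-L1 photometric error, a common concrete form consistent with the symbols Lp, SSIM(Ii, Ij), and α defined here, follows. The whole-image SSIM is a simplification (the usual definition averages local windows), the constant values are assumptions, and the images are treated as single-channel arrays in [0, 1].

```python
import numpy as np

def ssim_global(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    # Whole-image SSIM computed from global statistics; enough for a
    # sketch, though the standard version averages over local windows.
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def photometric_loss(ii, ij, alpha=0.85):
    # alpha-weighted mix of an SSIM term and an L1 term; in training this
    # would be evaluated only on the non-occluded pixels.
    ssim_term = (1.0 - ssim_global(ii, ij)) / 2.0
    l1_term = np.abs(ii - ij).mean()
    return alpha * ssim_term + (1.0 - alpha) * l1_term

img = np.random.default_rng(0).random((8, 8))
loss_same = photometric_loss(img, img)          # identical images -> ~0
loss_diff = photometric_loss(img, np.clip(img + 0.2, 0.0, 1.0))
```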
需要说明的是,训练第二光流估计模型时需要多个样本图像对,对于每个样本图像对,均按照与步骤O至P的相同方式获取对应的非遮挡区域的光流。It should be noted that when training the second optical flow estimation model, multiple sample image pairs are required, and for each sample image pair, the optical flow of the corresponding non-occluded area is acquired in the same manner as steps O to P.
在训练第二光流估计模型时,从所有样本图像对中获取部分样本图像对和与之对应的非遮挡区域的光流组成训练集,将剩余的样本图像对和与之对应的非遮挡区域的光流组成验证集,用训练集进行训练,用验证集进行验证。当第二光流估计模型的训练满足预设终止条件(例如模型迭代次数达到预定迭代次数,或者超过预设模型预测精度阈值)时,终止第二光流估计模型的训练,得到第二光流估计模型。When training the second optical flow estimation model, some sample image pairs and the optical flows of their corresponding non-occluded areas are taken from all sample image pairs to form a training set, and the remaining sample image pairs and the optical flows of their corresponding non-occluded areas form a validation set; the training set is used for training and the validation set for validation. When the training of the second optical flow estimation model meets a preset termination condition (for example, the number of model iterations reaches a predetermined number, or a preset model prediction accuracy threshold is exceeded), the training is terminated and the second optical flow estimation model is obtained.
在本实施例中,通过训练针对非遮挡区域的第二光流估计模型,可以使得第二光流估计模型在针对非遮挡区域进行光流预测时,具有极高的预测精度,从而可以提升第三光流值的准确性,进而可以提升基于第三光流值约束第一光流估计模型进行训练后的模型的预测精度。In this embodiment, by training the second optical flow estimation model for non-occluded areas, the second optical flow estimation model achieves very high prediction accuracy when predicting optical flow for non-occluded areas. This improves the accuracy of the third optical flow and, in turn, the prediction accuracy of the first optical flow estimation model trained under the constraint of the third optical flow.
本公开实施例提供的任一种光流估计模型的训练方法可以由任意适当的具有数据处理能力的设备执行,包括但不限于:终端设备和服务器等。或者,本公开实施例提供的任一种光流估计模型的训练方法可以由处理器执行,如处理器通过调用存储器存储的相应指令来执行本公开实施例提及的任一种光流估计模型的训练方法。下文不再赘述。Any training method for an optical flow estimation model provided by the embodiments of the present disclosure may be executed by any appropriate device with data processing capability, including but not limited to terminal devices and servers. Alternatively, any training method for an optical flow estimation model provided by the embodiments of the present disclosure may be executed by a processor, for example, by the processor calling corresponding instructions stored in a memory to execute any training method for an optical flow estimation model mentioned in the embodiments of the present disclosure. Details are not repeated below.
示例性装置Exemplary device
图6是本公开一个实施例的光流估计模型的训练装置的结构框图。如图6所示,光流估计模型的训练装置包括:语义分割模块100、静态区域确定模块200、映射关系确定模块300、第一光流确定模块400和约束训练模块500。Fig. 6 is a structural block diagram of an optical flow estimation model training device according to an embodiment of the present disclosure. As shown in FIG. 6 , the training device of the optical flow estimation model includes: a semantic segmentation module 100 , a static region determination module 200 , a mapping relation determination module 300 , a first optical flow determination module 400 and a constraint training module 500 .
其中,语义分割模块100用于对第一样本图像和第二样本图像进行语义分割,分别得到第一语义分割结果和第二语义分割结果;静态区域确定模块200用于基于所述第一语义分割结果和所述第二语义分割结果,确定所述第一样本图像中的第一静态区域和所述第二样本图像中的第二静态区域;映射关系确定模块300用于基于所述第一样本图像与所述第二样本图像的帧间姿态信息、以及所述第一样本图像的点云数据,确定所述第一样本图像与所述第二样本图像间的像素点映射关系;第一光流确定模块400用于基于所述像素点映射关系,确定所述第一静态区域与所述第二静态区域间的第一光流;约束训练模块500用于基于所述第一光流,约束第一光流估计模型的训练。The semantic segmentation module 100 is configured to perform semantic segmentation on the first sample image and the second sample image to obtain a first semantic segmentation result and a second semantic segmentation result respectively; the static area determination module 200 is configured to determine the first static area in the first sample image and the second static area in the second sample image based on the first semantic segmentation result and the second semantic segmentation result; the mapping relationship determination module 300 is configured to determine the pixel point mapping relationship between the first sample image and the second sample image based on the inter-frame pose information of the first sample image and the second sample image and the point cloud data of the first sample image; the first optical flow determination module 400 is configured to determine the first optical flow between the first static area and the second static area based on the pixel point mapping relationship; the constraint training module 500 is configured to constrain the training of the first optical flow estimation model based on the first optical flow.
图7是本公开一个实施例的映射关系确定模块300的结构框图。如图7所示,映射关系确定模块300包括:Fig. 7 is a structural block diagram of a mapping relationship determining module 300 according to an embodiment of the present disclosure. As shown in Figure 7, the mapping relationship determination module 300 includes:
投影单元301,用于基于所述第一样本图像与所述第二样本图像的帧间姿态信息,将所述第一样本图像的点云数据分别投影至所述第一样本图像和所述第二样本图像;The projection unit 301 is configured to project the point cloud data of the first sample image onto the first sample image and the second sample image respectively, based on the inter-frame pose information of the first sample image and the second sample image;
映射关系确定单元302,用于基于所述第一样本图像的点云数据在所述第一样本图像上的投影点位置,以及所述第一样本图像的点云数据在所述第二样本图像上的投影点位置,确定所述像素点映射关系。The mapping relationship determination unit 302 is configured to determine the pixel point mapping relationship based on the projection point positions of the point cloud data of the first sample image on the first sample image and on the second sample image.
图8是本公开另一个实施例中光流估计模型的训练装置的结构框图。如图8所示,光流估计模型的训练装置还包括:Fig. 8 is a structural block diagram of a training device for an optical flow estimation model in another embodiment of the present disclosure. As shown in Figure 8, the training device of the optical flow estimation model also includes:
噪声区域确定模块600,用于基于所述第一语义分割结果,确定第一目标噪声区域,并基于所述第二语义分割结果,确定第二目标噪声区域;A noise region determination module 600, configured to determine a first target noise region based on the first semantic segmentation result, and determine a second target noise region based on the second semantic segmentation result;
第二光流确定模块700,用于将所述第一目标噪声区域和所述第二目标噪声区域间的第二光流设为0;The second optical flow determination module 700 is configured to set the second optical flow between the first target noise area and the second target noise area to 0;
其中,所述约束训练模块500具体用于基于所述第一光流和所述第二光流,约束所述第一光流估计模型的训练。Wherein, the constraint training module 500 is specifically configured to constrain the training of the first optical flow estimation model based on the first optical flow and the second optical flow.
如图8所示,光流估计模型的训练装置还包括:As shown in Figure 8, the training device of the optical flow estimation model also includes:
裁剪模块800,用于按照预设的裁剪方式,分别对所述第一样本图像和所述第二样本图像进行裁剪,得到第一裁剪图像和第二裁剪图像;A cropping module 800, configured to respectively crop the first sample image and the second sample image according to a preset cropping method to obtain a first cropped image and a second cropped image;
第一光流估计模块900,用于基于所述第一光流估计模型对所述第一裁剪图像和所述第二裁剪图像进行处理,得到第一光流估计结果;The first optical flow estimation module 900 is configured to process the first cropped image and the second cropped image based on the first optical flow estimation model to obtain a first optical flow estimation result;
第二光流估计模块1000,用于利用预训练的第二光流估计模型对所述第一样本图像和所述第二样本图像进行处理,得到所述第一样本图像与所述第二样本图像间的第二光流估计结果;The second optical flow estimation module 1000 is configured to process the first sample image and the second sample image using the pre-trained second optical flow estimation model to obtain the second optical flow estimation result between the first sample image and the second sample image;
第三光流确定模块1100,用于基于所述第一光流估计结果和所述第二光流估计结果,确定第三光流;A third optical flow determination module 1100, configured to determine a third optical flow based on the first optical flow estimation result and the second optical flow estimation result;
其中,所述约束训练模块500具体用于基于所述第一光流、所述第二光流和第三光流,约束所述第一光流估计模型的训练。Wherein, the constraint training module 500 is specifically configured to constrain the training of the first optical flow estimation model based on the first optical flow, the second optical flow and the third optical flow.
图9是本公开一个实施例中第三光流确定模块1100的结构框图。如图9所示,第三光流确定模块1100包括:Fig. 9 is a structural block diagram of a third optical flow determination module 1100 in an embodiment of the present disclosure. As shown in Figure 9, the third optical flow determination module 1100 includes:
遮挡区域确定单元1101,用于基于所述第一光流估计结果,确定所述第一裁剪图像与所述第二裁剪图像间的遮挡区域;An occlusion area determination unit 1101, configured to determine an occlusion area between the first cropped image and the second cropped image based on the first optical flow estimation result;
非遮挡区域确定单元1102,用于基于所述第二光流估计结果,确定所述第一样本图像和所述第二样本图像间的非遮挡区域;A non-occlusion area determination unit 1102, configured to determine a non-occlusion area between the first sample image and the second sample image based on the second optical flow estimation result;
目标区域确定单元1103,用于基于所述第一裁剪图像与所述第二裁剪图像间的遮挡区域,以及所述第一样本图像和所述第二样本图像间的非遮挡区域,确定所述第一样本图像与所述第二样本图像间的目标区域,其中,所述目标区域为在所述第一裁剪图像与所述第二裁剪图像间为遮挡区域,且在所述第一样本图像与所述第二样本图像间为非遮挡区域的区域;The target area determination unit 1103 is configured to determine the target area between the first sample image and the second sample image based on the occlusion area between the first cropped image and the second cropped image and the non-occluded area between the first sample image and the second sample image, where the target area is the area that is an occlusion area between the first cropped image and the second cropped image, and a non-occluded area between the first sample image and the second sample image;
第三光流确定单元1104,用于将所述第二光流估计结果中,所述第一样本图像与所述第二样本图像间所述目标区域的光流确定为所述第三光流。The third optical flow determination unit 1104 is configured to determine, in the second optical flow estimation result, the optical flow of the target area between the first sample image and the second sample image as the third optical flow.
在本公开的一个实施例中,所述预设的裁剪方式为以去除外边框的方式裁剪;其中,所述第一裁剪图像相对于所述第一样本图像去除了预设外边框尺寸的图像范围,所述第二裁剪图像相对于所述第二样本图像去除了所述预设外边框尺寸的图像范围。In one embodiment of the present disclosure, the preset cropping method is cropping by removing an outer frame; the first cropped image has the image range of a preset outer frame size removed relative to the first sample image, and the second cropped image has the image range of the preset outer frame size removed relative to the second sample image.
如图8所示,光流估计模型的训练装置还包括:As shown in Figure 8, the training device of the optical flow estimation model also includes:
光流获取模块1200,用于获取第三样本图像与第四样本图像间的前向光流,以及所述第三样本图像与所述第四样本图像间的后向光流;An optical flow acquisition module 1200, configured to acquire the forward optical flow between the third sample image and the fourth sample image, and the backward optical flow between the third sample image and the fourth sample image;
非遮挡区域确定模块1300,用于基于所述前向光流和所述后向光流进行校验,确定所述第三样本图像与所述第四样本图像间的非遮挡区域;A non-occlusion area determination module 1300, configured to perform verification based on the forward optical flow and the backward optical flow, and determine a non-occlusion area between the third sample image and the fourth sample image;
第二光流估计模型训练模块1400,用于基于所述非遮挡区域在所述第三样本图像和所述第四样本图像中位置,使用光度误差损失函数,训练所述第二光流估计模型。The second optical flow estimation model training module 1400 is configured to train the second optical flow estimation model based on the position of the non-occluded area in the third sample image and the fourth sample image, using a photometric error loss function .
It should be noted that the specific implementation of the training apparatus for the optical flow estimation model in the embodiments of the present disclosure is similar to that of the training method for the optical flow estimation model in the embodiments of the present disclosure; for details, refer to the section on the training method. To reduce redundancy, it is not repeated here.
Exemplary Electronic Device
An electronic device according to an embodiment of the present disclosure is described below with reference to FIG. 10. As shown in FIG. 10, the electronic device includes one or more processors 10 and a memory 20.
The processor 10 may be a central processing unit (CPU) or another form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
The memory 20 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, or flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 10 may execute the program instructions to implement the training method for the optical flow estimation model of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as input signals, signal components, and noise components may also be stored in the computer-readable storage medium.
In one example, the electronic device may further include an input device 30 and an output device 40, these components being interconnected through a bus system and/or other forms of connection mechanisms (not shown). The input device 30 may be, for example, a keyboard or a mouse. The output device 40 may include, for example, a display, a speaker, a printer, a communication network, and remote output devices connected thereto.
Of course, for simplicity, FIG. 10 shows only some of the components of the electronic device that are relevant to the present disclosure, omitting components such as buses and input/output interfaces. In addition, the electronic device may include any other suitable components depending on the specific application.
Exemplary Computer-Readable Storage Medium
The computer-readable storage medium may be any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present disclosure have been described above in conjunction with specific embodiments. However, it should be pointed out that the advantages, effects, and the like mentioned in the present disclosure are merely examples rather than limitations, and should not be regarded as essential to each embodiment of the present disclosure. In addition, the specific details disclosed above are provided only for illustration and ease of understanding, not limitation; the present disclosure is not required to be implemented using the above specific details.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments may be referred to one another. Since the system embodiments substantially correspond to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
The block diagrams of the devices, apparatuses, equipment, and systems involved in the present disclosure are merely illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must follow the block diagrams. As those skilled in the art will recognize, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including", "comprising", and "having" are open-ended, mean "including but not limited to", and may be used interchangeably therewith. The words "or" and "and" as used herein refer to "and/or" and may be used interchangeably therewith, unless the context clearly indicates otherwise. The word "such as" as used herein refers to the phrase "such as but not limited to" and may be used interchangeably therewith.
The methods and apparatuses of the present disclosure may be implemented in many ways, for example, by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of steps for the methods is for illustration only; the steps of the methods of the present disclosure are not limited to the order specifically described above unless otherwise specified. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the methods according to the present disclosure.
It should also be pointed out that, in the apparatuses, devices, and methods of the present disclosure, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations shall be regarded as equivalent solutions of the present disclosure.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present disclosure. Therefore, the present disclosure is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, it is not intended to limit the embodiments of the present disclosure to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims (10)

  1. A training method for an optical flow estimation model, comprising:
    performing semantic segmentation on a first sample image and a second sample image to obtain a first semantic segmentation result and a second semantic segmentation result, respectively;
    determining a first static region in the first sample image and a second static region in the second sample image based on the first semantic segmentation result and the second semantic segmentation result;
    determining a pixel mapping relationship between the first sample image and the second sample image based on inter-frame pose information between the first sample image and the second sample image and point cloud data of the first sample image;
    determining a first optical flow between the first static region and the second static region based on the pixel mapping relationship;
    constraining training of a first optical flow estimation model based on the first optical flow.
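The constraint in the last step of claim 1 can be pictured as a masked supervision term: the geometry-derived first optical flow supervises the model's prediction only inside the static regions. A minimal sketch under assumed names and an assumed endpoint-error loss form (the claim does not fix a particular loss):

```python
import numpy as np

def static_region_flow_loss(pred_flow, first_flow, static_mask):
    """Mean endpoint error between predicted and geometry-derived flow,
    counted only where static_mask is True."""
    err = np.linalg.norm(pred_flow - first_flow, axis=-1)  # (H, W) per-pixel error
    denom = max(static_mask.sum(), 1)                      # avoid division by zero
    return float((err * static_mask).sum() / denom)
```

In training, this term would be added to the usual self-supervised objective, so that static scene content with known camera motion anchors the model's predictions.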
  2. The training method for an optical flow estimation model according to claim 1, wherein determining the pixel mapping relationship between the first sample image and the second sample image based on the inter-frame pose information between the first sample image and the second sample image and the point cloud data of the first sample image comprises:
    projecting the point cloud data of the first sample image onto the first sample image and the second sample image respectively, based on the inter-frame pose information between the first sample image and the second sample image;
    determining the pixel mapping relationship based on the projection point positions of the point cloud data of the first sample image on the first sample image and the projection point positions of the point cloud data of the first sample image on the second sample image.
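The projection in claim 2 follows the usual pinhole-camera geometry: the same 3-D points are projected into both frames, and the displacement between the two projections gives the pixel mapping, and hence a per-point flow. A hedged sketch, with the intrinsics `K` and the inter-frame pose `(R, t)` assumed known and the helper name my own:

```python
import numpy as np

def flow_from_pointcloud(points_cam1, K, R, t):
    """Project (N, 3) points, expressed in the first camera's frame, into both
    images and return the frame-1 projections and the per-point flow."""
    def project(P):
        uv = (K @ P.T).T                  # (N, 3) homogeneous image points
        return uv[:, :2] / uv[:, 2:3]     # perspective divide
    p1 = project(points_cam1)
    points_cam2 = (R @ points_cam1.T).T + t   # apply inter-frame pose
    p2 = project(points_cam2)
    return p1, p2 - p1                    # projections in frame 1 and flow
```

Rounding `p1` to pixel coordinates and storing `p2 - p1` there yields the sparse pixel mapping used to supervise the static regions.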
  3. The training method for an optical flow estimation model according to claim 1 or 2, wherein before constraining the training of the first optical flow estimation model based on the first optical flow, the method further comprises:
    determining a first target noise region based on the first semantic segmentation result;
    determining a second target noise region based on the second semantic segmentation result;
    setting a second optical flow between the first target noise region and the second target noise region to 0;
    wherein constraining the training of the first optical flow estimation model based on the first optical flow comprises:
    constraining the training of the first optical flow estimation model based on the first optical flow and the second optical flow.
  4. The training method for an optical flow estimation model according to claim 3, wherein before constraining the training of the first optical flow estimation model based on the first optical flow and the second optical flow, the method further comprises:
    cropping the first sample image and the second sample image respectively according to a preset cropping method to obtain a first cropped image and a second cropped image;
    processing the first cropped image and the second cropped image with the first optical flow estimation model to obtain a first optical flow estimation result;
    processing the first sample image and the second sample image with a pre-trained second optical flow estimation model to obtain a second optical flow estimation result between the first sample image and the second sample image;
    determining a third optical flow based on the first optical flow estimation result and the second optical flow estimation result;
    wherein constraining the training of the first optical flow estimation model based on the first optical flow and the second optical flow comprises:
    constraining the training of the first optical flow estimation model based on the first optical flow, the second optical flow, and the third optical flow.
  5. The training method for an optical flow estimation model according to claim 4, wherein determining the third optical flow based on the first optical flow estimation result and the second optical flow estimation result comprises:
    determining an occluded region between the first cropped image and the second cropped image based on the first optical flow estimation result;
    determining a non-occluded region between the first sample image and the second sample image based on the second optical flow estimation result;
    determining a target region between the first sample image and the second sample image based on the occluded region between the first cropped image and the second cropped image and the non-occluded region between the first sample image and the second sample image, wherein the target region is the region that is an occluded region between the first cropped image and the second cropped image and a non-occluded region between the first sample image and the second sample image;
    determining, from the second optical flow estimation result, the optical flow of the target region between the first sample image and the second sample image as the third optical flow.
  6. The training method for an optical flow estimation model according to claim 4, wherein the preset cropping method is cropping by removing an outer border.
  7. The training method for an optical flow estimation model according to claim 4, wherein the second optical flow estimation model is obtained by:
    acquiring a forward optical flow between a third sample image and a fourth sample image, and a backward optical flow between the third sample image and the fourth sample image;
    performing a consistency check based on the forward optical flow and the backward optical flow to determine a non-occluded region between the third sample image and the fourth sample image;
    training the second optical flow estimation model using a photometric error loss function, based on the positions of the non-occluded region in the third sample image and the fourth sample image.
  8. A training apparatus for an optical flow estimation model, comprising:
    a semantic segmentation module, configured to perform semantic segmentation on a first sample image and a second sample image to obtain a first semantic segmentation result and a second semantic segmentation result, respectively;
    a static region determination module, configured to determine a first static region in the first sample image and a second static region in the second sample image based on the first semantic segmentation result and the second semantic segmentation result;
    a mapping relationship determination module, configured to determine a pixel mapping relationship between the first sample image and the second sample image based on inter-frame pose information between the first sample image and the second sample image and point cloud data of the first sample image;
    a first optical flow determination module, configured to determine a first optical flow between the first static region and the second static region based on the pixel mapping relationship;
    a constrained training module, configured to constrain training of a first optical flow estimation model based on the first optical flow.
  9. A computer-readable storage medium, the storage medium storing a computer program, wherein the computer program is used to execute the training method for an optical flow estimation model according to any one of claims 1-7.
  10. An electronic device, comprising:
    a processor;
    a memory for storing instructions executable by the processor;
    the processor being configured to read the executable instructions from the memory and execute the instructions to implement the training method for an optical flow estimation model according to any one of claims 1-7.
PCT/CN2022/123230 2021-12-21 2022-09-30 Training method and apparatus for optical flow estimation model WO2023116117A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111572711.2A CN114239736A (en) 2021-12-21 2021-12-21 Method and device for training optical flow estimation model
CN202111572711.2 2021-12-21

Publications (1)

Publication Number Publication Date
WO2023116117A1 true WO2023116117A1 (en) 2023-06-29

Family

ID=80760431

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/123230 WO2023116117A1 (en) 2021-12-21 2022-09-30 Training method and apparatus for optical flow estimation model

Country Status (2)

Country Link
CN (1) CN114239736A (en)
WO (1) WO2023116117A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114239736A (en) * 2021-12-21 2022-03-25 北京地平线信息技术有限公司 Method and device for training optical flow estimation model
CN114972425A (en) * 2022-05-18 2022-08-30 北京地平线机器人技术研发有限公司 Training method of motion state estimation model, motion state estimation method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190057509A1 (en) * 2017-08-16 2019-02-21 Nvidia Corporation Learning rigidity of dynamic scenes for three-dimensional scene flow estimation
CN110060264A (en) * 2019-04-30 2019-07-26 北京市商汤科技开发有限公司 Neural network training method, video frame processing method, apparatus and system
CN110910447A (en) * 2019-10-31 2020-03-24 北京工业大学 Visual odometer method based on dynamic and static scene separation
CN111581313A (en) * 2020-04-25 2020-08-25 华南理工大学 Semantic SLAM robustness improvement method based on instance segmentation
CN112381868A (en) * 2020-11-13 2021-02-19 北京地平线信息技术有限公司 Image depth estimation method and device, readable storage medium and electronic equipment
CN113570713A (en) * 2021-07-05 2021-10-29 北京科技大学 Semantic map construction method and device for dynamic environment
CN113762173A (en) * 2021-09-09 2021-12-07 北京地平线信息技术有限公司 Training method and device for human face light stream estimation and light stream value prediction model
CN114239736A (en) * 2021-12-21 2022-03-25 北京地平线信息技术有限公司 Method and device for training optical flow estimation model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wulff, Jonas; Sevilla-Lara, Laura; Black, Michael J.: "Optical Flow in Mostly Rigid Scenes", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21 July 2017, pages 6911-6920, XP033250057, ISSN: 1063-6919, DOI: 10.1109/CVPR.2017.731 *

Also Published As

Publication number Publication date
CN114239736A (en) 2022-03-25


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE