WO2023116117A1 - Training method and apparatus for optical flow estimation model - Google Patents

Training method and apparatus for optical flow estimation model Download PDF

Info

Publication number
WO2023116117A1
WO2023116117A1 (PCT/CN2022/123230)
Authority
WO
WIPO (PCT)
Prior art keywords
optical flow
sample image
flow estimation
estimation model
image
Prior art date
Application number
PCT/CN2022/123230
Other languages
French (fr)
Chinese (zh)
Inventor
于雷
隋伟
张骞
黄畅
Original Assignee
北京地平线信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京地平线信息技术有限公司
Publication of WO2023116117A1 publication Critical patent/WO2023116117A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/11: Region-based segmentation
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20112: Image segmentation details
    • G06T 2207/20132: Image cropping

Definitions

  • The present disclosure relates to the technical fields of image processing and artificial intelligence (AI), and in particular to a training method and apparatus for an optical flow estimation model.
  • Dense optical flow estimation calculates the offset of every point on an image to form a dense optical flow field; based on the dense optical flow field, pixel-level image registration can then be performed.
  • Dense optical flow estimation has a wide range of applications in the fields of autonomous driving and autonomous robots. In recent years, with the development of deep learning technology, dense optical flow estimation based on deep learning has achieved good results. Supervised optical flow estimation methods based on deep learning usually require a large number of labels for model training, but optical flow labels for real scenes are very difficult to obtain, and models trained on virtual data often generalize poorly to real scenes.
  • Therefore, self-supervised methods are generally used for model training, in which a photometric error loss function is used to train the optical flow estimation model. However, the accuracy of this self-supervised model training method is low.
  • Embodiments of the present disclosure provide a method and device for training an optical flow estimation model.
  • a method for training an optical flow estimation model including:
  • Semantic segmentation is performed on the first sample image and the second sample image to obtain a first semantic segmentation result and a second semantic segmentation result respectively;
  • a training device for an optical flow estimation model including:
  • a semantic segmentation module configured to perform semantic segmentation on the first sample image and the second sample image, to obtain the first semantic segmentation result and the second semantic segmentation result respectively;
  • a static area determining module configured to determine a first static area in the first sample image and a second static area in the second sample image based on the first semantic segmentation result and the second semantic segmentation result;
  • a mapping relationship determination module configured to determine the pixel point mapping relationship between the first sample image and the second sample image based on the inter-frame pose information of the first sample image and the second sample image and the point cloud data of the first sample image;
  • a first optical flow determination module configured to determine a first optical flow between the first static area and the second static area based on the pixel point mapping relationship
  • the constraint training module is configured to constrain the training of the first optical flow estimation model based on the first optical flow.
  • a computer-readable storage medium storing a computer program, the computer program being used to execute the optical flow estimation model training method described in the first aspect above.
  • an electronic device includes:
  • the processor is configured to read the executable instructions from the memory, and execute the instructions to implement the optical flow estimation model training method described in the first aspect above.
  • In the embodiments of the present disclosure, semantic segmentation is performed on the first sample image and the second sample image to obtain the first semantic segmentation result and the second semantic segmentation result respectively; the first static region in the first sample image and the second static region in the second sample image are determined based on these results; the pixel point mapping relationship between the first sample image and the second sample image is determined based on their inter-frame pose information and the point cloud data of the first sample image; the first optical flow between the first static area and the second static area is determined based on the pixel point mapping relationship; and the training of the first optical flow estimation model is constrained based on the first optical flow.
  • Since the first optical flow is obtained from point cloud data, it is a true value of optical flow, and this true value is used as supervision information during model training, so that the trained first optical flow estimation model significantly outperforms self-supervised methods in accuracy.
  • In self-supervised training, spurious optical flow is generated by the movement of shadows during the optical flow estimation process. Since true optical flow values are available for the image areas covered by shadows here, the influence of shadows on the optical flow estimation model can be reduced and the model guided to learn correctly.
  • FIG. 1 is a schematic flowchart of a training method of an optical flow estimation model according to an embodiment of the present disclosure
  • FIG. 2 is a schematic flow diagram of step S3 in an embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart of determining a third optical flow in an embodiment of the present disclosure
  • FIG. 4 is a schematic flow diagram of step D in an embodiment of the present disclosure.
  • FIG. 5 is a schematic flow diagram of training a second optical flow estimation model in an embodiment of the present disclosure
  • FIG. 6 is a structural block diagram of a training device for an optical flow estimation model according to an embodiment of the present disclosure
  • FIG. 7 is a structural block diagram of a mapping relationship determination module 300 according to an embodiment of the present disclosure.
  • Fig. 8 is a structural block diagram of a training device for an optical flow estimation model in another embodiment of the present disclosure.
  • FIG. 9 is a structural block diagram of a third optical flow determination module 1100 in an embodiment of the present disclosure.
  • Fig. 10 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
  • "plural" may refer to two or more than two, and "at least one" may refer to one, two, or more than two.
  • The term "and/or" in the present disclosure merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, A and B exist at the same time, or B exists alone.
  • the character "/" in the present disclosure generally indicates that the contextual objects are an "or" relationship.
  • Embodiments of the present disclosure may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the foregoing.
  • Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by the computer system.
  • program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computing system storage media including storage devices.
  • FIG. 1 is a schematic flowchart of a training method for an optical flow estimation model according to an embodiment of the present disclosure. This embodiment can be applied to electronic equipment, as shown in Figure 1, including the following steps:
  • S1 Perform semantic segmentation on the first sample image and the second sample image to obtain a first semantic segmentation result and a second semantic segmentation result respectively.
  • For example, the vehicle-mounted camera device captures video of the scene in front of the vehicle, and the first sample image and the second sample image, at an interval of N frames, are obtained from the video collected by the vehicle-mounted camera device.
  • N is an integer greater than or equal to 1.
  • Alternatively, the vehicle-mounted camera device captures a frame of image at preset time intervals, and the first sample image and the second sample image are obtained from the images captured by the vehicle-mounted camera device.
  • After obtaining the first sample image and the second sample image, a pre-trained semantic segmentation model is used to perform semantic segmentation on the first sample image to obtain the first semantic segmentation result, and on the second sample image to obtain the second semantic segmentation result.
  • the first semantic segmentation result may include sky area, road area, pedestrian area, vehicle area and other image areas appearing in the first sample image.
  • the second semantic segmentation result may include sky area, road area, pedestrian area, vehicle area and other image areas appearing in the second sample image.
  • S2 Determine the first static area in the first sample image according to the first semantic segmentation result, for example, static areas such as the road surface area, mountain peak area, and roadside fixed objects (such as utility poles and traffic lights) in the first sample image.
  • Similarly, determine the second static area in the second sample image according to the second semantic segmentation result, such as static areas of the road surface area, mountain peak area, and roadside fixed objects (such as utility poles and traffic lights) in the second sample image.
  • S3 Determine the pixel point mapping relationship between the first sample image and the second sample image based on the pose information between the frames of the first sample image and the second sample image and the point cloud data of the first sample image.
  • The point cloud data of the first sample image has a mapping relationship with the pixel points of the target object in the first sample image. Based on the inter-frame pose information of the first sample image and the second sample image, the pixel displacement of the target object between the two images can be calculated. Therefore, based on the inter-frame pose information and the point cloud data of the first sample image, the pixel point mapping relationship between the first sample image and the second sample image can be determined.
  • S4 Determine a first optical flow between the first static area and the second static area based on the pixel point mapping relationship.
  • For example, a static sub-area is extracted from the first static area (such as the area of utility pole A), and the corresponding static sub-area (i.e., the area of utility pole A) is extracted from the second static area; combined with the pixel point mapping relationship between the first sample image and the second sample image, the true value of the optical flow of this static sub-region between the two images can be obtained. The true value of the optical flow is determined in the same manner for all static sub-regions of the first static region and the corresponding static sub-regions of the second static region.
  • the first sample image and the second sample image are combined into a sample image pair, and the first optical flow is used as supervision information when the first optical flow estimation model is trained based on the sample image pair. It should be noted that when training the first optical flow estimation model, multiple sample image pairs are required, and for each sample image pair, the corresponding first optical flow is obtained in the same manner as steps S1 to S4.
  • the existing optical flow estimation model training method can be used for training.
  • When the training of the first optical flow estimation model meets a preset termination condition (for example, the number of model iterations reaches a predetermined number, or the model prediction accuracy exceeds a preset threshold), the training is terminated to obtain the trained first optical flow estimation model.
  • step S3 includes:
  • S3-1 Project the point cloud data of the first sample image to the first sample image and the second sample image respectively based on the inter-frame pose information of the first sample image and the second sample image.
  • The pose information of the camera device when shooting the first sample image and when shooting the second sample image is obtained through a pose sensor (such as a gyroscope), from which the inter-frame pose information of the first sample image and the second sample image is obtained.
  • the point cloud data of the first sample image can be projected onto the first sample image.
  • Based on the inter-frame pose information, the displacement relationship of pixels between the first sample image and the second sample image can be obtained, and the point cloud data of the first sample image can then be projected onto the second sample image.
  • S3-2 Based on the projection point positions of the point cloud data of the first sample image on the first sample image and on the second sample image, determine the pixel point mapping relationship between the first sample image and the second sample image.
  • That is, the mapping of the target object's projection points between the first sample image and the second sample image is obtained, and by calculating the position correspondence of these projection points, the pixel point mapping relationship between the two images is determined. In this way, by projecting the point cloud data of the first sample image onto the first sample image and the second sample image respectively, and using the position correspondence of the projected points between the two images, the pixel point mapping relationship between the first sample image and the second sample image can be accurately obtained.
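The projection in steps S3-1 and S3-2 can be sketched as below. This is a minimal illustration, assuming a pinhole camera model with intrinsics `K` and an inter-frame pose given as rotation `R` and translation `t`; the function names and the camera model are illustrative assumptions, not taken from the patent text.

```python
import numpy as np

def project_points(points_cam, K):
    """Project 3D points in camera coordinates onto the image plane (pinhole model)."""
    uvw = (K @ points_cam.T).T          # (N, 3) homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]     # (N, 2) pixel coordinates

def static_flow_from_pose(points_cam1, K, R, t):
    """Project the first frame's point cloud into both frames using the
    inter-frame pose (R, t); the difference of the projection positions is
    the ground-truth optical flow of static points."""
    uv1 = project_points(points_cam1, K)      # projections on the first sample image
    points_cam2 = (R @ points_cam1.T).T + t   # transform points into the second frame
    uv2 = project_points(points_cam2, K)      # projections on the second sample image
    return uv2 - uv1                          # per-point optical flow (true value)

# usage: identity rotation and a pure sideways camera translation
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
pts = np.array([[0., 0., 10.], [1., 1., 5.]])   # two static 3D points
flow = static_flow_from_pose(pts, K, np.eye(3), np.array([0.1, 0., 0.]))
```

Note how the nearer point (depth 5) receives a larger flow than the farther point (depth 10) for the same camera motion, which is exactly the parallax that the point cloud makes recoverable.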
  • In an embodiment, before step S5 (preferably before step S2), the method further includes: determining a first target noise region based on the first semantic segmentation result; determining a second target noise region based on the second semantic segmentation result; and setting the second optical flow between the first target noise area and the second target noise area to 0.
  • step S5 includes: constraining the training of the first optical flow estimation model based on the first optical flow and the second optical flow.
  • The first target noise area may be the sky area in the first sample image, and the second target noise area may be the sky area in the second sample image. Since the optical flow of the sky is meaningless in practical applications, setting the optical flow of the sky area to 0 when training the first optical flow estimation model reduces the influence of image noise on the model and produces clearer boundaries and better visualization.
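The noise-region step above amounts to masking the supervision flow with the segmentation outputs. A minimal sketch, assuming a hypothetical class id `SKY` for the sky label (the actual label set of the segmentation model is not specified in the patent):

```python
import numpy as np

SKY = 2  # hypothetical class id for "sky" in the segmentation output

def zero_noise_region_flow(flow, seg1, seg2, noise_class=SKY):
    """Set the supervision (second) optical flow to 0 wherever both frames'
    semantic segmentations mark the noise class (e.g. the sky area)."""
    noise_mask = (seg1 == noise_class) & (seg2 == noise_class)
    out = flow.copy()
    out[noise_mask] = 0.0   # zero both flow components at noise positions
    return out, noise_mask

# usage: a 2x2 image where only the top-left pixel is sky in both frames
seg1 = np.array([[2, 2], [0, 1]])
seg2 = np.array([[2, 0], [0, 1]])
flow = np.ones((2, 2, 2))
flow2, mask = zero_noise_region_flow(flow, seg1, seg2)
```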
  • Fig. 3 is a schematic flowchart of determining a third optical flow in an embodiment of the present disclosure. As shown in Figure 3, before step S5, it also includes:
  • According to a preset cropping method, the first sample image and the second sample image are respectively cropped to obtain a first cropped image and a second cropped image.
  • The first cropped image and the second cropped image have the same size, and the position area of the first cropped image within the first sample image corresponds to the position area of the second cropped image within the second sample image.
  • an initial first optical flow estimation model may be used to perform optical flow estimation on the first cropped image and the second cropped image to obtain a first optical flow estimation result.
  • step S5 includes: constraining the training of the first optical flow estimation model based on the first optical flow, the second optical flow and the third optical flow.
  • The first optical flow estimation result is obtained by using the first optical flow estimation model to predict the cropped image pair, while the second optical flow estimation result is obtained by processing the original sample image pair with the pre-trained second optical flow estimation model. Comparing the first optical flow estimation result with the second optical flow estimation result determines the third optical flow, which serves as supervision information for the first optical flow estimation model; constraining the first optical flow estimation model based on the third optical flow can effectively improve its prediction accuracy after training.
  • step D includes:
  • D-1 Based on the first optical flow estimation result, determine an occlusion area between the first cropped image and the second cropped image.
  • The forward optical flow and backward optical flow of the first cropped image and the second cropped image are obtained from the first optical flow estimation result, and verification is performed based on the forward optical flow and backward optical flow, so as to determine the occluded area between the first cropped image and the second cropped image.
  • D-2 Determine a non-occluded area between the first sample image and the second sample image based on the second optical flow estimation result.
  • The forward optical flow and backward optical flow of the first sample image and the second sample image are obtained from the second optical flow estimation result, and verification is performed based on the forward optical flow and backward optical flow, so as to determine the non-occluded area between the first sample image and the second sample image.
  • D-3 Determine the target area between the first sample image and the second sample image based on the occluded area between the first cropped image and the second cropped image and the non-occluded area between the first sample image and the second sample image.
  • That is, the target area is the area that is occluded between the first cropped image and the second cropped image but non-occluded between the first sample image and the second sample image.
  • D-4 Determining the optical flow of the target area between the first sample image and the second sample image in the second optical flow estimation result as the third optical flow.
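Steps D-3 and D-4 reduce to a boolean combination of the two masks followed by a masked copy of the second estimation result. A minimal sketch (mask and flow shapes are illustrative assumptions):

```python
import numpy as np

def third_flow(occluded_crop, non_occluded_full, flow_full):
    """D-3: the target area is where the cropped-pair prediction sees occlusion
    but the full-image pair does not. D-4: the third optical flow is the
    full-image (second) estimate restricted to that target area."""
    target = occluded_crop & non_occluded_full
    flow3 = np.where(target[..., None], flow_full, 0.0)  # keep flow only in target area
    return flow3, target

# usage on a 2x2 example
occ = np.array([[True, False], [True, True]])       # occluded in cropped prediction
nonocc = np.array([[True, True], [False, True]])    # non-occluded in full prediction
flow_full = np.full((2, 2, 2), 3.0)                 # second estimation result
flow3, target = third_flow(occ, nonocc, flow_full)
```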
  • The cropped image removes part of the image area relative to the original sample image. Since the first optical flow estimation result is obtained by using the as-yet-untrained first optical flow estimation model on the cropped images, while the second optical flow estimation result is obtained by using the pre-trained second optical flow estimation model on the full sample images, the second optical flow estimation result shall prevail for the target area. That is, obtaining the third optical flow for the target area from the second optical flow estimation result, and using it as supervision information for the first optical flow estimation model during model training, can effectively improve the prediction accuracy of the trained first optical flow estimation model.
  • the above-mentioned preset cropping mode is cropping by removing an outer frame. That is, the first cropped image removes the image range of the preset outer frame size relative to the first sample image, and the second cropped image removes the image range of the preset outer frame size relative to the second sample image.
  • That is, the cropped image removes an outer frame region relative to the original sample image. Performing optical flow estimation on the cropped image can thus effectively avoid pixels being misjudged as occluded merely because they move beyond the image acquisition range.
  • Accordingly, the second optical flow estimation result shall prevail: the third optical flow for the target area is obtained from the second optical flow estimation result and used as supervision information for the first optical flow estimation model during training, which effectively improves the prediction accuracy of the trained first optical flow estimation model.
  • FIG. 5 is a schematic flowchart of training a second optical flow estimation model in an embodiment of the present disclosure. As shown in Figure 5, including:
  • The previous image and the subsequent image between the third sample image and the fourth sample image can be determined according to their acquisition times; alternatively, one of the third sample image and the fourth sample image may be selected as the previous image and the other used as the subsequent image.
  • the optical flow of the subsequent image relative to the previous image is calculated as a backward optical flow between the third sample image and the fourth sample image.
  • P Verify based on the forward optical flow between the third sample image and the fourth sample image and the backward optical flow between the third sample image and the fourth sample image, and determine the non-occluded area between the third sample image and the fourth sample image.
  • Specifically, an affine transformation operation is performed on the backward optical flow based on the forward optical flow between the third sample image and the fourth sample image, the sum of the affine-transformed backward optical flow and the forward optical flow is calculated, and its absolute value is taken. If the absolute value at a pixel position is less than a preset threshold, that pixel position is determined to be a non-occlusion position; if the absolute value is greater than or equal to the preset threshold, the pixel position is determined to be an occlusion position.
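The forward-backward check above can be sketched as follows. This is an illustrative implementation, assuming dense flow fields of shape (H, W, 2) and using nearest-neighbour lookup in place of the patent's affine transformation (warping) step; real pipelines typically use bilinear sampling.

```python
import numpy as np

def occlusion_mask(fwd, bwd, thresh=1.0):
    """Warp the backward flow to the first frame along the forward flow
    (nearest-neighbour here for simplicity), add it to the forward flow, and
    mark positions where the absolute residual reaches the threshold as occluded."""
    h, w = fwd.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # positions each pixel maps to under the forward flow, clipped to the image
    xt = np.clip(np.rint(xs + fwd[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.rint(ys + fwd[..., 1]).astype(int), 0, h - 1)
    bwd_warped = bwd[yt, xt]                        # backward flow sampled at targets
    residual = np.abs(fwd + bwd_warped).sum(axis=-1)  # consistency residual
    return residual >= thresh                       # True = occlusion position

# usage: pixel (0,0) moves one pixel right and its target maps straight back
fwd = np.zeros((2, 2, 2)); fwd[0, 0] = [1.0, 0.0]
bwd = np.zeros((2, 2, 2)); bwd[0, 1] = [-1.0, 0.0]
mask = occlusion_mask(fwd, bwd, thresh=0.5)
```

Pixel (0, 0) is consistent (forward and warped backward flow cancel), so it is a non-occlusion position; pixels whose flows do not cancel exceed the threshold and are marked occluded.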
  • the photometric error is calculated based on the photometric error loss function, and then model training is constrained based on the photometric error calculation result.
  • The photometric error loss function can adopt a formulation in which Lp is the photometric loss, SSIM(Ii, Ij) represents the structural similarity parameter between the sample image Ii and the sample image Ij, a weight parameter balances the terms, and a small constant is included.
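The exact formula is not reproduced in this text, so the sketch below uses the common SSIM-plus-L1 photometric error from the self-supervised optical flow literature; the weight `alpha` and the single-window `ssim` helper are assumptions, not the patent's definitions (practical SSIM uses local windows).

```python
import numpy as np

def ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Global (single-window) SSIM between two images; kept minimal for the sketch."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2*mx*my + c1) * (2*cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def photometric_loss(Ii, Ij, alpha=0.85):
    """Hypothetical photometric error: weighted SSIM dissimilarity plus L1 term.
    alpha plays the role of the weight and c1/c2 of the small constants."""
    return alpha * (1.0 - ssim(Ii, Ij)) / 2.0 + (1.0 - alpha) * np.abs(Ii - Ij).mean()

# usage: identical images incur (near) zero loss, mismatched images a positive loss
img = np.linspace(0.0, 1.0, 16).reshape(4, 4)
loss_same = photometric_loss(img, img)
loss_diff = photometric_loss(img, 1.0 - img)
```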
  • Some sample image pairs, together with the optical flow of their corresponding non-occluded regions, are taken from all sample image pairs to form a training set, and the remaining sample image pairs with their corresponding non-occluded-region optical flow constitute the verification set; the training set is used for training and the verification set for verification.
  • When the training of the second optical flow estimation model meets a preset termination condition (for example, the number of model iterations reaches a predetermined number, or the model prediction accuracy exceeds a preset threshold), the training is terminated to obtain the trained second optical flow estimation model.
  • By training the second optical flow estimation model on non-occluded areas, the model attains very high prediction accuracy when predicting optical flow for non-occluded areas, which improves the accuracy of the third optical flow values and in turn the prediction accuracy of the first optical flow estimation model trained under the third-optical-flow constraint.
  • Any optical flow estimation model training method provided in the embodiments of the present disclosure may be executed by any appropriate device with data processing capability, including but not limited to: a terminal device, a server, and the like.
  • The training method of any optical flow estimation model provided by the embodiments of the present disclosure may be executed by a processor; for example, the processor executes any optical flow estimation model training method mentioned in the embodiments of the present disclosure by calling corresponding instructions stored in a memory. Details are not repeated below.
  • Fig. 6 is a structural block diagram of an optical flow estimation model training device according to an embodiment of the present disclosure.
  • the training device of the optical flow estimation model includes: a semantic segmentation module 100 , a static region determination module 200 , a mapping relation determination module 300 , a first optical flow determination module 400 and a constraint training module 500 .
  • the semantic segmentation module 100 is used to perform semantic segmentation on the first sample image and the second sample image to obtain the first semantic segmentation result and the second semantic segmentation result respectively;
  • the static region determination module 200 is used to determine the first static area in the first sample image and the second static area in the second sample image based on the first semantic segmentation result and the second semantic segmentation result;
  • the mapping relationship determination module 300 is used to determine the pixel point mapping relationship between the first sample image and the second sample image based on the inter-frame pose information of the first sample image and the second sample image and the point cloud data of the first sample image;
  • the first optical flow determination module 400 is used to determine the first optical flow between the first static region and the second static region based on the pixel point mapping relationship;
  • the constraint training module 500 is used to constrain the training of the first optical flow estimation model based on the first optical flow.
  • Fig. 7 is a structural block diagram of a mapping relationship determining module 300 according to an embodiment of the present disclosure. As shown in Figure 7, the mapping relationship determination module 300 includes:
  • a projection unit 301 configured to project the point cloud data of the first sample image onto the first sample image and the second sample image respectively, based on the inter-frame pose information of the first sample image and the second sample image;
  • a mapping relationship determining unit 302 configured to determine the pixel point mapping relationship based on the projection point positions of the point cloud data of the first sample image on the first sample image and on the second sample image.
  • Fig. 8 is a structural block diagram of a training device for an optical flow estimation model in another embodiment of the present disclosure. As shown in Figure 8, the training device of the optical flow estimation model also includes:
  • a noise region determination module 600 configured to determine a first target noise region based on the first semantic segmentation result, and determine a second target noise region based on the second semantic segmentation result;
  • the second optical flow determination module 700 is configured to set the second optical flow between the first target noise area and the second target noise area to 0;
  • the constraint training module 500 is specifically configured to constrain the training of the first optical flow estimation model based on the first optical flow and the second optical flow.
  • the training device of the optical flow estimation model also includes:
  • a cropping module 800 configured to respectively crop the first sample image and the second sample image according to a preset cropping method to obtain a first cropped image and a second cropped image;
  • the first optical flow estimation module 900 is configured to process the first cropped image and the second cropped image based on the first optical flow estimation model to obtain a first optical flow estimation result;
  • the second optical flow estimation module 1000 is configured to process the first sample image and the second sample image using a pre-trained second optical flow estimation model to obtain a second optical flow estimation result;
  • a third optical flow determination module 1100 configured to determine a third optical flow based on the first optical flow estimation result and the second optical flow estimation result;
  • the constraint training module 500 is specifically configured to constrain the training of the first optical flow estimation model based on the first optical flow, the second optical flow and the third optical flow.
  • Fig. 9 is a structural block diagram of a third optical flow determination module 1100 in an embodiment of the present disclosure. As shown in Figure 9, the third optical flow determination module 1100 includes:
  • an occlusion area determination unit 1101 configured to determine an occlusion area between the first cropped image and the second cropped image based on the first optical flow estimation result;
  • a non-occlusion area determination unit 1102 configured to determine a non-occlusion area between the first sample image and the second sample image based on the second optical flow estimation result;
  • a target area determining unit 1103, configured to determine the target area based on the occluded area between the first cropped image and the second cropped image, and the non-occluded area between the first sample image and the second sample image.
  • the third optical flow determination unit 1104 is configured to determine, as the third optical flow, the optical flow of the target area between the first sample image and the second sample image in the second optical flow estimation result.
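The target-area logic of units 1101–1104 can be sketched as a boolean-mask operation: a pixel contributes to the third optical flow when it is occluded in the cropped-image estimate but non-occluded in the full-image estimate, and its flow value is taken from the second (full-image) estimate. A minimal NumPy sketch, where all array and function names are illustrative assumptions rather than terms from the disclosure:

```python
import numpy as np

def third_optical_flow(flow_full, occluded_cropped, non_occluded_full):
    """Select, from the full-image (second) flow estimate, the flow of
    pixels that are occluded in the cropped-image estimate but visible
    in the full-image estimate; all other pixels are left as NaN."""
    target_area = occluded_cropped & non_occluded_full
    third_flow = np.full_like(flow_full, np.nan)
    third_flow[target_area] = flow_full[target_area]
    return target_area, third_flow
```

`flow_full` is an (H, W, 2) array and the two masks are (H, W) boolean arrays; the NaN fill marks pixels where the third optical flow is undefined.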
  • the preset cropping method is cropping by removing the outer border; wherein the first cropped image is the image range of the first sample image with a preset outer border size removed, and the second cropped image is the image range of the second sample image with the preset outer border size removed.
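The border-removal cropping described above can be illustrated with a short sketch; the function name and the `border` parameter are assumptions for illustration, not names from the disclosure:

```python
import numpy as np

def crop_outer_border(image, border):
    """Remove a preset outer border of `border` pixels on every side,
    keeping only the interior of the image."""
    h, w = image.shape[:2]
    assert border * 2 < h and border * 2 < w, "border too large for image"
    return image[border:h - border, border:w - border]
```

Applying the same `border` to both sample images keeps the two cropped images aligned, which is what allows the cropped-image flow estimate to be compared with the full-image estimate.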
  • the training device of the optical flow estimation model also includes:
  • An optical flow acquisition module 1200 configured to acquire the forward optical flow between the third sample image and the fourth sample image, and the backward optical flow between the third sample image and the fourth sample image;
  • a non-occlusion area determination module 1300 configured to perform verification based on the forward optical flow and the backward optical flow, and determine a non-occlusion area between the third sample image and the fourth sample image;
  • the second optical flow estimation model training module 1400 is configured to train the second optical flow estimation model using a photometric error loss function, based on the positions of the non-occluded area in the third sample image and the fourth sample image.
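Modules 1200–1400 rely on a forward-backward consistency check: a pixel is treated as non-occluded when its forward flow and the backward flow sampled at the forward-displaced position roughly cancel out. A hedged NumPy sketch, in which nearest-neighbour sampling and the threshold value are simplifying assumptions:

```python
import numpy as np

def non_occluded_mask(flow_fwd, flow_bwd, thresh=1.0):
    """Forward-backward verification. flow_fwd and flow_bwd are
    (H, W, 2) arrays of (dx, dy) displacements; a pixel passes the
    check when |flow_fwd + flow_bwd(warped)| is below the threshold."""
    h, w = flow_fwd.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Positions reached in the other image, rounded and clipped to bounds.
    x2 = np.clip(np.round(xs + flow_fwd[..., 0]).astype(int), 0, w - 1)
    y2 = np.clip(np.round(ys + flow_fwd[..., 1]).astype(int), 0, h - 1)
    flow_bwd_warped = flow_bwd[y2, x2]
    err = np.linalg.norm(flow_fwd + flow_bwd_warped, axis=-1)
    return err < thresh
```

The photometric error loss is then evaluated only at positions where this mask is true, so that occluded pixels, for which photometric consistency does not hold, do not corrupt the training signal.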
  • The specific implementation of the training device for the optical flow estimation model in the embodiments of the present disclosure is similar to the specific implementation of the training method for the optical flow estimation model in the embodiments of the present disclosure; to reduce redundancy, it is not repeated here.
  • the electronic device includes one or more processors 10 and memory 20 .
  • Processor 10 may be a central processing unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
  • Memory 20 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache).
  • the non-volatile memory may include, for example, a read-only memory (ROM), a hard disk, a flash memory, and the like.
  • One or more computer program instructions can be stored on the computer-readable storage medium, and the processor 10 can execute the program instructions to implement the training method of the optical flow estimation model of the various embodiments of the present disclosure described above and/or other desired functionality.
  • Various contents such as input signal, signal component, noise component, etc. may also be stored in the computer-readable storage medium.
  • the electronic device may further include: an input device 30 and an output device 40, and these components are interconnected through a bus system and/or other forms of connection mechanisms (not shown).
  • the input device 30 can be, for example, a keyboard, a mouse, and the like.
  • the output device 40 may include, for example, a display, a speaker, a printer, and a communication network and its connected remote output devices, among others.
  • the electronic device may also include any other suitable components according to specific applications.
  • the computer readable storage medium may utilize any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • a readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • the methods and apparatus of the present disclosure may be implemented in many ways.
  • the methods and apparatuses of the present disclosure may be implemented by software, hardware, firmware or any combination of software, hardware, and firmware.
  • the above sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise.
  • the present disclosure can also be implemented as programs recorded in recording media, the programs including machine-readable instructions for realizing the method according to the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
  • each component or each step can be decomposed and/or reassembled. These decompositions and/or recombinations should be considered equivalents of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A training method and apparatus for an optical flow estimation model. The training method comprises: S1: performing semantic segmentation on a first sample image and a second sample image to respectively obtain a first semantic segmentation result and a second semantic segmentation result; S2: determining a first static region in the first sample image and a second static region in the second sample image on the basis of the first semantic segmentation result and the second semantic segmentation result; S3: determining a pixel point mapping relationship between the first sample image and the second sample image on the basis of inter-frame pose information of the first sample image and the second sample image, and point cloud data of the first sample image; S4: determining a first optical flow between the first static region and the second static region on the basis of the pixel point mapping relationship; and S5: constraining the training of a first optical flow estimation model on the basis of the first optical flow. The precision of the trained optical flow estimation model is significantly higher than that of self-supervised methods, and the shadow problem can be solved.

Description

Training Method and Apparatus for Optical Flow Estimation Model

This disclosure claims priority to the Chinese patent application with application number 202111572711.2, titled "Training Method and Apparatus for Optical Flow Estimation Model", filed on December 21, 2021, the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates to the technical fields of image processing and artificial intelligence (AI), and in particular to a training method and apparatus for an optical flow estimation model.
Background Art

Dense optical flow estimation computes the offset of every point on an image to form a dense optical flow field, based on which pixel-level image registration can be performed. Dense optical flow estimation is widely used in autonomous driving and autonomous robotics. In recent years, with the development of deep learning, dense optical flow estimation based on deep learning has achieved good results. Supervised optical flow estimation methods based on deep learning usually require a large number of labels for model training, but optical flow labels for real scenes are very difficult to obtain, and models trained on virtual data often generalize poorly when applied to real scenes.

In the related art, in the absence of optical flow labels, a self-supervised method is generally used for model training: based on the assumption of inter-frame photometric consistency, a photometric error loss function is used to train the optical flow estimation model. However, the accuracy of such self-supervised model training methods is low.
Summary of the Invention

The present disclosure is proposed to solve the above technical problems. Embodiments of the present disclosure provide a training method and apparatus for an optical flow estimation model.
According to a first aspect of the embodiments of the present disclosure, a training method for an optical flow estimation model is provided, including:

performing semantic segmentation on a first sample image and a second sample image to obtain a first semantic segmentation result and a second semantic segmentation result, respectively;

determining a first static region in the first sample image and a second static region in the second sample image based on the first semantic segmentation result and the second semantic segmentation result;

determining a pixel point mapping relationship between the first sample image and the second sample image based on inter-frame pose information of the first sample image and the second sample image, and point cloud data of the first sample image;

determining a first optical flow between the first static region and the second static region based on the pixel point mapping relationship;

constraining the training of a first optical flow estimation model based on the first optical flow.
According to a second aspect of the embodiments of the present disclosure, a training apparatus for an optical flow estimation model is provided, including:

a semantic segmentation module, configured to perform semantic segmentation on a first sample image and a second sample image to obtain a first semantic segmentation result and a second semantic segmentation result, respectively;

a static region determination module, configured to determine a first static region in the first sample image and a second static region in the second sample image based on the first semantic segmentation result and the second semantic segmentation result;

a mapping relationship determination module, configured to determine a pixel point mapping relationship between the first sample image and the second sample image based on inter-frame pose information of the first sample image and the second sample image, and point cloud data of the first sample image;

a first optical flow determination module, configured to determine a first optical flow between the first static region and the second static region based on the pixel point mapping relationship;

a constraint training module, configured to constrain the training of a first optical flow estimation model based on the first optical flow.
According to a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, the storage medium storing a computer program for executing the training method for an optical flow estimation model described in the first aspect above.

According to a fourth aspect of the embodiments of the present disclosure, an electronic device is provided, including:

a processor;

a memory for storing instructions executable by the processor;

the processor being configured to read the executable instructions from the memory and execute the instructions to implement the training method for an optical flow estimation model described in the first aspect above.
Based on the training method and apparatus for an optical flow estimation model provided by the above embodiments of the present disclosure, semantic segmentation is performed on a first sample image and a second sample image to obtain a first semantic segmentation result and a second semantic segmentation result, respectively; a first static region in the first sample image and a second static region in the second sample image are determined based on the first semantic segmentation result and the second semantic segmentation result; a pixel point mapping relationship between the first sample image and the second sample image is determined based on inter-frame pose information of the first sample image and the second sample image, and point cloud data of the first sample image; a first optical flow between the first static region and the second static region is determined based on the pixel point mapping relationship; and the training of a first optical flow estimation model is constrained based on the first optical flow. In the embodiments of the present disclosure, since the first optical flow is obtained by processing point cloud data, it is a ground-truth optical flow; using the ground-truth optical flow as supervision information during model training makes the trained first optical flow estimation model significantly more accurate than self-supervised methods.

In addition, since a vehicle casts shadows under lighting or sunlight, optical flow may be produced by shadow movement during optical flow estimation. In the embodiments of the present disclosure, since partial ground-truth optical flow exists in the image region where the shadow is located, the influence of shadows on the optical flow estimation model can be reduced, guiding the model to learn correctly.
The technical solutions of the present disclosure are described in further detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent from the more detailed description of the embodiments of the present disclosure with reference to the accompanying drawings. The accompanying drawings provide a further understanding of the embodiments of the present disclosure, constitute a part of the specification, serve together with the embodiments to explain the present disclosure, and do not limit the present disclosure. In the drawings, the same reference numerals generally denote the same components or steps.

Fig. 1 is a schematic flowchart of a training method for an optical flow estimation model according to an embodiment of the present disclosure;

Fig. 2 is a schematic flowchart of step S3 in an embodiment of the present disclosure;

Fig. 3 is a schematic flowchart of determining a third optical flow in an embodiment of the present disclosure;

Fig. 4 is a schematic flowchart of step D in an embodiment of the present disclosure;

Fig. 5 is a schematic flowchart of training a second optical flow estimation model in an embodiment of the present disclosure;

Fig. 6 is a structural block diagram of a training apparatus for an optical flow estimation model according to an embodiment of the present disclosure;

Fig. 7 is a structural block diagram of a mapping relationship determination module 300 according to an embodiment of the present disclosure;

Fig. 8 is a structural block diagram of a training apparatus for an optical flow estimation model according to another embodiment of the present disclosure;

Fig. 9 is a structural block diagram of a third optical flow determination module 1100 in an embodiment of the present disclosure;

Fig. 10 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
具体实施方式Detailed ways
下面,将参考附图详细地描述根据本公开的示例实施例。显然,所描述的实施例仅仅是本公开的一部分实施例,而不是本公开的全部实施例,应理解,本公开不受这里描述的示例实施例的限制。Hereinafter, exemplary embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. Apparently, the described embodiments are only some of the embodiments of the present disclosure, rather than all the embodiments of the present disclosure, and it should be understood that the present disclosure is not limited by the exemplary embodiments described here.
应注意到:除非另外具体说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本公开的范围。It should be noted that relative arrangements of components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
本领域技术人员可以理解,本公开实施例中的“第一”、“第二”等术语仅用于区别不同步骤、设备或模块等,既不代表任何特定技术含义,也不表示它们之间的必然逻辑顺序。Those skilled in the art can understand that terms such as "first" and "second" in the embodiments of the present disclosure are only used to distinguish different steps, devices or modules, etc. necessary logical sequence.
还应理解,在本公开实施例中,“多个”可以指两个或两个以上,“至少一个”可以指一个、两个或两个以上。It should also be understood that in the embodiments of the present disclosure, "plurality" may refer to two or more than two, and "at least one" may refer to one, two or more than two.
还应理解,对于本公开实施例中提及的任一部件、数据或结构,在没有明确限定或者在前后文给出相反启示的情况下,一般可以理解为一个或多个。It should also be understood that any component, data or structure mentioned in the embodiments of the present disclosure can generally be understood as one or more unless there is a clear limitation or a contrary suggestion is given in the context.
另外,本公开中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本公开中字符“/”,一般表示前后关联对象是一种“或”的关系。In addition, the term "and/or" in the present disclosure is only an association relationship describing associated objects, indicating that there may be three relationships, for example, A and/or B may indicate: A exists alone, and A and B exist at the same time , there are three cases of B alone. In addition, the character "/" in the present disclosure generally indicates that the contextual objects are an "or" relationship.
还应理解,本公开对各个实施例的描述着重强调各个实施例之间的不同之处,其相同或相似之处可以相互参考,为了简洁,不再一一赘述。It should also be understood that the description of the various embodiments in the present disclosure emphasizes the differences between the various embodiments, and the same or similar points can be referred to each other, and for the sake of brevity, details are not repeated here.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present disclosure or its application or use.

Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods and devices should be considered part of the specification.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.

Embodiments of the present disclosure may be applied to electronic devices such as terminal devices, computer systems and servers, which may operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments and/or configurations suitable for use with electronic devices such as terminal devices, computer systems and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the foregoing, and the like.

Electronic devices such as terminal devices, computer systems and servers may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, object programs, components, logic, data structures, and the like, which perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in distributed cloud computing environments in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
Exemplary Method

Fig. 1 is a schematic flowchart of a training method for an optical flow estimation model according to an embodiment of the present disclosure. This embodiment may be applied to an electronic device and, as shown in Fig. 1, includes the following steps:

S1: Perform semantic segmentation on a first sample image and a second sample image to obtain a first semantic segmentation result and a second semantic segmentation result, respectively.
In an optional manner, while the vehicle is driving, a vehicle-mounted camera captures video of the scene in front of the vehicle, and the first sample image and the second sample image, N frames apart, are obtained from the captured video, where N is an integer greater than or equal to 1.

In another optional manner, while the vehicle is driving, the vehicle-mounted camera captures one frame at preset time intervals, and the first sample image and the second sample image are obtained from the captured images.

After the first sample image and the second sample image are obtained, a pre-trained semantic segmentation model is used to perform semantic segmentation on the first sample image to obtain the first semantic segmentation result, and the same semantic segmentation model is used to perform semantic segmentation on the second sample image to obtain the second semantic segmentation result. The first semantic segmentation result may include a sky region, a road surface region, a pedestrian region, a vehicle region, and other image regions appearing in the first sample image. The second semantic segmentation result may include a sky region, a road surface region, a pedestrian region, a vehicle region, and other image regions appearing in the second sample image.
S2: Determine a first static region in the first sample image and a second static region in the second sample image based on the first semantic segmentation result and the second semantic segmentation result.

The first static region in the first sample image is determined according to the first semantic segmentation result, for example, static regions such as the road surface region, the mountain region and roadside fixtures (e.g., utility poles and traffic lights) in the first sample image.

The second static region in the second sample image is determined according to the second semantic segmentation result, for example, static regions such as the road surface region, the mountain region and roadside fixtures (e.g., utility poles and traffic lights) in the second sample image.
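The determination of a static region from a semantic segmentation result can be sketched as selecting the pixels whose predicted class belongs to a set of static classes. In the NumPy sketch below, the class ids are illustrative assumptions, since the actual ids depend on the segmentation model used:

```python
import numpy as np

# Illustrative class ids; real ids depend on the segmentation model.
STATIC_CLASSES = {0, 1, 2}  # e.g. road surface, mountain, roadside fixtures

def static_region_mask(seg_map, static_classes=STATIC_CLASSES):
    """Build a boolean mask of the static region from a per-pixel
    class-id map produced by the semantic segmentation model."""
    return np.isin(seg_map, list(static_classes))
```

Applying this to the first and second semantic segmentation results yields the first and second static regions as boolean masks over the respective sample images.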
S3: Determine a pixel point mapping relationship between the first sample image and the second sample image based on inter-frame pose information of the first sample image and the second sample image, and point cloud data of the first sample image.

There is a mapping relationship between the point cloud data of the first sample image and the pixel points of the target objects in the first sample image. Based on the inter-frame pose information of the first sample image and the second sample image, the pixel displacement of a target object between the first sample image and the second sample image can be calculated. Therefore, based on the inter-frame pose information of the first sample image and the second sample image, and the point cloud data of the first sample image, the pixel point mapping relationship between the first sample image and the second sample image can be determined.
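One way to realize this step, under the assumption of a pinhole camera with intrinsic matrix `K` and a 4x4 homogeneous inter-frame pose `T_2_from_1` (both assumed inputs, not symbols from the disclosure), is to project the first frame's point cloud into both frames and take the displacement of the projected pixels as the per-point mapping. A hedged NumPy sketch:

```python
import numpy as np

def project(points_cam, K):
    """Pinhole projection of Nx3 camera-frame points to Nx2 pixels."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def static_flow_from_points(points_cam1, K, T_2_from_1):
    """Project the first frame's point cloud into both frames using the
    inter-frame pose; the displacement of the projected pixels gives the
    per-point mapping (and hence the ground-truth flow at those pixels)."""
    p1 = project(points_cam1, K)
    pts_h = np.hstack([points_cam1, np.ones((len(points_cam1), 1))])
    points_cam2 = (pts_h @ T_2_from_1.T)[:, :3]
    p2 = project(points_cam2, K)
    return p1, p2, p2 - p1  # pixels in frame 1, frame 2, per-point flow
```

Restricting the resulting per-point flow to pixels inside the static regions of step S2 yields the first optical flow of step S4.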
S4: Determine a first optical flow between the first static region and the second static region based on the pixel point mapping relationship.

Each time, a static sub-region (for example, the region of utility pole A) is extracted from the first static region, and the corresponding static sub-region (i.e., the region of utility pole A) is extracted from the second static region; combined with the pixel point mapping relationship between the first sample image and the second sample image, the ground-truth optical flow of this static sub-region between the first sample image and the second sample image can be obtained.

The ground-truth optical flow is determined in the same manner for all static sub-regions of the first static region and the corresponding static sub-regions of the second static region.

All ground-truth optical flows of the first static region and the second static region are aggregated and denoted as the first optical flow.
S5:基于第一光流,约束第一光流估计模型的训练。S5: Based on the first optical flow, constrain the training of the first optical flow estimation model.
将第一样本图像和第二样本图像组成一个样本图像对,以第一光流作为基于该样本图像对进行第一光流估计模型训练时的监督信息。需要说明的是,训练第一光流估计模型时需要多个样本图像对,对于每个样本图像对,均按照与步骤S1至S4的相同方式获取对应的第一光流。The first sample image and the second sample image are combined into a sample image pair, and the first optical flow is used as supervision information when the first optical flow estimation model is trained based on the sample image pair. It should be noted that when training the first optical flow estimation model, multiple sample image pairs are required, and for each sample image pair, the corresponding first optical flow is obtained in the same manner as steps S1 to S4.
在训练第一光流估计模型时,从所有样本图像对中获取部分样本图像对和与之对应的第一光流组成训练集,将剩余的样本图像对和与之对应的第一光流组成验证集,用训练集进行训练,用验证集进行验证。在使用训练集进行训练时,对于训练集中包括的样本图像对中除了静态区域以外的其他区域,可以采用现有光流估计模型的训练方式进行训练。当第一光流估计模型的训练满足预设终止条件(例如模型迭代次数达到预定迭代次数,或者超过预设模型预测精度阈值)时,终止第一光流估计模型的训练,得到第一光流估计模型。When training the first optical flow estimation model, some sample image pairs and their corresponding first optical flows are taken from all sample image pairs to form a training set, and the remaining sample image pairs and their corresponding first optical flows form a validation set; the training set is used for training and the validation set for validation. During training, for the areas of the sample image pairs in the training set other than the static areas, the training method of an existing optical flow estimation model may be used. When the training of the first optical flow estimation model meets a preset termination condition (for example, the number of model iterations reaches a predetermined number, or a preset model prediction accuracy threshold is exceeded), the training is terminated and the first optical flow estimation model is obtained.
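The supervised part of the training described above can be sketched with a loss that is evaluated only where ground-truth flow exists (the static regions). This is a minimal illustration under assumed array shapes, not the patent's implementation; the function names and toy data are hypothetical.

```python
import numpy as np

def masked_epe_loss(pred_flow, gt_flow, static_mask):
    # End-point error averaged only over pixels that carry a
    # ground-truth label (the static regions); other pixels are ignored.
    err = np.linalg.norm(pred_flow - gt_flow, axis=-1)   # (H, W) per-pixel EPE
    n = static_mask.sum()
    return float((err * static_mask).sum() / max(n, 1))

# Toy check: the prediction is wrong only at a pixel outside the mask,
# so the supervised loss is exactly zero.
gt = np.zeros((4, 4, 2))
pred = gt.copy()
pred[0, 0] = [5.0, 5.0]
mask = np.ones((4, 4), dtype=bool)
mask[0, 0] = False
loss = masked_epe_loss(pred, gt, mask)
```

In a real training loop this loss would be combined with whatever unsupervised objective is used for the non-static areas, and iterated until the termination condition described above is met.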
在本公开实施例中,由于第一光流是基于点云数据进行处理后得到的,是光流真值,将光流真值作为模型训练时的监督信息,可以让训练出的第一光流估计模型在精度上明显高于自监督的方法。此外,由于车辆在灯光或日光下会有影子,在光流估计的过程中会出现因为影子移动而产生光流,在本公开的实施例中,由于影子所在的图像区域存在部分光流真值,从而可以降低影子对光流估计模型的影响,引导模型正确学习。In the embodiment of the present disclosure, since the first optical flow is obtained by processing point cloud data, it is a ground-truth optical flow. Using the ground-truth optical flow as supervision information during model training makes the trained first optical flow estimation model significantly more accurate than self-supervised methods. In addition, since vehicles cast shadows under lighting or sunlight, optical flow caused by shadow movement may appear during optical flow estimation. In the embodiment of the present disclosure, because partial ground-truth optical flow exists in the image area where the shadow lies, the influence of shadows on the optical flow estimation model can be reduced, guiding the model to learn correctly.
图2是本公开一个实施例中步骤S3的流程示意图。如图2所示,步骤S3包括:Fig. 2 is a schematic flowchart of step S3 in an embodiment of the present disclosure. As shown in Figure 2, step S3 includes:
S3-1:基于第一样本图像与第二样本图像的帧间姿态信息,将第一样本图像的点云数据分别投影至第一样本图像和第二样本图像。S3-1: Project the point cloud data of the first sample image to the first sample image and the second sample image respectively based on the inter-frame pose information of the first sample image and the second sample image.
在一种可选的方式中,通过位姿传感器(例如陀螺仪)获取拍摄第一样本图像时摄像装置的位姿信息,并通过位姿传感器获取拍摄第二样本图像时摄像装置的位姿信息,进而得到第一样本图像与第二样本图像的帧间姿态信息。In an optional manner, the pose information of the camera device when the first sample image is captured is obtained through a pose sensor (for example, a gyroscope), and the pose information of the camera device when the second sample image is captured is likewise obtained through the pose sensor, thereby obtaining the inter-frame pose information between the first sample image and the second sample image.
第一样本图像的点云数据与第一样本图像中目标物的像素点之间存在映射关系。基于第一样本图像的点云数据,以及该点云数据与像素点之间的映射关系,可以将第一样本图像的点云数据投影到第一样本图像上。There is a mapping relationship between the point cloud data of the first sample image and the pixel points of the target object in the first sample image. Based on the point cloud data of the first sample image and the mapping relationship between the point cloud data and the pixel points, the point cloud data of the first sample image can be projected onto the first sample image.
基于第一样本图像与第二样本图像的帧间姿态信息,可以得到第一样本图像与第二样本图像之间像素点的位移关系,进而基于该位移关系,可以将第一样本图像的点云数据投影到第二样本图像上。Based on the inter-frame pose information between the first sample image and the second sample image, the displacement relationship of pixels between the two images can be obtained, and then, based on this displacement relationship, the point cloud data of the first sample image can be projected onto the second sample image.
S3-2:基于第一样本图像的点云数据在第一样本图像上的投影点位置,以及第一样本图像的点云数据在第二样本图像上的投影点位置,确定第一样本图像与第二样本图像间的像素点映射关系。S3-2: Determine the pixel point mapping relationship between the first sample image and the second sample image based on the projection point positions of the point cloud data of the first sample image on the first sample image and on the second sample image.
通过计算代表目标物的同一投影点在第一样本图像与第二样本图像之间的位置对应关系,即可得到该目标物的投影点在第一样本图像与第二样本图像间的像素点的映射关系。对第一样本图像的点云数据的所有投影点,均计算在第一样本图像与第二样本图像之间的位置对应关系,即可得到第一样本图像与第二样本图像间的像素点映射关系。By calculating the positional correspondence of the same projection point representing the target object between the first sample image and the second sample image, the pixel mapping of this projection point between the two images can be obtained. By calculating this positional correspondence for all projection points of the point cloud data of the first sample image, the pixel point mapping relationship between the first sample image and the second sample image is obtained.
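Steps S3-1 and S3-2 can be sketched as follows, assuming a pinhole camera with intrinsics K, point cloud data already expressed in the first camera's coordinate frame, and the inter-frame pose given as a rotation R and translation t of the second camera relative to the first; all names, shapes, and values are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def flow_from_points(points_cam1, K, R, t):
    # Project 3-D points (N, 3), given in camera-1 coordinates, into both
    # frames; the displacement of each projection is the ground-truth flow.
    def project(pts):
        uvw = (K @ pts.T).T
        return uvw[:, :2] / uvw[:, 2:3]
    uv1 = project(points_cam1)            # pixel positions in frame 1
    points_cam2 = points_cam1 @ R.T + t   # same points in camera-2 coordinates
    uv2 = project(points_cam2)            # pixel positions in frame 2
    return uv1, uv2 - uv1                 # (positions, flow)

# Pure sideways translation: a point at depth Z shifts by fx * tx / Z pixels.
K = np.array([[100.0, 0.0, 64.0],
              [0.0, 100.0, 64.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 10.0],
                [1.0, -1.0, 5.0]])
uv1, flow = flow_from_points(pts, K, np.eye(3), np.array([0.5, 0.0, 0.0]))
```

The pair (uv1, uv1 + flow) is exactly the pixel point mapping between the two sample images described in step S3-2.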
在本实施例中,将第一样本图像的点云数据分别投影至第一样本图像和第二样本图像,基于投影点在第一样本图像与第二样本图像之间的位置对应关系,即可准确地得到第一样本图像与第二样本图像间的像素点映射关系。In this embodiment, the point cloud data of the first sample image is projected onto the first sample image and the second sample image respectively; based on the positional correspondence of the projection points between the two images, the pixel point mapping relationship between the first sample image and the second sample image can be obtained accurately.
在本公开的一个实施例中,在步骤S5之前,优选在步骤S2之前,还包括:基于第一语义分割结果,确定第一目标噪声区域;基于第二语义分割结果,确定第二目标噪声区域;将第一目标噪声区域和第二目标噪声区域间的第二光流设为0。相应地,步骤S5包括:基于第一光流和第二光流,约束第一光流估计模型的训练。In one embodiment of the present disclosure, before step S5, and preferably before step S2, the method further includes: determining a first target noise area based on the first semantic segmentation result; determining a second target noise area based on the second semantic segmentation result; and setting the second optical flow between the first target noise area and the second target noise area to 0. Correspondingly, step S5 includes: constraining the training of the first optical flow estimation model based on the first optical flow and the second optical flow.
在本实施例中,第一目标噪声区域可以是第一样本图像中的天空区域,第二目标噪声区域可以是第二样本图像中的天空区域。由于天空中的光流在实际应用中没有意义,因此在训练第一光流估计模型时,将天空区域的光流置0,能够减少图像噪点对模型效果的影响,并产生更加清晰的边界,可视化效果好。In this embodiment, the first target noise area may be the sky area in the first sample image, and the second target noise area may be the sky area in the second sample image. Since optical flow in the sky is meaningless in practical applications, when training the first optical flow estimation model, setting the optical flow of the sky area to 0 can reduce the influence of image noise on the model and produce clearer boundaries with a better visualization effect.
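The sky-masking step can be sketched as follows; the label id SKY_CLASS and the requirement that both frames agree on the sky label are illustrative assumptions, since the patent does not fix a particular segmentation label scheme.

```python
import numpy as np

SKY_CLASS = 10  # hypothetical label id for "sky" in the segmentation output

def zero_sky_flow(gt_flow, seg1, seg2):
    # Zero the supervised flow wherever both frames are labelled sky,
    # so the model is not penalised for noise in the sky region.
    sky = (seg1 == SKY_CLASS) & (seg2 == SKY_CLASS)
    out = gt_flow.copy()
    out[sky] = 0.0
    return out, sky

seg = np.full((3, 3), SKY_CLASS)
seg[2, :] = 0                  # bottom row: road, not sky
flow = np.ones((3, 3, 2))
flow_out, sky_mask = zero_sky_flow(flow, seg, seg)
```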
图3为本公开一个实施例中确定第三光流的流程示意图。如图3所示,在步骤S5之前,还包括:Fig. 3 is a schematic flowchart of determining a third optical flow in an embodiment of the present disclosure. As shown in Figure 3, before step S5, it also includes:
A:按照预设的裁剪方式,分别对第一样本图像和第二样本图像进行裁剪,得到第一裁剪图像和第二裁剪图像。其中,第一裁剪图像和第二裁剪图像的尺寸相同,且第一裁剪图像在第一样本图像中的位置区域,与第二裁剪图像在第二样本图像中的位置区域,两者位置区域相对应。A: Crop the first sample image and the second sample image respectively according to a preset cropping method to obtain a first cropped image and a second cropped image. The first cropped image and the second cropped image have the same size, and the position area of the first cropped image in the first sample image corresponds to the position area of the second cropped image in the second sample image.
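One concrete choice of the preset cropping method, removing a fixed outer border as a later embodiment describes, can be sketched as follows; the margin value is an assumption. Applying the same crop to both sample images keeps their position areas in correspondence.

```python
import numpy as np

def crop_border(img, margin):
    # Remove a fixed outer border of `margin` pixels on every side,
    # keeping the crop's position area identical for both images.
    return img[margin:-margin, margin:-margin]

img = np.arange(36).reshape(6, 6)
crop = crop_border(img, 1)
```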
B:基于第一光流估计模型对第一裁剪图像和第二裁剪图像进行处理,得到第一光流估计结果。在一个实现方式中,可利用初始的第一光流估计模型对第一裁剪图像和第二裁剪图像进行光流估计,得到第一光流估计结果。B: Process the first cropped image and the second cropped image based on the first optical flow estimation model to obtain a first optical flow estimation result. In an implementation manner, an initial first optical flow estimation model may be used to perform optical flow estimation on the first cropped image and the second cropped image to obtain a first optical flow estimation result.
C:利用预训练的第二光流估计模型对第一样本图像和第二样本图像进行处理,得到第一样本图像与第二样本图像间的第二光流估计结果。其中,第二光流估计模型是预先训练好的,其模型预测精度要高于未训练完成的第一光流估计模型的精度。C: Using the pre-trained second optical flow estimation model to process the first sample image and the second sample image to obtain a second optical flow estimation result between the first sample image and the second sample image. Wherein, the second optical flow estimation model is pre-trained, and its model prediction accuracy is higher than that of the untrained first optical flow estimation model.
D:基于第一光流估计结果和第二光流估计结果,确定第三光流。即利用预训练且精度高的第二光流估计模型预测出的第二光流估计结果,与第一光流估计结果进行对比处理,根据对比处理结果可以确定作为第一光流估计模型的监督信息的第三光流。D: Determine a third optical flow based on the first optical flow estimation result and the second optical flow estimation result. That is, the second optical flow estimation result predicted by the pre-trained, high-accuracy second optical flow estimation model is compared with the first optical flow estimation result, and from the comparison result the third optical flow, which serves as supervision information for the first optical flow estimation model, can be determined.
相应地,步骤S5包括:基于第一光流、第二光流和第三光流,约束第一光流估计模型的训练。Correspondingly, step S5 includes: constraining the training of the first optical flow estimation model based on the first optical flow, the second optical flow and the third optical flow.
在本实施例中,将利用第一光流估计模型对裁剪后图像对进行预测得到的第一光流估计结果,与利用预训练的第二光流估计模型对原始的样本图像对进行处理得到的第二光流估计结果进行比较,即可确定作为第一光流估计模型的监督信息的第三光流;基于第三光流对第一光流估计模型进行约束,可以有效提升训练后第一光流估计模型的预测准确度。In this embodiment, the first optical flow estimation result is obtained by predicting on the cropped image pair with the first optical flow estimation model, and the second optical flow estimation result is obtained by processing the original sample image pair with the pre-trained second optical flow estimation model. By comparing the first optical flow estimation result with the second optical flow estimation result, the third optical flow serving as supervision information for the first optical flow estimation model can be determined. Constraining the first optical flow estimation model based on the third optical flow can effectively improve the prediction accuracy of the trained first optical flow estimation model.
图4是本公开一个实施例中步骤D的流程示意图。如图4所示,步骤D包括:Fig. 4 is a schematic flowchart of step D in an embodiment of the present disclosure. As shown in Figure 4, step D includes:
D-1:基于第一光流估计结果,确定第一裁剪图像与第二裁剪图像间的遮挡区域。其中,从第一光流估计结果中获取第一裁剪图像与第二裁剪图像的前向光流和后向光流,基于前向光流和后向光流进行校验,从而确定第一裁剪图像与第二裁剪图像间的遮挡区域。D-1: Determine an occlusion area between the first cropped image and the second cropped image based on the first optical flow estimation result. Specifically, the forward optical flow and the backward optical flow between the first cropped image and the second cropped image are obtained from the first optical flow estimation result, and a consistency check is performed based on them, thereby determining the occlusion area between the first cropped image and the second cropped image.
D-2:基于第二光流估计结果,确定第一样本图像和第二样本图像间的非遮挡区域。其中,从第二光流估计结果中获取第一样本图像和第二样本图像的前向光流和后向光流,基于前向光流和后向光流进行校验,从而确定第一样本图像和第二样本图像间的非遮挡区域。D-2: Determine a non-occluded area between the first sample image and the second sample image based on the second optical flow estimation result. Specifically, the forward optical flow and the backward optical flow between the first sample image and the second sample image are obtained from the second optical flow estimation result, and a consistency check is performed based on them, thereby determining the non-occluded area between the first sample image and the second sample image.
D-3:基于第一裁剪图像与第二裁剪图像间的遮挡区域,以及第一样本图像和第二样本图像间的非遮挡区域,确定第一样本图像与第二样本图像间的目标区域。其中,目标区域为在第一裁剪图像与第二裁剪图像间为遮挡区域,且在第一样本图像与第二样本图像间为非遮挡区域的区域。D-3: Determine a target area between the first sample image and the second sample image based on the occlusion area between the first cropped image and the second cropped image and the non-occluded area between the first sample image and the second sample image. The target area is the area that is an occlusion area between the first cropped image and the second cropped image, and a non-occluded area between the first sample image and the second sample image.
D-4:将第二光流估计结果中,第一样本图像与第二样本图像间目标区域的光流确定为第三光流。D-4: Determining the optical flow of the target area between the first sample image and the second sample image in the second optical flow estimation result as the third optical flow.
在本实施例中,裁剪图像相对于原始的样本图像裁剪掉了部分的图像区域。由于第一光流估计结果是利用未训练完成的第一光流估计模型对裁剪图像进行光流估计得到的结果,且第二光流估计结果是利用预先训练好的第二光流估计模型对样本图像的全图进行光流估计得到的结果,因此当第一光流估计结果与第二光流估计结果对遮挡区域和非遮挡区域的判定不一致时,以第二光流估计结果为准,即从第二光流估计结果中获取针对目标区域的第三光流,作为第一光流估计模型在模型训练时的监督信息,可以有效提升第一光流估计模型训练后的预测精度。In this embodiment, the cropped image has part of the image area cropped away relative to the original sample image. Since the first optical flow estimation result is obtained by performing optical flow estimation on the cropped images with the not-yet-trained first optical flow estimation model, and the second optical flow estimation result is obtained by performing optical flow estimation on the full sample images with the pre-trained second optical flow estimation model, when the two results disagree on whether an area is occluded or non-occluded, the second optical flow estimation result prevails. That is, the third optical flow for the target area is obtained from the second optical flow estimation result and used as supervision information when training the first optical flow estimation model, which can effectively improve the prediction accuracy of the trained first optical flow estimation model.
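Steps D-3 and D-4 can be sketched as the following mask logic, assuming the occlusion mask from the cropped pair has already been resampled onto the same grid as the full-image mask; all names are illustrative.

```python
import numpy as np

def third_flow(occ_crop, nonocc_full, teacher_flow):
    # Keep the teacher's flow where the cropped pair says "occluded" but
    # the full pair says "visible": such pixels are likely occluded only
    # because their match left the crop. NaN marks "no label" elsewhere.
    target = occ_crop & nonocc_full
    supervision = np.where(target[..., None], teacher_flow, np.nan)
    return target, supervision

occ = np.array([[True, False],
                [True, True]])
vis = np.array([[True, True],
                [False, True]])
teacher = np.full((2, 2, 2), 3.0)
target, sup = third_flow(occ, vis, teacher)
```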
在本公开的一个实施例中,上述预设的裁剪方式为去除外边框的方式裁剪。即第一裁剪图像相对于第一样本图像去除预设外边框尺寸的图像范围,第二裁剪图像相对于第二样本图像去除预设外边框尺寸的图像范围。In an embodiment of the present disclosure, the above-mentioned preset cropping mode is cropping by removing an outer frame. That is, the first cropped image removes the image range of the preset outer frame size relative to the first sample image, and the second cropped image removes the image range of the preset outer frame size relative to the second sample image.
在本实施例中,由于采用外边框裁剪,使得裁剪图像相对于原始的样本图像裁剪掉了外边框图像区域,当样本图像间的采集时间间隔较短时,对裁剪图像进行光流估计可以有效避免因超出图像采集范围而误判为遮挡区域的情况。在此基础上,当第一光流估计结果与第二光流估计结果对遮挡区域和非遮挡区域的判定不一致时,以第二光流估计结果为准,从第二光流估计结果中获取针对目标区域的第三光流,作为第一光流估计模型在模型训练时的监督信息,可以有效提升第一光流估计模型训练后的预测精度。In this embodiment, because outer-frame cropping is used, the cropped image has the outer-frame image area removed relative to the original sample image. When the acquisition time interval between the sample images is short, performing optical flow estimation on the cropped images can effectively avoid areas being misjudged as occluded merely because they move out of the image acquisition range. On this basis, when the first optical flow estimation result and the second optical flow estimation result disagree on whether an area is occluded or non-occluded, the second optical flow estimation result prevails: the third optical flow for the target area is obtained from the second optical flow estimation result and used as supervision information when training the first optical flow estimation model, which can effectively improve the prediction accuracy of the trained first optical flow estimation model.
图5是本公开一个实施例中训练第二光流估计模型的流程示意图。如图5所示,包括:Fig. 5 is a schematic flowchart of training a second optical flow estimation model in an embodiment of the present disclosure. As shown in Figure 5, including:
O:获取第三样本图像与第四样本图像间的前向光流,以及第三样本图像与第四样本图像间的后向光流。O: Obtain the forward optical flow between the third sample image and the fourth sample image, and the backward optical flow between the third sample image and the fourth sample image.
当可以获取到第三样本图像和第四样本图像的采集时间时,可以根据图像采集时间的前后关系,确定第三样本图像和第四样本图像在采集时间上的在先图像和在后图像;当无法获取到第三样本图像和第四样本图像的采集时间时,可以在第三样本图像和第四样本图像中选择一个图像作为在先图像,将剩余的图像作为在后图像。When the acquisition times of the third sample image and the fourth sample image are available, which of the two images is the earlier image and which is the later image can be determined according to the order of the acquisition times; when the acquisition times are not available, one of the third sample image and the fourth sample image may be selected as the earlier image and the other used as the later image.
计算在先图像相对于在后图像的光流,作为第三样本图像与第四样本图像间的前向光流。Calculate the optical flow of the previous image relative to the subsequent image as the forward optical flow between the third sample image and the fourth sample image.
计算在后图像相对于在先图像的光流,作为第三样本图像与第四样本图像间的后向光流。The optical flow of the subsequent image relative to the previous image is calculated as a backward optical flow between the third sample image and the fourth sample image.
P:基于第三样本图像与第四样本图像间的前向光流,和第三样本图像与第四样本图像间的后向光流进行校验,确定第三样本图像与第四样本图像间的非遮挡区域。P: Perform a consistency check based on the forward optical flow between the third sample image and the fourth sample image and the backward optical flow between them, and determine the non-occluded area between the third sample image and the fourth sample image.
基于第三样本图像与第四样本图像间的前向光流对后向光流进行仿射变换操作,计算仿射变换后的后向光流和前向光流之和。Perform a warping (affine transformation) operation on the backward optical flow based on the forward optical flow between the third sample image and the fourth sample image, and calculate the sum of the warped backward optical flow and the forward optical flow.
对于第三样本图像与第四样本图像间的某个像素位置,计算仿射变换后的后向光流和前向光流之和的绝对值。For a certain pixel position between the third sample image and the fourth sample image, the absolute value of the sum of the backward optical flow and the forward optical flow after the affine transformation is calculated.
如果该绝对值小于预设阈值,则判定该像素位置为非遮挡位置;如果该绝对值大于等于预设阈值,则判定该像素位置为遮挡位置。If the absolute value is smaller than the preset threshold, it is determined that the pixel position is a non-occlusion position; if the absolute value is greater than or equal to the preset threshold, it is determined that the pixel position is an occlusion position.
对第三样本图像与第四样本图像间的所有像素位置采用相同方式判断是否为遮挡位置,将所有像素的非遮挡位置进行区域合并,即得到第三样本图像与第四样本图像间的非遮挡区域。The same method is used to determine whether each pixel position between the third sample image and the fourth sample image is an occlusion position, and the non-occlusion positions of all pixels are merged into regions, thereby obtaining the non-occluded area between the third sample image and the fourth sample image.
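The forward-backward check described above can be sketched as follows. The nearest-neighbour warp and the threshold value are simplifying assumptions (the patent leaves the warping and threshold unspecified); flows are assumed to be (dx, dy) per pixel.

```python
import numpy as np

def occlusion_mask(fwd, bwd, thresh=1.0):
    # Warp the backward flow to frame 1 along the forward flow
    # (nearest-neighbour lookup), then flag pixels where the round trip
    # |fwd + warped_bwd| does not come back close to zero.
    h, w = fwd.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    x2 = np.clip(np.rint(xs + fwd[..., 0]).astype(int), 0, w - 1)
    y2 = np.clip(np.rint(ys + fwd[..., 1]).astype(int), 0, h - 1)
    warped_bwd = bwd[y2, x2]
    diff = np.linalg.norm(fwd + warped_bwd, axis=-1)
    return diff >= thresh       # True = judged occluded

# Perfectly consistent flows (bwd == -fwd) -> nothing is occluded.
fwd = np.ones((4, 4, 2))
bwd = -np.ones((4, 4, 2))
occ = occlusion_mask(fwd, bwd)
```

The complement of this mask, merged over all pixels, is the non-occluded area used in step P.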
Q:基于非遮挡区域在第三样本图像和第四样本图像中位置,使用光度误差损失函数,训练第二光流估计模型。Q: Based on the position of the non-occluded area in the third sample image and the fourth sample image, use the photometric error loss function to train the second optical flow estimation model.
对第三样本图像与第四样本图像间的非遮挡区域的像素位置,基于光度误差损失函数计算光度误差,再基于光度误差的计算结果约束模型训练。For the pixel position of the non-occlusion area between the third sample image and the fourth sample image, the photometric error is calculated based on the photometric error loss function, and then model training is constrained based on the photometric error calculation result.
在本公开的一个实施例中,光度误差损失函数可以采用如下公式:In one embodiment of the present disclosure, the photometric error loss function can adopt the following formula:
光度损失误差=Lp(Ii,Ij)Photometric loss error = Lp(Ii, Ij)

Lp(Ii,Ij) = α·(1 − SSIM(Ii,Ij))/2 + (1 − α)·|Ii − Ij|
其中,Lp为光度损失系数,SSIM(Ii,Ij)表示样本图像Ii与样本图像Ij间的结构相似参数,α表示权重,且α为常数。Among them, Lp is the photometric loss coefficient, SSIM(Ii, Ij) represents the structural similarity parameter between the sample image Ii and the sample image Ij, α represents the weight, and α is a constant.
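A runnable sketch of an α-weighted SSIM-plus-L1 photometric error, a common concrete form consistent with the symbols Lp, SSIM(Ii, Ij), and α defined here, follows. The whole-image SSIM is a simplification (the usual definition averages local windows), the constant values are assumptions, and the images are treated as single-channel arrays in [0, 1].

```python
import numpy as np

def ssim_global(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    # Whole-image SSIM computed from global statistics; enough for a
    # sketch, though the standard version averages over local windows.
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def photometric_loss(ii, ij, alpha=0.85):
    # alpha-weighted mix of an SSIM term and an L1 term; in training this
    # would be evaluated only on the non-occluded pixels.
    ssim_term = (1.0 - ssim_global(ii, ij)) / 2.0
    l1_term = np.abs(ii - ij).mean()
    return alpha * ssim_term + (1.0 - alpha) * l1_term

img = np.random.default_rng(0).random((8, 8))
loss_same = photometric_loss(img, img)          # identical images -> ~0
loss_diff = photometric_loss(img, np.clip(img + 0.2, 0.0, 1.0))
```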
需要说明的是,训练第二光流估计模型时需要多个样本图像对,对于每个样本图像对,均按照与步骤O至P的相同方式获取对应的非遮挡区域的光流。It should be noted that when training the second optical flow estimation model, multiple sample image pairs are required, and for each sample image pair, the optical flow of the corresponding non-occluded area is acquired in the same manner as steps O to P.
在训练第二光流估计模型时,从所有样本图像对中获取部分样本图像对和与之对应的非遮挡区域的光流组成训练集,将剩余的样本图像对和与之对应的非遮挡区域的光流组成验证集,用训练集进行训练,用验证集进行验证。当第二光流估计模型的训练满足预设终止条件(例如模型迭代次数达到预定迭代次数,或者超过预设模型预测精度阈值)时,终止第二光流估计模型的训练,得到第二光流估计模型。When training the second optical flow estimation model, some sample image pairs and the optical flows of their corresponding non-occluded areas are taken from all sample image pairs to form a training set, and the remaining sample image pairs and the optical flows of their corresponding non-occluded areas form a validation set; the training set is used for training and the validation set for validation. When the training of the second optical flow estimation model meets a preset termination condition (for example, the number of model iterations reaches a predetermined number, or a preset model prediction accuracy threshold is exceeded), the training is terminated and the second optical flow estimation model is obtained.
在本实施例中,通过训练针对非遮挡区域的第二光流估计模型,可以使得第二光流估计模型在针对非遮挡区域进行光流预测时,具有极高的预测精度,从而可以提升第三光流值的准确性,进而可以提升基于第三光流值约束第一光流估计模型进行训练后的模型的预测精度。In this embodiment, by training the second optical flow estimation model for non-occluded areas, the second optical flow estimation model achieves very high prediction accuracy when predicting optical flow for non-occluded areas. This improves the accuracy of the third optical flow and, in turn, the prediction accuracy of the first optical flow estimation model trained under the constraint of the third optical flow.
本公开实施例提供的任一种光流估计模型的训练方法可以由任意适当的具有数据处理能力的设备执行,包括但不限于:终端设备和服务器等。或者,本公开实施例提供的任一种光流估计模型的训练方法可以由处理器执行,如处理器通过调用存储器存储的相应指令来执行本公开实施例提及的任一种光流估计模型的训练方法。下文不再赘述。Any training method for an optical flow estimation model provided by the embodiments of the present disclosure may be executed by any appropriate device with data processing capability, including but not limited to terminal devices and servers. Alternatively, any training method for an optical flow estimation model provided by the embodiments of the present disclosure may be executed by a processor, for example, by the processor calling corresponding instructions stored in a memory to execute any training method for an optical flow estimation model mentioned in the embodiments of the present disclosure. Details are not repeated below.
示例性装置Exemplary device
图6是本公开一个实施例的光流估计模型的训练装置的结构框图。如图6所示,光流估计模型的训练装置包括:语义分割模块100、静态区域确定模块200、映射关系确定模块300、第一光流确定模块400和约束训练模块500。Fig. 6 is a structural block diagram of an optical flow estimation model training device according to an embodiment of the present disclosure. As shown in FIG. 6 , the training device of the optical flow estimation model includes: a semantic segmentation module 100 , a static region determination module 200 , a mapping relation determination module 300 , a first optical flow determination module 400 and a constraint training module 500 .
其中,语义分割模块100用于对第一样本图像和第二样本图像进行语义分割,分别得到第一语义分割结果和第二语义分割结果;静态区域确定模块200用于基于所述第一语义分割结果和所述第二语义分割结果,确定所述第一样本图像中的第一静态区域和所述第二样本图像中的第二静态区域;映射关系确定模块300用于基于所述第一样本图像与所述第二样本图像的帧间姿态信息、以及所述第一样本图像的点云数据,确定所述第一样本图像与所述第二样本图像间的像素点映射关系;第一光流确定模块400用于基于所述像素点映射关系,确定所述第一静态区域与所述第二静态区域间的第一光流;约束训练模块500用于基于所述第一光流,约束第一光流估计模型的训练。The semantic segmentation module 100 is configured to perform semantic segmentation on the first sample image and the second sample image to obtain a first semantic segmentation result and a second semantic segmentation result respectively; the static area determination module 200 is configured to determine the first static area in the first sample image and the second static area in the second sample image based on the first semantic segmentation result and the second semantic segmentation result; the mapping relationship determination module 300 is configured to determine the pixel point mapping relationship between the first sample image and the second sample image based on the inter-frame pose information of the first sample image and the second sample image and the point cloud data of the first sample image; the first optical flow determination module 400 is configured to determine the first optical flow between the first static area and the second static area based on the pixel point mapping relationship; the constraint training module 500 is configured to constrain the training of the first optical flow estimation model based on the first optical flow.
图7是本公开一个实施例的映射关系确定模块300的结构框图。如图7所示,映射关系确定模块300包括:Fig. 7 is a structural block diagram of a mapping relationship determining module 300 according to an embodiment of the present disclosure. As shown in Figure 7, the mapping relationship determination module 300 includes:
投影单元301,用于基于所述第一样本图像与所述第二样本图像的帧间姿态信息,将所述第一样本图像的点云数据分别投影至所述第一样本图像和所述第二样本图像;The projection unit 301 is configured to project the point cloud data of the first sample image onto the first sample image and the second sample image respectively, based on the inter-frame pose information of the first sample image and the second sample image;
映射关系确定单元302,用于基于所述第一样本图像的点云数据在所述第一样本图像上的投影点位置,以及所述第一样本图像的点云数据在所述第二样本图像上的投影点位置,确定所述像素点映射关系。The mapping relationship determination unit 302 is configured to determine the pixel point mapping relationship based on the projection point positions of the point cloud data of the first sample image on the first sample image and on the second sample image.
图8是本公开另一个实施例中光流估计模型的训练装置的结构框图。如图8所示,光流估计模型的训练装置还包括:Fig. 8 is a structural block diagram of a training device for an optical flow estimation model in another embodiment of the present disclosure. As shown in Figure 8, the training device of the optical flow estimation model also includes:
噪声区域确定模块600,用于基于所述第一语义分割结果,确定第一目标噪声区域,并基于所述第二语义分割结果,确定第二目标噪声区域;A noise region determination module 600, configured to determine a first target noise region based on the first semantic segmentation result, and determine a second target noise region based on the second semantic segmentation result;
第二光流确定模块700,用于将所述第一目标噪声区域和所述第二目标噪声区域间的第二光流设为0;The second optical flow determination module 700 is configured to set the second optical flow between the first target noise area and the second target noise area to 0;
其中,所述约束训练模块500具体用于基于所述第一光流和所述第二光流,约束所述第一光流估计模型的训练。Wherein, the constraint training module 500 is specifically configured to constrain the training of the first optical flow estimation model based on the first optical flow and the second optical flow.
如图8所示,光流估计模型的训练装置还包括:As shown in Figure 8, the training device of the optical flow estimation model also includes:
裁剪模块800,用于按照预设的裁剪方式,分别对所述第一样本图像和所述第二样本图像进行裁剪,得到第一裁剪图像和第二裁剪图像;A cropping module 800, configured to respectively crop the first sample image and the second sample image according to a preset cropping method to obtain a first cropped image and a second cropped image;
第一光流估计模块900,用于基于所述第一光流估计模型对所述第一裁剪图像和所述第二裁剪图像进行处理,得到第一光流估计结果;The first optical flow estimation module 900 is configured to process the first cropped image and the second cropped image based on the first optical flow estimation model to obtain a first optical flow estimation result;
第二光流估计模块1000,用于利用预训练的第二光流估计模型对所述第一样本图像和所述第二样本图像进行处理,得到所述第一样本图像与所述第二样本图像间的第二光流估计结果;The second optical flow estimation module 1000 is configured to process the first sample image and the second sample image using the pre-trained second optical flow estimation model to obtain the second optical flow estimation result between the first sample image and the second sample image;
第三光流确定模块1100,用于基于所述第一光流估计结果和所述第二光流估计结果,确定第三光流;A third optical flow determination module 1100, configured to determine a third optical flow based on the first optical flow estimation result and the second optical flow estimation result;
其中,所述约束训练模块500具体用于基于所述第一光流、所述第二光流和第三光流,约束所述第一光流估计模型的训练。Wherein, the constraint training module 500 is specifically configured to constrain the training of the first optical flow estimation model based on the first optical flow, the second optical flow and the third optical flow.
图9是本公开一个实施例中第三光流确定模块1100的结构框图。如图9所示,第三光流确定模块1100包括:Fig. 9 is a structural block diagram of a third optical flow determination module 1100 in an embodiment of the present disclosure. As shown in Figure 9, the third optical flow determination module 1100 includes:
遮挡区域确定单元1101,用于基于所述第一光流估计结果,确定所述第一裁剪图像与所述第二裁剪图像间的遮挡区域;An occlusion area determination unit 1101, configured to determine an occlusion area between the first cropped image and the second cropped image based on the first optical flow estimation result;
非遮挡区域确定单元1102,用于基于所述第二光流估计结果,确定所述第一样本图像和所述第二样本图像间的非遮挡区域;A non-occlusion area determination unit 1102, configured to determine a non-occlusion area between the first sample image and the second sample image based on the second optical flow estimation result;
目标区域确定单元1103,用于基于所述第一裁剪图像与所述第二裁剪图像间的遮挡区域,以及所述第一样本图像和所述第二样本图像间的非遮挡区域,确定所述第一样本图像与所述第二样本图像间的目标区域,其中,所述目标区域为在所述第一裁剪图像与所述第二裁剪图像间为遮挡区域,且在所述第一样本图像与所述第二样本图像间为非遮挡区域的区域;The target area determination unit 1103 is configured to determine the target area between the first sample image and the second sample image based on the occlusion area between the first cropped image and the second cropped image and the non-occluded area between the first sample image and the second sample image, where the target area is the area that is an occlusion area between the first cropped image and the second cropped image, and a non-occluded area between the first sample image and the second sample image;
第三光流确定单元1104,用于将所述第二光流估计结果中,所述第一样本图像与所述第二样本图像间所述目标区域的光流确定为所述第三光流。The third optical flow determination unit 1104 is configured to determine, in the second optical flow estimation result, the optical flow of the target area between the first sample image and the second sample image as the third optical flow.
在本公开的一个实施例中,所述预设的裁剪方式为以去除外边框的方式裁剪;其中,所述第一裁剪图像相对于所述第一样本图像去除了预设外边框尺寸的图像范围,所述第二裁剪图像相对于所述第二样本图像去除了所述预设外边框尺寸的图像范围。In one embodiment of the present disclosure, the preset cropping method is cropping by removing an outer frame; the first cropped image has the image range of a preset outer frame size removed relative to the first sample image, and the second cropped image has the image range of the preset outer frame size removed relative to the second sample image.
如图8所示,光流估计模型的训练装置还包括:As shown in Figure 8, the training device of the optical flow estimation model also includes:
光流获取模块1200,用于获取第三样本图像与第四样本图像间的前向光流,以及所述第三样本图像与所述第四样本图像间的后向光流;An optical flow acquisition module 1200, configured to acquire the forward optical flow between the third sample image and the fourth sample image, and the backward optical flow between the third sample image and the fourth sample image;
非遮挡区域确定模块1300,用于基于所述前向光流和所述后向光流进行校验,确定所述第三样本图像与所述第四样本图像间的非遮挡区域;A non-occlusion area determination module 1300, configured to perform verification based on the forward optical flow and the backward optical flow, and determine a non-occlusion area between the third sample image and the fourth sample image;
第二光流估计模型训练模块1400,用于基于所述非遮挡区域在所述第三样本图像和所述第四样本图像中位置,使用光度误差损失函数,训练所述第二光流估计模型。The second optical flow estimation model training module 1400 is configured to train the second optical flow estimation model based on the position of the non-occluded area in the third sample image and the fourth sample image, using a photometric error loss function .
It should be noted that the specific implementation of the training apparatus for the optical flow estimation model in the embodiments of the present disclosure is similar to that of the training method for the optical flow estimation model in the embodiments of the present disclosure; for details, refer to the section on the training method. To reduce redundancy, it is not repeated here.
Exemplary Electronic Device
An electronic device according to an embodiment of the present disclosure is described below with reference to FIG. 10. As shown in FIG. 10, the electronic device includes one or more processors 10 and a memory 20.
The processor 10 may be a central processing unit (CPU) or another form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
The memory 20 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, or flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 10 may execute the program instructions to implement the training method for the optical flow estimation model of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as input signals, signal components, and noise components may also be stored in the computer-readable storage medium.
In one example, the electronic device may further include an input device 30 and an output device 40, these components being interconnected through a bus system and/or other forms of connection mechanisms (not shown). The input device 30 may be, for example, a keyboard or a mouse. The output device 40 may include, for example, a display, a speaker, a printer, a communication network, and remote output devices connected thereto.
Of course, for simplicity, FIG. 10 shows only some of the components of the electronic device that are relevant to the present disclosure, omitting components such as buses and input/output interfaces. In addition, the electronic device may include any other suitable components depending on the specific application.
Exemplary Computer-Readable Storage Medium
The computer-readable storage medium may be any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present disclosure have been described above in conjunction with specific embodiments. However, it should be pointed out that the advantages, effects, and the like mentioned in the present disclosure are merely examples rather than limitations, and should not be regarded as essential to each embodiment of the present disclosure. In addition, the specific details disclosed above are provided only for illustration and ease of understanding, not limitation; the present disclosure is not required to be implemented using the above specific details.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments may be referred to one another. Since the system embodiments substantially correspond to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
The block diagrams of the devices, apparatuses, equipment, and systems involved in the present disclosure are merely illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must follow the block diagrams. As those skilled in the art will recognize, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including", "comprising", and "having" are open-ended, mean "including but not limited to", and may be used interchangeably therewith. The words "or" and "and" as used herein refer to "and/or" and may be used interchangeably therewith, unless the context clearly indicates otherwise. The word "such as" as used herein refers to the phrase "such as but not limited to" and may be used interchangeably therewith.
The methods and apparatuses of the present disclosure may be implemented in many ways, for example, by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of steps for the methods is for illustration only; the steps of the methods of the present disclosure are not limited to the order specifically described above unless otherwise specified. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the methods according to the present disclosure.
It should also be pointed out that, in the apparatuses, devices, and methods of the present disclosure, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations shall be regarded as equivalent solutions of the present disclosure.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present disclosure. Therefore, the present disclosure is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, it is not intended to limit the embodiments of the present disclosure to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims (10)

  1. A training method for an optical flow estimation model, comprising:
    performing semantic segmentation on a first sample image and a second sample image to obtain a first semantic segmentation result and a second semantic segmentation result, respectively;
    determining a first static region in the first sample image and a second static region in the second sample image based on the first semantic segmentation result and the second semantic segmentation result;
    determining a pixel mapping relationship between the first sample image and the second sample image based on inter-frame pose information between the first sample image and the second sample image and point cloud data of the first sample image;
    determining a first optical flow between the first static region and the second static region based on the pixel mapping relationship;
    constraining training of a first optical flow estimation model based on the first optical flow.
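The constraint in the last step of claim 1 can be pictured as a masked supervision term: the geometry-derived first optical flow supervises the model's prediction only inside the static regions. A minimal sketch under assumed names and an assumed endpoint-error loss form (the claim does not fix a particular loss):

```python
import numpy as np

def static_region_flow_loss(pred_flow, first_flow, static_mask):
    """Mean endpoint error between predicted and geometry-derived flow,
    counted only where static_mask is True."""
    err = np.linalg.norm(pred_flow - first_flow, axis=-1)  # (H, W) per-pixel error
    denom = max(static_mask.sum(), 1)                      # avoid division by zero
    return float((err * static_mask).sum() / denom)
```

In training, this term would be added to the usual self-supervised objective, so that static scene content with known camera motion anchors the model's predictions.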
  2. The training method for an optical flow estimation model according to claim 1, wherein determining the pixel mapping relationship between the first sample image and the second sample image based on the inter-frame pose information between the first sample image and the second sample image and the point cloud data of the first sample image comprises:
    projecting the point cloud data of the first sample image onto the first sample image and the second sample image respectively, based on the inter-frame pose information between the first sample image and the second sample image;
    determining the pixel mapping relationship based on the projection point positions of the point cloud data of the first sample image on the first sample image and the projection point positions of the point cloud data of the first sample image on the second sample image.
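The projection in claim 2 follows the usual pinhole-camera geometry: the same 3-D points are projected into both frames, and the displacement between the two projections gives the pixel mapping, and hence a per-point flow. A hedged sketch, with the intrinsics `K` and the inter-frame pose `(R, t)` assumed known and the helper name my own:

```python
import numpy as np

def flow_from_pointcloud(points_cam1, K, R, t):
    """Project (N, 3) points, expressed in the first camera's frame, into both
    images and return the frame-1 projections and the per-point flow."""
    def project(P):
        uv = (K @ P.T).T                  # (N, 3) homogeneous image points
        return uv[:, :2] / uv[:, 2:3]     # perspective divide
    p1 = project(points_cam1)
    points_cam2 = (R @ points_cam1.T).T + t   # apply inter-frame pose
    p2 = project(points_cam2)
    return p1, p2 - p1                    # projections in frame 1 and flow
```

Rounding `p1` to pixel coordinates and storing `p2 - p1` there yields the sparse pixel mapping used to supervise the static regions.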
  3. The training method for an optical flow estimation model according to claim 1 or 2, wherein before constraining the training of the first optical flow estimation model based on the first optical flow, the method further comprises:
    determining a first target noise region based on the first semantic segmentation result;
    determining a second target noise region based on the second semantic segmentation result;
    setting a second optical flow between the first target noise region and the second target noise region to 0;
    wherein constraining the training of the first optical flow estimation model based on the first optical flow comprises:
    constraining the training of the first optical flow estimation model based on the first optical flow and the second optical flow.
  4. The training method for an optical flow estimation model according to claim 3, wherein before constraining the training of the first optical flow estimation model based on the first optical flow and the second optical flow, the method further comprises:
    cropping the first sample image and the second sample image respectively according to a preset cropping method to obtain a first cropped image and a second cropped image;
    processing the first cropped image and the second cropped image with the first optical flow estimation model to obtain a first optical flow estimation result;
    processing the first sample image and the second sample image with a pre-trained second optical flow estimation model to obtain a second optical flow estimation result between the first sample image and the second sample image;
    determining a third optical flow based on the first optical flow estimation result and the second optical flow estimation result;
    wherein constraining the training of the first optical flow estimation model based on the first optical flow and the second optical flow comprises:
    constraining the training of the first optical flow estimation model based on the first optical flow, the second optical flow, and the third optical flow.
  5. The training method for an optical flow estimation model according to claim 4, wherein determining the third optical flow based on the first optical flow estimation result and the second optical flow estimation result comprises:
    determining an occluded region between the first cropped image and the second cropped image based on the first optical flow estimation result;
    determining a non-occluded region between the first sample image and the second sample image based on the second optical flow estimation result;
    determining a target region between the first sample image and the second sample image based on the occluded region between the first cropped image and the second cropped image and the non-occluded region between the first sample image and the second sample image, wherein the target region is the region that is an occluded region between the first cropped image and the second cropped image and a non-occluded region between the first sample image and the second sample image;
    determining, from the second optical flow estimation result, the optical flow of the target region between the first sample image and the second sample image as the third optical flow.
  6. The training method for an optical flow estimation model according to claim 4, wherein the preset cropping method is cropping by removing an outer border.
  7. The training method for an optical flow estimation model according to claim 4, wherein the second optical flow estimation model is obtained by:
    acquiring a forward optical flow between a third sample image and a fourth sample image, and a backward optical flow between the third sample image and the fourth sample image;
    performing a consistency check based on the forward optical flow and the backward optical flow to determine a non-occluded region between the third sample image and the fourth sample image;
    training the second optical flow estimation model using a photometric error loss function, based on the positions of the non-occluded region in the third sample image and the fourth sample image.
  8. A training apparatus for an optical flow estimation model, comprising:
    a semantic segmentation module, configured to perform semantic segmentation on a first sample image and a second sample image to obtain a first semantic segmentation result and a second semantic segmentation result, respectively;
    a static region determination module, configured to determine a first static region in the first sample image and a second static region in the second sample image based on the first semantic segmentation result and the second semantic segmentation result;
    a mapping relationship determination module, configured to determine a pixel mapping relationship between the first sample image and the second sample image based on inter-frame pose information between the first sample image and the second sample image and point cloud data of the first sample image;
    a first optical flow determination module, configured to determine a first optical flow between the first static region and the second static region based on the pixel mapping relationship;
    a constrained training module, configured to constrain training of a first optical flow estimation model based on the first optical flow.
  9. A computer-readable storage medium, the storage medium storing a computer program, wherein the computer program is used to execute the training method for an optical flow estimation model according to any one of claims 1-7.
  10. An electronic device, comprising:
    a processor;
    a memory for storing instructions executable by the processor;
    the processor being configured to read the executable instructions from the memory and execute the instructions to implement the training method for an optical flow estimation model according to any one of claims 1-7.
PCT/CN2022/123230 2021-12-21 2022-09-30 Training method and apparatus for optical flow estimation model WO2023116117A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111572711.2A CN114239736A (en) 2021-12-21 2021-12-21 Method and device for training optical flow estimation model
CN202111572711.2 2021-12-21

Publications (1)

Publication Number Publication Date
WO2023116117A1 true WO2023116117A1 (en) 2023-06-29

Family

ID=80760431

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/123230 WO2023116117A1 (en) 2021-12-21 2022-09-30 Training method and apparatus for optical flow estimation model

Country Status (2)

Country Link
CN (1) CN114239736A (en)
WO (1) WO2023116117A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114239736A (en) * 2021-12-21 2022-03-25 北京地平线信息技术有限公司 Method and device for training optical flow estimation model
CN114972425A (en) * 2022-05-18 2022-08-30 北京地平线机器人技术研发有限公司 Training method of motion state estimation model, motion state estimation method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190057509A1 (en) * 2017-08-16 2019-02-21 Nvidia Corporation Learning rigidity of dynamic scenes for three-dimensional scene flow estimation
CN110060264A (en) * 2019-04-30 2019-07-26 北京市商汤科技开发有限公司 Neural network training method, video frame processing method, apparatus and system
CN110910447A (en) * 2019-10-31 2020-03-24 北京工业大学 Visual odometer method based on dynamic and static scene separation
CN111581313A (en) * 2020-04-25 2020-08-25 华南理工大学 Semantic SLAM robustness improvement method based on instance segmentation
CN112381868A (en) * 2020-11-13 2021-02-19 北京地平线信息技术有限公司 Image depth estimation method and device, readable storage medium and electronic equipment
CN113570713A (en) * 2021-07-05 2021-10-29 北京科技大学 Semantic map construction method and device for dynamic environment
CN113762173A (en) * 2021-09-09 2021-12-07 北京地平线信息技术有限公司 Training method and device for human face light stream estimation and light stream value prediction model
CN114239736A (en) * 2021-12-21 2022-03-25 北京地平线信息技术有限公司 Method and device for training optical flow estimation model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wulff, Jonas; Sevilla-Lara, Laura; Black, Michael J.: "Optical Flow in Mostly Rigid Scenes", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21 July 2017, pages 6911-6920, XP033250057, ISSN: 1063-6919, DOI: 10.1109/CVPR.2017.731 *

Also Published As

Publication number Publication date
CN114239736A (en) 2022-03-25


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE