CN112712571B - Object plane mapping method, device and equipment based on video


Info

Publication number
CN112712571B
Authority
CN
China
Prior art keywords: corner, frame, point, current, tracking
Legal status: Active
Application number
CN202011566751.1A
Other languages
Chinese (zh)
Other versions
CN112712571A (en)
Inventor
林垠
刘炎
何山
胡金水
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Application filed by iFlytek Co Ltd
Priority to CN202011566751.1A
Publication of CN112712571A
Application granted
Publication of CN112712571B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/001Texturing; Colouring; Generation of texture or colour

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an object plane mapping method, device and equipment based on video. On the premise of retaining user interaction, the invention no longer requires complex operations to complete plane selection: in the first stage, a candidate plane area is preliminarily selected by combining a video object plane detection technique with a simple interactive operation by the user, and in the second stage, whether the candidate plane area is usable is judged according to the image features of the candidate plane area across multiple frames, thereby determining the plane to be mapped. The method and the device can determine the plane to be mapped efficiently and conveniently while improving user experience, and thus greatly improve the stability and effectiveness of image implantation.

Description

Object plane mapping method, device and equipment based on video
Technical Field
The present invention relates to the field of video processing, and in particular, to a method, an apparatus, and a device for object plane mapping based on video.
Background
The technical scenario addressed by the present invention is adding specific image material to the surface of an object, for example, but not limited to, implanting a 2D planar advertisement onto an object in a video. A more concrete application is the following: in a video of a building, a static or dynamic planar image material of an advertisement is attached to one or more surfaces of the building, so that the advertisement is displayed on the building throughout the continuous multi-frame playback of the video. The dynamic planar image material referred to here covers video advertisements that are not still pictures; in the mapping operation such video material can be regarded as a sequence of still planar pictures, so dynamic planar material also falls within the object plane mapping addressed by the present invention.
However, current mapping schemes based on video content require the user performing the mapping to carry out relatively complex interactive operations. For example, the user must manually click, according to his own judgment and experience, the outer-contour pixel points of the object plane into which the image is to be implanted. The process is cumbersome, demands a fair amount of expertise, is error-prone, and is unfriendly to non-professionals; in particular, once the contour points are clicked inaccurately or the selected area lacks saliency, the key information needed to lock onto the target plane is very likely to be lost, making it difficult to achieve accurate mapping on the target plane reliably and effectively.
Disclosure of Invention
In view of the foregoing, the present invention aims to provide a video-based object plane mapping method, apparatus and device, together with a corresponding computer-readable storage medium and computer program product, so as to solve the problems that the mapping interaction is cumbersome, the operation requirements are high, key information is easily lost, the quality of the selected plane is poor, and the final mapping effect suffers as a result.
The technical scheme adopted by the invention is as follows:
in a first aspect, the present invention provides a video-based object plane mapping method, including:
acquiring a first frame image, set by a user, of a video to be processed, and at least one target pixel point clicked by the user in the first frame image;
detecting a candidate plane area from the first frame image according to the first frame image, the position information of the target pixel point and a pre-constructed plane detection model;
judging whether the candidate plane area is available or not based on the image characteristics of the candidate plane area in the first frame image and the subsequent multi-frame images;
and mapping the candidate plane area which is judged to be available as a target plane.
In at least one possible implementation manner, the determining, based on the image features of the candidate planar area in the first frame image and the subsequent multiple frame images, whether the candidate planar area is available includes:
acquiring specific pixel points in the candidate plane area and tracking the specific pixel points in a plurality of video frames;
determining a plurality of stable pixel points which can be stably tracked from the specific pixel points;
and judging whether the candidate plane area is available or not according to the number of the stable pixel points and a preset number threshold value.
In at least one possible implementation manner, the acquiring and tracking the specific pixel point in the candidate plane area in a plurality of video frames includes:
extracting a plurality of corner points of the candidate plane area from the first frame image;
and tracking each corner in a plurality of video frames according to the image characteristics of the corner.
In at least one possible implementation manner, the tracking each corner in a plurality of video frames according to the image features of the corner includes:
forward optical flow tracking is performed for each of the corner points along the time axis of the plurality of video frames.
In at least one possible implementation manner, the forward optical flow tracking of each corner along the time axis of the plurality of video frames includes:
selecting a certain current corner point as a central pixel point in the candidate plane area of the current video frame, and setting a first window containing a plurality of adjacent pixel points;
obtaining a plurality of candidate windows with the same size as the first window in the candidate plane area in the next frame of images which are adjacent or sampled at equal intervals according to the preset pixel displacement offset;
comparing the brightness of the pixel points contained in each candidate window with the brightness of the pixel points contained in the first window;
taking the candidate window which meets the preset brightness constant standard and has the minimum pixel displacement offset as a second window;
and taking the central pixel point of the second window as the corresponding pixel point of the tracked current corner point.
In at least one possible implementation manner, the determining, from the specific pixel points, a plurality of stable pixel points that can be stably tracked includes:
tracking a corresponding pixel point of a current corner point in a current video frame in a follow-up video frame of a preset frame number, and determining the current corner point as a stable pixel point of the current video frame;
and/or
performing inverse tracking verification by using the corresponding pixel points tracked in the subsequent video frames;
and determining the current corner passing verification as a stable pixel point of the current video frame.
In at least one possible implementation manner, the performing inverse tracking verification by using the corresponding pixel points tracked in the subsequent video frame includes:
tracking a first corresponding pixel point of a current corner point in the current video frame in a first subsequent frame adjacent to the current video frame or sampled at equal intervals;
acquiring a corresponding first inverse tracking corner point from a current video frame based on the first corresponding pixel point;
and verifying the current corner according to the position relation between the first inverse tracking corner and the current corner.
In at least one possible implementation manner, the verifying the current corner according to the position relationship between the first inverse tracking corner and the current corner includes:
when the coordinates of the first inverse tracking angular point and the coordinates of the current angular point meet a preset distance standard, determining that the current angular point passes verification;
or alternatively
taking the current corner corresponding to the first inverse tracking corner meeting the distance standard as a to-be-determined corner;
tracking a second corresponding pixel point relative to the to-be-determined corner point in a second subsequent frame adjacent to the first subsequent frame or sampled at equal intervals;
based on the second corresponding pixel points, acquiring corresponding second inverse tracking corner points from the current video frame;
and when the coordinates of the second inverse tracking angular point and the coordinates of the current angular point meet the distance standard, determining that the current angular point passes verification.
In a second aspect, the present invention provides a video-based object plane mapping apparatus, including:
the input module is used for acquiring a first frame image of the video to be processed set by a user and at least one target pixel point clicked in the first frame image by the user;
the candidate plane area detection module is used for detecting a candidate plane area from the first frame image according to the first frame image, the position information of the target pixel point and a pre-constructed plane detection model;
the target plane screening module is used for judging whether the candidate plane area is available or not based on the image characteristics of the candidate plane area in the first frame image and the subsequent multi-frame images;
and the mapping module is used for performing mapping processing by taking the candidate plane area determined to be available as a target plane.
In at least one possible implementation manner, the target plane screening module includes:
the pixel tracking sub-module is used for acquiring specific pixel points in the candidate plane area and tracking the specific pixel points in a plurality of video frames;
a stable pixel determination submodule, configured to determine a plurality of stable pixel points that can be stably tracked from the specific pixel points;
and the target plane judging sub-module is used for judging whether the candidate plane area is available or not according to the number of the stable pixel points and a preset number threshold value.
In at least one possible implementation, the pixel tracking submodule includes:
the corner extraction unit is used for extracting a plurality of corners of the candidate plane area from the first frame image;
and the corner tracking unit is used for tracking each corner in a plurality of video frames according to the image characteristics of the corner.
In at least one possible implementation manner, the corner tracking unit is specifically configured to perform forward optical flow tracking on each of the corners along a time axis of a plurality of video frames.
In at least one possible implementation manner, the corner tracking unit includes:
the first window setting component is used for selecting a certain current corner point as a central pixel point in the candidate plane area of the current video frame, and setting a first window comprising a plurality of adjacent pixel points;
a candidate window construction component, configured to obtain a plurality of candidate windows with the same size as the first window in the candidate plane area in the next frame of image that is sampled at equal intervals or adjacent next to the first frame according to a preset pixel displacement offset;
the brightness comparison component is used for comparing the brightness of the pixel points contained in each candidate window with the brightness of the pixel points contained in the first window;
the second window determining component is used for taking the candidate window which meets the preset brightness constant standard and has the minimum pixel displacement offset as a second window;
and the tracking result determining component is used for taking the central pixel point of the second window as the corresponding pixel point of the tracked current corner point.
In at least one possible implementation manner, the stable pixel determination submodule includes:
the first stable pixel determining unit is used for tracking the corresponding pixel point of the current corner point in the current video frame in the subsequent video frames of the preset frame number, and determining the current corner point as the stable pixel point of the current video frame;
and/or a second stable pixel determination unit including:
the reverse verification subunit is used for performing reverse tracking verification by utilizing the corresponding pixel points tracked in the subsequent video frames;
and the stable pixel determination subunit is used for determining the current corner passing verification as a stable pixel point of the current video frame.
In at least one possible implementation manner, the reverse verification subunit includes:
the forward tracking component is used for tracking a first corresponding pixel point of a current corner point in the current video frame in a first subsequent frame adjacent to the current video frame or sampled at equal intervals;
the inverse tracking component is used for acquiring a corresponding first inverse tracking angular point from the current video frame based on the first corresponding pixel point;
and the verification component is used for verifying the current corner point according to the position relation between the first inverse tracking corner point and the current corner point.
In at least one possible implementation manner, the verification component is specifically configured to determine that the current corner is verified when the coordinates of the first inverse tracking corner and the coordinates of the current corner meet a preset distance criterion;
alternatively, the verification component specifically includes:
a to-be-determined angular point determining sub-component, configured to use a current angular point corresponding to the first inverse tracking angular point that meets the distance criterion as a to-be-determined angular point;
a forward tracking subassembly, configured to track a second corresponding pixel point of the to-be-determined corner point in a second subsequent frame adjacent to, or sampled at equal intervals from, the first subsequent frame;
the inverse tracking subassembly is used for acquiring a corresponding second inverse tracking angular point from the current video frame based on the second corresponding pixel point;
and the stability verification sub-component is used for determining that the current corner point passes verification when the coordinates of the second inverse tracking corner point and the coordinates of the current corner point meet the distance standard.
In a third aspect, the present invention provides a video-based object plane mapping apparatus, comprising:
one or more processors, a memory (which may employ a non-volatile storage medium), and one or more computer programs, wherein the one or more computer programs are stored in the memory and comprise instructions which, when executed by the device, cause the device to perform the method as in the first aspect or any of the possible implementations of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium having a computer program stored therein, which when run on a computer causes the computer to perform at least the method as in the first aspect or any of the possible implementations of the first aspect.
In a fifth aspect, the invention also provides a computer program product for performing at least the method of the first aspect or any of the possible implementations of the first aspect, when the computer program product is executed by a computer.
In at least one possible implementation manner of the fifth aspect, the relevant program related to the product may be stored in whole or in part on a memory packaged with the processor, or may be stored in part or in whole on a storage medium not packaged with the processor.
The invention is based on the idea that, while retaining user interaction, plane selection no longer requires complex, professional operations; instead, the area to be mapped is locked in two automatic stages. Specifically, the first stage preliminarily selects the candidate plane area by combining a video object plane detection technique with a simple interactive operation by the user, and the second stage judges, on that basis, whether the candidate plane area is usable according to its image features across multiple frames. Compared with the prior art, these two stages greatly simplify the user interaction flow, obtain the feature information of the plane area to be mapped comprehensively and reliably, and are not limited by an unprofessional or inaccurate hand-selected region. Furthermore, by examining how the image features of the candidate plane area behave across multiple video frames, the availability of the candidate plane area is screened automatically and efficiently, i.e. the reliability of implantation is judged using the information of the images themselves, effectively avoiding deviations in the processing result caused by a lack of relevant technical background knowledge. The method and the device can determine the required plane to be mapped efficiently and conveniently while improving user experience, and thus greatly improve the stability and effectiveness of image implantation.
Further, in the process from the candidate plane area to the finally determined available target plane, in some embodiments the invention tracks specific image features of the candidate plane area across multiple frames, first determines the objects that can be tracked stably, and then uses the number of tracked objects meeting the stability requirement as the main criterion for judging whether the current candidate plane area is usable, thereby obtaining a reliable basis for the subsequent mapping processing.
Further, when judging the stability of the tracked objects, some embodiments use dynamic brightness characteristics as the main image feature information and track the pixel points with an optical flow method, so that pixel points meeting the stability requirement can be locked onto in the changing video frames.
Furthermore, when judging the stability of the tracked objects, some preferred embodiments also adopt a reverse verification strategy: after a forward tracking result is obtained along the time axis of the video frames, it is not used directly as the final basis; instead, backward tracking is performed using the forward result to examine whether its behaviour in the preceding and following frames meets the stability requirement. The tracked objects finally determined by this verification strategy are therefore more accurate and reliable, providing strong and effective support for locking onto the target plane to be mapped.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of an embodiment of a video-based object plane mapping method provided by the present invention;
FIG. 2 is a flow chart of an embodiment of a candidate plane availability determination method provided by the present invention;
FIG. 3 is a flowchart of an embodiment of an optical flow tracking method provided by the present invention;
FIG. 4 is a flowchart of an embodiment of a reverse tracking verification method provided by the present invention;
FIG. 5 is a schematic diagram of a homography matrix provided by the present invention;
FIG. 6 is a schematic diagram of an embodiment of a video-based object plane mapping apparatus according to the present invention;
fig. 7 is a schematic diagram of an embodiment of a video-based object plane mapping apparatus provided by the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
Before introducing the specific technical solution of the invention, its design purpose and logic should be explained. Current conventional mapping schemes based on video content usually require the user to trace out the plane area to be implanted precisely by clicking pixel points along the plane's edge contour, which places very high demands on clicking accuracy; in practice, moreover, the understanding of where a plane's edge contour lies varies from person to person and is often ambiguous, further complicating the interaction. In addition, the image area given by the user is likely to suffer from imprecise contour definition, limited personal processing experience, poor understanding of the mapping requirements, and so on, so that the manually selected area carries too little useful information. For example, if the delimited plane area lacks obvious texture, usable image feature points are hard to detect or are unstable, which directly causes a noticeable deviation in the homography matrix computed later; if image implantation is forced on that basis, the result is bound to be poor.
The invention considers that, when mapping onto objects in a video, one of the key links is ensuring that the mapped position area is reliable: because video changes dynamically, objects in the image change state as the frames advance, which challenges the operation of implanting an image on an object's surface. The initial goal is therefore to locate a mapping area that remains stable and reliable for the mapping operation. After analysing the conventional schemes, the invention holds that the user's cumbersome operations must be simplified, which both improves the friendliness of the interaction and ensures that the selected candidate plane covers the comprehensive feature information required by the subsequent steps, on which basis the availability of the candidate plane can then be judged accurately and efficiently. In view of this, the present invention provides an embodiment of a video-based object plane mapping method, which may specifically include:
Step S1, acquiring a first frame image of a video to be processed set by a user and at least one target pixel point clicked in the first frame image by the user.
The first frame image here is set by the user; in other words, the first frame of the video file used as base material is not necessarily the "first frame image" described in this embodiment. Which frame it is may be determined by the user's interactive operation, i.e. the user may select any frame of the video file as the first frame image according to actual requirements.
Then, compared with the conventional scheme, the method does not require the user to click many times along the surface contour of an object appearing in the frame; the user only needs to click at least one target pixel point. It should be noted that a target pixel point is any pixel on the surface of the object into which the user expects to implant the image (not necessarily on the object's contour line); even if several target pixels are selected, there need not be any relationship among them. The function of the target pixel points is merely to indicate, when several object planes appear in the first frame image, the position of the mapping plane the user actually intends, as will be described in connection with the following steps.
Step S2, detecting a candidate plane area from the first frame image according to the first frame image, the position information of the target pixel point and a pre-constructed plane detection model.
In order to achieve the foregoing object, this embodiment proposes training an interaction-based plane detection model in advance, taking the first frame image and the position of the target pixel point as the main inputs of the model, and then automatically segmenting a continuous candidate plane area (segmented plane) in the first frame image through the processing of the plane detection model. In practice, a plane detection model can be trained in advance using a deep neural network; given the first frame RGB image (non-RGB images are not excluded) of the video to be implanted, the user clicks the coordinates of one or more pixel points inside the object plane he expects to implant on the given RGB image, and the plane area the user desires is found automatically and taken as the candidate.
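By way of illustration, the interface described above (first-frame image plus one or more clicked pixel coordinates in, binary plane mask out) might be wrapped as in the sketch below; the model, the click encoding and all parameter values here are assumptions made for the sketch, not the patent's implementation.

```python
# Minimal sketch, assuming a trained segmentation network `model` is available.
import numpy as np
import torch

def detect_candidate_plane(model, frame_rgb, click_points, sigma=30.0):
    """frame_rgb: HxWx3 uint8 first frame; click_points: list of (x, y) user clicks."""
    h, w = frame_rgb.shape[:2]
    # Encode the user interaction as a Gaussian "click map" extra channel (assumed encoding).
    ys, xs = np.mgrid[0:h, 0:w]
    click_map = np.zeros((h, w), np.float32)
    for (cx, cy) in click_points:
        click_map += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    img = torch.from_numpy(frame_rgb).float().permute(2, 0, 1) / 255.0
    clk = torch.from_numpy(click_map)[None]
    x = torch.cat([img, clk], dim=0)[None]          # 1 x 4 x H x W input tensor
    with torch.no_grad():
        prob = torch.sigmoid(model(x))[0, 0]        # per-pixel plane probability
    return (prob.numpy() > 0.5).astype(np.uint8)    # binary mask of the candidate plane
```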
Plane detection technology is used here, but the invention does not apply it directly; instead it proposes a plane detection and segmentation scheme including, but not limited to, interactive saliency plane detection: the plane of interest selected by the user in the first frame of the video is defined as the final salient plane, and by training a neural network model, the candidate plane to be implanted can be detected automatically from the target pixel points selected by the user.
With respect to saliency detection, this technique mainly simulates the attention mechanism of human vision to detect and segment salient areas in a scene. For a saliency detection model, the input is the picture to be examined and the output is a binary mask of the salient object in the picture (taking an RGB input as an example, the output can be a grey-scale map in the range 0-1 or 0-255). During training, the parameters of the saliency detection model are typically optimised using techniques including, but not limited to, a cross entropy loss or a minimum mean squared error. The following describes an implementation that selects weighted cross entropy as the loss function, where the standard cross entropy loss function is defined as:

L = -[ y·log(P) + (1 - y)·log(1 - P) ]    (1)

In the above formula, y is the true labelled class and P is the probability predicted by the saliency detection model for class 1 at a given pixel (class 1 is generally defined as the salient-object class); correspondingly, 1 - P is the predicted probability of class 0 at that point. The prediction probability P can further be rewritten in the form P_R:

P_R = P when y = 1, and P_R = 1 - P when y = 0    (2)

Substituting P_R into formula (1), formula (1) can be rewritten as:

L = -(1/N)·Σ_{i∈I} log(P_Ri)    (3)

wherein N is the number of all pixel points in the image, I denotes all pixel points in the image area, and P_Ri is the probability predicted for pixel i with respect to the saliency class (defined here as class 1).

As can be seen from the above formula, the closer the prediction of the trained network is to the true label, i.e. the closer P_R is to 1, the closer the cross entropy loss is to 0 and the better the network is at detecting the salient target region; as those skilled in the art will appreciate, the goal of training the saliency detection model in this example is to minimise this cost function.
Of course, saliency detection itself is not the focus of the invention; it is used to spare the user the tedious and error-prone image segmentation operation. If only a conventional saliency detection scheme were adopted, however, it would still fall short of the video mapping scenario the invention is concerned with. The analysis is as follows: saliency detection usually attends only to the most conspicuous object in the whole image and directly outputs the segmentation mask of that single most salient object once the image is input. For the first frame of a video, however, there may be several plane areas suitable for implantation, for example several floors of a building, or several buildings in the frame; if the plane the user wishes to implant is not the most "salient" plane in the whole image, a saliency detection model is hard to apply directly to this task. This embodiment therefore proposes to take into account the positions of the one or more target pixel points selected by the user, i.e. not to take the saliency detection result directly, but to combine the image segmentation technique involved in saliency detection with the user's actual interaction, thereby obtaining from the video frame a personalised plane area of interest that is not necessarily the most salient. In a specific implementation, the weighted cross entropy loss function may be designed on the basis of formula (3) above:

L = -(1/N)·Σ_{i∈I} w_i·log(P_Ri)    (4)

wherein N is the number of pixel points in the image, I denotes all the pixel points in the image area, P_Ri is the probability predicted for pixel i with respect to the saliency class (defined here as class 1), and the weight w_i may take, for example (but not limited to), the form of a single Gaussian weight:

w_i = λ·exp( -[ (x_i - x)² + (y_i - y)² ] / (2σ²) )    (5)

wherein (x, y) are the coordinates of the target pixel point selected by the user, (x_i, y_i) are the coordinates of any point i in the image, σ is the standard deviation of the defined Gaussian weight, and λ is an adjustable hyper-parameter controlling the magnitude of the gradient returned by the loss function. As to the choice of values: since the invention does not strictly constrain the user's interaction, when the number K of target pixel points clicked by the user is greater than 1, w_i may be defined as a mixed weight model combining K single Gaussian weights:

w_i = Σ_{k=1..K} λ_k·exp( -[ (x_i - x_k)² + (y_i - y_k)² ] / (2σ_k²) )    (6)

wherein (x_k, y_k) are the coordinates of the target pixel point selected by the user in the k-th interaction, (x_i, y_i) are the coordinates of any point i in the image, σ_k is the standard deviation of the corresponding Gaussian weight, and λ_k is an adjustable hyper-parameter; for convenience of operation, σ_k and λ_k may be set to uniform values, i.e. λ_1 = λ_2 = … = λ_K and σ_1 = σ_2 = … = σ_K.
After the above analysis of the initial design and of the shortcomings of conventional means, an improved plane detection model that segments object planes in an image according to the user interaction can be constructed. During model training, for a group of image frames containing several object planes, the candidate plane area in which the target pixel points fall can be selected automatically according to the position of the one or more target pixel points chosen by the user, and the weights of the plane detection model are then updated with the weighted cross entropy loss function whose weight is set out in formula (5) or formula (6); the forward inference of the plane detection model can thus directly give the area in which the user's plane of interest lies, for the subsequent availability screening of the candidate plane area based on image features.
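A minimal sketch of the weighted cross entropy of formulas (4)-(6), assuming a PyTorch setup; the tensor layout and the values of sigma and lam are illustrative assumptions rather than values fixed by the patent.

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(pred_logits, target, click_points, sigma=30.0, lam=1.0):
    """pred_logits, target: 1 x 1 x H x W tensors; click_points: list of (x, y) user clicks."""
    _, _, h, w = target.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    # Mixed Gaussian weight map centred on the clicked target pixel points, as in formula (6).
    weight = torch.zeros(h, w)
    for (cx, cy) in click_points:
        weight += lam * torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    weight = weight.view(1, 1, h, w)
    # Per-pixel cross entropy written out so the weight multiplies log(P_Ri) directly (formula (4)).
    log_p = F.logsigmoid(pred_logits)        # log P
    log_not_p = F.logsigmoid(-pred_logits)   # log (1 - P)
    return -(weight * (target * log_p + (1 - target) * log_not_p)).mean()
```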
Returning to step S3 of fig. 1, based on the image features of the candidate plane area in the first frame image and the subsequent multi-frame images, it is determined whether the candidate plane area is available.
After the foregoing steps, a candidate plane area with comprehensive coverage of information is obtained. Because of the scenario the invention addresses, this candidate plane area may still change with factors such as the way the video is shot and the camera's angle and position, so whether it can actually be used for mapping needs to be determined further. This embodiment proposes performing the availability determination using the image features of the candidate plane area in the current first frame image and several subsequent video frames.
Specific discriminant strategies may be provided with reference to fig. 2, including:
step S31, specific pixel points in the candidate plane area are obtained and tracked in a plurality of video frames;
step S32, determining a plurality of stable pixel points which can be stably tracked from the specific pixel points;
and step S33, judging whether the candidate plane area is available or not according to the number of the stable pixel points and a preset number threshold value.
Specifically, some higher-quality feature points (pixel points) of the candidate plane area can be detected automatically and tracked over several video frames starting from the first frame image, with the less stable pixel points eliminated in the process, yielding the stable pixel points. A stable pixel point here means a high-quality pixel point that can be tracked stably as the video advances, and the high-quality pixel points may be a number of corner points of the candidate plane area extracted from the first frame image, each of which can then be tracked over several video frames according to one or more of its image features, such as contrast, colour or grey value. The number of corner points that can be tracked stably in the candidate plane area is then counted; if it exceeds a set number threshold, the candidate plane area is considered a reliable area into which an image can be implanted, i.e. the candidate plane area is available; otherwise the user can be prompted to reset the first frame image and/or reselect the target pixel points, i.e. to obtain a new candidate plane area.
Regarding the corner detection involved in this embodiment: corner points are used as the specific pixel points, i.e. pixel points with conspicuous features in the image, because corner points are considered easier to track across several video frames. In connection with the present invention, corner detection schemes may include, but are not limited to, the mature Harris corner detector, with reference to the following:
let the centre of a window be at position (x, y) of the image, with pixel value I(x, y); shifting the window by a relatively small displacement u in the horizontal x-direction and v in the vertical y-direction moves it to the new position (x+u, y+v), with grey value I(x+u, y+v). The resulting change is measured by

E(u, v) = Σ_{(x,y)} w(x, y)·[ I(x+u, y+v) - I(x, y) ]²    (7)

wherein I(x+u, y+v) - I(x, y) is the change in grey value caused by the shift, and w(x, y) is a window function at position (x, y) whose physical meaning is the weight of each pixel in the window; in particular, the weighting adopted may be a Gaussian distribution centred at the window's centre. Obviously E(u, v) will be relatively large at a corner point, so a corner is relatively easy to track.
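As a concrete illustration of extracting corner points restricted to the candidate plane area, a sketch using OpenCV's Harris-based corner selection follows; all parameter values are assumptions, not values prescribed by the patent.

```python
import cv2
import numpy as np

def extract_plane_corners(first_frame_bgr, plane_mask, max_corners=200):
    """Return corner points inside the candidate plane mask of the first frame."""
    gray = cv2.cvtColor(first_frame_bgr, cv2.COLOR_BGR2GRAY)
    corners = cv2.goodFeaturesToTrack(
        gray,
        maxCorners=max_corners,
        qualityLevel=0.01,
        minDistance=7,
        mask=plane_mask.astype(np.uint8),   # only keep corners inside the candidate plane
        useHarrisDetector=True,             # Harris response, as described in the text
        k=0.04,
    )
    return corners  # N x 1 x 2 float32 array of (x, y) positions, or None if none found
```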
The aforementioned feature tracking has various implementations. A descriptor-based tracking mode, for example, generally uses information near the feature points to make their description more robust: SIFT descriptors compute the dominant orientation of the gradient histogram in the feature neighbourhood and build a scale space, so that the descriptors are scale- and rotation-invariant; pixel matching across different video frames can then be carried out with such descriptors.
Further, it is also possible to consider forward optical flow tracking of feature points (preferably corner points) along the time axis of a plurality of video frames. The main concept of tracking by optical flow method is that the pixel points have constant brightness (gray invariance) and spatial consistency in continuous video frames:
first, constant brightness means that for a corner point selected in the first frame image, say a pixel point P(x, y), its brightness remains essentially unchanged over a number of adjacent video frames or frames sampled at a certain time interval. Let I(x, y, t) denote the brightness I of pixel P(x, y) at time t; after a time dt the pixel may have moved to a new coordinate position (x+dx, y+dy), but because of constant brightness the following holds:

I(x, y, t) = I(x+dx, y+dy, t+dt)    (8)

Expanding the right-hand side of equation (8) to first order gives:

I(x+dx, y+dy, t+dt) ≈ I(x, y, t) + (∂I/∂x)·dx + (∂I/∂y)·dy + (∂I/∂t)·dt    (9)

On the premise of constant brightness, namely combining (8) and (9):

(∂I/∂x)·dx + (∂I/∂y)·dy + (∂I/∂t)·dt = 0    (10)

and further, dividing by dt:

(∂I/∂x)·(dx/dt) + (∂I/∂y)·(dy/dt) + ∂I/∂t = 0    (11)

Then, letting u = dx/dt, v = dy/dt and writing I_x = ∂I/∂x, I_y = ∂I/∂y, I_t = ∂I/∂t, we obtain:

I_x·u + I_y·v + I_t = 0    (12)

Regarding spatial consistency: since optical flow is based on temporal continuity, one condition for adopting optical flow tracking is that the motion of the object is assumed to be small, i.e. the extracted corner points do not move over a large range as time advances; otherwise, if a corner's displacement between frames is too large, the tracking cannot be considered effective. This is the spatial consistency property of the optical flow method, which means that all pixel points adjacent to an extracted corner point can be assumed to share the same displacement (u, v). For example, with a window of size 5*5, the 25 pixels within the window give, from equation (12) above, the following system for computing (u, v):

I_x(p_k)·u + I_y(p_k)·v = -I_t(p_k),  k = 1, 2, …, 25    (13)

For formula (13), the coordinate offset (u, v) per unit time can be obtained by the least squares method, giving the coordinate position of the corner to be tracked in the other video frames, i.e. the specific pixel point is tracked in the other frames.
For ease of understanding, the present invention provides in some preferred embodiments the logic for implementing optical flow forward tracking, as shown in FIG. 3, comprising:
step S310, selecting a certain current corner point as a central pixel point in the candidate plane area of the current video frame, and setting a first window containing a plurality of adjacent pixel points;
step S311, obtaining a plurality of candidate windows with the same size as the first window in the candidate plane area in the next frame of image which is sampled at the same interval or adjacent to the next frame according to the preset pixel displacement offset;
step S312, comparing the brightness of the pixel points contained in each candidate window with the brightness of the pixel points contained in the first window;
step 313, taking the candidate window which meets the preset brightness constant standard and has the minimum pixel displacement offset as a second window;
and step S314, taking the central pixel point of the second window as the corresponding pixel point of the tracked current corner point.
The current video frame in this example may be the aforementioned first frame image, or a video frame after the first frame, depending on where the tracking operation currently being performed starts; the invention does not limit this. It should be noted that tracking a corner on its own does not accurately reflect how pixels change as the video advances, and in particular makes it awkward to find the pixel point in a subsequent frame that corresponds to the current corner. Therefore, as described above, a first window centred on the current corner may be constructed in the current video frame, and then, in the candidate plane area of the next frame (adjacent, or sampled at equal intervals), several candidate windows of equal size are formed according to a preset pixel displacement offset (based on the optical flow assumption, the offset may be set relatively small). For example, the coordinates of the 25 pixels contained in a 5*5 first window may be shifted as a whole by 1 (or 2, etc.) pixel units in each of four directions, giving four corresponding 5*5 candidate windows; or the 49 pixels of a 7*7 first window may be shifted as a whole by 1 (or 2, etc.) pixel units in two directions. In this way several candidate windows corresponding to the first window are obtained in the next frame. "Next frame" here is not limited to the frame immediately adjacent to the current frame; it may also be chosen at a preset interval: if the video changes relatively quickly, the immediately adjacent frame can be taken as the next frame; if the video changes relatively slowly, the next frame can be taken at a preset sampling interval. The invention does not limit this.
The brightness of the pixels in each candidate window can then be compared with that of the pixels in the first window, and the target second window selected from the candidate windows according to two conditions. The first condition is constant brightness, i.e. the brightness of all (or most) pixels in a candidate window is consistent with that of the corresponding pixels in the first window; the second is that, among the windows with consistent brightness, the candidate window with the smallest pixel displacement offset is taken as the second window. For example, if after comparison one candidate window displaced by 3 pixel units towards the lower right matches the first window's brightness, and another candidate window whose abscissa is shifted right by 1 pixel unit also matches, the latter is preferred as the target (second) window, because with brightness constant and displacement smallest a more accurate corresponding pixel point is obtained. The centre pixel of the second window is then taken as the corresponding pixel point of the tracked current corner; in other words, the corner corresponding to the current corner of the current frame has been tracked in the next frame, and the position of that corresponding pixel point can be obtained from the coordinates of the current corner and the computed offset, as described above.
In this way all the corners selected in the current frame can be tracked. Because of the requirements of the optical flow assumptions, the number of corners may change after each round of tracking, i.e. corners that do not meet the requirements are filtered out. For example, 200 corners are extracted from the first frame image; the tracking from the first frame to the next frame (for convenience, the second frame) may lose 50 of them, i.e. 150 corresponding corners are tracked in the second frame; forward tracking then continues, and another 30 corners may be lost between the second and third frames, so only 120 corners corresponding to the current frame can still be tracked in the third frame of the video, and so on. This is one way of determining, in the foregoing step S32, the stable pixel points that can be tracked stably from among the specific pixel points, namely: tracking the pixel point corresponding to a current corner of the current video frame through subsequent video frames of a preset frame number, and determining that current corner to be a stable pixel point of the current video frame.
In actual operation, if a pixel point can be tracked N or more times in the video frames taken forward along the time axis, it can be considered a pixel point that can be tracked stably, where N is the preset frame number. For example, with N = 3, a corner A in the first frame image can be tracked on the second frame image by the optical flow method, i.e. a pixel A' corresponding to point A is found; likewise, the pixel A' in the second frame can still be tracked on the third frame image, i.e. an A'' corresponding to point A (via A') is found; point A can then be considered a pixel that can be tracked stably. In connection with the previous example, the 120 corner points of the current frame corresponding to the 120 pixel points found in the third frame may thus be defined as "stable pixel points".
Whether the automatically segmented candidate plane area is a reliable plane usable for subsequent mapping can then be judged from the stable pixel points. Specifically, the total number of pixel points qualifying as stable pixel points can be determined from the tracking results of all video frames participating in the tracking, and it is then checked whether this number meets a preset number threshold, for example a threshold of 100. If it does, the candidate plane area of the previous example is considered a reliable plane for implantation; otherwise the plane cannot be tracked stably, i.e. it is not available, and the user can be prompted to select a candidate plane area again, which is not repeated here. It should be noted, on the one hand, that the setting of the number threshold relates to the reliability of the subsequent image implantation: the higher the threshold, the more reliable the implantation, and vice versa. On the other hand, the reason this embodiment takes the number of stable pixel points as the criterion for whether the candidate plane area is available is that the design is built on the implementation logic of automatic plane detection: if the user selects points on the plane contour manually, or selects the plane to be implanted based on experience and interest, the selected plane is likely to miss key information, whereas the invention's combination of saliency detection and interactive point selection yields a candidate plane covering the full information, which markedly increases the number of reliable tracking points. Taking the number of stable pixel points as the factor for evaluating whether the candidate plane is available at this stage therefore fits the design rationale and the technical requirement of improving the stability of mapping onto object surfaces in video scenes.
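A sketch of this availability check using pyramidal Lucas-Kanade optical flow is given below; the window size, number of tracked frames and the threshold of 100 are the example values from the text or assumptions, not choices fixed by the patent.

```python
import cv2
import numpy as np

def candidate_plane_available(frames_gray, corners, n_frames=3, min_stable=100):
    """frames_gray: list of grayscale frames, frames_gray[0] being the first frame image."""
    lk_params = dict(winSize=(7, 7), maxLevel=2,
                     criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))
    pts = corners.astype(np.float32)
    prev = frames_gray[0]
    for cur in frames_gray[1:n_frames + 1]:
        pts, status, _err = cv2.calcOpticalFlowPyrLK(prev, cur, pts, None, **lk_params)
        pts = pts[status.ravel() == 1]      # drop corners that could not be tracked this round
        prev = cur
        if len(pts) == 0:
            break
    return len(pts) >= min_stable           # enough stably tracked corners -> plane is usable
```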
Finally, it may be added that, besides the foregoing manner of determining stable pixels by a preset number of forward-tracked frames, other preferred embodiments of the invention also consider another way of eliminating unstable pixels: reverse tracking verification may be performed using the corresponding pixel points tracked in the subsequent video frames, and the current corners that pass verification are determined to be stable pixel points of the current video frame, improving the robustness and stability of the candidate plane availability determination. In short, the idea is to use the forward tracking result to track back again in order to verify the stability of a pixel. It should be noted that the reverse tracking verification strategy can be used on its own to determine pixel stability, or can be considered together with the preceding consecutive-frame-count strategy; the invention does not limit this.
Regarding the implementation of the reverse trace verification, reference may be made to the following procedure shown in fig. 4:
step S321, tracking a first corresponding pixel point of a current corner point in a current video frame in a first subsequent frame adjacent to the current video frame or sampled at equal intervals;
Step S322, based on the first corresponding pixel point, acquiring a corresponding first inverse tracking angular point from the current video frame;
step S323, verifying the current corner according to the positional relationship between the first inverse tracking corner and the current corner.
For example, a corner point P1 is extracted from the current video frame and its corresponding pixel point P2 is tracked forward, by the optical flow method, in the adjacent next frame (or a next frame sampled at equal intervals). An inverse optical flow computation can then be performed from the obtained position of P2 (the optical flow tracking itself is as described above), yielding the corresponding point P1' of P2 back in the current frame. It can then be checked whether the coordinates of the first inverse tracking corner P1' and of the current corner P1 meet a preset distance criterion, so as to decide whether the current corner P1 passes verification. That is, if the distance between the pixel coordinates of P1' and P1 is within a preset threshold, P1 can be considered a corner that can be tracked stably; otherwise P1 is eliminated.
On the basis of this preferred embodiment, an even more robust scheme for eliminating unstable points can be designed. For example, the current corner P1 whose first inverse tracking corner P1' meets the distance criterion may be taken as a to-be-determined corner; a second corresponding pixel point P3 of this to-be-determined corner is then tracked in a second subsequent frame adjacent to the first subsequent frame (or sampled at equal intervals), and P3 is then tracked back to the current video frame to obtain the corresponding second inverse tracking corner P1''. If the coordinates of P1'' and of the current corner P1 also meet a set distance criterion, the current corner P1 is determined to pass verification.
For ease of understanding, an example: 200 corner points (P1) are selected in the current first frame, of which 150 are tracked forward into the second frame (P2); those 150 are tracked in reverse back into the first frame, giving 120 first inverse tracking corners (P1'), of which 100 meet the set distance criterion with respect to their corresponding P1, i.e. 100 to-be-determined corners are obtained at this point. These 100 to-be-determined corners then skip directly over the second frame and are tracked by optical flow into the third frame, giving 80 corners (P3); those 80 are tracked in reverse back into the first frame, giving 60 second inverse tracking corners (P1''); finally 50 of the P1'' meet the coordinate distance criterion, i.e. those 50 current corners pass verification and can be recognised as the stable pixel points.
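A sketch of one forward-backward round of this verification is shown below; the 1-pixel distance tolerance is an assumed value, not one prescribed by the patent.

```python
import cv2
import numpy as np

def forward_backward_verify(frame_a, frame_b, corners_a, max_dist=1.0):
    """Keep only corners of frame_a that track forward to frame_b and back within max_dist."""
    lk = dict(winSize=(7, 7), maxLevel=2,
              criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))
    p1 = corners_a.astype(np.float32)
    p2, st_fwd, _ = cv2.calcOpticalFlowPyrLK(frame_a, frame_b, p1, None, **lk)      # forward
    p1_back, st_bwd, _ = cv2.calcOpticalFlowPyrLK(frame_b, frame_a, p2, None, **lk)  # backward
    dist = np.linalg.norm(p1.reshape(-1, 2) - p1_back.reshape(-1, 2), axis=1)
    ok = (st_fwd.ravel() == 1) & (st_bwd.ravel() == 1) & (dist < max_dist)
    return corners_a[ok], ok    # verified (stable) corners and their boolean mask
```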
The following points about this preferred stable pixel point determination scheme should be noted.
First, the number of frames and the number of tracking passes involved in reverse verification are not limited in actual operation; the four passes completed within three frames above (current frame forward to the first subsequent frame, that frame back to the current frame, current frame forward to the second subsequent frame, that frame back to the current frame) are merely an illustrative example. In practice the second subsequent frame may stand for several further frames, with a correspondingly larger number of tracking passes and a more reliable judgement.
Secondly, it will be appreciated that repeated forward and backward tracking increases the amount of computation, so local verification can be considered in practice: rather than traversing and tracking the whole candidate plane area, local pixel points are examined according to the results already obtained in the reverse tracking, and the range can be constrained by adjusting the window size, for example shrinking the initial 7*7 window to 5*5 or 3*3 for the reverse verification. In any case, the idea of local examination (which may comprise various constraint means) can be used to balance the extra computation introduced by reverse tracking verification.
Continuing the previous step, returning to fig. 1, in step S4, mapping processing is performed with the candidate plane area determined to be available as a target plane.
After finally locking onto the plane area usable for mapping via the preceding embodiments, the image implantation can be completed by conventional mapping operations; this is described here in two stages: homography matrix computation and image implantation.
First stage: homography matrix computation
The homography matrix can be computed from the stable tracking points in each video frame, based on the reliable plane obtained in the previous steps, so that stable plane tracking is realized. It should be noted that the foregoing has mainly been concerned with determining whether a candidate plane area is available; once it is determined to be available, plane tracking may be performed according to the actual changes of the plane in each relevant video frame, and the plane tracking manner used here may also draw on the conclusions about stable pixel points and the processing ideas obtained in the foregoing procedure, which are not repeated in the present invention. For the calculation of the homography matrix, assume that the camera observes the same plane from two different positions; as shown in fig. 5, the normal vector of the plane in frame1 is N, and its distance from the origin of frame1 is d. Then, for any point in the plane, the transformation from frame1 to frame2 is:
X2 = R·X1 + T    (14)

where X2 and X1 are the 3D coordinates of the point in the frame2 and frame1 coordinate systems, respectively, and R and T are the rotation matrix and the translation, respectively. Because the point lies on the plane, N^T·X1 = d, i.e. N^T·X1/d = 1; substituting this into formula (14) gives

X2 = R·X1 + T·(N^T·X1/d) = (R + T·N^T/d)·X1

Therefore, the homography matrix can be obtained as

H = R + T·N^T/d
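As a quick numerical illustration (all pose values below are invented for the example), the following sketch builds H from an assumed R, T, N and d and confirms that it reproduces formula (14) for a point lying on the plane:

```python
import numpy as np

theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],    # small rotation about the Y axis
              [0.0,            1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
T = np.array([[0.2], [0.0], [0.05]])                   # translation from frame1 to frame2
N = np.array([[0.0], [0.0], [1.0]])                    # plane normal expressed in frame1
d = 2.0                                                # distance of the plane from frame1's origin

H = R + T @ N.T / d                                    # plane-induced homography

X1 = np.array([[0.5], [-0.3], [2.0]])                  # a 3D point on the plane (N^T X1 = d)
print(np.allclose(R @ X1 + T, H @ X1))                 # True: H reproduces formula (14)
```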
Thus, a homography matrix maps any point on one image to the corresponding point on the other image; that is, all matching points lying in the same plane in the two images can be represented by the same homography matrix H. Therefore, in a known video, after obtaining the correspondence relation of points between different frames based on the feature point tracking method (tracking here is matching, so the tracked points may also be referred to as matching points), the homography matrix can be solved from

x2 ≃ H·x1

where x1 and x2 are the homogeneous pixel coordinates of a pair of matching points and "≃" denotes equality up to a non-zero scale factor.
Since both sides of this equation may be multiplied by any non-zero constant and still hold, it is not difficult to see that the homography matrix has 8 degrees of freedom; and since each pair of matching points provides 2 constraints, at least 4 pairs of matching points are required to solve for H. In general, the number of matching points between adjacent frames is much greater than 4, so random sample consensus (RANSAC) can be used to filter out mismatches and obtain a good initial H, after which nonlinear optimization is used to solve H more accurately. Specifically, the error function to be optimized may minimize the symmetric transfer error:

min over H of Σ_i ( ||x'_i − π(H·x_i)||^2 + ||x_i − π(H^-1·x'_i)||^2 )
Considering that errors may exist in both frames of images, the error between the forward projection π(H·x_i) of a matching point x_i and its target matching point x'_i can be minimized (and the inverse matching error between π(H^-1·x'_i) and x_i may also be included), where π(·) denotes the projection from homogeneous coordinates back to pixel coordinates. Optimizing H with this minimization function can significantly improve the accuracy with which the H matrix is solved.
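For instance, a possible realisation of this estimation step (the function names and the RANSAC threshold are illustrative assumptions) could rely on OpenCV's RANSAC-based homography fitting and then evaluate the symmetric transfer error that the nonlinear refinement is meant to minimize:

```python
import cv2
import numpy as np

def estimate_homography(pts1, pts2, ransac_thresh=3.0):
    """pts1, pts2: (N, 2) float32 arrays of matched (tracked) points, N >= 4."""
    H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, ransac_thresh)
    return H, mask.ravel().astype(bool)                # H plus the RANSAC inlier mask

def symmetric_transfer_error(H, pts1, pts2):
    """Sum over matches of ||x2 - pi(H x1)||^2 + ||x1 - pi(H^-1 x2)||^2."""
    fwd = cv2.perspectiveTransform(pts1.reshape(-1, 1, 2), H).reshape(-1, 2)
    bwd = cv2.perspectiveTransform(pts2.reshape(-1, 1, 2), np.linalg.inv(H)).reshape(-1, 2)
    return np.sum((pts2 - fwd) ** 2) + np.sum((pts1 - bwd) ** 2)
```

A nonlinear optimizer (for example Levenberg-Marquardt over the 8 free parameters of H) would then lower this error starting from the RANSAC estimate.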
Second stage: image implantation
After the plane detection, feature tracking, plane availability determination and homography matrix computation described above are completed, mapping processing may be performed in the target plane according to the size of the image or video that the user wishes to implant. The mapping operation may automatically match a region in the target plane corresponding to the size of the material, i.e. adaptively implant the image or video according to the size information of the material to be implanted and the size information of the target plane; for example, if the area of the target plane is too small, the size of the material to be implanted is automatically adjusted. The implantation means includes, but is not limited to, image fusion schemes such as Poisson fusion. In addition, the mapping operation may be based on a further interactive operation by the user, for example, but not limited to, acquiring the 4 point positions of a specific shape (a rectangle) selected by the user in the target plane and implanting the image or video material into the selected rectangle. The specific mapping means may be chosen according to actual needs and is not particularly limited by the present invention.
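By way of example only, one way of realising this mapping step, assuming the user has picked four rectangle corners in the target plane, is to warp the material into those corners and blend it with Poisson (seamless) cloning; the function and parameter names below are illustrative assumptions:

```python
import cv2
import numpy as np

def implant(frame, material, plane_corners):
    """frame, material: 8-bit BGR images; plane_corners: (4, 2) float32 corners in the
    frame, ordered top-left, top-right, bottom-right, bottom-left."""
    h, w = material.shape[:2]
    src = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    M = cv2.getPerspectiveTransform(src, np.float32(plane_corners))
    warped = cv2.warpPerspective(material, M, (frame.shape[1], frame.shape[0]))

    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, np.int32(plane_corners), 255)    # region receiving the material
    x, y, bw, bh = cv2.boundingRect(np.int32(plane_corners))
    center = (int(x + bw / 2), int(y + bh / 2))
    return cv2.seamlessClone(warped, frame, mask, center, cv2.NORMAL_CLONE)
```

In the adaptive variant described above, the material would first be resized according to the relative areas of the material and the target plane before being warped.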
In summary, the idea of the invention is that, while still allowing the user to participate in the interaction, the user is not required to select the plane through complex, specialized operations; instead, the area to be mapped is finally locked through two automatic stages. Specifically, the first stage preliminarily selects the candidate plane area by combining a video object plane detection technique with a simple interactive operation by the user, and the second stage judges, on this basis, whether the candidate plane area is available according to the image features of the candidate plane area in multiple frames of images. Compared with the prior art, these two stages greatly simplify the user interaction flow, obtain the feature information of the plane area to be mapped comprehensively and reliably, and are not limited by non-professional, inaccurate hand-selected regions. Furthermore, the availability of the candidate plane area is screened automatically and efficiently from its image features across multiple video frames, i.e. the reliability of the implantation is judged using the information of the images themselves, which effectively avoids deviations in the processing effect caused by a lack of background knowledge of the relevant technology. The method and the device can therefore determine the required plane to be mapped efficiently and conveniently while optimizing the user experience, and thereby greatly improve the stability and effectiveness of the image implantation.
Corresponding to the above embodiments and preferred solutions, the present invention further provides an embodiment of a video-based object plane mapping device, as shown in fig. 6, which may specifically include the following components:
the input module 1 is used for acquiring a first frame image of a video to be processed set by a user and at least one target pixel point clicked in the first frame image by the user;
a candidate plane area detection module 2, configured to detect a candidate plane area from the first frame image according to the first frame image, the position information of the target pixel point, and a plane detection model that is constructed in advance;
a target plane screening module 3, configured to determine whether the candidate plane area is available based on image features of the candidate plane area in the first frame image and subsequent multiple frame images;
and the mapping module 4 is used for performing mapping processing by taking the candidate plane area determined to be available as a target plane.
In at least one possible implementation manner, the target plane screening module includes:
the pixel tracking sub-module is used for acquiring specific pixel points in the candidate plane area and tracking the specific pixel points in a plurality of video frames;
a stable pixel determination submodule, configured to determine a plurality of stable pixel points that can be stably tracked from the specific pixel points;
And the target plane judging sub-module is used for judging whether the candidate plane area is available or not according to the number of the stable pixel points and a preset number threshold value.
In at least one possible implementation, the pixel tracking submodule includes:
the corner extraction unit is used for extracting a plurality of corners of the candidate plane area from the first frame image;
and the corner tracking unit is used for tracking each corner in a plurality of video frames according to the image characteristics of the corner.
In at least one possible implementation manner, the corner tracking unit is specifically configured to perform forward optical flow tracking on each of the corners along a time axis of a plurality of video frames.
In at least one possible implementation manner, the corner tracking unit includes the following components (a brief sketch of this window-based tracking is given after the list):
the first window setting component is used for selecting a certain current corner point as a central pixel point in the candidate plane area of the current video frame, and setting a first window comprising a plurality of adjacent pixel points;
a candidate window construction component, configured to obtain, in the next frame image adjacent to the current frame or sampled at equal intervals, a plurality of candidate windows with the same size as the first window in the candidate plane area, according to a preset pixel displacement offset;
The brightness comparison component is used for comparing the brightness of the pixel points contained in each candidate window with the brightness of the pixel points contained in the first window;
the second window determining component is used for taking the candidate window which meets the preset brightness constant standard and has the minimum pixel displacement offset as a second window;
and the tracking result determining component is used for taking the central pixel point of the second window as the corresponding pixel point of the tracked current corner point.
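A minimal sketch of these components is given below; the 7×7 window, the ±3 pixel search range and the sum-of-absolute-differences threshold are assumptions made for illustration, the frames are assumed to be single-channel grayscale images, and image border handling is omitted.

```python
import numpy as np

def track_corner(cur_gray, next_gray, corner, half_win=3, max_offset=3, sad_thresh=200.0):
    """Return the tracked position of `corner` (x, y) in next_gray, or None."""
    x, y = corner
    first = cur_gray[y - half_win:y + half_win + 1,
                     x - half_win:x + half_win + 1].astype(np.float32)   # first window
    best, best_off = None, None
    for dy in range(-max_offset, max_offset + 1):                        # candidate windows
        for dx in range(-max_offset, max_offset + 1):
            cand = next_gray[y + dy - half_win:y + dy + half_win + 1,
                             x + dx - half_win:x + dx + half_win + 1].astype(np.float32)
            if cand.shape != first.shape:                                # window fell off the image
                continue
            sad = np.abs(cand - first).sum()                             # brightness comparison
            off = abs(dx) + abs(dy)
            if sad < sad_thresh and (best_off is None or off < best_off):
                best, best_off = (x + dx, y + dy), off                   # second window centre
    return best                                                          # tracked pixel point
```

In practice a pyramidal Lucas-Kanade tracker would usually replace this brute-force search; the sketch only mirrors the component decomposition listed above.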
In at least one possible implementation manner, the stable pixel determination submodule includes:
the first stable pixel determining unit is used for tracking the corresponding pixel point of the current corner point in the current video frame in the subsequent video frames of the preset frame number, and determining the current corner point as the stable pixel point of the current video frame;
and/or a second stable pixel determination unit including:
the reverse verification subunit is used for performing reverse tracking verification by utilizing the corresponding pixel points tracked in the subsequent video frames;
and the stable pixel determination subunit is used for determining the current corner passing verification as a stable pixel point of the current video frame.
In at least one possible implementation thereof, the reverse authentication subunit includes:
The forward tracking component is used for tracking a first corresponding pixel point of a current corner point in the current video frame in a first subsequent frame adjacent to the current video frame or sampled at equal intervals;
the inverse tracking component is used for acquiring a corresponding first inverse tracking angular point from the current video frame based on the first corresponding pixel point;
and the verification component is used for verifying the current corner point according to the position relation between the first inverse tracking corner point and the current corner point.
In at least one possible implementation manner, the verification component is specifically configured to determine that the current corner is verified when the coordinates of the first inverse tracking corner and the coordinates of the current corner meet a preset distance criterion;
alternatively, the verification component specifically includes:
a to-be-determined angular point determining sub-component, configured to use a current angular point corresponding to the first inverse tracking angular point that meets the distance criterion as a to-be-determined angular point;
a forward tracking subassembly for tracking to a second corresponding pixel point relative to the pending corner point in a second subsequent frame adjacent to or equally spaced from the first subsequent frame;
the inverse tracking subassembly is used for acquiring a corresponding second inverse tracking angular point from the current video frame based on the second corresponding pixel point;
And the stability verification sub-component is used for determining that the current corner point passes verification when the coordinates of the second inverse tracking corner point and the coordinates of the current corner point meet the distance standard.
It should be understood that the division of the components in the video-based object plane mapping apparatus shown in fig. 6 is merely a division by logical function; in actual implementation they may be fully or partially integrated into one physical entity or may be physically separated. These components may all be implemented in the form of software invoked by a processing element, or all in hardware, or some in the form of software invoked by a processing element and some in hardware. For example, some of the above modules may be separately established processing elements, or may be integrated in a chip of the electronic device; the implementation of the other components is similar. In addition, all or some of these components may be integrated together or implemented independently. In implementation, each step of the above method, or each of the above components, may be completed by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.
For example, the above components may be one or more integrated circuits configured to implement the above methods, such as one or more application-specific integrated circuits (Application Specific Integrated Circuit; hereinafter ASIC), one or more microprocessors (Digital Signal Processor; hereinafter DSP), or one or more field programmable gate arrays (Field Programmable Gate Array; hereinafter FPGA), etc. For another example, these components may be integrated together and implemented in the form of a System-On-a-Chip (SOC).
In view of the foregoing examples and preferred embodiments thereof, it will be appreciated by those skilled in the art that in actual operation, the technical concepts of the present invention may be applied to various embodiments, and the present invention is schematically illustrated by the following carriers:
(1) An object plane mapping device based on video. The device may specifically include: one or more processors, memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions, which when executed by the device, cause the device to perform the steps/functions of the foregoing embodiments or equivalent implementations.
Fig. 7 is a schematic structural diagram of an embodiment of a video-based object plane mapping device according to the present invention, where the device may be a server, a desktop PC, a notebook computer, a smart terminal, etc. (e.g., but not limited to, a mobile phone, a tablet, a reader, a learning machine, a recording pen, a sound box, a reading light, a watch, glasses, etc.).
As shown in fig. 7, the video-based object plane mapping apparatus 900 includes a processor 910 and a memory 930. The processor 910 and the memory 930 may communicate with each other via an internal connection to transfer control and/or data signals; the memory 930 is configured to store a computer program, and the processor 910 is configured to call and execute the computer program from the memory 930. The processor 910 and the memory 930 may be combined into a single processing device, although more commonly they are components independent of each other, with the processor 910 executing the program code stored in the memory 930 to realize the above functions. In particular, the memory 930 may also be integrated within the processor 910 or be separate from the processor 910.
In addition, to further improve the functionality of the video-based object plane mapping device 900, the device 900 may further comprise one or more of an input unit 960, a display unit 970, audio circuitry 980, a camera 990, and a sensor 901, etc., which may further comprise a speaker 982, a microphone 984, etc. Wherein the display unit 970 may include a display screen.
Further, the apparatus 900 may also include a power supply 950 for providing electrical power to various devices or circuits in the apparatus 900.
It should be appreciated that the operation and/or function of the various components in the apparatus 900 may be found in particular in the foregoing description of embodiments of the method, system, etc., and detailed descriptions thereof are omitted here as appropriate to avoid redundancy.
It should be appreciated that the processor 910 in the video-based object plane mapping apparatus 900 shown in fig. 7 may be a system-on-chip (SOC), and the processor 910 may include a central processing unit (Central Processing Unit; hereinafter referred to as CPU) and may further include other types of processors, for example a graphics processor (Graphics Processing Unit; hereinafter referred to as GPU), etc., as described further below.
In general, portions of the processors or processing units within the processor 910 may cooperate to implement the preceding method flows, and corresponding software programs for the portions of the processors or processing units may be stored in the memory 930.
(2) A readable storage medium having stored thereon a computer program or the above-mentioned means, which when executed, causes a computer to perform the steps/functions of the foregoing embodiments or equivalent implementations.
In the several embodiments provided by the present invention, any function, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, may be embodied in the form of the software product described below.
(3) A computer program product (which may comprise the apparatus described above) which, when run on a terminal device, causes the terminal device to perform the video-based object plane mapping method of the preceding embodiment or equivalent implementation.
From the above description of the embodiments, it will be apparent to those skilled in the art that all or part of the steps of the above methods may be implemented by software plus a necessary general-purpose hardware platform. Based on such an understanding, the above computer program product may include, but is not limited to, an APP. In connection with the foregoing, the device/terminal may be a computer device, and the hardware structure of the computer device may further include: at least one processor, at least one communication interface, at least one memory and at least one communication bus; the processor, the communication interface and the memory can all communicate with each other through the communication bus. The processor may be a central processing unit (CPU), a DSP, a microcontroller or a digital signal processor, and may further include a GPU, an embedded neural-network processor (Neural-network Processing Units; hereinafter referred to as NPU) and an image signal processor (Image Signal Processing; hereinafter referred to as ISP); the processor may further include an ASIC (application-specific integrated circuit) or one or more integrated circuits configured to implement the embodiments of the present invention. In addition, the processor may have the function of running one or more software programs, and the software programs may be stored in a storage medium such as the memory. The aforementioned memory/storage medium may include: a nonvolatile memory (non-volatile Memory), such as a non-removable magnetic disk, a USB flash disk, a removable hard disk, an optical disk, and the like, as well as a Read-Only Memory (ROM), a Random Access Memory (Random Access Memory; hereinafter referred to as RAM), and the like.
In the embodiments of the present invention, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association relation between associated objects and indicates that three relations may exist; for example, A and/or B may indicate that A exists alone, A and B exist together, or B exists alone, where A and B may each be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of these items, including any combination of single items or plural items. For example, at least one of a, b and c may represent: a; b; c; a and b; a and c; b and c; or a, b and c, where a, b and c may each be single or multiple.
Those of skill in the art will appreciate that the various modules, units, and method steps described in the embodiments disclosed herein can be implemented in electronic hardware, computer software, and combinations of electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Each embodiment in this specification is described in a progressive manner, and for identical or similar parts between the embodiments, reference may be made to one another. In particular, for the apparatus and device embodiments, since they are substantially similar to the method embodiments, their description is relatively simple, and for relevant parts reference may be made to the description of the method embodiments. The above-described embodiments of apparatus and devices are merely illustrative; the modules and units illustrated as separate components may or may not be physically separate, i.e. they may be located in one place or distributed over multiple places, for example over the nodes of a system network. Some or all of the modules and units may be selected according to actual needs to achieve the purpose of the embodiment's solution. Those skilled in the art can understand and implement this without creative effort.
The construction, features and effects of the present invention have been described in detail above according to the embodiments shown in the drawings, but the above are only preferred embodiments of the present invention. It should be understood that those skilled in the art may reasonably combine and match the technical features of the above embodiments and their preferred modes into various equivalent schemes without departing from or changing the design concept and technical effects of the present invention; therefore, the present invention is not limited to the embodiments shown in the drawings, and all changes made according to the concept of the present invention, or modifications into equivalent embodiments, which do not depart from the spirit covered by the specification and drawings, shall fall within the protection scope of the present invention.

Claims (7)

1. A video-based object plane mapping method, comprising:
acquiring a first frame image of a video to be processed set by a user, and at least one target pixel point clicked by the user in the first frame image;
detecting a candidate plane area from the first frame image according to the first frame image, the position information of the target pixel point and a pre-constructed plane detection model;
based on the image characteristics of the candidate plane area in the first frame image and the subsequent multi-frame images, determining whether the candidate plane area is available comprises: extracting a plurality of corner points of the candidate plane area from the first frame image; tracking each corner in a plurality of video frames according to the image characteristics of the corner; tracking a corresponding pixel point of a current corner point in a current video frame in a follow-up video frame of a preset frame number, determining the current corner point as a stable pixel point of the current video frame, and/or performing reverse tracking verification by utilizing the corresponding pixel point tracked in the follow-up video frame, determining the current corner point passing verification as the stable pixel point of the current video frame, and judging whether the candidate plane area is available or not according to the number of the stable pixel points and a preset number threshold;
And mapping the candidate plane area which is judged to be available as a target plane.
2. The video-based object plane mapping method according to claim 1, wherein tracking each corner in a plurality of video frames according to the image features of the corner comprises:
forward optical flow tracking is performed for each of the corner points along the time axis of the plurality of video frames.
3. The video-based object plane mapping method of claim 2, wherein the forward optical flow tracking of each of the corner points along the time axis of a plurality of video frames comprises:
selecting a certain current corner point as a central pixel point in the candidate plane area of the current video frame, and setting a first window containing a plurality of adjacent pixel points;
obtaining a plurality of candidate windows with the same size as the first window in the candidate plane area in the next frame of images which are adjacent or sampled at equal intervals according to the preset pixel displacement offset;
comparing the brightness of the pixel points contained in each candidate window with the brightness of the pixel points contained in the first window;
taking the candidate window which meets the preset brightness constant standard and has the minimum pixel displacement offset as a second window;
And taking the central pixel point of the second window as the corresponding pixel point of the tracked current corner point.
4. The method of claim 1, wherein the performing inverse tracking verification using the corresponding pixels tracked in the subsequent video frame comprises:
tracking a first corresponding pixel point of a current corner point in the current video frame in a first subsequent frame adjacent to the current video frame or sampled at equal intervals;
acquiring a corresponding first inverse tracking corner point from a current video frame based on the first corresponding pixel point;
and verifying the current corner according to the position relation between the first inverse tracking corner and the current corner.
5. The video-based object plane mapping method according to claim 4, wherein verifying the current corner according to the positional relationship of the first inverse tracking corner and the current corner comprises:
when the coordinates of the first inverse tracking angular point and the coordinates of the current angular point meet a preset distance standard, determining that the current angular point passes verification;
or alternatively
Taking the current corner corresponding to the first inverse tracking corner meeting the distance standard as a to-be-determined corner;
Tracking a second corresponding pixel point relative to the to-be-determined corner point in a second subsequent frame adjacent to the first subsequent frame or sampled at equal intervals;
based on the second corresponding pixel points, acquiring corresponding second inverse tracking corner points from the current video frame;
and when the coordinates of the second inverse tracking angular point and the coordinates of the current angular point meet the distance standard, determining that the current angular point passes verification.
6. An object plane mapping apparatus based on video, comprising:
the input module is used for acquiring a first frame image of the video to be processed set by a user and at least one target pixel point clicked in the first frame image by the user;
the candidate plane area detection module is used for detecting a candidate plane area from the first frame image according to the first frame image, the position information of the target pixel point and a pre-constructed plane detection model;
the target plane screening module is configured to determine whether the candidate plane area is available based on image features of the candidate plane area in the first frame image and subsequent multiple frame images, and includes: extracting a plurality of corner points of the candidate plane area from the first frame image; tracking each corner in a plurality of video frames according to the image characteristics of the corner; tracking a corresponding pixel point of a current corner point in a current video frame in a follow-up video frame of a preset frame number, determining the current corner point as a stable pixel point of the current video frame, and/or performing reverse tracking verification by utilizing the corresponding pixel point tracked in the follow-up video frame, determining the current corner point passing verification as the stable pixel point of the current video frame, and judging whether the candidate plane area is available or not according to the number of the stable pixel points and a preset number threshold;
And the mapping module is used for performing mapping processing by taking the candidate plane area determined to be available as a target plane.
7. A video-based object plane mapping apparatus, comprising:
one or more processors, memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions, which when executed by the device, cause the device to perform the video-based object plane mapping method of any of claims 1-5.
CN202011566751.1A 2020-12-25 2020-12-25 Object plane mapping method, device and equipment based on video Active CN112712571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011566751.1A CN112712571B (en) 2020-12-25 2020-12-25 Object plane mapping method, device and equipment based on video


Publications (2)

Publication Number Publication Date
CN112712571A CN112712571A (en) 2021-04-27
CN112712571B true CN112712571B (en) 2023-12-01

Family

ID=75546777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011566751.1A Active CN112712571B (en) 2020-12-25 2020-12-25 Object plane mapping method, device and equipment based on video

Country Status (1)

Country Link
CN (1) CN112712571B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113345022B (en) * 2021-07-05 2023-02-17 湖南快乐阳光互动娱乐传媒有限公司 Dynamic three-dimensional advertisement implanting method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104394313A (en) * 2014-10-27 2015-03-04 成都理想境界科技有限公司 Special effect video generating method and device
CN105374049A (en) * 2014-09-01 2016-03-02 浙江宇视科技有限公司 Multi-angle-point tracking method based on sparse optical flow method and apparatus thereof
WO2016131300A1 (en) * 2015-07-22 2016-08-25 中兴通讯股份有限公司 Adaptive cross-camera cross-target tracking method and system
CN106611412A (en) * 2015-10-20 2017-05-03 成都理想境界科技有限公司 Map video generation method and device
CN109461174A (en) * 2018-10-25 2019-03-12 北京陌上花科技有限公司 Video object area tracking method and video plane advertisement method for implantation and system
CN110458862A (en) * 2019-05-22 2019-11-15 西安邮电大学 A kind of motion target tracking method blocked under background
CN111739064A (en) * 2020-06-24 2020-10-02 中国科学院自动化研究所 Method for tracking target in video, storage device and control device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200404241A1 (en) * 2019-06-20 2020-12-24 At&T Intellectual Property I, L.P. Processing system for streaming volumetric video to a client device


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MSST-ResNet: Deep multi-scale spatiotemporal features for robust visual object tracking;Bing Liu等;《Knowledge-Based Systems》;全文 *
基于iOS系统的短视频贴图应用的设计与实现;杨寒;《中国优秀硕士学位论文全文数据库 信息科技辑》;第2017年卷(第6期);全文 *
基于生物视觉特征和视觉心理学的视频显著性检测算法;方志明;崔荣一;金璇;;物理学报(第10期);全文 *
面部视频贴图特效生成工具的设计与实现;韩萌萌;《中国优秀硕士学位论文全文数据库 信息科技辑》;第2018年卷(第5期);全文 *

Also Published As

Publication number Publication date
CN112712571A (en) 2021-04-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant