WO2022193180A1 - Video frame processing method and apparatus (视频帧处理方法和装置)

Video frame processing method and apparatus

Info

Publication number
WO2022193180A1
Authority
WO
WIPO (PCT)
Prior art keywords
current frame
frame
information
image data
attribute
Application number
PCT/CN2021/081341
Other languages
English (en)
French (fr)
Inventor
李伟
金明磊
林天鹏
刘宇
丁宗合
占云龙
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to CN202180086746.7A (published as CN116671099A)
Priority to PCT/CN2021/081341
Publication of WO2022193180A1


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/20 - Image signal generators
    • H04N 13/275 - Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals

Definitions

  • the present application relates to the technical field of image processing, and in particular, to a video frame processing method and a video frame processing device.
  • The principle of existing video frame processing technology is to apply various image processing algorithms to a video frame on the basis of that frame's original image data.
  • Because the data basis of existing video frame processing technology is only the original image data of the video frame itself, in some cases an image processing algorithm processes the same dimension (such as color or brightness) of adjacent frames inconsistently (i.e., temporal-domain inconsistency). For example, the brightness of the same object (such as a wall) differs between adjacent frames.
  • The present application provides a video frame processing method and a video frame processing device, which are used to eliminate inconsistent processing by image processing algorithms of the same dimension across adjacent frames (i.e., temporal-domain inconsistency).
  • The attribute information of the current frame obtained through the three-dimensional perception map is calculated based on the first video frame; that is, the attribute information of the current frame carries information from the video frames before the current frame. Obtaining the target image data of the current frame through this attribute information can therefore be understood as obtaining the target image data of the current frame in combination with the information of the video frames before the current frame, which avoids inconsistent processing of the same dimension across adjacent frames by the image processing algorithm (i.e., temporal-domain inconsistency) and improves the display effect of the video.
  • The three-dimensional perception map is a three-dimensional map of the first scene and is used to indicate at least the image attributes of the first video frame. It therefore describes not only the spatial location information of the first scene but also the image attributes of the first video frame; carrying such rich and diverse information improves the robustness of the video frame processing method.
  • Obtaining the target image data of the current frame according to the attribute information of the current frame includes: determining shooting parameters of the camera according to the attribute information of the current frame, and shooting the current frame with the camera based on the shooting parameters, the first attitude information and the first field of view to obtain the target image data of the current frame; or obtaining the original image data of the current frame, and processing the original image data of the current frame according to the attribute information of the current frame to obtain the target image data of the current frame.
  • Obtaining the target image data of the current frame according to the attribute information of the current frame includes: obtaining attribute information of a reference frame in the three-dimensional perception map according to second attitude information and the first field of view, wherein the second attitude information is the attitude information of the camera when shooting the reference frame, and the reference frame is a video frame before the current frame; calculating the similarity between the current frame and the reference frame according to the attribute information of the reference frame and the attribute information of the current frame; and, if the similarity is greater than a preset similarity, obtaining the target image data of the current frame according to the attribute information of the current frame.
  • the target image data of the current frame is obtained according to the attribute information of the current frame, which is equivalent to determining that the current frame and the reference frame should be consistent.
  • the target image data of the current frame is obtained according to the attribute information of the current frame, which improves the application accuracy of the video frame processing method, and also increases the practicability of the video frame processing method in different scenarios.
  • Obtaining the target image data of the current frame according to the attribute information of the current frame includes: determining the attribute difference between the current frame and the reference frame according to the attribute information of the current frame and the attribute information of the reference frame; and obtaining the target image data of the current frame according to the attribute difference.
  • Obtaining the target image data of the current frame according to the attribute difference includes: determining shooting parameters of the camera according to the attribute difference, and shooting the current frame with the camera based on the shooting parameters, the first attitude information and the first field of view to obtain the target image data of the current frame; or obtaining the original image data of the current frame, and processing the original image data of the current frame according to the attribute difference to obtain the target image data of the current frame.
  • The three-dimensional perception map includes a plurality of first feature points. Obtaining the attribute information of the current frame in the three-dimensional perception map includes: determining, among the plurality of first feature points, the second feature point corresponding to the current frame, wherein the second feature point corresponding to the current frame is the first feature point captured by the camera based on the first attitude information and the first field of view; determining the image attribute of the current frame according to the image attribute of the first video frame; and determining the attribute information of the current frame according to the position of the second feature point corresponding to the current frame and the image attribute of the current frame.
  • The three-dimensional perception map includes a plurality of first feature points. Obtaining the attribute information of the reference frame in the three-dimensional perception map includes: determining, among the plurality of first feature points, the second feature point corresponding to the reference frame, wherein the second feature point corresponding to the reference frame is the first feature point captured by the camera based on the second attitude information and the first field of view; determining the image attribute of the reference frame according to the image attribute of the first video frame; and determining the attribute information of the reference frame according to the position of the second feature point corresponding to the reference frame and the image attribute of the reference frame.
  • The method further includes: constructing the three-dimensional perception map according to the target image data of the first video frame; or constructing the three-dimensional perception map according to the target image data of the first video frame and first information obtained by a first sensor, wherein the first information includes motion information when the camera captures the first video frame and/or spatial ranging information of the first scene.
  • The three-dimensional perception map includes a plurality of first feature points. Constructing the three-dimensional perception map according to the target image data of the first video frame includes: performing image analysis on the target image data of the first video frame to obtain the image attributes of the first video frame; performing spatial analysis on the target image data of the first video frame to obtain the positions of the plurality of first feature points and the pose sequence in which the camera captures the first video frame; and constructing the three-dimensional perception map according to the image attributes of the first video frame, the positions of the plurality of first feature points, and the pose sequence.
  • The three-dimensional perception map includes a plurality of first feature points. Constructing the three-dimensional perception map according to the target image data of the first video frame and the first information obtained by the first sensor includes: performing image analysis on the target image data of the first video frame to obtain the image attributes of the first video frame; performing spatial analysis on the target image data of the first video frame in combination with the first information to obtain the positions of the plurality of first feature points and the pose sequence in which the camera captures the first video frame; and constructing the three-dimensional perception map according to the image attributes of the first video frame, the positions of the plurality of first feature points, and the pose sequence.
  • The method further includes: updating the three-dimensional perception map according to the target image data of the current frame; or updating the three-dimensional perception map according to the target image data of the current frame and second information obtained by the first sensor, where the second information includes motion information when the camera captures the current frame and/or spatial ranging information of the real scene indicated by the current frame.
  • The present application further provides a video frame processing device, including: a first obtaining module, configured to obtain attribute information of a current frame in a three-dimensional perception map according to first attitude information and a first field of view, wherein the first attitude information is the attitude information of the camera when shooting the current frame, and the first field of view is the field of view of the camera; and a second obtaining module, configured to obtain target image data of the current frame according to the attribute information of the current frame; wherein the three-dimensional perception map is a three-dimensional map of a first scene and is used to indicate at least an image attribute of a first video frame, the first scene is the real scene indicated by the first video frame, the first video frame is a video frame for constructing the three-dimensional perception map, the first video frame and the current frame are video frames in the video captured by the camera, and the first video frame is located before the current frame.
  • The second obtaining module is specifically configured to: determine the shooting parameters of the camera according to the attribute information of the current frame, and shoot the current frame with the camera based on the shooting parameters, the first attitude information and the first field of view to obtain the target image data of the current frame; or obtain the original image data of the current frame, and process the original image data of the current frame according to the attribute information of the current frame to obtain the target image data of the current frame.
  • The second obtaining module is specifically configured to: obtain attribute information of the reference frame in the three-dimensional perception map according to the second attitude information and the first field of view, wherein the second attitude information is the attitude information of the camera when shooting the reference frame, and the reference frame is a video frame before the current frame; calculate the similarity between the current frame and the reference frame according to the attribute information of the reference frame and the attribute information of the current frame; and, if the similarity is greater than a preset similarity, obtain the target image data of the current frame according to the attribute information of the current frame.
  • The second obtaining module obtains the target image data of the current frame in the following manner: determining the attribute difference between the current frame and the reference frame according to the attribute information of the current frame and the attribute information of the reference frame; and obtaining the target image data of the current frame according to the attribute difference.
  • The second obtaining module obtains the target image data of the current frame in the following manner: determining the shooting parameters of the camera according to the attribute difference, and shooting the current frame with the camera based on the shooting parameters, the first attitude information and the first field of view to obtain the target image data of the current frame; or obtaining the original image data of the current frame, and processing the original image data of the current frame according to the attribute difference to obtain the target image data of the current frame.
  • The three-dimensional perception map includes a plurality of first feature points. The first obtaining module is specifically configured to: determine, among the plurality of first feature points, the second feature point corresponding to the current frame, wherein the second feature point corresponding to the current frame is the first feature point captured by the camera based on the first attitude information and the first field of view; determine the image attribute of the current frame according to the image attribute of the first video frame; and determine the attribute information of the current frame according to the position of the second feature point corresponding to the current frame and the image attribute of the current frame.
  • The three-dimensional perception map includes a plurality of first feature points. The second obtaining module specifically obtains the attribute information of the reference frame in the following manner: determining, from the plurality of first feature points, the second feature point corresponding to the reference frame; determining the image attribute of the reference frame according to the image attribute of the first video frame; and determining the attribute information of the reference frame according to the position of the second feature point corresponding to the reference frame and the image attribute of the reference frame.
  • The first obtaining module is further configured to: construct the three-dimensional perception map according to the target image data of the first video frame; or construct the three-dimensional perception map according to the target image data of the first video frame and the first information obtained by the first sensor, wherein the first information includes motion information when the camera captures the first video frame and/or spatial ranging information of the first scene.
  • The three-dimensional perception map includes a plurality of first feature points. The first obtaining module constructs the three-dimensional perception map in the following manner: performing image analysis on the target image data of the first video frame to obtain the image attributes of the first video frame; performing spatial analysis on the target image data of the first video frame to obtain the positions of the plurality of first feature points and the pose sequence in which the camera captures the first video frame; and constructing the three-dimensional perception map according to the image attributes of the first video frame, the positions of the plurality of first feature points, and the pose sequence.
  • The three-dimensional perception map includes a plurality of first feature points. The first obtaining module constructs the three-dimensional perception map in the following manner: performing image analysis on the target image data of the first video frame to obtain the image attributes of the first video frame; performing spatial analysis on the target image data of the first video frame in combination with the first information to obtain the positions of the plurality of first feature points and the pose sequence in which the camera captures the first video frame; and constructing the three-dimensional perception map according to the image attributes of the first video frame, the positions of the plurality of first feature points, and the pose sequence.
  • The first obtaining module is further configured to: update the three-dimensional perception map according to the target image data of the current frame; or update the three-dimensional perception map according to the target image data of the current frame and the second information obtained by the first sensor, where the second information includes motion information when the camera captures the current frame and/or spatial ranging information of the real scene indicated by the current frame.
  • The present application further provides a computer-readable storage medium in which instructions are stored; when the instructions are executed on a computer or a processor, the computer or the processor is caused to execute the method of any one of the foregoing aspects.
  • The present application further provides a computer program product comprising instructions which, when run on a computer or a processor, cause the computer or the processor to perform the method of any one of the first aspect.
  • In a fifth aspect, the present application provides a chip comprising a processor and a memory, the memory being used to store a computer program, and the processor being used to call and run the computer program stored in the memory, so as to execute the method described in any one of the first aspect.
  • FIG. 1 is a schematic flowchart of a video frame processing method provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of constructing a three-dimensional perception map in a first manner according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of carrying an image attribute of a first video frame in a three-dimensional map according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of constructing a three-dimensional perception map in a second manner according to an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of obtaining attribute information of a current frame according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of determining the attribute difference between a reference frame and a current frame provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of managing the size of a three-dimensional perception map according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a video frame processing apparatus according to an embodiment of the present application.
  • "At least one (item)" refers to one or more, and "a plurality" refers to two or more.
  • "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • The character "/" generally indicates an "or" relationship between the associated objects.
  • "At least one of the following item(s)" or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items.
  • For example, at least one of a, b or c can mean: a; b; c; "a and b"; "a and c"; "b and c"; or "a and b and c", where each of a, b and c can be singular or plural.
  • The present application provides a video frame processing method, which can be applied to mobile phones, notebook computers, tablet computers, desktop computers, electronic readers, and other electronic devices capable of video shooting and/or data processing.
  • The video frame processing method includes the following steps: obtaining the attribute information of the current frame in a three-dimensional perception map according to first attitude information and a first field of view; and obtaining the target image data of the current frame according to the attribute information of the current frame.
  • The first attitude information is the attitude information of the camera when shooting the current frame, and the first field of view is the field of view of the camera.
  • the three-dimensional perceptual map is a three-dimensional map of the first scene and is at least used to indicate the image attribute of the first video frame
  • the first scene is the real scene indicated by the first video frame
  • The first video frame is the video frame used to construct the three-dimensional perception map.
  • the first video frame and the current frame are video frames in the video captured by the camera
  • the first video frame is located before the current frame.
  • The attribute information of the current frame obtained through the three-dimensional perception map is calculated based on the first video frame; that is, the attribute information of the current frame carries information from the video frames before the current frame.
  • Therefore, obtaining the target image data of the current frame through the attribute information of the current frame can be understood as obtaining the target image data of the current frame in combination with the information of the video frames before the current frame, which avoids inconsistent processing of the same dimension across adjacent frames by the image processing algorithm (i.e., temporal-domain inconsistency) and improves the display effect of the video.
  • The three-dimensional perception map is a three-dimensional map of the first scene and is used to indicate at least the image attributes of the first video frame. It therefore describes not only the spatial location information of the first scene but also the image attributes of the first video frame; carrying such rich and diverse information improves the robustness of the video frame processing method.
  • The three-dimensional perception map is a three-dimensional map of the first scene; it includes a plurality of first feature points and is used to indicate the image attributes of the first video frame, the pose sequence in which the camera captures the first video frame, and the like, which is not specially limited in this application.
  • the first feature points may be points in the three-dimensional perceptual map used to indicate elements in the first scene.
  • Elements include, but are not limited to, semantics and features, among others.
  • a boundary point used to indicate an object in the first scene in the three-dimensional perception map may be determined as the first feature point.
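  • For concreteness, the following Python sketch shows one possible in-memory representation of such a three-dimensional perception map; the class and field names (PerceptionMap3D, FeaturePoint, ImageAttributes) are illustrative assumptions, not the disclosed method's reference implementation.

```python
# Illustrative sketch only: one possible layout for the three-dimensional
# perception map described above. All names are assumptions.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ImageAttributes:
    """Image attributes of one first video frame."""
    brightness: float = 0.0                           # global brightness
    white_balance: Tuple[float, float, float] = (1.0, 1.0, 1.0)
    color_temperature: float = 6500.0                 # in kelvin
    # Local attributes keyed by object name, e.g. {"wall": {"brightness": ...}}.
    local: Dict[str, Dict[str, float]] = field(default_factory=dict)

@dataclass
class FeaturePoint:
    """A first feature point: a 3D position plus the image attributes of
    the first video frames that observe it (one point may carry the
    attributes of several frames)."""
    position: Tuple[float, float, float]
    frame_attributes: List[ImageAttributes] = field(default_factory=list)

@dataclass
class PerceptionMap3D:
    """Feature points plus the pose sequence in which the camera
    captured the first video frames."""
    feature_points: List[FeaturePoint] = field(default_factory=list)
    pose_sequence: List[Dict[str, Tuple[float, float, float]]] = field(default_factory=list)
```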
  • the ways to build a 3D perception map include but are not limited to the following two:
  • the first is to construct a three-dimensional perceptual map according to the target image data of the first video frame.
  • The second is to construct the three-dimensional perception map according to the target image data of the first video frame and first information obtained by a first sensor, wherein the first information includes motion information when the camera captures the first video frame and/or spatial ranging information of the first scene.
  • the first video frame is a video frame located before the current frame in the video captured by the camera.
  • the number of the first video frame may be one or more, which is not specially limited in this application.
  • the number of first video frames may be determined according to the number of video frames in the video before the current frame. Specifically, the number of first video frames is positively correlated with the number of video frames before the current frame.
  • the ways of selecting the first video frame include but are not limited to the following four ways:
  • In the first way, the first video frame is arbitrarily selected from the video frames located before the current frame.
  • In the second way, the first video frame is selected from the video frames located before the current frame at a certain time interval.
  • In the third way, all video frames located before the current frame are determined as first video frames.
  • In the fourth way, a key frame located before the current frame is determined as the first video frame, where the key frame may be determined, for example, according to the content of the video frame.
  • the target image data of the first video frame includes, but is not limited to, the position and pixel value of each pixel in the first video frame, exposure parameters of the first video frame, white balance coefficient, color temperature, light source information, and GPS information.
  • the first sensor includes, but is not limited to, an inertial measurement unit, a time-of-flight ranging unit, and the like.
  • the spatial ranging information of the first scene includes the distance between the camera and the feature point in the first scene when the first video frame is captured.
  • the feature points in the first scene are in one-to-one correspondence with the first feature points in the 3D perception map, and the first feature points and the corresponding feature points in the first scene represent the same elements.
  • the feature points in the first scene are in the real scene.
  • Elements include, but are not limited to, semantics, features, and the like.
  • the motion information when the camera captures the first video frame includes the position where the camera captures the first video frame, and the movement speed and movement direction at which the camera captures the first video frame.
  • FIG. 2 is a schematic flowchart of constructing a three-dimensional perception map in the first manner according to an embodiment of the present application. As shown in FIG. 2, the construction process includes: performing image analysis on the target image data of the first video frame to obtain the image attributes of the first video frame; performing spatial analysis on the target image data of the first video frame to obtain the positions of the plurality of first feature points and the pose sequence in which the camera captures the first video frame; and constructing the three-dimensional perception map from these results.
  • Image analysis includes but is not limited to: semantic analysis, color analysis, brightness analysis, light source analysis, color temperature analysis, noise analysis, quality analysis, etc.
  • Semantic analysis can be implemented, for example, by techniques such as deep learning.
  • the objects contained in the first video frame and the location of each object can be identified through semantic analysis.
  • Objects include but are not limited to blue sky, white clouds, people, cars, roads, etc.
  • the image attributes of the first video frame include local attributes and global attributes.
  • the global attributes include, but are not limited to, the brightness, focus information, exposure information, white balance, color temperature and light source information of the first video frame.
  • the local attributes include, but are not limited to, the brightness, white balance, color temperature of the region corresponding to each object in the first video frame, local light source information, identification information and position information of each object, and the like.
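  • As a hedged illustration of this image analysis, the sketch below derives a global brightness/white-balance descriptor and per-object local brightness from an RGB frame and a semantic mask; the attribute set and formulas are assumptions, since the text does not fix them.

```python
# Hedged sketch: deriving a few of the global and local attributes named
# above from an RGB frame. A real pipeline would also use camera metadata.
import numpy as np

def global_attributes(frame_rgb: np.ndarray) -> dict:
    # Rec. 601 luma as a simple global brightness measure.
    luma = (0.299 * frame_rgb[..., 0] + 0.587 * frame_rgb[..., 1]
            + 0.114 * frame_rgb[..., 2])
    return {
        "brightness": float(luma.mean()),
        # Per-channel means as a crude white-balance descriptor.
        "white_balance": tuple(float(m) for m in frame_rgb.reshape(-1, 3).mean(axis=0)),
    }

def local_attributes(frame_rgb: np.ndarray, semantic_mask: np.ndarray,
                     labels: dict) -> dict:
    # semantic_mask holds one integer object id per pixel;
    # labels maps object id -> object name (e.g. {1: "blue sky"}).
    out = {}
    for obj_id, name in labels.items():
        region = frame_rgb[semantic_mask == obj_id]
        if region.size:
            out[name] = {"brightness": float(region.mean())}
    return out
```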
  • the position of the first feature point refers to the coordinates of the first feature point in the three-dimensional perception map.
  • The pose sequence in which the camera captures the first video frames is the sequence obtained by arranging the pose information of the camera when shooting each first video frame in the order of the shooting times of the first video frames.
  • the posture information of the camera for shooting the first video frame includes but is not limited to the shooting position and shooting angle of the camera for shooting the first video frame.
  • Spatial analysis includes but is not limited to depth analysis, visual-inertial odometry (VIO) analysis, simultaneous localization and mapping (SLAM) analysis, and the like, which are not specially limited in this application.
  • The specific construction process may be, for example: constructing a three-dimensional map according to the positions of the first feature points; carrying, in the three-dimensional map, the image attributes of the first video frame and the pose sequence in which the camera captures the first video frame; and determining the three-dimensional map carrying the above information as the three-dimensional perception map.
  • the three-dimensional map may be, for example, a sparse three-dimensional point cloud map or a dense three-dimensional point cloud map.
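  • Combining the pieces above, a minimal sketch of the first construction way could look as follows; it reuses the PerceptionMap3D/FeaturePoint classes sketched earlier, and the three callbacks stand in for the image analysis, the spatial analysis (e.g. a VIO/SLAM backend), and a visibility test, none of whose interfaces are specified by the text.

```python
# Minimal sketch of the first construction way; callback interfaces are
# assumptions, not the patent's specification.
def build_perception_map(first_frames, spatial_analysis, image_analysis, visibility):
    # spatial_analysis(frames) -> (3D positions of first feature points,
    #                              camera pose per first video frame)
    positions, poses = spatial_analysis(first_frames)
    pmap = PerceptionMap3D(
        feature_points=[FeaturePoint(position=p) for p in positions],
        pose_sequence=list(poses),
    )
    for frame, pose in zip(first_frames, poses):
        attrs = image_analysis(frame)          # ImageAttributes of this frame
        # Carry the frame's attributes on the feature points it observes.
        for idx in visibility(pmap, pose):     # indices of observed points
            pmap.feature_points[idx].frame_attributes.append(attrs)
    return pmap
```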
  • the ways in which the image attribute of the first video frame is carried in the 3D map include but are not limited to the following two ways:
  • the first is to save the image attribute of the first video frame in a preset storage area in the three-dimensional map.
  • the second is to carry the image attribute of the first video frame on a first feature point corresponding to the first video frame.
  • the first feature point corresponding to the first video frame is a first feature point indicating an element in the first video frame among the plurality of first feature points.
  • one first feature point may carry image attributes of multiple first video frames.
  • FIG. 3 is a schematic diagram of carrying the image attributes of first video frames in a three-dimensional map according to an embodiment of the present application. As can be seen from FIG. 3, the number of first video frames 301 is three; the triangle pointing to a first video frame 301 represents the global attributes of that frame, the squares in a first video frame 301 represent its local attributes, and the circles in the three-dimensional map 302 represent the first feature points. As shown by the three-dimensional perception map 303, the image attributes of the first video frames 301 are carried on the first feature points, where two first feature points each carry the image attributes of two first video frames and the other first feature points each carry the image attributes of one first video frame.
  • the manner of carrying the pose sequence in the three-dimensional map may be: storing the pose sequence in a preset storage area in the three-dimensional map, or carrying the pose sequence on the first feature point, and the like.
  • augmented reality rendering may also be performed on the three-dimensional perception map to optimize the display effect of the three-dimensional perception map.
  • FIG. 4 is a schematic flowchart of constructing a three-dimensional perception map in the second manner according to an embodiment of the present application. As shown in FIG. 4, the construction process includes: performing image analysis on the target image data of the first video frame to obtain the image attributes of the first video frame; performing spatial analysis on the target image data of the first video frame in combination with the first information to obtain the positions of the plurality of first feature points and the pose sequence in which the camera captures the first video frame; and constructing the three-dimensional perception map from these results.
  • In the process of capturing the first video frame, other sensors (such as a wide-angle camera or an infrared sensor) may also be used to obtain information about the real scene indicated by the first video frame.
  • the information in the three-dimensional perception map is supplemented and calibrated by the information obtained by other sensors, so as to further increase the dimension of the information carried by the three-dimensional perception map and the accuracy of the information carried.
  • FIG. 5 is a schematic flowchart of obtaining the attribute information of the current frame according to an embodiment of the present application. As shown in FIG. 5, the process includes the following steps: determining, among the plurality of first feature points, the second feature points corresponding to the current frame; determining the image attribute of the current frame according to the image attribute of the first video frame; and determining the attribute information of the current frame according to the positions of the second feature points and the image attribute of the current frame.
  • the second feature point corresponding to the current frame is the first feature point captured by the camera based on the first posture information and the first field of view among the plurality of first feature points.
  • Exemplarily, a first two-dimensional perception map can be constructed, whose specification and position are determined according to the first attitude information and the first field of view; the plurality of first feature points are mapped toward the first two-dimensional perception map, and the first feature points that map into the first two-dimensional perception map are determined as the second feature points corresponding to the current frame.
  • the first gesture information includes, but is not limited to, the shooting position and shooting angle when the camera shoots the current frame.
  • the position of the second feature point corresponding to the current frame is the coordinate of the second feature point in the two-dimensional space.
  • The coordinates of the second feature point in the two-dimensional space can be determined, for example, as follows: according to the coordinates of the second feature point (that is, the first feature point captured by the camera) in the three-dimensional perception map, the second feature point is mapped from the three-dimensional perception map into the two-dimensional space, which yields its coordinates in the two-dimensional space.
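  • A minimal sketch of this mapping, assuming a pinhole camera whose attitude is parametrized by a world-to-camera rotation R and translation t (the parametrization and the square field-of-view test are assumptions):

```python
# Hedged sketch: project first feature points into a two-dimensional
# perception map with a plain pinhole model.
import numpy as np

def project_to_2d(points_3d: np.ndarray, R: np.ndarray, t: np.ndarray,
                  fov_deg: float):
    cam = points_3d @ R.T + t                      # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6                    # keep points in front of the camera
    xy = cam[:, :2] / cam[:, 2:3].clip(min=1e-6)   # perspective divide
    half = np.tan(np.radians(fov_deg) / 2.0)       # half field of view
    visible = in_front & (np.abs(xy[:, 0]) <= half) & (np.abs(xy[:, 1]) <= half)
    # Return the 2D coordinates of the captured points and their indices,
    # i.e. the second feature points corresponding to the frame.
    return xy[visible], np.nonzero(visible)[0]
```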
  • In the first way, the first video frame whose shooting time is closest to the shooting time of the current frame is determined as the target video frame, and the image attribute of the target video frame is determined as the image attribute of the current frame.
  • In the second way, among the pose sequence in which the camera captures the first video frames, the first video frame whose pose information is closest to the first attitude information is determined as the target video frame, and the image attribute of the target video frame is determined as the image attribute of the current frame.
  • the above manner of determining the image attribute of the current frame is only exemplary, and is not intended to limit the present application.
  • For example, a first video frame whose shooting time differs from the shooting time of the current frame by no more than a preset time difference may also be determined as a target video frame, and the image attribute of the current frame is determined according to the image attributes of the target video frames. Specifically, if there is one target video frame, its image attribute is determined as the image attribute of the current frame; if there are multiple target video frames, their attributes in the same dimension can be averaged, and the averages over all dimensions are determined as the image attribute of the current frame.
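  • A small sketch of this averaging rule, under the assumption that image attributes are stored as numeric dictionaries:

```python
# Sketch: average same-dimension attributes across target video frames.
def average_attributes(target_frame_attrs: list) -> dict:
    if len(target_frame_attrs) == 1:
        return dict(target_frame_attrs[0])
    dims = set().union(*target_frame_attrs)        # all attribute dimensions
    return {
        d: sum(a[d] for a in target_frame_attrs if d in a)
           / sum(1 for a in target_frame_attrs if d in a)
        for d in dims
    }
```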
  • the methods for obtaining the target image data of the current frame include but are not limited to the following two, wherein:
  • the shooting parameters of the camera are determined according to the attribute information of the current frame, and the target image data of the current frame is obtained by shooting the current frame with the camera based on the shooting parameters, the first posture information and the first field of view.
  • Exemplarily, the corresponding shooting parameters of the camera can be determined according to the value of each attribute dimension in the attribute information of the current frame; the current frame is then shot with the camera based on the shooting parameters, the first attitude information and the first field of view, and the data collected when shooting the current frame is determined as the target image data of the current frame.
  • the shooting parameters of the camera include but are not limited to brightness, focus information, exposure, white balance coefficient, and the like.
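  • Purely as an illustration, the attribute information could be mapped to shooting parameters as below; the parameter names and pass-through rules are assumptions, since real camera control loops are device-specific.

```python
# Illustrative mapping from the current frame's attribute information to
# camera shooting parameters; names and rules are assumptions.
def shooting_parameters(current_attrs: dict) -> dict:
    return {
        "exposure": current_attrs.get("exposure", 0.0),
        "white_balance": current_attrs.get("white_balance", (1.0, 1.0, 1.0)),
        "focus": current_attrs.get("focus"),
        "target_brightness": current_attrs.get("brightness", 0.5),
    }
```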
  • the application scenarios of the video frame processing method are as follows:
  • In the video shooting process, the attribute information of the current frame is obtained, according to the attitude information of the camera when shooting the current frame and the field of view of the camera, from the three-dimensional perception map constructed from the video frames before the current frame; the shooting parameters of the camera are determined according to the attribute information of the current frame, and the current frame is shot with the camera based on the shooting parameters, the first attitude information and the first field of view, so as to obtain the target image data of the current frame.
  • the original image data of the current frame is obtained, and the original image data of the current frame is processed according to the attribute information of the current frame to obtain the target image data of the current frame.
  • the application scenarios of the video processing method are as follows:
  • the to-be-processed video frame in the video is determined as the current frame, and the image data collected when the current frame is captured is determined as the original image data of the current frame. Then, according to the pose information of the current frame captured by the camera and the field of view of the camera, the attribute information of the current frame is obtained from the three-dimensional perception map constructed by the video frame before the current frame. Finally, by processing the original image of the current frame according to the attribute information of the current frame, the target image data of the current frame can be obtained.
  • the original image data of the current frame is the image data collected when the camera captures the current frame.
  • The process of processing the original image data of the current frame according to the attribute information of the current frame may be: according to the attribute of each dimension in the attribute information of the current frame, the data of the corresponding dimension in the original image data is processed to obtain the target image data of the current frame.
  • The timings for obtaining the target image data of the current frame include but are not limited to the following two:
  • In the first, the target image data of the current frame is obtained directly through the attribute information of the current frame.
  • In the second, a video frame located before the current frame is first selected from the video captured by the camera as a reference frame.
  • the reference frame may or may not be adjacent to the current frame, which is not specifically limited in this application.
  • the attribute information of the reference frame is obtained in the three-dimensional perception map.
  • The second attitude information is the attitude information of the camera when shooting the reference frame, which includes but is not limited to the shooting position and shooting angle of the camera when shooting the reference frame.
  • the similarity between the current frame and the reference frame is calculated according to the attribute information of the reference frame and the attribute information of the current frame. If the similarity is greater than the preset similarity, the target image data of the current frame is obtained according to the attribute information of the current frame.
  • the target image data of the current frame is obtained according to the attribute information of the current frame, which improves the accuracy of the application of the video frame processing method, and also increases the practicability of the video frame processing method in different scenarios.
  • the process of obtaining the attribute information of the reference frame can be as follows:
  • A second feature point corresponding to the reference frame is determined from the multiple first feature points in the three-dimensional perception map; the second feature point corresponding to the reference frame is a first feature point, among the multiple first feature points, captured by the camera based on the second attitude information and the first field of view. Exemplarily, a second two-dimensional perception map can be constructed, whose specification and position are determined according to the second attitude information and the first field of view; the plurality of first feature points are mapped to the second two-dimensional perception map, and the first feature points mapped into the second two-dimensional perception map are determined as the second feature points corresponding to the reference frame.
  • the image attribute of the reference frame is determined according to the image attribute of the first video frame.
  • the attribute information of the reference frame is determined according to the position of the second feature point corresponding to the reference frame and the image attribute of the reference frame.
  • the position of the second feature point corresponding to the reference frame is the coordinate of the second feature point in the two-dimensional space. Since the principle of determining the coordinates of the second feature point in the two-dimensional space has been described above, it will not be repeated here. In addition, since the principle of determining the image attribute of the reference frame is similar to the principle of determining the image attribute of the current frame, the principle of determining the image attribute of the reference frame will not be described here.
  • The first is to calculate the spatial distance between the reference frame and the current frame according to the positions of the second feature points in the attribute information of the reference frame and the positions of the second feature points in the attribute information of the current frame, and then determine the similarity between the reference frame and the current frame according to that distance. Specifically, the smaller the distance between the reference frame and the current frame, the greater the similarity between the current frame and the reference frame.
  • In the second, the similarity of the attributes of the reference frame and the current frame in each shared dimension can be calculated (for example, the spatial similarity between the reference frame and the current frame, the similarity of exposure parameters, the similarity of white balance, the similarity of the objects contained in the two, etc.), and the similarities obtained for the individual dimensions are then weighted and summed to obtain the similarity between the reference frame and the current frame.
  • the preset similarity may be set according to the number of video frames spaced between the reference frame and the current frame, the difference between the first gesture information and the second gesture information, and the like.
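  • A sketch of the second, weighted-sum similarity computation, with illustrative dimension names and weights (both assumptions):

```python
# Sketch: per-dimension similarities combined by a weighted sum.
def frame_similarity(ref_attrs: dict, cur_attrs: dict, weights: dict) -> float:
    total, weight_sum = 0.0, 0.0
    for dim, w in weights.items():
        if dim in ref_attrs and dim in cur_attrs:
            # Map the absolute difference into (0, 1]; 1.0 means identical.
            sim = 1.0 / (1.0 + abs(ref_attrs[dim] - cur_attrs[dim]))
            total += w * sim
            weight_sum += w
    return total / weight_sum if weight_sum else 0.0

# Usage, gating the processing on the preset similarity as described:
# if frame_similarity(ref, cur, {"brightness": 0.5, "exposure": 0.5}) > preset:
#     ...obtain the target image data from the current frame's attributes...
```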
  • the method of obtaining the target image data of the current frame may also be: determining the attribute difference between the current frame and the reference frame according to the attribute information of the current frame and the attribute information of the reference frame. According to the attribute difference between the current frame and the reference frame, the target image data of the current frame is obtained.
  • The method of determining the attribute difference between the current frame and the reference frame may be: determining, from the attribute information of the current frame and the attribute information of the reference frame, the difference value of each same-dimension attribute between the current frame and the reference frame; collecting the difference values of the attributes of all dimensions then yields the attribute difference between the current frame and the reference frame.
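  • The per-dimension difference can be sketched as follows, again assuming numeric attribute dictionaries:

```python
# Sketch: collect per-dimension differences between the current frame
# and the reference frame into one mapping.
def attribute_difference(cur_attrs: dict, ref_attrs: dict) -> dict:
    shared = cur_attrs.keys() & ref_attrs.keys()
    return {dim: cur_attrs[dim] - ref_attrs[dim] for dim in shared}
```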
  • the methods for obtaining the target image data of the current frame include but are not limited to the following two:
  • the first is to determine the shooting parameters of the camera according to the attribute difference between the current frame and the reference frame, and obtain the target image data of the current frame by shooting the current frame with the camera based on the shooting parameters, the first attitude information and the first field of view.
  • Exemplarily, the corresponding shooting parameters of the camera can be determined according to the attribute difference of each dimension; the current frame is then shot with the camera based on the shooting parameters, the first attitude information and the first field of view, and the data collected when shooting the current frame is determined as the target image data of the current frame.
  • the first method is applied in the video shooting process, and the application scenario has been described above, so it will not be repeated here.
  • the original image data of the current frame is obtained, and the original image data of the current frame is processed according to the attribute difference to obtain the target image data of the current frame.
  • the data of the corresponding dimension in the original image data of the current frame can be processed according to the attribute difference of each dimension and combined with the corresponding algorithm, so as to obtain the target image data of the current frame.
  • the second method is applied after the video shooting is completed. Since the application scenario has been described above, it will not be repeated here.
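  • A hedged sketch of this second way: only a brightness offset and per-channel white-balance gains are corrected, and both correction rules, as well as the assumption that the white-balance difference is stored as per-channel relative deviations, are illustrative rather than the disclosed algorithm.

```python
# Hedged sketch: correct the original image data of the current frame
# with the attribute difference computed against the reference frame.
import numpy as np

def apply_attribute_difference(raw: np.ndarray, diff: dict) -> np.ndarray:
    out = raw.astype(np.float32)
    if "brightness" in diff:
        out -= diff["brightness"]          # pull brightness toward the reference
    if "white_balance" in diff:
        gains = 1.0 - np.asarray(diff["white_balance"], dtype=np.float32)
        out *= gains                       # per-channel gain correction
    return np.clip(out, 0, 255).astype(raw.dtype)
```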
  • FIG. 6 is a schematic diagram of determining an attribute difference between a reference frame and a current frame according to an embodiment of the present application. As shown in Figure 6, the process of determining the attribute difference is:
  • First, a first two-dimensional perception map 601 and a second two-dimensional perception map 602 are constructed, the plurality of first feature points in the three-dimensional perception map 603 are mapped to the first two-dimensional perception map 601 and the second two-dimensional perception map 602 respectively, and the attribute information of the current frame and the attribute information of the reference frame are determined.
  • the circle in the first two-dimensional perceptual map 601 represents the second feature point corresponding to the current frame
  • the circle in the second two-dimensional perceptual map 602 represents the second feature point corresponding to the reference frame.
  • the squares below the first two-dimensional perceptual map 601 represent local attributes in the attribute information of the current frame, and the triangles below the first two-dimensional perceptual map 601 represent global attributes in the attribute information of the current frame.
  • the squares below the second two-dimensional perceptual map 602 represent local attributes in the attribute information of the reference frame, and the triangles below the second two-dimensional perceptual map 602 represent global attributes in the attribute information of the reference frame.
  • the spatial difference between the current frame and the reference frame is determined according to the position of the second feature point corresponding to the current frame and the position of the second feature point corresponding to the reference frame.
  • the difference between the current frame and the reference frame of the global attribute of the same dimension is determined.
  • the difference between the local attributes of the same dimension in the current frame and the reference frame is determined.
  • the three-dimensional perception map can also be managed, and the management here includes but is not limited to the update of the three-dimensional perception map, the size management of the three-dimensional perception map, and the like.
  • The three-dimensional perception map is updated with the target image data of the current frame, or with the target image data of the current frame and the second information obtained by the first sensor, wherein the second information includes the motion information when the camera captures the current frame and/or the spatial ranging information of the real scene indicated by the current frame.
  • Through the update, the three-dimensional perception map is extended in the time domain, so that the information it carries expands as the video progresses; the map thus reflects the real scene shown by the video more accurately and comprehensively over time, thereby improving the accuracy and flexibility of processing subsequent video frames.
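  • Continuing the construction sketch above, updating the map with the current frame might look as follows; the callback interfaces mirror build_perception_map and remain assumptions.

```python
# Sketch: extend the map in the time domain with the current frame.
def update_perception_map(pmap, current_frame, spatial_analysis, image_analysis):
    positions, poses = spatial_analysis([current_frame])
    attrs = image_analysis(current_frame)
    new_points = [FeaturePoint(position=p) for p in positions]
    for fp in new_points:
        fp.frame_attributes.append(attrs)  # carry the current frame's attributes
    pmap.feature_points.extend(new_points)
    pmap.pose_sequence.extend(poses)       # append the current pose
    return pmap
```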
  • FIG. 7 is a schematic diagram of managing the size of a three-dimensional perception map according to an embodiment of the present application.
  • the circle represents the first feature point
  • the triangle represents the global attribute of the first video frame
  • the square represents the local attribute of the first video frame.
  • The three-dimensional perception map 702 on the right can be obtained by merging first feature points that are close to each other in the three-dimensional perception map 701 on the left.
  • The three-dimensional perception map 702 on the right is therefore much smaller than the three-dimensional perception map 701 on the left.
  • the condition that triggers the management of the size of the 3D perception map can be any of the following:
  • the first is to manage the size of the three-dimensional perception map according to a preset period.
  • The second is to detect the size of the three-dimensional perception map according to a preset period and, if the size is larger than a preset size, to manage the size of the three-dimensional perception map.
  • the preset period and preset size are determined according to technical requirements, which are not specially limited in this application.
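  • A minimal sketch of size management by merging nearby first feature points with a voxel grid; the cell size and the merge rule (averaged position, concatenated attributes) are assumptions.

```python
# Sketch: merge first feature points that fall into the same voxel cell.
def merge_close_points(pmap, cell: float = 0.05):
    buckets = {}
    for fp in pmap.feature_points:
        key = tuple(int(c // cell) for c in fp.position)
        buckets.setdefault(key, []).append(fp)
    merged = []
    for group in buckets.values():
        n = len(group)
        pos = tuple(sum(p.position[i] for p in group) / n for i in range(3))
        attrs = [a for p in group for a in p.frame_attributes]
        merged.append(FeaturePoint(position=pos, frame_attributes=attrs))
    pmap.feature_points = merged           # the map shrinks, as FIG. 7 shows
    return pmap
```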
  • FIG. 8 is a schematic structural diagram of a video frame processing apparatus provided by an embodiment of the present application.
  • the apparatus 800 may include: a first obtaining module 801 and a second obtaining module 802, wherein:
  • the first obtaining module 801 is configured to obtain the attribute information of the current frame in the three-dimensional perception map according to the first attitude information and the first field of view, wherein the first attitude information is the attitude information of the camera shooting the current frame, the first field of view is the field of view of the camera;
  • The second obtaining module 802 is configured to obtain the target image data of the current frame according to the attribute information of the current frame; wherein the three-dimensional perception map is a three-dimensional map of the first scene and is used to indicate at least the image attributes of the first video frame, the first scene is the real scene indicated by the first video frame, the first video frame is the video frame for constructing the three-dimensional perception map, the first video frame and the current frame are video frames in the video captured by the camera, and the first video frame is located before the current frame.
  • The second obtaining module 802 is specifically configured to: determine the shooting parameters of the camera according to the attribute information of the current frame, and shoot the current frame with the camera based on the shooting parameters, the first attitude information and the first field of view to obtain the target image data of the current frame; or obtain the original image data of the current frame, and process the original image data of the current frame according to the attribute information of the current frame to obtain the target image data of the current frame.
  • The second obtaining module 802 is specifically configured to: obtain attribute information of the reference frame in the three-dimensional perception map according to the second attitude information and the first field of view, wherein the second attitude information is the attitude information of the camera when shooting the reference frame, and the reference frame is a video frame before the current frame; calculate the similarity between the current frame and the reference frame according to the attribute information of the reference frame and the attribute information of the current frame; and, if the similarity is greater than a preset similarity, obtain the target image data of the current frame according to the attribute information of the current frame.
  • The second obtaining module 802 obtains the target image data of the current frame in the following manner: determining the attribute difference between the current frame and the reference frame according to the attribute information of the current frame and the attribute information of the reference frame; and obtaining the target image data of the current frame according to the attribute difference.
  • The second obtaining module 802 obtains the target image data of the current frame in the following manner: determining the shooting parameters of the camera according to the attribute difference, and shooting the current frame with the camera based on the shooting parameters, the first attitude information and the first field of view to obtain the target image data of the current frame; or obtaining the original image data of the current frame, and processing the original image data of the current frame according to the attribute difference to obtain the target image data of the current frame.
  • The three-dimensional perception map includes a plurality of first feature points. The first obtaining module 801 is specifically configured to: determine, among the plurality of first feature points, the second feature point corresponding to the current frame, wherein the second feature point corresponding to the current frame is the first feature point captured by the camera based on the first attitude information and the first field of view; determine the image attribute of the current frame according to the image attribute of the first video frame; and determine the attribute information of the current frame according to the position of the second feature point corresponding to the current frame and the image attribute of the current frame.
  • The three-dimensional perception map includes a plurality of first feature points. The second obtaining module 802 specifically obtains the attribute information of the reference frame in the following manner: determining, from the plurality of first feature points, the second feature point corresponding to the reference frame, wherein the second feature point corresponding to the reference frame is the first feature point captured by the camera based on the second attitude information and the first field of view; determining the image attribute of the reference frame according to the image attribute of the first video frame; and determining the attribute information of the reference frame according to the position of the second feature point corresponding to the reference frame and the image attribute of the reference frame.
In a possible implementation manner, the first obtaining module 801 is further configured to construct the three-dimensional perceptual map according to the target image data of the first video frame; or construct the three-dimensional perceptual map according to the target image data of the first video frame and first information obtained by a first sensor, wherein the first information includes motion information of the camera when capturing the first video frame and/or spatial ranging information of the first scene.
In a possible implementation manner, the three-dimensional perceptual map includes a plurality of first feature points; the first obtaining module 801 constructs the three-dimensional perceptual map in the following manner: performing image analysis on the target image data of the first video frame to obtain the image attributes of the first video frame; performing spatial analysis on the target image data of the first video frame to obtain the positions of the plurality of first feature points and the pose sequence of the camera capturing the first video frame; and constructing the three-dimensional perceptual map according to the image attributes of the first video frame, the positions of the plurality of first feature points, and the pose sequence.
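As a sketch of how the three construction outputs might be held together, one could use a container like the following; the class and field names are invented for illustration, and a real system would populate it from SLAM- or VIO-style spatial analysis:

```python
from dataclasses import dataclass, field

@dataclass
class PerceptualMap:
    point_positions: list = field(default_factory=list)   # [(x, y, z), ...]
    point_attributes: dict = field(default_factory=dict)  # point index -> list of frame attrs
    pose_sequence: list = field(default_factory=list)     # camera poses in shooting order

def build_map(image_attrs: dict, point_positions: list, pose_sequence: list) -> PerceptualMap:
    """Assemble the map so that each first feature point carries the image
    attributes of the first video frame(s) observing it."""
    m = PerceptualMap(point_positions=list(point_positions),
                      pose_sequence=list(pose_sequence))
    for idx in range(len(m.point_positions)):
        m.point_attributes[idx] = [image_attrs]
    return m
```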
In a possible implementation manner, the three-dimensional perceptual map includes a plurality of first feature points; the first obtaining module 801 constructs the three-dimensional perceptual map in the following manner: performing image analysis on the target image data of the first video frame to obtain the image attributes of the first video frame; performing spatial analysis on the target image data of the first video frame in combination with the first information to obtain the positions of the plurality of first feature points and the pose sequence of the camera capturing the first video frame; and constructing the three-dimensional perceptual map according to the image attributes of the first video frame, the positions of the plurality of first feature points, and the pose sequence.
In a possible implementation manner, the first obtaining module 801 is further configured to update the three-dimensional perceptual map according to the target image data of the current frame; or update the three-dimensional perceptual map according to the target image data of the current frame and second information obtained by the first sensor, wherein the second information includes motion information of the camera when capturing the current frame and/or spatial ranging information of the real scene indicated by the current frame.
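Continuing the hypothetical PerceptualMap sketch above, the update step could append to the same structure once the current frame's target image data is available; how sensor data is fused is left open by the disclosure and is simply accepted (and ignored) here:

```python
def update_map(m, frame_attrs: dict, new_points: list, pose, motion_info=None):
    """Extend the map with the current frame: record its pose, add newly
    observed feature points, and attach the frame's attributes to them."""
    m.pose_sequence.append(pose)
    start = len(m.point_positions)
    m.point_positions.extend(new_points)
    for idx in range(start, len(m.point_positions)):
        m.point_attributes[idx] = [frame_attrs]
    return m
```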
The foregoing apparatus of the present application may be configured to perform the technical solution of any one of the foregoing method embodiments; its implementation principles and technical effects are similar and are not repeated here.

An embodiment of the present application further provides an electronic device. The electronic device may be, for example, a computer, a server, a mobile phone, or an electronic reader, which is not specially limited in the embodiments of the present application.
The electronic device may include a communication module, one or more memories, and one or more processors, wherein the communication module is configured to communicate with other devices, the one or more memories are configured to store one or more computer programs, and the one or more processors are configured to execute the one or more computer programs, so that the electronic device performs the technical solution of any one of the foregoing method embodiments.
The present application further provides a computer-readable storage medium, where the computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the technical solution of any one of the foregoing method embodiments.
The present application further provides a computer program including instructions; when the computer program is executed by a computer, it is used to perform the technical solution of any one of the foregoing method embodiments.
The present application further provides a chip, including a processor and a memory, where the memory is configured to store a computer program, and the processor is configured to invoke and run the computer program stored in the memory, to perform the technical solution of any one of the foregoing method embodiments.
Further, the chip may further include a memory and a communication interface. The communication interface may be an input/output interface, a pin, an input/output circuit, or the like.
In an implementation process, the steps of the foregoing method embodiments may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software.
The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware coding processor, or executed and completed by a combination of hardware and software modules in the coding processor. The software modules may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the foregoing methods in combination with its hardware.
The memory mentioned in the foregoing embodiments may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memories of the systems and methods described herein are intended to include, but not be limited to, these and any other suitable types of memory.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or by software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present application.

It may be clearly understood by a person skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the detailed working processes of the systems, apparatuses, and units described above; details are not repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners.
For example, the apparatus embodiments described above are merely illustrative. For example, the division of the units is merely a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, each functional unit in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
Based on such an understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application.
The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing descriptions are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Image Analysis (AREA)


Claims (25)

  1. A video frame processing method, characterized in that the method comprises:
    obtaining attribute information of a current frame in a three-dimensional perceptual map according to first posture information and a first field of view, wherein the first posture information is posture information of a camera capturing the current frame, and the first field of view is the field of view of the camera; and
    obtaining target image data of the current frame according to the attribute information of the current frame;
    wherein the three-dimensional perceptual map is a three-dimensional map of a first scene and is used at least to indicate an image attribute of a first video frame, the first scene is the real scene indicated by the first video frame, the first video frame is a video frame used to construct the three-dimensional perceptual map, the first video frame and the current frame are video frames in a video captured by the camera, and the first video frame precedes the current frame.
  2. The method according to claim 1, characterized in that the obtaining target image data of the current frame according to the attribute information of the current frame comprises:
    determining shooting parameters of the camera according to the attribute information of the current frame, and capturing the current frame with the camera based on the shooting parameters, the first posture information, and the first field of view, to obtain the target image data of the current frame; or
    obtaining original image data of the current frame, and processing the original image data of the current frame according to the attribute information of the current frame, to obtain the target image data of the current frame.
  3. The method according to claim 1, characterized in that the obtaining target image data of the current frame according to the attribute information of the current frame comprises:
    obtaining attribute information of a reference frame in the three-dimensional perceptual map according to second posture information and the first field of view, wherein the second posture information is posture information of the camera when capturing the reference frame, and the reference frame precedes the current frame;
    calculating a similarity between the current frame and the reference frame according to the attribute information of the reference frame and the attribute information of the current frame; and
    if the similarity is greater than a preset similarity, obtaining the target image data of the current frame according to the attribute information of the current frame.
  4. The method according to claim 3, characterized in that the obtaining target image data of the current frame according to the attribute information of the current frame comprises:
    determining an attribute difference between the current frame and the reference frame according to the attribute information of the current frame and the attribute information of the reference frame; and
    obtaining the target image data of the current frame according to the attribute difference.
  5. The method according to claim 4, characterized in that the obtaining target image data of the current frame according to the attribute difference comprises:
    determining shooting parameters of the camera according to the attribute difference, and capturing the current frame with the camera based on the shooting parameters, the first posture information, and the first field of view, to obtain the target image data of the current frame; or
    obtaining original image data of the current frame, and processing the original image data of the current frame according to the attribute difference, to obtain the target image data of the current frame.
  6. The method according to any one of claims 1 to 5, characterized in that the three-dimensional perceptual map includes a plurality of first feature points; and
    the obtaining attribute information of a current frame in a three-dimensional perceptual map comprises:
    determining, among the plurality of first feature points, a second feature point corresponding to the current frame, wherein the second feature point corresponding to the current frame is a first feature point captured by the camera based on the first posture information and the first field of view;
    determining an image attribute of the current frame according to the image attribute of the first video frame; and
    determining the attribute information of the current frame according to the position of the second feature point corresponding to the current frame and the image attribute of the current frame.
  7. The method according to any one of claims 3 to 5, characterized in that the three-dimensional perceptual map includes a plurality of first feature points; and
    the obtaining attribute information of a reference frame in the three-dimensional perceptual map comprises:
    determining, among the plurality of first feature points, a second feature point corresponding to the reference frame, wherein the second feature point corresponding to the reference frame is a first feature point captured by the camera based on the second posture information and the first field of view;
    determining an image attribute of the reference frame according to the image attribute of the first video frame; and
    determining the attribute information of the reference frame according to the position of the second feature point corresponding to the reference frame and the image attribute of the reference frame.
  8. The method according to any one of claims 1 to 7, characterized in that the method further comprises:
    constructing the three-dimensional perceptual map according to target image data of the first video frame; or
    constructing the three-dimensional perceptual map according to target image data of the first video frame and first information obtained by a first sensor, wherein the first information includes motion information of the camera when capturing the first video frame and/or spatial ranging information of the first scene.
  9. The method according to claim 8, characterized in that the three-dimensional perceptual map includes a plurality of first feature points; and
    the constructing the three-dimensional perceptual map according to the target image data of the first video frame comprises:
    performing image analysis on the target image data of the first video frame to obtain the image attribute of the first video frame;
    performing spatial analysis on the target image data of the first video frame to obtain positions of the plurality of first feature points and a pose sequence of the camera capturing the first video frame; and
    constructing the three-dimensional perceptual map according to the image attribute of the first video frame, the positions of the plurality of first feature points, and the pose sequence.
  10. The method according to claim 8, characterized in that the three-dimensional perceptual map includes a plurality of first feature points; and
    the constructing the three-dimensional perceptual map according to the target image data of the first video frame and the first information obtained by the first sensor comprises:
    performing image analysis on the target image data of the first video frame to obtain the image attribute of the first video frame;
    performing spatial analysis on the target image data of the first video frame in combination with the first information to obtain positions of the plurality of first feature points and a pose sequence of the camera capturing the first video frame; and
    constructing the three-dimensional perceptual map according to the image attribute of the first video frame, the positions of the plurality of first feature points, and the pose sequence.
  11. The method according to any one of claims 1 to 10, characterized in that the method further comprises:
    updating the three-dimensional perceptual map according to the target image data of the current frame; or
    updating the three-dimensional perceptual map according to the target image data of the current frame and second information obtained by a first sensor, wherein the second information includes motion information of the camera when capturing the current frame and/or spatial ranging information of the real scene indicated by the current frame.
  12. A video frame processing apparatus, characterized in that the apparatus comprises:
    a first obtaining module, configured to obtain attribute information of a current frame in a three-dimensional perceptual map according to first posture information and a first field of view, wherein the first posture information is posture information of a camera capturing the current frame, and the first field of view is the field of view of the camera; and
    a second obtaining module, configured to obtain target image data of the current frame according to the attribute information of the current frame;
    wherein the three-dimensional perceptual map is a three-dimensional map of a first scene and is used at least to indicate an image attribute of a first video frame, the first scene is the real scene indicated by the first video frame, the first video frame is a video frame used to construct the three-dimensional perceptual map, the first video frame and the current frame are video frames in a video captured by the camera, and the first video frame precedes the current frame.
  13. The apparatus according to claim 12, characterized in that the second obtaining module is specifically configured to: determine shooting parameters of the camera according to the attribute information of the current frame, and capture the current frame with the camera based on the shooting parameters, the first posture information, and the first field of view, to obtain the target image data of the current frame; or obtain original image data of the current frame, and process the original image data of the current frame according to the attribute information of the current frame, to obtain the target image data of the current frame.
  14. The apparatus according to claim 12, characterized in that the second obtaining module is specifically configured to: obtain attribute information of a reference frame in the three-dimensional perceptual map according to second posture information and the first field of view, wherein the second posture information is posture information of the camera when capturing the reference frame, and the reference frame is a video frame preceding the current frame; calculate a similarity between the current frame and the reference frame according to the attribute information of the reference frame and the attribute information of the current frame; and, if the similarity is greater than a preset similarity, obtain the target image data of the current frame according to the attribute information of the current frame.
  15. The apparatus according to claim 14, characterized in that the second obtaining module obtains the target image data of the current frame in the following manner:
    determining an attribute difference between the current frame and the reference frame according to the attribute information of the current frame and the attribute information of the reference frame; and
    obtaining the target image data of the current frame according to the attribute difference.
  16. The apparatus according to claim 15, characterized in that the second obtaining module obtains the target image data of the current frame in the following manner:
    determining shooting parameters of the camera according to the attribute difference, and capturing the current frame with the camera based on the shooting parameters, the first posture information, and the first field of view, to obtain the target image data of the current frame; or
    obtaining original image data of the current frame, and processing the original image data of the current frame according to the attribute difference, to obtain the target image data of the current frame.
  17. The apparatus according to any one of claims 12 to 16, characterized in that the three-dimensional perceptual map includes a plurality of first feature points; and
    the first obtaining module is specifically configured to: determine, among the plurality of first feature points, a second feature point corresponding to the current frame, wherein the second feature point corresponding to the current frame is a first feature point captured by the camera based on the first posture information and the first field of view; determine an image attribute of the current frame according to the image attribute of the first video frame; and determine the attribute information of the current frame according to the position of the second feature point corresponding to the current frame and the image attribute of the current frame.
  18. The apparatus according to any one of claims 14 to 16, characterized in that the three-dimensional perceptual map includes a plurality of first feature points; and
    the second obtaining module obtains the attribute information of the reference frame in the following manner:
    determining, among the plurality of first feature points, a second feature point corresponding to the reference frame, wherein the second feature point corresponding to the reference frame is a first feature point captured by the camera based on the second posture information and the first field of view;
    determining an image attribute of the reference frame according to the image attribute of the first video frame; and
    determining the attribute information of the reference frame according to the position of the second feature point corresponding to the reference frame and the image attribute of the reference frame.
  19. The apparatus according to any one of claims 12 to 18, characterized in that the first obtaining module is further configured to construct the three-dimensional perceptual map according to target image data of the first video frame; or construct the three-dimensional perceptual map according to target image data of the first video frame and first information obtained by a first sensor, wherein the first information includes motion information of the camera when capturing the first video frame and/or spatial ranging information of the first scene.
  20. The apparatus according to claim 19, characterized in that the three-dimensional perceptual map includes a plurality of first feature points; and
    the first obtaining module constructs the three-dimensional perceptual map in the following manner:
    performing image analysis on the target image data of the first video frame to obtain the image attribute of the first video frame;
    performing spatial analysis on the target image data of the first video frame to obtain positions of the plurality of first feature points and a pose sequence of the camera capturing the first video frame; and
    constructing the three-dimensional perceptual map according to the image attribute of the first video frame, the positions of the plurality of first feature points, and the pose sequence.
  21. The apparatus according to claim 19, characterized in that the three-dimensional perceptual map includes a plurality of first feature points; and
    the first obtaining module constructs the three-dimensional perceptual map in the following manner:
    performing image analysis on the target image data of the first video frame to obtain the image attribute of the first video frame;
    performing spatial analysis on the target image data of the first video frame in combination with the first information to obtain positions of the plurality of first feature points and a pose sequence of the camera capturing the first video frame; and
    constructing the three-dimensional perceptual map according to the image attribute of the first video frame, the positions of the plurality of first feature points, and the pose sequence.
  22. The apparatus according to any one of claims 12 to 21, characterized in that the first obtaining module is further configured to update the three-dimensional perceptual map according to the target image data of the current frame; or update the three-dimensional perceptual map according to the target image data of the current frame and second information obtained by a first sensor, wherein the second information includes motion information of the camera when capturing the current frame and/or spatial ranging information of the real scene indicated by the current frame.
  23. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when run on a computer or a processor, cause the computer or the processor to perform the method according to any one of claims 1 to 11.
  24. A computer program product including instructions that, when run on a computer or a processor, cause the computer or the processor to perform the method according to any one of claims 1 to 11.
  25. A chip, including a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to invoke and run the computer program stored in the memory, to perform the method according to any one of claims 1 to 11.
PCT/CN2021/081341 2021-03-17 2021-03-17 Video frame processing method and device WO2022193180A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180086746.7A CN116671099A (zh) 2021-03-17 2021-03-17 视频帧处理方法和装置
PCT/CN2021/081341 WO2022193180A1 (zh) 2021-03-17 2021-03-17 视频帧处理方法和装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/081341 WO2022193180A1 (zh) 2021-03-17 2021-03-17 视频帧处理方法和装置

Publications (1)

Publication Number Publication Date
WO2022193180A1 (zh)

Family

ID=83321819

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/081341 WO2022193180A1 (zh) 2021-03-17 2021-03-17 视频帧处理方法和装置

Country Status (2)

Country Link
CN (1) CN116671099A (zh)
WO (1) WO2022193180A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110149039A1 (en) * 2009-12-18 2011-06-23 Electronics And Telecommunications Research Institute Device and method for producing new 3-d video representation from 2-d video
CN103177475A (zh) Street view map presentation method and system
CN104915965A (zh) Camera tracking method and apparatus
CN109640068A (zh) Information prediction method, apparatus, device, and storage medium for video frames
CN110969648A (zh) 3D target tracking method and system based on point cloud sequence data
CN111223101A (zh) Point cloud processing method, point cloud processing system, and storage medium


Also Published As

Publication number Publication date
CN116671099A (zh) 2023-08-29

Similar Documents

Publication Publication Date Title
US9886774B2 (en) Photogrammetric methods and devices related thereto
KR20220009393A (ko) Image-based localization
WO2020063139A1 (zh) Face modeling method and apparatus, electronic device, and computer-readable medium
KR20200005999A (ko) SLAM method and SLAM system using dual event cameras
JP6560480B2 (ja) Image processing system, image processing method, and program
CN106575160B (zh) Interface providing method and apparatus for recognizing an action according to a user's viewpoint
US10789717B2 (en) Apparatus and method of learning pose of moving object
US9129435B2 (en) Method for creating 3-D models by stitching multiple partial 3-D models
JP2019536154A (ja) Deep machine learning systems for cuboid detection
US11847796B2 (en) Calibrating cameras using human skeleton
US10438405B2 (en) Detection of planar surfaces for use in scene modeling of a captured scene
WO2023151251A1 (zh) Map construction method, pose determination method, apparatus, device, and computer program product
WO2023024441A1 (zh) Model reconstruction method and related apparatus, electronic device, and storage medium
WO2023016182A1 (zh) Pose determination method and apparatus, electronic device, and readable storage medium
Yeh et al. 3D reconstruction and visual SLAM of indoor scenes for augmented reality application
CN110310325B (zh) Virtual measurement method, electronic device, and computer-readable storage medium
CA3099748C (en) Spatial construction using guided surface detection
CN113610702B (zh) Mapping method and apparatus, electronic device, and storage medium
US11188787B1 (en) End-to-end room layout estimation
WO2022193180A1 (zh) Video frame processing method and device
CN115578432B (zh) Image processing method and apparatus, electronic device, and storage medium
CN114742967B (zh) Visual positioning method and apparatus based on a building digital twin semantic map
CN114972599A (zh) Method for virtualizing a scene
CN116136408A (zh) Indoor navigation method, server, apparatus, and terminal
JP7255709B2 (ja) Estimation method, estimation device, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21930773

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180086746.7

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21930773

Country of ref document: EP

Kind code of ref document: A1