WO2023093186A1 - 基于神经辐射场的行人重识别三维数据集构建方法和装置 - Google Patents

基于神经辐射场的行人重识别三维数据集构建方法和装置 Download PDF

Info

Publication number
WO2023093186A1
Authority
WO
WIPO (PCT)
Prior art keywords
ray
dimensional
pedestrian
neural
identification
Prior art date
Application number
PCT/CN2022/116174
Other languages
English (en)
French (fr)
Inventor
王宏升
陈光
鲍虎军
Original Assignee
之江实验室
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 之江实验室 filed Critical 之江实验室
Priority to US17/950,033 priority Critical patent/US20230410560A1/en
Publication of WO2023093186A1 publication Critical patent/WO2023093186A1/zh

Links

Images

Classifications

    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06V 40/25 Recognition of walking or running movements, e.g. gait recognition
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N 3/04 Neural networks; Architecture, e.g. interconnection topology
    • G06N 3/08 Neural networks; Learning methods
    • G06T 15/005 3D [Three dimensional] image rendering; General purpose rendering architectures
    • G06T 15/55 3D image rendering; Lighting effects; Radiosity
    • G06V 10/56 Extraction of image or video features relating to colour
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/64 Scenes; Scene-specific elements; Three-dimensional objects
    • G06V 20/647 Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • G06T 2200/08 Indexing scheme for image data processing or generation, in general, involving all processing steps from image acquisition to 3D model generation

Definitions

  • The invention relates to the technical field of pedestrian re-identification, and in particular to a method and device for constructing a three-dimensional dataset for pedestrian re-identification based on a neural radiance field.
  • Pedestrian re-identification, also known as person re-identification, is widely regarded as a sub-problem of image retrieval. It uses computer vision to determine whether a specific pedestrian is present in videos and images and, given an image of that pedestrian, to retrieve further images of them across devices. Pedestrian re-identification can be combined with a variety of technologies and applied to security, video surveillance, prisoner monitoring and similar fields.
  • Pedestrian re-identification has many advantages, such as using gait, body characteristics and clothing to identify people more comprehensively, and retrieving pedestrians across devices in a multi-camera network. Whether used alone or in combination with other technologies it can deliver great value, but it also brings great challenges, such as susceptibility to clothing, occlusion, posture and viewing angle.
  • Early datasets have few pedestrians and cameras and a small amount of data; their time spans are short, lighting conditions change little, and data under different lighting conditions are lacking; their scenes are uniform and cover a small area; moreover, manual labelling is expensive, and data collection is cumbersome and difficult.
  • The quality of data urgently needs improving, and more accurate methods are needed to construct datasets.
  • The present invention provides a method and device for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field, and offers a new idea for data entry: by inputting parameters representing a five-dimensional scene, a set of captured images is optimised and rendered; at the same time, improvements to the method allow different rendered images to be generated, enriching and completing the dataset to meet the challenges of small data volume and difficult data collection.
  • To this end, the present invention provides the following technical solutions:
  • The invention discloses a method for constructing a three-dimensional dataset for pedestrian re-identification based on a neural radiance field, which includes the following steps:
  • S1: Collect images of the pedestrian to be entered with a group of cameras at different viewing angles;
  • S2: Sample a set of three-dimensional spatial position points along the camera rays in the scene, and convert the viewing direction of the camera corresponding to the point set into a three-dimensional Cartesian unit vector;
  • S3: Input the set of three-dimensional spatial position points and the viewing direction converted into a three-dimensional Cartesian unit vector into a multi-layer perceptron, which outputs the corresponding density and colour;
  • S4: Use the neural volume rendering method to accumulate the colour of the ray passing through each pixel into the image collected in step S1, with the following sub-steps:
  • S41: Define the accumulated transmittance of a camera ray with a continuous integral, and from it derive the definition of the ray colour;
  • S42: Estimate the ray colour by quadrature, dividing the ray between its near and far bounds into N evenly spaced intervals and selecting discrete points uniformly by stratified sampling;
  • S5: Introduce positional encoding and hierarchical sampling to improve the quality of the image generated by ray colour accumulation in step S4, specifically:
  • S51: Introduce positional encoding: encode the spatial position of each point, converting the three-dimensional vector input to the neural network into a specified number of dimensions and increasing the accuracy of the generated image;
  • S52: Introduce hierarchical sampling: first collect a group of points by stratified sampling and perform a preliminary evaluation of the neural network; based on the output of this preliminary evaluation, generate a probability density function, then sample along each ray according to that density, and combine the points from both rounds of sampling to evaluate the neural network more precisely;
  • S6: Label the generated images and store them in the dataset.
  • The set of three-dimensional spatial position points in step S2 refers to the three-dimensional positions (x, y, z) at which the camera is located; the viewing direction of the camera corresponding to the point set is d, which can be converted into a three-dimensional Cartesian unit vector.
  • The specific process of step S3 is: use a multi-layer perceptron whose input is the camera's spatial position and viewing direction (ζ, d) and whose output is the colour and density (c, σ) of the point, where ζ is the spatial position (x, y, z), d is the three-dimensional Cartesian unit vector converted from the viewing direction, c is the colour, and σ is the volume density.
  • The neural volume rendering method in step S4 is as follows: trace the rays of the scene, and integrate rays of a specified length to generate an image or video; the colour of any ray passing through the scene must be rendered to form an image.
  • The specific process of step S41 is: write the camera ray as r(t) = o + td, where o is the ray origin, d is the viewing direction, and t is the position of a point in space along the camera ray. The ray colour is specifically defined as
  • C(r) = ∫_{t_n}^{t_f} T(t) σ(r(t)) c(r(t), d) dt,
  • where t_n and t_f are the near and far bounds of the ray, c denotes the colour, σ denotes the volume density, and T(t) is the accumulated transmittance of the ray from t_n to t, i.e. the probability that the ray hits no particle on the path from t_n to t, specifically:
  • T(t) = exp(-∫_{t_n}^{t} σ(r(s)) ds).
  • Step S42 is specifically: divide the distance [t_n, t_f] between the near bound t_n and the far bound t_f of the ray into N evenly spaced intervals, and then draw one sample at random from each interval, i.e. t_i obeys the uniform distribution
  • t_i ~ U[t_n + ((i-1)/N)(t_f - t_n), t_n + (i/N)(t_f - t_n)].
  • The integral for the ray colour can then be simplified to
  • Ĉ(r) = Σ_{i=1}^{N} T_i (1 - exp(-σ_i δ_i)) c_i, with T_i = exp(-Σ_{j=1}^{i-1} σ_j δ_j) and δ_i = t_{i+1} - t_i,
  • where σ_i denotes the volume density and c_i the colour.
  • The specific method of introducing positional encoding in step S51 is: normalise the spatial position ζ and the viewing direction d, and encode each coordinate value of the spatial position and viewing direction as follows:
  • γ(p) = (sin(2^0 πp), cos(2^0 πp), ..., sin(2^{L-1} πp), cos(2^{L-1} πp)).
  • The specific sub-steps of introducing hierarchical sampling in step S52 are as follows:
  • Step 1: Collect N_c points on the ray by stratified sampling;
  • Step 2: Input the sampling points, and perform a preliminary evaluation of the neural network at the sampling point positions by quadrature;
  • Step 3: Generate a probability density function by normalisation, rewriting the quadrature formula of step S42 as Ĉ_c(r) = Σ_{i=1}^{N_c} ω_i c_i, where ω_i = T_i (1 - exp(-σ_i δ_i)) and the normalised weights ω̂_i = ω_i / Σ_{j=1}^{N_c} ω_j define a piecewise-constant probability density function;
  • Step 4: Based on this probability density function, collect N_f points along each ray;
  • Step 5: Use the N_c + N_f points collected above to evaluate the neural network more precisely and render the ray colour better.
  • The invention also discloses a device for constructing a three-dimensional dataset for pedestrian re-identification based on a neural radiance field. The device includes a memory and one or more processors; executable code is stored in the memory, and when the one or more processors execute the executable code, they implement the above method for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field.
  • The beneficial effects of the invention: it provides a brand-new method for constructing a pedestrian re-identification dataset and a new idea for dataset construction. Compared with traditional dataset construction methods, acquiring data through the images and spatial positions collected by multiple devices is more direct and clear; a neural radiance field is introduced and three-dimensional images are reconstructed from the existing data, and by improving the method, images with different effects can be constructed, simulating images in different scenes and under different lighting conditions and greatly enriching the dataset; after the data are collected and reconstructed, they are labelled, reducing the cost of later manual annotation; and the three-dimensional dataset constructed by this method contains more comprehensive and complete information.
  • Fig. 1 is the architecture diagram of the method and device for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field;
  • Fig. 2 is the schematic diagram of hierarchical sampling;
  • Fig. 3 is the device schematic diagram of an embodiment of the present invention.
  • A neural radiance field is a method that takes multiple images as input and uses a multi-layer perceptron (neural network) to connect and represent a three-dimensional scene; the perceptron can be stored in a file comparable in size to a compressed image.
  • Under this representation, the rendered graphics show satisfactory accuracy, details can be rendered from any viewing angle, and complex scenes can be modelled at high resolution.
  • At the same time, neural radiance fields overcome the high storage cost of discrete voxel grids.
  • In view of this, the present invention provides a method and device for constructing a 3D dataset for pedestrian re-identification based on a neural radiance field, and offers a new idea for data entry: by inputting parameters representing a five-dimensional scene, a set of captured images is optimised and rendered; at the same time, improvements to the method allow different rendered images to be generated, enriching and completing the dataset to meet the challenges of small data volume and difficult data collection.
  • An embodiment of the present invention provides a method for constructing a three-dimensional dataset for pedestrian re-identification based on a neural radiance field, which includes the following steps:
  • Step 1: Image collection: collect multiple images of the pedestrian to be entered with a group of cameras at different viewing angles; for a given pedestrian, a large number of images with known camera parameters, i.e. with known spatial position and viewing direction, must be provided.
  • Step 2: Camera spatial position and viewing direction data collection:
  • A sampled set of three-dimensional points is generated along the camera rays in the scene; the three-dimensional spatial position ζ of each point is obtained, expressed as (x, y, z), together with the viewing direction d of the camera, expressed as (θ, φ).
  • In practice, the viewing direction of the camera corresponding to the point set in three-dimensional space can be converted into a three-dimensional Cartesian unit vector.
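The (θ, φ) to Cartesian-unit-vector conversion mentioned above can be sketched in a few lines. This is an illustrative sketch assuming the standard spherical-coordinate convention (θ measured from the z-axis, φ in the x-y plane); the patent does not fix a convention, and the function name is ours.

```python
import math

def direction_to_unit_vector(theta, phi):
    """Convert a viewing direction given as spherical angles (theta, phi)
    into a 3D Cartesian unit vector."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

# Looking along the x-axis: theta = pi/2 (horizontal), phi = 0.
v = direction_to_unit_vector(math.pi / 2, 0.0)
```

By construction the result always has unit length, so it can be fed to the perceptron without further normalisation.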
  • Step 3 Neural network output density and color:
  • A multi-layer perceptron, also known as an artificial neural network, includes an input layer, an output layer, and multiple hidden layers in between. It weights each input dimension to obtain the output dimensions and adds an activation function, yielding a model able to learn non-linear relationships and therefore achieve good results.
  • The present invention uses a multi-layer perceptron F_Θ: (ζ, d) → (c, σ). Its input is ζ, a spatial position (x, y, z), and d, the viewing direction, represented by a three-dimensional Cartesian unit vector; the output is the colour and density of the point, i.e. c is the RGB colour (r, g, b) and σ is the volume density.
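A minimal sketch of an F_Θ-style perceptron follows. The layer sizes, random initialisation and activation choices are illustrative assumptions, not the patent's configuration (NeRF-style networks behind this kind of description are deeper and feed the viewing direction in at a later layer); the sketch only shows the shape of the (ζ, d) → (c, σ) mapping.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim, hidden, out_dim):
    """Random weights and zero biases for a small fully connected network."""
    dims = [in_dim] + hidden + [out_dim]
    return [(rng.normal(0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def mlp_forward(params, x):
    """Forward pass with ReLU activations on the hidden layers."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

def radiance_field(params, pos, view_dir):
    """F_theta: (position, direction) -> (RGB colour, volume density)."""
    raw = mlp_forward(params, np.concatenate([pos, view_dir], axis=-1))
    rgb = 1.0 / (1.0 + np.exp(-raw[..., :3]))   # sigmoid keeps colour in [0, 1]
    sigma = np.maximum(raw[..., 3], 0.0)        # density is non-negative
    return rgb, sigma

params = init_mlp(6, [64, 64], 4)               # 3 position + 3 direction inputs
rgb, sigma = radiance_field(params, np.zeros(3), np.array([0.0, 0.0, 1.0]))
```

The sigmoid and ReLU output heads encode the constraints stated in the text: colours are valid RGB values and the volume density cannot be negative.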
  • Step 4 Compositing images with volume rendering technology:
  • Volume rendering methods are the various methods of generating images from three-dimensional scalar data, visualising volume data under complex lighting through conventional path tracing, photon mapping and similar techniques.
  • Neural volume rendering is a method that traces the rays of a scene and integrates rays of a certain length to generate an image or video.
  • The present invention uses the classic volume rendering approach, namely neural volume rendering, to estimate the colour of any ray passing through the scene. Writing the camera ray as r(t) = o + td, the ray colour is defined by the continuous integral
  • C(r) = ∫_{t_n}^{t_f} T(t) σ(r(t)) c(r(t), d) dt,
  • where T(t) = exp(-∫_{t_n}^{t} σ(r(s)) ds) is the accumulated transmittance of the ray on the path from t_n to t, i.e. the probability that the ray hits no particle on the path from t_n to t.
  • Quadrature is used to estimate this continuous integral numerically. Because a multi-layer perceptron can only be queried at a fixed set of discrete points, the deterministic quadrature used to render discrete voxel grids would limit the resolution of the scene representation.
  • The present invention therefore adopts stratified sampling: divide the distance [t_n, t_f] between the near bound t_n and the far bound t_f of the ray into N evenly spaced intervals, and then draw one sample at random from each interval, i.e. t_i obeys the uniform distribution
  • t_i ~ U[t_n + ((i-1)/N)(t_f - t_n), t_n + (i/N)(t_f - t_n)].
  • Stratified sampling lets the multi-layer perceptron be evaluated and optimised at continuous positions, so a continuous scene can be represented even though the integral is estimated with a discrete set of samples. The integral then simplifies to
  • Ĉ(r) = Σ_{i=1}^{N} T_i (1 - exp(-σ_i δ_i)) c_i, with T_i = exp(-Σ_{j=1}^{i-1} σ_j δ_j) and δ_i = t_{i+1} - t_i,
  • where σ_i denotes the volume density and c_i the colour.
  • the image is generated by accumulating the color of rays passing through each pixel into the image.
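The stratified sampling and the quadrature sum above can be sketched as follows. The density and colour values are made-up constants for the demonstration (a homogeneous red medium), and the function names are ours, not the patent's.

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_samples(t_n, t_f, N):
    """Divide [t_n, t_f] into N even bins and draw one uniform sample per bin."""
    edges = np.linspace(t_n, t_f, N + 1)
    return edges[:-1] + rng.random(N) * (edges[1:] - edges[:-1])

def render_ray(t, sigma, c):
    """Quadrature estimate: C(r) ~ sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i."""
    delta = np.append(np.diff(t), 1e10)               # delta_i = t_{i+1} - t_i; last bin open
    alpha = 1.0 - np.exp(-sigma * delta)              # opacity of each segment
    T = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))  # accumulated transmittance T_i
    weights = T * alpha
    return weights @ c, weights                       # colour and per-sample weights

t = stratified_samples(2.0, 6.0, 64)
sigma = np.full(64, 0.5)                              # constant density, demo value
c = np.tile([1.0, 0.0, 0.0], (64, 1))                 # constant red, demo value
colour, w = render_ray(t, sigma, c)
```

For an opaque homogeneous medium the per-sample weights sum to one, so the rendered colour converges to the medium's own colour, which is a quick sanity check on the quadrature.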
  • Step 5: Improve image quality with the improved methods:
  • The process above describes parameter acquisition and image generation based on the neural radiance field technique.
  • To improve the quality of the generated images and further expand the dataset, the above method can be improved in the following two ways:
  • Improvement 1: positional encoding. Deep networks are biased towards learning low-frequency functions.
  • Mapping the input to a higher-dimensional space with high-frequency functions before it is passed to the network allows the result to contain high-frequency variation and therefore perform better.
  • Accordingly, the spatial position ζ and the viewing direction d are normalised, and each coordinate value of the spatial position and viewing direction is encoded as follows:
  • γ(p) = (sin(2^0 πp), cos(2^0 πp), ..., sin(2^{L-1} πp), cos(2^{L-1} πp)).
  • For γ(x), L is set to 10, giving a vector of length 60; for γ(d), L is set to 4, giving a vector of length 24.
  • The positional encoding introduces a higher-dimensional space, so using it allows the multi-layer perceptron to approximate high-frequency functions.
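The encoding γ(p) can be sketched directly. The function below reproduces the 2L sin/cos features per coordinate, so L = 10 on a 3-vector gives length 60 and L = 4 gives 24 as stated above; it groups the sine terms before the cosine terms per coordinate, an ordering the patent leaves unspecified.

```python
import numpy as np

def positional_encoding(p, L):
    """gamma(p) = (sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^{L-1} pi p),
    cos(2^{L-1} pi p)), applied to each coordinate of p and concatenated."""
    p = np.atleast_1d(p)
    freqs = (2.0 ** np.arange(L)) * np.pi        # 2^0 pi ... 2^{L-1} pi
    angles = np.outer(p, freqs)                  # one row per coordinate
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1).ravel()

gamma_x = positional_encoding(np.array([0.1, 0.2, 0.3]), L=10)  # 3 * 2 * 10 = 60
gamma_d = positional_encoding(np.array([0.0, 0.0, 1.0]), L=4)   # 3 * 2 * 4 = 24
```

Every feature lies in [-1, 1], which is why the text asks for the inputs to be normalised before encoding.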
  • Improvement 2: hierarchical sampling. The rendering strategy above evaluates along every camera ray, repeatedly sampling empty space and occluded regions that carry no useful information, which makes the strategy inefficient.
  • This improvement proposes hierarchical sampling: to make the rendering strategy more efficient, samples are collected for the final rendering in proportion to their expected effect.
  • Part (1) of Figure 2 shows the collection of points for the "coarse" network, i.e. points sampled at random by the stratified sampling described above, with N_c sampling points.
  • Part (2) of Figure 2 shows the points sampled from the probability density function generated after normalisation, combined with the sampling points of part (1), for N_c + N_f sampling points in total.
  • The weights ω_i = T_i (1 - exp(-σ_i δ_i)) are normalised as ω̂_i = ω_i / Σ_j ω_j to give a piecewise-constant probability density function, where σ_i denotes the volume density and c_i the colour.
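The normalised weights can then drive inverse-transform sampling along the ray, as the hierarchical scheme describes. The bin layout and weight values below are invented for illustration, and `sample_pdf` is our name, not the patent's.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pdf(bins, weights, N_f):
    """Draw N_f extra samples by inverse-transform sampling from the
    piecewise-constant PDF defined by normalised weights over the bins."""
    w = weights + 1e-5                       # avoid an all-zero PDF
    pdf = w / w.sum()                        # normalised weights
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])
    u = rng.random(N_f)                      # uniform draws in [0, 1)
    idx = np.searchsorted(cdf, u, side="right") - 1
    idx = np.clip(idx, 0, len(pdf) - 1)
    # place each sample uniformly inside its chosen bin
    lo, hi = bins[idx], bins[idx + 1]
    frac = (u - cdf[idx]) / pdf[idx]
    return lo + frac * (hi - lo)

bins = np.linspace(2.0, 6.0, 9)                          # 8 coarse intervals
weights = np.array([0, 0, 1, 4, 4, 1, 0, 0], dtype=float)  # mass around the object
t_fine = sample_pdf(bins, weights, N_f=128)
```

Because the demo weights concentrate the mass in the middle bins, nearly all fine samples land there, which is exactly the "sample where the coarse network saw density" behaviour the scheme is after.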
  • The images generated with the two improvements above, positional encoding and hierarchical sampling, have higher image quality, but this does not mean that the images generated in step 4 lose their value. Since pedestrian re-identification is easily affected by clothing, occlusion, posture, viewing angle and weather, images of different quality and in different states can all enrich the pedestrian re-identification dataset and create the conditions for better re-identification.
  • Step 6: Label the generated images with the corresponding pedestrian and save them into the dataset.
  • An embodiment of the present invention also provides a device for constructing a three-dimensional dataset for pedestrian re-identification based on a neural radiance field, which includes a memory and one or more processors; executable code is stored in the memory, and when the one or more processors execute the executable code, they implement the method for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field of the above embodiment.
  • The embodiment of the device of the present invention can be applied to any apparatus with data processing capability, such as a computer.
  • The device embodiment can be implemented by software, or by hardware or a combination of software and hardware. Taking a software implementation as an example, the device in the logical sense is formed by the processor of the apparatus reading the corresponding computer program instructions from non-volatile storage into memory and running them.
  • At the hardware level, besides the processor, memory, network interface and non-volatile storage shown in Figure 3, the apparatus in which the device of the embodiment resides usually also includes other hardware according to its actual function, which is not repeated here.
  • For the implementation of the functions and effects of each unit of the above device, see the implementation of the corresponding steps in the above method; this is not repeated here.
  • Since the device embodiment basically corresponds to the method embodiment, the description of the method embodiment applies to the relevant parts.
  • The device embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present invention, which those skilled in the art can understand and implement without creative effort.
  • An embodiment of the present invention also provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, the method for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field of the above embodiment is implemented.
  • The computer-readable storage medium may be an internal storage unit of any apparatus with data processing capability described in any of the foregoing embodiments, such as a hard disk or memory.
  • The computer-readable storage medium may also be an external storage device of such an apparatus, such as a plug-in hard disk, smart media card (SMC), SD card or flash card provided on the apparatus.
  • Further, the computer-readable storage medium may include both the internal storage unit of the apparatus and the external storage device.
  • The computer-readable storage medium is used to store the computer program and the other programs and data required by the apparatus, and may also be used to temporarily store data that has been or will be output.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Graphics (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Geometry (AREA)
  • Image Processing (AREA)
  • Image Generation (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a method and device for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field, comprising the following steps. S1: collect images of the pedestrian to be entered with a group of cameras at different viewing angles. S2: sample a set of three-dimensional spatial position points along the camera rays in the scene, and convert the viewing direction of the camera corresponding to the point set into a three-dimensional Cartesian unit vector. S3: input the point set and the viewing direction converted into a three-dimensional Cartesian unit vector into a multi-layer perceptron, which outputs the corresponding density and colour. The method and device provide a brand-new way of building pedestrian re-identification datasets and a new idea for dataset construction. Compared with traditional dataset construction methods, acquiring data through the images and spatial positions collected by multiple devices is more direct and clear.

Description

Method and device for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field

Cross-reference to related applications

This application claims priority to the Chinese patent application No. CN 202210670964.1, entitled "Method and device for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field", filed with the China National Intellectual Property Administration on June 15, 2022, the entire contents of which are incorporated herein by reference.

Technical field

The invention relates to the technical field of pedestrian re-identification, and in particular to a method and device for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field.

Background

Pedestrian re-identification, also known as person re-identification, is widely regarded as a sub-problem of image retrieval. It uses computer vision to determine whether a specific pedestrian is present in videos and images and, given an image of that pedestrian, to retrieve further images of them across devices. Pedestrian re-identification can be combined with a variety of technologies and applied to security, video surveillance, prisoner monitoring and similar fields.

Pedestrian re-identification has many strengths: it can draw on gait, body characteristics, clothing and other cues to identify a person more comprehensively, and it can retrieve pedestrians across devices in a multi-camera network. Whether used alone or combined with other technologies it delivers great value, but it also brings great challenges, being easily affected by clothing, occlusion, posture and viewing angle.

When collecting pedestrian re-identification data, the factors to consider include: data collection must span multiple devices; public datasets are far smaller than actual needs; many factors affect recognition, making processing difficult; and surveillance raises privacy concerns over the data and the pedestrians. All of these pose challenges and research priorities for pedestrian re-identification.

Compared with real surveillance networks, early datasets contain few pedestrians and cameras and little data; their time spans are short, lighting varies little, and data under different lighting conditions are lacking; their scenes are uniform and cover a small area; in addition, manual annotation is expensive and data collection is cumbersome and difficult. Data quality urgently needs improving, and more accurate methods are needed to construct datasets.
Summary of the invention

The invention provides a method and device for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field, offering a new idea for data entry: by inputting parameters that represent a five-dimensional scene, a set of captured images is optimised and rendered; at the same time, improvements to the method allow different rendered images to be generated, enriching and completing the dataset to meet the challenges of small data volume and difficult data collection.

To achieve the above purpose, the invention provides the following technical solutions:

The invention discloses a method for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field, comprising the following steps:

S1: collect images of the pedestrian to be entered with a group of cameras at different viewing angles;

S2: sample a set of three-dimensional spatial position points along the camera rays in the scene, and convert the viewing direction of the camera corresponding to the point set into a three-dimensional Cartesian unit vector;

S3: input the set of three-dimensional spatial position points and the viewing direction converted into a three-dimensional Cartesian unit vector into a multi-layer perceptron, which outputs the corresponding density and colour;

S4: use the neural volume rendering method to accumulate the colour of the ray passing through each pixel into the image collected in step S1, with the following sub-steps:

S41: define the accumulated transmittance of a camera ray with a continuous integral, and from it derive the definition of the ray colour;

S42: estimate the ray colour by quadrature, dividing the ray between its near and far bounds into N evenly spaced intervals and selecting discrete points uniformly by stratified sampling;

S5: introduce positional encoding and hierarchical sampling to improve the quality of the image generated by ray colour accumulation in step S4, specifically:

S51: introduce positional encoding: encode the spatial position of each point, converting the three-dimensional vector input to the neural network into a specified number of dimensions and increasing the accuracy of the generated image;

S52: introduce hierarchical sampling: first collect a group of points by stratified sampling and perform a preliminary evaluation of the neural network; based on the output of this preliminary evaluation, generate a probability density function, then sample along each ray according to that density, and combine the points from both rounds of sampling to evaluate the neural network more precisely;

S6: label the generated images and store them in the dataset.
Preferably, the set of three-dimensional spatial position points in step S2 refers to the three-dimensional positions (x, y, z) at which the camera is located; the viewing direction of the camera corresponding to the point set is d, which can be converted into a three-dimensional Cartesian unit vector.

Preferably, the specific process of step S3 is: use a multi-layer perceptron whose input is the camera's spatial position and viewing direction (ζ, d) and whose output is the colour and density (c, σ) of the point, where ζ is the spatial position (x, y, z), d is the three-dimensional Cartesian unit vector converted from the viewing direction, c denotes the colour, and σ is the volume density.

Preferably, the neural volume rendering method in step S4 is specifically as follows: trace the rays of the scene and integrate rays of a specified length to generate an image or video; among methods that generate images from three-dimensional scalar data, the colour of any ray passing through the scene must be rendered to form an image.

Preferably, the specific process of step S41 is: write the camera ray as r(t) = o + td, where o is the ray origin, d is the viewing direction, and t is the position of a point in space along the camera ray; the ray colour is specifically defined as

C(r) = ∫_{t_n}^{t_f} T(t) σ(r(t)) c(r(t), d) dt,

where t_n and t_f are the near and far bounds of the ray, c denotes the colour, σ denotes the volume density, and T(t) is the accumulated transmittance of the ray on the path from t_n to t, i.e. the probability that the ray hits no particle on the path from t_n to t, specifically:

T(t) = exp(-∫_{t_n}^{t} σ(r(s)) ds).

Preferably, step S42 is specifically: divide the distance [t_n, t_f] between the near bound t_n and the far bound t_f of the ray into N evenly spaced intervals, and then draw one sample at random from each interval, i.e. t_i obeys the uniform distribution

t_i ~ U[t_n + ((i-1)/N)(t_f - t_n), t_n + (i/N)(t_f - t_n)].

The integral formula for the ray colour C(r) can then be simplified to

Ĉ(r) = Σ_{i=1}^{N} T_i (1 - exp(-σ_i δ_i)) c_i,

where

T_i = exp(-Σ_{j=1}^{i-1} σ_j δ_j) and δ_i = t_{i+1} - t_i;

σ_i denotes the volume density and c_i denotes the colour.
Preferably, the specific method of introducing positional encoding in step S51 is: normalise the spatial position ζ and the viewing direction d, and encode each coordinate value of the spatial position and viewing direction as follows:

γ(p) = (sin(2^0 πp), cos(2^0 πp), ..., sin(2^{L-1} πp), cos(2^{L-1} πp)).

Preferably, the specific sub-steps of introducing hierarchical sampling in step S52 are as follows:

Step 1: collect N_c points on the ray by stratified sampling;

Step 2: input the sampling points, and perform a preliminary evaluation of the neural network at the sampling point positions by quadrature;

Step 3: generate a probability density function by normalisation, rewriting the integral formula of step S42 as

Ĉ_c(r) = Σ_{i=1}^{N_c} ω_i c_i,

where ω_i = T_i·(1 - exp(-σ_i·δ_i)); normalising the ω_i,

ω̂_i = ω_i / Σ_{j=1}^{N_c} ω_j,

yields a piecewise-constant probability density function;

Step 4: based on the above probability density function, collect N_f points along each ray;

Step 5: use the N_c + N_f points collected above to evaluate the neural network more precisely and render the ray colour better.
The invention also discloses a device for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field. The device comprises a memory and one or more processors; executable code is stored in the memory, and when the one or more processors execute the executable code, they implement the above method for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field.

Beneficial effects of the invention: the method and device for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field provide a brand-new way of building pedestrian re-identification datasets and a new idea for dataset construction. Compared with traditional dataset construction methods, acquiring data through the images and spatial positions collected by multiple devices is more direct and clear; a neural radiance field is introduced and three-dimensional images are reconstructed on the basis of the existing data, and images with different effects are constructed through the improvements, simulating images in different scenes and under different lighting conditions and greatly enriching the dataset; after the data are collected and reconstructed, they are labelled, reducing the cost of later manual annotation; and the three-dimensional dataset constructed by this method contains more comprehensive and complete information.
Brief description of the drawings

Fig. 1 is the architecture diagram of the method and device for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field;

Fig. 2 is the schematic diagram of hierarchical sampling;

Fig. 3 is the device schematic diagram of an embodiment of the invention;

In Fig. 2: 1 is the camera; 2 the camera ray; 3 the sampling points; 4 the colour accumulated by the neural volume rendering method; 5 the sampled object.
Detailed description of the embodiments

To make the purpose, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely explain the invention and do not limit its scope. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the invention.

A neural radiance field, as a brand-new technique, is a method that takes multiple images as input and uses a multi-layer perceptron (neural network) to connect and represent a three-dimensional scene; the perceptron can be stored in a file comparable in size to a compressed image. Under this representation, the rendered graphics show satisfactory accuracy, details can be rendered from any viewing angle, and complex scenes can be modelled at high resolution. At the same time, neural radiance fields overcome the high storage cost of discrete voxel grids.

In view of this, the invention provides a method and device for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field, offering a new idea for data entry: by inputting parameters that represent a five-dimensional scene, a set of captured images is optimised and rendered; at the same time, improvements to the method allow different rendered images to be generated, enriching and completing the dataset to meet the challenges of small data volume and difficult data collection.
As shown in Fig. 1, an embodiment of the invention provides a method for constructing a three-dimensional pedestrian re-identification dataset based on a neural radiance field, comprising the following steps:

Step 1: image collection:

Collect multiple images of the pedestrian to be entered with a group of cameras at different viewing angles. For a given pedestrian to be entered, a large number of images with known camera parameters, i.e. with known spatial position and viewing direction, must be provided.

Step 2: collection of camera spatial position and viewing direction data:

Generate a sampled set of three-dimensional points along the camera rays in the scene, obtaining the three-dimensional spatial position ζ of each point, expressed as (x, y, z), and the viewing direction d of the camera, expressed as (θ, φ). In practice, the viewing direction of the camera corresponding to the set of three-dimensional spatial position points can be converted into a three-dimensional Cartesian unit vector.

Step 3: the neural network outputs density and colour:

A multi-layer perceptron, also called an artificial neural network, comprises an input layer, an output layer and several hidden layers in between. It weights each input dimension to obtain the output dimensions and adds an activation function, giving a model able to learn non-linear relationships and therefore achieve good results.

The invention uses a multi-layer perceptron F_Θ: (ζ, d) → (c, σ). Its input is ζ, a spatial position (x, y, z), and d, the viewing direction, represented by a three-dimensional Cartesian unit vector; its output is the colour and density of the point, i.e. c is the RGB colour (r, g, b) and σ is the volume density.

Using the above method, the weights Θ can be optimised; the collected set of three-dimensional spatial position points and their viewing directions are fed into this mapping to obtain the corresponding volume densities and colours.
Step 4: synthesising images with volume rendering:

Volume rendering methods are the various methods of generating images from three-dimensional scalar data, visualising volume data under complex lighting through conventional path tracing, photon mapping and similar techniques. Neural volume rendering is a method that traces the rays of a scene and integrates rays of a certain length to generate an image or video.

The invention uses the classic volume rendering approach, namely neural volume rendering, to estimate the colour of any ray passing through the scene. Write the camera ray as r(t) = o + td, where o is the ray origin, d is the aforementioned viewing direction, and t is the position of a point in space along the camera ray. The ray colour is defined by a continuous integral:

C(r) = ∫_{t_n}^{t_f} T(t) σ(r(t)) c(r(t), d) dt,

where t_n and t_f are the near and far bounds of the ray, and T(t) is the accumulated transmittance of the ray on the path from t_n to t, i.e. the probability that the ray hits no particle on the path from t_n to t:

T(t) = exp(-∫_{t_n}^{t} σ(r(s)) ds).

Rendering a view from this continuous neural radiance field requires estimating the colour of the ray passing through each pixel of the virtual camera, i.e. estimating the continuous integral C(r); in the invention this continuous integral is estimated numerically by quadrature. Because of the nature of a multi-layer perceptron, which can only be queried at a fixed set of discrete points, the deterministic quadrature used to render discrete voxel grids would limit the resolution of the scene representation. The invention adopts stratified sampling: divide the distance [t_n, t_f] between the near bound t_n and the far bound t_f of the ray into N evenly spaced intervals, and then draw one sample at random from each interval, i.e. t_i obeys the uniform distribution

t_i ~ U[t_n + ((i-1)/N)(t_f - t_n), t_n + (i/N)(t_f - t_n)].

Throughout the process, stratified sampling allows the multi-layer perceptron to be evaluated and optimised at continuous positions, so that a continuous scene can be represented even though the integral is estimated with a discrete set of samples. The integral can then be simplified to

Ĉ(r) = Σ_{i=1}^{N} T_i (1 - exp(-σ_i δ_i)) c_i,

where

T_i = exp(-Σ_{j=1}^{i-1} σ_j δ_j) and δ_i = t_{i+1} - t_i;

σ_i denotes the volume density and c_i denotes the colour.

The image is generated by accumulating the colour of the ray passing through each pixel into the image.
Step 5: improving image quality with the improved methods:

The process above describes parameter acquisition and image generation based on the neural radiance field technique. To improve the quality of the generated images while further expanding the dataset, the method can be improved in the following two ways.

Improvement 1: positional encoding

Deep networks are biased towards learning low-frequency functions. Mapping the input to a higher-dimensional space with high-frequency functions before it is passed to the network allows the result to contain high-frequency variation and therefore perform better. Accordingly, the spatial position ζ and the viewing direction d are normalised, and each coordinate value of the spatial position and viewing direction is encoded as follows:

γ(p) = (sin(2^0 πp), cos(2^0 πp), ..., sin(2^{L-1} πp), cos(2^{L-1} πp)).

In the invention, for γ(x), L is set to 10, giving a vector of length 60; for γ(d), L is set to 4, giving a vector of length 24. The positional encoding introduces a higher-dimensional space, so using it allows the multi-layer perceptron to approximate high-frequency functions.

A multi-layer perceptron is then redefined as

F_Θ = F'_Θ ∘ γ,

where F'_Θ is an ordinary multi-layer perceptron. The ray colours in the scene are rendered on top of this perceptron's output, bringing the image closer to reality.
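The composition F_Θ = F'_Θ ∘ γ can be sketched as below. A single random linear layer stands in for the ordinary perceptron F'_Θ, an assumption purely for brevity; the point being illustrated is that the network only ever sees the encoded 84-dimensional input (60 features from γ(x), 24 from γ(d)), never the raw coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)

def gamma(p, L):
    """Positional encoding applied to every coordinate of p."""
    freqs = (2.0 ** np.arange(L)) * np.pi
    ang = np.outer(np.atleast_1d(p), freqs)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1).ravel()

def f_prime(x, W):
    """Stand-in for the ordinary perceptron F'_theta: one linear layer here."""
    return x @ W

# F_theta = F'_theta composed with gamma: encode first, then run the network.
pos, view = np.array([0.1, 0.2, 0.3]), np.array([0.0, 0.0, 1.0])
x = np.concatenate([gamma(pos, 10), gamma(view, 4)])   # 60 + 24 = 84 inputs
W = rng.normal(0, 0.1, (84, 4))                        # 4 outputs: (r, g, b, sigma)
out = f_prime(x, W)
```

Swapping the linear stand-in for a deep network changes nothing about the composition itself, which is why the text can redefine F_Θ without touching the rest of the pipeline.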
Improvement 2: hierarchical sampling
As shown in Fig. 2, the rendering strategy described above evaluates points along every camera ray, repeatedly sampling free space and occluded regions that carry no useful information, which makes the strategy inefficient. This improvement proposes hierarchical sampling: to make the rendering strategy more efficient, samples are allocated in the final rendering in proportion to their expected contribution.
In Fig. 2, (1) shows the point sampling of the "coarse" network, i.e., random stratified sampling as described above, with N_c sample points; (2) shows the points sampled from the normalized probability density function combined with the points from (1), for a total of N_c + N_f sample points.
In this scheme, not just one network is optimized; a "coarse" network and a "fine" network are optimized simultaneously to represent the scene. First, a set of points is sampled by stratified sampling and the "coarse" network is evaluated. A probability density function is generated from the coarse network's output, points are then sampled along each ray according to this density, and the "fine" network is evaluated on the union of the two sample sets. The specific steps are as follows:
1. Sample N_c points along the ray by stratified sampling;
2. Input the sampled points and make a preliminary, "coarse" evaluation of the neural network at those positions by quadrature, specifically via formula (2) above;
3. Generate a probability density function by normalization, rewriting formula (2) as
$$\hat{C}_c(r)=\sum_{i=1}^{N_c}\omega_i\,c_i$$
where ω_i = T_i·(1 − exp(−σ_i·δ_i)); ω_i is then normalized as
$$\hat{\omega}_i=\frac{\omega_i}{\sum_{j=1}^{N_c}\omega_j}$$
producing a piecewise-constant probability density function, where σ_i denotes the volume density and c_i the color;
4. Sample N_f points along each ray based on this probability density function;
5. Use the N_c + N_f points thus collected to evaluate the neural network more precisely, i.e., use the N_c + N_f points to estimate the "fine" network and render the ray color better.
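Steps 3 and 4 above amount to inverse-transform sampling from the piecewise-constant PDF defined by the normalized weights. A sketch follows; the function name and the small ε added to the weights (a common numerical-stability measure not stated in the text) are assumptions.

```python
import numpy as np

def sample_pdf(bins, weights, n_f, rng=np.random.default_rng()):
    """Inverse-transform sample N_f points from the piecewise-constant PDF
    defined by the coarse weights ω_i over the coarse sample bins."""
    w = weights + 1e-5                             # ε: avoid division by zero
    pdf = w / w.sum()                              # normalized ω̂_i
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])
    u = rng.random(n_f)                            # uniform draws in [0, 1)
    idx = np.clip(np.searchsorted(cdf, u, side="right") - 1, 0, len(weights) - 1)
    frac = (u - cdf[idx]) / pdf[idx]               # position inside the chosen bin
    return bins[idx] + frac * (bins[idx + 1] - bins[idx])
```

Because high-weight bins occupy more of the CDF, the fine samples concentrate where the coarse network found visible matter.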
It is worth noting that although the two improvements above, positional encoding and hierarchical sampling, yield images of higher quality, this does not mean the images generated in step 4 lose their value. Since pedestrian re-identification techniques are sensitive to clothing, occlusion, pose, viewpoint, weather, and the like, images of different qualities and states all enrich the pedestrian re-identification dataset and create the conditions for better re-identification.
Step 6: storing the generated images in the dataset:
The generated images are labeled with the corresponding pedestrian, i.e., each image is tagged with that pedestrian's name, producing data for that pedestrian containing images of different qualities, which are stored in the dataset. Repeating the above steps ultimately yields a relatively complete dataset containing multiple pedestrians.
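A minimal sketch of this labeling-and-storage step. The directory-per-identity layout, the file naming scheme, and the `.npy` format are conventions assumed for illustration; the method itself does not prescribe a storage format.

```python
import os
import numpy as np

def store_in_dataset(root, pedestrian_name, image, index):
    """Label a generated image with its pedestrian identity and store it."""
    person_dir = os.path.join(root, pedestrian_name)      # one folder per identity
    os.makedirs(person_dir, exist_ok=True)
    path = os.path.join(person_dir, f"{pedestrian_name}_{index:04d}.npy")
    np.save(path, image)                                  # image: H×W×3 rendered colors
    return path
```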
Referring to Fig. 3, an embodiment of the present invention further provides an apparatus for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields, comprising a memory and one or more processors, the memory storing executable code; when the one or more processors execute the executable code, they implement the method for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields of the above embodiments.
Embodiments of the apparatus of the present invention may be deployed on any device with data processing capability, such as a computer or similar device or apparatus. Apparatus embodiments may be implemented in software, in hardware, or in a combination of both. Taking software implementation as an example, the apparatus in the logical sense is formed by the processor of the device on which it resides reading the corresponding computer program instructions from non-volatile storage into memory and running them. At the hardware level, Fig. 3 shows a hardware structure diagram of a device with data processing capability on which the apparatus resides; in addition to the processor, memory, network interface, and non-volatile storage shown in Fig. 3, the device may also include other hardware according to its actual function, which is not described further here. The implementation of the functions and roles of each unit of the apparatus is detailed in the implementation of the corresponding steps of the above method and is not repeated here.
As the apparatus embodiments substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the present solution, which a person of ordinary skill in the art can understand and implement without creative effort.
An embodiment of the present invention further provides a computer-readable storage medium storing a program which, when executed by a processor, implements the method for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields of the above embodiments.
The computer-readable storage medium may be an internal storage unit of any device with data processing capability described in any of the foregoing embodiments, such as a hard disk or memory. It may also be an external storage device of such a device, such as a plug-in hard disk, smart media card (SMC), SD card, or flash card provided on the device. Further, the computer-readable storage medium may include both the internal storage unit of the device and an external storage device. It is used to store the computer program and the other programs and data required by the device, and may also temporarily store data that has been or will be output.
The above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (9)

  1. A method for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields, characterized by comprising the following steps:
    S1: capturing images of a pedestrian to be entered with a set of cameras at different viewpoints;
    S2: sampling camera rays in the scene to generate a set of three-dimensional spatial positions, and converting the cameras' viewing directions corresponding to the set into three-dimensional Cartesian unit vectors;
    S3: inputting the set of three-dimensional spatial positions and the viewing directions converted into three-dimensional Cartesian unit vectors into a multilayer perceptron, which outputs the corresponding densities and colors;
    S4: using a neural volume rendering method to accumulate the ray color passing through each pixel into the images captured in step S1, with the following sub-steps:
    S41: defining the accumulated transmittance of a camera ray by a continuous integral, and deriving from it the definition of the ray color;
    S42: estimating the ray color by quadrature, dividing the ray from its near bound to its far bound into N evenly spaced bins and selecting discrete points uniformly by stratified sampling;
    S5: introducing positional encoding and hierarchical sampling to improve the quality of the images generated by ray-color accumulation in step S4, specifically:
    S51: introducing positional encoding: encoding the spatial position of each point, converting the three-dimensional vector input to the neural network into a specified number of dimensions and increasing the precision of the generated images;
    S52: introducing hierarchical sampling: first sampling a set of points by stratified sampling and making a preliminary evaluation of the neural network; generating a probability density function based on the output of the preliminarily evaluated neural network; then sampling along each ray according to the probability density function; and combining the points from the two samplings for a more precise evaluation of the neural network;
    S6: labeling the generated images and storing them in the dataset.
  2. The method for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields according to claim 1, characterized in that in step S2, the set of three-dimensional spatial positions refers to the three-dimensional positions (x, y, z) of the cameras, and the viewing direction of the camera corresponding to the set is d, which can be converted into a three-dimensional Cartesian unit vector.
  3. The method for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields according to claim 1, characterized in that step S3 specifically comprises: using a multilayer perceptron that takes the camera's spatial position and viewing direction (ζ, d) as input and outputs the color and density (c, σ) of a point, where ζ is the spatial position (x, y, z), d is the three-dimensional Cartesian unit vector converted from the viewing direction, c denotes color, and σ is the volume density.
  4. The method for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields according to claim 1, characterized in that the neural volume rendering method in step S4 specifically comprises: tracing rays through the scene and integrating along rays of prescribed length to generate an image or video; in generating images from three-dimensional scalar data, the color of any ray passing through the scene is rendered so as to render the image.
  5. The method for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields according to claim 1, characterized in that step S41 specifically comprises: denoting a camera ray as r(t) = o + t·d, where o is the ray origin, d is the viewing direction, and t indicates the position of a point in space that the camera ray passes through, the ray color being defined as follows:
    $$C(r)=\int_{t_n}^{t_f} T(t)\,\sigma(r(t))\,c(r(t),d)\,dt$$
    where t_n and t_f are the near and far bounds of the ray, c denotes color, σ denotes volume density, and T(t) is the accumulated transmittance along the ray from t_n to t, i.e., the probability that the ray travels from t_n to t without hitting any particle, given by:
    $$T(t)=\exp\left(-\int_{t_n}^{t}\sigma(r(s))\,ds\right)$$
  6. The method for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields according to claim 5, characterized in that step S42 specifically comprises: dividing the interval [t_n, t_f] between the ray's near bound t_n and far bound t_f into N evenly spaced bins, and drawing one sample at random from each bin, i.e., t_i follows the uniform distribution:
    $$t_i\sim\mathcal{U}\!\left[t_n+\frac{i-1}{N}\,(t_f-t_n),\;\;t_n+\frac{i}{N}\,(t_f-t_n)\right]$$
    the integral formula for the ray color C(r) then simplifying to:
    $$\hat{C}(r)=\sum_{i=1}^{N} T_i\left(1-\exp(-\sigma_i\,\delta_i)\right)c_i$$
    where δ_i = t_{i+1} − t_i and
    $$T_i=\exp\!\left(-\sum_{j=1}^{i-1}\sigma_j\,\delta_j\right)$$
    with σ_i denoting the volume density and c_i the color.
  7. The method for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields according to claim 1, characterized in that the positional encoding in step S51 is introduced as follows: the spatial position ζ and viewing direction d are normalized, and each coordinate value of the spatial position and viewing direction is encoded as:
    $$\gamma(p)=\left(\sin(2^{0}\pi p),\,\cos(2^{0}\pi p),\,\ldots,\,\sin(2^{L-1}\pi p),\,\cos(2^{L-1}\pi p)\right).$$
  8. The method for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields according to claim 6, characterized in that the hierarchical sampling in step S52 comprises the following sub-steps:
    Step 1: sampling N_c points along the ray by stratified sampling;
    Step 2: inputting the sampled points and making a preliminary evaluation of the neural network at those positions by quadrature;
    Step 3: generating a probability density function by normalization, rewriting the integral formula of step S42 as:
    $$\hat{C}_c(r)=\sum_{i=1}^{N_c}\omega_i\,c_i$$
    where ω_i = T_i·(1 − exp(−σ_i·δ_i)); ω_i is then normalized as
    $$\hat{\omega}_i=\frac{\omega_i}{\sum_{j=1}^{N_c}\omega_j}$$
    thereby producing a piecewise-constant probability density function;
    Step 4: sampling N_f points along each ray based on the above probability density function;
    Step 5: using the N_c + N_f points thus collected to evaluate the neural network more precisely and render the ray color better.
  9. An apparatus for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields, characterized in that the apparatus comprises a memory and one or more processors, the memory storing executable code; when the one or more processors execute the executable code, they implement the method for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields of any one of claims 1 to 8.
PCT/CN2022/116174 2022-06-15 2022-08-31 Method and apparatus for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields WO2023093186A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/950,033 US20230410560A1 (en) 2022-06-15 2022-09-21 Method and apparatus for constructing three-dimensional data set of pedestrian re-identification based on neural radiation field

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210670964.1 2022-06-15
CN202210670964.1A CN114758081A (zh) 2022-06-15 2022-07-15 Method and apparatus for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/950,033 Continuation US20230410560A1 (en) 2022-06-15 2022-09-21 Method and apparatus for constructing three-dimensional data set of pedestrian re-identification based on neural radiation field

Publications (1)

Publication Number Publication Date
WO2023093186A1 true WO2023093186A1 (zh) 2023-06-01

Family

ID=82336702

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/116174 WO2023093186A1 (zh) 2022-06-15 2022-08-31 基于神经辐射场的行人重识别三维数据集构建方法和装置

Country Status (3)

Country Link
US (1) US20230410560A1 (zh)
CN (1) CN114758081A (zh)
WO (1) WO2023093186A1 (zh)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116778061A * 2023-08-24 2023-09-19 浙江大学 Method for generating three-dimensional objects from non-photorealistic pictures
CN116958492A * 2023-07-12 2023-10-27 数元科技(广州)有限公司 VR editing application based on NeRf reconstruction of three-dimensional base scene rendering
CN116977525A * 2023-07-31 2023-10-31 之江实验室 Image rendering method and apparatus, storage medium, and electronic device
CN117036639A * 2023-08-21 2023-11-10 北京大学 Method and apparatus for building multi-view geometric scenes in confined spaces
CN117173343A * 2023-11-03 2023-12-05 北京渲光科技有限公司 Relighting method and system based on neural radiance fields
CN117333609A * 2023-12-01 2024-01-02 北京渲光科技有限公司 Image rendering method, network training method, device, and medium
CN117422804A * 2023-10-24 2024-01-19 中国科学院空天信息创新研究院 Method for three-dimensional scene rendering and fine spatial localization of targets in large-scale urban blocks
CN117422829A * 2023-10-24 2024-01-19 南京航空航天大学 Face image synthesis optimization method based on neural radiance fields
CN117710583A * 2023-12-18 2024-03-15 中铁第四勘察设计院集团有限公司 Method, system, and device for three-dimensional reconstruction from aerial and ground images based on neural radiance fields
CN117422804B * 2023-10-24 2024-06-07 中国科学院空天信息创新研究院 Method for three-dimensional scene rendering and fine spatial localization of targets in large-scale urban blocks

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114758081A (zh) * 2022-06-15 2022-07-15 之江实验室 Method and apparatus for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields
CN115243025B (zh) * 2022-09-21 2023-01-24 深圳市明源云科技有限公司 Three-dimensional rendering method and apparatus, terminal device, and storage medium
CN115761565B (zh) * 2022-10-09 2023-07-21 名之梦(上海)科技有限公司 Video generation method, apparatus, and device, and computer-readable storage medium
CN117893693B (zh) * 2024-03-15 2024-05-28 南昌航空大学 Dense SLAM three-dimensional scene reconstruction method and apparatus

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117823A * 2018-08-31 2019-01-01 常州大学 Cross-scene pedestrian re-identification method based on multilayer neural networks
US20200320777A1 * 2019-04-04 2020-10-08 Google Llc Neural rerendering from 3d models
CN114004941A * 2022-01-04 2022-02-01 苏州浪潮智能科技有限公司 System and method for three-dimensional reconstruction of indoor scenes based on neural radiance fields
CN114119839A * 2022-01-24 2022-03-01 阿里巴巴(中国)有限公司 Three-dimensional model reconstruction and image generation method, device, and storage medium
WO2022104299A1 * 2020-11-16 2022-05-19 Google Llc Deformable neural radiance fields
CN114549731A * 2022-04-22 2022-05-27 清华大学 Viewpoint image generation method and apparatus, electronic device, and storage medium
CN114758081A * 2022-06-15 2022-07-15 之江实验室 Method and apparatus for constructing a three-dimensional pedestrian re-identification dataset based on neural radiance fields

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230230275A1 (en) * 2020-11-16 2023-07-20 Google Llc Inverting Neural Radiance Fields for Pose Estimation
CN113099208B (zh) * 2021-03-31 2022-07-29 清华大学 Method and apparatus for generating dynamic human free-viewpoint video based on neural radiance fields

Also Published As

Publication number Publication date
US20230410560A1 (en) 2023-12-21
CN114758081A (zh) 2022-07-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 22897278; Country of ref document: EP; Kind code of ref document: A1