WO2020134818A1 - Image processing method and related product - Google Patents

Image processing method and related product

Info

Publication number
WO2020134818A1
WO2020134818A1 · PCT/CN2019/121345 · CN2019121345W
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
value
target
video images
Prior art date
Application number
PCT/CN2019/121345
Other languages
French (fr)
Chinese (zh)
Inventor
赵培骁
虞勇波
黄轩
王孝宇
Original Assignee
深圳云天励飞技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司
Publication of WO2020134818A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects

Definitions

  • This application relates to the technical field of image processing, and in particular to an image processing method and related products.
  • 3D reconstruction technology has been widely used in many cutting-edge technologies. It is a common scientific problem and core technology in the fields of computer vision, medical image processing, scientific computing and virtual reality, and digital media creation.
  • traditionally, three-dimensional reconstruction has mostly been based on scene point cloud data, which is typically acquired through multiple cameras, laser cameras, and the like; after acquisition, multiple further steps such as three-dimensional matching are required. This results in high system cost, high demands on the system's computing power, and the inability to achieve miniaturization.
  • the embodiments of the present application provide an image processing method and related products, which can reduce the implementation cost of three-dimensional reconstruction.
  • a first aspect of an embodiment of the present application provides an image processing method, including:
  • the depth map is processed according to point cloud data processing technology to obtain a 3D image.
  • the performing deep feature extraction according to the multiple video images to obtain a feature set includes:
  • a maximum value is selected from the plurality of image quality evaluation values, and the preprocessed video image corresponding to the maximum value is input to a preset convolutional neural network to obtain a feature set.
  • each video image in the plurality of video images includes a human face
  • the image quality evaluation is performed on each of the pre-processed multiple video images to obtain multiple image quality evaluation values, including:
  • the image quality evaluation value corresponding to the target angle value is determined.
  • the acquiring two weight values corresponding to the two-dimensional angle value includes:
  • each mapping relationship includes a first mapping relationship between the angle value in the x direction and the first weight value;
  • the target second weight value is determined according to the target first weight value.
  • a second aspect of an embodiment of the present application provides an image processing apparatus, including:
  • the acquisition unit is used to acquire a video stream in a specified area through a single camera
  • a sampling unit configured to sample the video stream to obtain multiple video images
  • a preprocessing unit configured to preprocess the multiple video images to obtain the preprocessed multiple video images
  • An extraction unit configured to perform depth feature extraction based on the pre-processed multiple video images to obtain a feature set
  • a generating unit configured to generate a depth map according to the feature set
  • the processing unit is configured to process the depth map according to point cloud data processing technology to obtain a 3D image.
  • an embodiment of the present application provides an electronic device, including a processor, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor,
  • the above program includes instructions for performing the steps in the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute part or all of the steps described in the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps described in the first aspect of the embodiments of the present application.
  • the computer program product may be a software installation package.
  • in the embodiments of the present application, a video stream of a specified area is obtained through a single camera, the video stream is sampled to obtain multiple video images, and the multiple video images are preprocessed to obtain the preprocessed multiple video images.
  • depth feature extraction is performed according to the preprocessed multiple video images to obtain a feature set, a depth map is generated according to the feature set, and the depth map is processed according to point cloud data processing technology to obtain a 3D image.
  • in this way, a single camera can be used to collect video images; after sampling, preprocessing, and feature extraction, a feature set is obtained, converted into a depth map, and realized as a 3D scene graph through point cloud data processing technology, which reduces the cost of three-dimensional reproduction.
  • FIG. 1A is a schematic flowchart of an embodiment of an image processing method provided by an embodiment of the present application.
  • FIG. 1B is a schematic structural diagram of a preset convolutional neural network provided by an embodiment of the present application.
  • FIG. 1C is a demonstration effect diagram of a video image provided by an embodiment of the present application.
  • FIG. 1D is a depth map of the video image in FIG. 1C provided by an embodiment of the present application.
  • FIG. 1E is a simple schematic diagram of a point cloud data processing technology provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of another embodiment of an image processing method according to an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an embodiment of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an embodiment of an image processing apparatus provided by an embodiment of the present application.
  • the electronic device in the embodiments of the present application can be connected to multiple cameras; each camera can be used to capture video images, and each camera can have a corresponding position mark or a corresponding number.
  • cameras can be installed in public places, such as schools, museums, intersections, pedestrian streets, office buildings, garages, airports, hospitals, subway stations, stations, bus platforms, supermarkets, hotels, entertainment venues, and so on. After the camera captures the video image, the video image can be saved to the memory of the system where the electronic device is located. Multiple image libraries can be stored in the memory, and each image library can contain different video images of the same person. Of course, each image library can also be used to store video images of an area or video images taken by a specified camera.
  • each frame of video image captured by the camera corresponds to one piece of attribute information
  • the attribute information is at least one of the following: the shooting time of the video image, the position of the video image, the attribute parameters of the video image (format, size, resolution, etc.), the number of the video image, and the character attributes of the video image.
  • the character attributes in the video image may include, but are not limited to: the number of characters in the video image, character positions, character angle values, ages, image quality, and so on.
  • the embodiments of the present application place very low requirements on the device: only a single camera capable of capturing RGB images or video is needed to complete data collection and point cloud generation; the point cloud data and the original RGB images are then fed into the subsequent packaged pipeline, in which three-dimensional reconstruction of the scene is achieved.
  • the scene 3D reconstruction technology based on single-camera depth-of-field prediction can be divided into six modules: video stream acquisition, image preprocessing, depth feature extraction and scene depth map generation, depth-map-based point cloud data generation, matching and fusion of RGB images with point cloud data, and 3D object surface generation.
  • since video stream acquisition, the subsequent matching and fusion of RGB images with point cloud data, and 3D object surface generation are relatively mature, this application optimizes the method of generating point cloud data from the scene, greatly reducing the requirements on equipment and computing power.
  • FIG. 1A is a schematic flowchart of an embodiment of an image processing method according to an embodiment of the present application.
  • the image processing method described in this embodiment includes the following steps:
  • the electronic device may include a single camera, and the single camera may be a visible light camera.
  • the specified area can be set by the user or the system default.
  • the electronic device can shoot a specified area at a preset time interval through a single camera to obtain a video stream, and the preset time interval can be set by the user or the system default.
  • the electronic device can capture the video stream collected by the camera after the camera is turned on, and perform frame extraction processing on the acquired video stream, that is, the video stream is sampled according to a preset sampling frequency to obtain multiple video images.
  • the sampling frequency can be set by the user or the system default.
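  • As an illustration of this sampling step, the following minimal Python sketch (assuming OpenCV; the sampling frequency and frame cap are illustrative stand-ins for the user-set or default values) extracts frames from a video stream at a preset frequency:

```python
import cv2

def sample_video_stream(source, sample_rate_hz=2.0, max_frames=32):
    """Extract frames from a video stream at a preset sampling frequency."""
    cap = cv2.VideoCapture(source)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0          # fall back if the stream reports no FPS
    step = max(int(round(fps / sample_rate_hz)), 1)  # keep one frame every `step` frames
    frames, idx = [], 0
    while len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```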
  • the above preprocessing may include at least one of the following: scaling processing, noise reduction processing, image enhancement processing, etc., which is not limited herein.
  • for example, the preprocessing may scale the image: each extracted frame is scaled or expanded to a height of 224 pixels and a width of 320 pixels before being fed into the feature extraction network for feature extraction.
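  • A minimal sketch of this scaling step, again assuming OpenCV (note that cv2.resize takes the target size as (width, height)):

```python
import cv2

def preprocess(frame, height=224, width=320):
    """Scale an extracted frame to the fixed 224x320 input size of the feature network."""
    return cv2.resize(frame, (width, height), interpolation=cv2.INTER_LINEAR)
```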
  • the electronic device can perform depth feature extraction on the pre-processed multiple video images.
  • multiple pre-processed video images can be input to a preset convolutional neural network to perform deep feature extraction to obtain a feature set.
  • performing depth feature extraction based on the multiple video images to obtain a feature set may include the following steps:
  • the preset convolutional neural network may include operations such as convolution, pooling, and normalization.
  • the purpose of these operations is to extract image features, remove redundant image information, and speed up the network.
  • the extracted features include the outline, texture, and surface information of each object in the image, the edge information where objects meet, and the position information of each object in the entire scene.
  • finally, a feature image containing the information of the entire image is generated.
  • the image quality evaluation can be performed on each of the pre-processed multiple video images to obtain multiple image quality evaluation values.
  • the maximum value among the image quality evaluation values can then be selected, and the preprocessed video image corresponding to the maximum value is input into the preset convolutional neural network to obtain the feature set.
  • image quality evaluation is performed on each of the pre-processed multiple video images to obtain multiple image quality evaluation values, which can be implemented as follows:
  • At least one image quality evaluation index may be used to perform image quality evaluation on each of the pre-processed multiple video images to obtain multiple image quality evaluation values.
  • the image quality evaluation indicators may include, but are not limited to: average grayscale, mean square deviation, entropy, edge retention, signal-to-noise ratio, and so on. It can be defined that the larger the obtained image quality evaluation value, the better the image quality.
  • when the required evaluation accuracy is low, a single image quality evaluation index can be used; for example, with entropy as the index, a larger entropy indicates better face image quality, and a smaller entropy indicates worse face image quality.
  • when higher accuracy is required, multiple image quality evaluation indexes can be used, with a weight set for each index; multiple per-index evaluation values are obtained, and the final image quality evaluation value is derived from these values and their corresponding weights. For example, for three indexes A, B, and C with weights a1, a2, and a3 and per-index evaluation values b1, b2, and b3, the final image quality evaluation value = a1*b1 + a2*b2 + a3*b3.
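  • The weighted evaluation above amounts to a dot product of per-index values and their weights, as in this sketch (the index names, values, and weights are illustrative):

```python
def image_quality_score(values, weights):
    """Final evaluation value = a1*b1 + a2*b2 + a3*b3 for indexes A, B, C."""
    return sum(w * v for w, v in zip(weights, values))

# Example: three indexes (say entropy, average grayscale, SNR, each normalized to [0, 1])
# with weights a1, a2, a3 and per-index scores b1, b2, b3.
score = image_quality_score(values=[0.72, 0.58, 0.91], weights=[0.5, 0.3, 0.2])
```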
  • the preset convolutional neural network includes N downsampling layers, N upsampling layers, and a convolutional layer, where N is an integer greater than 1; in step 42 above, inputting the preprocessed video image corresponding to the maximum value into the preset convolutional neural network to obtain the feature set may include the following steps:
  • perform N downsamplings on the preprocessed video image corresponding to the maximum value through the N downsampling layers to obtain a downsampled video image, where at least one of the N downsamplings includes at least one of the following operations: a convolution operation, a pooling operation, and a normalization operation;
  • perform N upsamplings on the downsampled video image through the N upsampling layers to obtain an upsampled video image;
  • perform a convolution operation on the upsampled video image through the convolutional layer to obtain the feature set.
  • the preset convolutional neural network may include N down-sampling layers, N up-sampling layers, and convolution layers, where N is an integer greater than 1.
  • the foregoing preset convolutional neural network can be understood as an encoding-decoding network.
  • the above-mentioned N down-sampling layers can be understood as an encoding process
  • the above-mentioned N up-sampling layers and convolution layers can be understood as a decoding process.
  • as shown in FIG. 1B, the encoding process (in the dashed frame on the left) performs feature extraction, obtaining the feature image through four downsamplings.
  • each downsampling includes operations such as convolution, pooling, and normalization; the number of downsamplings was determined through experiments that weighed the speed and the accuracy of the algorithm. In theory, more samplings increase accuracy but decrease overall speed, so four are used to balance speed and accuracy. Downsampling also reduces the image size.
  • for example, if the input image is 224*320, each downsampling halves the length and width of the image, so that after four downsamplings the image is only 7*10; the size of the image therefore needs to be restored through the decoding (upsampling) network on the right, which at the same time completes the matching from the extracted feature image to the depth image.
  • the number of upsamplings is the same as the number of downsamplings, again balancing accuracy and speed, and is finally taken as four.
  • the straight line connecting the down-sampling and the up-sampling represents a "skip-connection", which can improve the accuracy of the algorithm.
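  • The following PyTorch sketch illustrates such an encoding-decoding network with four downsamplings, four upsamplings, skip connections, and a final convolution layer; the channel widths, the addition-style skips, and the output head are assumptions, since the source does not specify them. Note that four halvings of a 224×320 input give 14×20 in this sketch, while the text quotes 7×10, which would imply one further reduction not detailed in the source.

```python
import torch
import torch.nn as nn

def conv_bn(cin, cout):
    # convolution + normalization, as named among the downsampling operations
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class DepthNet(nn.Module):
    """Encoder-decoder sketch: N = 4 downsamplings and upsamplings with skip connections."""

    def __init__(self):
        super().__init__()
        chans = [16, 32, 64, 128, 256]
        self.stem = conv_bn(3, chans[0])
        self.downs = nn.ModuleList(conv_bn(chans[i], chans[i + 1]) for i in range(4))
        self.pool = nn.MaxPool2d(2)          # pooling halves the height and width
        self.ups = nn.ModuleList(
            nn.ConvTranspose2d(chans[i + 1], chans[i], 2, stride=2)
            for i in reversed(range(4))      # upsampling restores the size stage by stage
        )
        self.head = nn.Conv2d(chans[0], 1, 3, padding=1)  # final convolution layer

    def forward(self, x):
        x = self.stem(x)
        skips = []
        for down in self.downs:
            skips.append(x)                  # saved for the skip connection
            x = down(self.pool(x))
        for up, skip in zip(self.ups, reversed(skips)):
            x = up(x) + skip                 # "skip-connection" between matching stages
        return self.head(x)

out = DepthNet()(torch.randn(1, 3, 224, 320))  # -> shape (1, 1, 224, 320)
```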
  • step 104 depth feature extraction is performed according to the pre-processed multiple video images to obtain a feature set, which may be implemented as follows:
  • the multiple video images are input to a preset convolutional neural network to obtain a feature set.
  • the preset convolutional neural network may include operations such as convolution, pooling, and normalization.
  • the purpose of these operations is to extract image features, remove redundant image information, and speed up the network.
  • the extracted features include the outline, texture, and surface information of each object in the image, the edge information where objects meet, and the position information of each object in the entire scene. Finally, a feature image containing the information of the entire image is generated.
  • each video image in the plurality of video images includes a human face
  • image quality evaluation is performed on each of the pre-processed multiple video images to obtain multiple image quality evaluation values, including:
  • the electronic device can perform image segmentation on any video image to obtain a face image.
  • the above two-dimensional angle value can be understood as the two-dimensional angle between the face and the camera. Each of the two-dimensional angle values may correspond to a weight value.
  • the two weight values corresponding to the two-dimensional angle value may be preset or the system defaults.
  • the target first weight value corresponds to the x angle value and the target second weight value corresponds to the y angle value, where target first weight value + target second weight value = 1;
  • target angle value = x angle value * target first weight value + y angle value * target second weight value; in this way, a two-dimensional angle value is converted into a one-dimensional angle value, allowing the angle to be expressed accurately.
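  • In code form, the conversion is a simple weighted sum (a sketch; the two weights come from the brightness-dependent mapping described below):

```python
def target_angle(x_angle, y_angle, w1, w2):
    """Collapse the 2-D face angle into one value; the two weights must sum to 1."""
    assert abs((w1 + w2) - 1.0) < 1e-9
    return x_angle * w1 + y_angle * w2
```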
  • obtaining two weights corresponding to the two-dimensional angle value may include the following steps:
  • obtain a target environment brightness value; according to the preset mapping between environment brightness values and mapping relationships, determine the target mapping relationship corresponding to the target environment brightness value, where each mapping relationship includes a first mapping relationship between the angle value in the x direction and the first weight value;
  • the face evaluation device may pre-store the mapping relationship between preset angle values and angle quality evaluation values, and then determine, according to that mapping relationship, the first target evaluation value corresponding to the target angle value; further, if the first target evaluation value is greater than a preset evaluation threshold, the face image can be considered easy to recognize and likely to be recognized correctly.
  • the face corresponding to such an angle can be used for face unlocking, or collected by the camera, which improves the face collection efficiency of the face evaluation device.
  • the above-mentioned feature set is also called a feature map.
  • the feature map is not the final depth image, so the decoding network is necessary.
  • the value of each point is not the pixel value of a regular image, but the distance of the point from the camera in millimeters.
  • FIG. 1C shows a frame of a video image, and FIG. 1D is the corresponding depth map, presented as a grayscale image.
  • the grayscale image is displayed after the distance values in the depth map are appropriately processed: the further a point is from the lens, the lower its gray value and the closer its color is to black; conversely, the closer a point is to the lens, the greater its gray value and the closer its color is to white.
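  • A sketch of such a rendering, assuming NumPy and millimeter depth values: the depth range is normalized and inverted so that nearer points appear whiter:

```python
import numpy as np

def depth_to_gray(depth_mm):
    """Render a depth map as an 8-bit grayscale image: far -> dark, near -> bright."""
    d = depth_mm.astype("float32")
    span = max(float(d.max() - d.min()), 1e-6)
    norm = (d - d.min()) / span               # 0 at the nearest point, 1 at the farthest
    return ((1.0 - norm) * 255.0).astype("uint8")
```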
  • the above feature set may include multiple feature points, and each feature point includes a coordinate position, a feature size, and a feature direction; because a feature point is a vector, a feature value can be calculated from its feature size and feature direction, so a feature value corresponding to each feature point in the feature set can be calculated, yielding multiple target feature values, one per feature point.
  • the electronic device can also pre-store a mapping relationship between preset feature values and depth values; the target depth value corresponding to each target feature value can then be determined according to this mapping relationship, yielding multiple target depth values, each corresponding to a coordinate position, and the depth map is constructed from the multiple target depth values. In this way, a depth map can be constructed from the feature points.
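  • A sketch of this construction, where value_to_depth is a callable stand-in for the pre-stored mapping between feature values and depth values, and each feature point carries its coordinate position:

```python
import numpy as np

def build_depth_map(feature_points, value_to_depth, shape=(224, 320)):
    """Fill a depth map by looking up each feature point's value in the mapping."""
    depth = np.zeros(shape, dtype="float32")
    for row, col, feature_value in feature_points:
        depth[row, col] = value_to_depth(feature_value)  # pre-stored mapping (assumed form)
    return depth
```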
  • each point in the depth map is the distance from the camera to each point in the original image.
  • the point cloud generation is essentially the mapping of points between different coordinate systems, that is, the process of mapping any coordinate m (u, v) in a two-dimensional image to the spatial coordinate M (Xw, Yw, Zw) in a three-dimensional world.
  • the final coordinate conversion formula, the standard pinhole back-projection consistent with the variable definitions below (assuming the world frame coincides with the camera frame), is: Xw = (u - u0) × dx × Zc / f, Yw = (v - v0) × dy × Zc / f, Zw = Zc, where:
  • M (Xw, Yw, Zw) is the world coordinate;
  • m (u, v) is the depth map coordinate;
  • Zc is the value of each point in the depth map, i.e. the distance of the point from the camera;
  • u0 and v0 are the coordinates of the center of the two-dimensional image;
  • dx and dy are scale factors converting pixel units into distance units in meters (a division by 1000 is applied when the distance values are in millimeters);
  • f is the focal length of the camera lens.
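  • A sketch of this back-projection under the same assumptions (pinhole model, world frame coinciding with the camera frame, dx and dy as pixel-to-distance scale factors):

```python
import numpy as np

def depth_to_point_cloud(depth_mm, f, u0, v0, dx=1.0, dy=1.0):
    """Map every depth-map coordinate m(u, v) to a world coordinate M(Xw, Yw, Zw)."""
    v, u = np.indices(depth_mm.shape)          # pixel grid: v is the row, u the column
    zc = depth_mm.astype("float32") / 1000.0   # millimeter depth values -> meters
    xw = (u - u0) * dx * zc / f
    yw = (v - v0) * dy * zc / f
    return np.stack([xw, yw, zc], axis=-1)     # (H, W, 3) array of world points
```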
  • it can be seen that a video stream of a specified area is obtained through a single camera, the video stream is sampled to obtain multiple video images, and the multiple video images are preprocessed to obtain the preprocessed multiple video images.
  • depth feature extraction is performed according to the preprocessed multiple video images to obtain the feature set, the depth map is generated according to the feature set, and the depth map is processed according to the point cloud data processing technology to obtain a 3D image.
  • the feature set is thus converted into a depth map, and a 3D scene map is realized through point cloud data processing technology, which further reduces the cost of three-dimensional reproduction.
  • FIG. 2 is a schematic flowchart of an embodiment of an image processing method according to an embodiment of the present application.
  • the image processing method described in this embodiment includes the following steps:
  • a video stream of a specified area is obtained through a single camera, the video stream is sampled to obtain multiple video images, and the multiple video images are preprocessed to obtain the preprocessed multiple video images.
  • image quality evaluation is performed on each of the preprocessed multiple video images to obtain multiple image quality evaluation values; the maximum value is selected from the multiple image quality evaluation values, and the preprocessed video image corresponding to the maximum value is input into a preset convolutional neural network to obtain a feature set; a depth map is generated according to the feature set, and the depth map is processed according to the point cloud data processing technology to obtain a 3D image.
  • in this way, a single camera collects the video images and, after sampling, preprocessing, and feature extraction, a feature set is obtained, converted into a depth map, and realized as a 3D scene map through point cloud data processing technology, which further reduces the cost of 3D reproduction.
  • FIG. 3 is a schematic structural diagram of an embodiment of an image processing apparatus according to an embodiment of the present application.
  • the image processing device described in this embodiment includes: an acquisition unit 301, a sampling unit 302, a preprocessing unit 303, an extraction unit 304, a generation unit 305, and a processing unit 306, as follows:
  • the obtaining unit 301 is configured to obtain a video stream in a specified area through a single camera
  • the sampling unit 302 is configured to sample the video stream to obtain multiple video images
  • a pre-processing unit 303 configured to pre-process the multiple video images to obtain the pre-processed multiple video images
  • the extraction unit 304 is configured to perform depth feature extraction based on the pre-processed multiple video images to obtain a feature set
  • the generating unit 305 is configured to generate a depth map according to the feature set
  • the processing unit 306 is configured to process the depth map according to a point cloud data processing technology to obtain a 3D image.
  • with the above apparatus, a video stream of a specified area is obtained through a single camera, the video stream is sampled to obtain multiple video images, and the multiple video images are preprocessed to obtain the preprocessed multiple video images.
  • depth feature extraction is performed according to the preprocessed multiple video images to obtain the feature set, the depth map is generated according to the feature set, and the depth map is processed according to the point cloud data processing technology to obtain a 3D image.
  • the feature set is thus converted into a depth map, and a 3D scene map is realized through point cloud data processing technology, which further reduces the cost of three-dimensional reproduction.
  • the above obtaining unit 301 can be used to implement the method described in step 101 above
  • the sampling unit 302 can be used to implement the method described in step 102 above
  • the above preprocessing unit 303 can be used to implement the method described in step 103 above
  • the above extraction unit 304 may be used to implement the method described in step 104 above
  • the generation unit 305 may be used to implement the method described in step 105 above
  • the processing unit 306 may be used to implement the method described in step 106 above, and so on.
  • the extraction unit 304 is specifically configured to:
  • a maximum value is selected from the plurality of image quality evaluation values, and the preprocessed video image corresponding to the maximum value is input to a preset convolutional neural network to obtain a feature set.
  • the preset convolutional neural network includes N downsampling layers, N upsampling layers, and convolutional layers, where N is an integer greater than 1;
  • the extraction unit 304 is specifically configured to: perform N downsamplings on the preprocessed video image corresponding to the maximum value through the N downsampling layers to obtain a downsampled video image, where at least one of the N downsamplings includes at least one of the following operations: a convolution operation, a pooling operation, and a normalization operation;
  • perform N upsamplings on the downsampled video image through the N upsampling layers to obtain an upsampled video image;
  • perform a convolution operation on the upsampled video image through the convolutional layer to obtain the feature set.
  • each video image in the plurality of video images includes a human face
  • the extraction unit 304 is specifically configured to: perform image segmentation on video image i to obtain a target face image; acquire the two-dimensional angle value of the target face image; acquire the two weight values corresponding to the two-dimensional angle value; perform a weighted operation according to the angle values and weight values to obtain the target angle value; and determine the image quality evaluation value corresponding to the target angle value according to the preset mapping relationship between angle values and angle quality evaluation values.
  • the feature set includes multiple feature points, and each feature point includes a coordinate position, feature direction, and feature size;
  • the generating unit 305 is specifically configured to:
  • the target depth value corresponding to each target feature value in the multiple target feature values is determined to obtain multiple target depth values, each target depth value corresponding to a coordinate position;
  • the depth map is constructed according to the plurality of target depth values.
  • FIG. 4 is a schematic structural diagram of an embodiment of an electronic device according to an embodiment of the present application.
  • the electronic device described in this embodiment includes: at least one input device 1000; at least one output device 2000; at least one processor 3000, such as a CPU; and a memory 4000. The input device 1000, output device 2000, processor 3000, and memory 4000 are connected through a bus 5000.
  • the input device 1000 may specifically be a touch panel, physical buttons, or a mouse.
  • the above output device 2000 may specifically be a display screen.
  • the above-mentioned memory 4000 may be a high-speed RAM memory or a non-volatile memory, such as a magnetic disk memory.
  • the above memory 4000 is used to store a set of program codes, and the above input device 1000, output device 2000, and processor 3000 are used to call the program codes stored in the memory 4000, and perform the following operations:
  • the aforementioned processor 3000 is configured to: obtain a video stream of a specified area through a single camera; sample the video stream to obtain multiple video images; preprocess the multiple video images; perform depth feature extraction according to the preprocessed multiple video images to obtain a feature set; generate a depth map according to the feature set; and process the depth map according to point cloud data processing technology to obtain a 3D image.
  • it can be seen that a video stream of a specified area is obtained through a single camera, the video stream is sampled to obtain multiple video images, and the multiple video images are preprocessed to obtain the preprocessed multiple video images.
  • depth feature extraction is performed according to the preprocessed multiple video images to obtain a feature set, a depth map is generated according to the feature set, and the depth map is processed according to the point cloud data processing technology to obtain a 3D image.
  • in this way, a single camera collects the video images; after sampling, preprocessing, and feature extraction, a feature set is obtained, converted into a depth map, and realized as a 3D scene map through point cloud data processing technology, which further reduces the cost of three-dimensional reproduction.
  • the processor 3000 is specifically used to:
  • a maximum value is selected from the plurality of image quality evaluation values, and the preprocessed video image corresponding to the maximum value is input to a preset convolutional neural network to obtain a feature set.
  • the preset convolutional neural network includes N downsampling layers, N upsampling layers, and convolutional layers, where N is an integer greater than 1;
  • the processor 3000 is specifically configured to: perform N downsamplings on the preprocessed video image corresponding to the maximum value through the N downsampling layers to obtain a downsampled video image, where at least one of the N downsamplings includes at least one of the following operations: a convolution operation, a pooling operation, and a normalization operation;
  • perform N upsamplings on the downsampled video image through the N upsampling layers to obtain an upsampled video image;
  • perform a convolution operation on the upsampled video image through the convolutional layer to obtain the feature set.
  • each video image in the plurality of video images includes a human face
  • the processor 3000 is specifically configured to: perform image segmentation on video image i to obtain a target face image, where video image i is any frame of the preprocessed multiple video images; obtain the two-dimensional angle value of the target face image, the two-dimensional angle value including an x angle value and a y angle value;
  • obtain the two weight values corresponding to the two-dimensional angle value, where the target first weight value corresponds to the x angle value, the target second weight value corresponds to the y angle value, and the sum of the target first weight value and the target second weight value is 1;
  • perform a weighted operation according to the x angle value, the y angle value, the target first weight value, and the target second weight value to obtain the target angle value; and determine, according to the preset mapping relationship between angle values and angle quality evaluation values, the image quality evaluation value corresponding to the target angle value.
  • An embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a program, and when the program is executed, it includes some or all steps of any one of the image processing methods described in the foregoing method embodiments.

Abstract

An image processing method and a related product. The method comprises: obtaining the video stream of a designated region by means of a single camera (101); sampling the video stream to obtain multiple video images (102); preprocessing the multiple video images to obtain multiple preprocessed video images (103); according to the multiple preprocessed video images, performing depth feature extraction to obtain a feature set (104); according to the feature set, generating a depth map (105); and processing the depth map according to a point cloud data processing technique to obtain a 3D image (106). The implementation cost of three-dimensional reconstruction can be reduced.

Description

Image processing method and related products
This application claims priority to the Chinese patent application filed with the China Patent Office on December 29, 2018, with application number 201811643004.6 and the invention title "Image processing method and related products", the entire contents of which are incorporated in this application by reference.
Technical field
This application relates to the technical field of image processing, and in particular to an image processing method and related products.
Background art
With the development and progress of artificial intelligence technology, 3D reconstruction has been widely applied in many cutting-edge technologies; it is a common scientific problem and core technology in the fields of computer vision, medical image processing, scientific computing, virtual reality, and digital media creation.
Traditionally, three-dimensional reconstruction has mostly been based on scene point cloud data, which is typically acquired through multiple cameras, laser cameras, and the like; after acquisition, multiple further steps such as three-dimensional matching are required. This results in high system cost, high demands on the system's computing power, and the inability to achieve miniaturization.
Summary of the invention
The embodiments of the present application provide an image processing method and related products, which can reduce the implementation cost of three-dimensional reconstruction.
A first aspect of the embodiments of the present application provides an image processing method, including:
obtaining a video stream of a specified area through a single camera;
sampling the video stream to obtain multiple video images;
preprocessing the multiple video images to obtain the preprocessed multiple video images;
performing depth feature extraction according to the preprocessed multiple video images to obtain a feature set;
generating a depth map according to the feature set;
processing the depth map according to point cloud data processing technology to obtain a 3D image.
Optionally, performing depth feature extraction according to the multiple video images to obtain a feature set includes:
performing image quality evaluation on each video image in the preprocessed multiple video images to obtain multiple image quality evaluation values;
selecting a maximum value from the multiple image quality evaluation values, and inputting the preprocessed video image corresponding to the maximum value into a preset convolutional neural network to obtain the feature set.
Optionally, in the case where each video image in the multiple video images includes a human face,
performing image quality evaluation on each video image in the preprocessed multiple video images to obtain multiple image quality evaluation values includes:
performing image segmentation on video image i to obtain a target face image, where video image i is any frame of the preprocessed multiple video images;
acquiring the target face image, and acquiring a two-dimensional angle value of the target face image, where the two-dimensional angle value includes an x angle value and a y angle value;
acquiring two weight values corresponding to the two-dimensional angle value, where the target first weight value corresponds to the x angle value, the target second weight value corresponds to the y angle value, and the sum of the target first weight value and the target second weight value is 1;
performing a weighted operation according to the x angle value, the y angle value, the target first weight value, and the target second weight value to obtain a target angle value;
determining the image quality evaluation value corresponding to the target angle value according to the preset mapping relationship between angle values and angle quality evaluation values.
Optionally, acquiring the two weight values corresponding to the two-dimensional angle value includes:
obtaining a target environment brightness value;
determining the target mapping relationship corresponding to the target environment brightness value according to the preset mapping between environment brightness values and mapping relationships, where each mapping relationship includes a first mapping relationship between the angle value in the x direction and the first weight value;
determining the target first weight value corresponding to the x angle value according to the target mapping relationship;
determining the target second weight value according to the target first weight value.
A second aspect of the embodiments of the present application provides an image processing apparatus, including:
an acquisition unit, configured to acquire a video stream of a specified area through a single camera;
a sampling unit, configured to sample the video stream to obtain multiple video images;
a preprocessing unit, configured to preprocess the multiple video images to obtain the preprocessed multiple video images;
an extraction unit, configured to perform depth feature extraction according to the preprocessed multiple video images to obtain a feature set;
a generating unit, configured to generate a depth map according to the feature set;
a processing unit, configured to process the depth map according to point cloud data processing technology to obtain a 3D image.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the steps in the first aspect of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute part or all of the steps described in the first aspect of the embodiments of the present application.
In a fifth aspect, an embodiment of the present application provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
Implementing the embodiments of the present application has the following beneficial effects:
It can be seen that, with the image processing method and related products described in the embodiments of the present application, a video stream of a specified area is obtained through a single camera, the video stream is sampled to obtain multiple video images, the multiple video images are preprocessed to obtain the preprocessed multiple video images, depth feature extraction is performed according to the preprocessed multiple video images to obtain a feature set, a depth map is generated according to the feature set, and the depth map is processed according to point cloud data processing technology to obtain a 3D image. In this way, video images can be collected with a single camera and, after sampling, preprocessing, and feature extraction, a feature set is obtained, converted into a depth map, and realized as a 3D scene graph through point cloud data processing technology, thereby reducing the cost of three-dimensional reproduction.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application; those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.
FIG. 1A is a schematic flowchart of an embodiment of an image processing method provided by an embodiment of the present application;
FIG. 1B is a schematic structural diagram of a preset convolutional neural network provided by an embodiment of the present application;
FIG. 1C is a demonstration effect diagram of a video image provided by an embodiment of the present application;
FIG. 1D is a depth map of the video image in FIG. 1C provided by an embodiment of the present application;
FIG. 1E is a simple schematic diagram of a point cloud data processing technology provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of another embodiment of an image processing method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an embodiment of an image processing apparatus provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an embodiment of an image processing apparatus provided by an embodiment of the present application.
Detailed description
It should be noted that the electronic device in the embodiments of the present application can be connected to multiple cameras; each camera can be used to capture video images, and each camera can have a corresponding position mark or a corresponding number. Generally, cameras can be installed in public places, such as schools, museums, intersections, pedestrian streets, office buildings, garages, airports, hospitals, subway stations, stations, bus platforms, supermarkets, hotels, entertainment venues, and so on. After a camera captures a video image, the video image can be saved to the memory of the system where the electronic device is located. Multiple image libraries can be stored in the memory; each image library can contain different video images of the same person, and each image library can also be used to store video images of one area or video images taken by a specified camera.
Further optionally, in the embodiments of the present application, each frame of video image captured by the camera corresponds to one piece of attribute information, where the attribute information is at least one of the following: the shooting time of the video image, the position of the video image, the attribute parameters of the video image (format, size, resolution, etc.), the number of the video image, and the character attributes of the video image. The character attributes in the video image may include, but are not limited to: the number of characters in the video image, character positions, character angle values, ages, image quality, and so on.
The embodiments of the present application place very low requirements on the device: only a single camera capable of capturing RGB images or video is needed to complete data collection and point cloud generation, after which the point cloud data and the original RGB images are fed into the subsequent packaged pipeline to achieve three-dimensional reconstruction of the scene. The scene 3D reconstruction technology based on single-camera depth-of-field prediction can be divided into six modules: video stream acquisition, image preprocessing, depth feature extraction and scene depth map generation, depth-map-based point cloud data generation, matching and fusion of RGB images with point cloud data, and 3D object surface generation. Since video stream acquisition, the subsequent matching and fusion of RGB images with point cloud data, and 3D object surface generation are relatively mature, this application optimizes the method of generating point cloud data from the scene, greatly reducing the requirements on equipment and computing power.
Please refer to FIG. 1A, which is a schematic flowchart of an embodiment of an image processing method provided by an embodiment of the present application. The image processing method described in this embodiment includes the following steps:
101. Obtain a video stream of a specified area through a single camera.
In the embodiments of the present application, the electronic device may include a single camera, and the single camera may be a visible light camera. The specified area can be set by the user or defaulted by the system. In a specific implementation, the electronic device can shoot the specified area through the single camera at a preset time interval to obtain a video stream; the preset time interval can be set by the user or defaulted by the system.
102. Sample the video stream to obtain multiple video images.
In a specific implementation, the electronic device can capture the video stream collected by the camera after the camera is turned on and perform frame extraction on the acquired video stream, that is, sample the video stream according to a preset sampling frequency to obtain multiple video images; the preset sampling frequency can be set by the user or defaulted by the system.
103. Preprocess the multiple video images to obtain the preprocessed multiple video images.
The above preprocessing may include at least one of the following: scaling processing, noise reduction processing, image enhancement processing, and so on, which is not limited here. Specifically, the preprocessing may scale the image: each extracted frame is scaled or expanded to a height of 224 pixels and a width of 320 pixels before being fed into the feature extraction network for feature extraction.
104. Perform depth feature extraction according to the preprocessed multiple video images to obtain a feature set.
The electronic device can perform depth feature extraction on the preprocessed multiple video images. Specifically, the preprocessed multiple video images can be input into a preset convolutional neural network for depth feature extraction to obtain a feature set.
Optionally, in the above step 104, performing depth feature extraction according to the multiple video images to obtain a feature set may include the following steps:
41. Perform image quality evaluation on each video image in the preprocessed multiple video images to obtain multiple image quality evaluation values;
42. Select a maximum value from the multiple image quality evaluation values, and input the preprocessed video image corresponding to the maximum value into a preset convolutional neural network to obtain a feature set.
In the embodiments of the present application, the preset convolutional neural network may include operations such as convolution, pooling, and normalization; the purpose of these operations is to extract image features, remove redundant image information, and speed up the network. The extracted features include the outline, texture, and surface information of each object in the image, the edge information where objects meet, and the position information of each object in the entire scene; finally, a feature image containing the information of the entire image is generated. In a specific implementation, image quality evaluation can be performed on each of the preprocessed video images to obtain multiple image quality evaluation values; the maximum value among them can then be selected, and the preprocessed video image corresponding to the maximum value is input into the preset convolutional neural network to obtain the feature set.
Optionally, in the above step 41, image quality evaluation is performed on each video image in the preprocessed multiple video images to obtain multiple image quality evaluation values, which can be implemented as follows:
At least one image quality evaluation index may be used to perform image quality evaluation on each video image in the preprocessed multiple video images to obtain multiple image quality evaluation values.
The image quality evaluation indexes may include, but are not limited to: average grayscale, mean square deviation, entropy, edge retention, signal-to-noise ratio, and so on. It can be defined that the larger the obtained image quality evaluation value, the better the image quality.
It should be noted that evaluating image quality with a single evaluation index has certain limitations, so multiple image quality evaluation indexes can be used. Of course, more indexes are not always better: the more indexes there are, the higher the computational complexity of the evaluation, and the evaluation effect is not necessarily better. Therefore, when higher evaluation accuracy is required, 2 to 10 image quality evaluation indexes can be used. The number of indexes and which indexes to select depend on the specific implementation, and the indexes must also be chosen for the specific scene: the indexes selected for evaluation in a dark environment and in a bright environment may differ.
Optionally, when the required evaluation accuracy is low, a single image quality evaluation index can be used for the evaluation. For example, when entropy is the index, a larger entropy indicates better face image quality, and conversely, a smaller entropy indicates worse face image quality.
Optionally, when higher evaluation accuracy is required, multiple image quality evaluation indexes can be used to evaluate the image, with a weight set for each index, so that multiple per-index evaluation values are obtained and the final image quality evaluation value is derived from these values and their corresponding weights. For example, suppose the three indexes are A, B, and C, with weights a1, a2, and a3; when A, B, and C are used to evaluate an image, the evaluation value corresponding to A is b1, that corresponding to B is b2, and that corresponding to C is b3, so the final image quality evaluation value = a1*b1 + a2*b2 + a3*b3. In general, the larger the image quality evaluation value, the better the face image quality.
Optionally, the preset convolutional neural network includes N downsampling layers, N upsampling layers, and a convolutional layer, where N is an integer greater than 1. Step 42 above, inputting the preprocessed video image corresponding to the maximum value into the preset convolutional neural network to obtain a feature set, may include the following steps:
421. Downsample the preprocessed video image corresponding to the maximum value N times through the N downsampling layers to obtain a downsampled video image, where at least one of the N downsamplings includes at least one of the following operations: a convolution operation, a pooling operation, and a normalization operation;
422. Upsample the downsampled video image N times through the N upsampling layers to obtain an upsampled video image;
423. Perform a convolution operation on the upsampled video image through the convolutional layer to obtain the feature set.
In this embodiment of the present application, the preset convolutional neural network may include N downsampling layers, N upsampling layers, and a convolutional layer, where N is an integer greater than 1. The preset convolutional neural network can be understood as an encoding-decoding network: the N downsampling layers constitute the encoding process, and the N upsampling layers together with the convolutional layer constitute the decoding process.
As shown in FIG. 1B, the encoding process (inside the dashed frame on the left) performs feature extraction, obtaining the feature image through four downsamplings. Downsampling includes operations such as convolution, pooling, and normalization. The number of downsamplings was determined experimentally, balancing the speed and accuracy of the algorithm: in theory, more downsamplings improve accuracy but reduce overall speed, so four are used to balance the two. Downsampling also reduces the image size: each downsampling halves the height and width of the image, so a 224*320 input shrinks to 14*20 after four downsamplings. The decoding (upsampling) network on the right is therefore needed to restore the image to its original size, and it also completes the matching from the extracted feature image to the depth image. The number of upsamplings is likewise four, matching the number of downsamplings and balancing accuracy and speed.
In addition, the straight lines connecting the downsampling and upsampling stages represent skip connections, which improve the accuracy of the algorithm. A sketch of one possible architecture follows below.
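As one possible reading of this architecture, here is a minimal PyTorch sketch of a four-stage encoder-decoder with skip connections. The channel widths, kernel sizes, and the use of max pooling and bilinear upsampling are assumptions for illustration, not details fixed by this application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    # Convolution + normalization, two of the operations named in step 421.
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class EncoderDecoder(nn.Module):
    def __init__(self, widths=(32, 64, 128, 256)):
        super().__init__()
        self.enc = nn.ModuleList()
        cin = 3
        for w in widths:                               # four downsampling stages
            self.enc.append(conv_block(cin, w))
            cin = w
        self.dec = nn.ModuleList()
        for w in reversed(widths):                     # four upsampling stages
            self.dec.append(conv_block(cin + w, w))    # +w for the skip connection
            cin = w
        self.head = nn.Conv2d(widths[0], 1, kernel_size=3, padding=1)

    def forward(self, x):
        skips = []
        for stage in self.enc:
            x = stage(x)
            skips.append(x)                            # saved for the skip connection
            x = F.max_pool2d(x, 2)                     # halves height and width
        for stage, skip in zip(self.dec, reversed(skips)):
            x = F.interpolate(x, size=skip.shape[-2:],
                              mode="bilinear", align_corners=False)
            x = stage(torch.cat([x, skip], dim=1))
        return self.head(x)                            # one-channel output map

# A 224*320 input comes back at full resolution after 4 down/up samplings.
net = EncoderDecoder()
out = net(torch.randn(1, 3, 224, 320))                 # torch.Size([1, 1, 224, 320])
```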
Optionally, step 104 above, performing depth feature extraction according to the preprocessed multiple video images to obtain a feature set, may be implemented as follows:
inputting the multiple video images into the preset convolutional neural network to obtain the feature set.
In this embodiment of the present application, the preset convolutional neural network may include operations such as convolution, pooling, and normalization. The purpose of these operations is to extract image features, remove redundant image information, and speed up the network. The extracted features include the contour, texture, and surface information of each object in the image, the edge information where objects meet, and the position of each object within the overall scene. A feature image containing the information of the whole image is finally generated.
Optionally, in the case where each of the multiple video images includes a human face,
step 41 above, performing image quality evaluation on each of the preprocessed multiple video images to obtain multiple image quality evaluation values, includes:
411. Perform image segmentation on a video image i to obtain a target face image, where the video image i is any frame among the preprocessed multiple video images;
412. Acquire the target face image, and acquire a two-dimensional angle value of the target face image, the two-dimensional angle value including an x angle value and a y angle value;
413. Acquire two weights corresponding to the two-dimensional angle value, where the x angle value corresponds to a target first weight, the y angle value corresponds to a target second weight, and the sum of the target first weight and the target second weight is 1;
414. Perform a weighted operation according to the x angle value, the y angle value, the target first weight, and the target second weight to obtain a target angle value;
415. Determine, according to a preset mapping relationship between angle values and angle quality evaluation values, the image quality evaluation value corresponding to the target angle value.
In this embodiment of the present application, the electronic device can perform image segmentation on any video image to obtain a face image. There is a certain angle between the face image and the camera; since the image is planar, it corresponds to a two-dimensional spatial coordinate system with an x angle value in the x direction and a y angle value in the y direction, which together accurately describe the angular relationship between the camera and the face image. Different angles affect recognition accuracy to a certain extent; for example, the face angle directly affects the number or quality of feature points. The two-dimensional angle value can be understood as the two-dimensional included angle between the face and the camera. Each of the two angle values may correspond to a weight, and the two weights may be preset or set by system default: the x angle value corresponds to the target first weight, the y angle value corresponds to the target second weight, and target first weight + target second weight = 1.
Further, target angle value = x angle value * target first weight + y angle value * target second weight. In this way, the two-dimensional angle value is converted into a one-dimensional angle value that accurately represents the angle of the face.
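A hedged Python sketch of steps 413 to 415 follows. The weight values and the angle-to-score table are placeholders for the preset mapping, whose contents this application leaves to the implementation.

```python
def angle_quality(x_angle, y_angle, w1, w2, angle_to_score):
    # Step 413: the two weights must sum to 1.
    assert abs(w1 + w2 - 1.0) < 1e-6
    # Step 414: collapse the 2-D angle into one target angle value.
    target = w1 * x_angle + w2 * y_angle
    # Step 415: look up the preset angle-to-quality mapping; a sorted
    # threshold table stands in for the stored mapping here.
    for upper_bound, score in angle_to_score:
        if target <= upper_bound:
            return score
    return 0.0

# A smaller deviation from a frontal pose yields a higher evaluation value.
score = angle_quality(10.0, 20.0, w1=0.6, w2=0.4,
                      angle_to_score=[(15.0, 0.9), (30.0, 0.7), (60.0, 0.4)])
```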
Optionally, step 413 above, acquiring the two weights corresponding to the two-dimensional angle value, may include the following steps:
21. Acquire a target ambient brightness value;
22. Determine, according to a preset mapping relationship between ambient brightness values and mapping relationships, a target mapping relationship corresponding to the target ambient brightness value, where each mapping relationship includes a first mapping relationship between angle values in the x direction and first weights;
23. Determine the target first weight corresponding to the x angle value according to the target mapping relationship;
24. Determine the target second weight according to the target first weight.
In a specific implementation, the target ambient brightness value may be acquired through an ambient light sensor, and the preset mapping relationship between ambient brightness values and mapping relationships may be stored in advance, each mapping relationship including a first mapping relationship between angle values in the x direction and first weights. The target mapping relationship corresponding to the target ambient brightness value can then be determined according to the preset mapping relationship between ambient brightness values and mapping relationships; the target first weight corresponding to the x angle value is determined according to the target mapping relationship, and target second weight = 1 - target first weight. Since the range of face angles that can be recognized differs under different ambient light, determining light-dependent weights in this way helps evaluate the face accurately; and because different ambient light corresponds to different evaluation rules, the face angle can be evaluated precisely.
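The brightness-dependent selection of the first weight might look like the following sketch. The lux bands and weight values are invented for illustration, since the application only requires that such mappings be preset.

```python
# Each entry: (maximum ambient brightness, {x-angle upper bound: first weight}).
# All numbers below are hypothetical.
BRIGHTNESS_TO_MAPPING = [
    (50.0,   {30.0: 0.7, 90.0: 0.6}),   # dark environment
    (1000.0, {30.0: 0.6, 90.0: 0.5}),   # ordinary indoor lighting
]

def weights_for(ambient_brightness, x_angle):
    # Step 22: pick the mapping that matches the measured brightness.
    mapping = BRIGHTNESS_TO_MAPPING[-1][1]
    for max_lux, candidate in BRIGHTNESS_TO_MAPPING:
        if ambient_brightness <= max_lux:
            mapping = candidate
            break
    # Step 23: first weight from the x-angle band.
    w1 = 0.5
    for upper_bound, weight in sorted(mapping.items()):
        if x_angle <= upper_bound:
            w1 = weight
            break
    # Step 24: the second weight follows from the first.
    return w1, 1.0 - w1
```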
104. Determine, according to the preset mapping relationship between angle values and angle quality evaluation values, a first target evaluation value corresponding to the target angle value.
The face evaluation device may store in advance the preset mapping relationship between angle values and angle quality evaluation values, and then determine the first target evaluation value corresponding to the target angle value according to this mapping relationship. Further, if the first target evaluation value is greater than a preset evaluation threshold, it can be understood that the face image is easy to recognize and will most likely be recognized successfully. A face at such an angle can be used for face unlocking, or can be used for camera capture, which improves the face capture efficiency of the face evaluation device.
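The threshold check described above reduces to a one-line predicate; the threshold value itself is an illustrative assumption.

```python
PRESET_EVALUATION_THRESHOLD = 0.8  # hypothetical value

def usable_face(first_target_evaluation_value):
    # A face whose evaluation value exceeds the threshold is treated as easy
    # to recognize, and so eligible for unlocking or capture.
    return first_target_evaluation_value > PRESET_EVALUATION_THRESHOLD
```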
105. Generate a depth map according to the feature set.
The feature set mentioned above is also called a feature map. The feature map is not yet the final depth image, so the decoding network is necessary. In a depth image, the value of each point is not the pixel value of an ordinary image but the distance of that point from the camera, in millimeters. FIG. 1C and FIG. 1D give an example of an RGB image and its depth map: FIG. 1C shows one frame of video image, and FIG. 1D shows the corresponding depth map rendered as a grayscale image, obtained by processing the distance values in the depth map. The farther a point is from the lens, the lower its gray value and the closer its color appears to black; conversely, the nearer a point is to the lens, the higher its gray value and the closer its color appears to white.
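The grayscale rendering described for FIG. 1D can be sketched as below; the clipping range is an assumption, since the text only specifies that nearer points appear brighter.

```python
import numpy as np

def depth_to_gray(depth_mm, near=500.0, far=5000.0):
    # Values are distances in millimeters, not pixel intensities.
    d = np.clip(depth_mm.astype(np.float32), near, far)
    # Invert so points near the lens get high gray values (toward white) and
    # points far from the lens get low gray values (toward black).
    return ((far - d) / (far - near) * 255.0).astype(np.uint8)
```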
Optionally, the feature set includes multiple feature points, each feature point including a coordinate position, a feature direction, and a feature size. Step 105 above, generating a depth map according to the feature set, may include the following steps:
51. Calculate a feature value according to the feature direction and feature size of each feature point in the feature set to obtain multiple target feature values, each feature point corresponding to one feature value;
52. Determine, according to a preset mapping relationship between feature values and depth values, a target depth value corresponding to each of the multiple target feature values to obtain multiple target depth values, each target depth value corresponding to one coordinate position;
53. Construct the depth map from the multiple target depth values.
The feature set may include multiple feature points, each including a coordinate position, a feature size, and a feature direction. Since a feature point is a vector, its feature value can be calculated from its feature size and feature direction; in this way, the feature value corresponding to each feature point in the feature set can be calculated, yielding multiple target feature values, one per feature point. The electronic device may also store in advance the preset mapping relationship between feature values and depth values, and then determine, according to this mapping relationship, the target depth value corresponding to each of the multiple target feature values, obtaining multiple target depth values, each corresponding to one coordinate position. The depth map is constructed from these target depth values; in this way, the depth map is built from the feature points.
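A minimal sketch of steps 51 to 53 follows. The scalarization of (direction, size) into a feature value and the value-to-depth table are illustrative assumptions; the application only specifies that such a preset mapping exists.

```python
import math

def feature_value(direction_rad, magnitude):
    # Step 51: one plausible way to turn a vector-valued feature point
    # (direction, size) into a scalar; the exact formula is an assumption.
    return magnitude * math.cos(direction_rad)

def build_depth_map(feature_points, value_to_depth, shape):
    depth = [[0.0] * shape[1] for _ in range(shape[0])]
    for (row, col), direction, magnitude in feature_points:
        v = feature_value(direction, magnitude)
        # Step 52: the preset value-to-depth mapping, approximated here by
        # nearest-neighbour lookup in a small table of (value, depth) pairs.
        _, d = min(value_to_depth, key=lambda pair: abs(pair[0] - v))
        depth[row][col] = d   # step 53: place the depth at its coordinate
    return depth

table = [(0.0, 4000.0), (0.5, 2000.0), (1.0, 800.0)]   # hypothetical mapping
dmap = build_depth_map([((0, 1), 0.0, 0.9)], table, shape=(2, 3))
```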
106. Process the depth map according to point cloud data processing technology to obtain a 3D image.
Each point in the depth map records the distance from the corresponding point in the original image to the camera. Point cloud generation is essentially a mapping of points between coordinate systems, namely the process of mapping any coordinate m(u, v) in the two-dimensional image to a spatial coordinate M(Xw, Yw, Zw) in the three-dimensional world. As shown in FIG. 1E, the resulting coordinate conversion formula is:
Xw = (u - u0) * dx * Zc / f
Yw = (v - v0) * dy * Zc / f
Zw = Zc
where M(Xw, Yw, Zw) is the world coordinate, m(u, v) is the depth map coordinate, and Zc is the value of each point in the depth map, that is, the distance of that point from the camera. u0 and v0 are the center coordinates of the two-dimensional image. dx and dy convert the distance unit into meters (1000 if the distance value is in millimeters). f is the focal length of the camera lens. Through this calculation, the two-dimensional depth map is converted into a three-dimensional map, namely the point cloud. Finally, point cloud data processing technology can be combined with the original RGB image to achieve three-dimensional reconstruction.
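Under this reading of the formula, the depth-to-point-cloud conversion can be sketched as follows. The intrinsic parameters f, u0, v0, dx, and dy are placeholders to be taken from the camera's calibration, and the unit handling is an assumption.

```python
import numpy as np

def depth_to_point_cloud(depth_mm, f, u0, v0, dx=1.0, dy=1.0):
    # depth_mm: H x W array; each value is Zc, the distance from the camera.
    v, u = np.indices(depth_mm.shape)
    Zc = depth_mm.astype(np.float32)
    Xw = (u - u0) * dx * Zc / f   # back-projection along the image x axis
    Yw = (v - v0) * dy * Zc / f   # back-projection along the image y axis
    # Stack into M(Xw, Yw, Zw) per pixel, with Zw = Zc.
    return np.stack([Xw, Yw, Zc], axis=-1)
```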
It can be seen that, with the image processing method described in this embodiment of the present application, a video stream of a specified area is acquired through a single camera; the video stream is sampled to obtain multiple video images; the multiple video images are preprocessed; depth feature extraction is performed according to the preprocessed video images to obtain a feature set; a depth map is generated according to the feature set; and the depth map is processed according to point cloud data processing technology to obtain a 3D image. In this way, video images can be collected by a single camera and, after sampling, preprocessing, and feature extraction, a feature set is obtained, converted into a depth map, and turned into a 3D scene map through point cloud data processing technology, thereby reducing the cost of three-dimensional reconstruction.
Consistent with the above, please refer to FIG. 2, which is a schematic flowchart of an embodiment of an image processing method according to an embodiment of the present application. The image processing method described in this embodiment includes the following steps:
201. Acquire a video stream of a specified area through a single camera.
202. Sample the video stream to obtain multiple video images.
203. Preprocess the multiple video images to obtain the preprocessed multiple video images.
204. Perform image quality evaluation on each of the preprocessed multiple video images to obtain multiple image quality evaluation values.
205. Select a maximum value from the multiple image quality evaluation values, and input the preprocessed video image corresponding to the maximum value into a preset convolutional neural network to obtain a feature set.
206. Generate a depth map according to the feature set.
207. Process the depth map according to point cloud data processing technology to obtain a 3D image.
For the image processing method described in steps 201 to 207 above, reference may be made to the corresponding steps of the image processing method described with respect to FIG. 1A.
It can be seen that, with the image processing method described in this embodiment of the present application, a video stream of a specified area is acquired through a single camera; the video stream is sampled to obtain multiple video images; the multiple video images are preprocessed; image quality evaluation is performed on each preprocessed video image to obtain multiple image quality evaluation values; the maximum value is selected from the multiple image quality evaluation values, and the preprocessed video image corresponding to the maximum value is input into a preset convolutional neural network to obtain a feature set; a depth map is generated according to the feature set; and the depth map is processed according to point cloud data processing technology to obtain a 3D image. In this way, video images can be collected by a single camera and, after sampling, preprocessing, and feature extraction, a feature set is obtained, converted into a depth map, and turned into a 3D scene map through point cloud data processing technology, thereby reducing the cost of three-dimensional reconstruction.
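Putting steps 201 to 207 together, the control flow of this embodiment can be sketched as below. The stage implementations are passed in as callables because their details are described elsewhere in this application; `frames` is assumed to already hold the video images sampled from the single-camera stream (steps 201 and 202), and all names here are illustrative.

```python
def reconstruct_3d(frames, preprocess, quality, extract, to_depth, to_cloud):
    frames = [preprocess(f) for f in frames]       # step 203
    scores = [quality(f) for f in frames]          # step 204
    best = frames[scores.index(max(scores))]       # step 205: highest score
    features = extract(best)                       # step 205: preset CNN
    depth = to_depth(features)                     # step 206
    return to_cloud(depth)                         # step 207: 3D image
```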
Consistent with the above, the following is an apparatus for implementing the above image processing method, specifically as follows:
Please refer to FIG. 3, which is a schematic structural diagram of an embodiment of an image processing apparatus according to an embodiment of the present application. The image processing apparatus described in this embodiment includes an acquisition unit 301, a sampling unit 302, a preprocessing unit 303, an extraction unit 304, a generation unit 305, and a processing unit 306, specifically as follows:
an acquisition unit 301, configured to acquire a video stream of a specified area through a single camera;
a sampling unit 302, configured to sample the video stream to obtain multiple video images;
a preprocessing unit 303, configured to preprocess the multiple video images to obtain the preprocessed multiple video images;
an extraction unit 304, configured to perform depth feature extraction according to the preprocessed multiple video images to obtain a feature set;
a generation unit 305, configured to generate a depth map according to the feature set;
a processing unit 306, configured to process the depth map according to point cloud data processing technology to obtain a 3D image.
It can be seen that, with the image processing apparatus described in this embodiment of the present application, a video stream of a specified area is acquired through a single camera; the video stream is sampled to obtain multiple video images; the multiple video images are preprocessed; depth feature extraction is performed according to the preprocessed video images to obtain a feature set; a depth map is generated according to the feature set; and the depth map is processed according to point cloud data processing technology to obtain a 3D image. In this way, video images can be collected by a single camera and, after sampling, preprocessing, and feature extraction, a feature set is obtained, converted into a depth map, and turned into a 3D scene map through point cloud data processing technology, thereby reducing the cost of three-dimensional reconstruction.
The acquisition unit 301 may be used to implement the method described in step 101 above, the sampling unit 302 the method described in step 102, the preprocessing unit 303 the method described in step 103, the extraction unit 304 the method described in step 104, the generation unit 305 the method described in step 105, and the processing unit 306 the method described in step 106, and so on below.
Optionally, in terms of performing depth feature extraction according to the multiple video images to obtain a feature set, the extraction unit 304 is specifically configured to:
perform image quality evaluation on each of the preprocessed multiple video images to obtain multiple image quality evaluation values; and
select a maximum value from the multiple image quality evaluation values, and input the preprocessed video image corresponding to the maximum value into a preset convolutional neural network to obtain a feature set.
Optionally, the preset convolutional neural network includes N downsampling layers, N upsampling layers, and a convolutional layer, where N is an integer greater than 1;
in terms of inputting the preprocessed video image corresponding to the maximum value into the preset convolutional neural network to obtain a feature set, the extraction unit 304 is specifically configured to:
downsample the preprocessed video image corresponding to the maximum value N times through the N downsampling layers to obtain a downsampled video image, where at least one of the N downsamplings includes at least one of the following operations: a convolution operation, a pooling operation, and a normalization operation;
upsample the downsampled video image N times through the N upsampling layers to obtain an upsampled video image; and
perform a convolution operation on the upsampled video image through the convolutional layer to obtain the feature set.
Optionally, in the case where each of the multiple video images includes a human face,
in terms of performing image quality evaluation on each of the preprocessed multiple video images to obtain multiple image quality evaluation values, the extraction unit 304 is specifically configured to:
perform image segmentation on a video image i to obtain a target face image, where the video image i is any frame among the preprocessed multiple video images;
acquire the target face image, and acquire a two-dimensional angle value of the target face image, the two-dimensional angle value including an x angle value and a y angle value;
acquire two weights corresponding to the two-dimensional angle value, where the x angle value corresponds to a target first weight, the y angle value corresponds to a target second weight, and the sum of the target first weight and the target second weight is 1;
perform a weighted operation according to the x angle value, the y angle value, the target first weight, and the target second weight to obtain a target angle value; and
determine, according to a preset mapping relationship between angle values and angle quality evaluation values, the image quality evaluation value corresponding to the target angle value.
Optionally, the feature set includes multiple feature points, each feature point including a coordinate position, a feature direction, and a feature size;
in terms of generating a depth map according to the feature set, the generation unit 305 is specifically configured to:
calculate a feature value according to the feature direction and feature size of each feature point in the feature set to obtain multiple target feature values, each feature point corresponding to one feature value;
determine, according to a preset mapping relationship between feature values and depth values, a target depth value corresponding to each of the multiple target feature values to obtain multiple target depth values, each target depth value corresponding to one coordinate position; and
construct the depth map from the multiple target depth values.
It can be understood that the functions of the program modules of the image processing apparatus of this embodiment may be specifically implemented according to the methods in the foregoing method embodiments; for the specific implementation process, reference may be made to the relevant descriptions of the foregoing method embodiments, which are not repeated here.
Consistent with the above, please refer to FIG. 4, which is a schematic structural diagram of an embodiment of an electronic device according to an embodiment of the present application. The electronic device described in this embodiment includes: at least one input device 1000; at least one output device 2000; at least one processor 3000, for example a CPU; and a memory 4000. The input device 1000, the output device 2000, the processor 3000, and the memory 4000 are connected through a bus 5000.
The input device 1000 may specifically be a touch panel, a physical button, or a mouse.
The output device 2000 may specifically be a display screen.
The memory 4000 may be a high-speed RAM memory or a non-volatile memory, for example a magnetic disk memory. The memory 4000 is configured to store a set of program codes, and the input device 1000, the output device 2000, and the processor 3000 are configured to call the program codes stored in the memory 4000 to perform the following operations:
The processor 3000 is configured to:
acquire a video stream of a specified area through a single camera;
sample the video stream to obtain multiple video images;
preprocess the multiple video images to obtain the preprocessed multiple video images;
perform depth feature extraction according to the preprocessed multiple video images to obtain a feature set;
generate a depth map according to the feature set; and
process the depth map according to point cloud data processing technology to obtain a 3D image.
It can be seen that, with the electronic device described in this embodiment of the present application, a video stream of a specified area is acquired through a single camera; the video stream is sampled to obtain multiple video images; the multiple video images are preprocessed; depth feature extraction is performed according to the preprocessed video images to obtain a feature set; a depth map is generated according to the feature set; and the depth map is processed according to point cloud data processing technology to obtain a 3D image. In this way, video images can be collected by a single camera and, after sampling, preprocessing, and feature extraction, a feature set is obtained, converted into a depth map, and turned into a 3D scene map through point cloud data processing technology, thereby reducing the cost of three-dimensional reconstruction.
Optionally, in terms of performing depth feature extraction according to the multiple video images to obtain a feature set, the processor 3000 is specifically configured to:
perform image quality evaluation on each of the preprocessed multiple video images to obtain multiple image quality evaluation values; and
select a maximum value from the multiple image quality evaluation values, and input the preprocessed video image corresponding to the maximum value into a preset convolutional neural network to obtain a feature set.
Optionally, the preset convolutional neural network includes N downsampling layers, N upsampling layers, and a convolutional layer, where N is an integer greater than 1;
in terms of inputting the preprocessed video image corresponding to the maximum value into the preset convolutional neural network to obtain a feature set, the processor 3000 is specifically configured to:
downsample the preprocessed video image corresponding to the maximum value N times through the N downsampling layers to obtain a downsampled video image, where at least one of the N downsamplings includes at least one of the following operations: a convolution operation, a pooling operation, and a normalization operation;
upsample the downsampled video image N times through the N upsampling layers to obtain an upsampled video image; and
perform a convolution operation on the upsampled video image through the convolutional layer to obtain the feature set.
Optionally, in the case where each of the multiple video images includes a human face,
in terms of performing image quality evaluation on each of the preprocessed multiple video images to obtain multiple image quality evaluation values, the processor 3000 is specifically configured to: perform image segmentation on a video image i to obtain a target face image, where the video image i is any frame among the preprocessed multiple video images; acquire the target face image, and acquire a two-dimensional angle value of the target face image, the two-dimensional angle value including an x angle value and a y angle value; acquire two weights corresponding to the two-dimensional angle value, where the x angle value corresponds to a target first weight, the y angle value corresponds to a target second weight, and the sum of the target first weight and the target second weight is 1; perform a weighted operation according to the x angle value, the y angle value, the target first weight, and the target second weight to obtain a target angle value; and determine, according to a preset mapping relationship between angle values and angle quality evaluation values, the image quality evaluation value corresponding to the target angle value.
An embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a program, and when executed the program includes some or all of the steps of any image processing method described in the foregoing method embodiments.

Claims (10)

  1. An image processing method, comprising:
    acquiring a video stream of a specified area through a single camera;
    sampling the video stream to obtain multiple video images;
    preprocessing the multiple video images to obtain the preprocessed multiple video images;
    performing depth feature extraction according to the preprocessed multiple video images to obtain a feature set;
    generating a depth map according to the feature set; and
    processing the depth map according to point cloud data processing technology to obtain a 3D image.
  2. The method according to claim 1, wherein the performing depth feature extraction according to the preprocessed multiple video images to obtain a feature set comprises:
    performing image quality evaluation on each of the preprocessed multiple video images to obtain multiple image quality evaluation values; and
    selecting a maximum value from the multiple image quality evaluation values, and inputting the preprocessed video image corresponding to the maximum value into a preset convolutional neural network to obtain a feature set.
  3. The method according to claim 2, wherein the preset convolutional neural network comprises N downsampling layers, N upsampling layers, and a convolutional layer, N being an integer greater than 1;
    the inputting the preprocessed video image corresponding to the maximum value into a preset convolutional neural network to obtain a feature set comprises:
    downsampling the preprocessed video image corresponding to the maximum value N times through the N downsampling layers to obtain a downsampled video image, wherein at least one of the N downsamplings comprises at least one of the following operations: a convolution operation, a pooling operation, and a normalization operation;
    upsampling the downsampled video image N times through the N upsampling layers to obtain an upsampled video image; and
    performing a convolution operation on the upsampled video image through the convolutional layer to obtain the feature set.
  4. The method according to claim 2, wherein, in a case where each of the multiple video images includes a human face,
    the performing image quality evaluation on each of the preprocessed multiple video images to obtain multiple image quality evaluation values comprises:
    performing image segmentation on a video image i to obtain a target face image, wherein the video image i is any frame among the preprocessed multiple video images;
    acquiring the target face image, and acquiring a two-dimensional angle value of the target face image, the two-dimensional angle value comprising an x angle value and a y angle value;
    acquiring two weights corresponding to the two-dimensional angle value, wherein the x angle value corresponds to a target first weight, the y angle value corresponds to a target second weight, and a sum of the target first weight and the target second weight is 1;
    performing a weighted operation according to the x angle value, the y angle value, the target first weight, and the target second weight to obtain a target angle value; and
    determining, according to a preset mapping relationship between angle values and angle quality evaluation values, an image quality evaluation value corresponding to the target angle value.
  5. The method according to any one of claims 1 to 4, wherein the feature set comprises multiple feature points, each feature point comprising a coordinate position, a feature direction, and a feature size;
    the generating a depth map according to the feature set comprises:
    calculating a feature value according to the feature direction and feature size of each feature point in the feature set to obtain multiple target feature values, each feature point corresponding to one target feature value;
    determining, according to a preset mapping relationship between feature values and depth values, a target depth value corresponding to each of the multiple target feature values to obtain multiple target depth values, each target depth value corresponding to one coordinate position; and
    constructing the depth map from the multiple target depth values.
  6. An image processing apparatus, comprising:
    an acquisition unit, configured to acquire a video stream of a specified area through a single camera;
    a sampling unit, configured to sample the video stream to obtain multiple video images;
    a preprocessing unit, configured to preprocess the multiple video images to obtain the preprocessed multiple video images;
    an extraction unit, configured to perform depth feature extraction according to the preprocessed multiple video images to obtain a feature set;
    a generation unit, configured to generate a depth map according to the feature set; and
    a processing unit, configured to process the depth map according to point cloud data processing technology to obtain a 3D image.
  7. The apparatus according to claim 6, wherein, in terms of performing depth feature extraction according to the multiple video images to obtain a feature set, the extraction unit is specifically configured to:
    perform image quality evaluation on each of the preprocessed multiple video images to obtain multiple image quality evaluation values; and
    select a maximum value from the multiple image quality evaluation values, and input the preprocessed video image corresponding to the maximum value into a preset convolutional neural network to obtain a feature set.
  8. The apparatus according to claim 7, wherein the preset convolutional neural network comprises N downsampling layers, N upsampling layers, and a convolutional layer, N being an integer greater than 1;
    in terms of inputting the preprocessed video image corresponding to the maximum value into the preset convolutional neural network to obtain a feature set, the extraction unit is specifically configured to:
    downsample the preprocessed video image corresponding to the maximum value N times through the N downsampling layers to obtain a downsampled video image, wherein at least one of the N downsamplings comprises at least one of the following operations: a convolution operation, a pooling operation, and a normalization operation;
    upsample the downsampled video image N times through the N upsampling layers to obtain an upsampled video image; and
    perform a convolution operation on the upsampled video image through the convolutional layer to obtain the feature set.
  9. An electronic device, comprising a processor and a memory, wherein the memory is configured to store one or more programs configured to be executed by the processor, and the programs comprise instructions for performing the steps in the method according to any one of claims 1 to 5.
  10. A computer-readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the method according to any one of claims 1 to 5.
PCT/CN2019/121345 2018-12-29 2019-11-27 Image processing method and related product WO2020134818A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811643004.6 2018-12-29
CN201811643004.6A CN109754461A (en) 2018-12-29 2018-12-29 Image processing method and related product

Publications (1)

Publication Number Publication Date
WO2020134818A1 true WO2020134818A1 (en) 2020-07-02

Family

ID=66404534




Also Published As

Publication number Publication date
CN109754461A (en) 2019-05-14

