WO2019127102A1

WO2019127102A1 - Information processing method and apparatus, cloud processing device, and computer program product

Info

Publication number: WO2019127102A1
Application number: PCT/CN2017/119008
Authority: WO
Inventors: 王恺; 廉士国
Original assignee: 深圳前海达闼云端智能科技有限公司
Priority date: 2017-12-27
Filing date: 2017-12-27
Publication date: 2019-07-04
Also published as: CN108124489A; CN108124489B

Abstract

An information processing method and apparatus, a cloud processing device, and a computer program product, applied to the technical field of data processing. The information processing method comprises: obtaining RGBD data collected by an image collection device (101); extracting key frame data from the RGBD data and processing it to obtain geometric reconstruction data (102); mapping RGB data in the key frame data and the geometric reconstruction data to obtain three-dimensional reconstruction data, and performing semantic segmentation processing on the RGB data in the key frame data to obtain semantic segmentation data (103); and mapping the semantic segmentation data and the three-dimensional reconstruction data to obtain a three-dimensional semantic map (104). With this method, three-dimensional reconstruction and semantic segmentation can be performed at the same time, so that semantic information is obtained alongside the three-dimensional reconstruction from the RGBD data; this shortens the computation time and can also improve the precision of scene segmentation.

Description

Information processing method, device, cloud processing device, and computer program product

Technical field

The present application relates to the field of data processing technologies, and in particular, to an information processing method, apparatus, cloud processing device, and computer program product.

Background

Semantic map construction refers to the process by which a computer or other device, based on perceptual data, recognizes and understands its environment and analyzes that data comprehensively to obtain high-level semantic information it can use (such as object names and locations). Perceptual data can be acquired through key technologies such as radio frequency identification, auditory technology, and visual technology; at present, most research focuses on visual technology.

When a semantic map is generated, deep learning technology can be relied on. An image perceived by the computer in real time may contain multiple objects: the image is first segmented, and the objects in the segmented image are then identified by machine learning or similar means. This process involves a large number of image operations and takes a long time.

However, prior-art processing methods are mainly designed for two-dimensional data. When three-dimensional data is semantically segmented in this way, geometrically continuous segmentation results cannot be obtained; moreover, the number of samples is limited, the types of objects that can be segmented are limited, and processing takes a long time.

Summary of the invention

Embodiments of the present application provide an information processing method and apparatus, a cloud processing device, and a computer program product that can process three-dimensional data in real time and generate a three-dimensional semantic map, improving the accuracy of scene segmentation while shortening the processing time.

In a first aspect, an embodiment of the present application provides an information processing method, including:

Obtaining RGBD data collected by the image acquisition device;

Extracting key frame data in the RGBD data and processing to obtain geometric reconstruction data;

Mapping the RGB data in the key frame data and the geometric reconstruction data to obtain three-dimensional reconstruction data; and performing semantic segmentation processing on the RGB data in the key frame data to obtain semantic segmentation data;

Mapping the semantic segmentation data and the three-dimensional reconstruction data to obtain a three-dimensional semantic map.

In the aspect described above and any possible implementation manner thereof, an implementation manner is further provided in which mapping the semantic segmentation data and the three-dimensional reconstruction data to obtain a three-dimensional semantic map includes:

Determining RGB data corresponding to each point in the three-dimensional reconstruction data;

Determining, according to the first correspondence between the RGB data and the semantic segmentation data, semantic information corresponding to each point in the three-dimensional reconstruction data;

The semantic information of all points in the three-dimensional reconstruction data is integrated to obtain the three-dimensional semantic map.

In the aspect described above and any possible implementation manner thereof, another implementation manner is further provided in which mapping the semantic segmentation data and the three-dimensional reconstruction data to obtain a three-dimensional semantic map includes:

Determining RGB data corresponding to each face in the three-dimensional reconstruction data;

Determining, according to the second correspondence between the RGB data and the semantic segmentation data, semantic information corresponding to each face in the three-dimensional reconstruction data;

Determining the faces around each connection point in the three-dimensional reconstruction data;

Determining semantic information of each connection point according to semantic information corresponding to each face;

The semantic information of all the faces in the three-dimensional reconstruction data and the semantic information of all the connection points are integrated to obtain the three-dimensional semantic map.

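In the aspect described above and any possible implementation manner thereof, an implementation manner is further provided in which extracting and processing the key frame data in the RGBD data to obtain geometric reconstruction data includes: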

Calculating pose information of the image collection device according to key frame data in the RGBD data;

Performing reconstruction according to the pose information and the D (depth) data in the key frame data to obtain the geometric reconstruction data.

In a second aspect, the embodiment of the present application further provides an information processing apparatus, including:

An acquiring unit, configured to acquire RGBD data collected by the image acquisition device;

An extracting unit, configured to extract key frame data in the RGBD data and perform processing to obtain geometric reconstruction data;

a processing unit, configured to perform mapping processing on the RGB data in the key frame data and the geometric reconstruction data to obtain three-dimensional reconstruction data; and perform semantic segmentation on the RGB data in the key frame data to obtain semantic segmentation data;

And a mapping unit, configured to perform mapping processing on the semantic segmentation data and the three-dimensional reconstruction data to obtain a three-dimensional semantic map.

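The mapping unit is specifically configured to:

Determining RGB data corresponding to each face in the three-dimensional reconstruction data;

Determining, according to the second correspondence between the RGB data and the semantic segmentation data, semantic information corresponding to each face in the three-dimensional reconstruction data;

Determining the faces around each connection point in the three-dimensional reconstruction data;

Determining semantic information of each connection point according to the semantic information corresponding to each face;

Integrating the semantic information of all faces and all connection points in the three-dimensional reconstruction data to obtain the three-dimensional semantic map.

The extracting unit is specifically configured to:

Calculating pose information of the image acquisition device according to the key frame data in the RGBD data;

Performing reconstruction according to the pose information and the D data in the key frame data to obtain the geometric reconstruction data.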

In a third aspect, an embodiment of the present application further provides a cloud processing device, where the device includes a processor and a memory; the memory is configured to store instructions that, when executed by the processor, cause the device to perform the method of any implementation of the first aspect.

In a fourth aspect, an embodiment of the present application further provides a computer program product, which can be directly loaded into an internal memory of a computer and includes software code; after being loaded and executed by a computer, the computer program implements the method of any implementation of the first aspect.

The information processing method and apparatus, cloud processing device, and computer program product provided by the embodiments of the present application extract key frame data from the RGBD data and process it to obtain geometric reconstruction data, then perform three-dimensional reconstruction and semantic segmentation simultaneously. These two processes yield the three-dimensional reconstruction data and the semantic segmentation data respectively, which are finally mapped together to obtain a three-dimensional semantic map. Because three-dimensional reconstruction and semantic segmentation are performed simultaneously, the technical solution not only reconstructs the scene in three dimensions from the RGBD data but also obtains semantic information at the same time. This shortens computation time, improves the accuracy of scene segmentation, allows a three-dimensional semantic map to be generated in real time, and addresses the prior-art problem that, when three-dimensional data is semantically segmented, the types of objects that can be segmented are limited and processing takes a long time.

DRAWINGS

To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a flowchart of an embodiment of an information processing method according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of an embodiment of an information processing apparatus according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of an embodiment of a cloud processing device according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.

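The terms used in the embodiments of the present application are for the purpose of describing particular embodiments only and are not intended to limit the application. The singular forms "a", "an", and "the" used in the embodiments of the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.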

It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate three cases: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it.

Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".

To enhance the ability of computers and other devices to perceive and understand their surroundings, high-quality three-dimensional semantic maps are needed. A three-dimensional semantic map consists of two parts: a three-dimensional reconstruction model obtained by reconstructing the environment, and scene recognition information obtained by accurate semantic segmentation. In the prior art, semantic segmentation is mostly based on two-dimensional data processing; applying the same approach to three-dimensional data cannot yield geometrically continuous segmentation results, takes a long time, and is difficult to complete in real time. The embodiment of the present application provides an information processing method that performs semantic segmentation on the collected environmental information while reconstructing it in three dimensions, and generates a three-dimensional semantic map in real time. Specifically, FIG. 1 is a flowchart of an embodiment of the information processing method provided by the embodiment of the present application. As shown in FIG. 1, the information processing method may specifically include the following steps:

101. Acquire RGBD data collected by the image collection device.

In this embodiment, when a scene needs to be reconstructed in three dimensions and a three-dimensional semantic map obtained, the image acquisition device is first used to capture images of the scene. The image acquisition device needs to include an RGB camera and a depth (Depth) camera, and RGBD data is obtained once acquisition is complete. In a specific implementation, the computer that generates the three-dimensional semantic map may include a real-time mapping and positioning module for acquiring the RGBD data collected by the image acquisition device. Specifically, the real-time mapping and positioning module may actively fetch the RGBD data, or the image acquisition device may actively send the RGBD data to the module.
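The two acquisition modes described above (the module pulling frames, or the device pushing them) can be pictured with the minimal Python sketch below. It is illustrative only: `RGBDFrame`, `RealTimeMappingModule`, `pull`, and `on_frame` are hypothetical names, and the frame layout (nested lists of RGB tuples plus depths in metres) is an assumption, not something the application specifies.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional, Tuple


@dataclass
class RGBDFrame:
    """One RGBD sample: a color image plus a depth map of the same size (assumed layout)."""
    timestamp: float
    rgb: List[List[Tuple[int, int, int]]]  # rgb[v][u] = (r, g, b)
    depth: List[List[float]]               # depth[v][u] in metres, 0.0 = no reading


class RealTimeMappingModule:
    """Hypothetical stand-in for the real-time mapping and positioning module."""

    def __init__(self) -> None:
        self.frames: Deque[RGBDFrame] = deque()

    def pull(self, read_frame: Callable[[], Optional[RGBDFrame]]) -> int:
        """Pull mode: the module actively reads frames until the device returns None."""
        count = 0
        while (frame := read_frame()) is not None:
            self.frames.append(frame)
            count += 1
        return count

    def on_frame(self, frame: RGBDFrame) -> None:
        """Push mode: the device calls this whenever it has a new frame."""
        self.frames.append(frame)
```

In push mode a camera driver would be handed `module.on_frame` as its callback; in pull mode the module keeps calling a device read function until it returns `None`. Either way the frames end up in the same queue for the later steps.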

102. Extract key frame data in the RGBD data and process the data to obtain geometric reconstruction data.

In this embodiment, the geometric reconstruction data may be obtained as follows. First, the pose information of the image acquisition device is calculated according to the key frame data in the RGBD data: specifically, the RGBD data corresponding to the key frames is extracted from all of the RGBD data, and the pose of the image acquisition device is calculated from it. Then, reconstruction is performed according to the pose information and the D (depth) data in the key frame data to obtain the geometric reconstruction data.
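The application does not say how key frames are chosen. A common heuristic, assumed here purely for illustration, is to promote a frame to a key frame once the camera has translated or rotated more than a threshold since the previous key frame. The sketch assumes per-frame camera-to-world poses (4x4 row-major matrices) are already available from a tracking front end that is not shown; `select_key_frames` and its default thresholds are hypothetical.

```python
import math
from typing import List, Sequence

Matrix4 = Sequence[Sequence[float]]  # 4x4 camera-to-world pose, row-major


def _rotation_angle(a: Matrix4, b: Matrix4) -> float:
    """Angle (radians) of the relative rotation R_a^T R_b between two poses."""
    # trace(R_a^T R_b) equals the element-wise sum of R_a[i][j] * R_b[i][j].
    trace = sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.acos(cos_theta)


def _translation_distance(a: Matrix4, b: Matrix4) -> float:
    """Euclidean distance between the camera centres of two poses."""
    return math.dist([a[i][3] for i in range(3)], [b[i][3] for i in range(3)])


def select_key_frames(poses: List[Matrix4],
                      min_translation: float = 0.1,
                      min_rotation: float = math.radians(10)) -> List[int]:
    """Return indices of frames whose pose moved enough since the last key frame."""
    if not poses:
        return []
    keys = [0]  # the first frame always starts the map
    for idx in range(1, len(poses)):
        last = poses[keys[-1]]
        if (_translation_distance(last, poses[idx]) >= min_translation
                or _rotation_angle(last, poses[idx]) >= min_rotation):
            keys.append(idx)
    return keys
```

Thresholding on pose change keeps redundant, nearly identical views out of the reconstruction while still adding a key frame whenever the camera sees new parts of the scene.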

The geometric reconstruction data can be in one of two formats, a point cloud format or a mesh format, selected according to actual needs. For example, in one specific implementation, a fast fusion algorithm processes the pose information and the D data in the key frame data to reconstruct data in the point cloud format; in another specific implementation, the fast fusion algorithm processes the same inputs to reconstruct data in the mesh format.
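The "fast fusion algorithm" is not specified further, but any point-cloud reconstruction from pose information plus D data begins by back-projecting each depth pixel into space. A minimal sketch of that step alone, assuming a pinhole depth camera with intrinsics `fx`, `fy`, `cx`, `cy` and a 4x4 camera-to-world pose for the key frame (both assumptions; `backproject_depth` is a hypothetical name):

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_depth(depth: List[List[float]],
                      fx: float, fy: float, cx: float, cy: float,
                      pose: Sequence[Sequence[float]]) -> List[Point]:
    """Turn one key frame's depth map into world-frame 3D points.

    depth[v][u] is the depth in metres at pixel (u, v); 0 marks a missing reading.
    pose is the 4x4 camera-to-world matrix estimated for this key frame.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue  # no depth measured at this pixel
            # Pinhole back-projection into the camera frame.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Rigid transform into the world frame: p_w = R * p_c + t.
            world = tuple(
                sum(pose[i][j] * cam[j] for j in range(3)) + pose[i][3]
                for i in range(3)
            )
            points.append(world)
    return points
```

Running this for every key frame with its own pose places all depth measurements in one common world frame, which is what allows them to be fused into a single geometric reconstruction.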

In this embodiment, the reconstruction involves at least two key frames, so the data of all key frames needs to be used together for the reconstruction.
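Because all key frames are used together, their per-frame point clouds must be merged into one reconstruction. The sketch below substitutes a simple voxel-grid average for the unspecified fast fusion algorithm: points from any key frame that land in the same cube of side `voxel_size` are averaged into one point. This is an assumed stand-in for illustration, not the method of the application; `fuse_key_frames` is a hypothetical name.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]


def fuse_key_frames(clouds: Iterable[List[Point]], voxel_size: float = 0.01) -> List[Point]:
    """Merge per-key-frame point clouds by averaging points that share a voxel."""
    # voxel index -> running [sum_x, sum_y, sum_z, count]
    sums: Dict[Tuple[int, int, int], List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for cloud in clouds:
        for x, y, z in cloud:
            key = (math.floor(x / voxel_size),
                   math.floor(y / voxel_size),
                   math.floor(z / voxel_size))
            acc = sums[key]
            acc[0] += x
            acc[1] += y
            acc[2] += z
            acc[3] += 1.0
    # One output point per occupied voxel: the mean of the points inside it.
    return [(sx / n, sy / n, sz / n) for sx, sy, sz, n in sums.values()]
```

Averaging within a voxel collapses overlapping observations of the same surface from different key frames into one point and slightly smooths depth noise; `voxel_size` trades detail against point count.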

103. Perform mapping processing on the RGB data in the key frame data and the geometric reconstruction data to obtain three-dimensional reconstruction data; and perform semantic segmentation processing on the RGB data in the key frame data to obtain semantic segmentation data.

In this embodiment, obtaining the three-dimensional reconstruction data and obtaining the semantic segmentation data are both computationally heavy and occupy substantial computing resources, so the two processes are run in separate threads or computed in parallel.
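Running the two heavy processes concurrently can be sketched with Python's standard `concurrent.futures` module. The callables passed in are placeholders for whatever reconstruction and segmentation routines are used; `run_in_parallel` is a hypothetical helper, and the sketch assumes both tasks only read the shared key-frame data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Tuple


def run_in_parallel(reconstruct: Callable[[], Any],
                    segment: Callable[[], Any]) -> Tuple[Any, Any]:
    """Run 3D reconstruction and semantic segmentation concurrently.

    Assumes both callables only read the shared key-frame data, so no locking is used.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        recon_future = pool.submit(reconstruct)
        seg_future = pool.submit(segment)
        # result() blocks until each task finishes and re-raises its exception, if any.
        return recon_future.result(), seg_future.result()
```

In CPython, threads speed this up only when the heavy work releases the global interpreter lock, as many native extensions do; for pure-Python workloads a `ProcessPoolExecutor` would be the closer match to the parallel-computing option.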

Because the geometric reconstruction data can be in either of two formats, the process of generating the three-dimensional reconstruction data differs by format. When the geometric reconstruction data is in the point cloud format, the D data corresponding to each point in the point cloud is found first; then, according to the calibration result of the RGB camera and the depth camera, the RGB data corresponding to each point is found; finally, the value of that RGB data is assigned to the corresponding point. When the geometric reconstruction data is in the mesh format, the RGB data corresponding to the key frames is mapped onto the mesh as a texture according to an algorithm. In a specific implementation, the algorithm may include a nearest-sampling-point algorithm, a bilinear interpolation algorithm, a trilinear interpolation algorithm, and the like.
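Point coloring and mesh texture sampling both come down to projecting a 3D point into the RGB image and reading a color there. The sketch assumes a pinhole RGB camera (`fx`, `fy`, `cx`, `cy`), a 4x4 depth-to-RGB extrinsic matrix from the camera calibration, and points expressed in the depth-camera frame. It implements only the nearest-sampling-point and bilinear-interpolation options (trilinear interpolation is omitted), and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[float, float, float]
Image = List[List[RGB]]  # image[v][u] = (r, g, b)


def sample_nearest(image: Image, u: float, v: float) -> Optional[RGB]:
    """Nearest-sampling-point lookup; None if (u, v) falls outside the image."""
    col, row = int(round(u)), int(round(v))
    if 0 <= row < len(image) and 0 <= col < len(image[0]):
        return image[row][col]
    return None


def sample_bilinear(image: Image, u: float, v: float) -> Optional[RGB]:
    """Bilinear interpolation of the four pixels surrounding (u, v)."""
    u0, v0 = math.floor(u), math.floor(v)
    if not (0 <= v0 < len(image) - 1 and 0 <= u0 < len(image[0]) - 1):
        return sample_nearest(image, u, v)  # no full 2x2 neighborhood: fall back
    du, dv = u - u0, v - v0
    p00, p10 = image[v0][u0], image[v0][u0 + 1]
    p01, p11 = image[v0 + 1][u0], image[v0 + 1][u0 + 1]
    return tuple(
        (1 - du) * (1 - dv) * a + du * (1 - dv) * b + (1 - du) * dv * c + du * dv * d
        for a, b, c, d in zip(p00, p10, p01, p11)
    )


def color_point(point: Sequence[float], depth_to_rgb: Sequence[Sequence[float]],
                fx: float, fy: float, cx: float, cy: float,
                image: Image, bilinear: bool = True) -> Optional[RGB]:
    """Project a depth-camera-frame point into the RGB image and read its color."""
    # Extrinsic calibration: move the point into the RGB camera frame.
    x, y, z = (sum(depth_to_rgb[i][j] * point[j] for j in range(3)) + depth_to_rgb[i][3]
               for i in range(3))
    if z <= 0:
        return None  # point lies behind the RGB camera
    # Pinhole projection to pixel coordinates.
    u, v = fx * x / z + cx, fy * y / z + cy
    return sample_bilinear(image, u, v) if bilinear else sample_nearest(image, u, v)
```

Nearest sampling is cheapest but produces blocky colors; bilinear interpolation blends the four neighboring pixels for smoother textures at a small extra cost.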

The semantic segmentation data can be obtained with an existing semantic segmentation method chosen to suit the scenario.

104. Perform mapping processing on the semantic segmentation data and the three-dimensional reconstruction data to obtain a three-dimensional semantic map.

In this embodiment, because the geometric reconstruction data can be in different formats, this step is described separately for each format.

When the geometric reconstruction data is in the point cloud format: first, the RGB data corresponding to each point in the three-dimensional reconstruction data is determined; then, the semantic information corresponding to each point is determined according to the first correspondence between the RGB data and the semantic segmentation data; finally, the semantic information of all points in the three-dimensional reconstruction data is integrated to obtain the three-dimensional semantic map.

To explain the flow in more detail, the embodiment of the present application uses the following formulation. Assume that each point of the three-dimensional geometric reconstruction result is $V_P$ (where $P$ is the number of the point), and that the RGB value $V_C$ corresponding to each point can be obtained by looking up a table $\Omega$, which records the correspondence between point number $P$ and RGB values. The semantic information of each point is determined by the first correspondence function:

$F(V_P, V_C) = V_S$

where $V_S$ is the semantic information, $V_P$ is a point, and $V_C$ is its RGB value.
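The text describes table Ω as mapping each point number to an RGB value, but a color value alone cannot identify a label. The sketch below therefore assumes that Ω records which key-frame pixel(s) each point's color was sampled from, and reads the first correspondence as pixel alignment between each RGB key frame and the segmentation mask computed from it. When a point was seen in several key frames it takes a majority vote, which is an added assumption. `label_points` and the data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

PixelRef = Tuple[int, int, int]  # (key-frame index k, u, v)


def label_points(omega: Dict[int, List[PixelRef]],
                 masks: List[List[List[int]]]) -> Dict[int, int]:
    """Compute V_S = F(V_P, V_C) for each point number P.

    omega: hypothetical table, P -> key-frame pixels the point's color came from
    masks: masks[k][v][u] = class id predicted for pixel (u, v) of key frame k
    """
    semantic: Dict[int, int] = {}
    for p, refs in omega.items():
        # First correspondence (assumed): a mask is pixel-aligned with its RGB frame.
        votes = Counter(masks[k][v][u] for k, u, v in refs)
        if votes:  # points with no recorded pixel stay unlabeled
            semantic[p] = votes.most_common(1)[0][0]
    return semantic
```

Collecting the labels of all points gives the integrated per-point semantic map for the point cloud format.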

When the geometric reconstruction data is in the mesh format: first, the RGB data corresponding to each face in the three-dimensional reconstruction data is determined; then, the semantic information of each face is determined according to the second correspondence between the RGB data and the semantic segmentation data; next, the faces around each connection point in the three-dimensional reconstruction data are determined, and the semantic information of each connection point is determined from the semantic information of those faces; finally, the semantic information of all faces and all connection points in the three-dimensional reconstruction data is integrated to obtain the three-dimensional semantic map.

In order to explain the flow in more detail, a calculation formula is used in the embodiment of the present application.

A mesh consists of points and faces, and faces are connected through points. Assume that the three-dimensional geometric reconstruction result includes $n$ points, denoted $V_i$ ($i = 1, \dots, n$), and $m$ faces, denoted $F_j$ ($j = 1, \dots, m$), where $n$ and $m$ are positive integers. Let each face $F_j$ correspond to a region $F_c$ of the RGB data; the RGB value $F_c$ of each region can be obtained by looking up a table $\sigma$, which records the correspondence between face number $j$ and RGB values.

First, the semantic information of each face is determined by the second correspondence function:

$G(F_j, F_c) = F_s$

where $F_s$ is the semantic information, $F_j$ is a face, and $F_c$ is its RGB value.
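For the mesh case, the second correspondence $G$ can be sketched the same way. Assumptions, none of which is stated in the application: the table σ records, for each face number $j$, the key frame and the pixels of the RGB region $F_c$ that textures the face; each segmentation mask is pixel-aligned with its RGB key frame; and the face label is the majority class inside that region. `label_faces` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple

Region = Tuple[int, List[Tuple[int, int]]]  # (key-frame index, [(u, v), ...])


def label_faces(sigma: Dict[int, Region],
                masks: List[List[List[int]]]) -> Dict[int, int]:
    """Compute F_s = G(F_j, F_c) for each face j by majority vote (assumed rule).

    sigma: hypothetical table, face number j -> RGB region F_c textured onto the face
    masks: masks[k][v][u] = class id predicted for pixel (u, v) of key frame k
    """
    semantic: Dict[int, int] = {}
    for j, (k, region) in sigma.items():
        votes = Counter(masks[k][v][u] for u, v in region)
        if votes:  # faces with an empty region stay unlabeled
            semantic[j] = votes.most_common(1)[0][0]
    return semantic
```

A majority vote over the whole region makes a face's label robust to a few misclassified pixels along object boundaries.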

Next, the semantic information of each connection point $V_i$ is determined. Let the faces around connection point $V_i$ be $F_k$ ($k = 1, \dots, p$), and let the semantic information corresponding to $F_k$ be $F_k^s$. The semantic information of $V_i$ can then be expressed as a function of the semantic information of its surrounding faces:

$V_i^s = Q(F_k^s), \quad k = 1, \dots, p$

where $V_i^s$ is the semantic information of $V_i$, $F_k^s$ is the semantic information of each face around $V_i$, and $p$ is the number of faces around $V_i$.

In one specific implementation, the function $Q(F_k^s)$ can be expressed as follows:

where $F_k^s$ is the semantic information of each face around $V_i$, and $p$ is the number of faces around $V_i$.

In another specific implementation, the function $Q(F_k^s)$ can take the following form:

where $F_k^s$ is the semantic information of each face around $V_i$, $p$ is the number of faces around $V_i$, and $F_k^A$ is the area of $F_k$.
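The two concrete forms of $Q$ are not reproduced in this text. Purely as an assumption, the sketch below reads them as (1) a plain mean of the semantic information of the $p$ faces around $V_i$ and (2) a mean weighted by each face's area $F_k^A$, with semantic information represented as a per-class score vector and the vertex label taken as the highest-scoring class. It also shows how the faces around each connection point can be found from triangle incidence. `label_vertices` and `triangle_area` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def triangle_area(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Area of triangle abc, i.e. half the length of (b - a) x (c - a)."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cross = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    return 0.5 * math.sqrt(sum(t * t for t in cross))


def label_vertices(vertices: List[Vec3],
                   faces: List[Tuple[int, int, int]],
                   face_scores: List[Sequence[float]],
                   area_weighted: bool = False) -> List[int]:
    """Assign each vertex V_i a class from the faces F_k (k = 1..p) around it.

    face_scores[j] stands for F_j^s as a per-class score vector (assumption).
    Returns -1 for a vertex that no face touches.
    """
    if not faces:
        return [-1] * len(vertices)
    n_classes = len(face_scores[0])
    acc = [[0.0] * n_classes for _ in vertices]
    weight = [0.0] * len(vertices)
    for j, (a, b, c) in enumerate(faces):
        # Variant (1): every face counts once; variant (2): weight by area F_j^A.
        w = triangle_area(vertices[a], vertices[b], vertices[c]) if area_weighted else 1.0
        for i in (a, b, c):  # F_j is one of the faces around each of its corners
            weight[i] += w
            for cls, score in enumerate(face_scores[j]):
                acc[i][cls] += w * score
    labels = []
    for i in range(len(vertices)):
        if weight[i] == 0.0:
            labels.append(-1)
            continue
        mean = [s / weight[i] for s in acc[i]]  # Q as a (weighted) mean
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return labels
```

Area weighting lets one large face outvote several small neighbors, so the two variants can disagree on vertices that sit on a boundary between classes.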

The information processing method provided by the embodiment of the present application extracts key frame data from the RGBD data and processes it to obtain geometric reconstruction data, then performs three-dimensional reconstruction and semantic segmentation simultaneously to obtain three-dimensional reconstruction data and semantic segmentation data, and finally maps the semantic segmentation data and the three-dimensional reconstruction data to obtain a three-dimensional semantic map. Because three-dimensional reconstruction and semantic segmentation are performed simultaneously, semantic information is obtained at the same time as the three-dimensional reconstruction from the RGBD data. This shortens computation time, improves the accuracy of scene segmentation, and allows three-dimensional semantic maps to be generated in real time, addressing the prior-art problem that, when three-dimensional data is semantically segmented, the types of objects that can be segmented are limited and processing takes a long time.

To implement the foregoing method flow, an embodiment of the present application further provides an information processing apparatus. FIG. 2 is a schematic structural diagram of an embodiment of an information processing apparatus according to an embodiment of the present application. As shown in FIG. 2, the apparatus of this embodiment may include an acquiring unit 11, an extracting unit 12, a processing unit 13, and a mapping unit 14.

The acquiring unit 11 is configured to acquire the RGBD data collected by the image acquisition device.

The extracting unit 12 is configured to extract key frame data in the RGBD data and perform processing to obtain geometric reconstruction data.

The processing unit 13 is configured to perform mapping processing on the RGB data in the key frame data and the geometric reconstruction data to obtain three-dimensional reconstruction data, and to perform semantic segmentation processing on the RGB data in the key frame data to obtain semantic segmentation data.

The mapping unit 14 is configured to perform mapping processing on the semantic segmentation data and the three-dimensional reconstruction data to obtain a three-dimensional semantic map.

In a specific implementation, the mapping unit 14 is specifically configured to:

Determining RGB data corresponding to each point in the three-dimensional reconstruction data;

Determining semantic information corresponding to each point in the three-dimensional reconstruction data according to the first correspondence between the RGB data and the semantic segmentation data;

Integrating the semantic information of all points in the three-dimensional reconstruction data to obtain the three-dimensional semantic map.

In another specific implementation, the mapping unit 14 is specifically configured to:

Determining RGB data corresponding to each face in the three-dimensional reconstruction data;

Determining semantic information corresponding to each face in the three-dimensional reconstruction data according to the second correspondence between the RGB data and the semantic segmentation data;

Determining the faces around each connection point in the three-dimensional reconstruction data;

Determining semantic information of each connection point according to the semantic information corresponding to each face;

Integrating the semantic information of all faces and all connection points in the three-dimensional reconstruction data to obtain the three-dimensional semantic map.

The extracting unit 12 is specifically configured to:

Calculating pose information of the image acquisition device according to key frame data in the RGBD data;

The reconstruction is performed based on the pose information and the D data in the key frame data to obtain geometric reconstruction data.

The information processing apparatus provided by this embodiment of the present application may be used to implement the technical solution of the method embodiment shown in FIG. 1; its implementation principle and technical effects are similar and are not repeated here.

An embodiment of the present application provides a cloud processing device. FIG. 3 is a schematic structural diagram of an embodiment of a cloud processing device according to an embodiment of the present application. The cloud processing device includes a processor 21 and a memory 22; the memory 22 is configured to store instructions that, when executed by the processor 21, cause the device to perform any of the methods described above.

The cloud processing device provided by this embodiment of the present application may be used to implement the technical solution of the method embodiment shown in FIG. 1; its implementation principle and technical effects are similar and are not repeated here.

To implement the foregoing method flow, an embodiment of the present application further provides a computer program product, which can be directly loaded into an internal memory of a computer and contains software code; after being loaded and executed by a computer, the computer program implements any of the methods described above.

A person of ordinary skill in the art will appreciate that all or some of the steps of the foregoing method embodiments may be implemented by program instructions controlling the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments; the storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across at least two network elements. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. A person of ordinary skill in the art can understand and implement them without creative effort.

Finally, it should be noted that the foregoing embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. An information processing method, comprising:

Obtaining RGBD data collected by the image acquisition device;

Extracting key frame data in the RGBD data and processing to obtain geometric reconstruction data;

Mapping the RGB data in the key frame data and the geometric reconstruction data to obtain three-dimensional reconstruction data; and performing semantic segmentation processing on the RGB data in the key frame data to obtain semantic segmentation data;

Mapping the semantic segmentation data and the three-dimensional reconstruction data to obtain a three-dimensional semantic map.

2. The method according to claim 1, wherein mapping the semantic segmentation data and the three-dimensional reconstruction data to obtain a three-dimensional semantic map comprises:

Determining RGB data corresponding to each point in the three-dimensional reconstruction data;

Determining, according to the first correspondence between the RGB data and the semantic segmentation data, semantic information corresponding to each point in the three-dimensional reconstruction data;

The semantic information of all points in the three-dimensional reconstruction data is integrated to obtain the three-dimensional semantic map.

3. The method according to claim 1, wherein mapping the semantic segmentation data and the three-dimensional reconstruction data to obtain a three-dimensional semantic map comprises:

Determining RGB data corresponding to each face in the three-dimensional reconstruction data;

Determining, according to the second correspondence between the RGB data and the semantic segmentation data, semantic information corresponding to each face in the three-dimensional reconstruction data;

Determining the faces around each connection point in the three-dimensional reconstruction data;

Determining semantic information of each connection point according to semantic information corresponding to each face;

The semantic information of all the faces in the three-dimensional reconstruction data and the semantic information of all the connection points are integrated to obtain the three-dimensional semantic map.

4. The method according to claim 1, wherein extracting and processing the key frame data in the RGBD data to obtain geometric reconstruction data comprises:

Calculating pose information of the image collection device according to key frame data in the RGBD data;

Reconstruction is performed according to the pose information and the D data in the key frame data to obtain geometric reconstruction data.

5. An information processing apparatus, comprising:

An acquiring unit, configured to acquire RGBD data collected by the image capturing device;

An extracting unit, configured to extract key frame data in the RGBD data and perform processing to obtain geometric reconstruction data;

a processing unit, configured to perform mapping processing on the RGB data in the key frame data and the geometric reconstruction data to obtain three-dimensional reconstruction data; and perform semantic segmentation on the RGB data in the key frame data to obtain semantic segmentation data;

And a mapping unit, configured to perform mapping processing on the semantic segmentation data and the three-dimensional reconstruction data to obtain a three-dimensional semantic map.

6. The apparatus according to claim 5, wherein the mapping unit is specifically configured to:

Determining RGB data corresponding to each point in the three-dimensional reconstruction data;

Determining, according to the first correspondence between the RGB data and the semantic segmentation data, semantic information corresponding to each point in the three-dimensional reconstruction data;

The semantic information of all points in the three-dimensional reconstruction data is integrated to obtain the three-dimensional semantic map.

7. The apparatus according to claim 5, wherein the mapping unit is specifically configured to:

Determining RGB data corresponding to each face in the three-dimensional reconstruction data;

Determining, according to the second correspondence between the RGB data and the semantic segmentation data, semantic information corresponding to each face in the three-dimensional reconstruction data;

Determining the faces around each connection point in the three-dimensional reconstruction data;

Determining semantic information of each connection point according to semantic information corresponding to each face;

The semantic information of all the faces in the three-dimensional reconstruction data and the semantic information of all the connection points are integrated to obtain the three-dimensional semantic map.

8. The apparatus according to claim 5, wherein the extracting unit is specifically configured to:

Calculating pose information of the image collection device according to key frame data in the RGBD data;

Reconstruction is performed according to the pose information and the D data in the key frame data to obtain geometric reconstruction data.

9. A cloud processing device, characterized in that the device comprises a processor and a memory; the memory is configured to store instructions which, when executed by the processor, cause the device to perform the method according to any one of claims 1 to 4.

10. A computer program product, which can be directly loaded into an internal memory of a computer and contains software code, wherein the computer program, after being loaded and executed by a computer, implements the method according to any one of claims 1 to 4.