CN109313820B - Three-dimensional data encoding method, decoding method, encoding device, and decoding device - Google Patents

Three-dimensional data encoding method, decoding method, encoding device, and decoding device

Info

Publication number
CN109313820B
Authority
CN
China
Prior art keywords
dimensional data
encoded
encoding
data
dimensional
Prior art date
Legal status
Active
Application number
CN201780036423.0A
Other languages
Chinese (zh)
Other versions
CN109313820A (en)
Inventor
杉尾敏康
西孝启
远间正真
松延徹
吉川哲史
小山达也
Current Assignee
Panasonic Intellectual Property Corp of America
Original Assignee
Panasonic Intellectual Property Corp of America
Priority date
Filing date
Publication date
Application filed by Panasonic Intellectual Property Corp of America
Priority to CN202310828378.XA (published as CN116630452A)
Publication of CN109313820A
Application granted
Publication of CN109313820B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/40 Tree coding, e.g. quadtree, octree
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/08 Volume rendering
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/16 Indexing scheme involving adaptation to the client's capabilities
    • G06T 2210/00 Indexing scheme for image generation or computer graphics
    • G06T 2210/08 Bandwidth reduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Generation (AREA)

Abstract

A three-dimensional data encoding method includes: an extraction step (S403) of extracting, from 1st three-dimensional data (411), 2nd three-dimensional data (412) whose feature value is equal to or greater than a threshold; and a 1st encoding step (S405) of encoding the 2nd three-dimensional data (412) to generate 1st encoded three-dimensional data (414). For example, the three-dimensional data encoding method further includes a 2nd encoding step (S404) of encoding the 1st three-dimensional data (411) to generate 2nd encoded three-dimensional data (413).

Description

Three-dimensional data encoding method, decoding method, encoding device, and decoding device
Technical Field
The present application relates to a three-dimensional data encoding method, a three-dimensional data decoding method, a three-dimensional data encoding device, and a three-dimensional data decoding device.
Background
Devices and services that use three-dimensional data are expected to spread into broad fields such as computer vision for the autonomous operation of automobiles and robots, map information, monitoring, infrastructure inspection, and video distribution. Three-dimensional data is obtained by various means, for example a distance sensor such as a range finder, a stereo camera, or a combination of a plurality of monocular cameras.
One method of representing three-dimensional data is called point cloud data, which expresses the shape of a three-dimensional structure as a group of points in three-dimensional space (for example, see non-patent document 1). Point cloud data stores the positions and colors of the points. Point clouds are expected to become the mainstream representation of three-dimensional data, but the amount of data in a point group is very large. Therefore, when accumulating or transmitting three-dimensional data, the data amount must be compressed by encoding, just as with two-dimensional moving pictures (for example, MPEG-4 AVC or HEVC standardized by MPEG).
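As a rough illustration of this representation, and of why compression is needed, here is a minimal sketch of a point record holding a position and a color; the class names and byte sizes are illustrative assumptions, not taken from the patent or from any particular library:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Point:
    position: Tuple[float, float, float]  # (x, y, z) position in space
    color: Tuple[int, int, int]           # (r, g, b), one byte per channel

@dataclass
class PointCloud:
    points: List[Point]

    def raw_size_bytes(self) -> int:
        # 3 x 4-byte float coordinates + 3 x 1-byte color channels per point;
        # this uncompressed size is what motivates encoding for storage/transfer.
        return len(self.points) * (3 * 4 + 3)
```

Even at this modest 15 bytes per point, a scan with hundreds of millions of points reaches gigabytes, which is why encoding is indispensable.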
In addition, compression of point cloud data is partly supported by public libraries that perform point-cloud-related processing, such as the Point Cloud Library (PCL).
Prior art literature
Non-patent literature
Non-patent document 1: "Octree-Based Progressive Geometry Coding of Point Clouds", Eurographics Symposium on Point-Based Graphics (2006)
Disclosure of Invention
Problems to be solved by the invention
Such three-dimensional data involves a far larger amount of data than two-dimensional data, so the encoded three-dimensional data to be transmitted is also very large.
The present application aims to provide a three-dimensional data encoding method, a three-dimensional data decoding method, a three-dimensional data encoding device, or a three-dimensional data decoding device, which can reduce the amount of data during transmission.
Means for solving the problems
The three-dimensional data encoding method according to one aspect of the present application includes: an extraction step of extracting, from 1st three-dimensional data, 2nd three-dimensional data whose feature value is equal to or greater than a threshold; and a 1st encoding step of encoding the 2nd three-dimensional data to generate 1st encoded three-dimensional data.
The three-dimensional data decoding method according to one aspect of the present application includes: a 1st decoding step of decoding, by a 1st decoding method, 1st encoded three-dimensional data obtained by encoding 2nd three-dimensional data that was extracted from 1st three-dimensional data and whose feature value is equal to or greater than a threshold; and a 2nd decoding step of decoding, by a 2nd decoding method different from the 1st decoding method, 2nd encoded three-dimensional data obtained by encoding the 1st three-dimensional data.
These general or specific aspects of the present invention may be implemented as a system, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable CD-ROM, or as any combination of a system, method, integrated circuit, computer program, and recording medium.
Advantageous Effects of Invention
The present invention can provide a three-dimensional data encoding method, a three-dimensional data decoding method, a three-dimensional data encoding device, or a three-dimensional data decoding device, which can reduce the amount of data at the time of transmission.
Drawings
Fig. 1 shows a structure of encoded three-dimensional data according to embodiment 1.
Fig. 2 shows an example of the prediction structure between SPCs belonging to the lowest layer of GOS according to embodiment 1.
Fig. 3 shows an example of an inter-layer prediction structure according to embodiment 1.
Fig. 4 shows an example of the coding sequence of GOS according to embodiment 1.
Fig. 5 shows an example of the coding sequence of GOS according to embodiment 1.
Fig. 6 is a block diagram of the three-dimensional data encoding device according to embodiment 1.
Fig. 7 is a flowchart of the encoding process according to embodiment 1.
Fig. 8 is a block diagram of the three-dimensional data decoding device according to embodiment 1.
Fig. 9 is a flowchart of decoding processing according to embodiment 1.
Fig. 10 shows an example of meta information according to embodiment 1.
Fig. 11 shows an example of the structure of the SWLD according to embodiment 2.
Fig. 12 shows an example of the operation of the server and the client according to embodiment 2.
Fig. 13 shows an example of the operation of the server and the client according to embodiment 2.
Fig. 14 shows an example of the operation of the server and the client according to embodiment 2.
Fig. 15 shows an example of the operation of the server and the client according to embodiment 2.
Fig. 16 is a block diagram of a three-dimensional data encoding device according to embodiment 2.
Fig. 17 is a flowchart of the encoding process according to embodiment 2.
Fig. 18 is a block diagram of a three-dimensional data decoding device according to embodiment 2.
Fig. 19 is a flowchart of decoding processing according to embodiment 2.
Fig. 20 shows an example of the structure of WLD according to embodiment 2.
Fig. 21 shows an example of an octree structure of WLD according to embodiment 2.
Fig. 22 shows an example of the structure of the SWLD according to embodiment 2.
Fig. 23 shows an example of an octree structure of the SWLD according to embodiment 2.
Detailed Description
When encoded data such as point cloud data is used in an actual device or service, random access to a desired spatial position, target object, or the like is required. Until now, however, random access has not existed as a function in three-dimensionally encoded data, and no encoding method providing it has existed either.
In the present application, a three-dimensional data encoding method, a three-dimensional data decoding method, a three-dimensional data encoding apparatus, or a three-dimensional data decoding apparatus capable of providing a random access function in encoding three-dimensional data can be provided.
The three-dimensional data encoding method according to one aspect of the present application encodes three-dimensional data and includes: a dividing step of dividing the three-dimensional data into first processing units, each corresponding to three-dimensional coordinates, the first processing units being random access units; and an encoding step of generating encoded data by encoding each of the plurality of first processing units.
Accordingly, random access in each first processing unit becomes possible. Thus, the three-dimensional data encoding method can provide a random access function in encoding three-dimensional data.
For example, the three-dimensional data encoding method may include a generation step of generating first information indicating the plurality of first processing units and three-dimensional coordinates corresponding to each of the plurality of first processing units, and the encoded data may include the first information.
For example, the first information may further include at least one of an object, a time, and a data storage destination corresponding to each of the plurality of first processing units.
For example, in the dividing step, the first processing unit may be further divided into a plurality of second processing units, and in the encoding step, each of the plurality of second processing units may be encoded.
For example, in the encoding step, the second processing unit to be processed, which is included in the first processing unit to be processed, may be encoded with reference to another second processing unit included in the same first processing unit to be processed.
Accordingly, coding efficiency can be improved by referring to another second processing unit.
For example, in the encoding step, the type of the second processing unit to be processed may be selected from among a first type that does not refer to another second processing unit, a second type that refers to one other second processing unit, and a third type that refers to two other second processing units, and the second processing unit to be processed may be encoded in accordance with the selected type.
For example, in the encoding step, the frequency of selecting the first type may be changed in accordance with the number or density of the objects included in the three-dimensional data.
Accordingly, random accessibility and coding efficiency, which are in a trade-off relationship, can be set appropriately.
For example, in the encoding step, the size of the first processing units may be determined in accordance with the number or density of the objects, or of the dynamic objects, included in the three-dimensional data.
Accordingly, random accessibility and coding efficiency, which are in a trade-off relationship, can be set appropriately.
For example, the first processing unit may include a plurality of layers spatially divided in a predetermined direction, each layer including one or more second processing units, and in the encoding step, each second processing unit may be encoded with reference to second processing units included in the same layer as, or a layer lower than, that second processing unit.
Accordingly, for example, the random accessibility of an important layer in a system can be improved, and a decrease in coding efficiency can be suppressed.
For example, in the dividing step, the second processing unit including only the static object and the second processing unit including only the dynamic object may be allocated to different first processing units.
Accordingly, the dynamic object and the static object can be easily controlled.
For example, in the encoding step, each of a plurality of dynamic objects may be encoded individually, and the resulting encoded data of the plurality of dynamic objects may be associated with a second processing unit that includes only static objects.
Accordingly, the dynamic object and the static object can be easily controlled.
For example, in the dividing step, the second processing unit may be further divided into a plurality of third processing units, and in the encoding step, each of the plurality of third processing units may be encoded.
For example, the third processing unit may include one or more voxels, and the voxels may be the smallest unit corresponding to the position information.
For example, the second processing unit may include a feature point group derived from information obtained by a sensor.
For example, the encoded data may include information indicating an encoding order of the plurality of first processing units.
For example, the encoded data may include information indicating the sizes of the plurality of first processing units.
For example, in the encoding step, a plurality of the first processing units may be encoded in parallel.
The three-dimensional data decoding method according to one aspect of the present application includes a decoding step of generating three-dimensional data of a first processing unit by decoding each piece of encoded data of the first processing units, each corresponding to three-dimensional coordinates, the first processing units being random access units.
Accordingly, random access per first processing unit becomes possible. Thus, the three-dimensional data decoding method can provide a random access function for encoded three-dimensional data.
The three-dimensional data encoding device according to one aspect of the present application may include a dividing unit that divides the three-dimensional data into first processing units corresponding to the three-dimensional coordinates, the first processing units being random access units; and an encoding unit configured to generate encoded data by encoding each of the plurality of first processing units.
Accordingly, random access per first processing unit becomes possible. In this way, the three-dimensional data encoding device can provide a random access function in encoding three-dimensional data.
The three-dimensional data decoding device according to one aspect of the present application may be configured to decode three-dimensional data, and the three-dimensional data decoding device may include a decoding unit configured to generate three-dimensional data of a first processing unit, which is a random access unit, by decoding each of encoded data of the first processing unit corresponding to a three-dimensional coordinate.
Accordingly, random access per first processing unit becomes possible. Thus, the three-dimensional data decoding device can provide a random access function for encoded three-dimensional data.
In addition, by dividing and encoding space in this way, the present application enables quantization, prediction, and the like of space, and is effective even when random access is not performed.
The three-dimensional data encoding method according to one aspect of the present application includes: an extraction step of extracting, from 1st three-dimensional data, 2nd three-dimensional data whose feature value is equal to or greater than a threshold; and a 1st encoding step of encoding the 2nd three-dimensional data to generate 1st encoded three-dimensional data.
Accordingly, the three-dimensional data encoding method generates 1st encoded three-dimensional data in which only data whose feature value is equal to or greater than the threshold is encoded. The amount of encoded data can therefore be reduced compared with encoding the 1st three-dimensional data directly, so the method can reduce the amount of data at the time of transmission.
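A minimal sketch of these two steps (plus the optional 2nd encoding step introduced below), assuming caller-supplied placeholders for the feature function and for the two encoders; none of these names come from the patent:

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]

def encode_with_extraction(
    data_1st: List[Point],
    feature_of: Callable[[Point], float],        # assumed feature extractor
    threshold: float,
    encode_1st: Callable[[List[Point]], bytes],  # 1st encoding method (stub)
    encode_2nd: Callable[[List[Point]], bytes],  # 2nd encoding method (stub)
) -> Tuple[bytes, bytes]:
    # Extraction step: the 2nd data is the subset of the 1st data whose
    # feature value is equal to or greater than the threshold.
    data_2nd = [p for p in data_1st if feature_of(p) >= threshold]
    # 1st encoding step: encode only the extracted 2nd data.
    encoded_1st = encode_1st(data_2nd)
    # 2nd encoding step: encode the full 1st data with a different method.
    encoded_2nd = encode_2nd(data_1st)
    return encoded_1st, encoded_2nd
```

Since `data_2nd` is typically much smaller than `data_1st`, the 1st encoded data can be sent when transmission bandwidth is the constraint.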
For example, the three-dimensional data encoding method may further include a 2nd encoding step of encoding the 1st three-dimensional data to generate 2nd encoded three-dimensional data.
Accordingly, the 1st encoded three-dimensional data and the 2nd encoded three-dimensional data can be transmitted selectively, for example according to the intended use.
For example, the 2nd three-dimensional data may be encoded by a 1st encoding method, and the 1st three-dimensional data may be encoded by a 2nd encoding method different from the 1st encoding method.
Accordingly, an encoding method appropriate to each of the 1st three-dimensional data and the 2nd three-dimensional data can be used.
For example, in the 1st encoding method, inter prediction, of intra prediction and inter prediction, may be given higher priority than in the 2nd encoding method.
Accordingly, the priority of inter prediction can be raised for the 2nd three-dimensional data, in which the correlation between adjacent data tends to be low.
For example, the 1st encoding method and the 2nd encoding method may differ in how three-dimensional positions are represented.
Accordingly, a more suitable three-dimensional position representation can be used for three-dimensional data whose numbers of data points differ.
For example, at least one of the 1st encoded three-dimensional data and the 2nd encoded three-dimensional data may include an identifier indicating whether the encoded three-dimensional data was obtained by encoding the 1st three-dimensional data or by encoding a part of the 1st three-dimensional data.
Accordingly, the decoding device can easily determine whether the obtained encoded three-dimensional data is the 1st encoded three-dimensional data or the 2nd encoded three-dimensional data.
For example, in the 1st encoding step, the 2nd three-dimensional data may be encoded so that the data amount of the 1st encoded three-dimensional data is smaller than the data amount of the 2nd encoded three-dimensional data.
Accordingly, the data amount of the 1st encoded three-dimensional data can be made smaller than that of the 2nd encoded three-dimensional data.
For example, in the extraction step, data corresponding to an object having a predetermined attribute may be extracted from the 1st three-dimensional data as the 2nd three-dimensional data.
Accordingly, 1st encoded three-dimensional data containing the data required by the decoding device can be generated.
For example, the three-dimensional data encoding method may further include a transmission step of transmitting, to a client, one of the 1st encoded three-dimensional data and the 2nd encoded three-dimensional data in accordance with the state of the client.
Accordingly, data appropriate to the state of the client can be transmitted.
For example, the state of the client may include the communication conditions of the client or the movement speed of the client.
For example, the three-dimensional data encoding method may further include a transmission step of transmitting, to a client, one of the 1st encoded three-dimensional data and the 2nd encoded three-dimensional data in accordance with a request from the client.
Accordingly, data appropriate to the client's request can be transmitted.
The three-dimensional data decoding method according to one aspect of the present application includes: a 1st decoding step of decoding, by a 1st decoding method, 1st encoded three-dimensional data obtained by encoding 2nd three-dimensional data that was extracted from 1st three-dimensional data and whose feature value is equal to or greater than a threshold; and a 2nd decoding step of decoding, by a 2nd decoding method different from the 1st decoding method, 2nd encoded three-dimensional data obtained by encoding the 1st three-dimensional data.
Accordingly, the 1st encoded three-dimensional data, in which only data whose feature value is equal to or greater than the threshold is encoded, and the 2nd encoded three-dimensional data can be received selectively, for example according to the intended use. The amount of data at the time of transmission can thus be reduced. Furthermore, a decoding method appropriate to each of the 1st three-dimensional data and the 2nd three-dimensional data can be used.
For example, in the 1st decoding method, inter prediction, of intra prediction and inter prediction, may be given higher priority than in the 2nd decoding method.
Accordingly, the priority of inter prediction can be raised for the 2nd three-dimensional data, in which the correlation between adjacent data tends to be low.
For example, the 1st decoding method and the 2nd decoding method may differ in how three-dimensional positions are represented.
Accordingly, a more suitable three-dimensional position representation can be used for three-dimensional data whose numbers of data points differ.
For example, at least one of the 1st encoded three-dimensional data and the 2nd encoded three-dimensional data may include an identifier indicating whether the encoded three-dimensional data was obtained by encoding the 1st three-dimensional data or by encoding a part of the 1st three-dimensional data, and the 1st encoded three-dimensional data and the 2nd encoded three-dimensional data may be distinguished by referring to the identifier.
Accordingly, whether the obtained encoded three-dimensional data is the 1st encoded three-dimensional data or the 2nd encoded three-dimensional data can be determined easily.
For example, the three-dimensional data decoding method may further include: a notification step of notifying a server of the state of the client; and a reception step of receiving one of the 1st encoded three-dimensional data and the 2nd encoded three-dimensional data transmitted from the server in accordance with the state of the client.
Accordingly, data appropriate to the state of the client can be received.
For example, the state of the client may include the communication conditions of the client or the movement speed of the client.
For example, the three-dimensional data decoding method may further include: a request step of requesting, from a server, one of the 1st encoded three-dimensional data and the 2nd encoded three-dimensional data; and a reception step of receiving the requested one of the 1st encoded three-dimensional data and the 2nd encoded three-dimensional data transmitted from the server.
Accordingly, data appropriate to the intended use can be received.
The three-dimensional data encoding device according to one aspect of the present application includes: an extraction unit that extracts, from 1st three-dimensional data, 2nd three-dimensional data whose feature value is equal to or greater than a threshold; and a 1st encoding unit that encodes the 2nd three-dimensional data to generate 1st encoded three-dimensional data.
Accordingly, the three-dimensional data encoding device generates 1st encoded three-dimensional data in which only data whose feature value is equal to or greater than the threshold is encoded. The data amount can thus be reduced compared with encoding the 1st three-dimensional data directly, so the device can reduce the amount of data at the time of transmission.
The three-dimensional data decoding device according to one aspect of the present application includes: a 1st decoding unit that decodes, by a 1st decoding method, 1st encoded three-dimensional data obtained by encoding 2nd three-dimensional data that was extracted from 1st three-dimensional data and whose feature value is equal to or greater than a threshold; and a 2nd decoding unit that decodes, by a 2nd decoding method different from the 1st decoding method, 2nd encoded three-dimensional data obtained by encoding the 1st three-dimensional data.
Accordingly, the three-dimensional data decoding device can selectively receive, for example according to the intended use, the 1st encoded three-dimensional data, in which only data whose feature value is equal to or greater than the threshold is encoded, and the 2nd encoded three-dimensional data. The amount of data at the time of transmission can thus be reduced, and a decoding method appropriate to each of the 1st three-dimensional data and the 2nd three-dimensional data can be used.
The general and specific aspects may be implemented by a system, a method, an integrated circuit, a computer program, a computer-readable recording medium such as a CD-ROM, or any combination of the system, the method, the integrated circuit, the computer program, and the recording medium.
The embodiments are specifically described below with reference to the drawings. Each of the embodiments described below shows a specific example of the present application. The numerical values, shapes, materials, constituent elements, the arrangement and connection of the constituent elements, the steps, and the order of the steps shown in the following embodiments are examples and are not intended to limit the present application. Among the constituent elements of the following embodiments, those not recited in the independent claims indicating the most generic concept are described as optional constituent elements.
(embodiment 1)
First, a data structure of encoded three-dimensional data (hereinafter also referred to as encoded data) according to the present embodiment will be described. Fig. 1 shows a structure of encoded three-dimensional data according to the present embodiment.
In the present embodiment, the three-dimensional space is divided into spaces (SPC) corresponding to pictures in moving picture coding, and the three-dimensional data is encoded in units of spaces. A space is further divided into volumes (VLM) corresponding to macroblocks in moving picture coding, and prediction and transform are performed in units of VLMs. A volume includes a plurality of voxels (VXL), the smallest units, which are associated with position coordinates. Prediction means, as with prediction on two-dimensional images, generating predicted three-dimensional data similar to the processing unit being processed with reference to another processing unit, and encoding the difference between the prediction and the processing unit being processed. This prediction includes not only spatial prediction, which refers to other prediction units at the same time, but also temporal prediction, which refers to prediction units at different times.
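The processing-unit hierarchy just described can be pictured with the following illustrative container types; the abbreviations follow the text, but the concrete fields are assumptions, since the text does not prescribe a data layout:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Voxel:                 # VXL: smallest unit, tied to position coordinates
    occupied: bool = False

@dataclass
class Volume:                # VLM: unit of prediction and transform
    voxels: List[Voxel] = field(default_factory=list)

@dataclass
class Space:                 # SPC: corresponds to a picture in video coding
    volumes: List[Volume] = field(default_factory=list)

@dataclass
class GroupOfSpaces:         # GOS: random access unit (see fig. 1)
    spaces: List[Space] = field(default_factory=list)

@dataclass
class World:                 # WLD: processing unit containing a number of GOSs
    groups: List[GroupOfSpaces] = field(default_factory=list)
```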
For example, when encoding a three-dimensional space represented by point group data such as point cloud data, the three-dimensional data encoding device (hereinafter also referred to as the encoding device) collectively encodes the points of the point group, or a plurality of points falling within a voxel, in accordance with the voxel size. Subdividing the voxels allows the three-dimensional shape of the point group to be represented with high accuracy, while enlarging the voxels represents it roughly.
In the following, the case where three-dimensional data is point cloud data will be described as an example, but the three-dimensional data is not limited to the point cloud data and may be any form of three-dimensional data.
Voxels with a hierarchical structure may also be used. In that case, at level n, whether sample points exist at level n-1 (the layer below level n) can be indicated in order. For example, when decoding only down to level n, if sample points exist at level n-1 or below, a point can be regarded as existing at the center of the level-n voxel.
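A sketch of this hierarchical-voxel idea in octree form: each node records whether any sample point exists at its level or below, and a decoder that stops descending at a chosen level emits the voxel center for occupied nodes, as described above. The structure and names are illustrative assumptions:

```python
from typing import List, Optional, Tuple

Point3 = Tuple[float, float, float]

class OctreeNode:
    def __init__(self) -> None:
        self.children: List[Optional["OctreeNode"]] = [None] * 8  # level n-1
        self.occupied = False  # any sample point at this level or below?

def points_at_level(node: OctreeNode, center: Point3, half_size: float,
                    level: int, out: List[Point3]) -> None:
    """Approximate the point group at a chosen hierarchy level: stop at
    `level` (or at a leaf) and emit the voxel center of occupied nodes."""
    if not node.occupied:
        return
    if level == 0 or all(c is None for c in node.children):
        out.append(center)
        return
    q = half_size / 2.0
    for i, child in enumerate(node.children):
        if child is None:
            continue
        dx = (i & 1) * 2 - 1          # -1 or +1 along x
        dy = ((i >> 1) & 1) * 2 - 1   # -1 or +1 along y
        dz = ((i >> 2) & 1) * 2 - 1   # -1 or +1 along z
        child_center = (center[0] + dx * q,
                        center[1] + dy * q,
                        center[2] + dz * q)
        points_at_level(child, child_center, q, level - 1, out)
```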
The encoding device obtains point group data by a distance sensor, a stereo camera, a monocular camera, a gyroscope, an inertial sensor, or the like.
As with moving picture coding, spaces are classified into at least one of three prediction-structure types: an intra space (I-SPC) that can be decoded on its own, a predictive space (P-SPC) that refers in one direction only, and a bidirectional space (B-SPC) that refers in two directions. A space has two kinds of time information: decoding time and display time.
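As a compact summary, the three prediction-structure types and the two kinds of time information might be modeled as follows (illustrative names only):

```python
from dataclasses import dataclass
from enum import Enum

class SpcType(Enum):
    I_SPC = "intra"          # decodable on its own
    P_SPC = "predictive"     # refers to one other SPC, in one direction
    B_SPC = "bidirectional"  # refers to two other SPCs, in two directions

@dataclass
class SpcHeader:
    spc_type: SpcType
    decoding_time: int       # when the SPC is to be decoded
    display_time: int        # when the SPC is to be presented
```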
As shown in fig. 1, a GOS (Group Of Spaces) is a random access unit, a processing unit including a plurality of spaces. As a processing unit including a plurality of GOSs, there is also the world space (WLD).
The spatial area occupied by the world space is associated with an absolute position on the earth by GPS, latitude and longitude information, or the like. The position information is stored as meta information. In addition, the meta information may be included in the encoded data or may be transmitted separately from the encoded data.
In the GOS, all the SPCs may be adjacent in three dimensions, or SPCs that are not adjacent in three dimensions to other SPCs may be present.
In the following, processing such as encoding, decoding, and referencing corresponding to three-dimensional data included in processing units such as GOS, SPC, and VLM will be simply referred to as encoding, decoding, and referencing processing units. The three-dimensional data included in the processing unit includes, for example, at least one set of a spatial position such as a three-dimensional coordinate and a characteristic value such as color information.
Next, a prediction structure of SPC in GOS will be described. The multiple SPCs within the same GOS or the multiple VLMs within the same SPC occupy different spaces from each other, but hold the same time information (decoding time and display time).
Within a GOS, the SPC at the head of the decoding order is an I-SPC. There are two kinds of GOS: closed GOS and open GOS. A closed GOS is a GOS in which all SPCs in the GOS can be decoded when decoding starts from the leading I-SPC. In an open GOS, some SPCs whose display time is earlier than that of the leading I-SPC refer to a different GOS, and can be decoded only using that different GOS.
In encoded data such as map information, a WLD may need to be decoded in the direction opposite to the encoding order, and reverse playback is difficult if GOSs depend on each other. In such cases, therefore, closed GOSs are basically used.
And, the GOS has a layer structure in a height direction, and is sequentially encoded or decoded from the SPC of the lower layer.
Fig. 2 shows an example of an inter-SPC prediction structure belonging to the lowest layer of GOS.
Fig. 3 shows an example of an inter-layer prediction structure.
There is at least one I-SPC in a GOS. Objects such as people, animals, automobiles, bicycles, traffic lights, and buildings serving as landmarks exist in three-dimensional space, and encoding small-sized objects as I-SPCs is particularly effective. For example, when a three-dimensional data decoding device (hereinafter also referred to as the decoding device) decodes a GOS with low throughput or at high speed, it decodes only the I-SPCs in the GOS.
The encoding device may switch the encoding interval or the occurrence frequency of the I-SPC according to the degree of the density of the object in the WLD.
In the configuration shown in fig. 3, the encoding device or decoding device encodes or decodes the plurality of layers in order from the lower layer (layer 1). Accordingly, for example, for an autonomously driving vehicle, the priority of data near the ground, which carries a large amount of information, can be raised.
In encoded data used by unmanned aerial vehicles (drones) and the like, encoding and decoding may instead proceed in order from the SPCs of the upper layers in the height direction within a GOS.
The encoding device or decoding device may also encode or decode the plurality of layers so that the decoding device can grasp the GOS roughly first and then gradually raise the resolution. For example, the encoding device or decoding device may encode or decode in the order of layers 3, 8, 1, 9.
Next, a description will be given of a method for associating a static object and a dynamic object.
In three-dimensional space there are static objects or scenes such as buildings and roads (hereinafter collectively referred to as static objects), and dynamic objects such as vehicles and people (hereinafter referred to as dynamic objects). Object detection may be carried out separately, for example by extracting feature points from point cloud data or from images captured by a stereo camera or the like. Here, examples of methods for encoding dynamic objects are described.
The first method is a method of encoding without distinguishing between a static object and a dynamic object. The second method is a method of distinguishing a static object from a dynamic object by identifying information.
For example, GOS is used as a recognition unit. In this case, the GOS including the SPC constituting the static object is discriminated from the GOS including the SPC constituting the dynamic object in the encoded data or by the identification information stored separately from the encoded data.
Alternatively, SPC is used as the identification unit. In this case, the SPC including only the VLM constituting the static object and the SPC including the VLM constituting the dynamic object are distinguished from each other by the identification information described above.
Alternatively, VLM or VXL may be used as the recognition unit. In this case, the VLM or VXL including the static object is distinguished from the VLM or VXL including the dynamic object by the above-described identification information.
The encoding device may encode the dynamic object as one or more VLMs or SPCs, and encode the VLMs or SPCs including the static object and the SPCs including the dynamic object as GOSs different from each other. When the size of the GOS is variable according to the size of the moving object, the encoding device stores the size of the GOS as meta information.
The encoding device encodes the static object and the dynamic object independently of each other, and can superimpose the dynamic object on the world space constituted by the static object. In this case, the dynamic object is composed of one or more SPCs, and each SPC corresponds to one or more SPCs constituting the static object on which the SPC is superimposed. In addition, the dynamic object may not be represented by SPC, but may be represented by more than one VLM or VXL.
Also, the encoding means may encode the static object and the dynamic object as streams different from each other.
The encoding device may generate a GOS including one or more SPCs constituting a dynamic object. The encoding device may set the GOS including the dynamic object (GOS_M) and the static-object GOS corresponding to the spatial region of GOS_M to the same size (occupying the same spatial region). In this way, the superimposition processing can be performed in units of GOS.
The P-SPC or B-SPC constituting the dynamic object may refer to SPC contained in the encoded different GOS. When the positions of the dynamic objects change with time and the same dynamic object is encoded as GOSs at different times, the reference across GOSs is effective from the viewpoint of compression rate.
The first method and the second method may be switched according to the use of the encoded data. For example, in the case where the encoded three-dimensional data is applied as a map, the encoding apparatus adopts the second method because it is desired to separate from the dynamic object. In addition, when the encoding device encodes three-dimensional data of an event such as a concert or a sport, the encoding device adopts the first method if it is not necessary to separate dynamic objects.
The decoding time and display time of a GOS or SPC may be stored in the encoded data or stored as meta information. The time information of static objects may all be the same; in that case, the actual decoding time and display time may be determined by the decoding device. Alternatively, a different value may be assigned to each GOS or SPC as the decoding time, and the same value may be assigned to all display times. Furthermore, as with decoder models in moving picture coding, such as the HRD (Hypothetical Reference Decoder) of HEVC, a model may be introduced that guarantees that a decoder having a buffer of a predetermined size can decode the bitstream without failure if it reads it at a predetermined bit rate in accordance with the decoding times.
Next, the configuration of GOS in the world space will be described. The coordinates of the three-dimensional space in the world space are represented by three coordinate axes (x-axis, y-axis, z-axis) orthogonal to each other. By setting a predetermined rule in the coding order of GOSs, GOSs that are spatially adjacent can be coded continuously in the coded data. For example, in the example shown in fig. 4, GOS in the xz plane is encoded consecutively. After the encoding of all GOSs in one xz plane is finished, the value of the y-axis is updated. That is, as the encoding continues, world space extends in the y-axis direction. The index number of the GOS is set to the coding order.
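One possible mapping from GOS grid coordinates to this coding-order index, assuming a raster scan within each xz plane before the y value is updated (the exact in-plane scan order is an assumption, not stated in the text):

```python
def gos_coding_index(gx: int, gy: int, gz: int, nx: int, nz: int) -> int:
    """Coding-order index of the GOS at grid cell (gx, gy, gz): every GOS
    in one xz plane is coded before y is incremented, so world space grows
    along the y axis as encoding proceeds. nx and nz are the world-space
    extents, in GOS units, along x and z."""
    return gy * (nx * nz) + gz * nx + gx
```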
Here, the three-dimensional space of the world space corresponds to geographic absolute coordinates such as GPS, latitude, and longitude. Alternatively, the three-dimensional space may be represented by a relative position with respect to a reference position set in advance. Directions of x-axis, y-axis, and z-axis of the three-dimensional space are expressed as direction vectors determined based on latitude, longitude, and the like, and the direction vectors are stored as meta information together with encoded data.
The size of GOS is set to be fixed, and the encoding device stores the size as meta information. The GOS size can be switched depending on whether it is in a city, or whether it is indoor or outdoor, for example. That is, the size of GOS may be switched according to the amount or nature of an object having value as information. Alternatively, the encoding device may appropriately switch the size of the GOS or the interval of the I-SPC in the GOS in the same world space according to the density of the object or the like. For example, the higher the density of the object, the smaller the size of the GOS and the shorter the interval of the I-SPC in the GOS.
In the example of fig. 5, the GOSs in the region of the 3rd to 10th GOSs are subdivided because the density of objects there is high, so that random access with fine granularity can be realized. The 7th to 10th GOSs are located behind the 3rd to 6th GOSs, respectively.
Next, the configuration of the three-dimensional data encoding device according to the present embodiment and the flow of operations will be described. Fig. 6 is a block diagram of the three-dimensional data encoding device 100 according to the present embodiment. Fig. 7 is a flowchart showing an example of the operation of the three-dimensional data encoding apparatus 100.
The three-dimensional data encoding device 100 shown in fig. 6 encodes the three-dimensional data 111 to generate encoded three-dimensional data 112. The three-dimensional data encoding device 100 includes an obtaining unit 101, an encoding region determining unit 102, a dividing unit 103, and an encoding unit 104.
As shown in fig. 7, first, the obtaining section 101 obtains three-dimensional data 111 as point group data (S101).
Next, the encoding region determining unit 102 determines a region to be encoded from the spatial region corresponding to the obtained point group data (S102). For example, the encoding region determining unit 102 determines a spatial region around a position of a user or a vehicle as a region to be encoded.
Next, the dividing unit 103 divides the point group data included in the region to be encoded into individual processing units. Here, the processing units are the GOS, the SPC, and the like described above. The region to be encoded corresponds to, for example, the world space described above. Specifically, the dividing unit 103 divides the point group data into processing units according to the size of GOS, the presence or absence of a moving object, or the size set in advance (S103). The dividing unit 103 determines the start position of the SPC that starts in the coding order for each GOS.
Next, the encoding unit 104 sequentially encodes a plurality of SPCs in each GOS to generate encoded three-dimensional data 112 (S104).
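The flow of steps S101 to S104 can be sketched as follows. The per-GOS payload here is only a stand-in for real SPC/VLM coding, and the region determination (S102) is assumed to have been applied by the caller when selecting `points`:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
GosKey = Tuple[int, int, int]

def encode_point_group(points: List[Point], gos_size: float,
                       origin: Point = (0.0, 0.0, 0.0)) -> List[Tuple[GosKey, bytes]]:
    # S103: divide the point group into GOS-sized grid cells.
    buckets: Dict[GosKey, List[Point]] = defaultdict(list)
    for x, y, z in points:
        key = (int((x - origin[0]) // gos_size),
               int((y - origin[1]) // gos_size),
               int((z - origin[2]) // gos_size))
        buckets[key].append((x, y, z))
    # S104: encode each GOS in a deterministic coding order.
    encoded = []
    for key in sorted(buckets):
        payload = repr(buckets[key]).encode()  # placeholder for real coding
        encoded.append((key, payload))
    return encoded
```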
Here, an example was shown in which the region to be encoded is first divided into GOSs and SPCs and each GOS is then encoded, but the processing order is not limited to this. For example, the composition of one GOS may be determined and that GOS encoded, and then the composition of the next GOS may be determined, in that order.
In this way, the three-dimensional data encoding device 100 encodes the three-dimensional data 111 to generate encoded three-dimensional data 112. Specifically, the three-dimensional data encoding device 100 divides three-dimensional data into random access units, that is, into first processing units (GOS) corresponding to three-dimensional coordinates, respectively, divides the first processing units (GOS) into a plurality of second processing units (SPC), and divides the second processing units (SPC) into a plurality of third processing units (VLM). The third processing unit (VLM) includes one or more Voxels (VXL), and the Voxel (VXL) is the minimum unit corresponding to the position information.
Next, the three-dimensional data encoding apparatus 100 generates encoded three-dimensional data 112 by encoding each of a plurality of first processing units (GOS). Specifically, the three-dimensional data encoding device 100 encodes each of the plurality of second processing units (SPC) in each of the first processing units (GOS). The three-dimensional data encoding device 100 encodes each of the plurality of third processing units (VLM) in each of the second processing units (SPC).
For example, when the first processing unit (GOS) being processed is a closed GOS, the three-dimensional data encoding device 100 encodes the second processing unit (SPC) being processed, which is included in that first processing unit (GOS), with reference to other second processing units (SPC) included in the same first processing unit (GOS). That is, the three-dimensional data encoding device 100 does not refer to second processing units (SPC) included in a first processing unit (GOS) different from the one being processed.
When the first processing unit (GOS) being processed is an open GOS, the second processing unit (SPC) being processed, which is included in that first processing unit (GOS), is encoded with reference to another second processing unit (SPC) in the same first processing unit (GOS) or to a second processing unit (SPC) included in a different first processing unit (GOS).
The three-dimensional data encoding device 100 selects the type of the second processing unit (SPC) to be processed from among a first type (I-SPC) that does not refer to another second processing unit (SPC), a second type (P-SPC) that refers to one other second processing unit (SPC), and a third type (B-SPC) that refers to two other second processing units (SPC), and encodes the second processing unit (SPC) to be processed in accordance with the selected type.
Next, the configuration of the three-dimensional data decoding device according to the present embodiment and the flow of operations will be described. Fig. 8 is a block diagram of the three-dimensional data decoding device 200 according to the present embodiment. Fig. 9 is a flowchart showing an operation example of the three-dimensional data decoding apparatus 200.
The three-dimensional data decoding apparatus 200 shown in fig. 8 generates decoded three-dimensional data 212 by decoding the encoded three-dimensional data 211. Here, the encoded three-dimensional data 211 is, for example, the encoded three-dimensional data 112 generated by the three-dimensional data encoding device 100. The three-dimensional data decoding device 200 includes an obtaining unit 201, a decoding start GOS determining unit 202, a decoding SPC determining unit 203, and a decoding unit 204.
First, the obtaining section 201 obtains encoded three-dimensional data 211 (S201). Next, the decoding start GOS determination unit 202 determines a GOS to be decoded (S202). Specifically, the decoding start GOS determination unit 202 refers to meta information stored in the encoded three-dimensional data 211 or in each of the encoded three-dimensional data, and determines GOS including a spatial position, an object, or an SPC corresponding to time at which decoding is started as GOS to be decoded.
Next, the decoding SPC determining unit 203 determines the types (I, P, B) of SPC to decode within the GOS (S203). For example, the decoding SPC determining unit 203 determines whether to (1) decode only the I-SPC, (2) decode the I-SPC and P-SPCs, or (3) decode all types. In addition, when the types of SPC to decode are specified in advance, such as decoding all SPCs, this step need not be performed.
Next, the decoding unit 204 obtains the SPC at the head of the decoding order (which is the same as the encoding order) within the GOS, obtains the encoded data of that leading SPC from its address position within the encoded three-dimensional data 211, and decodes the SPCs in order starting from the leading SPC (S204). The address position is stored in the meta information or the like.
In this way, the three-dimensional data decoding apparatus 200 decodes the decoded three-dimensional data 212. Specifically, the three-dimensional data decoding apparatus 200 generates decoded three-dimensional data 212 of a first processing unit (GOS) as a random access unit by decoding each of the encoded three-dimensional data 211 of the first processing unit (GOS) respectively corresponding to the three-dimensional coordinates. More specifically, the three-dimensional data decoding apparatus 200 decodes each of the plurality of second processing units (SPC) at each of the first processing units (GOS). The three-dimensional data decoding device 200 decodes each of the plurality of third processing units (VLM) in each of the second processing units (SPC).
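A matching sketch of the decoding flow (S201 to S204), paired with the encoder sketch above. The GOS to decode is selected here by its grid key rather than via the meta-information tables, the SPC-type selection of S203 is omitted, and `ast.literal_eval` simply undoes the placeholder encoding:

```python
import ast
from typing import List, Optional, Tuple

GosKey = Tuple[int, int, int]

def decode_gos(encoded: List[Tuple[GosKey, bytes]],
               target_key: GosKey) -> Optional[list]:
    # S201: the encoded data has been obtained by the caller.
    for key, payload in encoded:
        if key == target_key:            # S202: choose the GOS to decode
            # S204: decode from the leading SPC of the chosen GOS onward.
            return ast.literal_eval(payload.decode())
    return None

# Usage: decode_gos(encode_point_group(pts, 16.0), (0, 0, 0))
```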
The meta information for random access is described below. The meta information is generated by the three-dimensional data encoding device 100 and included in the encoded three-dimensional data 112 (211).
In conventional random access for two-dimensional moving pictures, decoding starts from the head frame of the random access unit near the specified time. In world space, however, random access to coordinates, objects, and the like is expected in addition to times.
Therefore, in order to realize random access to at least three elements, i.e., coordinates, objects, and time, a table is prepared in which index numbers of the respective elements and GOSs are associated. The index number of the GOS is associated with the address of the I-SPC that is the beginning of the GOS. Fig. 10 shows an example of a table included in meta information. In addition, it is not necessary to use all the tables shown in fig. 10, and at least one table may be used.
As an example, random access with a coordinate as the starting point is described below. To access the coordinates (x2, y2, z2), the coordinate-GOS table is first consulted, showing that the point with coordinates (x2, y2, z2) is included in the second GOS. Next, the GOS-address table is consulted; since the address of the leading I-SPC of the second GOS is addr(2), the decoding unit 204 obtains data from that address and starts decoding.
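The lookup just walked through might look like the following, with hypothetical table contents standing in for the meta information of fig. 10 (a real coordinate-GOS table would map spatial regions rather than exact points):

```python
# Hypothetical table contents, for illustration only.
coordinate_gos_table = {(1.0, 2.0, 3.0): 1, (4.0, 5.0, 6.0): 2}  # (x, y, z) -> GOS index
object_gos_table = {"traffic_light_17": 2}                       # object id -> GOS index
gos_address_table = {1: 0x0000, 2: 0x8A40}                       # GOS index -> leading I-SPC address

def start_address_for_coordinate(xyz):
    """Coordinate-based random access: find the GOS containing xyz, then
    return the address of its leading I-SPC, where decoding starts."""
    gos_index = coordinate_gos_table[xyz]
    return gos_address_table[gos_index]
```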
The address may be an address in a logical format or a physical address of the HDD or the memory. Instead of the address, information for specifying the file segment may be used. For example, a file segment is a unit obtained by segmenting one or more GOSs and the like.
In addition, when the object spans a plurality of GOSs, the GOS to which the plurality of objects belong may be shown in the object GOS table. If the plurality of GOSs are closed GOSs, the encoding device and the decoding device can perform encoding or decoding in parallel. In addition, if the plurality of GOSs are open GOSs, the plurality of GOSs can be referred to each other, whereby the compression efficiency can be further improved.
Examples of objects are people, animals, automobiles, bicycles, traffic lights, and buildings serving as landmarks. For example, when encoding the world space, the three-dimensional data encoding device 100 can extract feature points specific to an object from the three-dimensional point cloud data or the like, detect the object from those feature points, and set the detected object as a random access point.
In this way, the three-dimensional data encoding apparatus 100 generates first information showing a plurality of first processing units (GOSs) and three-dimensional coordinates corresponding to each of the plurality of first processing units (GOSs). And, the encoded three-dimensional data 112 (211) includes the first information. The first information further indicates at least one of an object, a time, and a data storage destination corresponding to each of the plurality of first processing units (GOS).
The three-dimensional data decoding device 200 obtains first information from the encoded three-dimensional data 211, determines the encoded three-dimensional data 211 of the first processing unit corresponding to the specified three-dimensional coordinates, object, or time using the first information, and decodes the encoded three-dimensional data 211.
Examples of other meta information are described below. In addition to the meta information for random access, the three-dimensional data encoding apparatus 100 may generate and store the following meta information. The three-dimensional data decoding device 200 may use the meta information at the time of decoding.
When three-dimensional data is used as map information, profiles may be defined according to the application, and information indicating the profile may be included in the meta information. For example, profiles for urban areas, for suburban areas, or for flying objects may be defined, each specifying the maximum or minimum size of the world space, the SPCs, or the VLMs. For urban use, more detailed information is required than for suburban use, so the minimum VLM size is set smaller.
The meta information may include a tag value indicating the kind of object. The tag value is associated with the VLM, SPC, or GOS constituting the object. Tag values may be set according to the kind of object; for example, the tag value "0" may indicate "person", "1" "car", and "2" "traffic light". Alternatively, when the kind of object is difficult to judge, or need not be judged, a tag value indicating a property such as size or whether the object is dynamic or static may be used.
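For instance, the tag assignment exemplified above could be held in a simple mapping; the values 0, 1, and 2 are from the text, while the lookup helper is illustrative:

```python
TAG_VALUES = {
    0: "person",
    1: "car",
    2: "traffic light",
}

def label_of(tag: int) -> str:
    # Unknown tags might instead carry a size or dynamic/static property.
    return TAG_VALUES.get(tag, "unknown")
```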
Also, the meta information may include information showing the range of the spatial region occupied by the world space.
Information such as the size of the SPCs or VXLs may be stored as header information shared by the whole stream of encoded data, or shared by a plurality of SPCs such as the SPCs within a GOS.
The meta information may include identification information such as a distance sensor and a camera used for generating the point cloud data, or information showing the positional accuracy of the point group in the point cloud data.
Also, the meta information may include information showing whether the world space is composed of only static objects or contains dynamic objects.
A modification of the present embodiment will be described below.
The encoding device or the decoding device may encode or decode 2 or more SPCs or GOSs different from each other in parallel. The GOS encoded or decoded in parallel can be determined based on meta information or the like showing the spatial position of the GOS.
In the case where three-dimensional data is used as a space map when a vehicle, a flying object, or the like is moving, or such a space map is generated, the encoding device or the decoding device may encode or decode GOS or SPC included in a space determined based on GPS, path information, a zoom magnification, or the like.
The decoding device may decode in order from the space nearest its own position or travel path. The encoding device or decoding device may lower the priority of encoding or decoding for spaces farther from its own position or travel path compared with nearer spaces. Here, lowering the priority means lowering the processing order, lowering the resolution (by thinning out the data), lowering the image quality (raising coding efficiency, for example by increasing the quantization step), and so on.
In addition, when decoding encoded data in which a space was encoded hierarchically, the decoding apparatus may decode only the lower layers.
The decoding device may perform decoding from the lower layer according to the zoom level or the application of the map.
In addition, in applications such as self-position estimation and object recognition performed during automatic travel of an automobile or robot, the encoding device or decoding device may perform encoding or decoding by reducing the resolution of an area other than an area (an area for recognition) within a predetermined height from a road surface.
The encoding device may encode the point cloud data representing the indoor and outdoor spatial shapes independently. For example, by separating a GOS representing indoors (indoor GOS) from a GOS representing outdoors (outdoor GOS), the decoding apparatus can select a GOS to be decoded in accordance with a viewpoint position when using encoded data.
The encoding device may arrange an indoor GOS and an outdoor GOS that are near each other adjacently in the encoded stream. For example, the encoding device associates the identifiers of the two and stores information showing this association in the encoded stream or in separately stored meta information. Accordingly, the decoding apparatus can refer to this information to identify the indoor GOS and the outdoor GOS that are near in coordinates.
The encoding device may switch the sizes of GOS and SPC between indoor GOS and outdoor GOS. For example, the encoding device sets the GOS to a smaller size in the indoor space than in the outdoor space. The encoding device may change the accuracy of extracting the feature points from the point cloud data, the accuracy of object detection, or the like between the indoor GOS and the outdoor GOS.
The encoding device may add, to the encoded data, information with which the decoding device can display dynamic objects distinctly from static objects. Accordingly, the decoding device can display a moving object together with a red frame, explanatory text, or the like. The decoding device may also display a red frame or text in place of the dynamic object itself. The decoding device may further indicate more detailed object categories; for example, a red frame may be used for a car and a yellow frame for a person.
The encoding device or decoding device may determine whether to encode or decode dynamic objects and static objects as different SPCs or GOSs according to the appearance frequency of dynamic objects, the ratio of static to dynamic objects, or the like. For example, when the appearance frequency or proportion of dynamic objects exceeds a threshold, SPCs or GOSs in which dynamic and static objects are mixed are permitted; when it does not exceed the threshold, they are not permitted.
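The threshold rule above could be expressed as follows; this is a sketch under the assumption of a single ratio threshold, whose value is illustrative:

```python
def allow_mixed_gos(dynamic_count: int, total_count: int,
                    ratio_threshold: float = 0.2) -> bool:
    """Decide whether dynamic and static objects may share an SPC/GOS.

    Mixing is permitted only when the appearance ratio of dynamic
    objects exceeds the threshold, mirroring the rule above.
    """
    if total_count == 0:
        return False
    return (dynamic_count / total_count) > ratio_threshold
```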
When a dynamic object is detected not from point cloud data but from two-dimensional image information of a camera, the encoding device may acquire information (frame, text, or the like) for identifying the detection result and the object position, respectively, and encode the information as a part of three-dimensional encoded data. In this case, the decoding apparatus superimposes and displays auxiliary information (frames or characters) indicating the dynamic object on the decoding result of the static object.
The encoding device may change the density of VXL or VLM according to the complexity of the shape of a static object. For example, the more complex the shape of the static object, the more densely the encoding device sets VXL or VLM. Further, the encoding device may determine the quantization step for quantizing spatial positions, color information, or the like according to the density of VXL or VLM; for example, the denser the VXL or VLM, the smaller the quantization step it sets.
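A hedged sketch of the density-to-quantization-step mapping follows; the particular formula and constants are assumptions, since the text only fixes the monotonic relationship (denser voxels, smaller step):

```python
def quantization_step(voxel_density: float,
                      base_step: float = 8.0,
                      min_step: float = 1.0) -> float:
    """Pick a quantization step from the local VXL/VLM density.

    The denser the voxels (the more complex the static shape), the
    smaller the step, so detail is preserved where shapes are complex.
    """
    return max(min_step, base_step / (1.0 + voxel_density))
```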
As described above, the encoding device or decoding device according to the present embodiment performs spatial encoding or decoding in spatial units having coordinate information.
The encoding device and the decoding device encode or decode in units of volume in space. The volume includes the smallest unit, i.e., voxel, corresponding to the position information.
The encoding device and the decoding device encode or decode by associating each element of spatial information, such as coordinates, objects, and time, with a GOS, or by associating each element with a table. The decoding device determines coordinates using the value of a selected element, determines a volume, voxel, or space from the coordinates, and decodes the space containing that volume or voxel, or the determined space.
The encoding device determines a volume, voxel, or space selectable by the element by feature point extraction or object recognition, and encodes the volume, voxel, or space as a volume, voxel, or space that can be randomly accessed.
Spaces are classified into three types: the I-SPC, which can be encoded or decoded independently; the P-SPC, which is encoded or decoded with reference to one processed space; and the B-SPC, which is encoded or decoded with reference to two processed spaces.
More than one volume corresponds to a static object or a dynamic object. The space containing the static objects and the space containing the dynamic objects are encoded or decoded as different GOSs from each other. That is, the SPC containing the static object and the SPC containing the dynamic object are assigned to different GOSs.
The dynamic object is encoded or decoded for each object, and corresponds to one or more spaces including only static objects. That is, the plurality of dynamic objects are encoded, and the resulting encoded data of the plurality of dynamic objects corresponds to the SPC including only the static object.
The encoding device and the decoding device increase the priority of the I-SPC in the GOS to perform encoding or decoding. For example, the encoding device encodes the original three-dimensional data so as to reduce degradation of the I-SPC (after decoding, the original three-dimensional data can be reproduced more faithfully). The decoding device decodes only I-SPC, for example.
The encoding device may encode the world space while changing the frequency of using I-SPCs according to the sparseness or the number (amount) of objects. That is, the encoding device changes the frequency of selecting I-SPCs according to the number or density of objects included in the three-dimensional data. For example, the denser the objects in the world space, the more frequently the encoding device uses I spaces.
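As a sketch, the I-SPC selection frequency could be derived from object density as follows; the interval formula and its constants are illustrative assumptions:

```python
def i_spc_interval(object_density: float, max_interval: int = 8) -> int:
    """Choose the spacing between I-SPCs within a GOS.

    A denser world space yields a shorter interval, i.e. a higher
    frequency of I-SPC use, as described above.
    """
    return max(1, int(max_interval / (1.0 + object_density)))
```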
The encoding device sets the random access point in units of GOS, and stores information indicating a spatial region corresponding to GOS in header information.
The encoding device adopts, for example, a default value as the spatial size of a GOS. The encoding device may change the size of a GOS according to the number (amount) or density of objects or dynamic objects. For example, the denser or more numerous the objects or dynamic objects, the smaller the encoding device sets the spatial size of the GOS.
A space or volume includes a feature point group derived from information obtained by a sensor such as a depth sensor, gyroscope, or camera. The coordinates of a feature point are set to the center position of a voxel, and higher voxel resolution yields higher accuracy of the positional information.
The feature point group is derived using a plurality of pictures. The plurality of pictures have at least two kinds of time information: the actual time information, and time information that is identical across the plurality of pictures corresponding to a space (for example, an encoding time used for rate control or the like).
The encoding or decoding is performed in units of GOS including one or more spaces.
The encoding device and the decoding device refer to the space in the processed GOS, and predict the P space or the B space in the processed GOS.
Alternatively, the encoding device and the decoding device predict the P space or the B space in the GOS of the processing target by using the processed space in the GOS of the processing target without referring to the different GOSs.
The encoding device and the decoding device transmit or receive the encoded stream in units of world space including one or more GOSs.
The GOS has a layer structure at least in one direction in the world space, and the encoding device and the decoding device perform encoding or decoding from the lower layer. For example, a GOS capable of random access belongs to the lowest layer. The GOSs belonging to the upper layer refers only to GOSs belonging to the same layer or layers below. That is, the GOS is spatially divided in a predetermined direction, and includes a plurality of layers each having one or more SPCs. The encoding device and the decoding device perform encoding or decoding for each SPC by referring to the SPC included in the layer that is the same layer as or lower than the SPC.
The encoding device and the decoding device successively encode or decode GOSs in a world space unit including a plurality of GOSs. The encoding device and the decoding device write or read information showing the order (direction) of encoding or decoding as metadata. That is, the encoded data includes information showing the encoding order of the plurality of GOSs.
The encoding device and the decoding device encode or decode two or more different spaces or GOSs in parallel.
The encoding device and the decoding device encode or decode spatial information (coordinates, size, etc.) of the space or GOS.
The encoding device and the decoding device encode or decode a space or GOS included in a specific space specified based on external information such as GPS, route information, and magnification, which is related to the position and/or the area size of the device.
The encoding device or decoding device performs encoding or decoding by making the priority of the space far from the own position lower than the space near to the own position.
The encoding device sets a direction in world space according to magnification or use, and encodes GOS having a layer structure in the direction. The decoding device performs decoding preferentially from the lower layer for GOS having a layer structure in one direction of the world space, which is set according to the magnification or the use.
The encoding device changes the accuracy of feature point extraction, the accuracy of object recognition, the size of spatial regions, and the like between indoor and outdoor spaces. The encoding device and the decoding device encode or decode an indoor GOS and an outdoor GOS that are near in coordinates in the world space in association with each other, and encode or decode their identifiers in correspondence with each other.
(embodiment 2)
When using encoded data of point cloud data for an actual device or service, it is desirable to transmit and receive necessary information according to the purpose in order to suppress network bandwidth. However, since such a function does not exist in the conventional three-dimensional data encoding structure, there is no encoding method therefor.
In the present embodiment, a three-dimensional data encoding method and a three-dimensional data encoding device for providing a function of transmitting and receiving necessary information according to the application among three-dimensional encoded data of point cloud data, and a three-dimensional data decoding method and a three-dimensional data decoding device for decoding the encoded data will be described.
Voxels (VXL) having a feature amount equal to or greater than a given threshold are defined as feature voxels (FVXL), and a world space (WLD) constituted by FVXL is defined as a sparse world space (SWLD). Fig. 11 shows a configuration example of a sparse world space and a world space. A GOS constituted by FVXL is defined as an FGOS, an SPC constituted by FVXL as an FSPC, and a VLM constituted by FVXL as an FVLM. The data structures and prediction structures of FGOS, FSPC, and FVLM may be the same as those of GOS, SPC, and VLM.
The feature amount expresses the three-dimensional position information of a VXL or the visible-light information at the VXL position, and is in particular a feature amount with which many features are detected at corners, edges, and the like of solid objects. Specifically, it is one of the three-dimensional feature amounts or visible-light feature amounts described below, but any feature amount indicating the position, luminance, color information, or the like of a VXL may be used.
As the three-dimensional feature quantity, a SHOT feature quantity (Signature of Histograms of OrienTations: azimuth histogram feature), a PFH feature quantity (Point Feature Histograms: point feature histogram), or a PPF feature quantity (Point Pair Feature: point-to-point feature) is employed.
The SHOT feature is obtained by dividing the region around a VXL, calculating the inner products between a reference point and the normal vectors of the divided regions, and forming a histogram. It is characterized by high dimensionality and high descriptive power.
The PFH feature is obtained by selecting a plurality of point pairs near the VXL, calculating normal vectors and the like from each pair, and forming a histogram. Being a histogram feature, it is robust against a certain amount of disturbance and has high descriptive power.
The PPF feature is calculated for each pair of VXLs using normal vectors and the like. Since all VXLs are used, the PPF feature is robust against occlusion.
As the feature quantity of the visible light, SIFT (Scale-Invariant Feature Transform: scale invariant feature transform), SURF (Speeded Up Robust Features: acceleration robust feature), HOG (Histogram of Oriented Gradients: directional gradient histogram), or the like using information such as luminance gradient information of an image can be used.
SWLD is generated by calculating the above feature values from VXLs of WLD and extracting FVXL. Here, the SWLD may be updated every time the WLD is updated, or may be updated periodically after a predetermined time has elapsed, regardless of the update timing of the WLD.
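A minimal sketch of this extraction step follows; the feature computation is abstracted behind a callable, since the text allows SHOT, PFH, PPF, or visible-light features, and the representation of voxels as a coordinate-keyed dictionary is an assumption:

```python
from typing import Callable, Dict, Tuple

Coord = Tuple[int, int, int]

def extract_swld(wld: Dict[Coord, dict],
                 feature_of: Callable[[Coord, dict], float],
                 threshold: float) -> Dict[Coord, dict]:
    """Build a SWLD by keeping only the voxels whose feature amount is
    at or above the threshold; these become the FVXLs.

    The computed value is stored with each FVXL, since the text allows
    the feature amount to be held as per-voxel information.
    """
    swld: Dict[Coord, dict] = {}
    for coord, vxl in wld.items():
        feature = feature_of(coord, vxl)
        if feature >= threshold:
            fvxl = dict(vxl)
            fvxl["feature"] = feature
            swld[coord] = fvxl
    return swld
```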
The SWLD may be generated per feature quantity. For example, as shown by SWLD1 based on the SHOT feature quantity and SWLD2 based on the SIFT feature quantity, SWLD may be generated separately for each feature quantity, and used differently according to purposes. The calculated feature values of the FVXLs may be held in the FVXLs as feature value information.
Next, a method of using the sparse world Space (SWLD) will be described. Since SWLD contains only Feature Voxels (FVXL), the data size is generally smaller compared to WLD that includes all VXL.
In an application in which a certain object is achieved by using the feature quantity, by using the information of the SWLD instead of the WLD, it is possible to suppress the read time from the hard disk, and to suppress the frequency band and the transmission time at the time of network transmission. For example, by holding WLD and SWLD in advance as map information to a server and switching the map information to be transmitted to WLD or SWLD according to a demand from a client, network bandwidth and transmission time can be suppressed. Specific examples are shown below.
Fig. 12 and 13 show use examples of the SWLD and WLD. As shown in fig. 12, when the client 1, an in-vehicle apparatus, needs map information for its own position determination, the client 1 transmits a request for map data for self-position estimation to the server (S301). The server transmits the SWLD to the client 1 in response (S302). The client 1 performs its own position determination using the received SWLD (S303). At this time, the client 1 acquires VXL information around itself by various methods, such as a distance sensor like a rangefinder, a stereo camera, or a combination of a plurality of monocular cameras, and estimates its own position information from the acquired VXL information and the SWLD. Here, the own position information includes the three-dimensional position information, orientation, and the like of the client 1.
As shown in fig. 13, when the client 2, an in-vehicle apparatus, needs map information for map drawing such as a three-dimensional map, the client 2 transmits a request for map data for map drawing to the server (S311). The server transmits the WLD to the client 2 in response (S312). The client 2 performs map drawing using the received WLD (S313). At this time, the client 2 creates a rendering image using, for example, an image it captured with a visible-light camera or the like together with the WLD acquired from the server, and draws the created image on the screen of a car navigation system or the like.
As described above, the server transmits the SWLD to the client in the application where the characteristic amount of each VXL is mainly required for the self-position estimation, and transmits the WLD to the client when detailed VXL information is required like the map drawing. Accordingly, map data can be efficiently transmitted and received.
In addition, the client can determine which of the SWLD and WLD is required by itself and request transmission of the SWLD or WLD from the server. The server may determine which SWLD or WLD should be transmitted according to the status of the client or the network.
Next, a method of switching transmission and reception of a sparse world Space (SWLD) and a world space (WLD) will be described.
The reception of WLD or SWLD may be switched according to the network bandwidth. Fig. 14 shows an example of the operation in this case. For example, when a low-speed network with limited usable bandwidth, such as an LTE (Long Term Evolution) environment, is used, the client accesses the server via the low-speed network (S321) and acquires the SWLD as map information from the server (S322). When a high-speed network with spare bandwidth, such as a WiFi environment, is used, the client accesses the server via the high-speed network (S323) and acquires the WLD from the server (S324). Accordingly, the client can acquire map information appropriate to its network bandwidth.
Specifically, the client receives the SWLD via LTE outdoors, and when entering into a room such as a facility, acquires the WLD via WiFi. Accordingly, the client can acquire more detailed map information in the room.
In this way, the client can request WLD or SWLD from the server according to the bandwidth of the network it uses. Alternatively, the client may transmit information showing the bandwidth of its network to the server, and the server transmits data appropriate to that bandwidth (WLD or SWLD) to the client. Alternatively, the server may itself judge the network bandwidth of the client and transmit the appropriate data (WLD or SWLD).
The reception of WLD or SWLD may be switched according to the moving speed. Fig. 15 shows an example of the operation in this case. For example, when the client moves at high speed (S331), the client receives the SWLD from the server (S332). When the client moves at low speed (S333), the client receives the WLD from the server (S334). Accordingly, the client can obtain map information matched to its speed while suppressing network bandwidth. Specifically, while driving on a highway, the client receives the SWLD, which has a small data amount, and can thereby update the map information at an appropriate rate. When driving on an ordinary road, the client receives the WLD and can acquire more detailed map information.
In this way, the client can request WLD or SWLD from the server according to its own moving speed. Alternatively, the client may send information showing the speed of movement itself to the server, which sends appropriate data (WLD or SWLD) to the client in accordance with the information. Alternatively, the server may determine the movement speed of the client and send appropriate data (WLD or SWLD) to the client.
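Combining the two switching criteria above (network bandwidth and moving speed), a client-side selection rule might look like the following sketch; both threshold values are illustrative assumptions:

```python
def select_map_data(bandwidth_mbps: float, speed_kmh: float,
                    bw_threshold: float = 50.0,
                    speed_threshold: float = 80.0) -> str:
    """Request the full WLD only when the network has spare bandwidth
    (e.g. WiFi) and the client is moving slowly (e.g. ordinary roads);
    otherwise request the compact SWLD to stay within bandwidth."""
    if bandwidth_mbps >= bw_threshold and speed_kmh < speed_threshold:
        return "WLD"
    return "SWLD"
```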
The client may acquire the SWLD from the server and then acquire the WLD of important regions. For example, when acquiring map data, the client first acquires rough map information as a SWLD, narrows it down to regions where features such as buildings, signs, or people appear frequently, and then acquires the WLD of the narrowed-down regions. Accordingly, the client can acquire detailed information of required regions while suppressing the amount of data received from the server.
The server may create separate SWLDs for individual objects from the WLD, and the client may receive the SWLD matching its application. Accordingly, the network bandwidth can be suppressed. For example, the server recognizes persons or cars in the WLD in advance and creates a SWLD of persons and a SWLD of cars. The client receives the SWLD of persons when it wants information on surrounding persons, and the SWLD of cars when it wants information on cars. The type of SWLD may be distinguished by information (a flag, a type value, or the like) attached to the header.
Next, the configuration of the three-dimensional data encoding device (e.g., server) and the flow of operations according to the present embodiment will be described. Fig. 16 is a block diagram of a three-dimensional data encoding device 400 according to the present embodiment. Fig. 17 is a flowchart of the three-dimensional data encoding process performed by the three-dimensional data encoding device 400.
The three-dimensional data encoding device 400 shown in fig. 16 encodes the input three-dimensional data 411 to generate encoded three-dimensional data 413 and 414 as an encoded stream. Here, encoded three-dimensional data 413 is encoded three-dimensional data corresponding to WLD, and encoded three-dimensional data 414 is encoded three-dimensional data corresponding to SWLD. The three-dimensional data encoding device 400 includes: an obtaining unit 401, an encoding region determining unit 402, a SWLD extraction unit 403, a WLD encoding unit 404, and a SWLD encoding unit 405.
As shown in fig. 17, first, the obtaining section 401 obtains input three-dimensional data 411 as point group data in a three-dimensional space (S401).
Next, the encoding region determining unit 402 determines a spatial region to be encoded based on the spatial region in which the point group data exists (S402).
Next, SWLD extraction unit 403 defines a spatial region to be encoded as WLD, and calculates a feature amount from VXLs included in WLD. Then, the SWLD extraction unit 403 extracts VXL having a feature value equal to or greater than a predetermined threshold value, defines the extracted VXL as FVXL, and adds the FVXL to the SWLD to generate extracted three-dimensional data 412 (S403). That is, the extracted three-dimensional data 412 having a feature amount equal to or greater than the threshold value is extracted from the input three-dimensional data 411.
Next, the WLD encoding unit 404 encodes the input three-dimensional data 411 corresponding to WLD, thereby generating encoded three-dimensional data 413 corresponding to WLD (S404). At this time, WLD encoding unit 404 adds information for distinguishing that encoded three-dimensional data 413 is a stream including WLD to the header of encoded three-dimensional data 413.
Then, the SWLD encoding unit 405 encodes the extracted three-dimensional data 412 corresponding to the SWLD, thereby generating encoded three-dimensional data 414 corresponding to the SWLD (S405). At this time, the SWLD encoding section 405 adds information for distinguishing that the encoded three-dimensional data 414 is a stream including SWLD to the header of the encoded three-dimensional data 414.
The processing order of the processing for generating the encoded three-dimensional data 413 and the processing for generating the encoded three-dimensional data 414 may be reversed from the above. Also, some or all of the above-described processing may be executed in parallel.
The information given to the headers of the encoded three-dimensional data 413 and 414 is defined, for example, as a parameter such as "world_type". world_type=0 indicates that the stream contains the WLD, and world_type=1 indicates that the stream contains the SWLD. When further categories are defined, additional values may be assigned, such as world_type=2. Alternatively, a specific flag may be included in one of the encoded three-dimensional data 413 and 414. For example, the encoded three-dimensional data 414 may be given a flag indicating that the stream contains the SWLD. In this case, the decoding apparatus can judge whether the stream contains the WLD or the SWLD from the presence or absence of the flag.
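A hedged sketch of writing and reading the world_type parameter follows; the one-byte layout is an assumption, since the text only specifies the parameter values 0 and 1:

```python
WORLD_TYPE_WLD = 0   # stream contains WLD
WORLD_TYPE_SWLD = 1  # stream contains SWLD

def write_world_type(world_type: int) -> bytes:
    """Encode the world_type parameter as a one-byte header field."""
    if world_type not in (WORLD_TYPE_WLD, WORLD_TYPE_SWLD):
        raise ValueError("undefined world_type value")
    return bytes([world_type])

def read_world_type(header: bytes) -> int:
    """Read the world_type back from the start of a header."""
    return header[0]
```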
The encoding method used by WLD encoding unit 404 for encoding WLD may be different from the encoding method used by SWLD encoding unit 405 for encoding SWLD.
For example, since the SWLD data is decimated, its correlation with surrounding data may be lower than in the WLD. Therefore, in the encoding method used for the SWLD, inter prediction is given higher priority, from among intra prediction and inter prediction, than in the encoding method used for the WLD.
In addition, the encoding method for the SWLD and the encoding method for the WLD may differ in how three-dimensional positions are represented. For example, the three-dimensional position of an FVXL may be expressed by three-dimensional coordinates in the SWLD, while in the WLD the three-dimensional position may be expressed by the octree described later, or vice versa.
The SWLD encoding unit 405 encodes the extracted three-dimensional data 412 so that the data size of the encoded three-dimensional data 414 of the SWLD is smaller than the data size of the encoded three-dimensional data 413 of the WLD. As described above, the SWLD may have lower correlation between data than the WLD; accordingly, the encoding efficiency may drop, and the data size of the encoded three-dimensional data 414 may turn out larger than that of the encoded three-dimensional data 413 of the WLD. Therefore, when the data size of the obtained encoded three-dimensional data 414 is larger than the data size of the encoded three-dimensional data 413 of the WLD, the SWLD encoding unit 405 re-encodes the data to regenerate encoded three-dimensional data 414 with a reduced data size.
For example, the SWLD extraction unit 403 regenerates extracted three-dimensional data 412 with a reduced number of extracted feature points, and the SWLD encoding unit 405 encodes it again. Alternatively, the quantization in the SWLD encoding unit 405 may be coarsened; for example, in the octree structure described later, the quantization can be coarsened by rounding off the data at the lowest layer.
When the data size of the encoded three-dimensional data 414 of the SWLD cannot be made smaller than the data size of the encoded three-dimensional data 413 of the WLD, the SWLD encoding unit 405 may refrain from generating the encoded three-dimensional data 414 of the SWLD. Alternatively, the encoded three-dimensional data 413 of the WLD may be copied to the encoded three-dimensional data 414 of the SWLD; that is, the encoded three-dimensional data 413 of the WLD may be used directly as the encoded three-dimensional data 414 of the SWLD.
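The size-bounded re-encoding loop described above might be sketched as follows; the callables standing in for the SWLD encoding unit 405 and the SWLD extraction unit 403, and the retry limit, are assumptions:

```python
def encode_swld_bounded(extracted, wld_size: int, encode, reduce_points,
                        max_retries: int = 4):
    """Re-encode until the SWLD stream is smaller than the WLD stream.

    `encode` stands in for the SWLD encoding unit 405, and
    `reduce_points` for re-running the SWLD extraction unit 403 with
    fewer feature points. Returns None when the size cannot be
    reduced, in which case the WLD data may be reused directly.
    """
    data = encode(extracted)
    for _ in range(max_retries):
        if len(data) < wld_size:
            return data
        extracted = reduce_points(extracted)  # fewer extracted FVXLs
        data = encode(extracted)
    return data if len(data) < wld_size else None
```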
Next, the configuration of the three-dimensional data decoding device (e.g., client) and the flow of operations according to the present embodiment will be described. Fig. 18 is a block diagram of a three-dimensional data decoding device 500 according to the present embodiment. Fig. 19 is a flowchart of the three-dimensional data decoding process performed by the three-dimensional data decoding device 500.
The three-dimensional data decoding apparatus 500 shown in fig. 18 generates decoded three-dimensional data 512 or 513 by decoding the encoded three-dimensional data 511. Here, the encoded three-dimensional data 511 is, for example, the encoded three-dimensional data 413 or 414 generated by the three-dimensional data encoding device 400.
The three-dimensional data decoding device 500 includes: an obtaining unit 501, a header analyzing unit 502, a WLD decoding unit 503, and a SWLD decoding unit 504.
As shown in fig. 19, first, the obtaining section 501 obtains the encoded three-dimensional data 511 (S501). Next, the header analysis unit 502 analyzes the header of the encoded three-dimensional data 511 and judges whether the encoded three-dimensional data 511 is a stream containing the WLD or a stream containing the SWLD (S502). For example, the judgment is made with reference to the world_type parameter described above.
If the encoded three-dimensional data 511 is a stream containing the WLD (yes in S503), the WLD decoding unit 503 decodes the encoded three-dimensional data 511 to generate decoded three-dimensional data 512 of the WLD (S504). If the encoded three-dimensional data 511 is a stream containing the SWLD (no in S503), the SWLD decoding unit 504 decodes the encoded three-dimensional data 511 to generate decoded three-dimensional data 513 of the SWLD (S505).
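As a sketch of steps S502 to S505, the header-based dispatch could look like the following; the one-byte header layout and the decoder callables are assumptions:

```python
def dispatch_decode(encoded: bytes, wld_decode, swld_decode):
    """Inspect the (assumed one-byte) world_type header, then route
    the payload to the WLD decoding unit 503 or the SWLD decoding
    unit 504, which are passed in as callables here."""
    world_type, payload = encoded[0], encoded[1:]
    return wld_decode(payload) if world_type == 0 else swld_decode(payload)
```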
As in the encoding device, the decoding method used by the WLD decoding unit 503 to decode the WLD may differ from the decoding method used by the SWLD decoding unit 504 to decode the SWLD. For example, in the decoding method for the SWLD, inter prediction may be given higher priority, from among intra prediction and inter prediction, than in the decoding method for the WLD.
Also, in the decoding method for SWLD and the decoding method for WLD, the expression method of the three-dimensional position may be different. For example, in SWLD, the three-dimensional position of FVXL can be expressed by three-dimensional coordinates, in WLD, the three-dimensional position can be expressed by octree described later, and vice versa.
Next, octree representation, a method of representing three-dimensional positions, will be described. VXL data contained in the three-dimensional data is converted into an octree structure and then encoded. Fig. 20 shows an example of VXLs of a WLD, and fig. 21 shows the octree structure of the WLD shown in fig. 20. In the example shown in fig. 20, there are three VXLs containing point groups (hereinafter, valid VXLs): VXL1 to VXL3. As shown in fig. 21, the octree structure is composed of nodes and leaves. Each node has a maximum of eight nodes or leaves as children, and each leaf holds VXL information. Among the leaves shown in fig. 21, leaves 1, 2, and 3 represent VXL1, VXL2, and VXL3 shown in fig. 20, respectively.
Specifically, each node and leaf corresponds to a three-dimensional position. Node 1 corresponds to all the blocks shown in fig. 20. The block corresponding to the node 1 is divided into 8 blocks, and among the 8 blocks, the block including the valid VXL is set as the node, and the other blocks are set as the leaves. The blocks corresponding to the nodes are further divided into 8 nodes or leaves, and this process is repeated as many times as the number of levels in the tree structure. And, the lowest-level blocks are all set as leaves.
Fig. 22 shows an example of SWLD generated from WLD shown in fig. 20. The results of feature quantity extraction of VXL1 and VXL2 shown in fig. 20 are determined as FVXL1 and FVXL2, and added to SWLD. In addition, VXL3 is not judged to be FVXL and is therefore not included in SWLD. Fig. 23 shows an octree structure of the SWLD shown in fig. 22. In the octree structure shown in fig. 23, leaf 3 corresponding to VXL3 shown in fig. 21 is deleted. Accordingly, node 3 shown in fig. 21 has no valid VXL and is changed to a leaf. Thus, in general, SWLD has fewer leaves than WLD, and SWLD has smaller encoded three-dimensional data than WLD.
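A minimal sketch of converting a set of valid VXLs into the node/leaf octree described above follows, assuming a cubic region whose side length is a power of two; applying it to the SWLD's smaller voxel set prunes leaves exactly as in figs. 21 to 23:

```python
def build_octree(voxels, origin=(0, 0, 0), size=8):
    """Recursively split a cubic block into eight children: blocks
    containing a valid VXL become nodes, empty blocks become empty
    leaves (None), and unit blocks become leaves holding the VXL.
    """
    inside = [v for v in voxels
              if all(origin[i] <= v[i] < origin[i] + size for i in range(3))]
    if not inside:
        return None                      # leaf without a valid VXL
    if size == 1:
        return ("leaf", origin)          # leaf carrying VXL information
    half = size // 2
    children = [
        build_octree(inside,
                     (origin[0] + dx, origin[1] + dy, origin[2] + dz),
                     half)
        for dx in (0, half) for dy in (0, half) for dz in (0, half)
    ]
    return ("node", children)
```

For example, dropping VXL3 from the input voxel set before calling build_octree yields a tree in which the subtree for leaf 3 collapses, mirroring how node 3 becomes a leaf in fig. 23.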
A modification of the present embodiment will be described below.
For example, when a client such as an in-vehicle device performs its own position estimation, it may receive the SWLD from the server, perform self-position estimation using the SWLD, and perform obstacle detection based on three-dimensional information of its own surroundings acquired by various methods, such as a distance sensor like a rangefinder, a stereo camera, or a combination of a plurality of monocular cameras.
In general, the SWLD is unlikely to contain VXL data for flat regions. For this reason, the server may hold a downsampled world space (subWLD), obtained by downsampling the WLD, for the detection of static obstacles, and may send the SWLD and the subWLD to the client. Accordingly, the client can perform self-position estimation and obstacle detection while suppressing network bandwidth.
In addition, when the client rapidly draws three-dimensional map data, it is convenient if the map information has a mesh structure. The server may therefore generate meshes from the WLD and hold them in advance as a mesh world space (MWLD). For example, the client receives the MWLD when it needs rough three-dimensional rendering and receives the WLD when it needs detailed three-dimensional rendering. Accordingly, the network bandwidth can be suppressed.
Further, although the server sets VXLs having a feature amount equal to or greater than the threshold as FVXLs among the VXLs, FVXLs may also be determined by a different method. For example, the server may judge that VXLs, VLMs, SPCs, or GOSs constituting a traffic signal, an intersection, or the like are required for self-position estimation, driving assistance, automatic driving, and so on, and include them in the SWLD as FVXL, FVLM, FSPC, or FGOS. This judgment may also be made manually. FVXLs obtained by such methods may be added to the FVXLs set based on the feature amount. That is, the SWLD extraction unit 403 may further extract, from the input three-dimensional data 411, data corresponding to objects having a predetermined attribute as the extracted three-dimensional data 412.
A label, separate from the feature amount, may be attached to indicate that the data is required for such purposes. The server may hold FVXLs required for self-position estimation, driving assistance, automatic driving, or the like, such as traffic signals or intersections, as an upper layer of the SWLD (for example, a lane world space).
The server may also attach attributes to the VXLs in the WLD in random access units or in predetermined units. The attributes include, for example, information showing whether a VXL is required for self-position estimation, or whether it is important as traffic information such as a signal or an intersection. An attribute may also include a correspondence with a Feature (an intersection, a road, or the like) in lane information such as GDF (Geographic Data Files).
Further, as a method for updating WLD or SWLD, the following method can be adopted.
Update information showing changes of people, construction sites, street trees (for trucks), or the like is uploaded to the server as point groups or metadata. The server updates the WLD based on the upload, and then updates the SWLD using the updated WLD.
When the client detects, during self-position estimation, a mismatch between the three-dimensional information it generated itself and the three-dimensional information received from the server, the client may transmit the three-dimensional information it generated to the server together with an update notification. In this case, the server updates the SWLD using the WLD. If the SWLD is not updated, the server judges that the WLD itself is old.
Further, while information distinguishing WLD from SWLD is added as header information of the encoded stream, when there are many kinds of world spaces, such as a mesh world space or a lane world space, information distinguishing them may likewise be added to the header information. When there are a plurality of SWLDs with different feature amounts, information distinguishing them from each other may also be added to the header information.
Although the SWLD is constituted by FVXLs, it may also include VXLs that were not judged to be FVXLs. For example, the SWLD may include the adjacent VXLs used in calculating the feature amounts of the FVXLs. Accordingly, even when feature amount information is not attached to each FVXL of the SWLD, the client can calculate the feature amounts of the FVXLs on receiving the SWLD. In this case, the SWLD may include information distinguishing whether each voxel is an FVXL or an ordinary VXL.
As described above, the three-dimensional data encoding device 400 extracts the extracted three-dimensional data 412 (the 2 nd three-dimensional data) having the feature amount equal to or larger than the threshold value from the input three-dimensional data 411 (the 1 st three-dimensional data), and encodes the extracted three-dimensional data 412 to generate the encoded three-dimensional data 414 (the 1 st encoded three-dimensional data).
Accordingly, the three-dimensional data encoding device 400 generates encoded three-dimensional data 414 obtained by encoding data having a feature value equal to or greater than the threshold value. In this way, the amount of data can be reduced as compared with the case where the input three-dimensional data 411 is directly encoded. Therefore, the three-dimensional data encoding apparatus 400 can reduce the amount of data at the time of transmission.
The three-dimensional data encoding device 400 further encodes the input three-dimensional data 411 to generate encoded three-dimensional data 413 (2 nd encoded three-dimensional data).
Accordingly, the three-dimensional data encoding device 400 can selectively transmit the encoded three-dimensional data 413 and the encoded three-dimensional data 414, for example, according to the use application or the like.
The extracted three-dimensional data 412 is encoded by the 1 st encoding method, and the input three-dimensional data 411 is encoded by a 2 nd encoding method different from the 1 st encoding method.
Accordingly, the three-dimensional data encoding device 400 can employ an appropriate encoding method for each of the input three-dimensional data 411 and the extracted three-dimensional data 412.
In the 1 st encoding method, inter prediction, from among intra prediction and inter prediction, is given higher priority than in the 2 nd encoding method.
Accordingly, for the extracted three-dimensional data 412, in which the correlation between adjacent data tends to be low, the three-dimensional data encoding device 400 can raise the priority of inter prediction.
The 1 st encoding method and the 2 nd encoding method are different from each other in terms of the representation of the three-dimensional position. For example, in the 2 nd encoding method, the three-dimensional position is represented by octree, and in the 1 st encoding method, the three-dimensional position is represented by three-dimensional coordinates.
Accordingly, the three-dimensional data encoding device 400 can employ a more appropriate three-dimensional position representation technique for three-dimensional data having different data numbers (the number of VXL or FVXL).
At least one of the encoded three-dimensional data 413 and 414 includes an identifier indicating whether the encoded three-dimensional data is obtained by encoding the input three-dimensional data 411 or by encoding a part of the input three-dimensional data 411. That is, the identifier shows whether the encoded three-dimensional data is the encoded three-dimensional data 413 of WLD or the encoded three-dimensional data 414 of SWLD.
Accordingly, the decoding apparatus can easily determine whether the acquired encoded three-dimensional data is the encoded three-dimensional data 413 or the encoded three-dimensional data 414.
The three-dimensional data encoding device 400 encodes the extracted three-dimensional data 412 so that the amount of data to be encoded of the three-dimensional data 414 is smaller than the amount of data to be encoded of the three-dimensional data 413.
Accordingly, the three-dimensional data encoding device 400 can reduce the amount of data of the encoded three-dimensional data 414 compared with the amount of data of the encoded three-dimensional data 413.
The three-dimensional data encoding device 400 extracts data corresponding to an object having a predetermined attribute from the input three-dimensional data 411 as extracted three-dimensional data 412. For example, an object having a predetermined attribute is an object required for self-position estimation, driving assistance, automatic driving, or the like, and is a signal, an intersection, or the like.
Accordingly, the three-dimensional data encoding device 400 can generate encoded three-dimensional data 414 including data required by the decoding device.
The three-dimensional data encoding device 400 (server) further transmits one of the encoded three-dimensional data 413 and 414 to the client according to the state of the client.
Accordingly, the three-dimensional data encoding apparatus 400 can transmit appropriate data according to the state of the client.
And, the state of the client includes a communication condition (e.g., network bandwidth) of the client or a moving speed of the client.
The three-dimensional data encoding device 400 further transmits one of the encoded three-dimensional data 413 and 414 to the client according to the request of the client.
Accordingly, the three-dimensional data encoding apparatus 400 can transmit appropriate data according to a request from a client.
The three-dimensional data decoding device 500 according to the present embodiment decodes the encoded three-dimensional data 413 or 414 generated by the three-dimensional data encoding device 400.
That is, the three-dimensional data decoding device 500 decodes the encoded three-dimensional data 414 obtained by encoding the extracted three-dimensional data 412 having the feature amount extracted from the input three-dimensional data 411 equal to or larger than the threshold value by the 1 st decoding method. The three-dimensional data decoding device 500 decodes the encoded three-dimensional data 413 obtained by encoding the input three-dimensional data 411 by a 2 nd decoding method different from the 1 st decoding method.
Accordingly, the three-dimensional data decoding device 500 selectively receives the encoded three-dimensional data 414 and the encoded three-dimensional data 413, which are obtained by encoding data having a feature value equal to or greater than the threshold value, for example, according to the use application or the like. Accordingly, the three-dimensional data decoding apparatus 500 can reduce the amount of data at the time of transmission. The three-dimensional data decoding device 500 can employ an appropriate decoding method for each of the input three-dimensional data 411 and the extracted three-dimensional data 412.
In the 1 st decoding method, inter prediction, from among intra prediction and inter prediction, is given higher priority than in the 2 nd decoding method.
Accordingly, the three-dimensional data decoding apparatus 500 can increase the priority of inter-frame prediction for the extracted three-dimensional data in which the correlation between adjacent data is likely to be low.
The 1 st decoding method and the 2 nd decoding method are different from each other in terms of representation of three-dimensional positions. For example, the three-dimensional position is represented by octree in the 2 nd decoding method, and the three-dimensional position is represented by three-dimensional coordinates in the 1 st decoding method.
Accordingly, the three-dimensional data decoding device 500 can employ a more appropriate three-dimensional position representation technique for three-dimensional data having different data numbers (the number of VXL or FVXL).
At least one of the encoded three-dimensional data 413 and 414 includes an identifier indicating whether the encoded three-dimensional data is obtained by encoding the input three-dimensional data 411 or by encoding a part of the input three-dimensional data 411. The three-dimensional data decoding apparatus 500 refers to the identifier to identify the encoded three-dimensional data 413 and 414.
Accordingly, the three-dimensional data decoding apparatus 500 can easily determine whether the obtained encoded three-dimensional data is the encoded three-dimensional data 413 or the encoded three-dimensional data 414.
The three-dimensional data decoding device 500 further notifies the server of the state of the client (the three-dimensional data decoding device 500). The three-dimensional data decoding device 500 receives one of the encoded three-dimensional data 413 and 414 transmitted from the server in accordance with the state of the client.
Accordingly, the three-dimensional data decoding apparatus 500 can receive appropriate data according to the state of the client.
And, the state of the client includes a communication condition (e.g., network bandwidth) of the client or a moving speed of the client.
The three-dimensional data decoding device 500 further requests one of the encoded three-dimensional data 413 and 414 from the server, and receives the one of the encoded three-dimensional data 413 and 414 transmitted from the server in response to the request.
Accordingly, the three-dimensional data decoding device 500 can receive appropriate data according to the application.
Although the three-dimensional data encoding device and the three-dimensional data decoding device according to the embodiments of the present application have been described above, the present application is not limited to these embodiments.
The processing units included in the three-dimensional data encoding device or the three-dimensional data decoding device according to the above embodiments are typically realized as LSIs, which are integrated circuits. These may be individually formed into one chip, or some or all of them may be integrated into one chip.
The integration is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor. After LSI fabrication, a programmable FPGA (Field Programmable Gate Array: field programmable gate array) or a reconfigurable processor capable of reconfiguring connection or setting of circuit cells inside the LSI may be used.
In the above embodiments, each component may be configured by dedicated hardware, or may be implemented by executing a software program suitable for each component. Each component is realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.
Further, the present application may be implemented as a three-dimensional data encoding method or a three-dimensional data decoding method performed by a three-dimensional data encoding apparatus or a three-dimensional data decoding apparatus.
In addition, the division of the functional blocks in the block diagrams is an example, and a plurality of functional blocks may be realized as one functional block, or one functional block may be divided into a plurality of functional blocks, or a part of the functions may be transferred to other functional blocks. Also, functions of a plurality of functional blocks having similar functions may be processed in parallel by a single hardware or software or time-division.
The order in which the steps in the flowchart are executed is an example listed for the purpose of specifically explaining the present application, and may be other than the above. Further, some of the steps described above may be performed simultaneously (in parallel) with other steps.
The three-dimensional data encoding device and the three-dimensional data decoding device according to one or more aspects have been described above based on the embodiments, but the present application is not limited to these embodiments. As long as they do not depart from the spirit of the present application, forms obtained by applying various modifications conceivable to those skilled in the art to the embodiments, and forms constructed by combining constituent elements of different embodiments, are also included within the scope of one or more aspects.
Industrial applicability
The present application is applicable to a three-dimensional data encoding device and a three-dimensional data decoding device.
Symbol description
100, 400 three-dimensional data encoding device
101, 201, 401, 501 obtaining unit
102, 402 encoding region determining unit
103 dividing unit
104 encoding unit
111 three-dimensional data
112, 211, 413, 414, 511 encoded three-dimensional data
200, 500 three-dimensional data decoding device
202 decoding start GOS determining unit
203 decoding SPC determining unit
204 decoding unit
212, 512, 513 decoded three-dimensional data
403 SWLD extraction unit
404 WLD encoding unit
405 SWLD encoding unit
411 input three-dimensional data
412 extracted three-dimensional data
502 header analysis unit
503 WLD decoding unit
504 SWLD decoding unit

Claims (18)

1. A method of three-dimensional data encoding, comprising: an extraction step of extracting 2 nd three-dimensional data having a feature value equal to or greater than a threshold value from the 1 st three-dimensional data; a 1 st encoding step of generating 1 st encoded three-dimensional data by encoding the 2 nd three-dimensional data; and a 2 nd encoding step of generating 2 nd encoded three-dimensional data by encoding the 1 st three-dimensional data, wherein in the extracting step, data corresponding to an object having a predetermined attribute is further extracted as the 2 nd three-dimensional data from the 1 st three-dimensional data, the 1 st three-dimensional data is 1 st point cloud data, the 2 nd three-dimensional data is 2 nd point cloud data, the 1 st encoded three-dimensional data is 1 st encoded point cloud data, the 2 nd encoded three-dimensional data is 2 nd encoded point cloud data, and the feature amount is a feature amount based on three-dimensional position information or visible light information.
2. The three-dimensional data encoding method of claim 1, the 2 nd three-dimensional data being encoded by a 1 st encoding method, the 1 st three-dimensional data being encoded by a 2 nd encoding method different from the 1 st encoding method.
3. The three-dimensional data encoding method according to claim 2, wherein in the 1 st encoding method, inter prediction in intra prediction and inter prediction is prioritized as compared with the 2 nd encoding method.
4. The three-dimensional data encoding method according to claim 2, wherein the 1 st encoding method and the 2 nd encoding method differ in a representation method of a three-dimensional position.
5. The three-dimensional data encoding method according to claim 1, wherein at least one of the 1 st encoded three-dimensional data and the 2 nd encoded three-dimensional data includes an identifier indicating whether the encoded three-dimensional data is encoded by encoding the 1 st three-dimensional data or encoded three-dimensional data obtained by encoding a part of the 1 st three-dimensional data.
6. The three-dimensional data encoding method according to claim 1, wherein in the 1 st encoding step, the 2 nd three-dimensional data is encoded so that a data amount of the 1 st encoded three-dimensional data is smaller than a data amount of the 2 nd encoded three-dimensional data.
7. The three-dimensional data encoding method according to claim 1, further comprising a transmission step of transmitting one of the 1 st encoded three-dimensional data and the 2 nd encoded three-dimensional data to a client in accordance with a state of the client.
8. The three-dimensional data encoding method of claim 7, wherein the state of the client includes a communication condition of the client or a moving speed of the client.
9. The three-dimensional data encoding method according to claim 1, further comprising a transmission step of transmitting one of the 1 st encoded three-dimensional data and the 2 nd encoded three-dimensional data to a client according to a request of the client.
10. A method of three-dimensional data decoding, comprising: a 1 st decoding step of decoding 1 st encoded three-dimensional data obtained by encoding 2 nd three-dimensional data, which is data corresponding to an object having a predetermined attribute and whose feature amount extracted from the 1 st three-dimensional data is equal to or greater than a threshold value, by a 1 st decoding method; and a 2 nd decoding step of decoding, in a 2 nd decoding method different from the 1 st decoding method, the 2 nd encoded three-dimensional data obtained by encoding the 1 st three-dimensional data, the 1 st three-dimensional data being 1 st point cloud data, the 2 nd three-dimensional data being 2 nd point cloud data, the 1 st encoded three-dimensional data being 1 st encoded point cloud data, the 2 nd encoded three-dimensional data being 2 nd encoded point cloud data, the feature amount being a feature amount based on three-dimensional position information or visible light information.
11. The three-dimensional data decoding method according to claim 10, wherein in the 1 st decoding method, inter prediction among intra prediction and inter prediction is prioritized compared to the 2 nd decoding method.
12. The three-dimensional data decoding method according to claim 10, wherein the 1 st decoding method and the 2 nd decoding method differ in a representation method of a three-dimensional position.
13. The three-dimensional data decoding method according to claim 10, wherein at least one of the 1 st encoded three-dimensional data and the 2 nd encoded three-dimensional data includes an identifier indicating whether the encoded three-dimensional data is encoded by encoding the 1 st three-dimensional data or encoded three-dimensional data obtained by encoding a part of the 1 st three-dimensional data, and the 1 st encoded three-dimensional data and the 2 nd encoded three-dimensional data are identified with reference to the identifier.
14. The three-dimensional data decoding method of claim 10, the three-dimensional data decoding method further comprising: a notification step of notifying a server of the state of the client; and a receiving step of receiving one of the 1 st encoded three-dimensional data and the 2 nd encoded three-dimensional data transmitted from the server in accordance with the state of the client.
15. The three-dimensional data decoding method of claim 14, wherein the state of the client includes a communication condition of the client or a moving speed of the client.
16. The three-dimensional data decoding method of claim 10, the three-dimensional data decoding method further comprising: a request step of requesting a server for one of the 1 st encoded three-dimensional data and the 2 nd encoded three-dimensional data; and a receiving step of receiving one of the 1 st encoded three-dimensional data and the 2 nd encoded three-dimensional data transmitted from the server in accordance with the request.
17. A three-dimensional data encoding device is provided with: an extraction unit that extracts, from the 1 st three-dimensional data, the 2 nd three-dimensional data having a feature value equal to or greater than a threshold value; a 1 st encoding unit configured to encode the 2 nd three-dimensional data to generate 1 st encoded three-dimensional data; and a 2 nd encoding unit configured to generate 2 nd encoded three-dimensional data by encoding the 1 st three-dimensional data, wherein the extracting unit further extracts data corresponding to an object having a predetermined attribute from the 1 st three-dimensional data as the 2 nd three-dimensional data, the 1 st three-dimensional data is 1 st point cloud data, the 2 nd three-dimensional data is 2 nd point cloud data, the 1 st encoded three-dimensional data is 1 st encoded point cloud data, the 2 nd encoded three-dimensional data is 2 nd encoded point cloud data, and the feature amount is a feature amount based on three-dimensional position information or visible light information.
18. A three-dimensional data decoding device comprising: a 1 st decoding unit that decodes, by a 1 st decoding method, 1 st encoded three-dimensional data obtained by encoding 2 nd three-dimensional data, the 2 nd three-dimensional data being data that is extracted from 1 st three-dimensional data, has a feature amount equal to or greater than a threshold value, and corresponds to an object having a predetermined attribute; and a 2 nd decoding unit that decodes, by a 2 nd decoding method different from the 1 st decoding method, 2 nd encoded three-dimensional data obtained by encoding the 1 st three-dimensional data, wherein the 1 st three-dimensional data is 1 st point cloud data, the 2 nd three-dimensional data is 2 nd point cloud data, the 1 st encoded three-dimensional data is 1 st encoded point cloud data, the 2 nd encoded three-dimensional data is 2 nd encoded point cloud data, and the feature amount is based on three-dimensional position information or visible light information.
CN201780036423.0A 2016-06-14 2017-05-23 Three-dimensional data encoding method, decoding method, encoding device, and decoding device Active CN109313820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310828378.XA CN116630452A (en) 2016-06-14 2017-05-23 Three-dimensional data encoding method, decoding method, encoding device, and decoding device

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201662349772P 2016-06-14 2016-06-14
US62/349,772 2016-06-14
PCT/JP2017/019114 WO2017217191A1 (en) 2016-06-14 2017-05-23 Three-dimensional data coding method, three-dimensional data decoding method, three-dimensional data coding device, and three-dimensional data decoding device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202310828378.XA Division CN116630452A (en) 2016-06-14 2017-05-23 Three-dimensional data encoding method, decoding method, encoding device, and decoding device

Publications (2)

Publication Number Publication Date
CN109313820A CN109313820A (en) 2019-02-05
CN109313820B CN109313820B (en) 2023-07-04

Family

ID=60664498

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310828378.XA Pending CN116630452A (en) 2016-06-14 2017-05-23 Three-dimensional data encoding method, decoding method, encoding device, and decoding device
CN201780036423.0A Active CN109313820B (en) 2016-06-14 2017-05-23 Three-dimensional data encoding method, decoding method, encoding device, and decoding device

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202310828378.XA Pending CN116630452A (en) 2016-06-14 2017-05-23 Three-dimensional data encoding method, decoding method, encoding device, and decoding device

Country Status (5)

Country Link
US (2) US11127169B2 (en)
EP (1) EP3471065A4 (en)
JP (3) JP6711913B2 (en)
CN (2) CN116630452A (en)
WO (1) WO2017217191A1 (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017217191A1 (en) * 2016-06-14 2017-12-21 Panasonic Intellectual Property Corporation of America Three-dimensional data coding method, three-dimensional data decoding method, three-dimensional data coding device, and three-dimensional data decoding device
WO2018016168A1 (en) 2016-07-19 2018-01-25 Panasonic Intellectual Property Corporation of America Three-dimensional data generation method, three-dimensional data transmission method, three-dimensional data generation device, and three-dimensional data transmission device
EP3654293A4 (en) * 2017-07-10 2020-07-29 Sony Corporation Information processing device and method
EP3428887A1 (en) * 2017-07-13 2019-01-16 Thomson Licensing Method and device for encoding a point cloud
WO2019159956A1 (en) * 2018-02-14 2019-08-22 Panasonic Intellectual Property Corporation of America Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
WO2019182102A1 (en) * 2018-03-23 2019-09-26 Panasonic Intellectual Property Corporation of America Three-dimensional data coding method, three-dimensional data decoding method, three-dimensional data coding device, and three-dimensional data decoding device
CN111989713A (en) * 2018-04-10 2020-11-24 Panasonic Intellectual Property Corporation of America Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
WO2019216707A1 (en) * 2018-05-10 2019-11-14 Samsung Electronics Co., Ltd. Method and apparatus for processing three dimensional object image using point cloud data
CN112119429A (en) * 2018-05-11 2020-12-22 Panasonic Intellectual Property Corporation of America Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
CA3101091A1 (en) * 2018-06-06 2019-12-12 Panasonic Intellectual Property Corporation Of America Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
WO2020018703A1 (en) * 2018-07-17 2020-01-23 Futurewei Technologies, Inc. Prediction type signaling and temporal order signaling in point cloud coding (pcc)
CN113228107A (en) * 2018-12-28 2021-08-06 Panasonic Intellectual Property Corporation of America Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
SG11202103291YA (en) 2018-12-28 2021-04-29 Sony Corp Information processing apparatus and information processing method
JP7448517B2 (en) * 2019-02-28 2024-03-12 Panasonic Intellectual Property Corporation of America Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
CN111699683B (en) * 2019-06-13 2022-05-17 深圳市大疆创新科技有限公司 Point cloud encoding method, point cloud decoding method and related equipment
SG11202110395VA (en) 2019-06-26 2021-10-28 Tencent America LLC Implicit quadtree or binary-tree geometry partition for point cloud coding
WO2021003173A1 (en) 2019-07-02 2021-01-07 Tencent America LLC Method and apparatus for point cloud compression
US11368717B2 (en) 2019-09-16 2022-06-21 Tencent America LLC Method and apparatus for point cloud compression
WO2021093153A1 (en) * 2020-01-08 2021-05-20 Zte Corporation Point cloud data processing
CN113496160B (en) * 2020-03-20 2023-07-11 Baidu Online Network Technology (Beijing) Co., Ltd. Three-dimensional object detection method, three-dimensional object detection device, electronic equipment and storage medium
CN112150619B (en) * 2020-11-26 2021-04-13 Wuda Geoinformatics Co., Ltd. Method for rapidly loading large data volume in GIS three-dimensional scene
WO2022230263A1 (en) * 2021-04-26 2022-11-03 Sony Group Corporation Image processing device and method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003016475A (en) * 2001-07-04 2003-01-17 Oki Electric Ind Co Ltd Information terminal device with image communication function and image distribution system
CA2514655A1 (en) * 2001-11-27 2003-05-27 Samsung Electronics Co., Ltd. Apparatus and method for depth image-based representation of 3-dimensional object
JP2004021924A (en) * 2002-06-20 2004-01-22 Nec Corp Face feature extraction method, apparatus and information storage medium
WO2005024728A1 (en) * 2003-09-03 2005-03-17 Nec Corporation Form changing device, object action encoding device, and object action decoding device
CN102301720A (en) * 2009-01-29 2011-12-28 松下电器产业株式会社 Image coding method and image decoding method
CN102598688A (en) * 2009-10-28 2012-07-18 高通股份有限公司 Streaming encoded video data
CN103440350A (en) * 2013-09-22 2013-12-11 吉林大学 Three-dimensional data search method and device based on octree

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03125585A (en) * 1989-10-11 1991-05-28 Mitsubishi Electric Corp Coder decoder for picture signal
WO2004030369A1 (en) * 2002-09-27 2004-04-08 Videosoft, Inc. Real-time video coding/decoding
US7616782B2 (en) * 2004-05-07 2009-11-10 Intelliview Technologies Inc. Mesh based frame processing and applications
JP4199170B2 (en) * 2004-07-20 2008-12-17 株式会社東芝 High-dimensional texture mapping apparatus, method and program
US8442307B1 (en) * 2011-05-04 2013-05-14 Google Inc. Appearance augmented 3-D point clouds for trajectory and camera localization
EP2749023A4 (en) * 2011-08-25 2016-04-06 Thomson Licensing Hierarchical entropy encoding and decoding
US9111333B2 (en) * 2011-11-07 2015-08-18 Thomson Licensing Predictive position encoding
US9633473B2 (en) 2012-02-09 2017-04-25 Thomson Licensing Efficient compression of 3D models based on octree decomposition
US9390110B2 (en) * 2012-05-02 2016-07-12 Level Set Systems Inc. Method and apparatus for compressing three-dimensional point cloud data
JP2014192701A (en) * 2013-03-27 2014-10-06 National Institute Of Information & Communication Technology Method, program and device for encoding a plurality of input images
WO2014155715A1 (en) * 2013-03-29 2014-10-02 株式会社日立製作所 Object recognition device, object recognition method, and program
KR102238693B1 (en) 2014-06-20 2021-04-09 삼성전자주식회사 Method and apparatus for extracting feature regions in point cloud
US9313360B2 (en) * 2014-07-30 2016-04-12 Hewlett-Packard Development Company, L.P. Encoding data in an image
US10235338B2 (en) * 2014-09-04 2019-03-19 Nvidia Corporation Short stack traversal of tree data structures
JP6365153B2 (en) * 2014-09-10 2018-08-01 株式会社ソシオネクスト Image encoding method and image encoding apparatus
US9734595B2 (en) * 2014-09-24 2017-08-15 University of Maribor Method and apparatus for near-lossless compression and decompression of 3D meshes and point clouds
US20170214943A1 (en) * 2016-01-22 2017-07-27 Mitsubishi Electric Research Laboratories, Inc. Point Cloud Compression using Prediction and Shape-Adaptive Transforms
US11297346B2 (en) * 2016-05-28 2022-04-05 Microsoft Technology Licensing, Llc Motion-compensated compression of dynamic voxelized point clouds
US10694210B2 (en) * 2016-05-28 2020-06-23 Microsoft Technology Licensing, Llc Scalable point cloud compression with transform, and corresponding decompression
WO2017217191A1 (en) 2016-06-14 2017-12-21 Panasonic Intellectual Property Corporation of America Three-dimensional data coding method, three-dimensional data decoding method, three-dimensional data coding device, and three-dimensional data decoding device
CN110392903A (en) * 2016-08-19 2019-10-29 莫维迪厄斯有限公司 The dynamic of matrix manipulation is rejected

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
CarSpeak: A Content-Centric Network for Autonomous Driving; Swarun Kumar et al.; ACM SIGCOMM Computer Communication Review; 2012-08-13; Vol. 42, No. 4; pp. 259-266 and Figs. 1-7 *
Comparison of 3D Interest Point Detectors and Descriptors for Point Cloud Fusion; T. Weber et al.; ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences; 2014-08-07; Vol. 2; pp. 58-61 *
Geometric 3D point cloud compression; Vicente Morell et al.; Pattern Recognition Letters; 2014-07-12; Vol. 16, No. 2; pp. 56-58 *
Octree-based Point-Cloud Compression; Ruwen Schnabel et al.; SPBG '06: Proceedings of the 3rd Eurographics / IEEE VGTC Conference on Point-Based Graphics; 2006-07-29; Sections 3-4 *
Octree-based progressive geometry coding of point clouds; Yan Huang et al.; SPBG '06: Proceedings of the 3rd Eurographics / IEEE VGTC Conference on Point-Based Graphics; 2006-07-29; pp. 104-107 *
RACBVHs: Random-Accessible Compressed Bounding Volume Hierarchies; Tae-Joon Kim et al.; IEEE Transactions on Visualization and Computer Graphics; 2009-06-26; Vol. 16, No. 2; pp. 273-277 *
Vehicular Cloud Networking: Architecture and Design Principles; Euisin Lee et al.; IEEE Communications Magazine; 2014-02-12; Vol. 52, No. 2; pp. 151-153 *
Streaming of Three-Dimensional Geometric Models (in Chinese); Cheng Zhiquan; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2010-05-15; No. 5; p. I138-16 *
Information Conversion of Point Cloud Models for Data Volume Reduction (in Japanese); Koji Nishio et al.; 2008 Forum on Information Technology (FIT); 2008-09-02; pp. 227-228 *

Also Published As

Publication number Publication date
JP7041191B2 (en) 2022-03-23
JP2022082586A (en) 2022-06-02
JP6711913B2 (en) 2020-06-17
EP3471065A4 (en) 2019-06-05
US11593970B2 (en) 2023-02-28
JPWO2017217191A1 (en) 2019-04-04
US11127169B2 (en) 2021-09-21
JP2020145751A (en) 2020-09-10
CN109313820A (en) 2019-02-05
US20210327100A1 (en) 2021-10-21
US20190108656A1 (en) 2019-04-11
WO2017217191A1 (en) 2017-12-21
CN116630452A (en) 2023-08-22
EP3471065A1 (en) 2019-04-17

Similar Documents

Publication Publication Date Title
CN109313820B (en) Three-dimensional data encoding method, decoding method, encoding device, and decoding device
CN108369751B (en) Three-dimensional data encoding method, decoding method, encoding device, and decoding device
CN109478338B (en) Three-dimensional data creation method, three-dimensional data transmission method, three-dimensional data creation device, and three-dimensional data transmission device
CN113008263B (en) Data generation method and data generation device
CN111148967B (en) Three-dimensional data creation method, client device, and server
EP3699861B1 (en) Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
KR20200069307A (en) 3D data encoding method, 3D data decoding method, 3D data encoding device, and 3D data decoding device
CN112204341B (en) Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
JP7138695B2 (en) Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant