WO2021172040A1

WO2021172040A1 - Information processing device and method

Info

Publication number: WO2021172040A1
Application number: PCT/JP2021/005168
Authority: WO
Inventors: 塚越　郁夫
Original assignee: ソニーグループ株式会社
Priority date: 2020-02-28
Filing date: 2021-02-12
Publication date: 2021-09-02

Abstract

The present disclosure relates to an information processing device and method which make it possible to suppress an increase in the load of haptics data transmission. Haptics data detected at an observation point of a haptics device serving as an interface is mapped to pixels of a two-dimensional image, and the two-dimensional image obtained by the mapping of the haptics data is encoded to generate encoded data. Further, the encoded data is decoded, the two-dimensional image obtained by mapping the haptics data detected at the observation point of the haptics device serving as an interface is generated, and the haptics data is extracted from the generated two-dimensional image. The present disclosure may be applicable, for example, to information processing systems, information processing devices, communication devices, encoding devices, decoding devices, electronic apparatus, information processing methods, or programs and the like.

Description

Information processing device and method

The present disclosure relates to an information processing device and a method, and more particularly to an information processing device and a method capable of suppressing an increase in a load of haptics data transmission.

Conventionally, a system for remote control by transmitting force sense data, tactile sense data, etc. has been considered (see, for example, Non-Patent Document 1 and Non-Patent Document 2).

In such a system, as a method of sharing haptics data such as position information, force information, and tactile information between places separated from each other, a method of transmitting the data from force sense and tactile sensors together with data indicating the connection relationships among them has been considered, for example.

However, such a transmission method risks increasing the transmission load, for example through more complicated transmission processing and a larger amount of data to be transmitted.

This disclosure is made in view of such a situation, and makes it possible to suppress an increase in the load of haptics data transmission.

An information processing device according to one aspect of the present technology includes a pixel mapping unit that maps haptics data detected at an observation point of a haptics device serving as an interface to pixels of a two-dimensional image, and a coding unit that encodes the two-dimensional image to which the haptics data has been mapped by the pixel mapping unit and generates coded data.

An information processing method according to one aspect of the present technology maps haptics data detected at an observation point of a haptics device serving as an interface to pixels of a two-dimensional image, encodes the two-dimensional image to which the haptics data is mapped, and generates coded data.

An information processing device according to another aspect of the present technology includes a decoding unit that decodes coded data and generates a two-dimensional image to which haptics data detected at an observation point of a haptics device serving as an interface is mapped, and an extraction unit that extracts the haptics data from the two-dimensional image generated by the decoding unit.

An information processing method according to another aspect of the present technology decodes coded data, generates a two-dimensional image to which haptics data detected at an observation point of a haptics device serving as an interface is mapped, and extracts the haptics data from the generated two-dimensional image.

In the information processing device and method of one aspect of the present technology, the haptics data detected at the observation point of the haptics device serving as an interface is mapped to the pixels of a two-dimensional image, the two-dimensional image to which the haptics data is mapped is encoded, and coded data is generated.

In the information processing apparatus and method of another aspect of the present technology, the coded data is decoded to generate a two-dimensional image to which the haptics data detected at the observation point of the interface haptics device is mapped. The haptics data is extracted from the generated two-dimensional image.

FIG. 1 is a diagram explaining the outline of a haptics system.
FIG. 2 is a block diagram showing a main configuration example of a transmission device.
FIG. 3 is a diagram showing an example of coordinate axes.
FIG. 4 is a diagram showing an example of capturing a three-dimensional space.
FIG. 5 is a diagram showing an example of GPS information.
FIG. 6 is a diagram explaining an example of deriving the inclination angle.
FIG. 7 is a block diagram showing a main configuration example of the motion pixel editing unit.
FIG. 8 is a block diagram showing a main configuration example of the media information synthesis unit.
FIG. 9 is a diagram explaining an example of composite image generation.
FIG. 10 is a diagram explaining an example of pixel mapping.
FIG. 11 is a diagram showing an example of a pixel container for force sense data.
FIG. 12 is a diagram showing a structural example of force sense data.
FIG. 13 is a diagram showing an example of a pixel container for force sense data.
FIG. 14 is a diagram showing an example of a pixel container for force sense data.
FIG. 15 is a diagram showing an example of a pixel container for vibration data.
FIG. 16 is a diagram showing an example of a pixel container for vibration data.
FIG. 17 is a diagram showing an example of a pixel container for vibration data.
FIG. 18 is a diagram showing an example of signaling.
FIG. 19 is a flowchart explaining an example of the flow of a transmission process.
FIG. 20 is a block diagram showing a main configuration example of a receiving device.
FIG. 21 is a block diagram showing a main configuration example of the media information analysis unit.
FIG. 22 is a flowchart explaining an example of the flow of a reception process.
FIG. 23 is a diagram showing an example of the number of capture dimensions.
FIG. 24 is a diagram showing an example of bidirectional transmission.
FIG. 25 is a diagram explaining an example of a remote control system.
FIG. 26 is a block diagram showing a main configuration example of a local system.
FIG. 27 is a block diagram showing a main configuration example of an MPD server.
FIG. 28 is a diagram explaining an example of the relationship between an MPD and a bit stream.
FIG. 29 is a diagram showing an example of the syntax of SEI.
FIG. 30 is a diagram, continuing from FIG. 29, showing an example of the syntax of SEI.
FIG. 31 is a diagram, continuing from FIG. 30, showing an example of the syntax of SEI.
FIG. 32 is a diagram showing an example of the semantics of SEI.
FIG. 33 is a diagram, continuing from FIG. 32, showing an example of the semantics of SEI.
FIG. 34 is a diagram showing an example of an MPD.
FIG. 35 is a diagram, continuing from FIG. 34, showing an example of an MPD.
FIG. 36 is a diagram showing an example of an MPD.
FIG. 37 is a diagram showing an example of an MPD.
FIG. 38 is a diagram showing an example of the semantics of an MPD.
FIG. 39 is a diagram showing an example of a media box.
FIG. 40 is a block diagram showing a main configuration example of a computer.

Hereinafter, embodiments for carrying out the present disclosure (hereinafter referred to as embodiments) will be described. The explanation will be given in the following order.
1. Haptic transmission
2. First Embodiment (transmitter)
3. Second Embodiment (receiver)
4. Third Embodiment (remote control system)
5. Addendum

<1. Haptic transmission>
<Haptics system>
The aim of a telexistence society is to place a device at one's disposal in a spatially distant location and control it over a network, thereby achieving the effect of instantaneous spatial movement. The action on the local side, which leads the control, is reproduced on the remote side. As the remote device operates, its progress and results are fed back to the local side as needed, and the local activity continues based on that feedback. In addition, incorporating humans into the feedback system is expected to free them from spatiotemporal constraints and lead to the realization of human augmentation, which amplifies human abilities rather than simply providing a sense of presence.

For example, the haptics system 10 of FIG. 1 has a haptics device 11 and a haptics device 15, which are installed at locations remote from each other and each include sensors, actuators, and the like. One of them transmits the haptics data (force sense data, tactile data, etc.) detected by its sensors, and the other receives the haptics data and drives its actuators based on that data. By exchanging such haptics data, the operation of one haptics device can be reproduced in the other haptics device. That is, remote control is realized. Such exchange of haptics data is realized by the communication device 12 and the communication device 14 communicating with each other via the network 13.

The communication device 12 can also feed back the haptics data from the haptics device 11 to the haptics device 11. Similarly, the communication device 14 can also feed back the haptics data from the haptics device 15 to the haptics device 15.

The haptics device may be, for example, a bent skeletal arm-shaped device or a glove-shaped device that can be worn in the hand. When the operator moves the skeletal arm as a Haptic Display or the glove locally, the position information and movement state at each joint fluctuate.

Haptics devices in which the degrees of freedom of the force sensor configuration are increased from one (1DoF (Degree of Freedom)) to three (3DoF), and in which the number of joint points is increased, have also been considered.

In the transmission of such haptics data, it is necessary to accurately describe how the outputs of multiple local kinesthetic sensors change in conjunction with each other and convey them to the remote receiving device. Therefore, for example, a method of transmitting data from a force sense / tactile sensor together with data showing each connection relationship has been considered. However, in the case of such a transmission method, there is a possibility that the transmission load may increase, such as complicated processing for transmission and an increase in the amount of data to be transmitted. Even if the data transmission at each location is performed more efficiently, there is a risk that the data band to be transmitted will increase as the scale of the object to be reproduced increases.

For example, a human hand has five fingers, and its degree of freedom of movement is considered to be 20 or more. To faithfully transmit such movements of the hand joints to a remote location, it is necessary to transmit data equivalent to, for example, 15 joints (15 channels). Even if the bit rate for one channel of haptics data is about 100 kbps, the total is 1.5 Mbps for 15 channels. When remotely moving a human model with more parts, such as an avatar, data for more contact points (corresponding to the number of joints) must be transmitted, and the bit rate may increase further.
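The arithmetic in this paragraph can be sketched as follows. This is only an illustration of the per-channel scaling described above; the function name and the 60-channel avatar figure are hypothetical and not taken from the disclosure.

```python
def total_bitrate_kbps(channels: int, per_channel_kbps: float = 100.0) -> float:
    """Aggregate bit rate when every joint is sent as its own channel."""
    return channels * per_channel_kbps


# One hand: 15 joints at about 100 kbps each, as in the text.
hand_kbps = total_bitrate_kbps(15)
print(hand_kbps / 1000.0, "Mbps")

# Hypothetical avatar with 60 joint channels (an assumed number).
avatar_kbps = total_bitrate_kbps(60)
print(avatar_kbps / 1000.0, "Mbps")
```

The bit rate grows linearly with the number of channels, which is the load increase the conventional method runs into.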

In addition, if the number of joints increases, the connection relationship of each haptics data becomes complicated, and the processing for transmission may become more complicated.

In this way, with the conventional method, there is a risk that the load of haptics data transmission will increase.

<Imaging of haptics data>
Therefore, the haptics data is mapped to the pixels of the two-dimensional image, the two-dimensional image to which the haptics data is mapped is encoded to generate the encoded data, and the encoded data is transmitted.

By doing so, the haptics data can be encoded and transmitted by the same image coding regardless of the number of contact points of the haptics data, so that an increase in processing complexity can be suppressed. Further, since image coding can be applied, not only can high coding efficiency be realized more easily, but an increase in the amount of data due to an increase in the number of contact points of the haptics data can also be suppressed.

<2. First Embodiment>
<Transmission device>
FIG. 2 is a block diagram showing a main configuration example of a transmission device, which is an embodiment of an information processing device to which the present technology is applied. The transmission device 100 shown in FIG. 2 is a device that transmits haptics data such as force sense data and tactile data to another device, for example at a remote location. It should be noted that FIG. 2 shows the main elements such as processing units and data flows, and does not necessarily show all of them. That is, in the transmission device 100, there may be a processing unit that is not shown as a block in FIG. 2, or there may be processing or a data flow that is not shown as an arrow or the like in FIG. 2.

As shown in FIG. 2, the transmission device 100 includes an ROI (Region Of Interest) setting unit 101, a motion pixel editing unit 102, a media information synthesis unit 103, an encoding unit 104, and a container processing unit 105.

Image data is input to this transmission device 100. The ROI setting unit 101 sets a region of interest (ROI) in this image data. The ROI setting unit 101 supplies the ROI setting information indicating the ROI to the motion pixel editing unit 102.

The motion pixel editing unit 102 performs processing related to the generation of the motion focus map. The motion focus map is map information (image information) indicating the position where motion has occurred and the position to be focused on. For example, the motion pixel editing unit 102 acquires the image data input to the transmission device 100, and identifies the position where motion has occurred based on the image data. Further, the motion pixel editing unit 102 acquires the ROI setting information supplied from the ROI setting unit 101, and specifies a position to be focused on based on the ROI setting information. The motion pixel editing unit 102 generates a motion focus map from those processing results and supplies it to the media information synthesis unit 103.

The media information synthesis unit 103 performs processing related to the generation of a composite image of the motion focus map and the image to which the haptics data is mapped. For example, the media information synthesis unit 103 acquires the motion focus map supplied from the motion pixel editing unit 102. This motion focus map is a two-dimensional image that indicates, by pixel position, the position where motion has occurred and the position to be focused on.

Further, the media information synthesis unit 103 acquires force sense data and tactile data detected by the haptics device and the like and supplied to the transmission device 100 as haptics data. The force sense data is information indicating the magnitude and direction of the applied force. This force sense data is detected by, for example, a force sense sensor or the like. The tactile data is information on the tactile sensation such as vibration and temperature. This tactile data is detected by a sensor that detects tactile parameters such as a vibration sensor and a temperature sensor.

Further, the media information synthesis unit 103 maps such haptics data to a two-dimensional image, and synthesizes the two-dimensional image to which the haptics data is mapped with the motion focus map to generate a composite image (also referred to as a haptic composite image). The media information synthesis unit 103 supplies the haptic composite image to the coding unit 104.

The coding unit 104 performs processing related to image coding. For example, the coding unit 104 acquires the haptic composite image supplied from the media information synthesis unit 103. In addition, the coding unit 104 encodes the haptic composite image by a predetermined image coding method to generate coded data. This image coding method is arbitrary. For example, it may be a still image coding method such as JPEG (Joint Photographic Experts Group), or a video coding method such as MPEG (Moving Picture Experts Group), AVC (Advanced Video Coding), or HEVC (High Efficiency Video Coding). The coding unit 104 supplies the generated coded data (also referred to as a haptic composite video coded stream) to the container processing unit 105.

The container processing unit 105 performs processing related to containerization. For example, the container processing unit 105 acquires the coded data (haptic composite video coded stream) supplied from the coding unit 104. The container processing unit 105 stores the coded data in a container (file) based on a predetermined file format. This file format is arbitrary. For example, ISOBMFF (ISO Base Media File Format) may be used. Further, the container processing unit 105 may store control information related to the haptics data in the media box of the ISOBMFF file. The container processing unit 105 transmits the containerized coded data to the destination.

Each of these processing units (ROI setting unit 101 to container processing unit 105) of the transmission device 100 has an arbitrary configuration. For example, each processing unit may be configured by a logic circuit that realizes the above-mentioned processing. Alternatively, each processing unit may have, for example, a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like, and realize the above-mentioned processing by executing a program using them. Of course, each processing unit may have both configurations, realizing a part of the above-mentioned processing by a logic circuit and the rest by executing a program. The configurations of the respective processing units may be independent of each other. For example, some processing units may realize the above-mentioned processing by a logic circuit, other processing units may realize it by executing a program, and still other processing units may realize it by both a logic circuit and the execution of a program.

<Image data and ROI settings>
Next, the image data input to the transmission device 100 will be described. This image data is a captured image of a haptics device or the like in a three-dimensional space represented by a coordinate system such as the three-axis Cartesian coordinate system shown in FIG. 3.

This imaging is performed using, for example, a plurality of cameras (imaging devices) as shown in FIG. 4. In the example of FIG. 4, the glove-shaped haptics device 112A worn by the user 112 is imaged by three cameras: the camera 111-1, the camera 111-2, and the camera 111-3. In the following, when it is not necessary to distinguish the cameras 111-1 to 111-3 from one another, they are referred to as the cameras 111.

Camera parameters such as Sensor position, 3D_slope, and capture_normal_vector are set in each camera 111. The Sensor position is a parameter that expresses the lens position of the camera 111 in relative position or absolute position coordinates from a certain reference point. 3D_slope is a parameter that represents the inclination of the lens of the camera 111 as an angle of deviation from the reference coordinate system, as shown in FIG. 3, for example. The capture_normal_vector is a parameter indicating the direction in which the lens of the camera 111 faces (the direction perpendicular to the lens surface).

The haptics device 112A is provided with a force sensor, a tactile sensor, and the like. It is a device that can detect the movement of the joints of the hand of the user 112 wearing it, the force applied to the joints, fingertips, and the like, and the vibration, temperature, and the like at the fingertips and palm. By imaging the haptics device 112A with three or more cameras 111, the position and movement of the haptics device 112A in the three-dimensional space can be specified.

The ROI setting unit 101 sets the ROI 113 in the three-dimensional space based on the image data and these camera parameters. In the case of the example of FIG. 4, the ROI 113 is set to include the haptics device 112A. The ROI setting unit 101 identifies a region corresponding to such ROI 113 in each image, and supplies ROI setting information indicating the region to the motion pixel editing unit 102. The area corresponding to ROI 113 in each image is calibrated between the images and completely corresponds to ROI 113 in the three-dimensional space.

The lens position (Sensor position) of the camera 111 is specified by using, for example, GPS (Global Positioning System) information. As shown in FIG. 5, for example, the GPS information includes information that defines two-dimensional coordinates (latitude, longitude), altitude (elevation), time (time), and the like as spatial coordinates. Further, the orientation of the lens (capture_normal_vector) is detected by, for example, a 3-axis surrounding sensor or the like.

Further, the tilt of the lens is detected using an acceleration sensor or the like, for example, as shown in FIG. 6. For example, when the acceleration sensor is aligned with the direction in which gravity acts, it detects an acceleration of 9.8 m/s². On the other hand, when the acceleration sensor is arranged perpendicular to the direction in which gravity acts, the influence of gravity disappears and the output of the acceleration sensor becomes zero. When an acceleration a is obtained as the output of the acceleration sensor at an arbitrary inclination angle θ, the inclination angle θ can be derived by equation (1).
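Equation (1) itself does not appear in this text. A form consistent with the two boundary cases described (output of 9.8 m/s² when aligned with gravity, zero when perpendicular to it) is a = g·sin θ, that is, θ = arcsin(a / g), assuming θ is measured from the direction perpendicular to gravity. The following minimal sketch rests on that assumption; the function name is hypothetical.

```python
import math

G = 9.8  # gravitational acceleration in m/s^2, the value used in the text


def tilt_angle_deg(a: float, g: float = G) -> float:
    """Inclination angle in degrees, assuming a = g * sin(theta)."""
    # Clamp so that sensor noise slightly beyond +/-g does not break asin().
    ratio = max(-1.0, min(1.0, a / g))
    return math.degrees(math.asin(ratio))


print(tilt_angle_deg(0.0))  # sensor perpendicular to gravity
print(tilt_angle_deg(4.9))  # half of g
print(tilt_angle_deg(9.8))  # sensor aligned with gravity
```

If θ were instead measured from the direction of gravity, the cosine would take the place of the sine.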

<Motion pixel editing unit>
FIG. 7 is a block diagram showing a main configuration example of the motion pixel editing unit 102. It should be noted that FIG. 7 shows the main things such as the processing unit and the data flow, and not all of them are shown in FIG. 7. That is, in the motion pixel editing unit 102, there may be a processing unit that is not shown as a block in FIG. 7, or there may be a processing or data flow that is not shown as an arrow or the like in FIG. 7.

As shown in FIG. 7, the motion pixel editing unit 102 has a motion image generation unit 131 and a pixel editing unit 132.

The motion image generation unit 131 performs processing related to the detection of motion pixels in the captured image. For example, the motion image generation unit 131 acquires image data (also referred to as an image sensor output image) of a captured image captured by a camera 111 or the like. The motion image generation unit 131 derives the difference between temporally consecutive image sensor output images (also referred to as an inter-frame difference image). By this process, pixels that move between frames (also referred to as motion pixels) are detected. The pixel value of this inter-frame difference image is not the difference in pixel value between frames; instead, whether the pixel value changes between frames is expressed by 1 bit. For example, the pixel value of a pixel whose value changes between frames is set to "1", and the pixel value of a pixel whose value does not change is set to "0". That is, the inter-frame difference image is map information (also referred to as a motion pixel map) indicating the positions of motion pixels. The motion image generation unit 131 supplies the motion pixel map generated in this way to the pixel editing unit 132.
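The 1-bit inter-frame difference described above can be sketched as follows. This is a minimal illustration, assuming frames are equally sized lists of rows of integer pixel values. The function name and the optional `threshold` parameter are hypothetical; the text only states that a changed pixel is set to "1".

```python
def motion_pixel_map(prev_frame, curr_frame, threshold=0):
    """Return a 1-bit map: 1 where the pixel value changed between frames."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


prev = [[10, 10, 10],
        [10, 10, 10]]
curr = [[10, 12, 10],
        [10, 10, 9]]

for row in motion_pixel_map(prev, curr):
    print(row)
```

With the default threshold of 0, any change in pixel value marks the pixel as a motion pixel. A larger threshold would suppress sensor noise, which is a design choice outside the text.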

The pixel editing unit 132 performs processing related to the generation of the motion focus map. For example, the pixel editing unit 132 acquires a motion pixel map supplied from the motion image generation unit 131. Further, the pixel editing unit 132 acquires the ROI setting information supplied from the ROI setting unit 101. The pixel editing unit 132 uses this information to generate a motion focus map.

The motion focus map is map information composed of 2-bit pixel values obtained by adding a 1-bit focus marker to the motion pixel map. The focus marker is a marker indicating a position to be focused on. For example, it is map information (an image) in which the pixel value of a pixel to be focused on is "1" and the pixel value of the other pixels is "0". That is, the motion focus map is map information indicating the pixels that move between frames (positions of movement) and the pixels that should be focused on (positions to be focused on).

The pixel editing unit 132 generates a motion focus map by setting such a focus marker (also referred to as a focus map) and synthesizing it with the motion pixel map. Whether or not a pixel should be focused on is determined according to a predetermined condition set in advance. For example, the pixels corresponding to the ROI set by the ROI setting information may be set as the pixels to be focused on. Further, the pixels corresponding to the positions where sensors and actuators are present, the pixels corresponding to the positions for which feedback is requested by the transmission destination of the haptics data, the pixels corresponding to the edge portions and fingertips of the haptics device 112A, and the like may be set as the pixels to be focused on.

Note that the motion focus map may be map information composed of 1-bit pixel values. For example, in the motion focus map, the pixel value of a pixel that both moves between frames and should be focused on may be set to "1", and the pixel value of the other pixels may be set to "0". That is, in this case, the logical product (AND) of the pixel values of the motion pixel map and the focus marker is taken as the pixel value of the motion focus map. Of course, the logical sum (OR) of the pixel values of the motion pixel map and the focus marker may be used as the pixel value of the motion focus map instead.
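The 2-bit and 1-bit variants of the motion focus map can be sketched as follows. This is a minimal illustration, assuming both inputs are equally sized lists of rows of 0/1 values. The function names are hypothetical, and placing the motion bit above the focus bit in the 2-bit value is an assumption, since the text does not fix the bit order.

```python
def motion_focus_map_2bit(motion, focus):
    """2-bit value per pixel: upper bit = motion, lower bit = focus marker."""
    return [[(m << 1) | f for m, f in zip(m_row, f_row)]
            for m_row, f_row in zip(motion, focus)]


def motion_focus_map_1bit(motion, focus, mode="and"):
    """1-bit value per pixel: logical product or logical sum of the two maps."""
    if mode == "and":
        combine = lambda m, f: m & f
    elif mode == "or":
        combine = lambda m, f: m | f
    else:
        raise ValueError("mode must be 'and' or 'or'")
    return [[combine(m, f) for m, f in zip(m_row, f_row)]
            for m_row, f_row in zip(motion, focus)]


motion = [[0, 1], [1, 0]]  # motion pixel map
focus = [[0, 1], [0, 1]]   # focus marker, e.g. pixels inside the ROI

print(motion_focus_map_2bit(motion, focus))
print(motion_focus_map_1bit(motion, focus, mode="and"))
print(motion_focus_map_1bit(motion, focus, mode="or"))
```

The AND variant keeps only pixels that both move and should be focused on, while the OR variant keeps pixels that satisfy either condition.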

The pixel editing unit 132 supplies the motion focus map generated as described above to the media information synthesis unit 103.

The motion image generation unit 131 and the pixel editing unit 132 perform the above-mentioned processing on each image input to the transmission device. That is, when imaging is performed by a plurality of cameras 111 as in the example of FIG. 4, a motion focus map is generated for each captured image obtained by each camera 111.

<Media information synthesis unit>
FIG. 8 is a block diagram showing a main configuration example of the media information synthesis unit 103. Note that FIG. 8 shows the main elements such as processing units and data flows, and does not necessarily show all of them. That is, in the media information synthesis unit 103, there may be a processing unit that is not shown as a block in FIG. 8, or there may be processing or a data flow that is not shown as an arrow or the like in FIG. 8.

As shown in FIG. 8, the media information synthesis unit 103 has a pixel mapping unit 141 and a composite image generation unit 142.

The pixel mapping unit 141 performs processing related to mapping haptics data to a two-dimensional image. For example, the pixel mapping unit 141 acquires the force sense data and tactile data detected in the haptics device. Further, the pixel mapping unit 141 acquires the motion focus map supplied from the motion pixel editing unit 102. The pixel mapping unit 141 maps the acquired force sense data and tactile data to a two-dimensional image by using the acquired motion focus map. The pixel mapping unit 141 supplies the two-dimensional image to which the haptics data is mapped in this way to the composite image generation unit 142.

The composite image generation unit 142 performs processing related to the generation of a haptic composite image. For example, the composite image generation unit 142 acquires the two-dimensional image to which the haptics data is mapped, which is supplied from the pixel mapping unit 141. Further, the composite image generation unit 142 acquires the motion focus map supplied from the motion pixel editing unit 102. The composite image generation unit 142 synthesizes these acquired images (map information) to generate a haptic composite image. The composite image generation unit 142 supplies the generated haptic composite image to the coding unit 104.

<Haptic composite image>
That is, as shown in A of FIG. 9, the media information synthesis unit 103 synthesizes the motion focus map 151 and the two-dimensional image 152 to which haptics data such as force sense data and tactile data is mapped, and generates a haptic composite image 153 as shown in B of FIG. 9. This haptic composite image 153 therefore contains information indicating the position of movement, information indicating the position to be focused on, information indicating the detected force, vibration, temperature, and the like, and information indicating the position where that information was detected. The position where the haptics data was detected is indicated by the position of the pixel to which the haptics data is mapped.

<Pixel mapping>
The pixel mapping unit 141 generates a two-dimensional image 152 to which the haptics data is mapped. For example, as shown in FIG. 10A, it is assumed that the haptics data 154-1 to the haptics data 154-4 are detected by the sensor included in the haptics device 112A. In the following, when it is not necessary to distinguish the haptics data 154-1 to the haptics data 154-4 from each other, they are referred to as haptics data 154.

The pixel mapping unit 141 maps these haptics data 154 to a two-dimensional image as shown in B of FIG. 10. That is, the pixel mapping unit 141 maps the haptics data 154 to the pixels of the captured image obtained by the camera 111 that correspond to the portion where the sensor that detected the haptics data 154 is located (the pixels in which that portion appears). Therefore, in the two-dimensional image 152 to which the haptics data is mapped, the pixel position to which the haptics data 154 is mapped indicates the detection position of the haptics data 154.

However, the pixel mapping unit 141 uses the motion focus map instead of the captured image to perform this mapping. The motion focus map is generated for each captured image and corresponds to the angle of view of the captured image. Therefore, the pixel mapping unit 141 can perform mapping in the same manner as when the captured image is used (similar mapping results can be obtained).

Note that one haptics data may be mapped to one pixel, or may be mapped to a plurality of pixels. For example, as shown in C of FIG. 10, a plurality of pixels may be grouped together to form a subblock 155, and one haptics data may be mapped to the subblock 155. That is, in this case, one haptics data is mapped to a plurality of pixels (Y0 to Y3) in the subblock 155.
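Mapping one haptics value to a subblock can be sketched as follows. This is a minimal illustration, assuming a 2 x 2 subblock (four pixels, Y0 to Y3) and a single-component image held as a list of rows. Writing the same value into every pixel of the subblock is an assumption made for simplicity, and the function name is hypothetical.

```python
def map_to_subblock(image, x, y, value, size=2):
    """Write one haptics value into the size x size subblock containing (x, y)."""
    x0 = (x // size) * size  # left edge of the subblock
    y0 = (y // size) * size  # top edge of the subblock
    for row in range(y0, min(y0 + size, len(image))):
        for col in range(x0, min(x0 + size, len(image[0]))):
            image[row][col] = value
    return image


# 4 x 4 image; the sensor is assumed to appear at pixel (x=3, y=1).
image = [[0] * 4 for _ in range(4)]
map_to_subblock(image, x=3, y=1, value=200)

for row in image:
    print(row)
```

The pixel position carries the detection position, so no separate description of the sensor location is needed for each sample.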

Further, the haptics data may be arranged in each of the Y component, Pr component, and Pb component of the two-dimensional image.

<Force sense data>
The case of force sense data will be described. Force sense data can have multiple components. For example, some three-axis force sensors, which detect the applied force in three directions, generate force sense data consisting of the three components Fz, Mx, and My.

For example, as shown in FIG. 11, such force sense data composed of a plurality of components may be divided by component and arranged in the Y component, Pr component, and Pb component of the two-dimensional image. In the example of FIG. 11, Fz is arranged in the Y component, Mx is arranged in the Pr component, and My is arranged in the Pb component. The Y component, Pb component, and Pr component are each composed of 10 bits, and Fz, Mx, and My are each composed of 8 bits. Fz, Mx, and My are set in the lower 8 bits of the respective components. In the haptic composite image, the pixel value of the motion focus map (that is, information indicating whether or not there is motion and whether or not the pixel should be focused on) is set in the upper two bits of each component. In FIG. 11, a 1-bit motion focus map is assigned to the most significant bit of each component. By arranging the data in this way, each component of the force sense data can be easily identified. In addition, an increase in the bit depth of the pixel values of the haptic composite image can be suppressed as compared with the case where all the components of the force sense data are arranged in one component (for example, the Y component) of the image.
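The 10-bit layout described above (motion focus map in the upper two bits, an 8-bit force component in the lower eight bits) can be sketched as follows. This is a minimal illustration, assuming unsigned 8-bit force values. The function names and the dictionary representation of a pixel are hypothetical.

```python
def pack_component(map_bits: int, value: int) -> int:
    """Build one 10-bit sample: upper 2 bits = map value, lower 8 bits = data."""
    if not 0 <= map_bits <= 0b11:
        raise ValueError("map_bits must fit in 2 bits")
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in 8 bits")
    return (map_bits << 8) | value


def unpack_component(sample: int):
    """Split a 10-bit sample back into (map_bits, value)."""
    return (sample >> 8) & 0b11, sample & 0xFF


def pack_force_pixel(map_bits: int, fz: int, mx: int, my: int) -> dict:
    """Place Fz in Y, Mx in Pr, and My in Pb, as in the example of FIG. 11."""
    return {
        "Y": pack_component(map_bits, fz),
        "Pr": pack_component(map_bits, mx),
        "Pb": pack_component(map_bits, my),
    }


pixel = pack_force_pixel(map_bits=0b10, fz=0x7F, mx=0x10, my=0x20)
print(pixel)
print(unpack_component(pixel["Y"]))
```

On the receiving side, the same shift and mask recover both the map value and the force component from each decoded sample, provided the image coding preserves the sample values.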

The bit length (bit depth) of each component of the force sense data may differ from that of the other components. For example, as shown in A of FIG. 12, the bit length of Fz may be 16 bits, and the bit lengths of Mx and My may be 4 bits each. Since Mx and My indicate the direction in which the force is applied, two bits each can represent 16 directions as shown in B of FIG. If Mx and My are each composed of 4 bits as shown in A of FIG. 12, the direction can be specified more accurately. That is, by adopting the bit length configuration shown in A of FIG. 12, it is possible to suppress an increase in the amount of data while maintaining sufficient accuracy.

The sampling ratio of each component of the haptics composite image (that is, the YPbPr sampling ratio of the two-dimensional image to which the haptics data is mapped) is arbitrary. For example, it may be 4:4:4 as in the example of FIG. 11, or it may be 4:2:0, 4:2:2, or any other ratio.

For example, when the force sense data is configured as shown in A of FIG. 12 and the image encoding uses a profile corresponding to 4:2:0, then, as shown in FIG. 13, in the subblock 155 (C of FIG. 10) corresponding to the detection position of the force sense data, Fz may be arranged in four pixels (Y0 to Y3) of the Y component, Mx in one pixel (Pb0) of the Pb component, and My in one pixel (Pr0) of the Pr component. Fz is divided into four parts of 4 bits each in the bit depth direction, and the parts are arranged in the four pixels of the Y component.
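The 4:2:0 arrangement can be illustrated with a short sketch. The function names are hypothetical, and the order in which the four 4-bit parts of Fz are assigned to Y0 to Y3 (most significant part first) is an assumption made here for illustration; the description does not fix it.

```python
def split_fz(fz_16bit: int) -> list:
    """Divide a 16-bit Fz into four 4-bit parts, most significant part first (assumed order)."""
    if not 0 <= fz_16bit <= 0xFFFF:
        raise ValueError("Fz must fit in 16 bits")
    return [(fz_16bit >> shift) & 0xF for shift in (12, 8, 4, 0)]


def pack_subblock_420(fz_16bit: int, mx_4bit: int, my_4bit: int) -> dict:
    """Place Fz in Y0..Y3, Mx in Pb0, and My in Pr0 of one 2x2 subblock."""
    if not (0 <= mx_4bit <= 0xF and 0 <= my_4bit <= 0xF):
        raise ValueError("Mx and My must fit in 4 bits")
    y0, y1, y2, y3 = split_fz(fz_16bit)
    return {"Y0": y0, "Y1": y1, "Y2": y2, "Y3": y3, "Pb0": mx_4bit, "Pr0": my_4bit}
```

With a 4:2:0 profile, a 2x2 subblock has four Y samples but only one Pb sample and one Pr sample, which is why the 16-bit Fz and the 4-bit Mx and My fit exactly when each sample carries 4 bits.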

When the image encoding uses a profile corresponding to 4:2:2, the bit lengths of Mx and My may be set to 8 bits, and, as shown in FIG. 14, in the subblock 155 (C of FIG. 10) corresponding to the detection position of the force sense data, Fz may be arranged in four pixels (Y0 to Y3) of the Y component, Mx in two pixels (Pb0, Pb2) of the Pb component, and My in two pixels (Pr0, Pr2) of the Pr component. Fz is divided into four parts of 4 bits each in the bit depth direction and arranged in the four pixels of the Y component. Similarly, Mx is divided into two parts of 4 bits each and arranged in the two pixels of the Pb component, and My is divided into two parts of 4 bits each and arranged in the two pixels of the Pr component.

As shown in FIGS. 13 and 14, the bit assignment may be defined so that the quantization step size set at the time of video coding does not affect the values stored in the container. For example, when the quantization step size of the encoder is 16, the lower 4 bits (blank parts in FIGS. 13 and 14) of each element to be containerized may be set to 0, and the bits may be assigned so that the fifth bit from the bottom is the LSB (Least Significant Bit) of the force sense data.
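The effect of leaving the lower 4 bits empty can be checked with a simplified model. The sketch below assumes that coding distortion appears as a small additive error on each sample; an actual video encoder quantizes transform coefficients rather than pixel values, so this is only an approximation of the idea. The names Q_STEP, SHIFT, to_container, and from_container are hypothetical.

```python
Q_STEP = 16   # assumed encoder quantization step size
SHIFT = 4     # number of lower bits left at 0 (2 ** SHIFT == Q_STEP)


def to_container(value: int) -> int:
    """Shift the data up so that its LSB sits at the fifth bit from the bottom."""
    return value << SHIFT


def from_container(sample: int) -> int:
    """Recover the data by rounding the decoded sample to the nearest multiple of Q_STEP."""
    return (sample + Q_STEP // 2) >> SHIFT
```

Under this model, any decoding error from -8 to +7 on a sample is absorbed by the empty lower bits, and the stored 4-bit part is recovered unchanged.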

<Vibration data>
Next, the case of tactile data will be described. As an example of tactile data, vibration data representing a vibration state will be described.

In the case of vibration data, a wide dynamic range of amplitude requires a large bit length, so the data may have to be arranged across multiple Y / Pb / Pr components of the container. Therefore, for example, as shown in FIG. 15, the vibration data may be divided in the bit depth direction, and each divided part may be arranged in the Y component, the Pb component, and the Pr component. In the case of the example of FIG. 15, the bit length of the vibration data Amp is 24 bits, which are divided into parts of 8 bits each and arranged in the Y component, the Pb component, and the Pr component, respectively.

When the image encoding is performed with a profile corresponding to 4:2:0, the vibration data may be divided into six parts as shown in FIG. 16 and arranged in four pixels of the Y component, one pixel of the Pb component, and one pixel of the Pr component. In the case of the example of FIG. 16, the 24-bit vibration data Amp is divided into parts of 4 bits each. Further, the value of the lower bits (for example, the lower 4 bits) of each element may be set to 0 to suppress the influence of the quantization by the encoder. If the amplitude value is rounded to some degree, a more efficient container arrangement is possible.
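The two divisions of the 24-bit vibration data Amp (three 8-bit parts as in FIG. 15, or six 4-bit parts as in FIG. 16) can be expressed with one pair of helper functions. The names split_bits and join_bits are hypothetical, and the most-significant-part-first order is an assumption for illustration.

```python
def split_bits(value: int, total_bits: int, parts: int) -> list:
    """Divide value into equal-width parts in the bit depth direction, most significant first."""
    part_bits, remainder = divmod(total_bits, parts)
    if remainder:
        raise ValueError("total_bits must be divisible by parts")
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value does not fit in total_bits")
    mask = (1 << part_bits) - 1
    return [(value >> (part_bits * (parts - 1 - i))) & mask for i in range(parts)]


def join_bits(pieces: list, part_bits: int) -> int:
    """Reassemble the original value from parts produced by split_bits."""
    value = 0
    for piece in pieces:
        value = (value << part_bits) | piece
    return value
```

For Amp = 0xABCDEF, split_bits(Amp, 24, 3) yields the parts for Y, Pb, and Pr, and split_bits(Amp, 24, 6) yields the six parts for the four Y pixels, one Pb pixel, and one Pr pixel of a 4:2:0 subblock.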

In addition, due to the characteristics of the information, the sampling rate of vibration data may be higher than the frame rate. For example, if the vibration sampling frequency is 1 kHz and each sample is mapped to a different frame, the frame rate of the haptics composite image must be 1 kHz or higher, which may increase processing and transmission costs. The video frame frequency of a typical UHDTV is 60 Hz, which is lower than the vibration sampling frequency described above.

In addition, vibration data also has the characteristic that its positional resolution is low. That is, compared with force sense data and the like, its values tend to be equal or similar over a wide area. Therefore, compared with force sense data and the like, vibration data is less likely to cause a problem even if it is assigned to a wide area of the two-dimensional image.

Therefore, a plurality of samples may be mapped to an image of one frame. For example, as shown in FIG. 17, the vibration data may be arranged in a block, and each sample may be arranged in a different sub-block within the block. In the case of FIG. 17, since 16 sub-blocks are formed in the block, 16 vibration samples that are consecutive in time series can be arranged at a plurality of container pixel positions for a given vibration point. Therefore, for example, if the frame frequency of the haptics composite image is 60 Hz, vibration data with a sampling frequency of up to 960 Hz (60 Hz x 16) can be mapped. In this way, vibration data with a sampling rate higher than the frame rate can also be arranged. Of course, the number of sub-blocks in the block is arbitrary. Further, the area where the vibration data is arranged is arbitrary and is not limited to a block. For example, vibration data may be arranged across a plurality of blocks.
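The relationship between the frame frequency, the number of sub-blocks, and the highest vibration sampling frequency that can be mapped can be sketched as follows. The function names are hypothetical, and the sequential assignment of samples to sub-blocks within a frame is an assumption made for illustration.

```python
def max_sampling_rate_hz(frame_rate_hz: int, subblocks_per_block: int) -> int:
    """Highest vibration sampling frequency that fits when one sample goes in each sub-block."""
    return frame_rate_hz * subblocks_per_block


def sample_slot(sample_index: int, subblocks_per_block: int = 16) -> tuple:
    """Return (frame index, sub-block index) for a sample in a time series."""
    return divmod(sample_index, subblocks_per_block)


def group_samples_by_frame(samples: list, subblocks_per_block: int = 16) -> list:
    """Group consecutive samples so that each group fills the sub-blocks of one frame."""
    return [
        samples[start:start + subblocks_per_block]
        for start in range(0, len(samples), subblocks_per_block)
    ]
```

With 16 sub-blocks and a 60 Hz frame frequency, max_sampling_rate_hz(60, 16) is 960, which matches the figure given above.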

<Temperature data>
In the case of temperature data representing a temperature state, which is another example of tactile data, the arrangement method is the same as for vibration data. However, temperature data tends to change more slowly than vibration data. That is, the sampling rate may be lower than that of the vibration data, and therefore more accurate information mapping is possible.

<Signaling>
When the haptics data is mapped to a two-dimensional image and transmitted as described above, control information related to the haptics data may also be transmitted. The content of this control information is arbitrary. For example, as shown in FIG. 18, it may include information indicating the number and direction of views, flag information regarding the haptics composite image, information regarding the mapped haptics data, and the like, or it may include other information. By transmitting such control information, the receiving side can refer to it and handle the haptics data more easily.

<Flow of transmission processing>
An example of the flow of the transmission process performed by the transmission device 100 will be described with reference to the flowchart of FIG. When the transmission process is started, the ROI setting unit 101 of the transmission device 100 sets the ROI of the area to be reproduced on the receiving side in step S101.

In step S102, the motion image generation unit 131 of the motion pixel editing unit 102 calculates the inter-frame difference and generates a motion pixel map.

In step S103, the pixel editing unit 132 of the motion pixel editing unit 102 edits the pixels, adds the focus data to the motion pixel map, and generates the motion focus map.

In step S104, the pixel mapping unit 141 of the media information synthesis unit 103 maps force sense data and tactile data to a two-dimensional image (pixel space).

In step S105, the composite image generation unit 142 of the media information synthesis unit 103 synthesizes the two-dimensional image to which the haptics data is mapped and the motion focus map to generate a haptics composite image.

In step S106, the coding unit 104 encodes the haptics composite image and converts it into coded data (haptics composite video coded stream). In step S107, the container processing unit 105 performs container processing and stores the coded data (haptics composite video coded stream) in a file having a predetermined file format. In step S108, the container processing unit 105 transmits the file to a transmission destination (for example, a receiving device). When the process of step S108 is completed, the transmission process is completed.

As described above, by executing each process, haptics data can be mapped to a two-dimensional image and transmitted, and an increase in the load of haptics data transmission can be suppressed.
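Steps S102 and S103 of the above flow can be sketched in simplified form. The threshold value, the function names, and the assignment of the motion flag to the upper bit and the focus flag to the lower bit of a 2-bit value are all assumptions made for illustration; the description only states that the inter-frame difference is calculated and that focus data is added to the motion pixel map.

```python
def motion_pixel_map(previous: list, current: list, threshold: int = 8) -> list:
    """Step S102 (sketch): mark pixels whose inter-frame difference exceeds a threshold."""
    return [
        [1 if abs(cur - prev) > threshold else 0 for prev, cur in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


def motion_focus_map(motion: list, focus: list) -> list:
    """Step S103 (sketch): combine motion and focus flags into one 2-bit value per pixel."""
    # Assumed layout: bit 1 = motion present, bit 0 = pixel should be paid attention to.
    return [
        [(m << 1) | f for m, f in zip(motion_row, focus_row)]
        for motion_row, focus_row in zip(motion, focus)
    ]
```

The frames here are plain lists of rows of luminance values, which keeps the sketch free of external dependencies.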

<Encoder restrictions>
In order to avoid a change in the value due to encoding of the haptics data to be pixel-contained, the following may be applied as operational restrictions of the encoder.

For example, in quantization, the Q_step value may be determined in relation to the precision of the pixel container value. That is, the stored value may be shifted up by N bits within the bit space of the Y / Pb / Pr components.

Further, in frequency conversion, in order to prevent errors due to conversion to a frequency domain such as DCT (Discrete Cosine Transform) or DST (Discrete Sine Transform), a mode for skipping these conversions may be applied.

<3. Second Embodiment>
<Receiver>
FIG. 20 is a diagram illustrating an outline of a receiving device which is an embodiment of an information processing device to which the present technology is applied. The receiving device 200 shown in FIG. 20 is a device that receives haptics data such as force sense data and tactile data transmitted from another device at a remote location. The receiving device 200 corresponds to the transmitting device 100, and can receive and process a file transmitted from the transmitting device 100 to acquire haptics data.

Note that FIG. 20 shows the main things such as the processing unit and the data flow, and not all of them are shown in FIG. 20. That is, in the receiving device 200, there may be a processing unit that is not shown as a block in FIG. 20, or there may be a processing or data flow that is not shown as an arrow or the like in FIG.

As shown in FIG. 20, the receiving device 200 has a container processing unit 201, a decoding unit 202, a media information analysis unit 203, and a haptics presentation unit 204.

The container processing unit 201 receives a file in which the haptics composite video coded stream, which is the coded data of the haptics composite image, is stored. The container processing unit 201 analyzes the file, extracts the haptics composite video coded stream, and supplies it to the decoding unit 202.

The decoding unit 202 acquires the haptics composite video coded stream supplied from the container processing unit 201, decodes it, and generates a haptics composite image. This decoding method is arbitrary as long as it corresponds to the coding method by the coding unit 104. The decoding unit 202 supplies the data of the haptic composite image to the media information analysis unit 203.

The media information analysis unit 203 analyzes the haptics composite image and extracts haptics data such as force sense data and tactile data. The media information analysis unit 203 supplies the extracted haptics data to the haptics presentation unit 204.

The haptics presentation unit 204 acquires the haptics data supplied from the media information analysis unit 203. The haptics presentation unit 204 presents the haptics data on the media and outputs the haptics data to another device (for example, a haptics device having an actuator).

Each of these processing units (container processing unit 201 to haptics presentation unit 204) of the receiving device 200 has an arbitrary configuration. For example, each processing unit may be configured by a logic circuit that realizes the above-mentioned processing. Further, each processing unit may have, for example, a CPU, ROM, RAM, etc., and execute a program using them to realize the above-mentioned processing. Of course, each processing unit may have both configurations, so that a part of the above-mentioned processing is realized by a logic circuit and the rest is realized by executing a program. The configurations of the respective processing units may be independent of each other. For example, some processing units may realize a part of the above-mentioned processing by a logic circuit, some other processing units may realize the above-mentioned processing by executing a program, and still other processing units may realize the above-mentioned processing by both a logic circuit and the execution of a program.

<Media information analysis unit>
FIG. 21 is a block diagram showing a main configuration example of the media information analysis unit 203. As shown in FIG. 21, the media information analysis unit 203 includes a position information extraction unit 221, a physical space remapping unit 222, and a force / tactile information extraction unit 223.

The position information extraction unit 221 acquires the haptics composite image supplied from the decoding unit 202. The position information extraction unit 221 analyzes the haptics composite image and extracts position information indicating the positions to which haptics data such as force sense data and tactile data are mapped. At that time, the position information extraction unit 221 extracts the position information with reference to the motion focus map included in the haptics composite image. For example, the position information extraction unit 221 can extract the position information of the haptics data of a moving portion and of a portion of interest.

The position information extraction unit 221 supplies the position information to the physical space remapping unit 222. Further, the position information extraction unit 221 supplies the haptic composite image and the position information to the force / tactile information extraction unit 223.

The physical space remapping unit 222 remaps the position information supplied from the position information extraction unit 221 to the three-dimensional space (3D physical space) and generates 3D physical space position information. That is, the position of the haptics data in the 3D physical space is set. For example, the physical space remapping unit 222 corrects the pixel positions in the renderer coordinate system, using the sensor position, 3D_slope, and the like obtained from the camera parameters, so that the zn axis of each camera is parallel to the 3D reference coordinate axis Z and is aligned in the vertical direction with respect to the ROI region. Correction is also performed, based on the camera parameter capture_normal_vector, which indicates the lens orientation of each view, so that the views intersect at the correct angles. If there are three views, they are placed at the appropriate positions in the renderer coordinate system so that they are orthogonal to each other in the 3D space. The remapped coordinates are then mapped to the physical space by multiplying them by the scaling ratio S corresponding to position_mapping_ratio, and are output to the renderer. For the coordinate transformation, any method, such as an affine transformation or a homography transformation, may be applied to each two-dimensional image. The physical space remapping unit 222 supplies the 3D physical space position information to the haptics presentation unit 204.
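The final scaling step of the remapping can be illustrated with a very simple affine placement. This sketch does not implement the corrections based on 3D_slope or capture_normal_vector; it assumes that each view has already been aligned and is described by an origin and two unit axis vectors in the renderer coordinate system. The class ViewPlane and the function remap_to_physical are hypothetical names, and scale stands for the scaling ratio S.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewPlane:
    """Assumed description of one aligned view in the renderer coordinate system."""
    origin: tuple   # 3D position of pixel (0, 0)
    u_axis: tuple   # 3D unit vector along the image's horizontal direction
    v_axis: tuple   # 3D unit vector along the image's vertical direction


def remap_to_physical(pixel: tuple, view: ViewPlane, scale: float) -> tuple:
    """Map a pixel position (u, v) of a view to a 3D position, scaled by the ratio S."""
    u, v = pixel
    return tuple(
        o + scale * (u * a + v * b)
        for o, a, b in zip(view.origin, view.u_axis, view.v_axis)
    )
```

For example, a front view with axes (1, 0, 0) and (0, 1, 0) and a side view with axes (0, 0, 1) and (0, 1, 0) are orthogonal to each other, so the same pixel position lands on different planes of the 3D space.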

The force / tactile information extraction unit 223 extracts haptics data such as force sense data and tactile data from the haptics composite image supplied from the position information extraction unit 221, based on the position information supplied from the position information extraction unit 221. That is, the force / tactile information extraction unit 223 extracts haptics data from the position indicated by the position information in the haptics composite image. The force / tactile information extraction unit 223 supplies the extracted haptics data to the haptics presentation unit 204.

By doing so, the haptics presentation unit 204 can arrange and present the extracted haptics data at a position in the 3D physical space indicated by the 3D physical space position information. That is, since the relationship between the transmitted haptics data can be correctly expressed, each transmitted haptics data can be correctly used in the subsequent device.

That is, the receiving device 200 can correctly acquire the haptics data mapped to the two-dimensional image and transmitted. In other words, the receiving device 200 can realize such a transmission method. Therefore, the receiving device 200 can suppress an increase in the load of haptics data transmission.
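On the receiving side, the extraction can be sketched for the 10-bit 4:4:4 layout in which the lower 8 bits of each component carry Fz, Mx, or My and the upper two bits carry the motion focus map value. The function names are hypothetical, and treating a non-zero motion focus value as the indication that haptics data is present at a pixel is an assumption made for illustration.

```python
def unpack_component(sample_10bit: int) -> tuple:
    """Return (motion focus value, 8-bit data) from one 10-bit sample."""
    return (sample_10bit >> 8) & 0x3, sample_10bit & 0xFF


def extract_haptics(y_plane: list, pr_plane: list, pb_plane: list) -> list:
    """Collect ((row, col), (Fz, Mx, My)) for every pixel flagged by the motion focus map."""
    results = []
    for row, (y_row, pr_row, pb_row) in enumerate(zip(y_plane, pr_plane, pb_plane)):
        for col, (y, pr, pb) in enumerate(zip(y_row, pr_row, pb_row)):
            focus, fz = unpack_component(y)
            if focus == 0:
                continue  # assumed: no motion and no attention, so nothing to extract
            _, mx = unpack_component(pr)
            _, my = unpack_component(pb)
            results.append(((row, col), (fz, mx, my)))
    return results
```

The returned pixel positions correspond to the position information, and the tuples correspond to the force sense data extracted at those positions.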

<Flow of reception processing>
An example of the flow of the reception process executed by the reception device 200 will be described with reference to the flowchart of FIG.

When the reception process is started, the container processing unit 201 of the receiving device 200 receives the file transmitted from the transmitting device 100 or the like in step S201. In step S202, the container processing unit 201 analyzes the file (container) and extracts the haptics composite video coded stream.

In step S203, the decoding unit 202 decodes the haptics composite video coded stream and generates a haptics composite image. In step S204, the position information extraction unit 221 of the media information analysis unit 203 detects the ROI region. Further, in step S205, the position information extraction unit 221 extracts the position information.

In step S206, the force / tactile information extraction unit 223 extracts haptics data such as force / tactile data from the haptics composite image. In step S207, the force / tactile information extraction unit 223 outputs the extracted haptics data to the outside of the receiving device 200 (for example, another device).

In step S208, the physical space remapping unit 222 maps the position information (that is, haptics data) to the 3D physical space.

In step S209, the haptics presentation unit 204 arranges the extracted haptics data at a location indicated by the 3D physical space position information and presents the media.

When the process of step S209 is completed, the reception process is completed.

By executing each process as described above, the receiving device 200 can suppress an increase in the load of haptics data transmission.

<Number of dimensions of image sensor>
The number of dimensions captured by the image sensor (that is, the number of cameras that generate captured images) is arbitrary. For example, as shown in A of FIG. 23, a subject may be imaged using nine cameras. Further, as shown in B of FIG. 23, the subject may be imaged using three cameras. Further, the view may be a captured image of a virtual viewpoint generated from an actual captured image without a physical camera.

<4. Third Embodiment>
<Bidirectional transmission>
The transmission of the haptics data described above may be bidirectional, as shown in FIG. 24, for example. For example, the haptics data detected when a user who is a local operator operates a haptics device having a sensor may be converted into an image as described above and transmitted (forward) to the remote device as a haptics composite image (encoded data).

In this case, the remote device, which is a haptics device having an actuator, reproduces the movement of the haptics device on the local operator side by using the transmitted haptics data. When the remote device grasps an object by reproducing this movement, the force sense data, tactile data, etc. (haptics data) at that time are detected by the remote device (sensor) and transmitted (feedback) to the local operator side. In the case of this feedback as well, as in the case of forward transmission, the haptics data may be converted into an image as described above and transmitted as a haptics composite image (encoded data).

The haptics device on the local operator side uses the transmitted haptics data to reproduce the force and tactile sensation detected on the remote device. As a result, the user who is a local operator can experience the force and tactile sensation detected in the remote device through the haptics device.

By using the marker of interest in this way to select the haptics data to be fed back from among the plurality of haptics data that are forward-transmitted, it is possible to suppress the feedback transmission of unnecessary haptics data and thus to suppress an increase in the load of haptics data transmission.

<Remote control system>
FIG. 25 is a diagram illustrating an outline of a remote control system which is an embodiment of a communication system (information processing system) to which the present technology is applied. The remote control system 300 shown in FIG. 25 has a local system 301 and a remote system 302 that are remote from each other. The local system 301 and the remote system 302 each have a haptics device, communicate with each other via the network 310, and realize remote control of the haptics device by exchanging haptics data. For example, the operation input to one haptics device can be reproduced in the other haptics device.

Here, in the description, the system on the main side of communication is referred to as the local system 301, and the system on the other side of the communication is referred to as the remote system 302. However, the local system 301 and the remote system 302 are basically systems that can play the same roles as each other. Therefore, unless otherwise specified, the description of the local system 301 given below can also be applied to the remote system 302.

The configurations of the local system 301 and the remote system 302 are arbitrary. The configuration of the local system 301 and the configuration of the remote system 302 may be different from each other or may be the same as each other. Further, in FIG. 25, one local system 301 and one remote system 302 are shown, but the remote control system 300 can have an arbitrary number of local systems 301 and remote systems 302, respectively.

Further, the remote control system 300 can have an MPD server 303. The MPD server 303 performs processing related to registration and provision of MPD (Media Presentation Description) of DASH (Dynamic Adaptive Streaming over HTTP) to the local system 301 and the remote system 302. The local system 301 and the remote system 302 can use this MPD to select and acquire necessary information. Of course, the configuration of the MPD server 303 is also arbitrary, and the number thereof is also arbitrary.

Note that this MPD server 303 can be omitted. For example, the local system 301 or the remote system 302 may supply the MPD to the communication partner. Further, for example, the local system 301 and the remote system 302 may exchange haptics data without using MPD.

The network 310 is composed of, for example, a local area network, a network by a dedicated line, a WAN (Wide Area Network), the Internet, cellular communication, satellite communication, or any other wired communication network, wireless communication network, or both. Further, the network 310 may be composed of a plurality of communication networks.

<Local system>
FIG. 26 is a block diagram showing a main configuration example of the local system 301. Note that FIG. 26 shows the main elements such as processing units and data flows, and does not necessarily show everything. That is, in each device included in the local system 301, there may be a processing unit that is not shown as a block in FIG. 26, or there may be processing or a data flow that is not shown as an arrow or the like in FIG. 26.

As shown in FIG. 26, the local system 301 has a haptics device 321, a communication device 322, a digital interface 323, and a digital interface 324.

The haptics device 321 is a device that can serve as an interface for a user or a remote device, and generates haptics data or is driven based on haptics data. For example, the haptics device 321 can supply haptics data and the like to the communication device 322 via the digital interface 323. Further, the haptics device 321 can acquire haptics data and the like supplied from the communication device 322 via the digital interface 324.

The communication device 322 can communicate with another device via the network 310 (FIG. 25). The communication device 322 can, for example, exchange haptics data and exchange MPDs by the communication. Further, the communication device 322 can acquire haptics data and the like supplied from the haptics device 321 via the digital interface 323, for example. Further, the communication device 322 can supply haptics data and the like to the haptics device 321 via the digital interface 324. The digital interface 323 and the digital interface 324 are interfaces for digital devices of arbitrary standards such as USB (Universal Serial Bus) (registered trademark) and HDMI (High-Definition Multimedia Interface) (registered trademark).

The haptics device 321 includes an image sensor 331, an ROI setting unit 332, a motion pixel editing unit 333, a media information synthesis unit 334, a media information analysis unit 341, a renderer 342, an actuator 343, and a haptics interface (I / F) 344.

The ROI setting unit 332 is the same processing unit as the ROI setting unit 101 (FIG. 2), and can perform the same processing. The motion pixel editing unit 333 is the same processing unit as the motion pixel editing unit 102 (FIG. 2), and can perform the same processing. The media information synthesis unit 334 is the same processing unit as the media information synthesis unit 103 (FIG. 2), and can perform the same processing.

Further, the media information analysis unit 341 is the same processing unit as the media information analysis unit 203 (FIG. 20), and can perform the same processing. The renderer 342 is a processing unit similar to the haptics presentation unit 204 (FIG. 20), and can perform the same processing.

The communication device 322 includes a composer 351, a coding unit 352, a container processing unit 353, an MPD generation unit 354, an imaging unit 355, a video coding unit 356, a container processing unit 361, a decoding unit 362, an MPD control unit 363, a video decoding unit 364, and a display unit 365.

The coding unit 352 is the same processing unit as the coding unit 104 (FIG. 2), and can perform the same processing. The container processing unit 353 is the same processing unit as the container processing unit 105 (FIG. 2), and can perform the same processing.

Further, the container processing unit 361 is the same processing unit as the container processing unit 201 (FIG. 20), and performs the same processing. The decoding unit 362 is the same processing unit as the decoding unit 202 (FIG. 20), and performs the same processing.

The remote system 302 can also have the same configuration as the local system 301.

The local system 301 and the remote system 302 can perform bidirectional transmission (forward, feedback) of the haptic composite image as described with reference to FIG. 24, for example.

For example, when the image sensor 331 images the haptics interface 344 and supplies the captured image data to the ROI setting unit 332, the ROI setting unit 332 derives, from the captured image, the spatial coordinates (coordinates in a 3D coordinate system) of the observation points (for example, joints) of the haptics interface 344 and sets the ROI.

The image sensor 331 may have a plurality of cameras, and a plurality of captured images (a plurality of views) obtained by the plurality of cameras can be supplied to the ROI setting unit 332 and the motion pixel editing unit 333. Further, the image sensor 331 may have any sensor, for example, a magnetic sensor that detects position or movement, an ultrasonic sensor, a GPS (Global Positioning System) sensor, a gyro sensor that detects a motion state such as angular velocity, or an acceleration sensor that detects acceleration.

The motion pixel editing unit 333 generates a motion focus map using the supplied captured image, the ROI setting information, and the like. The media information synthesis unit 334 maps the haptics data detected by the sensor of the haptics interface 344 to a two-dimensional image using the motion focus map, and generates a haptics composite image. The media information synthesis unit 334 supplies the generated haptics composite image to the communication device 322 via the digital interface 323.

The composer 351 of the communication device 322 acquires the haptics composite image and supplies it to the coding unit 352. The coding unit 352 encodes the haptics composite image supplied from the composer 351 and generates coded data (haptics composite video coded stream). At that time, the coding unit 352 may encode the haptics composite image as a picture of a moving image, and may further add control information regarding the haptics data to each picture. The coding unit 352 supplies the coded data to the container processing unit 353. The container processing unit 353 stores the coded data in a file for transmission. For example, the container processing unit 353 may generate an ISOBMFF format file for storing the coded data. Further, the container processing unit 353 may store the control information related to the haptics data in the media box of the file. The container processing unit 353 transmits the file to the remote system 302 (forward).
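As a rough illustration of how coded data is stored in an ISOBMFF file, the sketch below frames a byte string in boxes, each consisting of a 32-bit big-endian size, a four-character type, and a payload. It only shows the box framing with an ftyp box and an mdat box; a usable file also needs further boxes such as moov, and the special size values 0 and 1 are not handled. The function names and the choice of the brand isom are assumptions for illustration, not what the container processing unit 353 necessarily does.

```python
import struct


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build one box: 32-bit big-endian size, 4-character type, then the payload."""
    if len(box_type) != 4:
        raise ValueError("box type must be exactly 4 bytes")
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def store_coded_data(coded_data: bytes) -> bytes:
    """Wrap coded data in a minimal ftyp + mdat sequence (not a complete file)."""
    # ftyp payload: major brand, minor version, one compatible brand.
    ftyp_payload = b"isom" + struct.pack(">I", 0) + b"isom"
    return make_box(b"ftyp", ftyp_payload) + make_box(b"mdat", coded_data)


def parse_boxes(data: bytes) -> list:
    """Split a byte string into (type, payload) pairs of top-level boxes."""
    boxes, offset = [], 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > len(data):
            raise ValueError("unsupported or truncated box")
        boxes.append((box_type, data[offset + 8:offset + size]))
        offset += size
    return boxes
```

On the receiving side, parse_boxes recovers the coded data from the mdat box, which corresponds to the analysis and extraction performed before decoding.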

The container processing unit 361 of the communication device 322 of the remote system 302 receives the file, analyzes it, and extracts the coded data (haptics composite video coded stream). The decoding unit 362 decodes the coded data (haptics composite video coded stream) and generates (restores) the haptics composite image. The decoding unit 362 supplies the haptics composite image to the haptics device 321 via the digital interface 324.

The media information analysis unit 341 of the haptics device 321 acquires the haptics composite image and extracts the haptics data. The renderer 342 renders using the haptics data and generates control information for the actuator 343.

The actuator 343 drives the haptics interface 344 in response to the control information. The haptics interface 344 functions as an interface for force sense data, tactile data, and the like for an operator who is a user, a remote device, and the like. That is, the haptics interface 344 of the remote system 302 is controlled by the actuator 343 and reproduces the movement (force sense, tactile sense, etc.) of the haptics interface 344 on the local system 301 side represented by the haptics data forward-transmitted as described above.

When the image sensor 331 of the remote system 302 images the haptics interface 344 and generates captured image data, the ROI setting unit 332 derives, from the captured image, the spatial coordinates (coordinates in a 3D coordinate system) of the observation points (for example, joints) of the haptics interface 344 and sets the ROI. The motion pixel editing unit 333 generates a motion focus map using the captured image, the ROI setting information, and the like. The media information synthesis unit 334 generates a haptics composite image using the motion focus map, and supplies the haptics composite image to the communication device 322 via the digital interface 323.

The composer 351 of the communication device 322 acquires the haptics composite image. The coding unit 352 encodes the haptics composite image and generates coded data (a haptics composite video coded stream). The container processing unit 353 stores the coded data in a file for transmission and transmits it to the local system 301 (feedback).

The container processing unit 361 of the communication device 322 of the local system 301 receives the file, analyzes it, and extracts the coded data (haptics composite video coded stream). The decoding unit 362 decodes the coded data (haptics composite video coded stream) and generates (restores) the haptics composite image. The decoding unit 362 supplies the haptics composite image to the haptics device 321 via the digital interface 324.

The actuator 343 drives the haptics interface 344 in response to the control information. The haptics interface 344 functions as an interface for force sense data, tactile sense data, and the like for an operator who is a user, a remote device, and the like. That is, the haptics interface 344 of the local system 301 is controlled by the actuator 343 and reproduces the information (force sense (reaction), tactile sense, etc.) detected at the haptics interface 344 on the remote system 302 side, as represented by the haptics data fed back as described above.

As described above, the local system 301 and the remote system 302 can realize bidirectional transmission of haptics data.
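The forward and feedback paths described above use the same chain in each direction: encode the haptics composite image, transmit it, decode it, and hand the restored image to the receiving side. The following is a minimal sketch of that round trip only. It assumes an 8-bit single-plane image held as a flat list, and it uses `zlib` purely as a stand-in for the video codec and the ISOBMFF container; the function names are hypothetical and do not come from this disclosure.

```python
import struct
import zlib


def encode_haptics_image(pixels, width, height):
    """Stand-in for the coding unit 352: losslessly compress an 8-bit image.

    zlib replaces the video encoder here only for illustration.
    """
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    header = struct.pack(">HH", width, height)  # image size, big-endian
    return header + zlib.compress(bytes(pixels))


def decode_haptics_image(coded):
    """Stand-in for the decoding unit 362: restore the pixels and image size."""
    width, height = struct.unpack(">HH", coded[:4])
    pixels = list(zlib.decompress(coded[4:]))
    return pixels, width, height


def round_trip(pixels, width, height):
    """Encode on the sending side, then decode on the receiving side."""
    return decode_haptics_image(encode_haptics_image(pixels, width, height))
```

Because the same two functions serve both directions, the local system and the remote system can each act as sender and receiver, which is the point of the bidirectional configuration.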

The local system 301 and the remote system 302 can generate and transmit an MPD, which is control information for controlling the reproduction of the haptics data, or receive the MPD to control the reproduction of the haptics data.

For example, the MPD generation unit 354 acquires a haptics composite image from the composer 351 and generates an MPD including control information regarding the haptics data included in the haptics composite image. The coding unit 352 encodes the MPD. The container processing unit 353 stores the coded data of the MPD in a transmission file, and transmits it to, for example, the MPD server 303. The container processing unit 353 may transmit the transmission file in which the coded data of the MPD is stored to the remote system 302.

For example, when the container processing unit 361 acquires the coded data of the MPD corresponding to the desired haptics data from the MPD server 303, the decoding unit 362 decodes the coded data to generate the MPD. The MPD control unit 363 can control the container processing unit 361 using the MPD and acquire desired haptics data.

Further, the local system 301 and the remote system 302 can also exchange data other than haptics data. For example, the imaging unit 355 captures a subject and generates captured image data, the video coding unit 356 encodes the captured image data, and the container processing unit 353 stores the coded data in a transmission file and transmits it to the remote system 302.

Further, the container processing unit 361 receives the transmission file and extracts the coded data of the captured image data, the video decoding unit 364 decodes it to generate the captured image data, and the display unit 365 displays the captured image corresponding to the captured image data on a monitor or the like.

As described above, this technology can be applied to bidirectional transmission as well, and an increase in the load of haptics data transmission can be suppressed.

Further, as described above, MPDs can be generated, supplied, and acquired. Therefore, for example, control information related to haptics data can be exchanged using this MPD. Therefore, this control information can be acquired before the haptics composite image (encoded data) is exchanged.

<MPD server>
FIG. 27 is a block diagram showing a main configuration example of the MPD server 303. In the MPD server 303 shown in FIG. 27, the CPU 401, ROM 402, and RAM 403 are connected to each other via the bus 404.

The input / output interface 410 is also connected to the bus 404. An input unit 411, an output unit 412, a storage unit 413, a communication unit 414, and a drive 415 are connected to the input / output interface 410.

The input unit 411 may include any input device such as a keyboard, a mouse, a microphone, a touch panel, an image sensor, a motion sensor, and various other sensors. Further, the input unit 411 may include an input terminal. The output unit 412 may include any output device, such as a display, a projector, a speaker, and the like. Further, the output unit 412 may include an output terminal.

The storage unit 413 includes, for example, an arbitrary storage medium such as a hard disk, a RAM disk, or a non-volatile memory, and a storage control unit that writes or reads information from the storage medium. The communication unit 414 includes, for example, a network interface. The drive 415 drives an arbitrary removable recording medium 421 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, and writes or reads information from the removable recording medium 421.

In the MPD server 303 configured as described above, the CPU 401 loads the program stored in the storage unit 413 into the RAM 403 via the input / output interface 410 and the bus 404 and executes it, thereby realizing the various functions indicated by the functional blocks described later. The RAM 403 also appropriately stores data and the like necessary for the CPU 401 to execute various processes of the program.

The program executed by the computer can be provided by being recorded on the removable recording medium 421 as a package medium or the like, for example. In that case, the program can be installed in the storage unit 413 via the input / output interface 410 by mounting the removable recording medium 421 in the drive 415.

This program can also be provided via a wired or wireless transmission medium such as a local area network, a leased line network, a WAN, the Internet, or satellite communication. In that case, the program can be received by the communication unit 414 and installed in the storage unit 413.

In addition, this program can be installed in advance in ROM 402 or storage unit 413.

<Container configuration example>
The haptics composite image sent and received in the remote control system 300 is stored in, for example, an ISOBMFF (ISO Base Media File Format) format container (transmission file). In the case of ISOBMFF, as shown in FIG. 28, the container has an IS (Initialization Segment) and an MS (Media Segment). The track identification information (trackID), time stamp (Timestamp), etc. are stored in the MS. In addition, the MPD is associated with this MS. Furthermore, SEI (Supplemental Enhancement Information) for haptics data can be stored in this MS.

The SEI for the haptics data (Haptics_data_embedding_information SEI) is added to the coded data, for example for each frame of the haptics composite image, when the haptics composite image is encoded as a picture of a moving image by the coding unit 104 or the like, and contains control information about the haptics data mapped to that frame.

<SEI>
FIGS. 29, 30, and 31 show an example of the syntax of the SEI (Haptics_data_embedding_information SEI) for the haptics data. In addition, FIGS. 32 and 33 show examples of the semantics. By referring to this SEI, the receiving side can obtain, for example, information indicating the range of the ROI (ROI_start_horizontal, ROI_start_vertical, ROI_end_horizontal, ROI_end_vertical) and the like, as shown in FIG. 29.
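As an illustration of how a receiver might use the ROI range signaled in this SEI, the following sketch crops the ROI out of a decoded picture. The four field names are taken from the text above; the container class `HapticsRoi`, the function `crop_roi`, the row-major list-of-lists image layout, and the treatment of the end coordinates as inclusive are all assumptions made for this example, since the actual semantics are defined in the figures.

```python
from dataclasses import dataclass


@dataclass
class HapticsRoi:
    # Field names mirror the SEI syntax elements named in the text.
    ROI_start_horizontal: int
    ROI_start_vertical: int
    ROI_end_horizontal: int
    ROI_end_vertical: int


def crop_roi(image, roi):
    """Return the part of `image` (a list of rows) that lies inside the ROI.

    The end coordinates are treated as inclusive, which is an assumption.
    """
    rows = image[roi.ROI_start_vertical:roi.ROI_end_vertical + 1]
    return [
        row[roi.ROI_start_horizontal:roi.ROI_end_horizontal + 1]
        for row in rows
    ]
```

Restricting later extraction to the cropped region means the receiver only inspects pixels that can actually carry haptics data for that frame.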

<MPD>
In addition, control information related to haptics data can also be described in the MPD as described above.

For example, in control based on the MPD, the receiving side can access the MPD server 303, acquire and analyze the MPD file, and select an appropriate bit rate in consideration of the bandwidth available on the receiving network. Further, depending on the device configuration on the receiving side, it is also possible to control the selection of the composite image to be distributed so as to be within the reproducible range.

For example, in the MPD, a new schema may be defined using the Supplementary descriptor. When the transmitting side receives a bit rate request via the server, the media information synthesis unit 334 may perform control so as to satisfy the request. A description example of the MPD is shown in FIGS. 34 and 35. In this example, the total coded bit rate can be selected from 4 Mbps and 2 Mbps. The target rate can be achieved by switching the parameters of sensor views and motion map according to this selection.
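The rate selection described above can be sketched as follows. This is only an illustrative policy, not the method of this disclosure: it assumes the receiver already knows its available bandwidth and simply picks the highest offered total bit rate that fits, falling back to the lowest one otherwise. The function name and the fallback rule are hypothetical; the 4 Mbps and 2 Mbps defaults come from the example in the text.

```python
def select_total_bitrate(available_bps, offered_bps=(4_000_000, 2_000_000)):
    """Pick the highest offered total bit rate that fits the available bandwidth.

    Falls back to the lowest offered rate when none fits (an assumed policy).
    """
    fitting = [rate for rate in offered_bps if rate <= available_bps]
    return max(fitting) if fitting else min(offered_bps)
```

For instance, a receiver measuring about 3 Mbps of available bandwidth would request the 2 Mbps representation under this policy.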

Further, in the bidirectional haptics data transmission described above, FIG. 36 shows an example of an MPD including control information regarding the haptics data transmitted from the local system 301 to the remote system 302. Further, FIG. 37 shows an example of an MPD including control information regarding haptics data transmitted from the remote system 302 to the local system 301. An example of the semantics of MPD elements is shown in FIG.

<Media segment>
The control information related to the haptics data can also be stored in the ISOBMFF media box (for example, hpmb in FIG. 28). For example, the parameters shown in FIG. 39 can be stored in hpmb (hptc_mediabox). Of course, parameters other than these may be stored in hpmb.
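For orientation, a basic ISOBMFF box consists of a 32-bit big-endian size (covering the 8-byte header plus the payload), a four-character type, and the payload. The sketch below serializes and parses such boxes, using `hpmb` as the type. The payload is treated as opaque bytes because the actual parameter layout is given only in FIG. 39; the function names are hypothetical, and the 64-bit `largesize` form and the size-0 "extends to end of file" form are deliberately not handled.

```python
import struct


def make_box(box_type, payload):
    """Serialize a basic ISOBMFF box: 32-bit size, 4-character type, payload."""
    if len(box_type) != 4:
        raise ValueError("box type must be exactly 4 bytes")
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def parse_boxes(data):
    """Yield (type, payload) for each box in a flat sequence of boxes."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        # Sizes 0 and 1 have special meanings in ISOBMFF; this sketch rejects them.
        if size < 8 or offset + size > len(data):
            raise ValueError("malformed or unsupported box")
        yield box_type, data[offset + 8:offset + size]
        offset += size
```

A receiver could scan the media segment with `parse_boxes` and pass the payload of the `hpmb` box to whatever parser implements the parameters of FIG. 39.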

<5. Addendum>
<Computer>
The series of processes described above can be executed by hardware or software. When a series of processes are executed by software, the programs constituting the software are installed on the computer. Here, the computer includes a computer embedded in dedicated hardware, a general-purpose personal computer capable of executing various functions by installing various programs, and the like.

FIG. 40 is a block diagram showing a configuration example of the hardware of a computer that executes the above-mentioned series of processes by a program.

In the computer 900 shown in FIG. 40, the CPU (Central Processing Unit) 901, the ROM (Read Only Memory) 902, and the RAM (Random Access Memory) 903 are connected to each other via the bus 904.

The input / output interface 910 is also connected to the bus 904. An input unit 911, an output unit 912, a storage unit 913, a communication unit 914, and a drive 915 are connected to the input / output interface 910.

The input unit 911 includes, for example, a keyboard, a mouse, a microphone, a touch panel, an input terminal, and the like. The output unit 912 includes, for example, a display, a speaker, an output terminal, and the like. The storage unit 913 includes, for example, a hard disk, a RAM disk, a non-volatile memory, or the like. The communication unit 914 includes, for example, a network interface. The drive 915 drives a removable recording medium 921 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

In the computer configured as described above, the CPU 901 loads the program stored in the storage unit 913 into the RAM 903 via the input / output interface 910 and the bus 904 and executes it, whereby the above-described series of processes is performed. The RAM 903 also appropriately stores data and the like necessary for the CPU 901 to execute various processes.

The program executed by the computer can be provided by being recorded on, for example, a removable recording medium 921 as a package medium or the like. In that case, the program can be installed in the storage unit 913 via the input / output interface 910 by mounting the removable recording medium 921 in the drive 915.

This program can also be provided via a wired or wireless transmission medium such as a local area network, a leased line network, a WAN, the Internet, or satellite communication. In that case, the program can be received by the communication unit 914 and installed in the storage unit 913.

In addition, this program can be installed in advance in ROM 902 or storage unit 913.

<Applicable target of this technology>
In the above, as an application example of the present technology, each device of the transmitting device 100, the receiving device 200, the remote control system 300, and the like has been described, but the present technology can be applied to any configuration.

For example, the present technology can be applied to various electronic devices, such as transmitters and receivers (for example, television receivers and mobile phones) for satellite broadcasting, cable broadcasting such as cable TV, distribution over the Internet, a local area network, a dedicated line network, or a WAN, and distribution to terminals by cellular communication, and devices (for example, hard disk recorders and cameras) that record images on media such as optical disks, magnetic disks, and flash memories and reproduce images from these storage media.

Further, for example, the present technology can be implemented as a configuration of a part of a device, such as a processor (for example, a video processor) as a system LSI (Large Scale Integration) or the like, a module (for example, a video module) using a plurality of processors, a unit (for example, a video unit) using a plurality of modules, or a set (for example, a video set) in which other functions are added to a unit.

Also, for example, the present technology can be applied to a network system composed of a plurality of devices. For example, the present technology may be implemented as cloud computing in which processing is shared and performed jointly by a plurality of devices via a network. For example, the present technology may be implemented in a cloud service that provides services related to images (moving images) to arbitrary terminals such as computers, AV (Audio Visual) devices, portable information processing terminals, and IoT (Internet of Things) devices.

In the present specification, a system means a set of a plurality of components (devices, modules (parts), etc.), and it does not matter whether all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and a single device in which a plurality of modules are housed in one housing, are both systems.

<Others>
The embodiment of the present technology is not limited to the above-described embodiment, and various changes can be made without departing from the gist of the present technology.

For example, the configuration described as one device (or processing unit) may be divided and configured as a plurality of devices (or processing units). Conversely, the configurations described above as a plurality of devices (or processing units) may be collectively configured as one device (or processing unit). Further, of course, a configuration other than the above may be added to the configuration of each device (or each processing unit). Further, if the configuration and operation of the entire system are substantially the same, a part of the configuration of one device (or processing unit) may be included in the configuration of another device (or other processing unit).

Further, for example, the above-mentioned program may be executed in any device. In that case, the device may have necessary functions (functional blocks, etc.) so that necessary information can be obtained.

Further, for example, each step of one flowchart may be executed by one device, or may be shared and executed by a plurality of devices. Further, when a plurality of processes are included in one step, the plurality of processes may be executed by one device, or may be shared and executed by a plurality of devices. In other words, a plurality of processes included in one step can be executed as processes of a plurality of steps. On the contrary, the processes described as a plurality of steps can be collectively executed as one step.

Further, for example, in a program executed by a computer, the processing of the steps describing the program may be executed in chronological order in the order described in the present specification, or may be executed in parallel, or individually at a required timing such as when a call is made. That is, as long as there is no contradiction, the processing of each step may be executed in an order different from the above-mentioned order. Further, the processing of the steps describing this program may be executed in parallel with the processing of another program, or may be executed in combination with the processing of another program.

Further, for example, each of the plurality of technologies related to the present technology can be implemented independently as long as there is no contradiction. Of course, any plurality of the present technologies can be used in combination. For example, some or all of the techniques described in any of the embodiments may be combined with some or all of the techniques described in other embodiments. It is also possible to carry out a part or all of any of the above-mentioned techniques in combination with other techniques not described above.

The present technology can also have the following configurations.
(1) An information processing device including: a pixel mapping unit that maps haptics data detected at an observation point of a haptics device serving as an interface to pixels of a two-dimensional image; and
a coding unit that encodes the two-dimensional image to which the haptics data is mapped by the pixel mapping unit and generates coded data.
(2) The information processing apparatus according to (1), wherein the pixel mapping unit maps the haptics data to pixels corresponding to detection positions of the haptics data in the two-dimensional image.
(3) The information processing apparatus according to (1) or (2), wherein the pixel mapping unit arranges the haptics data in each of the Y component, Pr component, and Pb component of the two-dimensional image.
(4) The information processing apparatus according to (3), wherein the haptics data includes force sense data including information on force sense, and
the pixel mapping unit divides the force sense data into components and arranges them in the Y component, the Pr component, and the Pb component of the two-dimensional image.
(5) The information processing apparatus according to (3) or (4), wherein the haptics data includes tactile data including information on tactile sensation, and
the pixel mapping unit divides the tactile data in the bit depth direction and arranges the tactile data in the Y component, the Pr component, and the Pb component of the two-dimensional image.
(6) The information processing apparatus according to any one of (1) to (5), wherein the pixel mapping unit divides one piece of haptics data in the bit depth direction and arranges it in the pixels of a sub-block composed of a plurality of pixels.
(7) The information processing apparatus according to (6), wherein the pixel mapping unit arranges the haptics data sampled a plurality of times in a plurality of the sub-blocks included in one block for each sampled data.
(8) The information processing apparatus according to any one of (1) to (7), further including a composite image generation unit that generates a composite image in which the two-dimensional image to which the haptics data is mapped by the pixel mapping unit and an image indicating the position of movement are combined, wherein
the coding unit encodes the composite image and generates the coded data.
(9) The information processing apparatus according to (8), wherein the image showing the position of the movement further indicates the position of interest.
(10) The information processing apparatus according to any one of (1) to (9), wherein the coding unit encodes the two-dimensional image to which the haptics data is mapped as a picture of a moving image, and further adds control information related to the haptics data to each of the pictures.
(11) The information processing apparatus according to any one of (1) to (10), further comprising an MPD generation unit that generates an MPD including control information related to the haptics data.
(12) The information processing device according to any one of (1) to (11), further including a file generation unit that generates an ISOBMFF format file for storing the coded data, wherein
the file generation unit stores control information related to the haptics data in a media box of the file.
(13) An information processing method including: mapping haptics data detected at an observation point of a haptics device serving as an interface to pixels of a two-dimensional image; and
encoding the two-dimensional image to which the haptics data is mapped to generate coded data.
(14) An information processing device including: a decoding unit that decodes coded data and generates a two-dimensional image to which haptics data detected at an observation point of a haptics device serving as an interface is mapped; and
an extraction unit that extracts the haptics data from the two-dimensional image generated by the decoding unit.
(15) An information processing method including: decoding coded data to generate a two-dimensional image to which haptics data detected at an observation point of a haptics device serving as an interface is mapped; and
extracting the haptics data from the generated two-dimensional image.
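
As an illustration of configurations (4) and (6) above, the following sketch shows one possible way to place three force components in the Y, Pb, and Pr planes and to split a single sample in the bit depth direction across the pixels of a sub-block. The 16-bit sample width, the 8-bit pixel depth, the most-significant-first ordering, the assignment of components to planes, and all function names are assumptions made for this example; the configurations themselves do not fix these details.

```python
def split_bit_depth(value, total_bits=16, pixel_bits=8):
    """Split an unsigned sample into pixel-sized chunks, most significant first."""
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value out of range")
    count = -(-total_bits // pixel_bits)  # ceiling division
    mask = (1 << pixel_bits) - 1
    return [(value >> (pixel_bits * (count - 1 - i))) & mask for i in range(count)]


def join_bit_depth(chunks, pixel_bits=8):
    """Inverse of split_bit_depth: rebuild the sample from its chunks."""
    value = 0
    for chunk in chunks:
        value = (value << pixel_bits) | chunk
    return value


def map_force_to_planes(force_xyz, x, y, planes):
    """Write three force components to the Y, Pb, and Pr planes at pixel (x, y).

    The component-to-plane assignment is an assumption for illustration.
    """
    for name, component in zip(("Y", "Pb", "Pr"), force_xyz):
        planes[name][y][x] = component
```

On the receiving side, the extraction unit of configuration (14) would apply the inverse steps, for example `join_bit_depth` over the pixels of each sub-block.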

100 transmitting device, 101 ROI setting unit, 102 motion pixel editing unit, 103 media information synthesis unit, 104 coding unit, 105 container processing unit, 131 motion image generation unit, 132 pixel editing unit, 141 pixel mapping unit, 142 composite image generation unit, 200 receiving device, 201 container processing unit, 202 decoding unit, 203 media information analysis unit, 204 haptics presentation unit, 221 position information extraction unit, 222 physical space remapping unit, 223 force / tactile information extraction unit, 300 remote control system, 301 local system, 302 remote system, 303 MPD server, 321 haptics device, 322 communication device, 331 image sensor, 332 ROI setting unit, 333 motion pixel editing unit, 334 media information synthesis unit, 341 media information analysis unit, 342 renderer, 343 actuator, 344 haptics interface, 351 composer, 352 coding unit, 353 container processing unit, 354 MPD generation unit, 361 container processing unit, 362 decoding unit, 363 MPD control unit

Claims

An information processing device comprising: a pixel mapping unit that maps haptics data detected at an observation point of a haptics device serving as an interface to pixels of a two-dimensional image; and
a coding unit that encodes the two-dimensional image to which the haptics data is mapped by the pixel mapping unit and generates coded data.
The information processing device according to claim 1, wherein the pixel mapping unit maps the haptics data to pixels corresponding to detection positions of the haptics data in the two-dimensional image.
The information processing apparatus according to claim 1, wherein the pixel mapping unit arranges the haptics data in each of the Y component, Pr component, and Pb component of the two-dimensional image.
The information processing apparatus according to claim 3, wherein the haptics data includes force sense data including information on force sense, and
the pixel mapping unit divides the force sense data into components and arranges them in the Y component, the Pr component, and the Pb component of the two-dimensional image.
The information processing apparatus according to claim 3, wherein the haptics data includes tactile data including information on tactile sensation, and
the pixel mapping unit divides the tactile data in the bit depth direction and arranges the tactile data in the Y component, the Pr component, and the Pb component of the two-dimensional image.
The information processing device according to claim 1, wherein the pixel mapping unit divides one haptics data in the bit depth direction and arranges the haptics data in each pixel of a subblock composed of a plurality of pixels.
The information processing apparatus according to claim 6, wherein the pixel mapping unit arranges the haptics data sampled a plurality of times in a plurality of the sub-blocks included in one block for each sampled data.
The information processing apparatus according to claim 1, further comprising a composite image generation unit that generates a composite image obtained by combining the two-dimensional image to which the haptics data is mapped by the pixel mapping unit and an image indicating the position of movement, wherein
the coding unit encodes the composite image and generates the coded data.
The information processing device according to claim 8, wherein the image showing the position of the movement further indicates the position of interest.
The information processing device according to claim 1, wherein the coding unit encodes the two-dimensional image to which the haptics data is mapped as a picture of a moving image, and adds control information related to the haptics data to each picture.
The information processing apparatus according to claim 1, further comprising an MPD generation unit that generates an MPD including control information related to the haptics data.
The information processing device according to claim 1, further comprising a file generation unit that generates an ISOBMFF format file for storing the coded data, wherein
the file generation unit stores control information related to the haptics data in a media box of the file.
An information processing method comprising: mapping haptics data detected at an observation point of a haptics device serving as an interface to pixels of a two-dimensional image; and
encoding the two-dimensional image to which the haptics data is mapped to generate coded data.
An information processing device comprising: a decoding unit that decodes coded data and generates a two-dimensional image to which haptics data detected at an observation point of a haptics device serving as an interface is mapped; and
an extraction unit that extracts the haptics data from the two-dimensional image generated by the decoding unit.
An information processing method comprising: decoding coded data to generate a two-dimensional image to which haptics data detected at an observation point of a haptics device serving as an interface is mapped; and
extracting the haptics data from the generated two-dimensional image.