WO2018084051A1

WO2018084051A1 - Information processing device, head-mounted display, information processing system, and information processing method

Info

Publication number: WO2018084051A1
Application number: PCT/JP2017/038524
Authority: WO
Inventors: 大場　章男
Original assignee: 株式会社ソニー・インタラクティブエンタテインメント
Priority date: 2016-11-01
Filing date: 2017-10-25
Publication date: 2018-05-11
Also published as: JP6645949B2; JP2018074486A

Abstract

An image capturing device 12 uses a rolling shutter camera to capture moving images (frames f_n-1, f_n, f_n+1) having a time difference in an observation time period for each row in a frame. A display device 16 immediately displays the data of the captured images for each row in the order of acquisition. An information processing device 10 corrects position coordinates of a feature point extracted in each frame to position coordinates at respective reference time points, and performs analysis (images 204a, 204b).

Description

Information processing apparatus, head mounted display, information processing system, and information processing method

The present invention relates to an information processing apparatus that performs information processing using captured images, a head-mounted display having a capturing function, an information processing system that displays images using captured images, and an information processing method using the captured images.

A game is known in which a part of the body, such as the user's head, is photographed with a video camera, a predetermined area such as the eyes, mouth, hand, etc. is extracted and the area is replaced with another image and displayed on the display. (For example, refer to Patent Document 1). There is also known a user interface system that receives mouth and hand movements taken by a video camera as application operation instructions. In this way, technology that captures the real world and displays a virtual world that reacts to its movement, or performs some kind of information processing, is used in a wide range of fields, from small mobile terminals to leisure facilities, regardless of their scale. ing.

European Patent Application No. 0999518

Immediateness from shooting to output of results is important in order to realize realistic image expression and to perform information processing with high accuracy. In particular, when a subject's movement is immediately reflected in the displayed image, or when an image pickup device is provided on the head-mounted display worn by the user to display an image corresponding to the field of view, a slight delay in processing is uncomfortable and easy to use. Produce badness. In order to pursue immediacy, if it is attempted to reduce the amount of data transfer and processing load, it may be considered that sufficient accuracy cannot be obtained for information processing.

The present invention has been made in view of these problems, and an object of the present invention is to provide a technique that achieves both immediacy and processing accuracy in information processing and display using captured images.

An aspect of the present invention relates to an information processing apparatus. The information processing apparatus includes a captured image acquisition unit that acquires data of a captured moving image from an imaging apparatus that includes a rolling shutter that captures an image with a time lag for each row of pixels, and a frame of the moving image. A correction unit that corrects the position coordinates of the feature points to the position coordinates at the reference time of the frame; and an analysis processing unit that performs image analysis using the corrected position coordinates and reflects the result in the output data. It is characterized by that.

Another aspect of the present invention relates to a head mounted display. This head-mounted display has a rolling shutter that captures images with a time lag for each row of pixels, an image pickup unit that outputs captured image data sequentially from the row after shooting, and a shooting for each row of pixels. A display unit that acquires image data and displays the image data sequentially from the line for which acquisition has been completed.

Still another aspect of the present invention relates to an information processing system. This information processing system includes a rolling shutter that captures an image with a time lag for each row of pixels, and is captured from an imaging unit that sequentially outputs captured image data from the row where the shooting is completed, and the imaging unit. Analysis of acquiring the moving image data, correcting the position coordinates of the feature points in the frame of the moving image to the position coordinates of the frame at the reference time, and performing image analysis using the corrected position coordinates A processing unit, an output data generation unit that generates display image data using the result of image analysis and outputs the data for each row, and a display unit that sequentially displays the display image from the output row It is characterized by.

Still another aspect of the present invention relates to an information processing method. In this information processing method, the information processing device obtains captured moving image data from an imaging device including a rolling shutter that captures an image with a time lag for each row of pixels, and a frame of the moving image. Correcting the position coordinates of the feature points to the position coordinates at the reference time of the frame, performing image analysis using the corrected position coordinates, and outputting data reflecting the analysis results. It is characterized by including.

Note that any combination of the above-described components, and the expression of the present invention converted between a method, an apparatus, a system, a computer program, a recording medium on which the computer program is recorded, and the like are also effective as an aspect of the present invention. .

According to the present invention, it is possible to achieve both immediacy and processing accuracy in information processing and display using captured images.

It is a figure which shows the structural example of the information processing system of this Embodiment. It is a figure which shows the example of an external appearance shape when the display apparatus of this Embodiment is used as a head mounted display. It is a figure for demonstrating the relationship between the acquisition order of the imaging | photography data by the rolling shutter in this Embodiment, and an image plane. It is a figure for demonstrating the difference of the analysis of the picked-up image by a global shutter and a rolling shutter. It is a figure for demonstrating progress of time from imaging | photography to a display when transfer time is considered. It is a figure which shows the internal circuit structure of the information processing apparatus in this Embodiment. It is a figure which shows the structure of the functional block of the information processing apparatus in this Embodiment. It is a figure for demonstrating the method of the correction process which a correction | amendment part performs in this Embodiment. It is a figure for demonstrating the method in which the correction | amendment part calculates | requires the corrected optical flow in this Embodiment. It is a figure for demonstrating the correction process at the time of considering pure exposure time in this Embodiment. It is a figure which shows typically the temporal relationship of the picked-up image in this Embodiment, the image used for image analysis, and a display image.

FIG. 1 shows a configuration example of an information processing system according to the present embodiment. The information processing system 1 includes an imaging device 12 that captures a real space, an information processing device 10 that performs information processing based on a captured image, and a display device 16 that displays an image output by the information processing device 10. The information processing apparatus 10 may be connectable to a network 18 such as the Internet.

The information processing apparatus 10, the imaging apparatus 12, the display apparatus 16, and the network 18 may be connected by a wired cable, or may be wirelessly connected by a wireless LAN (Local Area Network) or the like. Any two or all of the imaging device 12, the information processing device 10, and the display device 16 may be combined and integrally provided. For example, the information processing system 1 may be realized by a portable terminal or a head mounted display equipped with them. In any case, the external shapes of the imaging device 12, the information processing device 10, and the display device 16 are not limited to those illustrated.

The imaging device 12 performs imaging processing such as a CMOS (Complementary Metal Metal Oxide Semiconductor) sensor that captures an object at a predetermined frame rate, and performs demosaic processing, lens distortion correction, color correction, and the like on the output data. Including an image processing mechanism for generating The image processing mechanism may include a mechanism for generating image data with a plurality of resolutions by reducing an image. The imaging device 12 may be a so-called stereo camera in which two cameras are arranged on the left and right at a known interval.

The imaging device 12 transmits the data of the captured and generated image to the information processing device 10 in a stream format in order from the uppermost pixel row of the image. When the imaging device 12 generates image data with a plurality of resolutions, for example, only resolution and area data in accordance with a request from the information processing device 10 may be transmitted. The information processing apparatus 10 performs image analysis on the data transmitted from the imaging apparatus 12 and performs information processing based on the result, or reflects the data in a data request to the imaging apparatus 12. Further, the information processing apparatus 10 transmits output data such as a display image and sound to the display device 16.

Here, the content of the output data is not particularly limited, and may vary depending on the function requested by the user in the system and the content of the activated application. For example, the information processing apparatus 10 may transmit the captured image data transmitted from the imaging apparatus 12 as it is to the display device 16 so that the captured image is displayed immediately. Or you can acquire the position and orientation of the object in the captured image by image analysis, and apply some processing to the captured image based on it, or progress the electronic game based on the result of the image analysis and generate a game screen Good. Typical examples of such a mode include virtual reality (VR) and augmented reality (AR).

The display device 16 includes a display such as liquid crystal, plasma, or organic EL that outputs an image, and a speaker that outputs sound, and outputs output data transmitted from the information processing device 10 as an image or sound. The display device 16 may be a television receiver, various monitors, a display screen of a mobile terminal, an electronic viewfinder of a camera, or a head-mounted display that is attached to the user's head and displays an image in front of the user's eyes.

FIG. 2 shows an example of the external shape when the display device 16 is a head mounted display. In this example, the head mounted display 100 includes an output mechanism unit 102 and a mounting mechanism unit 104. The mounting mechanism unit 104 includes a mounting band 106 that goes around the head when the user wears to fix the device.

The output mechanism unit 102 includes a housing 108 shaped to cover the left and right eyes when the user mounts the head mounted display 100, and includes a display panel inside so as to face the eyes when worn. The housing 108 may further include a lens that is positioned between the display panel and the user's eyes when the head mounted display 100 is attached, and that enlarges the viewing angle of the user. Further, the head mounted display 100 may further include a speaker or an earphone at a position corresponding to the user's ear when worn.

In this example, the head mounted display 100 includes the stereo camera 110 on the front surface of the housing 108 as the imaging device 12, and takes a moving image of the surrounding real space with a field of view corresponding to the user's line of sight. In this case, the information processing apparatus 10 can identify the position and orientation of the user's head relative to the surrounding environment by analyzing the captured image using SLAM (SimultaneousaneLocalization and Mapping) technology. Using this information, for example, if a visual field for the virtual world is determined and display images for left-eye viewing and right-eye viewing are generated and displayed, a VR as if the virtual world has spread in front of the eyes can be realized. . At this time, the information processing apparatus 10 may be an external apparatus that can establish communication with the head mounted display 100 or may be built in the head mounted display 100.

As described above, since the information processing system 1 according to the present embodiment can be applied to various modes, the configuration and appearance of each device may be appropriately determined accordingly. In the following, a description will be given with a focus on a technique that achieves both immediacy from shooting in real space to image display and analysis accuracy of the shot image. For this purpose, in the present embodiment, a rolling shutter camera in which shooting timing is shifted for each row is employed as the imaging device 12.

FIG. 3 is a diagram for explaining the relationship between the image data acquisition order by the rolling shutter and the image plane. The upper part of the figure shows an exposure time 142 and a data read time 144 for each row of pixel columns in the horizontal direction (x axis) on a plane 140 formed by the vertical direction (y axis) of the imaging surface and the time axis. The rolling shutter camera is a “line exposure sequential readout” type camera. As shown in the figure, the exposure time is shifted for each row of the imaging surface, and data for each frame is obtained by reading the data of each row immediately after the exposure is completed. .

Generally, analysis and display are performed by expanding the data for each row acquired in this way on a plane 146 formed by the x-axis and the y-axis and handling them as image frames at the same time. Strictly speaking, however, the actual observation time differs depending on the difference in exposure time between the object that appears above the photographed image and the object that moves down, so that the analysis result may include an error due to the difference. The error becomes larger as the movement of the object or the imaging surface becomes faster. In order to eliminate such an error, it is conceivable to employ a global shutter camera using an image sensor such as a CCD.

The global shutter camera is a “simultaneous exposure batch readout” type camera, and all rows are exposed at the same timing, so there is no difference in the observation time of objects appearing in one frame. FIG. 4 is a diagram for explaining a difference in image analysis when a subject moves. In the figure, (a) is a global shutter camera, (b) is a rolling shutter camera, and the same subject (or a feature point included therein) is photographed. It represents about. That is, the minimum unit solid line rectangle (rectangle or parallelogram) represents exposure for one frame.

First, in the case of the global shutter shown in (a), since all rows are exposed during the same period, the reference times t ₀ , t ₁ , t ₂ ,. There is no deviation depending on the position in the y-axis direction between the actual observation time. On the other hand, in the case of the rolling shutter shown in (b), since the exposure timing differs depending on the row as described above, at the actual observation time with respect to the reference times t ₀ , t ₁ , t ₂ ,. A shift depending on the position in the y-axis direction occurs.

In the case of the figure, since the exposure time of the uppermost row on the imaging surface is set as the reference time, the error increases as the subject is at the lower side. When such photographed images are collectively analyzed as data of reference times t ₀ , t ₁ , t ₂ ,..., As shown by white circles, they are recognized faster or slower depending on the moving direction of the subject. Error occurs. Depending on the contents of the image analysis, it is conceivable that an error occurs in the analysis result of not only the moving speed but also the position and orientation, or the detection target is erroneously recognized.

On the other hand, even in the case of a global shutter, data reading and transfer are sequentially performed in units of rows, which may be disadvantageous from the viewpoint of immediacy in image analysis and display. FIG. 5 is a diagram for explaining the passage of time from shooting to display when the transfer time is taken into account. (A) shows the timing of data processing for one frame when displaying an image shot with a global shutter camera and (b) showing an image shot with a rolling shutter camera on the display device 16, respectively. Although the information processing apparatus 10 may be interposed in the data transfer path, it is omitted in FIG.

For global shutter shown in (a), as shown in rectangular 110a by thick lines, the exposure of all the rows at time tg ₀ is made at the same time. On the other hand, the output from the imaging device 12 is made in order from the upper row due to the limitation of the transmission band, as indicated by a dotted line in the rectangle 110a. For example, the upper line of the image is output at an early timing indicated by an arrow a, and the lower line is output at a later timing indicated by an arrow a ′. That is, the lower row needs to wait for Δt before data output. As a result, it takes time tg ₁ -tg _{0 for} the imaging device 12 to output data of all rows.

The data thus output is stored in the frame buffer of the display device 16 via the information processing device 10 and displayed. The dotted line in the rectangle 112a represents the timing at which the data of each row is stored in the display device 16. Also in this case, the upper row of the image is stored at an early timing as indicated by an arrow a, and the lower row is stored at a later timing as indicated by an arrow a ′. That is, the upper row needs to wait for Δt ′ before all data is stored. As a result, it takes time tg ₃ -tg ₂ for the display device 16 to store all rows of data in the frame buffer.

In the figure, the time at which the storage in the frame buffer is completed is tg ₃ , but in the case of a display device premised on the frame buffer, an adjustment time corresponding to the driving method is further generated until the actual display. . The example shown in the drawing is a simple mode in which a captured image is displayed as it is. However, even if some processing or drawing is performed in the information processing apparatus 10, as long as the transmission band is limited, the time for sequentially transmitting data and the frame As the time for storing in the buffer, at least tg ₃ -tg ₀ is required from shooting to display. The same waiting time also occurs when the information processing apparatus 10 is provided with a frame buffer and performs image analysis or the like.

On the other hand, in the case of the rolling shutter shown in (b), since the exposure itself is performed in order from the upper row, the time of tr ₁ -tr ₀ is required to expose all the rows as indicated by the thick line in the rectangle 110b. Cost. However, since data is output immediately from the line where the exposure is completed, the time from shooting to output does not depend on the line, as illustrated by arrows b and b ′. That is, the standby time Δt that has occurred in the global shutter does not occur.

In consideration of the characteristics of the rolling shutter, a display corresponding to a line buffer that can output immediately for each row without waiting for storage of data for one frame in the frame buffer is preferably adopted as the display device 16. Then, as indicated by a rectangle 112b, the display progresses sequentially in the time from time tr ₂ to tr ₃ so as to synchronize with the output timing from the imaging device 12. As a result, the waiting time Δt ′ generated in the display device based on the frame buffer does not occur.

Considering such standby times Δt and Δt ′, the time difference from imaging to display is the shortest in the combination of the rolling shutter camera and the line buffer compatible display device shown in FIG. That is, on the premise of the time difference due to the scanning of the display screen, the latest information can be displayed as an image by providing the observation time difference for each line. Note that a field emission display (FED: Field として Emission Display) may be employed as a display of a method for immediately displaying input data.

In addition, the figure shows the processing timing of the imaging device 12 and the display device 16 as the most easily understood example. However, it is desirable to use newer data, that is, information, as a processing target by using a captured image. The same applies to the information processing apparatus 10 that performs processing. In other words, realizing a coordinated operation that shortens the time for waiting for input data in a memory or the like as much as possible has a remarkable effect in terms of immediacy and accuracy of display and information processing. At the same time, an effect of saving the memory capacity can be obtained.

Based on such unique knowledge, in this embodiment, a rolling shutter camera is adopted for shooting, and the imaging device 12, the information processing device 10, and the display device 16 are basically configured to output data immediately. On the other hand, in image analysis, by correcting the data so that the time difference of observation in one frame is eliminated, both immediate use of observation data and analysis accuracy are compatible. The correction is basically performed by estimating the position of each frame at the reference time based on the time and position at which the feature point was observed. A specific calculation method will be described later.

FIG. 6 shows the internal circuit configuration of the information processing apparatus 10. The information processing apparatus 10 includes a CPU (Central Processing Unit) 23, a GPU (Graphics Processing Unit) 24, and a main memory 26. These units are connected to each other via a bus 30. An input / output interface 28 is further connected to the bus 30. The input / output interface 28 outputs data to a peripheral device interface such as USB or IEEE1394, a communication unit 32 including a wired or wireless LAN network interface, a storage unit 34 such as a hard disk drive or a nonvolatile memory, and the display device 16. An output unit 36, an input unit 38 for inputting data from the imaging device 12 or an input device (not shown), and a recording medium driving unit 40 for driving a removable recording medium such as a magnetic disk, an optical disk or a semiconductor memory are connected.

The CPU 23 controls the entire information processing apparatus 10 by executing the operating system stored in the storage unit 34. The CPU 23 also executes various programs read from the removable recording medium and loaded into the main memory 26 or downloaded via the communication unit 32. The GPU 24 has a function of a geometry engine and a function of a rendering processor, performs a drawing process in accordance with a drawing command from the CPU 23, and outputs it to the output unit 36. The main memory 26 is composed of RAM (Random Access Memory) and stores programs and data necessary for processing.

FIG. 7 shows a functional block configuration of the information processing apparatus 10. Each functional block shown in the figure can be realized by the various circuits shown in FIG. 6 in terms of hardware, and in terms of software, an image analysis function, an information processing function, and an image loaded from a recording medium to the main memory. This is realized by a program that exhibits various functions such as a drawing function and a data input / output function. Therefore, it is understood by those skilled in the art that these functional blocks can be realized in various forms by hardware only, software only, or a combination thereof, and is not limited to any one.

The information processing apparatus 10 includes a captured image acquisition unit 52 that acquires captured image data from the imaging device 12, an image analysis unit 54 that analyzes the acquired image, and an output that generates data to be output by using the analysis result. A data generation unit 56 is included. The captured image acquisition unit 52 is realized by the input unit 38, the CPU 23, the main memory 26, and the like of FIG. 6, and sequentially acquires frame data of the captured image from the imaging device 12. Specifically, as described above, data is acquired in a stream format in order from the line in which exposure is completed in one frame. The acquired data is supplied to the image analysis unit 54 and the output data generation unit 56.

Also at this time, it is preferable to supply data sequentially from the acquired row data. However, depending on the content of the image analysis performed by the image analysis unit 54, it may be stored as two-dimensional image data in the main memory 26 or the like so that the image analysis unit 54 can refer to it appropriately. Further, the captured image acquisition unit 52 may request the imaging apparatus 2 by designating captured image data to be acquired based on the result of image analysis by the image analysis unit 54.

The image analysis unit 54 is realized by the CPU 23, the GPU 24, the main memory 26, and the like in FIG. 6, performs a predetermined image analysis using the captured image data, and supplies the result to the output data generation unit 56. The content of the analysis performed by the image analysis unit 54 is not particularly limited. For example, the above-mentioned analysis such as the SLAM or the tracking processing of the object may be performed, and any of image analysis generally performed such as object detection, object recognition, and depth map acquisition may be used. In any case, the accuracy of the analysis process can be improved by the correction that eliminates the difference in the observation time within the frame as described above.

Specifically, the image analysis unit 54 includes a feature extraction unit 60, a correction unit 62, a correction data storage unit 64, and an analysis processing unit 66. The feature extraction unit 60 extracts features used for image analysis from the captured image. Specific extraction objects and processing algorithms vary depending on the image analysis to be performed, and examples include edge detection, corner detection, contour detection, and area division by texture. Since such feature extraction may be performed by a general technique, detailed description thereof is omitted.

The correction unit 62 corrects the points, lines, or area boundaries extracted as features to represent the exact position of each frame at the reference time. Data necessary for correction is stored in the correction data storage unit 64 and referred to as appropriate. Examples of the data include position information of features extracted up to the previous frame, and a two-dimensional map in which parameters related to observation time shift time are associated with discrete positions on the image plane. The parameter may be calculated in consideration of lens distortion correction performed by the imaging device 12.

The parameters that can be calculated in advance based on lens distortion correction and exposure progress speed, etc. can be prepared in advance in the form of a two-dimensional map or a lookup table, so that the correction process can be made more efficient. The analysis processing unit 66 performs a predetermined analysis of the image analysis as exemplified above by using the feature data whose position is corrected.

The output data generation unit 56 is realized by the CPU 23, the GPU 24, the main memory 26, the output unit 36, and the like of FIG. 6, and generates display image and audio data to be output and outputs them to the display device 16. The kind of data to be generated may vary depending on the purpose of use of the information processing apparatus 10 and the application selected by the user. In the aspect in which the captured image is displayed as it is, the data stream acquired from the captured image acquisition unit 52 may be output as it is. Even when some processing is performed on the captured image, it is desirable to suppress the delay time by completing the processing for each row and outputting it immediately.

FIG. 8 is a diagram for explaining a correction processing technique performed by the correction unit 62. As in FIG. 4, the horizontal direction is the time axis, the vertical direction is the vertical axis (y-axis) of the captured image, and the solid line rectangle (parallelogram) of the smallest unit is the exposure processing of one frame by the rolling shutter camera. Represents. The black circles represent the observation times and positions of feature points. In the case of the figure, two feature points are shown in each frame. For example, in the n-th frame f _n , two feature points are observed at the positions y _n and y _n ′ at the exposure timing indicated by the dotted line.

Although the figure shows a two-dimensional space of the y-axis and the time axis, the position information of the object is naturally a two-dimensional coordinate consisting of the horizontal axis (x-axis) and the y-axis of the image plane. If the exposure time of the top row is a reference time and the interval of the reference time between frames is d, the reference times of the frames f _n−1 , f _n , f _{n + 1} , f _{n + 2} ,. Thus, d · (n−1), d · n, d · (n + 1), d · (n + 2),... Here, the interval d has an inversely proportional relationship with the progress rate of exposure in the y-axis direction, and can also be regarded as a frame shooting period. The correcting unit 62 obtains a position correction value at the reference time by interpolation based on the amount of movement of the feature point from the previous frame f _n−1 .

In the figure, the corrected position is indicated by a white square. For example, when focusing on the feature point 120 extracted in the frame f _n , the delay time R _n from the reference time d · n at the time when the feature point 120 is observed depends on the position y _n in the vertical direction on the image plane, and Is given by
R _n = d · y _n / V
Here, V is the length of the image in the vertical direction. Considering the immediately preceding frame f _n−1 , the feature point 120 is observed at a time (d−R _n−1 ) before the reference time d · n and a time after R _n .

If the position coordinates of the feature point 120 observed at those times are (x _n−1 , y _n−1 ) and (x _n , y _n ), respectively, the feature point 120 at the reference time d · n between them The position coordinates (xc _n , yc _n ) are obtained as follows.

If L _n = 1 / (Vy _n-1 + y _n ), Equation 1 can be expressed as follows.

The correction unit 62 corrects the position coordinates of the feature points of each frame composing the captured image to the value at the reference time of the frame using Expression 2. Even if the feature is a line or a region, it can be corrected to the shape at the reference time by correcting the position coordinates of the points constituting the line or boundary line. Thereby, based on the image at the time unified within the frame, image analysis can be performed accurately.

In addition, since the position coordinates of the same feature point in the immediately preceding frame are used for the correction, the correction unit 62 stores the identification information of the feature point in association with at least the position coordinates in the immediately previous frame in the correction data storage unit 64. Keep it. In the example of FIG. 8, the change of the position coordinates of the feature points observed in the two preceding and following frames is linearly interpolated, but the interpolation algorithm is not limited to this. That is, the position coordinates observed in three or more frames may be taken into account, or they may be used to interpolate with a curve such as spline interpolation.

Further, in the example of FIG. 8, since the exposure time of the top row is set as the reference time, it is assumed that the observation times of all feature points are later than that. On the other hand, the reference time can be freely set such as the exposure time of the center row and the bottom row. In this case, naturally, the observation time of the feature point located above the line exposed at the reference time is earlier than the reference time, but the “delay time” described so far is replaced with the “deviation time” from the reference time. Thus, the correction can be realized by the same calculation. The same applies to the following calculations.

The above calculation is based on the premise that the orthogonal two-dimensional array of image sensors corresponds to the pixel array of the captured image. In this case, as shown by the dotted line in FIG. 8, the position with the observation time of the y-axis direction is proportional, the delay time R _n from the reference time is determined as a linear function of y _n as described above. On the other hand, when the lens distortion correction is performed in the imaging device 12, each pixel of the output captured image is shifted in the x-axis direction and the y-axis direction from the position of the imaging element where the pixel value is observed, and the shift amount is It depends on the position on the image.

As a result, the relationship between the position coordinates of the feature points in the image after lens distortion correction and the actual shooting time also changes depending on the position on the image plane. For this reason, the correction unit 62 may perform correction in consideration of lens distortion correction in the imaging device 12. Specifically, the parameter m (x, y) corresponding to y _n / V, which is a component of the ratio of the delay time to the exposure time for one frame in the delay time R _n = d · y _n / V described above. Are calculated for a plurality of positions (x, y) on the image plane.

Specifically, as shown below, the lens distortion correction reverse correction M is applied to each position (x, y) to obtain the coordinates (x _m , y _m ) before correction, and the y coordinate y of them Calculate the ratio of _m to V.
(X _m , y _m ) = (x, y) · M
_{m (x, y) = y} m / V

When using a correction map in which the correction amount is set for the orthogonal mesh on the image plane during lens distortion correction, a delay map representing the parameter m (x, y) on a two-dimensional plane can be easily applied to the same mesh. Can be generated. The correction unit 62 refers to the delay map stored in the correction data storage unit 64 based on the position coordinates (x _n , y _n ) of the feature points extracted from the frame f _n and interpolates as necessary. Then, the parameter m (x _n , y _n ) indicating the delay ratio at the time when the feature point is observed is acquired.

Then, the delay time R _n of the observation time of the feature point from the reference time d · _n of the frame f _n is obtained as follows.
R _n = d · m (x _n , y _n )
In this case, L _n = 1 / (1−m (x _n−1 , y _n−1 ) + m (x _n , y _n )), and the corrected position coordinates (xc _n , yc _n ) Instead, it asks for:

FIG. 9 is a diagram for explaining a method in which the correction unit 62 obtains a corrected optical flow. The format of the figure and the movement of the feature points are the same as those shown in FIG. In general, an optical flow indicating a movement vector of an object or a feature point is important information for detecting and tracking a moving object or specifying a shape change. In FIG. 9, the vector 130 indicating the optical flow is shown in a two-dimensional space formed by the y-axis and the time axis, but actually, the amount of movement per unit time on the image plane composed of the x-axis and the y-axis is shown. To express.

The correcting unit 62 accurately obtains the optical flow using the position coordinates (xc _n , yc _n ) of the feature points corrected by Expression 2 or 3. Specifically, the optical flow of the feature points in the frame f _n , that is, the vector (Vx _n , Vy _n ) on the image plane is obtained as follows.

Expression 4 is a technique for obtaining a vector after correcting the position coordinates (x _n , y _n ) of the feature point by Expression 2 or 3, but it is obtained directly from the position coordinates of the same feature point in the preceding and following frames. The vector 132 may be approximated as follows to simplify the processing.

In Expression 5, the parameter m is used for the ratio of the delay time of the observation time, but a linear expression such as y _n / V may be substituted. The optical flow may be obtained by the analysis processing unit 66 as needed during the image analysis. In the examples described so far, the exposure time of each row is assumed to be equal to the interval d of the reference time, that is, the shooting period of the frame, but more strictly, the exposure time changes according to the shooting environment etc. with the shooting period as the upper limit. Can do.

FIG. 10 is a diagram for explaining a correction process when a pure exposure time is considered. While figure formats are the same as those shown in FIG. 8, each row of the exposure time e _n is shorter than the distance d of the reference time. In this case, the progress of the exposure with respect to the longitudinal direction of the image, to become faster than the examples described heretofore, the delay time R _n at the same observation position is shortened. Specifically:
R _n = d · {m (x _n , y _n ) −sh _n }

Here sh _n is a shutter correction value in consideration of the exposure time e _n in the frame f _n, it is defined as follows.
sh _n = (1−e _n / d) / 2
When the delay time R _n into equation 3, the position coordinates after correction is obtained as follows.

However the difference in the exposure between successive frame time e _n-1 and e _n, if the negligible difference in turn shutter correction value sh _n-1 and sh _n, may be approximated as follows.

Furthermore, as long as the change in the ratio of the delay time between consecutive frames is negligible, the following approximation may be performed.
L _n ≒ 1
These are reasonable approximations unless a situation in which the exposure time is extremely changed or a situation in which the optical flow is significantly large does not occur. For the same reason, if the average value of the predicted shutter correction value is set as a constant, even if the exposure time is purely considered, the position coordinates of the feature points on the captured image, The correction process can be easily performed using the parameter m (x _n , y _n ) indicating the delay time ratio determined thereby.

FIG. 11 schematically shows a temporal relationship among a captured image, an image used for image analysis, and a display image in the present embodiment. The left side of the figure shows a photographed image by the imaging device 12, and the right side shows a display image by the display device 16, corresponding to three frames f _n−1 , f _n and f _{n + 1} . When a rolling shutter is used for the image pickup device 12, if the data for each row that is observed and read at each time is arranged in the time axis direction with the vertical direction in the figure as the time axis, it corresponds to the image plane as shown in the figure. Become.

In the figure, as a simple example, it is assumed that the black ball 200 is photographed to move downward. In order to clarify the position of the ball 200, a stationary object 202 is arranged on the right side. As described above, the time at which the ball 200 is observed in each frame is the reference time d · (n−1), d · n, d · (n + 1) that is the exposure time of the uppermost row of the frame, respectively. Delay by R _n−1 , R _n , R _{n + 1} . The information processing apparatus 10 acquires such captured image data from the imaging apparatus 12 in a stream format in order from the top row.

In the case of a simple mode in which captured images are displayed as they are, the information processing apparatus 10 sequentially outputs the acquired streams to the display apparatus 16. When the display device 16 is a display corresponding to the line buffer, the data is displayed immediately from the upper stage to the lower stage of the screen in the output order. Thereby, in principle, the delay time from photographing to display is unified to the time ΔT required for transfer in all rows. As a result, the most recent data possible is displayed.

Even when some processing or image generation is performed in the information processing apparatus 10, a newer image can be displayed by proceeding from the upper stage of the image plane and outputting it immediately. On the other hand, when image analysis is performed in the information processing apparatus 10, a general algorithm that handles one frame as data at the same time can be used as it is by correcting the position of the feature point and the optical flow as preprocessing. . That is, the

images

204a and 204b corresponding to each reference time are generated in a pseudo manner.

In the

images

204a and 204b shown in the figure, the image of the ball 200 observed before and after the reference times d · n and d · (n + 1) is indicated by a dotted line, and the corrected image obtained by temporally interpolating them is shown. The position of the ball is shaded. However, in practice, the correction target is limited to the feature points necessary for image analysis, thereby suppressing the occurrence of delay due to the correction processing. The result of accurate analysis using the positions of the feature points at the unified time in this way may be reflected in the subsequent display image or may be used for requesting data to the imaging device 12.

For example, by acquiring the position and movement of a predetermined object accurately, a region where the object is captured may be predicted and notified to the imaging device 12. In response to this, the imaging device 12 transmits the high-resolution image data of the notified area and the low-resolution image data of the other area, and synthesizes and displays them or uses them for further analysis. As a result, the data transfer amount can be suppressed as a whole. Alternatively, the movement of the visual field can be accurately obtained by using SLAM or the like, and VR or AR without a sense of incongruity can be realized.

In this embodiment, the feature points used for image analysis are targeted for correction, and the image itself is basically output with a time difference included. That is, the handling of the captured image is independent between the display process and the analysis process. If this characteristic is used, depending on the type and purpose of image analysis, the correction of feature points and the frequency of analysis processing using them will be lower than the frame rate of shooting and display, reducing the overall processing load. You can also.

According to the present embodiment described above, a rolling shutter camera is used as an imaging device, the time required from actual observation to output is shortened, and the captured image is analyzed within the frame. Correct to eliminate the observation time difference. This makes it possible to perform image analysis with high accuracy by using a conventionally used algorithm as it is.

Also, correction is limited to the characteristics used for image analysis, and parameters used for correction are calculated in advance, so that the correction process can be made more efficient and the influence on time can be reduced. The parameter calculation result is prepared as a two-dimensional map corresponding to discrete positions on the image plane, so that random access is possible and even an image subjected to a specific operation such as lens distortion correction, Easy and precise correction can be realized.

In addition, by constructing a system that combines an imaging device with a rolling shutter and a display device with a structure that can immediately display an input pixel row, it minimizes data stagnation in the route from shooting to display. Enable to display the latest image. By introducing the above-described image analysis technique into such a system, it is possible to achieve both information processing and display immediacy using captured images and processing accuracy.

The present invention has been described based on the embodiments. Those skilled in the art will understand that the above-described embodiment is an exemplification, and that various modifications can be made to combinations of the respective constituent elements and processing processes, and such modifications are also within the scope of the present invention. is there.

1 information processing system, 10 information processing device, 12 imaging device, 16 display device, 23 CPU, 24 GPU, 26 main memory, 32 communication unit, 34 storage unit, 36 output unit, 38 input unit, 38 input unit, 40 recording medium drive unit, 52 taken image acquisition unit, 54 image analysis unit, 56 output data generation unit, 60 feature extraction unit, 62 correction unit, 64 correction data storage unit, 66 analysis processing unit.

As described above, the present invention can be used for information processing devices such as game devices and personal computers, head mounted displays, imaging devices, and information processing systems including them.

Claims

A captured image acquisition unit that acquires data of a captured moving image from an imaging device that includes a rolling shutter that captures an image with a time lag for each row of pixels;
A correction unit that corrects the position coordinates of the feature points in the frame of the moving image to the position coordinates at the reference time of the frame;
An analysis processing unit that performs image analysis using the corrected position coordinates and reflects the result in the output data;
An information processing apparatus comprising:
The correction unit shifts the time when the feature point is observed from the reference time based on the vertical shooting progress speed when shooting one frame in the imaging apparatus and the position coordinates of the feature point to be corrected. The information processing apparatus according to claim 1, wherein the position coordinates are corrected by specifying time.
A correction data storage unit storing a two-dimensional map in which predetermined parameters used for specifying the shift time are associated with discrete positions on the image plane;
The said correction | amendment part specifies the said shift | offset | difference time by referring to the said 2-dimensional map, and acquiring the said parameter corresponding to the position coordinate of the said feature point of the correction | amendment object. Information processing device.
4. The information processing apparatus according to claim 3, wherein the predetermined parameter includes a correction component that restores the lens distortion correction performed in the imaging apparatus.
The correction unit corrects the position coordinates by interpolating the position coordinates of the feature points in the frame before the correction target frame and the position coordinates of the feature points of the correction target frame based on the shift time. The information processing apparatus according to claim 2, wherein:
6. The information processing apparatus according to claim 2, wherein the correction unit adjusts the shift time according to a change in exposure time for each frame.
The captured image acquisition unit sequentially acquires data of the moving image from a row where shooting is completed in the imaging device,
The information processing apparatus further includes:
The information processing apparatus according to claim 1, further comprising a data output unit that sequentially outputs data of the moving image to a display device from a row acquired by the captured image acquisition unit.
The information processing apparatus according to claim 7, wherein the data output unit outputs data obtained by processing the moving image to the display device based on an analysis result by the analysis processing unit.
An image pickup unit that includes a rolling shutter that captures an image with a time lag for each row of pixels, and that sequentially outputs captured image data from the row where shooting is completed;
The captured image data is acquired for each row of pixels, and a display unit that sequentially displays from the acquired row,
A head-mounted display characterized by comprising:
An image pickup unit that includes a rolling shutter that captures an image with a time lag for each row of pixels, and that sequentially outputs captured image data from the row where shooting is completed;
A correction unit that acquires data of a captured moving image from the imaging unit, and corrects the position coordinates of the feature points in the frame of the moving image to the position coordinates at the reference time of the frame;
An analysis processing unit that performs image analysis using the corrected position coordinates;
An output data generation unit that generates display image data using the result of the image analysis and outputs the data for each row;
A display unit for sequentially displaying the display image from the output line;
An information processing system comprising:
Acquiring data of a captured moving image from an imaging device including a rolling shutter that captures an image with a time lag for each row of pixels;
Correcting the position coordinates of the feature points in the frame of the moving image to the position coordinates at the reference time of the frame;
Performing image analysis using the corrected position coordinates;
Outputting data reflecting the analysis results;
An information processing method by an information processing apparatus, comprising:
A function of acquiring data of a captured moving image from an imaging device having a rolling shutter that captures an image with a time lag for each row of pixels;
A function of correcting the position coordinates of the feature points in the frame of the moving image to the position coordinates at the reference time of the frame;
A function to perform image analysis using the corrected position coordinates;
A function to output data reflecting the analysis results,
A computer program for causing a computer to realize the above.