WO2012157178A1 - Moving image capturing device, information processing system, information processing device, and image data processing method - Google Patents
- Publication number
- WO2012157178A1 (PCT/JP2012/002397)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- data
- unit
- stream
- host terminal
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/95—Computational photography systems, e.g. light-field imaging systems
- H04N23/951—Computational photography systems, e.g. light-field imaging systems by using two or more images to influence resolution, frame rate or aspect ratio
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/002—Specific input/output arrangements not covered by G06F3/01 - G06F3/16
- G06F3/005—Input arrangements through a video camera
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/239—Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/4223—Cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440263—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/442—Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
- H04N21/44213—Monitoring of end-user related data
- H04N21/44218—Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4788—Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/61—Control of cameras or camera modules based on recognised objects
- H04N23/611—Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/2628—Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation
Definitions
- the present invention relates to a technique for performing information processing according to the movement of an object.
- Patent Document 1 discloses a game in which a part of the user's body, such as the head, is photographed with a video camera, a predetermined area such as the eyes, mouth, or hands is extracted, and that area is replaced with another image and displayed on a display.
- There are also user interface systems that accept mouth and hand movements captured by a video camera as application operation instructions.
- However, such technology requires a high-resolution image in order to extract a predetermined area such as the user's mouth or hand.
- As the performance of the video camera's image sensor improves, for example through higher resolution, the amount of data increases, and the cost of processes that extract only the necessary information at an appropriate scale for transfer, such as compression and decompression, recognition, detection, measurement, scaling, and cropping, also increases, so the latency from camera input to each processing output grows.
- an increase in latency significantly reduces usability.
- Thus, even if the performance of the video camera's image sensor is improved, the performance of the system as a whole may deteriorate.
- The present invention has been made in view of these problems, and its object is to provide an image processing technique capable of suppressing the latency from imaging to display of an image using the captured data, while using a high-performance imaging device.
- An aspect of the present invention relates to a moving image photographing apparatus.
- This moving image capturing device includes an image data generation unit that generates data of a plurality of images of different resolutions in a predetermined pixel order by reducing each frame of a moving image, obtained by capturing an object, in multiple stages, and sequentially outputs the data as streams;
- an image synthesis unit that connects data of a plurality of predetermined images, among those output from the image data generation unit, for each pixel column corresponding to one row of the image or a pixel column in a smaller range, and outputs the result as a stream; and
- an image sending unit that accepts a data transmission request from a connected host terminal, generates a stream of data to be transmitted by extracting the pixel data of the requested image and area from the plurality of streams, and sends it to the host terminal.
- Another aspect is a moving image capturing device provided with a pair of cameras that capture the same object from different left and right viewpoints.
- Each of the pair of cameras includes an image data generation unit that generates data of a plurality of images of different resolutions in a predetermined pixel order by reducing each frame of the captured moving image in multiple stages and sequentially outputs the data as streams, and an image synthesizing unit that generates a virtual composite image containing a plurality of predetermined images by connecting their data for each pixel column corresponding to one row of the image, or a pixel column in a smaller range, and outputting the result as a stream.
- The device further includes a stereo matching processing unit that generates, in a predetermined pixel order, a depth image representing the position of the object in three-dimensional space from images of a predetermined resolution among the data generated by the pair of cameras with different viewpoints, and sequentially outputs it as a stream; and an image sending unit that accepts a data transmission request from the connected host terminal, generates a stream of data to be transmitted by extracting the pixel data of the requested image and area from the multiple streams output from the image data generation units, image synthesis units, and stereo matching processing unit, and sends it to the host terminal.
- Still another aspect of the present invention relates to an information processing system.
- This information processing system includes a moving image capturing device that captures an object and generates moving image data, and a host terminal that acquires part of that moving image data, performs predetermined image processing using it, and displays an image.
- The moving image capturing device includes an image data generation unit that generates data of a plurality of images of different resolutions in a predetermined pixel order by reducing each captured frame in multiple stages and sequentially outputs the data as streams; an image synthesis unit that connects data of a plurality of predetermined images among those output by the image data generation unit; and an image transmission unit that generates a stream of data to be transmitted by extracting the pixel data of the image and area requested by the host terminal from the plurality of streams output by the image data generation unit and the image synthesis unit, and transmits it to the host terminal.
- Still another aspect of the present invention relates to an information processing apparatus.
- This information processing apparatus includes a data request unit that requests a camera capturing an object to transmit image data of the frames of a moving image, specifying a resolution and an area within the image; a data development unit that develops the image data transmitted from the camera in response to the request, a stream in which the pixel values of the designated area are connected for each pixel column, into two-dimensional image data in main memory; and a data processing unit that performs predetermined image processing using the two-dimensional image data and displays an image.
- The data request unit may request a composite image containing a plurality of images of different resolutions obtained by reducing a frame of the moving image generated in the camera in multiple stages, and the data development unit then separates the images by developing the composite image transmitted from the camera into individual two-dimensional images, one for each constituent image.
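The separation performed by the data development unit can be sketched as follows. This is a minimal Python model, not the patent's implementation: it assumes each composite row carries one full row of the 1/4 image followed by fixed-size chunks of the 1/16 and 1/64 images, and all function and variable names are illustrative.

```python
def split_composite_rows(comp_rows, w4):
    """Separate each composite row back into 1/4, 1/16 and 1/64 pixel runs
    and accumulate them as individual two-dimensional images in memory.
    w4 is the width (in pixels) of one row of the 1/4 demosaiced image."""
    img4, img16, img64 = [], [], []
    buf16, buf64 = [], []
    for row in comp_rows:
        img4.append(row[:w4])                   # full 1/4-image row
        buf16.extend(row[w4:w4 + w4 // 4])      # half of a 1/16-image row
        buf64.extend(row[w4 + w4 // 4:])        # quarter of a 1/64-image row
        while len(buf16) >= w4 // 2:            # a complete 1/16 row ready
            img16.append(buf16[:w4 // 2]); buf16 = buf16[w4 // 2:]
        while len(buf64) >= w4 // 4:            # a complete 1/64 row ready
            img64.append(buf64[:w4 // 4]); buf64 = buf64[w4 // 4:]
    return img4, img16, img64
```

Because the lower-resolution rows arrive spread across several composite rows, the buffers reassemble them before a full row is appended to its two-dimensional image.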
- Still another embodiment of the present invention relates to an image data processing method.
- This image data processing method is performed by a moving image capturing device, and generates data of a plurality of images of different resolutions by reducing each frame of a moving image, obtained by capturing an object, in multiple stages.
- FIG. 1 is a diagram illustrating the overall configuration of a low-delay camera system according to Embodiment 1.
- FIG. 2 is a diagram illustrating the configuration of a camera according to Embodiment 1.
- FIG. 3 is a diagram illustrating in detail the configuration of the image composition unit and the image transmission unit of the camera according to Embodiment 1.
- FIG. 4 is a diagram illustrating the internal circuit configuration of a host terminal according to Embodiment 1.
- FIG. 5 is a diagram schematically illustrating the basic transition of data forms in the camera and the host terminal according to Embodiment 1.
- FIG. 6 is a time chart showing the input timings, from the pyramid filter unit, of pixel values of the 1/4, 1/16, and 1/64 demosaiced images in Embodiment 1.
- FIG. 7 is a diagram schematically illustrating how the image composition unit connects pixel-column data of a plurality of images in Embodiment 1.
- FIG. 8 is a diagram illustrating a configuration relating to the data request process of the host terminal and the data transmission process of the camera in Embodiment 1.
- FIG. 9 is a diagram illustrating a modification of the configuration of the image sending unit in Embodiment 1.
- FIG. 10 is a diagram illustrating the configuration of a camera according to Embodiment 2.
- FIG. 11 is a diagram illustrating the configuration of a camera according to Embodiment 3.
- FIG. 12 is a flowchart illustrating an example of a processing procedure in which image processing is performed in cooperation between a host terminal and a stereo camera according to Embodiment 3, together with an example of a generated image.
- FIG. 13 is a flowchart illustrating another example of a processing procedure in which image processing is performed in cooperation between a host terminal and a stereo camera according to Embodiment 3, together with an example of a generated image.
- FIG. 14 is a flowchart illustrating still another example of a processing procedure in which image processing is performed in cooperation between a host terminal and a stereo camera according to Embodiment 3, together with an example of a generated image.
- FIG. 1 shows an overall configuration of a low-latency camera system 10 according to the present embodiment.
- In this system, a moving image of the user 6 is captured by the camera 100, the host terminal 20 performs image processing based on the data, and the result is displayed on the display 4 or transmitted to a predetermined communication destination via a network 12 such as the Internet or a LAN (Local Area Network).
- the camera 100 is a digital video camera including an image sensor such as a CCD (Charge Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor), and is installed on the upper part of the housing of the display 4 as shown.
- the display 4 is, for example, a liquid crystal television, a plasma television, a PC display, or the like.
- the user 6 is positioned in front of the display 4 and the whole body or a part of the user is imaged by the camera 100.
- the image displayed on the display 4 varies depending on the application executed in the low-latency camera system 10.
- Depending on the application, the image 8 displayed on the display 4 is the face, a hand, another part of the body, or the whole body of the user 6.
- the image 8 displayed on the display 4 is the face of the chat partner, and the image of the user 6 is displayed on the display of the chat partner via the network 12.
- In this example the camera 100 is installed on the upper part of the display 4, but as long as the whole body or a part of the user 6 can be imaged, it may instead be placed in the vicinity of the host terminal 20 or around the user, away from the display 4. The camera 100 may also be embedded in the housing of the display 4 or the like, rather than being a standalone unit. Further, instead of using the image sensor of the camera 100 directly, an analog image may be used after A/D conversion.
- the host terminal 20 is a computer terminal such as a personal computer or a game device having an image processing function.
- The host terminal 20 successively acquires, in time series, each frame of the moving image obtained by photographing the user 6 with the camera 100, or various data obtained from each frame, and performs predetermined image processing.
- the image of the user 6 is transmitted to the chat partner in real time via the network 12.
- predetermined information processing is performed based on the image of the user 6 and various data obtained therefrom, and the result is output to the display 4.
- an image of a character that moves according to the movement of the user 6 or an image in which an item such as a sword is held in the hand of the user 6 is output to the display 4 in real time.
- The face detection process for the user 6 and the tracking process for a specific part, which such applications require, may be performed by the host terminal 20, or may be performed by the camera 100 as described later, with the results transmitted to the host terminal 20 as part of the data.
- processing such as representing only the face area of the user 6 obtained as a result of the face detection process with high resolution may be performed.
- the host terminal 20 can also synthesize object images such as menus and cursors for executing various applications and display them on the display 4.
- the camera 100 not only captures a moving image but also performs some processing using it to generate a plurality of types of data.
- Various processes performed by the camera 100, and hence the configuration thereof, can be considered in various ways depending on applications, processing capabilities of the camera and the host terminal, and the like.
- In the present embodiment, the camera 100 generates moving image data representing the video it is capturing at a plurality of resolutions, and transmits only the necessary data to the host terminal 20 in real time in accordance with requests from the host terminal 20.
- The host terminal 20 can specify how the entire image is represented, such as the resolution and the color space and its components, and can also specify an area within the frame.
- the data transmission load can be reduced by acquiring low-resolution whole image data and image data of only a noticeable region of the high-resolution image from the camera 100 and combining these images on the image plane.
- As a result, a moving image expressed in detail in the region of interest can be displayed. This approach is effective for applications such as video chat if a face area, obtained by performing face detection processing in the host terminal 20, is set as the region of interest.
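The combination of a low-resolution whole image with a high-resolution region of interest on the image plane can be illustrated with a minimal NumPy sketch. The function names are hypothetical, and nearest-neighbour upscaling stands in for whatever interpolation the host terminal actually uses:

```python
import numpy as np

def upscale_nn(img, factor):
    """Nearest-neighbour upscaling (a stand-in for any interpolation)."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def composite_display_frame(low_res_whole, factor, high_res_roi, top_left):
    """Upscale the low-resolution whole image, then paste the
    high-resolution region of interest (e.g. a detected face) over it."""
    frame = upscale_nn(low_res_whole, factor)
    y, x = top_left
    h, w = high_res_roi.shape[:2]
    frame[y:y + h, x:x + w] = high_res_roi
    return frame
```

Only the small whole image and the cropped region travel over the bus, which is what reduces the transmission load relative to sending the full high-resolution frame.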
- FIG. 2 shows a configuration of the camera 100 according to the present embodiment.
- The functional blocks in this figure, and in FIGS. 3, 4, and 8 to 11 described later, can be realized in hardware by components such as a CPU (Central Processing Unit), RAM (Random Access Memory), ROM (Read Only Memory), and drawing circuits, and in software by programs that provide functions such as data input, data holding, image processing, and drawing.
- The figures depict functional blocks realized by the cooperation of these elements; the blocks can therefore be realized in various forms by combinations of hardware and software.
- FIG. 2 also includes a schematic diagram of an image portion processed by each functional block for easy explanation.
- the camera 100 includes an image acquisition unit 102, a demosaic unit 104, a pyramid filter unit 170, an image composition unit 156, an image transmission unit 151, and a communication unit 108.
- the image acquisition unit 102 reads an image exposed by an image sensor such as a CCD or a CMOS at a predetermined timing (for example, 60 times / second). In the following description, it is assumed that this image has a width corresponding to W pixels in the horizontal direction and H pixels in the vertical direction. This image is a so-called RAW image.
- the image acquisition unit 102 sends this to the demosaic unit 104 and the image sending unit 151 every time exposure of one horizontal row of the RAW image is completed.
- the demosaic unit 104 includes a FIFO (First In First Out) buffer 105 having a capacity for W pixels and a simple demosaic processing unit 106.
- the FIFO buffer 105 receives pixel information for one horizontal row of the RAW image, and holds it until the next horizontal row of pixels is input to the demosaic unit 104.
- When the simple demosaic processing unit 106 has received pixels for two horizontal rows, it executes demosaic processing, which generates a full-color image by complementing each pixel's color information based on its peripheral pixels.
- As is well known to those skilled in the art, there are many methods for demosaic processing, but here a simple method using only two horizontal rows of pixels is sufficient.
- As an example, when the pixel for which the corresponding YCbCr value is to be calculated has only a G value, the R value is obtained by averaging the R values adjacent on the left and right, the G value is used as it is, and the B value is taken from the pixel located above or below; these are used as the RGB value, which is substituted into a predetermined conversion formula to calculate the YCbCr value. Since such demosaic processing is well known, further detailed description is omitted. Note that the color space of the image data generated by the demosaic unit 104 and subsequent processing is not limited to YCbCr.
- Using such processing, the simple demosaic processing unit 106 converts, for example, a block of four RGB pixels (2 horizontal × 2 vertical) into YCbCr color signals, as illustrated.
- This block of four pixels is sent to the image sending unit 151 as part of the 1/1 demosaiced image and is also sent to the pyramid filter unit 170.
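A minimal sketch of such a two-row demosaic for a pixel holding only a G value, together with one common choice of the "predetermined conversion formula" (full-range ITU-R BT.601) — an assumption, since the patent does not name the formula. Which neighbours hold R and which hold B depends on the Bayer row, so the indices here are illustrative:

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range ITU-R BT.601 RGB -> YCbCr conversion (one common choice)."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

def demosaic_g_pixel(raw, row, col):
    """Simple two-row demosaic for a Bayer G pixel: R averages the
    left/right neighbours, G is used as-is, B comes from the pixel in
    the adjacent row (the row held in the FIFO buffer)."""
    r = (raw[row][col - 1] + raw[row][col + 1]) / 2
    g = raw[row][col]
    b = raw[row - 1][col]
    return rgb_to_ycbcr(r, g, b)
```

Using only two rows keeps the buffer requirement at one horizontal row (the FIFO buffer 105), which is what makes this "simple" demosaic cheap enough to run in the camera.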
- the pyramid filter unit 170 has a function of layering and outputting a certain image into a plurality of resolutions.
- A pyramid filter generally includes 1/4 reduction filters in a number corresponding to the required levels of resolution.
- In the present embodiment, the pyramid filter has three layers of filters, the first filter 110 to the third filter 130. Each filter calculates the average pixel value of four mutually adjacent pixels by bilinear interpolation, so the image size after processing is 1/4 that of the image before processing. Those skilled in the art will readily understand that the present embodiment can be applied in the same way when the number of filter layers is other than three.
- one FIFO buffer 112 for W pixels is arranged corresponding to each of the Y, Cb, and Cr signals. These FIFO buffers 112 have a role of holding YCbCr pixels for one horizontal row until the pixels for the next horizontal row are output from the simple demosaic processing unit 106.
- the pixel holding time is determined according to the line scan speed of the image sensor.
- the first filter 110 averages the pixel values of Y, Cb, and Cr for four pixels of 2 ⁇ 2 horizontal.
- the 1/1 demosaiced image becomes 1/2 in length and width, and is converted into a 1/4 size as a whole.
- the converted 1/4 demosaiced image is sent to the image composition unit 156 and also passed to the second filter 120 at the next stage.
- one FIFO buffer 122 for W / 2 pixels is arranged corresponding to each of the signals Y, Cb, and Cr. These FIFO buffers 122 also have a role of holding YCbCr pixels for one horizontal row until the pixels for the next horizontal row are output from the first filter 110.
- the second filter 120 averages the pixel values of Y, Cb, and Cr for four pixels of 2 ⁇ 2 horizontal. By repeating this process, the 1/4 demosaiced image becomes 1/2 in length and width, and is converted into a size of 1/16 as a whole.
- the converted 1/16 demosaiced image is sent to the image composition unit 156 and also passed to the third filter 130 at the next stage.
- In the stage preceding the third filter 130, the same processing as described above is repeated, except that FIFO buffers 132 for W/4 pixels are arranged there. The 1/64 demosaiced image is then output to the image composition unit 156. Since pyramid filters of this kind are well known, as described in Patent Document 1, further detailed explanation is omitted in this specification.
- In this way, image outputs successively reduced to 1/4 size are input to the image composition unit 156 from the respective filters.
- As a result, the size of the FIFO buffer required in the stage preceding each filter becomes smaller for each successive filter in the pyramid filter unit 170.
- the number of filters is not limited to three, and may be determined as appropriate according to the required resolution width.
- The image composition unit 156 receives the YCbCr pixel values of the 1/4, 1/16, and 1/64 demosaiced images from the first filter 110, the second filter 120, and the third filter 130, respectively. It then connects, according to a predetermined rule, pixel rows corresponding to one horizontal row of each image, or pixel rows in a range smaller than one row, generating for each of Y, Cb, and Cr a new pixel row in which the pixel rows of the 1/4, 1/16, and 1/64 demosaiced images are connected. The generated pixel rows are sent to the image sending unit 151.
- the image sending unit 151 selects necessary data in response to a data request received from the host terminal 20 via the communication unit 108 from among a plurality of types of input image data.
- the selected data is packetized and sent to the communication unit 108.
- The processing performed by the camera 100 in the present embodiment is executed in raster order, with one horizontal row of pixels as the basic processing unit: starting from the upper left of the image, processing proceeds from left to right and is repeated downward.
- the data format of various images input to the image sending unit 151 and the data format of the image transmitted to the host terminal 20 are basically a stream in which data in a horizontal row of images are connected in order from the top.
- the communication unit 108 transmits the packet to the host terminal 20 in accordance with a predetermined protocol such as USB 1.0 / 2.0.
- Communication with the host terminal 20 is not limited to wired communication, and may be, for example, wireless LAN communication such as IEEE802.11a / b / g or infrared communication such as IrDA.
- Although Y, Cb, and Cr data are depicted individually, with input/output arrows for each, hereinafter these elements are represented as a single set to prevent the figures from becoming complicated.
- the data generated by the image composition unit 156 is a stream of a series of pixel values in which pixel columns of three demosaiced images are mixed. Therefore, strictly speaking, the result of connecting the three demosaiced images is not generated as a two-dimensional plane image.
- the RAW image and the 1/1 demosaiced image data, on the other hand, are sent to the subsequent processing without passing through the image composition unit 156.
- the image synthesizing unit 156 substantially generates an image obtained by synthesizing the 1/4, 1/16, 1/64 demosaiced image.
- this virtual image is referred to as a “composite image”.
- FIG. 3 shows the configuration of the image composition unit 156 and the image transmission unit 151 of the camera 100 in detail.
- the image composition unit 156 includes FIFO buffers 149 and 150 that temporarily store data corresponding to one row of 1/16 and 1/64 demosaiced images acquired from the second filter 120 and the third filter 130, respectively.
- to the data for one horizontal row of the 1/4 demosaiced image from the first filter 110, half of the pixel data for one horizontal row of the 1/16 demosaiced image from the second filter 120 and one quarter of the pixel row for one horizontal row of the 1/64 demosaiced image from the third filter 130 are connected in this order, forming the data for one horizontal row of the virtual composite image.
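The arithmetic of this row construction can be illustrated with a short sketch (hypothetical Python with illustrative names, not part of the patent; `W` is the horizontal pixel count of the RAW image, assumed to be a multiple of 32):

```python
# Sketch of assembling one horizontal row of the virtual composite image.
W = 64  # example RAW image width; hypothetical value for illustration

def compose_row(row_quarter, half_row_16th, quarter_row_64th):
    """Connect one full row of the 1/4 demosaiced image, half a row of
    the 1/16 image, and a quarter of a row of the 1/64 image."""
    assert len(row_quarter) == W // 2              # full 1/4-image row
    assert len(half_row_16th) == (W // 4) // 2     # half of a 1/16-image row
    assert len(quarter_row_64th) == (W // 8) // 4  # quarter of a 1/64-image row
    return row_quarter + half_row_16th + quarter_row_64th

row = compose_row([0] * (W // 2), [1] * (W // 8), [2] * (W // 32))
assert len(row) == 21 * W // 32  # composite row is 21W/32 pixels wide
```

For a RAW width W, one composite row is thus W/2 + (W/4)/2 + (W/8)/4 = 21W/32 pixels long.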
- the image transmission unit 151 includes a data selection unit 154, a packetization unit 162, and a control unit 164. Based on a request from the host terminal 20, the control unit 164 of the image transmission unit 151 instructs the data selection unit 154 which one of various image data is to be transmitted as a packet.
- the data selection unit 154 receives, as input data, the raw image input from the image acquisition unit 102, the 1/1 demosaiced image input from the demosaic unit 104, and the pixel sequence data of the composite image input from the image composition unit 156.
- the data designated by the control unit 164 is selected, extracted, and sent to the packetizing unit 162.
- the packetization unit 162 packetizes the stream input from the data selection unit 154 for each size according to the protocol of the communication unit 108 and writes the packet into an internal packet buffer (not shown). For example, in the case of USB, a stream is packetized for each endpoint size.
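The per-endpoint-size packetization can be sketched as follows (a minimal hypothetical Python sketch; the function name is illustrative and order is preserved, as in the FIFO behavior described later):

```python
def packetize(stream: bytes, endpoint_size: int) -> list:
    """Split a raster-order pixel stream into packets no larger than
    the endpoint size, preserving the original order."""
    return [stream[i:i + endpoint_size]
            for i in range(0, len(stream), endpoint_size)]

# A 1000-byte stream with a 512-byte endpoint yields packets of 512 and 488 bytes.
packets = packetize(bytes(1000), 512)
```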
- the communication unit 108 transfers the packet in the packet buffer to the host terminal 20 according to a predetermined communication protocol.
- when a plurality of data are requested, the data selection unit 154 inputs a plurality of corresponding streams to the packetization unit 162. By providing a plurality of channels for the output of the data selection unit 154, the input/output of the packetization unit 162, and the input/output of the communication unit 108, and transmitting the requested plurality of data in parallel, lower-delay data transmission may be realized. This case will be described in detail later.
- FIG. 4 shows the internal circuit configuration of the host terminal 20.
- the host terminal 20 includes a central processing unit (CPU) 50, a graphics processing unit (GPU) 52, a display control unit 54, a storage unit 56, a main memory 58, and a communication unit 60.
- the CPU 50 controls signal processing and internal components based on programs such as an operating system and applications.
- the GPU 52 performs image processing in accordance with a request from the CPU 50.
- the display control unit 54 generates a video signal for displaying the image data drawn by the GPU 52 in a frame buffer (not shown) on the display 4.
- the storage unit 56 is configured by a hard disk drive, a nonvolatile memory, or the like, and stores a program and necessary data for operating the low-latency camera system 10.
- the main memory 58 includes a RAM (Random Access Memory) and the like, and stores data transmitted from the camera 100 in addition to loading programs and data.
- the communication unit 60 is a peripheral-device interface such as USB or IEEE 1394, or a wired or wireless LAN network interface; in this embodiment, in particular, it transmits the data request signal to the camera 100 and receives the data transmitted from the camera 100. These units are connected to each other via a bus 62.
- the GPU 52 can directly read data necessary for processing such as texture data from the main memory 58 via the bus 62.
- FIG. 5 schematically shows a basic transition of the data format in the camera 100 and the host terminal 20.
- as an example, data of an entire frame image 200 having a width of W pixels in the horizontal direction and a height of H pixels in the vertical direction is transmitted from the camera 100 to the host terminal 20.
- generation, selection, and transmission of image data are performed in the pixel raster order, and pixel rows for one horizontal row are sequentially connected and processed in a stream format.
- the data output by the data selection unit 154 is the stream 202.
- the horizontal axis of the stream 202 represents the passage of time, and the rectangles L1, L2, ..., LH constituting the stream 202 represent the pixel data of the first row, the second row, ..., and the H-th row, respectively. If the data size of one pixel is d bytes, the data size of each rectangle is W × d bytes.
- the packetizing unit 162 divides the stream 202 into packets of a predetermined size, generating packets P1, P2, P3, P4, P5, .... The packets are transmitted from the camera 100 to the host terminal 20 in this order. When the host terminal 20 receives the packets P1, P2, P3, P4, P5, ... via the communication unit 60, it stores them in the main memory 58 under the control of the CPU 50.
- the packet data is arranged in the main memory 58 in raster order, with the horizontal pixel count W of the original frame image 200 as the width, and expanded into continuous addresses of W × d × H bytes, thereby restoring the image 204.
- the rectangles constituting the image 204 represent the data of the individual packets.
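The host-side expansion back into an image plane can be sketched as follows (hypothetical Python; packet boundaries need not align with row boundaries, which is why the packets are first concatenated):

```python
def expand_to_image(packets, width_px, bytes_per_px):
    """Concatenate received packets and re-slice the result into rows of
    width_px * bytes_per_px bytes, restoring the raster image layout."""
    data = b"".join(packets)
    row_bytes = width_px * bytes_per_px
    return [data[i:i + row_bytes] for i in range(0, len(data), row_bytes)]

# e.g. a 4x3 image with 2 bytes per pixel arriving as two arbitrary packets
rows = expand_to_image([bytes(10), bytes(14)], width_px=4, bytes_per_px=2)
```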
- the GPU 52 renders an image to be displayed on the display 4 by processing the image 204 developed in the main memory 58 or combining it with another image.
- FIG. 6 is a time chart showing the input timings of the pixel values of the 1/4 demosaiced image, 1/16 demosaiced image, and 1/64 demosaiced image from each filter of the pyramid filter unit 170.
- the time steps S1, S2, S3, S4, ... each represent the period in which the pixel values of the first, second, third, fourth, ... rows of the 1/4 demosaiced image are input.
- the highest resolution image among the images included in the composite image has the highest data generation rate in the pyramid filter unit 170. Therefore, a period in which pixel values for one horizontal row of the image are input is set as a reference time step, and the time step is associated with a pixel row for one horizontal row of the composite image. That is, data for one horizontal row of the composite image is generated with a period in which pixel values for one horizontal row of the highest resolution image are input as a reference period.
- the upper, middle, and lower rows of the figure show the input timing of the 1/4 demosaiced image, the 1/16 demosaiced image, and the 1/64 demosaiced image, respectively, and one rectangle corresponds to the input of one pixel.
- in time step S1, the pixel values of the first pixel row L (1/4) 1 of the 1/4 demosaiced image are sequentially input, starting from the leftmost pixel. In this step, the 1/16 and 1/64 demosaiced images have not yet been generated, so nothing is input for them.
- in time step S2, the pixel values of the second pixel row L (1/4) 2 of the 1/4 demosaiced image are sequentially input, starting from the leftmost pixel.
- in parallel, the pyramid filter unit 170 generates the first pixel row L (1/16) 1 of the 1/16 demosaiced image using the pixel values of the first pixel row L (1/4) 1 and the second pixel row L (1/4) 2 of the 1/4 demosaiced image, so the pixel values of that row are also input in time step S2.
- for example, the pixel value input in the leftmost period 210 of the first pixel row L (1/16) 1 of the 1/16 demosaiced image is generated using the pixel values of the two pixels input in the period 206 of the first pixel row L (1/4) 1 of the 1/4 demosaiced image and the pixel values of the two pixels input in the period 208 of the second pixel row L (1/4) 2. Therefore, in time step S2, the input timing of the pixel values of the pixel row L (1/16) 1 is delayed by at least two pixels relative to the input timing of the corresponding pixels of the pixel row L (1/4) 2.
- in the next time step S3, the pixel values of the third pixel row L (1/4) 3 of the 1/4 demosaiced image are input. In this step the pixel values of the second row of the 1/16 demosaiced image are not yet generated, and no 1/64 demosaiced image has been generated, so neither is input.
- in the next time step S4, that is, in the period in which the pixel values of the fourth pixel row L (1/4) 4 of the 1/4 demosaiced image are input, the pixel values of the second pixel row L (1/16) 2 of the 1/16 demosaiced image are also input, as in time step S2.
- furthermore, the pyramid filter unit 170 generates the first pixel row L (1/64) 1 of the 1/64 demosaiced image using the pixel values of the first pixel row L (1/16) 1 and the second pixel row L (1/16) 2 of the 1/16 demosaiced image, so the pixel values of that row are also input in time step S4. For example, the pixel value input in the first input period 218 of the first pixel row L (1/64) 1 of the 1/64 demosaiced image is generated using the pixel values of the two pixels input in the periods 210 and 212 of the first pixel row L (1/16) 1 and the pixel values of the two pixels input in the periods 214 and 216 of the second pixel row L (1/16) 2.
- similarly, the input timing of the pixel row L (1/64) 1 is delayed by at least two pixels from the input timing of the pixel values of the corresponding pixels in the pixel row L (1/16) 2. Thereafter, by repeating this input of pixel values for each image in the same manner, all the pixel values of the 1/4 demosaiced image, the 1/16 demosaiced image, and the 1/64 demosaiced image are input to the image composition unit 156.
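The way two adjacent rows of one pyramid level yield one half-width row of the next level can be sketched as follows (hypothetical Python; simple 2 × 2 averaging is assumed here as one common choice, since the text does not fix the exact filter kernel):

```python
def next_level_row(row_a, row_b):
    """Produce one row of the next pyramid level by averaging each
    2x2 block of two adjacent input rows (the result has half the width)."""
    return [(row_a[i] + row_a[i + 1] + row_b[i] + row_b[i + 1]) // 4
            for i in range(0, len(row_a), 2)]

# Two 4-pixel rows yield one 2-pixel row of the lower-resolution image.
out = next_level_row([10, 30, 50, 70], [20, 40, 60, 80])
```

This mirrors why the first 1/16 row can only appear once two 1/4 rows have been input, delayed by at least two pixels.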
- the pixel values of each image are input from the corresponding filter of the pyramid filter unit 170 as individual streams in raster order.
- the image synthesizing unit 156 connects these to form one stream and outputs the stream to the image sending unit 151.
- at each time step, pixel values are grouped for each image to generate pixel rows, which are connected in series. However, if the rows were connected just as they arrive, the data length output per time step would vary greatly: in time step S1 the input pixel values are only data of the 1/4 demosaiced image, while in time step S4, for example, data of three images, the 1/4, 1/16, and 1/64 demosaiced images, are input. Therefore, in the present embodiment, for an image that has time steps in which no data is input, a part of the pixel values input immediately before is output using such a time step, equalizing the data length output at each time step.
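The equalization idea can be sketched as follows (hypothetical Python; `None` stands in for invalid padding data, and the two deques play the role of the FIFO buffers 149 and 150):

```python
from collections import deque

W = 32  # hypothetical RAW image width
fifo_16th = deque()  # stands in for FIFO buffer 149 (1/16 image)
fifo_64th = deque()  # stands in for FIFO buffer 150 (1/64 image)

def emit_step(row4, row16=None, row64=None):
    """One time step: always emit a full 1/4-image row, then half of a
    buffered 1/16 row and a quarter of a buffered 1/64 row (invalid
    padding when nothing is buffered)."""
    if row16 is not None:                      # a 1/16 row arrives every 2nd step
        h = len(row16) // 2
        fifo_16th.extend([row16[:h], row16[h:]])
    if row64 is not None:                      # a 1/64 row arrives every 4th step
        q = len(row64) // 4
        fifo_64th.extend([row64[i * q:(i + 1) * q] for i in range(4)])
    part16 = fifo_16th.popleft() if fifo_16th else [None] * (W // 8)
    part64 = fifo_64th.popleft() if fifo_64th else [None] * (W // 32)
    return row4 + part16 + part64

# Every step emits exactly W/2 + W/8 + W/32 = 21W/32 values.
s1 = emit_step([0] * (W // 2))
s2 = emit_step([0] * (W // 2), row16=[1] * (W // 4))
assert len(s1) == len(s2) == 21 * W // 32
```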
- FIG. 7 schematically shows a state in which the image composition unit 156 has connected pixel array data of a plurality of images.
- S0, S1, S2, S3,... are time steps similar to those in FIG. 6, and pixel values for one column of the 1/4 demosaiced image are input in each period.
- the pixel columns to which data is output at each time step are indicated by different shaded rectangles for each image.
- in the first time step, only the pixel values of the first pixel row L (1/4) 1 of the 1/4 demosaiced image are input, and the image composition unit 156 outputs them as they are. Assuming that the number of pixels in the horizontal direction of the original RAW image is W, the number of pixels in one row of the 1/4 demosaiced image is W/2, as shown in the figure.
- in time step S2, the pixel values of the second pixel row L (1/4) 2 of the 1/4 demosaiced image and of the first pixel row L (1/16) 1 of the 1/16 demosaiced image are input in parallel at the timing shown in FIG. 6.
- the image composition unit 156 temporarily stores the pixel values of the first pixel row L (1/16) 1 of the 1/16 demosaiced image in the FIFO buffer 149, and first outputs the pixel values of the second pixel row L (1/4) 2 of the 1/4 demosaiced image continuously.
- when the pixel values of the pixel row L (1/4) 2 have all been output, the first pixel row L (1/16) 1 of the 1/16 demosaiced image is then read from the FIFO buffer 149 and output. At this time, only the pixel values of the first half (the left half in the image plane) of all the pixels of the first pixel row L (1/16) 1 are output, and the rest remain stored in the FIFO buffer 149.
- in time step S3, only the pixel values of the third pixel row L (1/4) 3 of the 1/4 demosaiced image are input. The image composition unit 156 outputs the pixel values of that row as they are, and then reads the as yet unoutput second half (the right half in the image plane) of the first pixel row L (1/16) 1 of the 1/16 demosaiced image from the FIFO buffer 149 and outputs it.
- in time step S4, the pixel values of the fourth pixel row L (1/4) 4 of the 1/4 demosaiced image are input together with those of the second pixel row L (1/16) 2 of the 1/16 demosaiced image and the first pixel row L (1/64) 1 of the 1/64 demosaiced image. The image composition unit 156 temporarily stores the latter two in the FIFO buffers 149 and 150, respectively, and first outputs the pixel values of the fourth pixel row L (1/4) 4 of the 1/4 demosaiced image continuously.
- when these have all been output, the first half of the second pixel row L (1/16) 2 of the 1/16 demosaiced image is read from the FIFO buffer 149 and output, followed by the first pixel row L (1/64) 1 of the 1/64 demosaiced image. At this time, the first pixel row L (1/64) 1 is divided into quarters and only the pixel values of the first quarter are output; the rest are stored in the FIFO buffer 150.
- in time step S5, only the pixel values of the fifth pixel row L (1/4) 5 of the 1/4 demosaiced image are input. The image composition unit 156 outputs those pixel values as they are, then reads the as yet unoutput second half of the second pixel row L (1/16) 2 of the 1/16 demosaiced image from the FIFO buffer 149 and outputs it, and further outputs the second quarter of the unoutput data of the first pixel row L (1/64) 1 of the 1/64 demosaiced image.
- similarly, in the next time step S6, the pixel values of the sixth pixel row L (1/4) 6 of the 1/4 demosaiced image, the first half of the third pixel row L (1/16) 3 of the 1/16 demosaiced image, and the third quarter of the unoutput data of the first pixel row L (1/64) 1 of the 1/64 demosaiced image are output.
- in time step S7, the pixel values of the seventh pixel row L (1/4) 7 of the 1/4 demosaiced image, the second half of the third pixel row L (1/16) 3 of the 1/16 demosaiced image, and the last quarter of the first pixel row L (1/64) 1 of the 1/64 demosaiced image are output.
- that is, the first pixel row L (1/16) 1 of the 1/16 demosaiced image is output in halves over the two time steps S2 and S3, and the first pixel row L (1/64) 1 of the 1/64 demosaiced image is output in quarters over the four time steps S4, S5, S6, and S7. If the number of pixels in the horizontal direction of the RAW image is W, the numbers of pixels in one horizontal row of the 1/16 and 1/64 demosaiced images are W/4 and W/8, respectively, so data of (W/4)/2 and (W/8)/4 pixels per time step is output for each, as shown in the figure.
- in the time step immediately after the bottom row of the 1/4 demosaiced image has been output, its data is exhausted, so invalid data is first output in place of the W/2 pixels of 1/4 demosaiced image data, followed by the remaining 1/16 and 1/64 demosaiced image data. In the subsequent time steps S (H/2 + 2) and S (H/2 + 3), the data of both the 1/4 and 1/16 demosaiced images have been output completely, so invalid data is first output in place of the W/2 + (W/4)/2 pixels, followed by the third and fourth quarters of the bottom pixel row of the 1/64 demosaiced image, respectively.
- the data thus output from the image composition unit 156 is simply an array of pixel values, but by treating the number of pixels corresponding to each time step, that is, W/2 + (W/4)/2 + (W/8)/4 = 21W/32, as the number of pixels in one horizontal row, the image sending unit 151 handles the data output at each time step as data for one row of an image, just like the RAW image and the 1/1 demosaiced image. Each time step can thus be made to correspond to one pixel in the vertical direction of an image, and as a result the composite image 220 represented by the entire rectangular area in FIG. 7 is generated.
- as described above, by fixing the positions that the pixel rows of each image occupy within the row output at each time step, the data of the 1/4 demosaiced image, the 1/16 demosaiced image, and the 1/64 demosaiced image each constitute a rectangular area in the composite image 220. Using this locality, the data of each image can therefore easily be cut out.
- FIG. 8 shows a configuration relating to data request processing of the host terminal 20 and data transmission processing of the camera 100.
- the same reference numerals are given to those overlapping with the functional blocks shown in FIGS. 3 and 4, and a part of the description is omitted.
- the host terminal 20 and the camera 100 transmit and receive various kinds of data via the mutual communication units, which are omitted in the figure.
- the CPU 50 of the host terminal 20 includes a data request unit 64, a data processing unit 66, and a data expansion unit 68.
- the data selection unit 154 of the image transmission unit 151 of the camera 100 includes a stream selection unit 166 and a cropping unit 168.
- the data request unit 64 of the CPU 50 transmits to the camera 100 a data request signal designating the image whose transmission is requested and an area within it. As the image whose transmission is requested, for example, either the RAW image or one of the demosaiced images of each size is designated.
- the data request unit 64 of the CPU 50 also transmits a signal requesting the start and end of shooting, a signal designating shooting conditions, and the like to the camera 100.
- the shooting conditions are, for example, a frame rate, a shutter speed, a white balance, an angle of view, and the like, and are determined according to the performance of the camera 100, the application executed by the CPU 50, and the like.
- when the data request signal is received, the control unit 164 of the image transmission unit 151 supplies its information to the data selection unit 154.
- when a signal requesting the start or end of shooting, a signal designating shooting conditions, or the like is received, the control unit 164 provides the information to the image acquisition unit 102 of the camera 100 as appropriate; a detailed description of this is omitted.
- the stream selection unit 166 of the data selection unit 154 reads the stream of the RAW image, the 1/1 demosaiced image, and the composite image data from the image acquisition unit 102, the demosaic unit 104, and the image synthesis unit 156 in parallel, and then requests the data request. Only the image data specified by the signal is selected and output to the cropping unit 168.
- the cropping unit 168 extracts only pixel data included in the rectangular area specified by the data request signal from the input pixel data, and outputs the extracted data to the packetizing unit 162.
- the process performed by the cropping unit 168 is the same as a general cropping process of cutting out a specified rectangular area of an image and excluding the extra area. In the present embodiment the processing target is not an image plane but pixel-row units; however, given the number of pixels in one horizontal row of the original image, the two-dimensional coordinates of the image plane can easily be mapped to one-dimensional coordinates in the stream, and the pixels to be cut out can be specified in the same manner.
- in the present embodiment, the data of the 1/4, 1/16, and 1/64 demosaiced images are collected into rectangular areas on the composite image as shown in FIG. 7, so the three images can easily be separated by this cropping process. For example, if an area having the coordinate (W/2, 1) as its upper-left vertex, a horizontal width of W/8, and a vertical height of H/2 is cut out of the composite image shown in FIG. 7, the entire area of the 1/16 demosaiced image alone can be extracted.
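Cropping a rectangle directly out of a raster-order stream can be sketched like this (hypothetical Python; the names are illustrative, not the patent's):

```python
def crop_from_stream(stream, row_width, x, y, w, h):
    """Cut the rectangle (x, y, w, h) out of a raster-order stream whose
    rows are row_width pixels long, returning it as a new stream."""
    out = []
    for row in range(y, y + h):
        start = row * row_width + x  # map 2-D coordinates to 1-D offset
        out.extend(stream[start:start + w])
    return out
```

With the composite image, calling this with row_width = 21W/32, x = W/2, y = 1, w = W/8, h = H/2 would return the 1/16 demosaiced image as a stream.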
- the data in the area in the image designated by the data request signal is continuously output to the packetization unit 162 in a stream format in which pixel columns are connected.
- the stream received by the packetizing unit 162 is packetized for each predetermined size according to the FIFO policy, and sequentially transmitted to the host terminal 20.
- the data expansion unit 68 of the host terminal 20 expands the packet received from the camera 100 as an image plane in the main memory 58 as shown in FIG.
- the data processing unit 66 performs processing according to the application being executed using the developed image. At this time, image processing may be requested to the GPU 52 as necessary, and the GPU 52 may read out an image from the main memory 58 and perform processing or composition. Since the image data developed in the main memory 58 is the same as general image data, it can be read out as a texture.
- the data processing unit 66 may analyze the image developed in the main memory 58 to acquire the face area and the position of the tracking target, and supply information on the area to the data requesting unit 64. At this time, the data request unit 64 may designate the area and transmit a new data request signal to the camera 100. In this case, the cropping unit 168 of the camera 100 changes the area to be extracted according to the designation at the timing of processing a new image frame.
- FIG. 9 shows a modification of the configuration of the image sending unit.
- blocks having the same functions as the functional blocks shown in FIG. 8 are denoted by the same reference numerals, and description thereof is partially omitted.
- the output of the image sending unit 151 and the input / output of the communication unit 108 have a plurality of channels. By providing a plurality of channels, different images and data of different areas can be extracted in parallel and transmitted to the host terminal 20 in parallel.
- the image sending unit 151 includes three data selection units, a first data selection unit 154a, a second data selection unit 154b, and a third data selection unit 154c, and three packetization units, a first packetization unit 162a, a second packetization unit 162b, and a third packetization unit 162c.
- the first data selection unit 154a and the first packetization unit 162a, the second data selection unit 154b and the second packetization unit 162b, the third data selection unit 154c and the third packetization unit 162c are connected in series and are in charge of each. Select, extract, and packetize data.
- the first data selection unit 154a, the second data selection unit 154b, and the third data selection unit 154c have a set of a stream selection unit 166a and a cropping unit 168a, a set of a stream selection unit 166b and a cropping unit 168b, and a set of a stream selection unit 166c and a cropping unit 168c, respectively. The control unit 164 assigns the information of up to three images and areas specified in the data request signal from the host terminal 20 to the three data selection units, one each.
- the image and area information to be assigned to different channels may be all different images or different areas of the same image.
- the processing performed by each data selection unit and packetization unit set is the same as the data selection unit 154 and packetization unit 162 shown in FIG.
- the packets of the three streams output in parallel from the image sending unit 151 are input to the three channels provided in the communication unit 108, that is, the first channel 172a, the second channel 172b, and the third channel 172c, respectively.
- the transmitted data is developed as individual images in the main memory 58 of the host terminal 20.
- according to the present embodiment described above, the captured moving image is converted into multi-resolution data within the camera and turned into streams in which pixel values are connected in raster order for each image type and resolution. A part of these streams is then transmitted in accordance with requests from the host terminal, and the frame image is constructed in the memory of the host terminal.
- the memory size to be provided in the camera can be minimized by sequentially performing the processing in the state of the pixel row without developing the frame image inside the camera.
- the entire system can display an image corresponding to movement with low delay.
- in addition, image data of a plurality of resolutions is included in a single stream by connecting it one row of pixel values at a time. Since the rate at which a "pixel row's worth" of values is generated differs depending on the resolution, the data of a low-resolution image, which is generated at a low rate, is distributed evenly in the stream, including over the periods in which no data for it is generated. This makes the data size to be processed and transmitted per unit time approximately equal, which makes it easy to estimate the time required for output, the transmission bandwidth to be used, and the time required for transmission, and reduces the possibility that a sudden increase in data size squeezes the transmission band.
- furthermore, since each of the images to be synthesized forms a rectangular area in the composite image, the data of the plurality of images mixed in one stream can easily be separated by specifying an area in the composite image using the general image-processing operation called cropping.
- the camera 100 generates moving image data having a plurality of resolutions from the captured video, and sends only necessary data to the host terminal 20 in real time in accordance with a request from the host terminal 20.
- a motion difference image between frames is further generated as a request target of the host terminal 20.
- one of the images is analyzed in the camera 100, and the result is added as metadata to the image data to be transmitted to the host terminal 20.
- This embodiment can be realized by a system similar to the low-latency camera system 10 shown in FIG.
- the host terminal 20 has the same configuration as that shown in FIG.
- description will be made mainly focusing on differences from the first embodiment, and description of overlapping parts will be omitted as appropriate.
- FIG. 10 shows the configuration of the camera according to this embodiment.
- the camera 100a includes an image acquisition unit 102, a demosaic unit 104, a pyramid filter unit 170, an image composition unit 156, an image transmission unit 151a, and a communication unit 108, like the camera 100 in the first embodiment.
- the camera 100a further includes a difference image generation unit 174 and an image analysis unit 176.
- the image acquisition unit 102, demosaic unit 104, and pyramid filter unit 170 operate in the same manner as the corresponding functional blocks in the first embodiment.
- the difference image generation unit 174 generates a difference image between an image having a predetermined resolution output by the pyramid filter unit 170 and an image of another frame having the same resolution that has been output previously. Therefore, the difference image generation unit 174 includes an internal memory (not shown) that temporarily stores image data for one frame.
- the difference image generation unit 174 takes the difference between each pixel value newly output from the pyramid filter unit 170 and the pixel value of the corresponding pixel of the previous frame stored in the internal memory, and passes the result to the image composition unit 156 as a pixel value of the difference image.
- in this embodiment, the difference image is generated from the lowest-resolution image generated by the pyramid filter unit 170 and is made a synthesis target of the image composition unit 156. Once the difference image is part of the composite image, the image sending unit 151a and the communication unit 108 thereafter operate in the same manner as described in Embodiment 1, so the difference image data can be transmitted to the host terminal 20.
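A minimal sketch of this frame-differencing step (hypothetical Python; the class stands in for the difference image generation unit 174 and its one-frame internal memory, with frames flattened to 1-D lists for simplicity):

```python
class DifferenceImageGenerator:
    """Keeps one previous frame and emits per-pixel differences."""
    def __init__(self):
        self.prev = None  # one frame of internal memory

    def update(self, frame):
        if self.prev is None:
            diff = [0] * len(frame)  # no previous frame yet
        else:
            diff = [a - b for a, b in zip(frame, self.prev)]
        self.prev = list(frame)      # remember this frame for next time
        return diff

gen = DifferenceImageGenerator()
gen.update([10, 20, 30])       # first frame: all-zero difference
d = gen.update([12, 20, 27])   # per-pixel change since previous frame
```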
- the image analysis unit 176 performs a predetermined image analysis on the image having a predetermined resolution output from the pyramid filter unit 170, and passes the result to the image transmission unit 151a.
- in FIG. 10, the processing targets of the difference image generation unit 174 and the image analysis unit 176 are images of the same resolution, but the present embodiment is not limited to this, and each may perform its processing on a different resolution independently.
- Image analysis performed by the image analysis unit 176 includes face detection processing and tracking of an object having a predetermined shape. Therefore, the analysis result passed to the image sending unit 151a is information on the position and size of the face area and the area of the object, and an evaluation value indicating detection / tracking accuracy.
- the analysis conditions such as what kind of analysis is to be performed and the shape information of the object are notified from the host terminal 20 to the camera 100 when the application is started in accordance with the application to be executed.
- the packetizing unit 162 (illustrated in FIG. 3) of the image sending unit 151a inserts the result of the image analysis by the image analysis unit 176 as metadata either immediately after the stream of one frame of image data to be transmitted to the host terminal 20 or at a predetermined position within the stream of that frame. It then packetizes the stream with a predetermined size, as in the case where no image analysis is performed.
- the host terminal 20 develops the image data portion of the data transmitted from the camera 100 as an image in the main memory, and uses the metadata for processing such as processing and composition of the image. Further, it is possible to newly specify data to be requested to the camera 100 for subsequent frames using metadata.
- distinguishing image data from metadata is possible by predetermining, on the image plane, the area to which the metadata is added when the host terminal treats the entire received stream as an image, or by attaching information identifying the metadata to the metadata itself.
- both the difference image generation unit 174 and the image analysis unit 176 are provided in the camera 100a, but only one of them may be provided.
- the information added as metadata may not be the result of image analysis, and may be, for example, a time stamp when the original RAW image is acquired.
- the information of the time stamp generated by the image acquisition unit 102 for each frame may be directly acquired by the image transmission unit 151a and inserted into the stream in the same manner as described above.
- according to the present embodiment described above, a mechanism for generating a difference image is provided inside the camera. When a region with motion is to be detected using a difference image, even a low-resolution image often functions sufficiently, so the lowest-resolution image is targeted and the generated difference image is included in the composite image.
- a mechanism for detecting a face area and tracking an object of a predetermined shape is provided inside the camera, and the result is inserted as metadata into the image data stream in units of frames. This makes it possible to minimize processing to be performed by the host terminal when processing the face area or the object area or obtaining detailed information on the area.
- the camera 100 of the low-delay camera system 10 shown in FIG. 1 is configured as a stereo camera including a pair of cameras that capture the same object from different viewpoints on the left and right.
- stereo matching is performed using the frames of two moving images taken from the left and right to generate a depth image representing the position of the object in the depth direction.
- the depth image is transmitted as needed in response to a request from the host terminal 20 as with other images.
- the host terminal 20 may have the same configuration as that of the first embodiment.
- the following description focuses mainly on the differences from the first and second embodiments; description of overlapping parts is omitted.
- FIG. 11 shows the configuration of the camera according to this embodiment.
- the stereo camera 100b includes a first camera 190a, a second camera 190b, a stereo matching processing unit 192, an image sending unit 151b, and a communication unit 108.
- Each of the first camera 190a and the second camera 190b has substantially the same configuration as the camera 100 of Embodiment 1 and the camera 100a of Embodiment 2, but the image sending unit 151b and the communication unit 108 are shared by the first camera 190a, the second camera 190b, and the stereo matching processing unit 192.
- the first camera 190a includes an image acquisition unit 102a, a demosaic unit 104a, a pyramid filter unit 170a, an image composition unit 156a, and an image analysis unit 176a.
- the second camera 190b includes an image acquisition unit 102b, a demosaic unit 104b, a pyramid filter unit 170b, an image synthesis unit 156b, and an image analysis unit 176b.
- the imaging elements respectively provided in the image acquisition unit 102a and the image acquisition unit 102b image the same object from different left and right viewpoints.
- the configuration of the imaging element as hardware may be the same as that of a general stereo camera.
- the stereo matching processing unit 192 acquires, at a predetermined rate, one of the left and right moving image frames of a predetermined resolution from the demosaic unit 104a or the pyramid filter unit 170a of the first camera 190a, and the other from the demosaic unit 104b or the pyramid filter unit 170b of the second camera 190b.
- the stereo matching processing performed here may use any of the various methods proposed so far. For example, an area correlation method can be used: a correlation window is set on one of the left and right images, corresponding points are obtained by calculating cross-correlation coefficients against the correlation window image while moving a search window over the other image, and three-dimensional position information is obtained from the parallax of those corresponding points using the principle of triangulation.
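As an illustration of the area correlation idea, the sketch below matches a reference window from the left image against windows along the same row of the right image and returns the offset (parallax) with the best score. It uses a sum-of-absolute-differences cost instead of a true cross-correlation coefficient purely to stay short; the function name, window size, and search range are illustrative assumptions.

```python
# Minimal left-reference block matching: slide a search window leftward
# along the same row of the right image and keep the best-matching offset.
def disparity_at(left, right, x, y, win=3, max_disp=8):
    """Return the horizontal disparity of pixel (x, y) for a rectified
    stereo pair given as 2-D lists of intensities."""
    h = win // 2
    ref = [left[y + dy][x + dx]
           for dy in range(-h, h + 1) for dx in range(-h, h + 1)]
    best, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d - h < 0:          # window would leave the image
            break
        cand = [right[y + dy][x - d + dx]
                for dy in range(-h, h + 1) for dx in range(-h, h + 1)]
        cost = sum(abs(a - b) for a, b in zip(ref, cand))
        if cost < best_cost:
            best, best_cost = d, cost
    return best
```

The disparity found this way, together with the camera baseline and focal length, yields the depth-direction position of each pixel by triangulation, which is what the depth image encodes.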
- the stereo matching processing unit 192 processes the input left and right image data row by row, determines the pixel values of the depth image in raster order, and sequentially outputs them to the image sending unit 151b.
- the image sending unit 151b acquires the depth image data from the stereo matching processing unit 192, together with the data of the left and right RAW images, the 1/1 demosaiced images, and the composite images from the first camera 190a and the second camera 190b.
- the same image analysis result as that described in the second embodiment is received from the image analysis unit 176a of the first camera 190a and the image analysis unit 176b of the second camera 190b. Then, as described in the first embodiment, the data requested from the host terminal 20 is selected, and only the requested area is extracted and packetized as necessary. At this time, as described in the second embodiment, depending on the request of the host terminal 20, the result of the image analysis acquired from the image analysis units 176a and 176b is inserted as metadata.
- the processing performed by the communication unit 108 is the same as described above.
- in FIG. 11, the output of the image sending unit 151b and the input/output of the communication unit 108 are each indicated by a single arrow, but a plurality of channels may be provided so that a plurality of data can be transmitted in parallel.
- the operation example shown here can be realized mainly by a system including the stereo camera 100b described in the third embodiment, with the configurations described in the first and second embodiments combined as appropriate.
- FIG. 12 shows a flowchart illustrating an example of a processing procedure in which the host terminal 20 and the stereo camera 100b cooperate to perform image processing, and an example of a generated image.
- the flowcharts of FIGS. 12 to 14 are started when the user inputs an application activation instruction to the host terminal 20.
- in these flowcharts, each step is represented by a rectangle connected in series; in practice, however, these steps are repeated for each pixel row and each frame, and are executed in parallel throughout the period in which the moving image is shot.
- the host terminal 20 designates initial conditions and necessary data set in an application program or the like, and issues a shooting start instruction and a data transmission request to the stereo camera 100b (S10).
- the initial conditions include the resolution and frame rate of the moving images captured by the two cameras of the stereo camera 100b, the resolution and frame rate of the images on which the stereo matching processing unit 192 performs stereo matching, and the shape information of the tracking target.
- the resolution and frame rate of a moving image captured by each camera may be realized by changing the exposure condition settings of the image sensor, or by thinning out the data from the image sensor at a later stage.
- First camera: resolution 1280×720, frame rate 60 fps
- Second camera: resolution 1280×720, frame rate 60 fps
- Stereo matching: resolution 1280×720, frame rate 60 fps
- the designation of necessary data may include metadata, in addition to the type of image, the resolution, and the area within the image.
- three data are designated as follows.
- Data 1 (Left image, YUV422: 16 bits, 0, 0, 1280, 720)
- Data 2 (left composite image, YUV422: 16 bits, 0, 0, 850, 367, face area, object area, time stamp)
- Data 3 (depth image, Z: 16 bits, 0, 0, 1280, 720)
- Data 1 is the 1/1 demosaiced image (YUV422: 16 bits) of the image captured by the left camera of the stereo camera 100b, with upper-left coordinates (0, 0) and horizontal and vertical widths (1280, 720). Considering the resolution specified in the initial conditions, this area is the entire captured image.
- Data 2 is the area of the composite image (YUV422: 16 bits) of the image captured by the left camera whose upper-left coordinates are (0, 0) and whose horizontal and vertical widths are (850, 367).
- the composite images in the examples of FIGS. 12 to 14 include not only the 1/4, 1/16, and 1/64 demosaiced images shown in FIG. 7 but also a difference image obtained by taking the frame difference of the 1/256 demosaiced image. This difference image is appended at the right end of the composite image of FIG. 7 as an image area of (W/16)/8 × H/2 according to the same rule as the other images.
- the area specified by the data 2 is the entire area of this composite image.
- it is also specified that the face area obtained as a result of the face detection process, the object area obtained as a result of the tracking process, and the time stamp of the frame from which the composite image was generated be added to the composite image as metadata.
- Data 3 is the area of the depth image generated by the stereo matching processing unit 192 (holding 16-bit position information in the depth direction as pixel values) whose upper-left coordinates are (0, 0) and whose horizontal and vertical widths are (1280, 720), that is, the entire region.
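The data designations above each combine an image type, a pixel format, an upper-left corner, horizontal and vertical widths, and optional metadata items. A minimal model of such a request might look like the following; the class and field names are assumptions for illustration only.

```python
# Hypothetical model of one data designation sent from the host terminal.
from dataclasses import dataclass

@dataclass
class DataRequest:
    image: str            # e.g. "left image", "left composite image", "depth image"
    pixel_format: str     # e.g. "YUV422", "RAW", "Z"
    bits_per_pixel: int
    x: int                # upper-left corner of the requested area
    y: int
    w: int                # horizontal and vertical widths
    h: int
    metadata: tuple = ()  # e.g. ("face area", "object area", "time stamp")

    def bytes_per_frame(self):
        """Payload size of one frame of this request, excluding metadata."""
        return self.w * self.h * self.bits_per_pixel // 8

# Data 1 of the example above: the whole 1/1 demosaiced left image.
data1 = DataRequest("left image", "YUV422", 16, 0, 0, 1280, 720)
```

Expressing the request this way makes it evident that the host terminal controls the transmission size entirely through the designated area and pixel format.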
- the first camera 190a, the second camera 190b, and the stereo matching processing unit 192 process the captured image frames as described above, thereby generating data of the left RAW image and 1/1 demosaiced image 230, the left composite image 232, the depth image 234, the right RAW image and 1/1 demosaiced image 236, and the right composite image 238 (S14).
- the image sending unit 151b generates and transmits the transmission data by selecting and extracting only the data designated in S10 and packetizing it as a stream (S16).
- the host terminal 20 that has received the data develops an image in the main memory 58.
- the main memory 58 stores the entire area 240 of the 1/1 demosaiced image, the entire area 242 of the 1/4 demosaiced image, the entire area 244 of the 1/16 demosaiced image, the entire area 246 of the 1/64 demosaiced image, the difference image 248 of the 1/256 demosaiced image, the metadata 250 including the face area, the object area, and the time stamp, and the depth image 252.
- the CPU 50 and the GPU 52 of the host terminal 20 use these data to generate an image to be displayed and display it on the display 4 (S18, S20). For example, an area with motion is detected from the difference image 248, and the depth information of the object in that portion is acquired from the depth image 252. By continuing this over a plurality of frames, the gesture of the user who is the subject is recognized, and an image subjected to predetermined processing according to the gesture is generated for the face area and displayed.
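The motion-area detection mentioned in S18 can be sketched as thresholding the difference image and taking the bounding box of the changed pixels. The function name and the threshold value are illustrative assumptions.

```python
# Find the region with motion in a motion difference image: threshold the
# per-pixel differences, then bound the surviving pixels.
def motion_bounding_box(diff, threshold=16):
    """Return (x, y, w, h) of pixels whose difference exceeds threshold,
    or None if nothing moved. `diff` is a 2-D list of difference values."""
    xs = [x for row in diff for x, v in enumerate(row) if v > threshold]
    ys = [y for y, row in enumerate(diff) if any(v > threshold for v in row)]
    if not xs:
        return None
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1
```

Because the difference image is taken at the lowest resolution, the resulting box only needs to be scaled up to address the corresponding region in the higher-resolution images or the depth image.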
- FIG. 13 shows a flowchart illustrating another example of a processing procedure in which the host terminal 20 and the stereo camera 100b cooperate to perform image processing, and a generated image example.
- the host terminal 20 designates initial conditions and necessary data, and issues a shooting start instruction and a data transmission request to the stereo camera 100b (S22).
- Data 1 (left composite image, YUV422: 16 bits, 0, 0, 850, 367, face area, object area, time stamp). This data is the same as data 2 in the example of FIG. 12.
- the first camera 190a and the second camera 190b of the stereo camera 100b, having received the designation of initial conditions and the data request, start shooting moving images under the initial conditions (S24), and the first camera 190a, the second camera 190b, and the stereo matching processing unit 192 each generate image data (S26).
- the image data at this time is the same as the image data generated in S14 of FIG.
- the image sending unit 151b generates and sends transmission data by selecting and extracting only the data designated in S22 and packetizing it as a stream (S28).
- the host terminal 20 that has received the data develops an image in the main memory 58.
- the main memory 58 stores the entire area 242 of the 1/4 demosaiced image, the entire area 244 of the 1/16 demosaiced image, the entire area 246 of the 1/64 demosaiced image, the difference image 248 of the 1/256 demosaiced image, and the metadata 250 including the face area, the object area, and the time stamp.
- the CPU 50 of the host terminal 20 determines an area having a movement identified from the difference image 248 and a predetermined area including a face area or an object area included in the metadata 250 as an attention area (S30). Then, a new data request is made by designating the attention area (S32).
- two data are designated as follows.
- Data 2 (Left image, RAW: 16 bits, Fx, Fy, Fw, Fh)
- Data 3 (depth image, Z: 8 bits, Hx, Hy, Hw, Hh)
- Data 2 is the region of the RAW image (16 bits) captured by the left camera of the stereo camera 100b that was determined as the attention area including the face area, with upper-left coordinates (Fx, Fy) and horizontal and vertical widths (Fw, Fh).
- Data 3 is the region of the depth image generated by the stereo matching processing unit 192 (holding 8-bit position information in the depth direction as pixel values) that was determined as the attention area including the object area, with upper-left coordinates (Hx, Hy) and horizontal and vertical widths (Hw, Hh).
- the image sending unit 151b of the stereo camera 100b extracts the data of the designated areas from the RAW image and the depth image at the timing at which each new frame of those images is input, packetizes it as streams, and generates and transmits the transmission data (S34).
- the host terminal 20 that has received the data develops an image in the main memory 58.
- the main memory 58 stores the RAW image 254 of the area including the face and the depth image 256 of the area including the object.
- the CPU 50 and the GPU 52 of the host terminal 20 use these data to generate an image to be displayed and display it on the display 4 (S36, S38). For example, by combining the RAW image 254 of the area including the face with the 1/4 demosaiced image 242 as a background, a clear image is displayed only in the face area representing a change in facial expression while suppressing the data size. Furthermore, the depth information of the object may be acquired from the depth image 256 to recognize the user's gesture, and a predetermined process corresponding thereto may be performed.
- FIG. 14 shows a flowchart illustrating another example of a processing procedure in which the host terminal 20 and the stereo camera 100b cooperate to perform image processing, and an example of a generated image.
- the host terminal 20 designates initial conditions and necessary data, and issues a shooting start instruction and a data transmission request to the stereo camera 100b (S40).
- First camera: resolution 1280×720, frame rate 30 fps
- Second camera: resolution 1280×720, frame rate 15 fps
- Stereo matching: resolution 320×180, frame rate 15 fps
- Necessary data is specified as follows.
- Data 1 (left composite image, Y (motion difference): 8 bits, 840, 8, 10, 360, time stamp)
- Data 2 (left composite image, YUV422: 16 bits, 800, 4, 40, 360, face area, time stamp)
- Data 3 (depth image, Z: 8 bits, 20, 15, 280, 150, time stamp)
- Data 1 is the difference image area within the composite image of the images captured by the left camera, expressed as a Y (luminance) image: the region with upper-left coordinates (840, 8) and horizontal and vertical widths (10, 360). It is further specified that the time stamp of the original frame be added to data 1 as metadata.
- Data 2 is the region of the composite image (YUV422: 16 bits) of the images captured by the left camera with upper-left coordinates (800, 4) and horizontal and vertical widths (40, 360), that is, the area of the 1/64 demosaiced image. It is further specified that the face area obtained as a result of the face detection process and the time stamp of the original frame be added to data 2 as metadata.
- the area information of each image included in the composite image designated by data 1 and data 2 can be specified according to the arrangement rule shown in FIG.
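Following the arrangement rule of FIG. 7 extended with the 1/256 difference image, the half-height reduced images are packed side by side: the 1/4 image occupies W/2 columns, the 1/16 image W/8, the 1/64 image W/32, and the 1/256 difference image (W/16)/8. The sketch below reproduces the horizontal offsets quoted in the data designations for W = 1280; the helper name is an assumption, and the small vertical offsets (4 and 8) in the designations, which arise from the rows in flight at each pyramid level, are not modeled.

```python
# Horizontal packing of the composite image: each reduced image occupies a
# fixed band of columns, so a region designation can be derived from W alone.
def composite_layout(W):
    """Return ([(left edge, width), ...], total width) for the bands holding
    the 1/4, 1/16, 1/64 images and the 1/256 difference image."""
    widths = [W // 2, W // 8, W // 32, (W // 16) // 8]
    x = 0
    regions = []
    for w in widths:
        regions.append((x, w))
        x += w
    return regions, x

regions, total = composite_layout(1280)
```

For W = 1280 this gives bands starting at columns 0, 640, 800, and 840 and a total width of 850, matching the composite-image width and the x-offsets 800 (data 2, the 1/64 image) and 840 (data 1, the difference image) designated in the text.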
- Data 3 is the region of the depth image generated by the stereo matching processing unit 192 (holding 8-bit position information in the depth direction as pixel values) with upper-left coordinates (20, 15) and horizontal and vertical widths (280, 150). This region is obtained by trimming 15 pixels from the top and bottom edges and 20 pixels from the left and right edges of the depth image, leaving the area considered meaningful as depth information; trimming in this way keeps the data size down. It is further specified that the time stamp of the original frame be added to data 3 as metadata.
- the first camera 190a and the second camera 190b of the stereo camera 100b, having received the designation of initial conditions and the data request, start shooting moving images under the initial conditions (S42), and the first camera 190a, the second camera 190b, and the stereo matching processing unit 192 generate image data (S44).
- the image data generated at this time is lighter than in the examples of FIGS. 12 and 13 in terms of image size, color space, frame rate, and the like.
- the image sending unit 151b generates and sends transmission data by selecting and extracting only the data designated in S40 and packetizing it as a stream (S46).
- the host terminal 20 that has received the data develops an image in the main memory 58.
- the main memory 58 stores the difference image 260 of the 1/256 demosaiced image together with the time stamp 262 of its original frame, the entire area 260 of the 1/64 demosaiced image, the metadata 266 including the face area and the time stamp, and the depth image 268 with its periphery trimmed together with the time stamp 270 of its original frame.
- the CPU 50 and the GPU 52 of the host terminal 20 use these data to generate an image to be displayed and display it on the display 4 (S48, S50). For example, a moving area is detected from the difference image 260, and the depth information of the object in that portion is acquired from the depth image 268. As a result, the gesture of the user who is the subject is recognized, and an image subjected to predetermined processing according to the gesture is displayed on the face region, obtained from the metadata 266, within the entire area 260 of the 1/64 demosaiced image.
- in this operation example, the frame rate is reduced and only low-resolution images are transmitted, so that the entire area can be targeted for transmission and processing while consumption of resources, including the transmission band, is suppressed. Because the entire area is transmitted, the adaptive area designation step shown in the example of FIG. 13 can be omitted. Even when the per-frame data sizes of the three transmitted data differ and the timing at which one frame's worth of data arrives at the host terminal 20 therefore varies among them, adding a time stamp to each frame makes it easy to identify which data correspond to one another.
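The time-stamp-based correspondence can be sketched as grouping frames from the separately arriving streams by their shared stamp. Representing a received frame as a (timestamp, payload) pair is an assumption for illustration; the actual stream format is as described in the embodiments.

```python
# Group frames from several streams by the time stamp attached to each
# frame, yielding one dict per stamp that appears in every stream.
def group_by_timestamp(*streams):
    """Each stream is an iterable of (timestamp, payload) pairs."""
    indexed = [dict(s) for s in streams]            # timestamp -> payload
    common = set(indexed[0]).intersection(*indexed[1:])
    for ts in sorted(common):
        yield {i: d[ts] for i, d in enumerate(indexed)}
```

Streams with a lower frame rate simply contribute fewer stamps, so a depth image produced at 15 fps is matched only against the difference-image frames that share its capture time.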
- the features of the first and second embodiments are applied to the stereo camera.
- the stereo camera is provided with a mechanism for performing stereo matching.
- among the RAW image and 1/1 demosaiced image generated by each camera, the composite image, the depth image obtained as a result of stereo matching, the face area information obtained as a result of face detection, and the object area information obtained as a result of tracking processing, the data designated by the host terminal can be transmitted with low delay. The processing load of the host terminal is therefore reduced, and, in synergy with the efficiency of data transmission from the camera, image display that follows the movement of the subject with low delay can be achieved.
- 4 display, 10 low delay camera system, 20 host terminal, 50 CPU, 52 GPU, 58 main memory, 60 communication unit, 64 data request unit, 66 data processing unit, 68 data development unit, 100 camera, 104 demosaic unit, 108 communication unit, 149 FIFO buffer, 150 FIFO buffer, 151 image sending unit, 154 data selection unit, 156 image composition unit, 162 packetization unit, 164 control unit, 166 stream selection unit, 168 cropping unit, 170 pyramid filter unit, 172a first channel, 172b second channel, 172c third channel, 174 difference image generation unit, 176 image analysis unit, 190a first camera, 190b second camera, 192 stereo matching processing unit.
- the present invention can be used for information processing apparatuses such as computers, cameras, game apparatuses, and image display apparatuses.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Social Psychology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computer Networks & Wireless Communication (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Studio Devices (AREA)
- Image Processing (AREA)
- Studio Circuits (AREA)
Abstract
Description
FIG. 1 shows the overall configuration of a low-delay camera system 10 according to the present embodiment. In this system, a moving image of a user 6 is captured by the camera 100, image processing is performed by the host terminal 20 based on that data, and the result is shown on the display 4 or transmitted to a predetermined destination via a network 12 such as the Internet or a LAN (Local Area Network).
In Embodiment 1, the camera 100 generates moving image data of a plurality of resolutions from the captured video and, in accordance with requests from the host terminal 20, sends only the necessary data to the host terminal 20 in real time. In the present embodiment, a motion difference image between frames is additionally generated and made available to requests from the host terminal 20. Furthermore, the camera 100 analyzes one of the images and appends the result as metadata to the image data transmitted to the host terminal 20.
In the present embodiment, the camera 100 of the low-delay camera system 10 shown in FIG. 1 is configured as a stereo camera including a pair of cameras that capture the same object from different left and right viewpoints. The stereo camera performs stereo matching using the frames of the two moving images captured from the left and right to generate a depth image representing the position of the object in the depth direction. The depth image is transmitted as needed in response to requests from the host terminal 20, like the other images. The host terminal 20 may have the same configuration as in Embodiment 1. The following description focuses mainly on the differences from Embodiments 1 and 2, and overlapping description is omitted.
First camera: resolution 1280×720, frame rate 60 fps
Second camera: resolution 1280×720, frame rate 60 fps
Stereo matching: resolution 1280×720, frame rate 60 fps
Data 1: (left image, YUV422: 16 bits, 0, 0, 1280, 720)
Data 2: (left composite image, YUV422: 16 bits, 0, 0, 850, 367, face area, object area, time stamp)
Data 3: (depth image, Z: 16 bits, 0, 0, 1280, 720)
Data 1: 1280×720 pixels × 60 fps × 16 bits = 885 Mbps
Data 2: 850×370 pixels × 60 fps × 16 bits = 300 Mbps
Data 3: 1280×720 pixels × 60 fps × 16 bits = 885 Mbps
giving a total of approximately 2.1 Gbps.
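The bit rates quoted here follow directly from width × height × frame rate × bits per pixel; the following check reproduces the arithmetic.

```python
# Per-stream bit rate in Mbps from the raw image parameters.
def mbps(w, h, fps, bits):
    return w * h * fps * bits / 1e6

data1 = mbps(1280, 720, 60, 16)   # left 1/1 demosaiced image, ~885 Mbps
data2 = mbps(850, 370, 60, 16)    # left composite image, ~300 Mbps
data3 = mbps(1280, 720, 60, 16)   # depth image, ~885 Mbps
total_gbps = (data1 + data2 + data3) / 1000   # quoted as roughly 2.1 Gbps
```

The same formula reproduces the other operation examples: reducing the designated areas, bit depths, and frame rates as in FIG. 14 brings the total from gigabits down to megabits per second.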
Data 1: (left composite image, YUV422: 16 bits, 0, 0, 850, 367, face area, object area, time stamp)
This data is the same as data 2 in the example of FIG. 12.
Data 2: (left image, RAW: 16 bits, Fx, Fy, Fw, Fh)
Data 3: (depth image, Z: 8 bits, Hx, Hy, Hw, Hh)
Data 1: 850×370 pixels × 60 fps × 16 bits = 300 Mbps
Data 2: 400×600 pixels × 60 fps × 16 bits = 230 Mbps
Data 3: 320×450 pixels × 60 fps × 8 bits = 70 Mbps
giving a total of approximately 600 Mbps.
First camera: resolution 1280×720, frame rate 30 fps
Second camera: resolution 1280×720, frame rate 15 fps
Stereo matching: resolution 320×180, frame rate 15 fps
Data 1: (left composite image, Y (motion difference): 8 bits, 840, 8, 10, 360, time stamp)
Data 2: (left composite image, YUV422: 16 bits, 800, 4, 40, 360, face area, time stamp)
Data 3: (depth image, Z: 8 bits, 20, 15, 280, 150, time stamp)
Data 1: 10×360 pixels × 30 fps × 8 bits = 864 Kbps
Data 2: 160×90 pixels × 15 fps × 16 bits = 3.5 Mbps
Data 3: 280×150 pixels × 15 fps × 8 bits = 5 Mbps
giving a total of approximately 9.5 Mbps.
Claims (14)
- 1. A moving image capturing device comprising: an image data generation unit that generates, each in a predetermined pixel order, data of a plurality of images of different resolutions by reducing, in multiple stages, each frame of a moving image obtained by capturing an object, and sequentially outputs the data as streams; an image synthesis unit that generates a virtual composite image containing a predetermined plurality of images by connecting, one pixel row of an image or a smaller range of pixels at a time, the data of the predetermined plurality of images among the data of the plurality of images output from the image data generation unit, and outputting the result as a stream; and an image sending unit that accepts a data transmission request from a connected host terminal, generates a stream of data to be transmitted by extracting the data of the pixels contained in the requested image and area from the plurality of streams output from the image data generation unit and the image synthesis unit, and transmits the stream to the host terminal.
- 2. The moving image capturing device according to claim 1, wherein the image synthesis unit outputs the data that is to constitute one pixel row of the composite image using, as a reference period, the period in which data for one pixel row of the highest-resolution image among the images to be synthesized is generated, and, for images of other resolutions whose data for one row is generated over a period longer than the reference period, adjusts the range of pixel rows to be connected so that their data are output evenly over that generation period.
- 3. The moving image capturing device according to claim 1 or 2, wherein the image sending unit comprises a cropping unit that cuts out a rectangular area, within an image, whose data transmission is requested by the host terminal, in units of the pixel rows constituting the stream; the image synthesis unit connects the data of the images so that each image to be synthesized forms a rectangular area within the composite image; and the cropping unit, in response to a request from the host terminal, cuts out any of the synthesized images from the composite image in units of pixel rows and transmits it to the host terminal.
- 4. The moving image capturing device according to any one of claims 1 to 3, further comprising a difference image generation unit that generates a difference image of a predetermined resolution by taking the inter-frame difference of the image of that resolution among the plurality of images of different resolutions, wherein the image synthesis unit also includes the difference image among the synthesis targets.
- 5. The moving image capturing device according to any one of claims 1 to 4, wherein the image sending unit reads the plurality of streams output from the image data generation unit and the image synthesis unit in parallel, and generates the stream to be transmitted from at least a part of a stream selected from among them in accordance with a request from the host terminal.
- 6. The moving image capturing device according to any one of claims 1 to 5, wherein the image sending unit comprises a plurality of output channels for transmitting data to the host terminal, and, when data of a plurality of areas is requested by the host terminal, transmits the streams generated for the respective data in parallel from the plurality of output channels.
- 7. The moving image capturing device according to any one of claims 1 to 6, further comprising a face detection unit that performs face detection processing on any of the plurality of images to identify the area of a person's face as the object, wherein the image sending unit, in response to a request from the host terminal, inserts data concerning the face area identified by the face detection unit as metadata at a predetermined position in the generated image data stream, and then transmits the stream to the host terminal.
- 8. The moving image capturing device according to any one of claims 1 to 7, further comprising a tracking unit that acquires shape information of an object to be tracked from the host terminal and performs tracking processing of the object based on the information, wherein the image sending unit, in response to a request from the host terminal, inserts data concerning the position of the object identified by the tracking unit as metadata at a predetermined position in the generated image data stream, and then transmits the stream to the host terminal.
- 9. A moving image capturing device comprising a pair of cameras that capture the same object from different left and right viewpoints, wherein each of the pair of cameras comprises: an image data generation unit that generates, each in a predetermined pixel order, data of a plurality of images of different resolutions by reducing, in multiple stages, each frame of a moving image obtained by capturing the object, and sequentially outputs the data as streams; and an image synthesis unit that generates a virtual composite image containing a predetermined plurality of images by connecting, one pixel row of an image or a smaller range of pixels at a time, the data of the predetermined plurality of images among the data of the plurality of images output from the image data generation unit, and outputting the result as a stream; and wherein the moving image capturing device further comprises: a stereo matching processing unit that generates, in a predetermined pixel order, a depth image representing the position of the object in three-dimensional space by performing stereo matching on image data of a predetermined resolution among the data of the images of different viewpoints generated by the pair of cameras, and sequentially outputs the depth image as a stream; and an image sending unit that accepts a data transmission request from a connected host terminal, generates a stream of data to be transmitted by extracting the data of the pixels contained in the requested image and area from the plurality of streams output from the image data generation units, the image synthesis units, and the stereo matching processing unit, and transmits the stream to the host terminal.
- 10. An information processing system comprising: a moving image capturing device that captures an object and generates moving image data; and a host terminal that acquires a part of the moving image data from the moving image capturing device, performs predetermined image processing using it, and displays an image, wherein the moving image capturing device comprises: an image data generation unit that generates, each in a predetermined pixel order, data of a plurality of images of different resolutions by reducing, in multiple stages, each frame of the captured moving image, and sequentially outputs the data as streams; an image synthesis unit that generates a virtual composite image containing a predetermined plurality of images by connecting, one pixel row of an image or a smaller range of pixels at a time, the data of the predetermined plurality of images among the data of the plurality of images output from the image data generation unit, and outputting the result as a stream; and an image sending unit that generates a stream of data to be transmitted by extracting the data of the pixels contained in the image and area requested by the host terminal from the plurality of streams output from the image data generation unit and the image synthesis unit, and transmits the stream to the host terminal.
- 11. An information processing device comprising: a data request unit that requests a camera capturing an object to transmit image data of frames of a moving image, designating a resolution and an area within the image; a data development unit that develops, as two-dimensional image data in a main memory, the image data transmitted from the camera in accordance with the request in the form of a stream in which the pixel values of the designated area are connected pixel row by pixel row; and a data processing unit that performs predetermined image processing using the two-dimensional image data and displays an image, wherein the data request unit designates a composite image, generated within the camera, in which a plurality of images of different resolutions obtained by reducing a frame of the moving image in multiple stages are each arranged in a predetermined rectangular area, and the data development unit sorts the images by developing the composite image transmitted from the camera into individual two-dimensional image data for each image to be synthesized.
- 12. An image data processing method performed by a moving image capturing device, comprising: a step of generating, each in a predetermined pixel order, data of a plurality of images of different resolutions by reducing, in multiple stages, each frame of a moving image obtained by capturing an object, and sequentially outputting the data as streams; a step of generating a virtual composite image containing a predetermined plurality of images by connecting, one pixel row of an image or a smaller range of pixels at a time, the data of the predetermined plurality of images among the data of the plurality of images output in the outputting step, and outputting the result as a stream; and a step of accepting a data transmission request from a connected host terminal, generating a stream of data to be transmitted by extracting the data of the pixels contained in the requested image and area from the plurality of streams output in the outputting step and the generating step, and transmitting the stream to the host terminal.
- 13. A computer program that causes a computer to realize: a function of generating, each in a predetermined pixel order, data of a plurality of images of different resolutions by reducing, in multiple stages, each frame of a moving image obtained by an imaging element capturing an object, and sequentially outputting the data as streams; a function of generating a virtual composite image containing a predetermined plurality of images by connecting, one pixel row of an image or a smaller range of pixels at a time, the data of the predetermined plurality of images among the data of the plurality of images output by the outputting function, and outputting the result as a stream; and a function of accepting a data transmission request from a connected host terminal, generating a stream of data to be transmitted by extracting the data of the pixels contained in the requested image and area from the plurality of streams output by the outputting function and the generating function, and transmitting the stream to the host terminal.
- 14. A recording medium on which is recorded a computer program that causes a computer to realize: a function of generating, each in a predetermined pixel order, data of a plurality of images of different resolutions by reducing, in multiple stages, each frame of a moving image obtained by an imaging element capturing an object, and sequentially outputting the data as streams; a function of generating a virtual composite image containing a predetermined plurality of images by connecting, one pixel row of an image or a smaller range of pixels at a time, the data of the predetermined plurality of images among the data of the plurality of images output by the outputting function, and outputting the result as a stream; and a function of accepting a data transmission request from a connected host terminal, generating a stream of data to be transmitted by extracting the data of the pixels contained in the requested image and area from the plurality of streams output by the outputting function and the generating function, and transmitting the stream to the host terminal.
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/116,630 US9247132B2 (en) | 2011-05-19 | 2012-04-05 | Moving picture capturing device, information processing system, information processing device, and image data processing method |
BR112013029656-9A BR112013029656B1 (pt) | 2011-05-19 | 2012-04-05 | Dispositivo de captura de imagem em movimento, sistema e dispositivo de processamento de informação, método de processamento de dados de imagem, e,mídia de gravação legível por computador |
RU2013156453/07A RU2570195C2 (ru) | 2011-05-19 | 2012-04-05 | Устройство съемки движущихся изображений, система и устройство обработки информации и способ обработки изображений |
EP12785074.1A EP2712177B1 (en) | 2011-05-19 | 2012-04-05 | Moving picture capturing device, information processing system, information processing device, and image data processing method |
MX2013013313A MX2013013313A (es) | 2011-05-19 | 2012-04-05 | Dispositivo de captura de imagenes en movimiento, sistema de procesamiento de informacion, dispositivo de procesamiento de unformacion y metodo de procesamiento de datos de imagenes. |
KR1020137029578A KR101451734B1 (ko) | 2011-05-19 | 2012-04-05 | 동화상 촬영장치, 정보처리 시스템, 정보처리장치 및 화상 데이터 처리방법 |
CN201280022231.1A CN103518368B (zh) | 2011-05-19 | 2012-04-05 | 动图像拍摄装置、信息处理系统、信息处理装置、及图像数据处理方法 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011112931A JP5629642B2 (ja) | 2011-05-19 | 2011-05-19 | 動画像撮影装置、情報処理システム、情報処理装置、および画像データ処理方法 |
JP2011-112931 | 2011-05-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012157178A1 true WO2012157178A1 (ja) | 2012-11-22 |
Family
ID=47176533
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/002397 WO2012157178A1 (ja) | 2011-05-19 | 2012-04-05 | 動画像撮影装置、情報処理システム、情報処理装置、および画像データ処理方法 |
Country Status (10)
Country | Link |
---|---|
US (1) | US9247132B2 (ja) |
EP (1) | EP2712177B1 (ja) |
JP (1) | JP5629642B2 (ja) |
KR (1) | KR101451734B1 (ja) |
CN (1) | CN103518368B (ja) |
BR (1) | BR112013029656B1 (ja) |
MX (1) | MX2013013313A (ja) |
RU (1) | RU2570195C2 (ja) |
TW (1) | TWI496471B (ja) |
WO (1) | WO2012157178A1 (ja) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104125450A (zh) * | 2013-04-26 | 2014-10-29 | 索尼电脑娱乐公司 | 图像拾取装置、信息处理系统和图像数据处理方法 |
AU2012288349B2 (en) * | 2011-07-25 | 2015-11-26 | Sony Interactive Entertainment Inc. | Moving image capture device, information processing system, information processing device, and image data processing method |
US20160080645A1 (en) * | 2014-09-12 | 2016-03-17 | Sony Computer Entertainment Inc. | Image pickup apparatus, information processing apparatus, display apparatus, information processing system, image data sending method, image displaying method, and computer program |
WO2020261813A1 (ja) * | 2019-06-28 | 2020-12-30 | ソニーセミコンダクタソリューションズ株式会社 | 送信装置、受信装置及び伝送システム |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20130024504A (ko) * | 2011-08-31 | 2013-03-08 | 삼성전기주식회사 | 삼차원 카메라 시스템 및 주시각 조정 방법 |
WO2014014238A1 (en) * | 2012-07-17 | 2014-01-23 | Samsung Electronics Co., Ltd. | System and method for providing image |
US9232177B2 (en) * | 2013-07-12 | 2016-01-05 | Intel Corporation | Video chat data processing |
JP2015195562A (ja) * | 2014-03-20 | 2015-11-05 | 村上 幹次 | 送信信号処理装置及び方法、受信信号処理装置 |
JP6390245B2 (ja) * | 2014-07-31 | 2018-09-19 | カシオ計算機株式会社 | 画像格納装置、画像管理方法及びプログラム |
JP6218787B2 (ja) * | 2015-09-29 | 2017-10-25 | 株式会社ソニー・インタラクティブエンタテインメント | 撮像装置、情報処理装置、表示装置、情報処理システム、画像データ送出方法、および画像表示方法 |
US11144553B2 (en) | 2015-11-30 | 2021-10-12 | International Business Machines Corporation | Streaming programmable point mapper and compute hardware |
JP2017158123A (ja) | 2016-03-04 | 2017-09-07 | ソニー株式会社 | 信号処理装置および撮像装置 |
JP6662745B2 (ja) * | 2016-10-04 | 2020-03-11 | 株式会社ソニー・インタラクティブエンタテインメント | 撮影装置、情報処理システム、および偏光画像処理方法 |
RU2647664C1 (ru) * | 2017-03-31 | 2018-03-16 | Общество С Ограниченной Ответственностью "Заботливый Город" | Способ обработки видеосигнала |
CN109002185B (zh) * | 2018-06-21 | 2022-11-08 | 北京百度网讯科技有限公司 | 一种三维动画处理的方法、装置、设备及存储介质 |
US10491760B1 (en) * | 2018-07-05 | 2019-11-26 | Delta Electronics, Inc. | Image transmission device, image transmission method, and image transmission system |
KR102147125B1 (ko) * | 2019-04-05 | 2020-08-24 | (주)루먼텍 | Ip워크플로우 기반 비디오 송수신 방법 및 그 시스템 |
KR102067191B1 (ko) * | 2019-06-28 | 2020-02-11 | 배경 | 상세영상 생성장치 |
JP7351140B2 (ja) * | 2019-08-26 | 2023-09-27 | 富士通株式会社 | 配信装置、配信システムおよび配信プログラム |
JP7308694B2 (ja) * | 2019-08-27 | 2023-07-14 | キヤノン株式会社 | 放射線撮像装置の制御装置及び制御方法並びに放射線撮像システム |
EP4031835A4 (en) * | 2019-09-22 | 2023-10-04 | Vayavision Sensing Ltd. | METHOD AND SYSTEMS FOR TRAINING AND VALIDATING A PERCEPTIONAL SYSTEM |
KR102312933B1 (ko) * | 2020-07-31 | 2021-10-14 | 한화시스템 주식회사 | 360도 파노라마 영상 획득용 상황 감시장치 및 상황 감시방법 |
CN112419361B (zh) * | 2020-11-20 | 2024-04-05 | 中国科学院上海微系统与信息技术研究所 | 一种目标追踪方法和仿生视觉装置 |
WO2023068956A1 (ru) * | 2021-10-19 | 2023-04-27 | Публичное Акционерное Общество "Сбербанк России" | Способ и система для определения синтетически измененных изображений лиц на видео |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0690444A (ja) * | 1992-09-07 | 1994-03-29 | Nippon Telegr & Teleph Corp <Ntt> | 人物像伝送方式 |
EP0999518A1 (en) | 1998-05-19 | 2000-05-10 | Sony Computer Entertainment Inc. | Image processing apparatus and method, and providing medium |
JP2011097521A (ja) * | 2009-11-02 | 2011-05-12 | Sony Computer Entertainment Inc | 動画像処理プログラム、装置および方法、動画像処理装置を搭載した撮像装置 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4709394A (en) * | 1985-08-23 | 1987-11-24 | Rca Corporation | Multiplexed real-time pyramid signal processing system |
US4849810A (en) * | 1987-06-02 | 1989-07-18 | Picturetel Corporation | Hierarchial encoding method and apparatus for efficiently communicating image sequences |
AU6899896A (en) * | 1995-08-21 | 1997-03-27 | Starcam Systems, Inc. | High-speed high-resolution multi-frame real-time digital camera |
US6879341B1 (en) * | 1997-07-15 | 2005-04-12 | Silverbrook Research Pty Ltd | Digital camera system containing a VLIW vector processor |
US20030123738A1 (en) * | 2001-11-30 | 2003-07-03 | Per Frojdh | Global motion compensation for video pictures |
JP2005176233A (ja) * | 2003-12-15 | 2005-06-30 | Canon Inc | 通信装置及び通信システム |
US8456515B2 (en) * | 2006-07-25 | 2013-06-04 | Qualcomm Incorporated | Stereo image and video directional mapping of offset |
JP2009146197A (ja) | 2007-12-14 | 2009-07-02 | Hitachi Ltd | Image signal processing device, image signal processing method, image signal processing program, and display device |
GB2456802A (en) * | 2008-01-24 | 2009-07-29 | Areograph Ltd | Image capture and motion picture generation using both motion camera and scene scanning imaging systems |
JP5474417B2 (ja) * | 2009-06-19 | 2014-04-16 | Elmo Co., Ltd. | Moving image data generation device, moving image data generation system, moving image data generation method, and computer program |
EP2592838B1 (en) * | 2010-07-08 | 2015-12-16 | Panasonic Intellectual Property Management Co., Ltd. | Image capture device |
2011
- 2011-05-19 JP JP2011112931A patent/JP5629642B2/ja active Active

2012
- 2012-04-05 EP EP12785074.1A patent/EP2712177B1/en active Active
- 2012-04-05 MX MX2013013313A patent/MX2013013313A/es active IP Right Grant
- 2012-04-05 RU RU2013156453/07A patent/RU2570195C2/ru active
- 2012-04-05 WO PCT/JP2012/002397 patent/WO2012157178A1/ja active Application Filing
- 2012-04-05 BR BR112013029656-9A patent/BR112013029656B1/pt active IP Right Grant
- 2012-04-05 KR KR1020137029578A patent/KR101451734B1/ko active IP Right Grant
- 2012-04-05 US US14/116,630 patent/US9247132B2/en active Active
- 2012-04-05 CN CN201280022231.1A patent/CN103518368B/zh active Active
- 2012-05-02 TW TW101115653A patent/TWI496471B/zh active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0690444A (ja) * | 1992-09-07 | 1994-03-29 | Nippon Telegr & Teleph Corp <Ntt> | Person image transmission system |
EP0999518A1 (en) | 1998-05-19 | 2000-05-10 | Sony Computer Entertainment Inc. | Image processing apparatus and method, and providing medium |
JP2004195243A (ja) * | 1998-05-19 | 2004-07-15 | Sony Computer Entertainment Inc | Image processing apparatus and method, and providing medium |
JP2011097521A (ja) * | 2009-11-02 | 2011-05-12 | Sony Computer Entertainment Inc | Moving image processing program, device and method, and imaging device equipped with a moving image processing device |
Non-Patent Citations (1)
Title |
---|
See also references of EP2712177A4 |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2012288349B2 (en) * | 2011-07-25 | 2015-11-26 | Sony Interactive Entertainment Inc. | Moving image capture device, information processing system, information processing device, and image data processing method |
US9736458B2 (en) | 2011-07-25 | 2017-08-15 | Sony Interactive Entertainment Inc. | Moving image capturing device, information processing system, information processing device, and image data processing method |
CN104125450A (zh) * | 2013-04-26 | 2014-10-29 | 索尼电脑娱乐公司 | 图像拾取装置、信息处理系统和图像数据处理方法 |
US20160080645A1 (en) * | 2014-09-12 | 2016-03-17 | Sony Computer Entertainment Inc. | Image pickup apparatus, information processing apparatus, display apparatus, information processing system, image data sending method, image displaying method, and computer program |
US9706114B2 (en) * | 2014-09-12 | 2017-07-11 | Sony Corporation | Image pickup apparatus, information processing apparatus, display apparatus, information processing system, image data sending method, image displaying method, and computer program |
WO2020261813A1 (ja) * | 2019-06-28 | 2020-12-30 | Sony Semiconductor Solutions Corporation | Transmitting device, receiving device, and transmission system |
US12052515B2 (en) | 2019-06-28 | 2024-07-30 | Sony Semiconductor Solutions Corporation | Transmitting apparatus, receiving apparatus, and transmission system |
Also Published As
Publication number | Publication date |
---|---|
KR101451734B1 (ko) | 2014-10-16 |
EP2712177A1 (en) | 2014-03-26 |
EP2712177B1 (en) | 2020-11-04 |
TWI496471B (zh) | 2015-08-11 |
KR20130140174A (ko) | 2013-12-23 |
US20140078265A1 (en) | 2014-03-20 |
RU2013156453A (ru) | 2015-06-27 |
BR112013029656A2 (pt) | 2020-07-21 |
MX2013013313A (es) | 2014-02-10 |
CN103518368A (zh) | 2014-01-15 |
RU2570195C2 (ru) | 2015-12-10 |
CN103518368B (zh) | 2016-08-03 |
JP2012244438A (ja) | 2012-12-10 |
US9247132B2 (en) | 2016-01-26 |
JP5629642B2 (ja) | 2014-11-26 |
TW201251459A (en) | 2012-12-16 |
BR112013029656B1 (pt) | 2022-05-03 |
EP2712177A4 (en) | 2014-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5629642B2 (ja) | Moving image capturing device, information processing system, information processing device, and image data processing method | |
JP5701707B2 (ja) | Moving image capturing device, information processing system, information processing device, and image data processing method | |
WO2012132167A1 (ja) | Information processing system, information processing device, imaging device, and information processing method | |
JP5781353B2 (ja) | Information processing device, information processing method, and data structure of position information | |
JP6062512B2 (ja) | Imaging device, information processing system, and image data sending method | |
JP5325745B2 (ja) | Moving image processing program, device and method, and imaging device equipped with a moving image processing device | |
JP6121787B2 (ja) | Imaging device, information processing system, and image data processing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 12785074; Country of ref document: EP; Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2012785074; Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 20137029578; Country of ref document: KR; Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14116630; Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: MX/A/2013/013313; Country of ref document: MX |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2013156453; Country of ref document: RU; Kind code of ref document: A |
|
REG | Reference to national code |
Ref country code: BR; Ref legal event code: B01A; Ref document number: 112013029656; Country of ref document: BR |
|
ENP | Entry into the national phase |
Ref document number: 112013029656; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20131118 |