WO2013140776A1 - Image processing device, stereoscopic display device, integrated circuit, and program that determine the depth of an object in real space by performing image processing - Google Patents
Image processing device, stereoscopic display device, integrated circuit, and program that determine the depth of an object in real space by performing image processing
- Publication number
- WO2013140776A1 (PCT/JP2013/001802)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- histogram
- image processing
- coordinate
- coordinates
- color
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/033—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
- G06F3/0346—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/033—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
- G06F3/0354—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of 2D relative movements between the device, or an operating part thereof, and a plane or surface, e.g. 2D mice, trackballs, pens or pucks
- G06F3/03545—Pens or stylus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/04815—Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/40—Image enhancement or restoration using histogram techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/271—Image signal generators wherein the generated image signals comprise depth maps or disparity maps
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/275—Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
Definitions
- The coordinate generation technique detects the positional relationship between a part of the user's body and an operation member, and generates the coordinates of the capture target based on the detection result.
- Smartphones and tablet terminals detect touch positions on the screen and determine plausible on-screen coordinates.
- With on-screen touch detection, the detectable touch positions are limited to the surface of the screen, so the degree of freedom of operation is small. Improvements have therefore been made to generate three-dimensional coordinates of the capture target from a captured image.
- An object in real space is expressed as a collection of pixels having various gradations.
- If pixels of various gradations are traced, the shape of the object is not correctly reproduced.
- The object may instead be represented by a pixel group having a distorted shape.
- Because the technology described above determines the depth from the image of the object in the photographed image, if three-dimensional coordinates are generated from such a distorted pixel group, the coordinates become erroneous and the movement of the object may not be tracked correctly. It is conceivable to increase accuracy by repeatedly executing the coordinate-generation algorithm, but the iteration takes time to converge and cannot follow the movement of the object. In that case, responsiveness is extremely lowered, which stresses the user.
- An object of the present invention is to determine a plausible depth of an object even if the object appears as a distorted pixel group in a captured image.
- A histogram is generated in which the number of appearances of pixels of a designated color in frame image data obtained by photographing the real space is associated with each of a plurality of coordinates on a reference axis of the screen. Image processing may then be performed in which the appearance number associated with a specific coordinate is selected from the plurality of appearance numbers shown in the histogram, and the depth of the target is determined using the selected number.
- The histogram generated above represents the run length of pixels having the designated color along the reference axis of the screen.
- By generating such a histogram, the maximum and minimum values of the vertical and horizontal widths of the pixel group formed by the image of the real-space object in the frame image can be grasped reasonably.
- The depth generated from these maximum and minimum values is therefore appropriate, and the three-dimensional coordinates of the object in real space can be generated with high accuracy.
- Histogram generation is performed by counting combinations of a predetermined luminance Y, red difference Cr, and blue difference Cb as pixel lines are transferred from the image sensor of the camera. Since no algorithm needs to be executed repeatedly, three-dimensional coordinates that follow the movement of the object can be generated without degrading response performance.
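The counting step described above can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation; the designated (Y, Cb, Cr) value and the per-component tolerance are assumptions.

```python
# Build an X-axis histogram by counting, for each X coordinate, the pixels
# whose (Y, Cb, Cr) triple matches the designated color as each scanline
# arrives from the sensor.

DESIGNATED = (80, 90, 170)   # hypothetical (Y, Cb, Cr) of the stylus color
TOLERANCE = 12               # per-component match tolerance (assumed)

def is_designated(ycbcr):
    return all(abs(c - d) <= TOLERANCE for c, d in zip(ycbcr, DESIGNATED))

def update_x_histogram(hist, scanline):
    """Count the designated-color pixels of one pixel line into hist.

    hist     -- list of appearance counts, one slot per X coordinate
    scanline -- list of (Y, Cb, Cr) tuples for one row of the frame
    """
    for x, px in enumerate(scanline):
        if is_designated(px):
            hist[x] += 1
    return hist

width = 8
hist = [0] * width
row = [(80, 90, 170)] * 3 + [(0, 128, 128)] * 5  # 3 matching pixels at x = 0..2
update_x_histogram(hist, row)
print(hist)  # counts accumulate only at x = 0, 1, 2
```

Because counting happens per scanline, no second pass over the frame is needed, which matches the no-iteration point made above.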
- Fig. 2 shows the internal configuration of a plurality of aspects that can be achieved by a device having means for overcoming an implementation barrier.
- a display device providing a stereoscopic reproduction environment and its accessories are shown.
- 1 shows an internal configuration of a stereoscopic display device incorporating an image processing device.
- Shows a stereoscopic video image and a stereoscopic GUI that can be viewed by wearing shutter-type glasses.
- FIG. 4 shows where the area range for stylus detection is located.
- a plurality of frame images obtained by photographing by the photographing unit 12 and a process of processing by the image processing apparatus for each frame image are shown.
- The internal structure of the look-up table for extracting designated-color pixels and the bit structure of its entries are shown.
- the pixel group extraction by the designated color pixel group extraction unit 22 is shown. It shows how the pixel group changes in three cases: when the stylus is close to the screen, when it is far away, and when it is in the middle position.
- a pixel group, an array variable that is an X-axis histogram, an array variable that is a Y-axis histogram, and an impulse response for smoothing are shown.
- Shows the approximate curve that approximates the shape of the histogram smoothed by the impulse response, together with the histograms before and after smoothing. Also a figure drawing the relationship between the stylus, the X-axis histogram, the Y-axis histogram, and the pixel group in three dimensions.
- Shows the internal structure of the display apparatus according to the second embodiment, a flowchart showing the whole procedure according to the second embodiment, and a flowchart showing histogram synthesis.
- Shows a stylus according to the second embodiment, a pixel group extracted from the stylus, a generated X-axis histogram, and a Y-axis histogram.
- the process of synthesizing the X-axis histogram generated for the pixel group of the specified color ma and the X-axis histogram generated for the specified color mb is shown.
- An identification method of how to distinguish a stylus sphere from an object of similar color is shown.
- Shows the internal structure of the image processing unit 15 according to the third embodiment.
- Shows a stylus according to the third embodiment, a pixel group extracted from the stylus, a generated X-axis histogram, and a Y-axis histogram.
- A flowchart showing the whole procedure of the image processing apparatus according to the third embodiment, and a flowchart showing the calculation procedure of the rotation angle of the stylus.
- Shows a stylus according to the fourth embodiment, a pixel group extracted from the stylus, a generated X-axis histogram, and a Y-axis histogram, together with a flowchart showing the calculation procedure of the stylus rotation angle.
- a display device 101 to which a set of a camera 101a and a light emitting element 101b is attached is shown.
- Fig. 3 illustrates various aspects of a stylus.
- Shows the display apparatus according to the seventh embodiment.
- Shows a shape drawn by the user in space and the three-dimensional coordinates generated from the movement of the stylus, together with a figure showing the process of depth adjustment according to the shape of the capture target.
- the inventors faced various technical barriers in implementing image processing for determining the depth. The following are the steps to overcome this.
- the inventors selected the game control technique using the cylinder-shaped controller described in Patent Document 1 as a reference for research and development.
- Paragraph 0059 of Patent Document 1 discloses the derivation of the value of the cylinder inclination θ.
- Such an inclination θ can be obtained using the ratio w1:w2 of the widths w1 and w2 of the pixel group. Since the ratio w2/w1 of the widths of the pixel group representing the cylinder is proportional to the inclination θ of the cylinder on the Y-Z plane, the magnitude of θ can be obtained from w2/w1.
- A plurality of equidistant measurements are performed between the edges of the pixel group, and the ratio w2/w1 is obtained using the average value.
- Paragraph 0062 of Patent Document 1 discloses one method for obtaining the depth value z: counting the total number of pixels corresponding to the object in the pixel group obtained by photographing. Since the number of pixels in the pixel group representing the cylinder is affected by the inclination in the θ direction, it is first necessary to obtain a weighted value Nθ by weighting the pixel count N with θ. Since Nθ is proportional to the depth value z, z can be obtained from Nθ.
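The two-step estimate summarized above (tilt from the width ratio w2/w1, then depth from the tilt-weighted pixel count) can be sketched as follows. The proportionality constants are hypothetical placeholders, not values from Patent Document 1.

```python
# Illustrative sketch of the Patent Document 1 approach as summarized
# above: estimate the tilt from the width ratio of the pixel group,
# then scale the tilt-weighted pixel count to a depth value.

K_TILT = 1.0    # maps the width ratio w2/w1 to a tilt angle (hypothetical)
K_DEPTH = 0.01  # maps the weighted count N*theta to a depth (hypothetical)

def tilt_from_widths(w1, w2):
    # w2/w1 is taken as proportional to the tilt theta on the Y-Z plane.
    return K_TILT * (w2 / w1)

def depth_from_count(n_pixels, theta):
    # Weight the raw pixel count by the tilt, then scale to a depth value,
    # following the stated proportionality N*theta ∝ z.
    n_weighted = n_pixels * theta
    return K_DEPTH * n_weighted
```

Note how this chain depends on the pixel group reproducing the cylinder's shape cleanly, which is exactly the premise criticized in the following bullets.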
- The depth value calculation in the game control technology of Patent Document 1 is based on the premise that the shape of the cylinder is reproduced cleanly by the pixel group.
- In practice, the shape of the object does not appear clearly in the captured image, and the object appears as a pixel group forming a distorted shape. If an attempt is made to generate three-dimensional coordinates from such a distorted pixel group using Patent Document 1, the coordinates become erroneous and the movement of the object cannot be tracked correctly.
- Patent Document 1 is for determining the position of a character in the virtual space of a game when that virtual space is displayed on the screen of a flat display device and the character is moved within it.
- For such use the technique of Patent Document 1 may be sufficient, but in an operation of touching an image that pops out of the screen of a stereoscopic display device (hereinafter referred to as a stereoscopic object), sufficient accuracy cannot be obtained. However, providing an expensive distance-measuring sensor to raise the accuracy for stereoscopic objects is not realistic in terms of cost performance.
- Non-Patent Document 1 discloses the Continuously Adaptive Mean Shift (CAMSHIFT) algorithm.
- CAMSHIFT uses the Hue-Saturation-Value (HSV) color system to track the skin color probability distribution for face tracking.
- Procedure 1: Select the size of the window to be searched.
- Procedure 2: Select an initial location for the window to be searched.
- Procedure 3: Calculate the intermediate position within the window to be searched.
- Procedure 4: Place the window at the calculated intermediate location.
- Procedure 5: Repeat Procedures 3 and 4 until the amount of movement of the location falls below a preset threshold.
- the above-mentioned window is defined by a central cross line (centroid) connecting the eyes and nose and mouth on the human face.
- Procedure 1 The size of the window is defined by the horizontal width w and the vertical width l of the central cross line (centroid).
- The horizontal width w and the vertical width l of the central cross line (centroid) are expressed by the mathematical expression on page 6, column 1.
- The parameters a and b used in this equation are calculated as shown on page 6, column 2 of Non-Patent Document 1, and are derived from the 0th-order moment M00, the first-order moment M11, and the second-order moments M20 and M02.
- These moments are calculated from the intensity I(x, y) of the pixel located at coordinates (x, y).
- M00 (the 0th-order moment) and M10 (a first-order moment) are calculated as sums over the intensities I(x, y) of the pixels located in the window.
- The inclination θ of the central cross line connecting the eyes, nose, and mouth is derived from the 0th-order moment M00, the first-order moment M11, and the second-order moments M20 and M02, as shown in column 2 on page 5.
- the inclination ⁇ of the center line is the inclination of the face.
- the face area is proportional to the distance from the camera. Therefore, if the skin color probability distribution is determined through the above-described central cross line (centroid) search, the distance from the camera to the face region can be calculated.
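The moment computation and window-repositioning loop of Procedures 1-5 can be sketched roughly as follows. This is a toy illustration of the underlying mean-shift step, not the CAMSHIFT reference code; the probability image and window values are assumed for demonstration.

```python
def moments(prob, window):
    """0th- and 1st-order moments of a probability image inside a window.

    prob   -- 2-D list; prob[y][x] is the skin-color probability at (x, y)
    window -- (x0, y0, w, h)
    """
    x0, y0, w, h = window
    m00 = m10 = m01 = 0.0
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            i = prob[y][x]
            m00 += i          # total probability mass in the window
            m10 += x * i      # first-order moment in x
            m01 += y * i      # first-order moment in y
    return m00, m10, m01

def mean_shift(prob, window, threshold=1.0, max_iter=20):
    """Procedures 3-5: move the window to its centroid until it settles."""
    x0, y0, w, h = window
    for _ in range(max_iter):
        m00, m10, m01 = moments(prob, (x0, y0, w, h))
        if m00 == 0:
            break
        cx, cy = m10 / m00, m01 / m00      # centroid inside the window
        nx = int(round(cx - w / 2))
        ny = int(round(cy - h / 2))
        if abs(nx - x0) < threshold and abs(ny - y0) < threshold:
            break                          # movement below threshold: stop
        x0, y0 = nx, ny
    return x0, y0
```

The loop in `mean_shift` is the iterative convergence criticized below: each repositioning requires re-scanning the window, so a distorted or noisy pixel group can make convergence slow.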
- Since the prior art described in Non-Patent Document 1 employs an iterative algorithm, when an object is drawn as a distorted pixel group it takes time for the repeated execution of the algorithm to converge, and the movement of the object cannot be followed. In that case, responsiveness is extremely lowered, which stresses the user.
- the above is the implementation barrier faced by the inventors in conceiving the present invention.
- FIG. 1 shows an internal configuration of a representative one of a plurality of modes that can be realized by an image processing apparatus having means for overcoming an implementation barrier.
- The plurality of aspects include a basic aspect (the following 1) and various derivations of the basic aspect (the following 2, 3, 4, and so on).
- The means for overcoming the implementation barrier is an image processing device that determines the depth of an object in real space by image processing. It can overcome the barrier by including: a generating unit that generates a histogram associating the number of appearances of pixels of a designated color in frame image data obtained by photographing the real space with each of a plurality of coordinates on a reference axis of the screen; a smoothing unit that smooths the generated histogram; and a depth determination unit that selects, from the plurality of appearance numbers shown in the smoothed histogram, the one associated with a specific coordinate and determines the depth of the object using the selected number.
- Fig.1 (a) shows the internal structure of this basic aspect.
- The number of pixels shown in the smoothed histogram is the basis for generating the three-dimensional coordinates. Since variation in the horizontal and vertical pixel counts of the pixel group is evened out, three-dimensional coordinate generation based on the pixel count formed by the object is performed with high accuracy. The peak value of the histogram is easily determined, so three-dimensional coordinates can be generated satisfactorily, and because differences in the shape of the pixel group from one captured image to the next are suppressed, tracking becomes easy.
- Smoothing means smoothing the frequency of the histogram at each coordinate.
- It is done by adding the frequency of a given coordinate to the frequencies of the surrounding coordinates, dividing by the number of coordinates, and taking the result as the new frequency.
- Examples of smoothing filters include the moving average filter (averaging filter) and the Gaussian filter.
- A moving average filter (also called an averaging filter, or simply a smoothing filter) averages the frequencies of the coordinates around the coordinate of interest and uses the average value as the frequency of the histogram. For example, the average is obtained by multiplying the frequencies of the coordinate of interest and its surroundings by a 3×3 or 5×5 rate.
- A rate is a collection of weighting factors assigned to the coordinates; the weights are adjusted so that they sum to 1.
- The Gaussian filter calculates the rate using a Gaussian distribution function, so that the weight used in the average is larger the closer a coordinate is to the coordinate of interest, and decreases with distance.
- Low-pass filters can also be used, since a low-pass filter can smooth the frequencies of the histogram. The first embodiment, described later, gives an example in which an impulse response filter is used for smoothing.
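The filters described above can be sketched for a one-dimensional histogram as follows. The 3-tap rates are illustrative; both sum to 1, as the text requires.

```python
# Smooth a 1-D histogram by convolving it with a normalized "rate"
# (weight list). Coordinates outside the axis contribute 0.

def smooth(hist, rate):
    half = len(rate) // 2
    out = []
    for i in range(len(hist)):
        acc = 0.0
        for k, w in enumerate(rate):
            j = i + k - half
            if 0 <= j < len(hist):
                acc += w * hist[j]
        out.append(acc)
    return out

moving_average = [1/3, 1/3, 1/3]   # uniform weights
gaussian_like = [0.25, 0.5, 0.25]  # heavier weight at the center coordinate

hist = [0, 0, 9, 0, 0]             # a single sharp spike
print(smooth(hist, moving_average))  # spike spreads evenly to neighbors
print(smooth(hist, gaussian_like))   # spike spreads, peak preserved better
```

The Gaussian-like rate keeps the peak coordinate dominant while still suppressing isolated variation, which is why center-weighted rates suit the peak-based depth determination used later.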
- The object in real space is a capture target whose movement is tracked through the image processing of the present invention; it may be an operation member as described in Patent Document 1, or a part of the human body (face, fingers, etc.) as described in Non-Patent Document 1. Since describing all of these would be cumbersome, hereinafter a stylus whose part has a predetermined color (designated color) is taken as the capture target.
- The "depth" may be expressed in any unit system; as an example for this description, the Z coordinate of the capture target in the X-Y-Z coordinate system assumed for the arrangement of stereoscopic objects is chosen.
- In this X-Y-Z coordinate system, the screen position of the display device is represented by X-Y coordinates, and the spatial position from the screen toward the user is represented by the Z coordinate.
- The stereoscopic position of a stereoscopic object is defined in a space in which the X-Y plane lies at the origin of the Z axis. If the depth is calculated as the Z coordinate of this X-Y-Z coordinate system, a depth accurate enough for operating the stereoscopic object is generated.
- <Aspect 2: Details of the specific coordinate serving as the basis for depth determination> This aspect adds the following modification to the basic aspect.
- It is a subordinate conceptualization in which the specific coordinate is either the coordinate on the reference axis associated with the maximum number of appearances in the smoothed histogram, or a coordinate on the reference axis associated with one of the appearance numbers ranked next after that maximum.
- The maximum frequency of the histogram generated from the frame image data increases as the object approaches the camera and decreases as it moves away. Since the histogram is smoothed, there is some correlation between the sum of the frequencies shown in the histogram and the depth of the object. Therefore, by assigning the numerical ranges that the frequency sum can take to the respective stages of the object's depth, an appropriate depth can be derived from the frequency sum.
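The range-to-stage assignment described above can be sketched as follows. The numerical ranges and stage values are illustrative assumptions, not values from the patent.

```python
# Map the sum of histogram frequencies to a depth stage: a larger
# frequency sum (more designated-color pixels) means the object is
# closer to the camera.

DEPTH_STAGES = [
    (0, 50, 3),              # small mass  -> object far (stage 3, assumed)
    (50, 200, 2),            # medium mass -> middle distance
    (200, float("inf"), 1),  # large mass  -> object near the camera
]

def depth_stage(hist):
    """Return the depth stage whose range contains the frequency sum."""
    total = sum(hist)
    for lo, hi, stage in DEPTH_STAGES:
        if lo <= total < hi:
            return stage
    return None
```

Because the ranges partition the whole non-negative axis, every smoothed histogram maps to exactly one stage, giving a constant-time depth estimate per frame.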
- The image processing apparatus has a registration table in which two or more designated colors serving as the basis for histogram generation are registered in advance, and generates a histogram for each of the registered designated colors.
- A histogram synthesizing unit obtains a composite histogram in which the per-coordinate appearance numbers of those histograms are added together at the same coordinate, the addition result becoming the appearance number for each coordinate.
- The coordinates for which the depth determination unit determines the depth are subordinately conceptualized as coordinates whose appearance number in the composite histogram exceeds a predetermined threshold.
- Combinations of two or more pixel components are registered as designated colors, histograms are generated for them, the histograms are synthesized, the result is smoothed, and it is used as the basis for depth determination. Therefore, even if an image of a color similar to the capture target appears in the background, the depth of the capture target can be determined while excluding the pixels of the similar image. As a result, even when the user's clothes or the background contain a similar color, its influence is suppressed and the accuracy of depth determination is increased.
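The synthesis-and-threshold flow of this aspect can be sketched as follows. The histogram values and the threshold are toy numbers chosen for illustration.

```python
# Aspect 3 sketch: add the per-coordinate counts of the histograms for
# two registered designated colors, then keep only coordinates whose
# combined count exceeds a threshold. A background object usually shows
# only one of the two colors, so its coordinates fail the threshold.

def synthesize(hist_a, hist_b):
    """Composite histogram: per-coordinate sum of two histograms."""
    return [a + b for a, b in zip(hist_a, hist_b)]

def candidate_coords(composite, threshold):
    """Coordinates eligible for depth determination."""
    return [x for x, n in enumerate(composite) if n > threshold]

hist_ma = [0, 4, 6, 1, 0]   # histogram for designated color "ma"
hist_mb = [0, 3, 5, 0, 0]   # histogram for designated color "mb"
composite = synthesize(hist_ma, hist_mb)
print(composite)                       # per-coordinate sums
print(candidate_coords(composite, 5))  # only the stylus region survives
```

Coordinate 3 has a small count from color "ma" alone (a similarly colored background pixel, in this toy setup), so it drops out after thresholding.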
- <Aspect 4: Additional calculation for the number of appearances> This aspect adds the following modification to the basic aspect. That is, it is a subordinate conceptualization in which one of the two or more designated colors is a specific color to which a specific weighting coefficient is assigned, and, in generating the composite histogram, the histogram synthesizing unit multiplies the appearance number at each coordinate of the specific color's histogram by that coefficient before adding it to the appearance number at the same coordinate in the histograms of the other designated colors.
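A minimal sketch of this weighted synthesis, with an assumed coefficient value:

```python
# Aspect 4 sketch: the specific color's counts are scaled by a weighting
# coefficient before being added to the other colors' counts at the
# same coordinate. WEIGHT is a hypothetical value for illustration.

WEIGHT = 2.0

def weighted_synthesize(specific_hist, other_hist, weight=WEIGHT):
    """Composite histogram with the specific color's counts weighted."""
    return [weight * s + o for s, o in zip(specific_hist, other_hist)]
```

A weight above 1 makes the specific color dominate the composite, which is useful when that color is the most reliable marker on the operation member.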
- <Aspect 5: Configuration addition for determining the rotation angle>
- the object in the real space is an operation member having two or more specified colors
- The image processing apparatus includes a rotation angle determination unit that determines the rotation angle of the operation member, and the generation unit generates a histogram for each of the two or more designated colors.
- It is a subordinate conceptualization in which the rotation angle determination unit determines the rotation angle of the operation member based on the difference in how far apart the maximum and minimum coordinates in the histogram of each designated color lie on the reference axis. When this subordinate conceptualization is applied, the internal configuration of this aspect is as shown in FIG.
- Histogram generation and smoothing are repeated for each of the two or more designated colors, and by calculating on the difference of the histogram coordinates the rotation angle of the object can be derived, so three-dimensional coordinate generation can follow even subtle rotation of the object. Since how much the stylus has rotated can be calculated, operations responding to delicate movement and rotation of the stylus can be realized, improving operability.
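One way the extent-difference idea could be realized is sketched below. The geometry (arctangent of the extent difference over an assumed separation between the two colored parts) is an illustration, not the patent's formula.

```python
import math

# Aspect 5 sketch: for each of two designated colors, take the extent
# (max minus min coordinate with a nonzero count) on the reference axis;
# the difference between the two extents, over an assumed separation,
# yields a rotation angle.

def extent(hist):
    """Spread of nonzero-count coordinates along the reference axis."""
    coords = [x for x, n in enumerate(hist) if n > 0]
    return (max(coords) - min(coords)) if coords else 0

def rotation_angle(hist_color_a, hist_color_b, separation):
    """Rotation angle (radians) from the extent difference (assumed model)."""
    diff = extent(hist_color_a) - extent(hist_color_b)
    return math.atan2(diff, separation)
```

When the stylus faces the camera squarely, both colors show the same extent and the angle is zero; tilting it shrinks one extent relative to the other, producing a nonzero angle.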
- The designated color is specified by a combination of the luminance component and the color difference components constituting a pixel.
- The image processing apparatus has a pixel group extraction unit that extracts, from the plurality of pixels constituting the frame image data, the pixels whose combination of luminance and color difference components matches, in association with the plurality of coordinates on the reference axis.
- Histogram generation by the generation unit is subordinately conceptualized as associating the number of extracted pixels with the plurality of coordinates on the reference axis.
- Alternatively, the designated color is specified by a combination of the luminances of the plurality of primary color components constituting a pixel, and the image processing apparatus has a pixel group extraction unit that extracts, from the pixels constituting the frame image data, those matching that luminance combination, in association with the plurality of coordinates on the reference axis. Histogram generation by the generation unit is subordinately conceptualized as associating the number of extracted pixels with those coordinates.
- the object in the real space is an operation member for operating a stereoscopic object that has popped out of the screen due to the stereoscopic effect of the stereoscopic device
- the reference axis is the X axis or the Y axis in the frame image data.
- the depth determined by the depth determination unit is a Z coordinate of a three-dimensional coordinate formed by the operation member, and the three-dimensional coordinate is used to generate an event that changes the behavior of the stereoscopic object in the stereoscopic device.
- the smoothing is a subordinate conceptualization that is performed by convolving an impulse response with the number of appearances for each coordinate shown in the histogram.
- the weighting coefficient is maximized at the center coordinate and decreases from there toward the periphery, so the histogram can be transformed into an ideal shape suitable for determining the depth based on the area of the histogram.
- <Aspect 10: Variations to be smoothed> The following basic variation is possible. That is, it is possible to adopt a mode comprising: a pixel group extraction unit that extracts a pixel group having a specific designated color from among the plurality of pixels constituting the frame image data; a smoothing unit that smoothes the pixel value of each pixel in the extracted pixel group; a generation unit that generates a histogram indicating the number of designated-color pixels in the smoothed pixel group in association with the plurality of coordinates on the reference axis of the screen; and a depth determination unit that selects, from the plurality of appearance numbers shown in the histogram, the one associated with a specific coordinate and determines the depth of the object using the selected number of appearances.
- the internal configuration of the above aspect is as shown in FIG.
- the change of pixels in the extracted pixel group becomes smooth, and variation in the resulting frequencies can be suppressed.
- a filter can be used for smoothing the pixels constituting the pixel group, so the implementation cost can be reduced.
- the stereoscopic device includes an execution unit that executes an application, a playback unit that plays back a stereoscopic video according to instructions from the application, and an event manager that generates an event indicating the depth value generated by the image processing device in response to a user operation; the application changes the content of stereoscopic video playback by the playback unit in accordance with the event that has occurred. Since this realizes the operability of virtually touching a stereoscopic object that has jumped out of the screen of the stereoscopic display device, the user can be given a sense of virtual reality, as if in a virtual space, be freed for a time from everyday concerns, and gain vitality for tomorrow.
- <Aspect 12: Change in playback content>
- the stereoscopic image is composed of playback video of a multi-view video stream or of graphics drawn by an application, and the change in playback content is a subordinate conception of switching the multi-view video stream to be played back or switching the graphics. With this aspect, interactivity with the user can be further enhanced.
- the integrated circuit in this aspect is an integrated circuit that determines the depth of an object in real space by image processing; it is sufficient that it includes: a generation unit that generates a histogram indicating the number of occurrences of pixels of a designated color in frame image data obtained by photographing the real space, in association with each of a plurality of coordinates on the reference axis of the screen; a smoothing unit that smoothes the generated histogram; and a depth determination unit that selects, from the plurality of occurrence numbers shown in the smoothed histogram, the one associated with a specific coordinate and determines the depth of the object using the selected number of occurrences.
- the image processing apparatus can be made into components and modules, and the application of the image processing apparatus can be expanded to the semiconductor component industry.
- the program in this aspect is an image processing program that causes a computer to execute processing for determining the depth of an object in real space by image processing; it is sufficient that the program causes the computer to execute: generating a histogram indicating the number of pixels of a designated color in frame image data obtained by photographing the real space, in association with each of a plurality of coordinates on the reference axis of the screen; smoothing the generated histogram; and a depth determination that selects, from the plurality of occurrence numbers shown in the smoothed histogram, the one associated with a specific coordinate and determines the depth of the object using the selected number of occurrences. Since the program can be distributed through network provider servers and various recording media, the application of the image processing apparatus can be expanded to general computer software and the online service industry.
- the display device is a stereoscopic display device, and is traded as a digital device such as a television, a tablet terminal, or a smartphone.
- FIG. 2A shows a system including a display device that provides a stereoscopic reproduction environment and its accessories.
- This system comprises a stereoscopic television apparatus 101 that displays a right-eye image and a left-eye image within one frame period, shutter-type glasses 102 worn by the user during stereoscopic playback, and a stylus 103 for operating a stereoscopic object that pops out of the screen during stereoscopic playback.
- the screen of the display device 101 in this figure has a content in which the right eye image and the left eye image overlap. If the user wears the shutter glasses 102, the stereoscopic object will appear to jump out of the screen.
- a camera 101a is provided on the upper surface of the screen.
- This is a network camera used for photographing the user when realizing a videophone, but in this embodiment it is given the application of photographing images of a user operating the stylus.
- FIG. 2 (b) shows the configuration of the stylus.
- the stylus 103 in this figure is composed of a shaft portion 103a, a pen tip 103b, and a sphere 103c attached to the tip of the shaft. That is, the stylus is a dual-purpose type, having a nib 103b suitable for touch panel operation and a sphere 103c suitable for serving as a capture target.
- the sharp tip can be used for touch panel operation
- the sphere can be used for camera tracking operations.
- FIG. 3 shows an internal configuration of a stereoscopic display device incorporating the image processing device.
- reference numerals 1 to 15 are given to the constituent elements of the display device; these elements will be described in ascending order of the reference numerals. As shown in FIG. 3, the stereoscopic display device incorporating the image processing device includes an external interface unit 1, a storage 2, a video decoder unit 3, a left-eye video plane 4a, a right-eye video plane 4b, a rendering unit 5, a left-eye graphics plane 6a, a right-eye graphics plane 6b, a synthesis unit 7, a display unit 8, a heap memory 9, a platform unit 10, an event manager 11, a photographing unit 12, a frame memory 13, a sensor 14, and an image processing unit 15.
- The following description of the components of the display device will be made using the use cases shown in the following figures. In these use cases, the user moves the stylus, which is the capture target, in front of the screen during playback of stereoscopic video.
- The external interface unit 1 is an interface for external input, and includes a network interface, a broadcast tuner, and an input/output unit for a local drive.
- the storage 2 stores various files acquired through the interface 1.
- a class file written in an object-oriented programming language, a stream file storing a multi-view video stream, an image data file storing image data, and three-dimensional shape model data are stored.
- the video decoder unit 3 decodes the multi-view video stream.
- the multi-view video stream has a frame sequential format.
- the frame sequential format is a video format in which a mixed video is formed by alternately displaying a reference viewpoint video frame and an additional viewpoint video frame in the playback unit.
- the reference viewpoint video frame is a left eye image
- the additional viewpoint video frame is a right eye image.
- the video decoder unit 3 obtains a right-eye image and a left-eye image by decoding the multi-view video stream, and writes them into the right-eye video plane and the left-eye video plane. Such decoding is performed in accordance with a reproduction API call from the application.
- Arrows V0, V1, and V2 in FIG. 3 indicate a data flow for video data.
- V0 and V1 schematically indicate data supply to the video decoder unit 3.
- V2 indicates the output of uncompressed pixels by the decoder.
- FIG. 4A shows a stereoscopic video image that can be viewed while wearing the glasses.
- a stereoscopic object is projected from the screen by playing a multi-view video stream.
- the bear in the figure is a stereoscopic object that jumps out of the screen of the display device by playing a multi-view video stream. In such a situation, the stylus is given the use of virtually touching the bear that has jumped out of the screen.
- the rendering unit 5 performs graphics drawing according to drawing instructions from the application. Graphics include those obtained by expanding compressed images in formats such as PNG, JPEG, and TIFF, and those generated by performing a series of drawing processes, such as coordinate conversion, illuminance calculation, and viewport conversion, on three-dimensional shape model data.
- the illuminance calculation includes texture mapping, and an image developed on the surface of the three-dimensional shape model data can be pasted by the texture mapping.
- by drawing the same image at two drawing coordinates separated by an interval corresponding to binocular parallax, a set of left-eye graphics and right-eye graphics for stereoscopic viewing is obtained.
- the left eye viewpoint position and the right eye viewpoint position are defined in the virtual space separated by human eyes, and viewport conversion is performed for each of the left eye viewpoint position and the right eye viewpoint position.
- a pair of left-eye graphics and right-eye graphics for stereoscopic viewing is obtained.
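As a minimal illustration of the parallax drawing just described, the sketch below computes a pair of left-eye/right-eye drawing coordinates for one image; the function name, the coordinate convention, and the sign of the offset are illustrative assumptions, not taken from the patent.

```python
# Hypothetical sketch: derive left-eye / right-eye drawing coordinates for
# the same image, separated horizontally by an interval corresponding to
# binocular parallax. Names and sign convention are illustrative.

def stereo_draw_coords(x, y, parallax):
    """Return ((left_x, left_y), (right_x, right_y)) for one drawn image."""
    left = (x + parallax // 2, y)   # left-eye copy shifted one way
    right = (x - parallax // 2, y)  # right-eye copy shifted the other way
    return left, right

left, right = stereo_draw_coords(100, 50, parallax=8)
# The two drawing coordinates differ by exactly the parallax:
assert left[0] - right[0] == 8
```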
- the left-eye graphics plane 6a and the right-eye graphics plane 6b store the uncompressed pixel groups constituting the graphics obtained by drawing, separately for those constituting the left-eye graphics and those constituting the right-eye graphics. If these right-eye graphics and left-eye graphics are displayed within one frame period, they overlap on the screen; if this screen is viewed while wearing the shutter-type glasses 102, a stereoscopic image of the GUI can be seen.
- FIG. 4B shows a stereoscopic GUI that can be viewed through such wearing.
- FIG. 4B assumes a situation in which a stereoscopic object, which is a GUI drawn by an application, pops out from the screen. A GUI that accepts selection of the previous page (Prev) and the next page (Next) in the figure is shown. In such a situation, the stylus is given the use of touching the GUI that pops out from this screen.
- the synthesis unit 7 performs scaling and filtering on the line pixels stored in the left-eye graphics plane, left-eye video plane, right-eye graphics plane, and right-eye video plane, and performs layer composition of video and graphics by superimposing pixels located at the same coordinates. As a result, a composite stereoscopic image in which stereoscopic video and graphics are combined can be obtained.
- the display unit 8 includes a display panel in which a plurality of light-emitting elements such as organic EL elements, liquid crystal elements, or plasma elements are arranged in a matrix, drive circuits attached to the four sides of the display panel, and an element control circuit. The light-emitting elements are blinked according to the pixels constituting the frame image data stored in the video planes 4a and 4b and the graphics planes 6a and 6b.
- the heap memory 9 is a memory area in which bytecodes of applications loaded from a local drive, or from a remote drive on the network, are stored; such bytecodes are used for processing by the platform unit.
- the platform unit 10 includes a class loader, a bytecode interpreter, and an application manager, loads a class file from a local drive or a remote drive on the network, and executes a bytecode application that is an instance thereof.
- the processing content of the bytecode application includes a multi-view video stream playback instruction and a GUI drawing instruction.
- a multi-view video stream playback instruction is made by selecting a multi-view video stream to be played back and making a playback API call indicating the locator of the multi-view video stream.
- the drawing instruction for the GUI is made by defining the image data or shape model data to be drawn, the drawing coordinates, and arguments, and calling the drawing API. Arrows a0, a1, a2, and a3 in FIG. 3 indicate the data flow relating to the application.
- a0 is a class file download
- a1 is a class load from local storage to heap memory
- a2 indicates an event output that triggers application state transition
- a3 indicates a GUI drawing instruction by the application.
- what is characteristic here is that the event that triggers the operation of this application includes three-dimensional coordinates indicating a specific part of the stylus, and that the playback target and the drawing target are changed according to those three-dimensional coordinates. If such an event occurs during playback of a multi-view video stream, the target multi-view video stream is changed to another one, and stream playback is instructed again.
- if the multi-view video stream being played back is a stereoscopic video of the bear at rest and the multi-view video stream after the change is a stereoscopic video of the bear in motion, a screen effect in which the bear starts moving around can be produced. If such an event occurs in the operation-waiting state of a GUI display, the image data or shape model data to be drawn is changed to another, and graphics drawing is instructed again.
- if the stereoscopic graphics being displayed are GUI graphics that accept page operations for the previous page and the next page, and the changed image data or shape model data renders the previous page or the next page, a screen effect in which pages are turned by the movement of the stylus can be produced.
- the event manager 11 defines a specific space area in a stereoscopic space where a stereoscopic object can exist.
- it is determined whether the three-dimensional coordinates generated by the image processing device exist in that spatial region, and if so, a touch event informing that the stylus has touched the stereoscopic object is generated. When it is determined that the coordinates do not belong to the specific spatial region, no such touch event is generated.
- FIG. 5 shows where a specific region range is located in FIGS. 4 (a) and 4 (b).
- the grids gd1, gd2, gd3, gd4, ... in the figure define a spatial region precisely at the head of the bear in its resting state.
- the grids gd11, gd12, gd13, gd14, ... in (b) define the spatial region at the portion where the next page button exists, out of the GUI's previous page button and next page button.
- the photographing unit 12 controls the camera 101a, which is a network camera, and obtains one frame image per frame period.
- the first row in FIG. 6 shows a plurality of frame images obtained by photographing by the photographing unit 12.
- the frame image of frame 1 is a front image of a user who is trying to touch the head of the bear, which is a stereoscopic object, with the stylus.
- the frame images after frame 2 are blank, but their contents are merely omitted; in reality the user's image of frame 1 continues. Since the stereoscopic object exists only virtually, it does not appear in the image captured by this camera. The user is photographed as if holding the stylus toward an empty space in front of the screen.
- the frame memory 13 stores the pixels constituting the frame image obtained in each frame period by imaging by the photographing unit 12, at positions corresponding to their screen coordinates. Frame image resolutions such as 1920×1080 and 640×480 exist, and pixels corresponding to the numbers of vertical and horizontal pixels of such a resolution are stored.
- a pixel can be expressed by a gradation of luminance Y, a gradation of red difference Cr, a gradation of blue difference Cb, and transparency T. Further, it can be expressed by a combination of the gradation of the R pixel component, the gradation of the G pixel component, the gradation of the B pixel component, and the transparency T.
- the transparency T indicates how much the pixels of the graphics plane are transmitted when the synthesis unit 7 superimposes the pixels of the video plane with the pixels of the graphics plane.
- an arrow V3 indicates user feedback of the captured image captured by the photographing unit 12. This feedback is made by writing pixels back from the frame memory 13 to the video planes 4a and 4b.
- the GUI drawn by the application and the video data are combined, and the combined image is displayed on the display unit 8.
- Pixels of the captured image in the frame memory 13 are output to the three-dimensional coordinate generation unit 15 in addition to the video planes 4a and 4b, and are subjected to image processing by the three-dimensional coordinate generation unit 15.
- The sensor 14 detects remote control operations, panel operations, screen touches, and the like, and notifies the event manager 11.
- i1 and i2 indicate inputs to the event manager. This input includes an input i1 of touch coordinates from the sensor 14 and an input i2 of 3D coordinates generated by the 3D coordinate generation unit 15.
- the image processing unit 15 captures the movement of the capture target by performing image processing on the plurality of frame images acquired by the photographing unit 12 and generating three-dimensional coordinates.
- the captured movement of the capture target is the movement of the capture target that moves back and forth between the screen of the display device and the user.
- the arrows up0 and dw0 in FIG. 3 symbolically indicate the movement of moving the capture target away from the screen and the movement of the capture target close to the screen, and the arrow cr0 symbolically indicates the movement of moving the capture target horizontally with the screen.
- In the display device, the component that plays the role of the image processing device is the image processing unit 15. Every time a frame image is obtained by photographing by the photographing unit 12, the image processing unit 15 performs image processing on the frame image, determines the depth of the stylus appearing in the frame image, generates the three-dimensional coordinates of the stylus, and outputs them to the event manager 11.
- the second row in FIG. 6 is a pixel group extracted from the first frame image. Since there is a user image holding the stylus in the frame image, a group of pixels at the tip of the stylus is extracted from each frame image.
- in the XY coordinate system that defines the pixels of the pixel group, an X-axis histogram and a Y-axis histogram exist on the X-axis and the Y-axis, respectively.
- the third row shows the three-dimensional coordinates ((x1, y1, z1), (x2, y2, z2), (x3, y3, z3), ...) generated from the X-axis histograms and Y-axis histograms of the second row. They are the results of image processing by the image processing apparatus, and these three-dimensional coordinates take different values for each frame image.
- the fourth row shows how the three-dimensional coordinates (Xi, Yi, Zi) of an arbitrary i-th frame are calculated. If the maximum frequencies of the X-axis histogram and Y-axis histogram of frame i are h(Xm) and h(Ym), then Xi and Yi of (Xi, Yi, Zi) are Xm and Ym. Zi of (Xi, Yi, Zi) is f(h(Xm) + h(Xm-1) + h(Xm+1) + h(Ym) + h(Ym-1) + h(Ym+1)).
- the function f () is a function for deriving the Z coordinate from the frequency of the histogram.
- as arguments of the function, the maximum frequency h(Xm) together with its neighboring frequencies h(Xm-1) and h(Xm+1), and the maximum frequency h(Ym) together with its neighboring frequencies h(Ym-1) and h(Ym+1), are selected.
- next, the constituent elements of the image processing unit 15, which realizes the processing corresponding to the image processing device within the display device, will be described.
- constituent elements of the image processing unit 15 are given reference numerals in the 20s. These components will be described in ascending order of the reference numerals.
- the image processing unit 15 includes a designated color lookup table 21, a designated color pixel group extraction unit 22, a storage 23, a histogram generation unit 24, a histogram storage unit 25, a histogram smoothing unit 26, a smoothed histogram storage unit 27, and a three-dimensional coordinate generation unit 28.
- FIG. 8 to FIG. 12 show the principle of three-dimensional coordinate generation by the designated color pixel group extraction unit 22 through the three-dimensional coordinate generation unit 28, with specific examples.
- in these examples, a stereoscopic object is operated with the sphere at the tip of the stylus.
- the designated color lookup table 21 is a lookup table referred to when extracting a pixel group; it defines what pixel components the designated-color pixels to be extracted have.
- Colors that should be registered in the lookup table as designated colors include the molding color determined by the material of the stylus, the paint color when part of the stylus is painted, and the packaging color when the stylus is wrapped.
- generally, the color of a pixel is defined by the angular and radial coordinates corresponding to hue and saturation in the hue circle, but in this embodiment it is defined by a combination of the luminance component Y, red difference component Cr, and blue difference component Cb, or by a combination of the red component R, green component G, and blue component B.
- FIG. 7A shows the internal configuration of the lookup table when the pixel components of the designated-color pixels are luminance Y, red difference Cr, and blue difference Cb.
- the lookup table in this figure has indexes for a plurality of designated-color pixels (mp, mq, mr in the figure), and each index can be associated with a combination of a luminance Y gradation (0 to 255), a red difference Cr gradation (0 to 255), and a blue difference Cb gradation (0 to 255).
- with such a lookup table, gradations can be designated in the range of 0 to 255 for each of the luminance Y, red difference Cr, and blue difference Cb.
- FIG. 7B shows the bit configuration of an entry associated with each designated-color index in the Y, Cr, Cb format lookup table.
- the bit width of one pixel is 32 bits, and 8 bits are assigned to each of the luminance Y, red difference Cr, and blue difference Cb. Of these 8 bits, an asterisk indicates that the lower bits are masked. By using such a bit mask, designated colors whose gradations fall within a numerical range differing only in the lower bits can be targeted for pixel group extraction.
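To make the lower-bit mask concrete, here is a small sketch of masked matching for a Y/Cr/Cb designated color; the mask value 0xF8 (masking the lower 3 bits) and all names are illustrative assumptions, not values from the patent.

```python
# Hypothetical sketch of designated-color matching with a lower-bit mask,
# assuming 8-bit Y / Cr / Cb components as described above. Masking the lower
# bits lets a range of nearby gradations match a single lookup-table entry.

def matches_designated_color(pixel, entry, mask=0xF8):
    """pixel, entry: (Y, Cr, Cb) tuples of 0-255 values."""
    return all((p & mask) == (e & mask) for p, e in zip(pixel, entry))

# A pixel whose components differ only in the masked lower bits still matches:
assert matches_designated_color((130, 66, 200), (128, 64, 201))
# A clearly different color does not:
assert not matches_designated_color((30, 66, 200), (128, 64, 201))
```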
- FIG. 7C shows the internal configuration of the lookup table when the pixel components of the designated-color pixels are a red component (R), a green component (G), and a blue component (B).
- the lookup table in this figure has indexes for a plurality of designated-color pixels (mp, mq, mr in the figure), and each index can be associated with a combination of an R gradation (0 to 255), a G gradation (0 to 255), and a B gradation (0 to 255). With such a lookup table, gradations can be designated for each of R, G, and B in the range of 0 to 255.
- FIG. 7D shows the bit configuration of an entry associated with each designated-color index in the RGB format lookup table.
- the bit width of one pixel is 32 bits, and 8 bits are assigned to each of R, G, and B. Since describing both the pixel representation by Y, Cr, Cb and the pixel representation by RGB would be cumbersome, the designated-color representation in the following explanation is unified to luminance Y, red difference Cr, and blue difference Cb.
- when a frame image is obtained in the frame memory in a frame period, the designated color pixel group extraction unit 22 extracts, from the pixels of a resolution on the order of 1920×1080 or 640×480 constituting the frame image, those that match the pixel components of the designated-color pixels specified in the lookup table 21, thereby obtaining an extracted image. In this extracted image, pixels having pixel components matching the designated-color pixels described in the lookup table are arranged on a plain background. If there are two or more such designated colors, such an extracted image is generated for each of the two or more designated colors.
- FIG. 8 is a diagram illustrating pixel group extraction by the designated color pixel group extraction unit 22. Arrows ext1 and ext2 in the figure schematically show pixel group extraction from this frame image.
- the sphere of the stylus looks like a circle to the user's eyes, but when the pixels of the frame image are viewed microscopically it becomes a distorted figure with a jagged outline.
- FIG. 9 shows how the pixel group changes in three cases: when the stylus approaches the screen, when the stylus moves away, and when the stylus exists at an intermediate position.
- FIG. 9A shows a case where the stylus exists at a position separated from the screen by a distance Zmiddle. Mmiddle indicates a pixel group extracted from a stylus image taken with such a distance.
- FIG. 9B shows a case where the distance between the stylus and the screen is far away (the distance at this time is Zfar). Mfar indicates a pixel group extracted from a stylus image taken with such a distance.
- FIG. 9C shows a case in which the distance between the stylus and the screen is close (the distance at this time is assumed to be Znear).
- Mnear represents a pixel group extracted from a stylus image taken with such a distance.
- compared with Mmiddle, it can be seen that in the pixel group Mfar, where the stylus is far away, the number of horizontal pixels of the pixel group extracted from the frame image is reduced.
- compared with Mmiddle, it can be seen that in the pixel group Mnear, where the stylus has approached, the number of horizontal pixels of the pixel group extracted from the frame image is increased.
- Such a change in the number of horizontal pixels and a change in vertical pixels in the pixel group are clues for determining the depth.
- the histogram generation unit 24 generates histograms by counting the number of appearances of designated-color pixels for each coordinate in the extracted image, that is, the extracted pixel group. Such generation is performed for the plurality of coordinates on the X-axis and the plurality of coordinates on the Y-axis.
- a histogram showing the number of designated-color pixels in association with each of the plurality of coordinates on the X-axis is called an "X-axis histogram", and a histogram showing the number of occurrences of designated-color pixels in association with each of the plurality of coordinates on the Y-axis is called a "Y-axis histogram".
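The X-axis and Y-axis histogram generation described above can be sketched as follows: for each X coordinate, the extracted designated-color pixels in that column are counted, and likewise per row for the Y axis. The 0/1 mask standing in for the extracted image, and the function name, are illustrative assumptions.

```python
# Minimal sketch of X-axis / Y-axis histogram generation from an extracted
# image, represented here as a 0/1 mask (1 where a designated-color pixel
# was extracted). Names are illustrative.

def make_histograms(mask):
    """Return (x_hist, y_hist): counts per column and per row of the mask."""
    height, width = len(mask), len(mask[0])
    x_hist = [sum(mask[y][x] for y in range(height)) for x in range(width)]
    y_hist = [sum(mask[y][x] for x in range(width)) for y in range(height)]
    return x_hist, y_hist

mask = [[0, 1, 1, 0],
        [1, 1, 1, 1],
        [0, 1, 1, 0]]
x_hist, y_hist = make_histograms(mask)
assert x_hist == [1, 3, 3, 1]  # counts per X coordinate (column)
assert y_hist == [2, 4, 2]     # counts per Y coordinate (row)
```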
- the histogram storage unit 25 stores the X-axis histogram and the Y-axis histogram generated by the histogram generation unit.
- FIG. 10A shows the X axis and the Y axis of the XY coordinate system that defines the pixel group. The broken line in the figure defines a projection range in which the pixel group M is projected on the X axis.
- the X-axis histogram and Y-axis histogram target the coordinates of this projection range.
- the X-axis histogram and the Y-axis histogram can be configured by array variables that specify array elements using such coordinates.
- FIG. 10B shows the array variables that define the X-axis histogram; X0, X1, X2, ..., Xn are the x-coordinates associated with frequencies in the X-axis histogram.
- the reason why these coordinates are associated with the frequency in the X-axis histogram is because the pixel group in FIG. 10A forms coordinates such as X0, X1, X2, and Xn on the X-axis.
- FIG. 10C shows array elements that define the Y-axis histogram.
- Y0, Y1, Y2, ..., Yn are the Y coordinates associated with the frequencies in the histogram of FIG. 10C.
- the reason why these coordinates are associated with the frequency in the Y-axis histogram is because the pixel group in FIG. 10A forms coordinates such as Y0, Y1, Y2, and Yn on the Y-axis. If such an array variable is defined using an object-oriented programming language and the array variable is set as an X-axis histogram and a Y-axis histogram, the appearance frequency of an arbitrary coordinate can be extracted.
- the histogram smoothing unit 26 calculates an average value of the frequencies shown in the X-axis histogram and the frequencies shown in the Y-axis histogram, and a smoothed X-axis histogram and a smoothed Y-axis histogram using the calculated average values as frequencies. Get.
- specifically, the histogram smoothing unit 26 uses a moving average filter, or an impulse response filter obtained by developing a Gaussian filter, for the smoothing. In smoothing by the impulse response filter, the generated X-axis histogram and Y-axis histogram are smoothed by convolution with the impulse response to obtain a smoothed X-axis histogram and a smoothed Y-axis histogram.
- FIG. 10D shows the convolution of the impulse response I[n] with an input impulse δ[n].
- the impulse response I[n] is obtained by applying the inverse discrete-time Fourier transform to the frequency response H(ω) given for the desired system.
- Formula 1 in the figure expresses this relationship; DTFT denotes the discrete-time Fourier transform. A smoothed X-axis histogram and a smoothed Y-axis histogram are obtained by applying Formulas 2 and 3 to the frequencies of the X-axis histogram and the Y-axis histogram.
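As a hedged sketch of this smoothing step, the following convolves a histogram with a small symmetric kernel whose weight peaks at the center, standing in for the impulse response I[n]; the 5-tap kernel values are illustrative, not the filter actually derived in the patent.

```python
# Sketch of histogram smoothing by convolution with a symmetric kernel that
# peaks at the center (here a stand-in for the impulse response I[n]).
# Kernel values and names are illustrative assumptions.

def smooth(hist, kernel=(0.1, 0.2, 0.4, 0.2, 0.1)):
    """Convolve hist with the kernel, truncating at the histogram edges."""
    half = len(kernel) // 2
    out = []
    for i in range(len(hist)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(hist):
                acc += w * hist[j]
        out.append(acc)
    return out

# Two equal maxima before smoothing collapse to a single central peak after,
# which is what stabilizes the detection of (Xm, Ym):
raw = [0, 5, 4, 5, 0]
sm = smooth(raw)
assert max(range(len(sm)), key=sm.__getitem__) == 2
```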
- FIG. 11A is a graph showing an approximate curve of the shape of a histogram smoothed by the impulse response. The value is highest at the center position and decreases toward the periphery.
- FIG. 11B is an example of the X-axis histogram
- FIG. 11C is an example of the Y-axis histogram.
- FIG. 11D shows an X-axis histogram and a Y-axis histogram before and after smoothing in an XY coordinate system for specifying a pixel group.
- comparing the X-axis histogram and Y-axis histogram before and after smoothing: in the histograms before smoothing shown in FIGS. 11B and 11C, there are two or more X coordinates and two or more Y coordinates having the maximum frequency, so it is difficult to determine the peak. In the histograms after smoothing, there is only one X coordinate and one Y coordinate (Xm, Ym) taking the maximum value. Since the maximum can be narrowed to a single place, the detection position of the maximum value is stabilized.
- the smoothed histogram storage unit 27 stores the smoothed X-axis histogram and smoothed Y-axis histogram obtained by the smoothing of the histogram smoothing unit 26.
- the 3D coordinate generation unit 28 is a component obtained by developing the depth determination by the depth determination unit into three-dimensional coordinate generation, and corresponds to the above-described depth determination unit.
- Z coordinate generation by the three-dimensional coordinate generation unit 28 is performed in the following process.
- the X coordinate (referred to as Xm) having the maximum number of appearances in the smoothed X-axis histogram and the Y coordinate (referred to as Ym) having the maximum number of appearances in the smoothed Y-axis histogram are specified.
- FIG. 12 is a three-dimensional drawing of the relationship between the stylus, the X-axis histogram, the Y-axis histogram, and the pixel group.
- the frequency sum in the vicinity of the histogram apex corresponds to the area of the central cross line in FIG. 12. That is, the central cross line has a line width of 3 pixels and is orthogonal around the center of the pixel group.
- (Xm, Ym) represents the intersection of the central cross lines, and the entire length of the central cross line is calculated from the frequency sum in the vicinity of the vertices of the histogram.
- stereoscopic images express 255 levels of depth by giving a parallax of 1 to 255 pixels between the right-eye image and the left-eye image.
- the Z axis in FIG. 12 shows a numerical value of 255, which is a reference for this sense of depth.
- the values that can be taken by the sum of the frequencies of the smoothed X-axis histogram and the smoothed Y-axis histogram are divided into 255 numerical ranges, and each range is assigned to one of the 255 depth levels.
- the arrows ass1, ass2, and ass3 in FIG. 12 symbolize this assignment. Through such assignment, three-dimensional coordinates sufficient to determine the touch of the stereoscopic object can be acquired.
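As an illustrative sketch only, the assignment of frequency-sum ranges to 255 depth levels could look like the following Python; the function name and the even, linear division of the range are assumptions not stated in the patent:

```python
def depth_level(freq_sum, max_sum, levels=255):
    """Map the sum of histogram frequencies onto one of `levels` depth
    steps by dividing the attainable range [0, max_sum] into equal
    numerical ranges, as symbolised by the arrows ass1-ass3 in FIG. 12."""
    if max_sum <= 0:
        return 0
    level = int(freq_sum * levels / max_sum)
    return min(level, levels)  # clamp the boundary case freq_sum == max_sum
```

A larger frequency sum (a larger pixel group, i.e. an object closer to the camera) maps to a larger depth level.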
- FIG. 13 is a flowchart showing an overall procedure of image processing according to the first embodiment.
- the flowchart corresponds to the highest level process, that is, the main routine.
- step S10 is a determination of whether or not a frame image has been acquired by the camera.
- steps S11 to S15 a pixel group of the designated color of the acquired frame image is extracted (step S11), and an X-axis histogram and a Y-axis histogram of the pixels constituting the pixel group m are generated (step S12).
- the X-axis histogram and the Y-axis histogram are smoothed (step S13), and the maximum frequency of the X-axis histogram and the Y-axis histogram is detected to obtain the coordinates (Xm, Ym) (step S14).
- the procedure of obtaining the Zm coordinate from the sum of the histogram frequencies in the vicinity of (Xm, Ym) (step S15) is repeated every time step S10 becomes Yes.
- As one example, there is a method of calculating Zm as h(Xm-1) + h(Xm) + h(Xm+1) + h(Ym-1) + h(Ym) + h(Ym+1). Note that h(n) is the frequency at the position n.
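The neighborhood-sum calculation of Zm can be sketched as follows; the dictionary-based histogram representation is an assumption for illustration:

```python
def z_from_histograms(hx, hy, xm, ym):
    """Zm = h(Xm-1)+h(Xm)+h(Xm+1) + h(Ym-1)+h(Ym)+h(Ym+1),
    where hx and hy map a coordinate to its (smoothed) frequency."""
    return (sum(hx.get(x, 0) for x in (xm - 1, xm, xm + 1))
            + sum(hy.get(y, 0) for y in (ym - 1, ym, ym + 1)))
```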
- FIG. 14A is a flowchart showing an X-axis histogram generation procedure.
- This flowchart is executed as a subroutine: at the time of the call, one or more arguments are received, and the process shown in the flowchart is executed. After the process is executed, the X-axis histogram is returned as the return value.
- a loop structure having a triple nest is formed. Among these, the innermost loop defined in step S23 repeats the processing in steps S24 to S25 for all the Y coordinates of the pixels constituting the pixel group m.
- “J” in the figure is a control variable that defines a loop in step S23. Step S22 repeats this innermost loop for all the X coordinates of the pixels constituting the pixel group.
- “I” in the figure is a control variable that defines a loop in step S22.
- step S21 the loop for each X coordinate defined in step S22 is repeated for all specified colors.
- “K” is a control variable that defines the loop in step S21.
- In step S24, it is determined whether or not the pixel at the coordinates (Xi, Yj) has the designated color (k). If so, the frequency h(Xi) of Xi in the X-axis histogram is incremented in step S25. If not, the process moves to the next Y coordinate.
- Since the control variable j is incremented every time this loop makes a round, the pixel at the coordinate Yj indicated by the control variable j is used for the processing of this loop. By repeating this loop, the frequency of each X coordinate in the X-axis histogram is set.
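The triple-nested loop of FIG. 14A might be sketched as follows; the `frame` dictionary and the function name are assumptions for illustration, not part of the patent (the Y-axis histogram of FIG. 14B is symmetric, incrementing h(Yj) instead):

```python
def x_axis_histogram(frame, width, height, designated_colors):
    """Triple-nested loop of FIG. 14(a): for each designated colour K
    (step S21), each X coordinate I (step S22) and each Y coordinate J
    (step S23), increment h(Xi) when the pixel at (Xi, Yj) has the
    designated colour (steps S24-S25).  `frame` is assumed to be a
    dict mapping (x, y) to a colour value."""
    h = [0] * width
    for k in designated_colors:
        for i in range(width):
            for j in range(height):
                if frame.get((i, j)) == k:
                    h[i] += 1
    return h
```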
- FIG. 14B is a flowchart showing a Y-axis histogram generation procedure.
- the flowchart of this figure is made into a subroutine.
- a Y-axis histogram which is a return value is returned to the flowchart on the calling side. This return value is as shown at the end of the flowchart.
- a loop structure having a triple nest is formed.
- the innermost loop defined in step S28 repeats the processing in steps S29 to S30 for all X coordinates of the pixels constituting the pixel group.
- Step S27 repeats this innermost loop for all the Y coordinates of the pixels constituting the pixel group.
- In step S26, the loop for each Y coordinate defined in step S27 is repeated for all designated colors.
- the meaning of i, j, k is the same as that in FIG.
- In step S29, it is determined whether or not the pixel at the coordinates (Xi, Yj) has the designated color (k). If so, the frequency h(Yj) of Yj in the Y-axis histogram is incremented in step S30. If not, the process moves to the next X coordinate.
- Since the control variable i is incremented every time this loop makes a round, the pixel at the coordinate Xi indicated by the control variable i is used for the processing of this loop.
- the frequency of each Y coordinate in the Y-axis histogram is set.
- FIG. 15A is a flowchart showing a procedure for smoothing the X-axis histogram.
- This flowchart is executed as a subroutine: at the time of the call, one or more arguments are accepted, and the processing shown in the flowchart is executed. After the processing is executed, the smoothed X-axis histogram is returned as the return value.
- the variable m is a variable that defines the convolution range in the positive direction of the X axis.
- the variable n is a variable that defines the convolution range in the negative direction of the X axis.
- step S31 the variables m and n are initialized by the calculation of (pixel group width-1) / 2, and the process proceeds to a loop defined in step S32. In this loop, the processes in steps S33 to S37 are repeated for the X coordinates of all the pixels constituting the pixel group.
- Step S33 is a determination as to whether x + m has exceeded the number of horizontal pixels on the screen; if so, (number of horizontal pixels on the screen - x) is set to the variable m in step S34. If not, m is not reset.
- In step S35, it is determined whether x - n is less than 0. If so, (x - 0) is set to the variable n in step S36. If not, n is not reset.
- Step S37 sets the frequency at the x-coordinate of the X-axis histogram smoothed by applying the frequency of the histogram at the X-coordinate and the frequency of the surrounding coordinates to a predetermined mathematical formula.
- the mathematical formula described in step S37 will be described.
- hsmoothed # x (x) indicates the frequency at the x coordinate of the smoothed X-axis histogram.
- The formula ΣI(x+i)·h(x+i) in step S37 is an impulse response convolution operation, executed over the m adjacent pixels in the positive direction.
- i represents an arbitrary one of a plurality of pixels adjacent in the positive direction on the X axis.
- h (x + i) indicates the frequency of the histogram of the pixel i ahead of the X coordinate.
- I (x + i) indicates an impulse response when calculating the frequency of the pixel placed at the i coordinate destination from the x coordinate.
- The symbol Σ means the summation of the products I(x+i)·h(x+i) over i.
- The formula ΣI(x-i)·h(x-i) is an impulse response convolution operation, executed over the n adjacent pixels in the negative direction.
- i represents an arbitrary one of a plurality of pixels adjacent in the negative direction on the X axis.
- h (x-i) indicates the frequency of the histogram of the pixel i before the X coordinate.
- I (x-i) indicates an impulse response when calculating the frequency of the pixel located i-th before from the x coordinate.
- The symbol Σ means the summation of the products I(x-i)·h(x-i) over i.
- The quotient obtained by dividing the sum by (m + n + 1) is the frequency of the smoothed X-axis histogram.
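A rough sketch of this smoothing, under the assumption that the impulse response is a simple function of the offset; the default constant response of 1.0 reduces the formula to a moving average and is chosen purely for illustration:

```python
def smooth_x_histogram(h, screen_width, group_width, impulse=lambda d: 1.0):
    """Sketch of the smoothing in FIG. 15(a).  m and n start at
    (group_width - 1) / 2 and are clamped at the screen edges
    (steps S33-S36); each output frequency is the impulse-weighted
    sum over the neighbourhood divided by (m + n + 1) (step S37)."""
    half = (group_width - 1) // 2
    smoothed = [0.0] * screen_width
    for x in range(screen_width):
        m = min(half, screen_width - 1 - x)  # clamp in the +X direction
        n = min(half, x)                     # clamp in the -X direction
        total = impulse(0) * h[x]
        total += sum(impulse(i) * h[x + i] for i in range(1, m + 1))
        total += sum(impulse(-i) * h[x - i] for i in range(1, n + 1))
        smoothed[x] = total / (m + n + 1)
    return smoothed
```

An isolated spike in the raw histogram is spread over its neighbourhood, which is what yields the single stable maximum described for FIG. 11.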
- FIG. 15B is a flowchart showing a smoothing procedure of the Y-axis histogram.
- the meanings of the variables m and n are the same as those in FIG.
- In step S41, the variables m and n are initialized by (height of the pixel group - 1) / 2, and the process proceeds to the loop defined by step S42. In this loop, the processes in steps S43 to S47 are repeated for the Y coordinates of all the pixels constituting the pixel group.
- In step S43, it is determined whether or not y + m exceeds the number of vertical pixels on the screen. If so, (number of vertical pixels on the screen - y) is set to the variable m in step S44. If not, m is not reset.
- In step S45, it is determined whether or not y - n is less than 0. If so, (y - 0) is set to the variable n in step S46. If not, n is not reset.
- step S47 the frequency at the y coordinate of the Y-axis histogram smoothed by applying the frequency of the histogram at the Y coordinate and the frequency of the surrounding coordinates to a predetermined mathematical formula is set. The mathematical formula described in step S47 will be described. hsmoothed # y (y) indicates the frequency at the y coordinate of the smoothed Y-axis histogram.
- The formula ΣI(y+i)·h(y+i) in step S47 convolves an impulse response over the m pixels in the positive direction of the Y axis.
- i represents an arbitrary one of a plurality of pixels adjacent in the positive direction on the Y axis.
- h (y + i) indicates the frequency of the histogram of the pixel i ahead of the Y coordinate.
- I (y + i) represents an impulse response to be convolved when calculating the frequency of the pixel located at the i coordinate destination.
- The symbol Σ means the summation of the products I(y+i)·h(y+i) over i.
- The formula ΣI(y-i)·h(y-i) convolves an impulse response over the n pixels in the negative direction of the Y axis.
- h(y-i) indicates the frequency of the histogram at the position i before the y coordinate.
- I (y-i) represents an impulse response to be convolved in calculating the frequency of the pixel located at the y-i coordinate.
- The symbol Σ means the summation of the products I(y-i)·h(y-i) over i.
- The quotient obtained by dividing the sum by (m + n + 1) is the frequency of the smoothed Y-axis histogram.
- The application can thus be linked to the movement of the stylus toward and away from the screen. If the three-dimensional coordinates of the spatial area in which the stereoscopic display information exists are determined in advance and the coordinates of the tip of the stylus fall inside that spatial area, the application changes the multi-view video stream to be reproduced. If the multi-view video stream after the change is an image of a "reverse stereoscopic object", the user can experience touching the stereoscopic object through the operation of the stylus.
- the second embodiment relates to an improvement for appropriately processing the contrast of a capture target.
- When a stylus is selected as the object to be captured, a contrast consisting of a plurality of colors appears in the stylus image within the captured frame image, due to indoor lighting or outdoor sunlight.
- The multi-color contrast includes the color of the portion directly exposed to light, the color of the normal portion, and the color of the shaded portion. Even if the molding color and the paint color of the stylus are a single color, the object to be captured exhibits a contrast depending on how it is illuminated. Therefore, in this embodiment, the color of the portion directly exposed to light, the color of the normal portion, and the color of the shadowed portion are registered in the image processing apparatus as a plurality of designated colors.
- a pixel group corresponding to the designated color is extracted, and an X-axis histogram and a Y-axis histogram are generated for each of the extracted pixel groups. Then, histogram synthesis is performed on the X-axis histogram and Y-axis histogram for each specified color generated as described above.
- FIG. 16 shows an internal configuration of the image processing unit 15 according to the second embodiment. This figure is drawn based on the internal configuration shown in FIG. 3, and is different from the internal configuration serving as the base in that a component is newly added. That is, the internal configuration of the image processing apparatus in FIG. 16 is obtained by adding new components and improving existing components.
- The added constituent elements are given reference numerals in the 30s: the histogram synthesis unit 30, which adds together, for the same coordinates, the appearance frequencies of the designated-color pixels shown in the X-axis histogram and in the Y-axis histogram of each designated color to obtain a combined X-axis histogram and a combined Y-axis histogram; the weight coefficient storage unit 31, which stores the coefficient to be multiplied when summing the histogram frequencies for each designated color; and the combined histogram storage unit 32, which stores the combined X-axis histogram and the combined Y-axis histogram.
- The improvements to the existing constituent elements accompanying these additions are: (1) the histogram generation unit 24 generates an X-axis histogram and a Y-axis histogram for each designated color, stores them in the histogram storage unit 25, and delivers them to the histogram synthesis unit 30; (2) what is smoothed by the histogram smoothing unit 26 is the combined X-axis histogram and the combined Y-axis histogram stored in the combined histogram storage unit 32; and (3) the smoothed combined X-axis histogram and the smoothed combined Y-axis histogram produced by the histogram smoothing unit 26 serve as the basis for three-dimensional coordinate generation by the three-dimensional coordinate generation unit 28.
- the tip of the stylus is captured by using two or more colors, the three-dimensional coordinates of the tip are acquired, and the application is controlled based on the three-dimensional coordinates.
- FIG. 17 is a flowchart showing an overall procedure according to the second embodiment. This figure is drawn on the basis of the flowchart of FIG. 13, and is different in that some steps are replaced with another in this flowchart.
- The steps executed when step S10 in FIG. 13 becomes Yes are replaced, and the replacement steps are given reference numbers in the 50s.
- a series of steps in which these replacements have been made will be described.
- A pixel group ma of color A is extracted from the frame image (step S51), an X-axis histogram and a Y-axis histogram of the pixels constituting the pixel group ma are generated (step S52), a pixel group mb of color B is extracted (step S53), and an X-axis histogram and a Y-axis histogram of the pixels constituting the pixel group mb are generated (step S54).
- In step S55, the frequencies of the X-axis histogram and Y-axis histogram of the pixel group mb are multiplied by α.
- The frequencies of the X-axis histogram and Y-axis histogram for the designated color ma and those of the X-axis histogram and Y-axis histogram for the designated color mb are added together for the same coordinates, and a combined X-axis histogram and a combined Y-axis histogram are obtained whose frequency at each X coordinate and each Y coordinate is the total value (step S56).
- In step S58, it is determined whether the maximum frequency exceeds the threshold. If it does not, the flow returns to the loop in step S10; if it exceeds the threshold, the Z coordinate is calculated from the sum of the frequencies in the vicinity of Xm and Ym.
- Every time this loop makes a round, a new frame image is input, so that the new frame image is used for histogram generation by the loop.
- FIG. 18 is a flowchart showing a histogram synthesis procedure.
- k is an index indicating each of a plurality of designated colors
- ⁇ (k) is a weighting factor for the designated color (k).
- h # k (xi) indicates the frequency for the coordinate xi in the X-axis histogram generated for the specified color (k).
- Hx#sum(xi) indicates the frequency for the coordinate xi in the combined X-axis histogram.
- Step S61 defines a loop for executing step S62 for all X coordinates of the X-axis histogram.
- Step S63 defines a loop for executing step S64 for all Y coordinates of the Y-axis histogram.
- the processing structure is to process each of the plurality of data elements every time the loop makes a round.
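The weighted per-coordinate summation of FIG. 18, Hx#sum(xi) = Σk α(k)·h#k(xi), can be sketched as follows; list-based histograms and the default coefficient of 1.0 for colors absent from the weight table are assumptions for illustration:

```python
def synthesize_histograms(per_color, weights, size):
    """Weighted combination of per-colour histograms: for each
    coordinate x, sum alpha(k) * h_k(x) over all designated colours k.
    `per_color` maps a colour k to its histogram (a list of
    frequencies); `weights` maps a colour to its coefficient alpha."""
    combined = [0.0] * size
    for k, hist in per_color.items():
        alpha = weights.get(k, 1.0)
        for x in range(size):
            combined[x] += alpha * hist[x]
    return combined
```

The same function applies unchanged to the Y-axis histograms.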
- The designated colors from which pixel groups are to be extracted are specified by registering, in the look-up table, the combination of luminance Y, red difference Cr, and blue difference Cb that defines each designated color. However, if only one combination of luminance Y, red difference Cr, and blue difference Cb is registered, a sufficient number of pixels for generating a histogram may not be extracted.
- Conversely, if the extraction criterion is relaxed, pixels of similarly colored parts that are not the stylus will also be picked up. Such parts include the user's clothes whose color is similar to that of the stylus, and parts of the user's background whose color is similar to that of the stylus.
- If the colors that compose the contrast formed on the stylus are targeted for histogram generation, the histograms generated in this way are combined, and the frequencies of the combined histogram are used, then even if an object whose color happens to be similar to the stylus is reflected in the frame image, that similarly colored object can be excluded from depth determination. This is the technical significance of using contrast.
- A specific example here is that a stylus as shown in FIG. 19A is used under indoor lighting. Such a stylus is painted in one color corresponding to the designated color ma, but a highlight appears due to reflection of the indoor lighting.
- a circular area on the spherical surface of a sphere attached to the end of the stylus is an area where highlights due to irradiation of a light source exist, and has a designated color mb. The other areas have the designated color ma.
- the combination of the luminance Y, red difference Cr, and blue difference Cb of the highlight portion is registered as the designated color mb.
- FIG. 19B shows a pixel group extracted from the image of the sphere, an X-axis histogram, and a Y-axis histogram.
- the + symbol sm1 schematically indicates that the X-axis histogram hg1 for the designated color ma and the X-axis histogram hg2 for the designated color mb are subject to frequency addition, resulting in a combined X-axis histogram hg3.
- the + symbol sm2 schematically indicates that the Y-axis histogram hg4 for the specified color ma and the Y-axis histogram hg5 for the specified color mb are subject to frequency addition, and as a result, a combined Y-axis histogram hg6 is obtained.
- FIG. 20A shows an X-axis histogram generated for the pixel group of the designated color ma
- FIG. 20B shows an X-axis histogram generated for the designated color mb. Since the weight coefficient is assigned to the designated color mb, the frequency of the X-axis histogram for the designated color mb is multiplied by ⁇ .
- FIG. 20C shows a combined X-axis histogram obtained by combining the X-axis histogram for the designated color ma and the X-axis histogram for the designated color mb. The frequency of the histogram of the designated color ma alone is less than the threshold value, but the frequency of the composite histogram exceeds the threshold value.
- FIG. 20D shows a smoothed combined X-axis histogram obtained by smoothing the combined histogram.
- a broken line is an approximate curve.
- a set (Xm, Xm-1, Xm + 1) of the X coordinate having the maximum frequency and the X coordinate of the next order is found.
- a histogram of the pixels constituting the pixel group is generated with respect to the X axis and the Y axis (FIGS. 20A and 20B).
- Since the combined histogram (FIG. 20C) and the smoothed histogram (FIG. 20D) are used as the basis for depth determination, the detection sensitivity increases and even small objects can be detected.
- FIG. 21 shows a pixel group extracted from an image of a similarly colored object and a pixel group extracted from the stylus sphere. The similarly colored object has a contrast composed of a color mc portion and a color md portion. Assuming that the colors mc and md having such a contrast are registered as designated colors, an X-axis histogram and a Y-axis histogram are generated for each of mc and md, as shown in the second row.
- the second row shows an X-axis histogram for two specified colors generated from a pixel group of similar color pixel groups and an X-axis histogram for two specified colors generated from a pixel group of a stylus sphere.
- the + symbol sm3 indicates that for the stylus sphere, the X-axis histogram hg11 corresponding to the designated color ma and the histogram hg12 corresponding to the designated color mb are to be combined.
- the + symbol sm4 indicates that for a similar color object, the X-axis histogram hg13 corresponding to the designated color mc and the histogram hg14 corresponding to the designated color md are to be combined.
- The third row shows a combined X-axis histogram obtained by combining the two X-axis histograms generated from the pixel group of the similarly colored object, and a combined X-axis histogram obtained by combining the two X-axis histograms generated from the pixel group of the stylus sphere. Since a coefficient α exceeding 1 is registered for the designated color mb, the maximum frequency of the combined X-axis histogram of the stylus sphere exceeds the threshold Th.
- On the other hand, the maximum frequency of the combined X-axis histogram of the similarly colored object does not exceed the threshold Th.
- By setting the weighting coefficient for the designated color mb in this way, it is possible to distinguish the pixel group of the stylus sphere from the other pixel groups.
- By setting the frequencies less than a threshold Th specified for the X-axis histogram and the Y-axis histogram to 0 and subtracting Th from the frequencies greater than or equal to Th, the frequencies contributed by the background can be canceled, so that false detection is suppressed and the generation of three-dimensional coordinates is stabilized. Further, by combining this with the histogram summation processing described above, even in a situation where the threshold is at the position Th in the figure, the object can be detected owing to the improvement effect of the summation.
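The background-cancelling thresholding described above might be sketched as (list-based histograms are assumed for illustration):

```python
def cancel_background(hist, th):
    """Set frequencies below the threshold Th to 0 and subtract Th from
    frequencies at or above it, cancelling the contribution of
    background pixels to the histogram."""
    return [f - th if f >= th else 0 for f in hist]
```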
- the stylus has two or more molding colors, paint colors, and packaging colors.
- the color of the portion directly exposed to light, the color of the normal portion, and the color of the shaded portion are registered in the image processing apparatus as a plurality of designated colors.
- A pixel group corresponding to each designated color is extracted, an X-axis histogram and a Y-axis histogram are generated for each of the extracted pixel groups, and histogram synthesis is performed on the X-axis histograms and Y-axis histograms thus generated for each designated color; the depth can then be determined with higher accuracy.
- In the above description, the designated color of the stylus highlight is registered; alternatively, the tip of the stylus may be marked with paint of a specific color, and the combination of the luminance Y, red difference Cr, and blue difference Cb of this paint may be registered as the designated color mb.
- FIG. 22 shows an internal configuration of the image processing unit 15 according to the third embodiment. This figure is drawn based on the internal configuration shown in FIG. 3, and is different from the internal configuration serving as the base in that new components are added.
- The new component added in the internal configuration of FIG. 22 is the rotation angle generation unit 41, which acquires the X coordinate and the Y coordinate from the X-axis histogram and the Y-axis histogram and calculates the rotation angle of the stylus.
- the rotation angle generation unit 41 is a component corresponding to the rotation angle determination unit described above, and the rotation angle generated by the rotation angle generation unit 41 is the basis for event generation by the event manager. Descriptions of the combining unit described in the previous embodiment and components related thereto are omitted. This is because whether or not these components are provided is arbitrary. Now that the internal configuration according to the third embodiment has been described, the stylus configuration unique to this embodiment will be described.
- Rotation angles include the rolling angle, which is the angle of rotation about the long axis of the stylus taken as the X axis; the pitching angle, which is the angle of rotation about the left-right direction of the stylus taken as the Y axis; and the yawing angle, which is the angle of rotation about the up-down direction of the stylus taken as the Z axis.
- In the present embodiment, the rolling angle is selected as the object of explanation.
- the principle of determining the rotation angle in the present embodiment will be described with specific examples.
- FIG. 23A is an external view of a stylus according to the present embodiment.
- one hemispherical portion is painted with the designated color ma and the other hemispherical portion is painted with the designated color mb.
- FIG. 23B shows a state where the sphere at the end of the stylus is moved toward the screen.
- Arrows up1 and dw1 indicate movement trajectories along the vertical direction of the screen.
- An arrow ro1 indicates clockwise and counterclockwise rotation of the stylus.
- FIG. 23C shows an X-axis histogram and a Y-axis histogram for each designated color generated from a captured image of the two-color painted sphere.
- Y0 [mb] indicates the minimum coordinate for the Y-axis histogram of the specified color [mb].
- Yn [ma] indicates the maximum coordinate for the Y-axis histogram of the specified color [ma].
- X0 [mb] is the minimum coordinate for the X-axis histogram of the specified color [mb].
- Xn [ma] indicates the maximum coordinate for the X-axis histogram of the specified color [ma].
- the difference Δy formed on the Y axis by the Y-axis histograms for each designated color is Yn(ma) - Y0(mb).
- the difference Δx formed on the X axis by the X-axis histograms for each designated color is Xn(ma) - X0(mb).
- FIG. 23D shows the geometric relationship between the rotation angle θ and the coordinates of the histograms.
- the side AB of the triangle ABC is equal to Xn[ma] - X0[mb]
- the side AC is equal to Yn[ma] - Y0[mb].
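A minimal sketch of this geometry, assuming the rolling angle is obtained as the arctangent of side AC over side AB; the use of `atan2` and degrees is an implementation choice, not taken from the patent:

```python
import math

def rolling_angle(xn_ma, x0_mb, yn_ma, y0_mb):
    """Rolling angle of the two-colour sphere from the triangle of
    FIG. 23(d): side AB = Xn[ma] - X0[mb], side AC = Yn[ma] - Y0[mb],
    and theta is the arctangent of AC / AB, returned in degrees."""
    ab = xn_ma - x0_mb
    ac = yn_ma - y0_mb
    return math.degrees(math.atan2(ac, ab))
```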
- FIG. 24 is a flowchart showing an overall procedure of the image processing apparatus according to the third embodiment.
- This flowchart is an improvement based on FIG.
- the improvement of this flowchart is that step S71 is executed after obtaining Zm from the sum of histograms in the vicinity of Xm and Ym in step S59.
- step S71 the rotation angle ⁇ of the stylus is calculated from the difference between the coordinates of the X-axis histogram and the Y-axis histogram.
- the procedure of this step S71 can be expanded into the procedure of the subroutine of FIG.
- FIG. 25 is a flowchart showing a procedure for calculating the rotation angle of the stylus.
- step S75 the maximum X coordinate (Xn (ma)) of the X axis histogram of the designated color (ma) and the maximum Y coordinate (Yn (ma)) of the Y axis histogram of the designated color (ma) are obtained.
- the rotation angle of the stylus can be calculated in association with the generation process of generating the three-dimensional coordinates from the X-axis histogram and the Y-axis histogram.
- the movement can be captured precisely.
- FIG. 26 shows the configuration of a stylus according to a modification of the present embodiment.
- A sphere pair 103e, consisting of a sphere painted with the designated color ma and a sphere painted with the designated color mb, exists at the tip of the stylus.
- FIG. 26B schematically shows the movement of the stylus that can be captured in the present embodiment.
- Arrows up2 and dw2 in the figure indicate movement trajectories along the vertical direction of the screen.
- An arrow ro2 indicates clockwise and counterclockwise rotation of the stylus. The principle of determining the rotation angle for the sphere pair will be described with reference to FIGS. 26 (c) to 26 (e).
- FIG. 26C shows a pixel group extracted from a frame image obtained by photographing the sphere pair 103e.
- the X-axis and Y-axis histograms generated for the specified color mb and the X-axis histogram and Y-axis histogram generated for the specified color ma are drawn on the X-axis and Y-axis in this drawing.
- the difference Δy formed on the Y axis by the Y-axis histograms for each designated color is Yn(mb) - Y0(ma).
- the difference Δx formed on the X axis by the X-axis histograms for each designated color is Xn(ma) - X0(mb).
- FIG. 26 (d) shows the geometric relationship found at the rotation angle of the stylus.
- The rotation angle θ of the stylus forms triangles ABC and CDE in the XY plane.
- FIG. 26 (e) shows the geometric relationship between ABC and CDE extracted. If the diameter of the sphere is the same and the distance from the center of the stylus to the center of the sphere is also the same, the triangle ABC and the triangle CDE are congruent.
- the length of the sides BC and CD is half of Xn(ma) - X0(mb), that is, (Xn(ma) - X0(mb)) / 2.
- the length of the sides AB and DE is half of Yn(mb) - Y0(ma), that is, (Yn(mb) - Y0(ma)) / 2.
- FIG. 27 is a flowchart showing a stylus rotation angle calculation procedure.
- step S81 of FIG. 27 the maximum X coordinate (Xn (ma)) of the X axis histogram of the designated color (ma) and the minimum Y coordinate (Y0 (ma)) of the Y axis histogram of the designated color (ma) are acquired.
- In step S82, the minimum X coordinate (X0(mb)) of the X-axis histogram of the designated color (mb) and the maximum Y coordinate (Yn(mb)) of the Y-axis histogram of the designated color (mb) are acquired.
- The rotation angle θ can be obtained based on the positional relationship of each color.
- Since the difference in histogram coordinates between the designated colors appears large, it is possible to capture even a subtle rotation of the stylus.
- FIG. 28A is a perspective view illustrating a display device 101 to which a set of a camera 101a and a light emitting element 101b is attached, and a stylus 103.
- In FIG. 28A, the sphere attached to the tip of the stylus is made of a white diffusing member and reflects the LED color of the main body.
- FIG. 28B is a front view when the camera 101a and the light emitting element 101b are viewed from the front of the display device.
- FIG. 28C is a perspective view of the display device, and shows what kind of configuration the combination of the camera 101a and the light emitting element 101b attached inside the display device has.
- When the stylus exists at the position shown in FIG. 28A, the stylus sphere diffuses the light emitted from the LED, and the diffused light enters the camera 101a.
- The surface color produced on the irradiated sphere is registered as a designated color, and the pixel group of the registered color is extracted from the captured image.
- Since the light emitting element paired with the camera illuminates the stylus, the image processing unit 15 generates a histogram from the illuminated image, smoothes the resulting histogram, and determines the depth based on the maximum frequency. Therefore, even if a color similar to the designated color of the stylus exists in the background, false detection can be avoided.
- The stylus according to the first embodiment is a 2-way stylus having a pen tip suited to touch panel operation and a spherical body suited to capture; with that design, however, the pen tip and the spherical body end up being captured at the same time, making false detection likely.
- FIG. 29A shows a stylus covered with a cylindrical case 110 that can be slid along the stylus.
- The first level shows the stored form, in which the cylindrical case 110 is slid toward the grip end to store the sphere.
- The second level is a cross-sectional view with the sphere stored; in this state the sphere at the grip end is hidden inside the cylindrical case 110.
- The third level is a cross-sectional view with the cylindrical case slid toward the pen tip. In this state the sphere at the grip end is exposed and can be used for operations on the stereoscopic object.
- If the case is slid toward the grip end, the pen tip appears; if it is slid toward the pen tip, the sphere appears. Accordingly, either the pen tip or the grip-end sphere can be selected as the target for generating three-dimensional coordinates.
- Because the sphere at the rear of the stylus can be slid into and out of the case in this way, erroneous detection can be prevented even when the colors of the pen tip and the sphere are similar.
- FIG. 29B shows a configuration in which the stylus is filled with a gel-like resin.
- The first level is an external view.
- A push button 111 is provided on the side surface.
- The pen tip has a through hole 112.
- The second level is a cross-sectional view.
- The internal cavity of the stylus is filled with gel-like resin 113.
- The third level shows the state in which the push button 111 is pressed. This pressing pushes out the gel-like resin 113 inside, and a lump of gel is expelled through the through hole 112 provided in the pen tip.
- If the gel color is registered as a designated color in the look-up table of the image processing apparatus and the user makes a gesture while ejecting gel from the pen tip, a histogram is generated for the pixel group constituting the image of the gel, and the resulting histogram is smoothed to determine the depth based on the maximum frequency.
- In FIG. 29(b), when the button 111 of the stylus is pressed, an object with the gel's color emerges from the tip and is captured by the camera 101a, so false detection can be prevented even when the pen tip and the sphere have similar colors.
- FIG. 29(c) shows a sphere attached to the stylus by providing a conical hole in the sphere and inserting the stylus nib into it.
- By skewering the sphere on the sharp pen tip in this way, both the touch panel operation and the tracking operation by the camera 101a can be performed with a single stylus.
- FIG. 29(d) is an example in which the triangular pyramid forming the pen tip is detachable.
- FIG. 29(e) is an example in which the sphere is detachable from the stylus.
- The color of the sphere at the front end or the rear can thus be replaced with a color that causes fewer false detections, according to the colors of the surrounding environment.
- FIG. 30A is a diagram illustrating the appearance of a display device according to the seventh embodiment.
- The camera 201 is attached to the display device via a support member and is arranged to look obliquely down on the screen.
- The screen of the display device faces diagonally upward to the right, and the tip of the stylus moves over the surface of the screen.
- Arrows up3 and dw3 in the figure indicate the vertical movement of the stylus.
- A cross cr1 in the figure schematically shows the locus of the stylus tip on the screen.
- The position of the stylus is captured not only when the stylus is moved at a position spatially separated from the screen but also when it traces the screen surface, because the image processing apparatus generates the three-dimensional coordinates from the captured image.
- FIG. 30B shows a frame image taken by the camera 101a of FIG. 30A.
- Mapping conversion is performed on the pixel group extracted during pixel group extraction, converting the extracted image, which corresponds to the lower half of the frame image, into a full-screen image.
- FIG. 30C shows the extracted image before map conversion and the full-screen image after map conversion.
- The left side of (c) is the screen frame extracted from the frame image of (b).
- Since the camera is positioned diagonally above the screen, the screen frame has a trapezoidal shape as seen from above.
- The right side of (c) is the full-screen image after mapping conversion. If three-dimensional coordinate generation is performed on the image after such mapping conversion, even a stylus operation on a planar-view image can be a target for generating three-dimensional coordinates.
- The touch sensor can therefore be omitted from the tablet terminal, reducing cost.
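The trapezoid-to-full-screen mapping can be sketched as a planar homography estimated from the four corners of the extracted screen frame. The corner values used in the comments below are made-up examples, not figures from the patent:

```python
import numpy as np

def homography(src, dst):
    """Solve the 3x3 projective transform mapping four src corners to
    four dst corners (standard DLT formulation with h33 fixed to 1).

    src: four (x, y) corners of the trapezoidal screen frame,
    dst: the matching corners of the full-screen rectangle.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def map_point(H, x, y):
    """Apply the homography to one pixel coordinate."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]
```

Mapping every extracted pixel through `map_point` (or, in practice, inverse-mapping each destination pixel) yields the rectified full-screen image on which the stylus trace is then located.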
- The movement of the stylus tracing slightly away from the screen is photographed, a histogram is generated for the photographed image, and the resulting histogram is smoothed so that coordinates indicating the depth of the pen tip can be generated from the maximum frequency.
- Since the three-dimensional coordinates and rotation angle of the stylus are handled on the same footing as the touch position on the screen and the tilt of the tablet terminal, the application can be operated in conjunction with the spatial position of the stylus. When the application is a game or a browser, it can be made to follow the three-dimensional movement of the stylus.
- A three-dimensional gesture is realized by applying the capture of the movement of the capture target described in the previous embodiments.
- When the user draws a spiral in front of the display device, the movement trajectory is represented by a plurality of frame images, an X-axis histogram and a Y-axis histogram are generated, and if three-dimensional information is generated on this basis, a three-dimensional shape that the user draws with a single stroke can be converted into a three-dimensional coordinate model.
- FIG. 31(a) shows a spiral shape drawn by the user in space and the three-dimensional coordinates tr1, tr2, tr3, ... generated from the movement of the stylus.
- FIG. 31(b) shows a pyramid shape drawn by the user in space and the three-dimensional coordinates ur1, ur2, ur3, ... generated from the movement of the stylus.
- A plurality of three-dimensional coordinates indicating the movement trajectory of the stylus are generated from the plurality of frame images obtained by the camera 101a. Based on these coordinates, the user's gesture is recognized, a three-dimensional object model is generated, and computer graphics can be displayed.
- The gesture recognition is specifically performed as follows.
- A gesture dictionary is provided. This dictionary associates a normalized three-dimensional vector sequence with each gesture pattern to be recognized. In the above example, a normalized three-dimensional vector sequence is associated with each of the spiral shape and the pyramid shape.
- When the image processing apparatus performs the image processing described in the embodiments so far on a plurality of frame images, three-dimensional coordinates of the capture target are generated for each frame image.
- From these coordinates, a three-dimensional vector indicating the movement of the capture target is generated and normalized.
- The gesture dictionary is searched using the plurality of normalized three-dimensional vectors, and the most likely gesture is selected.
- The application executes the processing corresponding to the selected gesture. For example, if the user draws a spiral using the stylus, a tornado can be created in the game based on the spiral; if a tetrahedron is drawn, a pyramid can be created in the game.
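A minimal sketch of the dictionary lookup follows. The gesture names and the equal-length assumption are illustrative; a production matcher would also time-align the sequences (for example with dynamic time warping):

```python
import math

def normalize(vecs):
    """Unit-normalise each 3D movement vector between successive
    capture-target coordinates."""
    out = []
    for vx, vy, vz in vecs:
        n = math.sqrt(vx * vx + vy * vy + vz * vz) or 1.0
        out.append((vx / n, vy / n, vz / n))
    return out

def most_likely_gesture(dictionary, observed):
    """Pick the gesture whose normalised vector sequence is closest
    (by summed squared difference) to the observed sequence.

    dictionary: {gesture_name: list of normalised (x, y, z) vectors}
    observed:   list of normalised (x, y, z) vectors, same length.
    """
    def score(pattern):
        return sum((a - b) ** 2
                   for p, o in zip(pattern, observed)
                   for a, b in zip(p, o))
    return min(dictionary, key=lambda name: score(dictionary[name]))
```

The normalization step is what lets a small spiral and a large spiral match the same dictionary entry: only the direction of motion between frames is compared, not its magnitude.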
- The pop-out amount of the stereoscopic object varies with the screen size. Therefore, the weighting coefficients used in generating the histogram and in summing its frequencies, and the impulse response to be convolved, may be changed according to the pop-out amount of the stereoscopic object.
- The touch events including the three-dimensional coordinates generated by the image processing device may be used for device setup input and configuration input, for example by realizing user authentication such as password entry or interactive processing through GUI widgets.
- The display device shown in the first embodiment may be configured as a stereoscopic theater system consisting of a recording medium playback device and a display device.
- In this case, only the display unit and the image processing unit are components of the display device, while the platform unit, heap memory, video decoder, rendering unit, video plane, graphics plane, and event manager are components of the playback device.
- The image processing device provided in the display device outputs the three-dimensional coordinates to the playback device via the interface with the display device.
- The playback device of the recording medium generates events and drives the application according to the three-dimensional coordinates thus received.
- Smoothing has been realized with a moving average filter, a Gaussian filter, or an impulse response filter, but smoothing is not limited to these; it is sufficient that the computation involves adding the frequencies of multiple coordinates and dividing by the number of coordinates.
- Other examples are as follows: smoothing by a kernel function, smoothing by local polynomial fitting, scatter-plot smoothing, smoothing by spline fitting, moving-line smoothing, Friedman's supersmoother, smoothing by a moving median, Tukey's moving-median smoothing, and end-point smoothing for the moving median.
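A minimal sketch of such smoothing, convolving a small symmetric impulse response (1, 2, 1)/4 whose weight is largest at the centre coordinate and decreases toward the edges:

```python
def smooth_histogram(hist, weights=(1, 2, 1)):
    """Smooth a histogram by convolving an impulse response: add the
    weighted frequencies of each coordinate's neighbourhood and divide
    by the total weight. Out-of-range neighbours contribute zero."""
    k = len(weights) // 2
    total = sum(weights)
    out = []
    for i in range(len(hist)):
        acc = 0
        for j, w in enumerate(weights):
            idx = i + j - k
            if 0 <= idx < len(hist):
                acc += w * hist[idx]
        out.append(acc / total)
    return out
```

A wider `weights` tuple approximates the Gaussian filter mentioned above; the uniform tuple (1, 1, 1) gives the moving average.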
- The plurality of coordinates whose frequencies are counted in the histogram may be consecutive coordinates on the X axis or the Y axis, or coordinates thinned out at a fixed interval. By thinning out the coordinates to be counted, the number of histogram samples can be reduced, and the load of histogram generation can be lowered.
- The frequencies forming the basis for depth determination were the frequency of Xm, the coordinate with the maximum frequency, together with the frequencies of the preceding coordinate Xm-1 and the following coordinate Xm+1; instead, the frequencies of coordinates separated from Xm by a fixed number may be used. Alternatively, the frequency of Xm may be omitted, and only the frequencies of neighboring values, such as those of Xm-1 and Xm+1, may serve as the basis for determining the depth.
- Likewise, the frequencies forming the basis for depth determination were the frequency of Ym, the coordinate with the maximum frequency, together with the frequencies of the preceding coordinate Ym-1 and the following coordinate Ym+1; instead, the frequencies of coordinates separated from Ym by a fixed number may be used. Alternatively, the frequency of Ym may be omitted, and only the frequencies of neighboring values, such as those of Ym-1 and Ym+1, may serve as the basis for determining the depth.
- The multi-view video stream is in a frame-sequential format, but it may instead be in a vertical line interleaved, horizontal line interleaved, top-down, side-by-side, field-sequential, or block interleaved format.
- In the block interleaved format, for example, the base-view video and the additional-view video are arranged alternately, block by block, in the blocks of a mixed video to constitute the mixed video.
- The object of depth determination may be a part of the human body.
- In this case, a Hue Saturation Value color system as described in Non-Patent Document 1 may be registered in the image processing apparatus as the designated-color look-up table, and pixel group extraction performed.
- When the human body is the object, however, the hands, arms, face, and neck of the body appear in the frame image.
- Therefore, the light emitting element 101b shown in the first embodiment is used to illuminate the part of the human body that should be the target of depth determination.
- In this way, the target of depth determination can be narrowed down to the fingertips of the human body.
- The contrast of the pattern formed by the surface of the stylus may be registered as a plurality of designated colors, and an X-axis histogram and a Y-axis histogram may be generated for each of these designated colors and used as targets of histogram synthesis.
- Possible pattern contrasts include a checker pattern, a rainbow pattern, a lattice pattern, a barcode, and a QR code (registered trademark). If various color combinations are registered in the look-up table as designated colors in this way, objects of similar colors can be properly distinguished from the capture target.
- When the multiplicand is 16 bits long, it is divided into four 4-bit slices, and the product of each 4-bit slice and the constant, that is, the multiples 0 to 15 of the constant, are stored in the constant ROM.
- The terms "calculation processing" and "arithmetic processing" in this specification therefore do not mean only pure arithmetic operations; they also encompass reading from a recording medium such as a ROM, in which a calculation result stored in the recording medium is read out according to the value of the operand.
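The constant-ROM multiplication described above can be sketched in software: precompute the 0-15 multiples of the constant, then assemble the product of a 16-bit multiplicand from four table look-ups and shifted additions, with no multiply instruction:

```python
def make_constant_rom(constant):
    """Precompute constant * 0..15, as the constant ROM stores the
    multiples of the constant for every 4-bit slice of the multiplicand."""
    return [constant * n for n in range(16)]

def rom_multiply(multiplicand, rom):
    """Multiply a 16-bit multiplicand by the ROM's constant using only
    look-ups and shifted additions. Since the multiplicand equals the
    sum of nibble_i * 16**i, the product equals the sum of the looked-up
    partial products shifted into place."""
    result = 0
    for i in range(4):                      # four 4-bit slices
        nibble = (multiplicand >> (4 * i)) & 0xF
        result += rom[nibble] << (4 * i)    # look-up, then align
    return result
```

This mirrors why the specification counts a ROM read-out as "arithmetic processing": the multiplication has been moved entirely into table look-ups performed according to the operand's value.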
- The histogram generation unit 24 can be realized as a histogram generation circuit with the following hardware configuration: a comparison circuit that compares the gradation bit values of the luminance Y, red difference Cr, and blue difference Cb constituting a pixel with the gradation bit values of the luminance Y, red difference Cr, and blue difference Cb described in the designated-color look-up table; a counter circuit that increments the count value when the comparison circuit determines a match; and a writing circuit that writes the pixel coordinates to memory together with the count value.
- The histogram smoothing unit 26 can be realized as a histogram smoothing circuit with the following hardware configuration: a multiplier-adder that multiplies the frequencies of adjacent coordinates by predetermined weighting factors, and a divider that divides the adder's result by the number of pixels involved. Since the smoothing is a weighted averaging, the histogram smoothing unit can also be realized by having the arithmetic operation circuit of a general-purpose processor perform the following product-sum operation. The arithmetic operation circuit multiplies the frequency stored in a register by the coefficient stored in a register according to a multiplication instruction.
- The product-sum value held in the dedicated product-sum result register is then extracted, the multiplication result and the extracted product-sum value are added, and the addition result is stored back in the dedicated product-sum result register.
- By repeating the above operation, the arithmetic operation circuit accumulates the running product-sum value in the dedicated product-sum result register.
- Finally, the accumulated value of the dedicated product-sum result register is transferred to a general-purpose register, the division is performed, and the result is used as the smoothed frequency.
- The three-dimensional coordinate generation unit 28 can be realized as a three-dimensional coordinate generation circuit with the following hardware configuration: a comparison circuit that compares the frequencies of the coordinates; a register circuit that stores the maximum frequency obtained by the comparison together with its coordinate; a ROM circuit that holds a plurality of depths; and a selector circuit that, according to the frequency sum, selectively outputs the appropriate depth from the plurality of depth values held in the ROM circuit.
- FIG. 32A shows depth correction for a spherical body.
- By extracting the pixel group matching the designated color from the photographed image on the left, the pixel group in the center is obtained. If an X-axis histogram and a Y-axis histogram are generated from this extracted pixel group, the area of the central cross line shown on the right can be derived from the maximum frequency.
- Although the central crosshair is used here as the depth reference, the portion corresponding to the surface of the sphere may instead be used as the depth reference.
- FIG. 32B shows the depth correction process for the pen tip.
- The left side shows the pen tip of the stylus, and the center shows the extracted image obtained from a photographed image of the pen tip.
- The right side shows the central cross line obtained from the maximum frequency of the smoothed histogram generated from this extracted pixel group.
- The shape of the pixel group at the pen tip is approximately an inverted triangle, and the central cross line generated from it has a long lower side. Since this lower side corresponds to the pen tip, the length len1 of the lower side is used for correcting Zm, the depth of the stylus. Furthermore, it is desirable to use the angle θ formed by the lower side of the central cross line and the right or left side.
- Alternatively, the depth may be determined by applying the frequency of Xm in the X-axis histogram and the frequency of Ym in the Y-axis histogram to continuously adaptive mean shift (CAMSHIFT). Since h(Xm) and h(Ym) determine the horizontal and vertical widths of the central crosshair, adopting h(Xm) and h(Ym) as those widths in step (1) of CAMSHIFT makes it possible to realize depth determination sophisticated enough to capture a part of the human body.
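A sketch of the tilt estimate based on the θ described for the pen tip, assuming `len1` is the length of the long lower side of the centre cross line, `len2` the length of its right (or left) side, and `nominal_deg` the angle those sides form when the stylus is held perpendicular (the parameter names are illustrative):

```python
import math

def pen_tilt(len1, len2, nominal_deg):
    """Tilt of the pen tip from the X-Y plane: the angle theta with
    tan(theta) = len2 / len1, compared against the pen tip's nominal
    angle when the stylus is untilted. Positive means tilted one way,
    negative the other."""
    theta = math.degrees(math.atan2(len2, len1))
    return theta - nominal_deg
```

Using the ratio of the two measured side lengths rather than absolute pixel counts makes the estimate independent of how near the pen tip is to the camera.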
- The system LSI may be configured by packaging the image processing unit 15 on a high-density substrate.
- A system LSI may also be a multi-chip module, in which a plurality of bare chips are mounted on a high-density substrate and packaged so that they have the same external structure as a single LSI.
- The architecture of the integrated circuit consists of: (1) a front-end processing unit, composed of a preprogrammed DMA master circuit and the like, that performs stream processing in general; (2) a signal processing unit, composed of a SIMD processor and the like, that performs signal processing in general; (3) a back-end unit that performs pixel processing, image superposition, resizing, image format conversion, and AV output processing in general; (4) a media interface that interfaces with drives and networks; and (5) a memory controller, a slave circuit for memory access, that reads and writes packets and data in response to requests from the front-end unit, the signal processing unit, and the back-end unit.
- As package types, a QFP (Quad Flat Package) or a PGA (Pin Grid Array) can be used.
- As components of the internal processing system, the following can be added: a conversion circuit that converts a frame image, pixel group, or histogram into a desired format; a cache memory that temporarily stores a data stream; a buffer memory that adjusts the data transfer speed; an initialization program that, at power-on, reads the necessary programs from ROM into RAM and performs initialization; a power control circuit that performs power control according to the state of the histogram; an MPU in the control unit; a program management unit that manages the programs corresponding to the components of the image processing device as task applications and schedules them according to their priorities; and an interrupt handler that generates interrupt signals in response to external events such as resets and power supply abnormalities.
- Furthermore, the LSI may be integrated with a video decoder, a rendering unit, and a platform unit.
- The programs shown in each embodiment can be created as follows. First, a software developer uses a programming language to write a source program that implements each flowchart and the functional components. In doing so, the developer describes the source program embodying each flowchart and the functional components using class structures, variables, array variables, and external function calls, in accordance with the syntax of the programming language.
- The written source program is given to a compiler as a file.
- The compiler translates the source program to generate an object program.
- Compilation consists of processes such as syntax analysis, optimization, resource allocation, and code generation.
- In syntax analysis, lexical analysis, parsing, and semantic analysis of the source program are performed, and the source program is converted into an intermediate program.
- In optimization, operations such as basic block formation, control flow analysis, and data flow analysis are performed on the intermediate program.
- In resource allocation, in order to adapt to the instruction set of the target processor, each variable in the intermediate program is allocated to a register or memory of the target processor.
- In code generation, each intermediate instruction in the intermediate program is converted into program code to obtain the object program.
- The object program generated here consists of one or more pieces of program code that cause a computer to execute the steps of the flowcharts shown in the embodiments and the individual procedures of the functional components.
- There are various kinds of program code, such as processor native code and JAVA (registered trademark) bytecode.
- When each step is realized by an external function, the call statement that calls the external function becomes the program code.
- Program code that realizes one step may belong to different object programs.
- Each step of a flowchart may also be realized by combining arithmetic operation instructions, logical operation instructions, branch instructions, and the like.
- When the object programs have been generated, the programmer activates a linker for them.
- The linker allocates these object programs and related library programs to a memory space and combines them into one to generate a load module.
- The load module generated in this way is premised on being read by a computer, and causes the computer to execute the processing procedures shown in each flowchart and the processing procedures of the functional components.
- Such a computer program may be recorded on a non-transitory computer-readable recording medium and provided to the user.
- The display device and the image processing device may be connected via a network.
- In this case, the image processing device receives a frame image from the camera of the display device via the network and generates three-dimensional coordinates. The generated three-dimensional coordinates are then output to the display device, causing an application executed on the display device to perform an operation triggered by those coordinates.
- A time code may be added to the three-dimensional coordinates generated by the image processing apparatus.
- The time code added to the three-dimensional coordinates specifies the playback time of the frame image on which the generation of those coordinates was based.
- With reference to the time code, the application can ignore three-dimensional coordinates generated from old frame images, or thin out some of the three-dimensional coordinates generated in bursts within a short period.
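A sketch of such time-code-based filtering. The `max_age` and `min_interval` parameters are illustrative; the patent only states that old coordinates may be ignored and bursts thinned:

```python
def filter_events(events, now, max_age, min_interval):
    """Drop coordinates whose time code is too old, and thin bursts so
    that consecutive kept events are at least min_interval apart.

    events: list of (time_code, (x, y, z)) pairs, sorted by time code.
    """
    kept, last = [], None
    for t, coord in events:
        if now - t > max_age:
            continue          # generated from an old frame image
        if last is not None and t - last < min_interval:
            continue          # part of a burst: thin it out
        kept.append((t, coord))
        last = t
    return kept
```

Stale events typically arise from network delay between the camera and the image processing device, while bursts arise when the capture target dwells in one spot across many consecutive frames.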
- The image processing device, integrated circuit, and image processing program according to the present invention make it possible to extract three-dimensional position information of a specific object from video with a small amount of calculation, and can be applied to systems that remotely control camera-equipped equipment.
Abstract
Description
The above problem can be overcome by an aspect of an image processing device that determines the depth of an object in real space by image processing, comprising: a generation unit that generates a histogram indicating the number of occurrences of pixels of a designated color in frame image data obtained by photographing the real space, in association with each of a plurality of coordinates on a reference axis of the screen; a smoothing unit that smooths the generated histogram; and a depth determination unit that selects, from among the plurality of occurrence counts shown in the smoothed histogram, the one associated with a specific coordinate and determines the depth of the object using the selected occurrence count. FIG. 1(a) shows the internal configuration of this basic aspect. Since in this aspect a histogram is taken and smoothed, the pixel counts shown in the smoothed histogram become the basis for three-dimensional coordinate generation. Because variations in the horizontal and vertical pixel counts of the pixel group are evened out, three-dimensional coordinate generation based on the number of pixels constituting the object is performed with high accuracy. This makes the peak value of the histogram easier to determine, so that three-dimensional coordinates can be generated well. Since the influence of the pixel group's shape differing from one captured image to another can be suppressed, tracking becomes easy.
This aspect adds the following modification to the basic aspect: the subordinate conceptualization that the specific coordinates include the coordinate on the reference axis that is associated with the maximum occurrence count in the smoothed histogram, and the coordinates on the reference axis that are associated with the occurrence counts of the second and subsequent ranks following the maximum in the smoothed histogram.
This aspect adds the following modification to the basic aspect. That is, the image processing device further comprises: a registration table in which two or more designated colors serving as the basis for histogram generation are registered in advance; and a histogram synthesis unit that, when a histogram has been generated for each of the two or more registered designated colors, adds together the occurrence counts at the same coordinate across the per-color histograms and obtains a composite histogram in which the addition result is the occurrence count for each coordinate. The coordinates subject to depth determination by the depth determination unit are those coordinates in the composite histogram whose occurrence count exceeds a predetermined threshold. With this subordinate conceptualization, the internal configuration of the above aspect becomes as shown in FIG. 1(b). The portions enclosed by emphasized lines in the figure indicate where this modification exists.
This aspect adds the following modification to the basic aspect. That is, one of the two or more designated colors is a specific color to which a specific weighting coefficient is assigned, and when the histogram synthesis unit generates the composite histogram, the occurrence count at each coordinate of the specific color's histogram is multiplied by the specific weighting coefficient before being added to the occurrence count at the same coordinate of the other designated colors' histograms.
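Translated to code, this weighted synthesis could look like the following sketch (the color names and the threshold value are illustrative):

```python
def synthesize_histograms(histograms, weights=None, threshold=0):
    """Add the per-coordinate occurrence counts of the per-colour
    histograms, multiplying a colour's counts by its weighting
    coefficient first, and report the coordinates whose combined
    count exceeds the threshold.

    histograms: {colour_name: list of counts, all the same length}
    weights:    {colour_name: weighting coefficient}, defaulting to 1.
    """
    weights = weights or {}
    names = list(histograms)
    length = len(histograms[names[0]])
    combined = [0.0] * length
    for name in names:
        w = weights.get(name, 1)
        for i, f in enumerate(histograms[name]):
            combined[i] += w * f
    candidates = [i for i, f in enumerate(combined) if f > threshold]
    return combined, candidates
```

Giving a larger weight to the most reliable color (for example the directly lit portion of the stylus) lets its evidence dominate the composite histogram without discarding the other colors entirely.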
This aspect adds the following modification to the basic aspect. That is, the object in the real space is an operation member bearing two or more designated colors; the image processing device comprises a rotation angle determination unit that determines the rotation angle of the operation member; the histograms generated by the generation unit for the two or more designated colors are located at mutually different coordinates on the reference axis; and the rotation angle determination unit determines the rotation angle of the operation member based on the difference, on the reference axis, between the maximum and minimum coordinates of each designated color's histogram. With this subordinate conceptualization, the internal configuration of the above aspect becomes as shown in FIG. 1(c).
This aspect adds the following modification to the basic aspect. That is, the designated color is specified by a combination of the luminance component and the color difference components constituting a pixel; the image processing device comprises a pixel group extraction unit that extracts, from among the plurality of pixels constituting the frame image data, the pixel group matching the combination of luminance component and color difference components, in association with a plurality of coordinates on the reference axis; and histogram generation by the generation unit is performed by associating the number of extracted pixels with the plurality of coordinates on the reference axis.
This aspect adds the following modification to the basic aspect. That is, the designated color is specified by a combination of the luminances of the plurality of primary color components constituting a pixel; the image processing device comprises a pixel group extraction unit that extracts, from among the plurality of pixels constituting the frame image data, the pixel group matching the combination of luminances of the plurality of primary color components, in association with a plurality of coordinates on the reference axis; and histogram generation by the generation unit is performed by associating the number of extracted pixels with the plurality of coordinates on the reference axis.
This aspect adds the following modification to the basic aspect. That is, the object in the real space is an operation member for operating a stereoscopic object that pops out of the screen by the stereoscopic effect of the stereoscopic device; the reference axis is the X axis or the Y axis in the frame image data; the depth determined by the depth determination unit becomes the Z coordinate of the three-dimensional coordinates of the operation member; and these three-dimensional coordinates are used to generate events that change the behavior of the stereoscopic object in the stereoscopic device. In this aspect, the approximate position of a user in front of the screen can be determined without using an optical distance measuring sensor, so when the device is incorporated in a display device, processing such as detecting whether the user is too close to the screen and issuing a warning if so can be realized.
This aspect adds the following modification to the basic aspect. That is, the smoothing is performed by convolving an impulse response with the occurrence count at each coordinate shown in the histogram. In impulse response convolution, the weighting coefficient is greatest at the central coordinate and decreases toward the periphery, so the histogram can be deformed into an ideal shape suitable for depth determination based on the histogram's area.
The following is a variation of the basic aspect: a pixel group extraction unit that extracts, from among the plurality of pixels constituting the frame image data, a pixel group having a specific designated color; a smoothing unit that smooths the pixel values of each pixel in the extracted pixel group; a generation unit that generates a histogram indicating the number of occurrences of designated-color pixels in the smoothed pixel group in association with each of a plurality of coordinates on the reference axis of the screen; and a depth determination unit that selects, from among the plurality of occurrence counts shown in the histogram, the one associated with a specific coordinate and determines the depth of the object using the selected occurrence count. With this subordinate conceptualization, the internal configuration of the above aspect becomes as shown in FIG. 1(d). The pixel changes in the extracted pixel group become smooth, and variations in the frequencies generated from it can be suppressed. Moreover, since a filter can be used to smooth the pixels constituting the pixel group, a low-cost implementation becomes possible.
A device used together with the image processing device is the following: a stereoscopic device comprising an execution unit that executes an application, a playback unit that plays back stereoscopic images according to instructions from the application, and an event manager that generates, in response to user operations, events indicating the depth values produced by the image processing device; the application changes the content of the stereoscopic video played back by the playback unit according to the generated events. Since the operability of virtually touching a stereoscopic object that pops out of the screen of the stereoscopic display device can be realized, the user can be given a sense of virtual reality as if present in a virtual space. The user is freed for a while from the troublesome matters of daily life, and can thereby gain vitality for tomorrow.
This aspect adds the following modification to the basic aspect of the stereoscopic device. That is, the stereoscopic image is composed of playback video of a multi-view video stream or of graphics drawn by an application, and the change in playback content includes switching the multi-view video stream to be played back and switching the graphics. This aspect further enriches interactivity with the user.
When solving the problem in the aspect of implementing an integrated circuit, the integrated circuit in that aspect is one that determines the depth of an object in real space by image processing, and it suffices for it to comprise: a generation unit that generates a histogram indicating the number of occurrences of designated-color pixels in frame image data obtained by photographing the real space, in association with each of a plurality of coordinates on a reference axis of the screen; a smoothing unit that smooths the generated histogram; and a depth determination unit that selects, from among the plurality of occurrence counts shown in the smoothed histogram, the one associated with a specific coordinate and determines the depth of the object using the selected occurrence count. Such an integrated circuit makes it possible to turn the image processing device into a component or module, extending the applications of the image processing device to the semiconductor component industry.
When solving the problem in the aspect of implementing a program, the program in that aspect is an image processing program that causes a computer to execute processing that determines the depth of an object in real space by image processing, and it suffices for it to cause the computer to execute: generation of a histogram indicating the number of occurrences of designated-color pixels in frame image data obtained by photographing the real space, in association with each of a plurality of coordinates on a reference axis of the screen; smoothing of the generated histogram; and depth determination that selects, from among the plurality of occurrence counts shown in the smoothed histogram, the one associated with a specific coordinate and determines the depth of the object using the selected occurrence count. Since the program can be distributed through network provider servers and various recording media, the applications of the image processing device can be extended to the general computer software and online service industries.
Embodiments of applied products employing the image processing device are described below. Forms of application include making the image processing device and the display device separate products, and incorporating the image processing device into the display device as a single product; the first embodiment takes the latter form. The display device is a stereoscopic display device, traded as a digital appliance such as a television, tablet terminal, or smartphone.
The second embodiment relates to an improvement that suitably handles the contrast of the capture target. When a stylus is chosen as the capture target, indoor lighting or outdoor sunlight causes a contrast of multiple colors in the stylus image within the captured frame image. This multi-color contrast includes the color of the portion hit by direct light, the color of the normally lit portion, and the color of the shadowed portion. Even if the molded or painted color of the stylus is a single color, the capture target exhibits contrast depending on how it is lit; therefore, in this embodiment, the color of the directly lit portion, the color of the normally lit portion, and the color of the shadowed portion are registered in the image processing device as a plurality of designated colors. Pixel groups corresponding to these designated colors are extracted, and for each extracted pixel group an X-axis histogram and a Y-axis histogram are generated. Histogram synthesis is then performed on the X-axis and Y-axis histograms generated for each designated color.
This embodiment adds components that calculate not only the Z coordinate of the stylus but also its rotation angle. FIG. 22 shows the internal configuration of the image processing unit 15 according to the third embodiment. This figure is drawn based on the internal configuration of FIG. 3 and differs from that base configuration in that new components are added. The new component added in FIG. 22 is a rotation angle generation unit 41 that acquires X and Y coordinates from the X-axis and Y-axis histograms and calculates the rotation angle of the stylus. This rotation angle generation unit 41 is the component corresponding to the rotation angle determination unit described above, and the rotation angle it generates becomes the basis for event generation by the event manager. The synthesis unit described in the previous embodiment and its related components are omitted from the description, since whether to include them is optional. Having described the internal configuration of the third embodiment, the stylus configuration specific to this embodiment is described next.
FIG. 24(c) shows the X-axis histogram and Y-axis histogram for each designated color, generated from the captured image of the sphere painted in separate colors.
This embodiment shows what kind of stylus is suited to rotation angle determination. A configuration suited to rotation angle determination attaches two spherical bodies to the grip end of the stylus. FIG. 26 shows the configuration of the stylus according to this embodiment. In FIG. 26(a), a sphere pair 103e, consisting of a sphere painted with designated color ma and a sphere painted with designated color mb, exists at the tip of the stylus grip. FIG. 26(b) schematically shows the stylus movements that can be captured in this embodiment. Arrows up2 and dw2 in the figure indicate a movement trajectory along the vertical direction of the screen, and arrow ro2 indicates clockwise and counterclockwise rotation of the stylus. The principle of rotation angle determination for this sphere pair is explained with reference to FIGS. 26(c) to (e).
The fifth embodiment discloses a form in which a light emitting element 101b is attached in combination with the camera in the stereoscopic display device. FIG. 28(a) is a perspective view depicting a display device 101, to which the pair of camera 101a and light emitting element 101b is attached, and a stylus 103. In (a), the sphere attached to the tip of the stylus is finished with a white diffusing member and reflects the color of the LED of the main body. FIG. 28(b) is a front view of the camera 101a and the light emitting element 101b as seen from the front of the display device. FIG. 28(c) is a see-through view of the display device, showing the configuration of the camera 101a and light emitting element 101b pair mounted inside the display device. When the stylus is at the position of (a), the sphere of the stylus diffuses the light emitted from the LED, and the diffused light enters the camera 101a. The surface color of the illuminated sphere is registered in the image processing device as a designated color, and the pixel group matching that registration is extracted to generate the captured image.
The stylus of the first embodiment was a 2-way stylus having a pen tip suited to touch panel operation and a spherical body suited to capture; with that design, however, the pen tip and the spherical body end up being captured at the same time, which has the drawback that false detection is likely. This embodiment therefore seeks to eliminate such false detection by improving the form of the stylus itself.
While the previous embodiments targeted three-dimensional coordinate generation for capturing, with the camera 101a, a stylus touch on a stereoscopic object, this embodiment is a form that tracks the movement of a stylus pen tracing the screen.
This embodiment realizes three-dimensional gestures by applying the capture of the movement of the capture target described in the previous embodiments. When the user draws a spiral in front of the display device, the trajectory of this movement is represented by a plurality of frame images, X-axis and Y-axis histograms are generated, and if three-dimensional information is generated on this basis, the three-dimensional shape the user has drawn in a single stroke can be converted into a three-dimensional coordinate model. FIG. 31(a) shows a spiral shape drawn by the user in space and the three-dimensional coordinates tr1, tr2, tr3, ... generated from the movement of the stylus.
The best embodiments known to the applicant at the time of filing this application have been described above, but further improvements and modifications can be made to the technical topics shown below. Note that whether to implement as shown in each embodiment or to apply these improvements and modifications is entirely optional and left to the discretion of the implementer.
The pop-out amount of a stereoscopic object varies with the screen size. Therefore, the weighting coefficients used in generating the histogram and in summing its frequencies, and the impulse response to be convolved, may be changed according to the pop-out amount of the stereoscopic object.
Although three-dimensional coordinates belonging to a specific spatial region were made the subject of touch events, all generated events may instead be converted into touch events and sent to the application. Which part of the stereoscopic object is the target of operation differs from application to application, so this approach enables more appropriate processing by the application. In this case, the processing corresponding to the event manager can be incorporated into the application.
The display device shown in the first embodiment may be configured as a stereoscopic theater system consisting of a recording medium playback device and a display device. In this case, only the display unit and the image processing unit are components of the display device, while the platform unit, heap memory, video decoder, rendering unit, video plane, graphics plane, and event manager are components of the playback device. The image processing device provided in the display device then outputs the three-dimensional coordinates to the playback device via the interface with the display device, and the playback device of the recording medium generates events and drives the application according to the three-dimensional coordinates thus received.
In the embodiments so far, smoothing was realized with a moving average filter, a Gaussian filter, or an impulse response filter, but smoothing is not limited to these; it suffices that the computation involves adding the frequencies of multiple coordinates and dividing by the number of coordinates. Other examples are as follows: smoothing by a kernel function, smoothing by local polynomial fitting, scatter-plot smoothing, smoothing by spline fitting, moving-line smoothing, Friedman's supersmoother, smoothing by a moving median, Tukey's moving-median smoothing, and end-point smoothing for the moving median.
The plurality of coordinates whose frequencies are counted in the histogram may be consecutive coordinates on the X axis or the Y axis, or coordinates thinned out at a fixed interval. By thinning out the coordinates to be counted, the number of histogram samples can be reduced, and the load of histogram generation can be lowered.
The frequencies forming the basis of depth determination were the frequency of Xm, the coordinate of maximum frequency, together with the frequencies of the immediately preceding coordinate Xm-1 and the immediately following coordinate Xm+1; instead of these, frequencies of coordinates separated from Xm by a fixed number may be used. Alternatively, the frequency of Xm itself may be omitted and only the frequencies of neighboring values, such as those of Xm-1 and Xm+1, used as the basis of depth determination. Likewise, the frequencies forming the basis of depth determination were the frequency of Ym, the coordinate of maximum frequency, together with the frequencies of Ym-1 and Ym+1; instead of these, frequencies of coordinates separated from Ym by a fixed number may be used, or the frequency of Ym may be omitted and only the frequencies of neighboring values, such as those of Ym-1 and Ym+1, used.
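The baseline variant described above (peak coordinate plus its immediate neighbours) can be sketched as follows; `depth_basis` is a hypothetical helper name, and the histogram is assumed to be a list indexed by coordinate:

```python
def depth_basis(hist):
    """Return the peak coordinate Xm and the frequencies used as the basis
    of depth determination: those of Xm-1, Xm, and Xm+1 (clipped at the
    histogram ends)."""
    xm = max(range(len(hist)), key=lambda i: hist[i])   # coordinate of max frequency
    neighbours = [i for i in (xm - 1, xm, xm + 1) if 0 <= i < len(hist)]
    return xm, [hist[i] for i in neighbours]
```

The modifications in the paragraph amount to changing which offsets from `xm` are collected, e.g. `(xm - k, xm + k)` for a fixed `k`, or dropping `xm` itself.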
The multiview video stream was assumed to be in frame-sequential format, but it may instead be in vertical line-interleaved, horizontal line-interleaved, top-down, side-by-side, field-sequential, or block-interleaved format. In the block-interleaved format, for example, the base-view video and the additional-view video are arranged alternately, block by block, into the blocks of a mixed video, thereby composing the mixed video.
The object of depth determination may be a part of the human body. In that case, a Hue-Saturation-Value color system such as that described in Non-Patent Literature 1 may be registered in the image processing device as the lookup table of designated colors, and pixel group extraction performed accordingly. When the object is the human body, however, the hands, arms, face, and neck all appear in the frame image; with a lookup table of the kind described above, the pixel groups of the hands, arms, face, and neck would all be extracted, and an appropriate depth might not be determined. Therefore, the light-emitting element 101b shown in the first embodiment is used to illuminate only the body part whose depth is to be determined, and the lookup table registers the combination of luminance Y, red color difference Cr, and blue color difference Cb components of that part in its illuminated state. Doing so allows the target of depth determination to be narrowed down to, for example, the user's fingertip.
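The lookup-table extraction step above can be sketched minimally as set membership on (Y, Cr, Cb) triples. This is an illustrative model only: the data layout (a dict from coordinates to component triples) and the function name are assumptions, and a hardware lookup table would of course use bit-level comparison rather than Python sets.

```python
def extract_pixel_group(frame, lookup):
    """frame: dict mapping (x, y) -> (y_luma, cr, cb) for each pixel.
    lookup: set of registered (y_luma, cr, cb) triples describing the
    illuminated body part.  Returns the coordinates of matching pixels."""
    return [(x, y) for (x, y), color in frame.items() if color in lookup]
```

Only the illuminated part's component combination is registered, so skin pixels that are not lit by the LED fall outside the table and are not extracted.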
The contrast pattern formed on the surface of the stylus may be registered as a plurality of designated colors, X-axis and Y-axis histograms generated according to these designated colors, and the histograms made the subject of histogram synthesis. Possible contrast patterns include a checker pattern, a rainbow pattern, a lattice pattern, a barcode, and a QR Code (registered trademark). By registering such combinations of various colors in the lookup table as designated colors, the capture target can be properly distinguished from objects of similar color.
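The histogram synthesis referred to here (and elaborated in claims 3 and 4, where one color may carry a weighting coefficient) can be sketched as a weighted per-coordinate sum. The function name and argument shapes are assumptions for illustration:

```python
def synthesize_histograms(histograms, weights=None):
    """histograms: dict mapping color name -> list of per-coordinate
    frequencies (all lists the same length).  weights: optional dict
    mapping color name -> weighting coefficient; absent colors get 1.
    Returns the composite histogram of weighted per-coordinate sums."""
    weights = weights or {}
    length = len(next(iter(histograms.values())))
    composite = [0.0] * length
    for color, hist in histograms.items():
        w = weights.get(color, 1)          # specific color may be weighted
        for i, count in enumerate(hist):
            composite[i] += w * count      # add counts at the same coordinate
    return composite
```

Coordinates whose composite count exceeds a threshold would then be the ones submitted to depth determination, as in claim 3.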
The various formulas described in the specification and drawings of the present application do not signify mathematical concepts but rather numerical operations executed on a computer; naturally, the modifications necessary for execution by a computer may be applied. For example, saturation arithmetic or positive-value conversion may be applied so that values can be handled as integer, fixed-point, or floating-point types. Furthermore, the formula-based arithmetic and calculation processing shown in each embodiment can be realized by a ROM multiplier using a constant ROM. In the constant ROM, the products of the multiplicand and the constant are computed in advance and stored. For example, if the multiplicand is 16 bits long, it is divided into four 4-bit slices, and the products of each 4-bit slice with the constant, that is, the multiples 0 to 15 of the constant, are stored in the constant ROM. The product of one 4-bit slice and the 16-bit constant is 20 bits long, and since the four constants are stored at the same address, one word is 20 × 4 = 80 bits long. Because realization with a ROM multiplier is possible in this way, the "calculation processing" and "arithmetic processing" referred to in this specification mean not only pure arithmetic operations but also reads from a recording medium, in which an operation result stored in a ROM or other recording medium is read out according to the values of the operands.
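The ROM-multiplier scheme above can be modelled in software: precompute the 16 multiples of the constant once (the "ROM" contents), then form the full product from four table lookups, shifts, and adds. The constant value 0x1234 below is an arbitrary example, not one from the specification.

```python
CONSTANT = 0x1234  # example fixed multiplier baked into the "constant ROM"

# Precompute the ROM: multiples 0..15 of the constant, one entry per
# possible value of a 4-bit slice of the multiplicand.
ROM = [CONSTANT * n for n in range(16)]

def rom_multiply(multiplicand):
    """Multiply a 16-bit multiplicand by CONSTANT using only table lookups,
    shifts, and adds, mirroring the ROM-multiplier described above."""
    result = 0
    for slice_index in range(4):                    # four 4-bit slices
        nibble = (multiplicand >> (4 * slice_index)) & 0xF
        result += ROM[nibble] << (4 * slice_index)  # lookup, shift into place, add
    return result
```

This is exactly the sense in which "calculation" can mean reading a stored result: no multiplication is performed at run time, only indexed reads of precomputed products.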
The histogram generation unit 24 can be realized as a histogram generation circuit having the following hardware configuration: a comparator circuit that compares the gradation bit values of the luminance Y, red color difference Cr, and blue color difference Cb composing a pixel with the gradation bit values of luminance Y, red color difference Cr, and blue color difference Cb listed in the lookup table of designated colors; a counter circuit that increments a count value when the comparator determines a match; and a write circuit that writes the pixel's coordinate, together with that count value, to memory.
In depth determination, it is desirable to apply depth correction according to the shape of the capture target. Fig. 32(a) shows depth correction for a sphere. Pixel group extraction based on the designated color is applied to the captured image on the left, yielding the pixel group in the middle. Generating an X-axis histogram and a Y-axis histogram from this extracted pixel group makes it possible to derive, from their maximum frequencies, the area of the center crosshair shown on the right. This center crosshair was used as the depth reference, but the portion corresponding to the surface of the sphere may be used as the depth reference instead. Fig. 32(b) shows the depth correction process for a pen tip. The left side shows the pen tip of the stylus, the middle shows the image extracted from a captured image of this pen tip, and the right side shows the center crosshair obtained from the maximum frequencies of the smoothed histograms generated from this extracted pixel group. As the center crosshair indicates, the shape of the pen-tip pixel group is roughly an inverted triangle, so the center crosshair generated from it also has a long bottom edge. Since this bottom edge corresponds to the pen tip, its length len1 is used in correcting Zm, the depth of the stylus. It is further desirable to use the angle θ formed by the bottom edge of the center crosshair and its right or left edge. This angle satisfies tanθ = len2/len1, so by using the ratio of len2 to len1 to obtain θ and comparing this θ with the pen tip's intrinsic angle, one can tell how far the pen tip is inclined from the X-Y plane. Using this θ in depth determination makes operation with the pen tip more intuitive.
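The tilt estimate above reduces to a small computation: recover the observed angle from tanθ = len2/len1 and compare it with the pen tip's intrinsic angle. The helper name and the default intrinsic angle of 60 degrees are assumptions for illustration.

```python
import math

def pen_tilt_degrees(len1, len2, intrinsic_deg=60.0):
    """len1: bottom edge of the center crosshair; len2: right/left edge.
    Returns how far the observed pen-tip angle deviates from the pen's
    intrinsic angle (in degrees), i.e. its tilt from the X-Y plane."""
    observed = math.degrees(math.atan2(len2, len1))  # tan(theta) = len2/len1
    return observed - intrinsic_deg
```

A return value of 0 means the pen tip is seen at its intrinsic angle; positive or negative values indicate tilt away from the X-Y plane, which can then feed the Zm correction.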
The frequency of Xm in the X-axis histogram and the frequency of Ym in the Y-axis histogram may be applied to the Continuously Adaptive Mean shift (CAMSHIFT) algorithm to determine the depth. Since h(Xm) and h(Ym) determine the width and height of the center crosshair, adopting h(Xm) and h(Ym) as the width and height in step 1 of CAMSHIFT as described above enables advanced depth determination capable of capturing a part of the human body.
Since the image processing unit 15 is a semiconductor integrated circuit to be incorporated in the display device, a system LSI may be configured by packaging the image processing unit 15 on a high-density substrate. A system LSI is made by mounting a plurality of bare chips on a high-density substrate and packaging them so that the bare chips take on the outward structure of a single LSI; it is also called a multi-chip module. The architecture of the integrated circuit consists of: (1) a front-end processing unit, composed of pre-programmed DMA master circuits and the like, which executes stream processing in general; (2) a signal processing unit, composed of SIMD processors and the like, which executes signal processing in general; (3) a back-end unit, which performs pixel processing, image superimposition, resizing, image format conversion, and AV output processing in general; (4) a media interface, which interfaces with drives and networks; and (5) a memory controller, a slave circuit for memory access, which reads and writes packets and data in response to requests from the front-end unit, the signal processing unit, and the back-end unit. As to package type, system LSIs come in types such as QFP (Quad Flat Package) and PGA (Pin Grid Array). A QFP is a system LSI with pins attached to the four sides of the package; a PGA is a system LSI with many pins attached to the entire bottom surface.
The programs shown in each embodiment can be created as follows. First, the software developer writes, in a programming language, a source program that implements the flowcharts and the functional constituent elements. In doing so, following the syntax of the programming language, the developer uses class structures, variables, array variables, and calls to external functions to write a source program embodying the flowcharts and the functional constituent elements.
The display device and the image processing device may be connected via a network. In that case, the image processing device receives frame images from the camera of the display device over the network and performs three-dimensional coordinate generation. It then outputs the generated three-dimensional coordinates to the display device and causes the application running on the display device to perform operations triggered by those coordinates. A time code may also be added to the three-dimensional coordinates generated by the image processing device; this time code identifies the playback time of the frame image on which the three-dimensional coordinate generation was based. By referring to the time code, the application can ignore three-dimensional coordinates generated from old frame images, or thin out some of a plurality of three-dimensional coordinates that occurred in a burst within a short period.
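The two time-code filters mentioned above, ignoring stale coordinates and thinning bursts, can be sketched together; the function name, sample layout, and parameters are assumptions for illustration:

```python
def filter_coordinates(samples, min_timecode, min_gap):
    """samples: list of (timecode, (x, y, z)), ascending by timecode.
    Drop samples older than min_timecode, and thin out bursts so that
    consecutively kept samples are at least min_gap apart in time."""
    kept = []
    last = None
    for tc, coord in samples:
        if tc < min_timecode:
            continue                       # ignore coordinates from old frames
        if last is not None and tc - last < min_gap:
            continue                       # thin out a burst of coordinates
        kept.append((tc, coord))
        last = tc
    return kept
```

An application receiving coordinates over the network could run each incoming batch through such a filter before dispatching touch events.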
24 histogram generation unit
26 histogram smoothing unit
28 three-dimensional coordinate generation unit
Claims (14)
- An image processing device that determines, by image processing, the depth of an object in real space, the device comprising:
a generation unit that generates a histogram indicating the number of occurrences of pixels of a designated color in frame image data obtained by filming the real space, in association with each of a plurality of coordinates on a reference axis of the screen;
a smoothing unit that smooths the generated histogram; and
a depth determination unit that selects, from among the plurality of occurrence counts indicated in the smoothed histogram, those associated with specific coordinates, and determines the depth of the object using the selected occurrence counts.
- The image processing device according to claim 1, wherein
the specific coordinates include the coordinate on the reference axis that is associated with the maximum occurrence count in the smoothed histogram, and the coordinates on the reference axis that are associated with the occurrence counts of the ranks following the maximum in the smoothed histogram.
- The image processing device according to claim 1, further comprising:
a registration table in which two or more designated colors serving as the basis of histogram generation are registered in advance; and
a histogram synthesis unit that, when a histogram has been generated for each of the two or more registered designated colors, adds together the occurrence counts of the per-color histograms coordinate by coordinate, counts located at the same coordinate being added to each other, and obtains a composite histogram in which the result of each addition is the occurrence count for the corresponding coordinate,
wherein the coordinates subject to depth determination by the depth determination unit are the coordinates whose occurrence counts in the composite histogram exceed a predetermined threshold.
- The image processing device according to claim 3, wherein
one of the two or more designated colors is a specific color to which a specific weighting coefficient is assigned, and
when the histogram synthesis unit generates the composite histogram, the per-coordinate occurrence counts of the histogram of the specific color are multiplied by the specific weighting coefficient before being added to the occurrence counts at the same coordinates in the histograms of the other designated colors.
- The image processing device according to claim 1, wherein
the object in the real space is an operating member bearing two or more designated colors, and the image processing device comprises a rotation angle determination unit that determines the rotation angle of the operating member,
the histograms generated by the generation unit for the two or more designated colors are located at mutually different coordinates on the reference axis, and
the rotation angle determination unit determines the rotation angle of the operating member based on the difference indicating how far apart on the reference axis the maximum coordinate and the minimum coordinate in the histogram of each designated color are.
- The image processing device according to claim 1, wherein
the designated color is specified by a combination of the luminance component and the color difference components composing a pixel,
the image processing device comprises a pixel group extraction unit that extracts, in association with the plurality of coordinates on the reference axis, the pixel group consisting of those pixels of the frame image data that match the combination of the luminance component and the color difference components, and
the histogram generation by the generation unit is performed by associating the number of extracted pixels with the plurality of coordinates on the reference axis.
- The image processing device according to claim 1, wherein
the designated color is specified by a combination of the luminances of a plurality of primary color components composing a pixel,
the image processing device comprises a pixel group extraction unit that extracts, in association with the plurality of coordinates on the reference axis, the pixel group consisting of those pixels of the frame image data that match the combination of the luminances of the plurality of primary color components, and
the histogram generation by the generation unit is performed by associating the number of extracted pixels with the plurality of coordinates on the reference axis.
- The image processing device according to claim 1, wherein
the object in the real space is an operating member for operating a stereoscopic object that pops out of the screen through the stereoscopic effect of a stereoscopic device,
the reference axis is the X axis or the Y axis of the frame image data, and
the depth determined by the depth determination unit becomes the Z coordinate of the three-dimensional coordinates formed by the operating member, the three-dimensional coordinates being used in the stereoscopic device to generate events that change the behavior of the stereoscopic object.
- The image processing device according to claim 1, wherein
the smoothing is performed by convolving an impulse response with the per-coordinate occurrence counts indicated in the histogram.
- An image processing device comprising:
a pixel group extraction unit that extracts, from the plurality of pixels composing frame image data, a pixel group consisting of pixels having a specific designated color;
a smoothing unit that smooths the pixel value of each pixel in the extracted pixel group;
a generation unit that generates a histogram indicating the number of occurrences of pixels of the designated color in the smoothed pixel group, in association with each of a plurality of coordinates on a reference axis of the screen; and
a depth determination unit that selects, from among the plurality of occurrence counts indicated in the smoothed histogram, those associated with specific coordinates, and determines the depth of the object using the selected occurrence counts.
- A stereoscopic device used together with an image processing device, comprising:
an execution unit that executes an application;
a playback unit that plays back stereoscopic images in accordance with instructions from the application; and
an event manager that generates, in response to user operation, events indicating the depth values produced by the image processing device,
wherein the application changes the content of the stereoscopic video played back by the playback unit in response to the generated events.
- The stereoscopic device according to claim 11, wherein
the stereoscopic images are composed of playback video of a multiview video stream, or of graphics rendered by the application, and
the change of playback content includes switching the multiview video stream to be played back and switching the graphics.
- An integrated circuit that determines, by image processing, the depth of an object in real space, comprising:
a generation unit that generates a histogram indicating the number of occurrences of pixels of a designated color in frame image data obtained by filming the real space, in association with each of a plurality of coordinates on a reference axis of the screen;
a smoothing unit that smooths the generated histogram; and
a depth determination unit that selects, from among the plurality of occurrence counts indicated in the smoothed histogram, those associated with specific coordinates, and determines the depth of the object using the selected occurrence counts.
- An image processing program that causes a computer to execute processing for determining, by image processing, the depth of an object in real space, the program causing the computer to execute:
generation of a histogram indicating the number of occurrences of pixels of a designated color in frame image data obtained by filming the real space, in association with each of a plurality of coordinates on a reference axis of the screen;
smoothing of the generated histogram; and
depth determination that selects, from among the plurality of occurrence counts indicated in the smoothed histogram, those associated with specific coordinates, and determines the depth of the object using the selected occurrence counts.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/003,084 US9754357B2 (en) | 2012-03-23 | 2013-03-05 | Image processing device, stereoscoopic device, integrated circuit, and program for determining depth of object in real space generating histogram from image obtained by filming real space and performing smoothing of histogram |
JP2013544596A JP6100698B2 (ja) | Image processing device, stereoscopic device, integrated circuit, and program for determining the depth of an object in real space by performing image processing |
CN201380001048.8A CN103503030B (zh) | Image processing device, stereoscopic vision device, and integrated circuit for determining the depth of an object located in real space by performing image processing |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012066612 | 2012-03-23 | ||
JP2012-066612 | 2012-03-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013140776A1 true WO2013140776A1 (ja) | 2013-09-26 |
Family
ID=49222257
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/001802 WO2013140776A1 (ja) | 2013-03-15 | Image processing device, stereoscopic device, integrated circuit, and program for determining the depth of an object in real space by performing image processing |
Country Status (4)
Country | Link |
---|---|
US (1) | US9754357B2 (ja) |
JP (1) | JP6100698B2 (ja) |
CN (1) | CN103503030B (ja) |
WO (1) | WO2013140776A1 (ja) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016143199A (ja) * | 2015-01-30 | 2016-08-08 | 富士通株式会社 | Input device, input method, computer program for input processing, and input system |
CN106097309A (zh) * | 2016-05-30 | 2016-11-09 | 余同立 | Method and system for processing position information of intelligently displayed visual images |
US9984281B2 (en) | 2015-01-29 | 2018-05-29 | Panasonic Intellectual Property Management Co., Ltd. | Image processing apparatus, stylus, and image processing method |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20140046327A (ko) * | 2012-10-10 | 2014-04-18 | 삼성전자주식회사 | Multi display apparatus, input pen, method of controlling a multi display apparatus, and multi display system |
US10475226B2 (en) | 2013-03-15 | 2019-11-12 | Crayola Llc | Coloring kit for capturing and animating two-dimensional colored creation |
US9946448B2 (en) | 2013-03-15 | 2018-04-17 | Crayola Llc | Coloring kit for capturing and animating two-dimensional colored creation |
US20140267425A1 (en) * | 2013-03-15 | 2014-09-18 | Crayola Llc | Personalized Digital Animation Kit |
JP6510213B2 (ja) * | 2014-02-18 | 2019-05-08 | Panasonic Intellectual Property Corporation of America | Projection system, semiconductor integrated circuit, and image correction method |
CN105303586B (zh) * | 2014-07-16 | 2018-05-11 | 深圳Tcl新技术有限公司 | Method for obtaining a histogram, method for dynamic luminance adjustment, and image processing device |
CA2959389A1 (en) | 2014-09-02 | 2016-03-10 | Ab Initio Technology Llc | Compilation of graph-based program specifications with automated clustering of graph components based on the identification of particular data port connections |
DE112015003587T5 (de) * | 2014-09-02 | 2017-05-11 | Ab Initio Technology Llc | Spezifizieren von komponenten in graphbasierten programmen |
US9467279B2 (en) * | 2014-09-26 | 2016-10-11 | Intel Corporation | Instructions and logic to provide SIMD SM4 cryptographic block cipher functionality |
US9805662B2 (en) * | 2015-03-23 | 2017-10-31 | Intel Corporation | Content adaptive backlight power saving technology |
US9430688B1 (en) * | 2015-03-25 | 2016-08-30 | The Boeing Company | Overlapping multi-signal classification |
US10353946B2 (en) * | 2017-01-18 | 2019-07-16 | Fyusion, Inc. | Client-server communication for live search using multi-view digital media representations |
JP6969149B2 (ja) * | 2017-05-10 | 2021-11-24 | 富士フイルムビジネスイノベーション株式会社 | Three-dimensional shape data editing device and three-dimensional shape data editing program |
WO2019161562A1 (en) * | 2018-02-26 | 2019-08-29 | Intel Corporation | Object detection with image background subtracted |
US10733800B2 (en) | 2018-09-17 | 2020-08-04 | Facebook Technologies, Llc | Reconstruction of essential visual cues in mixed reality applications |
CN109615659B (zh) * | 2018-11-05 | 2023-05-05 | 成都西纬科技有限公司 | Camera parameter obtaining method and device for a vehicle-mounted multi-camera surround view system |
CN109933195B (zh) * | 2019-03-06 | 2022-04-22 | 广州世峰数字科技有限公司 | Stereoscopic interface display method and interaction system based on MR mixed reality technology |
US20240081636A1 (en) * | 2021-02-22 | 2024-03-14 | Northeastern University | Method for Visual Function Assessment Using Multistable Rivalry Paradigms |
CN112907745B (zh) * | 2021-03-23 | 2022-04-01 | 北京三快在线科技有限公司 | Digital orthophoto map generation method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH06131442A (ja) * | 1992-10-19 | 1994-05-13 | Mazda Motor Corp | Three-dimensional virtual image forming device |
JPH07200158A (ja) * | 1993-12-07 | 1995-08-04 | At & T Corp | Stylus positioning device and method using a one-dimensional image sensor |
JP2006171929A (ja) * | 2004-12-14 | 2006-06-29 | Honda Motor Co Ltd | Face region estimation device, face region estimation method, and face region estimation program |
JP2006185109A (ja) * | 2004-12-27 | 2006-07-13 | Hitachi Ltd | Image measurement device and image measurement method |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4991092A (en) * | 1988-08-12 | 1991-02-05 | The Regents Of The University Of California | Image processor for enhancing contrast between subregions of a region of interest |
WO1993015376A1 (en) * | 1992-01-30 | 1993-08-05 | Fujitsu Limited | System for recognizing and tracking target mark, and method therefor |
JP2647033B2 (ja) * | 1994-11-24 | 1997-08-27 | 日本電気株式会社 | Lookup table creation method and lookup table creation device |
US6795068B1 (en) | 2000-07-21 | 2004-09-21 | Sony Computer Entertainment Inc. | Prop input device and method for mapping an object from a two-dimensional camera image to a three-dimensional space for controlling action in a game program |
US20020176001A1 (en) * | 2001-05-11 | 2002-11-28 | Miroslav Trajkovic | Object tracking based on color distribution |
JP2005031045A (ja) * | 2003-07-11 | 2005-02-03 | Olympus Corp | Information presentation device and information presentation system |
US20080018668A1 (en) * | 2004-07-23 | 2008-01-24 | Masaki Yamauchi | Image Processing Device and Image Processing Method |
US7683950B2 (en) * | 2005-04-26 | 2010-03-23 | Eastman Kodak Company | Method and apparatus for correcting a channel dependent color aberration in a digital image |
US9891435B2 (en) * | 2006-11-02 | 2018-02-13 | Sensics, Inc. | Apparatus, systems and methods for providing motion tracking using a personal viewing device |
US8103102B2 (en) * | 2006-12-13 | 2012-01-24 | Adobe Systems Incorporated | Robust feature extraction for color and grayscale images |
CN101622869B (zh) * | 2007-12-18 | 2012-03-07 | 松下电器产业株式会社 | Image reproduction device, image reproduction method, and image reproduction program |
US8525900B2 (en) * | 2009-04-23 | 2013-09-03 | Csr Technology Inc. | Multiple exposure high dynamic range image capture |
US8340420B2 (en) * | 2009-10-05 | 2012-12-25 | National Taiwan University | Method for recognizing objects in images |
US8199165B2 (en) * | 2009-10-14 | 2012-06-12 | Hewlett-Packard Development Company, L.P. | Methods and systems for object segmentation in digital images |
US9646453B2 (en) * | 2011-12-23 | 2017-05-09 | Bally Gaming, Inc. | Integrating three-dimensional and two-dimensional gaming elements |
2013
- 2013-03-05: US application US14/003,084, patent US9754357B2 (en), status: not active, Expired - Fee Related
- 2013-03-15: JP application JP2013544596A, patent JP6100698B2 (ja), status: not active, Expired - Fee Related
- 2013-03-15: WO application PCT/JP2013/001802, publication WO2013140776A1 (ja), status: active, Application Filing
- 2013-03-15: CN application CN201380001048.8A, patent CN103503030B (zh), status: not active, Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
CN103503030B (zh) | 2017-02-22 |
CN103503030A (zh) | 2014-01-08 |
JPWO2013140776A1 (ja) | 2015-08-03 |
US20140071251A1 (en) | 2014-03-13 |
JP6100698B2 (ja) | 2017-03-22 |
US9754357B2 (en) | 2017-09-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 14003084 Country of ref document: US |
|
ENP | Entry into the national phase |
Ref document number: 2013544596 Country of ref document: JP Kind code of ref document: A |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 13764307 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 13764307 Country of ref document: EP Kind code of ref document: A1 |