WO2023037451A1 - Image processing device, method, and program - Google Patents

Image processing device, method, and program Download PDF

Info

Publication number
WO2023037451A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
unit
processing target
area
pixels
Prior art date
Application number
PCT/JP2021/033030
Other languages
French (fr)
Japanese (ja)
Inventor
由実 菊地
卓 佐野
正人 小野
真二 深津
Original Assignee
Nippon Telegraph and Telephone Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corporation
Priority to JP2023546630A priority Critical patent/JPWO2023037451A1/ja
Priority to PCT/JP2021/033030 priority patent/WO2023037451A1/en
Publication of WO2023037451A1 publication Critical patent/WO2023037451A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation

Definitions

  • the embodiments of the present invention relate to an image processing device, method and program.
  • when reconstructing 3D content from content captured as monocular images or stereo images, the presence of transparent objects in the content poses a problem.
  • for example, objects through which another subject located farther away than the original subject can be seen, such as glass, smoke used in stage performances, or fog that appears depending on the weather (hereinafter sometimes referred to as transparent objects), are included in many contents.
  • the present invention has been made in view of the above circumstances, and an object of the present invention is to provide an image processing apparatus, method, and program capable of improving quality when an image includes a translucent object.
  • An image processing apparatus includes an extracting unit that extracts one or more processing target regions based on characteristic elements of pixels of a moving image; a feature quantity measuring unit that measures a feature quantity of pixels in a region obtained by tracking the extracted processing target region in consecutive frames of the moving image; and a mask generation unit that, when the average of the feature quantities measured by the feature quantity measuring unit is within a predetermined range, generates information indicating the shape and position of the processing target region extracted by the extracting unit as mask information of the image.
  • An image processing method is a method performed by an image processing apparatus that includes an extracting unit that extracts one or more processing target regions based on characteristic elements of pixels of a moving image; a feature quantity measuring unit that measures a feature quantity of pixels in a region obtained by tracking the extracted processing target region in consecutive frames of the moving image; and a mask generation unit that, when the average of the feature quantities measured by the feature quantity measuring unit is within a predetermined range, generates information indicating the shape and position of the processing target region extracted by the extracting unit as mask information of the image.
  • FIG. 1 is a diagram showing an application example of an image processing system according to an embodiment of the present invention.
  • FIG. 2 is a flow chart showing an example of the processing operation by the translucent area extracting unit.
  • FIG. 3 is a diagram showing an example of a semi-transparent area extracted from an image by a semi-transparent area extraction unit.
  • FIG. 4 is a flow chart showing an example of the processing operation of the feature amount measuring unit.
  • FIG. 5 is a flow chart showing an example of the processing operation of the original image acquisition unit.
  • FIG. 6 is a flowchart illustrating an example of the processing operation of the feature quantity comparison unit.
  • FIG. 7 is a flowchart illustrating an example of a processing operation of a mask generation unit;
  • FIG. 8 is a block diagram showing an example of the hardware configuration of the translucent area processing device of the image processing system according to one embodiment of the present invention.
  • FIG. 1 is a diagram showing an application example of an image processing system according to one embodiment of the present invention.
  • the image processing system according to one embodiment of the present invention has a translucent area processing device (image processing device) 100 and a 3D image processing device 120.
  • the semi-transparent area processing device 100 has a semi-transparent area extraction unit 11, a feature amount measurement unit 12, an original image acquisition unit 13, a feature amount comparison unit 14, and a mask generation unit 15.
  • the 3D image processing device 120 also has a depth map generation unit 21 and a flick management unit 22.
  • a semi-transparent area processing device 100 extracts a semi-transparent area in an image, such as fog or smoke in a stage production, and optimizes depth information for this semi-transparent area.
  • the 3D image processing device 120 may be integrated with the translucent region processing device 100 .
  • the semi-transparent area extraction unit 11 of the semi-transparent area processing device 100 extracts a semi-transparent area in the image and processes the information of this area.
  • in the following description, it is assumed that the extracted translucent area is a smoke area.
  • FIG. 2 is a flow chart showing an example of processing operations by a semi-transparent region extraction unit. Since the color of smoke varies depending on conditions such as stage lighting and environment, a case where the color of smoke is whitish will be described here as an example.
  • for an arbitrary frame in the moving image, the semi-transparent region extracting unit 11 examines the values of pixels in whitish regions of the image, and holds the n regions in which groups of pixels whose values fall within a specified value range U exist as area A1, area A2, area A3, ..., area An (S11). Examples of the value range U are shown in (1) and (2) below. (1) (For RGB) 240 < R ≤ 255, 240 < G ≤ 255, 240 < B ≤ 255 (2) (For YUV (YCbCr)) 240 < Y ≤ 255, -10 < Cb ≤ 0, -10 < Cr ≤ 0
  • the value range U is not particularly limited as long as it is pixel information obtained from the image and the characteristics of each pixel can be accurately extracted; it may be hue information or luminance information. Also, the value range U may be freely changed by an image editor based on empirical rules that capture the characteristics of each image, or an optimum solution may be set by a computer (including machine learning).
  • FIG. 3 is a diagram showing an example of a semi-transparent area extracted from an image by a semi-transparent area extraction unit. The example shown in FIG. 3 shows that regions A1, A2, and A3 in image G1 are extracted as regions corresponding to whitish smoke.
  • next, the translucent area extraction unit 11 holds the pixel values In of the representative points X1, X2, X3, ..., Xn within area A1, area A2, area A3, ..., area An (S12).
  • the representative point may be the barycentric point of the area An or the point where the (x, y) coordinates of the area An are the minimum values.
  • next, the translucent area extracting unit 11 tracks area A1, area A2, area A3, ..., area An in consecutive frames of the moving image (S13).
  • the semi-transparent region extracting unit 11 passes the region information of the tracked region to the feature quantity measuring unit 12 (S14).
  • This area information includes shape information and position information of the area. The processing procedure of the feature amount measuring unit 12 will be described later.
  • the semi-transparent area extraction unit 11 determines whether or not the average pixel value Kn, which will be described later, returned from the feature amount measurement unit 12 is within the value range U described above (S15). If the average pixel value Kn is not within the value range U (No in S15), the translucent area extraction unit 11 deletes (discards) the area An (S17), and the process for the area An ends.
  • on the other hand, if the average pixel value Kn is within the value range U (Yes in S15), the translucent area extraction unit 11 passes the area information on the area An to the mask generation unit 15 (S16) and returns to the processing of S11.
  • the area information of area An includes shape information MAn of area An and position information NAn of area An. The processing operation of the mask generation unit 15 will be described later.
  • FIG. 4 is a flow chart showing an example of the processing operation of the feature amount measuring unit.
  • the feature amount measuring unit 12 continues to measure all pixel values within the area An in the information passed from the translucent area extracting unit 11 for each frame (S21).
  • the feature amount measurement unit 12 calculates the average value of all pixel values measured in S21 and holds this average value as the average pixel value Kn (S22).
  • the feature amount measuring unit 12 returns the average pixel value Kn of the area An, which was held in S22, to the translucent area extracting unit 11 (S23).
  • the group consisting of the translucent region extracting section 11 and the feature quantity measuring section 12 and the group consisting of the original image obtaining section 13 and the feature quantity comparing section 14 may operate independently of each other, or both groups may exchange information and operate while complementing each other.
  • FIG. 5 is a flow chart showing an example of the processing operation of the original image acquisition unit.
  • first, based on the shooting plan, the original image acquiring unit 13 captures in advance, using an imaging device (not shown), images showing the range to be shot (S31).
  • the original image acquisition unit 13 analyzes each image captured in S31, and stores a set W of feature values of the pixels of each analyzed image in an internal memory (S32).
  • the feature set W may include various numerical values that characterize the pixels of the image, such as RGB values and luminance values of all pixels.
  • the original image acquisition unit 13 passes the feature amount set W of the pixels of the image to the feature amount comparison unit 14 in response to the request from the feature amount comparison unit 14 (S33).
  • FIG. 6 is a flowchart illustrating an example of the processing operation of the feature quantity comparison unit.
  • the feature quantity comparison unit 14 receives the values of the feature quantity set W of the pixels of the original image from the original image acquisition unit 13. Then, the feature quantity comparison unit 14 compares the feature quantity set W with the feature quantity V of pixels of consecutive frames of the moving image to be processed (S41).
  • the feature quantity comparison unit 14 calculates Z according to the following Equation (1): Z = |V - W| … Equation (1)
  • the feature quantity comparison unit 14 determines whether or not the calculated value of Z is within the value range T specified by the following Equation (2) (S42): 0 < T ≤ a (a is a positive number) … Equation (2)
  • if Yes in S42, the feature quantity comparison unit 14 sets the n regions in which pixel groups within the value range U exist as region B1, region B2, region B3, ..., region Bn, and stores them in an internal memory or the like (S43).
  • a in the above equation (2) may be a value set as a suitable value based on empirical rules by the editor of the image, or may be a value calculated by calculation processing.
  • next, the pixel values Jn of the representative points Y1, Y2, Y3, ..., Yn in region B1, region B2, region B3, ..., region Bn are held in an internal memory or the like (S44). This representative point may be the center of gravity of the area Bn, or the point where the coordinate values (x, y) of the area Bn are the minimum values.
  • the feature amount comparison unit 14 passes the shape information MBn and the position information NBn, which are area information in the area Bn, to the mask generation unit 15 (S45), and returns to the processing of S41.
  • FIG. 7 is a flowchart illustrating an example of a processing operation of a mask generation unit.
  • the mask generation unit 15 receives at least one of the area information (shape and position information) of area An from the translucent area extraction unit 11 and the area information (shape and position information) of area Bn from the feature amount comparison unit 14 (S51). In the following, it is assumed that both the area information of area An and the area information of area Bn have been received.
  • the mask generation unit 15 calculates moving averages FMAn, FMBn, FNAn, and FNBn over F (variable) frames going back from the present to the past for the shape information MAn, MBn and the position information NAn, NBn in the area information received in S51 (S52).
  • the value of F may be set as any positive number.
  • a value of about 20 to 60 is suitable for a 60fps video.
  • the value of F may be changed according to the weight of processing.
  • the mask generation unit 15 determines whether the values of the moving averages FMAn and FMBn calculated in S52 are included in the value range GM shown in the following Equation (3), and whether the values of the moving averages FNAn and FNBn calculated in S52 are included in the range GN shown in the following Equation (4) (S53): 0 < GM ≤ b (b is a positive number) … Equation (3); 0 < GN ≤ c (c is a positive number) … Equation (4)
  • if Yes in S53, the mask generation unit 15 passes the shape information MAn, MBn and the position information NAn, NBn, which are the area information, to the depth map generation unit 21 of the 3D image processing device 120 as the mask information at this time (S54). If No in S53, the mask generation unit 15 terminates the process without passing the region information to the depth map generation unit 21 (S55). Because the moving average is calculated in S52 and this value is evaluated in S53, the mask information is generated gradually, so that image flickering can be prevented.
  • the depth map generator 21 may be held within the image processing system, or may be used in an external processing device or module.
  • the depth map generation unit 21 that has received the mask information generates depth information for each image based on this mask information. When flicker is recognized in the depth information generated by the depth map generation unit 21, the flick management unit 22 notifies the semi-transparent area extraction unit 11 and the original image acquisition unit 13 of the semi-transparent area processing device 100. Upon receiving this notification, the semi-transparent area extraction unit 11 and the original image acquisition unit 13 start the above processing.
  • FIG. 8 is a block diagram showing an example of the hardware configuration of the translucent area processing device of the image processing system according to one embodiment of the present invention.
  • the translucent area processing device 100 according to the above embodiment is configured by, for example, a server computer or a personal computer, and has a hardware processor 111A such as a CPU (Central Processing Unit).
  • a program memory 111B, a data memory 112, an input/output interface 113, and a communication interface 114 are connected to the hardware processor 111A via a bus 115. The same applies to the 3D image processing device 120.
  • the communication interface 114 includes, for example, one or more wireless communication interface units, and allows information to be sent and received to and from a communication network NW.
  • as the wireless interface, an interface adopting a low-power wireless data communication standard such as a wireless LAN (Local Area Network) is used.
  • the input/output interface 113 is connected to an input device 200 and an output device 300 attached to the translucent area processing apparatus 100 and used by a user or the like.
  • the input/output interface 113 captures operation data input by a user or the like through an input device 200 such as a keyboard, touch panel, touchpad, or mouse, and performs processing for outputting data to and displaying it on an output device 300 including a display device using liquid crystal, organic EL (Electro Luminescence), or the like.
  • as the input device 200 and the output device 300, devices built into the translucent area processing apparatus 100 may be used, or the input and output devices of another information terminal that can communicate with the translucent area processing apparatus 100 via the network NW may be used.
  • the program memory 111B is a non-temporary tangible storage medium in which, for example, a non-volatile memory that can be written and read at any time, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), is used in combination with a non-volatile memory such as a ROM (Read Only Memory), and it stores the programs necessary for executing the various control processes according to one embodiment.
  • the data memory 112 is a tangible storage medium in which, for example, the above-described non-volatile memory and a volatile memory such as RAM (Random Access Memory) are used in combination, and it is used to store various data acquired and created in the course of performing various processes.
  • the semi-transparent area processing device 100 according to one embodiment of the present invention can be configured as a data processing device having, as software-based processing function units, the semi-transparent area extraction unit 11, the feature amount measurement unit 12, the original image acquisition unit 13, the feature amount comparison unit 14, and the mask generation unit 15 shown in FIG. 1.
  • Each information storage unit used as a working memory by each unit of the translucent area processing apparatus 100 can be configured by using the data memory 112 shown in FIG.
  • however, these storage areas are not essential components of the translucent area processing device 100; they may be, for example, areas provided in an external storage medium such as a USB (Universal Serial Bus) memory or in a storage device such as a database server located in the cloud.
  • the processing function units in each of the translucent region extraction unit 11, the feature amount measurement unit 12, the original image acquisition unit 13, the feature amount comparison unit 14, and the mask generation unit 15 can all be realized by causing the hardware processor 111A to read and execute programs stored in the program memory 111B. Some or all of these processing function units may be implemented in various other forms, including integrated circuits such as ASICs (Application Specific Integrated Circuits) or FPGAs (Field-Programmable Gate Arrays).
  • the image processing apparatus according to the present embodiment extracts one or more processing target regions based on the characteristic elements of the pixels of a moving image, measures the feature amount of pixels in a region obtained by tracking the extracted processing target region in consecutive frames of the moving image, and, when the average of the measured feature amounts is within a predetermined range, generates information indicating the shape and position of the extracted processing target region as image mask information. This makes it possible to prevent flicker from occurring in a moving image due to the movement of a region through which objects in the background are not seen.
  • the methods described in each embodiment can be stored, as programs (software means) that can be executed by a computer, in recording media such as magnetic disks (floppy disks, hard disks, etc.), optical discs (CD-ROM, DVD, MO, etc.), and semiconductor memories (ROM, RAM, flash memory, etc.), and can also be transmitted and distributed via communication media.
  • the programs stored on the medium also include a setting program for configuring software means (including not only execution programs but also tables and data structures) to be executed by the computer.
  • a computer that realizes this device reads the program recorded on the recording medium, constructs the software means by means of the setting program as the case may be, and executes the above-described processing with its operation controlled by this software means.
  • the term "recording medium” as used herein is not limited to those for distribution, and includes storage media such as magnetic disks, semiconductor memories, etc. provided in computers or devices connected via a network.
  • the present invention is not limited to the above-described embodiments, and can be variously modified in the implementation stage without departing from the gist of the present invention. Further, each embodiment may be implemented in combination as appropriate, in which case the combined effect can be obtained. Furthermore, various inventions are included in the above embodiments, and various inventions can be extracted by combinations selected from a plurality of disclosed constituent elements. For example, even if some constituent elements are deleted from all the constituent elements shown in the embodiments, if the problem can be solved and effects can be obtained, the configuration with the constituent elements deleted can be extracted as an invention.

Abstract

An image processing device according to one embodiment of the present invention comprises an extraction unit that extracts one or more processing target regions on the basis of characteristic elements of pixels of a moving image; a feature measurement unit that measures features of pixels in a region obtained by tracking the extracted processing target regions in continuous frames of the moving image; and a mask generation unit that, when an average value of the features measured by the feature measurement unit falls within a predetermined range, generates image mask information, which is information indicating the shapes and positions of the processing target regions extracted by the extraction unit.

Description

Image processing device, method, and program
 The embodiments of the present invention relate to an image processing device, method, and program.
 There is a technique for automatically reconstructing (image processing) 3D content (three-dimensional content) from content captured as a monocular image (single perspective image) or a stereo image, using computing resources such as a program running on a computer and an image processing chip (see, for example, Patent Document 1).
Japanese Patent Application Laid-Open No. 2011-18269
 As described above, when reconstructing 3D content from content captured as monocular images or stereo images, the presence of transparent objects in the content poses a problem.
 For example, many contents include objects through which another subject located farther away than the original subject can be seen, such as glass, smoke used in stage performances, or fog that appears depending on the weather (hereinafter sometimes referred to as transparent objects).
 It is easy for a human to recognize the presence or absence and the depth position of such a transparent object, but it is difficult for a computer to correctly grasp whether the transparent object is present and to judge at what depth it exists.
 Because it is difficult to determine the presence and the depth position of the transparent object, problems arise when 3D content is automatically reconstructed by a computer. For example, in a foggy landscape, if the computer attempts to extract an arbitrary object that is faintly visible farther away than the fog (here called object A), whether the object can be extracted changes depending on the density of the fog.
 If the density of the fog is near the boundary at which object A can or cannot be extracted, extraction of object A repeatedly succeeds and fails. As a result, when the 3D content is composed as a moving image, the result of the depth information processing changes with relatively high frequency from frame to frame, flicker occurs, and the image quality deteriorates.
 The present invention has been made in view of the above circumstances, and an object of the present invention is to provide an image processing apparatus, method, and program capable of improving quality when an image includes a translucent object.
 An image processing apparatus according to one aspect of the present invention includes an extraction unit that extracts one or more processing target regions based on characteristic elements of pixels of a moving image; a feature amount measurement unit that measures a feature amount of pixels in a region obtained by tracking the extracted processing target region in consecutive frames of the moving image; and a mask generation unit that, when the average of the feature amounts measured by the feature amount measurement unit is within a predetermined range, generates information indicating the shape and position of the processing target region extracted by the extraction unit as mask information of the image.
 An image processing method according to one aspect of the present invention is a method performed by an image processing apparatus that includes an extraction unit that extracts one or more processing target regions based on characteristic elements of pixels of a moving image; a feature amount measurement unit that measures a feature amount of pixels in a region obtained by tracking the extracted processing target region in consecutive frames of the moving image; and a mask generation unit that, when the average of the feature amounts measured by the feature amount measurement unit is within a predetermined range, generates information indicating the shape and position of the processing target region extracted by the extraction unit as mask information of the image.
 According to the present invention, it is possible to improve the quality when a translucent object is included in an image.
FIG. 1 is a diagram showing an application example of an image processing system according to one embodiment of the present invention. FIG. 2 is a flowchart showing an example of the processing operation of the semi-transparent area extraction unit. FIG. 3 is a diagram showing an example of a semi-transparent area extracted from an image by the semi-transparent area extraction unit. FIG. 4 is a flowchart showing an example of the processing operation of the feature amount measurement unit. FIG. 5 is a flowchart showing an example of the processing operation of the original image acquisition unit. FIG. 6 is a flowchart showing an example of the processing operation of the feature amount comparison unit. FIG. 7 is a flowchart showing an example of the processing operation of the mask generation unit. FIG. 8 is a block diagram showing an example of the hardware configuration of the translucent area processing device of the image processing system according to one embodiment of the present invention.
 An embodiment according to the present invention will be described below with reference to the drawings.
 FIG. 1 is a diagram showing an application example of an image processing system according to one embodiment of the present invention.
 As shown in FIG. 1, the image processing system according to one embodiment of the present invention has a semi-transparent area processing device (image processing device) 100 and a 3D image processing device 120.
 The semi-transparent area processing device 100 has a semi-transparent area extraction unit 11, a feature amount measurement unit 12, an original image acquisition unit 13, a feature amount comparison unit 14, and a mask generation unit 15.
 The 3D image processing device 120 has a depth map generation unit 21 and a flick management unit 22.
 The semi-transparent area processing device 100 extracts a semi-transparent area in an image, such as fog or smoke used in stage production, and optimizes depth information for this semi-transparent area.
 The 3D image processing device 120 may be integrated with the semi-transparent area processing device 100.
 The semi-transparent area extraction unit 11 of the semi-transparent area processing device 100 extracts a semi-transparent area in the image and processes the information of this area. In the following description, it is assumed that the extracted semi-transparent area is a smoke area.
 FIG. 2 is a flowchart showing an example of the processing operation of the semi-transparent area extraction unit.
 Since the color of smoke varies depending on conditions such as stage lighting and environment, a case in which the color of the smoke is whitish is described here as an example.
 For an arbitrary frame in the moving image, the semi-transparent area extraction unit 11 examines the values of pixels in whitish regions of the image, and holds the n regions in which groups of pixels whose values fall within a specified value range U exist as area A1, area A2, area A3, ..., area An (S11). Examples of the value range U are shown in (1) and (2) below.
 (1) (For RGB) 240 < R ≤ 255, 240 < G ≤ 255, 240 < B ≤ 255
 (2) (For YUV (YCbCr)) 240 < Y ≤ 255, -10 < Cb ≤ 0, -10 < Cr ≤ 0
 Here, the value range U is not particularly limited as long as it is pixel information obtained from the image and the characteristics of each pixel can be accurately extracted; it may be hue information or luminance information. The value range U may be freely changed by an image editor who grasps the characteristics of each image, based on empirical rules, or an optimum value may be set by a computer (including machine learning).
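 As an illustration only (not part of the disclosure), step S11 under the RGB value range of example (1) might be sketched as follows, assuming each frame is an H×W×3 uint8 NumPy array; the function name, the minimum-size filter, and the use of connected-component labeling from SciPy are assumptions made for this sketch.

```python
import numpy as np
from scipy import ndimage  # used here only for connected-component labeling

# Value range U for RGB, from example (1): 240 < R, G, B <= 255
U_LOW, U_HIGH = 240, 255

def extract_candidate_areas(frame: np.ndarray, min_pixels: int = 50):
    """Return a label map and the labels of areas A1..An whose pixels fall in range U.

    frame: H x W x 3 uint8 RGB array.
    min_pixels: discard tiny regions (an assumed practical threshold, not from the patent).
    """
    in_range = np.all((frame > U_LOW) & (frame <= U_HIGH), axis=-1)  # per-pixel test of range U
    labels, n = ndimage.label(in_range)  # group in-range pixels into connected areas
    areas = [k for k in range(1, n + 1) if np.count_nonzero(labels == k) >= min_pixels]
    return labels, areas  # 'areas' correspond to area A1, area A2, ..., area An
```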
 FIG. 3 is a diagram showing an example of a semi-transparent area extracted from an image by the semi-transparent area extraction unit.
 The example shown in FIG. 3 shows that areas A1, A2, and A3 in image G1 have been extracted as areas corresponding to whitish smoke.
 Next, the semi-transparent area extraction unit 11 holds the pixel values In of the representative points X1, X2, X3, ..., Xn within area A1, area A2, area A3, ..., area An (S12). The representative point may be the barycentric point of area An, or the point at which the (x, y) coordinates of area An take their minimum values.
 Next, the semi-transparent area extraction unit 11 tracks area A1, area A2, area A3, ..., area An in consecutive frames of the moving image (S13).
 The semi-transparent area extraction unit 11 passes the area information of the tracked areas to the feature amount measurement unit 12 (S14). This area information includes the shape information and the position information of the area. The processing procedure of the feature amount measurement unit 12 will be described later.
 Next, the semi-transparent area extraction unit 11 determines whether or not the average pixel value Kn, described later, returned from the feature amount measurement unit 12 is within the value range U described above (S15).
 If the average pixel value Kn is not within the value range U (No in S15), the semi-transparent area extraction unit 11 deletes (discards) area An (S17), and the processing for area An ends.
 On the other hand, if the average pixel value Kn is within the value range U (Yes in S15), the semi-transparent area extraction unit 11 passes the area information of area An to the mask generation unit 15 (S16) and returns to the processing of S11. The area information of area An includes the shape information MAn and the position information NAn of area An. The processing operation of the mask generation unit 15 will be described later.
 Next, the processing operation of the feature amount measurement unit 12 will be described. FIG. 4 is a flowchart showing an example of the processing operation of the feature amount measurement unit.
 First, the feature amount measurement unit 12 continues to measure, for each frame, all pixel values within area An in the information passed from the semi-transparent area extraction unit 11 (S21).
 The feature amount measurement unit 12 calculates the average of all pixel values measured in S21 and holds this average as the average pixel value Kn (S22). The feature amount measurement unit 12 returns the average pixel value Kn of area An held in S22 to the semi-transparent area extraction unit 11 (S23).
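 Purely as an illustration, S21 to S23 together with the S15 check could look like the following, continuing the assumptions of the previous sketch (NumPy arrays and the label map produced above); the helper names are hypothetical.

```python
import numpy as np

def average_pixel_value(frame: np.ndarray, area_mask: np.ndarray) -> np.ndarray:
    """S21-S22: mean of all pixel values inside the tracked area An for one frame (Kn)."""
    return frame[area_mask].mean(axis=0)  # one mean per color channel

def keep_area(kn: np.ndarray, u_low: float = 240, u_high: float = 255) -> bool:
    """S15: keep area An only if its average pixel value Kn is still inside range U."""
    return bool(np.all((kn > u_low) & (kn <= u_high)))
```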
 Next, the processing operation of the original image acquisition unit 13 and the processing operation of the feature amount comparison unit 14 shown in FIG. 1 will be described in turn.
 By providing the original image acquisition unit 13 and the feature amount comparison unit 14, when a problem arises in the processing performed by the semi-transparent area extraction unit 11 and the feature amount measurement unit 12 described above, the problem can be compensated for, so that high image quality can be maintained.
 For this reason, the group consisting of the semi-transparent area extraction unit 11 and the feature amount measurement unit 12 and the group consisting of the original image acquisition unit 13 and the feature amount comparison unit 14 may operate independently of each other, or the two groups may exchange information and operate while complementing each other.
 Specific problems that can be assumed to arise during the processing operations of the semi-transparent area extraction unit 11 and the feature amount measurement unit 12 are, for example, (1) the number of areas having feature amounts similar to smoke is so large that the amount of calculation, and therefore the processing time, increases significantly, and (2) it is difficult to obtain good results because of the characteristics of the image (for example, at a live venue where the lighting keeps flashing rapidly).
 FIG. 5 is a flowchart showing an example of the processing operation of the original image acquisition unit.
 First, based on the shooting plan, the original image acquisition unit 13 captures in advance, using an imaging device (not shown), images showing the range that is scheduled to be shot (S31).
 Next, the original image acquisition unit 13 analyzes each image captured in S31 and holds a set W of feature amounts of the pixels of each analyzed image in an internal memory or the like (S32).
 The feature amount set W may include various numerical values that characterize the pixels of the image, such as the RGB values and luminance values of all pixels.
 Then, in response to a request from the feature amount comparison unit 14, the original image acquisition unit 13 passes the feature amount set W of the pixels of the image to the feature amount comparison unit 14 (S33).
 FIG. 6 is a flowchart showing an example of the processing operation of the feature amount comparison unit.
 First, the feature amount comparison unit 14 receives the values of the feature amount set W of the pixels of the original image from the original image acquisition unit 13.
 Then, the feature amount comparison unit 14 compares the feature amount set W with the feature amount V of the pixels of consecutive frames of the moving image to be processed (S41).
 The feature amount comparison unit 14 calculates Z according to the following Equation (1).
 Z = |V - W| … Equation (1)
 The feature amount comparison unit 14 determines whether or not the calculated value of Z is within the value range T specified by the following Equation (2) (S42).
 0 < T ≤ a (a is a positive number) … Equation (2)
 If Yes in S42, the feature amount comparison unit 14 sets the n areas in which pixel groups within the value range U exist as area B1, area B2, area B3, ..., area Bn, and holds them in an internal memory or the like (S43).
 Here, the feature amount set W and the feature amount V may be numerical values representing hue or numerical values representing luminance, as long as they are feature amounts of the same type. The value a in Equation (2) above may be set as a suitable value based on empirical rules by the editor of the image, or it may be a value calculated by computation.
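 As an illustration only, S41 to S43 could be sketched as follows, assuming that the feature amount set W of the original image and the feature amount V of the current frame are per-pixel luminance arrays of the same shape; the threshold a and the function name are assumptions of the sketch.

```python
import numpy as np

def pixels_in_range_t(v: np.ndarray, w: np.ndarray, a: float) -> np.ndarray:
    """S41-S42: compute Z = |V - W| per pixel (Equation (1)) and test 0 < Z <= a (Equation (2))."""
    z = np.abs(v.astype(np.float32) - w.astype(np.float32))
    return (z > 0) & (z <= a)  # boolean map of pixels satisfying the range T test

# S43 (sketch): pixels passing this test can then be grouped into areas B1..Bn
# using the same connected-component labeling as for areas A1..An.
```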
 Next, the pixel values Jn of the representative points Y1, Y2, Y3, ..., Yn in area B1, area B2, area B3, ..., area Bn are held in an internal memory or the like (S44). The representative point may be the barycentric point of area Bn, or the point at which the coordinate values (x, y) of area Bn take their minimum values.
 Finally, the feature amount comparison unit 14 passes the shape information MBn and the position information NBn, which are the area information of area Bn, to the mask generation unit 15 (S45), and the processing returns to S41.
 FIG. 7 is a flowchart showing an example of the processing operation of the mask generation unit.
 First, the mask generation unit 15 receives at least one of the area information (shape and position information) of area An from the semi-transparent area extraction unit 11 and the area information (shape and position information) of area Bn from the feature amount comparison unit 14 (S51). In the following, it is assumed that both the area information of area An and the area information of area Bn have been received.
 The mask generation unit 15 calculates moving averages FMAn, FMBn, FNAn, and FNBn over F (variable) frames going back from the present to the past for the shape information MAn, MBn and the position information NAn, NBn in the area information received in S51 (S52).
 Here, the value of F may be set as any positive number. As a rough guide, a value of about 20 to 60 is suitable for a 60 fps moving image. The value of F may be changed according to the processing load.
 The mask generation unit 15 determines whether the values of the moving averages FMAn and FMBn calculated in S52 are included in the value range GM shown in the following Equation (3), and whether the values of the moving averages FNAn and FNBn calculated in S52 are included in the range GN shown in the following Equation (4) (S53).
 0 < GM ≤ b (b is a positive number) … Equation (3)
 0 < GN ≤ c (c is a positive number) … Equation (4)
 If Yes in S53, the mask generation unit 15 passes the shape information MAn, MBn and the position information NAn, NBn, which are the area information, to the depth map generation unit 21 of the 3D image processing device 120 as the mask information at this time (S54). If No in S53, the mask generation unit 15 terminates the processing without passing the area information to the depth map generation unit 21 (S55).
 Because the moving average is calculated in S52 and this value is evaluated in S53, the mask information is generated gradually, so that image flicker can be prevented.
 The depth map generation unit 21 may be held within the image processing system, or may be provided in an external processing device, module, or the like.
 The depth map generation unit 21 that has received the mask information generates depth information for each image based on this mask information. When flicker is recognized in the depth information generated by the depth map generation unit 21, the flick management unit 22 notifies the semi-transparent area extraction unit 11 and the original image acquisition unit 13 of the semi-transparent area processing device 100. Upon receiving this notification, the semi-transparent area extraction unit 11 and the original image acquisition unit 13 start the processing described above.
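 For orientation only, the interaction between the units of FIG. 1 can be pictured as the following control loop; every name here (the unit objects and the detect_flicker helper) is an assumption made for the sketch and is not an interface defined by this disclosure.

```python
def process_frame(frame, extractor, comparator, mask_generator, depth_map_generator, flick_manager):
    """One-frame sketch of the overall flow of FIG. 1 (hypothetical glue code)."""
    area_info_a = extractor.track_and_filter(frame)        # semi-transparent area extraction unit 11 / feature amount measurement unit 12
    area_info_b = comparator.compare_with_original(frame)  # original image acquisition unit 13 / feature amount comparison unit 14
    mask_info = mask_generator.generate(area_info_a, area_info_b)
    if mask_info is not None:
        depth_map = depth_map_generator.generate(frame, mask_info)  # depth map generation unit 21
        if flick_manager.detect_flicker(depth_map):                  # flick management unit 22
            extractor.restart()    # notified units start their processing again
            comparator.restart()
```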
 FIG. 8 is a block diagram showing an example of the hardware configuration of the semi-transparent area processing device of the image processing system according to one embodiment of the present invention.
 In the example shown in FIG. 8, the semi-transparent area processing device 100 according to the above embodiment is configured by, for example, a server computer or a personal computer, and has a hardware processor 111A such as a CPU (Central Processing Unit). A program memory 111B, a data memory 112, an input/output interface 113, and a communication interface 114 are connected to the hardware processor 111A via a bus 115. The same applies to the 3D image processing device 120.
 The communication interface 114 includes, for example, one or more wireless communication interface units, and enables information to be sent to and received from a communication network NW. As the wireless interface, an interface adopting a low-power wireless data communication standard such as a wireless LAN (Local Area Network) is used.
 The input/output interface 113 is connected to an input device 200 and an output device 300 that are attached to the semi-transparent area processing device 100 and used by a user or the like.
 The input/output interface 113 captures operation data input by a user or the like through the input device 200, such as a keyboard, touch panel, touchpad, or mouse, and performs processing for outputting data to and displaying it on the output device 300, which includes a display device using liquid crystal, organic EL (Electro Luminescence), or the like. As the input device 200 and the output device 300, devices built into the semi-transparent area processing device 100 may be used, or the input and output devices of another information terminal that can communicate with the semi-transparent area processing device 100 via the network NW may be used.
 The program memory 111B is a non-temporary tangible storage medium in which, for example, a non-volatile memory that can be written and read at any time, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), is used in combination with a non-volatile memory such as a ROM (Read Only Memory), and it stores the programs necessary for executing the various control processes according to one embodiment.
 The data memory 112 is a tangible storage medium in which, for example, the above-described non-volatile memory and a volatile memory such as RAM (Random Access Memory) are used in combination, and it is used to store various data acquired and created in the course of performing various processes.
 The semi-transparent area processing device 100 according to one embodiment of the present invention can be configured as a data processing device having, as software-based processing function units, the semi-transparent area extraction unit 11, the feature amount measurement unit 12, the original image acquisition unit 13, the feature amount comparison unit 14, and the mask generation unit 15 shown in FIG. 1.
 Each information storage unit used as a working memory or the like by each unit of the semi-transparent area processing device 100 can be configured by using the data memory 112 shown in FIG. 8. However, these storage areas are not essential components of the semi-transparent area processing device 100; they may be, for example, areas provided in an external storage medium such as a USB (Universal Serial Bus) memory or in a storage device such as a database server located in the cloud.
 The processing function units of the semi-transparent area extraction unit 11, the feature amount measurement unit 12, the original image acquisition unit 13, the feature amount comparison unit 14, and the mask generation unit 15 described above can all be realized by causing the hardware processor 111A to read and execute programs stored in the program memory 111B. Some or all of these processing function units may be implemented in various other forms, including integrated circuits such as ASICs (Application Specific Integrated Circuits) or FPGAs (Field-Programmable Gate Arrays).
 The image processing device according to the present embodiment extracts one or more processing target regions based on the characteristic elements of the pixels of a moving image, measures the feature amount of pixels in a region obtained by tracking the extracted processing target region in consecutive frames of the moving image, and, when the average of the measured feature amounts is within a predetermined range, generates information indicating the shape and position of the extracted processing target region as image mask information. This makes it possible to prevent flicker from occurring in a moving image due to the movement of a region through which objects in the background are not seen.
 The methods described in each embodiment can be stored, as programs (software means) that can be executed by a computer, in recording media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical discs (CD-ROM, DVD, MO, etc.), and semiconductor memories (ROM, RAM, flash memory, etc.), and can also be transmitted and distributed via communication media. The programs stored on the medium also include a setting program for configuring, within the computer, the software means (including not only execution programs but also tables and data structures) to be executed by the computer. A computer that realizes this device reads the program recorded on the recording medium, constructs the software means by means of the setting program as the case may be, and executes the above-described processing with its operation controlled by this software means. The recording medium referred to herein is not limited to one for distribution, and includes storage media such as magnetic disks and semiconductor memories provided inside the computer or in devices connected via a network.
 The present invention is not limited to the above-described embodiment and can be variously modified in the implementation stage without departing from the gist of the invention. The embodiments may also be combined as appropriate, in which case the combined effects are obtained. Furthermore, the above embodiment includes various inventions, and various inventions can be extracted by combinations selected from the plurality of disclosed constituent elements. For example, even if some constituent elements are removed from all the constituent elements shown in the embodiment, a configuration from which those constituent elements have been removed can be extracted as an invention as long as the problem can be solved and the effects can be obtained.
DESCRIPTION OF SYMBOLS
 100 … Semi-transparent area processing device
 120 … 3D image processing device
 11 … Semi-transparent area extraction unit
 12 … Feature amount measurement unit
 13 … Original image acquisition unit
 14 … Feature amount comparison unit
 15 … Mask generation unit

Claims (8)

  1.  An image processing apparatus comprising:
     an extraction unit that extracts one or more processing target regions based on characteristic elements of pixels of a moving image;
     a feature quantity measuring unit that measures a feature quantity of pixels in a region obtained by tracking the extracted processing target region across successive frames of the moving image; and
     a mask generation unit that generates, when the average of the feature quantities measured by the feature quantity measuring unit is within a predetermined range, information indicating the shape and position of the processing target region extracted by the extraction unit as mask information of the image.
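
    By way of non-limiting illustration only, the following is a minimal sketch of the flow recited in claim 1, assuming 8-bit grayscale frames held as NumPy arrays. The value bands, the choice of mean luminance as the feature quantity, and the simplified tracking (re-using the same pixel positions in every frame) are assumptions of this sketch, not features of the claim.

```python
import numpy as np
from scipy import ndimage  # connected-component labelling

LOW, HIGH = 180, 230          # assumed pixel-value band for candidate regions
AVG_MIN, AVG_MAX = 185, 225   # assumed acceptance range for the averaged feature

def extract_regions(frame: np.ndarray):
    """Extraction unit: one boolean map per connected candidate region."""
    in_band = (frame >= LOW) & (frame <= HIGH)
    labels, n = ndimage.label(in_band)
    return [labels == i for i in range(1, n + 1)]

def masks_for_clip(frames):
    """Measure the feature quantity over successive frames and keep a region's
    shape/position as mask information when its average lies in the range."""
    masks = []
    for region in extract_regions(frames[0]):
        per_frame = [float(f[region].mean()) for f in frames]  # feature per frame
        if AVG_MIN <= np.mean(per_frame) <= AVG_MAX:
            masks.append(region)          # shape and position -> mask information
    return masks
```

    For example, masks_for_clip(list_of_gray_frames) would yield one boolean map per region whose averaged feature quantity stayed inside the acceptance range.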
  2.  The image processing apparatus according to claim 1, wherein
     the extraction unit extracts, as the processing target region, a region in which pixel values in the image fall within a predetermined value range.
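
    As a non-limiting illustration of claim 2, a short sketch of extracting the in-range pixels; the band of 180 to 230 in the usage comment is an assumed, illustrative value range.

```python
import numpy as np

def extract_by_value_range(frame: np.ndarray, low: int, high: int) -> np.ndarray:
    """Boolean map of pixels whose value falls inside [low, high]."""
    return (frame >= low) & (frame <= high)

# e.g. candidates = extract_by_value_range(gray_frame, 180, 230)
```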
  3.  The image processing apparatus according to claim 1, wherein
     the mask generation unit calculates a moving average of the information indicating the shape and position of the processing target region extracted by the extraction unit over a predetermined number of frames going back from the present among the successive frames, and, when the calculated moving average falls within a predetermined value range, generates the information indicating the shape and position of the processing target region extracted by the extraction unit in the predetermined number of past frames as the mask information.
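
    A non-limiting sketch of the stabilisation in claim 3, under the assumption that the information indicating a region's shape and position is represented as a per-frame binary occupancy map; the window length and acceptance range are illustrative values.

```python
from collections import deque
import numpy as np

N = 8                               # assumed number of past frames in the window
history = deque(maxlen=N)           # per-frame binary region maps (H x W)

def stabilised_mask(region_map: np.ndarray, lo: float = 0.5, hi: float = 1.0):
    """Average the region map over the last N frames; emit mask information
    only where the moving average falls inside [lo, hi]."""
    history.append(region_map.astype(np.float32))
    if len(history) < N:
        return None                                        # window not yet filled
    moving_avg = np.mean(np.stack(list(history)), axis=0)  # per-pixel occupancy rate
    return (moving_avg >= lo) & (moving_avg <= hi)
```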
  4.  The image processing apparatus according to claim 1, further comprising
     a specifying unit that, when a difference between a feature quantity of each pixel of a captured image and a feature quantity of pixels in the successive frames of the moving image falls within a predetermined value range, specifies, as a processing target region, one or more regions in which pixels included in that value range are present, wherein
     the mask generation unit generates information indicating the shape and position of the specified processing target region as mask information of the image.
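
    A non-limiting sketch of the difference-based specification in claim 4, assuming a captured reference image and moving-image frames held as 8-bit grayscale arrays; the difference band is an illustrative assumption.

```python
import numpy as np
from scipy import ndimage

def specify_by_difference(reference: np.ndarray, frame: np.ndarray,
                          d_min: int = 5, d_max: int = 40):
    """Specifying unit: pixels whose absolute difference from the reference
    (captured) image lies inside [d_min, d_max], grouped into regions."""
    diff = np.abs(frame.astype(np.int16) - reference.astype(np.int16))
    in_range = (diff >= d_min) & (diff <= d_max)
    labels, n = ndimage.label(in_range)
    return [labels == i for i in range(1, n + 1)]   # one boolean mask per region
```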
  5.  A method performed by an image processing apparatus, the apparatus comprising:
     an extraction unit that extracts one or more processing target regions based on characteristic elements of pixels of a moving image;
     a feature quantity measuring unit that measures a feature quantity of pixels in a region obtained by tracking the extracted processing target region across successive frames of the moving image; and
     a mask generation unit that generates, when the average of the feature quantities measured by the feature quantity measuring unit is within a predetermined range, information indicating the shape and position of the processing target region extracted by the extraction unit as mask information of the image.
  6.  The image processing method according to claim 5, wherein
     the extraction unit extracts, as the processing target region, a region in which pixel values in the image fall within a predetermined value range.
  7.  The image processing method according to claim 5, wherein
     the mask generation unit calculates a moving average of the information indicating the shape and position of the processing target region extracted by the extraction unit over a predetermined number of frames going back from the present among the successive frames, and, when the calculated moving average falls within a predetermined value range, generates the information indicating the shape and position of the processing target region extracted by the extraction unit in the predetermined number of past frames as the mask information.
  8.  An image processing program that causes a processor to function as each of the units of the image processing apparatus according to any one of claims 1 to 4.
PCT/JP2021/033030 2021-09-08 2021-09-08 Image processing device, method, and program WO2023037451A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023546630A JPWO2023037451A1 (en) 2021-09-08 2021-09-08
PCT/JP2021/033030 WO2023037451A1 (en) 2021-09-08 2021-09-08 Image processing device, method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/033030 WO2023037451A1 (en) 2021-09-08 2021-09-08 Image processing device, method, and program

Publications (1)

Publication Number Publication Date
WO2023037451A1 true WO2023037451A1 (en) 2023-03-16

Family

ID=85506167

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/033030 WO2023037451A1 (en) 2021-09-08 2021-09-08 Image processing device, method, and program

Country Status (2)

Country Link
JP (1) JPWO2023037451A1 (en)
WO (1) WO2023037451A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013203337A (en) * 2012-03-29 2013-10-07 Fuji Heavy Ind Ltd Driving support device
WO2019064825A1 (en) * 2017-09-27 2019-04-04 ソニー株式会社 Information processing device, information processing method, control device, and image processing device
WO2019229793A1 (en) * 2018-05-28 2019-12-05 日本電気株式会社 Training data set generation device, training data set generation method and recording medium

Also Published As

Publication number Publication date
JPWO2023037451A1 (en) 2023-03-16

Similar Documents

Publication Publication Date Title
KR20180105876A (en) Method for tracking image in real time considering both color and shape at the same time and apparatus therefor
Fernandez-Sanchez et al. Background subtraction model based on color and depth cues
US20230394743A1 (en) Sub-pixel data simulation system
CA2913432A1 (en) System and method for identifying, analyzing, and reporting on players in a game from video
US11720745B2 (en) Detecting occlusion of digital ink
JP2018055644A (en) Image processing apparatus and control method thereof
US10916031B2 (en) Systems and methods for offloading image-based tracking operations from a general processing unit to a hardware accelerator unit
CN105469427B (en) One kind is for method for tracking target in video
JP2015518594A (en) Integrated interactive segmentation method using spatial constraints for digital image analysis
US20230237777A1 (en) Information processing apparatus, learning apparatus, image recognition apparatus, information processing method, learning method, image recognition method, and non-transitory-computer-readable storage medium
US20200167933A1 (en) Image processing apparatus, image processing method, and a non-transitory computer readable storage medium
JP2006285952A (en) Image processing method, image processor, program, and recording medium
JP5356036B2 (en) Group tracking in motion capture
US11657568B2 (en) Methods and systems for augmented reality tracking based on volumetric feature descriptor data
JP7014005B2 (en) Image processing equipment and methods, electronic devices
WO2023037451A1 (en) Image processing device, method, and program
JP6993556B2 (en) Information processing equipment, information processing method, program
US20210125317A1 (en) Learning device, image generation device, learning method, image generation method, and program
JP2019021333A (en) Image processing device, image processing method and program
Molodetskikh et al. Temporally coherent person matting trained on fake-motion dataset
CN112085025B (en) Object segmentation method, device and equipment
US11202000B2 (en) Learning apparatus, image generation apparatus, learning method, image generation method, and program
JP4177689B2 (en) Video feature information generator
JP6331914B2 (en) Algorithm generating apparatus, algorithm generating method and algorithm generating computer program
TWI757965B (en) Deep learning method for augmented reality somatosensory game machine

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21956748

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023546630

Country of ref document: JP