WO2023166852A1 - Information processing device, information processing method, and computer-readable non-transitory storage medium
- Publication number: WO2023166852A1 (application PCT/JP2023/000250)
- Authority: WIPO (PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
Definitions
- the present invention relates to an information processing device, an information processing method, and a computer-readable non-transitory storage medium.
- Pixar, "Deep Learned Super Resolution for Feature Film Production" <URL: https://graphics.pixar.com/library/SuperResolution/paper.pdf> (retrieved on February 15, 2022)
- rendering is often done at low resolution (e.g., 2K) and then enlarged to high resolution (e.g., 4K) in post-processing (upscaling).
- high-definition video can be generated in a relatively short period of time.
- images obtained by upscaling tend to have lower image quality than images rendered directly at high resolution.
- the present disclosure proposes an information processing device, an information processing method, and a computer-readable non-transitory storage medium capable of generating high-quality video in a short time.
- the information processing device of the present disclosure has a tuning unit that fine-tunes an image processing network using, as a training data set, a first low-resolution video and a high-resolution video corresponding to part of a sequence, and an upconverter that upscales a second low-resolution video corresponding to the remainder of the sequence using the fine-tuned image processing network.
- also provided are an information processing method in which the information processing of the information processing device is executed by a computer, and a computer-readable non-transitory storage medium storing a program for causing the computer to realize the information processing of the information processing device.
- FIG. 1 is a diagram for explaining the background of the present disclosure
- FIG. 2 is a diagram for explaining the background of the present disclosure
- FIG. 3 is a diagram illustrating an example rendering system of the present disclosure
- FIG. 4 is a diagram schematically showing information processing performed in the rendering system
- FIG. 5 is a diagram showing an example of a method of selecting a first video area for fine-tuning
- FIG. 6 is a diagram showing an example of a method of selecting a first video area for fine-tuning
- FIG. 7 is a diagram showing an example of fine-tuning
- FIG. 8 is a diagram illustrating an example of a processing flow for display video generation
- FIG. 9 is a diagram showing a hardware configuration example of the information processing device
- [1. Background] FIGS. 1 and 2 are diagrams for explaining the background of the present disclosure.
- in Non-Patent Document 1, a general-purpose DNN (Deep Neural Network) trained on a general-purpose dataset is used for upscaling.
- a general-purpose data set means a highly versatile learning data set containing various CG contents accumulated before production of a movie. This method can ensure standard image quality for a wide range of video content. However, videos produced at a production site vary widely, and sufficient image quality cannot always be provided for the intended video content.
- a series of videos is divided into multiple sequences, and a dedicated learning data set (dedicated data set) is prepared for each sequence.
- a dedicated data set contains a portion of the footage in the sequence.
- a generic DNN is fine-tuned using a dedicated dataset to generate a dedicated DNN for each sequence. High-precision upscaling by a dedicated DNN for each sequence can improve the image quality significantly over the entire video. A specific description will be given below.
- FIG. 3 is a diagram illustrating an example rendering system of the present disclosure.
- the rendering system of the present disclosure is realized by the information processing device 1 shown in FIG.
- the information processing device 1 has, for example, a content database DB, a low resolution rendering section 10, a selection section 11, a high resolution rendering section 12, a tuning section 13, an upconverter 14 and an output section 15.
- the content database DB stores video content CT.
- the video content CT includes 3D data used for rendering.
- the video content CT is output as a high-quality video through low-resolution rendering processing and upscaling processing.
- the low-resolution rendering unit 10 divides a series of videos into multiple sequences.
- the low-resolution rendering unit 10 renders the video content CT at low resolution for each sequence to generate a low-resolution sequence video LS (see FIG. 4).
- a part of the low-resolution sequence video LS is used for fine-tuning the image processing network NW for upscaling.
- “low resolution” means a resolution lower than the resolution of the display video that should be finally output.
- “high resolution” means the resolution to be provided in the display video.
- the video area used for fine tuning will be referred to as the first video area VA1 (see FIG. 4), and the video content CT corresponding to the first video area VA1 will be referred to as the first video content CT1.
- the remaining video area (all of the video area other than the first video area VA1) not used for fine tuning is referred to as a second video area VA2 (see FIG. 4), and the video content CT corresponding to the second video area VA2 is referred to as second video content CT2.
- the selection unit 11 selects the first video area VA1 for fine tuning from the sequence.
- the low-resolution rendering unit 10 renders the first video content CT1 at low resolution to generate a first low-resolution video LV1 corresponding to part of the sequence.
- the low-resolution rendering unit 10 renders the second video content CT2 at low resolution to generate a second low-resolution video LV2 corresponding to the remainder of the sequence.
- the high-resolution rendering unit 12 renders the first video content CT1 with high resolution to generate a high-resolution video HV.
- the tuning unit 13 fine-tunes the image processing network NW, which is a general-purpose DNN, using the first low-resolution video LV1 (input data) and the high-resolution video HV (ground-truth data) corresponding to part of the sequence as a learning data set DS (see FIG. 4).
- fine-tuning means the process of re-learning the weights of the entire model using the weights of the trained network (base weights BW) as initial values.
- the tuning unit 13 replaces the weights (base weights BW) of the image processing network NW (general-purpose DNN) before relearning with the weights after relearning (fine tune weights FW).
- This provides a dedicated image processing network NW (dedicated DNN) specialized for sequences.
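This weight-management scheme (initialise from the base weights BW, re-learn every weight, then swap the fine-tune weights FW in as the dedicated network) can be sketched as below. The single-parameter linear "network" and all names are illustrative stand-ins of ours, not the patent's actual DNN:

```python
import copy

def fine_tune(base_weights, train_pairs, lr=0.1, epochs=50):
    """Re-learn ALL weights, starting from the trained base weights (BW)."""
    w = copy.deepcopy(base_weights)          # initialise from the generic network
    for _ in range(epochs):
        for x, y in train_pairs:             # (low-res, high-res) training pair
            pred = w["scale"] * x            # toy stand-in for the upscaling DNN
            grad = 2 * (pred - y) * x        # d(squared error)/dw
            w["scale"] -= lr * grad
    return w

base = {"scale": 1.0}                        # base weights BW (generic DNN)
pairs = [(2.0, 5.0), (3.0, 7.5)]             # dedicated data set, here y = 2.5 * x
fine_tuned = fine_tune(base, pairs)          # fine-tune weights FW; converges to 2.5
# BW is then replaced by FW, yielding the sequence-dedicated network.
```

Because the base weights are copied, the generic network is left untouched and can be re-used as the starting point for the next sequence.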
- the upconverter 14 upscales the second low-resolution video LV2 using a fine-tuned image processing network NW (dedicated DNN).
- the output unit 15 synthesizes the converted video CV obtained by upscaling the second low resolution video LV2 and the high resolution video HV generated for fine tuning.
- the output unit 15 outputs the high-resolution sequence video HS obtained by synthesis as a display video.
- FIG. 4 is a diagram schematically showing information processing performed in the rendering system.
- the low-resolution rendering unit 10 generates rendered images RI (low-resolution rendered images LI) for all frames in the sequence. As a result, a low-resolution sequence video LS including the low-resolution rendered images LI of all frames is generated.
- the selection unit 11 selects a plurality of frames to be used for the learning data set DS based on the order of frames or the image analysis results of the low-resolution sequence video LS.
- a plurality of selected frames correspond to the first video area VA1
- a plurality of unselected frames correspond to the second video area VA2.
- the high-resolution rendering unit 12 generates rendered images RI (high-resolution rendered images HI) for all selected frames SF. As a result, a high-resolution video HV that selectively includes the high-resolution rendered image HI of each selected frame SF is generated.
- the tuning unit 13 extracts the low-resolution rendered image LI and the high-resolution rendered image HI generated for the same selected frame SF as an image pair for training.
- the tuning unit 13 generates a learning data set DS based on the image pairs of all selected frames SF.
- the tuning unit 13 fine-tunes the image processing network NW of the upconverter 14 using the learning data set DS.
- the upconverter 14 extracts the low-resolution rendered images LI of all non-selected frames NF from the low-resolution sequence video LS.
- the upconverter 14 upscales the low-resolution rendered image LI of the non-selected frame NF using a fine-tuned image processing network NW to generate the converted image CI. Thereby, a converted video CV selectively including the converted image CI of each non-selected frame NF is generated.
- the output unit 15 synthesizes the high-resolution video HV generated for the first video area VA1 (selected frame SF) and the converted video CV generated for the second video area VA2 (non-selected frame NF) to produce a high-resolution image.
- a resolution sequence video HS is generated.
- the output unit 15 outputs the high resolution sequence video HS as a display video.
- the example of FIG. 5 is a selection example based on mechanical processing.
- the selection unit 11 selects the first video area VA1 based on the order of the frames.
- a preset number of frames from the beginning of the sequence are selected as the selected frames SF (first video area VA1).
- a plurality of frames arranged at substantially regular intervals are selected as the selected frames SF.
- the middle example in FIG. 5 shows the case without extrapolation
- the lower example in FIG. 5 shows the case with extrapolation.
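One plausible reading of these two interval-based schemes is sketched below (the function and its arguments are our naming, not the patent's): "without extrapolation" keeps both sequence endpoints among the selected frames, so every non-selected frame is bracketed by selected ones, while "with extrapolation" samples interval midpoints, leaving the first and last frames outside the selected range:

```python
def select_frames(num_frames, n_select, extrapolate=False):
    """Pick n_select frame indices at roughly regular intervals."""
    if extrapolate:
        # midpoints of n_select equal intervals: the end frames are NOT selected
        step = num_frames / n_select
        return [int(step * (i + 0.5)) for i in range(n_select)]
    if n_select == 1:
        return [0]
    # include both endpoints: non-selected frames are always interpolated
    step = (num_frames - 1) / (n_select - 1)
    return [round(step * i) for i in range(n_select)]

with_endpoints = select_frames(200, 5)                    # includes frames 0 and 199
midpoints = select_frames(200, 5, extrapolate=True)       # [20, 60, 100, 140, 180]
```

Either variant yields the selected frames SF; all remaining indices form the non-selected frames NF.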
- the example of FIG. 6 is an example of selection based on video content.
- the selection unit 11 performs image analysis on the sequence and selects the first video area VA1.
- the selection unit 11 performs image analysis on the low-resolution sequence video LS and selects a video area whose image similarity exceeds the similarity criterion as the first video area VA1.
- Image similarity means similarity between low-resolution rendering images LI.
- a similarity criterion means a criterion for similarity judgment.
- the similarity criterion can be arbitrarily set by the user.
- the selection unit 11 selects a plurality of frames having a larger amount of information and a lower image similarity than the low-resolution rendered images LI of the other frames as selected frames SF, in light of the similarity criterion.
- the similarity judgment may be made based on a rule such as a threshold value, or may be made using a DNN for similarity judgment.
- for example, a method is conceivable in which the information amount of the low-resolution rendered image LI is calculated using a feature such as luminance variance, and the similarity judgment is made by comparing the calculated information amount, or an image feature calculated from it, with a predetermined reference value.
- alternatively, a method is conceivable in which the image similarity is measured by PSNR (Peak Signal-to-Noise Ratio) or the like, and the measured image similarity is compared with a predetermined reference value to make the similarity judgment.
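A straightforward PSNR implementation of the kind this comparison presupposes (images flattened to lists of pixel values; the 8-bit peak value and the 30 dB reference threshold are illustrative assumptions):

```python
import math

def psnr(img_a, img_b, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized images."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")            # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Higher PSNR = more similar; compare against a user-chosen reference value.
similar = psnr([50, 50, 52, 55], [50, 50, 50, 50]) > 30.0
```

The reference value plays the role of the similarity criterion, which the user can set arbitrarily.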
- the user prepares a group of images showing various characters and scenes.
- the user determines in advance the number of frames to be selected by the selection unit 11 (the number of frames to be selected: N, for example).
- the user measures the PSNR of each image, and determines the combination of N images with the highest total PSNR value as correct data.
- the user trains the DNN for similarity judgment using the image group and the correct data as a learning data set. This generates a DNN capable of selecting, with high probability, a combination of N frames with high PSNR (image similarity).
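The "highest total PSNR" labelling step above can be reproduced by brute force. With one score per image it reduces to taking the top N, but the exhaustive search below mirrors the description directly; function and variable names are hypothetical:

```python
from itertools import combinations

def best_combination(psnr_scores, n):
    """Return the indices of the n-image combination with the highest
    summed PSNR score, used as the correct data for the selection DNN."""
    best = max(combinations(range(len(psnr_scores)), n),
               key=lambda idx: sum(psnr_scores[i] for i in idx))
    return sorted(best)

# Five candidate images with pre-measured PSNR scores; pick the best pair.
correct = best_combination([31.0, 28.5, 35.2, 29.9, 33.1], 2)  # -> [2, 4]
```

The resulting index set is the label ("correct data") paired with the image group when training the selection DNN.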
- FIG. 7 is a diagram showing an example of fine tuning.
- the tuning unit 13 acquires the image pair of each selected frame SF as a learning data set DS.
- the tuning unit 13 can extract a specific image region from the image pair as a patch and perform fine tuning using the extracted patch.
- the patch size can be arbitrarily determined, such as 128 × 128 pixels or 64 × 64 pixels. Examples of patch extraction methods include randomly extracting patches from within an image and adaptively extracting regions effective for learning based on image analysis results.
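Random extraction of spatially aligned patches from one (low-resolution, high-resolution) image pair might look as follows; the ×2 scale factor, the list-of-lists image format, and all names are assumptions of ours:

```python
import random

def extract_patches(lo, hi, patch=64, scale=2, count=4, seed=0):
    """Crop `count` random patches from the low-res image and the
    corresponding (scale x larger) windows from the high-res image."""
    rng = random.Random(seed)
    h, w = len(lo), len(lo[0])
    pairs = []
    for _ in range(count):
        y = rng.randrange(h - patch + 1)     # random top-left corner
        x = rng.randrange(w - patch + 1)
        lo_p = [row[x:x + patch] for row in lo[y:y + patch]]
        hi_p = [row[x * scale:(x + patch) * scale]
                for row in hi[y * scale:(y + patch) * scale]]
        pairs.append((lo_p, hi_p))           # matched training pair
    return pairs

# Synthetic 128x128 low-res / 256x256 high-res stand-in images:
lo = [[(r * 31 + c) % 256 for c in range(128)] for r in range(128)]
hi = [[(r * 17 + c) % 256 for c in range(256)] for r in range(256)]
patches = extract_patches(lo, hi, patch=64)
```

Using the same crop window (scaled) for both images keeps each extracted pair aligned, which is what makes the patches usable as training data.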
- the learning data set DS may include image data other than the image data (image pair) described above.
- the tuning unit 13 can add image data of another content different from the video content CT of the sequence to the learning data set DS.
- Image data of another content can include, for example, part of pre-train data used for machine learning of the image processing network NW before being fine-tuned.
- the tuning unit 13 can use patches extracted from pre-train data for fine tuning. This provides some generalization performance for unknown inputs.
- FIG. 8 is a diagram illustrating an example of a processing flow relating to display image generation processing.
- the low-resolution rendering unit 10 renders the sequence video content CT at a low resolution to generate a low-resolution sequence video LS (step S1).
- the selection unit 11 selects the first video area VA1 in the sequence to be used for fine tuning (step S2).
- the high-resolution rendering unit 12 renders the video content CT of the first video area VA1 at high resolution to generate a high-resolution video HV (step S3).
- the tuning unit 13 fine-tunes the image processing network NW for upscaling using the low-resolution video LV (first low-resolution video LV1) of the first video area VA1 and the high-resolution video HV as the learning data set DS (step S4).
- the tuning unit 13 determines whether fine tuning is appropriate (step S5).
- the tuning unit 13 uses part of the learning data set DS as a validation data set.
- the tuning unit 13 determines the adequacy of the fine-tuning based on a comparison between the converted video CV obtained by upscaling the first low-resolution video LV1 included in the validation data set and the high-resolution video HV corresponding to the converted video CV.
- the tuning unit 13 determines that the fine tuning has been properly performed when the adequacy satisfies the allowable standard. For example, the tuning unit 13 determines that the fine-tuning is completed when the adequacy satisfies the allowable standard during learning using the learning data set DS, and ends the learning. The tuning unit 13 determines that the fine tuning is not properly performed when the adequacy that satisfies the allowable standard is not obtained.
- the adequacy is calculated, for example, as the difference between the converted video CV and the high-resolution video HV to be compared.
- the acceptance criteria can be arbitrarily set by the user using a threshold or the like.
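The acceptance test can be as simple as a per-pixel difference against a user threshold. A minimal sketch, where the mean-absolute-difference metric, the threshold value, and the names are illustrative choices of ours:

```python
def adequacy(converted, reference):
    """Mean absolute difference between the upscaled validation video and
    its high-resolution reference (lower = better)."""
    flat = [(c, r) for cf, rf in zip(converted, reference)
            for c, r in zip(cf, rf)]
    return sum(abs(c - r) for c, r in flat) / len(flat)

def fine_tuning_ok(converted, reference, threshold=2.0):
    """Allowable criterion: accept when the difference is under threshold."""
    return adequacy(converted, reference) <= threshold

# One 1x4 validation 'frame' of upscaled output vs. ground truth:
ok = fine_tuning_ok([[10, 12, 9, 11]], [[11, 11, 10, 10]])
```

Here the mean difference is 1.0, under the threshold of 2.0, so the fine-tuning would be judged adequate.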
- if the fine-tuning is appropriate (step S5: Yes), the upconverter 14 upscales the low-resolution video LV of the second video area VA2 (second low-resolution video LV2) using the fine-tuned image processing network NW (step S9).
- the output unit 15 combines the converted video CV obtained by upscaling and the high resolution video HV used for fine tuning to generate a high resolution sequence video HS (step S10).
- if the fine-tuning is not appropriate (step S5: No), the tuning unit 13 determines whether there is room to change the learning conditions (step S6). Examples of changes to the learning conditions include changes to hyperparameters. If there is room to change the learning conditions (step S6: Yes), the tuning unit 13 changes the learning conditions (step S7), returns to step S4, and performs learning again.
- if there is no room to change the learning conditions (step S6: No), the tuning unit 13 changes the frames (first video area VA1) used for the learning data set DS (step S8), returns to step S3, and performs learning again.
- the tuning unit 13 may change the length of the sequence instead of changing the first video area VA1.
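The full S1 to S10 loop, including both fallbacks (the S6/S7 hyperparameter retry and the S8 frame reselection), can be expressed as the control flow below. Every processing stage is injected as a callable; the signatures, names, and toy stand-ins are ours, not the patent's:

```python
def generate_display_video(frames, select, render_lo, render_hi, fine_tune,
                           is_adequate, upscale, combine,
                           condition_sets, max_reselects=2):
    lo = {f: render_lo(f) for f in frames}                   # S1: low-res render
    selected = select(frames, attempt=0)                     # S2: pick VA1
    for attempt in range(max_reselects + 1):
        hi = {f: render_hi(f) for f in selected}             # S3: high-res render
        for cond in condition_sets:                          # S4, retried via S6/S7
            net = fine_tune([(lo[f], hi[f]) for f in selected], cond)
            if is_adequate(net):                             # S5: validation
                rest = [f for f in frames if f not in selected]
                cv = {f: upscale(net, lo[f]) for f in rest}  # S9: upscale the rest
                return combine(hi, cv)                       # S10: synthesize HS
        selected = select(frames, attempt=attempt + 1)       # S8: new VA1
    raise RuntimeError("fine-tuning never met the allowable criterion")

# Toy stand-ins: the 'network' is just a scale factor that is learned exactly.
out = generate_display_video(
    frames=[0, 1, 2, 3],
    select=lambda fs, attempt: fs[:2],
    render_lo=lambda f: f,
    render_hi=lambda f: 2 * f,
    fine_tune=lambda pairs, cond: 2,
    is_adequate=lambda net: net == 2,
    upscale=lambda net, x: net * x,
    combine=lambda hi, cv: {**hi, **cv},
    condition_sets=[{"lr": 1e-4}],
)
```

In the toy run the two selected frames keep their high-resolution renders, while the other two are upscaled, so `out` maps every frame to the doubled value.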
- FIG. 9 is a diagram showing a hardware configuration example of the information processing device 1.
- the information processing of the information processing device 1 is realized by the computer 1000, for example.
- the computer 1000 has a CPU (Central Processing Unit) 1100, a RAM (Random Access Memory) 1200, a ROM (Read Only Memory) 1300, a HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600.
- the CPU 1100 operates based on a program (program data 1450) stored in the ROM 1300 or HDD 1400, and controls each section. For example, CPU 1100 loads programs stored in ROM 1300 or HDD 1400 into RAM 1200 and executes processes corresponding to various programs.
- the ROM 1300 stores a boot program such as BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 is started, and programs dependent on the hardware of the computer 1000.
- the HDD 1400 is a computer-readable non-transitory recording medium that records, in a non-transitory manner, the programs executed by the CPU 1100 and the data used by those programs.
- the HDD 1400 is a recording medium that records the information processing program according to the embodiment, which is an example of the program data 1450.
- a communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the Internet).
- the CPU 1100 receives data from another device or transmits data generated by the CPU 1100 to another device via the communication interface 1500.
- the input/output interface 1600 is an interface for connecting the input/output device 1650 and the computer 1000.
- the CPU 1100 receives data from input devices such as a keyboard and mouse via the input/output interface 1600.
- the CPU 1100 transmits data to an output device such as a display device, a speaker, or a printer via the input/output interface 1600.
- the input/output interface 1600 may function as a media interface for reading a program or the like recorded on a predetermined recording medium.
- media include, for example, optical recording media such as DVD (Digital Versatile Disc) and PD (Phase change rewritable Disk), magneto-optical recording media such as MO (Magneto-Optical disk), tape media, magnetic recording media, and semiconductor memories.
- the CPU 1100 of the computer 1000 executes the information processing program loaded on the RAM 1200 to implement the functions of the above-described units.
- the HDD 1400 also stores an information processing program, various models, and various data according to the present disclosure.
- the CPU 1100 reads and executes the program data 1450 from the HDD 1400; as another example, these programs may be obtained from another device via the external network 1550.
- the information processing device 1 has a tuning unit 13 and an upconverter 14.
- the tuning unit 13 fine-tunes the image processing network NW using the first low-resolution video LV1 and the high-resolution video HV corresponding to a part of the sequence as the learning data set DS.
- An upconverter 14 upscales the second low resolution video LV2 corresponding to the remainder of the sequence using a fine-tuned image processing network NW.
- the processing of the information processing device 1 is executed by the computer 1000.
- the computer-readable non-transitory storage medium of the present disclosure stores a program that causes the computer 1000 to implement the processing of the information processing device 1.
- the image processing network NW for upscaling is appropriately fine-tuned using part of the video of the sequence.
- the computation time is relatively short. Therefore, even when the time required for fine-tuning is taken into account, a high-quality sequence video is generated in a relatively short time. A verification experiment conducted by the inventor on this point yielded the rendering-time reductions listed later in this document.
- the information processing device 1 has an output unit 15.
- the output unit 15 synthesizes the converted video CV obtained by upscaling the second low-resolution video LV2 and the high-resolution video HV generated for fine tuning, and outputs a high-resolution sequence video HS.
- the high-resolution video HV obtained by large-scale computation is effectively used not only for fine tuning but also as part of the output video.
- the information processing device 1 has a selection unit 11, a low-resolution rendering unit 10, and a high-resolution rendering unit 12.
- the selection unit 11 selects the first video area VA1 for fine tuning from the sequence.
- the low-resolution rendering unit 10 renders the video content CT of the first video area VA1 at a low resolution to generate a first low-resolution video LV1.
- the low-resolution rendering unit 10 renders the video content CT of the second video area VA2 other than the first video area VA1 at a low resolution to generate a second low-resolution video LV2.
- the high-resolution rendering unit 12 renders the video content CT of the first video area VA1 at high resolution to generate a high-resolution video HV.
- the selection unit 11 selects the first video area VA1 based on the order of frames.
- the selection of the first video area VA1 can be mechanically performed.
- the selection unit 11 performs image analysis on the sequence and selects the first video area VA1.
- an appropriate area corresponding to the video content CT can be selected as the first video area VA1.
- the selection unit 11 selects a video area whose image similarity exceeds the similarity criterion as the first video area VA1.
- an area with a large amount of information and a low degree of similarity with other frames can be selected as the first video area VA1.
- the tuning unit 13 adds image data of another content different from the video content CT of the sequence to the learning data set DS.
- the image data of the separate content includes part of the pre-train data used for machine learning of the image processing network NW before fine-tuning.
- the tuning unit 13 uses part of the learning data set DS as a validation data set.
- the tuning unit 13 determines the adequacy of the fine-tuning based on a comparison between the converted video CV obtained by upscaling the first low-resolution video LV1 included in the validation data set and the high-resolution video HV corresponding to the converted video CV.
- the tuning unit 13 determines that fine-tuning is completed when the adequacy satisfies the permissible standard during learning using the learning data set DS.
- fine tuning can be automatically terminated based on the appropriateness.
- the tuning unit 13 changes the learning conditions and performs learning again.
- the tuning unit 13 changes the video area (first video area VA1) of the sequence used for the learning data set DS, or changes the length of the sequence, and performs learning again.
- the present technology can also adopt the following configurations.
- (1) An information processing device comprising: a tuning unit that fine-tunes an image processing network using, as a learning data set, a first low-resolution video and a high-resolution video corresponding to part of a sequence; and an upconverter that upscales a second low-resolution video corresponding to the remainder of the sequence using the fine-tuned image processing network.
- (2) The information processing device according to (1), further comprising an output unit that synthesizes the converted video obtained by upscaling the second low-resolution video with the high-resolution video and outputs a high-resolution sequence video.
- (3) The information processing device according to (1) or (2), further comprising: a selection unit that selects a first video area for fine-tuning from the sequence; a low-resolution rendering unit that renders the video content of the first video area at low resolution to generate the first low-resolution video and renders the video content of a second video area other than the first video area at low resolution to generate the second low-resolution video; and a high-resolution rendering unit that renders the video content of the first video area at high resolution to generate the high-resolution video.
- (4) The information processing device according to (3), wherein the selection unit selects the first video area based on the order of frames.
- (5) The information processing device according to (3), wherein the selection unit selects the first video area by image analysis of the sequence.
- (6) The information processing device according to (5), wherein the selection unit selects, as the first video area, a video area whose image similarity exceeds a similarity criterion.
- (7) The information processing device according to any one of (1) to (6), wherein the tuning unit adds image data of another content different from the video content of the sequence to the learning data set.
- (8) The information processing device according to (7), wherein the image data of the other content includes part of the pre-train data used for machine learning of the image processing network before fine-tuning.
- (9) The information processing device according to any one of (1) to (8), wherein the tuning unit uses part of the learning data set as a validation data set and determines the adequacy of fine-tuning based on a comparison between a converted video obtained by upscaling the first low-resolution video included in the validation data set and the high-resolution video corresponding to the converted video.
- (10) The information processing device according to (9), wherein the tuning unit determines that fine-tuning is complete when the adequacy satisfies an allowable criterion during learning using the learning data set.
- (11) The information processing device according to (10), wherein, when an adequacy satisfying the allowable criterion is not obtained, the tuning unit changes the learning conditions and performs learning again.
- (12) The information processing device according to (11), wherein, when there is no room to change the learning conditions, the tuning unit changes the video area of the sequence used for the learning data set or changes the length of the sequence and performs learning again.
- (13) An information processing method executed by a computer, comprising: fine-tuning an image processing network using, as a learning data set, a first low-resolution video and a high-resolution video corresponding to part of a sequence; and upscaling a second low-resolution video corresponding to the remainder of the sequence using the fine-tuned image processing network.
- (14) A computer-readable non-transitory storage medium storing a program that causes a computer to realize: fine-tuning an image processing network using, as a learning data set, a first low-resolution video and a high-resolution video corresponding to part of a sequence; and upscaling a second low-resolution video corresponding to the remainder of the sequence using the fine-tuned image processing network.
Description
[1. Background]
[2. Rendering system]
[3. Information processing method]
[3-1. Selection of the video area for fine-tuning]
[3-2. Fine-tuning]
[3-3. Processing flow]
[4. Hardware configuration example]
[5. Effects]
- 1 sequence: 200 frames
- Rendering time at high resolution (4K): 12 hours/frame
- Rendering time at low resolution (2K): 3 hours/frame
- Time for fine-tuning (training): 2 hours
- Time for inference (upscaling): 100 seconds/frame
- Rendering time when all frames are rendered at high resolution (4K): 100 days
- Rendering time when using the method of the present disclosure: about 28 days
- (Breakdown)
- Low-resolution (2K) rendering time for all frames: 25 days
- High-resolution (4K) rendering time for the first video area: 2.5 days
- Time for fine-tuning: 2 hours
- Time for inference (upscaling): about 0.5 hours
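The figures above can be cross-checked with a few lines of arithmetic. Note that the five-frame size of the first video area is inferred here from 2.5 days at 12 hours/frame; it is not a number the document states:

```python
FRAMES = 200
HI_HOURS, LO_HOURS = 12.0, 3.0       # per-frame rendering cost at 4K / 2K

all_hi_days = FRAMES * HI_HOURS / 24          # all-4K baseline: 100 days
all_lo_days = FRAMES * LO_HOURS / 24          # 2K pass over every frame: 25 days
selected_frames = 2.5 * 24 / HI_HOURS         # implied size of VA1 (our inference)
# Total for the disclosed method, using the quoted subtotals:
total_days = all_lo_days + 2.5 + 2 / 24 + 0.5 / 24
```

Summing the quoted subtotals gives roughly 27.6 days, consistent with the stated figure of about 28 days against the 100-day all-4K baseline.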
Note that the present technology can also adopt the following configurations.
(1)
An information processing apparatus comprising:
a tuning unit that fine-tunes an image processing network using, as a training data set, a first low-resolution video and a high-resolution video corresponding to a portion of a sequence; and
an upconverter that upscales a second low-resolution video corresponding to the remainder of the sequence using the fine-tuned image processing network.
(2)
The information processing apparatus according to (1) above, further comprising an output unit that combines a converted video obtained by upscaling the second low-resolution video with the high-resolution video and outputs the result as a high-resolution sequence video.
(3)
The information processing apparatus according to (1) or (2) above, further comprising:
a selection unit that selects a first video region for fine-tuning from the sequence;
a low-resolution rendering unit that renders the video content of the first video region at low resolution to generate the first low-resolution video, and renders the video content of a second video region other than the first video region at low resolution to generate the second low-resolution video; and
a high-resolution rendering unit that renders the video content of the first video region at high resolution to generate the high-resolution video.
(4)
The information processing apparatus according to (3) above, wherein the selection unit selects the first video region based on the order of frames.
(5)
The information processing apparatus according to (3) above, wherein the selection unit selects the first video region by performing image analysis on the sequence.
(6)
The information processing apparatus according to (5) above, wherein the selection unit selects, as the first video region, a video region whose image similarity exceeds a similarity criterion.
(7)
The information processing apparatus according to any one of (1) to (6) above, wherein the tuning unit adds, to the training data set, image data of other content different from the video content of the sequence.
(8)
The information processing apparatus according to (7) above, wherein the image data of the other content includes part of the pretraining data used for the machine learning of the image processing network before fine-tuning.
(9)
The information processing apparatus according to any one of (1) to (8) above, wherein the tuning unit uses part of the training data set as a validation data set, and judges the adequacy of fine-tuning based on a comparison between a converted video obtained by upscaling the first low-resolution video included in the validation data set and the high-resolution video corresponding to the converted video.
(10)
The information processing apparatus according to (9) above, wherein the tuning unit judges that fine-tuning is complete when the adequacy satisfies an acceptance criterion during training with the training data set.
(11)
The information processing apparatus according to (10) above, wherein, when an adequacy satisfying the acceptance criterion is not obtained, the tuning unit changes the learning conditions and performs training again.
(12)
The information processing apparatus according to (11) above, wherein, when there is no room to change the learning conditions, the tuning unit changes the video region of the sequence used for the training data set or changes the length of the sequence, and performs training again.
(13)
A computer-implemented information processing method comprising:
fine-tuning an image processing network using, as a training data set, a first low-resolution video and a high-resolution video corresponding to a portion of a sequence; and
upscaling a second low-resolution video corresponding to the remainder of the sequence using the fine-tuned image processing network.
(14)
A computer-readable non-transitory storage medium storing a program that causes a computer to implement:
fine-tuning an image processing network using, as a training data set, a first low-resolution video and a high-resolution video corresponding to a portion of a sequence; and
upscaling a second low-resolution video corresponding to the remainder of the sequence using the fine-tuned image processing network.
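The adequacy check and retry logic of configurations (9) to (12) amount to a validation-driven control loop. A sketch, assuming PSNR as the adequacy measure and hypothetical `train_fn`/`upscale_fn` hooks; neither the metric, the thresholds, nor the learning conditions are fixed by the disclosure:

```python
import numpy as np

def psnr(a, b, peak=1.0):
    # peak signal-to-noise ratio as a stand-in adequacy measure
    mse = float(np.mean((a - b) ** 2))
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def tune_with_validation(train_fn, upscale_fn, pairs, psnr_ok=35.0,
                         conditions=({"lr": 1e-4}, {"lr": 1e-5})):
    """Hold out the last (LV1, HV) pair for validation; retry with the
    next learning condition when adequacy misses the acceptance
    criterion; give up (caller changes the video region or sequence
    length) when no conditions remain."""
    train, (val_lv, val_hv) = pairs[:-1], pairs[-1]
    for cond in conditions:
        model = train_fn(train, **cond)
        if psnr(upscale_fn(model, val_lv), val_hv) >= psnr_ok:
            return model, cond          # fine-tuning judged complete
    return None, None                   # no room left to change conditions
```

In the disclosure's terms, a `(None, None)` result would trigger configuration (12): re-selecting the first video region or changing the sequence length before training again.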
10 Low-resolution rendering unit
11 Selection unit
12 High-resolution rendering unit
13 Tuning unit
14 Upconverter
15 Output unit
CT Video content
CV Converted video
DS Training data set
LV1 First low-resolution video
LV2 Second low-resolution video
HS High-resolution sequence video
HV High-resolution video
NW Image processing network
VA1 First video region
VA2 Second video region
Claims (14)
- An information processing apparatus comprising: a tuning unit that fine-tunes an image processing network using, as a training data set, a first low-resolution video and a high-resolution video corresponding to a portion of a sequence; and an upconverter that upscales a second low-resolution video corresponding to the remainder of the sequence using the fine-tuned image processing network.
- The information processing apparatus according to claim 1, further comprising an output unit that combines a converted video obtained by upscaling the second low-resolution video with the high-resolution video and outputs the result as a high-resolution sequence video.
- The information processing apparatus according to claim 1, further comprising: a selection unit that selects a first video region for fine-tuning from the sequence; a low-resolution rendering unit that renders the video content of the first video region at low resolution to generate the first low-resolution video, and renders the video content of a second video region other than the first video region at low resolution to generate the second low-resolution video; and a high-resolution rendering unit that renders the video content of the first video region at high resolution to generate the high-resolution video.
- The information processing apparatus according to claim 3, wherein the selection unit selects the first video region based on the order of frames.
- The information processing apparatus according to claim 3, wherein the selection unit selects the first video region by performing image analysis on the sequence.
- The information processing apparatus according to claim 5, wherein the selection unit selects, as the first video region, a video region whose image similarity exceeds a similarity criterion.
- The information processing apparatus according to claim 1, wherein the tuning unit adds, to the training data set, image data of other content different from the video content of the sequence.
- The information processing apparatus according to claim 7, wherein the image data of the other content includes part of the pretraining data used for the machine learning of the image processing network before fine-tuning.
- The information processing apparatus according to claim 1, wherein the tuning unit uses part of the training data set as a validation data set, and judges the adequacy of fine-tuning based on a comparison between a converted video obtained by upscaling the first low-resolution video included in the validation data set and the high-resolution video corresponding to the converted video.
- The information processing apparatus according to claim 9, wherein the tuning unit judges that fine-tuning is complete when the adequacy satisfies an acceptance criterion during training with the training data set.
- The information processing apparatus according to claim 10, wherein, when an adequacy satisfying the acceptance criterion is not obtained, the tuning unit changes the learning conditions and performs training again.
- The information processing apparatus according to claim 11, wherein, when there is no room to change the learning conditions, the tuning unit changes the video region of the sequence used for the training data set or changes the length of the sequence, and performs training again.
- A computer-implemented information processing method comprising: fine-tuning an image processing network using, as a training data set, a first low-resolution video and a high-resolution video corresponding to a portion of a sequence; and upscaling a second low-resolution video corresponding to the remainder of the sequence using the fine-tuned image processing network.
- A computer-readable non-transitory storage medium storing a program that causes a computer to implement: fine-tuning an image processing network using, as a training data set, a first low-resolution video and a high-resolution video corresponding to a portion of a sequence; and upscaling a second low-resolution video corresponding to the remainder of the sequence using the fine-tuned image processing network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202380023515.0A CN118805196A (zh) | 2022-03-01 | 2023-01-10 | Information processing apparatus, information processing method, and computer-readable non-transitory storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2022030697 | 2022-03-01 | ||
JP2022-030697 | 2022-03-01 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023166852A1 true WO2023166852A1 (ja) | 2023-09-07 |
Family
ID=87883643
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2023/000250 WO2023166852A1 (ja) | 2023-01-10 | Information processing apparatus, information processing method, and computer-readable non-transitory storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN118805196A (ja) |
WO (1) | WO2023166852A1 (ja) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2019008599A (ja) * | 2017-06-26 | 2019-01-17 | 株式会社 Ngr | Image noise reduction method using a feedforward neural network |
JP2019528514A (ja) * | 2016-08-05 | 2019-10-10 | Qualcomm, Incorporated | Dynamic foveation adjustment |
JP2020035094A (ja) * | 2018-08-28 | 2020-03-05 | Olympus Corporation | Machine learning device, training data creation device, inference model, and training data creation method |
JP2020102012A (ja) * | 2018-12-21 | 2020-07-02 | Canon Inc. | Image processing apparatus, image processing method, and program |
2023
- 2023-01-10 WO PCT/JP2023/000250 patent/WO2023166852A1/ja active Application Filing
- 2023-01-10 CN CN202380023515.0A patent/CN118805196A/zh active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2019528514A (ja) * | 2016-08-05 | 2019-10-10 | Qualcomm, Incorporated | Dynamic foveation adjustment |
JP2019008599A (ja) * | 2017-06-26 | 2019-01-17 | 株式会社 Ngr | Image noise reduction method using a feedforward neural network |
JP2020035094A (ja) * | 2018-08-28 | 2020-03-05 | Olympus Corporation | Machine learning device, training data creation device, inference model, and training data creation method |
JP2020102012A (ja) * | 2018-12-21 | 2020-07-02 | Canon Inc. | Image processing apparatus, image processing method, and program |
Non-Patent Citations (1)
Title |
---|
"Deep Learned Super Resolution for Feature Film Production", PIXAR, 15 February 2022 (2022-02-15), Retrieved from the Internet <URL:https://graphics.pixar.com/library/SuperResolution/paper.pdf>
Also Published As
Publication number | Publication date |
---|---|
CN118805196A (zh) | 2024-10-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23763115 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2024504385 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2023763115 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2023763115 Country of ref document: EP Effective date: 20241001 |