WO2024009721A1 - Image processing device and image processing method - Google Patents

Image processing device and image processing method

Info

Publication number
WO2024009721A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
clothing
clothed
avatar
human body
Prior art date
Application number
PCT/JP2023/022231
Other languages
English (en)
Japanese (ja)
Inventor
倫晶 有定
新吾 堀内
大暉 市原
祥彦 静野
浩之 木村
裕貴 中山
喜貴 千賀
Original Assignee
株式会社NTTデータ
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社NTTデータ
Publication of WO2024009721A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics

Definitions

  • the present invention relates to an image processing device and an image processing method. More specifically, the present invention relates to an image processing device and an image processing method that dress a 3D avatar with 3D clothing data created by a user in a 3DCG environment, convert only the clothing image into 2D data, automatically synthesize the 2D data of the clothing image with the 2D data of a human, and output the result as a 2D clothed image.
  • GAN: Generative Adversarial Network
  • An avatar is a character used as a user's alter ego on a network, and includes two-dimensional (2D) image avatars and three-dimensional (3D) avatars.
  • GAN is a method of generating images by learning using two neural networks. GAN technology has made it possible for computers to generate an unlimited number of similar images as if they were taken of the real thing.
  • using 3D avatars and GAN technology, efforts are being made in the apparel and advertising industries to utilize models that do not exist in reality (virtual models) in sales promotion, marketing, and other business operations.
  • conventionally, 3D models of clothing are created, and the 3D clothing data is converted into 2D data and composited with the model's image. Because parts of the body or clothing could end up incorrectly hidden or exposed in the composite, corrections had to be made manually to make the image look natural.
  • the technique disclosed in Patent Document 1 was known as an image synthesis technique for synthesizing clothing data with a mannequin image. However, even when the technique of Patent Document 1 is used, the combination of the clothing image and the model image still looks unnatural. For example, when an image of a human body from the neck up is combined with an image of clothing, the lining at the back of the neck is displayed, leaving the combined image unnatural.
  • the present invention was made to solve such problems. An object of the present invention is to provide an image processing device and an image processing method that dress a 3D avatar with 3D clothing data created by a user in a 3DCG environment, convert only the clothing image into 2D data, automatically synthesize the 2D data of the clothing image with the 2D data of a human, and output the result as a 2D clothed image.
  • An image processing device that is one aspect of the present invention includes: means for generating a first 3D clothed avatar, a second 3D clothed avatar, and a synthesis clothing image using setting data associated with the synthesis model image, a 3D avatar, and 3D clothing data; and means for dividing the image of the entire exposed human body parts of the synthesis model image into regions surrounded by edges, based on edge information of a mask image of the exposed human body parts associated with the synthesis model image and edge information of the synthesis clothing image, and for calculating, for each divided region, the matching rate of the corresponding portion between the second 3D clothed avatar and the region-divided image of the entire exposed human body parts of the synthesis model image, thereby generating a synthesis face/hand image in which the anteroposterior relationship between the clothing and the human body has been determined.
  • the unnaturalness that conventionally occurs in a composite image is reduced, and manual correction by humans is no longer necessary.
  • FIG. 1 is an overall configuration diagram of an image processing system 1 including an image processing device 10, a user terminal 11, a 3D scanner 12, and an imaging device 13 according to an embodiment of the present invention.
  • FIG. 2 is a diagram illustrating an overview of processing executed by an image processing device 10, a user terminal 11, and an imaging device 13 according to an embodiment of the present invention.
  • FIG. 3 is a system configuration diagram of an image processing apparatus 10 according to an embodiment of the present invention.
  • FIG. 4 is a diagram illustrating an example of the data structure of live-action photography data 106 according to an embodiment of the present invention.
  • FIG. 5 is a diagram showing an example of the data structure of a 3D avatar 108 according to an embodiment of the present invention.
  • FIG. 6 is a diagram showing an example of the data structure of a 3D clothed avatar 109 according to an embodiment of the present invention.
  • FIG. 7 is a diagram illustrating an example of the data structure of a synthesis clothing image 110 according to an embodiment of the present invention.
  • FIG. 8 is a diagram showing a processing flow in which the image processing device 10 generates a 3D clothed avatar and a synthesis clothing image.
  • FIG. 9 is a diagram illustrating a processing flow in which the image processing device 10 generates a synthesis face/hand image in which the anteroposterior relationship between clothing and the human body has been determined.
  • FIG. 10 is a diagram showing a processing flow in which the image processing device 10 generates a shadow image.
  • FIG. 11 is a diagram showing a processing flow in which the image processing device 10 generates a final clothed image.
  • FIG. 12(a) is a diagram illustrating images in which edge information is extracted from the face/hand mask images and from the synthesis clothing image, and FIG. 12(b) is a diagram illustrating an image in which the two pieces of edge information are combined.
  • FIG. 13(a) is a diagram showing an example in which the hand part of the synthesis model image is divided into several regions surrounded by edges, and FIG. 13(b) is a diagram showing an example of generated synthesis face/hand images: a hand image according to the conventional technique and a hand image according to the present invention.
  • the image processing device 10 will be described as one device or system, but the various processes executed by the image processing device 10 may be distributed across and executed by multiple devices or systems.
  • the image processing device 10 executes the image composition processing described in this specification. More specifically, the image processing device 10 dresses a 3D avatar with the 3D clothing data created by the user in a 3DCG environment, converts only the clothing image into 2D data, automatically synthesizes the 2D data of the clothing image with the 2D data of the human, and outputs a 2D clothed image.
  • the user terminal 11 can be any type of device capable of operating in a wired or wireless environment used by the user (e.g., a PC, a tablet terminal, etc.), and is not limited to any specific device.
  • the user terminal 11 can generate 3D clothing data using a third-party application, and can generate 3D avatar data using a 3D scanner 12 or the like.
  • the user terminal 11 can transmit 3D clothing data and 3D avatar data to the image processing device 10, transmit various instructions regarding the image composition processing via an application provided by the image processing device 10, and receive the composition results from the image processing device 10.
  • the 3D scanner 12 is a device that has a function of generating 3D avatar data in response to instructions from the user terminal 11.
  • the imaging device 13 is a device that takes a live photograph of a real model, and is a device that takes an image of the model using one or more cameras, and can include any studio device.
  • the floor, wall, etc. of the imaging location may be any background such as a blue background or a green background.
  • the image processing device 10 uses images of real models and various setting data to dress a 3D avatar with the 3D clothing data created by the user in a 3DCG environment, converts only the clothing image into 2D data, automatically synthesizes the 2D data of the clothing image with the 2D data of the person, and outputs a 2D clothed image.
  • FIG. 2 is a diagram illustrating an overview of processing executed by the image processing device 10, the user terminal 11, and the imaging device 13 according to the embodiment of the present invention.
  • S1 in FIG. 2 is executed by the imaging device 13
  • S2 is executed by the user terminal 11
  • S3 to S6 are executed by the image processing device 10.
  • a user uses the imaging device 13 to photograph an actual model.
  • a model image of an actual model is used as a model image for synthesis in image synthesis processing that will be described later.
  • the model image for synthesis is 2D image data.
  • the imaging device 13 transmits the synthesis model image, camera setting data (camera angle, distance, etc.) and illumination setting data (brightness, etc.) at the time of imaging to the image processing device 10.
  • the image processing device 10 stores the compositing model image, camera setting data, and illumination setting data received from the imaging device 13 in the live photographing data 106.
  • the user terminal 11 has a third-party or similar application, generates 3D clothing data, and also communicates with the 3D scanner 12 to generate a 3D avatar with the same pose as the actual model.
  • the pose of the 3D avatar may be the same as that of the real model at this stage, or it may be set as a basic pose and changed to the same pose during processing in S3, which will be described later.
  • the user terminal 11 transmits the 3D clothing data and 3D avatar to the image processing device 10, and the image processing device 10 stores the received 3D clothing data and 3D avatar in 3D clothing data 107 and 3D avatar 108, respectively.
  • the image processing device 10 provides the user terminal 11 with an application for generating a 3D clothed avatar and a synthetic clothing image. In response to an instruction from the user terminal 11, the image processing device 10 outputs a 2D image of only clothes (synthesis clothing image) and two types of 3D clothed avatars for use in the compositing process.
  • the image processing device 10 reads 3D clothing data from the 3D clothing data 107 and reads 3D avatars from the 3D avatar 108 in response to instructions from the user terminal 11.
  • the image processing device 10 can also read out a model image from the live-action photography data 106 and change the pose of the 3D avatar to the same pose as the pose of the read model image.
  • the image processing device 10 superimposes the 3D clothing data on the 3D avatar in the 3DCG (computer graphics) space, performs predetermined position calculations to determine the size and placement position of the 3D clothing data, and places the 3D clothing data at the appropriate position on the 3D avatar. Through this process, the 3D clothing data is put on the 3D avatar.
  • 3DCG: 3D computer graphics
  • the image processing device 10 executes a cloth simulation of the 3D clothing data in the 3DCG space on the 3D avatar wearing the 3D clothing data, in accordance with the body shape and pose of the 3D avatar, and stores the resulting first 3D clothed avatar in the 3D clothed avatar 109.
  • the image processing device 10 generates a 2D clothing image (also referred to as a "synthesis clothing image") based on the 3D clothing data excluding the 3D avatar, and stores it in the synthesis clothing image 110.
  • the user uses an arbitrary application to generate face and hand mask images from the synthesis model image, mechanically or manually, using a method such as binarization. That is, in response to a mask image generation instruction from the user terminal 11, the image processing device 10 generates mask images of the face and hands from the synthesis model image received from the imaging device 13. The image processing device 10 extracts, using an arbitrary filter, edge information from the face and hand mask images that were generated in advance.
  • the described embodiment is explained using tops as an example of the type of clothing, so the parts of the human body that are exposed in the clothed state (exposed human body parts) are the face and/or hands. It should be understood, however, that for other types of clothing, the exposed body parts may vary with the type of clothing (e.g., feet and/or ankles in the case of bottoms).
  • the image processing device 10 extracts key points of the face and hands from the synthesis model image, and sets a search range using the extracted key points as a bounding box.
  • the image processing device 10 searches for edges while moving from one side of the face or hand to the other (for example, from the left outer edge to the right outer edge), and continues searching the edge until it reaches the other side of the face or hand and wraps around.
  • the image processing device 10 extracts an image of the entire face and hand of the synthesis model image from the searched edge range.
  • the image processing device 10 executes a depth information extraction process to extract depth information in the first 3D clothed avatar.
  • the image processing device 10 extracts edge information of the clothing image for synthesis using an arbitrary filter.
  • since extracting the edges of the synthesis clothing image as-is would turn wrinkles caused by the texture into edge noise, the image processing device 10 can extract the edge information using the depth information.
  • the image processing device 10 combines the edge information of the face and hand mask images with the edge information of the synthesis clothing image, and divides the image of the entire face and hands of the synthesis model image into several regions based on the combined edge information. The image processing device 10 then calculates, for each divided region of the entire face and hands of the synthesis model image, the matching rate with the corresponding area in the second 3D clothed avatar, and extracts the face and hand images to be finally synthesized (synthesis face/hand images).
  • the image processing device 10 reflects the camera setting data and lighting setting data used when generating the synthesis model image on the first 3D clothed avatar read from the 3D clothed avatar 109, generates the shadow (shade) information produced when rendering is performed as a shadow image, and stores it in the 3D clothed avatar 109.
  • the image processing device 10 executes a first clothing composition process of superimposing the synthesis clothing image on the synthesis model image and outputting a first clothed model image.
  • the image processing device 10 executes a second clothed image generation process that generates a final clothed model image by superimposing the synthesis face/hand image on the first clothed model image and further superimposing the shadow image.
  • FIG. 3 is a system configuration diagram of the image processing device 10 according to the embodiment of the present invention.
  • the image processing device 10 may be configured to be placed on a cloud system or on an in-house network.
  • like a general computer, the image processing device 10 includes a control unit 101, a main storage unit 102, an auxiliary storage unit 103, an interface (IF) unit 104, and an output unit 105, which are interconnected by a bus 120 or the like.
  • the auxiliary storage unit 103 stores programs that implement each function of the image processing device 10 and data handled by the programs.
  • the auxiliary storage unit 103 includes the live-action photography data 106, 3D clothing data 107, 3D avatar 108, 3D clothed avatar 109, and synthesis clothing image 110 in a file/database format.
  • the image processing device 10 can read or update information stored in the live-action photography data 106, 3D clothing data 107, 3D avatar 108, 3D clothed avatar 109, and synthetic clothing image 110.
  • Each program stored in the auxiliary storage unit 103 is executed by the image processing device 10.
  • the control unit 101, also called a central processing unit (CPU), controls each component of the image processing device 10 and performs data calculations, and reads various programs stored in the auxiliary storage unit 103 into the main storage unit 102 and executes them.
  • the main storage unit 102 is also called main memory, and stores various received data, computer-executable instructions, and data after arithmetic processing using the instructions.
  • the auxiliary storage unit 103 is a storage device such as a hard disk drive (HDD) or a solid state drive (SSD), and stores data and programs for a long period of time.
  • HDD: hard disk drive
  • SSD: solid state drive
  • although FIG. 3 describes an embodiment in which the control unit 101, the main storage unit 102, and the auxiliary storage unit 103 are provided inside the same computer, as another embodiment, the image processing apparatus 10 can also implement parallel distributed processing by a plurality of computers by using a plurality of main storage units 102 and auxiliary storage units 103.
  • a plurality of servers for the image processing apparatus 10 may be installed, and one auxiliary storage unit 103 may be shared by the plurality of servers.
  • the IF unit 104 serves as an interface for transmitting and receiving data with other systems and devices, and also provides an interface for receiving various commands and input data (various masters, tables, etc.) from the system operator.
  • the output unit 105 provides a display screen for displaying processed data, a printing means for printing the data, and the like.
  • Components similar to the control unit 101, main storage unit 102, auxiliary storage unit 103, IF unit 104, and output unit 105 also exist in the user terminal 11 and the imaging device 13.
  • the live-action shooting data 106 stores a model image (2D image data) of a real model, a mask image of the face and hands of the real model, and camera setting data and lighting setting data at the time of live-action shooting.
  • FIG. 4 is a diagram illustrating an example of the data structure of the live-action photography data 106 according to the embodiment of the present invention.
  • the live-action shooting data 106 can include a live-action shooting ID 401, a model image 402, a mask image 403, camera setting data 404, and lighting setting data 405, but is not limited to these data items and can also include other data items.
  • the live-action shooting ID 401 is an identifier that identifies a model at the time of live-action shooting and data associated with the model.
  • the model image 402 is 2D model image data of a real model, and is also called a "synthesis model image.”
  • the mask image 403 is a mask image of the model's face and hands generated from the synthesis model image.
  • Camera setting data 404 indicates camera setting data at the time of live-action photography, such as camera angle and distance.
  • Lighting setting data 405 indicates lighting setting data, such as brightness, at the time of live-action photography.
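  • As a non-limiting illustration, one record of the live-action shooting data 106 could be modeled as follows in Python; the class and field names are hypothetical and simply mirror the data items 401 to 405 described above.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class LiveShootingRecord:
    """One record of the live-action shooting data 106 (fields mirror items 401-405)."""
    shooting_id: str                       # live-action shooting ID 401
    model_image: np.ndarray                # synthesis model image 402 (H x W x 3, uint8)
    mask_image: np.ndarray                 # face/hand mask image 403 (H x W, 0 or 255)
    camera_settings: dict = field(default_factory=dict)    # 404, e.g. {"angle": ..., "distance": ...}
    lighting_settings: dict = field(default_factory=dict)  # 405, e.g. {"brightness": ...}
```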
  • the 3D clothing data 107 stores 3D clothing data generated by the user.
  • the 3D clothing data may be stored in association with attribute information (eg, clothing category, color, shape, etc.) to facilitate image selection.
  • the 3D avatar 108 stores 3D avatar data generated by the user.
  • the 3D avatar is created by the user so as to have the same pose as the model image at the time of live-action photography.
  • FIG. 5 is a diagram showing an example of the data structure of the 3D avatar 108 according to the embodiment of the present invention.
  • the 3D avatar 108 can include a 3D avatar ID 501, a 3D avatar 502, and a live-action shooting ID 401, but is not limited to these data items and can also include other data items.
  • the 3D clothed avatar 109 stores image data of a 3D clothed avatar obtained by superimposing 3D clothing data on the 3D avatar and performing predetermined processing.
  • FIG. 6 is a diagram showing an example of the data structure of the 3D clothed avatar 109 according to the embodiment of the present invention.
  • the 3D clothed avatar 109 can include a 3D clothed avatar ID 601, a first 3D clothed avatar 602, a second 3D clothed avatar 603, shadow information 604, a shadow image 605, a 3D avatar ID 501, and a live-action shooting ID 401. It is not limited to these data items and can also include other data items.
  • the 3D clothed avatar ID 601 is an identifier that identifies the 3D clothed avatar generated by the image processing device 10.
  • the first 3D clothed avatar 602 shows image data of a 3D clothed avatar that has undergone 3D cloth simulation.
  • a second 3D clothed avatar 603 represents image data of a 3D clothed avatar that has been subjected to 3D rendering processing by reflecting the camera setting data and lighting setting data on the first 3D clothed avatar.
  • the shadow information 604 and the shadow image 605 respectively indicate the shadow information and the shadow image generated when rendering is executed by reflecting, on the first 3D clothed avatar, the camera setting data and lighting setting data used when generating the synthesis model image.
  • the 3D avatar ID 501 is an identifier for identifying the 3D avatar from which the 3D clothed avatar is generated
  • the live-action shooting ID 401 is an identifier for identifying the live-action shooting associated with the 3D avatar.
  • the 3D avatar ID 501 and live-action shooting ID 401 make it easier to acquire various setting data and the like when shooting a real model.
  • the synthesis clothing image 110 stores 2D clothing data generated based on the 3D clothing data of the second 3D clothed avatar.
  • FIG. 7 is a diagram illustrating an example of a data structure of a clothing image for synthesis 110 according to an embodiment of the present invention.
  • the synthesis clothing image 110 can include a synthesis clothing image ID 701, a synthesis clothing image 702, a live-action photography ID 401, and a 3D clothed avatar ID 601, but is not limited to these data items and can also include other data items.
  • the compositing clothing image ID 701 is an identifier that identifies 2D clothing image data used in the image compositing process according to the embodiment of the present invention.
  • a clothing image for composition 702 indicates 2D clothing image data used for image composition processing.
  • the live-action shooting ID 401 is an identifier that identifies the live-action shooting associated with the 3D clothed avatar from which the synthetic clothing image is generated.
  • the 3D clothed avatar ID 601 is the identifier of the 3D clothed avatar associated with the 3D clothing data that is the source data of the synthesis clothing image.
  • next, the processing flow in which the image processing device 10 generates a final clothed model image using a synthesis clothing image (2D), a synthesis model image (2D), various setting data, a 3D avatar, and 3D clothing data will be explained. FIGS. 8 to 11 show the processing contents of S3 to S6 in FIG. 2, respectively. Either of S4 and S5 may be performed first.
  • FIG. 8 shows a processing flow in which the image processing device 10 generates a 3D clothed avatar and a synthesis clothing image using the data generated through the processing described above with reference to S1 and S2 of FIG. 2.
  • the image processing device 10 provides the user terminal 11 with an application for generating a 3D clothed avatar image, and performs the process based on user instructions received via the user terminal 11.
  • the user terminal 11 selects a synthesis model image, 3D clothing data, and 3D avatar to be processed through the provided application, and sends a selection instruction to the image processing device 10.
  • the image processing device 10 reads the selected model image from the live-action photography data 106, reads the selected 3D clothing data from the 3D clothing data 107, and reads the selected 3D avatar from the 3D avatar 108.
  • the image processing device 10 can change the pose of the 3D avatar to match the pose of the read model image. Through this processing, the pose of the model image and the pose of the 3D avatar match, the model image and the 3D avatar are associated, and the selected live-action shooting ID 401 is stored in the 3D avatar 108.
  • the user terminal 11 transmits a placement instruction to the image processing device 10 to place the 3D clothing data at an appropriate position of the 3D avatar on the 3DCG space of the application.
  • in accordance with placement instructions from the user terminal 11, the image processing device 10 superimposes the 3D clothing data on the 3D avatar in the 3DCG space, performs predetermined position calculations to adjust the size and placement position of the 3D clothing data, and places the 3D clothing data at an appropriate position on the 3D avatar. Through this process, the 3D clothing data is put on the 3D avatar.
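  • The patent does not disclose the position calculation itself; the following Python sketch shows one simplified reading, in which the clothing mesh is uniformly scaled and translated so that its bounding box lands on a target region of the avatar. The function name and the box-fitting strategy are illustrative assumptions.

```python
import numpy as np

def fit_clothing_to_avatar(cloth_verts: np.ndarray,
                           target_min: np.ndarray,
                           target_max: np.ndarray) -> np.ndarray:
    """Uniformly scale and translate clothing vertices (N x 3) so that their
    bounding box fits inside the avatar's target region [target_min, target_max]."""
    lo, hi = cloth_verts.min(axis=0), cloth_verts.max(axis=0)
    scale = float(np.min((target_max - target_min) / (hi - lo + 1e-9)))  # uniform, avoids distortion
    center = (lo + hi) / 2.0
    target_center = (target_min + target_max) / 2.0
    return (cloth_verts - center) * scale + target_center
```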
  • Cloth simulation refers to a technology that physically simulates the movement of cloth such as clothing. For example, physical calculations are performed on the cloth, such as simulating the wrinkles in clothing when a 3D avatar wears it.
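  • As a rough illustration of what a cloth simulation does (not of the simulator actually used by the embodiment), the following is a toy mass-spring cloth step with Verlet integration; collision with the avatar body and self-collision are omitted.

```python
import numpy as np

def cloth_step(pos, prev_pos, springs, rest_len, dt=1.0 / 60.0,
               gravity=(0.0, -9.8, 0.0), iters=10):
    """One Verlet time step of a toy mass-spring cloth.
    pos, prev_pos: (N, 3) particle positions at t and t - dt;
    springs: (M, 2) int array of particle index pairs; rest_len: (M,) rest lengths."""
    g = np.asarray(gravity)
    new_pos = pos + (pos - prev_pos) * 0.99 + g * dt * dt  # inertia + damping + gravity
    for _ in range(iters):  # iteratively enforce the spring rest lengths
        i, j = springs[:, 0], springs[:, 1]
        delta = new_pos[j] - new_pos[i]
        dist = np.linalg.norm(delta, axis=1, keepdims=True) + 1e-9
        corr = 0.5 * (1.0 - rest_len[:, None] / dist) * delta
        np.add.at(new_pos, i, corr)    # pull endpoint i toward j
        np.add.at(new_pos, j, -corr)   # and j toward i by the same amount
    return new_pos, pos
```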
  • the image processing device 10 executes a cloth simulation of the 3D clothing data in the 3DCG space on the 3D avatar wearing the 3D clothing data, in accordance with the body shape and pose of the 3D avatar.
  • the cloth-simulated 3D clothed avatar is stored in the first 3D clothed avatar 602 of the 3D clothed avatar 109.
  • the image processing device 10 reads camera setting data and lighting setting data associated with the model image selected in S801 from the live-action photography data 106.
  • the image processing device 10 reflects the read camera setting data and lighting setting data on the cloth-simulated 3D clothed avatar (first 3D clothed avatar) in the 3DCG space of the application, and applies a predetermined shader to perform 3D rendering processing.
  • the image processing device 10 extracts the 3D clothing data by removing the 3D avatar from the 3D clothed avatar (second 3D clothed avatar) that has undergone the 3D rendering processing.
  • the image processing device 10 generates a 2D clothing image (herein referred to as a "synthesis clothing image") based on the extracted 3D clothing data, and stores it in the synthesis clothing image 110.
  • FIG. 9 shows a processing flow in which the image processing device 10 uses the face and hand mask images, the synthesis clothing image, and the 3D clothed avatars to generate a synthesis face/hand image in which the anteroposterior relationship between the clothing and the human body has been determined. Note that this processing flow assumes that the image processing device 10 communicates with the user terminal 11 through an arbitrary application to generate the mask images of the face and hands from the synthesis model image.
  • the term "hand” is used to indicate any of the wrist, palm, and fingers of a human body from the shoulder to the fingertips, but these may vary depending on the design of the clothing. .
  • the image processing device 10 extracts key points of the face and hands from the synthesis model image associated with the live-action shooting ID 401 to be processed, and sets a search range using the extracted key points as a bounding box.
  • the image processing device 10 searches for edges while moving from one side of the face or hand to the other (for example, from the left outer edge to the right outer edge), and continues searching the edge until it reaches the other side of the face or hand and wraps around.
  • the image processing device 10 extracts the entire face and hand image of the synthesis model image from the searched edge range.
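  • A minimal sketch of this search-range setup, assuming the key points are already available from some pose-estimation tool (the patent does not name one):

```python
import numpy as np

def keypoint_bbox(keypoints: np.ndarray, pad: int, img_shape: tuple) -> tuple:
    """Search range as a padded bounding box around the face/hand key points.
    keypoints: (K, 2) array of (x, y) pixels; img_shape: (H, W) of the model image."""
    x0, y0 = keypoints.min(axis=0) - pad
    x1, y1 = keypoints.max(axis=0) + pad
    h, w = img_shape
    return max(0, int(x0)), max(0, int(y0)), min(w - 1, int(x1)), min(h - 1, int(y1))
```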
  • the image processing device 10 extracts, using an arbitrary filter, edge information from the face and hand mask images that were generated in advance.
  • the upper part of FIG. 12(a) shows an image of extracting edge information from mask images of faces and hands.
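  • For illustration, the mask-edge extraction could look as follows; the patent leaves the filter arbitrary, and Canny is used here purely as one concrete choice (the file name is hypothetical):

```python
import cv2

# Mask image 403 is binary (background 0, face/hand 255).
mask = cv2.imread("face_hand_mask.png", cv2.IMREAD_GRAYSCALE)
mask_edges = cv2.Canny(mask, 100, 200)  # thin contour of the masked face/hand region
```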
  • the image processing device 10 reads the first 3D clothed avatar 602 from the 3D clothed avatar 109, and performs depth information extraction processing on the read first 3D clothed avatar 602. Through this process, the image processing device 10 can acquire depth information at each position of the first 3D clothed avatar 602. Depth information makes it possible to distinguish between wrinkles and contour lines in clothing.
  • the image processing device 10 extracts edge information of the synthesis clothing image using an arbitrary filter. If the edge information of the synthesis clothing image were extracted as-is, wrinkles due to the texture of the clothing could become noise, so the image processing device 10 can extract the edge information of the synthesis clothing image using the depth information acquired in S902.
  • the lower part of FIG. 12(a) shows an image of extracting edge information from a clothing image for synthesis.
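  • One plausible realization of this depth-assisted edge extraction: image edges are kept only where the depth map from S902 also changes sharply, so silhouette and overlap edges survive while flat texture wrinkles are discarded. The exact criterion is an assumption; the patent only states that the depth information can be used.

```python
import cv2
import numpy as np

def clothing_edges(cloth_img: np.ndarray, depth: np.ndarray,
                   depth_jump: float = 0.01) -> np.ndarray:
    """Edge map of the synthesis clothing image with texture noise suppressed:
    an image edge is kept only where the rendered depth also jumps."""
    gray = cv2.cvtColor(cloth_img, cv2.COLOR_BGR2GRAY) if cloth_img.ndim == 3 else cloth_img
    raw = cv2.Canny(gray, 50, 150)
    gy, gx = np.gradient(depth.astype(np.float32))
    geometric = np.hypot(gx, gy) > depth_jump  # true where the geometry actually changes
    return np.where(geometric, raw, 0).astype(np.uint8)
```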
  • the order of the processing in S901 and the processing in S902 and S903 is not particularly limited; either may be performed first. That is, the processing in S902 and S903 may be performed after the processing in S901, or the processing in S901 may be performed after the processing in S902 and S903. Alternatively, both may be processed in parallel.
  • the image processing device 10 combines the edge information of the clothing image for synthesis with the edge information of the face and hand mask images.
  • FIG. 12(b) shows an image in which two pieces of edge information are combined.
  • the image processing device 10 divides the entire face and hand image of the synthesis model image extracted in S901 into regions surrounded by edges, based on the edge information combined in S904.
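  • A compact sketch of the S904 combination and the subsequent region division: once the two edge maps are merged, the non-edge pixels decompose into connected components, each bounded by edges.

```python
import cv2
import numpy as np

def edge_regions(mask_edges: np.ndarray, cloth_edges: np.ndarray):
    """Merge the two edge maps (S904), then label the connected non-edge
    regions, each of which is surrounded by edges. Returns (n_labels, labels)."""
    combined = cv2.bitwise_or(mask_edges, cloth_edges)
    # connectedComponents labels the 255-valued (non-edge) pixels
    return cv2.connectedComponents(cv2.bitwise_not(combined))
```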
  • FIG. 13A is an example showing that the hand portion of the synthesis model image is divided into several regions surrounded by edges.
  • the image processing device 10 reads the second 3D clothed avatar 603 from the 3D clothed avatar 109 and compares, for each divided region, the corresponding portions (for example, the thumb of the left hand in each) of the read second 3D clothed avatar 603 and the region-divided image of the entire face and hands of the synthesis model image.
  • the image processing device 10 calculates the matching rate between the two, and determines that a portion whose matching rate is equal to or higher than a predetermined threshold (X) is an actually visible portion. Based on the determination results for each region, the image processing device 10 determines each region of the entire face and hand image of the synthesis model image to be either a visible portion or an invisible (hidden) portion, and extracts the face and hand images to be finally synthesized.
  • in this example, since the matching rate for the thumb portion of the left hand was less than the threshold (X), the image processing device 10 performs processing so as not to include the image of the thumb portion of the left hand in the face and hand images to be finally synthesized.
  • the threshold (X) can be changed based on the depth information. That is, the image processing device 10 can change the threshold (X) at each position based on the depth information of each position acquired in S902, so the value of the threshold (X) can differ for each divided region surrounded by edges.
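  • Putting the matching-rate determination together with the depth-dependent threshold, one illustrative reading follows. The per-pixel agreement test (color difference under a tolerance) and the linear dependence of the threshold (X) on mean region depth are assumptions; the patent specifies neither.

```python
import numpy as np

def visible_mask(labels, model_img, avatar_render, depth,
                 base_thresh=0.8, tol=30):
    """Keep an edge-bounded region only if the synthesis model image and the
    second 3D clothed avatar render agree on at least a fraction X of its pixels.
    depth is assumed normalized to [0, 1]; X shrinks linearly with mean depth."""
    diff = np.abs(model_img.astype(np.int32) - avatar_render.astype(np.int32)).max(axis=-1)
    agree = diff < tol                       # per-pixel "corresponding portion matches"
    keep = np.zeros(labels.shape, dtype=bool)
    for r in range(1, labels.max() + 1):
        region = labels == r
        if not region.any():
            continue
        x = base_thresh - 0.1 * float(depth[region].mean())  # depth-dependent threshold
        if agree[region].mean() >= x:        # judged an actually visible portion
            keep |= region
    return keep
```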
  • FIG. 13(b) shows an example of the generated synthesis face/hand images: a hand image according to the prior art and a hand image according to the present invention.
  • in the prior-art image, the thumb portion is visible because conventional, general image synthesis processing does not perform the matching determination described above. In the actual pose, however, the thumb is hidden behind the folds of the clothing and cannot be seen, so the resulting image is unnatural.
  • in the image according to the present invention, the thumb portion is correctly hidden behind the folds of the clothing and cannot be seen.
  • the image processing device 10 determines that the matching rate for this thumb portion is less than the threshold (X), and does not include this thumb portion in the face/hand image for synthesis since it is an invisible portion.
  • FIG. 10 shows a processing flow in which the image processing device 10 generates a shadow image based on the shadow information produced when the cloth-simulated 3D clothed avatar (first 3D clothed avatar) is rendered while reflecting the camera setting data and lighting setting data used when generating the synthesis model image.
  • the image processing device 10 reads the first 3D clothed avatar 602 from the 3D clothed avatar 109 based on the live-action shooting ID 401 to be processed.
  • the image processing device 10 also queries the live-action shooting data 106 based on the live-action shooting ID 401 and reads out the corresponding camera setting data 404 and illumination setting data 405.
  • the image processing device 10 performs rendering on the read first 3D clothed avatar 602 while reflecting the corresponding camera setting data 404 and lighting setting data 405, calculates whether or not light strikes each part, and performs shading.
  • in S1003, the image processing device 10 generates a shadow image based on the shadow information that is the result of the shading calculation.
  • the image processing device 10 stores the shadow information and the shadow image as the shadow information 604 and the shadow image 605 of the 3D clothed avatar 109, respectively.
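  • As a toy stand-in for this shading calculation (the renderer and shader are not specified), a per-pixel Lambertian term can be computed from a normal map of the first 3D clothed avatar and a light direction assumed to be derivable from the lighting setting data 405:

```python
import numpy as np

def shade_image(normals: np.ndarray, light_dir, ambient=0.3):
    """Per-pixel Lambertian shading: normals is (H, W, 3), light_dir a 3-vector.
    Returns values in [0, 1]; multiplying them into a composite darkens unlit areas."""
    l = np.asarray(light_dir, dtype=np.float32)
    l = l / np.linalg.norm(l)
    lambert = np.clip(np.einsum("hwc,c->hw", normals.astype(np.float32), l), 0.0, 1.0)
    return ambient + (1.0 - ambient) * lambert
```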
  • FIG. 11 shows a processing flow in which the image processing device 10 generates the final clothed image by executing a first clothing composition process, which outputs a first clothed model image by superimposing the synthesis clothing image on the synthesis model image associated with the live-action shooting ID 401 to be processed, and a second clothing composition process, which generates the final clothed model image by superimposing the synthesis face/hand image on the first clothed model image output by the first clothing composition process and further superimposing the shadow image.
  • the image processing device 10 executes the first clothing composition process. More specifically, the image processing device 10 reads the model image 402 from the live-action photography data 106 based on the live-action photography ID 401 to be processed, and queries the synthesis clothing image 110 using the live-action photography ID 401 to read out the synthesis clothing image 702. The image processing device 10 generates a first clothed model image by superimposing the synthesis clothing image on the synthesis model image.
  • next, the image processing device 10 executes the second clothing composition process. More specifically, the image processing device 10 generates the final clothed model image by superimposing the synthesis face/hand image and the shadow image on the generated first clothed model image. The image processing device 10 provides the generated final clothed model image to the user terminal 11.
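  • The two composition processes reduce to ordinary alpha-over operations followed by a multiplicative shadow pass; the sketch below assumes hypothetical inputs: the synthesis model image, an RGBA-style clothing layer, the visibility mask from the matching step, and the shade image.

```python
import numpy as np

def over(base, layer_rgb, layer_alpha):
    """Standard alpha-over compositing; layer_alpha is in [0, 1]."""
    a = np.asarray(layer_alpha, dtype=np.float32)[..., None]
    return (np.asarray(layer_rgb, np.float32) * a
            + np.asarray(base, np.float32) * (1.0 - a)).astype(np.uint8)

def final_clothed_image(model_img, cloth_rgb, cloth_alpha, keep, shade):
    """First composition: clothing over the model image; second composition:
    paste back the face/hand pixels judged visible, then multiply in the shade."""
    first = over(model_img, cloth_rgb, cloth_alpha)
    second = over(first, model_img, keep.astype(np.float32))
    return (second.astype(np.float32) * shade[..., None]).astype(np.uint8)
```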
  • the processing described above enables the image processing device 10 to perform image synthesis processing while estimating the anteroposterior relationship between a person and clothing more precisely. According to the present invention, problems such as the difficulty of identifying the front-back relationship between a person and clothing, and the low accuracy of output images in which a clothing area that should appear on the person becomes invisible, can be solved.
  • depending on the type of clothing, the principles of the present invention can also be applied to parts of the human body other than the face and hands to generate a composite image.
  • the parts of the human body that are exposed when wearing tops and bottoms are different.
  • in the case of tops, the exposed body parts may be the face and/or hands, and in the case of bottoms, the exposed body parts may be the feet and/or ankles.
  • “exposed human body parts” refer to the face, hands, feet, ankles, etc., depending on the type of clothing.
  • the objects are not limited to the human body and clothing.
  • the target objects may be a human body and a vehicle (car, motorcycle, bicycle, etc.).
  • the number of objects may be three or more.
  • the present invention can be implemented as, for example, a system, device, method, program, storage medium, or the like.
  • 1 Image processing system
    10 Image processing device
    11 User terminal
    12 3D scanner
    13 Imaging device
    14 Network
    101 Control unit
    102 Main storage unit
    103 Auxiliary storage unit
    104 Interface (IF) unit
    105 Output unit
    106 Live-action photography data
    107 3D clothing data
    108 3D avatar
    109 3D clothed avatar
    110 Clothing image for synthesis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present invention reduces the unnaturalness in a synthesized image and eliminates the need for manual correction by a person. This image processing device: divides the entire face and hands in a model image for synthesis into regions delimited by edges, on the basis of edge information on a mask image of the face and hands in the model image for synthesis and edge information on a clothing image for synthesis; calculates, for each divided region, the matching rate between a second 3D clothed avatar and the corresponding portions of the image of the entire face and hands in the model image for synthesis, the face and hands having been divided into regions; and generates a face/hand image for synthesis for which the front/back relationship of the clothing and the person has already been determined. The image processing device also: generates, for a first 3D clothed avatar, a shadow image by executing rendering that reflects setting data associated with the model image for synthesis; outputs a first clothed model image in which the clothing image is superimposed on the model image for synthesis; superimposes the face/hand image for synthesis on the first clothed model image; and further superimposes the shadow image on the resulting image to generate a final clothed model image.
PCT/JP2023/022231 2022-07-08 2023-06-15 Image processing device and image processing method WO2024009721A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022110527A JP2024008557A (ja) 2022-07-08 2022-07-08 Image processing device, image processing method, and program
JP2022-110527 2022-07-08

Publications (1)

Publication Number Publication Date
WO2024009721A1 (fr)

Family

ID=89453185

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/022231 WO2024009721A1 (fr) 2022-07-08 2023-06-15 Image processing device and image processing method

Country Status (2)

Country Link
JP (1) JP2024008557A (fr)
WO (1) WO2024009721A1 (fr)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09259252A (ja) * Image processing method
US20170372515A1 (en) * Method and system for generating garment model data
JP2017037637A (ja) * Method and apparatus for generating an artificial picture
US10540757B1 (en) * Method and system for generating combined images utilizing image processing of multiple images
WO2021063829A1 (fr) * Method and computer program product for processing model data of a garment set
JP2022530710A (ja) * Image processing method, apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
JP2024008557A (ja) 2024-01-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23835255

Country of ref document: EP

Kind code of ref document: A1