CN117152302A - Method, device, equipment and storage medium for generating display image of target object - Google Patents


Info

Publication number
CN117152302A
Authority
CN
China
Prior art keywords
image
target object
information
display
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310967809.0A
Other languages
Chinese (zh)
Inventor
刘莹
程京
徐昊
陈坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Original Assignee
Douyin Vision Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Douyin Vision Co Ltd filed Critical Douyin Vision Co Ltd
Priority to CN202310967809.0A
Publication of CN117152302A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/40 Filling a planar surface by adding surface attributes, e.g. colour or texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The disclosure relates to the technical field of image processing, and provides a method, device, equipment and storage medium for generating a display image of a target object. The method comprises the following steps: acquiring an original image of a target object and extracting a subject region image of the target object from the original image; acquiring description information of the target object and generating candidate background images applicable to the target object based on the description information; and fusing the subject region image with a candidate background image to generate a display image of the target object. With this technical scheme, candidate background images in various styles can be conveniently generated in batches without relying on existing material libraries or picture templates, and the scheme is applicable to display-image generation scenarios across various product categories.

Description

Method, device, equipment and storage medium for generating display image of target object
Technical Field
The disclosure relates to the technical field of image processing, and in particular relates to a method, a device, equipment and a storage medium for generating a display image of a target object.
Background
With the popularization and development of Internet technology, purchasing goods online has become a mainstream trend, so the generation of product display images is of great importance. Product images are currently produced mainly by manual or machine-based methods. Manual production is time-consuming, depends on picture materials, and is difficult to scale to batches; machine production, although it supports batches, still relies on existing material libraries or manually designed picture templates, so the stylistic richness of the resulting images is limited.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide a method, apparatus, device, and storage medium for generating a display image of a target object, so as to solve the problem that product-image generation depends on material libraries or picture templates.
In a first aspect, an embodiment of the present disclosure provides a method for generating a display image of a target object, including: acquiring an original image of a target object, and extracting a main body region image of the target object from the original image; acquiring description information of a target object, and generating a candidate background image applicable to the target object based on the description information; and fusing the main body region image with the candidate background image to generate a display image of the target object.
According to the method for generating display images of target objects provided by the embodiments of the present disclosure, the subject region image of the target object is extracted from the original image, and corresponding candidate background images are generated based on the description information of the target object. Candidate background images can therefore be generated automatically from the characteristics of the target object, conveniently, in batches, and in various styles. The subject region image is then fused with the candidate background images to obtain display images in batches, so the method no longer depends on existing material libraries or picture templates and is applicable to display-image generation scenarios across various product categories.
In a second aspect, an embodiment of the present disclosure provides a device for generating a display image of a target object, including: the image acquisition module is used for acquiring an original image of the target object and extracting a main body area image of the target object from the original image; the descriptive information acquisition module is used for acquiring descriptive information of the target object and generating candidate background images applicable to the target object based on the descriptive information; and the fusion module is used for fusing the main area image and the candidate background image to generate a display image of the target object.
In a third aspect, an embodiment of the present disclosure provides an electronic device, where the electronic device includes a memory and a processor, where the memory is configured to store a computer program, and when the computer program is executed by the processor, implement a method for generating a display image of a target object according to the first aspect or any implementation manner corresponding to the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer readable storage medium, where the computer readable storage medium is configured to store a computer program, where the computer program when executed by a processor implements a method for generating a presentation image of a target object according to the first aspect or any implementation manner corresponding to the first aspect.
Drawings
The features and advantages of the various embodiments of the present disclosure will be more clearly understood by reference to the accompanying drawings, which are schematic and should not be construed as limiting the disclosure in any way, in which:
fig. 1 is a flowchart of a method for generating a presentation image of a target object according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method of generating a presentation image of a target object according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of a method of generating a presentation image of a target object according to an embodiment of the present disclosure;
FIG. 4 is a block diagram of a structure of a device for generating a presentation image of a target object according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by those skilled in the art without inventive effort, based on the embodiments in this disclosure, are intended to fall within the scope of this disclosure.
It will be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly indicate that the requested operation will require acquiring and using the user's personal information. The user can thus autonomously decide, according to the prompt information, whether to provide personal information to the software or hardware, such as an electronic device, application program, server or storage medium, that executes the operations of the technical solution of the present disclosure.
As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user by way of, for example, a popup window, in which the prompt information may be presented as text. In addition, the popup window may carry a selection control with which the user chooses to "agree" or "disagree" to providing personal information to the electronic device.
It will be appreciated that the above notification and user-authorization process is merely illustrative and does not limit the implementations of the present disclosure; other ways of satisfying relevant laws and regulations may also be applied to the implementations of the present disclosure.
Existing product-image production methods fall mainly into the following categories:
Manual production: a designer or operator manually produces product images meeting target requirements for a given product and its sales scenario. This approach has the following drawbacks:
(1) It is time-consuming: producing a single product image usually takes at least 20-30 minutes, with the time mainly spent searching for and selecting materials, arranging the layout, adjusting color matching, and so on;
(2) It is difficult to generate images in batches: because production is slow and depends on picture materials, manual production is usually suitable only for scenarios requiring a small number of images and cannot support batch generation.
Machine production: using existing mature intelligent-typesetting or picture-generation model algorithms, a model selects materials suitable for the current product from a material library, intelligently typesets the picture, and mass-produces product images for manual selection or direct delivery. However, this approach has the following drawbacks:
(1) It depends on an existing material library: elements must be intelligently selected from the material pool, such as the background, button, and decoration elements corresponding to the product image, and then combined and superimposed to generate the target product image;
(2) It depends on manually designed picture templates: result pictures are output according to the layout of templates set by a designer, so the richness of the generated styles is directly tied to the number of manual templates.
In view of this, embodiments of the present disclosure generate the background image and related element images based on the characteristics of the target object itself, eliminating the dependence on an existing material library and manually designed picture templates, adapting to display-image generation scenarios across various product categories, and making it convenient to generate candidate background images of various types in batches.
According to an embodiment of the present disclosure, an embodiment of a method for generating a display image of a target object is provided. It should be noted that the steps shown in the flowcharts of the drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
In this embodiment, a method for generating a display image of a target object is provided, which may be used in an electronic device, such as a computer, a mobile phone, a notebook computer, etc., fig. 1 is a flowchart of a method for generating a display image of a target object according to an embodiment of the present invention, and as shown in fig. 1, the flowchart includes the following steps:
Step S101, an original image of the target object is acquired, and a subject area image of the target object is extracted from the original image.
The target object represents goods of various categories and models, such as refrigerators, air conditioners, kettles, and snacks. The original image is an image of the target object captured or acquired in advance in some spatial usage environment. The subject region image is the image of the target object's subject.
The original image contains the subject of the target object and its current spatial usage environment. For the original image, the position of the subject can be located, the subject contour of the target object identified, and the subject region image extracted from the original image according to that contour.
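As an illustrative sketch only (not part of the patent's disclosure), extracting a subject region image once a binary subject mask is available, with the mask assumed to come from the contour-identification step above, could look like this; the function name and the use of NumPy are assumptions:

```python
import numpy as np

def extract_subject_region(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop the image to the bounding box of the subject mask and blank out
    background pixels (a simple stand-in for full matting)."""
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    crop = image[y0:y1, x0:x1].copy()
    crop[mask[y0:y1, x0:x1] == 0] = 0  # zero non-subject pixels inside the box
    return crop
```

A production pipeline would obtain the mask from a segmentation or matting model rather than from a precomputed array.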
Step S102, the description information of the target object is acquired, and candidate background images applicable to the target object are generated based on the description information.
The description information is textual information describing the characteristics of the target object. The candidate background image is an image of a spatial usage environment suitable for the target object. Specifically, since the description information reflects the characteristics of the target object, combining it with a pre-trained background generation model yields the corresponding candidate background images.
Taking a refrigerator as an example, its description information may indicate a warm home environment. Combining this description information, several candidate background images suitable for the refrigerator can be generated, which may specifically include warm home scenes such as a dining-room background image, a kitchen background image, and a living-room background image.
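A minimal, hypothetical sketch of how such description information might be expanded into per-scene inputs for a generation model; the prompt wording, the scene list, and the function name are all illustrative assumptions, not the patent's actual interface:

```python
def build_background_prompts(description: str, scenes: list) -> list:
    """Form one generation prompt per candidate scene by combining the
    object's description information with a scene name."""
    return [f"{scene}, {description}, product display background, no people"
            for scene in scenes]

# One prompt per desired candidate background image.
prompts = build_background_prompts(
    "warm home environment for a refrigerator",
    ["dining room", "kitchen", "living room"],
)
```

Each prompt would then be fed to the trained background generation model to produce one candidate background image.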
Step S103, fusing the main area image and the candidate background image to generate a display image of the target object.
The subject region image is combined with the candidate background image for feature matching, and the two are fused according to the feature-matching result so that the subject region image stands out in the candidate background image and the fusion between them looks harmonious, yielding a display image of the target object.
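The fusion step can be pictured as alpha compositing the subject crop onto the chosen background. The sketch below, assuming NumPy arrays and a precomputed alpha matte, is a simplified stand-in for the feature-matching-based fusion described above:

```python
import numpy as np

def paste_subject(background: np.ndarray, subject: np.ndarray,
                  alpha: np.ndarray, top: int, left: int) -> np.ndarray:
    """Alpha-blend the subject crop onto the background at (top, left).
    `alpha` is a per-pixel matte in [0, 1] with the subject's spatial shape."""
    out = background.astype(np.float32).copy()
    h, w = subject.shape[:2]
    region = out[top:top + h, left:left + w]
    a = alpha[..., None].astype(np.float32)   # broadcast over color channels
    out[top:top + h, left:left + w] = a * subject + (1.0 - a) * region
    return out.astype(background.dtype)
```

The placement coordinates would come from the matching-region step described later in the document.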
According to the method for generating display images of target objects provided by the embodiments of the present disclosure, the subject region image of the target object is extracted from the original image, and corresponding candidate background images are generated based on the description information of the target object. Candidate background images can therefore be generated automatically from the characteristics of the target object, conveniently, in batches, and in various styles. The subject region image is then fused with the candidate background images to obtain display images in batches, so the method no longer depends on existing material libraries or picture templates and is applicable to display-image generation scenarios across various product categories.
In this embodiment, a method for generating a display image of a target object is provided, which may be used in an electronic device, such as a computer, a mobile phone, a notebook computer, etc., fig. 2 is a flowchart of a method for generating a display image of a target object according to an embodiment of the present invention, and as shown in fig. 2, the flowchart includes the following steps:
step S201, an original image of the target object is acquired, and a subject area image of the target object is extracted from the original image. The detailed description refers to the relevant descriptions of the corresponding steps in the above embodiments, and will not be repeated here.
Step S202, the description information of the target object is acquired, and candidate background images applicable to the target object are generated based on the description information.
The description information can be obtained in two ways: by querying information associated with the target object's identifier, or by identifying attribute features of the target object in the original image.
In some specific embodiments, the step S202 may include:
in step S2021, the object identifier of the target object is acquired, and the object information matching the object identifier is queried.
Wherein the object information includes at least one of image information, text information, audio information, and video information of the target object.
The object identifier is a unique property for characterizing the target object, and the object identifier may be represented by a code, for example, a bar code, a product serial number, etc. corresponding to the target object. The object information represents information describing the target object, and the object information is associated with the object identification, and specifically includes one or more of image information, text information, audio information, and video information, such as an original image of the target object, a target object detail page, a title and text description of the target object, category information of the target object, and the like.
Step S2022 parses one or more description contents related to the target object from the object information, and generates description information of the target object based on the one or more description contents.
The descriptive content is descriptive text of the target object. After determining the object information, the electronic device may identify the description content contained in the object information, and fuse the identified one or more description contents into description information.
For example, for a detail-page picture, the text content in the detail page can be recognized through OCR and used as description content; for the original image, the content it contains can be extracted through an image-captioning model and converted into description content in text form; for video information, the content contained in each frame can be extracted to obtain description content of the target object; for audio information, the speech can be transcribed to text to obtain the corresponding description content. Finally, all the description contents are combined to generate the description information of the target object.
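The final merging of per-modality description contents into one description string can be sketched as follows; the modality keys, the example text, and the separator are illustrative assumptions:

```python
def merge_descriptions(parts: dict) -> str:
    """Join per-modality description snippets (OCR text, image caption,
    transcribed audio, ...) into one description string, skipping empties."""
    return "; ".join(text.strip() for text in parts.values() if text.strip())

# Hypothetical per-modality outputs for a refrigerator listing.
info = {
    "ocr": "double-door refrigerator, 450 L",
    "caption": "a silver refrigerator standing in a bright kitchen",
    "audio": "",
}
description = merge_descriptions(info)
```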
Step S2023: generate candidate background images applicable to the target object based on the description information. The corresponding candidate background images are generated by combining the description information with a pre-trained background generation model. The training method of the background generation model is described in further detail below and is not repeated here.
In some specific embodiments, the step S202 may further include:
in step S2024, the attribute features of the target object are identified in the original image of the target object, and the description information of the target object is generated based on the identified attribute features.
The attribute features represent characteristics of the target object such as class, color, shape, and the like. When the target object does not have the corresponding object identification, the target object main body in the original image can be identified, and the attribute characteristics of the target object are determined. And then, combining the content understanding model and the attribute characteristics of the target object to generate the description information of the target object.
Step S2025: generate candidate background images applicable to the target object based on the description information. The corresponding candidate background images are generated by combining the description information with a pre-trained background generation model. The training method of the background generation model is described in further detail below and is not repeated here.
In some alternative embodiments, the description information of the target object is input into the trained background generation model, and the candidate background image of the target object is output. The training method of the background generation model is as follows:
step a1, acquiring an image sample set of a target object, wherein the image sample set comprises a plurality of image samples of the target object.
Step a2, extracting a background area image except for a target object from any target image sample in the image sample set.
And a step a3 of generating a description sample for describing the target object in the target image sample, and processing the description sample by utilizing a background generation model to output a predicted background image corresponding to the description sample.
And a step a4 of comparing the predicted background image with the background area image and generating error information based on the comparison result so as to correct the background generation model through the error information.
The image sample set is a set of original images collected in advance for the target object, comprising a plurality of original images under different usage scenarios; these original images serve as the image samples. Specifically, the image sample set may be obtained by shooting with an image-capture device or by online collection; the manner of collecting the image sample set is not specifically limited here.
The image sample comprises the target object and the space use environment where the target object is located, and the space use environment exists as the background of the target object. For any target image sample in the image sample set, image segmentation is performed on the target image sample to extract a target object and a background area image.
And describing a target object in the target image sample, and generating a description sample corresponding to the target object. Then, the description sample is input into a background generation model to be subjected to prediction processing, and a prediction background image corresponding to the description sample is output.
And comparing the predicted background image with the background area image to generate a comparison result. And determining error information between the predicted background image and the background area image through the comparison result, and carrying out iterative training on the background generation model by combining the error information so as to realize correction on the background generation model, so that the output result of the background generation model is more accurate.
Here, training of the background image generation model is performed in combination with the background area image in the target image sample and the description text of the target object, so that candidate background images of the target object are generated in batches according to the description information.
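Steps a1 to a4 amount to a standard supervised training loop: predict a background from the description, measure the error against the real background region, and correct the model. The sketch below illustrates one such correction step with a toy linear model and mean-squared error; the real background generation model would be a deep generative network, so everything here is a simplified assumption for illustration:

```python
import numpy as np

def training_step(model_w: np.ndarray, desc_feat: np.ndarray,
                  true_bg: np.ndarray, lr: float = 0.1):
    """One correction step: predict a background from description features
    (step a3), compute error information against the real background region
    (step a4), and nudge the model weights to reduce it."""
    pred_bg = desc_feat @ model_w            # predicted background
    err = pred_bg - true_bg                  # error information
    loss = float(np.mean(err ** 2))
    grad = desc_feat.T @ err / len(desc_feat)
    return model_w - lr * grad, loss
```

Iterating this step over the image sample set is what "correcting the background generation model through the error information" amounts to in this simplified setting.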
Step S203, fusing the main area image and the candidate background image to generate a display image of the target object. The detailed description refers to the corresponding related descriptions of the above embodiments, and will not be repeated here.
According to the method for generating a display image of a target object provided in this embodiment, the object information related to the target object is determined through the target object's identifier, and the description information is obtained by parsing that object information, so the description information can be obtained by relying only on the object identifier, which facilitates subsequent batch generation of candidate background images. Alternatively, the description information is determined through the attribute features of the target object, so that it matches the target object and the accuracy of candidate-background-image generation is ensured.
In this embodiment, a method for generating a display image of a target object is provided, which may be used in an electronic device, such as a computer, a mobile phone, a notebook computer, etc., fig. 3 is a flowchart of a method for generating a display image of a target object according to an embodiment of the present invention, and as shown in fig. 3, the flowchart includes the following steps:
step S301, an original image of the target object is acquired, and a subject area image of the target object is extracted from the original image. The detailed description refers to the relevant descriptions of the corresponding steps in the above embodiments, and will not be repeated here.
In step S302, description information of the target object is acquired, and candidate background images applicable to the target object are generated based on the description information. The detailed description refers to the relevant descriptions of the corresponding steps in the above embodiments, and will not be repeated here.
Step S303, fusing the main area image and the candidate background image to generate a display image of the target object.
Specifically, the step S303 may include:
in step S3031, one or more candidate regions for matching the target object are identified in the candidate background images.
The candidate region is a spatial region suitable for the target object. Specifically, the candidate region may be a spatial region for placing a target object, such as a restaurant, living room, kitchen, floor, table top, wall surface, cooking bench, or the like. The candidate region matches a target object, such as a kettle and a table top, a refrigerator and a floor, etc. Furthermore, the candidate region may also be a scene adapted to the target object. For example, the target object is a refrigerator, and the candidate region may then characterize the environment of the home. For another example, the target object is a vehicle-mounted fragrance, and the candidate region may then characterize the vehicle-mounted environment. After the candidate background image is obtained, identifying environmental features contained in the candidate background image to determine one or more candidate areas capable of matching the target object.
For example, when the target object is a refrigerator, a spatial region for matching the refrigerator may be identified from candidate background images generated for the refrigerator, where the spatial region may be a restaurant floor, a kitchen floor, or any other region suitable for placing the refrigerator, which is not specifically limited herein.
Step S3032, the attribute features of the target object are identified in the subject region image, and the actual matching region is determined in the candidate region based on the attribute features.
The actual matching region is a region that matches the target object. The attribute features are used to characterize the class, color, shape, etc. of the target object. Since the subject area image is an area where the subject of the target object is located in the original image, the attribute features of the target object can be identified from the subject area image. And then, screening out the actual matching area of the target object from the plurality of candidate areas by combining the attribute characteristics of the target object.
For example, when the target object in the subject region image is identified as a teapot and the candidate regions identified from the candidate background image are a cooking bench, a table top, and the ground, the shape and category of the pot can be used to determine, from among the several candidate regions, that the actual matching region is the table top.
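A rule-based version of this candidate-to-object matching (step S3032) might look like the following; the rule table, the region names, and the function name are illustrative assumptions rather than the patent's actual implementation, which would use learned attribute features:

```python
from typing import List, Optional

# Assumed category -> allowed placement regions, in priority order.
PLACEMENT_RULES = {
    "kettle": ["table top", "cooking bench"],
    "refrigerator": ["kitchen floor", "restaurant floor"],
}

def pick_matching_region(category: str, candidates: List[str]) -> Optional[str]:
    """Return the highest-priority region allowed for this category that
    actually appears among the candidate regions, or None if no rule fits."""
    for region in PLACEMENT_RULES.get(category, []):
        if region in candidates:
            return region
    return None
```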
Step S3033, the subject region image is fused to the actual matching region in the candidate background image.
The subject region image matches the actual matching region, i.e., the target subject in the subject region image is suitable for placement in that region. The subject region image is then fused into the actual matching region, achieving effective fusion of the subject region image with the candidate background image and ensuring that the target object blends harmoniously with its environment.
In some optional embodiments, before step S3032 the method may further include: scaling the subject region image according to the size information of a target reference object in the candidate background image to generate a scaled subject region image.
Accordingly, step S3033 may include: fusing the scaled subject region image to the actual matching region in the candidate background image.
To further ensure effective fusion of the target object with its environment, the subject region image must match the size of the target reference object in the candidate background image. The subject region image can therefore be scaled using the size information of the target reference object, producing a scaled subject region image whose size is consistent with that reference.
In this embodiment, adaptively scaling the subject region image to the size of the target reference object optimizes the fusion of the subject region image with the candidate background image, making the resulting display image more accurate and visually appealing.
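The pre-scaling step can be sketched as computing a scale factor from the reference object's pixel size. The assumed real-world height ratio between subject and reference is an illustrative input that the method does not specify:

```python
def scale_factor_for_subject(subject_px_h, reference_px_h, real_height_ratio=1.0):
    """Scale factor that makes the subject's pixel height equal to
    reference_px_h * real_height_ratio, where real_height_ratio is the
    subject's assumed real height relative to the reference object's."""
    target_px_h = reference_px_h * real_height_ratio
    return target_px_h / subject_px_h

# The teapot crop is 400 px tall; a table in the background is 300 px tall,
# and we assume the teapot is half the table's real height.
factor = scale_factor_for_subject(400, 300, real_height_ratio=0.5)
print(factor)  # 0.375
```

Resizing the subject region image by `factor` (e.g. with any image library) then yields the scaled subject region image used in step S3033.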
As an optional implementation, after the display image is generated, descriptive text can be supplemented for the target object. The supplemented text is adaptively adjusted according to the display image and the size of the target object within it, so that the text is fused into the display image and highlights the characteristics of the target object.
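The adaptive adjustment of supplemented text can be sketched as choosing a caption font size from the display image's width. The glyph-width heuristic and the limits below are assumptions for illustration:

```python
def fit_caption_font_size(image_width, text, char_width_ratio=0.6,
                          max_width_ratio=0.9, max_font_size=48):
    """Largest integer font size (capped at max_font_size) such that the
    caption, at roughly char_width_ratio * font_size pixels per glyph,
    stays within max_width_ratio of the image width."""
    pixel_budget = image_width * max_width_ratio
    size = int(pixel_budget / (len(text) * char_width_ratio))
    return min(max_font_size, size)

print(fit_caption_font_size(800, "Handmade ceramic teapot"))  # 48
print(fit_caption_font_size(400, "A much longer caption describing the teapot"))
```

A real implementation would measure the rendered text with the actual font metrics instead of a fixed per-glyph ratio.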
Step S304, the segmentation degree of the target object in the subject region image is determined, and the display effect information of the display image is determined.
The segmentation degree represents how completely and accurately the target object has been extracted into the subject region image. Specifically, determining the segmentation degree of the target object in the subject region image includes the following steps:
Step b1, detect the edge contour of the target object in the subject region image.
Step b2, compare the edge contour with the standard contour of the target object to generate a contour comparison result.
Step b3, identify redundant image information other than the target object in the subject region image, and determine the segmentation degree of the target object in the subject region image based on the redundant image information and the contour comparison result.
The standard contour is the contour that the target object actually has. The subject region image contains the subject of the target object, so the target object's edges can be traced in it to determine its edge contour. At the same time, redundant image information other than the target object is identified in the subject region image.
The standard contour of the target object is compared with the edge contour extracted from the subject region image to determine whether the two are consistent, yielding a contour comparison result. The segmentation degree of the target object is then determined by combining the redundant image information with the contour comparison result.
In the above embodiment, comparing the edge contour of the target object with its standard contour and determining the segmentation degree from the contour comparison result and the redundant image information makes it possible to assess the integrity of the target object and to avoid introducing redundant information during segmentation.
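One way to turn steps b1–b3 into a single number is to score mask agreement and penalize redundant pixels. The IoU-minus-penalty formula below is an illustrative choice, not the one fixed by this embodiment:

```python
import numpy as np

def segmentation_degree(extracted_mask, standard_mask):
    """Score in [0, 1]: overlap (IoU) of the extracted object mask with the
    standard mask, reduced by the fraction of extracted pixels that are
    redundant (lying outside the standard object)."""
    pred = extracted_mask.astype(bool)
    std = standard_mask.astype(bool)
    union = np.logical_or(pred, std).sum()
    iou = np.logical_and(pred, std).sum() / union if union else 0.0
    redundant = np.logical_and(pred, ~std).sum() / max(pred.sum(), 1)
    return float(max(0.0, iou - redundant))

std = np.zeros((4, 4), dtype=bool)
std[1:3, 1:3] = True          # the object's standard region
perfect = std.copy()          # extraction matches exactly
sloppy = std.copy()
sloppy[0, 0] = True           # one redundant background pixel
print(segmentation_degree(perfect, std))            # 1.0
print(round(segmentation_degree(sloppy, std), 3))   # 0.6
```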
The display effect information represents how well the target object is presented in the display image and characterizes the quality of the generated display image. Specifically, the display effect information is determined by a trained quality evaluation model, whose training may include the following steps:
Step c1, acquire a display image sample set, where each display sample in the set carries labeled standard conversion information used to generate the display effect information of that sample.
Step c2, for any target display sample in the display image sample set, process the target display sample with the quality evaluation model to generate predicted conversion information corresponding to that sample.
Step c3, compare the predicted conversion information with the standard conversion information, and generate error information based on the comparison result, so that the quality evaluation model can be corrected through the error information.
The standard conversion information is used to generate display effect information that evaluates how well a display sample performs; specifically, it may include the number of times the sample was viewed, its number of comments, its number of positive reviews, the purchase volume of the target object corresponding to the sample, and the like.
The display image sample set is a set of display samples generated by fusing target objects with candidate background images, and every sample in the set carries corresponding standard conversion information.
For any target display sample in the set, the sample is input into the quality evaluation model, which evaluates its conversion information and outputs the corresponding predicted conversion information.
The predicted conversion information is compared with the standard conversion information to obtain a comparison result, from which the error information between the two is determined. The quality evaluation model is then iteratively trained with this error information, correcting the model so that the conversion information it outputs becomes more accurate.
In the above embodiment, the quality evaluation model is trained to evaluate the display effect information of display images, which helps ensure that the finally selected display image accords with the subject characteristics of the target object and matches it closely.
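Steps c1–c3 amount to supervised regression from a display sample's features to its conversion information. The embodiment does not fix a model family; the toy linear regressor trained by gradient descent below only illustrates the predict–compare–correct loop, and the data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sample set: 64 display samples, each with 8 extracted image
# features and one labeled conversion metric (views, likes, purchases, ...).
features = rng.normal(size=(64, 8))
true_weights = rng.normal(size=8)
standard_conversion = features @ true_weights + 0.01 * rng.normal(size=64)

weights = np.zeros(8)                         # the "quality evaluation model"
for _ in range(500):
    predicted = features @ weights            # step c2: predicted conversion
    error = predicted - standard_conversion   # step c3: compare with labels
    weights -= 0.1 * features.T @ error / len(error)  # correct via the error

mse = float(np.mean((features @ weights - standard_conversion) ** 2))
print(mse < 1e-2)  # True once the loop has converged
```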
Step S305, quality information of the display image is generated according to the segmentation degree and the display effect information, so as to decide whether to keep the display image based on the quality information.
The quality information represents both how well the displayed target object fuses with the candidate background image and how complete the target object is. It can be determined by combining the segmentation degree with the display effect information. If the target object fuses poorly with the candidate background image and/or its integrity is poor, the display image can be judged to be poorly generated and deleted; otherwise, the display image is well generated and is retained.
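Step S305 leaves the combination rule open; a minimal sketch is a weighted sum with a retention threshold, where the weights and threshold are illustrative choices:

```python
def keep_display_image(segmentation_degree, effect_score,
                       w_seg=0.5, w_eff=0.5, threshold=0.6):
    """Combine the two scores (both assumed to lie in [0, 1]) into quality
    information and decide whether to retain the display image."""
    quality = w_seg * segmentation_degree + w_eff * effect_score
    return quality >= threshold

print(keep_display_image(0.9, 0.8))  # True  (quality 0.85)
print(keep_display_image(0.4, 0.2))  # False (quality 0.30)
```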
In some optional embodiments, if a plurality of display images have been generated for the target object, the method may further include: ranking the display images according to the display effect information of each image, and selecting the actual display image of the target object based on the ranking result.
Each display image is given a quality score based on its display effect information. The display images are then ranked by score, from high to low or from low to high, to obtain a ranking result, from which a display image with a higher score can be selected and designated as the actual display image.
Ranking the display images by display effect information makes it possible to select, from among them, an actual display image with a higher degree of matching, so that the actual display image accurately represents the characteristics and usage scenarios of the target object.
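The ranking-and-selection step can be sketched directly; the effect fields and weights below are illustrative assumptions:

```python
def select_actual_display(display_images, weights=(0.5, 0.3, 0.2)):
    """Score each display image from its effect metrics and return the
    highest-scoring one as the actual display image."""
    def score(img):
        e = img["effects"]
        return sum(w * v for w, v in zip(
            weights, (e["views"], e["likes"], e["purchases"])))
    return max(display_images, key=score)

images = [
    {"id": "a", "effects": {"views": 0.2, "likes": 0.1, "purchases": 0.0}},
    {"id": "b", "effects": {"views": 0.9, "likes": 0.6, "purchases": 0.4}},
]
print(select_actual_display(images)["id"])  # b
```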
According to the method for generating a display image of a target object provided by this embodiment, the actual matching region for the target object is determined by combining the candidate regions with the target object's attribute features. The fusion layout of the subject region image and the candidate background image can therefore be adjusted intelligently for different backgrounds and attribute features, so that the fused display image accurately represents the scenarios in which the target object is used. The generation quality of each display image is determined from the segmentation degree of the target object in the subject region image and the display effect information of the display image, making it convenient to screen out display images of higher quality.
This embodiment also provides a device for generating a display image of a target object, which implements the foregoing embodiments and preferred implementations; details already described are not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
The present embodiment provides a device for generating a display image of a target object, as shown in fig. 4, including:
the image acquisition module 401 is configured to acquire an original image of a target object, and extract a subject area image of the target object from the original image.
The description information obtaining module 402 is configured to obtain description information of the target object, and generate a candidate background image applicable to the target object based on the description information.
The fusion module 403 is configured to fuse the main region image with the candidate background image to generate a display image of the target object.
In some alternative embodiments, the description information obtaining module 402 may include:
An object information acquisition unit, configured to acquire the object identifier of the target object and query object information matching the identifier, where the object information includes at least one of image information, text information, audio information, and video information of the target object.
An analysis unit, configured to parse one or more pieces of descriptive content related to the target object from the object information and to generate the description information of the target object based on that descriptive content.
In some alternative embodiments, the description information obtaining module 402 may further include:
A feature recognition unit, configured to identify attribute features of the target object in the original image of the target object and to generate the description information of the target object based on the identified attribute features.
In some alternative embodiments, the description information obtaining module 402 may further include:
and the background generation model training unit is used for training the background generation model.
Specifically, the background generation model training unit is configured to: acquiring an image sample set of a target object, wherein the image sample set comprises a plurality of image samples of the target object; extracting a background area image except for a target object from a target image sample aiming at any target image sample in an image sample set; generating a description sample for describing a target object in the target image sample, and processing the description sample by utilizing a background generation model to output a prediction background image corresponding to the description sample; and comparing the predicted background image with the background area image, and generating error information based on the comparison result so as to correct the background generation model through the error information.
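The background generation model's training loop mirrors the quality model's: generate a predicted background from the description sample, compare it with the background region extracted from the image sample, and correct the model via the error. The tiny linear "generator" below only illustrates that loop on synthetic data; a real system would use a text-conditioned image generation model:

```python
import numpy as np

rng = np.random.default_rng(1)
desc_dim, pix_dim = 16, 64              # description embedding -> 8x8 "background"
desc_embeddings = rng.normal(size=(32, desc_dim))   # one per image sample
true_map = rng.normal(size=(desc_dim, pix_dim))
background_regions = desc_embeddings @ true_map     # extracted background images

generator = np.zeros((desc_dim, pix_dim))           # the "background generation model"
for _ in range(1500):
    predicted_bg = desc_embeddings @ generator      # predicted background image
    error = predicted_bg - background_regions       # compare with background region
    generator -= 0.2 * desc_embeddings.T @ error / len(desc_embeddings)

mse = float(np.mean((desc_embeddings @ generator - background_regions) ** 2))
print(mse < 1e-3)  # True after training
```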
In some alternative embodiments, the fusing module 403 may include:
A candidate region identification unit, configured to identify, in the candidate background image, one or more candidate regions for matching the target object.
An attribute feature recognition unit, configured to identify attribute features of the target object in the subject region image and to determine the actual matching region among the candidate regions based on those features.
A fusion unit, configured to fuse the subject region image to the actual matching region in the candidate background image.
In some alternative embodiments, the fusing module 403 may further include:
A scaling unit, configured to scale the subject region image according to the size information of the target reference object in the candidate background image, so as to generate a scaled subject region image.
Correspondingly, the fusion unit is further configured to fuse the scaled subject region image to the actual matching region in the candidate background image.
In some optional embodiments, the generating device of the display image of the target object may further include:
An information determination unit, configured to determine the segmentation degree of the target object in the subject region image and the display effect information of the display image.
A quality determination unit, configured to generate quality information of the display image according to the segmentation degree and the display effect information, so as to decide whether to retain the display image according to the quality information.
In some optional embodiments, the information determining unit includes:
a segmentation degree determination subunit, configured to detect an edge contour of the target object in the main region image; comparing the edge contour with a standard contour of the target object to generate a contour comparison result; redundant image information other than the target object is identified in the subject area image, and the degree of segmentation of the target object in the subject area image is determined based on the redundant image information and the contour comparison result.
And the quality evaluation model training subunit is used for training the quality evaluation model to determine quality information.
Specifically, the quality assessment model training subunit is configured to: acquiring a display image sample set, wherein display samples in the display image sample set are provided with marked standard conversion information, and the standard conversion information is used for generating display effect information of the display samples; aiming at any target display sample in the display image sample set, processing the target display sample by using a quality evaluation model to generate prediction conversion information corresponding to the target display sample; comparing the predicted transformation information with the standard transformation information, and generating error information based on the comparison result to correct the quality assessment model through the error information.
In some optional embodiments, the generating device of the display image of the target object may further include:
the sorting module is used for sorting the plurality of display images according to the display effect information of each display image, and selecting the actual display image of the target object based on the sorting result.
For specific processing logic of each module and each unit, reference may be made to the description of the foregoing method embodiment, and details are not repeated herein.
The respective units set forth in the above embodiments may be implemented by a computer chip or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above device is described with its functions divided into various units. When implementing the present application, the functions of the units may of course be implemented in one or more pieces of software and/or hardware.
According to the device for generating a display image of a target object provided by this embodiment, the subject region image of the target object is extracted from the original image, and candidate background images are generated from the target object's description information. Candidate background images can thus be generated automatically from the characteristics of the target object, which makes it convenient to generate candidate background images of various styles in batches; fusing the subject region images with the candidate background images then yields display images in batches. The device therefore no longer depends on an existing material library or picture templates and is applicable to display image generation scenarios for a wide range of object categories.
An embodiment of the invention further provides an electronic device equipped with the device for generating a display image of a target object shown in fig. 4.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an alternative embodiment of the present invention. As shown in fig. 5, the electronic device includes: one or more processors 10, a memory 20, and interfaces for connecting the components, including high-speed and low-speed interfaces. The components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to one of the interfaces. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Likewise, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 5.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.
The memory 20 stores instructions executable by the at least one processor 10, so that the at least one processor 10 can perform the methods of the embodiments described above.
The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the electronic device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The electronic device further comprises input means 30 and output means 40. The processor 10, memory 20, input device 30, and output device 40 may be connected by a bus or other means, for example in fig. 5.
The input device 30 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus; examples include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output device 40 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. Such display devices include, but are not limited to, liquid crystal displays, light-emitting diode displays, and plasma displays. In some alternative implementations, the display device may be a touch screen.
The electronic device also includes a communication interface for data communication between the electronic device and other devices or communication networks.
The embodiments of the present invention also provide a computer-readable storage medium. The methods of the embodiments described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code originally stored on a remote or non-transitory machine-readable storage medium that is downloaded over a network and stored on a local storage medium, so that the methods described herein can be carried out by software stored on a storage medium and executed by a general-purpose computer, a dedicated processor, or programmable or dedicated hardware. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, a flash memory, a hard disk, a solid-state disk, or the like; further, the storage medium may also comprise a combination of the above types of memory. It will be appreciated that the computer, processor, microprocessor controller, or programmable hardware includes a storage element that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, implements the methods illustrated in the above embodiments.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments can be referenced across them, and each embodiment focuses on its differences from the others. In particular, the embodiments of the device, equipment, and storage medium are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The foregoing is merely exemplary of the present application and is not intended to limit it. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the application shall fall within the scope of its claims.
Although embodiments of the present disclosure have been described with reference to the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the disclosure, and such modifications and variations fall within the scope as defined by the appended claims.

Claims (13)

1. A method for generating a presentation image of a target object, the method comprising:
acquiring an original image of a target object, and extracting a main body region image of the target object from the original image;
acquiring description information of the target object, and generating a candidate background image applicable to the target object based on the description information;
and fusing the main area image and the candidate background image to generate a display image of the target object.
2. The method according to claim 1, wherein the obtaining the description information of the target object includes:
Acquiring an object identifier of the target object, and inquiring object information matched with the object identifier, wherein the object information comprises at least one of image information, text information, audio information and video information of the target object;
and analyzing one or more descriptive contents related to the target object from the object information, and generating descriptive information of the target object based on the one or more descriptive contents.
3. The method according to claim 1, wherein the obtaining the description information of the target object includes:
and identifying the attribute characteristics of the target object in the original image of the target object, and generating the description information of the target object based on the identified attribute characteristics.
4. The method according to claim 1, wherein the step of generating candidate background images applicable to the target object based on the description information is performed by a trained background generation model; the background generation model is trained according to the following mode:
acquiring an image sample set of the target object, wherein the image sample set comprises a plurality of image samples of the target object;
Extracting a background area image except for the target object from any target image sample in the image sample set;
generating a description sample for describing the target object in the target image sample, and processing the description sample by utilizing the background generation model to output a predicted background image corresponding to the description sample;
and comparing the predicted background image with the background area image, and generating error information based on a comparison result so as to correct the background generation model through the error information.
5. The method of claim 1, wherein the fusing the subject area image with the candidate background image comprises:
identifying, in the candidate background image, one or more candidate regions for matching the target object;
identifying attribute features of the target object in the subject region image, and determining an actual matching region in the candidate region based on the attribute features;
the subject region image is fused to the candidate background image at the actual matching region.
6. The method of claim 5, wherein prior to fusing the subject region image at the actual matching region in the candidate background image, the method further comprises:
and scaling the main body region image according to the size information of the target reference object in the candidate background image so as to generate a scaled main body region image.
7. The method of claim 1, wherein after generating the presentation image of the target object, the method further comprises:
determining the segmentation degree of the target object in the main area image, and determining the display effect information of the display image;
and generating quality information of the display image according to the segmentation degree and the display effect information, so as to judge whether to reserve the display image according to the quality information.
8. The method of claim 7, wherein determining a degree of segmentation of the target object in the subject region image comprises:
detecting an edge contour of the target object in the main body area image;
comparing the edge profile with a standard profile of the target object to generate a profile comparison result;
Redundant image information other than the target object is identified in the main body area image, and the degree of segmentation of the target object in the main body area image is determined based on the redundant image information and the contour comparison result.
9. The method of claim 7, wherein the step of determining the presentation effect information of the presentation image is performed by a trained quality assessment model trained in the following manner:
acquiring a display image sample set, wherein display samples in the display image sample set are provided with marked standard conversion information, and the standard conversion information is used for generating display effect information of the display samples;
processing the target display sample by using the quality evaluation model aiming at any target display sample in the display image sample set to generate prediction conversion information corresponding to the target display sample;
comparing the predicted transformation information with the standard transformation information, and generating error information based on a comparison result to correct the quality assessment model through the error information.
10. The method of claim 1 or 7, wherein if generating a plurality of presentation images of the target object, the method further comprises:
and sorting the plurality of display images according to the display effect information of each display image, and selecting the actual display image of the target object based on the sorting result.
11. A device for generating a presentation image of a target object, the device comprising:
the image acquisition module is used for acquiring an original image of a target object and extracting a main body area image of the target object from the original image;
the descriptive information acquisition module is used for acquiring descriptive information of the target object and generating candidate background images applicable to the target object based on the descriptive information;
and the fusion module is used for fusing the main area image and the candidate background image to generate a display image of the target object.
12. An electronic device comprising a memory and a processor, the memory for storing a computer program which, when executed by the processor, implements a method of generating a presentation image of a target object as claimed in any one of claims 1 to 10.
13. A computer-readable storage medium for storing a computer program which, when executed by a processor, implements a method of generating a presentation image of a target object according to any one of claims 1 to 10.
CN202310967809.0A 2023-08-02 2023-08-02 Method, device, equipment and storage medium for generating display image of target object Pending CN117152302A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310967809.0A CN117152302A (en) 2023-08-02 2023-08-02 Method, device, equipment and storage medium for generating display image of target object

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310967809.0A CN117152302A (en) 2023-08-02 2023-08-02 Method, device, equipment and storage medium for generating display image of target object

Publications (1)

Publication Number Publication Date
CN117152302A true CN117152302A (en) 2023-12-01

Family

ID=88899628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310967809.0A Pending CN117152302A (en) 2023-08-02 2023-08-02 Method, device, equipment and storage medium for generating display image of target object

Country Status (1)

Country Link
CN (1) CN117152302A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117689782A (en) * 2024-02-02 2024-03-12 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for generating poster image
CN117689782B (en) * 2024-02-02 2024-05-28 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for generating poster image
CN117710234A (en) * 2024-02-06 2024-03-15 青岛海尔科技有限公司 Picture generation method, device, equipment and medium based on large model
CN117710234B (en) * 2024-02-06 2024-05-24 青岛海尔科技有限公司 Picture generation method, device, equipment and medium based on large model

Similar Documents

Publication Publication Date Title
CN112771472B (en) System and method for providing real-time product interactive assistance
CN117152302A (en) Method, device, equipment and storage medium for generating display image of target object
US10083357B2 (en) Image-based item location identification
KR20190116199A (en) Video data processing method, device and readable storage medium
CN110674408B (en) Service platform, and real-time generation method and device of training sample
CN107193962B (en) Intelligent map matching method and device for Internet promotion information
CN107766349B (en) Method, device, equipment and client for generating text
CN107729361B (en) Automatic synthesized picture pushing method and device and storage medium
CN115236260B (en) Chromatographic data storage method and device, electronic equipment and storage medium
CN111008253A (en) Data model generation method, data warehouse generation device and electronic equipment
CN107193571A (en) Method, mobile terminal and storage medium that interface is pushed
CN111429148B (en) Evaluation interface entering method and device
JPWO2009031297A1 (en) Image search device, image classification device and method, and program
CN105809162B (en) Method and device for acquiring WIFI hotspot and picture associated information
CN111191133A (en) Service search processing method, device and equipment
JP2015022533A (en) Tint design evaluation device, and tint design evaluation method
CN109918717B (en) BIM (building information modeling) component generation method and device based on BIM software and storage device
CN114936000A (en) Vehicle-mounted machine interaction method, system, medium and equipment based on picture framework
CN111292153B (en) Information recommendation method, device, system and storage medium
CN112699311A (en) Information pushing method, storage medium and electronic equipment
TW202016723A (en) Method for adaptively adjusting amount of information in user interface design and electronic device
CN114935994A (en) Article data processing method, device and storage medium
CN114782125A (en) Product configuration method, device, computer equipment and storage medium
CN112948629A (en) Content distribution method and device, electronic equipment and computer storage medium
CN114510591B (en) Digital image generating and typesetting system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination