WO2024104477A1 - Image generation method and apparatus, electronic device and storage medium - Google Patents

Image generation method and apparatus, electronic device and storage medium

Info

Publication number
WO2024104477A1
Authority
WO
WIPO (PCT)
Prior art keywords: image, target, original, target image, style
Application number
PCT/CN2023/132440
Other languages
English (en)
Chinese (zh)
Inventor
王晶
苗旺
徐雨旸
徐丁丁
刘松伟
Original Assignee
北京字跳网络技术有限公司 (Beijing Zitiao Network Technology Co., Ltd.)
Application filed by 北京字跳网络技术有限公司 (Beijing Zitiao Network Technology Co., Ltd.)
Publication of WO2024104477A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction

Definitions

  • the embodiments of the present disclosure relate to the field of image generation technology, and in particular, to an image generation method, device, electronic device, and storage medium.
  • the embodiments of the present disclosure provide an image generation method, an apparatus, an electronic device, and a storage medium to overcome the problem that the generated images have little image information and poor visual expression.
  • an embodiment of the present disclosure provides an image generation method, comprising:
  • an image generating device including:
  • An acquisition module used for acquiring at least two original images
  • a migration module is used to perform image style migration on at least two of the original images to obtain a corresponding target image frame having a target image style
  • the combination module is used to determine a target puzzle layout according to image contents corresponding to at least two of the original images, and to combine at least two of the target image frames according to the target puzzle layout to generate a target puzzle.
  • an electronic device including:
  • a processor and a memory communicatively connected to the processor
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory to implement the image generating method as described in the first aspect and various possible designs of the first aspect.
  • an embodiment of the present disclosure provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the image generation method described in the first aspect and its various possible designs is implemented.
  • an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the image generation method described in the first aspect and various possible designs of the first aspect.
  • the image generation method, device, electronic device and storage medium provided in the present embodiment obtain at least two original images, and perform image style transfer on at least two of the original images respectively to obtain corresponding target image frames with target image style; determine the target puzzle layout according to the image content corresponding to at least two of the original images; and combine at least two of the target image frames according to the target puzzle layout to generate a target puzzle.
  • the target image frame with a specific image style is obtained by performing style transfer on at least two original images, and then the target image frames are arranged and combined to obtain a target puzzle with a layout matching the content of multiple original images
  • the target puzzle can not only display multiple frames of images with style special effects, but also display the content relevance of multiple frames of images with style special effects through the puzzle layout, thereby achieving full display of effective information in multiple frames of original images and improving visual expression.
  • FIG. 1 is a diagram of an application scenario of the image generation method provided by an embodiment of the present disclosure.
  • FIG. 2 is a flowchart of an image generation method according to an embodiment of the present disclosure.
  • FIG. 3 is a flowchart of a specific implementation of step S102 in the embodiment shown in FIG. 2.
  • FIG. 4 is a flowchart of a specific implementation of step S1022 in the embodiment shown in FIG. 3.
  • FIG. 5 is a schematic diagram of determining a first image frame provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a target puzzle layout of a target puzzle provided by an embodiment of the present disclosure.
  • FIG. 7 is a second flowchart of the image generation method provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of a cropped image provided by an embodiment of the present disclosure.
  • FIG. 9 is a flowchart of a specific implementation of step S206 in the embodiment shown in FIG. 7.
  • FIG. 10 is a flowchart of a specific implementation of step S208 in the embodiment shown in FIG. 7.
  • FIG. 11 is a structural block diagram of an image generating device provided by an embodiment of the present disclosure.
  • FIG. 12 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure.
  • FIG. 13 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of the present disclosure.
  • FIG. 1 is a diagram of an application scenario of the image generation method provided in an embodiment of the present disclosure.
  • the image generation method provided in an embodiment of the present disclosure can be applied to application scenarios of video image editing and processing such as video cover generation and video conversion to a picture set.
  • the method provided in an embodiment of the present disclosure can be applied to a terminal device or a server.
  • a video editing application (APP) runs on the terminal device.
  • After the user obtains the video to be processed by shooting, downloading it from a server, or receiving it from another terminal device, the user loads the video into a video editing application (shown as an App in the figure) that applies the image generation method provided in an embodiment of the present application. The terminal device then processes the video with the application and, once processing is complete, generates from the video content a target image capable of representing that content, so that the target image is used as the video cover of the video to be processed.
  • In the prior art, taking the creation of a video cover during video content creation as an example, the user operates the terminal device to generate a corresponding image from the video data to serve as the video cover, so that the video content can be previewed and displayed.
  • the above-mentioned image is usually generated by extracting a certain video frame from the video based on the user's manual selection.
  • video data, however, includes multiple frames of images whose content differs from frame to frame; taking a single frame as the video cover cannot fully express the content of the video data, so the cover carries only a small amount of image information.
  • the embodiments of the present disclosure provide an image generation method to solve the above problems.
  • FIG. 2 is a flow chart of an image generation method provided by an embodiment of the present disclosure.
  • the method of this embodiment can be applied in a terminal device, and the image generation method includes:
  • Step S101 Acquire at least two original images.
  • the original image is an image used as the material of the target puzzle, and the original image can be obtained by extracting frames from the material data.
  • the material data can be a video, or a picture set, or a collection of the two. Taking the material data as a video as an example, refer to the application scenario diagram shown in Figure 1, and the material data corresponds to the video to be processed in the embodiment shown in Figure 1.
  • the material data can be obtained by shooting by the user through the image acquisition unit of the terminal device, such as a camera; it can also be obtained by accessing the server for downloading, or by receiving data sent by other terminal devices. It can be set here as needed, and no further examples are given.
  • the material data includes at least two frames of material images.
  • the material data is composed of multiple video frames (material images).
  • the video frames constituting the initial image data can be obtained.
  • the video frames (material images) in the material data are screened to obtain images that meet the requirements of the rules, that is, the original images.
  • the key frames (I-frames) in the material data are used as the original images.
  • the method of determining and retrieving key frames in video data is prior art and will not be described in detail here.
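  • Purely as an illustration (the disclosure does not prescribe a tool), key frames can be dumped with ffmpeg's pict_type selector; the sketch below assumes ffmpeg is installed and wraps it from Python:

```python
import subprocess
from pathlib import Path

def extract_key_frames(video_path: str, out_dir: str) -> list[Path]:
    """Dump the I-frames (key frames) of a video as PNG files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video_path,
         "-vf", "select='eq(pict_type,I)'",  # keep only intra-coded frames
         "-vsync", "vfr",                    # one output image per kept frame
         str(out / "keyframe_%04d.png")],
        check=True,
    )
    return sorted(out.glob("keyframe_*.png"))
```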
  • the material data may be frame extracted based on the content of the material data, thereby obtaining an original image of the material used to generate a subsequent target puzzle.
  • the specific implementation of step S101 includes:
  • Step S1011 Acquire material data, where the material data includes a video and/or a picture set;
  • Step S1012 extracting frames of the material data according to the image content of the material images in the material data to obtain at least two original images, where the material images are video frames in the video and/or pictures in the picture set.
  • image recognition is first performed on each material image in the material data to obtain the image content of each material image, wherein there are multiple specific implementation methods of the image content, for example, it can be a feature matrix (feature) describing the image content, it can also be a pixel matrix describing the image content, or it can be an identifier representing the specific content in the image. More specifically, for example, when the image includes a portrait, the corresponding identifier (image content) is #001, and when the image includes a landscape, the corresponding identifier (image content) is #002.
  • the content can be further subdivided on this basis to obtain a more detailed identifier.
  • the mapping relationship between the identifier and the specific content in the image can be set based on specific needs, and examples are not given here one by one.
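  • For illustration only, such a mapping can be as simple as a lookup table; the two identifiers below are the examples named above, and the fallback value is a hypothetical choice:

```python
# Hypothetical identifier table following the examples above; the disclosure
# leaves the exact mapping between recognized content and identifiers open.
CONTENT_IDS = {
    "portrait": "#001",
    "landscape": "#002",
}

def image_content_id(label: str) -> str:
    # "#000" is an assumed fallback for unrecognized content
    return CONTENT_IDS.get(label, "#000")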
  • step S1012 includes:
  • Step S1012A: Based on the image content of the material image, obtain the posture similarity corresponding to the material image, where the posture similarity represents the similarity between the posture of the human element in the image content and the target posture.
  • Step S1012B: Determine at least two original images according to the posture similarities corresponding to the material images.
  • the image content of the material image is obtained.
  • the image content includes human elements, that is, image elements related to a human portrait, such as the head, torso, limbs, hands, and the portrait as a whole.
  • the human elements present different postures.
  • the posture of the human elements in the image content is compared with the preset target posture to obtain the posture similarity.
  • the specific calculation method of posture similarity can be implemented based on the image consistency algorithm.
  • the image consistency algorithm is prior art known to those skilled in the art and will not be repeated here.
  • the posture of the human elements includes, for example: one or more of facial expressions, movements of the limbs and torso, and hand movements.
  • the purpose of obtaining the posture similarity corresponding to each material image is to evaluate whether the material image can express the important content in the material data, so that it can be screened as an original image serving as material for the subsequently generated target puzzle.
  • the target posture includes a plurality of preset postures of human elements, for example a smiling facial expression, a laughing facial expression, or the limb movement of waving, which serve as rules for judging whether the posture of a human element can express effective information (for example, a happy or angry emotion) and whether it conforms to aesthetic characteristics.
  • in this way, the posture similarity corresponding to each material image is obtained, and the images whose human postures are more meaningful and more in line with aesthetic characteristics are screened out based on the posture similarity, so that the subsequently generated target image frames and target puzzle can express richer effective information and present more attractive portraits.
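  • The disclosure does not fix a particular consistency algorithm. As one possible sketch, posture similarity can be computed as the cosine similarity between centered pose keypoints, where the keypoint arrays are assumed to come from an upstream pose estimator:

```python
import numpy as np

def posture_similarity(kp_a: np.ndarray, kp_b: np.ndarray) -> float:
    """Cosine similarity of two (N, 2) keypoint sets, centered to remove
    translation; 1.0 means the postures are (up to centering) identical."""
    a = (kp_a - kp_a.mean(axis=0)).ravel()
    b = (kp_b - kp_b.mean(axis=0)).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```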
  • the specific implementation method of determining at least two original images includes:
  • the material image whose posture similarity is greater than a first similarity threshold is determined as an original image, and/or the material image whose posture similarity is less than a second similarity threshold is determined as an original image; wherein the first similarity threshold is greater than the second similarity threshold.
  • FIG. 5 is a schematic diagram of determining an original image provided by an embodiment of the present disclosure; the process of determining at least two original images in the above steps is described below in conjunction with FIG. 5.
  • among them, the first similarity threshold (shown as p1 in the figure) is greater than the second similarity threshold (shown as p2 in the figure).
  • for a material image whose posture similarity is below p2 (material image C in the figure), the posture of the human figure is one deliberately designed by the user; although it may fail to meet aesthetic characteristics (because of the large difference from the target posture), it contains more information, so material image C is also determined to be an original image.
  • for a material image whose posture similarity lies between p2 and p1 (material image B in the figure), the posture is regarded as a degraded target posture that cannot meet the requirements of aesthetic features; at the same time, being close to the target posture, it does not reflect enough additional information. Material image B is therefore excluded and not used as an original image.
  • in this way, the material images whose posture similarity is greater than the first similarity threshold and/or less than the second similarity threshold are determined as the original images, thereby ensuring that the original images contain more information while taking aesthetic characteristics into account, achieving a full display of the data content in the material data and improving the information content and visual quality of the subsequently generated target image frames and target puzzles.
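  • A minimal sketch of this two-threshold screening rule (function and parameter names are illustrative, not from the disclosure):

```python
def select_original_images(material_images, similarities,
                           p1: float, p2: float) -> list:
    """Keep frames that are either close to the target posture (s > p1,
    aesthetically pleasing) or deliberately far from it (s < p2,
    information-rich); frames in between are discarded."""
    assert p1 > p2, "first threshold must exceed the second"
    return [img for img, s in zip(material_images, similarities)
            if s > p1 or s < p2]
```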
  • Step S102: Perform image style transfer on at least two original images respectively to obtain corresponding target image frames having the target image style.
  • image style transfer is performed on the multiple original images respectively.
  • the material data includes 30 key frames in total.
  • the 30 key frames are used as original images, and image style transfer is performed respectively to obtain 30 corresponding image frames with the same specific image style (target image style), namely, target image frames.
  • Image style transfer refers to adding image style effects to an image so that the processed image has a certain image painting style in color and line, such as oil painting style, comic style, sketch style, etc.
  • the specific implementation method is, for example, to process the original images separately through a pre-trained style transfer model that can achieve the target image style, so as to obtain images with the target image style, namely, target image frames.
  • the specific training and use of the style transfer model are known to those skilled in the art and will not be described in detail here.
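  • The disclosure leaves the transfer network itself unspecified. As a sketch, applying any pre-trained image-to-image PyTorch model could look like this (the model object is an assumption):

```python
import torch
from PIL import Image
from torchvision import transforms

def stylize(model: torch.nn.Module, image: Image.Image) -> Image.Image:
    """Run one original image through a pre-trained style transfer network
    and return the corresponding target image frame."""
    x = transforms.ToTensor()(image).unsqueeze(0)  # 1 x C x H x W in [0, 1]
    with torch.no_grad():
        y = model(x).clamp(0.0, 1.0).squeeze(0)
    return transforms.ToPILImage()(y)

# e.g. target_frames = [stylize(style_model, img) for img in original_images]
```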
  • Step S103 determining a target puzzle layout according to image contents corresponding to at least two original images.
  • Step S104 combining at least two target image frames according to a target puzzle layout to generate a target puzzle.
  • each target image frame is spliced and combined to obtain an image with certain layout rules, namely, a target puzzle.
  • the target puzzle includes at least two puzzle areas, each puzzle area is used to display a corresponding target image frame, and the target puzzle layout characterizes the size and/or position of the puzzle area in the target puzzle.
  • Figure 6 is a schematic diagram of a target puzzle layout of a target puzzle provided in an embodiment of the present disclosure.
  • the target puzzle is composed of four target image frames, namely, target image frame A, target image frame B, target image frame C and target image frame D, wherein each target image frame corresponds to a puzzle area, the puzzle area corresponding to target image frame A is relatively large and is located on the left side of the target puzzle, and the puzzle areas corresponding to target image frames B, target image frames C and target image frames D are relatively small and are located on the right side of the target puzzle.
  • the target puzzle layout of the target puzzle is not randomly generated, but is determined based on the image content of each original image.
  • the step of generating the target puzzle layout includes:
  • Step S103A obtaining layout information according to the image contents corresponding to at least two original images in the initial image data, wherein the layout information represents the size and/or position of each puzzle area in the target puzzle.
  • Step S103B Generate a target puzzle layout according to the layout information.
  • for example, the initial image data is video data introducing clothing matching.
  • the image content of the target image frame A corresponds to the overall portrait (character element, the same below), which is located in the most prominent main position on the left side of the target puzzle to show the overall clothing matching effect of the character; while the image content of the target image frame B corresponds to the front of the portrait, the image content of the target image frame C corresponds to the back of the portrait, and the image content of the target image frame D corresponds to the side of the portrait, all of which are located in a secondary position on the right side of the target puzzle, for showing the clothing matching effects of the character on the front, back and side, so that the target puzzle can realize the display of important content information in the video data (initial image data) (the front, back, side and overall effect of clothing matching), thereby increasing the amount of information of the target puzzle.
  • the method for obtaining the image content of each target image frame has been introduced in the previous step.
  • the target image frames can be sorted according to the posture similarity of the target posture or the prominence of the aesthetic features, so as to determine the area and position of the puzzle area corresponding to each target video frame, and then determine the target puzzle layout.
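  • One way (among many) to represent such a layout and assemble the puzzle is a list of rectangles plus a paste loop; the FIG. 6-style arrangement below is illustrative, with sizes chosen arbitrarily:

```python
from dataclasses import dataclass
from PIL import Image

@dataclass
class PuzzleArea:
    x: int  # top-left corner of the area inside the target puzzle
    y: int
    w: int  # area size; more important frames get larger areas
    h: int

def compose_puzzle(frames: list[Image.Image], layout: list[PuzzleArea],
                   canvas_size: tuple[int, int]) -> Image.Image:
    """Paste each target image frame into its puzzle area."""
    canvas = Image.new("RGB", canvas_size)
    for frame, area in zip(frames, layout):
        canvas.paste(frame.resize((area.w, area.h)), (area.x, area.y))
    return canvas

# A FIG. 6-style layout: one dominant area (frame A) on the left,
# three secondary areas (frames B, C, D) stacked on the right.
FIG6_LAYOUT = [PuzzleArea(0, 0, 480, 720), PuzzleArea(480, 0, 240, 240),
               PuzzleArea(480, 240, 240, 240), PuzzleArea(480, 480, 240, 240)]
```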
  • At least two original images are obtained, and image style transfer is performed on the at least two original images respectively to obtain corresponding target image frames with target image style; the target puzzle layout is determined according to the image content corresponding to the at least two original images; and the at least two target image frames are combined according to the target puzzle layout to generate a target puzzle. Since the target image frame with a specific image style is obtained by performing style transfer on at least two original images, and the target image frames are arranged and combined to obtain a target puzzle with a layout matching the content of multiple original images, the target puzzle can not only display multiple frames of images with style special effects, but also display the relevance of the content of multiple frames of images with style special effects through the puzzle layout, thereby achieving full display of effective information in multiple frames of original images and improving visual expression.
  • FIG. 7 is a second flow chart of the image generation method provided by the embodiment of the present disclosure. Based on the embodiment shown in FIG. 2 , this embodiment further refines step S102 and adds a step of determining the target puzzle layout.
  • the image generation method includes:
  • Step S201 Acquire material data, where the material data includes a video and/or a picture set.
  • Step S202 Acquire the image content of each material image in the material data.
  • Steps S201 to S202 of this embodiment have been described in detail in the embodiment shown in FIG. 2 and will not be repeated here.
  • Step S203 acquiring the dynamic definition corresponding to each material image, and obtaining at least two original images based on the image content and the corresponding dynamic definition of each material image.
  • dynamic clarity refers to the picture clarity when dynamic images are played, which is specifically reflected in whether the dynamic picture shows "trailing" or "ghosting" phenomena.
  • clarity detection can be performed through correlation analysis on the image. Specifically, for example, the image is divided into several horizontal or vertical regions, and the correlation of adjacent regions is calculated. When "trailing" or "ghosting" is present, that is, when dynamic clarity is low, the correlation is large; otherwise the correlation is small. The corresponding dynamic clarity is thus obtained from the calculated correlation.
  • there are other possible implementation methods for dynamic clarity which will not be described one by one here.
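  • A sketch of the adjacent-region correlation test just described, assuming a grayscale image as a NumPy array (the strip count is an arbitrary choice):

```python
import numpy as np

def dynamic_clarity(gray: np.ndarray, n_strips: int = 16) -> float:
    """Score in [0, 1]: trailing/ghosting smears content across neighboring
    strips, raising their correlation, so a high mean correlation maps to
    a low clarity score."""
    strips = np.array_split(gray.astype(np.float64), n_strips, axis=0)
    corrs = []
    for a, b in zip(strips, strips[1:]):
        rows = min(a.shape[0], b.shape[0])  # splits may differ by one row
        c = np.corrcoef(a[:rows].ravel(), b[:rows].ravel())[0, 1]
        corrs.append(abs(c))
    return 1.0 - float(np.mean(corrs))
```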
  • each material image is further filtered based on dynamic clarity: images with lower dynamic clarity are removed and images with higher dynamic clarity are retained as original images, thereby improving the picture clarity of the original images and the visual effect of the subsequently generated target image frames.
  • the implementation method of filtering each material image based on the image content of the material image has been introduced in detail in the embodiment shown in FIG2 and will not be repeated here.
  • this embodiment also includes:
  • Step S203A Determine the target image style based on the image contents corresponding to at least two original images.
  • the target image style refers to the type of a certain image style special effect, such as oil painting style, comic style, sketch style, etc.
  • the target image style is determined based on the data content of the initial image data, where the image content corresponding to at least two original images refers both to the image content of each of the at least two original images and to the correlation between those image contents.
  • the image content corresponding to at least two original images can be determined by the content of the material data corresponding to the original images.
  • the content theme and type expressed by the image content corresponding to at least two original images can be represented by a specific content identifier, for example, the content identifier is #1, indicating that at least two original images are selfie videos of users; the content identifier is #2, indicating that at least two original images are short videos; the content identifier is #3, indicating that at least two original images are movies.
  • the specific implementation method and expression method of the content identifier can be set based on needs, and will not be repeated here. Furthermore, there is a preset mapping relationship between the image content corresponding to at least two original images and the target image style.
  • for example, one content identifier may correspond to a cartoon target image style, while another corresponds to a sketch target image style.
  • the corresponding target image style is determined by the image content corresponding to at least two original images and the content correlation between the image content corresponding to at least two original images, so that the image style of the generated target image frame matches the image content corresponding to at least two original images, thereby improving the visual expression of the target image frame.
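  • A minimal table-lookup sketch of such a preset mapping; the specific identifier-to-style pairings and the default are illustrative assumptions, as the disclosure only states that a preset mapping exists:

```python
# Hypothetical content-identifier -> target-style table.
STYLE_BY_CONTENT = {
    "#1": "cartoon",  # e.g. user selfie videos
    "#3": "sketch",   # e.g. movies
}

def target_style(content_id: str, default: str = "comic") -> str:
    return STYLE_BY_CONTENT.get(content_id, default)
```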
  • Step S204 Determine a target image element in the original image based on the image content of the original image.
  • Step S205 performing edge cropping around the target image element to obtain a cropped image including the target image element, wherein the image area proportion of the target image element in the cropped image is greater than the image area proportion of the target image element in the original image.
  • the picture composition of the original image is the same as that of the material image.
  • because adjacent image frames change little, the problem of failing to highlight the key content of the picture can arise.
  • the target image element in the original image is determined. For example, if the image content of the original image is a selfie portrait, cropping is performed around the portrait contour as the center, and the invalid area in the first image frame is cut away to obtain a cropped image containing the portrait contour (the target image element).
  • FIG. 8 is a schematic diagram of a cropped image provided by an embodiment of the present disclosure. As shown in FIG. 8, the original image includes a portrait; after the invalid area outside the portrait is cropped away based on the portrait contour, an image frame containing the portrait, that is, the cropped image, is obtained.
  • the proportion of the portrait contour (target image element) in the cropped image is higher than the proportion of the portrait contour in the original image. Therefore, the purpose of highlighting the focus of the picture is achieved and the visual expression of the target image frame is improved.
  • since the invalid image area is reduced, the efficiency of the subsequent style transfer process is also improved.
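  • A sketch of margin-based edge cropping around a detected subject box; the bounding box is assumed to come from an upstream detector such as a portrait segmenter, and the margin value is arbitrary:

```python
from PIL import Image

def crop_around_subject(image: Image.Image,
                        bbox: tuple[int, int, int, int],
                        margin: float = 0.10) -> Image.Image:
    """Crop to the target element's bounding box plus a small margin, so the
    subject occupies a larger share of the cropped image than it did in the
    original image."""
    left, top, right, bottom = bbox
    mx, my = int((right - left) * margin), int((bottom - top) * margin)
    return image.crop((max(left - mx, 0), max(top - my, 0),
                       min(right + mx, image.width),
                       min(bottom + my, image.height)))
```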
  • Step S206 Determine a target puzzle layout according to the image contents corresponding to at least two original images.
  • the image content of the original image can be evaluated based on certain rules, and a target puzzle layout can be generated based on the evaluation results, so that the first image frame with higher information content and/or higher aesthetics can be displayed preferentially.
  • the specific implementation method is, for example, evaluating and sorting the first image frame based on the posture similarity, aesthetic features, etc. corresponding to the image content to generate the target puzzle layout.
  • the specific implementation method has been introduced in the corresponding paragraph of the embodiment shown in FIG. 2 and will not be repeated here.
  • step S206 includes:
  • Step S2061 Obtain context information based on image contents corresponding to at least two original images, where the context information represents a contextual relationship between image contents corresponding to at least two initial images.
  • Step S2062 Determine the target puzzle layout according to the context information.
  • the original image is the result of filtering the material images in the initial image data, that is, the original image is a specific material image.
  • Different original images have continuity in content. For example, if the material data corresponding to the original image is a "dance" video, then the dance movements corresponding to the material images in the video have a temporal connection.
  • the original images filtered out of the material images also have this kind of temporal correlation, that is, a contextual relationship. More specifically, suppose that 100 original images are obtained through the previous steps. Semantic recognition is then performed on each original image to obtain semantic information representing the dance movement it depicts, and contextual information is generated from the semantic information of the ordered original images.
  • the contextual information can be a feature matrix representing the correlation between the pieces of semantic information. Based on the contextual information, the original images corresponding to repetitive and unimportant dance movements are screened out, leaving a number of frames (for example, 10) that represent only important, non-repetitive dance movements, together with importance evaluation values representing the importance of each movement; from these, the number of puzzle areas in the target puzzle layout and the size and position of each corresponding original image, that is, the layout information, are determined.
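  • As an illustrative sketch (not from the disclosure), repetitive frames can be dropped by comparing semantic embeddings of the ordered original images, where the embeddings are assumed to come from whatever semantic recognizer is used:

```python
import numpy as np

def drop_repetitive(frames: list, embeddings: list[np.ndarray],
                    sim_threshold: float = 0.9) -> list:
    """Keep a frame only if its (normalized) semantic embedding is not too
    close to any frame already kept."""
    kept, kept_vecs = [], []
    for frame, v in zip(frames, embeddings):
        v = v / np.linalg.norm(v)
        if all(float(v @ u) < sim_threshold for u in kept_vecs):
            kept.append(frame)
            kept_vecs.append(v)
    return kept
```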
  • Step S207 Based on the style transfer model corresponding to the target image style, style transfer is performed on each cropped image to obtain a target image frame corresponding to each cropped image.
  • Step S208 displaying a special effect identifier in the target image frame to obtain a special effect target image frame, wherein the special effect identifier is determined based on the image content of the target image frame.
  • dynamic special effect identifiers, such as "fireworks sticker effects" or "virtual jewelry effects", can be further added to the target image frame to further improve the visual expressiveness of the target puzzle.
  • step S208 includes:
  • Step S2081 Perform facial feature detection on the human element in the target image frame to obtain corresponding facial expression features.
  • Step S2082 Based on the facial expression features, determine the corresponding target special effect mark, and determine the target display position of the target special effect mark.
  • Step S2083 Based on the target display position, a target special effect identifier is added to the target image frame to obtain a special effect target image frame.
  • this embodiment is applicable to a scene in which a target image frame contains human elements.
  • each element in the target image frame is first identified to obtain human elements, such as the face of a portrait, and then facial feature detection is performed on the human elements to obtain facial expression features, such as happiness, sadness, etc.
  • the corresponding target special effect identifier is determined, and then based on the target image frame, the target special effect identifier is generated.
  • the position of each element in the image is determined in order to choose the target display position of the special effect identifier, so that the target special effect identifier avoids other image elements and does not occlude them.
  • the target special effect mark is loaded to the target display position of the target image frame to obtain the special effect target image frame.
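  • A toy sketch of steps S2081 to S2083; the expression-to-sticker table, sticker files, and placement rule are all assumptions, and real expression detection would come from a separate face analysis component:

```python
from PIL import Image

# Hypothetical expression -> sticker table.
EXPRESSION_STICKERS = {"happy": "fireworks.png", "sad": "rain.png"}

def add_effect(frame: Image.Image, expression: str,
               face_bbox: tuple[int, int, int, int]) -> Image.Image:
    """Composite the sticker chosen for the detected expression next to the
    face so that it does not occlude it."""
    path = EXPRESSION_STICKERS.get(expression)
    if path is None:
        return frame
    sticker = Image.open(path).convert("RGBA")
    left, top, right, _ = face_bbox
    # place beside the face: to its right if there is room, else to its left
    if right + 10 + sticker.width <= frame.width:
        x = right + 10
    else:
        x = max(left - sticker.width - 10, 0)
    out = frame.convert("RGBA")
    out.alpha_composite(sticker, (x, top))
    return out.convert("RGB")
```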
  • Step S209 combining at least two special effect target image frames according to the target puzzle layout to generate a target puzzle.
  • Steps S207 and S209 of this embodiment have been described in detail in the embodiment shown in FIG. 2 and will not be repeated here.
  • FIG11 is a structural block diagram of an image generation device provided by an embodiment of the present disclosure.
  • the image generation device 3 includes:
  • An acquisition module 31 used for acquiring at least two original images
  • a migration module 32 configured to perform image style migration on at least two original images respectively to obtain corresponding target image frames having a target image style
  • the combining module 33 is used to determine a target puzzle layout according to the image contents corresponding to at least two original images, and to combine at least two target image frames according to the target puzzle layout to generate a target puzzle.
  • the acquisition module 31 is specifically used to: acquire material data, the material data including a video and/or a picture set; extract frames of the material data according to the image content of the material images in the material data to obtain at least two original images, the material images being video frames in the video and/or pictures in the picture set.
  • the acquisition module 31 when the acquisition module 31 extracts frames of the material data according to the image content of the material image in the material data to obtain at least two original images, it is specifically used to: acquire the posture similarity corresponding to the material image based on the image content of the material image, the posture similarity characterizing the similarity between the posture of the character element in the image content and the target posture; determine at least two original images according to the posture similarity corresponding to each material image.
  • the acquisition module 31 determines at least two original images based on the posture similarities corresponding to each material image, it is specifically used to: determine the material images whose posture similarity is greater than a first similarity threshold as the original images, and/or, determine the material images whose posture similarity is less than a second similarity threshold as the original images; wherein the first similarity threshold is greater than the second similarity threshold.
  • the acquisition module 31 is also used to: acquire the dynamic clarity corresponding to the material image; when the acquisition module 31 extracts frames of the material data according to the image content of the material image in the material data to obtain at least two original images, it is specifically used to: acquire at least two original images based on the image content and the corresponding dynamic clarity of each material image.
  • the migration module 32 is specifically used to: obtain a style migration model corresponding to the target image style; and process the original image based on the style migration model to obtain a target image frame corresponding to the original image.
  • the migration module 32 when the migration module 32 processes the original image based on the style transfer model to obtain a target image frame corresponding to the original image, it is specifically used to: determine the target image element in the original image based on the image content of the original image; perform edge cropping around the target image element to obtain a cropped image containing the target image element, wherein the target image element accounts for a larger proportion of the image area in the cropped image than the target image element accounts for the image area in the original image; perform style migration on each cropped image based on the style transfer model corresponding to the target image style to obtain a target image frame corresponding to the original image.
  • the migration module 32 before performing image style migration on at least two original images to obtain corresponding target image frames with target image style, the migration module 32 is further used to: determine the target image style based on image contents corresponding to the at least two original images.
  • the target puzzle includes at least two puzzle areas, each puzzle area is used to display a corresponding target image frame, and the target puzzle layout represents the size and/or position of the puzzle area in the target puzzle.
  • the combination module 33 when determining the target puzzle layout based on the image contents corresponding to at least two original images, is specifically used to: obtain context information based on the image contents corresponding to at least two original images, wherein the context information represents a contextual relationship between the image contents corresponding to at least two initial images; and determine the target puzzle layout based on the context information.
  • the combining module 33 is further configured to: add a special effect identifier to the target image frame to obtain a special effect target image frame, wherein the special effect identifier is determined based on the image content of the target image frame.
  • in this embodiment, when the combination module 33 adds a special effect identifier to the target image frame to obtain the special effect target image frame, it is specifically used to: perform facial feature detection on the character elements in the target image frame to obtain the corresponding facial expression features; determine the corresponding target special effect identifier based on the facial expression features; determine a target display position of the target special effect identifier; and add the target special effect identifier to the target image frame based on the target display position to obtain the special effect target image frame.
  • the acquisition module 31, the migration module 32 and the combination module 33 are connected in sequence.
  • the image generation device 3 provided in this embodiment can implement the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, which will not be described in detail in this embodiment.
  • FIG. 12 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 12 , the electronic device 4 includes:
  • the memory 42 stores computer executable instructions
  • the processor 41 executes the computer-executable instructions stored in the memory 42 to implement the image generation method in the embodiments shown in FIG. 2 to FIG. 10 .
  • the processor 41 and the memory 42 are connected via a bus 43.
  • FIG. 13 shows a schematic diagram of the structure of an electronic device 900 suitable for implementing embodiments of the present disclosure.
  • the electronic device 900 may be a terminal device or a server.
  • the terminal device may include but is not limited to mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Devices, PADs), portable multimedia players (PMPs), vehicle terminals (such as vehicle navigation terminals), etc., and fixed terminals such as digital TVs, desktop computers, etc.
  • the electronic device shown in FIG. 13 is only an example and should not bring any limitation to the functions and scope of use of the embodiment of the present disclosure.
  • the electronic device 900 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 to a random access memory (RAM) 903.
  • Various programs and data required for the operation of the electronic device 900 are also stored in the RAM 903.
  • the processing device 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904.
  • An input/output (I/O) interface 905 is also connected to the bus 904.
  • the electronic device 900 includes an input device 906 such as a disk, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 907 such as a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 908 such as a magnetic tape, a hard disk, etc.; and a communication device 909.
  • the communication device 909 can allow the electronic device 900 to communicate with other devices wirelessly or wired to exchange data.
  • although FIG. 13 shows an electronic device 900 with various devices, it should be understood that implementing or providing all of the devices shown is not required; more or fewer devices may alternatively be implemented or provided.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program can be downloaded and installed from the network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902.
  • when the computer program is executed by the processing device 901, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the computer-readable medium disclosed above may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which a computer-readable program code is carried.
  • This propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above.
  • Computer readable signal media may also be any computer readable medium other than computer readable storage media, which may send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device.
  • the program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the computer-readable medium may be included in the electronic device, or may exist independently without being incorporated into the electronic device.
  • the computer-readable medium carries one or more programs; when the one or more programs are executed by the electronic device, the electronic device performs the method shown in the above embodiments.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may be executed entirely on the user's computer, partially on the user's computer, as a separate software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer via any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider).
  • each square box in the flow chart or block diagram can represent a module, a program segment or a part of a code, and the module, the program segment or a part of the code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the square box can also occur in a sequence different from that marked in the accompanying drawings. For example, two square boxes represented in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved.
  • each square box in the block diagram and/or flow chart, and the combination of the square boxes in the block diagram and/or flow chart can be implemented with a dedicated hardware-based system that performs a specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or hardware.
  • the name of a unit does not limit the unit itself in some cases.
  • the first acquisition unit may also be described as a "unit for acquiring at least two Internet Protocol addresses".
  • exemplary types of hardware logic components include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or equipment.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or equipment, or any suitable combination of the foregoing.
  • a more specific example of a machine-readable storage medium may include an electrical connection based on one or more lines, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • an image generation method comprising:
  • obtaining at least two original images includes: obtaining material data, wherein the material data includes a video and/or a picture set; extracting frames from the material data according to image contents of the material images in the material data to obtain at least two original images, wherein the material images are video frames in the video and/or pictures in the picture set.
  • the material data is frame extracted according to the image content of the material image in the material data to obtain at least two of the original images, including: based on the image content of the material image, obtaining the posture similarity corresponding to the material image, the posture similarity characterizing the similarity between the posture of the character element in the image content and the target posture; and determining at least two original images according to the posture similarities corresponding to each of the material images.
  • at least two original images are determined according to the posture similarities corresponding to the material images, including: determining the material images whose posture similarity is greater than a first similarity threshold as the original images, and/or determining the material images whose posture similarity is less than a second similarity threshold as the original images; wherein the first similarity threshold is greater than the second similarity threshold.
  • the method further includes: obtaining the dynamic clarity corresponding to the material image; extracting frames of the material data according to the image content of the material image in the material data to obtain at least two of the original images then includes: obtaining at least two of the original images based on the image content and the corresponding dynamic clarity of each of the material images.
  • image style transfer is performed on at least two of the original images respectively to obtain corresponding target image frames with the style of the target image, including: obtaining a style transfer model corresponding to the target image style; and processing the original images based on the style transfer model to obtain the target image frames corresponding to the original images.
  • the original image is processed based on the style transfer model to obtain a target image frame corresponding to the original image, including: determining a target image element in the original image based on the image content of the original image; performing edge cropping around the target image element to obtain a cropped image containing the target image element, wherein the image area proportion of the target image element in the cropped image is greater than the image area proportion of the target image element in the original image; and performing style transfer on each of the cropped images based on the style transfer model corresponding to the target image style to obtain the target image frame corresponding to the original image.
  • the target image style before performing image style migration on at least two of the original images respectively to obtain corresponding target image frames with the target image style, it also includes: determining the target image style based on image contents corresponding to the at least two original images.
  • the target puzzle includes at least two puzzle areas, each puzzle area is used to display a corresponding target image frame, and the target puzzle layout represents the size and/or position of the puzzle area in the target puzzle.
  • determining a target puzzle layout based on image contents corresponding to at least two of the original images includes: obtaining context information based on the image contents corresponding to the at least two original images, the context information representing a contextual relationship between the image contents corresponding to the at least two initial images; and determining the target puzzle layout based on the context information.
  • the method further includes: adding a special effect identifier to the target image frame to obtain a special effect target image frame, wherein the special effect identifier is determined based on image content of the target image frame.
  • a special effect identifier is added to the target image frame to obtain a special effect target image frame, including: performing facial feature detection on a person element in the target image frame to obtain a corresponding facial expression feature; determining a corresponding target special effect identifier based on the facial expression feature; determining a target display position of the target special effect identifier; and adding the target special effect identifier to the target image frame based on the target display position to obtain a special effect target image frame.
  • an image generating device comprising:
  • An acquisition module used for acquiring at least two original images
  • a migration module used to perform image style migration on at least two of the original images respectively to obtain corresponding target image frames with target image style
  • the combination module is used to determine a target puzzle layout according to image contents corresponding to at least two of the original images, and to combine at least two of the target image frames according to the target puzzle layout to generate a target puzzle.
  • the acquisition module is specifically used to: acquire material data, the material data including a video and/or a picture set; extract frames of the material data according to image content of the material images in the material data to obtain at least two of the original images, the material images being video frames in the video and/or pictures in the picture set.
  • the acquisition module when the acquisition module extracts frames of the material data according to the image content of the material image in the material data to obtain at least two of the original images, it is specifically used to: acquire the posture similarity corresponding to the material image based on the image content of the material image, the posture similarity representing the similarity between the posture of the character element in the image content and the target posture; determine at least two original images according to the posture similarities corresponding to each of the material images.
  • the acquisition module determines at least two original images based on the posture similarities corresponding to each of the material images, it is specifically used to: determine the material image whose posture similarity is greater than a first similarity threshold as the original image, and/or, determine the material image whose posture similarity is less than a second similarity threshold as the original image; wherein the first similarity threshold is greater than the second similarity threshold.
  • the acquisition module is further used to: acquire the dynamic definition corresponding to each material image; and when the acquisition module extracts frames from the material data according to the image content of the material images to obtain at least two original images, it is specifically used to: obtain the at least two original images based on the image content and the corresponding dynamic definition of each material image.
  • the migration module is specifically used to: obtain a style migration model corresponding to the target image style; and process the original image based on the style migration model to obtain the target image frame corresponding to the original image.
  • when the migration module processes the original image based on the style migration model to obtain the target image frame corresponding to the original image, it is specifically used to: determine the target image element in the original image based on the image content of the original image; perform edge cropping around the target image element to obtain a cropped image containing the target image element, wherein the target image element occupies a larger proportion of the image area in the cropped image than in the original image; and perform style migration on each cropped image based on the style migration model corresponding to the target image style to obtain the target image frame corresponding to the original image (a cropping-and-stylization sketch follows this list).
  • before performing image style migration on at least two of the original images respectively to obtain the corresponding target image frames having the target image style, the migration module is further used to: determine the target image style based on the image content corresponding to the at least two original images.
  • the target puzzle includes at least two puzzle areas, each of which displays a corresponding target image frame; the target puzzle layout represents the size and/or position of each puzzle area within the target puzzle.
  • when the combination module determines the target puzzle layout based on the image contents corresponding to at least two of the original images, it is specifically used to: obtain context information based on the image contents corresponding to the at least two original images, the context information representing the contextual relationship between the image contents of the at least two original images; and determine the target puzzle layout based on the context information (a layout-selection sketch follows this list).
  • the combination module is further used to: add a special effect identifier to the target image frame to obtain a special-effect target image frame, wherein the special effect identifier is determined based on the image content of the target image frame.
  • when the combination module adds a special effect identifier to the target image frame to obtain a special-effect target image frame, it is specifically used to: perform facial feature detection on the character elements in the target image frame to obtain corresponding facial expression features; determine a corresponding target special effect identifier based on the facial expression features; determine a target display position of the target special effect identifier; and add the target special effect identifier to the target image frame based on the target display position to obtain the special-effect target image frame (an expression-to-effect sketch follows this list).
  • an electronic device comprising: a processor, and a memory communicatively connected to the processor;
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory to implement the image generating method as described in the first aspect and various possible designs of the first aspect.
  • a computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the image generation method described in the first aspect and various possible designs of the first aspect.
  • an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the image generation method described in the first aspect and various possible designs of the first aspect.
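
The pose-similarity selection described above can be illustrated with a short sketch. This is a minimal, non-authoritative example: it assumes a pose estimator has already produced (N, 2) keypoint arrays for each material image, measures similarity as the cosine between centred keypoint vectors, and uses hypothetical thresholds `hi` and `lo` as the first and second similarity thresholds; none of these choices is prescribed by the disclosure.

```python
import numpy as np

def pose_similarity(keypoints: np.ndarray, target: np.ndarray) -> float:
    """Cosine similarity between two centred, flattened (N, 2) keypoint sets."""
    a = (keypoints - keypoints.mean(axis=0)).ravel()
    b = (target - target.mean(axis=0)).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def select_original_images(frames, target_pose, hi=0.9, lo=0.2):
    """Apply the two-threshold rule: keep frames whose pose is very close
    to the target posture (> hi) and/or very different from it (< lo)."""
    assert hi > lo  # the first similarity threshold exceeds the second
    return [image for image, kps in frames
            if not lo <= pose_similarity(kps, target_pose) <= hi]
```

In practice the two thresholds would be tuned so that the selection keeps both the best pose matches and deliberately contrasting poses.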
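
The edge-cropping step of the migration module can be sketched in the same spirit. Assumptions here: images are (H, W, C) numpy arrays, a subject detector has already supplied a bounding box for the target image element, `style_model` is a stand-in for whatever style-migration network corresponds to the target image style, and the 15% margin is purely illustrative.

```python
import numpy as np

def crop_around_subject(image: np.ndarray, box, margin: float = 0.15) -> np.ndarray:
    """Crop to the target element's box (x0, y0, x1, y1), padded by `margin`,
    so the element fills a larger share of the crop than of the original."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    mx, my = int((x1 - x0) * margin), int((y1 - y0) * margin)
    return image[max(0, y0 - my):min(h, y1 + my),
                 max(0, x0 - mx):min(w, x1 + mx)]

def stylize(original: np.ndarray, box, style_model) -> np.ndarray:
    """Crop first, then hand the crop to the style-migration model
    (any callable mapping an image array to an image array)."""
    return style_model(crop_around_subject(original, box))
```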
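
Determining a target puzzle layout from context information might look like the following sketch. The "sequential" label and the near-square fallback grid are hypothetical stand-ins for whatever contextual relationship is actually extracted from the image contents.

```python
import math

def target_puzzle_layout(n_frames: int, context: str = "sequential",
                         canvas=(1080, 1080)):
    """Return one (x, y, width, height) puzzle area per target image frame."""
    cw, ch = canvas
    if context == "sequential":
        # Temporally ordered content reads naturally as stacked rows.
        row_h = ch // n_frames
        return [(0, i * row_h, cw, row_h) for i in range(n_frames)]
    # Otherwise fall back to a near-square grid.
    cols = math.ceil(math.sqrt(n_frames))
    rows = math.ceil(n_frames / cols)
    w, h = cw // cols, ch // rows
    return [((i % cols) * w, (i // cols) * h, w, h) for i in range(n_frames)]
```

For example, `target_puzzle_layout(3)` yields three full-width rows, while a non-sequential context with four frames yields a 2×2 grid.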
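
Finally, the expression-driven special effect step reduces to a lookup plus a placement rule. The expression labels, the effect table, and the "above the face" placement are all illustrative assumptions; the disclosure only requires that the identifier and its display position be derived from the detected facial expression features.

```python
# Hypothetical mapping from a detected expression to a special effect identifier.
EFFECTS = {"smile": "sparkle_sticker", "surprise": "exclamation_sticker"}

def place_special_effect(frame_size, face_box, expression):
    """Choose an effect for the expression and anchor it just above the
    detected face box, clamped to the frame bounds."""
    effect = EFFECTS.get(expression)
    if effect is None:
        return None  # no identifier defined for this expression
    frame_w, _ = frame_size
    x0, y0, x1, _ = face_box
    x = min(max((x0 + x1) // 2, 0), frame_w)  # centred over the face
    y = max(y0 - (x1 - x0) // 2, 0)           # sit above it
    return effect, (x, y)
```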

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the present disclosure relate to an image generation method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring at least two original images; performing image style migration on the at least two original images respectively to obtain corresponding target image frames having a target image style; and combining the at least two target image frames according to a target puzzle layout to generate a target puzzle, the target puzzle layout being determined based on the image content of the at least two original images in the initial image data. Target image frames having a specific image style are obtained by performing style migration on a plurality of original images in the initial image data, and the target image frames are then combined to obtain a target puzzle corresponding to the content of the plurality of original images, so that the effective information in the plurality of original image frames is fully displayed and visual expression is improved.
PCT/CN2023/132440 2022-11-18 2023-11-17 Procédé et appareil de génération d'image, dispositif électronique et support de stockage WO2024104477A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211449865.7 2022-11-18
CN202211449865.7A CN118071577A (zh) 2022-11-18 2022-11-18 图像生成方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2024104477A1 (fr)

Family

ID=91083871

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/132440 WO2024104477A1 (fr) 2022-11-18 2023-11-17 Procédé et appareil de génération d'image, dispositif électronique et support de stockage

Country Status (2)

Country Link
CN (1) CN118071577A (fr)
WO (1) WO2024104477A1 (fr)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210125372A1 (en) * 2019-10-24 2021-04-29 Microsoft Technology Licensing, Llc Prior informed pose and scale estimation
CN112215854A (zh) * 2020-10-19 2021-01-12 珠海金山网络游戏科技有限公司 一种图像处理的方法及装置
CN113012082A (zh) * 2021-02-09 2021-06-22 北京字跳网络技术有限公司 图像显示方法、装置、设备及介质
CN113590854A (zh) * 2021-09-29 2021-11-02 腾讯科技(深圳)有限公司 一种数据处理方法、设备以及计算机可读存储介质

Also Published As

Publication number Publication date
CN118071577A (zh) 2024-05-24

Similar Documents

Publication Publication Date Title
CN108010112B (zh) 动画处理方法、装置及存储介质
CN109688463B (zh) 一种剪辑视频生成方法、装置、终端设备及存储介质
CN109729426B (zh) 一种视频封面图像的生成方法及装置
CN110968736B (zh) 视频生成方法、装置、电子设备及存储介质
CN111696176B (zh) 图像处理方法、装置、电子设备及计算机可读介质
US20140361974A1 (en) Karaoke avatar animation based on facial motion data
US20220350842A1 (en) Video tag determination method, terminal, and storage medium
CN108846886B (zh) 一种ar表情的生成方法、客户端、终端和存储介质
EP4243398A1 (fr) Procédé et appareil de traitement vidéo, dispositif électronique et support de stockage
CN110070496B (zh) 图像特效的生成方法、装置和硬件装置
WO2021254502A1 (fr) Procédé et appareil d'affichage d'objet cible, et dispositif électronique
WO2019114328A1 (fr) Procédé de traitement vidéo faisant appel à la réalité augmentée et dispositif associé
CN113453040A (zh) 短视频的生成方法、装置、相关设备及介质
JP7209851B2 (ja) 画像変形の制御方法、装置およびハードウェア装置
CN112035046B (zh) 榜单信息显示方法、装置、电子设备及存储介质
CN109600559B (zh) 一种视频特效添加方法、装置、终端设备及存储介质
EP4300431A1 (fr) Procédé et appareil de traitement d'action pour objet virtuel, et support de stockage
CN114331820A (zh) 图像处理方法、装置、电子设备及存储介质
CN103997687A (zh) 用于向视频增加交互特征的技术
Tolosana et al. An introduction to digital face manipulation
CN112785669B (zh) 一种虚拟形象合成方法、装置、设备及存储介质
CN114697759A (zh) 虚拟形象视频生成方法及其系统、电子设备、存储介质
CN110619602B (zh) 一种图像生成方法、装置、电子设备及存储介质
WO2024104477A1 (fr) Procédé et appareil de génération d'image, dispositif électronique et support de stockage
CN114697568B (zh) 特效视频确定方法、装置、电子设备及存储介质