WO2024104477A1 - Image generation method and apparatus, electronic device, and storage medium

Info

Publication number: WO2024104477A1
Application number: PCT/CN2023/132440
Authority: WO
Inventors: 王晶; 苗旺; 徐雨旸; 徐丁丁; 刘松伟
Original assignee: 北京字跳网络技术有限公司
Priority date: 2022-11-18
Filing date: 2023-11-17
Publication date: 2024-05-23
Also published as: CN118071577A

Abstract

Embodiments of the present disclosure provide an image generation method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring at least two original images; respectively performing image style migration on the at least two original images to obtain corresponding target image frames having a target image style; and combining the at least two target image frames according to a target spliced image layout to generate a target spliced image, wherein the target spliced image layout is determined on the basis of the image content of the at least two original images in the initial image data. Target image frames having a specific image style are obtained by performing style migration on a plurality of original images in initial image data, and the target image frames are then combined to obtain a target spliced image matching the content of the plurality of original images, so that the effective information in the plurality of original images is fully displayed and the visual expression is improved.

Description

Image generation method, device, electronic device and storage medium

This application claims priority to the Chinese invention patent application entitled “Image generation method, device, electronic device and storage medium”, filed on November 18, 2022 with application number 202211449865.7. The entire contents of that application are incorporated by reference into this application.

Technical Field

The embodiments of the present disclosure relate to the field of image generation technology, and in particular, to an image generation method, device, electronic device, and storage medium.

Background

Currently, taking the application scenario of video content creation as an example, users need to generate a corresponding image as a video cover based on video data to achieve the purpose of previewing and displaying the video content. In the prior art, a certain video frame in the video is usually extracted based on a manual selection by the user to generate the above video cover.

However, images generated by video data in the prior art have problems such as small amount of image information and poor visual expression.

Summary of the invention

The embodiments of the present disclosure provide an image generation method, an apparatus, an electronic device, and a storage medium to overcome the problem that the generated images have little image information and poor visual expression.

In a first aspect, an embodiment of the present disclosure provides an image generation method, comprising:

Acquire at least two original images, and perform image style transfer on at least two of the original images respectively to obtain corresponding target image frames with target image style; determine a target puzzle layout according to image contents corresponding to at least two of the original images; combine at least two of the target image frames according to the target puzzle layout to generate a target puzzle.

In a second aspect, an embodiment of the present disclosure provides an image generating device, including:

An acquisition module, used for acquiring at least two original images;

A migration module is used to perform image style migration on at least two of the original images to obtain a corresponding target image frame having a target image style;

The combination module is used to determine a target puzzle layout according to image contents corresponding to at least two of the original images, and to combine at least two of the target image frames according to the target puzzle layout to generate a target puzzle.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including:

A processor, and a memory communicatively connected to the processor;

The memory stores computer-executable instructions;

The processor executes the computer-executable instructions stored in the memory to implement the image generating method as described in the first aspect and various possible designs of the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium, in which computer execution instructions are stored. When a processor executes the computer execution instructions, the image generation method described in the first aspect and various possible designs of the first aspect is implemented.

In a fifth aspect, an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the image generation method described in the first aspect and various possible designs of the first aspect.

The image generation method, device, electronic device and storage medium provided in the present embodiment obtain at least two original images, and perform image style transfer on at least two of the original images respectively to obtain corresponding target image frames with target image style; determine the target puzzle layout according to the image content corresponding to at least two of the original images; and combine at least two of the target image frames according to the target puzzle layout to generate a target puzzle. Since the target image frame with a specific image style is obtained by performing style transfer on at least two original images, and then the target image frames are arranged and combined to obtain a target puzzle with a layout matching the content of multiple original images, the target puzzle can not only display multiple frames of images with style special effects, but also display the content relevance of multiple frames of images with style special effects through the puzzle layout, thereby achieving full display of effective information in multiple frames of original images and improving visual expression.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or the prior art, the following briefly introduces the drawings required for describing the embodiments or the prior art. Obviously, the drawings described below show some embodiments of the present disclosure. Those of ordinary skill in the art can obtain other drawings based on these drawings without any creative effort.

FIG1 is a diagram of an application scenario of the image generation method provided by an embodiment of the present disclosure;

FIG2 is a flowchart of an image generation method according to an embodiment of the present disclosure;

FIG3 is a flowchart of a specific implementation method of step S101 in the embodiment shown in FIG2;

FIG4 is a flowchart of a specific implementation method of step S1012 in the embodiment shown in FIG3;

FIG5 is a schematic diagram of determining a first image frame provided by an embodiment of the present disclosure;

FIG6 is a schematic diagram of a target puzzle layout of a target puzzle provided by an embodiment of the present disclosure;

FIG7 is a second flow chart of the image generation method provided by an embodiment of the present disclosure;

FIG8 is a schematic diagram of a cropped image provided by an embodiment of the present disclosure;

FIG9 is a flowchart of a specific implementation method of step S206 in the embodiment shown in FIG7;

FIG10 is a flowchart of a specific implementation method of step S208 in the embodiment shown in FIG7;

FIG11 is a structural block diagram of an image generating device provided by an embodiment of the present disclosure;

FIG12 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure;

FIG. 13 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of the present disclosure.

Detailed Description

In order to make the purpose, technical solution and advantages of the embodiments of the present disclosure clearer, the technical solution in the embodiments of the present disclosure will be clearly and completely described below in conjunction with the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are part of the embodiments of the present disclosure, not all of the embodiments. Based on the embodiments in the present disclosure, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of the present disclosure.

The application scenarios of the embodiments of the present disclosure are explained below:

FIG1 is a diagram of an application scenario of the image generation method provided in an embodiment of the present disclosure. The image generation method provided in an embodiment of the present disclosure can be applied to application scenarios of video image editing and processing such as video cover generation and video conversion to a picture set. Specifically, taking the application scenario of generating a video cover as an example, the method provided in an embodiment of the present disclosure can be applied to a terminal device or a server. Taking the application to a terminal device as an example, a video editing application (APP) is running in the terminal device. As shown in FIG1 , after the user obtains the video to be processed by shooting, downloading from a server, or receiving transmission from other terminal devices, the user loads the video to be processed into a video editing application (shown as an App in the figure) to which the image generation method provided in an embodiment of the present application is applied. Afterwards, the terminal device uses the video editing application to process the video to be processed. After the video is processed, a target image capable of representing the video content is generated based on the video content in the video to be processed, so that the target image is used as the video cover of the video to be processed.

In the prior art, taking the application scenario of making a video cover in the process of video content creation as an example, the user generates a corresponding image as a video cover based on the video data by operating the terminal device to achieve the purpose of previewing and displaying the video content. In the prior art, the above-mentioned image is usually generated by extracting a certain video frame in the video based on the user's manual selection. However, the video data includes multiple frames of images, and the image content of each frame of the image is different. Taking one of the frames as the video cover cannot fully express the content of the video data, and there is a problem of small amount of image information. At the same time, using the original image frame in the video data as the video cover, compared with the display method of continuous playback of the video content of the video data, will cause the video cover to fail to highlight the key points of the video content, affecting the visual expression and information expression ability.

In addition, other image processing application scenarios to which the method of this embodiment is applicable, such as converting a video into one or more pictures, or generating a puzzle using a picture set, also face the above-mentioned problems.

The embodiments of the present disclosure provide an image generation method to solve the above problems.

Referring to FIG. 2 , FIG. 2 is a flow chart of an image generation method provided by an embodiment of the present disclosure. The method of this embodiment can be applied in a terminal device, and the image generation method includes:

Step S101: Acquire at least two original images.

Exemplarily, the original image is an image used as the material of the target puzzle, and the original image can be obtained by extracting frames from the material data. The material data can be a video, or a picture set, or a collection of the two. Taking the material data as a video as an example, refer to the application scenario diagram shown in Figure 1, and the material data corresponds to the video to be processed in the embodiment shown in Figure 1. The material data can be obtained by shooting by the user through the image acquisition unit of the terminal device, such as a camera; it can also be obtained by accessing the server for downloading, or by receiving data sent by other terminal devices. It can be set here as needed, and no further examples are given.

The material data includes at least two frames of material images. Taking the case where the material data is video data as an example, the material data (video data) is composed of multiple video frames (material images). By decoding the material data, the video frames constituting the material data can be obtained. Afterwards, based on preset rules, the video frames (material images) in the material data are screened to obtain the images that meet the rules, that is, the original images. For example, the key frames (I-frames) in the material data are used as the original images. Methods of determining and retrieving key frames in video data are prior art and will not be described in detail here.

In another possible implementation, the material data may be frame extracted based on the content of the material data, thereby obtaining an original image of the material used to generate a subsequent target puzzle. For example, as shown in FIG3 , the specific implementation of step S101 includes:

Step S1011: Acquire material data, where the material data includes a video and/or a picture set;

Step S1012: extracting frames of the material data according to the image content of the material images in the material data to obtain at least two original images, where the material images are video frames in the video and/or pictures in the picture set.

Exemplarily, after obtaining the material data, image recognition is first performed on each material image in the material data to obtain the image content of each material image, wherein there are multiple specific implementation methods of the image content, for example, it can be a feature matrix (feature) describing the image content, it can also be a pixel matrix describing the image content, or it can be an identifier representing the specific content in the image. More specifically, for example, when the image includes a portrait, the corresponding identifier (image content) is #001, and when the image includes a landscape, the corresponding identifier (image content) is #002.

Furthermore, the content can be further subdivided on this basis to obtain a more detailed identifier. For example, when the image includes a portrait, the corresponding identifier (image content) is #001_1; when the image includes two portraits, the corresponding identifier (image content) is #001_2. The specific representation method of the identifier representing the specific content in the material image, and the mapping relationship between the identifier and the specific content in the image can be set based on specific needs, and examples are not given here one by one.
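The identifier scheme described above can be illustrated with a minimal sketch. The codes #001 and #002 and the `_1` / `_2` suffixes come from the examples in the text; the function name `content_identifier` and the dictionary `CATEGORY_CODES` are hypothetical names introduced only for illustration, and a real implementation may use any other mapping.

```python
from typing import Optional

# Category codes taken from the examples in the text; any further
# categories would be an application-specific choice.
CATEGORY_CODES = {"portrait": "#001", "landscape": "#002"}


def content_identifier(category: str, count: Optional[int] = None) -> str:
    """Return the identifier for an image content category.

    When `count` is given, the identifier is subdivided by the number of
    elements of that category found in the image (for example, the number
    of portraits).
    """
    code = CATEGORY_CODES[category]
    if count is None:
        return code
    return f"{code}_{count}"


print(content_identifier("portrait"))     # #001
print(content_identifier("portrait", 2))  # #001_2
```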

Further, after obtaining the image content corresponding to each material image, the material images are screened according to the image content corresponding to the material images, and at least two key image frames that can better represent the important content in the material data, i.e., original images, are determined to generate the target puzzle. In a possible implementation, as shown in FIG4 , the specific implementation of step S1012 includes:

Step S1012A: based on the image content of the material image, obtaining the posture similarity corresponding to the material image, where the posture similarity represents the similarity between the posture of the human element in the image content and the target posture.

Step S1012B: Determine at least two original images according to the posture similarities corresponding to the material images.

Exemplarily, after each material image is identified, the image content of the material image is obtained. In this embodiment, the image content includes human elements, which refer to image elements related to human portraits, such as the head, torso, limbs and hands in the portrait, and the entire portrait. In the image content of the material image, the human elements present different postures. The posture of the human elements in the image content is compared with the preset target posture to obtain the posture similarity. The more consistent the posture of the human elements in the image content of the material image is with the target posture, the higher the posture similarity; otherwise, the lower the posture similarity. The posture similarity can be calculated using an image consistency algorithm, which is prior art known to those skilled in the art and will not be repeated here. The posture of the human elements includes, for example, one or more of facial expressions, movements of the limbs and torso, and hand movements.

In this embodiment, the purpose of obtaining the posture similarity corresponding to each material image is to evaluate whether the material image can express the important content in the material data, so that it can be screened as an original image serving as material for the subsequently generated target puzzle. The target posture includes a plurality of preset postures of human elements, for example, a smiling facial expression, a laughing facial expression, or the limb movements made when waving. These postures serve as a rule for judging whether the posture of the human element can express effective information (for example, whether it can express happy or angry emotions) and whether it conforms to aesthetic characteristics. Therefore, by comparing the posture of the human element in the image content of each material image with the target posture, the posture similarity corresponding to each material image is obtained. Based on the posture similarity, the material images whose human postures are more meaningful and more in line with aesthetic characteristics are then screened out as original images, so that the subsequently generated target image frames and target puzzle can express richer effective information and make the portrait in the image more attractive. Further, in a possible implementation, determining at least two original images according to the posture similarity corresponding to each material image includes:

A material image with a posture similarity greater than a first similarity threshold is determined as an original image, and/or a material image with a posture similarity less than a second similarity threshold is determined as an original image; wherein the first similarity threshold is greater than the second similarity threshold.

FIG5 is a schematic diagram of determining an original image provided by an embodiment of the present disclosure. The process of determining at least two original images in the above steps is introduced in conjunction with FIG5. Referring to FIG5, the posture similarity, the first similarity threshold and the second similarity threshold are all normalized values: a posture similarity of 1 indicates complete consistency, and a posture similarity of 0 indicates complete inconsistency. The first similarity threshold is greater than the second similarity threshold. Specifically, the first similarity threshold (shown as p1 in the figure) is, for example, 0.9, and the second similarity threshold (shown as p2 in the figure) is, for example, 0.2. Based on the target posture, after the image contents of material image A, material image B and material image C are processed respectively, the posture similarity corresponding to material image A is obtained as gesture_evl_A = 0.95, the posture similarity corresponding to material image B is gesture_evl_B = 0.7, and the posture similarity corresponding to material image C is gesture_evl_C = 0.1. On the one hand, the posture similarity of material image A, gesture_evl_A = 0.95, satisfies the condition of being greater than the first similarity threshold (gesture_evl_A > 0.9). That is, the posture of the human element in material image A is very similar to the target posture, so it is regarded as the target posture, and material image A is determined as an original image.
On the other hand, the posture similarity of material image C, gesture_evl_C = 0.1, satisfies the condition of being less than the second similarity threshold (gesture_evl_C < 0.2). That is, the posture of the human element in material image C is very different from the target posture. In this case, the posture of the human figure in the material image is considered to be a posture purposely designed by the user. Although it may not meet the aesthetic characteristics (because it differs greatly from the target posture), the posture carries more information, so material image C is also determined as an original image. The posture similarity corresponding to material image B, gesture_evl_B = 0.7, satisfies neither the condition of being less than the second similarity threshold nor the condition of being greater than the first similarity threshold (0.2 < gesture_evl_B < 0.9). In this case, the posture corresponding to material image B is regarded as a degraded target posture: it does not meet the requirements of the aesthetic characteristics, and because it is still fairly close to the target posture, it does not carry enough additional information either. Therefore, material image B is excluded and is not used as an original image.
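The two-threshold screening rule can be sketched as follows. This is a minimal illustration, assuming the posture similarities have already been computed and normalized to [0, 1]; the function name `select_original_images` is hypothetical, and the default values 0.9 and 0.2 are simply the example thresholds p1 and p2 from the text.

```python
from typing import Dict, List


def select_original_images(
    similarities: Dict[str, float], p1: float = 0.9, p2: float = 0.2
) -> List[str]:
    """Keep material images that are very close to the target posture
    (similarity > p1) or very different from it (similarity < p2)."""
    if not p1 > p2:
        raise ValueError("the first threshold must be greater than the second")
    return [
        name
        for name, similarity in similarities.items()
        if similarity > p1 or similarity < p2
    ]


# Example values for material images A, B and C.
scores = {"A": 0.95, "B": 0.7, "C": 0.1}
print(select_original_images(scores))  # ['A', 'C']
```

Material image B falls between the two thresholds and is therefore dropped, which matches the outcome described for FIG5.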

In the steps of this embodiment, the posture similarity of each material image is obtained, and the material images whose posture similarity is greater than the first similarity threshold and/or less than the second similarity threshold are determined as the original images. This ensures that the original images contain more information while taking aesthetic characteristics into account, thereby achieving a full display of the content of the material data and improving the information content and visual quality of the subsequently generated target image frames and target puzzle.

Step S102: performing image style transfer on at least two original images respectively to obtain corresponding target image frames having the target image style.

Exemplarily, after at least two original images are obtained, image style transfer is performed on each of the original images. For example, the material data includes 30 key frames in total. The 30 key frames are used as original images, and image style transfer is performed on each of them to obtain 30 corresponding image frames with the same specific image style (the target image style), namely, the target image frames. Image style transfer refers to adding image style effects to an image so that the processed image has a certain painting style in color and line, such as oil painting style, comic style, or sketch style. One specific implementation is to process each original image with a pre-trained style transfer model that produces the target image style, so as to obtain images with the target image style, namely, the target image frames. The specific methods of training and using the style transfer model are known to those skilled in the art and will not be described in detail here.

Step S103: determining a target puzzle layout according to image contents corresponding to at least two original images.

Step S104: combining at least two target image frames according to a target puzzle layout to generate a target puzzle.

Exemplarily, after obtaining at least two target image frames, each target image frame is spliced and combined to obtain an image with certain layout rules, namely, a target puzzle. Exemplarily, the target puzzle includes at least two puzzle areas, each puzzle area is used to display a corresponding target image frame, and the target puzzle layout characterizes the size and/or position of the puzzle area in the target puzzle. Figure 6 is a schematic diagram of a target puzzle layout of a target puzzle provided in an embodiment of the present disclosure. As shown in Figure 6, the target puzzle is composed of four target image frames, namely, target image frame A, target image frame B, target image frame C and target image frame D, wherein each target image frame corresponds to a puzzle area, the puzzle area corresponding to target image frame A is relatively large and is located on the left side of the target puzzle, and the puzzle areas corresponding to target image frames B, target image frames C and target image frames D are relatively small and are located on the right side of the target puzzle.

The target puzzle layout of the target puzzle is not randomly generated, but is determined based on the image content of each original image. In a possible implementation, the step of generating the target puzzle layout includes:

Step S103A: obtaining layout information according to the image contents corresponding to at least two original images in the initial image data, wherein the layout information represents the size and/or position of each puzzle area in the target puzzle.

Step S103B: Generate a target puzzle layout according to the layout information.

Exemplarily, the above steps are performed before step S104. Specifically, for example, referring to the target puzzle shown in Figure 6, the initial image data is video data introducing clothing matching. Among the target image frames obtained from the original images in the initial image data, the image content of target image frame A corresponds to the overall portrait (character element, the same below), and it is placed in the most prominent main position on the left side of the target puzzle to show the overall clothing matching effect of the character. The image content of target image frame B corresponds to the front of the portrait, that of target image frame C to the back of the portrait, and that of target image frame D to the side of the portrait. These three frames are placed in secondary positions on the right side of the target puzzle and show the clothing matching effects of the character from the front, back and side. In this way, the target puzzle displays the important content information in the video data (initial image data), namely the front, back, side and overall effect of the clothing matching, thereby increasing the amount of information in the target puzzle.

Furthermore, the method for obtaining the image content of each target image frame has been introduced in the previous steps. In a possible implementation, the target image frames can be sorted according to their posture similarity to the target posture, or according to the prominence of their aesthetic features, so as to determine the area and position of the puzzle area corresponding to each target image frame, and then determine the target puzzle layout.
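One way to turn such a ranking into layout information is sketched below. It reproduces the arrangement of Figure 6 (one large area on the left, the remaining areas stacked on the right) under stated assumptions: the per-frame scores are already available, regions are expressed in normalized coordinates, and the 60% width of the main area is an arbitrary illustrative value. The names `build_layout`, `Region` and `main_width` are hypothetical.

```python
from typing import Dict, Tuple

# A puzzle area as (x, y, width, height), normalized to the puzzle size.
Region = Tuple[float, float, float, float]


def build_layout(scores: Dict[str, float], main_width: float = 0.6) -> Dict[str, Region]:
    """Give the highest-scoring frame the large left area and stack the
    remaining frames, in descending score order, on the right."""
    if len(scores) < 2:
        raise ValueError("a puzzle needs at least two target image frames")
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    main, rest = ranked[0], ranked[1:]
    layout: Dict[str, Region] = {main: (0.0, 0.0, main_width, 1.0)}
    cell_height = 1.0 / len(rest)
    for index, name in enumerate(rest):
        layout[name] = (main_width, index * cell_height, 1.0 - main_width, cell_height)
    return layout


layout = build_layout({"A": 0.95, "B": 0.6, "C": 0.5, "D": 0.4})
for name, region in layout.items():
    print(name, region)
```

Each region gives both the size and the position of a puzzle area, which is the information the layout is said to characterize. Multiplying the normalized values by the pixel size of the target puzzle yields the actual paste rectangles.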

In this embodiment, at least two original images are obtained, and image style transfer is performed on the at least two original images respectively to obtain corresponding target image frames with target image style; the target puzzle layout is determined according to the image content corresponding to the at least two original images; and the at least two target image frames are combined according to the target puzzle layout to generate a target puzzle. Since the target image frame with a specific image style is obtained by performing style transfer on at least two original images, and the target image frames are arranged and combined to obtain a target puzzle with a layout matching the content of multiple original images, the target puzzle can not only display multiple frames of images with style special effects, but also display the relevance of the content of multiple frames of images with style special effects through the puzzle layout, thereby achieving full display of effective information in multiple frames of original images and improving visual expression.

Referring to FIG. 7 , FIG. 7 is a second flow chart of the image generation method provided by the embodiment of the present disclosure. Based on the embodiment shown in FIG. 2 , this embodiment further refines step S102 and adds a step of determining the target puzzle layout. The image generation method includes:

Step S201: Acquire material data, where the material data includes a video and/or a picture set.

Step S202: Acquire the image content of each material image in the material data.

The specific implementation of steps S201 - S202 of this embodiment has been described in detail in the embodiment shown in FIG. 2 , and will not be repeated here.

Step S203: acquiring the dynamic definition corresponding to each material image, and obtaining at least two original images based on the image content and the corresponding dynamic definition of each material image.

Exemplarily, once the material images are acquired, clarity detection can further be performed on each material image to obtain the corresponding dynamic definition. Dynamic definition refers to the picture clarity when dynamic images are played, which is specifically reflected in whether the dynamic picture shows phenomena such as "trailing" or "ghosting". Clarity detection can be performed through correlation analysis of the image. Specifically, for example, the image is divided into several horizontal or vertical regions, and the correlation between adjacent regions is calculated. If "trailing" or "ghosting" is present, that is, the dynamic definition is low, the correlation is large; otherwise the correlation is small. The corresponding dynamic definition is thus obtained from the calculated correlation. Of course, there are other possible ways to obtain the dynamic definition, which will not be described one by one here.
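The correlation analysis can be sketched in pure Python as follows. This is only an illustration under several assumptions not stated in the text: the image is a grayscale pixel matrix, the regions are equal-width vertical bands, the correlation is the Pearson coefficient, and the score is defined as one minus the mean absolute correlation of adjacent bands. The function names `pearson` and `dynamic_definition` are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def dynamic_definition(image: Sequence[Sequence[float]], bands: int = 2) -> float:
    """Score in [0, 1]: low when adjacent vertical bands are highly
    correlated (taken here as a sign of trailing or ghosting)."""
    band_width = len(image[0]) // bands
    if bands < 2 or band_width == 0:
        raise ValueError("need at least two non-empty bands")
    strips: List[List[float]] = [
        [row[col] for row in image for col in range(i * band_width, (i + 1) * band_width)]
        for i in range(bands)
    ]
    correlations = [abs(pearson(strips[i], strips[i + 1])) for i in range(bands - 1)]
    return 1.0 - mean(correlations)


ghosted = [[0, 10, 0, 10], [20, 30, 20, 30]]  # right half repeats the left half
sharp = [[0, 10, 10, 0], [20, 30, 0, 10]]     # halves are uncorrelated
print(dynamic_definition(ghosted) < dynamic_definition(sharp))  # True
```

A threshold on this score, chosen by the application, could then be used to discard material images with low dynamic definition.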

Further, after the dynamic definition corresponding to each material image is obtained, the material images are filtered based on dynamic definition in addition to being filtered based on image content: images with lower dynamic definition are removed, and images with higher dynamic definition are retained as original images, thereby improving the picture clarity of the original images and the visual effect of the subsequently generated target image frames. The method of filtering the material images based on their image content has been introduced in detail in the embodiment shown in FIG2 and will not be repeated here.

Optionally, this embodiment also includes:

Step S203A: Determine the target image style based on the image contents corresponding to at least two original images.

Exemplarily, the target image style refers to the type of a certain image style special effect, such as oil painting style, comic style, sketch style, etc. There are many ways to determine the target image style, for example, by determining the corresponding target image style through preset configuration information; for another example, the target image style is determined based on the data content of the initial image data, wherein the image content corresponding to at least two original images refers to the image content corresponding to at least two original images respectively, and the correlation between the image contents. In a possible implementation method, the image content corresponding to at least two original images can be determined by the content of the material data corresponding to the original images. The content theme and type expressed by the image content corresponding to at least two original images can be represented by a specific content identifier, for example, the content identifier is #1, indicating that at least two original images are selfie videos of users; the content identifier is #2, indicating that at least two original images are short videos; the content identifier is #3, indicating that at least two original images are movies. The specific implementation method and expression method of the content identifier can be set based on needs, and will not be repeated here. Furthermore, there is a preset mapping relationship between the image content corresponding to at least two original images and the target image style. For example, when the image content corresponding to at least two original images is a portrait selfie video, the corresponding target image style is a cartoon style; when the image content corresponding to at least two original images is a short video, the corresponding target image style is a sketch style. 
In this embodiment, the corresponding target image style is determined by the image content corresponding to at least two original images and the content correlation between the image content corresponding to at least two original images, so that the image style of the generated target image frame matches the image content corresponding to at least two original images, thereby improving the visual expression of the target image frame.
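The preset mapping between content identifiers and target image styles can be pictured as a simple lookup table. The following is only a minimal sketch: the identifier strings follow the examples in the text, while the style names, the `select_target_style` helper, and the fallback style are assumptions made for illustration and are not specified by the disclosure.

```python
# Hypothetical preset mapping, following the examples in the text:
# "#1" (portrait selfie video) -> cartoon style, "#2" (short video) -> sketch style.
STYLE_BY_CONTENT_ID = {
    "#1": "cartoon",
    "#2": "sketch",
}

# Assumed fallback; the disclosure does not say which style applies to unmapped content.
DEFAULT_STYLE = "oil_painting"


def select_target_style(content_id: str) -> str:
    """Return the target image style preset for a content identifier."""
    return STYLE_BY_CONTENT_ID.get(content_id, DEFAULT_STYLE)


print(select_target_style("#1"))
```

A table of this kind keeps the style decision configurable without changing the style transfer step itself.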

Step S204: Determine a target image element in the original image based on the image content of the original image.

Step S205: performing edge cropping around the target image element to obtain a cropped image including the target image element, wherein the image area proportion of the target image element in the cropped image is greater than the image area proportion of the target image element in the original image.

For example, after the original image is obtained, its picture composition is the same as that of the material image. Once the original image is extracted from the material data on its own, it may fail to highlight the key points of the picture, because the changes between the preceding and following image frames are no longer available.

To solve the above problem, in this embodiment, the target image element in the original image is determined based on the image content of the original image. For example, if the image content of the original image is a portrait selfie, cropping is performed around the portrait contour as the center, and the invalid area in the first image frame is cut off to obtain a cropped image containing the portrait contour (the target image element). FIG8 is a schematic diagram of a cropped image provided by an embodiment of the present disclosure. As shown in FIG8, the original image includes a portrait. After the invalid area outside the portrait is cropped based on the portrait contour, the image frame containing the portrait is obtained, that is, the cropped image. Since the invalid area in the first image frame is cropped, the proportion of the portrait contour (the target image element) in the cropped image is higher than the proportion of the portrait contour in the original image. The focus of the picture is thereby highlighted, and the visual expression of the target image frame is improved. At the same time, cropping the original image reduces the invalid image area, which can increase the efficiency of the subsequent style migration process.
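The cropping step can be sketched as follows. This is an illustrative example only: it assumes the target image element has already been located as a bounding box (the detector is out of scope here), represents images as nested lists of pixel values, and uses hypothetical helper names `crop_around` and `area_ratio`.

```python
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop_around(image: Image, box: Box, margin: int = 0) -> Tuple[Image, Box]:
    """Crop `image` to `box` expanded by `margin`, clamped to the image bounds.

    Returns the cropped image and the element box in cropped coordinates.
    """
    height, width = len(image), len(image[0])
    left, top, right, bottom = box
    c_left, c_top = max(0, left - margin), max(0, top - margin)
    c_right, c_bottom = min(width, right + margin), min(height, bottom + margin)
    cropped = [row[c_left:c_right] for row in image[c_top:c_bottom]]
    return cropped, (left - c_left, top - c_top, right - c_left, bottom - c_top)


def area_ratio(image: Image, box: Box) -> float:
    """Fraction of the image area covered by `box`."""
    left, top, right, bottom = box
    return ((right - left) * (bottom - top)) / (len(image) * len(image[0]))


original = [[0] * 10 for _ in range(8)]  # placeholder image, 10 wide by 8 high
portrait_box = (3, 2, 7, 6)              # assumed output of a portrait detector
cropped, new_box = crop_around(original, portrait_box, margin=1)

# The target element occupies a larger share of the cropped image than of the original.
assert area_ratio(cropped, new_box) > area_ratio(original, portrait_box)
```

Because the cropped image has fewer pixels, any per-pixel style transfer applied afterwards has less data to process, which is consistent with the efficiency benefit described above.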

Step S206: Determine a target puzzle layout according to the image contents corresponding to at least two original images.

Furthermore, after obtaining the image content of the original image, the image content of the original image can be evaluated based on certain rules, and a target puzzle layout can be generated based on the evaluation results, so that the first image frame with higher information content and/or higher aesthetics can be displayed preferentially. The specific implementation method is, for example, evaluating and sorting the first image frame based on the posture similarity, aesthetic features, etc. corresponding to the image content to generate the target puzzle layout. The specific implementation method has been introduced in the corresponding paragraph of the embodiment shown in FIG. 2 and will not be repeated here.

In a possible implementation, as shown in FIG9, a specific implementation of step S206 includes:

Step S2061: Obtain context information based on image contents corresponding to at least two original images, where the context information represents a contextual relationship between the image contents corresponding to the at least two original images.

Step S2062: Determine the target puzzle layout according to the context information.

Exemplarily, the original image is the result of filtering the material images in the initial image data, that is, the original image is a specific material image. Different original images have continuity in content. For example, if the material data corresponding to the original image is a "dance" video, then the dance movements corresponding to the material images in the video have a temporal connection. The original images filtered out from the material images also have this kind of temporal correlation, that is, the contextual relationship. More specifically, for example, based on the previous steps, 100 original images are obtained. Then, semantic recognition is performed on each original image to obtain semantic information representing the dance movement corresponding to each original image; based on the semantic information corresponding to each ordered original image, contextual information is generated. The contextual information can be a feature matrix representing the correlation between semantic information. Then, based on the contextual information, the original images corresponding to repetitive dance movements and non-important dance movements are screened out to obtain the number of first image frames (for example, 10) that only represent important and non-repetitive dance movements, as well as the importance evaluation value representing the importance of each dance movement; then, the number of puzzle areas in the target puzzle layout, as well as the size and position of the corresponding original image in each puzzle area, that is, the layout information, are determined.

In this embodiment, by obtaining context information and determining the target puzzle layout based on the context information, full use is made of the information in the initial image data, making the target puzzle layout of the generated target puzzle more reasonable, better reflecting the important information in the initial image data, and improving the display effect.
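One way to picture steps S2061 and S2062 is sketched below. It is not the method of the disclosure, only an illustration under stated assumptions: each original image is assumed to already have a semantic feature vector and an importance score, the contextual information is modeled as a cosine-similarity matrix, repetitive frames are removed greedily, and the layout is a single row whose area widths are proportional to importance. All function names and the threshold value are hypothetical.

```python
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0


def context_matrix(features: List[Sequence[float]]) -> List[List[float]]:
    """Feature matrix holding the pairwise correlation between frames."""
    return [[cosine(a, b) for b in features] for a in features]


def select_frames(context: List[List[float]], importance: List[float],
                  max_frames: int, repeat_threshold: float = 0.95) -> List[int]:
    """Keep the most important frames, skipping near-duplicates of kept ones."""
    order = sorted(range(len(importance)), key=lambda i: -importance[i])
    kept: List[int] = []
    for i in order:
        if len(kept) == max_frames:
            break
        if all(context[i][j] < repeat_threshold for j in kept):
            kept.append(i)
    return sorted(kept)  # restore the temporal order of the frames


def row_layout(kept: List[int], importance: List[float],
               canvas_width: int, row_height: int) -> List[Tuple[int, int, int, int, int]]:
    """Return (frame_index, x, y, width, height) for each puzzle area."""
    total = sum(importance[i] for i in kept)
    layout, x = [], 0
    for i in kept:
        width = round(canvas_width * importance[i] / total)
        layout.append((i, x, 0, width, row_height))
        x += width
    return layout


features = [[1, 0], [1, 0], [0, 1], [1, 1]]  # assumed semantic features per frame
importance = [0.9, 0.8, 0.5, 0.7]            # assumed importance evaluation values
kept = select_frames(context_matrix(features), importance, max_frames=3)
layout = row_layout(kept, importance, canvas_width=2100, row_height=300)
print(kept, layout)
```

In this example the second frame repeats the first and is dropped, and the most important remaining frame receives the widest puzzle area.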

Step S207: Based on the style transfer model corresponding to the target image style, style transfer is performed on each cropped image to obtain a target image frame corresponding to each cropped image.

Step S208: displaying a special effect identifier in the target image frame to obtain a special effect target image frame, wherein the special effect identifier is determined based on the image content of the target image frame.

For example, after performing style transfer on the cropped image to obtain the target image frame, dynamic feature identifiers such as "fireworks sticker effects", "virtual jewelry effects", etc. can be further added to the target image frame to further improve the visual expressiveness of the target puzzle.

Exemplarily, as shown in FIG10, the specific implementation steps of step S208 include:

Step S2081: Perform facial feature detection on the human element in the target image frame to obtain corresponding facial expression features.

Step S2082: Based on the facial expression features, determine the corresponding target special effect identifier, and determine the target display position of the target special effect identifier.

Step S2083: Based on the target display position, a target special effect identifier is added to the target image frame to obtain a special effect target image frame.

Exemplarily, this embodiment is applicable to a scene in which a target image frame contains human elements. Specifically, each element in the target image frame is first identified to obtain the human elements, such as the face of a portrait, and facial feature detection is then performed on the human elements to obtain facial expression features, such as happiness, sadness, etc. Next, the corresponding target special effect identifier is determined based on the facial expression features, and the target display position of the target special effect identifier is determined based on the position of each element in the target image frame, so that the target special effect identifier avoids other image elements and does not occlude them. Finally, the target special effect identifier is loaded at the target display position of the target image frame to obtain the special effect target image frame.
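Steps S2082 and S2083 can be illustrated with a small placement routine. This is a sketch under assumptions: the expression label and the element bounding boxes are taken as given (detection is not shown), the expression-to-effect table and effect names are invented for illustration, and only the four frame corners are tried as candidate positions.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive

# Hypothetical mapping from facial expression feature to special effect identifier.
EFFECT_BY_EXPRESSION = {"happy": "fireworks_sticker", "sad": "rain_cloud_sticker"}


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def choose_effect_position(frame_size: Tuple[int, int], effect_size: Tuple[int, int],
                           element_boxes: List[Box]) -> Optional[Box]:
    """Return the first corner position that does not occlude any image element."""
    width, height = frame_size
    ew, eh = effect_size
    corners = [(0, 0), (width - ew, 0), (0, height - eh), (width - ew, height - eh)]
    for x, y in corners:
        candidate = (x, y, x + ew, y + eh)
        if not any(overlaps(candidate, box) for box in element_boxes):
            return candidate
    return None  # no free corner; a caller could skip the effect in that case


effect = EFFECT_BY_EXPRESSION["happy"]
position = choose_effect_position((100, 100), (20, 20), [(0, 0, 50, 50), (40, 40, 80, 80)])
print(effect, position)
```

Here the top-left corner is occupied by an element, so the effect is placed in the top-right corner instead.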

Step S209: combining at least two special effect target image frames according to the target puzzle layout to generate a target puzzle.

The specific implementation of steps S207 and S209 of this embodiment has been described in detail in the embodiment shown in FIG. 2 , and will not be repeated here.

Corresponding to the image generation method of the above embodiment, FIG11 is a structural block diagram of an image generation device provided by an embodiment of the present disclosure. For ease of explanation, only the parts related to the embodiment of the present disclosure are shown. Referring to FIG11 , the image generation device 3 includes:

An acquisition module 31, used for acquiring at least two original images;

A migration module 32, configured to perform image style migration on at least two original images respectively to obtain corresponding target image frames having a target image style;

The combining module 33 is used to determine a target puzzle layout according to the image contents corresponding to at least two original images, and to combine at least two target image frames according to the target puzzle layout to generate a target puzzle.

In one possible implementation, the acquisition module 31 is specifically used to: acquire material data, the material data including a video and/or a picture set; extract frames of the material data according to the image content of the material images in the material data to obtain at least two original images, the material images being video frames in the video and/or pictures in the picture set.

In a possible implementation, when the acquisition module 31 extracts frames of the material data according to the image content of the material image in the material data to obtain at least two original images, it is specifically used to: acquire the posture similarity corresponding to the material image based on the image content of the material image, the posture similarity characterizing the similarity between the posture of the character element in the image content and the target posture; determine at least two original images according to the posture similarity corresponding to each material image.

In a possible implementation, when the acquisition module 31 determines at least two original images based on the posture similarities corresponding to each material image, it is specifically used to: determine the material images whose posture similarity is greater than a first similarity threshold as the original images, and/or, determine the material images whose posture similarity is less than a second similarity threshold as the original images; wherein the first similarity threshold is greater than the second similarity threshold.
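The two-threshold selection can be sketched as a simple filter. The example below assumes the posture similarity of each material image has already been computed as a number, and it applies both conditions together (one of the options allowed by the "and/or" wording); the function name and the sample values are illustrative only.

```python
from typing import List


def select_by_posture_similarity(similarities: List[float],
                                 first_threshold: float,
                                 second_threshold: float) -> List[int]:
    """Return indices of material images kept as original images.

    An image is kept if its posture similarity is greater than the first
    threshold or less than the second threshold.
    """
    if first_threshold <= second_threshold:
        raise ValueError("first threshold must be greater than second threshold")
    return [i for i, s in enumerate(similarities)
            if s > first_threshold or s < second_threshold]


similarities = [0.95, 0.5, 0.1, 0.85, 0.2]  # assumed per-image posture similarities
selected = select_by_posture_similarity(similarities, 0.9, 0.15)
print(selected)
```

With these sample values, the first image (very close to the target posture) and the third image (very far from it) are selected.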

In a possible implementation, the acquisition module 31 is also used to: acquire the dynamic clarity corresponding to the material image; when the acquisition module 31 extracts frames of the material data according to the image content of the material image in the material data to obtain at least two original images, it is specifically used to: acquire at least two original images based on the image content and the corresponding dynamic clarity of each material image.

In a possible implementation, the migration module 32 is specifically used to: obtain a style migration model corresponding to the target image style; and process the original image based on the style migration model to obtain a target image frame corresponding to the original image.

In one possible implementation, when the migration module 32 processes the original image based on the style transfer model to obtain a target image frame corresponding to the original image, it is specifically used to: determine the target image element in the original image based on the image content of the original image; perform edge cropping around the target image element to obtain a cropped image containing the target image element, wherein the target image element accounts for a larger proportion of the image area in the cropped image than the target image element accounts for the image area in the original image; perform style migration on each cropped image based on the style transfer model corresponding to the target image style to obtain a target image frame corresponding to the original image.

In a possible implementation, before performing image style migration on at least two original images to obtain corresponding target image frames with target image style, the migration module 32 is further used to: determine the target image style based on image contents corresponding to the at least two original images.

In a possible implementation, the target puzzle includes at least two puzzle areas, each puzzle area is used to display a corresponding target image frame, and the target puzzle layout represents the size and/or position of the puzzle area in the target puzzle.
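To make the relationship between puzzle areas and the target puzzle layout concrete, the following sketch pastes each target image frame into its puzzle area on a blank canvas. It assumes images are nested lists of pixel values, that each layout entry is an `(x, y, width, height)` tuple, and that frames are scaled by nearest-neighbor sampling; none of these details are prescribed by the disclosure, and `compose_puzzle` is a hypothetical name.

```python
from typing import List, Tuple

Image = List[List[int]]
Area = Tuple[int, int, int, int]  # (x, y, width, height) of a puzzle area


def compose_puzzle(canvas_size: Tuple[int, int], frames: List[Image],
                   layout: List[Area], background: int = 0) -> Image:
    """Place each frame into its puzzle area, scaling it to the area size."""
    width, height = canvas_size
    canvas = [[background] * width for _ in range(height)]
    for frame, (x, y, w, h) in zip(frames, layout):
        src_h, src_w = len(frame), len(frame[0])
        for row in range(h):
            for col in range(w):
                # Nearest-neighbor sampling from the source frame.
                canvas[y + row][x + col] = frame[row * src_h // h][col * src_w // w]
    return canvas


frame_a = [[1, 1], [1, 1]]  # placeholder target image frames
frame_b = [[2, 2], [2, 2]]
puzzle = compose_puzzle((6, 4), [frame_a, frame_b], [(0, 0, 4, 4), (4, 0, 2, 4)])
for line in puzzle:
    print(line)
```

In this example the first frame is given a larger puzzle area than the second, which is how a layout can express that one frame should be displayed more prominently.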

In a possible implementation, when determining the target puzzle layout based on the image contents corresponding to at least two original images, the combination module 33 is specifically used to: obtain context information based on the image contents corresponding to at least two original images, wherein the context information represents a contextual relationship between the image contents corresponding to at least two initial images; and determine the target puzzle layout based on the context information.

In a possible implementation, the combining module 33 is further configured to: add a special effect identifier to the target image frame to obtain a special effect target image frame, wherein the special effect identifier is determined based on the image content of the target image frame.

In a possible implementation, when adding a special effect identifier to the target image frame to obtain the special effect target image frame, the combination module 33 is specifically used to: perform facial feature detection on the character elements in the target image frame to obtain the corresponding facial expression features; determine the corresponding target special effect identifier based on the facial expression features; determine a target display position of the target special effect identifier; and add the target special effect identifier to the target image frame based on the target display position to obtain the special effect target image frame.

The acquisition module 31, the migration module 32 and the combination module 33 are connected in sequence. The image generation device 3 provided in this embodiment can implement the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, which will not be described in detail in this embodiment.
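The sequential connection of the three modules can be sketched as a small pipeline class. This is an illustration only: the class name, the callable-based interfaces, and the placeholder implementations in the usage example are assumptions, and real acquisition, style transfer, and layout logic would replace them.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]


class ImageGenerationDevice:
    """Chains acquisition, migration, and combination in sequence."""

    def __init__(self,
                 acquire: Callable[[], Sequence[Image]],
                 migrate: Callable[[Image], Image],
                 combine: Callable[[Sequence[Image]], Image]) -> None:
        self.acquire = acquire    # role of the acquisition module
        self.migrate = migrate    # role of the migration module
        self.combine = combine    # role of the combination module

    def run(self) -> Image:
        originals = self.acquire()
        if len(originals) < 2:
            raise ValueError("at least two original images are required")
        target_frames = [self.migrate(image) for image in originals]
        return self.combine(target_frames)


# Placeholder implementations: two 1x1 images, a "style" that adds 10 to each
# pixel, and a combination that joins the frames side by side.
device = ImageGenerationDevice(
    acquire=lambda: [[[1]], [[2]]],
    migrate=lambda image: [[pixel + 10 for pixel in row] for row in image],
    combine=lambda frames: [sum((f[r] for f in frames), []) for r in range(len(frames[0]))],
)
result = device.run()
print(result)
```

Keeping each module behind a callable makes it possible to swap in a different frame extractor or style transfer model without touching the other two stages.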

FIG. 12 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 12 , the electronic device 4 includes:

A processor 41, and a memory 42 communicatively connected to the processor 41;

The memory 42 stores computer executable instructions;

The processor 41 executes the computer-executable instructions stored in the memory 42 to implement the image generation method in the embodiments shown in FIG. 2 to FIG. 10 .

Optionally, the processor 41 and the memory 42 are connected via a bus 43 .

The relevant instructions can be understood by referring to the relevant descriptions and effects corresponding to the steps in the embodiments corresponding to Figures 2 to 10, and no further details will be given here.

Referring to FIG. 13 , it shows a schematic diagram of the structure of an electronic device 900 suitable for implementing the embodiment of the present disclosure, and the electronic device 900 may be a terminal device or a server. The terminal device may include but is not limited to mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Devices, PADs), portable multimedia players (PMPs), vehicle terminals (such as vehicle navigation terminals), etc., and fixed terminals such as digital TVs, desktop computers, etc. The electronic device shown in FIG. 13 is only an example and should not bring any limitation to the functions and scope of use of the embodiment of the present disclosure.

As shown in FIG. 13 , the electronic device 900 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 to a random access memory (RAM) 903. Various programs and data required for the operation of the electronic device 900 are also stored in the RAM 903. The processing device 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

Typically, the following devices can be connected to the I/O interface 905: an input device 906 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 907 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 908 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 909. The communication device 909 can allow the electronic device 900 to communicate with other devices wirelessly or by wire to exchange data. Although FIG. 13 shows an electronic device 900 with various devices, it should be understood that it is not required to implement or have all the devices shown. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from the network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902. When the computer program is executed by the processing device 901, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.

It should be noted that the computer-readable medium disclosed above may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. This propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which may send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.

The computer-readable medium may be included in the electronic device, or may exist independently without being incorporated into the electronic device.

The computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device executes the method shown in the above embodiment.

Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages such as "C" or similar programming languages. The program code may be executed entirely on the user's computer, partially on the user's computer, as a separate software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer may be connected to the user's computer via any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider).

The flow chart and block diagram in the accompanying drawings illustrate the possible architecture, function and operation of the system, method and computer program product according to various embodiments of the present disclosure. In this regard, each square box in the flow chart or block diagram can represent a module, a program segment or a part of a code, and the module, the program segment or a part of the code contains one or more executable instructions for realizing the specified logical function. It should also be noted that in some implementations as replacements, the functions marked in the square box can also occur in a sequence different from that marked in the accompanying drawings. For example, two square boxes represented in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved. It should also be noted that each square box in the block diagram and/or flow chart, and the combination of the square boxes in the block diagram and/or flow chart can be implemented with a dedicated hardware-based system that performs a specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.

The units involved in the embodiments described in the present disclosure may be implemented by software or hardware. The name of a unit does not limit the unit itself in some cases. For example, the first acquisition unit may also be described as a "unit for acquiring at least two Internet Protocol addresses".

The functions described above may be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips (SOCs), complex programmable logic devices (CPLDs), and the like.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or equipment. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or equipment, or any suitable combination of the foregoing. A more specific example of a machine-readable storage medium may include an electrical connection based on one or more lines, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

In a first aspect, according to one or more embodiments of the present disclosure, there is provided an image generation method, comprising:

According to one or more embodiments of the present disclosure, obtaining at least two original images includes: obtaining material data, wherein the material data includes a video and/or a picture set; extracting frames from the material data according to image contents of the material images in the material data to obtain at least two original images, wherein the material images are video frames in the video and/or pictures in the picture set.

According to one or more embodiments of the present disclosure, the material data is frame extracted according to the image content of the material image in the material data to obtain at least two of the original images, including: based on the image content of the material image, obtaining the posture similarity corresponding to the material image, the posture similarity characterizing the similarity between the posture of the character element in the image content and the target posture; and determining at least two original images according to the posture similarities corresponding to each of the material images.

According to one or more embodiments of the present disclosure, at least two original images are determined according to the posture similarities corresponding to the material images, including: determining the material images whose posture similarity is greater than a first similarity threshold as the original images, and/or determining the material images whose posture similarity is less than a second similarity threshold as the original images; wherein the first similarity threshold is greater than the second similarity threshold.

According to one or more embodiments of the present disclosure, the method further includes: obtaining the dynamic clarity corresponding to the material image; extracting frames of the material data according to the image content of the material image in the material data to obtain at least two of the original images, including: obtaining at least two of the original images based on the image content and corresponding dynamic clarity of each of the material images.

According to one or more embodiments of the present disclosure, image style transfer is performed on at least two of the original images respectively to obtain corresponding target image frames with the style of the target image, including: obtaining a style transfer model corresponding to the target image style; and processing the original images based on the style transfer model to obtain the target image frames corresponding to the original images.

According to one or more embodiments of the present disclosure, the original image is processed based on the style transfer model to obtain a target image frame corresponding to the original image, including: determining a target image element in the original image based on the image content of the original image; performing edge cropping around the target image element to obtain a cropped image containing the target image element, wherein the image area proportion of the target image element in the cropped image is greater than the image area proportion of the target image element in the original image; and performing style migration on each of the cropped images based on the style transfer model corresponding to the target image style to obtain a target image frame corresponding to the original image.

According to one or more embodiments of the present disclosure, before performing image style migration on at least two of the original images respectively to obtain corresponding target image frames with the target image style, it also includes: determining the target image style based on image contents corresponding to the at least two original images.

According to one or more embodiments of the present disclosure, the target puzzle includes at least two puzzle areas, each puzzle area is used to display a corresponding target image frame, and the target puzzle layout represents the size and/or position of the puzzle area in the target puzzle.

According to one or more embodiments of the present disclosure, determining a target puzzle layout based on image contents corresponding to at least two of the original images includes: obtaining context information based on the image contents corresponding to the at least two original images, the context information representing a contextual relationship between the image contents corresponding to the at least two original images; and determining the target puzzle layout based on the context information.

According to one or more embodiments of the present disclosure, the method further includes: adding a special effect identifier to the target image frame to obtain a special effect target image frame, wherein the special effect identifier is determined based on image content of the target image frame.

According to one or more embodiments of the present disclosure, a special effect identifier is added to the target image frame to obtain a special effect target image frame, including: performing facial feature detection on a person element in the target image frame to obtain a corresponding facial expression feature; determining a corresponding target special effect identifier based on the facial expression feature; determining a target display position of the target special effect identifier; and adding the target special effect identifier to the target image frame based on the target display position to obtain a special effect target image frame.

In a second aspect, according to one or more embodiments of the present disclosure, there is provided an image generating device, comprising:

An acquisition module, used for acquiring at least two original images;

A migration module, used to perform image style migration on at least two of the original images respectively to obtain corresponding target image frames with target image style;

In a possible implementation, the acquisition module is specifically used to: acquire material data, the material data including a video and/or a picture set; extract frames of the material data according to image content of the material images in the material data to obtain at least two of the original images, the material images being video frames in the video and/or pictures in the picture set.

In a possible implementation, when the acquisition module extracts frames of the material data according to the image content of the material image in the material data to obtain at least two of the original images, it is specifically used to: acquire the posture similarity corresponding to the material image based on the image content of the material image, the posture similarity representing the similarity between the posture of the character element in the image content and the target posture; determine at least two original images according to the posture similarities corresponding to each of the material images.

In a possible implementation, when the acquisition module determines at least two original images based on the posture similarities corresponding to each of the material images, it is specifically used to: determine the material image whose posture similarity is greater than a first similarity threshold as the original image, and/or, determine the material image whose posture similarity is less than a second similarity threshold as the original image; wherein the first similarity threshold is greater than the second similarity threshold.
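The two-threshold selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `select_by_posture`, the threshold values, and the idea that posture similarity is a score between 0 and 1 are all hypothetical.

```python
from typing import List, Sequence


def select_by_posture(
    similarities: Sequence[float],
    first_threshold: float = 0.8,   # hypothetical value
    second_threshold: float = 0.2,  # hypothetical value
) -> List[int]:
    """Return the indices of material images kept as original images.

    A material image is kept when its posture similarity is greater than
    the first threshold or less than the second threshold.
    """
    if first_threshold <= second_threshold:
        raise ValueError("first_threshold must be greater than second_threshold")
    return [
        i
        for i, s in enumerate(similarities)
        if s > first_threshold or s < second_threshold
    ]


# Hypothetical posture similarities for five material images.
scores = [0.95, 0.50, 0.10, 0.85, 0.30]
selected = select_by_posture(scores)
```

With these example scores, the images at indices 0, 2, and 3 are kept. Because the text says "and/or", an implementation may also apply only one of the two conditions.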

In a possible implementation, the acquisition module is further used to: acquire the dynamic definition corresponding to the material image; when the acquisition module extracts frames of the material data according to the image content of the material image in the material data to obtain at least two original images, the acquisition module is specifically used to: obtain at least two original images based on the image content and the corresponding dynamic definition of each of the material images.
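One way to combine image content with dynamic definition is to discard material images below a minimum definition before applying a content-based rule. The sketch below assumes this ordering, which the text does not specify. The `MaterialImage` type, the `extract_frames` function, the score ranges, and every threshold value are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class MaterialImage(NamedTuple):
    index: int
    posture_similarity: float  # assumed content score in [0, 1]
    dynamic_definition: float  # assumed sharpness score in [0, 1]


def extract_frames(
    images: Sequence[MaterialImage],
    min_definition: float = 0.6,    # hypothetical value
    first_threshold: float = 0.8,   # hypothetical value
    second_threshold: float = 0.2,  # hypothetical value
) -> List[int]:
    """Return indices of material images chosen as original images."""
    # Step 1: drop images whose dynamic definition is too low.
    sharp = [m for m in images if m.dynamic_definition >= min_definition]
    # Step 2: apply the content-based (posture similarity) rule.
    return [
        m.index
        for m in sharp
        if m.posture_similarity > first_threshold
        or m.posture_similarity < second_threshold
    ]


frames = [
    MaterialImage(0, 0.90, 0.9),
    MaterialImage(1, 0.95, 0.3),  # matches the posture rule but is blurred
    MaterialImage(2, 0.10, 0.7),
    MaterialImage(3, 0.50, 0.9),  # sharp but fails the posture rule
]
picked = extract_frames(frames)
```

In this example image 1 is rejected for low definition even though its posture similarity is high, and image 3 is rejected on content even though it is sharp.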

In a possible implementation, the migration module is specifically used to: obtain a style migration model corresponding to the target image style; and process the original image based on the style migration model to obtain a target image frame corresponding to the original image.

In a possible implementation, when the migration module processes the original image based on the style migration model to obtain a target image frame corresponding to the original image, the migration module is specifically used to: determine the target image element in the original image based on the image content of the original image; perform edge cropping around the target image element to obtain a cropped image containing the target image element, wherein the target image element accounts for a larger proportion of the image area in the cropped image than the target image element accounts for the image area in the original image; perform style migration on each of the cropped images based on the style migration model corresponding to the target image style to obtain a target image frame corresponding to the original image.
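The edge-cropping step can be illustrated by computing a crop rectangle around a detected element and checking that the element occupies a larger share of the cropped image than of the original. This is a sketch only: the `Box` type, the `crop_box_around` function, the 10% margin, and the example coordinates are assumptions, and element detection itself is outside its scope.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive

    @property
    def area(self) -> int:
        return (self.right - self.left) * (self.bottom - self.top)


def crop_box_around(
    element: Box, image_width: int, image_height: int, margin: float = 0.1
) -> Box:
    """Expand the element box by `margin` of its size on each side,
    clamped to the image bounds, and return the resulting crop box."""
    pad_x = int((element.right - element.left) * margin)
    pad_y = int((element.bottom - element.top) * margin)
    return Box(
        max(0, element.left - pad_x),
        max(0, element.top - pad_y),
        min(image_width, element.right + pad_x),
        min(image_height, element.bottom + pad_y),
    )


# Hypothetical element detected in a 1920x1080 original image.
element = Box(400, 200, 800, 700)
crop = crop_box_around(element, 1920, 1080)

share_before = element.area / (1920 * 1080)  # element share of the original
share_after = element.area / crop.area       # element share of the crop
```

Here the crop box is `Box(360, 150, 840, 750)`, and `share_after` exceeds `share_before`, which is the condition the text places on the cropped image. The cropped region would then be passed to the style migration model.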

In a possible implementation, before performing image style migration on at least two of the original images respectively to obtain corresponding target image frames with the target image style, the migration module is further used to: determine the target image style based on image content corresponding to the at least two original images.

In a possible implementation, when the combination module determines the target puzzle layout based on the image contents corresponding to at least two of the original images, it is specifically used to: obtain context information based on the image contents corresponding to the at least two original images, the context information representing the contextual relationship between the image contents corresponding to the at least two original images; and determine the target puzzle layout based on the context information.
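A target puzzle layout can be represented as a list of puzzle areas, each with a size and a position, selected according to the context information. The sketch below is illustrative only. The context labels (`"sequence"` and anything else), the `PuzzleArea` type, the `choose_layout` function, and the two layout rules are hypothetical, since the text does not specify how context maps to a layout.

```python
import math
from typing import List, NamedTuple


class PuzzleArea(NamedTuple):
    x: float       # left edge, as a fraction of the puzzle width
    y: float       # top edge, as a fraction of the puzzle height
    width: float   # fraction of the puzzle width
    height: float  # fraction of the puzzle height


def choose_layout(context: str, frame_count: int) -> List[PuzzleArea]:
    """Return one puzzle area per target image frame."""
    if frame_count < 2:
        raise ValueError("a target puzzle needs at least two frames")
    if context == "sequence":
        # Assumed rule: ordered content is shown as a left-to-right strip.
        w = 1.0 / frame_count
        return [PuzzleArea(i * w, 0.0, w, 1.0) for i in range(frame_count)]
    # Assumed fallback: a near-square grid filled row by row.
    cols = math.ceil(math.sqrt(frame_count))
    rows = math.ceil(frame_count / cols)
    w, h = 1.0 / cols, 1.0 / rows
    return [
        PuzzleArea((i % cols) * w, (i // cols) * h, w, h)
        for i in range(frame_count)
    ]


strip = choose_layout("sequence", 2)
grid = choose_layout("unrelated", 4)
```

Using fractions of the puzzle size keeps the layout independent of the output resolution; each area is scaled to pixels when the target image frames are combined.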

In a possible implementation, the combination module is further used to: add a special effect identifier to the target image frame to obtain a special effect target image frame, wherein the special effect identifier is determined based on the image content of the target image frame.

In a possible implementation, when the combination module adds a special effect identifier to the target image frame to obtain a special effect target image frame, it is specifically used to: perform facial feature detection on the character elements in the target image frame to obtain corresponding facial expression features; based on the facial expression features, determine a corresponding target special effect identifier; determine a target display position of the target special effect identifier; based on the target display position, add the target special effect identifier to the target image frame to obtain a special effect target image frame.
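The steps after facial feature detection (choosing an identifier and a display position) can be sketched as a lookup followed by a position calculation. Everything named here is an assumption for illustration: the expression labels, the effect names, the `FaceBox` type, the `plan_special_effect` function, and the rule of placing the identifier near the upper-right corner of the face. The detector that produces the expression label and the face box is not shown.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class FaceBox(NamedTuple):
    left: int
    top: int
    width: int
    height: int


# Hypothetical mapping from an expression label to a special effect identifier.
EFFECT_BY_EXPRESSION: Dict[str, str] = {
    "happy": "sparkle",
    "surprised": "exclamation_mark",
    "sad": "rain_cloud",
}


def plan_special_effect(
    expression: str, face: FaceBox, frame_width: int
) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Return (effect identifier, (x, y) display position), or None when
    no effect is mapped to the given expression."""
    effect = EFFECT_BY_EXPRESSION.get(expression)
    if effect is None:
        return None
    # Assumed placement: right edge of the face, a quarter of the face
    # height above its top, clamped to stay inside the frame.
    x = min(frame_width - 1, face.left + face.width)
    y = max(0, face.top - face.height // 4)
    return effect, (x, y)


plan = plan_special_effect("surprised", FaceBox(100, 80, 60, 60), frame_width=512)
```

The returned position would then be used to draw the identifier onto the target image frame, producing the special effect target image frame.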

In a third aspect, according to one or more embodiments of the present disclosure, there is provided an electronic device, comprising: a processor, and a memory communicatively connected to the processor;

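The memory stores computer-executable instructions;

The processor executes the computer-executable instructions stored in the memory to implement the image generation method described in the first aspect and various possible designs of the first aspect.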

In a fourth aspect, according to one or more embodiments of the present disclosure, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores computer-executable instructions. When a processor executes the computer-executable instructions, the image generation method described in the first aspect and various possible designs of the first aspect is implemented.

The above description is only a preferred embodiment of the present disclosure and an explanation of the technical principles used. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to the technical solutions formed by a specific combination of the above technical features, but should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept. One example is a technical solution formed by replacing the above features with technical features having similar functions that are disclosed in (but not limited to) the present disclosure.

In addition, although each operation is described in a specific order, this should not be understood as requiring these operations to be performed in the specific order shown or in a sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Similarly, although some specific implementation details are included in the above discussion, these should not be interpreted as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments individually or in any suitable sub-combination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims

A method for generating an image, comprising:

Acquiring at least two original images;

Performing image style migration on at least two of the original images respectively to obtain corresponding target image frames having the target image style;

Determining a target puzzle layout according to image contents corresponding to at least two of the original images;

Combining at least two of the target image frames according to the target puzzle layout to generate a target puzzle.
The method according to claim 1, wherein acquiring at least two original images comprises:

Acquiring material data, wherein the material data includes a video and/or a picture set;

Extracting frames of the material data according to the image content of the material images in the material data to obtain at least two of the original images, wherein the material images are video frames in the video and/or pictures in the picture set.
The method according to claim 2, wherein extracting frames of the material data according to the image content of the material images in the material data to obtain at least two of the original images comprises:

Based on the image content of the material image, obtaining a posture similarity corresponding to the material image, wherein the posture similarity represents a similarity between a posture of a human element in the image content and a target posture;

Determining at least two original images according to the posture similarities corresponding to the respective material images.
The method according to claim 3, wherein determining at least two original images according to the posture similarities corresponding to the respective material images comprises:

Determining the material image whose posture similarity is greater than a first similarity threshold as the original image, and/or,

Determining the material image whose posture similarity is less than a second similarity threshold as the original image;

Wherein the first similarity threshold is greater than the second similarity threshold.
The method according to claim 2, further comprising:

Obtaining the dynamic definition corresponding to the material image;

Wherein extracting frames of the material data according to the image content of the material images in the material data to obtain at least two original images comprises:

Obtaining at least two original images based on the image content and the corresponding dynamic definition of each of the material images.
The method according to claim 1, wherein performing image style migration on at least two of the original images respectively to obtain corresponding target image frames having the target image style comprises:

Obtaining a style migration model corresponding to the target image style;

Processing the original image based on the style migration model to obtain a target image frame corresponding to the original image.
The method according to claim 6, wherein processing the original image based on the style migration model to obtain a target image frame corresponding to the original image comprises:

Determining a target image element in the original image based on image content of the original image;

Performing edge cropping around the target image element to obtain a cropped image containing the target image element, wherein the image area proportion of the target image element in the cropped image is greater than the image area proportion of the target image element in the original image;

Performing style migration on each of the cropped images based on the style migration model corresponding to the target image style to obtain a target image frame corresponding to the original image.
The method according to claim 1, further comprising, before performing image style migration on at least two of the original images respectively to obtain corresponding target image frames having the target image style:

Determining the target image style based on image contents corresponding to at least two of the original images.
The method according to claim 1, wherein the target puzzle includes at least two puzzle areas, each puzzle area is used to display a corresponding target image frame, and the target puzzle layout represents the size and/or position of the puzzle areas in the target puzzle.
The method according to claim 1, wherein determining a target puzzle layout according to image contents corresponding to at least two of the original images comprises:

Obtaining context information according to image contents corresponding to the at least two original images, wherein the context information represents a contextual relationship between image contents corresponding to the at least two original images;

Determining the target puzzle layout according to the context information.
The method according to claim 1, further comprising:

Adding a special effect identifier to the target image frame to obtain a special effect target image frame, wherein the special effect identifier is determined based on the image content of the target image frame.
The method according to claim 11, wherein adding a special effect identifier to the target image frame to obtain a special effect target image frame comprises:

Performing facial feature detection on the human elements in the target image frame to obtain corresponding facial expression features;

Based on the facial expression features, determining a corresponding target special effect identifier;

Determining a target display position of the target special effect identifier;

Based on the target display position, adding the target special effect identifier to the target image frame to obtain a special effect target image frame.
An image generating device, comprising:

An acquisition module, used for acquiring at least two original images;

A migration module, used to perform image style migration on at least two of the original images respectively to obtain corresponding target image frames with target image style;

A combination module, used to determine a target puzzle layout according to image contents corresponding to at least two of the original images, and to combine at least two of the target image frames according to the target puzzle layout to generate a target puzzle.
An electronic device, comprising: a processor, and a memory communicatively connected to the processor;

The memory stores computer-executable instructions;

The processor executes the computer-executable instructions stored in the memory to implement the image generation method according to any one of claims 1 to 12.
A computer-readable storage medium having computer-executable instructions stored therein, wherein when a processor executes the computer-executable instructions, the image generation method according to any one of claims 1 to 12 is implemented.
A computer program product, comprising a computer program, wherein when the computer program is executed by a processor, the image generation method according to any one of claims 1 to 12 is implemented.