CN114119438A

CN114119438A - Training method and device of image collage model and image collage method and device

Info

Publication number: CN114119438A
Application number: CN202111334867.7A
Authority: CN
Inventors: 张明睿; 李马丁; 孙明; 戴宇荣; 陈莉; 于冰
Original assignee: Tsinghua University; Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Tsinghua University; Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2021-11-11
Filing date: 2021-11-11
Publication date: 2022-03-01

Abstract

The disclosure provides a training method and equipment of an image collage model and an image collage method and equipment. The training method comprises the following steps: acquiring a training sample comprising a plurality of subimages, and collaging the subimages into an image to obtain an initial collage image to be adjusted; obtaining the aesthetic characteristics of the collage image to be adjusted currently; inputting the obtained aesthetic features into the image collage model to obtain a predicted collage adjustment action sequence to be executed, executing the predicted collage adjustment action sequence on the collage image to be adjusted currently, and returning to the step of obtaining the aesthetic features of the collage image to be adjusted currently; when it is determined that the collage image is not to be continually adjusted, parameters of the image collage model are adjusted according to a reward function that performs a sequence of predicted collage adjustment actions to train the image collage model.

Description

Training method and device of image collage model and image collage method and device

Technical Field

The present disclosure relates generally to the field of image processing technology, and more particularly, to a method and apparatus for training an image collage model, and an image collage method and apparatus.

Background

The image collage technology refers to a technology for collaging a plurality of key frames in a plurality of images or videos into one image. The image collage is beautiful, has a large amount of information, can attract the attention of a user in a short time, transmits more information, and can be applied to various service scenes.

The existing large part image collage technology has higher manufacturing cost, and some more beautiful collage effects even need to be achieved by manual processing by a designer with high proficiency. The image automatic collage algorithm can obviously reduce the manufacturing cost of the image collage, and a user can complete the image collage by one key only by inputting videos or a plurality of images. However, currently, there are few studies on automatic image collage, and the quality of image collage is poor.

Disclosure of Invention

An exemplary embodiment of the present disclosure is directed to a training method and apparatus of an image collage model and an image collage method and apparatus, which at least solve the problems of the related art described above, and which may or may not solve any of the problems described above.

According to a first aspect of the embodiments of the present disclosure, there is provided a training method of an image collage model, the training method including: acquiring a training sample comprising a plurality of subimages, and collaging the subimages into an image to obtain an initial collage image to be adjusted; obtaining the aesthetic characteristics of the collage image to be adjusted currently; inputting the obtained aesthetic features into the image collage model to obtain a predicted collage adjustment action sequence to be executed, executing the predicted collage adjustment action sequence on the collage image to be adjusted currently, and returning to the step of obtaining the aesthetic features of the collage image to be adjusted currently; when it is determined that the collage image is not to be continually adjusted, parameters of the image collage model are adjusted according to a reward function that performs a sequence of predicted collage adjustment actions to train the image collage model.

Optionally, inputting the obtained aesthetic features into the image collage model, obtaining a predicted collage adjustment action sequence to be executed, executing the predicted collage adjustment action sequence on the collage image to be adjusted currently, and returning to execute the step of obtaining the aesthetic features of the collage image to be adjusted currently, including: inputting the obtained aesthetic characteristics into the image collage model to obtain a collage adjustment action sequence needing to be executed in the prediction; executing the collage adjusting action sequence of the current prediction on the collage image to be adjusted currently to obtain the collage image after the collage adjustment; cutting the collage image after the collage adjustment according to the specified length-width ratio; and determining whether to continuously adjust the collage image, wherein when the collage image is determined to be continuously adjusted, the collage image obtained by cutting at this time is used as the collage image to be currently adjusted, and the step of obtaining the aesthetic characteristics of the collage image to be currently adjusted is returned to be executed.

Optionally, the step of obtaining the aesthetic features of the collage image currently to be adjusted comprises: inputting the collage image to be adjusted into an aesthetic feature generation network to obtain the aesthetic features of a plurality of image blocks sampled out of order from the collage image to be adjusted; and performing fusion processing on the obtained aesthetic features of the plurality of image blocks to obtain the aesthetic features of the collage image to be adjusted currently, wherein the aesthetic features of the collage image to be adjusted currently are used for representing the image element aesthetic features and composition aesthetic features of the collage image to be adjusted currently.

Optionally, the step of performing a fusion process on the obtained aesthetic features of the plurality of image blocks to obtain the aesthetic features of the collage image to be currently adjusted includes: and performing fusion processing on the aesthetic features of the image blocks based on the distance between the image block and the center of the collage image to be adjusted currently and/or the area of the image block to obtain the aesthetic features of the collage image to be adjusted currently.

Optionally, the step of performing a fusion process on the obtained aesthetic features of the plurality of image blocks to obtain the aesthetic features of the collage image to be currently adjusted includes: and performing fusion processing on the obtained aesthetic characteristics of the plurality of image blocks through an attention mechanism aiming at composition aesthetics to obtain the aesthetic characteristics of the collage image to be adjusted currently.

Optionally, the type of collage adjustment action includes: a layout adjustment action and/or a local adjustment action; wherein the types of layout adjustment actions include: exchanging the positions of the two sub-images and terminating the layout adjustment action; wherein the type of local adjustment action comprises at least one of: adjusting the placement position of the sub-images, not adjusting the placement position of the sub-images, rotating the placement angle of the sub-images, not adjusting the placement angle of the sub-images, adjusting the layer sequence of the sub-images, not adjusting the layer sequence of the sub-images.

Optionally, the reward function for performing each predicted collage adjustment sequence is calculated by: a reward function for executing the collage adjustment sequence of the present prediction is calculated based on a difference between the aesthetic score of the collage image obtained after executing the collage adjustment sequence of the present prediction and the aesthetic score of the collage image obtained after executing the collage adjustment sequence of the previous prediction.

Optionally, the reward function for performing the tth predicted collage adjustment sequence of actions is calculated based on: t, the difference between the aesthetic score of the collage image obtained after execution of the collage adjustment sequence predicted the tth time, and the aesthetic score of the collage image obtained after execution of the collage adjustment sequence predicted the t-1 st time.

Optionally, the aesthetic score of the collage image is calculated based on the number of aesthetic boxes of the collage image and/or a blank lost area of the collage image.

Optionally, the step of cropping the tile image after the tile adjustment according to the specified aspect ratio includes: cutting out a plurality of candidate views which accord with the specified length-width ratio from the collage image after the collage adjustment; inputting the candidate views into an aesthetic scoring network to obtain the estimated aesthetic scores of the candidate views; and taking the candidate view with the highest aesthetic score in the plurality of candidate views as the collage image obtained by the current cutting.

Optionally, the step of performing fusion processing on the aesthetic features of the plurality of image blocks based on the distance between the image block and the center of the collage image to be currently adjusted and/or the area of the image block to obtain the aesthetic features of the collage image to be currently adjusted includes: and weighting the aesthetic characteristics of the plurality of image blocks based on the Euclidean distance between the image blocks and the center of the collage image to be adjusted currently and the ratio of the area of the image blocks to the area of the collage image to be adjusted currently so as to obtain the aesthetic characteristics of the collage image to be adjusted currently.

According to a second aspect of the embodiments of the present disclosure, there is provided an image collage method including: acquiring a plurality of subimages to be collaged, and collaging the subimages to be collaged into one image to obtain an initial collage image to be adjusted; obtaining the aesthetic characteristics of the collage image to be adjusted currently; inputting the obtained aesthetic features into an image collage model to obtain a predicted collage adjustment action sequence to be executed, executing the predicted collage adjustment action sequence on the collage image to be adjusted currently, and returning to the step of obtaining the aesthetic features of the collage image to be adjusted currently; when determining not to continue adjusting the collage image, outputting the adjusted collage image, wherein the image collage model is trained by: adjusting parameters of the image collage model based on a reward function that performs a sequence of collage adjustment actions for each prediction of the image collage model for a training sample.

Optionally, inputting the obtained aesthetic features into an image collage model, obtaining a predicted collage adjustment action sequence to be executed, executing the predicted collage adjustment action sequence on the collage image to be adjusted currently, and returning to execute the step of obtaining the aesthetic features of the collage image to be adjusted currently, including: inputting the obtained aesthetic characteristics into the image collage model to obtain a collage adjustment action sequence needing to be executed in the prediction; executing the collage adjusting action sequence of the current prediction on the collage image to be adjusted currently to obtain the collage image after the collage adjustment; cutting the collage image after the collage adjustment according to the length-width ratio specified by the user; determining whether to continuously adjust the collage image, wherein when the collage image is determined to be continuously adjusted, the collage image obtained by cutting at this time is used as the collage image to be currently adjusted, and the step of obtaining the aesthetic characteristics of the collage image to be currently adjusted is returned to be executed; wherein, when it is determined that the collage image is not to be continuously adjusted, the step of outputting the adjusted collage image includes: and when the collage image is determined not to be continuously adjusted, outputting the collage image obtained by the cutting.

Optionally, the step of cutting the collage image after the collage adjustment according to the aspect ratio specified by the user includes: cutting out a plurality of candidate views which accord with the length-width ratio specified by a user from the collage image after the collage adjustment; inputting the candidate views into an aesthetic scoring network to obtain the estimated aesthetic scores of the candidate views; and taking the candidate view with the highest aesthetic score in the plurality of candidate views as the collage image obtained by the current cutting.

Optionally, the image collage model is trained using a training method as described above.

According to a third aspect of the embodiments of the present disclosure, there is provided a training apparatus of an image collage model, the training apparatus including: an initial collage unit configured to obtain a training sample including a plurality of subimages and collage the plurality of subimages into one image to obtain an initial collage image to be adjusted; an aesthetic feature acquisition unit configured to acquire an aesthetic feature of a collage image to be currently adjusted; a prediction adjusting unit configured to input the obtained aesthetic features into the image collage model, obtain a predicted collage adjusting action sequence to be executed, execute the predicted collage adjusting action sequence on the collage image to be adjusted currently, and return to the aesthetic features of the collage image to be adjusted currently obtained by the aesthetic feature obtaining unit; a training unit configured to adjust parameters of the image collage model according to a reward function that performs a sequence of predicted collage adjustment actions to train the image collage model when it is determined not to continue adjusting the collage image.

Optionally, the prediction adjusting unit comprises: the prediction unit is configured to input the acquired aesthetic characteristics into the image collage model to obtain a collage adjustment action sequence needing to be executed in the prediction; a collage adjustment unit configured to execute a collage adjustment action sequence of the current prediction on a collage image to be adjusted currently to obtain a collage image after the current collage adjustment; a cropping unit configured to crop the tile image adjusted for the present tile in accordance with a specified aspect ratio; and an adjustment end determining unit configured to determine whether to continue adjusting the collage image, wherein when it is determined to continue adjusting the collage image, the aesthetic feature acquiring unit takes the collage image obtained by the cutting as the collage image to be currently adjusted, and acquires the aesthetic feature of the collage image to be currently adjusted.

Optionally, the aesthetic feature obtaining unit is configured to input the collage image currently to be adjusted into an aesthetic feature generation network, resulting in aesthetic features of a plurality of image blocks sampled out-of-order from the collage image currently to be adjusted; and performing fusion processing on the obtained aesthetic features of the plurality of image blocks to obtain the aesthetic features of the collage image to be adjusted currently, wherein the aesthetic features of the collage image to be adjusted currently are used for representing the image element aesthetic features and composition aesthetic features of the collage image to be adjusted currently.

Optionally, the aesthetic feature obtaining unit is configured to perform a fusion process on the aesthetic features of the plurality of image blocks based on the distance between the image block and the center of the collage image to be currently adjusted and/or the area of the image block, so as to obtain the aesthetic features of the collage image to be currently adjusted.

Optionally, the aesthetic feature obtaining unit is configured to perform a fusion process on the obtained aesthetic features of the plurality of image blocks through an attention mechanism for composition aesthetics to obtain the aesthetic features of the collage image to be adjusted currently.

Optionally, the cropping unit is configured to crop out a plurality of candidate views conforming to the specified aspect ratio from the collage image after the collage adjustment; inputting the candidate views into an aesthetic scoring network to obtain the estimated aesthetic scores of the candidate views; and taking the candidate view with the highest aesthetic score in the plurality of candidate views as the collage image obtained by the current cutting.

Optionally, the aesthetic feature obtaining unit is configured to perform weighting processing on the aesthetic features of the plurality of image blocks based on euclidean distances between the image blocks and the center of the collage image to be currently adjusted and ratios of areas of the image blocks to the area of the collage image to be currently adjusted, so as to obtain the aesthetic features of the collage image to be currently adjusted.

According to a fourth aspect of the embodiments of the present disclosure, there is provided an image collage apparatus including: an initial collage unit configured to acquire a plurality of sub-images to be collaged and collage the plurality of sub-images to be collaged into one image to obtain an initial collage image to be adjusted; an aesthetic feature acquisition unit configured to acquire an aesthetic feature of a collage image to be currently adjusted; the prediction adjusting unit is configured to input the acquired aesthetic features into the image collage model, obtain a predicted collage adjusting action sequence needing to be executed, execute the predicted collage adjusting action sequence on the collage image to be adjusted currently, and return to the aesthetic features of the collage image to be adjusted currently acquired by the aesthetic feature acquiring unit; an output unit configured to output the adjusted collage image when it is determined not to continue adjusting the collage image, wherein the image collage model is trained by: adjusting parameters of the image collage model based on a reward function that performs a sequence of collage adjustment actions for each prediction of the image collage model for a training sample.

Optionally, the prediction adjusting unit comprises: the prediction unit is configured to input the acquired aesthetic characteristics into the image collage model to obtain a collage adjustment action sequence needing to be executed in the prediction; a collage adjustment unit configured to execute a collage adjustment action sequence of the current prediction on a collage image to be adjusted currently to obtain a collage image after the current collage adjustment; a cropping unit configured to crop the collage image after the collage adjustment according to an aspect ratio specified by a user; an adjustment end determining unit configured to determine whether to continue adjusting the collage image, wherein when it is determined to continue adjusting the collage image, the aesthetic feature acquiring unit takes the collage image obtained by the cutting as the collage image to be currently adjusted, and acquires an aesthetic feature of the collage image to be currently adjusted; wherein the output unit is configured to output the collage image obtained by the present cropping when it is determined that the collage image is not to be continuously adjusted.

Optionally, the clipping unit is configured to: cutting out a plurality of candidate views which accord with the length-width ratio specified by a user from the collage image after the collage adjustment; inputting the candidate views into an aesthetic scoring network to obtain the estimated aesthetic scores of the candidate views; and taking the candidate view with the highest aesthetic score in the plurality of candidate views as the collage image obtained by the current cutting.

Optionally, the image collage model is trained using a training apparatus as described above.

According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform a training method of an image collage model as described above and/or an image collage method as described above.

According to a sixth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by at least one processor, cause the at least one processor to perform the training method of an image collage model as described above and/or the image collage method as described above.

According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by at least one processor, implement the training method of an image collage model as described above and/or the image collage method as described above.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects: a high quality collage image of a particular aspect ratio can be automatically collaged.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

FIG. 1 shows a flowchart of a method of training an image collage model, according to an example embodiment of the present disclosure;

FIG. 2 shows a flow diagram of a method of training an image collage model, according to another example embodiment of the present disclosure;

FIG. 3 illustrates an example of a collage adjustment action, according to an example embodiment of the present disclosure;

FIG. 4 illustrates a flow chart for obtaining aesthetic characteristics of a collage image currently to be adjusted, according to an exemplary embodiment of the present disclosure;

FIG. 5 shows a flowchart of an image collage method according to an example embodiment of the present disclosure;

FIG. 6 shows a flowchart of an image collage method according to another example embodiment of the present disclosure;

FIG. 7 shows a block diagram of a training apparatus for an image collage model, according to an example embodiment of the present disclosure;

FIG. 8 illustrates a block diagram of a configuration of an image collage apparatus according to an exemplary embodiment of the present disclosure;

fig. 9 illustrates a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

In this case, the expression "at least one of the items" in the present disclosure means a case where three types of parallel expressions "any one of the items", "a combination of any plural ones of the items", and "the entirety of the items" are included. For example, "include at least one of a and B" includes the following three cases in parallel: (1) comprises A; (2) comprises B; (3) including a and B. For another example, "at least one of the first step and the second step is performed", which means that the following three cases are juxtaposed: (1) executing the step one; (2) executing the step two; (3) and executing the step one and the step two.

FIG. 1 shows a flowchart of a training method of an image collage model according to an example embodiment of the present disclosure.

Referring to fig. 1, in step S101, a training sample including a plurality of sub-images is obtained, and the plurality of sub-images are tiled into one image to obtain an initial tiled image to be adjusted. In other words, the collaged image is the initial collage image to be adjusted.

As an example, the plurality of sub-images may be a plurality of individual images, or may be a plurality of key frames in the same video. For example, the training sample may comprise a sequence of video key frames.

As an example, the plurality of sub-images may be tiled into one image according to a sequential order of the plurality of sub-images (e.g., a sequential order of the plurality of sub-images in a sequence).

As an example, the plurality of sub-images may be tiled into one image using a preset appropriate template. For example, the template may be a grid-like template.

In step S102, the aesthetic features of the collage image currently to be adjusted are obtained.

It should be understood that when step S102 is performed for a certain training sample for the first time, the collage image to be adjusted currently is the initial collage image to be adjusted. That is, the initial value of the collage image to be currently adjusted is: an initial collage image to be adjusted.

It should be understood that various suitable ways may be used to obtain the aesthetic characteristics of the collage image currently to be adjusted, and an exemplary embodiment of step S102 will be described below in connection with FIG. 4.

In step S103, the obtained aesthetic features are input into the image collage model, a predicted collage adjustment action sequence to be executed is obtained, the predicted collage adjustment action sequence is executed on the collage image to be adjusted currently, and the step S102 is returned to be executed. Specifically, when it is determined to continue adjusting the collage image, execution returns to step S102.

In step S104, when it is determined that the collage image is not to be continuously adjusted, parameters of the image collage model are adjusted according to a reward function for performing a sequence of predicted collage adjustment actions to train the image collage model.

In other words, step S102 and step S103 are iteratively executed until it is determined that the adjustment of the collage image is not to be continued, i.e., after step S103 is executed each time, step S102 is returned to execution based on the collage image adjusted this time.

An exemplary embodiment of step S103 will be described below in conjunction with fig. 2.

Referring to fig. 2, in step S1031, the obtained aesthetic features are input into the image collage model, and a collage adjustment action sequence to be executed in the current prediction is obtained.

As an example, the image collage model may be constructed based on a reinforcement learning algorithm.

As an example, before performing step S1031 this time, the aesthetic features obtained each time step S102 is performed may be input into the image collage model, resulting in the collage adjustment action sequence that needs to be performed, which is predicted this time. As an example, the state value s at the current time t can be set_t＝{o₀,o₁,…,o_tThe image collage model is input to decide which collage adjustment action or actions need to be performed next by the image collage model (i.e. agent) depending on the current state. Here, o_tRepresents the observed value at the current time t, i.e. the aesthetic characteristics of the collage image obtained at the current time t, i.e. the aesthetic characteristics of the collage image to be adjusted currently obtained by performing step S102 this time, and accordingly, o_t-1The aesthetic characteristics of the collage image obtained for the last time step S102 was performed.

As an example, the action space may be composed of a series of predefined actions, and the overall layout and/or local location details of the collage may be adjusted according to the actual scene.

By way of example, the types of collage adjustment actions may include: a layout adjustment action and/or a local adjustment action. For example, a tile position adjustment action for a number of images may be considered a layout adjustment action. For example, a collage position adjustment action for a single image may be considered a local adjustment action.

By way of example, the types of layout adjustment actions may include: swapping the positions of the two sub-images and terminating the layout adjustment action.

Specifically, the positions of the two sub-images are exchanged, i.e., the positions of the two sub-images in the collage image are exchanged. The overall layout adjustment is not performed any more by terminating the layout adjustment action.

As an example, the type of local adjustment action comprises at least one of: adjusting the placement position of the sub-images, not adjusting the placement position of the sub-images, rotating the placement angle of the sub-images, not adjusting the placement angle of the sub-images, adjusting the layer sequence of the sub-images, not adjusting the layer sequence of the sub-images.

FIG. 3 illustrates an example of a collage adjustment action, according to an example embodiment of the present disclosure. As shown in fig. 3, adjusting the sub-image placement position may include: the sub-image is moved horizontally/vertically by a short distance.

As an example, a sequence of layout adjustment actions that need to be performed may be predicted, and when the predicted sequence of collage adjustment actions that need to be performed includes a termination of the layout adjustment actions, a sequence of local adjustment actions that need to be performed may be predicted.

As an example, the action space for layout adjustment may include: swap operation and terminate action. The agent performs a swap operation to swap the positions of the two sub-images for faster generation of a better composition. The termination of the layout operation is a trigger of the agent, and when this operation is selected, the agent will stop the global optimization process and not adjust the layout any more. For the detail local adjustment, agent will operate on different sub-images, and the attributes include the layer, short distance relative horizontal/vertical movement, and rotation angle.

In step S1032, the collage image to be adjusted currently is subjected to the collage adjustment action sequence predicted this time, so as to obtain the collage image after the collage adjustment this time.

In step S1033, the tile image adjusted for this tile is cut according to the specified aspect ratio.

As an example, a plurality of candidate views conforming to the specified aspect ratio may be cropped from the collage image after the collage adjustment; inputting the candidate views into an aesthetic scoring network to obtain the estimated aesthetic scores of the candidate views; and taking the candidate view with the highest aesthetic score in the plurality of candidate views as the collage image obtained by the current cutting.

As an example, the aesthetic scoring network can be a lightweight model View pro sales Net (VPN).

In step S1034, it is determined whether to continue adjusting the collage image, wherein when it is determined that the collage image is to continue adjusting, the collage image obtained by the current cropping is taken as the collage image to be currently adjusted, and the execution returns to step S102 to enter the next collage adjustment iteration.

As an example, the purpose of cropping the tile image after this tile adjustment is to adapt an irregularly shaped tile image to a specified aspect ratio canvas. The length-width ratio information is environment information, and feedback is provided to the agent after the collage images are adjusted each time. Specifically, after an agent adjusts a collage image each time (i.e., at each step), a plurality of candidate views are automatically cropped from the adjusted collage image to fit the canvas aspect ratio information. After the candidate view passes through the evaluation network (e.g., the aesthetic scoring network), the one with the highest score will transition to the next iteration. Thus, step S1033 can encourage agent to choose to expand the salient information and suppress the action of canvas blanking each time it interacts with the environment.

It should be understood that the present disclosure does not limit the execution order of step S1034, and may be performed after step S1032, for example.

As an example, whether to continue adjusting the collage image may be determined according to the number of times the predicted collage adjustment action sequence is performed (i.e., the number of times step S1032 is performed), wherein it is determined not to continue adjusting the collage image when the number of times the predicted collage adjustment action sequence is performed reaches a preset threshold. For example, the preset threshold may be 6.

It should be appreciated that other suitable ways of determining whether to continue adjusting the collage image (i.e., determining whether to end the training round for the training sample) may also be used, as the present disclosure is not limited in this respect.

When it is determined in step S1034 that the collage images are not to be continuously adjusted, step S104 is performed to adjust parameters of the image collage model according to a reward function for performing each predicted collage adjustment sequence, so as to train the image collage model.

In other words, the image collage model is trained by adjusting parameters of the image collage model according to a reward function that each time a predicted sequence of collage adjustment actions is performed before that.

As an example, a reward function is calculated for performing each predicted collage adjustment sequence of actions, i.e. a reward for performing each predicted collage adjustment sequence of actions is obtained by calculating the reward function.

As an example, the reward function for performing each predicted collage adjustment sequence of actions may be calculated by: a reward function for executing the collage adjustment sequence of the present prediction is calculated based on a difference between the aesthetic score of the collage image obtained after executing the collage adjustment sequence of the present prediction and the aesthetic score of the collage image obtained after executing the collage adjustment sequence of the previous prediction.

As an example, the reward function for performing the tth predicted collage adjustment sequence of actions may be calculated based on: t, the difference between the aesthetic score of the collage image obtained after execution of the collage adjustment sequence predicted the tth time, and the aesthetic score of the collage image obtained after execution of the collage adjustment sequence predicted the t-1 st time.

As an example, a reward function r of a sequence of collage adjustment actions that performs a t-th prediction_t(C_t) Can be calculated by the following formula:

r_t(C_t)＝r′_t(C_t)-0.01*(t+1)，

wherein r'_t(C_t) Represents: the difference between the aesthetic score of the collage image obtained after performing the t-th predicted collage adjustment sequence of actions and the aesthetic score of the collage image obtained after performing the t-1 st predicted collage adjustment sequence of actions.

As an example, the aesthetic score of a collage image may be calculated based on the number of aesthetic boxes of the collage image and/or the blank lost area of the collage image.

The image collage model aims to automatically adjust the original collage image to generate a collage image which is most in line with the mass aesthetics. Therefore, the reward function should enable the agent to obtain reward information each time the agent is adjusted, the reward function is calculated to give sufficient feedback information to the agent, and finally reward point accumulation can generate an adjusting mode with more aesthetic feeling.

The aesthetic scoring network of the present disclosure can use a lightweight model View pro posal Net (VPN) to aesthetically score the tiled image. In order to accelerate model convergence and ensure learning stability, the method can be used for actual trainingUse collage C_tNumber of aesthetic Frames s_a(C_t) As a supervisory signal and add collage C_tBlank loss s_b(C_t) To increase the compactness of the picture content. The suggested number of aesthetic boxes (aesthetic box number is the middle output of the aesthetic scoring network, aesthetic score is the final output of the aesthetic scoring network) can be used as a simplified version of agent supervision during training to ensure stability:

s_t(C_t)＝λ_as_a(C_t)-λ_bs_b(C_t)

each time the predicted collage adjustment action sequence is executed, the return for this action may be calculated using the difference between the newly adjusted collage image and the aesthetic score of the collage image from the last execution of the predicted collage adjustment action sequence:

r′_t(C_t)＝s_t+1(C_t+1)-s_t(C_t)

finally, because the rewarding scheme implicitly considers the number of steps as a cost, agent should follow a greedy strategy to avoid unnecessary actions, thereby reducing the cumulative utility and speeding up the collage optimization process:

r_t(C_t)＝r′_t(C_t)-0.01*(t+1)

where t represents the number of steps taken by an agent since the beginning to help speed up collage generation. As the number of agent operations increases, this constraint will reduce the overall rate of return.

As an example, the reward function of each time may be calculated after the step S1032 is performed.

As an example, the network structure of the image collage model may include: policy network theta_pAnd value network theta_v. Policy network theta_pAnd value network theta_vMay follow the abstract actor-criticc (A2C) algorithm. Both networks use the state s at the current time t^tAs input, the value network output value represents at state s^tExpectation of expected total reward later to measure current stateHow good or bad. So that the prize total R at a certain time t^tIs shown as

Wherein V(s)^t ^+k) Representing the value network's estimate of the prize total at time t + k. The attenuation factor is expressed by gamma, the closer the time t is to R^tThe greater the contribution. Gradient of the value network of

Value network awards an estimate V(s) for time t^(t)) To be followed by the actual prize sum R at the time t^tAnd the difference between the two is used as loss and is updated along the loss gradient as close as possible, so that the estimation network is more accurate.

The output of the strategy network is the collage image s at the current time t^tTaking the probability distribution of corresponding adjustment action under the state, wherein the optimal action, namely the action with the maximum probability distribution, can be used for collaging the image s at present^tThe largest subsequent reward total estimate achieved in a state is the best target operation to achieve collage results with higher aesthetic quality scores. The policy network gradient update may follow the following equation:

wherein A (a)^(t)，s^(t))＝R^(t)-V(s^(t)) Representing the actual subsequent prize sum R at time t^tWith the value network time t reward estimate V(s)^(t)) A difference of (a) (. pi. (a))^(t)|s^(t)) Representing the action probability distribution of the policy network output.

Further, it should be understood that the image collage model may be trained using multiple samples.

FIG. 4 illustrates a flow chart for obtaining aesthetic characteristics of a collage image currently to be adjusted, according to an exemplary embodiment of the present disclosure.

As shown in fig. 4, in step S1021, the tile image to be adjusted currently is input into the aesthetic feature generation network, and the aesthetic features of the plurality of image blocks sampled out-of-order from the tile image to be adjusted currently are obtained.

As an example, in order for the aesthetic feature generation network to capture both global and local detail information, the entire collage image and a plurality of randomly cut image patches in the collage image may be input into the aesthetic feature generation network.

It is to be understood that the image blocks are different from the sub-images and that the image blocks may comprise part or all of at least one sub-image.

As an example, the aesthetic feature generation network may be a deep neural network.

The present disclosure considers that since one collage image is composed of a plurality of sub-images, significant information loss may result if aesthetic features are extracted directly from the overall collage image, and significant local detail loss results. Thus, the present disclosure uses information of multiple tiles to represent aesthetic information of a collage image.

As an example, the aesthetic feature generation network may be pre-trained on a cpc (compatible Photo composition) aesthetic data set in advance, and then used to extract the aesthetic features for the single-step generated collage code. In general, the aesthetic feature generation network represents a collage with a plurality of patches, and extracts general aesthetic features from the image blocks sampled out of order, aiming at exploring comprehensive aesthetic properties between images.

In step S1022, the obtained aesthetic features of the plurality of image blocks are subjected to a fusion process to obtain the aesthetic features of the collage image to be currently adjusted.

Here, the aesthetic features of the collage image currently to be adjusted are used to characterize the image element aesthetic features and composition aesthetic features of the collage image currently to be adjusted.

As an example, the image element aesthetic features may characterize at least one of the following aesthetic content of the image: harmonious color, good light, balanced elements and good content. It should be understood that other aesthetic aspects of the image elements may also be characterized, and the present disclosure is not limited thereto.

As an example, the composition aesthetic feature may characterize at least one of the following aesthetic content of the image: composition matching between adjacent sub-images (e.g., color matching between adjacent sub-images can be included, etc.), highlighting important sub-images. It should be understood that other compositional aesthetic aspects may also be characterized, and the present disclosure is not limited thereto.

As an example, the aesthetic features of the plurality of image blocks may be fused based on the distance between the image block and the center of the collage image to be currently adjusted and/or the area of the image block, so as to obtain the aesthetic features of the collage image to be currently adjusted.

As an example, the obtained aesthetic features of the plurality of image blocks may be subjected to a fusion process through an attention mechanism to obtain the aesthetic features of the collage image to be currently adjusted.

As an example, the obtained aesthetic features of the plurality of image blocks may be subjected to a fusion process through an attention mechanism for composition aesthetics to obtain the aesthetic features of the collage image to be adjusted currently.

As an example, the aesthetic features of the plurality of image blocks may be weighted based on the euclidean distance between the image block and the center of the collage image to be currently adjusted, and the ratio of the area of the image block to the area of the collage image to be currently adjusted, so as to obtain the aesthetic features of the collage image to be currently adjusted.

As an example, the aesthetic characteristics f (c) of the collage image currently to be adjusted may be expressed as: f (C) · α (C) · f (C),

wherein α (C) ═ α₁，α₂，...，α_n]，

Wherein when

When the temperature of the water is higher than the set temperature,

is 1, otherwise

Is a non-volatile organic compound (I) with a value of 0,

wherein C represents the collage image to be adjusted currently, alpha (C) represents a weight matrix, f (C) represents an image block aesthetic feature matrix combining composition aesthetics, l_iRepresenting the Euclidean distance, P, of the ith image block from the center of the collage image to be adjusted currently_iRepresenting the ith image block, s (P), of the plurality of image blocks_i) Representing the area of the ith image block, s (C) representing the area of the collage image to be adjusted currently, eta representing the area proportionality coefficient, y_i，x_i) Denotes the center coordinates of the ith image block, (h)_c，w_c) Representing the size, h, of the collage image currently to be adjusted_cIndicating the height, w, of the collage image currently to be adjusted_cIndicating the width of the collage image currently to be adjusted, (y)_c，x_c) Representing the center coordinates of the collage image currently to be adjusted,

representing the aesthetic characteristics of the ith image block. That is, the collage image currently to be adjusted may be finally encoded as f (c) input to the image collage model.

In order to effectively perform information fusion on unordered blocks in a collage image, an attention-based feature fusion mode is provided so as to learn common image aesthetic features and typical composition aesthetic features in the collage at the same time. Since the most interesting parts of the collage image are the composition and color collocation of the constituent sub-images, the present disclosure shifts more attention to the composition between adjacent sub-images, rather than to the parts of a single sub-image. As described above, the selection criterion using the image block feature having the area ratio larger than η is set to find better composition quality and more harmonious content position.

An attention mechanism assigns dynamic weights to selected image block features in order to efficiently learn complex structural features of collaged images. The present disclosure uses the central laws in photography to give special attention to image patches near the center of a collage image in order to highlight important images, further improving aesthetic quality. The method uses Euclidean distance and area factor comprehensive weighting characteristics, and introduces an attention mechanism in the multi-image block characteristic fusion process to transfer more focuses to the center, wherein the expression form of the attention mechanism is as follows:

the technical scheme of the present disclosure is proposed in consideration of the following technical problems:

(1) because high-quality image collage requires certain professional experience, a large amount of annotation data is difficult to collect;

(2) the effect of image collage is difficult to measure, and how to automatically generate collage images which accord with the visual aesthetic of the masses is difficult; the result obtained by the layout optimization by the traditional method does not consider whether the collocation among the images is coordinated or not and whether the overall picture composition of the collage is beautiful or not, only meets the simple optimization on indexes, and does not have the design standard which is in line with the mass aesthetics;

(3) because of depending on the manually defined characteristics, the method can not provide enough characteristic representation for the complex collage image, thereby greatly limiting the image collage quality;

(4) the number of functional modules involved in a collage is large, and how to adjust and integrate the functional modules to optimize the effect is also a challenge.

According to the technical scheme, the automatic collage of the images is generated in three steps, modeling is carried out through reinforcement learning, the collage is decomposed into a series of interpretable operation steps, collage generation is simulated into a sequence decision process for adjusting global layout and placement position, rotation deflection angle and layer sequence, a deep collage aesthetic network is designed for feature extraction and evaluation of the collage, collage images under a specific length-width ratio can be generated finally, and the quality of image collage results is improved.

According to the technical scheme of the disclosure, a collage with a central attention-based or with expandable aesthetic rules can be generated; a deep aesthetic network and attention mechanism fusion are designed, common aesthetic features and composition aesthetic features are extracted for splicing, and richer high-level aesthetic features can be captured compared with manual definition; the method is characterized in that the collage image generated by using the reinforcement learning method is more interpretable, the collage is modeled into a reinforcement learning process of serialization decision, and the collage is divided into interpretable operation steps through layout generation and detail adjustment.

Fig. 5 illustrates a flowchart of an image collage method according to an exemplary embodiment of the present disclosure.

Referring to fig. 5, in step S201, a plurality of sub-images to be collaged are obtained, and the plurality of sub-images to be collaged are collaged into one image, so as to obtain an initial collage image to be adjusted.

As an example, the plurality of sub-images to be tiled may be a plurality of individual images, or may be a plurality of key frame sequences in the same video.

In step S202, the aesthetic features of the collage image currently to be adjusted are acquired.

In step S203, the obtained aesthetic features are input into the image collage model, a predicted collage adjustment action sequence to be executed is obtained, the predicted collage adjustment action sequence is executed on the collage image to be adjusted currently, and the step S202 is executed in a return mode. Specifically, when it is determined to continue adjusting the collage image, execution returns to step S202.

The image collage model is trained by: adjusting parameters of the image collage model based on a reward function that performs a sequence of collage adjustment actions for each prediction of the image collage model for a training sample.

In step S204, when it is determined that the collage image is not to be continuously adjusted, the adjusted collage image is output.

In other words, step S202 and step S203 are iteratively executed until it is determined that the adjustment of the collage image is not to be continued, i.e., after step S203 is executed each time, the execution of step S202 is returned based on the collage image adjusted this time.

An exemplary embodiment of step S203 will be described below in conjunction with fig. 6.

Referring to fig. 6, in step S2031, the obtained aesthetic features are input into the image collage model, and a collage adjustment action sequence to be executed in the current prediction is obtained.

In step S2032, the collage image to be adjusted currently is subjected to the collage adjustment sequence of this prediction to obtain a collage image after this collage adjustment.

In step S2033, the collage image after this collage adjustment is cut out according to the aspect ratio specified by the user.

In step S2034, it is determined whether to continue adjusting the collage image, wherein when it is determined that the collage image is to continue adjusting, the collage image obtained by this cropping is taken as the collage image to be currently adjusted, and execution returns to step S202.

When it is determined in step S2034 that the collage image is not to be continuously adjusted, step S204 is performed, wherein step S204 may include: step S2041, outputting the collage image obtained by the cutting.

As an example, the collage image currently to be adjusted may be input into an aesthetic feature generation network, resulting in aesthetic features of a plurality of image blocks sampled out-of-order from the collage image currently to be adjusted; and performing fusion processing on the obtained aesthetic features of the plurality of image blocks to obtain the aesthetic features of the collage image to be adjusted currently, wherein the aesthetic features of the collage image to be adjusted currently are used for representing the image element aesthetic features and composition aesthetic features of the collage image to be adjusted currently.

wherein α (C) ═ α₁，α₂，...，α_n]，

Wherein when

When the temperature of the water is higher than the set temperature,

is 1, otherwise

Is a non-volatile organic compound (I) with a value of 0,

representing the aesthetic characteristics of the ith image block.

By way of example, the types of collage adjustment actions may include: a layout adjustment action and/or a local adjustment action.

As an example, the type of local adjustment action may include at least one of: adjusting the placement position of the sub-images, not adjusting the placement position of the sub-images, rotating the placement angle of the sub-images, not adjusting the placement angle of the sub-images, adjusting the layer sequence of the sub-images, not adjusting the layer sequence of the sub-images.

As an example, a plurality of candidate views that conform to the aspect ratio specified by the user may be cropped from the collage image after the collage adjustment; inputting the candidate views into an aesthetic scoring network to obtain the estimated aesthetic scores of the candidate views; and taking the candidate view with the highest aesthetic score in the plurality of candidate views as the collage image obtained by the current cutting.

As an example, the image collage model may be trained using a training method as described in the exemplary embodiments above.

The specific processing in the image collage method according to the exemplary embodiment of the present disclosure has been described in detail in the embodiment of the above-described related training method of the image collage model, and will not be explained in detail here.

Fig. 7 illustrates a block diagram of a training apparatus of an image collage model according to an exemplary embodiment of the present disclosure.

Referring to fig. 7, the training apparatus 10 of an image collage model according to an exemplary embodiment of the present disclosure includes: an initial collage unit 101, an aesthetic feature acquisition unit 102, a prediction adjustment unit 103, and a training unit 104.

Specifically, the initial collage unit 101 is configured to obtain a training sample including a plurality of sub-images, and collage the plurality of sub-images into one image to obtain an initial collage image to be adjusted.

The aesthetic feature acquisition unit 102 is configured to acquire an aesthetic feature of a collage image currently to be adjusted.

The prediction adjustment unit 103 is configured to input the acquired aesthetic features into the image collage model, obtain a predicted collage adjustment action sequence to be performed, perform the predicted collage adjustment action sequence on the collage image currently to be adjusted, and return to acquiring the aesthetic features of the collage image currently to be adjusted by the aesthetic feature acquisition unit 102.

The training unit 104 is configured to adjust parameters of the image collage model according to a reward function that performs a sequence of predicted collage adjustment actions to train the image collage model when it is determined not to continue adjusting the collage image.

As an example, the prediction adjustment unit 103 may include: a prediction unit (not shown), a collage adjustment unit (not shown), a cropping unit (not shown), an adjustment end determination unit (not shown).

Specifically, the prediction unit is configured to input the obtained aesthetic features into the image collage model, and obtain the collage adjustment action sequence needing to be executed in the prediction.

The collage adjustment unit is configured to execute the collage adjustment action sequence of the current prediction on the collage image to be adjusted currently to obtain the collage image after the current collage adjustment.

The cropping unit is configured to crop the tile image adjusted for the tile at a specified aspect ratio.

The adjustment end determination unit is configured to determine whether to continue adjusting the collage image, wherein when it is determined to continue adjusting the collage image, the aesthetic feature acquisition unit 102 takes the collage image cut this time as the collage image currently to be adjusted, and acquires the aesthetic feature of the collage image currently to be adjusted.

As an example, the aesthetic feature obtaining unit 102 may be configured to input the collage image currently to be adjusted into the aesthetic feature generation network, resulting in the aesthetic features of the plurality of image blocks sampled out-of-order from the collage image currently to be adjusted; and performing fusion processing on the obtained aesthetic features of the plurality of image blocks to obtain the aesthetic features of the collage image to be adjusted currently, wherein the aesthetic features of the collage image to be adjusted currently are used for representing the image element aesthetic features and composition aesthetic features of the collage image to be adjusted currently.

As an example, the aesthetic feature obtaining unit 102 may be configured to perform a fusion process on the aesthetic features of the plurality of image blocks based on the distances of the image blocks from the center of the collage image to be currently adjusted and/or the areas of the image blocks to obtain the aesthetic features of the collage image to be currently adjusted.

As an example, the aesthetic feature obtaining unit 102 may be configured to perform a fusion process on the obtained aesthetic features of the plurality of image blocks through an attention mechanism for composition aesthetics to obtain the aesthetic features of the collage image to be currently adjusted.

By way of example, the types of collage adjustment actions may include: a layout adjustment action and/or a local adjustment action; wherein the types of layout adjustment actions include: exchanging the positions of the two sub-images and terminating the layout adjustment action; wherein the type of local adjustment action may comprise at least one of: adjusting the placement position of the sub-images, not adjusting the placement position of the sub-images, rotating the placement angle of the sub-images, not adjusting the placement angle of the sub-images, adjusting the layer sequence of the sub-images, not adjusting the layer sequence of the sub-images.

As an example, the cropping unit may be configured to crop out a plurality of candidate views conforming to the specified aspect ratio from the collage image after the collage adjustment of this time; inputting the candidate views into an aesthetic scoring network to obtain the estimated aesthetic scores of the candidate views; and taking the candidate view with the highest aesthetic score in the plurality of candidate views as the collage image obtained by the current cutting.

As an example, the aesthetic feature obtaining unit 102 may be configured to perform weighting processing on the aesthetic features of the plurality of image blocks based on euclidean distances of the image blocks from the center of the collage image to be currently adjusted and ratios of areas of the image blocks to the area of the collage image to be currently adjusted, so as to obtain the aesthetic features of the collage image to be currently adjusted.

Fig. 8 illustrates a block diagram of a configuration of an image collage apparatus according to an exemplary embodiment of the present disclosure.

Referring to fig. 8, the image collage apparatus 20 according to an exemplary embodiment of the present disclosure includes: an initial collage unit 201, an aesthetic feature acquisition unit 202, a prediction adjustment unit 203, and an output unit 204.

Specifically, the initial collage unit 201 is configured to acquire a plurality of sub-images to be collaged and collage the plurality of sub-images to be collaged into one image to obtain an initial collage image to be adjusted.

The aesthetic feature acquisition unit 202 is configured to acquire an aesthetic feature of a collage image currently to be adjusted.

The prediction adjustment unit 203 is configured to input the acquired aesthetic features into the image collage model, obtain a predicted collage adjustment action sequence to be performed, perform the predicted collage adjustment action sequence on the collage image currently to be adjusted, and return to acquiring the aesthetic features of the collage image currently to be adjusted by the aesthetic feature acquisition unit 202.

The output unit 204 is configured to output the adjusted collage image when it is determined that the collage image is not to be continuously adjusted.

As an example, the prediction adjusting unit 203 may include: a prediction unit (not shown), a collage adjustment unit (not shown), a cropping unit (not shown), an adjustment end determination unit (not shown).

The cropping unit is configured to crop the collage image after the collage adjustment according to the aspect ratio specified by the user.

The adjustment end determination unit is configured to determine whether to continue adjusting the collage image, wherein when it is determined to continue adjusting the collage image, the aesthetic feature acquisition unit 202 takes the collage image cut this time as the collage image currently to be adjusted, and acquires the aesthetic feature of the collage image currently to be adjusted.

The output unit 204 may be configured to output the collage image resulting from the present cropping when it is determined that the collage image is not to be continuously adjusted.

As an example, the aesthetic feature obtaining unit 202 may be configured to input the collage image currently to be adjusted into the aesthetic feature generation network, resulting in the aesthetic features of the plurality of image blocks sampled out-of-order from the collage image currently to be adjusted; and performing fusion processing on the obtained aesthetic features of the plurality of image blocks to obtain the aesthetic features of the collage image to be adjusted currently, wherein the aesthetic features of the collage image to be adjusted currently are used for representing the image element aesthetic features and composition aesthetic features of the collage image to be adjusted currently.

As an example, the aesthetic feature obtaining unit 202 may be configured to perform a fusion process on the aesthetic features of the plurality of image blocks based on the distances of the image blocks from the center of the collage image to be currently adjusted and/or the areas of the image blocks to obtain the aesthetic features of the collage image to be currently adjusted.

As an example, the aesthetic feature obtaining unit 202 may be configured to perform a fusion process on the obtained aesthetic features of the plurality of image blocks through an attention mechanism for composition aesthetics to obtain the aesthetic features of the collage image to be currently adjusted.

As an example, the clipping unit may be configured to: cutting out a plurality of candidate views which accord with the length-width ratio specified by a user from the collage image after the collage adjustment; inputting the candidate views into an aesthetic scoring network to obtain the estimated aesthetic scores of the candidate views; and taking the candidate view with the highest aesthetic score in the plurality of candidate views as the collage image obtained by the current cutting.

As an example, the aesthetic feature obtaining unit 202 may be configured to perform weighting processing on the aesthetic features of the plurality of image blocks based on euclidean distances of the image blocks from the center of the collage image to be currently adjusted and ratios of areas of the image blocks to the area of the collage image to be currently adjusted, so as to obtain the aesthetic features of the collage image to be currently adjusted.

As an example, the image collage model may be trained using the training apparatus 10 as described above.

With regard to the apparatus in the above-described embodiment, the specific manner in which the respective units perform operations has been described in detail in the embodiment related to the method, and will not be elaborated upon here.

Further, it should be understood that the respective units in the training apparatus 10 and the image collage apparatus 20 of the image collage model according to the exemplary embodiments of the present disclosure may be implemented as hardware components and/or software components. The individual units may be implemented, for example, using Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs), depending on the processing performed by the individual units as defined by the skilled person.

Fig. 9 illustrates a block diagram of an electronic device according to an exemplary embodiment of the present disclosure. Referring to fig. 9, the electronic device 30 includes: at least one memory 301 and at least one processor 302, the at least one memory 301 having stored therein a set of computer-executable instructions that, when executed by the at least one processor 302, perform a method of training an image collage model and/or a method of image collage as described in the above exemplary embodiments.

By way of example, the electronic device 30 may be a PC computer, tablet device, personal digital assistant, smart phone, or other device capable of executing the set of instructions described above. Here, the electronic device 30 need not be a single electronic device, but can be any collection of devices or circuits that can execute the above instructions (or sets of instructions) individually or in combination. The electronic device 30 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces with local or remote (e.g., via wireless transmission).

In the electronic device 30, the processor 302 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processor 302 may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.

The processor 302 may execute instructions or code stored in the memory 301, wherein the memory 301 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.

The memory 301 may be integrated with the processor 302, for example, by having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, memory 301 may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory 301 and the processor 302 may be operatively coupled or may communicate with each other, e.g., through I/O ports, network connections, etc., such that the processor 302 is able to read files stored in the memory.

In addition, the electronic device 30 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 30 may be connected to each other via a bus and/or a network.

According to an exemplary embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the image collage model training method and/or the image collage method as described in the above exemplary embodiments. Examples of the computer-readable storage medium herein include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD + R, CD-RW, CD + RW, DVD-ROM, DVD-R, DVD + R, DVD-RW, DVD + RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or compact disc memory, Hard Disk Drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card or a extreme digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a magnetic tape, a magneto-optical data storage device, a hard disk, a magnetic tape, a magnetic data storage device, a magnetic tape, a magnetic data storage device, a magnetic tape, a magnetic data storage device, a magnetic tape, a magnetic data storage device, a magnetic tape, a magnetic data storage device, A solid state disk, and any other device configured to store and provide a computer program and any associated data, data files, and data structures to a processor or computer in a non-transitory manner such that the processor or computer can execute the computer program. The computer program in the computer-readable storage medium described above can be run in an environment deployed in a computer apparatus, such as a client, a host, a proxy device, a server, and the like, and further, in one example, the computer program and any associated data, data files, and data structures are distributed across a networked computer system such that the computer program and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by one or more processors or computers.

According to an exemplary embodiment of the present disclosure, a computer program product may also be provided, in which instructions are executable by at least one processor to perform a training method of an image collage model and/or an image collage method as described in the above exemplary embodiments.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method of training an image collage model, the method comprising:

acquiring a training sample comprising a plurality of subimages, and collaging the subimages into an image to obtain an initial collage image to be adjusted;

obtaining the aesthetic characteristics of the collage image to be adjusted currently;

inputting the obtained aesthetic features into the image collage model to obtain a predicted collage adjustment action sequence to be executed, executing the predicted collage adjustment action sequence on the collage image to be adjusted currently, and returning to the step of obtaining the aesthetic features of the collage image to be adjusted currently;

when it is determined that the collage image is not to be continually adjusted, parameters of the image collage model are adjusted according to a reward function that performs a sequence of predicted collage adjustment actions to train the image collage model.

2. The training method according to claim 1, wherein the step of inputting the obtained aesthetic features into the image collage model to obtain a predicted collage adjustment action sequence to be executed, executing the predicted collage adjustment action sequence on the collage image to be currently adjusted, and returning to the step of obtaining the aesthetic features of the collage image to be currently adjusted comprises:

inputting the obtained aesthetic characteristics into the image collage model to obtain a collage adjustment action sequence needing to be executed in the prediction;

executing the collage adjusting action sequence of the current prediction on the collage image to be adjusted currently to obtain the collage image after the collage adjustment;

cutting the collage image after the collage adjustment according to the specified length-width ratio;

and determining whether to continuously adjust the collage image, wherein when the collage image is determined to be continuously adjusted, the collage image obtained by cutting at this time is used as the collage image to be currently adjusted, and the step of obtaining the aesthetic characteristics of the collage image to be currently adjusted is returned to be executed.

3. Training method according to claim 1, wherein the step of obtaining the aesthetic features of the collage image currently to be adjusted comprises:

inputting the collage image to be adjusted into an aesthetic feature generation network to obtain the aesthetic features of a plurality of image blocks sampled out of order from the collage image to be adjusted;

and performing fusion processing on the obtained aesthetic features of the plurality of image blocks to obtain the aesthetic features of the collage image to be adjusted currently, wherein the aesthetic features of the collage image to be adjusted currently are used for representing the image element aesthetic features and composition aesthetic features of the collage image to be adjusted currently.

4. The training method according to claim 3, wherein the step of performing a fusion process on the obtained aesthetic features of the image blocks to obtain the aesthetic features of the collage image to be currently adjusted comprises:

and performing fusion processing on the aesthetic features of the image blocks based on the distance between the image block and the center of the collage image to be adjusted currently and/or the area of the image block to obtain the aesthetic features of the collage image to be adjusted currently.

5. An image collage method, characterized in that it comprises:

acquiring a plurality of subimages to be collaged, and collaging the subimages to be collaged into one image to obtain an initial collage image to be adjusted;

inputting the obtained aesthetic features into an image collage model to obtain a predicted collage adjustment action sequence to be executed, executing the predicted collage adjustment action sequence on the collage image to be adjusted currently, and returning to the step of obtaining the aesthetic features of the collage image to be adjusted currently;

outputting the adjusted collage image when it is determined that the collage image is not to be adjusted continuously,

wherein the image collage model is trained by: adjusting parameters of the image collage model based on a reward function that performs a sequence of collage adjustment actions for each prediction of the image collage model for a training sample.

6. Training device for an image collage model, characterized in that it comprises:

an initial collage unit configured to obtain a training sample including a plurality of subimages and collage the plurality of subimages into one image to obtain an initial collage image to be adjusted;

an aesthetic feature acquisition unit configured to acquire an aesthetic feature of a collage image to be currently adjusted;

a prediction adjusting unit configured to input the obtained aesthetic features into the image collage model, obtain a predicted collage adjusting action sequence to be executed, execute the predicted collage adjusting action sequence on the collage image to be adjusted currently, and return to the aesthetic features of the collage image to be adjusted currently obtained by the aesthetic feature obtaining unit;

a training unit configured to adjust parameters of the image collage model according to a reward function that performs a sequence of predicted collage adjustment actions to train the image collage model when it is determined not to continue adjusting the collage image.

7. An image collage apparatus, characterized in that the image collage apparatus comprises:

an initial collage unit configured to acquire a plurality of sub-images to be collaged and collage the plurality of sub-images to be collaged into one image to obtain an initial collage image to be adjusted;

the prediction adjusting unit is configured to input the acquired aesthetic features into the image collage model, obtain a predicted collage adjusting action sequence needing to be executed, execute the predicted collage adjusting action sequence on the collage image to be adjusted currently, and return to the aesthetic features of the collage image to be adjusted currently acquired by the aesthetic feature acquiring unit;

an output unit configured to output the adjusted collage image when it is determined that the collage image is not to be continuously adjusted,

8. An electronic device, comprising:

at least one processor;

at least one memory storing computer-executable instructions,

wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform a training method of an image collage model according to any one of claims 1 to 4 and/or an image collage method according to claim 5.

9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform a method of training an image collage model according to any one of claims 1 to 4 and/or a method of image collage according to claim 5.

10. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by at least one processor, implement a training method of an image collage model according to any one of claims 1 to 4 and/or an image collage method according to claim 5.