CN111415396A - Image generation method and device and storage medium


Info

Publication number
CN111415396A
Authority
CN
China
Prior art keywords
text
candidate
saliency map
plate
background picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910140793.XA
Other languages
Chinese (zh)
Inventor
赵胜林
李嘉麟
陈锡显
沈小勇
戴宇榮
賈佳亞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Publication of CN111415396A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text

Abstract

The embodiment of the invention discloses an image generation method, an image generation apparatus and a storage medium. The embodiment of the invention can acquire an input text item and screen out a target template from preset candidate text templates according to the text item; typeset the text item according to the target template to generate a text plate; acquire a background picture and extract a saliency map of the background picture; acquire pixel information of the saliency map and determine a target position of the text plate in the background picture according to the pixel information of the saliency map; and typeset the background picture and the text plate according to the target position to generate a target image. The scheme thus improves the overall image effect and raises the degree of automation of image-text image production.

Description

Image generation method and device and storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to an image generation method, an image generation apparatus, and a storage medium.
Background
Posters combining pictures and text are a promotional medium used very frequently in daily life: they can quickly attract viewers' attention and highlight the promoted information. Since designing a poster manually takes a great deal of time and effort, many businesses have started trying to produce posters with design software.
During the research and practice of the prior art, the inventor of the present invention found that current design software essentially works as follows: a user inputs the copy to be displayed and selects a template from a template library provided by the software, and the software then fills the user's copy into the selected template and outputs the finished poster image. If the user is not satisfied with how the copy is laid out in the template, it has to be adjusted manually based on design experience, which consumes a great deal of energy and time. It can be seen that the production of image-text images such as posters is currently poorly automated.
Disclosure of Invention
The embodiment of the invention provides an image generation method, an image generation device and a storage medium, and aims to solve the technical problem of poor automation degree of image-text image production.
The embodiment of the invention provides an image generation method, which comprises the following steps:
acquiring an input text item, and screening a target template from preset candidate text templates according to the text item;
typesetting the text items according to the target template to generate text plates;
acquiring a background picture, and extracting a saliency map of the background picture;
acquiring pixel information of the saliency map, and determining the target position of the text plate in the background picture according to the pixel information of the saliency map;
and typesetting the background picture and the text plate according to the target position to generate a target image.
In some embodiments, there are a plurality of candidate text templates, and the screening a target template from preset candidate text templates according to the text items includes:
obtaining a plurality of sample texts contained in each candidate text template;
and respectively calculating the distance between the text item and each sample text, and selecting a target template from the candidate text templates according to the distance.
In some embodiments, the sample text includes text entries in multiple dimensions, and calculating the distance between the text entry and the sample text includes:
acquiring text entries of the text items and the sample text in each dimension, and respectively calculating the distance between the text entries in each dimension;
and calculating the distance between the text item and the sample text according to a preset penalty factor and the distance between the text items of each dimension.
In some embodiments, the determining the target position of the text plate in the background picture according to the pixel information of the saliency map comprises:
calculating the balance energy value of the text plate when the text plate is positioned at each preset candidate position in the background picture according to the pixel information of the saliency map;
and screening out the target position of the text plate in the background picture from the candidate positions according to the balance energy value.
In some embodiments, calculating, according to the pixel information of the saliency map, a balance energy value of the text plate when the text plate is located at a preset candidate position in the background picture includes:
acquiring pixel information of the text plate when the text plate is positioned at the candidate position;
calculating to obtain an energy parameter according to the pixel information of the saliency map and the text plate;
and calculating to obtain a balance energy value of the text plate when the text plate is positioned at the candidate position according to a preset weight and the energy parameter.
In some embodiments, the energy parameter includes a collision parameter, and the calculating an energy parameter according to the pixel information of the saliency map and the text plate includes:
according to the pixel information of the saliency map and the text plate, counting overlapped pixels of the saliency map and the text plate;
and calculating the conflict value of the saliency map and the text plate according to the overlapped pixels to obtain a conflict parameter.
In some embodiments, the energy parameter includes a margin parameter, and the calculating the energy parameter according to the pixel information of the saliency map and the text plate includes:
counting the blank pixels in the saliency map and the text plate according to the pixel information of the saliency map and the text plate;
and calculating the blank leaving rate according to the blank leaving pixels to obtain the blank leaving parameters.
In some embodiments, the calculating the energy parameter according to the pixel information of the saliency map and the text plate includes:
calculating the deviation value of the saliency map and the text plate in the preset direction according to the pixel information of the saliency map and the text plate;
and calculating to obtain an alignment parameter according to a preset coefficient corresponding to the preset direction and the deviation value.
In some embodiments, the calculating, according to the preset weight and the energy parameter, a balanced energy value of the text plate at the candidate position includes:
acquiring central position information of the background picture and the text plate, and calculating to obtain a central parameter according to the central position information;
and calculating to obtain the balance energy value of the text plate when the text plate is positioned at the candidate position according to the preset weight, the energy parameter and the central parameter.
In some embodiments, the screening out a target template from preset candidate text templates according to the text items comprises:
and obtaining subject information, and screening candidate text templates from a preset template library according to the subject information.
In some embodiments, the obtaining the background picture comprises:
acquiring keyword information, and screening a plurality of candidate background pictures in a preset background library according to the keyword information;
obtaining description texts corresponding to the candidate background images, and respectively calculating the similarity between each description text and the text item;
and selecting a background picture from the candidate background pictures according to the similarity.
In some embodiments, the selecting a background picture from the candidate background pictures according to the similarity includes:
acquiring performance values corresponding to the candidate background pictures;
respectively calculating the total score of each candidate background image according to the performance value and the similarity;
and selecting a background picture from the candidate background pictures according to the total score of each candidate background picture.
An embodiment of the present invention further provides an image generating apparatus, including:
the template unit is used for acquiring input text items and screening out a target template from preset candidate text templates according to the text items;
the plate unit is used for typesetting the text items according to the target template to generate a text plate;
the salient image unit is used for acquiring a background image and extracting a salient image of the background image;
the position unit is used for acquiring the pixel information of the saliency map and determining the target position of the text plate in the background picture according to the pixel information of the saliency map;
and the image unit is used for typesetting the background picture and the text plate according to the target position to generate a target image.
The embodiment of the present invention further provides a storage medium, where a plurality of instructions are stored, and the instructions are suitable for being loaded by a processor to execute the steps in the image generation method provided in any embodiment of the present invention.
According to the embodiment of the invention, the input text item is obtained, the target template is screened out from the preset candidate text templates according to the text item, and the matching degree of the obtained target template and the text item is higher; then, typesetting the text items according to the target template to generate text plates; acquiring a background picture, and extracting a saliency map of the background picture; then, acquiring pixel information of the saliency map, and determining the target position of the text plate in the background picture according to the pixel information of the saliency map; and typesetting the background picture and the text plate according to the target position to generate a target image. According to the scheme, a proper text template is correspondingly screened for each input text item, so that a target template which is most matched with the text item is obtained, and personalized pattern typesetting is realized. And the target position of the text plate is determined according to the saliency map in the background picture, so that the text plate and the background picture are typeset, the text plate and the saliency map are balanced or reach an optimal fusion state in the background picture, the visual balance of the target image is realized, and the integral aesthetics of the image typesetting is improved. Therefore, the scheme realizes the automatic typesetting of the document contents and the document positions and the automatic generation of the image-text images, does not need manual setting or adjustment of a user, saves the time and energy of the user, and improves the automation degree of image-text image production while improving the overall effect of the images.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1a is a schematic view of a scene of an information interaction system according to an embodiment of the present invention;
FIG. 1b is a schematic flow chart of an image generation method according to an embodiment of the present invention;
FIG. 1c is a diagram of entering text items on a copy page according to an embodiment of the present invention;
FIG. 1d is a schematic diagram of a target image according to an embodiment of the present invention.
FIG. 2a is a schematic flow chart of another image generation method provided by the embodiment of the invention;
FIG. 2b is a schematic diagram of subject information input provided by an embodiment of the present invention;
FIG. 3a is a schematic flow chart of an application scenario for target image creation according to an embodiment of the present invention;
FIG. 3b is a schematic diagram of a target template provided by an embodiment of the present invention;
FIG. 3c is a diagram of a text tile provided by an embodiment of the present invention;
FIG. 3d is a schematic diagram of a background picture according to an embodiment of the present invention;
FIG. 3e is a schematic diagram of a target image provided by an embodiment of the invention;
FIG. 4a is a schematic structural diagram of an image generating apparatus according to an embodiment of the present invention;
FIG. 4b is a schematic structural diagram of another image generating apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides an image generation method, an image generation device and a storage medium.
The embodiment of the invention provides an information interaction system, which comprises any image generation device provided by the embodiment of the invention, wherein the image generation device can be integrated in equipment such as a server; in addition, the system may also include other devices, such as clients and the like. The client may be a terminal or a Personal Computer (PC) or the like.
Referring to fig. 1a, an embodiment of the present invention provides an information interaction system, which includes an image generation apparatus and a client.
A user uses a client to enter text items in a teletext image, such as a flat advertisement, when the user makes the image. The client transmits the text item input by the user to the image generation device. The image generation device acquires the text items, and screens out a target template from preset candidate text templates according to the text items; and then, typesetting the text items according to the target template to generate text plates. Moreover, the user can upload the background picture to the image generation device by using the client, or select the background picture from candidate background pictures provided by the image generation device. The image generation device acquires a background picture and extracts a saliency map of the background picture.
Then, the image generation device acquires pixel information of the saliency map, and determines the target position of the text plate in the background picture according to the pixel information of the saliency map; and typesetting the background picture and the text plate according to the target position to generate a target image.
According to the scheme, a proper text template is correspondingly screened for each input text item, so that a target template which is most matched with the text item is obtained, and personalized pattern typesetting is realized. And the target position of the text plate is determined according to the saliency map in the background picture, so that the text plate and the background picture are typeset, the text plate and the saliency map are balanced or reach an optimal fusion state in the background picture, the visual balance of the target image is realized, and the integral aesthetics of the image typesetting is improved. Therefore, the scheme realizes automatic typesetting of the document contents and the document positions, does not need manual setting or adjustment of a user, saves the time and energy of the user, and improves the automation degree of poster image production while ensuring the overall effect of the poster.
As shown in fig. 1b, the specific flow of the image generation method may be as follows:
101. and acquiring input text items, and screening a target template from preset candidate text templates according to the text items.
For example, when a user creates a target image such as a flat advertisement, a text item can be input on a document page provided by an image generating apparatus via a device such as a mobile phone, a tablet computer, or a personal computer. Where the text items entered by the user may include text entries in one or more dimensions. The dimension can be the category, the hierarchy, the priority and/or the sequencing of the text entry, and the like, and can be flexibly configured according to the actual requirement.
For example, in fig. 1c, the text item entered by the user on the copy page includes text entries in multiple dimensions, the dimensions being: main copy one, sub-copy one, main copy two and sub-copy two. The entry "Last Winter" fills the "main copy one" dimension, "Anchor: Xiaobei" fills "sub-copy one", "I love a girl" fills "main copy two", and the "sub-copy two" dimension is left empty.
The image generation apparatus acquires a text item input by a user.
Then, the image generation device screens out, from the preset candidate text templates, a target template adapted to the text item based on the number or the content of the texts it contains, so as to achieve an optimal copy layout effect; this is illustrated in detail below.
The candidate text template can be preset or obtained by screening according to the theme of the target image in advance, and can be flexibly configured according to actual needs. There may be one or more candidate text templates, and if there is only one candidate text template, the candidate text template is used as the target template. The present embodiment will be illustrated hereinafter with a plurality of candidate text templates.
The candidate text template includes layout information of the text items, for example: the number of rows arranged, the spacing before the segments, the spacing of the rows, the font size, and/or the alignment, etc. Wherein the number of arranged lines refers to the number of lines of text entries in the text plate arranged in the text entry. The pre-paragraph spacing refers to the number of characters left blank at the beginning of each line of text, e.g., no blank, or 2 or more characters left blank, etc., and the pre-paragraph spacing of each line of text may be the same or different. The line spacing refers to the distance between each line of text, and the distance between different lines may be the same or different. Font refers to the font of each line of text, such as a regular font or a song font, and the font of each line of text may be the same or different. The font size refers to the font size of each line of text, such as four or five, and the size of each line of text may be the same or different. Alignment refers to the alignment of each line of text with other text and/or text blocks, including left alignment, center and/or right alignment, etc.
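As an illustration of how such layout information might be represented (the patent prescribes no concrete data layout, so every field name below is hypothetical), a candidate text template could be modeled along these lines:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TextTemplate:
    """One candidate text template; all field names are illustrative."""
    num_lines: int                      # number of arranged lines
    pre_paragraph_indents: List[int]    # blank characters at the start of each line
    line_spacings: List[int]            # spacing between consecutive lines
    fonts: List[str]                    # font family per line, e.g. "SongTi"
    font_sizes: List[int]               # font size per line
    alignments: List[str]               # "left" | "center" | "right" per line
    sample_texts: List[Dict[str, str]]  # sample texts, each mapping dimension -> entry
```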
In this way, rich text typesetting modes are realized.
For example, in some embodiments, the image generation apparatus filters out a target template matching the text item based on the number of texts in a preset candidate text template. As an embodiment, the image generating apparatus may obtain the number of text entries in the text item, and select a template containing the same number of text entries as the target template from preset candidate text templates. For example, if the number of text entries in a text item is n, the image generation apparatus may extract the number of text entries included in each candidate text template, and select the candidate text template including the number of text entries n as the target template, where n is a positive integer.
In some other embodiments, the step of "filtering out a target template from the preset candidate text templates according to the text items" may include: obtaining a plurality of sample texts contained in each candidate text template; and respectively calculating the distance between the text item and each sample text, and selecting a target template from the candidate text templates according to the distance.
Specifically, the image generation apparatus may acquire sample texts contained in the respective candidate text templates, respectively. Wherein the sample text may contain sample entries in one or more dimensions.
Then, the image generating means calculates the distance between the text item and each sample text, respectively. Wherein the distance between the text item and the sample text is used to describe the degree of similarity between the text item and the sample text.
For example, when the image generation apparatus filters the target template based on the text content contained in the text item, the image generation apparatus may calculate semantic similarity, hamming distance, and the like of the text item and the sample text as the distance therebetween.
In another embodiment, when the image generation apparatus filters the target template based on the number of texts included in the text item, the similarity between the number of texts of the text item and the sample text may be calculated as the distance therebetween. Specifically, taking the sample text of any candidate text template as an example, the step "calculating the distance between the text item and the sample text" may include: acquiring text items of the text items and the sample text in each dimension, and respectively calculating to obtain the distance between the text items in each dimension; and calculating the distance between the text item and the sample text according to the preset penalty factor and the distance between the text items of each dimension.
The image generation device counts the dimension of the text entry in the sample text, and the image generation device counts the dimension of the text entry in the text entry.
Then, the image generation device merges and deduplicates the text entry dimensions contained in the sample text and in the text item to obtain all dimensions contained in either. For example, if the sample text contains the dimensions main copy one, sub-copy one and main copy two, and the text item contains the dimensions main copy one and main copy two, merging yields 2 × main copy one, 1 × sub-copy one and 2 × main copy two, and deduplicating the merged dimensions then yields 1 × main copy one, 1 × sub-copy one and 1 × main copy two. The image generation device thus obtains all dimensions contained in the sample text and the text item: main copy one, sub-copy one and main copy two.
Then, the image generation device respectively acquires text entries of the text items and the sample text in each dimension according to all the obtained dimensions, and calculates the distance between the text entries in each dimension.
For example, the image generation device may count the number of characters, such as the number of chinese characters, the number of english characters, or the number of all characters, in the text entry of each dimension for the text item and the sample text, respectively. And if the text item or the sample text has no text entry in a certain dimension, the number of characters of the text entry in the dimension is zero.
Then, the image generating means calculates the difference in the number of characters of the text item and the text entry of the sample text in the same dimension, and takes the absolute value of the number difference as the text entry distance in the dimension. Thus, the image generation device calculates the text entry distance of each dimension between the text item and the sample text.
Then, the image generation device multiplies the text entry distance of each dimension by the corresponding preset penalty factor to obtain the reference distance of each dimension between the text item and the sample text. The penalty factor may be a constraint function or a weight value; this embodiment takes a weight value as an example. The numerical value of each penalty factor may be configured in advance and may be proportional to the importance degree of the corresponding dimension. For example, if the dimension "main copy one" is more important than the dimension "sub-copy two", the penalty factor corresponding to "main copy one" may be larger than that corresponding to "sub-copy two". The penalty factors thus highlight or weaken the influence of each dimension's text entry distance on the distance between the text item and the sample text, so that the target template screening can be personalized and better suited to customer requirements.
In some embodiments, the image generation device obtains, according to the candidate text template where the sample text is located, the preset penalty factors of each dimension corresponding to that candidate text template; the penalty factors corresponding to the same dimension may differ between candidate text templates. Of course, the image generation device may also detect whether the text entry of any dimension is missing; for example, if the text entry of some dimension is missing from the text item or the sample text, the penalty factor corresponding to that dimension is modified to a preset value, so as to control the influence of the missing text of that dimension.
Then, the image generation device sums the reference distances of the text item and the sample text in each dimension to obtain the distance between the text item and the sample text.
For example, let the text item be text1 and the sample text be text2. The number of characters of the dimension-x text entry is x1 in the text item and x2 in the sample text; the number of characters of the dimension-y text entry is y1 in the text item and y2 in the sample text; the number of characters of the dimension-z text entry is z1 in the text item and z2 in the sample text; ……; and the number of characters of the dimension-w text entry is w1 in the text item and w2 in the sample text.
Let the preset penalty factor of dimension x be ηx, that of dimension y be ηy, that of dimension z be ηz, ……, and that of dimension w be ηw.
The distance D(text1, text2) between text1 and text2 can then be calculated using the following formula:
D(text1,text2)=ηx×|x1-x2|+ηy×|y1-y2|+ηz×|z1-z2|+……+ηw×|w1-w2|
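A minimal sketch of this distance computation, assuming (purely for illustration) that text items and sample texts are dictionaries mapping dimension names to entry strings:

```python
def template_distance(text_item, sample_text, penalty):
    """D(text1, text2): weighted sum of per-dimension character-count
    differences, following the formula above. A missing entry counts as
    zero characters; penalty maps dimension name -> factor (eta)."""
    dimensions = set(text_item) | set(sample_text)  # merge and deduplicate
    return sum(penalty.get(dim, 1.0)
               * abs(len(text_item.get(dim, "")) - len(sample_text.get(dim, "")))
               for dim in dimensions)

# The target template is then the candidate whose sample text lies closest:
# target = min(candidates,
#              key=lambda t: min(template_distance(item, s, t.penalty)
#                                for s in t.sample_texts))
```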
Thus, the image generation device calculates the distance between the text item and each sample text; this distance characterizes how similar the text item and the sample text are in the amount of text they contain. Distance data corresponding to each sample text is thereby obtained.
Then, the image generating device determines the correspondence between each candidate text template and the distance data based on the correspondence between each sample text and each candidate text template and the correspondence between each sample text and the distance data.
After the distance is obtained, the image generation device selects a target template from the candidate text templates according to a preset screening rule. For example, the preset filtering rule may be to select, as the target template, a candidate text template with a smallest or largest distance value from the text item among a plurality of preset candidate text templates.
Thereby, the image generating apparatus obtains the target template.
102. And typesetting the text items according to the target template to generate a text plate.
After the target template is obtained, the image generation device typesets the text items according to the typesetting information contained in the target template to generate text plates which are used for being filled into the background images to generate the target images.
The layout information included in the target template may specifically include: the number of rows arranged, the spacing before the segments, the spacing of the rows, the font size, and/or the alignment, etc.
Take as an example a text item containing text entries in three dimensions: a first text, a second text and a third text. Suppose the sample text of the target template also contains text entries in these three dimensions, that the number of arranged lines is three, and that the font sizes of the first, second and third lines are a first size, a second size and a third size respectively.
The image generation device arranges the three text entries of the text item into rows in order of the preset dimension priorities, obtaining a text module composed of three rows of text. For example, if the dimensions are ordered from high to low priority as first, second and third, and rows are filled from top to bottom, the image generation device divides the texts into three rows in sequence, with the first text in the first row, the second text in the second row and the third text in the third row.
And the image generation device correspondingly sets the font sizes of the first text, the second text and the third text as the first size, the second size and the third size respectively.
It should be noted that the tile herein may be a fixed-size tile, which may be displayed directly on the upper layer of the background picture. Of course, the background color of the text plate can be configured to be transparent in advance, so as to avoid blocking the content in the background picture. Of course, the background color of the text plate may be other preset colors to highlight the text plate.
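For illustration only, a fixed-size transparent plate of this kind could be rendered with Pillow roughly as follows; the line-spacing factor, fill color and font paths are placeholder assumptions, not values taken from the patent:

```python
from PIL import Image, ImageDraw, ImageFont

def render_text_plate(lines, font_paths, font_sizes, size=(600, 300)):
    """Arrange the dimension texts in rows, top to bottom, on a plate
    whose background is fully transparent (alpha = 0)."""
    plate = Image.new("RGBA", size, (0, 0, 0, 0))  # transparent background
    draw = ImageDraw.Draw(plate)
    y = 0
    for text, path, font_size in zip(lines, font_paths, font_sizes):
        font = ImageFont.truetype(path, font_size)
        draw.text((0, y), text, font=font, fill=(255, 255, 255, 255))
        y += int(font_size * 1.5)  # simple fixed line spacing
    return plate
```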
103. And acquiring a background picture, and extracting a saliency map of the background picture.
It should be noted that the execution timing of step 103 may be before or after steps 101 and 102, and of course, may also be executed simultaneously. That is, the image generation apparatus may perform step 103 while, before, or after performing steps 101 and 102.
Wherein the background picture is a background picture of the target image. For example, the user may upload a background picture, or the user may select a background picture from candidate pictures provided by the image generating device.
A saliency map refers to the salient regions in a picture, i.e. the regions a viewer pays attention to. When viewing an image, a viewer automatically processes the regions of interest and selectively ignores the uninteresting regions; these regions of interest to the viewer are called salient regions.
After the image generation device acquires the background picture, a saliency map needs to be extracted.
For example, the saliency map of the background picture can be extracted with a deeply-supervised salient object detection model with short connections (DSS). Alternatively, the background picture can be subjected to saliency analysis through a bottom-up or top-down computational model to capture the salient objects human eyes are likely to notice, and the background picture is then segmented according to the salient objects to obtain a saliency map containing them.
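A minimal sketch of this step, using OpenCV's spectral-residual detector (shipped with opencv-contrib-python) as a stand-in for the short-connection DSS model named above:

```python
import cv2
import numpy as np

def extract_saliency_map(background_path, threshold=0.5):
    """Return a binary saliency map (255 = salient) of the background picture."""
    image = cv2.imread(background_path)
    detector = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, saliency = detector.computeSaliency(image)  # float map in [0, 1]
    if not ok:
        raise RuntimeError("saliency computation failed")
    return (saliency > threshold).astype(np.uint8) * 255
```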
104. And acquiring pixel information of the saliency map, and determining the target position of the text plate in the background picture according to the pixel information of the saliency map.
The pixel information of the saliency map comprises pixel information in the saliency map, such as coordinates and/or color information of pixels of the saliency map in a background picture.
For example, the image generating device may calculate, according to the pixel information of the saliency map, a balance energy value when the text plate is located at each preset candidate position in the background picture; and screening out the target position of the text plate in the background picture from the candidate positions according to the balance energy value. The method comprises the following specific steps:
and I, calculating an equilibrium energy value.
Wherein the candidate position may be described using its coordinates in the background picture. For example, the coordinates of the candidate position in the background picture may be determined by using a corner of the background picture or a center position of the background picture as an origin of the coordinate system. It should be noted that the candidate positions may be any coordinate position in the background picture, and in this embodiment, a plurality of candidate positions are exemplified.
The image generation means calculates the balance energy values of the text patches at the respective candidate positions, respectively. Wherein the quantification of the balance energy value characterizes the visual balance and the overall aesthetic degree of the saliency map and the text plate and/or the background picture.
As an embodiment, taking any candidate position as an example, the step "calculating the balance energy value when the text plate is located at the preset candidate position in the background picture according to the pixel information of the saliency map" may include: acquiring pixel information when a text plate is positioned at a candidate position; calculating to obtain an energy parameter according to the pixel information of the saliency map and the text plate; and calculating to obtain the balance energy value of the text plate when the text plate is positioned at the candidate position according to the preset weight and the energy parameter.
The image generation device acquires pixel information of the text plate when the text plate is located at the candidate position, wherein the pixel information includes coordinates of pixels in the text plate in the background picture, and/or color information of the pixels.
Then, the image generation device calculates the energy parameter according to the pixel information of the saliency map and the text plate. The energy parameters comprise parameters for quantifying the collision degree, the blank degree and/or the alignment degree of the saliency map and the text plate.
In this embodiment, if a text plate is located in the region R and the saliency map is the region I, the energy parameter is specifically calculated as follows:
1. the energy parameter includes a collision parameter.
The step of calculating the energy parameter according to the pixel information of the saliency map and the text plate may include: according to the pixel information of the saliency map and the text plate, counting overlapped pixels of the saliency map and the text plate; and calculating the conflict value of the saliency map and the text plate according to the overlapped pixels to obtain the conflict parameter.
Wherein, the conflict parameter quantification represents the conflict degree displayed by the saliency map and the text plate.
The image generation device may count pixels where the saliency map and the text plate overlap in position according to coordinates of the pixels in the saliency map and the text plate, to obtain overlapping pixels. In order to make the degree of overlap more clear and remove the overlapped portion of the light color or the achromatic color, the image generation apparatus may count only overlapped pixels within a fixed color value or a preset color value range when counting the overlapped pixels.
After obtaining the overlapped pixels, the image generating device may calculate the occupation ratio of the overlapped pixels in all pixels of the text plate as the collision parameter. Further, the image generating means may calculate a ratio of the number of overlapping pixels to text entry pixels in the text plate as the collision parameter.
For example, the image generation device may calculate the collision parameter Es(R) when the text plate is located in the region R using the following formula:
Es(R)=(Σ I(x,y), (x,y)∈R)/(Σ M(x,y), (x,y)∈R)
wherein (x,y) are pixel coordinates; (x,y)∈R denotes a pixel of the text plate located in the region R; I(x,y) denotes a pixel located in the saliency map; and M(x,y) denotes a pixel with color value M (for example, M may take the value 255). Σ I(x,y), (x,y)∈R is the number of pixels located in both the region R and the region I, and Σ M(x,y), (x,y)∈R is the number of pixels with color value M in the region R.
The image generation device may thus calculate the ratio of the number of overlapping pixels to the number of pixels with color value M in the text plate as the collision parameter.
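A sketch of the collision parameter, assuming (as a representation choice, not something the patent fixes) that the saliency map and the positioned text plate are given as full-size arrays:

```python
import numpy as np

def collision_parameter(saliency_map, plate_mask, M=255):
    """Es(R): overlapping pixels over the color-M pixels of the plate region.

    saliency_map: HxW array, 255 where the pixel belongs to region I.
    plate_mask:   HxW array, M where the pixel belongs to the text plate
                  placed at the current candidate position (region R).
    """
    in_plate = plate_mask == M
    overlap = np.count_nonzero((saliency_map == 255) & in_plate)
    plate_pixels = np.count_nonzero(in_plate)
    return overlap / plate_pixels if plate_pixels else 0.0
```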
2. The energy parameter comprises a whiteout parameter.
The step of calculating the energy parameter according to the pixel information of the saliency map and the text plate may include: counting the blank pixels in the saliency map and the text plate according to the pixel information of the saliency map and the text plate; and calculating the blank leaving rate according to the blank leaving pixels to obtain the blank leaving parameters.
The margin parameter quantifies the blank areas left by the picture and the text plate. A blank pixel refers to a pixel of a predetermined color value, typically the color value of white, a light color and/or a transparent color.
The image generation device can count the pixels with the preset color value in the saliency map and the text plate according to the color values of their pixels, obtaining the blank pixels.
Then, the image generation device may calculate the margin parameter Eu(R) when the text plate is located in the region R using the following formula:
Eu(R)=(Σ N(x,y), (x,y)∈R)/(Σ N(x,y), (x,y)∈I)
wherein (x,y) are pixel coordinates; (x,y)∈R denotes a pixel of the text plate located in the region R; (x,y)∈I denotes a pixel located in the saliency map; and N(x,y) denotes a pixel with color value N, N being a preset value (for example, N may take the value 1). Σ N(x,y), (x,y)∈R is the number of pixels with color value N in the region R, and Σ N(x,y), (x,y)∈I is the number of pixels with color value N in the region I.
Thus, the image generation device can calculate the ratio of the number of pixels with color value N in the text plate region to that in the saliency map as the margin parameter.
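A corresponding sketch of the margin parameter under the same assumed array conventions; here blank pixels are identified by a preset color value N (the patent's example value is 1 under its own encoding):

```python
import numpy as np

def margin_parameter(image, plate_mask, saliency_mask, N=255):
    """Eu(R): color-N (blank) pixels inside the plate region R over the
    color-N pixels inside the saliency region I, per the formula above.

    image:         HxW array of color values of the composition.
    plate_mask:    HxW boolean array marking region R.
    saliency_mask: HxW boolean array marking region I.
    """
    blanks_in_plate = np.count_nonzero((image == N) & plate_mask)
    blanks_in_saliency = np.count_nonzero((image == N) & saliency_mask)
    return blanks_in_plate / blanks_in_saliency if blanks_in_saliency else 0.0
```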
3. The energy parameter includes an alignment parameter.
The step of calculating the energy parameter according to the pixel information of the saliency map and the text plate may include: calculating the deviation value of the saliency map and the text plate in the preset direction according to the pixel information of the saliency map and the text plate; and calculating to obtain the alignment parameters according to the preset coefficient and the deviation value corresponding to the preset direction.
The alignment parameter quantification represents the alignment condition of the saliency map and the text plate in the preset direction. The predetermined direction includes a lateral direction and/or a longitudinal direction. In the present embodiment, the predetermined direction includes a lateral direction and a longitudinal direction.
The image generation device can count the numbers of pixels of the saliency map that are aligned with the text plate in the transverse direction and in the longitudinal direction respectively, and calculate the deviation rates in the two directions; the alignment parameter is then calculated from the deviation rates and the preset coefficients. For example, the image generation device may calculate the alignment parameter Em(R) when the text plate is located in the region R using the following formula:
Em(R)=μm1×(Σ O(x,y), x∉Rx, y∈Ry)/(Σ I(x,y), x∉Rx, y∈Ry)+μm2×(Σ O(x,y), x∈Rx, y∉Ry)/(Σ I(x,y), x∈Rx, y∉Ry)
wherein (x,y) are pixel coordinates; x∉Rx, y∈Ry denotes a pixel that has the same ordinate as a pixel in the text plate but a different abscissa; x∈Rx, y∉Ry denotes a pixel that has the same abscissa as a pixel in the text plate but a different ordinate; I(x,y) denotes a pixel in the saliency map; and O(x,y) denotes a pixel with color value O, O being a preset value (for example, in this embodiment O may take the value 255). Over each of the two pixel sets, Σ I(x,y) is the number of saliency-map pixels aligned with the text plate in the corresponding direction, and Σ O(x,y) is the number of those pixels whose color value is O. The two quotients are the deviation rates of the saliency map relative to the text plate in the transverse and longitudinal directions, and μm1 and μm2 are the preset coefficients corresponding to the transverse and longitudinal directions respectively.
Thus, the image generation device can calculate the alignment parameter.
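A sketch of the alignment parameter following the formula above; the row/column zone construction is one plausible reading of the aligned-pixel sets, and the coefficients μm1/μm2 are placeholders:

```python
import numpy as np

def alignment_parameter(saliency, plate_mask, mu1=0.5, mu2=0.5, O=255):
    """Em(R): weighted deviation rates over the pixels row-aligned
    (same ordinate, different abscissa) and column-aligned (same
    abscissa, different ordinate) with the text plate.

    saliency:   HxW array of color values; nonzero marks region I.
    plate_mask: HxW boolean array marking region R.
    """
    rows = plate_mask.any(axis=1)  # ordinates covered by the plate (Ry)
    cols = plate_mask.any(axis=0)  # abscissas covered by the plate (Rx)
    h_zone = rows[:, None] & ~cols[None, :]  # same row, column outside Rx
    v_zone = ~rows[:, None] & cols[None, :]  # same column, row outside Ry

    def deviation_rate(zone):
        total = np.count_nonzero((saliency > 0) & zone)  # pixels of I in zone
        hits = np.count_nonzero((saliency == O) & zone)  # ... with color O
        return hits / total if total else 0.0

    return mu1 * deviation_rate(h_zone) + mu2 * deviation_rate(v_zone)
```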
After obtaining the energy parameters, the image generation device may calculate the balance energy value of the text plate at the current candidate position according to the energy parameters and the preset weight corresponding to each energy parameter. The preset weights can be flexibly configured according to actual needs; for example, the value of each weight is proportional to the importance degree of the corresponding energy parameter. It should be noted that the preset weights corresponding to the energy parameters sum to 1.
The energy parameters may include a collision parameter, a margin parameter, and/or an alignment parameter. In this embodiment, the energy parameters include the collision parameter, the margin parameter and the alignment parameter.
The image generation device may calculate the balance energy value E(R) when the text plate is located in the region R using the following formula:
E(R)=α×Es(R)+β×Eu(R)+γ×Em(R)
where α, β and γ are the preset weights corresponding to the collision parameter, the margin parameter and the alignment parameter respectively, and the sum of the three is equal to 1.
Thus, the image generation device can calculate the balance energy values of the text plate at the candidate positions respectively.
In some embodiments, the alignment of the text plate with the background picture can also be taken into account. The step of "calculating the balance energy value when the text plate is located at the candidate position according to the preset weight and the energy parameter" may include: acquiring central position information of the background picture and the text plate, and calculating a center parameter according to the central position information; and calculating the balance energy value of the text plate at the candidate position according to the preset weight, the energy parameter and the center parameter.
The central position of the background picture refers to the geometric center of the background picture, and can be represented by coordinates. The center position of the text plate refers to the geometric center of the text plate at the candidate position, and can be represented by coordinates.
The image generation device acquires the center coordinates of the background picture and of the text plate respectively as the central position information. Take the center coordinate of the background picture to be (Cx, Cy) and the center coordinate of the text plate to be (xc, yc).
Then, the image generation device may calculate the center parameter Ec(R) using the following formula:
Ec(R)=μc1×|xc-Cx|+μc2×|yc-Cy|
where μc1 and μc2 are the preset coefficients corresponding to the transverse and longitudinal directions respectively.
By way of example, where the energy parameters include the collision parameter, the margin parameter, the alignment parameter and the center parameter, the image generation device may calculate the balance energy value E(R) when the text plate is located in the region R using the following formula:
E(R)=α×Es(R)+β×Eu(R)+γ×Em(R)+δ×Ec(R)
where α, β, γ and δ are the preset weights corresponding to the collision parameter, the margin parameter, the alignment parameter and the center parameter respectively, and the sum of the four is equal to 1.
Thus, the image generation device can calculate the balance energy values of the text plate at the candidate positions respectively.
And II, determining a target position.
After obtaining the balance energy values, the image generation device may, as an embodiment, take the candidate position with the smallest balance energy value among the candidate positions as the target position. A minimum balance energy value indicates that, with the text plate at that position, the visual balance and overall aesthetics of the saliency map with the text plate and/or the background picture in the target image are optimal.
Thus, the image generating device obtains the target position of the text plate in the background picture.
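Putting the pieces together, a sketch of the candidate-position scan that reuses the parameter functions from the sketches above; place_plate() and the weight values are hypothetical:

```python
import numpy as np

def place_plate(plate_shape, pos, canvas_shape):
    """Hypothetical helper: boolean mask of the plate rectangle at pos."""
    mask = np.zeros(canvas_shape, dtype=bool)
    (y, x), (h, w) = pos, plate_shape
    mask[y:y + h, x:x + w] = True
    return mask

def best_position(image, saliency_mask, plate_shape, candidates,
                  weights=(0.4, 0.3, 0.3)):
    """Keep the candidate position with the minimum balance energy E(R)."""
    alpha, beta, gamma = weights  # preset weights, summing to 1
    best, best_energy = None, float("inf")
    for pos in candidates:
        mask = place_plate(plate_shape, pos, saliency_mask.shape)
        energy = (alpha * collision_parameter(saliency_mask * 255, mask * 255)
                  + beta * margin_parameter(image, mask, saliency_mask)
                  + gamma * alignment_parameter(saliency_mask * 255, mask))
        if energy < best_energy:
            best, best_energy = pos, energy
    return best
```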
105. And typesetting the background picture and the text plate according to the target position to generate a target image.
For example, the image generating apparatus sets a text plate at a target position in the background picture, completing the layout. Then, the image generating means combines the background picture and the text plate to generate a target image.
As shown in fig. 1d, in the target image generated by the image generation device, the saliency map is a phonograph, and the text items include the two entries "music afternoon tea" and "music station for literary youth". It can be seen that the text entries are well laid out, and the layout of the saliency map and the text achieves visual balance and overall harmony.
As can be seen from the above, in the embodiment of the present invention, by acquiring the input text item and screening the target template from the preset candidate text templates according to the text item, the matching degree between the obtained target template and the text item is high; then, typesetting the text items according to the target template to generate text plates; acquiring a background picture, and extracting a saliency map of the background picture; then, acquiring pixel information of the saliency map, and determining the target position of the text plate in the background picture according to the pixel information of the saliency map; and typesetting the background picture and the text plate according to the target position to generate a target image. According to the scheme, a proper text template is correspondingly screened for each input text item, so that a target template which is most matched with the text item is obtained, and personalized pattern typesetting is realized. And the target position of the text plate is determined according to the saliency map in the background picture, so that the text plate and the background picture are typeset, the text plate and the saliency map are balanced or reach an optimal fusion state in the background picture, the visual balance of the target image is realized, and the integral aesthetics of the image typesetting is improved. Therefore, the scheme realizes the automatic typesetting of the document contents and the document positions and the automatic generation of the image-text images, does not need manual setting or adjustment of a user, saves the time and energy of the user, and improves the automation degree of image-text image production while improving the overall effect of the images.
The method according to the preceding embodiment is illustrated in further detail below by way of example.
For example, referring to fig. 2a, in the present embodiment, the image generating apparatus will be described as being specifically integrated in a server.
201. And obtaining the subject information, and screening candidate text templates from a preset template library according to the subject information.
For example, when a user creates a target image such as a poster, the user inputs a theme of the target image using a device such as a mobile phone. Alternatively, the user may select a subject of the target image among candidate subjects provided by the server. Wherein the theme of the target image can be a style theme and/or a scene theme, etc., such as a small fresh, a billboard, etc.
The server obtains the subject information. It should be noted that one or more theme items may be included in the theme information. As shown in fig. 2b, the server provides topics of two levels, namely, a first level classification and a second level classification, each level has a corresponding candidate topic, the user selects the topic type of emotion under the first level classification, and the topic under the second level classification is null.
And then, the server screens out the text template corresponding to the subject information in a preset template library to serve as a candidate text template. The preset template library comprises text templates corresponding to preset subjects, and the mapping relation between the subjects and the text templates is recorded.
For example, if the topic information includes a topic item, the server selects a text template corresponding to the topic item according to a preset topic-template mapping relationship in the template library, and the selected text template is used as a candidate text template.
For example, if the topic information includes a plurality of topic items, the server selects a text template corresponding to each topic item according to a preset topic-template mapping relationship in the template library. Then, screening out the text templates which are repeated twice or more from the text templates corresponding to the subject items as candidate text templates.
Therefore, the candidate template of the file is determined according to the theme of the target image, and the actual requirements of the theme of the target image on the file are better met.
202. And acquiring input text items, and screening a target template from the candidate text templates according to the text items.
For a specific implementation, reference may be made to the description in step 101 of the above-mentioned embodiment of the image generation method, which is not described herein again.
203. And typesetting the text items according to the target template to generate a text plate.
For a specific implementation, reference may be made to the description in step 102 of the above-mentioned embodiment of the image generation method, which is not described herein again.
204. And acquiring keyword information, and screening a plurality of candidate background pictures in a preset background library according to the keyword information.
It should be noted that the execution timings of steps 204, 205, 206 and 207 can be executed before, after or simultaneously with steps 201 and 203.
For example, the user may use a device such as a mobile phone to input keywords to filter the background pictures. It should be noted that there may be one or more keywords input by the user, and the keywords may be the same as the subject. Alternatively, the image generating apparatus may extract the keyword information by performing word segmentation processing on the input text item. For example, the image generation device performs word segmentation on the text item according to a preset dictionary, and obtains the part of speech of the word segmentation; then, the image generation apparatus configures the segmented words whose parts of speech are in a preset category as keywords. Wherein the preset category can be nouns, adjectives and/or verbs, etc., nouns such as cakes, adjectives such as refreshment, verbs such as dances, etc.
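One way this keyword extraction could be realized (the patent names no particular segmenter; jieba is used here purely as an example):

```python
import jieba.posseg as pseg

def extract_keywords(text_item, allowed_prefixes=("n", "a", "v")):
    """Segment the copy and keep segments whose part-of-speech flag marks
    a noun, adjective or verb (prefix match covers subtags like "ns", "vn").
    """
    return [seg.word for seg in pseg.cut(text_item)
            if seg.flag.startswith(allowed_prefixes)]
```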
The server acquires keyword information input by a user, and a plurality of candidate background pictures are screened out from a preset background library according to the keyword information. The preset background library comprises background pictures corresponding to preset keywords, and the mapping relation between the keywords and the background pictures is recorded.
For example, if the keyword information includes a keyword, the server selects a background picture corresponding to the keyword as a candidate background picture according to a preset keyword-background picture mapping relationship in the background library.
For example, if the keyword information includes a plurality of keywords, the server selects a background picture corresponding to each keyword according to a keyword-background picture mapping relationship preset in the background library. Then, the background pictures which are repeated twice or more are screened out from the background pictures corresponding to the keywords to serve as candidate background pictures.
Therefore, the candidate background pictures are determined according to the keywords, and the flexibility and diversity of background picture selection are improved.
205. And obtaining the description texts corresponding to the candidate background images, and respectively calculating the similarity between each description text and the text item.
The description text corresponding to the candidate background image may be preset, or may be configured according to an object and/or color and the like included in the background image after the server identifies the candidate background image, and specifically may be flexibly configured according to actual needs.
Taking the description text corresponding to any candidate background picture as an example, the server performs word segmentation on the text item and on the description text of the candidate background picture, respectively, to obtain a word vector space. Then, the similarity between the two is calculated from the word vectors of the text item and the description text; for example, the similarity may be calculated with a cosine algorithm, or with a likelihood-based method. In this way, the server can calculate the similarity between the text item and the description text of each candidate background picture.
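For illustration, a minimal sketch of the cosine similarity over a bag-of-words vector space is given below; the token lists are assumed to come from the word segmentation step described above:

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two token lists in a bag-of-words space."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```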
206. And selecting a background picture from the candidate background pictures according to the similarity.
For example, the server may select a description text with the largest similarity value as a best matching text, and determine a candidate background image corresponding to the description text as a background image of the target image.
In some embodiments, the historical performance of the candidate background pictures can be further considered, and the step "selecting a background picture among the plurality of candidate background pictures according to the similarity" may include: acquiring performance values corresponding to the candidate background pictures; respectively calculating the total score of each candidate background image according to the performance value and the similarity; and selecting a background picture from the candidate background pictures according to the total score of each candidate background picture.
The performance value quantitatively represents the performance capability of a candidate background picture, and may be, for example, a historical click rate or a user approval rate. As an implementation manner, the server may obtain the historical click count of a candidate background picture and normalize it to obtain the historical click rate of that candidate background picture.
Taking any candidate background image as an example, the server acquires the similarity between the description text of the candidate background image and the text item and the performance value of the candidate background image. Then, the server calculates the product of the similarity and the performance value, and takes the obtained product as the total score of the candidate background picture. Therefore, the server can respectively calculate the total score of each candidate background map.
Then, the server may select, from the plurality of candidate background images, a candidate background image with the largest total score value as the background image of the target image.
Therefore, the background picture selected by the server comprehensively considers the fit degree with the target theme and the performance capability of the background picture, so that the attraction of the target picture to audiences is improved, and a good commercial effect is achieved.
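A minimal sketch of this scoring, assuming each candidate is carried as a (picture id, similarity, performance value) triple:

```python
def pick_background(candidates):
    """candidates: iterable of (picture_id, similarity, performance_value).
    Total score = similarity * performance value; the highest-scoring
    candidate becomes the background picture."""
    return max(candidates, key=lambda c: c[1] * c[2])[0]
```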
207. And extracting a saliency map of the background picture.
For a specific implementation, reference may be made to the description in step 103 of the above-mentioned embodiment of the image generation method, which is not described herein again.
208. And acquiring pixel information of the saliency map, and determining the target position of the text plate in the background picture according to the pixel information of the saliency map.
For a specific implementation, reference may be made to the description in step 104 of the above-mentioned embodiment of the image generation method, which is not described herein again.
209. And typesetting the background picture and the text plate according to the target position to generate a target image.
For a specific implementation, reference may be made to the description in step 105 of the above-mentioned embodiment of the image generation method, which is not described herein again.
Therefore, in the embodiment of the invention, the user can screen the text template and the background picture according to the theme and the keywords respectively, which brings diversity and flexibility to making the target image. Moreover, the screened text template and background picture better match the user's preferences and the application scenario, fit the theme to be conveyed by the target image more closely, and highlight the conveyed text information, thereby improving the commercial and practical value of the target image.
The method according to the preceding embodiment is illustrated in further detail below by way of example.
For example, referring to fig. 3a, in the present embodiment, the image generating apparatus will be described as being specifically integrated in a server.
And (I) configuring a template library.
The user can configure the text template of the file and store the text template in the template library.
In some embodiments, the user may also configure the mapping relationship between the text template and the subject, and store the mapping relationship in the template library.
Alternatively, the user may upload an image-text image, such as a poster, to the server. The server extracts the text plate in the image-text image, configures a text template according to the format information and/or color information of the sample text in the text plate, and stores the configured text template into the template library.
In some embodiments, when the user uploads the image-text image, the user can also set a theme corresponding to the image-text image. Therefore, the server configures the mapping relation between the text template and the theme in the template library according to the theme corresponding to the image-text image.
And (II) acquiring text items and themes.
When creating the target image, the user can input the file to be displayed in the target image, namely the text item, to the server. In addition, the user can input subject information to match a more appropriate text template.
The server receives the text item and the subject information input by the user.
And (III) determining a target template.
The server screens a plurality of candidate text templates from the template library according to the theme information and the mapping relationship between text templates and themes in the template library. Then, the server screens out, from the candidate text templates, the target template with the highest matching degree with the text item according to the distance between the text item and the sample texts in the candidate text templates.
The target template comprises information such as the number of lines, the line spacing, the font size, and the alignment mode of the text items. The target template shown in fig. 3b includes a first line of text and a second line of text, etc. The first line of text and the second line of text have different font sizes and different alignment modes.
And (IV) generating text plates.
After the target template is obtained, the server typesets the text items according to the target template to generate text plates.
Based on the target template shown in fig. 3b, the text plate shown in fig. 3c includes the first text entry and the second text entry in the text item.
In some embodiments, the background of the text plate may be set to a transparent color.
And (V) extracting a saliency map of the background picture.
The server can acquire the background picture uploaded or selected by the user and then extract the saliency map of the background picture.
Such as the background picture shown in fig. 3d, in which the saliency map is identified.
And (VI) determining the target position according to the balance energy value.
The server can calculate the balance energy value when the text plate is positioned at each preset candidate position in the background picture according to the pixel information of the saliency map; and then, screening out the target position of the text plate in the background picture from the candidate positions according to the balance energy value.
In some embodiments, before calculating the balance energy value, the server may select a candidate position of the text plate in the background picture according to a preset rule.
The preset rule may be to select any position in the background picture as a candidate position.
To reduce the amount of computation of the image generation apparatus, for example, the preset rule may be to eliminate a position beyond the range of the background picture. And the server eliminates the position of the text plate beyond the area range of the background picture according to the size of the text plate, and the rest positions in the background picture are available candidate positions.
For example, the preset rule may be to cull locations that overlap the saliency map. In that case, the server eliminates, according to the size of the text plate, the positions at which the text plate would fall completely or partially within the salient region; the remaining positions in the background picture are the available candidate positions. A sketch of such candidate-position pruning is given after this paragraph.
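For illustration, assuming a binary saliency mask and a coarse search grid (the stride is an assumption to keep the search cheap):

```python
import numpy as np

def candidate_positions(bg_h, bg_w, plate_h, plate_w,
                        saliency_mask, stride=8):
    """Enumerate top-left corners of the text plate. The loop bounds keep
    the plate inside the background picture; positions whose window lies
    entirely inside the salient region are culled (use window.any() to
    also cull partial overlaps, per the stricter rule)."""
    positions = []
    for y in range(0, bg_h - plate_h + 1, stride):
        for x in range(0, bg_w - plate_w + 1, stride):
            window = saliency_mask[y:y + plate_h, x:x + plate_w]
            if window.all():
                continue
            positions.append((x, y))
    return positions
```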
In some embodiments, when calculating the balance energy value, taking any candidate position as an example, the server may obtain pixel information of a text plate when the text plate is located at the candidate position; calculating to obtain an energy parameter according to the pixel information of the saliency map and the text plate; and calculating to obtain the balance energy value of the text plate when the text plate is positioned at the candidate position according to the preset weight and the energy parameter.
The energy parameter may include a collision parameter, a whiteout parameter, and/or an alignment parameter, among others.
And (seventhly) generating a target image.
And after obtaining the background picture, the text plate and the target position thereof, the server typesets the background picture and the text plate according to the target position to generate a target image containing the pictures and texts.
Based on the text plate shown in fig. 3c and the background picture shown in fig. 3d, the target image shown in fig. 3e is obtained. It contains the saliency map and the text item, and the current position of the text item minimizes the balance energy value, achieving overall balance and visual aesthetics for the target image while highlighting the propaganda file.
As can be seen from the above, the embodiment of the invention screens out the text template with the closest distance according to the subject and the content of the text item, thereby adapting the text template to the file; and it measures, through the energy value, the visual balance and the overall aesthetics of the image layout after image-text fusion to find the optimal position of the file in the background picture. A good image-text effect can thus be achieved without manual adjustment by the user, and the automation degree of image making is improved.
In order to better implement the above method, the embodiment of the present invention further provides an image generation apparatus, which may be specifically integrated in a terminal device such as a server, a mobile terminal, or a personal computer.
For example, as shown in fig. 4a, the image generation apparatus may include a template unit 401, a slab unit 402, a saliency map unit 403, a position unit 404, and an image unit 405 as follows:
(I) a template unit 401;
the template unit 401 is configured to obtain an input text item, and screen out a target template from preset candidate text templates according to the text item.
For example, when a user creates a target image such as a flat advertisement, a text item may be input on a document page provided by the template unit 401 through a device such as a mobile phone, a tablet computer, or a personal computer. Where the text items entered by the user may include text entries in one or more dimensions. The dimension can be the category, the hierarchy, the priority and/or the sequencing of the text entry, and the like, and can be flexibly configured according to the actual requirement.
The template unit 401 acquires a text item input by the user. Then, the template unit 401 screens out a target template adapted to the text item from the preset candidate text templates to achieve the optimal typesetting effect of the text. The candidate text template can be preset or obtained by screening according to the theme of the target image in advance, and can be flexibly configured according to actual needs. There may be one or more candidate text templates, and if there is only one candidate text template, the candidate text template is used as the target template. The present embodiment is exemplified by a plurality of candidate text templates.
In some embodiments, the template unit 401 may specifically be configured to: obtaining a plurality of sample texts contained in each candidate text template; and respectively calculating the distance between the text item and each sample text, and selecting a target template from the candidate text templates according to the distance.
Specifically, the template unit 401 may acquire sample texts included in the respective candidate text templates, respectively. Wherein the sample text may contain sample entries in one or more dimensions.
Then, the template unit 401 calculates the distance between the text item and each sample text, respectively. Where the distance between the text item and the sample text is used to describe the degree of similarity between the text item and the sample text, for example, the template unit 401 may calculate semantic similarity, hamming distance, and the like between the text item and the sample text as the distance therebetween.
As an embodiment, when the template unit 401 screens the target template based on the amount of text contained in the text item, the similarity between the character counts of the text item and the sample text may be calculated as the distance between them. Specifically, taking the sample text of any candidate text template as an example, the template unit 401 may be configured to: acquire the text entries of the text item and of the sample text in each dimension, and calculate the text entry distance of each dimension; and calculate the distance between the text item and the sample text according to the preset penalty factors and the text entry distances of the dimensions.
The template unit 401 counts the dimensions of the text entries contained in the sample text, and likewise counts the dimensions of the text entries contained in the text item.
Then, the template unit 401 merges and deduplicates the text entry dimensions contained in the sample text and in the text item to obtain all dimensions covered by the two.
Then, the template unit 401 obtains the text entries of the text item and of the sample text in each of the obtained dimensions, and calculates the text entry distance of each dimension.
For example, the template unit 401 may count, for the text item and the sample text respectively, the number of characters in the text entry of each dimension, such as the number of Chinese characters, the number of English characters, or the number of all characters. If the text item or the sample text has no text entry in a certain dimension, the character count of the text entry in that dimension is zero.
Then, the template unit 401 calculates the difference between the character counts of the text item's entry and the sample text's entry in the same dimension, and takes the absolute value of the difference as the text entry distance of that dimension. In this way, the template unit 401 calculates the text entry distance of each dimension between the text item and the sample text.
Further, the template unit 401 multiplies the text entry distance of each dimension by the corresponding preset penalty factor to obtain the reference distance of each dimension between the text item and the sample text. The penalty factors may be pre-configured, and the value of a penalty factor may be proportional to the importance of the corresponding dimension. The penalty factors thus amplify or attenuate the influence of each dimension's text entry distance on the distance between the text item and the sample text, enabling personalized customization of the target template screening that better suits customer requirements.
Then, the template unit 401 sums the reference distances of the text item and the sample text in each dimension to obtain the distance between the text item and the sample text.
For example, suppose the text item is text1 and the sample text is text2. The number of characters of the text entry of dimension x is x1 in the text item and x2 in the sample text; the number of characters of the text entry of dimension y is y1 in the text item and y2 in the sample text; the number of characters of the text entry of dimension z is z1 in the text item and z2 in the sample text; ...; the number of characters of the text entry of dimension w is w1 in the text item and w2 in the sample text.

The preset penalty factor of dimension x is η_x, that of dimension y is η_y, that of dimension z is η_z, ..., and that of dimension w is η_w.

The distance D(text1, text2) between text1 and text2 can be calculated using the following formula:

D(text1, text2) = η_x × |x1 − x2| + η_y × |y1 − y2| + η_z × |z1 − z2| + ... + η_w × |w1 − w2|
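A minimal sketch of this distance, assuming the text item and sample text are carried as dimension-to-entry dictionaries and the penalty factors as a dimension-to-factor dictionary (these structures are assumptions for the example):

```python
def template_distance(text_item, sample_text, penalty):
    """Weighted character-count distance D(text1, text2); a missing
    text entry in a dimension counts as zero characters."""
    dims = text_item.keys() | sample_text.keys()
    return sum(penalty.get(d, 1.0) *
               abs(len(text_item.get(d, "")) - len(sample_text.get(d, "")))
               for d in dims)
```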
In this way, the template unit 401 calculates the distance between the text item and each sample text, which represents the degree of approximation between their specific components, thereby obtaining the distance data corresponding to each sample text.
Then, the template unit 401 determines the correspondence between each candidate text template and the distance data according to the correspondence between each sample text and each candidate text template and the correspondence between each sample text and the distance data.
After obtaining the distance, the template unit 401 selects a target template from the candidate text templates according to a preset screening rule. For example, the preset filtering rule may be to select, as the target template, a candidate text template with a smallest or largest distance value from the text item among a plurality of preset candidate text templates.
Thereby, the template unit 401 obtains the target template.
(II) a plate unit 402;
and a block unit 402, configured to type the text items according to the target template, and generate text blocks.
After the target template is obtained, the block unit 402 performs layout on the text items according to the layout information included in the target template to generate a text block, which is used to fill the background picture to generate the target image.
The layout information included in the target template may specifically include: the number of rows arranged, the spacing before the segments, the spacing of the rows, the font size, and/or the alignment, etc.
(III) a saliency map unit 403;
the saliency map unit 403 is configured to obtain a background picture and extract a saliency map of the background picture.
It should be noted that the saliency map unit 403 may execute before, after, or simultaneously with the template unit 401 and the plate unit 402.
Wherein the background picture is a background picture of the target image. For example, the user may upload a background picture, or the user may select a background picture from candidate pictures provided by the image generating device.
A saliency map refers to a salient region in a picture, i.e., a region of interest to a viewer. When seeing an image, the viewer automatically processes the regions of interest and selectively ignores the regions of no interest; these regions of interest to the viewer are called salient regions.
After the saliency map unit 403 acquires the background picture, a saliency map in the background picture needs to be extracted.
For example, the saliency map unit 403 may extract the saliency map in the background picture using deeply supervised salient object detection with short connections. Alternatively, the background picture may be subjected to saliency analysis through a bottom-up or top-down computational model to capture the salient objects that human eyes may notice, and the background picture is then segmented according to the salient objects to obtain a saliency map containing them.
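By way of illustration, the sketch below substitutes OpenCV's classical static saliency detector (from opencv-contrib-python) for the deep model named above, which would require a trained network; the binarization threshold is an assumption:

```python
import cv2
import numpy as np

def extract_saliency_mask(background_path: str) -> np.ndarray:
    """Return a binary mask of the salient region of the background."""
    image = cv2.imread(background_path)
    detector = cv2.saliency.StaticSaliencyFineGrained_create()
    ok, saliency = detector.computeSaliency(image)  # float map in [0, 1]
    if not ok:
        raise RuntimeError("saliency computation failed")
    return saliency > 0.5
```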
(IV) a position unit 404;
and a position unit 404, configured to obtain pixel information of the saliency map, and determine a target position of the text plate in the background picture according to the pixel information of the saliency map.
The pixel information of the saliency map comprises pixel information in the saliency map, such as coordinates and/or color information of pixels of the saliency map in a background picture.
For example, the position unit 404 may specifically calculate, according to the pixel information of the saliency map, a balance energy value when the text plate is located at each preset candidate position in the background picture; and screening out the target position of the text plate in the background picture from the candidate positions according to the balance energy value. The method comprises the following specific steps:
and I, calculating an equilibrium energy value.
Wherein the candidate position may be described using its coordinates in the background picture. For example, the coordinates of the candidate position in the background picture may be determined by using a corner of the background picture or a center position of the background picture as an origin of the coordinate system. It should be noted that the candidate positions may be any coordinate position in the background picture, and in this embodiment, a plurality of candidate positions are exemplified.
The location unit 404 calculates the equilibrium energy values of the text slabs at the respective candidate locations, respectively. Wherein the quantification of the balance energy value characterizes the visual balance and the overall aesthetic degree of the saliency map and the text plate and/or the background picture.
As an embodiment, taking any candidate location as an example, the location unit 404 may specifically be configured to: acquiring pixel information when a text plate is positioned at a candidate position; calculating to obtain an energy parameter according to the pixel information of the saliency map and the text plate; and calculating to obtain the balance energy value of the text plate when the text plate is positioned at the candidate position according to the preset weight and the energy parameter.
The location unit 404 obtains pixel information of the text plate at the candidate location, including coordinates of pixels in the text plate in the background picture, and/or color information of the pixels.
Then, the location unit 404 calculates an energy parameter according to the pixel information of the saliency map and the text plate. The energy parameters comprise parameters for quantifying the collision degree, the blank degree and/or the alignment degree of the saliency map and the text plate.
In this embodiment, if a text plate is located in the region R and the saliency map is the region I, the energy parameter is specifically calculated as follows:
1. The energy parameter includes a collision parameter.
Location unit 404 may be specifically configured to: according to the pixel information of the saliency map and the text plate, counting overlapped pixels of the saliency map and the text plate; and calculating the conflict value of the saliency map and the text plate according to the overlapped pixels to obtain the conflict parameter.
Wherein, the conflict parameter quantification represents the conflict degree displayed by the saliency map and the text plate.
The position unit 404 may count the pixels where the saliency map and the text plate overlap in position according to the coordinates of the pixels in the saliency map and the text plate, to obtain overlapping pixels. To further clarify the overlapping degree and remove the overlapped part with light color or no color, the position unit 404 may count only the overlapped pixels within the fixed color value or the preset color value range when counting the overlapped pixels.
After obtaining the overlapping pixels, location unit 404 may calculate the occupation ratio of the overlapping pixels in all pixels of the text plate as a collision parameter. Further, location unit 404 may calculate a ratio of the number of overlapping pixels to text entry pixels in the text plate as the collision parameter.
For example, the location unit 404 may calculate the collision parameter Es(R) when the text plate is located in the region R using the following formula:

Es(R) = ∑_{(x,y)∈R} I(x, y) / ∑_{(x,y)∈R} M(x, y)

where (x, y) are pixel coordinates; (x, y) ∈ R denotes a pixel of the text plate located in the region R; I(x, y) denotes a pixel located in the saliency map; and M(x, y) denotes a pixel with color value M, where M is a preset value, for example 255. The numerator is the number of pixels located in both the region R and the region I; the denominator is the number of pixels with color value M in the region R. The location unit 404 thus calculates the ratio of the overlapping pixels to the pixels with color value M in the text plate as the collision parameter.
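A minimal sketch of Es(R), assuming I and M are boolean masks in background-picture coordinates (I marks saliency-map pixels; M marks the text-plate pixels with color value M when the plate sits at region R):

```python
import numpy as np

def collision_parameter(I: np.ndarray, M: np.ndarray) -> float:
    """Es(R): overlapping pixels of saliency map and plate, divided by
    the plate pixels with color value M."""
    plate_pixels = M.sum()
    return float((I & M).sum()) / plate_pixels if plate_pixels else 0.0
```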
2. The energy parameter comprises a whiteout parameter.
Location unit 404 may be specifically configured to: counting the blank pixels in the saliency map and the text plate according to the pixel information of the saliency map and the text plate; and calculating the blank leaving rate according to the blank leaving pixels to obtain the blank leaving parameters.
The whiteout parameter quantitatively represents the blank areas of the saliency map and the text plate. A blank pixel refers to a pixel of a predetermined color value, which may generally be the color value of white, a light color, and/or a transparent color.
The position unit 404 may count pixels with preset color values in the saliency map and the text plate according to the color values of the pixels in the saliency map and the text plate, and obtain a margin pixel.
Then, the location unit 404 may calculate the margin parameter Eu(R) when the text plate is located in the region R using the following formula:

Eu(R) = ∑_{(x,y)∈R} N(x, y) / ∑_{(x,y)∈I} N(x, y)

where (x, y) are pixel coordinates; (x, y) ∈ R denotes a pixel of the text plate located in the region R; (x, y) ∈ I denotes a pixel located in the saliency map; and N(x, y) denotes a pixel with color value N, where N is a preset value, for example 1. The numerator is the number of pixels with color value N located in the region R; the denominator is the number of pixels with color value N located in the region I. Thus, the location unit 404 calculates the ratio of the number of pixels with color value N in the text plate region to that in the saliency map as the margin parameter.
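A minimal sketch of Eu(R), under the same mask conventions (N marks pixels with the preset blank color value; R and I are boolean masks of the plate region and the saliency map):

```python
import numpy as np

def margin_parameter(N: np.ndarray, R: np.ndarray, I: np.ndarray) -> float:
    """Eu(R): blank pixels inside the plate region over blank pixels
    inside the saliency map."""
    blanks_in_I = (N & I).sum()
    return float((N & R).sum()) / blanks_in_I if blanks_in_I else 0.0
```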
3. The energy parameter includes an alignment parameter.
Location unit 404 may be specifically configured to: calculating the deviation value of the saliency map and the text plate in the preset direction according to the pixel information of the saliency map and the text plate; and calculating to obtain the alignment parameters according to the preset coefficient and the deviation value corresponding to the preset direction.
The alignment parameter quantification represents the alignment condition of the saliency map and the text plate in the preset direction. The predetermined direction includes a lateral direction and/or a longitudinal direction. In the present embodiment, the predetermined direction includes a lateral direction and a longitudinal direction.
The position unit 404 may count the number of pixels of the saliency map aligned with the text plate in the horizontal and vertical directions, respectively, and calculate the deviation rates in the two directions; the alignment parameter is then calculated from the preset coefficients and the deviation rates. For example, the location unit 404 may calculate the alignment parameter Em(R) when the text plate is located in the region R using the following formula:

Em(R) = μ_m1 × D_h(R) + μ_m2 × D_v(R)

where (x, y) are pixel coordinates; R_x and R_y denote the sets of abscissas and ordinates covered by the pixels of the text plate; I(x, y) denotes a pixel within the saliency map; and O(x, y) denotes a pixel with color value O, where O is a preset value, for example 255 in this embodiment.

D_h(R) = ∑ O(x, y) / ∑ I(x, y), with both sums taken over the saliency map pixels whose abscissa is the same as that of a pixel in the text plate but whose ordinate is not (x ∈ R_x, y ∉ R_y); that is, among those pixels, the proportion whose color value is O.

D_v(R) = ∑ O(x, y) / ∑ I(x, y), with both sums taken over the saliency map pixels whose ordinate is the same as that of a pixel in the text plate but whose abscissa is not (y ∈ R_y, x ∉ R_x).

D_h(R) and D_v(R) are the deviation rates of the saliency map and the text plate in the horizontal and vertical directions, respectively, and μ_m1 and μ_m2 are the preset coefficients corresponding to the horizontal and vertical directions.
Thus, the position unit 404 may calculate the alignment parameters.
After obtaining the energy parameters, the location unit 404 may calculate a balance energy value when the text plate is located at the current candidate location according to the preset weight corresponding to each energy parameter and the energy parameter. The preset weight can be flexibly configured according to actual needs, for example, the value of the weight is proportional to the importance degree of each energy parameter. It should be noted that the sum of the preset weights corresponding to the energy parameters is equal to 1.
The energy parameters may include a collision parameter, a whiteout parameter, and/or an alignment parameter. In this embodiment, the energy parameters include a conflict parameter, a blank parameter, and an alignment parameter.
The location unit 404 may calculate the balance energy E(R) when the text plate is located in the region R using the following formula:

E(R) = α × Es(R) + β × Eu(R) + γ × Em(R)

where α, β and γ are the preset weights corresponding to the conflict parameter, the whiteout parameter and the alignment parameter, respectively, and the sum of the three is equal to 1.
Thus, the location unit 404 can calculate the balance energy value when the text plate is located at each candidate location.
In some embodiments, the alignment of the text plate with the background picture can also be taken into account. The location unit 404 may specifically be configured to: acquire the center position information of the background picture and the text plate, and calculate the center parameter according to the center position information; and calculate the balance energy value of the text plate at the candidate position according to the preset weights, the energy parameters and the center parameter.
The central position of the background picture refers to the geometric center of the background picture, and can be represented by coordinates. The center position of the text plate refers to the geometric center of the text plate at the candidate position, and can be represented by coordinates.
The position unit 404 acquires the center coordinates of the background picture and the text plate, respectively, as the center position information. Suppose the center coordinate of the background picture is (C_x, C_y) and the center coordinate of the text plate is (x_c, y_c).

Then, the location unit 404 may calculate the center parameter Ec(R) using the following formula:

Ec(R) = μ_c1 × |x_c − C_x| + μ_c2 × |y_c − C_y|

where μ_c1 and μ_c2 are the preset coefficients corresponding to the horizontal and vertical directions, respectively.
For example, when the energy parameters include the collision parameter, the margin parameter, the alignment parameter and the center parameter, the location unit 404 may calculate the balance energy E(R) when the text plate is located in the region R using the following formula:

E(R) = α × Es(R) + β × Eu(R) + γ × Em(R) + δ × Ec(R)

where α, β, γ and δ are the preset weights corresponding to the conflict parameter, the whiteout parameter, the alignment parameter and the center parameter, respectively, and the sum of the four is equal to 1.
Thus, the location unit 404 calculates the balance energy values of the text slabs at the candidate locations respectively.
And II, determining a target position.
After obtaining the balance energy values, as an embodiment, the position unit 404 takes the candidate position with the smallest balance energy value as the target position. A minimum balance energy value indicates that, with the text plate at that position, the visual balance and the overall aesthetics of the saliency map, the text plate and/or the background picture in the target image are optimal.
Thus, the position unit 404 obtains the target position of the text plate in the background picture.
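A minimal sketch of this position search, assuming es, eu and em are callables returning the three energy parameters for a candidate region, and assuming example weight values that sum to 1:

```python
def best_position(candidates, es, eu, em, weights=(0.4, 0.3, 0.3)):
    """Return the candidate region with the minimum balance energy
    E(R) = alpha*Es(R) + beta*Eu(R) + gamma*Em(R)."""
    alpha, beta, gamma = weights
    return min(candidates,
               key=lambda r: alpha * es(r) + beta * eu(r) + gamma * em(r))
```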
In some embodiments, before calculating the balance energy value, the location unit 404 may select candidate locations of text slabs in the background picture according to a preset rule.
The preset rule may be to select any position in the background picture as a candidate position. In order to reduce the amount of calculation by the image generation device, for example, the preset rule may be to eliminate a position beyond the range of the background picture or eliminate a position completely overlapping with the saliency map.
(V) an image unit 405;
and an image unit 405, configured to perform typesetting on the background picture and the text plate according to the target position, so as to generate a target image.
For example, the image unit 405 sets a text plate at a target position in the background picture, completing the layout. Then, the image unit 405 combines the background picture and the text plate to generate a target image.
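A minimal sketch of this final composition using Pillow, assuming the text plate was rendered with a transparent background as described earlier (the file paths and the top-left coordinate convention are assumptions):

```python
from PIL import Image

def compose_target_image(background_path, plate_path, target_xy, out_path):
    """Paste the transparent text plate at the target position
    (top-left corner) and save the composite target image."""
    background = Image.open(background_path).convert("RGBA")
    plate = Image.open(plate_path).convert("RGBA")
    background.paste(plate, target_xy, mask=plate)  # alpha-aware paste
    background.convert("RGB").save(out_path)
```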
As can be seen from the above, in the embodiment of the present invention, the template unit 401 is used to obtain the input text item, and the target template is screened from the preset candidate text templates according to the text item, so that the matching degree between the obtained target template and the text item is high; then, the block unit 402 typesets the text items according to the target template to generate text blocks; moreover, the saliency map unit 403 acquires a background picture and extracts a saliency map of the background picture; then, the position unit 404 obtains the pixel information of the saliency map, and determines the target position of the text plate in the background picture according to the pixel information of the saliency map; the image unit 405 performs layout on the background picture and the text plate according to the target position to generate a target image. According to the scheme, a proper text template is correspondingly screened for each input text item, so that a target template which is most matched with the text item is obtained, and personalized pattern typesetting is realized. And the target position of the text plate is determined according to the saliency map in the background picture, so that the text plate and the background picture are typeset, the text plate and the saliency map are balanced or reach an optimal fusion state in the background picture, the visual balance of the target image is realized, and the integral aesthetics of the image typesetting is improved. Therefore, the scheme realizes the automatic typesetting of the document contents and the document positions and the automatic generation of the image-text images, does not need manual setting or adjustment of a user, saves the time and energy of the user, and improves the automation degree of image-text image production while improving the overall effect of the images.
Further, as shown in fig. 4b, the image generating apparatus may include a template unit 401, a slab unit 402, a saliency map unit 403, a position unit 404, an image unit 405, and a theme unit 406. The specific implementation of the template unit 401, the plate unit 402, the saliency map unit 403, the position unit 404, and the image unit 405 may refer to the description in the above embodiment of the image generation apparatus, and the theme unit 406 is as follows:
(VI) a theme unit 406.
And a theme unit 406, configured to obtain theme information, and screen out a candidate text template in a preset template library according to the theme information.
For example, when a user creates a target image such as a poster, the user inputs the theme of the target image using a device such as a mobile phone. Alternatively, the user may select the theme of the target image among the candidate themes provided by the theme unit 406. The theme of the target image can be a style theme and/or a scene theme, such as "fresh style" or "billboard".
The subject unit 406 acquires subject information. It should be noted that one or more theme items may be included in the theme information. Then, the topic unit 406 screens out a text template corresponding to the topic information in a preset template library as a candidate text template. The preset template library comprises text templates corresponding to preset subjects, and the mapping relation between the subjects and the text templates is recorded.
Therefore, the candidate template of the file is determined according to the theme of the target image, and the actual requirements of the theme of the target image on the file are better met.
Furthermore, in some embodiments, saliency map unit 403 may be specifically configured to: acquiring keyword information, and screening a plurality of candidate background pictures in a preset background library according to the keyword information; obtaining description texts corresponding to the candidate background images, and respectively calculating the similarity between each description text and a text item; and selecting a background picture from the candidate background pictures according to the similarity.
For example, the user may use a device such as a mobile phone to input keywords to filter the background pictures. It should be noted that there may be one or more keywords input by the user, and the keywords may be the same as the subject. Alternatively, the image generating apparatus may extract the keyword information by performing word segmentation processing on the input text item.
The saliency map unit 403 obtains keyword information input by a user, and screens a plurality of candidate background maps from a preset background library according to the keyword information. The preset background library comprises background pictures corresponding to preset keywords, and the mapping relation between the keywords and the background pictures is recorded. Therefore, the candidate background pictures are determined according to the keywords, and the flexibility and diversity of background picture selection are improved.
The description text corresponding to the candidate background image may be preset, or may be configured according to an object and/or color and the like included in the background image after the saliency map unit 403 identifies the candidate background image, and may be flexibly configured according to actual needs.
Taking the description text corresponding to any candidate background picture as an example, the saliency map unit 403 performs word segmentation on the text item and on the description text of the candidate background picture, respectively, to obtain a word vector space. Then, the similarity between the two is calculated from the word vectors of the text item and the description text; for example, the similarity may be calculated with a cosine algorithm. Thus, the saliency map unit 403 can calculate the similarity between the text item and the description text of each candidate background picture.
The saliency map unit 403 may select the description text with the largest similarity value as the best matching text, and determine a candidate background map corresponding to the description text as the background picture of the target image.
In some embodiments, the historical performance of the candidate background map may also be considered, and the saliency map unit 403 may be specifically configured to: acquiring performance values corresponding to the candidate background pictures; respectively calculating the total score of each candidate background image according to the performance value and the similarity; and selecting a background picture from the candidate background pictures according to the total score of each candidate background picture.
The performance value quantitatively represents the performance capability of a candidate background picture, and may be, for example, a historical click rate or a user approval rate. As an embodiment, the saliency map unit 403 may obtain the historical click count of the candidate background picture and normalize it to obtain the historical click rate.
Taking any candidate background image as an example, the saliency map unit 403 obtains the similarity between the description text of the candidate background image and the text item, and the performance value of the candidate background image. Then, the saliency map unit 403 calculates a product of the similarity and the performance value, and takes the resultant product as the total score of the candidate background map. Thus, the saliency map unit 403 may calculate a total score of each candidate background map.
Then, the saliency map unit 403 may select, from the plurality of candidate background maps, a candidate background map with the largest total score value as the background picture of the target image.
Therefore, the background picture selected by the saliency map unit 403 comprehensively considers the fit degree with the target theme and the performance capability of the background picture, so that the attraction of the target picture to the audience is improved, and a good commercial effect is achieved.
An embodiment of the present invention further provides a server, as shown in fig. 5, which shows a schematic structural diagram of the server according to the embodiment of the present invention, specifically:
the server may include components such as a processor 501 of one or more processing cores, memory 502 of one or more computer-readable storage media, a power supply 503, and an input unit 504. Those skilled in the art will appreciate that the server architecture shown in FIG. 5 is not meant to be limiting, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 501 is a control center of the server, connects various parts of the entire server by various interfaces and lines, and performs various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 502 and calling data stored in the memory 502, thereby performing overall monitoring of the server. Optionally, processor 501 may include one or more processing cores; preferably, the processor 501 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 501.
The memory 502 may be used to store software programs and modules, and the processor 501 executes various functional applications and data processing by operating the software programs and modules stored in the memory 502. The memory 502 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as an image generation program) required for at least one function, and the like; the storage data area may store data created according to the use of the server, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 502 may also include a memory controller to provide the processor 501 with access to the memory 502.
The server further comprises a power supply 503 for supplying power to each component, and preferably, the power supply 503 may be logically connected to the processor 501 through a power management system, so that functions of managing charging, discharging, power consumption, and the like are realized through the power management system. The power supply 503 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The server may also include an input unit 504, and the input unit 504 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the server may further include a display unit and the like, which will not be described in detail herein. Specifically, in this embodiment, the processor 501 in the server loads the executable file corresponding to the process of one or more application programs into the memory 502 according to the following instructions, and the processor 501 runs the application program, such as the image generation program, stored in the memory 502, so as to implement various functions, as follows:
acquiring an input text item, and screening a target template from preset candidate text templates according to the text item; typesetting the text items according to the target template to generate text plates; acquiring a background picture, and extracting a saliency map of the background picture; acquiring pixel information of the saliency map, and determining the target position of the text plate in the background picture according to the pixel information of the saliency map; and typesetting the background picture and the text plate according to the target position to generate a target image.
In some embodiments, there are multiple candidate text templates, and the processor 501 runs the application program stored in the memory 502, and may further implement the following functions:
obtaining a plurality of sample texts contained in each candidate text template; and respectively calculating the distance between the text item and each sample text, and selecting a target template from the candidate text templates according to the distance.
In some embodiments, the sample text includes text entries in multiple dimensions, and the processor 501 runs the application stored in the memory 502, and may further implement the following functions:
acquiring text items of the text items and the sample text in each dimension, and respectively calculating to obtain the distance between the text items in each dimension; and calculating the distance between the text item and the sample text according to the preset penalty factor and the distance between the text items of each dimension.
In some embodiments, the processor 501 runs an application program stored in the memory 502, and may also implement the following functions:
calculating the balance energy value of the text plate when the text plate is positioned at each preset candidate position in the background picture according to the pixel information of the saliency map; and screening out the target position of the text plate in the background picture from the candidate positions according to the balance energy value.
In some embodiments, the processor 501 runs an application program stored in the memory 502, and may also implement the following functions:
acquiring pixel information when a text plate is positioned at a candidate position; calculating to obtain an energy parameter according to the pixel information of the saliency map and the text plate; and calculating to obtain the balance energy value of the text plate when the text plate is positioned at the candidate position according to the preset weight and the energy parameter.
In some embodiments, the energy parameter comprises a conflict parameter, and the processor 501 runs the application stored in the memory 502, and may further implement the following functions:
according to the pixel information of the saliency map and the text plate, counting overlapped pixels of the saliency map and the text plate; and calculating the conflict value of the saliency map and the text plate according to the overlapped pixels to obtain the conflict parameter.
In some embodiments, the energy parameter comprises a blank parameter, and the processor 501 runs an application program stored in the memory 502, and may further implement the following functions:
counting the blank pixels in the saliency map and the text plate according to the pixel information of the saliency map and the text plate; and calculating the blank leaving rate according to the blank leaving pixels to obtain the blank leaving parameters.
In some embodiments, the energy parameter includes an alignment parameter, and the processor 501, running the application program stored in the memory 502, may further implement the following functions:
calculating the deviation value of the saliency map and the text plate in the preset direction according to the pixel information of the saliency map and the text plate; and calculating to obtain the alignment parameters according to the preset coefficient and the deviation value corresponding to the preset direction.
In some embodiments, the processor 501 runs an application program stored in the memory 502, and may also implement the following functions:
acquiring central position information of the background picture and the text plate, and calculating to obtain a central parameter according to the central position information; and calculating to obtain the balance energy value of the text plate when the text plate is positioned at the candidate position according to the preset weight, the energy parameter and the central parameter.
In some embodiments, the processor 501 runs an application program stored in the memory 502, and may also implement the following functions:
and obtaining the subject information, and screening candidate text templates from a preset template library according to the subject information.
In some embodiments, the processor 501 runs an application program stored in the memory 502, and may also implement the following functions:
acquiring keyword information, and screening a plurality of candidate background pictures in a preset background library according to the keyword information; obtaining description texts corresponding to the candidate background images, and respectively calculating the similarity between each description text and a text item; and selecting a background picture from the candidate background pictures according to the similarity.
In some embodiments, the processor 501 runs an application program stored in the memory 502, and may also implement the following functions:
acquiring performance values corresponding to the candidate background pictures; respectively calculating the total score of each candidate background image according to the performance value and the similarity; and selecting a background picture from the candidate background pictures according to the total score of each candidate background picture.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present invention provide a storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the image generation methods provided by the embodiments of the present invention. For example, the instructions may perform the steps of:
acquiring an input text item, and screening a target template from preset candidate text templates according to the text item; typesetting the text items according to the target template to generate text plates; acquiring a background picture, and extracting a saliency map of the background picture; acquiring pixel information of the saliency map, and determining the target position of the text plate in the background picture according to the pixel information of the saliency map; and typesetting the background picture and the text plate according to the target position to generate a target image.
In some embodiments, there are multiple candidate text templates, and the instructions may further perform the steps of:
obtaining a plurality of sample texts contained in each candidate text template; and respectively calculating the distance between the text item and each sample text, and selecting a target template from the candidate text templates according to the distance.
In some embodiments, the sample text includes text entries in multiple dimensions, and the instructions may further perform the steps of:
acquiring text items of the text items and the sample text in each dimension, and respectively calculating to obtain the distance between the text items in each dimension; and calculating the distance between the text item and the sample text according to the preset penalty factor and the distance between the text items of each dimension.
In some embodiments, the instructions may further perform the steps of:
calculating the balance energy value of the text plate when the text plate is positioned at each preset candidate position in the background picture according to the pixel information of the saliency map; and screening out the target position of the text plate in the background picture from the candidate positions according to the balance energy value.
In some embodiments, the instructions may further perform the steps of:
acquiring pixel information when a text plate is positioned at a candidate position; calculating to obtain an energy parameter according to the pixel information of the saliency map and the text plate; and calculating to obtain the balance energy value of the text plate when the text plate is positioned at the candidate position according to the preset weight and the energy parameter.
In some embodiments, the energy parameter comprises a conflict parameter, and the instructions may further perform the steps of:
according to the pixel information of the saliency map and the text plate, counting overlapped pixels of the saliency map and the text plate; and calculating the conflict value of the saliency map and the text plate according to the overlapped pixels to obtain the conflict parameter.
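A conflict-term sketch: salient pixels falling under the plate's footprint are counted and normalised; the binarisation threshold is an assumption:

import numpy as np

def conflict_param(saliency, xy, plate_wh, thresh=128):
    x, y = xy
    w, h = plate_wh
    region = saliency[y:y + h, x:x + w]
    if region.size == 0:
        return 1.0  # degenerate placement: treat as maximal conflict
    overlap = np.count_nonzero(region >= thresh)  # overlapped salient pixels
    return overlap / float(region.size)           # conflict value in [0, 1]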
In some embodiments, the energy parameter comprises a margin parameter, and the instructions may further perform the steps of:
counting the blank pixels in the saliency map and the text plate according to the pixel information of the saliency map and the text plate; and calculating a margin ratio from the blank pixels to obtain the margin parameter.
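A margin-term sketch: the fraction of the picture that is neither salient nor covered by the plate is compared against a target ratio; the target value is an assumption, since the patent only requires a margin ratio to be computed from the blank pixels:

import numpy as np

def margin_param(saliency, xy, plate_wh, thresh=128, target=0.3):
    x, y = xy
    w, h = plate_wh
    occupied = (saliency >= thresh).astype(np.uint8)
    occupied[y:y + h, x:x + w] = 1        # plate pixels count as occupied
    blank_ratio = 1.0 - occupied.mean()   # margin ratio over the whole picture
    return abs(blank_ratio - target)      # penalise deviation from the target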
In some embodiments, the energy parameter comprises an alignment parameter, and the instructions may further perform the steps of:
calculating the deviation value of the saliency map and the text plate in a preset direction according to the pixel information of the saliency map and the text plate; and calculating the alignment parameter according to a preset coefficient corresponding to the preset direction and the deviation value.
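An alignment-term sketch, assuming the preset direction is horizontal: the deviation between the salient-region centroid and the plate centre is scaled by a preset coefficient (both the direction and the coefficient value are assumptions):

import numpy as np

def align_param(saliency, xy, plate_wh, coeff=1.0, thresh=128):
    ys, xs = np.nonzero(saliency >= thresh)
    if xs.size == 0:
        return 0.0  # no salient region to align against
    plate_cx = xy[0] + plate_wh[0] / 2.0
    deviation = abs(plate_cx - xs.mean()) / saliency.shape[1]
    return coeff * deviation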
In some embodiments, the instructions may further perform the steps of:
acquiring central position information of the background picture and the text plate, and calculating a central parameter according to the central position information; and calculating the balance energy value of the text plate at the candidate position according to the preset weight, the energy parameter and the central parameter.
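A centre-term sketch: the normalised distance between the plate centre and the picture centre, which can then be folded into the balance energy with its own preset weight:

import math

def center_param(picture_wh, xy, plate_wh):
    W, H = picture_wh
    cx = xy[0] + plate_wh[0] / 2.0
    cy = xy[1] + plate_wh[1] / 2.0
    return math.hypot((cx - W / 2.0) / W, (cy - H / 2.0) / H)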
In some embodiments, the instructions may further perform the steps of:
obtaining subject information, and screening the candidate text templates from a preset template library according to the subject information.
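A screening sketch, assuming templates in the library are tagged with subject labels (the tagging scheme is an assumption):

def screen_candidate_templates(template_library, subject):
    # template_library: [{"name": ..., "subjects": {"promotion", ...}}, ...]
    return [t for t in template_library if subject in t.get("subjects", ())]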
In some embodiments, the instructions may further perform the steps of:
acquiring keyword information, and screening a plurality of candidate background pictures from a preset background library according to the keyword information; obtaining description texts corresponding to the candidate background pictures, and respectively calculating the similarity between each description text and the text item; and selecting a background picture from the candidate background pictures according to the similarity.
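A retrieval sketch: a keyword filter followed by a bag-of-words cosine similarity between each candidate's description text and the input text item. A production system would more likely use a learned text encoder; this stand-in only illustrates the flow:

import math
from collections import Counter

def cosine_sim(a, b):
    # Bag-of-words cosine similarity over whitespace tokens.
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def candidate_backgrounds(background_library, keyword):
    return [p for p in background_library if keyword in p["keywords"]]

def pick_background(candidates, text_item):
    return max(candidates, key=lambda p: cosine_sim(p["description"], text_item))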
In some embodiments, the instructions may further perform the steps of:
acquiring performance values corresponding to the candidate background pictures; respectively calculating a total score for each candidate background picture according to the performance value and the similarity; and selecting a background picture from the candidate background pictures according to the total score of each candidate background picture.
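A total-score sketch: each candidate's performance value (for example, a historical click-through statistic; the metric is an assumption) is blended with its description similarity under an assumed preset weight:

def pick_by_total_score(candidates, similarities, alpha=0.5):
    # candidates: [{"performance": float, ...}, ...]; similarities is the
    # parallel list of description-text similarities; alpha is an assumed
    # preset blend weight.
    scored = [
        (alpha * c["performance"] + (1.0 - alpha) * s, c)
        for c, s in zip(candidates, similarities)
    ]
    return max(scored, key=lambda pair: pair[0])[1]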
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps in any image generation method provided in the embodiment of the present invention, the beneficial effects that can be achieved by any image generation method provided in the embodiment of the present invention can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
The image generation method, the image generation apparatus and the storage medium provided by the embodiments of the present invention have been described in detail above. Specific examples are applied herein to explain the principles and implementations of the present invention, and the descriptions of the above embodiments are only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (14)

1. An image generation method, comprising:
acquiring an input text item, and screening a target template from preset candidate text templates according to the text item;
typesetting the text items according to the target template to generate text plates;
acquiring a background picture, and extracting a saliency map of the background picture;
acquiring pixel information of the saliency map, and determining the target position of the text plate in the background picture according to the pixel information of the saliency map;
and typesetting the background picture and the text plate according to the target position to generate a target image.
2. The image generation method of claim 1, wherein there are a plurality of candidate text templates, and the screening of the target template from the preset candidate text templates according to the text item comprises:
obtaining a plurality of sample texts contained in each candidate text template;
and respectively calculating the distance between the text item and each sample text, and selecting a target template from the candidate text templates according to the distance.
3. The image generation method of claim 2, wherein the sample text includes text entries in a plurality of dimensions, and calculating the distance between the text item and the sample text comprises:
acquiring text entries of the text items and the sample text in each dimension, and respectively calculating the distance between the text entries in each dimension;
and calculating the distance between the text item and the sample text according to a preset penalty factor and the distance between the text items of each dimension.
4. The image generation method of claim 1, wherein the determining the target position of the text plate in the background picture according to the pixel information of the saliency map comprises:
calculating the balance energy value of the text plate when the text plate is positioned at each preset candidate position in the background picture according to the pixel information of the saliency map;
and screening out the target position of the text plate in the background picture from the candidate positions according to the balance energy value.
5. The image generation method of claim 4, wherein calculating the balance energy value of the text plate at a preset candidate position in the background picture according to the pixel information of the saliency map comprises:
acquiring pixel information of the text plate when the text plate is positioned at the candidate position;
calculating an energy parameter according to the pixel information of the saliency map and the text plate;
and calculating the balance energy value of the text plate at the candidate position according to a preset weight and the energy parameter.
6. The image generation method of claim 5, wherein the energy parameter comprises a conflict parameter, and the calculating an energy parameter according to the pixel information of the saliency map and the text plate comprises:
according to the pixel information of the saliency map and the text plate, counting overlapped pixels of the saliency map and the text plate;
and calculating the conflict value of the saliency map and the text plate according to the overlapped pixels to obtain a conflict parameter.
7. The image generation method of claim 5, wherein the energy parameter comprises a margin parameter, and the calculating an energy parameter according to the pixel information of the saliency map and the text plate comprises:
counting the blank pixels in the saliency map and the text plate according to the pixel information of the saliency map and the text plate;
and calculating a margin ratio from the blank pixels to obtain the margin parameter.
8. The image generation method of claim 5, wherein the energy parameter comprises an alignment parameter, and the calculating an energy parameter according to the pixel information of the saliency map and the text plate comprises:
calculating the deviation value of the saliency map and the text plate in the preset direction according to the pixel information of the saliency map and the text plate;
and calculating the alignment parameter according to a preset coefficient corresponding to the preset direction and the deviation value.
9. The image generation method of claim 5, wherein the calculating the balance energy value of the text plate at the candidate position according to the preset weight and the energy parameter comprises:
acquiring central position information of the background picture and the text plate, and calculating a central parameter according to the central position information;
and calculating the balance energy value of the text plate at the candidate position according to the preset weight, the energy parameter and the central parameter.
10. The image generation method of claim 1, wherein the screening of the target template from the preset candidate text templates according to the text item comprises:
obtaining subject information, and screening the candidate text templates from a preset template library according to the subject information.
11. The image generation method of claim 1, wherein the obtaining a background picture comprises:
acquiring keyword information, and screening a plurality of candidate background pictures in a preset background library according to the keyword information;
obtaining description texts corresponding to the candidate background pictures, and respectively calculating the similarity between each description text and the text item;
and selecting a background picture from the candidate background pictures according to the similarity.
12. The image generation method of claim 11, wherein the selecting a background picture among the plurality of candidate background pictures according to the similarity comprises:
acquiring performance values corresponding to the candidate background pictures;
respectively calculating a total score for each candidate background picture according to the performance value and the similarity;
and selecting a background picture from the candidate background pictures according to the total score of each candidate background picture.
13. An image generation apparatus, comprising:
the template unit is used for acquiring input text items and screening out a target template from preset candidate text templates according to the text items;
the plate unit is used for typesetting the text items according to the target template to generate a text plate;
the salient image unit is used for acquiring a background image and extracting a salient image of the background image;
the position unit is used for acquiring the pixel information of the saliency map and determining the target position of the text plate in the background picture according to the pixel information of the saliency map;
and the image unit is used for typesetting the background picture and the text plate according to the target position to generate a target image.
14. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the image generation method of any one of claims 1 to 12.
CN201910140793.XA 2019-01-08 2019-02-26 Image generation method and device and storage medium Pending CN111415396A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910015861X 2019-01-08
CN201910015861 2019-01-08

Publications (1)

Publication Number Publication Date
CN111415396A true CN111415396A (en) 2020-07-14

Family

ID=71494140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910140793.XA Pending CN111415396A (en) 2019-01-08 2019-02-26 Image generation method and device and storage medium

Country Status (1)

Country Link
CN (1) CN111415396A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282781A (en) * 2021-05-18 2021-08-20 稿定(厦门)科技有限公司 Image retrieval method and device
CN113282781B (en) * 2021-05-18 2022-06-28 稿定(厦门)科技有限公司 Image retrieval method and device
CN113342179A (en) * 2021-05-26 2021-09-03 北京百度网讯科技有限公司 Input text processing method and device, electronic equipment and storage medium
CN115599384A (en) * 2022-12-14 2023-01-13 深圳市明源云科技有限公司(Cn) Picture character generation method, device, equipment and storage medium thereof
CN117058275A (en) * 2023-10-12 2023-11-14 深圳兔展智能科技有限公司 Commodity propaganda drawing generation method and device, computer equipment and storage medium
CN117058275B (en) * 2023-10-12 2024-04-12 深圳兔展智能科技有限公司 Commodity propaganda drawing generation method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
US10984295B2 (en) Font recognition using text localization
CN111415396A (en) Image generation method and device and storage medium
US10699166B2 (en) Font attributes for font recognition and similarity
US10593023B2 (en) Deep-learning-based automatic skin retouching
US9824304B2 (en) Determination of font similarity
CN109803180B (en) Video preview generation method and device, computer equipment and storage medium
US7362919B2 (en) Method for generating customized photo album pages and prints based on people and gender profiles
JP5658986B2 (en) Electronic book display device, method and program
US20130259377A1 (en) Conversion of a document of captured images into a format for optimized display on a mobile device
CN117668402A (en) System and method for applying layout to document
US20120013640A1 (en) Graphical representation of events
EP3975109A1 (en) Image processing method and apparatus, computer device and storage medium
CN107886512A (en) A kind of method for determining training sample
WO2022089170A1 (en) Caption area identification method and apparatus, and device and storage medium
US8683326B2 (en) Spatiotemporal media object layouts
US20050257127A1 (en) Document production assist apparatus, document production assist program and storage medium, and document production assist method
CN110795925A (en) Image-text typesetting method based on artificial intelligence, image-text typesetting device and electronic equipment
US11978216B2 (en) Patch-based image matting using deep learning
CN112215171A (en) Target detection method, device, equipment and computer readable storage medium
CN110728129A (en) Method, device, medium and equipment for typesetting text content in picture
WO2023093694A1 (en) Image processing method and apparatus, and device and storage medium
CN107886513A (en) A kind of device for determining training sample
CN115641276A (en) Image processing method, apparatus, device, medium, and program product
US8687876B2 (en) Stereoscopic image pasting system, and method and program for controlling operation of same
CN115393685B (en) Text travel data processing method and system based on expandable model

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (country code: HK; legal event code: DE; document number: 40025849)
SE01 Entry into force of request for substantive examination