WO2022201237A1 - Server, text field arrangement position method, and program - Google Patents


Info

Publication number
WO2022201237A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
saliency
placement position
text field
cut
Prior art date
Application number
PCT/JP2021/011672
Other languages
French (fr)
Japanese (ja)
Inventor
孝弘 坪野
イー カー ヤン
美帆 折坂
Original Assignee
株式会社オープンエイト
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社オープンエイト filed Critical 株式会社オープンエイト
Priority to PCT/JP2021/011672 priority Critical patent/WO2022201237A1/en
Publication of WO2022201237A1 publication Critical patent/WO2022201237A1/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/91 Television signal processing therefor

Definitions

  • the present invention relates to a server for generating video content to be distributed to user terminals, a text field layout position method, and a program.
  • Patent Document 1 proposes a moving image processing apparatus that efficiently searches for a desired scene image from a moving image having a plurality of chapters.
  • in general, a moving image includes objects or areas with saliency that draw the viewer's attention (hereinafter referred to as "saliency regions"), and if a text sentence is superimposed on at least part of a saliency region, the visibility of the moving image is impaired. Furthermore, if the text sentence is not placed in a direction and at a distance that allow a natural movement of the line of sight from the saliency region of the object to the text, the visibility of the moving image is impaired and the readability of the text is also impaired.
  • the present invention aims to provide a server or the like that makes it possible to easily create composite content data and, in particular, to place text sentences in consideration of their positional relationship with saliency regions in a moving image.
  • according to one aspect of the present invention, there is provided a server comprising a material content data setting unit that sets image data for a cut, and a text field placement position determination unit that determines the position of a text field to be placed on the cut by referring to a saliency region in the image data.
  • according to the present invention, it becomes possible to provide a server or the like that makes it possible to easily create composite content data and, in particular, to place text sentences in consideration of their positional relationship with saliency regions in a moving image.
  • FIG. 1 is a configuration diagram of a system according to an embodiment
  • FIG. 2 is a configuration diagram of a server according to an embodiment
  • FIG. 3 is a configuration diagram of a management terminal and a user terminal according to an embodiment
  • FIG. 4 is a functional block diagram of a system according to an embodiment
  • FIG. 5 is a diagram explaining an example screen layout that constitutes a cut
  • FIG. 6 is a flow chart of a system according to an embodiment
  • FIG. 7 is an explanatory diagram of an aspect in which a plurality of cuts forming composite content data are displayed as a list on a screen
  • FIG. 8 is a diagram explaining a second data field placement position determination method according to an embodiment
  • FIG. 9 is a diagram explaining saliency object detection according to an embodiment
  • FIG. 10 is a diagram showing an example original image for saliency-based detection
  • FIG. 11 is a diagram showing an example of saliency object detection for the image of FIG. 10
  • FIG. 12 is a diagram explaining saliency map detection according to an embodiment
  • FIG. 13 is a diagram showing an example of saliency map detection for the image of FIG. 12
  • FIG. 14 is a diagram showing an example in which a second data field placement recommendation area appears to the upper right of the large mountains that are the saliency objects in the image shown in FIG. 9
  • FIG. 15 is a diagram showing a state in which the second data field is placed in the high-scoring portion, indicated by high density, of the second data field placement recommendation area in the example shown in FIG. 14
  • FIG. 16 is a diagram showing an example in which a second data field placement recommendation area appears to the right of the animal that is the saliency object in the image shown in FIG. 11
  • FIG. 17 is a diagram showing a state in which the second data field is placed in the high-scoring portion, indicated by low density, of the second data field placement recommendation area in the example shown in FIG. 16
  • FIG. 18 is a diagram showing an example of hybrid saliency map detection for the image of FIG. 10
  • FIG. 19 is a diagram showing a modification in which the placement position specifying unit specifies the placement position of the second data field on the cut into which the material content data (image) is inserted
  • FIG. 20 is a diagram showing various placement examples for placing the second data field at the cell position identified by the method shown in FIG. 19
  • FIG. 21 is a diagram showing another example in which the placement position specifying unit specifies the placement position of the second data field on the cut into which the material content data (image) is inserted
  • a server or the like according to an embodiment of the present invention has the following configuration.
  • [Item 1] A server comprising: a material content data setting unit for setting image data for a cut; and a text field placement position determination unit that determines the position of a text field to be placed on the cut by referring to a saliency region in the image data.
  • [Item 2] The server according to item 1, wherein the text field placement position determination unit includes: a saliency region determination unit that determines a saliency region included in the image data; and a placement position specifying unit that specifies the placement position of the text field on the cut with reference to the saliency region.
  • [Item 3] The server according to item 2, wherein the placement position specifying unit is configured to specify the placement position of the text field on the cut based on a model generated by learning, as training data, image data in which the relationship between the saliency region in the image and the text field satisfies a predetermined condition.
  • [Item 4] The server according to item 3, wherein the placement position specifying unit is configured to specify the placement position of the text field on the cut based on a scoring value calculated for each pixel of the image data.
  • [Item 5] The server according to item 3, wherein the placement position specifying unit is configured to specify the placement position of the text field on the cut based on a scoring value calculated for each of a plurality of cells into which the image data is divided.
  • [Item 6] The server according to item 2, wherein the placement position specifying unit is configured to execute: dividing the entire image of the cut in which the image data is set into a plurality of cells; excluding, from among the plurality of cells, cells that include at least part of the saliency region; and identifying, from among the plurality of cells remaining after the exclusion, a cell that satisfies a predetermined condition regarding its relationship with the saliency region as the placement position of the text field.
  • [Item 7] The server according to any one of items 1 to 6, wherein the saliency region is detected by hybrid saliency map detection using saliency object detection and saliency map detection.
  • [Item 8] The server according to any one of items 1 to 6, wherein the saliency region is detected by saliency map detection.
  • [Item 9] The server according to any one of items 1 to 6, wherein the saliency region is detected by saliency object detection.
  • [Item 10] A system comprising the server according to any one of items 1 to 9.
  • [Item 11] A computer-implemented text field placement position method comprising: setting data for a cut; and determining the position of a text field to be placed on the cut based on a saliency region in the image data.
  • [Item 12] A program that causes a computer to execute a text field placement position method comprising: setting image data for a cut; and determining the position of a text field to be placed on the cut based on a saliency region in the image data.
  • a system for creating composite content data (hereinafter referred to as "this system") and the like according to an embodiment of the present invention will now be described.
  • in the description of each embodiment, the same or similar elements are denoted by the same or similar reference numerals and names, and duplicate descriptions of the same or similar elements may be omitted.
  • the features shown in each embodiment can be applied to other embodiments as long as they are not mutually contradictory.
  • the system according to the embodiment includes a server 1, an administrator terminal 2, and a user terminal 3, as shown in FIG. 1.
  • the server 1, the administrator terminal 2, and the user terminal 3 are connected via a network N so as to be able to communicate with each other.
  • Network N may be a local network, or may be connectable to an external network.
  • in this embodiment, a configuration in which the server 1 is composed of one unit is described, but it is also possible to realize the server 1 using a plurality of server devices.
  • the server 1 and the administrator terminal 2 may also be combined into a single device.
  • FIG. 2 is a diagram showing the hardware configuration of the server 1 shown in FIG. 1. Note that the illustrated configuration is an example, and other configurations may be employed. Also, the server 1 may be a general-purpose computer such as a workstation or a personal computer, or may be logically realized by cloud computing.
  • the server 1 includes at least a processor 10 , a memory 11 , a storage 12 , a transmission/reception section 13 , an input/output section 14 and the like, which are electrically connected to each other through a bus 15 .
  • the processor 10 is an arithmetic device that controls the overall operation of the server 1, controls transmission and reception of data between elements, executes applications, and performs information processing necessary for authentication processing.
  • the processor 10 includes a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit), and executes programs for this system stored in the storage 12 and developed in the memory 11 to perform each information process.
  • the processing capability of the processor 10 only needs to be sufficient for executing the necessary information processing; for example, the processor 10 may be composed only of a CPU, but is not limited to this.
  • the memory 11 includes a main memory composed of a volatile memory device such as a DRAM (Dynamic Random Access Memory), and an auxiliary memory composed of a non-volatile memory device such as a flash memory or an HDD (Hard Disc Drive).
  • the memory 11 is used as a work area or the like for the processor 10, and may store a BIOS (Basic Input/Output System) executed when the server 1 is started, various setting information, and the like.
  • the storage 12 stores various programs such as application programs.
  • a database storing data used for each process may be constructed in the storage 12 .
  • the storage 12 stores a computer program for causing the server 1 to execute the composite content data creation method described with reference to FIG. 6.
  • a computer program is stored for causing the server 1 to execute the second data arrangement position determination method described below.
  • the transmission/reception unit 13 connects the server 1 to the network.
  • the input/output unit 14 is an information input device such as a keyboard and mouse, and an output device such as a display.
  • a bus 15 is commonly connected to the above elements and transmits, for example, address signals, data signals and various control signals.
  • the administrator terminal 2 and the user terminal 3 shown in FIG. 3 also include a processor 20, a memory 21, a storage 22, a transmission/reception section 23, an input/output section 24, and the like, which are electrically connected to each other through a bus 25. Since the function of each element can be configured in the same manner as in the server 1 described above, detailed description of each element is omitted.
  • the administrator uses the administrator terminal 2 to, for example, change the settings of the server 1 and manage the operation of the database.
  • a user can access the server 1 from the user terminal 3 to create or view composite content data, for example.
  • FIG. 4 is a block diagram illustrating functions implemented in the server 1.
  • the server 1 includes a communication section 110 , a storage section 160 and a material content data setting section 190 .
  • Material content data setting unit 190 includes identified information analysis unit 120, second data generation unit 130, composite content data generation unit 140, association unit 150, classifier 170, and second data placement position determination unit 180.
  • Composite content data generator 140 includes base data generator 142 , second data allocation unit 144 , and material content data allocation unit 146 .
  • the storage unit 160 is composed of storage areas such as the memory 11 and the storage 12, and includes a base data storage unit 161, a material content data storage unit 163, a composite content data storage unit 165, an interface information storage unit 167, and other various databases.
  • the second data placement position determination unit 180 includes a saliency region determination unit 182 that determines a saliency region related to an object or range having saliency in the material content data, and a placement position specifying unit 184 that specifies the position where the second data field is to be placed.
  • the functions of the units 120, 130, 140, 150, 170, and 180 that make up the material content data setting unit 190 can be implemented by one or more processors 10, for example.
  • the communication unit 110 communicates with the administrator terminal 2 and the user terminal 3.
  • the communication unit 110 also functions as a reception unit that receives first data including information to be identified, for example, from the user terminal 3 .
  • the first data is, for example, text data such as an article containing information to be identified (for example, a press release or news), image data containing information to be identified (for example, a photograph or illustration), video data, or voice data including information to be identified.
  • the text data here is not limited to text data at the time of transmission to the server 1, but may be text data generated by a known voice recognition technique from voice data transmitted to the server 1, for example.
  • the first data may also be text data such as an article summarized by an existing automatic summarization technique, such as extractive summarization or generative summarization (the summary including the information to be identified).
  • the audio data referred to here is not limited to audio data acquired by an input device such as a microphone, but may be audio data extracted from video data or audio data generated from text data.
  • for example, audio data such as narration and lines may be extracted from temporary images such as rough sketches and temporary moving images such as a temporary video, and composite content data may be generated based on the audio data together with material content data, as will be described later.
  • also, voice data may be created from text data with a story; in the case of a fairy tale, for example, a picture-story show or a moving image based on the read-out story and material content data may be generated as composite content data.
  • when the second data generation unit 130 determines that it is not necessary to divide the first data (for example, when the text data is a short sentence of a preset number of characters or less), the second data generation unit 130 generates the first data as it is as the second data.
  • when the second data generation unit 130 determines that the first data needs to be divided, it divides the first data and generates pieces of second data, each including at least part of the information to be identified of the first data.
  • at this time, division number information of the second data is also generated. Any known technique may be used for the method of dividing the first data by the second data generation unit 130; for example, if the first data can be converted into text, sentences may be separated so that a natural section as a sentence fits into each cut, based on the maximum number of characters in each cut of the base data and analysis of the modification relationships between clauses. A minimal sketch of such splitting is shown below.
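Illustrative sketch only: the patent describes splitting based on the maximum number of characters per cut plus analysis of modification relationships between clauses; this toy version splits at sentence boundaries alone, and every name in it is hypothetical.

```python
# Hypothetical sketch of the splitting step: sentence-boundary splitting against
# a per-cut character budget. The clause-dependency analysis mentioned in the
# text is not modeled here.
import re

def split_into_second_data(first_data: str, max_chars_per_cut: int = 60) -> list[str]:
    """Split input text into segments that each fit within one cut."""
    sentences = [s for s in re.split(r"(?<=[.!?。])\s*", first_data) if s]
    segments, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars_per_cut:
            segments.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        segments.append(current)
    return segments  # len(segments) doubles as the division number information

segments = split_into_second_data(
    "The new store opens in Shibuya. It targets teens. Opening day is soon.", 40
)
print(len(segments), segments)  # -> 2 segments for a 40-character budget
```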
  • the identified information analysis unit 120 analyzes the second data described above and acquires identified information.
  • the information to be identified may be any information as long as it can be analyzed by the information to be identified analysis unit 120 .
  • the identified information may be in word form defined by a language model. More specifically, it may be one or more words (for example, "Shibuya, Shinjuku, Roppongi” or "Shibuya, Landmark, Teen”) accompanied by a word vector, which will be described later.
  • the words may include words that are not usually used alone, such as "n", depending on the language model.
  • a feature vector extracted from a document, an image, or a moving image may be used instead of the above-described word format.
  • the composite content data generation unit 140 generates base data including a number of cuts (one or more cuts) corresponding to the division number information generated by the second data generation unit 130 described above, combines the material content data newly input from the user terminal 3 and/or stored in the material content data storage unit 163 with the base data in which the second data is assigned to each cut, generates the result as composite content data, stores it in the composite content data storage unit 165, and displays it on the user terminal 3.
  • the base data generation unit 142 assigns numbers to the generated one or more cuts, such as scene 1, scene 2, scene 3 or cut 1, cut 2, cut 3, for example.
  • Fig. 5 is an example of a screen layout of cuts that make up the base data.
  • into the second data field 31, which is a text data field, the edited second data (for example, delimited text sentences) is inserted, and into the material content data field 32, the selected material content data is inserted. In FIG. 5, the second data field 31 and the material content data field 32 are shown separated, but the second data field 31 may instead be inserted so as to overlay the material content data field 32. In that case, the second data field 31 must be arranged so as not to overlap the saliency region in the material content data.
  • format information may be set for each cut, such as the preset maximum number of characters and screen layout in the case of text data, and the playback time in the case of video.
  • the composite content data does not necessarily need to be stored in the composite content data storage unit 165, and may be stored at an appropriate timing.
  • the base data to which only the second data is assigned may be displayed on the user terminal 3 as progress information of the composite content data.
  • the second data allocation unit 144 sequentially allocates the second data in the order of numbers assigned to one or more cuts generated by the base data generation unit 142 described above.
  • the association unit 150 compares at least part of the information to be identified included in the second data with, for example, extracted information extracted from the material content data (for example, class labels extracted by the classifier), determines mutual similarity or the like, and associates material content data suitable for the second data (for example, data having a high degree of similarity) with the second data.
  • for example, suppose that material content data A (for example, an image of a woman's face) with the extracted information "face" and material content data B (for example, an image of a mountain) with the extracted information "mountain" are prepared, and the information to be identified included in the second data represents "teacher".
  • in that case, the word vector obtained from "teacher" is more similar to the word vector obtained from "face" than to the word vector obtained from "mountain", so the second data is associated with material content data A.
  • the extraction information of the material content data may be extracted in advance by the user and stored in the material content data storage unit 163, or may be extracted by the classifier 170, which will be described later.
  • the similarity determination may be performed by preparing a trained model that has learned word vectors, and using the vectors to determine the similarity of words by a method such as cosine similarity or Word Mover's Distance; a toy sketch of such a check follows.
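To make the similarity check concrete, here is a toy sketch under the assumption that word embeddings are already available; the 3-dimensional vectors are fabricated purely for illustration, whereas a real system would load pretrained embeddings (and might use Word Mover's Distance instead of plain cosine similarity).

```python
# Toy sketch of the word-vector similarity check described above.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings for the "teacher"/"face"/"mountain" example.
vec = {
    "teacher":  np.array([0.8, 0.6, 0.1]),
    "face":     np.array([0.7, 0.7, 0.2]),
    "mountain": np.array([0.1, 0.2, 0.9]),
}

for label in ("face", "mountain"):
    print(label, cosine_similarity(vec["teacher"], vec[label]))
# "face" scores higher, so the second data is associated with material content data A.
```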
  • Material content data can be, for example, image data, video data, sound data (eg, music data, voice data, sound effects, etc.), but is not limited to this.
  • the material content data may be stored in advance in the material content data storage unit 163 by the user or administrator, or may be acquired from the network and stored in the material content data storage unit 163.
  • the material content data allocation unit 146 allocates suitable material content data to cuts to which the corresponding second data is allocated, based on the above-described association.
  • the interface information storage unit 167 stores various control information to be displayed on the display unit (display, etc.) of the administrator terminal 2 or the user terminal 3.
  • the classifier 170 acquires learning data from a learning data storage unit (not shown) and performs machine learning to create a trained model; the classifier 170 may be created or updated periodically.
  • the learning data for creating the classifier may be data collected from the network or data owned by the user with class labels attached, or a data set with class labels may be procured and used.
  • the classifier 170 is, for example, a trained model using a convolutional neural network that, upon input of material content data, outputs one or more pieces of extracted information (e.g., class labels).
  • the classifier 170 extracts, for example, class labels representing objects associated with the material content data (e.g., seafood, grilled meat, people, furniture); a hedged sketch using an off-the-shelf pretrained classifier follows.
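As a hedged illustration of such a classifier, the sketch below uses torchvision's pretrained ResNet-18 as a stand-in; the patent's classifier 170 is trained on the system's own labeled data, and the file path here is a placeholder.

```python
# Stand-in CNN classifier producing class labels as "extracted information".
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()

image = Image.open("material.jpg").convert("RGB")  # placeholder path
with torch.no_grad():
    logits = model(preprocess(image).unsqueeze(0))
top = logits.softmax(dim=1).topk(3)
labels = [weights.meta["categories"][i] for i in top.indices[0]]
print(labels)  # e.g. top-3 class labels used as extracted information
```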
  • FIG. 6 is a diagram explaining an example of the flow of creating composite content data.
  • first, the server 1 receives first data including at least information to be identified from the user terminal 3 via the communication unit 110 (step S101).
  • the information to be identified is, for example, one or more words, and the first data may be, for example, text data consisting of an article containing one or more words, or a summary of such text data.
  • next, the server 1 acquires the information to be identified by analyzing the first data with the identified information analysis unit 120, and the second data generation unit 130 generates one or more pieces of second data, each containing at least part of the information to be identified, together with division number information (step S102).
  • next, in the server 1, the composite content data generation unit 140 causes the base data generation unit 142 to generate base data including a number of cuts corresponding to the division number information (step S103).
  • next, the server 1 allocates the second data to the cuts by means of the second data allocation unit 144 (step S104).
  • the base data in this state may be displayed on the user terminal 3 so that the progress can be checked.
  • next, the server 1 causes the association unit 150 to associate the material content data in the material content data storage unit 163 with the second data (step S105), and the material content data allocation unit 146 allocates the material content data to the cuts (step S106).
  • next, the server 1 uses the second data placement position determination unit 180 to determine the placement position of the second data field 31 to be placed on each cut, based on the saliency region related to an object or range having saliency in the image, detected from the material content data (step S107).
  • the server 1 generates the base data to which the second data and the material content data are assigned as composite content data, stores the composite content data in the composite content data storage unit 165, and displays the composite content data on the user terminal 3 (step S108).
  • the arrangement position of the second data field 31 in each cut is determined by the second data arrangement position determination unit 180 as described above, and the server 1 inserts the second data field 31 into each cut according to the arrangement position determined by the second data arrangement position determination unit 180.
  • a list of the plurality of cuts forming the composite content data can be displayed on the screen as shown in FIG. 7; for each cut, along with the displayed material content data and second data, information on the playback time (in seconds) of the cut may also be displayed.
  • the user can, for example, correct the content by clicking the second data field 31 or a corresponding button, and replace the material content data by clicking the material content data field 32 or a corresponding button. Furthermore, the user can also add other material content data to each scene from the user terminal.
  • the order of the steps described above is an example; the generation or reading of the base data, the assignment of the second data, the association, the assignment of the material content data, and the determination of the arrangement position of the second data field 31 may each be executed at any timing, as long as whatever a step depends on has been prepared before that step (for example, the base data must have been read before the second data or material content data is assigned to it) and the steps do not contradict each other.
  • the setting of material content by the material content data setting unit 190 using the identified information analysis unit 120, the association unit 150, the classifier 170, and the second data placement position determination unit 180 described above is one example of a setting function of the composite content data creation system, and the setting method used by the material content data setting unit 190 is not limited to this.
  • the base data is generated by the base data generation unit 142 in the above example, but it may be read from the base data storage unit 161 instead.
  • the read-out base data may include, for example, a predetermined number of blank cuts, or may be template data in which predetermined material content data, format information, and the like (for example, music data, background images, font information, etc.) have been set for each cut.
  • the user may be able to set any material content to all or part of each data field from the user terminal.
  • a setting method combined with a user operation may also be used; for example, the user may input arbitrary text using the user terminal, the information to be identified may be extracted from the text as described above, and material content may be associated accordingly.
  • next, the second data field placement position determination method performed in step S107 above will be described.
  • FIG. 8 is a diagram explaining an example of the flow of the second data field arrangement position determination method.
  • as shown in FIG. 8, the method includes a step in which the saliency region determination unit 182 determines a saliency region in the material content data (image data) set for each cut (S201), and a step in which the placement position specifying unit 184 specifies the placement position of the second data field 31 on each cut by referring to the saliency region of the image (S202).
  • the saliency region determination unit 182 uses a saliency determination model, which is a trained model regarding saliency obtained by a known learning method such as the saliency object detection shown in FIG. 9 or the saliency map detection shown in FIG. 12, to discriminate objects and ranges having saliency in the image (hereinafter also referred to as "saliency regions").
  • the saliency determination model is stored in the model storage unit 169 of the storage unit 160, for example.
  • the saliency region determination unit 182 determines saliency regions in images based on saliency information as exemplified in the images shown in FIGS. 9 and 11 to 13 .
  • FIG. 9 shows an example of detecting a saliency object in an image using a saliency object detection model.
  • Saliency object detection using a saliency object detection model can be realized using a known technique such as an encoder-decoder model.
  • for example, the large and small mountains surrounded by dotted lines in FIG. 9 are detected as saliency objects in the image, and the saliency region determination unit 182 discriminates these large and small mountains as saliency objects.
  • FIGS. 10 and 11 show another example of detecting objects with saliency in an image using the saliency object detection model.
  • when saliency object detection is performed on the image including an animal shown in FIG. 10 using the saliency object detection model, the contour shape of the animal, shown as a relatively bright region in FIG. 11, is detected, and the saliency region determination unit 182 discriminates the region showing this contour shape as a saliency object.
  • FIG. 12 shows an example of detecting a saliency range in an image using a saliency map detection model.
  • saliency range detection using a saliency map detection model can be realized using a known technique, such as applying a trained model to a feature map generated by a convolutional neural network from an input image.
  • a saliency map is generated by determining the visual saliency strength of each pixel in the image.
  • in FIG. 12, the darkness of the black portions expresses the strength of visual saliency.
  • the saliency region determination unit 182 discriminates, as a saliency range, a range occupied by a large proportion of pixels with relatively strong visual saliency (in the example shown in FIG. 12, the region of the large mountains among the large and small mountains); a rough stand-in for this map-based detection is sketched below.
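For readers who want to experiment with map-based detection, here is a rough stand-in using the classic spectral-residual saliency method from opencv-contrib (pip install opencv-contrib-python); the patent itself relies on a trained model, so this only illustrates the map-threshold-region flow, and the input path is a placeholder.

```python
# Rough stand-in for saliency map detection: spectral-residual saliency,
# thresholded into a binary saliency range with a bounding box.
import cv2
import numpy as np

image = cv2.imread("input.jpg")  # placeholder path
saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
ok, saliency_map = saliency.computeSaliency(image)  # float32 map in [0, 1]
assert ok

# Pixels with relatively strong visual saliency form the saliency range.
threshold = saliency_map.mean() + 2 * saliency_map.std()
saliency_mask = (saliency_map > threshold).astype(np.uint8)

ys, xs = np.nonzero(saliency_mask)
if len(xs):
    print("saliency region bounding box:",
          (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())))
```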
  • FIG. 13 shows another example of detecting a saliency range in an image using a saliency map detection model.
  • when saliency range detection is performed using the saliency map detection model on the image including an animal shown in FIG. 10, the range showing the outer shape of the animal, shown as a relatively bright region in FIG. 13, is discriminated as the saliency range.
  • in FIG. 13, strong visual saliency is detected in the portion corresponding to the animal's face, indicating that the saliency of that portion is particularly high.
  • the placement position specifying unit 184 uses a placement position determination model, which is a trained model obtained by machine-learning the relationship between saliency regions in images and placement positions of the second data field 31, to specify the placement position of the second data field 31 on the cut into which the material content data (image) is inserted.
  • the arrangement position determination model is also stored, for example, in the model storage unit 169 of the storage unit 160 in the same manner as the saliency determination model.
  • the placement position determination model can be generated, for example, by machine learning with an arbitrary learner, using as training data images to which text has been added and in which the relationship between the saliency region in the image and the placement position of the text is recognized to be good. The learner that generates the placement position determination model extracts the saliency regions and the text from the training images and learns their relative positional relationships within the images. Images used as training data are preferably selected based on, for example, the following conditions for the relationship between the saliency region in the image and the placement position of the text:
  • the text is placed in a direction and at a distance that allow natural movement of the line of sight from the saliency region of the object to the text;
  • no part of the text overlaps the saliency region;
  • the text is placed near (or away from) a portion of the saliency region that has particularly high saliency.
  • the placement position specifying unit 184 calculates a recommended area for arranging the second data field 31 for the image data set for each cut, using the placement position determination model described above. This calculation may be performed, for example, by scoring the degree of recommendation of the placement position of the second data field 31 for each pixel of the image; a toy per-pixel scoring sketch follows.
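The actual scores come from the learned placement position determination model; as an assumed stand-in, the sketch below scores each pixel with a hand-written heuristic (zero on the saliency region, peaking at a fixed distance outside it) just to show the per-pixel scoring and argmax selection.

```python
# Hand-written stand-in for the learned per-pixel scoring.
import numpy as np
from scipy.ndimage import distance_transform_edt

def score_pixels(saliency_mask: np.ndarray, preferred_distance: float = 40.0) -> np.ndarray:
    # Distance (in pixels) from each pixel to the nearest salient pixel.
    dist = distance_transform_edt(saliency_mask == 0)
    scores = np.exp(-((dist - preferred_distance) ** 2) / (2 * 15.0 ** 2))
    scores[saliency_mask > 0] = 0.0  # never place text on the saliency region
    return scores

mask = np.zeros((180, 320), dtype=np.uint8)
mask[60:120, 80:160] = 1  # toy saliency region
scores = score_pixels(mask)
y, x = np.unravel_index(np.argmax(scores), scores.shape)
print("recommended anchor pixel:", (int(x), int(y)))
```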
  • FIGS. 14 and 16 are diagrams showing the second data field placement recommendation areas calculated by the placement position specifying unit 184. FIG. 14 shows an example in which the recommendation area appears to the upper right of the large mountains that are the saliency objects in the image shown in FIG. 9; in this example, the portion of the recommendation area shown in high density has a high score. FIG. 16 shows an example in which the recommendation area appears to the right of the animal that is the saliency object in the image shown in FIG. 11; in this example, the portion of the recommendation area shown in low density has a high score. FIGS. 14 and 16 show the recommendation areas specified by the placement position specifying unit 184 for explanation and visualization; in the actual processing, display of the recommendation areas is not essential.
  • the placement position specifying unit 184 specifies, as the placement position of the second data field 31, a portion with a higher score among the second data field placement regions obtained as described above.
  • FIG. 15 shows a state in which the second data field 31 is arranged in the high-scoring portion, indicated by high density, of the recommendation area in the example shown in FIG. 14, and FIG. 17 shows a state in which the second data field 31 is arranged in the high-scoring portion, indicated by low density, of the recommendation area in the example shown in FIG. 16.
  • in this way, the saliency region in the image is determined based on the saliency information, and the arrangement position of the second data field 31 is specified in consideration of its positional relationship with the saliency region, so the server 1 can easily create composite content data in which the second data field 31 is arranged at an appropriate position with respect to the saliency region in the image on each cut.
  • the placement position specifying unit 184 of the second data placement position determining unit 180 specifies the placement position of the second data field 31 using the placement position determination model.
  • however, depending on the nature of the placement position determination model, a position where part of the second data field 31 overlaps the saliency region in the image may be specified as the arrangement position of the second data field 31.
  • therefore, it is preferable that the second data placement position determination unit 180 determines whether the specified placement position of the second data field 31 is a position where part of the second data field 31 overlaps the saliency region in the image, and, when it determines that they overlap, specifies again, as the placement position of the second data field 31, a position shifted so that the second data field 31 does not overlap the saliency region.
  • in this case, the second data placement position determination unit 180 may, for example, specify the position at which the overlap can be eliminated with the shortest displacement distance from the first specified placement position of the second data field 31, and correct the placement position of the second data field 31 to that position; a minimal geometric sketch of such a shift appears below.
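A minimal geometric sketch of such a shift, assuming both the text field and the saliency region are modeled as axis-aligned rectangles (the patent does not prescribe this geometry):

```python
# Shift a text-field rectangle out of a saliency bounding box by the smallest
# displacement along one axis.
from dataclasses import dataclass

@dataclass
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

def shift_out_of(field: Rect, salient: Rect) -> Rect:
    overlap_x = min(field.x1, salient.x1) - max(field.x0, salient.x0)
    overlap_y = min(field.y1, salient.y1) - max(field.y0, salient.y0)
    if overlap_x <= 0 or overlap_y <= 0:
        return field  # no overlap, nothing to do
    if overlap_x <= overlap_y:  # cheaper to move horizontally
        dx = overlap_x if field.x0 + field.x1 > salient.x0 + salient.x1 else -overlap_x
        return Rect(field.x0 + dx, field.y0, field.x1 + dx, field.y1)
    dy = overlap_y if field.y0 + field.y1 > salient.y0 + salient.y1 else -overlap_y
    return Rect(field.x0, field.y0 + dy, field.x1, field.y1 + dy)

print(shift_out_of(Rect(100, 50, 220, 90), Rect(80, 60, 200, 160)))
# -> Rect(x0=100, y0=20, x1=220, y1=60): moved up by the 30-pixel overlap.
```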
  • the second data placement position determining unit 180 may specify a position away from the more salient portion of the saliency object as the placement position of the second data field 31 .
  • in particular, when saliency map detection is used as described with reference to FIG. 13, the overall shape of the detected saliency object may be unclear, so when the second data field 31 is actually arranged at the specified position, part of the second data field 31 may overlap the saliency region.
  • the placement position specifying unit 184 of the second data placement position determining unit 180 can place the second data field 31 at a more appropriate position with respect to the detected saliency region.
  • in the example described above, the degree of recommendation of the arrangement position of the second data field 31 is scored for each pixel of the image. Since a scoring value is obtained for each pixel, detailed information about the degree of recommendation of the arrangement position of the second data field 31 can be obtained. On the other hand, since the scoring value assigned to each individual pixel among a large number of pixels is extremely small, the placement recommendation region specified by their distribution tends to be uniform and relatively large.
  • in this modification, when the placement position specifying unit 184 calculates the recommended area for arranging the second data field 31 for the image data set for each cut using the placement position determination model described above, it divides the target image into a specific number of cells in advance, scores the degree of recommendation of the placement position of the second data field 31 for each cell, and sets the position of the cell with the highest scoring value as the placement position of the second data field 31.
  • the functions and operations of the other components of the server 1 that implements this modification are as described above.
  • FIG. 19 is a diagram showing a modification in which the placement position specifying unit 184 specifies the placement position of the second data field 31 on the cut into which the material content data (image) is inserted.
  • in this modification, the placement position specifying unit 184 divides the entire image on the cut into which the material content data (image) is inserted into a plurality of cells, calculates for each cell, using the placement position determination model, a scoring value relating to the degree of recommendation of the placement position of the second data field 31, and specifies the cell with the highest scoring value as the cell in which the second data field 31 should be placed.
  • although FIG. 19 shows an example in which the entire image is partitioned into 18 cells (3 vertically by 6 horizontally), the number of cells partitioning the entire image may be set arbitrarily. Also, in FIG. 19, the boundary lines separating the cells are indicated by dashed lines, but these are shown for explanation; such boundary lines are not actually drawn in the processing by the placement position specifying unit 184.
  • first, the placement position specifying unit 184 divides the entire image of the cut into which the material content data (image) is inserted into a predetermined number of cells (see FIG. 19(a)).
  • next, the placement position specifying unit 184 calculates a recommended area for arranging the second data field 31 for the image data set for each cut, using the placement position determination model (see FIG. 19(b)).
  • the above calculation is performed by scoring the degree of recommendation of the placement position of the second data field 31 for each cell of the image divided as shown in FIG. 19(a).
  • FIG. 19(b) shows the scoring values calculated for each cell. Although the scoring values of each cell are shown in FIG. 19(b) for explanation and visualization, the display of these scoring values is not essential in the actual processing by the placement position specifying unit 184.
  • the placement position specifying unit 184 specifies the cell with the highest calculated scoring value among these cells as the placement position of the second data field 31 (see FIG. 19(c)).
  • in the example shown in FIG. 19(c), the cell in the second column from the right in the top row has the highest scoring value of 0.34, so that cell is identified as the placement position of the second data field 31.
  • in FIG. 19(c), the cell specified as the placement position of the second data field 31 is shaded for explanation and visualization; in the actual processing by the placement position specifying unit 184, no shading is generated or displayed.
  • in this way, in this modification, the entire image of the cut is divided into a predetermined number of cells, the placement position specifying unit 184 calculates a scoring value regarding the degree of recommendation of the placement position of the second data field 31 for each cell, and the cell with the highest scoring value among those cells is specified as the arrangement position of the second data field 31.
  • compared with scoring each pixel, the number of targets for which a scoring value is calculated is small, so the differences in scoring values between cells become clear; as a result, it becomes easier to uniquely identify the cell with the highest scoring value as the optimal placement position of the second data field 31. A grid-scoring sketch follows.
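A small sketch of this cell-based scoring, assuming a per-pixel score map (such as the heuristic one in the earlier sketch) is available to be averaged over each cell of a 3x6 grid:

```python
# Average a per-pixel score map over a grid and pick the best cell.
import numpy as np

def best_cell(scores: np.ndarray, rows: int = 3, cols: int = 6) -> tuple[int, int]:
    h, w = scores.shape
    cell_scores = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            cell = scores[r * h // rows:(r + 1) * h // rows,
                          c * w // cols:(c + 1) * w // cols]
            cell_scores[r, c] = cell.mean()
    r, c = np.unravel_index(np.argmax(cell_scores), cell_scores.shape)
    return int(r), int(c)  # row/column of the cell that should hold the text field

# e.g. best_cell(score_pixels(mask)) with the toy mask from the earlier sketch.
```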
  • FIG. 20 is a diagram showing various placement examples for placing the second data field 31 at the position of the cell specified by the method shown in FIG.
  • the second data field 31 is placed within the specified cell based on settings made in advance in the placement position specifying unit 184.
  • FIG. 20(a) shows an example in which the placement position specifying unit 184 is set so as to place the second data field 31 in the center of the specified cell.
  • in this case, the placement position specifying unit 184 places the second data field 31 in the center of the specified cell so that the center of the cell and the center of the second data field 31 approximately coincide; in this example, the left and right portions of the second data field 31 extend into the cells adjacent on the left and right.
  • FIG. 20(b) shows an example in which the placement position specifying unit 184 is set so that, within the identified cell, the second data field 31 is placed as far away as possible from the saliency region in the image (the large mountains in the example shown in FIG. 20). In this case, the placement position specifying unit 184 places the second data field 31 in the identified cell so that it extends along the upper edge of the cell, with the right portion of the second data field 31 extending into the cell adjacent on the right.
  • FIG. 20(c) shows an example in which the placement position specifying unit 184 is set so as to place the second data field 31 at a position as close as possible to the saliency region (the large mountains) within the identified cell. In this case, the placement position specifying unit 184 places the second data field 31 in the identified cell so that it extends along the bottom edge of the cell, with part of the second data field 31 extending into the cell adjacent on the left.
  • the various settings of the placement position specifying unit 184 described above can be changed as appropriate in the server 1 according to user input received from the user terminal 3 via the communication unit 110, for example; a small sketch of such in-cell alignment appears below.
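The in-cell alignment options of FIG. 20 can be sketched roughly as follows, under an assumed top-left coordinate model with toy dimensions; the mode names and geometry are illustrative, not the patent's specification.

```python
# Center the text field in the chosen cell, or push it toward/away from the
# saliency region's centroid.
def place_in_cell(cell, field_w, field_h, mode="center", salient_center=None):
    """cell = (x0, y0, x1, y1); returns the field's top-left corner (x, y)."""
    x0, y0, x1, y1 = cell
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    if mode == "center" or salient_center is None:
        return cx - field_w / 2, cy - field_h / 2
    sx, sy = salient_center
    toward = mode == "near"  # "near" hugs the saliency side, "far" the opposite
    x = x0 if (sx < cx) == toward else x1 - field_w
    y = y0 if (sy < cy) == toward else y1 - field_h
    return x, y

cell = (160, 0, 267, 60)
print(place_in_cell(cell, 120, 24, mode="far", salient_center=(80, 120)))
# -> (147, 0): along the cell's upper edge, pushed away from the saliency region.
```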
  • depending on the position of the specified cell and on these settings, the second data field 31 placed in the specified cell may protrude outside the area of the entire image of the cut or may overlap the saliency region.
  • in cases where the second data field 31 placed as described with reference to FIGS. 20(a)-(c) does not meet these constraints, it is preferable that the second data placement position determination unit 180 determines whether the placed second data field 31 protrudes outside the area of the entire image of the cut or overlaps the saliency region, and, when it determines that at least one of these applies, specifies again, as the placement position, a position shifted so that the second data field 31 satisfies the above constraints.
  • in the above, an example has been described in which the placement position specifying unit 184 specifies the placement position of the second data field 31 on the cut into which the material content data (image) is inserted, using the placement position determination model, which is a trained model obtained by machine-learning the relationship between saliency regions in images and placement positions of the second data field 31.
  • the second data field 31 can be arranged at an appropriate position considering the positional relationship between the saliency region and the second data field 31 .
  • on the other hand, scoring every pixel of the image data requires a particularly large computational cost.
  • FIG. 21 is a diagram showing another example in which the placement position specifying unit 184 specifies the placement position of the second data field 31 on the cut into which the material content data (image) is inserted.
  • the placement position specifying unit 184 divides the entire cut image into which the material content data (image) is inserted into a plurality of cells, and specifies the optimum cell in which the second data field 31 should be placed.
  • although FIG. 21 shows an example in which the entire image is partitioned into 9 cells (3 vertically by 3 horizontally), the number of cells partitioning the entire image may be set arbitrarily.
  • in FIG. 21, the boundary lines separating the cells are indicated by dashed lines, but these are shown for explanation; such boundary lines are not actually drawn in the processing by the placement position specifying unit 184. Note that the functions and operations of the other components of the server 1 that implements this example are as described above.
  • the placement position specifying unit 184 divides the entire cut image into which the material content data (image) is inserted into a predetermined number of cells (see FIG. 21(a)).
  • next, the placement position specifying unit 184 excludes, from the placement position candidates for the second data field 31, cells that include at least part of the saliency region identified by the saliency region determination unit 182 (see FIG. 21(b)).
  • in FIG. 21(b), the cells excluded from the placement position candidates are indicated by X marks for explanation and visualization; the X marks are not generated or displayed in the actual processing by the placement position specifying unit 184.
  • the placement position specifying unit 184 then specifies, as the placement position of the second data field 31, the cell closest to the saliency region in the image among the remaining cells that have not been excluded (see FIG. 21(c)); a sketch of this exclusion-based selection follows.
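A sketch of this exclusion-based selection, assuming a binary saliency mask and the "closest to the saliency region" condition; the grid size and toy mask are illustrative assumptions.

```python
# Drop every cell touching the saliency mask, then keep the remaining cell
# whose center is closest to the saliency region's centroid.
import numpy as np

def pick_cell(saliency_mask: np.ndarray, rows: int = 3, cols: int = 3):
    h, w = saliency_mask.shape
    ys, xs = np.nonzero(saliency_mask)
    cy, cx = ys.mean(), xs.mean()  # centroid of the saliency region
    best, best_dist = None, float("inf")
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            if saliency_mask[y0:y1, x0:x1].any():
                continue  # exclude cells containing any part of the saliency region
            dist = np.hypot((y0 + y1) / 2 - cy, (x0 + x1) / 2 - cx)
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best  # closest remaining cell, per the "closest to saliency" condition

mask = np.zeros((180, 320), dtype=np.uint8)
mask[60:120, 80:160] = 1
print(pick_cell(mask))  # -> (0, 1) for this toy mask
```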
  • in FIG. 21(c), the cell specified as the placement position of the second data field 31 is shaded for explanation and visualization; in the actual processing by the placement position specifying unit 184, no shading is generated or displayed.
  • in this way, in this example, from among the multiple placement position candidates for the second data field 31 generated by dividing the entire image of the cut into a predetermined number of cells, the cell determined to be most suitable under a predetermined condition is specified as the placement position of the second data field 31.
  • although the appropriateness of the placement position with respect to the saliency region may be somewhat reduced compared with the method using the trained placement position determination model, this method has the advantage of a lower computational cost.
  • in the example described above, the condition for selecting the placement position of the second data field 31 is being closest to the saliency region, but the selection condition for the placement position is not limited to this.
  • for example, the condition may be a position at a predetermined distance from the saliency region, a position farthest from the saliency region, or a position in a predetermined direction from the saliency region, and such conditions may be combined arbitrarily as long as they do not contradict one another.

Abstract

[Problem] To provide a server, etc., that makes it possible to easily create composite content data, and in particular to arrange a text sentence in consideration of its arrangement position relative to a saliency region in a moving image. [Solution] According to one embodiment of the present invention, there is provided a server 1 comprising: a material content data setting unit 190 that sets image data for a cut; and a text field arrangement position determination unit 180 that determines the position of a text field (second data field) to be arranged on the cut, the determination being made with reference to a saliency region in the image data.

Description

Server and text field placement position method, program
The present invention relates to a server for generating video content to be distributed to user terminals, a text field placement position method, and a program.
Conventionally, content data such as moving images has been created; for example, Patent Document 1 proposes a moving image processing apparatus that efficiently searches for a desired scene image from a moving image having a plurality of chapters.
Japanese Patent Application Laid-Open No. 2011-130007
Creating content data such as moving images takes a great deal of time and effort. In particular, when creating composite content data that uses multiple pieces of material content data such as text data, moving images (including images; the same applies hereinafter), and sound data, it can be difficult for users, depending on their technical level as creators of such content, to work out the optimal combination of these materials. There has therefore been demand for a system that allows composite content data to be created easily.
In particular, when combining text data with video data, it is necessary to select an appropriate position on the image content and place the text sentences (including subtitles, captions, etc.) generated from the text data there. In general, a moving image includes objects or areas with saliency that draw the viewer's attention (hereinafter referred to as "saliency regions"), and if a text sentence is superimposed on at least part of a saliency region, the visibility of the moving image is impaired. Furthermore, if the text sentence is not placed in a direction and at a distance that allow a natural movement of the line of sight from the saliency region of the object to the text, the visibility of the moving image is impaired and the readability of the text is also impaired. Therefore, when a user manually places text sentences in a moving image, for each moving image in the entire composite content being created, the user must take care that the placed text does not overlap the saliency regions in the moving image, and must pay attention to the positional relationship between the saliency regions and the text.
Therefore, an object of the present invention is to provide a server or the like that makes it possible to easily create composite content data and, in particular, to place text sentences in consideration of their positional relationship with saliency regions in a moving image.
According to one aspect of the present invention, there is provided a server comprising: a material content data setting unit that sets image data for a cut; and a text field placement position determination unit that determines the position of a text field to be placed on the cut by referring to a saliency region in the image data.
Other features and advantages of the present invention can be understood from the following description and the accompanying drawings, which are given by way of example and are not exhaustive.
According to the present invention, it is possible to provide a server or the like that makes it possible to easily create composite content data and, in particular, to place text sentences in consideration of their positional relationship with saliency regions in a moving image.
FIG. 1 is a configuration diagram of a system according to an embodiment.
FIG. 2 is a configuration diagram of a server according to an embodiment.
FIG. 3 is a configuration diagram of a management terminal and a user terminal according to an embodiment.
FIG. 4 is a functional block diagram of a system according to an embodiment.
FIG. 5 is a diagram explaining an example screen layout that constitutes a cut.
FIG. 6 is a flow chart of a system according to an embodiment.
FIG. 7 is an explanatory diagram of an aspect in which a plurality of cuts forming composite content data are displayed as a list on a screen.
FIG. 8 is a diagram explaining a second data field placement position determination method according to an embodiment.
FIG. 9 is a diagram explaining saliency object detection according to an embodiment.
FIG. 10 is a diagram showing an example original image for saliency-based detection.
FIG. 11 is a diagram showing an example of saliency object detection for the image of FIG. 10.
FIG. 12 is a diagram explaining saliency map detection according to an embodiment.
FIG. 13 is a diagram showing an example of saliency map detection for the image of FIG. 12.
FIG. 14 is a diagram showing an example in which a second data field placement recommendation area appears to the upper right of the large mountains that are the saliency objects in the image shown in FIG. 9.
FIG. 15 is a diagram showing a state in which the second data field is placed in the high-scoring portion, indicated by high density, of the second data field placement recommendation area in the example shown in FIG. 14.
FIG. 16 is a diagram showing an example in which a second data field placement recommendation area appears to the right of the animal that is the saliency object in the image shown in FIG. 11.
FIG. 17 is a diagram showing a state in which the second data field is placed in the high-scoring portion, indicated by low density, of the second data field placement recommendation area in the example shown in FIG. 16.
FIG. 18 is a diagram showing an example of hybrid saliency map detection for the image of FIG. 10.
FIG. 19 is a diagram showing a modification in which the placement position specifying unit specifies the placement position of the second data field on the cut into which the material content data (image) is inserted.
FIG. 20 is a diagram showing various placement examples for placing the second data field at the cell position identified by the method shown in FIG. 19.
FIG. 21 is a diagram showing another example in which the placement position specifying unit specifies the placement position of the second data field on the cut into which the material content data (image) is inserted.
The contents of embodiments of the present invention are listed and described below. A server and the like according to embodiments of the present invention have the following configurations.
[Item 1]
A server comprising:
a material content data setting unit that sets image data for a cut; and
a text field placement position determination unit that determines the position of a text field to be placed on the cut with reference to a saliency region in the image data.
[Item 2]
The server according to Item 1, wherein the text field placement position determination unit includes:
a saliency region determination unit that determines a saliency region included in the image data; and
a placement position specifying unit that specifies the placement position of the text field on the cut with reference to the saliency region.
[Item 3]
The server according to Item 2, wherein the placement position specifying unit is configured to specify the placement position of the text field on the cut based on a model generated by learning, as training data, image data in which the relationship between the saliency region in the image and the text field satisfies a predetermined condition.
[Item 4]
The server according to Item 3, wherein the placement position specifying unit is configured to specify the placement position of the text field on the cut based on a scoring value calculated for each pixel of the image data.
[Item 5]
The server according to Item 3, wherein the placement position specifying unit is configured to specify the placement position of the text field on the cut based on a scoring value calculated for each of a plurality of cells into which the image data is divided.
[Item 6]
The server according to Item 2, wherein the placement position specifying unit is configured to execute:
dividing the entire image of the cut in which the image data is set into a plurality of cells;
excluding, from the plurality of cells, cells that contain at least part of the saliency region; and
specifying, from the plurality of cells remaining after the exclusion, a cell that satisfies a predetermined condition regarding its relationship with the saliency region as the placement position of the text field.
[Item 7]
The server according to any one of Items 1 to 6, wherein the saliency region is detected by hybrid saliency map detection using saliency object detection and saliency map detection.
[Item 8]
The server according to any one of Items 1 to 6, wherein the saliency region is detected by saliency map detection.
[Item 9]
The server according to any one of Items 1 to 6, wherein the saliency region is detected by saliency object detection.
[Item 10]
A system comprising the server according to any one of Items 1 to 9.
[Item 11]
A text field placement position method executed by a computer, the method comprising:
setting image data for a cut; and
determining the position of a text field to be placed on the cut based on a saliency region in the image data.
[Item 12]
A program for causing a computer to execute a text field placement position method, the method comprising:
setting image data for a cut; and
determining the position of a text field to be placed on the cut based on a saliency region in the image data.
<Details of Embodiment>
A system for creating composite content data according to an embodiment of the present invention (hereinafter referred to as "this system") and the like will now be described. In the accompanying drawings, identical or similar elements are given identical or similar reference numerals and names, and duplicate descriptions of identical or similar elements may be omitted in the description of each embodiment. Features shown in different embodiments are applicable to other embodiments as long as they do not contradict one another.
<Configuration>
As shown in Fig. 1, this system according to the embodiment comprises a server 1, an administrator terminal 2, and a user terminal 3. The server 1, the administrator terminal 2, and the user terminal 3 are connected via a network N so as to be able to communicate with one another. The network N may be a local network or may be connectable to an external network. Although Fig. 1 describes an example in which the server 1 consists of a single unit, the server 1 may also be realized by a plurality of server devices. The server 1 and the administrator terminal 2 may also be integrated.
<Server 1>
Fig. 2 is a diagram showing the hardware configuration of the server 1 shown in Fig. 1. The illustrated configuration is an example, and other configurations may be employed. The server 1 may be a general-purpose computer such as a workstation or a personal computer, or may be logically realized by cloud computing.
The server 1 includes at least a processor 10, a memory 11, a storage 12, a transmission/reception unit 13, and an input/output unit 14, which are electrically connected to one another through a bus 15.
The processor 10 is an arithmetic device that controls the overall operation of the server 1, controls the transmission and reception of data between the elements, and performs the information processing necessary for executing applications and for authentication processing. For example, the processor 10 includes a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit), and performs each information process by executing programs for this system that are stored in the storage 12 and loaded into the memory 11. The processing capability of the processor 10 only needs to be sufficient to execute the necessary information processing; for example, the processor 10 may consist of a CPU alone, but it is not limited to this.
The memory 11 includes a main memory composed of a volatile storage device such as a DRAM (Dynamic Random Access Memory) and an auxiliary memory composed of a non-volatile storage device such as a flash memory or an HDD (Hard Disk Drive). The memory 11 is used as a work area and the like for the processor 10, and may also store a BIOS (Basic Input/Output System) executed when the server 1 starts up, various setting information, and the like.
The storage 12 stores various programs such as application programs. A database storing the data used for each process may be constructed in the storage 12. In particular, in this embodiment, the storage 12 stores a computer program for causing the server 1 to execute the composite content data creation method described with reference to Fig. 6, and further stores a computer program for causing the server 1 to execute the second data placement position determination method described with reference to Fig. 8 and other figures.
The transmission/reception unit 13 connects the server 1 to the network.
The input/output unit 14 comprises information input devices such as a keyboard and mouse and output devices such as a display.
The bus 15 is commonly connected to the above elements and transmits, for example, address signals, data signals, and various control signals.
<Administrator Terminal 2, User Terminal 3>
The administrator terminal 2 and the user terminal 3 shown in Fig. 3 each also include a processor 20, a memory 21, a storage 22, a transmission/reception unit 23, an input/output unit 24, and the like, which are electrically connected to one another through a bus 25. Since each element can be configured in the same manner as in the server 1 described above, detailed descriptions of the elements are omitted. The administrator uses the administrator terminal 2 to, for example, change the settings of the server 1 and manage the operation of the database. A user can access the server 1 from the user terminal 3 to, for example, create or view composite content data.
<Functions of Server 1>
Fig. 4 is a block diagram illustrating the functions implemented in the server 1. In this embodiment, the server 1 includes a communication unit 110, a storage unit 160, and a material content data setting unit 190. The material content data setting unit 190 includes an identified information analysis unit 120, a second data generation unit 130, a composite content data generation unit 140, an association unit 150, a classifier 170, and a second data placement position determination unit 180. The composite content data generation unit 140 includes a base data generation unit 142, a second data allocation unit 144, and a material content data allocation unit 146. The storage unit 160 is composed of storage areas such as the memory 11 and the storage 12, and includes various databases such as a base data storage unit 161, a material content data storage unit 163, a composite content data storage unit 165, and an interface information storage unit 167. The second data placement position determination unit 180 includes a saliency region determination unit 182 that determines a saliency region relating to an object or range having saliency in material content data, and a placement position specifying unit 184 that specifies the position at which the second data field is to be placed. The functions of the units 120, 130, 140, 150, 170, and 180 constituting the material content data setting unit 190 can be realized by, for example, one or more processors 10.
The communication unit 110 communicates with the administrator terminal 2 and the user terminal 3. The communication unit 110 also functions as a reception unit that receives, from the user terminal 3, first data including information to be identified. The first data may be, for example, text data such as an article containing the information to be identified (for example, a press release or news), image data containing the information to be identified (for example, a photograph or an illustration), video data, or audio data containing the information to be identified. The text data referred to here is not limited to data that is already text at the time it is transmitted to the server 1; it may be, for example, text data generated by a known speech recognition technique from audio data transmitted to the server 1. The first data may also be text data such as an article that has been summarized (while retaining the information to be identified) by an existing automatic summarization technique such as extractive or abstractive summarization; in that case, the number of cuts included in the base data is reduced, the data volume of the entire composite content data can be made smaller, and the content can be more concise.
The audio data referred to here is not limited to audio data acquired by an input device such as a microphone; it may be audio data extracted from video data or audio data generated from text data. In the former case, only audio data such as narration and dialogue may be extracted from a provisional moving image, such as one made of rough sketches or other provisional images and video, and composite content data may be generated from that audio data together with material content data, as described later. In the latter case, audio data may be created from text data having a story; for a fairy tale, for example, a picture-story show or a moving image combining the read-aloud story with material content data may be generated as composite content data.
When the second data generation unit 130 determines that the first data does not need to be divided (for example, when the text data is a short sentence with no more than a preset number of characters), the second data generation unit 130 generates the first data as the second data as it is. On the other hand, when it determines that the first data needs to be divided (for example, when the text is longer than the preset number of characters), the second data generation unit 130 divides the first data and generates pieces of second data, each containing at least part of the information to be identified of the first data. At this time, it also generates division number information for the second data. Any known technique may be used for dividing the first data; for example, if the first data can be converted into text, the sentences may be divided so that sections that read naturally as sentences fit into each cut, based on the preset maximum number of characters per cut of the base data and an analysis of the modification relationships between clauses.
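The specification leaves the concrete splitting rule to the implementation. The following is a minimal Python sketch of one plausible realization, assuming a fixed per-cut character limit and splitting at sentence boundaries; the limit value and the sentence-boundary regex are illustrative assumptions, not taken from the specification.

```python
import re

MAX_CHARS = 40  # assumed per-cut character limit; the actual value is a configuration matter


def split_into_second_data(first_data: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split the first data into second-data chunks, one per cut.

    Sentences are kept intact so each cut reads naturally; in this simplified
    sketch a sentence longer than max_chars simply becomes its own chunk.
    """
    sentences = re.split(r"(?<=[。．.!?])\s*", first_data.strip())
    chunks: list[str] = []
    current = ""
    for sentence in filter(None, sentences):
        if len(current) + len(sentence) <= max_chars:
            current += sentence
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks  # len(chunks) doubles as the division-number information


text = "This is a press release. It introduces a new product. The product ships in spring."
for i, chunk in enumerate(split_into_second_data(text), start=1):
    print(f"cut {i}: {chunk}")
```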
The identified information analysis unit 120 analyzes the above-described second data and acquires the information to be identified. Here, the information to be identified may be any information as long as it can be analyzed by the identified information analysis unit 120. In one aspect, the information to be identified may be in a word format defined by a language model. More specifically, it may be one or more words accompanied by the word vectors described later (for example, "Shibuya, Shinjuku, Roppongi" or "Shibuya, landmark, youth"). Depending on the language model, these words may include words that are not normally used on their own, such as the Japanese syllable "n". Instead of the word format, a document accompanied by a vector representing the entire sentence, or a feature vector extracted from an image or a moving image, may also be used.
The composite content data generation unit 140 generates, with the base data generation unit 142, base data including a number of cuts (one or more) corresponding to the division number information generated by the second data generation unit 130 described above; generates, as composite content data, the base data in which material content data newly input from the user terminal 3 and/or material content data stored in the material content data storage unit 163 and the above-described second data are allocated to each cut; stores it in the composite content data storage unit 165; and displays the composite content data on the user terminal 3. The base data generation unit 142 assigns numbers to the generated cuts, for example scene 1, scene 2, scene 3 or cut 1, cut 2, cut 3.
Fig. 5 shows an example of the screen layout of a cut constituting the base data. In the example shown in Fig. 5(a), the edited second data (for example, a delimited text sentence) is inserted into the second data field 31, which is a text data field, and the selected material content data is inserted into the material content data field 32. In the example shown in Fig. 5(a) the second data field 31 and the material content data field 32 are separated; alternatively, as shown in Fig. 5(b), the second data field 31 may be inserted so as to be superimposed on the material content data field 32. In that case, however, the second data field 31 must be placed so as not to overlap the saliency region in the material content data.
For each cut of the base data, the preset maximum number of characters mentioned above (in the case of text data), a screen layout, and a playback time (in the case of a moving image) may be defined in advance. The composite content data does not necessarily have to be saved in the composite content data storage unit 165 at this point and may be stored at an appropriate timing. Base data to which only the second data has been allocated may also be displayed on the user terminal 3 as progress information for the composite content data.
Referring again to Fig. 4, the second data allocation unit 144 sequentially allocates the second data to the one or more cuts generated by the base data generation unit 142 described above, in the order of the numbers assigned to them.
The association unit 150 compares at least part of the information to be identified contained in the above-described second data with, for example, extraction information extracted from the material content data (for example, a class label extracted by the classifier), determines, for example, their mutual similarity, and associates the second data with material content data suited to it (for example, data with a high degree of similarity). As a more specific example, suppose the information to be identified contained in the second data represents "teacher", and that material content data A (for example, an image of a woman) whose extraction information is "face" and material content data B (for example, an image of Mt. Fuji) whose extraction information is "mountain" are available. The word vector obtained from "teacher" is more similar to the word vector obtained from "face" than to the word vector obtained from "mountain", so the second data is associated with material content data A. The extraction information of the material content data may be extracted in advance by the user and stored in the material content data storage unit 163, or may be extracted by the classifier 170 described later. The similarity determination may be performed by preparing a trained model that has learned word vectors and using those vectors to determine word similarity by a method such as cosine similarity or Word Mover's Distance.
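As a rough illustration of the similarity-based association, the sketch below scores candidate material content by the cosine similarity between word vectors, one of the methods the specification names. The toy three-dimensional vectors and file names are invented stand-ins for a trained embedding model and a real material library.

```python
import numpy as np

# Toy word vectors standing in for a trained embedding model (assumption:
# the real system would load vectors learned by a language model).
VECTORS = {
    "teacher":  np.array([0.9, 0.1, 0.0]),
    "face":     np.array([0.8, 0.3, 0.1]),
    "mountain": np.array([0.0, 0.2, 0.9]),
}


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def associate(identified_word: str, materials: dict[str, str]) -> str:
    """Return the material whose extracted class label is most similar
    to the identified word contained in the second data."""
    scores = {
        name: cosine_similarity(VECTORS[identified_word], VECTORS[label])
        for name, label in materials.items()
    }
    return max(scores, key=scores.get)


materials = {"material_A.jpg": "face", "material_B.jpg": "mountain"}
print(associate("teacher", materials))  # -> material_A.jpg
```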
The material content data may be, for example, image data, video data, or sound data (for example, music data, audio data, or sound effects), but it is not limited to these. The material content data may be stored in advance in the material content data storage unit 163 by the user or the administrator, or may be acquired from the network and stored in the material content data storage unit 163.
Based on the above-described association, the material content data allocation unit 146 allocates the suitable material content data to the cut to which the corresponding second data has been allocated.
The interface information storage unit 167 stores various control information for display on the display unit (a display or the like) of the administrator terminal 2 or the user terminal 3.
The classifier 170 is created as a trained model by acquiring learning data from a learning data storage unit (not shown) and performing machine learning. The classifier 170 is created periodically. The learning data for creating the classifier may be data collected from the network or data owned by the user, with class labels attached, or a data set with class labels may be procured and used. The classifier 170 is, for example, a trained model using a convolutional neural network; when material content data is input, it extracts one or more pieces of extraction information (for example, class labels). The classifier 170 extracts, for example, class labels representing objects related to the material content data (for example, seafood, grilled meat, person, furniture).
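The specification does not fix a network architecture, so the following sketch abstracts the CNN away and only illustrates the final step described here: turning classifier outputs into class labels used as extraction information. The label set and the logits are hypothetical.

```python
import numpy as np

CLASS_LABELS = ["seafood", "grilled_meat", "person", "furniture"]  # example label set


def extract_labels(logits: np.ndarray, top_k: int = 2) -> list[str]:
    """Turn classifier logits into the top-k class labels used as
    extraction information for a piece of material content."""
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    top = np.argsort(probs)[::-1][:top_k]
    return [CLASS_LABELS[i] for i in top]


# Stand-in for a CNN forward pass on one image (assumption: the real
# classifier is a trained convolutional neural network).
logits = np.array([0.2, 0.1, 3.4, 1.1])
print(extract_labels(logits))  # -> ['person', 'furniture']
```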
Fig. 6 is a diagram explaining an example of the flow of creating composite content data.
First, the server 1 receives first data including at least information to be identified from the user terminal 3 via the communication unit 110 (step S101). In this example, the information to be identified is, for example, one or more words, and the first data may be, for example, text data consisting of an article containing those words, or a summary of such text data.
Next, the server 1 analyzes the first data with the identified information analysis unit 120 to acquire the information to be identified, and generates, with the second data generation unit 130, one or more pieces of second data each containing at least part of the information to be identified, together with division number information (step S102).
Next, in the composite content data generation unit 140 of the server 1, the base data generation unit 142 generates base data including a number of cuts corresponding to the above-described division number information (step S103).
Next, the server 1 allocates the second data to the cuts with the second data allocation unit (step S104). The base data in this state may be displayed on the user terminal 3 so that progress can be checked.
Next, based on at least part of the information to be identified contained in the second data and the extraction information extracted from the material content data, the server 1 associates the material content data in the material content data storage unit 163 with the second data using the association unit 150 (step S105), and allocates that material content data to the cuts with the material content data allocation unit 146 (step S106).
Next, using the second data placement position determination unit 180, the server 1 determines the placement position of the second data field 31 on each cut based on the saliency region, detected from the material content data, relating to the object or range having saliency in the image of that material content data (step S107).
Then, the server 1 generates the base data to which the second data and the material content data have been allocated as composite content data, stores it in the composite content data storage unit 165, and displays the composite content data on the user terminal 3 (step S108). The placement position of the second data field 31 in each cut has been determined by the second data placement position determination unit 180 as described above, and the server 1 inserts the second data field 31 into each cut according to that placement position. As illustrated in Fig. 7, the composite content data can be displayed as an on-screen list of the plurality of cuts that constitute it. For each cut, the playback time (in seconds) of the cut may be displayed together with the displayed material content data and second data. The user can, for example, correct the content of the second data field 31 by clicking it or a corresponding button, and can replace the material content data by clicking the material content data field 32 or a corresponding button. Furthermore, the user can add other material content data to each scene from the user terminal.
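Read end to end, steps S101 to S108 amount to a simple pipeline. The sketch below strings the steps together with trivially stubbed components, purely to show the data flow from first data to per-cut composite content; the splitting, association, and placement logic here are placeholders, not the methods of the specification.

```python
def create_composite_content(first_data: str) -> list[dict]:
    """End-to-end sketch of steps S101-S108 with stubbed components.
    S101 (reception of the first data) happens before this call."""
    second_data = [s for s in first_data.split(". ") if s]      # S102: stub split
    cuts = [{"index": i + 1} for i in range(len(second_data))]  # S103: one cut per piece
    for cut, text in zip(cuts, second_data):                    # S104: allocate second data
        cut["text"] = text
    for cut in cuts:                                            # S105-S106: stub association
        cut["material"] = pick_material(cut["text"])
    for cut in cuts:                                            # S107: stub placement
        cut["text_field_xy"] = decide_placement(cut["material"])
    return cuts                                                 # S108: composite content data


def pick_material(text: str) -> str:
    """Placeholder for the similarity-based association of the association unit."""
    return "material_A.jpg" if "teacher" in text else "material_B.jpg"


def decide_placement(material: str) -> tuple[int, int]:
    """Placeholder for the saliency-aware placement determination."""
    return (640, 80)  # a fixed anchor standing in for step S107


print(create_composite_content("The teacher smiled. The mountain stood tall."))
```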
The flow of creating composite content data described above is only an example. For instance, step S103 for preparing the base data may be executed at any point, as long as the base data is available before the second data or the material content data is allocated. Likewise, step S104 for allocating the second data, step S105 for the association, step S106 for allocating the material content data, and the step of determining the placement position of the second data field 31 may be executed in any order, as long as no inconsistency arises between them.
The material content data setting unit 190 using the identified information analysis unit 120, the association unit 150, the classifier 170, and the second data placement position determination unit 180 described so far may be one setting function of the composite content data creation system, and the setting method used by the material content data setting unit 190 is not limited to the above. For example, although the base data is generated by the base data generation unit 142 in the example above, it may instead be read out from the base data storage unit 161. The read base data may include, for example, a predetermined number of blank cuts, or may be template data in which predetermined material content data, format information, and the like have already been set for each cut (for example, music data, background images, font information, and so on). Furthermore, as in conventional composite content data creation systems, the user may be allowed to set arbitrary material content for all or part of each data field from the user terminal, or the setting method may be combined with user operations: for example, the user enters arbitrary text into the second data field 31 from the user terminal, the information to be identified is extracted from that text as described above, and material content is associated with it.
(Determination of the Second Data Field Placement Position)
Next, an example of the method by which the second data placement position determination unit 180 of this embodiment determines the placement position of the second data field on each cut will be described with reference to Figs. 9 to 18. The method of determining the second data field placement position is performed in step S107 described above.
Fig. 8 is a diagram explaining an example of the flow of the method of determining the second data field placement position. As shown in Fig. 8, the method includes a step in which the saliency region determination unit 182 determines the saliency region of the material content data (image data) set for each cut (S201), and a step in which the placement position specifying unit 184 specifies the placement position of the second data field 31 on each cut with reference to the saliency region of that image (S202).
First, the determination of the saliency region of an image by the saliency region determination unit 182 will be described. The saliency region determination unit 182 uses a saliency determination model, a trained model for saliency obtained by a known learning method such as the saliency object detection shown in Fig. 9 or the saliency map detection shown in Fig. 12, to determine objects or ranges having saliency in an image (hereinafter also referred to as "saliency regions"). The saliency determination model is stored, for example, in a model storage unit 169 of the storage unit 160. The saliency region determination unit 182 determines the saliency region in an image based on saliency information such as that exemplified by the images shown in Figs. 9 and 11 to 13.
Fig. 9 shows an example of detecting an object having saliency in an image using a saliency object detection model. Saliency object detection using a saliency object detection model can be realized using a known technique such as an encoder-decoder model. In the example shown in Fig. 9, the large and small mountains enclosed by the dotted lines in Fig. 9 are detected as saliency objects in the image, and the saliency region determination unit 182 determines these large and small mountains to be saliency objects.
Figs. 10 and 11 show another example of detecting an object having saliency in an image using the saliency object detection model. For example, when saliency object detection is performed on the image containing an animal shown in Fig. 10, the outline shape of the animal, shown as the relatively bright region in Fig. 11, is detected as the saliency object in the image, and the saliency region determination unit 182 determines the region showing this outline shape to be a saliency object.
Fig. 12 shows an example of detecting a range having saliency in an image using a saliency map detection model. Saliency range detection using a saliency map detection model can be realized using a known technique, for example applying a trained model to a feature map generated from the input image with a convolutional neural network. In saliency range detection using a saliency map detection model, a saliency map is generated by determining the strength of the visual saliency of each pixel in the image. In the example of the saliency map shown in Fig. 12, the darkness of the black portions represents, by way of illustration, the strength of the visual saliency. The saliency region determination unit 182 determines, as the saliency range, the range largely occupied by pixels with relatively strong visual saliency (in the example shown in Fig. 12, the region of the larger of the mountains).
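As a concrete illustration of how a per-pixel saliency map could be reduced to a saliency range, the following sketch thresholds a map with values in [0, 1] and takes the bounding box of the strongly salient pixels. The threshold and the synthetic map are assumptions standing in for a trained model's output.

```python
import numpy as np


def saliency_range(saliency_map, threshold=0.5):
    """Given a per-pixel saliency map in [0, 1], return the bounding box
    (top, left, bottom, right) of the region of relatively strong saliency,
    or None if no pixel exceeds the threshold."""
    ys, xs = np.where(saliency_map >= threshold)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())


# Synthetic map standing in for the output of a trained saliency model.
rng = np.random.default_rng(0)
smap = rng.random((90, 160)) * 0.3  # weak background saliency
smap[30:60, 40:100] = 0.9           # one strongly salient patch
print(saliency_range(smap))         # -> (30, 40, 59, 99)
```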
Fig. 13 shows another example of detecting a range having saliency in an image using the saliency map detection model. For example, when saliency range detection is performed on the image containing an animal shown in Fig. 10, the saliency region determination unit 182 determines, as the saliency range in that image, the range showing the outline shape of the animal, shown as the relatively bright region in Fig. 13. In this example of the saliency range, strong visual saliency is detected in the portion corresponding to the animal's face, indicating that the saliency of that portion is particularly high.
Next, the specification of the placement position of the second data field 31 by the placement position specifying unit 184 will be described. The placement position specifying unit 184 specifies the placement position of the second data field 31 on the cut into which the material content data (image) has been inserted, using a placement position determination model, which is a trained model obtained by machine-learning the positional relationship between saliency regions in images and second data fields 31. Like the saliency determination model, the placement position determination model is stored, for example, in the model storage unit 169 of the storage unit 160.
The placement position determination model can be generated by machine learning with any learner, using as training data images to which text has been added and in which the relationship between the saliency region in the image and the placement position of the text is recognized to be good. The learner that generates the placement position determination model extracts the saliency region and the text from each training image and learns their relative positional relationship within the image. Images used as training data are preferably selected based on, for example, the following viewpoints as conditions on the relationship between the saliency region in the image and the placement position of the text:
- The positional relationship between the saliency region and the text is taken into consideration (the text is placed in a direction and at a distance that allow the line of sight to move naturally from the saliency region of the object to the text).
- No part of the text overlaps the saliency region.
- The text is placed near (or, depending on the design, away from) the portion of the saliency region where the saliency is particularly high.
As one example, the placement position specifying unit 184 calculates, for the image data set for each cut, a recommendation region for placing the second data field 31 using the above-described placement position determination model. This calculation may be performed, for example, by scoring the degree of recommendation of the placement position of the second data field 31 for each pixel of the image. Figs. 14 and 16 each show a second data field placement recommendation region calculated by the placement position specifying unit 184. Fig. 14 shows an example in which the second data field placement recommendation region appears to the upper right of the large mountains that are the saliency objects in the image shown in Fig. 9; in this example, the portion of the recommendation region shown with high density has a high score. Fig. 16 shows an example in which the second data field placement recommendation region appears to the right of the animal that is the saliency object in the image shown in Fig. 11; in this example, the portion of the recommendation region shown with low density has a high score. Although Figs. 14 and 16 show the second data field placement recommendation regions specified by the placement position specifying unit 184 for explanation and visualization, displaying the recommendation region is not essential in the actual processing by the placement position specifying unit 184.
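A minimal sketch of this per-pixel variant follows: given a score map of the kind the placement position determination model is assumed to output, the highest-scoring pixel is taken as the anchor of the second data field 31. The Gaussian score map below, peaking to the upper right of an assumed salient object, is a synthetic stand-in for the model.

```python
import numpy as np


def recommend_position(score_map: np.ndarray) -> tuple[int, int]:
    """Pick the pixel with the highest placement-recommendation score
    as the anchor position of the second data field."""
    y, x = np.unravel_index(int(np.argmax(score_map)), score_map.shape)
    return int(y), int(x)


# Stand-in score map: scores fall off with distance from a point to the
# upper right of a salient object (the model's actual output is assumed).
h, w = 90, 160
ys, xs = np.mgrid[0:h, 0:w]
score_map = np.exp(-(((ys - 20) ** 2 + (xs - 120) ** 2) / (2 * 15.0 ** 2)))
print(recommend_position(score_map))  # -> (20, 120)
```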
The placement position specifying unit 184 specifies, as the placement position of the second data field 31, the portion with the higher score within the second data field placement recommendation region obtained as described above. Fig. 15 shows the state in which the second data field 31 is placed in the high-scoring portion, shown with high density, of the recommendation region in the example of Fig. 14, and Fig. 17 shows the state in which the second data field 31 is placed in the high-scoring portion, shown with low density, of the recommendation region in the example of Fig. 16.
In this way, according to the second data placement position determination unit 180, the saliency region in the image is determined based on the saliency information and the placement position of the second data field 31 is specified in consideration of its positional relationship with the saliency region, so the server 1 can easily create composite content data in which, on each cut, the second data field 31 is placed at an appropriate position relative to the saliency region in the image.
In this example, the placement position specifying unit 184 of the second data placement position determination unit 180 specifies the placement position of the second data field 31 using the placement position determination model; however, depending on, for example, the position and size of the saliency region and/or the area occupied by the second data field 31, a position where part of the second data field 31 overlaps the saliency region in the image may be specified as the placement position. Therefore, the second data placement position determination unit 180 is preferably configured to determine whether the specified placement position is one where part of the second data field 31 overlaps the saliency region in the image and, when it determines that they overlap, to re-specify, as the placement position, a position shifted so that the second data field 31 no longer overlaps the saliency region. When the second data placement position determination unit 180 determines that part of the second data field 31 overlaps the saliency region in the image, it specifies, for example, the position that eliminates the overlap with the shortest displacement from the initially specified placement position, and corrects the placement position of the second data field 31 to that position.
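One way to realize this minimal-displacement correction, assuming the text field and the saliency region are approximated by axis-aligned rectangles, is to try the four axis-aligned pushes and keep the shortest one, as in this sketch.

```python
def overlaps(field: tuple, saliency: tuple) -> bool:
    """Axis-aligned rectangle intersection test; rectangles are
    given as (top, left, bottom, right)."""
    t1, l1, b1, r1 = field
    t2, l2, b2, r2 = saliency
    return not (b1 < t2 or b2 < t1 or r1 < l2 or r2 < l1)


def shift_out_of_overlap(field: tuple, saliency: tuple) -> tuple:
    """Move the field by the smallest displacement that clears the
    saliency rectangle (one of four axis-aligned pushes)."""
    if not overlaps(field, saliency):
        return field
    t, l, b, r = field
    st, sl, sb, sr = saliency
    pushes = {
        "up":    st - 1 - b,   # move up until the field's bottom clears the region
        "down":  sb + 1 - t,   # move down until the field's top clears it
        "left":  sl - 1 - r,
        "right": sr + 1 - l,
    }
    direction = min(pushes, key=lambda k: abs(pushes[k]))
    d = pushes[direction]
    if direction in ("up", "down"):
        return (t + d, l, b + d, r)
    return (t, l + d, b, r + d)


print(shift_out_of_overlap((40, 50, 60, 120), (30, 40, 70, 100)))  # -> (9, 50, 29, 120)
```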
As described with reference to Fig. 11, when saliency object detection is used, the approximate overall shape of the detected saliency object can be recognized, but the intensity distribution of saliency inside the object cannot be obtained; as a result, the second data placement position determination unit 180 may specify, as the placement position of the second data field 31, a position far from the more salient portion of the saliency object. On the other hand, as described with reference to Fig. 13, when saliency map detection is used, the overall shape of the detected saliency object is unclear, so when the second data field 31 is actually placed at the placement position specified by the second data placement position determination unit 180, part of the second data field 31 may end up overlapping the saliency region.
Therefore, as shown in Fig. 18, by acquiring saliency information using a hybrid saliency map detection model that combines saliency object detection and saliency map detection, it becomes possible to capture both the contour of the visually attention-drawing saliency region and the locations within that region where the saliency is higher. This allows the placement position specifying unit 184 of the second data placement position determination unit 180 to place the second data field 31 at a more appropriate position relative to the detected saliency region.
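The specification does not spell out how the two detections are combined; one simple assumption is to gate the continuous saliency map with the binary object mask, so that the object's outline bounds the region while the map supplies the internal intensity:

```python
import numpy as np


def hybrid_saliency(object_mask: np.ndarray, saliency_map: np.ndarray) -> np.ndarray:
    """Combine a binary salient-object mask (clean outline, no intensity)
    with a continuous saliency map (intensity, blurry outline): the mask
    gates the map so both the contour and the high-saliency hot spots
    inside it are preserved."""
    return object_mask.astype(float) * saliency_map


mask = np.zeros((6, 8))
mask[1:5, 2:7] = 1.0                             # detected object outline
smap = np.random.default_rng(1).random((6, 8))   # per-pixel saliency strength
hybrid = hybrid_saliency(mask, smap)
print(np.round(hybrid, 2))                       # nonzero only inside the outline
```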
Furthermore, since the accuracy of saliency detection is affected by image quality, the accuracy of detecting the saliency region can be further increased by first raising the resolution and/or the dynamic range of the image, for example by combining known resolution up-conversion and/or HDR (high dynamic range) conversion techniques, and then detecting the saliency region in that image.
[Modification]
In the above embodiment, as an example of how the placement position specifying unit 184 calculates the recommendation region for placing the second data field 31, the degree of recommendation of the placement position of the second data field 31 was scored for each pixel of the image. In that case, since a scoring value is obtained for each pixel, fine-grained information on the degree of recommendation of the placement position of the second data field 31 is obtained. On the other hand, the scoring value assigned to each individual pixel among a large number of pixels becomes extremely small, so the placement recommendation region of the second data field 31 specified by their distribution can become a uniform, relatively wide area.
In this modification, by contrast, when the placement position specifying unit 184 calculates the recommendation region for placing the second data field 31 for the image data set for each cut using the placement position determination model, it first divides the target image into a specific number of cells, scores the degree of recommendation of the placement position of the second data field 31 for each cell, and takes the position of the cell with the highest scoring value as the placement position of the second data field 31. The functions and operations of the other components of the server 1 implementing this modification are as described with reference to Figs. 2 and 4 and elsewhere, so their description is omitted here.
Fig. 19 is a diagram showing this modification, in which the placement position specifying unit 184 specifies the placement position of the second data field 31 on a cut into which material content data (an image) has been inserted.
In this example, the placement position specifying unit 184 divides the entire image on the cut into which the material content data (image) has been inserted into a plurality of cells, calculates for each of those cells a scoring value relating to the degree of recommendation of the placement position of the second data field 31 using the placement position determination model, and specifies the cell with the highest scoring value as the cell in which the second data field 31 should be placed. Fig. 19 shows an example in which the entire image is divided into 18 cells, 3 vertically by 6 horizontally, but the number of cells dividing the entire image may be set arbitrarily. Although the boundaries between cells are drawn as chain lines in Fig. 19, these are shown for explanation only; such boundary lines are not actually drawn in the processing by the placement position specifying unit 184.
Next, the placement position specification processing for the second data field 31 by the placement position specifying unit 184 of this example will be described.
First, the placement position specifying unit 184 divides the entire image of the cut into which the material content data (image) has been inserted into a predetermined number of cells (see Fig. 19(a)). As an example, this example shows division into 18 cells, 3 vertically by 6 horizontally, but the number of cells dividing the entire image can be set arbitrarily.
Next, the placement position specifying unit 184 calculates, for the image data set for each cut, the recommendation region for placing the second data field 31 using the above-described placement position determination model (see Fig. 19(b)). This calculation is performed by scoring the degree of recommendation of the placement position of the second data field 31 for each cell of the image divided as shown in Fig. 19(a). Fig. 19(b) shows the scoring value calculated for each cell. Although the scoring values of the cells are shown in Fig. 19(b) for explanation and visualization, displaying these scoring values is not essential in the actual processing by the placement position specifying unit 184.
Next, the placement position specifying unit 184 specifies, as the placement position of the second data field 31, the cell with the highest calculated scoring value among these cells (see Fig. 19(c)). In the example shown in Fig. 19(b), the cell in the second column from the right in the top row has the highest scoring value, 0.34, so that cell is specified as the placement position of the second data field 31. In Fig. 19(c), the cell specified as the placement position of the second data field 31 is shown shaded for explanation and visualization, but no shading is generated or displayed in the actual processing by the placement position specifying unit 184.
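A compact sketch of this cell-based variant follows. Here the per-cell score is approximated by averaging a per-pixel score map over each cell (an assumption made for illustration; the model could equally score cells directly), and the 3 x 6 grid matches Fig. 19.

```python
import numpy as np


def best_cell(score_map: np.ndarray, rows: int = 3, cols: int = 6) -> tuple[int, int]:
    """Divide the image-sized score map into rows x cols cells, average
    the scores inside each cell, and return the (row, col) of the
    highest-scoring cell."""
    h, w = score_map.shape
    cell_scores = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            cell = score_map[r * h // rows:(r + 1) * h // rows,
                             c * w // cols:(c + 1) * w // cols]
            cell_scores[r, c] = cell.mean()
    r, c = np.unravel_index(int(np.argmax(cell_scores)), cell_scores.shape)
    return int(r), int(c)


# Synthetic score map peaking in the top row, second column from the right.
ys, xs = np.mgrid[0:90, 0:180]
score_map = np.exp(-(((ys - 10) ** 2 + (xs - 130) ** 2) / (2 * 20.0 ** 2)))
print(best_cell(score_map))  # -> (0, 4)
```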
Thus, according to this example, the entire image of the cut is divided into a predetermined number of cells, the placement position specifying unit 184 calculates for each cell a scoring value relating to the degree of recommendation of the placement position of the second data field 31, and the cell with the highest scoring value is specified as the placement position of the second data field 31. Compared with the method described above, in which a scoring value is calculated for each pixel of the image, this method scores far fewer targets, so the differences between the scoring values of the cells become clear; as a result, the cell with the highest scoring value can easily be identified unambiguously as the optimal placement position for the second data field 31.
 FIG. 20 is a diagram showing various examples of placing the second data field 31 at the position of the cell specified by the method shown in FIG. 19.
 Depending on the number of cells into which the entire image of the cut is divided, a single cell may be too small to accommodate the second data field 31 completely. It is therefore preferable that the second data field 31 be placed with respect to the specified cell in accordance with settings made in advance in the placement position specifying unit 184.
 FIG. 20(a) shows an example in which the placement position specifying unit 184 is set to place the second data field 31 at the center of the specified cell. In this example, the placement position specifying unit 184 places the second data field 31 at the center of the specified cell so that the center of the cell and that of the second data field 31 approximately coincide. As a result, in the example shown in FIG. 20(a), the left and right portions of the second data field 31 extend into the cells adjacent on the left and right, respectively.
 FIG. 20(b) shows an example in which the placement position specifying unit 184 is set to place the second data field 31 as far as possible within the specified cell from the saliency region in the image (the large mountains in the example shown in FIG. 20). In this example, the placement position specifying unit 184 places the second data field 31 in the specified cell so that it runs along the upper edge of the cell and its right portion extends into the cell adjacent on the right.
 FIG. 20(c) shows an example in which the placement position specifying unit 184 is set to place the second data field 31 as close as possible within the specified cell to the saliency region (the large mountains) in the image. In this example, the placement position specifying unit 184 places the second data field 31 in the specified cell so that it runs along the lower edge of the cell and its left portion extends into the cell adjacent on the left.
 The various settings of the placement position specifying unit 184 described above can be changed as appropriate in the server 1 in response to, for example, user input received from the user terminal 3 via the communication unit 110.
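 As a minimal sketch of the three settings illustrated in FIG. 20, the following Python function places the field relative to the specified cell. Rectangles are (x, y, width, height) tuples, the mode names are illustrative assumptions, and, as in the figure, the saliency region is assumed to lie below the specified cell; the horizontal offset is simplified to centering in every mode.

```python
def place_in_cell(cell, field_size, mode="center"):
    """Return the top-left (x, y) of the second data field for the
    specified cell; the field may extend into neighbouring cells."""
    cx, cy, cw, ch = cell
    fw, fh = field_size
    x = cx + (cw - fw) // 2            # simplification: always centered horizontally
    if mode == "center":               # FIG. 20(a): the two centers coincide
        y = cy + (ch - fh) // 2
    elif mode == "far_from_saliency":  # FIG. 20(b): along the cell's upper edge
        y = cy
    elif mode == "near_saliency":      # FIG. 20(c): along the cell's lower edge
        y = cy + ch - fh
    else:
        raise ValueError(f"unknown mode: {mode}")
    return x, y
```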
 In any of the cases described with reference to FIGS. 20(a) to 20(c), the second data field 31 placed in the specified cell must neither protrude outside the region of the entire image of the cut nor overlap the saliency region. As also explained in the embodiment described above, however, depending on, for example, the position and size of the saliency region and/or the region occupied by the second data field 31, the second data field 31 placed as described with reference to FIGS. 20(a) to 20(c) may fail to satisfy these constraints. In this example, therefore, the second data placement position determining unit 180 is preferably configured to determine whether the placed second data field 31 protrudes outside the region of the entire image of the cut or overlaps the saliency region and, when it determines that at least one of these applies, to specify again, as the placement position of the second data field 31, a position shifted to one where the second data field 31 satisfies the above constraints (a position where it neither protrudes outside the region of the entire image of the cut nor overlaps the saliency region).
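 The constraint check just described might look like the following sketch; the clamp-then-shift strategy is one plausible reading of the text, not the specified algorithm, and rectangles are again (x, y, width, height) tuples.

```python
def rects_overlap(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def enforce_constraints(field, frame_size, saliency):
    """Shift the placed second data field so that it stays inside the
    cut image and clear of the saliency region."""
    x, y, w, h = field
    fw, fh = frame_size
    x = min(max(x, 0), fw - w)   # keep the field inside the frame
    y = min(max(y, 0), fh - h)
    if rects_overlap((x, y, w, h), saliency):
        sx, sy, sw, sh = saliency
        # Move the field just above the saliency region, or just below
        # it when there is no room above; a fuller implementation would
        # also consider horizontal shifts.
        y = sy - h if sy - h >= 0 else sy + sh
        y = min(max(y, 0), fh - h)
    return x, y, w, h
```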
<Other Examples>
 In the embodiment above, an example was described in which the placement position specifying unit 184 specifies the placement position of the second data field 31 on the cut into which the material content data (image) has been inserted, using the placement position determination model, a trained model obtained by machine learning of the relationship between the placement positions of saliency regions in images and the second data field 31. According to this example, the second data field 31 can be placed at an appropriate position that takes the positional relationship between the saliency region and the second data field 31 into account. On the other hand, a particularly large computational cost is required when, for example, scoring is performed for each pixel in the image data.
 FIG. 21 is a diagram showing another example of how the placement position specifying unit 184 specifies the placement position of the second data field 31 on the cut into which the material content data (image) has been inserted.
 In this example, the placement position specifying unit 184 divides the entire image of the cut into which the material content data (image) has been inserted into a plurality of cells and specifies the optimal cell in which to place the second data field 31. FIG. 21 shows an example in which the entire image is divided into 9 cells, 3 rows by 3 columns, but the number of cells into which the entire image is divided may be set arbitrarily. In FIG. 21 the boundary lines separating the cells are drawn as chain lines, but these are shown only for explanation; no such boundary lines are actually drawn in the processing by the placement position specifying unit 184. The functions and operations of the other components of the server 1 that implements this example are as described with reference to FIGS. 2, 4, and so on, so their description is omitted here.
 Next, the placement position specifying processing for the second data field 31 by the placement position specifying unit 184 of this example will be described.
 First, the placement position specifying unit 184 divides the entire image of the cut into which the material content data (image) has been inserted into a predetermined number of cells (see FIG. 21(a)). Next, the placement position specifying unit 184 excludes from the placement position candidates for the second data field 31 any cell that contains even part of the saliency region identified by the saliency region discriminating unit 182 (see FIG. 21(b)). In FIG. 21(b), the cells excluded from the placement position candidates are marked with an X for explanation and visualization, but no X marks are generated or displayed in the actual processing by the placement position specifying unit 184. Finally, the placement position specifying unit 184 specifies, from among the cells remaining after the exclusion, the cell closest to the saliency region in the image as the placement position of the second data field 31 (see FIG. 21(c)). In FIG. 21(c), the cell specified as the placement position of the second data field 31 is shaded for explanation and visualization, but no shading is generated or displayed in the actual processing by the placement position specifying unit 184.
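 A minimal Python sketch of the exclusion-based selection shown in FIG. 21 follows, assuming a rectangular saliency region and the 3-row-by-3-column grid of the figure; the rectangle-intersection test and the center-to-center distance measure are illustrative assumptions.

```python
def pick_cell(frame_size, saliency, rows=3, cols=3):
    """Exclude every cell containing any part of the saliency region,
    then return the (row, col) of the remaining cell closest to it.
    Raises ValueError if the saliency region touches every cell."""
    fw, fh = frame_size
    sx, sy, sw, sh = saliency
    cw, ch = fw / cols, fh / rows

    def intersects(r, c):
        x0, y0 = c * cw, r * ch
        return x0 < sx + sw and sx < x0 + cw and y0 < sy + sh and sy < y0 + ch

    def distance(cell):
        r, c = cell
        dx = (c + 0.5) * cw - (sx + sw / 2)   # cell center to saliency center
        dy = (r + 0.5) * ch - (sy + sh / 2)
        return dx * dx + dy * dy

    candidates = [(r, c) for r in range(rows) for c in range(cols)
                  if not intersects(r, c)]
    if not candidates:
        raise ValueError("saliency region overlaps every cell")
    return min(candidates, key=distance)      # "closest to the saliency region"
```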
 Thus, according to this example, from among the plurality of candidate placement positions for the second data field 31 generated by dividing the entire image of the cut into a predetermined number of cells, the cell judged to be optimal under a predetermined condition is specified as the placement position of the second data field 31. Compared with the method of specifying the placement position of the second data field 31 using the placement position determination model described above, the appropriateness of the placement position with respect to the saliency region may be somewhat lower with this method, but it has the advantage of requiring less computational cost.
 In this example, "closest to the saliency region" was described as the condition for selecting the placement position of the second data field 31, but the selection condition is not limited to this. For example, the condition may be a position at a predetermined distance from the saliency region, the position farthest from it, a position in a predetermined direction from it, and so on; furthermore, these conditions may be combined arbitrarily as long as they do not contradict one another.
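 Continuing the sketch above, such alternative conditions could be expressed as interchangeable filters and keys over the surviving candidate cells; the condition names and the `d_min` threshold are illustrative assumptions.

```python
def select(candidates, distance, condition="nearest", d_min=0.0):
    """Apply one of the alternative selection conditions to the cells
    that survived the exclusion step."""
    eligible = [c for c in candidates if distance(c) >= d_min]  # "at a predetermined distance"
    if condition == "nearest":
        return min(eligible, key=distance)
    if condition == "farthest":
        return max(eligible, key=distance)
    raise ValueError(f"unknown condition: {condition}")
```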
 According to the present system of the embodiment examples described above, it is possible to easily create composite content data without preparing editing software, servers, editors with specialized skills, and the like in-house. For example, use in the following situations is envisaged:
 1) Turning information on products sold in EC shops into videos
 2) Distributing press release information, CSR information, and the like as videos
 3) Turning manuals such as usage instructions and operation flows into videos
 4) Producing creatives that can be used as video advertisements
 Although preferred embodiment examples of the present invention have been described above, the technical scope of the present invention is not limited to the description of the above embodiments. Various modifications and improvements can be made to the above embodiment examples, and forms incorporating such modifications or improvements are also included in the technical scope of the present invention.
1 Server
2 Administrator terminal
3 User terminal

Claims (12)

  1.  A server comprising:
      a material content data setting unit that sets image data for a cut; and
      a text field placement position determining unit that determines the position of a text field to be placed on the cut by referring to a saliency region in the image data.

  2.  The server according to claim 1, wherein the text field placement position determining unit comprises:
      a saliency region discriminating unit that discriminates a saliency region included in the image data; and
      a placement position specifying unit that specifies the placement position of the text field on the cut by referring to the saliency region.

  3.  The server according to claim 2, wherein the placement position specifying unit is configured to specify the placement position of the text field on the cut based on a model generated by learning, as teacher data, image data in which the relationship between a saliency region in an image and a text field satisfies a predetermined condition.

  4.  The server according to claim 3, wherein the placement position specifying unit is configured to specify the placement position of the text field on the cut based on a scoring value calculated for each pixel of the image data.

  5.  The server according to claim 3, wherein the placement position specifying unit is configured to specify the placement position of the text field on the cut based on a scoring value calculated for each of a plurality of cells into which the image data is divided.

  6.  The server according to claim 2, wherein the placement position specifying unit is configured to execute:
      dividing the entire image of the cut for which the image data is set into a plurality of cells;
      excluding, from among the plurality of cells, any cell that contains at least part of the saliency region; and
      specifying, from among the cells remaining after the exclusion, a cell that satisfies a predetermined condition regarding its relationship with the saliency region as the placement position of the text field.

  7.  The server according to any one of claims 1 to 6, wherein the saliency region is detected by hybrid saliency map detection using saliency object detection and saliency map detection.

  8.  The server according to any one of claims 1 to 6, wherein the saliency region is detected by saliency map detection.

  9.  The server according to any one of claims 1 to 6, wherein the saliency region is detected by saliency object detection.

  10.  A system comprising the server according to any one of claims 1 to 9.

  11.  A text field placement position method executed by a computer, the method comprising:
      setting image data for a cut; and
      determining the position of a text field to be placed on the cut based on a saliency region in the image data.

  12.  A program for causing a computer to execute a text field placement position method, the method comprising:
      setting image data for a cut; and
      determining the position of a text field to be placed on the cut based on a saliency region in the image data.



Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/011672 WO2022201237A1 (en) 2021-03-22 2021-03-22 Server, text field arrangement position method, and program


Publications (1)

Publication Number Publication Date
WO2022201237A1 (en)

Family

ID=83395279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/011672 WO2022201237A1 (en) 2021-03-22 2021-03-22 Server, text field arrangement position method, and program

Country Status (1)

Country Link
WO (1) WO2022201237A1 (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014038601A (en) * 2012-08-16 2014-02-27 Naver Corp Automatic image editing device by image analysis, method and computer readable recording medium
WO2016053820A1 (en) * 2014-09-30 2016-04-07 Microsoft Technology Licensing, Llc Optimizing the legibility of displayed text
US20200310631A1 (en) * 2017-11-20 2020-10-01 Huawei Technologies Co., Ltd. Method and Apparatus for Dynamically Displaying Icon Based on Background Image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LU, Peng; ZHANG, Hao; PENG, Xujun; JIN, Xiaofu. "Learning the Relation Between Interested Objects and Aesthetic Region for Image Cropping." IEEE Transactions on Multimedia, vol. 23, 9 October 2020, pp. 3618–3630. ISSN 1520-9210. DOI: 10.1109/TMM.2020.3029882. *


Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21932851

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21932851

Country of ref document: EP

Kind code of ref document: A1