WO2022201515A1

WO2022201515A1 - Server, animation recommendation system, animation recommendation method, and program

Info

Publication number: WO2022201515A1
Application number: PCT/JP2021/012945
Authority: WO
Inventors: 孝弘坪野; イーカーヤン; 美帆折坂
Original assignee: 株式会社オープンエイト
Priority date: 2021-03-26
Filing date: 2021-03-26
Publication date: 2022-09-29
Also published as: JPWO2022201515A1; JP6979738B1

Abstract

[Problem] To provide a server and the like which make it possible to easily prepare compound content data, and in particular, make it possible to recommend to a user a suitable animation with respect to an image. [Solution] Provided are: a server characterized by comprising a material content data setting unit that sets image data with respect to a cut, a reference frame setting unit that sets a reference frame for the image data, and an animation recommendation unit that recommends an animation type which moves in a visible region with the reference frame serving as a starting point or an ending point; an animation recommendation system; an animation recommendation preparation method; and a program.

Description

SERVER AND ANIMATION RECOMMENDATION SYSTEM, ANIMATION RECOMMENDATION METHOD, AND PROGRAM

The present invention relates to a server or the like that recommends animations for images.

　Conventionally, content data such as moving images has been created. For example, Patent Document 1 proposes a moving image processing apparatus that efficiently searches for a desired scene image from a moving image having a plurality of chapters.

Japanese Patent Application Laid-Open No. 2011-130007

It takes a lot of time and effort to create content data such as video. Consideration is difficult depending on the technical level of the user, so there has been a demand for a system capable of easily creating composite content data. In addition, for image data, it is necessary for the user to select an animation (zoom, slide, etc.), and selecting an appropriate animation is difficult depending on the user's technical level. Therefore, a system that can recommend an appropriate animation to the user was also requested to provide

Therefore, it is an object of the present invention to provide a server or the like that makes it possible to easily create composite content data, and particularly makes it possible to recommend an appropriate animation to a user.

The main inventions of the present invention for solving the above problems are a material content data setting unit for setting image data for a cut, and an animation start point or end point for the image data based on saliency information about the image data. A server, comprising: a reference frame setting unit that sets a suitable reference frame; and an animation recommendation unit that recommends the animation type having the reference frame as a starting point or an end point.

According to the present invention, it is possible to provide a server or the like that makes it possible to easily create composite content data, and particularly makes it possible to recommend an appropriate animation to a user.

1 is a configuration diagram of a system according to an embodiment; FIG. 1 is a configuration diagram of a server according to an embodiment; FIG. 3 is a configuration diagram of a management terminal and a user terminal according to an embodiment; FIG. 1 is a functional block diagram of a system according to an embodiment; FIG. FIG. 4 is a diagram for explaining an example screen layout that constitutes a cut; 4 is a flow chart of a system according to an example embodiment; FIG. 10 is an explanatory diagram of an aspect of displaying a list of a plurality of cuts forming composite content data on a screen; FIG. 10 is a diagram illustrating animation recommendation according to an embodiment; FIG. 10 is a diagram illustrating setting of a reference frame according to the embodiment; FIG. 5 is a diagram illustrating saliency object detection according to an example embodiment; FIG. 3 shows an example original image for saliency-based detection; FIG. 12 illustrates an example of saliency object detection for the image of FIG. 11; FIG. 4 is a diagram illustrating saliency map detection according to an example embodiment; FIG. 12 illustrates an example of saliency map detection for the image of FIG. 11; FIG. 10 is a diagram illustrating an animation recommendation example according to the embodiment; FIG. 12 illustrates an example of hybrid saliency map detection for the image of FIG. 11;

The contents of the embodiments of the present invention are listed and explained. A server or the like according to an embodiment of the present invention has the following configuration.
[Item 1]
a material content data setting unit for setting image data for cuts;
a reference frame setting unit that sets a reference frame suitable for the start point or end point of an animation in the image data based on saliency information about the image data;
an animation recommendation unit that recommends the animation type with the reference frame as a start point or an end point;
A server characterized by:
[Item 2]
The server according to item 1,
Furthermore, an animation score calculation unit that scores animation types based on the saliency information,
The animation recommendation unit recommends the animation type based on the animation score.
A server characterized by:
[Item 4]
The server according to any one of items 1 or 2,
the saliency information is obtained by hybrid saliency map detection using saliency object detection and saliency map detection;
A server characterized by:
[Item 5]
The server according to any one of items 1 or 2,
wherein the saliency information is obtained by saliency map detection;
A server characterized by:
[Item 6]
The server according to any one of items 1 or 2,
wherein the saliency information is obtained by saliency object detection;
A server characterized by:
[Item 7]
a material content data setting unit for setting image data for cuts;
a reference frame setting unit that sets a reference frame suitable for the start point or end point of an animation in the image data based on saliency information about the image data;
an animation recommendation unit that recommends an animation type whose starting point or ending point is the reference frame;
An animation recommendation system characterized by:
[Item 8]
a step of setting image data for a cut by a material content data setting unit;
setting a reference frame suitable for a start point or an end point of an animation in the image data based on saliency information about the image data by a reference frame setting unit;
an animation recommendation unit recommending an animation type having the reference frame as a start point or an end point;
An animation recommendation method characterized by:
[Item 9]
A program for causing a computer to execute an animation recommendation method,
The animation recommendation method includes:
a step of setting image data for a cut by a material content data setting unit;
setting a reference frame suitable for a start point or an end point of an animation in the image data based on saliency information about the image data by a reference frame setting unit;
an animation recommendation unit recommending an animation type having the reference frame as a start point or an end point;
A program characterized by

<Details of Embodiment>
A system for creating composite content data (hereinafter referred to as "this system") and the like according to an embodiment of the present invention will now be described. In the accompanying drawings, the same or similar elements are denoted by the same or similar reference numerals and names, and duplicate descriptions of the same or similar elements may be omitted in the description of each embodiment. Also, the features shown in each embodiment can be applied to other embodiments as long as they are not mutually contradictory.

<Configuration>
As shown in FIG. 1, the system according to the embodiment includes a server 1, an administrator terminal 2, and a user terminal 3. FIG. The server 1, the administrator terminal 2, and the user terminal 3 are communicably connected to each other via a network. The network may be a local network or may be connectable to an external network. In the example of FIG. 1, an example in which the server 1 is composed of one unit is described, but it is also possible to realize the server 1 using a plurality of server devices. Also, the server 1 and the administrator terminal 2 may be shared.

<Server 1>
FIG. 2 is a diagram showing the hardware configuration of the server 1 shown in FIG. 1. As shown in FIG. Note that the illustrated configuration is an example, and other configurations may be employed. Also, the server 1 may be a general-purpose computer such as a workstation or a personal computer, or may be logically realized by cloud computing.

The server 1 includes at least a processor 10 , a memory 11 , a storage 12 , a transmission/reception section 13 , an input/output section 14 and the like, which are electrically connected to each other through a bus 15 .

The processor 10 is an arithmetic device that controls the overall operation of the server 1, controls transmission and reception of data between elements, executes applications, and performs information processing necessary for authentication processing. For example, the processor 10 is a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit), and executes programs for this system stored in the storage 12 and developed in the memory 11 to perform each information process. It should be noted that the processing capability of the processor 10 only needs to be sufficient for executing necessary information processing, so for example, the processor 10 may be composed only of a CPU, and is not limited to this.

The memory 11 includes a main memory composed of a volatile memory device such as a DRAM (Dynamic Random Access Memory), and an auxiliary memory composed of a non-volatile memory device such as a flash memory or a HDD (Hard Disc Drive). . The memory 11 is used as a work area or the like for the processor 10, and may store a BIOS (Basic Input/Output System) executed when the server 1 is started, various setting information, and the like.

The storage 12 stores various programs such as application programs. A database storing data used for each process may be constructed in the storage 12 .

The transmission/reception unit 13 connects the server 1 to the network.

The input/output unit 14 is an information input device such as a keyboard and mouse, and an output device such as a display.

A bus 15 is commonly connected to the above elements and transmits, for example, address signals, data signals and various control signals.

<Administrator Terminal 2, User Terminal 3>
The administrator terminal 2 and the user terminal 3 shown in FIG. 3 also include a processor 20, a memory 21, a storage 22, a transmission/reception section 23, an input/output section 24, etc. These are electrically connected to each other through a bus 25. . Since the function of each element can be configured in the same manner as the server 1 described above, detailed description of each element will be omitted. The administrator uses the administrator terminal 2 to, for example, change the settings of the server 1 and manage the operation of the database. A user can access the server 1 from the user terminal 3 to create or view composite content data, for example.

<Functions of server 1>
FIG. 4 is a block diagram illustrating functions implemented in the server 1. As shown in FIG. In this embodiment, the server 1 includes a communication unit 110, an identified information analysis unit 120, a second data generation unit 130, a composite content data generation unit 140, an association unit 150, a storage unit 160, a classifier 170, an animation A recommendation unit 180 is provided. Composite content data generator 140 includes base data generator 142 , second data allocation unit 144 , and material content data allocation unit 146 . The storage unit 160 includes storage areas such as the memory 11 and the storage 11, and includes a base data storage unit 161, material content data storage unit 163, composite content data storage unit 165, interface information storage unit 167, animation data Various databases such as the storage unit 169 are included. The animation recommendation section 180 includes a score calculation section 182 and a reference frame setting section 184 . The material content data setting unit 190 is executed by the processor 10, for example, although it will be described later.

The communication unit 110 communicates with the administrator terminal 2 and the user terminal 3. The communication unit 110 also functions as a reception unit that receives first data including information to be identified, for example, from the user terminal 3 . The first data is, for example, text data such as articles containing information to be identified (for example, press releases, news, etc.), image data containing information to be identified (for example, photographs, illustrations, etc.), or video data. , voice data including information to be identified, and the like. Note that the text data here is not limited to text data at the time of transmission to the server 1, but may be text data generated by a known voice recognition technique from voice data transmitted to the server 1, for example. may In addition, the first data may be text data such as articles, etc., summarized by existing automatic summarization technology such as extractive summary or generative summary (including information to be identified). In this case, the number of cuts included in the base data is reduced, the data volume of the entire composite content data can be reduced, and the content can be simplified.

Also, the audio data referred to here is not limited to audio data acquired by an input device such as a microphone, but may be audio data extracted from video data or audio data generated from text data. In the former case, only audio data such as narration and lines are extracted from temporary images such as rough sketches and temporary moving images such as temporary video, and composite content is extracted along with material content data based on the audio data as will be described later. Data may be generated. In the latter case, for example, voice data may be created from text data with a story, and in the case of fairy tales, for example, a picture-story show or moving image based on the read-out story and material content data may be generated as composite content data.

For example, when the second data generation unit 130 determines that it is not necessary to divide the first data (for example, the text data is a short sentence with a preset number of characters or less), the second data generation unit 130 The data generator 130 generates the first data as it is as the second data. On the other hand, for example, when it is determined that the first data needs to be divided (for example, the sentence is longer than the preset number of characters), the second data generation unit 130 divides the first data. The data is divided and generated as second data each including at least part of the information to be identified of the first data. At this time, division number information of the second data is also generated. Any known technique may be used for the method of dividing the first data by the second data generation unit 130. For example, if the first data can be converted into text, Based on the analysis results of the maximum number of characters in each cut of the base data and the modification relationship between clauses, sentences may be separated so that a natural section as a sentence fits into each cut.

The identified information analysis unit 120 analyzes the second data described above and acquires identified information. Here, the information to be identified may be any information as long as it can be analyzed by the information to be identified analysis unit 120 . In one aspect, the identified information may be in word form defined by a language model. More specifically, it may be one or more words (for example, "Shibuya, Shinjuku, Roppongi" or "Shibuya, Landmark, Youth") accompanied by a word vector, which will be described later. Note that the words may include words that are not usually used alone, such as "n", depending on the language model. Also, a feature vector extracted from a document, an image, or a moving image may be used instead of the above-described word format.

The composite content data generation unit 140 generates base data including the number of cuts (one or more cuts) according to the division number information of the second data generated by the second data generation unit 130 described above. and the material content data newly input from the user terminal 3 and/or the material content data stored in the material content data storage unit 163 and the base data in which the above-described second data is assigned to each cut are combined. The composite content data is generated as content data, stored in the composite content data storage unit 165 , and displayed on the user terminal 3 . It should be noted that FIG. 5 is an example of a screen layout of cuts that constitute the base data. Edited second data (for example, delimited text sentences) is inserted in a second data field 31 in the figure, and selected material content data is inserted in a material content data field 32 . For each cut of the base data, the preset maximum number of characters (in the case of text data), screen layout, and playback time (in the case of moving images) may be specified. Also, composite content data does not necessarily need to be stored in the composite content data storage unit 165, and may be stored at appropriate timing. Also, the base data to which only the second data is assigned may be displayed on the user terminal 3 as progress information of the composite content data.

The second data allocation unit 144 assigns numbers to the one or more cuts generated by the base data generation unit 142 described above, such as scene 1, scene 2, scene 3, or cut 1, cut 2, cut 3, for example. The second data are sequentially assigned in this numerical order.

The association unit 150 compares at least part of the information to be identified included in the second data described above with, for example, extracted information extracted from the material content data (for example, class labels extracted by the classifier), For example, mutual similarity or the like is determined, and material content data suitable for the second data (for example, data having a high degree of similarity) and the second data are associated with each other. As a more specific example, for example, material content data A (for example, an image of a woman) whose identified information included in the second data represents "teacher" and extracted information is "face" and "mountain". is prepared (for example, an image of Mt. Fuji), the relationship between the word vector obtained from "teacher" and the word vector obtained from "face" is the word vector obtained from "teacher" and The second data is associated with the material content data A because it is more similar than the association of word vectors obtained from "mountain". The extraction information of the material content data may be extracted in advance by the user and stored in the material content data storage unit 163, or may be extracted by the classifier 170, which will be described later. In addition, the similarity determination may be performed by preparing a trained model that has learned word vectors, and using the vectors to determine the similarity of words by a method such as cosine similarity or Word Mover's Distance.

Material content data can be, for example, image data, video data, sound data (eg, music data, voice data, sound effects, etc.), but is not limited to this. The material content data may be stored in the material content data storage unit 163 by the user or administrator, or may be acquired from the network and stored in the material content data storage unit 163. may be

The material content data allocation unit 146 allocates suitable material content data to cuts to which the corresponding second data is allocated, based on the above-described association.

The interface information storage unit 167 stores various control information to be displayed on the display unit (display, etc.) of the administrator terminal 2 or the user terminal 3.

The classifier 170 acquires learning data from a learning data storage unit (not shown) and performs machine learning to create a learned model. Creation of the classifier 170 occurs periodically. The learning data for creating a classifier may be data collected from the network or data owned by the user with class labels attached, or a data set with class labels may be procured and used. . The classifier 170 is, for example, a trained model using a convolutional neural network, and upon input of material content data, extracts one or a plurality of extracted information (eg, class labels, etc.). The classifier 170, for example, extracts class labels representing objects associated with the material content data (eg, seafood, grilled meat, people, furniture).

FIG. 6 is a diagram explaining an example of the flow of creating composite content data.

First, the server 1 receives first data including at least identification information from the user terminal 3 via the communication unit 110 (step S101). In this example, the identified information is, for example, one or more words, and the first data may be, for example, text data consisting of an article containing one or more words or a summary of the text data.

Next, the server 1 acquires identified information by analyzing the first data by the identified information analysis unit 120, and generates one or more data containing at least part of the identified information by the second data generation unit 130. second data and division number information are generated (step S102).

Next, the server 1 causes the base data generation section 142 to generate the base data including the number of cuts according to the division number information by the composite content data generation section 140 (step S103).

Next, the server 1 allocates the second data to the cut by the second data allocation unit (step S104). The base data in this state may be displayed on the user terminal 3 so that the progress can be checked.

Next, based on at least part of the information to be identified included in the second data and the extracted information extracted from the material content data, the server 1 causes the association unit 150 to extract the material content data in the material content data storage unit 163. and the second data (step S105), and the material content data allocation unit 146 allocates the material content data to the cut (step S106).

Then, the server 1 generates the base data to which the second data and the material content data are assigned as composite content data, stores the composite content data in the composite content data storage unit 165, and displays the composite content data on the user terminal 3 (step S107). As for the display of composite content data, as shown in FIG. 7, a list of a plurality of cuts forming the composite content data can be displayed on the screen. For each cut, along with the displayed material content data and second data, information on the playback time (in seconds) of each cut may also be displayed. The user can, for example, correct the content by clicking the second data field 31 or the corresponding button, and replace the material content data by clicking the material content data field 32 or the corresponding button. can be done. Furthermore, it is also possible for the user to add other material content data to each scene from the user terminal.

It should be noted that the flow of creating composite content data described above is just an example, and for example, step S102 for reading the base data may be executed as long as it has been read before the assignment of the second data or material content data. may Also, for example, the order of step S104 for assigning the second data, step S105 for association, and step S106 for assigning material content data are executed in any order if there is no discrepancy with each other. may be

Further, the material content data setting unit 190 using the identified information analysis unit 120, the association unit 150, and the classifier 170 described so far may be one setting function of the composite content data creation system. The setting method by the setting unit 190 is not limited to this. For example, the base data is generated by the base data generation unit 142 in the above example, but it may be read from the base data storage unit 161 instead. The read-out base data may include, for example, a predetermined number of blank cuts, or template data in which predetermined material content data, format information, etc. have been set for each cut (for example, music data, background data, etc.). image, font information, etc.) may be used. Furthermore, as in the conventional composite content data creation system, the user may be able to set any material content to all or part of each data field from the user terminal. A setting method may be combined with a user operation, such as a user inputting arbitrary text using a user terminal, extracting information to be identified from these texts as described above, and associating material content.

(Animation recommendation function)
An example of a method for recommending an animation for an image by the animation recommendation unit 180 will be described with reference to FIGS. 8 to 12. FIG. For example, as performed in step 10* above.

FIG. 8 is a diagram explaining an example of the animation recommendation flow. The term "animation" as used herein includes, for example, known animations such as zooming in, zooming out, and slides moving up, down, left, and right. may be

In the example of FIG. 8, the score calculation unit 182 is used to score which animation type is appropriate for the image data set for each cut, and an animation is recommended according to the score. The score calculation unit 182 performs the above scoring on images based on the animation recommendation model. The animation recommendation model, for example, presents a predetermined image to an unspecified number of people, asks them to select an animation that matches it, and uses saliency information (details will be described later) about the image and a set of animation types as teacher data. It may be generated by machine learning.

For animation, for example, as illustrated in FIG. 9, a reference frame is provided for the entire image, and each animation is operated by moving the reference frame so that the position of the reference frame shown in the drawing becomes the starting point or the ending point. do. The size of the reference frame may be set to a predetermined value in advance, or may be set by the user using the user terminal.

The position of the reference frame is set by the reference frame setting unit 184 to a predetermined position that is most desired to be visualized in animation based on the saliency determination model. A saliency determination model is a trained model of saliency obtained by a known learning method such as saliency object detection in FIG. 10 or saliency map detection in FIG. By using this, the reference frame setting unit 184 sets the position of the reference frame based on the saliency information illustrated in FIGS. 10 and 13 so as to include many of the parts with the highest salience, for example.

FIG. 10 shows an example using a saliency object detection model, which can be implemented by a known method such as an encoder-decoder model. In FIG. 10, only large and small mountains are detected as saliency objects. Set the reference frame to a position that encloses only the For example, when the saliency object detection model is used for the animal image of FIG. 11, the result of detecting the shape of the animal is obtained as shown in FIG.

Also, FIG. 13 shows an example using a saliency map detection model, which can be realized by a known method such as a trained model using a convolutional neural network. In FIG. 13, the strength of visual salience of each pixel is determined as a saliency map, and as an example, the density of the black portion expresses the strength of visual salience. The reference frame setting unit 184 sets the reference frame at a position where the strength of visual saliency occupies a large proportion of the size of the reference frame (for example, a position surrounding only large mountains). For example, when the saliency map detection model is used for the animal image shown in FIG. 11, a result is obtained in which visual salience is strongly detected in the animal's face portion as shown in FIG.

Note that the score calculation unit 182 is not limited to the above-described score calculation by machine learning. For example, based on the saliency information shown in FIGS. You may make it calculate a high score with respect to. For example, the score calculation unit 182 executes each animation on the image and calculates the score for each movement. For example, as shown in FIG. For example, it recommends a left slide animation that ends the slide at the position of the frame of reference.

In this way, the animation recommendation unit 180 can recommend an appropriate animation for the image to the user.

By the way, when the saliency object detection in FIG. 10 is used, since there is no saliency gradation, the position of the reference frame may be a part that the user does not intend. Even when the saliency map detection of FIG. 13 is used, since the entire image of the object is unknown, the part unintended by the user may be the position of the reference frame, or the animation may include unintended parts. It is possible.

Therefore, as shown in FIG. 16, in order to acquire saliency information, we want to visually fit in the most visible region by using a hybrid saliency map detection model that combines saliency object detection and saliency map detection. By capturing information on both the outline of an object and the most important part of the object, it is possible to improve the accuracy of setting the reference frame and recommend a more appropriate animation.

Furthermore, considering that the accuracy of saliency detection is affected by the image quality, for example, by combining known super-resolution techniques, the resolution of the image is first increased, and then saliency detection is performed to achieve a more salience detection. Accuracy of sex information can be improved.

According to the present system of the embodiment described above, it is possible to easily create composite content data without preparing editing software, servers, editors with specialized skills, and the like. For example, it is expected to be used in the following situations.
1) Animating product information sold at EC shops 2) Distributing press release information, CSR information, etc. as videos 3) Animating manuals such as usage and operation flow 4) Creating creatives that can be used as video advertisements

Although the preferred embodiments of the present invention have been described above, the technical scope of the present invention is not limited to the description of the above embodiments. Various modifications and improvements can be added to the above-described embodiment examples, and forms with such modifications and improvements are also included in the technical scope of the present invention.

1 Server 2 Administrator terminal 3 User terminal

Claims

a material content data setting unit for setting image data for cuts;
a reference frame setting unit that sets a reference frame for the image data based on saliency information about the image data;
an animation recommendation unit that recommends an animation type that operates a visible area with the reference frame as a start point or an end point;
A server characterized by:
A server according to claim 1,
Further comprising a score calculation unit for scoring animation types based on saliency information about the image data,
The animation recommendation unit recommends the animation type based on the scoring.
A server characterized by:
The server according to any one of claims 1 or 2,
The reference frame setting unit sets the reference frame based on saliency information about the image data.
A server characterized by:
The server according to any one of claims 2 or 3,
the saliency information is obtained by hybrid saliency map detection using saliency object detection and saliency map detection;
A server characterized by:
The server according to any one of claims 2 or 3,
wherein the saliency information is obtained by saliency map detection;
A server characterized by:
The server according to any one of claims 2 or 3,
wherein the saliency information is obtained by saliency object detection;
A server characterized by:
a material content data setting unit for setting image data for cuts;
a reference frame setting unit that sets a reference frame for the image data;
an animation recommendation unit that recommends an animation type that operates a visible area with the reference frame as a start point or an end point;
An animation recommendation system characterized by:
a step of setting image data for a cut by a material content data setting unit;
setting a reference frame for the image data by a reference frame setting unit;
an animation recommendation unit recommending an animation type that operates a visible region with the reference frame as a start point or an end point;
An animation recommendation method characterized by:
A program for causing a computer to execute an animation recommendation method,
The animation recommendation method includes:
a step of setting image data for a cut by a material content data setting unit;
setting a reference frame for the image data by a reference frame setting unit;
an animation recommendation unit recommending an animation type that operates a visible region with the reference frame as a start point or an end point;
A program characterized by