WO2022201237A1 - Server, text field placement position method, and program - Google Patents

Server, text field placement position method, and program

Info

Publication number
WO2022201237A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
saliency
placement position
text field
cut
Prior art date
Application number
PCT/JP2021/011672
Other languages
English (en)
Japanese (ja)
Inventor
孝弘 坪野
イー カー ヤン
美帆 折坂
Original Assignee
株式会社オープンエイト
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社オープンエイト filed Critical 株式会社オープンエイト
Priority to PCT/JP2021/011672 priority Critical patent/WO2022201237A1/fr
Publication of WO2022201237A1 publication Critical patent/WO2022201237A1/fr

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/235Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • H04N5/91Television signal processing therefor

Definitions

  • the present invention relates to a server for generating video content to be distributed to user terminals, a text field layout position method, and a program.
  • Patent Document 1 proposes a moving image processing apparatus that efficiently searches for a desired scene image from a moving image having a plurality of chapters.
  • If a moving image includes an object or area with saliency that a viewer pays attention to (hereinafter referred to as a "saliency region") and a text sentence is superimposed on at least part of that saliency region, the visibility of the moving image is impaired. Furthermore, if the text sentences are not arranged in a direction and at a distance that allow the line of sight to move naturally from the saliency region of the object to the text, the visibility of the moving image is impaired and the readability of the text sentences is also impaired.
  • Accordingly, the present invention is intended to provide a server or the like that makes it possible to easily create composite content data, and in particular to arrange text sentences in consideration of their positional relationship with saliency regions in a moving image.
  • A server according to an embodiment of the present invention includes a material content data setting unit that sets image data for a cut, and a text field placement position determination unit that determines the position of the text field to be placed on the cut by referring to the saliency region in the image data.
  • According to the present invention, it becomes possible to provide a server or the like that makes it possible to easily create composite content data, and in particular to arrange text sentences in consideration of their positional relationship with saliency regions in a moving image.
  • FIG. 1 is a configuration diagram of a system according to an embodiment.
  • FIG. 2 is a configuration diagram of a server according to an embodiment.
  • FIG. 3 is a configuration diagram of a management terminal and a user terminal according to an embodiment.
  • FIG. 4 is a functional block diagram of a system according to an embodiment.
  • FIG. 5 is a diagram illustrating an example screen layout that constitutes a cut.
  • FIG. 6 is a flow chart of a system according to an example embodiment.
  • FIG. 7 is an explanatory diagram of an aspect in which a list of a plurality of cuts forming composite content data is displayed on a screen.
  • FIG. 8 is a diagram explaining the second data field arrangement position determination method.
  • FIG. 9 is a diagram illustrating saliency object detection according to an example embodiment.
  • FIG. 10 shows an example original image for saliency detection.
  • FIG. 11 illustrates an example of saliency object detection for the image of FIG. 10.
  • FIG. 12 is a diagram illustrating saliency map detection according to an example embodiment.
  • FIG. 13 illustrates an example of saliency map detection for the image of FIG. 10.
  • FIG. 14 is a diagram showing an example in which a recommended second data field placement area is shown to the upper right of the large mountains that are the saliency objects in the image shown in FIG. 9.
  • FIG. 15 is a diagram showing a state in which the second data field is arranged in a portion of the recommended placement area in the example shown in FIG. 14.
  • FIG. 16 is a diagram showing an example in which a recommended second data field placement area is displayed to the right of the animal that is the saliency object in the image shown in FIG. 11.
  • FIG. 17 is a diagram showing a state in which the second data field is arranged in a high-scoring portion, indicated by low density, of the recommended placement area in the example shown in FIG. 16.
  • FIG. 18 shows an example of hybrid saliency map detection for the image of FIG. 10.
  • FIG. 19 is a diagram showing a modification in which the placement position specifying unit specifies, using cells, the placement position of the second data field on the cut into which the material content data (image) is inserted.
  • FIG. 20 shows various placement examples for placing the second data field at the cell position identified by the method shown in FIG. 19.
  • FIG. 21 is a diagram showing another example in which the placement position specifying unit specifies the placement position of the second data field on the cut into which the material content data (image) is inserted.
  • A server or the like according to an embodiment of the present invention has the following configuration.
  • [Item 1] A server comprising: a material content data setting unit that sets image data for a cut; and a text field placement position determination unit that determines the position of a text field to be placed on the cut by referring to a saliency region in the image data.
  • [Item 2] The server according to Item 1, wherein the text field placement position determination unit comprises: a saliency region determination unit that discriminates the saliency region included in the image data; and a placement position specifying unit that specifies the placement position of the text field on the cut with reference to the saliency region.
  • [Item 3] The server according to Item 2, wherein the placement position specifying unit determines the placement position of the text field on the cut based on a model generated by learning, as training data, image data that satisfies predetermined conditions regarding the relationship between the saliency region in the image and the text field.
  • [Item 4] The server according to Item 2 or 3, wherein the placement position specifying unit is configured to specify the placement position of the text field on the cut based on a scoring value calculated for each pixel of the image data.
  • [Item 5] The server according to Item 2 or 3, wherein the placement position specifying unit is configured to specify the placement position of the text field on the cut based on a scoring value calculated for each of a plurality of cells that partition the image data.
  • [Item 6] The server according to Item 2, wherein the placement position specifying unit is configured to: divide the entire image of the cut in which the image data is set into a plurality of cells; exclude cells that include at least a portion of the saliency region from among the plurality of cells; and identify, among the plurality of cells remaining after the exclusion, a cell that satisfies a predetermined condition regarding its relationship with the saliency region as the placement position of the text field.
  • [Item 7] The server according to any one of Items 1 to 6, wherein the saliency region is detected by hybrid saliency map detection using saliency object detection and saliency map detection.
  • [Item 8] The server according to any one of Items 1 to 6, wherein the saliency region is detected by saliency map detection.
  • [Item 9] The server according to any one of Items 1 to 6, wherein the saliency region is detected by saliency object detection.
  • [Item 10] A system comprising the server according to any one of Items 1 to 9.
  • [Item 11] A computer-implemented text field placement position method comprising: setting image data for a cut; and determining the position of a text field to be placed on the cut by referring to a saliency region in the image data.
  • A system for creating composite content data (hereinafter referred to as "this system") and the like according to an embodiment of the present invention will now be described.
  • In the description of each embodiment, the same or similar elements are denoted by the same or similar reference numerals and names, and duplicate descriptions of such elements may be omitted.
  • the features shown in each embodiment can be applied to other embodiments as long as they are not mutually contradictory.
  • As shown in FIG. 1, the system according to the embodiment includes a server 1, an administrator terminal 2, and a user terminal 3.
  • The server 1, the administrator terminal 2, and the user terminal 3 are connected via a network N so as to be able to communicate with each other.
  • Network N may be a local network, or may be connectable to an external network.
  • In this embodiment, an example in which the server 1 is composed of a single unit is described, but the server 1 may also be realized using a plurality of server devices.
  • The server 1 and the administrator terminal 2 may also be implemented as a single shared device.
  • FIG. 2 is a diagram showing the hardware configuration of the server 1 shown in FIG. 1. Note that the illustrated configuration is an example, and other configurations may be employed. The server 1 may be a general-purpose computer such as a workstation or a personal computer, or may be logically realized by cloud computing.
  • the server 1 includes at least a processor 10 , a memory 11 , a storage 12 , a transmission/reception section 13 , an input/output section 14 and the like, which are electrically connected to each other through a bus 15 .
  • the processor 10 is an arithmetic device that controls the overall operation of the server 1, controls transmission and reception of data between elements, executes applications, and performs information processing necessary for authentication processing.
  • the processor 10 includes a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit), and executes programs for this system stored in the storage 12 and developed in the memory 11 to perform each information process.
  • the processing capability of the processor 10 only needs to be sufficient for executing necessary information processing, so for example, the processor 10 may be composed only of a CPU, and is not limited to this.
  • The memory 11 includes a main memory composed of a volatile memory device such as a DRAM (Dynamic Random Access Memory), and an auxiliary memory composed of a non-volatile memory device such as a flash memory or an HDD (Hard Disk Drive).
  • the memory 11 is used as a work area or the like for the processor 10, and may store a BIOS (Basic Input/Output System) executed when the server 1 is started, various setting information, and the like.
  • the storage 12 stores various programs such as application programs.
  • a database storing data used for each process may be constructed in the storage 12 .
  • the storage 12 stores a computer program for causing the server 1 to execute the composite content data creation method described with reference to FIG.
  • a computer program is stored for causing the server 1 to execute the second data arrangement position determination method described below.
  • the transmission/reception unit 13 connects the server 1 to the network.
  • the input/output unit 14 is an information input device such as a keyboard and mouse, and an output device such as a display.
  • a bus 15 is commonly connected to the above elements and transmits, for example, address signals, data signals and various control signals.
  • The administrator terminal 2 and the user terminal 3 shown in FIG. 3 also include a processor 20, a memory 21, a storage 22, a transmission/reception section 23, an input/output section 24, and the like, which are electrically connected to each other through a bus 25. Since each element can be configured in the same manner as in the server 1 described above, detailed descriptions of these elements are omitted.
  • the administrator uses the administrator terminal 2 to, for example, change the settings of the server 1 and manage the operation of the database.
  • a user can access the server 1 from the user terminal 3 to create or view composite content data, for example.
  • FIG. 4 is a block diagram illustrating functions implemented in the server 1.
  • the server 1 includes a communication section 110 , a storage section 160 and a material content data setting section 190 .
  • Material content data setting unit 190 includes identified information analysis unit 120, second data generation unit 130, composite content data generation unit 140, association unit 150, classifier 170, and second data placement position determination unit 180.
  • Composite content data generator 140 includes base data generator 142 , second data allocation unit 144 , and material content data allocation unit 146 .
  • The storage unit 160 is composed of storage areas such as the memory 11 and the storage 12, and includes a base data storage unit 161, a material content data storage unit 163, a composite content data storage unit 165, an interface information storage unit 167, and other various databases.
  • The second data placement position determination unit 180 includes a saliency region determination unit 182 that determines a saliency region related to an object or range having salience in the material content data, and a placement position specifying unit 184 that specifies the position where the second data field is to be placed.
  • the functions of the units 120, 130, 140, 150, 170, and 180 that make up the material content data setting unit 190 can be implemented by one or more processors 10, for example.
  • the communication unit 110 communicates with the administrator terminal 2 and the user terminal 3.
  • the communication unit 110 also functions as a reception unit that receives first data including information to be identified, for example, from the user terminal 3 .
  • The first data is, for example, text data such as an article containing information to be identified (for example, a press release or news), image data containing information to be identified (for example, a photograph or illustration), video data, or voice data containing information to be identified.
  • the text data here is not limited to text data at the time of transmission to the server 1, but may be text data generated by a known voice recognition technique from voice data transmitted to the server 1, for example.
  • The first data may also be text data such as an article that has been summarized by an existing automatic summarization technique such as extractive summarization or generative (abstractive) summarization, the summary still including the information to be identified.
  • the audio data referred to here is not limited to audio data acquired by an input device such as a microphone, but may be audio data extracted from video data or audio data generated from text data.
  • For example, audio data such as narration and dialogue may be extracted from temporary images such as rough sketches or from temporary moving images such as preliminary video, and composite content data may be generated together with material content data based on that audio data, as described later.
  • Alternatively, voice data may be created from text data with a story; in the case of a fairy tale, for example, a picture-story show or a moving image based on the read-out story and material content data may be generated as composite content data.
  • When the second data generation unit 130 determines that it is not necessary to divide the first data (for example, when the text data is a short sentence with a preset number of characters or less), it generates the first data as it is as the second data.
  • When division is necessary, the second data generation unit 130 divides the first data and generates pieces of second data, each including at least part of the information to be identified of the first data.
  • At this time, division number information of the second data is also generated. Any known technique may be used for the division method. For example, if the first data can be converted into text, sentences may be separated so that a natural section as a sentence fits into each cut, based on the maximum number of characters of each cut of the base data and analysis of the dependency relationships between clauses.
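  • As one illustration of this division, the following is a minimal sketch (not the claimed method) that splits text into per-cut segments no longer than a maximum number of characters, breaking at sentence boundaries; the function name, the delimiter set, and the greedy packing rule are assumptions.

```python
# Illustrative sketch only: split the first data (text) into per-cut segments
# of at most max_chars characters, breaking at sentence boundaries. The
# delimiter set and greedy packing rule are assumptions, not the claimed method.
import re

def split_into_cuts(first_data_text: str, max_chars: int = 40):
    # Split on Japanese/Western sentence delimiters while keeping the delimiter.
    sentences = [s for s in re.split(r'(?<=[。．.!?！？])\s*', first_data_text) if s]
    cuts, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) > max_chars:
            cuts.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        cuts.append(current)
    return cuts, len(cuts)   # pieces of second data and division number information

second_data, division_number = split_into_cuts(
    "プレスリリースの本文。複数の文に分割される。各カットに収まる長さに整えられる。")
```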
  • the identified information analysis unit 120 analyzes the second data described above and acquires identified information.
  • the information to be identified may be any information as long as it can be analyzed by the information to be identified analysis unit 120 .
  • the identified information may be in word form defined by a language model. More specifically, it may be one or more words (for example, "Shibuya, Shinjuku, Roppongi” or "Shibuya, Landmark, Teen”) accompanied by a word vector, which will be described later.
  • the words may include words that are not usually used alone, such as "n", depending on the language model.
  • a feature vector extracted from a document, an image, or a moving image may be used instead of the above-described word format.
  • The composite content data generation unit 140 generates base data including a number of cuts (one or more) corresponding to the division number information generated by the second data generation unit 130 described above, combines the base data, in which the second data has been assigned to each cut, with material content data newly input from the user terminal 3 and/or material content data stored in the material content data storage unit 163, generates the result as composite content data, stores it in the composite content data storage unit 165, and displays it on the user terminal 3.
  • the base data generation unit 142 assigns numbers to the generated one or more cuts, such as scene 1, scene 2, scene 3 or cut 1, cut 2, cut 3, for example.
  • Fig. 5 is an example of a screen layout of cuts that make up the base data.
  • In each cut, the edited second data (for example, delimited text sentences) is inserted into the second data field 31, which is a text data field, and the material content data selected for the cut is inserted into the material content data field 32.
  • Although the second data field 31 and the material content data field 32 are shown separately in this example, the second data field 31 may instead be inserted so as to overlay the material content data field 32.
  • In that case, the second data field 31 must be arranged so as not to overlap the saliency region in the material content data.
  • Format information may also be set for each cut, such as the preset maximum number of characters and the screen layout in the case of text data, or the playback time in the case of video.
  • The composite content data does not necessarily need to be stored in the composite content data storage unit 165 immediately, and may be stored at an appropriate timing.
  • The base data to which only the second data has been assigned may also be displayed on the user terminal 3 as progress information of the composite content data.
  • the second data allocation unit 144 sequentially allocates the second data in the order of numbers assigned to one or more cuts generated by the base data generation unit 142 described above.
  • The association unit 150 compares at least part of the information to be identified included in the second data described above with, for example, extracted information extracted from the material content data (for example, class labels extracted by the classifier), determines, for example, their mutual similarity, and associates material content data suitable for the second data (for example, data having a high degree of similarity) with the second data.
  • For example, suppose material content data A (for example, an image of a woman) whose extracted information is "face" and material content data B (for example, an image of Mt. Fuji) whose extracted information is "mountain" are prepared, and the identified information included in the second data represents "teacher". The word vector obtained from "teacher" is more similar to the word vector obtained from "face" than to the word vector obtained from "mountain", so the second data is associated with the material content data A.
  • the extraction information of the material content data may be extracted in advance by the user and stored in the material content data storage unit 163, or may be extracted by the classifier 170, which will be described later.
  • the similarity determination may be performed by preparing a trained model that has learned word vectors, and using the vectors to determine the similarity of words by a method such as cosine similarity or Word Mover's Distance.
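  • The following is a minimal sketch of such a similarity comparison using cosine similarity; the toy word vectors are placeholders for embeddings produced by a trained model, and the material names are hypothetical.

```python
# Sketch: associate second data with material content data by comparing word
# vectors with cosine similarity. The vectors below are toy placeholders; in
# practice they would come from a trained word-embedding model.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

word_vectors = {                      # hypothetical pre-trained embeddings
    "teacher": np.array([0.9, 0.1, 0.3]),
    "face": np.array([0.8, 0.2, 0.4]),
    "mountain": np.array([0.1, 0.9, 0.2]),
}

identified = "teacher"
candidates = {"material A (woman)": "face", "material B (Mt. Fuji)": "mountain"}
best = max(candidates,
           key=lambda m: cosine_similarity(word_vectors[identified],
                                           word_vectors[candidates[m]]))
print(best)   # material A is chosen because "teacher" is closer to "face"
```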
  • Material content data can be, for example, image data, video data, sound data (eg, music data, voice data, sound effects, etc.), but is not limited to this.
  • The material content data may be stored in advance in the material content data storage unit 163 by the user or the administrator, or may be acquired from the network and stored in the material content data storage unit 163.
  • the material content data allocation unit 146 allocates suitable material content data to cuts to which the corresponding second data is allocated, based on the above-described association.
  • the interface information storage unit 167 stores various control information to be displayed on the display unit (display, etc.) of the administrator terminal 2 or the user terminal 3.
  • The classifier 170 acquires learning data from a learning data storage unit (not shown) and performs machine learning to create a trained model. The classifier 170 may be re-created or updated periodically.
  • The learning data for creating the classifier may be data collected from the network or data owned by the user with class labels attached, or a data set with class labels may be procured and used.
  • The classifier 170 is, for example, a trained model using a convolutional neural network, and upon input of material content data, extracts one or more pieces of extracted information (e.g., class labels).
  • For example, the classifier 170 extracts class labels representing objects associated with the material content data (e.g., seafood, grilled meat, people, furniture).
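  • As a concrete illustration, a pretrained image classifier can serve as a stand-in for the classifier 170; the sketch below uses torchvision's ResNet-18 and its ImageNet labels, which is an assumption, since the embodiment does not specify a particular model.

```python
# Sketch: class-label extraction with a pretrained CNN as a stand-in for the
# classifier 170. ResNet-18 and ImageNet categories are assumptions; the
# embodiment only requires a trained model that returns class labels.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()

def extract_class_labels(image_path: str, top_k: int = 3):
    image = Image.open(image_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)[0]
    top = torch.topk(probs, top_k)
    return [weights.meta["categories"][i] for i in top.indices]

# labels = extract_class_labels("material_content.jpg")   # e.g. ["alp", "lakeside", ...]
```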
  • FIG. 6 is a diagram explaining an example of the flow of creating composite content data.
  • the server 1 receives first data including at least identification information from the user terminal 3 via the communication unit 110 (step S101).
  • the identified information is, for example, one or more words
  • the first data may be, for example, text data consisting of an article containing one or more words or a summary of the text data.
  • The server 1 analyzes the first data with the identified information analysis unit 120 to acquire the information to be identified, and the second data generation unit 130 generates one or more pieces of second data, each containing at least part of the information to be identified, together with division number information (step S102).
  • the server 1 causes the base data generation section 142 to generate the base data including the number of cuts according to the division number information by the composite content data generation section 140 (step S103).
  • Next, the server 1 allocates the second data to the cuts by the second data allocation unit 144 (step S104).
  • the base data in this state may be displayed on the user terminal 3 so that the progress can be checked.
  • Next, the server 1 causes the association unit 150 to associate the material content data in the material content data storage unit 163 with the second data (step S105), and the material content data allocation unit 146 allocates the material content data to the cuts (step S106).
  • Next, the server 1 uses the second data placement position determination unit 180 to determine the placement position of the second data field 31 to be placed on each cut, based on the saliency region related to an object or range having salience that is detected in the image of the material content data (step S107).
  • the server 1 generates the base data to which the second data and the material content data are assigned as composite content data, stores the composite content data in the composite content data storage unit 165, and displays the composite content data on the user terminal 3 (step S108).
  • That is, the placement position of the second data field 31 in each cut is determined by the second data placement position determination unit 180 as described above, and the server 1 inserts the second data field 31 into each cut according to the placement position determined by the second data placement position determination unit 180.
  • As shown in FIG. 7, a list of the plurality of cuts forming the composite content data can be displayed on the screen. For each cut, along with the displayed material content data and second data, information on the playback time (in seconds) of the cut may also be displayed.
  • The user can, for example, correct the content by clicking the second data field 31 or a corresponding button, and replace the material content data by clicking the material content data field 32 or a corresponding button. Furthermore, the user can also add other material content data to each scene from the user terminal.
  • The steps of reading or generating the base data, assigning the second data (step S104), performing the association (step S105), assigning the material content data (step S106), and determining the placement position of the second data field 31 (step S107) may be executed in any order as long as there is no discrepancy between them; for example, the base data only needs to have been read or generated before the second data or material content data is assigned.
  • The setting of material content data by the material content data setting unit 190 using the identified information analysis unit 120, the association unit 150, the classifier 170, and the second data placement position determination unit 180 described above is one example of a setting function of the composite content data creation system, and the setting method used by the material content data setting unit 190 is not limited to this.
  • the base data is generated by the base data generation unit 142 in the above example, but it may be read from the base data storage unit 161 instead.
  • The read-out base data may include, for example, a predetermined number of blank cuts, or may be template data in which predetermined material content data and format information (for example, music data, background images, font information, etc.) have been set for each cut.
  • the user may be able to set any material content to all or part of each data field from the user terminal.
  • a setting method may be combined with a user operation, such as a user inputting arbitrary text using a user terminal, extracting information to be identified from these texts as described above, and associating material content.
  • The second data field placement position determination method is performed in step S107 above.
  • FIG. 8 is a diagram explaining an example of the flow of the second data field arrangement position determination method.
  • As shown in FIG. 8, the method includes a step in which the saliency region determination unit 182 determines a saliency region in the material content data (image data) set for each cut (S201), and a step in which the placement position specifying unit 184 specifies the placement position of the second data field 31 on each cut by referring to the saliency region of the image (S202).
  • The saliency region determination unit 182 uses a saliency determination model, which is a trained model regarding salience obtained by a known learning method such as the saliency object detection shown in FIG. 9 or the saliency map detection shown in FIG. 12, to discriminate objects and ranges having saliency in the image (hereinafter also referred to as "saliency regions").
  • the saliency determination model is stored in the model storage unit 169 of the storage unit 160, for example.
  • the saliency region determination unit 182 determines saliency regions in images based on saliency information as exemplified in the images shown in FIGS. 9 and 11 to 13 .
  • FIG. 9 shows an example of detecting a saliency object in an image using a saliency object detection model.
  • Saliency object detection using a saliency object detection model can be realized using a known technique such as an encoder-decoder model.
  • large and small mountains surrounded by dotted lines in FIG. 9 are detected as saliency objects in the image, and the saliency region discriminating unit 182 discriminates these large and small mountains as saliency objects.
  • FIGS. 10 and 11 show another example of detecting an object with salience in an image using the saliency object detection model.
  • When saliency object detection is performed on the image including an animal shown in FIG. 10 using the saliency object detection model, the contour shape of the animal, shown as a relatively bright region in FIG. 11, is detected, and the saliency region determination unit 182 discriminates the region showing this contour shape as a saliency object.
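  • As an illustration of the kind of encoder-decoder network mentioned above, the following is a minimal, untrained sketch that maps an image to a per-pixel object mask; the architecture and layer sizes are assumptions made for brevity, not the actual detector of the embodiment.

```python
# Minimal encoder-decoder sketch for saliency object detection, assuming a
# network trained to output a per-pixel mask. Layer sizes are illustrative;
# the embodiment only refers to known techniques such as encoder-decoder models.
import torch
import torch.nn as nn

class TinySaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (N, 3, H, W) image batch
        return self.decoder(self.encoder(x))    # (N, 1, H, W) saliency mask

model = TinySaliencyNet().eval()
with torch.no_grad():
    mask = model(torch.rand(1, 3, 224, 224))    # values in [0, 1]
saliency_object_mask = mask[0, 0] > 0.5         # boolean saliency region
```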
  • FIG. 12 shows an example of detecting a saliency range in an image using a saliency map detection model.
  • Saliency range detection using a saliency map detection model can be realized using a known technique, such as applying a trained model to a feature map generated by a convolutional neural network from the input image.
  • a saliency map is generated by determining the visual saliency strength of each pixel in the image.
  • the darkness of the black portion expresses the strength of visual salience.
  • the saliency region discriminating unit 182 discriminates a range occupied by a large proportion of pixels with relatively strong visual saliency (in the example shown in FIG. 12, a region of large mountains among large and small mountains) as a saliency range.
  • FIG. 13 shows another example of detecting a saliency range in an image using a saliency map detection model.
  • When saliency range detection is performed using the saliency map detection model on the image including an animal shown in FIG. 10, the range showing the outer shape of the animal, shown as the relatively bright region in FIG. 13, is discriminated as the saliency range.
  • strong visual saliency is detected in the portion corresponding to the animal's face, indicating that the saliency of that portion is particularly high.
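  • As one concrete stand-in for saliency map detection, the sketch below uses OpenCV's static spectral-residual saliency estimator (available in opencv-contrib-python) and an Otsu threshold to extract the strongly salient range; both choices are assumptions for illustration, not the model used in the embodiment.

```python
# Sketch: saliency map detection with OpenCV's static saliency estimator and a
# simple threshold to obtain the "strong saliency" range as a bounding box.
import cv2
import numpy as np

def detect_saliency_range(image_bgr: np.ndarray):
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, saliency_map = saliency.computeSaliency(image_bgr)    # float map in [0, 1]
    if not ok:
        raise RuntimeError("saliency computation failed")
    map_u8 = (saliency_map * 255).astype("uint8")
    _, mask = cv2.threshold(map_u8, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    ys, xs = np.where(mask > 0)
    # Bounding box of the range occupied by strongly salient pixels.
    return saliency_map, (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

# image = cv2.imread("cut_material.jpg")
# saliency_map, bbox = detect_saliency_range(image)
```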
  • The placement position specifying unit 184 uses a placement position determination model, which is a trained model obtained by machine learning the relationship between saliency regions in images and the placement positions of the second data field 31, to specify the placement position of the second data field 31 on the cut into which the material content data (image) is inserted.
  • the arrangement position determination model is also stored, for example, in the model storage unit 169 of the storage unit 160 in the same manner as the saliency determination model.
  • The placement position determination model can be generated by machine learning with an arbitrary learner, using as training data images to which text has been added and in which the relationship between the saliency region in the image and the placement position of the text is recognized to be good. The learner that generates the placement position determination model extracts the saliency region and the text from each training image and learns their relative positional relationship within the image. Images used as training data are preferably selected based on, for example, the following conditions for the relationship between the saliency region in the image and the placement position of the text.
  • For example: the text is placed in a direction and at a distance that allow natural movement of the line of sight from the saliency region of the object to the text; no part of the text overlaps the saliency region; and the text is placed near (or away from) a portion of the saliency region that has particularly high salience.
  • The placement position specifying unit 184 calculates a recommended area for arranging the second data field 31 for the image data set for each cut, using the placement position determination model described above. This calculation may be performed, for example, by scoring the degree of recommendation of the placement position of the second data field 31 for each pixel of the image.
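  • The following is a minimal sketch of such a per-pixel recommendation score; because the trained placement position determination model itself is not available here, the score is approximated by a simple distance-based heuristic (higher just outside the saliency region), which is purely an assumption for illustration.

```python
# Sketch: per-pixel placement recommendation scores. A distance-based heuristic
# stands in for the trained placement position determination model.
import numpy as np
from scipy.ndimage import distance_transform_edt

def pixel_placement_scores(saliency_mask: np.ndarray, preferred_distance: float = 40.0):
    # saliency_mask: boolean (H, W) array, True inside the saliency region.
    distance = distance_transform_edt(~saliency_mask)     # distance to the region
    scores = np.exp(-((distance - preferred_distance) ** 2) / (2 * 20.0 ** 2))
    scores[saliency_mask] = 0.0                            # never on the region itself
    return scores                                          # (H, W) values in [0, 1]

mask = np.zeros((180, 320), dtype=bool)
mask[60:120, 40:140] = True                                # hypothetical saliency region
scores = pixel_placement_scores(mask)
best_y, best_x = np.unravel_index(scores.argmax(), scores.shape)
```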
  • FIGS. 14 and 16 are diagrams showing the recommended second data field placement areas calculated by the placement position specifying unit 184.
  • FIG. 14 shows an example in which the recommended second data field placement area is shown to the upper right of the large mountains that are the saliency objects in the image shown in FIG. 9; in this example, the portion of the recommended area shown in high density has a high score.
  • FIG. 16 shows an example in which the recommended second data field placement area is displayed to the right of the animal that is the saliency object in the image shown in FIG. 11; in this example, the portion of the recommended area shown in low density has a high score. FIGS. 14 and 16 show the recommended areas specified by the placement position specifying unit 184 for explanation and visualization, and actually displaying the recommended areas is not essential.
  • The placement position specifying unit 184 specifies, as the placement position of the second data field 31, a portion with a higher score within the recommended placement area obtained as described above.
  • FIG. 15 shows a state in which the second data field 31 is arranged in the high-scoring portion, indicated by high density, of the recommended area in the example shown in FIG. 14, and FIG. 17 shows a state in which the second data field 31 is arranged in the high-scoring portion of the recommended area in the example shown in FIG. 16.
  • In this way, the saliency region in the image is determined based on the saliency information, and the placement position of the second data field 31 is specified in consideration of its positional relationship with the saliency region, so the server 1 can easily create composite content data in which the second data field 31 is arranged at an appropriate position with respect to the saliency region in the image on each cut.
  • the placement position specifying unit 184 of the second data placement position determining unit 180 specifies the placement position of the second data field 31 using the placement position determination model.
  • However, depending on the nature of the placement position determination model, a position where a part of the second data field 31 overlaps the saliency region in the image may be specified as the placement position of the second data field 31.
  • Therefore, the second data placement position determination unit 180 preferably determines whether the specified placement position is a position where a part of the second data field 31 overlaps the saliency region in the image, and when it determines that the second data field 31 overlaps the saliency region, specifies again, as the placement position, a position shifted so that the second data field 31 does not overlap the saliency region.
  • In this correction, the second data placement position determination unit 180 may, for example, specify the position at which the overlap can be eliminated with the shortest displacement distance from the initially specified placement position, and correct the placement position of the second data field 31 to that position.
  • the second data placement position determining unit 180 may specify a position away from the more salient portion of the saliency object as the placement position of the second data field 31 .
  • In particular, when saliency map detection is used as described with reference to FIG. 13, the overall outline of the detected saliency object may be unclear.
  • In such a case, when the second data field 31 is actually arranged at the specified position, a part of the second data field 31 may overlap the saliency region.
  • By performing the above correction, the placement position specifying unit 184 of the second data placement position determination unit 180 can place the second data field 31 at a more appropriate position with respect to the detected saliency region.
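  • The following is a minimal sketch of such a correction for axis-aligned rectangles: if the text-field rectangle overlaps the bounding box of the saliency region, it is shifted along one axis by the smallest distance that removes the overlap. The rectangle representation and the candidate-shift rule are assumptions for illustration.

```python
# Sketch: eliminate the overlap between the second data field and the saliency
# region's bounding box with the shortest displacement. Rectangles are (x, y, w, h).
def shift_to_avoid_overlap(field, saliency):
    fx, fy, fw, fh = field
    sx, sy, sw, sh = saliency
    overlap_x = min(fx + fw, sx + sw) - max(fx, sx)
    overlap_y = min(fy + fh, sy + sh) - max(fy, sy)
    if overlap_x <= 0 or overlap_y <= 0:
        return field                                   # already no overlap
    # Candidate shifts: left, right, up, down, each just enough to clear.
    shifts = [(-(fx + fw - sx), 0), (sx + sw - fx, 0),
              (0, -(fy + fh - sy)), (0, sy + sh - fy)]
    dx, dy = min(shifts, key=lambda s: abs(s[0]) + abs(s[1]))
    return (fx + dx, fy + dy, fw, fh)

corrected = shift_to_avoid_overlap(field=(100, 80, 120, 30), saliency=(90, 60, 80, 80))
# -> (100, 30, 120, 30): shifted upward by the minimal distance
```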
  • In the example described above, the degree of recommendation of the placement position of the second data field 31 is scored for each pixel of the image.
  • Because a scoring value is obtained for each pixel, detailed information about the degree of recommendation of the placement position of the second data field 31 can be obtained.
  • On the other hand, because the scoring value assigned to each individual pixel among a very large number of pixels is extremely small, the recommended placement area of the second data field 31 specified by their distribution tends to be uniform and relatively large.
  • In the following modification, when the placement position specifying unit 184 calculates the recommended area for arranging the second data field 31 for the image data set for each cut using the placement position determination model, the target image is divided into a specific number of cells in advance, the degree of recommendation of the placement position of the second data field 31 is scored for each cell, and the position of the cell with the highest scoring value is taken as the placement position of the second data field 31.
  • The functions and operations of the other components of the server 1 that implements this modification are as described above.
  • FIG. 19 is a diagram showing a modification in which the placement position specifying unit 184 specifies the placement position of the second data field 31 on the cut into which the material content data (image) is inserted.
  • the placement position specifying unit 184 divides the entire image on the cut in which the material content data (image) is inserted into a plurality of cells, and uses the placement position determination model for each cell to obtain the second data. A scoring value relating to the degree of recommendation of the placement position of the field 31 is calculated, and the cell with the highest scoring value is specified as the cell in which the second data field 31 should be placed.
  • Although FIG. 19 shows an example in which the entire image is partitioned into 18 cells (3 vertically by 6 horizontally), the number of cells partitioning the entire image may be set arbitrarily. In FIG. 19 the boundary lines separating the cells are indicated by dashed lines, but these are shown only for explanation; such boundary lines are not actually drawn in the processing by the placement position specifying unit 184.
  • First, the placement position specifying unit 184 divides the entire image of the cut into which the material content data (image) is inserted into a predetermined number of cells (see FIG. 19(a)).
  • Next, the placement position specifying unit 184 calculates the recommended area for arranging the second data field 31 for the image data set for each cut, using the placement position determination model (see FIG. 19(b)).
  • the above calculation is performed by scoring the degree of recommendation of the placement position of the second data field 31 for each cell of the image divided as shown in FIG. 19(a).
  • FIG. 19(b) shows the scoring values calculated for each cell. Although the scoring values of each cell are shown in FIG. 19(b) for explanation and visualization, the display of these scoring values is not essential in the actual processing by the arrangement position specifying unit 184.
  • the placement position specifying unit 184 specifies the cell with the highest calculated scoring value among these cells as the placement position of the second data field 31 (see FIG. 19(c)).
  • In the example shown in FIG. 19(c), the cell in the second column from the right in the top row has the highest scoring value of 0.34, so that cell is specified as the placement position of the second data field 31.
  • In FIG. 19(c), the cell specified as the placement position of the second data field 31 is shaded for explanation and visualization, but the actual processing by the placement position specifying unit 184 does not generate or display such shading.
  • the entire cut image is divided into a predetermined number of cells, and the placement position specifying unit 184 calculates the scoring value regarding the degree of recommendation of the placement position of the second data field 31 for each cell.
  • the cell with the highest scoring value among those cells is specified as the arrangement position of the second data field 31 .
  • Because the number of targets for which a scoring value is calculated is small, the difference in scoring values between cells becomes clear, and as a result it becomes easier to uniquely identify the cell with the highest scoring value as the optimal placement position of the second data field 31.
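  • The following is a minimal sketch of this cell-based variant: the image is divided into 3x6 cells, each cell is scored (here by averaging a per-pixel recommendation map, which is an assumption; any per-cell score output by the placement position determination model would do), and the highest-scoring cell is returned.

```python
# Sketch: cell-based scoring. Split the image into rows x cols cells, score
# each cell, and pick the cell with the highest score for the second data field.
import numpy as np

def best_cell(score_map: np.ndarray, rows: int = 3, cols: int = 6):
    h, w = score_map.shape
    cell_scores = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            cell = score_map[r * h // rows:(r + 1) * h // rows,
                             c * w // cols:(c + 1) * w // cols]
            cell_scores[r, c] = cell.mean()         # assumed per-cell score
    r, c = np.unravel_index(cell_scores.argmax(), cell_scores.shape)
    # Pixel rectangle (x, y, width, height) of the winning cell.
    return (c * w // cols, r * h // rows, w // cols, h // rows), cell_scores

cell_rect, scores = best_cell(np.random.rand(180, 360))
```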
  • FIG. 20 is a diagram showing various placement examples for placing the second data field 31 at the position of the cell specified by the method shown in FIG.
  • the second data field 31 is arranged in the specified cell based on the settings made in advance in the placement position specifying section 184 .
  • FIG. 20(a) shows an example in which the placement position specifying unit 184 is set so as to place the second data field 31 in the center of the specified cell.
  • the placement position specifying unit 184 places the second data field 31 in the center of the specified cell so that the center of the specified cell and the second data field 31 approximately match.
  • the left and right portions of the second data field 31 extend into cells adjacent to each other on the left and right.
  • FIG. 20(b) shows an example in which the placement position specifying unit 184 is set so as to place the second data field 31, within the specified cell, as far away as possible from the saliency region in the image (the large mountains in the example shown in FIG. 20).
  • In this case, the placement position specifying unit 184 places the second data field 31 in the specified cell so that the second data field 31 extends along the upper edge of the specified cell and the right portion of the second data field 31 extends into the adjacent cell to the right.
  • FIG. 20(c) shows an example in which the placement position specifying unit 184 is set so as to place the second data field 31, within the specified cell, at a position as close as possible to the saliency region (the large mountains) in the image.
  • In this case, the placement position specifying unit 184 places the second data field 31 in the specified cell so that the second data field 31 extends along the bottom edge of the specified cell and a portion of the second data field 31 extends into the adjacent cell to the left.
  • the above-described various settings of the placement position specifying unit 184 can be appropriately changed in the server 1 according to user input received from the user terminal 3 via the communication unit 110, for example.
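  • The following is a minimal sketch of these placement settings within the identified cell ("center", "far" from the saliency region, or "near" it); it adjusts only the horizontal position and assumes the field fits within the cell width, which are simplifications for illustration rather than the exact layout rule of the embodiment.

```python
# Sketch: place the second data field inside the identified cell according to a
# setting. Rectangles are (x, y, w, h); only horizontal adjustment is sketched.
def place_in_cell(cell, field_size, saliency_center_x, mode="center"):
    cx, cy, cw, ch = cell
    fw, fh = field_size                       # assumed to fit inside the cell width
    y = cy + (ch - fh) // 2
    saliency_on_left = saliency_center_x < cx + cw / 2
    if mode == "center":                      # cf. FIG. 20(a)
        x = cx + (cw - fw) // 2
    elif mode == "far":                       # cf. FIG. 20(b): hug the edge away from it
        x = cx + cw - fw if saliency_on_left else cx
    else:                                     # "near", cf. FIG. 20(c): hug the edge toward it
        x = cx if saliency_on_left else cx + cw - fw
    return (x, y, fw, fh)

field_rect = place_in_cell(cell=(200, 0, 120, 60), field_size=(90, 24),
                           saliency_center_x=80, mode="far")   # -> (230, 18, 90, 24)
```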
  • Depending on these settings, the second data field 31 placed in the specified cell may protrude outside the area of the entire image of the cut, or may overlap the saliency region.
  • If the second data field 31 placed as described with reference to FIGS. 20(a)-(c) does not satisfy these constraints, the second data placement position determination unit 180 in this example preferably determines whether the placed second data field 31 protrudes outside the area of the entire image of the cut or overlaps the saliency region, and when it determines that at least one of these applies, specifies again, as the placement position, a position shifted so that the second data field 31 satisfies the above constraints.
  • As described above, an example has been described in which the placement position specifying unit 184 specifies the placement position of the second data field 31 on the cut into which the material content data (image) is inserted, using the placement position determination model, which is a trained model obtained by machine learning the relationship between saliency regions in images and the placement positions of the second data field 31.
  • According to this method, the second data field 31 can be arranged at an appropriate position in consideration of the positional relationship between the saliency region and the second data field 31.
  • On the other hand, this method requires a certain computational cost; in particular, scoring every pixel of the image data requires a large computational cost.
  • FIG. 21 is a diagram showing another example of specifying the placement position of the second data field 31 on the cut into which the material content data (image) is inserted by the placement position specifying unit 184. As shown in FIG.
  • the placement position specifying unit 184 divides the entire cut image into which the material content data (image) is inserted into a plurality of cells, and specifies the optimum cell in which the second data field 31 should be placed.
  • Although FIG. 21 shows an example in which the entire image is partitioned into 9 cells (3 vertically by 3 horizontally), the number of cells partitioning the entire image may be set arbitrarily.
  • In FIG. 21 the boundary lines separating the cells are indicated by dashed lines, but these are shown only for explanation; such boundary lines are not actually drawn in the processing by the placement position specifying unit 184. Note that the functions and operations of the other components of the server 1 that implements this modification are as described above.
  • the placement position specifying unit 184 divides the entire cut image into which the material content data (image) is inserted into a predetermined number of cells (see FIG. 21(a)).
  • Next, the placement position specifying unit 184 excludes, from the placement position candidates for the second data field 31, the cells that include at least part of the saliency region specified by the saliency region determination unit 182 (see FIG. 21(b)).
  • In FIG. 21(b), the excluded cells are indicated by X marks for explanation and visualization, but the X marks are not actually generated or displayed in the processing by the placement position specifying unit 184.
  • Then, the placement position specifying unit 184 specifies, as the placement position of the second data field 31, the cell closest to the saliency region in the image among the remaining cells that have not been excluded (see FIG. 21(c)).
  • In FIG. 21(c), the cell specified as the placement position of the second data field 31 is shaded for explanation and visualization, but no such shading is actually generated or displayed.
  • In this example, out of the plurality of placement position candidates generated by dividing the entire cut image into a predetermined number of cells, the cell that is most suitable under a predetermined condition is identified as the placement position of the second data field 31.
  • Although the appropriateness of the placement position with respect to the saliency region may be somewhat reduced compared with the methods described above, this method has the advantage of a lower computational cost.
  • the condition for selecting the placement position of the second data field 31 is "closest to the saliency region", but the selection condition for the placement position is not limited to this.
  • For example, the condition may be a position at a predetermined distance from the saliency region, a position farthest from the saliency region, a position in a predetermined direction from the saliency region, or the like, and such conditions may be combined arbitrarily as long as they do not contradict one another.
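  • The following is a minimal sketch of this exclusion-based variant: the image is divided into 3x3 cells, every cell containing any part of the saliency mask is excluded, and among the remaining cells the one closest to the saliency region is selected. Using the distance between cell center and saliency centroid as "closest" is an assumption for illustration.

```python
# Sketch: exclusion-based cell selection (cf. FIG. 21). Cells containing part of
# the saliency mask are excluded; the closest remaining cell is selected.
import numpy as np

def select_cell_excluding_saliency(saliency_mask: np.ndarray, rows: int = 3, cols: int = 3):
    h, w = saliency_mask.shape
    ys, xs = np.nonzero(saliency_mask)
    centroid = np.array([ys.mean(), xs.mean()])
    best, best_dist = None, np.inf
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            if saliency_mask[y0:y1, x0:x1].any():
                continue                              # excluded: contains saliency
            center = np.array([(y0 + y1) / 2, (x0 + x1) / 2])
            dist = np.linalg.norm(center - centroid)
            if dist < best_dist:
                best, best_dist = (x0, y0, x1 - x0, y1 - y0), dist
    return best                                       # None if every cell was excluded

mask = np.zeros((180, 320), dtype=bool)
mask[70:130, 120:220] = True                          # hypothetical saliency region
cell_rect = select_cell_excluding_saliency(mask)      # cell just above the region
```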

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The problem to be solved by the present invention is to provide a server or the like that makes it possible to easily create composite content data, and in particular to arrange a text sentence in consideration of its placement position relative to a saliency region in a moving image. According to one embodiment of the present invention, the solution is a server 1 comprising: a material content data setting unit 190 that sets image data for a cut; and a text field placement position determination unit 180 that determines the position of a text field (second data field) to be placed on the cut, the determination being made with reference to a saliency region in the image data.
PCT/JP2021/011672 2021-03-22 2021-03-22 Serveur, procédé de position d'agencement de champ de texte et programme WO2022201237A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/011672 WO2022201237A1 (fr) 2021-03-22 2021-03-22 Serveur, procédé de position d'agencement de champ de texte et programme

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/011672 WO2022201237A1 (fr) 2021-03-22 2021-03-22 Serveur, procédé de position d'agencement de champ de texte et programme

Publications (1)

Publication Number Publication Date
WO2022201237A1 true WO2022201237A1 (fr) 2022-09-29

Family

ID=83395279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/011672 WO2022201237A1 (fr) 2021-03-22 2021-03-22 Serveur, procédé de position d'agencement de champ de texte et programme

Country Status (1)

Country Link
WO (1) WO2022201237A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014038601A (ja) * 2012-08-16 2014-02-27 Naver Corp イメージ分析によるイメージ自動編集装置、方法およびコンピュータ読み取り可能な記録媒体
WO2016053820A1 (fr) * 2014-09-30 2016-04-07 Microsoft Technology Licensing, Llc Optimisation de la lisibilité d'un texte affiché
US20200310631A1 (en) * 2017-11-20 2020-10-01 Huawei Technologies Co., Ltd. Method and Apparatus for Dynamically Displaying Icon Based on Background Image

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014038601A (ja) * 2012-08-16 2014-02-27 Naver Corp イメージ分析によるイメージ自動編集装置、方法およびコンピュータ読み取り可能な記録媒体
WO2016053820A1 (fr) * 2014-09-30 2016-04-07 Microsoft Technology Licensing, Llc Optimisation de la lisibilité d'un texte affiché
US20200310631A1 (en) * 2017-11-20 2020-10-01 Huawei Technologies Co., Ltd. Method and Apparatus for Dynamically Displaying Icon Based on Background Image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LU PENG; ZHANG HAO; PENG XUJUN; JIN XIAOFU: "Learning the Relation Between Interested Objects and Aesthetic Region for Image Cropping", IEEE TRANSACTIONS ON MULTIMEDIA, IEEE, USA, vol. 23, 9 October 2020 (2020-10-09), USA, pages 3618 - 3630, XP011884057, ISSN: 1520-9210, DOI: 10.1109/TMM.2020.3029882 *

Similar Documents

Publication Publication Date Title
CN109618222B (zh) 一种拼接视频生成方法、装置、终端设备及存储介质
US10949744B2 (en) Recurrent neural network architectures which provide text describing images
CN111062871B (zh) 一种图像处理方法、装置、计算机设备及可读存储介质
US10068380B2 (en) Methods and systems for generating virtual reality environments from electronic documents
EP3520081B1 (fr) Techniques permettant d'incorporer une image contenant du texte dans une image numérique
CN109803180B (zh) 视频预览图生成方法、装置、计算机设备及存储介质
CN111373740B (zh) 使用选择界面将横向视频转换成纵向移动布局的方法
CN110134931B (zh) 媒介标题生成方法、装置、电子设备及可读介质
CN110023927B (zh) 用于将布局应用于文档的系统和方法
US7945142B2 (en) Audio/visual editing tool
CN110795925B (zh) 基于人工智能的图文排版方法、图文排版装置及电子设备
CN111460183A (zh) 多媒体文件生成方法和装置、存储介质、电子设备
US20230027412A1 (en) Method and apparatus for recognizing subtitle region, device, and storage medium
US20230115551A1 (en) Localization of narrations in image data
JP2020005309A (ja) 動画編集サーバおよびプログラム
Chu et al. Optimized comics-based storytelling for temporal image sequences
WO2019245033A1 (fr) Serveur et programme d'édition d'images animées
CN112287168A (zh) 用于生成视频的方法和装置
WO2022200110A1 (fr) Évaluation de qualité de contenu multimédia
US20220303459A1 (en) Enhancing quality of multimedia
CN113407696A (zh) 收集表处理方法、装置、设备以及存储介质
WO2022201237A1 (fr) Serveur, procédé de position d'agencement de champ de texte et programme
CN111881900A (zh) 语料生成、翻译模型训练、翻译方法、装置、设备及介质
KR20210003547A (ko) Gan을 이용한 웹사이트 자동생성 방법, 장치 및 프로그램
RU2739262C1 (ru) Способ управления предъявлением информации

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932851

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21932851

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP