WO2022059817A1

WO2022059817A1 - Ai-based minimal contextual exploration method on basis of meta-information recognition that can be known from dialogues and backgrounds of images and videos

Info

Publication number: WO2022059817A1
Application number: PCT/KR2020/012598
Authority: WO
Inventors: 박문수
Original assignee: 주식회사 사이
Priority date: 2020-09-18
Filing date: 2020-09-18
Publication date: 2022-03-24

Abstract

Disclosed is an AI-based minimal contextual exploration method based on the recognition of meta-information that can be inferred from the dialogue and backgrounds of images and videos. The method comprises a trailer analysis step of extracting dialogue and backgrounds from a trailer video, recognizing meta-information based on the extracted dialogue and backgrounds, and then performing artificial intelligence-based minimal contextual exploration using the recognized meta-information.

Description

AI minimal contextual exploration method of meta-information recognition that can be known from the dialogue and background of images and videos

The present invention relates to image-related technology, and more particularly, to meta-information recognition-based AI minimal context search technology.

Korean Patent Application Laid-Open No. 10-2015-0011652 discloses generating a clip video and providing a preview video using the generated clip video. Its method of generating video section data is as follows. In the first step, a moving picture is reproduced in a first area of a display using moving picture data. In the second step, a section within the video is selected based on one or more signals input through a user interface. In the third step, an image representing the section is generated using the data corresponding to the selected section of the video data. In the fourth step, the representative image is displayed in a second area of the display. In the fifth and last step, when the representative image is selected through the user interface, video section data corresponding to the selected section is generated.

An object of the present invention is to provide an AI-based minimal context exploration method that recognizes meta information that can be inferred from the dialogue and background of a trailer video.

Another object of the present invention is to provide a technical method for automatically generating, from one or more pieces of video content, a trailer image composed only of scenes containing emotions preferred by a user.

According to an aspect, the AI minimal context exploration method of meta-information recognition that can be known from the dialogue and backgrounds of images and videos includes a trailer analysis step of extracting dialogue and backgrounds from a trailer video, recognizing meta information based on the extracted dialogue and backgrounds, and then performing artificial intelligence-based minimal contextual exploration using the recognized meta information.

According to the present invention, AI-based minimal context exploration can be performed using meta information recognized from the dialogue and background of images and videos.

FIG. 1 is a block diagram of an AI minimal context search system according to an embodiment.

FIG. 2 is a flowchart of an AI minimal context search method according to an embodiment.

FIG. 3 is a diagram illustrating emotion items.

FIG. 4 is a detailed flowchart of S200 according to an embodiment.

FIG. 5 is a diagram illustrating a process of generating a clip image and clip information.

FIG. 6 is a detailed flowchart of S230 according to an embodiment.

FIG. 7 is a diagram illustrating a process of converting clip information into a multidimensional vector.

FIG. 8 is a diagram illustrating vector grouping.

FIG. 9 is a diagram illustrating words classified by group.

FIG. 10 is an exemplary diagram illustrating a process of extracting emotion words from clip information.

The foregoing and further aspects of the present invention will become more apparent through preferred embodiments described with reference to the accompanying drawings. Hereinafter, the present invention will be described in detail so that those skilled in the art can easily understand and reproduce it through these examples.

FIG. 1 is a block diagram of an AI minimal context search system according to an embodiment. As shown in FIG. 1, the AI minimal context search system includes a clip generation unit 100, a clip emotion mapping unit 200, and a trailer generation unit 300, and may further include a user preference information generation unit 400. In addition, although not shown, the AI minimal context search system may further include a trailer analysis unit. All of these components can be implemented in software and executed by one or more processors. The AI minimal context search system of FIG. 1 may be configured in a user device, or in a server system that provides trailer images to the user device. Alternatively, the components of FIG. 1 may be split, with some configured in the user device and the rest in the server system.

The clip generation unit 100 generates a plurality of clip images from video content. When one or more moving images are given as input, the clip generation unit 100 divides each moving image into a plurality of pieces to generate a plurality of clip images. In one embodiment, for video sections in which subtitles exist, the clip generation unit 100 cuts the video along the subtitles to generate clip images, and for video sections without subtitles, it cuts the video by scene or by time interval. The generated clip images are stored in storage.
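The cutting rule can be illustrated with a minimal sketch. It assumes subtitles arrive as non-overlapping (start, end, text) spans in seconds, and it implements only the time-interval fallback for subtitle-free sections (scene detection is omitted). The names `Subtitle`, `Clip`, `cut_clips`, and the `window` parameter are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subtitle:
    start: float  # seconds
    end: float
    text: str


@dataclass
class Clip:
    start: float
    end: float
    caption: Optional[str]  # None for sections without subtitles


def cut_clips(duration: float, subtitles: List[Subtitle], window: float = 5.0) -> List[Clip]:
    """Subtitle-bounded clips where subtitles exist; fixed time windows elsewhere."""
    clips: List[Clip] = []
    cursor = 0.0
    for sub in sorted(subtitles, key=lambda s: s.start):
        # Fill the subtitle-free gap before this subtitle with fixed-length clips.
        while cursor < sub.start:
            end = min(cursor + window, sub.start)
            clips.append(Clip(cursor, end, None))
            cursor = end
        # One clip per subtitle span (assumes subtitles do not overlap).
        clips.append(Clip(sub.start, sub.end, sub.text))
        cursor = max(cursor, sub.end)
    # Cut the remainder after the last subtitle into fixed-length clips.
    while cursor < duration:
        end = min(cursor + window, duration)
        clips.append(Clip(cursor, end, None))
        cursor = end
    return clips
```

For a 12-second video with one subtitle at 3 to 5 s and a 5-second window, this yields the clips 0 to 3 s, 3 to 5 s (captioned), 5 to 10 s, and 10 to 12 s.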

The clip emotion mapping unit 200 analyzes each clip image and maps one or more emotion items to it. In other words, the clip emotion mapping unit 200 recognizes the universal emotions a person feels from a clip image, then maps the recognized emotions to that clip image and manages the mapping. In one embodiment, when a clip image includes subtitles, the clip emotion mapping unit 200 maps emotion items to it based on both subtitle analysis and image analysis; when a clip image does not include subtitles, it maps emotion items based on image analysis alone. The mapping information for each clip image is stored and managed in a database.

The trailer generation unit 300 generates a trailer image by combining some of the clip images of a target video, taking into account the emotion items mapped to those clip images. Here, the target video refers to video content designated by the user. In an embodiment, the trailer generation unit 300 generates the trailer image from clip images to which emotion items belonging to the user preference emotion information are mapped, which makes it possible to create a trailer image customized for the user. For example, the emotion items belonging to the user preference emotion information may include anger, fear, and sadness.

Referring to FIG. 1, the clip emotion mapping unit 200 may include a clip information generation unit 210 and an emotion mapping unit 230. The clip information generation unit 210 generates clip information for each clip image. Clip information is text that serves as meta information about a clip image. The clip information generation unit 210 may perform subtitle analysis and image analysis and generate clip information from the analysis results; for clip images without subtitles, only image analysis is performed. Accordingly, the clip information of a clip image with subtitles may include both caption text and image description text, while that of a clip image without subtitles includes only image description text. The emotion mapping unit 230 maps one or more emotion items to each clip image using the clip information generated by the clip information generation unit 210, that is, based on the text included in the clip information. In an embodiment, the emotion mapping unit 230 vectorizes the clip information and then maps emotion items to the clip image by analyzing the emotions expressed by the resulting vectors.
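The rule for assembling clip information can be sketched as follows. The patent does not specify the image analysis model, so `describe_image` is a hypothetical callable standing in for any image-captioning model; `build_clip_info` is likewise an illustrative name.

```python
from typing import Callable, List, Optional


def build_clip_info(caption: Optional[str], frame: object,
                    describe_image: Callable[[object], str]) -> List[str]:
    """Return clip information (text meta information) for one clip image."""
    info: List[str] = []
    # Subtitle analysis applies only when the clip has subtitles.
    if caption:
        info.append(caption)
    # Image analysis applies to every clip; describe_image is a hypothetical captioner.
    info.append(describe_image(frame))
    return info
```

With a stub captioner that returns “A girl and an old woman standing side to side”, a captioned clip yields both texts, while a clip without subtitles yields only the image description.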

Referring to FIG. 1, the emotion mapping unit 230 may include a vector generation unit 231, a vector grouping unit 232, and a mapping unit 233. The vector generation unit 231 converts the clip information generated by the clip information generation unit 210 into multidimensional vectors using a model pre-trained through machine learning. The vector grouping unit 232 clusters the multidimensional vectors into groups; that is, vectors with similar values are classified into the same group (cluster). Each group is assigned a unique emotion item, so a group may also be referred to as an emotion group (emotion cluster). The mapping unit 233 maps one or more emotion items to the corresponding clip image according to the unique emotion item of each group: the clip information of a clip image is converted into multidimensional vectors, the vectors are grouped, and the emotion items assigned to the one or more groups to which those vectors belong are mapped to the clip image. In an embodiment, the mapping unit 233 maps to the clip image only the emotion items of groups that contain at least a predetermined number of the vectors.
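The grouping and mapping steps can be sketched as below. The patent does not name a clustering algorithm, so plain k-means (with the first k vectors as initial centroids) is an assumption. The group-to-emotion table and the threshold `min_count` are also hypothetical, and the threshold is read here as applying to the vectors of a single clip.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain k-means; returns the group index of each vector."""
    centroids = [list(v) for v in vectors[:k]]  # deterministic init: first k vectors
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a group becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def map_emotions(clip_labels: List[int], group_emotion: Dict[int, str], min_count: int) -> Set[str]:
    """Emotion items of the groups holding at least min_count of one clip's vectors."""
    counts = Counter(clip_labels)
    return {group_emotion[g] for g, n in counts.items() if n >= min_count}
```

In use, the vectors of all clip information would be clustered together, and each clip's own subset of labels would then be passed to `map_emotions`.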

Referring to FIG. 1, the clip emotion mapping unit 200 may further include a clip information preprocessing unit 220, which preprocesses the clip information generated by the clip information generation unit 210. In an embodiment, the clip information preprocessing unit 220 removes unnecessary words from the clip information through preprocessing that includes normalization, tokenization, and stemming. The preprocessed clip information is then passed to the emotion mapping unit 230.
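A minimal sketch of the three preprocessing operations on English clip information is shown below. The stop-word list is a small hypothetical one covering articles, conjunctions, and prepositions, and `stem` is a toy suffix stripper standing in for a real stemmer such as Porter's; neither comes from the patent.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list (articles, conjunctions, prepositions).
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "to", "of", "in", "on", "at", "by", "with"}


def stem(token: str) -> str:
    """Toy suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    text = text.lower()                   # normalization: case folding
    tokens = re.findall(r"[a-z']+", text)  # tokenization: runs of letters
    # Stop-word removal followed by stemming.
    return [stem(t) for t in tokens if t not in STOP_WORDS]
```

Applied to the FIG. 5 clip information, “A girl and an old woman standing side to side” becomes `["girl", "old", "woman", "stand", "side", "side"]`.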

The user preference information generation unit 400 generates user preference emotion information, which is used to provide the user with a trailer image customized to the user's preferred emotions. In an embodiment, the user preference information generation unit 400 generates the user preference emotion information based on the emotion items of the clip images constituting one or more pieces of video content preferred by the user. That is, the video content preferred by the user is processed by the clip generation unit 100 and the clip emotion mapping unit 200, and from the result the user preference information generation unit 400 generates user preference emotion information composed of the emotion items the user prefers.
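The patent does not state how the preferred emotion items are chosen from the mapped clips. One simple assumption, sketched below, ranks emotion items by how many clips of the user's preferred videos carry them and keeps the top few; `preferred_emotions` and `top_k` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def preferred_emotions(clip_emotions: Iterable[Set[str]], top_k: int = 3) -> List[str]:
    """Rank emotion items by how many clips of the user's preferred videos carry them."""
    counts: Counter = Counter()
    for emotions in clip_emotions:
        counts.update(emotions)
    return [emotion for emotion, _ in counts.most_common(top_k)]
```

A frequency ranking keeps the sketch short; a real system might instead weight clips by viewing time or require a minimum share of clips.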

Meanwhile, the trailer analysis unit 500 may extract dialogue and background from the trailer image, recognize meta information based on the extracted dialogue and background, and then use the recognized meta information to perform AI-based minimal context exploration.

FIG. 2 is a flowchart of an AI minimal context search method according to an embodiment. The clip generation unit 100 generates a plurality of clip images by dividing the target moving image (S100). In S100, the clip generation unit 100 may generate clip images along the subtitles for video sections in which subtitles exist, and may generate clip images by cutting by scene or by time interval for video sections without subtitles. The clip emotion mapping unit 200 analyzes each clip image and maps one or more emotion items to it (S200). Examples of the full set of emotion items are shown in FIG. 3. The set may consist of positive, negative, and neutral emotions as shown in FIG. 3(A); of Anger, Disgust, Fear, Happiness, Sadness, and Surprise as shown in FIG. 3(B); or of a more diverse set as shown in FIG. 3(C).

The trailer generation unit 300 generates a trailer image by combining some of the clip images of the target video (S300). In S300, the trailer generation unit 300 generates the trailer image taking into account the emotion items mapped to the clip images of the target video, and may generate it only from clip images having an emotion item preferred by the user. For example, when the user's preferred emotion items are happiness, sadness, and surprise, the trailer image is generated from the clip images to which those items are mapped. In an embodiment, the trailer generation unit 300 generates the trailer image by randomly selecting and combining some of the clip images having an emotion item preferred by the user. Thereafter, the trailer analysis unit (not shown) extracts dialogue and backgrounds from the trailer image, recognizes meta information based on them, and uses the recognized meta information to perform artificial intelligence-based minimal context exploration.
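The random-selection embodiment of S300 can be sketched as follows, assuming clips are identified by their index in playback order. The function name, the clip count `n`, the `seed` parameter, and the choice to re-sort the picks into playback order are all hypothetical. Concatenating the actual video segments is omitted.

```python
import random
from typing import Dict, List, Optional, Set


def select_trailer_clips(clip_emotions: Dict[int, Set[str]], preferred: Set[str],
                         n: int, seed: Optional[int] = None) -> List[int]:
    """Randomly pick up to n clip indices that carry at least one preferred emotion item."""
    # Keep only clips whose emotion items intersect the user's preferred items.
    candidates = sorted(idx for idx, emos in clip_emotions.items() if emos & preferred)
    rng = random.Random(seed)
    picked = rng.sample(candidates, min(n, len(candidates)))
    # Hypothetical design choice: present the selected clips in playback order.
    return sorted(picked)
```

With the clip examples of S230 (anger and fear for one clip, happiness for another, fear and sadness for a third) and preferred items happiness, sadness, and surprise, only the second and third clips are eligible.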

FIG. 4 is a detailed flowchart of S200 according to an embodiment. The clip information generation unit 210 generates clip information for each clip image (S210). The clip information may include caption text and image description text obtained through subtitle analysis and image analysis. FIG. 5 illustrates the process of generating clip images from one video and then analyzing them to generate clip information, using “Moana” as the target video. Subtitles and images are analyzed for video sections that include subtitles, and only image analysis is performed for sections that do not. From the analysis results, text such as “Thanks, Moana” and “A girl and an old woman standing side to side” is generated as clip information.

The clip information preprocessing unit 220 preprocesses the clip information of each clip image (S220). Preprocessing removes unnecessary words, such as articles, conjunctions, and prepositions, from the clip information. The emotion mapping unit 230 maps one or more emotion items to each clip image using the clip information (S230). For example, anger and fear may be mapped to clip image A, happiness to clip image B, and fear and sadness to clip image C.

FIG. 6 is a detailed flowchart of S230 according to an embodiment. The vector generation unit 231 converts the clip information into multidimensional vectors (S231). As illustrated in FIG. 7, the clip information “Thanks, Moana” and “A girl and an old woman standing side to side” is given as input to the trained model and converted into vectors. The vector grouping unit 232 clusters the multidimensional vectors into groups (S232); as illustrated in FIG. 8, vectors with similar values are grouped together. FIG. 9 shows words that frequently appear in each group when the groups are a positive emotion group, a negative emotion group, and a no-emotion group. The mapping unit 233 maps one or more emotion items to the corresponding clip image according to the emotion item of each group (S233).
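The patent converts clip information into vectors with a pre-trained model that it does not identify. Purely to make S231 concrete, the sketch below substitutes a hashing-trick bag of words, which is a different and much simpler technique; `hash_vector` and the dimension `dim` are hypothetical.

```python
import hashlib
import math
from typing import List


def hash_vector(tokens: List[str], dim: int = 16) -> List[float]:
    """Hashing-trick bag of words, L2-normalized; a stand-in for a pre-trained encoder."""
    vec = [0.0] * dim
    for tok in tokens:
        # md5 gives the same bucket across runs (Python's built-in hash() is salted).
        idx = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec
```

Unlike a learned encoder, this representation places texts near each other only when they share tokens, so it captures none of the semantic similarity the patent relies on for grouping.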

FIG. 10 is an exemplary diagram illustrating a process of extracting emotion words from clip information. The Naive Bayes classifier is a well-known algorithm for sentiment analysis; it is trained on a large dataset to produce a pre-trained model. The clip information text is preprocessed through normalization, tokenization, and stemming and then input to this model, which outputs an emotion word. This emotion word corresponds to the vector described above.
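A from-scratch multinomial Naive Bayes classifier with add-one smoothing is sketched below to show how a trained model turns preprocessed clip information into an emotion label. The tiny training set in the usage note and the positive and negative labels are hypothetical; the patent's training data and label set are not disclosed.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing over token lists."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for tokens in docs for t in tokens}
        self.total_docs = len(docs)
        return self

    def log_posterior(self, tokens: List[str], label: str) -> float:
        """Unnormalized log P(label | tokens)."""
        counts = self.token_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / self.total_docs)  # log prior
        for t in tokens:
            if t in self.vocab:  # ignore tokens never seen in training
                score += math.log((counts[t] + 1) / denom)
        return score

    def predict(self, tokens: List[str]) -> str:
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))
```

For example, after fitting on `[["thank", "love"], ["happy", "thank"], ["scare", "dark"], ["cry", "dark"]]` labeled positive, positive, negative, negative, the token list `["thank", "moana"]` is classified as positive, because "moana" is unseen and "thank" occurs only in positive examples.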

Meanwhile, the method described above can be implemented as a computer program. The code and code segments constituting such a program can be easily inferred by computer programmers skilled in the art. Such a program may be stored on a computer-readable recording medium and read and executed by a computer to implement the method. The recording medium may be a magnetic recording medium, an optical recording medium, or the like.

So far, the present invention has been described with reference to preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention can be implemented in modified forms without departing from its essential characteristics. Therefore, the disclosed embodiments should be considered illustrative rather than restrictive. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the scope of equivalents should be construed as being included in the present invention.

Claims

1. An AI minimal contextual exploration method of meta-information recognition that can be known from the dialogue and background of images and videos, the method comprising:

a trailer analysis step of extracting dialogue and background from a trailer image, recognizing meta information based on the extracted dialogue and background, and performing artificial intelligence-based minimal context exploration using the recognized meta information.

2. The method of claim 1, further comprising:

a clip generating step of generating a plurality of clip images from video content;

a clip emotion mapping step of analyzing each clip image and mapping one or more emotion items to it; and

a trailer generating step of generating a trailer image by combining some of the clip images of a target video, taking into account the emotion items mapped to the clip images.

3. The method of claim 1, wherein the clip emotion mapping step comprises:

a clip information generation step of analyzing a clip image to generate clip information; and

an emotion mapping step comprising a vector generation step of converting the clip information into multidimensional vectors, a vector grouping step of clustering and grouping the multidimensional vectors, and a mapping step of mapping one or more emotion items to the clip image according to the unique emotion item of each group.

4. The method of claim 3, wherein the clip emotion mapping step further comprises:

a clip information preprocessing step of preprocessing the clip information.

5. The method of claim 1, wherein the trailer generating step generates the trailer image from clip images to which emotion items belonging to user preference emotion information are mapped.

6. The method of claim 5, further comprising:

a user preference information generation step of generating the user preference emotion information based on the emotion items of clip images constituting one or more pieces of video content preferred by the user.