WO2022050632A1

WO2022050632A1 - Multimedia automatic generation system for automatically generating multimedia appropriate for user voice data by using artificial intelligence

Info

Publication number: WO2022050632A1
Application number: PCT/KR2021/011413
Authority: WO
Inventors: 이수민
Original assignee: 주식회사 웨인힐스벤처스
Priority date: 2020-09-03
Filing date: 2021-08-26
Publication date: 2022-03-10
Also published as: KR102213618B1

Abstract

The present invention relates to a multimedia automatic generation system for automatically generating and distributing multimedia appropriate for user voice data by using artificial intelligence, and the system may comprise: a voice input unit for receiving user voice data transmitted from a user terminal; a text conversion unit for converting voice data into text data; a video search unit for searching for unit video data corresponding to the text data; and a video generation unit for producing customized video data by combining the unit video data.

Description

Multimedia automatic creation system that automatically creates multimedia suitable for user's voice data using artificial intelligence

The present invention relates to an automatic multimedia generation system that automatically generates multimedia suitable for a user's voice data using artificial intelligence, and more particularly, to an automatic multimedia generation system that converts a user's voice into text, automatically generates multimedia related to the content of the converted text, and automatically distributes the generated multimedia through a media platform.

As smartphones have become widespread and wireless communication networks capable of high-speed data transmission have expanded, it has become common for many people to obtain information by watching videos on their smartphones.

Accordingly, text information attracts less and less interest, and attempts are being made to convert such text information into video. However, it is very difficult for ordinary people without professional video-editing knowledge to turn text information into video information.

The technical task to be achieved by the automatic multimedia generation system according to the technical idea of the present invention is to provide an automatic multimedia generation system that converts a user's voice into text using artificial intelligence, automatically generates multimedia related to the content of the converted text, and automatically distributes the generated multimedia through a media platform.

The technical tasks to be achieved by the automatic multimedia generation system according to the technical spirit of the present invention are not limited to the task mentioned above, and other tasks not mentioned will be clearly understood by those skilled in the art from the following description.

According to an embodiment of the present invention, an automatic multimedia generation system includes: a voice input unit connected to a user terminal through a network and receiving user voice data from the user terminal; a text converter for converting voice data into text data; a video search unit for searching unit video data corresponding to text data; and a video generation unit for producing customized video data by combining unit video data.

The text conversion unit may extract a keyword from sentences included in the text data to generate a summary sentence including one or more keywords.

The automatic multimedia generation system may further include a video database in which unit video data is stored, and the video search unit may search the video database for a plurality of unit video data corresponding to the keywords included in one summary sentence.

The automatic multimedia generation system may further include a feedback collecting unit that receives, from the user terminal, a score evaluating whether the retrieved unit video data suits the keywords of the summary sentence on which the search was based.

The feedback collecting unit may receive, from user terminals, a suitability score for how well a summary sentence matches the corresponding customized video data, and may assign the received score as a suitability score between each keyword included in the summary sentence and the unit video data that is included in the customized video data and corresponds to that keyword. Scores transmitted from other user terminals are accumulated into this suitability score, and the reciprocal of the accumulated suitability score may be set as the suitability distance between the keyword and the unit video data. The keywords may then be virtually arranged around each piece of unit video data, each at its suitability distance, and this arrangement may be stored in the video database.

The video search unit may search for unit video data whose search radius encloses all keywords included in the summary sentence, and may select the unit video data having the smallest search radius.

The automatic multimedia generation system may further include: a music database in which music data is stored; and a music search unit that searches for music data corresponding to the customized video data, wherein the music search unit may search the music database for a plurality of pieces of music data corresponding to one piece of customized video data.

The feedback collecting unit may receive, from user terminals, scores for how well the customized video data matches the corresponding music data, and may assign each received score as a suitability score between the music data and each piece of unit video data included in the customized video data. Scores transmitted from other user terminals are accumulated into this suitability score, and the reciprocal of the accumulated suitability score may be set as the suitability distance between the unit video data and the music data. The unit video data may then be virtually arranged around each piece of music data, each at its suitability distance, and this arrangement may be stored in the music database.

The music search unit may search for music data whose search radius encloses all unit video data included in the customized video data, and may select the music data having the smallest search radius.

The automatic multimedia generation system may further include a subtitle generator for adding summary sentences corresponding to the customized video data as subtitles.

The automatic multimedia generation system according to embodiments of the technical idea of the present invention converts a user's voice into text using artificial intelligence, automatically generates multimedia related to the content of the converted text, and automatically distributes the generated multimedia through a media platform.

However, the effects achievable by the automatic multimedia generation system according to an embodiment of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

In order to more fully understand the drawings cited herein, a brief description of each drawing is provided.

FIG. 1 is a schematic diagram of a system for automatically generating multimedia according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating an automatic multimedia generation system according to an embodiment of the present invention.

FIG. 3 is a diagram schematically illustrating a process of generating customized video data by an automatic multimedia generation system according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating an example in which the automatic multimedia generation system selects optimal unit video data according to an embodiment of the present invention.

Since the present invention can have various changes and can have various embodiments, specific embodiments are illustrated in the drawings and will be described in detail through the detailed description. However, this is not intended to limit the present invention to specific embodiments, and it should be understood that the present invention includes all modifications, equivalents and substitutes included in the spirit and scope of the present invention.

In describing the present invention, if it is determined that a detailed description of a related known technology may unnecessarily obscure the gist of the present invention, the detailed description thereof will be omitted. In addition, numbers (e.g., first, second) used in describing this specification are merely identifiers that distinguish one component from another.

In addition, in this specification, when a component is referred to as being “connected” or “coupled” to another component, the component may be directly connected or coupled to the other component, but it should be understood that, unless otherwise stated, it may also be connected or coupled through another component in between.

In addition, in this specification, two or more components may be combined into one component, or one component may be divided into two or more components by more finely subdivided function. Each of the components described below may, in addition to its own main functions, perform some or all of the functions of other components, and some of the main functions of each component may, of course, be performed exclusively by another component.

Hereinafter, embodiments according to the technical spirit of the present invention will be described in detail in turn.

FIG. 1 is a schematic diagram of a system for automatically generating multimedia according to an embodiment of the present invention. FIG. 2 is a flowchart illustrating an automatic multimedia generation system according to an embodiment of the present invention. FIG. 3 is a diagram schematically illustrating a process of generating customized video data by an automatic multimedia generation system according to an embodiment of the present invention. FIG. 4 is a diagram illustrating an example in which the automatic multimedia generation system selects optimal unit video data according to an embodiment of the present invention.

The automatic multimedia generation system 100 according to an embodiment of the present invention may be connected to the user terminal 10 through the network 50, and may include a voice input unit 110, a text conversion unit 120, a video search unit 130, a video generation unit 140, a video database 150, a feedback collecting unit 160, a music database 170, a music search unit 180, and a subtitle generator 190.

A user may access the automatic multimedia generation system 100 using the user terminal 10 to send signals to and receive signals from the automatic multimedia generation system 100. Preferably, an application capable of connecting to the automatic multimedia generation system 100 is installed and running on the terminal 10.

The terminal 10 may be implemented as a computer capable of accessing a remote server or terminal through the network 50. Here, the computer may include, for example, a notebook, desktop, or laptop computer equipped with a web browser. The terminal may also be implemented as a terminal device capable of accessing a remote server or terminal through a network. Such a terminal device is, for example, a wireless communication device that provides portability and mobility, and may include all types of handheld wireless communication devices, such as Personal Communication System (PCS), Global System for Mobile communications (GSM), Personal Digital Cellular (PDC), Personal Handyphone System (PHS), Personal Digital Assistant (PDA), International Mobile Telecommunication (IMT)-2000, Code Division Multiple Access (CDMA)-2000, Wideband Code Division Multiple Access (W-CDMA), and Wireless Broadband Internet (WiBro) terminals, smartphones, smart pads, and tablet PCs.

Here, the network 50 means a connection structure through which nodes such as a plurality of terminals and servers can exchange information with each other. Examples of such a network include, but are not limited to, a 3rd Generation Partnership Project (3GPP) network, a Long Term Evolution (LTE) network, a Worldwide Interoperability for Microwave Access (WiMAX) network, the Internet, a Local Area Network (LAN), a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), a Personal Area Network (PAN), a Bluetooth network, a satellite broadcasting network, an analog broadcasting network, and a Digital Multimedia Broadcasting (DMB) network.

The voice input unit 110 may receive user voice data from the user terminal 10. The user voice data may be recorded directly by the user or by a third party. It may be part or all of a movie or a piece of music, or it may be the user's real-time speech.

The text conversion unit 120 may convert voice data into text data. Any of various well-known speech-to-text (speech recognition) programs may be used as the conversion program of the text conversion unit. That is, the text conversion unit 120 converts a voice audio file into a text file.

The text conversion unit 120 may extract a keyword from sentences included in the text data, and generate a summary sentence including one or more keywords. Using artificial intelligence, long sentences can be converted into short sentences containing keywords, and unimportant sentences without keywords can be deleted.

An extractive summarization method may be used, which extracts important keywords from within the sentences and builds summary sentences from those keywords. Alternatively, an abstractive summarization method may be used, which, based on deep-learning natural language processing, directly writes a concise summary sentence that captures and expresses the content of the whole text.
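The keyword-based filtering described above (keep sentences that contain keywords, drop those that do not) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's summarizer: the function names are hypothetical, the sentence splitter is a naive regular expression, keywords are assumed to be already extracted, and abstractive summarization is not modeled.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extractive_summary(text: str, keywords: set[str]) -> list[str]:
    """Keep, in original order, only the sentences that contain at least one keyword.

    Matching is exact and lower-case (no stemming), so "dreams" does not match "dream".
    """
    kept = []
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if words & keywords:
            kept.append(sentence)
    return kept

transcript = "We run to the blue sea. The weather was okay I guess. Hope keeps us going!"
print(extractive_summary(transcript, {"sea", "hope"}))
# ['We run to the blue sea.', 'Hope keeps us going!']
```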

As a method of finding keywords in the extractive summarization method, a statistical method such as word-frequency counting may be used. More sophisticated measures such as TF-IDF (Term Frequency - Inverse Document Frequency), information gain, mutual information, and other modified IDF measures may also be used. In selecting keywords, factors such as word importance, sentence position, the frequency in a sentence of positive keywords (keywords that appear frequently in important sentences), the frequency of negative keywords (keywords that appear frequently in unimportant sentences), sentence centrality, similarity to the title, sentence length, and the presence or absence of numerical data or named entities in the sentence may be considered.
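As one concrete instance of the TF-IDF measure named above, the sketch below scores the terms of one transcript against a background corpus of other texts. The function names, the tokenizer, and the choice of background corpus are assumptions for illustration; the other factors listed above (sentence position, centrality, named entities, and so on) are not included.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def tfidf_keywords(document: str, corpus: list[str], top_k: int = 4) -> list[str]:
    """Rank the document's terms by TF-IDF against a background corpus.

    tf  = term count in the document / total terms in the document
    idf = log(N / df), where N counts the corpus plus the document itself and
          df is the number of those texts containing the term (always >= 1).
    """
    background = [set(tokenize(d)) for d in corpus]
    counts = Counter(tokenize(document))
    total = sum(counts.values())
    n_docs = len(background) + 1
    scores = {}
    for term, count in counts.items():
        df = 1 + sum(term in doc for doc in background)
        scores[term] = (count / total) * math.log(n_docs / df)
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

print(tfidf_keywords("the sea the sea the dream", ["the cat", "the dog"], top_k=2))
# ['sea', 'dream']
```

Words that occur in every text, such as "the", get an IDF of log(1) = 0 and therefore never rank as keywords.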

In addition, through machine learning, a classifier may be trained on a number of sentences labeled as important or trivial, so that it can classify whether a newly input sentence is important or trivial.
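The classifier is not specified further. A multinomial naive Bayes model with add-one smoothing is one simple choice, swapped in here purely as an example; the class and method names are hypothetical, and the four toy training sentences are illustrative only.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(sentence: str) -> list[str]:
    return re.findall(r"[a-z']+", sentence.lower())

class SentenceImportanceClassifier:
    """Multinomial naive Bayes over bag-of-words, with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.class_counts = Counter()            # label -> number of training sentences
        self.vocab = set()

    def fit(self, labeled):
        """labeled: iterable of (sentence, label) pairs, e.g. label 'important' or 'trivial'."""
        for sentence, label in labeled:
            words = tokenize(sentence)
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, sentence: str) -> str:
        """Return the label with the highest log posterior for the sentence."""
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_lp = None, -math.inf
        for label, n in self.class_counts.items():
            lp = math.log(n / total)  # log prior
            denom = sum(self.word_counts[label].values()) + v
            for w in tokenize(sentence):
                lp += math.log((self.word_counts[label][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

clf = SentenceImportanceClassifier().fit([
    ("we run to the blue sea", "important"),
    ("dreams and hope matter", "important"),
    ("um okay so", "trivial"),
    ("uh yeah okay", "trivial"),
])
print(clf.predict("okay um"))  # trivial
```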

The video search unit 130 may search unit video data corresponding to the text data. More specifically, the video search unit 130 may search for a plurality of unit video data related to a corresponding summary sentence for each summary sentence. The unit video data means a short video, and a short video of 5 to 30 seconds is preferable. The unit video data may be stored in advance in the video database 150 .

One summary sentence may include a plurality of keywords, and unit video data may be searched based on these keywords. Unit video data may be associated with a keyword through machine learning, or the association may be preset. Video data related to a keyword becomes related video data of any summary sentence that includes that keyword.

For example, as shown in FIG. 3, for the summary sentence "We run to the blue sea full of dreams and hopes", related unit video data can be searched for each of the keywords "dream", "hope", "sea", and "run". Video A1, video A2, video A3, ..., video An are searched for keyword A, "dream"; video B1, video B2, video B3, ..., video Bn for keyword B, "hope"; video C1, video C2, video C3, ..., video Cn for keyword C, "sea"; and video D1, video D2, video D3, ..., video Dn for keyword D, "run". Videos A1 to An, B1 to Bn, C1 to Cn, and D1 to Dn are the related video data of the summary sentence.
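The per-keyword candidate lists in the FIG. 3 example behave like an inverted index from keyword to unit video IDs. A minimal sketch, assuming each unit video carries a set of keyword tags (whether machine-learned or preset); the function names and the tag layout are hypothetical.

```python
from collections import defaultdict

def build_keyword_index(tags: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert {unit video ID: {keywords}} into {keyword: [unit video IDs]}."""
    index = defaultdict(list)
    for video_id, keywords in tags.items():
        for kw in keywords:
            index[kw].append(video_id)
    return {kw: sorted(ids) for kw, ids in index.items()}  # sorted for determinism

def related_videos(summary_keywords: list[str], index: dict[str, list[str]]) -> dict[str, list[str]]:
    """Candidate unit videos for each keyword of one summary sentence (empty if none)."""
    return {kw: index.get(kw, []) for kw in summary_keywords}

tags = {"A1": {"dream"}, "A2": {"dream"}, "B1": {"hope"}, "C1": {"sea", "hope"}}
index = build_keyword_index(tags)
print(related_videos(["dream", "hope", "sea", "run"], index))
```

The union of the returned lists corresponds to the related video data of the whole summary sentence.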

The video search unit 130 may search the video database 150 for such unit video data, collect unit video data by searching the Internet, or use an external database.

The video generation unit 140 may create customized video data by combining unit video data. The video generation unit 140 may connect, in keyword order, the unit videos selected as most related to each keyword.
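Combining the selected unit videos in keyword order amounts to laying them end to end on a timeline. The sketch below only computes that edit list (start and end time of each segment); decoding, concatenating, and encoding actual video files are outside its scope. The `Segment` type and the function name are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    keyword: str
    video_id: str
    start: float  # seconds from the start of the customized video
    end: float

def assemble_timeline(keywords, chosen, durations):
    """Place the unit video chosen for each keyword back to back, in keyword order.

    keywords:  keyword order of the summary sentence
    chosen:    {keyword: unit video ID}
    durations: {unit video ID: length in seconds}
    """
    timeline, t = [], 0.0
    for kw in keywords:
        video_id = chosen[kw]
        length = durations[video_id]
        timeline.append(Segment(kw, video_id, t, t + length))
        t += length
    return timeline
```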

The degree of relevance between a unit video and a keyword may be determined by users' cumulative evaluations. Users do not directly evaluate the relevance of a keyword to a unit video; instead, when the keyword appears in a summary sentence, they score how well the customized video data expresses that summary sentence, and this suitability score is applied directly to each pair of a keyword included in the summary sentence and the corresponding unit video data included in the customized video data, where it accumulates. That is, suitability is not judged by looking at one keyword in isolation; the suitability between a keyword and unit video data is evaluated in light of the keyword's relationship to the other keywords.

To this end, the feedback collecting unit 160 may receive from the user terminal 10 a score evaluating whether the retrieved unit video data suits the keywords on which the search was based. This evaluation may be performed on the user's own summary sentences or on other people's summary sentences. All summary sentences generated in the system may be disclosed to all users and evaluated through the feedback collecting unit 160.

The feedback collecting unit 160 receives, from the user terminals 10, a suitability score for how well a summary sentence matches the corresponding customized video data, and may assign the received score as a suitability score between each keyword included in the summary sentence and the unit video data that is included in the customized video data and corresponds to that keyword. Suitability scores transmitted from other user terminals are also accumulated into the suitability score between the keyword and the unit video data.

The feedback collector 160 may set the reciprocal of the accumulated suitability score as the suitability distance between the keyword and the unit video data.

For example, suppose that the customized video data generated for a summary sentence including keyword A, keyword B, keyword C, and keyword D consists of unit video data A1, B3, C1, and D4, and that the users' evaluation scores for this summary sentence and customized video data total 524 points. If the existing cumulative suitability score is 1234 points for keyword A, "dream", and unit video data A1; 43 points for keyword B, "hope", and unit video data B3; 153 points for keyword C, "sea", and unit video data C1; and 732 points for keyword D, "run", and unit video data D4, then the cumulative suitability scores become 1758 points for "dream" and A1, 567 points for "hope" and B3, 677 points for "sea" and C1, and 1256 points for "run" and D4.

Therefore, the suitability distance is 0.0005688 between keyword A, "dream", and unit video data A1; 0.0017637 between keyword B, "hope", and unit video data B3; 0.0014771 between keyword C, "sea", and unit video data C1; and 0.0007962 between keyword D, "run", and unit video data D4.
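The arithmetic in this example can be reproduced with a short sketch. It assumes cumulative scores are kept in a dictionary keyed by (keyword, unit video ID); that layout and the function names are hypothetical. Distances are rounded to seven decimal places to match the figures above.

```python
def apply_feedback(cumulative, pairs, score):
    """Add one summary-level suitability score to every (keyword, unit video) pair it covers."""
    for pair in pairs:
        cumulative[pair] = cumulative.get(pair, 0) + score
    return cumulative

def suitability_distance(cumulative_score):
    """The suitability distance is the reciprocal of the accumulated suitability score.
    Only pairs that have received at least one positive score have a distance."""
    return 1.0 / cumulative_score

# Existing cumulative scores from the example, then one new evaluation worth 524 points.
cumulative = {("dream", "A1"): 1234, ("hope", "B3"): 43,
              ("sea", "C1"): 153, ("run", "D4"): 732}
apply_feedback(cumulative, list(cumulative), 524)
distances = {pair: round(suitability_distance(s), 7) for pair, s in cumulative.items()}
# cumulative: 1758, 567, 677, 1256; distances: 0.0005688, 0.0017637, 0.0014771, 0.0007962
```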

In this way, the feedback collecting unit 160 assigns a suitability distance to each pairing of a keyword included in the summary sentence with the unit video data included in the customized video data, and these distances are updated by users' evaluations. As shown in FIG. 4, the resulting virtual structure, in which keywords are arranged around each piece of unit video data at their suitability distances, may be stored in the video database 150.

The video search unit 130 may search for unit video data whose search radius encloses all keywords included in the summary sentence, and may select the unit video data having the smallest search radius. For example, when searching for unit videos suitable for the summary sentence "We run to the blue sea full of dreams and hopes", which has the keywords "dream", "hope", "sea", and "run", the video search unit 130 first finds the unit video data for which suitability distances to "dream", "hope", "sea", and "run" are set. As shown in FIG. 4, unit video data A1, which has the smallest search radius (DST), is selected with the highest priority, and three more pieces of unit video data are selected in ascending order of search radius (DST). From the four pieces of unit video data selected in this way, the one with the shortest suitability distance to "dream" is chosen for the "dream" part, the one with the shortest suitability distance to "hope" for the "hope" part, the one with the shortest suitability distance to "sea" for the "sea" part, and the one with the shortest suitability distance to "run" for the "run" part, thereby creating the customized video data.
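The selection procedure above can be sketched as follows. It assumes the virtual structure of FIG. 4 is stored as {unit video ID: {keyword: suitability distance}}, and it takes a video's search radius to be the largest of its distances to the summary sentence's keywords, that is, the smallest circle around the video that encloses every keyword. Videos lacking a distance for any keyword are skipped. All names are hypothetical.

```python
import math

def search_radius(distances_for_video, keywords):
    """Smallest radius around one unit video that encloses every keyword.
    Returns infinity if some keyword has no suitability distance for this video."""
    try:
        return max(distances_for_video[kw] for kw in keywords)
    except KeyError:
        return math.inf

def select_unit_videos(structure, keywords):
    """Pick len(keywords) candidates by ascending search radius, then, for each
    keyword part, the candidate with the shortest distance to that keyword."""
    radii = {v: search_radius(d, keywords) for v, d in structure.items()}
    ranked = sorted((v for v in radii if radii[v] < math.inf), key=lambda v: (radii[v], v))
    candidates = ranked[:len(keywords)]  # e.g. four candidates for four keywords
    if not candidates:
        return {}
    return {kw: min(candidates, key=lambda v: (structure[v][kw], v)) for kw in keywords}
```

As the procedure is written, the same candidate may win more than one keyword part; nothing in it forces the parts to use different videos.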

Music may be combined with the generated customized video data. The music search unit 180 may search for music data corresponding to the customized video data. Music data may be stored in advance in the music database 170 . The music search unit 180 may search such music data in the music database 170 , collect music data by searching through the Internet, or use an external music database.

Specifically, the music search unit 180 may search the music database 170 for a plurality of pieces of music data corresponding to the unit video data included in one piece of customized video data.

The degree of relevance between music data and unit video data may be determined by users' cumulative evaluations. Users do not directly evaluate the relationship between music data and a single piece of unit video data; instead, when the unit video data is part of customized video data, they score how well the customized video data matches the music data, and this suitability score is applied directly to each pair of unit video data included in the customized video data and the music data, where it accumulates. That is, suitability is not judged by looking at a piece of unit video data in isolation; the suitability between unit video data and music data is evaluated in light of that unit video data's relationship to the other unit video data.

To this end, the feedback collecting unit 160 may receive from the user terminal 10 a score evaluating whether the retrieved music data suits the unit video data on which the search was based. Such evaluation may be performed on the user's own customized video data or on a third party's customized video data. All customized video data generated in the system may be disclosed to all users and evaluated through the feedback collecting unit 160.

The feedback collecting unit 160 receives, from the user terminals 10, a suitability score for how well the customized video data matches the corresponding music data, and may assign the received score as a suitability score between the music data and each piece of unit video data included in the customized video data. Suitability scores transmitted from other user terminals are also accumulated into the suitability score between the unit video data and the music data.

The feedback collecting unit 160 may set the reciprocal of the accumulated suitability score as the suitability distance between the unit video data and the music data. In this way, the feedback collecting unit 160 assigns a suitability distance to each pairing of the music data with the unit video data included in the customized video data, and these distances are updated by users' evaluations. The resulting virtual structure, in which unit video data is arranged around each piece of music data at its suitability distance, may be stored in the music database 170.

The music search unit 180 may search for music data whose search radius encloses all unit video data included in the customized video data, and may select the music data having the smallest search radius.
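The music-side search mirrors the keyword-side search with the roles changed: unit video data is arranged around each piece of music, and the music whose search radius encloses every unit video of the customized video with the smallest radius is chosen. A minimal sketch, assuming the structure is stored as {music ID: {unit video ID: suitability distance}}; the names are hypothetical.

```python
import math

def pick_music(music_structure, unit_videos):
    """Return the music ID with the smallest search radius, where the radius is the
    largest distance to any unit video in the customized video. Music lacking a
    distance to some unit video cannot enclose it and is skipped. Returns None
    if nothing qualifies."""
    if not unit_videos:
        return None
    best, best_radius = None, math.inf
    for music_id, dists in sorted(music_structure.items()):
        if all(v in dists for v in unit_videos):
            radius = max(dists[v] for v in unit_videos)
            if radius < best_radius:
                best, best_radius = music_id, radius
    return best
```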

The subtitle generator 190 may add summary sentences corresponding to the customized video data as subtitles at the bottom of the screen.
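One common way to attach summary sentences as subtitles is a SubRip (.srt) file, whose cues players typically render at the bottom of the frame. The text does not name a subtitle format, so SRT here is an assumption, as are the function names; the cue times would come from the segment boundaries of the customized video.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(cues):
    """cues: list of (start_seconds, end_seconds, summary_sentence) in playback order."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)  # a blank line separates consecutive cues
```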

The customized video data generated in this way may be automatically uploaded to a media platform such as YouTube and made public (S160). Before upload, the customized video data may be verified by the user, or verified automatically by a machine-learning verification module. The customized video data may be uploaded through an account preset on the media platform, and users may send evaluation scores for the customized video data published on the media platform to the feedback collecting unit 160.

Meanwhile, the voice input unit 110 includes an information backup unit, and the information backup unit may back up voice data to a voice database. The voice database is composed of an aggregate of a plurality of sub-databases, and it may be preferable that these sub-databases are physically divided.

In addition, the information backup unit may include an information partitioning unit, a code assignment unit, a random number creator, and a distributed storage unit.

The information partitioning unit arbitrarily divides the voice data into a plurality of pieces and treats each piece as an individual piece of information, for example P1, P2, P3, P4, P5, and so on. The voice data may be divided along an X-shaped or zig-zag pattern.

The code assignment unit assigns a different code to each of the individual pieces of information divided by the information partitioning unit. This code is a kind of ID. For example, code sdf223 is assigned to P1, gdf213 to P2, kdf312 to P3, gsd465 to P4, and btu323 to P5.

Each of the individual pieces of information P1 to P5 is stored separately in a sub-database, which is an individual physical space. Before they are stored, the random number creator makes the codes of these pieces, that is, sdf223 of P1, gdf213 of P2, kdf312 of P3, gsd465 of P4, and btu323 of P5, share the same random variable for a predetermined time.

For example, i) from 16:10:00 to 16:10:20, the codes sdf223 of P1, gdf213 of P2, kdf312 of P3, gsd465 of P4, and btu323 of P5 all share the same random variable agsdaefdf3456436; ii) from 16:10:20 to 16:10:40, the same codes all share the random variable dafdfreyj8143489. The process then repeats for each subsequent period.

When the complete voice data is requested at a certain moment, the pieces P1 to P5 constituting the voice data must be recombined, and this recombination is mediated by the random variable shared at that moment.

As described above, the distributed storage unit distributes and stores a plurality of individual pieces of information, for example, P1 to P5 in each physically divided voice database.
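The partition, code, shared-random-variable, and distributed-storage steps can be illustrated with an in-memory sketch. Assumptions: Python dictionaries stand in for the physically separate sub-databases, the voice data is cut into contiguous byte slices rather than X-shaped or zig-zag pieces, and recombination is permitted only when the caller presents the value shared during the current 20-second window. The class and method names are hypothetical.

```python
import secrets

WINDOW_SECONDS = 20  # matches the 20-second periods in the example above

class DistributedVoiceBackup:
    """Hypothetical sketch of the information backup unit."""

    def __init__(self, n_sub_dbs: int):
        # In-memory dicts stand in for physically separate sub-databases.
        self.sub_dbs: list[dict[str, bytes]] = [{} for _ in range(n_sub_dbs)]
        self.manifests: dict[str, list[str]] = {}   # voice ID -> piece codes in order
        self._window_values: dict[int, str] = {}    # time window -> shared random value

    def shared_value(self, now: float) -> str:
        """Random value shared by every piece code during the window containing `now`."""
        window = int(now // WINDOW_SECONDS)
        if window not in self._window_values:
            self._window_values[window] = secrets.token_hex(8)
        return self._window_values[window]

    def _new_code(self) -> str:
        # Short random ID (like "sdf223" in the text), regenerated on collision.
        while True:
            code = secrets.token_hex(3)
            if not any(code in db for db in self.sub_dbs):
                return code

    def store(self, voice_id: str, data: bytes, n_pieces: int) -> list[str]:
        """Partition the data, give each piece a code, and spread pieces over sub-databases."""
        size = -(-len(data) // n_pieces)  # ceiling division; contiguous slices
        codes = []
        for i in range(n_pieces):
            code = self._new_code()
            self.sub_dbs[i % len(self.sub_dbs)][code] = data[i * size:(i + 1) * size]
            codes.append(code)
        self.manifests[voice_id] = codes
        return codes

    def recombine(self, voice_id: str, token: str, now: float) -> bytes:
        """Reassemble the voice data only if `token` is the value shared at `now`."""
        if token != self.shared_value(now):
            raise PermissionError("token does not match the current shared value")
        return b"".join(
            next(db[code] for db in self.sub_dbs if code in db)
            for code in self.manifests[voice_id]
        )

backup = DistributedVoiceBackup(n_sub_dbs=3)
backup.store("voice-1", b"hello voice data", n_pieces=5)
token = backup.shared_value(now=1000.0)
print(backup.recombine("voice-1", token, now=1005.0))  # b'hello voice data'
```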

In addition, to secure the transmission of suitability score data, the user terminal 10 may include: a rule generation module that divides the suitability score data signal into a plurality of data pieces, assigns an address to each data piece, and generates a combination rule based on the addresses; a first sending module that transmits the plurality of data pieces to the feedback collecting unit 160 in a random order; and a second sending module that transmits the combination rule to the feedback collecting unit 160. The feedback collecting unit 160 may include a regeneration module that receives the plurality of data pieces and the combination rule transmitted to it and combines the data pieces according to the combination rule to regenerate the data.
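The split, address, and combination-rule exchange can be sketched in Python. The payload format, the address range, and the function names are assumptions; the pieces and the rule, which would travel via the two separate sending modules, are modeled here simply as two separate return values.

```python
import secrets

_rng = secrets.SystemRandom()  # OS-backed randomness for addresses and ordering

def split_with_addresses(payload: str, n_pieces: int):
    """Rule generation module (sketch): cut the payload into pieces, give each a
    random address, and record the address order as the combination rule."""
    size = -(-len(payload) // n_pieces)  # ceiling division
    addresses = _rng.sample(range(1_000_000), n_pieces)  # unique random addresses
    pieces = [(addr, payload[i * size:(i + 1) * size]) for i, addr in enumerate(addresses)]
    rule = list(addresses)   # sent separately by the second sending module
    _rng.shuffle(pieces)     # first sending module transmits pieces in random order
    return pieces, rule

def regenerate(received, rule) -> str:
    """Regeneration module (sketch): put the received pieces back in rule order."""
    by_address = dict(received)
    return "".join(by_address[addr] for addr in rule)

pieces, rule = split_with_addresses('{"video": "A1", "score": 5}', 4)
print(regenerate(pieces, rule))  # {"video": "A1", "score": 5}
```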

Alternatively, the user terminal 10 may include: a partitioning line module that renders the suitability score as a QR code image and sets n first partitioning lines in the horizontal direction and m second partitioning lines in the vertical direction of the QR code image, where n and m are natural numbers; a layering module that performs a layering rule comprising the steps of selecting one of the first partitioning lines and the second partitioning lines, creating a first layered image by rotating the QR code image region on one side, in one direction, about the selected partitioning line (hereinafter, the "first rotation-axis partitioning line") so that the QR code image regions on both sides of that line overlap in two layers, selecting a partitioning line of the other kind, and creating a second layered image by rotating the first layered image region on one side, in one direction, about that other partitioning line (hereinafter, the "second rotation-axis partitioning line") so that the first layered image regions on both sides of it overlap in two layers; a first sending module that transmits the second layered image to the feedback collecting unit 160; and a second sending module that transmits the layering rule to the feedback collecting unit 160. The feedback collecting unit 160 may include an unlayering module that receives the second layered image and the layering rule transmitted to it and restores the QR code image by unlayering the second layered image according to the layering rule.
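The layering rule amounts to a pair of reversible folds. The sketch below treats the QR code image as a 2-D grid of module values (generating and decoding real QR codes is out of scope), folds it along one horizontal partitioning line and then one vertical partitioning line, and records the fold positions as the layering rule. The grid representation, the choice to fold rows first, the (lower, upper) cell pairs, and all names are assumptions; a partitioning line at index k is taken to lie between rows (or columns) k-1 and k.

```python
def transpose(grid):
    return [list(col) for col in zip(*grid)]

def fold_rows(grid, axis):
    """Fold the rows above the partitioning line `axis` down onto the rows below it.

    Row axis-1 lands on row axis, row axis-2 on row axis+1, and so on. Each cell of
    the result is a (lower_layer, upper_layer) pair; None pads the shorter side.
    """
    upper = grid[:axis][::-1]
    lower = grid[axis:]
    pad = [None] * len(grid[0])
    return [
        list(zip(lower[i] if i < len(lower) else pad, upper[i] if i < len(upper) else pad))
        for i in range(max(len(upper), len(lower)))
    ]

def unfold_rows(folded, axis, height):
    """Inverse of fold_rows for an original grid with `height` rows."""
    lower = [[cell[0] for cell in row] for row in folded[:height - axis]]
    upper = [[cell[1] for cell in row] for row in folded[:axis]][::-1]
    return upper + lower

def layer(image, row_axis, col_axis):
    """Layering module (sketch): fold on a horizontal line, then on a vertical one."""
    if not (0 < row_axis < len(image) and 0 < col_axis < len(image[0])):
        raise ValueError("partitioning lines must lie strictly inside the image")
    first = fold_rows(image, row_axis)                          # first layered image
    second = transpose(fold_rows(transpose(first), col_axis))   # second layered image
    rule = {"row_axis": row_axis, "col_axis": col_axis,
            "height": len(image), "width": len(image[0])}
    return second, rule

def unlayer(second, rule):
    """Unlayering module (sketch): undo the vertical fold, then the horizontal fold."""
    first = transpose(unfold_rows(transpose(second), rule["col_axis"], rule["width"]))
    return unfold_rows(first, rule["row_axis"], rule["height"])
```

Because each fold keeps both layers, unlayering with the same rule restores the original grid exactly.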

The functional operations and the embodiments of the subject matter described in this specification may be implemented in digital electronic circuitry, computer software, firmware, or hardware, including the structures disclosed herein and their structural equivalents, or in a combination of one or more of these.

Embodiments of the subject matter described herein may be implemented as one or more computer program products, that is, one or more modules of computer program instructions encoded on a tangible program medium for execution by, or to control the operation of, a data processing apparatus. The tangible program medium may be a propagated signal or a computer-readable medium. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to a suitable receiver apparatus for execution by a computer. The computer-readable medium may be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter affecting a machine-readable propagated signal, or a combination of one or more of these.

A computer program (also known as a program, software, software application, script, or code) may be written in any form of programming language. This includes compiled or interpreted languages and declarative or procedural languages. A computer program may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

A computer program does not necessarily correspond to a file in a file system. A program may be stored in any of the following:

- a portion of a file that holds other programs or data (for example, one or more scripts stored in a markup language document),
- a single file dedicated to the program in question, or
- multiple coordinated files (for example, files that store one or more modules, subprograms, or portions of code).

A computer program may be deployed to run on a single computer or on multiple computers. Multiple computers may be located at one site, or distributed across a plurality of sites and interconnected by a communication network.

Additionally, the logic flows and structural block diagrams described herein describe particular methods and/or corresponding acts in support of steps. They also describe corresponding functions in support of the disclosed structural means. These flows and diagrams may also be used to construct corresponding software structures and algorithms, and their equivalents.

The processes and logic flows described herein may be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.

Processors suitable for the execution of computer programs include, for example, both general and special purpose microprocessors and any one or more processors of any kind of digital computer. Typically, the processor will receive instructions and data from read-only memory, random access memory, or both.

The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer also includes, or is operatively coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks. The computer may receive data from these devices, transmit data to them, or both. However, a computer need not have such devices.

The present description sets forth the best mode of the invention, and provides examples to illustrate the invention, and to enable any person skilled in the art to make or use the invention. The specification thus prepared does not limit the present invention to the specific terms presented.

Accordingly, the present invention has been described in detail with reference to the above-described examples. Even so, those skilled in the art may make modifications, changes, and variations to the examples without departing from the scope of the present invention.

In short, achieving the intended effect of the present invention does not require including every functional block shown in the drawings as a separate element. Nor does it require following every order shown in the drawings. It should be noted that embodiments which omit such blocks or orders may still fall within the scope of the present invention.

[Description of Reference Numerals]

100: multimedia automatic creation system

110: voice input unit

120: text conversion unit

130: video search unit

140: video generation unit

150: video database

160: feedback collecting unit

170: music database

180: music search unit

190: subtitle generator

Claims

1. A multimedia automatic generation system connected to a user terminal through a network, the system comprising:

a voice input unit that receives user voice data from the user terminal;

a text conversion unit that converts the voice data into text data;

a video search unit that searches for unit video data corresponding to the text data; and

a video generation unit that produces customized video data by combining the unit video data.
2. The system of claim 1, wherein

the text conversion unit extracts keywords from sentences included in the text data and generates a summary sentence including one or more of the keywords.
3. The system of claim 2, further comprising:

a video database in which unit video data is stored,

wherein the video search unit searches the video database for a plurality of unit video data corresponding to the keywords included in one summary sentence.
4. The system of claim 3, further comprising:

a feedback collecting unit that receives a score from the user terminal. The score evaluates whether the searched unit video data matches the keywords of the summary sentence on which the search was based.
5. The system of claim 4, wherein

the feedback collecting unit receives a suitability score from user terminals. The score relates to the suitability between the summary sentence and the corresponding customized video data,

the received suitability score is assigned as a suitability score between each keyword included in the summary sentence and the unit video data, included in the customized video data, that corresponds to that keyword,

scores transmitted from other user terminals are accumulated into the suitability score, and

the reciprocal of the accumulated suitability score is set as the suitability distance between the keyword and the unit video data. The keywords are arranged around each unit video data, each at its suitability distance, and stored in the video database.
6. The system of claim 5, wherein

the video search unit searches for unit video data whose search radius includes all keywords included in the summary sentence. It then selects the unit video data having the smallest search radius.
7. The system of claim 6, further comprising:

a music database in which music data is stored; and

a music search unit that searches for music data corresponding to the customized video data,

wherein the music search unit searches the music database for a plurality of music data corresponding to one customized video data.
8. The system of claim 7, wherein

the feedback collecting unit receives scores from user terminals. The scores relate to the suitability between the customized video data and the corresponding music data,

the received score is assigned as a suitability score between the unit video data included in the customized video data and the music data,

scores transmitted from other user terminals are accumulated into the suitability score, and

the reciprocal of the accumulated suitability score is set as the suitability distance between the unit video data and the music data. The unit video data are virtually arranged around each music data, each at its suitability distance, and stored in the music database.
9. The system of claim 8, wherein

the music search unit searches for music data whose search radius includes all unit video data included in the customized video data. It then selects the music data having the smallest search radius.
10. The system of claim 1, further comprising:

a subtitle generator that adds the summary sentences corresponding to the customized video data as subtitles.
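Claim 2 does not specify how keywords are extracted or how the summary sentence is formed. The following Python sketch substitutes a simple frequency-based, extractive heuristic. This heuristic is an assumption, not the patented method:

- Keywords are the most frequent non-stopword tokens.
- The summary sentence is the existing sentence that contains the most keywords.

The stopword list, the ASCII-only tokenizer, and all function names are hypothetical. Korean transcripts would likely need a language-appropriate tokenizer instead.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical, English-only stopword list used purely for illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are",
             "for", "on", "with", "it", "this", "that"}


def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower()) if w not in STOPWORDS]


def extract_keywords(text: str, top_n: int = 3) -> List[str]:
    """Most frequent non-stopword tokens across the whole transcript."""
    counts = Counter(w for s in split_sentences(text) for w in tokenize(s))
    return [w for w, _ in counts.most_common(top_n)]


def summary_sentence(text: str, keywords: List[str]) -> Tuple[str, List[str]]:
    """Pick the sentence containing the most keywords; the earliest wins ties."""
    best, best_hits = "", []
    for sentence in split_sentences(text):
        words = set(tokenize(sentence))
        hits = [k for k in keywords if k in words]
        if len(hits) > len(best_hits):
            best, best_hits = sentence, hits
    return best, best_hits
```

The keywords returned alongside the summary sentence are the ones a video search unit would then look up.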
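The suitability distance and search radius of claims 5 and 6 can be sketched as follows. The sketch makes three assumptions:

- An unscored keyword is infinitely far from a unit video.
- The search radius of a unit video is the largest distance from it to any keyword of the summary sentence.
- Ties are broken by video identifier.

The class and method names are hypothetical.

```python
from collections import defaultdict
from math import inf
from typing import DefaultDict, Iterable, List, Optional, Tuple


class SuitabilityMap:
    """Keywords arranged around unit videos at distance 1 / (accumulated score)."""

    def __init__(self) -> None:
        # scores[video_id][keyword] -> score accumulated over all user terminals
        self.scores: DefaultDict[str, DefaultDict[str, float]] = defaultdict(
            lambda: defaultdict(float))

    def add_feedback(self, pairs: Iterable[Tuple[str, str]], score: float) -> None:
        """Apply one user's score to every (video_id, keyword) pair in a customized video."""
        for video_id, keyword in pairs:
            self.scores[video_id][keyword] += score

    def distance(self, video_id: str, keyword: str) -> float:
        # .get() is used so that lookups never create empty entries.
        total = self.scores.get(video_id, {}).get(keyword, 0.0)
        return 1.0 / total if total > 0 else inf   # reciprocal of the accumulated score

    def search_radius(self, video_id: str, keywords: List[str]) -> float:
        """Smallest radius around video_id that encloses every keyword."""
        return max(self.distance(video_id, k) for k in keywords)

    def best_video(self, keywords: Iterable[str]) -> Optional[str]:
        """Unit video whose search radius covers all keywords and is smallest."""
        keywords = list(keywords)
        if not keywords:
            return None
        candidates = [(self.search_radius(v, keywords), v) for v in self.scores]
        reachable = [c for c in candidates if c[0] < inf]
        return min(reachable)[1] if reachable else None
```

The same structure would serve claims 8 and 9. In that case, music data would act as the centers, and the unit video data of a customized video would be the points arranged around them.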