WO2023096392A1 - System for automatically producing video

Info

Publication number: WO2023096392A1
Application number: PCT/KR2022/018776
Authority: WO
Inventors: 김도연
Original assignee: 주식회사 콘텐츠민주주의
Priority date: 2021-11-29
Filing date: 2022-11-25
Publication date: 2023-06-01
Also published as: KR102424150B1

Abstract

The present invention provides a system for automatically producing a video, which comprises a video production device. The video production device comprises: a high-definition image reception unit for receiving a 4K high-definition image from a 4K camera in real time; a high-definition image segmentation unit for segmenting the 4K high-definition image into at least two FHD images on the basis of a first artificial intelligence model; an image selection unit for selecting one FHD image from among the at least two FHD images on the basis of a second artificial intelligence model; and a video production unit for inserting the one FHD image into a video to be provided to a user terminal.

Description

Automatic video production system

The present invention relates to an automatic video production system, and more particularly to an automatic video production system that, through the design of artificial intelligence models, can produce high-quality video content at low cost without professional personnel for video production (shooting and editing).

With the outbreak and spread of COVID-19, a non-face-to-face ("untact") era has arrived. As people's video viewing time has naturally increased, the number of people interested in video production (shooting and editing) has been gradually increasing.

However, it is not easy for ordinary people without video production experience to arrange shooting equipment and a shooting location or editing equipment and an editing location, and it is difficult for them to choose optimal positions for a camera, microphone, lighting, and, where needed, a chroma key. In addition, when captured video must be edited quickly for practical use, such users have difficulty editing it.

Republic of Korea Patent Registration No. 10-0867407 (2008.11.06.) discloses a mobile immersive virtual environment providing system. That document discloses a system that can be readily applied even in environments with various harmful factors, such as changes in temperature and humidity, vibration, and the inflow of dust. However, it does not disclose a system through which members of the general public who need video production but have no video production experience can obtain a produced video without difficulty.

An object of the present invention is to devise a system capable of producing professional-level videos without professional knowledge or experience in video production. Another object of the present invention is to automate video production, that is, video shooting and editing.

The present invention is an automatic video production system including a video production device, wherein the video production device includes: a high-definition image receiving unit that receives 4K high-definition video from a 4K camera in real time; a high-definition image segmentation unit that divides the 4K high-definition video into at least two FHD images based on a first artificial intelligence model; an image selection unit that selects one FHD image from among the at least two FHD images based on a second artificial intelligence model; and a video production unit that inserts the one FHD image into a video to be provided to a user terminal.

The first artificial intelligence model may divide a 4K high-definition image into at least two FHD images using at least one of a face recognition algorithm, a motion recognition algorithm, and an object recognition algorithm.

The second artificial intelligence model may select one FHD image based on at least one of the speaker's speech content or the speaker's motion in the 4K high-definition video.

The second artificial intelligence model may select another FHD image from among the at least two FHD images according to at least one of the speaker's speech content or the speaker's motion in the 4K high-definition video, and the video production unit may change the FHD image inserted into the video from the one FHD image to the other FHD image.

When the second artificial intelligence model determines that an NG cut has occurred in an FHD image inserted into a video, one FHD image can be selected from among the remaining FHD images excluding the FHD image in which the NG cut has occurred among at least two or more FHD images.

The second artificial intelligence model may determine that an NG cut has occurred when the speaker does not perform at least one of voice utterance or motion for 5 seconds or longer.

The video production device may further include a production video output/modification unit. The user terminal requests the production video output/modification unit to stream the completed video, and the production video output/modification unit, in response to the request of the user terminal, can stream the completed video to the display unit of the user terminal.

The production video output/modification unit may request the user terminal to confirm whether or not an NG cut has occurred in the completed video.

When the user terminal responds to the confirmation request that an NG cut has occurred in the completed video, the production video output/modification unit may request NG cut occurrence time information from the user terminal.

The user terminal can request the production video output/modification unit to modify the NG cut, and the production video output/modification unit, in response to the modification request, can delete the FHD image in which the NG cut occurred from the completed video and select and insert one FHD image in the time period in which the NG cut occurred.

When the user terminal responds to the confirmation request that no NG cut has occurred in the completed video, the production video output/modification unit may transmit the user terminal's response to the payment request/confirmation unit of the user management device.

The present invention can provide a system capable of producing professional-level videos without professional knowledge/experience in video production, and can automate video shooting and editing.

FIG. 1 is a schematic diagram of a video production system according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of a user management device according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of a video recording device according to an embodiment of the present invention.

FIG. 4 is a schematic diagram of a video production device according to an embodiment of the present invention.

FIG. 5 is a conceptual diagram of an external appearance of a video production booth including a video recording device and a video production device.

[Description of reference numerals]

10: automatic video production system
100: user terminal
200: user management device
210: customer information confirmation unit
220: customer information DB
230: reservation schedule providing/confirmation unit
240: reservation schedule DB
250: activation unit
260: payment request/confirmation unit
300: video recording device
310: condenser microphone
320: 4K camera
330: computing device
340: monitor
350: operator PC
360: chroma key lighting
370: user lighting
380: control unit
400: video production device
410: high-definition video receiver
420: high-definition image segmentation unit
430: image selection unit
440: video production unit
450: production video output/modification unit
460: production video providing unit
470: control unit

Hereinafter, the present invention will be described in detail. However, the present invention is not restricted or limited by the exemplary embodiments. The objects and effects of the present invention can be naturally understood or made clearer by the following description, and they are not limited only by the following description. In addition, in describing the present invention, if it is determined that a detailed description of a known technology related to the present invention may unnecessarily obscure the subject matter of the present invention, the detailed description will be omitted.

FIG. 1 is a schematic diagram of an automatic video production system according to an embodiment of the present invention. Referring to FIG. 1, the automatic video production system 10 includes a user terminal 100, a user management device 200, a video recording device 300, and a video production device 400. The user terminal 100, the user management device 200, the video recording device 300, and the video production device 400 are connected through a network and can transmit data to and receive data from each other.

FIG. 2 is a schematic diagram of a user management device 200 according to an embodiment of the present invention. Referring to FIG. 2, the user management device 200 includes a customer information confirmation unit 210, a customer information DB 220, a reservation schedule providing/confirmation unit 230, a reservation schedule DB 240, an activation unit 250, and a payment request/confirmation unit 260.

Upon receiving a login request from the user terminal 100, the customer information confirmation unit 210 checks whether the login information is pre-stored in the customer information DB 220 and whether it matches the pre-stored customer information. When it is determined that the login information is not pre-stored in the customer information DB 220 or does not match the pre-stored customer information, login request rejection information may be provided to the user terminal 100. When it is confirmed that the login information matches the customer information pre-stored in the customer information DB 220, the login of the user terminal 100 may be accepted.

The reservation schedule providing/confirmation unit 230 may provide reservation schedule information pre-stored in the reservation schedule DB 240 to the user terminal 100 when the user terminal 100 logs in, after a predetermined time has elapsed since the login, or at the request of the user terminal 100. The reservation schedule information includes year/month/day/time information. According to an embodiment of the present invention, the information may be provided in two stages: year/month/day information is first provided to the user terminal 100, and after the reservation schedule providing/confirmation unit 230 receives the year/month/day selection of the user terminal 100, time information is then provided to the user terminal 100. The user terminal 100 thereafter provides its time selection to the reservation schedule providing/confirmation unit 230. However, the method of providing the reservation schedule is not limited to the above, and any known method may be used. Meanwhile, the reservation schedule information may distinguish already reserved slots from unreserved slots, and reserved slots are preferably disabled so that the user terminal 100 cannot select them.
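The two-stage provision of reservation information described above can be illustrated with a minimal sketch. The dictionary layout and the function names `available_dates`, `available_times`, and `reserve` are hypothetical and are used only for illustration; the source does not specify how the reservation schedule DB 240 is organized.

```python
from datetime import date, time

# Hypothetical layout: {day: {slot: reserved_flag}}. Not specified in the source.

def available_dates(schedule):
    """Stage 1: days that still have at least one unreserved slot."""
    return sorted(d for d, slots in schedule.items() if not all(slots.values()))

def available_times(schedule, day):
    """Stage 2: unreserved slots on the selected day (reserved slots are hidden)."""
    return sorted(t for t, reserved in schedule.get(day, {}).items() if not reserved)

def reserve(schedule, day, slot):
    """Mark a slot as reserved; refuse unknown or already reserved slots."""
    if schedule.get(day, {}).get(slot, True):
        return False
    schedule[day][slot] = True
    return True
```

Hiding reserved slots in `available_times` is one way to realize the preference that a reserved schedule cannot be selected.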

The activation unit 250 may activate the video recording device 300 and the video production device 400 according to the schedule selected and reserved by the user terminal 100. The activation unit 250 may deactivate the video recording device 300 and the video production device 400 when there is no reserved schedule or when the reserved schedule has expired. The activation and deactivation described above may be performed through the control unit 380 of the video recording device 300, which will be described later. According to an embodiment of the present invention, activation may mean switching the power of the video recording device 300 and the video production device 400 from Off to On, and deactivation may conversely mean switching the power of each device from On to Off.
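As a rough illustration of schedule-driven activation, the following sketch decides whether the devices should be powered on at a given moment. The representation of reservations as `(start, end)` pairs and the name `should_be_active` are assumptions made for this example.

```python
from datetime import datetime

def should_be_active(reservations, now):
    """Return True while `now` falls inside any reserved (start, end) interval.

    With no reservation, or after a reservation has expired, the result is
    False, which corresponds to switching the devices from On to Off.
    """
    return any(start <= now < end for start, end in reservations)
```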

FIG. 3 is a schematic diagram of a video recording device 300 according to an embodiment of the present invention. Referring to FIG. 3, the video recording device 300 may include a chroma key for background synthesis, a condenser microphone 310 for recording the user's voice, a 4K camera 320 for photographing the user, a computing device 330 that displays the content to be composited onto the chroma key and can be operated by the user, a monitor 340 for displaying at least one of the user and the content, an operator PC 350 running video production software, chroma key lighting 360, user lighting 370, and a control unit 380.

The condenser microphone 310, 4K camera 320, computing device 330, monitor 340, operator PC 350, chroma key lighting 360, and user lighting 370 are IoT-based devices. Known IoT-based devices can be used, and the IoT environment can be configured by a known method. The control unit 380 manages each of the devices 310, 320, 330, 340, 350, 360, and 370 and controls data processing between the devices.

FIG. 4 is a schematic diagram of a video production device 400 according to an embodiment of the present invention. Referring to FIG. 4, the video production device 400 includes a high-definition video receiver 410, a high-definition image segmentation unit 420, an image selection unit 430, a video production unit 440, a production video output/modification unit 450, a production video providing unit 460, and a control unit 470. The control unit 470 manages each of the units 410, 420, 430, 440, 450, and 460 and may control data processing between the units.

The high-definition video receiver 410 may receive 4K high-definition video from the 4K camera 320 of the video recording device 300 in real time. According to an embodiment of the present invention, one 4K high-definition video can be received in real time from one 4K camera 320. According to another embodiment of the present invention, two or more 4K high-definition videos may be received in real time from two or more 4K cameras 320.

The high-definition image segmentation unit 420 may divide the 4K high-definition video into at least two FHD images based on the first artificial intelligence model. Currently, the screen size standard mainly used in broadcasting is FHD (Full HD). Since a 4K high-definition video has a screen size four times that of an FHD video, the high-definition image segmentation unit 420 can divide the 4K high-definition video into at least two FHD images. Although a simple arithmetic comparison of screen sizes shows that one 4K high-definition video can be divided into four FHD videos, this does not mean that the high-definition image segmentation unit 420 must divide the 4K high-definition video into exactly four FHD images.
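The fourfold size relationship can be checked with simple arithmetic. The sketch below assumes the common pixel dimensions of 3840x2160 for 4K (UHD) and 1920x1080 for FHD, which the source does not state explicitly; the function names are illustrative only.

```python
# Assumed pixel dimensions (not given in the source): 4K UHD and FHD.
UHD_W, UHD_H = 3840, 2160
FHD_W, FHD_H = 1920, 1080

def area_ratio(w1, h1, w2, h2):
    """How many times larger the first frame's pixel area is than the second's."""
    return (w1 * h1) / (w2 * h2)

def quadrant_windows():
    """The four non-overlapping FHD windows (x, y, w, h) that tile one UHD frame."""
    return [(x, y, FHD_W, FHD_H) for y in (0, FHD_H) for x in (0, FHD_W)]
```

Tiling into quadrants is only the arithmetic upper bound on non-overlapping windows; as the description notes, the segmentation unit is not required to produce exactly four images.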

According to an embodiment of the present invention, the first artificial intelligence model of the high-definition image segmentation unit 420 can divide the 4K high-definition video into at least two FHD images by using at least one of a face recognition algorithm, a motion recognition algorithm, and an object recognition algorithm. The face recognition algorithm, motion recognition algorithm, and object recognition algorithm may be known recognition algorithms.
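One possible way to turn a recognition result into an FHD image is to place an FHD-sized crop window around the detected region. The sketch below assumes that some recognition algorithm has already returned a bounding box `(x, y, w, h)` and that the frame is 3840x2160; the function `fhd_window` is hypothetical and is not taken from the source.

```python
def fhd_window(box, frame_w=3840, frame_h=2160, win_w=1920, win_h=1080):
    """Return an FHD crop window (x, y, w, h) centred on a detected box.

    The window is clamped so that it never extends beyond the frame.
    """
    bx, by, bw, bh = box
    cx, cy = bx + bw // 2, by + bh // 2
    x = min(max(cx - win_w // 2, 0), frame_w - win_w)
    y = min(max(cy - win_h // 2, 0), frame_h - win_h)
    return (x, y, win_w, win_h)
```

This sketch only crops at native resolution. A full image covering the entire 4K frame, or a tighter close-up, would additionally require scaling, which is outside the scope of this example.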

The image selection unit 430 may select one FHD image from among the at least two FHD images based on the second artificial intelligence model, and the video production unit 440 may insert the selected FHD image into the video to be provided to the user terminal.
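The flow from reception through segmentation and selection to insertion can be outlined as follows. The callables `segment` and `select` are placeholders standing in for the first and second artificial intelligence models; the toy implementations are assumptions that exist only so the flow can be exercised.

```python
def produce_video(frames, segment, select):
    """Pass each received frame through segmentation and selection.

    segment(frame) -> dict mapping a view name to an FHD image (first model stand-in)
    select(views)  -> name of the chosen view (second model stand-in)
    """
    video = []
    for frame in frames:
        views = segment(frame)
        chosen = select(views)
        video.append(views[chosen])  # insert the selected FHD image
    return video

# Toy stand-ins; a real system would call trained models here.
def toy_segment(frame):
    return {"full": f"{frame}:full", "close_up": f"{frame}:close_up"}

def toy_select(views):
    return "full"
```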

For example, if the first artificial intelligence model divides one 4K high-definition video into three FHD images, a first FHD image, a second FHD image, and a third FHD image are generated. The first to third FHD images exist for as long as the 4K high-definition video is being received, and the second artificial intelligence model selects, from among the first to third FHD images, the one FHD image most suitable for the purpose of the video being produced.

The second artificial intelligence model may select one FHD image from among at least two or more FHD images based on at least one of the speaker's speech content or the speaker's motion in the 4K high-definition image.

For example, suppose that the 4K high-definition video is an introduction video for earrings, the speaker's first utterance in the 4K high-definition video is a greeting, the first and second FHD images are a full image of the speaker and a close-up image of the speaker, respectively, and the third FHD image shows a potted plant placed in the space where the speaker is located. In this case, the second artificial intelligence model selects the first FHD image from among the first to third FHD images. The full image of the speaker refers to an image including the speaker and the background around the speaker, and the close-up image of the speaker refers to an image centered on the speaker. After the second artificial intelligence model selects the first FHD image, the video production unit 440 inserts the first FHD image into the video.

The second artificial intelligence model can select another FHD image from among the at least two FHD images according to at least one of the speaker's speech content or the speaker's motion in the 4K high-definition video, and the video production unit 440 can change the FHD image inserted into the video from the one FHD image to the other FHD image.

For example, if the speaker in the 4K high-definition video greets for 1 minute and, after the greeting, mentions the earrings worn on his or her ears or touches an ear while mentioning the earrings, the second artificial intelligence model maintains the selection of the first FHD image for 1 minute and then selects the second FHD image. The video production unit 440 inserts the first FHD image into the video for 1 minute and then changes the FHD image inserted into the video to the second FHD image.
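The switching behaviour in this example can be sketched as a timeline built from time-stamped cues. The cue labels, the mapping `CUE_TO_VIEW`, and the function `build_timeline` are hypothetical; in the described system the cues would come from the second artificial intelligence model's analysis of speech and motion rather than from a fixed table.

```python
# Hypothetical mapping from a recognized cue to the view to insert.
CUE_TO_VIEW = {
    "greeting": "full",
    "product_mention": "close_up",
    "touch_ear": "close_up",
}

def build_timeline(cues, duration):
    """Turn sorted (start_seconds, cue) pairs into (start, end, view) segments.

    Consecutive cues that map to the same view are merged into one segment.
    """
    segments = []
    for i, (start, label) in enumerate(cues):
        end = cues[i + 1][0] if i + 1 < len(cues) else duration
        view = CUE_TO_VIEW.get(label, "full")
        if segments and segments[-1][2] == view:
            segments[-1] = (segments[-1][0], end, view)
        else:
            segments.append((start, end, view))
    return segments
```

With a greeting at 0 s and an earring mention at 60 s, the timeline holds the full image for the first minute and the close-up image afterwards, matching the example above.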

When the second artificial intelligence model determines that an NG cut has occurred in the FHD image inserted into the video, it can select one FHD image from among the remaining FHD images, excluding the FHD image in which the NG cut occurred. According to an embodiment of the present invention, the second artificial intelligence model may determine that an NG cut has occurred when the speaker does not perform at least one of voice utterance or motion for 5 seconds or more.

For example, if the speaker cannot recall the manufacturer's name at the moment it should be mentioned and 5 seconds elapse without the speaker saying anything, the second artificial intelligence model may determine that an NG cut has occurred in the second FHD image and select the third FHD image from among the remaining first and third FHD images. The video production unit 440, which has been inserting the second FHD image into the video, then changes the FHD image inserted into the video to the third FHD image.
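A minimal sketch of the 5-second rule follows. It assumes that upstream recognition provides the timestamps at which speech or motion was detected, and it treats a gap of at least 5 seconds with neither as an NG cut, which is one reading of the rule. The names `find_ng_gaps` and `pick_replacement`, and the use of a numeric suitability score, are hypothetical; the source does not specify how the replacement image is ranked.

```python
NG_THRESHOLD_S = 5.0  # "5 seconds or more" in the description

def find_ng_gaps(activity_times, duration, threshold=NG_THRESHOLD_S):
    """Return (start, end) intervals with no detected speech or motion."""
    gaps = []
    previous = 0.0
    for t in sorted(activity_times) + [duration]:
        if t - previous >= threshold:
            gaps.append((previous, t))
        previous = t
    return gaps

def pick_replacement(scores, current):
    """Choose the best-scoring view among those other than the NG view.

    `scores` maps view names to a hypothetical suitability score.
    """
    remaining = {v: s for v, s in scores.items() if v != current}
    return max(remaining, key=remaining.get)
```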

The second artificial intelligence model may be an artificial intelligence model pre-trained to select one FHD image from among at least two FHD images based on at least one of a speaker's speech content or a speaker's motion in a 4K high-definition video. Using various types of videos such as lecture videos, documentary videos, and YouTube videos as training videos, the second artificial intelligence model can learn to identify a central topic from the speaker's speech in a training video and to match it with the type of image being output at that time (full image, close-up image, background image, object image, etc.).

The second artificial intelligence model may also be an artificial intelligence model pre-trained so that, when it determines that an NG cut has occurred in the FHD image inserted into the video, it selects one FHD image from among the remaining FHD images, excluding the FHD image in which the NG cut occurred. Using behind-the-scenes footage from the production of lecture videos, documentary videos, dramas, movies, YouTube videos, and the like as training videos, the second artificial intelligence model can learn to recognize a speaker's (e.g., a director's) utterance of 'NG' or 'cut' and to treat the situation preceding that utterance as an NG cut.

When the production video output/modification unit 450 receives a video production completion signal from the video production unit 440, it transmits video production completion information to the user terminal 100. The user terminal 100 may then request the production video output/modification unit 450 to stream the completed video, and the production video output/modification unit 450, in response to the request, can stream the completed video to the display unit of the user terminal 100.

The production video output/modification unit 450 may request the user terminal 100 to confirm whether or not an NG cut has occurred in the completed video. When the user terminal 100 responds to the confirmation request that an NG cut has occurred in the completed video, the production video output/modification unit 450 may request NG cut occurrence time information from the user terminal 100, and the user terminal 100 may provide the NG cut occurrence time information to the production video output/modification unit 450.

The user terminal 100 may request the production video output/modification unit 450 to modify the NG cut, either together with or separately from providing the NG cut occurrence time information. In response to the modification request, the production video output/modification unit 450 can delete the FHD image in which the NG cut occurred from the completed video and select and insert one FHD image in the time period in which the NG cut occurred.

The production video output/modification unit 450 may provide the modified video to the user terminal 100 and again request confirmation of whether an NG cut has occurred. If the user terminal 100 again responds that an NG cut has occurred, the deletion of the NG cut and the insertion of an FHD image are repeated.
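The deletion and re-insertion step can be sketched as an operation on a list of `(start, end, view)` segments. The segment representation and the function `replace_range` are assumptions for illustration; the choice of the replacement view is left to the caller because the source only states that one FHD image is selected for the affected time period.

```python
def replace_range(timeline, ng_start, ng_end, new_view):
    """Replace whatever was inserted during [ng_start, ng_end) with new_view.

    `timeline` is a time-ordered list of (start, end, view) segments.
    Segments overlapping the NG period are trimmed or removed.
    """
    before, after = [], []
    for start, end, view in timeline:
        if start < ng_start:
            before.append((start, min(end, ng_start), view))
        if end > ng_end:
            after.append((max(start, ng_end), end, view))
    return before + [(ng_start, ng_end, new_view)] + after
```

Calling the function again with a new time range models the repetition that occurs when the user reports a further NG cut.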

When the user terminal 100 responds to the confirmation request of the production video output/modification unit 450 that no NG cut has occurred in the completed video, the production video output/modification unit 450 may transmit the response of the user terminal 100 to the payment request/confirmation unit 260 of the user management device 200, and the payment request/confirmation unit 260 may request payment from the user terminal 100.

When the payment request/confirmation unit 260 receives a payment completion signal from the user terminal 100, the payment completion signal may be provided to the production video providing unit 460 of the video production device 400, and the production video providing unit 460 may provide the produced video to the user terminal 100. Hereinafter, the invention will be described in more detail through examples of video production.
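The sequence from streaming through confirmation and payment to delivery can be summarized as a small state machine. The state and event names below are invented for this sketch and do not appear in the source.

```python
# Hypothetical states and events summarizing the review-to-delivery flow.
TRANSITIONS = {
    ("produced", "stream_requested"): "streaming",
    ("streaming", "ng_reported"): "modifying",
    ("modifying", "modified"): "streaming",
    ("streaming", "no_ng_confirmed"): "payment_requested",
    ("payment_requested", "payment_completed"): "delivered",
}

def next_state(state, event):
    """Return the next state, or stay in the current state for an unknown event."""
    return TRANSITIONS.get((state, event), state)

def run(events, state="produced"):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```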

Example 1: Lecture

It is assumed that, during the lecture, the user may read the lecture content displayed on the laptop 330 (which may be any one of a desktop, a laptop, a tablet PC, etc.) while underlining it, may write additional explanations on the lecture content, or may explain the lecture content verbally without any accompanying action.

The high-definition video receiver 410 receives from the 4K webcam 320 a 4K high-definition lecture video including both the user and the lecture content. Here, the lecture content is composited onto the aforementioned chroma key for background synthesis.

The high-definition image segmentation unit 420 can divide the lecture video into three FHD images, namely a user close-up image, a lecture content image, and a full image showing both the user and the lecture content (hereinafter referred to as the 'full image' for convenience of explanation).

The user close-up image means an image centered on the user's face or an image showing the user's face and at least a part of the body. The upper and lower body of the user do not both have to appear; only the upper body may appear. The lecture content image refers to the image output to the display unit of the laptop 330. The full image refers to an image in which both the user and the lecture content appear on a single screen.

The artificial intelligence model of the image selection unit 430 may select the full image as the starting image of the video based on the characteristics of a lecture video, and the full image may be transmitted to the video production unit 440 and inserted as the starting image of the video.

When the user reads the lecture content while underlining it, or performs an action such as writing an additional explanation on the lecture content, the image selection unit 430 may select the lecture content image, and the lecture content image may be transmitted to the video production unit 440 so that the full image is switched to the lecture content image.

When the user gazes at the 4K camera 320 during the lecture, the image selection unit 430 may select the user close-up image, and the user close-up image may be transmitted to the video production unit 440 so that the lecture content image is switched to the user close-up image.

On the other hand, if the user neither explains nor takes any action for 5 seconds or more while the user close-up image is being transmitted to the video production unit 440 and inserted into the video, the image selection unit 430 may select the lecture content image again, and the lecture content image may be transmitted to the video production unit 440 so that the user close-up image is switched to the lecture content image.
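The selection behaviour of Example 1 can be approximated by a few explicit rules. In the described system the decision is made by a trained artificial intelligence model, so the rule-based function below, together with its event labels, is only an illustrative simplification.

```python
IDLE_LIMIT_S = 5.0  # idle time after which the close-up is abandoned

def select_lecture_view(current, event=None, idle_seconds=0.0):
    """Return the view to insert next for the lecture example.

    `current` is None before the video starts; `event` is a hypothetical
    label for a recognized user action.
    """
    if current is None:
        return "full"                      # starting image
    if event in ("underline", "write_note"):
        return "lecture_content"
    if event == "gaze_at_camera":
        return "user_close_up"
    if current == "user_close_up" and idle_seconds >= IDLE_LIMIT_S:
        return "lecture_content"           # no speech or action for 5 s
    return current
```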

Example 2: Bag introduction video

This example covers the case where a user creates a video that introduces a bag, together with a user review, while holding the bag in his or her hand.

The high-definition video receiver 410 receives from the 4K webcam 320 a 4K high-definition introduction video including a full image of the user and the bag.

The high-definition image segmentation unit 420 may divide the introduction video into three FHD images, namely a user close-up image, a bag close-up image, and a full image of the user and the bag.

The user close-up image means an image centered on the user's face or an image showing the user's face and a part of the body. As described above, the upper and lower body of the user do not both have to appear. The bag close-up image refers to an image centered on the bag, and the full image refers to an image in which the user and the bag both appear on one screen.

The artificial intelligence model of the image selection unit 430 may select the full image as the starting image of the video based on the characteristics of a product introduction video, and the full image may be transmitted to the video production unit 440 and inserted as the starting image of the video.

When the user opens the bag, the image selection unit 430 may select the bag close-up image, and the bag close-up image may be transmitted to the video production unit 440 so that the full image is switched to the bag close-up image.

When the user puts the bag on his or her shoulder, the image selection unit 430 may select the user close-up image, and the user close-up image may be transmitted to the video production unit 440 so that the bag close-up image is switched to the user close-up image.

On the other hand, if the user, who had been sitting, stands up while the user close-up image is being transmitted to the video production unit 440 and inserted into the video, the image selection unit 430 may select the full image, and the full image may be transmitted to the video production unit 440 so that the user close-up image is switched to the full image.
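The sequence of view changes in Example 2 can be replayed from a table of actions. The action labels and the table `BAG_RULES` are hypothetical, and the sketch ignores the currently inserted view when applying a rule, which is a simplification of the behaviour described above.

```python
# Hypothetical action-to-view table for the bag introduction example.
BAG_RULES = {
    "start": "full",
    "open_bag": "bag_close_up",
    "bag_on_shoulder": "user_close_up",
    "stand_up": "full",
}

def replay(actions, rules=BAG_RULES):
    """Return the view inserted after each action; unknown actions keep the view."""
    views = []
    for action in actions:
        fallback = views[-1] if views else "full"
        views.append(rules.get(action, fallback))
    return views
```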

FIG. 5 is a conceptual diagram of an external appearance of a video production booth including a video recording device and a video production device. Referring to FIG. 5, the video production booth can be transported by vehicle and has a certain internal space. The video production booth may be a box-shaped container or a container-type house. It suffices that the video production booth has a form convenient for manufacture, transport, and installation, and it is not limited to the box or container forms described above. Meanwhile, the video production booth includes a door for the user's access, and known booth manufacturing techniques can be applied to the video production booth.

According to the present invention, chroma key lectures, that is, all types of lectures that require lecture content to be composited, as well as news-style shooting, are possible. Since the user is filmed against a green chroma key background and the background or lecture content is composited, the user can give a lecture in the manner of an announcer or VJ. Natural and lively videos can also be created by using a prompter.

According to the present invention, since camera lectures, for example, online lectures, live lectures, or video conferences can be conducted through desktops, laptops, tablet PCs, etc., images can be filmed and produced without time and space limitations.

Although the present invention has been described in detail through representative embodiments, those skilled in the art will understand that various modifications can be made to the above-described embodiments without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, and should be defined not only by the claims below but also by all changes or modifications derived from the claims and their equivalents.

Claims

1. An automatic video production system comprising a video production device,

wherein the video production device comprises:

a high-definition video receiver for receiving 4K high-definition video from a 4K camera in real time;

a high-definition image segmentation unit for dividing the 4K high-definition video into at least two FHD images based on a first artificial intelligence model;

an image selection unit for selecting one FHD image from among the at least two FHD images based on a second artificial intelligence model; and

a video production unit for inserting the one FHD image into a video to be provided to a user terminal,

wherein the second artificial intelligence model selects another FHD image from among the at least two FHD images according to at least one of the speech content of a speaker in the 4K high-definition video or the speaker's motion, and

the video production unit changes the FHD image inserted into the video from the one FHD image to the other FHD image.

2. The automatic video production system according to claim 1, wherein the first artificial intelligence model divides the 4K high-definition video into the at least two FHD images using at least one of a face recognition algorithm, a motion recognition algorithm, and an object recognition algorithm.

3. The automatic video production system according to claim 1, wherein the second artificial intelligence model selects the one FHD image based on at least one of the speech content of the speaker in the 4K high-definition video or the speaker's motion.

4. The automatic video production system according to claim 1, wherein, when the second artificial intelligence model determines that an NG cut has occurred in the FHD image inserted into the video, the second artificial intelligence model selects one FHD image from among the remaining FHD images, excluding the FHD image in which the NG cut has occurred, among the at least two FHD images.

5. The automatic video production system according to claim 4, wherein the second artificial intelligence model determines that the NG cut has occurred when the speaker does not perform at least one of voice utterance or motion for 5 seconds or more.

6. The automatic video production system according to claim 1, wherein the video production device further comprises a production video output/modification unit,

the user terminal requests the production video output/modification unit to stream the produced video, and

the production video output/modification unit streams the produced video to a display unit of the user terminal in response to the request of the user terminal.

7. The automatic video production system according to claim 6, wherein the production video output/modification unit requests the user terminal to confirm whether or not an NG cut has occurred in the produced video.

8. The automatic video production system according to claim 7, wherein, when the user terminal responds to the confirmation request that an NG cut has occurred in the produced video, the production video output/modification unit requests NG cut occurrence time information from the user terminal.

9. The automatic video production system according to claim 8, wherein the user terminal may request the production video output/modification unit to modify the NG cut, and the production video output/modification unit, in response to the request for modification of the NG cut, deletes from the produced video the FHD image in which the NG cut occurred, selects one FHD image from among the FHD images of the time period in which the NG cut occurred, and inserts the selected FHD image into the time period in which the NG cut occurred.

10. The automatic video production system according to claim 9, wherein, when the user terminal responds to the confirmation request that no NG cut has occurred in the produced video, the production video output/modification unit transmits the user terminal's response to a payment request/confirmation unit of a user management device.