WO2022250439A1

WO2022250439A1 - Streaming data-based method, server, and computer program for recommending image edit point

Info

Publication number: WO2022250439A1
Application number: PCT/KR2022/007394
Authority: WO
Inventors: 이현우; 추성훈; 홍기용; 최우진
Original assignee: 주식회사 잘라컴퍼니
Priority date: 2021-05-28
Filing date: 2022-05-25
Publication date: 2022-12-01
Also published as: KR20230024825A; KR102354592B1

Abstract

Provided are a streaming data-based method, server, and computer program for recommending an image edit point. The streaming data-based method for recommending an image edit point, the method performed by a computing apparatus, according to various embodiments of the present invention, may comprise the steps of: obtaining streaming content information; obtaining edit frame information on the basis of streaming content information; and generating edit point recommendation information on the basis of the edit frame information.

Description

Video editing point recommendation method based on streaming data, server and computer program

Various embodiments of the present invention relate to a method for recommending video edit points based on streaming data, a server, and a computer program.

Today, as the Internet becomes popular with the development of information and communication technology, users can share various information with other users through the Internet environment. In this environment, with the advent of smartphones, platforms based on one-person media are spreading as an environment in which various contents can be easily consumed is prepared.

Along with the growth of media platforms, the number of one-person creators (or streamers) who transmit personal broadcasting through the Internet is gradually increasing. Due to the growth of media platforms and the increase of one-person creators, the amount of video content has exploded.

Along with the explosive increase in the amount of video content, as media for exposing a large amount of video content also increase, viewers tend to search for or select their preferred content to view it.

On the other hand, in recent years, a technology for a consumer-customized video service in which a viewer can select a desired program by himself and freely selects the time and place, away from the conventional broadcasting form in which a provider unilaterally selects and broadcasts a program, is receiving particular attention. In particular, among viewer-customized video services, demand for highlight video extraction to summarize and view only the information desired by viewers is increasing.

Accordingly, a one-person creator produces and publishes a highlight video by collecting, editing, and summarizing only the interesting parts (highlight parts) of the original video in order to attract viewers' attention. However, because video editing fundamentally consists of finding and editing the parts that viewers of the edited video will enjoy, it remains time-consuming, and it is very difficult to monitor an entire original video (or streaming video) that ran for a long time and find the edit points.

For example, when producing a highlight video, editing points are selected based on an editor's personal judgment, which increases monitoring time and judgment cost. That is, there is a concern that the subjective criterion of the editor is applied to the criterion for selecting the highlight section from the original video, and objectivity may be lacking, and the quality of the generated highlight video may vary depending on the editor's competency.

Therefore, there may be a demand in the industry for a server that maximizes the editor's editing efficiency, in the process of creating a highlight video from a streamer's original video, by recommending various edit points and helping the editor understand the flow of the video based on information that guarantees objectivity or reliability.

[Prior art literature]

[Patent Literature]

Korean registered patent 2010-0085720

The problem to be solved by the present invention has been devised in response to the above background art, and is to provide recommendation information related to an image editing point based on streaming data.

The problems to be solved by the present invention are not limited to the problems mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

A method for recommending video edit points based on streaming data according to an embodiment of the present invention for solving the above problems is disclosed. The method may include obtaining streaming content information, obtaining edit frame information based on the streaming content information, and generating edit point recommendation information based on the edit frame information.

In an alternative embodiment, the streaming content information includes basic content information containing information related to the user's broadcast content, and streaming data including streaming image data related to the broadcast content and viewer response data related to the reactions of a plurality of viewers watching the streaming image data. The edit frame information, which is the information on which generation of the edit point recommendation information is based, may include main video sub-data identification information related to at least some of the plurality of video sub-data constituting the streaming video data, and recommendation strength information corresponding to the main video sub-data identification information.
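
The information items named in this embodiment can be pictured as simple records. The following is only a minimal sketch under stated assumptions: every class and field name (`StreamingContentInfo`, `EditFrameEntry`, `strength`, and so on) is hypothetical, and the specification does not prescribe any particular data layout or value range.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ViewerResponseSample:
    """One time-stamped sample of viewer reactions (illustrative fields only)."""
    timestamp_sec: float
    viewer_count: int
    chat_count: int


@dataclass
class StreamingContentInfo:
    """Basic content information plus streaming data."""
    category: str                      # basic content information, e.g. "game" (assumed)
    video_sub_data_ids: List[int]      # identifiers of the video sub-data, e.g. frames
    viewer_responses: List[ViewerResponseSample] = field(default_factory=list)


@dataclass
class EditFrameEntry:
    """One main video sub-data identifier with its recommendation strength."""
    sub_data_id: int
    strength: float                    # assumed here to lie in [0, 1]


@dataclass
class EditFrameInfo:
    entries: List[EditFrameEntry] = field(default_factory=list)

    def strongest(self) -> EditFrameEntry:
        """Return the entry with the highest recommendation strength."""
        return max(self.entries, key=lambda e: e.strength)
```

In this sketch, edit frame information is just a list of (identifier, strength) pairs, which is enough for the later steps to rank and filter candidate edit points.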

In an alternative embodiment, the obtaining of the edit frame information may include obtaining first edit frame information through image analysis of the streaming image data, obtaining second edit frame information through response analysis of the viewer response data, and obtaining the edit frame information by integrating the first edit frame information and the second edit frame information.

In an alternative embodiment, the obtaining of the edit frame information may include generating weight application information related to at least one of the first edit frame information and the second edit frame information based on the basic content information, and obtaining the edit frame information by integrating the first edit frame information and the second edit frame information based on the weight application information.
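
One way to read the weighted integration described here is as a per-item weighted sum whose weights depend on the content category. The sketch below assumes that reading; the category names, the weight values, and the function name `integrate` are all hypothetical and are not taken from the specification.

```python
from typing import Dict, Tuple

# Hypothetical weight application information per content category:
# (weight for image-analysis strength, weight for viewer-response strength).
CATEGORY_WEIGHTS: Dict[str, Tuple[float, float]] = {
    "game": (0.6, 0.4),
    "talk": (0.3, 0.7),
}
DEFAULT_WEIGHTS = (0.5, 0.5)


def integrate(first: Dict[int, float], second: Dict[int, float], category: str) -> Dict[int, float]:
    """Merge two {sub_data_id: strength} maps into one using category weights.

    A sub-data item that appears in only one map contributes 0.0 for the other.
    """
    w_img, w_resp = CATEGORY_WEIGHTS.get(category, DEFAULT_WEIGHTS)
    merged: Dict[int, float] = {}
    for sub_id in set(first) | set(second):
        merged[sub_id] = w_img * first.get(sub_id, 0.0) + w_resp * second.get(sub_id, 0.0)
    return merged
```

For a talk-style broadcast, where chat reactions are assumed to matter more than visual events, this sketch would let the response-based strength dominate the merged value.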

In an alternative embodiment, the viewer response data may include at least one of information about the number of viewers watching the streaming video data, information about a chatting frequency related to the streaming video data, chatting keyword information related to the streaming video data, and donation information related to the streaming video data, and the reaction analysis may be an analysis of the real-time variation of the viewer response data corresponding to the streaming video data.
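
The analysis of real-time variation can be illustrated with a simple spike detector over chat frequency. This is a sketch under assumptions, not the method of the specification: the rolling z-score technique, the window size, and the thresholds are all chosen here purely for illustration.

```python
from statistics import mean, pstdev
from typing import List


def reaction_spikes(
    chat_counts: List[int],
    window: int = 5,
    z_threshold: float = 2.0,
    min_sigma: float = 1.0,
) -> List[int]:
    """Return indices where chat volume jumps well above the preceding window.

    Each sample is compared with the mean and standard deviation of the
    `window` samples before it. `min_sigma` keeps a perfectly flat history
    from turning a tiny increase into a spike.
    """
    spikes: List[int] = []
    for i in range(window, len(chat_counts)):
        history = chat_counts[i - window:i]
        mu = mean(history)
        sigma = max(pstdev(history), min_sigma)
        if (chat_counts[i] - mu) / sigma >= z_threshold:
            spikes.append(i)
    return spikes
```

The same function could be applied to viewer counts or donation counts sampled at the same interval; the indices it returns would then serve as candidate time points for the second edit frame information.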

In an alternative embodiment, the obtaining of the edit frame information may include assigning a weight to the recommendation strength information based on the similarity between the time points of one or more first main video sub-data corresponding to the first edit frame information and the time points of one or more second main video sub-data corresponding to the second edit frame information.
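
A possible interpretation of this time-point similarity weighting is that a candidate found by image analysis becomes stronger when a response-based candidate occurs close to it in time. The sketch below assumes a linear similarity falloff; the tolerance, the maximum boost, and the function name are hypothetical.

```python
from typing import List, Tuple


def boost_by_time_agreement(
    first: List[Tuple[float, float]],   # (time in seconds, strength) from image analysis
    second: List[Tuple[float, float]],  # (time in seconds, strength) from response analysis
    tolerance_sec: float = 10.0,
    max_boost: float = 0.5,
) -> List[Tuple[float, float]]:
    """Raise the strength of image-based candidates that coincide with viewer reactions."""
    boosted: List[Tuple[float, float]] = []
    for t1, s1 in first:
        # Distance to the closest response-based candidate, if any exists.
        gaps = [abs(t1 - t2) for t2, _ in second]
        nearest = min(gaps) if gaps else float("inf")
        # Similarity falls linearly from 1 (same instant) to 0 (at or beyond the tolerance).
        similarity = max(0.0, 1.0 - nearest / tolerance_sec)
        boosted.append((t1, s1 * (1.0 + max_boost * similarity)))
    return boosted
```

With the default values, a candidate 5 seconds away from a viewer reaction gains 25 percent in strength, while one with no reaction within 10 seconds is left unchanged.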

In an alternative embodiment, the edit point recommendation information is information on one or more recommended edit points related to the streaming video data, and includes one or more pieces of edit point recommendation sub-information, each corresponding to one of the recommended edit points. The generating of the edit point recommendation information may include: searching for one or more pieces of related sub-data identification information based on the one or more pieces of main video sub-data identification information included in the edit frame information; and generating the one or more pieces of edit point recommendation sub-information based on the main video sub-data identification information and the related sub-data identification information. Each piece of related sub-data identification information may include start frame information related to the start of the corresponding edit point recommendation sub-information and end frame information related to its end.
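
The search for a start frame and an end frame around a main frame can be sketched as an outward expansion over per-frame scores. The specification does not state how related sub-data are found, so the rule used below (keep neighbouring frames while their score stays above a fraction of the main frame's score) and the name `expand_to_section` are assumptions made only for illustration.

```python
from typing import List, Tuple


def expand_to_section(scores: List[float], main_idx: int, keep_ratio: float = 0.5) -> Tuple[int, int]:
    """Grow a section outward from a main frame and return (start, end) indices.

    Neighbouring frames are included for as long as their score is at least
    `keep_ratio` times the score of the main frame.
    """
    floor = scores[main_idx] * keep_ratio
    start = main_idx
    while start > 0 and scores[start - 1] >= floor:
        start -= 1
    end = main_idx
    while end < len(scores) - 1 and scores[end + 1] >= floor:
        end += 1
    return start, end
```

Each returned pair corresponds to one piece of edit point recommendation sub-information: the first index plays the role of the start frame information and the second the end frame information.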

In an alternative embodiment, the method further comprises generating and providing a video editing user interface based on the edit point recommendation information. The video editing user interface includes a video editing screen that displays the edit point recommendation information corresponding to the streaming video data and accepts a user's adjustment input. On the video editing screen, each edit point may be displayed with a different visual expression based on the recommendation strength information corresponding to each piece of edit point recommendation sub-information.
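
As an illustration of displaying edit points differently according to recommendation strength, the sketch below maps a strength value to a marker style. The thresholds, colors, labels, and the function name `marker_style` are hypothetical; the specification only states that the visual expression differs with strength.

```python
def marker_style(strength: float) -> dict:
    """Map a recommendation strength in [0, 1] to a display style for a timeline marker."""
    s = min(max(strength, 0.0), 1.0)  # clamp out-of-range inputs
    if s >= 0.75:
        color, label = "#d32f2f", "strong"
    elif s >= 0.4:
        color, label = "#f9a825", "medium"
    else:
        color, label = "#9e9e9e", "weak"
    # Opacity also scales with strength so that weaker points recede visually.
    return {"color": color, "label": label, "opacity": round(0.3 + 0.7 * s, 2)}
```

A front end could apply the returned style to each recommended section on the editing timeline so that the editor sees at a glance which sections are most strongly recommended.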

According to another embodiment of the present disclosure, a computing device for performing a video edit point recommendation method based on streaming data is disclosed. The computing device includes a memory that stores one or more instructions and a processor that executes the one or more instructions stored in the memory, and the processor may perform the streaming data-based video edit point recommendation method by executing the one or more instructions.

According to another embodiment of the present invention, a computer program stored in a computer-readable recording medium is disclosed. The computer program may be combined with a computer, which is hardware, to perform a video editing point recommendation method based on streaming data.

Other specific details of the invention are included in the detailed description and drawings.

According to various embodiments of the present invention, in the process of editing a summary of a video, it is possible to improve the editor's video editing efficiency by providing recommended information related to video editing points.

The effects of the present invention are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a conceptual diagram illustrating a system in which various aspects of a streaming data-based video edit point recommendation server related to an embodiment of the present invention can be implemented.

FIG. 2 shows a block diagram of a streaming data-based video edit point recommendation server related to an embodiment of the present invention.

FIG. 3 is an exemplary view illustrating a process of generating edit point recommendation information based on streaming content information related to an embodiment of the present invention.

FIG. 4 is an exemplary diagram illustrating an image editing user interface related to an embodiment of the present invention.

FIG. 5 is a schematic diagram showing a method for training a classification model related to an embodiment of the present invention.

FIG. 6 is an exemplary diagram illustrating edit types classified into a plurality according to each user's style related to an embodiment of the present invention.

FIG. 7 is an exemplary view illustrating edit point recommendation information and edit point correction information related to an embodiment of the present invention.

FIG. 8 is a flowchart exemplarily illustrating a method for recommending video edit points based on streaming data related to an embodiment of the present invention.

FIG. 9 is a schematic diagram illustrating one or more network functions related to an embodiment of the present invention.

Various embodiments are now described with reference to the drawings. In this specification, various descriptions are presented to provide an understanding of the present disclosure. However, it is apparent that these embodiments may be practiced without these specific details.

The terms “component,” “module,” “system,” and the like, as used herein, refer to a computer-related entity, hardware, firmware, software, a combination of software and hardware, or an execution of software. For example, a component may be, but is not limited to, a procedure running on a processor, a processor, an object, a thread of execution, a program, and/or a computer. For example, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a processor and/or thread of execution. A component can be localized within a single computer. A component may be distributed between two or more computers. Also, these components can execute from various computer readable media having various data structures stored thereon. Components may communicate via local and/or remote processes, for example, in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system or distributed system, and/or data transmitted to other systems by a signal over a network such as the Internet).

In addition, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless otherwise specified or clear from the context, “X uses A or B” is intended to mean any of the natural inclusive permutations. That is, “X uses A or B” applies if X uses A, if X uses B, or if X uses both A and B. Also, the term “and/or” as used herein should be understood to refer to and include all possible combinations of one or more of the listed related items.

Also, the terms "comprises" and/or "comprising" should be understood to mean that the features and/or components are present. However, it should be understood that the terms "comprises" and/or "comprising" do not exclude the presence or addition of one or more other features, elements, and/or groups thereof. Also, unless otherwise specified or where the context clearly indicates that a singular form is indicated, the singular in this specification and claims should generally be construed to mean "one or more".

Those skilled in the art will further appreciate that the various illustrative logical blocks, components, modules, circuits, means, logics, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate the interchangeability of hardware and software, various illustrative components, blocks, configurations, means, logics, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented in hardware or as software depends on the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application. However, such implementation decisions should not be interpreted as causing a departure from the scope of this disclosure.

The description of the presented embodiments is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these embodiments will be apparent to those skilled in the art of this disclosure. The general principles defined herein may be applied to other embodiments without departing from the scope of this disclosure. Thus, the present disclosure is not limited to the embodiments presented herein. This disclosure is to be interpreted in the widest light consistent with the principles and novel features presented herein.

In this specification, a computer means any kind of hardware device including at least one processor, and may be understood as encompassing a software configuration operating in a corresponding hardware device according to an embodiment. For example, a computer may be understood as including a smartphone, a tablet PC, a desktop computer, a laptop computer, and user clients and applications running on each device, but is not limited thereto.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.

Although each step described in this specification is described as being performed by a computer, the subject of each step is not limited thereto, and at least a part of each step may be performed in different devices according to embodiments.

FIG. 1 is a conceptual diagram illustrating a system in which a streaming data-based video edit point recommendation server related to an embodiment of the present invention can be implemented.

As shown in FIG. 1 , a system according to embodiments of the present invention may include a server 100, a user terminal 10, an external server 20, and a network. Components shown in FIG. 1 are exemplary, and additional components may exist or some of the components shown in FIG. 1 may be omitted. The server 100 according to embodiments of the present invention, the user terminal 10, and the external server 20 may mutually transmit and receive data for the system according to embodiments of the present invention through a network.

Networks according to embodiments of the present invention may use various wired communication systems such as a Public Switched Telephone Network (PSTN), x Digital Subscriber Line (xDSL), Rate Adaptive DSL (RADSL), Multi Rate DSL (MDSL), Very High Speed DSL (VDSL), Universal Asymmetric DSL (UADSL), High Bit Rate DSL (HDSL), and a local area network (LAN).

In addition, the network presented here may use various wireless communication systems such as CDMA (Code Division Multi Access), TDMA (Time Division Multi Access), FDMA (Frequency Division Multi Access), OFDMA (Orthogonal Frequency Division Multi Access), SC-FDMA (Single Carrier-FDMA), and other systems.

The network according to the embodiments of the present invention may be configured regardless of its communication mode, such as wired and wireless, and may be composed of various communication networks such as a personal area network (PAN) and a wide area network (WAN). In addition, the network may be the known World Wide Web (WWW), or may use a wireless transmission technology used for short-range communication, such as Infrared Data Association (IrDA) or Bluetooth. The techniques described herein may be used in the networks mentioned above as well as other networks.

According to an embodiment of the present invention, the user terminal 10 may refer to any type of node(s) in a system having a mechanism for communication with the server 100. The user terminal 10 is a terminal that can receive edit point recommendation information for streaming content information (e.g., streaming video) through information exchange with the server 100, and may refer to a terminal possessed by a user. For example, the user terminal 10 may be a terminal related to a streamer transmitting real-time video content or to an editor in charge of editing for a specific streamer. The user terminal 10 may transmit various real-time contents, such as video and audio contents, through a streaming server, and a plurality of viewers may connect to the corresponding streaming server and watch the real-time contents transmitted through the user terminal 10.

The user terminal 10 may refer to any type of entity(s) in a system having a mechanism for communication with the server 100. For example, the user terminal 10 may include a personal computer (PC), a notebook, a mobile terminal, a smartphone, a tablet PC, a wearable device, and the like, and may include all types of terminals capable of accessing wired/wireless networks. In addition, the user terminal 10 may include an arbitrary server implemented by at least one of an agent, an application programming interface (API), and a plug-in. In addition, the user terminal 10 may include an application source and/or a client application.

According to an embodiment of the present invention, the external server 20 may be a server that stores information related to a plurality of learning data for neural network learning. For example, the external server 20 may store information about editing styles of a plurality of streamers or information about content transmitted by each streamer. For another example, the external server 20 may store information about a plurality of entire content videos corresponding to a plurality of streamers or a plurality of streamer terminals and an edited video (or highlight video) corresponding to each content video. Information stored in the external server 20 can be used as training data, verification data, and test data for training the neural network in the present invention. The server 100 of the present invention may build a training data set based on data received from the external server 20, and may generate the editing style classification model, the style edit point recommendation model, or the custom edit point recommendation model of the present invention by training a neural network model including one or more network functions on the training data set.

The external server 20 is a digital device, and may be a digital device equipped with a processor, memory, and arithmetic capability, such as a laptop computer, a notebook computer, a desktop computer, a web pad, and a mobile phone. The external server 20 may be a web server that processes services. The types of servers described above are only examples and the present disclosure is not limited thereto.

In a further embodiment, the external server 20 may be a streaming server. A streaming server is a server that transmits real-time content over the Internet, and may be a server that provides a viewing service by relaying the real-time content of each of a plurality of user (e.g., streamer) terminals to each of a plurality of viewer terminals. The streaming server may acquire viewer response data related to the real-time content watched on each of the plurality of viewer terminals from each viewer terminal and transmit the obtained viewer response data to the server 100.

According to an embodiment of the present invention, a streaming server may be included in the server 100, and functions of transmitting content and providing information recommending edit points may be performed in one integrated server. In this example, when the streaming server and the server 100 are implemented as one integrated server, the server 100 may obtain viewer response data for real-time streaming content from a plurality of viewer terminals.

According to an embodiment of the present invention, the server 100 may provide edit point recommendation information for the streaming content information 210. Here, the streaming content information 210 may include streaming image data 220 related to broadcast content transmitted by a user (e.g., a streamer) and viewer response data 230 related to the reactions of a plurality of viewers watching that image data. The edit point recommendation information 610 is recommendation information related to highlight sections in which the fun factor appears most prominently among all frames of the image data; for example, it may be information about the sections in which viewers showed the most interest. That is, the server 100 may recommend and provide one or more edit points related to the video data based on the real-time streaming content information 210 related to the user (i.e., the streamer). Accordingly, when an edited video is produced from the broadcast data after a real-time streaming broadcast, various edit points and information that helps the editor understand the flow of the video can be recommended, which may reduce editing time and improve the editor's editing efficiency.

The server 100 may generate an image editing user interface 300 according to embodiments of the present invention. The server may be a computing system that provides information to clients (eg, user terminals) over a network. The server 100 may transmit the generated video editing user interface 300 to the user terminal 10 . In this case, the user terminal 10 may be any type of computing device capable of accessing the server 100 . The processor 130 of the server 100 may transmit the video editing user interface 300 to the user terminal 10 through the network unit 110 .

Although only one server 100 is shown in FIG. 1, it will be apparent to those of ordinary skill in the art that more servers may also be included in the scope of the present invention and that the server 100 may include additional components. That is, the server 100 may be composed of a plurality of computing devices. In other words, a set of a plurality of nodes may constitute the server 100.

According to an embodiment of the present invention, the server 100 may be a server providing cloud computing services. More specifically, the server 100 may be a server that provides a cloud computing service, a kind of Internet-based computing in which information is processed by another computer connected to the Internet rather than by the user's own computer. The cloud computing service may be a service that stores data on the Internet and allows users to use that data anytime and anywhere through Internet access, without installing the necessary data or programs on their computers, and to share and forward the data easily with a click. In addition, the cloud computing service may be a service that not only stores data on a server on the Internet, but also allows users to perform desired tasks using the functions of application programs provided on the web without installing a separate program, and to work while sharing data. In addition, the cloud computing service may be implemented in the form of at least one of Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), a virtual machine-based cloud server, and a container-based cloud server. That is, the server 100 of the present invention may be implemented in the form of at least one of the aforementioned cloud computing services. The specific description of the cloud computing service above is only an example, and the server 100 may include any platform for constructing the cloud computing environment of the present invention.

A learning method and learning process for a neural network in the present invention, a method of constructing an editing style database, and a method of providing recommendation information for edit points based on streaming content information will be described in detail later with reference to FIGS. 2 to 6.

As shown in FIG. 2 , the server 100 may include a network unit 110 , a memory 120 and a processor 130 . The components included in the above-described server 100 are examples, and the scope of the present invention is not limited to the above-mentioned components. That is, additional components may be included or some of the above components may be omitted according to implementation aspects of the embodiments of the present invention.

According to an embodiment of the present invention, the server 100 may include a network unit 110 that transmits and receives data to and from the external server 20 and the user terminal 10. The network unit 110 may exchange streaming content information and edit point recommendation information corresponding to the streaming content information according to an embodiment of the present invention with at least one of the user terminal 10 and the external server 20. For example, the server 100 may receive streaming content information 210 from the user terminal 10 through the network unit 110. For another example, the server 100 may transmit edit point recommendation information generated corresponding to the streaming content information to the user terminal 10 through the network unit 110. As another example, the server 100 may receive a training data set for training a neural network from the external server 20 through the network unit 110. The detailed description of information transmitted and received by the above-described network unit is only an example, and the present disclosure is not limited thereto. Additionally, the network unit 110 may permit information transmission between the server 100, the external server 20, and the user terminal 10 by calling a procedure to the server 100.

The network unit 110 according to an embodiment of the present invention may use various wired communication systems such as a Public Switched Telephone Network (PSTN), x Digital Subscriber Line (xDSL), Rate Adaptive DSL (RADSL), Multi Rate DSL (MDSL), Very High Speed DSL (VDSL), Universal Asymmetric DSL (UADSL), High Bit Rate DSL (HDSL), and a Local Area Network (LAN).

In addition, the network unit 110 presented in this specification may use various wireless communication systems such as Code Division Multi Access (CDMA), Time Division Multi Access (TDMA), Frequency Division Multi Access (FDMA), Orthogonal Frequency Division Multi Access (OFDMA), Single Carrier-FDMA (SC-FDMA), and other systems.

In the present invention, the network unit 110 may be configured regardless of its communication mode, such as wired and wireless, and may be composed of various communication networks such as a personal area network (PAN) and a wide area network (WAN). In addition, the network may be the known World Wide Web (WWW), or may use a wireless transmission technology used for short-range communication, such as Infrared Data Association (IrDA) or Bluetooth. The techniques described herein may be used in the networks mentioned above as well as other networks.

According to an embodiment of the present invention, the memory 120 may store a computer program for performing a streaming data-based image editing point recommendation method, and the stored computer program may be read and driven by the processor 130. In addition, the memory 120 may store any type of information generated or determined by the processor 130 and any type of information received by the network unit 110. Also, the memory 120 may store streaming content information or edit point recommendation information corresponding to the streaming content information. For example, the memory 120 may temporarily or permanently store input/output data (e.g., edit frame information corresponding to streaming data, edit point recommendation information corresponding to the edit frame information, and a user interface generated based on the edit point recommendation information). For another example, the memory 120 may store application programs for identifying edit frame information based on video data and viewer response data related to the video data. For yet another example, the memory 120 may store a pretrained neural network model for providing a recommended editing point based on a personalized editing style corresponding to each streamer. A specific description of the information stored in the aforementioned memory is only an example, and the present invention is not limited thereto.

According to an embodiment of the present disclosure, the memory 120 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (e.g., SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), a magnetic memory, a magnetic disk, and an optical disk. The server 100 may operate in relation to a web storage performing a storage function of the memory 120 on the Internet. The above description of the memory is only an example, and the present disclosure is not limited thereto.

According to an embodiment of the present invention, the processor 130 may be composed of one or more cores, and may include a processor for data analysis and deep learning, such as a central processing unit (CPU), a general purpose graphics processing unit (GPGPU), or a tensor processing unit (TPU) of a computing device.

The processor 130 may read a computer program stored in the memory 120 and perform data processing for deep learning according to an embodiment of the present invention. According to an embodiment of the present invention, the processor 130 may perform an operation for learning a neural network. The processor 130 may perform calculations for neural network learning, such as processing input data for learning in deep learning (DL), extracting features from input data, calculating errors, and updating neural network weights using backpropagation.

Also, in the processor 130, at least one of CPU, GPGPU, and TPU may process learning of a network function. For example, the CPU and GPGPU can process learning of network functions and data classification using network functions. In addition, in one embodiment of the present invention, the learning of a network function and data classification using a network function may be processed by using processors of a plurality of computing devices together. In addition, a computer program executed in a computing device according to an embodiment of the present invention may be a CPU, GPGPU or TPU executable program.

In this specification, network functions may be used interchangeably with artificial neural networks and neuron networks. In this specification, a network function may include one or more neural networks, and in this case, an output of the network function may be an ensemble of outputs of one or more neural networks.

The processor 130 may read the computer program stored in the memory 120 and provide an editing style classification model and a custom edit point recommendation model according to an embodiment of the present invention. According to an embodiment of the present invention, the processor 130 may generate edit point recommendation information corresponding to the user's streaming data information. According to an embodiment of the present invention, the processor 130 may perform calculations for training an editing style classification model and a custom edit point recommendation model.

According to one embodiment of the present invention, the processor 130 may typically process the overall operation of the server 100. The processor 130 may provide or process appropriate information or functions for a user or user terminal by processing signals, data, information, etc. input or output through the components described above, or by driving an application program stored in the memory 120.

According to an embodiment of the present invention, the processor 130 may obtain streaming content information 210 . The streaming content information 210 may include basic content information and streaming data. Here, the basic content information may include information related to broadcast content of a user (ie, a streamer). For example, the basic content information may include information indicating that the streaming image data of the first user is related to at least one of game broadcasting content, outdoor broadcasting content, and communication broadcasting content. Also, for example, the basic content information may further include information related to the name, age, and gender of a user (or streamer) transmitting streaming video data. The detailed description of the above-described basic content information is only an example, and the present invention is not limited thereto.

Streaming data may include streaming image data 220 related to broadcast content and viewer response data 230 related to reactions of a plurality of viewers watching the corresponding image data. The streaming image data 220 may be data related to an image comprising a plurality of image sub-data as a plurality of frames. The viewer reaction data 230 may be data related to reactions of one or more viewers who watch streaming video data transmitted in real time. For example, the viewer response data 230 may include at least one of information on the number of viewers watching the streaming video data, information on chatting frequency related to the streaming video data, chatting keyword information related to the streaming video data, and donation information related to the streaming video data. As a specific example, the viewer reaction data 230 may be information about a chat input received through a chat window from a first viewer terminal in relation to a first point in streaming video data transmitted in real time by a user terminal (i.e., a streamer terminal). For another example, the viewer response data 230 may be information about a donation input received from a second viewer terminal in relation to a second point in streaming video data transmitted in real time by a user terminal (i.e., a streamer terminal). The detailed description of the above-described viewer response data is only an example, and the present invention is not limited thereto.

According to an embodiment of the present invention, the processor 130 may obtain the edit frame information 240 based on the streaming content information 210. The edit frame information 240 may be information that serves as a basis for generating edit point recommendation information. The edit frame information 240 may include main video sub-data identification information related to at least some video sub-data among a plurality of video sub-data constituting the video data, and recommendation strength information corresponding to the main video sub-data identification information. That is, the edit frame information 240 may include identification information related to some frames (i.e., main video sub-data) determined to be important among a plurality of frames constituting the video data, and information about the degree of recommendation of each such frame. A process of obtaining the edit frame information 240 based on the streaming content information 210 will be described in detail later with reference to FIG. 3.
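As a non-limiting illustration, the edit frame information described above can be thought of as a mapping from frame identifiers to recommendation strengths. The sketch below is only one possible representation; the class name `EditFrameInfo`, its methods, and the use of integer strengths are assumptions made for illustration and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EditFrameInfo:
    """Hypothetical container: main frame index -> recommendation strength."""
    strengths: Dict[int, int] = field(default_factory=dict)

    def add(self, frame: int, strength: int) -> None:
        # If a frame is reported more than once, keep the highest strength.
        self.strengths[frame] = max(self.strengths.get(frame, 0), strength)

    def main_frames(self) -> List[int]:
        # Identification information of the main frames, in playback order.
        return sorted(self.strengths)


info = EditFrameInfo()
info.add(11, 8)
info.add(5, 3)
info.add(11, 6)  # lower than the stored strength, so 8 is kept
print(info.main_frames())   # [5, 11]
print(info.strengths[11])   # 8
```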

FIG. 3 is an exemplary diagram illustrating a process of generating edit point recommendation information 610 based on streaming content information 210 according to an embodiment of the present invention.

Referring to FIG. 3, the streaming content information 210 may include streaming image data 220 and viewer response data 230, and the processor 130 may obtain edit frame information 240 based on the image analysis 221 for the streaming image data 220 and the reaction analysis 231 for the viewer reaction data 230.

Specifically, the processor 130 may obtain the first edit frame information 222 through image analysis 221 of the streaming image data 220 . In this case, the video analysis 221 of the streaming video data 220 may include voice analysis and video analysis of the streaming video data 220 . In an embodiment, a video analysis model and a sound analysis model may be used to detect or identify a specific frame in the streaming image data 220 . The video analysis model may be a neural network model trained to detect a specific event (eg, a specific image frame) within a video. For example, the video analysis model may identify a frame related to a specific event within a game screen. For example, the video analysis model may identify frames related to a scene in which a player (eg, a streamer) kills another player or is killed by another player in relation to a first game. As another example, the video analysis model may identify a frame related to a screen corresponding to a point in time when a player clears a specific stage during the first game play. The detailed description of the specific frame identified through the above-described video analysis model is only an example, and the present disclosure is not limited thereto.

According to the embodiment, the acoustic analysis model may be a model that identifies a specific frame in the streaming image data 220 based on at least one of a change in the volume of sound related to the streaming image data 220 or whether a specific keyword is detected. For example, the sound analysis model may identify a video frame related to a point in time when the sound of a game play or the voice of a user (i.e., a streamer) rapidly increases in the audio data of the streaming video data 220. For another example, the sound analysis model may identify a video frame related to a point in time when a specific game-related keyword is recognized in the audio data of the streaming video data 220, or when the user's speech is detected to contain a specific keyword. As a specific example, the acoustic analysis model may identify a specific frame by detecting a moment when the first streamer screams (i.e., the sound volume rapidly increases) in the streaming image data. For another example, the sound analysis model may identify a specific frame by detecting a moment when a specific keyword (e.g., pentakill) is recognized in relation to a game in streaming video data. The detailed description of the specific frame identified through the above-described acoustic analysis model is only an example, and the present disclosure is not limited thereto.
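As a non-limiting illustration of the volume-based case, the following sketch flags frames whose loudness jumps sharply relative to the preceding frame. It is a simple rule-based stand-in, not the acoustic analysis model itself; the function name `find_volume_spikes`, the per-frame loudness list, and the ratio of 2.0 are assumptions for illustration.

```python
from typing import List


def find_volume_spikes(loudness: List[float], ratio: float = 2.0) -> List[int]:
    """Return indices of frames at least `ratio` times louder than the previous frame."""
    spikes = []
    for i in range(1, len(loudness)):
        prev, cur = loudness[i - 1], loudness[i]
        # Skip silent predecessors to avoid dividing by zero.
        if prev > 0 and cur / prev >= ratio:
            spikes.append(i)
    return spikes


# Hypothetical per-frame loudness values; the jump at index 3 models a scream.
levels = [0.20, 0.22, 0.21, 0.80, 0.75, 0.30]
print(find_volume_spikes(levels))  # [3]
```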

In an embodiment, the acoustic analysis model may be a keyword recognition model constructed through one or more network functions and trained to implement a keyword spotting technique. For example, the keyword recognition model may be a model that takes one or more features corresponding to audio data of streaming video data as an input, calculates a matching score between the audio data and a predetermined keyword, and thereby determines whether the audio data corresponds to a keyword for identifying a specific frame. According to an embodiment, the keyword recognition model may be a deep neural network model that calculates a score corresponding to voice data based on one or more features related to a spectrogram corresponding to the voice data.

According to an additional embodiment, the processor 130 may obtain recommendation strength information corresponding to each main image sub-data included in the first edit frame information 222. For example, the processor 130 may pre-map different recommendation strength information for each event. Accordingly, the processor 130 may obtain recommendation strength information corresponding to '8' in relation to a frame in which another player is killed (e.g., the first video sub-data), and may obtain recommendation strength information corresponding to '3' in relation to a frame in which the player clears a specific stage. In the video analysis process, the processor 130 may obtain higher recommendation strength information for a frame in which viewers are expected to be more interested (e.g., a frame related to an event affecting winning or losing a game).
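As a non-limiting illustration, the pre-mapping of events to recommendation strengths can be sketched as a lookup table. The values 8 and 3 come from the example above; the event labels `player_kill` and `stage_clear`, the function name, and the rule of keeping the highest strength per frame are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Strength pre-mapped to each event type (values taken from the example above).
EVENT_STRENGTH: Dict[str, int] = {"player_kill": 8, "stage_clear": 3}


def strengths_for_events(detections: List[Tuple[int, str]]) -> Dict[int, int]:
    """Map each detected frame index to the strength pre-mapped to its event."""
    result: Dict[int, int] = {}
    for frame, event in detections:
        strength = EVENT_STRENGTH.get(event)
        if strength is None:
            continue  # events without a pre-mapped strength are ignored
        # Assumption: when several events hit one frame, keep the highest strength.
        result[frame] = max(result.get(frame, 0), strength)
    return result


detections = [(5, "player_kill"), (17, "stage_clear"), (20, "unknown_event")]
print(strengths_for_events(detections))  # {5: 8, 17: 3}
```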

Also, the processor 130 may obtain different recommendation strength information for each frame in a sound analysis process. For example, in a sound analysis process, the processor 130 may obtain different recommendation strength information according to the volume of sound corresponding to each frame (i.e., image sub-data). For example, among the first video sub-data and the second video sub-data included in the first edit frame information 222, when the volume of sound related to the first image sub-data is greater than the volume of sound related to the second image sub-data, the processor 130 may assign higher recommendation strength information to the first image sub-data than to the second image sub-data. For another example, the processor 130 may pre-map different recommendation strength information according to each recognized keyword, and correspondingly match and store different recommendation strength information for each frame. For example, when the keyword recognized in relation to the first video sub-data is 'double kill' and the keyword recognized in relation to the second video sub-data is 'pentakill', the processor 130 may assign higher recommendation strength information to the second video sub-data than to the first image sub-data. The detailed description of the aforementioned recommendation strength information is only an example to help understanding of the present invention, and the present invention is not limited thereto.

That is, the processor 130 may obtain different recommendation strength information for each video sub-data included in the first edit frame information 222, match the obtained recommendation strength information with each video sub-data, and store the same.

In other words, the processor 130 may obtain the first edit frame information 222 through video analysis and voice analysis of the streaming image data 220. The first edit frame information 222 obtained by the processor 130 may include information about main video sub-data predicted, through video or audio analysis, to be of interest to viewers among a plurality of video sub-data constituting the streaming video data 220.

Also, the processor 130 may obtain the second edit frame information 232 through reaction analysis 231 for the viewer reaction data 230. In this case, the reaction analysis 231 may be an analysis related to a real-time variation of the viewer reaction data 230 corresponding to the video data. For example, the processor 130 may obtain the second edit frame information 232 through quantitative analysis related to the viewer response data 230. For example, the processor 130 may obtain the second edit frame information 232 by identifying, in the streaming video data, frames related to points in time at which the amount of viewer response data 230 acquired is at or above a predetermined threshold (i.e., when the number of chat inputs is at or above a set value). In one embodiment, the predetermined threshold may be determined based on the number of viewers watching the streaming image data 220. For example, at a first point in time when the number of viewers of the streaming image data 220 is 100, the predetermined threshold may be set to 70, and at a second point in time when the number of viewers is 50, the predetermined threshold may be set to 35. That is, when chat inputs are received from 70 or more viewers at the first point in time, the processor 130 may identify video sub-data corresponding to that point in time as main video sub-data.

In other words, the threshold serving as a criterion for identifying the second edit frame information 232 through quantitative analysis based on the amount of viewer response data may be variably adjusted based on the number of viewers watching the real-time content at each point in time. Accordingly, the processor 130 may obtain the second edit frame information based on frames related to sections in which the amount of viewer response data is high relative to the number of viewers in each section.
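As a non-limiting illustration, the viewer-proportional threshold can be sketched as follows. The 70% ratio is inferred from the example values (70 of 100 viewers, 35 of 50 viewers) and is an assumption, as are the function name and the tuple layout; integer arithmetic is used so that the boundary case compares exactly.

```python
from typing import List, Tuple


def frames_over_threshold(samples: List[Tuple[int, int, int]],
                          ratio_percent: int = 70) -> List[int]:
    """Select frames whose chat count reaches `ratio_percent` of the viewer count.

    Each sample is (frame index, viewer count, number of viewers who chatted).
    """
    selected = []
    for frame, viewers, chats in samples:
        # The threshold scales with the audience: 70 for 100 viewers, 35 for 50.
        if viewers > 0 and chats * 100 >= viewers * ratio_percent:
            selected.append(frame)
    return selected


# Hypothetical samples for illustration.
samples = [(11, 100, 72), (27, 50, 35), (40, 100, 40), (57, 50, 20)]
print(frames_over_threshold(samples))  # [11, 27]
```

Note that 40 chat inputs qualify a frame when there are 50 viewers but not when there are 100, which is the point of scaling the threshold with the audience size.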

According to an additional embodiment, the processor 130 may obtain recommendation strength information corresponding to each main image sub-data included in the second edit frame information 232. For example, the processor 130 may obtain recommendation strength information corresponding to each image sub-data based on the difference between the amount of viewer response data corresponding to each main image sub-data and the predetermined threshold. For example, when the difference between the video sub-data and the predetermined threshold is large, the processor 130 may obtain high recommendation strength information corresponding to that video sub-data. Conversely, when the difference between the video sub-data and the predetermined threshold is small, the processor 130 may obtain low recommendation strength information corresponding to that video sub-data. That is, the processor 130 may obtain higher recommendation strength information for video sub-data as the difference between the video sub-data and the predetermined threshold increases. The detailed description of the aforementioned recommendation strength information is only an example to help understanding of the present invention, and the present invention is not limited thereto.
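As a non-limiting illustration, a recommendation strength that grows with the margin over the threshold can be sketched as below. The specification does not state a formula; the stepwise scale, the step of 5 responses, the cap of 10, and the function name are all assumptions for illustration.

```python
def strength_from_margin(response_count: int, threshold: int,
                         step: int = 5, max_strength: int = 10) -> int:
    """Return 0 below the threshold, otherwise a strength that grows with the margin."""
    margin = response_count - threshold
    if margin < 0:
        return 0  # the frame does not qualify as main sub-data
    # One extra strength level per `step` responses above the threshold, capped.
    return min(max_strength, 1 + margin // step)


print(strength_from_margin(72, 70))   # 1
print(strength_from_margin(95, 70))   # 6
print(strength_from_margin(200, 70))  # 10
```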

That is, the processor 130 may obtain the second edit frame information 232 through reaction analysis on the viewer reaction data 230. The second edit frame information 232 obtained by the processor 130 may include information about main video sub-data related to sections in which viewers responded heavily, among a plurality of video sub-data constituting the streaming video data 220.

According to an embodiment, the processor 130 may obtain the edit frame information 240 by integrating the first edit frame information 222 and the second edit frame information 232. The first edit frame information 222 may include information about main video sub-data predicted, through video or audio analysis, to be of interest to viewers among a plurality of video sub-data constituting the streaming video data 220. The second edit frame information 232 may include information about main video sub-data related to sections in which viewers responded heavily, among a plurality of video sub-data constituting the streaming video data.

For a specific example, the streaming image data 220 may be composed of 60 frames (i.e., 60 image sub-data). In this case, the first edit frame information 222 obtained through the video analysis 221 of the streaming video data 220 may include information on the 5th video sub-data, the 11th video sub-data, the 17th video sub-data, and the 56th video sub-data. In other words, through video analysis (i.e., analysis of the video or audio of the streaming video data), first edit frame information 222 may be obtained indicating that the main video sub-data predicted to attract viewers' interest are those listed above.

In addition, the second edit frame information 232 obtained through the reaction analysis 231 for the viewer response data 230 may include information on the 11th video sub-data, the 27th video sub-data, and the 57th video sub-data. In other words, through reaction analysis (i.e., quantitative analysis of viewer response data), second edit frame information 232 may be obtained indicating that the main video sub-data in the sections with the most viewer response are those listed above.

In this case, the processor 130 may obtain the edit frame information 240 by integrating the first edit frame information 222 and the second edit frame information 232. For example, the integrated edit frame information 240 may include the 5th sub-data, the 11th sub-data, the 17th sub-data, the 27th sub-data, the 56th sub-data, and the 57th sub-data. That is, the edit frame information 240 may be obtained based on main image sub-data identification information included in each of the first edit frame information 222 and the second edit frame information 232. For example, the edit frame information 240 may be generated by including all or at least some of the image sub-data included in each edit frame information. The processor 130 may also obtain the edit frame information 240 based on only some of the plurality of image sub-data, selected through the recommendation strength information corresponding to each image sub-data. The detailed description of the image sub-data included in the above-described edit frame information is only an example to help understanding of the present invention, and the present invention is not limited thereto.
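As a non-limiting illustration, the simplest form of integration, in which all image sub-data from both sources are included, amounts to a set union of frame identifiers. The function name `integrate_edit_frames` is an assumption; the frame numbers are those of the example above.

```python
from typing import Iterable, List


def integrate_edit_frames(first: Iterable[int], second: Iterable[int]) -> List[int]:
    """Merge the main frames from both analyses, dropping duplicates."""
    return sorted(set(first) | set(second))


first_info = [5, 11, 17, 56]   # from video/audio analysis
second_info = [11, 27, 57]     # from viewer reaction analysis
print(integrate_edit_frames(first_info, second_info))
# [5, 11, 17, 27, 56, 57]
```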

In an additional embodiment, the processor 130 may generate weight application information related to at least one of the first edit frame information 222 and the second edit frame information 232 based on the basic content information. The weight application information may be information related to applying a weight to video sub data (or main video sub data identification information) related to specific edit frame information. Applying a weight to the specific edit frame information may mean improving the probability that the edit frame information 240 is formed through image sub-data included in the specific edit frame information. In other words, as the probability of being included in the edit frame information 240 increases, the probability of being recommended to the user may increase. That is, the weighted edit frame information may affect recommendation strength (ie, recommendation strength information) corresponding to each image sub-data.

Specifically, the processor 130 may identify basic content information corresponding to the streaming content information, and may generate weight application information related to at least one of the first edit frame information 222 and the second edit frame information 232 according to the identified basic content information.

For example, when basic content information of 'game broadcasting' is included in the streaming content information, the processor 130 may generate, based on the basic content information, weight application information for assigning a weight to the first edit frame information 222. For another example, when basic content information of 'communication broadcasting' is included in the streaming content information, the processor 130 may generate, based on the basic content information, weight application information for assigning a weight to the second edit frame information 232. In other words, when main frames are expected to be identifiable through image analysis of the streaming image data 220 (e.g., when the basic content information indicates a game broadcast), the processor 130 may generate weight application information for applying a weight to the first edit frame information 222 related to the image analysis result. Conversely, when important frames are expected to be identifiable through analysis of viewers' reactions rather than video analysis of the streaming video data (e.g., when the basic content information indicates a communication broadcast), the processor 130 may generate weight application information for applying a weight to the second edit frame information 232 related to the reaction analysis result. That is, the processor 130 may identify the basic content information and, for the specific streaming content information, generate weight application information for assigning a weight to at least one of the frames identified through video analysis (i.e., video sub-data included in the first edit frame information) and the frames identified through reaction analysis (i.e., image sub-data included in the second edit frame information).

According to an embodiment, the processor 130 may obtain the edited frame information 240 by integrating the first edited frame information 222 and the second edited frame information 232 based on the weight application information. When a weight is applied to specific frame information based on the weight application information, the probability that data included in the corresponding frame information is included in main image sub-data may be improved.

For example, a weight may be assigned to each of one or more image sub-data included in the first edit frame information 222 based on weight application information indicating that a weight is applied to the first edit frame information 222. In this case, the probability that the one or more image sub-data to which a weight is assigned are obtained as main image sub-data may be higher than that of the one or more image sub-data to which no weight is applied (i.e., the one or more image sub-data included in the second edit frame information). For example, all of the one or more image sub-data included in the weighted first edit frame information 222 may be selected as main image sub-data, while only some (that is, some but not all) of the one or more image sub-data included in the unweighted second edit frame information 232 may be selected as main image sub-data. In other words, in the process of selecting main video sub-data, video sub-data included in frame information to which a weight is applied have a higher probability of being selected as main video sub-data, and video sub-data included in frame information to which no weight is applied have a lower probability of being selected as main image sub-data. The detailed description of the weight application described above is only an example, and the present invention is not limited thereto.

That is, by applying a weight according to the type of streaming content being transmitted, the processor 130 may determine which of the image analysis and the reaction analysis is given priority in obtaining main image sub-data. In other words, by identifying the basic content information and assigning weights to the video sub-data from whichever of the video analysis result and the reaction analysis result is expected to be more reliable, the reliability of the recommended editing points can be improved.
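As a non-limiting illustration, weighted integration can be sketched by scaling each source's recommendation strengths by a content-type-dependent weight and keeping frames that reach a cutoff. The weight values, the cutoff, the per-frame strengths, and all names below are assumptions for illustration; the specification does not fix a particular scoring rule.

```python
from typing import Dict, List, Tuple

# Hypothetical weights per content type: (video-analysis weight, reaction-analysis weight).
SOURCE_WEIGHTS: Dict[str, Tuple[int, int]] = {
    "game": (2, 1),
    "communication": (1, 2),
}


def integrate_weighted(first: Dict[int, int], second: Dict[int, int],
                       content_type: str, cutoff: int) -> List[int]:
    """Scale each source's strengths by its weight and keep frames reaching `cutoff`."""
    w_first, w_second = SOURCE_WEIGHTS.get(content_type, (1, 1))
    scores: Dict[int, int] = {}
    for frame, strength in first.items():
        scores[frame] = max(scores.get(frame, 0), strength * w_first)
    for frame, strength in second.items():
        scores[frame] = max(scores.get(frame, 0), strength * w_second)
    return sorted(frame for frame, score in scores.items() if score >= cutoff)


first_info = {5: 4, 11: 6, 17: 3, 56: 5}   # frame -> strength (video analysis)
second_info = {11: 5, 27: 7, 57: 4}        # frame -> strength (reaction analysis)
print(integrate_weighted(first_info, second_info, "game", cutoff=6))
# [5, 11, 17, 27, 56]
print(integrate_weighted(first_info, second_info, "communication", cutoff=6))
# [11, 27, 57]
```

With the 'game' weights, every frame from the video analysis is kept while frame 57 from the reaction analysis is dropped, mirroring the case in which all weighted sub-data but only some unweighted sub-data are selected.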

According to an embodiment, the processor 130 may assign a weight to the recommendation strength information based on a viewpoint similarity (i.e., similarity in time point) between one or more image sub-data included in the first edit frame information 222 and the second edit frame information 232. Specifically, the processor 130 may assign a weight to recommendation strength information based on the viewpoint similarity between one or more first main video sub-data corresponding to the first edit frame information 222 and one or more second main video sub-data corresponding to the second edit frame information 232. Here, assigning a weight to recommendation strength information increases the degree of recommendation for specific video sub-data, and may indicate that the specific video sub-data is related to a factor of high interest. For example, when a weight is given to recommendation strength information related to first video sub-data, the first video sub-data has a higher recommendation level than other video sub-data in the video editing process, and may therefore be displayed on the video editing user interface in a different color that distinguishes it from the other video sub-data.

For a specific example, the streaming image data 220 may be composed of 60 frames (i.e., 60 image sub-data). The first edit frame information 222 obtained through the video analysis 221 of the streaming video data 220 may include information about the 5th video sub-data, the 11th video sub-data, the 17th video sub-data, and the 56th video sub-data. In addition, the second edit frame information 232 obtained through the reaction analysis 231 for the viewer response data 230 may include information on the 11th video sub-data, the 27th video sub-data, and the 57th video sub-data.

In this case, the processor 130 may assign a large weight of '10' to the recommendation strength information of the 11th image sub-data, which is included in both the first edit frame information 222 and the second edit frame information 232. In addition, the processor 130 may identify the similarity (i.e., similar time points) between the 56th video sub-data of the first edit frame information 222 and the 57th video sub-data of the second edit frame information 232, and may assign a weight of '8' to the recommendation strength information related to those frames. In an embodiment, when the difference in time point between video sub-data included in the respective edit frame information is less than or equal to a predetermined threshold (e.g., 3 video sub-data), the processor 130 may assign a weight to the recommendation strength information of each such video sub-data. That is, when video sub-data in the respective edit frame information match exactly, a large weight may be assigned to the corresponding video sub-data; when the time points of video sub-data in the respective edit frame information are relatively close, a weight may be assigned to each of them; and when no similarity in time point is identified between video sub-data in the respective edit frame information, no weight may be assigned. The detailed description of the above-described streaming video data, video sub-data, and video sub-data included in each frame information is only an example, and the present disclosure is not limited thereto.

That is, when the image analysis result and the reaction analysis result both indicate frames related to the same or similar time points, the processor 130 may assign a weight to the recommendation strength information so that the recommendation strength related to the corresponding frames is increased.
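As a non-limiting illustration, the agreement-based weighting can be sketched as a pairwise comparison of frame indices from the two analyses. The weights 10 and 8 and the gap of 3 frames follow the 60-frame example; the function name and the rule of keeping the larger weight when a frame matches more than once are assumptions for illustration.

```python
from typing import Dict, List


def similarity_weights(first: List[int], second: List[int], max_gap: int = 3,
                       exact_weight: int = 10, near_weight: int = 8) -> Dict[int, int]:
    """Assign weights to frames on which both analyses agree exactly or approximately."""
    weights: Dict[int, int] = {}
    for a in first:
        for b in second:
            gap = abs(a - b)
            if gap == 0:
                # Both analyses point at the same frame: large weight.
                weights[a] = max(weights.get(a, 0), exact_weight)
            elif gap <= max_gap:
                # Time points are close: weight both frames.
                for frame in (a, b):
                    weights[frame] = max(weights.get(frame, 0), near_weight)
    return weights  # frames with no counterpart receive no weight


print(similarity_weights([5, 11, 17, 56], [11, 27, 57]))
# {11: 10, 56: 8, 57: 8}
```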

According to an embodiment of the present invention, edit point recommendation information 610 may be generated based on edit frame information. The edit point recommendation information 610 may be information about one or more recommended edit points related to streaming video data. The edit point recommendation information 610 may include one or more edit point recommendation sub information corresponding to each of one or more recommended edit points. The edit point recommendation information 610 may be visualization information related to an edit point of streaming video data. The edit point recommendation information 610 may be characterized in that different visualization expressions are assigned to each section according to the importance of each section.

The processor 130 may search for one or more pieces of related sub-data identification information based on one or more pieces of main image sub-data identification information included in the edit frame information 240 . Also, the processor 130 may generate one or more pieces of edit point recommendation sub information based on one or more main image sub data identification information and one or more related sub data identification information. In this case, each of the one or more pieces of related sub-data identification information may include start frame information related to the start of each edit point recommendation sub information and end frame information related to the end of each edit point recommendation sub information.

That is, the processor 130 may generate edit point recommendation information including one or more edit points by searching for one or more related sub-data identification information based on the one or more main video sub-data identification information included in the edit frame information 240. For example, if the streaming video data is a 10-minute video, the edit point recommendation information may include first edit point recommendation information related to 1 minute and 30 seconds, second edit point recommendation information related to 5 minutes and 10 seconds, and third edit point recommendation information related to 8 minutes and 20 seconds. The numerical description of the number of pieces of edit point recommendation information included in the aforementioned edit point recommendation information and the time point of each is only an example, and the present invention is not limited thereto.

As a more specific example, when there are 100 frames corresponding to the entire image data (i.e., the number of image sub-data constituting the entire image data is 100), the 34th image sub-data and the 75th image sub-data may be selected as main image sub-data, being the frames (i.e., image sub-data) in which the viewers' response rose sharply.

In this case, one or more related sub-data related to each of the 34th image sub-data and the 75th image sub-data may be identified. As one or more related sub-data of the 34th image sub-data, the 31st image sub-data to the 35th image sub-data may be identified. In this case, the start frame may be the 31st image sub data, and the end frame may be the 35th image sub data.

In addition, as one or more related sub-data of the 75th image sub-data, the 70th image sub-data to the 80th image sub-data may be identified. In this case, the start frame may be the 70th image sub-data, and the end frame may be the 80th image sub-data.

Accordingly, the processor 130 may generate edit point recommendation information 610 based on the frames related to 31 to 35 (ie, the 31st to 35th image sub-data) and the frames related to 70 to 80 (ie, the 70th to 80th image sub-data) among the 100 total image sub-data. The specific numerical description related to the image sub-data described above is only an example, and the present invention is not limited thereto.
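
The 100-frame example above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the claimed implementation: the function name `build_edit_points` and the dictionary layout are hypothetical, and the related frames are supplied by hand instead of being found by a search step.

```python
from typing import Dict, List, Tuple


def build_edit_points(related: Dict[int, List[int]]) -> List[Tuple[int, int, int]]:
    """Return (main_frame, start_frame, end_frame) for each main frame.

    `related` maps a main frame index to the indices of its related frames.
    This layout is a hypothetical choice made only for this sketch.
    """
    edit_points = []
    for main_frame, frames in sorted(related.items()):
        span = frames + [main_frame]  # the main frame belongs to its own section
        edit_points.append((main_frame, min(span), max(span)))
    return edit_points


# Values taken from the 100-frame example in the text.
related_frames = {34: [31, 32, 33, 34, 35], 75: list(range(70, 81))}
edit_points = build_edit_points(related_frames)
print(edit_points)  # [(34, 31, 35), (75, 70, 80)]
```

Each tuple carries the start frame and end frame that delimit one piece of edit point recommendation sub information.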

In addition, the processor 130 may generate edit point recommendation information 610 including different visual expressions for each section based on recommendation strength information corresponding to each of a plurality of image sub-data included in the edit frame information 240. For example, when the recommendation strength information corresponding to the first image sub-data is 'high', the recommendation strength information corresponding to the second image sub-data is 'medium', and the recommendation strength information corresponding to the third image sub-data is 'low', the processor 130 may generate edit point recommendation information by displaying the section corresponding to the first image sub-data in red, the section corresponding to the second image sub-data in orange, and the section corresponding to the third image sub-data in yellow. The description of the color expression corresponding to each section described above is only an example, and the edit point recommendation information of the present invention may be generated with more diverse visual expressions.

That is, the processor 130 may generate edit point recommendation information through different visual expressions according to recommendation strength information of each recommended edit point. Accordingly, when a user (ie, an editor) is provided with edit point recommendation information, a section that is strongly recommended (eg, a section marked in red) and a section that is weakly recommended (eg, a section marked in yellow) in the streaming video data can be easily recognized visually, so editing efficiency can be improved.
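
The mapping from recommendation strength to a visual expression can be sketched as a simple lookup. The colors follow the example in the text; the class name `RecommendedSection`, the function `to_timeline`, and the output dictionary layout are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Color assignment from the example in the text; other visual expressions are possible.
STRENGTH_TO_COLOR: Dict[str, str] = {"high": "red", "medium": "orange", "low": "yellow"}


@dataclass
class RecommendedSection:
    start_frame: int
    end_frame: int
    strength: str  # 'high', 'medium' or 'low'


def to_timeline(sections: List[RecommendedSection]) -> List[dict]:
    """Attach a display color to every recommended section."""
    return [
        {"start": s.start_frame, "end": s.end_frame, "color": STRENGTH_TO_COLOR[s.strength]}
        for s in sections
    ]


timeline = to_timeline(
    [RecommendedSection(31, 35, "high"), RecommendedSection(70, 80, "low")]
)
print(timeline)
```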

According to an embodiment of the present invention, the processor 130 may generate and provide the video editing user interface 300 based on the edit point recommendation information 610 .

Specifically, as shown in FIG. 4 , the video editing user interface 300 may include a video editing screen 340 including edit point recommendation information corresponding to streaming video data. According to an embodiment, the video editing user interface 300 may allow a user's adjustment input to the video editing screen 340 . The adjustment input may include at least one of a length adjustment input related to each edit point recommendation sub information, a removal input related to each edit point recommendation sub information, and an additional frame generation input. The video editing screen 340 may display each edit point through different visual expressions based on recommendation strength information corresponding to one or more pieces of edit point recommendation sub information. As shown in FIG. 4 , edit point recommendation information may be displayed on the video editing screen 340 . That is, each of one or more edit points included in the edit point recommendation information may be displayed on the image editing screen 340 through different visual expressions according to the recommendation strength information. Accordingly, the user can intuitively recognize information (eg, recommendation strength information) on main editing sections through edit point recommendation information (ie, visualization information), so that editing efficiency can be improved.

Depending on the embodiment, as shown in FIG. 4 , the video editing screen 340 may display at least one of a chatting data change amount, a video change amount, and an audio data change amount corresponding to each edit point. For example, the video editing screen 340 may display information related to the number of occurrences or the volume of the streamer's voice data at each time point, in correspondence with each recommended edit point (ie, each of the one or more pieces of edit point recommendation sub information). For another example, the video editing screen 340 may display information related to whether a specific video event related to the streamer's play has occurred at each time point, in correspondence with each recommended edit point. For another example, information related to the amount of change in the number of chatting inputs of viewers at each time point may be displayed on the video editing screen 340 in correspondence with each recommended edit point. The detailed description of the information displayed on the above-described video editing screen is only an example, and the present invention is not limited thereto.

Also, the video editing user interface 300 may include a video data playback screen 310 . The video data playback screen 310 may be a screen for reproducing at least a portion of streaming video data. The video data playback screen 310 may reproduce a section of streaming video data in response to a user's manipulation on the video editing screen 340 . For example, when the user selects edit point recommendation sub information corresponding to a first section displayed with high recommendation strength on the video editing screen 340, the first section of the streaming video data may be reproduced and displayed on the video data playback screen 310. According to the embodiment, the video data playback screen 310 may display a video frame corresponding to a user's input to lengthen or shorten a specific recommended editing section on the video editing screen 340 . That is, the user may be provided with a video related to the recommended edit point through the video data playback screen 310 . Accordingly, since images related to each recommended edit point can be easily recognized, editing efficiency can be improved. Additionally, in the process of finely editing a specific editing section, the user's video editing efficiency can be further improved by displaying an image frame corresponding to the user's adjustment input on the video data playback screen 310 .

Also, the video editing user interface 300 may include an event display screen 320 . The event display screen 320 may display information about a start frame and an end frame of each of a plurality of pieces of edit point recommendation sub information included in the edit point recommendation information. Also, the event display screen 320 may display information about an event type corresponding to each piece of edit point recommendation sub information. In this case, the event type may be identified through image analysis, which includes video analysis and sound analysis. For example, as shown in FIG. 4 , the event display screen 320 may display event type information indicating that a specific edit point recommendation (ie, specific edit point recommendation sub information) corresponds to an 'engagement' in the game, and that the corresponding event lasted from 01:05:55 to 01:06:10. The specific description related to the information displayed on the above-described event display screen is only an example, and the present invention is not limited thereto. That is, through the information displayed on the event display screen 320, the user can easily grasp the type of event related to each section in the editing process. In other words, based on the information provided through the event display screen 320, the user can easily identify only the necessary sections, so editing efficiency can be improved, for example by reducing editing time.

Also, the video editing user interface 300 may include a reference information display screen 330 . As shown in FIG. 4 , the reference information display screen 330 may display information related to a confidence level for each section and frequently exposed keywords for each section. Here, the frequently exposed keyword for each section may be related to a keyword most exposed as a result of sound analysis related to a streamer or a voice related to a game. In addition, frequently exposed keywords for each section may be related to keywords exposed the most as a result of viewer response analysis (ie, keywords exposed in chatting windows). That is, the reference information display screen 330 may display information related to keywords frequently used by streamers for each section during broadcasting or keywords frequently mentioned by viewers for each section during broadcasting. This can improve convenience in the process of searching for a specific section by providing information on keywords for each section in the editing process of streaming video data.

As described above, by providing the video editing user interface 300 including edit point recommendation information, it is possible to improve the user's editing efficiency. Additionally, through the various screens included in the video editing user interface 300, the reactions, for each section, of the viewer terminals that watched the real-time content can be recognized, which can improve the efficiency of producing edited videos. In other words, making viewers' section-by-section reactions easier to recognize can suggest a direction for the individuality of the content, and overall editing efficiency can be improved by reducing the editor's editing time.

According to an embodiment of the present invention, the processor 130 may acquire a plurality of streaming content information and a plurality of editing history information corresponding to each of a plurality of users to build an editing style database. Here, a plurality of users may mean a plurality of streamers each transmitting various real-time contents such as video and audio contents through a streaming server. Streaming content information may include basic content information and streaming data. Basic content information may include information related to broadcast content of a user (ie, a streamer). For example, the basic content information may include information indicating that the streaming image data of the first user is related to at least one of game broadcasting content, outdoor broadcasting content, and communication broadcasting content. Also, for example, the basic content information may further include information related to the name, age, and gender of a user (or streamer) transmitting streaming video data. The detailed description of the above-described basic content information is only an example, and the present invention is not limited thereto.

Streaming data may include streaming image data related to broadcast content and viewer response data related to reactions of a plurality of viewers watching the corresponding image data. The streaming image data may be data related to an image comprising a plurality of image sub-data as a plurality of frames. Viewer response data may be data related to reactions of one or more viewers who watch streaming video data transmitted in real time. For example, the viewer response data may include at least one of information on the number of viewers watching the streaming video data, information on chatting frequency related to the streaming video data, chatting keyword information related to the streaming video data, and donation information related to the streaming video data. For example, the viewer response data may be information about a chat input received through a chat window from a first viewer terminal in relation to a first time point in streaming video data transmitted in real time by a user terminal (ie, a streamer terminal). For another example, the viewer response data may be information about a donation input received from a second viewer terminal in relation to a second time point in streaming video data transmitted in real time by a user terminal (ie, a streamer terminal). The detailed description of the above-described viewer response data is only an example, and the present invention is not limited thereto.

In an embodiment, each piece of editing history information may include edit point recommendation information 610 and edit point correction information corresponding to the edit point recommendation information 610 . That is, the editing history information may include the edit point recommendation information 610 provided from the server 100 of the present invention and the edit point correction information reflecting the edits the user actually decided on based on the corresponding edit point recommendation information 610. Such editing history information may be meaningful information for understanding each user's editing style.

According to an embodiment, the edit point recommendation information 610 may be visualization information obtained through video analysis and reaction analysis on streaming content information. The edit point recommendation information 610 may be visualization information in which each of one or more edit points is expressed through different visual expressions according to recommendation strength information. As a specific example, as shown in FIG. 7 , the edit point recommendation information 610 may be visualization information displayed in different colors according to the strength of recommendation for each section corresponding to all streaming video data.

That is, the edit point recommendation information 610 is recommendation information related to a highlight section in which the fun factor appears most prominently among all frames of the streaming video data, and may be visualization information in which such a section is expressed differently from other sections.

The edit point correction information 620 may be generated through correction (or modification) of the edit point recommendation information 610 . Specifically, the edit point recommendation information 610 may be included in the video editing user interface 300 and provided to the user, and may be changed according to various adjustment inputs of the user (eg, deletion, section length adjustment, addition, etc.). For example, the user may delete a section from the edit point recommendation information 610 corresponding to the streaming video data through an adjustment input on the video editing user interface 300 . For another example, the user may adjust the length of a section or change its recommendation strength in the edit point recommendation information 610 corresponding to the streaming video data through an adjustment input on the video editing user interface 300. For another example, the user may add a new frame as a main section to the edit point recommendation information 610 corresponding to the streaming video data through the video editing user interface 300 . That is, as various user adjustment inputs are applied to the edit point recommendation information 610 through the video editing user interface 300 , the edit point correction information 620 may be generated. In other words, the edit point correction information 620 may be information on the edits the user has actually decided on, corresponding to the edit point recommendation information 610 . For a more specific example, the edit point correction information 620 acquired in correspondence with the edit point recommendation information 610 may be as shown in FIG. 7 .
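
How adjustment inputs turn recommended sections into corrected sections can be sketched as follows. The representation of a section as a `(start_frame, end_frame)` tuple, the adjustment dictionary keys, and the function name `apply_adjustments` are all hypothetical; the text does not specify a data format.

```python
from typing import List, Tuple

Section = Tuple[int, int]  # (start_frame, end_frame), an assumed representation


def apply_adjustments(recommended: List[Section], adjustments: List[dict]) -> List[Section]:
    """Apply the user's adjustment inputs to the recommended sections."""
    corrected = list(recommended)  # leave the original recommendation untouched
    for adj in adjustments:
        kind = adj["kind"]
        if kind == "remove":      # deletion of a recommended section
            corrected.remove(adj["section"])
        elif kind == "resize":    # section length adjustment
            corrected[corrected.index(adj["section"])] = adj["new_section"]
        elif kind == "add":       # addition of a new main section
            corrected.append(adj["section"])
        else:
            raise ValueError(f"unknown adjustment kind: {kind}")
    return sorted(corrected)


recommended = [(31, 35), (70, 80)]
adjustments = [
    {"kind": "resize", "section": (31, 35), "new_section": (29, 36)},
    {"kind": "remove", "section": (70, 80)},
    {"kind": "add", "section": (90, 95)},
]
corrected = apply_adjustments(recommended, adjustments)
print(corrected)  # [(29, 36), (90, 95)]
```

Keeping the recommended list and the corrected list side by side yields one piece of editing history information in the sense described above.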

According to an embodiment of the present invention, the processor 130 may perform clustering on the editing style database. The processor 130 may classify each of a plurality of pieces of streaming content information and a plurality of pieces of editing history information into one or more clusters, respectively. Here, each of one or more clusters may be a criterion for classifying the editing style of each of a plurality of users.

More specifically, the processor 130 may perform pre-processing on data included in the editing style database. The editing style database may include a plurality of streaming content data and a plurality of editing history information corresponding to each of a plurality of users. The processor 130 may vectorize the data corresponding to each user (ie, streaming content data and editing history information) based on various elements constituting the plurality of streaming content data and the plurality of editing history information (eg, broadcast type, streamer's gender, age, viewer information, number of edit point marker manipulations, difference between editing history information, etc.). For example, the processor 130 may vectorize and represent the data corresponding to each user in n*m dimensions through a dimensionality reduction network function (eg, an encoder). The dimensionality reduction network function may extract features (ie, embeddings in a vector space) by taking each user's data as an input. Also, the processor 130 may determine an optimal feature through principal component analysis (PCA).
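
To make the idea of vectorizing per-user data concrete, the sketch below uses hand-written one-hot encoding and scaling. This is a deliberately simpler stand-in for the encoder network and PCA mentioned in the text, not a version of them; the field names and scaling constants are assumptions.

```python
from typing import Dict, List

# Broadcast categories named in the text; the order fixes the one-hot layout.
BROADCAST_TYPES = ["game", "outdoor", "communication"]


def vectorize_user(record: Dict) -> List[float]:
    """Turn one user's record into a fixed-length numeric vector."""
    one_hot = [1.0 if record["broadcast_type"] == t else 0.0 for t in BROADCAST_TYPES]
    numeric = [
        record["age"] / 100.0,                   # assumed scaling to roughly [0, 1]
        record["marker_manipulations"] / 100.0,  # number of edit point marker manipulations
    ]
    return one_hot + numeric


vector = vectorize_user({"broadcast_type": "game", "age": 30, "marker_manipulations": 12})
print(vector)
```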

Also, the processor 130 may perform clustering by classifying features embedded in the vector space into one or more clusters based on a k-means algorithm. For example, the processor 130 may set k centroids based on initial clusters formed by vectorized features corresponding to a plurality of elements constituting streaming content data and editing history information. After setting the k centroids, the processor 130 may assign each element to a centroid based on the distance between the element and each centroid. In other words, each element may be assigned to the centroid closest to it. Thereafter, the processor 130 may update each centroid by moving it to the center of the elements assigned to it. The processor 130 may optimize the algorithm by repeating the process of assigning and updating centroids until the cluster assignment does not change or until a predetermined tolerance or maximum number of iterations is reached. For example, the processor 130 may perform optimization by repeatedly calculating the sum of squared errors whenever the centroids change and identifying when the amount of change falls within an allowable error value. Through the above process, the processor 130 may classify a plurality of streaming content data and a plurality of editing history information into one or more clusters, respectively. In the above description, the processor performs clustering based on the k-means algorithm, but this is only an example and the present invention is not limited thereto. For example, the clustering of the present invention may be performed through DBSCAN, which allocates clusters based on density, or through a Gaussian mixture model.
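
The assign-and-update loop described above can be sketched in plain Python. This is a minimal k-means illustration under assumptions: initial centroids are sampled at random from the data, the stopping rule is a tolerance on the decrease of the sum of squared errors, and the names `k_means`, `tol` and `max_iter` are chosen for this sketch only.

```python
import math
import random
from typing import List, Sequence, Tuple


def k_means(points: Sequence[Sequence[float]], k: int, max_iter: int = 100,
            tol: float = 1e-6, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Cluster `points` into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]  # initial centroids
    labels = [0] * len(points)
    prev_sse = math.inf
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        # Stop when the sum of squared errors no longer improves by more than tol.
        sse = sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))
        if prev_sse - sse <= tol:
            break
        prev_sse = sse
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(points, k=2)
print(labels)
```

In this toy input the first three points end up in one cluster and the last three in the other; which cluster receives index 0 depends on the random initialization.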

According to another embodiment, the processor 130 may classify a plurality of streaming content data and a plurality of editing history information into one or more clusters, respectively, by utilizing a classification model. A method of performing clustering using a classification model will be described below with reference to FIG. 5 .

The classification model of the present invention may be trained by the processor 130 to form clusters of similar data in the solution space 400 . More specifically, the classification model may be trained so that target data 401 and target similar data 402 are included in one cluster 410, and target dissimilar data 403 is included in a cluster different from that of the target data 401 and the target similar data 402. In the solution space of the trained classification model, each cluster may be positioned to have a certain distance margin 420 .

The classification model receives a subset of learning data including target data 401, target similar data 402, and target dissimilar data 403 and maps each data to the solution space, and the weights of one or more network functions included in the classification model may be updated so that the data are clustered in the solution space according to labeled cluster information. That is, the classification model may be trained so that the distance in the solution space between the target data 401 and the target similar data 402 becomes shorter, and the distances in the solution space between the target dissimilar data 403 and each of the target data 401 and the target similar data 402 become longer. The classification model may be trained using, for example, a triplet-based cost function. The triplet-based cost function aims to separate a pair of input data of the same class from third input data of a different class, such that the difference between a first distance between the pair of input data of the same class (ie, the size of the cluster 410) and a second distance between one of the pair and the third input data (ie, the distance between 401 or 402 and 403) is at least the distance margin 420. A method of training the classification model includes reducing the first distance below a certain percentage of the distance margin. Here, the distance margin 420 may always be a positive number. Weights of one or more network functions included in the classification model may be updated to reach the distance margin 420, and the weight update may be performed per iteration or per epoch.
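
The text does not give a formula for the triplet-based cost function. A commonly used hinge form, assumed here only for illustration, is max(0, d(target, similar) - d(target, dissimilar) + margin), which becomes zero once the dissimilar item is farther away than the similar item by at least the margin. The function name and the use of Euclidean distance are assumptions.

```python
import math
from typing import Sequence


def triplet_loss(target: Sequence[float], similar: Sequence[float],
                 dissimilar: Sequence[float], margin: float = 1.0) -> float:
    """Hinge-style triplet loss on Euclidean distances (assumed form)."""
    first_distance = math.dist(target, similar)      # pair of the same class
    second_distance = math.dist(target, dissimilar)  # pair of different classes
    return max(0.0, first_distance - second_distance + margin)


# Dissimilar item is far away: the margin is satisfied and the loss is zero.
loss_far = triplet_loss((0.0, 0.0), (0.0, 1.0), (0.0, 5.0))
# Dissimilar item is too close: the loss is positive and training would push it away.
loss_near = triplet_loss((0.0, 0.0), (0.0, 1.0), (0.0, 1.5))
print(loss_far, loss_near)  # 0.0 0.5
```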

In addition, the classification model may be trained as a magnet loss-based model that can consider not only cluster classification of dissimilar data, but also a semantic relationship between individual data in one cluster or in another cluster. The initial distance between the center points of each cluster in the solution space of the classification model may be modified in the learning process. After mapping the data to the solution space, the classification model may adjust the location of each data in the solution space based on the cluster to which each data belongs and the similarity with the data inside and outside the cluster.

That is, the processor 130 may train a classification model to classify a plurality of streaming content data and a plurality of editing history information into one or more clusters, respectively.

The solution space 400 shown in FIG. 5 is only an example, and the classification model may include an arbitrary number of clusters and an arbitrary number of data for each cluster. The shape of the data (431, 433, 441, 443, etc.) included in the clusters shown in FIG. 5 is only an example to indicate similar data.

In the present disclosure, the solution space is composed of a space of one or more dimensions and includes one or more clusters, and each cluster may be configured based on the locations in the solution space of feature data based on each target data and feature data based on target similar data.

In the solution space, the first cluster 430 and the second cluster 440 may be clusters for dissimilar data. Also, the third cluster 450 may be a cluster for data dissimilar to the first and second clusters. The distances 445 and 435 between clusters may be measures representing differences in data belonging to each cluster.

The twelfth distance 445 between the first cluster 430 and the second cluster 440 may be a measure representing a difference between data belonging to the first cluster 430 and data belonging to the second cluster 440 . Also, the thirteenth distance 435 between the first cluster 430 and the third cluster 450 may be a measure representing a difference between data belonging to the first cluster 430 and data belonging to the third cluster 450. In the example shown in FIG. 5 , data belonging to the first cluster 430 may be more similar to data belonging to the second cluster 440 than to data belonging to the third cluster 450 . That is, when the distance between clusters is long, data belonging to each cluster may be more dissimilar, and when the distance between clusters is short, data belonging to each cluster may be less dissimilar.

The distances 435 and 445 between the clusters may be greater than a predetermined ratio or greater than the radius of the clusters. The processor 130 may process input data (ie, a plurality of streaming content information and a plurality of editing history information corresponding to each user) using the classification model, and may classify the input data based on the position at which the feature data of the input data is mapped in the solution space of the classification model.

The processor 130 may process input data using a pre-trained classification model, thereby mapping feature data of the input data to the solution space of the pre-trained classification model. The processor 130 may classify the input data according to which of the one or more clusters in the solution space the input data belongs to, based on the location of the input data in the solution space.
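
Classifying an input by its position in the solution space can be sketched as a nearest-centroid lookup. This assumes that each cluster is summarized by a single center point and that Euclidean distance is used; the cluster labels, coordinates, and the function name `assign_cluster` are hypothetical.

```python
import math
from typing import Dict, Sequence


def assign_cluster(feature: Sequence[float], centers: Dict[str, Sequence[float]]) -> str:
    """Return the label of the cluster whose center is closest to `feature`."""
    return min(centers, key=lambda name: math.dist(feature, centers[name]))


# Hypothetical cluster centers in a two-dimensional solution space.
centers = {"first": (0.0, 0.0), "second": (4.0, 0.0), "third": (0.0, 9.0)}
cluster = assign_cluster((3.2, 0.5), centers)
print(cluster)  # second
```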

In other words, the processor 130 may generate one or more clusters by clustering a plurality of streaming content data and a plurality of editing history information through the learned classification model. Each of one or more clusters may be a criterion for classifying editing styles of each of a plurality of users. That is, each cluster may be associated with each of various editing styles.

For example, one or more clusters generated according to classification through a classification model may be as shown in FIG. 6 . One or more clusters may include edit type A (511), edit type B (512), and edit type C (513). A plurality of streaming content data and a plurality of edit history information may be classified into edit type A 511 , edit type B 512 , and edit type C 513 through a classification model. In this case, the editing type A 511 may include data related to a broadcasting type of 'League of Legends', an hourly chatting frequency of 10000 to 20000, and an editing operation degree of 'low'. In addition, the editing type B 512 may include data related to a broadcasting type of 'battleground', a chatting frequency per hour of 2000 to 4000, and an editing manipulation degree of 'medium'. In addition, editing type C (513) may include data related to a broadcasting type of 'communication', a chatting frequency per hour of 300 to 500, and an editing operation degree of 'high'. The detailed description related to each cluster (or each editing type) described above is only an example, and the present invention is not limited thereto.

That is, the processor 130 may perform clustering by classifying each of the plurality of streaming content information and the plurality of editing history information into one or more clusters.

According to an embodiment of the present invention, the processor 130 may generate an editing style classification model based on a clustering result.

Specifically, the processor 130 may generate a plurality of learning input data based on a plurality of streaming content information and a plurality of editing history information, and may generate a plurality of learning output data based on the one or more clusters corresponding to each piece of streaming content information and editing history information. In addition, the processor 130 may build a learning data set by matching and labeling each learning output data with its corresponding learning input data. The processor 130 may generate an editing style classification model by training one or more network functions with the learning data set. Accordingly, the generated editing style classification model may derive a specific cluster (ie, editing style information) by taking streaming content information and editing history information of a specific user as inputs.

According to an embodiment of the present invention, the processor 130 may generate one or more style edit point recommendation models corresponding to each cluster based on a plurality of editing history information included in each cluster. More specifically, the editing history information may include edit point recommendation information 610 and edit point correction information 620 corresponding to the edit point recommendation information 610 . That is, the editing history information may include the edit point recommendation information 610 provided from the server 100 of the present invention and the edit point correction information 620 reflecting the edits the user actually confirmed based on the corresponding edit point recommendation information 610. Such editing history information may be meaningful information for understanding each user's editing style.

The processor 130 may construct a plurality of learning input data based on a plurality of edit point recommendation information included in each cluster, and may construct a plurality of learning output data based on a plurality of edit point correction information. In addition, the processor 130 may construct a learning data set corresponding to each cluster by matching and labeling a plurality of learning output data corresponding to a plurality of learning input data. That is, a learning data set related to edit point recommendation information and edit point correction information can be constructed for each cluster. The processor 130 may generate one or more style edit point recommendation models by performing learning on the neural network through each of one or more training data sets corresponding to each cluster.
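
Building one learning data set per cluster can be sketched as grouping (recommendation, correction) pairs by cluster label. The record layout with `cluster`, `recommendation`, and `correction` keys and the function name `build_training_sets` are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Section = Tuple[int, int]  # (start_frame, end_frame), an assumed representation


def build_training_sets(histories: List[dict]) -> Dict[str, List[tuple]]:
    """Group (learning input, learning output) pairs by the user's cluster."""
    datasets = defaultdict(list)
    for history in histories:
        pair = (history["recommendation"], history["correction"])  # input matched with label
        datasets[history["cluster"]].append(pair)
    return dict(datasets)


histories = [
    {"cluster": "A", "recommendation": [(31, 35)], "correction": [(31, 35)]},
    {"cluster": "C", "recommendation": [(31, 35)], "correction": [(28, 40)]},
    {"cluster": "A", "recommendation": [(70, 80)], "correction": [(70, 79)]},
]
training_sets = build_training_sets(histories)
print({name: len(pairs) for name, pairs in training_sets.items()})  # {'A': 2, 'C': 1}
```

Each per-cluster list would then be used to train the style edit point recommendation model for that cluster.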

For example, the processor 130 may generate one or more style edit point recommendation models corresponding to each cluster (or each edit type) as shown in FIG. 6 . The processor 130 may generate a first style edit point recommendation model 521 corresponding to the cluster related to edit type A 511, a second style edit point recommendation model 522 corresponding to the cluster related to edit type B 512, and a third style edit point recommendation model 523 corresponding to the cluster related to edit type C 513 . In this case, each style edit point recommendation model may be a neural network model generated by learning using data included in each cluster as training data. That is, each style edit point recommendation model can perform a more appropriate prediction with respect to its cluster (ie, its edit type). Accordingly, the one or more style edit point recommendation models generated for the respective clusters may recommend different edit frames for the same piece of streaming content information. The specific description of the above-described editing types is only an example, and the present invention is not limited thereto.

According to an embodiment of the present invention, the processor 130 may obtain first streaming content information of a first user. The first streaming content information may include first content basic information and first streaming data related to the first user. The first streaming data may include first streaming image data and first viewer response data.

The processor 130 may generate first edit point recommendation information based on the first streaming content information. The processor 130 may obtain first edit frame information related to a recommended edit point in the video through image analysis (including video analysis and sound analysis) of the first streaming video data and reaction analysis of the first viewer response data, and may generate the first edit point recommendation information based on the first edit frame information. In this case, the first edit point recommendation information may be visualization information displayed in different colors according to the recommendation strength for each section of the entire streaming video data.

In addition, the processor 130 may identify a first cluster corresponding to the first streaming content information by utilizing an editing style classification model. The editing style classification model may be a neural network model trained to take streaming content information of a specific user as an input and derive the corresponding cluster (i.e., editing style information), here the first cluster (i.e., first editing style information).

The processor 130 may correct the first edit point recommendation information by utilizing the first style edit point recommendation model corresponding to the first cluster. The first style edit point recommendation model may be a neural network model trained on the training data set included in the first cluster. That is, the first style edit point recommendation model may perform a more appropriate prediction with respect to the first cluster.

The processor 130 may obtain corrected first edit point recommendation information by inputting the first edit point recommendation information to the first style edit point recommendation model corresponding to the first cluster. That is, corrected first edit point recommendation information may be derived from the first edit point recommendation information, which is visualization information recommended through video analysis and reaction analysis. In other words, the first edit point recommendation information is first generated through video analysis and reaction analysis, and is then secondarily corrected through a neural network model corresponding to a cluster whose editing style is similar to that of the user (i.e., the style edit point recommendation model), yielding the corrected first edit point recommendation information. In this case, the corrected first edit point recommendation information is derived through a neural network model that has completed learning on the training data included in the first cluster, and may therefore reflect the editing style corresponding to the first cluster. Also, the processor 130 may generate and provide a video editing user interface based on the corrected first edit point recommendation information.
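
The two-stage flow described above (identify the cluster, then correct the recommendation with that cluster's model) can be sketched as follows. The classifier and the per-cluster models here are toy stand-ins written as plain functions, not neural networks, and every name in the sketch is hypothetical.

```python
from typing import Callable, Dict, List

Recommendation = List[float]  # per-section recommendation intensity (assumed form)


def correct_recommendation(
    content_info: dict,
    recommendation: Recommendation,
    classify_style: Callable[[dict], str],
    style_models: Dict[str, Callable[[Recommendation], Recommendation]],
) -> Recommendation:
    """Pick the style model for the user's cluster and apply it."""
    cluster = classify_style(content_info)  # stage 1: editing style classification
    model = style_models[cluster]           # one model per cluster (edit type)
    return model(recommendation)            # stage 2: secondary correction


def toy_classifier(content_info: dict) -> str:
    # Stand-in for the editing style classification model.
    return "A" if content_info.get("category") == "game" else "B"


# Stand-ins for style edit point recommendation models trained per cluster.
toy_style_models = {
    "A": lambda rec: [min(1.0, x * 1.5) for x in rec],
    "B": lambda rec: [x * 0.5 for x in rec],
}

corrected_a = correct_recommendation(
    {"category": "game"}, [0.5, 0.8], toy_classifier, toy_style_models
)
corrected_b = correct_recommendation(
    {"category": "outdoor"}, [0.5, 0.8], toy_classifier, toy_style_models
)
```

Keeping the models in a dictionary keyed by cluster mirrors the one-model-per-edit-type arrangement: the same recommendation yields different corrected outputs depending on the cluster.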

The video editing user interface 300 may include a video editing screen 340 that includes edit point recommendation information, and may allow a user's adjustment input on the video editing screen 340. That is, recommended edit points may be derived from the video data through video analysis and reaction analysis, and may then be corrected, based on information about other users whose editing style is similar to the user's, before being provided. Accordingly, since the user can be provided with more refined recommended edit points, editing efficiency can be maximized.

According to an embodiment of the present invention, the processor 130 may obtain first edit point correction information corresponding to the corrected first edit point recommendation information based on a user's adjustment input to the video editing user interface 300. The adjustment input may include, for example, at least one of a frame length adjustment input, a frame removal input, and an additional frame generation input. That is, the first edit point correction information may be visualization information about the edits actually decided by the first user in response to the recommended edit points (i.e., the corrected first edit point recommendation information). The first edit point correction information may include visualization information that differs from the corrected edit point recommendation information because existing recommended edit points have been adjusted or deleted, or new edit points have been inserted.

The processor 130 may obtain first editing history information related to the first user based on the corrected first edit point recommendation information and the first edit point correction information. Also, the processor 130 may build a first user database from the first editing history information. In addition, the processor 130 may generate a customized edit point recommendation model by updating the first style edit point recommendation model using the first user database. In this case, the customized edit point recommendation model may be a neural network model additionally trained on the continuously accumulated editing history information related to the first user (i.e., an additionally trained style edit point recommendation model). That is, each user's editing history is continuously accumulated in that user's database, and as this history is continuously reflected in the learning of the neural network model, the edit point recommendation model can be advanced over time. As the neural network is advanced, each of a plurality of users may be provided with recommended edit points optimized for that user. This can improve editing efficiency by significantly reducing editing time. A method for providing an advanced neural network through the accumulated information corresponding to each user will be described in detail later.

According to an embodiment of the present invention, the processor 130 may acquire a plurality of streaming content information and a plurality of editing history information corresponding to each of a plurality of users to build an editing style database. Streaming content information may include basic content information and streaming data. Basic content information may include information related to broadcast content of a user (ie, a streamer). For example, the basic content information may include information indicating that the streaming image data of the first user is related to at least one of game broadcasting content, outdoor broadcasting content, and communication broadcasting content. Also, for example, the basic content information may further include information related to the name, age, and gender of a user (or streamer) transmitting streaming video data. The detailed description of the above-described basic content information is only an example, and the present invention is not limited thereto.

In one embodiment, each of the plurality of pieces of editing history information may include edit point recommendation information 610 and edit point correction information 620 corresponding to the edit point recommendation information 610. The editing history information may include the edit point recommendation information 610 provided from the server 100 of the present invention and the edit point correction information 620 reflecting the edits the user actually confirmed based on that edit point recommendation information 610. Such editing history information may be meaningful information for understanding each user's editing style.

The edit point correction information 620 may be generated through correction of the edit point recommendation information 610. Specifically, the edit point recommendation information 610 may be included in the video editing user interface 300 and provided to the user, and may be changed according to various adjustment inputs of the user (e.g., deletion, section length adjustment, addition, etc.). For example, the user may delete a section from the edit point recommendation information 610 recommended for the streaming video data through an adjustment input on the video editing user interface 300. As another example, the user may adjust the length of a section or change its recommendation strength in the edit point recommendation information 610 through an adjustment input on the video editing user interface 300. As yet another example, the user may add a new frame as a main section in the edit point recommendation information 610 through the video editing user interface 300. That is, the edit point correction information 620 may be generated as the user's various adjustment inputs are applied to the edit point recommendation information 610 through the video editing user interface 300. In other words, the edit point correction information 620 may represent the edits the user has actually decided on in response to the edit point recommendation information 610. As a more specific example, the edit point correction information 620 obtained in correspondence with the edit point recommendation information 610 may be as shown in FIG. 7.
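
As a minimal sketch of how adjustment inputs could turn recommendation information into correction information, the following applies delete, resize, and add operations to a list of recommended sections. The `Section` type and the dictionary-based adjustment format are hypothetical; the specification does not define a concrete representation.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Section:
    start_s: float   # section start, in seconds
    end_s: float     # section end, in seconds
    strength: float  # recommendation strength


def apply_adjustments(recommended: List[Section], adjustments: List[Dict]) -> List[Section]:
    """Apply user adjustments in order and return the corrected sections.

    Indexes refer to the working list at the moment each adjustment is applied.
    The input list is left unchanged.
    """
    sections = list(recommended)
    for adj in adjustments:
        kind = adj["kind"]
        if kind == "delete":      # remove a recommended section
            del sections[adj["index"]]
        elif kind == "resize":    # adjust the length of a section
            i = adj["index"]
            sections[i] = replace(sections[i], start_s=adj["start_s"], end_s=adj["end_s"])
        elif kind == "add":       # insert a new section chosen by the user
            sections.append(Section(adj["start_s"], adj["end_s"], adj["strength"]))
        else:
            raise ValueError(f"unknown adjustment kind: {kind!r}")
    return sorted(sections, key=lambda s: s.start_s)


recommended = [Section(0.0, 10.0, 0.3), Section(20.0, 30.0, 0.9)]
corrected = apply_adjustments(recommended, [
    {"kind": "delete", "index": 0},
    {"kind": "resize", "index": 0, "start_s": 18.0, "end_s": 32.0},
    {"kind": "add", "start_s": 40.0, "end_s": 45.0, "strength": 1.0},
])
```

The pair (`recommended`, `corrected`) corresponds to one entry of editing history information in this sketch.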

According to an embodiment of the present invention, the processor 130 may obtain a training data set through an editing style database.

According to an embodiment, the processor 130 may generate a plurality of learning input data based on a plurality of edit point recommendation information corresponding to each user. The processor 130 may generate a plurality of learning output data based on a plurality of edit point correction information corresponding to each user.

In addition, the processor 130 may build one or more subsets of learning data for each user by matching each learning output data corresponding to each learning input data. The processor 130 may match learning input data and learning output data corresponding to each user.

In this case, the learning input data may be features corresponding to the pixel values of the edit point recommendation information, and the learning output data may be features corresponding to the pixel values of the edit point correction information. For example, each of the plurality of learning input data may be a plurality of features (n*m) corresponding to the pixel values of the respective edit point recommendation information, and each of the plurality of learning output data may be a plurality of features (n*m) corresponding to the pixel values of the respective edit point correction information.
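
The n*m feature layout can be illustrated by flattening two pixel grids into a matched input/output pair. This is a sketch under the assumption that both pieces of visualization information are available as n-by-m grids of numeric pixel values; the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # n rows by m columns of pixel values


def flatten(image: Image) -> List[float]:
    """Turn an n-by-m pixel grid into a vector of n*m features."""
    return [pixel for row in image for pixel in row]


def build_training_pair(recommendation: Image, correction: Image) -> Tuple[List[float], List[float]]:
    """Match learning input data with its learning output data."""
    x = flatten(recommendation)  # learning input: n*m features
    y = flatten(correction)      # learning output: n*m features
    if len(x) != len(y):
        raise ValueError("recommendation and correction must have the same size")
    return x, y


# Hypothetical 2-by-3 grids (n = 2, m = 3).
recommendation = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
correction = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
x, y = build_training_pair(recommendation, correction)
```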

Additionally, the processor 130 may obtain a plurality of pieces of editing style information corresponding to each of a plurality of users. A plurality of pieces of editing style information corresponding to each of a plurality of users may be obtained through an editing style classification model. The editing style classification model may derive a specific cluster (ie, editing style information) by taking streaming content information and editing history information of a specific user as inputs. For example, the editing style classification model may classify the corresponding first user as editing type A based on the first user's streaming content information. That is, editing style information of the first user called editing type A may be obtained. The specific description of the above-described editing style information is only an example, and the present invention is not limited thereto.

According to another embodiment, the processor 130 may generate learning input data based on a plurality of edit point recommendation information and a plurality of editing style information corresponding to each user. The processor 130 may generate a plurality of learning output data based on a plurality of edit point correction information corresponding to each user.

In addition, the processor 130 may build one or more subsets of learning data for each user by matching each learning output data corresponding to each learning input data.

In this case, the learning input data may be features corresponding to the pixel values of the edit point recommendation information and the feature value of the editing style information, and the learning output data may be features corresponding to the pixel values of the edit point correction information. For example, each of the plurality of learning input data may be a plurality of features (n*m+1) corresponding to the pixel values of the respective edit point recommendation information and the feature value of the respective editing style information, and each of the plurality of learning output data may be a plurality of features (n*m) corresponding to the pixel values of the respective edit point correction information. That is, the input has n*m+1 features because editing style information is additionally considered, while the output has n*m features corresponding to the image pixels.
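
A minimal sketch of the n*m+1 input layout follows. Encoding the editing style as a single number appended to the flattened pixels is an assumption made for illustration (the specification only states that one feature value is added), and the `STYLE_CODES` mapping is hypothetical.

```python
from typing import Dict, List

# Hypothetical numeric encoding of editing style information.
STYLE_CODES: Dict[str, float] = {"A": 0.0, "B": 1.0, "C": 2.0}


def build_styled_input(recommendation: List[List[float]], edit_type: str) -> List[float]:
    """Flatten n-by-m pixel values and append one style feature (n*m+1 total)."""
    features = [pixel for row in recommendation for pixel in row]
    features.append(STYLE_CODES[edit_type])
    return features


# Hypothetical 2-by-3 grid for a user classified as editing type B.
styled_input = build_styled_input([[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]], "B")
```

The matching learning output data would still have n*m features, since no style feature is appended on the output side.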

One or more subsets of learning data for each user may be generated by the processor 130 . For example, the first training data subset may be a set of data related to the first user (eg, first edit point recommendation information and first edit point correction information related to the first user), and the second training data subset may be a set of data related to the second user (eg, second edit point recommendation information and second edit point correction information related to the second user).

According to an embodiment of the present invention, the processor 130 may generate a plurality of customized edit point recommendation models corresponding to each of a plurality of users. Specifically, the processor 130 may generate a plurality of customized edit point recommendation models corresponding to each of a plurality of users by performing learning on one or more network functions through a training data set. In detail, by performing learning on each neural network through each training data subset corresponding to each user, a plurality of customized edit point recommendation models may be generated in correspondence with each user. For example, the processor 130 may generate a first customized edit point recommendation model by performing learning on a neural network through a first training data subset related to a first user.

That is, each of the plurality of customized edit point recommendation models may be a neural network model that performs image analysis on the edit point recommendation information corresponding to the respective user and derives corrected edit point recommendation information from it.

In this case, since each customized edit point recommendation model is trained on a different training data subset, different outputs (i.e., edit point corrections) can be derived even when the same edit point recommendation information is used as an input. In other words, because each model is trained on data related to its own user, the processor 130 may provide a customized edit point recommendation model that gives more appropriate edit point recommendations to each individual.
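
The effect of training one model per user on that user's own subset can be shown with a deliberately tiny stand-in. Instead of a neural network, each "model" below is a single gain fitted by least squares, which is an assumption chosen only to keep the sketch short; the names and data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[List[float], List[float]]  # (learning input, learning output)


def fit_user_model(subset: List[Pair]) -> Callable[[List[float]], List[float]]:
    """Fit y ~ gain * x on one user's training data subset (least squares)."""
    sum_xy = sum(x * y for xs, ys in subset for x, y in zip(xs, ys))
    sum_xx = sum(x * x for xs, _ in subset for x in xs)
    gain = sum_xy / sum_xx
    return lambda rec: [gain * x for x in rec]


# Hypothetical per-user subsets: recommendation features and correction features.
subsets: Dict[str, List[Pair]] = {
    "user1": [([1.0, 2.0], [2.0, 4.0])],  # tends to strengthen recommendations
    "user2": [([1.0, 2.0], [0.5, 1.0])],  # tends to weaken them
}
models = {user: fit_user_model(subset) for user, subset in subsets.items()}

same_input = [1.0, 3.0]
out_user1 = models["user1"](same_input)
out_user2 = models["user2"](same_input)
```

Even in this reduced form, the same input produces different corrections for the two users, which is the behavior described above.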

According to an embodiment, the processor 130 may obtain first streaming content information and first editing style information of a first user. The first streaming content information may include first content basic information and first streaming data related to the first user. The first streaming data may include first streaming image data and first viewer response data.

The processor 130 may obtain first edit point recommendation information based on the first streaming content information. Specifically, the processor 130 may obtain first edit frame information related to recommended edit points in the video through video analysis (including image analysis and sound analysis) of the first streaming video data and reaction analysis of the first viewer response data, and may generate the first edit point recommendation information based on the first edit frame information. In this case, the first edit point recommendation information may be visualization information displayed in different colors according to the recommendation intensity of each section of the entire streaming video data.

The processor 130 may generate corrected first edit point recommendation information by using the first edit point recommendation information as an input of a first customized edit point recommendation model corresponding to the first user. In this case, since the corrected first edit point recommendation information is derived through the first customized edit point recommendation model corresponding to the first user, the first edit point recommendation information may be corrected based on the accumulated editing style of the first user. That is, the processor 130 may provide each user with a neural network model that, through learning on accumulated data, corrects edit points optimally for that user. Accordingly, since the user can receive edit point recommendation information reflecting the user's existing editing style, editing efficiency can be improved.

According to another embodiment, the processor 130 may obtain first streaming content information and first editing style information of a first user. Also, the processor 130 may obtain first edit point recommendation information based on the first streaming content information.

The processor 130 may obtain corrected first edit point recommendation information by using the first streaming content information and the first editing style information as inputs of a first customized edit point recommendation model corresponding to the first user. In this case, the first customized edit point recommendation model may be a neural network model trained with learning input data of n*m+1 features, in which editing style information is additionally considered, and learning output data of n*m features. The corrected first edit point recommendation information may thus be generated based on both the editing styles of other users similar to the first user and the accumulated editing style of the first user.

That is, when acquiring edit point recommendation information and editing style information of a specific user, the processor 130 may correct the edit point recommendation information based on each of these variables. This correction is performed by a neural network model trained for each user (i.e., a customized edit point recommendation model), and because editing style information is also considered as a variable in training the neural network, edit points better suited to each user can be recommended. In other words, the corrected edit points may be generated by considering both the editing styles of other users similar to each user and the accumulated editing style of each user.

Additionally, as the customized edit point recommendation model tailored to each individual is advanced over time, the recommended edit points may be further optimized.

As described above, the processor 130 may continuously refine recommended edit points through accumulated data. Accordingly, the quality of service may increase, which may in turn expand the user base. As the amount of accumulated data grows with the user base, it becomes possible to build big data, forming a virtuous cycle in which the recommended edit points are refined further and the quality of service improves.

FIG. 8 is a flowchart exemplarily illustrating a method for recommending video edit points based on streaming data according to an embodiment of the present invention.

According to an embodiment of the present invention, the method may include obtaining streaming content information (S110).

According to an embodiment of the present invention, the method may include obtaining edit frame information based on streaming content information (S120).

According to an embodiment of the present invention, the method may include generating edit point recommendation information based on edit frame information (S130).

The order of the steps shown in FIG. 8 described above may be changed as needed, and at least one or more steps may be omitted or added. That is, the above steps are only one embodiment of the present invention, and the scope of the present invention is not limited thereto.

Throughout this specification, computational model, neural network, and network function may be used interchangeably. A neural network may consist of a set of interconnected computational units, which may generally be referred to as "nodes". These "nodes" may also be referred to as "neurons". A neural network includes one or more nodes. The nodes (or neurons) constituting a neural network may be interconnected by one or more "links".

In a neural network, one or more nodes connected through a link may form a relative relationship of an input node and an output node. The concept of an input node and an output node is relative: any node that is an output node with respect to one node may be an input node with respect to another node, and vice versa. As described above, an input node to output node relationship may be created around a link. One or more output nodes may be connected to one input node through links, and vice versa.

In a relationship between an input node and an output node connected through one link, the value of the output node may be determined based on data input to the input node. Here, the link interconnecting the input node and the output node may have a weight. The weight may be variable, and may be changed by a user or an algorithm so that the neural network performs a desired function. For example, when one or more input nodes are each connected by a respective link to one output node, the value of the output node may be determined based on the values input to the input nodes connected to that output node and the weights set on the links corresponding to the respective input nodes.
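
The dependence of an output node value on input values and link weights can be sketched as a weighted sum. Using a plain weighted sum with no bias term and no activation function is an assumption made for brevity; the specification states only that the value is determined based on the inputs and the weights.

```python
from typing import List


def output_node_value(input_values: List[float], link_weights: List[float]) -> float:
    """Combine input node values using the weight set on each link."""
    if len(input_values) != len(link_weights):
        raise ValueError("each input node needs exactly one link weight")
    return sum(value * weight for value, weight in zip(input_values, link_weights))


# Two input nodes connected to one output node by two weighted links.
value = output_node_value([1.0, 2.0], [0.5, 0.25])
```

Changing either weight changes the output value for the same inputs, which is why two networks with identical structure but different weights behave as different networks.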

As described above, in the neural network, one or more nodes are interconnected through one or more links to form an input node and output node relationship in the neural network. Characteristics of the neural network may be determined according to the number of nodes and links in the neural network, an association between the nodes and links, and a weight value assigned to each link. For example, when there are two neural networks having the same number of nodes and links and different weight values between the links, the two neural networks may be recognized as different from each other.

A neural network may include one or more nodes. Some of the nodes constituting the neural network may form one layer based on their distance from the initial input node. For example, the set of nodes at distance n from the initial input node may constitute the n-th layer. The distance from the initial input node may be defined as the minimum number of links that must be traversed to reach the node from the initial input node. However, this definition of a layer is chosen arbitrarily for the purpose of explanation, and the order of a layer in a neural network may be defined differently. For example, a layer of nodes may be defined by the distance from the final output node.
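
The layer definition above (minimum number of links from an initial input node) corresponds to a breadth-first search over the links. The following sketch assumes the network is given as a hypothetical adjacency mapping from each node to the nodes it feeds.

```python
from collections import deque
from typing import Dict, List


def assign_layers(links: Dict[str, List[str]], input_nodes: List[str]) -> Dict[int, List[str]]:
    """Group nodes by their minimum link distance from the initial input nodes."""
    distance = {node: 0 for node in input_nodes}
    queue = deque(input_nodes)
    while queue:
        node = queue.popleft()
        for successor in links.get(node, []):
            if successor not in distance:  # first visit gives the minimum distance
                distance[successor] = distance[node] + 1
                queue.append(successor)
    layers: Dict[int, List[str]] = {}
    for node, d in distance.items():
        layers.setdefault(d, []).append(node)
    return layers


# Hypothetical network: two input nodes, two hidden nodes, one output node.
links = {
    "x1": ["h1", "h2"],
    "x2": ["h1", "h2"],
    "h1": ["y"],
    "h2": ["y"],
}
layers = assign_layers(links, ["x1", "x2"])
```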

An initial input node may refer to one or more nodes in the neural network to which data is directly input without passing through a link from another node. Alternatively, in the link-based relationships between nodes in the neural network, it may mean a node that has no other input node connected to it by a link. Similarly, a final output node may refer to one or more nodes in the neural network that have no output node in relation to other nodes. A hidden node may refer to a node constituting the neural network other than the initial input nodes and the final output nodes. In the neural network according to an embodiment of the present invention, the number of nodes in the input layer may be the same as the number of nodes in the output layer, and the number of nodes may decrease and then increase again as the layers progress from the input layer through the hidden layers. In the neural network according to another embodiment of the present invention, the number of nodes in the input layer may be less than the number of nodes in the output layer, and the number of nodes may decrease as the layers progress from the input layer to the hidden layers. In the neural network according to yet another embodiment of the present invention, the number of nodes in the input layer may be greater than the number of nodes in the output layer, and the number of nodes may increase as the layers progress from the input layer to the hidden layers. A neural network according to still another embodiment of the present invention may be a combination of the aforementioned neural networks.

A deep neural network (DNN) may refer to a neural network including a plurality of hidden layers in addition to an input layer and an output layer. Deep neural networks can reveal latent structures in data. In other words, they can identify the latent structure of a photo, text, video, sound, or music (e.g., what objects are in the photo, what the content and emotion of the text are, what the content and emotion of the audio are, etc.). Deep neural networks may include convolutional neural networks (CNNs), recurrent neural networks (RNNs), auto encoders, generative adversarial networks (GANs), restricted Boltzmann machines (RBMs), deep belief networks (DBNs), Q networks, U networks, Siamese networks, and the like. The description of the deep neural network above is only an example, and the present invention is not limited thereto.

The neural network may be trained using at least one of supervised learning, unsupervised learning, and semi-supervised learning. The learning of a neural network aims to minimize the error of its output. In the learning of the neural network, training data is repeatedly input into the neural network, the error between the output of the neural network for the training data and the target is calculated, and the error is back-propagated from the output layer toward the input layer in the direction that reduces it, thereby updating the weight of each node of the neural network. In supervised learning, training data in which each item is labeled with the correct answer (i.e., labeled training data) is used, whereas in unsupervised learning the correct answer may not be labeled in each item of training data. For example, the training data for supervised learning of data classification may be data in which each item is labeled with a category. The labeled training data is input to the neural network, and an error may be calculated by comparing the output (category) of the neural network with the label of the training data. As another example, in unsupervised learning for data classification, an error may be calculated by comparing the input training data with the neural network output. The calculated error is back-propagated in the reverse direction (i.e., from the output layer to the input layer) in the neural network, and the connection weight of each node of each layer of the neural network may be updated according to the back-propagation. The amount of change in each updated connection weight may be determined according to a learning rate. The neural network's computation on input data and the back-propagation of the error may constitute a learning cycle (epoch).
The learning rate may be applied differently according to the number of iterations of the learning cycle of the neural network. For example, a high learning rate may be used in the early stage of neural network training to increase efficiency by allowing the neural network to quickly obtain a certain level of performance, and a low learning rate may be used in the late stage to increase accuracy.
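
The training loop and the decaying learning rate described above can be sketched with a single-weight model, for which back-propagation reduces to one derivative. The step-decay schedule, its constants, and the squared-error loss are assumptions chosen for illustration and are not taken from the specification.

```python
from typing import List, Tuple


def learning_rate(epoch: int, initial: float = 0.1, decay: float = 0.5, step: int = 10) -> float:
    """Step decay: a high rate early in training, a lower rate later."""
    return initial * (decay ** (epoch // step))


def train(samples: List[Tuple[float, float]], epochs: int = 30) -> float:
    """Fit output = w * x by gradient descent on the squared error."""
    w = 0.0
    for epoch in range(epochs):       # one pass over the data is one learning cycle
        rate = learning_rate(epoch)
        for x, target in samples:
            error = w * x - target    # forward computation and error against the target
            gradient = 2.0 * error * x  # derivative of error**2 with respect to w
            w -= rate * gradient      # update in the direction that reduces the error
    return w


# Labeled training data generated by the rule target = 2 * x.
trained_weight = train([(1.0, 2.0), (2.0, 4.0)])
```

With these settings the weight moves quickly toward 2 during the first epochs and is then adjusted in smaller steps as the rate halves every ten epochs.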

In neural network learning, the training data is generally a subset of the real data (i.e., the data to be processed using the trained neural network), and therefore there may be learning cycles in which the error on the training data decreases while the error on the real data increases. Overfitting is this phenomenon, in which the error on real data increases because of excessive learning on the training data. For example, a neural network that learned what a cat is only from yellow cats and then fails to recognize a cat of another color as a cat exhibits a type of overfitting. Overfitting can increase the error of machine learning algorithms. Various optimization methods can be used to prevent overfitting, such as increasing the training data, regularization, and dropout, which omits some nodes of the network during learning.
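
Omitting some nodes during learning can be sketched as follows. The rescaling of the surviving activations (often called inverted dropout) is a common variant assumed here for illustration; the specification does not prescribe a particular form.

```python
import random
from typing import List


def dropout(activations: List[float], drop_prob: float, rng: random.Random) -> List[float]:
    """Zero each activation with probability drop_prob; rescale the survivors."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() >= drop_prob else 0.0 for a in activations]


# Applied only during training; a seeded generator keeps the sketch repeatable.
rng = random.Random(0)
dropped = dropout([1.0] * 100, 0.5, rng)
unchanged = dropout([1.0, 2.0], 0.0, rng)
```

At inference time the function would simply not be called, so every node contributes to the output.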

Throughout this specification, computational model, neural network, and network function may be used interchangeably (hereinafter, they are unified and described as a neural network). A data structure may include a neural network, and the data structure including the neural network may be stored in a computer readable medium. The data structure including the neural network may also include data input to the neural network, weights of the neural network, hyperparameters of the neural network, data obtained from the neural network, an activation function associated with each node or layer of the neural network, and a loss function for training the neural network. A data structure including a neural network may include any of the components described above. In other words, the data structure including the neural network may be configured to include any combination of data input to the neural network, weights of the neural network, hyperparameters of the neural network, data obtained from the neural network, an activation function associated with each node or layer of the neural network, and a loss function for training the neural network. In addition to the foregoing, the data structure including the neural network may include any other information that determines the characteristics of the neural network. The data structure may also include all types of data used or generated in the computational process of the neural network, and is not limited to the above. A computer readable medium may include a computer readable recording medium and/or a computer readable transmission medium. A neural network may consist of a set of interconnected computational units, which may generally be referred to as nodes. These nodes may also be referred to as neurons. A neural network includes one or more nodes.

The data structure may include data input to the neural network. A data structure including data input to the neural network may be stored in a computer readable medium. Data input to the neural network may include training data input during a neural network learning process and/or input data input to a neural network that has been trained. Data input to the neural network may include pre-processed data and/or data subject to pre-processing. Pre-processing may include a data processing process for inputting data to a neural network. Accordingly, the data structure may include data subject to pre-processing and data generated by pre-processing. The above data structure is only an example and the present invention is not limited thereto.

The data structure may include the weights of the neural network. (In this specification, weights and parameters may be used with the same meaning.) A data structure including the weights of a neural network may be stored in a computer readable medium. A neural network may include a plurality of weights. The weights may be variable, and may be changed by a user or an algorithm so that the neural network performs a desired function. For example, when one or more input nodes are each connected by a respective link to one output node, the value of the output node may be determined based on the values input to the input nodes connected to that output node and the parameters set on the links corresponding to the respective input nodes. The above data structure is only an example and the present invention is not limited thereto.

As a non-limiting example, the weights may include weights that are varied during neural network training and/or weights for which neural network training has been completed. The weights that are varied during neural network training may include the weights at the time a learning cycle starts and/or the weights as they change during the learning cycle. The weights for which neural network training has been completed may include the weights at the time the learning cycles have been completed. Accordingly, the data structure including the weights of the neural network may include weights that are varied during neural network training and/or weights for which neural network training has been completed. Therefore, the above-described weights and/or any combination of them are considered to be included in the data structure including the weights of the neural network. The above data structure is only an example and the present invention is not limited thereto.
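The distinction between the weight at the start of a learning cycle, the weight that changes during the cycle, and the weight after training is completed can be illustrated with a toy example. The sketch assumes a single-weight model `y = weight * x` trained by gradient descent on squared error; the model, the update rule, and the name `train_single_weight` are assumptions for illustration and are not taken from the specification.

```python
def train_single_weight(samples, initial_weight, learning_rate, cycles):
    """Return (weights recorded at each cycle start, final trained weight)."""
    weight = initial_weight
    weights_at_cycle_start = []
    for _ in range(cycles):
        # Weight at the time the learning cycle starts.
        weights_at_cycle_start.append(weight)
        for x, y in samples:
            error = weight * x - y
            # The weight varies during the learning cycle (gradient step).
            weight -= learning_rate * error * x
    # Weight for which training has been completed.
    return weights_at_cycle_start, weight
```

A data structure as described above could store any of these values, for example the recorded list for resuming training and the final weight for inference.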

The data structure including the weights of the neural network may be stored in a computer-readable storage medium (e.g., a memory or a hard disk) after going through a serialization process. Serialization may be the process of converting a data structure into a form that can be stored on the same or another computing device and later reconstructed and used. A computing device may serialize data structures to transmit and receive data over a network. The serialized data structure including the weights of the neural network may be reconstructed on the same computing device or another computing device through deserialization. The data structure including the weights of the neural network is not limited to serialization. Furthermore, the data structure including the weights of the neural network may include a data structure for increasing the efficiency of operations while minimizing the use of the resources of the computing device (for example, a B-Tree, Trie, m-way search tree, AVL tree, or Red-Black Tree). The foregoing is only an example and the present invention is not limited thereto.
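A serialization and deserialization round trip for neural network weights might look like the following sketch. JSON encoded as UTF-8 bytes is assumed here only for illustration, since the specification does not name a serialization format; the function names and the `weights` key are hypothetical.

```python
import json


def serialize_weights(weights):
    """Convert nested lists of weights into bytes suitable for storage or transmission."""
    return json.dumps({"weights": weights}).encode("utf-8")


def deserialize_weights(blob):
    """Reconstruct the weights from bytes produced by serialize_weights."""
    return json.loads(blob.decode("utf-8"))["weights"]
```

The bytes returned by `serialize_weights` can be written to a file or sent over a network, and `deserialize_weights` can then rebuild the same nested lists on the same or another computing device.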

The data structure may include hyperparameters of the neural network. The data structure including the hyperparameters of the neural network may be stored in a computer-readable medium. A hyperparameter may be a variable that can be changed by a user. Hyperparameters may include, for example, the learning rate, the cost function, the number of learning cycle iterations, weight initialization (e.g., setting the range of weight values to be used for weight initialization), and the number of hidden units (e.g., the number of hidden layers and the number of nodes in each hidden layer). The above data structure is only an example and the present invention is not limited thereto.
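The hyperparameters listed above could be grouped into a single structure, as in the following sketch. The class name, field names, and default values are hypothetical and chosen only for illustration; they are not values disclosed in the specification.

```python
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass(frozen=True)
class Hyperparameters:
    """User-adjustable settings of a neural network (illustrative defaults)."""
    learning_rate: float = 0.01
    cost_function: str = "mean_squared_error"
    learning_cycles: int = 10                            # number of learning cycle iterations
    weight_init_range: Tuple[float, float] = (-0.1, 0.1)  # range used for weight initialization
    hidden_layer_sizes: Tuple[int, ...] = (16, 8)          # number of nodes in each hidden layer

    def num_hidden_layers(self) -> int:
        # The number of hidden layers follows from the per-layer node counts.
        return len(self.hidden_layer_sizes)


# Example: a user overrides only the hidden-layer layout.
hp = Hyperparameters(hidden_layer_sizes=(4, 4, 2))
hp_as_dict = asdict(hp)  # plain dict form, convenient for storing alongside the weights
```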

Steps of a method or algorithm described in connection with an embodiment of the present invention may be implemented directly in hardware, implemented in a software module executed by hardware, or implemented by a combination thereof. A software module may reside in random access memory (RAM), read only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

Components of the present invention may be implemented as a program (or application) to be executed in combination with a computer, which is hardware, and stored in a medium. Components of the present invention may be implemented as software programming or software elements; similarly, embodiments may be implemented in a programming or scripting language such as C, C++, Java, or assembler, with the various algorithms implemented as combinations of data structures, processes, routines, or other programming constructs. Functional aspects may be implemented as an algorithm running on one or more processors.

Those skilled in the art will understand that the various illustrative logical blocks, modules, processors, means, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, as various forms of program or design code (referred to herein, for convenience, as “software”), or as a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Various embodiments presented herein may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" includes a computer program, carrier, or media accessible from any computer-readable device. For example, computer-readable media include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips), optical disks (e.g., CDs, DVDs), smart cards, and flash memory devices (e.g., EEPROM, cards, sticks, key drives). Additionally, various storage media presented herein include one or more devices and/or other machine-readable media for storing information. The term “machine-readable medium” includes, but is not limited to, wireless channels and various other media that can store, hold, and/or convey instruction(s) and/or data.

It is to be understood that the specific order or hierarchy of steps in the processes presented is an illustration of exemplary approaches. Based upon design priorities, the specific order or hierarchy of steps in the processes may be rearranged while remaining within the scope of the present invention. The accompanying method claims present elements of the various steps in a sample order, but are not meant to be limited to the specific order or hierarchy presented.

The description of the presented embodiments is provided to enable any person skilled in the art to use or practice the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments without departing from the scope of the present invention. Thus, the present invention is not to be limited to the embodiments presented herein, but is to be construed in the widest scope consistent with the principles and novel features presented herein.

The related contents have been described in the best mode for carrying out the invention as described above.

Industrial Applicability

The present invention can be utilized in the field of providing edited videos by editing video data.

Claims

A streaming data-based video edit point recommendation method performed by one or more processors of a computing device, the method comprising:

obtaining streaming content information;

obtaining edit frame information based on the streaming content information; and

generating edit point recommendation information based on the edit frame information.
The streaming data-based video edit point recommendation method of claim 1, wherein

the streaming content information includes

streaming data including basic content information including information related to a user's broadcast content, streaming video data related to the broadcast content, and viewer reaction data related to reactions of a plurality of viewers watching the streaming video data, and

the edit frame information,

as information on which the generation of the edit point recommendation information is based, includes main video sub data identification information related to at least some video sub data among a plurality of video sub data constituting the streaming video data, and recommendation strength information corresponding to the main video sub data identification information.
The streaming data-based video edit point recommendation method of claim 2, wherein

obtaining the edit frame information comprises:

obtaining first edit frame information through image analysis of the streaming video data;

obtaining second edit frame information through reaction analysis of the viewer reaction data; and

obtaining the edit frame information by integrating the first edit frame information and the second edit frame information.
The streaming data-based video edit point recommendation method of claim 3, wherein

obtaining the edit frame information comprises:

generating weight application information related to at least one of the first edit frame information and the second edit frame information based on the basic content information; and

obtaining the edit frame information by integrating the first edit frame information and the second edit frame information based on the weight application information.
The streaming data-based video edit point recommendation method of claim 3, wherein

the viewer reaction data includes

at least one of information on the number of viewers watching the streaming video data, chat frequency information related to the streaming video data, chat keyword information related to the streaming video data, and donation information related to the streaming video data, and

the reaction analysis

is an analysis of the real-time variation of the viewer reaction data corresponding to the streaming video data.
The streaming data-based video edit point recommendation method of claim 3, wherein

obtaining the edit frame information comprises

assigning a weight to the recommendation strength information based on a viewpoint similarity between one or more first main video sub data corresponding to the first edit frame information and one or more second main video sub data corresponding to the second edit frame information.
The streaming data-based video edit point recommendation method of claim 2, wherein

the edit point recommendation information

is information on one or more recommended edit points related to the streaming video data and includes one or more edit point recommendation sub information corresponding to the one or more recommended edit points,

generating the edit point recommendation information comprises:

searching for one or more related sub data identification information based on one or more main video sub data identification information included in the edit frame information; and

generating the one or more edit point recommendation sub information based on the one or more main video sub data identification information and the one or more related sub data identification information, and

each of the one or more related sub data identification information

includes start frame information related to the start of each edit point recommendation sub information and end frame information related to the end of each edit point recommendation sub information.
The streaming data-based video edit point recommendation method of claim 2, further comprising

generating and providing a video editing user interface based on the edit point recommendation information, wherein

the video editing user interface

includes a video editing screen that includes the edit point recommendation information corresponding to the streaming video data and accepts a user's adjustment input on the video editing screen, and

the video editing screen

displays each edit point with a different visual expression based on the recommendation strength information corresponding to each of the one or more edit point recommendation sub information.
An apparatus comprising:

a memory that stores one or more instructions; and

a processor that executes the one or more instructions stored in the memory,

wherein the processor performs the method of claim 1 by executing the one or more instructions.
A computer program stored in a computer-readable recording medium and combined with a computer, which is hardware, to perform the method of claim 1.