KR20170074015A - Method for editing video conference image and apparatus for executing the method

Method for editing video conference image and apparatus for executing the method

Info

Publication number
KR20170074015A
Authority
KR
South Korea
Prior art keywords
video conference
editing
video
image
interest
Prior art date
Application number
KR1020150182982A
Other languages
Korean (ko)
Inventor
송지훈
Original Assignee
삼성에스디에스 주식회사 (Samsung SDS Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by Samsung SDS Co., Ltd. (삼성에스디에스 주식회사)
Priority to KR1020150182982A
Publication of KR20170074015A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4886 Data services, e.g. news ticker for displaying a ticker, e.g. scrolling banner for news, stock exchange, weather data
    • H04N5/225

Abstract

A video conference video editing method and an apparatus for performing the method are disclosed. A method for editing a video conference video according to an exemplary embodiment includes: receiving a video conference video; receiving a request to edit a portion of interest from a video conference terminal participating in the video conference; extracting the portion of the video conference video corresponding to the interest portion editing request; recognizing the voice in that portion and converting the recognized voice into text; and inserting the converted text as a subtitle into the extracted portion to create an edited video.

Description

TECHNICAL FIELD [0001] The present invention relates to a method for editing a video conference video and an apparatus for performing the method.

Embodiments of the invention relate to video conferencing technology.

Conferences are held to convey information, collect opinions, and reach conclusions among people. In recent years, with the development of IT technology, information is increasingly gathered and opinions exchanged through video conferences (or teleconferences).

In such video conferences, it is difficult for the presenter to know whether the content of the meeting is being delivered to the participants as intended. It is also difficult for participants to follow the presentation when unfamiliar terminology or abbreviations are used. Finally, when the meeting results must be edited afterward, it takes considerable time and effort to identify and organize the important content. In particular, a person who could not attend the meeting has to watch the entire meeting video.

Korean Patent Registration No. 10-0575634 (Nov. 10, 2006)

An embodiment of the present invention provides a video conference video editing method that allows a video conference participant to easily edit the video of a needed portion during the conference, and an apparatus for performing the method.

An embodiment of the present invention provides a video conference video editing method capable of generating a highlight edit video of the video conference by gauging the interest of the participants, and an apparatus for performing the method.

A video conference video editing method according to an exemplary embodiment includes: receiving a video conference video at a video conference editing server; Extracting an image of a portion corresponding to the interest portion edit request from the video conference image when a request for editing the interest portion is received from the video conference terminal in the video conference by the video conference editing server; Recognizing a voice in an image of a portion corresponding to the request for editing the interest portion in the video conference editing server and converting the recognized voice into text; And generating, in the video conference editing server, an edited image by inserting the converted text as a subtitle into an image of a portion corresponding to the interest portion editing request.

The extracting of the image corresponding to the interest portion editing request may include: detecting, at the video conference editing server, the start and the end of the speaker's utterance at the time the interest portion editing request is received; and extracting, at the video conference editing server, the section of the video conference video from the start of the utterance to the end of the utterance.

The method may further include providing, at the video conference editing server, the edited video to the corresponding video conference terminal in the video conference after the edited video is generated.

The method may further include: confirming, at the video conference editing server, whether a term requiring annotation is used in the video conference video; retrieving, when such a term is used, annotation information describing its meaning; and providing the retrieved annotation information to each video conference terminal in the video conference.

The method may further include, after the generating of the edited video, statistically processing the interest of the participants by analyzing, at the video conference editing server, the interest portion editing requests received from each video conference terminal in the video conference.

The method may further include, at the video conference editing server, generating a highlight editing image for the video conference image based on the interest of the participants of the video conference after the statistical processing step.

The method may further include, after the generating of the highlight edit video, providing, at the video conference editing server, the highlight edit video to each video conference terminal participating in the video conference.

A video conference video editing method according to another exemplary embodiment includes: receiving a video conference video at a video conference editing server; Receiving, at the video conference editing server, a request for editing a point of interest from each video conference terminal in a video conference; Analyzing the interest part editing requests received from each video conference terminal and statistically processing the participants' interest in the video conference at the video conference editing server; And generating, in the video conference editing server, a highlight editing video for the video conference video based on the interest of the participants of the video conference.

The highlight editing video may be generated for a portion of the video conference image having the highest degree of interest of the participants.

The generating of the highlight edit video may include: extracting, at the video conference editing server, the image of the portion of the video conference video in which the participants' interest is highest; recognizing the voice in that portion and converting the recognized voice into text; and inserting the converted text as a subtitle into the image of that portion.

The method may further include: confirming, at the video conference editing server, whether a term requiring annotation is used in the video conference video; retrieving, when such a term is used, annotation information describing its meaning; and providing the retrieved annotation information to each video conference terminal in the video conference.

The method may further include, after the generating of the highlight edit video, providing, at the video conference editing server, the highlight edit video to each video conference terminal participating in the video conference.

An apparatus according to one exemplary embodiment includes one or more processors; a memory; and one or more programs stored in the memory and configured to be executed by the one or more processors, the program including instructions for: determining whether an interest portion editing request is received from a video conference terminal in a video conference; extracting, when the interest portion editing request is received, the image of the portion corresponding to the request from the video conference video; recognizing the voice in that portion and converting the recognized voice into text; and inserting the converted text as a subtitle into that portion to generate an edited video.

The program may further include instructions for performing, in the extracting of the image corresponding to the interest portion editing request: detecting the start and the end of the speaker's utterance in the video conference video; and extracting the interval from the start of the utterance to the end of the utterance.

The program may further include instructions for performing the step of providing the edited video to the corresponding video conference terminal in the video conference after the step of generating the edited video.

The program may further include instructions for: determining whether a term requiring annotation is used in the video conference video; searching, when such a term is used, for annotation information describing its meaning; and providing the retrieved annotation information to each video conference terminal in the video conference.

The program may further include instructions for analyzing, after the generating of the edited video, the interest portion editing requests received from each video conference terminal in the video conference and statistically processing the participants' interest.

The program may include instructions for further performing the step of generating a highlighted edit image for the video conference video based on the interest of the participants of the video conference after the statistical processing step.

The program may further comprise instructions for performing the step of providing the highlight edit image to each videoconference terminal participating in the videoconference, after the step of generating the highlight edit video.

An apparatus according to another exemplary embodiment includes one or more processors; a memory; and one or more programs stored in the memory and configured to be executed by the one or more processors, the program including instructions for: receiving an interest portion editing request from each video conference terminal in a video conference; analyzing the interest portion editing requests received from each video conference terminal and statistically processing the participants' interest in the video conference; and generating a highlight edit video for the video conference video based on the interest of the participants.

The program may generate the highlight editing image for a portion of the video conference image having the highest degree of interest of the participants.

The program may include instructions for performing, in the generating of the highlight edit video: extracting the image of the portion of the video conference video in which the participants' interest is highest; recognizing the voice in that portion and converting the recognized voice into text; and inserting the converted text as a subtitle into the image of that portion.

The program may further include instructions for: determining whether a term requiring annotation is used in the video conference video; searching, when such a term is used, for annotation information describing its meaning; and providing the retrieved annotation information to each video conference terminal in the video conference.

The program may further comprise instructions for performing the step of providing the highlight edit image to each videoconference terminal participating in the videoconference, after the step of generating the highlight edit video.

According to an embodiment of the present invention, a participant in a video conference can edit and manage the portions of interest to him or her in real time. By providing each attendee with an edited video reflecting that attendee's interest, the presenter can gauge the interest and level of understanding of each attendee, and attendees can easily edit and summarize the conference content. In addition, by providing a highlight edit video of the portions that interested attendees most, people who could not attend the meeting can quickly grasp the content of the video conference.

FIG. 1 is a block diagram illustrating the configuration of a video conferencing system according to an exemplary embodiment.
FIG. 2 is a block diagram showing the configuration of a video conference editing server according to an exemplary embodiment.
FIG. 3 is a flowchart showing a video conference video editing method according to an exemplary embodiment.
FIG. 4 is a view showing a user interface screen provided in a video conference terminal according to an exemplary embodiment.
FIG. 5 illustrates a computing environment including an exemplary computing device suitable for use in the exemplary embodiments.

Hereinafter, specific embodiments of the present invention will be described with reference to the drawings. The following detailed description is provided to assist in a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, this is merely an example, and the present invention is not limited thereto.

DETAILED DESCRIPTION Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the following description, well-known functions or constructions are not described in detail when they would obscure the invention with unnecessary detail. The terms used below are defined in consideration of the functions of the present invention and may vary according to the intention or custom of users and operators; their definitions should therefore be based on the contents of this specification as a whole. The terms used in the detailed description are intended only to describe embodiments of the invention and should in no way be limiting. Unless clearly used otherwise, a singular form includes the plural. Expressions such as "comprises" or "includes" indicate the presence of certain features, numbers, steps, operations, elements, parts, or combinations thereof, and should not be construed to preclude the presence or possibility of other features, numbers, steps, operations, elements, parts, or combinations thereof.

In the following description, terms such as "transmission", "communication", and "reception" of a signal or information refer not only to the direct transmission of the signal or information from one component to another, but also to transmission via another component. In particular, "transmitting" a signal or information to a component indicates the final destination of the signal or information, not its direct destination; the same applies to "receiving" a signal or information. Also, in this specification, two or more pieces of data or information being "related" means that when one piece of data (or information) is acquired, at least part of the other data (or information) can be obtained based on it.

FIG. 1 is a block diagram showing the configuration of a video conferencing system according to an exemplary embodiment.

Referring to FIG. 1, the video conferencing system 100 may include a video conference terminal 102, a video conference control server 104, and a video conference editing server 106. The video conference terminal 102 is communicably connected to the video conference control server 104 and the video conference editing server 106 via a network 150 such as a local area network (LAN), a wide area network (WAN), a cellular network, or the Internet.

The video conference terminal 102 may be a terminal used by a participant in a video conference. There may be a plurality of video conference terminals 102. The videoconference terminal 102 may be a desktop computer, a notebook, a smart phone, a tablet PC, or the like. Each video conference terminal 102 can receive the video conference video from the video conference control server 104 and display it on the screen. Video conferencing video may include audio. The videoconference terminal 102 can transmit a request for editing a part of interest during a videoconference to the videoconference editing server 106 according to the input of the user.

The videoconference control server 104 serves to control the videoconference between the videoconference terminals 102. The videoconference control server 104 can transfer the video taken by at least one videoconference terminal 102 (for example, a terminal of the videoconference presenter) to another videoconference terminal 102 as a videoconference video. In addition, the video conference control server 104 can generate a video conference video by synthesizing the videos photographed by the video conference terminals 102, and then transmit the generated video conference video to each video conference terminal 102. The videoconference control server 104 may include an MCU (Multipoint Control Unit) for multi-party video conferencing. The videoconference control server 104 can transmit the videoconference video to the videoconference editing server 106. At this time, the video conference control server 104 can transmit the video conference video to the video conference editing server 106 in real time.

When an interest portion editing request is received from the video conference terminal 102, the video conference editing server 106 can record the utterance portion of the video conference around the time the request was received. In the exemplary embodiment, the video conference editing server 106 detects the start and the end of the speaker's utterance at the time the interest portion editing request is received, and records the interval between them to generate an edited video.

The video conference editing server 106 can convert the voice included in the edited video into text and then insert the converted text into the edited video as a caption. The video conference editing server 106 can transmit the edited video to the video conference terminal 102, and can do so in real time during the video conference.

When a technical term or an abbreviation is used in a video conference, the video conference editing server 106 can search for annotation information describing the meaning of the technical term or abbreviation and transmit it to each video conference terminal 102. When a technical term or an abbreviation is used in the edited video, the video conference editing server 106 may insert the annotation information into the edited video and provide it to the video conference terminal 102.

The video conference editing server 106 can analyze the interest portion editing requests received from each video conference terminal 102 and generate a highlight edit video by editing the portions in which the video conference participants' interest is highest.

Here, the video conference editing server 106 is described as providing the edited video or annotation information to the video conference terminal 102 directly. However, the present invention is not limited thereto, and the edited video or annotation information generated by the video conference editing server 106 may instead be provided to the video conference terminal 102 via the video conference control server 104.

Although the video conference editing server 106 is described as being implemented separately from the video conference control server 104, the present invention is not limited thereto and may be implemented integrally with the video conference control server 104. For example, the videoconference editing server 106 may be modularized and implemented in the videoconference control server 104. Further, the functions of the video conference editing server 106 may be implemented in each videoconference terminal 102.

According to an embodiment of the present invention, a participant in a video conference can edit and manage the portions of interest to him or her in real time. By providing each attendee with an edited video reflecting that attendee's interest, the presenter can gauge the interest and level of understanding of each attendee, and attendees can easily edit and summarize the conference content. In addition, by providing a highlight edit video of the portions that interested attendees most, people who could not attend the meeting can quickly grasp the content of the video conference.

FIG. 2 is a block diagram showing the configuration of a video conference editing server according to an exemplary embodiment.

Referring to FIG. 2, the video conference editing server 106 may include a communication unit 111, an editing unit 113, a text conversion unit 115, an annotation providing unit 117, a statistical analysis unit 119, and a storage unit 121.

The communication unit 111 can be communicably connected to each video conference terminal 102 and to the video conference control server 104 via the network 150. The communication unit 111 can receive an interest portion editing request from the video conference terminal 102; the request may include identification information of the video conference terminal 102. The communication unit 111 may transmit the edited video corresponding to the interest portion editing request to the video conference terminal 102, and can transmit the highlight edit video of the video conference to each video conference terminal 102 participating in the video conference. During the video conference, the communication unit 111 can transmit annotation information about any technical terms or abbreviations that were used to each video conference terminal 102. The communication unit 111 can also receive the video conference video from the video conference control server 104.

When the communication unit 111 receives an interest portion editing request from the video conference terminal 102, the editing unit 113 can generate an edited video by editing the portion of the video conference video corresponding to the request. For example, the editing unit 113 may record the presenter's utterance portion around the time the request was received. Specifically, the editing unit 113 detects the start of the presenter's utterance at the time of the request (the start of the utterance may precede the time of the request) and the end of that utterance, and extracts the section from the start to the end of the utterance to generate the edited video. Edited videos can thus be generated in units of a speaker's utterance interval (the interval from the start to the end of an utterance) during the video conference.
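The utterance-interval extraction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the conference audio has already been reduced to a list of per-frame energy values, and the function name, the frame indices, and the silence threshold are hypothetical choices made for the example.

```python
def find_utterance_interval(energies, request_frame, threshold=0.1):
    """Scan backward and forward from the frame at which the edit request
    arrived, collecting consecutive frames whose energy exceeds the
    silence threshold; the result is the (start, end) frame span."""
    start = request_frame
    while start > 0 and energies[start - 1] > threshold:
        start -= 1  # the utterance may have started before the request
    end = request_frame
    while end < len(energies) - 1 and energies[end + 1] > threshold:
        end += 1    # extend until the speaker falls silent
    return start, end

# Example: the request arrives at frame 2, mid-utterance.
print(find_utterance_interval([0.0, 0.3, 0.5, 0.4, 0.0], 2))  # (1, 3)
```

A production server would use a proper voice activity detector rather than a fixed energy threshold, but the backward/forward scan mirrors the "start of utterance may precede the request" behavior described above.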

In addition, the editing unit 113 can generate a highlight edit video according to the degree of interest of the video conference participants, based on the analysis result of the statistical analysis unit 119. For example, the editing unit 113 can extract the portion of the entire video conference video in which the participants' interest is highest and generate a highlight edit video (which may include subtitles) from it. However, the present invention is not limited thereto, and the highlight edit video can be generated in various other ways; for example, the editing unit 113 may generate a highlight edit video that includes every portion whose participant interest ranks within a preset rank, starting from the portion of highest interest.
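The rank-based selection described above might look like the following sketch. The function name, the integer segment ids, and the interest counts are illustrative assumptions, not part of the disclosure.

```python
def select_highlight_segments(interest_by_segment, top_rank=1):
    """interest_by_segment: dict mapping segment id -> interest count.
    Pick the segments whose interest ranks within top_rank, then return
    them in timeline (segment-id) order for assembly into the highlight."""
    ranked = sorted(interest_by_segment,
                    key=interest_by_segment.get, reverse=True)
    return sorted(ranked[:top_rank])

# Example: segments 1 and 2 drew the most edit requests.
print(select_highlight_segments({0: 2, 1: 5, 2: 5, 3: 1}, top_rank=2))  # [1, 2]
```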

The text conversion unit 115 can extract the voice from the edited video generated by the editing unit 113 through automatic speech recognition (ASR) technology and convert the extracted voice into text using speech-to-text (STT) technology. The text conversion unit 115 may then transmit the converted text to the editing unit 113, which can insert the text into the edited video as a subtitle.
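One possible sketch of the text conversion unit (115) follows. The patent names no specific ASR/STT engine, so the recognizer is injected as a plain function (a stand-in you would replace with a real engine); the transcript is then broken into short caption lines for subtitle display.

```python
def captions_from_clip(clip_audio, recognize, words_per_line=5):
    """Run the injected ASR/STT function over the clip's audio and
    split the resulting transcript into short subtitle lines."""
    text = recognize(clip_audio)          # ASR/STT stand-in
    words = text.split()
    return [" ".join(words[i:i + words_per_line])
            for i in range(0, len(words), words_per_line)]

# Example with a fake recognizer standing in for a real STT engine.
def fake_recognize(audio):
    return "hello world this is a test of captions"

print(captions_from_clip(b"", fake_recognize))
# ['hello world this is a', 'test of captions']
```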

When a term requiring annotation is used in the video conference, the annotation providing unit 117 can search for annotation information explaining the meaning of that term. Here, a term requiring annotation is a term whose meaning cannot easily be understood by the general public, and may include jargon, technical terms, abbreviations, and the like. Specifically, the annotation providing unit 117 analyzes the video conference video, and when the presenter uses such a term or the term appears in the presentation material, searches for annotation information explaining its meaning and transmits the retrieved annotation information to the communication unit 111. The annotation providing unit 117 may search for the annotation information in a predetermined database or on the Internet.
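A minimal sketch of the annotation lookup follows. The glossary contents and function name are hypothetical; as described above, the annotation providing unit (117) would instead query a predetermined database or an online source.

```python
import re

# Hypothetical glossary standing in for the terminology database.
GLOSSARY = {
    "MCU": "Multipoint Control Unit: bridges streams in a multi-party conference.",
    "ASR": "Automatic Speech Recognition.",
}

def find_annotations(transcript, glossary=GLOSSARY):
    """Return annotation info for every glossary term that appears
    as a whole word in the transcript of the conference."""
    return {term: meaning for term, meaning in glossary.items()
            if re.search(r"\b%s\b" % re.escape(term), transcript)}

print(find_annotations("The MCU mixes the terminal streams"))
# {'MCU': 'Multipoint Control Unit: bridges streams in a multi-party conference.'}
```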

The statistical analysis unit 119 may analyze and statistically process the interest portion editing requests received from the respective video conference terminals 102. In the exemplary embodiment, the statistical analysis unit 119 can gauge the participants' interest in the video conference according to the number of interest portion editing requests from the video conference terminals 102, and can rank the portions of the video conference video by that interest.
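The request counting done by the statistical analysis unit (119) can be sketched as follows; the (terminal id, segment id) pair representation is an assumption made for the example.

```python
from collections import Counter

def interest_ranking(edit_requests):
    """edit_requests: iterable of (terminal_id, segment_id) pairs, one per
    interest portion editing request. Returns (segment_id, request_count)
    pairs ordered from most to least requested."""
    counts = Counter(segment for _, segment in edit_requests)
    return counts.most_common()

# Example: two terminals requested edits on segment 0, one on segment 3.
print(interest_ranking([("t1", 0), ("t2", 0), ("t1", 3)]))  # [(0, 2), (3, 1)]
```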

The storage unit 121 may store the video conference video received from the video conference control server 104, the edited videos and highlight edit videos generated by the editing unit 113, and the statistical information generated by the statistical analysis unit 119.

FIG. 3 is a flowchart illustrating a video conference video editing method according to an exemplary embodiment. In the illustrated flowchart, the method is divided into a plurality of steps, but at least some of the steps may be performed in a different order, combined with other steps, performed together, omitted, or divided into sub-steps, and one or more steps not shown may be added.

Referring to FIG. 3, the video conference editing server 106 receives the video conference video from the video conference control server 104 (S101). The video conference editing server 106 can receive the video conference video in real time.

Next, the video conference editing server 106 confirms whether a request to edit the interested part is received from the video conference terminal 102 (S103).

If it is determined in step S103 that an interest portion editing request has been received, the video conference editing server 106 extracts the video of the portion corresponding to the request from the video conference video (S105). For example, the video conference editing server 106 can extract the video section from the start of the speaker's utterance at the time of the request to the end of that utterance.

Next, the video conference editing server 106 extracts the voice from the image corresponding to the interest portion editing request and converts the extracted voice to text (S107).

Next, the video conference editing server 106 inserts the converted text as a subtitle into the image of the portion corresponding to the interest portion editing request to generate an edited video (S109). The video conference editing server 106 can transmit the edited video to the video conference terminal 102.
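The subtitle insertion of step S109 is not detailed in the disclosure; one common approach is to emit the converted text in SRT format and burn it into the clip with an external tool such as ffmpeg. The function below is an illustrative sketch of the SRT formatting step only, with hypothetical names.

```python
def to_srt(captions):
    """captions: list of (start_sec, end_sec, text) triples.
    Returns SRT-formatted subtitle text."""
    def ts(sec):
        # SRT timestamps look like HH:MM:SS,mmm
        h, rem = divmod(int(sec), 3600)
        m, s = divmod(rem, 60)
        ms = int(round((sec - int(sec)) * 1000))
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
    blocks = []
    for i, (start, end, text) in enumerate(captions, 1):
        blocks.append(f"{i}\n{ts(start)} --> {ts(end)}\n{text}\n")
    return "\n".join(blocks)

print(to_srt([(0.0, 2.5, "Hello"), (3.0, 4.0, "World")]))
```

The resulting file could then be burned in with, for example, `ffmpeg -i clip.mp4 -vf subtitles=clip.srt out.mp4` (assuming the clip and SRT file names shown).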

Next, the video conference editing server 106 confirms whether a technical term or an abbreviation is used in the video conference video (S111). If so, the video conference editing server 106 searches for annotation information explaining the meaning of the term or abbreviation and transmits it to each video conference terminal 102 (S113). Here, step S111 is described as following step S109, but the present invention is not limited thereto; step S111 can be performed in real time from the time the video conference video is received.

Next, the video conference editing server 106 generates statistical information reflecting the degree of interest of the participants in the video conference, using the interest portion editing requests received from each video conference terminal 102 (S115).

Next, the video conference editing server 106 generates a highlight editing video of the video conference video based on the statistical information (S117). The video conference editing server 106 may provide the highlight editing video to each video conference terminal 102.
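Steps S115 and S117 can be sketched by bucketing the interest portion editing requests from all terminals over the conference timeline and taking the most-requested windows as highlight candidates. The 30-second bucket size and top-N selection policy are assumptions of this illustration, not part of the patent:

```python
# A sketch of the statistics (S115) and highlight-selection (S117) steps.
from collections import Counter

def interest_histogram(request_times, bucket_sec=30):
    """Count editing requests per bucket_sec window of the conference."""
    return Counter(int(t // bucket_sec) for t in request_times)

def highlight_windows(request_times, bucket_sec=30, top_n=2):
    """Return (start_sec, end_sec) of the top_n most-requested windows."""
    hist = interest_histogram(request_times, bucket_sec)
    top = [bucket for bucket, _ in hist.most_common(top_n)]
    return [(b * bucket_sec, (b + 1) * bucket_sec) for b in sorted(top)]

# Requests gathered from several terminals, in seconds into the meeting.
times = [12, 15, 18, 95, 96, 99, 100, 310]
print(highlight_windows(times))  # [(0, 30), (90, 120)]
```

The selected windows would then be cut, captioned via the same S107–S109 pipeline, and concatenated into the highlight editing video.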

FIG. 4 is a diagram illustrating a user interface screen provided in a video conference terminal according to an exemplary embodiment.

Referring to FIG. 4, the video conference terminal 102 may include a user interface screen 150. The user interface screen 150 may be implemented as a touch screen, but is not limited thereto. On the user interface screen 150, area A may be an area in which the video conference video is output and displayed. The user of the video conference terminal 102 can click the edit request button B to transmit an interest portion editing request to the video conference editing server 106. When the video conference terminal 102 receives an edited video, it can display the received edited video in area C of the user interface screen 150. A plurality of edited videos requested by the user can be displayed in area C. A participant list of the video conference can be displayed in area D of the user interface screen 150. Annotation information about technical terms, abbreviations, and the like can be displayed in area E of the user interface screen 150. Various tools may be displayed in area F of the user interface screen 150.

FIG. 5 illustrates a computing environment including an exemplary computing device suitable for use in the exemplary embodiments.

The exemplary computing environment 200 shown in FIG. 5 includes a computing device 210. Each component of the computing device may have functions and capabilities different from those described below, and the device may include additional components beyond those described. The computing device 210 may be an apparatus for editing a video conference video (e.g., the video conference editing server 106).

The computing device 210 includes at least one processor 212, a computer-readable storage medium 214, and a bus 260. The processor 212 is connected to the bus 260, and the bus 260 connects the various other components of the computing device 210, including the computer-readable storage medium 214, to the processor 212.

The processor 212 may cause the computing device 210 to operate in accordance with the exemplary embodiments discussed above. For example, the processor 212 may execute computer-executable instructions stored in the computer-readable storage medium 214, and the computer-executable instructions stored in the computer-readable storage medium 214 may be configured to cause the computing device 210, when executed by the processor 212, to perform operations in accordance with certain exemplary embodiments.

The computer-readable storage medium 214 may store computer-executable instructions or program code (e.g., instructions contained in the application 230), program data (e.g., data used by the application 230), and/or other suitable forms of information. The application 230 stored in the computer-readable storage medium 214 includes a set of instructions executable by the processor 212.

The memory 216 and the storage device 218 shown in FIG. 5 are examples of the computer-readable storage medium 214. Computer-executable instructions that can be executed by the processor 212 may be loaded into the memory 216, and program data may also be stored in the memory 216. For example, the memory 216 may be a volatile memory such as random access memory, a non-volatile memory, or a suitable combination thereof. As another example, the storage device 218 may include one or more removable or non-removable components for the storage of information. For example, the storage device 218 may be a hard disk, flash memory, a magnetic disk, an optical disk, any other form of storage medium that can be accessed by the computing device 210 and can store the desired information, or a suitable combination thereof.

The computing device 210 may also include one or more input/output interfaces 220 that provide an interface for one or more input/output devices 270. The input/output interface 220 is connected to the bus 260. The input/output device 270 may be connected to (other components of) the computing device 210 via the input/output interface 220. The input/output device 270 may include an input device such as a pointing device, keyboard, touch input device, voice input device, sensor device, and/or photographing device, and/or an output device such as a display device, printer, or speaker.

Meanwhile, certain embodiments may include a computer-readable storage medium containing a program for performing the procedures described herein on a computer. Such a computer-readable storage medium may include program instructions, local data files, local data structures, and the like, alone or in combination. The computer-readable storage medium may be specially designed and constructed for the present invention. Examples of computer-readable storage media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those skilled in the art that various modifications may be made to the disclosed embodiments without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the above-described embodiments, but should be determined by the appended claims and their equivalents.

100: Video conferencing system
102: video conference terminal
104: Video conference management server
106: Video conference editing server
111:
113: Editor
115: Text conversion section
117:
119: Statistics analysis unit
121:

Claims (24)

Receiving, at a video conference editing server, a video conference video;
Extracting, at the video conference editing server, an image of a portion corresponding to an interest portion editing request from the video conference image when the interest portion editing request is received from a video conference terminal in the video conference;
Recognizing, at the video conference editing server, speech in the image of the portion corresponding to the interest portion editing request, and converting the recognized speech into text; And
Generating, at the video conference editing server, an edited image by inserting the converted text as a subtitle into the image of the portion corresponding to the interest portion editing request.
The method according to claim 1,
Wherein the step of extracting an image of the portion corresponding to the interest portion editing request comprises:
Detecting, at the video conference editing server, an utterance start point and an utterance end point of a speaker in the video conference video based on the point in time at which the interest portion editing request is received; And
Extracting, at the video conference editing server, the section from the utterance start point to the utterance end point in the video conference video.
The method according to claim 1,
After the step of generating the edited image,
Further comprising the step of providing, in the video conference editing server, the edited video to the video conference terminal in the video conference.
The method according to claim 1,
Confirming at the video conference editing server whether or not an annotation required term is used in the video conference video;
Retrieving annotation information describing the meaning of the annotation required term in the video conference editing server when the annotation required term is used; And
Further comprising the step of providing, at the video conference editing server, the retrieved annotation information to each video conference terminal in the video conference.
The method according to claim 1,
After the step of generating the edited image,
Further comprising the step of statistically processing the interest of the participants of the video conference by analyzing the interest part editing requests received from each video conference terminal in the video conference at the video conference editing server.
The method of claim 5,
After the statistical processing step,
Further comprising the step of creating, in the video conference editing server, a highlight editing video for the video conference video based on the interest of the participants of the video conference.
The method of claim 6,
After the step of generating the highlight edit image,
Further comprising the step of, at the video conference editing server, providing the highlight editing video to each video conference terminal participating in the video conference.
Receiving, at a video conference editing server, a video conference video;
Receiving, at the video conference editing server, a request for editing a point of interest from each video conference terminal in a video conference;
Analyzing the interest part editing requests received from each video conference terminal and statistically processing the participants' interest in the video conference at the video conference editing server; And
And a step of generating, in the video conference editing server, a highlight editing video for the video conference video based on the interest of the participants of the video conference.
The method of claim 8,
The highlight editing image may include:
Wherein the highlight editing image is generated for a portion of the video conference image in which the participants showed the highest degree of interest.
The method of claim 8,
Wherein the step of generating the highlight editing image comprises:
Extracting, at the video conference editing server, an image of a portion of the video conference image in which the participants are most interested;
Recognizing, at the video conference editing server, speech in the image of the portion in which the participants showed the highest degree of interest, and converting the recognized speech into text; And
Inserting, at the video conference editing server, the converted text as a subtitle into the image of the portion in which the participants showed the highest degree of interest.
The method of claim 8,
Confirming at the video conference editing server whether or not an annotation required term is used in the video conference video;
Retrieving annotation information describing the meaning of the annotation required term in the video conference editing server when the annotation required term is used; And
Further comprising the step of providing, at the video conference editing server, the retrieved annotation information to each video conference terminal in the video conference.
The method of claim 8,
After the step of generating the highlight edit image,
Further comprising the step of, at the video conference editing server, providing the highlight editing video to each video conference terminal participating in the video conference.
An apparatus comprising:
One or more processors;
A memory; and
One or more programs,
Wherein the one or more programs are stored in the memory and are configured to be executed by the one or more processors,
The program includes:
Confirming whether a request for editing a point of interest is received from a video conference terminal in a video conference;
Extracting an image of a portion corresponding to the interest portion editing request from the video conference image when the interested portion editing request is received;
Recognizing speech in a portion of the image corresponding to the interest portion edit request, and converting the recognized speech into text; And
And inserting the converted text as a subtitle into an image of a portion corresponding to the interest portion edit request to generate an edited image.
The apparatus of claim 13,
Wherein the step of extracting an image of the portion corresponding to the interest portion editing request comprises:
Detecting a speaker's utterance start point and a utterance end point in the video conference video based on a point of time when the request for editing the point of interest is received; And
And extracting an interval from the start of speech to the end of speech in the video conference image.
The apparatus of claim 13,
The program may further include, after the step of generating the edited video,
And providing the edited video to a corresponding videoconference terminal in the videoconference.
The apparatus of claim 13,
The program includes:
Confirming whether an annotation necessary term is used in the video conference video;
Searching for annotation information describing the meaning of the annotation required term if the annotation required term is used; And
And providing the retrieved annotation information to each videoconference terminal in the videoconference.
The apparatus of claim 13,
The program may further include, after the step of generating the edited video,
Further comprising analyzing interest portion editing requests received from each videoconference terminal in the videoconference to statistically process the participants' interest in the videoconference.
The apparatus of claim 17,
The program may further comprise, after the statistical processing step,
Further comprising generating a highlighted edit image for the video conference image based on an interest of the participants of the video conference.
The apparatus of claim 18,
The program may further include, after the step of generating the highlight edit image,
And providing the highlight edit image to each videoconference terminal participating in the videoconference.
An apparatus comprising:
One or more processors;
A memory; and
One or more programs,
Wherein the one or more programs are stored in the memory and are configured to be executed by the one or more processors,
The program includes:
Receiving a request for editing a point of interest from each videoconference terminal in a videoconference;
Analyzing the interest portion editing requests received from each videoconference terminal to statistically process the participants' interest in the videoconference; And
And generating a highlighted edit image for the videoconference image based on the interest of the participants of the videoconference.
The apparatus of claim 20,
The program includes:
And generates the highlight edited image for a portion of the video conference image having the highest degree of interest of the participants.
The apparatus of claim 20,
The program may further include, in the step of generating the highlight editing image,
Extracting an image of a portion of the video conference image in which the participants are most interested;
Recognizing speech in a portion of the image of the highest degree of interest of the participants and converting the recognized speech into text; And
And inserting the converted text as a subtitle into the image of the portion in which the participants showed the highest degree of interest.
The apparatus of claim 20,
The program includes:
Confirming whether an annotation necessary term is used in the video conference video;
Searching for annotation information describing the meaning of the annotation required term if the annotation required term is used; And
And providing the retrieved annotation information to each videoconference terminal in the videoconference.
The apparatus of claim 20,
The program may further include, after the step of generating the highlight edit image,
And providing the highlight edit image to each videoconference terminal participating in the videoconference.
KR1020150182982A 2015-12-21 2015-12-21 Method for editing video conference image and apparatus for executing the method KR20170074015A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150182982A KR20170074015A (en) 2015-12-21 2015-12-21 Method for editing video conference image and apparatus for executing the method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150182982A KR20170074015A (en) 2015-12-21 2015-12-21 Method for editing video conference image and apparatus for executing the method

Publications (1)

Publication Number Publication Date
KR20170074015A true KR20170074015A (en) 2017-06-29

Family

ID=59280277

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150182982A KR20170074015A (en) 2015-12-21 2015-12-21 Method for editing video conference image and apparatus for executing the method

Country Status (1)

Country Link
KR (1) KR20170074015A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190009201A (en) * 2017-07-18 2019-01-28 엘지전자 주식회사 Mobile terminal and method for controlling the same
CN113645431A (en) * 2021-07-07 2021-11-12 四川腾云法智互联网科技有限公司 Method and device for realizing bankruptcy case video conference, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US20190007469A1 (en) Copy and paste for web conference content
US20190253474A1 (en) Media production system with location-based feature
US8805929B2 (en) Event-driven annotation techniques
Nagao et al. Discussion mining: Annotation-based knowledge discovery from real world activities
US11315569B1 (en) Transcription and analysis of meeting recordings
US9569428B2 (en) Providing an electronic summary of source content
US20120233155A1 (en) Method and System For Context Sensitive Content and Information in Unified Communication and Collaboration (UCC) Sessions
US20170371496A1 (en) Rapidly skimmable presentations of web meeting recordings
CN112653902B (en) Speaker recognition method and device and electronic equipment
KR20120102043A (en) Automatic labeling of a video session
JP2008282397A (en) Method for creating annotated transcript of presentation, information processing system, and computer program
KR20080037947A (en) Method and apparatus of generating meta data of content
JP2005341015A (en) Video conference system with minute creation support function
US10084829B2 (en) Auto-generation of previews of web conferences
CN110211590B (en) Conference hotspot processing method and device, terminal equipment and storage medium
US20140200888A1 (en) System and Method for Generating a Script for a Web Conference
US9525896B2 (en) Automatic summarizing of media content
US10841115B2 (en) Systems and methods for identifying participants in multimedia data streams
US9361714B2 (en) Enhanced video description
JP5030868B2 (en) Conference audio recording system
KR101618084B1 (en) Method and apparatus for managing minutes
KR20170074015A (en) Method for editing video conference image and apparatus for executing the method
US20140222840A1 (en) Insertion of non-realtime content to complete interaction record
US20220222449A1 (en) Presentation transcripts
KR101783872B1 (en) Video Search System and Method thereof