KR101630614B1 - System and method for producing lecture contents - Google Patents
System and method for producing lecture contents
- Publication number
- KR101630614B1 (application KR1020150041692A)
- Authority
- KR
- South Korea
- Prior art keywords
- sound
- voice
- time difference
- transmitted
- mixer
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/8106—Monomedia components thereof involving special audio data, e.g. different tracks for different languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
Abstract
Description
BACKGROUND OF THE INVENTION
Recently, the use of video lectures for various individual purposes (for example, learning) has been increasing. Video lectures are one way of complementing the shortcomings of offline, real-time, on-site lectures.
Such video lecture contents are conventionally produced in a studio equipped with broadcasting equipment. However, video lecture contents produced in a studio are limited in variety: since only the lectures of lecturers invited to the studio are produced as contents, the diversity of the lecture contents is necessarily limited.
Therefore, in order to produce a greater variety of video lecture contents, a technique has been proposed by which video lecture contents can easily be produced at an arbitrary place. However, this technique still requires basic camera and sound equipment, and at the same time the camera and sound equipment must be automated in order to minimize the production staff (ideally, a one-person production system).
In other words, offline lectures are often delivered while the lecturer moves about, for example to write on a board. In this case, various images can be acquired through camera movement, zooming, and tracking. However, it is difficult to acquire the lecturer's voice once the lecturer moves away from a microphone mounted on the camera or fixed separately.
To acquire the voice, a lecturer may hold a microphone directly or attach a wired/wireless microphone to the body, but this makes the lecturer uncomfortable.
Therefore, a way of acquiring the voice effectively, without burdening the lecturer, is needed.
SUMMARY OF THE INVENTION Accordingly, the present invention has been made keeping in mind the above problems occurring in the prior art, and an object of the present invention is to provide a video lecture contents production system and method in which the position of a camera is automatically adjusted using the time differences of the voice signals transmitted from a plurality of sound collectors, and in which the video lecture contents are produced by finally mixing the time-difference-compensated voice with the video.
According to an aspect of the present invention, there is provided a moving picture lecture contents production system including: a plurality of sound collectors for collecting sound; A voice controller for synthesizing the voices in consideration of the time differences of the voice signals transmitted from the plurality of sound collectors and generating position data of the sound source; A mobile camera for performing movement control corresponding to the position data; A mixer for mixing an image transmitted from the mobile camera and a voice transmitted from the voice controller; And a database for storing moving pictures output from the mixer.
Here, a fixed camera for photographing a predetermined space may further be included, and the mixer mixes the video transmitted from the mobile camera and the fixed camera with the voice transmitted from the voice controller.
It is also preferable that a speaker is connected to the audio controller.
Preferably, a wired/wireless communication device for transmitting the moving picture is connected to the mixer.
Here, the voice controller may comprise: a time difference measuring unit for measuring the time difference of the voice signals transmitted from the respective sound collectors; A position determining unit for determining the position of the lecturer based on the time differences of the voice signals of the respective sound collectors to generate the position data; A time difference correcting unit for correcting each voice signal in accordance with its measured time difference; And a synthesizer for outputting voice data obtained by synthesizing the time-difference-corrected voices.
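As a rough illustration of what the time difference measuring unit and position determining unit compute, the classical approach is to cross-correlate the signals of a pair of sound collectors and convert the peak lag into an arrival angle. The sketch below is a minimal Python illustration under that assumption; the function names, the two-microphone pair, and the far-field geometry are ours, not the patent's:

```python
import numpy as np

def measure_time_difference(sig_a, sig_b, fs):
    """Estimate the time difference of arrival (TDOA) between two
    microphone channels from the peak of their cross-correlation."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = np.argmax(corr) - (len(sig_b) - 1)  # lag in samples
    return lag / fs                           # seconds (>0: sig_a lags sig_b)

def direction_from_tdoa(tdoa, mic_spacing, c=343.0):
    """Convert a TDOA between two microphones separated by mic_spacing
    (meters) into a far-field arrival angle in radians from broadside."""
    s = np.clip(c * tdoa / mic_spacing, -1.0, 1.0)  # clamp to valid range
    return np.arcsin(s)
```

For example, a lag of about 0.29 ms across a 0.2 m microphone pair corresponds to roughly 30 degrees off broadside.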
The sound collector includes a first collecting part having a curvature in a first direction and collecting sound into the inside thereof; A second collecting part having a curvature in the first direction and guiding and collecting sound in a gap separated from the first collecting part; And an acoustic processor for performing acoustic processing on the sound collected using the plurality of microphones.
According to another aspect of the present invention, there is provided a method for producing moving picture lecture contents, comprising: generating a voice signal from the sound collected by each of a plurality of sound collectors; generating, in a voice controller, position data from the time differences of the voice signals; moving the camera corresponding to the position data; and mixing, using a mixer, the image data transmitted from the camera and the voice data transmitted from the voice controller.
Preferably, the voice data is generated by synthesizing the voices in consideration of the time differences between the voice signals transmitted from the plurality of sound collectors.
Here, the voice controller can output the lecture sound on site through a connected speaker, and the moving picture can be transmitted in real time through the wired/wireless communication device connected to the mixer.
As described above, according to the video lecture contents production system and method of the present invention, since the voice is acquired through the sound collectors rather than through a microphone held by or attached to the lecturer, the convenience of the lecturer is improved, and the control of the camera and sound equipment can be unified and automated.
Further, according to the present invention, the entire system can be configured with a relatively low installation cost, and the ease of production of video lecture contents can be improved through automation using voice.
FIG. 1 is a block diagram of a video lecture contents production system according to an embodiment of the present invention.
FIG. 2 is a configuration diagram of a voice controller according to an embodiment of the present invention.
FIGS. 3 to 11 are various examples of the sound collector applied to the present invention.
FIG. 12 is a flowchart of a method of producing a moving picture lecture content according to an embodiment of the present invention.
FIGS. 13 and 14 are conceptual diagrams showing an extended video lecture contents production system of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS The present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a block diagram of a video lecture contents production system according to an embodiment of the present invention.
First, the same reference numerals are assigned to the components that perform the same function in the drawings.
Referring to FIG. 1, the moving picture lecture contents production system according to the present invention includes a plurality of sound collectors 1, a voice controller 2, a moving camera 3, a mixer 4, and a database 5.
Here, the system may further include a fixed camera 6 for photographing a predetermined space.
Also, a speaker 7 may be connected to the voice controller 2 so that the lecture voice can be output on site.
A wired/wireless communication device 8 for transmitting the moving picture may be connected to the mixer 4.
On the other hand, the present embodiment describes, as an example, the case where a plurality of sound collectors 1 are provided.
FIG. 2 is a configuration diagram of a voice controller according to an embodiment of the present invention.
Referring to FIG. 2, the voice controller 2 includes a time difference measuring unit, a position determining unit, a time difference correcting unit, and a synthesizer.
Here, the time difference measuring unit measures the time difference of the voice signals transmitted from the respective sound collectors 1.
Then, the position determining unit determines the position of the lecturer based on the measured time differences and generates the position data.
FIGS. 3 to 11 are various examples of the sound collector applied to the present invention.
Referring to FIGS. 3 to 11, the sound collector 1 includes a first collecting part and a second collecting part, each having a curvature in a first direction, and an acoustic processor for processing the sound collected by a plurality of microphones.
The
Here, the
Meanwhile, the
The outer peripheral portion of the receiving
In addition, the receiving
In this case, the fixing
At this time, it is preferable that a substrate or the like is provided at the entrance of the inserting
On the other hand, the
On the other hand, the
The
On the other hand, the function of the
Hereinafter, a method for producing a moving picture lecture content of the present invention using the system configured as described above will be described.
FIG. 12 is a flowchart of a method of producing a moving picture lecture content according to an embodiment of the present invention.
Referring to FIG. 12, a voice signal is generated by each sound collector corresponding to the sound propagated from the sound source position (the lecture position) and is transmitted to the voice controller (S1).
The voice controller measures the time difference of each transmitted voice signal (S2).
The voice controller then generates position data from the time differences of the voice signals and outputs the generated position data (S3).
Then, the position data is transmitted to the mobile camera, which moves in accordance with the position data (S4). Thereafter, the lecturer can preferably be photographed more accurately through a video tracking technique or the like. Meanwhile, the fixed camera captures, for example, part or all of the blackboard from a fixed position.
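Step S4's movement control can be sketched as a simple pan loop: on each tick the camera nudges toward the direction indicated by the position data, with a dead band so that small localization jitter does not cause constant hunting. This is an illustrative control sketch only; the function name, step size, and dead band are our assumptions, not values from the patent:

```python
def pan_step(current_deg, target_deg, max_step_deg=5.0, deadband_deg=2.0):
    """One control tick: move the pan angle toward the target direction.
    Small errors inside the dead band are ignored; larger moves are
    slew-rate limited so the picture does not lurch."""
    error = target_deg - current_deg
    if abs(error) <= deadband_deg:
        return current_deg                       # close enough: hold position
    step = max(-max_step_deg, min(max_step_deg, error))
    return current_deg + step
```

Running this per frame converges the camera onto the lecturer's estimated direction; a video-tracking refinement, as the description suggests, could then take over for fine framing.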
Meanwhile, each voice signal is corrected in accordance with its measured time difference (S5). At this time, it is preferable to take the strongest voice signal as the reference for the correction. Of course, the correction may instead be referenced to the voice signal that arrives first. Subsequently, the voice controller outputs voice data obtained by synthesizing the time-difference-corrected voices (S6).
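Steps S5 and S6 amount to delay compensation followed by summation: each channel is advanced by its measured lag relative to the reference channel, and the aligned channels are averaged. A minimal sketch, assuming integer-sample lags measured against whichever channel the controller chose as the reference:

```python
import numpy as np

def align_and_sum(signals, lags_samples):
    """Time-difference correction and synthesis: advance each channel by
    its lag (in samples, relative to the reference channel at lag 0) and
    average the aligned channels. Samples wrapped by the shift are zeroed."""
    out = np.zeros(len(signals[0]))
    for sig, lag in zip(signals, lags_samples):
        shifted = np.roll(np.asarray(sig, dtype=float), -lag)
        if lag > 0:
            shifted[-lag:] = 0.0   # discard the wrapped-around tail
        elif lag < 0:
            shifted[:-lag] = 0.0   # discard the wrapped-around head
        out += shifted
    return out / len(signals)
```

Because the channels add coherently after alignment, the lecturer's voice is reinforced while uncorrelated room noise partially cancels, which is the point of synthesizing the corrected voices rather than picking a single microphone.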
Then, the video data photographed by the mobile camera and the fixed camera is mixed with the voice data transmitted from the voice controller (S7), and the mixed moving picture is stored in the database (S8).
On the other hand, when a speaker is connected to the voice controller, the lecture voice can be output in real time through a speaker as a field sound.
When a wired / wireless communication device for transmitting a moving picture to the mixer is connected, the moving picture can be delivered to a field student in real time.
FIGS. 13 and 14 are conceptual diagrams showing an extended video lecture contents production system of the present invention.
Referring to FIGS. 13 and 14, as an example, a server for managing the video lecture contents production system on-line can be constructed by combining network equipment with the mixer, whereby video data transmitted through the network can be stored in the database.
In addition, a video lecture contents production system may be installed in each classroom, and the video data transmitted from the plurality of systems may be integrally managed using a server and a database. Management by group (for example, by school or by class) is also possible.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the invention.
1: Sound collector
2: Audio controller
3: Moving camera
4: Mixer
5: Database
6: Fixed camera
7: Speaker
8: wired / wireless communication device
Claims (10)
A voice controller for generating position data of a lecturer by combining the plurality of voice signals in consideration of a time difference between each of a plurality of voice signals transmitted from each of the plurality of sound collectors and a receiving intensity of each of the plurality of voice signals;
A mobile camera for performing movement control corresponding to the position data;
A mixer for mixing an image transmitted from the mobile camera and a voice transmitted from the voice controller; And
And a database for storing moving images output from the mixer,
Wherein the voice controller comprises:
A time difference measurement unit for measuring a time difference of each of the plurality of audio signals transmitted from each of the plurality of sound collectors;
A position determination unit for determining the position of the lecturer from the time differences of the plurality of voice signals of the plurality of sound collectors to generate the position data;
A time difference correction unit for correcting the time difference of each of the plurality of audio signals; And
And a synthesizer for outputting voice data obtained by synthesizing the plurality of voice signals corrected by the time difference corrector,
Wherein the time difference measurement unit determines whether each voice signal has a sound source strength higher than a set value, calculates a first output energy including a delay function for each voice signal frame of each of the plurality of voice signals having a sound source strength higher than the set value, and determines a cross-correlation between the plurality of sound collectors based on the first output energy,
Wherein the position determination unit selects a sound source direction candidate group based on the cross-correlation, selects a predetermined number of candidate directions for the sound source direction candidate group based on the sampling frequency and the spacing between the plurality of sound collectors, calculates a second output energy for each of the candidate directions using the cross-correlation, and determines, as the position data, the direction having the largest second output energy among the candidate directions.
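The candidate-direction search recited above resembles a steered-response-power (SRP) scan: steer the array toward each candidate direction, sum the delayed channels, and keep the direction whose output energy is largest. The following is a simplified sketch under that reading; the linear array, integer-sample steering, and far-field assumption are ours, and the claim's first output energy and threshold test are omitted for brevity:

```python
import numpy as np

def srp_direction(signals, mic_x, fs, c=343.0, n_candidates=37):
    """Scan a fixed set of candidate directions; for each, delay-and-sum
    the channels using the steering delays implied by the microphone
    positions mic_x (meters, linear array), and return the angle (radians
    from broadside) whose summed output has the largest energy."""
    angles = np.linspace(-np.pi / 2, np.pi / 2, n_candidates)
    best_angle, best_energy = angles[0], -np.inf
    for ang in angles:
        delays = np.round(np.asarray(mic_x) * np.sin(ang) / c * fs).astype(int)
        acc = np.zeros(len(signals[0]))
        for sig, d in zip(signals, delays):
            acc += np.roll(np.asarray(sig, dtype=float), -d)  # integer steer
        energy = float(np.sum(acc ** 2))
        if energy > best_energy:
            best_angle, best_energy = ang, energy
    return best_angle
```

Note how the candidate resolution is bounded by the sampling frequency and microphone spacing, as the claim observes: with integer-sample steering, nearby angles map to the same delay, so a finer grid only helps with fractional-delay interpolation.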
And a fixed camera for photographing a predetermined space,
Wherein the mixer mixes the video transmitted from the mobile camera and the fixed camera with the voice transmitted from the audio controller.
And a speaker is connected to the audio controller.
And a wired / wireless communication device for transmitting a moving picture to the mixer is connected to the mixer.
Wherein each of the plurality of sound collectors comprises:
A first collecting part having a curvature in a first direction and collecting sound into the first collecting part;
A second collecting part having a curvature in the first direction and guiding and collecting sound in a gap separated from the first collecting part; And
And a sound processor for performing sound processing on the sound collected using a plurality of microphones.
Generating position data of a lecturer by combining the plurality of voice signals based on a time difference for each of the plurality of voice signals and a reception intensity of the plurality of voice signals in a voice controller;
Moving the camera corresponding to the position data; And
Mixing the video data transmitted from the camera with the plurality of audio signals transmitted from the audio controller using a mixer,
Wherein the voice controller comprises:
A time difference measurement unit for measuring a time difference of each of the plurality of audio signals transmitted from each of the plurality of sound collectors;
A position determination unit for determining the position of the lecturer from the time differences of the plurality of voice signals of the plurality of sound collectors to generate the position data;
A time difference correction unit for correcting the time difference of each of the plurality of audio signals; And
And a synthesizer for outputting voice data obtained by synthesizing the plurality of voice signals corrected by the time difference corrector,
Wherein the time difference measurement unit determines whether each voice signal has a sound source strength higher than a set value, calculates a first output energy including a delay function for each voice signal frame of each of the plurality of voice signals having a sound source strength higher than the set value, and determines a cross-correlation between the plurality of sound collectors based on the first output energy,
Wherein the position determination unit selects a sound source direction candidate group based on the cross-correlation, selects a predetermined number of candidate directions for the sound source direction candidate group based on the sampling frequency and the spacing between the plurality of sound collectors, calculates a second output energy for each of the candidate directions using the cross-correlation, and determines, as the position data, the direction having the largest second output energy among the candidate directions.
Wherein the voice data is generated by synthesizing the voices in consideration of the time difference of the voice signal transmitted from each sound collector.
Wherein the audio controller outputs field sounds through connected speakers.
And transmitting the moving picture in real time through the wired / wireless communication device connected to the mixer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150041692A KR101630614B1 (en) | 2015-03-25 | 2015-03-25 | System and method for producing lecture contents |
Publications (1)
Publication Number | Publication Date |
---|---|
KR101630614B1 true KR101630614B1 (en) | 2016-06-24 |
Family
ID=56343498
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150041692A KR101630614B1 (en) | 2015-03-25 | 2015-03-25 | System and method for producing lecture contents |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101630614B1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH10227849A (en) * | 1997-02-14 | 1998-08-25 | Fuji Xerox Co Ltd | Sound source position measuring device, camera image pick-up controller, sound source position recording device, and sound source position measurement method |
JP2003230049A (en) * | 2002-02-06 | 2003-08-15 | Sharp Corp | Camera control method, camera controller and video conference system |
JP2007124140A (en) * | 2005-10-26 | 2007-05-17 | Yamaha Corp | Photographing device and communication conference system |
KR20080049431A (en) | 2006-11-30 | 2008-06-04 | 지창훈 | Production system real time lecture contents using projector and method thereof |
KR101188828B1 (en) * | 2011-04-08 | 2012-10-09 | 백민호 | The sound collector which uses on microphon of cassegrain method |
KR20140078043A (en) | 2012-12-14 | 2014-06-25 | 김용민 | A lecture contents manufacturing system and method which anyone can easily make |
- 2015-03-25: application KR1020150041692A granted as KR101630614B1 (active IP Right Grant)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102186049B (en) | Conference terminal audio signal processing method, conference terminal and video conference system | |
US20180213345A1 (en) | Multi-Apparatus Distributed Media Capture for Playback Control | |
US10206030B2 (en) | Microphone array system and microphone array control method | |
US8989552B2 (en) | Multi device audio capture | |
CN108089152B (en) | Equipment control method, device and system | |
US20080219485A1 (en) | Apparatus, System and Method for Acoustic Signals | |
CN111918018B (en) | Video conference system, video conference apparatus, and video conference method | |
EP2352290B1 (en) | Method and apparatus for matching audio and video signals during a videoconference | |
CN103004238A (en) | Facilitating communications using a portable communication device and directed sound output | |
CN204539315U (en) | A kind of video conference machine of auditory localization | |
US11601731B1 (en) | Computer program product and method for auto-focusing a camera on an in-person attendee who is speaking into a microphone at a hybrid meeting that is being streamed via a videoconferencing system to remote attendees | |
CN111163281A (en) | Panoramic video recording method and device based on voice tracking | |
JP7428763B2 (en) | Information acquisition system | |
CN103414992A (en) | Audio file adjustment system | |
KR101976937B1 (en) | Apparatus for automatic conference notetaking using mems microphone array | |
KR101630614B1 (en) | System and method for producing lecture contents | |
US11665391B2 (en) | Signal processing device and signal processing system | |
WO2011108377A1 (en) | Coordinated operation apparatus, coordinated operation method, coordinated operation control program and apparatus coordination system | |
Meyer-Kahlen et al. | Design and measurement of first-order, horizontally beam-controlling loudspeaker cubes | |
WO2018173139A1 (en) | Imaging/sound acquisition device, sound acquisition control system, method for controlling imaging/sound acquisition device, and method for controlling sound acquisition control system | |
KR101687676B1 (en) | System and method for managing lecture contents | |
Ishigaki et al. | Zoom microphone | |
JP2021197658A (en) | Sound collecting device, sound collecting system, and sound collecting method | |
WO2013045533A1 (en) | Multimodal mobile video telephony | |
US11877058B1 (en) | Computer program product and automated method for auto-focusing a camera on a person in a venue who is wearing, or carrying, or holding, or speaking into a microphone at the venue |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant | ||
FPAY | Annual fee payment |
Payment date: 20190124 Year of fee payment: 4 |