WO2013008869A1 - Electronic device and data generation method - Google Patents

Electronic device and data generation method

Info

Publication number
WO2013008869A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
unit
subject
audio
vibration
Prior art date
Application number
PCT/JP2012/067757
Other languages
French (fr)
Japanese (ja)
Inventor
八木 健
Original Assignee
株式会社ニコン (Nikon Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社ニコン (Nikon Corporation)
Publication of WO2013008869A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04 Time compression or expansion
    • G10L21/055 Time compression or expansion for synchronising with other signals, e.g. video signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2250/00 Details of telephonic subscriber devices
    • H04M2250/52 Details of telephonic subscriber devices including functional features of a camera
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/765 Interface circuits between an apparatus for recording and another apparatus
    • H04N5/77 Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera
    • H04N5/772 Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera, the recording apparatus and the television camera being placed in the same enclosure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00 Details of colour television systems
    • H04N9/79 Processing of colour television signals in connection with recording
    • H04N9/80 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N9/82 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback, the individual colour picture signal components being recorded simultaneously only
    • H04N9/8205 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback, the individual colour picture signal components being recorded simultaneously only, involving the multiplexing of an additional signal and the colour video signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

An electronic device (1) is provided with an image analysis unit (105) for analyzing the motion of a subject in a moving image, and a generation unit (106) for generating audio data and vibration data that correspond to the motion of the subject analyzed by the image analysis unit (105), and generating data by temporally synchronizing the generated audio data and vibration data with image data relating to the moving image.

Description

Electronic device and data generation method
The present invention relates to an electronic device and a data generation method.
This application claims priority based on Japanese Patent Application No. 2011-155586, filed on July 14, 2011, the contents of which are incorporated herein.
A technique for generating an audio signal corresponding to video is known (see, for example, Patent Document 1). In the technique described in Patent Document 1, color information of a predetermined detection area in the video is detected, and an audio signal is generated using a predetermined timbre corresponding to that color information.
Patent Document 1: JP 2001-92478 A
However, the technique described in Patent Document 1 generates only an audio signal, using a timbre that corresponds to color information in the video; the movement of the subject in the video is not taken into consideration.
An object of an aspect of the present invention is to provide an electronic device and a data generation method capable of generating audio data and vibration data according to the movement of a subject in a moving image.
One aspect of the present invention is an electronic device comprising: an analysis unit that analyzes the movement of a subject in a moving image; and a generation unit that generates audio data according to the movement of the subject analyzed by the analysis unit, and generates vibration data according to the generated audio data.
Another aspect of the present invention is an electronic device comprising: an analysis unit that analyzes the movement of a subject in a moving image; and a generation unit that generates audio data according to the movement of the subject analyzed by the analysis unit, and generates vibration data according to the movement of the subject analyzed by the analysis unit.
According to the aspects of the present invention, audio data and vibration data corresponding to the movement of a subject in a moving image can be generated.
FIG. 1 is a block diagram showing the configuration of an electronic device according to a first embodiment.
FIG. 2 is a diagram for explaining the multimedia data generation method according to the first embodiment.
FIG. 3 is a flowchart showing the procedure of multimedia data generation processing according to the first embodiment.
FIG. 4 is a flowchart showing the procedure of multimedia data generation processing according to a second embodiment.
FIG. 5 is a block diagram showing the configuration of an electronic device according to a third embodiment.
FIG. 6 is a flowchart showing the procedure of multimedia data generation processing according to the third embodiment.
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[First Embodiment]
FIG. 1 is a block diagram showing the configuration of an electronic device 1 according to the first embodiment of the present invention.
The electronic device 1 is a portable information terminal such as a mobile phone, a smartphone, or a digital camera. The electronic device 1 includes a control unit 101, an imaging unit 102, a microphone 103, a data storage unit 104, an image analysis unit 105, a generation unit 106, a library (storage unit) 107, an audio output unit 108, a vibration unit 109, and a display unit 110.
The imaging unit 102 images a subject and generates image data. For example, the imaging unit 102 outputs image data of a captured still image in response to a still-image shooting operation, and outputs image data of a moving image captured continuously at a predetermined interval in response to a moving-image shooting operation. Still-image data and moving-image data captured by the imaging unit 102 are recorded in the data storage unit 104 under the control of the control unit 101. In a shooting standby state in which no shooting operation is performed, the imaging unit 102 outputs the image data obtained continuously at a predetermined interval as through-image data (a live preview), which is displayed on the display unit 110 under the control of the control unit 101.
The microphone 103 collects sound and generates audio data corresponding to the collected sound.
The data storage unit 104 stores image data of moving images, moving-image audio data, multimedia data, and the like. Moving-image audio data is data including image data of a moving image and audio data temporally synchronized with that image data. Multimedia data is data including image data of a moving image, audio data temporally synchronized with that image data, and vibration data temporally synchronized with that image data.
The control unit 101 controls each part of the electronic device 1 in an integrated manner. For example, the control unit 101 generates moving-image audio data by temporally synchronizing the image data generated by the imaging unit 102 with the audio data collected by the microphone 103, and writes the generated moving-image audio data to the data storage unit 104. The control unit 101 also controls the image analysis unit 105 and the generation unit 106 to generate audio data and vibration data corresponding to the image data of a moving image, and generates multimedia data by temporally synchronizing the generated audio data and vibration data with that image data. Further, the control unit 101 reads multimedia data from the data storage unit 104 and controls the display unit 110, the audio output unit 108, and the vibration unit 109 to reproduce the read multimedia data.
The image analysis unit 105 analyzes the movement of the subject in the image data of a moving image and outputs the analyzed movement of the subject to the generation unit 106.
The library 107 is a storage unit that stores audio element data corresponding to each movement of the subject.
The generation unit 106 generates audio data corresponding to the movement of the subject analyzed by the image analysis unit 105. Specifically, the generation unit 106 reads the audio element data corresponding to the movement of the subject from the library 107 and generates audio data based on the read audio element data. The generation unit 106 also generates vibration data corresponding to the generated audio data. Specifically, the generation unit 106 converts the audio data into vibration data using a predetermined conversion formula. For example, the generation unit 106 generates the vibration data so that vibration is produced wherever the audio amplitude exceeds a predetermined value on the time axis of the audio data. The generation unit 106 then generates multimedia data by temporally synchronizing the image data of the moving image, the generated audio data, and the generated vibration data, and writes the generated multimedia data to the data storage unit 104.
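The publication does not state the conversion formula itself; the following is a minimal Python sketch of the thresholding rule described above, in which a vibration pulse is emitted wherever the normalized audio amplitude exceeds a set value. The function name, threshold, and pulse length are illustrative assumptions.

```python
import numpy as np

def audio_to_vibration(samples, sample_rate, threshold=0.5, pulse_ms=50):
    """Derive an on/off vibration track from the audio amplitude envelope."""
    peak = np.max(np.abs(samples))
    envelope = np.abs(samples) / peak if peak > 0 else np.abs(samples)
    vibration = np.zeros_like(envelope)
    pulse = int(sample_rate * pulse_ms / 1000)
    for i in np.flatnonzero(envelope > threshold):
        vibration[i:i + pulse] = 1.0  # drive the vibration device for one pulse
    return vibration
```

The per-sample loop is deliberately naive; the point is only the threshold-on-the-time-axis behavior the text describes, not an efficient implementation.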
The display unit 110 is a display such as a liquid crystal display, and displays image data.
The audio output unit 108 outputs audio corresponding to audio data. The audio output unit 108 includes a codec that converts digital audio data to analog, and a speaker that outputs the converted analog audio signal.
The vibration unit 109 generates vibration according to vibration data. The vibration unit 109 includes a vibration signal generation unit that drives a vibration device based on the vibration data, and a vibration device that generates the vibration, such as a linear vibration actuator.
Next, processing for generating multimedia data from the image data of a moving image will be described with reference to FIGS. 2 and 3. FIG. 2 is a diagram for explaining the multimedia data generation method according to this embodiment. FIG. 3 is a flowchart showing the procedure of the multimedia data generation processing according to this embodiment.
FIG. 2 shows a moving image in which a main subject (person) T is running. Hereinafter, this moving image of the running main subject T will be used as an example.
The control unit 101 reads the image data of the moving image from the data storage unit 104 and instructs generation of multimedia data by outputting the read image data to the image analysis unit 105.
First, the image analysis unit 105 extracts the main subject T in the image data of the moving image (step S101). For example, the image analysis unit 105 extracts a person by pattern matching and sets the extracted person as the main subject T. When a plurality of persons are extracted, the image analysis unit 105 sets the person closest to the center of the image data as the main subject T, or sets a specific person as the main subject T by face recognition. When the main subject T is determined by face recognition, the electronic device 1 stores face data of the person to be treated as the main subject T in advance. Alternatively, the image analysis unit 105 may set as the main subject T not only a person but also an object located near the center of the frame, or an object that appears frequently in the moving image.
Next, the image analysis unit 105 analyzes the movement of the extracted main subject T (step S102). Specifically, the image analysis unit 105 determines the movement of the main subject by pattern-matching the movement of the main subject T in the moving image against movement patterns stored in advance (for example, running, jumping, and so on). In this example, the image analysis unit 105 determines that the main subject T is running. The image analysis unit 105 also extracts the timing (time position in the moving image) at which a foot of the main subject T lands on the ground, for example by vector analysis of the foot movement. When extracting by vector analysis, the image analysis unit 105 treats the timing at which the direction of the motion vector changes by more than a predetermined value as the timing at which the foot lands on the ground.
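The publication names vector analysis with a direction-change threshold but gives no further algorithmic detail. A minimal sketch of that idea, assuming the motion tracker supplies a per-frame (dx, dy) motion vector for the tracked foot region (a hypothetical input format):

```python
import numpy as np

def landing_frames(foot_vectors, angle_threshold_deg=60.0):
    """Return frame indices where the foot's motion vector direction
    changes by more than a threshold, taken as landing timings."""
    landings = []
    for i in range(1, len(foot_vectors)):
        a = np.asarray(foot_vectors[i - 1], dtype=float)
        b = np.asarray(foot_vectors[i], dtype=float)
        if np.linalg.norm(a) == 0 or np.linalg.norm(b) == 0:
            continue  # no motion to compare in one of the frames
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
        if angle > angle_threshold_deg:
            landings.append(i)
    return landings
```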
Next, the generation unit 106 generates audio data corresponding to the movement of the main subject T (step S103). Specifically, the generation unit 106 first reads the audio element data corresponding to the movement of the main subject T from the library 107 and generates audio data based on the read audio element data. In this example, the generation unit 106 reads a running footstep sound ("tap") from the library 107, and then generates the audio data so that the footstep "tap" sounds at each timing at which a foot of the main subject T lands on the ground. As a result, audio data of a "tap-tap-tap-tap" sound matching the running movement of the main subject T in the moving image is generated.
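As a rough illustration of step S103, the sketch below mixes a stored footstep sample into a silent track at each landing timing; the function signature and sample rate are assumptions, not from the publication.

```python
import numpy as np

def place_sound_elements(footstep, landing_times, duration_s, sample_rate=44100):
    """Build an audio track by overlaying a footstep sample at each landing time.

    footstep: mono sample array (the audio element read from the library).
    landing_times: landing timings in seconds from the image analysis.
    """
    track = np.zeros(int(duration_s * sample_rate), dtype=np.float32)
    for t in landing_times:
        start = int(t * sample_rate)
        end = min(start + len(footstep), len(track))
        track[start:end] += footstep[:end - start]  # overlay the element
    return np.clip(track, -1.0, 1.0)  # keep the mix within full scale
```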
Next, the generation unit 106 generates vibration data corresponding to the generated audio data (step S104). In this example, the generation unit 106 generates the vibration data so that vibration is produced at each timing at which the footstep "tap" is output.
Finally, the generation unit 106 generates multimedia data by temporally synchronizing the image data of the moving image, the generated audio data, and the generated vibration data (step S105).
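The publication characterizes multimedia data only as image data, audio data, and vibration data sharing a common time base. A minimal container sketch under that reading; all field names are hypothetical:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MultimediaData:
    """Three tracks sharing one time base (field names are hypothetical)."""
    video_frames: np.ndarray   # (num_frames, H, W, 3), at frame_rate fps
    frame_rate: float
    audio: np.ndarray          # mono samples at sample_rate Hz
    sample_rate: int
    vibration: np.ndarray      # actuator amplitude, same length/rate as audio

    def duration_s(self) -> float:
        return len(self.video_frames) / self.frame_rate
```

One convenient choice here is keeping the vibration track at the audio sample rate, which makes the audio-to-vibration mapping of step S104 a one-to-one array operation.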
When reproducing the generated multimedia data, the control unit 101 displays the image data of the moving image on the display unit 110, outputs the audio data to the audio output unit 108, and outputs the vibration data to the vibration unit 109. As a result, when this multimedia data is reproduced, the audio output unit 108 plays the "tap-tap-tap-tap" footsteps in time with the movement of the main subject T displayed on the display unit 110, and the vibration unit 109 generates vibration. That is, at each timing at which a foot of the main subject T lands on the ground on the display unit 110, the footstep "tap" is reproduced and the electronic device 1 vibrates.
In the example described above, the multimedia data is generated from image data of a moving image already stored in the data storage unit 104, but it may instead be generated from the image data of a moving image that the imaging unit 102 is currently shooting. In this case, the control unit 101 sequentially outputs the image data of the moving image being captured by the imaging unit 102 to the image analysis unit 105, and then, in response to the operation ending the shooting, controls the generation unit 106 to generate the multimedia data for that moving image. In this way, the user can obtain multimedia data with vibration added simply by performing a moving-image shooting operation.
Also, in this embodiment the generation unit 106 converts the audio data into vibration data by a predetermined conversion formula, but information about the vibration corresponding to each piece of audio element data (for example, frequency, vibration amplitude, or vibration duration) may instead be stored in the library 107 in advance; the information about the vibration corresponding to the audio data (audio element data) is then read from the library 107, and the vibration data is generated based on the read information.
As described above, according to this embodiment, the generation unit 106 generates audio data and vibration data corresponding to the movement of the subject analyzed by the image analysis unit 105, and generates multimedia data by temporally synchronizing the audio data and vibration data with the image data of the moving image. This makes it possible to generate audio data and vibration data that match the movement of the subject in the moving image. Moreover, when the multimedia data is reproduced, sound is played in time with the moving image and vibration is generated in the electronic device 1, so viewing the multimedia data becomes more enjoyable. For example, in the case of multimedia data in which vibration data has been combined with treasured moving-image audio data, the threefold expression of video, sound, and vibration can help the viewer recall the memories associated with the video more vividly than video and sound alone.
[Second Embodiment]
Next, a second embodiment of the present invention will be described.
The library 107 according to this embodiment stores data about the vibration corresponding to each movement of the subject. The data about the vibration includes, for example, frequency, amplitude, and vibration duration. The generation unit 106 generates vibration data according to the movement of the subject. Specifically, the generation unit 106 reads the data about the vibration corresponding to the movement of the subject from the library 107 and generates the vibration data based on the read data.
The other components are the same as in the first embodiment, so their description is omitted.
Next, processing for generating multimedia data from moving-image audio data will be described with reference to FIG. 4. FIG. 4 is a flowchart showing the procedure of the multimedia data generation processing according to this embodiment.
The control unit 101 reads moving-image audio data from the data storage unit 104 and instructs generation of multimedia data by outputting the read moving-image audio data to the image analysis unit 105.
First, the image analysis unit 105 extracts the main subject T in the image data included in the moving-image audio data (step S201).
Next, the image analysis unit 105 analyzes the movement of the extracted main subject T (step S202). In this example, the image analysis unit 105 determines that the main subject T is running. The image analysis unit 105 also extracts the timing (time position in the moving image) at which a foot of the main subject T lands on the ground, for example by vector analysis of the foot movement.
Next, the generation unit 106 generates vibration data according to the movement of the main subject T (step S203). Specifically, the generation unit 106 reads the data about the vibration corresponding to the movement of the main subject T and generates the vibration data based on the read data. In this example, the generation unit 106 generates the vibration data so that vibration following the vibration-related data (for example, frequency, amplitude, and vibration duration) is produced at each timing at which a foot of the main subject T lands on the ground.
Alternatively, the vibration data can be generated based on the data obtained by the vector analysis. In that case, the vibration data might, for example, be generated so that vibration is produced whenever the direction of the motion vector changes by more than a predetermined value.
Finally, the generation unit 106 generates multimedia data by temporally synchronizing the generated vibration data with the moving-image audio data (step S204).
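As an illustration of step S203, the sketch below renders one short vibration burst per landing timing, shaped by library parameters such as frequency, amplitude, and duration. The parameter values and the sinusoidal burst shape are assumptions; the publication only names the parameter kinds.

```python
import numpy as np

# Hypothetical library entry for the "running" movement pattern.
RUNNING_VIBRATION = {"freq_hz": 160.0, "amplitude": 0.8, "duration_ms": 60}

def vibration_track(landing_times, total_s, params, rate=1000):
    """Render per-millisecond actuator amplitudes: one sinusoidal burst,
    shaped by the library parameters, at each landing timing."""
    track = np.zeros(int(total_s * rate))
    n = int(params["duration_ms"] * rate / 1000)
    t = np.arange(n) / rate
    burst = params["amplitude"] * np.sin(2 * np.pi * params["freq_hz"] * t)
    for lt in landing_times:
        start = int(lt * rate)
        end = min(start + n, len(track))
        track[start:end] += burst[:end - start]
    return np.clip(track, -1.0, 1.0)
```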
When this multimedia data is reproduced, the vibration unit 109 generates vibration in time with the movement of the main subject T displayed on the display unit 110. That is, the electronic device 1 vibrates at each timing at which a foot of the main subject T lands on the ground on the display unit 110.
Thus, according to this embodiment, the generation unit 106 generates the vibration data directly from the analysis result of the image analysis unit 105, so vibration that matches the movement of the subject even more closely can be produced.
[Third Embodiment]
Next, a third embodiment of the present invention will be described.
FIG. 5 is a block diagram showing the configuration of an electronic device 2 according to this embodiment.
The electronic device 2 according to this embodiment includes an audio extraction unit 211 in addition to the components of the electronic device 1 shown in FIG. 1.
Based on the result of the image analysis unit 205's analysis of the image data of the moving image, the audio extraction unit 211 extracts the audio corresponding to the movement of the main subject T from the audio data and outputs information about the extracted audio (for example, its time position in the audio data) to the generation unit 206.
The generation unit 206 newly generates audio data by raising the volume of the audio extracted by the audio extraction unit 211 to at least a preset level. Next, the generation unit 206 generates vibration data according to the generated audio data. Specifically, the generation unit 206 generates the vibration data so that vibration is produced at the time positions where the volume of the audio data exceeds the predetermined level. Alternatively, the audio data may be converted into vibration data by a predetermined conversion formula. The generation unit 206 then generates multimedia data by temporally synchronizing the image data of the moving image, the generated audio data, and the generated vibration data.
The other components are the same as in the first embodiment, so their description is omitted.
Next, processing for generating multimedia data from moving-image audio data will be described with reference to FIG. 6. FIG. 6 is a flowchart showing the procedure of the multimedia data generation processing according to this embodiment.
The control unit 201 reads moving-image audio data from the data storage unit 204 and instructs generation of multimedia data by outputting the read moving-image audio data to the image analysis unit 205 and the audio extraction unit 211.
First, the image analysis unit 205 extracts the main subject T in the image data included in the moving-image audio data (step S301).
Next, the image analysis unit 205 analyzes the movement of the extracted main subject T (step S302). In this example, the image analysis unit 205 determines that the main subject T is running. The image analysis unit 205 also extracts the timing (time position in the moving image) at which a foot of the main subject T lands on the ground, for example by vector analysis of the foot movement.
Next, the audio extraction unit 211 extracts the audio corresponding to the movement of the main subject T from the audio data included in the moving-image audio data (step S303). In this example, the audio extraction unit 211 extracts the footsteps of the main subject T by, for example, frequency analysis based on the temporal timings. When extracting by frequency analysis, the electronic device 2 stores data about the frequency range of footsteps in advance. That is, based on the analysis result from the image analysis unit 205, the audio extraction unit 211 extracts from the audio data, as a footstep, the sound produced when a foot of the main subject T lands on the ground.
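The publication mentions frequency analysis guided by the landing timings but does not specify it further. One plausible sketch, assuming footsteps concentrate in a stored low-frequency band (the band limits here are hypothetical), searches for the band-energy peak near each landing timing:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def extract_footstep_positions(audio, sr, landing_times,
                               band=(60.0, 300.0), window_s=0.2):
    """Locate footstep sounds: around each landing timing from the image
    analysis, find the peak of band-limited energy in the audio."""
    sos = butter(4, band, btype="bandpass", fs=sr, output="sos")
    energy = sosfilt(sos, audio) ** 2
    positions = []
    for t in landing_times:
        lo = max(int((t - window_s) * sr), 0)
        hi = min(int((t + window_s) * sr), len(audio))
        positions.append((lo + int(np.argmax(energy[lo:hi]))) / sr)
    return positions  # time positions (seconds) of extracted footsteps
```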
Next, the generation unit 206 generates audio data based on the audio extracted by the audio extraction unit 211 (step S304). Specifically, the generation unit 206 raises the volume of the audio extracted by the audio extraction unit 211 within the audio data included in the moving-image audio data. In this example, the generation unit 206 raises the volume of the sound (the footstep) at each timing at which a foot of the main subject T lands on the ground. That is, the generation unit 206 emphasizes the footsteps in the audio data included in the moving-image audio data.
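A minimal sketch of the volume raising in step S304, applying a smooth gain window around each extracted footstep position; the gain factor and window length are assumptions:

```python
import numpy as np

def emphasize_footsteps(audio, sr, footstep_times, gain=2.0, window_s=0.1):
    """Emphasize footsteps by boosting the audio around each extracted
    time position; a tapered gain envelope avoids audible clicks."""
    out = audio.astype(np.float64).copy()
    half = int(window_s * sr / 2)
    fade = np.hanning(2 * half)  # smooth envelope: 1.0 -> gain -> 1.0
    for t in footstep_times:
        c = int(t * sr)
        lo, hi = max(c - half, 0), min(c + half, len(out))
        g = 1.0 + (gain - 1.0) * fade[half - (c - lo): half + (hi - c)]
        out[lo:hi] *= g
    return np.clip(out, -1.0, 1.0)
```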
Next, the generation unit 206 generates vibration data according to the generated audio data (step S305). In this example, the generation unit 206 generates the vibration data so that vibration is produced at each timing at which an emphasized footstep is output.
Finally, the generation unit 206 generates multimedia data by temporally synchronizing the generated audio data and the generated vibration data with the image data included in the moving-image audio data (step S306).
 このマルチメディアデータが再生されると、表示部210に表示される主被写体Tの動きに合わせて音声出力部208が強調された足音を再生し、振動部209が振動を発生する。すなわち、表示部210において主被写体Tの足が地面に着地するタイミングで、強調された足音が再生されるとともに電子機器2において振動が発生する。 When the multimedia data is reproduced, the audio output unit 208 reproduces the footsteps emphasized in accordance with the movement of the main subject T displayed on the display unit 210, and the vibration unit 209 generates vibration. That is, at the timing when the foot of the main subject T lands on the ground on the display unit 210, the emphasized footsteps are reproduced and the electronic device 2 vibrates.
As described above, according to the present embodiment, the generation unit 206 emphasizes the audio corresponding to the movement of the subject, so that the movement of the subject can be expressed with emphasis through sound.
The multimedia data generation processing may also be performed by recording a program for realizing each step shown in FIGS. 3, 4, and 6 on a computer-readable recording medium, and having a computer system read and execute the program recorded on the recording medium. The "computer system" here may include an OS and hardware such as peripheral devices.
The "computer-readable recording medium" refers to a writable non-volatile memory such as a floppy (registered trademark) disk, a magneto-optical disk, an SD card, or a flash memory, a portable medium such as a CD-ROM, or a storage device such as a hard disk built into the computer system.
The "computer-readable recording medium" further includes a medium that holds the program for a certain period of time, such as a volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system serving as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
The program may also be transmitted from a computer system that stores the program in a storage device or the like to another computer system via a transmission medium, or by transmission waves in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line (communication circuit) like a telephone line.
The program may realize only some of the functions described above. Furthermore, it may be a so-called difference file (difference program) that realizes the functions described above in combination with a program already recorded in the computer system.
Although the embodiments of this invention have been described in detail above with reference to the drawings, the specific configuration is not limited to the above, and various design changes and the like can be made without departing from the gist of this invention.
For example, in the present embodiment a moving image of a running person has been described as an example, but the invention is not limited to this: audio data or vibration data may be generated for a moving image whose main subject is not a person, such as by adding audio data of a flapping sound to a moving image of a bird flapping its wings. Audio data or vibration data may also be generated for a moving image such as an animation.
Further, in the present embodiment audio data or vibration data is generated for a moving image, but audio data or vibration data may be generated for a still image. In this case, the image analysis unit 105 (or 205) analyzes the movement of the subject from the still image by pattern matching or the like. For example, in the case of a still image of an athletic meet, audio corresponding to the athletic meet (running sounds, cheers, a march, and so on) is combined with the still image.
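For the still-image variation, pattern matching followed by a library lookup could proceed as in the following sketch; the scene labels, template set, and audio file paths are all hypothetical, since the text specifies only "pattern matching or the like".

```python
# Illustrative sketch only: scene labels, templates, and file paths are
# hypothetical; the text specifies only "pattern matching or the like".
import cv2

AUDIO_LIBRARY = {                 # assumed contents of library 107/207
    "athletic_meet": "sounds/running_cheers_march.wav",
    "bird": "sounds/flapping.wav",
}

def classify_still(image, templates):
    """Return the label of the template that matches the still image best."""
    best_label, best_score = None, -1.0
    for label, template in templates.items():
        result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
        score = float(result.max())
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def audio_for_still(image, templates):
    """Pick the audio to combine with the still image (e.g. an athletic
    meet yields running sounds, cheers, or a march)."""
    return AUDIO_LIBRARY.get(classify_still(image, templates))
```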
1, 2…electronic device; 101, 201…control unit; 102, 202…imaging unit; 103, 203…microphone; 104, 204…data storage unit; 105, 205…image analysis unit; 106, 206…generation unit; 107, 207…library; 108, 208…audio output unit; 109, 209…vibration unit; 110, 210…display unit; 211…audio extraction unit

Claims (14)

  1.  An electronic device comprising:
     an analysis unit that analyzes a movement of a subject in a moving image; and
     a generation unit that generates audio data according to the movement of the subject analyzed by the analysis unit, and generates vibration data according to the generated audio data.
  2.  The electronic device according to claim 1, further comprising a storage unit that stores audio element data corresponding to each movement of a subject,
     wherein the generation unit reads, from the storage unit, the audio element data corresponding to the movement of the subject analyzed by the analysis unit, and generates the audio data based on the read audio element data.
  3.  The electronic device according to claim 1, further comprising an extraction unit that extracts, from audio data temporally synchronized with the moving image, audio corresponding to the movement of the subject analyzed by the analysis unit,
     wherein the generation unit emphasizes, in the audio data, the audio extracted by the extraction unit.
  4.  An electronic device comprising:
     an analysis unit that analyzes a movement of a subject in a moving image; and
     a generation unit that generates vibration data according to the movement of the subject analyzed by the analysis unit.
  5.  The electronic device according to claim 4, wherein the generation unit generates audio data according to the movement of the subject analyzed by the analysis unit.
  6.  The electronic device according to claim 5, further comprising a first storage unit that stores audio element data corresponding to each movement of a subject,
     wherein the generation unit reads, from the first storage unit, the audio element data corresponding to the movement of the subject analyzed by the analysis unit, and generates the audio data based on the read audio element data.
  7.  The electronic device according to claim 4 or 5, further comprising an extraction unit that extracts, from audio data temporally synchronized with the moving image, audio corresponding to the movement of the subject analyzed by the analysis unit,
     wherein the generation unit emphasizes, in the audio data, the audio extracted by the extraction unit.
  8.  The electronic device according to any one of claims 4 to 7, further comprising a second storage unit that stores information on vibration corresponding to each movement of a subject,
     wherein the generation unit reads, from the second storage unit, the information on vibration corresponding to the movement of the subject analyzed by the analysis unit, and generates the vibration data based on the read information on vibration.
  9.  The electronic device according to any one of claims 1 to 8, wherein the generation unit generates multimedia data in which audio data and vibration data are temporally synchronized with the moving image.
  10.  The electronic device according to claim 9, further comprising:
     a display unit that displays an image;
     an audio output unit that outputs audio;
     a vibration unit that generates vibration; and
     a reproduction unit that reproduces the data by displaying the moving image included in the multimedia data on the display unit, causing the audio output unit to output the audio of the audio data included in the multimedia data, and causing the vibration unit to generate vibration based on the vibration data included in the multimedia data.
  11.  The electronic device according to any one of claims 1 to 10, further comprising:
     an imaging unit that images a subject; and
     a control unit that, while the imaging unit is capturing a moving image, controls the image analysis unit and the generation unit to generate data corresponding to the moving image.
  12.  The electronic device according to any one of claims 1 to 11, wherein the analysis unit analyzes the movement of the subject based on a motion vector of the subject.
  13.  A data generation method comprising:
     analyzing, by an electronic device, a movement of a subject in a moving image; and
     generating, by the electronic device, audio data according to the analyzed movement of the subject and vibration data according to the generated audio data.
  14.  A data generation method comprising:
     analyzing, by an electronic device, a movement of a subject in a moving image; and
     generating, by the electronic device, audio data according to the analyzed movement of the subject and vibration data according to the movement of the subject analyzed by the analysis unit.
PCT/JP2012/067757 2011-07-14 2012-07-11 Electronic device and data generation method WO2013008869A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011-155586 2011-07-14
JP2011155586 2011-07-14

Publications (1)

Publication Number Publication Date
WO2013008869A1 (en)

Family

ID=47506148

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/067757 WO2013008869A1 (en) 2011-07-14 2012-07-11 Electronic device and data generation method

Country Status (2)

Country Link
JP (1) JPWO2013008869A1 (en)
WO (1) WO2013008869A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09219858A (en) * 1996-02-13 1997-08-19 Matsushita Electric Ind Co Ltd Video/voice encoder and video/voice decoder
JP2004261272A (en) * 2003-02-28 2004-09-24 Oki Electric Ind Co Ltd Cenesthetic device, motion signal generation method and program
JP2007006313A (en) * 2005-06-27 2007-01-11 Megachips Lsi Solutions Inc Moving picture imaging apparatus and file storage method
JP2010278997A (en) * 2009-06-01 2010-12-09 Sharp Corp Image processing device, image processing method, and program

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014239430A (en) * 2013-05-24 2014-12-18 Immersion Corporation Method and system for coding and streaming tactile sense data
US10085069B2 (en) 2013-05-24 2018-09-25 Immersion Corporation Method and system for haptic data encoding and streaming using a multiplexed data stream
US10542325B2 (en) 2013-05-24 2020-01-21 Immersion Corporation Method and system for haptic data encoding and streaming using a multiplexed data stream
JP2016119071A (en) * 2014-12-19 2016-06-30 Immersion Corporation System and method for recording haptic data for use with multi-media data
US10650859B2 (en) 2014-12-19 2020-05-12 Immersion Corporation Systems and methods for recording haptic data for use with multi-media data
JP7344096B2 2019-11-19 2023-09-13 Japan Broadcasting Corporation (NHK) Haptic metadata generation device, video-tactile interlocking system, and program
JP7488704B2 2020-06-18 2024-05-22 Japan Broadcasting Corporation (NHK) Haptic metadata generating device, video-haptic linking system, and program

Also Published As

Publication number Publication date
JPWO2013008869A1 (en) 2015-02-23

Similar Documents

Publication Publication Date Title
US9261960B2 (en) Haptic sensation recording and playback
JP2019525571A5 (en)
WO2013024704A1 (en) Image-processing device, method, and program
JP2011239141A (en) Information processing method, information processor, scenery metadata extraction device, lack complementary information generating device and program
CN111445901A (en) Audio data acquisition method and device, electronic equipment and storage medium
WO2013008869A1 (en) Electronic device and data generation method
CN110312162A (en) Selected stage treatment method, device, electronic equipment and readable medium
JP4725918B2 (en) Program image distribution system, program image distribution method, and program
KR20220106848A (en) Video special effects processing methods and devices
JP6073145B2 (en) SINGING VOICE DATA GENERATION DEVICE AND SINGING MOVIE DATA GENERATION DEVICE
JP4318182B2 (en) Terminal device and computer program applied to the terminal device
JP2023534975A (en) Music playback method, device, device and storage medium based on user interaction
CN107087208B (en) Panoramic video playing method, system and storage device
JP2018019393A (en) Reproduction control system, information processing apparatus, and program
WO2017061278A1 (en) Signal processing device, signal processing method, and computer program
JP2009260718A (en) Image reproduction system and image reproduction processing program
JP5310682B2 (en) Karaoke equipment
JP2010200079A (en) Photography control device
CN114760574A (en) Audio playing method and laser projection equipment
JP2013054334A (en) Electronic device
CN111696566A (en) Voice processing method, apparatus and medium
JP2013183280A (en) Information processing device, imaging device, and program
TWI581626B (en) System and method for processing media files automatically
WO2023084933A1 (en) Information processing device, information processing method, and program
KR101562901B1 (en) System and method for supporting conversation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12811472

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2013523971

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12811472

Country of ref document: EP

Kind code of ref document: A1