WO2013008869A1 - Electronic device and data generation method - Google Patents

Electronic device and data generation method

Info

Publication number
WO2013008869A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
unit
subject
audio
vibration
Prior art date
Application number
PCT/JP2012/067757
Other languages
French (fr)
Japanese (ja)
Inventor
八木 健
Original Assignee
株式会社ニコン (Nikon Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社ニコン (Nikon Corporation)
Publication of WO2013008869A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04 Time compression or expansion
    • G10L21/055 Time compression or expansion for synchronising with other signals, e.g. video signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2250/00 Details of telephonic subscriber devices
    • H04M2250/52 Details of telephonic subscriber devices including functional features of a camera
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/765 Interface circuits between an apparatus for recording and another apparatus
    • H04N5/77 Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera
    • H04N5/772 Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera, the recording apparatus and the television camera being placed in the same enclosure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00 Details of colour television systems
    • H04N9/79 Processing of colour television signals in connection with recording
    • H04N9/80 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback
    • H04N9/82 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback, the individual colour picture signal components being recorded simultaneously only
    • H04N9/8205 Transformation of the television signal for recording, e.g. modulation, frequency changing; Inverse transformation for playback, the individual colour picture signal components being recorded simultaneously only, involving the multiplexing of an additional signal and the colour video signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

An electronic device (1) is provided with an image analysis unit (105) for analyzing the motion of a subject in a moving image, and a generation unit (106) for generating audio data and vibration data that correspond to the motion of the subject analyzed by the image analysis unit (105), and generating data by temporally synchronizing the generated audio data and vibration data with image data relating to the moving image.

Description

Electronic device and data generation method
The present invention relates to an electronic device and a data generation method.
This application claims priority based on Japanese Patent Application No. 2011-155586, filed on July 14, 2011, the contents of which are incorporated herein.
A technique for generating an audio signal corresponding to video is known (see, for example, Patent Document 1). In the technique described in Patent Document 1, color information of a predetermined detection area in the video is detected, and an audio signal is generated using a predetermined timbre corresponding to that color information.
Patent Document 1: JP 2001-92478 A
However, the technique described in Patent Document 1 generates only an audio signal, using a timbre that corresponds to color information in the video; the movement of the subject in the video is not taken into consideration.
An object of an aspect of the present invention is to provide an electronic device and a data generation method capable of generating audio data and vibration data according to the movement of a subject in a moving image.
One aspect of the present invention is an electronic device comprising: an analysis unit that analyzes the movement of a subject in a moving image; and a generation unit that generates audio data according to the movement of the subject analyzed by the analysis unit, and generates vibration data according to the generated audio data.
Another aspect of the present invention is an electronic device comprising: an analysis unit that analyzes the movement of a subject in a moving image; and a generation unit that generates audio data according to the movement of the subject analyzed by the analysis unit, and generates vibration data according to the movement of the subject analyzed by the analysis unit.
According to the aspects of the present invention, audio data and vibration data corresponding to the movement of a subject in a moving image can be generated.
FIG. 1 is a block diagram showing the configuration of an electronic device according to a first embodiment.
FIG. 2 is a diagram for explaining the multimedia data generation method according to the first embodiment.
FIG. 3 is a flowchart showing the procedure of multimedia data generation processing according to the first embodiment.
FIG. 4 is a flowchart showing the procedure of multimedia data generation processing according to a second embodiment.
FIG. 5 is a block diagram showing the configuration of an electronic device according to a third embodiment.
FIG. 6 is a flowchart showing the procedure of multimedia data generation processing according to the third embodiment.
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[First Embodiment]
FIG. 1 is a block diagram showing the configuration of an electronic device 1 according to the first embodiment of the present invention.
The electronic device 1 is a portable information terminal such as a mobile phone, a smartphone, or a digital camera. The electronic device 1 includes a control unit 101, an imaging unit 102, a microphone 103, a data storage unit 104, an image analysis unit 105, a generation unit 106, a library (storage unit) 107, an audio output unit 108, a vibration unit 109, and a display unit 110.
The imaging unit 102 images a subject and generates image data. For example, the imaging unit 102 outputs image data of a captured still image in response to a still-image shooting operation, and outputs image data of a moving image captured continuously at a predetermined interval in response to a moving-image shooting operation. Still-image data and moving-image data captured by the imaging unit 102 are recorded in the data storage unit 104 under the control of the control unit 101. In a shooting standby state in which no shooting operation is performed, the imaging unit 102 outputs the image data obtained continuously at a predetermined interval as through-image data (a live preview), which is displayed on the display unit 110 under the control of the control unit 101.
The microphone 103 collects sound and generates audio data corresponding to the collected sound.
The data storage unit 104 stores image data of moving images, moving-image audio data, multimedia data, and the like. Moving-image audio data is data including image data of a moving image and audio data temporally synchronized with that image data. Multimedia data is data including image data of a moving image, audio data temporally synchronized with that image data, and vibration data temporally synchronized with that image data.
The control unit 101 controls each part of the electronic device 1 in an integrated manner. For example, the control unit 101 generates moving-image audio data by temporally synchronizing the image data generated by the imaging unit 102 with the audio data collected by the microphone 103, and writes the generated moving-image audio data to the data storage unit 104. The control unit 101 also controls the image analysis unit 105 and the generation unit 106 to generate audio data and vibration data corresponding to the image data of a moving image, and generates multimedia data by temporally synchronizing the generated audio data and vibration data with that image data. Further, the control unit 101 reads multimedia data from the data storage unit 104 and controls the display unit 110, the audio output unit 108, and the vibration unit 109 to reproduce the read multimedia data.
The image analysis unit 105 analyzes the movement of the subject in the image data of a moving image and outputs the analyzed movement of the subject to the generation unit 106.
The library 107 is a storage unit that stores audio element data corresponding to each movement of the subject.
The generation unit 106 generates audio data corresponding to the movement of the subject analyzed by the image analysis unit 105. Specifically, the generation unit 106 reads the audio element data corresponding to the movement of the subject from the library 107 and generates audio data based on the read audio element data. The generation unit 106 also generates vibration data corresponding to the generated audio data. Specifically, the generation unit 106 converts the audio data into vibration data using a predetermined conversion formula. For example, the generation unit 106 generates the vibration data so that vibration is produced wherever the audio amplitude exceeds a predetermined value on the time axis of the audio data. The generation unit 106 then generates multimedia data by temporally synchronizing the image data of the moving image, the generated audio data, and the generated vibration data, and writes the generated multimedia data to the data storage unit 104.
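The publication does not state the conversion formula itself; the following is a minimal Python sketch of the thresholding rule described above, in which a vibration pulse is emitted wherever the normalized audio amplitude exceeds a set value. The function name, threshold, and pulse length are illustrative assumptions.

```python
import numpy as np

def audio_to_vibration(samples, sample_rate, threshold=0.5, pulse_ms=50):
    """Derive an on/off vibration track from the audio amplitude envelope."""
    peak = np.max(np.abs(samples))
    envelope = np.abs(samples) / peak if peak > 0 else np.abs(samples)
    vibration = np.zeros_like(envelope)
    pulse = int(sample_rate * pulse_ms / 1000)
    for i in np.flatnonzero(envelope > threshold):
        vibration[i:i + pulse] = 1.0  # drive the vibration device for one pulse
    return vibration
```

The per-sample loop is deliberately naive; the point is only the threshold-on-the-time-axis behavior the text describes, not an efficient implementation.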
The display unit 110 is a display such as a liquid crystal display, and displays image data.
The audio output unit 108 outputs audio corresponding to audio data. The audio output unit 108 includes a codec that converts digital audio data to analog, and a speaker that outputs the converted analog audio signal.
The vibration unit 109 generates vibration according to vibration data. The vibration unit 109 includes a vibration signal generation unit that drives a vibration device based on the vibration data, and a vibration device that generates the vibration, such as a linear vibration actuator.
Next, processing for generating multimedia data from the image data of a moving image will be described with reference to FIGS. 2 and 3. FIG. 2 is a diagram for explaining the multimedia data generation method according to this embodiment. FIG. 3 is a flowchart showing the procedure of the multimedia data generation processing according to this embodiment.
FIG. 2 shows a moving image in which a main subject (person) T is running. Hereinafter, this moving image of the running main subject T will be used as an example.
The control unit 101 reads the image data of the moving image from the data storage unit 104 and instructs generation of multimedia data by outputting the read image data to the image analysis unit 105.
First, the image analysis unit 105 extracts the main subject T in the image data of the moving image (step S101). For example, the image analysis unit 105 extracts a person by pattern matching and sets the extracted person as the main subject T. When a plurality of persons are extracted, the image analysis unit 105 sets the person closest to the center of the image data as the main subject T, or sets a specific person as the main subject T by face recognition. When the main subject T is determined by face recognition, the electronic device 1 stores face data of the person to be treated as the main subject T in advance. Alternatively, the image analysis unit 105 may set as the main subject T not only a person but also an object located near the center of the frame, or an object that appears frequently in the moving image.
Next, the image analysis unit 105 analyzes the movement of the extracted main subject T (step S102). Specifically, the image analysis unit 105 determines the movement of the main subject by pattern-matching the movement of the main subject T in the moving image against movement patterns stored in advance (for example, running, jumping, and so on). In this example, the image analysis unit 105 determines that the main subject T is running. The image analysis unit 105 also extracts the timing (time position in the moving image) at which a foot of the main subject T lands on the ground, for example by vector analysis of the foot movement. When extracting by vector analysis, the image analysis unit 105 treats the timing at which the direction of the motion vector changes by more than a predetermined value as the timing at which the foot lands on the ground.
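The publication names vector analysis with a direction-change threshold but gives no further algorithmic detail. A minimal sketch of that idea, assuming the motion tracker supplies a per-frame (dx, dy) motion vector for the tracked foot region (a hypothetical input format):

```python
import numpy as np

def landing_frames(foot_vectors, angle_threshold_deg=60.0):
    """Return frame indices where the foot's motion vector direction
    changes by more than a threshold, taken as landing timings."""
    landings = []
    for i in range(1, len(foot_vectors)):
        a = np.asarray(foot_vectors[i - 1], dtype=float)
        b = np.asarray(foot_vectors[i], dtype=float)
        if np.linalg.norm(a) == 0 or np.linalg.norm(b) == 0:
            continue  # no motion to compare in one of the frames
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
        if angle > angle_threshold_deg:
            landings.append(i)
    return landings
```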
Next, the generation unit 106 generates audio data corresponding to the movement of the main subject T (step S103). Specifically, the generation unit 106 first reads the audio element data corresponding to the movement of the main subject T from the library 107 and generates audio data based on the read audio element data. In this example, the generation unit 106 reads a running footstep sound ("tap") from the library 107, and then generates the audio data so that the footstep "tap" sounds at each timing at which a foot of the main subject T lands on the ground. As a result, audio data of a "tap-tap-tap-tap" sound matching the running movement of the main subject T in the moving image is generated.
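As a rough illustration of step S103, the sketch below mixes a stored footstep sample into a silent track at each landing timing; the function signature and sample rate are assumptions, not from the publication.

```python
import numpy as np

def place_sound_elements(footstep, landing_times, duration_s, sample_rate=44100):
    """Build an audio track by overlaying a footstep sample at each landing time.

    footstep: mono sample array (the audio element read from the library).
    landing_times: landing timings in seconds from the image analysis.
    """
    track = np.zeros(int(duration_s * sample_rate), dtype=np.float32)
    for t in landing_times:
        start = int(t * sample_rate)
        end = min(start + len(footstep), len(track))
        track[start:end] += footstep[:end - start]  # overlay the element
    return np.clip(track, -1.0, 1.0)  # keep the mix within full scale
```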
Next, the generation unit 106 generates vibration data corresponding to the generated audio data (step S104). In this example, the generation unit 106 generates the vibration data so that vibration is produced at each timing at which the footstep "tap" is output.
Finally, the generation unit 106 generates multimedia data by temporally synchronizing the image data of the moving image, the generated audio data, and the generated vibration data (step S105).
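The publication characterizes multimedia data only as image data, audio data, and vibration data sharing a common time base. A minimal container sketch under that reading; all field names are hypothetical:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MultimediaData:
    """Three tracks sharing one time base (field names are hypothetical)."""
    video_frames: np.ndarray   # (num_frames, H, W, 3), at frame_rate fps
    frame_rate: float
    audio: np.ndarray          # mono samples at sample_rate Hz
    sample_rate: int
    vibration: np.ndarray      # actuator amplitude, same length/rate as audio

    def duration_s(self) -> float:
        return len(self.video_frames) / self.frame_rate
```

One convenient choice here is keeping the vibration track at the audio sample rate, which makes the audio-to-vibration mapping of step S104 a one-to-one array operation.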
When reproducing the generated multimedia data, the control unit 101 displays the image data of the moving image on the display unit 110, outputs the audio data to the audio output unit 108, and outputs the vibration data to the vibration unit 109. As a result, when this multimedia data is reproduced, the audio output unit 108 plays the "tap-tap-tap-tap" footsteps in time with the movement of the main subject T displayed on the display unit 110, and the vibration unit 109 generates vibration. That is, at each timing at which a foot of the main subject T lands on the ground on the display unit 110, the footstep "tap" is reproduced and the electronic device 1 vibrates.
In the example described above, the multimedia data is generated from image data of a moving image already stored in the data storage unit 104, but it may instead be generated from the image data of a moving image that the imaging unit 102 is currently shooting. In this case, the control unit 101 sequentially outputs the image data of the moving image being captured by the imaging unit 102 to the image analysis unit 105, and then, in response to the operation ending the shooting, controls the generation unit 106 to generate the multimedia data for that moving image. In this way, the user can obtain multimedia data with vibration added simply by performing a moving-image shooting operation.
Also, in this embodiment the generation unit 106 converts the audio data into vibration data by a predetermined conversion formula, but information about the vibration corresponding to each piece of audio element data (for example, frequency, vibration amplitude, or vibration duration) may instead be stored in the library 107 in advance; the information about the vibration corresponding to the audio data (audio element data) is then read from the library 107, and the vibration data is generated based on the read information.
As described above, according to this embodiment, the generation unit 106 generates audio data and vibration data corresponding to the movement of the subject analyzed by the image analysis unit 105, and generates multimedia data by temporally synchronizing the audio data and vibration data with the image data of the moving image. This makes it possible to generate audio data and vibration data that match the movement of the subject in the moving image. Moreover, when the multimedia data is reproduced, sound is played in time with the moving image and vibration is generated in the electronic device 1, so viewing the multimedia data becomes more enjoyable. For example, in the case of multimedia data in which vibration data has been combined with treasured moving-image audio data, the threefold expression of video, sound, and vibration can help the viewer recall the memories associated with the video more vividly than video and sound alone.
[Second Embodiment]
Next, a second embodiment of the present invention will be described.
The library 107 according to this embodiment stores data about the vibration corresponding to each movement of the subject. The data about the vibration includes, for example, frequency, amplitude, and vibration duration. The generation unit 106 generates vibration data according to the movement of the subject. Specifically, the generation unit 106 reads the data about the vibration corresponding to the movement of the subject from the library 107 and generates the vibration data based on the read data.
The other components are the same as in the first embodiment, so their description is omitted.
Next, processing for generating multimedia data from moving-image audio data will be described with reference to FIG. 4. FIG. 4 is a flowchart showing the procedure of the multimedia data generation processing according to this embodiment.
The control unit 101 reads moving-image audio data from the data storage unit 104 and instructs generation of multimedia data by outputting the read moving-image audio data to the image analysis unit 105.
First, the image analysis unit 105 extracts the main subject T in the image data included in the moving-image audio data (step S201).
Next, the image analysis unit 105 analyzes the movement of the extracted main subject T (step S202). In this example, the image analysis unit 105 determines that the main subject T is running. The image analysis unit 105 also extracts the timing (time position in the moving image) at which a foot of the main subject T lands on the ground, for example by vector analysis of the foot movement.
Next, the generation unit 106 generates vibration data according to the movement of the main subject T (step S203). Specifically, the generation unit 106 reads the data about the vibration corresponding to the movement of the main subject T and generates the vibration data based on the read data. In this example, the generation unit 106 generates the vibration data so that vibration following the vibration-related data (for example, frequency, amplitude, and vibration duration) is produced at each timing at which a foot of the main subject T lands on the ground.
Alternatively, the vibration data can be generated based on the data obtained by the vector analysis. In that case, the vibration data might, for example, be generated so that vibration is produced whenever the direction of the motion vector changes by more than a predetermined value.
Finally, the generation unit 106 generates multimedia data by temporally synchronizing the generated vibration data with the moving-image audio data (step S204).
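As an illustration of step S203, the sketch below renders one short vibration burst per landing timing, shaped by library parameters such as frequency, amplitude, and duration. The parameter values and the sinusoidal burst shape are assumptions; the publication only names the parameter kinds.

```python
import numpy as np

# Hypothetical library entry for the "running" movement pattern.
RUNNING_VIBRATION = {"freq_hz": 160.0, "amplitude": 0.8, "duration_ms": 60}

def vibration_track(landing_times, total_s, params, rate=1000):
    """Render per-millisecond actuator amplitudes: one sinusoidal burst,
    shaped by the library parameters, at each landing timing."""
    track = np.zeros(int(total_s * rate))
    n = int(params["duration_ms"] * rate / 1000)
    t = np.arange(n) / rate
    burst = params["amplitude"] * np.sin(2 * np.pi * params["freq_hz"] * t)
    for lt in landing_times:
        start = int(lt * rate)
        end = min(start + n, len(track))
        track[start:end] += burst[:end - start]
    return np.clip(track, -1.0, 1.0)
```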
When this multimedia data is reproduced, the vibration unit 109 generates vibration in time with the movement of the main subject T displayed on the display unit 110. That is, the electronic device 1 vibrates at each timing at which a foot of the main subject T lands on the ground on the display unit 110.
Thus, according to this embodiment, the generation unit 106 generates the vibration data directly from the analysis result of the image analysis unit 105, so vibration that matches the movement of the subject even more closely can be produced.
[Third Embodiment]
Next, a third embodiment of the present invention will be described.
FIG. 5 is a block diagram showing the configuration of an electronic device 2 according to this embodiment.
The electronic device 2 according to this embodiment includes an audio extraction unit 211 in addition to the components of the electronic device 1 shown in FIG. 1.
Based on the result of the image analysis unit 205's analysis of the image data of the moving image, the audio extraction unit 211 extracts the audio corresponding to the movement of the main subject T from the audio data and outputs information about the extracted audio (for example, its time position in the audio data) to the generation unit 206.
The generation unit 206 newly generates audio data by raising the volume of the audio extracted by the audio extraction unit 211 to at least a preset level. Next, the generation unit 206 generates vibration data according to the generated audio data. Specifically, the generation unit 206 generates the vibration data so that vibration is produced at the time positions where the volume of the audio data exceeds the predetermined level. Alternatively, the audio data may be converted into vibration data by a predetermined conversion formula. The generation unit 206 then generates multimedia data by temporally synchronizing the image data of the moving image, the generated audio data, and the generated vibration data.
The other components are the same as in the first embodiment, so their description is omitted.
Next, processing for generating multimedia data from moving-image audio data will be described with reference to FIG. 6. FIG. 6 is a flowchart showing the procedure of the multimedia data generation processing according to this embodiment.
The control unit 201 reads moving-image audio data from the data storage unit 204 and instructs generation of multimedia data by outputting the read moving-image audio data to the image analysis unit 205 and the audio extraction unit 211.
First, the image analysis unit 205 extracts the main subject T in the image data included in the moving-image audio data (step S301).
Next, the image analysis unit 205 analyzes the movement of the extracted main subject T (step S302). In this example, the image analysis unit 205 determines that the main subject T is running. The image analysis unit 205 also extracts the timing (time position in the moving image) at which a foot of the main subject T lands on the ground, for example by vector analysis of the foot movement.
Next, the audio extraction unit 211 extracts the audio corresponding to the movement of the main subject T from the audio data included in the moving-image audio data (step S303). In this example, the audio extraction unit 211 extracts the footsteps of the main subject T by, for example, frequency analysis based on the temporal timings. When extracting by frequency analysis, the electronic device 2 stores data about the frequency range of footsteps in advance. That is, based on the analysis result from the image analysis unit 205, the audio extraction unit 211 extracts from the audio data, as a footstep, the sound produced when a foot of the main subject T lands on the ground.
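The publication mentions frequency analysis guided by the landing timings but does not specify it further. One plausible sketch, assuming footsteps concentrate in a stored low-frequency band (the band limits here are hypothetical), searches for the band-energy peak near each landing timing:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def extract_footstep_positions(audio, sr, landing_times,
                               band=(60.0, 300.0), window_s=0.2):
    """Locate footstep sounds: around each landing timing from the image
    analysis, find the peak of band-limited energy in the audio."""
    sos = butter(4, band, btype="bandpass", fs=sr, output="sos")
    energy = sosfilt(sos, audio) ** 2
    positions = []
    for t in landing_times:
        lo = max(int((t - window_s) * sr), 0)
        hi = min(int((t + window_s) * sr), len(audio))
        positions.append((lo + int(np.argmax(energy[lo:hi]))) / sr)
    return positions  # time positions (seconds) of extracted footsteps
```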
Next, the generation unit 206 generates audio data based on the audio extracted by the audio extraction unit 211 (step S304). Specifically, the generation unit 206 raises the volume of the audio extracted by the audio extraction unit 211 within the audio data included in the moving-image audio data. In this example, the generation unit 206 raises the volume of the sound (the footstep) at each timing at which a foot of the main subject T lands on the ground. That is, the generation unit 206 emphasizes the footsteps in the audio data included in the moving-image audio data.
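A minimal sketch of the volume raising in step S304, applying a smooth gain window around each extracted footstep position; the gain factor and window length are assumptions:

```python
import numpy as np

def emphasize_footsteps(audio, sr, footstep_times, gain=2.0, window_s=0.1):
    """Emphasize footsteps by boosting the audio around each extracted
    time position; a tapered gain envelope avoids audible clicks."""
    out = audio.astype(np.float64).copy()
    half = int(window_s * sr / 2)
    fade = np.hanning(2 * half)  # smooth envelope: 1.0 -> gain -> 1.0
    for t in footstep_times:
        c = int(t * sr)
        lo, hi = max(c - half, 0), min(c + half, len(out))
        g = 1.0 + (gain - 1.0) * fade[half - (c - lo): half + (hi - c)]
        out[lo:hi] *= g
    return np.clip(out, -1.0, 1.0)
```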
Next, the generation unit 206 generates vibration data according to the generated audio data (step S305). In this example, the generation unit 206 generates the vibration data so that vibration is produced at each timing at which an emphasized footstep is output.
Finally, the generation unit 206 generates multimedia data by temporally synchronizing the generated audio data and the generated vibration data with the image data included in the moving-image audio data (step S306).
 このマルチメディアデータが再生されると、表示部210に表示される主被写体Tの動きに合わせて音声出力部208が強調された足音を再生し、振動部209が振動を発生する。すなわち、表示部210において主被写体Tの足が地面に着地するタイミングで、強調された足音が再生されるとともに電子機器2において振動が発生する。 When the multimedia data is reproduced, the audio output unit 208 reproduces the footsteps emphasized in accordance with the movement of the main subject T displayed on the display unit 210, and the vibration unit 209 generates vibration. That is, at the timing when the foot of the main subject T lands on the ground on the display unit 210, the emphasized footsteps are reproduced and the electronic device 2 vibrates.
As described above, according to the present embodiment, the generation unit 206 emphasizes the audio corresponding to the movement of the subject, so that the movement of the subject can be expressed with emphasis through sound.
The multimedia data generation processing may also be performed by recording a program for realizing each step shown in FIGS. 3, 4, and 6 on a computer-readable recording medium, and having a computer system read and execute the program recorded on the recording medium. The "computer system" here may include an OS and hardware such as peripheral devices.
The "computer-readable recording medium" refers to a writable non-volatile memory such as a floppy (registered trademark) disk, a magneto-optical disk, an SD card, or a flash memory, a portable medium such as a CD-ROM, or a storage device such as a hard disk built into the computer system.
The "computer-readable recording medium" further includes a medium that holds the program for a certain period of time, such as a volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system serving as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
The program may also be transmitted from a computer system that stores the program in a storage device or the like to another computer system via a transmission medium, or by transmission waves in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line (communication circuit) like a telephone line.
The program may realize only some of the functions described above. Furthermore, it may be a so-called difference file (difference program) that realizes the functions described above in combination with a program already recorded in the computer system.
Although the embodiments of this invention have been described in detail above with reference to the drawings, the specific configuration is not limited to the above, and various design changes and the like can be made without departing from the gist of this invention.
For example, in the present embodiment a moving image of a running person has been described as an example, but the invention is not limited to this: audio data or vibration data may be generated for a moving image whose main subject is not a person, such as by adding audio data of a flapping sound to a moving image of a bird flapping its wings. Audio data or vibration data may also be generated for a moving image such as an animation.
Further, in the present embodiment audio data or vibration data is generated for a moving image, but audio data or vibration data may be generated for a still image. In this case, the image analysis unit 105 (or 205) analyzes the movement of the subject from the still image by pattern matching or the like. For example, in the case of a still image of an athletic meet, audio corresponding to the athletic meet (running sounds, cheers, a march, and so on) is combined with the still image.
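For the still-image variation, pattern matching followed by a library lookup could proceed as in the following sketch; the scene labels, template set, and audio file paths are all hypothetical, since the text specifies only "pattern matching or the like".

```python
# Illustrative sketch only: scene labels, templates, and file paths are
# hypothetical; the text specifies only "pattern matching or the like".
import cv2

AUDIO_LIBRARY = {                 # assumed contents of library 107/207
    "athletic_meet": "sounds/running_cheers_march.wav",
    "bird": "sounds/flapping.wav",
}

def classify_still(image, templates):
    """Return the label of the template that matches the still image best."""
    best_label, best_score = None, -1.0
    for label, template in templates.items():
        result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
        score = float(result.max())
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def audio_for_still(image, templates):
    """Pick the audio to combine with the still image (e.g. an athletic
    meet yields running sounds, cheers, or a march)."""
    return AUDIO_LIBRARY.get(classify_still(image, templates))
```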
1, 2…electronic device; 101, 201…control unit; 102, 202…imaging unit; 103, 203…microphone; 104, 204…data storage unit; 105, 205…image analysis unit; 106, 206…generation unit; 107, 207…library; 108, 208…audio output unit; 109, 209…vibration unit; 110, 210…display unit; 211…audio extraction unit

Claims (14)

  1.  An electronic device comprising:
     an analysis unit that analyzes a movement of a subject in a moving image; and
     a generation unit that generates audio data according to the movement of the subject analyzed by the analysis unit, and generates vibration data according to the generated audio data.
  2.  The electronic device according to claim 1, further comprising a storage unit that stores audio element data corresponding to each movement of a subject,
     wherein the generation unit reads, from the storage unit, the audio element data corresponding to the movement of the subject analyzed by the analysis unit, and generates the audio data based on the read audio element data.
  3.  The electronic device according to claim 1, further comprising an extraction unit that extracts, from audio data temporally synchronized with the moving image, audio corresponding to the movement of the subject analyzed by the analysis unit,
     wherein the generation unit emphasizes, in the audio data, the audio extracted by the extraction unit.
  4.  An electronic device comprising:
     an analysis unit that analyzes a movement of a subject in a moving image; and
     a generation unit that generates vibration data according to the movement of the subject analyzed by the analysis unit.
  5.  The electronic device according to claim 4, wherein the generation unit generates audio data according to the movement of the subject analyzed by the analysis unit.
  6.  The electronic device according to claim 5, further comprising a first storage unit that stores audio element data corresponding to each movement of a subject,
     wherein the generation unit reads, from the first storage unit, the audio element data corresponding to the movement of the subject analyzed by the analysis unit, and generates the audio data based on the read audio element data.
  7.  The electronic device according to claim 4 or 5, further comprising an extraction unit that extracts, from audio data temporally synchronized with the moving image, audio corresponding to the movement of the subject analyzed by the analysis unit,
     wherein the generation unit emphasizes, in the audio data, the audio extracted by the extraction unit.
  8.  The electronic device according to any one of claims 4 to 7, further comprising a second storage unit that stores information on vibration corresponding to each movement of a subject,
     wherein the generation unit reads, from the second storage unit, the information on vibration corresponding to the movement of the subject analyzed by the analysis unit, and generates the vibration data based on the read information on vibration.
  9.  The electronic device according to any one of claims 1 to 8, wherein the generation unit generates multimedia data in which audio data and vibration data are temporally synchronized with the moving image.
  10.  The electronic device according to claim 9, further comprising:
     a display unit that displays an image;
     an audio output unit that outputs audio;
     a vibration unit that generates vibration; and
     a reproduction unit that reproduces the data by displaying the moving image included in the multimedia data on the display unit, causing the audio output unit to output the audio of the audio data included in the multimedia data, and causing the vibration unit to generate vibration based on the vibration data included in the multimedia data.
  11.  The electronic device according to any one of claims 1 to 10, further comprising:
     an imaging unit that images a subject; and
     a control unit that, while the imaging unit is capturing a moving image, controls the image analysis unit and the generation unit to generate data corresponding to the moving image.
  12.  The electronic device according to any one of claims 1 to 11, wherein the analysis unit analyzes the movement of the subject based on a motion vector of the subject.
  13.  A data generation method comprising:
     analyzing, by an electronic device, a movement of a subject in a moving image; and
     generating, by the electronic device, audio data according to the analyzed movement of the subject and vibration data according to the generated audio data.
  14.  A data generation method comprising:
     analyzing, by an electronic device, a movement of a subject in a moving image; and
     generating, by the electronic device, audio data according to the analyzed movement of the subject and vibration data according to the movement of the subject analyzed by the analysis unit.
PCT/JP2012/067757 2011-07-14 2012-07-11 Electronic device and data generation method WO2013008869A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011-155586 2011-07-14
JP2011155586 2011-07-14

Publications (1)

Publication Number Publication Date
WO2013008869A1 (en)

Family

ID=47506148

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/067757 WO2013008869A1 (en) 2011-07-14 2012-07-11 Electronic device and data generation method

Country Status (2)

Country Link
JP (1) JPWO2013008869A1 (en)
WO (1) WO2013008869A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09219858A (en) * 1996-02-13 1997-08-19 Matsushita Electric Ind Co Ltd Video/voice encoder and video/voice decoder
JP2004261272A (en) * 2003-02-28 2004-09-24 Oki Electric Ind Co Ltd Cenesthetic device, motion signal generation method and program
JP2007006313A (en) * 2005-06-27 2007-01-11 Megachips Lsi Solutions Inc Moving picture imaging apparatus and file storage method
JP2010278997A (en) * 2009-06-01 2010-12-09 Sharp Corp Image processing device, image processing method, and program

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014239430A (en) * 2013-05-24 2014-12-18 Immersion Corporation Method and system for coding and streaming tactile sense data
US10085069B2 (en) 2013-05-24 2018-09-25 Immersion Corporation Method and system for haptic data encoding and streaming using a multiplexed data stream
US10542325B2 (en) 2013-05-24 2020-01-21 Immersion Corporation Method and system for haptic data encoding and streaming using a multiplexed data stream
JP2016119071A (en) * 2014-12-19 2016-06-30 Immersion Corporation System and method for recording haptic data for use with multi-media data
US10650859B2 (en) 2014-12-19 2020-05-12 Immersion Corporation Systems and methods for recording haptic data for use with multi-media data
JP7344096B2 2019-11-19 2023-09-13 Japan Broadcasting Corporation (NHK) Haptic metadata generation device, video-tactile interlocking system, and program
JP7488704B2 2020-06-18 2024-05-22 Japan Broadcasting Corporation (NHK) Haptic metadata generating device, video-haptic linking system, and program

Also Published As

Publication number Publication date
JPWO2013008869A1 (en) 2015-02-23

Similar Documents

Publication Publication Date Title
US9261960B2 (en) Haptic sensation recording and playback
JP2019525571A5 (en)
WO2013024704A1 (en) Image-processing device, method, and program
JP2011239141A (en) Information processing method, information processor, scenery metadata extraction device, lack complementary information generating device and program
CN111445901A (en) Audio data acquisition method and device, electronic equipment and storage medium
WO2013008869A1 (en) Electronic device and data generation method
CN110312162A (en) Selected stage treatment method, device, electronic equipment and readable medium
JP4725918B2 (en) Program image distribution system, program image distribution method, and program
KR20220106848A (en) Video special effects processing methods and devices
JP6073145B2 (en) SINGING VOICE DATA GENERATION DEVICE AND SINGING MOVIE DATA GENERATION DEVICE
JP4318182B2 (en) Terminal device and computer program applied to the terminal device
JP2023534975A (en) Music playback method, device, device and storage medium based on user interaction
CN107087208B (en) Panoramic video playing method, system and storage device
JP2018019393A (en) Reproduction control system, information processing apparatus, and program
WO2017061278A1 (en) Signal processing device, signal processing method, and computer program
JP2009260718A (en) Image reproduction system and image reproduction processing program
JP5310682B2 (en) Karaoke equipment
JP2010200079A (en) Photography control device
CN114760574A (en) Audio playing method and laser projection equipment
JP2013054334A (en) Electronic device
CN111696566A (en) Voice processing method, apparatus and medium
JP2013183280A (en) Information processing device, imaging device, and program
TWI581626B (en) System and method for processing media files automatically
WO2023084933A1 (en) Information processing device, information processing method, and program
KR101562901B1 (en) System and method for supporting conversation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12811472

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2013523971

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12811472

Country of ref document: EP

Kind code of ref document: A1