WO2014207874A1 - Electronic device, output method, and program - Google Patents

Electronic device, output method, and program

Info

Publication number
WO2014207874A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
sound
sound information
audio
audio information
Prior art date
Application number
PCT/JP2013/067716
Other languages
English (en)
Japanese (ja)
Inventor
谷内 謙一
Original Assignee
株式会社東芝 (Toshiba Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社東芝 (Toshiba Corporation)
Priority to PCT/JP2013/067716
Publication of WO2014207874A1

Links

Images

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 - Voice signal separating
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04H - BROADCAST COMMUNICATION
    • H04H20/00 - Arrangements for broadcast or for distribution combined with broadcast
    • H04H20/10 - Arrangements for replacing or switching information during the broadcast or the distribution
    • H04H20/106 - Receiver-side switching

Definitions

  • Embodiments described herein relate generally to an electronic device, an output method, and a program.
  • the electronic device of the embodiment includes a separation unit, a conversion unit, and an output unit.
  • the separation unit separates the background sound information and the first sound information from the sound information.
  • the conversion unit converts the first sound information into second sound information corresponding to the first sound information.
  • the output unit mixes and outputs the background sound information and the second sound information.
  • FIG. 1 is a block diagram showing a main signal processing system of a digital television as an example of the electronic apparatus according to the first embodiment.
  • FIG. 2 is a block diagram illustrating a configuration of a signal processing unit included in the digital television according to the first embodiment.
  • FIG. 3 is a flowchart illustrating a flow of output processing of sound information and image information by a signal processing unit included in the digital television according to the first embodiment.
  • FIG. 4 is a diagram illustrating an example of a setting screen for various information in the digital television according to the first embodiment.
  • FIG. 5 is a diagram illustrating a configuration of an information processing system having a notebook PC as an example of an electronic apparatus according to the second embodiment.
  • FIG. 6 is a sequence diagram illustrating a flow of output processing of sound information in the information processing system according to the second embodiment.
  • FIG. 7 is a diagram illustrating a hardware configuration of a PC that is an example of an electronic apparatus according to the third embodiment.
  • FIG. 8 is a block diagram illustrating a functional configuration of a PC according to the third embodiment.
  • FIG. 1 is a block diagram showing a main signal processing system of a digital television as an example of the electronic apparatus according to the first embodiment.
  • the satellite digital television broadcast signal received by the BS/CS digital broadcast receiving antenna 121 is supplied to the satellite digital broadcast tuner 202a provided in the broadcast input unit 202 via the input terminal 201.
  • the tuner 202a selects a broadcast signal of a desired channel based on a control signal from the control unit 205, and outputs the selected broadcast signal to a PSK (Phase Shift Keying) demodulator 202b.
  • the PSK demodulator 202b included in the broadcast input unit 202 demodulates the broadcast signal selected by the tuner 202a based on a control signal from the control unit 205, and obtains a transport stream (TS) including a desired program. The result is output to the TS decoder 202c.
  • a TS decoder 202c included in the broadcast input unit 202 performs TS decoding processing on the signal in which the transport stream (TS) is multiplexed, in accordance with a control signal from the control unit 205, and outputs the digital video signal and sound signal (PES: Packetized Elementary Stream) of a desired program to the STD buffer in the signal processing unit 206.
  • the TS decoder 202c outputs section information transmitted by digital broadcasting to a section processing unit (not shown) in the signal processing unit 206.
  • the terrestrial digital television broadcast signal received by the terrestrial broadcast receiving antenna 122 is supplied to the terrestrial digital broadcast tuner 204a provided in the broadcast input unit 202 via the input terminal 203.
  • the tuner 204a can select a broadcast signal of a desired channel by a control signal from the control unit 205.
  • the tuner 204a outputs the broadcast signal to an OFDM (Orthogonal Frequency Division Multiplexing) demodulator 204b.
  • the OFDM demodulator 204b included in the broadcast input unit 202 demodulates the broadcast signal selected by the tuner 204a based on a control signal from the control unit 205, obtains a transport stream including a desired program, and outputs it to the TS decoder 204c.
  • a TS decoder 204c included in the broadcast input unit 202 performs TS decoding processing on the signal in which the transport stream (TS) is multiplexed, in accordance with a control signal from the control unit 205, and outputs the digital video signal and sound signal of a desired program to the STD buffer in the signal processing unit 206.
  • the TS decoder 204c outputs section information transmitted by digital broadcasting to a section processing unit (not shown) in the signal processing unit 206.
  • the signal processing unit 206 selectively performs predetermined digital signal processing on the digital video signal and sound signal supplied from the TS decoder 202c or the TS decoder 204c when viewing television, and outputs the result to the graphic processing unit 207 and the audio output unit 208. Further, at the time of program recording, the signal processing unit 206 records the signal obtained by performing predetermined digital signal processing on the digital video signal and sound signal supplied from the TS decoder 202c or the TS decoder 204c in the round recording storage device (for example, an HDD: Hard Disk Drive) 271 or, via the control unit 205, in the external storage device 226.
  • the round recording according to the present embodiment differs from reserved recording, in which recording is performed in units of program content selected by the user: it is a method of recording all program content broadcast on a broadcast channel during a predetermined time period (which may be all day), so that the user does not miss programs on that channel.
  • the recording time zone may be different for each broadcast channel.
  • during playback of a recorded program, the signal processing unit 206 also performs predetermined digital signal processing on the recorded program data (video signal and sound signal) read from the round recording storage device 271 or the external storage device 226 via the control unit 205, and outputs the result to the graphic processing unit 207 and the audio output unit 208.
  • a section processing unit (not shown) included in the signal processing unit 206 acquires, from the section information input from the TS decoders 202c and 204c, various data for acquiring a program, electronic program guide (EPG) information, program attribute information (program genre, etc.), subtitle information, and the like (service information: SI and PSI), and outputs them to the control unit 205.
  • the tuner 202a, the PSK demodulator 202b, the TS decoder 202c, the tuner 204a, the OFDM demodulator 204b, and the TS decoder 204c shown in FIG. 1 are provided in at least the number of systems necessary for the round recording function.
  • for example, if the digital television 100 is an apparatus capable of round recording of all the terrestrial key stations in Tokyo, the digital television 100 includes seven or more tuners 204a, OFDM demodulators 204b, and TS decoders 204c.
  • the control unit 205 receives, from the signal processing unit 206, various data for acquiring a program (such as key information for B-CAS descrambling), electronic program guide (EPG) information, program attribute information (program genre, etc.), subtitle information, and the like (service information: SI and PSI). The control unit 205 generates screen information for displaying EPG information, caption information, and the like from the input information, and outputs the generated screen information to the graphic processing unit 207.
  • control unit 205 has a function of controlling program recording and program reservation recording.
  • when a program reservation is accepted, the control unit 205 generates screen information for displaying the EPG information on the display unit 214, and outputs the generated screen information to the graphic processing unit 207.
  • reservation contents are set in a predetermined storage unit by user input via the operation unit 220 or the remote controller 221. The control unit 205 then controls the tuners 202a and 204a, the PSK demodulator 202b, the OFDM demodulator 204b, the TS decoders 202c and 204c, and the signal processing unit 206 so as to record the reserved program at the set time.
  • when the digital television 100 automatically records the programs of all channels that can be recorded by the round recording function, it performs recording by controlling each device during a time period set separately from reservations.
  • the OSD (On Screen Display) signal generation unit 209 generates setting screen information (an OSD signal) for displaying a setting screen for setting various information, and outputs the generated setting screen information to the graphic processing unit 207.
  • the graphic processing unit 207 outputs the digital video signal output from the signal processing unit 206, the setting screen information generated by the OSD signal generation unit 209 and the screen information generated by the control unit 205 to the video processing unit 210.
  • the digital video signal output from the graphic processing unit 207 is supplied to the video processing unit 210.
  • the video processing unit 210 converts the input digital video signal into an analog video signal in a format that can be displayed on the display unit 214 or on an external device connected via the output terminal 211, and then outputs the analog video signal to the output terminal 211 or the display unit 214 for display.
  • the audio output unit 208 converts the input digital sound signal into an analog sound signal in a format that can be reproduced by the speaker 213, and then outputs the analog sound signal to the speaker 213 or to an external device connected via the output terminal 212 for reproduction.
  • the control unit 205 incorporates a CPU (Central Processing Unit) and the like, receives operation information from the operation unit 220 or operation information sent from the remote controller 221 via the light receiving unit 222, and controls each unit so that the operation content is reflected.
  • the control unit 205 uses a ROM (Read Only Memory) 205a that stores the control program executed by the CPU, a RAM (Random Access Memory) 205b that provides a work area for the CPU, and a non-volatile memory 205c that stores various setting information and control information.
  • the control unit 205 is connected to a card holder 225 in which a memory card 224 can be mounted via a card I / F (Interface) 223. As a result, the control unit 205 can transmit information to the memory card 224 attached to the card holder 225 via the card I / F 223.
  • control unit 205 is connected to the first LAN terminal 230 via the communication I / F 229. As a result, the control unit 205 can transmit information to and from the LAN compatible device connected to the first LAN terminal 230 via the communication I / F 229.
  • the control unit 205 is connected to the second LAN terminal 232 via the communication I / F 231. Accordingly, the control unit 205 can transmit information to and from various LAN-compatible devices connected to the second LAN terminal 232 via the communication I / F 231.
  • control unit 205 is connected to the USB terminal 234 via the USB I / F 233. Accordingly, the control unit 205 can transmit information to various devices (for example, the external storage device 226) connected to the USB terminal 234 via the USB I / F 233.
  • FIG. 2 is a block diagram illustrating a configuration of a signal processing unit included in the digital television according to the first embodiment.
  • the signal processing unit 206 decodes the video signal (image information reproduced in synchronization with the sound signal) input from the broadcast input unit 202 or the control unit 205 into a data format that can be processed by the video processing unit 210. It also includes: an audio decoder 242 that decodes the sound signal input from the broadcast input unit 202 or the control unit 205 into a data format that can be processed by the audio output unit 208; a switch unit 248 that switches the output destination of the sound signal decoded by the audio decoder 242 to either the separator 243 or the synchronization processing unit 247; a separator 243 that separates background sound information and first sound information from the sound signal (sound information) decoded by the audio decoder 242; a translator 244 that performs voice recognition processing to analyze and acquire the content of the first voice information as text data, and translates the text data into a translated language (second language) different from the original language (first language) that is the language of the first speech information; a synthesizer 245 that synthesizes the second sound information from the text data translated into the translated language; a mixing unit 246 that mixes and outputs the background sound information and the second sound information; and a synchronization processing unit 247.
  • the translator 244 and the synthesizer 245 function as a conversion unit that converts the first speech information into second speech information in a translation language different from the original language of the first speech information.
  • in the present embodiment, the case where the translator 244 and the synthesizer 245 convert the first speech information into second speech information in a translation language different from the original language of the first speech information is described; however, the conversion unit need only convert the first audio information into second audio information corresponding to the first audio information.
  • for example, first voice information in a standard language may be converted into second voice information in a dialect, or first voice information in a natural voice may be converted into second voice information in an imitative (pseudo) sound.
  • the signal processing unit 206 includes a switch unit 248.
  • when conversion to the second audio information is instructed by a control signal from the control unit 205, the switch unit 248 outputs the sound information decoded by the audio decoder 242 to the separator 243, and the sound information is then output to the synchronization processing unit 247 via the separator 243, the translator 244, the synthesizer 245, and the mixing unit 246.
  • otherwise, the switch unit 248 outputs the input sound information to the synchronization processing unit 247 without going through the separator 243, the translator 244, the synthesizer 245, and the mixing unit 246.
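As an illustrative aid only (this sketch is not part of the patent disclosure, and every name in it is hypothetical), the switch unit's routing can be modeled as a conditional pass through the conversion chain before the synchronization stage:

```python
# Hypothetical model of the switch unit 248: when conversion to second audio
# information is requested, decoded sound passes through the conversion chain
# (separator -> translator -> synthesizer -> mixer, each modeled as a callable);
# otherwise it is handed directly to the synchronization stage.

def route_decoded_audio(sound_info, convert_to_second_audio, chain, to_sync):
    """Route decoded sound either through the conversion chain or straight on."""
    if convert_to_second_audio:
        for stage in chain:          # apply each processing stage in order
            sound_info = stage(sound_info)
    return to_sync(sound_info)       # final hop to the synchronization step
```

A toy chain of labeling functions shows the two paths: with conversion enabled, every stage touches the signal; with it disabled, the signal is untouched.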
  • FIG. 3 is a flowchart illustrating a flow of output processing of sound information and image information by a signal processing unit included in the digital television according to the first embodiment.
  • FIG. 4 is a diagram illustrating an example of a setting screen for various information in the digital television according to the first embodiment.
  • before the signal processing unit 206 performs the output processing of sound information and image information (that is, when the control unit 205 instructs conversion to second audio information), the OSD signal generation unit 209 (an example of a display control unit) generates the setting screen information of a setting screen on which various settings, including the synchronization setting, can be made, and outputs the generated setting screen information to the graphic processing unit 207. Thereby, the OSD signal generation unit 209 causes the display unit 214 to display the setting screen.
  • specifically, the OSD signal generation unit 209 causes the display unit 214 to display a setting screen 400 that includes a slider 401 (an example of a volume input image) for inputting the volume of each of the first sound information (original sound), the second sound information (translated sound), and the background sound information (background sound), a select box 402 for inputting the translation language that is the language of the second audio information, a radio button 403 for setting whether to adjust the reproduction time of the second audio information or the reproduction time of the image information, and the like.
  • in the present embodiment, the OSD signal generation unit 209 displays on the display unit 214 the slider 401 that can input the volume of each of the background sound information, the first sound information, and the second sound information; however, it is only necessary to display a volume input image capable of inputting the volume of each of at least the first sound information and the second sound information.
  • the audio decoder 242 first determines whether or not conversion to second audio information is instructed by the control signal from the control unit 205 (step S301). When conversion to the second audio information is instructed (step S301: Yes), the audio decoder 242 decodes the input audio information into a data format that can be processed by the audio output unit 208. Further, the separator 243 separates the first sound information and the background sound information from the sound information decoded by the sound decoder 242 (step S302).
  • the separator 243 first performs frequency analysis of the sound information and acquires a feature amount of the sound information.
  • the separator 243 may acquire a feature amount obtained by frequency analysis in an external device.
  • the separator 243 calculates a background sound base matrix representing the background sound using the feature amount acquired at a certain time.
  • the separator 243 estimates a first background sound component having non-stationaryness among the background sound components of the feature amount using the acquired feature amount and the calculated background sound base matrix.
  • the separator 243 estimates a representative component of the first background sound component within a predetermined time from the first background sound component estimated from one or more feature amounts acquired at a predetermined time including the past.
  • the separator 243 estimates the first speech component that is the speech component of the feature amount using the acquired feature amount. Further, the separator 243 creates a filter that extracts the spectrum of the sound or the spectrum of the background sound from the estimated first sound component and the representative component of the first background sound component. Next, the separator 243 separates the sound information into the first sound information and the background sound information using the created filter and the spectrum of the sound information.
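As an illustrative sketch only, and assuming the separator's final filter behaves like a per-frequency-bin soft (Wiener-style) mask over magnitude spectra (the patent's actual estimation from a background sound base matrix is not reproduced here, and the speech and background estimates are simply given), the filtering step can be written as:

```python
# Hypothetical soft-mask filtering: given magnitude-spectrum estimates of the
# speech component and the background component, build a per-bin ratio mask
# and split the mixture spectrum into first sound information (speech) and
# background sound information.

def build_mask(speech_est, background_est, eps=1e-12):
    """Per-bin ratio mask favouring bins dominated by speech."""
    return [s / (s + b + eps) for s, b in zip(speech_est, background_est)]

def separate(mixture_spectrum, speech_est, background_est):
    """Apply the mask to the mixture; the residual is the background."""
    mask = build_mask(speech_est, background_est)
    speech = [m * x for m, x in zip(mask, mixture_spectrum)]
    background = [x - s for x, s in zip(mixture_spectrum, speech)]
    return speech, background
```

A bin dominated by the speech estimate keeps nearly all of the mixture's energy in the speech output, while a background-dominated bin is routed to the background output.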
  • the translator 244 acquires text data from the first voice information separated from the sound information by the separator 243 by voice recognition processing (step S303). Further, the translator 244 acquires a translation language set in advance on the setting screen 400 shown in FIG. 4 (step S304). Then, the translator 244 translates the text data acquired from the first speech information into text data of a preset translation language by natural language processing (step S305).
  • the synthesizer 245 synthesizes speech information (second speech information in the translation language) from the text data translated by the translator 244 (text data in a preset translation language) (step S306).
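The chain of steps S303 through S306 can be sketched as follows; this is an illustration only, with the recognition, translation, and synthesis engines modeled as caller-supplied callables rather than the actual components of the embodiment:

```python
# Hypothetical sketch of steps S303-S306: speech recognition produces text
# data from the first audio information, the text is translated into the
# preset translation language, and second speech information is synthesized
# from the translated text.

def convert_first_to_second(first_audio, target_lang, recognize, translate, synthesize):
    text = recognize(first_audio)               # S303: voice recognition -> text data
    translated = translate(text, target_lang)   # S305: natural-language translation
    return synthesize(translated, target_lang)  # S306: synthesis -> second audio
```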
  • the mixing unit 246 obtains the synchronization setting (in this embodiment, the synchronization setting input on the setting screen 400 shown in FIG. 4) indicating whether to adjust the reproduction time of the second audio information or the reproduction time of the image information (step S307). Next, the mixing unit 246 determines whether or not the reproduction time of the synthesized second audio information is different from the reproduction time of the first audio information (step S308). If the reproduction time of the second audio information is different from the reproduction time of the first audio information (step S308: Yes), the mixing unit 246 determines, based on the acquired synchronization setting, whether to adjust the reproduction time of the second audio information (step S309).
  • in the present embodiment, the mixing unit 246 determines whether or not the reproduction time of the second audio information is different from the reproduction time of the first audio information; however, the reproduction time of the second audio information or of the image information may instead be adjusted only when the difference between the reproduction time of the second audio information and the reproduction time of the first audio information is longer than a predetermined allowable time. In this way, when the difference between the two reproduction times is short, the image information can be viewed without adjusting the reproduction time of the second audio information or the reproduction time of the image information.
  • if the synchronization setting indicates adjusting the second audio information, the mixing unit 246 adjusts the reproduction time of the second audio information so that it becomes the same as the reproduction time of the image information reproduced in synchronization with the second audio information (in other words, the same as the reproduction time of the first audio information) (step S310). As a result, the second audio information and the image information can be reproduced in synchronization.
  • in the present embodiment, the mixing unit 246 compares the time stamp added to the second audio information with the time stamp added to the image information to determine, from the input image information, the image information to be reproduced in synchronization with the second audio information. Further, in the present embodiment, the mixing unit 246 adjusts the reproduction time of the second audio information so that it is the same as the reproduction time of the image information reproduced in synchronization with the second audio information; however, it suffices to adjust the reproduction time of the second audio information (or of the image information) so that the difference between the two reproduction times becomes equal to or less than a predetermined allowable time.
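Reduced to its arithmetic, the synchronization decision above can be sketched as a pair of playback-rate factors; this is an illustration only, with the tolerance parameter and all names being hypothetical:

```python
# Hedged sketch of the synchronization logic (steps S308-S311): if the second
# audio's duration differs from the duration of the video segment it must
# accompany, either the audio or the video is time-scaled, depending on the
# synchronization setting. Speeds are playback-rate multipliers (1.0 = normal).

def sync_plan(audio_dur, video_dur, adjust_audio, tolerance=0.0):
    """Return (audio_speed, video_speed) so both streams end together."""
    if abs(audio_dur - video_dur) <= tolerance:
        return 1.0, 1.0                       # close enough: play both unchanged
    if adjust_audio:
        return audio_dur / video_dur, 1.0     # time-scale the audio to the video
    return 1.0, video_dur / audio_dur         # time-scale the video to the audio
```

For example, 10 s of translated audio against a 5 s video segment yields an audio speed of 2.0 when the setting says to adjust the audio, or a video speed of 0.5 when it says to adjust the video.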
  • the translator 244 translates the text data acquired from the first audio information into a plurality of text data in a preset translation language.
  • the synthesizer 245 synthesizes a plurality of second speech information candidates from each of a plurality of text data in a preset translation language. That is, the translator 244 and the synthesizer 245 convert the first speech information into a plurality of second speech information candidates.
  • the mixing unit 246 selects a second audio information candidate that can be reproduced at the same reproduction time as the reproduction time of the image information that is reproduced in synchronization with the second audio information, from among the plurality of second audio information candidates. The reproduction time of the second audio information is adjusted by selecting and selecting the selected second audio information candidate as the second audio information.
  • in the present embodiment, the synthesizer 245 synthesizes a plurality of candidates for the second speech information from all of the plurality of text data in the preset translation language; however, the present invention is not limited to this. It is also possible to first select, from the plurality of text data, text data whose synthesized speech can be reproduced with the same reproduction time as the image information reproduced in synchronization with the second audio information, and to use the voice information synthesized from the selected text data as the second voice information.
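The candidate-selection idea above can be illustrated with a minimal sketch (not part of the disclosure; the candidate structure and the allowable-difference parameter are assumptions):

```python
# Hypothetical selection among synthesized candidates (alternative
# translations of the same text): pick the one whose duration is closest to
# the duration of the video segment, provided it falls within an allowable
# difference.

def pick_candidate(candidates, video_dur, allowable=0.5):
    """candidates: list of (audio, duration) pairs. Return best audio or None."""
    best = min(candidates, key=lambda c: abs(c[1] - video_dur))
    return best[0] if abs(best[1] - video_dur) <= allowable else None
```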
  • in the present embodiment, the mixing unit 246 adjusts the reproduction time by selecting, from the plurality of second audio information candidates, a candidate that can be reproduced with the same reproduction time as the image information, and using it as the second audio information; however, the present invention is not limited to this. The reproduction time of the second audio information may also be adjusted by controlling the audio output unit 208 to change the reproduction speed at which the second audio information is reproduced.
  • on the other hand, if the synchronization setting indicates adjusting the image information, the synchronization processing unit 247 adjusts the reproduction time of the image information reproduced in synchronization with the second audio information so that it becomes the same as the reproduction time of the second audio information (step S311). Specifically, the synchronization processing unit 247 controls the video processing unit 210 to adjust the reproduction time of the image information by changing the reproduction speed at which the image information reproduced in synchronization with the second audio information is reproduced. Thereby, the second audio information and the image information can be reproduced in synchronization.
  • in the present embodiment, the synchronization processing unit 247 adjusts the reproduction time of the image information by changing the reproduction speed at which the image information is reproduced; however, when the image information is moving image information, the reproduction time of the image information may instead be adjusted by thinning out some of the plurality of frames constituting the moving image information or by adding frames.
  • in the present embodiment, either the reproduction time of the second audio information or the reproduction time of the image information reproduced in synchronization with the second audio information is adjusted; however, the present invention is not limited to this, as long as at least one of the two reproduction times is adjusted so that they become the same.
  • for example, when the difference between the reproduction time of the second audio information and the reproduction time of the image information reproduced in synchronization with it is greater than a preset allowable value (for instance, when the reproduction time of the second audio information is at least twice, or at most half, the reproduction time of the image information), both reproduction times may be adjusted so that they become the same: if the reproduction time of the second audio information is short, it is lengthened and the reproduction time of the image information is shortened; if the reproduction time of the second audio information is long, it is shortened and the reproduction time of the image information is lengthened.
  • in the present embodiment, which of the reproduction time of the second audio information and the reproduction time of the image information reproduced in synchronization with it to adjust is determined based on the synchronization setting; however, the present invention is not limited to this. Specifically, the determination may be based on at least one of the type of image reproduced from the image information and the difference between the reproduction time of the second audio information and the reproduction time of the image information.
  • for example, when the difference between the reproduction time of the second audio information and the reproduction time of the image information reproduced in synchronization with it is less than a preset allowable value, so that there is little possibility that the user will feel uncomfortable with the image reproduced from the image information even if the reproduction time of the image information is adjusted, it may be decided to adjust the reproduction time of the image information.
  • Conversely, when the image information is moving image information, or when the difference between the reproduction time of the second audio information and the reproduction time of the image information reproduced in synchronization with the second audio information is greater than a preset allowable value, it may be decided to adjust the reproduction time of the second audio information.
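The decision logic above — choosing whether to time-scale the synthesized (second) audio or the image based on the image type and the size of the duration difference — can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function and parameter names are assumptions.

```python
def choose_adjust_target(image_is_video: bool,
                         audio_duration: float,
                         image_duration: float,
                         tolerance: float) -> str:
    """Decide which reproduction time to adjust so the second audio
    and the image play back over the same duration.

    Returns "image" when stretching or shrinking the image display is
    unlikely to be noticed (a still image with a small difference),
    otherwise "audio".
    """
    diff = abs(audio_duration - image_duration)
    # A still image can be displayed longer or shorter without
    # discomfort, as long as the difference stays within the allowance.
    if not image_is_video and diff <= tolerance:
        return "image"
    # Moving images, or large differences, are instead handled by
    # time-scaling the synthesized second audio.
    return "audio"
```

The same interface could also consult the synchronization setting mentioned earlier, by short-circuiting the heuristic when the user has fixed a target explicitly.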
  • Next, the mixing unit 246 adjusts the frequency of the second audio information based on the original language of the first audio information and the translated language of the second audio information (step S312). For example, when the original language of the first audio information is English and the translated language of the second audio information is Japanese, the mixing unit 246 lowers the frequency of the second audio information.
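The language-pair-dependent frequency adjustment in step S312 could be driven by a small lookup table like the one below. The concrete factor values and language codes are assumptions for illustration; the patent only states the direction of the adjustment (e.g., lower the frequency for English-to-Japanese).

```python
# Hypothetical pitch-factor table keyed by (original, translated)
# language. A factor below 1.0 lowers the frequency of the second
# audio information, above 1.0 raises it.
PITCH_FACTORS = {
    ("en", "ja"): 0.9,   # English -> Japanese: lower the frequency
    ("ja", "en"): 1.1,   # Japanese -> English: raise the frequency
}

def pitch_factor(original_lang: str, translated_lang: str) -> float:
    """Return the factor applied to the fundamental frequency of the
    synthesized second audio; 1.0 means no adjustment."""
    return PITCH_FACTORS.get((original_lang, translated_lang), 1.0)
```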
  • Subsequently, the mixing unit 246 acquires the volume input in advance for each of the first sound information, the second sound information, and the background sound information (in the present embodiment, the volume input for each of them on the setting screen 400 shown in FIG. 4) (step S313).
  • the mixing unit 246 adjusts the volume of each of the first sound information, the second sound information, and the background sound information in accordance with the volume input in advance (step S314).
  • the mixing unit 246 adjusts the volume of each of the first sound information, the second sound information, and the background sound information according to the volume input in advance, but the present invention is not limited to this.
  • the mixing unit 246 may adjust the volume of the second audio information according to the volume of the first audio information.
  • the mixing unit 246 can prevent the second voice information from becoming difficult to hear by making the volume of the first voice information smaller than the volume of the second voice information.
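The relative-volume rule just described — keeping the original (first) audio quieter than the translated (second) audio so the translation stays intelligible — can be sketched as a simple peak-based gain calculation. This is a minimal sketch under assumed names; a real mixer would work on decoded sample buffers.

```python
def duck_original(first: list[float], second: list[float],
                  ratio: float = 0.5) -> list[float]:
    """Scale the first (original) audio so that its peak stays below
    `ratio` times the peak of the second (translated) audio, keeping
    the translation easy to hear while the original remains audible."""
    peak_first = max((abs(s) for s in first), default=0.0)
    peak_second = max((abs(s) for s in second), default=0.0)
    if peak_first == 0.0 or peak_second == 0.0:
        return first[:]  # nothing to duck against
    gain = min(1.0, ratio * peak_second / peak_first)
    return [s * gain for s in first]
```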
  • the mixing unit 246 mixes (in other words, adds) the first audio information, the second audio information, and the background sound information (step S315).
  • In the present embodiment, the mixing unit 246 mixes the first sound information, the second sound information, and the background sound information; however, it suffices to mix and output at least the second sound information and the background sound information.
  • Specifically, the mixing unit 246 mixes and outputs the background sound information and the second sound information reproduced in synchronization with the background sound information. In other words, the mixing unit 246 adjusts the timing of outputting the background sound information and the second sound information reproduced in synchronization with it, and outputs the two in synchronization.
  • In the present embodiment, the mixing unit 246 determines the background sound information to be reproduced in synchronization with the second sound information by comparing the time stamp added to the second sound information with the time stamp added to the background sound information.
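Mixing by matching time stamps can be sketched as below: frames of the background sound and of the second audio that carry the same time stamp are added sample-wise, so both streams come out in sync. The frame/dictionary representation is an assumption made for illustration.

```python
def mix_by_timestamp(background: dict[int, list[float]],
                     second: dict[int, list[float]]) -> dict[int, list[float]]:
    """Mix (add sample-wise) background-sound frames and second-audio
    frames whose time stamps match. Frames are keyed by time stamp;
    a frame present in only one stream passes through unchanged."""
    mixed = {}
    for ts in sorted(set(background) | set(second)):
        bg = background.get(ts, [])
        sp = second.get(ts, [])
        n = max(len(bg), len(sp))
        bg = bg + [0.0] * (n - len(bg))   # pad shorter frame with silence
        sp = sp + [0.0] * (n - len(sp))
        mixed[ts] = [a + b for a, b in zip(bg, sp)]
    return mixed
```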
  • the mixing unit 246 may adjust the volume of the second audio information based on the original language of the first audio information and the translated language of the second audio information. For example, when the original language of the first audio information is English and the translated language of the second audio information is Japanese, the volume of the second audio information is set higher than the volume of the first audio information.
  • Next, the synchronization processing unit 247 executes synchronization processing for reproducing the image information and the second audio information in synchronization, by delaying the output of the image information from the image decoder 241 to the video processing unit 210 by the conversion time required for the conversion from the first audio information to the second audio information (step S316).
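The delay in step S316 amounts to buffering decoded image frames for the duration of the audio conversion before releasing them. A minimal sketch, assuming a fixed per-frame delay and illustrative class/method names:

```python
from collections import deque

class SyncDelay:
    """Delay image frames by the (assumed fixed) time needed to convert
    the first audio into the second audio, so that image and translated
    audio reach the outputs together (cf. step S316)."""

    def __init__(self, delay_frames: int):
        self.buffer = deque()
        self.delay_frames = delay_frames

    def push(self, frame):
        """Queue a decoded frame; return the frame to display now, or
        None while the delay buffer is still filling."""
        self.buffer.append(frame)
        if len(self.buffer) > self.delay_frames:
            return self.buffer.popleft()
        return None
```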
  • the sound output unit 208 outputs sound information obtained by mixing the first sound information, the second sound information, and the background sound information in the mixing unit 246 to the speaker 213 via the synchronization processing unit 247 (step S317).
  • the video processing unit 210 outputs the image information output from the image decoder 241 to the display unit 214 via the synchronization processing unit 247 (step S317).
  • As described above, according to the present embodiment, the background sound information and the first sound information are separated from the input sound information, the first sound information is converted into the second sound information in a translation language different from the original language, and the background sound information and the second sound information are mixed and output, whereby the first sound information is replaced with the second sound information. Therefore, when the second audio information converted from the first audio information is output, it is possible to prevent the second audio information from becoming difficult to hear and to prevent the background sound from becoming inaudible.
  • The second embodiment is an example in which the separation of the background sound information and the first sound information from the input sound information and the conversion from the first sound information to the second sound information are performed outside the electronic apparatus, on servers connected via a network.
  • FIG. 5 is a diagram illustrating a configuration of an information processing system having a notebook PC as an example of an electronic apparatus according to the second embodiment.
  • In the present embodiment, a notebook PC (Personal Computer) 500 is connected, via a network such as the Internet, to a content server 510 that stores content to be reproduced (content including at least sound information), a Web server 520 that exchanges various types of information with the notebook PC 500 via a browser executed on the notebook PC 500, a speech processing server 530 that performs the separation of the background sound information and the first sound information from the input sound information, the acquisition of text data from the first sound information, and the like, and a translation server 540 that translates the text data acquired from the first speech information into a translation language.
  • FIG. 6 is a sequence diagram showing a flow of sound information output processing in the information processing system according to the second embodiment.
  • the notebook PC 500 connects to the Web server 520 through a browser, and requests the Web server 520 to display the setting screen 400 (see FIG. 4) (step S601).
  • The Web server 520 transmits the screen information of the setting screen 400 to the notebook PC 500, and the setting screen 400 is displayed on a display unit (not shown) of the notebook PC 500 (step S602).
  • Next, the notebook PC 500 transmits the various settings set on the setting screen 400 (the volumes of the first voice information, the second voice information, and the background sound information, the translation language setting, the synchronization setting, etc.) to the Web server 520 (step S603). Furthermore, the notebook PC 500 selects, via the browser, the content to be output from the content stored in the content server 510 (step S604).
  • the Web server 520 requests the content server 510 to acquire the content selected on the notebook PC 500 (step S605), and acquires the content from the content server 510 (step S606).
  • the Web server 520 transmits the sound information included in the acquired content to the sound processing server 530, and requests the separation of the first sound information and the background sound information from the sound information (step S607).
  • In the same manner as the separator 243 (see FIG. 2) and the translator 244 (see FIG. 2), the speech processing server 530 separates the background sound information and the first speech information from the sound information and acquires text data from the first speech information. Then, the Web server 520 acquires the first sound information, the background sound information, and the text data from the speech processing server 530 (step S608).
  • The Web server 520 transmits the text data acquired from the speech processing server 530 and the translation language set on the setting screen 400 (see FIG. 4) to the translation server 540, and requests translation of the text data into the translation language (step S609).
  • the translation server 540 translates the text data into the translation language in the same manner as the translator 244 (see FIG. 2). Then, the web server 520 acquires text data (translation result) translated into the translation language from the translation server 540 (step S610).
  • Next, the Web server 520 transmits the text data translated into the translation language, the background sound information, the first sound information, and the various settings set on the setting screen 400 (see FIG. 4) (the volume of each of the background sound information, the first sound information, and the second sound information, the synchronization setting, etc.) to the speech processing server 530, and requests the synthesis of the second audio information, various adjustments (for example, adjustment of the reproduction time of the second audio information and adjustment of the volumes of the first audio information and the second audio information), and the mixing of the second sound information and the background sound information (step S611).
  • In the same manner as the synthesizer 245 (see FIG. 2) and the mixing unit 246 (see FIG. 2), the speech processing server 530 performs the synthesis of the second sound information, the various adjustments, and the mixing of the second sound information and the background sound information. Then, the Web server 520 acquires the sound information obtained by mixing the second sound information and the background sound information (step S612).
  • the Web server 520 transmits the content obtained by replacing the sound information included in the content acquired in step S606 with the sound information acquired from the audio processing server 530 to the notebook PC 500 (step S613).
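The Web server's role in steps S605–S613 is pure orchestration: fetch the content, fan requests out to the speech and translation servers, and splice the mixed audio back into the content. A sketch under assumed server objects and method names (none of these identifiers come from the patent):

```python
def replace_audio_with_translation(content: dict, servers: dict,
                                   settings: dict) -> dict:
    """Sketch of the Web server orchestration in FIG. 6."""
    sound = content["sound"]                                        # S605-S606
    # Separate original speech / background and get recognized text.
    first, background, text = servers["speech"].separate(sound)     # S607-S608
    # Translate the recognized text into the configured language.
    translated = servers["translate"].translate(text,
                                                settings["lang"])   # S609-S610
    # Synthesize the second audio, adjust, and mix with background.
    mixed = servers["speech"].synthesize_and_mix(translated, background,
                                                 first, settings)   # S611-S612
    # Return the content with its sound information replaced.
    return {**content, "sound": mixed}                              # S613
```

Because all heavy processing happens on the servers, the client only plays the returned content, which is exactly the load-reduction argument made below.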
  • As described above, according to the present embodiment, the notebook PC 500 does not need to perform the separation of the background sound information and the first sound information from the input sound information, the conversion from the first sound information to the second sound information, or the mixing of the background sound information and the second sound information, so the processing load on the notebook PC 500 can be reduced.
  • FIG. 7 is a diagram illustrating a hardware configuration of a PC that is an example of an electronic apparatus according to the third embodiment.
  • The PC 700 includes a CPU 701, a ROM 702, a RAM 703, a display unit 704, an input unit 705, a storage control unit 706, a communication unit 707, a speaker 708, and a storage device 709.
  • the CPU 701 performs various processes in cooperation with various control programs stored in the ROM 702 or the like using the RAM 703 as a work area, and comprehensively controls the operation of each unit constituting the PC 700.
  • the ROM 702 stores a program for controlling the PC 700, various setting information, and the like in a non-rewritable manner.
  • the RAM 703 is a volatile storage medium and functions as a work area for the CPU 701.
  • the display unit 704 has a display screen configured by an LCD (Liquid Crystal Display), an organic EL (Electro Luminescence) display, and the like, and displays a process progress, a result, and the like according to control of the CPU 701.
  • the speaker 708 outputs sound information according to the control of the CPU 701.
  • the input unit 705 has an input device such as a keyboard and a mouse, and notifies the CPU 701 of commands and information input from the user via the input device.
  • The storage control unit 706 controls the operation of the storage device 709, and executes, in the storage device 709, processing corresponding to requests such as data writing and data reading input from the CPU 701.
  • the storage device 709 is a storage device having a recording medium such as a magnetic disk, a semiconductor memory, or an optical disk.
  • the communication unit 707 is a wireless communication interface, establishes communication with an external device (not shown), and transmits and receives data (for example, content including sound information and image information).
  • FIG. 8 is a block diagram showing a functional configuration of the PC according to the third embodiment.
  • the PC 700 executes an image decoder 710, an audio decoder 711, a separator 243, a translator 244, a synthesizer 245, and a mixing unit 246 by executing a program stored in the ROM 702 by the CPU 701.
  • the image decoder 710 decodes image information included in the content received by the communication unit 707 (image information reproduced in synchronization with sound information included in the content) into a data format that can be processed by the video processing unit 712.
  • the audio decoder 711 decodes sound information included in the content received by the communication unit 707 into a data format that can be processed by the audio output unit 713.
  • the switch unit 248 switches the output destination of the sound signal decoded by the audio decoder 242 to the separator 243 or the synchronization processing unit 247.
  • the separator 243 separates the background sound information and the first sound information from the sound information decoded by the sound decoder 711.
  • The translator 244 performs speech recognition processing that analyzes the first speech information and acquires the content of the first speech information as text data, and translates the text data from the original language (first language), which is the language of the first speech information, into a translation language (second language), which is a different language.
  • the synthesizer 245 synthesizes the second speech information based on the text data translated into the translation language.
  • the mixing unit 246 mixes and outputs the background sound information and the second sound information.
  • The synchronization processing unit 247 synchronizes and outputs the sound information obtained by mixing the background sound information and the second sound information in the mixing unit 246 and the image information reproduced in synchronization with that sound information.
  • the video processing unit 712 converts the image information output from the synchronization processing unit 247 into an analog video signal in a format that can be displayed on the display unit 704, and then outputs the analog video signal to the display unit 704 for video display.
  • the audio output unit 713 converts the digital sound information output from the synchronization processing unit 247 into an analog sound signal in a format that can be reproduced by the speaker 708, and then outputs the analog sound signal to the speaker 708 for audio reproduction.
  • As described above, according to the first to third embodiments, it is possible to prevent the second sound information converted from the first sound information from becoming difficult to hear when it is output, and also to prevent the background sound from becoming inaudible.
  • the program executed by the electronic device of the present embodiment is provided by being incorporated in advance in a ROM or the like.
  • The program executed in the electronic device of the present embodiment may be recorded, as a file in an installable or executable format, on a computer-readable recording medium such as a CD-ROM, flexible disk (FD), CD-R, or DVD (Digital Versatile Disk).
  • the program executed by the electronic device of the present embodiment may be provided by being stored on a computer connected to a network such as the Internet and downloaded via the network. Further, the program executed by the electronic device of the present embodiment may be provided or distributed via a network such as the Internet.
  • The program executed by the electronic device of the present embodiment has a module configuration including the above-described units (the separator 243, the translator 244, the synthesizer 245, the mixing unit 246, and the synchronization processing unit 247); as actual hardware, a CPU (processor) reads the program from the ROM and executes it, whereby the above units are loaded onto a main storage device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

According to one embodiment of the invention, an electronic device is provided with a separation unit, a conversion unit, and an output unit. The separation unit separates background sound information and first voice information from sound information. The conversion unit converts the first voice information into second voice information corresponding to the first voice information. The output unit mixes and then outputs the background sound information and the second voice information.
PCT/JP2013/067716 2013-06-27 2013-06-27 Dispositif électronique, méthode de sortie, et programme WO2014207874A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2013/067716 WO2014207874A1 (fr) 2013-06-27 2013-06-27 Dispositif électronique, méthode de sortie, et programme

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2013/067716 WO2014207874A1 (fr) 2013-06-27 2013-06-27 Dispositif électronique, méthode de sortie, et programme

Publications (1)

Publication Number Publication Date
WO2014207874A1 true WO2014207874A1 (fr) 2014-12-31

Family

ID=52141274

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/067716 WO2014207874A1 (fr) 2013-06-27 2013-06-27 Dispositif électronique, méthode de sortie, et programme

Country Status (1)

Country Link
WO (1) WO2014207874A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000322077A (ja) * 1999-05-12 2000-11-24 Sony Corp テレビジョン装置
JP2001238299A (ja) * 2000-02-22 2001-08-31 Victor Co Of Japan Ltd 放送受信装置
JP2009152782A (ja) * 2007-12-19 2009-07-09 Toshiba Corp コンテンツ再生装置及びコンテンツ再生方法
JP2010074574A (ja) * 2008-09-19 2010-04-02 Toshiba Corp 電子機器及び音声調整方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827843A (zh) * 2018-08-14 2020-02-21 Oppo广东移动通信有限公司 音频处理方法、装置、存储介质及电子设备
CN110827843B (zh) * 2018-08-14 2023-06-20 Oppo广东移动通信有限公司 音频处理方法、装置、存储介质及电子设备

Similar Documents

Publication Publication Date Title
JP5201692B2 (ja) クローズド・キャプションをつけるシステムおよび方法
US8112783B2 (en) Method of controlling ouput time and output priority of caption information and apparatus thereof
JP5423425B2 (ja) 画像処理装置
US8301457B2 (en) Method for selecting program and apparatus thereof
TW200522731A (en) Translation of text encoded in video signals
JP6399726B1 (ja) テキストコンテンツ生成装置、送信装置、受信装置、およびプログラム
US20090149128A1 (en) Subtitle information transmission apparatus, subtitle information processing apparatus, and method of causing these apparatuses to cooperate with each other
JP4989271B2 (ja) 放送受信機及び表示方法
JP2006211488A (ja) 映像再生装置
JP5110978B2 (ja) 送信装置、受信装置及び再生装置
US20140119542A1 (en) Information processing device, information processing method, and information processing program product
JP2010016521A (ja) 映像処理装置および映像処理方法
JP6385236B2 (ja) 映像再生装置および映像再生方法
WO2014207874A1 (fr) Dispositif électronique, méthode de sortie, et programme
US8059941B2 (en) Multiplex DVD player
JP7001639B2 (ja) システム
JP2009260685A (ja) 放送受信装置
KR20150081706A (ko) 영상표시장치, 영상처리방법 및 컴퓨터 판독가능 기록매체
JP2006148839A (ja) 放送装置、受信装置、及びこれらを備えるデジタル放送システム
KR20060127630A (ko) 방송 프로그램을 저장하고 재생하는 장치 및 방법
KR100781284B1 (ko) 고음질 오디오 파일의 부가정보를 생성하는 방송 수신기 및그 제어방법
JP2006050507A (ja) ディジタル放送内容表示装置およびその表示方法
JP4968946B2 (ja) 情報処理装置、映像表示装置、及びプログラム
JP2009159270A (ja) 録画装置
KR20060130800A (ko) 방송 스트림의 캡션 데이터를 이용한 학습자료 제작 방법및 장치

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13887811

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13887811

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP