WO2009130773A1 - Reproducing device and method, and computer program - Google Patents

Publication number
WO2009130773A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
audio
adjustment
sound
pattern
Application number
PCT/JP2008/057893
Other languages
French (fr)
Japanese (ja)
Inventor
四郎 鈴木
英世 水流
Original Assignee
パイオニア株式会社
Application filed by パイオニア株式会社
Priority to PCT/JP2008/057893
Publication of WO2009130773A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/454 Content or additional data filtering, e.g. blocking advertisements
    • H04N21/4545 Input to filtering algorithms, e.g. filtering a region of the image

Definitions

  • The present invention relates to the technical field of playback apparatuses and methods for playing back content composed of video signals and audio signals, such as a flat display with speakers, an AV (Audio Visual) system, a television device, a DVD player, or a BD (Blu-ray Disc) player, and of computer programs.
  • AV Audio Visual
  • BD Blu-ray Disc
  • When such content is viewed, the viewer may feel very uncomfortable if the sound quality is not matched to the content. For example, consider a news program and a music program. In general, a music program covers a wider audio frequency range, at both the low and high ends, than a news program. The playback apparatus therefore adjusts the audio signal to emphasize the low and high frequencies in accordance with the audio frequency band of the music program, so that the viewer can enjoy the music program with high sound quality. However, if the viewer then watches a news program without changing the type of audio adjustment, the bass may be emphasized too much, making human voices difficult to hear.
  • Patent Document 1 discloses a technique that refers to the “program genre” information recorded in advance in the electronic program information (Electronic Program Guide: EPG) for the content to be reproduced, and automatically switches the type of audio adjustment to be applied in accordance with that information.
  • EPG Electronic Program Guide
  • However, the technique of Patent Document 1 may have the following problem. Because the “program genre” information is the same throughout a given piece of content, the same audio adjustment is applied throughout. Even within the same content, however, live scenes in which music is played may be mixed with studio scenes in which the performers talk. Depending on the scene, an inappropriate audio adjustment may therefore be applied, so that the audio does not match the video (that is, the viewer may feel uncomfortable).
  • The present invention has been made in view of, for example, the above-mentioned problem. Its object is to provide a playback apparatus and method, and a computer program, that can suitably reduce the sense of discomfort felt by a viewer due to a mismatch between video and audio when playing content composed of video and audio signals.
  • A playback device of the present invention plays back content composed of a video signal and an audio signal, and comprises: analyzing means for analyzing the video signal input to the playback device for each of one or more video signal parameters; pattern discriminating means for discriminating a pattern suitable for each scene of the reproduced content, based on the degree of conformity between each video signal parameter of the analyzed video signal and each video signal parameter of each of a plurality of predetermined patterns; audio adjusting means for performing a predetermined audio adjustment on the audio signal input to the playback device; and audio control means for controlling the audio adjusting means so as to perform, on the input audio signal, the type of audio adjustment corresponding to the determined pattern.
  • the uncomfortable feeling that the viewer feels due to the mismatch between the video and the audio can be suitably reduced as follows.
  • Content composed of video signals and audio signals means programs that are perceived by the viewer's visual and auditory sense, such as movies, sports, animation, studios, or live performances.
  • the video signal and the audio signal are sequentially input to the reproduction device.
  • the analysis means is composed of, for example, a signal extraction circuit and an arithmetic / storage circuit, and analyzes the video signal input to the playback apparatus for each of one or a plurality of video signal parameters.
  • The video signal parameters are, for example, luminance information (for example, information indicating whether the image is bright or dark), color information (for example, information indicating whether primary or intermediate colors predominate, or whether there are many colors), motion information (for example, information indicating whether the image is static or dynamic), and frame information (for example, information indicating the presence or absence of black bars) in the plurality of images constituting the video.
  • “Analyze for each video signal parameter” is a general term for qualitative or quantitative analysis for each video signal parameter.
  • The pattern discriminating means comprises, for example, an arithmetic / storage circuit, and discriminates a pattern suitable for each scene of the content to be reproduced based on the degree of conformity between each video signal parameter of the analyzed video signal and each video signal parameter of each of a plurality of predetermined patterns. For example, the video signal parameters of a video signal are compared with the video signal parameters of each of a plurality of predetermined patterns stored in advance in a storage circuit, and the video signal is determined to correspond to a pattern for which a predetermined number or more of parameters match, or to the pattern with the most matching parameters.
  • the video signal to be discriminated may be one frame constituting the video or a plurality of frames over a predetermined period.
  • The audio adjusting means performs a predetermined audio adjustment on the audio signal input to the playback device.
  • Audio adjustment is a general term for processes that can, by some method, change the impression that the audio output based on the audio signal gives the viewer, such as gain increase / decrease processing for each band or predetermined acoustic processing.
  • The audio control means is, for example, a control unit, and controls the audio adjusting means so that the type of audio adjustment corresponding to the determined pattern is performed on the input audio signal. For example, if the pattern that matches the scene of the content to be played is determined to be live, the audio adjusting means is controlled so that an audio adjustment that relatively emphasizes the low frequency band is applied to the input audio signal. Thereafter, the pattern that matches the scene of the content to be reproduced is determined again by the analyzing means and the pattern discriminating means, regularly or irregularly, and preferably in real time. If the pattern is then determined to be studio, for example, the audio adjusting means is controlled so that an audio adjustment that relatively emphasizes high frequencies is applied to the input audio signal.
  • When performing the predetermined audio adjustment on the audio signal, the audio control means may control the audio adjusting means so as to perform an audio cross-fade process, in which the audio adjustment parameter changes gradually from the value applied before the adjustment toward the value for the predetermined audio adjustment.
  • When performing the audio cross-fade process, the audio control means may change the time spent on the process according to the number of patterns determined to be suitable for the scene of the content to be reproduced.
  • When performing the audio cross-fade process, the audio control means may make the time spent on the process longer when two or more patterns are determined to be suitable for the scene of the content to be reproduced than when there is one such pattern.
  • In this way, the uncertainty of pattern discrimination accuracy can be absorbed. More specifically, pattern discrimination is not always perfectly accurate, and the number of candidate patterns may be narrowed down further by the next discrimination result, which is obtained regularly or irregularly. Therefore, when only one pattern is determined to be suitable for the scene of the content to be played, the time spent on the audio cross-fade process stays at the standard value, and the type of audio adjustment corresponding to that pattern is applied. When two or more patterns are determined to be suitable, on the other hand, the time spent on the audio cross-fade process is made longer than the standard value, and the type of audio adjustment corresponding to one of those patterns is applied relatively slowly. If another pattern is then determined to be appropriate, the type of audio adjustment corresponding to that other pattern is preferably applied to the input audio signal. Thus, when the optimal pattern cannot be narrowed down, abruptly applying the type of audio adjustment corresponding to any single pattern is avoided, and the uncertainty of pattern discrimination accuracy can be absorbed.
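The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard duration and the extension factor are hypothetical values, and the function name is not taken from the application, which gives no concrete durations.

```python
# Hypothetical values; the application does not state concrete durations.
STANDARD_FADE_S = 1.0   # assumed "standard value" of the audio cross-fade time
EXTENDED_FACTOR = 3.0   # assumed lengthening when the pattern is ambiguous


def audio_crossfade_seconds(num_candidate_patterns: int) -> float:
    """Return the audio cross-fade time for a given number of candidate patterns.

    One candidate: keep the standard time.
    Two or more candidates: fade more slowly, so that a later, narrower
    discrimination result can still redirect the adjustment.
    """
    if num_candidate_patterns < 1:
        raise ValueError("at least one candidate pattern is required")
    if num_candidate_patterns == 1:
        return STANDARD_FADE_S
    return STANDARD_FADE_S * EXTENDED_FACTOR
```

With these assumed values, an unambiguous result fades in one second and an ambiguous one in three.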
  • The playback device may further comprise video adjusting means for performing a predetermined video adjustment on the video signal input to the playback apparatus, and video control means for controlling the video adjusting means so as to perform, on the input video signal, the type of video adjustment corresponding to the determined pattern.
  • the uncomfortable feeling felt by the viewer due to the mismatch between the video and the audio can be reduced more suitably.
  • When performing the predetermined video adjustment, the video control means may also control the video adjusting means so as to perform a video cross-fade process, in which the video adjustment parameter changes gradually from the value applied before the adjustment toward the value for the predetermined video adjustment.
  • Because the video cross-fade process is performed at the time of video adjustment, the uncomfortable feeling felt by the viewer due to the mismatch between the video and the audio can be reduced more suitably.
  • When performing the video cross-fade process, the video control means may change the time spent on the process according to the number of patterns determined to be suitable for the scene of the content to be reproduced, and the lower limit of the time spent on the video cross-fade process may be set longer than the lower limit of the time spent on the audio cross-fade process.
  • Since the lower limit of the time spent on the video cross-fade process is set longer than the lower limit of the time spent on the audio cross-fade process, the uncomfortable feeling felt by the viewer can be reduced more suitably.
  • The time spent on the audio cross-fade process is relatively short compared with the case where there are two or more patterns, whereas the time spent on the video cross-fade process is longer than the standard value. Thereby, the discomfort felt by the viewer can be more suitably reduced.
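As a rough sketch of the relationship between the two cross-fade times, the following Python function derives both from the number of candidate patterns. The lower limits and the per-candidate increment are hypothetical; the text only requires that the video lower limit be longer than the audio lower limit.

```python
from typing import Tuple

# Hypothetical lower limits; only their ordering (video > audio) comes from the text.
AUDIO_MIN_FADE_S = 1.0
VIDEO_MIN_FADE_S = 2.0


def crossfade_times(num_candidate_patterns: int,
                    extra_per_candidate_s: float = 1.0) -> Tuple[float, float]:
    """Return (audio_seconds, video_seconds) for the cross-fade processes.

    Both times grow with the number of candidate patterns, and the video
    time never drops below its own, longer, lower limit.
    """
    if num_candidate_patterns < 1:
        raise ValueError("at least one candidate pattern is required")
    extra = extra_per_candidate_s * (num_candidate_patterns - 1)
    return AUDIO_MIN_FADE_S + extra, VIDEO_MIN_FADE_S + extra
```

A linear increase per additional candidate is just one possible choice; any monotone rule that keeps the video time above the audio time would fit the description equally well.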
  • The analyzing means analyzes, in addition to the video signal input to the reproduction apparatus, the audio signal for each of one or more audio signal parameters, and the pattern discriminating means determines a pattern that matches the scene of the content to be reproduced based on both the degree of matching between each video signal parameter of the analyzed video signal and each video signal parameter of each of the predetermined plurality of patterns, and the degree of matching between each audio signal parameter of the analyzed audio signal and each audio signal parameter of each of the predetermined plurality of patterns.
  • The audio signal is analyzed for each of one or more audio signal parameters.
  • Audio signal parameters are parameters that characterize the audio signal, for example rhythm, gain for each range, information indicating whether or not a song is being played, and information indicating whether or not human speech is included.
  • A pattern that matches the scene of the content to be reproduced is determined based on both the degree of matching between each video signal parameter of the analyzed video signal and each video signal parameter of each of the predetermined plural patterns, and the degree of matching between each audio signal parameter of the analyzed audio signal and each audio signal parameter of each of the predetermined plural patterns. By analyzing the audio signal in this way, candidate patterns that could not be narrowed down to one from the video signal alone can be narrowed down to a smaller number, and a pattern that was discriminated incorrectly can be re-determined as an appropriate new pattern.
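The narrowing effect described above can be illustrated with a small Python sketch. It assumes that a per-pattern match count has already been computed separately for the video and audio parameters, and it combines the two by simple addition; that combining rule and the function name are assumptions made for illustration, not something the application specifies.

```python
from typing import Dict, List


def combined_candidates(video_scores: Dict[str, int],
                        audio_scores: Dict[str, int]) -> List[str]:
    """Return the pattern(s) with the highest combined match count.

    video_scores / audio_scores map a pattern name to the number of
    matching video / audio signal parameters. Patterns missing from
    audio_scores contribute zero audio matches.
    """
    total = {name: score + audio_scores.get(name, 0)
             for name, score in video_scores.items()}
    best = max(total.values())
    return sorted(name for name, score in total.items() if score == best)
```

For example, if the video alone leaves "live" and "studio" tied, audio evidence such as "a song is being played" can break the tie:

```python
video_only = combined_candidates({"live": 2, "studio": 2, "sports": 0}, {})
with_audio = combined_candidates({"live": 2, "studio": 2, "sports": 0},
                                 {"live": 2, "studio": 0})
```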
  • The playback method of the present invention is a playback method for playing back content composed of a video signal and an audio signal, and comprises: an analysis step of analyzing the input video signal for each of one or more video signal parameters; a pattern discrimination step of discriminating a pattern suitable for each scene of the content to be played back, based on the degree of conformity between each video signal parameter of the analyzed video signal and each video signal parameter of each of a plurality of predetermined patterns; an audio adjustment step of performing a predetermined audio adjustment on the input audio signal; and an audio control step of controlling the audio adjustment step so that the type of audio adjustment corresponding to the determined pattern is performed on the input audio signal.
  • According to the playback method of the present invention, when playing content composed of video signals and audio signals, the uncomfortable feeling that the viewer has due to the mismatch between video and audio can be suitably reduced.
  • the computer program of the present invention causes a computer to function as the above-described playback device of the present invention (including various aspects thereof).
  • If the computer program is read into a computer from a recording medium that stores it, such as a CD-ROM or DVD-ROM, and executed, or is executed after being downloaded through a communication means, the above-described playback device of the present invention (including various aspects thereof) can be constructed relatively easily. As a result, as in the case of the playback apparatus of the present invention described above, when playing back content composed of video signals and audio signals, the sense of discomfort felt by the viewer due to a mismatch between video and audio can be suitably reduced.
  • A correspondence diagram illustrating the discrimination material for discriminating a scene pattern in the playback apparatus 2, and the video adjustment and audio adjustment applied to each pattern, according to the first embodiment.
  • FIG. 3 is a block diagram showing the basic configuration of the audio adjustment unit 222 according to the first embodiment.
  • FIG. 4 is a correspondence diagram illustrating a setting example of the audio adjustment parameters applied to each pattern according to the first embodiment.
  • FIG. 5 is a characteristic diagram showing the frequency characteristics of the audio adjustment applied to each pattern according to the first embodiment.
  • FIG. 7 is a timing chart comparing a comparative example and the first embodiment with respect to the audio adjustment corresponding to the result of discriminating a scene pattern.
  • A timing chart showing the cross-fade process with respect to each of video ...
  • FIG. 1 is a block diagram showing a basic configuration of a playback apparatus 2 according to the first embodiment.
  • The playback device 2 analyzes the video signal and the audio signal input from the source device 1, performs a predetermined adjustment based on the analysis result, and outputs the video signal to the display unit 31 and the audio signal to the audio output unit 32.
  • The source device 1 is a device, such as a DVD player, a BD player, or a set-top box, that acquires the video signals and audio signals constituting the content to be played back from a recording medium or a communication network and outputs them to the playback device 2.
  • the playback device 2 includes the following components, analyzes a video signal and an audio signal input from the source device 1, performs a predetermined adjustment based on the analysis result, and outputs the video signal to the display unit 31, The audio signal is output to the audio output unit 32.
  • the video extraction unit 211 extracts an input video signal and outputs the video signal to each of the video analysis unit 231 and the video adjustment unit 212 provided in the control unit 23.
  • The audio extraction unit 221 extracts an input audio signal and outputs the extracted audio signal to each of the audio analysis unit 232, provided in the control unit 23, and the audio adjustment unit 222.
  • the video analysis unit 231 analyzes the extracted video signal qualitatively or quantitatively for each of one or a plurality of video signal parameters.
  • The audio analysis unit 232 analyzes the extracted audio signal qualitatively or quantitatively for each of one or a plurality of audio signal parameters.
  • the storage unit 24 stores the extracted video signal, audio signal, and analysis results thereof as video information and audio information as a history over a predetermined period.
  • The storage unit 24 also stores, as pattern discrimination information, the matching value or matching range of each video signal parameter for each of a plurality of predetermined patterns; the pattern discrimination unit 233 described later refers to this information when performing pattern discrimination.
  • the pattern discriminating unit 233 discriminates a pattern suitable for each scene of the content to be reproduced, based on at least the analysis result by the video analyzing unit 231 among the video analyzing unit 231 and the audio analyzing unit 232.
  • For example, the pattern discriminating unit 233 compares the video signal parameters of the video signal with the pattern discrimination information stored in advance in the storage unit 24, and determines that the video signal corresponds to a pattern for which a predetermined number or more of parameters match, or to the pattern with the largest number of matching parameters.
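A minimal Python sketch of this comparison follows. The parameter names, the stored ranges, and the way the two criteria are combined (take the patterns with the most matches, but only if they reach a minimum count) are all assumptions made for illustration; the application does not give concrete values.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]

# Hypothetical pattern discrimination information: for each pattern,
# the matching range of each (normalized) video signal parameter.
PATTERN_INFO: Dict[str, Dict[str, Range]] = {
    "live":   {"brightness": (0.0, 0.3), "motion": (0.3, 1.0), "skin_ratio": (0.0, 0.3)},
    "studio": {"brightness": (0.4, 1.0), "motion": (0.0, 0.3), "skin_ratio": (0.2, 1.0)},
}


def count_matches(params: Dict[str, float], ranges: Dict[str, Range]) -> int:
    """Count the parameters whose analyzed value lies inside the stored range."""
    return sum(1 for name, (low, high) in ranges.items()
               if name in params and low <= params[name] <= high)


def discriminate(params: Dict[str, float], min_matches: int = 2) -> List[str]:
    """Return the pattern(s) with the most matching parameters.

    An empty list means no pattern reached the required number of matches.
    More than one entry means the scene could not be narrowed to one pattern.
    """
    scores = {name: count_matches(params, ranges)
              for name, ranges in PATTERN_INFO.items()}
    best = max(scores.values())
    if best < min_matches:
        return []
    return [name for name, score in scores.items() if score == best]
```

Returning a list rather than a single name keeps the "two or more candidate patterns" case visible to whatever code chooses the cross-fade time.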
  • the video control unit 234 controls the video adjustment unit 212 so as to apply the type of video adjustment corresponding to the pattern determined by the pattern determination unit 233 to the input video signal.
  • the audio control unit 235 controls the audio adjustment unit 222 so that the type of audio adjustment corresponding to the pattern determined by the pattern determination unit 233 is performed on the input audio signal.
  • The operation unit 25 is provided with operation buttons for each type, so that the viewer 4 can manually instruct the type of audio adjustment and / or video adjustment corresponding to a desired pattern, regardless of the pattern determined by the pattern determination unit 233.
  • The display unit 31 is a device that can output a video signal as video, such as a plasma display or a liquid crystal display, and displays the video of the content to the viewer 4 based on the video signal output from the video adjustment unit 212.
  • The audio output unit 32 is a device that can output an audio signal as sound, such as a speaker or headphones, and outputs the audio of the content to the viewer 4 based on the audio signal output from the audio adjustment unit 222.
  • Whether or not the scene pattern is “movie” is determined based on the frame information of the video signal. If the scene pattern is determined to be “movie”, as video adjustment, a setting that lets the viewer enjoy the movie without spoiling its atmosphere even in the brightest environment in the house is applied, for example an adjustment that gives clear contrast and improves overall brightness. In addition to or in place of this, as audio adjustment, an adjustment that achieves both powerful, realistic expression and clear dialogue is performed.
  • Whether or not the scene pattern is “sports” is determined based on the degree or presence of movement in the video signal, or on color information (for example, the proportion of green corresponding to a lawn). If the scene pattern is determined to be “sports”, as video adjustment, a setting that assumes sports are watched at home with the room lights on is applied, for example an adjustment that emphasizes green details. In addition to or in place of this, as audio adjustment, because such content has few low and high frequencies, an adjustment that emphasizes the middle and high frequencies is made in order to emphasize the sense of realism; additionally or alternatively, the frequency characteristics are adjusted to cover a broad portion of the band, and surround processing is used to increase the sense of presence.
  • Whether or not the scene pattern is “animation” is determined based on whether or not there are many solid colors. If the scene pattern is determined to be “animation”, as video adjustment, a setting that assumes an animation program is viewed at home with the room lights on is applied, for example an adjustment that sets the contrast low. In addition to or in place of this, as audio adjustment, an adjustment is applied whose basic aim is to make even faint sound effects attractive while keeping the animation and its dialogue dominant.
  • Whether or not the scene pattern is “studio” is determined based on whether the proportion of skin color in the screen area is large and the movement is small. If the scene pattern is determined to be “studio”, as video adjustment, a setting that assumes a studio program is viewed at home with the room lights on is applied, for example an adjustment that sets the contrast low. In addition to or in place of this, as audio adjustment, an adjustment that achieves highly clear dialogue even at a modest volume is performed.
  • Whether or not the scene pattern is “live” is determined based on whether the image is dark overall and there is a light in the center. If the scene pattern is determined to be “live”, as video adjustment, an adjustment that emphasizes contrast is performed in order to create a live atmosphere. In addition to or in place of this, as audio adjustment, an adjustment that emphasizes the low frequency band is performed, aimed at music-based content and based on the hi-fi concept.
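The five discrimination criteria above can be summarized as a rule table. The Python sketch below assumes that per-frame features have already been extracted and normalized to the range 0 to 1; the feature names and every threshold are hypothetical, since the application only states the criteria qualitatively.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameFeatures:
    """Hypothetical, already-extracted video features (ratios in 0..1)."""
    has_black_bars: bool      # frame information
    green_ratio: float        # share of lawn-like green pixels
    motion: float             # amount of movement
    solid_color_ratio: float  # share of flat, solid-colored area
    skin_ratio: float         # share of skin-colored pixels
    mean_brightness: float    # overall brightness
    center_brightness: float  # brightness of the central region


def candidate_patterns(f: FrameFeatures) -> List[str]:
    """Return every scene pattern whose qualitative criterion is met."""
    candidates = []
    if f.has_black_bars:
        candidates.append("movie")
    if f.motion > 0.6 or f.green_ratio > 0.4:
        candidates.append("sports")
    if f.solid_color_ratio > 0.5:
        candidates.append("animation")
    if f.skin_ratio > 0.3 and f.motion < 0.2:
        candidates.append("studio")
    if f.mean_brightness < 0.25 and f.center_brightness > 0.6:
        candidates.append("live")
    return candidates
```

Because the criteria are independent, a frame can satisfy several of them at once, which is exactly the ambiguous case that the longer cross-fade time is meant to absorb.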
  • FIG. 3 is a block diagram illustrating a basic configuration of the audio adjustment unit 222 according to the first embodiment.
  • FIG. 4 is a correspondence diagram illustrating a setting example of a parameter for audio adjustment applied to each pattern according to the first embodiment.
  • The audio adjustment unit 222 includes a low frequency increase / decrease unit 2221 that selectively increases / decreases the low-frequency gain of the input audio signal, a mid-range increase / decrease unit 2222 that selectively increases / decreases the mid-range gain, and a high frequency increase / decrease unit 2223 that selectively increases / decreases the high-frequency gain.
  • The gain in each of the low frequency increase / decrease unit 2221 to the high frequency increase / decrease unit 2223 is set and controlled by the audio control unit 235 so as to correspond to the pattern determined by the pattern determination unit 233, for example as shown in FIG. 4.
  • the setting value of the audio adjustment parameter applied to each pattern is shown as “h000”, and the larger the setting value, the higher the gain or the applicability.
  • the adder 2228 integrates the signals output from each of the low frequency increasing / decreasing unit 2221 to the high frequency increasing / decreasing unit 2223.
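The structure of three per-band gain stages followed by an adder can be sketched as follows in Python. The band split uses simple one-pole low-pass filters and the crossover frequencies are hypothetical; the application does not say how units 2221 to 2223 isolate their bands, so this is only one way to realize the idea.

```python
import math
from typing import List


def one_pole_lowpass(x: List[float], cutoff_hz: float, fs: float) -> List[float]:
    """Simple one-pole low-pass filter: y[n] = (1 - a) * x[n] + a * y[n-1]."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, out = 0.0, []
    for sample in x:
        y = (1.0 - a) * sample + a * y
        out.append(y)
    return out


def three_band_adjust(x: List[float], fs: float,
                      g_low: float, g_mid: float, g_high: float,
                      f_low: float = 200.0, f_high: float = 4000.0) -> List[float]:
    """Split x into low/mid/high bands, scale each band, and sum the result.

    The split is complementary (low + mid + high == x), so unity gains
    return the input unchanged, up to rounding.
    """
    lp_low = one_pole_lowpass(x, f_low, fs)    # content below f_low
    lp_high = one_pole_lowpass(x, f_high, fs)  # content below f_high
    out = []
    for sample, lo, lh in zip(x, lp_low, lp_high):
        low, mid, high = lo, lh - lo, sample - lh
        out.append(g_low * low + g_mid * mid + g_high * high)  # adder stage
    return out
```

For a "live" scene, for instance, a caller might pass a `g_low` above 1 to emphasize the low band, with the actual values taken from whatever preset table plays the role of FIG. 4.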
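  • As acoustic processing, the audio adjustment unit 222 further includes a three-dimensional sound field feeling processing unit 2224 that gives a three-dimensional sound field feeling to the integrated audio signal, a special low frequency enhancement processing unit 2225 that enhances the low-frequency range of the audio signal by a special method, and a dialogue clear processing unit 2226 that makes the dialogue in the audio signal clearly audible.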
  • FIG. 3 also shows detailed configurations of these parts.
  • The left-channel audio signal L_in and the right-channel audio signal R_in are each input to the three-dimensional sound field feeling processing unit 2224.
  • The input audio signals L_in and R_in are integrated by the adder 22241, and reverberation sound (or various delay signals) is generated by the reverberation sound generator 22242. The spread component generation processing unit 22243 then varies the coefficients a_1, a_2, a_3, ..., a_n and b_1, b_2, b_3, ..., b_n to expand the reverberation time and adjust the spatial spread of the reverberation sound, generating a spread component for each of the left and right channels.
  • The spread component generated for the left channel and the audio signal output from the variable gain increase / decrease unit 22244 are integrated by the adder 22245 and output as the left-channel audio signal L_OUT.
  • The spread component generated for the right channel and the audio signal output from the variable gain increase / decrease unit 22246 are integrated by the adder 22247 and output as the right-channel audio signal R_OUT.
  • the three-dimensional sound field feeling processing unit 2224 gives a three-dimensional sound field feeling to the input audio signal.
  • the special low frequency enhancement processing unit 2225 receives the audio signal In.
  • a bass sound is extracted from the input audio signal In by the bass sound extraction unit 22251, and a harmonic component is generated by the harmonic component generation unit 22252.
  • the generated overtone and the audio signal output from the variable gain increase / decrease unit 22254 are integrated by the adder 22255 and output as an audio signal Out.
  • the special low frequency enhancement processing unit 2225 enhances the low frequency of the input audio signal by a special method, and improves the low-frequency feeling using the virtual pitch effect.
  • The left-channel audio signal L_in and the right-channel audio signal R_in are each input to the dialogue clear processing unit 2226.
  • In general, dialogue is often localized at the center between left and right. When the difference between the signal level of the left audio signal and that of the right audio signal in a certain frequency band is very small, the sound in that band is therefore very likely to be dialogue (see, for example, Japanese Patent Laid-Open No. 2007-158873). Accordingly, the input audio signals L_in and R_in are integrated by the adder 22261, and a frequency band with a high probability of containing dialogue is identified and filtered by the equalizer 22262.
  • The variable gain increase / decrease units 22263 to 22267 adjust the gain of each of the input audio signals L_in and R_in and of the audio signal in the specific band filtered by the equalizer 22262.
  • The audio signal in the specific band, filtered by the equalizer 22262 and gain-adjusted by the variable gain increase / decrease unit 22265, contains the audio signals of both the left and right channels. Therefore, for the left channel, when the audio signal in the specific band is added to the left-channel audio signal in the adder 22268, the right-channel audio signal whose gain has been adjusted by the variable gain increase / decrease unit 22264 is subtracted. As a result, the left-channel audio signal L_OUT in which the dialogue is emphasized is output. Similarly, the right-channel audio signal R_OUT in which the dialogue is emphasized is output for the right channel. In this way, the dialogue clear processing unit 2226 makes the dialogue in the input audio signal clearly audible.
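The signal flow described above (sum the channels, filter the dialogue band, add it back, subtract the opposite channel) can be written per sample as in the Python sketch below. The gain values, the function name, and the caller-supplied `band_filter` are assumptions for illustration; the exact wiring of the variable gain units 22263 to 22267 is shown only in FIG. 3 and is not reproduced here.

```python
from typing import Callable, List, Tuple

Signal = List[float]


def dialogue_clear(left: Signal, right: Signal,
                   band_filter: Callable[[Signal], Signal],
                   g_direct: float = 1.0,
                   g_center: float = 0.5,
                   g_cross: float = 0.25) -> Tuple[Signal, Signal]:
    """Emphasize center-localized dialogue in a stereo signal.

    band_filter stands in for the equalizer that keeps only the band
    judged likely to contain dialogue (hypothetical, supplied by the caller).
    """
    center = [l + r for l, r in zip(left, right)]  # adder: L_in + R_in
    speech = band_filter(center)                   # equalizer: dialogue band
    # Add the dialogue band to each channel and subtract the scaled
    # opposite channel, since the band signal contains both channels.
    l_out = [g_direct * l + g_center * s - g_cross * r
             for l, r, s in zip(left, right, speech)]
    r_out = [g_direct * r + g_center * s - g_cross * l
             for l, r, s in zip(left, right, speech)]
    return l_out, r_out
```

With an identity filter and the default gains, a sample that is identical in both channels (center-localized) comes out scaled by 1.75, while a sample present only in the left channel comes out of the left output scaled by 1.5, so centered content is emphasized relative to side content.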
  • The gain or degree of application in each of the three-dimensional sound field feeling processing unit 2224 to the dialogue clear processing unit 2226 is set and controlled by the audio control unit 235 so as to correspond to the pattern determined by the pattern determining unit 233, for example as shown in FIG. 4.
  • FIG. 5 is a characteristic diagram illustrating frequency characteristics related to audio adjustment applied to each pattern according to the first embodiment.
  • the frequency characteristics related to the audio adjustment applied to each pattern are different for each pattern.
  • the setting idea is as shown in FIG.
  • the audio control unit 235 controls the audio adjustment unit 222 so that the frequency characteristic suitable for the pattern determined by the analysis on the video signal or the like is obtained.
  • The basic operation of the playback apparatus 2 according to the first embodiment will be described based on the flowchart of FIG. 6, with reference to FIGS. 7 to 9.
  • The audio control unit 235 resets the audio adjustment parameters (step S10). That is, the setting values are returned to those of the leftmost “standard” column shown in FIG. 4.
  • The video analysis unit 231 analyzes the video signal extracted by the video extraction unit 211 qualitatively or quantitatively for each of one or a plurality of video signal parameters (step S20). Based on this analysis result, the pattern discrimination unit 233 discriminates the pattern suitable for the scene of the content to be reproduced.
  • the audio control unit 235 performs an audio cross-fade process that will be described in detail below in order to reduce discomfort to the viewer 4 (step S40).
  • The video control unit 234 may perform a video cross-fade process.
  • The audio control unit 235 refers to the correspondence diagram shown in FIG. 4, stored in advance in the storage unit 24, and sets the audio adjustment parameters corresponding to the determined pattern as the target (step S41).
  • the target is reset from each value written in the “live” column to each value written in the “studio” column.
  • The audio control unit 235 increases or decreases the audio adjustment parameters stepwise so that they approach the set target (step S42).
  • The process waits for a predetermined period (for example, 100 ms or less) (step S43), and repeats until the current values of the audio adjustment parameters reach the target setting values (step S44).
  • the determination by the pattern determination unit 233 is preferably performed in parallel even during the audio cross-fading process. As a result, if the determination result changes, the target of the audio adjustment parameter is reset, and the audio cross-fading process is further performed toward the set value.
  • FIG. 7 is a timing chart showing the comparison between the comparative example and the first embodiment regarding the audio adjustment corresponding to the result of determining the scene pattern.
  • the technology according to the comparative example automatically switches the type of audio adjustment to be applied according to the “program genre” information recorded in advance in the electronic program information regarding the content. Then, as shown in the second row from the top in FIG. 7, it is impossible to distinguish between a live scene and a studio scene in the same content. Therefore, as shown in the third row from the top in FIG. 7, even if the content to be played changes from a live scene to a studio scene, the type of sound adjustment corresponding to the live scene remains applied. Therefore, the viewer 4 feels uncomfortable.
  • The video analysis unit 231 of the playback device 2 qualitatively or quantitatively analyzes the video signal extracted by the video extraction unit 211 for each of one or a plurality of video signal parameters (step S20). And based on this analysis result, the pattern discrimination
  • the audio control unit 235 controls the audio adjustment unit 222 so as to perform the type of audio adjustment corresponding to the determined pattern on the input audio signal. As a result, it is possible to suitably reduce the uncomfortable feeling felt by the viewer due to the mismatch between the video and the audio.
  • FIG. 8 is a timing chart showing crossfade processing for each of video adjustment and audio adjustment corresponding to the result of determining the scene pattern according to the first embodiment.
  • Video cross-fade processing and audio cross-fade processing are performed over predetermined periods T1 and T2, respectively.
  • FIG. 9 is a characteristic diagram showing the audio cross-fading process according to the first embodiment.
  • The audio control unit 235 increases or decreases the audio adjustment parameters stepwise (for example, in three steps) so that they gradually approach the target setting values (in this case, the values in the "Studio" column of FIG. 4).
  • The audio control unit 235 controls the audio adjustment unit 222 so as to apply the type of audio adjustment corresponding to the determined pattern to the input audio signal.
  • The audio cross-fade process brings the audio adjustment parameters to the predetermined values gradually over a predetermined period rather than all at once, so the discomfort felt by the viewer 4 can be reduced.
  • Because video cross-fade processing is performed at the time of the video adjustment, the discomfort felt by the viewer 4 due to the mismatch between video and audio can be reduced more suitably.
  • FIG. 10 is a flowchart showing the basic operation of the playback apparatus 2 according to the second embodiment.
  • The playback device 2 performs audio analysis (step S21) in addition to video analysis (step S20).
  • The audio signal is analyzed for each audio signal parameter, such as rhythm, gain for each range, information indicating whether or not music is being played, and information indicating whether or not human speech is included. As a result, the accuracy of pattern determination is improved as follows.
  • FIG. 11 is a timing chart showing a comparison between the comparative example, the first example, and the second example regarding the result of determining the scene pattern and the sound adjustment corresponding to the result.
  • Because the type of audio adjustment corresponding to the live scene is still applied, the viewer 4 feels uncomfortable.
  • The playback apparatus 2 performs video analysis (step S20) as shown in the fourth to sixth stages from the top in FIG., so that a change from a live scene to a studio scene is suitably determined and the corresponding type of audio adjustment is performed.
  • The playback apparatus 2 performs only video analysis (step S20), in which it analyzes whether or not the ratio of skin color in the screen area is greater than a predetermined threshold. Therefore, as shown in the fourth and fifth stages from the top in FIG. 11, a live scene may be erroneously determined to be a studio scene even though the shot merely shows the vocalist. If audio adjustment is performed based on this determination result, the viewer 4 still feels uncomfortable.
  • the playback apparatus 2 analyzes the audio signal for each audio signal parameter as shown in the seventh and eighth stages from the top in FIG.
  • the voice analysis unit 232 also analyzes whether or not music is being played. Thereby, since it is confirmed that the music is played, it can avoid that the pattern discrimination
  • According to the second embodiment, by also analyzing the audio signal, patterns that could not be narrowed down to one from the video signal alone can be narrowed down to fewer candidates, and a pattern that was determined erroneously can be re-determined appropriately.
  • FIG. 12 is a flowchart showing the basic operation of the playback apparatus 2 according to the third embodiment.
  • FIG. 13 is a timing chart showing a comparison between the first embodiment and the third embodiment regarding the audio adjustment corresponding to the result of determining the scene pattern.
  • the same components as those of the playback apparatus 2 according to the first embodiment are denoted by the same reference numerals, and detailed description thereof is omitted as appropriate.
  • After the pattern determination unit 233 determines the scene pattern, the time spent on the audio cross-fade process is changed according to the number of patterns determined to suit the scene of the content to be played back. Specifically, it is first determined whether there are two or more patterns determined to suit the scene of the content to be played back (step S31).
  • If the result of step S31 is that there are not two or more such patterns and there is only one (step S31: No), then, as shown in the third row from the top in FIG., the predetermined time T2 spent on the subsequent audio cross-fade process is set to a predetermined standard value (step S32). As a result, in the subsequent audio cross-fade process, the type of audio adjustment corresponding to that pattern is applied without hesitation (steps S40 to S44).
  • If the result of step S31 is that there are two or more patterns determined to suit the scene of the content to be played back (step S31: Yes), then, as shown in the fifth row from the top in FIG., the predetermined time T2 is made longer than the standard value by ΔT2 (step S33). The type of audio adjustment corresponding to one of the two or more patterns is thereby applied relatively slowly. Then, even if the next pattern determination (step S30), performed partway through the audio cross-fade process, determines that the content being played back belongs to another pattern, the appropriate type of audio adjustment corresponding to that other pattern is suitably applied to the input audio signal.
  • According to the third embodiment, when a plurality of patterns are determined and it cannot be narrowed down which of them is optimal, the time T2 spent on the audio cross-fade process is, for example, made relatively long. This avoids rapidly applying the type of audio adjustment corresponding to any one pattern, so the uncertainty of the pattern determination accuracy can be absorbed.
  • As shown in the second and fourth rows from the top of FIG., the lower limit of the time T1 spent on the video cross-fade process is set longer than the lower limit of the time T2 spent on the audio cross-fade process.
  • The "playback apparatus 2" is a specific example of the "playback apparatus" according to the present invention.
  • The "video analysis unit 231" and/or the "audio analysis unit 232" is a specific example of the "analysis means" according to the present invention.
  • The "pattern determination unit 233" is a specific example of the "pattern determination means" according to the present invention.
  • The "audio control unit 235" is a specific example of the "audio control means" according to the present invention.
  • The "video control unit 234" is a specific example of the "video control means" according to the present invention.
  • The "audio adjustment unit 222" is a specific example of the "audio adjustment means" according to the present invention.
  • The "video adjustment unit 212" is a specific example of the "video adjustment means" according to the present invention.
  • "Step S20" and/or "step S21" is a specific example of the "analysis step" according to the present invention.
  • "Step S30" is a specific example of the "pattern determination step" according to the present invention.
  • "Steps S40 to S44" are a specific example of the "audio control step" according to the present invention.
  • The playback apparatus and method and the computer program according to the present invention are applicable to playback apparatuses and methods, and computer programs, for playing back content composed of video signals and audio signals, for example in a flat display with speakers, an AV (Audio Visual) system, a television apparatus, or a BD (Blu-ray Disc) player.
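The loop of steps S41 to S44 described above, including re-targeting when the determination result changes partway through the fade, can be sketched roughly as follows. This is a minimal illustration only: the parameter name bass_gain_db, the step size, and the target values are hypothetical and are not taken from FIG. 4, and a real device would wait a predetermined period (for example, 100 ms or less, step S43) between ticks.

```python
from typing import Dict, Iterable, List


def step_toward(current: float, target: float, step: float) -> float:
    """Move one value toward its target by at most `step` (step S42)."""
    if abs(target - current) <= step:
        return target
    return current + step if target > current else current - step


def run_crossfade(
    params: Dict[str, float],
    targets_per_tick: Iterable[Dict[str, float]],
    step: float = 1.0,
) -> List[Dict[str, float]]:
    """Apply one stepwise update per tick.

    `targets_per_tick` supplies the target chosen at each tick (step S41),
    so a change in the determined pattern simply changes the target.
    """
    history = []
    for target in targets_per_tick:
        params = {name: step_toward(params[name], target[name], step) for name in params}
        history.append(dict(params))
    return history


# Hypothetical targets for two patterns.
live = {"bass_gain_db": 3.0}
studio = {"bass_gain_db": 0.0}
history = run_crossfade({"bass_gain_db": 0.0}, [live, live, studio, studio])
```

In this run the determined pattern changes from live to studio after two ticks, so the parameter rises toward the live target and then turns back toward the studio target without a sudden jump.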

Abstract

A reproducing device (2) comprises: analysis means (231, 232) for analyzing an input video signal for each of one or more video signal parameters; pattern distinguishing means (233) for distinguishing a pattern suitable for each scene of the content to be reproduced according to the goodness of fit between the video signal parameters of the analyzed video signal and the video signal parameters associated with each of a plurality of predetermined patterns; audio adjusting means (222) for applying predetermined audio adjustment to an audio signal input to the reproducing device; and audio control means (235) for controlling the audio adjusting means so as to apply, to the input audio signal, the type of audio adjustment corresponding to the distinguished pattern.

Description

Playback apparatus and method, and computer program
The present invention relates to the technical field of playback apparatuses and methods, and computer programs, for playing back content composed of video signals and audio signals, as in a flat display with speakers, an AV (Audio Visual) system, a television apparatus, a DVD player, or a BD (Blu-ray Disc) player.
In this type of playback technology, content composed of video signals and audio signals is played back by outputting video based on the video signals from a display and audio based on the audio signals from a speaker.
When viewing such content, the viewer may feel considerable discomfort if the sound quality does not suit the content. Consider, for example, a news program and a music program. In general, the audio reproduction frequency band of a music program extends further into the low and high ranges than that of a news program. The playback apparatus therefore applies audio adjustment that emphasizes the low and high ranges to match the audio reproduction frequency band of the music program, so that the viewer can enjoy the music program with high sound quality. However, if the viewer then watches a news program without changing the type of audio adjustment, the bass is emphasized too much and human voices may become hard to hear.
In such a case, the viewer normally has to watch the content and check what it is, decide on a suitable type of audio adjustment, and then switch the applied type manually, for example by pressing a sound quality selector switch, which is a nuisance.
To eliminate this nuisance, Patent Document 1 discloses a technique in which a playback apparatus refers to electronic program information (Electronic Program Guide: EPG) and automatically switches the type of audio adjustment to be applied according to the "program genre" information recorded in advance in the electronic program information for the content being played back.
However, the technique disclosed in Patent Document 1 can have the following problem. For a single piece of content the "program genre" information is the same, so the same audio adjustment is applied throughout. Yet even within a single piece of content, a live scene in which music is performed and a studio scene in which performers talk may be mixed, so that for some scenes an inappropriate audio adjustment may be applied, one in which the audio does not suit the video (that is, one that makes the viewer feel uncomfortable).
JP 2005-318225 A
The present invention has been made in view of, for example, the problem described above. An object of the present invention is to provide a playback apparatus and method, and a computer program, that can suitably reduce the discomfort a viewer feels due to a mismatch between video and audio when content composed of video signals and audio signals is played back.

(Playback apparatus)
In order to solve the above problem, a playback apparatus according to the present invention is a playback apparatus that plays back content composed of a video signal and an audio signal, comprising: analysis means for analyzing the video signal input to the playback apparatus for each of one or more video signal parameters; pattern determination means for determining a pattern that suits each scene of the content to be played back, based on the degree of conformity between each video signal parameter of the analyzed video signal and each video signal parameter of each of a plurality of predetermined patterns; audio adjustment means for applying predetermined audio adjustment to the audio signal input to the playback apparatus; and audio control means for controlling the audio adjustment means so as to apply, to the input audio signal, the type of audio adjustment corresponding to the determined pattern.
According to the playback apparatus of the present invention, when content composed of a video signal and an audio signal is played back, the discomfort the viewer feels due to a mismatch between video and audio can be suitably reduced, as follows.
Content composed of a video signal and an audio signal means a program that is perceived through the viewer's sight and hearing, such as a movie, sports, animation, studio, or live program. When this content is played back, the video signal and the audio signal are sequentially input to the playback apparatus.
The analysis means comprises, for example, a signal extraction circuit and an arithmetic and storage circuit, and analyzes the video signal input to the playback apparatus for each of one or more video signal parameters. A video signal parameter is a parameter that characterizes the video from some viewpoint, such as luminance information (for example, whether the images are bright or dark), color information (for example, whether the colors are primary or intermediate, or how many colors there are), motion information (for example, whether the video is static or dynamic), and frame information (for example, whether black bars are present) for the plurality of images making up the video. "Analyzing for each video signal parameter" is a general term for performing qualitative or quantitative analysis for each video signal parameter.
The pattern determination means comprises, for example, an arithmetic and storage circuit, and determines a pattern that suits each scene of the content to be played back, based on the degree of conformity between each video signal parameter of the analyzed video signal and each video signal parameter of each of a plurality of predetermined patterns. For example, the video signal parameters of the video signal are compared with the video signal parameters of each of a plurality of predetermined patterns stored in advance in a storage circuit, and the video signal is determined to fall under a pattern that has at least a predetermined number of matching parameters, or under the pattern that has the most matching parameters. The video signal subject to determination may be a single frame of the video or a plurality of frames spanning a predetermined period.
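The comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the pattern names, parameter names, and expected values in the reference table are hypothetical and are not taken from the patent, and counting exact matches is just one simple way to express a degree of conformity.

```python
from typing import Dict, List, Optional

# Hypothetical reference table: each predetermined pattern lists the value it
# expects for each video signal parameter. Names and values are illustrative.
PATTERNS: Dict[str, Dict[str, object]] = {
    "live": {"luminance": "dark", "color": "primary", "motion": "dynamic", "black_bars": False},
    "studio": {"luminance": "bright", "color": "intermediate", "motion": "static", "black_bars": False},
    "movie": {"luminance": "dark", "color": "intermediate", "motion": "dynamic", "black_bars": True},
}


def count_matches(analyzed: Dict[str, object], reference: Dict[str, object]) -> int:
    """Number of parameters whose analyzed value equals the pattern's value."""
    return sum(1 for name, value in reference.items() if analyzed.get(name) == value)


def determine_patterns(
    analyzed: Dict[str, object],
    patterns: Dict[str, Dict[str, object]] = PATTERNS,
    min_matches: Optional[int] = None,
) -> List[str]:
    """Return the patterns that suit the analyzed video signal.

    With `min_matches`, every pattern with at least that many matching
    parameters is returned; otherwise the pattern(s) with the most matches.
    """
    scores = {name: count_matches(analyzed, ref) for name, ref in patterns.items()}
    if min_matches is not None:
        return [name for name, score in scores.items() if score >= min_matches]
    best = max(scores.values())
    return [name for name, score in scores.items() if score == best]
```

Note that the function can return more than one pattern when the analysis is ambiguous, which is the situation the later aspects handle by lengthening the cross-fade.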
The audio adjustment means applies predetermined audio adjustment to the audio signal input to the playback apparatus. Here, audio adjustment is a general term for processing that can change, by some technique, the impression that the audio output based on the audio signal gives the viewer, such as gain increase or decrease for each band or predetermined acoustic processing.
The audio control means is, for example, a control unit, and controls the audio adjustment means so as to apply, to the input audio signal, the type of audio adjustment corresponding to the determined pattern. For example, when the pattern that suits the scene of the content being played back is determined to be live, the audio adjustment means is controlled so as to apply, to the input audio signal, audio adjustment that relatively emphasizes the low range. Thereafter, regularly or irregularly, and preferably in real time, the analysis means and the pattern determination means determine again the pattern that suits the scene of the content being played back. If the pattern is then determined to be studio, for example, the audio adjustment means is controlled so as to apply, to the input audio signal, audio adjustment that relatively emphasizes the high range.
In this way, even if a live scene in which music is performed and a studio scene in which performers talk are mixed within a single piece of content, suitable audio adjustment is applied automatically for each scene. As a result, the discomfort the viewer feels due to a mismatch between video and audio can be suitably reduced, and the viewer can experience the atmosphere of each scene more vividly.
In one aspect of the playback apparatus according to the present invention, when applying the predetermined audio adjustment to the audio signal, the audio control means controls the audio adjustment means so as to perform an audio cross-fade process in which the audio adjustment parameters applied before the predetermined audio adjustment are changed step by step toward the parameters for the predetermined audio adjustment.
According to this aspect, when a different audio adjustment is applied, the audio cross-fade process brings the parameters to the predetermined audio adjustment parameters step by step over a predetermined period rather than all at once, so that discomfort can be suitably reduced.
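A stepwise change of this kind might be sketched as below. The linear interpolation, the parameter name bass_gain_db, and the number of steps are assumptions made for illustration; the text only requires that the parameters approach the target in stages rather than in a single jump.

```python
from typing import Dict, Iterator


def crossfade_steps(
    current: Dict[str, float], target: Dict[str, float], steps: int
) -> Iterator[Dict[str, float]]:
    """Yield `steps` intermediate parameter sets, ending exactly at `target`."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    for i in range(1, steps + 1):
        if i == steps:
            # Emit the target itself so rounding never leaves a residual offset.
            yield dict(target)
        else:
            yield {
                name: current[name] + (target[name] - current[name]) * i / steps
                for name in target
            }


# Hypothetical example: fade a bass gain from 6 dB down to 0 dB in three steps.
stages = list(crossfade_steps({"bass_gain_db": 6.0}, {"bass_gain_db": 0.0}, 3))
```

A caller would apply each yielded parameter set to the audio adjustment stage and wait a short interval before applying the next one.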
In this aspect, when performing the audio cross-fade process, the audio control means may change the time spent on the audio cross-fade process according to the number of patterns determined to suit the scene of the content to be played back.
According to this aspect, when a plurality of patterns are determined and it cannot be narrowed down which of them is optimal, the time spent on the audio cross-fade process is, for example, made relatively long. This avoids rapidly applying the type of audio adjustment corresponding to any one pattern, so the uncertainty of the pattern determination accuracy can be absorbed.
Further, in the aspect in which the time spent on the audio cross-fade process is changed in this way, the audio control means may, when performing the audio cross-fade process, make the time spent on it longer when two or more patterns are determined to suit the scene of the content to be played back than when only one pattern is so determined.
According to this aspect, the uncertainty of the pattern determination accuracy can be absorbed, as described above. More specifically, pattern determination is not necessarily perfectly accurate, and the number of patterns may be narrowed down further by the result of the next pattern determination, which is performed regularly or irregularly. Therefore, when only one pattern is determined to suit the scene of the content being played back, the time spent on the audio cross-fade process is kept at the standard value and the type of audio adjustment corresponding to that pattern is applied without hesitation. When two or more patterns are determined, the time spent on the audio cross-fade process is made longer than the standard value, and the type of audio adjustment corresponding to one of those patterns is applied relatively slowly. Then, even if the next pattern determination, performed partway through the audio cross-fade process, determines that the content being played back belongs to another pattern, the appropriate type of audio adjustment corresponding to that other pattern is suitably applied to the input audio signal. In this way, when it cannot be narrowed down which pattern is optimal, rapidly applying the type of audio adjustment corresponding to any one pattern is avoided, so the uncertainty of the pattern determination accuracy can be absorbed.
Alternatively, the aspect in which the time spent on the audio cross-fade process is changed in this way may further comprise: video adjustment means for applying predetermined video adjustment to the video signal input to the playback apparatus; and video control means for controlling the video adjustment means so as to apply, to the input video signal, the type of video adjustment corresponding to the determined pattern.
According to this aspect, video adjustment is performed in addition to audio adjustment, so the discomfort the viewer feels due to a mismatch between video and audio can be reduced more suitably.
In the aspect further comprising the video adjustment means and the video control means, the video control means may also control the video adjustment means so as to perform, in the predetermined video adjustment, a video cross-fade process in which the video adjustment parameters applied before the predetermined video adjustment are changed step by step toward the parameters for the predetermined video adjustment.
According to this aspect, a video cross-fade process is performed at the time of video adjustment, so the discomfort the viewer feels due to a mismatch between video and audio can be reduced more suitably.
In the aspect in which the video cross-fade process is performed, the video control means may, when performing the video cross-fade process, change the time spent on the video cross-fade process according to the number of patterns determined to suit the scene of the content to be played back, and the lower limit of the time spent on the video cross-fade process may be set longer than the lower limit of the time spent on the audio cross-fade process.
According to this aspect, in view of the fact that the viewer feels greater discomfort when the video switches abruptly than when the audio switches abruptly, the lower limit of the time spent on the video cross-fade process is set longer than the lower limit of the time spent on the audio cross-fade process, so the discomfort the viewer feels can be reduced more suitably. For example, when only one pattern is determined to suit the scene of the content being played back, the time spent on the audio cross-fade process is made relatively shorter than when two or more patterns are determined (that is, it is returned to the standard value), while the time spent on the video cross-fade process is set longer than that standard value. The discomfort the viewer feels can thereby be reduced more suitably.
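The relationship between the two cross-fade times can be sketched as follows. All durations are hypothetical values chosen for illustration (the text gives no concrete figures for the standard value or the extension), and applying the same extension to both audio and video is an assumption.

```python
from typing import NamedTuple


class FadeTimes(NamedTuple):
    audio_ms: int
    video_ms: int


def fade_times(
    num_candidates: int,
    audio_min_ms: int = 300,   # hypothetical standard (lower-limit) audio fade time
    video_min_ms: int = 600,   # hypothetical lower limit for the video fade time
    extension_ms: int = 600,   # hypothetical extension when the pattern is ambiguous
) -> FadeTimes:
    """Choose cross-fade times from the number of patterns judged to suit the scene."""
    if num_candidates < 1:
        raise ValueError("at least one candidate pattern is required")
    if video_min_ms <= audio_min_ms:
        raise ValueError("the video lower limit must exceed the audio lower limit")
    # One candidate: use the lower limits. Two or more: lengthen both fades.
    extra = 0 if num_candidates == 1 else extension_ms
    return FadeTimes(audio_min_ms + extra, video_min_ms + extra)
```

With a single candidate pattern the audio fade returns to its standard value while the video fade stays longer; with several candidates both fades are lengthened so that a later re-determination can still redirect them.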
In another aspect of the playback apparatus according to the present invention, the analysis means analyzes, in addition to the video signal input to the playback apparatus, the audio signal for each of one or more audio signal parameters, and the pattern determination means determines the pattern that suits the scene of the content to be played back based on the degree of conformity between each video signal parameter of the analyzed video signal and each video signal parameter of each of the plurality of predetermined patterns, and on the degree of conformity between each audio signal parameter of the analyzed audio signal and each audio signal parameter of each of the plurality of predetermined patterns.
According to this aspect, in addition to the video signal input to the playback apparatus, the audio signal is analyzed for each of one or more audio signal parameters. An audio signal parameter is a parameter that characterizes the audio from some viewpoint, such as rhythm, gain for each range, information indicating whether or not music is being played, and information indicating whether or not human speech is included. The pattern that suits the scene of the content being played back is then determined based on both the degree of conformity of the video signal parameters and the degree of conformity of the audio signal parameters with those of each of the plurality of predetermined patterns. By also analyzing the audio signal in this way, patterns that could not be narrowed down to one from the video signal alone can be narrowed down to fewer candidates, and a pattern that was determined erroneously can be re-determined appropriately.
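One way to combine the two degrees of conformity is sketched below. Adding per-pattern match counts with equal weight is an assumption made for illustration; the text does not specify how the video and audio results are combined, and the scores in the example are hypothetical.

```python
from typing import Dict, List


def combine_scores(video_scores: Dict[str, int], audio_scores: Dict[str, int]) -> Dict[str, int]:
    """Add the per-pattern match counts from video and audio analysis."""
    names = set(video_scores) | set(audio_scores)
    return {name: video_scores.get(name, 0) + audio_scores.get(name, 0) for name in names}


def best_patterns(scores: Dict[str, int]) -> List[str]:
    """Return the pattern(s) with the highest score, sorted by name."""
    best = max(scores.values())
    return sorted(name for name, score in scores.items() if score == best)


# Hypothetical scores: a close-up of a vocalist makes the video look like a
# studio scene, but the audio analysis detects that music is being played.
video = {"live": 2, "studio": 3}
audio = {"live": 2, "studio": 0}
combined = combine_scores(video, audio)
```

Here the video scores alone would select the studio pattern, while the combined scores select the live pattern, which mirrors the correction described for the second embodiment.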
(Playback method)
In order to solve the above problems, the playback method of the present invention is a playback method for playing back content composed of a video signal and an audio signal, and comprises: an analysis step of analyzing the input video signal for each of one or more video signal parameters; a pattern discrimination step of determining the pattern that fits each scene of the content being played back, based on the degree of fit between each video signal parameter of the analyzed video signal and the corresponding video signal parameter of each of a predetermined plurality of patterns; an audio adjustment step of applying a predetermined audio adjustment to the input audio signal; and an audio control step of controlling the audio adjustment means so that the type of audio adjustment corresponding to the determined pattern is applied to the input audio signal.
According to the playback method of the present invention, as with the playback device of the present invention described above, when content composed of a video signal and an audio signal is played back, the sense of incongruity that the viewer feels because of a mismatch between video and audio can be suitably reduced.
The playback method of the present invention can also adopt various aspects similar to the various aspects of the playback device of the present invention described above.
(Computer program)
In order to solve the above problems, the computer program of the present invention causes a computer to function as the playback device of the present invention described above (including its various aspects).
According to the computer program of the present invention, the playback device of the present invention described above (including its various aspects) can be constructed relatively easily, either by loading the computer program into a computer from a recording medium that stores it, such as a CD-ROM or DVD-ROM, and executing it, or by downloading the computer program through communication means and then executing it. As a result, as with the playback device of the present invention described above, when content composed of a video signal and an audio signal is played back, the sense of incongruity that the viewer feels because of a mismatch between video and audio can be suitably reduced.
The operation and other advantages of the present invention will become clear from the embodiments described below.
A block diagram showing the basic configuration of the playback device 2 according to the first embodiment.
A correspondence diagram showing, for the playback device 2 according to the first embodiment, the cues used to determine the scene pattern and the video adjustment and audio adjustment applied to each pattern.
A block diagram showing the basic configuration of the audio adjustment unit 222 according to the first embodiment.
A correspondence diagram showing example settings of the audio adjustment parameters applied to each pattern according to the first embodiment.
A characteristic diagram showing the frequency characteristics of the audio adjustment applied to each pattern according to the first embodiment.
A flowchart showing the basic operation of the playback device 2 according to the first embodiment.
A timing chart comparing a comparative example with the first embodiment with respect to the audio adjustment corresponding to the result of determining the scene pattern.
A timing chart showing the crossfade processing for each of the video adjustment and the audio adjustment corresponding to the result of determining the scene pattern according to the first embodiment.
A characteristic diagram showing the audio crossfade processing according to the first embodiment.
A flowchart showing the basic operation of the playback device 2 according to the second embodiment.
A timing chart comparing a comparative example, the first embodiment, and the second embodiment with respect to the result of determining the scene pattern and the audio adjustment corresponding to that result.
A flowchart showing the basic operation of the playback device 2 according to the third embodiment.
A timing chart comparing the first embodiment with the third embodiment with respect to the audio adjustment corresponding to the result of determining the scene pattern.
Explanation of symbols
1 Source device
2 Playback device
211 Video extraction unit
212 Video adjustment unit
221 Audio extraction unit
222 Audio adjustment unit
23 Control unit
231 Video analysis unit
232 Audio analysis unit
233 Pattern discrimination unit
234 Video control unit
235 Audio control unit
24 Storage unit
25 Operation unit
31 Display unit
32 Audio output unit
4 Viewer
Hereinafter, the best mode for carrying out the present invention is described embodiment by embodiment, in order, with reference to the drawings.
(1) First embodiment
The configuration and operation of the playback device 2 according to the first embodiment are described with reference to FIGS. 1 to 9.
FIG. 1 is a block diagram showing the basic configuration of the playback device 2 according to the first embodiment.
As shown in FIG. 1, the playback device 2 analyzes the video signal and the audio signal input from the source device 1, applies predetermined adjustments based on the analysis results, and outputs the video signal to the display unit 31 and the audio signal to the audio output unit 32. As a result, when content composed of a video signal and an audio signal is played back, the sense of incongruity that the viewer 4 feels because of a mismatch between video and audio can be suitably reduced.
The source device 1 is a device, such as a DVD player, a BD player, or a set-top box, that obtains the video signal and the audio signal constituting the content to be played back from a recording medium or a communication network and outputs them to the playback device 2.
The playback device 2 includes the components described below. It analyzes the video signal and the audio signal input from the source device 1, applies predetermined adjustments based on the analysis results, and outputs the video signal to the display unit 31 and the audio signal to the audio output unit 32.
The video extraction unit 211 extracts the input video signal and outputs it to both the video analysis unit 231 provided in the control unit 23 and the video adjustment unit 212.
The audio extraction unit 221 extracts the input audio signal and outputs it to both the audio analysis unit 232 provided in the control unit 23 and the audio adjustment unit 222.
The video analysis unit 231 analyzes the extracted video signal, qualitatively or quantitatively, for each of one or more video signal parameters.
The audio analysis unit 232 analyzes the extracted audio signal, qualitatively or quantitatively, for each of one or more audio signal parameters.
The storage unit 24 stores the extracted video signal and audio signal, together with their analysis results, as video information and audio information in the form of a history covering a predetermined period. In addition, the storage unit 24 stores, as pattern discrimination information, the matching value or matching range of each video signal parameter for each of the predetermined plurality of patterns; this information is referred to when the pattern discrimination unit 233, described later, performs pattern discrimination.
The pattern discrimination unit 233 determines the pattern that fits each scene of the content being played back, based on the analysis result of at least the video analysis unit 231 out of the video analysis unit 231 and the audio analysis unit 232. For example, the pattern discrimination unit 233 compares the video signal parameters of the video signal with the pattern discrimination information stored in advance in the storage unit 24 and determines that the video signal corresponds to a pattern for which at least a predetermined number of parameters match, or to the pattern with the largest number of matching parameters.
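The count-based comparison described for the pattern discrimination unit 233 can be sketched as follows. This is a minimal illustration only: the parameter names (`skin_ratio`, `motion`, and so on), the numeric ranges, and the `min_matches` threshold are hypothetical and are not taken from the specification.

```python
# Hypothetical pattern discrimination information: for each pattern, the
# matching range (low, high) of each video signal parameter.
# All names and numbers are illustrative assumptions.
PATTERN_INFO = {
    "studio": {"skin_ratio": (0.3, 1.0), "motion": (0.0, 0.2)},
    "sports": {"green_ratio": (0.4, 1.0), "motion": (0.5, 1.0)},
    "live": {"mean_brightness": (0.0, 0.3), "center_brightness": (0.6, 1.0)},
}


def matching_count(measured, ranges):
    """Count the parameters whose measured value lies inside the stored range."""
    return sum(
        1
        for name, (low, high) in ranges.items()
        if name in measured and low <= measured[name] <= high
    )


def discriminate_pattern(measured, pattern_info, min_matches=1):
    """Return the pattern with the most matching parameters.

    A pattern is only eligible if at least `min_matches` parameters match;
    None is returned when no pattern is eligible.
    """
    best_name, best_count = None, 0
    for name, ranges in pattern_info.items():
        count = matching_count(measured, ranges)
        if count >= min_matches and count > best_count:
            best_name, best_count = name, count
    return best_name


# Analysis result for one scene (made-up values).
measured = {
    "skin_ratio": 0.45,
    "motion": 0.1,
    "green_ratio": 0.05,
    "mean_brightness": 0.5,
    "center_brightness": 0.5,
}
result = discriminate_pattern(measured, PATTERN_INFO)
```

With these made-up values only the "studio" ranges are satisfied, so `result` is `"studio"`. Raising `min_matches` above the number of parameters a pattern defines makes the function return `None`, which a caller could treat as "keep the current adjustment".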
The video control unit 234 controls the video adjustment unit 212 so that the type of video adjustment corresponding to the pattern determined by the pattern discrimination unit 233 is applied to the input video signal.
The audio control unit 235 controls the audio adjustment unit 222 so that the type of audio adjustment corresponding to the pattern determined by the pattern discrimination unit 233 is applied to the input audio signal.
The operation unit 25 has an operation button for each type of adjustment so that, in response to an operation by the viewer 4, the viewer 4 can manually specify the type of audio adjustment and/or video adjustment corresponding to a desired pattern, regardless of the pattern determined by the pattern discrimination unit 233.
The display unit 31 is a device that can output a video signal as video, such as a plasma display or a liquid crystal display, and displays the video of the content to the viewer 4 based on the video signal output from the video adjustment unit 212.
The audio output unit 32 is a device that can output an audio signal as sound, such as a speaker or headphones, and outputs the audio of the content to the viewer 4 based on the audio signal output from the audio adjustment unit 222.
Next, with reference to the correspondence diagram of FIG. 2, the cues used by the playback device 2 according to the first embodiment to determine the scene pattern, and the video adjustment and audio adjustment applied to each pattern, are described.
As shown in FIG. 2, whether the scene pattern is "movie" is determined based on the frame information of the video signal. When the scene pattern is determined to be "movie", the video adjustment applied is a setting that lets the viewer enjoy a movie without spoiling the atmosphere even in the brightest environment in the home, for example, an adjustment that cleans up the contrast and raises the overall brightness. In addition to or instead of this, the audio adjustment applied is one that balances a powerful sense of presence with clear dialogue.
Whether the scene pattern is "sports" is determined based on the degree or presence of motion in the video signal, or on color information (for example, the amount of green corresponding to turf). When the scene pattern is determined to be "sports", the video adjustment applied is a setting for watching sports at home with the room lights on, for example, an adjustment that emphasizes detail in green areas. In addition to or instead of this, the audio adjustment prioritizes the sense of presence: because such content has little low-range and high-range energy, the mid-to-high range is emphasized; in addition to or instead of this, the frequency characteristic is adjusted to cover the range that carries the most information, and surround processing is used to increase the sense of presence.
Whether the scene pattern is "animation" is determined based on whether the image contains many flat, solidly colored areas. When the scene pattern is determined to be "animation", the video adjustment applied is a setting that assumes an animation program is watched at home with the room lights on, for example, an adjustment that sets the contrast low. In addition to or instead of this, the audio adjustment targets flat-colored animation, in which dialogue is dominant, and makes the relatively weak sound effects more appealing.
Whether the scene pattern is "studio" is determined based on whether skin color occupies a large proportion of the screen area and there is little motion. When the scene pattern is determined to be "studio", the video adjustment applied is a setting that assumes a studio program is watched at home with the room lights on, for example, an adjustment that sets the contrast low. In addition to or instead of this, the audio adjustment applied is one that keeps dialogue highly intelligible even at a modest volume.
Whether the scene pattern is "live" is determined based on whether the image is dark overall with a light at the center. When the scene pattern is determined to be "live", the video adjustment applied is one that emphasizes contrast in order to convey the atmosphere of a live performance. In addition to or instead of this, the audio adjustment applied to this music-centered content is based on a hi-fi philosophy and slightly emphasizes the low range.
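As one illustration of how a cue from FIG. 2 could be computed, the following sketch evaluates the "studio" cue (a large proportion of skin color and little motion) on small RGB frames. The skin-color rule, the thresholds, and all function names are assumptions made for this example; the specification does not state how the proportion or the motion is measured.

```python
def is_skin(pixel):
    """Crude illustrative skin-color rule on an (R, G, B) pixel (assumption)."""
    r, g, b = pixel
    return r > 95 and g > 40 and b > 20 and r > g and r > b and (r - min(g, b)) > 15


def skin_ratio(frame):
    """Fraction of pixels in the frame classified as skin-colored."""
    pixels = [p for row in frame for p in row]
    return sum(1 for p in pixels if is_skin(p)) / len(pixels)


def motion_level(prev_frame, cur_frame):
    """Mean absolute per-channel difference between two frames, scaled to 0..1."""
    diffs = [
        abs(a - b)
        for row_p, row_c in zip(prev_frame, cur_frame)
        for p, c in zip(row_p, row_c)
        for a, b in zip(p, c)
    ]
    return sum(diffs) / (255 * len(diffs))


def looks_like_studio(prev_frame, cur_frame, skin_threshold=0.3, motion_threshold=0.05):
    """True when skin color is frequent and the frame barely changed."""
    return (
        skin_ratio(cur_frame) > skin_threshold
        and motion_level(prev_frame, cur_frame) < motion_threshold
    )


# Two tiny 2x2 frames with made-up colors.
skin = (220, 170, 140)
blue = (20, 20, 200)
face_frame = [[skin, blue], [blue, skin]]
blue_frame = [[blue, blue], [blue, blue]]
studio_like = looks_like_studio(face_frame, face_frame)
```

A real implementation would work on full frames and would likely use a color space better suited to skin detection; the point here is only that each cue reduces to a number that can be compared against a stored range.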
Next, the audio adjustment in particular is described further with reference to FIGS. 3 to 5.
FIG. 3 is a block diagram showing the basic configuration of the audio adjustment unit 222 according to the first embodiment.
FIG. 4 is a correspondence diagram showing example settings of the audio adjustment parameters applied to each pattern according to the first embodiment.
As shown in FIG. 3, the audio adjustment unit 222 includes, for per-band gain adjustment of the input audio signal, a low-range adjustment unit 2221 that selectively raises or lowers the low-range gain, a mid-range adjustment unit 2222 that selectively raises or lowers the mid-range gain, and a high-range adjustment unit 2223 that selectively raises or lowers the high-range gain.
The gain, or degree of application, of each of the low-range adjustment unit 2221 through the high-range adjustment unit 2223 is set by the audio control unit 235 so as to correspond to the pattern determined by the pattern discrimination unit 233, as shown for example in FIG. 4. In FIG. 4, the setting value of each audio adjustment parameter applied to each pattern is written in the form "h000"; a larger setting value indicates a higher gain or degree of application.
The addition unit 2228 then combines the signals output from the low-range adjustment unit 2221 through the high-range adjustment unit 2223.
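The per-band gain stage followed by the addition unit can be sketched as below. The band split used here (two one-pole low-pass filters whose outputs are subtracted to form the mid and high bands) and the coefficients `alpha_low` and `alpha_high` are assumptions chosen for brevity; the specification does not say how the bands are separated.

```python
def one_pole_lowpass(samples, alpha):
    """Simple one-pole low-pass filter; larger alpha passes more high frequencies."""
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def three_band_adjust(samples, low_gain, mid_gain, high_gain,
                      alpha_low=0.05, alpha_high=0.5):
    """Split into low/mid/high bands, scale each band, and sum the results."""
    lp_low = one_pole_lowpass(samples, alpha_low)    # low band
    lp_high = one_pole_lowpass(samples, alpha_high)  # low + mid bands
    out = []
    for x, lo, hi in zip(samples, lp_low, lp_high):
        low = lo
        mid = hi - lo
        high = x - hi
        # Corresponds to the three gain stages feeding the adder.
        out.append(low_gain * low + mid_gain * mid + high_gain * high)
    return out


signal = [0.0, 1.0, -0.5, 0.25, 0.75]
unchanged = three_band_adjust(signal, 1.0, 1.0, 1.0)
```

Because the three bands are built by subtraction, they sum back to the input when all gains are 1, so a "standard" setting with unity gains leaves the signal unchanged.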
In addition, the audio adjustment unit 222 includes, for acoustic processing of the combined audio signal, a stereophonic sound field processing unit 2224 that gives the signal a three-dimensional sound field, a special low-range enhancement unit 2225 that reinforces the low range of the audio signal by a special method, and a dialogue clarity unit 2226 that improves the high-range sound quality of the audio signal so that dialogue is heard clearly. FIG. 3 also shows the detailed configuration of each of these units.
The stereophonic sound field processing unit 2224 receives the left-channel audio signal L_in and the right-channel audio signal R_in. The input signals L_in and R_in are combined by the addition unit 22241, and the reverberation generation unit 22242 generates reverberant sound (or various delayed signals) from the result. The elements a_1, a_2, a_3, ..., a_n and b_1, b_2, b_3, ..., b_n of the spread component generation unit 22243 vary the reverberation time, so that a spread component in which the degree of spread of the reverberant sound has been adjusted is generated for each of the left and right channels. The spread component generated for the left channel and the audio signal output from the variable gain unit 22244 are combined in the addition unit 22245 and output as the left-channel audio signal L_OUT. Similarly, the spread component generated for the right channel and the audio signal output from the variable gain unit 22246 are combined in the addition unit 22247 and output as the right-channel audio signal R_OUT. In this way, the stereophonic sound field processing unit 2224 gives the input audio signal a three-dimensional sound field.
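The signal flow of the stereophonic sound field processing unit 2224 can be sketched roughly as follows. In this simplification the reverberation and spread stages are replaced by a few delayed, scaled copies of the summed signal, with separate tap lists standing in for a_1 ... a_n and b_1 ... b_n; this substitution and all names are assumptions, not the circuit of FIG. 3.

```python
def spread_component(mono, taps):
    """Sum of delayed, scaled copies of the mono signal.

    `taps` is a list of (delay_in_samples, coefficient) pairs.
    """
    out = [0.0] * len(mono)
    for delay, coeff in taps:
        for n in range(delay, len(mono)):
            out[n] += coeff * mono[n - delay]
    return out


def sound_field(left, right, taps_left, taps_right, direct_gain=1.0):
    """Add a channel-specific spread component to each gain-scaled direct signal."""
    mono = [l + r for l, r in zip(left, right)]      # summed input
    spread_l = spread_component(mono, taps_left)     # left spread component
    spread_r = spread_component(mono, taps_right)    # right spread component
    out_left = [direct_gain * l + s for l, s in zip(left, spread_l)]
    out_right = [direct_gain * r + s for r, s in zip(right, spread_r)]
    return out_left, out_right


# An impulse in the left channel only, with one tap per channel.
l_out, r_out = sound_field(
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    taps_left=[(1, 0.5)],
    taps_right=[(2, 0.25)],
)
```

Using different taps for the two channels is what makes the added component differ between left and right, which is the property the text attributes to the spread component.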
The special low-range enhancement unit 2225 receives the audio signal In. The bass extraction unit 22251 extracts the bass from the input audio signal In, and the harmonic generation unit 22252 generates harmonics of that bass. The generated harmonics and the audio signal output from the variable gain unit 22254 are combined in the addition unit 22255 and output as the audio signal Out. In this way, the special low-range enhancement unit 2225 reinforces the low range of the input audio signal by a special method, using the virtual pitch effect to improve the perceived bass.
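A rough sketch of this extract, generate, and add structure is given below. The harmonic generator here is a full-wave rectifier with the mean removed, which is one common way to create harmonics of a low tone; it is an assumption for illustration, since the specification does not describe how the harmonic generation unit 22252 works. The filter coefficient and gains are likewise made up.

```python
import math


def lowpass(samples, alpha):
    """One-pole low-pass filter standing in for the bass extraction stage."""
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def generate_harmonics(bass):
    """Full-wave rectify the bass and remove the mean (illustrative assumption)."""
    rectified = [abs(v) for v in bass]
    mean = sum(rectified) / len(rectified) if rectified else 0.0
    return [v - mean for v in rectified]


def enhance_bass(samples, direct_gain=1.0, harmonic_gain=0.5, alpha=0.1):
    """Add generated harmonics of the bass to the gain-scaled input."""
    harmonics = generate_harmonics(lowpass(samples, alpha))
    return [direct_gain * x + harmonic_gain * h for x, h in zip(samples, harmonics)]


# A 50 Hz tone sampled at 8 kHz (made-up test signal).
tone = [math.sin(2 * math.pi * 50 * n / 8000) for n in range(800)]
enhanced = enhance_bass(tone)
```

The idea behind the virtual pitch effect is that listeners can perceive a low fundamental from its harmonics, so adding harmonics can raise the perceived bass even on speakers that reproduce the fundamental poorly.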
The dialogue clarity unit 2226 receives the left-channel audio signal L_in and the right-channel audio signal R_in. Unlike background sound, dialogue is usually localized at the center between left and right. It is therefore known that when the difference between the level of the left audio signal and the level of the right audio signal in a given frequency band is very small, the sound in that band is very likely to be dialogue (see, for example, Japanese Patent Application Laid-Open No. 2007-158873). Accordingly, the input signals L_in and R_in are combined by the addition unit 22261, and the equalizer 22262 identifies and filters the frequency band that is likely to contain dialogue. The variable gain units 22263 to 22267 adjust the gain of each of the input signals L_in and R_in and of the specific-band audio signal filtered by the equalizer 22262. The specific-band audio signal that has been filtered by the equalizer 22262 and gain-adjusted by the variable gain unit 22265 contains the audio signals of both the left and right channels.
Therefore, for the left channel, when this specific-band audio signal is added to the left-channel audio signal in the addition unit 22268, the right-channel audio signal gain-adjusted by the variable gain unit 22264 is subtracted, and as a result the left-channel audio signal L_OUT with emphasized dialogue is output. The right channel is processed in the same way, and the right-channel audio signal R_OUT with emphasized dialogue is output. In this way, the dialogue clarity unit 2226 makes the dialogue in the input audio signal clearly audible.
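The add-and-subtract structure described for the dialogue clarity unit can be sketched as follows. The band-pass used in place of the equalizer 22262 (a difference of two one-pole low-pass filters), the gain values, and the function names are assumptions for illustration only.

```python
def one_pole_lowpass(samples, alpha):
    """Simple one-pole low-pass filter."""
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def dialogue_band(samples, alpha_low=0.05, alpha_high=0.5):
    """Crude band-pass standing in for the equalizer (assumption)."""
    wide = one_pole_lowpass(samples, alpha_high)
    narrow = one_pole_lowpass(samples, alpha_low)
    return [w - n for w, n in zip(wide, narrow)]


def emphasize_dialogue(left, right, direct_gain=1.0, center_gain=0.5, cross_gain=0.1):
    """Add the filtered L+R band to each channel and subtract the opposite channel."""
    center = dialogue_band([l + r for l, r in zip(left, right)])
    out_left = [
        direct_gain * l + center_gain * c - cross_gain * r
        for l, r, c in zip(left, right, center)
    ]
    out_right = [
        direct_gain * r + center_gain * c - cross_gain * l
        for l, r, c in zip(left, right, center)
    ]
    return out_left, out_right


# A centered source: identical content in both channels (made-up samples).
voice = [0.0, 1.0, 0.5, -0.5, 0.25]
voice_left, voice_right = emphasize_dialogue(voice, voice)
```

A source that is identical in both channels passes through the L+R path at full strength, whereas a source that is opposite in phase between the channels cancels in the sum and receives no boost from that path.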
The gain, or degree of application, of each of the stereophonic sound field processing unit 2224 through the dialogue clarity unit 2226 is set by the audio control unit 235 so as to correspond to the pattern determined by the pattern discrimination unit 233, as shown for example in FIG. 4.
The audio adjustment described above, consisting of per-band gain adjustment and acoustic processing, yields frequency characteristics such as those in FIG. 5.
FIG. 5 is a characteristic diagram showing the frequency characteristics of the audio adjustment applied to each pattern according to the first embodiment.
As shown in FIG. 5, the frequency characteristic of the audio adjustment differs from pattern to pattern. The design intent behind each setting is as shown in FIG. 2. In this way, the audio control unit 235 controls the audio adjustment unit 222 so that the frequency characteristic suits the pattern determined by analyzing the video signal and other inputs.
Next, the basic operation of the playback device 2 according to the first embodiment is described based on the flowchart of FIG. 6, with reference to FIGS. 7 to 9 as appropriate.
As shown in FIG. 6, the audio control unit 235 first resets the audio adjustment parameters (step S10). That is, the setting values corresponding to each pattern, as shown in FIG. 4, are returned to the "standard" values in the leftmost column. The video analysis unit 231 analyzes the video signal extracted by the video extraction unit 211, qualitatively or quantitatively, for each of one or more video signal parameters (step S20). Based on this analysis result, the pattern discrimination unit 233 determines the scene pattern (step S30).
To reduce the sense of incongruity felt by the viewer 4, the audio control unit 235 performs the audio crossfade processing detailed below (step S40). The video control unit 234 may also perform video crossfade processing at the same time. In the audio crossfade processing, the audio control unit 235 first refers to the correspondence diagram of FIG. 4, stored in advance in the storage unit 24, and sets the audio adjustment parameters corresponding to the determined pattern as the target (step S41). For example, in the correspondence diagram of FIG. 4, the target is changed from the values in the "live" column to the values in the "studio" column. The audio control unit 235 then raises or lowers the audio adjustment parameters in steps so that they approach the target (step S42). That is, it changes the audio adjustment parameters slightly, waits for a predetermined period (for example, 100 ms or less) (step S43), and repeats this until the current values of the audio adjustment parameters reach the target values (step S44). The determination by the pattern discrimination unit 233 is preferably carried out in parallel even while the audio crossfade processing is in progress. If the determination result changes, the target of the audio adjustment parameters is set again, and the audio crossfade processing continues toward the new target values.
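Steps S41 to S44 can be sketched as a loop that moves each parameter a bounded amount per iteration. The parameter names, the integer values, the step size, and the `get_target` callback are hypothetical; the values of FIG. 4 are not reproduced here, and a real device would pass something like `lambda: time.sleep(0.1)` as `wait` to realize the waiting period of step S43.

```python
def step_toward(current, target, max_step):
    """Move every parameter at most `max_step` closer to its target (step S42)."""
    updated = {}
    for name, value in current.items():
        delta = target[name] - value
        if abs(delta) <= max_step:
            updated[name] = target[name]
        else:
            updated[name] = value + max_step if delta > 0 else value - max_step
    return updated


def audio_crossfade(current, get_target, max_step, wait=lambda: None):
    """Approach the target in steps, re-reading the target on every iteration.

    `get_target` returns the parameter set for the most recently determined
    pattern, so a change of pattern during the fade redirects the fade.
    Returns the list of intermediate parameter sets that were applied.
    """
    history = []
    target = get_target()                  # step S41
    while current != target:               # step S44
        current = step_toward(current, target, max_step)
        history.append(dict(current))
        wait()                             # step S43
        target = get_target()              # discrimination continues in parallel
    return history


# Hypothetical "live" and "studio" settings for two parameters.
live = {"low_gain": 0, "dialogue_clarity": 8}
studio = {"low_gain": 6, "dialogue_clarity": 2}
steps = audio_crossfade(live, lambda: studio, max_step=2)
```

With these made-up numbers the fade takes three iterations, passing through `{"low_gain": 2, "dialogue_clarity": 6}` and `{"low_gain": 4, "dialogue_clarity": 4}` before reaching the studio values.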
The effects obtained as a result of the above are described with reference to FIGS. 7 to 9.
FIG. 7 is a timing chart comparing a comparative example with the first embodiment with respect to the audio adjustment corresponding to the result of determining the scene pattern.
As shown in the top row of FIG. 7, a single piece of content can contain a mixture of scenes, for example live scenes in which music is performed and studio scenes in which the performers talk.
The technique of the comparative example automatically switches the type of audio adjustment according to the "program genre" information recorded in advance in the electronic program information for the content. As shown in the second row from the top of FIG. 7, it therefore cannot distinguish between a live scene and a studio scene within the same content. Consequently, as shown in the third row from the top of FIG. 7, even when the content changes from a live scene to a studio scene, the audio adjustment for the live scene remains applied, and the viewer 4 feels a sense of incongruity.
In contrast, the video analysis unit 231 of the playback device 2 according to the first embodiment analyzes the video signal extracted by the video extraction unit 211, qualitatively or quantitatively, for each of one or more video signal parameters (step S20). Based on this analysis result, the pattern discrimination unit 233 determines the scene pattern (step S30). Because the analysis covers, for example, whether the proportion of skin color in the screen area exceeds a predetermined threshold, a switch of the video signal from a wide shot of the concert to a close-up of a speaker in the studio, as shown in the fourth row from the top of FIG. 7, can be suitably detected, as shown in the fifth row from the top of FIG. 7. The determination cycle is preferably short enough that the viewer 4 does not feel a sense of incongruity, for example every 500 ms. As a result, the audio control unit 235 controls the audio adjustment unit 222 so that the type of audio adjustment corresponding to the determined pattern is applied to the input audio signal. This makes it possible to suitably reduce the sense of incongruity that the viewer feels because of a mismatch between video and audio.
 Next, the advantages of the cross-fade processing are described further with reference to FIGS. 8 and 9.
 FIG. 8 is a timing chart showing the cross-fade processing applied to each of the video adjustment and the audio adjustment that correspond to the scene pattern determination result in the first embodiment.
 As shown in FIG. 8, when it is determined that the scene pattern has changed from live to studio, video cross-fade processing and audio cross-fade processing are performed over predetermined periods T1 and T2, respectively.
 FIG. 9 is a characteristic diagram showing the audio cross-fade processing according to the first embodiment.
 As indicated by the thick arrows in FIG. 9, in the audio cross-fade processing the audio control unit 235 increases or decreases the audio adjustment parameters stepwise (for example, in three steps) so that they gradually approach the target setting values (in this case, the values listed in the "Studio" column of FIG. 4).
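The stepwise approach to the target setting values can be sketched as follows. The sketch assumes equal-sized steps and uses a hypothetical parameter name (`bass_gain_db`); the actual parameters and values of FIG. 4 are not reproduced in this section, and the specification does not state that the steps are equal.

```python
def crossfade_steps(current, target, steps=3):
    """Return intermediate parameter sets that move from `current` to `target`.

    Both arguments map parameter names to numeric values. The result has
    `steps` entries; the last entry reaches the target values.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    result = []
    for i in range(1, steps + 1):
        frac = i / steps  # fraction of the way to the target after step i
        result.append({
            name: current[name] + (target[name] - current[name]) * frac
            for name in target
        })
    return result


# Hypothetical example: fade a bass boost from a "live" value to a "studio" value.
for params in crossfade_steps({"bass_gain_db": 6.0}, {"bass_gain_db": 0.0}):
    print(params)
```

Each returned parameter set would be applied for one third of the period T2, so the adjustment changes gradually rather than all at once.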
 As described above, according to the first embodiment, the audio control unit 235 controls the audio adjustment unit 222 so that the type of audio adjustment corresponding to the determined pattern is applied to the input audio signal. This suitably reduces the discomfort the viewer feels from a mismatch between video and audio. Furthermore, when a different audio adjustment is applied, the audio cross-fade processing brings the parameters toward the predetermined audio adjustment parameters stepwise over a predetermined period rather than all at once, which suitably reduces the discomfort felt by the viewer 4. Moreover, video adjustment is performed in addition to audio adjustment, and video cross-fade processing is performed during the video adjustment, so the discomfort the viewer 4 feels from a mismatch between video and audio can be reduced even further.
 (2) Second Embodiment
 Next, the configuration and operation of the playback device 2 according to the second embodiment are described with reference to FIGS. 10 and 11 in addition to FIGS. 1 to 5. Components that are the same as those of the playback device 2 according to the first embodiment are given the same reference numerals, and their detailed description is omitted as appropriate.
 FIG. 10 is a flowchart showing the basic operation of the playback device 2 according to the second embodiment.
 As shown in FIG. 10, the playback device 2 according to the second embodiment performs audio analysis (step S21) in addition to video analysis (step S20). In the audio analysis (step S21), the audio signal is analyzed for each audio signal parameter, such as rhythm, gain per frequency band, information indicating whether music is being performed, and information indicating whether human speech is present. As a result, the accuracy of pattern determination improves, as described below.
 FIG. 11 is a timing chart that compares the comparative example, the first embodiment, and the second embodiment with respect to the scene pattern determination result and the corresponding audio adjustment.
 As shown in the first row from the top of FIG. 11, the content being played back mixes, within a single piece of content, live scenes in which music is performed and studio scenes in which performers talk, for example.
 As shown in the second and third rows from the top of FIG. 11, in the comparative example, even when the content being played back changes from a live scene to a studio scene, the audio adjustment for the live scene remains applied, and the viewer 4 perceives a mismatch.
 In contrast, the playback device 2 according to the first embodiment performs video analysis (step S20), as shown in the fourth to sixth rows from the top of FIG. 11. It therefore suitably detects the change from a live scene to a studio scene and applies the corresponding type of audio adjustment.
 However, the playback device 2 according to the first embodiment performs only video analysis (step S20), which checks whether the proportion of the screen area occupied by skin color exceeds a predetermined threshold. Consequently, as shown in the fourth and fifth rows from the top of FIG. 11, a live scene may be erroneously determined to be a studio scene merely because a close-up of the vocalist is shown. If audio adjustment is applied based on this determination result, the viewer 4 again perceives a mismatch.
 In contrast, the playback device 2 according to the second embodiment analyzes the audio signal for each audio signal parameter, as shown in the seventh and eighth rows from the top of FIG. 11. For example, the audio analysis unit 232 also analyzes whether music is being performed. Because this confirms that music is being performed, the pattern discrimination unit 233 can avoid erroneously determining that the scene is a studio scene.
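One way to picture how the audio analysis corrects the video-only result is the following sketch. It assumes the video analysis yields a skin-color ratio and the audio analysis yields a boolean "music is being performed" flag; the decision rule, the threshold, and the names are illustrative assumptions, not the method fixed by the specification.

```python
def determine_pattern(skin_ratio, music_playing, skin_threshold=0.3):
    """Combine a video cue and an audio cue into a scene pattern.

    skin_ratio:     fraction of the screen area that is skin-colored (0.0-1.0)
    music_playing:  True if the audio analysis detected a musical performance
    """
    video_says_studio = skin_ratio > skin_threshold
    if video_says_studio and music_playing:
        # Likely a close-up of a vocalist during a performance:
        # the audio cue overrides the video-only guess.
        return "live"
    return "studio" if video_says_studio else "live"


print(determine_pattern(0.6, music_playing=True))   # close-up while music plays
print(determine_pattern(0.6, music_playing=False))  # close-up of a talking speaker
```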
 As described above, according to the second embodiment, analyzing the audio signal as well makes it possible to narrow down to a smaller number the candidate patterns that could not be reduced to one from the video signal alone, or to re-determine the appropriate pattern where a pattern was determined erroneously.
 (3) Third Embodiment
 Next, the configuration and operation of the playback device 2 according to the third embodiment are described with reference to FIGS. 12 and 13 in addition to FIGS. 1 to 5. Here,
FIG. 12 is a flowchart showing the basic operation of the playback device 2 according to the third embodiment, and
FIG. 13 is a timing chart that compares the first embodiment and the third embodiment with respect to the audio adjustment corresponding to the scene pattern determination result.
Components that are the same as those of the playback device 2 according to the first embodiment are given the same reference numerals, and their detailed description is omitted as appropriate.
 As shown in FIG. 12, in the playback device 2 according to the third embodiment, the time spent on the audio cross-fade processing is changed according to the number of patterns that the pattern discrimination unit 233, as a result of determining the scene pattern, finds to match the scene of the content being played back. Specifically, it is first determined whether two or more patterns are determined to match the scene of the content being played back (step S31).
 If there is only one such pattern rather than two or more (step S31: No), the predetermined time T2 to be spent on the subsequent audio cross-fade processing is set to a predetermined standard value, as shown in the third row from the top of FIG. 13 (step S32). As a result, in the subsequent audio cross-fade processing, the type of audio adjustment corresponding to that pattern is applied without hesitation (steps S40 to S44).
 On the other hand, if two or more patterns are determined to match the scene of the content being played back (step S31: Yes), the predetermined time T2 to be spent on the subsequent audio cross-fade processing is lengthened by ΔT2 beyond the standard value, as shown in the fifth row from the top of FIG. 13 (step S33). As a result, the type of audio adjustment corresponding to one of those patterns is applied relatively slowly. Then, even if the next pattern determination (step S30), performed partway through the audio cross-fade processing, determines that the content being played back corresponds to another pattern, the appropriate type of audio adjustment corresponding to that other pattern is suitably applied to the input audio signal.
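The branch at step S31 can be summarized in a short sketch. The standard value of T2 and the extension ΔT2 are not given numerically in this section, so the defaults below (1.0 s each) and the function name are assumptions.

```python
def audio_crossfade_duration(num_matching_patterns, standard_t2=1.0, delta_t2=1.0):
    """Return the audio cross-fade time T2 in seconds.

    One matching pattern    -> standard value (step S32).
    Two or more candidates  -> standard value extended by delta_t2 (step S33).
    """
    if num_matching_patterns >= 2:
        return standard_t2 + delta_t2
    return standard_t2


print(audio_crossfade_duration(1))  # unambiguous pattern: standard T2
print(audio_crossfade_duration(2))  # ambiguous pattern: T2 + delta
```

Lengthening the fade when the result is ambiguous means less of the wrong adjustment has been applied by the time the next determination (for example 500 msec later) corrects it.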
 As described above, according to the third embodiment, when a plurality of patterns are determined and it cannot be narrowed down which pattern is optimal, the time T2 spent on the audio cross-fade processing, for example, is made relatively long, which avoids rapidly applying the type of audio adjustment corresponding to any one of the patterns. The uncertainty in pattern determination accuracy can thus be absorbed.
 In addition, in the third embodiment, considering that the viewer feels greater discomfort when the video changes abruptly than when the audio changes abruptly, the lower limit of the time T1 spent on the video cross-fade processing is set longer than the lower limit of the time T2 spent on the audio cross-fade processing, as shown in the second and fourth rows from the top of FIG. 13. This reduces the discomfort felt by the viewer even further.
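The relationship between the two lower limits can be expressed as a simple clamp. The numeric limits below are placeholders (the specification only requires that the video limit be longer than the audio limit), and the names are hypothetical.

```python
# Placeholder lower limits in seconds; only T1_MIN > T2_MIN comes from the text.
T1_MIN = 1.0  # video cross-fade
T2_MIN = 0.5  # audio cross-fade


def clamp_crossfade_times(t1, t2):
    """Raise requested cross-fade times to their respective lower limits."""
    return max(t1, T1_MIN), max(t2, T2_MIN)


print(clamp_crossfade_times(0.2, 0.2))  # both requests are raised to their floors
```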
 In each of the embodiments described above,
the "playback device 2" is a specific example of the "playback device" according to the present invention,
the "video analysis unit 231" and/or the "audio analysis unit 232" is a specific example of the "analysis means" according to the present invention,
the "pattern discrimination unit 233" is a specific example of the "pattern discrimination means" according to the present invention,
the "audio control unit 235" is a specific example of the "audio control means" according to the present invention,
the "video control unit 234" is a specific example of the "video control means" according to the present invention,
the "audio adjustment unit 222" is a specific example of the "audio adjustment means" according to the present invention,
the "video adjustment unit 212" is a specific example of the "video adjustment means" according to the present invention,
"step S20" and/or "step S21" is a specific example of the "analysis step" according to the present invention,
"step S30" is a specific example of the "pattern discrimination step" according to the present invention, and
"steps S40 to S44" are a specific example of the "audio control step" according to the present invention.
 The present invention is not limited to the embodiments described above and may be modified as appropriate without departing from the gist or spirit of the invention that can be read from the claims and the specification as a whole; anything involving such modifications is also included in the technical scope of the present invention.
 The playback device and method and the computer program according to the present invention are applicable to the technical field of playback devices and methods, and computer programs, for playing back content composed of a video signal and an audio signal, such as a flat display with speakers, an AV (Audio Visual) system, a television set, or a BD (Blu-ray Disc) player.

Claims (10)

  1.  A playback device for playing back content composed of a video signal and an audio signal, comprising:
     analysis means for analyzing the video signal input to the playback device for each of one or more video signal parameters;
     pattern discrimination means for determining a pattern that matches each scene of the content being played back, based on the degree of conformity between each video signal parameter of the analyzed video signal and each video signal parameter of each of a plurality of predetermined patterns;
     audio adjustment means for applying a predetermined audio adjustment to the audio signal input to the playback device; and
     audio control means for controlling the audio adjustment means so that the type of audio adjustment corresponding to the determined pattern is applied to the input audio signal.
  2.  The playback device according to claim 1, wherein, in applying the predetermined audio adjustment to the audio signal, the audio control means controls the audio adjustment means so as to perform audio cross-fade processing in which the audio adjustment parameters applied before the predetermined audio adjustment are changed stepwise toward the predetermined audio adjustment parameters.
  3.  The playback device according to claim 2, wherein, in performing the audio cross-fade processing, the audio control means changes the time spent on the audio cross-fade processing according to the number of patterns determined to match the scene of the content being played back.
  4.  The playback device according to claim 3, wherein, in performing the audio cross-fade processing, the audio control means makes the time spent on the audio cross-fade processing longer when two or more patterns are determined to match the scene of the content being played back than when there is one such pattern.
  5.  The playback device according to claim 3, further comprising:
     video adjustment means for applying a predetermined video adjustment to the video signal input to the playback device; and
     video control means for controlling the video adjustment means so that the type of video adjustment corresponding to the determined pattern is applied to the input video signal.
  6.  The playback device according to claim 5, wherein, in the predetermined video adjustment, the video control means also controls the video adjustment means so as to perform video cross-fade processing in which the video adjustment parameters applied before the predetermined video adjustment are changed stepwise toward the predetermined video adjustment parameters.
  7.  The playback device according to claim 6, wherein, in performing the video cross-fade processing, the video control means changes the time spent on the video cross-fade processing according to the number of patterns determined to match the scene of the content being played back, and
     the lower limit of the time spent on the video cross-fade processing is set longer than the lower limit of the time spent on the audio cross-fade processing.
  8.  The playback device according to claim 1, wherein the analysis means analyzes, in addition to the video signal input to the playback device, the audio signal for each of one or more audio signal parameters, and
     the pattern discrimination means determines the pattern that matches the scene of the content being played back based on the degree of conformity between each video signal parameter of the analyzed video signal and each video signal parameter of each of the plurality of predetermined patterns, and on the degree of conformity between each audio signal parameter of the analyzed audio signal and each audio signal parameter of each of the plurality of predetermined patterns.
  9.  A playback method for playing back content composed of a video signal and an audio signal, comprising:
     an analysis step of analyzing the input video signal for each of one or more video signal parameters;
     a pattern discrimination step of determining a pattern that matches each scene of the content being played back, based on the degree of conformity between each video signal parameter of the analyzed video signal and each video signal parameter of each of a plurality of predetermined patterns;
     an audio adjustment step of applying a predetermined audio adjustment to the input audio signal; and
     an audio control step of controlling the audio adjustment means so that the type of audio adjustment corresponding to the determined pattern is applied to the input audio signal.
  10.  A computer program that causes a computer to function as the playback device according to claim 1.
PCT/JP2008/057893 2008-04-24 2008-04-24 Reproducing device and method, and computer program WO2009130773A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2008/057893 WO2009130773A1 (en) 2008-04-24 2008-04-24 Reproducing device and method, and computer program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2008/057893 WO2009130773A1 (en) 2008-04-24 2008-04-24 Reproducing device and method, and computer program

Publications (1)

Publication Number Publication Date
WO2009130773A1 true WO2009130773A1 (en) 2009-10-29

Family

ID=41216522

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2008/057893 WO2009130773A1 (en) 2008-04-24 2008-04-24 Reproducing device and method, and computer program

Country Status (1)

Country Link
WO (1) WO2009130773A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005094072A (en) * 2003-09-12 2005-04-07 Sony Corp Television receiver and method thereof
JP2005130416A (en) * 2003-09-30 2005-05-19 Toshiba Corp Moving picture processor, moving picture processing method and moving picture processing program
JP2005234494A (en) * 2004-02-23 2005-09-02 Sony Corp Musical piece correspondence display device
JP2005318225A (en) * 2004-04-28 2005-11-10 Matsushita Electric Ind Co Ltd Recording/reproducing device
JP2006245745A (en) * 2005-03-01 2006-09-14 Mitsubishi Electric Corp Digital broadcast receiver
JP2007053510A (en) * 2005-08-17 2007-03-01 Sony Corp Recording/reproducing device, reproducing device, and method and program for adjusting volume automatically
JP2007159527A (en) * 2005-12-16 2007-06-28 Noritsu Koki Co Ltd Enzyme water production apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08740820

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08740820

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP