WO2009130773A1

WO2009130773A1 - Reproducing device and method, and computer program

Info

Publication number: WO2009130773A1
Application number: PCT/JP2008/057893
Authority: WO
Inventors: 四郎鈴木; 英世水流
Original assignee: パイオニア株式会社
Priority date: 2008-04-24
Filing date: 2008-04-24
Publication date: 2009-10-29

Abstract

A reproducing device (2) comprises analysis means (231, 232) for analyzing an input video signal for each of one or more video signal parameters; pattern distinguishing means (233) for distinguishing a pattern suited to each scene of the content to be reproduced, according to the degree of conformity between the video signal parameters of the analyzed video signal and the video signal parameters associated with each of a plurality of predetermined patterns; audio adjusting means (222) for applying predetermined audio adjustment to an audio signal input to the reproducing device; and audio control means (235) for controlling the audio adjusting means so as to apply, to the input audio signal, the type of audio adjustment corresponding to the distinguished pattern.

Description

Playback apparatus and method, and computer program

The present invention relates to the technical field of playback apparatuses and methods, such as a flat display with speakers, an AV (Audio Visual) system, a television device, a DVD player, or a BD (Blu-ray Disc) player, that play back content composed of video signals and audio signals, and to the technical field of computer programs.

In this type of playback technology, in order to play back content composed of video signals and audio signals, video based on the video signals is output from the display, and audio based on the audio signals is output from the speaker.

When viewing such content, the viewer may feel very uncomfortable if the sound quality is not matched to the content. For example, consider a news program and a music program. In general, the audio of a music program spans a wider frequency range, extending further into both the low and high frequencies, than the audio of a news program. The playback apparatus therefore adjusts the audio signal so as to emphasize the low and high frequencies in accordance with the audio frequency band of the music program, and the viewer can enjoy the music program with high sound quality. However, if the viewer then watches a news program without changing the type of audio adjustment, the bass may be emphasized too much, making the human voice difficult to hear.

In such a case, the viewer usually has to watch the content, confirm what it is, decide on an appropriate type of audio adjustment, and press a sound quality changeover switch to select it. Having to switch manually in this way is a hassle.

In order to eliminate this inconvenience, Patent Document 1 discloses a technique that refers to the “program genre” information recorded in advance in the electronic program information (Electronic Program Guide: EPG) for the content to be reproduced, and automatically switches the type of audio adjustment to be applied in accordance with that information.

However, the technique disclosed in Patent Document 1 may have the following problem. Because the “program genre” information is the same throughout a given piece of content, the same audio adjustment is applied throughout. Yet even a single piece of content may mix, for example, live scenes in which music is played with studio scenes in which the performers talk. Depending on the scene, an inappropriate audio adjustment may therefore be applied, producing audio that does not match the video and making the viewer feel uncomfortable.

Patent Document 1: JP 2005-318225 A

The present invention has been made in view of, for example, the above-mentioned problems. Its object is to provide a playback apparatus and method, and a computer program, that can suitably reduce the sense of discomfort a viewer feels due to a mismatch between video and audio when playing back content composed of video and audio signals.

(Playback device)
In order to solve the above problems, a playback device according to the present invention plays back content composed of a video signal and an audio signal, and comprises: analysis means for analyzing the video signal input to the playback device for each of one or more video signal parameters; pattern discriminating means for discriminating a pattern suited to each scene of the content to be played back, based on the degree of conformity between each video signal parameter of the analyzed video signal and each video signal parameter of each of a plurality of predetermined patterns; audio adjusting means for performing a predetermined audio adjustment on the audio signal input to the playback device; and audio control means for controlling the audio adjusting means so that the type of audio adjustment corresponding to the discriminated pattern is performed on the input audio signal.

According to the playback device of the present invention, when playing back content composed of a video signal and an audio signal, the uncomfortable feeling that the viewer feels due to the mismatch between the video and the audio can be suitably reduced as follows.

Content composed of video signals and audio signals means programs that are perceived by the viewer's visual and auditory sense, such as movies, sports, animation, studios, or live performances. In reproducing the content, the video signal and the audio signal are sequentially input to the reproduction device.

The analysis means is composed of, for example, a signal extraction circuit and an arithmetic / storage circuit, and analyzes the video signal input to the playback apparatus for each of one or more video signal parameters. The video signal parameters describe the plurality of images constituting the video and include, for example, luminance information (for example, information indicating whether the image is bright or dark), color information (for example, information indicating whether primary colors or intermediate colors dominate, or whether many colors are present), motion information (for example, information indicating whether the image is static or dynamic), and frame information (for example, information indicating the presence or absence of black bars). “Analyzing for each video signal parameter” is a general term for qualitative or quantitative analysis of each video signal parameter.

The pattern discriminating means comprises, for example, an arithmetic / storage circuit, and discriminates the pattern suited to each scene of the content to be played back, based on the degree of conformity between each video signal parameter of the analyzed video signal and each video signal parameter of each of a plurality of predetermined patterns. For example, the video signal parameters of the video signal are compared with the video signal parameters of each of the predetermined patterns stored in advance in a storage circuit, and the video signal is determined to correspond to the pattern for which a predetermined number or more of parameters match, or for which the most parameters match. The video signal to be discriminated may be a single frame of the video or a plurality of frames spanning a predetermined period.
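The comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not the claimed implementation: the parameter names, the normalized value ranges, and the helper names `conformity` and `discriminate` are hypothetical, and the degree of conformity is taken here to be simply the number of parameters that fall inside a pattern's stored range.

```python
# Hypothetical pattern table: each pattern maps a video signal parameter
# to an assumed acceptable (min, max) range, normalized to 0..1.
PATTERNS = {
    "live":   {"luminance": (0.0, 0.3), "motion": (0.3, 1.0)},
    "studio": {"luminance": (0.5, 1.0), "motion": (0.0, 0.3)},
}

def conformity(params, ranges):
    """Count how many analyzed parameters fall inside the pattern's ranges."""
    return sum(
        1 for name, (lo, hi) in ranges.items()
        if name in params and lo <= params[name] <= hi
    )

def discriminate(params, patterns=PATTERNS):
    """Return the names of the patterns with the highest non-zero conformity.

    More than one name is returned when the result is ambiguous.
    """
    scores = {name: conformity(params, r) for name, r in patterns.items()}
    best = max(scores.values())
    if best == 0:
        return []
    return sorted(name for name, s in scores.items() if s == best)
```

Returning every best-scoring pattern, rather than a single winner, keeps the information about ambiguity that the later aspects rely on.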

The audio adjusting means performs a predetermined audio adjustment on the audio signal input to the playback device. Here, “audio adjustment” is a general term for processes that can, by some method, change the impression that the audio output based on the audio signal gives to the viewer, such as gain increase / decrease processing for each band or predetermined acoustic processing.

The audio control means is, for example, a control unit, and controls the audio adjusting means so that the type of audio adjustment corresponding to the discriminated pattern is performed on the input audio signal. For example, if the pattern suited to the scene of the content being played back is determined to be live, the audio control means controls the audio adjusting means so that an audio adjustment that relatively emphasizes the low frequency band is applied to the input audio signal. Thereafter, the analysis means and the pattern discriminating means discriminate the pattern suited to the scene again, regularly or irregularly, and preferably in real time. If the pattern suited to the scene is then determined to be studio, for example, the audio control means controls the audio adjusting means so that an audio adjustment that relatively emphasizes the high frequencies is applied to the input audio signal.

In this way, even if a live scene where music is played in the same content and a studio scene where performers talk are mixed, suitable audio adjustment is automatically performed for each scene. As a result, it is possible to suitably reduce the sense of discomfort felt by the viewer due to incompatibility between video and audio, and the atmosphere of each scene can be felt more impressively.

In one aspect of the playback device according to the present invention, when the predetermined audio adjustment is performed on the audio signal, the audio control means controls the audio adjusting means so as to perform audio cross-fade processing, in which the audio adjustment parameter applied before the predetermined audio adjustment changes while gradually approaching the predetermined audio adjustment parameter.

According to this aspect, when a different audio adjustment is to be applied, the audio cross-fade processing lets the parameters approach the predetermined audio adjustment parameters step by step over a predetermined period rather than all at once, so the sense of discomfort the viewer feels from an abrupt change can be reduced.
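The gradual approach to the target parameters can be sketched as follows. The text does not specify the shape of the transition, so the linear ramp, the dictionary representation of the parameters, and the function name `crossfade_steps` are assumptions made for illustration.

```python
def crossfade_steps(current, target, steps):
    """Return intermediate parameter sets moving linearly from current to target.

    current, target: dicts of adjustment parameters (for example per-band gains).
    steps: number of update steps spread over the cross-fade period (>= 1).
    The last returned set equals the target.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    result = []
    for i in range(1, steps + 1):
        t = i / steps  # fraction of the cross-fade completed
        result.append(
            {k: current[k] + (target[k] - current[k]) * t for k in target}
        )
    return result
```

For example, `crossfade_steps({"low": 0.0}, {"low": 6.0}, 3)` yields three parameter sets whose low-band value rises in equal increments and ends at the target.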

In this aspect, when performing the audio cross-fade processing, the audio control means may change the time spent on the audio cross-fade processing according to the number of patterns determined to suit the scene of the content being played back.

According to this aspect, when a plurality of patterns are discriminated and it is not possible to narrow down which is optimal, the time spent on the audio cross-fade processing can, for example, be made relatively long. This avoids abruptly applying the type of audio adjustment corresponding to any one of those patterns, so uncertainty in the pattern discrimination accuracy can be absorbed.

Further, in the aspect in which the time spent on the audio cross-fade processing is changed in this way, when two or more patterns are determined to suit the scene of the content being played back, the audio control means may make the time spent on the audio cross-fade processing longer than when one pattern is determined.

According to this aspect, uncertainty in the pattern discrimination accuracy can be absorbed as described above. More specifically, pattern discrimination is not always perfectly accurate, and the candidate patterns may be narrowed down further by the next discrimination result, which is obtained regularly or irregularly. Therefore, when exactly one pattern is determined to suit the scene of the content being played back, the time spent on the audio cross-fade processing stays at the standard value, and the type of audio adjustment corresponding to that pattern is applied without being drawn out. When two or more patterns are determined, on the other hand, the time spent on the audio cross-fade processing is made longer than the standard value, and the type of audio adjustment corresponding to one of those patterns is applied relatively slowly. Then, even if the next pattern discrimination, performed partway through the audio cross-fade processing, determines that the content corresponds to a different pattern, the appropriate type of audio adjustment corresponding to that other pattern can still be suitably applied to the input audio signal. In this way, when it is not possible to narrow down which pattern is optimal, abruptly applying the type of audio adjustment corresponding to any one pattern is avoided, so uncertainty in the pattern discrimination accuracy can be absorbed.
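A minimal sketch of this behavior follows, assuming a single gain value, a standard fade time of 1 second, and an extended fade time of 3 seconds. None of these values, nor the class name `GainFader`, comes from the text. The point illustrated is that a slow fade started under ambiguity has moved only a little by the time a later discrimination result redirects it.

```python
class GainFader:
    """Moves one gain value toward a target at a rate set by the fade time."""

    def __init__(self, value=0.0):
        self.value = value
        self.target = value
        self.rate = 0.0  # gain change per second

    def retarget(self, target, candidate_count, standard_s=1.0, extended_s=3.0):
        """Start a new cross-fade; two or more candidates use the longer time."""
        fade_s = extended_s if candidate_count >= 2 else standard_s
        self.target = target
        self.rate = abs(target - self.value) / fade_s

    def step(self, dt):
        """Advance the fade by dt seconds and return the current gain."""
        delta = self.target - self.value
        max_move = self.rate * dt
        if abs(delta) <= max_move:
            self.value = self.target
        else:
            self.value += max_move if delta > 0 else -max_move
        return self.value
```

With two candidate patterns, a fade from 0 toward 6 has reached only 2 after one second, so a re-discrimination at that point has little to undo.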

Alternatively, in the aspect in which the time spent on the audio cross-fade processing is changed in this manner, the playback device may further comprise video adjusting means for performing a predetermined video adjustment on the video signal input to the playback device, and video control means for controlling the video adjusting means so that the type of video adjustment corresponding to the discriminated pattern is performed on the input video signal.

According to this aspect, since not only the audio adjustment but also the video adjustment is performed, the uncomfortable feeling felt by the viewer due to the mismatch between the video and the audio can be reduced more suitably.

In the aspect further including the video adjusting means and the video control means as described above, the video control means may control the video adjusting means so that, when the predetermined video adjustment is performed, video cross-fade processing is performed in which the video adjustment parameter applied before the predetermined video adjustment changes while gradually approaching the predetermined video adjustment parameter.

According to this aspect, since the video cross-fade process is performed at the time of video adjustment, it is possible to more suitably reduce the uncomfortable feeling felt by the viewer due to the mismatch between the video and the audio.

In the aspect in which the video cross-fade processing is performed in this way, the video control means may change the time spent on the video cross-fade processing according to the number of patterns determined to suit the scene of the content being played back, and may set the lower limit of the time spent on the video cross-fade processing longer than the lower limit of the time spent on the audio cross-fade processing.

According to this aspect, in view of the fact that a viewer feels more discomfort when the video changes suddenly than when the audio changes suddenly, the lower limit of the time spent on the video cross-fade processing is set longer than the lower limit of the time spent on the audio cross-fade processing, so the discomfort felt by the viewer can be reduced more suitably. For example, when one pattern is determined to suit the scene of the content being played back, the time spent on the audio cross-fade processing is relatively short compared with the case of two or more patterns (that is, it remains at the standard value), whereas the time spent on the video cross-fade processing is longer than the standard value. The discomfort felt by the viewer can thereby be reduced more suitably.
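The relation between the two lower limits can be sketched as follows. All numeric values and the function name `fade_times` are assumptions chosen for illustration; the text only requires that the video lower limit be longer than the audio lower limit.

```python
AUDIO_MIN_FADE_S = 0.5  # assumed lower limit for the audio cross-fade
VIDEO_MIN_FADE_S = 2.0  # assumed lower limit for video, deliberately longer

def fade_times(candidate_count, standard_s=1.0, extended_s=3.0):
    """Return (audio_fade_s, video_fade_s) for a discrimination result."""
    base = extended_s if candidate_count >= 2 else standard_s
    audio = max(base, AUDIO_MIN_FADE_S)
    video = max(base, VIDEO_MIN_FADE_S)
    return audio, video
```

Under these assumed values, a single candidate pattern gives an audio fade at the standard 1 second and a video fade held at its 2 second lower limit.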

In another aspect of the playback device according to the present invention, the analysis means analyzes, in addition to the video signal input to the playback device, the audio signal for each of one or more audio signal parameters, and the pattern discriminating means discriminates the pattern suited to the scene of the content being played back based on both the degree of conformity between each video signal parameter of the analyzed video signal and each video signal parameter of each of the plurality of predetermined patterns, and the degree of conformity between each audio signal parameter of the analyzed audio signal and each audio signal parameter of each of the plurality of predetermined patterns.

According to this aspect, in addition to the video signal input to the playback device, the audio signal is analyzed for each of one or more audio signal parameters. Audio signal parameters are parameters that characterize the content in terms of its audio, such as rhythm, gain for each frequency range, information indicating whether a song is being played, and information indicating whether human speech is included. The pattern suited to the scene of the content being played back is then discriminated based on both the degree of conformity between each video signal parameter of the analyzed video signal and each video signal parameter of each of the plurality of predetermined patterns, and the degree of conformity between each audio signal parameter of the analyzed audio signal and each audio signal parameter of each of those patterns. By also analyzing the audio signal in this way, patterns that could not be narrowed down to one from the video signal alone can be narrowed down to a smaller number, and a pattern that was discriminated incorrectly can be re-determined as an appropriate new pattern.
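One way to picture the narrowing-down is to add a per-pattern audio score to the per-pattern video score and keep the best candidates. The equal weighting of the two scores and the function name `narrow_candidates` are assumptions; the text does not say how the two degrees of conformity are combined.

```python
def narrow_candidates(video_scores, audio_scores):
    """Combine per-pattern scores and return the best-scoring pattern names.

    video_scores, audio_scores: dicts mapping pattern name to a conformity
    score (for example a count of matching parameters). A pattern missing
    from one dict is treated as scoring 0 there.
    """
    names = set(video_scores) | set(audio_scores)
    combined = {
        name: video_scores.get(name, 0) + audio_scores.get(name, 0)
        for name in names
    }
    if not combined:
        return []
    best = max(combined.values())
    return sorted(name for name, s in combined.items() if s == best)
```

If the video alone ties "live" and "studio", an audio score that favors "live" (say, because a song is detected) resolves the tie.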

(Playback method)
In order to solve the above problems, the playback method of the present invention is a playback method for playing back content composed of a video signal and an audio signal, and comprises: an analysis step of analyzing the input video signal for each of one or more video signal parameters; a pattern discrimination step of discriminating a pattern suited to each scene of the content to be played back, based on the degree of conformity between each video signal parameter of the analyzed video signal and each video signal parameter of each of a plurality of predetermined patterns; an audio adjustment step of performing a predetermined audio adjustment on the input audio signal; and an audio control step of controlling the audio adjustment so that the type of audio adjustment corresponding to the discriminated pattern is applied to the input audio signal.

According to the playback method of the present invention, as in the case of the playback apparatus of the present invention described above, when playing content composed of video signals and audio signals, the viewer feels uncomfortable due to the mismatch between video and audio. It can be suitably reduced.

In the reproduction method of the present invention, various aspects similar to the various aspects of the reproduction apparatus of the present invention described above can be adopted.

(Computer program)
In order to solve the above-described problems, the computer program of the present invention causes a computer to function as the above-described playback device of the present invention (including its various aspects). According to the computer program of the present invention, if the computer program is read from a recording medium storing it, such as a CD-ROM or DVD-ROM, and executed by a computer, or is executed after being downloaded via communication means, the above-described playback device of the present invention (including its various aspects) can be realized relatively easily. As a result, as in the case of the playback apparatus of the present invention described above, when playing back content composed of video signals and audio signals, the sense of discomfort felt by the viewer due to a mismatch between video and audio can be suitably reduced.

The operation and other advantages of the present invention will be made clear from the embodiments described below.

FIG. 1 is a block diagram showing the basic configuration of the playback apparatus 2 according to the first embodiment.
FIG. 2 is a correspondence diagram illustrating the discrimination material for discriminating a scene pattern in the playback apparatus 2, and the video adjustment and audio adjustment applied to each pattern, according to the first embodiment.
FIG. 3 is a block diagram showing the basic configuration of the audio adjustment unit 222 according to the first embodiment.
FIG. 4 is a correspondence diagram illustrating a setting example of the audio adjustment parameters applied to each pattern according to the first embodiment.
A characteristic diagram showing the frequency characteristics of the audio adjustment applied to each pattern according to the first embodiment.
A flowchart showing the basic operation of the playback apparatus 2 according to the first embodiment.
A timing chart comparing a comparative example and the first embodiment with respect to the audio adjustment corresponding to the result of discriminating the scene pattern.
A timing chart showing the cross-fade processing for each of the video adjustment and the audio adjustment corresponding to the result of discriminating the scene pattern according to the first embodiment.
A characteristic diagram showing the audio cross-fade processing according to the first embodiment.
A flowchart showing the basic operation of the playback apparatus 2 according to the second embodiment.
A timing chart comparing a comparative example, the first embodiment, and the second embodiment with respect to the result of discriminating the scene pattern and the audio adjustment corresponding to that result.
A flowchart showing the basic operation of the playback apparatus 2 according to the third embodiment.
A timing chart comparing the first embodiment and the third embodiment with respect to the audio adjustment corresponding to the result of discriminating the scene pattern.

Explanation of symbols

Hereinafter, the best mode for carrying out the present invention will be described in order for each embodiment based on the drawings.

(1) First Example
The configuration and operation process of the playback apparatus 2 according to the first example will be described with reference to FIGS.

FIG. 1 is a block diagram showing a basic configuration of a playback apparatus 2 according to the first embodiment.

As shown in FIG. 1, the playback device 2 analyzes the video signal and the audio signal input from the source device 1, performs a predetermined adjustment based on the analysis result, and outputs the video signal to the display unit 31. The audio signal is output to the audio output unit 32. Thereby, when reproducing the content composed of the video signal and the audio signal, it is possible to suitably reduce the uncomfortable feeling that the viewer 4 feels due to the mismatch between the video and the audio.

The source device 1 is a device, such as a DVD player, a BD player, or a set-top box, that acquires the video signals and audio signals constituting the content to be played back from a recording medium or a communication network and outputs them to the playback device 2.

The playback device 2 includes the following components, analyzes a video signal and an audio signal input from the source device 1, performs a predetermined adjustment based on the analysis result, and outputs the video signal to the display unit 31, The audio signal is output to the audio output unit 32.

The video extraction unit 211 extracts the input video signal and outputs it both to the video analysis unit 231 provided in the control unit 23 and to the video adjustment unit 212.

The audio extraction unit 221 extracts the input audio signal and outputs it both to the audio analysis unit 232 provided in the control unit 23 and to the audio adjustment unit 222.

The video analysis unit 231 analyzes the extracted video signal qualitatively or quantitatively for each of one or more video signal parameters.

The audio analysis unit 232 analyzes the extracted audio signal qualitatively or quantitatively for each of one or more audio signal parameters.

The storage unit 24 stores the extracted video and audio signals and their analysis results as video information and audio information, keeping a history over a predetermined period. The storage unit 24 also stores, as pattern discrimination information, the conforming value or conforming range of each video signal parameter for each of a plurality of predetermined patterns. The pattern discrimination unit 233 described later refers to this information when discriminating patterns.

The pattern discrimination unit 233 discriminates the pattern suited to each scene of the content to be played back, based on the analysis result of at least the video analysis unit 231 out of the video analysis unit 231 and the audio analysis unit 232. For example, the pattern discrimination unit 233 compares the video signal parameters of the video signal with the pattern discrimination information stored in advance in the storage unit 24, and determines that the video signal corresponds to the pattern for which a predetermined number or more of parameters match, or to the pattern with the largest number of matching parameters.

The video control unit 234 controls the video adjustment unit 212 so as to apply the type of video adjustment corresponding to the pattern determined by the pattern determination unit 233 to the input video signal.

The audio control unit 235 controls the audio adjustment unit 222 so that the type of audio adjustment corresponding to the pattern determined by the pattern determination unit 233 is performed on the input audio signal.

The operation unit 25 has an operation button for each type of adjustment so that, regardless of the pattern determined by the pattern discrimination unit 233, the viewer 4 can manually specify the type of audio adjustment and / or video adjustment corresponding to a desired pattern.

The display unit 31 is a device that can output a video signal as video, such as a plasma display or a liquid crystal display, and displays the video of the content to the viewer 4 based on the video signal output from the video adjustment unit 212.

The audio output unit 32 is a device that can output an audio signal as sound, such as a speaker or headphones, and outputs the audio of the content to the viewer 4 based on the audio signal output from the audio adjustment unit 222.

Next, the discrimination material for discriminating the scene pattern in the playback apparatus 2, and the video adjustment and audio adjustment applied to each pattern according to the first embodiment, will be described with reference to the correspondence diagram of FIG. 2.

As shown in FIG. 2, whether the scene pattern is “movie” is determined based on the frame information of the video signal. If the scene pattern is determined to be “movie”, the video adjustment applies settings that let the viewer enjoy the movie without spoiling its atmosphere even in the brightest room of the house, for example crisp contrast and an overall increase in brightness. In addition to or instead of this, the audio adjustment achieves both a realistic sense of impact and clear dialogue.

Whether the scene pattern is “sports” is determined based on the degree or presence of movement in the video signal, or on color information (for example, the proportion of green corresponding to turf). If the scene pattern is determined to be “sports”, the video adjustment applies settings that assume sports being watched at home with the room lights on, for example an adjustment that emphasizes detail in green areas. In addition to or instead of this, the audio adjustment aims to heighten the sense of presence: because such content has little low- and high-frequency energy, the mid and high ranges are emphasized; additionally or alternatively, the frequency characteristics are adjusted to cover a wide range, and surround processing is used to increase the sense of presence.

Whether the scene pattern is “animation” is determined based on whether there are many solid colors. If the scene pattern is determined to be “animation”, the video adjustment applies settings that assume an animation program being watched at home with the room lights on, for example an adjustment that sets the contrast low. In addition to or instead of this, the audio adjustment is basically one that keeps the dialogue of the animation dominant while making weak sound effects more appealing.

Whether the scene pattern is “studio” is determined based on whether skin color occupies a large proportion of the screen area and there is little movement. If the scene pattern is determined to be “studio”, the video adjustment applies settings that assume a studio program being watched at home with the room lights on, for example an adjustment that sets the contrast low. In addition to or instead of this, the audio adjustment achieves highly intelligible dialogue even at a modest volume.

Whether the scene pattern is “live” is determined based on whether the image is dark overall with light in the center. If the scene pattern is determined to be “live”, the video adjustment emphasizes contrast in order to create a live atmosphere. In addition to or instead of this, the audio adjustment, based on a hi-fi concept for music-centered content, emphasizes the low frequency band.
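The correspondence described for the five patterns can be summarized as a lookup table. The dictionary layout, the wording of each entry, and the helper name `audio_adjustment_for` are illustrative assumptions; only the pairing of cue and adjustment is taken from the description above.

```python
# Discrimination cue and audio adjustment for each scene pattern,
# condensed from the description of FIG. 2. The structure is hypothetical.
SCENE_PATTERNS = {
    "movie": {
        "cue": "frame information",
        "audio": "realistic impact with clear dialogue",
    },
    "sports": {
        "cue": "movement, or amount of green",
        "audio": "emphasize the mid and high ranges, use surround",
    },
    "animation": {
        "cue": "many solid colors",
        "audio": "dialogue dominant, weak sound effects made appealing",
    },
    "studio": {
        "cue": "large skin color ratio and little movement",
        "audio": "clear dialogue at modest volume",
    },
    "live": {
        "cue": "dark overall with light in the center",
        "audio": "emphasize the low frequency band",
    },
}

def audio_adjustment_for(pattern, default="no change"):
    """Look up the audio adjustment for a discriminated pattern."""
    entry = SCENE_PATTERNS.get(pattern)
    return entry["audio"] if entry else default
```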

Next, the explanation of the audio adjustment will be supplemented with reference to FIGS. Here, FIG. 3 is a block diagram illustrating the basic configuration of the audio adjustment unit 222 according to the first embodiment, and FIG. 4 is a correspondence diagram illustrating a setting example of the audio adjustment parameters applied to each pattern according to the first embodiment.

As shown in FIG. 3, for gain increase / decrease processing for each band, for example, the audio adjustment unit 222 includes a low frequency increase / decrease unit 2221 that selectively increases or decreases the low-frequency gain of the input audio signal, a mid-range increase / decrease unit 2222 that selectively increases or decreases the mid-range gain, and a high frequency increase / decrease unit 2223 that selectively increases or decreases the high-frequency gain.

The audio control unit 235 controls the gain setting, or whether the processing is applied, for each of the low frequency increase / decrease unit 2221 through the high frequency increase / decrease unit 2223 so as to correspond to the pattern determined by the pattern discrimination unit 233, as shown in FIG. 4. In FIG. 4, the setting value of the audio adjustment parameter applied to each pattern is shown in the form “h000”, and a larger setting value means a higher gain or a greater degree of application.

Then, the adder 2228 integrates the signals output from each of the low frequency increasing / decreasing unit 2221 to the high frequency increasing / decreasing unit 2223.
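The band-wise gain stage followed by the adder can be sketched as follows. The text does not give the filter type or the band edges, so the first-order filters, the 200 Hz and 4000 Hz crossover frequencies, and the function names are all assumptions for illustration.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """First-order low-pass filter (exponential smoothing)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out

def three_band_adjust(samples, gains, sample_rate=48000,
                      low_cut_hz=200.0, high_cut_hz=4000.0):
    """Split into low/mid/high bands, scale each band, and sum the results.

    gains: (low_gain, mid_gain, high_gain) as linear factors.
    """
    low = one_pole_lowpass(samples, low_cut_hz, sample_rate)
    below_high = one_pole_lowpass(samples, high_cut_hz, sample_rate)
    g_low, g_mid, g_high = gains
    out = []
    for x, lo, bh in zip(samples, low, below_high):
        mid = bh - lo    # content between the two cutoffs
        high = x - bh    # content above the upper cutoff
        out.append(g_low * lo + g_mid * mid + g_high * high)  # the adder
    return out
```

Because the three bands are built by subtraction, they sum back to the input when all gains are 1, so only the bands whose gain differs from 1 change the sound.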

In addition, as acoustic processing, the audio adjustment unit 222 includes a three-dimensional sound field feeling processing unit 2224 that gives the summed audio signal a three-dimensional sound field feeling, a special low frequency enhancement processing unit 2225 that enhances the low range of the audio signal by a special method, and a dialogue clear processing unit 2226 that improves the sound quality of the high range of the audio signal so that dialogue can be heard clearly. FIG. 3 also shows the detailed configurations of these units.

The three-dimensional sound field feeling processing unit 2224 receives the left-channel audio signal L_in and the right-channel audio signal R_in. The input audio signals L_in and R_in are summed by the adder 22241, and the reverberation sound generator 22242 generates reverberation sound (or various delayed signals) and extends the reverberation time. By changing the coefficients a_1, a_2, a_3, ..., a_n and b_1, b_2, b_3, ..., b_n of the respective parts of the component generation processing unit 22243, spread components in which the spatial spread of the reverberation sound is adjusted are generated for the left and right channels. The spread component generated for the left channel and the audio signal output from the variable gain increase / decrease unit 22244 are summed by the adder 22245 and output as the left-channel audio signal L_OUT. Similarly, the spread component generated for the right channel and the audio signal output from the variable gain increase / decrease unit 22246 are summed by the adder 22247 and output as the right-channel audio signal R_OUT. In this way, the three-dimensional sound field feeling processing unit 2224 gives a three-dimensional sound field feeling to the input audio signal.
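The signal flow of this unit can be sketched as follows. Modeling the reverberation generator as a simple tapped delay line, with one shared set of delays and separate coefficient lists for the left and right spread components, is an assumption; the text names the coefficients but does not define the structure they act on. The function name `sound_field` is hypothetical.

```python
def sound_field(left, right, delays, a, b, direct_gain=1.0):
    """Add delayed copies of the summed input to each channel.

    delays: tap delays in samples (assumed shared by both channels).
    a, b:   tap coefficients for the left and right spread components.
    Returns (left_out, right_out).
    """
    mono = [l + r for l, r in zip(left, right)]  # sum of both inputs
    out_left, out_right = [], []
    for t in range(len(mono)):
        # Spread components: weighted sums of delayed copies of the mono sum.
        spread_l = sum(ai * mono[t - d] for ai, d in zip(a, delays) if t - d >= 0)
        spread_r = sum(bi * mono[t - d] for bi, d in zip(b, delays) if t - d >= 0)
        # Direct signal after the variable gain, plus the spread component.
        out_left.append(direct_gain * left[t] + spread_l)
        out_right.append(direct_gain * right[t] + spread_r)
    return out_left, out_right
```

Giving the left and right taps different coefficients (here, for instance, opposite signs on one tap) makes the delayed sound differ between the channels, which is what produces a sense of spread.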

The special low frequency enhancement processing unit 2225 receives the audio signal In. The bass sound extraction unit 22251 extracts the bass sound from the input audio signal In, and the harmonic component generation unit 22252 generates a harmonic component from it. The generated harmonic component and the audio signal output from the variable gain increase / decrease unit 22254 are summed by the adder 22255 and output as the audio signal Out. In this way, the special low frequency enhancement processing unit 2225 enhances the low-frequency range of the input audio signal by a special method, improving the perceived bass by means of the virtual pitch effect.
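The signal flow described above (bass extraction, harmonic generation, summation with the gain-adjusted input) can be illustrated with a minimal sketch. The source does not say how the bass is extracted or how the harmonics are generated, so the one-pole low-pass filter, the full-wave rectifier, and every name and default value below (`enhance_bass`, `cutoff_hz`, `direct_gain`, `harmonic_gain`) are assumptions chosen only for illustration.

```python
import math


def enhance_bass(signal, sample_rate=48000, cutoff_hz=120.0,
                 direct_gain=1.0, harmonic_gain=0.5):
    """Add harmonics derived from the bass band to the input signal.

    Hypothetical stand-in for units 22251 to 22255; not the patent's method.
    """
    # Coefficient of a one-pole low-pass filter (assumed bass extractor).
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)

    low = 0.0
    out = []
    for x in signal:
        low += alpha * (x - low)   # extracted bass component
        # Full-wave rectification (assumed) produces even harmonics of the
        # bass; it also adds a DC offset that a real design would remove.
        harmonic = abs(low)
        out.append(direct_gain * x + harmonic_gain * harmonic)
    return out
```

With `harmonic_gain=0.0` the function simply returns the gain-adjusted input, which corresponds to the processing being switched off for a given pattern.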

The dialogue clear processing unit 2226 receives the left-channel audio signal L_in and the right-channel audio signal R_in. Unlike background sound, dialogue is often localized at the center between left and right, so in a frequency band that contains dialogue the difference between the signal level of the left audio signal and that of the right audio signal is very small. It is known that, when this difference is very small in a certain frequency band, the sound in that band has a very high probability of being dialogue (see, for example, Japanese Patent Laid-Open No. 2007-158873). Therefore, the input audio signals L_in and R_in are summed by the adder 22261, and the equalizer 22262 identifies and filters the frequency band that has a high probability of containing dialogue. The variable gain increase / decrease units 22263 to 22267 adjust the gain of each of the input audio signals L_in and R_in and of the audio signal in the specific band filtered by the equalizer 22262. Here, the audio signal in the specific band that has been filtered by the equalizer 22262 and gain-adjusted by the variable gain increase / decrease unit 22265 contains the audio signals of both the left and right channels. Therefore, for the left channel, when the audio signal in the specific band is added to the left-channel audio signal in the adder 22268, the right-channel audio signal whose gain has been adjusted by the variable gain increase / decrease unit 22264 is subtracted. As a result, the left-channel audio signal L_OUT in which the dialogue is emphasized is output. Similarly, the right-channel audio signal R_OUT in which the dialogue is emphasized is output for the right channel. In this way, the dialogue clear processing unit 2226 makes the dialogue in the input audio signal clearly audible.
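The mixing performed around the adder 22268 can be sketched as follows. This is a minimal illustration under stated assumptions: the equalizer 22262 is represented by a caller-supplied function `band_filter`, the five variable gain units are reduced to three shared gains (`g_direct`, `g_cross`, `g_center`), and all of these names and default values are hypothetical.

```python
def emphasize_dialogue(left, right, band_filter,
                       g_direct=1.0, g_cross=0.25, g_center=0.5):
    """Return (left_out, right_out) with the center-localized band emphasized.

    band_filter: function taking and returning a list of samples; it stands
    in for the equalizer that isolates the band likely to contain dialogue.
    """
    # Sum of both channels, then restriction to the dialogue band.
    center = band_filter([l + r for l, r in zip(left, right)])

    left_out = []
    right_out = []
    for l, r, c in zip(left, right, center):
        # The center signal contains both channels, so the opposite channel
        # is subtracted after gain adjustment.
        left_out.append(g_direct * l + g_center * c - g_cross * r)
        right_out.append(g_direct * r + g_center * c - g_cross * l)
    return left_out, right_out
```

For example, with an identity filter a sample that is identical in both channels (center-localized) is boosted, while a sample of opposite sign in the two channels receives no boost from the center path.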

The gain, or whether the processing is applied at all, in each of the three-dimensional sound field feeling processing unit 2224 to the dialogue clear processing unit 2226 is set and controlled by the audio control unit 235 so as to correspond to the pattern determined by the pattern determination unit 233, as shown in FIG. 4, for example.

The frequency characteristics shown in FIG. 5 are obtained by the audio adjustment composed of the gain increase / decrease processing for each band and the acoustic processing described above. Here, FIG. 5 is a characteristic diagram illustrating the frequency characteristics of the audio adjustment applied to each pattern according to the first embodiment.

As shown in FIG. 5, the frequency characteristics related to the audio adjustment applied to each pattern are different for each pattern. The setting idea is as shown in FIG. As described above, the audio control unit 235 controls the audio adjustment unit 222 so that the frequency characteristic suitable for the pattern determined by the analysis on the video signal or the like is obtained.

Next, the basic operation of the playback apparatus 2 according to the first embodiment will be described based on the flowchart of FIG. 6, with reference to FIGS. 7 to 9 as appropriate.

As shown in FIG. 6, first, the audio control unit 235 resets the audio adjustment parameters (step S10). That is, the setting values are returned to those of the leftmost “standard” column shown in FIG. 4. Next, the video analysis unit 231 analyzes the video signal extracted by the video extraction unit 211 qualitatively or quantitatively for each of one or a plurality of video signal parameters (step S20). Based on this analysis result, the pattern determination unit 233 determines the pattern of the scene (step S30).

The audio control unit 235 performs the audio cross-fade process described in detail below in order to reduce discomfort to the viewer 4 (step S40). In addition, the video control unit 234 may perform a video cross-fade process. In the audio cross-fade process, first, the audio control unit 235 refers to the correspondence diagram shown in FIG. 4, which is stored in advance in the storage unit 24, and sets the audio adjustment parameters corresponding to the determined pattern as the target (step S41). For example, in the correspondence diagram shown in FIG. 4, the target is reset from the values written in the “live” column to the values written in the “studio” column. Then, the audio control unit 235 increases or decreases the audio adjustment parameters stepwise so that they approach the set target (step S42). That is, after slightly increasing or decreasing the audio adjustment parameters, the process waits for a predetermined period (for example, 100 msec or less) (step S43), and this is repeated until the current values of the audio adjustment parameters reach the target setting values (step S44). Note that the determination by the pattern determination unit 233 is preferably performed in parallel even during the audio cross-fade process. In that case, if the determination result changes, the target of the audio adjustment parameters is reset, and the audio cross-fade process continues toward the new setting values.
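The loop formed by steps S41 to S44 can be sketched as below. This is a minimal illustration, not the implementation of the audio control unit 235: the parameter names (`bass_db`, `treble_db`), the step size, and the functions `step_toward` and `audio_crossfade` are assumptions. The target is re-read on every iteration to model the case in which the pattern determination changes during the cross-fade.

```python
import time


def step_toward(current, target, step):
    """Move each parameter at most `step` closer to its target (step S42)."""
    result = {}
    for name, goal in target.items():
        delta = goal - current[name]
        if abs(delta) <= step:
            result[name] = goal
        else:
            result[name] = current[name] + (step if delta > 0 else -step)
    return result


def audio_crossfade(current, get_target, step=1.0, wait_sec=0.1,
                    sleep=time.sleep):
    """Approach the target stepwise and return the final parameters.

    get_target: function returning the target parameters for the pattern
    determined most recently (step S41). `step` must be positive.
    """
    while True:
        target = get_target()                          # S41 (may change)
        if current == target:                          # S44: target reached
            return current
        current = step_toward(current, target, step)   # S42
        sleep(wait_sec)                                # S43
```

A call such as `audio_crossfade({"bass_db": 6.0, "treble_db": 3.0}, lambda: {"bass_db": 0.0, "treble_db": 4.0}, step=2.0)` would lower the bass gain in three steps while the treble gain reaches its target in one.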

The effects obtained as a result will be described with reference to FIGS. Here, FIG. 7 is a timing chart comparing the comparative example and the first embodiment with regard to the audio adjustment corresponding to the result of determining the scene pattern.

As shown in the uppermost row of FIG. 7, even a single piece of content to be reproduced may contain a mixture of scenes, for example, a live scene where music is played and a studio scene where performers have a conversation.

The technology according to the comparative example automatically switches the type of audio adjustment to be applied according to the “program genre” information recorded in advance in the electronic program information for the content. With this approach, as shown in the second row from the top in FIG. 7, it is impossible to distinguish between a live scene and a studio scene within the same content. Consequently, as shown in the third row from the top in FIG. 7, even if the content being played changes from a live scene to a studio scene, the type of audio adjustment corresponding to the live scene remains applied, and the viewer 4 feels uncomfortable.

On the other hand, the video analysis unit 231 of the playback apparatus 2 according to the first embodiment analyzes the video signal extracted by the video extraction unit 211 qualitatively or quantitatively for each of one or a plurality of video signal parameters (step S20). Based on this analysis result, the pattern determination unit 233 determines the pattern of the scene (step S30). At this time, for example, it is analyzed whether or not the ratio of skin color in the screen area is larger than a predetermined threshold value. Therefore, as shown in the fourth row from the top in FIG., even if the scene switches to a close-up of a speaker in the studio, the switch can be suitably determined, as shown in the fifth row from the top in FIG. Note that the determination cycle is desirably short enough not to give the viewer 4 a sense of incongruity, for example, every 500 msec. The audio control unit 235 then controls the audio adjustment unit 222 so as to perform the type of audio adjustment corresponding to the determined pattern on the input audio signal. As a result, it is possible to suitably reduce the uncomfortable feeling felt by the viewer due to a mismatch between the video and the audio.
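The skin-color test mentioned above could be sketched as follows. The source states only that the skin color ratio of the screen area is compared with a predetermined threshold; the RGB rule used to classify a pixel as skin, the threshold of 0.3, the two pattern labels, and the function names are all assumptions made for illustration.

```python
def skin_color_ratio(pixels):
    """Return the fraction of (r, g, b) pixels (0-255) classified as skin."""
    def is_skin(r, g, b):
        # Simple RGB heuristic (assumed; the source gives no rule).
        return (r > 95 and g > 40 and b > 20
                and r > g and r > b and r - min(g, b) > 15)

    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if is_skin(*p)) / len(pixels)


def determine_pattern(pixels, threshold=0.3):
    """Classify a frame as "studio" when the skin ratio exceeds the threshold."""
    return "studio" if skin_color_ratio(pixels) > threshold else "live"
```

In a periodic determination such as the 500 msec cycle mentioned above, `determine_pattern` would be called once per cycle on the current frame.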

Subsequently, with reference to FIG. 8 and FIG. 9, the advantages of the crossfade process will be supplemented.

FIG. 8 is a timing chart showing crossfade processing for each of video adjustment and audio adjustment corresponding to the result of determining the scene pattern according to the first embodiment.

As shown in FIG. 8, when it is determined that the scene pattern has changed from live to studio, the video cross-fade process and the audio cross-fade process are performed over predetermined periods T₁ and T₂, respectively.

FIG. 9 is a characteristic diagram showing the audio cross-fading process according to the first embodiment.

As shown by the thick arrows in FIG. 9, in the audio cross-fade process, the audio control unit 235 increases or decreases the audio adjustment parameters stepwise (for example, in three steps) so that they gradually approach the target setting values (in this case, the values in the “studio” column of FIG. 4).

As described above, according to the first embodiment, the audio control unit 235 controls the audio adjustment unit 222 so as to perform the type of audio adjustment corresponding to the determined pattern on the input audio signal. As a result, it is possible to suitably reduce the uncomfortable feeling felt by the viewer due to a mismatch between the video and the audio. Furthermore, when switching to a different audio adjustment, the audio cross-fade process brings the audio adjustment parameters to the predetermined values gradually over a predetermined period rather than all at once, so that the discomfort felt by the viewer 4 can be reduced. Moreover, since video adjustment is performed in addition to audio adjustment, and the video cross-fade process is performed during the video adjustment, it is possible to more suitably reduce the uncomfortable feeling felt by the viewer 4 due to a mismatch between the video and the audio.

(2) Second Embodiment

Next, the configuration and operation processing of the playback apparatus 2 according to the second embodiment will be described with reference to FIGS. 10 and 11 in addition to FIGS. The same components as those of the playback apparatus 2 according to the first embodiment are denoted by the same reference numerals, and detailed description thereof is omitted as appropriate.

FIG. 10 is a flowchart showing the basic operation of the playback apparatus 2 according to the second embodiment.

As shown in FIG. 10, the playback apparatus 2 according to the second embodiment performs audio analysis (step S21) in addition to video analysis (step S20). In the audio analysis (step S21), the audio signal is analyzed for each audio signal parameter, such as the rhythm, the gain for each frequency range, information indicating whether or not music is being played, and information indicating whether or not human speech is included. As a result, the accuracy of pattern determination is improved as follows.

FIG. 11 is a timing chart comparing the comparative example, the first embodiment, and the second embodiment with regard to the result of determining the scene pattern and the audio adjustment corresponding to that result.

As shown in the first row from the top in FIG. 11, even a single piece of content to be reproduced may contain a mixture of scenes, for example, a live scene where music is played and a studio scene where performers talk.

As shown in the second and third rows from the top in FIG. 11, in the comparative example, even if the content to be reproduced changes from a live scene to a studio scene, the type of audio adjustment corresponding to the live scene remains applied, so the viewer 4 feels uncomfortable.

On the other hand, the playback apparatus 2 according to the first embodiment performs video analysis (step S20), as shown in the fourth to sixth rows from the top in FIG. As a result, a change from a live scene to a studio scene is suitably determined, and the corresponding type of audio adjustment is performed.

However, the playback apparatus 2 according to the first embodiment performs only video analysis (step S20), in which it is analyzed whether or not the ratio of skin color in the screen area is greater than a predetermined threshold. Therefore, as shown in the fourth and fifth rows from the top in FIG. 11, a live scene may be erroneously determined to be a studio scene even though the vocalist is simply being shown on screen. If audio adjustment is performed based on this determination result, the viewer 4 still feels uncomfortable.

On the other hand, the playback apparatus 2 according to the second embodiment analyzes the audio signal for each audio signal parameter, as shown in the seventh and eighth rows from the top in FIG. 11. For example, the audio analysis unit 232 also analyzes whether or not music is being played. Since this confirms that music is being played, the pattern determination unit 233 can avoid erroneously determining that the scene is a studio scene.

As described above, according to the second embodiment, analyzing the audio signal makes it possible to narrow down to a smaller number the patterns that could not be narrowed down to one from the video signal alone, or to re-determine an appropriate pattern in place of one that was determined erroneously.
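The narrowing and re-determination described for the second embodiment could be sketched as follows. The table `PATTERN_HAS_MUSIC`, the function `refine_with_audio`, and the use of a single audio signal parameter (whether music is being played) are assumptions for illustration; the source lists several audio signal parameters and does not specify how they are combined.

```python
# Hypothetical expectation of each pattern regarding music playback.
PATTERN_HAS_MUSIC = {"live": True, "studio": False}


def refine_with_audio(video_candidates, music_playing):
    """Refine the patterns suggested by video analysis using audio analysis.

    Returns the candidates consistent with the audio result. If none is
    consistent, the pattern is re-determined from the audio result alone.
    """
    consistent = [p for p in video_candidates
                  if PATTERN_HAS_MUSIC.get(p) == music_playing]
    if consistent:
        return consistent
    # The video-based determination was probably wrong: re-determine.
    return [p for p, has_music in PATTERN_HAS_MUSIC.items()
            if has_music == music_playing]
```

For instance, a close-up of a vocalist that video analysis alone classifies as “studio” is re-determined as “live” once the audio analysis confirms that music is being played.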

(3) Third Embodiment

Next, the configuration and operation processing of the playback apparatus 2 according to the third embodiment will be described with reference to FIGS. 12 and 13 in addition to FIGS. Here, FIG. 12 is a flowchart showing the basic operation of the playback apparatus 2 according to the third embodiment, and FIG. 13 is a timing chart comparing the first embodiment and the third embodiment with regard to the audio adjustment corresponding to the result of determining the scene pattern. The same components as those of the playback apparatus 2 according to the first embodiment are denoted by the same reference numerals, and detailed description thereof is omitted as appropriate.

As shown in FIG. 12, in the playback apparatus 2 according to the third embodiment, the time spent on the audio cross-fade process is changed according to the number of patterns that the pattern determination unit 233 has determined to be suitable for the scene of the content to be played back. Specifically, first, it is determined whether there are two or more patterns determined to be suitable for the scene of the content to be played back (step S31).

Here, when there are not two or more patterns determined to be suitable for the scene of the content to be played back, that is, when there is only one such pattern (step S31: No), the time T₂ spent on the subsequent audio cross-fade process is set to a predetermined standard value, as shown in the third row from the top in FIG. (step S32). As a result, in the subsequent audio cross-fade process, the type of audio adjustment corresponding to that pattern is applied without delay (steps S40 to S44).

On the other hand, if there are two or more patterns determined to be suitable for the scene of the content to be played back (step S31: Yes), the time T₂ spent on the audio cross-fade process is made longer than the standard value by ΔT₂, as shown in the fifth row from the top in FIG. (step S33). Thereby, the type of audio adjustment corresponding to one of the two or more patterns is applied relatively slowly. Then, even if the next pattern determination (step S30), performed in the middle of the audio cross-fade process, determines that the content being played back corresponds to another pattern, the type of audio adjustment corresponding to that other pattern can be suitably applied to the input audio signal.
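The branch formed by steps S31 to S33 amounts to a simple rule for choosing T₂. A minimal sketch follows; the function name and the default values of one second for both the standard value and the extension ΔT₂ are assumptions, since the source gives no concrete durations.

```python
def audio_crossfade_time(num_patterns, standard_sec=1.0, extension_sec=1.0):
    """Return the time T2 to spend on the audio cross-fade process.

    num_patterns: number of patterns determined to be suitable for the scene.
    """
    if num_patterns >= 2:                      # step S31: Yes
        return standard_sec + extension_sec    # step S33: T2 + delta T2
    return standard_sec                        # step S32: standard value
```

The longer duration leaves more determination cycles inside one cross-fade, so a later, more certain determination can redirect the adjustment before it has fully taken effect.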

As described above, according to the third embodiment, when a plurality of patterns are determined and it cannot be fully narrowed down which pattern is optimal, the time T₂ spent on the audio cross-fade process is made relatively long, for example. This avoids abruptly applying the type of audio adjustment corresponding to any one pattern, so that the uncertainty in the accuracy of pattern determination can be absorbed.

In addition, in the third embodiment, in view of the fact that the viewer feels greater discomfort when the video changes suddenly than when the audio changes suddenly, the lower limit of the time T₁ spent on the video cross-fade process is set longer than the lower limit of the time T₂ spent on the audio cross-fade process, as shown in the second and fourth rows from the top of FIG. Thereby, the discomfort felt by the viewer can be more suitably reduced.

In each embodiment described above,
the “playback apparatus 2” is a specific example of the “playback apparatus” according to the present invention,
the “video analysis unit 231” and / or the “audio analysis unit 232” is a specific example of the “analysis means” according to the present invention,
the “pattern determination unit 233” is a specific example of the “pattern discrimination means” according to the present invention,
the “audio control unit 235” is a specific example of the “audio control means” according to the present invention,
the “video control unit 234” is a specific example of the “video control means” according to the present invention,
the “audio adjustment unit 222” is a specific example of the “audio adjustment means” according to the present invention,
the “video adjustment unit 212” is a specific example of the “video adjustment means” according to the present invention,
“step S20” and / or “step S21” is a specific example of the “analysis step” according to the present invention,
“step S30” is a specific example of the “pattern discrimination step” according to the present invention, and
“steps S40 to S44” are a specific example of the “audio control step” according to the present invention.

It should be noted that the present invention is not limited to the above-described embodiments and can be modified as appropriate without departing from the spirit or idea of the invention that can be read from the claims and the entire specification. A playback apparatus and method, and a computer program, that involve such modifications are also included in the technical scope of the present invention.

The playback apparatus and method and the computer program according to the present invention can be used for a playback apparatus and method, and a computer program, for playing back content composed of video signals and audio signals, such as a flat display with speakers, an AV (Audio Visual) system, a television apparatus, or a BD (Blu-ray Disc) player.

Claims

A playback apparatus for playing back content composed of video signals and audio signals, comprising:
analysis means for analyzing the video signal input to the playback apparatus for each of one or a plurality of video signal parameters;
pattern discrimination means for determining a pattern suitable for each scene of the content to be played back, based on a degree of conformity between each video signal parameter relating to the analyzed video signal and each video signal parameter relating to each of a plurality of predetermined patterns;
audio adjustment means for performing predetermined audio adjustment on the audio signal input to the playback apparatus; and
audio control means for controlling the audio adjustment means so that audio adjustment of the type corresponding to the determined pattern is performed on the input audio signal.
The playback apparatus according to claim 1, wherein, in performing the predetermined audio adjustment on the audio signal, the audio control means controls the audio adjustment means so as to perform an audio cross-fade process in which the audio adjustment parameter applied before the predetermined audio adjustment is changed so as to approach the predetermined audio adjustment parameter in steps.
The playback apparatus according to claim 2, wherein, when performing the audio cross-fade process, the audio control means changes the time spent on the audio cross-fade process according to the number of patterns determined to be suitable for a scene of the content to be played back.
The playback apparatus according to claim 3, wherein, when performing the audio cross-fade process, the audio control means lengthens the time spent on the audio cross-fade process when there are two or more patterns determined to be suitable for the scene of the content to be played back, compared to the case where there is only one such pattern.
Video adjusting means for performing predetermined video adjustment on the video signal input to the playback device;
The video control means for controlling the video adjustment means so as to apply the video adjustment of the type corresponding to the discriminated pattern to the input video signal. 4. A playback apparatus according to item 3.
The playback apparatus according to claim 5, wherein, in performing the predetermined video adjustment, the video control means also controls the video adjustment means so as to perform a video cross-fade process in which the video adjustment parameter applied before the predetermined video adjustment is changed so as to gradually approach the predetermined video adjustment parameter.
The playback apparatus according to claim 6, wherein, when performing the video cross-fade process, the video control means changes the time spent on the video cross-fade process according to the number of patterns determined to be suitable for the scene of the content to be played back, and
the lower limit of the time spent on the video cross-fade process is set longer than the lower limit of the time spent on the audio cross-fade process.
The analysis means analyzes the audio signal for each of one or a plurality of audio signal parameters in addition to the video signal input to the playback device,
The pattern discriminating means includes a degree of matching between each video signal parameter relating to the analyzed video signal and each video signal parameter relating to each of the predetermined plurality of patterns, and each relating to the analyzed audio signal. The pattern matching the scene of the content to be reproduced is determined based on the degree of matching between the audio signal parameter and each audio signal parameter related to each of the predetermined plurality of patterns. The reproducing apparatus according to item.
A playback method for playing back content composed of video signals and audio signals, comprising:
an analysis step of analyzing the input video signal for each of one or a plurality of video signal parameters;
a pattern discrimination step of determining a pattern suitable for each scene of the content to be played back, based on a degree of conformity between each video signal parameter relating to the analyzed video signal and each video signal parameter relating to each of a plurality of predetermined patterns;
an audio adjustment step of performing predetermined audio adjustment on the input audio signal; and
an audio control step of controlling the audio adjustment means so that audio adjustment of the type corresponding to the determined pattern is performed on the input audio signal.
A computer program for causing a computer to function as the playback device according to claim 1.