WO2011089689A1 - Monitoring device - Google Patents

Monitoring device Download PDF

Info

Publication number
WO2011089689A1
WO2011089689A1 (PCT/JP2010/050619)
Authority
WO
WIPO (PCT)
Prior art keywords
video
audio signal
error
feature
decoded
Prior art date
Application number
PCT/JP2010/050619
Other languages
French (fr)
Japanese (ja)
Inventor
浜田高宏
Original Assignee
株式会社K-Will
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社K-Will
Priority to JP2011550742A priority Critical patent/JP5435597B2/en
Priority to PCT/JP2010/050619 priority patent/WO2011089689A1/en
Publication of WO2011089689A1 publication Critical patent/WO2011089689A1/en

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00Diagnosis, testing or measuring for television systems or their details
    • H04N17/004Diagnosis, testing or measuring for television systems or their details for digital television systems

Definitions

  • the present invention relates to a monitoring device, and more particularly to a monitoring device suitable for monitoring digital video / audio signals.
  • high-definition video such as high-definition television broadcasting
  • digital video signals related to high-definition broadcasting and the like are often transmitted to each home via a satellite broadcasting or cable TV network.
  • errors may occur due to various causes while the video signal is transmitted.
  • there is a risk of inconveniences such as video freeze, blackout, noise, and audio mute, and countermeasures are required.
  • US Pat. No. 7,605,845 discloses a detection system that compares, in real time, a first feature amount extracted from the video/audio signal before encoding with a second feature amount extracted from the video/audio signal after decoding, and determines that a transmission error has occurred when there is a difference of a predetermined value or more between the first feature amount and the second feature amount.
  • the present invention has been made in view of these problems of the prior art, and its object is to provide a detection system capable of quickly detecting an error, particularly when an error occurs in a filed video/audio signal.
  • a detection system for detecting an error in a video/audio signal, comprising the steps of: extracting a feature amount from the video/audio signal before encoding to obtain an original feature amount and embedding it in the video/audio signal; encoding the video/audio signal in which the original feature amount is embedded; decoding the encoded video/audio signal; reading the original feature amount embedded in the decoded video/audio signal; extracting a feature amount from the decoded video/audio signal to obtain a decode feature amount and comparing it with the original feature amount; and determining that an error has occurred when there is a difference of a predetermined value or more between the original feature amount and the decode feature amount.
  • a detection system for detecting an error in a video/audio signal that was encoded after a feature amount was extracted from it as an interception feature amount and the interception feature amount was embedded, comprising the steps of: decoding the video/audio signal; reading the interception feature amount embedded in the decoded video/audio signal; extracting a feature amount from the decoded video/audio signal to obtain a decode feature amount and comparing it with the interception feature amount; and determining that an error has occurred when there is a difference of a predetermined value or more between the interception feature amount and the decode feature amount.
  • the detection system of the present invention exploits the fact that the original feature amount or the interception feature amount consists of simple numerical values and is therefore far less likely than the video/audio signal itself to be damaged by encoding or decoding. Specifically, if the decoded video/audio signal is undamaged, it should match the video/audio signal before encoding, so the original feature amount (or interception feature amount) and the decode feature amount should substantially agree; when they differ by a predetermined value or more, an error is judged to have occurred. This removes the need to compare the two feature amounts in real time: wherever a decoded video/audio signal exists, an error check can be performed, so errors in the video/audio signal can be checked at various places such as relay points and transmission destinations.
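The overall flow described above can be sketched in a few lines. This is a minimal illustration with hypothetical helper names; a real system would embed the numeric values in container metadata rather than a Python dict, and the per-frame mean here merely stands in for the patent's feature amounts.

```python
def extract_feature(frames):
    """Per-frame mean level as a stand-in for the feature amounts."""
    return [sum(f) / len(f) for f in frames]

def embed_and_encode(frames):
    original = extract_feature(frames)            # original feature amount
    return {"payload": frames, "meta": original}  # "encode" with metadata

def check_after_decode(stream, threshold=10.0):
    original = stream["meta"]                       # read embedded feature amount
    decoded = extract_feature(stream["payload"])    # decode feature amount
    # An error is flagged wherever the two differ by the threshold or more.
    return [abs(o - d) >= threshold for o, d in zip(original, decoded)]

frames = [[100, 110, 120], [100, 100, 100]]
stream = embed_and_encode(frames)
stream["payload"][1] = [0, 0, 0]                  # simulate a damaged frame
print(check_after_decode(stream))                 # [False, True]
```

Because the embedded values travel inside the file itself, the check needs no side channel and can run anywhere the decoded file exists.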
  • the term “video/audio signal” generally covers both a video signal and an audio signal, but in this specification a signal containing at least one of them is sufficient, and either raw data or compressed data may be used.
  • the target from which the feature amount is extracted may be the whole video/audio signal or a part of it. When feature amounts extracted from parts of the video/audio signal are compared, feature amounts belonging to the same frame number and time should be compared with each other.
  • the original feature amount or the intercept feature amount is embedded in the metadata of the encoded video / audio signal.
  • the error is preferably an image freeze phenomenon.
  • the error is preferably a blackout phenomenon.
  • the error is preferably an audio mute phenomenon.
  • the error is preferably a voice failure phenomenon.
  • the error is preferably a video / audio mismatch phenomenon.
  • the error is preferably an illegal frame phenomenon.
  • FIG. 1 is a conceptual diagram of an entire transmission system including a detection system according to the present embodiment.
  • FIG. 2 is a flowchart showing the entire detection system.
  • FIG. 3 is a block diagram showing a configuration of the broadcast terminal 100X.
  • FIG. 4 is a block diagram showing the configuration of the broadcast terminals 100A and 100B.
  • FIG. 5A is a diagram illustrating the original feature amount embedded in the video / audio signal by the broadcasting terminal 100X or the interception feature amount embedded in the video / audio signal encoded by the communication terminal 201X.
  • FIG. 5B is a diagram illustrating a value obtained by taking a difference between two feature amounts.
  • FIG. 5C is a diagram illustrating the decoding feature amount extracted from the video / audio signal decoded by the broadcasting terminal 100A or 100B or the communication terminal 201A or 201B.
  • FIG. 6A is a diagram illustrating the original feature amount embedded in the video / audio signal by the broadcasting terminal 100X or the interception feature amount embedded in the video / audio signal encoded by the communication terminal 201X.
  • FIG. 6B is a diagram illustrating a value obtained by taking a difference between two feature amounts.
  • FIG. 6C is a diagram illustrating the decoding feature amount extracted from the video / audio signal decoded by the broadcasting terminal 100A or 100B or the communication terminal 201A or 201B.
  • FIG. 7A is a diagram illustrating the original feature amount embedded in the video / audio signal by the broadcasting terminal 100X or the interception feature amount embedded in the video / audio signal encoded by the communication terminal 201X.
  • FIG. 7B is a diagram illustrating a value obtained by taking a difference between two feature amounts.
  • FIG. 7C is a diagram illustrating the decoding feature amount extracted from the video / audio signal decoded by the broadcasting terminal 100A or 100B or the communication terminal 201A or 201B.
  • FIG. 8A is a diagram illustrating the original feature amount embedded in the video / audio signal by the broadcasting terminal 100X or the interception feature amount embedded in the video / audio signal encoded by the communication terminal 201X.
  • FIG. 8B is a diagram illustrating a value obtained by taking a difference between two feature amounts.
  • FIG. 8C is a diagram illustrating the decoding feature amount extracted from the video / audio signal decoded by the broadcasting terminal 100A or 100B or the communication terminal 201A or 201B.
  • FIG. 9A shows the original feature amount embedded in the video / audio signal by the broadcasting terminal 100X or the interception feature amount embedded in the video / audio signal encoded by the communication terminal 201X.
  • FIG. 9B is a diagram illustrating a value obtained by taking a difference between two feature amounts.
  • FIG. 9C is a diagram illustrating the decoding feature amount extracted from the video / audio signal decoded by the broadcasting terminal 100A or 100B or the communication terminal 201A or 201B.
  • FIG. 10A is a diagram illustrating the original feature amount embedded in the video / audio signal by the broadcast terminal 100X or the interception feature amount embedded in the video / audio signal encoded by the communication terminal 201X.
  • FIG. 10B is a diagram illustrating a value obtained by taking a difference between two feature amounts.
  • FIG. 10C is a diagram illustrating the decoding feature amount extracted from the video / audio signal decoded by the broadcasting terminal 100A or 100B or the communication terminal 201A or 201B.
  • FIG. 1 is a conceptual diagram of an entire transmission system including a detection system according to the present embodiment.
  • a video / audio signal including an audio signal and a video signal is encoded and transmitted from a transmission source 10 such as a broadcasting station to transmission destinations 20A and 20B such as a satellite station.
  • although the transmission of the video/audio signal is illustrated here as being performed via the communication satellite S, it may also be transmitted from the communication terminal 201X of the transmission source 10 to the communication terminals 201A and 201B of the transmission destinations via the Internet network INT or the like.
  • Reference numerals 200X, 200A, and 200B denote monitors that display information from the respective terminals.
  • FIG. 2 is a flowchart of the entire detection system.
  • a feature amount is extracted from an unencoded video/audio signal, such as a movie, a drama, or a news program, as an original feature amount, and is embedded in the video/audio signal in step S102.
  • alternatively, when a video/audio signal that was encoded before any feature amount was extracted by the broadcast terminal 100X of the transmission source 10 is input to the communication terminal 201X, it is decoded in step S101, a feature amount is then extracted from the decoded video/audio signal as an interception feature amount, and the interception feature amount is embedded in the encoded video/audio signal in step S102.
  • the metadata and its processing method are described in detail in, for example, Japanese Patent Application Laid-Open No. 2008-271414.
  • a feature amount is extracted from the video/audio signal divided along the time axis at each scene change, and the original feature amount or the interception feature amount is embedded in the metadata in association with its position on the time axis.
  • in step S103, the video/audio signal in which the original feature amount is embedded is encoded (a video/audio signal in which the interception feature amount is embedded has already been encoded).
  • the encoded video/audio signal is filed and stored in the server of the transmission source 10 or in the communication terminal 201X, or is transmitted to the broadcast terminals 100A and 100B of the transmission destinations 20A and 20B, the communication terminals 201A and 201B of the transmission destinations, and so on.
  • when an error check of the transmitted video/audio signal is performed at the broadcast terminals 100A and 100B of the transmission destinations 20A and 20B or at the communication terminals 201A and 201B, the encoded video/audio signal is decoded in step S104.
  • in step S105, the original feature amount or the interception feature amount embedded in the decoded video/audio signal is read out together with the corresponding position on the time axis. Since the original feature amount, the interception feature amount, and the time-axis position consist of simple numerical values, they are less likely than the video/audio signal to be damaged by encoding and decoding.
  • in step S106, the decoded video/audio signal is divided according to the read time-axis positions, and a feature amount is extracted from each division as a decode feature amount.
  • in step S107, the original feature amount or the interception feature amount at the same time-axis position is compared with the decode feature amount. If there is a difference of a predetermined value or more, it is determined in step S108 that an error has occurred; if there is no such difference, it is determined in step S109 that no error has occurred.
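The decision stage of steps S105 through S109 amounts to a per-position threshold test. The sketch below assumes feature amounts keyed by time-axis position; the function name and threshold value are illustrative, not from the patent.

```python
def judge(embedded, decoded, threshold):
    """S105-S109: compare feature amounts at matching time-axis positions."""
    verdicts = {}
    for t, original in embedded.items():       # S105: read feature + position
        diff = abs(original - decoded[t])      # S107: compare at same position
        verdicts[t] = "error" if diff >= threshold else "ok"  # S108 / S109
    return verdicts

embedded = {0.0: 50.0, 1.0: 52.0, 2.0: 48.0}   # time -> original feature amount
decoded  = {0.0: 50.0, 1.0: 90.0, 2.0: 48.5}   # time -> decode feature amount
print(judge(embedded, decoded, threshold=5.0))
# {0.0: 'ok', 1.0: 'error', 2.0: 'ok'}
```

Keying by time-axis position is what lets the system compare feature amounts of the same frame and time even after the signal has been filed and moved.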
  • FIG. 3 is a block diagram showing a configuration of the broadcast terminal 100X.
  • Left and right audio signals AL and AR among video and audio signals input from a video camera or the like are input to audio input units 101 and 102, and signals output therefrom are input to delay units 103 and 104, respectively.
  • the results calculated by the calculation unit 105 are output as audio feature amounts (Audio Level, Audio Activity) from the broadcast terminals 100X, 100A, and 100B to the terminals 201X, 201A, and 201B.
  • the video signal VD among the video and audio signals is input to the video input unit 108, and the signals output therefrom are input to the frame memories 109, 110, and 111.
  • the frame memory 109 stores the current frame
  • the frame memory 110 stores the previous frame
  • the frame memory 111 stores the second previous frame.
  • Output signals from the frame memories 109, 110, and 111 are input to the MC calculation unit 112, and the calculation results are output as video feature values (Motion).
  • an output signal from the frame memory 110 is input to the video calculation unit 119.
  • the calculation result of the video calculation unit 119 is output as a video feature amount (Video Level, Video Activity).
  • These output signals are output from the broadcast terminals 100X, 100A, and 100B to the terminals 201X, 201A, and 201B as video feature amounts.
  • “Motion” divides an image frame into small blocks of, for example, 8 pixels × 8 lines, obtains the average value and variance of the 64 pixels in each small block, and compares them with those of the block at the same position N frames before.
  • “Video Level” is the average value of the pixels included in the image frame. When calculating “Activity”, the variance value may be used.
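The block-based video feature amounts described above can be sketched as follows. The 8 × 8 block size and the use of per-block averages come from the text; representing frames as nested lists of grayscale values, and comparing block means for Motion, are our simplifying assumptions.

```python
def block_stats(frame, bx, by, size=8):
    """Average and variance of one size x size block at (bx, by)."""
    pix = [frame[y][x] for y in range(by, by + size)
                       for x in range(bx, bx + size)]
    mean = sum(pix) / len(pix)
    var = sum((p - mean) ** 2 for p in pix) / len(pix)
    return mean, var

def motion(frame, frame_n_ago, size=8):
    # Compare each block's mean with the block at the same place N frames before.
    h, w = len(frame), len(frame[0])
    return sum(abs(block_stats(frame, x, y, size)[0] -
                   block_stats(frame_n_ago, x, y, size)[0])
               for y in range(0, h, size) for x in range(0, w, size))

def video_level(frame):
    """Average value of all pixels in the image frame."""
    return sum(sum(row) for row in frame) / (len(frame) * len(frame[0]))

prev = [[100] * 8 for _ in range(8)]
cur = [[110] * 8 for _ in range(8)]
print(motion(cur, prev), video_level(cur))    # 10.0 110.0
```

Activity would reuse the same `block_stats` variance output, summed or averaged over the frame.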
  • the extracted audio feature amounts and video feature amounts are associated with positions on the time axis as the original feature amount and embedded as metadata of the video/audio signal by the output unit 111; after encoding, the signal is distributed to the transmission destination as an individual file. The same function can also be provided to the communication terminal 201X of the transmission source: after the encoded video/audio signal is input and decoded, the interception feature amount can be extracted and embedded in the metadata.
  • FIG. 4 is a block diagram showing the configuration of the broadcast terminals 100A and 100B as transmission destinations.
  • the transmission destination broadcast terminals 100A and 100B are mainly different from the transmission source broadcast terminal 100X in that they have a decoder DEC and a demultiplexer DMP. A description of the common points is omitted.
  • the decoder DEC decodes the video/audio signal; at this time, the original feature amount embedded in the metadata and the corresponding position on the time axis are read out. Thereafter, the demultiplexer DMP divides the decoded video/audio signal into a video signal and an audio signal. In the same manner as described above, the broadcast terminals 100A and 100B extract an audio feature amount from the divided audio signal and a video feature amount from the divided video signal according to the corresponding time-axis positions; these are the decode feature amounts.
  • the output unit 150 of the broadcast terminals 100A and 100B compares the original feature amount and the decode feature amount, detects that an error has occurred if the two differ by a predetermined value or more, and can write information indicating that an error has occurred into the metadata and transmit it back to the transmission source 10.
  • the operator of the transmission source 10 can analyze the cause of the error from the metadata of the video / audio signal transmitted back from the terminals 201A and 201B to the terminal 201X.
  • the destination communication terminals 201A, 201B, and so on have the same function: they decode the encoded video/audio signal transmitted from the communication terminal 201X, obtain the decode feature amount, and compare it with the embedded original feature amount or interception feature amount.
  • furthermore, by comparing the original feature amount or the interception feature amount in the metadata with the decode feature amount, the error can also be corrected.
  • for example, the original feature amount or the interception feature amount of the decoded video signal and that of the decoded audio signal are each extracted, and the positions at which they change greatly along the time axis are obtained; the output unit 150 of the broadcast terminals 100A and 100B or of the communication terminals 201A and 201B of the transmission destinations 20A and 20B can then shift the video signal and the audio signal relative to each other. In other words, if the original feature amount or the interception feature amount of the video signal and that of the audio signal are embedded and encoded in advance and read out after decoding, the video signal and the audio signal may be shifted relative to each other so that the timings of their changes coincide.
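A rough sketch of the relative-shift correction described above: find the positions where each feature sequence changes greatly, then try small shifts of the audio sequence until its change positions best line up with the video's. The change threshold, search range, and function names are our assumptions.

```python
def change_points(seq, jump=10):
    """Indices where the feature amount changes greatly along the time axis."""
    return [i for i in range(1, len(seq)) if abs(seq[i] - seq[i - 1]) >= jump]

def best_shift(video_feat, audio_feat, max_shift=3):
    """Shift (in frames) that best aligns audio change points with video's."""
    v = set(change_points(video_feat))
    best, best_hits = 0, -1
    for s in range(-max_shift, max_shift + 1):
        hits = sum(1 for i in change_points(audio_feat) if (i + s) in v)
        if hits > best_hits:
            best, best_hits = s, hits
    return best                                  # frames to shift audio by

video = [0, 0, 50, 50, 0, 0]
audio = [0, 0, 0, 40, 40, 0]                     # same change, one frame late
print(best_shift(video, audio))                  # -1
```

A production implementation would align against the embedded original feature amounts rather than raw levels, but the matching idea is the same.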
  • FIG. 5A is a diagram showing the original feature amount or the interception feature amount embedded in the video/audio signal by the broadcast terminal 100X, FIG. 5C is a diagram showing the decode feature amount extracted from the video/audio signal decoded by the broadcast terminal 100A or 100B, and FIG. 5B is a diagram showing the value obtained by taking the difference between the two feature amounts; the vertical axis represents Motion as the feature amount and the horizontal axis represents time.
  • in the video based on the decoded video/audio signal, Motion is low between times t1 and t2, but as shown in FIG. 5A, Motion is also low between times t1 and t2 in the video based on the video/audio signal before encoding, and the difference is zero (see FIG. 5B). This occurs because the transmitted video is a still image, and therefore it can be determined that no image freeze phenomenon has occurred.
  • FIG. 6A is a diagram showing the original feature amount or the interception feature amount embedded in the video/audio signal by the broadcast terminal 100X, FIG. 6C is a diagram showing the decode feature amount extracted from the video/audio signal decoded by the broadcast terminal 100A or 100B, and FIG. 6B is a diagram showing the value obtained by taking the difference between the two feature amounts; the vertical axis represents Video Activity as the feature amount and the horizontal axis represents time.
  • as this Video Activity, for example, the following variance A can be used.
  • since encoding and decoding are generally lossy, the decoded signal U(x, y, z, t) is not necessarily equal to the original signal V(x, y, z, t). However, no correction is needed for an error so small that the viewer does not notice it; if the problem is a blackout phenomenon, on the other hand, countermeasures are required.
  • the variance A as the feature amount of the video signal V(x, y, z, t) can be expressed by the following equation, where the sum runs over the N pixels of the frame: A = (1/N) Σ (V(x, y, z, t) − ave.V)².
  • here, the average value ave.V can be obtained by the following equation: ave.V = (1/N) Σ V(x, y, z, t).
  • the blackout can be determined as follows.
  • in the video based on the decoded video/audio signal, the variance value is low between times t1 and t2, but as shown in FIG. 6A, the variance value is also low between times t1 and t2 in the video based on the video/audio signal before encoding, and the difference is zero (see FIG. 6B). This occurs because the transmitted video shows, for example, the starry sky, and therefore it can be determined that no blackout phenomenon has occurred.
  • by contrast, the variance value of the decoded video is low between times t3 and t4, whereas, as shown in FIG. 6A, the variance value before encoding is high between times t3 and t4, and the difference exceeds the threshold value TH2 (see FIG. 6B). This is caused by a blackout phenomenon in which the screen turns completely black for some reason in the transmitted video, so the occurrence of the error can be detected effectively.
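The variance-based blackout check above can be sketched directly from the equations for A and ave.V. The threshold value stands in for TH2 and is illustrative, as are the function names.

```python
def variance(pixels):
    """The variance A of a frame, using ave.V as in the equations above."""
    ave = sum(pixels) / len(pixels)                           # ave.V
    return sum((v - ave) ** 2 for v in pixels) / len(pixels)  # A

def blackout_suspected(original_variance, decoded_pixels, th2=100.0):
    # A starry-sky frame is dark in both versions, so the difference stays
    # small; a true blackout zeroes the decoded variance while the embedded
    # original variance remains high.
    return abs(original_variance - variance(decoded_pixels)) >= th2

starry = [0, 0, 5, 0, 0, 8, 0, 0]
print(blackout_suspected(variance(starry), starry))   # False (dark content)
print(blackout_suspected(500.0, [0] * 8))             # True  (blackout)
```

Comparing against the embedded original variance, rather than the absolute level, is what distinguishes intentionally dark scenes from a blackout error.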
  • FIG. 7A is a diagram showing the original feature amount or the interception feature amount embedded in the video/audio signal by the broadcast terminal 100X, FIG. 7C is a diagram showing the decode feature amount extracted from the video/audio signal decoded by the broadcast terminal 100A or 100B, and FIG. 7B is a diagram showing the value obtained by taking the difference between the two feature amounts; the vertical axis represents Audio Level as the feature amount and the horizontal axis represents time.
  • the Audio Level sampled from the audio signal is preferably averaged at the frame frequency of the video signal; for example, in the case of a video signal of 30 frames per second, the Audio Level is preferably averaged in 30 Hz units.
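Averaging the audio level per video frame can be sketched as below. The 48 kHz sample rate is an assumed figure (the text specifies only the 30 fps frame rate), and the mean absolute value is one simple choice of level measure.

```python
def audio_level_per_frame(samples, sample_rate=48000, fps=30):
    """Mean absolute audio level averaged over each video-frame interval."""
    per_frame = sample_rate // fps               # e.g. 1600 samples per frame
    return [sum(abs(s) for s in samples[i:i + per_frame]) / per_frame
            for i in range(0, len(samples) - per_frame + 1, per_frame)]

samples = [0.5] * 1600 + [0.0] * 1600            # one loud frame, one silent
print(audio_level_per_frame(samples))            # [0.5, 0.0]
```

Aligning audio levels to video frames this way gives both feature streams a common time axis, which the later comparisons rely on.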
  • in the audio based on the decoded video/audio signal, the Audio Level is very low between times t1 and t2, but as shown in FIG. 7A, the Audio Level is also low between times t1 and t2 in the audio based on the video/audio signal before encoding, and the difference is zero (see FIG. 7B). This occurs because the Audio Level is low in the original video/audio signal before encoding, and therefore it can be determined that no audio mute phenomenon has occurred.
  • by contrast, the Audio Level of the decoded audio is low between times t3 and t4, whereas, as shown in FIG. 7A, the Audio Level before encoding is high between times t3 and t4, and the difference exceeds the threshold value TH3 (see FIG. 7B). This is caused by an audio mute phenomenon in which the audio is interrupted for some reason in the transmitted audio, so the occurrence of the error can be detected effectively.
  • FIG. 8A is a diagram showing the original feature amount or the interception feature amount embedded in the video/audio signal by the broadcast terminal 100X, FIG. 8C is a diagram showing the decode feature amount extracted from the video/audio signal decoded by the broadcast terminal 100A or 100B, and FIG. 8B is a diagram showing the value obtained by taking the difference between the two feature amounts; the vertical axis represents Audio Level as the feature amount and the horizontal axis represents time. Note that the Audio Level sampled from the audio signal is preferably averaged at the frame frequency of the video signal.
  • FIG. 9A is a diagram showing an audio level before encoding extracted by the broadcast terminal 100X corresponding to a video frame.
  • FIG. 9C is a diagram illustrating the decoded audio level extracted by the broadcast terminal 100A or 100B.
  • FIG. 9B is a diagram showing the audio advance/delay with respect to time. Note that the Audio Level sampled from the audio signal is preferably averaged at the frame frequency of the video signal.
  • the rising edge of Audio Level with respect to the frame is detected and compared.
  • the audio delay amount with respect to the video exceeds the threshold value TH5+ at times t1 and t3, and the audio advance amount with respect to the video falls below the threshold value TH5− at time t2, so it can be determined that a video/audio mismatch phenomenon has occurred.
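The rising-edge comparison for audio/video synchronization can be sketched as follows: locate the first frame where the audio level rises past a level in the original and in the decoded sequence, and report the offset in frames. The detection level is illustrative; TH5+ / TH5− would then bound the acceptable offset.

```python
def rising_edge(levels, level=0.25):
    """Index of the first frame where the audio level rises past `level`."""
    for i in range(1, len(levels)):
        if levels[i - 1] < level <= levels[i]:
            return i
    return None

def av_offset(original_levels, decoded_levels):
    """Frames of audio delay (+) or advance (-) relative to the original."""
    a, b = rising_edge(original_levels), rising_edge(decoded_levels)
    return None if a is None or b is None else b - a

orig = [0.0, 0.0, 0.5, 0.5, 0.5]
dec  = [0.0, 0.0, 0.0, 0.0, 0.5]    # rising edge arrives 2 frames late
print(av_offset(orig, dec))          # 2
```

An offset exceeding TH5+ (delay) or falling below TH5− (advance) would then flag the video/audio mismatch phenomenon.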
  • FIG. 10A is a diagram showing the original feature amount or the interception feature amount embedded in the video/audio signal by the broadcast terminal 100X, FIG. 10C is a diagram showing the decode feature amount extracted from the video/audio signal decoded by the broadcast terminal 100A or 100B, and FIG. 10B is a diagram showing the value obtained by taking the difference between the two feature amounts; the vertical axis represents Video Activity (the variance described above) as the feature amount and the horizontal axis represents time.
  • while the video is transmitted normally, the difference between the statistics of the pixel values is zero, but when an illegal frame is mixed in, the difference exceeds a predetermined threshold. Specifically, the difference in the statistic of the pixel values exceeds the threshold value TH6+ between times t1 and t2, and falls below the threshold value TH6− between times t3 and t4.
  • the transmission destination terminal 201A or 201B thereby determines that an illegal frame phenomenon has occurred, so the occurrence of the error can be detected effectively.
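The illegal-frame check above compares a per-frame pixel statistic against the embedded original and flags frames whose difference leaves the band between TH6− and TH6+. Here the statistic is the frame mean and the threshold is symmetric; both choices are illustrative.

```python
def frame_mean(frame):
    """Per-frame pixel statistic (mean), standing in for the embedded value."""
    return sum(frame) / len(frame)

def illegal_frames(original_means, decoded_frames, th6=20.0):
    """Flag frames whose statistic difference leaves [TH6-, TH6+]."""
    flags = []
    for om, frame in zip(original_means, decoded_frames):
        diff = frame_mean(frame) - om
        flags.append(diff >= th6 or diff <= -th6)
    return flags

original_means = [100.0, 100.0, 100.0]
decoded = [[100] * 4, [250] * 4, [100] * 4]      # middle frame corrupted
print(illegal_frames(original_means, decoded))   # [False, True, False]
```

Because only the small numeric statistics need to survive encoding, even a heavily corrupted frame is caught by this comparison.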

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Monitoring And Testing Of Transmission In General (AREA)

Abstract

Disclosed is a monitoring device which makes use of the characteristic that original feature quantities are less likely to be damaged by encoding or decoding than video/audio signals since original feature quantities are composed of simple numerical values. Specifically, if no damage has occurred to a decoded video/audio signal, the decoded video/audio signal should be the same as the video/audio signal prior to decoding. Consequently, since the original feature quantity and the decoding feature quantity should be consistent, it is determined that an error has occurred if there is a difference of a prescribed value or greater between the original feature quantity and the decoding feature quantity.

Description

Monitoring device
The present invention relates to a monitoring device, and more particularly to a monitoring device suitable for monitoring digital video/audio signals.
With recent improvements in video processing technology, high-quality video such as high-definition television broadcasting has come to be aired. Digital video signals for high-definition broadcasting and the like are often transmitted to each home via satellite broadcasting or cable TV networks. However, errors may occur from various causes while the video signal is transmitted; when an error occurs, it may cause problems such as video freeze, blackout, noise, and audio mute, and countermeasures are required.
In contrast, US Pat. No. 7,605,845 discloses a detection system that compares, in real time, a first feature amount extracted from the video/audio signal before encoding with a second feature amount extracted from the video/audio signal after decoding, and determines that a transmission error has occurred when there is a difference of a predetermined value or more between the first feature amount and the second feature amount.
However, in the technique disclosed in US Pat. No. 7,605,845, the first feature amount and the second feature amount must be compared in real time. This is very effective for, say, streaming distribution, but even then a network or the like must be provided to reliably deliver the second feature amount to the transmission destination in real time, separately from the first feature amount. In addition, video/audio signals are nowadays often filed for ease of handling and then stored on a server or transmitted to another party, and files are correspondingly more often damaged at encoding or decoding time. For such filed video/audio signals, the first and second feature amounts cannot always be prepared at decoding time, so it cannot be determined whether the video/audio signal is damaged. A sum check exists as a data error detection method, but a video/audio signal generally carries so much data that error detection by sum check is difficult. Moreover, when a video/audio signal is encoded and decoded, part of the data often changes, yet the signal is generally treated as normal as long as a human viewer perceives nothing wrong in the resulting video and audio. It is therefore not appropriate to detect errors, as a sum check does, by testing whether the video/audio signal before encoding and the video/audio signal after decoding match completely.
 The present invention has been made in view of these problems of the prior art, and its object is to provide a detection system capable of quickly detecting an error, particularly when an error occurs in a filed video/audio signal.
 A detection system for detecting an error in a video/audio signal, comprising the steps of:
 extracting a feature value from the video/audio signal before encoding as an original feature value and embedding it in the video/audio signal;
 encoding the video/audio signal in which the original feature value is embedded;
 decoding the encoded video/audio signal;
 reading out the original feature value embedded in the decoded video/audio signal;
 extracting a feature value from the decoded video/audio signal as a decoded feature value and comparing it with the original feature value; and
 judging that an error has occurred when the original feature value and the decoded feature value differ by a predetermined value or more.
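The comparison and judgment steps above can be sketched as a small routine. This is a minimal illustration, assuming per-segment feature values are given as plain lists of numbers; the function name and threshold are illustrative, not part of the claimed system.

```python
def detect_errors(original_features, decoded_features, threshold):
    """Return the segment indices where the embedded original feature value
    and the feature value re-extracted after decoding differ by the
    predetermined value (threshold) or more, i.e. where an error is judged
    to have occurred."""
    return [i for i, (orig, dec) in enumerate(zip(original_features, decoded_features))
            if abs(orig - dec) >= threshold]

# Segment 2 was damaged in transit: its decoded feature value collapsed.
print(detect_errors([10, 12, 80, 11], [10, 12, 5, 11], threshold=50))  # → [2]
```

Because the original feature values travel inside the file itself, this check can run anywhere a decoded signal is available, with no second real-time channel.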
 A detection system for detecting an error in a video/audio signal that was encoded after a feature value was extracted from it as an interception feature value and the interception feature value was embedded in it, comprising the steps of:
 decoding the video/audio signal;
 reading out the interception feature value embedded in the decoded video/audio signal;
 extracting a feature value from the decoded video/audio signal as a decoded feature value and comparing it with the interception feature value; and
 judging that an error has occurred when the interception feature value and the decoded feature value differ by a predetermined value or more.
 The detection system of the present invention exploits the property that, because the original feature value or interception feature value consists of simple numerical values, it is far less likely than the video/audio signal itself to be damaged by encoding or decoding. Specifically, if the decoded video/audio signal is undamaged, it should be the same as the signal before decoding, so the original or interception feature value should substantially match the decoded feature value; when the two differ by a predetermined value or more, an error is judged to have occurred. This removes the need to compare the two feature values in real time: an error check can be performed wherever a decoded video/audio signal is available, at any time and at various places such as relay points or destinations of the signal.
 Note that a "video/audio signal" generally means a signal containing both a video signal and an audio signal, but in this specification a signal containing at least one of them suffices, and it may be either raw or compressed data. The feature value may be extracted from the entire video/audio signal or from a part of it. When comparing feature values extracted from parts of the signal, the comparison can be made between parts whose frame numbers or times coincide.
 When it is judged that an error has occurred, the system preferably has a step of repairing the decoded video/audio signal based on the original feature value or the interception feature value.
 When it is judged that an error has occurred, the system preferably has a step of embedding information about the error in the decoded video/audio signal.
 The original feature value or the interception feature value is preferably embedded in the metadata of the encoded video/audio signal.
 The error is preferably an image freeze phenomenon.
 The error is preferably a blackout phenomenon.
 The error is preferably an audio mute phenomenon.
 The error is preferably an audio defect phenomenon.
 The error is preferably a video/audio mismatch phenomenon.
 The error is preferably an invalid frame phenomenon.
 When the first feature value and the second feature value differ by a predetermined value or more, the video/audio signal transmitted to the destination is preferably corrected.
 FIG. 1 is a conceptual diagram of the entire transmission system including the detection system according to the present embodiment.
 FIG. 2 is a flowchart showing the detection system as a whole.
 FIG. 3 is a block diagram showing the configuration of the broadcast terminal 100X.
 FIG. 4 is a block diagram showing the configuration of the broadcast terminals 100A and 100B.
 FIGS. 5A, 6A, 7A, 8A, 9A, and 10A each show the original feature value embedded in the video/audio signal at the broadcast terminal 100X, or the interception feature value embedded in the video/audio signal encoded at the communication terminal 201X.
 FIGS. 5B, 6B, 7B, 8B, 9B, and 10B each show the value obtained as the difference between the two feature values.
 FIGS. 5C, 6C, 7C, 8C, 9C, and 10C each show the decoded feature value extracted from the decoded video/audio signal at the broadcast terminal 100A or 100B, or the communication terminal 201A or 201B.
 Hereinafter, the present invention will be described with reference to an embodiment. FIG. 1 is a conceptual diagram of the entire transmission system including the detection system according to the present embodiment. In FIG. 1, consider a case in which a video/audio signal containing an audio signal and a video signal is encoded and transmitted from a transmission source 10, such as a broadcasting station, to transmission destinations 20A and 20B, such as satellite stations. Although the figure shows transmission via a communication satellite S, the signal may instead be transmitted over the Internet INT or the like, for example from the communication terminal 201X of the transmission source 10 to the destination communication terminals 201A and 201B. Reference numerals 200X, 200A, and 200B denote monitors that display information from the respective terminals.
 FIG. 2 is a flowchart of the entire detection system. First, at the broadcast terminal 100X of the transmission source 10, a feature value is extracted in step S101 from the pre-encoding video/audio signal of a movie, drama, news program, or the like, as the original feature value, and is embedded in the video/audio signal in step S102. Alternatively, a video/audio signal that was encoded before any feature value was extracted at the broadcast terminal 100X is input at the communication terminal 201X in step S101 and then decoded; a feature value is extracted from the decoded signal as the interception feature value and, in step S102, embedded in the encoded video/audio signal. In either case, the original or interception feature value is desirably embedded in the metadata of the video/audio signal. Metadata and its handling are described in detail in, for example, JP 2008-271414 A. Here, for example, the video/audio signal is divided along the time axis at every scene change, a feature value is extracted for each division, and the original or interception feature value is embedded in the metadata in association with its position on the time axis.
 Next, in step S103, the video/audio signal in which the original feature value is embedded is encoded (a signal in which an interception feature value is embedded has already been encoded). In this state the video/audio signal is packaged into a file and stored on the server of the transmission source 10 or on the communication terminal 201X, or transmitted to the broadcast terminals 100A and 100B of the destinations 20A and 20B, the destination communication terminals 201A and 201B, and so on.
 Meanwhile, when the transmitted video/audio signal is to be error-checked at the destination broadcast terminals 100A and 100B, the destination communication terminals 201A and 201B, or the like, the encoded video/audio signal is decoded in step S104. Then, in step S105, the original or interception feature value embedded in the decoded video/audio signal is read out together with its corresponding position on the time axis. Because the feature value and the time-axis position consist of simple numerical values, they are far less likely than the video/audio signal itself to be damaged by encoding or decoding.
 Further, in step S106, the decoded video/audio signal is divided according to the read-out time-axis positions and a feature value is extracted from each division as a decoded feature value; in step S107, the original or interception feature value and the decoded feature value at the same time-axis position are compared. If the two differ by a predetermined value or more, it is judged in step S108 that an error has occurred; if not, it is judged in step S109 that no error has occurred.
 FIG. 3 is a block diagram showing the configuration of the broadcast terminal 100X. Of the video/audio signal input from a video camera or the like, the left and right audio signals AL and AR are input to audio input units 101 and 102; the signals output from them are input to delay units 103 and 104, respectively, and the results computed by an audio computation unit 105 are output as the audio feature values (Audio Level, Audio Activity) from the broadcast terminals 100X, 100A, and 100B to the terminals 201X, 201A, and 201B. Here, Audio Level is the mean of the absolute values of the audio samples (sampled at 48 kHz, giving 48000/30 = 1600 samples) contained in one video frame (at, for example, 30 frames per second), and Audio Activity is the mean square of those same samples.
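The two audio feature values just defined can be sketched directly. This assumes the samples of one frame are given as a plain list; the function name is illustrative.

```python
def audio_features(frame_samples):
    """Audio Level: mean absolute value of the audio samples in one video
    frame (48000 / 30 = 1600 samples at 48 kHz audio, 30 fps video).
    Audio Activity: mean square of the same samples."""
    n = len(frame_samples)
    audio_level = sum(abs(s) for s in frame_samples) / n
    audio_activity = sum(s * s for s in frame_samples) / n
    return audio_level, audio_activity

level, activity = audio_features([1, -1, 2, -2])  # tiny stand-in for 1600 samples
print(level, activity)  # → 1.5 2.5
```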
 Meanwhile, the video signal VD of the video/audio signal is input to a video input unit 108, and the signal output from it is input to frame memories 109, 110, and 111. Frame memory 109 stores the current frame, frame memory 110 stores the previous frame, and frame memory 111 stores the frame before that.
 The output signals from the frame memories 109, 110, and 111 are input to an MC computation unit 112, whose result is output as the video feature value Motion. The output signal from the frame memory 110 is also input to a video computation unit 119, whose results are output as the video feature values Video Level and Video Activity. These output signals are output as video feature values from the broadcast terminals 100X, 100A, and 100B to the terminals 201X, 201A, and 201B. Here, Motion is obtained by dividing the image frame into small blocks of, for example, 8 pixels × 8 lines, computing the mean and variance of the 64 pixels in each block, and taking the difference against the mean and variance of the block at the same position N frames earlier; it indicates the motion of the image. N is usually 1, 2, or 4. Video Level is the mean of the pixel values contained in the image frame. As Video Activity, one may compute the variance of each small block in the image and take the average of those variances over the frame, or simply use the variance of all pixels within the frame.
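The Motion computation described here can be sketched as follows. This assumes frames are 2-D lists of pixel values; the text does not specify how the per-block differences are aggregated into one Motion number, so the simple sum used below is an assumption.

```python
def block_mean_var(frame, bx, by, size=8):
    # mean and variance of the 64 pixels in one 8x8 block
    vals = [frame[by * size + dy][bx * size + dx]
            for dy in range(size) for dx in range(size)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var

def motion(cur, prev, size=8):
    """Divide the frame into 8x8 blocks, compare each block's mean and
    variance with the block at the same position N frames earlier (`prev`),
    and aggregate the differences into one motion indicator."""
    blocks_y = len(cur) // size
    blocks_x = len(cur[0]) // size
    total = 0.0
    for by in range(blocks_y):
        for bx in range(blocks_x):
            cm, cv = block_mean_var(cur, bx, by, size)
            pm, pv = block_mean_var(prev, bx, by, size)
            total += abs(cm - pm) + abs(cv - pv)
    return total

frame = [[(x + y) % 8 for x in range(8)] for y in range(8)]
print(motion(frame, frame))  # identical frames → 0.0
```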
 The extracted audio and video feature values, as original feature values, are associated with their time-axis positions and embedded by an output unit 111 as metadata of the video/audio signal; the signal is then encoded and delivered to the destination as an individual file. The source-side communication terminal 201X may be given the same capability: it inputs and decodes an encoded video/audio signal, extracts the interception feature value, and embeds it in the metadata.
 Next, error detection will be described. FIG. 4 is a block diagram showing the configuration of the destination broadcast terminals 100A and 100B. They differ from the source broadcast terminal 100X mainly in having a decoder DEC and a demultiplexer DMP; description of the common parts is omitted.
 When an encoded video/audio signal is input to the broadcast terminal 100A or 100B, the decoder DEC first decodes it, reading out the original feature values embedded in the metadata and their corresponding time-axis positions. The demultiplexer DMP then separates the decoded video/audio signal into a video signal and an audio signal. In the same manner as described above, the broadcast terminals 100A and 100B extract audio feature values from the separated audio signal and video feature values from the separated video signal according to the corresponding time-axis positions, and use these as the decoded feature values.
 The output unit 150 of the broadcast terminal 100A or 100B compares the original feature values with the decoded feature values; if they differ by the predetermined amount, it detects that an error has occurred, writes information indicating the error into the metadata, and can send the signal back to the transmission source 10.
 An operator at the transmission source 10 can then analyze the cause of the error from the metadata of the video/audio signal sent back from the terminals 201A and 201B to the terminal 201X.
 The destination communication terminals 201A, 201B, and so on can likewise be given the same capability: they decode the encoded video/audio signal transmitted from the communication terminal 201X, obtain the decoded feature values, and compare them with the embedded original or interception feature values.
 If the error is minor, such as a misalignment between video and audio, it can be corrected by comparing the original or interception feature values in the metadata with the decoded feature values. For example, the original or interception feature value of the decoded video signal and that of the decoded audio signal are each examined to find the positions where they change sharply along the time axis. A sharp change in the video feature value often corresponds to a scene change, but it does not necessarily coincide with a sharp change in the audio feature value. The time difference between the position where the video signal's original or interception feature value changes sharply and the position where the audio signal's does is therefore computed. The output unit 150 of the destination broadcast terminal 100A or 100B of the destination 20A or 20B, or the communication terminal 201A or 201B, can then shift the video and audio signals relative to each other until the time difference between the corresponding sharp-change positions of the decoded video and audio feature values matches the time difference obtained from the original or interception feature values. Alternatively, timing signals may be embedded in advance in the original or interception feature values of the video and audio signals and encoded; after decoding, the feature values are read out and the video and audio signals shifted relative to each other until the timing signals coincide.
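The change-point alignment described above can be sketched as follows. The per-frame feature lists, the jump threshold, and the function names are illustrative assumptions; comparing this offset for the decoded features against the same offset for the embedded original features gives the relative shift to apply.

```python
def change_point(features, jump):
    """Index of the first frame where the feature value changes by `jump` or more."""
    for i in range(1, len(features)):
        if abs(features[i] - features[i - 1]) >= jump:
            return i
    return None

def av_offset(video_features, audio_features, jump):
    """Time difference (in frames) between the video change point and the
    audio change point of one signal."""
    return change_point(video_features, jump) - change_point(audio_features, jump)

# Video changes sharply at frame 3, audio at frame 1: the offset is 2 frames.
print(av_offset([0, 0, 0, 9, 9], [0, 9, 9, 9, 9], jump=5))  # → 2
```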
 Next, more specific errors will be described.
 (1) Detection of the image freeze phenomenon
 FIG. 5A shows the original feature value embedded in the video/audio signal at the broadcast terminal 100X, or the interception feature value embedded in the signal encoded at the communication terminal 201X; FIG. 5C shows the decoded feature value extracted from the decoded video/audio signal at the broadcast terminal 100A or 100B; and FIG. 5B shows the difference between the two feature values. The vertical axis represents Motion as the feature value, and the horizontal axis represents time.
 Here, as shown in FIG. 5C, in the video based on the decoded video/audio signal, Motion is low between times t1 and t2; but as FIG. 5A shows, Motion is also low between t1 and t2 in the video based on the signal before encoding, so the difference is zero (see FIG. 5B). This is simply because the transmitted video was a still image, and it can therefore be judged that no image freeze has occurred.
 On the other hand, as shown in FIG. 5C, Motion in the decoded video is low between times t3 and t4, whereas in the video before encoding it is high over the same interval (FIG. 5A), and the difference exceeds the threshold TH1 (see FIG. 5B). This is because, for some reason, an image freeze occurred in the decoded video, so the error can be reliably detected.
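The freeze test can be sketched as below: only frames where the original Motion was high but the decoded Motion collapsed are flagged, so a genuinely still interval like t1-t2 is not reported. The values and the threshold TH1 are illustrative.

```python
def detect_freeze(original_motion, decoded_motion, th1):
    """Frames where original-minus-decoded Motion reaches the threshold TH1:
    the source was moving but the decoded video was not."""
    return [t for t, (o, d) in enumerate(zip(original_motion, decoded_motion))
            if o - d >= th1]

orig = [50, 2, 2, 60, 55]   # frames 1-2: genuinely still scene (no error)
dec  = [50, 2, 2,  3,  2]   # frames 3-4: decoded video froze
print(detect_freeze(orig, dec, th1=30))  # → [3, 4]
```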
 (2) Detection of the blackout phenomenon
 FIG. 6A shows the original feature value embedded in the video/audio signal at the broadcast terminal 100X, or the interception feature value; FIG. 6C shows the decoded feature value extracted from the decoded video/audio signal at the broadcast terminal 100A or 100B; and FIG. 6B shows the difference between the two feature values. The vertical axis represents Video Activity as the feature value, and the horizontal axis represents time. As this Video Activity, for example, the following variance A can be used.
 Consider the video signal before and after transmission (taking as an example a signal, such as a virtual video signal, that has a value at each three-dimensional coordinate; setting z = 0 gives an ordinary two-dimensional video signal). Let V(x, y, z, t) be the video signal before transmission at three-dimensional coordinates (x, y, z) at time t, and U(x, y, z, t) the video signal after transmission at the same coordinates and time.
 When a video signal is transmitted over a long distance, various problems such as signal loss and noise can arise, so V(x, y, z, t) = U(x, y, z, t) does not necessarily hold; however, errors too small for the viewer to notice need not be corrected. A failure such as a blackout, on the other hand, does require countermeasures.
 The variance A, as a feature value of the video signal V(x, y, z, t), can be expressed by the following equation:
    A(t) = (1/N) Σx,y,z { V(x, y, z, t) − ave.V(t) }²
 where the sum runs over all coordinates (x, y, z) of the frame and N is the number of coordinate points (pixels).
 The average value ave.V can be obtained by the following equation:
    ave.V(t) = (1/N) Σx,y,z V(x, y, z, t)
 The variance A is computed for both the pre-transmission video signal V(x, y, z, t) and the post-transmission video signal U(x, y, z, t), and a blackout can be identified from their difference as follows.
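The variance A above can be computed directly. This sketch drops z (z = 0, the ordinary two-dimensional case) and takes a frame as a 2-D list of pixel values.

```python
def variance_A(frame):
    """Variance of all pixel values in one frame: A is near 0 for a
    blacked-out (uniform) frame and large for a normally textured frame."""
    vals = [v for row in frame for v in row]
    ave_v = sum(vals) / len(vals)          # the average value ave.V
    return sum((v - ave_v) ** 2 for v in vals) / len(vals)

print(variance_A([[0, 0], [0, 0]]))  # uniform (black) frame → 0.0
print(variance_A([[0, 2], [0, 2]]))  # textured frame → 1.0
```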
 As shown in FIG. 6C, in the video based on the decoded video/audio signal the variance is low between times t1 and t2; but as FIG. 6A shows, it is also low over the same interval in the video based on the signal before encoding, so the difference is zero (see FIG. 6B). This occurred because the transmitted video showed, for example, a starry sky, and it can therefore be judged that no blackout has occurred.
 一方、図6Cに示すように、デコードされた後の映像音声信号に基づく映像においては、時間t3~t4の間は、分散値が低いのに対し、図6Aに示すように、エンコードされる前の映像音声信号に基づく映像においては、時間t3~t4の間は、分散値が高くなっており、その差分は閾値TH2を超えている(図6B参照)。これは、伝送された映像において、何らかの原因により画面が真っ黒になるブラックアウト現象が生じたことによるものであるので、エラーが生じたことを有効に検出できる。 On the other hand, as shown in FIG. 6C, in the video based on the decoded video / audio signal, the variance value is low during the time t3 to t4, whereas before the encoding, as shown in FIG. 6A. In the video based on the video / audio signal, the variance value is high between times t3 and t4, and the difference exceeds the threshold value TH2 (see FIG. 6B). This is due to the occurrence of a blackout phenomenon in which the screen is completely black for some reason in the transmitted video, so that it is possible to effectively detect that an error has occurred.
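The variance-and-difference procedure above can be sketched in a few lines of Python (a minimal illustration, not code from the patent; the list-based frame representation and the function names are assumptions):

```python
def variance(frame):
    """Feature quantity A: variance of the pixel values in one frame."""
    n = len(frame)
    ave = sum(frame) / n          # ave.V, the mean pixel value
    return sum((v - ave) ** 2 for v in frame) / n

def detect_blackout(pre_frames, post_frames, th2):
    """Return frame indices where the pre/post variance difference exceeds TH2.

    A large positive difference means the decoded frame lost almost all
    detail (e.g. the screen went black) although the source frame had detail.
    """
    flagged = []
    for t, (pre, post) in enumerate(zip(pre_frames, post_frames)):
        if variance(pre) - variance(post) > th2:
            flagged.append(t)
    return flagged
```

A uniformly dark source scene (the starry-sky case of FIG. 6) yields a low variance on both sides, so the difference stays near zero and no error is flagged.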
(3) Detection of the audio mute phenomenon
 FIG. 7A shows the original feature quantity or interception feature quantity embedded in the video/audio signal by the broadcast terminal 100X, FIG. 7C shows the decode feature quantity extracted from the decoded video/audio signal by the broadcast terminal 100A or 100B, and FIG. 7B shows the difference between the two feature quantities; the vertical axis represents Audio Level as the feature quantity, and the horizontal axis represents time. The Audio Level samples of the audio signal are preferably averaged at the frame frequency of the video signal; for example, for a video signal of 30 frames per second, Audio Level sampling is preferably performed at 30 Hz.
 Here, as shown in FIG. 7C, in the audio based on the decoded video/audio signal the Audio Level is very low between times t1 and t2; however, as shown in FIG. 7A, the Audio Level is also low during t1 to t2 in the audio based on the pre-encoding video/audio signal, and the difference is zero (see FIG. 7B). The Audio Level was simply low in the original signal before encoding, so it can be judged that no audio mute phenomenon has occurred.
 On the other hand, as shown in FIG. 7C, the Audio Level of the decoded audio is low between times t3 and t4, whereas, as shown in FIG. 7A, the Audio Level of the pre-encoding audio is high during t3 to t4, and the difference exceeds the threshold TH3 (see FIG. 7B). This indicates that an audio mute phenomenon, in which the sound drops out for some reason, occurred in the transmitted audio, so the error can be effectively detected.
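The frame-averaged Audio Level and the mute check can be sketched as follows (an illustrative Python sketch, assuming the audio is a list of PCM samples and the level is the mean absolute amplitude per video frame; the names are not from the patent):

```python
def frame_audio_levels(samples, samples_per_frame):
    """Average the absolute audio amplitude over each video-frame interval,
    i.e. resample the Audio Level at the video frame frequency (e.g. 30 Hz)."""
    levels = []
    for i in range(0, len(samples) - samples_per_frame + 1, samples_per_frame):
        chunk = samples[i:i + samples_per_frame]
        levels.append(sum(abs(s) for s in chunk) / samples_per_frame)
    return levels

def detect_mute(pre_levels, post_levels, th3):
    """Frames where the decoded level dropped by more than TH3 below the source."""
    return [t for t, (a, b) in enumerate(zip(pre_levels, post_levels))
            if a - b > th3]
```

A passage that was already quiet before encoding produces low levels on both sides, so the difference stays below TH3 and no mute is reported.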
(4) Detection of the audio failure phenomenon
 FIG. 8A shows the original feature quantity or interception feature quantity embedded in the video/audio signal by the broadcast terminal 100X, FIG. 8C shows the decode feature quantity extracted from the decoded video/audio signal by the broadcast terminal 100A or 100B, and FIG. 8B shows the difference between the two feature quantities; the vertical axis represents Audio Level as the feature quantity, and the horizontal axis represents time. The Audio Level samples of the audio signal are preferably averaged at the frame frequency of the video signal.
 Here, when the difference is taken between the Audio Level based on the decoded video/audio signal and the Audio Level based on the pre-encoding video/audio signal, the difference exceeds the threshold TH4 between times t1 and t2 and between times t3 and t4, as shown in FIG. 8B. This indicates that noise or the like was superimposed on the transmitted audio for some reason, causing an audio failure phenomenon, so the error can be effectively detected.
(5) Detection of the video/audio mismatch phenomenon
 FIG. 9A shows the pre-encoding Audio Level extracted by the broadcast terminal 100X in correspondence with the video frames, FIG. 9C shows the decoded Audio Level extracted by the broadcast terminal 100A or 100B, and FIG. 9B shows the advance/delay of the audio with respect to time. The Audio Level samples of the audio signal are preferably averaged at the frame frequency of the video signal.
 Here, the rising edges of the Audio Level relative to the video frames are detected and compared. As shown in FIG. 9C, at times t1 and t3 the delay of the audio relative to the video exceeds the threshold TH5+, and at time t2 the advance of the audio relative to the video falls below the threshold TH5−. By detecting either condition, the destination terminal 201A or 201B judges that a video/audio mismatch phenomenon has occurred, effectively detects the error, and can repair it as necessary.
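The rising-edge comparison can be sketched as follows (a simplified Python illustration; the threshold-crossing definition of a rising edge and the naive pairwise matching of edges are assumptions, not details from the patent):

```python
def rising_edge_times(levels, threshold):
    """Frame indices where the Audio Level first crosses above the threshold."""
    edges = []
    for t in range(1, len(levels)):
        if levels[t - 1] <= threshold < levels[t]:
            edges.append(t)
    return edges

def av_offsets(pre_levels, post_levels, threshold):
    """Offset (in frames) of each decoded edge from the matching source edge.

    Positive values mean the audio lags the video (compare against TH5+);
    negative values mean it leads (compare against TH5-).
    """
    pre = rising_edge_times(pre_levels, threshold)
    post = rising_edge_times(post_levels, threshold)
    return [b - a for a, b in zip(pre, post)]
```

Each offset would then be tested against TH5+ and TH5− to flag a mismatch.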
(6) Detection of the illegal frame phenomenon
 FIG. 10A shows the original feature quantity or interception feature quantity embedded in the video/audio signal by the broadcast terminal 100X, FIG. 10C shows the decode feature quantity extracted from the decoded video/audio signal by the broadcast terminal 100A or 100B, and FIG. 10B shows the difference between the two feature quantities; the vertical axis represents Video Activity (the variance described above may be used) as the feature quantity, and the horizontal axis represents time.
 If the video based on the pre-encoding video/audio signal and the video based on the decoded video/audio signal match completely, the difference between their pixel-value statistics is zero. If, however, a video signal differing by even a single frame is inserted into the decoded video/audio signal, the difference exceeds a predetermined threshold at that frame.
 As shown in FIG. 10C, the difference in the pixel-value statistic exceeds the threshold TH6+ between times t1 and t2, and falls below the threshold TH6− between times t3 and t4. By detecting either condition, the destination terminal 201A or 201B judges that an illegal frame phenomenon has occurred and can effectively detect the error.
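The two-sided threshold test can be sketched as follows (illustrative Python; the per-frame statistic is assumed to be the Video Activity, i.e. the variance described above, and the function name is not from the patent):

```python
def detect_illegal_frames(pre_stats, post_stats, th6_pos, th6_neg):
    """Frames whose Video Activity difference leaves the band [TH6-, TH6+].

    A single inserted frame shows up as an isolated spike in the difference,
    positive or negative depending on the inserted content.
    """
    flagged = []
    for t, (a, b) in enumerate(zip(pre_stats, post_stats)):
        d = b - a
        if d > th6_pos or d < th6_neg:
            flagged.append(t)
    return flagged
```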

Claims (13)

  1.  A detection system for detecting an error in a video/audio signal, comprising:
      a step of extracting a feature quantity from the video/audio signal before encoding to obtain an original feature quantity, and embedding the original feature quantity in the video/audio signal;
      a step of encoding the video/audio signal in which the original feature quantity is embedded;
      a step of decoding the encoded video/audio signal;
      a step of reading the original feature quantity embedded in the decoded video/audio signal;
      a step of extracting a feature quantity from the decoded video/audio signal to obtain a decode feature quantity, and comparing the decode feature quantity with the original feature quantity; and
      a step of determining that an error has occurred when there is a difference equal to or greater than a predetermined value between the original feature quantity and the decode feature quantity.
  2.  The detection system according to claim 1, further comprising a step of repairing the decoded video/audio signal based on the original feature quantity when it is determined that an error has occurred.
  3.  The detection system according to claim 1 or 2, wherein the original feature quantity is embedded in metadata of the video/audio signal.
  4.  A detection system for detecting an error in a video/audio signal that has been encoded after a feature quantity was extracted from the video/audio signal to obtain an interception feature quantity and the interception feature quantity was embedded in the video/audio signal, the system comprising:
      a step of decoding the video/audio signal;
      a step of reading the interception feature quantity embedded in the decoded video/audio signal;
      a step of extracting a feature quantity from the decoded video/audio signal to obtain a decode feature quantity, and comparing the decode feature quantity with the interception feature quantity; and
      a step of determining that an error has occurred when there is a difference equal to or greater than a predetermined value between the interception feature quantity and the decode feature quantity.
  5.  The detection system according to claim 4, further comprising a step of repairing the decoded video/audio signal based on the interception feature quantity when it is determined that an error has occurred.
  6.  The detection system according to claim 4 or 5, wherein the interception feature quantity is embedded in metadata of the video/audio signal.
  7.  The detection system according to any one of claims 1 to 6, further comprising a step of embedding information concerning the error in the decoded video/audio signal when it is determined that an error has occurred.
  8.  The detection system according to any one of claims 1 to 7, wherein the error is an image freeze phenomenon.
  9.  The detection system according to any one of claims 1 to 7, wherein the error is a blackout phenomenon.
  10.  The detection system according to any one of claims 1 to 9, wherein the error is an audio mute phenomenon.
  11.  The detection system according to any one of claims 1 to 9, wherein the error is an audio failure phenomenon.
  12.  The detection system according to any one of claims 1 to 11, wherein the error is a video/audio mismatch phenomenon.
  13.  The detection system according to any one of claims 1 to 12, wherein the error is an illegal frame phenomenon.
PCT/JP2010/050619 2010-01-20 2010-01-20 Monitoring device WO2011089689A1 (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
JP2011550742A (JP5435597B2) | 2010-01-20 | 2010-01-20 | Detection method
PCT/JP2010/050619 (WO2011089689A1) | 2010-01-20 | 2010-01-20 | Monitoring device


Publications (1)

Publication Number Publication Date
WO2011089689A1 (en)

Family

ID=44306513

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2010/050619 WO2011089689A1 (en) 2010-01-20 2010-01-20 Monitoring device

Country Status (2)

Country Link
JP (1) JP5435597B2 (en)
WO (1) WO2011089689A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7508040B2 (en) 2020-04-14 2024-07-01 日本放送協会 Content feature extraction device and program thereof, and monitoring device and program thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000092522A (en) * 1998-09-08 2000-03-31 Tektronix Inc Method and device for analyzing quality of image
JP2003134535A (en) * 2001-10-30 2003-05-09 Nec Eng Ltd Image quality deterioration detection system
WO2007080657A1 (en) * 2006-01-13 2007-07-19 Gaintech Co. Ltd. Monitor



Also Published As

Publication number Publication date
JP5435597B2 (en) 2014-03-05
JPWO2011089689A1 (en) 2013-05-20


Legal Events

Code | Title | Description
121 (EP) | The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 10843855; Country of ref document: EP; Kind code of ref document: A1
WWE | WIPO information: entry into national phase | Ref document number: 2011550742; Country of ref document: JP
NENP | Non-entry into the national phase | Ref country code: DE
122 (EP) | PCT application non-entry in European phase | Ref document number: 10843855; Country of ref document: EP; Kind code of ref document: A1