WO2011089689A1 - Monitoring device - Google Patents

Monitoring device Download PDF

Info

Publication number
WO2011089689A1
WO2011089689A1 (PCT/JP2010/050619)
Authority
WO
WIPO (PCT)
Prior art keywords
video
audio signal
error
feature
decoded
Prior art date
Application number
PCT/JP2010/050619
Other languages
French (fr)
Japanese (ja)
Inventor
浜田高宏
Original Assignee
株式会社K-Will
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社K-Will
Priority to JP2011550742A priority Critical patent/JP5435597B2/en
Priority to PCT/JP2010/050619 priority patent/WO2011089689A1/en
Publication of WO2011089689A1 publication Critical patent/WO2011089689A1/en

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00Diagnosis, testing or measuring for television systems or their details
    • H04N17/004Diagnosis, testing or measuring for television systems or their details for digital television systems

Definitions

  • the present invention relates to a monitoring device, and more particularly to a monitoring device suitable for monitoring digital video / audio signals.
  • high-definition video such as high-definition television broadcasting
  • digital video signals related to high-definition broadcasting and the like are often transmitted to each home via a satellite broadcasting or cable TV network.
  • errors may occur due to various causes while the video signal is transmitted.
  • there is a risk of inconveniences such as video freeze, blackout, noise, and audio mute, and countermeasures are required.
  • US Pat. No. 7,605,845 discloses a detection system that compares, in real time, a first feature amount extracted from the video/audio signal before encoding with a second feature amount extracted from the video/audio signal after decoding, and determines that a transmission error has occurred when there is a difference of a predetermined value or more between the first feature amount and the second feature amount.
  • the present invention has been made in view of these problems of the prior art, and its object is to provide a detection system capable of quickly detecting an error, particularly when an error occurs in a filed video/audio signal.
  • a detection system for detecting an error in a video/audio signal, comprising the steps of: extracting a feature amount from the video/audio signal before encoding to obtain an original feature amount and embedding it in the video/audio signal; encoding the video/audio signal in which the original feature amount is embedded; decoding the encoded video/audio signal; reading the original feature amount embedded in the decoded video/audio signal; extracting a feature amount from the decoded video/audio signal to obtain a decode feature amount and comparing it with the original feature amount; and determining that an error has occurred when there is a difference of a predetermined value or more between the original feature amount and the decode feature amount.
  • a detection system for detecting an error in a video/audio signal that was encoded after a feature amount was extracted from it as an interception feature amount and the interception feature amount was embedded, comprising the steps of: decoding the video/audio signal; reading the interception feature amount embedded in the decoded video/audio signal; extracting a feature amount from the decoded video/audio signal to obtain a decode feature amount and comparing it with the interception feature amount; and determining that an error has occurred when there is a difference of a predetermined value or more between the interception feature amount and the decode feature amount.
  • the detection system of the present invention exploits the fact that the original feature amount or the interception feature amount consists of simple numerical values and is therefore far less likely than the video/audio signal itself to be damaged by encoding or decoding. Specifically, if the decoded video/audio signal is undamaged, it should match the video/audio signal before encoding, so the original feature amount (or interception feature amount) and the decode feature amount should substantially agree; when they differ by a predetermined value or more, an error is judged to have occurred. This removes the need to compare the two feature amounts in real time: wherever a decoded video/audio signal exists, an error check can be performed, so errors in the video/audio signal can be checked at various places such as relay points and transmission destinations.
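The overall flow described above can be sketched in a few lines. This is a minimal illustration with hypothetical helper names; a real system would embed the numeric values in container metadata rather than a Python dict, and the per-frame mean here merely stands in for the patent's feature amounts.

```python
def extract_feature(frames):
    """Per-frame mean level as a stand-in for the feature amounts."""
    return [sum(f) / len(f) for f in frames]

def embed_and_encode(frames):
    original = extract_feature(frames)            # original feature amount
    return {"payload": frames, "meta": original}  # "encode" with metadata

def check_after_decode(stream, threshold=10.0):
    original = stream["meta"]                       # read embedded feature amount
    decoded = extract_feature(stream["payload"])    # decode feature amount
    # An error is flagged wherever the two differ by the threshold or more.
    return [abs(o - d) >= threshold for o, d in zip(original, decoded)]

frames = [[100, 110, 120], [100, 100, 100]]
stream = embed_and_encode(frames)
stream["payload"][1] = [0, 0, 0]                  # simulate a damaged frame
print(check_after_decode(stream))                 # [False, True]
```

Because the embedded values travel inside the file itself, the check needs no side channel and can run anywhere the decoded file exists.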
  • the term “video/audio signal” generally covers both a video signal and an audio signal, but in this specification a signal containing at least one of them is sufficient, and either raw data or compressed data may be used.
  • the target from which the feature amount is extracted may be the whole video/audio signal or a part of it. When feature amounts extracted from parts of the video/audio signal are compared, feature amounts belonging to the same frame number and time should be compared with each other.
  • the original feature amount or the intercept feature amount is embedded in the metadata of the encoded video / audio signal.
  • the error is preferably an image freeze phenomenon.
  • the error is preferably a blackout phenomenon.
  • the error is preferably an audio mute phenomenon.
  • the error is preferably a voice failure phenomenon.
  • the error is preferably a video / audio mismatch phenomenon.
  • the error is preferably an illegal frame phenomenon.
  • FIG. 1 is a conceptual diagram of an entire transmission system including a detection system according to the present embodiment.
  • FIG. 2 is a flowchart showing the entire detection system.
  • FIG. 3 is a block diagram showing a configuration of the broadcast terminal 100X.
  • FIG. 4 is a block diagram showing the configuration of the broadcast terminals 100A and 100B.
  • FIG. 5A is a diagram illustrating the original feature amount embedded in the video / audio signal by the broadcasting terminal 100X or the interception feature amount embedded in the video / audio signal encoded by the communication terminal 201X.
  • FIG. 5B is a diagram illustrating a value obtained by taking a difference between two feature amounts.
  • FIG. 5C is a diagram illustrating the decoding feature amount extracted from the video / audio signal decoded by the broadcasting terminal 100A or 100B or the communication terminal 201A or 201B.
  • FIG. 6A is a diagram illustrating the original feature amount embedded in the video / audio signal by the broadcasting terminal 100X or the interception feature amount embedded in the video / audio signal encoded by the communication terminal 201X.
  • FIG. 6B is a diagram illustrating a value obtained by taking a difference between two feature amounts.
  • FIG. 6C is a diagram illustrating the decoding feature amount extracted from the video / audio signal decoded by the broadcasting terminal 100A or 100B or the communication terminal 201A or 201B.
  • FIG. 7A is a diagram illustrating the original feature amount embedded in the video / audio signal by the broadcasting terminal 100X or the interception feature amount embedded in the video / audio signal encoded by the communication terminal 201X.
  • FIG. 7B is a diagram illustrating a value obtained by taking a difference between two feature amounts.
  • FIG. 7C is a diagram illustrating the decoding feature amount extracted from the video / audio signal decoded by the broadcasting terminal 100A or 100B or the communication terminal 201A or 201B.
  • FIG. 8A is a diagram illustrating the original feature amount embedded in the video / audio signal by the broadcasting terminal 100X or the interception feature amount embedded in the video / audio signal encoded by the communication terminal 201X.
  • FIG. 8B is a diagram illustrating a value obtained by taking a difference between two feature amounts.
  • FIG. 8C is a diagram illustrating the decoding feature amount extracted from the video / audio signal decoded by the broadcasting terminal 100A or 100B or the communication terminal 201A or 201B.
  • FIG. 9A shows the original feature amount embedded in the video / audio signal by the broadcasting terminal 100X or the interception feature amount embedded in the video / audio signal encoded by the communication terminal 201X.
  • FIG. 9B is a diagram illustrating a value obtained by taking a difference between two feature amounts.
  • FIG. 9C is a diagram illustrating the decoding feature amount extracted from the video / audio signal decoded by the broadcasting terminal 100A or 100B or the communication terminal 201A or 201B.
  • FIG. 10A is a diagram illustrating the original feature amount embedded in the video / audio signal by the broadcast terminal 100X or the interception feature amount embedded in the video / audio signal encoded by the communication terminal 201X.
  • FIG. 10B is a diagram illustrating a value obtained by taking a difference between two feature amounts.
  • FIG. 10C is a diagram illustrating the decoding feature amount extracted from the video / audio signal decoded by the broadcasting terminal 100A or 100B or the communication terminal 201A or 201B.
  • FIG. 1 is a conceptual diagram of an entire transmission system including a detection system according to the present embodiment.
  • a video / audio signal including an audio signal and a video signal is encoded and transmitted from a transmission source 10 such as a broadcasting station to transmission destinations 20A and 20B such as a satellite station.
  • although the transmission of the video/audio signal is illustrated here as being performed via the communication satellite S, it may also be transmitted from the communication terminal 201X of the transmission source 10 to the communication terminals 201A and 201B of the transmission destinations via the Internet network INT or the like.
  • Reference numerals 200X, 200A, and 200B denote monitors that display information from the respective terminals.
  • FIG. 2 is a flowchart of the entire detection system.
  • a feature amount is extracted from an unencoded video/audio signal, such as a movie, a drama, or a news program, as an original feature amount, and is embedded in the video/audio signal in step S102.
  • alternatively, when a video/audio signal that was encoded before any feature amount was extracted by the broadcast terminal 100X of the transmission source 10 is input to the communication terminal 201X, it is decoded in step S101, a feature amount is then extracted from the decoded video/audio signal as an interception feature amount, and the interception feature amount is embedded in the encoded video/audio signal in step S102.
  • the metadata and its processing method are described in detail in, for example, Japanese Patent Application Laid-Open No. 2008-271414.
  • a feature amount is extracted from the video/audio signal divided along the time axis at each scene change, and the original feature amount or the interception feature amount is embedded in the metadata in association with its position on the time axis.
  • in step S103, the video/audio signal in which the original feature amount is embedded is encoded (a video/audio signal in which the interception feature amount is embedded has already been encoded).
  • the encoded video/audio signal is filed and stored in the server of the transmission source 10 or in the communication terminal 201X, or is transmitted to the broadcast terminals 100A and 100B of the transmission destinations 20A and 20B, the communication terminals 201A and 201B of the transmission destinations, and so on.
  • when an error check of the transmitted video/audio signal is performed at the broadcast terminals 100A and 100B of the transmission destinations 20A and 20B or at the communication terminals 201A and 201B, the encoded video/audio signal is decoded in step S104.
  • in step S105, the original feature amount or the interception feature amount embedded in the decoded video/audio signal is read out together with the corresponding position on the time axis. Since the original feature amount, the interception feature amount, and the time-axis position consist of simple numerical values, they are less likely than the video/audio signal to be damaged by encoding and decoding.
  • in step S106, the decoded video/audio signal is divided according to the read time-axis positions, and a feature amount is extracted from each division as a decode feature amount.
  • in step S107, the original feature amount or the interception feature amount at the same time-axis position is compared with the decode feature amount. If there is a difference of a predetermined value or more, it is determined in step S108 that an error has occurred; if there is no such difference, it is determined in step S109 that no error has occurred.
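The decision stage of steps S105 through S109 amounts to a per-position threshold test. The sketch below assumes feature amounts keyed by time-axis position; the function name and threshold value are illustrative, not from the patent.

```python
def judge(embedded, decoded, threshold):
    """S105-S109: compare feature amounts at matching time-axis positions."""
    verdicts = {}
    for t, original in embedded.items():       # S105: read feature + position
        diff = abs(original - decoded[t])      # S107: compare at same position
        verdicts[t] = "error" if diff >= threshold else "ok"  # S108 / S109
    return verdicts

embedded = {0.0: 50.0, 1.0: 52.0, 2.0: 48.0}   # time -> original feature amount
decoded  = {0.0: 50.0, 1.0: 90.0, 2.0: 48.5}   # time -> decode feature amount
print(judge(embedded, decoded, threshold=5.0))
# {0.0: 'ok', 1.0: 'error', 2.0: 'ok'}
```

Keying by time-axis position is what lets the system compare feature amounts of the same frame and time even after the signal has been filed and moved.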
  • FIG. 3 is a block diagram showing a configuration of the broadcast terminal 100X.
  • Left and right audio signals AL and AR among video and audio signals input from a video camera or the like are input to audio input units 101 and 102, and signals output therefrom are input to delay units 103 and 104, respectively.
  • the results calculated by the calculation unit 105 are output as audio feature amounts (Audio Level, Audio Activity) from the broadcast terminals 100X, 100A, and 100B to the terminals 201X, 201A, and 201B.
  • the video signal VD among the video and audio signals is input to the video input unit 108, and the signals output therefrom are input to the frame memories 109, 110, and 111.
  • the frame memory 109 stores the current frame
  • the frame memory 110 stores the previous frame
  • the frame memory 111 stores the second previous frame.
  • Output signals from the frame memories 109, 110, and 111 are input to the MC calculation unit 112, and the calculation results are output as video feature values (Motion).
  • an output signal from the frame memory 110 is input to the video calculation unit 119.
  • the calculation result of the video calculation unit 119 is output as a video feature amount (Video Level, Video Activity).
  • These output signals are output from the broadcast terminals 100X, 100A, and 100B to the terminals 201X, 201A, and 201B as video feature amounts.
  • “Motion” divides an image frame into small blocks of, for example, 8 pixels × 8 lines, obtains the average value and variance of the 64 pixels in each small block, and compares them with those of the block at the same position N frames before.
  • “Video Level” is the average value of the pixels included in the image frame. When calculating “Activity”, the variance value may be used.
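The block-based video feature amounts described above can be sketched as follows. The 8 × 8 block size and the use of per-block averages come from the text; representing frames as nested lists of grayscale values, and comparing block means for Motion, are our simplifying assumptions.

```python
def block_stats(frame, bx, by, size=8):
    """Average and variance of one size x size block at (bx, by)."""
    pix = [frame[y][x] for y in range(by, by + size)
                       for x in range(bx, bx + size)]
    mean = sum(pix) / len(pix)
    var = sum((p - mean) ** 2 for p in pix) / len(pix)
    return mean, var

def motion(frame, frame_n_ago, size=8):
    # Compare each block's mean with the block at the same place N frames before.
    h, w = len(frame), len(frame[0])
    return sum(abs(block_stats(frame, x, y, size)[0] -
                   block_stats(frame_n_ago, x, y, size)[0])
               for y in range(0, h, size) for x in range(0, w, size))

def video_level(frame):
    """Average value of all pixels in the image frame."""
    return sum(sum(row) for row in frame) / (len(frame) * len(frame[0]))

prev = [[100] * 8 for _ in range(8)]
cur = [[110] * 8 for _ in range(8)]
print(motion(cur, prev), video_level(cur))    # 10.0 110.0
```

Activity would reuse the same `block_stats` variance output, summed or averaged over the frame.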
  • the extracted audio feature amounts and video feature amounts are associated with positions on the time axis as the original feature amount and embedded as metadata of the video/audio signal by the output unit 111; after encoding, the signal is distributed to the transmission destination as an individual file. The same function can also be provided to the communication terminal 201X of the transmission source: after the encoded video/audio signal is input and decoded, the interception feature amount can be extracted and embedded in the metadata.
  • FIG. 4 is a block diagram showing the configuration of the broadcast terminals 100A and 100B as transmission destinations.
  • the transmission destination broadcast terminals 100A and 100B are mainly different from the transmission source broadcast terminal 100X in that they have a decoder DEC and a demultiplexer DMP. A description of the common points is omitted.
  • the decoder DEC decodes the video/audio signal; at this time, the original feature amount embedded in the metadata and the corresponding position on the time axis are read out. Thereafter, the demultiplexer DMP divides the decoded video/audio signal into a video signal and an audio signal. In the same manner as described above, the broadcast terminals 100A and 100B extract an audio feature amount from the divided audio signal and a video feature amount from the divided video signal according to the corresponding time-axis positions; these are the decode feature amounts.
  • the output unit 150 of the broadcast terminals 100A and 100B compares the original feature amount and the decode feature amount, detects that an error has occurred if the two differ by a predetermined value or more, and can write information indicating that an error has occurred into the metadata and transmit it back to the transmission source 10.
  • the operator of the transmission source 10 can analyze the cause of the error from the metadata of the video / audio signal transmitted back from the terminals 201A and 201B to the terminal 201X.
  • the destination communication terminals 201A, 201B, and so on have the same function: they decode the encoded video/audio signal transmitted from the communication terminal 201X, obtain the decode feature amount, and compare it with the embedded original feature amount or interception feature amount.
  • furthermore, by comparing the original feature amount or the interception feature amount in the metadata with the decode feature amount, the error can also be corrected.
  • for example, the original feature amount or the interception feature amount of the decoded video signal and that of the decoded audio signal are each extracted, and the positions at which they change greatly along the time axis are obtained; the output unit 150 of the broadcast terminals 100A and 100B or of the communication terminals 201A and 201B of the transmission destinations 20A and 20B can then shift the video signal and the audio signal relative to each other. In other words, if the original feature amount or the interception feature amount of the video signal and that of the audio signal are embedded and encoded in advance and read out after decoding, the video signal and the audio signal may be shifted relative to each other so that the timings of their changes coincide.
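A rough sketch of the relative-shift correction described above: find the positions where each feature sequence changes greatly, then try small shifts of the audio sequence until its change positions best line up with the video's. The change threshold, search range, and function names are our assumptions.

```python
def change_points(seq, jump=10):
    """Indices where the feature amount changes greatly along the time axis."""
    return [i for i in range(1, len(seq)) if abs(seq[i] - seq[i - 1]) >= jump]

def best_shift(video_feat, audio_feat, max_shift=3):
    """Shift (in frames) that best aligns audio change points with video's."""
    v = set(change_points(video_feat))
    best, best_hits = 0, -1
    for s in range(-max_shift, max_shift + 1):
        hits = sum(1 for i in change_points(audio_feat) if (i + s) in v)
        if hits > best_hits:
            best, best_hits = s, hits
    return best                                  # frames to shift audio by

video = [0, 0, 50, 50, 0, 0]
audio = [0, 0, 0, 40, 40, 0]                     # same change, one frame late
print(best_shift(video, audio))                  # -1
```

A production implementation would align against the embedded original feature amounts rather than raw levels, but the matching idea is the same.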
  • FIG. 5A is a diagram showing the original feature amount or the interception feature amount embedded in the video/audio signal by the broadcast terminal 100X, FIG. 5C is a diagram showing the decode feature amount extracted from the video/audio signal decoded by the broadcast terminal 100A or 100B, and FIG. 5B is a diagram showing the value obtained by taking the difference between the two feature amounts; the vertical axis represents Motion as the feature amount and the horizontal axis represents time.
  • in the video based on the decoded video/audio signal, Motion is low between times t1 and t2, but as shown in FIG. 5A, Motion is also low between times t1 and t2 in the video based on the video/audio signal before encoding, and the difference is zero (see FIG. 5B). This occurs because the transmitted video is a still image, and therefore it can be determined that no image freeze phenomenon has occurred.
  • FIG. 6A is a diagram showing the original feature amount or the interception feature amount embedded in the video/audio signal by the broadcast terminal 100X, FIG. 6C is a diagram showing the decode feature amount extracted from the video/audio signal decoded by the broadcast terminal 100A or 100B, and FIG. 6B is a diagram showing the value obtained by taking the difference between the two feature amounts; the vertical axis represents Video Activity as the feature amount and the horizontal axis represents time.
  • as this Video Activity, for example, the following variance A can be used.
  • since encoding and decoding are generally lossy, the decoded signal U(x, y, z, t) is not necessarily equal to the original signal V(x, y, z, t). However, no correction is needed for an error so small that the viewer does not notice it; if the problem is a blackout phenomenon, on the other hand, countermeasures are required.
  • the variance A as the feature amount of the video signal V(x, y, z, t) can be expressed by the following equation, where the sum runs over the N pixels of the frame: A = (1/N) Σ (V(x, y, z, t) − ave.V)².
  • here, the average value ave.V can be obtained by the following equation: ave.V = (1/N) Σ V(x, y, z, t).
  • the blackout can be determined as follows.
  • in the video based on the decoded video/audio signal, the variance value is low between times t1 and t2, but as shown in FIG. 6A, the variance value is also low between times t1 and t2 in the video based on the video/audio signal before encoding, and the difference is zero (see FIG. 6B). This occurs because the transmitted video shows, for example, the starry sky, and therefore it can be determined that no blackout phenomenon has occurred.
  • by contrast, the variance value of the decoded video is low between times t3 and t4, whereas, as shown in FIG. 6A, the variance value before encoding is high between times t3 and t4, and the difference exceeds the threshold value TH2 (see FIG. 6B). This is caused by a blackout phenomenon in which the screen turns completely black for some reason in the transmitted video, so the occurrence of the error can be detected effectively.
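The variance-based blackout check above can be sketched directly from the equations for A and ave.V. The threshold value stands in for TH2 and is illustrative, as are the function names.

```python
def variance(pixels):
    """The variance A of a frame, using ave.V as in the equations above."""
    ave = sum(pixels) / len(pixels)                           # ave.V
    return sum((v - ave) ** 2 for v in pixels) / len(pixels)  # A

def blackout_suspected(original_variance, decoded_pixels, th2=100.0):
    # A starry-sky frame is dark in both versions, so the difference stays
    # small; a true blackout zeroes the decoded variance while the embedded
    # original variance remains high.
    return abs(original_variance - variance(decoded_pixels)) >= th2

starry = [0, 0, 5, 0, 0, 8, 0, 0]
print(blackout_suspected(variance(starry), starry))   # False (dark content)
print(blackout_suspected(500.0, [0] * 8))             # True  (blackout)
```

Comparing against the embedded original variance, rather than the absolute level, is what distinguishes intentionally dark scenes from a blackout error.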
  • FIG. 7A is a diagram showing the original feature amount or the interception feature amount embedded in the video/audio signal by the broadcast terminal 100X, FIG. 7C is a diagram showing the decode feature amount extracted from the video/audio signal decoded by the broadcast terminal 100A or 100B, and FIG. 7B is a diagram showing the value obtained by taking the difference between the two feature amounts; the vertical axis represents Audio Level as the feature amount and the horizontal axis represents time.
  • the Audio Level sampled from the audio signal is preferably averaged at the frame frequency of the video signal; for example, in the case of a video signal of 30 frames per second, the Audio Level is preferably averaged in 30 Hz units.
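Averaging the audio level per video frame can be sketched as below. The 48 kHz sample rate is an assumed figure (the text specifies only the 30 fps frame rate), and the mean absolute value is one simple choice of level measure.

```python
def audio_level_per_frame(samples, sample_rate=48000, fps=30):
    """Mean absolute audio level averaged over each video-frame interval."""
    per_frame = sample_rate // fps               # e.g. 1600 samples per frame
    return [sum(abs(s) for s in samples[i:i + per_frame]) / per_frame
            for i in range(0, len(samples) - per_frame + 1, per_frame)]

samples = [0.5] * 1600 + [0.0] * 1600            # one loud frame, one silent
print(audio_level_per_frame(samples))            # [0.5, 0.0]
```

Aligning audio levels to video frames this way gives both feature streams a common time axis, which the later comparisons rely on.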
  • in the audio based on the decoded video/audio signal, the Audio Level is very low between times t1 and t2, but as shown in FIG. 7A, the Audio Level is also low between times t1 and t2 in the audio based on the video/audio signal before encoding, and the difference is zero (see FIG. 7B). This occurs because the Audio Level is low in the original video/audio signal before encoding, and therefore it can be determined that no audio mute phenomenon has occurred.
  • by contrast, the Audio Level of the decoded audio is low between times t3 and t4, whereas, as shown in FIG. 7A, the Audio Level before encoding is high between times t3 and t4, and the difference exceeds the threshold value TH3 (see FIG. 7B). This is caused by an audio mute phenomenon in which the audio is interrupted for some reason in the transmitted audio, so the occurrence of the error can be detected effectively.
  • FIG. 8A is a diagram showing the original feature amount or the interception feature amount embedded in the video/audio signal by the broadcast terminal 100X, FIG. 8C is a diagram showing the decode feature amount extracted from the video/audio signal decoded by the broadcast terminal 100A or 100B, and FIG. 8B is a diagram showing the value obtained by taking the difference between the two feature amounts; the vertical axis represents Audio Level as the feature amount and the horizontal axis represents time. Note that the Audio Level sampled from the audio signal is preferably averaged at the frame frequency of the video signal.
  • FIG. 9A is a diagram showing an audio level before encoding extracted by the broadcast terminal 100X corresponding to a video frame.
  • FIG. 9C is a diagram illustrating the decoded audio level extracted by the broadcast terminal 100A or 100B.
  • FIG. 9B is a diagram showing the audio advance/delay with respect to time. Note that the Audio Level sampled from the audio signal is preferably averaged at the frame frequency of the video signal.
  • the rising edge of Audio Level with respect to the frame is detected and compared.
  • the audio delay amount with respect to the video exceeds the threshold value TH5+ at times t1 and t3, and the audio advance amount with respect to the video falls below the threshold value TH5− at time t2, so it can be determined that a video/audio mismatch phenomenon has occurred.
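The rising-edge comparison for audio/video synchronization can be sketched as follows: locate the first frame where the audio level rises past a level in the original and in the decoded sequence, and report the offset in frames. The detection level is illustrative; TH5+ / TH5− would then bound the acceptable offset.

```python
def rising_edge(levels, level=0.25):
    """Index of the first frame where the audio level rises past `level`."""
    for i in range(1, len(levels)):
        if levels[i - 1] < level <= levels[i]:
            return i
    return None

def av_offset(original_levels, decoded_levels):
    """Frames of audio delay (+) or advance (-) relative to the original."""
    a, b = rising_edge(original_levels), rising_edge(decoded_levels)
    return None if a is None or b is None else b - a

orig = [0.0, 0.0, 0.5, 0.5, 0.5]
dec  = [0.0, 0.0, 0.0, 0.0, 0.5]    # rising edge arrives 2 frames late
print(av_offset(orig, dec))          # 2
```

An offset exceeding TH5+ (delay) or falling below TH5− (advance) would then flag the video/audio mismatch phenomenon.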
  • FIG. 10A is a diagram showing the original feature amount or the interception feature amount embedded in the video/audio signal by the broadcast terminal 100X, FIG. 10C is a diagram showing the decode feature amount extracted from the video/audio signal decoded by the broadcast terminal 100A or 100B, and FIG. 10B is a diagram showing the value obtained by taking the difference between the two feature amounts; the vertical axis represents Video Activity (the variance described above) as the feature amount and the horizontal axis represents time.
  • while the video is transmitted normally, the difference between the statistics of the pixel values is zero, but when an illegal frame is mixed in, the difference exceeds a predetermined threshold. Specifically, the difference in the statistic of the pixel values exceeds the threshold value TH6+ between times t1 and t2, and falls below the threshold value TH6− between times t3 and t4.
  • the transmission destination terminal 201A or 201B thereby determines that an illegal frame phenomenon has occurred, so the occurrence of the error can be detected effectively.
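The illegal-frame check above compares a per-frame pixel statistic against the embedded original and flags frames whose difference leaves the band between TH6− and TH6+. Here the statistic is the frame mean and the threshold is symmetric; both choices are illustrative.

```python
def frame_mean(frame):
    """Per-frame pixel statistic (mean), standing in for the embedded value."""
    return sum(frame) / len(frame)

def illegal_frames(original_means, decoded_frames, th6=20.0):
    """Flag frames whose statistic difference leaves [TH6-, TH6+]."""
    flags = []
    for om, frame in zip(original_means, decoded_frames):
        diff = frame_mean(frame) - om
        flags.append(diff >= th6 or diff <= -th6)
    return flags

original_means = [100.0, 100.0, 100.0]
decoded = [[100] * 4, [250] * 4, [100] * 4]      # middle frame corrupted
print(illegal_frames(original_means, decoded))   # [False, True, False]
```

Because only the small numeric statistics need to survive encoding, even a heavily corrupted frame is caught by this comparison.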

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Monitoring And Testing Of Transmission In General (AREA)

Abstract

Disclosed is a monitoring device which makes use of the characteristic that original feature quantities are less likely to be damaged by encoding or decoding than video/audio signals since original feature quantities are composed of simple numerical values. Specifically, if no damage has occurred to a decoded video/audio signal, the decoded video/audio signal should be the same as the video/audio signal prior to decoding. Consequently, since the original feature quantity and the decoding feature quantity should be consistent, it is determined that an error has occurred if there is a difference of a prescribed value or greater between the original feature quantity and the decoding feature quantity.

Description

Monitoring device
The present invention relates to a monitoring device, and more particularly to a monitoring device suitable for monitoring digital video/audio signals.
With recent improvements in video processing technology, high-quality video such as high-definition television broadcasting has come to be aired. Digital video signals for high-definition broadcasting and the like are often transmitted to each home via satellite broadcasting or cable TV networks. However, errors may occur from various causes while the video signal is transmitted; when an error occurs, it may cause problems such as video freeze, blackout, noise, and audio mute, and countermeasures are required.
In contrast, US Pat. No. 7,605,845 discloses a detection system that compares, in real time, a first feature amount extracted from the video/audio signal before encoding with a second feature amount extracted from the video/audio signal after decoding, and determines that a transmission error has occurred when there is a difference of a predetermined value or more between the first feature amount and the second feature amount.
However, in the technique disclosed in US Pat. No. 7,605,845, the first feature amount and the second feature amount must be compared in real time. This is very effective for, say, streaming distribution, but even then a network or the like must be provided to reliably deliver the second feature amount to the transmission destination in real time, separately from the first feature amount. In addition, video/audio signals are nowadays often filed for ease of handling and then stored on a server or transmitted to another party, and files are correspondingly more often damaged at encoding or decoding time. For such filed video/audio signals, the first and second feature amounts cannot always be prepared at decoding time, so it cannot be determined whether the video/audio signal is damaged. A sum check exists as a data error detection method, but a video/audio signal generally carries so much data that error detection by sum check is difficult. Moreover, when a video/audio signal is encoded and decoded, part of the data often changes, yet the signal is generally treated as normal as long as a human viewer perceives nothing wrong in the resulting video and audio. It is therefore not appropriate to detect errors, as a sum check does, by testing whether the video/audio signal before encoding and the video/audio signal after decoding match completely.
 The present invention has been made in view of these problems of the prior art, and its object is to provide a detection system capable of quickly detecting an error, particularly when an error occurs in a filed video/audio signal.
 A detection system for detecting an error in a video/audio signal, comprising the steps of:
 extracting a feature value from the video/audio signal before encoding as an original feature value and embedding it in the video/audio signal;
 encoding the video/audio signal in which the original feature value is embedded;
 decoding the encoded video/audio signal;
 reading out the original feature value embedded in the decoded video/audio signal;
 extracting a feature value from the decoded video/audio signal as a decoded feature value and comparing it with the original feature value; and
 judging that an error has occurred when the original feature value and the decoded feature value differ by a predetermined value or more.
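The comparison and judgment steps above can be sketched as a small routine. This is a minimal illustration, assuming per-segment feature values are given as plain lists of numbers; the function name and threshold are illustrative, not part of the claimed system.

```python
def detect_errors(original_features, decoded_features, threshold):
    """Return the segment indices where the embedded original feature value
    and the feature value re-extracted after decoding differ by the
    predetermined value (threshold) or more, i.e. where an error is judged
    to have occurred."""
    return [i for i, (orig, dec) in enumerate(zip(original_features, decoded_features))
            if abs(orig - dec) >= threshold]

# Segment 2 was damaged in transit: its decoded feature value collapsed.
print(detect_errors([10, 12, 80, 11], [10, 12, 5, 11], threshold=50))  # → [2]
```

Because the original feature values travel inside the file itself, this check can run anywhere a decoded signal is available, with no second real-time channel.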
 A detection system for detecting an error in a video/audio signal that was encoded after a feature value was extracted from it as an interception feature value and the interception feature value was embedded in it, comprising the steps of:
 decoding the video/audio signal;
 reading out the interception feature value embedded in the decoded video/audio signal;
 extracting a feature value from the decoded video/audio signal as a decoded feature value and comparing it with the interception feature value; and
 judging that an error has occurred when the interception feature value and the decoded feature value differ by a predetermined value or more.
 The detection system of the present invention exploits the property that, because the original feature value or interception feature value consists of simple numerical values, it is far less likely than the video/audio signal itself to be damaged by encoding or decoding. Specifically, if the decoded video/audio signal is undamaged, it should be the same as the signal before decoding, so the original or interception feature value should substantially match the decoded feature value; when the two differ by a predetermined value or more, an error is judged to have occurred. This removes the need to compare the two feature values in real time: an error check can be performed wherever a decoded video/audio signal is available, at any time and at various places such as relay points or destinations of the signal.
 Note that a "video/audio signal" generally means a signal containing both a video signal and an audio signal, but in this specification a signal containing at least one of them suffices, and it may be either raw or compressed data. The feature value may be extracted from the entire video/audio signal or from a part of it. When comparing feature values extracted from parts of the signal, the comparison can be made between parts whose frame numbers or times coincide.
 When it is judged that an error has occurred, the system preferably has a step of repairing the decoded video/audio signal based on the original feature value or the interception feature value.
 When it is judged that an error has occurred, the system preferably has a step of embedding information about the error in the decoded video/audio signal.
 The original feature value or the interception feature value is preferably embedded in the metadata of the encoded video/audio signal.
 The error is preferably an image freeze phenomenon.
 The error is preferably a blackout phenomenon.
 The error is preferably an audio mute phenomenon.
 The error is preferably an audio defect phenomenon.
 The error is preferably a video/audio mismatch phenomenon.
 The error is preferably an invalid frame phenomenon.
 When the first feature value and the second feature value differ by a predetermined value or more, the video/audio signal transmitted to the destination is preferably corrected.
 FIG. 1 is a conceptual diagram of the entire transmission system including the detection system according to the present embodiment.
 FIG. 2 is a flowchart showing the detection system as a whole.
 FIG. 3 is a block diagram showing the configuration of the broadcast terminal 100X.
 FIG. 4 is a block diagram showing the configuration of the broadcast terminals 100A and 100B.
 FIGS. 5A, 6A, 7A, 8A, 9A, and 10A each show the original feature value embedded in the video/audio signal at the broadcast terminal 100X, or the interception feature value embedded in the video/audio signal encoded at the communication terminal 201X.
 FIGS. 5B, 6B, 7B, 8B, 9B, and 10B each show the value obtained as the difference between the two feature values.
 FIGS. 5C, 6C, 7C, 8C, 9C, and 10C each show the decoded feature value extracted from the decoded video/audio signal at the broadcast terminal 100A or 100B, or the communication terminal 201A or 201B.
 Hereinafter, the present invention will be described with reference to an embodiment. FIG. 1 is a conceptual diagram of the entire transmission system including the detection system according to the present embodiment. In FIG. 1, consider a case in which a video/audio signal containing an audio signal and a video signal is encoded and transmitted from a transmission source 10, such as a broadcasting station, to transmission destinations 20A and 20B, such as satellite stations. Although the figure shows transmission via a communication satellite S, the signal may instead be transmitted over the Internet INT or the like, for example from the communication terminal 201X of the transmission source 10 to the destination communication terminals 201A and 201B. Reference numerals 200X, 200A, and 200B denote monitors that display information from the respective terminals.
 FIG. 2 is a flowchart of the entire detection system. First, at the broadcast terminal 100X of the transmission source 10, a feature value is extracted in step S101 from the pre-encoding video/audio signal of a movie, drama, news program, or the like, as the original feature value, and is embedded in the video/audio signal in step S102. Alternatively, a video/audio signal that was encoded before any feature value was extracted at the broadcast terminal 100X is input at the communication terminal 201X in step S101 and then decoded; a feature value is extracted from the decoded signal as the interception feature value and, in step S102, embedded in the encoded video/audio signal. In either case, the original or interception feature value is desirably embedded in the metadata of the video/audio signal. Metadata and its handling are described in detail in, for example, JP 2008-271414 A. Here, for example, the video/audio signal is divided along the time axis at every scene change, a feature value is extracted for each division, and the original or interception feature value is embedded in the metadata in association with its position on the time axis.
 Next, in step S103, the video/audio signal in which the original feature value is embedded is encoded (a signal in which an interception feature value is embedded has already been encoded). In this state the video/audio signal is packaged into a file and stored on the server of the transmission source 10 or on the communication terminal 201X, or transmitted to the broadcast terminals 100A and 100B of the destinations 20A and 20B, the destination communication terminals 201A and 201B, and so on.
 Meanwhile, when the transmitted video/audio signal is to be error-checked at the destination broadcast terminals 100A and 100B, the destination communication terminals 201A and 201B, or the like, the encoded video/audio signal is decoded in step S104. Then, in step S105, the original or interception feature value embedded in the decoded video/audio signal is read out together with its corresponding position on the time axis. Because the feature value and the time-axis position consist of simple numerical values, they are far less likely than the video/audio signal itself to be damaged by encoding or decoding.
 Further, in step S106, the decoded video/audio signal is divided according to the read-out time-axis positions and a feature value is extracted from each division as a decoded feature value; in step S107, the original or interception feature value and the decoded feature value at the same time-axis position are compared. If the two differ by a predetermined value or more, it is judged in step S108 that an error has occurred; if not, it is judged in step S109 that no error has occurred.
 FIG. 3 is a block diagram showing the configuration of the broadcast terminal 100X. Of the video/audio signal input from a video camera or the like, the left and right audio signals AL and AR are input to audio input units 101 and 102; the signals output from them are input to delay units 103 and 104, respectively, and the results computed by an audio computation unit 105 are output as the audio feature values (Audio Level, Audio Activity) from the broadcast terminals 100X, 100A, and 100B to the terminals 201X, 201A, and 201B. Here, Audio Level is the mean of the absolute values of the audio samples (sampled at 48 kHz, giving 48000/30 = 1600 samples) contained in one video frame (at, for example, 30 frames per second), and Audio Activity is the mean square of those same samples.
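The two audio feature values just defined can be sketched directly. This assumes the samples of one frame are given as a plain list; the function name is illustrative.

```python
def audio_features(frame_samples):
    """Audio Level: mean absolute value of the audio samples in one video
    frame (48000 / 30 = 1600 samples at 48 kHz audio, 30 fps video).
    Audio Activity: mean square of the same samples."""
    n = len(frame_samples)
    audio_level = sum(abs(s) for s in frame_samples) / n
    audio_activity = sum(s * s for s in frame_samples) / n
    return audio_level, audio_activity

level, activity = audio_features([1, -1, 2, -2])  # tiny stand-in for 1600 samples
print(level, activity)  # → 1.5 2.5
```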
 Meanwhile, the video signal VD of the video/audio signal is input to a video input unit 108, and the signal output from it is input to frame memories 109, 110, and 111. Frame memory 109 stores the current frame, frame memory 110 stores the previous frame, and frame memory 111 stores the frame before that.
 The output signals from the frame memories 109, 110, and 111 are input to an MC computation unit 112, whose result is output as the video feature value Motion. The output signal from the frame memory 110 is also input to a video computation unit 119, whose results are output as the video feature values Video Level and Video Activity. These output signals are output as video feature values from the broadcast terminals 100X, 100A, and 100B to the terminals 201X, 201A, and 201B. Here, Motion is obtained by dividing the image frame into small blocks of, for example, 8 pixels × 8 lines, computing the mean and variance of the 64 pixels in each block, and taking the difference against the mean and variance of the block at the same position N frames earlier; it indicates the motion of the image. N is usually 1, 2, or 4. Video Level is the mean of the pixel values contained in the image frame. As Video Activity, one may compute the variance of each small block in the image and take the average of those variances over the frame, or simply use the variance of all pixels within the frame.
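The Motion computation described here can be sketched as follows. This assumes frames are 2-D lists of pixel values; the text does not specify how the per-block differences are aggregated into one Motion number, so the simple sum used below is an assumption.

```python
def block_mean_var(frame, bx, by, size=8):
    # mean and variance of the 64 pixels in one 8x8 block
    vals = [frame[by * size + dy][bx * size + dx]
            for dy in range(size) for dx in range(size)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var

def motion(cur, prev, size=8):
    """Divide the frame into 8x8 blocks, compare each block's mean and
    variance with the block at the same position N frames earlier (`prev`),
    and aggregate the differences into one motion indicator."""
    blocks_y = len(cur) // size
    blocks_x = len(cur[0]) // size
    total = 0.0
    for by in range(blocks_y):
        for bx in range(blocks_x):
            cm, cv = block_mean_var(cur, bx, by, size)
            pm, pv = block_mean_var(prev, bx, by, size)
            total += abs(cm - pm) + abs(cv - pv)
    return total

frame = [[(x + y) % 8 for x in range(8)] for y in range(8)]
print(motion(frame, frame))  # identical frames → 0.0
```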
 The extracted audio and video feature values, as original feature values, are associated with their time-axis positions and embedded by an output unit 111 as metadata of the video/audio signal; the signal is then encoded and delivered to the destination as an individual file. The source-side communication terminal 201X may be given the same capability: it inputs and decodes an encoded video/audio signal, extracts the interception feature value, and embeds it in the metadata.
 Next, error detection will be described. FIG. 4 is a block diagram showing the configuration of the destination broadcast terminals 100A and 100B. They differ from the source broadcast terminal 100X mainly in having a decoder DEC and a demultiplexer DMP; description of the common parts is omitted.
 When an encoded video/audio signal is input to the broadcast terminal 100A or 100B, the decoder DEC first decodes it, reading out the original feature values embedded in the metadata and their corresponding time-axis positions. The demultiplexer DMP then separates the decoded video/audio signal into a video signal and an audio signal. In the same manner as described above, the broadcast terminals 100A and 100B extract audio feature values from the separated audio signal and video feature values from the separated video signal according to the corresponding time-axis positions, and use these as the decoded feature values.
 The output unit 150 of the broadcast terminal 100A or 100B compares the original feature values with the decoded feature values; if they differ by the predetermined amount, it detects that an error has occurred, writes information indicating the error into the metadata, and can send the signal back to the transmission source 10.
 An operator at the transmission source 10 can then analyze the cause of the error from the metadata of the video/audio signal sent back from the terminals 201A and 201B to the terminal 201X.
 The destination communication terminals 201A, 201B, and so on can likewise be given the same capability: they decode the encoded video/audio signal transmitted from the communication terminal 201X, obtain the decoded feature values, and compare them with the embedded original or interception feature values.
 If the error is minor, such as a misalignment between video and audio, it can be corrected by comparing the original or interception feature values in the metadata with the decoded feature values. For example, the original or interception feature value of the decoded video signal and that of the decoded audio signal are each examined to find the positions where they change sharply along the time axis. A sharp change in the video feature value often corresponds to a scene change, but it does not necessarily coincide with a sharp change in the audio feature value. The time difference between the position where the video signal's original or interception feature value changes sharply and the position where the audio signal's does is therefore computed. The output unit 150 of the destination broadcast terminal 100A or 100B of the destination 20A or 20B, or the communication terminal 201A or 201B, can then shift the video and audio signals relative to each other until the time difference between the corresponding sharp-change positions of the decoded video and audio feature values matches the time difference obtained from the original or interception feature values. Alternatively, timing signals may be embedded in advance in the original or interception feature values of the video and audio signals and encoded; after decoding, the feature values are read out and the video and audio signals shifted relative to each other until the timing signals coincide.
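The change-point alignment described above can be sketched as follows. The per-frame feature lists, the jump threshold, and the function names are illustrative assumptions; comparing this offset for the decoded features against the same offset for the embedded original features gives the relative shift to apply.

```python
def change_point(features, jump):
    """Index of the first frame where the feature value changes by `jump` or more."""
    for i in range(1, len(features)):
        if abs(features[i] - features[i - 1]) >= jump:
            return i
    return None

def av_offset(video_features, audio_features, jump):
    """Time difference (in frames) between the video change point and the
    audio change point of one signal."""
    return change_point(video_features, jump) - change_point(audio_features, jump)

# Video changes sharply at frame 3, audio at frame 1: the offset is 2 frames.
print(av_offset([0, 0, 0, 9, 9], [0, 9, 9, 9, 9], jump=5))  # → 2
```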
 Next, more specific errors will be described.
 (1) Detection of the image freeze phenomenon
 FIG. 5A shows the original feature value embedded in the video/audio signal at the broadcast terminal 100X, or the interception feature value embedded in the signal encoded at the communication terminal 201X; FIG. 5C shows the decoded feature value extracted from the decoded video/audio signal at the broadcast terminal 100A or 100B; and FIG. 5B shows the difference between the two feature values. The vertical axis represents Motion as the feature value, and the horizontal axis represents time.
 Here, as shown in FIG. 5C, in the video based on the decoded video/audio signal, Motion is low between times t1 and t2; but as FIG. 5A shows, Motion is also low between t1 and t2 in the video based on the signal before encoding, so the difference is zero (see FIG. 5B). This is simply because the transmitted video was a still image, and it can therefore be judged that no image freeze has occurred.
 On the other hand, as shown in FIG. 5C, Motion in the decoded video is low between times t3 and t4, whereas in the video before encoding it is high over the same interval (FIG. 5A), and the difference exceeds the threshold TH1 (see FIG. 5B). This is because, for some reason, an image freeze occurred in the decoded video, so the error can be reliably detected.
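The freeze test can be sketched as below: only frames where the original Motion was high but the decoded Motion collapsed are flagged, so a genuinely still interval like t1-t2 is not reported. The values and the threshold TH1 are illustrative.

```python
def detect_freeze(original_motion, decoded_motion, th1):
    """Frames where original-minus-decoded Motion reaches the threshold TH1:
    the source was moving but the decoded video was not."""
    return [t for t, (o, d) in enumerate(zip(original_motion, decoded_motion))
            if o - d >= th1]

orig = [50, 2, 2, 60, 55]   # frames 1-2: genuinely still scene (no error)
dec  = [50, 2, 2,  3,  2]   # frames 3-4: decoded video froze
print(detect_freeze(orig, dec, th1=30))  # → [3, 4]
```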
 (2) Detection of the blackout phenomenon
 FIG. 6A shows the original feature value embedded in the video/audio signal at the broadcast terminal 100X, or the interception feature value; FIG. 6C shows the decoded feature value extracted from the decoded video/audio signal at the broadcast terminal 100A or 100B; and FIG. 6B shows the difference between the two feature values. The vertical axis represents Video Activity as the feature value, and the horizontal axis represents time. As this Video Activity, for example, the following variance A can be used.
 Consider the video signal before and after transmission (taking as an example a signal, such as a virtual video signal, that has a value at each three-dimensional coordinate; setting z = 0 gives an ordinary two-dimensional video signal). Let V(x, y, z, t) be the video signal before transmission at three-dimensional coordinates (x, y, z) at time t, and U(x, y, z, t) the video signal after transmission at the same coordinates and time.
 When a video signal is transmitted over a long distance, various problems such as signal loss and noise can arise, so V(x, y, z, t) = U(x, y, z, t) does not necessarily hold; however, errors too small for the viewer to notice need not be corrected. A failure such as a blackout, on the other hand, does require countermeasures.
 The variance A, as a feature value of the video signal V(x, y, z, t), can be expressed by the following equation:
    A(t) = (1/N) Σx,y,z { V(x, y, z, t) − ave.V(t) }²
 where the sum runs over all coordinates (x, y, z) of the frame and N is the number of coordinate points (pixels).
 The average value ave.V can be obtained by the following equation:
    ave.V(t) = (1/N) Σx,y,z V(x, y, z, t)
 The variance A is computed for both the pre-transmission video signal V(x, y, z, t) and the post-transmission video signal U(x, y, z, t), and a blackout can be identified from their difference as follows.
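The variance A above can be computed directly. This sketch drops z (z = 0, the ordinary two-dimensional case) and takes a frame as a 2-D list of pixel values.

```python
def variance_A(frame):
    """Variance of all pixel values in one frame: A is near 0 for a
    blacked-out (uniform) frame and large for a normally textured frame."""
    vals = [v for row in frame for v in row]
    ave_v = sum(vals) / len(vals)          # the average value ave.V
    return sum((v - ave_v) ** 2 for v in vals) / len(vals)

print(variance_A([[0, 0], [0, 0]]))  # uniform (black) frame → 0.0
print(variance_A([[0, 2], [0, 2]]))  # textured frame → 1.0
```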
 As shown in FIG. 6C, in the video based on the decoded video/audio signal the variance is low between times t1 and t2; but as FIG. 6A shows, it is also low over the same interval in the video based on the signal before encoding, so the difference is zero (see FIG. 6B). This occurred because the transmitted video showed, for example, a starry sky, and it can therefore be judged that no blackout has occurred.
 一方、図6Cに示すように、デコードされた後の映像音声信号に基づく映像においては、時間t3~t4の間は、分散値が低いのに対し、図6Aに示すように、エンコードされる前の映像音声信号に基づく映像においては、時間t3~t4の間は、分散値が高くなっており、その差分は閾値TH2を超えている(図6B参照)。これは、伝送された映像において、何らかの原因により画面が真っ黒になるブラックアウト現象が生じたことによるものであるので、エラーが生じたことを有効に検出できる。 On the other hand, as shown in FIG. 6C, in the video based on the decoded video / audio signal, the variance value is low during the time t3 to t4, whereas before the encoding, as shown in FIG. 6A. In the video based on the video / audio signal, the variance value is high between times t3 and t4, and the difference exceeds the threshold value TH2 (see FIG. 6B). This is due to the occurrence of a blackout phenomenon in which the screen is completely black for some reason in the transmitted video, so that it is possible to effectively detect that an error has occurred.
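The variance-and-difference procedure above can be sketched in a few lines of Python (a minimal illustration, not code from the patent; the list-based frame representation and the function names are assumptions):

```python
def variance(frame):
    """Feature quantity A: variance of the pixel values in one frame."""
    n = len(frame)
    ave = sum(frame) / n          # ave.V, the mean pixel value
    return sum((v - ave) ** 2 for v in frame) / n

def detect_blackout(pre_frames, post_frames, th2):
    """Return frame indices where the pre/post variance difference exceeds TH2.

    A large positive difference means the decoded frame lost almost all
    detail (e.g. the screen went black) although the source frame had detail.
    """
    flagged = []
    for t, (pre, post) in enumerate(zip(pre_frames, post_frames)):
        if variance(pre) - variance(post) > th2:
            flagged.append(t)
    return flagged
```

A uniformly dark source scene (the starry-sky case of FIG. 6) yields a low variance on both sides, so the difference stays near zero and no error is flagged.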
(3) Detection of the audio mute phenomenon
 FIG. 7A shows the original feature quantity or interception feature quantity embedded in the video/audio signal by the broadcast terminal 100X, FIG. 7C shows the decode feature quantity extracted from the decoded video/audio signal by the broadcast terminal 100A or 100B, and FIG. 7B shows the difference between the two feature quantities; the vertical axis represents Audio Level as the feature quantity, and the horizontal axis represents time. The Audio Level samples of the audio signal are preferably averaged at the frame frequency of the video signal; for example, for a video signal of 30 frames per second, Audio Level sampling is preferably performed at 30 Hz.
 Here, as shown in FIG. 7C, in the audio based on the decoded video/audio signal the Audio Level is very low between times t1 and t2; however, as shown in FIG. 7A, the Audio Level is also low during t1 to t2 in the audio based on the pre-encoding video/audio signal, and the difference is zero (see FIG. 7B). The Audio Level was simply low in the original signal before encoding, so it can be judged that no audio mute phenomenon has occurred.
 On the other hand, as shown in FIG. 7C, the Audio Level of the decoded audio is low between times t3 and t4, whereas, as shown in FIG. 7A, the Audio Level of the pre-encoding audio is high during t3 to t4, and the difference exceeds the threshold TH3 (see FIG. 7B). This indicates that an audio mute phenomenon, in which the sound drops out for some reason, occurred in the transmitted audio, so the error can be effectively detected.
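The frame-averaged Audio Level and the mute check can be sketched as follows (an illustrative Python sketch, assuming the audio is a list of PCM samples and the level is the mean absolute amplitude per video frame; the names are not from the patent):

```python
def frame_audio_levels(samples, samples_per_frame):
    """Average the absolute audio amplitude over each video-frame interval,
    i.e. resample the Audio Level at the video frame frequency (e.g. 30 Hz)."""
    levels = []
    for i in range(0, len(samples) - samples_per_frame + 1, samples_per_frame):
        chunk = samples[i:i + samples_per_frame]
        levels.append(sum(abs(s) for s in chunk) / samples_per_frame)
    return levels

def detect_mute(pre_levels, post_levels, th3):
    """Frames where the decoded level dropped by more than TH3 below the source."""
    return [t for t, (a, b) in enumerate(zip(pre_levels, post_levels))
            if a - b > th3]
```

A passage that was already quiet before encoding produces low levels on both sides, so the difference stays below TH3 and no mute is reported.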
(4) Detection of the audio failure phenomenon
 FIG. 8A shows the original feature quantity or interception feature quantity embedded in the video/audio signal by the broadcast terminal 100X, FIG. 8C shows the decode feature quantity extracted from the decoded video/audio signal by the broadcast terminal 100A or 100B, and FIG. 8B shows the difference between the two feature quantities; the vertical axis represents Audio Level as the feature quantity, and the horizontal axis represents time. The Audio Level samples of the audio signal are preferably averaged at the frame frequency of the video signal.
 Here, when the difference is taken between the Audio Level based on the decoded video/audio signal and the Audio Level based on the pre-encoding video/audio signal, the difference exceeds the threshold TH4 between times t1 and t2 and between times t3 and t4, as shown in FIG. 8B. This indicates that noise or the like was superimposed on the transmitted audio for some reason, causing an audio failure phenomenon, so the error can be effectively detected.
(5) Detection of the video/audio mismatch phenomenon
 FIG. 9A shows the pre-encoding Audio Level extracted by the broadcast terminal 100X in correspondence with the video frames, FIG. 9C shows the decoded Audio Level extracted by the broadcast terminal 100A or 100B, and FIG. 9B shows the advance/delay of the audio with respect to time. The Audio Level samples of the audio signal are preferably averaged at the frame frequency of the video signal.
 Here, the rising edges of the Audio Level relative to the video frames are detected and compared. As shown in FIG. 9C, at times t1 and t3 the delay of the audio relative to the video exceeds the threshold TH5+, and at time t2 the advance of the audio relative to the video falls below the threshold TH5−. By detecting either condition, the destination terminal 201A or 201B judges that a video/audio mismatch phenomenon has occurred, effectively detects the error, and can repair it as necessary.
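The rising-edge comparison can be sketched as follows (a simplified Python illustration; the threshold-crossing definition of a rising edge and the naive pairwise matching of edges are assumptions, not details from the patent):

```python
def rising_edge_times(levels, threshold):
    """Frame indices where the Audio Level first crosses above the threshold."""
    edges = []
    for t in range(1, len(levels)):
        if levels[t - 1] <= threshold < levels[t]:
            edges.append(t)
    return edges

def av_offsets(pre_levels, post_levels, threshold):
    """Offset (in frames) of each decoded edge from the matching source edge.

    Positive values mean the audio lags the video (compare against TH5+);
    negative values mean it leads (compare against TH5-).
    """
    pre = rising_edge_times(pre_levels, threshold)
    post = rising_edge_times(post_levels, threshold)
    return [b - a for a, b in zip(pre, post)]
```

Each offset would then be tested against TH5+ and TH5− to flag a mismatch.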
(6) Detection of the illegal frame phenomenon
 FIG. 10A shows the original feature quantity or interception feature quantity embedded in the video/audio signal by the broadcast terminal 100X, FIG. 10C shows the decode feature quantity extracted from the decoded video/audio signal by the broadcast terminal 100A or 100B, and FIG. 10B shows the difference between the two feature quantities; the vertical axis represents Video Activity (the variance described above may be used) as the feature quantity, and the horizontal axis represents time.
 If the video based on the pre-encoding video/audio signal and the video based on the decoded video/audio signal match completely, the difference between their pixel-value statistics is zero. If, however, a video signal differing by even a single frame is inserted into the decoded video/audio signal, the difference exceeds a predetermined threshold at that frame.
 As shown in FIG. 10C, the difference in the pixel-value statistic exceeds the threshold TH6+ between times t1 and t2, and falls below the threshold TH6− between times t3 and t4. By detecting either condition, the destination terminal 201A or 201B judges that an illegal frame phenomenon has occurred and can effectively detect the error.
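The two-sided threshold test can be sketched as follows (illustrative Python; the per-frame statistic is assumed to be the Video Activity, i.e. the variance described above, and the function name is not from the patent):

```python
def detect_illegal_frames(pre_stats, post_stats, th6_pos, th6_neg):
    """Frames whose Video Activity difference leaves the band [TH6-, TH6+].

    A single inserted frame shows up as an isolated spike in the difference,
    positive or negative depending on the inserted content.
    """
    flagged = []
    for t, (a, b) in enumerate(zip(pre_stats, post_stats)):
        d = b - a
        if d > th6_pos or d < th6_neg:
            flagged.append(t)
    return flagged
```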

Claims (13)

  1.  A detection system for detecting an error in a video/audio signal, comprising:
      a step of extracting a feature quantity from the video/audio signal before encoding to obtain an original feature quantity, and embedding the original feature quantity in the video/audio signal;
      a step of encoding the video/audio signal in which the original feature quantity is embedded;
      a step of decoding the encoded video/audio signal;
      a step of reading the original feature quantity embedded in the decoded video/audio signal;
      a step of extracting a feature quantity from the decoded video/audio signal to obtain a decode feature quantity, and comparing the decode feature quantity with the original feature quantity; and
      a step of determining that an error has occurred when there is a difference equal to or greater than a predetermined value between the original feature quantity and the decode feature quantity.
  2.  The detection system according to claim 1, further comprising a step of repairing the decoded video/audio signal based on the original feature quantity when it is determined that an error has occurred.
  3.  The detection system according to claim 1 or 2, wherein the original feature quantity is embedded in metadata of the video/audio signal.
  4.  A detection system for detecting an error in a video/audio signal that has been encoded after a feature quantity was extracted from the video/audio signal to obtain an interception feature quantity and the interception feature quantity was embedded in the video/audio signal, the system comprising:
      a step of decoding the video/audio signal;
      a step of reading the interception feature quantity embedded in the decoded video/audio signal;
      a step of extracting a feature quantity from the decoded video/audio signal to obtain a decode feature quantity, and comparing the decode feature quantity with the interception feature quantity; and
      a step of determining that an error has occurred when there is a difference equal to or greater than a predetermined value between the interception feature quantity and the decode feature quantity.
  5.  The detection system according to claim 4, further comprising a step of repairing the decoded video/audio signal based on the interception feature quantity when it is determined that an error has occurred.
  6.  The detection system according to claim 4 or 5, wherein the interception feature quantity is embedded in metadata of the video/audio signal.
  7.  The detection system according to any one of claims 1 to 6, further comprising a step of embedding information concerning the error in the decoded video/audio signal when it is determined that an error has occurred.
  8.  The detection system according to any one of claims 1 to 7, wherein the error is an image freeze phenomenon.
  9.  The detection system according to any one of claims 1 to 7, wherein the error is a blackout phenomenon.
  10.  The detection system according to any one of claims 1 to 9, wherein the error is an audio mute phenomenon.
  11.  The detection system according to any one of claims 1 to 9, wherein the error is an audio failure phenomenon.
  12.  The detection system according to any one of claims 1 to 11, wherein the error is a video/audio mismatch phenomenon.
  13.  The detection system according to any one of claims 1 to 12, wherein the error is an illegal frame phenomenon.
PCT/JP2010/050619 2010-01-20 2010-01-20 Monitoring device WO2011089689A1 (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
JP2011550742A (JP5435597B2) | 2010-01-20 | 2010-01-20 | Detection method
PCT/JP2010/050619 (WO2011089689A1) | 2010-01-20 | 2010-01-20 | Monitoring device


Publications (1)

Publication Number Publication Date
WO2011089689A1 (en)

Family

ID=44306513

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2010/050619 WO2011089689A1 (en) 2010-01-20 2010-01-20 Monitoring device

Country Status (2)

Country Link
JP (1) JP5435597B2 (en)
WO (1) WO2011089689A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7508040B2 (en) 2020-04-14 2024-07-01 日本放送協会 Content feature extraction device and program thereof, and monitoring device and program thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000092522A (en) * 1998-09-08 2000-03-31 Tektronix Inc Method and device for analyzing quality of image
JP2003134535A (en) * 2001-10-30 2003-05-09 Nec Eng Ltd Image quality deterioration detection system
WO2007080657A1 (en) * 2006-01-13 2007-07-19 Gaintech Co. Ltd. Monitor



Also Published As

Publication number Publication date
JP5435597B2 (en) 2014-03-05
JPWO2011089689A1 (en) 2013-05-20


Legal Events

Code | Title | Description
121 (EP) | The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 10843855; Country of ref document: EP; Kind code of ref document: A1
WWE | WIPO information: entry into national phase | Ref document number: 2011550742; Country of ref document: JP
NENP | Non-entry into the national phase | Ref country code: DE
122 (EP) | PCT application non-entry in European phase | Ref document number: 10843855; Country of ref document: EP; Kind code of ref document: A1