JP2005510907A

JP2005510907A - Method and apparatus for forming an audio signal from a video data stream

Info

Publication number: JP2005510907A
Application number: JP2003546577A
Authority: JP
Inventors: Markus Simon (ジモンマルクス)
Original assignee: Siemens AG
Current assignee: Siemens AG
Priority date: 2001-11-22
Filing date: 2002-11-04
Publication date: 2005-04-21
Also published as: WO2003045066A3; WO2003045066A2; EP1446956A2

Abstract

An acoustic signal (snd) is formed from a video data stream containing individual images (img) in temporal order. First, a motion vector field (vfd) is determined from the image data of the individual images, for example using an MPEG algorithm. Then at least one feature quantity (chs) of the motion vector field (vfd), for example a dominant motion vector, is derived in an analysis module (AAN), and a synthesis module (SYN) forms the acoustic signal (snd) on the basis of this feature quantity (chs).

Description

The present invention relates to the formation of a signal, in particular an acoustic signal, from a video stream containing individual images in temporal order.

The conversion of motion information into an acoustic signal is known, for example in the form of acoustic motion detectors. Such a detector captures images with a video camera and triggers a signal, for example an acoustic alarm, as soon as the image changes. However, such motion detectors deliver only one fixed signal. At best, the acoustic signal can be varied by dividing the monitored image area into fields, each of which is assigned a different signal. This type of motion detector naturally fails where, for example, an object moving relative to the background is to be monitored.

Motion-dependent sound generation is also used in the performing arts. For example, the body movements of a person are interpreted and used to control sound generation. In this way a performer can shape the rhythm and themes of the music through movement. The interplay of movement and sound is a creative process that continually inspires the performer to produce further sequences of sounds with new movements. The sound is usually generated as follows: when the person moves, sensors such as motion detectors or light barriers respond, the sensor signals are processed by a data processing device, and the sound assigned to each sensor is produced. However, such sound generation relies on quite costly equipment, which receives its visual input via several recording devices and is, moreover, permanently installed during operation.

The object of the invention is to show how motion information recorded by a single camera can be converted into sound patterns. The sound produced should vary with the type of motion, in particular with its direction. Such sound generation should in particular be integrated as a function of a mobile telephone equipped with a camera.

According to the invention, this object is achieved by a method of the kind mentioned at the outset, comprising the following steps:
a) determining a motion vector field from the image data of an individual image, using the image data of preceding and/or subsequent individual images;
b) deriving at least one feature quantity from the motion vector field;
c) forming an acoustic signal as a function of the feature quantity or quantities.

An apparatus suited to the method according to the invention has a control device with means for determining a motion vector field from the image data of an individual image, using the image data of preceding and/or subsequent individual images. The control device is additionally configured to derive at least one feature quantity from the motion vector field and to form an acoustic signal as a function of the feature quantity or quantities.

The solution according to the invention produces a sound or acoustic signal that reflects not only the presence of motion but also the magnitude and/or direction of the recorded motion. The sound produced can therefore take different forms. In monitoring applications, for example, this permits differentiated monitoring of the image area by means of acoustic signals that allow different courses of motion to be distinguished.

In an advantageous embodiment, the solution according to the invention is implemented in a telecommunications terminal, in particular a mobile telephone. Since mobile telephones are already frequently equipped with image processing functions, the invention can be implemented there particularly conveniently and compactly. In this case the video data stream can advantageously be generated by a camera device of the terminal. Furthermore, the acoustic signal can be output via an audio device of the terminal, or via a mobile telecommunications connection established with the terminal.

In addition, particularly in the case of mobile telephones, it is advantageous if the motion vector field in step a) is determined using an MPEG encoder method known per se.

An evaluation of the motion vector field in step b) that is simple to implement consists in deriving a distribution, for example by statistical methods, determining statistical characteristics of that distribution, and determining the at least one feature quantity from those characteristics.

The invention, together with further advantages, is explained in more detail below with reference to a non-limiting embodiment shown in the drawings. The drawings are schematic:
FIG. 1 shows a front view of a mobile telephone according to the embodiment,
FIG. 2 shows a rear view of the mobile telephone of FIG. 1,
FIG. 3 shows a block diagram of the mobile telephone of FIG. 1,
FIG. 4 shows a block diagram of an MPEG encoder,
FIG. 5 shows an example of a vector field obtained by motion estimation,
FIG. 6 shows a motion histogram derived from the vector field of FIG. 5.

It should be noted at the outset that the embodiment described below serves merely as an example; the invention described above is not to be understood as limited to it.

FIGS. 1 and 2 show a mobile telephone MOG which, according to the invention, converts motion recorded from its surroundings into acoustic signals (an acoustic kaleidoscope). As visible features on its housing, the telephone MOG has, in the known manner, a microphone MIC, a loudspeaker LSP and an input field EIN (for example a keypad) for entering operating commands and telephone numbers, as well as an output unit DIS in the form of a screen, such as an LCD display, on which video images can be shown. The video images img originate in particular from a camera module CAM arranged on the rear (FIG. 2), which records images of the surroundings and supplies them to the image data processing according to the invention. Also on the rear of the mobile telephone MOG there is, in the known manner, a compartment CAC for the battery and the SIM card.

The block diagram of FIG. 3 shows the components of the telephone MOG. In the known manner, in addition to the input/output elements LSP, MIC, EIN and the display DIS, an antenna ANN and a transceiver SEE are provided for the mobile telecommunications functions, together with a processor PRC serving as control device, which interprets the commands entered by the user via the input unit EIN and controls the transceiver SEE accordingly.

According to the invention, the processor PRC is furthermore configured to process the image data img from the camera module; from this image data, a sound snd is formed by means of video coding and motion field analysis, as described below. The function for forming the sound snd from the motion information of the images img essentially comprises the following steps:
a) encoding of the image information img by an encoding module ENC, for example using an MPEG algorithm, in order to determine the associated motion vector field;
b) analysis of the vector field vfd in the analysis module AAN of the processor system PRC with respect to a predetermined feature quantity, for example the dominant motion vector;
c) formation of the sound on the basis of the quantity chs by the synthesis module SYN, which in the embodiment shown here is likewise implemented in the processor system PRC, for example as a function of the orientation and magnitude of the dominant motion vector.
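The three steps can be pictured as a small processing chain. The following Python sketch is illustrative only and is not taken from the patent: the function name `sounds_from_video` and the three callables standing in for the modules ENC, AAN and SYN are assumptions, and each stage is left pluggable so that any motion estimator, analysis or synthesis can be substituted.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

Frame = TypeVar("Frame")
Vector = Tuple[float, float]


def sounds_from_video(
    frames: Iterable[Frame],
    encode: Callable[[Frame, Frame], List[Vector]],  # step a), role of ENC
    analyse: Callable[[List[Vector]], Vector],       # step b), role of AAN
    synthesise: Callable[[Vector], object],          # step c), role of SYN
) -> Iterator[object]:
    """Yield one sound per consecutive pair of frames (hypothetical helper)."""
    previous = None
    for frame in frames:
        if previous is not None:
            vfd = encode(previous, frame)  # motion vector field
            chs = analyse(vfd)             # feature quantity
            yield synthesise(chs)          # acoustic signal
        previous = frame
```

The first frame produces no output, because the motion vector field needs a preceding image as reference.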

Step a), the encoding of the image information, is carried out for example using the known MPEG encoding. This is a standard method for compressing and transmitting digitized image sequences and is already implemented in various forms, for example in camera-equipped mobile telephones intended for video telephony. The essential components of the method are a motion estimation unit BS for consecutive images (see below), a motion compensation unit BK, and a transform unit that converts the motion-compensated image into the frequency domain by means of a discrete cosine transform (DCT) and is connected to a data reduction unit DK. The principle of the method can be seen in FIG. 4. Details can be found in "Digital Signal Processing for Multimedia Systems", Keshab K. Parhi and Takao Nishitani (eds.), Marcel Dekker, Inc., New York, pages 31 to 37.

To derive the acoustic signal according to the invention, what is needed is not the image compressed by the MPEG method but rather the result of the motion estimation unit BS, which is part of the MPEG method. In addition to the image to be evaluated at time t_n, the input to the motion estimation unit BS is the image imp at the preceding time t_{n-1}. The image imp is obtained either by applying an inverse DCT (iDCT) to the DCT-transformed, motion-compensated signal of the preceding image, or by buffering it in an image memory IS for the duration of one image interval.

The image imp is used as the reference image and is divided into a number of blocks bbl, for example 36 pixel blocks of 16 × 16 pixels each, as shown in FIG. 5. For each of these pixel blocks bbl, the best match in a local neighborhood of the image imn to be evaluated is sought using the MSE (mean squared error) criterion. In this way a displacement vector is obtained for each block bbl. FIG. 5 shows, as an example, the image img of FIG. 1 with the motion vector v computed for each block bbl additionally plotted. The image shows a car driving in front of a background; while recording the image sequence, the camera panned to follow the vehicle. Owing to this panning, the motion vectors computed on the vehicle are almost zero, whereas the vectors in the surroundings indicate motion. The motion vector field vfd resulting from this processing is the input to the subsequent processing stage.
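The block matching described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the encoder's actual implementation: images are plain lists of rows of gray values, the search is an exhaustive scan over a square window, and the names `mse`, `crop`, `motion_vector_field` and the `search` radius are hypothetical. Production encoders typically use faster search strategies.

```python
def mse(a, b):
    """Mean squared error between two equally sized blocks (lists of rows)."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def crop(img, top, left, size):
    """Cut a size x size block out of an image given as a list of rows."""
    return [row[left:left + size] for row in img[top:top + size]]


def motion_vector_field(ref, cur, block=16, search=4):
    """Map (block_row, block_col) -> (dx, dy), the displacement of each block.

    Each block of the reference image `ref` is compared with every candidate
    position within +/- `search` pixels in the current image `cur`; the
    candidate with the lowest MSE wins.
    """
    height, width = len(ref), len(ref[0])
    field = {}
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            target = crop(ref, top, left, block)
            best = (0, 0)
            best_err = mse(target, crop(cur, top, left, block))
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    t, l = top + dy, left + dx
                    # Skip candidates that would leave the image.
                    if t < 0 or l < 0 or t + block > height or l + block > width:
                        continue
                    err = mse(target, crop(cur, t, l, block))
                    if err < best_err:
                        best_err, best = err, (dx, dy)
            field[(top // block, left // block)] = best
    return field
```

With `block=16` and a 96 × 96 pixel image, the function returns 36 vectors, matching the block count mentioned in the text.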

The motion estimation described here, which is part of the known MPEG method, is simple yet provides an effective motion analysis. Within the scope of the invention, other motion analysis methods can also be used that deliver a motion vector field and, for this purpose, generally use one or more images that precede or follow in time the image under examination.

In the subsequent step b), one or more feature quantities chs, here for example the dominant motion orientation, are determined from the motion vector field vfd. This takes place in the analysis module AAN. In the example considered here, the orientations of all vectors of the motion vector field are entered in a histogram his (see FIG. 6). (For clarity, FIG. 6 uses a division into 16 direction classes; the number of classes can of course be considerably larger and is limited only by the number of blocks bbl and the resolution of the motion estimation.) The maximum bmx of the distribution represented by the histogram his indicates the main direction of motion in the image. For this direction a motion speed is determined, computed as a simple average over all vectors that belong to the main direction of motion within a predetermined tolerance (for example, the two classes adjacent to the histogram class of the maximum). The result is a single vector whose orientation and magnitude represent the main motion in the image. In the example considered here, this vector is the (two-dimensional) feature quantity chs from which the sound is derived in the subsequent processing stage.
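A minimal Python sketch of this histogram analysis follows. It assumes the motion vectors are given as (vx, vy) pairs, skips zero vectors because they have no orientation, and uses hypothetical names (`dominant_motion`, `n_classes`, `tolerance`); none of these details are fixed by the text.

```python
import math


def dominant_motion(vectors, n_classes=16, tolerance=1):
    """Return (histogram, peak_class, mean_speed) for a list of (vx, vy).

    The peak class is the orientation class with the most vectors; the mean
    speed averages the magnitudes of all vectors whose class lies within
    `tolerance` classes of the peak (wrapping around the circle).
    """
    width = 2 * math.pi / n_classes
    hist = [0] * n_classes
    classes = []
    for vx, vy in vectors:
        if vx == 0 and vy == 0:
            classes.append(None)  # assumption: zero vectors are not counted
            continue
        angle = math.atan2(vy, vx) % (2 * math.pi)
        k = int(angle / width) % n_classes
        classes.append(k)
        hist[k] += 1
    if not any(hist):
        return hist, None, 0.0
    peak = max(range(n_classes), key=hist.__getitem__)
    near = {(peak + d) % n_classes for d in range(-tolerance, tolerance + 1)}
    chosen = [v for v, k in zip(vectors, classes) if k in near]
    speed = sum(math.hypot(vx, vy) for vx, vy in chosen) / len(chosen)
    return hist, peak, speed
```

The pair (peak class, mean speed) plays the role of the two-dimensional feature quantity chs; with `tolerance=1` the average covers the peak class and its two neighbors, as in the example given in the text.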

In other embodiments of the invention, the evaluation can of course be carried out differently. For example, a histogram can be evaluated that takes both the orientation and the magnitude of the vectors into account (that is, the frequency of a two-dimensional base quantity). Quantities that can serve as the basis of the evaluation are statistical characteristics of the distribution, such as the most frequent value (maximum), the second maximum, the associated variances and higher-order moments.
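As an illustration of the two-dimensional variant, the Python sketch below bins each vector by orientation class and magnitude class and reads off the most frequent and second most frequent bins. The bin sizes and the names `joint_histogram` and `top_two` are assumptions made for this example only.

```python
import math
from collections import Counter


def joint_histogram(vectors, n_orient=8, mag_step=2.0):
    """Count vectors per (orientation class, magnitude class) bin."""
    width = 2 * math.pi / n_orient
    counts = Counter()
    for vx, vy in vectors:
        mag = math.hypot(vx, vy)
        if mag == 0:
            continue  # assumption: zero vectors carry no orientation
        o = int((math.atan2(vy, vx) % (2 * math.pi)) / width) % n_orient
        m = int(mag // mag_step)
        counts[(o, m)] += 1
    return counts


def top_two(counts):
    """Return the bins of the maximum and the second maximum (or None)."""
    ranked = [bin_ for bin_, _ in counts.most_common(2)]
    ranked += [None] * (2 - len(ranked))
    return ranked[0], ranked[1]
```

Further statistics such as variances could be computed from the same `Counter`; they are omitted here to keep the sketch short.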

In the subsequent step c), the synthesis module SYN forms the sound on the basis of the feature quantity chs. The sound snd is formed as a function of the orientation and magnitude of the previously determined main motion vector and is output via the loudspeaker LSP. Alternatively, the sound sequence, which exists as an electrically represented acoustic signal in a manner known per se, can be transmitted to other subscribers via a mobile telecommunications connection of the mobile telephone MOG.

For example, during sound formation the motion speed controls the volume, while different sounds are produced depending on the motion orientation. Each orientation class represented in the histogram his is, for example, assigned a pre-stored sound that differs from the other sounds in pitch and/or timbre (overtone spectrum). The sounds can be ordered by pitch, but this is not essential.
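One possible mapping is sketched below in Python. The text only says that each orientation class has its own pre-stored sound and that speed controls volume; the specific choices here (a pure sine tone per class, classes spread over one octave above 220 Hz, linear volume scaling up to `max_speed`) and all names are assumptions for illustration.

```python
import math


def class_to_frequency(orient_class, n_classes=16, base_hz=220.0):
    """Hypothetical mapping: spread the classes evenly over one octave."""
    return base_hz * 2 ** (orient_class / n_classes)


def synthesize(orient_class, speed, max_speed=16.0, duration=0.25, rate=8000):
    """Return sine samples whose pitch follows the orientation class and
    whose amplitude (0..1) follows the motion speed."""
    freq = class_to_frequency(orient_class)
    amp = min(max(speed / max_speed, 0.0), 1.0)
    n = int(duration * rate)
    return [amp * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]
```

The returned sample list could be written to a sound device or a WAV file; a lookup table of recorded sounds per class would fit the description equally well.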

The manner of sound formation can of course be varied. In one variant of the invention, several maxima hmx, bmx of the distribution his are superimposed into one tone. In another variant, the histogram his can be supplied directly to the synthesizer SYN, which uses it as the overtone spectrum of a fundamental tone. The pitch of the fundamental can be kept constant (for example A, 110 Hz) or determined according to the method described above.
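The overtone-spectrum variant can be illustrated with simple additive synthesis. In the Python sketch below, histogram class k weights the harmonic (k + 1) of a 110 Hz fundamental; this assignment of classes to harmonics, the normalization of the weights, and the name `additive_tone` are assumptions, since the text does not specify them.

```python
import math


def additive_tone(hist, fundamental_hz=110.0, duration=0.1, rate=8000):
    """Sum sine harmonics of the fundamental, weighted by the histogram.

    Weights are normalized to sum to 1 so the samples stay within [-1, 1].
    """
    n = int(duration * rate)
    total = sum(hist)
    if total == 0:
        return [0.0] * n  # no motion, no sound
    weights = [h / total for h in hist]
    samples = []
    for i in range(n):
        t = i / rate
        samples.append(sum(
            w * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
            for k, w in enumerate(weights)
        ))
    return samples
```

With 16 classes the highest harmonic lies at 1760 Hz, which stays below the Nyquist limit of the 8000 Hz sample rate assumed here.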


Claims

A method of forming a signal from a video data stream containing individual images (img) in temporal order, comprising the following steps:
a) determining a motion vector field (vfd) from the image data of an individual image (img), using the image data of preceding and/or subsequent individual images;
b) deriving at least one feature quantity (chs) from the motion vector field (vfd);
c) forming an acoustic signal (snd) as a function of the one or more feature quantities (chs).

The method according to claim 1, carried out in a mobile telecommunications terminal (MOG), for example a mobile telephone.

The method according to claim 2, wherein the video data stream is generated by a camera device (CAM) of the terminal (MOG).

The method according to claim 2 or 3, wherein the acoustic signal is output via an audio device (LSP) of the terminal (MOG).

The method according to claim 2 or 3, wherein the acoustic signal is output via a mobile telecommunications connection established by the terminal (MOG).

The method according to claim 1, wherein in step a) the motion vector field is determined using an MPEG encoder method known per se.

The method according to any one of claims 1 to 6, wherein in step b) a distribution (his) is derived from the motion vector field (vfd), statistical characteristics of the distribution are determined, and the at least one feature quantity (chs) is determined from those characteristics.

An apparatus for forming a signal from a video data stream containing individual images (img) in temporal order,
comprising a control device having means (ENC) for determining a motion vector field (vfd) from the image data of an individual image (img), using the image data of preceding and/or subsequent individual images,
wherein the control device is additionally configured to derive at least one feature quantity (chs) from the motion vector field (vfd) and to form an acoustic signal (snd) as a function of the one or more feature quantities (chs).

The apparatus according to claim 8, provided in a mobile telecommunications terminal (MOG), for example a mobile telephone.

The apparatus according to claim 9, wherein a camera device (CAM) is provided which generates the video data stream supplied to the means (ENC) for determining a motion vector field (vfd).

The apparatus according to any one of claims 8 to 10, wherein the means (ENC) for determining a motion vector field (vfd) is an MPEG encoder.