KR100317299B1

KR100317299B1 - MPEG-4 Video Conference System And Multimedia Information Structure For MPEG-4 Video Conference System

Info

Publication number: KR100317299B1
Application number: KR1020000002279A
Authority: KR
Inventors: 손희정
Original assignee: 구자홍; 엘지전자주식회사
Priority date: 2000-01-18
Filing date: 2000-01-18
Publication date: 2001-12-22
Also published as: KR20010075803A

Abstract

The present invention relates to a multimedia information structure for MPEG-4 video conferencing, and to an MPEG-4 video conference system, in which moving pictures and audio are expressed (described) using MPEG-4 technology and that description is used to perform real-time, two-way video communication.

The present invention is an MPEG-4 video conference system characterized in that: (a) the information describing the scene describes a two-dimensional screen of video and audio; (b) the information describing the scene describes the two-dimensional screen of video and audio as a structure of nodes and a tree; (c) the information describing the scene includes an object descriptor ID that relates to the information describing the objects, an initial object descriptor (Initial Object Descriptor) that is delivered first, and information on the codecs of the video and audio streams; and (d) each piece of information is written in ASCII form, encoded into binary, and represented in BIFS to carry out MPEG-4 video communication.

Description

MPEG-4 Video Conference System And Multimedia Information Structure For MPEG-4 Video Conference System

MPEG-4 is a standard for next-generation multimedia that is being proposed and standardized by SC29 WG11 under ISO/IEC. Its applications are diverse: broadcasting, home shopping over broadcast, and Internet programs for education, entertainment, home shopping, or information guidance that use real-time moving pictures, animation, and voice delivery.

An important advantage of MPEG-4 is that it is an object-oriented system: each medium (Media) used in conventional multimedia communication is treated as one object and the whole is composed from these objects, so each object can be processed independently.

Because multimedia objects can be processed independently, a content creator can record each object separately and then select the most efficient compression method for each object, and can also combine several objects to develop varied, content-rich multimedia.

Because the content can also include the relationships among objects and the user's access to objects, it can offer scenarios that change dynamically according to the user's intent rather than one-way, fixed content. MPEG-4 also provides copyright-based access restrictions, addressing the copyright issues of the multimedia content expected to proliferate.

The development of a video conference system using MPEG-4, with these features and advantages, is currently being considered as a possible application.

For example, if the image (video) of a speaker captured by a camera and the voice (audio) of the speaker captured by a microphone on a PC are expressed in MPEG-4 and used for video communication (two-way, real-time communication), the efficiency and effect of the MPEG-4 features and advantages above can be maximized.

However, in the development and deployment of video communication systems to date, the so-called videophone has been the mainstream. A videophone compresses and encodes audio and video signals and exchanges them in real time, so that each party can see and hear both the other party and itself.

The present invention provides an MPEG-4 video conference information structure and an MPEG-4 video conference system in which the speaker's moving picture and voice are described using MPEG-4 technology and this descriptive information structure is used for two-way, real-time communication with the other party, so that a video conference system based on MPEG-4 can be implemented.

In particular, the present invention modifies the reference models provided by MPEG-4 Systems (IM1 Player2D, IM1 Player3D) and provides the ASCII-form composition of the scene (Scene) and the object descriptor (OD; Object Descriptor) that must be considered when developing a video conference system, the generation of BIFS (Binary Format for Scene Description), and a method of transmitting and receiving these in real time in both directions in a manner suited to a video conference system.

FIG. 1 is a diagram showing an example of an MPEG-4 scene for explaining the present invention.

FIG. 2 is a diagram showing an example of the MPEG-4 information structure (DS and streams) for explaining the present invention.

FIG. 3 is a diagram showing an example of the screen configuration of a video conference system for explaining the present invention.

FIG. 4 is a diagram showing the structure of the MPEG-4 video conference system of the present invention.

FIG. 5 is a flowchart showing the transmission control procedure of the MPEG-4 video conference system of the present invention.

FIG. 6 is a diagram showing the ASCII-form information structure that represents a scene (Scene) in the present invention.

FIG. 7 is a diagram showing the structure of the initial object descriptor (Initial Object Descriptor) in the present invention.

FIG. 8 is a diagram showing the structure of the object descriptor of a media stream (Media Stream) in the present invention.

FIG. 9 is a diagram showing examples of the types of codec to be used in a video conference, for explaining the present invention.

FIG. 10 is a diagram for comparing the IM1 Player with the video conference system in the present invention.

The multimedia information structure for MPEG-4 video conferencing of the present invention is as follows.

When the multimedia is a moving picture and, in order to express the multimedia information using MPEG-4 technology, it is represented by information describing a scene (Scene), information describing the multimedia objects corresponding to the scene, and information indicating the audio and video streams corresponding to the scene and the objects, then, for MPEG-4 video communication:

(a). the information describing the scene describes a two-dimensional screen of video and audio;

(b). the information describing the scene describes the two-dimensional screen of video and audio as a structure of nodes and a tree;

(c). the information describing the scene includes an object descriptor ID that relates to the information describing the objects, an initial object descriptor (Initial Object Descriptor) that is delivered first, and information on the codecs of the video and audio streams; and

(d). each piece of information is written in ASCII form, encoded into binary, and represented in BIFS to carry out MPEG-4 video communication. This characterizes the multimedia information structure for MPEG-4 video conferencing.

Further, in the present invention, the multimedia information structure for MPEG-4 video conferencing is characterized in that the node and tree-structure information for the two-dimensional screen of audio and video, within the information describing the scene, is written in a Scene.bif file, and the object descriptor ID, the initial object descriptor, and the information related to the codecs of the video and audio streams are written in a Scene.od file.

Further, in the present invention, the multimedia information structure for MPEG-4 video conferencing is characterized in that the Scene.bif and Scene.od files are described in Scene.txt.

Further, in the present invention, the multimedia information structure for MPEG-4 video conferencing is characterized in that, when the information describing the scene describes the nodes and tree structure for the two-dimensional screen of audio and video, the nodes Group, Translation2D (two-dimensional transformation), Shape, Appearance, and Sound2D are used to specify the relative position, form, appearance, sound object, and so on of each object.

Further, in the present invention, the multimedia information structure for MPEG-4 video conferencing is characterized in that the image texture (Image Texture) node of the Appearance node and the Sound2D node have a url field whose value is the information (object descriptor ID) that points to a video or audio stream in the MPEG-4 system.

Further, in the present invention, the multimedia information structure for MPEG-4 video conferencing is characterized in that the initial object descriptor specifies a profile (Profile) for compatibility between MPEG-4 terminals, the references to Scene.od and Scene.bif, and the decoder values.

Further, in the present invention, the multimedia information structure for MPEG-4 video conferencing is characterized in that the media stream object description information, which is the information related to the codecs of the video and audio streams, contains information pointing to the corresponding media stream buffer for MPEG-4 video conferencing and an object type ID whose value differs for each codec and which indicates whether audio only or both audio and video are used.

Meanwhile, the MPEG-4 video conference system of the present invention comprises: audio/video processing means for inputting and outputting the audio and video signals of the user and the other party in real time; means for storing the multimedia information described above (Scene.bif, Scene.od, Initial.od); and means for multiplexing the stored information structure with the input and output audio and video signals in real time and transmitting and receiving the result.

Further, the method of controlling the MPEG-4 video conference system of the present invention comprises the steps of: composing the multimedia information (Scene.bif, Scene.od, Initial.od) for MPEG-4 video communication; setting the communication conditions with the other party; setting the Initial.od file to be transmitted and the n.od to be used by the online multiplexing program; transmitting and receiving Initial.od, Scene.bif, and Scene.od; and transmitting and receiving the audio and video streams according to the codec determined for transmission and reception.

According to the present invention configured as described above, a first feature is a method of composing, in ASCII form, the scene to be built in the video conference system according to the various kinds of audio and video codecs (CODEC) used.

That is, the BIFS for the scene and the BIFS for the object descriptor (OD) of each media stream (Media Stream) are expressed in ASCII form.

A second feature of the present invention is a method of generating the BIFS from the ASCII forms, which vary with the kinds of audio and video codecs used in the video conference system, and then using them in the video conference system.

That is, the following describes the configuration and control procedure of a video conference system that uses the MPEG-4 video conference information structure described above to achieve MPEG-4 based video communication: on a PC, MPEG-4 content is generated from the voice and camera inputs, transmitted in real time, and likewise received.

First, a video conference system needs a part that takes voice and camera input through a PC in real time and generates and transmits MPEG-4 content.

Besides the compression of moving pictures and audio and the transmission and reception parts, generating MPEG-4 content requires BIFS (Binary Format for Scene Description) information, which describes the scene (Scene) as defined in the MPEG-4 Systems part.

In other words, a BIFS must be generated that describes the screen of the user (speaker) and the screen of the other party as a single scene during the video conference.

Additional information is also needed to define the initial streams and to convey the characteristics of each stream. This information is called the object descriptor (OD); it too is expressed in BIFS and forms the content of a stream.

FIG. 1 takes a lecture scene as an example of an MPEG-4 scene, and FIG. 2 concerns the MPEG-4 stream delivery framework (FrameWork); both serve to explain the meaning of the scene (Scene), the object descriptor (OD), BIFS, and so on described above.

The example of FIG. 1 composes a lecture scene from individual objects. The objects here may be the instructor (101), the blackboard (102), the desk (103), the globe (104), the instructor's voice (audio), and so on, and the binary representation of the scene made up of these objects is the BIFS.

FIG. 1 shows how each object is placed and displayed on a plane.

If the scene is a three-dimensional image, it appears in different forms depending on the viewer's viewpoint.

An object representing voice or music may likewise sound different depending on the user's viewpoint, according to the position and characteristics of that object.

The scene can also be composed so that the globe rotates when the user clicks it with the mouse.

In the lecture scene of FIG. 1, the audio information, as an MPEG-4 object, is composed by the audio compositor (105) and the video information by the video compositor (106), so that the result is displayed (107) and heard through a speaker or the like (108).

FIG. 2 shows how information in MPEG-4 form is actually delivered in MPEG-4.

Referring to FIG. 2, the MPEG-4 information structure is broadly divided into the initial object description information (201), the part that describes the scene (Scene Description Scheme) (202, 203), the part that describes the objects (Object Description Scheme) (204, 205), and the visual stream (Visual Stream) (206) and audio stream (Audio Stream) (207), which carry the interval information over which the objects indicated by the scene are played.

In the MPEG-4 information structure, an object descriptor acts as a pointer to each object.

In MPEG-4, the initial object descriptor (Initial OD) (201) must be delivered first to provide the initial MPEG-4 information.

This descriptor points to two streams, the stream (202) describing the scene and the stream (204) describing the object descriptors (OD) (205), and carries stream-specific information such as the decoder buffer size for these two streams.

In general, the stream (204) of object descriptors (OD) (205) points to the streams that correspond to each media object and also contains information for each object's decoder.

Tree-structured information about the scene is delivered in the stream (202) that describes the scene, and the receiving side uses it to generate the scene.

The stream (204) that describes the object descriptors (OD) delivers information about each object referenced by the scene (203). With this information, the stream of each object can be connected to the corresponding node (Node) in the scene, and objects can be added or removed as needed.

That is, the logical information of the scene described in FIG. 1 is delivered through the scene description streams (202, 203), and the information pointing to the objects of each scene is delivered through the object streams (204, 205).

The MPEG-4 system supports this structure.
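
The pointer relationships described for FIG. 2 can be sketched as a small data model. The following is an illustrative Python sketch only: the class names, field names, IDs, and buffer sizes are assumptions chosen for readability and are not the descriptor syntax of ISO/IEC 14496-1.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ElementaryStream:
    """One media stream (hypothetical model): an ID plus a decoder hint."""
    es_id: int
    kind: str                 # "visual" or "audio"
    decoder_buffer_size: int  # bytes; illustrative value only


@dataclass
class ObjectDescriptor:
    """Maps an OD ID used by the scene to the streams of one media object."""
    od_id: int
    streams: List[ElementaryStream] = field(default_factory=list)


@dataclass
class InitialObjectDescriptor:
    """Delivered first; points at the scene stream and the OD stream."""
    scene_stream_id: int
    od_stream_id: int


def resolve(od_table: Dict[int, ObjectDescriptor], od_id: int) -> List[int]:
    """Return the elementary stream IDs that a scene node's OD ID refers to."""
    return [es.es_id for es in od_table[od_id].streams]


# Example: a scene that references OD 10 (video) and OD 20 (audio).
iod = InitialObjectDescriptor(scene_stream_id=1, od_stream_id=2)
od_table = {
    10: ObjectDescriptor(10, [ElementaryStream(101, "visual", 65536)]),
    20: ObjectDescriptor(20, [ElementaryStream(201, "audio", 8192)]),
}
```

In this model the scene never names a media stream directly; it holds only an OD ID, and `resolve` follows that ID through the OD table, which mirrors the pointer role the text assigns to the object descriptor.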

As FIG. 2 shows, for an inline (InLine) node, which includes another scene within a scene, the MPEG-4 information structure represents both the case that includes a new initial object descriptor (Initial Object Descriptor) and the case that merely points to additional individual media object streams. This means that the MPEG-4 information structure is a tree (208, ...).

Using this inline (InLine) node, including an already generated MPEG-4 file becomes structurally simple.

Therefore, on the basis of the MPEG-4 information structure above, MPEG-4 video conferencing requires the information corresponding to the scene and the object descriptors (OD) that describe the video conference scene in conformance with the MPEG-4 standard.

The present invention therefore provides such an MPEG-4 video conference information structure, and also provides a system that performs MPEG-4 video communication using it.

FIG. 3 shows the screen configuration of a general video conference system.

A video conference system has the following characteristics: it consists of video and audio; its screen is two-dimensional; and because the user expects video and audio to appear in synchronization from the moment the call begins, no dedicated function for starting and stopping the video is needed.

In addition, the video and audio used here are generally coded as natural images and audio, so synthesized images or sound are not used.

According to the profiles (Profile) that define the levels of an MPEG-4 scene (Scene) in Clause 13 of ISO/IEC 14496-1, these characteristics fall under the Simple2D Profile.

That is, the scene can be expressed with only the audio (301) and the image (302) defined on a two-dimensional plane.

FIG. 4 illustrates the structure of an MPEG-4 video conference system that performs video communication with the MPEG-4 information structure for video communication, which, as in FIG. 3, is expressed with audio and video only.

The system consists of a network interface (401), an IM1 Player (402), an audio/video device (403), MPEG-4 video communication information (404), a multiplexer (405), and so on.

Audio and video signals from the other party, received through the network interface (401), pass through the IM1 Player (402), are produced as audio (A) and video (V) signals, and are displayed and output through the audio/video device (403). The audio and video signals of the user (speaker), captured by the audio/video device (403), are processed for MPEG-4 video communication using the MPEG-4 video communication information (404) and are transmitted to the other party through the multiplexer (405) and the network interface (401).

The detailed structure and meaning of the MPEG-4 video communication information (404) are described later.

FIG. 5 shows the video communication control procedure of the MPEG-4 video conference system of FIG. 4.

First, as the pre-communication stage, the Scene.bif file, the Initial.od file, and the Scene.od file to be used for MPEG-4 video communication are prepared (the MPEG-4 video communication information (404)).

Next, as the communication start stage, the terminal requests an exchange of data transmission capability information ('Capability') with the other party through the communication module used in the video conference system and the DMIF used in MPEG-4, and a common 'Capability' is determined.

Once the 'Capability' is determined, the Initial.od file to transmit and the n.od to be used by the Online Mux program are decided.

Here, Initial.od and Scene.bif are fixed, and Scene.od is selected according to the common Capability.

Next, as the screen transmission stage, once the codecs to send and receive are determined, transmission begins using the Online Mux: Initial.od is exchanged first, then Scene.bif and Scene.od, and then the audio and video streams, and video communication proceeds.

At the start of communication the Initial OD is exchanged; n.od is then exchanged as an OD Update Command and Scene.bif as a BIFS Update Command.

When communication ends, the conference is closed with a normal termination.
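
The capability negotiation and transmission order of FIG. 5 can be sketched as follows. This is a minimal Python sketch under stated assumptions: the codec labels (`AUDIO_A`, `VIDEO_A`), the file names `1.od` and `2.od`, and all function names are hypothetical placeholders, and no real DMIF or Online Mux interface is modeled.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical table: one pre-built n.od file per codec combination.
# An empty video codec string stands for an audio-only conference.
OD_FILES: Dict[Tuple[str, str], str] = {
    ("AUDIO_A", "VIDEO_A"): "1.od",
    ("AUDIO_A", ""): "2.od",
}


def common_capability(local: Set[str], remote: Set[str]) -> Set[str]:
    """Codecs that both terminals announced in the capability exchange."""
    return local & remote


def choose_od_file(common: Set[str]) -> str:
    """Pick the first pre-built n.od whose codecs are all in the common set."""
    for (audio, video), name in OD_FILES.items():
        needed = {codec for codec in (audio, video) if codec}
        if needed <= common:
            return name
    raise ValueError("no common codec combination")


def transmission_order(od_file: str) -> List[str]:
    """Order in which items are handed to the online multiplexer."""
    return [
        "Initial.od",           # initial object descriptor goes first
        "Scene.bif",            # sent as a BIFS update command
        od_file,                # sent as an OD update command
        "audio/video streams",  # media streams follow
    ]


common = common_capability({"AUDIO_A", "VIDEO_A", "VIDEO_B"},
                           {"AUDIO_A", "VIDEO_A"})
plan = transmission_order(choose_od_file(common))
```

Only the OD file depends on the negotiated codecs here; `Initial.od` and `Scene.bif` appear as constants, matching the statement that they are fixed while Scene.od is selected according to the common Capability.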

As described above, the MPEG-4 video conference system, and the MPEG-4 video communication it performs, require the definition and use of an MPEG-4 video conference information structure comprising the ASCII-form information that represents the scene (Scene), the initial object descriptor (Initial Object Descriptor), the object description information of the media streams (Media Stream), and so on.

The following describes how this MPEG-4 video conference information structure of the present invention is composed and what it means.

[1]. Composition of the ASCII form of the BIFS used in the video conference system

(1). As described above, the screen configuration (including audio) during video communication, as in FIG. 3, consists of video and audio on a two-dimensional screen. Because the user expects video and audio to appear in synchronization from the moment the call begins, no dedicated function for starting and stopping the video is needed, and the video and audio used are not synthesized images or sound.

Therefore, the scene can be expressed with only the audio and image defined on a two-dimensional plane.

(2). ASCII composition representing the scene

FIG. 6 shows the ASCII form for expressing FIG. 3 as an MPEG-4 scene.

This ASCII form is the same as the format described in the VRML97 specification, ISO/IEC 14772-1.

MPEG-4 adds new nodes and semantics and specifies the binary coding of the ASCII form.

Briefly, an MPEG-4 scene is a tree of nodes.

A matching pair of curly braces { } marks the boundary of one node, and the nodes contained within it are its child (Children) nodes.

The node types include Group, Translation2D (two-dimensional transformation), Shape, Appearance, and Sound2D (two-dimensional sound). Respectively, they denote the root, specify the relative position of an object, specify its form, specify its appearance, and specify a sound object.

A node has fields, and each field has a name and a value corresponding to that name. The value can itself be a node.

In particular, the image texture (Image Texture) node of 'Appearance' and the 'Sound2D' node have a url field, whose value is the number that identifies a video or audio stream in the MPEG-4 system, that is, the ID of an object descriptor (OD).

Consequently, the scene itself for video conferencing can be fixed and used regardless of the type of audio and video codec.
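
The node, field, and url relationships described above can be sketched in Python. This is an assumption-laden illustration: the `Node` class, the field names (`translation`, `appearance`, `texture`), the OD IDs 10 and 20, and the brace notation it prints are hypothetical and only follow the description in the text; they are not the contents of FIG. 6 or the exact BIFS text syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

Value = Union[int, float, str, "Node"]


@dataclass
class Node:
    """A scene node: a type name, named fields, and child nodes."""
    type_name: str
    fields: Dict[str, Value] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

    def to_text(self, indent: int = 0) -> str:
        """Serialize to a VRML-like brace notation (illustrative only)."""
        pad = "  " * indent
        lines = [f"{pad}{self.type_name} {{"]
        for name, value in self.fields.items():
            if isinstance(value, Node):
                # A field value may itself be a node.
                nested = value.to_text(indent + 1).lstrip()
                lines.append(f"{pad}  {name} {nested}")
            else:
                lines.append(f"{pad}  {name} {value}")
        if self.children:
            lines.append(f"{pad}  children [")
            lines.extend(c.to_text(indent + 2) for c in self.children)
            lines.append(f"{pad}  ]")
        lines.append(f"{pad}}}")
        return "\n".join(lines)


def referenced_od_ids(node: Node) -> List[int]:
    """Collect every OD ID stored in a url field, in document order."""
    ids: List[int] = []
    for name, value in node.fields.items():
        if isinstance(value, Node):
            ids.extend(referenced_od_ids(value))
        elif name == "url":
            ids.append(int(value))
    for child in node.children:
        ids.extend(referenced_od_ids(child))
    return ids


# A two-object conference scene: one picture (OD 10) and one sound (OD 20).
scene = Node("Group", children=[
    Node("Translation2D", {"translation": "0 0"}, [
        Node("Shape", {
            "appearance": Node("Appearance", {
                "texture": Node("ImageTexture", {"url": 10}),
            }),
        }),
    ]),
    Node("Sound2D", {"url": 20}),
])
```

Because the tree stores only OD IDs, swapping codecs changes what OD 10 and OD 20 resolve to without touching the tree, which is why the scene can stay fixed.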

In FIG. 6, 'REPLACE BY SCENE' and 'UPDATE' act as commands that apply to the nodes or information that follow them.

Accordingly, the file not only generates the scene information from the initial scene that begins with the 'Group' node, but also turns the information about the object descriptors (OD) into a delivery stream.

In practice, MPEG-4 IM1-BifsEnc encodes the file above in two separate parts: the portion following 'REPLACE BY SCENE' is encoded into the 'Scene.bif' file, and the portion following 'UPDATE' into 'Scene.od'.

The ASCII-form information above is usually stored as Scene.txt.
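The division of the ASCII text into a scene part and an OD part can be illustrated as follows. This sketch only splits text at the two command keywords named above; it does not perform the binary encoding that IM1-BifsEnc does, and the function name `split_scene_text` and the sample text are hypothetical (the sample is not the content of FIG. 6).

```python
def split_scene_text(text: str):
    """Split Scene.txt-style text into (scene part, OD part).

    The scene part is what follows 'REPLACE BY SCENE' up to 'UPDATE';
    the OD part is what follows 'UPDATE'. Raises ValueError if a
    keyword is missing.
    """
    scene_marker = "REPLACE BY SCENE"
    update_marker = "UPDATE"
    scene_start = text.index(scene_marker) + len(scene_marker)
    update_start = text.index(update_marker, scene_start)
    scene_part = text[scene_start:update_start].strip()
    od_part = text[update_start + len(update_marker):].strip()
    return scene_part, od_part


# Hypothetical, heavily simplified input.
sample = """REPLACE BY SCENE
Group { children [ Sound2D { url 4 } ] }
UPDATE
ObjectDescriptor { objectDescriptorID 4 }
"""

scene_part, od_part = split_scene_text(sample)
```

In the arrangement described here, the first part would correspond to the input for 'Scene.bif' and the second to the input for 'Scene.od'.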

More specifically, 'Scene.bif' is a fixed scene description that does not change with the audio and video codecs, whereas 'Scene.od' varies with the information in 'mux.scr', that is, with the information of each object descriptor (OD).

In other words, it varies with the types of audio and video codecs used for the video conference and with the values that go with them.

This information is kept separately in the 'muxScript Mux.scr' file.

This is a format used by IM1. The object descriptor (OD) information is first written in ASCII form; when that text is then binary-encoded, the file is used to look up the OD information needed by the initial object descriptor (Initial OD) and by the commands related to each OD.

The contents of 'muxScript' are broadly divided into the initial object descriptor (FIG. 7) and the object descriptors for the media streams (FIG. 8).

(3). ASCII structure representing object descriptors (OD) in Mux.scr

The 'Mux.scr' file conveys, expressed in BIFS, the MPEG-4 information called the initial object descriptor (Initial OD), which is delivered first, together with the information related to the codec of each video and audio stream.

This is called the BIFS related to the object descriptors (OD).

The initial object descriptor exchanges the profile values used for compatibility between MPEG-4 terminals, and specifies the references to Scene.od and Scene.bif, the decoder values, and the like.

The initial object descriptor of FIG. 7 is the same for video conferencing as in the general case, so it does not need to be changed specifically for video conferencing.

As for the object descriptors of FIG. 8, in video conferencing the media streams may be audio only or audio together with video, and the codec may vary for the audio and for the video.

In FIG. 8, the part enclosed by the thick dotted rectangle (801) is a special form used by IM1-Mux: a structure that references, in the form of a file, the media stream to be used in the MPEG-4 file.

However, because a video conferencing system performs real-time two-way communication, it needs a structure that points to an internally accessible buffer rather than to a file.

The value in the thick dotted circle (802) is the object type ID (Object Type Indication), which takes a different value for each codec used.
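The difference between a file reference and a buffer reference can be sketched as two sources with the same read interface. This is an assumption-laden illustration, not the IM1-Mux structure: the class names `FileSource` and `BufferSource` and the method `read_unit` are hypothetical.

```python
import io
from collections import deque
from typing import Optional


class FileSource:
    """Offline case: media data is read from a file-like object."""

    def __init__(self, fileobj):
        self._file = fileobj

    def read_unit(self, size: int) -> Optional[bytes]:
        data = self._file.read(size)
        return data if data else None


class BufferSource:
    """Real-time case: an encoder pushes units into an in-memory buffer."""

    def __init__(self):
        self._units = deque()

    def push(self, unit: bytes) -> None:
        self._units.append(unit)

    def read_unit(self, size: Optional[int] = None) -> Optional[bytes]:
        # Units are returned whole, in arrival order; size is ignored here.
        return self._units.popleft() if self._units else None


# A multiplexer written against read_unit() could use either source.
live = BufferSource()
live.push(b"frame-1")
live.push(b"frame-2")
first = live.read_unit()
second = live.read_unit()
empty = live.read_unit()

stored = FileSource(io.BytesIO(b"abcd"))
chunk = stored.read_unit(2)
```

Keeping one interface for both cases means the code that packetizes a stream does not need to know whether the data came from a stored file or from a live encoder.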

For example, the combinations of codecs that a video conferencing system might use can be considered as shown in FIG. 9.

In FIG. 9, *1 and *2 are the typical cases for MPEG-4.

Codecs such as H.263 and G.723 are widely used in existing video conferencing systems and are therefore likely to be used.

The ObjectTypeindication value for each codec is given in the standard, and user-defined values may also be used.

In the example of FIG. 8, the same value '0x21' is used for both audio and video, but the two are still distinguished because their stream types differ: one is a video stream (Video Stream) and the other an audio stream (Audio Stream).
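The point that the stream type and the object type value together identify a decoder can be sketched as a lookup keyed on the pair. The value 0x21 is the one the text gives for FIG. 8 (read here as hexadecimal); the `StreamType` enum, the `DECODERS` table, the decoder labels, and `select_decoder` are hypothetical names used only for illustration.

```python
from enum import Enum


class StreamType(Enum):
    VIDEO = "VideoStream"
    AUDIO = "AudioStream"


# Hypothetical table: the same object type value maps to different
# decoders depending on the stream type it is paired with.
DECODERS = {
    (StreamType.VIDEO, 0x21): "video decoder",
    (StreamType.AUDIO, 0x21): "audio decoder",
}


def select_decoder(stream_type: StreamType, object_type: int) -> str:
    """Return the decoder label registered for this (type, value) pair."""
    try:
        return DECODERS[(stream_type, object_type)]
    except KeyError:
        raise LookupError(
            f"no decoder for {stream_type.value} with object type "
            f"{object_type:#04x}"
        ) from None


video_decoder = select_decoder(StreamType.VIDEO, 0x21)
audio_decoder = select_decoder(StreamType.AUDIO, 0x21)
```

Supporting another codec in this sketch would mean adding one more entry to the table, which mirrors the idea that only the media object descriptors need to change.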

In conclusion, to support the various codecs of a video conferencing system, it suffices to modify the media object descriptors (OD) in the mux.scr file corresponding to FIG. 8.

The initial object descriptor of FIG. 7 and the scene representation of FIG. 6 can be used in fixed form.

[2]. Parts of IM1 to be modified for a video conferencing system

(1). Conditions

* The video conferencing system uses a Simple2D scene consisting of audio and video. That is, a fixed scene is prepared in advance.

* A fixed initial object descriptor (Initial OD) is used.

* The content of the media object descriptors (Media OD) is changed according to the combination of codecs to be used. That is, several Mux.scr files containing the object descriptor (OD) information are prepared, one per codec combination: for example, Mux1.scr for MPEG-4 audio only, or Mux3.scr for MPEG-4 audio and video (the numbers refer to FIG. 9).

* The system has a module that encodes and decodes the user's audio and video.

* The system has a module that can exchange the video conferencing information in real time.
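The third condition, preparing one Mux.scr file per codec combination, can be sketched as a simple table lookup. Only the two file names given in the text are filled in; the other combinations of FIG. 9 are not reproduced here. The codec label strings and the function `select_mux_file` are assumptions made for the example.

```python
from typing import Optional

# (audio codec, video codec) -> prepared Mux.scr file.
# None as the video codec means an audio-only conference.
MUX_FILES = {
    ("MPEG-4", None): "Mux1.scr",      # MPEG-4 audio only
    ("MPEG-4", "MPEG-4"): "Mux3.scr",  # MPEG-4 audio and video
}


def select_mux_file(audio_codec: str, video_codec: Optional[str] = None) -> str:
    """Pick the prepared file for a codec combination, or raise ValueError."""
    key = (audio_codec, video_codec)
    if key not in MUX_FILES:
        raise ValueError(f"no prepared Mux.scr file for {key}")
    return MUX_FILES[key]


audio_only = select_mux_file("MPEG-4")
audio_video = select_mux_file("MPEG-4", "MPEG-4")
```

Since the scene and the initial object descriptor stay fixed, this selection is the only codec-dependent choice among the prepared files.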

(2). Parts of IM1 to be modified

FIG. 10 shows what is implemented in the reference model and what must be modified to apply it to video conferencing.

That is, the reference model (IM1 Player) reads data from an arbitrary mp4 file, decodes each kind of data through its own decoder, and then composes the result. For MPEG-4 video conferencing, the data is not read from a local file; instead, real-time communication takes place over the network.

In addition, in the reference model (IM1 Player), BifsEnc and Mux, which generate the mp4 file, are separate, and the media streams that are input to Mux are supplied as files. For MPEG-4 video conferencing, the audio and the video must each be encoded in real time, and the generation of each Mux packet must be included in the IM1 Player.

Finally, in the reference model (IM1 Player), the IM1 Player operates in the 'push scenario', that is, in broadcast or client-server form. For MPEG-4 video conferencing, it must be changed to follow the 'pull scenario' that applies to real-time two-way communication.

As described above, a scene descriptor, an initial object descriptor, and object descriptors for the media streams have been defined for MPEG-4 video conferencing. Initial.od and Scene.bif are used in fixed form, and Scene.od is selected according to the 'Capability'. Initial.od, Scene.bif, and Scene.od are exchanged first, and then the corresponding audio and video streams are exchanged, which enables MPEG-4 video communication (that is, real-time two-way audio/video transmission and reception). The communication system and the control procedure follow FIGS. 4 and 5, as already described.
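The order of exchange summarized above can be sketched as a function that returns the items to send in sequence. This is a simplified illustration under stated assumptions: the capability labels, the Scene.od variant file names, and the function `session_plan` are hypothetical, and the actual control procedure is the one shown in FIGS. 4 and 5.

```python
from typing import Dict, List

# Hypothetical mapping from a negotiated capability to a prepared Scene.od.
SCENE_OD_BY_CAPABILITY: Dict[str, str] = {
    "audio-only": "Scene_audio.od",
    "audio-video": "Scene_av.od",
}


def session_plan(capability: str,
                 scene_od_by_capability: Dict[str, str]) -> List[str]:
    """Return what is sent, in order, once the capability is agreed."""
    scene_od = scene_od_by_capability[capability]  # KeyError if unsupported
    return [
        "Initial.od",           # fixed
        "Scene.bif",            # fixed
        scene_od,               # chosen per capability
        "audio/video streams",  # real-time media follows the descriptors
    ]


plan = session_plan("audio-video", SCENE_OD_BY_CAPABILITY)
```

Only the third entry depends on the negotiated codecs; the first two are the same for every session.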

In particular, the BIFS streams (404) in FIG. 4 are generated in advance in the MPEG-4 stream format and can be used as needed.

According to the MPEG-4 video conferencing system of the present invention, a video conferencing system compatible with MPEG-4 can be developed with simple modifications to IM1, which is being developed as the MPEG-4 reference model.

That is, an MPEG-4 video conferencing system can be built by making the Mux, which previously operated offline, operate in real time, and by using pre-generated BIFS streams together with audio and video streams generated in real time instead of generating an mp4 file.

In addition, the information needed to make a video conferencing system MPEG-4 based is fixed except for the part related to the codecs of the audio and video streams. The kinds of *.bif and *.od files, which are MPEG-4 elements, are therefore limited, and this is used to simplify the system.

Claims

1. Where, for MPEG-4 video communication, moving-picture multimedia information is expressed with MPEG-4 technology as information describing a scene, information describing the multimedia objects corresponding to the scene, and information indicating the audio and video streams corresponding to the scene and the objects:

(a). the information describing the scene gives the description of a two-dimensional screen of video and audio;

(b). the information describing the scene describes the two-dimensional screen of video and audio in a node and tree structure;

(c). the information describing the scene includes an object descriptor ID relating to the information describing an object, an initial object descriptor that is delivered first, and information relating to the codecs of the video and audio streams; and

(d). each piece of said information is binary-encoded from ASCII form and expressed in BIFS to achieve MPEG-4 video communication.

2. The multimedia information structure of claim 1, wherein, of the information describing the scene, the node and tree structure information for the two-dimensional screen of video and audio is described in a Scene.bif file, and the object descriptor ID, the initial object descriptor, and the information related to the codecs of the video and audio streams are described in a Scene.od file.

3. The multimedia information structure of claim 2, wherein the Scene.bif and Scene.od files are described as Scene.txt.

4. The multimedia information structure for MPEG-4 video conferencing of claim 1, wherein the information describing the scene describes nodes and a tree structure for a two-dimensional screen of video and audio, and specifies the relative position, form, appearance, sound object, and the like of an object with Group, Translation2D, Shape, Appearance, and Sound2D nodes.

5. The multimedia information structure for MPEG-4 video conferencing of claim 4, wherein the Image Texture node of the Appearance and the Sound2D node have a url field holding the information (object descriptor ID) that indicates a video or audio stream in the MPEG-4 system.

6. The multimedia information structure for MPEG-4 video conferencing of claim 1, wherein the initial object descriptor specifies a profile for compatibility between MPEG-4 terminals, references to Scene.od and Scene.bif, and a decoder value.

7. A multimedia information structure for MPEG-4 video conferencing, wherein the media stream object description information, as the information related to the codecs of the video and audio streams, indicates a media stream buffer for MPEG-4 video conferencing, carries an object type ID that has a different value for each codec, and indicates whether audio only or audio and video together are used.

8. An MPEG-4 video conferencing system comprising: audio/video processing means for inputting and outputting the audio and video signals of a user and of the other party in real time; means for storing the multimedia information (Scene.bif, Scene.od, Initial.od) according to any one of claims 1 to 7; and means for multiplexing the stored information structure with the input and output audio and video signals and transmitting them in real time.

9. A method of controlling an MPEG-4 video conferencing system in which, for MPEG-4 video communication, moving-picture multimedia information is expressed with MPEG-4 technology as information describing a scene, information describing the multimedia objects corresponding to the scene, and information indicating the audio and video streams corresponding to the scene and the objects, the method comprising: constructing the multimedia information (Scene.bif, Scene.od, Initial.od) according to claim 1 for MPEG-4 video communication; setting the communication conditions with the other party; setting the n.od to be used in the Initial.od file and in the online multiplexing program; sending and receiving Initial.od, Scene.bif, and Scene.od; and sending and receiving the audio and video streams according to the codec to be used.