WO2010119171A1 - Method and arrangement for synchronizing digital multimedia signals - Google Patents

Method and arrangement for synchronizing digital multimedia signals

Info

Publication number
WO2010119171A1
Authority
WO
WIPO (PCT)
Prior art keywords
synchronization
embedding
multimedia
signals
multimedia signal
Application number
PCT/FI2010/050259
Other languages
French (fr)
Inventor
Jari Koivusaari
Moncef Gabbouj
Hexin Chen
Original Assignee
Jari Koivusaari
Moncef Gabbouj
Hexin Chen
Application filed by Jari Koivusaari, Moncef Gabbouj, Hexin Chen
Publication of WO2010119171A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236 - Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/2368 - Multiplexing of audio and video streams
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 - Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 - Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072 - Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen, of multiple content streams on the same device
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 - Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4305 - Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434 - Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4341 - Demultiplexing of audio and video streams

Definitions

  • Fig. 8 presents one method, the so-called Changing Parity method, for embedding synchronization data in a multimedia signal, i.e. more precisely in a slice of the multimedia signal.
  • The idea is to change, where necessary, the calculated coefficients that deviate from zero into odd or even numbers, moving them towards zero, depending on the information to be embedded, i.e. in this case, for example, on the synchronization pointer.
  • The embedding method inserts one bit of binary information per nonzero coefficient. The most important of the coefficients is the first, i.e. the so-called DC term, because it describes the weighted average of the whole KxL block.
  • A so-called agreed action span [M, N] is used, which starts from the coefficient, i.e. element, of agreed ordinal M and contains an agreed number of coefficients up to the coefficient of ordinal N.
  • The need to change coefficients in connection with embedding depends on the information to be embedded.
  • An application-specific agreement is made about which action span [M, N] is used and how many bits the synchronization pointer to be embedded contains.
  • The synchronization pointer can contain, among other things, identifier data and payload data.
  • An agreement is also made on the bit presentation of the synchronization pointer; in this example it is agreed that bit "1" of the synchronization pointer corresponds to an odd number and bit "0" to an even number.
  • The first row of coefficients in Fig. 8 presents one coded image block after the quantization phase 48. Since an image block is KxL pixels, the total number of coded transform coefficients, i.e. elements, is K*L. For example, a block 8x8 in size contains 64 coefficients. Fig. 8 shows the first and the last of the coefficients as well as a few between them. On the second row of coefficients the action span [M, N], in which the embedding is performed, is presented. Correspondingly, on the lowermost coefficient row the coefficients after embedding are presented.
  • In the example, the following four bits are embedded: 1, 1, 0 and 0, into the action span [M, N] starting from the coefficient M, the value of which is -6. Since the first bit to be embedded is "1", which corresponds to an odd number, the coefficient -6 must be changed to be odd and in the direction of zero, thus to the coefficient -5.
  • The second bit to be embedded is "1", in which case the next coefficient, i.e. the coefficient 3, does not need to be changed because it is already an odd number.
  • The third coefficient is 0, so it is not taken into account.
  • The fourth coefficient starting from the coefficient M is -5, but the third bit to be embedded is "0", which according to the agreement corresponds to an even number, so the coefficient -5 must be changed towards zero into the number -4.
  • This embedding method thus changes, where necessary, the values of the coefficients into either odd or even values according to the bit presentation of the synchronization pointer. A change is needed only when the parity of a coefficient does not correspond to the bit to be embedded. Additionally, the change is made towards zero, because writing a smaller numerical value into the bitstream generally takes fewer bits.
  • Coefficient values 1 and -1 are exceptions, because they cannot be changed to zero, since in that case the extraction method would no longer find them. For this reason the coefficient values 1 and -1 are converted, when necessary, to the values 2 and -2.
  • In the extraction phase in connection with decoding, simply the four nonzero coefficients starting from the coefficient M are checked, and from their values the four bits embedded as agreed in advance are deduced. The coefficients in the action span [M, N] after embedding are presented on the lowermost coefficient row of Fig. 8. A code sketch of this embedding and extraction procedure is given after this list.
  • A common aspect in what has been explained is that the main idea of the invention is to embed so-called synchronization pointers 38 into two or more slices 21b of the multimedia signals, which synchronization pointers contain at least certain identifier data and, according to need, also a payload.
  • The slices of signals that correspond to each other, i.e. the counterparts, are in different multimedia signals at least when synchronizing the signals.
  • The counterparts are then e.g. in a video signal 21 and an audio signal 22.
  • The counterparts can also be only in one signal, i.e. for example just in the video signal 21.
  • The dependency relationship between the counterparts is marked, i.e. different counterparts are connected to each other, by embedding in them the same identifier data.
  • The decoding system is able to extract and read the embedded identifier data.
  • The decoding system is thus able to determine which slices in different signals contain the same identifier data and therefore correspond to each other.
  • The dependency relationships, identifier data and payloads in use are application-specific agreements that must be made when designing the encoding system and the decoding system.
  • The embedding can be performed at a point in the encoding other than directly after quantization. It can also be performed before quantization, or one or more phases after quantization. What is most important is that an embedding method applicable to a certain signal and coding process is always used, i.e. a method which can guarantee that the embedded data can be extracted and read at the receiving end or presentation end.
  • The embedding method to be used can be designed and implemented for any point whatsoever from the time a presentation is created up to the time it is presented.
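The Changing Parity method sketched above can be condensed into a few lines of code. The following is a minimal illustrative sketch, not the patent's reference implementation: it assumes the quantized coefficients are given as a list of integers, that the action span [M, N] is expressed as 0-based inclusive indices, and that the fifth coefficient of the worked example (which the text above does not trace) has the value 2 so that the fourth bit also finds a carrier.

```python
def embed_parity(coefficients, bits, m, n):
    """Embed one bit per nonzero coefficient inside the action span [m, n].

    Bit "1" -> odd coefficient, bit "0" -> even coefficient. A mismatching
    coefficient is moved one step towards zero, except +/-1, which are moved
    away from zero to +/-2 so that they cannot collapse to zero.
    """
    coeffs = list(coefficients)
    bit_iter = iter(bits)
    for i in range(m, n + 1):
        c = coeffs[i]
        if c == 0:                          # zero coefficients carry no bit
            continue
        try:
            bit = next(bit_iter)
        except StopIteration:               # all bits already embedded
            break
        want_odd = (bit == "1")
        if (abs(c) % 2 == 1) != want_odd:   # parity mismatch -> adjust value
            if c == 1:
                coeffs[i] = 2
            elif c == -1:
                coeffs[i] = -2
            else:
                coeffs[i] = c - 1 if c > 0 else c + 1   # one step towards zero
    return coeffs


def extract_parity(coefficients, m, n, num_bits):
    """Read num_bits back from the nonzero coefficients inside [m, n]."""
    bits = []
    for c in coefficients[m:n + 1]:
        if c == 0:
            continue
        bits.append("1" if abs(c) % 2 == 1 else "0")
        if len(bits) == num_bits:
            break
    return "".join(bits)


# The worked example of Fig. 8: the span begins -6, 3, 0, -5, ...
span = [-6, 3, 0, -5, 2]                    # the trailing 2 is assumed
embedded = embed_parity(span, "1100", 0, 4)
assert embedded == [-5, 3, 0, -4, 2]
assert extract_parity(embedded, 0, 4, 4) == "1100"
```

Note that the adjustment never creates or removes a zero coefficient, which is exactly what allows the extraction side to walk the same nonzero coefficients in the same order as the embedding side.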

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The object of the invention is a method for synchronizing digital multimedia signals, in which method digital multimedia signals (21, 22) are at least encoded into a plurality of slices (21b) of the multimedia signal, are decoded and are synchronized. According to the invention a synchronization pointer (38) is embedded into two or more slices (21b) of the multimedia signal before the decoding phase.

Description

METHOD AND ARRANGEMENT FOR SYNCHRONIZING DIGITAL MULTIMEDIA SIGNALS
The object of the invention is a method as presented in the preamble of claim 1 and an arrangement as presented in the preamble of claim 7 for synchronizing digital multimedia signals.
The main application of the invention is the synchronization of a digital multimedia presentation. The method and the arrangement according to the invention, i.e. more briefly the solution of the invention, in this case comprise e.g. how the synchronization information of different multimedia signals can be recorded, maintained and stored starting from the time that the multimedia presentation is created until the moment of presentation. The solution according to the invention can be applied to all application areas in which digital audio, video or other multimedia data is needed. These application areas are e.g. Internet video services, videoconferencing, digital television, films, audiovisual messages, etc. The invention can easily be used e.g. in parallel with already existing standards and methods, which are e.g. MPEG-2, MPEG-4, DVB, HDTV, DVD, VCD, etc.
A digital multimedia presentation comprises two or more separate multimedia signals. Multimedia signals can contain different types of multimedia data, such as e.g. video image, music, voice, individual images, scene description data, synthetic objects (e.g. computer graphics), control data for peripheral apparatuses, such as e.g. lights and fans, et cetera. A good-quality multimedia presentation requires that the different multimedia signals can be precisely synchronized with each other, and it must be possible to present the multimedia signals at the correct presentation time so that the different multimedia signals are experienced as belonging to the same presentation. In this case, delays and/or other breaks that can be detected by human senses may not occur in the presentation of the synchronized multimedia signals.
It must be possible to attach the data used for synchronization to the presentation when creating the presentation. It must be possible to record the data needed for synchronization and the data must be saved in the recording of the multimedia presentation, such as in Blu-Ray, DVD and VCD recordings. In addition, the data must be saved during the transmission of a multimedia presentation utilizing different communications channels, e.g. radio signals (e.g. DVB-T) or Internet technologies (e.g. RTP, IP). It must also be possible to interpret the data used for synchronization at the receiving end (e.g. a DVB-T set-top box) and it must be possible to utilize it in the presentation (e.g. television). All this can be referred to as so-called end-to-end synchronization.
Digital multimedia signals are compressed, i.e. packed, with the encoding method best applicable to each signal, for storing or transferring a presentation. Since encoding methods are independent of each other and separate, it must be possible to create dependency relationships, e.g. time dependency, between the signals for synchronization. By means of the invention it is possible to create dependency relationships that can be utilized in many ways between the counterparts of signals. In this context, the counterpart of a signal refers to independent parts of different signals that correspond to a certain time window. The independent component of a signal, for its part, refers here to the fact that it can be further processed, e.g. compressed and decompressed, without information about the other components of the signal. In addition to time-dependency relationships, many other useful dependency relationships can be created, by means of which the operation of the system can be improved. Dependency relationships are created according to the invention inside the signals and not as a part of header data as is common in prior art.
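To make the notion of counterparts concrete, the following sketch groups slices by embedded identifier data. All names (SyncPointer, Slice, group_counterparts) are hypothetical, invented for this illustration; the patent fixes only that counterparts carry the same identifier data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class SyncPointer:
    identifier: int        # shared by all counterparts of one time window
    payload: bytes = b""   # optional, e.g. presentation time data

@dataclass
class Slice:
    stream: str            # e.g. "video" or "audio"
    data: bytes
    pointer: Optional[SyncPointer] = None

def group_counterparts(slices):
    """Group the slices of different signals that carry the same identifier."""
    groups = defaultdict(list)
    for s in slices:
        if s.pointer is not None:
            groups[s.pointer.identifier].append(s)
    return dict(groups)    # identifier -> the counterparts of one time window
```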
The methods used nowadays utilize so-called time stamp technology. For example, in the MPEG-2 coding system the time of presentation of sound or of a video image is attached to the compressed bitstream in connection with the encoding system, i.e. in connection with the recording or transmission. In the encoding system, the output after the audio encoder or video encoder is packetized into PES packets, to which time stamps are attached to the header fields for decoding or for presentation. Time-stamped packets are interleaved, i.e. multiplexed, into bitstreams. In connection with the decoding system, i.e. with reception or presentation, it is attempted by means of the time stamps in question to get the presentation times of the sound and video image synchronized with each other. In the decoding system the interleaved packets are divided, i.e. demultiplexed, for the correct decoding operations. Before decoding, the time stamps are read from the header fields for the use of the decoding system.
Thus audio and video data are encoded and decoded separately from each other, but the method offered by the MPEG standard (ISO/IEC 13818-1) is generally used to "connect" audio and video data with each other by means of time stamps. Synchronization must be precise so that the video and sound are experienced as being the same presentation.
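As a point of comparison, the prior-art time stamps can be illustrated numerically. In MPEG-2 systems a presentation time stamp (PTS) carried in a PES header counts the ticks of a 90 kHz clock in a 33-bit field; the helper functions below are illustrative, not part of any standard API.

```python
PTS_CLOCK_HZ = 90_000     # MPEG-2 system time base for PTS/DTS
PTS_MODULUS = 2 ** 33     # PTS is carried in a 33-bit header field

def seconds_to_pts(t_seconds: float) -> int:
    """Presentation time in seconds -> PTS ticks, with 33-bit wrap-around."""
    return round(t_seconds * PTS_CLOCK_HZ) % PTS_MODULUS

def pts_to_seconds(pts: int) -> float:
    return pts / PTS_CLOCK_HZ

# a frame presented 1.2 s into the programme carries PTS 108000
assert seconds_to_pts(1.2) == 108_000
```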
The solution generally works very well if the decoding system of the receiving end obtains the received data at an adequately constant rate. Delays and jitter (variation in delay), among other things, cause problems at the receiving end. In addition to jitter, problems can also be caused by errors in the data transmission path or in the recording medium. New multimedia services (e.g. editing or scaling of content), direct playback (streaming) of a multimedia presentation, converting the encoding format to another (transcoding) or scaling of the properties of a presentation will produce more problems. Problems will also increase when it is attempted to use unreliable communications channels, e.g. an Internet connection.
In prior art, there can also be problems at the hardware level in, among other things, clock accuracy or errors in the software used. Timing problems of this type have been noticed in Finland, also, in connection with the subtitling of digital television broadcasts. In this case it can be fairly certainly assumed that the broadcasts of television companies are error-free and conform to standard, in other words synchronization problems are not generally caused by the transmission. Errors are often caused by the receiver. A "shortcut" might have been taken in the details of its implementation and the assumption made that everything will work flawlessly. If the issue is simply a programming error, the defect can generally be repaired with a software update.
In the near future, multimedia applications will be introduced that could contain considerably more multimedia signals than currently. One example is the next generation of televisions, the applications and technologies relating to which are being studied around the world at present. The increase in the amount of signals to be processed will also increase the performance and memory requirements for hardware. The pixel resolution of a video image will increase, and stereo video broadcasts, multiview broadcasts, multichannel audio, completely new types of multimedia signals, etc., will be introduced. In this type of case the resources offered by one apparatus can run out, so it must be possible to divide the processing of signals between a number of different apparatuses. A method is needed with which the synchronization of all signals will succeed also over some communication channel between different apparatuses. This can be called a distributed encoding system or a distributed decoding system.
Known in the art also are two Chinese patent publications no. CN1655616 and no. CN1599464, in which it has been attempted to solve the synchronization problem such that audio signals and video signals are combined into one signal that is a so-called hybrid signal, which is encoded using one and the same encoder. There are a number of deficiencies in these methods, however, of which some are presented in more detail in the following.
The weakness in both CN patents is a certain type of conceptual mistake. An audio signal and a video signal are completely different in terms of their properties. In both methods it is attempted to combine the audio signal with the video signal and to use an encoding method designed for a video signal for compressing the combined hybrid signal. Combining the audio signal with the video signal amounts, as far as the video signal is concerned, to increasing the noise (an almost random error) in the video picture. The video encoder, for its part, is designed to best compress a video picture containing wide uniform areas; it endeavors to reduce redundancy at the spatial and temporal level.
One problem is that the encoding process no longer produces a good compression ratio in the solutions presented by the CN patents. Encoding, i.e. compressing, a hybrid signal combined with these methods probably produces a much larger bitstream than if a suitable encoding method were used separately for both signals.
The method presented in the first Chinese patent no. CN1655616 is defective. If the method presented is used without correction, both the audio data and the video data become corrupted. Errors are detected in the sound and the image. The method could be rectified fairly easily and thus the errors be avoided, but the quality of the signals in relation to the compression efficiency will suffer considerably in any case.
The method presented in the second Chinese patent no. CN1599464 also seems to be flawed. It would seem that in this method either the audio data is destroyed in a lossy quantization operation or alternatively the quality of the video picture suffers considerably.
The methods presented in the CN patents could, with some small revisions and corrections, be made to work reasonably if there is little audio data. However, if there is a lot of audio data, either the quality of the video or the compression efficiency suffers considerably and a much better end result would be achieved by encoding both signals separately.
The aim of this invention is to eliminate the aforementioned drawbacks and to achieve a simple and operationally reliable method and arrangement for synchronizing digital multimedia signals. The method according to the invention is characterized by what is disclosed in the characterization part of claim 1 and correspondingly the arrangement according to the invention is characterized by what is disclosed in the characterization part of claim 7. Other embodiments of the invention are characterized by what is disclosed in the other claims.
The advantages of the solution according to the invention are presented in the following.
The invention enables the creation of different dependency relationships between separate multimedia signals. One example worth mentioning is time dependency, which can be used for the synchronized presentation of multimedia signals. The invention enables, among other things, a more precise, more flexible and more fault-tolerant method to implement synchronization for presentation between multimedia signals included in a multimedia presentation. The method also enables many other additional features that are useful for an application. On the other hand, the invention can be utilized in many other different ways also. By means of it, additional information that is valuable to an application can be combined into one or more signals. By means of the additional information, a functionality and/or properties can easily be designed for different applications, the implementation of which would be really laborious without the invention. Some examples of these applications that can be mentioned are balancing of computational load, fast forwarding of a multimedia presentation, prioritization of multimedia signals, etc.
Another advantage is that the invention can be used independently as it is, or in parallel with the methods of existing standards. In the latter case, the invention offers, among other things, more flexibility and more robustness for the application. It can improve or secure the services offered by an application or facilitate the implementation of additional functionalities. The invention can be used in all other applications that utilize multimedia data. An application can be one that already exists, is coming or is still being developed.
The invention enables a more accurate and more robust implementation method for the synchronization of a multimedia presentation. The additional information offered by the method is especially useful in situations in which standard methods fail, e.g. as a result of errors or delays in the transmission of a multimedia presentation or alternatively owing to software errors or hardware errors.
The invention also offers a more precise, more flexible, more dependable and more fault-tolerant way to implement synchronization between the different components (e.g. audio and video) of a multimedia presentation, and also resolves problems that new multimedia applications might cause for the synchronization of audio and video that utilizes time stamp technology. The invention also enables many useful additional services. It does not exclude the utilization of the synchronization methods of the standards and it can be used also in parallel with them, as mentioned earlier.
A really important advantage for the invention is its compatibility with the methods of the standards and with already existing methods. The invention does not need to replace old methods but instead engineers and developers can take it into use if they so desire. If a standard encoding method is used with the invention, generally a small change needs to be made in the standard method so that dependency relationships can be created between different signals.
Compatibility is achieved when the embedding operation according to the invention is implemented such that further processing of the signal in the encoding process does not change. In this case the encoding process still produces a bitstream according to the standard, which bitstream can be decoded with all the decoding processes according to the standard. That being the case, existing decoding processes do not need changes. Of course, a small change must be made in the decoding process in those cases in which it is desired to utilize the functionality offered by the invention.
The invention can be used also fully independently, without already existing synchronization methods. In this case the embedding method and extraction method can be implemented much more freely because compatibility with the methods of the standards is not required.
The amount of data required by the synchronization pointer that is utilized according to the invention is extremely small so that it is easy to embed in a number of different signals such that the quality of the signal does not essentially deteriorate. The more densely the synchronization pointers are embedded, the more certain the ability to recover from error situations and the more precisely the signals can be synchronized with each other. Owing to the small amount of data, also other dependency relationships that give added value, e.g. prioritization, can be created in a multimedia presentation in addition to synchronization between signals.
In error situations it is possible to try to seek the synchronization data also from other signals by means of dependency relationships. In conventional methods, the presentation time data is obtained from the time stamp. In this new method the presentation time data can be obtained e.g. from the payload of the synchronization pointers, from the properties (e.g. the sampling rate) of the signals, from other signals, from existing time stamps or from existing header fields, et cetera.
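For instance, for an audio signal the presentation time of a slice can be recovered from the sampling rate alone by counting samples, as in the small sketch below (one possible reading of the above, not a mechanism fixed by the text):

```python
def presentation_time_s(first_sample: int, sample_rate_hz: int) -> float:
    """Presentation time of a slice that begins at the given sample index."""
    return first_sample / sample_rate_hz

# an audio slice starting at sample 441000 of a 44.1 kHz stream belongs
# exactly 10 seconds into the presentation
assert presentation_time_s(441_000, 44_100) == 10.0
```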
An advantage of the invention in terms of the future is also its scalability to the needs of different applications. It can be utilized both in small appliances, such as video cameras and mobile devices, as well as in large systems, such as e.g. distributed coding systems.
Advantages of the invention when used with the methods of the standards:
• second-level synchronization (more precision and fault-tolerance)
• easy to implement additional functionality/additional features
• possibility for communication between synchronization units
Advantages of the invention when used independently:
• the packetizing of the methods of the standards is not needed, i.e. the overhead caused by the header fields of the packetizing of signals can be reduced. Header fields are not necessarily needed if the synchronization information travels inside the multimedia signals.
• the less encapsulation of data is needed inside the header fields in the different phases of the coding process, the fewer bits need to be used for storing or transferring a presentation.
• in mobile applications and in low bit rate applications the header fields of certain standards can account for quite a large share of the total bitstream.
• flexible synchronization and scalability from small applications through to large professional applications.

In the following, the invention will be described in more detail by the aid of one example of its embodiments with reference to the attached drawings, wherein
Fig. 1 presents a simplified diagram of one method according to prior art for encoding multimedia signals,
Fig. 2 presents a simplified diagram of one method according to prior art for decoding multimedia signals,
Fig. 3 presents a simplified diagram of one method for encoding multimedia signals, in which method the solution according to the invention is applied,
Fig. 4 presents a simplified diagram of one method for decoding multimedia signals, in which method the solution according to the invention is applied,
Fig. 5 presents a simplified diagram of an embedding unit according to the invention and of a synchronization unit of the encoding phase,
Fig. 6 presents a simplified diagram of an extraction unit according to the invention and of the synchronization unit of the decoding phase,
Fig. 7 presents a simplified diagram of an encoder in which the solution according to the invention is used, and
Fig. 8 presents one method of embedding synchronization data.
As mentioned earlier, the main application of the invention can be considered to be the synchronization of a digital multimedia presentation. The aim is in this case that the synchronization information of different multimedia signals can be recorded, maintained and stored, starting from the moment the multimedia presentation is created until the time of presentation.
Figure 1 presents a simplified diagram of a method according to the prior-art MPEG-2 standard for encoding multimedia signals. The figure presents the encoding of a video signal 1 and an audio signal 2. The video signal 1 is encoded, i.e. compressed, at first separately with a video encoder 3 and the audio signal 2 with an audio encoder 4. After this the encoded video signal 1 is sliced and packetized into video packets 1a (Video PES) and the encoded audio signal into audio packets 2a (Audio PES). In connection with packetizing, header data is attached to each packet 1a and 2a, which header data contains, among other things, time data by means of which the signals can be synchronized when they are unpacked. Finally, both packet streams are combined into one bitstream by means of a multiplexer 5a or 5b. When combining into a bitstream, either a Program Stream 6a (PS) or a Transport Stream 6b (TS) is made from it, depending on how it is intended to be used. For example, data to be recorded on a DVD disk is of the PS type and a digital TV broadcast is of the TS type.
Figure 2 presents a simplified diagram of a method according to the prior-art MPEG-2 standard for decoding multimedia signals. At first the video packets 1a and audio packets 2a are separated from the datastream (PS or TS) to be unpacked by means of a demultiplexer 7. After this the video packets 1a are decoded by means of the video decoder 9 and the audio packets by means of the audio decoder 10. By means of the time data in the header data of the packets and by means of the clock control 8, the signals are made to be synchronized, in which case the outgoing video picture 11 and sound 12 are reproduced at the correct time relative to each other.
Fig. 3 presents a simplified diagram of a method for encoding multimedia signals, in which method the solution according to the invention for synchronizing signals is used. The method is otherwise similar to prior-art MPEG-2 encoding, which was described in Fig. 1, except for the embedding of synchronization information in the multimedia signals themselves. In this method a synchronization unit 25 is used, which synchronization unit receives from the video encoder 23 and from the audio encoder 24 the necessary data by means of which it determines the synchronization data and returns them to the encoders. The video encoder 23 comprises an embedding unit 26a of the video signal and the audio encoder 24 comprises an embedding unit 26b of the audio signal. The embedding units 26a and 26b embed synchronization data in the data signals 21 and 22 that are the multimedia signals. The embedded data are e.g. additional information to the actual multimedia information. Embedding the synchronization data in the signals is described in more detail below. After embedding, the signals are sliced into packets 21a and 22a and the packets are combined into a bitstream 28a or 28b by means of a multiplexer 27a or 27b.
Figure 4 presents a simplified diagram of a method for decoding multimedia signals based on the prior-art MPEG-2 standard, in which method the solution according to the invention for synchronizing signals is used. At first the video packets 21a and audio packets 22a are separated from the datastream (PS or TS) to be unpacked by means of a demultiplexer 29. After this the video packets 21a are decoded by means of the video decoder 31 and the audio packets 22a by means of the audio decoder 32. The decoders contain extraction units 33a and 33b, which extract the synchronization data embedded in the packets and send them to the synchronization unit 34. By means of the synchronization unit 34 and the clock control 30, the signals are made to be synchronized, in which case the outgoing video picture 35 and sound 36 are reproduced at the correct time relative to each other.
Figure 5 presents a simplified diagram of an embedding unit 26a, 26b according to the invention and of a synchronization unit 25 of the encoding phase. The embedding unit 26a of the video signal and the embedding unit 26b of the audio signal contain embedding means 37, by the aid of which the slice 21b of the multimedia signal to be processed in the embedding unit 26a, 26b is converted such that the synchronization pointer 38 received from the synchronization unit 25 is made to be embedded in the slice 21b of the signal. The synchronization pointer 38 is e.g. synchronization information that is used in synchronization and which thus enables synchronization. It is therefore at the same time additional information needed for synchronization. The changes made to the slice 21b of the signal thus describe the information contained in the synchronization pointer 38. The embedding means 37 are different for different signal types. For a certain signal type there must be embedding means suited to the signal type in question. The embedding unit 26a, 26b thus communicates with the synchronization unit 25, requesting from it e.g. a new synchronization pointer 38 for embedding or permission to continue encoding procedures without a new synchronization pointer 38. A synchronization pointer 38 is not necessarily put at every point, but instead only at the points suggested by the synchronization unit 25. After the processing of the embedding unit 26a, 26b the edited slice 21c of the signal is transmitted onwards.

The synchronization unit 25 thus synchronizes the different encoding processes so that they work simultaneously. It receives the time data 40 needed for synchronization either from an exterior source or from the system clock time. The synchronization unit 25 comprises synchronization means 39, by the aid of which synchronization itself occurs. The synchronization means 39 use methods that are generally known in the art for the synchronization of signals of different types. The synchronization means 39 can e.g. delay the progress of certain encoders until slower encoders have reached the same stage. They can also give more computation time to slower encoders, e.g. to video encoders. The synchronization unit 25 can also contain communication means 41, by the aid of which it can be in contact with other synchronization units, if such are used in the arrangement. Communication is necessary when more than one synchronization unit 25 is in use.
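The exchange just described might be sketched as follows. This is a sketch under stated assumptions: the one-pointer-per-second suggestion schedule and the identifier derived from the scheduled point are illustrative choices, since the patent leaves the suggestion policy open.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SyncPointer:               # as in the counterpart sketch earlier
    identifier: int
    payload: bytes = b""

class SynchronizationUnit:
    """Suggests embedding points and hands out the identifier that every
    encoder embeds for the same point, so counterparts share one id."""

    def __init__(self, pointer_interval_s: float = 1.0):
        self.interval = pointer_interval_s
        self.next_time = {}                  # next suggested point per stream

    def request_pointer(self, stream: str,
                        media_time_s: float) -> Optional[SyncPointer]:
        due = self.next_time.setdefault(stream, 0.0)
        if media_time_s < due:
            return None                      # permission to continue as-is
        self.next_time[stream] = due + self.interval
        # Same suggested point -> same identifier in every stream, which is
        # what marks the resulting slices as counterparts of each other.
        return SyncPointer(identifier=int(round(due / self.interval)))

class EmbeddingUnit:
    def __init__(self, stream: str, sync_unit: SynchronizationUnit):
        self.stream, self.sync_unit = stream, sync_unit

    def process(self, slice_data: bytes,
                media_time_s: float) -> Tuple[bytes, Optional[SyncPointer]]:
        pointer = self.sync_unit.request_pointer(self.stream, media_time_s)
        # A real unit would alter the slice itself (e.g. by Changing Parity);
        # here the pointer is returned alongside the untouched slice data.
        return slice_data, pointer

sync = SynchronizationUnit()
video, audio = EmbeddingUnit("video", sync), EmbeddingUnit("audio", sync)
_, p_video = video.process(b"...", 1.00)     # both streams pass t = 1 s
_, p_audio = audio.process(b"...", 1.02)
assert p_video is not None and p_video.identifier == p_audio.identifier
```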
Figure 6 presents a simplified diagram of an extraction unit 33a, 33b according to the invention and of a synchronization unit 34 of the decoding phase. The video extraction unit 33a and the audio extraction unit 33b comprise extraction means 42, by the aid of which the information embedded in the slice of the signal, i.e. the synchronization pointer, can be extracted and read from the slice 21d of the signal that is to be processed. The information of the synchronization pointer is conveyed to the synchronization unit 34, which comprises synchronization means 43 for synchronizing the different signals. The extraction unit 33a, 33b and the synchronization unit 34 communicate with each other such that the extraction unit sends the synchronization pointer 38 to the synchronization unit, and if there is no synchronization pointer to give at the particular moment, the extraction unit 33a, 33b can request from the synchronization unit 34 permission to continue decoding procedures. The synchronization unit 34 gives permission to continue decoding and can also in certain cases request that the decoding process to which the extraction unit is connected stops its operation. This type of request can be necessary e.g. in applications in which sufficient computing power is not available for decoding each signal, in which case only some of the signals can be decoded.
The synchronization unit 34 can also comprise communication means 44, by the aid of which it can be in contact with other synchronization units, if such are used in the arrangement. In addition, the synchronization unit 34 can transmit presentation time data 45 for the use of exterior operations if there is a need for this.
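The decoding-side counterpart can be sketched in the same hedged spirit; again, the names and the simple stop policy are illustrative assumptions only, not details from the patent.

```python
class DecoderSyncUnit:
    """Sketch of the decoding-side synchronization unit 34.

    Extraction units 33a/33b report the pointers they extract; a
    decoder with no pointer to give asks for permission to continue,
    and the unit can request that a stream stops decoding."""

    def __init__(self):
        self.latest = {}       # stream name -> last identifier extracted
        self.stopped = set()   # streams requested to stop decoding

    def report_pointer(self, stream, ident):
        self.latest[stream] = ident

    def may_continue(self, stream):
        # Permission to continue decoding procedures without a pointer.
        return stream not in self.stopped

    def request_stop(self, stream):
        # E.g. when there is not enough computing power for every signal.
        self.stopped.add(stream)

    def skew(self, stream_a, stream_b):
        """Offset between two streams in identifier units; together with
        the clock control 30 this tells which stream is running ahead."""
        return self.latest.get(stream_a, 0) - self.latest.get(stream_b, 0)
```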
Fig. 7 presents a simplified diagram of one part of the functions of an encoder applicable to compressing visual data, in which encoder the method according to the invention is used. The image data, i.e. pixels, of visual data are generally presented in RGB color space. Often a conversion of color space from RGB space to YCbCr space is performed first, after which a subsampling of the chrominance components Cb and Cr is performed. After this the image material to be processed is partitioned in the segmentation phase 46 into blocks of a certain size. Each color component Y, Cb and Cr is processed as a separate block of KxL pixels, where K is the height of the block and L is the width of the block. The most generally used block size is 8x8 pixels. Each slice of a signal is formed from one or more blocks. After segmentation a discrete cosine transform 47 is performed for each block of KxL pixels, by means of which transform coefficients can be calculated for each pixel of the block to be processed. After this quantization 48 of the coefficients is performed, in which case most of the coefficients produced by the transform are zeroed. Next the embedding of the synchronization data and of possible other payload data is performed by means of the embedding unit 26a and the synchronization unit 25. Finally, zigzag scanning 49 is performed, after which a 2-dimensional run-length code is constructed, and in the entropy encoding stage 50 these 2-dimensional code pairs are written into the bitstream e.g. as Huffman codes or corresponding. The embedding phase is positioned after the quantization phase 48 because quantization is highly lossy; if the embedding were performed before it, the quantization could change the data so that in the decoding phase the synchronization data would no longer be recognized. If it is desired to perform the embedding before the quantization, it must be ensured that the embedded information is preserved through quantization, which can be achieved by slightly changing the embedding method. The embedding phase can equally well be performed after the zigzag scanning 49, in which case there is no loss problem. The embedding of the synchronization pointer 38, however, must be done at the latest before the decoding phase.
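As a rough sketch of this ordering (Python; the helper names, the quantization step, the raster-order action span and the fixed span [1, 16] are assumptions for illustration, not details from the patent), the embedding can be placed between the quantization 48 and the zigzag scanning 49 as follows:

```python
import numpy as np
from scipy.fft import dctn  # 2-D discrete cosine transform

def zigzag(block):
    """Zigzag scan order 49: walk the anti-diagonals, alternating
    direction, so that low-frequency coefficients come first."""
    h, w = block.shape
    order = sorted(((i, j) for i in range(h) for j in range(w)),
                   key=lambda p: (p[0] + p[1],
                                  p[1] if (p[0] + p[1]) % 2 == 0 else p[0]))
    return [block[i, j] for i, j in order]

def encode_block(pixels_8x8, time_ms, qstep, sync_unit, embed):
    """One 8x8 block through the Fig. 7 phases: transform 47,
    quantization 48, embedding (26a with 25), zigzag scanning 49.
    Run-length and entropy coding 50 would follow on the result."""
    coeffs = dctn(pixels_8x8.astype(float), norm='ortho')   # transform 47
    quantized = np.rint(coeffs / qstep).astype(int)         # quantization 48,
                                                            # zeroes most coefficients
    flat = list(quantized.flatten())
    bits = sync_unit.request_pointer(time_ms)   # pointer or None (sketch above)
    if bits is not None:
        # e.g. the Changing Parity routine sketched in connection with
        # Fig. 8 below; span [1, 16] skips the DC term as agreed.
        flat = embed(flat, bits, 1, 16)
    return zigzag(np.array(flat).reshape(8, 8))
```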
Fig. 8 presents one method, the so-called Changing Parity method, for embedding synchronization data in a multimedia signal, or more precisely in a slice of the multimedia signal. The idea of this method is to change, where necessary, the non-zero coefficients toward zero into odd or even numbers depending on the information to be embedded, i.e. in this case depending on the synchronization pointer. In the example, the embedding method inserts one bit of binary information per non-zero coefficient. The most important of the coefficients is the first, the so-called DC term, because it describes the weighted average of the whole KxL block. For this reason it is not desirable to use it in the embedding, and therefore a so-called agreed action span [M, N] is used, which starts from the coefficient with agreed ordinal M and contains an agreed number of coefficients up to the coefficient with ordinal N. The need to change coefficients in connection with embedding depends on the information to be embedded. An application-specific agreement is made about what action span [M, N] is used and how many bits the synchronization pointer to be embedded contains. The synchronization pointer can contain, among other things, identifier data and payload data. Likewise, an agreement is made on the bit presentation of the synchronization pointer; in this example it is agreed that bit "1" of the synchronization pointer corresponds to an odd number and bit "0" corresponds to an even number.
In Fig. 8, the first row of coefficients presents one coded image block after the quantization phase 48. Since an image block is KxL pixels, the total number of coded transform coefficients, i.e. elements, is K*L. For example, a block 8x8 in size contains 64 coefficients. Fig. 8 shows the first and last of the coefficients as well as a few between them. On the second row of coefficients the action span [M, N] is presented, in which the embedding is performed. Correspondingly, on the lowermost coefficient row the coefficients after embedding are presented.
In the example, the following four bits are embedded: 1, 1, 0 and 0, into the action span [M, N] starting from the coefficient M, the value of which is -6. Since the first bit to be embedded is "1", which corresponds to an odd number, the coefficient -6 must be changed to be odd and in the direction of zero, thus to -5. The second bit to be embedded is also "1", in which case the next coefficient, i.e. the coefficient 3, does not need to be changed because it is already an odd number. The third coefficient is 0, so it is not taken into account. The fourth coefficient starting from the coefficient M is -5; the next bit to be embedded is "0", which according to the agreement corresponds to an even number, so the coefficient -5 must be changed toward zero into -4. The next coefficient is again 0, so it is not taken into account, and the coefficient corresponding to the last bit to be embedded is 2, which is already even, in which case it does not need to be changed. Thus the coefficients that deviate from zero in the action span [M, N] were changed as follows:
Before embedding: ... -6  3  0 -5  0  2 ...  1 -1
After embedding:  ... -5  3  0 -4  0  2 ...  1 -1
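These changes can be written out as a short routine. The following Python sketch implements the Changing Parity embedding for an action span [M, N]; the function name and the 0-based indexing are conventions chosen here, not from the patent.

```python
def embed_parity(coeffs, bits, m, n):
    """Embed bits into the non-zero coefficients of the action span
    [m, n] (0-based, inclusive): bit 1 -> odd coefficient, bit 0 ->
    even coefficient, with changes made toward zero. Exception: +/-1
    is moved to +/-2 so that no coefficient is ever zeroed."""
    out = list(coeffs)
    next_bits = iter(bits)
    for i in range(m, n + 1):
        if out[i] == 0:
            continue                      # zero coefficients carry no bits
        try:
            bit = next(next_bits)
        except StopIteration:
            break                         # every bit has been embedded
        if abs(out[i]) % 2 != bit:        # parity mismatch -> must change
            if abs(out[i]) == 1:
                out[i] = 2 if out[i] > 0 else -2   # the +/-1 exception
            else:
                out[i] += -1 if out[i] > 0 else 1  # one step toward zero
    return out
```

Running `embed_parity([-6, 3, 0, -5, 0, 2], [1, 1, 0, 0], 0, 5)` on the span of Fig. 8 returns `[-5, 3, 0, -4, 0, 2]`, i.e. exactly the lower row above.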
This embedding method thus changes the values of the coefficients, where necessary, into either odd or even values according to the bit presentation of the synchronization pointer. A change is made only when the parity of a coefficient does not correspond to the bit presentation of the synchronization pointer. Additionally, the change is made toward zero, because writing a smaller numerical value into the bitstream generally takes fewer bits. The coefficient values 1 and -1 are exceptions, because they cannot be changed to zero; in that case the extraction method would no longer be able to find them. For this reason the coefficient values 1 and -1 are converted, if necessary, to the values 2 and -2. In the extraction phase, in connection with decoding, the first four coefficients starting from the coefficient M that differ from zero are simply checked, and from their values the four embedded bits are deduced as agreed in advance. The following coefficients are in the action span [M, N] after embedding:
After embedding: ... -5  3  0 -4  0  2 ...  1 -1

From this is obtained:

-5, odd  => 1
 3, odd  => 1
-4, even => 0
 2, even => 0

i.e. the same as the embedded code: 1 1 0 0.
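The corresponding extraction is even simpler; this sketch (same assumed conventions as the embedding sketch above) reads the agreed number of bits back from the first non-zero coefficients of the span:

```python
def extract_parity(coeffs, nbits, m, n):
    """Read nbits back from the action span [m, n]: each non-zero
    coefficient decodes to 1 if odd and 0 if even; zeros are skipped,
    exactly mirroring the embedding."""
    bits = []
    for i in range(m, n + 1):
        if len(bits) == nbits:
            break
        if coeffs[i] != 0:
            bits.append(abs(coeffs[i]) % 2)
    return bits

# Checking against the example: the four bits come back unchanged.
assert extract_parity([-5, 3, 0, -4, 0, 2], 4, 0, 5) == [1, 1, 0, 0]
```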
Common to all of the above is that the main idea of the invention is to embed so-called synchronization pointers 38 into two or more slices 21b of the multimedia signals, which synchronization pointers contain at least certain identifier data and, according to need, also a payload. Generally the slices of signals that correspond to each other, i.e. the counterparts, are in different multimedia signals, at least when synchronizing the signals. In this case the counterparts are e.g. in a video signal 21 and an audio signal 22. In a fast-forwarding application of a multimedia presentation the counterparts can also be in one signal only, i.e. for example just in the video signal 21.
By means of the identifier data of the synchronization pointer 38 the dependency relationship between the counterparts is marked, i.e. different counterparts are connected to each other by embedding in them the same identifier data. The decoding system is able to extract and read the embedded identifier data. On the basis of the identifier data the decoding system is able to determine which slices in different signals contain the same identifier data and thus correspond to each other. The dependency relationships, identifier data and payloads in use are application-specific agreements that must be made when designing the encoding system and decoding system.
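As a small illustration of that determination (the data layout is an assumption made for the example), pairing counterparts amounts to joining the extracted identifier data of the different signals:

```python
def match_counterparts(video_slices, audio_slices):
    """Pair slices that carry the same identifier data. Each argument
    maps an extracted identifier to the slice that carried it; the
    result maps the identifier to the (video, audio) counterpart pair
    that must be presented at the same time."""
    return {ident: (v, audio_slices[ident])
            for ident, v in video_slices.items()
            if ident in audio_slices}

# e.g. identifier 7 marks a video slice and an audio slice as counterparts
pairs = match_counterparts({7: "video@7", 8: "video@8"}, {7: "audio@7"})
print(pairs)   # {7: ('video@7', 'audio@7')}
```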
It is obvious to the person skilled in the art that the invention is not limited to the example described above, but that it may be varied within the scope of the claims presented below. Thus, for example, any other method whatsoever than the method described in the embodiment can be used as an embedding method.
It is also obvious to the skilled person that the embedding can be performed at a point of the encoding other than directly after quantization. It can also be performed before quantization, or one or more phases after quantization. What is most important is that an embedding method applicable to a certain signal and coding process is always used, i.e. a method which can guarantee that the embedded data can be extracted and read at the receiving end or presentation end. The embedding method to be used can be designed and implemented for any point whatsoever from the time a presentation is created up to the time it is presented.

Claims

1. Method for synchronizing digital multimedia signals, in which method digital multimedia signals (21, 22) are at least encoded into a plurality of slices (21b) of the multimedia signal, are decoded and are synchronized, characterized in that a synchronization pointer (38) is embedded in two or more slices (21b) of the multimedia signal before the decoding phase.
2. Method according to claim 1, characterized in that the synchronization pointer (38) comprises at least identifier data, by means of which the desired slices (21b) of multimedia signals are marked with respect to their dependency relationship to each other by embedding the same identifier data in them.
3. Method according to claim 1 or 2, characterized in that a synchronization pointer (38) of the same content is embedded in one or more slices (21b) of the multimedia signal in at least two different multimedia signals (21, 22).
4. Method according to any of the preceding claims, characterized in that in connection with the embedding of a synchronization pointer (38) payload data is embedded in the slices (21b) of the multimedia signal.
5. Method according to any of the preceding claims, characterized in that the synchronization pointer (38) is embedded in a slice (21b) of a multimedia signal in the encoding phase after the quantization phase (48) or corresponding phase.
6. Method according to any of the preceding claims, characterized in that the information of the synchronization pointer (38) embedded in each slice (21b) of a multimedia signal is extracted at the latest in connection with decoding by means of an extraction unit (33a, 33b).
7. Arrangement for synchronizing digital multimedia signals, which arrangement comprises at least one encoder (23, 24), in which multimedia signals are at least encoded into a plurality of slices (21b) of the multimedia signal, and at least one decoder (31, 32), in which multimedia signals are decoded, characterized in that a synchronization unit (25) that acts on at least one multimedia signal (21, 22) is connected to the arrangement before the decoder (31, 32) via the embedding unit (26a, 26b).
8. Arrangement according to claim 7, characterized in that a synchronization unit (25) is connected to the encoder (23, 24) via an embedding unit (26a, 26b), of which embedding units (26a, 26b) there is at least one for each multimedia signal (21, 22) to be processed.
9. Arrangement according to claim 7 or 8, characterized in that the embedding unit (26a, 26b) comprises at least embedding means (37) for embedding the synchronization pointer (38) received from the synchronization unit (25) in a slice (21b) of the multimedia signal.
10. Arrangement according to claim 7, 8 or 9, characterized in that the synchronization pointer (38) comprises at least identifier data, by means of which the desired slices (21b) of a multimedia signal in one or more multimedia signals (21, 22) are arranged to be marked with respect to their dependency relationship to each other by embedding the same identifier data in them.
11. Arrangement according to any of claims 7-10 above, characterized in that the synchronization pointer (38) comprises payload data, which is arranged to be embedded in the slices (21b) of the multimedia signal in connection with embedding the identifier data.
12. Arrangement according to any of claims 7-11 above, characterized in that the embedding unit (26a, 26b) is connected to an encoder (23, 24) after the quantization phase (48) or corresponding phase.
13. Arrangement according to any of claims 7-12 above, characterized in that an extraction unit (33a, 33b) provided with extraction means (42) is connected in connection with the decoder (31, 32) for at least one multimedia signal to be processed, which extraction unit (33a, 33b) is arranged to extract the information of the synchronization pointer (38) embedded in each slice (21b) of the multimedia signal in connection with decoding.
PCT/FI2010/050259 2009-04-14 2010-03-31 Method and arrangement for synchronizing digital multimedia signals WO2010119171A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20095408 2009-04-14
FI20095408A FI124520B (en) 2009-04-14 2009-04-14 Procedure and arrangement for synchronizing digital multimedia signals

Publications (1)

Publication Number Publication Date
WO2010119171A1 true WO2010119171A1 (en) 2010-10-21

Family

ID=40590289

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2010/050259 WO2010119171A1 (en) 2009-04-14 2010-03-31 Method and arrangement for synchronizing digital multimedia signals

Country Status (2)

Country Link
FI (1) FI124520B (en)
WO (1) WO2010119171A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6928165B1 (en) * 1999-07-30 2005-08-09 Nec Corporation Communication system using digital watermark for synchronizing multiplexed text data to video frames
US6642966B1 (en) * 2000-11-06 2003-11-04 Tektronix, Inc. Subliminally embedded keys in video for synchronization
JP2003259314A (en) * 2002-02-26 2003-09-12 Nippon Hoso Kyokai <Nhk> Video audio synchronization method and system thereof
US20080297654A1 (en) * 2005-12-22 2008-12-04 Mark Henricus Verberkt Script Synchronization By Watermarking
CN1889685A (en) * 2006-07-18 2007-01-03 吉林大学 Audio and video frequency signal synchronizing method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103929657A (en) * 2013-01-16 2014-07-16 富士通株式会社 Video multiplexing apparatus, video multiplexing method, multiplexed video decoding apparatus, and multiplexed video decoding method
EP2757795A1 (en) * 2013-01-16 2014-07-23 Fujitsu Limited Video multiplexing apparatus, video multiplexing method, multiplexed video decoding apparatus, and multiplexed video decoding method
US9083993B2 (en) 2013-01-16 2015-07-14 Fujitsu Limited Video/audio data multiplexing apparatus, and multiplexed video/audio data decoding apparatus
CN103929657B (en) * 2013-01-16 2017-05-03 富士通株式会社 Video multiplexing apparatus, video multiplexing method, multiplexed video decoding apparatus, and multiplexed video decoding method

Also Published As

Publication number Publication date
FI20095408A0 (en) 2009-04-14
FI124520B (en) 2014-09-30
FI20095408A (en) 2010-10-15

Similar Documents

Publication Publication Date Title
US10349091B2 (en) Transmitting method, receiving method, transmitting device, and receiving device
CN1235406C (en) System and data format for providing seamless stream switching in digital video decoder
US11095934B2 (en) Receiving device and receiving method
US10547893B2 (en) Receiving method, receiving device, and transmission and reception system
US8351514B2 (en) Method, protocol, and apparatus for transporting advanced video coding content
US8422564B2 (en) Method and apparatus for transmitting/receiving enhanced media data in digital multimedia broadcasting system
US20160105259A1 (en) Apparatus and method of transmitting/receiving broadcast data
US8301982B2 (en) RTP-based loss recovery and quality monitoring for non-IP and raw-IP MPEG transport flows
US20040237122A1 (en) Method and apparatus for processing a data series including processing priority data
WO1998010591A1 (en) Information transmitting method, encoder/decoder of information transmitting system using the method, and encoding multiplexer/decoding inverse multiplexer
KR100294663B1 (en) Mpeg decoder and decoding control method
US11758201B2 (en) Transmitting method, receiving method, transmitting device, and receiving device
CN101682753B (en) System and method for reducing the zapping time
CN1781295A (en) Redundant transmission of programmes
US7346054B2 (en) Method and system for co-relating transport packets on different channels using a cyclic redundancy check (CRC)
JP2004537226A (en) System and method for broadcasting separately encoded signals over ATSC channels
US20040190629A1 (en) System and method for broadcast of independently encoded signals on atsc channels
US20040013270A1 (en) Apparatus and method for converting contents
WO2010119171A1 (en) Method and arrangement for synchronizing digital multimedia signals
KR101008976B1 (en) Method of detecting error in multimedia streaming system
WO2021221946A1 (en) System for jitter recovery from a transcoder
JP2001127726A (en) Signal processor, signal processing method and recording medium
CN111988641B (en) Transport stream multiplexing audio and video time synchronization method
KR101226329B1 (en) Method for channel change in Digital Broadcastings
CN115668955A (en) System for recovering presentation time stamps from a transcoder

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10764143

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10764143

Country of ref document: EP

Kind code of ref document: A1