WO2002052858A1 - Feedback control to encoder - Google Patents

Feedback control to encoder

Info

Publication number
WO2002052858A1
Authority
WO
WIPO (PCT)
Prior art keywords
transmitter
encoder
video
receiver
controller
Prior art date
Application number
PCT/GB2001/004890
Other languages
French (fr)
Inventor
Simon Durrant
Pat Mulroy
Julian Lake
Original Assignee
Pa Consulting Services Limited
Priority date
Filing date
Publication date
Application filed by Pa Consulting Services Limited
Publication of WO2002052858A1

Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
            • H04L65/60 Network streaming of media packets
              • H04L65/70 Media network packetisation
            • H04L65/80 Responding to QoS
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
              • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
                • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
                  • H04N21/2343 ... involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
                    • H04N21/234327 ... by decomposing into layers, e.g. base layer and one or more enhancement layers
                    • H04N21/234354 ... by altering signal-to-noise ratio parameters, e.g. requantization
                    • H04N21/234381 ... by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
                • H04N21/238 Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
                  • H04N21/2381 Adapting the multiplex stream to a specific network, e.g. an Internet Protocol [IP] network
              • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
                • H04N21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
                  • H04N21/2662 Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
            • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
              • H04N21/41 Structure of client; Structure of client peripherals
                • H04N21/414 Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
                  • H04N21/41407 ... embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
            • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
              • H04N21/61 Network physical structure; Signal processing
                • H04N21/6106 ... specially adapted to the downstream path of the transmission network
                  • H04N21/6131 ... involving transmission via a mobile phone network
              • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
                • H04N21/631 Multimode Transmission, e.g. transmitting basic layers and enhancement layers of the content over different transmission paths or transmitting with different error corrections, different keys or with different transmission protocols
                • H04N21/632 ... using a connection between clients on a wide area network, e.g. setting up a peer-to-peer communication via Internet for retrieving video segments from the hard-disk of other client devices
                • H04N21/637 Control signals issued by the client directed to the server or network components
                  • H04N21/6377 ... directed to server
                    • H04N21/6379 ... directed to encoder, e.g. for requesting a lower encoding rate
                • H04N21/643 Communication protocols
                  • H04N21/64322 IP
                  • H04N21/6437 Real-time Transport Protocol [RTP]
                • H04N21/647 Control signaling between network components and server or clients; Network processes for video distribution between server and clients, e.g. controlling the quality of the video stream, by dropping packets, protecting content from unauthorised alteration within the network, monitoring of network load, bridging between two different networks, e.g. between IP and wireless
                  • H04N21/64723 Monitoring of network processes or resources, e.g. monitoring of network load
                    • H04N21/64738 Monitoring network characteristics, e.g. bandwidth, congestion level
              • H04N21/65 Transmission of management data between client and server
                • H04N21/654 Transmission by server directed to the client
                  • H04N21/6547 ... comprising parameters, e.g. for client setup
                • H04N21/658 Transmission by the client directed to the server
                  • H04N21/6582 Data stored in the client, e.g. viewing habits, hardware capabilities, credit card number

Definitions

  • The present invention relates to video streaming, and, more particularly, to the streaming of video data over a wireless communications network.
  • The invention has been developed primarily to allow video to be streamed in a UMTS or GPRS mobile telecommunications network using streamable formats such as MPEG-4 and H.263. However, it will be appreciated by those skilled in the art that the invention is not limited to use with those particular standards.
  • A first-in-first-out replay buffer is often used at the receiver to reduce the effect that temporary aberrations in the available bandwidth may have on video replay. This involves delaying commencement of replay of the video stream at the receiver whilst the replay buffer is first filled with video data. Once a predetermined amount of video data is in the buffer (typically several seconds of video), replay commences from the buffer whilst data continues to be received. Thus, the image is not viewed in "real time" - there is considerable latency.
  • If the available bandwidth drops below the bitrate of the video stream, the amount of data in the buffer will reduce. Notwithstanding the bandwidth reduction, the video data in the buffer will still be streamed out at the full rate to the display, so the reduction in bandwidth is not apparent to anyone viewing the replay. However, if the available bandwidth does not rise to the video data rate or above, the buffer will eventually empty, resulting in visible freezing, stuttering or blanking of the video stream. Only when the buffer has refilled again to the predetermined amount can the replay process be continued. However, the user will have suffered a serious interruption to viewing. If the available bandwidth continues to remain at a reduced level, the video source can be instructed to transmit data at a reduced bitrate. However, if the available bandwidth continues to fluctuate, the user will continue to experience repeated freezing in the video stream.
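The drain-out behaviour described above can be sketched numerically. This is a hypothetical illustration: the rates, pre-fill duration and dip length are chosen for the example and are not taken from the patent.

```python
# Hypothetical sketch: a FIFO replay buffer draining faster than it fills
# when the channel bandwidth drops below the video data rate. A negative
# level models an underrun, i.e. visible freezing at the receiver.

def simulate_buffer(video_rate_kbps, bandwidth_kbps_per_sec, prefill_kbits):
    """Return the buffer level (kbits) at the end of each second."""
    level = prefill_kbits
    levels = []
    for bw in bandwidth_kbps_per_sec:
        level += bw - video_rate_kbps   # fill from channel, drain to display
        levels.append(level)
    return levels

# A 64 kbps stream with 3 s of pre-fill (192 kbits); the channel then
# dips to 32 kbps for 8 seconds, which consumes (64-32)*8 = 256 kbits.
levels = simulate_buffer(64, [64] * 5 + [32] * 8 + [64] * 5,
                         prefill_kbits=3 * 64)
print(min(levels))  # -64: the pre-fill alone cannot absorb the dip
```

The sketch shows why feedback to the encoder helps: with no bitrate adjustment, any dip longer than the pre-fill can absorb empties the buffer.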
  • The present invention provides a method of transmitting a compressed video data stream (CVDS) generated by an encoder from a transmitter to a receiver over an end to end communications link, said end to end link including a transmit side wireless channel between the transmitter and the receiver, the method including the steps of: ascertaining a quality parameter indicative of the transmission capability of the wireless channel between the transmitter and the receiver; and at the transmitter, altering the bit rate of the compressed video data stream generated by the encoder in dependence on the quality parameter and other information.
  • CVDS compressed video data stream
  • The end to end link comprises more than one transmit side wireless channel, where: each channel has an associated quality parameter; and the CVDS is divided into a plurality of substreams, with each substream transmitted over a separate wireless channel.
  • The transmitter includes a controller and a transmitter air interface, the method including the steps of: determining the quality parameter at the transmitter air interface; supplying to the controller interface data derived from the quality parameter; and sending to the encoder control instructions from the controller to effect the alteration of the bit rate based on said interface data and possibly other data.
  • The control instructions define one or more encoding factors selected from: a number of video substreams; a mapping of video encoder frame types onto each substream; and, per substream: an available data rate; a video frame rate; a video encoding quantisation factor; and a video frame resolution; the encoder effecting the alteration of the bit rate of the CVDS by applying said one or more encoding factors.
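As a sketch, the encoding factors listed above might be carried in a structure like the following. The class and field names are hypothetical; the patent does not specify a concrete message format for the control instructions.

```python
# Hypothetical model of the controller-to-encoder control instructions,
# mirroring the encoding factors listed in the claims: a number of
# substreams, a frame-type-to-substream mapping, and per-substream
# data rate, frame rate, quantisation factor and resolution.
from dataclasses import dataclass


@dataclass
class SubstreamParams:
    data_rate_kbps: int        # available data rate for this substream
    frame_rate_fps: float      # video frame rate
    quantisation_factor: int   # coarser quantisation -> lower bit rate
    resolution: tuple          # (width, height) video frame resolution


@dataclass
class ControlInstruction:
    num_substreams: int        # number of video substreams
    frame_type_map: dict       # encoder frame type -> substream index
    substreams: list           # one SubstreamParams per substream


instr = ControlInstruction(
    num_substreams=2,
    frame_type_map={"I": 0, "P": 0, "B": 1},   # I/P on base, B on enhancement
    substreams=[
        SubstreamParams(32, 12.5, 8, (176, 144)),   # base substream (QCIF)
        SubstreamParams(32, 12.5, 8, (176, 144)),   # enhancement substream
    ],
)
print(instr.num_substreams)  # 2
```

On receiving such an instruction, the encoder would apply the listed factors to retarget the bit rate of the CVDS.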
  • The video data stream comprises a plurality of frames of differing types corresponding to each final video image frame, and the ratios of said frames of differing types are alterable by the encoder.
  • The quality parameter includes a transmissible data rate capability of the wireless communications channel.
  • The controller at the transmitter is supplied with additional information from the receiver regarding the quality parameters of the overall end to end link between transmitter and receiver.
  • The invention also provides a transmitter for transmitting a compressed video data stream to a receiver over an end to end communications link, said end to end link including a transmit side wireless channel between the transmitter and the receiver, the transmitter comprising: a wireless interface arranged to determine a quality parameter indicative of the transmission capabilities of the transmit side wireless channel; an encoder arranged to generate said compressed video data stream; and a controller operatively connected to the wireless interface and to the encoder for altering the bit rate of the compressed video data stream generated by the encoder in dependence on the quality parameter and possibly other information.
  • The end to end link comprises more than one transmit side wireless channel, wherein: the quality parameter may be different for each channel; and the CVDS may be divided into one or a plurality of substreams, with each substream transmitted over a separate wireless channel.
  • The transmitter includes a controller and a transmitter air interface, the transmitter being configured to: determine the quality parameter at the transmitter air interface; supply to the controller interface data derived from the quality parameter; and send to the encoder control instructions from the controller to effect the alteration of the bit rate based on said interface data and possibly other data.
  • The control instructions define one or more encoding factors selected from: a number of video substreams; a mapping of video encoder frame types onto each substream; and, per substream: an available data rate; a video frame rate; a video encoding quantisation factor; and a video frame resolution; the encoder effecting the alteration of the bit rate of the CVDS by applying said one or more encoding factors.
  • The video data stream comprises a plurality of frames of differing types corresponding to each final video image frame, and the ratios of said frames of differing types are alterable by the encoder.
  • The quality parameter includes a transmissible data rate capability of the wireless channel.
  • Additional information is supplied to the transmitter controller from the receiver regarding quality parameters and other information of the overall end to end communications link between transmitter and receiver.
  • The CVDS includes audio and visual content.
  • The end to end communication link uses a UMTS communications network.
  • By altering the bit-rate from the encoder in dependence upon the quality parameter in this way, relatively fast control of the output of the encoder is achieved. This acts to prevent, or at least ameliorate, undesirable delay due to the channel's bandwidth dropping below that of the CVDS.
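A minimal sketch of this fast feedback is given below, assuming the quality parameter is simply the channel's reported transmissible data rate. The 10% safety margin and the function name are hypothetical choices for the example, not values from the patent.

```python
# Hypothetical sketch of the feedback loop: the controller tracks the
# transmissible data rate reported by the transmitter air interface (the
# quality parameter) and re-targets the encoder's bit rate just below it.

def target_bitrate(channel_rate_kbps, margin=0.9, floor_kbps=10):
    """Choose an encoder bit rate slightly under the reported capacity.

    A margin below 1.0 leaves headroom so that brief dips do not
    immediately drain the receiver's replay buffer; the floor matches
    the low-bitrate regime (10 kbps or higher) that MPEG-4 and H.263
    support, as noted elsewhere in the document.
    """
    return max(floor_kbps, int(channel_rate_kbps * margin))

# As the air interface reports falling capacity, the encoder follows it down.
for reported in (64, 48, 24, 8):
    print(reported, "->", target_bitrate(reported))
```

Because the quality parameter is read locally at the transmitter air interface, this loop can react faster than receiver-driven feedback, which must first traverse the whole end to end link.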
  • Figure 1 is a simplified schematic diagram of a UMTS communication system showing a network and two mobile handsets, where one handset is the transmitter and the other a receiver, for streaming video;
  • Figure 2 is a simplified schematic diagram of a UMTS communication system showing a network and two mobile handsets, where each handset is simultaneously a transmitter and a receiver, for conversational video;
  • Figure 3 shows the construction of a UDP/IP packet containing MPEG-4 video payload data;
  • Figure 4 shows the construction of a UDP/IP packet containing RTCP data;
  • Figure 5 shows the construction of multiple substreams with a common IP address and different UDP ports per substream;
  • Figure 6 shows the construction of a TCP/IP packet containing RTSP and SDP data;
  • Figure 7 shows the construction of a TCP/IP packet containing SIP and SDP data;
  • Figures 8 and 12 show single and multiple RTP/RTCP sessions and an RTSP session (also labelled more generically as IP sessions), mapped onto single and multiple wireless channels for streaming video;
  • Figures 9 and 13 show single and multiple RTP/RTCP sessions and a SIP session (also labelled more generically as IP sessions), mapped onto single and multiple wireless channels for conversational video;
  • Figure 14 shows Quality of Service (QoS) parameters associated with the wireless channels between mobile handsets and networks, and the sending of these and other QoS parameters between the mobile handsets via IP sessions, for streaming video;
  • Figure 15 shows QoS parameters associated with the wireless channels between mobile handsets and networks, and the sending of these and other QoS parameters between the mobile handsets via IP sessions, for conversational video;
  • Figures 16 and 17 show typical mappings of compressed video data stream frames onto base and enhancement substreams;
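The kind of frame-to-substream mapping that Figures 16 and 17 depict can be sketched as follows. The particular assignment used here (intra frames to the base substream, predicted and bidirectional frames to an enhancement substream) is an illustrative assumption, not the patent's definitive scheme.

```python
# Hypothetical sketch: split a sequence of encoder frame types across a
# base substream and an enhancement substream. I frames, which other
# frames depend on, go on the base substream; P and B frames go on an
# enhancement substream that can be sacrificed under poor channel quality.

def split_substreams(frame_types, mapping=None):
    if mapping is None:
        mapping = {"I": "base", "P": "enhancement", "B": "enhancement"}
    streams = {"base": [], "enhancement": []}
    for i, ftype in enumerate(frame_types):
        streams[mapping[ftype]].append((i, ftype))
    return streams

gop = ["I", "B", "B", "P", "B", "B", "P"]   # a typical group of pictures
streams = split_substreams(gop)
print(len(streams["base"]), len(streams["enhancement"]))  # 1 6
```

With such a split, each substream can be sent over a separate wireless channel with its own quality parameter, as the claims describe.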
  • Figure 18 is a flowchart showing sequential operations performed by the encoder controller;
  • Figure 19 is a flowchart showing sequential operations performed by the decoder controller; and
  • Figure 20 is a flowchart showing sequential operations performed by the encoder.
  • The preferred embodiment of the present invention is applied to a network and associated mobile handsets designed to operate under the current GPRS or proposed UMTS standard.
  • A UMTS network 100 is used to establish an end-to-end link between a first mobile handset 102 and a second mobile handset 104.
  • The communication session between the first mobile handset 102 and the second mobile handset 104 is unidirectional.
  • Mobile handset 102 is acting solely as a transmitter of compressed video data, whilst mobile handset 104 is acting solely as a receiver of compressed video data.
  • This is termed the streaming arrangement.
  • The communication session between the first mobile handset 142 and the second mobile handset 144 is bi-directional.
  • Mobile handset 142 is acting both as a transmitter and a receiver of compressed video data, and mobile handset 144 is also acting as a transmitter and receiver of compressed video data. This is termed the conversational arrangement.
  • The functions and operations of the second mobile handset 144 are identical to the functions and operations of the first mobile handset 142.
  • The transmitter function and operation of mobile handsets 142 and 144 is identical to that of the transmitter function and operation of mobile handset 102. It will also be appreciated that the receiver function and operation of mobile handsets 142 and 144 is identical to that of the receiver function and operation of mobile handset 104. In the case of mobile handsets 142 and 144, the receiver and transmitter functions and operations are present within the same mobile handset. In mobile handsets 102 and 104 only the transmitter or the receiver function is present respectively. It will be appreciated that the mobile handsets 142 and 144 can also operate exclusively as transmitters or receivers to produce the streaming arrangement, in addition to their conversational arrangement, if so configured.
  • The first mobile handset includes a transmitter controller 108, a Real Time Streaming Protocol (RTSP) server 116, an encoder 106, a Real Time Protocol (RTP) packetiser 117, a Real Time Control Protocol (RTCP) client 119 and a transmitter air interface 110, which are operatively interconnected with each other as shown.
  • The encoder 106 accepts raw video data (RVD) from a video source such as a camera (not shown) associated with the first mobile handset 102 and encodes it into a compressed video data stream (CVDS) format, as discussed in detail below. This is then packetised by the RTP packetiser 117.
  • The transmitter air interface 110 establishes a wireless channel with a transmit side air interface 112 in the network 100, which in turn is in communication with a network backbone 114.
  • The network 100 also includes a receive side air interface 118 that establishes a wireless channel with a receiver air interface 120 disposed in the second mobile handset 104.
  • The second mobile handset 104 also includes a receiver controller 122, an RTSP client 123, an RTP depacketiser 125, an RTCP server 128 and a decoder 124. These are operatively interconnected with each other as shown.
  • The mobile handsets 142 and 144 include all the components that are present in the mobile handset 102 for the transmitter function, with the exception of the RTSP server 116, and in the mobile handset 104 for the receiver function, with the exception of the RTSP client 123.
  • The RTSP server 116 is replaced by a SIP User Agent (UA) 146 in mobile handset 142, and the RTSP client 123 is replaced by a SIP UA 148 in mobile handset 144.
  • The components of the transmitter function, the components of the receiver function, the User Agent and the air interface for mobile handsets 142 and 144 are operatively interconnected with each other as shown. It will be appreciated that the function and operation of the mobile communications network shown in Figure 2 is as described above.
  • In UMTS terminology, the mobile handsets 102, 104, 142 and 144 are designated User Equipment (UE), the air interface elements 112 and 118 correspond to the Universal Terrestrial Radio Access Network (UTRAN), and the backbone element 114 corresponds to the Core Network (CN).
  • UE User Equipment
  • UTRAN Universal Terrestrial Radio Access Network
  • CN Core Network
  • An end to end link is established between the first and second mobile handsets 102 and 104 (or 142 and 144), comprising a first wireless channel 126 between the first mobile handset and the network, and a second wireless channel 127 between the network and the second mobile handset.
  • Wireless channels are established using different frequencies and/or spreading codes and/or time slots in a manner well known in the mobile communications art. They allow for bi-directional communication, both for data and control information.
  • the wireless channel 126 between the transmitter air interface 110 and the transmit side air interface 112 carries Quality of Service parameters (QoS), from the network 100 to the first mobile handset 102 (or 142).
  • the wireless channel 127 between the air interface 118 and the receiver side air interface 120 carries Quality of Service parameters (QoS), from the network 100 to the second mobile handset 104 (or 144).
  • the packetised CVDS is transmitted over the end to end link defined between the two mobile handsets 102 and 104 (or 142 and 144), across wireless channels 126 and 127.
  • the CVDS takes the form of an MPEG-4 stream, but other suitable streaming formats, such as H.263, can also be used. Both of these standards are applicable to variable bitrate and low bitrate video, e.g. bitrates of 10kbps or higher. It is particularly preferred that the transmission be in RTP format, as discussed in detail below in relation to Figures 3, 4 and 5.
  • In Figure 3 the packetisation of the raw MPEG-4 data for transmission is shown. This packetisation takes place in the first mobile handset 102 or 142 under the control of the transmitter controller 108 before sending the packets to the transmitter air interface 110. Upon emerging from the wireless network of the transmitter the packets travel over a packet switched network to the wireless network of the receiver. Here the packets are sent to the receive side air interface 118 and on to the second mobile handset 104 or 144 via wireless channel 127.
  • FIG 3 shows the packetisation layers for a single MPEG-4 packet 200 in the form in which it leaves the encoder 106. It will be appreciated that a stream of such packets will be generated from the incoming RVD.
  • the MPEG-4 video data 200 is wrapped in an RTP format packet layer 201.
  • This, in turn, is wrapped in a UDP format packet layer 202, which in turn is packetised into an Internet Protocol (IP) packet 203.
  • the MPEG-4 layer 200 contains the coded video data.
  • the RTP layer 201 contains sequence numbers, time stamps, and payload bits that enable a depacketiser and decoder to decode it and replay at the correct time and in the correct sequence in relation to other packets from the same stream.
  • the UDP layer is used for asynchronous communication of the data over the wireless communications channel and is a "best effort" connectionless protocol.
  • the IP packet 203 contains an IP address which identifies the mobile receiver 104 or 144 as the destination.
  • the IP packet header may also contain a Differentiated Services Code Point (DSCP) which could be used by a diffserv-enabled core network to determine how that packet should be forwarded by nodes inside that network.
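The nesting of Figure 3 can be sketched as follows. The RTP header fields follow the standard RTP layout (sequence number, timestamp, SSRC); the payload bytes, SSRC value and dynamic payload type are illustrative assumptions, and in practice the UDP and IP layers 202 and 203 are added by the operating system's socket stack rather than by application code.

```python
import struct

def rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
               payload_type: int = 96) -> bytes:
    """Wrap coded video data (layer 200) in a minimal 12-byte RTP
    header (layer 201). V=2, no padding, extension or CSRC list;
    payload type 96 is a commonly used dynamic type for MPEG-4 video
    (an assumption here, agreed per session via SDP in practice)."""
    header = struct.pack("!BBHII",
                         0x80,                  # V=2, P=0, X=0, CC=0
                         payload_type & 0x7F,   # M=0, 7-bit payload type
                         seq & 0xFFFF,          # sequence number
                         timestamp & 0xFFFFFFFF,
                         ssrc)
    return header + payload

# Illustrative MPEG-4 payload (VOP start code followed by dummy data).
mpeg4_data = b"\x00\x00\x01\xb6" + b"\x00" * 20
pkt = rtp_packet(mpeg4_data, seq=1, timestamp=90000, ssrc=0x1234)
assert len(pkt) == 12 + len(mpeg4_data)
```

Sending `pkt` over a UDP socket addressed to the receiver's IP address produces the UDP and IP wrapping of layers 202 and 203 automatically.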
  • FIG. 4 shows the packetisation layers for a single RTCP packet 205 in the form in which it leaves the RTCP server 128.
  • the RTCP packet is wrapped in a UDP format packet layer 206, and packetised into an IP packet 207.
  • the CVDS can be transmitted over the wireless channels in one or multiple substreams, each transported by an RTP session, (and an associated RTCP session) where these are mapped to one or multiple wireless channels that may have different quality parameters.
  • the IP address 208 is common to both substreams. Routing through the transmission chain is achieved by characterising different substreams by different socket numbers 210 and 212 in the UDP address.
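The routing rule just described — one shared IP address, with the UDP socket number distinguishing the substreams — can be sketched minimally as below. The port numbers and substream names are illustrative assumptions, not values from the specification.

```python
# Hypothetical socket numbers for the two substreams of Figure 5
# (corresponding to items 210 and 212); 5004/5006 are conventional
# RTP ports, chosen here only for illustration.
SUBSTREAM_PORTS = {"base": 5004, "enhancement": 5006}

def route(dst_ip: str, substream: str, rtp_packet: bytes):
    """Both substreams share the destination IP address (208); only
    the UDP destination port distinguishes them in the transmission
    chain."""
    return (dst_ip, SUBSTREAM_PORTS[substream], rtp_packet)

assert route("10.0.0.2", "base", b"x")[1] == 5004
```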
  • the receiving handsets may not be capable of forming a multi wireless channel connection with the network. This may be because of equipment incompatibilities or network resource issues, for example.
  • for receiving handsets that can form the requisite multi wireless channel connection, the video layers can still be allocated to multiple wireless channels in accordance with the above embodiment, whilst the video is multiplexed onto a single wireless channel for those handsets that cannot.
  • the IP packet of Figure 3 is transmitted directly from the first mobile handset 102, via the network 100, to the second mobile handset 104.
  • the packets are forwarded to the RTP depacketiser 125, where the MPEG-4 data 200 is re-constructed.
  • the packets must be re-ordered using RTP layer data 201 such as frame timestamps and the data from the plurality of substreams must be re-assembled.
  • the reconstructed MPEG-4 data 200 is then sent from the RTP depacketiser 125 to the decoder 124, where it is decoded for replay on, for example, a visual display (not shown) on the second mobile handset 104.
  • control information is returned in accordance with the known RTCP and RTSP protocols, with the latter using the known Session Description Protocol (SDP).
  • Figures 8 and 10 show the IP sessions between a streaming server and client.
  • the RTCP sessions provide feedback on the data transmission quality for each RTP session.
  • RTSP additionally provides an overriding control connection from RTSP client 123 to RTSP server 116, using SDP to provide a description of the connection between client and server. It is known in the art that, when a session changes, an RTSP control packet containing a new SDP packet is sent to the remote entity.
  • the packetisation of the SDP and RTSP information is shown in Figure 6.
  • the SDP information 306 is wrapped in an RTSP packet 300.
  • An RTSP packet 300 is wrapped in a Transport Control Protocol (TCP) packet 302, which is within an IP packet 304.
  • This packet is built up by the RTSP client 123 and supplied to the receiver air interface 120 for transmission to the receive side air interface 118 of the network 100.
  • the destination address of this RTSP packet is that of the first mobile handset 102.
  • the RTSP packet passes to the transmit side air interface 112 via the backbone 114 for transmission to the transmitter air interface 110, to the RTSP server 116 and thence to the transmitter controller 108. Also, control information is exchanged between both mobile handsets 102 and 104 in accordance with the known RTCP protocol.
  • the IP packet of Figure 3 is transmitted and received between the mobile handsets 142 and 144, via the network 100, in the same way as for the streaming arrangement described above.
  • the encoded video packet is also processed in the same way as for the streaming arrangement described above.
  • For the control of bi-directional communications sessions the Session Initiation Protocol (SIP) is used instead of RTSP. This is managed by user agents 146 and 148 in mobile handsets 142 and 144 respectively. SIP uses SDP to provide a description of the connection between two peer user agents. It is known in the art that, when a session changes, a SIP control packet containing a new SDP payload is sent to the peer user agent.
  • FIG. 9 and Figure 11 show the IP sessions between two peer user agents 146 and 148.
  • the SIP session provides an overriding connection control between user agents 146 and 148 using the SDP. Again, control information is also exchanged between both mobile handsets 142 and 144 in accordance with the known RTCP protocol.
  • the RTCP session will provide feedback on the data transmission quality for each RTP session.
  • the packetisation of the SIP information is shown in Figure 7, where the packet is denoted 308.
  • An SDP payload 307 is encapsulated within the SIP packet 308.
  • a SIP packet 308 is wrapped in a User Datagram Protocol (UDP) packet 310, which is within an IP packet 305.
  • This packet is built up by the user agent 148 and supplied to the receiver air interface 120 for transmission to the receive side air interface 118 of the network 100.
  • the destination address of this SIP packet is that of the other mobile handset 142. It will be understood that user agents 146 and 148 are interchangeable in this scenario.
  • the SIP packet passes to the transmit side air interface 112 via the backbone 114 for transmission to the transmitter air interface 110 and thence to the transmitter controller 108 via the user agent 146.
  • a user of, say, the first mobile handset, 102 or 142 places a call to the second mobile handset 104 or 144, by dialling the second handset's mobile number.
  • the first mobile handset's number is mapped to a first IP address taken from a pool of IP addresses, and the second mobile handset's number is mapped to a second IP address taken from the pool of IP addresses.
  • This mapping persists for as long as the connection is maintained. Once the connection is broken, usually because one or both of the users hang up, the mapping is removed and the IP addresses are returned to the pool for reuse. It will be appreciated that this arrangement means that packets can be routed using the allocated IP addresses instead of the phone numbers.
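The number-to-address mapping described above can be sketched as follows. The pool addresses and mobile numbers are illustrative, and the simple list-based pool is an assumption; the specification only requires that addresses be drawn from a pool for the call's duration and returned on hang-up.

```python
class AddressPool:
    """Sketch of the call-time mapping from mobile numbers to IP
    addresses drawn from a pool, removed when the call is broken."""

    def __init__(self, addresses):
        self.free = list(addresses)
        self.mapping = {}            # mobile number -> IP address

    def connect(self, number: str) -> str:
        """Map a dialled number to the next free pool address."""
        ip = self.free.pop(0)
        self.mapping[number] = ip
        return ip

    def disconnect(self, number: str):
        """On hang-up, remove the mapping and return the address."""
        self.free.append(self.mapping.pop(number))

pool = AddressPool(["10.1.0.1", "10.1.0.2"])
caller_ip = pool.connect("+447700900001")    # first handset dials
callee_ip = pool.connect("+447700900002")    # second handset answers
# packets are now routed on caller_ip/callee_ip, not phone numbers
pool.disconnect("+447700900001")             # hang-up frees the address
```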
  • the requested quality class of the wireless channel is communicated to the network 100.
  • the first mobile handset 102 or 142 can request a particular QoS from the UMTS network, which specifies, for example, guaranteed and maximum bitrates.
  • a wireless communications channel is established between the first mobile handset 102 or 142 and the network 100, the wireless channel having defined QoS criteria.
  • the first mobile handset 102 or 142 might request the network resources as a number of wireless channels each with associated QoS (see later for multiple wireless channel discussion).
  • the second mobile handset 104 or 144 must similarly establish a connection with the network 100, establishing a wireless communications channel with QoS criteria independent of those of the wireless channel established by the first handset 102 or 142.
  • video data from, say, a camera (not shown) associated with the first mobile handset 102 or 142 is received by the encoder 106, which in turn generates a sequence of CVDS video data.
  • the video data is sent to the RTP packetiser 117, where it is packetised as described above and sent to the second mobile handset as described above.
  • the receiving mobile handset is mobile handset 104 if mobile handset 102 is the transmitter; mobile handset 144 if mobile handset 142 is the transmitter; and mobile handset 142 if mobile handset 144 is the transmitter.
  • one or more of the wireless channels may be terminated. It will be understood that these bandwidth changes can take place at a number of points along a given wireless communications channel. For example, it could take place between the first mobile handset 102 (or 142) and the network, or between the network and the second handset 104 (or 144).
  • any change in effective bandwidth or quality can have two consequences. If the bandwidth increases it could in principle be possible to transmit video data at a higher bitrate. If it decreases, however, the rate at which video data can reliably be transmitted also decreases, possibly below the value that was set at the start of transmission. To accommodate these consequences the QoS parameter set including the available bitrate and bit error rate on the wireless channel between the first mobile handset 102 or 142 and the network 100 is monitored by the controller 108 at the first mobile handset 102 or 142.
  • the UMTS network provides the first mobile handset with a QoS parameter set that is indicative of the available bitrate (i.e. bandwidth) on the wireless communications channel.
  • a QoS parameter set is supplied through the protocol stack in known wireless systems from the air interface 112 of the network to the air interface 110 of the transmitting mobile handset 102 or 142. It is normally supplied across a wireless control channel on the downlink of the call.
  • the QoS parameter set is indicative of various transmission parameters, including the transmissible bitrate over the wireless channel, the signal to noise ratio, the error rate and a priority indicator, which is an indication provided from the network to the transmitting mobile handset of the likely priority to be placed on the call. This is therefore an indicator of the bandwidth and likely reliability of the wireless communication channel that has been opened for the particular wireless channel. It will be appreciated in this context that the word "call" is used herein to describe the transmission of video data as well as, or instead of, voice data.
  • the QoS parameter set is read at the mobile handset 102 or 142 by the controller 108 and the transmissible bit rate is extracted from it.
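The controller's reading of the QoS parameter set can be sketched as below. The dictionary keys are illustrative names for the parameters the text lists (bitrate, signal to noise ratio, error rate, priority); they are not the actual UMTS control-channel encoding, which is standard-specific.

```python
def extract_bitrate(qos: dict) -> int:
    """Sketch: the controller 108 reads the QoS parameter set supplied
    over the downlink control channel and extracts the transmissible
    bitrate, checking first that the set is complete."""
    required = {"bitrate", "snr_db", "error_rate", "priority"}
    missing = required - qos.keys()
    if missing:
        raise ValueError(f"incomplete QoS parameter set: {missing}")
    return qos["bitrate"]

# Illustrative parameter values for a 64 kbps wireless channel.
qos_set = {"bitrate": 64_000, "snr_db": 12.0,
           "error_rate": 1e-4, "priority": 1}
assert extract_bitrate(qos_set) == 64_000
```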
  • the quality of the wireless communication channel between the network 100 and the second mobile handset 104 or 144 is also monitored.
  • a QoS parameter set indicative of, amongst other things, the available bandwidth for the RTP session mapped onto this wireless channel is ascertained in the second mobile handset 104 or 144 derived from the wireless control channel information it receives from the network according to the relevant wireless standard (in this case UMTS).
  • the QoS parameter set is dealt with at the second mobile handset 104 or 144 in a novel manner.
  • the session control protocols (bi-directional using SIP and unidirectional using RTSP) have already been discussed.
  • SDP provides for the exchange and updating of session description information such as codec and bitrate.
  • various control parameters are conveyed by the RTSP packets including, for example, video commands such as PLAY and PAUSE.
  • the standard also provides an ANNOUNCE instruction.
  • the system described herein uses the ANNOUNCE provision in the RTSP standard to cause elements of the QoS parameter set determined in the wireless environment and/or other derived parameters to be placed into an SDP payload which itself is placed in an RTSP packet for transmission from the second mobile handset 104 to the mobile handset 102.
  • the thus constructed novel packets are transmitted by the RTSP client 123, via the receiver air interface 120 and the receive side air interface 118, to the network backbone 114.
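Constructing such a packet — QoS figures in an SDP body, carried by an RTSP ANNOUNCE — can be sketched as follows. The `ANNOUNCE` method and the `b=AS:` bandwidth line are standard RTSP/SDP; the `a=X-qos-ber` attribute is an illustrative private extension assumed here, since the specification does not name the SDP fields used to carry the wireless QoS parameters.

```python
def announce_with_qos(session_url: str, cseq: int, bitrate: int,
                      bit_error_rate: float) -> str:
    """Build an RTSP ANNOUNCE carrying wireless QoS figures in an SDP
    body, as sent from the second mobile handset 104 towards the
    first mobile handset 102."""
    sdp = ("v=0\r\n"
           "s=wireless feedback\r\n"
           f"b=AS:{bitrate // 1000}\r\n"          # bandwidth, kbps
           f"a=X-qos-ber:{bit_error_rate}\r\n")   # assumed attribute
    return (f"ANNOUNCE {session_url} RTSP/1.0\r\n"
            f"CSeq: {cseq}\r\n"
            "Content-Type: application/sdp\r\n"
            f"Content-Length: {len(sdp)}\r\n"
            "\r\n" + sdp)

msg = announce_with_qos("rtsp://10.1.0.1/video", 3, 32_000, 1e-3)
assert msg.startswith("ANNOUNCE") and "a=X-qos-ber" in msg
```

The same SDP body serves for the SIP re-INVITE variant described below, with only the request line and transport (UDP rather than TCP) differing.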
  • SDP payloads are conveyed by the SIP packets to control the bi-directional communication between the two mobile handsets.
  • a session is initiated using the INVITE instruction, which itself contains a session description in the SDP format.
  • the standard provides for a session to be modified by either agent by issuing a subsequent INVITE instruction.
  • the system described herein uses the re-INVITE provision in the SIP standard to cause the quality parameter determined in the wireless environment and/or other derived parameters to be placed into an SDP packet which itself is placed in a SIP packet for transmission from the receiving mobile handset 144 to the transmitting mobile handset 142.
  • the thus constructed novel packets are transmitted by the session control agent 148, via the receiver air interface 120 and the receive side air interface 118, to the network backbone 114. From here they travel to the session control agent 146 via the transmit side air interface 112 and the transmitter air interface 110.
  • RTP sessions carrying the video data each have associated RTCP sessions carrying control information back to the transmitter.
  • the system described herein can, in addition to or instead of using RTSP or SIP, use the RTCP application defined (APP) packet to transfer application data (in this case the wireless and other derived QoS parameters) from the receiver to the transmitter.
  • RTSP control messages are sent between an RTSP server 116 and an RTSP client 123.
  • Figure 8 shows a prior art method of streaming CVDS transported by an RTP session 804 and associated RTCP session 805.
  • RTSP control messages are sent via RTSP session 806. It will be appreciated that at least some part of the end-to-end communications channel is wireless.
  • the RTP session containing all the frames of the video stream and the RTCP session containing all RTCP control messages are multiplexed onto a single wireless channel, with the CVDS parameters and frame sequencing being derived according to the QoS parameter of the wireless channel at call set up.
  • FIG. 9 shows the prior art conversational arrangement.
  • Each mobile handset is depicted with an RTP packetiser (117 or 134), an RTP depacketiser (133 or 125) and a user agent (146 or 148).
  • a CVDS is transmitted from RTP packetiser 117 to RTP depacketiser 125 using RTP session 814 and RTCP session 815 simultaneously with a CVDS transmitted from RTP packetiser 134 to RTP depacketiser 133 using RTP session 816 and RTCP session 817.
  • a single SIP session 818 controls all the RTP/RTCP sessions.
  • the RTP sessions containing all the frames of the video stream, the RTCP sessions corresponding to these and the SIP session containing all of the SIP control messages are multiplexed onto a single wireless channel, with the CVDS parameters and frame sequencing being selected according to the QoS parameter of the wireless channel at call set up.
  • FIG 10 shows a preferred embodiment of the invention for the streaming arrangement, in which there is a plurality of wireless channels.
  • RTP/RTCP sessions 824/825 and 826/827 are used for transmitting the two substreams of a CVDS and RTSP control messages are sent via RTSP session 828.
  • the wireless channels are provided at call set up as previously described, in a situation where the encoder 106, RTP packetiser 117, RTCP client 119 and RTSP server 116 are in a first mobile handset, and the decoder 124, RTP depacketiser 125, RTCP server 128 and RTSP client 123 are in a second mobile handset, in accordance with requested QoS and bandwidth parameters according to the UMTS standard.
  • the CVDS substreams are transported via RTP between the RTP packetiser 117 and RTP depacketiser 125, RTCP control messages are sent between RTCP server 128 and RTCP client 119 and control messages are transmitted via an RTSP session between the RTSP server 116 and RTSP client 123.
  • one RTSP session covers all the RTP/RTCP sessions between any two entities while each CVDS substream requires an individual RTP session and an associated RTCP session.
  • Figure 11 shows a preferred embodiment of the invention for the conversational arrangement, in which there is a plurality of wireless channels.
  • RTP/RTCP sessions 834/835 and 836/837 are used for transmitting the two substreams of a CVDS generated by the encoder 106.
  • RTP/RTCP sessions 838/839 and 840/841 are used for transmitting the two substreams of a CVDS generated by the encoder 129.
  • the SIP control messages are sent via SIP session 842.
  • Figure 11 shows that the base layers produced by encoders 106 and 129, carried by RTP sessions 834 and 838 and packetised by RTP packetisers 117 and 134, use the up and down links of the same wireless bearer at each mobile handset.
  • the enhanced layers from the encoders 106 and 129, carried by RTP sessions 836 and 840 and packetised by RTP packetisers 117 and 134, use the up and down links of the same wireless bearer at each mobile handset.
  • It will be appreciated that the wireless channels are provided at call set up as previously described and in accordance with requested QoS and bandwidth parameters according to the UMTS standard.
  • one SIP session 842 covers all the RTP/RTCP sessions between any two entities while each CVDS substream requires an individual RTP session and an associated RTCP session.
  • because wireless channels for the transmit side and receive side are allocated separately, there is no guarantee that the number of transmit side and receive side wireless channels will be the same.
  • RTP/RTCP sessions 844/845 and 846/847 and RTSP session 848 are each mapped to a separate wireless channel.
  • IP sessions 844-848 are mapped to the same wireless channel.
  • Figure 13 shows the situation for the conversational arrangement, where for the handset containing user agent 146 there is only one wireless channel, whilst for the handset containing user agent 148 there are three wireless channels.
  • the RTP/RTCP sessions 853/854 and 857/858 are mapped to the same wireless channel.
  • the RTP/RTCP sessions 851/852 and 855/856 are mapped to another wireless channel and the SIP control 859 is mapped to yet another wireless channel.
  • all IP sessions of all types 851-859 are mapped to the same wireless channel.
  • wireless channels can be defined in terms of a number of quality of service parameters, such as priority, maximum and guaranteed bandwidth, residual bit error rate and delay.
  • the first RTP session 824 is defined as carrying the base substream, having an example bitrate of 16kbps, whilst the second RTP session 826 has a bitrate of 32kbps.
  • the first wireless channel has the lowest bitrate but highest priority and the base substream is allocated to it.
  • the enhancement substream is allocated to the second wireless channel, since it has the lower priority. This ensures that the most important video data is allocated to the wireless channel with the highest priority.
  • the first RTP session can also be marked with the highest priority DSCP for prioritised transport over the IP component of a diffserv enabled core network.
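The allocation rule just described — the most important substream onto the highest-priority wireless channel — can be sketched as below. The dictionary structure and the convention that a lower priority number means a higher-priority channel are assumptions for illustration; the 16kbps/32kbps figures echo the example bitrates of sessions 824 and 826.

```python
def allocate_substreams(substreams, channels):
    """Map substreams to wireless channels so that the most important
    video data goes to the channel with the highest priority.
    Importance 0 = most important; priority 1 = highest priority
    (both conventions are assumptions here)."""
    by_importance = sorted(substreams, key=lambda s: s["importance"])
    by_priority = sorted(channels, key=lambda c: c["priority"])
    return {s["name"]: c["id"] for s, c in zip(by_importance, by_priority)}

subs = [{"name": "base", "importance": 0, "bitrate": 16_000},
        {"name": "enhancement", "importance": 1, "bitrate": 32_000}]
chans = [{"id": "ch1", "priority": 1},   # lowest bitrate, highest priority
         {"id": "ch2", "priority": 2}]
alloc = allocate_substreams(subs, chans)
assert alloc == {"base": "ch1", "enhancement": "ch2"}
```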
  • the allocation of resources within a UMTS network is dynamic, and this can mean that bandwidths allocated to either of the RTP sessions can fluctuate with (amongst other things) network load.
  • the bandwidth available for each wireless channel is known to the transmitter controller 108 as it monitors the network messages at the transmitter air interface 110.
  • an assessment is made as to whether it is desirable to reallocate the frames between substreams.
  • the preferred embodiment is configured to maintain a history of wireless channel behaviour in relation to the quality parameter.
  • a sudden drop in bandwidth on a wireless channel to which a relatively high priority substream or frame type is mapped may not be a trigger for the frame mapping to be changed.
  • where there is a history of short-term bursts of bandwidth loss, it is likely that the higher bandwidth will shortly be available again, and it may ultimately be more efficient to ignore the short-term reduction.
  • an assessment of this type will be made by the transmitter controller 108. It will be understood that quite sophisticated proportional, integral and differential factors can be taken into account to build a relatively sophisticated model of any wireless channel's behaviour (and likely future behaviour) over time. Such modelling is well known to those skilled in the relevant art, and so is not described further here.
  • Similar history data can be collected for the other types of quality data collected in earlier embodiments of the invention, and similarly used to make decisions about how and when to alter outputs of, for example, the encoder.
  • it may be more efficient, or may provide a visibly better overall streamed video image, if the bitrate out of the encoder is not immediately altered when the bandwidth initially drops. Rather, it will in some cases be preferable to wait until the bandwidth has remained low for a predetermined time period or number of frames before changing the output of the encoder.
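The wait-before-reacting behaviour described above amounts to a hysteresis on the bandwidth signal. A minimal sketch, in which the hold count and target bitrate are illustrative tuning values rather than figures from the specification:

```python
class BandwidthHysteresis:
    """Signal an encoder bitrate reduction only once the reported
    bandwidth has stayed below the current target for `hold`
    consecutive QoS samples, so that short-term bursts of bandwidth
    loss are ignored."""

    def __init__(self, target_bps: int, hold: int = 5):
        self.target = target_bps
        self.hold = hold
        self.low_count = 0

    def update(self, available_bps: int) -> bool:
        """Feed one bandwidth sample; return True when the encoder
        output should actually be reduced."""
        if available_bps < self.target:
            self.low_count += 1
        else:
            self.low_count = 0          # recovery resets the history
        return self.low_count >= self.hold

h = BandwidthHysteresis(target_bps=48_000, hold=3)
samples = [32_000, 32_000, 50_000, 32_000, 32_000, 32_000]
decisions = [h.update(s) for s in samples]
# the brief recovery to 50 kbps resets the count, so only the final
# sustained drop triggers a change
assert decisions == [False, False, False, False, False, True]
```

A fuller implementation in the spirit of the text would replace the simple counter with proportional, integral and differential terms over the channel history.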
  • Wireless channels between mobile handset and network have a certain QoS, which is provided for the mobile user of a network service.
  • a set of QoS parameters including Bitrate (BR) and Bit Error Rate (BER) are used for controlling the video encoder.
  • These QoS parameters are conveyed between the encoder controller and the decoder controller via IP sessions.
  • a wireless channel between transmitter 102 and network has QoS parameters BR and BER.
  • a wireless channel between network and receiver 104 has QoS parameters BR' and BER'.
  • the encoder controller 108 sends BR to decoder controller 122 via an RTSP session 866.
  • Having received BR from the encoder controller 108, the decoder controller 122 sends BER' and the calculated Request Bitrate (RBR) to the encoder controller 108 via an RTSP session 866 or RTCP session 865.
  • the encoder controller 108 is used to control the video encoder 106 with the objective of improving the error resilience of video encoding while meeting the bitrate constraint of wireless channels.
  • the video encoder is an MPEG-4 or H.263 compliant encoder.
  • the input video source is encoded into an MPEG-4 or H.263 compliant bit stream.
  • the video data can be constructed using a plurality of different types of frames, which are referred to as I, P and B frames.
  • I frames are self-contained still frames of video data, whereas P and B frames constitute intermediate frames that are predictively encoded.
  • the precise composition of the frames varies in accordance with the particular standards and application of the standards and is known per se.
  • MPEG-4 specifies a plurality of layers, including the base layer and enhanced layers, in which each layer is comprised of a sequence of frames which may be of the same type (I, P, B) or a mixture of types.
  • in a wireless network the mobile may be allocated one wireless channel or a plurality of wireless channels.
  • each wireless channel is used to transmit a single RTP/RTCP session pair.
  • Each RTP session carries an optimum sequence of I, P and B frames, known as a substream.
  • the term "substream" is used rather than "layer" because the frame sequencing onto wireless channels can be varied dynamically and need not be one of the layer sequences predefined in MPEG-4 or other known video encoding standards.
  • Other partitions of coded video data for error resilience purposes (e.g. Data Partitioning Modes of MPEG-4/H.263) are also possible and could be represented by substreams.
  • Figure 16 and Figure 17 illustrate example compositions of a video data stream in accordance with the MPEG-4 video standard.
  • two temporally scalable substreams are used with I frames on the base substream while P and B frames are carried on the enhanced substream.
  • only the base substream is used, comprising interleaved I and P frames.
  • the encoder controller can thus control the bitrate of the video data stream for each wireless channel by manipulation of the number of substreams used for transmission, and the number and type of frames per substream.
  • Video encoding under MPEG-4 or H.263 standards operates on a frame- by-frame basis.
  • Each frame is divided into either Groups of Blocks (GOBs) or slices.
  • a GOB comprises macroblocks of one or several rows in a video frame. Slices are more flexible and can partition the frame into a variable number of macroblocks.
  • a macroblock in turn comprises four luminance blocks and two spatially corresponding colour difference blocks of image data. All blocks in an I-frame are intra-coded.
  • Blocks in an inter-coded P or B-frame can be a mixture of intra-coded blocks (I-blocks) and inter-coded blocks (P-blocks or B-blocks).
  • increasing the I-block/P-block ratio (Ib/Pb) in P-frames or the I-block/B-block ratio (Ib/Bb) in B-frames has two consequences: (1) it improves error resilience, because more intra-coded blocks result in less error propagation; (2) it increases the bitrate, because inter-coded blocks comprise substantially smaller amounts of data than intra-coded blocks.
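The bitrate side of that trade-off can be illustrated with a rough frame-size estimate. The per-block bit costs below are illustrative assumptions (intra-coded blocks carry substantially more data than inter-coded ones); only the direction of the effect, not the numbers, comes from the text.

```python
def p_frame_bits(n_blocks: int, i_ratio: float,
                 bits_intra: int = 1200, bits_inter: int = 300) -> int:
    """Estimate the size of a P-frame as the I-block ratio (Ib/Pb)
    rises. bits_intra and bits_inter are assumed average per-block
    costs, with intra blocks roughly 4x larger."""
    n_intra = round(n_blocks * i_ratio)
    return n_intra * bits_intra + (n_blocks - n_intra) * bits_inter

# 99 macroblocks corresponds to a QCIF (176x144) frame.
low = p_frame_bits(99, i_ratio=0.1)
high = p_frame_bits(99, i_ratio=0.5)
assert high > low   # more error resilience costs more bits
```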
  • the encoder controller controls the encoder to make the best use of wireless channel utilisation for error resilient video encoding.
  • error control can also be achieved by allocating GOBs or slices in a frame, wherein the header of each GOB or slice can serve as a synchronisation marker for the decoder to regain synchronisation.
  • Figure 18 illustrates the operation of the encoder controller 108.
  • the encoder controller operates by a closed-loop process.
  • the encoder controller obtains relevant information from various sources, including the BR and BER associated with the wireless channel between the encoder and the network (from the air interface), the RBR and BER' (via an IP control session from the decoder), the latency jitter (ΔL) of RTP packets (from the RTCP client 119) and the instantaneous bitrate (IBR) (from the encoder).
  • the encoder controller determines the target bitrate (BRtarget) and the frame type (FT) based on the BR, RBR and IBR.
  • the encoder controller determines the Ib/Pb ratio for P-frames, the Ib/Bb ratio for B-frames, and the synchronisation marker rate Rsync for all frames.
  • the encoder controller determines the quantisation parameter (QP) and the frame rate (FR) for the frame based on Ib/Pb or Ib/Bb, and BRtarget.
  • the encoder controller sends the encoding parameters FT, FR, Rsync, Ib/Pb or Ib/Bb, and QP to the encoder.
  • the encoder controller sends BR via an IP control session to the decoder. Then the encoder controller goes back to the first step 400.
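One pass of the encoder controller's closed loop can be sketched as follows. The specification names the inputs (BR, RBR, IBR, BER, BER', ΔL) and the outputs (FT, FR, Rsync, Ib/Pb or Ib/Bb, QP, BRtarget) but not the arithmetic between them, so every formula below is an illustrative assumption chosen only to move each output in the direction the text implies.

```python
def encoder_controller_step(br, rbr, ibr, ber, ber_prime, dl_jitter):
    """One iteration of the Figure 18 loop. All numeric policies are
    assumed, not taken from the specification."""
    # Target bitrate: never exceed the channel BR or the receiver's RBR.
    br_target = min(br, rbr)
    # Frame type: fall back to cheaper frames when over budget.
    frame_type = "P" if ibr <= br_target else "B"
    # Error resilience: more intra blocks and sync markers as the
    # worse of the two channel error rates rises.
    worst_ber = max(ber, ber_prime)
    ib_ratio = min(0.5, worst_ber * 1000)       # Ib/Pb or Ib/Bb
    r_sync = max(1, int(worst_ber * 10_000))    # markers per frame
    # Quantiser and frame rate chosen to hit br_target.
    qp = 10 if ibr <= br_target else 20
    fr = 15 if br_target >= 32_000 else 7.5
    if dl_jitter > 0.1:                         # assumed jitter budget (s)
        fr /= 2
    return {"FT": frame_type, "FR": fr, "Rsync": r_sync,
            "Ib_ratio": ib_ratio, "QP": qp, "BRtarget": br_target}

params = encoder_controller_step(br=64_000, rbr=48_000, ibr=40_000,
                                 ber=1e-4, ber_prime=1e-3, dl_jitter=0.02)
assert params["BRtarget"] == 48_000 and params["FT"] == "P"
```

In the full scheme the returned parameters would be passed to the encoder and BR sent onward to the decoder controller, closing the loop.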
  • Figure 19 illustrates the operation of the decoder controller 122.
  • the decoder controller operates by a closed-loop process.
  • the decoder controller obtains relevant information from various sources, including the BR' and BER' associated with the wireless channel between the decoder and the network (from the air interface), and the BR associated with the wireless channel between the encoder and the network (via an IP session from the encoder).
  • the decoder controller determines ⁇ L of RTP packets received.
  • the decoder controller calculates RBR based on ⁇ L, BR and BR'.
  • the decoder controller sends the RBR and BER' via an IP control session to the encoder controller.
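The decoder controller's RBR calculation from ΔL, BR and BR' can be sketched as below. The specification does not give the formula, so this is an assumed policy: cap the request by both wireless channels' bitrates and back it off when latency jitter exceeds an assumed budget.

```python
def requested_bitrate(br: int, br_prime: int, jitter_s: float,
                      jitter_budget_s: float = 0.1) -> int:
    """Sketch of the Figure 19 RBR step: RBR is limited by the
    transmit side bitrate BR and the receive side bitrate BR', and
    reduced when RTP latency jitter (ΔL) indicates congestion. The
    0.1 s budget and 20% back-off are illustrative values."""
    rbr = min(br, br_prime)
    if jitter_s > jitter_budget_s:
        rbr = int(rbr * 0.8)    # back off under heavy jitter
    return rbr

assert requested_bitrate(64_000, 48_000, jitter_s=0.02) == 48_000
assert requested_bitrate(64_000, 48_000, jitter_s=0.20) == 38_400
```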
  • Figure 20 illustrates the operation of the encoder 106.
  • the encoder operates on a frame-by-frame basis.
  • the encoder obtains the encoding parameters including FT, FR, Rsync, Ib/Pb or Ib/Bb, and QP from the encoder controller.
  • the encoder allocates GOBs or slices for inter-coded frames based on Rsync.
  • the encoder further allocates the I-block distribution within P or B-frames based on Ib/Pb or Ib/Bb.
  • the encoder encodes the frame using the above encoding parameters and adds it to the CVDS.
  • the encoder calculates the IBR and sends it to the encoder controller. Then the encoder goes back to the first step, to process the next frame.
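The GOB/slice allocation step of this loop can be sketched as follows. The specification says only that GOBs or slices are allocated based on Rsync; the even spacing of markers through the frame is an assumed policy, and the QCIF row count is an illustrative example.

```python
def allocate_sync_markers(n_macroblock_rows: int, r_sync: int):
    """Place GOB/slice headers (resynchronisation markers) through a
    frame given the marker rate Rsync from the encoder controller.
    Returns the starting macroblock row of each GOB; evenly spaced
    markers bound error propagation to one GOB's worth of rows."""
    n_markers = max(1, min(r_sync, n_macroblock_rows))
    step = n_macroblock_rows / n_markers
    return [int(i * step) for i in range(n_markers)]

# A QCIF frame has 9 macroblock rows; Rsync = 3 yields three GOBs
# starting at rows 0, 3 and 6.
assert allocate_sync_markers(9, 3) == [0, 3, 6]
```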
  • WIRELESS CHANNEL a physical radio channel with associated QoS parameters, e.g. UMTS Radio Access Bearer.
  • END TO END LINK an end to end communications link between transmitter and receiver containing a transmit side wireless channel and/or a receive side wireless channel;
  • IP SESSION An IP communications session between two IP hosts carrying either control or application data. Examples are an RTP session, an RTCP session, an RTSP session and a SIP session. One or more IP sessions can be mapped to a wireless channel;
  • COMPRESSED VIDEO DATA STREAM an overall video stream, where the original stream of image frames is compressed by means of an encoder;
  • FRAME a video encoder outputs a number of different frame types. These include full still images and derivative images that have different data transmission requirements, have different sensitivity to errors and may have dependency on other frames.
  • frames are also known as Video Object Planes (VOPs);
  • SUBSTREAM a CVDS may be split into a number of substreams for the purposes of transmission over a channel or plurality of channels. Each substream can be used as a means of transmitting a sequence of video frames that may be of different types. A substream is not necessarily the same as a layer as defined in video standards such as MPEG-4 and H.263. A substream could also be used to transmit any part of the coded frame data that can be successfully partitioned for error resilience purposes (e.g. DCT coefficients and motion vector data in data partitioning modes of MPEG-4/H.263). Each substream is transported by an RTP session and an associated RTCP session;
  • RTCP SERVER an entity generating RTCP Receiver Reports based on the reception of RTP packets. These are sent to the transmitter of the RTP packets, where an RTCP client uses them.
  • RTCP CLIENT an entity that uses RTCP Receiver Reports. These are sent from the receiver of the RTP packets, where an RTCP server generates them.

Abstract

A method of transmitting a compressed video data stream (CVDS) generated by an encoder from a transmitter to a receiver over an end to end communications link, said end to end link including a transmit side wireless channel between the transmitter and the receiver, the method including the steps of: ascertaining a quality parameter indicative of the transmission capability of the wireless channel between the transmitter and the receiver; and at the transmitter, altering the bit rate of the compressed video data stream generated by the encoder in dependence on the quality parameter and other information.

Description

FEEDBACK CONTROL TO ENCODER
FIELD OF THE INVENTION
The present invention relates to video streaming, and, more particularly, to the streaming of video data over a wireless communications network.
The invention has been developed primarily to allow video to be streamed in a UMTS or GPRS mobile telecommunications network using streamable formats such as MPEG-4 and H.263. However it will be appreciated by those skilled in the art that the invention is not limited to use with those particular standards.
BACKGROUND TO THE INVENTION
Systems currently exist to allow video data to be supplied from a video source to a mobile receiver with a display and incorporating a decoder capable of "playing" the video on the display. One of the difficulties with the use of a mobile receiver is the variation in the quality of the wireless communications link from the video source to the mobile receiver.
In such prior art streaming video applications, a first-in-first-out replay buffer is often used at the receiver to reduce the effect that temporary aberrations in the available bandwidth may have on video replay. This involves delaying commencement of replay of the video stream at the receiver whilst the replay buffer is first filled with video data. Once a predetermined amount of video data is in the buffer (typically several seconds of video), replay commences from the buffer whilst data continues to be received. Thus, the image is not viewed in "real time" - there is considerable latency.
If, during replay, the bandwidth is suddenly reduced to below the rate at which the video is replaying, the amount of data in the buffer will reduce. Notwithstanding the bandwidth reduction, the video data in the buffer will still be streamed out at the full rate to the display, so the reduction in bandwidth is not apparent to anyone viewing the replay. However, if the available bandwidth does not rise to the video data rate or above, the buffer will eventually empty, resulting in visible freezing, stuttering or blanking of the video stream. Only when the buffer has refilled to the predetermined amount can the replay process continue. However, the user will have suffered a serious interruption to viewing. If the available bandwidth remains at a reduced level, the video source can be instructed to transmit data at a reduced bitrate. However, if the available bandwidth continues to fluctuate, the user will continue to experience repeated freezing in the video stream.
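The buffer behaviour described above can be modelled in a few lines. The rates below are illustrative; the point is only that once the arrival rate stays under the replay rate, the pre-buffered margin buys a fixed number of seconds before a visible freeze.

```python
def simulate_buffer(arrival_kbps, replay_kbps, prebuffer_kbit):
    """Return the second at which the replay buffer first empties,
    or None if it survives the whole trace. arrival_kbps is a list
    of per-second bandwidth samples."""
    level = prebuffer_kbit
    for t, rate in enumerate(arrival_kbps):
        level += rate - replay_kbps   # net fill (or drain) over this second
        if level <= 0:
            return t                  # visible freezing/stuttering starts here
    return None

# Bandwidth halves from 64 kbps to 32 kbps at t = 5 s; replay needs 64 kbps.
freeze_at = simulate_buffer([64] * 5 + [32] * 20,
                            replay_kbps=64, prebuffer_kbit=128)
```

With a 128 kbit (2 s) pre-buffer the buffer empties at t = 8 s; a larger pre-buffer delays the freeze, but only at the cost of additional start-up latency.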
Whilst buffering works well in cases where a constant delay is not a critical issue and where bandwidth is reasonably constant, it is less acceptable in applications where minimal delay is important, such as video telephony. In such cases, any buffering used should be minimised to minimise the latency. This in turn further increases the chance of undesirable stuttering and freezing if available bandwidth drops for a given time period.
The problem is exacerbated in a mobile telecommunications context, where bandwidth variations may be significant and sustained and where the provision of video memory solely for buffering adds to the cost and complexity of mobile handsets.
SUMMARY OF INVENTION
In a first aspect, the present invention provides a method of transmitting a compressed video data stream (CVDS) generated by an encoder from a transmitter to a receiver over an end to end communications link, said end to end link including a transmit side wireless channel between the transmitter and the receiver, the method including the steps of: ascertaining a quality parameter indicative of the transmission capability of the wireless channel between the transmitter and the receiver; and at the transmitter, altering the bit rate of the compressed video data stream generated by the encoder in dependence on the quality parameter and other information.
Preferably, the end to end link comprises more than one transmit side wireless channel, where: each channel has an associated quality parameter; and the CVDS is divided into a plurality of substreams, with each substream transmitted over a separate wireless channel.
Preferably, the transmitter includes a controller and a transmitter air interface, the method including the steps of: determining the quality parameter at the transmitter air interface; supplying to the controller interface data derived from the quality parameter; and sending to the encoder control instructions from the controller to effect the alteration of the bit rate based on said interface data and possibly other data.
In a preferred form, the control instructions define one or more encoding factors selected from: a number of video substreams; a mapping of video encoder frame types onto each substream; and per substream: an available data rate; a video frame rate; a video encoding quantisation factor; and a video frame resolution; the encoder effecting the alteration of the bit rate of the CVDS by applying said one or more encoding factors.
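The encoding factors listed above amount to a small control message from the controller to the encoder. The following structure is an illustrative representation only; the field names mirror the list and are not taken from any standard.

```python
from dataclasses import dataclass, field

@dataclass
class SubstreamParams:
    data_rate_kbps: float        # available data rate for this substream
    frame_rate_fps: float        # video frame rate
    quantisation: int            # coarser quantisation -> lower bit rate
    resolution: tuple            # (width, height) video frame resolution

@dataclass
class EncoderControl:
    num_substreams: int
    frame_type_map: dict         # frame type -> substream index
    substreams: list = field(default_factory=list)  # one SubstreamParams each

# Example: I- and P-frames on a base substream, B-frames on an enhancement one.
ctrl = EncoderControl(
    num_substreams=2,
    frame_type_map={"I": 0, "P": 0, "B": 1},
    substreams=[SubstreamParams(32.0, 5.0, 12, (176, 144)),
                SubstreamParams(32.0, 10.0, 12, (176, 144))])
```

On receiving such a message the encoder would re-tune its rate control per substream, which is how the bit rate alteration is effected.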
Preferably, the video data stream comprises a plurality of frames of differing types corresponding to each final video image frame, and wherein the ratios of said frames of differing types are alterable by the encoder.
It is preferred that the quality parameter includes a transmissible data rate capability of the wireless communications channel.
Preferably the controller at the transmitter is supplied with additional information from the receiver regarding the quality parameters of the overall end to end link between transmitter and receiver.
In a second aspect, the invention provides a transmitter for transmitting a compressed video data stream to a receiver over an end to end communications link, said end to end link including a transmit side wireless channel between the transmitter and the receiver, the transmitter comprising: a wireless interface arranged to determine a quality parameter indicative of the transmission capabilities of the transmit side wireless channel; an encoder arranged to generate said compressed video data stream; and a controller operatively connected to the wireless interface and to the encoder for altering the bit rate of the compressed video data stream generated by the encoder in dependence on the quality parameter and possibly other information.
Preferably, the end to end link comprises more than one transmit side wireless channel, wherein: the quality parameter may be different for each channel; and the CVDS may be divided into one or a plurality of substreams, with each substream transmitted over a separate wireless channel.
In a preferred form, the transmitter includes a controller and a transmitter air interface, the transmitter being configured to: determine the quality parameter at the transmitter air interface; supply to the controller interface data derived from the quality parameter; and send to the encoder control instructions from the controller to effect the alteration of the bit rate based on said interface data and possibly other data.
Preferably, the control instructions define one or more encoding factors selected from: a number of video substreams; a mapping of video encoder frame types onto each substream; and per substream: an available data rate; a video frame rate; a video encoding quantisation factor; and a video frame resolution; the encoder effecting the alteration of the bit rate of the CVDS by applying said one or more encoding factors.
In a preferred form, the video data stream comprises a plurality of frames of differing types corresponding to each final video image frame, and wherein the ratios of said frames of differing types are alterable by the encoder.
Preferably, the quality parameter includes a transmissible data rate capability of the wireless channel.
Preferably, additional information is supplied to the transmitter controller from the receiver regarding quality parameters and other information of the overall end to end communications link between transmitter and receiver.
In one form, the CVDS includes audio and visual content.
In a particularly preferred embodiment, the end to end communication link uses a UMTS communications network. By altering the bit-rate from the encoder in dependence upon the quality parameter in this way, relatively fast control of the output of the encoder is achieved. This acts to prevent or at least ameliorate undesirable delay due to the channel's bandwidth dropping below that of the CVDS. There may be a visible reduction in perceived quality of video replay, whether in terms of frame rate or spatial resolution. However, this will usually be an acceptable compromise because the video stream also remains substantially real-time, unlike in the prior art case.
BRIEF DESCRIPTION OF DRAWINGS
Preferred embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a simplified schematic diagram of a UMTS communication system showing a network and two mobile handsets, where one handset is the transmitter and the other a receiver, for streaming video;
Figure 2 is a simplified schematic diagram of a UMTS communication system showing a network and two mobile handsets, where each handset is simultaneously a transmitter and a receiver for conversational video;
Figure 3 shows the construction of a UDP/IP packet containing MPEG-4 video payload data;
Figure 4 shows the construction of a UDP/IP packet containing RTCP data;
Figure 5 shows the construction of multiple substreams with a common IP address and different UDP addresses per substream;
Figure 6 shows the construction of a TCP/IP packet containing RTSP and SDP data;
Figure 7 shows the construction of a TCP/IP packet containing SIP and SDP data;
Figure 8, Figure 10 and Figure 12 show single and multiple RTP/RTCP sessions and an RTSP session (also labelled more generically as IP Sessions), mapped onto single and multiple wireless channels for streaming video; Figure 9, Figure 11 and Figure 13 show single and multiple RTP/RTCP sessions and a SIP session (also labelled more generically as IP Sessions), mapped onto single and multiple wireless channels for conversational video;
Figure 14 shows Quality of Service (QoS) parameters associated with the wireless channels between mobile handsets and networks, and the sending of these and other QoS parameters between the mobile handsets via IP sessions for streaming video;
Figure 15 shows QoS parameters associated with the wireless channels between mobile handsets and networks, and the sending of these and other QoS parameters between the mobile handsets via IP sessions for conversational video;
Figure 16 and Figure 17 show typical mappings of compressed video data stream frames onto base and enhanced substreams;
Figure 18 is a flowchart showing sequential operations performed by the encoder controller;
Figure 19 is a flowchart showing sequential operations performed by the decoder controller; and
Figure 20 is a flowchart showing sequential operations performed by the encoder.
DETAILED DESCRIPTION OF PREFERRED AND OTHER EMBODIMENTS
The preferred embodiment of the present invention is applied to a network and associated mobile handsets designed to operate under the current GPRS or proposed UMTS standard.
Referring to the drawings, and Figure 1 in particular, there is shown a UMTS network 100 that is used to establish an end-to-end link between a first mobile handset 102 and a second mobile handset 104. The communication session between the first mobile handset 102 and the second mobile handset 104 is unidirectional. Mobile handset 102 is acting solely as a transmitter of compressed video data, whilst mobile handset 104 is acting solely as a receiver of compressed video data. This is termed the streaming arrangement. In the embodiment shown in Figure 2, the communication session between the first mobile handset 142 and the second mobile handset 144 is bi-directional. Mobile handset 142 is acting both as a transmitter and a receiver of compressed video data, and mobile handset 144 is also acting as a transmitter and receiver of compressed video data. This is termed the conversational arrangement. In Figure 2 the functions and operations of the second mobile handset 144 are identical to the functions and operations of the first mobile handset 142.
It will be appreciated that the transmitter function and operation of mobile handsets 142 and 144 is identical to that of the transmitter function and operation of mobile handset 102. It will also be appreciated that the receiver function and operation of mobile handsets 142 and 144 is identical to that of the receiver function and operation of mobile handset 104. In the case of mobile handsets 142 and 144, the receiver and transmitter functions and operations are present within the same mobile handset. In mobile handsets 102 and 104 only the transmitter or the receiver function is present respectively. It will be appreciated that the mobile handsets 142 and 144 can also operate exclusively as transmitters or receivers to produce the streaming arrangement, in addition to their conventional conversational arrangement, if so configured.
In Figure 1 the first mobile handset includes a transmitter controller 108, a Real Time Streaming Protocol (RTSP) server 116, an encoder 106, a Real Time Protocol (RTP) packetiser 117, a Real Time Control Protocol (RTCP) client 119 and a transmitter air interface 110, which are operatively interconnected with each other as shown. The encoder 106 accepts raw video data (RVD) from a video source such as a camera (not shown) associated with the first mobile handset 102 and encodes it into a compressed video data stream (CVDS) format, as discussed in detail below. This is then packetised by the RTP packetiser 117. According to the normal operation of a mobile communications network, the transmitter air interface 110 establishes a wireless channel with a transmit side air interface 112 in the network 100, which in turn is in communication with a network backbone 114.
The network 100 also includes a receive side air interface 118 that establishes a wireless channel with a receiver air interface 120 disposed in the second mobile handset 104. The second mobile handset 104 also includes a receiver controller 122, an RTSP client 123, an RTP depacketiser 125, an RTCP server 128 and a decoder 124. These are operatively interconnected with each other as shown.
In Figure 2 the mobile handsets 142 and 144 include all the components that are present in both the mobile handset 102 for the transmitter function, with the exception of the RTSP server 116, and the mobile handset 104 for the receiver function, with the exception of the RTSP client 123. The RTSP server 116 is replaced by a SIP User Agent (UA) 146 in mobile handset 142 and the RTSP client 123 is replaced by a SIP UA 148 in mobile handset 144. The components of the transmitter function, the components of the receiver function, the User Agent and the air interface for mobile handsets 142 and 144 are operatively interconnected with each other as shown. It will be appreciated that the function and operation of the mobile communications network shown in Figure 2 is as above.
In a UMTS network, the mobile handsets 102, 104, 142 and 144 are designated User Equipment (UE), the air interface elements 112 and 118 correspond to the Universal Terrestrial Radio Access Network (UTRAN), the backbone element 114 corresponds to the Core Network (CN).
According to normal mobile communications network operation, an end to end link is established between the first and second mobile handsets 102, and 104 (or 142 and 144), comprising a first wireless channel 126 between the first mobile handset and the network and a second wireless channel 127 between the network and the second mobile handset. Wireless channels are established using different frequencies and/or spreading codes and/or time slots in a manner well known in the mobile communications art. They allow for bi-directional communication, both for data and control information.
As part of the wireless channel control information, the wireless channel 126 between the transmitter air interface 110 and the transmit side air interface 112 carries Quality of Service parameters (QoS), from the network 100 to the first mobile handset 102 (or 142). Similarly, the wireless channel 127 between the air interface 118 and the receiver side air interface 120 carries Quality of Service parameters (QoS), from the network 100 to the second mobile handset 104 (or 144).
The packetised CVDS is transmitted over the end to end link defined between the two mobile handsets 102 and 104 (or 142 and 144), across wireless channels 126 and 127. In the preferred embodiment, the CVDS takes the form of an MPEG-4 stream, but other suitable streaming formats, such as H.263, can also be used. Both of these standards are applicable to variable bitrate and low bitrate video, e.g. bitrates of 10 kbps or higher. It is particularly preferred that the transmission be in RTP format, as discussed in detail below in relation to Figure 3, Figure 4 and Figure 5.
Turning to Figure 3, the packetisation of the raw MPEG-4 data for transmission is shown. This packetisation takes place in the first mobile handset 102 or 142 under the control of the transmitter controller 108 before sending the packets to the transmitter air interface 110. Upon emerging from the wireless network of the transmitter the packets travel over a packet switched network to the wireless network of the receiver. Here the packets are sent to the receive side air interface 118 and on to the second mobile handset 104 or 144 via wireless channel 127.
Figure 3 shows the packetisation layers for a single MPEG-4 packet 200 in the form in which it leaves the encoder 106. It will be appreciated that a stream of such packets will be generated from the incoming RVD. The MPEG-4 video data 200 is wrapped in an RTP format packet layer 201. This, in turn, is wrapped in a UDP format packet layer 202, which in turn is packetised into an Internet Protocol (IP) packet 203. It is this IP packet 203 that is presented by the RTP packetiser 117 to the transmitter air interface 110 for transmission over the wireless channel. This layering format, defined by IETF, is known and is included in the developing UMTS standard for video transmission.
Each of the packetisation layers of the packet is directed to a particular part of the overall communication. They will not be described in detail because they are already known in the art and conform to the respective standards. However the principal component of each packet will be described insofar as is necessary to understand the embodiments of the invention that follow. The MPEG-4 layer 200 contains the coded video data. The RTP layer 201 contains sequence numbers, time stamps, and payload bits that enable a depacketiser and decoder to decode it and replay at the correct time and in the correct sequence in relation to other packets from the same stream. The UDP layer is used for asynchronous communication of the data over the wireless communications channel and is a "best effort" connectionless protocol. The IP packet 203 contains an IP address which identifies the mobile receiver 104 or 144 as the destination. The IP packet header may also contain a Differentiated Services Code Point (DSCP) which could be used by a diffserv-enabled core network to determine how that packet should be forwarded by nodes inside that network.
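To make the RTP layer 201 concrete, the fixed RTP header (RFC 3550) carries exactly the sequence number and timestamp fields relied on above. The sketch below is illustrative: the dynamic payload type 96 and the dummy payload bytes are assumptions, since the real payload type is negotiated via SDP.

```python
import struct

def rtp_packet(seq, timestamp, ssrc, payload, payload_type=96):
    """Build the 12-byte RTP fixed header (version 2, no padding,
    no extension, no CSRCs, marker clear) followed by the payload."""
    byte0 = 2 << 6                      # V=2, P=0, X=0, CC=0
    byte1 = payload_type & 0x7F         # M=0, payload type in low 7 bits
    header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
    return header + payload

# Dummy 4-byte payload standing in for the MPEG-4 video data 200.
pkt = rtp_packet(seq=1, timestamp=90000, ssrc=0x1234, payload=b"\x00\x00\x01\xb6")
```

The UDP and IP layers 202 and 203 would then wrap this packet in the usual way; at the receiver, the depacketiser reads `seq` and `timestamp` back out to restore packet order and replay timing.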
Figure 4 shows the packetisation layers for a single RTCP packet 205 in the form in which it leaves the RTCP server 128. The RTCP packet is wrapped in a UDP format packet layer 206, and packetised into an IP packet 207.
Turning to Figure 5, the CVDS can be transmitted over the wireless channels in one or multiple substreams, each transported by an RTP session (and an associated RTCP session), where these are mapped to one or multiple wireless channels that may have different quality parameters. In this case the IP address 208 is common to both substreams. Routing through the transmission chain is achieved by characterising different substreams by different socket numbers 210 and 212 in the UDP address.
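The port-per-substream routing can be illustrated with ordinary UDP sockets: one IP address, one bound socket (and hence one UDP socket number) per substream. The ephemeral ports and loopback address below are illustrative; in the system described, the socket numbers would be agreed between the endpoints via SDP.

```python
import socket

def open_substream_sockets(n, host="127.0.0.1"):
    """One bound UDP socket per substream; the common IP address plus a
    distinct port number is all that distinguishes the substreams."""
    socks = []
    for _ in range(n):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind((host, 0))       # let the OS pick a free port
        s.settimeout(5.0)
        socks.append(s)
    return socks

rx = open_substream_sockets(2)
ports = [s.getsockname()[1] for s in rx]   # two ports, same IP address

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"I-frame data", ("127.0.0.1", ports[0]))  # base substream
tx.sendto(b"B-frame data", ("127.0.0.1", ports[1]))  # enhancement substream
base, _ = rx[0].recvfrom(2048)
enh, _ = rx[1].recvfrom(2048)
```

Each receiving socket corresponds to one RTP session, so demultiplexing happens entirely in the UDP layer without inspecting the video payload.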
It will be understood that some of the receiving handsets, or, indeed, the transmitting handset may not be capable of forming a multi wireless channel connection with the network. This may be because of equipment incompatibilities or network resource issues, for example. In this case, it is still possible for the video layers to be allocated to multiple wireless channels in accordance with the above embodiment, whilst multiplexing the video onto a single wireless channel for those not capable of forming the requisite multi wireless channel connection.
In the streaming arrangement of Figure 1, the IP packet of Figure 3 is transmitted directly from the first mobile handset 102, via the network 100, to the second mobile handset 104. Once received at the receiver air interface 120, the packets are forwarded to the RTP depacketiser 125, where the MPEG-4 data 200 is re-constructed. The packets must be re-ordered using RTP layer data 201 such as frame timestamps and the data from the plurality of substreams must be re-assembled. The reconstructed MPEG-4 data 200 is then sent from the RTP depacketiser 125 to the decoder 124, where it is decoded for replay on, for example, a visual display (not shown) on the second mobile handset 104.
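The re-ordering step at the depacketiser reduces to sorting buffered packets by their RTP sequence numbers before passing the data to the decoder. A sketch that ignores 16-bit sequence-number wraparound:

```python
def reassemble(packets):
    """packets: (sequence_number, payload) pairs, possibly arriving out
    of order, e.g. over different substreams. Returns the payloads
    concatenated in sequence order."""
    return b"".join(payload for _, payload in sorted(packets))
```

A real depacketiser would additionally use the RTP timestamps to schedule when each reconstructed frame is handed to the decoder for replay.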
In the return direction (i.e., from the second mobile handset 104 to the first mobile handset 102 in this example), control information is returned in accordance with the known RTCP and RTSP protocols, with the latter using the known Session Description Protocol (SDP).
Figure 8 and Figure 10 show the IP sessions between a streaming server and client. The RTCP sessions provide feedback on the data transmission quality for each RTP session. RTSP additionally provides an overriding control connection from RTSP client 123 to RTSP server 116, using SDP to provide a description of the connection between client and server. It is known in the art that, when a session changes, an RTSP control packet containing a new SDP packet is sent to the remote entity.
The packetisation of the SDP and RTSP information is shown in Figure 6. The SDP information 306 is wrapped in an RTSP packet 300. An RTSP packet 300 is wrapped in a Transport Control Protocol (TCP) packet 302, which is within an IP packet 304. This packet is built up by the RTSP client 123 and supplied to the receiver air interface 120 for transmission to the receive side air interface 118 of the network 100. The destination address of this RTSP packet is that of the first mobile handset 102.
In the preferred embodiment shown in Figure 1, the RTSP packet passes to the transmit side air interface 112 via the backbone 114 for transmission to the transmitter air interface 110, to the RTSP server 116 and thence to the transmitter controller 108. Also, control information is exchanged between both mobile handsets 102 and 104 in accordance with the known RTCP protocol.
In the conversational arrangement of Figure 2 the IP packet of Figure 3 is transmitted and received between the mobile handsets 142 and 144, via the network 100, in the same way as for the streaming arrangement described above. The encoded video packet is also processed in the same way as for the streaming arrangement described above.
For the control of bi-directional communications sessions the Session Initiation Protocol (SIP) is used instead of RTSP. This is managed by user agents 146 and 148 in mobile handsets 142 and 144 respectively. SIP uses SDP to provide a description of the connection between two peer user agents. It is known in the art that, when a session changes, a SIP control packet containing a new SDP payload is sent to the peer user agent.
Figure 9 and Figure 11 show the IP sessions between two peer user agents 146 and 148. The SIP session provides an overriding connection control between user agents 146 and 148 using the SDP. Again, control information is also exchanged between both mobile handsets 142 and 144 in accordance with the known RTCP protocol. The RTCP session will provide feedback on the data transmission quality for each RTP session.
The packetisation of the SIP information is shown in Figure 7, where the packet is denoted 308. An SDP payload 307 is encapsulated within the SIP packet 308. A SIP packet 308 is wrapped in a User Datagram Protocol (UDP) packet 310, which is within an IP packet 305. This packet is built up by the user agent 148 and supplied to the receiver air interface 120 for transmission to the receive side air interface 118 of the network 100. The destination address of this SIP packet is that of the other mobile handset 142. It will be understood that user agents 146 and 148 are interchangeable in this scenario.
In the preferred embodiment shown in Figure 2, the SIP packet passes to the transmit side air interface 112 via the backbone 114 for transmission to the transmitter air interface 110 and thence to the transmitter controller 108 via the user agent 146.
In use, a user of, say, the first mobile handset, 102 or 142, places a call to the second mobile handset 104 or 144, by dialling the second handset's mobile number.
The first mobile handset's number is mapped to a first IP address taken from a pool of IP addresses, and the second mobile handset's number is mapped to a second IP address taken from the pool of IP addresses. This mapping persists for as long as the connection is maintained. Once the connection is broken, usually because one or both of the users hang up, the mapping is removed and the IP addresses are returned to the pool for reuse. It will be appreciated that this arrangement means that packets can be routed using the allocated IP addresses instead of the phone numbers.
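The pool-based mapping described above behaves like a simple checkout/return table. The sketch below is illustrative only; the phone number and addresses are invented.

```python
class AddressPool:
    """Maps a phone number to an IP address for the life of a call and
    returns the address to the pool when the connection is broken."""

    def __init__(self, addresses):
        self.free = list(addresses)
        self.in_use = {}             # phone number -> IP address

    def map_number(self, number):
        if number not in self.in_use:
            self.in_use[number] = self.free.pop()
        return self.in_use[number]   # mapping persists while the call lasts

    def release(self, number):
        # On hang-up the mapping is removed and the address made reusable.
        self.free.append(self.in_use.pop(number))
```

While the mapping is in place, packets are routed on the allocated IP address rather than the dialled number, exactly as the paragraph above describes.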
There follows a description of the basic elements of call set-up at the wireless level to aid understanding of the invention.
When a user requests a call, the requested quality class of the wireless channel is communicated to the network 100. In particular, the first mobile handset 102 or 142 can request a particular QoS from the UMTS network, which specifies, for example, guaranteed and maximum bitrates. On this basis, and assuming there are sufficient network resources available, a wireless communications channel is established between the first mobile handset 102 or 142 and the network 100, the wireless channel having defined QoS criteria.
Alternatively the first mobile handset 102 or 142 might request the network resources as a number of wireless channels each with associated QoS (see later for multiple wireless channel discussion).
The second mobile handset 104 or 144 must similarly establish a connection with the network 100, establishing a wireless communications channel with QoS criteria independent of those of the wireless channel established by the first handset 102 or 142.
Once the wireless communications channel or set of channels is established, video data from, say, a camera (not shown) associated with the first mobile handset 102 or 142 is received by the encoder 106, which in turn generates a sequence of CVDS video data. The video data is sent to the RTP packetiser 117, where it is packetised as described above and sent to the second mobile handset as described above. The receiving mobile handset is mobile handset 104 if mobile handset 102 is the transmitter; mobile handset 144 if mobile handset 142 is the transmitter; and mobile handset 142 if mobile handset 144 is the transmitter.

As other users make and break wireless communications channels and the network 100 continuously monitors user resource allocations, it can be the case that the available bandwidth for any particular call changes, either increasing or decreasing. If a user has a multiple wireless channel allocation, one or more of the wireless channels may be terminated. It will be understood that these bandwidth changes can take place at a number of points along a given wireless communications channel. For example, they could take place between the first mobile handset 102 (or 142) and the network, or between the network and the second handset 104 (or 144).
Other factors, such as distance of a handset from a base station with which it is communicating, or strong multi-path reflections, can also cause the effective bandwidth or quality of a wireless channel to be reduced.
Any change in effective bandwidth or quality can have two consequences. If the bandwidth increases, it could in principle be possible to transmit video data at a higher bitrate. If it decreases, however, the rate at which video data can reliably be transmitted also decreases, possibly below the value that was set at the start of transmission. To accommodate these consequences, the QoS parameter set, including the available bitrate and bit error rate on the wireless channel between the first mobile handset 102 or 142 and the network 100, is monitored by the controller 108 at the first mobile handset 102 or 142.
In the preferred form, at session initiation and when wireless channel characteristics are to be modified, the UMTS network provides the first mobile handset with a QoS parameter set that is indicative of the available bitrate (i.e. bandwidth) on the wireless communications channel. Such a QoS parameter set is supplied through the protocol stack in known wireless systems from the air interface 112 of the network to the air interface 110 of the transmitting mobile handset 102 or 142. It is normally supplied across a wireless control channel on the downlink of the call.
According to the developing UMTS standard, the QoS parameter set is indicative of various transmission parameters, including the transmissible bitrate over the wireless channel, the signal to noise ratio, the error rate and a priority indicator, which is an indication provided from the network to the transmitting mobile handset of the likely priority to be placed on the call. This is therefore an indicator of the bandwidth and likely reliability of the wireless communication channel that has been opened for the particular wireless channel. It will be appreciated in this context that the word "call" is used herein to describe the transmission of video data as well as or instead of voice data. The QoS parameter set is read at the mobile handset 102 or 142 by the controller 108 and the transmissible bit rate is extracted from it.
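Extracting a drive rate for the encoder from the QoS parameter set might look as follows. The dictionary keys are illustrative stand-ins for the UMTS bearer attributes (guaranteed bitrate, maximum bitrate, priority), and the decision rule is an assumption for the sake of the sketch, not taken from any standard.

```python
def transmissible_bitrate(qos):
    """Choose the bit rate at which to drive the encoder from a QoS
    parameter set. Key names are placeholders, not UMTS field names."""
    guaranteed = qos.get("guaranteed_bitrate_kbps", 0)
    maximum = qos.get("max_bitrate_kbps", guaranteed)
    # Conservative rule: when the network signals a low call priority,
    # trust only the guaranteed rate; otherwise use up to the maximum.
    return guaranteed if qos.get("priority", 1) > 2 else maximum
```

The controller 108 would feed the returned rate into the control instructions it sends to the encoder 106, closing the feedback loop described above.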
The quality of the wireless communication channel between the network 100 and the second mobile handset 104 or 144 is also monitored. In the preferred form, a QoS parameter set indicative of, amongst other things, the available bandwidth for the RTP session mapped onto this wireless channel is ascertained in the second mobile handset 104 or 144 derived from the wireless control channel information it receives from the network according to the relevant wireless standard (in this case UMTS).
The QoS parameter set is dealt with at the second mobile handset 104 or 144 in a novel manner. The session control protocols (both the bi-directional SIP and the unidirectional RTSP) have already been discussed. SDP provides for the exchange and updating of session description information such as codec and bitrate.
For the streaming arrangement (shown in Figure 1), according to the existing RTSP standard, various control parameters are conveyed by the RTSP packets including, for example, video commands such as PLAY and PAUSE. The standard also provides an ANNOUNCE instruction. The system described herein uses the ANNOUNCE provision in the RTSP standard to cause elements of the QoS parameter set determined in the wireless environment and/or other derived parameters to be placed into an SDP payload which itself is placed in an RTSP packet for transmission from the second mobile handset 104 to the first mobile handset 102. The thus constructed novel packets are transmitted by the RTSP client 123, via the receiver air interface and the receive side air interface, to the network backbone 114. From here they travel to the RTSP server 116 via the transmission side air interface 112 and the transmitter air interface 110. For the conversational arrangement (Figure 2), according to the existing SIP standard, SDP payloads are conveyed by the SIP packets to control the bidirectional communication between two mobile handsets. A session is initiated using the INVITE instruction, which itself contains a session description in the SDP format. The standard provides for a session to be modified by either agent by issuing a subsequent INVITE instruction. The system described herein uses the re-INVITE provision in the SIP standard to cause the quality parameters determined in the wireless environment and/or other derived parameters to be placed into an SDP payload which itself is placed in a SIP packet for transmission from the receiving mobile handset 144 to the transmitting mobile handset 142. The thus constructed novel packets are transmitted by the session control agent 148, via the receiver air interface 120 and the receive side air interface 118, to the network backbone 114.
From here they travel to the session control agent 146 via the transmission side air interface 112 and the transmitter air interface 110.
It can be understood that user agents 146 and 148 are interchangeable in this scenario. For both streaming and conversational arrangements the RTP sessions carrying the video data each have associated RTCP sessions carrying control information back to the transmitter. The system described herein can, in addition to or instead of using RTSP or SIP, use the RTCP application defined (APP) packet to transfer application data (in this case the wireless and other derived QoS parameters) from the receiver to the transmitter.
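The RTCP APP mechanism referred to above can be sketched as follows. The header layout and packet type 204 follow RFC 3550, but the 4-byte application name "QOSP" and the payload layout (requested bitrate plus receive-side bit error rate) are assumptions made for illustration; the patent does not fix the packet contents.

```python
import struct

RTCP_APP = 204  # RTCP APP packet type defined in RFC 3550

def build_qos_app_packet(ssrc: int, rbr_kbps: int, ber: float) -> bytes:
    """Pack receiver-derived QoS parameters into an RTCP APP packet."""
    name = b"QOSP"                               # assumed application name
    payload = struct.pack("!If", rbr_kbps, ber)  # assumed payload layout
    body = name + payload
    # Header: V=2, P=0, subtype=0 -> first byte 0x80; the length field is
    # the packet size in 32-bit words minus one.
    length_words = (8 + len(body)) // 4 - 1
    return struct.pack("!BBHI", 0x80, RTCP_APP, length_words, ssrc) + body
```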
In both the streaming (Figure 1) and conversational (Figure 2) arrangements, when a session control packet is received at the transmitting mobile handset, this along with locally derived QoS parameters can be used to modify the encoded bitrate as already discussed above. In this way, the bitrate of the video stream transmitted from the encoder can be adapted to the wireless channel between a transmitting mobile handset and the network and also to the wireless channel between the network and a receiving mobile handset. In the streaming arrangement, and now referring to Figure 8 and Figure 10, there are shown two ways of transmitting a CVDS from an encoder 106 to a decoder 124 using RTP between an RTP packetiser 117 and an RTP depacketiser 125. RTCP control messages are sent between RTCP server 128 and RTCP client 119. RTSP control messages are sent between an RTSP server 116 and an RTSP client 123. Figure 8 shows a prior art method of streaming a CVDS transported by an RTP session 804 and associated RTCP session 805. RTSP control messages are sent via RTSP session 806. It will be appreciated that at least some part of the end-to-end communications channel is wireless. In Figure 8 the RTP session containing all the frames of the video stream and the RTCP session containing all RTCP control messages are multiplexed onto a single wireless channel, with the CVDS parameters and frame sequencing being derived according to the QoS parameters of the wireless channel at call set up.
Figure 9 shows the prior art for the conversational arrangement. Each mobile handset is depicted with an RTP packetiser (117 or 134), an RTP depacketiser (133 or 125) and a user agent (146 or 148). For a conversational arrangement, a CVDS is transmitted from RTP packetiser 117 to RTP depacketiser 125 using RTP session 814 and RTCP session 815 simultaneously with a CVDS transmitted from RTP packetiser 134 to RTP depacketiser 133 using RTP session 816 and RTCP session 817. A single SIP session 818 controls all the RTP/RTCP sessions. In Figure 9, as in Figure 8, the RTP sessions containing all the frames of the video stream, the RTCP sessions corresponding to these and the SIP session containing all of the SIP control messages are multiplexed onto a single wireless channel, with the CVDS parameters and frame sequencing being selected according to the QoS parameters of the wireless channel at call set up.
Figure 10 shows a preferred embodiment of the invention for the streaming arrangement, in which there is a plurality of wireless channels. In the example shown there are three wireless channels. RTP/RTCP sessions 824/825 and 826/827 are used for transmitting the two substreams of a CVDS and RTSP control messages are sent via RTSP session 828. The wireless channels are provided at call set up as previously described, in a situation where the encoder 106, RTP packetiser 117, RTCP client 119 and RTSP server 116 are in a first mobile handset, and the decoder 124, RTP depacketiser 125, RTCP server 128 and RTSP client 123 are in a second mobile handset, in accordance with requested QoS and bandwidth parameters according to the UMTS standard. As with previous embodiments, the CVDS substreams are transported via RTP between the RTP packetiser 117 and RTP depacketiser 125, RTCP control messages are sent between RTCP server 128 and RTCP client 119 and control messages are transmitted via an RTSP session between the RTSP server 116 and RTSP client 123. As shown in the example of Figure 10, one RTSP session covers all the RTP/RTCP sessions between any two entities while each CVDS substream requires an individual RTP session and an associated RTCP session.
Figure 11 shows a preferred embodiment of the invention for the conversational arrangement, in which there is a plurality of wireless channels. In the example shown there are three wireless channels. RTP/RTCP sessions 834/835 and 836/837 are used for transmitting the two substreams of a CVDS generated by the encoder 106. RTP/RTCP sessions 838/839 and 840/841 are used for transmitting the two substreams of a CVDS generated by the encoder 129. The SIP control messages are sent via SIP session 842. Figure 11 shows that the base layers produced by encoders 106 and 129, carried by RTP sessions 834 and 838 and packetised by RTP packetisers 117 and 134, use the up and down links of the same wireless bearer at each mobile handset. Similarly, the enhanced layers from the encoders 106 and 129, carried by RTP sessions 836 and 840 and packetised by RTP packetisers 117 and 134, use the up and down links of the same wireless bearer at each mobile handset. It will be appreciated that there are many other possible mappings of RTP/RTCP sessions to wireless bearers. The wireless channels are provided at call set up as previously described and in accordance with requested QoS and bandwidth parameters according to the UMTS standard. As shown in the example, one SIP session 842 covers all the RTP/RTCP sessions between any two entities while each CVDS substream requires an individual RTP session and an associated RTCP session.
Since wireless channels for the transmit side and receive side are allocated separately there is no guarantee that the number of transmit side and receive side wireless channels will be the same. In the streaming arrangement shown in Figure 12 there is one transmit side wireless channel and three receive side wireless channels. On the receive side RTP/RTCP sessions 844/845 and 846/847 and RTSP session 848 are each mapped to a separate wireless channel. On the transmit side all IP sessions 844-848 are mapped to the same wireless channel.
Figure 13 shows the situation for the conversational arrangement, where for the handset containing user agent 146 there is only one wireless channel, whilst for the handset containing user agent 148 there are three wireless channels. In the handset containing user agent 148 the RTP/RTCP sessions 853/854 and 857/858 are mapped to the same wireless channel. However, the RTP/RTCP sessions 851/852 and 855/856 are mapped to another wireless channel and the SIP control session 859 is mapped to yet another wireless channel. In the handset containing user agent 146 all IP sessions of all types 851-859 are mapped to the same wireless channel.
Under the proposed UMTS standard (and others), wireless channels can be defined in terms of a number of quality of service parameters, such as priority, maximum and guaranteed bandwidth, residual bit error rate and delay.
In the embodiment of Figure 10, the first RTP session 824 is defined as carrying the base substream, having an example bitrate of 16kbps, whilst the second RTP session 826 has a bitrate of 32kbps. The first wireless channel has the lowest bitrate but highest priority and the base substream is allocated to it. The enhancement substream is allocated to the second wireless channel, since it has the lower priority. This ensures that the most important video data is allocated to the wireless channel with the highest priority. The first RTP session can also be marked with the highest priority DSCP for prioritised transport over the IP component of a diffserv enabled core network.
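The allocation rule just described (the most important substream onto the highest-priority wireless channel) can be sketched as follows. The channel and substream structures are illustrative assumptions, not part of the specification:

```python
def allocate_substreams(substreams, channels):
    """Map substreams, listed most-important-first, onto wireless channels
    in descending order of channel priority, so the most important video
    data is carried by the highest-priority channel."""
    ranked = sorted(channels, key=lambda c: c["priority"], reverse=True)
    return {name: ch["id"] for name, ch in zip(substreams, ranked)}
```

With the Figure 10 example values (a 16kbps high-priority channel and a 32kbps lower-priority channel), the base substream lands on the high-priority channel and the enhancement substream on the other.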
As discussed above in relation to other embodiments, the allocation of resources within a UMTS network is dynamic, and this can mean that the bandwidths allocated to either of the RTP sessions can fluctuate with (amongst other things) network load. In the preferred embodiment shown in Figure 1 and Figure 2, the bandwidth available for each wireless channel is known to the transmitter controller 108 as it monitors the network messages at the transmitter air interface 110. In the event that the available bandwidth on one or more of the wireless channels is commanded by the network 100 to be reduced, an assessment is made as to whether it is desirable to reallocate the frames between substreams. For example, assuming the substream frame structure of Figure 16, if the second RTP session bandwidth fell to, say, 16kbps, an assessment would be made to determine whether to simply reduce the number of P and B frames generated and leave other substreams unchanged, or whether a better overall quality would be achieved by including some of the P and B frames on the base substream at the expense of, say, reduced resolution in the I frames.
It is not necessarily the case that reallocation will happen automatically and immediately without any assessment of context. In one form, the preferred embodiment is configured to maintain a history of wireless channel behaviour in relation to the quality parameter. As an example, a sudden drop in bandwidth on a wireless channel to which a relatively high priority substream or frame type is mapped may not be a trigger for the frame mapping to be changed. If there is a history of short-term bursts of bandwidth loss, it is likely that the higher bandwidth will be available again shortly, and it may ultimately be more efficient to allow the short-term reduction to be ignored. Typically, an assessment of this type will be made by the transmitter controller 108. It will be understood that quite sophisticated proportional, integral and differential factors can be taken into account to build a relatively sophisticated model of any wireless channel's behaviour (and likely future behaviour) over time. Such modelling is well known to those skilled in the relevant art, and so is not described further here.
Similar history data can be collected for the other types of quality data collected in earlier embodiments of the invention, and similarly used to make decisions about how and when to alter the outputs of, for example, the encoder. In general, if there is a history of short-term bandwidth problems, then it may be more efficient, or may provide a visibly better overall streamed video image, if the bitrate out of the encoder is not immediately altered when the bandwidth initially drops. Rather, it will in some cases be preferable to wait until the bandwidth has remained low for a predetermined time period or number of frames before changing the output of the encoder.
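The wait-before-reacting behaviour just described can be sketched as a simple hysteresis rule. The threshold and hold period below are assumed values chosen for illustration; a fuller implementation might use the proportional, integral and differential factors mentioned earlier:

```python
class BandwidthHistory:
    """Defer encoder bitrate changes until bandwidth has remained low for a
    predetermined number of consecutive reports."""

    def __init__(self, low_kbps: int, hold_reports: int):
        self.low_kbps = low_kbps        # threshold below which bandwidth is "low"
        self.hold_reports = hold_reports  # how long the drop must persist
        self.low_run = 0                # consecutive low-bandwidth reports so far

    def should_reduce_bitrate(self, reported_kbps: int) -> bool:
        if reported_kbps < self.low_kbps:
            self.low_run += 1
        else:
            self.low_run = 0  # the short-term dip ended; ignore it
        return self.low_run >= self.hold_reports
```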
Wireless channels between the mobile handset and the network have a certain QoS, which is provided for the mobile user of a network service. In an embodiment of the invention, a set of QoS parameters including the Bitrate (BR) and Bit Error Rate (BER) is used for controlling the video encoder. These QoS parameters are conveyed between the encoder controller and the decoder controller via IP sessions. Referring to Figure 14, the wireless channel between transmitter 102 and the network has QoS parameters BR and BER. The wireless channel between the network and receiver 104 has QoS parameters BR' and BER'. The encoder controller 108 sends BR to the decoder controller 122 via an RTSP session 866. Having received BR from the encoder controller 108, the decoder controller 122 sends BER' and the calculated Request Bitrate (RBR) to the encoder controller 108 via the RTSP session 866 or an RTCP session 865. The encoder controller and the decoder controller will be discussed in detail below.
The encoder controller 108 is used to control the video encoder 106 with the objective of improving the error resilience of video encoding while meeting the bitrate constraint of wireless channels.
In a preferred embodiment of the invention, the video encoder is an MPEG-4 or H.263 compliant encoder. The input video source is encoded into an MPEG-4 or H.263 compliant bit stream. The video data can be constructed using a plurality of different types of frames, which are referred to as I, P and B frames. I frames are self-contained still frames of video data, whereas P and B frames constitute intermediate frames that are predictively encoded. The precise composition of the frames varies in accordance with the particular standards and application of the standards and is known per se.
MPEG-4 specifies a plurality of layers, including the base layer and enhanced layers, in which each layer is comprised of a sequence of frames which may be of the same type (I, P, B) or a mixture of types. As already described, in a wireless network the mobile may be allocated one wireless channel or a plurality of wireless channels. In the preferred form, each wireless channel is used to transmit a single RTP/RTCP session pair. Each RTP session carries an optimum sequence of I, P and B frames, known as a substream. The term "substream" is used rather than "layer" because the frame sequencing onto wireless channels can be varied dynamically and need not be one of the layer sequences predefined in MPEG-4 or other known video encoding standards. Other partitions of coded video data for error resilient purposes (e.g. Data Partitioning Modes of MPEG-4/H.263) are also possible and could be represented by substreams.
Figure 16 and Figure 17 illustrate example compositions of a video data stream in accordance with the MPEG-4 video standard. In the example of Figure 16, two temporally scalable substreams are used, with I frames on the base substream while P and B frames are carried on the enhanced substream. In the example of Figure 17, only the base substream is used, comprising interleaved I and P frames. The encoder controller can thus control the bitrate of the video data stream for each wireless channel by manipulating the number of substreams used for transmission, and the number and type of frames per substream.
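The two-substream split with I frames on the base substream and P and B frames on the enhanced substream can be sketched as:

```python
def split_into_substreams(frame_types):
    """Partition a sequence of frame types into a base substream (I frames)
    and an enhanced substream (P and B frames)."""
    base = [f for f in frame_types if f == "I"]
    enhanced = [f for f in frame_types if f in ("P", "B")]
    return base, enhanced
```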
Video encoding under the MPEG-4 or H.263 standards operates on a frame-by-frame basis. Each frame is divided into either Groups of Blocks (GOBs) or slices. A GOB comprises macroblocks of one or several rows in a video frame. Slices are more flexible and can partition the frame into a variable number of macroblocks. A macroblock in turn comprises four luminance and two spatially corresponding colour difference blocks of image data. All blocks in an I-frame are intra-coded. Blocks in an inter-coded P or B-frame can be a mixture of intra-coded blocks (I-blocks) and inter-coded blocks (P-blocks or B-blocks).
Increasing the I-block/P-block ratio (Ib/Pb) in P-frames or the I-block/B-block ratio (Ib/Bb) in B-frames has two consequences: (1) it improves error resilience, because more intra-coded blocks result in less error propagation; and (2) it increases the bitrate, because inter-coded blocks comprise substantially smaller amounts of data than intra-coded blocks. The encoder controller controls the encoder to make the best use of the wireless channel for error resilient video encoding.
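This trade-off can be sketched as a rule that raises the intra-block ratio with the error rate but caps it by the available bitrate headroom. The thresholds below are illustrative assumptions, not values from the specification:

```python
def choose_intra_ratio(ber: float, headroom: float) -> float:
    """Pick an I-block ratio for P or B frames: a higher bit error rate calls
    for more intra-coded blocks (less error propagation), but each extra
    I-block costs bitrate, so the ratio is capped by the headroom (0..1),
    the fraction of spare bitrate on the channel."""
    wanted = min(1.0, ber / 1e-3)              # assumed: all-intra at BER 1e-3
    affordable = max(0.0, min(1.0, headroom))  # clamp headroom to [0, 1]
    return min(wanted, affordable)
```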
Error control can also be achieved by allocating GOBs or slices within a frame, wherein the header of each GOB or slice serves as a synchronisation marker from which the decoder can regain synchronisation.
Figure 18 illustrates the operation of the encoder controller 108. The encoder controller operates by a closed-loop process. In the first step 400, the encoder controller obtains relevant information from various sources, including the BR and BER associated with the wireless channel between the encoder and the network from the air interface, the RBR and BER' via an IP control session from the decoder, the latency jitter (ΔL) of RTP packets from the RTCP client 119 and the instantaneous bitrate (IBR) from the encoder. In the second step 410, the encoder controller determines the target bitrate (BRtarget) and the frame type (FT) based on the BR, RBR and IBR. In the third step 420, the encoder controller determines the Ib/Pb ratio for P-frames and the Ib/Bb ratio for B-frames, and the synchronisation marker rate Rsync for all frames. In the fourth step 430, the encoder controller determines the quantisation parameter (QP) and the frame rate (FR) for the frame based on the Ib/Pb or Ib/Bb, and BRtarget. In the fifth step 440, the encoder controller sends the encoding parameters FT, FR, Rsync, Ib/Pb or Ib/Bb, and QP to the encoder. In the sixth step 445, the encoder controller sends BR via an IP control session to the decoder. Then the encoder controller returns to the first step 400.
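One pass of this closed loop can be sketched as follows. The patent leaves the actual decision formulas open, so every rule below (the bottleneck minimum, the frame-type choice, the sync-marker and quantiser heuristics) is an assumption made for illustration only:

```python
def encoder_controller_step(br, rbr, ibr, ber, ber_prime):
    """One iteration of the Figure 18 loop, steps 410-430 (illustrative rules)."""
    br_target = min(br, rbr)                         # step 410: honour both channels
    frame_type = "P" if ibr <= br_target else "I"    # step 410: coarse, assumed rule
    ib_ratio = min(1.0, max(ber, ber_prime) / 1e-3)  # step 420: resilience vs bitrate
    r_sync = 1 + int(10 * ib_ratio)                  # step 420: sync markers per frame
    qp = max(2, min(31, int(31 * ibr / max(br_target, 1))))  # step 430: quantiser
    fr = 15 if br_target >= 32 else 10               # step 430: frame rate (assumed)
    return {"FT": frame_type, "FR": fr, "Rsync": r_sync,
            "Ib_ratio": ib_ratio, "QP": qp, "BRtarget": br_target}
```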
Figure 19 illustrates the operation of the decoder controller 122. The decoder controller operates by a closed-loop process. In the first step 450, the decoder controller obtains relevant information from various sources, including the BR' and BER' associated with the wireless channel between the decoder and the network from the air interface, and the BR associated with the wireless channel between the encoder and the network via an IP session from the encoder. In the second step 460, the decoder controller determines the ΔL of RTP packets received. In the third step 470, the decoder controller calculates the RBR based on ΔL, BR and BR'. In the fourth step 480, the decoder controller sends the RBR and BER' via an IP control session to the encoder controller. Then the decoder controller returns to the first step 450.

Figure 20 illustrates the operation of the encoder 106. The encoder operates on a frame-by-frame basis. In the first step 490, the encoder obtains the encoding parameters, including FT, FR, Rsync, Ib/Pb or Ib/Bb, and QP, from the encoder controller. In the second step 492, the encoder allocates GOBs or slices for inter-coded frames based on Rsync. In the third step 494, the encoder further allocates the I-block distribution within P or B-frames based on Ib/Pb or Ib/Bb. In the fourth step 496, the encoder encodes the frame using the above encoding parameters and adds it to the CVDS. In the fifth step 498, the encoder calculates the IBR and sends it to the encoder controller. Then the encoder returns to the first step, to process the next frame.
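The RBR calculation in step 470 can be sketched as follows. The jitter-driven back-off rule is an assumption, since the patent does not fix the formula:

```python
def requested_bitrate(br, br_prime, jitter_ms):
    """Figure 19, step 470: derive the RBR from the transmit-side BR, the
    receive-side BR' and the observed latency jitter (illustrative rule)."""
    bottleneck = min(br, br_prime)  # neither wireless channel may be exceeded
    backoff = 0.75 if jitter_ms > 100 else 1.0  # high jitter: request less (assumed)
    return int(bottleneck * backoff)
```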
Although various aspects of the invention have been described with reference to specific embodiments, it will be appreciated by those skilled in the art that the invention can be embodied in many other forms.
DEFINITIONS
Unless the contrary intention clearly appears from the context in which certain words are used, the following definitions apply to words used in this specification:
WIRELESS CHANNEL; a physical radio channel with associated QoS parameters, e.g. UMTS Radio Access Bearer.
END TO END LINK; an end to end communications link between transmitter and receiver containing a transmit side wireless channel and / or a receive side wireless channel;
IP SESSION; An IP communications session between two IP hosts carrying either control or application data. Examples are a RTP session, a RTCP session, a RTSP session and a SIP session. One or more IP sessions can be mapped to a wireless channel;
COMPRESSED VIDEO DATA STREAM (CVDS); an overall video stream, where the original stream of image frames is compressed by means of an encoder;
FRAME; a video encoder outputs a number of different frame types. These include full still images and derivative images that have different data transmission requirements, have different sensitivity to errors and may have dependency on other frames. In the MPEG-4 video standard, frames are also known as Video Object Planes (VOPS);
SUBSTREAM; a CVDS may be split into a number of substreams for the purposes of transmission over a channel or plurality of channels. Each substream can be used as a means of transmitting a sequence of video frames that may be of different types. A substream is not necessarily the same as a layer as defined in video standards such as MPEG-4 and H.263. A substream could also be used to transmit any part of the coded frame data that can be successfully partitioned for error resilient purposes (e.g. DCT coefficients and motion vector data in data partitioning modes of MPEG-4/H.263). Each substream is transported by an RTP session and an associated RTCP session.
RTCP SERVER; an entity generating RTCP Receiver Reports based on the reception of RTP packets. These are sent to the transmitter of the RTP packets, where an RTCP client uses them.
RTCP CLIENT; an entity that uses RTCP Receiver Reports. These are sent from the receiver of the RTP packets, where an RTCP server generates them.

Claims

1. A method of transmitting a compressed video data stream (CVDS) generated by an encoder from a transmitter to a receiver over an end to end communications link, said end to end link including a transmit side wireless channel between the transmitter and the receiver, the method including the steps of: ascertaining a quality parameter indicative of the transmission capability of the wireless channel between the transmitter and the receiver; and at the transmitter, altering the bit rate of the compressed video data stream generated by the encoder in dependence on the quality parameter and other information.
2. A method according to claim 1, wherein the end to end link comprises more than one transmit side wireless channel and where: each channel has an associated quality parameter; the CVDS is divided into a plurality of substreams with each substream transmitted over a separate wireless channel.
3. A method according to any preceding claim, wherein the transmitter includes a controller and a transmitter air interface, the method including the steps of: determining the quality parameter at the transmitter air interface; supplying to the controller interface data derived from the quality parameter; and sending to the encoder control instructions from the controller to effect the alteration of the bit rate based on said interface data and possibly other data.
4. A method according to claim 3, wherein the control instructions define one or more encoding factors selected from: a number of video substreams; a mapping of video encoder frame types onto each substream; and per substream: an available data rate; a video frame rate; a video encoding quantisation factor; and a video frame resolution; the encoder effecting the alteration of the bit rate of the CVDS by applying said one or more encoding factor.
5. A method according to claim 4, wherein the video data stream comprises a plurality of frames of differing types corresponding to each final video image frame, and wherein the ratios of said frames of differing types are alterable by the encoder.
6. A method according to any preceding claim, wherein the quality parameter includes a transmissible data rate capability of the wireless communications channel.
7. A method according to any preceding claim, wherein additional information is supplied to the transmitter controller from the receiver regarding quality parameters of the overall communications link between transmitter and receiver.
8. A method according to any one of the preceding claims, wherein the CVDS includes audio and visual content.
9. A method according to any one of the preceding claims, wherein the end to end communication link uses a UMTS communications network.
10. A transmitter for transmitting a compressed video data stream to a receiver over an end to end communications link, said end to end link including a transmit side wireless channel between the transmitter and the receiver, the transmitter comprising: a wireless interface arranged to determine a quality parameter indicative of the transmission capabilities of the transmit side wireless channel; an encoder arranged to generate said compressed video data stream; and a controller operatively connected to the wireless interface and to the encoder for altering the bit rate of the compressed video data stream generated by the encoder in dependence on the quality parameter and possibly other information.
11. A transmitter according to claim 10, wherein the end to end link comprises more than one transmit side wireless channel and where: the quality parameter may be different for each channel; the CVDS may be divided into one or a plurality of substreams with each substream transmitted over a separate wireless channel.
12. A transmitter according to claim 10 or 11, wherein the transmitter includes a controller and a transmitter air interface, the transmitter being configured to: determine the quality parameter at the transmitter air interface; supply to the controller interface data derived from the quality parameter; and send to the encoder control instructions from the controller to effect the alteration of the bit rate based on said interface data and possibly other data.
13. A transmitter according to claim 12, wherein the control instructions define one or more encoding factors selected from: a number of video substreams; a mapping of video encoder frame types onto each substream; and per substream: an available data rate; a video frame rate; a video encoding quantisation factor; and a video frame resolution; the encoder effecting the alteration of the bit rate of the CVDS by applying said one or more encoding factor.
14. A transmitter according to claim 13, wherein the video data stream comprises a plurality of frames of differing types corresponding to each final video image frame, and wherein the ratios of said frames of differing types are alterable by the encoder.
15. A transmitter according to any one of claims 11 to 14, wherein the quality parameter includes a transmissible data rate capability of the wireless channel.
16. A transmitter according to any one of claims 11 to 15, wherein additional information is supplied to the transmitter controller from the receiver regarding quality parameters and other information of the overall communications link between transmitter and receiver.
17. A transmitter according to any one of claims 11 to 16, wherein the CVDS includes audio and visual content.
18. A transmitter according to any one of claims 11 to 17, wherein the end to end communication link uses a UMTS communications network.
PCT/GB2001/004890 2000-12-22 2001-11-05 Feedback control to encoder WO2002052858A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0031539.0 2000-12-22
GBGB0031539.0A GB0031539D0 (en) 2000-12-22 2000-12-22 Feedback control to encoder

Publications (1)

Publication Number Publication Date
WO2002052858A1 true WO2002052858A1 (en) 2002-07-04

Family

ID=9905809

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2001/004890 WO2002052858A1 (en) 2000-12-22 2001-11-05 Feedback control to encoder

Country Status (2)

Country Link
GB (1) GB0031539D0 (en)
WO (1) WO2002052858A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0763944A2 (en) * 1995-09-18 1997-03-19 Oki Electric Industry Company, Limited Video coder, decoder and transmission system
US6014694A (en) * 1997-06-26 2000-01-11 Citrix Systems, Inc. System for adaptive video/audio transport over a network
WO2000005898A2 (en) * 1998-07-23 2000-02-03 Optivision, Inc. Scalable video coding and decoding
US6141486A (en) * 1993-01-13 2000-10-31 Hitachi America, Ltd. Methods and apparatus for recording digital data including sync block and track number information for use during trick play operation
WO2000067469A1 (en) * 1999-04-29 2000-11-09 Nokia Corporation Data transmission


Non-Patent Citations (1)

Title
KHANSARI M ET AL: "Approaches to layered coding for dual-rate wireless video transmission", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) AUSTIN, NOV. 13 - 16, 1994, LOS ALAMITOS, IEEE COMP. SOC. PRESS, US, vol. 3 CONF. 1, 13 November 1994 (1994-11-13), pages 258 - 262, XP010145966, ISBN: 0-8186-6952-7 *

Cited By (3)

Publication number Priority date Publication date Assignee Title
US10154317B2 (en) 2016-07-05 2018-12-11 BoxCast, LLC System, method, and protocol for transmission of video and audio data
US11330341B1 (en) 2016-07-05 2022-05-10 BoxCast, LLC System, method, and protocol for transmission of video and audio data
US11483626B1 (en) 2016-07-05 2022-10-25 BoxCast, LLC Method and protocol for transmission of video and audio data

Also Published As

Publication number Publication date
GB0031539D0 (en) 2001-02-07

Similar Documents

Publication Publication Date Title
EP1552655B1 (en) Bandwidth adaptation
Wu et al. On end-to-end architecture for transporting MPEG-4 video over the Internet
US9191664B2 (en) Adaptive bitrate management for streaming media over packet networks
US7346007B2 (en) Bandwidth adaptation
CA2470128C (en) Data transmission
JP4472347B2 (en) Streaming multimedia data over networks with variable bandwidth
US8780960B2 (en) Adjustment of transmission data rate based on data errors and/or latency
US8612620B2 (en) Client capability adjustment
US20090222873A1 (en) Multimedia Channel Switching
WO2002052860A1 (en) Video layer mapping
KR101064002B1 Mobile terminal with multiple interfaces and multimedia streaming reception method, multimedia streaming providing server using multiple networks and method thereof
Feamster Adaptive delivery of real-time streaming video
WO2002052859A1 (en) Feedback control from decoder
Viéron et al. Real-time constrained TCP-compatible rate control for video over the Internet
WO2002052858A1 (en) Feedback control to encoder
WO2003041413A1 (en) Error control to video encoder
Futemma et al. TFRC-based rate control scheme for real-time JPEG 2000 video transmission
Zhu et al. Research on adaptive transmission of H.264 video stream and QoS guarantee based on SIP
WO2007031924A2 (en) Video telephone system, video telephone terminal, and method for video telephoning
Schulzrinne Transport protocols for multimedia
Zheng Scalable multiple description coding and distributed video streaming over 3G mobile networks
Huang et al. A new rate adaptation framework for MPEG-4 FGS video over IP
Searles et al. Adaptive Multicast Architecture for Streamed MPEG-4 over IP Networks
KR20080094417A (en) Wireless network information transmission method and apparatus in real-time multimedia services

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
121 Ep: The EPO has been informed by WIPO that EP was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP