VIDEO LAYER MAPPING
FIELD OF THE INVENTION
The present invention relates to video streaming, and, more particularly, to the streaming of video data over a wireless communications network.
The invention has been developed primarily to allow video to be streamed in a UMTS or GPRS mobile telecommunications network using streamable formats such as MPEG-4 and H.263. However, it will be appreciated by those skilled in the art that the invention is not limited to use with those particular standards.
BACKGROUND TO THE INVENTION
Recent video compression standards have moved towards a layered approach to compression, which allows a video stream so encoded to be tailored to meet the requirements of a fixed bandwidth communications channel. Many recent codecs, such as MPEG-4 and H.263, can be configured to generate a compressed video stream that is defined by multiple prioritised layers. The first, and highest, priority layer is usually referred to as the base layer. One or more further layers, called enhancement layers, are also generated by the encoder for transmission with the base layer. Each enhancement layer adds quality to the final viewable video in the form of enhanced detail at the same framerate, a higher framerate, or both. At the decoder, the received enhancement layers are combined with the base layer to generate the final viewable video image.
By way of example, a computer connected to the Internet can be configured to accept an encoded video stream for replay to a user. However, if the encoded video stream is to be played back over, say, a relatively slow dial-up connection having a bandwidth less than that of the fully enhanced stream, then one or more enhancement layers can be excluded from transmission to reduce
the bandwidth, thereby enabling real time video transmission. Typically, the lower priority enhancement layers will be excluded in preference, and then other sequentially higher priority layers until the desired bandwidth is achieved. Naturally, a viewer with access to wide bandwidth via, say, a cable modem, might be able to accept all of the base and enhanced layers, and would therefore obtain a better quality video stream.
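The layer-dropping principle described above can be sketched in code. The following is a minimal illustrative example only; the layer names and bitrates are hypothetical and not drawn from any standard:

```python
# Hypothetical sketch: choosing which layers of a scalable stream to send
# so that the total bitrate fits the available channel bandwidth.

def select_layers(layers, available_kbps):
    """Keep the base layer, then add enhancement layers in priority
    order while the cumulative bitrate stays within the channel."""
    selected = []
    total = 0
    for name, kbps in layers:  # assumed ordered: base first, then by priority
        if not selected:
            selected.append(name)   # the base layer is always sent
            total += kbps
        elif total + kbps <= available_kbps:
            selected.append(name)
            total += kbps
    return selected

stream = [("base", 32), ("enh1", 32), ("enh2", 64)]
print(select_layers(stream, 70))   # slow link: lowest-priority layer dropped
print(select_layers(stream, 128))  # wide link: all layers transmitted
```

A dial-up viewer thus receives only the base layer (perhaps with some enhancement), while a cable-modem viewer receives the fully enhanced stream.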
Systems currently exist to allow video data to be supplied from a video source to a mobile receiver with a display and incorporating a decoder capable of "playing" the video on the display. One of the difficulties with use of a mobile receiver is the variation in the quality of the wireless communications link from the source to the mobile receiver.
In recently proposed mobile communications standards, such as UMTS, a mobile handset can request a communication channel to be opened across a wireless communications network. It is possible to specify the type of connection required, depending upon the nature of the "call" to be made. Various factors, such as minimum or maximum bandwidth (transmissible bitrate) or other quality of service (QoS) parameters can be specified. The network will then endeavour to meet the request with available resources. The network resources can also change dynamically as the number of users changes and channels are opened and closed over time, and so it is possible that the bandwidth or quality of service for a given channel can vary over the duration of a particular connection, potentially outside the originally requested limits.
Where real-time video is to be transmitted across a wireless communications network, a channel having a bandwidth or quality of service capable of video transmission is requested. Assuming the request can be fulfilled at the commencement of the call such that the video commences streaming, there is no immediate solution available in the real-time case if the bandwidth of the channel drops to below that of the streaming video.
In the case of non-real-time streaming, buffering solutions can be used to handle changes in the available bandwidth but these can lead to visible freezing, stuttering or blanking of the video stream if there is a sustained reduction in the available bandwidth. Moreover, in the case of streaming video, the bandwidth
difference affects all layers equally. There is no way to dynamically prioritise the data in the encoded stream to minimise the impact of the reduction in available bandwidth on the received image; for example, to allow just the base layer with or without a subset of the enhancement layers.
SUMMARY OF INVENTION
In a first aspect, the present invention provides a method of transmitting a compressed video data stream (CVDS) from a transmitter to a receiver over at least first and second wireless channels in a wireless telecommunications network, the compressed video data stream comprising at least first data defined as being of a first priority level and second data defined as being of a second priority level, the first priority level being of greater importance than the second priority level in contributing to the quality of the received video stream, the method including the steps of: establishing said at least first and second wireless channels, each of the wireless channels having associated with it a quality parameter indicative of a predetermined quality associated with that channel, the quality parameter of the first channel being the same or higher than that of the second channel; and allocating the first data to the first wireless channel and the second data to the second wireless channel.
Preferably, the method further includes the steps of: determining, in the transmitter, the quality parameter indicative of the respective qualities of the first and second wireless channels; and upon detecting a change in the quality parameters, reallocating the first and second data between the first and second wireless channels if necessary to ensure that the first data is preferentially allocated to whichever of the first and second wireless channels has the better quality.
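The allocation and reallocation steps can be sketched as follows. This is an illustrative model only; the quality metric, channel identifiers and substream names are hypothetical:

```python
# Illustrative sketch of the allocation step: given per-channel quality
# parameters reported by the network, map the highest-priority data to
# the best channel. Re-running the function after a quality change
# performs the reallocation described in the text.

def allocate(substreams, channel_quality):
    """substreams: substream ids ordered by priority (highest first).
    channel_quality: channel id -> quality metric (higher is better).
    Returns a mapping of substream -> channel."""
    ranked = sorted(channel_quality, key=channel_quality.get, reverse=True)
    return dict(zip(substreams, ranked))

# Initial allocation: channel A is better, so it carries the base substream.
print(allocate(["base", "enhancement"], {"A": 0.9, "B": 0.5}))

# After a quality change, reallocation moves the base substream to channel B.
print(allocate(["base", "enhancement"], {"A": 0.4, "B": 0.7}))
```

The key design point is that the mapping is a pure function of the latest quality parameters, so the controller can simply re-evaluate it whenever the network reports a change.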
In a preferred form, the transmitter includes a transmitter air interface for maintaining wireless communication channels with the wireless network, the first and second data being mapped onto the wireless channels within the transmitter.
It is also preferred that the transmitter include a controller configured to communicate with the transmitter air interface, the quality parameters being received by the transmitter air interface from the network, and the controller being configured to receive the quality parameters from the air interface. Preferably, the step of mapping the first and second data to the wireless channels is performed under control of the controller. More preferably, the step of monitoring the quality parameters is performed by the controller.
In one preferred form, the compressed video data stream is in a format having a base layer and one or more enhanced layers, the first data or base substream comprising predominantly the base layer and the second data or substream comprising predominantly one or more of the enhancement layers. More preferably, the first data comprises only the base layer, and the second data comprises at least a highest priority enhanced layer.
In one embodiment, the method further includes the steps of: recording history data representative of changes in the quality parameter associated with either or both of the first and second wireless channels; and using the controller to effect the reallocation of the first and second data to the first and second wireless channels if necessary to ensure that the first data is preferentially allocated to whichever of the first and second wireless channels is likely to have the better quality over a predetermined future time period based on the history data.
Preferably, the compressed video data stream is generated by an encoder associated with the transmitter. More preferably, the encoder is in communication with the controller, the method including the step of using the controller to control the compressed video data stream output from the encoder.
In a preferred form, the transmitter is a network transmitter and the receiver is a mobile handset.
In preferred embodiments, the quality parameter includes a data rate factor, an error rate factor and/or a channel reliability factor.
Preferably, the number of data channels is more than two (i.e. first, second, third, etc. data channels or substreams are available).
In a second aspect, the present invention provides a transmitter for transmitting a compressed video data stream (CVDS) to a receiver over at least first and second wireless channels in a wireless telecommunications network, the compressed video data stream comprising at least first data defined as being of a first priority level and second data defined as being of a second priority level, the first priority level being of greater importance than the second priority level in contributing to the quality of the received video stream, the transmitter being configured to: establish said at least first and second wireless channels, each of the wireless channels having associated with it a quality parameter indicative of a predetermined quality associated with that channel, the quality parameter of the first channel being the same or higher than that of the second channel; and allocate the first data to the first wireless channel and the second data to the second wireless channel.
Preferably, the transmitter is configured to: determine the quality parameter indicative of the respective qualities of the first and second wireless channels; and upon detecting a change in the quality parameters, reallocate the first and second data between the first and second wireless channels if necessary to ensure that the first data is preferentially allocated to whichever of the first and second wireless channels has the better quality.
Preferably, the transmitter further includes a transmitter air interface for maintaining wireless communication channels with the wireless network, the first and second data being mapped onto the wireless channels within the transmitter. More preferably, the transmitter further includes a controller configured to communicate with the transmitter air interface, the quality parameters being received by the transmitter air interface from the network, and the controller being configured to receive the quality parameters from the air interface.
In a preferred form, mapping of the first and second channels is performed under control of the controller.
Preferably, monitoring of the quality parameters is performed by the controller.
Preferably, the compressed video data stream is in a format having a base layer and one or more enhanced layers, the first data or base substream comprising predominantly the base layer and the second data or substream comprising predominantly one or more of the enhancement layers.
Preferably, the first data comprises only the base layer.
Preferably, the second data comprises at least a highest priority enhanced layer.
In a preferred form, the transmitter is configured to: record history data representative of changes in the quality parameter associated with either or both of the first and second wireless channels; and use the controller to effect the reallocation of the first and second data to the first and second wireless channels if necessary to ensure that the first data is preferentially allocated to whichever of the first and second wireless channels is likely to have the better quality over a predetermined future time period based on the history data.
Preferably, the history data includes integral, proportional and/or differential data calculated from previous quality parameters.
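One way of combining such integral, proportional and differential history data is sketched below. The gains and the scoring formula are purely illustrative assumptions, not values defined by the invention:

```python
# Hedged sketch of history-based channel-quality prediction using
# proportional, integral and differential terms computed from previous
# quality samples. Gains kp, ki, kd are illustrative only.

def predict_quality(samples, kp=1.0, ki=0.1, kd=0.5):
    """Combine the latest sample (proportional), the running average
    (integral) and the most recent change (differential) into a
    single score used to rank channels."""
    p = samples[-1]
    i = sum(samples) / len(samples)
    d = samples[-1] - samples[-2] if len(samples) > 1 else 0.0
    return kp * p + ki * i + kd * d

# A channel whose quality is falling scores below one that is rising,
# even when their latest samples are equal.
falling = predict_quality([0.9, 0.8, 0.7])
rising = predict_quality([0.5, 0.6, 0.7])
print(falling < rising)
```

The differential term lets the controller favour the channel that is *likely* to have the better quality over the coming period, rather than merely the one that is better at this instant.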
Preferably, the compressed video data stream is generated by an encoder associated with the transmitter.
In a preferred form, the encoder is in communication with the controller, the transmitter being configured to use the controller to control the compressed video data stream output from the encoder.
In a preferred form, the transmitter is a network transmitter and the receiver is a mobile handset.
In another embodiment, the transmitter is a mobile handset and the receiver is a network receiver.
In another embodiment, both the receiver and the transmitter are mobile handsets.
In a preferred form, the quality parameter includes a data rate factor. In other embodiments, the quality parameter includes an error rate factor. In yet other embodiments, the quality parameter includes a channel reliability factor.
Preferably, the number of data channels is more than two.
In a preferred form, a mobile handset comprises both a transmitter and a receiver, such that the mobile handset is able both to transmit and to receive a compressed video data stream.
BRIEF DESCRIPTION OF DRAWINGS
Preferred embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a simplified schematic diagram of a UMTS communication system showing a network and two mobile handsets, where one handset is the transmitter and the other a receiver, for streaming video;
Figure 2 is a simplified schematic diagram of a UMTS communication system showing a network and two mobile handsets, where each handset is simultaneously a transmitter and a receiver for conversational video;
Figure 3 shows the construction of a UDP/IP packet containing MPEG-4 video payload data;
Figure 4 shows the construction of a UDP/IP packet containing RTCP data;
Figure 5 shows the construction of multiple substreams with a common IP address and different UDP addresses per substream;
Figure 6 shows the construction of a TCP/IP packet containing RTSP and SDP data;
Figure 7 shows the construction of a TCP/IP packet containing SIP and SDP data;
Figure 8, Figure 10 and Figure 12 show single and multiple RTP/RTCP sessions and an RTSP session (also labelled more generically as IP Sessions), mapped onto single and multiple wireless channels for streaming video;
Figure 9, Figure 11 and Figure 13 show single and multiple RTP/RTCP sessions and a SIP session (also labelled more generically as IP Sessions), mapped onto single and multiple wireless channels for conversational video;
Figure 14 shows Quality of Service (QoS) parameters associated with the wireless channels between mobile handsets and networks, and the sending of these and other QoS parameters between the mobile handsets via IP sessions for streaming video;
Figure 15 shows QoS parameters associated with the wireless channels between mobile handsets and networks, and the sending of these and other QoS parameters between the mobile handsets via IP sessions for conversational video;
Figure 16 and Figure 17 show typical mappings of compressed video data stream frames onto base and enhanced substreams;
Figure 18 is a flowchart showing sequential operations performed by the encoder controller;
Figure 19 is a flowchart showing sequential operations performed by the decoder controller; and
Figure 20 is a flowchart showing sequential operations performed by the encoder.
DETAILED DESCRIPTION OF PREFERRED AND OTHER EMBODIMENTS
The preferred embodiment of the present invention is applied to a network and associated mobile handsets designed to operate under the current GPRS or proposed UMTS standard.
Referring to the drawings, and Figure 1 in particular, there is shown a UMTS network 100 that is used to establish an end-to-end link between a first mobile handset 102 and a second mobile handset 104. The communication session between the first mobile handset 102 and the second mobile handset 104 is unidirectional. Mobile handset 102 is acting solely as a transmitter of compressed video data, whilst mobile handset 104 is acting solely as a receiver of compressed video data. This is termed the streaming arrangement.
In the embodiment shown in Figure 2, the communication session between the first mobile handset 142 and the second mobile handset 144 is bi-directional. Mobile handset 142 is acting both as a transmitter and a receiver of compressed
video data, and mobile handset 144 is also acting as a transmitter and receiver of compressed video data. This is termed the conversational arrangement. In Figure 2 the functions and operations of the second mobile handset 144 are identical to the functions and operations of the first mobile handset 142.
It will be appreciated that the transmitter function and operation of mobile handsets 142 and 144 is identical to that of the transmitter function and operation of mobile handset 102. It will also be appreciated that the receiver function and operation of mobile handsets 142 and 144 is identical to that of the receiver function and operation of mobile handset 104. In the case of mobile handsets 142 and 144, the receiver and transmitter functions and operations are present within the same mobile handset. In mobile handsets 102 and 104 only the transmitter or the receiver function is present respectively. It will be appreciated that the mobile handsets 142 and 144 can also operate exclusively as transmitters or receivers to produce the streaming arrangement, in addition to their conventional conversational arrangement, if so configured.
In Figure 1 the first mobile handset includes a transmitter controller 108, a Real Time Streaming Protocol (RTSP) server 116, an encoder 106, a Real Time Protocol (RTP) packetiser 117, a Real Time Control Protocol (RTCP) client 119 and a transmitter air interface 110, which are operatively interconnected with each other as shown. The encoder 106 accepts raw video data (RVD) from a video source such as a camera (not shown) associated with the first mobile handset 102 and encodes it into a compressed video data stream (CVDS) format, as discussed in detail below. This is then packetised by the RTP packetiser 117. According to the normal operation of a mobile communications network, the transmitter air interface 110 establishes a wireless channel with a transmit side air interface 112 in the network 100, which in turn is in communication with a network backbone 114.
The network 100 also includes a receive side air interface 118 that establishes a wireless channel with a receiver air interface 120 disposed in the second mobile handset 104. The second mobile handset 104 also includes a receiver controller 122, an RTSP client 123, an RTP depacketiser 125, an RTCP
server 128 and a decoder 124. These are operatively interconnected with each other as shown.
In Figure 2 the mobile handsets 142 and 144 each include all the components that are present in the mobile handset 102 for the transmitter function, with the exception of the RTSP server 116, and in the mobile handset 104 for the receiver function, with the exception of the RTSP client 123. The RTSP server 116 is replaced by a SIP User Agent (UA) 146 in mobile handset 142 and the RTSP client 123 is replaced by a SIP UA 148 in mobile handset 144. The components of the transmitter function, the components of the receiver function, the User Agent and the air interface for mobile handsets 142 and 144 are operatively interconnected with each other as shown. It will be appreciated that the function and operation of the mobile communications network shown in Figure 2 is as above.
In a UMTS network, the mobile handsets 102, 104, 142 and 144 are designated User Equipment (UE), the air interface elements 112 and 118 correspond to the Universal Terrestrial Radio Access Network (UTRAN), and the backbone element 114 corresponds to the Core Network (CN).
According to normal mobile communications network operation, an end to end link is established between the first and second mobile handsets 102, and 104 (or 142 and 144), comprising a first wireless channel 126 between the first mobile handset and the network and a second wireless channel 127 between the network and the second mobile handset. Wireless channels are established using different frequencies and/or spreading codes and/or time slots in a manner well known in the mobile communications art. They allow for bi-directional communication, both for data and control information.
As part of the wireless channel control information, the wireless channel 126 between the transmitter air interface 110 and the transmit side air interface 112 carries Quality of Service parameters (QoS), from the network 100 to the first mobile handset 102 (or 142). Similarly, the wireless channel 127 between the air interface 118 and the receiver side air interface 120 carries Quality of Service parameters (QoS), from the network 100 to the second mobile handset 104 (or 144).
The packetised CVDS is transmitted over the end to end link defined between the two mobile handsets 102 and 104 (or 142 and 144), across wireless channels 126 and 127. In the preferred embodiment, the CVDS takes the form of an MPEG-4 stream, but other suitable streaming formats, such as H.263 can also be used. Both of these standards are applicable to variable bitrate and low bitrate video, e.g. bitrates of 10kbps or higher. It is particularly preferred that the transmission be in RTP format, as discussed in detail below in relation to Figure 3, Figure 4, Figure 5.
Turning to Figure 3, the packetisation of the raw MPEG-4 data for transmission is shown. This packetisation takes place in the first mobile handset 102 or 142 under the control of the transmitter controller 108 before sending the packets to the transmitter air interface 110. Upon emerging from the wireless network of the transmitter, the packets travel over a packet switched network to the wireless network of the receiver. Here the packets are sent to the receive side air interface 118 and on to the second mobile handset 104 or 144 via wireless channel 127.
Figure 3 shows the packetisation layers for a single MPEG-4 packet 200 in the form in which it leaves the encoder 106. It will be appreciated that a stream of such packets will be generated from the incoming RVD. The MPEG-4 video data 200 is wrapped in an RTP format packet layer 201. This, in turn, is wrapped in a UDP format packet layer 202, which in turn is packetised into an Internet Protocol (IP) packet 203. It is this IP packet 203 that is presented by the RTP packetiser 117 to the transmitter air interface 110 for transmission over the wireless channel. This layering format, defined by IETF, is known and is included in the developing UMTS standard for video transmission.
Each of the packetisation layers of the packet is directed to a particular part of the overall communication. They will not be described in detail because they are already known in the art and conform to the respective standards. However the principal component of each packet will be described insofar as is necessary to understand the embodiments of the invention that follow.
The MPEG-4 layer 200 contains the coded video data. The RTP layer 201 contains sequence numbers, time stamps, and payload bits that enable a
depacketiser and decoder to decode it and replay at the correct time and in the correct sequence in relation to other packets from the same stream. The UDP layer is used for asynchronous communication of the data over the wireless communications channel and is a "best effort" connectionless protocol. The IP packet 203 contains an IP address which identifies the mobile receiver 104 or 144 as the destination. The IP packet header may also contain a Differentiated Services Code Point (DSCP) which could be used by a diffserv-enabled core network to determine how that packet should be forwarded by nodes inside that network.
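The nesting of the MPEG-4 payload inside the RTP layer can be sketched as follows. Only the fixed 12-byte RTP header defined by the RTP standard is shown; the field values, payload bytes and payload type number are illustrative assumptions:

```python
import struct

# Minimal sketch of the layering described above: a video payload is
# wrapped in an RTP header carrying the sequence number and timestamp,
# and a real stack would then hand the result to UDP and IP.

def rtp_packet(payload, seq, timestamp, ssrc, payload_type=96):
    first = 0x80  # version 2, no padding, no extension, no CSRC entries
    header = struct.pack("!BBHII", first, payload_type, seq, timestamp, ssrc)
    return header + payload

pkt = rtp_packet(b"mpeg4-data", seq=1, timestamp=90000, ssrc=0x1234)
print(len(pkt))  # 12-byte RTP header plus 10-byte payload
```

The sequence number and timestamp placed in the header are exactly the fields the depacketiser later uses to re-order and re-time the stream.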
Figure 4 shows the packetisation layers for a single RTCP packet 205 in the form in which it leaves the RTCP server 128. The RTCP packet is wrapped in a UDP format packet layer 206, and packetised into an IP packet 207.
Turning to Figure 5, the CVDS can be transmitted over the wireless channels in one or multiple substreams, each transported by an RTP session (and an associated RTCP session), where these are mapped to one or multiple wireless channels that may have different quality parameters. In this case the IP address 208 is common to both substreams. Routing through the transmission chain is achieved by characterising different substreams by different socket numbers 210 and 212 in the UDP address.
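The arrangement of Figure 5 can be illustrated with sockets: one destination IP address is shared by the substreams, which are distinguished only by their UDP port numbers. The address and ports below are hypothetical placeholders:

```python
import socket

# Sketch of Figure 5's idea: two substreams share one destination IP
# address and are told apart purely by UDP port (socket) number.

DEST_IP = "192.0.2.10"            # one IP address, common to both substreams
BASE_PORT, ENH_PORT = 5004, 5006  # one UDP port per substream

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_substream(data, port):
    """Route a packet to the chosen substream via its UDP port."""
    sock.sendto(data, (DEST_IP, port))

# send_substream(base_layer_packet, BASE_PORT)
# send_substream(enhancement_packet, ENH_PORT)
```

Because the substreams differ only at the UDP layer, a receiver incapable of a multi-channel connection can still accept the same packets multiplexed onto a single wireless channel.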
It will be understood that some of the receiving handsets, or, indeed, the transmitting handset may not be capable of forming a multi wireless channel connection with the network. This may be because of equipment incompatibilities or network resource issues, for example. In this case, it is still possible for the video layers to be allocated to multiple wireless channels in accordance with the above embodiment, whilst multiplexing the video onto a single wireless channel for those not capable of forming the requisite multi wireless channel connection.
In the streaming arrangement of Figure 1, the IP packet of Figure 3 is transmitted directly from the first mobile handset 102, via the network 100, to the second mobile handset 104.
Once received at the receiver air interface 120, the packets are forwarded to the RTP depacketiser 125, where the MPEG-4 data 200 is re-constructed. The packets must be re-ordered using RTP layer data 201 such as frame timestamps
and the data from the plurality of substreams must be re-assembled. The reconstructed MPEG-4 data 200 is then sent from the RTP depacketiser 125 to the decoder 124, where it is decoded for replay on, for example, a visual display (not shown) on the second mobile handset 104.
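The re-ordering and reassembly step can be sketched simply. The packet representation below is an illustrative assumption; a real depacketiser would also handle sequence-number wrap-around and timestamp alignment across substreams:

```python
# Illustrative sketch of the reassembly step: packets arriving out of
# order, possibly interleaved from several substreams, are re-ordered
# by their RTP sequence numbers before being handed to the decoder.

def reassemble(packets):
    """packets: list of (sequence_number, payload) tuples."""
    ordered = sorted(packets, key=lambda p: p[0])
    return b"".join(payload for _, payload in ordered)

arrived = [(2, b"B"), (1, b"A"), (4, b"D"), (3, b"C")]  # merged substreams
print(reassemble(arrived))  # b'ABCD'
```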
In the return direction (i.e., from the second mobile handset 104 to the first mobile handset 102 in this example), control information is returned in accordance with the known RTCP and RTSP protocols, with the latter using the known Session Description Protocol (SDP).
Figure 8 and Figure 10 show the IP sessions between a streaming server and client. The RTCP sessions provide feedback on the data transmission quality for each RTP session. RTSP additionally provides an overriding control connection from RTSP client 123 to RTSP server 116, using SDP to provide a description of the connection between client and server. It is known in the art that, when a session changes, an RTSP control packet containing a new SDP packet is sent to the remote entity.
The packetisation of the SDP and RTSP information is shown in Figure 6. The SDP information 306 is wrapped in an RTSP packet 300. An RTSP packet 300 is wrapped in a Transport Control Protocol (TCP) packet 302, which is within an IP packet 304. This packet is built up by the RTSP client 123 and supplied to the receiver air interface 120 for transmission to the receive side air interface 118 of the network 100. The destination address of this RTSP packet is that of the first mobile handset 102.
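By way of illustration only, an SDP description advertising two video substreams on different UDP ports of a common address might resemble the following; the address, port numbers and payload mapping are hypothetical, and the exact syntax is governed by the SDP standard:

```
v=0
o=- 0 0 IN IP4 192.0.2.10
s=Layered video
c=IN IP4 192.0.2.10
t=0 0
m=video 5004 RTP/AVP 96
a=rtpmap:96 MP4V-ES/90000
m=video 5006 RTP/AVP 96
a=rtpmap:96 MP4V-ES/90000
```

Each `m=` line describes one RTP substream, so a session change (for example, dropping an enhancement substream) can be signalled by sending a new SDP description as noted above.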
In the preferred embodiment shown in Figure 1, the RTSP packet passes to the transmit side air interface 112 via the backbone 114 for transmission to the transmitter air interface 110, to the RTSP server 116 and thence to the transmitter controller 108. Also, control information is exchanged between both mobile handsets 102 and 104 in accordance with the known RTCP protocol.
In the conversational arrangement of Figure 2 the IP packet of Figure 3 is transmitted and received between the mobile handsets 142 and 144, via the network 100, in the same way as for the streaming arrangement described above. The encoded video packet is also processed in the same way as for the streaming arrangement described above.
For the control of bi-directional communications sessions the Session Initiation Protocol (SIP) is used instead of RTSP. This is managed by user agents 146 and 148 in mobile handsets 142 and 144 respectively. SIP uses SDP to provide a description of the connection between two peer user agents. It is known in the art that, when a session changes, a SIP control packet containing a new SDP payload is sent to the peer user agent.
Figure 9 and Figure 11 show the IP sessions between two peer user agents 146 and 148. The SIP session provides an overriding connection control between user agents 146 and 148 using the SDP. Again, control information is also exchanged between both mobile handsets 142 and 144 in accordance with the known RTCP protocol. The RTCP session will provide feedback on the data transmission quality for each RTP session.
The packetisation of the SIP information is shown in Figure 7, where the packet is denoted 308. An SDP payload 307 is encapsulated within the SIP packet 308. The SIP packet 308 is wrapped in a User Datagram Protocol (UDP) packet 310, which is within an IP packet 305. This packet is built up by the user agent 148 and supplied to the air interface for transmission, over wireless channel 127, to the receive side air interface 118 of the network 100. The destination address of this SIP packet is that of the other mobile handset 142. It will be understood that user agents 146 and 148 are interchangeable in this scenario.
In the preferred embodiment shown in Figure 2, the SIP packet passes to the transmit side air interface 112 via the backbone 114 for transmission, over wireless channel 126, to the transmitter air interface and thence to the transmitter controller 108 via the user agent 146.
In use, a user of, say, the first mobile handset, 102 or 142, places a call to the second mobile handset 104 or 144, by dialling the second handset's mobile number.
The first mobile handset's number is mapped to a first IP address taken from a pool of IP addresses, and the second mobile handset's number is mapped to a second IP address taken from the pool of IP addresses. This mapping persists for as long as the connection is maintained. Once the connection is broken, usually because one or both of the users hang up, the mapping is
removed and the IP addresses are returned to the pool for reuse. It will be appreciated that this arrangement means that packets can be routed using the allocated IP addresses instead of the phone numbers.
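The number-to-address mapping described above can be sketched as a simple address pool. The class, numbers and addresses below are illustrative assumptions, not network elements defined by the invention:

```python
# Simple sketch of the mapping: each handset's number borrows an IP
# address from a pool for the lifetime of the connection, and the
# address is returned to the pool when the connection is broken.

class AddressPool:
    def __init__(self, addresses):
        self.free = list(addresses)
        self.mapping = {}            # phone number -> IP address

    def connect(self, number):
        self.mapping[number] = self.free.pop(0)
        return self.mapping[number]

    def disconnect(self, number):
        self.free.append(self.mapping.pop(number))

pool = AddressPool(["10.0.0.1", "10.0.0.2"])
ip = pool.connect("+441234567890")
print(ip)                  # the address used to route packets during the call
pool.disconnect("+441234567890")
print(pool.free)           # address returned to the pool for reuse
```

While the mapping persists, packets are routed using the allocated IP addresses rather than the phone numbers.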
There follows a description of the basic elements of call set-up at the wireless level to aid understanding of the invention.
When a user requests a call, the requested quality class of the wireless channel is communicated to the network 100. In particular, the first mobile handset 102 or 142 can request a particular QoS from the UMTS network, which specifies, for example, guaranteed and maximum bitrates. On this basis, and assuming there are sufficient network resources available, a wireless communications channel is established between the first mobile handset 102 or 142 and the network 100, the wireless channel having defined QoS criteria.
Alternatively the first mobile handset 102 or 142 might request the network resources as a number of wireless channels each with associated QoS (see later for multiple wireless channel discussion).
The second mobile handset 104 or 144 must similarly establish a connection with the network 100, establishing a wireless communications channel with QoS criteria independent of those of the wireless channel established by the first handset 102 or 142.
Once the wireless communications channel or set of channels is established, video data from, say, a camera (not shown) associated with the first mobile handset 102 or 142 is received by the encoder 106, which in turn generates a sequence of CVDS video data. The video data is sent to the RTP packetiser 117, where it is packetised as described above and sent to the second mobile handset as described above. The receiving mobile handset is mobile handset 104 if mobile handset 102 is the transmitter; mobile handset 144 if mobile handset 142 is the transmitter; and mobile handset 142 if mobile handset 144 is the transmitter.
As other users make and break wireless communications channels and the network 100 continuously monitors user resource allocations, it can be the case that available bandwidth for any particular call changes, either increasing or decreasing. If a user has a multiple wireless channel allocation, one or more of
the wireless channels may be terminated. It will be understood that these bandwidth changes can take place at a number of points along a given wireless communications channel. For example, it could take place between the first mobile handset 102 (or 142) and the network, or between the network and the second handset 104 (or 144).
Other factors, such as distance of a handset from a base station with which it is communicating, or strong multi-path reflections, can also cause the effective bandwidth or quality of a wireless channel to be reduced.
Any change in effective bandwidth or quality can have two consequences. If the bandwidth increases, it could in principle be possible to transmit video data at a higher bitrate. If it decreases, however, the rate at which video data can reliably be transmitted also decreases, possibly below the value that was set at the start of transmission. To accommodate these consequences, the QoS parameter set, including the available bitrate and bit error rate on the wireless channel between the first mobile handset 102 or 142 and the network 100, is monitored by the controller 108 at the first mobile handset 102 or 142.
In the preferred form, at session initiation and when wireless channel characteristics are to be modified, the UMTS network provides the first mobile handset with a QoS parameter set that is indicative of the available bitrate (i.e. bandwidth) on the wireless communications channel. Such a QoS parameter set is supplied through the protocol stack in known wireless systems from the air interface 112 of the network to the air interface 110 of the transmitting mobile handset 102 or 142. It is normally supplied across a wireless control channel on the downlink of the call.
According to the developing UMTS standard, the QoS parameter set is indicative of various transmission parameters, including the transmissible bitrate over the wireless channel, the signal to noise ratio, the error rate and a priority indicator, which is an indication provided from the network to the transmitting mobile handset of the likely priority to be placed on the call. This is therefore an indicator of the bandwidth and likely reliability of the wireless communication channel that has been opened for the particular wireless channel. It will be appreciated in this context that the word "call" is used herein to describe the transmission of video data as well as, or instead of, voice data. The QoS parameter set is read at the mobile handset 102 or 142 by the controller 108 and the transmissible bitrate is extracted from it.
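The controller's extraction of the transmissible bitrate might be sketched as follows. This is an illustrative sketch only: the field names (`max_bitrate`, `guaranteed_bitrate`) are assumptions and do not correspond to actual UMTS QoS attribute identifiers.

```python
def extract_transmissible_bitrate(qos_params: dict) -> int:
    """Read the transmissible bitrate (bps) from a QoS parameter set.

    Prefers the maximum bitrate if present, otherwise falls back to
    the guaranteed bitrate. Field names are assumptions.
    """
    if "max_bitrate" in qos_params:
        return qos_params["max_bitrate"]
    return qos_params["guaranteed_bitrate"]

qos = {"max_bitrate": 64000, "guaranteed_bitrate": 32000,
       "bit_error_rate": 1e-4, "priority": 2}
print(extract_transmissible_bitrate(qos))  # 64000
```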
The quality of the wireless communication channel between the network 100 and the second mobile handset 104 or 144 is also monitored. In the preferred form, a QoS parameter set indicative of, amongst other things, the available bandwidth for the RTP session mapped onto this wireless channel is ascertained in the second mobile handset 104 or 144, derived from the wireless control channel information it receives from the network according to the relevant wireless standard (in this case UMTS).
The QoS parameter set is dealt with at the second mobile handset 104 or 144 in a novel manner. The session control protocols (bi-directional using SIP and unidirectional using RTSP) have already been discussed. SDP provides for the exchange and updating of session description information such as codec and bitrate.
For the streaming arrangement (shown in Figure 1), according to the existing RTSP standard, various control parameters are conveyed by the RTSP packets including, for example, video commands such as PLAY and PAUSE. The standard also provides an ANNOUNCE instruction. The system described herein uses the ANNOUNCE provision in the RTSP standard to cause elements of the QoS parameter set determined in the wireless environment and/or other derived parameters to be placed into an SDP payload which itself is placed in an RTSP packet for transmission from the second mobile handset 104 to the mobile handset 102. The thus constructed novel packets are transmitted by the RTSP client 123, via the receiver air interface and the received side air interface to the network backbone 114. From here they travel to the RTSP server 116 via the transmission side air interface 112 and the transmitter air interface 110.
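The construction of such an ANNOUNCE packet might be sketched as follows. The URL, the `a=x-qos-ber` attribute and the parameter values are hypothetical; `b=AS:` is the standard SDP application-specific bandwidth line.

```python
def build_announce(url: str, cseq: int, bitrate_kbps: int, ber: float) -> str:
    """Build an RTSP ANNOUNCE request whose SDP payload carries the
    receive-side QoS parameters back to the transmitter.

    'a=x-qos-ber' is a hypothetical private extension attribute;
    'b=AS:' is the standard SDP bandwidth line (kbit/s).
    """
    sdp = "\r\n".join([
        "v=0",
        "o=- 0 1 IN IP4 0.0.0.0",
        "s=qos-update",
        f"b=AS:{bitrate_kbps}",   # available bitrate in kbit/s
        f"a=x-qos-ber:{ber}",     # hypothetical BER attribute
        "",
    ])
    return (
        f"ANNOUNCE {url} RTSP/1.0\r\n"
        f"CSeq: {cseq}\r\n"
        "Content-Type: application/sdp\r\n"
        f"Content-Length: {len(sdp)}\r\n"
        "\r\n" + sdp
    )

msg = build_announce("rtsp://handset102/stream", 3, 32, 1e-4)
```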
For the conversational arrangement (in Figure 2), according to the existing SIP standard, SDP payloads are conveyed by the SIP packets to control the bidirectional communication between two mobile handsets. A session is initiated using the INVITE instruction, which itself contains a session description in the SDP format. The standard provides for a session to be modified by either agent
by issuing a subsequent INVITE instruction. The system described herein uses the re-INVITE provision in the SIP standard to cause the quality parameters determined in the wireless environment and/or other derived parameters to be placed into an SDP payload which itself is placed in a SIP packet for transmission from the receiving mobile handset 144 to the transmitting mobile handset 142. The thus constructed novel packets are transmitted by the session control agent 148, via the receiver air interface 120 and the received side air interface 118 to the network backbone 114. From here they travel to the session control agent 146 via the transmission side air interface 112 and the transmitter air interface 110.
It can be understood that user agents 146 and 148 are interchangeable in this scenario. For both streaming and conversational arrangements the RTP sessions carrying the video data each have associated RTCP sessions carrying control information back to the transmitter. The system described herein can, in addition to or instead of using RTSP or SIP, use the RTCP application defined (APP) packet to transfer application data (in this case the wireless and other derived QoS parameters) from the receiver to the transmitter.
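A sketch of such an RTCP APP packet, packed according to the generic APP layout of the RTP specification (RFC 3550, payload type 204), might look as follows. The four-character name "QOS " and the payload encoding of the QoS values are hypothetical choices, not defined by any standard.

```python
import struct

def build_rtcp_app(ssrc: int, name: bytes, data: bytes, subtype: int = 0) -> bytes:
    """Pack an RTCP APP (PT=204) packet carrying application data.

    Layout per RFC 3550: V=2|P|subtype, PT, length (in 32-bit words
    minus one), SSRC, a 4-byte name, then application-dependent data
    padded to a 32-bit boundary.
    """
    assert len(name) == 4
    pad = (-len(data)) % 4
    payload = data + b"\x00" * pad
    length_words = (8 + len(name) + len(payload)) // 4 - 1
    header = struct.pack("!BBHI", (2 << 6) | subtype, 204, length_words, ssrc)
    return header + name + payload

# Carry the receive-side QoS values (hypothetical encoding) back to the sender
qos_blob = struct.pack("!If", 32000, 1e-4)   # requested bitrate, BER'
pkt = build_rtcp_app(0x1234, b"QOS ", qos_blob)
```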
In both the streaming (Figure 1) and conversational (Figure 2) arrangements, when a session control packet is received at the transmitting mobile handset, this, along with locally derived QoS parameters, can be used to modify the encoded bitrate as already discussed above. In this way, the bitrate of the video stream transmitted from the encoder can be adapted to the wireless channel between a transmitting mobile handset and the network and also to the wireless channel between the network and a receiving mobile handset. In the streaming arrangement, and now referencing Figure 8 and Figure 10, there are shown two ways of transmitting a CVDS from an encoder 106 to a decoder 124 using RTP between an RTP packetiser 117 and an RTP depacketiser 125. RTCP control messages are sent between RTCP server 128 and RTCP client 119. RTSP control messages are sent between an RTSP server 116 and an RTSP client 123. Figure 8 shows a prior art method of streaming a CVDS transported by an RTP session 804 and associated RTCP session 805. RTSP control messages are sent via RTSP session 806. It will be appreciated that at least some part of the end-to-end communications channel is wireless. In Figure 8 the RTP session
containing all the frames of the video stream and the RTCP session containing all RTCP control messages are multiplexed onto a single wireless channel, with the CVDS parameters and frame sequencing being derived according to the QoS parameter of the wireless channel at call set up.
Figure 9 shows the prior art for the conversational arrangement. Each mobile handset is depicted with an RTP packetiser (117 or 134), an RTP depacketiser (133 or 125) and a user agent (146 or 148). For a conversational arrangement, a CVDS is transmitted from RTP packetiser 117 to RTP depacketiser 125 using RTP session 814 and RTCP session 815 simultaneously with a CVDS transmitted from RTP packetiser 134 to RTP depacketiser 133 using RTP session 816 and RTCP session 817. A single SIP session 818 controls all the RTP/RTCP sessions. In Figure 9, as in Figure 8, the RTP sessions containing all the frames of the video stream, the RTCP sessions corresponding to these and the SIP session containing all of the SIP control messages are multiplexed onto a single wireless channel, with the CVDS parameters and frame sequencing being selected according to the QoS parameter of the wireless channel at call set up.
Figure 10 shows a preferred embodiment of the invention for the streaming arrangement, in which there is a plurality of wireless channels. In the example shown there are three wireless channels. RTP/RTCP sessions 824/825 and 826/827 are used for transmitting the two substreams of a CVDS and RTSP control messages are sent via RTSP session 828. The wireless channels are provided at call set up as previously described in a situation where the encoder 106, RTP packetiser 117, RTCP client 119 and RTSP server 116 are in a first mobile handset, and the decoder 124, RTP depacketiser 125, RTCP server 128 and RTSP client 123 are in a second mobile handset in accordance with requested QoS and bandwidth parameters according to the UMTS standard.
As with previous embodiments, the CVDS substreams are transported via RTP between the RTP packetiser 117 and RTP depacketiser 125, RTCP control messages are sent between RTCP server 128 and RTCP client 119, and control messages are transmitted via an RTSP session between the RTSP server 116 and RTSP client 123. As in the example of Figure 8, one RTSP session
covers all the RTP/RTCP sessions between any two entities while each CVDS substream requires an individual RTP session and an associated RTCP session.
Figure 11 shows a preferred embodiment of the invention for the conversational arrangement, in which there is a plurality of wireless channels. In the example shown there are three wireless channels. RTP/RTCP sessions 834/835 and 836/837 are used for transmitting the two substreams of a CVDS generated by the encoder 106. RTP/RTCP sessions 838/839 and 840/841 are used for transmitting the two substreams of a CVDS generated by the encoder 129. The SIP control messages are sent via SIP session 842. Figure 11 shows that the base layers produced by encoders 106 and 129, carried by RTP sessions 834 and 838 and packetised by RTP packetisers 117 and 134, use the up and down links of the same wireless bearer at each mobile handset. Similarly, the enhanced layers from the encoders 106 and 129, carried by RTP sessions 836 and 840 and packetised by RTP packetisers 117 and 134, use the up and down links of the same wireless bearer at each mobile handset. It will be appreciated that there are many other possible mappings of RTP/RTCP sessions to wireless bearers. The wireless channels are provided at call set up as previously described and in accordance with requested QoS and bandwidth parameters according to the UMTS standard. As shown in the example, one SIP session 845 covers all the RTP/RTCP sessions between any two entities while each CVDS substream requires an individual RTP session and an associated RTCP session.
Since wireless channels for the transmit side and receive side are allocated separately there is no guarantee that the number of transmit side and receive side wireless channels will be the same. In the streaming arrangement shown in Figure 12 there is one transmit side wireless channel and three receive side wireless channels. On the receive side RTP/RTCP sessions 844/845 and 846/847 and RTSP session 848 are each mapped to a separate wireless channel. On the transmit side all IP sessions 844-848 are mapped to the same wireless channel.
Figure 13 shows the situation for the conversational arrangement, where for the handset containing user agent 146 there is only one wireless channel, whilst for the handset containing user agent 148 there are three wireless
channels. In the handset containing user agent 148 the RTP/RTCP sessions 853/854 and 857/858 are mapped to the same wireless channel. However the RTP/RTCP sessions 851/852 and 855/856 are mapped to another wireless channel and the SIP control 859 is mapped to yet another wireless channel. In the handset containing user agent 146 all IP sessions of all types 851-859 are mapped to the same wireless channel.
Under the proposed UMTS standard (and others), wireless channels can be defined in terms of a number of quality of service parameters, such as priority, maximum and guaranteed bandwidth, residual bit error rate and delay.
In the embodiment of Figure 10, the first RTP session 824 is defined as carrying the base substream, having an example bitrate of 16kbps, whilst the second RTP session 826 has a bitrate of 32kbps. The first wireless channel has the lowest bitrate but highest priority and the base substream is allocated to it. The enhancement substream is allocated to the second wireless channel, since it has the lower priority. This ensures that the most important video data is allocated to the wireless channel with the highest priority. The first RTP session can also be marked with the highest priority DSCP for prioritised transport over the IP component of a diffserv enabled core network.
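The priority-matched allocation described above can be sketched as follows; the channel and substream identifiers are illustrative, not taken from the figures.

```python
def allocate_substreams(substreams, channels):
    """Assign substreams to wireless channels so that the highest-priority
    substream (the base) lands on the highest-priority channel.

    substreams: list of (name, importance), lower number = more important
    channels:   list of (channel_id, priority), lower number = higher priority
    """
    substreams = sorted(substreams, key=lambda s: s[1])
    channels = sorted(channels, key=lambda c: c[1])
    return {name: ch for (name, _), (ch, _) in zip(substreams, channels)}

mapping = allocate_substreams(
    [("enhancement", 2), ("base", 1)],
    [("ch2_32kbps", 2), ("ch1_16kbps", 1)],
)
print(mapping)  # {'base': 'ch1_16kbps', 'enhancement': 'ch2_32kbps'}
```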
As discussed above in relation to other embodiments, the allocation of resources within a UMTS network is dynamic, and this can mean that bandwidths allocated to either of the RTP sessions can fluctuate with (amongst other things) network load. In the preferred embodiment shown in Figure 1 and Figure 2, the bandwidth available for each wireless channel is known to the transmitter controller 108 as it monitors the network messages at the transmitter air interface 110. In the event that the available bandwidth on one or more of the wireless channels is commanded by the network 100 to be reduced, an assessment is made as to whether it is desirable to reallocate the frames between substreams. For example, assuming the substream frame structure of Figure 16, if the second RTP session bandwidth fell to, say, 16kbps, an assessment would be made to determine whether to simply reduce the number of P and B frames generated and leave other substreams unchanged, or whether a better overall quality would be achieved by including some of the P and B frames on the base substream at the expense of, say, reduced resolution in the I frames.
It is not necessarily the case that reallocation will happen automatically and immediately without any assessment of context. In one form, the preferred embodiment is configured to maintain a history of wireless channel behaviour in relation to the quality parameter. As an example, a sudden drop in bandwidth on a wireless channel to which a relatively high priority substream or frame type is mapped may not be a trigger for the frame mapping to be changed. If there is a history of short-term bursts of bandwidth loss, it is likely that the higher bandwidth will be available shortly, and it may ultimately be more efficient to allow the short-term reduction to be ignored. Typically, an assessment of this type will be made by the transmitter controller 108. It will be understood that quite sophisticated proportional, integral and differential factors can be taken into account to build a relatively sophisticated model of any wireless channel's behaviour (and likely future behaviour) over time. Such modelling is well known to those skilled in the relevant art, and so is not described further here.
Similar history data can be collected for the other types of quality data collected in earlier embodiments of the invention, and similarly used to make decisions about how and when to alter the outputs of, for example, the encoder. In general, if there is a history of short-term bandwidth problems, then it may be more efficient, or may provide a visibly better overall video streamed image, if the bitrate out of the encoder is not immediately altered when the bandwidth initially drops. Rather, it will in some cases be preferable to wait until the bandwidth has remained low for a predetermined time period or number of frames before changing the output of the encoder.
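The wait-before-adapting behaviour described above can be sketched as a simple hysteresis check; the threshold and window parameters are illustrative only.

```python
from collections import deque

class BandwidthHistory:
    """Decide whether a bandwidth drop should trigger re-encoding.

    A drop is acted upon only after bandwidth has stayed below the
    threshold for `hold_frames` consecutive observations; short bursts
    of loss are ignored. Parameter values are illustrative.
    """
    def __init__(self, threshold_bps: int, hold_frames: int = 10):
        self.threshold = threshold_bps
        self.recent = deque(maxlen=hold_frames)

    def observe(self, bitrate_bps: int) -> bool:
        """Record one bandwidth sample; return True if the encoder should adapt."""
        self.recent.append(bitrate_bps)
        return (len(self.recent) == self.recent.maxlen
                and all(b < self.threshold for b in self.recent))

h = BandwidthHistory(threshold_bps=32000, hold_frames=3)
h.observe(16000); h.observe(16000)
print(h.observe(16000))  # True: low for three consecutive samples
```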
Wireless channels between mobile handset and network have a certain QoS, which is provided for the mobile user of a network service. In an embodiment of the invention, a set of QoS parameters including Bitrate (BR) and Bit Error Rate (BER) are used for controlling the video encoder. These QoS parameters are conveyed between the encoder controller and the decoder controller via IP sessions. Referring to Figure 14, a wireless channel between transmitter 102 and network has QoS parameters BR and BER. A wireless
channel between network and receiver 104 has QoS parameters BR' and BER'. The encoder controller 108 sends BR to decoder controller 122 via an RTSP session 866. Having received BR from the encoder controller 108, the decoder controller 122 sends BER' and the calculated Request Bitrate (RBR) to the encoder controller 108 via an RTSP session 866 or RTCP session 865. The encoder controller and the decoder controller will be discussed in detail below.
The encoder controller 108 is used to control the video encoder 106 with the objective of improving the error resilience of video encoding while meeting the bitrate constraint of wireless channels.
In a preferred embodiment of the invention, the video encoder is an MPEG-4 or H.263 compliant encoder. The input video source is encoded into an MPEG-4 or H.263 compliant bit stream. The video data can be constructed using a plurality of different types of frames, which are referred to as I, P and B frames. I frames are self-contained still frames of video data, whereas P and B frames constitute intermediate frames that are predictively encoded. The precise composition of the frames varies in accordance with the particular standards and application of the standards and is known per se.
MPEG-4 specifies a plurality of layers, including the base layer and enhanced layers, in which each layer is comprised of a sequence of frames which may be of the same type (I, P, B) or a mixture of types. As already described, in a wireless network the mobile may be allocated one wireless channel or a plurality of wireless channels. In the preferred form, each wireless channel is used to transmit a single RTP/RTCP session pair. Each RTP session carries an optimum sequence of I, P and B frames, known as a substream. The term "substream" is used rather than "layer" because the frame sequencing onto wireless channels can be varied dynamically and need not be one of the layer sequences predefined in MPEG-4 or other known video encoding standards. Other partitions of coded video data for error resilient purposes (e.g. Data Partitioning Modes of MPEG-4/H.263) are also possible and could be represented by substreams.
Figure 16 and Figure 17 illustrate example compositions of a video data stream in accordance with the MPEG-4 video standard. In the example two temporally scalable substreams are used with I frames on the base substream
while P and B frames are carried on the enhanced substream. In the illustrated example in Figure 17, only the base substream is used, comprising interleaved I and P frames. The encoder controller can thus control the bitrate of the video data stream for each wireless channel by manipulation of the number of substreams used for transmission, and the number and type of frames per substream.
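By way of illustration, the bitrate of each substream follows directly from its per-second frame mix. The per-frame sizes below are invented round numbers for illustration, not values from any standard.

```python
def substream_bitrate(frame_counts_per_sec, frame_bits):
    """Bitrate (bps) of one substream from its per-second frame mix.

    frame_counts_per_sec: e.g. {"I": 1, "P": 7, "B": 7}
    frame_bits: illustrative average bits per frame of each type
    """
    return sum(n * frame_bits[t] for t, n in frame_counts_per_sec.items())

bits = {"I": 12000, "P": 4000, "B": 2000}
# Figure 16 style split: I frames on the base substream, P and B on the enhanced
print(substream_bitrate({"I": 1}, bits))           # 12000 bps base
print(substream_bitrate({"P": 7, "B": 7}, bits))   # 42000 bps enhanced
```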
Video encoding under the MPEG-4 or H.263 standards operates on a frame-by-frame basis. Each frame is divided into either Groups of Blocks (GOBs) or slices. A GOB comprises macroblocks of one or several rows in a video frame. Slices are more flexible and can partition the frame into a variable number of macroblocks. A macroblock in turn comprises four luminance and two spatially corresponding colour difference blocks of image data. All blocks in an I-frame are intra-coded. Blocks in an inter-coded P or B-frame can comprise both intra-coded blocks (I-blocks) and inter-coded blocks (P-blocks or B-blocks).
An increase of the I-block/P-block ratio (Ib/Pb) in P-frames or the I-block/B-block ratio (Ib/Bb) in B-frames has two consequences: (1) improved error resilience, because more intra-coded blocks result in less error propagation; (2) an increased bitrate, because inter-coded blocks comprise substantially smaller amounts of data than intra-coded blocks. The encoder controller controls the encoder to make the best use of the wireless channel for error resilient video encoding.
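The bitrate consequence of raising the intra-coded block ratio can be illustrated numerically; the per-block bit costs below are invented for illustration only.

```python
def estimate_pframe_bits(n_blocks, ib_ratio, bits_iblock=1200, bits_pblock=300):
    """Estimate the size of a P-frame as the intra-coded block ratio rises.

    More I-blocks improve error resilience (less error propagation) but
    cost bits, since intra-coded blocks carry substantially more data than
    inter-coded blocks. The per-block bit costs are illustrative numbers.
    """
    n_i = round(n_blocks * ib_ratio)
    return n_i * bits_iblock + (n_blocks - n_i) * bits_pblock

print(estimate_pframe_bits(99, 0.0))  # 29700 bits: all blocks inter-coded
print(estimate_pframe_bits(99, 0.2))  # 47700 bits: one in five blocks intra-coded
```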
Error control can also be achieved by allocating GOBs or slices in a frame, wherein the header of each GOB or slice can serve as a synchronisation marker for the decoder to regain synchronisation.
Figure 18 illustrates the operation of the encoder controller 108. The encoder controller operates by a closed-loop process. In the first step 400, the encoder controller obtains relevant information from various sources, including the BR and BER associated with the wireless channel between the encoder and the network from the air interface, the RBR and BER' via the IP control session from the decoder, the latency jitter (ΔL) of RTP packets from the RTCP client 119 and the instantaneous bitrate (IBR) from the encoder. In the second step 410, the encoder controller determines the target BR (BRtarget) and the frame type (FT) based on the BR, RBR and IBR. In the third step 420, the encoder controller determines the Ib/Pb ratio for P-frames, the Ib/Bb ratio for B-frames and the synchronisation marker rate Rsync for all frames. In the fourth step 430, the encoder controller determines the QP and the frame rate (FR) for the frame based on the Ib/Pb or Ib/Bb ratio and BRtarget. In the fifth step 440, the encoder controller sends the encoding parameters FT, FR, Rsync, Ib/Pb or Ib/Bb, and QP to the encoder. In the sixth step 445, the encoder controller sends BR via the IP control session to the decoder. Then the encoder controller goes back to the first step 400.
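The closed loop of steps 400 to 445 can be sketched as below. The text does not give the formulas by which the controller derives each parameter, so every mapping in this sketch (target bitrate rule, I-block ratio, marker rate, QP and frame-rate rules) is a placeholder assumption.

```python
def encoder_controller_step(inputs, send_params, send_br):
    """One iteration of the closed-loop encoder controller (steps 400-445).

    `inputs` supplies BR, BER, RBR, BER', and IBR. Every derivation
    below is a placeholder, as the exact formulas are not specified.
    """
    br, rbr, ibr = inputs["BR"], inputs["RBR"], inputs["IBR"]
    # Step 410: target bitrate limited by both wireless links (assumed rule)
    br_target = min(br, rbr)
    frame_type = "P" if ibr <= br_target else "I"   # placeholder rule
    # Step 420: resilience ratios rise with the worse of the two error rates
    worst_ber = max(inputs["BER"], inputs["BER'"])
    ib_ratio = min(1.0, worst_ber * 100)            # placeholder mapping
    r_sync = 1 + int(worst_ber * 1e4)               # sync markers per frame
    # Step 430: quantiser and frame rate from the ratio and target bitrate
    qp = max(2, min(31, int(31 * ibr / max(br_target, 1))))
    fr = 15 if br_target >= 32000 else 7.5
    # Steps 440/445: push parameters to the encoder, report BR to decoder side
    send_params({"FT": frame_type, "FR": fr, "Rsync": r_sync,
                 "Ib_ratio": ib_ratio, "QP": qp})
    send_br(br)

encoder_controller_step(
    {"BR": 64000, "RBR": 32000, "IBR": 30000, "BER": 1e-4, "BER'": 1e-3},
    send_params=print, send_br=print,
)
```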
Figure 19 illustrates the operation of the decoder controller 122. The decoder controller operates by a closed-loop process. In the first step 450, the decoder controller obtains relevant information from various sources, including the BR' and BER' associated with the wireless channel between the decoder and the network from the air interface, and the BR associated with the wireless channel between the encoder and the network via an IP session from the encoder. In the second step 460, the decoder controller determines the ΔL of RTP packets received. In the third step 470, the decoder controller calculates the RBR based on ΔL, BR and BR'. In the fourth step 480, the decoder controller sends the RBR and BER' via the IP control session to the encoder controller. Then the decoder controller goes back to the first step 450.

Figure 20 illustrates the operation of the encoder 106. The encoder operates on a frame-by-frame basis. In the first step 490, the encoder obtains the encoding parameters, including FT, FR, Rsync, Ib/Pb or Ib/Bb, and QP, from the encoder controller. In the second step 492, the encoder allocates GOBs or slices for inter-coded frames based on Rsync. In the third step 494, the encoder further allocates the I-block distribution within P or B-frames based on Ib/Pb or Ib/Bb. In the fourth step 496, the encoder encodes the frame using the above encoding parameters and adds it to the CVDS. In the fifth step 498, the encoder calculates the IBR and sends it to the encoder controller. Then the encoder goes back to the first step, to process the next frame.
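The RBR calculation of step 470 is not given a formula in the text; the sketch below assumes one plausible policy, capping the request at the slower of the two wireless links and backing off under high jitter. The back-off factor and jitter threshold are assumptions.

```python
def request_bitrate(br, br_prime, jitter_ms, max_jitter_ms=100):
    """Step 470 sketch: derive the Request Bitrate (RBR) from the
    transmit-side bitrate BR, the receive-side bitrate BR' and the
    RTP latency jitter ΔL.

    RBR is capped by the slower of the two wireless links and reduced
    when jitter is high (an assumed congestion-avoidance policy).
    """
    rbr = min(br, br_prime)
    if jitter_ms > max_jitter_ms:   # high jitter: back off (assumed rule)
        rbr = int(rbr * 0.8)
    return rbr

print(request_bitrate(64000, 32000, jitter_ms=30))    # 32000
print(request_bitrate(64000, 32000, jitter_ms=150))   # 25600
```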
Although various aspects of the invention have been described with reference to specific embodiments, it will be appreciated by those skilled in the art that the invention can be embodied in many other forms.
DEFINITIONS
Unless the contrary intention clearly appears from the context in which certain words are used, the following definitions apply to words used in this specification:
WIRELESS CHANNEL; a physical radio channel with associated QoS parameters, e.g. UMTS Radio Access Bearer.
END TO END LINK; an end to end communications link between transmitter and receiver containing a transmit side wireless channel and / or a receive side wireless channel;
IP SESSION; an IP communications session between two IP hosts carrying either control or application data. Examples are an RTP session, an RTCP session, an RTSP session and a SIP session. One or more IP sessions can be mapped to a wireless channel;
COMPRESSED VIDEO DATA STREAM (CVDS); an overall video stream, where the original stream of image frames is compressed by means of an encoder;
FRAME; a video encoder outputs a number of different frame types. These include full still images and derivative images that have different data transmission requirements, have different sensitivity to errors and may have dependency on other frames. In the MPEG-4 video standard, frames are also known as Video Object Planes (VOPs);
SUBSTREAM; a CVDS may be split into a number of substreams for the purposes of transmission over a channel or plurality of channels. Each substream can be used as a means of transmitting a sequence of video frames that may be of different types. A substream is not necessarily the same as a layer as defined in video standards such as MPEG-4 and H.263. A substream could also be used to transmit any part of the coded frame data that can be successfully partitioned for error resilient purposes (e.g. DCT coefficients and motion vector data in data
partitioning modes of MPEG-4/H.263). Each substream is transported by an RTP session and an associated RTCP session.
RTCP SERVER; an entity generating RTCP Receiver Reports based on the reception of RTP packets. These are sent to the transmitter of the RTP packets, where an RTCP client uses them.
RTCP CLIENT; an entity that uses RTCP Receiver Reports. These are sent from the receiver of the RTP packets, where an RTCP server generates them.