WO2009152845A1 - Converting encrypted media data - Google Patents

Converting encrypted media data

Info

Publication number
WO2009152845A1
Authority
WO
WIPO (PCT)
Prior art keywords
packet
media data
streaming service
switched streaming
srtp
Prior art date
Application number
PCT/EP2008/057548
Other languages
French (fr)
Inventor
Rolf Blom
Karl Norrman
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/EP2008/057548
Publication of WO2009152845A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/0457 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply dynamic encryption, e.g. stream encryption
    • H04L63/0464 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload using hop-by-hop encryption, i.e. wherein an intermediate entity decrypts the information and re-encrypts it before forwarding it
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/65 Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • H04L65/75 Media network packet handling
    • H04L65/765 Media network packet handling intermediate

Definitions

  • The invention relates to the field of converting between encryption transforms for streaming media data.
  • Real-time Transport Protocol (RTP) is a format for delivering audio and video media data over a packet-switched network.
  • RTP is used for transporting real-time media data, such as interactive audio and video. It is therefore used in applications such as IPTV, conferencing and Voice over IP (VoIP).
  • SRTP: Secure Real-time Transport Protocol
  • SRTP, specified in IETF RFC 3711, is a transport security protocol that provides a form of encrypted RTP. In addition to encryption, it provides message authentication and integrity, and replay protection, in unicast, multicast and broadcast applications. SRTP is used to protect content delivered between peers in an RTP session.
  • RTP is closely related to RTCP (RTP control protocol), which can be used to control the RTP session, and similarly SRTP has a sister protocol, called Secure RTCP (or SRTCP). SRTCP provides the same security-related features to RTCP as the ones provided by SRTP to RTP.
  • SRTP only protects data during transport between the two peers running SRTP; it does not protect data once it has been delivered to the endpoint of the SRTP session. Furthermore, the sending peer is assumed to have knowledge of all keying material and to encrypt the data. There are circumstances in which it would be desirable to use SRTP to protect media data under a different trust model from the one for which SRTP was designed. For example, even though the sending peer is transmitting the media data, it may be required that the sending peer should not be able to access the plaintext media data. For instance, a media data source may apply the protection to the media data and then pass the encrypted media data to a streaming server, which streams the media data to clients.
  • Another example is where a user wishes to record a message on a network answering machine. A second user, receiving the message, should be able to listen to the message at a later stage. In this case the encryption should be applied between the users, and the answering machine should not be able to view the content in plaintext.
  • Packet-switched Streaming Service (PSS), described in TS 26.234, defines an encryption transform that is applied to media data before SRTP is involved in the processing of the media data.
  • The encryption key for this transform is not known by the streaming server itself, and the media content is integrity protected by using SRTP between the streaming server and the client. In this way, the streaming server does not have access to the plaintext media data.
  • The encryption transform specified for PSS encrypts each data unit using 128-bit Advanced Encryption Standard (AES) in counter mode, as described in NIST Special Publication 800-38A, "Recommendation for Block Cipher Modes of Operation: Methods and Techniques".
  • Each SRTP packet delivered from the streaming server contains one encrypted data unit.
  • The signalling and architecture are illustrated in Figure 1, in which a media data source 1 and a receiving client 2 have knowledge of the keying material for encryption using AES.
  • Integrity protection using SRTP is applied between an intermediate node 3, such as a Streaming Server, and the client 2.
  • The intermediate node 3 therefore cannot access the plaintext media data.
  • The solid arrows in Figure 1 show the protection end points, and the dashed arrows show the signalling of information necessary for the client 2 to be able to retrieve the plaintext media content.
  • An IV is a block of bits used in conjunction with keying material to prevent a data unit that is identical to a previous data unit from producing the same ciphertext when encrypted and, in the case of stream cipher modes of operation such as counter mode, to prevent two different data units from being encrypted with the same key-stream sequence.
  • IVSN: IV sequence number
  • Nonce: a random or pseudorandom number which remains the same throughout the entire session and is signalled out of band before the streaming begins.
  • IV = (nonce * 2^16) XOR (IVSN * 2^16)
  • PSS assumes that the intermediate node 3 has received the encrypted data units from the Content Creator 1 together with the IVSN for each data unit. This works well in the case where the streaming server is delivering songs or video clips from a typical media data source.
  • In the network answering machine case, the initiator of the phone call typically expects to communicate directly with the responder (which would be another terminal). The initiator therefore attempts to set up a regular SRTP session.
  • The network answering machine would then receive SRTP packets containing normal encrypted codec data units as the payload. These payloads would not be in the PSS encryption format, and PSS protection could therefore not be used to achieve end-to-end encryption of the data as described above.
  • The inventors have realised the problems associated with the prior art, and have invented a method and apparatus for allowing conversion between SRTP and Packet-switched Streaming Service media streams.
  • An intermediate node receives Secure Real-time Transport Protocol (SRTP) media data packets having an encrypted payload.
  • The intermediate node modifies each media data packet such that an SRTP Initialization Vector (IV) is converted to a corresponding Packet-switched Streaming Service (PSS) IV.
  • The modified media data packets are then sent to a receiver node using a PSS protocol. This allows the receiver to receive the SRTP media in a PSS format.
  • PSS: Packet-switched Streaming Service
  • The method optionally comprises, at the intermediate node, receiving an SRTP salting key. This, along with a data source identifier, is used to derive a PSS nonce value. For each modified media data packet, the PSS IV is derived using the derived nonce and an SRTP sequence number contained in each received media data packet. As a further option, the PSS nonce value is derived using an SRTP Roll-Over Counter start value.
  • Alternatively, a PSS nonce value is derived using a dynamic salting key and a data source identifier, the PSS nonce value forming part of the PSS IV.
  • When the receiver node receives the modified media data packets, it optionally performs PSS decryption on them, allowing it to render the media data.
  • The method optionally comprises, prior to sending the modified media packets to the receiver node, storing the modified media packets in a memory. This is useful where the intermediate node is not simply provided to perform the conversion, but is, for example, a store-and-forward node or a network answering machine.
  • A media data source node derives a PSS nonce using an SRTP salting key and a data source identifier.
  • A PSS IV is derived for each PSS media packet using the derived nonce and a packet sequence, and the derived PSS IV is included in each packet.
  • There is also provided an intermediate node for use in a communication network.
  • The node is provided with a receiver for receiving SRTP media data packets having an encrypted payload.
  • A processor is used to modify each media data packet such that an SRTP IV Sequence Number is converted to a corresponding PSS IV Sequence Number.
  • A transmitter is provided for sending the modified media data packets to a receiver node using a PSS protocol.
  • The intermediate node is optionally provided with a second receiver for receiving an SRTP salting key, and the processor is arranged to derive a PSS nonce value using the salting key and a data source identifier. For each modified media data packet, the PSS IV Sequence Number is derived using the derived nonce and an SRTP index.
  • The transmitter is further arranged to send the derived nonce to the receiver node.
  • The processor is further arranged to derive the PSS nonce value using an SRTP Roll-Over Counter start value.
  • There is also provided a receiver node for use in a communication network.
  • The receiver node is provided with a receiver for receiving PSS encrypted data packets, the data packets having a PSS IV and an SRTP payload.
  • A processor is also provided for performing PSS decryption on the received modified media data packets.
  • There is also provided a media data source node comprising a source of media data packets, and a processor for deriving a PSS nonce using an SRTP salting key and a data source identifier.
  • The processor is also arranged to derive a PSS IV using the derived nonce and a packet sequence, and to include the derived PSS IV in each packet.
  • A transmitter is also provided for sending the media data packets having a derived PSS IV to a remote node.
  • Figure 1 illustrates schematically in a block diagram the signalling and architecture used for a Packet-switched Streaming Service.
  • Figure 2 illustrates schematically the generation of an Initialization Vector for use in a Packet-switched Streaming Service.
  • Figure 3 illustrates schematically the generation of an Initialization Vector for use in a Packet-switched Streaming Service from an SRTP data packet.
  • Figure 4 is a flow diagram showing steps according to an embodiment of the invention.
  • Figure 5 is a flow diagram showing steps according to a further embodiment of the invention.
  • Figure 6 is a flow diagram illustrating steps of generating PSS media data packets according to an embodiment of the invention.
  • Figure 6 illustrates schematically in a block diagram an intermediate node according to an embodiment of the invention.
  • One solution to the problems described above would be for an intermediate node, such as a network answering machine or a Streaming Server, to receive a standard SRTP stream.
  • The intermediate node would check the integrity of the SRTP stream without being able to access the plaintext media data. It would then store the received encrypted payloads together with some information derived from the SRTP security context, enabling the generation of PSS payloads and the PSS nonce.
  • When the media data is subsequently delivered to the intended recipient, it is sent with SRTP using only integrity protection and "PSS" payloads.
  • However, owing to the keying model employed in SRTP, the intermediate node cannot check the integrity of the incoming SRTP stream without also being able to access the content in plaintext.
  • The intermediate node 3 converts SRTP encrypted data to data in a PSS format without having to, or even being able to, decrypt and re-encrypt the data, which prevents the intermediate node from accessing the plaintext data. There is no need to convert the encrypted data unit itself.
  • A nonce and IVSN are created to ensure that the IV used by the PSS format exactly matches the IV used in the SRTP encryption.
  • SRTP has two modes for encryption: AES counter mode (AES-CTR) and AES F8 mode (AES-F8). These two modes use the same encryption key but different IVs.
  • AES-CTR: AES counter mode
  • AES-F8: AES F8 mode
  • The media data source 1 uses SRTP AES-CTR to encrypt media data.
  • It provides the encrypted media data to the intermediate node 3.
  • The intermediate node 3 must convert the IV used for the media data to a PSS encryption format.
  • The IV for AES-CTR is constructed as follows: IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16), where:
  • k_s is a salting key with a length of at most 112 bits, which remains the same throughout the session.
  • SSRC is a 32-bit identifier of the origin of the data in the SRTP session, and i is a 48-bit index (a sequence number which is unique for each packet).
  • The SRTP index consists of two parts: the SRTP sequence number transferred in the SRTP packets (the 16 least significant bits) and a roll-over counter (ROC) (the 32 most significant bits of the index), which is incremented each time the sequence number wraps around.
  • ROC: Roll-over counter
  • The "session static" parts of the AES-CTR IV in SRTP are the salting key k_s and the SSRC. These are mapped to the session static part of the PSS IV, i.e., the nonce.
  • The PSS nonce is set as:
  • PSS nonce = (k_s * 2^16) XOR (SSRC * 2^64)
  • The SRTP index associated with each SRTP packet is used as the IVSN. Note that the IVSN is 32 bits long, whereas the SRTP index is 48 bits long, so SRTP streams with more than 2^32 packets cannot be stored. If the SRTP ROC does not start at zero in the original SRTP stream, this must be accommodated by XORing the 16 most significant bits of the ROC start value into the nonce.
  • Figure 3 illustrates schematically the final construction of the nonce for the PSS format when derived from an SRTP packet.
  • ROC_msb is the 16 most significant bits of the SRTP roll-over counter.
  • The IVSNs used for PSS are defined as the 16 least significant bits of the ROC (ROC_lsb) concatenated with the 16-bit RTP sequence numbers of the corresponding packets.
  • SRTP can refresh the encryption (and integrity) key during the session by running it through a one-way function every r-th packet (r is a constant agreed by the peers at session start-up). This feature is seldom used, but if it is, the peers using PSS streaming must have signalled the constant r out-of-band and must account for the key changes in the PSS packets. This is possible because the sequence numbers used to synchronize the key change are equal to the IVSNs used by PSS.
  • The intermediate node 3 does not know the salting key k_s, because SRTP derives k_s from the master key and a master salt, and the intermediate node has no access to the SRTP master key. The intermediate node that performs the conversion is therefore provided with the derived salting key k_s.
  • Figure 4 is a flow diagram illustrating the basic steps according to an embodiment of the invention. The following numbering corresponds to the numbering used in Figure 4.
  • The intermediate node 3 is provided with k_s.
  • The intermediate node receives SRTP encrypted media data packets from the media data source 1.
  • The intermediate node derives a PSS nonce, as described above, by XORing k_s, the SSRC and the ROC_msb, and stores it.
  • A PSS IVSN is derived using the SRTP index.
  • Each PSS media data packet is modified to include the derived IVSN.
  • The media data packets are stored.
  • The modified media data packets and the nonce are subsequently sent to the receiving client 2.
  • The receiving client 2 decrypts the received media data packets.
  • The intermediate node 3 is a network answering machine.
  • The following numbering corresponds to the numbering used in Figure 4.
  • The answering machine records an incoming SRTP protected media session.
  • The answering machine receives the salting key k_s, and records the SSRC used and the starting value for the ROC.
  • The answering machine stores the 16 most significant bits of the ROC.
  • The answering machine checks that the ROC has not increased so far that its original 16 most significant bits have changed.
  • The answering machine receives a message from a client 3 requesting that the message be played.
  • The client 3 applies the SRTP key derivation function to the master key and master salt shared with the original sender of the SRTP stream to generate an SRTP encryption key and a salting key.
  • The answering machine sends the 16 most significant bits of the ROC and the original SSRC to the client 3.
  • The client 3 calculates the nonce value as described above with reference to Figure 3.
  • The client 3 performs normal PSS packet decryption processing.
  • The answering machine thus receives SRTP input and sends PSS output.
  • The media data source computes the IV for the SRTP packets using k_s, SSRC and i before sending the SRTP input.
  • The answering machine requires k_s for the conversion of the SRTP IV to a PSS IV. It is also possible for k_s to be sent out of band from the media data source 1 to the receiving client 2. In this case, the receiving client incorporates the received k_s into the received PSS IV before decrypting.
  • The PSS nonce has to be constructed so that an SRTP receiver can be given the master key and master salt and simply XOR in the SSRC and ROC as usual.
  • The SRTP encryption key and the SRTP derived salting key are derived from the master key and master salt.
  • The resulting SRTP encryption key is used as the PSS encryption key.
  • The nonce used in the encryption is structured to handle this.
  • SRTP also uses the RTP sequence number in the IV, but this is simply mapped to the IVSN.
  • The nonce used for the encryption of the PSS data units is constructed in the same way as shown in Figure 3.
  • The media data source node 1 must construct the nonce and derive the encryption key as an SRTP encryption key, as described above, if it is necessary to support SRTP receivers. Since only the SRTP master key would be conveyed in any reasonable key management solution, a further consequence is that PSS receivers also have to derive the encryption key according to the SRTP key derivation function.
  • The steps of generating PSS media data packets suitable for conversion to SRTP are illustrated in Figure 6. The following numbering corresponds to the numbering of Figure 6.
  • The media data source 1 derives a PSS nonce using k_s and SSRC.
  • ROC_msb is assumed to be zero; if it is not, it is also used in the derivation.
  • A PSS IVSN is generated from a packet sequence number.
  • The PSS IVSN is included in the payload of each media data packet.
  • The SRTP AES-F8 transform has one important distinguishing feature compared to AES-CTR.
  • In SRTP AES-F8, the fields of the SRTP header that are not constant over the session are included in the IV. As a consequence, these IV fields must be changed on a per-packet basis.
  • Session-static data is encoded into the nonce and salting key. Since the salt and the nonce are kept static throughout the session, they cannot be used to encode the non-static fields of the SRTP header.
  • SRTP AES-F8 can therefore only be converted to PSS by signalling a new salting key or nonce during a session to signal the non-static fields. However, this is expensive in terms of bandwidth, as it is effectively equivalent to including an entire 128-bit IV with each packet.
  • An intermediate node 3 is illustrated.
  • The intermediate node 3 is provided with a receiver 4 for receiving k_s, and a further receiver 5 for receiving SRTP media data packets.
  • A processor 6 is arranged to modify the header of each media data packet such that the SRTP IVSN in each header is converted to a PSS IVSN. As described above, this is performed by deriving a PSS nonce value using k_s and the ROC and, for the SRTP payload, deriving a PSS IVSN using the nonce value and an SRTP sequence number.
  • A memory 7 is provided for storing the modified media data packets, and a transmitter 8 is provided for sending the modified media data packets and the nonce to a receiver client node 2.
  • A receiver client node 2 is illustrated.
  • The client node 2 is provided with a receiver 9 for receiving PSS encrypted data packets, the data packets having a PSS IV and an SRTP encrypted payload.
  • The receiver client node 2 is also provided with a processor 10 for performing Packet-switched Streaming Service decryption on the received modified media data packets.
  • A media data source node 1 that can generate PSS media data packets convertible to SRTP media data packets is also illustrated.
  • The media data source node 1 is provided with a source 11 of media data packets. This may be a memory in which data packets are stored, or a receiver for receiving media content from a remote source.
  • A processor 12 is provided for deriving a PSS nonce using an SRTP salting key k_s and a data source identifier SSRC.
  • The processor 12 is also arranged to derive a PSS IV for each media data packet using the derived nonce and a packet sequence.
  • A PSS IV is included in the header of each packet.
  • A transmitter 13 is also provided for sending the media data packets having PSS IVs to a remote node such as an intermediate node 3.
  • The invention allows SRTP streaming of content that is pre-encrypted with the PSS encryption format via an intermediate node that is not trusted by one of the two end point peers. Integrity protection cannot be used unless the integrity key is provided to the intermediate node, or the key derivation function of SRTP is modified to allow encryption and integrity keys that are not derived from the master key and master salt. Clients supporting only one of the protocols can still receive end-to-end encryption with a node supporting the other format. It will be appreciated by the person of skill in the art that various modifications may be made to the above-described embodiments without departing from the scope of the present invention.
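The AES-CTR conversion described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. Plain integers stand in for the 128-bit IV, and the node is assumed to already know each packet's ROC. The ROC_msb term is assumed to be XORed into the nonce at bit 48, which is where those bits fall in i * 2^16. The PSS IV is assumed to take the form nonce XOR (IVSN * 2^16), so that the 2^16 factor on k_s is carried by the nonce definition. Under that reading the two IVs coincide bit for bit. The Background's IV formula also multiplies the nonce by 2^16, so the exact PSS form should be checked against TS 26.234. The SRTP side follows RFC 3711: IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16). The names `SrtpPacket`, `PssUnit` and `convert_stream` are hypothetical.

```python
from dataclasses import dataclass

MASK128 = (1 << 128) - 1


def srtp_ctr_iv(k_s: int, ssrc: int, index: int) -> int:
    """SRTP AES-CTR IV (RFC 3711): (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16)."""
    return ((k_s << 16) ^ (ssrc << 64) ^ (index << 16)) & MASK128


def pss_nonce(k_s: int, ssrc: int, roc_start: int) -> int:
    """Session-static nonce; ROC_msb placed at bit 48 (assumption)."""
    roc_msb = (roc_start >> 16) & 0xFFFF
    return ((k_s << 16) ^ (ssrc << 64) ^ (roc_msb << 48)) & MASK128


def pss_ivsn(roc: int, seq: int) -> int:
    """32-bit IVSN: ROC_lsb (16 bits) concatenated with the RTP sequence number."""
    return ((roc & 0xFFFF) << 16) | (seq & 0xFFFF)


def pss_iv(nonce: int, ivsn: int) -> int:
    """Assumed PSS IV form: nonce XOR (IVSN * 2^16)."""
    return (nonce ^ (ivsn << 16)) & MASK128


@dataclass
class SrtpPacket:
    ssrc: int
    roc: int        # ROC tracked by the node; it is not carried in the packet
    seq: int        # 16-bit RTP sequence number from the header
    payload: bytes  # SRTP ciphertext; the node never decrypts it


@dataclass
class PssUnit:
    ivsn: int
    payload: bytes  # the same ciphertext bytes, passed through unchanged


def convert_stream(k_s: int, roc_start: int, packets: list) -> tuple:
    """Return (nonce, units) for one SSRC; refuse packets whose ROC_msb changed."""
    if not packets:
        raise ValueError("empty stream")
    ssrc = packets[0].ssrc
    nonce = pss_nonce(k_s, ssrc, roc_start)
    units = []
    for p in packets:
        if p.ssrc != ssrc:
            raise ValueError("one nonce covers only one SSRC")
        if (p.roc >> 16) != (roc_start >> 16):
            raise ValueError("ROC_msb changed; the stored nonce no longer matches")
        units.append(PssUnit(pss_ivsn(p.roc, p.seq), p.payload))
    return nonce, units


# Usage: a 112-bit salting key and a stream whose sequence number wraps.
k_s = 0x0123456789ABCDEF0123456789AB
ssrc = 0xDEADBEEF
roc_start = 0x00010005
packets = [
    SrtpPacket(ssrc, 0x00010005, 0xFFFF, b"unit-a"),
    SrtpPacket(ssrc, 0x00010006, 0x0000, b"unit-b"),
]
nonce, units = convert_stream(k_s, roc_start, packets)
for p, u in zip(packets, units):
    srtp = srtp_ctr_iv(k_s, ssrc, (p.roc << 16) | p.seq)
    assert pss_iv(nonce, u.ivsn) == srtp  # same IV, so the same keystream
```

Only IV bookkeeping changes here. The payload bytes pass through untouched, so the node never needs the encryption key. The ROC_msb check mirrors the answering machine's test that the ROC has not grown past its original 16 most significant bits.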

Abstract

A method and apparatus for converting encrypted media data. An intermediate node receives Secure Real-time Transport Protocol (SRTP) media data packets having an encrypted payload. The intermediate node modifies each media data packet such that an SRTP Initialization Vector (IV) is converted to a corresponding Packet-switched Streaming Service (PSS) IV. The modified media data packets are then sent to a receiver node using a PSS protocol. This allows the receiver to receive the SRTP media in a PSS format.

Description

Converting Encrypted Media Data
TECHNICAL FIELD
The invention relates to the field of converting between encryption transforms for streaming media data.
BACKGROUND
Real-time Transport Protocol (RTP) is a format for delivering audio and video media data over a packet-switched network. RTP is used for transporting real-time media data, such as interactive audio and video. It is therefore used in applications such as IPTV, conferencing and Voice over IP (VoIP).
Secure Real-time Transport Protocol (SRTP), specified in IETF RFC 3711, is a transport security protocol that provides a form of encrypted RTP. In addition to encryption, it provides message authentication and integrity, and replay protection, in unicast, multicast and broadcast applications. SRTP is used to protect content delivered between peers in an RTP session.
RTP is closely related to RTCP (RTP control protocol), which can be used to control the RTP session, and similarly SRTP has a sister protocol, called Secure RTCP (or SRTCP). SRTCP provides the same security-related features to RTCP as the ones provided by SRTP to RTP.
As SRTP only protects data during transport between the two peers running SRTP, it does not protect data once it has been delivered to the endpoint of the SRTP session. Furthermore, the sending peer is assumed to have knowledge of all keying material and to encrypt the data. There are circumstances in which it would be desirable to use SRTP to protect media data under a different trust model from the one for which SRTP was designed. For example, even though the sending peer is transmitting the media data, it may be required that the sending peer should not be able to access the plaintext media data. For instance, a media data source may apply the protection to the media data and then pass the encrypted media data to a streaming server, which streams the media data to clients. This may occur where, for example, there is no trust relationship between the media data source and the streaming server. Another example is where a user wishes to record a message on a network answering machine. A second user, receiving the message, should be able to listen to the message at a later stage. In this case the encryption should be applied between the users, and the answering machine should not be able to view the content in plaintext.
Packet-switched Streaming Service (PSS), described in TS 26.234, defines an encryption transform that is applied to media data before SRTP is involved in the processing of the media data. The encryption key for this transform is not known by the streaming server itself, and the media content is integrity protected by using SRTP between the streaming server and the client. In this way, the streaming server does not have access to the plaintext media data.
The encryption transform specified for PSS encrypts each data unit using 128-bit Advanced Encryption Standard (AES) in counter mode, as described in NIST Special Publication 800-38A, "Recommendation for Block Cipher Modes of Operation: Methods and Techniques", 2001. http://csrc.nist.gov/publications/nistpubs/800-38a/sp800-38a.pdf. Each SRTP packet delivered from the streaming server contains one encrypted data unit. The signalling and architecture are illustrated in Figure 1, in which a media data source 1 and a receiving client 2 have knowledge of the keying material for encryption using AES.
Integrity protection using SRTP is applied between an intermediate node 3, such as a Streaming Server, and the client 2. The intermediate node 3 therefore cannot access the plaintext media data. The solid arrows in Figure 1 show the protection end points, and the dashed arrows show the signalling of information necessary for the client 2 to be able to retrieve the plaintext media content.
Assuming that the client 2 shares the encryption key with the media data source 1, the client 2 only needs to be able to reconstruct the same initialization vector (IV) as was used by the media data source 1 when the data unit was encrypted. An IV is a block of bits used in conjunction with keying material to prevent a data unit that is identical to a previous data unit from producing the same ciphertext when encrypted and, in the case of stream cipher modes of operation such as counter mode, to prevent two different data units from being encrypted with the same key-stream sequence. In PSS, the IV for each data unit is constructed by combining an IV sequence number (IVSN), which is transported with each data unit, with a nonce (a random or pseudorandom number which remains the same throughout the entire session and is signalled out of band before the streaming begins). The construction of a 128-bit IV is illustrated in Figure 2, and defined as follows:
IV = (nonce * 2^16) XOR (IVSN * 2^16)
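The following Python sketch reads the formula literally and reduces the result to 128 bits. The reduction and the example nonce value are assumptions, because the text does not state the nonce width at this point. The sketch shows the property the IV exists for: with the nonce fixed for the session, each distinct 32-bit IVSN yields a distinct IV. The low 16 bits of every IV are zero. If the AES-CTR block counter is incremented in those bits, as in SRTP's AES-CTR, each data unit gets up to 2^16 counter blocks before it could reach another unit's range; this is an assumption about the PSS counter layout. The helper names are hypothetical.

```python
MASK128 = (1 << 128) - 1


def pss_iv(nonce: int, ivsn: int) -> int:
    """IV = (nonce * 2^16) XOR (IVSN * 2^16), reduced modulo 2^128 (assumed)."""
    if not 0 <= ivsn < 2**32:
        raise ValueError("IVSN must fit in 32 bits")
    return ((nonce << 16) ^ (ivsn << 16)) & MASK128


def counter_blocks(iv: int, n_blocks: int) -> list:
    """Counter values IV, IV+1, ... for one data unit; the increments stay in the low 16 bits."""
    if n_blocks > 2**16:
        raise ValueError("data unit too long for a 16-bit block counter")
    return [(iv + j) & MASK128 for j in range(n_blocks)]


nonce = 0x00112233445566778899AABBCCDD  # hypothetical 112-bit session nonce
iv0, iv1 = pss_iv(nonce, 0), pss_iv(nonce, 1)
# Consecutive IVSNs give IVs that differ only in bit 16.
assert iv0 ^ iv1 == 1 << 16
# The counter ranges of the two data units do not overlap.
assert not set(counter_blocks(iv0, 100)) & set(counter_blocks(iv1, 100))
```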
PSS assumes that the intermediate node 3 has received the encrypted data units from the Content Creator 1 together with the IVSN for each data unit. This works well in the case where the streaming server is delivering songs or video clips from a typical media data source. However, a problem arises in cases where the data received by the streaming server cannot be assumed to be in the correct PSS format. For example, in the network answering machine case described above, the initiator of the phone call typically expects to communicate directly with the responder (which would be another terminal). The initiator therefore attempts to set up a regular SRTP session. In this case, the network answering machine would receive SRTP packets containing normal encrypted codec data units as the payload. These payloads would not be in the PSS encryption format, and PSS protection could therefore not be used to achieve end-to-end encryption of the data as described above.
Another problem arises where it would be desirable to record the incoming SRTP stream and retransmit it at a later time. An application initiating a new RTP stream cannot, in principle, determine the RTP sequence number on which the new RTP stream will start. It is also impossible to use the original Synchronization Source (SSRC), as the RTP standard specifies that the SSRC shall be chosen at random at the start of the session. Furthermore, a solution based on retransmitting an RTP stream exactly as it was received would require the RTP stream to be terminated after every played message and a new stream to be established for the next message, which would be very inefficient. Using protocols existing today, the only solution to the problem would be to terminate the encryption in the answering machine, store the contents as plaintext, and then deliver them to the receiver using another SRTP session. This is disadvantageous, since the two peers may not trust the answering machine operated by a third party with their plaintext data.
SUMMARY
The inventors have realised the problems associated with the prior art, and have invented a method and apparatus for allowing conversion between SRTP and Packet-switched Streaming Service media streams.
According to a first aspect of the invention, there is provided a method of converting encrypted media data. An intermediate node receives Secure Real-time Transport Protocol (SRTP) media data packets having an encrypted payload. The intermediate node modifies each media data packet, such that a SRTP Initialization Vector (IV) is converted to a corresponding Packet-switched Streaming Service (PSS) IV. The modified media data packets are then sent to a receiver node using a PSS protocol. This allows the receiver to receive the SRTP media in a PSS format.
The method optionally comprises, at the intermediate node, receiving a SRTP salting key. This, along with a data source identifier, is used to derive a PSS nonce value. For each modified media data packet, the PSS IV is derived using the derived nonce and a SRTP sequence number contained in each received media data packet. As a further option, the PSS nonce value is derived using a SRTP Roll-Over Counter start value.
In an alternative option, a PSS nonce value is derived using a dynamic salting key and a data source identifier, the PSS nonce value forming part of the PSS IV.
When the receiver node receives the modified media data packets, it optionally performs PSS decryption on them, allowing it to render the media data.
The method optionally comprises, prior to sending the modified media packets to the receiver node, storing the modified media packets in a memory. This is useful where the intermediate node is not simply provided to perform the conversion, but is, for example, a store and forward node or a network answering machine.
According to a second aspect of the invention, there is provided a method of generating PSS encrypted media. A media data source node derives a PSS nonce using a SRTP salting key and a data source identifier. A PSS IV is derived for each PSS media packet using the derived nonce and a packet sequence number, and the derived PSS IV is included in each packet.
According to a third aspect of the invention, there is provided an intermediate node for use in a communication network. The node is provided with a receiver for receiving SRTP media data packets having an encrypted payload. A processor is used to modify each media data packet, such that a SRTP IV Sequence Number is converted to a corresponding PSS IV Sequence Number. A transmitter is provided for sending the modified media data packets to a receiver node using a PSS protocol.
The intermediate node is optionally provided with a second receiver for receiving a SRTP salting key, and the processor is arranged to derive a PSS nonce value using the salting key and a data source identifier. For each modified media data packet, the PSS IV Sequence Number is derived using the derived nonce and a SRTP index. The transmitter is further arranged to send the derived nonce to the receiver node. As a further option, the processor is arranged to derive the PSS nonce value using a SRTP Roll-Over Counter start value as well.
According to a fourth aspect of the invention, there is provided a receiver node for use in a communication network. The receiver node is provided with a receiver for receiving PSS encrypted data packets, the data packets having a PSS IV and a SRTP payload. A processor is also provided for performing PSS decryption on the received modified media data packets.
According to a fifth aspect of the invention, there is provided a media data source node. The node comprises a source of media data packets and a processor for deriving a PSS nonce using a SRTP salting key and a data source identifier. The processor is also arranged to derive a PSS IV using the derived nonce and a packet sequence number, and to include the derived PSS IV in each packet. A transmitter is also provided for sending the media data packets having a derived PSS IV to a remote node.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 illustrates schematically in a block diagram the signalling and architecture used for a Packet-switched Streaming Service;
Figure 2 illustrates schematically the generation of an Initialization Vector for use in a Packet-switched Streaming Service;
Figure 3 illustrates schematically the generation of an Initialization Vector for use in a Packet-switched Streaming Service from an SRTP data packet;
Figure 4 is a flow diagram showing steps according to an embodiment of the invention;
Figure 5 is a flow diagram showing steps according to a further embodiment of the invention;
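Figure 6 is a flow diagram illustrating steps of generating PSS media data packets according to an embodiment of the invention;
Figure 7 illustrates schematically in a block diagram an intermediate node according to an embodiment of the invention;
Figure 8 illustrates schematically in a block diagram a receiver client node according to an embodiment of the invention; and
Figure 9 illustrates schematically in a block diagram a media data source node according to an embodiment of the invention.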
DETAILED DESCRIPTION
A possible solution to the problems described above would be for an intermediate node, such as a network answering machine or a Streaming Server, to receive a standard SRTP stream. The intermediate node would check the integrity of the SRTP stream without accessing the plaintext media data. It would then store the received encrypted payloads together with some information derived from the SRTP security context, enabling the generation of PSS payloads and the PSS nonce. When the media data is subsequently delivered to the intended recipient, it would be sent with SRTP using only integrity protection, carrying "PSS" payloads. However, owing to the keying model employed in SRTP, the intermediate node cannot check the integrity of the incoming SRTP stream without also being able to access the content in plaintext.
According to an embodiment of the invention, the intermediate node 3 converts SRTP encrypted data to data in a PSS format without having to, or even being able to, decrypt and re-encrypt the data, so the intermediate node never has access to the plaintext data. The encrypted data unit itself does not need to be converted. A nonce and an IVSN are created to ensure that the IV used by the PSS format exactly matches the IV used in the SRTP encryption.
The following description assumes that the key used for media data protection has already been delivered to the client 2 using an out of band mechanism.
SRTP has two modes for encryption: AES counter mode (AES-CTR) and AES F8 mode (AES-F8). Both modes use the same encryption key, but they use different IVs. As described above, the PSS IV is generated using the formula IV = (nonce * 2^16) XOR (IVSN * 2^16). Note that in both AES-CTR and AES-F8, the dynamic part of the IV, the sequence number IVSN, is XORed onto the static parts.
Where the media data source 1 uses SRTP AES-CTR to encrypt media data, it provides the encrypted media data to the intermediate node 3. The intermediate node 3 must convert the IV used for the media data to a PSS encryption format. The IV for AES-CTR is constructed as follows:
IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16)
In the above equation, k_s is a salting key of at most 112 bits that remains the same throughout the session. SSRC is the 32-bit identifier of the origin of the data in the SRTP session, and i is a 48-bit index (a sequence number that is unique for each packet). The SRTP index consists of two parts: the 16-bit SRTP sequence number carried in the SRTP packets (the 16 least significant bits of the index), and a 32-bit Roll-over counter (ROC) (the 32 most significant bits of the index), which is incremented each time the SRTP sequence number wraps around.
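The AES-CTR IV can be illustrated with a short sketch that treats each field as a Python integer. The helper names `srtp_index` and `srtp_ctr_iv` are hypothetical, and the example values are arbitrary. The sketch only builds the IV; the AES keystream generation is not shown.

```python
# Sketch of the SRTP AES-CTR IV from the formula above, using Python ints as bit strings.
# Field widths follow the text: k_s <= 112 bits, SSRC 32 bits, index i 48 bits.


def srtp_index(roc: int, seq: int) -> int:
    """48-bit SRTP index: 32-bit ROC in the high bits, 16-bit sequence number in the low bits."""
    if not (0 <= roc < 2**32 and 0 <= seq < 2**16):
        raise ValueError("ROC must fit in 32 bits and SEQ in 16 bits")
    return (roc << 16) | seq


def srtp_ctr_iv(k_s: int, ssrc: int, i: int) -> int:
    """IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16); the result fits in 128 bits."""
    if not (0 <= k_s < 2**112 and 0 <= ssrc < 2**32 and 0 <= i < 2**48):
        raise ValueError("field out of range")
    return (k_s << 16) ^ (ssrc << 64) ^ (i << 16)


# Example: a packet with ROC = 3 and SEQ = 0x1234 in a stream with SSRC 0xCAFEBABE.
i = srtp_index(3, 0x1234)  # 0x31234
iv = srtp_ctr_iv(0x0123456789ABCDEF, 0xCAFEBABE, i)
print(f"{iv:032x}")
```

Shifting and XORing integers keeps the correspondence with the multiply-by-2^n notation in the text direct, because the shifted fields either do not overlap or are meant to be XORed together.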
The "session static" parts of the AES-CTR IV in SRTP are the salting key k_s and the SSRC. These are mapped to the session static part of the PSS IV, i.e., the nonce. When converting to a PSS format from SRTP using AES-CTR, the PSS nonce is set as:
PSS nonce = (k_s * 2^16) XOR (SSRC * 2^64)
The SRTP index associated with each SRTP packet, reduced to its 32 least significant bits, is used as the IVSN. Note that the IVSN is 32 bits long, whereas the SRTP index is 48 bits long, so SRTP streams of more than 2^32 packets cannot be stored. If the SRTP ROC does not start at zero in the original SRTP stream, this must be accommodated by XORing the 16 most significant bits of the ROC start value into the nonce, i.e. XORing in ROC_msb * 2^48, since that is where these bits of i fall in the term i * 2^16. Figure 3 illustrates schematically the final construction of the nonce for the PSS format when derived from an SRTP packet.
ROC_msb denotes the 16 most significant bits of the SRTP Roll-over counter. The IVSNs used for PSS are defined as the 16 least significant bits of the ROC (ROC_lsb) concatenated with the 16-bit RTP sequence numbers of the corresponding packets.
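The mapping can be checked numerically with a minimal sketch: it builds the nonce from k_s, SSRC and ROC_msb, builds the IVSN as ROC_lsb || SEQ, and confirms that the resulting IV equals the SRTP AES-CTR IV. The text writes the PSS IV as (nonce * 2^16) XOR (IVSN * 2^16), but it defines the nonce with the 2^16 offset already applied. The sketch assumes the second reading and XORs the nonce in directly, which is the reading under which the two IVs coincide. All function names are hypothetical.

```python
# Sketch: map the static parts of the SRTP AES-CTR IV into a PSS nonce and the
# per-packet part into a 32-bit IVSN, then check that both give the same 128-bit IV.
# Assumption: the nonce already carries its 2^16 offset and is XORed in directly.


def srtp_ctr_iv(k_s: int, ssrc: int, i: int) -> int:
    """SRTP AES-CTR IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16)."""
    return (k_s << 16) ^ (ssrc << 64) ^ (i << 16)


def pss_nonce_from_srtp(k_s: int, ssrc: int, roc_start: int) -> int:
    """Session-static part: k_s, SSRC and ROC_msb (top 16 bits of the ROC start value)."""
    roc_msb = roc_start >> 16
    # ROC_msb occupies bits 32..47 of i, i.e. bits 48..63 of i * 2^16.
    return (k_s << 16) ^ (ssrc << 64) ^ (roc_msb << 48)


def pss_ivsn(roc: int, seq: int) -> int:
    """IVSN = ROC_lsb || SEQ, i.e. the 32 least significant bits of the SRTP index."""
    return ((roc & 0xFFFF) << 16) | seq


def pss_iv(nonce: int, ivsn: int) -> int:
    """Per-packet IV on the PSS side: static nonce XOR the shifted sequence number."""
    return nonce ^ (ivsn << 16)


k_s, ssrc = 0x0123456789ABCDEF0123456789AB, 0xDEADBEEF
roc_start = 0x00020003  # the ROC does not start at zero here
nonce = pss_nonce_from_srtp(k_s, ssrc, roc_start)
for roc, seq in [(0x00020003, 0xFFFF), (0x00020004, 0x0000)]:
    i = (roc << 16) | seq
    assert pss_iv(nonce, pss_ivsn(roc, seq)) == srtp_ctr_iv(k_s, ssrc, i)
print("PSS IVs match the SRTP IVs")
```

Because the shifted fields occupy disjoint bit ranges, XOR and addition coincide here, which is why splitting i * 2^16 into a static ROC_msb term and a dynamic IVSN term loses nothing.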
As the IVSNs are only 32 bits long, and PSS has no functionality corresponding to the ROC, this scheme breaks down when the IVSN wraps around (at which point there would have been a carry propagation into the 16 most significant bits of the ROC). However, this is not a very limiting restriction, since PSS in general cannot handle more than 2^32 encrypted data units.
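A small sketch shows the wrap-around limitation. It assumes the nonce and IVSN construction described above, and the helper names are hypothetical. Once the ROC carries into its top 16 bits, the nonce still holds the old ROC_msb, and the reconstructed IV differs from the SRTP IV by exactly the missing carry.

```python
# Sketch of the wrap-around limitation: once the ROC carries into its top 16 bits,
# the nonce still holds the old ROC_msb and the reconstructed IV is wrong.


def srtp_ctr_iv(k_s: int, ssrc: int, roc: int, seq: int) -> int:
    i = (roc << 16) | seq  # 48-bit SRTP index
    return (k_s << 16) ^ (ssrc << 64) ^ (i << 16)


def converted_iv(k_s: int, ssrc: int, roc_start: int, roc: int, seq: int) -> int:
    nonce = (k_s << 16) ^ (ssrc << 64) ^ ((roc_start >> 16) << 48)
    ivsn = ((roc & 0xFFFF) << 16) | seq  # only 32 bits of the index survive
    return nonce ^ (ivsn << 16)


k_s, ssrc, roc_start = 0x5A5A, 0x1234, 0x0000FFFF
# Last packet before the IVSN wraps: IVSN = 0xFFFFFFFF, and the IV is still correct.
assert converted_iv(k_s, ssrc, roc_start, 0x0000FFFF, 0xFFFF) == srtp_ctr_iv(k_s, ssrc, 0x0000FFFF, 0xFFFF)
# Next packet: the IVSN wraps to 0, but the true index now has ROC_msb = 1.
diff = converted_iv(k_s, ssrc, roc_start, 0x00010000, 0x0000) ^ srtp_ctr_iv(k_s, ssrc, 0x00010000, 0x0000)
print(hex(diff))  # the missing carry, 1 << 48
```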
SRTP allows the encryption (and integrity) key to be refreshed during the session by running it through a one-way function every r-th packet (r is a constant agreed by the peers at session start-up). This feature is seldom used. If it is used, however, the peers using PSS streaming must have signalled the constant r out-of-band and must adjust the PSS packets for the key changes. This is possible because the sequence numbers used to synchronize the key change are equal to the IVSNs used by PSS.
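A minimal sketch, under stated assumptions, shows how a PSS peer could tell which key period a stored data unit belongs to once r has been signalled out-of-band. It rebuilds the SRTP index from ROC_msb and the IVSN and divides by r. The name `key_period` is hypothetical, and treating r == 0 as "never re-key" is an assumption made for this illustration. The sketch does not perform the key derivation itself.

```python
# Sketch: decide which key period a stored PSS data unit belongs to when SRTP
# re-keys every r-th packet. `key_period` is a hypothetical helper; r is the
# constant signalled out-of-band, and r == 0 is taken to mean "never re-key".


def key_period(roc_msb: int, ivsn: int, r: int) -> int:
    """Key period for a packet, recomputed from ROC_msb and the 32-bit IVSN."""
    if r == 0:
        return 0
    index = (roc_msb << 32) | ivsn  # rebuild the 48-bit SRTP index
    return index // r


r = 1000
periods = [key_period(0, ivsn, r) for ivsn in (0, 999, 1000, 2500)]
print(periods)  # [0, 0, 1, 2]
```

Data units that share a period would be decrypted with the same refreshed key. The actual refresh function is whatever the peers agreed for SRTP.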
The intermediate node 3 does not know the salting key k_s, because it does not have access to the SRTP master key: SRTP derives k_s from the master key and a master salt. The intermediate node that performs the conversion is therefore provided with the derived salting key k_s.
Referring to Figure 4, there is a flow diagram illustrating the basic steps according to an embodiment of the invention. The following numbering corresponds to the numbering used in Figure 4.
S1. The intermediate node 3 is provided with k_s.
S2. The intermediate node receives SRTP encrypted media data packets from the media data source 1.
S3. The intermediate node derives a PSS nonce as described above by XORing k_s, the SSRC and the ROC_msb, and stores it.
S4. A PSS IVSN is derived from the SRTP index.
S5. Each media data packet is modified to include the derived IVSN.
S6. The media data packets are stored.
S7. The modified media data packets and the nonce are subsequently sent to the receiving client 2.
S8. The receiving client 2 decrypts the received media data packets.
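Steps S1 to S6 can be sketched as a single conversion function at the intermediate node. This is a minimal illustration, not the patented implementation. It assumes the node already knows each packet's 48-bit SRTP index, estimated from SEQ as any SRTP receiver would (that estimation is not shown). `PssDataUnit` and `convert_srtp_stream` are hypothetical names. The payload bytes are copied unchanged, which reflects that the node never decrypts them.

```python
from dataclasses import dataclass


@dataclass
class PssDataUnit:
    ivsn: int       # 32-bit IV sequence number
    payload: bytes  # SRTP ciphertext, stored unchanged (never decrypted here)


def convert_srtp_stream(k_s, ssrc, roc_start, packets):
    """Steps S1-S6 at the intermediate node 3.

    packets: iterable of (srtp_index, encrypted_payload); the 48-bit index is
    assumed to have been estimated already, as any SRTP receiver would.
    Returns the PSS nonce and the stored data units to be sent in step S7.
    """
    roc_msb = roc_start >> 16
    nonce = (k_s << 16) ^ (ssrc << 64) ^ (roc_msb << 48)  # S3
    stored = []
    for index, payload in packets:  # S2
        if index >> 32 != roc_msb:
            raise ValueError("ROC_msb changed: the 32-bit IVSN would wrap")
        ivsn = index & 0xFFFFFFFF  # S4
        stored.append(PssDataUnit(ivsn, payload))  # S5, S6
    return nonce, stored


nonce, units = convert_srtp_stream(
    k_s=0x0BADC0DE, ssrc=0x11223344, roc_start=0,
    packets=[(7, b"\x8a\x01"), (8, b"\x3f\x77")],
)
print(hex(nonce), [u.ivsn for u in units])
```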
Referring to Figure 5, the following describes an example in which the intermediate node 3 is a network answering machine. The following numbering corresponds to the numbering used in Figure 5.
S9. The answering machine records an incoming SRTP protected media session.
S10. At the start of the session, the answering machine receives the salting key k_s and records the SSRC used and the starting value of the ROC.
S11. The answering machine stores the 16 most significant bits of the ROC.
S12. The answering machine checks that the ROC has not increased far enough to change its original 16 most significant bits.
S13. The answering machine records each payload packet together with the associated IVSN = (ROC_lsb) || (SEQ from RTP).
S14. The answering machine receives a message from a client 3 requesting that the message be played.
S15. The client 3 applies the SRTP key derivation function to the master key and master salt shared with the original sender of the SRTP stream to generate an SRTP encryption key and a salting key.
S16. The answering machine sends the 16 most significant bits of the ROC and the original SSRC to the client 3.
S17. The client 3 calculates the nonce value as described above with reference to Figure 3.
S18. The answering machine sends the original SRTP payloads and IVSNs as the payload of the PSS stream, i.e. PSS payload = (SRTP payload) and IVSN.
S19. The client 3 performs normal PSS packet decryption processing.
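The division of work in Figure 5 can be sketched in two halves. The answering machine records (IVSN, payload) pairs and enforces the ROC_msb check of S12. The client rebuilds the IVs from its own k_s plus the ROC_msb and SSRC it receives in S16. The sketch assumes the client has already derived k_s with the SRTP key derivation function (S15, not shown). `Recording`, `record_session` and `client_ivs` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Recording:
    ssrc: int
    roc_msb: int
    items: list = field(default_factory=list)  # (ivsn, srtp_payload) pairs


def record_session(ssrc, roc_start, packets):
    """Answering-machine side, S10-S13. packets: iterable of (roc, seq, payload)."""
    rec = Recording(ssrc, roc_start >> 16)  # S10, S11
    for roc, seq, payload in packets:
        if roc >> 16 != rec.roc_msb:  # S12
            raise ValueError("ROC_msb changed during recording")
        rec.items.append((((roc & 0xFFFF) << 16) | seq, payload))  # S13
    return rec


def client_ivs(k_s, rec):
    """Client side, S17: nonce from its own k_s plus the ROC_msb and SSRC sent in S16."""
    nonce = (k_s << 16) ^ (rec.ssrc << 64) ^ (rec.roc_msb << 48)
    return [nonce ^ (ivsn << 16) for ivsn, _ in rec.items]


k_s, ssrc = 0xABCDEF, 0x1234
pkts = [(0x00010000, 0xFFFF, b"a"), (0x00010001, 0x0000, b"b")]
rec = record_session(ssrc, 0x00010000, pkts)
# The IVs the original SRTP sender used, for comparison.
expected = [(k_s << 16) ^ (ssrc << 64) ^ (((roc << 16) | seq) << 16) for roc, seq, _ in pkts]
print(client_ivs(k_s, rec) == expected)
```

The answering machine in this sketch never needs k_s for playback, which matches S16, where only ROC_msb and the SSRC are sent to the client.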
Note that the answering machine receives SRTP input and sends PSS output. The media data source computes the IV for the SRTP packets using k_s, SSRC and i before sending the SRTP input. The answering machine requires k_s for the conversion of the SRTP IV to a PSS IV. It is possible for k_s to be sent out of band from the media data source 1 to the receiving client 2. In this case, the receiving client incorporates the received k_s into the received PSS IV before decrypting.
It is, of course, also possible for data stored in the PSS encryption format to be streamed using regular SRTP. This is useful when the client 3 can decrypt SRTP but not PSS. To achieve this, the stored data is produced with an SRTP client in mind. In other words, the PSS nonce has to be constructed so that the SRTP receiver can be given the master key and master salt and simply XOR in the SSRC and ROC as usual.
The SRTP encryption key and the SRTP derived salting key are derived from the master key and master salt. The resulting SRTP encryption key is used as the PSS encryption key. Next, since the SRTP client 3 uses the XOR of the SSRC, the ROC and the derived salting key as the IV, the nonce used in the encryption must be structured to handle this. SRTP also uses the RTP sequence number in the IV, but this is simply mapped to the IVSN. The nonce used for the encryption of the PSS data units is constructed in the same way as shown in Figure 3.
For this to work, if SRTP receivers are to be supported, the media data source node 1 must construct the nonce as described above and derive the encryption key as an SRTP encryption key. Since any reasonable key management solution would convey only the SRTP master key, a further consequence is that PSS receivers also have to derive the encryption key using the SRTP key derivation function. The steps of generating PSS media data packets suitable for conversion to SRTP are illustrated in Figure 6. The following numbering corresponds to the numbering of Figure 6:
S20. The media data source 1 derives a PSS nonce using k_s and the SSRC. ROC_msb is assumed to be zero; if it is not, it is also used in the derivation.
S22. For each PSS data packet, a PSS IVSN is generated from a packet sequence number.
S23. The PSS IVSN is included in the payload of each media data packet.
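The source-side steps of Figure 6 can be sketched as follows. The sketch generates the nonce with ROC_msb = 0 and IVSNs from a running counter, then checks that a plain SRTP AES-CTR receiver would compute the same IV once the IVSN is split back into SEQ (low 16 bits) and ROC (high 16 bits). That split is an assumption that follows from the IVSN definition above. All names are hypothetical, and key derivation is not shown.

```python
def source_nonce(k_s, ssrc, roc_msb=0):
    """S20: nonce from k_s and SSRC; ROC_msb is assumed to be zero unless given."""
    return (k_s << 16) ^ (ssrc << 64) ^ (roc_msb << 48)


def source_ivsns(count, first=0):
    """S22: one 32-bit IVSN per data unit, taken from a running packet counter."""
    return [(first + n) & 0xFFFFFFFF for n in range(count)]


def srtp_receiver_iv(k_s, ssrc, roc, seq):
    """What a plain SRTP AES-CTR receiver computes for the same packet."""
    return (k_s << 16) ^ (ssrc << 64) ^ (((roc << 16) | seq) << 16)


k_s, ssrc = 0x00112233445566778899AABBCCDD, 0x0A0B0C0D
nonce = source_nonce(k_s, ssrc)
for ivsn in source_ivsns(3, first=0xFFFE):  # crosses a SEQ wrap
    roc, seq = ivsn >> 16, ivsn & 0xFFFF  # SRTP header fields when streamed as SRTP
    assert nonce ^ (ivsn << 16) == srtp_receiver_iv(k_s, ssrc, roc, seq)
print("IVs agree")
```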
Turning now to the case where SRTP AES-F8 encryption is used, the SRTP AES-F8 transform has one important distinguishing feature compared to AES-CTR: the fields of the SRTP header that are not constant over the session are included in the IV. As a consequence, the IV must change on a per-packet basis in ways that do not depend only on the index. When transforming from AES-CTR to PSS, session static data is encoded into the nonce and salting key. Since the salt and the nonce are kept static throughout the session, they cannot be used to encode the non-static fields of the SRTP header. SRTP AES-F8 could only be converted to PSS by signalling a new salting key or nonce during the session to convey the non-static fields. However, this is expensive in terms of bandwidth, as it is effectively equivalent to including an entire 128-bit IV with each packet.
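To see why F8 resists this mapping, here is a sketch of the pre-encryption F8 IV layout as given in RFC 3711 (0x00 || M || PT || SEQ || TS || SSRC || ROC). RFC 3711 then encrypts this IV under a masked key before use, and that step is omitted here. The function name is hypothetical. The point it illustrates is that two packets with the same index can have different IVs, because the RTP timestamp, marker bit and payload type enter the IV directly.

```python
def srtp_f8_iv(m: int, pt: int, seq: int, ts: int, ssrc: int, roc: int) -> int:
    """Pre-encryption F8 IV: 0x00 || M || PT || SEQ || TS || SSRC || ROC (128 bits)."""
    if not (m in (0, 1) and 0 <= pt < 2**7 and 0 <= seq < 2**16):
        raise ValueError("M, PT or SEQ out of range")
    if not (0 <= ts < 2**32 and 0 <= ssrc < 2**32 and 0 <= roc < 2**32):
        raise ValueError("TS, SSRC or ROC out of range")
    return (m << 119) | (pt << 112) | (seq << 96) | (ts << 64) | (ssrc << 32) | roc


# Two packets with the same SSRC, ROC and SEQ but different RTP timestamps.
a = srtp_f8_iv(m=0, pt=96, seq=1, ts=1000, ssrc=7, roc=0)
b = srtp_f8_iv(m=0, pt=96, seq=1, ts=2000, ssrc=7, roc=0)
print(a != b)  # True: the IV depends on TS, which is neither static nor an index
```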
Referring to Figure 7, an intermediate node 3 is illustrated. The intermediate node 3 is provided with a receiver 4 for receiving k_s, and a further receiver 5 for receiving SRTP media data packets. A processor 6 is arranged to modify the header of each media data packet such that the SRTP IVSN in each header is converted to a PSS IVSN. This is performed, as described above, by deriving a PSS nonce value using k_s and ROC, and for the SRTP payload, deriving a PSS IVSN using the nonce value and a SRTP sequence number. A memory 7 is provided for storing the modified media data packets, and a transmitter 8 is provided for sending the modified media data packets and the nonce to a receiver client node 2.
Referring to Figure 8, a receiver client node 2 is illustrated. The client node 2 is provided with a receiver 9 for receiving PSS encrypted data packets, the data packets having a PSS IV and a SRTP encrypted payload. The receiver client node 2 is also provided with a processor 10 for performing Packet-switched Streaming Service decryption on the received modified media data packets.
Referring to Figure 9, there is illustrated a media data source node 1 that can generate PSS media data packets that can be converted to SRTP media data packets. The media data source node 1 is provided with a source 11 of media data packets. This may be a memory in which data packets are stored, or a receiver for receiving media content from a remote source. A processor 12 is provided for deriving a PSS nonce using a SRTP salting key k_s and a data source identifier SSRC. The processor 12 is also arranged to derive a PSS IV for each media data packet using the derived nonce and a packet sequence number. A PSS IV is included in the header of each packet. A transmitter 13 is also provided for sending the media data packets having PSS IVs to a remote node such as an intermediate node 3.
In alternative embodiments, SRTP and a new version of PSS could define matching pairs of transforms: one for SRTP and one for a PSS-like system. Transforming SRTP into PSS would then be possible if both use the same core cryptographic transforms and the IV of the new transforms relies only on a fixed part and an index.
The invention allows SRTP streaming of content that is pre-encrypted with the PSS encryption format via an intermediate node that is not trusted by one of the two end point peers. Integrity protection cannot be used, unless the integrity key is provided to the intermediate node, or the key derivation function of SRTP is modified to allow encryption and integrity keys that are not derived from the master key and master salt. Clients only supporting one of the protocols can still receive end-to-end encryption with a node supporting the other format. It will be appreciated by the person of skill in the art that various modifications may be made to the above-described embodiments without departing from the scope of the present invention.
The following abbreviations are used in the above description:
AES Advanced Encryption Standard
CTR CounTeR mode
i index, a sequence number unique to each data packet
IV Initialization Vector
IVSN IV Sequence Number
k_s Salting key
NONCE Number used ONCE
PSS Packet-switched Streaming Service
ROC Roll-Over Counter
ROC_msb Most significant bits of the ROC
RTP Real-time Transport Protocol
SRTP Secure Real-time Transport Protocol
SSRC Synchronization SouRCe

CLAIMS:
1. A method of converting encrypted media data, the method comprising: at an intermediate node, receiving Secure Real-time Transport Protocol media data packets having an encrypted payload; modifying each media data packet, such that a Secure Real-time Transport Protocol Initialization Vector is converted to a corresponding Packet-switched Streaming Service Initialization Vector; sending the modified media data packets to a receiver node using a Packet-switched Streaming Service protocol.
2. The method according to claim 1, comprising: at the intermediate node, receiving a Secure Real-time Transport Protocol salting key; deriving a Packet-switched Streaming Service nonce value using the salting key and a data source identifier; and for each modified media data packet, deriving the Packet-switched Streaming Service Initialization Vector using the derived nonce and a Secure Real-time Transport Protocol sequence number contained in each received media data packet.
3. The method according to claim 2, further comprising: deriving the Packet-switched Streaming Service nonce value using a Secure Real-time Transport Protocol Roll-Over Counter start value.
4. The method according to claim 1, the method comprising: for each media data packet, deriving a Packet-switched Streaming Service nonce value using a dynamic salting key and a data source identifier, the Packet-switched Streaming Service nonce value forming part of the Packet-switched Streaming Service Initialization Vector.
5. The method according to any one of claims 1 to 4, further comprising, at the receiver node, receiving the modified media data packets; and performing Packet-switched Streaming Service decryption on the received modified media data packets.
6. The method according to any one of claims 1 to 5, further comprising, prior to sending the modified media packets to the receiver node, storing the modified media packets in a memory.
7. A method of generating Packet-switched Streaming Service encrypted media, the method comprising: at a media data source node, deriving a Packet-switched Streaming Service nonce using a Secure Real-time Transport Protocol salting key and a data source identifier; and for each Packet-switched Streaming Service media packet, deriving a Packet-switched Streaming Service Initialization Vector using the derived nonce and a packet sequence; and including a derived Packet-switched Streaming Service Initialization Vector in each packet.
8. An intermediate node for use in a communication network, the node comprising: a receiver for receiving Secure Real-time Transport Protocol media data packets having an encrypted payload; a processor for modifying each media data packet, such that a Secure Real-time Transport Protocol Initialization Vector Sequence Number is converted to a corresponding Packet-switched Streaming Service Initialization Vector Sequence Number; a transmitter for sending the modified media data packets to a receiver node using a Packet-switched Streaming Service protocol.
9. The intermediate node according to claim 8, further comprising: a second receiver for receiving a Secure Real-time Transport Protocol salting key; wherein the processor is arranged to derive a Packet-switched Streaming Service nonce value using the salting key and a data source identifier and, for each modified media data packet, derive the Packet-switched Streaming Service Initialization Vector Sequence Number using the derived nonce and a Secure Real-time Transport Protocol index; and wherein the transmitter is further arranged to send the derived nonce to the receiver node.
10. The intermediate node according to claim 9, wherein the processor is further arranged to derive the Packet-switched Streaming Service nonce value using a Secure Real-time Transport Protocol Roll-Over Counter start value.
11. A receiver node for use in a communication network, the receiver node comprising: a receiver for receiving Packet-switched Streaming Service encrypted data packets, the data packets having a Packet-switched Streaming Service Initialization Vector and a Secure Real-time Transport Protocol payload; a processor for performing Packet-switched Streaming Service decryption on the received modified media data packets.
12. A media data source node, the node comprising: a source of media data packets; a processor for deriving a Packet-switched Streaming Service nonce using a Secure Real-time Transport Protocol salting key and a data source identifier, the processor being further arranged to, for each media packet, derive a Packet-switched Streaming Service Initialization Vector using the derived nonce and a packet sequence, and to include the derived Packet-switched Streaming Service Initialization Vector in each packet; and a transmitter for sending the media data packets having a derived Packet-switched Streaming Service Initialization Vector to a remote node.
Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2008/057548 WO2009152845A1 (en) 2008-06-16 2008-06-16 Converting encrypted media data

Publications (1)

Publication Number Publication Date
WO2009152845A1 true WO2009152845A1 (en) 2009-12-23

Family

ID=40343580

Country Status (1)

Country Link
WO (1) WO2009152845A1 (en)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Universal Mobile Telecommunications System (UMTS); Transparent end-to-end Packet-switched Streaming Service (PSS); Protocols and codecs (3GPP TS 26.234 version 7.5.0 Release 7); ETSI TS 126 234", ETSI STANDARDS, LIS, SOPHIA ANTIPOLIS CEDEX, FRANCE, no. V7.5.0, 1 April 2008 (2008-04-01), XP014041757 *
ERICSSON: "Real-time transport of protected continuous PSS media", 3GPP TSG-SA WG4 TEMPORARY DOCUMENT; S4-030791, 29 WG4 MEETING, TEMPERE, FI, 24-28 NOVEMBER 2003, November 2003 (2003-11-01), XP050286421 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08761064

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08761064

Country of ref document: EP

Kind code of ref document: A1