WO2014190811A1 - Method and apparatus for remote display endpoint capability exchange, and data flow - Google Patents

Method and apparatus for remote display endpoint capability exchange, and data flow

Info

Publication number
WO2014190811A1
Authority
WO
WIPO (PCT)
Prior art keywords
endpoint
remote presentation
remote
request
video
Prior art date
Application number
PCT/CN2014/075201
Other languages
French (fr)
Chinese (zh)
Inventor
王亮
叶小阳
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2014190811A1 publication Critical patent/WO2014190811A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N 7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals

Definitions

  • the present invention relates to the field of communications, and in particular to a capability interaction method and apparatus for a remote presentation endpoint, and a data stream.
  • in the related art, capability negotiation is performed via the H.245 protocol.
  • only the capabilities of traditional video conferencing endpoints can be negotiated; the capabilities of a remote presentation endpoint cannot be negotiated.
  • specifically, when capabilities are exchanged, only the codec capabilities of traditional video conferencing endpoints can be exchanged, while the new media stream receiving and sending capabilities of remote presentation endpoints cannot be exchanged.
  • when a logical channel is opened, only the codec attribute of the logical channel can be specified.
  • the present invention provides a capability interaction method and apparatus for a remote presentation endpoint, and a data stream, to address at least the above problems.
  • a capability interaction method for a remote presentation endpoint, including: performing a capability interaction between a first remote presentation endpoint and a second remote presentation endpoint, where the capability interaction message carries a remote presentation endpoint capability set, and the capability set includes a remote presentation endpoint transmission capability set; the first remote presentation endpoint receives a mode request message of the second remote presentation endpoint; and the first remote presentation endpoint opens a logical channel between the first remote presentation endpoint and the second remote presentation endpoint according to the result of the capability interaction and the received mode request message.
  • performing the capability interaction between the first remote presentation endpoint and the second remote presentation endpoint comprises: sending, by the first remote presentation endpoint, a first capability set interaction request to the second remote presentation endpoint, where the first capability set interaction request carries a first transmission capability set of the first remote presentation endpoint;
  • the first remote presentation endpoint receives the second capability set interaction request sent by the second remote presentation endpoint, where the second capability set interaction request carries the second transmission capability set of the second remote presentation endpoint;
  • the first remote presentation endpoint receives the first mode request message sent by the second remote presentation endpoint, where the first mode request message carries the sending parameter of the first remote presentation endpoint;
  • the first remote presentation endpoint sends a second mode request message to the second remote presentation endpoint, where the second mode request message carries the sending parameter of the second remote presentation endpoint; and the first remote presentation endpoint, according to a result of the mode request process corresponding to the second mode request message, sends a first logical channel request to the second remote presentation endpoint, where the first logical channel request is used to request the first remote presentation endpoint to open a forward logical channel between the first remote presentation endpoint and the second remote presentation endpoint.
  • performing the capability interaction between the first remote presentation endpoint and the second remote presentation endpoint comprises: the first remote presentation endpoint sends a third capability set interaction request to the second remote presentation endpoint, where the third capability set interaction request carries the first transmission capability set of the first remote presentation endpoint; the first remote presentation endpoint receives a third mode request message sent by the second remote presentation endpoint, where the third mode request message carries the sending parameter of the first remote presentation endpoint; the first remote presentation endpoint receives a fourth capability set interaction request sent by the second remote presentation endpoint, where the fourth capability set interaction request carries a second transmission capability set of the second remote presentation endpoint; the first remote presentation endpoint sends a fourth mode request message to the second remote presentation endpoint, where the fourth mode request message carries the sending parameter of the second remote presentation endpoint; and the first remote presentation endpoint, according to a result of the mode request process corresponding to the fourth mode request message, sends a third logical channel request to the second remote presentation endpoint, where the third logical channel request is used to request the first remote presentation endpoint to open a forward logical channel between the first remote presentation endpoint and the second remote presentation endpoint.
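To make the two claimed orderings easier to follow, the sketch below lists them as message sequences between a first endpoint A and a second endpoint B. This is an illustrative reading only; the message names follow H.245 conventions (TerminalCapabilitySet, RequestMode, OpenLogicalChannel) and are assumptions, since the application itself does not prescribe a concrete encoding.

```python
# Illustrative message sequences for the two claimed capability-interaction orderings.
# Endpoint A is the "first" remote presentation endpoint, endpoint B the "second".
# Message names follow H.245 conventions; the exact encoding is not specified here.

MANNER_1 = [
    ("A->B", "TerminalCapabilitySet", "carries A's transmission capability set (1st request)"),
    ("B->A", "TerminalCapabilitySet", "carries B's transmission capability set (2nd request)"),
    ("B->A", "RequestMode",           "1st mode request: the parameters B asks A to send"),
    ("A->B", "RequestMode",           "2nd mode request: the parameters A asks B to send"),
    ("A->B", "OpenLogicalChannel",    "1st channel request: A opens its forward channel"),
    ("B->A", "OpenLogicalChannel",    "2nd channel request: B opens its forward channel"),
]

MANNER_2 = [
    ("A->B", "TerminalCapabilitySet", "carries A's transmission capability set (3rd request)"),
    ("B->A", "RequestMode",           "3rd mode request: the parameters B asks A to send"),
    ("B->A", "TerminalCapabilitySet", "carries B's transmission capability set (4th request)"),
    ("A->B", "RequestMode",           "4th mode request: the parameters A asks B to send"),
    ("A->B", "OpenLogicalChannel",    "3rd channel request: A opens its forward channel"),
    ("B->A", "OpenLogicalChannel",    "4th channel request: B opens its forward channel"),
]
```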
  • after the first remote presentation endpoint sends the first capability set interaction request to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint receiving a response message, sent by the second remote presentation endpoint, corresponding to the first capability set interaction request; after the first remote presentation endpoint sends the third capability set interaction request to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint receiving a response message, sent by the second remote presentation endpoint, corresponding to the third capability set interaction request.
  • after the first remote presentation endpoint receives the second capability set interaction request sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the second capability set interaction request to the second remote presentation endpoint; after the first remote presentation endpoint receives the fourth capability set interaction request sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the fourth capability set interaction request to the second remote presentation endpoint.
  • after the first remote presentation endpoint receives the first mode request message sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the first mode request message to the second remote presentation endpoint; after the first remote presentation endpoint receives the third mode request message sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the third mode request message to the second remote presentation endpoint.
  • after the first remote presentation endpoint sends the second mode request message to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the second mode request message to the second remote presentation endpoint; after the first remote presentation endpoint sends the fourth mode request message to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the fourth mode request message to the second remote presentation endpoint.
  • after the first remote presentation endpoint sends the first logical channel request to the second remote presentation endpoint according to the result of the mode request process corresponding to the second mode request message, the method further includes: the first remote presentation endpoint receiving a response message, sent by the second remote presentation endpoint, corresponding to the first logical channel request.
  • after the first remote presentation endpoint sends the third logical channel request to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint receiving a response message, sent by the second remote presentation endpoint, corresponding to the third logical channel request.
  • after the first remote presentation endpoint receives the second logical channel request sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the second logical channel request to the second remote presentation endpoint.
  • after the first remote presentation endpoint receives the fourth logical channel request sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the fourth logical channel request to the second remote presentation endpoint.
  • the remote presentation endpoint transmission capability set includes a capture parameter, wherein the capture parameter includes a universal parameter, a video parameter, and/or an audio parameter.
  • the general parameters include media capture content, scene description, scene switching policy, general spatial information, and/or general coding information; the media capture content indicates the purpose of the media capture; the scene description is used to provide a description of the overall scene.
  • the scene switching policy is used to indicate the supported media switching policy, where the supported media switching policy includes a site switching policy and a partial switching policy; the site switching policy is used to indicate that all captures are switched at the same time, to ensure that the captures come together from the same endpoint site; the partial switching policy is used to indicate that different captures can be switched at different times and can come from the same and/or different remote presentation endpoints.
  • the general space information includes a scene area and/or an area scale parameter, where the scene area parameter is used to indicate a range of an overall scene related to the endpoint, and the area scale indicates a scale used by the spatial information parameter.
  • the general coding information includes a total maximum bandwidth, a total maximum number of pixels per second, and/or a total maximum number of macroblocks per second, wherein the total maximum bandwidth indicates the maximum number of bits per second of all code streams of a preset type sent by the endpoint; the total maximum number of pixels per second indicates the maximum number of pixels per second that can be independently coded across all coding groups; and the total maximum number of macroblocks per second indicates the maximum number of macroblocks per second of all video streams sent by the endpoint.
  • the video parameters include: a video capture quantity, video capture space information, and/or video capture coding information; the video capture quantity is used to indicate the number of video captures.
  • said video capture spatial information comprises a capture area, a capture point and/or a point on the capture line, wherein said capture area is for indicating a spatial location of the video capture in the overall captured scene; A capture point, used to indicate the location of the video capture in the captured scene; a point on the capture line that describes the spatial location of the second point on the optical axis of the capture device, and the first point is the capture point.
  • the video capture coding information includes a maximum video bandwidth, a maximum number of pixels per second, a width of a maximum video resolution, a height of a maximum video resolution, and/or a maximum video frame rate; the maximum video bandwidth indicates the maximum number of bits per second of a single video encoding; the maximum number of pixels per second indicates the maximum number of pixels per second of a single video encoding; the width of the maximum video resolution indicates the width, in pixels, of the maximum video resolution; the height of the maximum video resolution indicates the height, in pixels, of the maximum video resolution; and the maximum video frame rate indicates the maximum video frame rate.
  • the audio parameters include an audio capture amount, audio capture space information, and/or audio capture coding information; the audio capture number is used to indicate the number of audio captures.
  • the audio capture space information includes: a capture area, and/or a capture point, where the capture area is used to indicate a spatial location where the audio capture is located in the overall captured scene; the capture point is used to indicate Captures the location of the audio capture in the scene.
  • the audio capture coding information includes: an audio channel format and/or a maximum audio bandwidth; the audio channel format is used to indicate an attribute of an audio channel; and the maximum audio bandwidth is used to indicate the maximum number of bits per second of a single audio encoding.
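The transmission capability set described above can be pictured as a small data structure. The following Python sketch is purely illustrative: the application does not define a concrete encoding, and every field name here is an assumption chosen to mirror the parameters listed in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class GeneralParameters:
    """General (universal) capture parameters."""
    media_capture_content: Optional[str] = None    # purpose of the media capture
    scene_description: Optional[str] = None        # description of the overall scene
    scene_switching_policy: Optional[str] = None   # "site" (switch all captures together) or "partial"
    scene_area: Optional[Tuple[float, ...]] = None  # range of the overall scene related to the endpoint
    area_scale: Optional[str] = None               # scale used by the spatial information
    max_total_bandwidth_bps: Optional[int] = None  # bits/s over all code streams of a preset type
    max_total_pixels_per_sec: Optional[int] = None
    max_total_macroblocks_per_sec: Optional[int] = None

@dataclass
class VideoCapture:
    """Per-capture video parameters (spatial and coding information)."""
    capture_area: Optional[Tuple[float, ...]] = None             # location in the overall captured scene
    capture_point: Optional[Tuple[float, float, float]] = None
    point_on_capture_line: Optional[Tuple[float, float, float]] = None  # 2nd point on the optical axis
    max_video_bandwidth_bps: Optional[int] = None  # bits/s of a single video encoding
    max_pixels_per_sec: Optional[int] = None
    max_width_px: Optional[int] = None
    max_height_px: Optional[int] = None
    max_frame_rate: Optional[float] = None

@dataclass
class AudioCapture:
    """Per-capture audio parameters (spatial and coding information)."""
    capture_area: Optional[Tuple[float, ...]] = None
    capture_point: Optional[Tuple[float, float, float]] = None
    channel_format: Optional[str] = None           # attribute of the audio channel, e.g. "mono"
    max_audio_bandwidth_bps: Optional[int] = None  # bits/s of a single audio encoding

@dataclass
class TransmissionCapabilitySet:
    """Remote presentation endpoint transmission capability set (capture parameters)."""
    general: GeneralParameters = field(default_factory=GeneralParameters)
    video_captures: List[VideoCapture] = field(default_factory=list)  # length = video capture quantity
    audio_captures: List[AudioCapture] = field(default_factory=list)  # length = audio capture quantity
```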
  • a capability interaction device for a remote presentation endpoint, applied to a first remote presentation endpoint, including: an interaction module, configured to perform a capability interaction with a second remote presentation endpoint, where the capability interaction message carries a remote presentation endpoint capability set, and the capability set includes a remote presentation endpoint transmission capability set; a first receiving module, configured to receive a mode request message of the second remote presentation endpoint; and a processing module, configured to open a logical channel between the first remote presentation endpoint and the second remote presentation endpoint according to the result of the capability interaction and the received mode request message.
  • the interaction module includes: a first sending module, configured to send a first capability set interaction request to the second remote presentation endpoint, where the first capability set interaction request carries a first transmission capability set of the first remote presentation endpoint; a second receiving module, configured to receive a second capability set interaction request sent by the second remote presentation endpoint, where the second capability set interaction request carries a second transmission capability set of the second remote presentation endpoint; a third receiving module, configured to receive a first mode request message sent by the second remote presentation endpoint, where the first mode request message carries the sending parameter of the first remote presentation endpoint; a second sending module, configured to send a second mode request message to the second remote presentation endpoint, where the second mode request message carries the sending parameter of the second remote presentation endpoint; and a third sending module, configured to send a first logical channel request to the second remote presentation endpoint according to a result of the mode request process corresponding to the second mode request message, where the first logical channel request is used to request the first remote presentation endpoint to open a forward logical channel between the first remote presentation endpoint and the second remote presentation endpoint.
  • the interaction module includes: a fourth sending module, configured to send a third capability set interaction request to the second remote presentation endpoint, where the third capability set interaction request carries the first transmission capability set of the first remote presentation endpoint; a fifth receiving module, configured to receive a third mode request message sent by the second remote presentation endpoint, where the third mode request message carries the sending parameter of the first remote presentation endpoint; a sixth receiving module, configured to receive a fourth capability set interaction request sent by the second remote presentation endpoint, where the fourth capability set interaction request carries the second transmission capability set of the second remote presentation endpoint; a fifth sending module, configured to send a fourth mode request message to the second remote presentation endpoint, where the fourth mode request message carries the sending parameter of the second remote presentation endpoint; and a sixth sending module, configured to send a third logical channel request to the second remote presentation endpoint according to a result of the mode request process corresponding to the fourth mode request message, where the third logical channel request is used to request the first remote presentation endpoint to open a forward logical channel between the first remote presentation endpoint and the second remote presentation endpoint.
  • the apparatus further includes: an eighth receiving module, configured to receive, after the first sending module sends the first capability set interaction request to the second remote presentation endpoint, a response message corresponding to the first capability set interaction request sent by the second remote presentation endpoint; and a ninth receiving module, configured to receive, after the fourth sending module sends the third capability set interaction request to the second remote presentation endpoint, a response message corresponding to the third capability set interaction request sent by the second remote presentation endpoint.
  • the apparatus further includes: a seventh sending module, configured to send, after the second receiving module receives the second capability set interaction request sent by the second remote presentation endpoint, a response message corresponding to the second capability set interaction request to the second remote presentation endpoint; and an eighth sending module, configured to send, after the sixth receiving module receives the fourth capability set interaction request sent by the second remote presentation endpoint, a response message corresponding to the fourth capability set interaction request to the second remote presentation endpoint.
  • the apparatus further includes: a ninth sending module, configured to send, after the third receiving module receives the first mode request message sent by the second remote presentation endpoint, a response message corresponding to the first mode request message to the second remote presentation endpoint; and a tenth sending module, configured to send, after the fifth receiving module receives the third mode request message sent by the second remote presentation endpoint, a response message corresponding to the third mode request message to the second remote presentation endpoint.
  • the apparatus further includes: an eleventh sending module, configured to send, after the second sending module sends the second mode request message to the second remote presentation endpoint, a response message corresponding to the second mode request message to the second remote presentation endpoint; and a twelfth sending module, configured to send, after the fifth sending module sends the fourth mode request message to the second remote presentation endpoint, a response message corresponding to the fourth mode request message to the second remote presentation endpoint.
  • the apparatus further includes: a tenth receiving module, configured to receive, after the third sending module sends the first logical channel request to the second remote presentation endpoint according to a result of the mode request process corresponding to the second mode request message, a response message corresponding to the first logical channel request sent by the second remote presentation endpoint; and an eleventh receiving module, configured to receive, after the sixth sending module sends the third logical channel request to the second remote presentation endpoint according to a result of the mode request process corresponding to the fourth mode request message, a response message corresponding to the third logical channel request sent by the second remote presentation endpoint.
  • the apparatus further includes: a thirteenth sending module, configured to send, after the fourth receiving module receives the second logical channel request sent by the second remote presentation endpoint, a response message corresponding to the second logical channel request to the second remote presentation endpoint; and a fourteenth sending module, configured to send, after the seventh receiving module receives the fourth logical channel request sent by the second remote presentation endpoint, a response message corresponding to the fourth logical channel request to the second remote presentation endpoint.
  • according to another aspect of the present invention, a data stream is provided, comprising: a remote presentation endpoint capability set, wherein the remote presentation endpoint capability set includes a transmission capability set, the transmission capability set includes a capture parameter, and the capture parameter includes: a general parameter, a video parameter, and/or an audio parameter.
  • the general parameters include media capture content, scene description, scene switching policy, general spatial information, and/or general coding information; the media capture content indicates the purpose of the media capture; the scene description is used to provide a description of the overall scene.
  • the scene switching policy is used to indicate the supported media switching policy, where the supported media switching policy includes a site switching policy and a partial switching policy; the site switching policy is used to indicate that all captures are switched at the same time, to ensure that the captures come together from the same endpoint site; the partial switching policy is used to indicate that different captures can be switched at different times and can come from the same and/or different remote presentation endpoints.
  • the general spatial information includes a scene area and/or an area scale parameter, where the scene area parameter is used to indicate the range of the overall scene related to the endpoint, and the area scale indicates the type of scale used by the spatial information parameters.
  • the general coding information includes a total maximum bandwidth, a total maximum number of pixels per second, and/or a total maximum number of macroblocks per second, wherein the total maximum bandwidth indicates the maximum number of bits per second of all code streams of a preset type sent by the endpoint.
  • the video parameters include: a video capture quantity, video capture space information, and/or video capture coding information; the video capture quantity is used to indicate the number of video captures.
  • said video capture spatial information comprises a capture area, a capture point and/or a point on the capture line, wherein said capture area is for indicating a spatial location of the video capture in the overall captured scene;
  • a capture point used to indicate the location of the video capture in the captured scene;
  • a point on the capture line that describes the spatial location of the second point on the optical axis of the capture device, and the first point is the capture point.
  • the video capture coding information includes a maximum video bandwidth, a maximum number of pixels per second, a width of a maximum video resolution, a height of a maximum video resolution, and/or a maximum video frame rate; the maximum video bandwidth indicates the maximum number of bits per second of a single video encoding; the maximum number of pixels per second indicates the maximum number of pixels per second of a single video encoding; the width of the maximum video resolution indicates the width, in pixels, of the maximum video resolution; the height of the maximum video resolution indicates the height, in pixels, of the maximum video resolution; and the maximum video frame rate indicates the maximum video frame rate.
  • the audio parameters include an audio capture amount, audio capture space information, and/or audio capture coding information; the audio capture number is used to indicate the number of audio captures.
  • the audio capture space information includes: a capture area, and/or a capture point, where the capture area is used to indicate a spatial location where the audio capture is located in the overall captured scene; the capture point is used to indicate Captures the location of the audio capture in the scene.
  • the audio capture coding information includes: an audio channel format and/or a maximum audio bandwidth; the audio channel format is used to indicate an attribute of an audio channel; and the maximum audio bandwidth is used to indicate the maximum number of bits per second of a single audio encoding.
  • FIG. 1 is a flowchart of a capability interaction method of a remote presentation endpoint according to an embodiment of the present invention
  • FIG. 2 is a structural block diagram of a capability interaction device for remotely presenting an endpoint according to an embodiment of the present invention
  • FIG. 4 is a flowchart 1 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention
  • FIG. 6 is a flowchart 3 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention
  • FIG. 7 is flowchart 4 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention
  • FIG. 8 is flowchart 5 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention
  • FIG. 9 is a flowchart 6 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention
  • FIG. 10 is flowchart 7 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention
  • FIG. 11 is flowchart 8 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention
  • FIG. 12 is flowchart 9 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention
  • FIG. 13 is flowchart 10 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention
  • FIG. 14 is flowchart 11 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention
  • FIG. 15 is a flowchart 12 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention.
  • FIG. 17 is a schematic diagram of a remote presentation endpoint capability set in accordance with an embodiment of the present invention.
  • BEST MODE FOR CARRYING OUT THE INVENTION: The present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments in the present application and the features in the embodiments may be combined with each other without conflict.
  • the present embodiment provides a capability interaction method for a remote presentation endpoint.
  • FIG. 1 is a flowchart of a capability interaction method for a remote presentation endpoint according to an embodiment of the present invention.
  • the method includes the following steps S102 to S106.
  • Step S102: A capability interaction is performed between the first remote presentation endpoint and the second remote presentation endpoint, where the capability interaction message carries a remote presentation endpoint capability set, and the capability set includes the remote presentation endpoint transmission capability set.
  • Step S104 The first telepresence endpoint receives a mode request message of the second telepresence endpoint.
  • Step S106 The first telepresence endpoint opens a logical channel between the first telepresence endpoint and the second telepresence endpoint according to the result of the capability interaction and the received mode request information.
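A minimal sketch of steps S102 to S106 from the first endpoint's point of view is shown below. The transport object, helper methods and H.245-style message names are assumptions made for illustration; they are not prescribed by the embodiment.

```python
# Hedged sketch of steps S102-S106 for the first remote presentation endpoint.
# `endpoint` is an assumed object wrapping the signalling transport.

def capability_interaction(endpoint, peer):
    # S102: capability interaction - both sides exchange capability messages that
    # carry the remote presentation endpoint (transmission) capability set.
    local_caps = endpoint.transmission_capability_set()
    endpoint.send(peer, "TerminalCapabilitySet", local_caps)
    remote_caps = endpoint.receive(peer, "TerminalCapabilitySet")

    # S104: receive the peer's mode request, i.e. the sending parameters the peer
    # asks this endpoint to use.
    mode_request = endpoint.receive(peer, "RequestMode")

    # S106: open the logical channel(s) based on the capability interaction result
    # and the received mode request.
    channel_params = endpoint.select_channel_parameters(local_caps, remote_caps, mode_request)
    endpoint.send(peer, "OpenLogicalChannel", channel_params)
```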
  • step S102 can be implemented in the following two manners.
  • Manner 1: This manner includes the following sub-steps S1 to S11.
  • Step S1 The first remote presentation endpoint sends a first capability set interaction request to the second remote presentation endpoint, where the first capability set interaction request carries the first transmission capability set of the first remote presentation endpoint.
  • Step S3: The first remote presentation endpoint receives a second capability set interaction request sent by the second remote presentation endpoint, where the second capability set interaction request carries the second transmission capability set of the second remote presentation endpoint;
  • Step S5 The first remote presentation endpoint receives a first mode request message sent by the second remote presentation endpoint, where the first mode request message carries a sending parameter of the first remote presentation endpoint;
  • Step S7: The first remote presentation endpoint sends a second mode request message to the second remote presentation endpoint, where the second mode request message carries the sending parameter of the second remote presentation endpoint;
  • Step S9: According to the result of the mode request process corresponding to the second mode request message, the first remote presentation endpoint sends a first logical channel request to the second remote presentation endpoint, where the first logical channel request is used to request the first remote presentation endpoint to open a forward logical channel between the first remote presentation endpoint and the second remote presentation endpoint;
  • Step S11: The first remote presentation endpoint receives a second logical channel request sent by the second remote presentation endpoint, where the second logical channel request is used to request the second remote presentation endpoint to open a forward logical channel between the second remote presentation endpoint and the first remote presentation endpoint.
  • Manner 2: This manner includes the following sub-steps S2 to S12.
  • Step S2 The first remote presentation endpoint sends a third capability set interaction request to the second remote presentation endpoint, where the third capability set interaction request carries the first transmission capability set of the first remote presentation endpoint.
  • Step S4: The first remote presentation endpoint receives a third mode request message sent by the second remote presentation endpoint, where the third mode request message carries the sending parameter of the first remote presentation endpoint;
  • Step S6: The first remote presentation endpoint receives a fourth capability set interaction request sent by the second remote presentation endpoint, where the fourth capability set interaction request carries the second transmission capability set of the second remote presentation endpoint;
  • Step S8: The first remote presentation endpoint sends a fourth mode request message to the second remote presentation endpoint, where the fourth mode request message carries the sending parameter of the second remote presentation endpoint;
  • Step S10: According to the result of the mode request process corresponding to the fourth mode request message, the first remote presentation endpoint sends a third logical channel request to the second remote presentation endpoint, where the third logical channel request is used to request the first remote presentation endpoint to open a forward logical channel between the first remote presentation endpoint and the second remote presentation endpoint;
  • Step S12: The first remote presentation endpoint receives a fourth logical channel request sent by the second remote presentation endpoint, where the fourth logical channel request is used to request the second remote presentation endpoint to open a forward logical channel between the second remote presentation endpoint and the first remote presentation endpoint.
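The following sketch walks through Manner 2 imperatively from the first endpoint's side. As above, the transport object, helper names and H.245-style message names are illustrative assumptions, not part of the embodiment.

```python
# Hedged sketch of Manner 2 (sub-steps S2 to S12) for the first remote presentation
# endpoint; message and helper names are assumed for illustration.

def negotiate_manner_2(endpoint, peer):
    endpoint.send(peer, "TerminalCapabilitySet", endpoint.transmission_capability_set())  # S2
    mode_request_for_us = endpoint.receive(peer, "RequestMode")   # S4: peer's 3rd mode request
    peer_caps = endpoint.receive(peer, "TerminalCapabilitySet")   # S6: peer's capability set
    endpoint.send(peer, "RequestMode",
                  endpoint.choose_sending_parameters(peer_caps))  # S8: 4th mode request
    # S10: open our forward logical channel, reflecting what the peer asked us to send
    endpoint.send(peer, "OpenLogicalChannel",
                  endpoint.forward_channel_parameters(mode_request_for_us))
    # S12: the peer opens its forward logical channel towards us
    incoming = endpoint.receive(peer, "OpenLogicalChannel")
    endpoint.send(peer, "OpenLogicalChannelAck", incoming)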
  • after the first remote presentation endpoint sends the first capability set interaction request to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint receiving a response message, sent by the second remote presentation endpoint, corresponding to the first capability set interaction request.
  • after the first remote presentation endpoint sends the third capability set interaction request to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint receiving a response message, sent by the second remote presentation endpoint, corresponding to the third capability set interaction request.
  • after the first remote presentation endpoint receives the second capability set interaction request sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the second capability set interaction request to the second remote presentation endpoint.
  • after the first remote presentation endpoint receives the fourth capability set interaction request sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the fourth capability set interaction request to the second remote presentation endpoint.
  • after the first remote presentation endpoint receives the first mode request message sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the first mode request message to the second remote presentation endpoint; after the first remote presentation endpoint receives the third mode request message sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the third mode request message to the second remote presentation endpoint.
  • after the first remote presentation endpoint sends the second mode request message to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the second mode request message to the second remote presentation endpoint; after the first remote presentation endpoint sends the fourth mode request message to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the fourth mode request message to the second remote presentation endpoint.
  • after the first remote presentation endpoint sends the first logical channel request to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint receiving a response message, sent by the second remote presentation endpoint, corresponding to the first logical channel request; after the first remote presentation endpoint sends the third logical channel request to the second remote presentation endpoint according to the result of the mode request process corresponding to the fourth mode request message, the method further includes: the first remote presentation endpoint receiving a response message, sent by the second remote presentation endpoint, corresponding to the third logical channel request.
  • after the first remote presentation endpoint receives the second logical channel request sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the second logical channel request to the second remote presentation endpoint.
  • the method further includes: the first remote presentation endpoint sending a response message corresponding to the fourth logical channel request to the second remote presentation endpoint.
  • the telepresence endpoint transmission capability set includes a capture parameter, where the capture parameter includes a universal parameter, a video parameter, and/or an audio parameter.
  • the general parameters include media capture content, scene description, scene switching policy, general spatial information, and/or general coding information; the media capture content indicates the purpose of the media capture; the scene description is used to provide a description of the overall scene; preferably, the scene switching policy is used to indicate the supported media switching policy, where the supported media switching policy includes a site switching policy and a partial switching policy; the site switching policy is used to indicate that all captures are switched at the same time, to ensure that the captures come together from the same endpoint site, and the partial switching policy is used to indicate that different captures can be switched at different times and can come from the same and/or different remote presentation endpoints.
  • the general spatial information includes a scene area and/or an area scale parameter, wherein the scene area parameter is used to indicate a range of the overall scene associated with the endpoint, and the area scale indicates the type of scale used by the spatial information parameter.
  • the general coding information includes a total maximum bandwidth, a total maximum number of pixels per second, and/or a total maximum number of macroblocks per second, wherein the total maximum bandwidth indicates the maximum number of bits per second of all code streams of a preset type sent by the endpoint; the total maximum number of pixels per second indicates the maximum number of pixels per second that can be independently coded across all coding groups; and the total maximum number of macroblocks per second indicates the maximum number of macroblocks per second of all video streams sent by the endpoint.
  • the video parameters include: video capture number, video capture spatial information, and/or video capture encoded information; the number of video captures is used to indicate the number of video captures.
  • the video capture spatial information comprises a capture area, a capture point and/or a point on the capture line, wherein the capture area is used to indicate a spatial location of the video capture in the overall captured scene; a capture point, used to indicate In the captured scene, the location of the video capture; the point on the capture line, describes the spatial location of the second point on the optical axis of the capture device, and the first point is the capture point.
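As an aside, the capture point together with the point on the capture line fully determines the direction in which a capture faces. The short sketch below makes this concrete; the coordinate convention and function name are assumptions for illustration only.

```python
# Illustrative use of the video capture spatial information: the capture point and
# the "point on the capture line" (a second point on the capture device's optical
# axis) together define the direction the capture is facing.

def capture_direction(capture_point, point_on_capture_line):
    """Unit vector along the optical axis, from the capture point towards the scene."""
    dx = point_on_capture_line[0] - capture_point[0]
    dy = point_on_capture_line[1] - capture_point[1]
    dz = point_on_capture_line[2] - capture_point[2]
    norm = (dx * dx + dy * dy + dz * dz) ** 0.5
    return (dx / norm, dy / norm, dz / norm)

# e.g. capture_direction((0, 0, 1000), (0, 2000, 1000)) -> (0.0, 1.0, 0.0)
```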
  • the video capture coding information includes the maximum video bandwidth, the maximum number of pixels per second, the width of the maximum video resolution, the height of the maximum video resolution, and/or the maximum video frame rate; the maximum video bandwidth indicates the maximum number of bits per second of a single video encoding; the maximum number of pixels per second indicates the maximum number of pixels per second of a single video encoding; the width of the maximum video resolution indicates the width, in pixels, of the maximum video resolution; the height of the maximum video resolution indicates the height, in pixels, of the maximum video resolution; and the maximum video frame rate indicates the maximum video frame rate.
  • the audio parameters include an audio capture amount, audio capture spatial information, and/or audio capture encoded information; the number of audio captures is used to indicate the number of audio captures.
  • the audio capture spatial information comprises: a capture area and/or a capture point, wherein the capture area is used to indicate a spatial location where the audio capture is located in the overall captured scene; and a capture point is used to represent the audio capture in the captured scene s position.
  • the audio capture encoded information comprises: an audio channel format and/or a maximum audio bandwidth; an audio channel format for indicating an attribute of the audio channel; a maximum audio bandwidth for indicating a maximum number of bits per second for a single audio encoding.
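A common use of these coding limits is to check whether a concrete encoding fits within what the peer advertised. The sketch below assumes the illustrative field names used earlier in this document; it is not part of the described method.

```python
# Illustrative check (field names assumed): does a requested video encoding fit
# within the advertised video capture coding information?

def encoding_fits(caps, width, height, frame_rate, bitrate_bps):
    pixels_per_sec = width * height * frame_rate
    return (
        (caps.max_video_bandwidth_bps is None or bitrate_bps <= caps.max_video_bandwidth_bps)
        and (caps.max_pixels_per_sec is None or pixels_per_sec <= caps.max_pixels_per_sec)
        and (caps.max_width_px is None or width <= caps.max_width_px)
        and (caps.max_height_px is None or height <= caps.max_height_px)
        and (caps.max_frame_rate is None or frame_rate <= caps.max_frame_rate)
    )

# e.g. encoding_fits(video_caps, 1920, 1080, 30, 4_000_000)
```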
  • capability interaction software for a remote presentation endpoint is provided, configured to execute the technical solutions described in the above embodiments and preferred implementations.
  • a storage medium is provided, the storage medium storing the above capability interaction software for the remote presentation endpoint, including but not limited to: an optical disk, a floppy disk, a hard disk, a rewritable memory, and the like.
  • the embodiment of the present invention further provides a capability interaction device for a remote presentation endpoint, which is applicable to a first remote presentation endpoint, and the capability interaction device of the remote presentation endpoint may be used to implement the capability interaction method and a preferred implementation manner of the remote presentation endpoint.
  • the modules involved in the capability interaction device of the remote presentation endpoint will be described below.
  • the term "module" can achieve a predetermined function a combination of software and / or hardware.
  • the systems and methods described in the following embodiments are preferably implemented in software, hardware, or a combination of software and hardware, is also possible and contemplated.
  • FIG. 2 is a structural block diagram of a capability interaction device for a remote presentation endpoint according to an embodiment of the present invention.
  • the device includes: an interaction module 22, a first receiving module 24, and a processing module 26. The foregoing structure is described below.
  • the interaction module 22 is configured to perform a capability interaction with the second remote presentation endpoint, where the capability interaction message carries a remote presentation endpoint capability set, and the capability set includes the remote presentation endpoint transmission capability set; the first receiving module 24 is configured to receive a mode request message of the second remote presentation endpoint; the processing module 26, connected to the interaction module 22 and the first receiving module 24, is configured to open a logical channel between the first remote presentation endpoint and the second remote presentation endpoint according to the result of the capability interaction and the received mode request message.
  • the interaction module 22 includes: a first sending module 220, configured to send a first capability set interaction request to the second remote presentation endpoint, where the first capability set interaction request carries a first transmission capability set of the first remote presentation endpoint; a second receiving module 221, configured to receive a second capability set interaction request sent by the second remote presentation endpoint, where the second capability set interaction request carries a second transmission capability set of the second remote presentation endpoint; a third receiving module 222, configured to receive a first mode request message sent by the second remote presentation endpoint, where the first mode request message carries the sending parameter of the first remote presentation endpoint; a second sending module 223, configured to send a second mode request message to the second remote presentation endpoint, where the second mode request message carries the sending parameter of the second remote presentation endpoint; and a third sending module 224, configured to send a first logical channel request to the second remote presentation endpoint according to a result of the mode request process corresponding to the second mode request message, where the first logical channel request is used to request the first remote presentation endpoint to open a forward logical channel between the first remote presentation endpoint and the second remote presentation endpoint.
  • the interaction module 22 includes: a fourth sending module 226, configured to send a third capability set interaction request to the second remote presentation endpoint, where the third capability set interaction request carries the first transmission capability set of the first remote presentation endpoint;
  • a fifth receiving module 227, configured to receive a third mode request message sent by the second remote presentation endpoint, where the third mode request message carries the sending parameter of the first remote presentation endpoint;
  • a sixth receiving module 228, configured to receive a fourth capability set interaction request sent by the second remote presentation endpoint, where the fourth capability set interaction request carries the second transmission capability set of the second remote presentation endpoint; a fifth sending module 229, configured to send a fourth mode request message to the second remote presentation endpoint, where the fourth mode request message carries the sending parameter of the second remote presentation endpoint; and a sixth sending module 230, configured to send a third logical channel request to the second remote presentation endpoint according to a result of the mode request process corresponding to the fourth mode request message, where the third logical channel request is used to request the first remote presentation endpoint to open a forward logical channel between the first remote presentation endpoint and the second remote presentation endpoint;
  • a seventh receiving module 231, configured to receive a fourth logical channel request sent by the second remote presentation endpoint, where the fourth logical channel request is determined by the second remote presentation endpoint according to a result of the corresponding mode request process, and is used to request the second remote presentation endpoint to open a forward logical channel between the second remote presentation endpoint and the first remote presentation endpoint.
  • the foregoing apparatus further includes: an eighth receiving module 31, connected to the first sending module 220 and configured to receive, after the first sending module 220 sends the first capability set interaction request to the second remote presentation endpoint, a response message corresponding to the first capability set interaction request sent by the second remote presentation endpoint; and a ninth receiving module 32, connected to the fourth sending module 226 and configured to receive, after the fourth sending module 226 sends the third capability set interaction request to the second remote presentation endpoint, a response message corresponding to the third capability set interaction request sent by the second remote presentation endpoint.
  • the foregoing apparatus further includes: a seventh sending module 33, connected to the second receiving module 221 and configured to send, after the second receiving module 221 receives the second capability set interaction request sent by the second remote presentation endpoint, a response message corresponding to the second capability set interaction request to the second remote presentation endpoint;
  • and an eighth sending module 34, connected to the sixth receiving module 228 and configured to send, after the sixth receiving module 228 receives the fourth capability set interaction request sent by the second remote presentation endpoint, a response message corresponding to the fourth capability set interaction request to the second remote presentation endpoint.
  • the foregoing apparatus further includes: a ninth sending module 35, connected to the third receiving module 222 and configured to send, after the third receiving module 222 receives the first mode request message sent by the second remote presentation endpoint, a response message corresponding to the first mode request message to the second remote presentation endpoint; and a tenth sending module 36, connected to the fifth receiving module 227 and configured to send, after the fifth receiving module 227 receives the third mode request message sent by the second remote presentation endpoint, a response message corresponding to the third mode request message to the second remote presentation endpoint.
  • the foregoing apparatus further includes: an eleventh sending module 37, connected to the second sending module 223 and configured to send, after the second sending module 223 sends the second mode request message to the second remote presentation endpoint, a response message corresponding to the second mode request message to the second remote presentation endpoint; and a twelfth sending module 38, connected to the fifth sending module 229 and configured to send, after the fifth sending module 229 sends the fourth mode request message to the second remote presentation endpoint, a response message corresponding to the fourth mode request message to the second remote presentation endpoint.
  • the foregoing apparatus further includes: a tenth receiving module 39, connected to the third sending module 224 and configured to receive, after the third sending module 224 sends the first logical channel request to the second remote presentation endpoint according to a result of the mode request process corresponding to the second mode request message, a response message corresponding to the first logical channel request sent by the second remote presentation endpoint; and an eleventh receiving module 40, connected to the sixth sending module 230 and configured to receive, after the sixth sending module 230 sends the third logical channel request to the second remote presentation endpoint according to a result of the mode request process corresponding to the fourth mode request message, a response message corresponding to the third logical channel request sent by the second remote presentation endpoint.
  • the foregoing apparatus further includes: a thirteenth sending module 41, connected to the fourth receiving module 225 and configured to send, after the fourth receiving module 225 receives the second logical channel request sent by the second remote presentation endpoint, a response message corresponding to the second logical channel request to the second remote presentation endpoint;
  • and a fourteenth sending module 42, connected to the seventh receiving module 231 and configured to send, after the seventh receiving module 231 receives the fourth logical channel request sent by the second remote presentation endpoint, a response message corresponding to the fourth logical channel request to the second remote presentation endpoint.
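For readers who find the module enumeration hard to follow, the skeleton below maps the main Manner 1 modules onto methods of a single class. The class, transport interface and method names are illustrative assumptions; the module numbering follows the description above.

```python
# Illustrative skeleton of the capability interaction device (Manner 1 modules).
# `transport` is an assumed object that can send and receive signalling messages.

class CapabilityInteractionDevice:
    def __init__(self, transport):
        self.transport = transport

    # interaction module 22: first sending module 220 + second receiving module 221
    def exchange_capability_sets(self, local_transmission_caps):
        self.transport.send("TerminalCapabilitySet", local_transmission_caps)
        return self.transport.receive("TerminalCapabilitySet")

    # first receiving module 24 / third receiving module 222: peer's mode request
    def receive_mode_request(self):
        return self.transport.receive("RequestMode")

    # second sending module 223: mode request towards the peer
    def send_mode_request(self, requested_peer_sending_params):
        self.transport.send("RequestMode", requested_peer_sending_params)

    # processing module 26 / third sending module 224: open the forward logical channel
    def open_forward_channel(self, channel_params):
        self.transport.send("OpenLogicalChannel", channel_params)
        return self.transport.receive("OpenLogicalChannelAck")  # cf. tenth receiving module
```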
  • the present embodiment provides a data stream, which includes a remote presentation endpoint capability set, where the remote presentation endpoint capability set includes a transmission capability set, the transmission capability set includes a capture parameter, and the capture parameter includes: a general parameter, a video parameter, and/or an audio parameter.
  • the general parameters include media capture content, scene description, scene switching policy, general spatial information, and/or general coding information; the media capture content indicates the purpose of the media capture, and its attributes include the perspective of the media capture, the role represented by the media, whether the media is auxiliary stream content, and the language related to the media; the scene description is used to provide a description of the overall scene, for example a text description.
  • the scene switching policy is used to indicate the supported media switching policy, where the supported media switching policy includes a site switching policy and a partial switching policy; the site switching policy is used to indicate that all captures are switched at the same time, to ensure that the captures come together from the same endpoint site; the partial switching policy is used to indicate that different captures can be switched at different times and can come from the same and/or different remote presentation endpoints.
  • the general spatial information includes a scene area and/or an area scale parameter, wherein the scene area parameter is used to indicate a range of the overall scene associated with the endpoint, and the area scale indicates the type of scale used by the spatial information parameter.
  • the general coding information includes a total maximum bandwidth, a total maximum number of pixels per second, and/or a total maximum number of macroblocks per second, wherein the total maximum bandwidth indicates the maximum number of bits per second of all code streams of a preset type sent by the endpoint.
  • the video parameters include: video capture number, video capture spatial information, and/or video capture encoded information; the number of video captures is used to indicate the number of video captures.
  • the video capture spatial information comprises a capture area, a capture point and/or a point on the capture line, wherein the capture area is used to indicate a spatial location of the video capture in the overall captured scene; a capture point, used to indicate In the captured scene, the location of the video capture; the point on the capture line, describes the spatial location of the second point on the optical axis of the capture device, and the first point is the capture point.
  • the video capture coding information includes the maximum video bandwidth, the maximum number of pixels per second, the width of the maximum video resolution, the height of the maximum video resolution, and/or the maximum video frame rate; the maximum video bandwidth indicates the maximum number of bits per second of a single video encoding; the maximum number of pixels per second indicates the maximum number of pixels per second of a single video encoding; the width of the maximum video resolution indicates the width, in pixels, of the maximum video resolution; the height of the maximum video resolution indicates the height, in pixels, of the maximum video resolution; and the maximum video frame rate indicates the maximum video frame rate.
  • the audio parameters include an audio capture amount, audio capture spatial information, and/or audio capture encoded information; the number of audio captures is used to indicate the number of audio captures.
  • the audio capture spatial information comprises: a capture area and/or a capture point, wherein the capture area is used to indicate a spatial location where the audio capture is located in the overall captured scene; and a capture point is used to represent the audio capture in the captured scene s position.
  • the audio capture encoded information comprises: an audio channel format and/or a maximum audio bandwidth; an audio channel format for indicating an attribute of the audio channel; a maximum audio bandwidth for indicating a maximum number of bits per second for a single audio encoding.
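Purely by way of illustration, a transmission capability set carried in such a data stream might look like the instance below. All field names and values are assumptions chosen to match the parameters listed above; the application does not mandate any particular encoding.

```python
# Illustrative instance of a transmission capability set; field names and values
# are assumptions, not an encoding defined by the application.

example_transmission_capability_set = {
    "general": {
        "media_capture_content": "main video of the conference room",
        "scene_description": "three-camera telepresence room",
        "scene_switching_policy": "site",          # switch all captures together
        "scene_area": {"width_mm": 6000, "depth_mm": 3000},
        "area_scale": "millimetres",
        "max_total_bandwidth_bps": 12_000_000,
    },
    "video": {
        "capture_quantity": 3,
        "per_capture": {
            "max_video_bandwidth_bps": 4_000_000,
            "max_width_px": 1920,
            "max_height_px": 1080,
            "max_frame_rate": 30,
        },
    },
    "audio": {
        "capture_quantity": 3,
        "channel_format": "mono",
        "max_audio_bandwidth_bps": 64_000,
    },
}
```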
  • the telepresence endpoint capability set comprises: a telepresence endpoint symmetric capability set
  • the telepresence endpoint symmetric capability set comprises: capture rendering parameters
  • the capture rendering parameters comprise: general parameters, video parameters, and/or audio parameters.
  • the general parameters include media capture rendering content, scene description, scene switching policy, general spatial information, and/or general coding information; the media capture rendering content represents the purpose of the media capture and/or rendering, and its attributes include the perspective of the media capture, the role represented by the media, whether the media is auxiliary stream content, and the language related to the media; the scene description is used to provide a description of the overall scene; preferably, the scene switching policy is used to indicate the supported media switching policy, where the supported media switching policy includes a site switching policy and a partial switching policy; the site switching policy is used to indicate that all capture renderings are switched at the same time, to ensure that the capture renderings come together from the same endpoint site, and the partial switching policy is used to indicate that different capture renderings can be switched at different times.
  • the general spatial information includes a scene area and/or an area scale parameter, wherein the scene area parameter is used to indicate a range of the overall scene associated with the endpoint, and the area scale indicates the type of scale used by the spatial information parameter.
  • the general coding information includes a total maximum bandwidth, a total maximum number of pixels per second, and/or a total maximum number of macroblocks per second, wherein the total maximum bandwidth indicates the maximum number of bits per second of all code streams of a preset type sent and/or received by the capture rendering endpoint; the total maximum number of pixels per second indicates the maximum number of pixels per second that can be independently coded in the coding groups sent and/or received by the endpoint; and the total maximum number of macroblocks per second indicates the maximum number of macroblocks per second of all video streams sent and/or received by the endpoint.
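These "total" limits bound the aggregate of all streams rather than any single encoding. The small check below illustrates that reading; the function and field names are assumptions made for this sketch.

```python
# Illustrative check of the aggregate ("total") coding limits across all streams.

def within_total_limits(streams, max_total_bandwidth_bps, max_total_macroblocks_per_sec):
    """streams: iterable of (bitrate_bps, macroblocks_per_sec) for every video stream."""
    total_bitrate = sum(bitrate for bitrate, _ in streams)
    total_macroblocks = sum(mb for _, mb in streams)
    return (total_bitrate <= max_total_bandwidth_bps
            and total_macroblocks <= max_total_macroblocks_per_sec)

# e.g. three 1080p30 streams at 4 Mbit/s each (8160 macroblocks/frame * 30 fps = 244_800):
# within_total_limits([(4_000_000, 244_800)] * 3, 15_000_000, 1_000_000) -> True
```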
  • the video parameters include: video capture rendering amount, video capture rendering space information, and/or video capture rendering encoding information; the number of video capture renderings is used to indicate the number of video captures and/or renderings.
  • the video capture/rendering spatial information comprises a capture/rendering area, a capture/rendering point, and/or a point on the capture/rendering line, where the capture/rendering area indicates the spatial location of the video capture/rendering within the overall captured and/or rendered scene; the capture/rendering point indicates the position of the video capture and/or rendering in the captured and/or rendered scene; the point on the capture/rendering line describes the spatial position of a second point on the optical axis of the capture and/or rendering device, the first point being the capture and/or rendering point.
  • the video capture/rendering encoded information includes the maximum video bandwidth, the maximum number of pixels per second, the maximum video resolution width, the maximum video resolution height, and/or the maximum video frame rate; the maximum video bandwidth indicates the maximum number of bits per second for a single video encoding; the maximum number of pixels per second indicates the maximum number of pixels per second for a single video encoding; the maximum video resolution width and height indicate the width and height of the maximum video resolution in pixels; the maximum video frame rate indicates the maximum video frame rate.
  • the audio parameters include the number of audio capture/renderings, audio capture/rendering spatial information, and/or audio capture/rendering encoded information; the number of audio capture/renderings indicates how many audio capture/renderings there are.
  • the audio capture/rendering spatial information comprises a capture/rendering area and/or a capture/rendering point, where the capture/rendering area indicates the spatial location of the audio capture and/or rendering within the overall captured and/or rendered scene, and the capture/rendering point indicates the position of the audio capture and/or rendering in the captured and/or rendered scene.
  • the audio capture/rendering encoded information comprises an audio channel format and/or a maximum audio bandwidth; the audio channel format indicates the attributes of the audio channel; the maximum audio bandwidth indicates the maximum number of bits per second for a single audio encoding.
  • as a preferred implementation, the telepresence endpoint capability set includes a telepresence endpoint reception capability set, which includes rendering parameters, where the rendering parameters include general parameters, video parameters, and/or audio parameters.
  • the general parameters include media rendering content, a scene description, a scene switching policy, general spatial information, and/or general encoding information, where the media rendering content represents the attributes of the captured content required by the rendering endpoint, such as the properties of the required media capture;
  • the scene description provides a description of the overall scene;
  • the scene switching policy indicates the supported media switching policies and preferably includes a location switching policy and/or a partial switching policy, where the location switching policy switches all renderings at the same time so that the renderings come together from the same endpoint site, and the partial switching policy switches different renderings at different times, from the same and/or different endpoints.
  • the general spatial information includes: a scene area and/or an area scale parameter, wherein the scene area parameter is used to indicate a range of the overall scene related to the endpoint, and the area scale is used to indicate the type of the scale used by the spatial information parameter.
  • the general encoding information includes the total maximum bandwidth, the total maximum number of pixels per second, and/or the total maximum number of macroblocks per second, where the total maximum bandwidth indicates the maximum number of bits per second over all streams of the preset type received by the rendering endpoint;
  • the total maximum number of pixels per second indicates the maximum number of pixels per second processed over all independently encoded streams in the encoding group; the total maximum number of macroblocks per second indicates the maximum number of macroblocks per second over all video streams received by the endpoint.
  • the video parameters include: the number of video renderings, video rendering spatial information, and/or video rendering encoding information, where the number of video renderings indicates how many video renderings there are, and the video rendering spatial information indicates which portion of the overall rendered scene the video rendering represents.
  • the video rendering encoding information includes the maximum video bandwidth, the maximum number of pixels per second, the maximum video resolution width, the maximum video resolution height, and/or the maximum video frame rate; the maximum video bandwidth indicates the maximum number of bits per second for a single video encoding; the maximum number of pixels per second indicates the maximum number of pixels per second for a single video encoding; the maximum video resolution width and height indicate the width and height of the maximum video resolution in pixels; the maximum video frame rate indicates the maximum video frame rate.
  • the audio parameters include: the number of audio renderings, audio rendering spatial information, and/or audio rendering encoding information, where the number of audio renderings indicates how many audio renderings there are, and the audio rendering spatial information indicates the position of the audio rendering in the overall rendered scene.
  • the audio rendering encoded information comprises: an audio channel format and/or a maximum audio bandwidth, wherein the audio channel format is used to represent an attribute of the audio channel; and the maximum audio bandwidth is used to represent a maximum number of bits per second for a single audio encoding.
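  • The reception capability set parameters enumerated above can be modeled as a simple data structure. The following is a minimal Python sketch; all class and field names (for example GeneralParams, VideoRendering) are illustrative choices made here and are not identifiers defined by this application.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GeneralParams:
    # General parameters of the reception capability set
    scene_description: str = ""
    scene_switching_policy: str = "location"  # "location" (all renderings switch together) or "partial"
    scene_area: Optional[tuple] = None        # range of the overall scene related to the endpoint
    area_scale: str = "mm"                    # kind of scale used by the spatial parameters
    max_bandwidth_bps: int = 0                # max bits/s over all streams of the preset type
    max_pixels_per_second: int = 0            # max pixels/s over all independently encoded streams
    max_macroblocks_per_second: int = 0       # max macroblocks/s over all received video streams

@dataclass
class VideoRendering:
    # One video rendering and its decoding limits
    rendering_id: str                         # e.g. "VR0"
    spatial_position: str                     # e.g. "left", "center", "right"
    content_attribute: str                    # e.g. "main", "panoramic", "VIP"
    max_video_bandwidth_bps: int
    max_pixels_per_second: int = 0
    max_resolution: tuple = (0, 0)            # (width, height) in pixels
    max_frame_rate: float = 0.0

@dataclass
class AudioRendering:
    rendering_id: str                         # e.g. "AR0"
    content_attribute: str                    # e.g. "main"
    channel_format: str = "stereo"
    max_audio_bandwidth_bps: int = 0

@dataclass
class ReceptionCapabilitySet:
    general: GeneralParams
    video_renderings: List[VideoRendering] = field(default_factory=list)
    audio_renderings: List[AudioRendering] = field(default_factory=list)
```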
  • Preferred Embodiment 1: this preferred embodiment provides a capability negotiation method that enables remote presentation endpoints to negotiate scene descriptions, codec capabilities, and multiplexed channels carrying multiple media streams.
  • FIG. 4 is a flowchart 1 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention.
  • the negotiation procedure includes a capability set interaction phase and a logical channel open phase.
  • the method includes the following steps S401 and S402. Step S401, capability set interaction: the two telepresence endpoints exchange capabilities, with the messages carrying the telepresence endpoint capability set. Step S402, logical channel open: a logical channel is opened between the two telepresence endpoints, and the negotiated channel attributes are specified.
  • a negotiation mode A-1 is provided.
  • FIG. 5 is a flowchart 2 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention, as shown in FIG. 5.
  • the capability set carried in the negotiation mode capability set interaction message is a receiving capability set, and the receiving capability set includes the endpoint receiving capability related parameter, and the following steps S501 and S502 are included.
  • Step S501 capability set interaction: capability set interaction between two remote presentation endpoints, where the message carries a remote presentation endpoint reception capability set parameter.
  • Step S502 Logical channel open: Open a logical channel between two remote rendering endpoints, and specify the channel attribute after negotiation.
  • FIG. 6 is a flowchart 3 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention.
  • the capability set interaction and the logical channel open procedure in the negotiation manner shown in FIG. 6 may include the following steps S601 to S608.
  • Step S601 The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the reception capability set of the endpoint A.
  • Step S602 Endpoint B replies to the endpoint A capability set interaction response message;
  • Step S603 The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, and the message carries the reception capability set of the endpoint B.
  • Step S604 The endpoint A replies to the endpoint B capability set interaction response message;
  • Step S605: According to the receiving capability of endpoint A, combined with its own sending capability, endpoint B sends an Open Logical Channel Request message to endpoint A.
  • Step S606: Endpoint A replies to endpoint B with an open logical channel response message.
  • Step S607: According to the receiving capability of endpoint B, combined with its own sending capability, endpoint A sends an Open Logical Channel Request message to endpoint B.
  • Step S608: Endpoint B replies to endpoint A with an open logical channel response message.
  • Each request/response pair is ordered in time, and the first pair of capability interaction messages is sent before the first pair of logical channel open messages.
  • Subject to this rule, the messages described in FIG. 3 may also be sent in the order 1, 3, 2, 4, 5, 6, 7, 8, where endpoints A and B first complete the capability set interaction with each other before opening the logical channels, or in the order 1, 2, 5, 6, 3, 4, 7, 8, where the logical channel on one side is opened as soon as the capability interaction on that side is completed.
  • Multiple messages sent from one endpoint may also be carried together in a single transmission, for example 1, 2+3, 4, 5, 6+7, 8.
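  • The eight-message exchange of negotiation mode A-1 can be sketched as two endpoints that each announce a reception capability set, acknowledge the peer's announcement, and then open a logical channel toward the peer. The Python sketch below only illustrates the message ordering constraint described above; the Endpoint class, message names, and capability contents are illustrative assumptions, not structures defined by this application.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Endpoint:
    name: str
    receive_capability_set: dict
    log: List[str] = field(default_factory=list)

    def send(self, peer: "Endpoint", message: str, body=None) -> None:
        # Record the message at both ends so the overall order can be inspected.
        record = f"{self.name} -> {peer.name}: {message}"
        self.log.append(record)
        peer.log.append(record)

def negotiate_mode_a1(a: Endpoint, b: Endpoint) -> None:
    # Capability set interaction: each side announces its reception capability set.
    a.send(b, "CapabilitySetRequest", a.receive_capability_set)   # message 1
    b.send(a, "CapabilitySetResponse")                            # message 2
    b.send(a, "CapabilitySetRequest", b.receive_capability_set)   # message 3
    a.send(b, "CapabilitySetResponse")                            # message 4
    # Logical channel open: each sender matches its own sending capability
    # against the peer's announced reception capability set.
    b.send(a, "OpenLogicalChannelRequest")                        # message 5
    a.send(b, "OpenLogicalChannelResponse")                       # message 6
    a.send(b, "OpenLogicalChannelRequest")                        # message 7
    b.send(a, "OpenLogicalChannelResponse")                       # message 8

if __name__ == "__main__":
    a = Endpoint("A", {"scenes": ["left", "center", "right"]})
    b = Endpoint("B", {"scenes": ["panorama"]})
    negotiate_mode_a1(a, b)
    print("\n".join(a.log))
```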
  • FIG. 7 is a flowchart 4 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention. As shown in FIG. 7, the method includes step S701 and step S702.
  • Step S701 Capability set interaction: The capability set interaction is performed between two remote rendering endpoints, and the message carries the parameters of the remote rendering endpoint symmetric capability set.
  • Step S702 The logical channel is opened: Open a logical channel between two remote rendering endpoints, and specify the channel attribute after negotiation.
  • FIG. 8 is a flowchart of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention. As shown in FIG. 8, the capability set interaction and the logical channel open procedure in the negotiation manner include the following steps S801 to S808.
  • Step S801: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the symmetric capability set of endpoint A.
  • Step S802: Endpoint B replies to endpoint A with a capability set interaction response message.
  • Step S803: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, and the message carries the symmetric capability set of endpoint B.
  • Step S804: Endpoint A replies to endpoint B with a capability set interaction response message.
  • Step S805: According to the symmetric capability of endpoint A, combined with its own capability, endpoint B sends an Open Logical Channel Request message to endpoint A.
  • Step S806: Endpoint A replies to endpoint B with an open logical channel response message.
  • Step S807: According to the symmetric capability of endpoint B, combined with its own capability, endpoint A sends an Open Logical Channel Request message to endpoint B.
  • Step S808: Endpoint B replies to endpoint A with an open logical channel response message.
  • Each request/response pair is ordered in time, and the first pair of capability interaction messages is sent before the first pair of logical channel open messages. Subject to this rule, the messages described in FIG. 5 may also be sent in the order 1, 3, 2, 4, 5, 6, 7, 8, where endpoints A and B first complete the capability set interaction with each other before opening the logical channels, or in the order 1, 2, 5, 6, 3, 4, 7, 8, where the logical channel on one side is opened as soon as the capability interaction on that side is completed.
  • Multiple messages sent from one endpoint may also be carried together in a single transmission, for example 1, 2+3, 4, 5, 6+7, 8, where 2+3 indicates that endpoint B sends endpoint A a single piece of information that carries both message 2 and message 3.
  • Preferred Embodiment 3: this preferred embodiment provides a capability negotiation method that enables remote presentation endpoints to negotiate scene descriptions, codec capabilities, and multiplexed channels carrying multiple media streams.
  • FIG. 9 is a flowchart 6 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention. As shown in FIG. 9, the method includes steps S901 and S902.
  • Step S901 capability set interaction: a capability set interaction is performed between two remote presentation endpoints, and the message carries a remote presentation endpoint transmission capability set parameter.
  • Step S902 The logical channel is opened: Open the logical channel between the two telepresence endpoints, and specify the channel attribute after negotiation. The following description will be made with reference to examples.
  • Example 1 This example provides a negotiation mode A-3-1.
  • FIG. 10 is a flowchart 7 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention. As shown in FIG. 10, in this mode the selected parameters are returned in the capability interaction response message, and the method includes the following steps S101 to S108. Step S101: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the transmission capability set of endpoint A. Step S102: Endpoint B replies to endpoint A with a capability set interaction response message carrying the parameters that B selected from A's transmission capability set.
  • Step S103 The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, and the message carries the transmission capability set of the endpoint B.
  • Step S104: Endpoint A replies to endpoint B with a capability set interaction response message carrying the parameters that A selected from B's transmission capability set.
  • Step S105: According to the capabilities selected by endpoint B in the capability set interaction response, endpoint A sends an Open Logical Channel Request message to endpoint B.
  • Step S106: Endpoint B replies to endpoint A with an open logical channel response message.
  • Step S107: According to the capabilities selected by endpoint A in the capability set interaction response, endpoint B sends an Open Logical Channel Request message to endpoint A.
  • Step S108: Endpoint A replies to endpoint B with an open logical channel response message.
  • Each request/response pair is ordered in time, and the first pair of capability interaction messages is sent before the first pair of logical channel open messages.
  • Subject to this rule, the messages described in FIG. 7 may also be sent in the order 1, 3, 2, 4, 5, 6, 7, 8, where endpoints A and B first complete the capability set interaction with each other before opening the logical channels, or in the order 1, 2, 5, 6, 3, 4, 7, 8, where the logical channel on one side is opened as soon as the capability interaction on that side is completed.
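  • In mode A-3-1 the response message returns the parameters that the receiver selected from the sender's transmission capability set. A minimal selection routine might look like the following Python sketch; the filtering criterion (keeping only the scenes whose bandwidth the receiver can decode) is an assumed example, not a rule stated in this application.

```python
def select_from_send_capability(send_capability_scenes, max_receive_bandwidth_bps):
    """Pick the scenes of the peer's transmission capability set that the
    local endpoint is able (and willing) to receive and decode."""
    selected = []
    for scene in send_capability_scenes:
        if scene["max_bandwidth_bps"] <= max_receive_bandwidth_bps:
            selected.append(scene["scene_id"])
    return selected

# Example: endpoint B selects from endpoint A's transmission capability set.
a_send_scenes = [
    {"scene_id": 1, "max_bandwidth_bps": 12_000_000},  # left/center/right video
    {"scene_id": 2, "max_bandwidth_bps": 4_000_000},   # panoramic video
    {"scene_id": 3, "max_bandwidth_bps": 128_000},     # main audio
]
print(select_from_send_capability(a_send_scenes, max_receive_bandwidth_bps=4_000_000))
# -> [2, 3], which B would carry back in its capability set interaction response
```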
  • Example 2 This example provides negotiation mode A-3-2. As shown in FIG. 11, the mode does not carry the selection parameter in the capability set interaction response message, and the reverse logical channel is requested to be opened in the open logical channel request. The following steps S111 to S118 are included.
  • Step S111 The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the transmission capability set of the endpoint A.
  • Step S112 the endpoint B replies to the endpoint A capability set interaction response message;
  • Step S113: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, and the message carries the transmission capability set of endpoint B.
  • Step S114: Endpoint A replies to endpoint B with a capability set interaction response message.
  • Step S115: According to the sending capability of endpoint A, combined with its own receiving capability, endpoint B sends an Open Logical Channel Request message to endpoint A.
  • Step S116: Endpoint A replies to endpoint B with an open logical channel response message.
  • Step S117: According to the sending capability of endpoint B, combined with its own receiving capability, endpoint A sends an Open Logical Channel Request message to endpoint B.
  • Step S118 Endpoint B replies to Endpoint A to open the logical channel response message.
  • Each request/response pair is ordered in time, and the first pair of capability interaction messages is sent before the first pair of logical channel open messages; subject to this rule, the messages in this mode may also be sent in other orders, as in the preceding modes.
  • this preferred embodiment provides a capability negotiation method that enables remote presentation endpoints to negotiate scene descriptions, codec capabilities, and multiplexed channels carrying multiple media streams.
  • the preferred embodiment provides a negotiation mode A-4, and the capability set carried in the negotiation mode capability set interaction message is a reception capability set and a transmission capability set.
  • the receiving capability set includes endpoint receiving capability related parameters
  • the sending capability set includes endpoint sending capability related parameters.
  • Step S1201 Capability set interaction: The capability set interaction is performed between two telepresence endpoints, and the message carries the telepresence endpoint receiving capability set and the sending capability set parameter.
  • Step S1202 Logical channel open: Open the logical channel between two telepresence endpoints, and specify the channel properties after negotiation.
  • Example 1: this example describes negotiation mode A-4-1. This negotiation mode carries the receiving capability set and the sending capability set together in the capability set interaction, and includes the following steps S1301 to S1308.
  • Step S1301 The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the reception capability set and the transmission capability set of the endpoint A.
  • Step S1302 The endpoint B replies to the endpoint A capability set interaction response message;
  • Step S1303: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, and the message carries the reception capability set and the transmission capability set of endpoint B.
  • Step S1304: Endpoint A replies to endpoint B with a capability set interaction response message.
  • Step S1305: According to the capabilities of endpoint A, in conjunction with its own capabilities, endpoint B sends an Open Logical Channel Request message to endpoint A.
  • Step S1306: Endpoint A replies to endpoint B with an open logical channel response message.
  • Step S1307: According to the capabilities of endpoint B, in conjunction with its own capabilities, endpoint A sends an Open Logical Channel Request message to endpoint B, specifies the channel attributes, and requests to open the forward logical channel from A to B.
  • Step S1308: Endpoint B replies to endpoint A with an open logical channel response message.
  • Each request/response pair is ordered in time, and the first pair of capability interaction messages is sent before the first pair of logical channel open messages.
  • Subject to this rule, the messages described in FIG. 10 may also be sent in the order 1, 3, 2, 4, 5, 6, 7, 8, where endpoints A and B first complete the capability set interaction with each other and then open the logical channels on both sides, or in the order 1, 2, 5, 6, 3, 4, 7, 8, where the logical channel on one side is opened as soon as the capability interaction on that side is completed.
  • Multiple messages sent from one endpoint may also be carried together in a single transmission, for example 1, 2+3, 4, 5, 6+7, 8.
  • FIG. 14 is a flowchart 11 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention.
  • a negotiation mode A-4-2 is described.
  • In this mode, the receiving capability set carried by the receiving end in the capability set interaction request message is formed on the basis of the transmitting end's transmission capability set; the mode includes the following steps S1401 to S1412.
  • Step S1401: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the transmission capability set of endpoint A.
  • Step S1402: Endpoint B replies to endpoint A with a capability set interaction response message.
  • Step S1403: According to the sending capability of endpoint A and its own receiving needs, endpoint B initiates a capability interaction request message to endpoint A, and the message carries the receiving capability set of endpoint B.
  • Step S1404: Endpoint A replies to endpoint B with a capability set interaction response message.
  • Step S1405: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, and the message carries the transmission capability set of endpoint B.
  • Step S1406: Endpoint A replies to endpoint B with a capability set interaction response message.
  • Step S1407: According to the sending capability of endpoint B and its own receiving needs, endpoint A initiates a capability interaction request message to endpoint B, and the message carries the receiving capability set of endpoint A.
  • Step S1408: Endpoint B replies to endpoint A with a capability set interaction response message.
  • Step S1409: According to the receiving capability set of endpoint B, endpoint A sends an Open Logical Channel Request message to endpoint B, specifies the channel attributes, and requests to open the forward logical channel from A to B.
  • Step S1410: Endpoint B replies to endpoint A with an open logical channel response message.
  • Step S1411: According to the receiving capability set of endpoint A, endpoint B sends an Open Logical Channel Request message to endpoint A, specifies the channel attributes, and requests to open the forward logical channel from B to A.
  • Step S1412: Endpoint A replies to endpoint B with an open logical channel response message.
  • Each request/response pair is ordered in time, and the first two pairs of sending and receiving capability exchange messages are sent before the first pair of logical channel open messages.
  • Subject to this rule, the messages described in FIG. 11 may also be sent in the order 1, 3, 5, 7, 2, 4, 6, 8, where endpoints A and B first complete the capability set interaction with each other before opening the logical channels, or in the order 1, 2, 3, 4, 9, 10, 5, 6, 7, 8, 11, 12, where the logical channel on one side is opened as soon as the capability interaction on that side is completed.
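  • In mode A-4-2 the receiving capability set announced in steps S1403 and S1407 is derived from the peer's previously announced transmission capability set combined with the endpoint's own receiving needs. A possible derivation is sketched below in Python; the intersection rule (clamping each offered capture to the local decoding limits) is an assumption used only for illustration.

```python
def derive_receive_capability(peer_send_captures, local_decode_limits):
    """Build a receiving capability set by intersecting the peer's offered
    captures with the local endpoint's decoding limits."""
    receive_set = []
    for capture in peer_send_captures:
        if capture["media"] not in local_decode_limits:
            continue  # the local endpoint cannot render this media type at all
        limit = local_decode_limits[capture["media"]]
        receive_set.append({
            "capture_id": capture["capture_id"],
            "media": capture["media"],
            # Announce at most what can actually be decoded locally.
            "max_bandwidth_bps": min(capture["max_bandwidth_bps"], limit),
        })
    return receive_set

peer_offer = [
    {"capture_id": "VR0", "media": "video", "max_bandwidth_bps": 4_000_000},
    {"capture_id": "AR0", "media": "audio", "max_bandwidth_bps": 128_000},
]
print(derive_receive_capability(peer_offer, {"video": 2_000_000, "audio": 128_000}))
```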
  • Step S1501, capability set interaction: the capability set interaction is performed between the two remote presentation endpoints, and the message carries the telepresence endpoint receiving capability set and sending capability set parameters.
  • Step S1502 Mode request: The receiving end requests a specific transmission mode according to the sending capability set of the transmitting end.
  • Step S1503 Logical channel open: Open the logical channel between the two telepresence endpoints, and specify the negotiated channel attributes. The following is a detailed description by way of example.
  • FIG. 16 is a flowchart 13 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention. As shown in FIG. 16, the method includes the following steps S1601 to S1612.
  • Step S1601 The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the transmission capability set of the endpoint A.
  • Step S1602 The endpoint B replies to the endpoint A capability set interaction response message.
  • Step S1603 The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, and the message carries the transmission capability set of the endpoint B.
  • Step S1604 Endpoint A replies to the Endpoint B Capability Set Interactivity Response message.
  • Step S1605: The remote presentation endpoint B initiates a mode request message to the remote presentation endpoint A, requesting a specific transmission mode; the request carries the transmission parameters requested of endpoint A.
  • Step S1606: Endpoint A replies to endpoint B with a mode request response message.
  • Step S1607: The remote presentation endpoint A initiates a mode request message to the remote presentation endpoint B, requesting a specific transmission mode; the request carries the transmission parameters requested of endpoint B.
  • Step S1608 Endpoint B replies to the endpoint A mode request response message.
  • Step S1609 Endpoint A sends an Open Logical Channel Request message to Endpoint B according to the result of the mode request process. Specify the channel properties and request to open the forward logical channel from A to B.
  • Step S1610 Endpoint B replies to Endpoint A to open the logical channel response message;
  • Step S1611 Endpoint B sends an Open Logical Channel Request message to Endpoint A according to the result of the mode request process.
  • Step S1612 Endpoint A replies to Endpoint B with an open logical channel response message.
  • Each request/response pair is ordered in time: the first pair of capability interaction messages is sent before the first pair of mode request messages, and the first pair of mode request messages is sent before the first pair of logical channel open messages.
  • the order of message transmission described in FIG. 13 may also be 1, 2, 5, 6, 3, 4, 7, 8, 9, 10, 11, 12, or 1, 2, 5, 6, 9, 10, 3, 4, 7, 8, 11, 12.
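  • The mode request of steps S1605 and S1607 asks the peer for a specific transmission mode chosen from the peer's transmission capability set. A minimal sketch of building such a request is given below in Python; the message fields are illustrative and are not defined by this application.

```python
def build_mode_request(peer_send_capability, wanted_scene_ids):
    """Request that the peer transmit the listed scenes, echoing the
    transmission parameters taken from the peer's send capability set."""
    by_id = {scene["scene_id"]: scene for scene in peer_send_capability}
    requested = []
    for scene_id in wanted_scene_ids:
        if scene_id not in by_id:
            raise ValueError(f"scene {scene_id} was not offered by the peer")
        requested.append(by_id[scene_id])
    return {"message": "ModeRequest", "requested_transmissions": requested}

peer_send_capability = [
    {"scene_id": 1, "description": "left/center/right video", "codec": "H.264"},
    {"scene_id": 3, "description": "main audio", "codec": "G.711"},
]
print(build_mode_request(peer_send_capability, wanted_scene_ids=[1, 3]))
```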
  • FIG. 17 is a schematic diagram of a remote presentation endpoint capability set according to an embodiment of the present invention. As shown in FIG. 17, the capability set is divided into a transmission capability set and a reception capability set. A detailed description is given below.
  • the transmission capability set mainly includes capture related parameters, and these parameters include general parameters, video parameters, and audio parameters. Other parameters of the transmission capability set can be used to include coding standards in the related art, such as H.263, H.264, and the like.
  • the general parameters are used to describe scene related parameters and general coding related parameters.
  • the scene related parameters include a scene description, a scene switching policy, a scene area, and metric information.
  • the general encoding related parameters include a maximum bandwidth, a maximum number of macroblocks per second, and an encoding standard.
  • the video parameters are used to describe the attributes of the individual videos that make up the captured scene.
  • the video parameters mainly include video capture space information, video capture and encoding information, number of video captures, video content attributes, video switching strategies, video combination strategies and other parameters.
  • the video capture spatial information includes a capture area, a capture point, and a point on the capture line; the video capture encoded information includes a maximum video bandwidth captured by the video, a maximum number of macroblocks per second, a maximum video resolution width, a maximum video resolution height, Maximum video frame rate.
  • the audio parameters are used to describe the attributes of the individual audios that make up the captured scene.
  • the audio parameters mainly include parameters such as audio capture space information, audio capture coding information, and number of audio captures.
  • the audio capture space information includes a capture area, a capture point, and a point on the capture line; the audio capture encoded information includes an audio channel format of the audio capture, and a maximum audio bandwidth.
  • the receiving capability set corresponds to the sending capability set
  • the rendering related parameter corresponds to the capturing related parameter.
  • the receiving capability set mainly includes rendering related parameters
  • the rendering related parameters include general parameters, video parameters and audio parameters.
  • Other parameters of the receive capability set can be used to include decoding standards such as H.263, H.264, and the like.
  • the general parameters are used to describe scene related parameters and general decoding related parameters.
  • the scene related parameters include a scene description, a scene switching policy, a scene area, and metric information.
  • the general decoding related parameters include a maximum bandwidth, a maximum number of macroblocks per second, and a decoding standard.
  • the video parameters are used to describe the attributes of the individual videos that make up the rendered scene.
  • Video parameters mainly include video rendering space information, video rendering and decoding information, number of video renderings, content attributes, automatic switching strategies, combined strategies and other parameters.
  • the video rendering space information includes a rendering area, a rendering point, and a point on the rendering line;
  • the video rendering decoding information includes a maximum video bandwidth of the video rendering, a maximum number of macroblocks per second, a maximum video resolution width, and a maximum video resolution height. Maximum video frame rate.
  • the audio parameters are used to describe the properties of the individual audios that make up the rendered scene.
  • the audio parameters mainly include parameters such as audio rendering space information, audio rendering decoding information, and number of audio renderings.
  • the audio rendering space information includes a rendering area, a rendering point, and a point on the rendering line; the audio rendering decoding information includes an audio channel format of the audio rendering, and a maximum audio bandwidth.
  • the logical channel attribute specified by the remote presentation terminal mainly includes media related information, codec information, and the like used by the channel, and if the logical channel needs to be multiplexed, channel multiplexing information needs to be specified.
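  • A logical channel open request therefore carries media-related information, codec information and, when several streams share one channel, multiplexing information. The structure below is a minimal Python sketch of such a request; the class and field names are illustrative assumptions, not definitions from this application.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LogicalChannelAttributes:
    media_info: List[str]            # capture/rendering identifiers carried, e.g. ["VR0", "VR1", "VR2"]
    codec: str                       # e.g. "H.264" or "G.711"
    max_bandwidth_bps: int
    # Multiplexing info: mapping from capture/rendering identifier to the value
    # used on the wire (e.g. an RTP header extension identifier) so that several
    # RTP streams can be told apart inside one logical channel.
    multiplexing: Dict[str, int] = field(default_factory=dict)

@dataclass
class OpenLogicalChannelRequest:
    direction: str                   # e.g. "B->A" for the forward channel from B to A
    channels: List[LogicalChannelAttributes] = field(default_factory=list)

request = OpenLogicalChannelRequest(
    direction="B->A",
    channels=[
        LogicalChannelAttributes(["VR0", "VR1", "VR2"], "H.264", 12_000_000,
                                 multiplexing={"VR0": 0, "VR1": 1, "VR2": 2}),
        LogicalChannelAttributes(["AR0"], "G.711", 128_000),
    ],
)
print(request)
```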
  • Preferred Embodiment 7: this preferred embodiment is one of the preferred implementations of negotiation mode A-1, and describes a specific capability negotiation procedure between the remote presentation endpoint A and the remote presentation endpoint B, where endpoint A has 3 cameras, 3 displays, 1 microphone, and 1 speaker, and endpoint B has 3 cameras, 3 displays, 1 microphone, and 1 speaker.
  • Endpoint A or / and Endpoint B can also be MCU devices.
  • the method includes the following steps S1702 to S1716.
  • Step S1702: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the reception capability set of endpoint A.
  • the video parameters in the rendering related parameters included in the receiving capability set are: the video rendering spatial information with the rendering identifier VR0 is represented as left, the video content attribute is the main video, and the maximum video bandwidth in the video rendering decoding information is 4M;
  • the video rendering space information of VR1 is represented as medium, the video content attribute is the main video, and the maximum video bandwidth in the video rendering and decoding information is 4M;
  • the video rendering space information with the rendering identifier VR2 is represented as the right, the video content attribute is the main video, and the maximum video bandwidth in the video rendering decoding information is 4M;
  • the video rendering spatial information with the rendering identifier VR3 is represented as left, the video content attribute is the largest-speaker video, the maximum video bandwidth in the video rendering decoding information is 4M, and the automatic switching attribute is YES;
  • the video rendering spatial information with the rendering identifier VR4 is represented as center, the video content attribute is the panoramic video, and the maximum video bandwidth in the video rendering decoding information is 4M;
  • the video rendering spatial information with the rendering identifier VR5 is represented as right, the video content attribute is the VIP video, and the maximum video bandwidth in the video rendering decoding information is 4M;
  • the audio parameters in the rendering related parameters included in the receiving capability set are: the audio content with the rendering identifier AR0 is the main audio, the maximum bandwidth in the audio rendering decoding information is 128K, and the audio channel format is stereo.
  • the common parameters in the rendering related parameters included in the receiving capability set are:
  • the scene with scene identifier 1 is composed of VR0, VR1, and VR2; the scene is described as rendering the left, center, and right video;
  • the codec standard is H.264, and the maximum bandwidth is 12M.
  • the scene with scene identifier 2 is composed of VR3, VR4, and VR5;
  • the scene is described as rendering the largest-speaker, panoramic, and VIP video, and the codec standard is H.264;
  • the maximum bandwidth is 12M.
  • the scene with the scene ID of 3 is composed of AR0.
  • the scene is described as rendering the main audio.
  • the codec standard is G711 and the maximum bandwidth is 128K.
  • Step S1704: Endpoint B replies to endpoint A with a capability set interaction response message.
  • Step S1706: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, and the message carries the reception capability set of endpoint B.
  • the video rendering spatial information with the rendering identifier VR0 is represented as left, the video content attribute is the main video, and the maximum video bandwidth in the video rendering decoding information is 4M;
  • the video rendering spatial information with the rendering identifier VR1 is represented as center, the video content attribute is the main video, and the maximum video bandwidth in the video rendering decoding information is 4M;
  • the video rendering spatial information with the rendering identifier VR2 is represented as right, the video content attribute is the main video, and the maximum video bandwidth in the video rendering decoding information is 4M;
  • the audio content with the rendering identifier AR0 is the main audio, the maximum bandwidth in the audio rendering decoding information is 128K, and the audio channel format is stereo.
  • the common parameters in the rendering related parameters included in the receiving capability set are:
  • the scene with the scene identifier 1 consists of VR0, VR1, and VR2, and the scene is described as rendering left, center, and right video.
  • the decoding standard is H.264, and the maximum bandwidth is 12M.
  • the scene with the scene ID of 2 is composed of AR0. The scene is described as rendering the main audio.
  • Step S1708: Endpoint A replies to endpoint B with a capability set interaction response message.
  • Step S1710: According to the receiving capability of endpoint A, combined with its own sending capability, endpoint B determines to send scene 1 and scene 3 of endpoint A's receiving capability set. Endpoint B sends an Open Logical Channel Request message to endpoint A, specifies the channel attributes, and requests to open the forward logical channel from B to A.
  • the specified logical channel attributes are: One video logical channel transmits three RTP video streams, and one audio logical channel transmits one RTP audio stream.
  • the video channel media related information is that the VR0, VR1, and VR2 video is transmitted; the encoding related information is that H.264 encoding is used and the maximum bandwidth is 12M; the multiplexing information is that VR0, VR1, and VR2 correspond to the stream identifiers 0, 1, and 2 respectively, which are used to distinguish different RTP streams in the same logical channel.
  • the audio channel media related information is sent AR0 audio, the encoding related information is G711, and the maximum bandwidth is 128K.
  • Step S1712: Endpoint A replies to endpoint B with an open logical channel response message.
  • Step S1714: Endpoint A sends an Open Logical Channel Request message to endpoint B, specifies the channel attributes, and requests to open the forward logical channel from A to B.
  • the specified logical channel attributes are: One video logical channel transmits three RTP video streams, and one audio logical channel transmits one RTP audio stream.
  • the video channel media related information is that the VR0, VR1, and VR2 video is transmitted; the encoding related information is that H.264 encoding is used and the maximum bandwidth is 12M; the multiplexing information is that VR0, VR1, and VR2 correspond to the stream identifiers 0, 1, and 2 respectively, which are used to distinguish different RTP streams in the same logical channel.
  • the audio channel media related information is sent AR0 audio, the encoding related information is G711, and the maximum bandwidth is 128K.
  • Step S1716: Endpoint B replies to endpoint A with an open logical channel response message.
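  • Using the illustrative structures sketched earlier, the exchange of Preferred Embodiment 7 can be written out concretely: endpoint A announces three video renderings (VR0–VR2) and one audio rendering (AR0) grouped into scenes 1 and 3, and endpoint B answers by opening one multiplexed video channel and one audio channel. The values below are copied from the embodiment; the data layout itself is only an illustrative Python sketch, not a message format defined by this application.

```python
# Endpoint A's reception capability set (values from Preferred Embodiment 7).
endpoint_a_receive = {
    "video_renderings": {
        "VR0": {"position": "left",   "content": "main", "max_bw_bps": 4_000_000},
        "VR1": {"position": "center", "content": "main", "max_bw_bps": 4_000_000},
        "VR2": {"position": "right",  "content": "main", "max_bw_bps": 4_000_000},
    },
    "audio_renderings": {
        "AR0": {"content": "main", "channel_format": "stereo", "max_bw_bps": 128_000},
    },
    "scenes": {
        1: {"members": ["VR0", "VR1", "VR2"], "codec": "H.264", "max_bw_bps": 12_000_000},
        3: {"members": ["AR0"],               "codec": "G711",  "max_bw_bps": 128_000},
    },
}

# Endpoint B's open-logical-channel request derived from that capability set:
# one video channel carrying three multiplexed RTP streams, one audio channel.
open_logical_channel_request = {
    "direction": "B->A",
    "video_channel": {
        "media": ["VR0", "VR1", "VR2"],
        "codec": "H.264",
        "max_bw_bps": 12_000_000,
        "rtp_stream_ids": {"VR0": 0, "VR1": 1, "VR2": 2},  # distinguishes streams in one channel
    },
    "audio_channel": {"media": ["AR0"], "codec": "G711", "max_bw_bps": 128_000},
}

for name, channel in (("video", open_logical_channel_request["video_channel"]),
                      ("audio", open_logical_channel_request["audio_channel"])):
    print(name, channel["media"], channel["codec"], channel["max_bw_bps"])
```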
  • Preferred Embodiment 8: this preferred embodiment is one of the preferred implementations of negotiation mode A-1, and describes a specific capability negotiation procedure between the remote presentation endpoint A and the remote presentation endpoint B.
  • Endpoint A and Endpoint B are telepresence video conferencing endpoints.
  • Endpoint A has 3 cameras, 3 displays, 1 microphone, and 1 speaker.
  • Endpoint B has 1 camera, 1 display, 1 microphone, and 1 speaker.
  • Endpoint A or / and Endpoint B can also be MCU devices.
  • the method includes the following steps S1802 to S1816.
  • Step S1802 The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the reception capability set of the endpoint A.
  • the video parameters in the rendering related parameters included in the receiving capability set are: the video rendering spatial information with the rendering identifier VR0 is represented as the left, the video content attribute is the main video, and the maximum video bandwidth in the video rendering decoding information is 4M.
  • the video rendering space information with the rendering identifier VR1 is represented as medium, the video content attribute is the main video, and the maximum video bandwidth in the video rendering decoding information is 4M.
  • the video rendering space information with the rendering identifier VR2 is represented as the right, the video content attribute is the main video, and the maximum video bandwidth in the video rendering decoding information is 4M.
  • the video rendering space information with the rendering identifier VR3 is represented as medium, the video content attribute is panoramic video, and the maximum video bandwidth in the video rendering decoding information is 4M.
  • the audio parameters in the rendering related parameters included in the receiving capability set are: rendering the audio content identified as AR0 as the main audio, the maximum bandwidth in the audio rendering decoding information is 128K, and the audio channel format is stereo.
  • the common parameters in the rendering-related parameters included in the receiving capability set are:
  • the scene with the scene identifier of 1 is composed of VR0, VR1, and VR2, and the scene is described as rendering left, center, and right video.
  • the codec standard is H.264, and the maximum bandwidth is 12M.
  • the scene with the scene ID of 2 is composed of VR3.
  • the scene is described as rendering panoramic video.
  • the codec standard is H.264, and the maximum bandwidth is 4M.
  • the scene with the scene ID of 3 is composed of AR0.
  • the scene is described as rendering the main audio.
  • the codec standard is G711, and the maximum bandwidth is 128K.
  • Step S1806: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, and the message carries the reception capability set of endpoint B.
  • the video rendering with the rendering identifier VR0 is described as rendering the panoramic video, the codec standard is H.264, and the maximum bandwidth is 4M;
  • the audio content with the rendering identifier AR0 is the main audio, the maximum bandwidth in the audio rendering decoding information is 128K, and the audio channel format is stereo.
  • the common parameters in the rendering-related parameters included in the receiving capability set are: The scene with the scene identifier of 1 is composed of VR0, and the scene is described as rendering panoramic video.
  • the codec standard is H.264, and the maximum bandwidth is 4M.
  • the scene with the scene ID of 2 is composed of AR0. The scene is described as rendering the main audio.
  • the codec standard is G711, and the maximum bandwidth is 128K.
  • Step S1808: Endpoint A replies to endpoint B with a capability set interaction response message.
  • Step S1810: According to the receiving capability of endpoint A, combined with its own sending capability, endpoint B determines to send scene 2 and scene 3 of endpoint A's receiving capability set. Endpoint B sends an Open Logical Channel Request message to endpoint A, specifies the channel attributes, and requests to open the forward logical channel from B to A.
  • the specified logical channel attributes are: One video logical channel transmits one RTP video stream, and one audio logical channel transmits one RTP audio stream.
  • the video channel media related information is sent VR3 video; the encoding related information is encoded by H.264, and the maximum bandwidth is 4M; no multiplexing information.
  • the audio channel media related information is AR0 audio, the encoding related information is G711, and the maximum bandwidth is 128K.
  • Step S1812: Endpoint A replies to endpoint B with an open logical channel response message.
  • Step S1814 Based on the receiving capability of the endpoint B, and in conjunction with its own sending capability, the endpoint A determines to send the scenario 1 and scenario 2 of the endpoint B receiving capability set. Endpoint A sends an Open Logical Channel Request message to Endpoint B, specifies the channel properties, and requests to open the forward logical channel from A to B.
  • the specified logical channel attributes are: One video logical channel transmits one RTP video stream, and one audio logical channel transmits one RTP audio stream.
  • the video channel media related information is the VR0 video transmission; the encoding related information is encoded by H.264, and the maximum bandwidth is 12M; no multiplexing information.
  • the audio channel media related information is AR0 audio
  • the encoding related information is G711
  • the maximum bandwidth is 128K.
  • Step S1816: Endpoint B replies to endpoint A with an open logical channel response message.
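  • Preferred Embodiment 8 illustrates matching between asymmetric endpoints: each sender picks, from the peer's announced reception scenes, the ones it can actually serve with its own capture capability. A possible matching routine is sketched below in Python; the matching rule (content, codec, and bandwidth compatibility) is an assumption used only for illustration, not a rule stated in this application.

```python
def choose_scenes_to_send(peer_receive_scenes, own_send_scenes):
    """Return the identifiers of the peer's reception scenes that the local
    endpoint can serve, i.e. for which it offers a compatible capture scene."""
    chosen = []
    for rx in peer_receive_scenes:
        for tx in own_send_scenes:
            if (rx["codec"] == tx["codec"]
                    and rx["content"] == tx["content"]
                    and tx["max_bw_bps"] <= rx["max_bw_bps"]):
                chosen.append(rx["scene_id"])
                break
    return chosen

# Endpoint B (one camera, one microphone) matched against endpoint A's reception scenes.
a_receive = [
    {"scene_id": 1, "content": "left/center/right", "codec": "H.264", "max_bw_bps": 12_000_000},
    {"scene_id": 2, "content": "panorama",          "codec": "H.264", "max_bw_bps": 4_000_000},
    {"scene_id": 3, "content": "main audio",        "codec": "G711",  "max_bw_bps": 128_000},
]
b_send = [
    {"content": "panorama",   "codec": "H.264", "max_bw_bps": 4_000_000},
    {"content": "main audio", "codec": "G711",  "max_bw_bps": 128_000},
]
print(choose_scenes_to_send(a_receive, b_send))  # -> [2, 3], as in steps S1810/S1812
```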
  • Preferred Embodiment 9 This preferred embodiment is one of the preferred embodiments of the negotiation mode A-2, and describes a specific capability negotiation procedure between the remote presentation endpoint A and the remote presentation endpoint B.
  • Endpoint A and Endpoint B are telepresence video conferencing endpoints.
  • Endpoint A has 3 cameras, 3 displays, 1 microphone, and 1 speaker.
  • Endpoint B has 3 cameras, 3 displays, 1 microphone, and 1 speaker.
  • Endpoint A or / and Endpoint B can also be MCU devices.
  • the preferred embodiment includes the following steps S1902 to S1916.
  • Step S1902: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the symmetric capability set of endpoint A.
  • the video parameters in the rendering/capture related parameters included in the symmetric capability set are: the video rendering/capture spatial information with the rendering/capture identifier VR0 is represented as left, the video content attribute is the main video, and the maximum video bandwidth in the video rendering/capture codec information is 4M;
  • the video rendering/capture spatial information with the rendering/capture identifier VR1 is represented as center, the video content attribute is the main video, and the maximum video bandwidth in the video rendering/capture codec information is 4M;
  • the video rendering/capture spatial information with the rendering/capture identifier VR2 is represented as right, the video content attribute is the main video, and the maximum video bandwidth in the video rendering/capture codec information is 4M;
  • the audio parameters in the rendering/capture related parameters included in the symmetric capability set are: the audio content with the rendering/capture identifier AR0 is the main audio, and the maximum bandwidth in the audio rendering/capture codec information is 128K;
  • the common parameters in the rendering/capture related parameters included in the symmetric capability set are: the scene with scene identifier 1 is composed of VR0, VR1, and VR2; the scene is described as rendering left, center, and right video;
  • the codec standard is H.264, and the maximum bandwidth is 12M.
  • the scene with the scene ID of 2 is composed of VR3.
  • the scene is described as rendering panoramic video.
  • the codec standard is H.264, and the maximum bandwidth is 4M.
  • the scene with scene ID 3 is composed of AR0.
  • the scene is described as rendering the main audio, the codec standard is G711, and the maximum bandwidth is 128K.
  • Step S1904: Endpoint B replies to endpoint A with a capability set interaction response message.
  • Step S1906: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, and the message carries the symmetric capability set of endpoint B.
  • the video rendering/capture spatial information with the rendering/capture identifier VR0 is represented as left, the video content attribute is the main video, and the maximum video bandwidth in the video rendering/capture codec information is 4M;
  • the video rendering/capture spatial information with the rendering/capture identifier VR1 is represented as center, the video content attribute is the main video, and the maximum video bandwidth in the video rendering/capture codec information is 4M;
  • the video rendering/capture spatial information with the rendering/capture identifier VR2 is represented as right, the video content attribute is the main video, and the maximum video bandwidth in the video rendering/capture codec information is 4M;
  • the audio content with the rendering/capture identifier AR0 is the main audio, the maximum bandwidth in the audio rendering/capture codec information is 128K, and the audio channel format is stereo;
  • the common parameters in the rendering/capture related parameters included in the symmetric capability set are:
  • the scene with the scene ID is 1 consists of VR0, VR1, and VR2.
  • the scene is described as rendering/capturing the left, middle, and right videos.
  • the codec standard is H.264, and the maximum bandwidth is 12M.
  • the scene with the scene identifier of 2 is composed of AR0.
  • the scene is described as rendering/capturing the main audio, the codec standard is G711, and the maximum bandwidth is 128K.
  • Step S1908: Endpoint A replies to endpoint B with a capability set interaction response message.
  • Step S1910: According to the symmetric capability of endpoint A, combined with its own capability, endpoint B sends an Open Logical Channel Request message to endpoint A, specifies the channel attributes, and requests to open the forward logical channel from B to A.
  • the specified logical channel attributes are: One video logical channel transmits three RTP video streams, and one audio logical channel transmits one RTP audio stream.
  • the video channel media related information is that the VR0, VR1, and VR2 video is transmitted; the encoding related information is that H.264 encoding is used and the maximum bandwidth is 12M;
  • the multiplexing information is that the values of the RTP header extension identifiers corresponding to VR0, VR1, and VR2 are 0, 1, and 2 respectively, which are used to distinguish different RTP streams in the same logical channel.
  • the audio channel media related information is sent AR0 audio, the encoding related information is G711, and the maximum bandwidth is 128K.
  • Step S1912: Endpoint A replies to endpoint B with an open logical channel response message.
  • Step S1914: According to the symmetric capability of endpoint B and its own sending capability, endpoint A determines to send scene 1 and scene 2 of endpoint B's symmetric capability set. Endpoint A sends an Open Logical Channel Request message to endpoint B, specifies the channel attributes, and requests to open the forward logical channel from A to B.
  • the specified logical channel attributes are: One video logical channel transmits three RTP video streams, and one audio logical channel transmits one RTP audio stream.
  • the video channel media related information is that the VR0, VR1, and VR2 video is transmitted; the encoding related information is that H.264 encoding is used and the maximum bandwidth is 12M; the multiplexing information is that VR0, VR1, and VR2 correspond to the RTP header extension identifier values 0, 1, and 2 respectively, which are used to distinguish different RTP streams in the same logical channel.
  • the audio channel media related information is sent AR0 audio, the encoding related information is G711, and the maximum bandwidth is 128K.
  • Step S1916: Endpoint B replies to endpoint A with an open logical channel response message.
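  • In Preferred Embodiment 9 three RTP video streams share one logical channel and are told apart by an RTP header extension identifier (0, 1, 2 for VR0, VR1, VR2). The Python sketch below shows how a receiver could demultiplex packets on that basis; the packet representation is a simplified assumption for illustration, not an implementation of any particular RTP stack.

```python
from collections import defaultdict

# Negotiated multiplexing info from the open logical channel request.
EXTENSION_ID_TO_CAPTURE = {0: "VR0", 1: "VR1", 2: "VR2"}

def demultiplex(packets):
    """Group packets of one logical channel by the capture/rendering they
    belong to, using the negotiated RTP header extension identifier."""
    streams = defaultdict(list)
    for packet in packets:
        capture = EXTENSION_ID_TO_CAPTURE.get(packet["ext_id"])
        if capture is None:
            continue  # unknown stream identifier: not part of the negotiation
        streams[capture].append(packet["payload"])
    return dict(streams)

packets = [
    {"ext_id": 0, "payload": b"left frame"},
    {"ext_id": 1, "payload": b"center frame"},
    {"ext_id": 2, "payload": b"right frame"},
    {"ext_id": 0, "payload": b"left frame 2"},
]
print({k: len(v) for k, v in demultiplex(packets).items()})
# -> {'VR0': 2, 'VR1': 1, 'VR2': 1}
```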
  • the preferred embodiment is one of the preferred embodiments of the negotiation mode A-3-1, and describes a specific capability negotiation procedure between the remote presentation endpoint A and the remote presentation endpoint B.
  • Endpoint A and Endpoint B are telepresence video conferencing endpoints.
  • Endpoint A has 3 cameras, 3 displays, 1 microphone, and 1 speaker.
  • Endpoint B has 1 camera, 1 display, 1 microphone, and 1 speaker.
  • Endpoint A or / and Endpoint B can also be MCU devices.
  • the preferred embodiment includes the following steps S2002 to S2016.
  • Step S2002: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the transmission capability set of endpoint A.
  • the video parameters in the capture related parameters included in the transmission capability set are: the video capture spatial information with the capture identifier VR0 is represented as left, the video content attribute is the main video, and the maximum video bandwidth in the video capture encoded information is 4M;
  • the video capture spatial information with the capture identifier VR1 is represented as center, the video content attribute is the main video, and the maximum video bandwidth in the video capture encoded information is 4M;
  • the video capture spatial information with the capture identifier VR2 is represented as right, the video content attribute is the main video, and the maximum video bandwidth in the video capture encoded information is 4M;
  • the video content attribute of the capture with the identifier VR3 is the panoramic video, and the maximum video bandwidth in the video capture encoded information is 4M;
  • the audio parameters in the capture related parameters included in the transmission capability set are: the audio content with the capture identifier AR0 is the main audio, the maximum bandwidth in the audio capture encoded information is 128K, and the audio channel format is stereo;
  • the common parameters in the capture related parameters included in the transmission capability set are:
  • the scene with the scene ID is 1 is composed of VR0, VR1, and VR2.
  • the scene is described as capturing left, center, and right video.
  • the encoding standard is H.264, and the maximum bandwidth is 12M.
  • the scene with scene identifier 2 is composed of VR3.
  • the scene is described as capturing panoramic video.
  • the encoding standard is H.264, and the maximum bandwidth is 4M.
  • the scene with scene ID 3 is composed of AR0.
  • the scene is described as capturing main audio.
  • the encoding standard is G711, and the maximum bandwidth is 128K.
  • Step S2004: Endpoint B replies to endpoint A with a capability set interaction response message carrying the parameters that B selected from A's transmission capability set: the media in scene 2 and scene 3.
  • Step S2006: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, and the message carries the transmission capability set of endpoint B.
  • the video parameters in the acquisition related parameters included in the transmission capability set are:
  • the scene with the capture identifier VR0 is described as capturing the panoramic video, the coding standard is H.264, and the maximum bandwidth is 4M; captures the audio content identified as AR0 as the main audio, the maximum bandwidth in the audio capture encoding information is 128K, and the audio channel format is stereo;
  • the common parameters in the capture related parameters included in the transmission capability set are: the scene with scene identifier 1 is composed of VR0.
  • Step S2008: Endpoint A replies to endpoint B with a capability set interaction response message carrying the parameters that A selected from B's transmission capability set: the media in scene 1 and scene 2. Step S2010: According to the capabilities selected by endpoint B in the capability set interaction response, endpoint A sends an Open Logical Channel Request message to endpoint B, specifies the channel attributes, and requests to open the forward logical channel from A to B.
  • the specified channel attributes are: One video logical channel transmits one RTP video stream, and one audio logical channel transmits one RTP audio stream.
  • the video channel media related information is sent VR3 video; the encoding related information is encoded by H.264, and the maximum bandwidth is 4M; no multiplexing information.
  • the audio channel media related information is AR0 audio, the encoding related information is G711, and the maximum bandwidth is 128K.
  • Step S2012: Endpoint B replies to endpoint A with an open logical channel response message.
  • Step S2014: According to the capabilities selected by endpoint A in the capability set interaction response, endpoint B sends an Open Logical Channel Request message to endpoint A.
  • Step S2016: Endpoint A replies to endpoint B with an open logical channel response message. It should be noted that the embodiments corresponding to the other methods are similar to the embodiments described above.
  • The modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device, and in some cases the steps shown or described may be performed in an order different from the order described herein; or they may be separately fabricated into individual integrated circuit modules, or a plurality of the modules or steps may be fabricated into a single integrated circuit module.
  • Thus, the present invention is not limited to any specific combination of hardware and software.
  • The above is only the preferred embodiments of the present invention and is not intended to limit the present invention; various modifications and changes can be made to the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

Disclosed are a method and apparatus for remote display endpoint capability exchange, and a data flow. The method comprises: a capability exchange is carried out between a first remote display endpoint and a second remote display endpoint, with the capability sets of the remote display endpoints, and in particular their send capability sets, carried in the capability exchange messages; the first remote display endpoint receives a mode request message from the second remote display endpoint; and, on the basis of the capability exchange results and the received mode request information, the first remote display endpoint opens a logical channel between the first remote display endpoint and the second remote display endpoint. The present invention allows capability exchange between remote display terminals, thereby improving the user experience.

Preferably, performing the capability interaction between the first remote presentation endpoint and the second remote presentation endpoint comprises: sending, by the first remote presentation endpoint, the first capability set interaction request to the second remote presentation endpoint, where the first The first set of transmission capabilities of the first remote presentation endpoint is carried in the capability set interaction request;  The first remote presentation endpoint receives the second capability set interaction request sent by the second remote presentation endpoint, where the second capability set interaction request carries the second transmission capability set of the second remote presentation endpoint; The first remote presentation endpoint receives the first mode request message sent by the second remote presentation endpoint, where the first mode request message carries the sending parameter of the first remote presentation endpoint; Presenting, by the endpoint, a second mode request message sent by the endpoint to the second remote presentation endpoint, where the second mode request message carries a sending parameter of the second remote presentation endpoint; As a result of the mode request process corresponding to the second mode request message, sending a first logical channel request to the second remote presentation endpoint, where the first logical channel request is used to request the first remote rendering endpoint to open a forward logical channel between the first telepresence endpoint and the second telepresence endpoint; the first telepresence Receiving, by the point, the second logical channel request sent by the second remote presentation endpoint, where the second logical channel request is determined by the second remote presentation endpoint according to a result of a mode request process corresponding to the first mode request The second logical channel request is for requesting the second remote rendering endpoint to open a forward logical channel between the second remote rendering endpoint and the first remote rendering endpoint. 
Preferably, performing the capability interaction between the first remote presentation endpoint and the second remote presentation endpoint comprises: sending, by the first remote presentation endpoint, a third capability set interaction request to the second remote presentation endpoint, wherein the third The capability set interaction request carries a first transmission capability set of the first remote presentation endpoint; the first remote presentation endpoint receives a three-mode request message sent by the second remote presentation endpoint, where the third mode The request message carries the sending parameter of the first remote rendering endpoint; the first remote rendering endpoint receives the fourth capability set interaction request sent by the second remote rendering endpoint, where the fourth capability set interaction request is carried a second transmission capability set of the second remote presentation endpoint; a fourth mode request message sent by the first remote presentation endpoint to the second remote presentation endpoint, where the fourth mode request message carries Transmitting parameters of the second remote presentation endpoint; the first remote presentation endpoint according to the mode corresponding to the fourth mode request message As a result of the request process, sending a third logical channel request to the second remote presentation endpoint, wherein the third logical channel request is for requesting the first remote presentation endpoint to open the first remote presentation endpoint to a forward logical channel between the second telepresence endpoints;  Receiving, by the first remote presentation endpoint, a fourth logical channel request sent by the second remote presentation endpoint, where the fourth logical channel request is a mode corresponding to the first remote mode endpoint according to the first mode request As a result of the requesting process, the fourth logical channel request is for requesting the second telepresence endpoint to open a forward logical channel between the second telepresence endpoint and the first telepresence endpoint. Preferably, after the first remote presentation endpoint sends the first capability set interaction request to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint receiving, by the second remote presentation endpoint, the corresponding The first capability set interaction request corresponding response message; after the first remote presentation endpoint sends the third capability set interaction request to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint receiving And a response message corresponding to the third capability set interaction request sent by the second remote presentation endpoint. 
Preferably, after the first remote presentation endpoint receives the second capability set interaction request sent by the second remote presentation endpoint, the method further includes: sending, by the first remote presentation endpoint, the second remote presentation endpoint to the second remote presentation endpoint The second capability set interaction request corresponding to the response message; after the first remote presentation endpoint receives the fourth capability set interaction request sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint to the second The telepresence endpoint sends a response message corresponding to the fourth capability set interaction request. Preferably, after the first remote presentation endpoint receives the first mode request message sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending the corresponding location to the second remote presentation endpoint a response message corresponding to the first mode request message; after the first remote presentation endpoint receives the third mode request message sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint to the The remote presentation endpoint sends a response message corresponding to the third mode request message. After the second remote presentation endpoint sends the second mode request message to the second remote presentation endpoint, the method further includes: sending, by the first remote presentation endpoint, the second remote presentation endpoint to the second remote presentation endpoint a response message of the second mode request message; after the fourth mode request message sent by the first remote presentation endpoint to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint to the second The telepresence endpoint sends a response message corresponding to the fourth mode request message.  Preferably, after the first remote presentation endpoint sends the first logical channel request to the second remote presentation endpoint according to the result of the mode request process corresponding to the second mode request message, the method further includes: The telepresence endpoint receives a response message corresponding to the first logical channel request sent by the second remote presentation endpoint; and the result of the mode request process corresponding to the second remote presentation endpoint according to the second mode request message After the third logical channel request is sent to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint receiving, by the second remote presentation endpoint, a response message corresponding to the third logical channel request . 
Preferably, after the first remote presentation endpoint receives the second logical channel request sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending the first to the second remote presentation endpoint After the first remote presentation endpoint receives the fourth logical channel request sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint to the second remote The presentation endpoint sends a response message corresponding to the fourth logical channel request. Preferably, the remote presentation endpoint transmission capability set includes a capture parameter, wherein the capture parameter includes a universal parameter, a video parameter, and/or an audio parameter. Preferably, the universal parameters include media capture content, scene description, scene switching policy, general space information, and/or general encoding information; the media capturing content indicates a purpose of media capturing; and the scene description is used to provide an overall scenario. Preferably, the scenario switching policy is used to indicate a supported media switching policy, where the supported media switching policy includes a location switching policy and a partial switching policy, where the location switching policy is used to indicate simultaneous handover All captures are taken to ensure that the captures come together from the same endpoint location, which is used to indicate that different captures can be switched at different times and from the same and/or different remote presentation endpoints. Preferably, the general space information includes a scene area and/or an area scale parameter, where the scene area parameter is used to indicate a range of an overall scene related to the endpoint, and the area scale indicates a scale used by the spatial information parameter. kind. Preferably, the universal coding information includes all the maximum bandwidth, all the maximum number of pixels per second, and/or all the maximum number of macroblocks per second, wherein the total maximum bandwidth is used to indicate the preset type issued by the terminal. The maximum number of bitrates per second of all codestreams; the total number of pixels per second is used to represent all of the coding groups The maximum number of pixels per second independently coded; the total number of macroblocks per second represents the maximum number of macroblocks per second for all video streams sent by the endpoint. Preferably, the video parameters include: a video capture quantity, video capture space information, and/or video capture coding information; the video capture quantity is used to indicate the number of video captures. Advantageously, said video capture spatial information comprises a capture area, a capture point and/or a point on the capture line, wherein said capture area is for indicating a spatial location of the video capture in the overall captured scene; A capture point, used to indicate the location of the video capture in the captured scene; a point on the capture line that describes the spatial location of the second point on the optical axis of the capture device, and the first point is the capture point. 
Preferably, the video capture coding information includes a maximum video bandwidth, a maximum number of pixels per second, a width of a maximum video resolution, a height of a maximum video resolution, and/or a maximum video frame rate; wherein, the maximum video bandwidth, The maximum number of bits per second for indicating a single video encoding; the maximum number of pixels per second, the parameter is used to represent the maximum number of pixels per second for a single video encoding; the width of the maximum video resolution, the parameter is used to represent The width of the maximum video resolution in pixels; the height of the maximum video resolution, which is used to represent the height of the maximum video resolution in pixels; the maximum video frame rate, which indicates the maximum Video frame rate. Preferably, the audio parameters include an audio capture amount, audio capture space information, and/or audio capture coding information; the audio capture number is used to indicate the number of audio captures. Preferably, the audio capture space information includes: a capture area, and/or a capture point, where the capture area is used to indicate a spatial location where the audio capture is located in the overall captured scene; the capture point is used to indicate Captures the location of the audio capture in the scene. Preferably, the audio capture coding information includes: an audio channel format and/or a maximum audio bandwidth; the audio channel format is used to indicate an attribute of an audio channel; and the maximum audio bandwidth is used to indicate a single audio code per second. The maximum number of bits. According to another aspect of the present invention, a capability interaction device for remotely presenting an endpoint is provided, which is applied to a first remote presentation endpoint, and includes: an interaction module configured to perform capability interaction with a second remote presentation endpoint, where The capability interaction message carries a remote presentation endpoint capability set, where the capability interaction message carries the remote presentation endpoint transmission capability set; and the first receiving module is configured to receive the second remote presentation endpoint mode. a request message, the processing module, configured to open a logical channel between the first telepresence endpoint and the second telepresence endpoint according to the result of the capability interaction and the received mode request information.  
Preferably, the interaction module includes: a first sending module, configured to send a first capability set interaction request to the second remote presentation endpoint, where the first capability set interaction request carries the first remote Presenting a first set of transmission capabilities of the endpoint; the second receiving module is configured to receive the second capability set interaction request sent by the second remote presentation endpoint, where the second capability set interaction request carries the second remote presentation a second sending capability set of the endpoint; the third receiving module is configured to receive the first mode request message sent by the second remote rendering endpoint, where the first mode request message carries the first remote rendering endpoint The second sending module is configured to send the second mode request message to the second remote presentation endpoint, where the second mode request message carries the sending parameter of the second remote rendering endpoint; a third sending module, configured to send, according to a result of the mode request process corresponding to the second mode request message, to the second remote The presentation endpoint sends a first logical channel request, wherein the first logical channel request is used to request the first remote presentation endpoint to open forward logic between the first remote presentation endpoint and the second remote presentation endpoint a fourth receiving module, configured to receive a second logical channel request sent by the second remote rendering endpoint, where the second logical channel request is that the second remote rendering endpoint responds according to the first mode request The result of the mode request process is determined, the second logical channel requesting to request the second telepresence endpoint to open a forward logical channel between the second telepresence endpoint and the first telepresence endpoint. 
Preferably, the interaction module includes: a fourth sending module, configured to send a third capability set interaction request to the second remote presentation endpoint, where the third capability set interaction request carries the first remote Presenting a first set of transmission capabilities of the endpoint; the fifth receiving module is configured to receive the third mode request message sent by the second remote presentation endpoint, where the third mode request message carries the first remote presentation a sending parameter of the endpoint; a sixth receiving module, configured to receive a fourth capability set interaction request sent by the second remote presentation endpoint, where the fourth capability set interaction request carries the second remote presentation endpoint a sending capability set; a fifth sending module, configured to send the fourth mode request message to the second remote rendering endpoint, where the fourth mode request message carries the sending parameter of the second remote rendering endpoint;  a sixth sending module, configured to send a third logical channel request to the second remote rendering endpoint according to a result of the mode request process corresponding to the fourth mode request message, where the third logical channel request is used for the request The first remote presentation endpoint opens a forward logical channel between the first remote presentation endpoint and the second remote presentation endpoint; the seventh receiving module is configured to receive the fourth sent by the second remote presentation endpoint a logical channel request, where the fourth logical channel request is determined by the second remote presentation endpoint according to a result of a mode request process corresponding to the first mode request, where the fourth logical channel request is used to request the A second telepresence endpoint opens a forward logical channel between the second telepresence endpoint and the first telepresence endpoint. Preferably, the method further includes: an eighth receiving module, configured to receive, after the first sending module sends the first capability set interaction request to the second remote presentation endpoint, the second remote rendering endpoint sends the corresponding a response message corresponding to the capability set interaction request; the ninth receiving module is configured to: after the fourth sending module sends the third capability set interaction request to the second remote presentation endpoint, receive the correspondence sent by the second remote rendering endpoint Corresponding to the response message corresponding to the third capability set. Preferably, the method further includes: a seventh sending module, configured to send, after the second receiving module receives the second capability set interaction request sent by the second remote rendering endpoint, to the second remote rendering endpoint, corresponding to the first And the eighth sending module is configured to: after the fourth receiving module receives the fourth capability set interaction request sent by the second remote rendering endpoint, the method further includes: the first remote rendering endpoint Sending a response message corresponding to the fourth capability set interaction request to the second remote presentation endpoint. 
Preferably, the method further includes: a ninth sending module, configured to send, after the third receiving module receives the first mode request message sent by the second remote rendering endpoint, to the second remote rendering endpoint, corresponding to the a response message corresponding to the first mode request message;  a tenth sending module, configured to send, after the fifth receiving module receives the third mode request message sent by the second remote rendering endpoint, to the second remote rendering endpoint, corresponding to the third mode request message Response message. Preferably, the method further includes: an eleventh sending module, configured to send, after the second sending module sends the second mode request message to the second remote rendering endpoint, to the second remote rendering endpoint, corresponding to the a response message of the second mode request message; the twelfth sending module is configured to: after the fourth mode request message sent by the fifth sending module to the second remote presentation endpoint, to the second remote presentation endpoint A response message corresponding to the fourth mode request message is transmitted. Preferably, the method further includes: a tenth receiving module, configured to send, by the third sending module, a first logical channel request to the second remote rendering endpoint according to a result of a mode request process corresponding to the second mode request message After receiving the response message corresponding to the first logical channel request sent by the second remote presentation endpoint, the eleventh receiving module is configured to correspond to the second mode request message according to the second sending module. As a result of the mode request process, after sending the third logical channel request to the second remote presentation endpoint, receiving a response message corresponding to the third logical channel request sent by the second remote presentation endpoint. Preferably, the method further includes: a thirteenth sending module, configured to: after the first remote rendering endpoint of the fourth receiving module receives the second logical channel request sent by the second remote rendering endpoint, to the second The telepresence endpoint sends a response message corresponding to the second logical channel request; the fourteenth sending module is configured to: after the seventh receiving module receives the fourth logical channel request sent by the second remote rendering endpoint, The second telepresence endpoint sends a response message corresponding to the fourth logical channel request. According to another aspect of the present invention, a capability interaction apparatus for remotely presenting an endpoint is provided, comprising: according to another aspect of the present invention, a data stream is provided, comprising: a remote presentation endpoint capability set, wherein the remote The presentation endpoint capability set includes: a transmission capability set, where the transmission capability set includes: a capture parameter, where the capture parameter includes: a general parameter, a video parameter, and/or an audio parameter.  Preferably, the universal parameters include media capture content, scene description, scene switching policy, general space information, and/or general encoding information; the media capturing content indicates a purpose of media capturing; and the scene description is used to provide an overall scenario. 
Preferably, the scenario switching policy is used to indicate a supported media switching policy, where the supported media switching policy includes a location switching policy and a partial switching policy, where the location switching policy is used to indicate simultaneous handover All captures are taken to ensure that the captures come together from the same endpoint location, which is used to indicate that different captures can be switched at different times and from the same and/or different remote presentation endpoints. Preferably, the general space information includes a scene area and/or an area scale parameter, where the scene area parameter is used to indicate a range of an overall scene related to the endpoint, and the area scale indicates a scale used by the spatial information parameter. kind. Preferably, the universal coding information includes all the maximum bandwidth, all the maximum number of pixels per second, and/or all the maximum number of macroblocks per second, wherein the total maximum bandwidth is used to indicate the preset type issued by the terminal. The maximum number of bits per second of all code streams; the total number of pixels per second is used to represent the maximum number of pixels per second independently coded in the code group; the maximum number of macroblocks per second is represented by The maximum number of macroblocks per second for all video streams sent by the endpoint. Preferably, the video parameters include: a video capture quantity, video capture space information, and/or video capture coding information; the video capture quantity is used to indicate the number of video captures. Advantageously, said video capture spatial information comprises a capture area, a capture point and/or a point on the capture line, wherein said capture area is for indicating a spatial location of the video capture in the overall captured scene; A capture point, used to indicate the location of the video capture in the captured scene; a point on the capture line that describes the spatial location of the second point on the optical axis of the capture device, and the first point is the capture point. Preferably, the video capture coding information includes a maximum video bandwidth, a maximum number of pixels per second, a width of a maximum video resolution, a height of a maximum video resolution, and/or a maximum video frame rate; wherein, the maximum video bandwidth, The maximum number of bits per second for indicating a single video encoding; the maximum number of pixels per second, the parameter is used to represent the maximum number of pixels per second for a single video encoding; the width of the maximum video resolution, the parameter is used to represent The width of the maximum video resolution in pixels; the height of the maximum video resolution, which is used to represent the height of the maximum video resolution in pixels; the maximum video frame rate, which indicates the maximum Video frame rate. Preferably, the audio parameters include an audio capture amount, audio capture space information, and/or audio capture coding information; the audio capture number is used to indicate the number of audio captures.  Preferably, the audio capture space information includes: a capture area, and/or a capture point, where the capture area is used to indicate a spatial location where the audio capture is located in the overall captured scene; the capture point is used to indicate Captures the location of the audio capture in the scene. 
Preferably, the audio capture coding information includes: an audio channel format and/or a maximum audio bandwidth; the audio channel format is used to indicate an attribute of an audio channel; and the maximum audio bandwidth is used to indicate a single audio code per second. The maximum number of bits. Through the invention, the capability interaction of the telepresence terminal is realized, and the user experience is improved. BRIEF DESCRIPTION OF THE DRAWINGS The accompanying drawings, which are set to illustrate,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, In the drawings: FIG. 1 is a flowchart of a capability interaction method of a remote presentation endpoint according to an embodiment of the present invention; FIG. 2 is a structural block diagram of a capability interaction device for remotely presenting an endpoint according to an embodiment of the present invention; FIG. 4 is a flowchart 1 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention; FIG. 4 is a flowchart 1 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention; FIG. 6 is a flowchart 3 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention; FIG. 7 is a method for negotiating a remote presentation endpoint according to an embodiment of the present invention; FIG. 8 is a flowchart of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention; FIG. 9 is a flowchart 6 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention; FIG. FIG. 11 is a flowchart of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention; FIG. 11 is a flowchart 8 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention; 12 is a flowchart 9 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention; FIG. 13 is a flowchart 10 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention;  14 is a flowchart 11 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention; FIG. 15 is a flowchart 12 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention; Flowchart 13 of a method for negotiating a remote presentation endpoint of an embodiment; FIG. 17 is a schematic diagram of a remote presentation endpoint capability set in accordance with an embodiment of the present invention. BEST MODE FOR CARRYING OUT THE INVENTION Hereinafter, the present invention will be described in detail with reference to the accompanying drawings. It should be noted that the embodiments in the present application and the features in the embodiments may be combined with each other without conflict. The present embodiment provides a method for interacting with a remote presentation endpoint. FIG. 1 is a flowchart of a method for interacting with a remote presentation endpoint according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps S102 to S106. 
Step S102: Perform a capability interaction between the first remote presentation endpoint and the second remote presentation endpoint, where the capability interaction message carries a remote presentation endpoint capability set, where the capability interaction message carries the remote presentation endpoint transmission capability set . Step S104: The first telepresence endpoint receives a mode request message of the second telepresence endpoint. Step S106: The first telepresence endpoint opens a logical channel between the first telepresence endpoint and the second telepresence endpoint according to the result of the capability interaction and the received mode request information. As a preferred implementation, step S102 can be implemented in the following two manners. Method 1 The method includes the following sub-steps S1 to S11. Step S1: The first remote presentation endpoint sends a first capability set interaction request to the second remote presentation endpoint, where the first capability set interaction request carries the first transmission capability set of the first remote presentation endpoint. Step S3: The remote presentation endpoint receives the second capability set interaction request sent by the second remote presentation endpoint, where the second capability set interaction request carries the second transmission capability set of the second remote presentation endpoint; Step S5: The first remote presentation endpoint receives a first mode request message sent by the second remote presentation endpoint, where the first mode request message carries a sending parameter of the first remote presentation endpoint;  Step S7: The second mode request message sent by the first remote presentation endpoint to the second remote presentation endpoint, where the second mode request message carries the sending parameter of the second remote presentation endpoint; Step S9: the first remote presentation endpoint is configured according to As a result of the mode request process corresponding to the second mode request message, sending a first logical channel request to the second remote presentation endpoint, where the first logical channel request is used to request the first remote presentation endpoint to open the first remote presentation endpoint to the second Remotely presenting a forward logical channel between the endpoints; Step S11: The first remote presentation endpoint receives the second logical channel request sent by the second remote presentation endpoint, where the second logical channel request is the second remote presentation endpoint according to the first mode Requesting a result of the corresponding mode request procedure, the second logical channel request is for requesting the second telepresence endpoint to open the forward logical channel between the second telepresence endpoint and the first telepresence endpoint. Manner 2: The mode includes the following sub-steps S2 to S12. Step S2: The first remote presentation endpoint sends a third capability set interaction request to the second remote presentation endpoint, where the third capability set interaction request carries the first transmission capability set of the first remote presentation endpoint. 
Step S4: The first remote presentation endpoint receives the third mode request message sent by the second remote presentation endpoint, where the third mode request message carries the sending parameters of the first remote presentation endpoint; Step S6: the first remote presentation endpoint receives the fourth capability set interaction request sent by the second remote presentation endpoint, where the fourth capability set interaction request carries the second transmission capability set of the second remote presentation endpoint; Step S8: the first remote presentation endpoint sends a fourth mode request message to the second remote presentation endpoint, where the fourth mode request message carries the sending parameters of the second remote presentation endpoint; Step S10: the first remote presentation endpoint sends a third logical channel request to the second remote presentation endpoint according to the result of the mode request process corresponding to the fourth mode request message, where the third logical channel request is used to request the first remote presentation endpoint to open the forward logical channel between the first remote presentation endpoint and the second remote presentation endpoint; Step S12: the first remote presentation endpoint receives a fourth logical channel request sent by the second remote presentation endpoint, where the fourth logical channel request is determined by the second remote presentation endpoint according to the result of the mode request process corresponding to the first mode request, and the fourth logical channel request is used to request the second remote presentation endpoint to open the forward logical channel between the second remote presentation endpoint and the first remote presentation endpoint.  As a preferred implementation, after the first remote presentation endpoint sends the first capability set interaction request to the second remote presentation endpoint, the method further includes: receiving, by the first remote presentation endpoint, a response message corresponding to the first capability set interaction request sent by the second remote presentation endpoint. After the first remote presentation endpoint sends the third capability set interaction request to the second remote presentation endpoint, the method further includes: receiving, by the first remote presentation endpoint, a response message corresponding to the third capability set interaction request sent by the second remote presentation endpoint. As a preferred implementation, after the first remote presentation endpoint receives the second capability set interaction request sent by the second remote presentation endpoint, the method further includes: sending, by the first remote presentation endpoint, a response message corresponding to the second capability set interaction request to the second remote presentation endpoint. After the first remote presentation endpoint receives the fourth capability set interaction request sent by the second remote presentation endpoint, the method further includes: sending, by the first remote presentation endpoint, a response message corresponding to the fourth capability set interaction request to the second remote presentation endpoint.
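To make the message sequence described above more concrete, the following is a minimal sketch, in Python, of the Manner 1 exchange (steps S1 to S11: capability set interaction, mode request, and logical channel open). The class names, the dictionary-based message representation, the example parameter values, and the way a particular mode-request result is tied to a particular channel are purely illustrative assumptions and are not defined by the present application.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CapabilitySet:
    """Placeholder for a remote presentation endpoint transmission capability set."""
    parameters: dict = field(default_factory=dict)


@dataclass
class Endpoint:
    name: str
    tx_caps: CapabilitySet
    peer_tx_caps: Optional[CapabilitySet] = None
    forward_channel_open: bool = False

    # Capability set interaction (steps S1/S3): each side sends its own
    # transmission capability set and the peer stores what it received.
    def send_capability_request(self, peer: "Endpoint") -> None:
        peer.peer_tx_caps = self.tx_caps

    # Mode request (steps S5/S7): the requesting side asks the peer for the
    # sending parameters it wants, chosen from the peer's capability set.
    def send_mode_request(self, peer: "Endpoint", wanted: dict) -> dict:
        return {k: v for k, v in wanted.items() if k in peer.tx_caps.parameters}

    # Open logical channel (steps S9/S11): the sending side opens its forward
    # channel using the result of the mode request procedure.
    def open_forward_channel(self, granted: dict) -> None:
        self.forward_channel_open = bool(granted)


a = Endpoint("A", CapabilitySet({"video": "1080p60", "audio": "stereo"}))
b = Endpoint("B", CapabilitySet({"video": "720p30", "audio": "mono"}))

a.send_capability_request(b)                                 # S1: A -> B capability set
b.send_capability_request(a)                                 # S3: B -> A capability set
params_for_a = b.send_mode_request(a, {"video": "1080p60"})  # S5: B requests A's sending mode
params_for_b = a.send_mode_request(b, {"video": "720p30"})   # S7: A requests B's sending mode
a.open_forward_channel(params_for_a)                         # S9: A opens its forward channel A -> B
b.open_forward_channel(params_for_b)                         # S11: B opens its forward channel B -> A
print(a.forward_channel_open, b.forward_channel_open)
```

The sketch only mirrors the ordering of the messages; real endpoints would carry these exchanges over a control protocol and would negotiate far richer parameter sets than the single video entry shown here.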
As a preferred implementation, after the first remote presentation endpoint receives the first mode request message sent by the second remote presentation endpoint, the method further includes: sending, by the first remote presentation endpoint, the first mode request to the second remote presentation endpoint The response message corresponding to the message; after the first remote presentation endpoint receives the third mode request message sent by the second remote presentation endpoint, the method further includes: transmitting, by the first remote presentation endpoint, to the second remote presentation endpoint, corresponding to the third mode request message Response message. As a preferred implementation manner, after the first remote presentation endpoint sends the second mode request message to the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending the second mode request to the second remote presentation endpoint a response message of the message, after the first remote presentation endpoint sends the fourth mode request message to the second remote presentation endpoint, the method further comprising: the first remote presentation endpoint transmitting a response corresponding to the fourth mode request message to the second remote presentation endpoint Message. As a preferred implementation manner, after the first remote presentation endpoint sends the first logical channel request to the second remote presentation endpoint according to the result of the mode request process corresponding to the second mode request message, the method further includes: the first remote presentation endpoint Receiving, by the second telepresence endpoint, a response message corresponding to the first logical channel request;  After the first remote presentation endpoint sends the third logical channel request to the second remote presentation endpoint according to the result of the mode request process corresponding to the second mode request message, the method further includes: receiving, by the first remote presentation endpoint, the second remote presentation endpoint Corresponding to the third logical channel request corresponding response message. As a preferred implementation, after the first remote presentation endpoint receives the second logical channel request sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending the second logical channel request corresponding to the second remote presentation endpoint After the first remote presentation endpoint receives the fourth logical channel request sent by the second remote presentation endpoint, the method further includes: the first remote presentation endpoint sending a response message corresponding to the fourth logical channel request to the second remote presentation endpoint. As a preferred implementation manner, the telepresence endpoint transmission capability set includes a capture parameter, where the capture parameter includes a universal parameter, a video parameter, and/or an audio parameter. 
Preferably, the universal parameters include media capture content, scene description, scene switching policy, general space information and/or general encoding information; media capture content indicates use of media capture; scene description is used to provide description of the overall scene; preferably, the scene The switching policy is used to indicate the supported media switching policy, where the supported media switching policy includes a location switching policy and a partial switching policy, where the location switching policy is used to indicate that all the capturing is simultaneously switched to ensure that the capturing is from the same endpoint together A place, partial handoff strategy is used to indicate that different captures can be switched at different times and from the same and/or different remote presentation endpoints. Preferably, the general spatial information includes a scene area and/or an area scale parameter, wherein the scene area parameter is used to indicate a range of the overall scene associated with the endpoint, and the area scale indicates the type of scale used by the spatial information parameter. Preferably, the universal coding information includes all the maximum bandwidth, the total number of maximum pixels per second, and/or the maximum number of macroblocks per second, wherein all the maximum bandwidth is used to indicate all the code streams of the preset type sent by the terminal. The maximum number of bits per second; the maximum number of pixels per second is used to represent the maximum number of pixels per second independently coded in the code group; the maximum number of macroblocks per second represents the total number of video streams sent by the endpoint. The maximum number of macroblocks in seconds. Preferably, the video parameters include: video capture number, video capture spatial information, and/or video capture encoded information; the number of video captures is used to indicate the number of video captures. Preferably, the video capture spatial information comprises a capture area, a capture point and/or a point on the capture line, wherein the capture area is used to indicate a spatial location of the video capture in the overall captured scene; a capture point, used to indicate In the captured scene, the location of the video capture; the point on the capture line, describes the spatial location of the second point on the optical axis of the capture device, and the first point is the capture point. Preferably, the video captures the encoded information, including the maximum video bandwidth, the maximum number of pixels per second, the width of the maximum video resolution, the height of the maximum video resolution, and/or the maximum video frame rate; wherein, the maximum video bandwidth is used to indicate a single The maximum number of bits per second for video encoding; the maximum number of pixels per second, which is used to represent the maximum number of pixels per second for a single video encoding; the maximum video resolution width, which is used to represent the maximum video resolution in pixels. The width of the rate; the height of the maximum video resolution, which is used to indicate the height of the maximum video resolution in pixels; the maximum video frame rate, which indicates the maximum video frame rate. 
Preferably, the audio parameters include an audio capture amount, audio capture spatial information, and/or audio capture encoded information; the number of audio captures is used to indicate the number of audio captures. Preferably, the audio capture spatial information comprises: a capture area and/or a capture point, wherein the capture area is used to indicate a spatial location where the audio capture is located in the overall captured scene; and a capture point is used to represent the audio capture in the captured scene s position. Preferably, the audio capture encoded information comprises: an audio channel format and/or a maximum audio bandwidth; an audio channel format for indicating an attribute of the audio channel; a maximum audio bandwidth for indicating a maximum number of bits per second for a single audio encoding. It should be noted that the steps shown in the flowchart of the accompanying drawings may be performed in a computer system such as a set of computer executable instructions, and, although the logical order is shown in the flowchart, in some cases, The steps shown or described may be performed in an order different than that herein. In another embodiment, a capability interactive software for remotely presenting endpoints is provided for performing the technical solutions described in the above embodiments and preferred embodiments. In another embodiment, a storage medium is provided, the storage medium having the capability of interacting with the remote presentation endpoints, including but not limited to: an optical disk, a floppy disk, a hard disk, a rewritable memory, and the like. The embodiment of the present invention further provides a capability interaction device for a remote presentation endpoint, which is applicable to a first remote presentation endpoint, and the capability interaction device of the remote presentation endpoint may be used to implement the capability interaction method and a preferred implementation manner of the remote presentation endpoint. Having already explained, it will not be described again, and the modules involved in the capability interaction device of the remote presentation endpoint will be described below. As used below, the term "module" can achieve a predetermined function a combination of software and / or hardware. Although the systems and methods described in the following embodiments are preferably implemented in software, hardware, or a combination of software and hardware, is also possible and contemplated. 2 is a structural block diagram of a capability interaction device for remotely presenting an endpoint according to an embodiment of the present invention. As shown in FIG. 2, the device includes: an interaction module 22, a first receiving module 24, a processing module 26, and the foregoing structure. Carry out a detailed description. 
The interaction module 22 is configured to perform a capability interaction with the second remote presentation endpoint, where the capability interaction message carries a remote presentation endpoint capability set, where the capability interaction message carries the remote presentation endpoint transmission capability set; a receiving module 24, configured to receive a mode request message of the second telepresence endpoint; the processing module 26, connected to the interaction module 22 and the processing module 26, configured to open the first remote according to the result of the capability interaction and the received mode request information A logical channel between the endpoint and the second telepresence endpoint is rendered. FIG. 3 is a block diagram of a preferred structure of a device for remotely presenting an endpoint according to an embodiment of the present invention. As shown in FIG. 3, a preferred structure of the device is as follows: The interaction module 22 includes: a first sending module 220, configured to Sending a first capability set interaction request to the second remote presentation endpoint, where the first capability set interaction request carries a first transmission capability set of the first remote presentation endpoint; the second receiving module 221 is configured to receive the second remote presentation a second capability set interaction request sent by the endpoint, where the second capability set interaction request carries a second transmission capability set of the second remote presentation endpoint, and the third receiving module 222 is configured to receive the second remote presentation endpoint a mode request message, where the first mode request message carries the transmission parameter of the first remote presentation endpoint; the second sending module 223 is configured to send the second mode request message to the second remote presentation endpoint, where the second The mode request message carries the sending parameter of the second remote rendering endpoint; the third sending module 224, And sending a first logical channel request to the second remote rendering endpoint according to a result of the mode request process corresponding to the second mode request message, where the first logical channel request is used to request the first remote rendering endpoint to open the first remote rendering endpoint a second logical channel request sent by the second remote presentation endpoint, wherein the second logical channel request is a second remote presentation endpoint according to the forward logical channel between the second remote presentation endpoints The first mode request determines the result of the corresponding mode request process, and the second logical channel request is for requesting the second telepresence endpoint to open the forward logical channel between the second telepresence endpoint and the first telepresence endpoint.  
Preferably, the interaction module 22 includes: a fourth sending module 226, configured to send a third capability set interaction request to the second remote presentation endpoint, where the third capability set interaction request carries the first transmission of the first remote presentation endpoint The fifth receiving module 227 is configured to receive the third mode request message sent by the second remote presentation endpoint, where the third mode request message carries the sending parameter of the first remote rendering endpoint; the sixth receiving module 228, The fourth capability set interaction request sent by the second remote presentation endpoint is received, where the fourth capability set interaction request carries the second transmission capability set of the second remote presentation endpoint, and the fifth sending module 229 is set to the first a fourth mode request message sent by the remote presentation endpoint, where the fourth mode request message carries the sending parameter of the second remote presentation endpoint, and the sixth sending module 230 is configured to select the mode request process corresponding to the fourth mode request message. As a result, sending a third logical channel request to the second telepresence endpoint, where The third logical channel request is for requesting the first remote rendering endpoint to open the forward logical channel between the first remote rendering endpoint and the second remote rendering endpoint; the seventh receiving module 231 is configured to receive the second remote endpoint a fourth logical channel request, wherein the fourth logical channel request is determined by the second remote rendering endpoint according to a result of the mode request process corresponding to the first mode request, and the fourth logical channel request is used to request the second remote rendering endpoint to open the second remote A forward logical channel between the endpoint and the first telepresence endpoint is presented. Preferably, the foregoing apparatus further includes: an eighth receiving module 31, connected to the first sending module 220, after the first sending module sends the first capability set interaction request to the second remote rendering endpoint, receiving the second remote rendering endpoint sending Corresponding to the response message corresponding to the first capability set interaction request; the ninth receiving module 32 is connected to the fourth sending module 226, and after the fourth sending module 226 sends the third capability set interaction request to the second remote rendering endpoint, The response message corresponding to the third capability set interaction request sent by the remote presentation endpoint is sent. Preferably, the foregoing apparatus further includes: a seventh sending module 33, connected to the second receiving module 221, after the second receiving module 221 receives the second capability set interaction request sent by the second remote rendering endpoint, to the second remote presentation endpoint Sending a response message corresponding to the second capability set interaction request;  The eighth sending module 34 is connected to the sixth receiving module 228. 
After the sixth receiving module 228 receives the fourth capability set interaction request sent by the second remote rendering endpoint, the method further includes: the first remote rendering endpoint to the second remote rendering endpoint A response message corresponding to the fourth capability set interaction request is sent. Preferably, the foregoing apparatus further includes: a ninth sending module 35, connected to the third receiving module 222, after the third receiving module 222 receives the first mode request message sent by the second remote rendering endpoint, sending the message to the second remote rendering endpoint Corresponding to the response message corresponding to the first mode request message; the tenth sending module 36 is connected to the fifth receiving module 227, and after the fifth receiving module 227 receives the third mode request message sent by the second remote rendering endpoint, to the second The telepresence endpoint sends a response message corresponding to the third mode request message. Preferably, the foregoing apparatus further includes: an eleventh sending module 37, connected to the second sending module 223, configured to send to the second remote mode after the second sending module 223 sends the second mode request message to the second remote rendering endpoint The presentation endpoint sends a response message corresponding to the second mode request message; the twelfth sending module 38 is connected to the fifth sending module 229, and is set as the fourth mode request message sent by the fifth sending module 229 to the second remote presentation endpoint. Thereafter, a response message corresponding to the fourth mode request message is sent to the second remote presentation endpoint. Preferably, the foregoing apparatus further includes: a tenth receiving module 39, connected to the third sending module 224, configured to send the second remote presentation endpoint to the second remote sending module 224 according to a result of the mode request process corresponding to the second mode request message After the first logical channel request is sent, the response message corresponding to the first logical channel request sent by the second remote presentation endpoint is received; the eleventh receiving module 40 is connected to the sixth sending module 230, and is configured to be in the sixth sending module. 230. After transmitting the third logical channel request to the second remote presentation endpoint according to the result of the mode request process corresponding to the second mode request message, receiving a response message corresponding to the third logical channel request sent by the second remote presentation endpoint. The thirteenth sending module 41 is connected to the fourth receiving module 225, and the fourth receiving module is configured to: after the first remote rendering endpoint of the fourth receiving module 225 receives the second logical channel request sent by the second remote rendering endpoint, to the second The telepresence endpoint sends a response message corresponding to the second logical channel request;  The fourteenth sending module 42 is connected to the seventh receiving module 231, and configured to send the fourth logical channel request to the second remote rendering endpoint after the seventh receiving module 231 receives the fourth logical channel request sent by the second remote rendering endpoint Corresponding response message. 
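For readers who prefer code to block diagrams, the following is a minimal structural sketch of the three core modules of FIG. 2 (the interaction module 22, the first receiving module 24, and the processing module 26). The method names, the dictionary-based message representation, and the way the negotiation results are combined are hypothetical simplifications and are not part of the claimed apparatus.

```python
class InteractionModule:
    """Sketch of interaction module 22: performs the capability interaction."""

    def exchange(self, local_tx_caps: dict, peer_tx_caps: dict) -> dict:
        # Illustrative "result of the capability interaction": the parameters
        # that appear in both endpoints' capability sets.
        return {k: v for k, v in local_tx_caps.items() if k in peer_tx_caps}


class FirstReceivingModule:
    """Sketch of first receiving module 24: receives the peer's mode request."""

    def receive(self, mode_request: dict) -> dict:
        return dict(mode_request)


class ProcessingModule:
    """Sketch of processing module 26: opens the logical channel from both results."""

    def open_channel(self, capability_result: dict, mode_request: dict) -> dict:
        attributes = {**capability_result, **mode_request}
        return {"open": bool(attributes), "attributes": attributes}


# Hypothetical usage of the three modules together:
caps = InteractionModule().exchange({"video": "H.264", "audio": "AAC"},
                                    {"video": "H.264"})
mode = FirstReceivingModule().receive({"resolution": "720p"})
print(ProcessingModule().open_channel(caps, mode))
```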
The embodiment provides a data stream, a remote presentation endpoint capability set, where the remote presentation endpoint capability set includes: a transmission capability set, where the transmission capability set includes: a capture parameter, where the capture parameters include: a general parameter, a video parameter, and/or an audio parameter. . Preferably, the universal parameters include media capture content, scene description, scene switching policy, general space information, and/or general encoding information; media capture content indicates use of media capture, the attribute includes a media capture perspective, a role of the representation of the media, the media Whether it is auxiliary stream content, media related language; scene description is used to provide a description of the overall scene, such as a text description. Preferably, the scenario switching policy is used to indicate the supported media switching policy, where the supported media switching policy includes a location switching policy and a partial switching policy, where the location switching policy is used to indicate that all the capturing is simultaneously switched to ensure that the capturing is performed together. From the same endpoint location, a partial handover policy is used to indicate that different acquisitions can be switched at different times and from the same and/or different remote presentation endpoints. Preferably, the general spatial information includes a scene area and/or an area scale parameter, wherein the scene area parameter is used to indicate a range of the overall scene associated with the endpoint, and the area scale indicates the type of scale used by the spatial information parameter. Preferably, the universal coding information includes all the maximum bandwidth, the total number of maximum pixels per second, and/or the maximum number of macroblocks per second, wherein all the maximum bandwidth is used to indicate all the code streams of the preset type sent by the terminal. The maximum number of bits per second; the maximum number of pixels per second is used to represent the maximum number of pixels per second independently coded in the code group; the maximum number of macroblocks per second represents the total number of video streams sent by the endpoint. The maximum number of macroblocks in seconds. Preferably, the video parameters include: video capture number, video capture spatial information, and/or video capture encoded information; the number of video captures is used to indicate the number of video captures. Preferably, the video capture spatial information comprises a capture area, a capture point and/or a point on the capture line, wherein the capture area is used to indicate a spatial location of the video capture in the overall captured scene; a capture point, used to indicate In the captured scene, the location of the video capture; the point on the capture line, describes the spatial location of the second point on the optical axis of the capture device, and the first point is the capture point.  
Preferably, the video captures the encoded information, including the maximum video bandwidth, the maximum number of pixels per second, the width of the maximum video resolution, the height of the maximum video resolution, and/or the maximum video frame rate; wherein, the maximum video bandwidth is used to indicate a single The maximum number of bits per second for video encoding; the maximum number of pixels per second, which is used to represent the maximum number of pixels per second for a single video encoding; the maximum video resolution width, which is used to represent the maximum video resolution in pixels. The width of the rate; the height of the maximum video resolution, which is used to indicate the height of the maximum video resolution in pixels; the maximum video frame rate, which indicates the maximum video frame rate. Preferably, the audio parameters include an audio capture amount, audio capture spatial information, and/or audio capture encoded information; the number of audio captures is used to indicate the number of audio captures. Preferably, the audio capture spatial information comprises: a capture area and/or a capture point, wherein the capture area is used to indicate a spatial location where the audio capture is located in the overall captured scene; and a capture point is used to represent the audio capture in the captured scene s position. Preferably, the audio capture encoded information comprises: an audio channel format and/or a maximum audio bandwidth; an audio channel format for indicating an attribute of the audio channel; a maximum audio bandwidth for indicating a maximum number of bits per second for a single audio encoding. Preferably, the telepresence endpoint capability set comprises: a telepresence endpoint symmetric capability set, and the telepresence endpoint symmetric capability set comprises: capturing rendering parameters, and capturing rendering parameters comprises: general parameters, video parameters and/or audio parameters. Preferably, the universal parameters include media capture rendering content, scene description, scene switching strategy, general space information, and/or general encoding information; media capture rendering content represents media capture and/or rendering purposes, including media capture perspective, media The representation of the role, whether the media is the auxiliary stream content, the media related language; the scenario description is used to provide a description of the overall scenario; preferably, the scenario switching policy is used to indicate the supported media switching policy, wherein the supported media switching policy includes a place switching policy and a partial switching policy, wherein the place switching policy is used to indicate that all the captured renderings are simultaneously switched to ensure that the captured renderings come together from the same endpoint location, and the partial switching strategy is used to indicate that different capturing renderings can be switched at different times. And from the same and/or different telepresence endpoints. Preferably, the general spatial information includes a scene area and/or an area scale parameter, wherein the scene area parameter is used to indicate a range of the overall scene associated with the endpoint, and the area scale indicates the type of scale used by the spatial information parameter.  
Preferably, the universal coding information includes all the maximum bandwidth, all the maximum number of pixels per second and/or the maximum number of macroblocks per second, wherein all the maximum bandwidth is used to indicate the pre-send and/or received pre-received by the capture rendering endpoint. Set the maximum number of bitrates per second for all streams of the type; the maximum number of pixels per second is used to represent the maximum number of pixels per second independently encoded in the encoded group sent and/or received by the endpoint; all per second The maximum number of macroblocks represents the maximum number of macroblocks per second for all video streams sent and/or received by the endpoint. Preferably, the video parameters include: video capture rendering amount, video capture rendering space information, and/or video capture rendering encoding information; the number of video capture renderings is used to indicate the number of video captures and/or renderings. Preferably, the video capture rendering spatial information comprises capturing a rendering area, capturing a rendering point, and/or capturing a point on the rendering line, wherein the capturing rendering area is used to indicate where the video capture rendering is in the overall captured and/or rendered scene Spatial position; capture rendering point, used to indicate the location of video capture and/or rendering in the captured and/or rendered scene; capture points on the rendering line, describing the second on the optical axis of the capture and/or rendering device The spatial position of the points, and the first point is the capture and / or render point. Preferably, the video capture renders the encoded information, including the maximum video bandwidth, the maximum number of pixels per second, the width of the maximum video resolution, the height of the maximum video resolution, and/or the maximum video frame rate; wherein the maximum video bandwidth is used to indicate The maximum number of bits per second for a single video encoding; the maximum number of pixels per second, which is used to represent the maximum number of pixels per second for a single video encoding; the maximum video resolution width, which is used to represent the largest video in pixels. The width of the resolution; the height of the maximum video resolution, which is used to indicate the height of the maximum video resolution in pixels; the maximum video frame rate, which indicates the maximum video frame rate. Preferably, the audio parameters include an audio capture rendering amount, audio capture rendering space information, and/or audio capture rendering encoding information; an audio capture rendering number is used to indicate the number of audio capture renderings. Preferably, the audio capture rendering spatial information comprises: capturing a rendering area and/or capturing a rendering point, wherein the rendering area is used to represent an audio capture and/or rendering a spatial location where the overall captured and/or rendered scene is located; Render point, used to indicate the location of audio capture and/or rendering in the captured and/or rendered scene. Preferably, the audio capture rendering encoded information comprises: an audio channel format and/or a maximum audio bandwidth; an audio channel format for indicating an attribute of the audio channel; and a maximum audio bandwidth for indicating a maximum number of bits per second involved in a single audio encoding. 
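As an illustration only, the transmission capability set parameters enumerated above can be pictured as a nested data structure. The following Python sketch uses hypothetical field and class names; the application defines the parameters themselves, but no concrete encoding or naming, so none of the identifiers below should be read as normative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GeneralParameters:
    media_capture_content: Optional[str] = None    # purpose/attributes of the capture
    scene_description: Optional[str] = None        # text description of the overall scene
    scene_switching_policy: Optional[str] = None   # e.g. "location" or "partial" switching
    scene_area: Optional[tuple] = None             # range of the overall scene
    area_scale: Optional[str] = None               # type of scale used by spatial parameters
    max_total_bandwidth_bps: Optional[int] = None
    max_total_pixels_per_second: Optional[int] = None
    max_total_macroblocks_per_second: Optional[int] = None


@dataclass
class VideoCapture:
    capture_area: Optional[tuple] = None
    capture_point: Optional[tuple] = None
    point_on_capture_line: Optional[tuple] = None
    max_video_bandwidth_bps: Optional[int] = None
    max_pixels_per_second: Optional[int] = None
    max_width_px: Optional[int] = None
    max_height_px: Optional[int] = None
    max_frame_rate: Optional[float] = None


@dataclass
class AudioCapture:
    capture_area: Optional[tuple] = None
    capture_point: Optional[tuple] = None
    channel_format: Optional[str] = None           # e.g. "mono" or "stereo"
    max_audio_bandwidth_bps: Optional[int] = None


@dataclass
class TransmissionCapabilitySet:
    general: GeneralParameters = field(default_factory=GeneralParameters)
    video_captures: List[VideoCapture] = field(default_factory=list)
    audio_captures: List[AudioCapture] = field(default_factory=list)

    @property
    def video_capture_count(self) -> int:          # "number of video captures"
        return len(self.video_captures)


caps = TransmissionCapabilitySet()
caps.video_captures.append(VideoCapture(max_width_px=1920, max_height_px=1080,
                                        max_frame_rate=60.0))
print(caps.video_capture_count)
```

The symmetric and reception capability sets described in this embodiment could be modeled the same way, with rendering fields replacing or accompanying the capture fields.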
Preferably, the telepresence endpoint capability set includes: As a preferred implementation, the telepresence endpoint capability set includes: a telepresence endpoint reception capability set, including: a rendering parameter, where the rendering parameters include: a general parameter, a video parameter, and/or Audio parameters.  Preferably, the universal parameters include: media rendering content, scene description, scene switching strategy, general space information, and/or general encoding information, wherein the media rendering content is used to represent an attribute of the captured content required by the rendering endpoint, the attribute including media capture The perspective, the role of the media representation, whether the media is the auxiliary stream content, the media related language; the scenario description, used to provide a description of the overall scenario; the scenario switching policy, used to indicate the supported media switching policy, preferably, the scenario switching policy The method includes a location switching policy and/or a partial switching policy, where the location switching policy is to switch all the renderings at the same time, to ensure that the renderings come together from the same endpoint location, and the partial switching strategy switches for different renderings at different times, from the same and / or different endpoints. Preferably, the general spatial information includes: a scene area and/or an area scale parameter, wherein the scene area parameter is used to indicate a range of the overall scene related to the endpoint, and the area scale is used to indicate the type of the scale used by the spatial information parameter. Preferably, the universally encoded information includes all of the maximum bandwidth, all of the maximum number of pixels per second, and/or all of the maximum number of macroblocks per second, wherein all of the maximum bandwidth represents all of the preset types of streams received by the rendering endpoint. The maximum number of bits per second; the maximum number of pixels per second represents the maximum number of pixels processed per second independently encoded in the code group; all the maximum number of macroblocks per second represents all video streams received by the endpoint The maximum number of macroblocks per second. Preferably, the video parameters include: a number of video renderings, video rendering space information, and/or video rendering encoding information, where the number of video renderings is used to indicate the number of video renderings; the video rendering spatial information is used to indicate that the video rendering representation is a whole Render a portion of the scene. Preferably, the video rendering coding information includes a maximum video bandwidth, a maximum number of pixels per second, a width of a maximum video resolution, a height of a maximum video resolution, and/or a maximum video frame rate; wherein, a maximum video bandwidth, the parameter is used to represent The maximum number of bits per second for a single video encoding; the maximum number of pixels per second, which is used to represent the maximum number of pixels per second for a single video encoding; the maximum video resolution width, which is used to represent the largest video in pixels. 
The width of the resolution; the height of the maximum video resolution, which is used to indicate the height of the maximum video resolution in pixels; the maximum video frame rate, which is used to represent the maximum video frame rate. Preferably, the audio parameters include: an audio rendering amount, audio rendering space information, and/or audio rendering encoding information, where the audio rendering number is used to represent the number of audio renderings; the audio rendering space information is used to indicate that the audio rendering is in the overall rendering scene. The spatial information in which it is located.  Preferably, the audio rendering encoded information comprises: an audio channel format and/or a maximum audio bandwidth, wherein the audio channel format is used to represent an attribute of the audio channel; and the maximum audio bandwidth is used to represent a maximum number of bits per second for a single audio encoding. Preferred Embodiment 1 The preferred embodiment provides a capability negotiation method, which can ensure scene description, codec capability, and negotiation of multiplexing channels of multiple media between remote presentation endpoints. In this embodiment, a negotiation mode A is provided. FIG. 4 is a flowchart 1 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention. As shown in FIG. 4, the negotiation mode includes a capability set interaction and a logical channel open. section. The method includes the following steps S401 and S402: Step S401: Capability set interaction: capability interaction between two telepresence endpoints, carrying a telepresence endpoint capability set. Step S402: The logical channel is opened: Open a logical channel between two remote rendering endpoints, and specify the channel attribute after negotiation. The following description will be made with reference to examples. In this embodiment, a negotiation mode A-1 is provided. FIG. 5 is a flowchart 2 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention, as shown in FIG. 5. The capability set carried in the negotiation mode capability set interaction message is a receiving capability set, and the receiving capability set includes the endpoint receiving capability related parameter, and the following steps S501 and S502 are included. Step S501: capability set interaction: capability set interaction between two remote presentation endpoints, where the message carries a remote presentation endpoint reception capability set parameter. Step S502: Logical channel open: Open a logical channel between two remote rendering endpoints, and specify the channel attribute after negotiation. FIG. 6 is a flowchart 3 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention. The capability set interaction and the logical channel open procedure in the negotiation manner shown in FIG. 6 may include the following steps S601 to S608. Step S601: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the reception capability set of the endpoint A. 
Step S602: Endpoint B replies to the endpoint A capability set interaction response message;  Step S603: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, and the message carries the reception capability set of the endpoint B. Step S604: The endpoint A replies to the endpoint B capability set interaction response message; Step S605: The endpoint B according to the endpoint A The receiving capability, combined with its own sending capability, sends an Open Logical Channel Request message to Endpoint A. Specifying the channel attribute, requesting to open the forward logical channel of B to A; Step S606: Endpoint A replies to the Endpoint B to open the logical channel response message; Step S607: Endpoint A according to the receiving capability of the endpoint B, combined with its own sending capability, to the endpoint B sends an open logical channel request message. Specify the channel attribute, request to open the forward logical channel of A to B; Step S608: Endpoint B replies to Endpoint A to open the logical channel response message. One pair of request and response messages has a chronological order, and the first pair of capability interaction messages are sent before the first pair of logical channel open messages. According to this rule, the order of sending the messages described in FIG. 3 may also be 1, 3, 2, 4, 5, 6, 7, 8, and the endpoints A and B first transmit the response set after interacting with each other, or 1 After the 2, 5, 6, 3, 4, 7, and 8 capabilities are completed, first open the logical channel on one side. In the process of information interaction from endpoint A to endpoint B, as long as the length of the underlying packet is allowed, multiple messages sent from one endpoint may be carried at a time, such as 1, 2+3, 4, 5, 6+7, 8 . 2+3 indicates that end point B is sent to the endpoint A-strip information, and the interaction contains two messages and two messages. Preferred Embodiment 2 The preferred embodiment provides a capability negotiation method, which can ensure scene description, codec capability, and negotiation of multiplexing channels of multiple media between remote presentation endpoints. The preferred embodiment provides a negotiation mode A-2, and the capability set carried in the negotiation mode capability set interaction message is a symmetric capability set, and the symmetric capability set indicates that the receiving capability set and the transmission capability set of the endpoint are consistent. FIG. 7 is a flowchart 4 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention. As shown in FIG. 7, the method includes step S701 and step S702. Step S701: Capability set interaction: The capability set interaction is performed between two remote rendering endpoints, and the message carries the parameters of the remote rendering endpoint symmetric capability set. Step S702: The logical channel is opened: Open a logical channel between two remote rendering endpoints, and specify the channel attribute after negotiation.  FIG. 8 is a flowchart of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention. As shown in FIG. 8, the capability set interaction and the logical channel open procedure in the negotiation manner include the following steps S801 to S808. 
Step S801: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, where the message carries the symmetric capability set of endpoint A. Step S802: Endpoint B replies to endpoint A with a capability set interaction response message. Step S803: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, where the message carries the symmetric capability set of endpoint B. Step S804: Endpoint A replies to endpoint B with a capability set interaction response message. Step S805: Endpoint B, according to the symmetric capability of endpoint A and combined with its own capability, sends an open logical channel request message to endpoint A, specifying the channel attributes and requesting to open the forward logical channel from B to A. Step S806: Endpoint A replies to endpoint B with an open logical channel response message. Step S807: Endpoint A, according to the symmetric capability of endpoint B and combined with its own capability, sends an open logical channel request message to endpoint B, specifying the channel attributes and requesting to open the forward logical channel from A to B. Step S808: Endpoint B replies to endpoint A with an open logical channel response message. Each pair of request and response messages has a chronological order, and the first pair of capability interaction messages is sent before the first pair of logical channel open messages. According to this rule, the order in which the messages described in FIG. 5 are sent may also be 1, 3, 2, 4, 5, 6, 7, 8, where endpoints A and B first exchange capability sets with each other and then send the response messages, or 1, 2, 5, 6, 3, 4, 7, 8, where the logical channel on one side is opened first after part of the capability interaction has been completed. During one information exchange from endpoint A to endpoint B, as long as the underlying packet length allows, multiple messages sent from one endpoint may be carried at a time, for example 1, 2+3, 4, 5, 6+7, 8, where 2+3 indicates that endpoint B sends endpoint A one piece of information that contains the two messages 2 and 3. Preferred Embodiment 3 The preferred embodiment provides a capability negotiation method, which can ensure the scene description, codec capability, and multiplexing channel negotiation of multiple media streams between remote presentation endpoints. The preferred embodiment provides a negotiation mode A-3, in which the capability set carried in the capability set interaction message is a transmission capability set, and the transmission capability set includes endpoint transmission capability related parameters. FIG. 9 is flowchart 6 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention. As shown in FIG. 9, the method includes steps S901 and S902. Step S901: Capability set interaction: a capability set interaction is performed between two remote presentation endpoints, and the message carries the remote presentation endpoint transmission capability set parameters. Step S902: Logical channel open: open the logical channel between the two remote presentation endpoints, and specify the negotiated channel attributes. The following description will be made with reference to examples. Example 1: This example provides a negotiation mode A-3-1. FIG.
10 is a flowchart 7 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention. As shown in FIG. 10, the method is in a capability interaction response message. Returning the selected parameters includes the following steps S101 to S108. Step S101: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the transmission capability set of the endpoint A. Step S102: The endpoint B replies to the endpoint A capability set interaction response message, and carries the transmission capability of the B from the A. Step S103: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, and the message carries the transmission capability set of the endpoint B. Step S104: The endpoint A replies to the endpoint B capability set interaction response message, carrying the A slave B. The sending capability concentrates the selected parameters; Step S105: Endpoint A sends an Open Logical Channel Request message to Endpoint B according to the capability of Endpoint B to select in the capability set interaction response. Specifying the channel attribute, requesting to open the forward logical channel of A to B; Step S106: Endpoint B replies to Endpoint A to open the logical channel response message; Step S107: Endpoint B according to the capability of Endpoint A in the capability set interactive response selection, to Endpoint A Send Open logical channel request message. Specify the channel attribute, request to open the forward logical channel of B to A; Step S108: Endpoint A replies to Endpoint B with the open logical channel response message.  One pair of request and response messages has a chronological order, and the first pair of capability interaction messages are sent before the first pair of logical channel open messages. According to this rule, the order of sending the messages described in FIG. 7 may also be 1, 3, 2, 4, 5, 6, 7, 8, and the endpoints A and B first transmit the response set and then send the response message, or 1 After the 2, 5, 6, 3, 4, 7, and 8 capabilities are completed, first open the logical channel on one side. In the process of information interaction from endpoint A to endpoint B, as long as the length of the underlying packet is allowed, multiple messages sent from one endpoint may be carried at a time, such as 1, 2+3, 4, 5, 6+7, 8 . 2+3 indicates that end point B is sent to the endpoint A-strip information, and the interaction contains two messages and two messages. Example 2: This example provides negotiation mode A-3-2. As shown in FIG. 11, the mode does not carry the selection parameter in the capability set interaction response message, and the reverse logical channel is requested to be opened in the open logical channel request. The following steps S111 to S118 are included. Step S111: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the transmission capability set of the endpoint A. Step S112: the endpoint B replies to the endpoint A capability set interaction response message; Step S113: remotely renders the endpoint B direction The remote presentation endpoint A initiates the capability set interaction request, and the message carries the transmission capability set of the endpoint B. 
Step S114: The endpoint A replies to the endpoint B capability set interaction response message; Step S115: the endpoint B combines its own reception according to the sending capability of the endpoint A. Capabilities, sends an open logical channel request message to endpoint A. Specifying the channel attribute, requesting to open the reverse logical channel of B to A; Step S116: Endpoint A replies to the Endpoint B to open the logical channel response message; Step S117: Endpoint A according to the sending capability of the endpoint B, combined with its own receiving capability, to the endpoint B sends an open logical channel request message. Specify the channel attribute, request to open the reverse logical channel from A to B; Step S118: Endpoint B replies to Endpoint A to open the logical channel response message. One pair of request and response messages has a chronological order, and the first pair of capability interaction messages are sent before the first pair of logical channel open messages. According to this rule, the order of sending the messages described in FIG. 8 may also be 1, 3, 2, 4, 5, 6, 7, 8, and the endpoints A and B first transmit the response set after interacting with each other, or 1 After the 2, 5, 6, 3, 4, 7, and 8 capabilities are completed, first open the logical channel on one side.  In the process of information interaction from endpoint A to endpoint B, as long as the length of the underlying packet is allowed, multiple messages sent from one endpoint may be carried at a time, such as 1, 2+3, 4, 5, 6+7, 8 . 2+3 indicates that end point B is sent to the endpoint A-strip information, and the interaction contains two messages and two messages. Preferred Embodiment 4 The preferred embodiment provides a capability negotiation method, which can ensure scene description, codec capability and negotiation of multiplexing channels of multiple media between remote presentation endpoints. The preferred embodiment provides a negotiation mode A-4, and the capability set carried in the negotiation mode capability set interaction message is a reception capability set and a transmission capability set. The receiving capability set includes endpoint receiving capability related parameters, and the sending capability set includes endpoint sending capability related parameters. As shown in FIG. 12, the following steps S1201 and S1202 are included. Step S1201: Capability set interaction: The capability set interaction is performed between two telepresence endpoints, and the message carries the telepresence endpoint receiving capability set and the sending capability set parameter. Step S1202: Logical channel open: Open the logical channel between two telepresence endpoints, and specify the channel properties after negotiation. The following is an example to illustrate. Instance 1 This example describes the negotiation mode A-4-1. As shown in FIG. 13, the negotiation mode carries the receiving capability set and the sending capability set simultaneously in the capability set interaction, and includes the following steps S1301 to S1308. Step S1301: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B, and the message carries the reception capability set and the transmission capability set of the endpoint A. 
Step S1302: The endpoint B replies to the endpoint A capability set interaction response message; Step S1303: Remote The presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A, and the message carries the reception capability set and the transmission capability set of the endpoint B. Step S1304: The endpoint A replies to the endpoint B capability set interaction response message; Step S1305: The endpoint B according to the endpoint A The ability to, in conjunction with its own capabilities, sends an Open Logical Channel Request message to Endpoint A. Specify the channel attribute, request to open the forward logical channel of B to A; Step S1306: Endpoint A replies to Endpoint B to open the logical channel response message;  Step S1307: Endpoint A sends an Open Logical Channel Request message to Endpoint B according to the capabilities of Endpoint B and its own capabilities. Specify the channel attribute, request to open the forward logical channel from A to B; Step S1308: Endpoint B replies to Endpoint A to open the logical channel response message. One pair of request and response messages has a chronological order, and the first pair of capability interaction messages are sent before the first pair of logical channel open messages. According to this rule, the order of sending the messages described in FIG. 10 may also be 1, 3, 2, 4, 5, 6, 7, 8, and the endpoints A and B first interact with each other and then open the logical channels on both sides. Or 1, 2, 5, 6, 3, 4, 7, 8, after the ability to complete a part of the interaction, first open the logical channel on one side. In the process of information interaction from endpoint A to endpoint B, as long as the length of the underlying packet is allowed, multiple messages sent from one endpoint may be carried at a time, such as 1, 2+3, 4, 5, 6+7, 8 . 2+3 indicates that end point B is sent to the endpoint A-strip information, and the interaction contains two messages and two messages. Example 2 FIG. 14 is a flowchart 11 of a method for negotiating a remote presentation endpoint according to an embodiment of the present invention. In this embodiment, a negotiation mode A-4-2 is described. As shown in FIG. 14, the mode is in a capability set. The receiving capability set as the receiving end in the interaction request message is formed by the transmission capability set as the transmitting end, and includes the following steps S1401 to S1412.
Example 2: FIG. 14 is flowchart eleven of a negotiation method for a remote presentation endpoint according to an embodiment of the present invention and describes negotiation mode A-4-2. As shown in FIG. 14, in this mode the receiving capability set that an endpoint, as receiving end, carries in its capability set interaction request message is formed according to the transmission capability set received from the transmitting end. The mode includes the following steps S1401 to S1412. S1401: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B; the message carries the transmission capability set of endpoint A.
S1402: Endpoint B replies to endpoint A with a capability set interaction response message.
S1403: Based on the sending capability of endpoint A and its own receiving needs, endpoint B initiates a capability interaction request message to endpoint A; the message carries the receiving capability set of endpoint B.
S1404: Endpoint A replies to endpoint B with a capability set interaction response message.
S1405: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A; the message carries the transmission capability set of endpoint B.
S1406: Endpoint A replies to endpoint B with a capability set interaction response message. S1407: Based on the sending capability of endpoint B and its own receiving needs, endpoint A initiates a capability interaction request message to endpoint B; the message carries the receiving capability set of endpoint A. S1408: Endpoint B replies to endpoint A with a capability set interaction response message.
S1409: Endpoint A sends an open logical channel request message to endpoint B according to the receiving capability set of endpoint B, specifying the channel attributes and requesting to open the forward logical channel from A to B.
S1410: Endpoint B replies to endpoint A with an open logical channel response message. S1411: Endpoint B sends an open logical channel request message to endpoint A according to the receiving capability set of endpoint A, specifying the channel attributes and requesting to open the forward logical channel from B to A.
S1412: Endpoint A replies to endpoint B with an open logical channel response message. Each pair of request and response messages has a chronological order, and the first two pairs of sending-capability and receiving-capability interaction messages are sent before the first pair of logical channel open messages. Under this rule, the messages described in FIG. 11 may also be sent in the order 1, 3, 5, 7, 2, 4, 6, 8, in which endpoints A and B first exchange capability sets with each other and then send the response messages, or in the order 1, 2, 3, 4, 9, 10, 5, 6, 7, 8, 11, 12, in which the logical channel on one side is opened first after part of the capability interaction has been completed. During one information exchange from endpoint A to endpoint B, as long as the length of the underlying packet allows, several messages sent from one endpoint may be carried at a time, for example 1, 2+3, 4, 5, 6+7, 8, 9, 10+11, 12, where 2+3 indicates that endpoint B sends endpoint A a single message that contains both message 2 and message 3.
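In mode A-4-2, the receiving side builds its receiving capability set, and later the attributes of the channels it requests, from the peer's transmission capability set combined with its own receiving needs. A minimal sketch of that intersection step is given below; the field names (stream_id, codec, max_bandwidth) are illustrative assumptions and not the parameter names used by this application.

    from dataclasses import dataclass

    @dataclass
    class StreamCapability:
        stream_id: str       # e.g. "VR0"
        codec: str           # e.g. "H.264"
        max_bandwidth: int   # in kbit/s

    def derive_receive_set(peer_send_set, local_decoders, local_max_bandwidth):
        """Keep only the peer's advertised streams that the local endpoint can
        decode, clamping each stream's bandwidth to the local limit."""
        accepted = []
        for cap in peer_send_set:
            if cap.codec in local_decoders:
                accepted.append(StreamCapability(
                    stream_id=cap.stream_id,
                    codec=cap.codec,
                    max_bandwidth=min(cap.max_bandwidth, local_max_bandwidth),
                ))
        return accepted

    if __name__ == "__main__":
        peer = [StreamCapability("VR0", "H.264", 4000),
                StreamCapability("AR0", "G.711", 128)]
        print(derive_receive_set(peer, local_decoders={"H.264", "G.711"},
                                 local_max_bandwidth=2000))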
Preferred Embodiment 5: This preferred embodiment provides a capability negotiation method that can ensure the negotiation of scene descriptions, codec capabilities, and multiplexed channels for multiple media streams between remote presentation endpoints. In this embodiment, the capability set carried in the capability set interaction message of the negotiation mode is a transmission capability set. As shown in FIG. 15, the mode comprises the following three steps. Step S1501: Capability set interaction: a capability set interaction is performed between the two remote presentation endpoints, and the messages carry the telepresence endpoint receiving capability set and sending capability set parameters. Step S1502: Mode request: the receiving end requests a specific transmission mode according to the sending capability set of the transmitting end. Step S1503: Logical channel open: the logical channels between the two telepresence endpoints are opened, and the negotiated channel attributes are specified. This is described in detail below by way of example. Example 1: FIG. 16 is flowchart thirteen of a negotiation method for a remote presentation endpoint according to an embodiment of the present invention. As shown in FIG. 16, the method includes the following steps S1601 to S1612. Step S1601: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B; the message carries the transmission capability set of endpoint A. Step S1602: Endpoint B replies to endpoint A with a capability set interaction response message. Step S1603: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A; the message carries the transmission capability set of endpoint B. Step S1604: Endpoint A replies to endpoint B with a capability set interaction response message. Step S1605: The remote presentation endpoint B initiates a mode request message to the remote presentation endpoint A, requesting a specific transmission mode; the request carries the transmission parameters for endpoint A. Step S1606: Endpoint A replies to endpoint B with a mode request response message. Step S1607: The remote presentation endpoint A initiates a mode request message to the remote presentation endpoint B, requesting a specific transmission mode; the request carries the transmission parameters for endpoint B. Step S1608: Endpoint B replies to endpoint A with a mode request response message. Step S1609: Endpoint A sends an open logical channel request message to endpoint B according to the result of the mode request process, specifying the channel attributes and requesting to open the forward logical channel from A to B. Step S1610: Endpoint B replies to endpoint A with an open logical channel response message. Step S1611: Endpoint B sends an open logical channel request message to endpoint A according to the result of the mode request process, specifying the channel attributes and requesting to open the forward logical channel from B to A. Step S1612: Endpoint A replies to endpoint B with an open logical channel response message. Preferably, each pair of request and response messages has a chronological order: the first pair of capability interaction messages is sent before the first pair of mode request messages, and the first pair of mode request messages is sent before the first pair of logical channel open messages. Under this rule, the messages described in FIG. 13 may also be sent in the order 1, 2, 5, 6, 3, 4, 7, 8, 9, 10, 11, 12, or 1, 2, 5, 6, 9, 10, 3, 4, 7, 8, 11, 12. During one information exchange from endpoint A to endpoint B, as long as the length of the underlying packet allows, several messages sent from one endpoint may be carried at a time, for example 1, 2+3, 4, 5, 6+7, 8, 9, 10+11, 12, where 2+3 indicates that endpoint B sends endpoint A a single message that contains both message 2 and message 3.
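The three phases of this negotiation mode (capability set interaction, then mode request, then logical channel open) can be summarised as a small driver. The sketch below only models the ordering constraint described above; the phase names are placeholders and nothing here reflects the actual message contents.

    from enum import Enum, auto

    class Phase(Enum):
        CAPABILITY_EXCHANGE = auto()
        MODE_REQUEST = auto()
        OPEN_LOGICAL_CHANNEL = auto()
        ESTABLISHED = auto()

    # Allowed forward transitions: capabilities first, then the mode request,
    # then the logical channel open, mirroring the ordering rule above.
    _NEXT = {
        Phase.CAPABILITY_EXCHANGE: Phase.MODE_REQUEST,
        Phase.MODE_REQUEST: Phase.OPEN_LOGICAL_CHANNEL,
        Phase.OPEN_LOGICAL_CHANNEL: Phase.ESTABLISHED,
    }

    class NegotiationSession:
        def __init__(self):
            self.phase = Phase.CAPABILITY_EXCHANGE

        def complete_phase(self, phase):
            if phase is not self.phase:
                raise RuntimeError(f"{phase.name} completed out of order")
            self.phase = _NEXT[self.phase]

    if __name__ == "__main__":
        session = NegotiationSession()
        for step in (Phase.CAPABILITY_EXCHANGE, Phase.MODE_REQUEST,
                     Phase.OPEN_LOGICAL_CHANNEL):
            session.complete_phase(step)
        assert session.phase is Phase.ESTABLISHED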
Preferred Embodiment 6: This preferred embodiment provides remote presentation endpoint capability set parameters. FIG. 17 is a schematic diagram of a remote presentation endpoint capability set according to an embodiment of the present invention; as shown in FIG. 17, the capability set is divided into a transmission capability set and a reception capability set, described in detail below. It should be noted that the parameters listed in the figure and in the above embodiments are only the more important parameters and do not represent all of the parameters. The parameters carried in a capability interaction message need not include all of the parameters below; they can be combined as actually required. The transmission capability set mainly includes capture-related parameters, and the capture-related parameters include general parameters, video parameters, and audio parameters. Other parameters in the transmission capability set can be used to carry coding standards of the related art, such as
H.263 and H.264. Preferably, the general parameters describe scene-related parameters and general coding-related parameters. The scene-related parameters include a scene description, a scene switching policy, a scene area, and metric information; the general coding-related parameters include a maximum bandwidth, a maximum number of macroblocks per second, and a coding standard. Preferably, the video parameters describe the attributes of each independent video that makes up the captured scene. The video parameters mainly include video capture spatial information, video capture coding information, the number of video captures, video content attributes, video switching policies, video combination policies, and other parameters. The video capture spatial information includes a capture area, a capture point, or a point on a capture line; the video capture coding information includes the maximum video bandwidth of the capture, the maximum number of macroblocks per second, the maximum video resolution width, the maximum video resolution height, and the maximum video frame rate. Preferably, the audio parameters describe the attributes of each independent audio that makes up the captured scene. The audio parameters mainly include audio capture spatial information, audio capture coding information, the number of audio captures, and other parameters. The audio capture spatial information includes a capture area, a capture point, or a point on a capture line; the audio capture coding information includes the audio channel format of the capture and the maximum audio bandwidth. Preferably, the receiving capability set corresponds to the sending capability set, and the rendering-related parameters correspond to the capture-related parameters. Preferably, the receiving capability set mainly includes rendering-related parameters, and the rendering-related parameters include general parameters, video parameters, and audio parameters. Other parameters in the receiving capability set can be used to carry decoding standards such as H.263 and H.264. Preferably, the general parameters describe scene-related parameters and general decoding-related parameters. The scene-related parameters include a scene description, a scene switching policy, a scene area, and metric information; the general decoding-related parameters include a maximum bandwidth, a maximum number of macroblocks per second, and a decoding standard. Preferably, the video parameters describe the attributes of each independent video that makes up the rendered scene.
The video parameters mainly include video rendering spatial information, video rendering decoding information, the number of video renders, content attributes, automatic switching policies, combination policies, and other parameters. The video rendering spatial information includes a rendering area, a rendering point, or a point on a rendering line; the video rendering decoding information includes the maximum video bandwidth of the render, the maximum number of macroblocks per second, the maximum video resolution width, the maximum video resolution height, and the maximum video frame rate. Preferably, the audio parameters describe the attributes of each independent audio that makes up the rendered scene. The audio parameters mainly include audio rendering spatial information, audio rendering decoding information, the number of audio renders, and other parameters. The audio rendering spatial information includes a rendering area, a rendering point, or a point on a rendering line; the audio rendering decoding information includes the audio channel format of the render and the maximum audio bandwidth. Preferably, the logical channel attributes specified by a remote presentation endpoint mainly include the media-related information and codec information of the streams the channel is to carry; if the logical channel is to be multiplexed, channel multiplexing information also needs to be specified. Preferred Embodiment 7: This embodiment is one of the preferred implementations of negotiation mode A-1 and describes a specific capability negotiation procedure between the remote presentation endpoint A and the remote presentation endpoint B, where endpoint A has 3 cameras, 3 displays, 1 microphone, and 1 speaker, and endpoint B has 3 cameras, 3 displays, 1 microphone, and 1 speaker. Endpoint A and/or endpoint B may also be an MCU device. The method includes the following steps S1702 to S1716. Step S1702: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B; the message carries the reception capability set of endpoint A. The video parameters in the rendering-related parameters of the reception capability set are: the video rendering spatial information with rendering identifier VR0 is left, the video content attribute is main video, and the maximum video bandwidth in the video rendering decoding information is 4M; the video rendering spatial information with rendering identifier VR1 is middle, the video content attribute is main video, and the maximum video bandwidth in the video rendering decoding information is 4M; the video rendering spatial information with rendering identifier VR2 is right, the video content attribute is main video, and the maximum video bandwidth in the video rendering decoding information is 4M; the video rendering spatial information with rendering identifier VR3 is left, the video content attribute is loudest speaker, the maximum video bandwidth in the video rendering decoding information is 4M, and the automatic switching attribute is YES;
the video rendering spatial information with rendering identifier VR4 is middle, the video content attribute is panoramic video, and the maximum video bandwidth in the video rendering decoding information is 4M; the video rendering spatial information with rendering identifier VR5 is right, the video content attribute is VIP, and the maximum video bandwidth in the video rendering decoding information is 4M. The audio parameter in the rendering-related parameters of the reception capability set is: the audio content with rendering identifier AR0 is main audio, the maximum bandwidth in the audio rendering decoding information is 128K, and the audio channel format is stereo. The general parameters in the rendering-related parameters of the reception capability set are: the scene with scene identifier 1 consists of VR0, VR1, and VR2, the scene description is rendering left, middle, and right video, the codec standard is H.264, and the maximum bandwidth is 12M; the scene with scene identifier 2 consists of VR3, VR4, and VR5, the scene description is rendering loudest speaker, panorama, and VIP video, the codec standard is H.264, and the maximum bandwidth is 12M; the scene with scene identifier 3 consists of AR0, the scene description is rendering main audio, the codec standard is G.711, and the maximum bandwidth is 128K. Step S1704: Endpoint B replies to endpoint A with a capability set interaction response message. Step S1706: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A; the message carries the reception capability set of endpoint B: the video rendering spatial information with rendering identifier VR0 is left, the video content attribute is main video, and the maximum video bandwidth in the video rendering decoding information is 4M; the video rendering spatial information with rendering identifier VR1 is middle, the video content attribute is main video, and the maximum video bandwidth in the video rendering decoding information is 4M; the video rendering spatial information with rendering identifier VR2 is right, the video content attribute is main video, and the maximum video bandwidth in the video rendering decoding information is 4M; the audio content with rendering identifier AR0 is main audio, the maximum bandwidth in the audio rendering decoding information is 128K, and the audio channel format is stereo. The general parameters in the rendering-related parameters of the reception capability set are: the scene with scene identifier 1 consists of VR0, VR1, and VR2, the scene description is rendering left, middle, and right video, the codec standard is H.264, and the maximum bandwidth is 12M; the scene with scene identifier 2 consists of AR0, the scene description is rendering main audio, the codec standard is G.711, and the maximum bandwidth is 128K. Step S1708: Endpoint A replies to endpoint B with a capability set interaction response message. Step S1710: Based on the receiving capability of endpoint A and its own sending capability, endpoint B decides to send scene 1 and scene 3 of the reception capability set of endpoint A.
Endpoint B sends an open logical channel request message to endpoint A, specifying the channel attributes and requesting to open the forward logical channel from B to A. The specified logical channel attributes are: one video logical channel carries three RTP video streams, and one audio logical channel carries one RTP audio stream. The media-related information of the video channel is that it sends the VR0, VR1, and VR2 video; the coding-related information is H.264 encoding with a maximum bandwidth of 12M; the multiplexing information is that the RTP header extension identifier values corresponding to VR0, VR1, and VR2 are 0, 1, and 2 respectively, used to distinguish the different RTP streams within the same logical channel. The media-related information of the audio channel is that it sends the AR0 audio; the coding-related information is G.711 with a maximum bandwidth of 128K. Step S1712: Endpoint A replies to endpoint B with an open logical channel response message. Step S1714: Based on the receiving capability of endpoint B and its own sending capability, endpoint A decides to send scene 1 and scene 2 of the reception capability set of endpoint B. Endpoint A sends an open logical channel request message to endpoint B, specifying the channel attributes and requesting to open the forward logical channel from A to B. The specified logical channel attributes are: one video logical channel carries three RTP video streams, and one audio logical channel carries one RTP audio stream. The media-related information of the video channel is that it sends the VR0, VR1, and VR2 video; the coding-related information is H.264 encoding with a maximum bandwidth of 12M; the multiplexing information is that the RTP header extension identifier values corresponding to VR0, VR1, and VR2 are 0, 1, and 2 respectively, used to distinguish the different RTP streams within the same logical channel. The media-related information of the audio channel is that it sends the AR0 audio; the coding-related information is G.711 with a maximum bandwidth of 128K. Step S1716: Endpoint B replies to endpoint A with an open logical channel response message. Preferred Embodiment 8: This preferred embodiment is one of the preferred implementations of negotiation mode A-1 and describes a specific capability negotiation procedure between the remote presentation endpoint A and the remote presentation endpoint B. Endpoint A and endpoint B are telepresence video conferencing endpoints; endpoint A has 3 cameras, 3 displays, 1 microphone, and 1 speaker, and endpoint B has 1 camera, 1 display, 1 microphone, and 1 speaker. Endpoint A and/or endpoint B may also be an MCU device. The method includes the following steps S1802 to S1816. Step S1802: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B; the message carries the reception capability set of endpoint A. Preferably, the video parameters in the rendering-related parameters of the reception capability set are: the video rendering spatial information with rendering identifier VR0 is left, the video content attribute is main video, and the maximum video bandwidth in the video rendering decoding information is 4M; the video rendering spatial information with rendering identifier VR1 is middle, the video content attribute is main video, and the maximum video bandwidth in the video rendering decoding information is 4M;
the video rendering spatial information with rendering identifier VR2 is right, the video content attribute is main video, and the maximum video bandwidth in the video rendering decoding information is 4M; the video rendering spatial information with rendering identifier VR3 is middle, the video content attribute is panoramic video, and the maximum video bandwidth in the video rendering decoding information is 4M. Preferably, the audio parameter in the rendering-related parameters of the reception capability set is: the audio content with rendering identifier AR0 is main audio, the maximum bandwidth in the audio rendering decoding information is 128K, and the audio channel format is stereo. The general parameters in the rendering-related parameters of the reception capability set are: the scene with scene identifier 1 consists of VR0, VR1, and VR2, the scene description is rendering left, middle, and right video, the codec standard is H.264, and the maximum bandwidth is 12M; the scene with scene identifier 2 consists of VR3, the scene description is rendering panoramic video, the codec standard is H.264, and the maximum bandwidth is 4M; the scene with scene identifier 3 consists of AR0, the scene description is rendering main audio, the codec standard is G.711, and the maximum bandwidth is 128K. Step S1804: Endpoint B replies to endpoint A with a capability set interaction response message. Step S1806: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A; the message carries the reception capability set of endpoint B: the video rendering with rendering identifier VR0 is described as rendering panoramic video, the codec standard is H.264, and the maximum bandwidth is 4M.
The audio content with rendering identifier AR0 is main audio, the maximum bandwidth in the audio rendering decoding information is 128K, and the audio channel format is stereo. The general parameters in the rendering-related parameters of the reception capability set are: the scene with scene identifier 1 consists of VR0, the scene description is rendering panoramic video, the codec standard is H.264, and the maximum bandwidth is 4M; the scene with scene identifier 2 consists of AR0, the scene description is rendering main audio, the codec standard is G.711, and the maximum bandwidth is 128K. Step S1808: Endpoint A replies to endpoint B with a capability set interaction response message. Step S1810: Based on the receiving capability of endpoint A and its own sending capability, endpoint B decides to send scene 2 and scene 3 of the reception capability set of endpoint A. Endpoint B sends an open logical channel request message to endpoint A, specifying the channel attributes and requesting to open the forward logical channel from B to A. The specified logical channel attributes are: one video logical channel carries one RTP video stream, and one audio logical channel carries one RTP audio stream. The media-related information of the video channel is that it sends the VR3 video; the coding-related information is H.264 encoding with a maximum bandwidth of 4M; there is no multiplexing information. The media-related information of the audio channel is that it sends the AR0 audio; the coding-related information is G.711 with a maximum bandwidth of 128K. Step S1812: Endpoint A replies to endpoint B with an open logical channel response message. Step S1814: Based on the receiving capability of endpoint B and its own sending capability, endpoint A decides to send scene 1 and scene 2 of the reception capability set of endpoint B. Endpoint A sends an open logical channel request message to endpoint B, specifying the channel attributes and requesting to open the forward logical channel from A to B. The specified logical channel attributes are: one video logical channel carries one RTP video stream, and one audio logical channel carries one RTP audio stream. The media-related information of the video channel is that it sends the VR0 video; the coding-related information is H.264 encoding with a maximum bandwidth of 12M; there is no multiplexing information. The media-related information of the audio channel is that it sends the AR0 audio; the coding-related information is G.711 with a maximum bandwidth of 128K. Step S1816: Endpoint B replies to endpoint A with an open logical channel response message.
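The concrete capability sets exchanged in Preferred Embodiments 7 and 8 group individual video and audio renders into scenes, each render carrying a spatial position, a content attribute, and a maximum bandwidth. A rough data-model sketch of such a reception capability set is given below; the class and field names are illustrative assumptions, not the encoding used by this application, and the position values are placeholders.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Render:
        render_id: str        # e.g. "VR0" or "AR0"
        kind: str             # "video" or "audio"
        position: str         # e.g. "left", "middle", "right"
        content: str          # e.g. "main video", "panorama", "main audio"
        max_bandwidth_kbps: int

    @dataclass
    class Scene:
        scene_id: int
        description: str
        codec: str            # e.g. "H.264" or "G.711"
        max_bandwidth_kbps: int
        renders: List[Render] = field(default_factory=list)

    # A reception capability set loosely modelled on endpoint B in Preferred
    # Embodiment 8; the render positions are assumptions for illustration.
    endpoint_b_receive_set = [
        Scene(1, "render panoramic video", "H.264", 4000,
              [Render("VR0", "video", "middle", "panorama", 4000)]),
        Scene(2, "render main audio", "G.711", 128,
              [Render("AR0", "audio", "middle", "main audio", 128)]),
    ]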
Preferred Embodiment 9: This preferred embodiment is one of the preferred implementations of negotiation mode A-2 and describes a specific capability negotiation procedure between the remote presentation endpoint A and the remote presentation endpoint B. Endpoint A and endpoint B are telepresence video conferencing endpoints; endpoint A has 3 cameras, 3 displays, 1 microphone, and 1 speaker, and endpoint B has 3 cameras, 3 displays, 1 microphone, and 1 speaker. Endpoint A and/or endpoint B may also be an MCU device. The preferred embodiment includes the following steps S1902 to S1916. Step S1902: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B; the message carries the symmetric capability set of endpoint A. The video parameters in the rendering/capture-related parameters of the symmetric capability set are: the video rendering spatial information with rendering/capture identifier VR0 is left, the video content attribute is main video, and the maximum video bandwidth in the video rendering/capture codec information is 4M; the video rendering spatial information with rendering/capture identifier VR1 is middle, the video content attribute is main video, and the maximum video bandwidth in the video rendering/capture codec information is 4M; the video rendering spatial information with rendering/capture identifier VR2 is right, the video content attribute is main video, and the maximum video bandwidth in the video rendering/capture codec information is 4M. The audio parameter in the rendering/capture-related parameters is: the audio content with rendering/capture identifier AR0 is main audio, the maximum bandwidth in the audio rendering/capture codec information is 128K, and the audio channel format is stereo. The general parameters in the rendering/capture-related parameters of the symmetric capability set are: the scene with scene identifier 1 consists of VR0, VR1, and VR2, the scene description is rendering left, middle, and right video, the codec standard is H.264, and the maximum bandwidth is 12M; the scene with scene identifier 2 consists of VR3, the scene description is rendering panoramic video, the codec standard is H.264, and the maximum bandwidth is 4M; the scene with scene identifier 3 consists of AR0, the scene description is rendering main audio, the codec standard is G.711, and the maximum bandwidth is 128K. Step S1904: Endpoint B replies to endpoint A with a capability set interaction response message. Step S1906: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A; the message carries the symmetric capability set of endpoint B: the video rendering/capture spatial information with rendering/capture identifier VR0 is left, the video content attribute is main video, and the maximum video bandwidth in the video rendering/capture codec information is 4M; the video rendering/capture spatial information with rendering/capture identifier VR1 is middle, the video content attribute is main video, and the maximum video bandwidth in the video rendering/capture codec information is 4M; the video rendering/capture spatial information with rendering/capture identifier VR2 is right, the video content attribute is main video, and the maximum video bandwidth in the video rendering/capture codec information is 4M; the audio content with rendering/capture identifier AR0 is main audio, the maximum bandwidth in the audio rendering/capture codec information is 128K, and the audio channel format is stereo. The general parameters in the rendering/capture-related parameters are: the scene with scene identifier 1 consists of VR0, VR1, and VR2, the scene description is rendering/capturing left, middle, and right video, the codec standard is H.264, and the maximum bandwidth is 12M.
The scene with scene identifier 2 consists of AR0, the scene description is rendering/capturing main audio, the codec standard is G.711, and the maximum bandwidth is 128K. Step S1908: Endpoint A replies to endpoint B with a capability set interaction response message. Step S1910: Based on the symmetric capability of endpoint A and its own capabilities, endpoint B decides to send scene 1 and scene 3 of the reception capability set of endpoint A. Endpoint B sends an open logical channel request message to endpoint A, specifying the channel attributes and requesting to open the forward logical channel from B to A. The specified logical channel attributes are: one video logical channel carries three RTP video streams, and one audio logical channel carries one RTP audio stream. The media-related information of the video channel is that it sends the VR0, VR1, and VR2 video; the coding-related information is H.264 encoding with a maximum bandwidth of 12M; the multiplexing information is that the RTP header extension identifier values corresponding to VR0, VR1, and VR2 are 0, 1, and 2 respectively, used to distinguish the different RTP streams within the same logical channel. The media-related information of the audio channel is that it sends the AR0 audio; the coding-related information is G.711 with a maximum bandwidth of 128K. Step S1912: Endpoint A replies to endpoint B with an open logical channel response message. Step S1914: Based on the symmetric capability of endpoint B and its own sending capability, endpoint A decides to send scene 1 and scene 2 of the reception capability set of endpoint B. Endpoint A sends an open logical channel request message to endpoint B, specifying the channel attributes and requesting to open the forward logical channel from A to B. The specified logical channel attributes are: one video logical channel carries three RTP video streams, and one audio logical channel carries one RTP audio stream. The media-related information of the video channel is that it sends the VR0, VR1, and VR2 video; the coding-related information is H.264 encoding with a maximum bandwidth of 12M; the multiplexing information is that the RTP header extension identifier values corresponding to VR0, VR1, and VR2 are 0, 1, and 2 respectively, used to distinguish the different RTP streams within the same logical channel. The media-related information of the audio channel is that it sends the AR0 audio; the coding-related information is G.711 with a maximum bandwidth of 128K. Step S1916: Endpoint B replies to endpoint A with an open logical channel response message.
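Several of the channel attributes above multiplex three RTP video streams into a single logical channel and distinguish them by RTP header extension identifier values 0, 1, and 2 for VR0, VR1, and VR2. The sketch below shows one way a receiver might demultiplex on such an identifier; the packet representation with an "ext_id" field is an assumption for illustration, not the actual wire format.

    # Map from RTP header-extension identifier value to the render it carries,
    # as specified in the open logical channel request above.
    STREAM_BY_EXTENSION_ID = {0: "VR0", 1: "VR1", 2: "VR2"}

    def demultiplex(packets):
        """Group packets of one logical channel by the stream identifier carried
        in each packet's (illustrative) 'ext_id' field."""
        streams = {name: [] for name in STREAM_BY_EXTENSION_ID.values()}
        for pkt in packets:
            name = STREAM_BY_EXTENSION_ID.get(pkt["ext_id"])
            if name is not None:
                streams[name].append(pkt["payload"])
        return streams

    if __name__ == "__main__":
        channel = [{"ext_id": 0, "payload": b"left frame"},
                   {"ext_id": 2, "payload": b"right frame"},
                   {"ext_id": 1, "payload": b"middle frame"}]
        assert demultiplex(channel)["VR2"] == [b"right frame"]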
Preferred Embodiment 10: This preferred embodiment is one of the preferred implementations of negotiation mode A-3-1 and describes a specific capability negotiation procedure between the remote presentation endpoint A and the remote presentation endpoint B. Endpoint A and endpoint B are telepresence video conferencing endpoints; endpoint A has 3 cameras, 3 displays, 1 microphone, and 1 speaker, and endpoint B has 1 camera, 1 display, 1 microphone, and 1 speaker. Endpoint A and/or endpoint B may also be an MCU device. The preferred embodiment includes the following steps S2002 to S2016. Step S2002: The remote presentation endpoint A initiates a capability set interaction request to the remote presentation endpoint B; the message carries the transmission capability set of endpoint A. The video parameters in the capture-related parameters of the transmission capability set are: the video capture spatial information with capture identifier VR0 is left, the video content attribute is main video, and the maximum video bandwidth in the video capture coding information is 4M; the video capture spatial information with capture identifier VR1 is middle, the video content attribute is main video, and the maximum video bandwidth in the video capture coding information is 4M; the video capture spatial information with capture identifier VR2 is right, the video content attribute is main video, and the maximum video bandwidth in the video capture coding information is 4M; the video content attribute of the capture with capture identifier VR3 is panorama, and the maximum video bandwidth in the video capture coding information is 4M.
The audio parameter in the capture-related parameters is: the audio content with capture identifier AR0 is main audio, the maximum bandwidth in the audio capture coding information is 128K, and the audio channel format is stereo. The general parameters in the capture-related parameters of the transmission capability set are: the scene with scene identifier 1 consists of VR0, VR1, and VR2, the scene description is capturing left, middle, and right video, the coding standard is H.264, and the maximum bandwidth is 12M; the scene with scene identifier 2 consists of VR3, the scene description is capturing panoramic video, the coding standard is H.264, and the maximum bandwidth is 4M; the scene with scene identifier 3 consists of AR0, the scene description is capturing main audio, the coding standard is G.711, and the maximum bandwidth is 128K.
Step S2004: Endpoint B replies to endpoint A with a capability set interaction response message, carrying the parameters that B has selected from the transmission capability set of A: the media in scene 2 and scene 3. Step S2006: The remote presentation endpoint B initiates a capability set interaction request to the remote presentation endpoint A; the message carries the transmission capability set of endpoint B. The video parameter in the capture-related parameters of the transmission capability set is: the capture with capture identifier VR0 is described as capturing panoramic video, the coding standard is H.264, and the maximum bandwidth is 4M; the audio content with capture identifier AR0 is main audio, the maximum bandwidth in the audio capture coding information is 128K, and the audio channel format is stereo. The general parameters in the capture-related parameters of the transmission capability set are: the scene with scene identifier 1 consists of VR0, the scene description is capturing panoramic video, the coding standard is H.264, and the maximum bandwidth is 4M; the scene with scene identifier 2 consists of AR0, the scene description is capturing main audio, the coding standard is G.711, and the maximum bandwidth is 128K. Step S2008: Endpoint A replies to endpoint B with a capability set interaction response message, carrying the parameters that A has selected from the transmission capability set of B: the media in scene 1 and scene 2. Step S2010: According to the capabilities selected by endpoint B in the capability set interaction response, endpoint A sends an open logical channel request message to endpoint B, specifying the channel attributes and requesting to open the forward logical channel from A to B. The specified channel attributes are: one video logical channel carries one RTP video stream, and one audio logical channel carries one RTP audio stream. The media-related information of the video channel is that it sends the VR3 video; the coding-related information is H.264 encoding with a maximum bandwidth of 4M; there is no multiplexing information. The media-related information of the audio channel is that it sends the AR0 audio; the coding-related information is G.711 with a maximum bandwidth of 128K. Step S2012: Endpoint B replies to endpoint A with an open logical channel response message. Step S2014: According to the capabilities selected by endpoint A in the capability set interaction response, endpoint B sends an open logical channel request message to endpoint A, specifying the channel attributes and requesting to open the forward logical channel from B to A: one video logical channel carries one RTP video stream, and one audio logical channel carries one RTP audio stream. The media-related information of the video channel is that it sends the VR0 video; the coding-related information is H.264 encoding with a maximum bandwidth of 12M; there is no multiplexing information. The media-related information of the audio channel is that it sends the AR0 audio; the coding-related information is G.711 with a maximum bandwidth of 128K. Step S2016: Endpoint A replies to endpoint B with an open logical channel response message. It should be noted that corresponding embodiments of the other methods are similar to the embodiments described above.
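In this last embodiment the responder picks whole scenes out of the offerer's transmission capability set and returns that selection in the capability set interaction response; logical channels are then opened only for the selected scenes. A compact sketch of such a selection step follows; the scene representation and the bandwidth-budget policy are illustrative assumptions, not the selection rule defined by this application.

    def select_scenes(offered_scenes, wanted_content, bandwidth_budget_kbps):
        """Choose scenes from the peer's transmission capability set whose
        description matches the wanted content, within a bandwidth budget."""
        chosen, remaining = [], bandwidth_budget_kbps
        for scene_id, (description, bandwidth) in sorted(offered_scenes.items()):
            if description in wanted_content and bandwidth <= remaining:
                chosen.append(scene_id)
                remaining -= bandwidth
        return chosen

    if __name__ == "__main__":
        # Offer loosely modelled on endpoint A in the embodiment above:
        # scene 1 = left/middle/right video, scene 2 = panorama, scene 3 = main audio.
        offer = {1: ("capture left, middle and right video", 12000),
                 2: ("capture panoramic video", 4000),
                 3: ("capture main audio", 128)}
        print(select_scenes(offer,
                            {"capture panoramic video", "capture main audio"},
                            bandwidth_budget_kbps=8000))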
Obviously, those skilled in the art should understand that the above modules or steps of the present invention may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device, and in some cases the steps shown or described may be performed in an order different from that given herein; alternatively, they may be fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software. The above is only the preferred embodiments of the present invention and is not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included within the protection scope of the present invention. INDUSTRIAL APPLICABILITY: The technical solutions provided by the embodiments of the present invention can be applied to the field of communications, implement capability interaction between remote presentation terminals, and improve the user experience.

Claims

1. A capability interaction method for a remote presentation endpoint, comprising:
performing a capability interaction between a first remote presentation endpoint and a second remote presentation endpoint, wherein the message of the capability interaction carries a remote presentation endpoint capability set, and wherein the message of the capability interaction carries a transmission capability set of the remote presentation endpoint;
the first remote presentation endpoint receiving a mode request message from the second remote presentation endpoint; and the first remote presentation endpoint opening a logical channel between the first remote presentation endpoint and the second remote presentation endpoint according to a result of the capability interaction and the received mode request information.
2. The method according to claim 1, wherein performing the capability interaction between the first remote presentation endpoint and the second remote presentation endpoint comprises:
the first remote presentation endpoint sending a first capability set interaction request to the second remote presentation endpoint, wherein the first capability set interaction request carries a first transmission capability set of the first remote presentation endpoint;
the first remote presentation endpoint receiving a second capability set interaction request sent by the second remote presentation endpoint, wherein the second capability set interaction request carries a second transmission capability set of the second remote presentation endpoint;
the first remote presentation endpoint receiving a first mode request message sent by the second remote presentation endpoint, wherein the first mode request message carries transmission parameters of the first remote presentation endpoint; the first remote presentation endpoint sending a second mode request message to the second remote presentation endpoint, wherein the second mode request message carries transmission parameters of the second remote presentation endpoint; the first remote presentation endpoint sending, according to a result of the mode request process corresponding to the second mode request message, a first logical channel request to the second remote presentation endpoint, wherein the first logical channel request is used to request the first remote presentation endpoint to open a forward logical channel from the first remote presentation endpoint to the second remote presentation endpoint;
the first remote presentation endpoint receiving a second logical channel request sent by the second remote presentation endpoint, wherein the second logical channel request is determined by the second remote presentation endpoint according to a result of the mode request process corresponding to the first mode request message, and the second logical channel request is used to request the second remote presentation endpoint to open a forward logical channel from the second remote presentation endpoint to the first remote presentation endpoint.
3. The method according to claim 1, wherein performing the capability interaction between the first remote presentation endpoint and the second remote presentation endpoint comprises:
the first remote presentation endpoint sending a third capability set interaction request to the second remote presentation endpoint, wherein the third capability set interaction request carries the first transmission capability set of the first remote presentation endpoint;
the first remote presentation endpoint receiving a third mode request message sent by the second remote presentation endpoint, wherein the third mode request message carries transmission parameters of the first remote presentation endpoint; the first remote presentation endpoint receiving a fourth capability set interaction request sent by the second remote presentation endpoint, wherein the fourth capability set interaction request carries the second transmission capability set of the second remote presentation endpoint;
the first remote presentation endpoint sending a fourth mode request message to the second remote presentation endpoint, wherein the fourth mode request message carries transmission parameters of the second remote presentation endpoint; the first remote presentation endpoint sending, according to a result of the mode request process corresponding to the fourth mode request message, a third logical channel request to the second remote presentation endpoint, wherein the third logical channel request is used to request the first remote presentation endpoint to open a forward logical channel from the first remote presentation endpoint to the second remote presentation endpoint;
the first remote presentation endpoint receiving a fourth logical channel request sent by the second remote presentation endpoint, wherein the fourth logical channel request is determined by the second remote presentation endpoint according to a result of the mode request process corresponding to the first mode request message, and the fourth logical channel request is used to request the second remote presentation endpoint to open a forward logical channel from the second remote presentation endpoint to the first remote presentation endpoint.
4. The method according to claim 2 or 3, wherein after the first remote presentation endpoint sends the first capability set interaction request to the second remote presentation endpoint, the method further comprises: the first remote presentation endpoint receiving a response message, sent by the second remote presentation endpoint, corresponding to the first capability set interaction request;

and after the first remote presentation endpoint sends the third capability set interaction request to the second remote presentation endpoint, the method further comprises: the first remote presentation endpoint receiving a response message, sent by the second remote presentation endpoint, corresponding to the third capability set interaction request.

5. The method according to claim 2 or 3, wherein after the first remote presentation endpoint receives the second capability set interaction request sent by the second remote presentation endpoint, the method further comprises: the first remote presentation endpoint sending, to the second remote presentation endpoint, a response message corresponding to the second capability set interaction request;

and after the first remote presentation endpoint receives the fourth capability set interaction request sent by the second remote presentation endpoint, the method further comprises: the first remote presentation endpoint sending, to the second remote presentation endpoint, a response message corresponding to the fourth capability set interaction request.
6. The method according to claim 2 or 3, wherein after the first remote presentation endpoint receives the first mode request message sent by the second remote presentation endpoint, the method further comprises: the first remote presentation endpoint sending, to the second remote presentation endpoint, a response message corresponding to the first mode request message;

and after the first remote presentation endpoint receives the third mode request message sent by the second remote presentation endpoint, the method further comprises: the first remote presentation endpoint sending, to the second remote presentation endpoint, a response message corresponding to the third mode request message.
7. The method according to claim 2 or 3, wherein after the first remote presentation endpoint sends the second mode request message to the second remote presentation endpoint, the method further comprises: the first remote presentation endpoint sending, to the second remote presentation endpoint, a response message corresponding to the second mode request message;

and after the first remote presentation endpoint sends the fourth mode request message to the second remote presentation endpoint, the method further comprises: the first remote presentation endpoint sending, to the second remote presentation endpoint, a response message corresponding to the fourth mode request message.
8. The method according to claim 2 or 3, wherein after the first remote presentation endpoint sends, according to the result of the mode request procedure corresponding to the second mode request message, the first logical channel request to the second remote presentation endpoint, the method further comprises: the first remote presentation endpoint receiving a response message, sent by the second remote presentation endpoint, corresponding to the first logical channel request;

and after the first remote presentation endpoint sends, according to the result of the mode request procedure corresponding to the fourth mode request message, the third logical channel request to the second remote presentation endpoint, the method further comprises: the first remote presentation endpoint receiving a response message, sent by the second remote presentation endpoint, corresponding to the third logical channel request.
9. The method according to claim 2 or 3, wherein after the first remote presentation endpoint receives the second logical channel request sent by the second remote presentation endpoint, the method further comprises: the first remote presentation endpoint sending, to the second remote presentation endpoint, a response message corresponding to the second logical channel request;

and after the first remote presentation endpoint receives the fourth logical channel request sent by the second remote presentation endpoint, the method further comprises: the first remote presentation endpoint sending, to the second remote presentation endpoint, a response message corresponding to the fourth logical channel request.
10. The method according to any one of claims 1 to 3, wherein the remote presentation endpoint transmission capability set comprises capture parameters, and the capture parameters comprise general parameters, video parameters and/or audio parameters.
11. The method according to claim 10, wherein the general parameters comprise media capture content, a scene description, a scene switching policy, general spatial information and/or general encoding information, wherein the media capture content indicates the purpose of a media capture, and the scene description is used to provide a description of the overall scene.
12. The method according to claim 11, wherein the scene switching policy is used to indicate the supported media switching policies, wherein the supported media switching policies comprise a site switching policy and a partial switching policy, wherein the site switching policy is used to indicate that all captures are switched simultaneously, so as to ensure that the captures come from the same endpoint site, and the partial switching policy is used to indicate that different captures can be switched at different times and may come from the same and/or different remote presentation endpoints.
13. The method according to claim 11, wherein the general spatial information comprises a scene area parameter and/or an area scale parameter, wherein the scene area parameter is used to indicate the extent of the overall scene associated with an endpoint, and the area scale parameter indicates the kind of scale used by the spatial information parameters.
14. The method according to claim 11, wherein the general encoding information comprises an overall maximum bandwidth, an overall maximum number of pixels per second and/or an overall maximum number of macroblocks per second, wherein the overall maximum bandwidth is used to indicate the maximum number of bits per second of all streams of a preset type sent by the endpoint, the overall maximum number of pixels per second is used to indicate the maximum number of pixels per second of all independent encodings in an encoding group, and the overall maximum number of macroblocks per second indicates the maximum number of macroblocks per second of all video streams sent by the endpoint.
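As a worked illustration of how the aggregate limits of claim 14 might be checked in practice, the sketch below compares a proposed set of outgoing video streams against an overall maximum bandwidth, pixel rate and macroblock rate. The GeneralEncodingInfo and VideoStreamPlan names, the 16x16-pixel macroblock assumption and all numeric values are assumptions made for this example, not values taken from the application.

```python
from dataclasses import dataclass

@dataclass
class GeneralEncodingInfo:              # hypothetical container for the claim-14 parameters
    max_total_bandwidth_bps: int        # overall maximum bandwidth, bits per second
    max_total_pixels_per_sec: int       # overall maximum number of pixels per second
    max_total_macroblocks_per_sec: int  # overall maximum number of macroblocks per second

@dataclass
class VideoStreamPlan:
    bandwidth_bps: int
    width: int
    height: int
    frame_rate: float

def fits_aggregate_limits(streams, limits):
    """Return True if the combined streams stay within the advertised aggregate limits."""
    total_bw = sum(s.bandwidth_bps for s in streams)
    total_px = sum(s.width * s.height * s.frame_rate for s in streams)
    # A macroblock is taken here as a 16x16 pixel block (an assumption of this sketch).
    total_mb = sum((s.width // 16) * (s.height // 16) * s.frame_rate for s in streams)
    return (total_bw <= limits.max_total_bandwidth_bps
            and total_px <= limits.max_total_pixels_per_sec
            and total_mb <= limits.max_total_macroblocks_per_sec)

# Example: three 1080p30 streams checked against assumed limits.
limits = GeneralEncodingInfo(12_000_000, 200_000_000, 800_000)
plan = [VideoStreamPlan(4_000_000, 1920, 1080, 30.0)] * 3
print(fits_aggregate_limits(plan, limits))   # True under these assumed numbers
```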
15. The method according to claim 10, wherein the video parameters comprise: a video capture quantity, video capture spatial information and/or video capture encoding information, wherein the video capture quantity is used to indicate the number of video captures.
16. The method according to claim 15, wherein the video capture spatial information comprises a capture area, a capture point and/or a point on a capture line, wherein the capture area is used to indicate the spatial position of the video capture within the overall capture scene, the capture point is used to indicate the position of the video capture in the capture scene, and the point on the capture line describes the spatial position of a second point on the optical axis of the capture device, the first point being the capture point.
17. The method according to claim 15, wherein the video capture encoding information comprises a maximum video bandwidth, a maximum number of pixels per second, a maximum video resolution width, a maximum video resolution height and/or a maximum video frame rate, wherein the maximum video bandwidth is used to indicate the maximum number of bits per second of a single video encoding, the maximum number of pixels per second is used to indicate the maximum number of pixels per second of a single video encoding, the maximum video resolution width is used to indicate the width of the maximum video resolution in pixels, the maximum video resolution height is used to indicate the height of the maximum video resolution in pixels, and the maximum video frame rate indicates the maximum video frame rate.
18. The method according to claim 10, wherein the audio parameters comprise an audio capture quantity, audio capture spatial information and/or audio capture encoding information, wherein the audio capture quantity is used to indicate the number of audio captures.
19. The method according to claim 18, wherein the audio capture spatial information comprises: a capture area and/or a capture point, wherein the capture area is used to indicate the spatial position of the audio capture within the overall capture scene, and the capture point is used to indicate the position of the audio capture in the capture scene.
20. The method according to claim 18, wherein the audio capture encoding information comprises: an audio channel format and/or a maximum audio bandwidth, wherein the audio channel format is used to indicate an attribute of an audio channel, and the maximum audio bandwidth is used to indicate the maximum number of bits per second of a single audio encoding.
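The parameter hierarchy recited in claims 10 to 20 can be pictured as the nested structure sketched below. The class and field names are illustrative only; the claims do not prescribe any particular data layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GeneralParameters:
    media_capture_content: str            # purpose of the media capture
    scene_description: str                # description of the overall scene
    scene_switching_policy: str           # "site" (all captures switch together) or "partial"
    scene_area: Optional[dict] = None     # extent of the overall scene associated with the endpoint
    area_scale: Optional[str] = None      # kind of scale used by the spatial parameters
    encoding_info: Optional[dict] = None  # overall bandwidth / pixel / macroblock limits

@dataclass
class VideoCapture:
    capture_area: dict                    # spatial position within the overall capture scene
    capture_point: dict                   # position of the capture in the scene
    point_on_capture_line: dict           # second point on the capture device's optical axis
    encoding_info: dict                   # max bandwidth, pixels/s, resolution, frame rate

@dataclass
class AudioCapture:
    capture_area: dict
    capture_point: dict
    channel_format: str                   # attribute of the audio channel, e.g. "mono" or "stereo"
    max_audio_bandwidth_bps: int          # max bits per second of a single audio encoding

@dataclass
class CaptureParameters:
    general: GeneralParameters
    video_captures: List[VideoCapture] = field(default_factory=list)   # length = video capture quantity
    audio_captures: List[AudioCapture] = field(default_factory=list)   # length = audio capture quantity

@dataclass
class TransmissionCapabilitySet:
    capture_parameters: CaptureParameters
```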
21. A capability interaction apparatus for a remote presentation endpoint, applied to a first remote presentation endpoint, comprising:

an interaction module, configured to perform capability interaction with a second remote presentation endpoint, wherein a message of the capability interaction carries a remote presentation endpoint capability set, and the message of the capability interaction carries a remote presentation endpoint transmission capability set;

a first receiving module, configured to receive a mode request message of the second remote presentation endpoint; and a processing module, configured to open a logical channel between the first remote presentation endpoint and the second remote presentation endpoint according to a result of the capability interaction and the received mode request information.
22. The apparatus according to claim 21, wherein the interaction module comprises: a first sending module, configured to send a first capability set interaction request to the second remote presentation endpoint, wherein the first capability set interaction request carries a first transmission capability set of the first remote presentation endpoint; a second receiving module, configured to receive a second capability set interaction request sent by the second remote presentation endpoint, wherein the second capability set interaction request carries a second transmission capability set of the second remote presentation endpoint;

a third receiving module, configured to receive a first mode request message sent by the second remote presentation endpoint, wherein the first mode request message carries transmission parameters of the first remote presentation endpoint; a second sending module, configured to send a second mode request message to the second remote presentation endpoint, wherein the second mode request message carries transmission parameters of the second remote presentation endpoint; a third sending module, configured to send, according to a result of the mode request procedure corresponding to the second mode request message, a first logical channel request to the second remote presentation endpoint, wherein the first logical channel request is used to request that the first remote presentation endpoint open a forward logical channel from the first remote presentation endpoint to the second remote presentation endpoint;

and a fourth receiving module, configured to receive a second logical channel request sent by the second remote presentation endpoint, wherein the second logical channel request is determined by the second remote presentation endpoint according to a result of the mode request procedure corresponding to the first mode request message, and the second logical channel request is used to request that the second remote presentation endpoint open a forward logical channel from the second remote presentation endpoint to the first remote presentation endpoint.
23. The apparatus according to claim 21, wherein the interaction module comprises:

a fourth sending module, configured to send a third capability set interaction request to the second remote presentation endpoint, wherein the third capability set interaction request carries a first transmission capability set of the first remote presentation endpoint;

a fifth receiving module, configured to receive a third mode request message sent by the second remote presentation endpoint, wherein the third mode request message carries transmission parameters of the first remote presentation endpoint; a sixth receiving module, configured to receive a fourth capability set interaction request sent by the second remote presentation endpoint, wherein the fourth capability set interaction request carries a second transmission capability set of the second remote presentation endpoint;

a fifth sending module, configured to send a fourth mode request message to the second remote presentation endpoint, wherein the fourth mode request message carries transmission parameters of the second remote presentation endpoint; a sixth sending module, configured to send, according to a result of the mode request procedure corresponding to the fourth mode request message, a third logical channel request to the second remote presentation endpoint, wherein the third logical channel request is used to request that the first remote presentation endpoint open a forward logical channel from the first remote presentation endpoint to the second remote presentation endpoint; and a seventh receiving module, configured to receive a fourth logical channel request sent by the second remote presentation endpoint, wherein the fourth logical channel request is determined by the second remote presentation endpoint according to a result of the mode request procedure corresponding to the third mode request message, and the fourth logical channel request is used to request that the second remote presentation endpoint open a forward logical channel from the second remote presentation endpoint to the first remote presentation endpoint.
24. The apparatus according to claim 22 or 23, further comprising: an eighth receiving module, configured to receive, after the first sending module sends the first capability set interaction request to the second remote presentation endpoint, a response message, sent by the second remote presentation endpoint, corresponding to the first capability set interaction request;

and a ninth receiving module, configured to receive, after the fourth sending module sends the third capability set interaction request to the second remote presentation endpoint, a response message, sent by the second remote presentation endpoint, corresponding to the third capability set interaction request.
25. The apparatus according to claim 22 or 23, further comprising:

a seventh sending module, configured to send, after the second receiving module receives the second capability set interaction request sent by the second remote presentation endpoint, a response message corresponding to the second capability set interaction request to the second remote presentation endpoint;

and an eighth sending module, configured to send, after the sixth receiving module receives the fourth capability set interaction request sent by the second remote presentation endpoint, a response message corresponding to the fourth capability set interaction request to the second remote presentation endpoint.
26. The apparatus according to claim 22 or 23, further comprising:

a ninth sending module, configured to send, after the third receiving module receives the first mode request message sent by the second remote presentation endpoint, a response message corresponding to the first mode request message to the second remote presentation endpoint;

and a tenth sending module, configured to send, after the fifth receiving module receives the third mode request message sent by the second remote presentation endpoint, a response message corresponding to the third mode request message to the second remote presentation endpoint.
27. The apparatus according to claim 22 or 23, further comprising:

an eleventh sending module, configured to send, after the second sending module sends the second mode request message to the second remote presentation endpoint, a response message corresponding to the second mode request message to the second remote presentation endpoint; and a twelfth sending module, configured to send, after the fifth sending module sends the fourth mode request message to the second remote presentation endpoint, a response message corresponding to the fourth mode request message to the second remote presentation endpoint.
28. The apparatus according to claim 22 or 23, further comprising: a tenth receiving module, configured to receive, after the third sending module sends, according to the result of the mode request procedure corresponding to the second mode request message, the first logical channel request to the second remote presentation endpoint, a response message, sent by the second remote presentation endpoint, corresponding to the first logical channel request;

and an eleventh receiving module, configured to receive, after the sixth sending module sends, according to the result of the mode request procedure corresponding to the fourth mode request message, the third logical channel request to the second remote presentation endpoint, a response message, sent by the second remote presentation endpoint, corresponding to the third logical channel request.
29. The apparatus according to claim 22 or 23, further comprising:

a thirteenth sending module, configured to send, after the fourth receiving module receives the second logical channel request sent by the second remote presentation endpoint, a response message corresponding to the second logical channel request to the second remote presentation endpoint;

and a fourteenth sending module, configured to send, after the seventh receiving module receives the fourth logical channel request sent by the second remote presentation endpoint, a response message corresponding to the fourth logical channel request to the second remote presentation endpoint.
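Purely as an implementation sketch, the module decomposition of claims 21 and 22 could be mirrored in software roughly as follows. The class and method names, the transport object and the dictionary-based message keys are hypothetical; the claims leave the transport and the message encoding unspecified.

```python
class InteractionModule:
    """Sketch of the interaction module of claims 21-22 (capability set exchange)."""
    def __init__(self, transport, own_capability_set):
        self.transport = transport                 # assumed send()/receive() transport
        self.own_capability_set = own_capability_set
        self.peer_capability_set = None

    def exchange_capability_sets(self):
        # First sending module: send this endpoint's transmission capability set.
        self.transport.send({"capabilitySetRequest": self.own_capability_set})
        # Second receiving module: receive the peer's transmission capability set.
        self.peer_capability_set = self.transport.receive("capabilitySetRequest")
        return self.peer_capability_set

class CapabilityInteractionApparatus:
    """Sketch of the claim-21 apparatus applied to the first remote presentation endpoint."""
    def __init__(self, transport, own_capability_set):
        self.transport = transport
        self.interaction = InteractionModule(transport, own_capability_set)

    def run(self):
        peer_caps = self.interaction.exchange_capability_sets()   # interaction module
        mode_request = self.transport.receive("modeRequest")      # first receiving module
        # Processing module: open the logical channel according to the result of the
        # capability interaction and the received mode request information.
        self.transport.send({"openLogicalChannel": {
            "capabilities": peer_caps, "mode": mode_request}})
```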
30. A data stream, comprising: a remote presentation endpoint capability set, wherein the remote presentation endpoint capability set comprises a transmission capability set, the transmission capability set comprises capture parameters, and the capture parameters comprise general parameters, video parameters and/or audio parameters.
31. The data stream according to claim 30, wherein the general parameters comprise media capture content, a scene description, a scene switching policy, general spatial information and/or general encoding information, wherein the media capture content indicates the purpose of a media capture, and the scene description is used to provide a description of the overall scene.

32. The data stream according to claim 31, wherein the scene switching policy is used to indicate the supported media switching policies, wherein the supported media switching policies comprise a site switching policy and a partial switching policy, wherein the site switching policy is used to indicate that all captures are switched simultaneously, so as to ensure that the captures come from the same endpoint site, and the partial switching policy is used to indicate that different captures can be switched at different times and may come from the same and/or different remote presentation endpoints.

33. The data stream according to claim 31, wherein the general spatial information comprises a scene area parameter and/or an area scale parameter, wherein the scene area parameter is used to indicate the extent of the overall scene associated with an endpoint, and the area scale parameter indicates the kind of scale used by the spatial information parameters.

34. The data stream according to claim 31, wherein the general encoding information comprises an overall maximum bandwidth, an overall maximum number of pixels per second and/or an overall maximum number of macroblocks per second, wherein the overall maximum bandwidth is used to indicate the maximum number of bits per second of all streams of a preset type sent by the endpoint, the overall maximum number of pixels per second is used to indicate the maximum number of pixels per second of all independent encodings in an encoding group, and the overall maximum number of macroblocks per second indicates the maximum number of macroblocks per second of all video streams sent by the endpoint.

35. The data stream according to claim 30, wherein the video parameters comprise: a video capture quantity, video capture spatial information and/or video capture encoding information, wherein the video capture quantity is used to indicate the number of video captures.

36. The data stream according to claim 35, wherein the video capture spatial information comprises a capture area, a capture point and/or a point on a capture line, wherein the capture area is used to indicate the spatial position of the video capture within the overall capture scene, the capture point is used to indicate the position of the video capture in the capture scene, and the point on the capture line describes the spatial position of a second point on the optical axis of the capture device, the first point being the capture point.

37. The data stream according to claim 35, wherein the video capture encoding information comprises a maximum video bandwidth, a maximum number of pixels per second, a maximum video resolution width, a maximum video resolution height and/or a maximum video frame rate, wherein the maximum video bandwidth is used to indicate the maximum number of bits per second of a single video encoding, the maximum number of pixels per second is used to indicate the maximum number of pixels per second of a single video encoding, the maximum video resolution width is used to indicate the width of the maximum video resolution in pixels, the maximum video resolution height is used to indicate the height of the maximum video resolution in pixels, and the maximum video frame rate indicates the maximum video frame rate.

38. The data stream according to claim 30, wherein the audio parameters comprise an audio capture quantity, audio capture spatial information and/or audio capture encoding information, wherein the audio capture quantity is used to indicate the number of audio captures.

39. The data stream according to claim 38, wherein the audio capture spatial information comprises: a capture area and/or a capture point, wherein the capture area is used to indicate the spatial position of the audio capture within the overall capture scene, and the capture point is used to indicate the position of the audio capture in the capture scene.
40. The data stream according to claim 38, wherein the audio capture encoding information comprises: an audio channel format and/or a maximum audio bandwidth, wherein the audio channel format is used to indicate an attribute of an audio channel, and the maximum audio bandwidth is used to indicate the maximum number of bits per second of a single audio encoding.
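As one possible concrete rendering of the data stream of claims 30 to 40, the capability set could be serialized as shown below. The JSON layout and every key name and value are assumptions made for illustration; the claims do not prescribe any particular encoding.

```python
import json

# Illustrative serialization of a remote presentation endpoint capability set.
# The structure mirrors claims 30-40; none of the key names are normative.
capability_set = {
    "transmissionCapabilitySet": {
        "captureParameters": {
            "general": {
                "mediaCaptureContent": "main",
                "sceneDescription": "three-camera telepresence room",
                "sceneSwitchingPolicy": "site",          # or "partial"
                "sceneArea": {"width_m": 6.0, "depth_m": 3.0},
                "areaScale": "millimeters",
                "encodingInfo": {
                    "maxTotalBandwidthBps": 12_000_000,
                    "maxTotalPixelsPerSec": 200_000_000,
                    "maxTotalMacroblocksPerSec": 800_000,
                },
            },
            "videoCaptures": [
                {
                    "captureArea": {"left_mm": 0, "right_mm": 2000},
                    "capturePoint": {"x_mm": 1000, "y_mm": 0, "z_mm": 1200},
                    "pointOnCaptureLine": {"x_mm": 1000, "y_mm": 500, "z_mm": 1200},
                    "encodingInfo": {
                        "maxVideoBandwidthBps": 4_000_000,
                        "maxPixelsPerSec": 62_208_000,
                        "maxWidth": 1920, "maxHeight": 1080, "maxFrameRate": 30,
                    },
                }
            ],
            "audioCaptures": [
                {
                    "captureArea": {"left_mm": 0, "right_mm": 6000},
                    "capturePoint": {"x_mm": 3000, "y_mm": 0, "z_mm": 1000},
                    "channelFormat": "stereo",
                    "maxAudioBandwidthBps": 128_000,
                }
            ],
        }
    }
}

print(json.dumps(capability_set, indent=2))
```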
PCT/CN2014/075201 2013-06-01 2014-04-11 Method and apparatus for remote display endpoint capability exchange, and data flow WO2014190811A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310213813.4 2013-06-01
CN201310213813.4A CN104219483B (en) 2013-06-01 2013-06-01 Method and apparatus for capability interaction of a telepresence endpoint

Publications (1)

Publication Number Publication Date
WO2014190811A1 (en) 2014-12-04

Family

ID=51987969

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/075201 WO2014190811A1 (en) 2013-06-01 2014-04-11 Method and apparatus for remote display endpoint capability exchange, and data flow

Country Status (2)

Country Link
CN (1) CN104219483B (en)
WO (1) WO2014190811A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09214561A (en) * 1996-01-31 1997-08-15 Canon Inc Multimedia communication system, inter-work equipment and multimedia communication terminal equipment used for the system
CN1620133A (en) * 2003-11-21 2005-05-25 华为技术有限公司 Method of implementing switching of single picture and multi-pictures in conference television system
CN101218579A (en) * 2005-07-11 2008-07-09 派克维迪奥公司 System and method for transferring data
CN101594512A (en) 2009-06-30 2009-12-02 中兴通讯股份有限公司 Terminal, multipoint control unit, system and method for realizing high-definition multi-picture
CN102868873A (en) * 2011-07-08 2013-01-09 中兴通讯股份有限公司 Remote presenting method, terminal and system
CN102883131A (en) * 2011-07-15 2013-01-16 中兴通讯股份有限公司 Signaling interaction method and device based on tele-presence system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NO20071401L (en) * 2007-03-16 2008-09-17 Tandberg Telecom As System and arrangement for lifelike video communication


Also Published As

Publication number Publication date
CN104219483A (en) 2014-12-17
CN104219483B (en) 2019-10-25

Similar Documents

Publication Publication Date Title
CN102868873B (en) A kind of remote presentation method, terminal and system
US9215416B2 (en) Method and system for switching between video streams in a continuous presence conference
KR100880150B1 (en) Multi-point video conference system and media processing method thereof
JP5345081B2 (en) Method and system for conducting resident meetings
KR101555855B1 (en) Method and system for conducting video conferences of diverse participating devices
US9344475B2 (en) Media transmission method and system based on telepresence
WO2010034254A1 (en) Video and audio processing method, multi-point control unit and video conference system
CN102204244A (en) Systems,methods, and media for providing cascaded multi-point video conferencing units
WO2007140668A1 (en) Method and apparatus for realizing remote monitoring in conference television system
CN110267064A (en) Audio broadcast state processing method, device, equipment and storage medium
WO2012175025A1 (en) Remotely presented conference system, method for recording and playing back remotely presented conference
WO2015127799A1 (en) Method and device for negotiating on media capability
WO2014176965A1 (en) Capability negotiation processing method and device, and telepresence endpoint
EP2557780A2 (en) Method and system for switching between video streams in a continuous presence conference
CN101489090B (en) Method, apparatus and system for multipath media stream transmission and reception
WO2014190809A1 (en) Method and apparatus for remote display endpoint capability exchange, and data flow
WO2014190811A1 (en) Method and apparatus for remote display endpoint capability exchange, and data flow
WO2014190808A1 (en) Method and apparatus for remote display endpoint capability exchange, and data flow
WO2014190812A1 (en) Method and apparatus for remote display endpoint capability exchange, and data flow
WO2014190810A1 (en) Method and apparatus for remote display endpoint capability exchange, and data flow
CN104519305A (en) Endpoint information interactive processing method, endpoint information interactive processing device and remote rendering endpoint
CN104270655A (en) Multi-point video converging system
WO2015018216A1 (en) Multi-content media communication method, device and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14804121

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14804121

Country of ref document: EP

Kind code of ref document: A1