WO2023066023A1 - Gesture-based communication method and apparatus, storage medium, and electronic apparatus - Google Patents

Gesture-based communication method and apparatus, storage medium, and electronic apparatus Download PDF

Info

Publication number
WO2023066023A1
Authority
WO
WIPO (PCT)
Prior art keywords
terminal
stream
service
target
data channel
Prior art date
Application number
PCT/CN2022/123487
Other languages
French (fr)
Chinese (zh)
Inventor
Chen Xiaoli
Zhang Lu
Wang Mengxiao
Chen Shilin
Fang Yanwei
Original Assignee
ZTE Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation
Publication of WO2023066023A1 publication Critical patent/WO2023066023A1/en

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures

Definitions

  • The present disclosure relates to the field of communications, and in particular to a gesture-based communication method and apparatus, a storage medium, and an electronic apparatus.
  • Gestures are widely used in daily life. Gesture users, such as deaf-mute people, face great obstacles in communicating with hearing people: their gestures form a communication language (sign language) that is extremely difficult for non-professionals and hearing people to recognize accurately. When deaf-mute users dial public service numbers (119, 110, 120, etc.), the service personnel cannot directly understand what they want to express; when deaf-mute users participate in online teaching, they cannot interact with teachers in real time in a simple way; and deaf-mute users cannot communicate normally with hearing users over the phone. The gestures (sign language) of the deaf therefore need to be recognized, translated, and delivered within the communication. Gesture users also exist in specific application scenarios, such as military sign language and sign language for special industries, which likewise need to be recognized and translated.
  • In the related art, gesture recognition relies on specific equipment, such as wearable gloves. These devices are expensive and are only suitable for interaction within a limited range; they impose limitations in time and space and do not provide direct, natural interaction and communication. There are also vision-based gesture recognition schemes that rely on specific collectors, such as somatosensory sensors, to collect and analyze gesture data. For basic phone calls these schemes rely on the terminal equipment and place high demands on terminal processing, so they are neither economical nor convenient; information and data updates are not timely, and the communication experience is poor.
  • Embodiments of the present disclosure provide a gesture communication method, device, storage medium, and electronic device, so as to at least solve the technical problem in the related art that gesture communication mainly depends on specific equipment, resulting in high cost.
  • In an embodiment, a gesture communication method is provided, including: when a first terminal and a second terminal make a video call or an audio call, acquiring a first request sent by the first terminal or the second terminal, where the first request is used to request creation of a gesture recognition service, and the gesture recognition service is used to perform semantic recognition on gestures recognized in video frames collected by the first terminal; creating the gesture recognition service in response to the first request; during the video call or audio call, acquiring a group of gestures recognized in a group of video frames collected by the first terminal; performing, through the gesture recognition service, semantic recognition on the group of gestures recognized in the group of video frames collected by the first terminal to obtain target semantics represented by the group of gestures; and sending the target semantics to the second terminal.
  • In an embodiment, a gesture communication apparatus is provided, including: a first acquisition module, configured to acquire, when a first terminal and a second terminal make a video call or an audio call, a first request sent by the first terminal or the second terminal, where the first request is used to request creation of a gesture recognition service, and the gesture recognition service is used to perform semantic recognition on gestures recognized in video frames collected by the first terminal; a first creation module, configured to create the gesture recognition service in response to the first request; a second acquisition module, configured to acquire, during the video call or audio call, a group of gestures recognized in a group of video frames collected by the first terminal; a recognition module, configured to perform, through the gesture recognition service, semantic recognition on the group of gestures recognized in the group of video frames collected by the first terminal to obtain target semantics represented by the group of gestures; and a first sending module, configured to send the target semantics to the second terminal.
  • In an embodiment, a computer-readable storage medium is provided, storing a computer program, where the computer program, when executed by a processor, performs the steps in any one of the above method embodiments.
  • In an embodiment, an electronic apparatus is provided, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes, through the computer program, the steps in any one of the above method embodiments.
  • FIG. 1 is a block diagram of the hardware structure of a mobile terminal for a gesture communication method according to an embodiment of the present disclosure;
  • FIG. 2 is a flowchart of a gesture communication method according to an embodiment of the present disclosure;
  • FIG. 3 is a diagram of a gesture communication system structure and media path according to a specific embodiment of the present disclosure;
  • FIG. 4 is a first example diagram of a gesture communication method according to a specific embodiment of the present disclosure;
  • FIG. 5 is a second example diagram of a gesture communication method according to a specific embodiment of the present disclosure;
  • FIG. 6 is a third example diagram of a gesture communication method according to a specific embodiment of the present disclosure;
  • FIG. 7 is a fourth example diagram of a gesture communication method according to a specific embodiment of the present disclosure;
  • FIG. 8 is a fifth example diagram of a gesture communication method according to a specific embodiment of the present disclosure;
  • FIG. 9 is a structural block diagram of a gesture communication apparatus according to an embodiment of the present disclosure;
  • FIG. 10 is a first preferred structural block diagram of a gesture communication apparatus according to an embodiment of the present disclosure;
  • FIG. 11 is a second preferred structural block diagram of a gesture communication apparatus according to an embodiment of the present disclosure.
  • FIG. 1 is a block diagram of a mobile terminal hardware structure of a gesture communication method according to an embodiment of the present disclosure.
  • The mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 configured to store data. In an exemplary embodiment, the above mobile terminal may further include a transmission device 106 and an input/output device 108 configured for communication.
  • the structure shown in FIG. 1 is only for illustration, and it does not limit the structure of the above mobile terminal.
  • The mobile terminal may also include more or fewer components than those shown in FIG. 1.
  • The memory 104 may be configured to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the gesture communication method in the embodiments of the present disclosure; the processor 102 implements the above method by running the computer program stored in the memory 104.
  • the memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 104 may further include a memory that is remotely located relative to the processor 102, and these remote memories may be connected to the mobile terminal through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • Transmission device 106 is configured to receive or transmit data via a network.
  • the specific example of the above network may include a wireless network provided by the communication provider of the mobile terminal.
  • the transmission device 106 includes a network interface controller (NIC for short), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • the transmission device 106 may be a radio frequency (Radio Frequency, RF for short) module, which is configured to communicate with the Internet in a wireless manner.
  • FIG. 2 is a flowchart of a gesture communication method according to an embodiment of the present disclosure. As shown in FIG. 2 , the process includes the following steps:
  • Step S2002: when the first terminal and the second terminal make a video call or an audio call, obtain a first request sent by the first terminal or the second terminal, where the first request is used to request creation of a gesture recognition service, and the gesture recognition service is used to perform semantic recognition on the gestures recognized in the video frames collected by the first terminal;
  • Step S2004: create the gesture recognition service in response to the first request;
  • Step S2006: during the video call or audio call, acquire a group of gestures identified in a group of video frames collected by the first terminal;
  • Step S2008: through the gesture recognition service, perform semantic recognition on the group of gestures recognized in the group of video frames collected by the first terminal to obtain the target semantics represented by the group of gestures;
  • Step S2010: send the target semantics to the second terminal.
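The five steps above can be sketched as a minimal network-side handler. This is only an illustrative sketch: the class and function names (`GestureRecognitionService`, `handle_gesture_call`) and the toy gesture lexicon are assumptions invented for the example, not the patent's actual implementation.

```python
# Minimal sketch of the network-side flow in steps S2002-S2010; all class
# and function names here are hypothetical illustrations.

class GestureRecognitionService:
    """Created on the network side in response to the first request (S2004)."""

    # Toy gesture-to-semantics table standing in for a real recognizer.
    LEXICON = {"open_palm": "hello", "fist_to_chest": "help"}

    def recognize(self, gestures):
        # S2008: map each recognized gesture to semantics and join the result.
        return " ".join(self.LEXICON.get(g, "<unknown>") for g in gestures)


def handle_gesture_call(first_request, gestures, send_to_second_terminal):
    # S2002: the first request asks the network to create the service.
    if first_request != "create_gesture_recognition_service":
        raise ValueError("unexpected request")
    service = GestureRecognitionService()           # S2004: create the service
    target_semantics = service.recognize(gestures)  # S2006 + S2008
    send_to_second_terminal(target_semantics)       # S2010: deliver semantics
    return target_semantics
```

The key point the sketch captures is that recognition happens on the network side, so the terminals only exchange ordinary call media plus the resulting semantics.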
  • Through the above steps, a communication terminal can request the network-side device to create a gesture recognition service during a video call or audio call, and the gesture recognition service created on the network side can perform semantic recognition on the gestures recognized in the video frames collected by the communication terminal, without requiring specific equipment on the communication terminal to complete gesture semantic recognition locally. This solves the technical problem in the related art that gesture communication mainly depends on specific equipment and therefore incurs high cost, achieves the technical effect of reducing the cost of gesture communication, and further improves the user experience.
  • The executor of the above steps may be the network side or a network-side device, for example, a network device including a service control node, an application control node, and a media server, or a network device having the functions of a service control node, an application control node, and a media server. The executor may also be another processing device or processing unit with similar processing capabilities, but is not limited thereto.
  • The following description takes the network side performing the above operations as an example (this is only exemplary; in actual operation, other devices or modules may also perform the above operations):
  • The network side obtains the first request sent by the first terminal or the second terminal. The first request is used to request creation of a gesture recognition service for recognizing gestures collected by the first terminal during a video call or audio call, specifically for recognizing a group of gestures identified in a group of video frames collected by the first terminal. Of course, in practical applications, if the second terminal uses gestures to communicate, the first request can be used to request recognition of the gestures collected by the second terminal.
  • After receiving the first request, the network creates the gesture recognition service, which is used to recognize the above gestures. During the video call or audio call, the network obtains a group of gestures identified in a group of video frames collected by the first terminal: the video frame images collected by the first terminal can be obtained, a group of gestures can be recognized from the frame images, and the gesture recognition service created above then performs semantic recognition on the group of gestures to obtain the target semantics represented by the group of gestures, which are sent to the second terminal.
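The two stages described above (identifying gestures in the collected frame images, then recognizing their semantics) can be illustrated with a toy pipeline. The frame encoding, gesture labels, and lexicon below are invented for the sketch and are not part of the patent.

```python
# Illustrative two-stage pipeline: first identify a gesture in each collected
# video frame, then run semantic recognition over the identified group.

def identify_gestures(video_frames):
    # Stand-in for per-frame gesture detection on the network side.
    return [frame["gesture"] for frame in video_frames if "gesture" in frame]

def semantic_recognition(gestures, lexicon):
    # Map the group of gestures to the target semantics they represent.
    return " ".join(lexicon[g] for g in gestures if g in lexicon)

frames = [{"gesture": "wave"}, {"gesture": "point_self"}, {"blank": True}]
lexicon = {"wave": "hello", "point_self": "I"}
target_semantics = semantic_recognition(identify_gestures(frames), lexicon)
```

Separating detection from semantic recognition mirrors the text's distinction between "gestures identified in video frames" and the "target semantics" the recognition service produces from them.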
  • In this way, gesture communication in video or audio calls is realized, avoiding the problem in the related art that gesture communication must rely on specific devices or can only be realized in video calls. This solves the problem in the related art that gesture communication mainly depends on specific devices, resulting in high cost and poor experience, and achieves the effects of broadening the application range of gesture communication and improving the user experience.
  • In an exemplary embodiment, the method further includes: acquiring a second request sent by the first terminal or the second terminal, where the second request is used to request creation of a target data channel; and creating the target data channel in response to the second request, where the target data channel is a channel that the first terminal or the second terminal is allowed to use. Acquiring the first request sent by the terminal includes: acquiring the first request transmitted by the first terminal or the second terminal on the target data channel.
  • That is, the second request sent by the first terminal or the second terminal can be obtained in order to create the target data channel. Usually, the second request is initiated by a terminal that supports the use of the target data channel: at least one of the first terminal and the second terminal supports its use, or both terminals do. The above first request is transmitted by the first terminal or the second terminal through the target data channel.
  • In an exemplary embodiment, acquiring the second request sent by the first terminal or the second terminal includes: acquiring the second request sent by the first terminal or the second terminal to the media server through the access control entity (SBC/P-CSCF), the session control entity (I/S-CSCF), and the service control node. Creating the target data channel in response to the second request includes: creating the target data channel through the media server in response to the second request, where the target data channel is used to transmit data between the first terminal or the second terminal and the media server.
  • That is, the second request is sent by the first terminal or the second terminal to the media server through the access control entity SBC/P-CSCF, the session control entity I/S-CSCF, and the service control node, and in response to the second request, the media server creates the target data channel, which is used to transmit data between the first terminal or the second terminal and the media server.
  • In this way, the purpose of establishing a dedicated data channel between the terminal and the media server is achieved.
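The relay path and channel creation above can be sketched as follows. Entity behaviour is reduced to simple forwarding, and the channel structure is an assumption made for illustration; real IMS signalling (SIP/SDP) is far richer.

```python
# Hypothetical sketch of the second request traversing the control entities
# named above before the media server creates the dedicated data channel.

def relay_request(request, hops=("SBC/P-CSCF", "I/S-CSCF", "service control node")):
    """Forward the request through each control entity, recording the path."""
    path = []
    for hop in hops:
        path.append(hop)  # each entity forwards the request unchanged
    return request, path

class MediaServer:
    def __init__(self):
        self.channels = {}

    def create_data_channel(self, terminal_id):
        # The target data channel carries data between the terminal and the
        # media server (the "dedicated data channel" in the text above).
        channel = {"endpoints": (terminal_id, "media_server"), "open": True}
        self.channels[terminal_id] = channel
        return channel

request, path = relay_request("create_target_data_channel")
server = MediaServer()
channel = server.create_data_channel("first_terminal")
```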
  • In an exemplary embodiment, acquiring the first request transmitted by the first terminal or the second terminal on the target data channel includes: acquiring the first request transmitted by the first terminal or the second terminal to the application control node on the target data channel. Creating the gesture recognition service in response to the first request includes: sending, by the application control node, a first instruction to the service control node, where the first instruction is used to instruct the service control node to send a second instruction to the media server, and the second instruction is used to instruct the media server to create the gesture recognition service; and, in response to the second instruction, creating the gesture recognition service through the media server, or instructing, through the media server, a third-party service component to create the gesture recognition service.
  • That is, the network side acquires the first request transmitted by the first terminal or the second terminal to the application control node on the target data channel; in response to the first request, the application control node sends a first instruction to the service control node to instruct it to send a second instruction to the media server; the second instruction instructs the media server to create the gesture recognition service; and in response to the second instruction, the gesture recognition service is created by the media server, or the media server instructs a third-party service component to create it. In this way, the purpose of creating the gesture recognition service is achieved.
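The two-instruction chain above (application control node → service control node → media server, with optional delegation to a third-party component) can be sketched as a call chain. All names and the dictionary result are illustrative assumptions.

```python
# Sketch of the instruction chain described above; names are hypothetical.

def media_server(second_instruction, third_party=None):
    # The media server creates the service itself, or delegates creation to
    # a third-party service component when one is configured.
    if third_party is not None:
        return third_party(second_instruction)
    return {"service": second_instruction, "created_by": "media_server"}

def service_control_node(first_instruction):
    # On the first instruction, relay a second instruction to the media server.
    return media_server(first_instruction)

def application_control_node(first_request):
    # The first request arrives on the target data channel; the application
    # control node issues the first instruction to the service control node.
    return service_control_node(first_request)

result = application_control_node("create_gesture_recognition_service")
```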
  • In an exemplary embodiment, the method further includes: sending a third instruction to the media server through the service control node, where the third instruction is used to request creation of a mixed media service, the mixed media service is used to process the video stream, audio stream, and data stream in the video call, or the audio stream and data stream in the audio call, and the data stream is a data stream representing the target semantics; and, in response to the third instruction, creating the mixed media service through the media server, or instructing, through the media server, a third-party service component to create the mixed media service.
  • That is, the service control node may request the media server to create the mixed media service, and the media server then creates it, or the media server may instruct a third-party service component to create it.
  • In an exemplary embodiment, performing, through the gesture recognition service, semantic recognition on a group of gestures recognized in a group of video frames collected by the first terminal to obtain the target semantics represented by the group of gestures includes: performing semantic recognition on the group of gestures through the gesture recognition service to obtain one or more semantics, where each of the semantics is expressed by one or more gestures in the group of gestures; and generating, based on the one or more semantics, the target semantics corresponding to the group of gestures.
  • That is, semantic recognition is performed on the group of gestures identified in the video frame images collected by the first terminal to obtain one or more semantics, and then, based on the one or more semantics, the complete target semantics of the group of gestures are generated. In this way, the purpose of converting the gestures obtained from a terminal that communicates by gestures into target semantics is achieved.
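Since each semantic may be expressed by one or more gestures, assembling the target semantics amounts to segmenting the gesture group into known phrases and concatenating their meanings. The phrase table and the greedy longest-match strategy below are assumptions made for this toy illustration.

```python
# Toy illustration: generate the target semantics from one or more per-phrase
# semantics, each expressed by one or more gestures.

PHRASES = {
    ("point_self",): "I",
    ("fist", "to_chest"): "need help",
}

def recognize_semantics(gestures):
    semantics, i = [], 0
    while i < len(gestures):
        # Greedily match the longest known gesture sequence as one semantic.
        for length in (2, 1):
            key = tuple(gestures[i:i + length])
            if key in PHRASES:
                semantics.append(PHRASES[key])
                i += length
                break
        else:
            i += 1  # skip a gesture with no known semantics
    return semantics

def target_semantics(gestures):
    # Concatenate the one-or-more semantics into the complete target semantics.
    return " ".join(recognize_semantics(gestures))
```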
  • In an exemplary embodiment, sending the target semantics to the second terminal includes: when the target semantics are formed by concatenating the one or more semantics, sending each of the semantics included in the target semantics to the second terminal synchronously with the corresponding video frames in the group of video frames; or, when the target semantics are represented by a data stream corresponding to the group of video frames and the data stream is a text stream and an audio stream, synchronously synthesizing the text stream with the corresponding video frames in the group of video frames to obtain a target video stream, and synchronously sending the target video stream and the audio stream to the second terminal.
  • That is, each semantic included in the target semantics is sent to the second terminal synchronously with the corresponding video frames in the group of video frames. When the second terminal supports the use of the target data channel, the data stream representing the target semantics is sent to the second terminal through the target data channel synchronously with the video stream formed by the video frames; or, when the second terminal does not support the use of the target data channel, the text stream included in the data stream representing the target semantics is synchronously synthesized with the video frames to obtain the target video stream, which is then sent to the second terminal synchronously with the audio stream.
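The delivery branch above depends only on whether the receiving terminal supports the target data channel. A minimal sketch, assuming toy stream representations and a caption-style overlay as the "synthesis" step:

```python
# Sketch of the delivery decision; structures and the caption overlay are
# illustrative assumptions, not the patent's actual media processing.

def deliver_semantics(second_supports_channel, text_stream, audio_stream, video_frames):
    if second_supports_channel:
        # Send the data stream on the target data channel, synchronized with
        # the video stream formed by the frames and with the audio stream.
        return {"data_channel": text_stream,
                "video": video_frames,
                "audio": audio_stream}
    # Otherwise synthesize the text stream into the video frames (e.g. as
    # captions) to obtain the target video stream, sent with the audio.
    target_video = [f"{frame}+caption:{text}"
                    for frame, text in zip(video_frames, text_stream)]
    return {"video": target_video, "audio": audio_stream}
```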
  • In an exemplary embodiment, the method further includes: when the video call is made between the first terminal and the second terminal, and both the first terminal and the second terminal support the use of the target data channel, acquiring the second request sent by the first terminal, where the second request is used to request creation of a target data channel; and creating the target data channel in response to the second request, where the target data channel includes a first target data channel and a second target data channel, the first target data channel is a data channel between the first terminal and the media server, and the second target data channel is a data channel between the second terminal and the media server. Acquiring the first request sent by the first terminal or the second terminal includes: acquiring the first request transmitted by the first terminal on the first target data channel. Creating the gesture recognition service in response to the first request includes: sending a target instruction to the media server through the service control node in response to the first request, where the target instruction is used to request creation of a mixed media service and the gesture recognition service, and the mixed media service is used to process the video stream, audio stream, and data stream in the video call.
  • That is, when both the first terminal and the second terminal support the use of the target data channel, semantic recognition is performed on the group of gestures recognized in the first group of video frame images collected by the first terminal to obtain the target semantics. The first data stream representing the target semantics may include a text stream and a voice stream, that is, the gestures are converted into voice, text, or the like. Through the mixed media service and the gesture recognition service provided by the media server, the first video stream, the first audio stream, and the first data stream are synchronized and then sent to the second terminal, with the first data stream sent to the second terminal through the second target data channel (also called a dedicated data channel). The second terminal uses a non-gesture communication method, that is, normal video or voice communication; through the media server and/or a third-party service component, the voice frames of the second terminal are converted into a gesture stream and a target text stream, and the gesture stream, the target text stream, and the video frames and audio frames collected by the second terminal are synchronously sent to the first terminal, with the gesture stream and the target text stream sent through the first target data channel (also called a dedicated data channel). Through this embodiment, when both the first terminal and the second terminal support the use of the target data channel, the purpose of one end communicating interactively by gestures is achieved, with the gestures converted into a data stream and sent through the target data channel.
  • In an exemplary embodiment, the method further includes: when the video call is made between the first terminal and the second terminal, and the first terminal supports the use of the target data channel while the second terminal does not, acquiring the second request sent by the first terminal, where the second request is used to request creation of a target data channel; and creating the target data channel in response to the second request, where the target data channel is a data channel between the first terminal and the media server. Acquiring the first request sent by the first terminal or the second terminal includes: acquiring the first request transmitted by the first terminal on the target data channel. Creating the gesture recognition service in response to the first request includes: sending a target instruction to the media server through the service control node in response to the first request, where the target instruction is used to request creation of a mixed media service, a synthesis service, and the gesture recognition service, the mixed media service is used to process the video stream, audio stream, and data stream in the video call, and the data stream is a data stream representing the target semantics; and creating the mixed media service, the synthesis service, and the gesture recognition service.
  • That is, semantic recognition is performed on the group of gestures identified in the second group of video frame images collected by the first terminal to obtain the target semantics. The first data stream representing the target semantics may include a first text stream and a voice stream, that is, the gestures are converted into voice, text, or the like. After the semantics are recognized, the first text stream representing the target semantics is synthesized, through the synthesis service provided by the media server, with the video stream formed by the second group of video frames to obtain a second video stream; then, through the mixed media service, the second audio stream included in the data stream representing the target semantics is synchronized with the second video stream, and the synchronized second video stream and second audio stream are sent to the second terminal. The second terminal uses a non-gesture communication method, that is, normal video or voice communication, and through the media server and/or a third-party service component, the voice frames of the second terminal are converted into a gesture stream and a target text stream. Through this embodiment, when the first terminal supports the use of the target data channel and the second terminal does not, the purpose of one end communicating interactively by gestures is achieved, with the gestures converted into a text stream that is synthesized with the video stream and then sent synchronously with the audio stream.
  • In an exemplary embodiment, the method further includes: when the video call is made between the first terminal and the second terminal, and the first terminal does not support the use of the target data channel while the second terminal does, acquiring the second request sent by the second terminal, where the second request is used to request creation of a target data channel; and creating the target data channel in response to the second request, where the target data channel is a data channel between the second terminal and the media server. Acquiring the first request sent by the first terminal or the second terminal includes: acquiring the first request transmitted by the second terminal on the target data channel. Creating the gesture recognition service in response to the first request includes: sending a target instruction to the media server through the service control node in response to the first request, where the target instruction is used to request creation of a mixed media service and the gesture recognition service, the mixed media service is used to process the video stream, audio stream, and data stream in the video call, and the data stream is a data stream representing the target semantics; and creating the mixed media service and the gesture recognition service.
  • That is, semantic recognition is performed on the group of gestures identified in the third group of video frame images collected by the first terminal to obtain the target semantics. The third data stream representing the target semantics may include a text stream and a voice stream, that is, the gestures are converted into voice, text, or the like. Through the mixed media service provided by the media server, the third video stream, the third audio stream, and the third data stream are synchronized and then sent to the second terminal, with the third data stream sent on the target data channel. The second terminal uses a non-gesture communication method, that is, normal video or voice communication; through the media server and/or a third-party service component, the voice frames of the second terminal are converted into a gesture stream and a target text stream, and then, through the synthesis service provided by the media server, the gesture stream, the target text stream, and the video frames collected by the second terminal are synthesized to obtain a target video stream, which is sent to the first terminal synchronously with the audio frames collected by the second terminal. Through this embodiment, the purpose of one end communicating interactively by gestures is achieved, with the gestures converted into a text stream and sent through the target data channel.
  • the method further includes: conducting the audio call between the first terminal and the second terminal, and both the first terminal and the second terminal support the use of target data
  • a channel obtain the second request sent by the first terminal, where the second request is used to request to create a target data channel; in response to the second request, create the target data channel, where the The target data channel includes a first target data channel and a second target data channel, the first target data channel is a data channel between the first terminal and the media server, and the second target data channel is the first target data channel A data channel between the second terminal and the media server;
• Obtaining the first request sent by the first terminal or the second terminal includes: obtaining the first request transmitted by the first terminal on the first target data channel.
• Creating the gesture recognition service in response to the first request includes: in response to the first request, sending a target instruction to the media server through a service control node, where the target instruction is used to request creation of a mixed-media service and the gesture recognition service, the mixed-media service being used to process the audio stream and the data stream.
• The gestures identified in the fourth group of video frame images collected by the first terminal are semantically recognized to obtain the target semantics.
• The first data stream used to represent the target semantics may include a text stream and a voice stream; that is, the gestures are converted into voice, text, or the like.
• The mixed-media service and gesture recognition service provided by the media server synchronize the audio stream formed by the fourth group of audio frames collected by the first terminal with the first data stream, and then send them to the second terminal; the first data stream is sent to the second terminal through the second target data channel (also called a dedicated data channel).
• The second terminal uses a non-gesture communication method, that is, it communicates in normal voice mode. The media server and/or a third-party service component converts the voice frames of the second terminal into a gesture stream and a target text stream, and sends the gesture stream and the target text stream to the first terminal in synchronization with the video frames and/or audio frames.
• When both the first terminal and the second terminal support the use of the target data channel, one end can communicate interactively using gestures, and the gestures are converted into a data stream and sent through the target data channel.
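The second request that sets up the dedicated data channel can be pictured as an SDP offer adding a data-channel media line next to the existing audio line, in the style of RFC 8841 (`m=application ... webrtc-datachannel`). The sketch below is a minimal, hypothetical offer builder; the addresses and ports are placeholders, not values from the disclosure.

```python
# Illustrative builder for an SDP offer that requests a dedicated data channel
# alongside an audio stream. Session parameters are placeholders.

def build_datachannel_offer(audio_port=49170, dc_port=10000):
    lines = [
        "v=0",
        "o=- 0 0 IN IP4 192.0.2.1",
        "s=-",
        "c=IN IP4 192.0.2.1",
        "t=0 0",
        f"m=audio {audio_port} RTP/AVP 0",  # existing audio call media line
        # Dedicated data channel media line (RFC 8841 style):
        f"m=application {dc_port} UDP/DTLS/SCTP webrtc-datachannel",
        "a=sctp-port:5000",                 # SCTP port for the data channel
    ]
    return "\r\n".join(lines) + "\r\n"
```

Such an offer would travel in the Invite that the first terminal sends toward the service control node.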
• In an exemplary embodiment, the method further includes: when the audio call is conducted between the first terminal and the second terminal, the first terminal supports the use of a target data channel, and the second terminal does not support the use of the target data channel, obtaining a second request sent by the first terminal, where the second request is used to request creation of a target data channel; in response to the second request, creating the target data channel, where the target data channel is a data channel between the first terminal and a media server. Obtaining the first request sent by the first terminal or the second terminal includes: obtaining the first request transmitted by the first terminal on the target data channel. Creating the gesture recognition service in response to the first request includes: in response to the first request, sending a target instruction to the media server through a service control node, where the target instruction is used to request creation of the gesture recognition service; and creating the gesture recognition service through the media server, or instructing, through the media server, a third-party service component to create the gesture recognition service. During the video call or audio call, a group of recognized gestures is obtained.
• When the first terminal supports the use of the target data channel and the second terminal does not, after the gesture recognition service is created, the group of gestures identified in the fifth group of video frame images collected by the first terminal is semantically recognized to obtain the target semantics.
• The data stream used to represent the target semantics may include a text stream and a voice stream; that is, the gestures are converted into voice, text, or the like. After the semantics are recognized, the fifth audio stream representing the target voice is sent to the second terminal.
• In this embodiment, the second terminal uses a non-gesture communication mode, that is, it communicates in normal voice mode. The media server and/or third-party service component converts the voice frames of the second terminal into a gesture stream and a target text stream, and sends the gesture stream and the target text stream to the first terminal through the target data channel (also called a dedicated data channel), in synchronization with the audio stream collected by the second terminal.
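One way to picture the capability-dependent behavior in these embodiments is a small routing decision: terminals with the target data channel receive text/gesture data on the dedicated channel, while terminals without it receive only synthesized media. The stream names in this Python sketch are illustrative assumptions, not identifiers from the disclosure.

```python
def plan_streams(terminal_supports_data_channel, is_gesture_user):
    """Choose which streams the media server delivers to one terminal.

    Mirrors the cases above: a data-channel-capable terminal can receive
    text/gesture data on the dedicated channel; a legacy terminal must
    receive everything inside ordinary media streams.
    """
    if terminal_supports_data_channel:
        media = ["audio", "video"]
        # A non-gesture user additionally receives a generated gesture stream.
        data = ["text_stream"] if is_gesture_user else ["text_stream", "gesture_stream"]
    else:
        # No data channel: synthesize text/gestures into the media itself.
        media = ["audio", "synthesized_video_with_text_overlay"]
        data = []
    return {"media": media, "data_channel": data}
```

For example, `plan_streams(True, False)` models the non-gesture type-1 terminal, while `plan_streams(False, False)` models the legacy type-2 terminal.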
  • Fig. 3 is a gesture communication system structure and media path diagram according to a specific embodiment of the present disclosure. As shown in Fig. 3, the system includes:
• S101 terminal (type 1): a new type of terminal, equivalent to the aforementioned terminal that supports the target data channel (hereinafter "type 1"). It supports real-time audio and video stream channels as well as a dedicated real-time data stream channel (the dedicated data channel, corresponding to the aforementioned target data channel). In the present disclosure, this terminal interacts with network-side entities through the dedicated data channel to provide end users with a new service experience; it receives network-side data streams through the dedicated channel and receives audio and video streams through the audio/video stream channel. The terminal may be an independent application program or a dedicated terminal device.
• S102 terminal (type 2): a traditional terminal, equivalent to the aforementioned terminal that does not support the target data channel (hereinafter "type 2"); it supports only real-time audio and video stream channels. The terminal interacts with the SBC/P-CSCF entity on the network side to provide a service experience for end users, and receives audio and video streams through the audio/video stream channels.
• SBC/P-CSCF: provides signaling and media access for terminals, supports audio/video stream channels and data stream channels, and forwards audio/video streams and data streams.
• I/S-CSCF (Interrogating/Serving Call Session Control Function): provides registration authentication, session control, call routing, and similar functions for multiple types of terminals in the IMS network.
• S105 Service Control Node: as the signaling control network element of the gesture communication system, it undertakes IMS call management and is responsible for call control. As the service-provider network element of gesture communication, it can invoke related services through the service bus and provide communication and service capabilities to other applications; the service invokes and controls the forwarding of various media data streams, including real-time audio/video media stream forwarding and data stream forwarding.
• The application control node can call the media server and third-party service components, apply for resources, and realize gesture recognition and translation to voice, gesture-stream animation generation, synthesis of audio/video media streams, and integration of data streams into media streams, and notify the service results.
• The service control node can exist independently, or it can be co-located with the application control node.
• S106 Application Control Node: implements various business service logics. Specific functions include but are not limited to: (1) determining the media stream and data stream categories to be sent according to the application form of the terminal (version number, device type, specific label, etc.), and converting them into real-time media streams; (2) sending application control requests to the service control node, and calling third-party service components and the media server to realize image processing, gesture recognition, conversion, and synthesis; (3) invoking, through the service bus, the various services provided by the media server and reporting the service results.
• The application control node can exist independently, or it can be co-located with the service control node.
• S107 Media Server: provides various media services. Specific functions include but are not limited to: (1) image recognition, for example recognizing images and gestures through feature data comparison; (2) real-time media stream generation services, such as converting voice clips into corresponding RTP media streams; (3) real-time gesture stream generation, which automatically generates gesture-stream video for recognized gestures; (4) synthesis service, which synthesizes existing and generated media streams and gesture streams and outputs them as real-time audio/video streams, combining video streams, gesture streams, and text streams into the video stream; (5) real-time audio/video stream forwarding, which anchors, processes, and forwards the audio/video streams of the current call; (6) data stream forwarding service, which forwards gesture streams, text streams, and other data streams through a dedicated data channel, and establishes a dedicated channel for forwarding the synthesized integrated data stream; (7) exposing its services so that the service control node and the application control node can call them through the service bus; (8) mixed-media service, which supports the processing of audio and video streams.
• S108 Third-party service component: can be called by the service control node and the application control node, and provides gesture language translation, audio-to-text conversion services, and the like.
• S109 HSS: provides user service data and other related content.
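The synthesis service described for the media server (function (4): combining video, gesture, and text streams into one output video stream) can be sketched abstractly as a per-frame overlay merge. This toy model ignores codecs and real pixel data; the data structures are assumptions made for illustration only.

```python
def synthesize(video_frames, text_stream, gesture_frames=None):
    """Toy model of the media server's synthesis service: attach text (and
    optionally gesture-animation) overlays to each output video frame.

    Real servers operate on decoded pixel data; here frames are opaque tokens.
    """
    out = []
    for i, frame in enumerate(video_frames):
        overlay = {"text": text_stream[i] if i < len(text_stream) else ""}
        if gesture_frames:
            # Loop the generated gesture animation over the video timeline.
            overlay["gesture"] = gesture_frames[i % len(gesture_frames)]
        out.append({"frame": frame, "overlay": overlay})
    return out
```

The synthesized result would then be encoded and forwarded on the ordinary real-time audio/video stream channel.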
• User UE A carries the terminal identifier to initiate an audio or video call request to the IMS network and calls UE B; an audio or video call is established with UE B through the SBC/P-CSCF, I/S-CSCF, service control node, and other network elements.
• The terminal (type 1) is a new type of terminal that has real-time audio/video stream channels as well as a dedicated channel for real-time data streams.
• The terminal (type 2) is a traditional terminal that supports only real-time audio/video stream channels.
• The terminal (type 1) user that supports the data stream channel applies to the "media server", through the "SBC/P-CSCF", "I/S-CSCF", and "service control node", to create data channel resources.
  • the terminal (type 1, dedicated data channel) initiates a gesture recognition conversion request to the "application control node" through the data channel;
  • the "application control node” instructs the “service control node” to create gesture recognition resources
  • the "service control node” instructs the "media server” to create a mixed media service, which requires gesture recognition related services;
  • the “media server” applies for the gesture recognition service from the "third-party service component", and the mixed media service is created successfully.
  • the "service control node” invites UE A and UE B to join the conference respectively through Reinvite; applies to the "media server” for UE A and UE B membership resources;
  • UE A and UE B media are anchored to the "media server";
  • the "service control node” applies to the “media server” for processing such as gesture recognition, gesture translation business types and synthesis;
  • the "media server” applies to the "third-party service component” for services such as gesture recognition, gesture translation, speech-to-text, text-to-speech, gesture stream generation, voice stream generation, gesture stream, voice stream, text stream, video stream synthesis, forwarding, etc.
  • “Media Server” and “Third Party Service Components” perform corresponding services;
• The media server sends different stream information (synthesized and non-synthesized) to the different terminal types of UE A and UE B, including voice streams, video streams, gesture streams, text streams, etc.;
  • the "media server” returns operation responses such as gesture recognition, gesture stream, text stream, voice stream, etc. to the "service control node".
• Gesture user: terminal type 1, with a dedicated data channel.
• Non-gesture user: terminal type 1, with a dedicated data channel.
• Fig. 4 is a first example of a gesture communication method according to a specific embodiment of the present disclosure. In this embodiment, the gesture user UE A of the terminal (type 1) calls the non-gesture user UE B of the terminal (type 1), taking a video call as an example for illustration:
• Step S201: The gesture user UE A of the terminal (type 1) carries the terminal identifier to initiate a video call to the SBC/P-CSCF and calls the non-gesture user UE B; the Invite carries the SDP information for the terminal's video and audio.
  • Step S202 SBC/P-CSCF transparently transmits the Invite call information to I/S-CSCF;
  • Step S203 The I/S-CSCF finds the service control node corresponding to the user, and sends call information to it;
  • Steps S204-S206 make a video call to the non-gesture user UE B of the terminal (type 1);
• Steps S207-S218: UE B sends a 200 OK message carrying the terminal identifier and answers by going off-hook; UE A returns an ACK message; UE A and UE B establish a video call.
• Steps S219-S229: UE A applies for the creation of data channel resources. UE A needs gesture recognition and sends an Invite request carrying the dedicated data channel SDP, which reaches the "service control node" through the SBC/P-CSCF and I/S-CSCF; the "service control node" applies to the "media server" to create a data channel for UE A; the "media server" returns to the "service control node" that the creation of the data channel is complete.
  • Step S230 UE A initiates a gesture recognition conversion request through the data channel
  • Step S231 the "application control node” instructs the “service control node” to create gesture recognition resources
  • Step S232 the "service control node” instructs the "media server” to create a mixed media service, which needs to use the gesture recognition service;
  • Step S233 the "media server” applies to the "third-party service component” for a gesture recognition service
  • Step S234 the "media server” returns to the "service control node” the success of creating the mixed media service
• Steps S235-S246: The "service control node" invites UE B to join the conference and applies for mixed media resources for UE B; the "service control node" sends a Reinvite message carrying SDP to UE B; UE B returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE B. The media of UE B is anchored to the media server.
• Steps S247-S258: The "service control node" invites UE A to join the conference and applies for mixed media resources for UE A; the "service control node" sends a Reinvite message carrying SDP to UE A; UE A returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE A. The media of UE A is anchored to the media server.
  • Step S259 The "service control node” applies to the “media server” for gesture translation service types and synthesis processing;
• Step S260: The "media server" applies to the "third-party service component" for services such as voice-to-text processing of terminal data, gesture image recognition for feature data extraction, real-time gesture stream generation, real-time media stream generation, the synthesis service, real-time audio/video stream forwarding, and data stream forwarding.
• Steps S261-S264: The "media server" sends the gesture stream, text stream, voice stream, and video stream media information to UE A. The media stream information can be sent from the "media server" through the "service control node" and "application control node" to the SBC/P-CSCF and then to the terminal; it can also be sent from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal.
  • Step S265 The "media server” applies to the "third-party service component” for gesture translation, synthesis and forwarding services;
• Steps S266-S268: The "media server" sends the voice stream, text stream, and video stream media information to UE B. The media stream information can be sent from the "media server" through the "service control node" and "application control node" to the SBC/P-CSCF and then to the terminal; it can also be sent from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal.
  • Step S269 The "media server” returns operation responses such as gesture recognition, gesture stream, text stream, voice stream, etc. to the "service control node".
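The service-creation handshake in steps S230-S234 is essentially a chain of delegated requests: data channel, to application control node, to service control node, to media server, to third-party component. The classes below are stand-ins for those network elements, sketched under the assumption that each element simply forwards the request and aggregates the result; none of these class names or methods come from the disclosure.

```python
# Toy model of the mixed-media-service creation chain (steps S230-S234).

class ThirdPartyComponent:
    def create_gesture_recognition(self):
        return "gesture-recognition-ready"          # S233

class MediaServer:
    def __init__(self, third_party):
        self.third_party = third_party
    def create_mixed_media_service(self):           # S232
        status = self.third_party.create_gesture_recognition()
        return {"mixed_media": "created", "gesture_recognition": status}

class ServiceControlNode:
    def __init__(self, media_server):
        self.media_server = media_server
    def create_gesture_resources(self):             # S231
        return self.media_server.create_gesture_resources_impl() \
            if False else self.media_server.create_mixed_media_service()

class ApplicationControlNode:
    def __init__(self, service_control):
        self.service_control = service_control
    def on_gesture_request(self):                   # S230, via the data channel
        return self.service_control.create_gesture_resources()

result = ApplicationControlNode(
    ServiceControlNode(MediaServer(ThirdPartyComponent()))).on_gesture_request()
```

Step S234's "creation success" response corresponds to the dictionary returned back up the chain.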
• Non-gesture user: terminal type 2, without a dedicated data channel.
• Gesture user: terminal type 1, with a dedicated data channel.
• Fig. 5 is a second example of a gesture communication method according to a specific embodiment of the present disclosure. In this embodiment, the non-gesture user UE A uses terminal type 2 (no dedicated data channel) and the gesture user UE B uses terminal type 1 (with a dedicated data channel):
• Step S301: The non-gesture user UE A of the terminal (type 2) carries the terminal identifier to initiate a video call to the SBC/P-CSCF and calls the gesture user UE B; the Invite carries the SDP information for the terminal's video and audio.
  • Step S302 SBC/P-CSCF transparently transmits the Invite call information to I/S-CSCF;
  • Step S303 The I/S-CSCF finds the service control node corresponding to the user, and sends call information to it;
  • Steps S304-S306 video call to the gesture user UE B of the terminal (type 1);
• Steps S307-S318: UE B sends a 200 OK message carrying the terminal identifier and answers by going off-hook; UE A returns an ACK message; UE A and UE B establish a video call.
• Steps S319-S329: UE B applies for the creation of data channel resources. UE B needs gesture recognition and sends an Invite request carrying the dedicated data channel SDP, which reaches the "service control node" through the SBC/P-CSCF and I/S-CSCF; the "service control node" applies to the "media server" to create a data channel for UE B; the "media server" returns to the "service control node" that the creation of the data channel is complete.
  • Step S330 UE B initiates a gesture recognition conversion request through the data channel
  • Step S331 the "application control node” instructs the “service control node” to create gesture recognition resources
  • Step S332 the "service control node” instructs the "media server” to create a mixed media service, which needs to use the gesture recognition service;
  • Step S333 the "media server” applies to the "third-party service component” for a gesture recognition service
• Step S334: The "media server" returns to the "service control node" that the mixed-media service was created successfully.
• Steps S335-S346: The "service control node" invites UE A to join the conference and applies for mixed media resources for UE A; the "service control node" sends a Reinvite message carrying SDP to UE A; UE A returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE A. The media of UE A is anchored to the media server.
• Steps S347-S358: The "service control node" invites UE B to join the conference and applies for mixed media resources for UE B; the "service control node" sends a Reinvite message carrying SDP to UE B; UE B returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE B. The media of UE B is anchored to the media server.
  • Step S359 The "service control node” applies to the “media server” for gesture translation service types and synthesis processing;
• Step S360: The "media server" applies to the "third-party service component" for gesture translation, synthesis and forwarding services, voice-to-text processing of terminal data, gesture image recognition for feature data extraction, real-time gesture stream generation, real-time media stream generation, the synthesis service, real-time audio/video stream forwarding, data stream forwarding, and so on.
• Steps S361-S362: The "media server" sends to UE A the real-time voice stream converted from gestures and the media stream information including the video stream synthesized from video and text. The media stream information can be sent from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal; it can also be sent from the "media server" through the "service control node" and "application control node" to the SBC/P-CSCF and then to the terminal.
  • Step S363 The "media server” applies to the "third-party service component” for gesture stream generation, translation, synthesis and forwarding services;
• Steps S364-S367: The "media server" sends the gesture stream, voice stream, text stream, and video stream media information to UE B. The media stream information can be sent from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal; it can also be sent from the "media server" through the "service control node" and "application control node" to the SBC/P-CSCF and then to the terminal.
  • Step S368 The "media server” returns operation responses such as gesture recognition, gesture stream, text stream, voice stream, etc. to the "service control node".
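The reverse direction used in this scenario (converting the non-gesture user's speech into a gesture stream plus text stream for the gesture user) can be sketched as a text-to-gesture lookup after speech recognition. The phrase table and gesture identifiers below are invented for illustration; a real system would use a full sign-language generation model.

```python
def voice_to_gesture_stream(voice_text):
    """Illustrative speech-to-gesture conversion: speech-recognized text is
    mapped to a sequence of gesture animation identifiers, and the original
    text is kept as the accompanying text stream."""
    TEXT_TO_GESTURE = {          # hypothetical phrase-to-animation table
        "hello": ["wave"],
        "thank you": ["flat-hand-from-chin"],
    }
    gestures = []
    for phrase, animation in TEXT_TO_GESTURE.items():
        if phrase in voice_text.lower():
            gestures.extend(animation)
    return {"gesture_stream": gestures, "text_stream": voice_text}
```

The media server would render the gesture identifiers into a gesture-stream video and deliver it, with the text stream, over the gesture user's dedicated data channel.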
• Gesture user: terminal type 2, without a dedicated data channel.
• Non-gesture user: terminal type 1, with a dedicated data channel.
• Fig. 6 is a third example of a gesture communication method according to a specific embodiment of the present disclosure. In this embodiment, the gesture user UE A uses terminal type 2 (no dedicated data channel) and the non-gesture user UE B uses terminal type 1 (with a dedicated data channel):
• Step S401: The gesture user UE A of the terminal (type 2) carries the terminal identifier to initiate a video call to the SBC/P-CSCF and calls the non-gesture user UE B; the Invite carries the SDP information for the terminal's video and audio.
  • Step S402 SBC/P-CSCF transparently transmits the Invite call information to I/S-CSCF;
  • Step S403 The I/S-CSCF finds the service control node corresponding to the user, and sends call information to it;
  • Steps S404-S406 make a video call to the non-gesture user UE B of the terminal (type 1);
• Steps S407-S418: UE B sends a 200 OK message carrying the terminal identifier and answers by going off-hook; UE A returns an ACK message; UE A and UE B establish a video call.
• Steps S419-S429: UE B applies for the creation of data channel resources. UE B needs gesture recognition and sends an Invite request carrying the dedicated data channel SDP, which reaches the "service control node" through the SBC/P-CSCF and I/S-CSCF; the "service control node" applies to the "media server" to create a data channel for UE B; the "media server" returns to the "service control node" that the creation of the data channel is complete.
  • Step S430 UE B initiates a gesture recognition conversion request through the data channel
  • Step S431 the "application control node” instructs the “service control node” to create gesture recognition resources
  • Step S432 the "service control node” instructs the "media server” to create a mixed media service, which needs to use the gesture recognition service;
  • Step S433 the "media server” applies to the "third-party service component” for a gesture recognition service
  • Step S434 the "media server” returns to the "service control node” the success of creating the mixed media service
• Steps S435-S446: The "service control node" invites UE A to join the conference and applies for mixed media resources for UE A; the "service control node" sends a Reinvite message carrying SDP to UE A; UE A returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE A. The media of UE A is anchored to the media server.
• Steps S447-S458: The "service control node" invites UE B to join the conference and applies for mixed media resources for UE B; the "service control node" sends a Reinvite message carrying SDP to UE B; UE B returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE B. The media of UE B is anchored to the media server.
  • Step S459 The "service control node” applies to the “media server” for gesture translation service types and synthesis processing;
• Step S460: The "media server" applies to the "third-party service component" for gesture translation, gesture stream generation, synthesis and forwarding services, speech-to-text processing of terminal data, gesture image recognition for feature data extraction, real-time gesture stream generation, real-time media stream generation, the synthesis service, real-time audio/video stream forwarding, data stream forwarding, and so on.
• Steps S461-S462: The "media server" sends to UE A the real-time voice stream converted from gestures and the media stream information containing the video stream synthesized from video and text. The media stream information can be sent from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal; it can also be sent from the "media server" through the "service control node" and "application control node" to the SBC/P-CSCF and then to the terminal.
  • Step S463 The "media server” applies to the "third-party service component” for gesture stream generation, translation, synthesis and forwarding services;
• Steps S464-S466: The "media server" sends the voice stream, text stream, and video stream media information to UE B. The media stream information can be sent from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal; it can also be sent from the "media server" through the "service control node" and "application control node" to the SBC/P-CSCF and then to the terminal.
  • Step S467 The "media server” returns operation responses such as gesture recognition, gesture stream, text stream, voice stream, etc. to the "service control node".
• Gesture user: terminal type 1, with a dedicated data channel.
• Non-gesture user: terminal type 1, with a dedicated data channel.
• Fig. 7 is a fourth example of a gesture communication method according to a specific embodiment of the present disclosure. As shown in Fig. 7, in this embodiment, the gesture user UE A of the terminal (type 1) calls the non-gesture user UE B of the terminal (type 1), taking an audio call as an example for illustration:
• Step S501: The gesture user UE A of the terminal (type 1) carries the terminal identifier to initiate an audio call to the SBC/P-CSCF and calls the non-gesture user UE B; the Invite carries the SDP information for the terminal's audio.
  • Step S502 SBC/P-CSCF transparently transmits the Invite call information to I/S-CSCF;
  • Step S503 The I/S-CSCF finds the service control node corresponding to the user, and sends call information to it;
  • Steps S504-S506 making an audio call to the non-gesture user UE B of the terminal (type 1);
• Steps S507-S518: UE B sends a 200 OK message carrying the terminal identifier and answers by going off-hook; UE A returns an ACK message; UE A and UE B establish an audio call.
• Steps S519-S529: UE A activates the gesture recognition application, opens the camera, and applies for the creation of data channel resources. UE A needs gesture recognition and sends an Invite request carrying the dedicated data channel SDP, which reaches the "service control node" through the SBC/P-CSCF and I/S-CSCF; the "service control node" applies to the "media server" to create a data channel for UE A; the "media server" returns to the "service control node" that the creation of the data channel is complete; the gesture recognition application collects gesture data.
  • Step S530 UE A initiates a gesture recognition conversion request through the data channel
  • Step S531 the "application control node” instructs the “service control node” to create gesture recognition resources
  • Step S532 the "service control node” instructs the "media server” to create a mixed media service, which needs to use the gesture recognition service;
  • Step S533 The "media server” applies to the "third-party service component” for a gesture recognition service
  • Step S534 the "media server” returns to the "service control node” the success of creating the mixed media service
• Steps S535-S546: The "service control node" invites UE B to join the conference and applies for mixed media resources for UE B; the "service control node" sends a Reinvite message carrying SDP to UE B; UE B returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE B. The media of UE B is anchored to the media server.
• Steps S547-S558: The "service control node" invites UE A to join the conference and applies for mixed media resources for UE A; the "service control node" sends a Reinvite message carrying SDP to UE A; UE A returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE A. The media of UE A is anchored to the media server.
  • Step S559 The "service control node” applies to the “media server” for gesture translation service types and synthesis processing;
  • Step S560 The "media server” applies to the "third-party service component" for voice-to-text processing of terminal data, gesture image recognition for feature data extraction, real-time gesture stream generation, real-time media stream generation, synthesis service, real-time audio stream forwarding, Services such as data flow forwarding;
• Steps S561-S563: The "media server" sends the gesture stream, text stream, and voice stream media information to UE A. The media stream information can be sent from the "media server" through the "service control node" and "application control node" to the SBC/P-CSCF and then to the terminal; it can also be sent from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal.
  • Step S564 The "media server” applies to the "third-party service component” for gesture translation stream synthesis and forwarding service;
• Steps S565-S566: The "media server" sends the voice stream and text stream media information to UE B. The media stream information can be sent from the "media server" through the "service control node" and "application control node" to the SBC/P-CSCF and then to the terminal; it can also be sent from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal.
  • Step S567 The "media server” returns operation responses such as gesture recognition, gesture stream, text stream, voice stream, etc. to the "service control node".
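On the terminal side of this audio-call case, the gesture recognition application samples camera frames and ships conversion requests over the dedicated data channel while the audio call continues unchanged. The batching sketch below is hypothetical; `send_on_data_channel` is an assumed transport hook, not an API from the disclosure.

```python
def collect_and_send(frames, send_on_data_channel, batch_size=4):
    """Batch captured camera frames into gesture-recognition conversion
    requests and ship each batch over the dedicated data channel.

    frames:               opaque captured frames from the gesture application
    send_on_data_channel: callable taking one request dict (assumed hook)
    """
    requests = []
    for i in range(0, len(frames), batch_size):
        request = {
            "type": "gesture-recognition",      # conversion request (cf. S530)
            "frames": frames[i:i + batch_size],
        }
        send_on_data_channel(request)
        requests.append(request)
    return requests
```

The network side would answer each request with the recognized text/voice streams described in steps S561-S566.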
• Non-gesture user: terminal type 2, without a dedicated data channel.
• Gesture user: terminal type 1, with a dedicated data channel.
• Fig. 8 is a fifth example of a gesture communication method according to a specific embodiment of the present disclosure. In this embodiment, the non-gesture user UE A uses terminal type 2 (no dedicated data channel) and the gesture user UE B uses terminal type 1 (with a dedicated data channel):
• Step S601: The non-gesture user UE A of the terminal (type 2) carries the terminal identifier to initiate an audio call to the SBC/P-CSCF and calls the gesture user UE B; the Invite carries the SDP information for the terminal's audio.
  • Step S602 SBC/P-CSCF transparently transmits the Invite call information to I/S-CSCF;
  • Step S603 The I/S-CSCF finds the service control node corresponding to the user, and sends call information to it;
  • Steps S604-S606 audio call to the gesture user UE B of the terminal (type 1);
  • Steps S607-S618 UE B sends a 200OK message carrying the terminal ID, and answers by off-hook; UE A returns an ACK message; UE A and UE B establish an audio call;
  • Steps S619-S629 UE B activates the gesture recognition application, turns on the camera, and applies for creating a data channel resource; UE B needs gesture recognition, sends an Invite request carrying a dedicated data channel SDP data channel, and passes through SBC/P-CSCF, I/S -CSCF, reach "service control node”; “service control node” applies to “media server” to create UE B data channel; “media server” returns data channel creation to "service control node”; gesture recognition application will collect gesture data ;
  • Step S630: UE B initiates a gesture recognition conversion request through the data channel.
  • Step S631: The "application control node" instructs the "service control node" to create gesture recognition resources.
  • Step S632: The "service control node" instructs the "media server" to create a mixed media service, which needs to use the gesture recognition service.
  • Step S633: The "media server" applies to the "third-party service component" for the gesture recognition service.
  • Step S634: The "media server" returns to the "service control node" that creation of the mixed media service succeeded.
  • Steps S635-S646: The "service control node" invites UE A to join the conference and applies for mixed media resources for UE A. The "service control node" sends a Reinvite message carrying the SDP to UE A; UE A returns a 200 OK message carrying its SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE A; the media of UE A is anchored to the media server.
  • Steps S647-S658: The "service control node" invites UE B to join the conference and applies for mixed media resources for UE B. The "service control node" sends a Reinvite message carrying the SDP to UE B; UE B returns a 200 OK message carrying its SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE B; the media of UE B is anchored to the media server.
  • Step S659: The "service control node" applies to the "media server" for the gesture translation service types and synthesis processing.
  • Step S660: The "media server" applies to the "third-party service component" for gesture translation and forwarding services: voice-to-text processing of terminal data, gesture image recognition for feature data extraction, real-time gesture stream generation, real-time media stream generation, synthesis services, real-time audio stream forwarding, data stream forwarding, and so on.
  • Step S661: The "media server" sends to UE A the media stream information of the real-time voice stream converted from the gestures. The media stream information can travel from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal, or from the "media server" through the "service control node" and "application control node" to the SBC/P-CSCF and then to the terminal.
  • Step S662: The "media server" applies to the "third-party service component" for gesture stream generation, translation, synthesis, and forwarding services.
  • Steps S663-S665: The "media server" sends the media stream information of the gesture stream, voice stream, and text stream to UE B. The media stream information can travel from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal, or from the "media server" through the "service control node" and "application control node" to the SBC/P-CSCF and then to the terminal.
  • Step S666: The "media server" returns operation responses for gesture recognition, the gesture stream, the text stream, the voice stream, etc. to the "service control node".
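Condensed, the Fig. 8 flow falls into the following step groups. The grouping and its labels are our summary of the steps above, not the patent's own terminology.

```python
# Condensed, illustrative ordering of the Fig. 8 flow (audio call between
# a type-2 non-gesture user and a type-1 gesture user). Step groups only.

FIG8_PHASES = [
    ("S601-S618", "establish audio call between UE A and UE B"),
    ("S619-S629", "UE B creates dedicated data channel, starts gesture capture"),
    ("S630-S634", "gesture recognition requested; mixed media service created"),
    ("S635-S658", "UE A and UE B re-invited; media anchored on media server"),
    ("S659-S660", "gesture translation, synthesis and forwarding applied for"),
    ("S661-S666", "converted streams delivered to UE A and UE B"),
]

def phase_of(step: int) -> str:
    """Map a step number (e.g. 630) to its phase description."""
    for rng, desc in FIG8_PHASES:
        lo, hi = (int(s.lstrip("S")) for s in rng.split("-"))
        if lo <= step <= hi:
            return desc
    raise ValueError(f"unknown step S{step}")
```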
  • The goals that can be achieved include: 1) transmitting gesture information over a dedicated data channel; 2) reducing the requirements on the terminal by performing gesture recognition on the network side: the terminal only needs to be an ordinary device with an integrated collection capability, such as a common mobile phone, which the gesture recognition application can instruct, when an IMS call is established, to collect gestures as required; the collected gesture-related information is transmitted through the dedicated channel to initiate a gesture recognition request to the gesture recognition application server; 3) providing comprehensive services on the platform side, including gesture recognition, analysis, and synthesis, with service information transmitted through the dedicated channel; 4) supporting two-way conversion between sign language and voice/video: the platform recognizes, analyzes, and processes the data, synthesizes it, and after processing and rendering produces the translated text, standard sign-language video, and the original voice/video stream; 5) supporting conversion of communication content between different terminal types: the information flows between terminals are converted so that gesture communication is achieved between different types of terminals.
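Goal 4) above, two-way conversion between sign language and voice/video, can be pictured as a dispatch table. The converter names and the string-tagged outputs below are purely illustrative stand-ins for the platform-side services; the patent assigns the actual work to the media server and third-party service components.

```python
# Illustrative dispatch for two-way sign/voice conversion (goal 4).
# Each converter is a placeholder that tags its payload.

CONVERTERS = {
    ("sign", "voice"): lambda p: f"voice<{p}>",
    ("sign", "text"): lambda p: f"text<{p}>",
    ("voice", "sign"): lambda p: f"sign_video<{p}>",
    ("voice", "text"): lambda p: f"text<{p}>",
}

def convert(src: str, dst: str, payload: str) -> str:
    """Dispatch a conversion; unsupported directions raise ValueError."""
    try:
        return CONVERTERS[(src, dst)](payload)
    except KeyError:
        raise ValueError(f"unsupported conversion {src}->{dst}")
```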
  • the terminal type supporting the data channel can be an independent application program or
  • The achievable effects include: (1) Real-time interaction: user communication is economical, convenient, usable, and effective. The system uses dedicated channels of 5G and 6G networks and a network-side mixed media mode to transmit multiple service streams simultaneously, realizing gesture communication that is economical, convenient, and rich in experience. Communication no longer relies on wearable devices with special features; traditional gesture recognition that depends on wearable devices is expensive and only suitable for interactions within a limited range. (2) Rich services: the platform side provides comprehensive services and can connect to third-party service components, allowing service expansion; interactive and immersive calls can be provided under the new architecture. (3) Good security: data between the terminal and the network is transmitted through encrypted channels to prevent information leakage. (4) Support for conversion of communication content between different terminal types: the platform side identifies different types of terminals and converts the information flows between them, realizing gesture communication between different terminal types.
  • The specific beneficial effects include at least: 1) When a gesture user using terminal type 1 makes a video call with a non-gesture user (using terminal type 1 or 2), where the call may be established by the gesture user dialing the non-gesture user or by the non-gesture user dialing the gesture user, the gesture user, or the non-gesture user using terminal type 1, can apply for gesture recognition conversion. The gesture user can receive and see the standard gesture stream video, text, original voice, and original video converted from the other party's voice; the non-gesture user can hear and see the voice, text, and original call video converted from the gesture user's gestures. When the non-gesture user uses terminal type 1, the non-gesture user receives the voice stream, text stream, and original video stream; when the non-gesture user uses terminal type 2, what the non-gesture user receives is voice 2) When the gesture user uses terminal type 2 and the non-gesture user (using terminal
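As we read case 1) above, the streams delivered depend on the recipient's role and terminal type. The mapping below records only the combinations the text states explicitly; the terminal-type-2 entry is cut off in the source, so only the voice stream is listed for it and it is marked partial.

```python
# Sketch of the stream sets delivered in case 1), as stated in the text.
# Key: (recipient role, recipient terminal type) -> streams received.

DELIVERED = {
    ("gesture_user", 1): {"gesture_stream_video", "text", "original_voice",
                          "original_video"},
    ("non_gesture_user", 1): {"voice_stream", "text_stream",
                              "original_video_stream"},
    ("non_gesture_user", 2): {"voice_stream"},  # source text truncated here
}

def streams_for(role: str, terminal_type: int) -> set:
    """Look up the stream set a recipient receives in case 1)."""
    return DELIVERED[(role, terminal_type)]
```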
  • The emergence of fifth-generation communication technology provides users with mobile networks of higher bandwidth, lower latency, and wider coverage, and can support more applications such as webcasting, virtual reality, and 4K video.
  • 5G technology will face five main application scenarios: 1) ultra-high-speed scenarios, providing ultra-fast data network access for future mobile broadband users; 2) support for large-scale crowds, providing a high-quality mobile broadband experience in areas or occasions with high population density; 3) the best experience anytime and anywhere, ensuring that users can still enjoy high-quality services while mobile; 4) ultra-reliable real-time connections, ensuring that new applications and use cases meet strict standards for delay and reliability; 5) ubiquitous machine-type communication, ensuring efficient handling of a large number of diverse device communications, including machine-type devices and sensors.
  • 3GPP Third Generation Partnership Project
  • IMS IP Multimedia Subsystem, Internet Protocol Multimedia Subsystem
  • Data Channel Data Channel
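Where the flows above create a dedicated data channel, the channel is negotiated via SDP in the Invite. The patent does not specify the SDP contents; the sketch below is a hypothetical offer modeled on SDP negotiation for SCTP data channels (RFC 8841 and RFC 8864), with a placeholder origin address and an assumed stream label.

```python
# Hypothetical sketch of the dedicated data-channel SDP that a terminal's
# Invite might carry. Attribute names follow RFC 8841 (sctp-port) and
# RFC 8864 (dcmap); the patent itself leaves the SDP unspecified.

def build_datachannel_sdp(port: int = 10000) -> str:
    lines = [
        "v=0",
        "o=- 0 0 IN IP4 198.51.100.1",   # placeholder origin address
        "s=-",
        "t=0 0",
        f"m=application {port} UDP/DTLS/SCTP webrtc-datachannel",
        "a=sctp-port:5000",
        'a=dcmap:1 label="gesture-recognition"',  # assumed stream label
    ]
    return "\r\n".join(lines) + "\r\n"
```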
  • A system and method for realizing gesture communication using a dedicated data channel and a mixed media approach are provided, applicable to 5G and 6G networks. The following problems of gesture recognition or gesture translation in the related art can be avoided: 1) most existing collection functions are provided by specific wearable devices on the terminal side; these devices are expensive, only suitable for interactions within a limited range, and subject to time and space constraints, so they are not economical, convenient, or usable, and do not allow direct, natural interaction and communication; 2) some system functions such as gesture recognition, translation, and synthesis are provided by the terminal side, which places high requirements on the terminal; since gesture recognition, translation, and synthesis are not provided by the network side, information updates are not timely; 3) conversion between different terminal types cannot be realized; 4) some technologies require both communication parties to be in a video call to realize gesture communication, with the platform side packaging the gesture content and sending it back to the terminal, which forwards it to the terminal on the other side; they cannot realize gesture communication during a voice call.
  • The terminal can turn on the camera through the "gesture recognition application" on the terminal side. During a call, the terminal can open a menu containing the gesture recognition function and initiate a gesture recognition request; the terminal receives the video, gesture, and text information sent over the data channel, and these contents are displayed synchronously on the local mobile phone.
  • FIG. 9 is a structural block diagram of a gesture communication device according to an embodiment of the present disclosure. As shown in FIG. 9 , the device includes:
  • The first acquiring module 902 is configured to acquire a first request sent by the first terminal or the second terminal when the first terminal and the second terminal make a video call or an audio call, where the first request is used to request recognition of gestures collected by the first terminal during the video call or audio call;
  • the first creation module 904 is configured to create a gesture recognition service in response to the first request, where the gesture recognition service is used to recognize the gesture collected by the first terminal;
  • the second obtaining module 906 is configured to obtain a group of gestures identified in a group of video frames collected by the first terminal during the video call or audio call;
  • the recognition module 908 is configured to perform semantic recognition on a group of gestures recognized in a group of video frames collected by the first terminal through the gesture recognition service, and obtain the target semantics represented by the group of gestures;
  • the first sending module 910 is configured to send the target semantics to the second terminal.
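A minimal sketch of how modules 902-910 chain together is given below. The dictionary recognizer is a stand-in for the network-side gesture recognition service; the class and method names are ours, not the patent's.

```python
# Minimal sketch of the module pipeline in Fig. 9 (modules 902-910).
# A dict maps each recognized gesture to its semantics, standing in for
# the real network-side recognition service.

class GestureCommunicationDevice:
    def __init__(self, recognizer):
        self.recognizer = recognizer      # stands in for module 908
        self.service_created = False

    def create_service(self, request):    # modules 902 / 904
        if request == "gesture-recognition":
            self.service_created = True

    def process(self, gestures):          # modules 906 / 908
        assert self.service_created, "service must be created first"
        return " ".join(self.recognizer[g] for g in gestures)

    def send(self, semantics, outbox):    # module 910
        outbox.append(semantics)

recognizer = {"wave": "hello", "point_self": "I", "thumbs_up": "agree"}
dev = GestureCommunicationDevice(recognizer)
dev.create_service("gesture-recognition")
outbox = []
dev.send(dev.process(["wave", "point_self", "thumbs_up"]), outbox)
```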
  • The above-mentioned device further includes: a third obtaining module 1002 and a second creating module 1004, as shown in FIG. 10, which is a preferred structural block diagram of a gesture communication device according to an embodiment of the present disclosure.
  • the third obtaining module 1002 is configured to obtain a second request sent by the first terminal or the second terminal, wherein the second request is used to request to create a target data channel;
  • The second creating module 1004 is configured to create the target data channel in response to the second request, where the target data channel is a channel that the first terminal or the second terminal is allowed to use;
  • The first obtaining module 902 includes: a first acquiring unit configured to acquire the first request transmitted by the first terminal or the second terminal on the target data channel.
  • The third obtaining module 1002 includes: a second obtaining unit configured to obtain the second request sent by the first terminal or the second terminal to the media server via the access control entity SBC/P-CSCF, the session control entity I/S-CSCF, and the service control node;
  • The above-mentioned second creation module 1004 includes: a first creation unit configured to create the target data channel through the media server in response to the second request, where the target data channel is used to transmit data between the first terminal or the second terminal and the media server.
  • The above-mentioned first acquiring unit includes: a first acquiring subunit configured to acquire the first request transmitted by the first terminal or the second terminal to the application control node on the target data channel;
  • The first creation module 904 includes: a first processing unit configured to send a first instruction to the service control node through the application control node, where the first instruction is used to instruct the service control node to send a second instruction to the media server, the second instruction being used to instruct the media server to create the gesture recognition service; and a second creation unit configured to, in response to the second instruction, create the gesture recognition service through the media server, or instruct a third-party service component through the media server to create the gesture recognition service.
  • The above-mentioned device further includes: a second sending module 1102 and a third creating module 1104, as shown in FIG. 11, which is a preferred structural block diagram of a gesture communication device according to an embodiment of the present disclosure.
  • The second sending module 1102 is configured to send a third instruction to the media server through the service control node, where the third instruction is used to request creation of a mixed media service; the mixed media service is used to process the video stream, audio stream, and data stream in the video call, or to process the audio stream and data stream in the audio call, the data stream being a data stream representing the target semantics;
  • The third creation module 1104 is configured to create the mixed media service through the media server in response to the third instruction, or to instruct a third-party service component through the media server to create the mixed media service.
  • The recognition module 908 includes: a first recognition unit configured to perform, through the gesture recognition service, semantic recognition on the group of gestures recognized in a group of video frames collected by the first terminal to obtain one or more semantics, where each of the semantics is expressed by one or more gestures in the group of gestures; and a generation unit configured to generate, based on the one or more semantics, the target semantics corresponding to the group of gestures.
  • The above-mentioned first sending module 910 includes: a first sending unit configured to send each of the semantics included in the target semantics to the second terminal synchronously with the corresponding video frames in the group of video frames; or a synthesis unit configured to, when the target semantics is represented by a data stream corresponding to the group of video frames and the data stream is a text stream and an audio stream, synchronously synthesize the text stream with the corresponding video frames in the group of video frames to obtain a target video stream; and a second sending unit configured to send the target video stream and the audio stream to the second terminal synchronously.
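The synthesis unit's pairing of the text stream with the corresponding video frames can be sketched as timestamp matching. The timestamp and segment formats below are assumptions for illustration, not from the patent.

```python
# Illustrative synchronization of a text stream with video frames:
# each frame carries the text segment active at its timestamp, so the
# text can be rendered into the target video stream.

def synthesize(frames, text_segments):
    """frames: list of (timestamp, frame_id);
    text_segments: list of (start, end, text).
    Returns a list of (frame_id, caption-or-None)."""
    out = []
    for ts, fid in frames:
        caption = None
        for start, end, text in text_segments:
            if start <= ts < end:
                caption = text
                break
        out.append((fid, caption))
    return out

result = synthesize(
    [(0.0, "f0"), (0.5, "f1"), (1.2, "f2")],
    [(0.0, 1.0, "hello")],
)
```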
  • The above device further includes: a fourth acquisition module configured to, when the first terminal and the second terminal make the video call and both terminals support use of the target data channel, obtain the second request sent by the first terminal, where the second request is used to request creation of the target data channel; and a fourth creation module configured to create the target data channel in response to the second request, where the target data channel includes a first target data channel and a second target data channel, the first target data channel being the data channel between the first terminal and the media server and the second target data channel being the data channel between the second terminal and the media server. The first acquiring module 902 includes: a third acquiring unit configured to acquire the first request transmitted on the first target data channel. The above-mentioned first creation module 904 includes: a second processing unit configured to send a target instruction to the media server through a service control node in response to the first request, where the target instruction is used to request creation of a mixed media service and the gesture recognition service;
  • the above-mentioned device further includes: a first processing module configured to form the first group of video frames through the mixed media service The first video stream, the first audio stream formed by the first group of audio frames, and the first data stream used to represent the target semantics are synchronized to obtain the synchronized first video stream, the first audio stream and the first data stream;
  • the first sending module 910 includes: a third sending unit configured to send the synchronized first video stream, the first audio stream and the first data stream to The second terminal, wherein the synchronized first data stream is sent on the second target data channel.
  • The above device further includes: a fifth acquisition module configured to, when the first terminal and the second terminal make the video call, the first terminal supports use of the target data channel, and the second terminal does not support use of the target data channel, obtain a second request sent by the first terminal, where the second request is used to request creation of a target data channel;
  • The fifth creating module is configured to create the target data channel in response to the second request, where the target data channel is a data channel between the first terminal and the media server;
  • The first obtaining module 902 includes: a fifth obtaining unit configured to obtain the first request transmitted by the first terminal on the target data channel;
  • The above-mentioned first creation module 904 includes: a third processing unit configured to send a target instruction to the media server through a service control node in response to the first request, where the target instruction is used to request creation of a mixed media service, a synthesis service, and the gesture recognition service; the mixed media service is used for the video stream, audio stream, and
  • the second audio stream included in the data stream used to represent the target semantics is synchronized with the second video stream to obtain the synchronized second video stream and second audio stream, where the data stream includes the first text stream;
  • The first sending module 910 includes: a fourth sending unit configured to send the synchronized second video stream and second audio stream to the second terminal.
  • The above device further includes: a sixth acquisition module configured to, when the first terminal and the second terminal make the video call, the first terminal does not support use of the target data channel, and the second terminal supports use of the target data channel, obtain a second request sent by the second terminal, where the second request is used to request creation of a target data channel; and a sixth creation module configured to create the target data channel in response to the second request, where the target data channel is a data channel between the second terminal and the media server. The above-mentioned first obtaining module 902 includes: a seventh obtaining unit configured to obtain the first request transmitted by the second terminal on the target data channel. The above-mentioned first creation module 904 includes: a fourth processing unit configured to send a target instruction to the media server through a service control node in response to the first request, where the target instruction is used to request creation of a mixed media service and the gesture recognition service, and the mixed media service is used for the video stream and audio stream in the video call and
  • The above device further includes: a seventh acquisition module configured to, when the first terminal and the second terminal make the audio call and both terminals support use of the target data channel, obtain the second request sent by the first terminal, where the second request is used to request creation of the target data channel; and a seventh creation module configured to create the target data channel in response to the second request, where the target data channel includes a first target data channel and a second target data channel, the first target data channel being the data channel between the first terminal and the media server and the second target data channel being the data channel between the second terminal and the media server. The first acquisition module 902 includes: a ninth acquisition unit configured to acquire the first request transmitted on the first target data channel. The above-mentioned first creation module 904 includes: a fifth processing unit configured to send a target instruction to the media server through a service control node in response to the first request, where the target instruction is used to request creation of a mixed media service and the gesture recognition service, and
  • The above device further includes: an eighth acquisition module configured to, when the first terminal and the second terminal make the audio call, the first terminal supports use of the target data channel, and the second terminal does not support use of the target data channel, obtain a second request sent by the first terminal, where the second request is used to request creation of a target data channel; and an eighth creating module configured to create the target data channel in response to the second request, where the target data channel is a data channel between the first terminal and the media server;
  • The first obtaining module 902 includes: an eleventh acquisition unit configured to acquire the first request transmitted by the first terminal on the target data channel;
  • The first creation module 904 includes: a sixth processing unit configured to send a target instruction to the media server through the service control node in response to the first request, where the target instruction is used to request creation of the gesture recognition service; and a seventh creation unit configured to create the gesture recognition service through the media server, or to instruct the third-party service component through the media server to create the gesture recognition service;
  • The above-mentioned first sending module 910 includes: a seventh sending unit configured to send the fifth audio stream representing the target semantics to the second terminal.
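For the audio-call case just described, the recognized target semantics ends up as an audio stream (the "fifth audio stream") delivered to the second terminal. The sketch below uses a placeholder for the network-side text-to-speech step; no real TTS API is implied.

```python
# Sketch of the audio-call path: recognized target semantics is rendered
# as an audio stream for the second terminal. The encoding is a
# placeholder standing in for network-side text-to-speech.

def semantics_to_audio_stream(semantics: str) -> bytes:
    """Stand-in for text-to-speech: tag the recognized text as an
    audio payload."""
    return b"AUDIO:" + semantics.encode("utf-8")

def deliver_audio_call(gesture_semantics: str, send):
    """Send the synthesized audio stream via the provided transport."""
    send(semantics_to_audio_stream(gesture_semantics))

captured = []
deliver_audio_call("hello", captured.append)
```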
  • The above-mentioned modules can be implemented in software or hardware. In the latter case, this may be achieved in the following manner, although implementations are not limited to it: the above-mentioned modules are all located in the same processor, or the above-mentioned modules are distributed, in any combination, across different processors.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the above method embodiments when running.
  • The above-mentioned computer-readable storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a mobile hard disk, a magnetic disk, an optical disk, or other media that can store a computer program.
  • ROM read-only memory
  • RAM random access memory
  • Embodiments of the present disclosure also provide an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
  • the electronic device may further include a transmission device and an input and output device, wherein the transmission device is connected to the processor, and the input and output device is connected to the processor.
  • Each module or step of the present disclosure described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed across a network composed of multiple computing devices. They can be implemented in program code executable by a computing device, and thus can be stored in a storage device to be executed by a computing device; in some cases, the steps shown or described can be executed in a different order, or they can be fabricated as individual integrated circuit modules, or multiple modules or steps among them can be fabricated as a single integrated circuit module. As such, the present disclosure is not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Embodiments of the present disclosure provide a gesture-based communication method and apparatus, a storage medium, and an electronic apparatus. The method comprises: when a first terminal and a second terminal make a video call or an audio call, acquiring a first request sent by the first terminal or the second terminal, the first request being used for requesting creation of a gesture recognition service; in response to the first request, creating the gesture recognition service; acquiring a group of gestures recognized from a group of video frames collected by the first terminal; performing, by means of the gesture recognition service, semantic recognition on the group of gestures recognized from the group of video frames collected by the first terminal to obtain target semantics represented by the group of gestures; and sending the target semantics to the second terminal.
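The five steps of the abstract can be compressed into one illustrative function. The request string, the recognizer mapping, and the inbox are stand-ins, since the patent performs these steps on network-side nodes rather than in terminal code.

```python
# Compressed sketch of the five method steps in the abstract:
# acquire request -> create service -> acquire gestures ->
# recognize semantics -> send to second terminal.

def gesture_call(first_request, gestures, recognizer, second_terminal_inbox):
    # Steps 1-2: acquire the first request and create the recognition service
    service = recognizer if first_request == "create-gesture-recognition" else None
    if service is None:
        raise ValueError("no gesture recognition service requested")
    # Steps 3-4: take the recognized gestures and perform semantic recognition
    target_semantics = " ".join(service[g] for g in gestures)
    # Step 5: send the target semantics to the second terminal
    second_terminal_inbox.append(target_semantics)
    return target_semantics
```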

Description

Gesture communication method, device, storage medium and electronic device

Technical Field
The present disclosure relates to the communication field, and in particular, to a gesture communication method, device, storage medium and electronic device.
Background Art
Gestures are often used in daily life. Gesture users, such as deaf-mute people, face great obstacles when communicating with hearing people. Their gestures, as a communication language (sign language), are extremely difficult to understand, and non-professionals can hardly recognize a deaf-mute person's gestures accurately: when deaf-mute users dial various public service numbers (119, 110, 120, etc.), public service personnel cannot directly understand what the deaf-mute users want to express; when deaf-mute users participate in online teaching, they cannot interact with teachers in real time in a simple way; and deaf-mute users cannot communicate directly and normally with hearing users on the phone. This requires recognition and translation of the gestures (sign language) of deaf-mute people, and transmission of the resulting communication. Gesture users in certain specific application scenarios, such as military sign language or sign language dedicated to special industries, also need corresponding recognition and translation.
At present, however, most gesture recognition relies on specific equipment such as wearable gloves. These devices are expensive, only suitable for interaction within a limited range, and often subject to time and space constraints, so they do not provide direct, natural interaction and communication. Some vision-based gesture recognition relies on specific collectors, such as somatosensory sensors, to collect and analyze gesture data for basic phone calls; it depends on terminal equipment and places high processing requirements on the terminal, is not economical or convenient, does not update information and data in a timely manner, and offers a poor communication experience.
No effective solution has yet been proposed for the technical problem in the related art that gesture communication mainly depends on specific devices and therefore incurs high cost.
Summary of the Invention
Embodiments of the present disclosure provide a gesture communication method, device, storage medium, and electronic device, so as to at least solve the technical problem in the related art that gesture communication mainly depends on specific devices, resulting in high cost.
According to one aspect of the embodiments of the present disclosure, a gesture communication method is provided, including: when a first terminal and a second terminal make a video call or an audio call, acquiring a first request sent by the first terminal or the second terminal, where the first request is used to request creation of a gesture recognition service, and the gesture recognition service is used to perform semantic recognition on gestures recognized in video frames collected by the first terminal; creating the gesture recognition service in response to the first request; during the video call or audio call, acquiring a group of gestures recognized in a group of video frames collected by the first terminal; performing, through the gesture recognition service, semantic recognition on the group of gestures recognized in the group of video frames collected by the first terminal to obtain target semantics represented by the group of gestures; and sending the target semantics to the second terminal.
According to another aspect of the embodiments of the present disclosure, a gesture communication apparatus is further provided, including: a first acquisition module, configured to acquire, when a first terminal and a second terminal are conducting a video call or an audio call, a first request sent by the first terminal or the second terminal, where the first request is used to request creation of a gesture recognition service, and the gesture recognition service is used to perform semantic recognition on gestures recognized in video frames collected by the first terminal; a first creation module, configured to create the gesture recognition service in response to the first request; a second acquisition module, configured to acquire, during the video call or audio call, a group of gestures recognized in a group of video frames collected by the first terminal; a recognition module, configured to perform, through the gesture recognition service, semantic recognition on the group of gestures recognized in the group of video frames collected by the first terminal to obtain target semantics represented by the group of gestures; and a first sending module, configured to send the target semantics to the second terminal.
According to yet another aspect of the embodiments of the present disclosure, a computer-readable storage medium is further provided, in which a computer program is stored, where the computer program, when executed by a processor, implements the steps in any one of the above method embodiments.
According to yet another aspect of the embodiments of the present disclosure, an electronic apparatus is further provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes, through the computer program, the steps in any one of the above method embodiments.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present disclosure and constitute a part of the present application; the exemplary embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not constitute an improper limitation on it. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a mobile terminal for a gesture communication method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a gesture communication method according to an embodiment of the present disclosure;
FIG. 3 is a diagram of the system structure and media paths of a gesture communication system according to a specific embodiment of the present disclosure;
FIG. 4 is a first example diagram of a gesture communication method according to a specific embodiment of the present disclosure;
FIG. 5 is a second example diagram of a gesture communication method according to a specific embodiment of the present disclosure;
FIG. 6 is a third example diagram of a gesture communication method according to a specific embodiment of the present disclosure;
FIG. 7 is a fourth example diagram of a gesture communication method according to a specific embodiment of the present disclosure;
FIG. 8 is a fifth example diagram of a gesture communication method according to a specific embodiment of the present disclosure;
FIG. 9 is a structural block diagram of a gesture communication apparatus according to an embodiment of the present disclosure;
FIG. 10 is a first preferred structural block diagram of a gesture communication apparatus according to an embodiment of the present disclosure;
FIG. 11 is a second preferred structural block diagram of a gesture communication apparatus according to an embodiment of the present disclosure.
Detailed Description of the Embodiments
To enable those skilled in the art to better understand the solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. Based on the embodiments in the present disclosure, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present disclosure.
It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings of the present disclosure are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present disclosure described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have", and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the expressly listed steps or units, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
The method embodiments provided in the embodiments of this application can be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking running on a mobile terminal as an example, FIG. 1 is a block diagram of the hardware structure of a mobile terminal for a gesture communication method according to an embodiment of the present disclosure. As shown in FIG. 1, the mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microcontroller (MCU) or a programmable logic device (FPGA)) and a memory 104 configured to store data. In an exemplary embodiment, the mobile terminal may further include a transmission device 106 configured for communication and an input/output device 108. A person of ordinary skill in the art can understand that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than those shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 may be configured to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the gesture communication method in the embodiments of the present disclosure. By running the computer program stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the above method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely relative to the processor 102, and such remote memory may be connected to the mobile terminal through a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 106 is configured to receive or send data via a network. Specific examples of the network may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which is configured to communicate with the Internet wirelessly.
This embodiment provides a gesture communication method. FIG. 2 is a flowchart of a gesture communication method according to an embodiment of the present disclosure. As shown in FIG. 2, the process includes the following steps:
Step S2002: when a first terminal and a second terminal are conducting a video call or an audio call, acquire a first request sent by the first terminal or the second terminal, where the first request is used to request creation of a gesture recognition service, and the gesture recognition service is used to perform semantic recognition on gestures recognized in video frames collected by the first terminal.
Step S2004: create the gesture recognition service in response to the first request.
Step S2006: during the video call or audio call, acquire a group of gestures recognized in a group of video frames collected by the first terminal.
Step S2008: through the gesture recognition service, perform semantic recognition on the group of gestures recognized in the group of video frames collected by the first terminal to obtain target semantics represented by the group of gestures.
Step S2010: send the target semantics to the second terminal.
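Steps S2002 to S2010 can be sketched in code. This is an illustrative simplification only: the class, its methods, and the gesture-to-meaning lookup table are hypothetical stand-ins (a real deployment would use a trained recognition model on the network side), not part of the disclosure.

```python
class NetworkSide:
    """Sketch of steps S2002-S2010 as performed by the network side."""

    def __init__(self):
        self.service_created = False
        # Hypothetical gesture-to-meaning lookup standing in for a real model.
        self.lexicon = {"wave": "hello", "thumbs_up": "yes", "flat_palm": "stop"}

    def handle_first_request(self, request):
        # S2002/S2004: a request from either terminal creates the service.
        if request.get("type") == "create_gesture_recognition":
            self.service_created = True

    def recognize(self, gestures):
        # S2006/S2008: map each gesture recognized in the video frames
        # to the semantics it expresses.
        assert self.service_created, "gesture recognition service not created"
        return [self.lexicon.get(g, "?") for g in gestures]

    def send_to_peer(self, semantics):
        # S2010: deliver the target semantics to the second terminal.
        return " ".join(semantics)

net = NetworkSide()
net.handle_first_request({"type": "create_gesture_recognition"})
msg = net.send_to_peer(net.recognize(["wave", "thumbs_up"]))
```

The point of the sketch is the division of labor: recognition state lives on the network side, so neither terminal needs special hardware.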
Through the above steps, a communication terminal can request a network-side device to create a gesture recognition service during a video call or audio call, and the gesture recognition service created by the network-side device can perform semantic recognition on the gestures recognized in the video frames collected by the communication terminal, without the terminal having to complete gesture semantic recognition through a specific device attached to it. This solves the technical problem in the related art that gesture communication mainly depends on specific devices, resulting in high cost, achieves the technical effect of reducing the cost of gesture communication, and further improves the user experience.
The above steps may be executed by the network side or by a network-side device, for example, a network device including a service control node, an application control node, and a media server, or another network device having the functions of a service control node, an application control node, and a media server. The above steps may also be executed by other processing devices or processing units with similar processing capabilities, but are not limited thereto. The following description takes the network side executing the above operations as an example (this is merely illustrative; in actual operation, other devices or modules may also execute the above operations):
In the above embodiment, when the first terminal and the second terminal are conducting a video call or an audio call, the network side acquires a first request sent by the first terminal or the second terminal. The first request is used to request creation of a gesture recognition service for recognizing gestures collected by the first terminal during the video call or audio call; specifically, it requests recognition of a group of gestures identified in a group of video frames collected by the first terminal. Of course, in practical applications, if it is the second terminal that communicates using gestures, the first request may be used to request recognition of the gestures collected by the second terminal. After receiving the first request, the network side creates the gesture recognition service, which is used to recognize the above gestures. During the video or audio call, the network side acquires a group of gestures identified in a group of video frames collected by the first terminal; in practice, it may acquire the video frame images collected by the first terminal, identify a group of gestures from the frame images, perform semantic recognition on that group of gestures through the gesture recognition service created above to obtain the target semantics represented by the group of gestures, and then send the target semantics to the second terminal.
By recognizing the gestures identified in the video frame images collected from the first terminal to obtain the target semantics they represent, and sending the target semantics to the second terminal, gesture communication within a video or audio call is achieved. This avoids the problems in the related art of having to rely on specific devices, or of gesture communication being possible only within a video call; solves the problems of high cost and poor experience caused by that reliance; and achieves the effects of broadening the application range of gesture communication and improving the user experience.
In an optional embodiment, the method further includes: acquiring a second request sent by the first terminal or the second terminal, where the second request is used to request creation of a target data channel; and creating the target data channel in response to the second request, where the target data channel is a channel that the first terminal or the second terminal is allowed to use. The acquiring of the first request sent by the first terminal or the second terminal includes: acquiring the first request transmitted by the first terminal or the second terminal on the target data channel. In this embodiment, during the video call or audio call between the first terminal and the second terminal, a second request sent by the first terminal or the second terminal may be acquired so as to create the target data channel. In practical applications, the second request is usually initiated by a terminal that supports use of the target data channel; at least one of the first terminal and the second terminal supports use of the target data channel, and both terminals may support it. The above first request is transmitted by the first terminal or the second terminal through the target data channel. This embodiment achieves the purposes of creating a data channel and of transmitting the first request through that data channel.
In an optional embodiment, the acquiring of the second request sent by the first terminal or the second terminal includes: acquiring the second request sent by the first terminal or the second terminal to a media server through an access control entity (SBC/P-CSCF), a session control entity (I/S-CSCF), and a service control node. The creating of the target data channel in response to the second request includes: creating the target data channel through the media server in response to the second request, where the target data channel is used to transmit data between the first terminal or the second terminal and the media server. In this embodiment, the second request is sent by the first terminal or the second terminal to the media server through the access control entity SBC/P-CSCF, the session control entity I/S-CSCF, and the service control node, and, in response to the second request, the target data channel is created through the media server and is used to transmit data between the first terminal or the second terminal and the media server. This embodiment achieves the purpose of establishing a dedicated data channel between a terminal and the media server.
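The signaling path above can be sketched as a simple routing chain. This is a hedged illustration: the hop names are taken from the disclosure, but the function, the request shape, and the channel record are invented here for clarity and do not reflect actual SIP/SDP message formats.

```python
def route_second_request(request):
    """Route the second request hop by hop to the media server, which
    then creates the target data channel (illustrative only)."""
    path = ["SBC/P-CSCF", "I/S-CSCF", "service control node", "media server"]
    # The media server, as the final hop, creates the dedicated data channel
    # between the requesting terminal and itself.
    channel = {
        "endpoint_a": request["terminal"],
        "endpoint_b": "media server",
        "purpose": "gesture data",
    }
    return path, channel

path, channel = route_second_request({"terminal": "first terminal"})
```

The design point is that the terminal never talks to the media server directly; every hop in the chain can apply its own admission and session control before the channel is created.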
In an optional embodiment, the acquiring of the first request transmitted by the first terminal or the second terminal on the target data channel includes: acquiring the first request transmitted by the first terminal or the second terminal to an application control node on the target data channel. The creating of the gesture recognition service in response to the first request includes: issuing, by the application control node, a first instruction to the service control node, where the first instruction is used to instruct the service control node to issue a second instruction to the media server, and the second instruction is used to instruct the media server to create the gesture recognition service; and, in response to the second instruction, creating the gesture recognition service through the media server, or instructing, through the media server, a third-party service component to create the gesture recognition service.
In this embodiment, acquiring the first request means acquiring the first request transmitted by the first terminal or the second terminal to the application control node on the target data channel. In response to the first request, the application control node issues a first instruction to instruct the service control node to issue a second instruction to the media server, the second instruction instructing the media server to create the gesture recognition service; in response to the second instruction, the gesture recognition service is created through the media server, or the media server instructs a third-party service component to create it. This embodiment achieves the purpose of creating the gesture recognition service.
In an optional embodiment, the method further includes: sending a third instruction to the media server through the service control node, where the third instruction is used to request creation of a mixed media service, the mixed media service is used to process the video stream, audio stream, and data stream in the video call, or to process the audio stream and data stream in the audio call, and the data stream is a data stream representing the target semantics; and, in response to the third instruction, creating the mixed media service through the media server, or instructing, through the media server, a third-party service component to create the mixed media service. In this embodiment, the service control node may request the media server to create the mixed media service, which is then created by the media server, or the media server may instruct a third-party service component to create it. This embodiment achieves the purpose of creating a mixed media service and prepares for the processing of the related audio/video streams and data streams in the subsequent gesture communication process.
In an optional embodiment, the performing, through the gesture recognition service, of semantic recognition on the group of gestures recognized in the group of video frames collected by the first terminal to obtain the target semantics represented by the group of gestures includes: performing, through the gesture recognition service, semantic recognition on the group of gestures recognized in the group of video frames collected by the first terminal to obtain one or more semantics, where each of the semantics is the semantics expressed by one or more gestures in the group; and generating, based on the one or more semantics, the target semantics corresponding to the group of gestures. In this embodiment, through the gesture recognition service, semantic recognition is performed on a group of gestures identified in the video frame images collected by the first terminal to obtain one or more semantics, and then, based on the one or more semantics, the complete target semantics corresponding to the group of gestures is generated. This embodiment achieves the purpose of converting gestures obtained from a terminal that communicates by gestures into target semantics.
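The aggregation step described above can be sketched minimally. The join rule (ordered concatenation with spaces) and the handling of unrecognized gestures are assumptions made for illustration; the disclosure only requires that the target semantics be generated from the one or more per-gesture semantics.

```python
def build_target_semantics(per_gesture_semantics):
    """Assemble the complete target semantics from the ordered list of
    semantics recognized for the individual gestures (sketch only)."""
    # Empty entries (gestures with no recognized meaning) are skipped.
    return " ".join(s for s in per_gesture_semantics if s)

target = build_target_semantics(["nice", "to", "meet", "you"])
```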
In an optional embodiment, the sending of the target semantics to the second terminal includes: when the target semantics is the semantics formed by concatenating the one or more semantics, sending each semantics included in the target semantics to the second terminal in synchronization with the corresponding video frame in the group of video frames; or, when the target semantics is represented by a data stream corresponding to the group of video frames and the data stream is a text stream and an audio stream, synchronously compositing the text stream with the corresponding video frames in the group of video frames to obtain a target video stream, and sending the target video stream to the second terminal in synchronization with the audio stream.
In this embodiment, each semantics included in the target semantics is sent to the second terminal in synchronization with the corresponding video frame in the group of video frames. For example, when the second terminal also supports use of the target data channel, the data stream representing the target semantics may be sent to the second terminal through the target data channel in synchronization with the video stream formed by the video frames; or, when the second terminal does not support use of the target data channel, the text stream included in the data stream representing the target semantics is synchronously composited with the video frames to obtain a target video stream, which is then sent to the second terminal in synchronization with the audio stream. Thus, when the second terminal supports the target data channel, the data stream is transmitted through the target data channel and sent in synchronization with the video stream; when the second terminal does not support use of the target data channel, the text stream included in the data stream is composited with the video frames and then sent in synchronization with the audio stream.
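The two delivery paths described above can be sketched as a single branch on the peer's capability. The frame and stream structures here are simplified string stand-ins, not actual media formats, and the `caption` compositing is only a schematic of burning text into video frames.

```python
def deliver(target_text, video_frames, audio_frames, peer_supports_data_channel):
    """Choose between data-channel delivery and text-on-video compositing
    (illustrative sketch of the two paths in the embodiment above)."""
    if peer_supports_data_channel:
        # Path 1: send the data stream on the data channel, one entry per
        # video frame so the two streams stay synchronized.
        return {"video": video_frames, "audio": audio_frames,
                "data_channel": [target_text] * len(video_frames)}
    # Path 2: composite the text stream into each video frame
    # (subtitle-style) and send only synchronized video and audio.
    composited = [f"{frame}+caption:{target_text}" for frame in video_frames]
    return {"video": composited, "audio": audio_frames}

out = deliver("hello", ["f1", "f2"], ["a1", "a2"], peer_supports_data_channel=False)
```

The compositing path is what lets a legacy terminal with no data-channel support still receive the recognized semantics as on-screen text.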
In an optional embodiment, the method further includes: when the first terminal and the second terminal are conducting the video call and both the first terminal and the second terminal support use of a target data channel, acquiring a second request sent by the first terminal, where the second request is used to request creation of a target data channel; and creating the target data channel in response to the second request, where the target data channel includes a first target data channel and a second target data channel, the first target data channel being a data channel between the first terminal and a media server, and the second target data channel being a data channel between the second terminal and the media server. The acquiring of the first request sent by the first terminal or the second terminal includes: acquiring the first request transmitted by the first terminal on the first target data channel. The creating of the gesture recognition service in response to the first request includes: in response to the first request, sending a target instruction to the media server through the service control node, where the target instruction is used to request creation of a mixed media service and the gesture recognition service, the mixed media service is used to process the video stream, audio stream, and data stream in the video call, and the data stream is a data stream representing the target semantics; and creating the mixed media service and the gesture recognition service through the media server, or instructing, through the media server, a third-party service component to create the mixed media service and the gesture recognition service. The acquiring, during the video call or audio call, of the group of gestures recognized in the group of video frames collected by the first terminal includes: during the video call, acquiring a first group of video frames and a corresponding first group of audio frames collected by the first terminal, and a first group of gestures recognized in the first group of video frames. After the target semantics is obtained, the method further includes: performing, through the mixed media service, synchronization processing on a first video stream formed by the first group of video frames, a first audio stream formed by the first group of audio frames, and a first data stream representing the target semantics, to obtain the synchronized first video stream, first audio stream, and first data stream. The sending of the target semantics to the second terminal includes: sending the synchronized first video stream, first audio stream, and first data stream to the second terminal, where the synchronized first data stream is sent on the second target data channel.
In this embodiment, when both the first terminal and the second terminal support use of the target data channel, after the gesture recognition service is created, semantic recognition is performed on the group of gestures identified in the acquired first group of video frame images collected by the first terminal to obtain the target semantics. The first data stream representing the target semantics may include a text stream and a voice stream, that is, the gestures are converted into voice or text. After the semantics is recognized, the first video stream, the first audio stream, and the first data stream are synchronized through the mixed media service and the gesture recognition service provided by the media server and then sent to the second terminal, with the first data stream sent to the second terminal through the second target data channel (also called a dedicated data channel). In this embodiment, the second terminal communicates in a non-gesture manner, that is, by normal video or voice: the voice frames of the second terminal are converted into a gesture stream and a target text stream through the media server and/or a third-party service component, and the gesture stream and target text stream are sent to the first terminal through the first target data channel (also called a dedicated data channel) in synchronization with the video frames and audio frames collected by the second terminal. Through this embodiment, when both the first terminal and the second terminal support use of the target data channel, one end can communicate interactively using gestures, and the gestures are converted into a data stream and then sent through the target data channel.
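The synchronization performed by the mixed media service can be sketched as timestamp alignment across the three streams. This is an illustrative simplification under an assumed `(timestamp, payload)` stream representation; real media synchronization (e.g., RTP timestamps and RTCP sender reports) is considerably more involved.

```python
def synchronize(video, audio, data):
    """Align the video, audio, and data streams on common timestamps
    (sketch of the mixed media service's synchronization processing)."""
    # Each stream is a list of (timestamp, payload) pairs; keep only the
    # timestamps present in all three streams so the emitted payloads
    # are frame-aligned triples.
    common = (set(t for t, _ in video)
              & set(t for t, _ in audio)
              & set(t for t, _ in data))
    pick = lambda stream: [p for t, p in sorted(stream) if t in common]
    return pick(video), pick(audio), pick(data)

v, a, d = synchronize(
    [(1, "v1"), (2, "v2")],
    [(1, "a1"), (2, "a2")],
    [(2, "d2")],          # the data stream may be sparser than the media
)
```

After alignment, the video and audio streams go out on the ordinary media paths while the data stream is carried on the receiving terminal's dedicated data channel.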
In an optional embodiment, the method further includes: in a case where the first terminal and the second terminal conduct the video call, the first terminal supports the use of a target data channel, and the second terminal does not, obtaining a second request sent by the first terminal, where the second request is used to request creation of a target data channel; and, in response to the second request, creating the target data channel, where the target data channel is a data channel between the first terminal and a media server. Obtaining the first request sent by the first terminal or the second terminal includes: obtaining the first request transmitted by the first terminal on the target data channel. Creating the gesture recognition service in response to the first request includes: in response to the first request, sending a target instruction to the media server through a service control node, where the target instruction is used to request creation of a mixed media service, a synthesis service and the gesture recognition service, the mixed media service is used to process the video stream, audio stream and data stream in the video call, and the data stream represents the target semantics; and creating the mixed media service, the synthesis service and the gesture recognition service through the media server, or instructing, through the media server, a third-party service component to create them. Obtaining, in the video call or audio call, a group of gestures recognized in a group of video frames collected by the first terminal includes: in the video call, obtaining a second group of video frames and a corresponding second group of audio frames collected by the first terminal, and a second group of gestures recognized in the second group of video frames. After the target semantics are obtained, the method further includes: synthesizing, by the synthesis service, a first text stream representing the target semantics with the video stream formed by the second group of video frames to obtain a second video stream; and synchronizing, by the mixed media service, a second audio stream included in the data stream representing the target semantics with the second video stream, to obtain the synchronized second video stream and second audio stream, where the data stream includes the first text stream. Sending the target semantics to the second terminal includes: sending the synchronized second video stream and second audio stream to the second terminal.

In this embodiment, when the first terminal supports the use of the target data channel and the second terminal does not, after the mixed media service, the synthesis service and the gesture recognition service are created through the media server, semantic recognition is performed on the group of gestures recognized in the second group of video frames collected by the first terminal, to obtain the target semantics. The data stream representing the target semantics may include a first text stream and a voice stream; that is, the gestures are converted into voice, text, or the like. After the semantics are recognized, the synthesis service provided by the media server synthesizes the first text stream with the video stream formed by the second group of video frames to obtain the second video stream, and the mixed media service then synchronizes the second audio stream included in the data stream with the second video stream; the synchronized second video stream and second audio stream are sent to the second terminal. For the second terminal, non-gesture communication is used, that is, normal video or voice communication; the voice frames of the second terminal are converted by the media server and/or a third-party service component into a gesture stream and a target text stream, which are sent to the first terminal over the target data channel (also called a dedicated data channel), synchronized with the video frames and audio frames collected by the second terminal. Through this embodiment, when only the first terminal supports the target data channel, interactive communication in which one end uses gestures is achieved, and the gestures are converted into a text stream that is synthesized with the video stream and then sent synchronously with the audio stream.
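The synthesis step described above (burning the recognized-gesture text stream into the video for a terminal without a data channel) might look roughly as follows. The frame/cue representation is an invented simplification, not the patent's actual media processing: frames are `(timestamp, label)` tuples and "compositing" just attaches the latest subtitle text.

```python
def synthesize(video_frames, text_by_ts):
    """video_frames: list of (ts_ms, frame_label) in ascending ts order.
    text_by_ts: dict mapping cue timestamp -> subtitle text.
    Returns composited frames (ts, frame, text), where text is the most
    recent cue at or before each frame's timestamp."""
    out, current = [], ""
    cues = sorted(text_by_ts.items())
    for ts, frame in video_frames:
        for cue_ts, text in cues:
            if cue_ts <= ts:
                current = text  # latest cue wins
        out.append((ts, frame, current))
    return out
```

In a real deployment this would render glyphs onto decoded frames before re-encoding; the sketch only shows the timing relationship between the text stream and the video stream.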
In an optional embodiment, the method further includes: in a case where the first terminal and the second terminal conduct the video call, the first terminal does not support the use of a target data channel, and the second terminal does, obtaining a second request sent by the second terminal, where the second request is used to request creation of a target data channel; and, in response to the second request, creating the target data channel, where the target data channel is a data channel between the second terminal and a media server. Obtaining the first request sent by the first terminal or the second terminal includes: obtaining the first request transmitted by the second terminal on the target data channel. Creating the gesture recognition service in response to the first request includes: in response to the first request, sending a target instruction to the media server through a service control node, where the target instruction is used to request creation of a mixed media service and the gesture recognition service, the mixed media service is used to process the video stream, audio stream and data stream in the video call, and the data stream represents the target semantics; and creating the mixed media service and the gesture recognition service through the media server, or instructing, through the media server, a third-party service component to create them. Obtaining, in the video call or audio call, a group of gestures recognized in a group of video frames collected by the first terminal includes: in the video call, obtaining a third group of video frames and a corresponding third group of audio frames collected by the first terminal, and a third group of gestures recognized in the third group of video frames. After the target semantics are obtained, the method further includes: synchronizing, by the mixed media service, a third video stream formed by the third group of video frames, a third audio stream formed by the third group of audio frames, and a third data stream representing the target semantics, to obtain the synchronized third video stream, third audio stream and third data stream. Sending the target semantics to the second terminal includes: sending the synchronized third video stream, third audio stream and third data stream to the second terminal, where the synchronized third data stream is sent on the target data channel.

In this embodiment, when the first terminal does not support the use of the target data channel and the second terminal does, after the mixed media service and the gesture recognition service are created through the media server, semantic recognition is performed on the group of gestures recognized in the third group of video frames collected by the first terminal, to obtain the target semantics. The third data stream representing the target semantics may include a text stream and a voice stream; that is, the gestures are converted into voice, text, or the like. After the semantics are recognized, the mixed media service provided by the media server synchronizes the third video stream, the third audio stream and the third data stream, which are then sent to the second terminal, the third data stream being sent on the target data channel. For the second terminal, non-gesture communication is used, that is, normal video or voice communication; the voice frames of the second terminal are converted by the media server and/or a third-party service component into a gesture stream and a target text stream, and the synthesis service provided by the media server then synthesizes the gesture stream, the target text stream and the video frames collected by the second terminal into a target video stream, which is sent to the first terminal synchronized with the audio frames collected by the second terminal. Through this embodiment, when only the second terminal supports the target data channel, interactive communication in which one end uses gestures is achieved, and the gestures are converted into a text stream and sent over the target data channel.
In an optional embodiment, the method further includes: in a case where the first terminal and the second terminal conduct the audio call and both support the use of a target data channel, obtaining a second request sent by the first terminal, where the second request is used to request creation of a target data channel; and, in response to the second request, creating the target data channel, where the target data channel includes a first target data channel and a second target data channel, the first target data channel being a data channel between the first terminal and a media server, and the second target data channel being a data channel between the second terminal and the media server. Obtaining the first request sent by the first terminal or the second terminal includes: obtaining the first request transmitted by the first terminal on the first target data channel. Creating the gesture recognition service in response to the first request includes: in response to the first request, sending a target instruction to the media server through a service control node, where the target instruction is used to request creation of a mixed media service and the gesture recognition service, the mixed media service is used to process the audio stream and data stream in the audio call, and the data stream represents the target semantics; and creating the mixed media service and the gesture recognition service through the media server, or instructing, through the media server, a third-party service component to create them. Obtaining, in the video call or audio call, a group of gestures recognized in a group of video frames collected by the first terminal includes: in the audio call, obtaining a fourth group of video frames and a corresponding fourth group of audio frames collected by the first terminal, and a fourth group of gestures recognized in the fourth group of video frames. After the target semantics are obtained, the method further includes: synchronizing, by the mixed media service, a second text stream representing the target semantics with a fourth audio stream formed by the fourth group of audio frames, to obtain the synchronized second text stream and fourth audio stream, where the data stream includes the second text stream. Sending the target semantics to the second terminal includes: sending the synchronized second text stream and fourth audio stream to the second terminal, where the synchronized second text stream is sent on the second target data channel.

In this embodiment, when both the first terminal and the second terminal support the use of the target data channel, after the gesture recognition service is created, semantic recognition is performed on the group of gestures recognized in the fourth group of video frames collected by the first terminal, to obtain the target semantics. The data stream representing the target semantics may include a text stream and a voice stream; that is, the gestures are converted into voice, text, or the like. After the semantics are recognized, the mixed media service and the gesture recognition service provided by the media server synchronize the audio stream formed by the fourth group of audio frames with the data stream, which are then sent to the second terminal, the data stream being sent over the second target data channel (also called a dedicated data channel). For the second terminal, non-gesture communication is used, that is, normal voice communication; the voice frames of the second terminal are converted by the media server and/or a third-party service component into a gesture stream and a target text stream, which are sent to the first terminal over the first target data channel (also called a dedicated data channel), synchronized with the video frames and/or audio frames collected by the second terminal. Through this embodiment, when both terminals support the target data channel, interactive communication in which one end uses gestures is achieved, and the gestures are converted into a data stream and sent over the target data channel.
In an optional embodiment, the method further includes: in a case where the first terminal and the second terminal conduct the audio call, the first terminal supports the use of a target data channel, and the second terminal does not, obtaining a second request sent by the first terminal, where the second request is used to request creation of a target data channel; and, in response to the second request, creating the target data channel, where the target data channel is a data channel between the first terminal and a media server. Obtaining the first request sent by the first terminal or the second terminal includes: obtaining the first request transmitted by the first terminal on the target data channel. Creating the gesture recognition service in response to the first request includes: in response to the first request, sending a target instruction to the media server through a service control node, where the target instruction is used to request creation of the gesture recognition service; and creating the gesture recognition service through the media server, or instructing, through the media server, a third-party service component to create the gesture recognition service. Obtaining, in the video call or audio call, a group of gestures recognized in a group of video frames collected by the first terminal includes: in the audio call, obtaining a fifth group of video frames and a corresponding fifth group of audio frames collected by the first terminal, and a fifth group of gestures recognized in the fifth group of video frames. Sending the target semantics to the second terminal includes: sending a fifth audio stream representing the target semantics to the second terminal.

In this embodiment, when the first terminal supports the use of the target data channel and the second terminal does not, after the gesture recognition service is created, semantic recognition is performed on the group of gestures recognized in the fifth group of video frames collected by the first terminal, to obtain the target semantics. The data stream representing the target semantics may include a text stream and a voice stream; that is, the gestures are converted into voice, text, or the like. After the semantics are recognized, the fifth audio stream representing the target speech is sent to the second terminal. For the second terminal, non-gesture communication is used, that is, normal voice communication; the voice frames of the second terminal are converted by the media server and/or a third-party service component into a gesture stream and a target text stream, which are sent to the first terminal over the target data channel (also called a dedicated data channel), synchronized with the audio stream collected by the second terminal. Through this embodiment, when the first terminal supports the target data channel and the second terminal does not, interactive communication in which one end uses gestures is achieved, and the gestures are converted into an audio stream before being sent.
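A hedged sketch of the gesture-to-speech path in this embodiment: a lookup-based translator followed by a stub text-to-speech step. The gesture lexicon entries and the `pcm<...>` chunk format are invented placeholders standing in for the real recognition and TTS services on the media server and third-party component.

```python
# Illustrative gesture lexicon; real sign-language translation is far richer.
GESTURE_LEXICON = {"wave": "hello", "thumbs_up": "yes", "flat_palm": "stop"}

def gestures_to_text(gestures):
    """Translate a recognized gesture sequence into target-semantics text,
    marking unrecognized gestures with '?'."""
    return " ".join(GESTURE_LEXICON.get(g, "?") for g in gestures)

def text_to_audio_stream(text):
    """Stub TTS: emit one placeholder 'audio chunk' per word. A real media
    server would synthesize encoded speech frames here."""
    return [f"pcm<{w}>" for w in text.split()]
```

For example, the recognized sequence `["wave", "thumbs_up"]` would become the text "hello yes" and then a two-chunk audio stream sent to the second terminal.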
Obviously, the embodiments described above are only some, rather than all, of the embodiments of the present disclosure. The present disclosure is described in detail below with reference to specific embodiments:
Fig. 3 is a diagram of the structure and media paths of a gesture communication system according to a specific embodiment of the present disclosure. As shown in Fig. 3, the system includes:
S101 terminal (type 1): a new terminal type. Type 1 corresponds to the aforementioned terminal that supports the target data channel (hereinafter referred to as "type 1"); it supports real-time audio/video stream channels as well as a dedicated channel for real-time data streams (the dedicated data channel, corresponding to the aforementioned target data channel). In the present disclosure, the terminal interacts with network-side entities through the dedicated data channel to provide end users with a new service experience, receiving network-side data streams through the dedicated channel and audio/video streams through the audio/video stream channel. In the present disclosure, this terminal type may be an independent application or a dedicated terminal device;
S102 terminal (type 2): a traditional terminal. Type 2 corresponds to the aforementioned terminal that does not support the target data channel (hereinafter referred to as "type 2") and supports only real-time audio/video stream channels. The terminal interacts with the "SBC/P-CSCF" network-side entity to provide end users with a service experience, receiving audio/video streams through the audio/video stream channel;
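The downlink-selection rule implied by the two terminal types can be sketched as follows: a type-1 terminal receives text and gesture data on the dedicated data channel alongside plain audio/video, while a type-2 terminal receives everything synthesized into the media streams. The stream labels are illustrative assumptions, not defined by the disclosure.

```python
def plan_downlink(terminal_type: int) -> dict:
    """Decide which streams go on which channel for a given terminal type.
    1 = supports the dedicated data channel; 2 = audio/video only."""
    if terminal_type == 1:
        return {"media": ["audio", "video"],
                "data_channel": ["text", "gesture"]}
    if terminal_type == 2:
        # No data channel: text/gestures must be burned into the video.
        return {"media": ["audio", "video+burned_text"],
                "data_channel": []}
    raise ValueError("unknown terminal type")
```

This is roughly the decision the application control node makes when it chooses between sending a real-time data stream and converting it into a media stream.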
S103 access control entity (SBC/P-CSCF): provides signaling and media access for terminals, supports audio/video stream channels and data stream channels, and forwards audio/video streams and data streams;
S104 session control entity (I/S-CSCF): the Interrogating/Serving Call Session Control Function, which provides registration authentication, session control, call routing and other basic IMS network functions for multiple terminal types, and triggers calls to the "service control node";
S105 service control node (Service Control Node): as the signaling control network element of the gesture communication system, it carries the IMS call management capability and is responsible for controlling calls; as the service-providing network element for gesture communication, it can invoke related services through the service bus, provide communication and service capabilities to other applications, and invoke services and control the forwarding of various media data streams, including real-time audio/video stream forwarding for calls and data stream forwarding;
Specific enhancements include, but are not limited to:
(1) managing audio/video calls and data stream channel calls, including but not limited to call establishment, transparent media transmission, media path redirection, call teardown, call event reporting, service invocation and service result notification;
(2) opening communication capabilities and services to the outside, processing service requests from applications and converting them into concrete control operations. For example, through the open interface provided by the service control node, the application control node can invoke the media server and third-party service components and request resources, implementing gesture recognition and translation to speech, gesture stream animation generation, and the synthesis of audio/video media streams and data streams into an integrated media stream, and notifying the service results;
(3) invoking and controlling, through the service bus, the various services provided by the media server, including but not limited to the creation, modification and deletion of data channels; the request, modification and deletion of audio/video media resources; and the request, modification and deletion of gesture recognition and translation capabilities;
For the present application, the service control node may exist independently or be co-deployed with the application control node;
S106 application control node (Application Control Node): implements the service logic of various services. Specific enhancements include, but are not limited to: (1) determining the media stream and data stream types to be sent according to the application form of the terminal (version number, device type, specific tags, etc.), for example whether to send a real-time data stream or convert it into a real-time media stream for delivery; (2) sending application control requests to the service control node, and invoking third-party service components and the media server to implement image processing, gesture recognition, conversion and synthesis; (3) invoking, through the service bus, the various services provided by the media server and reporting the service results;
It should be noted that the application control node may exist independently or be co-deployed with the service control node.
S107 media server (Media Server): provides various media services. Specific functions include, but are not limited to: (1) image recognition, such as recognizing images and gestures through feature data comparison; (2) real-time media stream generation, such as converting voice segments into corresponding RTP media streams; (3) real-time gesture stream generation, automatically generating a gesture stream video for recognized gestures; (4) a synthesis service that synthesizes existing and generated media streams and gesture streams for output (into the real-time audio/video stream), combining the video stream, gesture stream and text stream into a single video stream; (5) real-time audio/video stream forwarding, anchoring, processing and forwarding the audio/video streams of the current call; (6) a data stream forwarding service, forwarding gesture streams, text streams and other data streams through dedicated data channels, and establishing a dedicated channel for the synthesized integrated data stream; (7) allowing the service control node and the application control node to invoke, through the service bus, the various services it provides; (8) a mixed media service that supports processing audio/video streams and data streams in one mixed medium; (9) establishing dedicated data channels and securely transmitting gesture information through encryption.
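The service-bus pattern just described (control nodes invoking named media-server services, with a fall-back to a third-party component when a service is not available locally) could be modeled as below. The class and service names are invented for illustration; they are not APIs defined by the disclosure.

```python
class ServiceBus:
    """Toy service bus: named services registered by one element, with an
    optional fallback element consulted when a service is missing."""
    def __init__(self):
        self._services = {}
        self._fallback = None

    def register(self, name, fn):
        self._services[name] = fn

    def set_fallback(self, other):
        self._fallback = other

    def invoke(self, name, *args):
        if name in self._services:
            return self._services[name](*args)
        if self._fallback is not None:
            return self._fallback.invoke(name, *args)
        raise KeyError(f"no such service: {name}")
```

For example, a media server might serve gesture recognition itself while delegating gesture translation to the third-party component:

```python
media_server, third_party = ServiceBus(), ServiceBus()
media_server.set_fallback(third_party)
media_server.register("gesture_recognition",
                      lambda frame: "wave" if "hand" in frame else None)
third_party.register("gesture_translation", lambda g: {"wave": "hello"}[g])
```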
S108 third-party service component: can be invoked by the service control node and the application control node to provide gesture language translation, audio-text conversion and other services.
S109 HSS: provides user service data and related content.
现对本公开实施例的整体技术方案流程大致说明如下:The overall technical solution process of the embodiment of the present disclosure is roughly described as follows:
1)用户UE A携带终端标识向IMS网络发起音频或视频呼叫请求,呼叫UE B。经过SBC/P-CSCF,I/SCSCF,服务控制节点等网元,与UE B建立音频或者视频通话;1) User UE A carries the terminal ID to initiate an audio or video call request to the IMS network, and calls UE B. Establish an audio or video call with UE B through SBC/P-CSCF, I/SCSCF, service control node and other network elements;
UE A,UE B可以分别是不同的终端类型:终端(类型1)是一种新型的终端类型,它有实时音视频流通道,也有实时的数据流专用的通道;终端(类型2)是传统的终端,只支持实时音视频流通道;UE A and UE B can be different terminal types: terminal (type 1) is a new type of terminal, it has real-time audio and video stream channels, and also has a dedicated channel for real-time data streams; terminal (type 2) is a traditional The terminal only supports real-time audio and video streaming channels;
2) After the video or audio call is established, the user of a terminal (type 1) that supports the data stream channel applies to the "media server", via the "SBC/P-CSCF", "I/S-CSCF", and "service control node", to create data channel resources;
3) The "media server" returns the successfully created data channel resources;
4) The terminal (type 1, with a dedicated data channel) initiates a gesture recognition and conversion request to the "application control node" through the data channel;
the "application control node" instructs the "service control node" to create gesture recognition resources;
the "service control node" instructs the "media server" to create a mixed media service, which requires gesture-recognition-related services;
the "media server" applies to the "third-party service component" for the gesture recognition service, and the mixed media service is created successfully.
5) The "service control node" invites UE A and UE B to join the session via Reinvite, and applies to the "media server" for session resources for UE A and UE B;
6) The media of UE A and UE B is anchored to the "media server";
7) The "service control node" applies to the "media server" for gesture recognition, gesture translation service types, synthesis, and other processing;
8) The "media server" applies to the "third-party service component" for services such as gesture recognition, gesture translation, speech-to-text, text-to-speech, gesture stream generation, voice stream generation, synthesis and forwarding of gesture, voice, text, and video streams; the "media server" and the "third-party service component" execute the corresponding services;
9) The "media server" sends different stream information (synthesized and non-synthesized) to UE A and UE B according to their terminal types, including voice streams, video streams, gesture streams, text streams, and so on;
10) The "media server" returns operation responses, such as those for gesture recognition and for the gesture, text, and voice streams, to the "service control node".
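The per-terminal stream selection in step 9 can be sketched as a function of terminal type and user role. The mapping below is inferred from the five embodiments that follow and is only a simplified illustration; the stream names are labels, not protocol identifiers.

```python
# Simplified mapping, inferred from the five embodiments below, of which
# streams the "media server" delivers to one participant in step 9.
# The stream names are illustrative labels, not protocol identifiers.

def streams_for(terminal_type: int, is_gesture_user: bool, video_call: bool) -> set:
    if terminal_type == 1:
        # A dedicated data channel exists, so text (and, for a gesture
        # user, the gesture stream) can be delivered alongside the media.
        streams = {"voice", "text"}
        if is_gesture_user:
            streams.add("gesture")
        if video_call:
            streams.add("video")
        return streams
    # Type 2: no data channel, so everything must ride the audio/video
    # streams; in a video call the text is synthesized into the video.
    return {"voice", "video+text"} if video_call else {"voice"}

# Embodiment one: gesture user, type 1 terminal, video call.
assert streams_for(1, True, True) == {"gesture", "text", "voice", "video"}
# Embodiment five: non-gesture user, type 2 terminal, audio call.
assert streams_for(2, False, False) == {"voice"}
```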
Specific embodiment one: a video call between a gesture user (terminal type 1, with a dedicated data channel) and a non-gesture user (terminal type 1, with a dedicated data channel):
Figure 4 is a first example diagram of the gesture communication method according to a specific embodiment of the present disclosure. As shown in Figure 4, this embodiment is described by taking as an example a gesture user UE A, on a terminal (type 1), who places a video call to a non-gesture user UE B, also on a terminal (type 1):
Step S201: The gesture user UE A of the terminal (type 1), carrying its terminal identifier, initiates a video call to the SBC/P-CSCF to call the non-gesture user UE B. The Invite carries the SDP information for the terminal's video and audio;
Step S202: The SBC/P-CSCF transparently forwards the Invite call information to the I/S-CSCF;
Step S203: The I/S-CSCF finds the service control node corresponding to the user and sends the call information to it;
Steps S204–S206: The video call reaches the non-gesture user UE B of the terminal (type 1);
Steps S207–S218: UE B sends a 200 OK message carrying its terminal identifier and answers off-hook; UE A returns an ACK message; UE A and UE B establish the video call;
Steps S219–S229: UE A applies to create data channel resources. Needing gesture recognition, UE A sends an Invite request whose SDP carries the dedicated data channel, which passes through the SBC/P-CSCF and I/S-CSCF to reach the "service control node"; the "service control node" applies to the "media server" to create the UE A data channel; the "media server" reports to the "service control node" that the data channel has been created;
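Steps S219–S229 hinge on the Invite carrying a data-channel description in its SDP. The sketch below assembles such an offer under the assumption that the dedicated data channel is offered in the WebRTC data-channel form of RFC 8841; the disclosure itself does not fix the SDP syntax, and all addresses and ports are placeholders.

```python
# Hedged sketch of the SDP body such an Invite might carry. The
# "webrtc-datachannel" application m-line follows RFC 8841; the disclosure
# itself does not fix the SDP syntax, and all addresses/ports are placeholders.

def build_offer(session_ip: str, with_data_channel: bool) -> str:
    lines = [
        "v=0",
        f"o=UE_A 0 0 IN IP4 {session_ip}",
        "s=gesture-call",
        f"c=IN IP4 {session_ip}",
        "t=0 0",
        "m=audio 49170 RTP/AVP 0",          # existing audio stream
        "m=video 51372 RTP/AVP 96",         # existing video stream
        "a=rtpmap:96 H264/90000",
    ]
    if with_data_channel:
        # The dedicated data channel for gesture/text streams.
        lines += [
            "m=application 50000 UDP/DTLS/SCTP webrtc-datachannel",
            "a=sctp-port:5000",
        ]
    return "\r\n".join(lines) + "\r\n"

offer = build_offer("198.51.100.1", with_data_channel=True)
assert "webrtc-datachannel" in offer
```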
Step S230: UE A initiates a gesture recognition and conversion request through the data channel;
Step S231: The "application control node" instructs the "service control node" to create gesture recognition resources;
Step S232: The "service control node" instructs the "media server" to create a mixed media service, which needs to use the gesture recognition service;
Step S233: The "media server" applies to the "third-party service component" for the gesture recognition service;
Step S234: The "media server" reports to the "service control node" that the mixed media service has been created successfully;
Steps S235–S246: The "service control node" invites UE B to join the session and applies for mixed media resources for UE B; the "service control node" sends a Reinvite carrying an SDP message to UE B; UE B returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE B; the media of UE B is anchored to the media server;
Steps S247–S258: The "service control node" invites UE A to join the session and applies for mixed media resources for UE A; the "service control node" sends a Reinvite carrying an SDP message to UE A; UE A returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE A; the media of UE A is anchored to the media server;
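The Reinvite/200 OK/ACK exchanges used in steps S235–S258 to re-anchor each party's media can be sketched as a per-leg, happy-path state machine; the `Leg` class and its string-valued SDP payloads are illustrative assumptions.

```python
# Happy-path sketch of one Reinvite/200 OK/ACK leg; the Leg class and its
# string-valued SDP payloads are illustrative assumptions.

class Leg:
    def __init__(self, name: str):
        self.name = name
        self.state = "confirmed"          # an established call leg

    def reinvite(self, sdp_offer: str):
        # The service control node sends a Reinvite carrying an SDP offer;
        # the UE answers with 200 OK carrying its own SDP.
        self.state = "renegotiating"
        return "200 OK", f"answer-to-{sdp_offer}"

    def ack(self):
        # ACK completes the renegotiation; media is now re-anchored.
        self.state = "confirmed"

leg_b = Leg("UE B")
status, answer = leg_b.reinvite("mixed-media-offer")
leg_b.ack()
assert (status, answer, leg_b.state) == ("200 OK", "answer-to-mixed-media-offer", "confirmed")
```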
Step S259: The "service control node" applies to the "media server" for gesture translation service types and synthesis processing;
Step S260: The "media server" applies to the "third-party service component" for services such as speech-to-text processing of terminal data, gesture image recognition based on extracted feature data, real-time gesture stream generation, real-time media stream generation, synthesis, real-time audio/video stream forwarding, and data stream forwarding;
Steps S261–S264: The "media server" sends UE A the media stream information of the gesture stream, text stream, voice stream, and video stream; this media stream information may travel from the "media server" through the "service control node" and the "application control node" to the SBC/P-CSCF and then to the terminal, or from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal;
Step S265: The "media server" applies to the "third-party service component" for the gesture translation, synthesis, and forwarding service;
Steps S266–S268: The "media server" sends UE B the media stream information of the voice stream, text stream, and video stream; this media stream information may take either of the two paths described above;
Step S269: The "media server" returns operation responses, such as those for gesture recognition and for the gesture, text, and voice streams, to the "service control node".
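Embodiment one's two conversion directions can be summarized with stub converters: UE A's gestures become a voice stream and text for UE B, while UE B's speech (already converted to text) drives a generated gesture stream for UE A. The lookup tables and `tts()` notation stand in for the recognition, translation, text-to-speech, and gesture stream generation services and are pure illustration.

```python
# Stub converters standing in for the recognition, translation, text-to-speech,
# and gesture stream generation services; the lookup tables and tts() notation
# are pure illustration.

GESTURE_TO_TEXT = {"wave": "hello"}   # gesture recognition + translation (assumed)
TEXT_TO_GESTURE = {"hello": "wave"}   # drives real-time gesture stream generation

def toward_non_gesture_user(gesture: str) -> dict:
    """UE A's gesture becomes a voice stream and a text overlay for UE B."""
    text = GESTURE_TO_TEXT[gesture]
    return {"voice": f"tts({text})", "text": text}

def toward_gesture_user(speech_text: str) -> dict:
    """UE B's speech (already converted to text) becomes a gesture stream
    and a text stream for UE A."""
    return {"gesture": TEXT_TO_GESTURE[speech_text], "text": speech_text}

assert toward_non_gesture_user("wave") == {"voice": "tts(hello)", "text": "hello"}
assert toward_gesture_user("hello") == {"gesture": "wave", "text": "hello"}
```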
Specific embodiment two: a video call between a non-gesture user (terminal type 2, without a dedicated data channel) and a gesture user (terminal type 1, with a dedicated data channel):
Figure 5 is a second example diagram of the gesture communication method according to a specific embodiment of the present disclosure. As shown in Figure 5, this embodiment is described by taking as an example a video call between a non-gesture user UE A (terminal type 2, without a dedicated data channel) and a gesture user UE B (terminal type 1, with a dedicated data channel):
Step S301: The non-gesture user UE A of the terminal (type 2), carrying its terminal identifier, initiates a video call to the SBC/P-CSCF to call the gesture user UE B. The Invite carries the SDP information for the terminal's video and audio;
Step S302: The SBC/P-CSCF transparently forwards the Invite call information to the I/S-CSCF;
Step S303: The I/S-CSCF finds the service control node corresponding to the user and sends the call information to it;
Steps S304–S306: The video call reaches the gesture user UE B of the terminal (type 1);
Steps S307–S318: UE B sends a 200 OK message carrying its terminal identifier and answers off-hook; UE A returns an ACK message; UE A and UE B establish the video call;
Steps S319–S329: UE B applies to create data channel resources. Needing gesture recognition, UE B sends an Invite request whose SDP carries the dedicated data channel, which passes through the SBC/P-CSCF and I/S-CSCF to reach the "service control node"; the "service control node" applies to the "media server" to create the UE B data channel; the "media server" reports to the "service control node" that the data channel has been created;
Step S330: UE B initiates a gesture recognition and conversion request through the data channel;
Step S331: The "application control node" instructs the "service control node" to create gesture recognition resources;
Step S332: The "service control node" instructs the "media server" to create a mixed media service, which needs to use the gesture recognition service;
Step S333: The "media server" applies to the "third-party service component" for the gesture recognition service;
Step S334: The "media server" reports to the "service control node" that the mixed media service has been created successfully;
Steps S335–S346: The "service control node" invites UE A to join the session and applies for mixed media resources for UE A; the "service control node" sends a Reinvite carrying an SDP message to UE A; UE A returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE A; the media of UE A is anchored to the media server;
Steps S347–S358: The "service control node" invites UE B to join the session and applies for mixed media resources for UE B; the "service control node" sends a Reinvite carrying an SDP message to UE B; UE B returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE B; the media of UE B is anchored to the media server;
Step S359: The "service control node" applies to the "media server" for gesture translation service types and synthesis processing;
Step S360: The "media server" applies to the "third-party service component" for the gesture translation, synthesis, and forwarding service, together with speech-to-text processing of terminal data, gesture image recognition based on extracted feature data, real-time gesture stream generation, real-time media stream generation, synthesis, real-time audio/video stream forwarding, data stream forwarding, and other services;
Steps S361–S362: The "media server" sends UE A the real-time voice stream converted from the gestures and the media stream information of a video stream into which the video and text have been synthesized; this media stream information may travel from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal, or from the "media server" through the "service control node" and the "application control node" to the SBC/P-CSCF and then to the terminal;
Step S363: The "media server" applies to the "third-party service component" for the gesture stream generation, translation, synthesis, and forwarding service;
Steps S364–S367: The "media server" sends UE B the media stream information of the gesture stream, voice stream, text stream, and video stream; this media stream information may take either of the two paths described above;
Step S368: The "media server" returns operation responses, such as those for gesture recognition and for the gesture, text, and voice streams, to the "service control node".
Specific embodiment three: a video call between a gesture user (terminal type 2, without a dedicated data channel) and a non-gesture user (terminal type 1, with a dedicated data channel):
Figure 6 is a third example diagram of the gesture communication method according to a specific embodiment of the present disclosure. As shown in Figure 6, this embodiment is described by taking as an example a video call between a gesture user UE A (terminal type 2, without a dedicated data channel) and a non-gesture user UE B (terminal type 1, with a dedicated data channel):
Step S401: The gesture user UE A of the terminal (type 2), carrying its terminal identifier, initiates a video call to the SBC/P-CSCF to call the non-gesture user UE B. The Invite carries the SDP information for the terminal's video and audio;
Step S402: The SBC/P-CSCF transparently forwards the Invite call information to the I/S-CSCF;
Step S403: The I/S-CSCF finds the service control node corresponding to the user and sends the call information to it;
Steps S404–S406: The video call reaches the non-gesture user UE B of the terminal (type 1);
Steps S407–S418: UE B sends a 200 OK message carrying its terminal identifier and answers off-hook; UE A returns an ACK message; UE A and UE B establish the video call;
Steps S419–S429: UE B applies to create data channel resources. Needing gesture recognition, UE B sends an Invite request whose SDP carries the dedicated data channel, which passes through the SBC/P-CSCF and I/S-CSCF to reach the "service control node"; the "service control node" applies to the "media server" to create the UE B data channel; the "media server" reports to the "service control node" that the data channel has been created;
Step S430: UE B initiates a gesture recognition and conversion request through the data channel;
Step S431: The "application control node" instructs the "service control node" to create gesture recognition resources;
Step S432: The "service control node" instructs the "media server" to create a mixed media service, which needs to use the gesture recognition service;
Step S433: The "media server" applies to the "third-party service component" for the gesture recognition service;
Step S434: The "media server" reports to the "service control node" that the mixed media service has been created successfully;
Steps S435–S446: The "service control node" invites UE A to join the session and applies for mixed media resources for UE A; the "service control node" sends a Reinvite carrying an SDP message to UE A; UE A returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE A; the media of UE A is anchored to the media server;
Steps S447–S458: The "service control node" invites UE B to join the session and applies for mixed media resources for UE B; the "service control node" sends a Reinvite carrying an SDP message to UE B; UE B returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE B; the media of UE B is anchored to the media server;
Step S459: The "service control node" applies to the "media server" for gesture translation service types and synthesis processing;
Step S460: The "media server" applies to the "third-party service component" for the gesture translation, gesture stream generation, synthesis, and forwarding service, together with speech-to-text processing of terminal data, gesture image recognition based on extracted feature data, real-time gesture stream generation, real-time media stream generation, synthesis, real-time audio/video stream forwarding, data stream forwarding, and other services;
Steps S461–S462: The "media server" sends UE A the real-time voice stream converted from the gestures and the media stream information of a video stream into which the video and text have been synthesized; this media stream information may travel from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal, or from the "media server" through the "service control node" and the "application control node" to the SBC/P-CSCF and then to the terminal;
Step S463: The "media server" applies to the "third-party service component" for the gesture stream generation, translation, synthesis, and forwarding service;
Steps S464–S466: The "media server" sends UE B the media stream information of the voice stream, text stream, and video stream; this media stream information may take either of the two paths described above;
Step S467: The "media server" returns operation responses, such as those for gesture recognition and for the gesture, text, and voice streams, to the "service control node".
Specific embodiment four: an audio call between a gesture user (terminal type 1, with a dedicated data channel) and a non-gesture user (terminal type 1, with a dedicated data channel):
Figure 7 is a fourth example diagram of the gesture communication method according to a specific embodiment of the present disclosure. As shown in Figure 7, this embodiment is described by taking as an example a gesture user UE A, on a terminal (type 1), who places an audio call to a non-gesture user UE B, also on a terminal (type 1):
Step S501: The gesture user UE A of the terminal (type 1), carrying its terminal identifier, initiates an audio call to the SBC/P-CSCF to call the non-gesture user UE B. The Invite carries the SDP information for the terminal's audio;
Step S502: The SBC/P-CSCF transparently forwards the Invite call information to the I/S-CSCF;
Step S503: The I/S-CSCF finds the service control node corresponding to the user and sends the call information to it;
Steps S504–S506: The audio call reaches the non-gesture user UE B of the terminal (type 1);
Steps S507–S518: UE B sends a 200 OK message carrying its terminal identifier and answers off-hook; UE A returns an ACK message; UE A and UE B establish the audio call;
Steps S519–S529: UE A starts the gesture recognition application, turns on the camera, and applies to create data channel resources. Needing gesture recognition, UE A sends an Invite request whose SDP carries the dedicated data channel, which passes through the SBC/P-CSCF and I/S-CSCF to reach the "service control node"; the "service control node" applies to the "media server" to create the UE A data channel; the "media server" reports to the "service control node" that the data channel has been created; the gesture recognition application then collects gesture data;
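The gesture data collected in steps S519–S529 must be carried over the dedicated data channel in some serialized form. The JSON framing below is an assumption for illustration, since the disclosure only requires that gesture information travel over the dedicated channel (optionally encrypted), not this exact layout.

```python
import json

# Assumed JSON framing for gesture data on the dedicated data channel;
# the disclosure only requires that gesture information travel over the
# dedicated channel (optionally encrypted), not this exact layout.

def frame_gesture_sample(call_id: str, features: list) -> bytes:
    msg = {
        "call_id": call_id,
        "type": "gesture_features",
        "features": features,   # feature data for server-side comparison
    }
    return json.dumps(msg).encode("utf-8")

frame = frame_gesture_sample("call-1", [0.12, 0.87])
decoded = json.loads(frame)
assert decoded["type"] == "gesture_features" and decoded["features"] == [0.12, 0.87]
```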
Step S530: UE A initiates a gesture recognition and conversion request through the data channel;
Step S531: The "application control node" instructs the "service control node" to create gesture recognition resources;
Step S532: The "service control node" instructs the "media server" to create a mixed media service, which needs to use the gesture recognition service;
Step S533: The "media server" applies to the "third-party service component" for the gesture recognition service;
Step S534: The "media server" reports to the "service control node" that the mixed media service has been created successfully;
Steps S535–S546: The "service control node" invites UE B to join the session and applies for mixed media resources for UE B; the "service control node" sends a Reinvite carrying an SDP message to UE B; UE B returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE B; the media of UE B is anchored to the media server;
Steps S547–S558: The "service control node" invites UE A to join the session and applies for mixed media resources for UE A; the "service control node" sends a Reinvite carrying an SDP message to UE A; UE A returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE A; the media of UE A is anchored to the media server;
Step S559: The "service control node" applies to the "media server" for gesture translation service types and synthesis processing;
Step S560: The "media server" applies to the "third-party service component" for services such as speech-to-text processing of terminal data, gesture image recognition based on extracted feature data, real-time gesture stream generation, real-time media stream generation, synthesis, real-time audio stream forwarding, and data stream forwarding;
Steps S561–S563: The "media server" sends UE A the media stream information of the gesture stream, text stream, and voice stream; this media stream information may travel from the "media server" through the "service control node" and the "application control node" to the SBC/P-CSCF and then to the terminal, or from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal;
Step S564: The "media server" applies to the "third-party service component" for the gesture translation stream synthesis and forwarding service;
Steps S565–S566: The "media server" sends UE B the media stream information of the voice stream and text stream; this media stream information may travel from the "media server" through the "service control node" and the "application control node" to the SBC/P-CSCF and then to the terminal, or from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal;
Step S567: The "media server" returns operation responses, such as those for gesture recognition and for the gesture, text, and voice streams, to the "service control node".
Specific embodiment five: an audio call between a non-gesture user (terminal type 2, without a dedicated data channel) and a gesture user (terminal type 1, with a dedicated data channel):
Figure 8 is a fifth example diagram of the gesture communication method according to a specific embodiment of the present disclosure. As shown in Figure 8, this embodiment is described by taking as an example an audio call between a non-gesture user UE A (terminal type 2, without a dedicated data channel) and a gesture user UE B (terminal type 1, with a dedicated data channel):
Step S601: The non-gesture user UE A of the terminal (type 2), carrying its terminal identifier, initiates an audio call to the SBC/P-CSCF to call the gesture user UE B. The Invite carries the SDP information for the terminal's audio;
Step S602: The SBC/P-CSCF transparently forwards the Invite call information to the I/S-CSCF;
Step S603: The I/S-CSCF finds the service control node corresponding to the user and sends the call information to it;
Steps S604–S606: The audio call reaches the gesture user UE B of the terminal (type 1);
Steps S607–S618: UE B sends a 200 OK message carrying its terminal identifier and answers off-hook; UE A returns an ACK message; UE A and UE B establish the audio call;
Steps S619–S629: UE B starts the gesture recognition application, turns on the camera, and applies to create data channel resources. Needing gesture recognition, UE B sends an Invite request whose SDP carries the dedicated data channel, which passes through the SBC/P-CSCF and I/S-CSCF to reach the "service control node"; the "service control node" applies to the "media server" to create the UE B data channel; the "media server" reports to the "service control node" that the data channel has been created; the gesture recognition application then collects gesture data;
Step S630: UE B initiates a gesture recognition and conversion request through the data channel;
Step S631: The "application control node" instructs the "service control node" to create gesture recognition resources;
Step S632: The "service control node" instructs the "media server" to create a mixed media service, which needs to use the gesture recognition service;
Step S633: The "media server" applies to the "third-party service component" for the gesture recognition service;
Step S634: The "media server" reports to the "service control node" that the mixed media service has been created successfully;
Steps S635–S646: The "service control node" invites UE A to join the session and applies for mixed media resources for UE A; the "service control node" sends a Reinvite carrying an SDP message to UE A; UE A returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE A; the media of UE A is anchored to the media server;
Steps S647–S658: The "service control node" invites UE B to join the session and applies for mixed media resources for UE B; the "service control node" sends a Reinvite carrying an SDP message to UE B; UE B returns a 200 OK message carrying SDP information; the "service control node" applies to the "media server" for the mixed media resources required by UE B; the media of UE B is anchored to the media server;
步骤S659:“服务控制节点”向“媒体服务器”申请手势翻译业务种类及合成处理;Step S659: The "service control node" applies to the "media server" for gesture translation service types and synthesis processing;
步骤S660:“媒体服务器”向“第三方服务组件”申请手势翻译转发服务,对终端数据的语音转文字处理,提取特征数据的手势图像识别,实时手势流生成,实时媒体流生成,合成服务,实时音频流转发,数据流转发等;Step S660: The "media server" applies to the "third-party service component" for gesture translation and forwarding services, voice-to-text processing of terminal data, gesture image recognition for feature data extraction, real-time gesture stream generation, real-time media stream generation, synthesis services, Real-time audio stream forwarding, data stream forwarding, etc.;
步骤S661:“媒体服务器”向UE A发送手势转换成的实时语音流的媒体流信息;该媒体流信息可以是“媒体服务器”经过“应用控制节点”到SBC/PCSCF再到终端;也可以是“媒体服务器”经过“服务控制节点”、“应用控制节点”到SBC/PCSCF再到终端;Step S661: The "media server" sends to UE A the media stream information of the real-time voice stream converted from the gesture; the media stream information can be from the "media server" to the SBC/PCSCF through the "application control node" and then to the terminal; it can also be The "media server" passes through the "service control node" and "application control node" to the SBC/PCSCF and then to the terminal;
Step S662: the "media server" applies to the "third-party service component" for gesture stream generation, translation, synthesis, and forwarding services.
Steps S663 to S665: the "media server" sends the media stream information of the gesture stream, voice stream, and text stream to UE B; the media stream information may travel from the "media server" through the "application control node" to the SBC/P-CSCF and then to the terminal, or from the "media server" through the "service control node" and the "application control node" to the SBC/P-CSCF and then to the terminal.
Step S666: the "media server" returns operation responses, such as those for gesture recognition and for the gesture stream, text stream, and voice stream, to the "service control node".
Through the above embodiments, the achievable objectives include: 1) transmitting gesture information by using a dedicated data channel; 2) reducing the requirements on the terminal by performing gesture recognition on the network side, so that the terminal only needs to be a capture device with a camera, such as an ordinary mobile phone; when an IMS call is established, the gesture recognition application can instruct the terminal to collect gestures as required, the collected gesture-related information is transmitted through the dedicated channel, and a gesture recognition request is initiated to the gesture recognition application server; 3) providing comprehensive services on the platform side, including gesture recognition, analysis, and synthesis, and transmitting service information through the dedicated channel; 4) supporting two-way conversion between sign language and voice/video: sign-language-related gesture information is recognized, analyzed, processed, and synthesized, and, after processing and rendering, is combined into the transcribed text, a standard sign-language video, and the original voice/video stream; 5) supporting the conversion of communication content between different terminal types: by identifying different types of terminals, the platform side converts the information flows between different terminals, thereby realizing gesture communication between different types of terminals. A terminal type supporting the data channel may be an independent application program or a dedicated terminal device.
Through the embodiments of the present application, the achievable effects include the following. (1) Real-time interaction: user communication is economical, convenient, highly usable, and effective. The system uses dedicated channels of 5G and 6G networks and transmits multiple service streams simultaneously through a network-side mixed-media mode, providing a system and method for gesture communication that enables economical, convenient, and rich-experience communication between gesture users and non-gesture users, without relying on special wearable devices. Traditional gesture recognition that relies on wearable devices uses expensive equipment, is only suitable for interaction within a certain range, is often subject to time and space constraints, has poor usability, and does not provide direct, natural interaction and communication. (2) Good scalability: the platform side provides comprehensive services, can connect to third-party service components for service expansion, and can provide interactive and immersive calls under the new architecture. (3) Good security: by using dedicated channels of 5G and 6G networks and IMS calls, data between the terminal and the network is transmitted through encrypted channels, preventing information leakage. (4) Support for converting communication content between different terminal types: by identifying different types of terminals, the platform side converts the information flows between different terminals, realizing gesture communication between different types of terminals.

The specific beneficial effects include at least the following. 1) When a gesture user using terminal type 1 makes a video call with a non-gesture user (using terminal type 1 or 2), where the call may be a video call placed by the gesture user to the non-gesture user or a video call placed by the non-gesture user to the gesture user, either the gesture user or a non-gesture user using terminal type 1 may apply for gesture recognition and conversion. The gesture user can receive and see the standard gesture stream video, text, original voice, and original video converted from the voice of the non-gesture user at the other end; the non-gesture user can hear and see the voice and text converted from the gesture user's gestures, together with the original call video. When the non-gesture user uses terminal type 1, the non-gesture user receives, sees, and hears the voice stream, text stream, and original video stream; when the non-gesture user uses terminal type 2, the non-gesture user receives and hears the voice stream and sees a video stream synthesized from the video and text. 2) When a gesture user using terminal type 2 makes a video call with a non-gesture user (using terminal type 1), where the call may be placed by either party, the non-gesture user may also apply for gesture conversion. The gesture user can see and hear the video stream and voice stream converted from the non-gesture user's voice and synthesized from the gestures, text, and original video; the non-gesture user can see and hear the voice, text, and original call video converted from the gesture user's gestures. 3) When a gesture user using terminal type 1 makes an audio call with a non-gesture user (using terminal type 1 or 2), where the call may be placed by either party, either the gesture user or a non-gesture user using terminal type 1 may apply for gesture recognition and conversion; when the gesture user applies for gesture recognition and conversion, the gesture recognition application is enabled and the camera is turned on. The gesture user can receive and see the standard gesture stream, text, and original voice converted from the voice of the non-gesture user at the other end; the non-gesture user can hear and see the voice stream and text converted from the gesture user's gestures. When the non-gesture user uses terminal type 1, the non-gesture user receives, sees, and hears the voice stream and text stream; when the non-gesture user uses terminal type 2, the non-gesture user receives and hears the voice stream.
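The per-terminal-type behavior enumerated above amounts to a dispatch rule on the platform side. The following is a hypothetical sketch of that rule (the function and stream names are invented for illustration, not defined by the patent), for the direction from a gesture user toward a non-gesture user; terminal type 1 supports the data channel and terminal type 2 does not:

```python
# Toy dispatch logic: which converted streams are sent toward a
# non-gesture user whose peer is a gesture user, per the summary above.

def streams_for_non_gesture_user(call: str, terminal_type: int) -> set:
    """Return the set of streams delivered to the non-gesture user."""
    if call == "video":
        if terminal_type == 1:
            # Voice, text, and the original video travel separately;
            # the text stream rides the data channel.
            return {"voice_stream", "text_stream", "original_video_stream"}
        # Type 2 has no data channel: text is synthesized into the video.
        return {"voice_stream", "video_with_text_overlay"}
    if call == "audio":
        if terminal_type == 1:
            return {"voice_stream", "text_stream"}
        return {"voice_stream"}
    raise ValueError(f"unknown call type: {call}")
```

The opposite direction (toward the gesture user) would follow an analogous table, delivering the standard gesture stream, text, and original media.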
The emergence of fifth-generation (5G) communication technology provides users with mobile networks of higher bandwidth, lower latency, and wider coverage, and can support more applications such as live streaming, virtual reality, and 4K video. 5G technology targets five main application scenarios: 1) ultra-high-speed scenarios, providing extremely fast data network access for future mobile broadband users; 2) support for large crowds, providing a high-quality mobile broadband experience in areas or situations with high population density; 3) the best experience anytime and anywhere, ensuring that users still enjoy high-quality service while on the move; 4) ultra-reliable real-time connections, ensuring that new applications and use cases meet strict standards for latency and reliability; 5) ubiquitous machine-to-machine communication, ensuring efficient handling of communication from a large number of diverse devices, including machine-type devices and sensors.
The above applications place higher requirements on the communication system in 5G networks. 3GPP (Third Generation Partnership Project) Release 16 introduced the IMS (IP Multimedia Subsystem) data channel mechanism. By exploiting the high bandwidth and low latency of 5G networks, the data channel can provide users, on top of audio and video, with additional information such as pictures, text, location, business cards, actions, expressions, and animations, enabling high-definition, visual, novel interactive, and immersive service experiences.
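The additional information types listed above travel as structured messages over the data channel. A toy sketch of one possible framing is shown below; the message schema is invented for illustration and is not defined by 3GPP or by the patent:

```python
# Hypothetical serialization of one data channel message carrying the
# supplementary content types named above (picture/text/location/...).
import json

def make_dc_message(kind: str, payload: dict) -> bytes:
    """Serialize one data channel message as length-agnostic JSON bytes."""
    allowed = {"picture", "text", "location", "card", "action",
               "expression", "animation"}
    if kind not in allowed:
        raise ValueError(f"unsupported kind: {kind}")
    return json.dumps({"kind": kind, "payload": payload}).encode("utf-8")

msg = make_dc_message("text", {"body": "hello"})
```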
In the embodiments of the present application, a system and method are provided for realizing gesture communication in a mixed-media manner by using a dedicated data channel, applicable to 5G and 6G networks. The following problems of gesture recognition or gesture translation in the related art can be avoided: 1) many existing implementations use specific wearable devices on the terminal side to provide the capture function; these devices are expensive, are only suitable for interaction within a certain range, and are subject to time and space constraints, so they are not economical, convenient, or usable enough, and do not provide direct, natural interaction and communication; 2) some implementations provide system functions such as gesture recognition, translation, and synthesis on the terminal side, which places high requirements on the terminal; gesture recognition, translation, and synthesis are not provided on the network side, and information updates are not timely; 3) conversion between different terminal types cannot be realized; 4) some techniques require both communication parties to be in a video call to realize gesture communication, and require the platform side to package the gesture content and send it back to the terminal, which then sends it to the terminal on the other side; gesture communication during a voice call cannot be realized.
The user interface involved in the embodiments of the present application is briefly described as follows: during an audio call, the terminal can turn on the camera through the terminal-side "gesture recognition application"; during the call, the terminal can find a menu containing the gesture recognition function and can initiate a gesture recognition request; and the terminal receives the video, gesture, and text information sent over the data channel, and these contents are presented synchronously on the local handset.
In this embodiment, a gesture communication apparatus is further provided. FIG. 9 is a structural block diagram of a gesture communication apparatus according to an embodiment of the present disclosure. As shown in FIG. 9, the apparatus includes:
a first acquiring module 902, configured to acquire, when a first terminal and a second terminal are in a video call or an audio call, a first request sent by the first terminal or the second terminal, where the first request is used to request recognition of gestures collected by the first terminal during the video call or audio call;
a first creating module 904, configured to create, in response to the first request, a gesture recognition service, where the gesture recognition service is used to recognize the gestures collected by the first terminal;
a second acquiring module 906, configured to acquire, during the video call or audio call, a group of gestures recognized in a group of video frames collected by the first terminal;
a recognition module 908, configured to perform, through the gesture recognition service, semantic recognition on the group of gestures recognized in the group of video frames collected by the first terminal, to obtain the target semantics represented by the group of gestures; and
a first sending module 910, configured to send the target semantics to the second terminal.
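The core of this module pipeline is the mapping performed by the recognition module 908 from a group of gestures to target semantics. The sketch below illustrates that step only; the gesture lexicon and all names are hypothetical stand-ins, whereas a real deployment would invoke the network-side gesture recognition service:

```python
# Minimal stand-in for recognition module 908: map a group of gestures
# to target semantics, preferring a phrase-level entry and otherwise
# concatenating per-gesture semantics. The lexicon is invented.

GESTURE_LEXICON = {
    ("wave",): "hello",
    ("thumb_up",): "good",
    ("point_self", "wave"): "I say hello",
}

def recognize_target_semantics(gesture_group: tuple) -> str:
    """Return the target semantics represented by the group of gestures."""
    if gesture_group in GESTURE_LEXICON:
        return GESTURE_LEXICON[gesture_group]
    parts = [GESTURE_LEXICON.get((g,), "<unknown>") for g in gesture_group]
    return " ".join(parts)
```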
In an optional embodiment, the above apparatus further includes a third acquiring module 1002 and a second creating module 1004, as shown in FIG. 10, which is a first preferred structural block diagram of a gesture communication apparatus according to an embodiment of the present disclosure. The third acquiring module 1002 is configured to acquire a second request sent by the first terminal or the second terminal, where the second request is used to request creation of a target data channel; the second creating module 1004 is configured to create, in response to the second request, the target data channel, where the target data channel is a channel that the first terminal or the second terminal is allowed to use; and the above first acquiring module 902 includes a first acquiring unit, configured to acquire the first request transmitted by the first terminal or the second terminal on the target data channel.
In an optional embodiment, the above third acquiring module 1002 includes a second acquiring unit, configured to acquire the second request sent by the first terminal or the second terminal to a media server through an access control entity SBC/P-CSCF, a session control entity I/S-CSCF, and a service control node; and the above second creating module 1004 includes a first creating unit, configured to create, in response to the second request, the target data channel through the media server, where the target data channel is used to transmit data between the first terminal or the second terminal and the media server.
In an optional embodiment, the above first acquiring unit includes a first acquiring subunit, configured to acquire the first request transmitted by the first terminal or the second terminal to an application control node on the target data channel; and the above first creating module 904 includes: a first processing unit, configured such that the application control node sends a first instruction to the service control node, where the first instruction is used to instruct the service control node to send a second instruction to the media server, and the second instruction is used to instruct the media server to create the gesture recognition service; and a second creating unit, configured to create, in response to the second instruction, the gesture recognition service through the media server, or instruct, through the media server, a third-party service component to create the gesture recognition service.
In an optional embodiment, the above apparatus further includes a second sending module 1102 and a third creating module 1104, as shown in FIG. 11, which is a second preferred structural block diagram of a gesture communication apparatus according to an embodiment of the present disclosure. The second sending module 1102 is configured to send a third instruction to the media server through the service control node, where the third instruction is used to request creation of a mixed-media service, the mixed-media service is used to process the video stream, audio stream, and data stream in the video call, or to process the audio stream and data stream in the audio call, and the data stream is a data stream representing the target semantics; and the third creating module 1104 is configured to create, in response to the third instruction, the mixed-media service through the media server, or instruct, through the media server, a third-party service component to create the mixed-media service.
In an optional embodiment, the above recognition module 908 includes: a first recognition unit, configured to perform, through the gesture recognition service, semantic recognition on the group of gestures recognized in the group of video frames collected by the first terminal, to obtain one or more semantics, where each of the semantics is the semantics expressed by one or more gestures in the group of gestures; and a generating unit, configured to generate, based on the one or more semantics, the target semantics corresponding to the group of gestures.
In an optional embodiment, the above first sending module 910 includes: a first sending unit, configured to send, when the target semantics is a semantics concatenated from the one or more semantics, each of the semantics included in the target semantics to the second terminal synchronously with the corresponding video frames in the group of video frames; or a synthesizing unit, configured to synthesize, when the target semantics is represented by a data stream corresponding to the group of video frames and the data stream includes a text stream and an audio stream, the text stream synchronously with the corresponding video frames in the group of video frames to obtain a target video stream; and a second sending unit, configured to send the target video stream to the second terminal synchronously with the audio stream.
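The synchronous synthesis performed by the synthesizing unit amounts to aligning each text fragment with the video frames it belongs to. A simplified sketch of such timestamp alignment is given below; the data structures and function name are illustrative assumptions, not the patent's implementation:

```python
# Toy timestamp alignment: pair each text fragment with the video frame
# timestamps that fall inside the fragment's interval, yielding the
# overlay plan for the 'target video stream'.

def align_text_to_frames(text_stream, frame_timestamps):
    """text_stream: list of (start_ms, end_ms, text) fragments.
    Returns a list of (text, frames_to_overlay_on) pairs."""
    aligned = []
    for start, end, text in text_stream:
        frames = [t for t in frame_timestamps if start <= t < end]
        aligned.append((text, frames))
    return aligned
```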
In an optional embodiment, the above apparatus further includes: a fourth acquiring module, configured to acquire, when the first terminal and the second terminal are in the video call and both the first terminal and the second terminal support use of a target data channel, a second request sent by the first terminal, where the second request is used to request creation of the target data channel; and a fourth creating module, configured to create, in response to the second request, the target data channel, where the target data channel includes a first target data channel and a second target data channel, the first target data channel is a data channel between the first terminal and a media server, and the second target data channel is a data channel between the second terminal and the media server. The above first acquiring module 902 includes a third acquiring unit, configured to acquire the first request transmitted by the first terminal on the first target data channel. The above first creating module 904 includes: a second processing unit, configured to send, in response to the first request, a target instruction to the media server through a service control node, where the target instruction is used to request creation of a mixed-media service and the gesture recognition service, the mixed-media service is used to process the video stream, audio stream, and data stream in the video call, and the data stream is a data stream representing the target semantics; and a third creating unit, configured to create the mixed-media service and the gesture recognition service through the media server, or instruct, through the media server, a third-party service component to create the mixed-media service and the gesture recognition service. The above second acquiring module 906 includes a fourth acquiring unit, configured to acquire, during the video call, a first group of video frames and a corresponding first group of audio frames collected by the first terminal, and a first group of gestures recognized in the first group of video frames. The above apparatus further includes a first processing module, configured to synchronize, through the mixed-media service, a first video stream formed by the first group of video frames, a first audio stream formed by the first group of audio frames, and a first data stream used to represent the target semantics, to obtain the synchronized first video stream, first audio stream, and first data stream. The above first sending module 910 includes a third sending unit, configured to send the synchronized first video stream, first audio stream, and first data stream to the second terminal, where the synchronized first data stream is sent on the second target data channel.
In an optional embodiment, the above apparatus further includes: a fifth acquiring module, configured to acquire, when the first terminal and the second terminal are in the video call, the first terminal supports use of a target data channel, and the second terminal does not support use of the target data channel, a second request sent by the first terminal, where the second request is used to request creation of the target data channel; and a fifth creating module, configured to create, in response to the second request, the target data channel, where the target data channel is a data channel between the first terminal and a media server. The above first acquiring module 902 includes a fifth acquiring unit, configured to acquire the first request transmitted by the first terminal on the target data channel. The above first creating module 904 includes: a third processing unit, configured to send, in response to the first request, a target instruction to the media server through a service control node, where the target instruction is used to request creation of a mixed-media service, a synthesis service, and the gesture recognition service, the mixed-media service is used to process the video stream, audio stream, and data stream in the video call, and the data stream is a data stream representing the target semantics; and a fourth creating unit, configured to create the mixed-media service, the synthesis service, and the gesture recognition service through the media server, or instruct, through the media server, a third-party service component to create the mixed-media service, the synthesis service, and the gesture recognition service. The above second acquiring module 906 includes a sixth acquiring unit, configured to acquire, during the video call, a second group of video frames and a corresponding second group of audio frames collected by the first terminal, and a second group of gestures recognized in the second group of video frames. The above apparatus further includes a second processing module, configured to synthesize, through the synthesis service, a first text stream used to represent the target semantics with a video stream formed by the second group of video frames to obtain a second video stream, and to synchronize, through the mixed-media service, a second audio stream included in the data stream used to represent the target semantics with the second video stream, to obtain the synchronized second video stream and second audio stream, where the data stream includes the first text stream. The above first sending module 910 includes a fourth sending unit, configured to send the synchronized second video stream and second audio stream to the second terminal.
In an optional embodiment, the above apparatus further includes: a sixth acquiring module, configured to acquire, when the first terminal and the second terminal are in the video call, the first terminal does not support use of a target data channel, and the second terminal supports use of the target data channel, a second request sent by the second terminal, where the second request is used to request creation of the target data channel; and a sixth creating module, configured to create, in response to the second request, the target data channel, where the target data channel is a data channel between the second terminal and a media server. The above first acquiring module 902 includes a seventh acquiring unit, configured to acquire the first request transmitted by the second terminal on the target data channel. The above first creating module 904 includes: a fourth processing unit, configured to send, in response to the first request, a target instruction to the media server through a service control node, where the target instruction is used to request creation of a mixed-media service and the gesture recognition service, the mixed-media service is used to process the video stream, audio stream, and data stream in the video call, and the data stream is a data stream representing the target semantics; and a fifth creating unit, configured to create the mixed-media service and the gesture recognition service through the media server, or instruct, through the media server, a third-party service component to create the mixed-media service and the gesture recognition service. The above second acquiring module 906 includes an eighth acquiring unit, configured to acquire, during the video call, a third group of video frames and a corresponding third group of audio frames collected by the first terminal, and a third group of gestures recognized in the third group of video frames. The above apparatus further includes a third processing module, configured to synchronize, through the mixed-media service, a third video stream formed by the third group of video frames, a third audio stream formed by the third group of audio frames, and a third data stream used to represent the target semantics, to obtain the synchronized third video stream, third audio stream, and third data stream. The above first sending module 910 includes a fifth sending unit, configured to send the synchronized third video stream, third audio stream, and third data stream to the second terminal, where the synchronized third data stream is sent on the target data channel.
In an optional embodiment, the above apparatus further includes: a seventh acquiring module, configured to acquire, when the first terminal and the second terminal are in the audio call and both the first terminal and the second terminal support use of a target data channel, a second request sent by the first terminal, where the second request is used to request creation of the target data channel; and a seventh creating module, configured to create, in response to the second request, the target data channel, where the target data channel includes a first target data channel and a second target data channel, the first target data channel is a data channel between the first terminal and a media server, and the second target data channel is a data channel between the second terminal and the media server. The above first acquiring module 902 includes a ninth acquiring unit, configured to acquire the first request transmitted by the first terminal on the first target data channel. The above first creating module 904 includes: a fifth processing unit, configured to send, in response to the first request, a target instruction to the media server through a service control node, where the target instruction is used to request creation of a mixed-media service and the gesture recognition service, the mixed-media service is used to process the audio stream and data stream in the audio call, and the data stream is a data stream representing the target semantics; and a sixth creating unit, configured to create the mixed-media service and the gesture recognition service through the media server, or instruct, through the media server, a third-party service component to create the mixed-media service and the gesture recognition service. The above second acquiring module 906 includes a tenth acquiring unit, configured to acquire, during the audio call, a fourth group of video frames and a corresponding fourth group of audio frames collected by the first terminal, and a fourth group of gestures recognized in the fourth group of video frames. The above apparatus further includes a fourth processing module, configured to synchronize, through the mixed-media service, a second text stream used to represent the target semantics and a fourth audio stream formed by the fourth group of audio frames, to obtain the synchronized second text stream and fourth audio stream, where the data stream includes the second text stream. The above first sending module 910 includes a sixth sending unit, configured to send the synchronized second text stream and fourth audio stream to the second terminal, where the synchronized second text stream is sent on the second target data channel.
In an optional embodiment, the apparatus further includes: an eighth acquisition module, configured to acquire, when the first terminal and the second terminal are conducting the audio call and the first terminal supports the use of a target data channel while the second terminal does not, a second request sent by the first terminal, where the second request is used to request creation of the target data channel; and an eighth creating module, configured to create the target data channel in response to the second request, where the target data channel is a data channel between the first terminal and a media server. The first acquisition module 902 includes: an eleventh acquisition unit, configured to acquire the first request transmitted by the first terminal on the target data channel. The first creation module 904 includes: a sixth processing unit, configured to send, in response to the first request, a target instruction to the media server through a service control node, where the target instruction is used to request creation of the gesture recognition service; and a seventh creating unit, configured to create the gesture recognition service through the media server, or to instruct, through the media server, a third-party service component to create the gesture recognition service. The second acquisition module 906 includes: a twelfth acquisition unit, configured to acquire, during the audio call, a fifth group of video frames and a corresponding fifth group of audio frames collected by the first terminal, as well as a fifth group of gestures identified in the fifth group of video frames. The first sending module 910 includes: a seventh sending unit, configured to send a fifth audio stream used to represent the target semantics to the second terminal.
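By way of a non-normative illustration of the synchronization performed by the processing modules above, the pairing of a gesture-semantics text stream with audio frames by timestamp could be sketched as follows. All class and field names here (`AudioFrame`, `TextChunk`, `ts_ms`, the matching window) are illustrative assumptions and are not defined in this disclosure:

```python
# Illustrative sketch: align a recognized-semantics text stream with audio
# frames by capture timestamp, as a mixed media service might do before
# forwarding the text on the data channel and the audio on the media plane.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AudioFrame:
    ts_ms: int       # capture timestamp of the frame, in milliseconds
    payload: bytes   # encoded audio data

@dataclass
class TextChunk:
    ts_ms: int       # timestamp of the gesture the text was derived from
    text: str        # recognized semantics, e.g. "hello"

def synchronize(text_stream: List[TextChunk],
                audio_stream: List[AudioFrame],
                window_ms: int = 200) -> List[Tuple[AudioFrame, str]]:
    """Pair each audio frame with the text chunks whose timestamps fall
    within window_ms of the frame, so both can be emitted together."""
    out = []
    for frame in audio_stream:
        matched = [c.text for c in text_stream
                   if abs(c.ts_ms - frame.ts_ms) <= window_ms]
        out.append((frame, " ".join(matched)))
    return out
```

In an actual deployment the pairing would be driven by the media clock of the call rather than a fixed window; the sketch only shows the shape of the synchronization step.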
It should be noted that the above modules may be implemented in software or hardware. In the latter case, this may be achieved in, but is not limited to, the following ways: the above modules are all located in the same processor, or the above modules are distributed, in any combination, across different processors.
Embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program, where the computer program is configured to perform the steps in any one of the above method embodiments when run.
In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to, various media capable of storing a computer program, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Embodiments of the present disclosure further provide an electronic apparatus including a memory and a processor, where the memory stores a computer program and the processor is configured to run the computer program to perform the steps in any one of the above method embodiments.
In an exemplary embodiment, the electronic apparatus may further include a transmission device and an input/output device, both of which are connected to the processor.
For specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and exemplary implementations, which are not repeated here.
Obviously, those skilled in the art will appreciate that the modules or steps of the present disclosure described above may be implemented on a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of multiple computing devices. They may be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, and in some cases the steps shown or described may be performed in an order different from that described here; alternatively, they may be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present disclosure is not limited to any specific combination of hardware and software.
The above descriptions are merely preferred embodiments of the present disclosure and are not intended to limit it; for those skilled in the art, the present disclosure may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the principles of the present disclosure shall fall within its protection scope.

Claims (15)

  1. A gesture-based communication method, comprising:
    acquiring, when a first terminal and a second terminal are conducting a video call or an audio call, a first request sent by the first terminal or the second terminal, wherein the first request is used to request creation of a gesture recognition service, and the gesture recognition service is used to perform semantic recognition on gestures identified in video frames collected by the first terminal;
    creating the gesture recognition service in response to the first request;
    acquiring, during the video call or audio call, a group of gestures identified in a group of video frames collected by the first terminal;
    performing, through the gesture recognition service, semantic recognition on the group of gestures identified in the group of video frames collected by the first terminal, to obtain target semantics represented by the group of gestures; and
    sending the target semantics to the second terminal.
  2. The method according to claim 1, wherein:
    the method further comprises: acquiring a second request sent by the first terminal or the second terminal, wherein the second request is used to request creation of a target data channel; and creating the target data channel in response to the second request, wherein the target data channel is a channel that the first terminal or the second terminal is permitted to use; and
    the acquiring the first request sent by the first terminal or the second terminal comprises: acquiring the first request transmitted by the first terminal or the second terminal on the target data channel.
  3. The method according to claim 2, wherein:
    the acquiring the second request sent by the first terminal or the second terminal comprises: acquiring the second request sent by the first terminal or the second terminal to a media server via an access control entity SBC/P-CSCF, a session control entity I/S-CSCF, and a service control node; and
    the creating the target data channel in response to the second request comprises: creating the target data channel through the media server in response to the second request, wherein the target data channel is used to transmit data between the first terminal or the second terminal and the media server.
  4. The method according to claim 3, wherein:
    the acquiring the first request transmitted by the first terminal or the second terminal on the target data channel comprises: acquiring the first request transmitted by the first terminal or the second terminal to an application control node on the target data channel; and
    the creating the gesture recognition service in response to the first request comprises: sending, by the application control node, a first instruction to the service control node, wherein the first instruction is used to instruct the service control node to send a second instruction to the media server, and the second instruction is used to instruct the media server to create the gesture recognition service; and, in response to the second instruction, creating the gesture recognition service through the media server, or instructing, through the media server, a third-party service component to create the gesture recognition service.
  5. The method according to claim 1, further comprising:
    sending a third instruction to a media server through a service control node, wherein the third instruction is used to request creation of a mixed media service, the mixed media service is used to process a video stream, an audio stream, and a data stream in the video call, or to process an audio stream and a data stream in the audio call, and the data stream represents the target semantics; and, in response to the third instruction, creating the mixed media service through the media server, or instructing, through the media server, a third-party service component to create the mixed media service.
  6. The method according to claim 1, wherein the performing, through the gesture recognition service, semantic recognition on the group of gestures identified in the group of video frames collected by the first terminal, to obtain the target semantics represented by the group of gestures, comprises:
    performing, through the gesture recognition service, semantic recognition on the group of gestures identified in the group of video frames collected by the first terminal, to obtain one or more semantics, wherein each of the semantics is the meaning expressed by one or more gestures in the group of gestures; and
    generating, based on the one or more semantics, the target semantics corresponding to the group of gestures.
  7. The method according to claim 6, wherein the sending the target semantics to the second terminal comprises:
    when the target semantics is a semantics spliced from the one or more semantics, sending each semantics included in the target semantics to the second terminal in synchronization with the corresponding video frame in the group of video frames; or
    when the target semantics is represented by a data stream corresponding to the group of video frames, and the data stream comprises a text stream and an audio stream, synchronously combining the text stream with the corresponding video frames in the group of video frames to obtain a target video stream, and sending the target video stream to the second terminal in synchronization with the audio stream.
  8. The method according to claim 1, wherein:
    the method further comprises: acquiring, when the first terminal and the second terminal are conducting the video call and both the first terminal and the second terminal support the use of a target data channel, a second request sent by the first terminal, wherein the second request is used to request creation of the target data channel; and creating the target data channel in response to the second request, wherein the target data channel comprises a first target data channel between the first terminal and a media server and a second target data channel between the second terminal and the media server;
    the acquiring the first request sent by the first terminal or the second terminal comprises: acquiring the first request transmitted by the first terminal on the first target data channel;
    the creating the gesture recognition service in response to the first request comprises: sending, in response to the first request, a target instruction to the media server through a service control node, wherein the target instruction is used to request creation of a mixed media service and the gesture recognition service, the mixed media service is used to process a video stream, an audio stream, and a data stream in the video call, and the data stream represents the target semantics; and creating the mixed media service and the gesture recognition service through the media server, or instructing, through the media server, a third-party service component to create the mixed media service and the gesture recognition service;
    the acquiring, during the video call or audio call, a group of gestures identified in a group of video frames collected by the first terminal comprises: acquiring, during the video call, a first group of video frames and a corresponding first group of audio frames collected by the first terminal, and a first group of gestures identified in the first group of video frames;
    after the target semantics is obtained, the method further comprises: synchronizing, through the mixed media service, a first video stream formed by the first group of video frames, a first audio stream formed by the first group of audio frames, and a first data stream used to represent the target semantics, to obtain the synchronized first video stream, first audio stream, and first data stream; and
    the sending the target semantics to the second terminal comprises: sending the synchronized first video stream, first audio stream, and first data stream to the second terminal, wherein the synchronized first data stream is sent on the second target data channel.
  9. The method according to claim 1, wherein:
    the method further comprises: acquiring, when the first terminal and the second terminal are conducting the video call and the first terminal supports the use of a target data channel while the second terminal does not, a second request sent by the first terminal, wherein the second request is used to request creation of the target data channel; and creating the target data channel in response to the second request, wherein the target data channel is a data channel between the first terminal and a media server;
    the acquiring the first request sent by the first terminal or the second terminal comprises: acquiring the first request transmitted by the first terminal on the target data channel;
    the creating the gesture recognition service in response to the first request comprises: sending, in response to the first request, a target instruction to the media server through a service control node, wherein the target instruction is used to request creation of a mixed media service, a composition service, and the gesture recognition service, the mixed media service is used to process a video stream, an audio stream, and a data stream in the video call, and the data stream represents the target semantics; and creating the mixed media service, the composition service, and the gesture recognition service through the media server, or instructing, through the media server, a third-party service component to create the mixed media service, the composition service, and the gesture recognition service;
    the acquiring, during the video call or audio call, a group of gestures identified in a group of video frames collected by the first terminal comprises: acquiring, during the video call, a second group of video frames and a corresponding second group of audio frames collected by the first terminal, and a second group of gestures identified in the second group of video frames;
    after the target semantics is obtained, the method further comprises: combining, through the composition service, a first text stream used to represent the target semantics with the video stream formed by the second group of video frames to obtain a second video stream; and synchronizing, through the mixed media service, a second audio stream included in the data stream used to represent the target semantics with the second video stream, to obtain the synchronized second video stream and second audio stream, wherein the data stream comprises the first text stream; and
    the sending the target semantics to the second terminal comprises: sending the synchronized second video stream and second audio stream to the second terminal.
  10. The method according to claim 1, wherein:
    the method further comprises: acquiring, when the first terminal and the second terminal are conducting the video call and the first terminal does not support the use of a target data channel while the second terminal does, a second request sent by the second terminal, wherein the second request is used to request creation of the target data channel; and creating the target data channel in response to the second request, wherein the target data channel is a data channel between the second terminal and a media server;
    the acquiring the first request sent by the first terminal or the second terminal comprises: acquiring the first request transmitted by the second terminal on the target data channel;
    the creating the gesture recognition service in response to the first request comprises: sending, in response to the first request, a target instruction to the media server through a service control node, wherein the target instruction is used to request creation of a mixed media service and the gesture recognition service, the mixed media service is used to process a video stream, an audio stream, and a data stream in the video call, and the data stream represents the target semantics; and creating the mixed media service and the gesture recognition service through the media server, or instructing, through the media server, a third-party service component to create the mixed media service and the gesture recognition service;
    the acquiring, during the video call or audio call, a group of gestures identified in a group of video frames collected by the first terminal comprises: acquiring, during the video call, a third group of video frames and a corresponding third group of audio frames collected by the first terminal, and a third group of gestures identified in the third group of video frames;
    after the target semantics is obtained, the method further comprises: synchronizing, through the mixed media service, a third video stream formed by the third group of video frames, a third audio stream formed by the third group of audio frames, and a third data stream used to represent the target semantics, to obtain the synchronized third video stream, third audio stream, and third data stream; and
    the sending the target semantics to the second terminal comprises: sending the synchronized third video stream, third audio stream, and third data stream to the second terminal, wherein the synchronized third data stream is sent on the target data channel.
  11. The method according to claim 1, wherein:
    the method further comprises: acquiring, when the first terminal and the second terminal are conducting the audio call and both the first terminal and the second terminal support the use of a target data channel, a second request sent by the first terminal, wherein the second request is used to request creation of the target data channel; and creating the target data channel in response to the second request, wherein the target data channel comprises a first target data channel between the first terminal and a media server and a second target data channel between the second terminal and the media server;
    the acquiring the first request sent by the first terminal or the second terminal comprises: acquiring the first request transmitted by the first terminal on the first target data channel;
    the creating the gesture recognition service in response to the first request comprises: sending, in response to the first request, a target instruction to the media server through a service control node, wherein the target instruction is used to request creation of a mixed media service and the gesture recognition service, the mixed media service is used to process an audio stream and a data stream in the audio call, and the data stream represents the target semantics; and creating the mixed media service and the gesture recognition service through the media server, or instructing, through the media server, a third-party service component to create the mixed media service and the gesture recognition service;
    the acquiring, during the video call or audio call, a group of gestures identified in a group of video frames collected by the first terminal comprises: acquiring, during the audio call, a fourth group of video frames and a corresponding fourth group of audio frames collected by the first terminal, and a fourth group of gestures identified in the fourth group of video frames;
    after the target semantics is obtained, the method further comprises: synchronizing, through the mixed media service, a second text stream used to represent the target semantics and a fourth audio stream formed by the fourth group of audio frames, to obtain the synchronized second text stream and fourth audio stream, wherein the data stream comprises the second text stream; and
    the sending the target semantics to the second terminal comprises: sending the synchronized second text stream and fourth audio stream to the second terminal, wherein the synchronized second text stream is sent on the second target data channel.
  12. The method according to claim 1, wherein:
    the method further comprises: acquiring, when the first terminal and the second terminal are conducting the audio call and the first terminal supports the use of a target data channel while the second terminal does not, a second request sent by the first terminal, wherein the second request is used to request creation of the target data channel; and creating the target data channel in response to the second request, wherein the target data channel is a data channel between the first terminal and a media server;
    the acquiring the first request sent by the first terminal or the second terminal comprises: acquiring the first request transmitted by the first terminal on the target data channel;
    the creating the gesture recognition service in response to the first request comprises: sending, in response to the first request, a target instruction to the media server through a service control node, wherein the target instruction is used to request creation of the gesture recognition service; and creating the gesture recognition service through the media server, or instructing, through the media server, a third-party service component to create the gesture recognition service;
    the acquiring, during the video call or audio call, a group of gestures identified in a group of video frames collected by the first terminal comprises: acquiring, during the audio call, a fifth group of video frames and a corresponding fifth group of audio frames collected by the first terminal, and a fifth group of gestures identified in the fifth group of video frames; and
    the sending the target semantics to the second terminal comprises: sending a fifth audio stream used to represent the target semantics to the second terminal.
  13. A gesture-based communication apparatus, comprising:
    a first acquisition module, configured to acquire, when a first terminal and a second terminal are conducting a video call or an audio call, a first request sent by the first terminal or the second terminal, wherein the first request is used to request creation of a gesture recognition service, and the gesture recognition service is used to perform semantic recognition on gestures identified in video frames collected by the first terminal;
    a first creation module, configured to create the gesture recognition service in response to the first request;
    a second acquisition module, configured to acquire, during the video call or audio call, a group of gestures identified in a group of video frames collected by the first terminal;
    a recognition module, configured to perform, through the gesture recognition service, semantic recognition on the group of gestures identified in the group of video frames collected by the first terminal, to obtain target semantics represented by the group of gestures; and
    a first sending module, configured to send the target semantics to the second terminal.
  14. A computer-readable storage medium, comprising a stored program, wherein, when the program is executed by a processor, the method according to any one of claims 1 to 12 is implemented.
  15. An electronic apparatus, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to execute the method according to any one of claims 1 to 12 through the computer program.
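The apparatus of claim 13 enumerates five modules forming a pipeline: obtain the first request, create the service, obtain the recognized gestures, map them to target semantics, and send the semantics to the second terminal. The sketch below illustrates that pipeline under stated assumptions; the class name, the `GESTURE_SEMANTICS` lookup table, and every method name are invented for illustration, since the claim specifies only the modules' functions.

```python
# Stand-in mapping from a recognized gesture group to target semantics;
# a real service would perform model-based semantic recognition.
GESTURE_SEMANTICS = {("wave", "point"): "hello"}


class GestureCommunicationApparatus:
    def __init__(self):
        self.service_created = False
        self.sent = []

    # First obtaining module: receive the first request sent by the
    # first terminal or the second terminal during the call.
    def obtain_first_request(self, request: str) -> bool:
        return request == "create_gesture_recognition_service"

    # First creation module: create the gesture recognition service
    # in response to the first request.
    def create_service(self, request: str) -> None:
        if self.obtain_first_request(request):
            self.service_created = True

    # Second obtaining module + recognition module: take the group of
    # gestures recognized in the collected video frames and perform
    # semantic recognition on it.
    def recognize(self, gestures) -> str:
        assert self.service_created, "gesture recognition service not created"
        return GESTURE_SEMANTICS.get(tuple(gestures), "<unknown>")

    # First sending module: send the target semantics to the second terminal.
    def send_to_second_terminal(self, semantics: str) -> None:
        self.sent.append(semantics)


app = GestureCommunicationApparatus()
app.create_service("create_gesture_recognition_service")
semantics = app.recognize(["wave", "point"])
app.send_to_second_terminal(semantics)
print(semantics)  # hello
print(app.sent)   # ['hello']
```

Unrecognized gesture groups fall through to a placeholder result here, standing in for whatever fallback a deployed service would apply.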
PCT/CN2022/123487 2021-10-20 2022-09-30 Gesture-based communication method and apparatus, storage medium, and electronic apparatus WO2023066023A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111218290.3A CN113660449B (en) 2021-10-20 2021-10-20 Gesture communication method and device, storage medium and electronic device
CN202111218290.3 2021-10-20

Publications (1)

Publication Number Publication Date
WO2023066023A1 true WO2023066023A1 (en) 2023-04-27

Family

ID=78484250

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/123487 WO2023066023A1 (en) 2021-10-20 2022-09-30 Gesture-based communication method and apparatus, storage medium, and electronic apparatus

Country Status (2)

Country Link
CN (1) CN113660449B (en)
WO (1) WO2023066023A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113660449B (en) * 2021-10-20 2022-03-01 中兴通讯股份有限公司 Gesture communication method and device, storage medium and electronic device
CN116719419B (en) * 2023-08-09 2023-11-03 世优(北京)科技有限公司 Intelligent interaction method and system for meta universe

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102984496A (en) * 2012-12-21 2013-03-20 华为技术有限公司 Processing method, device and system of video and audio information in video conference
CN105100482A (en) * 2015-07-30 2015-11-25 努比亚技术有限公司 Mobile terminal and system for realizing sign language identification, and conversation realization method of the mobile terminal
CN106254960A (en) * 2016-08-30 2016-12-21 福州瑞芯微电子股份有限公司 A kind of video call method for communication disorders and system
US10176366B1 (en) * 2017-11-01 2019-01-08 Sorenson Ip Holdings Llc Video relay service, communication system, and related methods for performing artificial intelligence sign language translation services in a video relay service environment
US20200117888A1 (en) * 2018-10-11 2020-04-16 Chris Talbot Interactive sign language response system and method
CN113660449A (en) * 2021-10-20 2021-11-16 中兴通讯股份有限公司 Gesture communication method and device, storage medium and electronic device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070065A (en) * 2019-04-30 2019-07-30 李冠津 The sign language systems and the means of communication of view-based access control model and speech-sound intelligent
KR102212298B1 (en) * 2020-11-09 2021-02-05 주식회사 라젠 Platform system for providing video communication between non disabled and hearing impaired based on artificial intelligence


Also Published As

Publication number Publication date
CN113660449B (en) 2022-03-01
CN113660449A (en) 2021-11-16

Similar Documents

Publication Publication Date Title
WO2023066023A1 (en) Gesture-based communication method and apparatus, storage medium, and electronic apparatus
CN106331581B (en) Method and device for communication between mobile terminal and video network terminal
US9024997B2 (en) Virtual presence via mobile
US7996540B2 (en) Method and system for replacing media stream in a communication process of a terminal
CN101924772B (en) Communication system and method supporting cross-network and cross-terminal realization of multimedia session merging
CN101677388A (en) Visual communication system, terminal gateway, video gateway and visual communication method
US7142643B2 (en) Method and system for unifying phonebook for varied hearing disabilities
RU2504090C2 (en) Method, apparatus and system for making video call
JP2006217592A (en) Video call method for providing image via third display device
CN110475094B (en) Video conference processing method and device and readable storage medium
WO2012075937A1 (en) Video call method and videophone
CN108881149B (en) Access method and system of video telephone equipment
CN108574689B (en) Method and device for video call
KR20120018708A (en) Method and system for providing multimedia content during communication service
CN112543301A (en) Intelligent conference system based on IMS and implementation method thereof
CN113923470A (en) Live stream processing method and device
CN112714131A (en) Cross-platform microphone connecting method and device, storage medium and electronic equipment
WO2014012384A1 (en) Communication data transmitting method, system and receiving device
WO2023005524A1 (en) Order payment method and apparatus, and storage medium, device and system
CN106230915A (en) A kind of method and system realizing function machine intelligent communication
CN101568007B (en) Video information processing method and system based on 3G video calling center
CN102045535B (en) Device, system and method for user to select customer service representative by video
US8588394B2 (en) Content switch for enhancing directory assistance
JP2008042386A (en) Communication terminal device
CN216930179U (en) Video transmission device for Mini video processing card

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22882634

Country of ref document: EP

Kind code of ref document: A1