CN112511789B - Instant messaging expansion method and system - Google Patents

Instant messaging expansion method and system

Info

Publication number
CN112511789B
Authority
CN
China
Prior art keywords
sip server
client
network
video
interruption
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011377860.9A
Other languages
Chinese (zh)
Other versions
CN112511789A (en)
Inventor
曹春林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Manji Network Technology Co ltd
Original Assignee
Chongqing Manji Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Manji Network Technology Co ltd
Priority to CN202011377860.9A
Publication of CN112511789A
Application granted
Publication of CN112511789B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/04 Real-time or near real-time messaging, e.g. instant messaging [IM]
    • H04L51/046 Interoperability with other network applications or services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066 Session management
    • H04L65/1069 Session establishment or de-establishment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066 Session management
    • H04L65/1101 Session protocols
    • H04L65/1104 Session initiation protocol [SIP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H04L65/403 Arrangements for multi-party communication, e.g. for conferences

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Environmental & Geological Engineering (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to the technical field of communication and discloses an instant messaging expansion method and an instant messaging expansion system. The method comprises the following steps: S1, a client establishes a connection with an XMPP server; S2, after receiving a video conference request, the client acquires current network quality information and sends the video conference request and the current network quality information to the XMPP server; S3, the XMPP server judges whether the current network quality information meets a preset network quality requirement and, if so, establishes a connection with an SIP server through a gateway, sends the video conference request to the SIP server, and jumps to S4; S4, the SIP server establishes a connection with the client based on the video conference request and sets up the video conference. With this technical scheme, invalid connections between the XMPP server and the SIP server can be avoided.

Description

Instant messaging expansion method and system
Technical Field
The present invention relates to the field of communications technologies, and in particular, to an instant messaging extension method and system.
Background
To realize instant messaging, the two communicating parties must use a common communication protocol. Currently, there are four mainstream instant messaging protocols: the Instant Messaging and Presence Protocol (IMPP), the Presence and Instant Messaging Protocol (PRIM), SIP for Instant Messaging and Presence Leveraging Extensions (SIMPLE), and the Extensible Messaging and Presence Protocol (XMPP).
The Extensible Messaging and Presence Protocol, referred to as the XMPP protocol in this specification, is based on XML, a subset of the Standard Generalized Markup Language. It was developed from the open-source communication protocol Jabber and communicates over TCP. The XMPP protocol can be regarded as an XML application that defines three roles: server, client, and gateway. Information can be exchanged between two or more network entities through any two of these roles. The server is mainly responsible for connection management and message routing, and it also records client information. The gateway connects different communication systems, such as MSN and ICQ, through message forwarding and routing to realize intercommunication. The advantages of the XMPP protocol are that its information exchange is near real-time and that applications based on it are highly extensible and easy to maintain.
At present, XMPP is mainly used to send pictures, text, and voice messages. XMPP alone cannot fully support video conferencing, so it cannot cover every instant messaging use case. SIP is an application-layer control protocol that can be used to establish, modify, and terminate multimedia sessions (or conferences). Taking advantage of the good extensibility of the XMPP protocol, the XMPP server can be connected to an SIP server through a gateway, and the SIP server is then used to support video conferences.
However, the SIP server is still an extension server, and establishing a connection between the XMPP server and the SIP server occupies bandwidth resources. If that connection is established directly for every video conference request from a client, without any prior check, bandwidth is wasted whenever the video conference cannot proceed normally.
Therefore, an instant messaging extension method and system that can avoid invalid connections between the XMPP server and the SIP server are needed.
Disclosure of Invention
The invention provides an instant messaging extension method and system, which can avoid invalid connection between an XMPP server and an SIP server.
In order to solve the technical problem, the present application provides the following technical solutions:
An instant messaging extension method comprises the following steps:
S1, a client establishes a connection with an XMPP server;
S2, after receiving a video conference request, the client acquires current network quality information; the client sends the video conference request and the current network quality information to the XMPP server;
S3, the XMPP server judges whether the current network quality information meets a preset network quality requirement; if so, the XMPP server establishes a connection with an SIP server through a gateway, sends the video conference request to the SIP server, and jumps to S4; if not, the XMPP server sends a network monitoring instruction to the client, the client acquires network quality information in real time based on the network monitoring instruction, and a meeting-opening prompt is generated when the network quality information meets the preset network quality requirement;
S4, the SIP server establishes a connection with the client based on the video conference request and sets up the video conference.
The principle and beneficial effects of the basic scheme are as follows:
In this scheme, the XMPP server first judges whether the current network quality information meets the preset network quality requirement and establishes the connection only when it does. This avoids the case in which the client's network quality is so poor that the video conference cannot proceed smoothly even after connection, and it also avoids invalid connections between the XMPP server and the SIP server. In addition, the client acquires network quality information in real time based on the network monitoring instruction and generates a meeting-opening prompt when the information meets the preset network quality requirement. The user is thus reminded promptly once the network recovers and can then input the video conference request at the client again.
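The pre-check in S3 can be sketched as a simple gate. The following Python sketch is illustrative only: the metric names follow the upload speed, download speed, and signal strength requirements named in the embodiment, while the numeric thresholds, units, class name, and returned action labels are assumptions that do not appear in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkQuality:
    upload_mbps: float
    download_mbps: float
    signal_dbm: float


# Hypothetical preset requirement; the patent gives no numeric values.
REQUIRED = NetworkQuality(upload_mbps=1.0, download_mbps=2.0, signal_dbm=-75.0)


def meets_requirement(q: NetworkQuality, required: NetworkQuality = REQUIRED) -> bool:
    """S3 check: every metric must reach its preset requirement."""
    return (
        q.upload_mbps >= required.upload_mbps
        and q.download_mbps >= required.download_mbps
        and q.signal_dbm >= required.signal_dbm
    )


def handle_conference_request(q: NetworkQuality) -> str:
    """Return an illustrative label for the action the XMPP server takes."""
    if meets_requirement(q):
        # Only now is the XMPP-to-SIP connection opened through the gateway.
        return "connect_sip_and_forward_request"
    # Otherwise the client is told to keep monitoring its own network.
    return "send_network_monitoring_instruction"
```

The point of the gate is that the connection to the SIP server is opened only on the first branch, so a request from a poorly connected client never consumes bandwidth between the two servers.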
Further, S5, during the video conference, the client collects video content and sends it through the SIP server to the other clients participating in the video conference; the SIP server distinguishes the clients according to the video content and marks them as speaking clients and listening clients respectively; the SIP server stores the video content of the speaking client in real time;
S6, the listening client acquires current network quality information and judges, based on it, whether a network interruption has occurred; if so, the listening client records the duration of the network interruption and sends the duration to the SIP server after the network recovers;
S7, the SIP server extracts, according to the duration of the network interruption, the video content during the interruption from the video content of the speaking client, identifies the sound information in that content, and judges whether a human voice is present; if no human voice is present, the method jumps to S8;
S8, the SIP server sends the stored video content of the speaking client to the listening client with the network interruption, starting from the time point of network recovery.
Network interruptions can occur even when network quality is good, for example a sudden traffic interruption caused by a defect in a router or network card. When a network interruption occurs at the listening client, the client records its duration and sends the duration to the SIP server after the network recovers, so that the SIP server knows the network condition of the listening client. When the video content during the interruption contains no human voice, the SIP server sends the stored video content of the speaking client to the listening client with the network interruption, starting from the time point of network recovery. Because content without a human voice carries no important information, it is simply skipped. The listening client with the network interruption can therefore catch up with the conference immediately without losing information, which ensures the normal operation of the video conference.
Further, in S7, if a human voice is present, the method jumps to S9;
S9, the SIP server stores the user name corresponding to each client; the SIP server converts the human voice into text and judges, based on the text, whether speaker conversion information exists; if so, the method jumps to S10;
S10, the SIP server extracts the name of the next speaker from the speaker conversion information and judges, based on that name and the stored user names corresponding to the clients, whether the next speaker is the user corresponding to the listening client with the network interruption; if not, the method jumps to S11;
S11, the SIP server accelerates the stored video content of the speaking client, starting from the time point when the network interruption occurred, and sends it to the listening client with the network interruption.
Further, in S10, if the next speaker is the user corresponding to the listening client with the network interruption, the method jumps to S12;
S12, the SIP server sends the text to the listening client with the network interruption.
When the video content during the interruption contains a human voice, the SIP server accelerates the stored video content of the speaking client from the time point when the network interruption occurred and sends the accelerated video content to the listening client with the network interruption. The user therefore does not miss the information that could not be received during the interruption. Compared with the prior art, accelerating the video content lets the user catch up with the real-time progress of the conference quickly without missing information, and avoids a large delay.
In a conference, one person often finishes speaking and hands over to another. When the next speaker is the user corresponding to the listening client with the network interruption, accelerating the video would require that user to watch the earlier video before being able to speak without having missed information. The other participants would have to wait a long time, which harms the experience of everyone in the conference.
In this case, for the user corresponding to the listening client with the network interruption, obtaining the missed information quickly is more important than obtaining it imperceptibly (that is, by watching accelerated video and gradually catching up with the conference), so the text is sent instead.
Further, in S9, if no speaker conversion information exists, the method jumps to S13;
S13, the SIP server judges whether the network interruption time exceeds a third threshold; if so, the method jumps to S12; if not, the method jumps to S11.
If the network interruption lasts too long, too much content is missed. Catching up with the conference by video acceleration would then take a long time, leaving the user out of sync with the other users for a long period and harming the user experience. This preferred scheme effectively avoids that situation.
Further, in S10, if the next speaker is not the user corresponding to the listening client with the network interruption, the method jumps to S13.
Further, S11 further includes: S1101, the SIP server marks the video content of the speaking client spanning a preset time length from the time point when the network interruption occurred as the video to be processed; the SIP server extracts the human voice and the environmental sound from the sound information in the video to be processed, judges based on the human voice whether the speech rate is lower than a first threshold, and judges based on the environmental sound whether the environmental volume is lower than a second threshold; if the speech rate is lower than the first threshold and the environmental volume is lower than the second threshold, the method jumps to S1102; if the speech rate is not lower than the first threshold or the environmental volume is not lower than the second threshold, the method jumps to S1103;
S1102, the SIP server accelerates the video to be processed as a whole, so that the reduction in playing time of the accelerated video to be processed equals the duration of the network interruption;
S1103, the SIP server identifies each pause in the speech according to the human voice and judges whether the total pause duration is longer than the duration of the network interruption; if so, the method jumps to S1104;
S1104, the SIP server accelerates each pause according to a preset proportion, so that the reduction in playing time of the accelerated video to be processed equals the duration of the network interruption.
Further, in S1103, if the total pause duration is less than or equal to the duration of the network interruption, the method jumps to S1105;
S1105, the SIP server extends the video to be processed by a preset time length and then jumps to S1103.
Compared with the prior art, this preferred scheme does not simply accelerate the missed video. Video content of the speaking client spanning a preset time length from the time point when the network interruption occurred is marked as the video to be processed, so that a sufficiently long video is accelerated. This reduces the user's perception of the acceleration and improves the user experience. For example, if acceleration must reduce the playing time by 5 seconds, a video that is 20 seconds long before acceleration needs a lower acceleration rate than one that is 10 seconds long, so the user perceives the acceleration less and the viewing experience is better.
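The relation between the length of the video to be processed and the acceleration rate can be checked with a short calculation. This is a sketch under the assumption that the whole video is played at one uniform speed; the function name is hypothetical. If a video of length L must play D seconds faster, its playing time becomes L - D, so the speed factor is L / (L - D).

```python
def speed_factor(window_seconds: float, outage_seconds: float) -> float:
    """Uniform playback speed that shortens the window by outage_seconds."""
    if outage_seconds < 0:
        raise ValueError("outage duration cannot be negative")
    if window_seconds <= outage_seconds:
        # The video to be processed must be longer than the time to recover.
        raise ValueError("window must be longer than the outage")
    return window_seconds / (window_seconds - outage_seconds)


# Recovering 5 seconds: a 20-second window needs about 1.33x,
# while a 10-second window needs 2x.
print(round(speed_factor(20.0, 5.0), 2), speed_factor(10.0, 5.0))
```

A longer window spreads the same time saving over more video, which is why the acceleration becomes less noticeable.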
If the speech rate is lower than the first threshold and the environmental volume is lower than the second threshold, the speaker is talking slowly and the environment is quiet. Accelerating the whole video to be processed in this case does not make the speech too fast to hear clearly or to follow, so the user experience remains good.
If the speech rate is not lower than the first threshold or the environmental volume is not lower than the second threshold, the speaker is talking too fast or the environment is noisy, and accelerating the whole video to be processed would work poorly. In this preferred scheme, each pause is accelerated according to a preset proportion. Compared with skipping the pauses outright, this preserves the overall continuity of the video and avoids the appearance of stuttering; because shortened pauses still remain after acceleration, the semantics stay as easy to understand as possible.
Moreover, by extending the video to be processed by the preset time length, enough pause time can be collected that accelerating each pause according to the preset proportion reduces the playing time of the video to be processed by exactly the duration of the network interruption.
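The loop formed by S1103, S1104, and S1105 can be sketched as follows. This is a minimal illustration, not the patented implementation: pauses are assumed to be already detected and given as (start offset, duration) pairs measured from the time point of the interruption, every pause is assumed to be sped up by the same factor, and the `max_window` bound is an added assumption that stops the loop when the stored content runs out.

```python
from typing import List, Tuple

Pause = Tuple[float, float]  # (start offset in seconds, duration in seconds)


def pause_time_in_window(pauses: List[Pause], window: float) -> float:
    """Total pause time that falls inside the first `window` seconds."""
    total = 0.0
    for start, duration in pauses:
        if start < window:
            # Clip a pause that runs past the end of the window.
            total += min(start + duration, window) - start
    return total


def plan_pause_compression(
    pauses: List[Pause],
    outage: float,
    window: float,
    step: float,
    max_window: float,
) -> Tuple[float, float]:
    """Return (window length, pause speed factor) that recovers `outage` seconds."""
    while window <= max_window:
        total = pause_time_in_window(pauses, window)
        if total > outage:
            # S1104: playing T seconds of pauses at speed s saves T - T/s,
            # so s = T / (T - outage) saves exactly `outage` seconds.
            return window, total / (total - outage)
        # S1105: not enough pause time yet, extend the video to be processed.
        window += step
    raise ValueError("stored content does not contain enough pause time")
```

For instance, with pauses `[(2.0, 3.0), (10.0, 4.0), (25.0, 6.0)]`, a 5-second outage is recovered inside the initial 20-second window, whereas an 8-second outage forces one extension to 30 seconds before the pauses are long enough.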
Further, in S1102, while the listening client plays the accelerated video content, the listening client collects image data of the user and sends it to the SIP server; the SIP server identifies a face in the image data and performs expression analysis based on the face; when the analysis result is a negative expression, the SIP server increases the length of the video to be processed and thereby reduces the acceleration amplitude of the video to be processed.
A negative expression may indicate that the user finds the accelerated video uncomfortable. Increasing the length of the video to be processed reduces its acceleration amplitude, which further reduces the user's perception of the acceleration.
An instant messaging extension system adopts the above instant messaging extension method.
Drawings
Fig. 1 is a flowchart beginning from step S6 in an instant messaging extension method according to an embodiment.
Detailed Description
The invention is described in further detail below by way of specific embodiments:
Example One
An instant messaging extension method of this embodiment includes the following steps:
S1, a client establishes a connection with an XMPP server;
S2, after receiving a video conference request, the client acquires current network quality information; the client sends the video conference request and the current network quality information to the XMPP server;
S3, the XMPP server judges whether the current network quality information meets the preset network quality requirement; if so, the XMPP server establishes a connection with the SIP server through a gateway, sends the video conference request to the SIP server, and jumps to S4. In this embodiment, the preset network quality requirements include an upload speed requirement, a download speed requirement, and a signal strength requirement. If the network quality information does not meet the preset network quality requirement, the XMPP server sends a network monitoring instruction to the client, the client acquires network quality information in real time based on the network monitoring instruction, and a meeting-opening prompt is generated when the network quality information meets the preset network quality requirement.
S4, the SIP server establishes a connection with the client based on the video conference request and sets up the video conference.
S5, during the video conference, the client collects video content and sends it through the SIP server to the other clients participating in the video conference; the SIP server distinguishes the clients according to the video content and marks them as speaking clients and listening clients respectively; the SIP server stores the video content of the speaking client in real time;
S6, as shown in Fig. 1, the listening client acquires current network quality information and judges, based on it, whether a network interruption has occurred; if so, the listening client records the duration of the network interruption and sends the duration to the SIP server after the network recovers;
S7, the SIP server extracts, according to the duration of the network interruption, the video content during the interruption from the video content of the speaking client, identifies the sound information in that content, and judges whether a human voice is present; if no human voice is present, the method jumps to S8; if a human voice is present, the method jumps to S9;
S8, the SIP server sends the stored video content of the speaking client to the listening client with the network interruption, starting from the time point of network recovery;
S9, the SIP server stores the user name corresponding to each client; the SIP server converts the human voice into text and judges, based on the text, whether speaker conversion information exists; if so, the method jumps to S10; if not, the method jumps to S13;
S10, the SIP server extracts the name of the next speaker from the speaker conversion information and judges, based on that name and the stored user names corresponding to the clients, whether the next speaker is the user corresponding to the listening client with the network interruption; if so, the method jumps to S12; if not, the method jumps to S13.
S11, the SIP server accelerates the stored video content of the speaking client, starting from the time point when the network interruption occurred, and sends it to the listening client with the network interruption. This step specifically comprises the following steps:
S1101, the SIP server marks the video content of the speaking client spanning a preset time length from the time point when the network interruption occurred as the video to be processed; the SIP server extracts the human voice and the environmental sound from the sound information in the video to be processed, judges based on the human voice whether the speech rate is lower than a first threshold, and judges based on the environmental sound whether the environmental volume is lower than a second threshold; if the speech rate is lower than the first threshold and the environmental volume is lower than the second threshold, the method jumps to S1102; if the speech rate is not lower than the first threshold or the environmental volume is not lower than the second threshold, the method jumps to S1103;
S1102, the SIP server accelerates the video to be processed as a whole, so that the reduction in playing time of the accelerated video to be processed equals the duration of the network interruption;
S1103, the SIP server identifies each pause in the speech according to the human voice and judges whether the total pause duration is longer than the duration of the network interruption; if so, the method jumps to S1104; if the total pause duration is less than or equal to the duration of the network interruption, the method jumps to S1105;
S1104, the SIP server accelerates each pause according to a preset proportion, so that the reduction in playing time of the accelerated video to be processed equals the duration of the network interruption;
S1105, the SIP server extends the video to be processed by a preset time length and then jumps to S1103.
S12, the SIP server sends the text to the listening client with the network interruption.
S13, the SIP server judges whether the network interruption time exceeds a third threshold; if so, the method jumps to S12; if not, the method jumps to S11.
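The branching among S7 to S13 in this embodiment can be summarised as one decision function. The sketch below is illustrative: the function name, parameter names, and returned labels are hypothetical, and the inputs (voice detection result, extracted next-speaker name) are assumed to come from the recognition steps described above. The default of 15 seconds is the third threshold this embodiment states.

```python
from typing import Optional


def choose_recovery_action(
    has_voice: bool,
    next_speaker: Optional[str],
    disconnected_user: str,
    outage_seconds: float,
    third_threshold: float = 15.0,
) -> str:
    """Pick how the SIP server brings a reconnected listening client up to date."""
    if not has_voice:
        # S7 -> S8: nothing important was said, resume from the recovery point.
        return "S8_resume_from_recovery_point"
    if next_speaker is not None and next_speaker == disconnected_user:
        # S10 -> S12: the user is about to speak, so send the transcript.
        return "S12_send_text"
    # S13: no handover was found, or the handover goes to someone else.
    if outage_seconds > third_threshold:
        return "S12_send_text"
    return "S11_send_accelerated_video"


# Short outage, someone else speaks next: accelerated video is used.
print(choose_recovery_action(True, "Li Si", "Zhang San", 8.0))
```

Writing the flow this way makes it visible that text is chosen in two distinct cases: when the affected user is next to speak, and when the outage is too long for accelerated playback to catch up in reasonable time.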
Based on the above instant messaging extension method, this embodiment further provides an instant messaging extension system, which comprises a client, an XMPP server, a gateway, and an SIP server.
The client is used for establishing a connection with the XMPP server.
The client is also used for acquiring the current network quality information after receiving a video conference request, and for sending the video conference request and the current network quality information to the XMPP server. In this embodiment, the video conference request is input to the client by the user.
The XMPP server is used for judging whether the current network quality information meets the preset network quality requirement; if so, the XMPP server establishes a connection with the SIP server through the gateway and sends the video conference request to the SIP server. In this embodiment, the preset network quality requirements include an upload speed requirement, a download speed requirement, and a signal strength requirement.
The SIP server is used for establishing a connection with the client according to the video conference request and setting up the video conference.
During the video conference, the client collects video content and sends it through the SIP server to the other clients participating in the video conference.
The SIP server also stores the user name corresponding to each client. For example, for the client used by Zhang San, the corresponding user name is Zhang San.
The SIP server is also used for distinguishing the clients according to the video content they send and marking them as speaking clients and listening clients respectively. In this embodiment, a client whose video content contains a human voice lasting more than 10 seconds is marked as a speaking client, and the rest are marked as listening clients. The 10-second condition helps filter out brief responses from non-speakers, such as "good", "clear", and "received".
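The marking rule of this embodiment can be sketched in a few lines. The sketch assumes the voice duration per client has already been measured; the function name, client identifiers, and role labels are hypothetical, and only the 10-second limit comes from the text.

```python
from typing import Dict

SPEAKING_MIN_SECONDS = 10.0  # limit stated in this embodiment


def mark_clients(voice_seconds: Dict[str, float]) -> Dict[str, str]:
    """Map each client id to 'speaking' or 'listening' by voice duration."""
    return {
        client_id: "speaking" if seconds > SPEAKING_MIN_SECONDS else "listening"
        for client_id, seconds in voice_seconds.items()
    }


# A short acknowledgement (2 s) does not turn a listener into a speaker.
print(mark_clients({"client_a": 42.0, "client_b": 2.0, "client_c": 0.0}))
```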
The listening client acquires the current network quality information and judges, based on it, whether a network interruption has occurred. If so, the listening client records the duration of the network interruption and sends that duration to the SIP server after the network recovers.
The SIP server stores the video content of the speaking client in real time. According to the duration of the network interruption, it extracts the video content during the interruption from the stored video content of the speaking client, identifies the sound information in that content, and judges whether a human voice is present. If no human voice is present, the SIP server sends the stored video content of the speaking client, starting from the time point of network recovery, to the listening client that experienced the interruption.
If a human voice is present, the SIP server converts the voice into text and judges, based on the text, whether speaker conversion information exists; if it does, the SIP server extracts the name of the next speaker from the speaker conversion information. In this embodiment, speaker conversion information means a sentence intended to hand over to another speaker, for example "Next, Zhang San, please speak", "Next, Li Si, please go ahead", or "Wang Wu, why don't you say a few words".
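One possible way to detect a hand-over sentence in the converted text is simple pattern matching, sketched below. The English patterns and the function name are hypothetical and serve only as an illustration; the source does not say how speaker conversion information is recognized, and a real system would need patterns or a language model suited to the meeting language.

```python
import re
from typing import Optional

# Assumed name shape: one or two capitalized words, e.g. "Zhang San".
NAME = r"(?P<name>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)"

# Hypothetical hand-over patterns, not taken from the source.
HANDOVER_PATTERNS = [
    re.compile(r"\b[Nn]ext,?\s+" + NAME + r",?\s+please\b"),
    re.compile(r"\b" + NAME + r",\s+(?:over to you|your turn)\b"),
]


def extract_next_speaker(transcript: str) -> Optional[str]:
    """Return the next speaker's name, or None if no hand-over is found."""
    for pattern in HANDOVER_PATTERNS:
        match = pattern.search(transcript)
        if match:
            return match.group("name")
    return None
```

The returned name can then be compared with the user names stored on the SIP server.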
The SIP server also judges, based on the name of the next speaker and the stored user names corresponding to the clients, whether the next speaker is the user of the listening client that experienced the network interruption. If not, the SIP server accelerates the stored video content of the speaking client, starting from the time point at which the network interruption occurred, and sends the accelerated video content to that listening client.
If the next speaker is the user of the listening client that experienced the network interruption, the SIP server sends the text to that listening client. The listening client receives the text, displays it, and plays a preset reminder sound.
If the duration of the network interruption does not exceed the third threshold, the SIP server accelerates the stored video content of the speaking client, starting from the time point at which the network interruption occurred, and sends the accelerated video content to the listening client that experienced the interruption. The third threshold is in the range of 15 to 120 seconds, and is 15 seconds in this embodiment.
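The branching described above can be summarized in a small decision function. This is a sketch under stated assumptions: the names are hypothetical, and the point at which the third threshold is consulted (only when no speaker conversion information is found, with a longer interruption leading to the text being sent) is taken from claim 5 rather than from this passage. Claim 6 additionally routes the case where the next speaker is another user through the same threshold check, which this sketch does not do.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SEND_FROM_RECOVERY = "send stored video from the recovery time point"
    SEND_ACCELERATED = "send accelerated video from the interruption time point"
    SEND_TEXT = "send the converted text and play a reminder sound"


THIRD_THRESHOLD_S = 15.0  # value used in this embodiment


def choose_catch_up(has_voice: bool, next_speaker: Optional[str],
                    interrupted_user: str, outage_s: float) -> Action:
    """Pick how the interrupted listening client catches up."""
    if not has_voice:
        return Action.SEND_FROM_RECOVERY
    if next_speaker is not None:
        # Speaker conversion information was found in the text.
        if next_speaker == interrupted_user:
            return Action.SEND_TEXT
        return Action.SEND_ACCELERATED
    # No speaker conversion information: decide by interruption length
    # (assumption taken from claim 5).
    if outage_s > THIRD_THRESHOLD_S:
        return Action.SEND_TEXT
    return Action.SEND_ACCELERATED
```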
Specifically, the SIP server accelerates the stored video content of the speaking client from the time point at which the network interruption occurred as follows:
The SIP server marks the video content of the speaking client of a preset time length, starting from the time point at which the network interruption occurred, as the video to be processed. In this embodiment, the preset time length is longer than the duration of the network interruption.
The SIP server extracts the human voice and the ambient sound from the sound information in the video to be processed, judges from the human voice whether the speech rate is lower than a first threshold, and judges from the ambient sound whether the ambient volume is lower than a second threshold. If the speech rate is lower than the first threshold and the ambient volume is lower than the second threshold, the SIP server accelerates the video to be processed as a whole, such that the playing time saved by the acceleration equals the duration of the network interruption. Overall acceleration can be done by skipping frames or by raising the frame rate; for example, if the original video to be processed plays at 30 frames per second, playing 60 frames per second after acceleration gives double-speed playback.
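The required speed-up follows from simple arithmetic: if the video to be processed lasts T seconds and the interruption lasted D seconds, the video must play in T - D seconds, so the speed factor is T / (T - D). A minimal sketch, with a hypothetical function name:

```python
def overall_speed_factor(segment_s: float, outage_s: float) -> float:
    """Speed factor that shortens a segment's playing time by outage_s.

    segment_s: length of the video to be processed, in seconds.
    outage_s:  duration of the network interruption, in seconds.
    """
    if not 0 <= outage_s < segment_s:
        raise ValueError("segment must be longer than the interruption")
    return segment_s / (segment_s - outage_s)
```

For a 30-second segment and a 15-second interruption the factor is 2.0, which corresponds to the 30-to-60 frames per second example; a 60-second segment with the same interruption needs only about 1.33.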
If the speech rate is not lower than the first threshold or the ambient volume is not lower than the second threshold, the SIP server identifies each pause in the speech from the human voice and judges whether the total pause duration is longer than the duration of the network interruption. If it is, the SIP server accelerates each pause according to a preset proportion, such that the playing time saved by the accelerated video to be processed equals the duration of the network interruption.
If the total pause duration is not longer than the duration of the network interruption, the SIP server extends the video to be processed by a further preset time length and again judges whether the total pause duration in the extended video is longer than the duration of the network interruption. It repeats this extension until the total pause duration is longer than the duration of the network interruption.
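The pause-based procedure can be sketched as a loop that grows the window until its pauses can absorb the interruption. Several points are assumptions: `find_pauses` is a hypothetical callback returning the pause durations detected in the first `window_s` seconds, a single factor is applied to every pause and is derived so that the saved time equals the interruption (the source calls the proportion "preset" without saying how it is set), and `max_window_s` is an added safety bound that the source does not mention.

```python
from typing import Callable, Sequence


def pause_speed_factor(pauses_s: Sequence[float], outage_s: float) -> float:
    """Factor by which every pause is sped up so that outage_s is saved."""
    total = sum(pauses_s)
    if total <= outage_s:
        raise ValueError("pauses are too short to absorb the interruption")
    # Pauses play in total / k seconds; total - total / k = outage_s.
    return total / (total - outage_s)


def plan_pause_acceleration(find_pauses: Callable[[float], Sequence[float]],
                            outage_s: float, window_s: float, step_s: float,
                            max_window_s: float = 3600.0) -> tuple[float, float]:
    """Extend the window until its pauses exceed the interruption.

    Returns (window length in seconds, speed factor for the pauses).
    """
    while window_s <= max_window_s:
        pauses = find_pauses(window_s)
        if sum(pauses) > outage_s:
            return window_s, pause_speed_factor(pauses, outage_s)
        window_s += step_s  # extend the video to be processed
    raise RuntimeError("no window with enough pause time was found")
```

With one 2-second pause every 10 seconds, a 15-second interruption, a 60-second starting window, and 30-second extensions, the first window holds only 12 seconds of pauses, so the window grows to 90 seconds, where 18 seconds of pauses are played 6 times faster.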
Example two
This embodiment differs from the first embodiment in that, in S1102, while the user watches the accelerated video content, the listening client captures image data of the user and sends it to the SIP server. The SIP server recognizes a face in the image data and performs expression analysis on it; when the result is a negative expression, the SIP server increases the length of the video to be processed and reduces its acceleration amplitude. In this embodiment, an expressionless face is defined as a neutral expression; relative to it, expressions such as happy or satisfied are defined as positive, and expressions such as confused, unhappy, or worried are defined as negative. A negative expression may indicate that the user is uncomfortable with the accelerated video. Increasing the length of the video to be processed lowers the acceleration amplitude and so makes the acceleration less noticeable to the user, at the cost of taking longer to catch up with the real-time progress.
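The trade-off can be illustrated with a simplified sketch that recomputes the speed factor for the whole segment after one expression reading. The function name, the 30-second extension step, and the whole-segment recalculation are assumptions; the source does not state by how much the length is increased.

```python
NEGATIVE = {"confused", "unhappy", "worried"}  # labels named in this embodiment


def adjust_for_expression(expression: str, segment_s: float, outage_s: float,
                          extend_s: float = 30.0) -> tuple[float, float]:
    """Return (segment length, speed factor) after one expression reading.

    extend_s is a hypothetical extension step applied on a negative reading.
    """
    if segment_s <= outage_s:
        raise ValueError("segment must be longer than the interruption")
    if expression in NEGATIVE:
        segment_s += extend_s  # a longer segment needs a gentler speed-up
    return segment_s, segment_s / (segment_s - outage_s)
```

For a 30-second segment and a 15-second interruption, a negative reading extends the segment to 60 seconds and lowers the factor from 2.0 to about 1.33, so catching up takes longer but the acceleration is less noticeable.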
The above are only embodiments of the present invention, and the present invention is not limited to the field of these embodiments. Common knowledge, such as well-known specific structures and characteristics, is not described in detail here. A person skilled in the art knows all the common technical knowledge in the technical field before the application date or the priority date, can access all the prior art in the field, and is able to apply the conventional experimental means available before that date; such a person can, in light of this application and their own abilities, complete and implement the present invention, and some typical well-known structures or methods should not become obstacles to implementing this application. It should be noted that a person skilled in the art can make several changes and modifications without departing from the structure of the present invention; these should also be regarded as falling within the protection scope of the present invention and will not affect the effect of implementing the invention or the practicability of the patent. The scope of protection claimed by this application shall be determined by the contents of the claims, and the description of the embodiments in the specification may be used to explain the contents of the claims.

Claims (10)

1. An instant messaging expansion method is characterized by comprising the following steps:
S1, a client establishes a connection with an XMPP server;
S2, after receiving a video conference request, the client acquires the current network quality information; the client sends the video conference request and the current network quality information to the XMPP server;
S3, the XMPP server judges whether the current network quality information meets the preset network quality requirement; if so, the XMPP server establishes a connection with the SIP server through a gateway, sends the video conference request to the SIP server, and jumps to S4; if the network quality information does not meet the preset network quality requirement, the XMPP server sends a network monitoring instruction to the client, the client acquires the network quality information in real time based on the network monitoring instruction, and a meeting-start prompt is generated when the network quality information meets the preset network quality requirement;
S4, the SIP server establishes a connection with the client based on the video conference request and sets up a video conference.
2. The instant messaging expansion method of claim 1, wherein: s5, when the video conference is carried out, the client acquires video content and sends the video content to other clients participating in the video conference through the SIP server; the SIP server distinguishes the clients according to the video content and marks the clients as a speaking client and a listening client respectively; the SIP server stores the video content of the speaking client in real time;
S6, after acquiring the current network quality information, the listening client judges, based on the network quality information, whether a network interruption has occurred; if so, it records the duration of the network interruption and sends the duration to the SIP server after the network recovers;
S7, the SIP server extracts the video content during the interruption from the video content of the speaking client according to the duration of the network interruption, identifies the sound information in that video content, judges whether a human voice exists, and jumps to S8 if no human voice exists;
S8, the SIP server sends the stored video content of the speaking client, from the time point of network recovery, to the listening client that experienced the network interruption.
3. The instant messaging expansion method of claim 2, wherein: in S7, if a human voice exists, jump to S9;
S9, the SIP server stores the user name corresponding to each client; the SIP server converts the human voice into text, judges based on the text whether speaker conversion information exists, and jumps to S10 if it exists;
S10, the SIP server extracts the name of the next speaker from the speaker conversion information and judges, based on the name of the next speaker and the stored user names corresponding to the clients, whether the next speaker is the user corresponding to the listening client that experienced the network interruption; if not, jump to S11;
S11, the SIP server accelerates the stored video content of the speaking client from the time point at which the network interruption occurred, and sends the accelerated video content to the listening client that experienced the network interruption.
4. The instant messaging expansion method of claim 3, wherein: in S10, if the next speaker is the user corresponding to the listening client that experienced the network interruption, jump to S12;
S12, the SIP server sends the text to the listening client that experienced the network interruption.
5. The instant messaging expansion method of claim 4, wherein: in S9, if the speaker conversion information does not exist, jump to S13;
S13, the SIP server judges whether the duration of the network interruption exceeds a third threshold; if so, jump to S12; if not, jump to S11.
6. The instant messaging expansion method of claim 5, wherein: in S10, if the next speaker is not a user corresponding to the listening client with the network interruption, the process jumps to S13.
7. The instant messaging expansion method of claim 6, wherein: S11 further includes S1101, the SIP server marks the video content of the speaking client of a preset time length, from the time point at which the network interruption occurred, as a video to be processed; the SIP server extracts the human voice and the ambient sound from the sound information in the video to be processed, judges based on the human voice whether the speech rate is lower than a first threshold, and judges based on the ambient sound whether the ambient volume is lower than a second threshold; if the speech rate is lower than the first threshold and the ambient volume is lower than the second threshold, jump to S1102; if the speech rate is not lower than the first threshold or the ambient volume is not lower than the second threshold, jump to S1103;
S1102, the SIP server accelerates the video to be processed as a whole, such that the playing time saved by the accelerated video to be processed equals the duration of the network interruption;
S1103, the SIP server identifies each pause in the speech according to the human voice, judges whether the total pause duration is longer than the duration of the network interruption, and jumps to S1104 if it is;
S1104, the SIP server accelerates each pause according to a preset proportion, such that the playing time saved by the accelerated video to be processed equals the duration of the network interruption.
8. The instant messaging expansion method of claim 7, wherein: in S1103, if the total pause duration is less than or equal to the duration of the network interruption, jump to S1105;
S1105, the SIP server extends the video to be processed by a preset time length and then jumps to S1103.
9. The instant messaging expansion method of claim 7, wherein: in S1102, while the user watches the accelerated video content, the listening client acquires image data of the user and sends the image data to the SIP server; the SIP server recognizes a face in the image data and performs expression analysis based on the face, and when the analysis result is a negative expression, the SIP server increases the length of the video to be processed and reduces the acceleration amplitude of the video to be processed.
10. An instant messaging expansion system, characterized in that the instant messaging expansion method of any one of claims 1 to 9 is used.
CN202011377860.9A 2020-11-30 2020-11-30 Instant messaging expansion method and system Active CN112511789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011377860.9A CN112511789B (en) 2020-11-30 2020-11-30 Instant messaging expansion method and system

Publications (2)

Publication Number Publication Date
CN112511789A CN112511789A (en) 2021-03-16
CN112511789B true CN112511789B (en) 2023-04-07

Family

ID=74968547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011377860.9A Active CN112511789B (en) 2020-11-30 2020-11-30 Instant messaging expansion method and system

Country Status (1)

Country Link
CN (1) CN112511789B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101102456A (en) * 2007-07-25 2008-01-09 中兴通讯股份有限公司 A video conference system with instant messaging function and its implementation method
CN101488916A (en) * 2009-01-24 2009-07-22 深圳华为通信技术有限公司 Band-width control method, apparatus, terminal and system based on video conference
CN101552909A (en) * 2009-04-28 2009-10-07 山东大学 Frame rate controlling method based on wireless video monitoring
CN103634279A (en) * 2012-08-23 2014-03-12 阿尔卡特朗讯公司 Method and device for conveniently establishing communication connection from SIP user to XMPP user
CN103763627A (en) * 2014-01-02 2014-04-30 Tcl集团股份有限公司 Method and system for realizing real-time video conference

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9215079B2 (en) * 2010-04-18 2015-12-15 Tropo, Inc. Servlet API and method for XMPP protocol
US10171536B2 (en) * 2016-09-30 2019-01-01 Atlassian Pty Ltd Rapid optimization of media stream bitrate

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design of an Android instant messaging system based on the XMPP protocol; 黄伟敏; 《电子设计工程》; 20110420 (No. 08); full text *

Also Published As

Publication number Publication date
CN112511789A (en) 2021-03-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant