WO2018054171A1 - Call method, apparatus, computer storage medium, and terminal - Google Patents

Call method, apparatus, computer storage medium, and terminal

Info

Publication number
WO2018054171A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
signal
client
call
preset
Prior art date
Application number
PCT/CN2017/095309
Other languages
English (en)
French (fr)
Inventor
王凤玲
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201610844042.2A (CN107864084B)
Priority claimed from CN201610940605.8A (CN107979482B)
Priority claimed from CN201610945642.8A (CN106506872B)
Application filed by 腾讯科技(深圳)有限公司
Priority to EP17852241.3A (EP3490199B1)
Publication of WO2018054171A1
Priority to US16/208,473 (US10693799B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/12 Arrangements for detecting or preventing errors in the information received by using return channel
    • H04L1/16 Arrangements for detecting or preventing errors in the information received by using return channel in which the return channel carries supervisory signals, e.g. repetition request signals
    • H04L1/18 Automatic repetition systems, e.g. Van Duuren systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/12 Arrangements for detecting or preventing errors in the information received by using return channel
    • H04L1/16 Arrangements for detecting or preventing errors in the information received by using return channel in which the return channel carries supervisory signals, e.g. repetition request signals
    • H04L1/18 Automatic repetition systems, e.g. Van Duuren systems
    • H04L1/1825 Adaptation of specific ARQ protocol parameters according to transmission conditions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/12 Arrangements for detecting or preventing errors in the information received by using return channel
    • H04L1/16 Arrangements for detecting or preventing errors in the information received by using return channel in which the return channel carries supervisory signals, e.g. repetition request signals
    • H04L1/18 Automatic repetition systems, e.g. Van Duuren systems
    • H04L1/1829 Arrangements specially adapted for the receiver end
    • H04L1/1854 Scheduling and prioritising arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/20 Arrangements for detecting or preventing errors in the information received using signal quality detector
    • H04L1/205 Arrangements for detecting or preventing errors in the information received using signal quality detector jitter monitoring
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays
    • H04L43/087 Jitter
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/28 Flow control; Congestion control in relation to timing considerations
    • H04L47/283 Flow control; Congestion control in relation to timing considerations in response to processing delays, e.g. caused by jitter or round trip time [RTT]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/32 Flow control; Congestion control by discarding or delaying data units, e.g. packets or frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/34 Flow control; Congestion control ensuring sequence integrity, e.g. using sequence numbers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04 Time compression or expansion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0823 Errors, e.g. transmission errors
    • H04L43/0829 Packet loss
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L43/0894 Packet rate
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring

Definitions

  • the present application relates to the field of instant messaging, and in particular, to a call method, apparatus, computer storage medium, and terminal.
  • instant messaging software has sprung up, such as WeChat, QQ, etc.
  • Instant messaging software depends mainly on the Internet, so the quality of the network directly affects the communication quality of instant messaging software.
  • the embodiment of the present application provides a call method and apparatus, so as to at least solve the technical problem that the instant communication quality is poor due to network congestion in the related art.
  • a call method includes: determining, according to a first data packet that a first client receives from a second client over a preset network, whether packet loss has occurred in first media information sent by the second client to the first client over the preset network, where the first media information includes the successfully transmitted first data packet, and the first media information is media information transmitted while the second client conducts an audio call or a video call with the first client.
  • the predetermined parameter includes at least one of a first probability threshold for successful retransmission and a second probability threshold for successfully outputting the second data packet;
  • a preset condition that the network status information must meet when retransmission is requested, where the preset condition indicates the network condition required for the probability that the preset network successfully retransmits the second data packet to be not less than the first probability threshold, and/or the network condition required for the probability that the successfully retransmitted second data packet can be successfully output to be not less than the second probability threshold;
  • a call device including:
  • a first determining part configured to determine, according to the first data packet that the first client receives from the second client over the preset network, whether packet loss has occurred in the first media information sent by the second client to the first client over the preset network, where the first media information includes the successfully transmitted first data packet, and the first media information is media information transmitted while the second client conducts an audio call or a video call with the first client;
  • a parameter determining part configured to determine a predetermined parameter for requesting the second client to retransmit a second data packet, where the second data packet is a retransmission of a data packet in the first media information that failed to be transmitted;
  • the predetermined parameter includes at least one of a retransmission available parameter and a valid usage parameter; the retransmission available parameter indicates the probability that the second data packet can be successfully retransmitted; the valid usage parameter indicates the probability that the retransmitted second data packet is successfully output;
  • a condition determining part configured to determine, according to the predetermined parameter, a preset condition that the network status information must meet when retransmission is requested, where the preset condition indicates the network condition required for the preset network to successfully retransmit the second data packet, and/or the network condition required for the successfully retransmitted second data packet to be successfully output;
  • a first obtaining part configured to acquire network status information of the preset network when it is determined that packet loss has occurred in the first media information;
  • a first execution part configured to send a retransmission request to the second client if the network status information meets the preset condition, where the retransmission request is used to request the second client to retransmit the second data packet lost from the first media information, and the preset condition indicates the network condition that the preset network needs in order to retransmit the second data packet;
  • a second execution part configured to cancel sending the retransmission request to the second client if the network status information does not meet the preset condition.
  • a terminal including:
  • a network interface configured to connect to a server over a network;
  • a memory configured to store computer-executable instructions;
  • a processor connected to the network interface and to the memory, and configured to implement the foregoing call method by executing the computer-executable instructions.
  • The technical solution provided in the embodiments of the present application determines, based on the first data packet that the first client receives from the second client over the preset network, whether packet loss has occurred in the first media information sent by the second client to the first client over the preset network, where the first media information includes the first data packet and is the media information transmitted while the second client conducts an audio call or a video call with the first client.
  • The embodiments of the present application can reduce the further congestion of the preset network that is caused by continuing to send retransmission requests when the network is already congested, and so avoid media information transmission being blocked further because the congestion of the preset network is slow to ease. Viewed across the whole network, congestion is alleviated and the client can transmit media information better, which improves the overall quality of instant messaging.
  • FIG. 1 is a schematic diagram of a hardware environment of a call method according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a communication message transmission system according to an embodiment of the present application.
  • FIG. 3A is a flowchart of an optional calling method according to an embodiment of the present application.
  • FIG. 3B is a flowchart of an optional calling method according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an optional communication device according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an optional communication device according to an embodiment of the present application.
  • FIG. 7 is a structural block diagram of a terminal according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a hardware environment of a call method according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of hardware entities of each party performing information interaction in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of an implementation process of a method according to the present application.
  • FIG. 11 is a schematic diagram of an implementation process of another method according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a process for implementing another method according to an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a process for implementing another method according to an embodiment of the present application.
  • FIG. 15 is a schematic diagram of an end-to-end module of a call in the prior art.
  • FIG. 18 is a schematic diagram of a scenario in which the present application is applied.
  • FIG. 19 to FIG. 20 are schematic diagrams showing comparison results of debounce processing after applying the embodiment of the present application.
  • FIG. 21 is a flowchart of a call state detecting method according to an exemplary embodiment
  • FIG. 22 is a spectrum diagram of a mixed signal according to an embodiment of the present application.
  • FIG. 23 is a schematic diagram of a far-end signal attenuation process according to an embodiment of the present application.
  • FIG. 24 is a schematic flowchart of a correlation value calculation according to an embodiment of the present application.
  • FIG. 25 is a schematic diagram of a call state detection process according to an embodiment of the present application.
  • FIG. 26 is a block diagram showing the structure of a call state detecting apparatus according to an embodiment of the present application.
  • a method embodiment of a call method is provided.
  • the foregoing method may be applied to a hardware environment composed of the server 102 and the terminal 104 as shown in FIG. 1.
  • the server 102 is connected to the terminal 104 through a network.
  • the network includes but is not limited to a wide area network, a metropolitan area network, or a local area network.
  • the terminal 104 is not limited to a PC, a mobile phone, a tablet, or the like.
  • the method of the embodiment of the present application may be performed by the server 102, by the terminal 104, or jointly by the server 102 and the terminal 104.
  • When the terminal 104 performs the method of the embodiment of the present application, the method may also be performed by a client installed on the terminal.
  • Client B encodes the data captured by the sound card and sends it to client A over the network (that is, in the original data stream);
  • the data that is, the data transmitted in the original data stream
  • the decoded data is sent to the sound card for playback.
  • When client A receives the data and detects packet loss (through the packet loss detection in step S31), it can send a retransmission request (that is, the request in the retransmission request data stream) to client B. After receiving the request, client B resends the required data to client A; the retransmitted data is the response data in the retransmission response data stream.
  • This transmission method does not take the characteristics of a real-time voice call into account. A real-time call places strict requirements on when data arrives, whereas retransmission means that, after packet loss is detected, a retransmission request is sent and the receiver waits for the other party to resend the response data.
  • Packet loss detection, sending the retransmission request, and receiving the response data happen one after another, and each takes time. If the total time is too long, the response data is useless for real-time communication even if it does reach the receiving end. Under such network conditions, the usage rate of retransmitted data is very low, possibly zero.
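  • The timing argument above can be sketched as a simple budget check. This is an illustrative sketch only: the function name, the parameters, and the notion of a fixed playout budget are assumptions made for the example and are not taken from the application.

```python
def retransmission_can_help(detect_ms: float, request_ms: float,
                            response_ms: float, playout_budget_ms: float) -> bool:
    """Return True if a retransmitted packet could still arrive before it must be played.

    detect_ms         : time needed to notice that a packet is missing (assumed input)
    request_ms        : time for the retransmission request to reach the sender
    response_ms       : time for the response data to travel back to the receiver
    playout_budget_ms : how long the receiver can wait before the packet is useless
    """
    if min(detect_ms, request_ms, response_ms, playout_budget_ms) < 0:
        raise ValueError("durations must be non-negative")
    # The three phases happen one after another, so their durations add up.
    total_ms = detect_ms + request_ms + response_ms
    return total_ms <= playout_budget_ms


# A short round trip fits the budget; a long one makes the retransmission pointless.
print(retransmission_can_help(20, 40, 40, 200))
print(retransmission_can_help(20, 150, 150, 200))
```

  • In practice the three durations would have to be estimated from measurements; the sketch only shows why a long round trip makes retransmitted data unusable for a real-time call.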
  • a method embodiment of a call method is provided.
  • FIG. 3A is a flowchart of an optional call method according to an embodiment of the present application. As shown in FIG. 3A, the method may include the following steps:
  • Step S302: determine, according to the first data packet that the first client receives from the second client over the preset network, whether packet loss has occurred in the first media information sent by the second client to the first client over the preset network.
  • the first media information includes a first data packet, and the first media information is media information transmitted while the second client conducts an audio call or a video call with the first client; the first data packet is a data packet of the first media information that was transmitted successfully on the first attempt, and can therefore be called an initial-success packet.
  • Step S304: acquire network status information of the preset network when it is determined that packet loss has occurred in the first media information.
  • Step S306: if the network status information meets the preset condition, send a retransmission request to the second client, where the retransmission request is used to request the second client to retransmit the second data packet lost from the first media information, and the preset condition indicates the network condition that the preset network needs in order to retransmit the second data packet;
  • Step S308: if the network status information does not meet the preset condition, cancel sending the retransmission request to the second client.
  • Through steps S302 to S308, whether to send a retransmission request is decided according to the network condition: when the network condition is good, a retransmission request is sent to obtain the lost data packet; otherwise the retransmission request is not sent, to avoid aggravating network congestion. This solves the technical problem in the related art that instant messaging quality is poor because of network congestion, and achieves the technical effect of improving instant messaging quality.
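  • The control flow of steps S302 to S308 can be sketched as follows. This is a minimal sketch, not the application's implementation: the function name, the dictionary used for network status, and the callback parameters are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, List


def decide_retransmission(
    lost_indices: Iterable[int],
    get_network_state: Callable[[], Dict[str, float]],
    meets_preset_condition: Callable[[Dict[str, float]], bool],
) -> List[int]:
    """Return the packet indices for which a retransmission request should be sent."""
    lost = sorted(set(lost_indices))
    if not lost:
        # S302: no packet loss was detected, so nothing needs to be requested.
        return []
    # S304: the network status is queried only after a loss has been detected.
    state = get_network_state()
    if meets_preset_condition(state):
        # S306: the network can carry the retransmission, so request the lost packets.
        return lost
    # S308: the network is too poor, so the request is cancelled.
    return []


# Hypothetical preset condition: only retransmit when the delay is below 100 ms.
condition = lambda state: state["delay_ms"] < 100
print(decide_retransmission([8, 3], lambda: {"delay_ms": 40.0}, condition))
print(decide_retransmission([8], lambda: {"delay_ms": 400.0}, condition))
```

  • Passing the condition in as a callback keeps the sketch independent of how the preset condition is actually derived.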
  • The method further includes determining the preset condition; as shown in FIG. 3B, determining the preset condition includes the following steps:
  • Step S3041: determine a predetermined parameter for requesting the second client to retransmit the second data packet, where the second data packet is a retransmission of a data packet in the first media information that failed to be transmitted;
  • the predetermined parameter includes at least one of a first probability threshold for successful retransmission and a second probability threshold for successfully outputting the second data packet;
  • Step S3042: determine, according to the predetermined parameter, a preset condition that the network status information must meet when retransmission is requested, where the preset condition indicates the network condition required for the probability that the preset network successfully retransmits the second data packet to be not less than the first probability threshold, and/or the network condition required for the probability that the successfully retransmitted second data packet can be successfully output to be not less than the second probability threshold.
  • The second client keeps data packets it has sent to the first client in its cache only for a limited time. If the second client has already discarded the first media information by the time the retransmission request arrives, the second data packet cannot be obtained even though the request reached the second client.
  • The retransmission request itself may also be lost on its way to the second client, in which case the retransmission fails because the request was lost. Therefore, in this embodiment, parameters such as the probability of requesting retransmission are determined based on the current transmission status of the first media information.
  • the predetermined parameter may be a pre-negotiated parameter, or may be determined dynamically according to the type of the first media information currently transmitted between the first client and the second client.
  • the first probability threshold and the second probability threshold corresponding to the transmission of the voice data packet and the video data packet may be different.
  • the probability that a retransmission request successfully obtains the retransmitted data packet may be measured statistically, and the retransmission request is sent only when this probability is higher than the first probability threshold.
  • If the retransmitted data packet is successfully obtained from the second client but the output time of the retransmitted second data packet has already passed, there was no need to request it. Therefore, in this embodiment, the network condition required for the probability that the successfully retransmitted second data packet is output to be not less than the second probability threshold may also be determined.
  • The retransmission request is sent only when the network status information meets the preset condition. Compared with sending a retransmission request on every packet loss regardless of the network condition, this effectively reduces how often retransmission requests are sent, reduces the further congestion that frequent retransmission requests cause when the network is congested, and leaves as much of the available bandwidth as possible for useful media information.
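  • One way to picture the two probability thresholds is to estimate both probabilities from past retransmission attempts and compare them with the thresholds. The sketch below is an assumption-laden illustration: the class, its counters, the optimistic default of 1.0 when there is no history, and the choice to require both thresholds together (the text allows either or both) are not taken from the application.

```python
class RetransmissionStats:
    """Counts outcomes of past retransmission requests (hypothetical helper)."""

    def __init__(self) -> None:
        self.requests = 0   # retransmission requests sent
        self.received = 0   # requests answered with a retransmitted packet
        self.used = 0       # retransmitted packets that arrived in time to be output

    def record(self, received: bool, used: bool = False) -> None:
        self.requests += 1
        if received:
            self.received += 1
            if used:
                self.used += 1

    def p_retransmit(self) -> float:
        # Estimated probability that a request yields the retransmitted packet.
        return self.received / self.requests if self.requests else 1.0

    def p_output(self) -> float:
        # Estimated probability that a retransmitted packet is actually output.
        return self.used / self.received if self.received else 1.0


def should_request(stats: RetransmissionStats,
                   first_threshold: float, second_threshold: float) -> bool:
    """Request retransmission only if neither estimate is below its threshold."""
    return (stats.p_retransmit() >= first_threshold
            and stats.p_output() >= second_threshold)


stats = RetransmissionStats()
for _ in range(8):
    stats.record(received=True, used=True)
for _ in range(2):
    stats.record(received=False)
print(should_request(stats, first_threshold=0.7, second_threshold=0.5))
```

  • Different thresholds could be passed for voice and video, in line with the statement that the thresholds for the two media types may differ.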
  • determining, according to the predetermined parameter, the preset condition that the network status information must meet when retransmission is requested includes at least one of the following:
  • the length of time for which the second client caches the first media information may differ.
  • Based on the length of time for which the second client caches the first media information, and using various weights, a transmission model, or the like, a first network condition is calculated that ensures the probability of successful retransmission of the second data packet reaches the second probability threshold or more.
  • the network condition that the current network status information needs to be met is determined according to the predetermined parameter.
  • Option 1 determining, according to the predetermined parameter, a preset condition that the network status information needs to be met when requesting retransmission, including:
  • Option 2 determining, according to the predetermined parameter, a preset condition that the network status information needs to be met when requesting retransmission, and the method further includes:
  • Here, the output rate of the media information is, for voice, the number of voice data packets or the amount of voice data that can be output per unit time; for video, it is the number of image frames that can be output per unit time, which corresponds to the frame rate or the like.
  • the above client can be a client for communication, and the client can be installed on a fixed device such as a computer or a mobile device.
  • the client may be a client that requires high immediacy of communication, that is, an instant messaging client, such as WeChat, QQ, etc., which can be used for an instant messaging service.
  • the fixed device may include a desktop computer, a smart TV, or the like.
  • the mobile device can include a mobile phone, a tablet, a wearable device, and the like.
  • the preset network is a network for communication between clients, for example, the Internet connecting two clients.
  • For example, client A is located in Haidian District, Beijing, and client B is located in Chaoyang District, Beijing; the servers connecting client A and client B are also deployed in Haidian District and Chaoyang District, and the preset network may include the network connecting Haidian District and Chaoyang District.
  • the preset network here may be the network over which the first media information is transmitted.
  • the first media information may be dynamic multimedia information, such as video, audio, GIF pictures, etc., or may be static information, such as text information, static pictures, and the like.
  • Network status information is also used to describe the status of network communication, such as network transmission speed, delay and other information.
  • The above network condition refers to the minimum network resources required for transmitting the second data packet and/or the minimum network communication state required, such as the minimum network transmission speed and the minimum delay required for the preset network to retransmit the second data packet.
  • In the related art, a retransmission request is sent whenever packet loss occurs. When the network is already seriously congested, sending the retransmission request aggravates the congestion and causes more data packets to be lost. Moreover, because of the serious congestion, even if the response packet is received its usefulness is greatly reduced, so communication quality is not improved; on the contrary, more packets are lost because the congestion has been aggravated.
  • In the present application, the retransmission request is not sent in that case, to avoid aggravating network congestion. Compared with the means adopted in the related art, subsequent packet loss may be reduced and communication quality is improved accordingly.
  • The execution body of steps S302 to S308 may be the client that receives the data packets (that is, the first client); in other words, the first client initiates the retransmission request to the second client as needed. To reduce the running load of the first client, steps S302 to S308 may instead be performed by the application server to which the client belongs.
  • In that case the server monitors the first client's data packet reception and, after determining that packet loss has occurred, requests the lost data packet from the second client according to the network condition. The server here may be the client's server; for example, when the client is an instant messaging client, the server is an instant messaging application server.
  • The present application analyzes current network characteristics based on historical data, decides whether to send a retransmission request according to the network characteristics and the importance of the received voice data, and adjusts the retransmission control strategy in real time according to the utilization of retransmitted data, so that bandwidth utilization and retransmission usage are optimal under various network conditions.
  • the alternative implementation is shown in Figure 3:
  • In step S302, determining, based on the first data packet that the first client receives from the second client over the preset network, whether packet loss has occurred in the first media information sent by the second client to the first client over the preset network may be implemented as follows: whether packet loss has occurred in the first media information is determined according to the sequence index information in the first data packet.
  • the data packet with the index of 8 may be determined to be lost.
  • The index range of the data packets that make up a piece of media information is identified in the data packet. For example, a voice message in an instant messaging application may be split into 100 data packets for transmission, and the data packet may identify that the index range used for the voice message is 301 to 400, so that the loss of any of these data packets can be determined from the data packets that were received.
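  • Detecting loss from sequence indices can be sketched as a gap search. The function below is an illustrative sketch with a hypothetical name and signature; the optional first and last arguments stand for an index range announced in the packets, such as 301 to 400 in the example above.

```python
from typing import Iterable, List, Optional


def find_lost_indices(received: Iterable[int],
                      first: Optional[int] = None,
                      last: Optional[int] = None) -> List[int]:
    """Return the sequence indices that are missing from the received packets.

    If no index range is announced, only gaps between the smallest and largest
    received index can be detected.
    """
    seen = set(received)
    if first is None or last is None:
        if not seen:
            return []  # nothing received and no announced range: nothing to infer
        first = min(seen) if first is None else first
        last = max(seen) if last is None else last
    return [i for i in range(first, last + 1) if i not in seen]


# A gap inside the received indices reveals the lost packet.
print(find_lost_indices([5, 6, 7, 9, 10]))
# With an announced range, a missing packet at the end is also detected.
print(find_lost_indices(range(301, 400), first=301, last=400))
```

  • The announced range matters for losses at the start or end of a message, which a pure gap search between received indices cannot see.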
  • In step S304, if it is determined that packet loss has occurred in the first media information, the network status information of the preset network is acquired. The acquired information mainly includes the following items, which characterize the first network state: the currently used bandwidth, the current transmission delay, the current packet loss rate, and a second preset value describing the number of consecutive lost packets.
  • The currently used bandwidth indicates the currently used code rate.
  • The used code rate is the actual code rate of the current call and includes the sending code rate and the receiving code rate.
  • The sending code rate is the total number of bytes sent divided by the duration of the call.
  • The receiving code rate is the total number of bytes received divided by the duration of the call.
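  • The two code rates defined above are plain averages over the call. The sketch below uses a hypothetical function name and made-up byte counts; it returns bytes per second, since the text does not state a unit.

```python
def code_rate(total_bytes: int, call_seconds: float) -> float:
    """Average code rate in bytes per second: total bytes divided by call duration."""
    if call_seconds <= 0:
        raise ValueError("call duration must be positive")
    return total_bytes / call_seconds


# Illustrative numbers only: a 60-second call.
send_rate = code_rate(150_000, 60)   # sending code rate
recv_rate = code_rate(90_000, 60)    # receiving code rate
print(send_rate, recv_rate)
```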
  • the estimated bandwidth (that is, the bandwidth threshold)
  • The estimated bandwidth is an estimate of the approximate bandwidth of the link during the current call, and it changes in real time.
  • Before determining whether the first network state of the preset network indicated by the network status information matches the second network state required for retransmitting the second data packet, a bandwidth threshold is determined according to the bandwidth information of the preset network;
  • a transmission delay threshold is determined according to network jitter information;
  • a packet loss rate threshold is determined according to the historical packet loss rate and the packet loss model.
  • The packet loss rate includes the long-term packet loss rate (that is, the packet loss rate from the start of the call to the current time) and the short-term packet loss rate (such as the packet loss rate within 5 seconds, which indicates whether the network packet loss rate has changed abruptly).
  • The cumulative histogram of the numbers of consecutively lost packets is used to characterize the packet loss model, that is, whether the network loses packets uniformly or loses packets in frequent, large bursts.
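  • The statistics named above can be sketched as follows. All names are hypothetical, and one simplification is assumed: the short-term window is taken as the last few packets rather than the last 5 seconds, because the sketch has no timestamps.

```python
from collections import Counter


def loss_rate(flags):
    """Fraction of packets lost; flags[i] is True when packet i was lost."""
    return sum(flags) / len(flags) if flags else 0.0


def burst_histogram(flags):
    """Count runs of consecutive losses, keyed by run length."""
    hist = Counter()
    run = 0
    for lost in flags:
        if lost:
            run += 1
        elif run:
            hist[run] += 1
            run = 0
    if run:  # a burst that is still open at the end of the trace
        hist[run] += 1
    return dict(hist)


def cumulative(hist):
    """Cumulative histogram: number of bursts whose length is <= each key."""
    total, out = 0, {}
    for length in sorted(hist):
        total += hist[length]
        out[length] = total
    return out


# Hypothetical loss trace for ten packets (True = lost).
flags = [False, True, False, False, True, True, True, True, False, True]
long_term = loss_rate(flags)        # whole call so far
short_term = loss_rate(flags[-5:])  # assumed window: last 5 packets
hist = burst_histogram(flags)
print(long_term, short_term, hist, cumulative(hist))
```

  • A histogram dominated by runs of length 1 suggests uniform loss, while a noticeable share of long runs suggests bursty loss.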
  • Transmission delay refers to the time required for data to pass from a node into the transmission medium when the node transmits data, that is, the total time from when a station starts transmitting a data frame until the data frame has been completely sent (or the total time a receiving station takes to receive a data frame sent by another station).
  • In step S306 or S308, after the network state information of the preset network is acquired, and before the retransmission request is sent to the second client or the sending of the retransmission request to the second client is canceled, it is determined whether the first network state of the preset network indicated by the network state information matches the second network state required for retransmitting the second data packet. If the first network state matches the second network state, it is determined that the network state information satisfies the preset condition; if the first network state does not match the second network state, it is determined that the network state information does not satisfy the preset condition.
  • Determining whether the first network state of the preset network indicated by the network state information matches the second network state required for retransmitting the second data packet includes at least one of the following: determining whether the difference between the bandwidth threshold and the currently used bandwidth is smaller than the first preset value; determining whether the current transmission delay is smaller than the transmission delay threshold; determining whether the current packet loss rate is smaller than the packet loss rate threshold; and determining whether the number of consecutive lost packets is smaller than the second preset value.
  • A preset judgment result indicates that the first network state matches the second network state. The preset judgment result includes at least one of the following: the difference between the bandwidth threshold and the currently used bandwidth is smaller than the first preset value; the current transmission delay is smaller than the transmission delay threshold; the current packet loss rate is smaller than the packet loss rate threshold; the number of consecutive lost packets is smaller than the second preset value.
  • Sending the retransmission request to the second client includes: sending the retransmission request to the second client when the network state information meets the preset condition and the voice feature condition is met, where the voice feature includes at least one of a voiced feature, a voice feature, and a semantic feature.
  • The voice signal can be analyzed (for example, unvoiced/voiced analysis, voice/silence analysis, and semantic importance analysis) to adjust the network parameter thresholds.
  • For example, when the bandwidth is sufficient, retransmission can be requested as long as packet loss is detected.
  • When the bandwidth is insufficient, only the lost important speech frames (i.e., the speech frames satisfying one or more of the voiced feature, the voice feature, and the semantic feature described above) are retransmitted.
  • the method further includes:
  • the step S304 includes:
  • When the data content is not of a predetermined type, step S304 may be blocked.
  • The method further includes at least one of the following: determining the current bandwidth threshold according to the previously determined bandwidth threshold and the current bandwidth information of the preset network; when a first ratio of the number of received second data packets to the number of sent retransmission requests is less than a third preset value, increasing the packet loss rate threshold and reducing the transmission delay threshold; and when a second ratio of the received valid second data packets to all received second data packets is less than a fourth preset value, increasing the packet loss rate threshold and reducing the transmission delay threshold.
  • The above-mentioned valid second data packet is a data packet that satisfies the real-time requirement, that is, a data packet received within a preset time after the loss.
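  • The two ratio-based adjustments above can be sketched as follows. This is an illustration under stated assumptions only: the class `Thresholds`, the function `adapt`, the default ratio limits (standing in for the third and fourth preset values), and the step sizes are all hypothetical, and the sketch applies a single adjustment step even when both ratios are low.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    loss_pct: float  # packet loss rate threshold, in percent
    delay_ms: float  # transmission delay threshold, in milliseconds


def adapt(th, requests_sent, responses_received, responses_valid,
          min_response_ratio=0.5, min_valid_ratio=0.5,
          loss_step=1.0, delay_step_ms=10.0):
    """Raise the loss threshold and lower the delay threshold when
    retransmission is not paying off (all limits are assumed values)."""
    low_response = (requests_sent > 0 and
                    responses_received / requests_sent < min_response_ratio)
    low_valid = (responses_received > 0 and
                 responses_valid / responses_received < min_valid_ratio)
    if not (low_response or low_valid):
        return th
    return Thresholds(th.loss_pct + loss_step,
                      max(0.0, th.delay_ms - delay_step_ms))


# 1000 requests sent, only one response received.
updated = adapt(Thresholds(15.0, 200.0), 1000, 1, 1)
print(updated)
```

  • Both changes make retransmission requests rarer: a higher packet loss rate threshold and a lower transmission delay threshold are harder conditions to meet.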
  • The thresholds, such as the bandwidth threshold, the packet loss rate threshold, and the transmission delay threshold, may each be set to an empirical initial value at the start, and these initial values are used when steps S302 to S308 are first executed.
  • Afterwards, the thresholds can be self-adjusted according to network conditions and actual feedback, so as to improve the quality of voice communication.
  • In step S306 or S308, after the retransmission request is sent to the second client, the second data packet sent by the second client is received, and the second media information is generated according to the first data packet and the second data packet; or, if the network state information does not satisfy the preset condition, the third media information is generated according to the first data packet.
  • The generated second media information is, in effect, the first media information.
  • The third media information is of relatively low quality compared with the first media information.
  • the transmission control process mainly includes:
  • Step S31, packet loss detection: whether packet loss has occurred is determined according to the sequence index information in the packet header information of the received data packet. For example, if the current data packet has a sequence index of 25 and the previous data packet has a sequence index of 24, the sequence indexes of the two data packets are continuous, so no packet loss has occurred. If the sequence index of the previous data packet is 22, the sequence indexes of the two data packets are not continuous, so it can be determined from the sequence indexes that packets were lost and that the number of lost packets is 2 (that is, the lost packets have sequence indexes 23 and 24).
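  • The gap check in step S31 can be sketched as follows. The function name `lost_between` is hypothetical, and the sketch assumes in-order arrival without sequence index wrap-around, neither of which is discussed in the text.

```python
def lost_between(prev_index, curr_index):
    """Sequence indexes missing between two consecutively received packets."""
    return list(range(prev_index + 1, curr_index))


print(lost_between(24, 25))  # continuous indexes, nothing lost
print(lost_between(22, 25))  # gap between 22 and 25
```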
  • Step S32, request control: if it is detected in step S31 that packet loss occurs, a retransmission request is sent to the other party (such as client B).
  • Step S33, response control: which data in the history cache to retransmit is determined according to the received retransmission request information.
  • The basis for the determination includes the length of the interval between the retransmitted data and the transmitted data, and the importance level of the data to be retransmitted.
  • In this retransmission control method, the retransmission request is sent whenever packet loss is detected, and the retransmission request information itself also consumes bandwidth. In some networks, this extra bandwidth consumption may intensify network congestion and degrade call quality, or the utilization rate of the retransmitted data may be too low because of the real-time nature of the call; in such cases, sending the retransmission request in step S32 is unnecessary.
  • Network characteristics, the utilization rate of retransmission, and the like are not considered. Therefore, this retransmission control method does not control the utilization of retransmitted data or the utilization of bandwidth according to different network characteristics.
  • Step S401, packet loss detection: whether packet loss has occurred is determined according to the sequence index information in the packet header information.
  • Step S402, packet loss judgment: if no packet loss is detected in step S401, step S409 is performed; if packet loss is detected, step S403 is performed.
  • Step S403, network characteristic analysis: the network characteristics include, but are not limited to, the used code rate, the estimated bandwidth, the packet loss rate, the network jitter, and the end-to-end transmission delay.
  • The used code rate refers to the code rate actually used by the current call, and includes the transmission code rate and the reception code rate.
  • The transmission code rate is the total number of bytes transmitted divided by the duration of the call.
  • The reception code rate is the total number of bytes received divided by the duration of the call. For example, if the estimated bandwidth is 512 kbps and the transmission code rate currently in use is 100 kbps, the bandwidth is sufficient, and more retransmission packets can be sent without stressing the network.
  • The estimated bandwidth is the estimated approximate bandwidth of the link during the current call, and it changes in real time.
  • The packet loss rate includes the long-term packet loss rate (that is, the packet loss rate from the start of the call to the current time) and the short-term packet loss rate (such as the packet loss rate within 5 seconds, which indicates whether the network packet loss rate has changed abruptly).
  • The cumulative histogram of the numbers of consecutively lost packets is used to characterize the packet loss model, that is, whether the network loses packets uniformly or loses packets in frequent, large bursts.
  • Network jitter, a concept in QoS (Quality of Service), refers to the degree of variation in packet delay. If the network is congested, queuing delay affects the end-to-end delay and causes packets transmitted over the same connection to experience different delays; jitter describes the extent of this delay variation.
  • Step S404 calculating a corresponding network parameter threshold according to the result of the analysis in step S403.
  • The bandwidth threshold is determined according to the estimated bandwidth: when the used code rate (that is, the currently used bandwidth, such as the reception code rate and the transmission code rate) is greater than a certain threshold, the ARQ request (i.e., the retransmission request) is not allowed to be sent.
  • The packet loss rate threshold under the current packet loss rate is determined according to the historical packet loss rate and an analysis of the packet loss model. For example, in a network with insufficient bandwidth or with a large packet loss rate, the more data is sent, the more data is lost. In that case, sending an ARQ request only increases the network load; that is, sending an ARQ request is useless or even harmful.
  • When the bandwidth is sufficient, a retransmission request can be sent whenever packet loss is detected. Assuming the estimated bandwidth is 512 kbps and the used code rate is 450 kbps, the remaining bandwidth is not very sufficient; in this case, the retransmission request is sent only if the packet loss rate is greater than 15% and the cumulative histogram of the numbers of consecutive lost packets shows that runs of several (such as 4) consecutively lost packets are relatively frequent. The reason is that when the packet loss rate is relatively low, the call quality decreases but semantic understanding is not affected, whereas when the packet loss rate is large, semantic reception is affected. When the bandwidth is not sufficient, the retransmission request is sent only once the packet loss rate reaches a certain level, in order to limit the impact of many retransmission packets on the network.
  • Step S405, the utilization rate of retransmission requests is counted.
  • After receiving the ARQ request, client B finds the corresponding data in the history cache and resends it to client A as a response packet. If the transmission delay between client B and client A is too large, the response data no longer meets the real-time requirements of the call when it arrives at client A and becomes a late packet that cannot be used, so the utilization rate of the response data is too low. If the actual utilization rate stays low for a period of time, the ARQ request frequency needs to be lowered, that is, the relevant network parameter thresholds are raised.
  • Step S406, the thresholds are updated.
  • Because the network bandwidth and the transmission delay are estimates, the actual effect may be unsatisfactory even if parameters such as the estimated bandwidth, the code rate, the packet loss rate, and the transmission delay are properly controlled. For example, the bandwidth estimate may not be accurate enough, or network congestion may make the transmission delay larger, so that many retransmission requests are sent but few retransmission response packets are received.
  • In this case, the ratio between the number of response packets received and the number of ARQ requests sent is low; for example, 1000 retransmission requests are sent and only one ARQ response packet is received. The frequency of retransmission requests is then reduced. The reduction is not made in one large step; it is achieved by raising the relevant network parameter thresholds step by step.
  • the ARQ request is allowed to be sent. Now the threshold is increased, only the packet loss rate. An ARQ request is allowed to be sent when the transmission delay is less than 150 ms.
  • Step S407, signal characteristic analysis.
  • The signal is analyzed (for example, unvoiced/voiced analysis, voice/silence analysis, and semantic importance analysis) to adjust the network parameter thresholds from step S406. For example, when the bandwidth is sufficient, a retransmission request can be sent as long as packet loss is detected; when the bandwidth is insufficient, a retransmission request is sent only for lost important voice frames.
  • Assuming the estimated bandwidth is 512 kbps and the used code rate is 100 kbps, the bandwidth is sufficient, so a retransmission request can be sent whenever a packet is lost. Assuming the estimated bandwidth is 512 kbps and the used bandwidth is 460 kbps, the bandwidth is not sufficient, so a retransmission request is sent only when the lost packet is found to be important.
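  • The bandwidth-dependent rule in step S407 can be sketched as follows. The function name and the 80% utilization limit that separates "sufficient" from "insufficient" are assumptions; the text gives only the two worked cases (100 kbps and 460 kbps against an estimate of 512 kbps) and does not state where the boundary lies.

```python
def should_request_retransmission(estimated_kbps, used_kbps,
                                  frame_is_important,
                                  max_utilization=0.8):
    """Decide whether to send an ARQ request for one lost packet.

    max_utilization is an assumed cut-off, not a value from the text.
    """
    bandwidth_sufficient = used_kbps < estimated_kbps * max_utilization
    # Sufficient bandwidth: request on any loss.
    # Insufficient bandwidth: request only for important frames.
    return bandwidth_sufficient or frame_is_important


print(should_request_retransmission(512, 100, frame_is_important=False))
print(should_request_retransmission(512, 460, frame_is_important=False))
print(should_request_retransmission(512, 460, frame_is_important=True))
```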
  • Step S408, comprehensive judgment of the request.
  • Step S409, the ARQ request is not allowed to be sent.
  • Step S410, the ARQ request is allowed to be sent.
  • the transmission of the retransmission request can be adapted to different network characteristics, so that the bandwidth utilization and the retransmission efficiency are optimized in various network environments.
  • The method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation.
  • The technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
  • The present application further provides another embodiment, in which the call method further includes:
  • A de-jitter parameter can be obtained by using the second de-jittering strategy, and the call quality can be improved by performing de-jittering processing in a voice call or a video call.
  • The method further includes: collecting offline network data and extracting, from the offline network data, at least one network parameter for characterizing the network features; constructing a network model according to the at least one network parameter and determining a first de-jittering strategy according to the network model; and modifying the first de-jittering strategy according to a characteristic parameter for evaluating the call quality of an audio call or a video call, to obtain a second de-jittering strategy.
  • Modifying the first de-jittering strategy according to the characteristic parameter for evaluating the call quality of the audio call or the video call includes:
  • modifying the first de-jittering strategy according to the signal content of the current audio call or video call;
  • modifying the first de-jittering strategy according to the perceptual auditory result.
  • the method further includes:
  • the first de-jittering policy is modified according to different processing capabilities of the terminal device and/or scheduling characteristics of an application as the communication medium.
  • the client of the embodiment of the present application can be implemented in various forms corresponding to an intelligent terminal (such as a mobile terminal).
  • The terminals described in the embodiments of the present application may include mobile terminals such as a mobile phone, a smartphone, a notebook computer, a digital broadcast receiver, a personal digital assistant (PDA), a tablet computer (PAD), a portable multimedia player (PMP), and a navigation device, as well as fixed terminals such as a digital TV and a desktop computer.
  • In the following, it is assumed that the terminal is a mobile terminal.
  • However, those skilled in the art will appreciate that, except for elements that are specifically for mobile purposes, configurations in accordance with embodiments of the present application can also be applied to fixed-type terminals.
  • FIG. 9 is a schematic diagram of hardware entities of each party performing information interaction in the embodiment of the present application, and FIG. 9 includes: a terminal device 1, a server 2, and a terminal device 3.
  • the terminal device 1 is called a sender device, and is composed of terminal devices 11-14.
  • the terminal device 3 is called a receiver device and is composed of terminal devices 31-35.
  • The server 2 is configured to perform the de-jittering processing logic.
  • The terminal devices exchange information with the server through a wired network or a wireless network.
  • The terminal devices include mobile phones, desktops, PCs, all-in-ones, and the like.
  • The terminal device 1 transmits information to and interacts with the terminal device 3 via the server 2.
  • The call of the present application can be a voice call or a video call.
  • The terminal devices 11-14 send the network data of the current VoIP network call, and the terminal devices 31-35 play the network data to complete the VoIP network call.
  • In the prior art, the de-jittering strategy is implemented by using a single parameter, which affects the call quality of the VoIP network call. In the embodiment of the present application, offline network data of the existing network is collected, at least one network parameter characterizing the network features is extracted from the offline network data, and the network model is constructed according to the at least one network parameter, so that the first de-jittering strategy (or initial de-jittering strategy) determined according to the network model tends to be accurate.
  • The processing logic 10 in the server 2 that performs the de-jittering processing includes: S1, collecting offline network data and extracting, from the offline network data, at least one network parameter for characterizing the network features; S2, constructing a network model according to the at least one network parameter and determining a first de-jittering strategy according to the network model; S3, correcting the first de-jittering strategy according to a characteristic parameter for evaluating the quality of a voice call or a video call, such as a VoIP call, to obtain a second de-jittering strategy; S4, obtaining a de-jitter parameter according to the current real-time network condition and the second de-jittering strategy, and setting, according to the de-jitter parameter, the buffer size used for transmitting voice call or video call data, such as VoIP call data, so that the delay of the voice call or video call is as expected.
  • FIG. 9 is only an example of a system architecture that implements an embodiment of the present application.
  • the embodiment of the present application is not limited to the system structure described in FIG. 9 above.
  • Various embodiments of the method of the present application are proposed.
  • An information processing method in the embodiment of the present application includes: collecting offline network data, extracting, from the offline network data, at least one network parameter used to characterize the network features, constructing a network model according to the at least one network parameter so as to measure or simulate VoIP call quality according to the network model, and determining the first de-jittering strategy according to the network model (101).
  • The first de-jittering strategy may also be referred to as an initial de-jittering strategy.
  • The initial de-jittering strategy can be determined from the network model. The related parameters output based on the initial de-jittering strategy include de-jitter parameters, delay parameters, and the like; therefore, the initial de-jittering strategy and its related parameters are determined according to the network model.
  • The first de-jittering strategy is corrected according to characteristic parameters for evaluating the quality of a voice call or a video call, such as a VoIP call (for example, the historical data of the current call, the signal content of the current call, and the perceptual auditory result of the current call), to obtain a second de-jittering strategy (1021).
  • The historical data of the call reflects the characteristics of the call network. The signal content of the call determines whether the current frame is an important frame: voice data content is an important frame and needs to be focused on.
  • A de-jitter parameter is obtained according to the current real-time network condition and the second de-jittering strategy, and the buffer size used for transmitting voice call or video call data, such as VoIP call data, is set according to the de-jitter parameter, so that the delay of the voice call or video call is as expected and tends to be reasonable (103).
  • the size of the de-jitter buffer is determined according to the de-jitter parameter obtained by the second de-jittering strategy.
  • the buffer data is adjusted based on the size of the de-jitter buffer.
  • Because the de-jittering algorithm is constructed by using multiple parameters, the various complex conditions in the network call environment are fully estimated, and the resulting first de-jittering strategy (or initial de-jittering strategy) tends to be accurate. Accordingly, the related parameters obtained by the initial de-jittering strategy, such as the de-jitter parameters, also tend to be accurate.
  • The first de-jittering strategy is further modified according to a characteristic parameter for evaluating the quality of a voice call or a video call, such as a VoIP call, to obtain a second de-jittering strategy. A de-jitter parameter is obtained according to the current real-time network condition and the second de-jittering strategy, and the buffer size used for transmitting voice call or video call data, such as VoIP call data, is set according to the de-jitter parameter, so that the delay of the voice call or video call is as expected.
  • The collection, strategy determination, and strategy modification logic in the above method is not limited to being located at any one of the transmitting end, the receiving end, or the server; some or all of the logic may be located at the transmitting end, the receiving end, or the server.
  • The method includes: collecting offline network data, extracting, from the offline network data, at least one network parameter used to characterize the network features, constructing a network model according to the at least one network parameter so as to measure or simulate VoIP call quality according to the network model, and determining the first de-jittering strategy according to the network model (201).
  • The first de-jittering strategy may also be referred to as an initial de-jittering strategy.
  • A large amount of network data from the existing network is collected across different network types, the network model is obtained through offline training, and the initial de-jittering strategy may be determined from the network model.
  • The initial de-jittering strategy and its related parameters are determined according to the network model, and the related parameters include de-jitter parameters and delay parameters.
  • The network parameter settings in the first de-jittering strategy, such as the de-jitter parameters and the delay processing parameters, can be adjusted.
  • A de-jitter parameter is obtained according to the current real-time network condition and the second de-jittering strategy, and the buffer size used for transmitting voice call or video call data, such as VoIP call data, is set according to the de-jitter parameter, so that the delay of the voice call or video call is as expected and tends to be reasonable (203).
  • The size of the de-jitter buffer is determined according to the de-jitter parameter obtained by the second de-jittering strategy. Finally, the buffered data is adjusted based on the size of the de-jitter buffer.
  • An information processing method in the embodiment of the present application includes: collecting offline network data, extracting, from the offline network data, at least one network parameter used to characterize the network features, constructing a network model according to the at least one network parameter so as to measure or simulate VoIP call quality according to the network model, and determining the first de-jittering strategy according to the network model (301).
  • The first de-jittering strategy may also be referred to as an initial de-jittering strategy.
  • A large amount of network data from the existing network is collected across different network types, the network model is obtained through offline training, and the initial de-jittering strategy may be determined from the network model.
  • The initial de-jittering strategy and its related parameters are determined according to the network model, and the related parameters include de-jitter parameters and delay parameters.
  • The signal content of the call determines whether the current frame is an important frame: voice data content is an important frame and needs to be focused on, whereas mute data content does not. De-jittering is processed differently for different content, so within a single call the network parameter settings in the first de-jittering strategy, such as the de-jitter parameters and the delay processing parameters, may be adjusted.
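  • The first de-jittering strategy may also be corrected according to other characteristic parameters for evaluating the quality of a voice call or a video call, such as a VoIP call (for example, the historical data of the current call and the perceptual auditory result of the current call).
  • A de-jitter parameter is obtained according to the current real-time network condition and the second de-jittering strategy, and the buffer size is set according to the de-jitter parameter, so that the delay of the voice call or video call is as expected and tends to be reasonable (303).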
  • the size of the de-jitter buffer is determined according to the de-jitter parameter obtained by the second de-jittering strategy.
  • the buffer data is adjusted based on the size of the de-jitter buffer.
  • The method includes: collecting offline network data, extracting, from the offline network data, at least one network parameter used to characterize the network features, constructing a network model according to the at least one network parameter so as to measure or simulate VoIP call quality according to the network model, and determining the first de-jittering strategy according to the network model (401).
  • The first de-jittering strategy may also be referred to as an initial de-jittering strategy.
  • A large amount of network data from the existing network is collected across different network types, the network model is obtained through offline training, and the initial de-jittering strategy may be determined from the network model.
  • The related parameters output based on the initial de-jittering strategy include de-jitter parameters, delay parameters, and the like.
  • The initial de-jittering strategy and its related parameters are determined according to the network model, and the related parameters include de-jitter parameters and delay parameters.
  • The perceptual auditory result of the current call, which may also be referred to as a traditional perceptual auditory evaluation parameter, is obtained and used as a characteristic parameter for evaluating the quality of a voice call or a video call, such as a VoIP call; the first de-jittering strategy is modified according to the perceptual auditory result of the current call to obtain a second de-jittering strategy (402).
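  • A de-jitter parameter is obtained according to the current real-time network condition and the second de-jittering strategy, and the buffer size used for transmitting voice call or video call data, such as VoIP call data, is set according to the de-jitter parameter, so that the delay of the voice call or video call is as expected and tends to be reasonable (403).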
  • the size of the de-jitter buffer is determined according to the de-jitter parameter obtained by the second de-jittering strategy.
  • the buffer data is adjusted based on the size of the de-jitter buffer.
  • At the sender and the receiver, different delay processing methods and parameters can also be set according to the different processing capabilities of the devices, the scheduling characteristics of the application threads, and the like, to further modify the first de-jittering strategy and improve its accuracy, as shown in the following embodiments.
  • When the voice call or video call data, such as VoIP call data, of the current call is collected, the different processing capabilities of the terminal device and/or the scheduling characteristics of the application serving as the medium of the voice call or video call are acquired, and the first de-jittering strategy is corrected according to those processing capabilities and/or scheduling characteristics.
  • For the receiving end (or playing end) in the entire VoIP network call, in an information processing method in the embodiment of the present application, when the voice call or video call data, such as VoIP call data, of the current call is played, the different processing capabilities of the terminal device and/or the scheduling characteristics of the application serving as the medium of the voice call or video call are acquired, and the first de-jittering strategy is corrected according to those processing capabilities and/or scheduling characteristics.
  • Through offline packet capture, network characteristics are extracted as parameters, and a large amount of offline training establishes the parameters of different network models. The initial de-jitter algorithm and related parameters are determined according to the established network parameter model, and are then adjusted according to the historical data of the current call. Because the modeling considers both the overall characteristics of the network during the whole call and the burstiness within a period of time, the network characteristics can be estimated more accurately.
  • JB_len refers to the buffer size
  • AD_up refers to the upper limit of the buffer
  • AD_dw refers to the lower limit of the buffer
  • F1 to F4 refer to empirical values of the adjustment parameters, used as follows:
  • JB_len > AD_up × F1: if the current frame is an important frame (such as a voice segment), the current buffer data is compressed; if the current frame is non-important data (such as silence), the current frame is dropped directly.
  • JB_len > AD_up × F2, where F1 > F2: if the current frame is an important frame (such as a voice segment), no processing is performed on the current buffer data; if the current frame is non-important data (such as silence), the current buffer data is compressed.
  • The magnitude of the compression is determined by the size of F1 and F2, and the magnitude of each compression is smaller than the data length of the current frame.
  • The basis for this processing is that both compressing a signal and dropping it damage the call quality, and dropping a packet directly does more damage than compression. Because the compression algorithm works on a single packet and each compression removes less than one frame of data, compression reduces the buffer data length, and therefore the end-to-end delay, more slowly than dropping the current frame. For this reason, direct frame dropping is used only when the buffer data length is very large and the current data is non-important. If the buffer data length is very large and the current data is important, the less damaging method, compression, is used to adjust the buffer length. If the buffer data length is greater than a certain threshold but the current frame is important data, nothing is done, so as to preserve the quality of the voice segment to the greatest extent.
  • The extra delay can wait until a non-important segment (such as silence), where fast processing reduces the end-to-end delay and maximizes the perceived quality of the call.
  • JB_len < AD_dw × F3: if the current frame is a non-important frame, the current frame is copied directly, and the number of copies is determined by the size of F3; if the current frame is an important frame, the current buffer data is expanded.
  • JB_len < AD_dw × F4, where F3 < F4: the current buffer is expanded. The magnitude of each expansion is determined by the size of F3 and F4.
  • In other cases, the buffer data is directly decoded and sent to the sound card device without any de-jitter processing.
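  • The threshold rules above can be sketched in Python as a single decision function. This is a minimal sketch, not the patented implementation: it assumes the thresholds are the products AD_up × F1, AD_up × F2, AD_dw × F3, and AD_dw × F4, that the lower-bound comparisons are "less than", and that F3 < F4. The names `Action` and `choose_action` and the numbers in the usage note are hypothetical.

```python
from enum import Enum


class Action(Enum):
    DROP_FRAME = "drop the current frame"
    COMPRESS = "compress the buffer data"
    NONE = "leave the buffer data unchanged"
    COPY_FRAME = "copy the current frame"
    EXPAND = "expand the buffer data"
    PASS_THROUGH = "decode and play without de-jitter processing"


def choose_action(jb_len, ad_up, ad_dw, f1, f2, f3, f4, important):
    """Pick a buffer adjustment from the buffer size and the frame type.

    jb_len: current buffer size; ad_up / ad_dw: upper / lower buffer limits;
    f1..f4: empirical adjustment parameters (assumed f1 > f2 and f3 < f4);
    important: True for an important frame such as a voice segment.
    """
    if not (f1 > f2 and f3 < f4):
        raise ValueError("expected f1 > f2 and f3 < f4")
    if jb_len > ad_up * f1:            # buffer far too long
        return Action.COMPRESS if important else Action.DROP_FRAME
    if jb_len > ad_up * f2:            # buffer moderately too long
        return Action.NONE if important else Action.COMPRESS
    if jb_len < ad_dw * f3:            # buffer far too short
        return Action.EXPAND if important else Action.COPY_FRAME
    if jb_len < ad_dw * f4:            # buffer moderately too short
        return Action.EXPAND
    return Action.PASS_THROUGH
```

  • For instance, with the hypothetical values ad_up=200, ad_dw=60, f1=1.5, f2=1.2, f3=0.5, f4=0.8, a buffer of 400 holding a silence frame maps to dropping the frame, while the same buffer holding a voice frame maps to compression.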
  • the method further includes:
  • the voice call or the video call is subjected to a specific process of improving the call quality.
  • the determining whether the first client and the second client are in a double talk state, in which both simultaneously collect sound, includes:
  • a far-end signal provided by the first client, where the far-end signal is a signal obtained according to a sound signal sent by a peer end of the voice call;
  • the near-end signal is a sound signal collected by the second client through the microphone portion
  • when the correlation value is less than the preset correlation value threshold, it is determined that the call state when the microphone portion collects the near-end signal is a double talk state.
  • before the superimposing the ultrasonic signal on the far-end signal, the method further includes:
  • the step of superimposing the ultrasonic signal on the far-end signal is performed.
  • the acquiring the remote signal includes:
  • the cutoff frequency of the low pass filter is lower than the lowest frequency of the ultrasonic signal.
  • the determining, according to the ultrasonic signal, the first signal segment in the far-end signal and the second signal segment in the near-end signal including:
  • a signal played at the playback time is determined as the first signal segment.
  • the determining, according to the ultrasonic signal, the first signal segment in the far-end signal and the second signal segment in the near-end signal including:
  • the signal obtained by the query is determined as the second signal segment.
  • the data information carried by the ultrasonic signal superimposed on the far-end signal is not repeated within a predetermined period
  • the predetermined period is greater than or equal to a maximum value of the echo delay, and the echo delay is a delay between the speaker portion playing the mixed signal and the echo corresponding to the mixed signal collected by the microphone portion.
  • the data information carried by the ultrasonic signal comprises a plurality of ultrasonic codes, each of the ultrasonic codes being composed of at least two coding parts, and each of the coding parts is used to indicate whether there is a signal at each of at least two ultrasonic frequency points.
  • the calculating a correlation value between the first signal segment and the second signal segment includes:
  • the method further includes:
  • the amplitude of the far-end signal is attenuated according to a predetermined attenuation strategy.
  • FIG. 21 is a flowchart of a call state detecting method according to an exemplary embodiment.
  • the call state detecting method may include the following steps:
  • Step S201 Receive a sound signal sent by a peer end of the voice call.
  • the terminal may receive a voice signal sent by the opposite end of the call, and the voice signal may be a voice signal sent through the PSTN or a voice signal sent through the data network.
  • Step S202 performing low-pass filtering on the received sound signal to obtain a far-end signal.
  • the far-end signal is a signal carrying a sound emitted by a peer end of the voice call, and the cut-off frequency of the low-pass filter is lower than the lowest frequency of the ultrasonic wave.
  • The normal frequency of a voice signal is relatively low, usually between several hundred hertz and several kilohertz, but the received sound signal may carry high-frequency interference, which may include ultrasonic components. Such components would affect the accuracy of the subsequent signal alignment and, in turn, the accuracy of the double talk state detection. Therefore, in the embodiment of the present application, after receiving the sound signal sent by the opposite end of the voice call, the terminal first performs low-pass filtering on it.
  • The cutoff frequency of the low-pass filtering needs to be lower than the lowest frequency of the ultrasonic signal, so as to avoid interference with the ultrasonic signal superimposed on the far-end signal in the subsequent step.
  • For example, if the lowest frequency of the ultrasonic signal is 20 kHz, the cutoff frequency of the low-pass filtering may lie between the normal frequency of the voice signal and the lowest frequency of the ultrasonic signal. For instance, the cutoff frequency may be 12 kHz; that is, the terminal acquires the components below 12 kHz of the received sound signal as the far-end signal.
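  • As an illustration of step S202, the following sketch low-pass filters a signal at 12 kHz with a Hamming-windowed sinc FIR filter in pure Python. The filter design, the tap count, the 48 kHz sampling rate, and the function names `lowpass_fir` and `fir_filter` are assumptions chosen for the example; the text does not specify how the filtering is realized.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate_hz, num_taps=101):
    """Hamming-windowed sinc low-pass taps, normalised to unity gain at DC."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    fc = cutoff_hz / sample_rate_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(samples, taps):
    """Direct-form convolution; samples before the start are treated as zero."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * samples[i - j]
        out.append(acc)
    return out


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))
```

  • With these assumptions, `fir_filter(received, lowpass_fir(12000, 48000))` keeps speech-band components and strongly attenuates components near 20 kHz, so that any ultrasonic interference in the received signal does not collide with the ultrasonic signal added later.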
  • Step S203 detecting whether the power value of the far-end signal is greater than a preset power threshold, and if yes, proceeding to step S204, otherwise, proceeding to step S211.
  • After acquiring the far-end signal, the terminal first determines whether the power value of the far-end signal is greater than a preset power threshold. If so, the power of the far-end signal is high, and the microphone portion will collect an echo signal after the far-end signal is played through the speaker portion. If the power value of the far-end signal is not greater than the preset power threshold, the power of the far-end signal is low, and the microphone portion may not collect an echo signal after the far-end signal is played through the speaker portion.
  • The power value of the far-end signal is also used to determine whether the opposite end of the voice call is making a sound. If the power value of the far-end signal is greater than the preset power threshold, the opposite end of the voice call is making a sound (for example, the peer user is speaking), and the method proceeds to step S204 for further detection. If the power value of the far-end signal is not greater than the preset power threshold, the opposite end of the voice call is not making a sound, or its sound is quiet (for example, the peer user is not currently speaking), and the method proceeds to step S211.
  • When calculating the power value of the far-end signal, the terminal may divide the far-end signal into frames of a fixed duration (for example, 20 ms) and calculate the power value of each far-end signal frame separately.
  • the calculation formula of the power value of the nth frame can be as follows:
  • In the formula, P_X(n) is the power value of the nth frame; M is the frame length, numerically equal to the sampling frequency of the far-end signal multiplied by 20 ms; and x is the sampled value of the far-end signal.
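  • The per-frame power calculation of step S203 can be sketched as follows. Because the formula itself is not reproduced in this text, the sketch assumes the common definition of frame power as the mean of the squared samples over the M samples of a frame; the function name `frame_powers` is hypothetical.

```python
def frame_powers(samples, sample_rate_hz, frame_ms=20):
    """Split the far-end signal into fixed-length frames and return each frame's power.

    Assumption: power of frame n = (1 / M) * sum of x^2 over its M samples,
    where M = sample_rate_hz * frame_ms / 1000. Trailing samples that do not
    fill a whole frame are ignored.
    """
    m = sample_rate_hz * frame_ms // 1000
    if m <= 0:
        raise ValueError("frame length must be positive")
    powers = []
    for start in range(0, len(samples) - m + 1, m):
        frame = samples[start:start + m]
        powers.append(sum(x * x for x in frame) / m)
    return powers
```

  • Each returned value can then be compared with the preset power threshold to decide whether to continue to step S204 for that frame.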
  • Step S204 superimposing the ultrasonic signal on the far-end signal to obtain a mixed signal after superimposing the ultrasonic signal.
  • the conventional microphone section uses a sampling frequency of 48 kHz. According to the Shannon sampling theorem, the maximum frequency of the signal collected by the microphone section is 24 kHz.
  • the frequency of the ultrasonic signal superimposed on the far-end signal needs to be lower than the maximum frequency of the signal collected by the microphone portion. Specifically, for example, when the sampling rate of the microphone portion is 48 kHz, the frequency range of the ultrasonic signal superimposed on the far-end signal can be set to 20 to 22 kHz.
  • The terminal needs to encode the ultrasonic signal so that the data information carried by the ultrasonic signal superimposed on the far-end signal is not repeated within a predetermined period; the predetermined period is greater than or equal to the maximum value of the echo delay.
  • the echo delay is a delay between the speaker portion playing the mixed signal and the echo corresponding to the mixed signal collected by the microphone portion.
  • the data information carried by the ultrasonic signal is used to indicate a frequency point corresponding to the ultrasonic signal.
  • the data information carried by the ultrasonic signal may include a plurality of ultrasonic codes, each ultrasonic code is composed of at least two coding parts, and each coding part is used to indicate whether there is a signal at each of at least two ultrasonic frequency points.
  • each ultrasound coding is composed of three coding parts, each of which is used to indicate whether there is a signal on each of the three ultrasonic frequency points.
  • the coding design of the ultrasonic signal can be as follows:
  • Each coding part is constructed from three ultrasonic frequency points: f_0 (20400 Hz), f_1 (21100 Hz), and f_2 (21800 Hz).
  • an encoding portion of more than three frequency points may be designed.
  • the number of the encoding portions may be determined by a maximum echo delay and a frame length.
  • the embodiment of the present application is only illustrated by three coding portions.
  • the formula of the ultrasonic signal corresponding to each coded part is as follows:
  • A coding part can represent a value from 0 to 7.
  • The range of the first and second coding parts is set to 1 to 7 and the third coding part is set to 0, so that up to 49 ultrasonic codes with different values can be constructed. With these 49 ultrasonic codes, a code table of size 49 can be designed.
  • The code table is read in order to obtain the corresponding ultrasonic code, the ultrasonic signal is formed according to the above ultrasonic signal formula, and it is then superimposed on the far-end signal (the signal sample values, that is, the amplitudes of the signals, are added). The code table data is read cyclically to construct the ultrasonic signal.
  • The coding part set to 0 indicates the boundary between two adjacent ultrasonic codes superimposed on the far-end signal. Optionally, in an actual application, the coding part set to 0 within an ultrasonic code may instead be the first coding part or the second coding part.
  • The ultrasonic signal corresponding to one coding part is superimposed on each 20 ms far-end signal frame; that is, the ultrasonic signal corresponding to one ultrasonic code is superimposed on every three adjacent far-end signal frames, and each coding part is indicated in binary.
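  • The code table and the per-frame ultrasonic signal described above can be sketched as follows. The frequencies and the 1-to-7 / 0 layout come from the text; the ultrasonic signal formula is not reproduced in this text, so the sketch assumes that each bit set in a coding part contributes one sine tone of equal amplitude. The names `build_code_table`, `synthesize_part`, and `superimpose`, and the per-tone amplitude default, are hypothetical.

```python
import math

# f0, f1, f2 as given in the text; bit i of a coding part marks a tone at FREQS_HZ[i].
FREQS_HZ = (20400, 21100, 21800)


def build_code_table():
    """All 49 codes (first, second, 0) with first and second in 1..7."""
    return [(first, second, 0) for first in range(1, 8) for second in range(1, 8)]


def synthesize_part(value, sample_rate_hz=48000, frame_ms=20, tone_amplitude=3000.0):
    """Return one frame of ultrasonic signal for a coding part value in 0..7.

    Assumption: a set bit adds a sine tone at the matching frequency point.
    """
    if not 0 <= value <= 7:
        raise ValueError("a coding part holds a value from 0 to 7")
    n = sample_rate_hz * frame_ms // 1000
    frame = [0.0] * n
    for bit, freq_hz in enumerate(FREQS_HZ):
        if (value >> bit) & 1:
            for i in range(n):
                frame[i] += tone_amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
    return frame


def superimpose(far_end_frame, ultrasonic_frame):
    """Add the sample values of the far-end frame and the ultrasonic frame."""
    return [x + u for x, u in zip(far_end_frame, ultrasonic_frame)]
```

  • Reading `build_code_table()` cyclically and calling `synthesize_part` once per 20 ms frame yields three frames per ultrasonic code, the last of which is silent in the ultrasonic band and marks the code boundary.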
  • FIG. 22 shows a mixed signal spectrum diagram according to an embodiment of the present application. In FIG. 22, starting from 0.36 s, the terminal superimposes the ultrasonic signal corresponding to one coded portion on each duration of 0.02 s. In every 0.06 s, no ultrasonic signal is superimposed on the last 0.02 s; equivalently, the coded portion superimposed on the last 0.02 s has a coded value of 0.
  • The ultrasonic signal superimposed over a duration of 0.06 s indicates one ultrasonic code, and the coded values of the ultrasonic codes differ from one another within a predetermined period. Specifically, in FIG. 22, the coded value of one coded portion is represented by the values of b_2, b_1, and b_0, and the coded value of one ultrasonic code is represented by the coded values of its three coded portions.
  • Within 0.36 s to 0.38 s, there is no signal at the f_2 frequency point and there are signals at the f_1 and f_0 frequency points, so the coded value of the coded portion is 011 (that is, 3). Within 0.38 s to 0.40 s, there are signals at the f_2 and f_1 frequency points and no signal at the f_0 frequency point, so the coded value of the coded portion is 110 (that is, 6). Within 0.40 s to 0.42 s, there is no signal at the f_2, f_1, or f_0 frequency points, so the coded value of the coded portion is 000 (that is, 0). That is, within 0.36 s to 0.42 s, the ultrasonic code of the ultrasonic signal superimposed on the far-end signal has a coded value of "360". Likewise, the next ultrasonic code superimposed on the far-end signal, which starts at 0.42 s, has a coded value of "540".
  • The terminal may further detect whether the amplitude of the sound signal obtained by superimposing the far-end signal and the ultrasonic signal exceeds a preset amplitude range; if it does, the amplitude of the far-end signal is attenuated according to a predetermined attenuation strategy.
  • The signal sample value is represented by 16-bit data, that is, there are at most 2^16 different signal sample values, and each amplitude of the speech signal corresponds to one signal sample value. A speech signal whose amplitude lies within [-32768, 32767] can therefore be represented accurately, while a speech signal beyond this range cannot, which causes broken sound during playback.
  • the amplitude of the far-end signal with excessive amplitude may be attenuated.
  • FIG. 23 illustrates a schematic diagram of a far-end signal attenuation process according to an embodiment of the present application.
  • Before the ultrasonic signal is superimposed on the far-end signal, it is first determined whether the amplitude of the sound signal that would be obtained by superimposing the far-end signal and the ultrasonic signal exceeds [-32768, 32767]. If so, broken sound would occur when that sound signal is played through the speaker portion. In this case, the far-end signal can be attenuated according to a predetermined attenuation strategy, and it is detected again whether the amplitude of the sound signal obtained by superimposing the attenuated far-end signal and the ultrasonic signal exceeds [-32768, 32767]. Once the amplitude of the obtained sound signal does not exceed [-32768, 32767], the far-end signal is superimposed with the ultrasonic signal to obtain the mixed signal.
  • Attenuating the far-end signal according to a predetermined attenuation strategy may specifically mean attenuating it by a predetermined attenuation ratio. For example, each time the far-end signal is attenuated, its amplitude may be multiplied by the attenuation ratio to obtain the attenuated far-end signal. The attenuation ratio may be a positive number less than 1, for example, 0.9 or 0.8.
  • The amplitude of the ultrasonic signal should take an appropriate value: high enough that the ultrasonic signal can be accurately detected in the near-end signal collected by the microphone portion, but not so high that the mixed signal after superposition produces broken sound and affects the call. For example, the amplitude of the ultrasonic signal can be set to 3000.
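  • The attenuate-and-recheck loop described above can be sketched as follows. The 16-bit range and the 0.9 ratio come from the text; the function name `superimpose_with_attenuation` and the `max_rounds` safety limit are assumptions added for the example.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def superimpose_with_attenuation(far_end, ultrasonic, ratio=0.9, max_rounds=100):
    """Attenuate the far-end samples until far-end + ultrasonic fits in 16 bits.

    Returns the mixed samples. Raises ValueError when the sum still does not fit
    after max_rounds attenuations, for example when the ultrasonic signal alone
    is already out of range (hypothetical guard, not taken from the text).
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be a positive number less than 1")
    if len(far_end) != len(ultrasonic):
        raise ValueError("signals must have the same length")
    attenuated = list(far_end)
    for _ in range(max_rounds + 1):
        mixed = [a + u for a, u in zip(attenuated, ultrasonic)]
        if all(INT16_MIN <= s <= INT16_MAX for s in mixed):
            return mixed
        # Out of range: scale the far-end signal only, then check again.
        attenuated = [a * ratio for a in attenuated]
    raise ValueError("mixed signal still exceeds the 16-bit range")
```

  • Only the far-end signal is scaled; the ultrasonic signal keeps its amplitude so that it remains detectable in the near-end signal.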
  • step S205 the mixed signal is played through the speaker portion.
  • the mixed signal is also buffered locally for subsequent signal alignment.
  • Step 206 Acquire a near-end signal, which is a sound signal collected by the microphone portion.
  • The near-end signal refers to the sound signal collected by the terminal through the microphone portion. It includes the echo signal that the microphone portion collects after the sound signal played by the speaker portion is reflected back to the microphone portion, and the sound signal generated locally at the terminal. That is, the near-end signal collected by the microphone portion includes the far-end signal played by the speaker portion, the ultrasonic signal superimposed on the far-end signal, and the sound signal generated locally at the terminal (such as the voice of the user of the terminal).
  • Step 207 Determine a first signal segment in the mixed signal and a second signal segment in the near-end signal according to the ultrasonic signal.
  • the first signal segment is a signal in a time domain of the mixed signal
  • the second signal segment is a signal in a time domain of the near-end signal.
  • The terminal may first determine the second signal segment in the near-end signal, and then determine the first signal segment in the mixed signal according to the ultrasonic signal included in the second signal segment. For example, when determining the first signal segment and the second signal segment, the terminal may parse the data information carried by the ultrasonic signal included in the near-end signal, determine the signal in the time domain corresponding to the ultrasonic signal carrying the target data information in the near-end signal as the second signal segment, determine the play time of the mixed signal that was most recently played and is superimposed with the ultrasonic signal carrying the target data information, and determine the signal played at that play time in the mixed signal as the first signal segment.
  • The terminal may analyze the ultrasonic frequency band of the signal collected by the microphone and obtain the coding information of the ultrasonic signal according to the above coding rule. For example, the terminal analyzes the collected near-end signal using the FFT (Fast Fourier Transform).
  • The terminal may first determine the play time of the mixed signal that was most recently played and is superimposed with the ultrasonic signal of a complete ultrasonic code, and determine the signal played at that play time as the mixed signal Play(i).
  • For example, the terminal detects signals at the f_2, f_1, and f_0 frequency points in the near-end signal starting from 0.37 s, and within 0.37 s to 0.43 s the ultrasonic code corresponding to the ultrasonic signal at these frequency points has a coded value of "360". The terminal queries and determines that, in the mixed signal corresponding to FIG. 22, the ultrasonic code carried from 0.36 s to 0.42 s also has a coded value of "360". It therefore determines that the near-end signal collected within 0.37 s to 0.43 s and the mixed signal within 0.36 s to 0.42 s in FIG. 22 contain the same ultrasonic signal; that is, the mixed signal played within 0.36 s to 0.42 s in FIG. 22 is the first signal segment, and the near-end signal collected within 0.37 s to 0.43 s is the second signal segment.
  • the terminal may first determine the first signal segment in the mixed signal, and then determine the second signal segment in the near-end signal according to the ultrasonic signal included in the first signal segment. For example, when determining the first signal segment and the second signal segment, the terminal may determine, in the mixed signal, a signal in a time domain corresponding to the ultrasonic signal carrying the target data information as the first signal segment, and in the first signal segment. In the near-end signal collected after being played, the signal in the time domain corresponding to the ultrasonic signal carrying the target data information is queried, and the signal obtained by the query is determined as the second signal segment.
  • For example, the terminal may determine, in the mixed signal, three adjacent frames of mixed signal Play(ii) carrying one ultrasonic code as the first signal segment, and analyze the near-end signal collected by the microphone portion after the first signal segment is played. In that near-end signal, the three adjacent frames of near-end signal Cap(ii) containing the same ultrasonic code as the mixed signal Play(ii) are the second signal segment corresponding to the mixed signal Play(ii).
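  • The alignment of step 207 can be sketched as decoding the ultrasonic code from three captured frames and looking up the most recent play time of the same code. The text uses FFT analysis; to stay dependency-free this sketch swaps in the Goertzel recurrence, which evaluates a single DFT bin per frequency point. The detection threshold `min_amplitude`, the 48 kHz sampling rate, and all function names are hypothetical.

```python
import math

FREQS_HZ = (20400, 21100, 21800)  # f0, f1, f2 from the text


def tone_amplitude(frame, freq_hz, sample_rate_hz):
    """Estimate the amplitude of one tone in a frame with the Goertzel recurrence."""
    n = len(frame)
    k = round(freq_hz * n / sample_rate_hz)        # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2    # squared magnitude of bin k
    return 2.0 * math.sqrt(max(power, 0.0)) / n


def decode_part(frame, sample_rate_hz=48000, min_amplitude=100.0):
    """Return the 0..7 value of a coding part: bit i is set if a tone is found at f_i."""
    value = 0
    for bit, freq_hz in enumerate(FREQS_HZ):
        if tone_amplitude(frame, freq_hz, sample_rate_hz) > min_amplitude:
            value |= 1 << bit
    return value


def decode_code(three_frames, sample_rate_hz=48000, min_amplitude=100.0):
    """Decode one ultrasonic code from three adjacent 20 ms near-end frames."""
    return tuple(decode_part(f, sample_rate_hz, min_amplitude) for f in three_frames)


def find_play_time(played, captured_code):
    """played: list of (play_time_s, code), oldest first. Return the latest match or None."""
    for play_time_s, code in reversed(played):
        if code == captured_code:
            return play_time_s
    return None
```

  • In the FIG. 22 example, decoding the near-end frames collected from 0.37 s would give (3, 6, 0), and the lookup would return 0.36 s as the start of the matching mixed-signal segment.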
  • Step S208 calculating a correlation value between the first signal segment and the second signal segment.
  • The terminal may separately calculate the power spectra corresponding to the first signal segment and the second signal segment by using a fast Fourier transform, perform binarization processing on these power spectra to obtain a binarized array for each of the first signal segment and the second signal segment, and calculate the correlation value between the binarized arrays corresponding to the first signal segment and the second signal segment.
  • When calculating the power spectra corresponding to the first signal segment and the second signal segment, the terminal may calculate their power spectra in a specified frequency band, which may be the frequency band in which most of the sound lies during a voice call; for example, the specified frequency band may be 500 Hz to 1200 Hz.
  • When the power spectrum of a signal (such as the first signal segment or the second signal segment) is binarized, the power spectrum of the signal may first be smoothed to obtain a smoothed power value for each frequency point in the power spectrum of the signal.
  • FIG. 24 is a schematic flowchart of a correlation value calculation according to an embodiment of the present application.
  • The terminal performs a fast Fourier transform on the first signal segment to obtain the power spectrum P_p(j) of the first signal segment at 500 Hz to 1200 Hz, where P_p(j) represents the power of the first signal segment at each frequency point from 500 Hz to 1200 Hz and the value range of j is [m1, m2]. In the expressions for m1 and m2, M is half of the number of fast Fourier transform points described above, and f_s is the sampling frequency of the first signal segment.
  • The terminal smooth-filters P_p(j) to obtain P_psm(j), which represents the smoothed power value at each frequency point of P_p(j), and binarizes P_p(j) according to P_psm(j). Specifically, for each frequency point in P_p(j), the power value of the frequency point is compared with the smoothed power value of the corresponding frequency point in P_psm(j). If the power value of the frequency point is greater than the corresponding smoothed power value in P_psm(j), the value of the frequency point is set to 1; otherwise, it is set to 0. This finally yields the binarized array P_pb(j) of P_p(j).
  • The terminal also performs a fast Fourier transform on the second signal segment to obtain the power spectrum P_c(j) of the second signal segment at 500 Hz to 1200 Hz, and smooth-filters P_c(j) to obtain P_csm(j), which represents the smoothed power value at each frequency point of P_c(j). The terminal then binarizes P_c(j) according to P_csm(j) to obtain the binarized array P_cb(j) of P_c(j).
  • The terminal calculates a correlation value between P_pb(j) and P_cb(j), and the calculated correlation value can be used as the correlation value of the first signal segment and the second signal segment on the specified frequency band.
  • the specific correlation value calculation formula can be as follows:
  • $PC_{xor} = \sum_{k \in [m1, m2]} \left( P_{pb}(k) \ \mathrm{Xor} \ P_{cb}(k) \right) / (m2 - m1 + 1)$
  • where Xor is the XOR operator.
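  • The calculation of step S208 can be sketched as follows: a band-limited power spectrum, binarization against a smoothed copy, and the XOR-based value from the formula above. The text does not specify the smoothing filter or reproduce the expressions for m1 and m2, so the moving average, the bin bounds, and a direct DFT in place of the FFT are assumptions; all function names are hypothetical.

```python
import math


def band_power_spectrum(segment, sample_rate_hz, low_hz=500.0, high_hz=1200.0):
    """Power at each DFT bin between low_hz and high_hz (direct DFT, band only)."""
    n = len(segment)
    m1 = math.ceil(low_hz * n / sample_rate_hz)    # assumed lower bin index
    m2 = math.floor(high_hz * n / sample_rate_hz)  # assumed upper bin index
    spectrum = []
    for k in range(m1, m2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * t / n) for t, x in enumerate(segment))
        im = sum(x * math.sin(2.0 * math.pi * k * t / n) for t, x in enumerate(segment))
        spectrum.append(re * re + im * im)
    return spectrum


def smooth(spectrum, half_width=2):
    """Moving average over neighbouring bins; the window is truncated at the edges."""
    out = []
    for i in range(len(spectrum)):
        lo, hi = max(0, i - half_width), min(len(spectrum), i + half_width + 1)
        out.append(sum(spectrum[lo:hi]) / (hi - lo))
    return out


def binarize(spectrum, half_width=2):
    """1 where the power exceeds its smoothed value, otherwise 0."""
    smoothed = smooth(spectrum, half_width)
    return [1 if p > s else 0 for p, s in zip(spectrum, smoothed)]


def xor_correlation(bits_a, bits_b):
    """Sum of bitwise XOR over the band divided by the number of bins (m2 - m1 + 1)."""
    if len(bits_a) != len(bits_b) or not bits_a:
        raise ValueError("arrays must be non-empty and of equal length")
    return sum(a ^ b for a, b in zip(bits_a, bits_b)) / len(bits_a)
```

  • As written, the value is the fraction of frequency points at which the two binarized arrays differ; it is then compared with the preset correlation value threshold in step S209.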
  • Step S209 determining whether the correlation value is less than a preset correlation value threshold, and if yes, proceeding to step 210, otherwise, proceeding to step S211.
  • the above correlation value threshold may be a threshold set by a developer in advance.
  • step S210 it is determined that the call state is a double talk state.
  • step S209 determines that the correlation value is smaller than the preset correlation value threshold, it may be determined that the call state when the microphone portion collects the near-end signal is a double talk state.
  • step S211 it is determined that the call state is a non-double talk state.
  • When step S203 detects that the power value of the far-end signal is not greater than the preset power threshold, it may be determined that the call state when the far-end signal is acquired is a non-double talk state; or, when step S209 determines that the correlation value is not less than the preset correlation value threshold, it may be determined that the call state when the microphone portion collects the near-end signal is a non-double talk state.
  • The near-end signal collected by the microphone portion of the terminal includes a locally generated sound signal (such as the voice of the user of the terminal) and the echo signal produced when the mixed signal is transmitted to the microphone portion.
  • When the calculated correlation value is less than the preset correlation value threshold, the signal strength of the locally generated sound signal may be considered high, and the user of the terminal is most likely speaking. Combined with the determination in step S203 that the power value of the far-end signal is greater than the preset power threshold, it can be determined that the call state corresponding to the near-end signal is a double talk state. Otherwise, when the calculated correlation value is not less than the preset correlation value threshold, the signal strength of the locally generated sound signal can be considered low, and the user of the terminal may not be speaking, so it can be determined that the call state corresponding to the near-end signal is a non-double talk state.
  • FIG. 25 is a schematic diagram of a call state detection process according to an embodiment of the present application.
  • After receiving the sound signal sent by the opposite end of the call, the terminal low-pass filters the received sound signal to obtain the far-end signal, and determines whether the power of the far-end signal is greater than a preset power threshold. If not, the current call state is determined to be a non-double talk state. If so, the ultrasonic signal is superimposed on the far-end signal to obtain a mixed signal, which is stored.
  • The terminal plays the mixed signal through the speaker portion, acquires the sound signal collected by the microphone portion as the near-end signal, and parses the code carried by the ultrasonic signal in the near-end signal to align it with the mixed signal, determining the first signal segment in the mixed signal and the second signal segment in the near-end signal that contain the same ultrasonic signal. The terminal then calculates the correlation value between the first signal segment and the second signal segment. If the calculated correlation value is less than the correlation value threshold, the current call state is determined to be a double talk state; otherwise, the current call state is determined to be a non-double talk state.
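  • The decision flow of steps S203 and S209 to S211 can be summarized in a short sketch. The correlation is passed in as a callable so that the superposition, playback, alignment, and correlation steps run only when the far-end power exceeds the threshold; the names `CallState` and `detect_call_state` are hypothetical, and the thresholds are left to the caller because the text describes them only as preset values.

```python
from enum import Enum


class CallState(Enum):
    DOUBLE_TALK = "double talk"
    NON_DOUBLE_TALK = "non-double talk"


def detect_call_state(far_end_power, power_threshold, compute_correlation, correlation_threshold):
    """Return the call state following the flow described in the text.

    compute_correlation: zero-argument callable standing in for steps S204 to S208
    (superimpose, play, capture, align, correlate); it is only called when the
    far-end power is greater than the power threshold.
    """
    if far_end_power <= power_threshold:
        return CallState.NON_DOUBLE_TALK      # S203 -> S211
    correlation = compute_correlation()
    if correlation < correlation_threshold:
        return CallState.DOUBLE_TALK          # S209 -> S210
    return CallState.NON_DOUBLE_TALK          # S209 -> S211
```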
  • The terminal aligns the mixed signal and the near-end signal by using the ultrasonic signal superimposed on the far-end signal and the ultrasonic signal included in the near-end signal collected by the microphone portion, and determines whether the call state is a double talk state from the correlation value between the aligned near-end signal and mixed signal. Compared with a solution that estimates the amplitude attenuation of the far-end signal as it is reflected to the microphone portion, the solution shown in the present application can improve the accuracy of double talk state detection.
  • The terminal aligns the mixed signal and the near-end signal by using an ultrasonic signal that cannot be perceived by human hearing, thereby avoiding interference with the user's normal call.
  • FIG. 26 is a block diagram showing the structure of a call state detecting apparatus according to an exemplary embodiment.
  • the call state detecting means can perform all or part of the steps in the embodiment shown in FIG.
  • the call state detecting device may include:
  • a far-end signal acquiring portion 801, configured to acquire a far-end signal, where the far-end signal is a signal obtained according to a sound signal sent by the opposite end of the voice call;
  • a signal superimposing portion 802 configured to superimpose an ultrasonic signal on the far-end signal to obtain a mixed signal after superimposing the ultrasonic signal
  • a playing portion 803, configured to play the mixed signal through the speaker portion
  • the near-end signal acquisition part 804 is configured to acquire a near-end signal, where the near-end signal is a sound signal collected through a microphone part;
  • a signal determining portion 805 configured to determine, according to the ultrasonic signal, a first signal segment of the mixed signal and a second signal segment of the near-end signal;
  • the state determining portion 807 is configured to determine that the call state when the microphone portion collects the near-end signal is a double talk state when the correlation value is less than a preset correlation value threshold.
  • the device further includes:
  • the power detecting portion is configured to detect whether the power value of the far-end signal is greater than a preset power threshold before the signal superimposing portion superimposes the ultrasonic signal on the far-end signal;
  • the signal superimposing portion is configured to perform the step of superimposing the ultrasonic signal on the far-end signal when the detection result of the power detecting portion is that the power value of the far-end signal is greater than the preset power threshold.
  • the signal acquisition part includes:
  • a signal receiving part configured to receive a sound signal sent by the opposite end
  • a filtering part configured to perform low-pass filtering on the received sound signal to obtain the far-end signal
  • the cutoff frequency of the low pass filter is lower than the lowest frequency of the ultrasonic signal.
  • the data information carried by the ultrasonic signal superimposed on the far-end voice signal is not repeated in a predetermined period
  • the predetermined period is greater than or equal to a maximum value of the echo delay, and the echo delay is a delay between the speaker portion playing the mixed signal and the echo corresponding to the mixed signal collected by the microphone portion.
  • the signal determining part includes:
  • the first signal determining portion is configured to determine, in the near-end signal, a signal in a time domain corresponding to an ultrasonic signal carrying target data information as the second signal segment;
  • a play time determining portion configured to determine the play time of the most recently played mixed signal on which the ultrasonic signal carrying the target data information is superimposed;
  • the second signal determining portion is configured to determine, in the mixed signal, a signal played at the playing time as the first signal segment.
  • the signal determining part includes:
  • the third signal determining portion is configured to determine, in the mixed signal, a signal in a time domain corresponding to the ultrasonic signal carrying the target data information as the first signal segment;
  • the querying portion is configured to query, in the near-end signal collected after the first signal segment is played, a signal in a time domain corresponding to an ultrasonic signal carrying the target data information;
  • the fourth signal determining portion is configured to determine a signal obtained by the query portion query as the second signal segment.
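  • The first alignment variant above (find the near-end segment carrying the target data information, then look up when a mixed signal carrying the same data information was most recently played) can be sketched as follows. This is a minimal illustration only: the class `PlaybackLog` and its method names are hypothetical, and decoding the data information from the ultrasonic signal is assumed to happen elsewhere.

```python
from typing import Dict, Optional, Tuple


class PlaybackLog:
    """Remembers when each ultrasonic data code was last played (hypothetical helper)."""

    def __init__(self) -> None:
        self._last_play_time: Dict[int, float] = {}

    def record_play(self, code: int, play_time: float) -> None:
        # A later play of the same code overwrites the earlier one, so a
        # lookup always returns the most recent play time for that code.
        self._last_play_time[code] = play_time

    def match_capture(self, code: int, capture_time: float) -> Optional[Tuple[float, float]]:
        """Return (play_time, echo_delay) for a code decoded from the near-end signal.

        Returns None when the code was never played, or was only played
        after the capture time (so it cannot be the source of this echo).
        """
        play_time = self._last_play_time.get(code)
        if play_time is None or play_time > capture_time:
            return None
        return play_time, capture_time - play_time
```

  • Because the data information is not repeated within a period at least as long as the maximum echo delay, the most recent play of a given code is the only plausible source of the captured echo, which is why a single dictionary entry per code is enough in this sketch.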
  • the data information carried by the ultrasonic signal is used to indicate a frequency point corresponding to the ultrasonic signal.
  • the data information carried by the ultrasonic signal includes several ultrasonic codes, each ultrasonic code consists of at least two coding sections, and each coding section is used to indicate whether a signal is present at each of at least two ultrasonic frequency points.
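  • One possible reading of this coding scheme is sketched below: a number is split into coding sections, and each section holds one on/off flag per ultrasonic frequency point. The frequency values, the bit ordering, and the function names `encode` and `decode` are assumptions made for illustration; the source does not specify them.

```python
from typing import List, Sequence, Tuple

# Assumed ultrasonic frequency points in Hz; the source gives no values.
FREQ_POINTS = (19000, 20000)


def encode(value: int, sections: int) -> List[Tuple[bool, ...]]:
    """Split `value` into `sections` coding sections, most significant first.

    Each section carries one presence flag per entry in FREQ_POINTS.
    """
    bits_per_section = len(FREQ_POINTS)
    if not 0 <= value < 2 ** (bits_per_section * sections):
        raise ValueError("value does not fit in the requested number of sections")
    code = []
    for s in reversed(range(sections)):
        chunk = (value >> (s * bits_per_section)) & (2 ** bits_per_section - 1)
        flags = tuple(bool((chunk >> b) & 1) for b in reversed(range(bits_per_section)))
        code.append(flags)
    return code


def decode(code: Sequence[Sequence[bool]]) -> int:
    """Rebuild the number from the per-section presence flags."""
    value = 0
    for flags in code:
        for flag in flags:
            value = (value << 1) | int(flag)
    return value
```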
  • the correlation value calculation part includes:
  • the power spectrum acquisition part is configured to respectively acquire a power spectrum corresponding to each of the first signal segment and the second signal segment;
  • a binarization processing portion configured to perform binarization processing on the power spectrum corresponding to each of the first signal segment and the second signal segment, to obtain binarized arrays respectively corresponding to the first signal segment and the second signal segment;
  • the correlation value calculation portion is configured to calculate a correlation value between the binarized arrays corresponding to the first signal segment and the second signal segment.
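  • The power spectrum, binarization, and correlation steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the binarization rule (compare each bin with the mean power) and the correlation measure (fraction of agreeing bins) are illustrative choices, since the source does not define them, and the threshold is left to the caller.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(samples: Sequence[float]) -> List[float]:
    """Naive DFT power spectrum over the non-negative frequency bins."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def binarize(spectrum: Sequence[float]) -> List[int]:
    """1 where a bin exceeds the mean power, else 0 (assumed rule)."""
    mean = sum(spectrum) / len(spectrum)
    return [1 if p > mean else 0 for p in spectrum]


def correlation(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of bins on which two binarized arrays agree (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("binarized arrays must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def is_double_talk(first_segment: Sequence[float],
                   second_segment: Sequence[float],
                   threshold: float) -> bool:
    """Double talk is reported when the correlation falls below the threshold."""
    c = correlation(binarize(power_spectrum(first_segment)),
                    binarize(power_spectrum(second_segment)))
    return c < threshold
```

  • Binarizing the spectra makes the comparison insensitive to how much the echo is attenuated on its way to the microphone: a pure echo keeps the same set of dominant bins, while local speech adds new ones and lowers the correlation.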
  • the device further includes:
  • the amplitude detecting portion is configured to detect, before the signal superimposing portion superimposes the ultrasonic signal on the far-end signal, whether the amplitude of the sound signal that would be obtained by superimposing the far-end signal and the ultrasonic signal exceeds a preset amplitude range;
  • an attenuating portion configured to attenuate the amplitude of the far-end signal according to a predetermined attenuation strategy when the detection result of the amplitude detecting portion is that the amplitude of the sound signal exceeds the preset amplitude range.
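  • A minimal sketch of the amplitude check and attenuation follows. The attenuation strategy used here (one uniform gain applied to the whole far-end signal so that the sum stays in range) is an assumption; the source only says that a predetermined attenuation strategy is applied. The function name and the `max_amplitude` parameter are hypothetical.

```python
from typing import List, Sequence


def attenuate_if_needed(far_end: Sequence[float],
                        ultrasonic: Sequence[float],
                        max_amplitude: float = 1.0) -> List[float]:
    """Return the far-end samples, scaled down only if the sum would clip."""
    peak = max(abs(f + u) for f, u in zip(far_end, ultrasonic))
    if peak <= max_amplitude:
        return list(far_end)  # already within the preset amplitude range

    ultra_peak = max(abs(u) for u in ultrasonic)
    far_peak = max(abs(f) for f in far_end)
    headroom = max_amplitude - ultra_peak
    if headroom <= 0 or far_peak == 0:
        raise ValueError("ultrasonic signal alone exceeds the amplitude range")

    # After scaling, |f * gain + u| <= far_peak * gain + ultra_peak = max_amplitude.
    gain = headroom / far_peak
    return [f * gain for f in far_end]
```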
  • the call state detecting apparatus aligns the mixed signal and the near-end signal by using the ultrasonic signal superimposed on the far-end signal and the ultrasonic signal contained in the near-end signal collected by the microphone portion, and determines whether the call state is a double talk state from the correlation value between the aligned near-end signal and the mixed signal. Compared with a solution that estimates the amplitude attenuation of the far-end signal as it is reflected and reaches the microphone portion, the solution shown in the present application can improve the accuracy of double talk state detection.
  • the terminal aligns the mixed signal and the near-end signal by using an ultrasonic signal that cannot be perceived by human hearing, thereby avoiding interference with the user's normal call.
  • FIG. 5 is a schematic diagram of an optional communication device according to an embodiment of the present application. As shown in FIG. 5, the device may include: a first determining portion 52, a first obtaining portion 54, a first executing portion 56, and a second Execution section 58.
  • the first determining portion 52 is configured to determine, according to a first data packet that the first client receives from the second client over the preset network, whether packet loss has occurred in the first media information sent by the second client to the first client over the preset network, where the first media information includes the first data packet and is media information transmitted when the second client performs an audio call or a video call with the first client;
  • the first obtaining portion 54 is configured to acquire network state information of the preset network when it is determined that packet loss has occurred in the first media information;
  • the first executing portion 56 is configured to send a retransmission request to the second client if the network state information satisfies the first preset condition, where the retransmission request is used to request the second client to retransmit the second data packet lost in the first media information, and the first preset condition is used to indicate the network condition required of the preset network for retransmitting the second data packet;
  • the second executing portion 58 is configured to cancel sending the retransmission request to the second client if the network state information does not satisfy the first preset condition.
  • the transmission device further includes:
  • a parameter determining portion configured to determine a predetermined parameter for requesting the second client to retransmit the second data packet, where the second data packet is a retransmission of a data packet in the first media information whose transmission failed;
  • the predetermined parameter includes at least one of a first probability threshold for successful retransmission and a second probability threshold for successfully outputting the second data packet;
  • a condition determining portion configured to determine, according to the predetermined parameter, the preset condition that the network state information needs to satisfy when retransmission is requested, where the preset condition is used to indicate the network condition required of the preset network for retransmission.
  • the parameter determining portion and the condition determining portion, which likewise correspond to a processor or a processing circuit, can be used to determine the preset condition that decides whether a retransmission request is currently sent.
  • the initiator sub-portion 52 in this embodiment may be used to perform step S302 in Embodiment 1 of the present application.
  • the open sub-portion 54 in this embodiment may be used to perform step S304 in Embodiment 1 of the present application.
  • the transmitting sub-portion 56 in this embodiment may be used to perform step S306 in Embodiment 1 of the present application, and the first closing sub-portion 58 in this embodiment may be used to perform step S308 in Embodiment 1 of the present application.
  • the above sub-portions implement the same examples and application scenarios as the corresponding steps, but are not limited to the content disclosed in Embodiment 1 above. It should be noted that the foregoing sub-portions, as part of the device, may run in the hardware environment shown in FIG. 1, and may be implemented by software or by hardware.
  • with the above portions, when packet loss occurs in the first media information, whether to send a retransmission request is decided according to the network state information. When the network state information satisfies the preset condition, the retransmission request is sent to obtain the lost data packet and recover the media information;
  • otherwise the retransmission request is not sent, to avoid aggravating network congestion. This can solve the technical problem in the related art of poor instant messaging quality caused by network congestion, and in turn achieves the technical effect of improving the quality of instant messaging.
  • the above client may be a client for communication, and the client may be installed on a computer or a mobile device.
  • the client may be a client with high immediacy requirements for communication, that is, an instant messaging client such as WeChat or QQ.
  • the preset network is the network used for communication between clients.
  • the media information can be dynamic multimedia information, such as video, audio, GIF pictures, etc., and can also be static information, such as text information, static pictures, etc.
  • the network state information is information used to describe the network, such as network transmission speed and delay.
  • in the related art, a retransmission request is sent whenever packet loss occurs. If the network is already seriously congested at that time, sending the retransmission request undoubtedly aggravates the congestion and causes more data packets to be lost. Moreover, because of the serious congestion, even if the response packet is received, its usefulness is greatly reduced, so communication quality is not improved; on the contrary, more packets are lost as the congestion worsens.
  • in the present application, the retransmission request is not sent in that situation, to avoid aggravating the congestion state of the network. Compared with the means adopted in the related art, subsequent packet loss may be reduced, and communication quality is in turn relatively improved.
  • the first determining portion 52, the first obtaining portion 54, the first executing portion 56, and the second executing portion 58 may be disposed on the first client, that is, the first client initiates the retransmission request to the second client according to its own needs.
  • to reduce the running load of the first client, the first determining portion 52, the first obtaining portion 54, the first executing portion 56, and the second executing portion 58 may instead be disposed on an application server, and the server monitors the data reception status of the first client's terminal.
  • when packet loss occurs, the server requests the lost data packet from the second client according to the network condition.
  • the server here may be the server of the client; for example, when the client is an instant messaging application, the server is the application server for instant messaging.
  • the present application analyzes current network characteristics based on historical data, determines whether to send a retransmission request according to the network characteristics and the importance of the received voice data, and adjusts the related retransmission control strategies in real time according to the utilization of retransmitted data, so that bandwidth utilization and retransmission utilization are optimized under various network conditions. An alternative implementation is described with reference to FIG.
  • the first determining part is further configured to determine, according to the sequence index information in the first data packet, whether the first media information is lost.
  • for example, the data packet with the index of 8 may be determined to be lost.
  • the data packet may also be marked with the index interval of the plurality of data packets corresponding to a given piece of media information. For example, a voice message in an instant messaging application may be split into 100 data packets for transmission, and each data packet may identify the index interval of that voice message as 301 to 400,
  • so that whether any packet has been lost can be determined from the packets that were received.
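  • The index-interval check can be sketched as follows, assuming each received packet exposes its sequence index and the interval announced for the media information. The function name is hypothetical.

```python
from typing import Iterable, List


def find_lost_indices(received: Iterable[int], first_index: int, last_index: int) -> List[int]:
    """Return the indices in [first_index, last_index] that were never received."""
    seen = set(received)
    return [i for i in range(first_index, last_index + 1) if i not in seen]
```

  • For a voice message announced as indices 301 to 400, calling `find_lost_indices` with the received indices lists exactly the packets that a retransmission request would have to cover.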
  • the apparatus further includes: a second acquiring part, configured to obtain, before it is determined whether the first network state of the preset network indicated by the network state information matches the second network state required for retransmitting the second data packet, a currently used bandwidth, a current transmission delay, and a current packet loss rate for characterizing the first network state, and a second preset value for describing the number of consecutive lost packets; a third determining portion, configured to determine a bandwidth threshold according to the bandwidth information of the preset network;
  • a fourth determining part, configured to determine a transmission delay threshold according to the network jitter information of the preset network; and a fifth determining part, configured to determine a packet loss rate threshold according to the historical packet loss rate and the packet loss model.
  • the apparatus further includes: a second determining portion, configured to determine, after the network state information of the preset network is acquired and before the retransmission request is sent to the second client or the sending is cancelled, whether the first network state of the preset network indicated by the network state information matches the second network state required for retransmitting the second data packet; a first determining portion, configured to determine that the network state information satisfies the first preset condition if the first network state matches the second network state; and a second determining part, configured to determine that the network state information does not satisfy the first preset condition if the first network state does not match the second network state.
  • the second determining part includes: a first determining sub-portion, configured to determine whether the difference between the bandwidth threshold and the currently used bandwidth is smaller than a first preset value; a second determining sub-portion, configured to determine whether the current transmission delay is smaller than the transmission delay threshold;
  • a third determining sub-portion, configured to determine whether the current packet loss rate is smaller than the packet loss rate threshold; and a fourth determining sub-portion, configured to determine whether the number of consecutive lost packets is smaller than a second preset value.
  • a preset determination result is used to indicate that the first network state matches the second network state, and the preset determination result includes at least one of the following: it is determined that the difference between the bandwidth threshold and the currently used bandwidth is less than the first preset value; it is determined that the current transmission delay is less than the transmission delay threshold;
  • it is determined that the current packet loss rate is less than the packet loss rate threshold; and it is determined that the number of consecutive lost packets is less than the second preset value.
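  • The four comparisons can be sketched as below. The class and field names are hypothetical, and the comparisons mirror the wording of the text literally. Since the text says the preset determination result includes "at least one of" the listed outcomes, this sketch treats the states as matching when any one comparison holds; a stricter deployment might require all of them.

```python
from dataclasses import dataclass


@dataclass
class NetworkState:
    used_bandwidth: float      # currently used bandwidth
    transmission_delay: float  # current transmission delay
    packet_loss_rate: float    # current packet loss rate
    consecutive_losses: int    # number of consecutive lost packets


@dataclass
class Thresholds:
    bandwidth: float             # bandwidth threshold
    bandwidth_margin: float      # the "first preset value"
    transmission_delay: float    # transmission delay threshold
    packet_loss_rate: float      # packet loss rate threshold
    max_consecutive_losses: int  # the "second preset value"


def state_matches(state: NetworkState, limits: Thresholds) -> bool:
    """True when the first network state matches the state required for retransmission."""
    checks = (
        limits.bandwidth - state.used_bandwidth < limits.bandwidth_margin,
        state.transmission_delay < limits.transmission_delay,
        state.packet_loss_rate < limits.packet_loss_rate,
        state.consecutive_losses < limits.max_consecutive_losses,
    )
    # Assumption: one satisfied comparison is enough ("at least one of").
    return any(checks)
```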
  • the apparatus further includes: a first update part, configured to re-determine, after the retransmission request is sent to the second client or the sending is cancelled, the current bandwidth threshold according to the previously determined bandwidth threshold and the current bandwidth information of the preset network;
  • a second update part, configured to increase the packet loss rate threshold and decrease the transmission delay threshold if a first ratio of the number of received second data packets to the number of retransmission requests sent is less than a third preset value; and a third update part, configured to increase the packet loss rate threshold and decrease the transmission delay threshold if a second ratio of the received valid second data packets to all received second data packets is less than a fourth preset value.
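  • The two ratio-driven updates can be sketched as follows. The direction of each adjustment follows the text; the step sizes, the parameter names, and the choice to apply one adjustment per triggered rule are assumptions made for illustration.

```python
from typing import Tuple


def update_thresholds(loss_rate_threshold: float,
                      delay_threshold: float,
                      requests_sent: int,
                      retransmits_received: int,
                      valid_retransmits: int,
                      min_receive_ratio: float,   # the "third preset value"
                      min_valid_ratio: float,     # the "fourth preset value"
                      loss_step: float,           # assumed adjustment step
                      delay_step: float) -> Tuple[float, float]:
    """Return the adjusted (packet loss rate threshold, transmission delay threshold)."""
    # First ratio: received retransmitted packets / retransmission requests sent.
    if requests_sent > 0 and retransmits_received / requests_sent < min_receive_ratio:
        loss_rate_threshold += loss_step
        delay_threshold -= delay_step

    # Second ratio: valid retransmitted packets / all received retransmitted packets.
    if retransmits_received > 0 and valid_retransmits / retransmits_received < min_valid_ratio:
        loss_rate_threshold += loss_step
        delay_threshold -= delay_step

    return loss_rate_threshold, delay_threshold
```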
  • the apparatus further includes: a sixth determining part, configured to determine, before the retransmission request is sent to the second client, the voice feature of the lost second data packet by performing signal feature analysis on the media information segment in the first data packet;
  • the first executing portion is further configured to send the retransmission request to the second client if the network state information satisfies the preset condition and the voice feature includes at least one of a voiced feature, a voice feature, and a semantic feature.
  • the voice signal can be analyzed, for example by unvoiced/voiced analysis, voice/silence analysis, or semantic importance analysis, to adjust the network parameter thresholds. For example, when the bandwidth is sufficient, a retransmission request can be sent whenever packet loss is detected; when the bandwidth is insufficient, a retransmission request is made only for lost important voice frames, such as voice packets that carry important semantics.
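  • The bandwidth-dependent rule can be sketched as follows. The class and field names are hypothetical, and treating a frame as important when it has any one of the three features is an assumption based on the "at least one of" wording; how the features of a lost frame are estimated from neighbouring received packets is outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class LostFrameFeatures:
    """Features estimated for a lost frame (field names are hypothetical)."""
    voiced: bool                  # voiced rather than unvoiced
    speech: bool                  # voice rather than silence
    semantically_important: bool  # carries important semantics


def should_request_retransmission(features: LostFrameFeatures,
                                  bandwidth_sufficient: bool) -> bool:
    """Retransmit every loss when bandwidth allows, otherwise only important frames."""
    if bandwidth_sufficient:
        return True
    return features.voiced or features.speech or features.semantically_important
```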
  • the apparatus further includes: a receiving part 60, configured to receive the second data packet sent by the second client after the retransmission request is sent to the second client; a first generating part 62, configured to generate second media information according to the first data packet and the second data packet; and the part 64, configured to generate third media information according to the first data packet if the network state information does not satisfy the first preset condition.
  • the restored second media information is the first media information.
  • for example, the restored second media information can recover a complete voice. The third media information lacks the lost voice packets, so its quality is lower than that of the first media information.
  • the apparatus includes: a collecting part configured to collect offline network data, and extract at least one network parameter for characterizing a network feature from the offline network data.
  • a policy determining part configured to construct a network model according to the at least one network parameter, and determine a first de-jittering policy according to the network model, so that the voice quality of Voip can be measured or simulated according to the network model. Optionally, the first de-jittering policy may also be referred to as an initial de-jittering policy.
  • a large number of existing network-related network data are collected through different network types, and the network model is constructed through offline training, and the network model can determine the initial de-jittering strategy.
  • the policy correction part is configured to correct the first de-jittering policy according to characteristic parameters for evaluating the quality of a voice call or video call such as a Voip call (such as historical data of the current call, the signal content of the current call, and the perceptual auditory result of the current call), to obtain a second de-jittering policy.
  • the historical data of the call can reflect the characteristics of the call network; the signal content of the call determines whether the current frame is an important frame, and voice data content that is an important frame needs particular attention.
  • a buffer adjustment part configured to obtain a de-jitter parameter according to the current real-time network condition and the second de-jittering policy, and set, according to the de-jitter parameter, the buffer size for transmitting voice call or video call data such as a Voip call, so that the delay of the voice call or video call such as a Voip call meets expectations and tends to be reasonable.
  • the size of the de-jitter buffer is determined according to the de-jitter parameter obtained by the second de-jittering strategy. Finally, the buffer data is adjusted based on the size of the de-jitter buffer.
  • because the de-jittering algorithm is constructed from multiple parameters, the various complex conditions in the network call environment are fully estimated, and the resulting first de-jittering policy (or initial de-jittering policy) tends to be accurate. Accordingly, the parameters obtained from the initial de-jittering policy, such as the de-jitter parameter, also tend to be accurate.
  • the first de-jittering policy is further corrected according to a characteristic parameter for evaluating the quality of a voice call or video call such as a Voip call, to obtain a second de-jittering policy; a de-jitter parameter is obtained according to the current real-time network condition and the second de-jittering policy, and the buffer size for transmitting voice call or video call data such as a Voip call is set according to the de-jitter parameter, so that the delay of the voice call or video call such as a Voip call meets expectations.
  • through this series of optimizations of the de-jittering policy, the buffer size set in this way tends to be reasonable, so adjusting the buffer size accordingly is a sound basis for improving the quality of network calls.
  • the collection part, the policy determination part, and the policy modification part in the foregoing apparatus are not limited to being located at the transmitting end, the receiving end or the server, and some or all of these parts may be located at the transmitting end, the receiving end or the server.
  • the policy modification part is configured to: acquire historical data of the current call, and correct the first de-jittering policy according to the historical data of the current call.
  • the policy modification part is configured to: acquire the signal content of the current call, and correct the first de-jittering policy according to the signal content of the current call.
  • the policy modification part is configured to: acquire a perceptual auditory result of the current call, and correct the first de-jittering policy according to the perceptual auditory result.
  • the device further includes: a call collection part, configured to collect voice call or video call data such as a Voip call of the current call.
  • the policy modification part is configured to: when triggering the voice call or video call data, such as a Voip call, of the current call, acquire different processing capabilities of the terminal device and/or scheduling characteristics of the application serving as the voice call or video call medium such as the Voip call, and correct the first de-jittering policy according to the different processing capabilities of the terminal device and/or the scheduling characteristics of that application.
  • the device further includes: a call playing portion configured to play a voice call or video call data such as a Voip call of the current call.
  • the policy correction part is configured to: when triggering the voice call or video call data, such as a Voip call, of the current call, acquire different processing capabilities of the terminal device and/or scheduling characteristics of the application serving as the voice call or video call medium such as the Voip call, and correct the first de-jittering policy according to the different processing capabilities of the terminal device and/or the scheduling characteristics of that application.
  • An information processing system of the embodiment of the present application includes a transmitting end (or collecting end) 41, a debounce end 42 and a receiving end (or playing end) 43.
  • the processing logic of the sending end (or the collecting end) includes: collecting offline network data, and extracting at least one network parameter for characterizing the network feature from the offline network data, the at least one network parameter being used to construct a network model for determining a first de-jittering policy in transmitting voice call or video call data such as a Voip call; when collecting voice call or video call data, such as a Voip call, of the current call, acquiring different processing capabilities of the terminal device and/or scheduling characteristics of the application serving as the voice call or video call medium such as the Voip call; and correcting the first de-jittering policy according to the different processing capabilities of the terminal device and/or the scheduling characteristics of that application.
  • the processing logic of the de-jittering end includes: constructing a network model according to the at least one network parameter, and determining a first de-jittering policy according to the network model, the at least one network parameter being a parameter for characterizing the network feature extracted from the collected offline network data; correcting the first de-jittering policy according to a characteristic parameter for evaluating the quality of a voice call or video call such as a Voip call, to obtain a second de-jittering policy; and obtaining a de-jitter parameter according to the current real-time network condition and the second de-jittering policy, and setting, according to the de-jitter parameter, the buffer size for transmitting voice call or video call data such as a Voip call, so that the delay of the voice call or video call such as a Voip call meets expectations and tends to be reasonable.
  • correcting the first de-jittering policy according to a characteristic parameter for evaluating voice call or video call quality includes: acquiring historical data of the current call, and correcting the first de-jittering policy according to the historical data of the current call.
  • correcting the first de-jittering policy according to a characteristic parameter for evaluating voice call or video call quality includes: acquiring the signal content of the current call, and correcting the first de-jittering policy according to the signal content of the current call.
  • correcting the first de-jittering policy according to a characteristic parameter for evaluating voice call or video call quality includes: acquiring a perceptual auditory result of the current call, and correcting the first de-jittering policy according to the perceptual auditory result.
  • the processing logic of the receiving end includes: acquiring a first de-jittering policy determined in transmitting voice call or video call data such as a Voip call, where the first de-jittering policy is obtained from a network model constructed according to at least one network parameter, and the at least one network parameter is a parameter for characterizing the network feature extracted from the collected offline network data; when playing voice call or video call data, such as a Voip call, of the current call, acquiring different processing capabilities of the terminal device and/or scheduling characteristics of the application serving as the voice call or video call medium such as the Voip call; and correcting the first de-jittering policy according to the different processing capabilities of the terminal device and/or the scheduling characteristics of that application.
  • the information processing system described above includes a transmitting end (or collecting end) 41, a debounce end 42 and a receiving end (or playing end) 43.
  • the sending end (or the collecting end) 41 includes: a collecting part 411, configured to collect offline network data, and extract at least one network parameter for characterizing the network feature from the offline network data, where the at least one network parameter is used to construct a network model for determining a first de-jittering policy in transmitting voice call or video call data such as a Voip call; and a call collection portion 412, configured to collect voice call or video call data, such as a Voip call, of the current call.
  • the first de-jittering policy is corrected according to different processing capabilities of the terminal device and/or scheduling characteristics of the application serving as the voice call or video call medium such as the Voip call.
  • the de-jittering end 42 includes: a policy determining portion 421, configured to construct a network model according to the at least one network parameter, and determine a first de-jittering policy according to the network model, where the at least one network parameter is a parameter for characterizing the network feature extracted from the collected offline network data; a second policy modifying portion 422, configured to correct the first de-jittering policy according to a characteristic parameter for evaluating the quality of a voice call or video call such as a Voip call, to obtain a second de-jittering policy; and a buffer adjusting portion 423, configured to obtain a de-jitter parameter according to the current real-time network condition and the second de-jittering policy, and set, according to the de-jitter parameter, the buffer size for transmitting voice call or video call data such as a Voip call, so that the delay of the voice call or video call such as a Voip call meets expectations and tends to be reasonable.
  • the receiving end (or playing end) 43 includes: an obtaining part 431, configured to acquire a first de-jittering policy determined in transmitting voice call or video call data such as a Voip call, where the first de-jittering policy is obtained from a network model constructed according to at least one network parameter, and the at least one network parameter is a parameter for characterizing the network feature extracted from the collected offline network data;
  • a call playing portion 432, configured to acquire, when playing the voice call or video call data, such as a Voip call, of the current call, different processing capabilities of the terminal device and/or scheduling characteristics of the application serving as the voice call or video call medium such as the Voip call;
  • a third policy modification section 433, configured to correct the first de-jittering policy according to the different processing capabilities of the terminal device and/or the scheduling characteristics of the application serving as the voice call or video call medium such as the Voip call.
  • for the processor used for data processing, a microprocessor, a central processing unit (CPU), a digital signal processor (DSP, Digital Signal Processor), or a programmable logic array (FPGA, Field-Programmable Gate Array) may be used when performing processing.
  • the operation instruction may be a computer executable code, and the operation instruction is used to implement the information processing method in the foregoing embodiment of the present application.
  • the embodiment of the present application is as follows:
  • The embodiment of the present application may be a solution for handling end-to-end delay in a voice or video call such as a VoIP call.
  • The end-to-end modules of a voice or video call such as a VoIP call are shown in FIG. 15.
  • The end-to-end delay is the time difference between the moment speaker A speaks and the moment listener B hears the sound.
  • Voice or video call technology such as VoIP transmits data in packet form over an IP network. Because of the inherent characteristics of IP networks, the time each packet takes to cross the network is not fixed; the variation in this transit time is called jitter.
  • The end-to-end delay mainly consists of: the device cache delay (chiefly the buffer delays of sound card capture and sound card playback), the data cache delay introduced by each module of the VoIP application (chiefly the delay generated by the de-jitter module), and the network transmission delay (which cannot be controlled).
  • The embodiment of the present application handles end-to-end delay in real-time calls across every stage from capture to playback, and includes the following:
  • Different delay processing methods and parameters are set according to different processing capabilities of the device, scheduling characteristics of the application thread, and the like.
  • In one approach, the method includes: determining a network jitter parameter that indicates the current network jitter condition; adjusting a delay parameter of the jitter buffer (Jitter Buffer) according to the current network jitter parameter; and delaying the data packets in the Jitter Buffer according to the adjusted delay parameter.
  • In this approach, the parameter indicating the current network jitter is determined first: PktComeThisTime records the number of 10 ms packets arriving at the Jitter Buffer each time; multiple PktComeThisTime values are recorded and their maximum is denoted Pm; a parameter J representing the network jitter is then obtained as a weighted average of a series of Pm values, and the size of the Jitter Buffer is adjusted according to J.
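The PktComeThisTime scheme above can be sketched as follows. This is a minimal illustration, not the method's actual implementation: the text does not say how the weighted average is formed, so an exponentially weighted average is assumed here, and the names `JitterEstimator`, `window`, `alpha`, and `buffer_delay_ms` are hypothetical.

```python
from collections import deque


class JitterEstimator:
    """Tracks burst sizes of 10 ms packets arriving at the jitter buffer."""

    def __init__(self, window=50, alpha=0.1):
        self.window = window          # arrivals per observation window (assumed)
        self.alpha = alpha            # smoothing weight (assumed)
        self.recent = deque(maxlen=window)
        self.j = 0.0                  # smoothed jitter parameter J
        self._count = 0

    def on_arrival(self, pkt_come_this_time):
        """Record one arrival event; J is updated once per full window."""
        self.recent.append(pkt_come_this_time)
        self._count += 1
        if self._count % self.window == 0:
            pm = max(self.recent)     # Pm: largest burst seen in the window
            if self.j == 0.0:
                self.j = float(pm)    # first window initialises J
            else:
                self.j = (1 - self.alpha) * self.j + self.alpha * pm
        return self.j


def buffer_delay_ms(j, frame_ms=10):
    """Map J (counted in 10 ms packets) to a jitter-buffer delay in ms."""
    return int(round(j)) * frame_ms
```

A larger `alpha` makes J follow recent bursts more quickly, at the cost of a less stable buffer size.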
  • In another approach, the receiving end predicts or estimates the network delay dn from historical data and, at the same time, counts the packet loss rate at the receiving end; it then derives the current ideal de-jitter buffer size from the estimated network delay and the measured packet loss rate based on the E-Model, and adjusts the buffered data according to that buffer size.
  • Network estimation: the estimation of network characteristics plays an important guiding role for the de-jitter algorithm.
  • The size of the de-jitter buffer is determined by the network characteristics estimated from the historical data of the current call.
  • Although the methods for estimating network characteristics differ, their common shortcoming is that the parameters they use are relatively simple and do not capture the complexity of the network.
  • Here, the de-jitter algorithm and its parameters are first determined offline from an established network parameter model, which is built by capturing packets offline, extracting parameters that represent network characteristics, and training on a large amount of offline data. The de-jitter algorithm and its parameters are then adjusted according to the historical data of the current call. In modeling the network, both the overall characteristics of the network over the entire call and the burstiness within a period of time are considered. In this way, network characteristics can be estimated more accurately.
  • The de-jitter algorithm is selected according to the content of the signal at the time of adjustment and conventional perceptual auditory evaluation parameters, which allows more flexible processing and gives a better perceived result.
  • The overall scheme is shown in FIG. 18 and includes: determining the lower limit AD_dw and the upper limit AD_up of the current buffer size adjustment according to the current network estimate; then determining how, and by how much, to adjust the current buffer data according to the current buffer data length JB_len, the values of AD_up and AD_dw, the current signal content, and a model of human auditory perception.
  • The capture and playback strategies are adjusted according to device performance so that data is sent, and arrives at the buffer, at a more uniform speed, which lets the de-jitter module work in its best state. The specific implementation is as follows:
  • When JB_len > AD_up × F1: if the current frame is an important frame (such as a voice segment), the current buffer data is compressed; if the current frame is non-important data (such as mute data), the current frame is dropped directly.
  • When JB_len > AD_up × F2 (F1 > F2): if the current frame is an important frame (such as a voice segment), the current buffer data is not processed; if the current frame is non-important data (such as mute data), the current buffer data is compressed.
  • The amplitude of compression is determined by F1 and F2, and each compression is smaller than the data length of the current frame.
  • The basis for this processing is that both compressing the signal and dropping it damage call quality, and dropping a packet directly does more damage than compressing it.
  • With the single-packet compression algorithm, each compression removes less than one frame of data, so compared with dropping the current frame directly, compression shortens the buffered data more slowly; that is, the end-to-end delay falls more slowly. Therefore, frames are dropped directly only when the buffered data is very long and the current data is non-important. If the buffered data is very long and the current data is important, the less damaging method is used to adjust the buffer length.
  • When JB_len < AD_dw × F3: if the current frame is a non-important frame, the current frame is copied directly, and the number of copies is determined by F3; if the current frame is an important frame, the current buffer data is expanded.
  • When JB_len < AD_dw × F4 (F3 < F4): the current buffer data is expanded. The amplitude of each expansion is determined by F3 and F4.
  • In the remaining cases, the buffered data is decoded directly and sent to the sound card device without any de-jitter processing.
  • Whether to expand or compress also depends on the signal content and on the adjustment algorithm in use. For example, the expansion and compression algorithms are based on the pitch period, and music signals are not suited to such algorithms; so if the current signal is detected to be a music signal or another non-speech signal, the adjustment parameters (AD_up, AD_dw, F1 to F4) need to be adjusted accordingly.
  • The compression and expansion adjustments above must also follow the adjustment strategy (for example, a specified maximum number of consecutive expansions or compressions) so that the listener cannot perceive playback speeding up or slowing down.
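The threshold rules above can be sketched as a single decision function. This is an illustration under stated assumptions: the comparisons are read as JB_len > AD_up × F1, JB_len > AD_up × F2 (F1 > F2), JB_len < AD_dw × F3, and JB_len < AD_dw × F4 (F3 < F4); the default factor values and the names `Action` and `choose_action` are hypothetical; and the moderately-short case always expands because the text gives no frame-importance split for it.

```python
from enum import Enum


class Action(Enum):
    DROP_FRAME = "drop"
    COMPRESS = "compress"
    EXPAND = "expand"
    COPY_FRAME = "copy"
    PASS_THROUGH = "pass"


def choose_action(jb_len, ad_up, ad_dw, important,
                  f1=2.0, f2=1.0, f3=0.5, f4=1.0):
    """Pick a de-jitter action for the current frame.

    jb_len, ad_up and ad_dw share one unit (for example milliseconds).
    The factor defaults are placeholders; only F1 > F2 and F3 < F4 are required.
    """
    if not (f1 > f2 and f3 < f4):
        raise ValueError("expected F1 > F2 and F3 < F4")
    if jb_len > ad_up * f1:                 # buffer far too long
        return Action.COMPRESS if important else Action.DROP_FRAME
    if jb_len > ad_up * f2:                 # buffer moderately too long
        return Action.PASS_THROUGH if important else Action.COMPRESS
    if jb_len < ad_dw * f3:                 # buffer far too short
        return Action.EXPAND if important else Action.COPY_FRAME
    if jb_len < ad_dw * f4:                 # buffer moderately too short
        return Action.EXPAND
    return Action.PASS_THROUGH              # decode and play unchanged
```

A caller would additionally cap the number of consecutive compress or expand actions, as the adjustment strategy requires, before applying the returned action.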
  • offline network feature modeling is performed: offline capture of packets, analysis of a large number of existing network data, extraction of parameters, and establishment of different network models.
  • FIG. 19 and FIG. 20 take the "time difference between the arrival of two consecutive packets", extracted from the offline data, as one of the model feature parameters.
  • The values in FIG. 19 fluctuate over a wide range, indicating that the overall network jitter is relatively large.
  • In FIG. 20, large burst jitter occurs more often: there are several instances in the figure where the time difference between the arrival of two consecutive packets exceeds 1000 ms.
  • The traditional jitter value can be calculated with the method in RFC 3550 to indicate the network jitter at the current moment, but this is often not enough: the overall jitter in FIG. 20 is smaller, yet its burst jitter is larger.
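For reference, the RFC 3550 interarrival jitter estimate is a running average with gain 1/16 of the absolute change in transit time between consecutive packets. The sketch below follows that formula; the function name and the use of plain lists of timestamps are choices made for this illustration.

```python
def rfc3550_jitter(send_ts, recv_ts):
    """Return the running interarrival jitter after each packet.

    send_ts and recv_ts hold per-packet timestamps in the same unit.
    For consecutive packets i-1 and i:
        D = (R_i - R_{i-1}) - (S_i - S_{i-1})
        J = J + (|D| - J) / 16
    """
    jitter = 0.0
    history = []
    for i in range(1, len(send_ts)):
        d = (recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
        jitter += (abs(d) - jitter) / 16.0
        history.append(jitter)
    return history
```

Because of the 1/16 gain, a single 1000 ms burst moves the estimate by only 62.5 ms, and the estimate then decays again, which is why this value alone does not distinguish a network with rare, large bursts from one with steady moderate jitter.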
  • Statistics of the "time difference between the arrival of two consecutive packets", such as its cumulative histogram, its variance, its smoothed envelope over the whole call, and the number of bursts, are therefore used to distinguish the two network models of FIG. 19 and FIG. 20.
  • In addition to the "time difference between the arrival of two consecutive packets", the number of consecutive packet losses, the overall packet loss rate, the out-of-order rate, the out-of-order length, and the like can be analyzed as modeling parameters.
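A minimal sketch of extracting some of these statistics from a list of packet arrival times is shown below. The 1000 ms burst threshold is taken from the FIG. 20 description; the bin width, the function name `interarrival_features`, and the returned dictionary layout are assumptions made for illustration, and the smoothed envelope and loss-related parameters are left out.

```python
from statistics import pvariance


def interarrival_features(arrival_ms, burst_threshold_ms=1000.0, bin_ms=50.0):
    """Summarise the gaps between consecutive packet arrivals."""
    gaps = [b - a for a, b in zip(arrival_ms, arrival_ms[1:])]
    if not gaps:
        raise ValueError("need at least two arrivals")
    # Bin edges cover the range up to the largest gap.
    edges = []
    edge = bin_ms
    while edge < max(gaps):
        edges.append(edge)
        edge += bin_ms
    edges.append(edge)
    # Cumulative histogram: fraction of gaps at or below each edge.
    cumulative = [sum(g <= e for g in gaps) / len(gaps) for e in edges]
    return {
        "variance": pvariance(gaps),
        "max_gap": max(gaps),
        "burst_count": sum(g > burst_threshold_ms for g in gaps),
        "cumulative_hist": list(zip(edges, cumulative)),
    }
```

Two traces with similar variance can still differ sharply in `burst_count`, which is the distinction between the FIG. 19 and FIG. 20 network types.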
  • The de-jitter parameters AD_up and AD_dw are initially determined; AD_up and AD_dw are then adjusted according to the historical data of the current call.
  • Adjusting the de-jitter parameters according to signal content: the de-jitter parameters (that is, AD_up, AD_dw, and F1 to F4) are adjusted according to the content of the current signal (music, voice, and so on) and its importance (mute or non-mute, and so on).
  • For a music signal, larger AD_up and AD_dw values should be used under the same network conditions.
  • The general principle is to minimize de-jitter processing on important frames. When the buffer length is greater than AD_up, the adjustment can be slowed slightly and deferred until a non-important frame arrives; when the buffer length is less than AD_dw, the adjustment must be made as soon as possible to avoid stuttering. De-jitter processing is performed only when necessary, so that perceived audio quality is preserved.
  • Adjusting the de-jitter parameters according to auditory perception: when the signal is expanded, compressed, or otherwise adjusted in duration, the adjustment frequency should be controlled so that the listener cannot perceive playback speeding up or slowing down.
  • Because devices differ in processing capability and applications differ in scheduling characteristics, the speed at which packets are sent can be uneven or irregular.
  • The de-jitter module, however, is designed on the assumption that packets are sent at a uniform or regular speed.
  • The uniformity of the sending speed is mainly determined by the sound card capture mode and the thread scheduling characteristics. For example, if the sound card callback drives the application to encode and send, an Android device shows more uneven intervals between two sound card callbacks than an iOS device, and the lower the performance of the machine, the more pronounced this is.
  • the above sub-portions are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the contents disclosed in the above embodiment 1. It should be noted that the foregoing sub-portion may be implemented in a hardware environment as shown in FIG. 1 as part of the device, and may be implemented by software or by hardware, where the hardware environment includes a network environment.
  • the embodiment of the present application further provides a server or a terminal for implementing the foregoing method.
  • FIG. 7 is a structural block diagram of a terminal according to an embodiment of the present application.
  • The terminal may include one or more processors 701 (only one is shown in the figure), a memory 703, and a transmission device 705 (such as the transmitting apparatus in the above embodiment). As shown in FIG. 7, the terminal may further include an input/output device 707.
  • the memory 703 can be configured to store software programs and sub-portions, such as program instructions/sub-portions corresponding to the methods and devices in the embodiments of the present application, and the processor 701 runs the software programs and sub-portions stored in the memory 703, thereby The above methods are implemented by performing various functional applications and data processing.
  • Memory 703 can include high speed random access memory, and can also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • memory 703 can further include memory remotely located relative to processor 701, which can be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the processor 701 can be a central processing unit, a microprocessor, a digital signal processor, an application processor, or a programmable array or the like.
  • the processor 701 can be connected to the memory 703 via an integrated circuit bus or the like.
  • the transmission device 705 described above is configured to receive or transmit data via a network, and can also be used for data transmission between the processor and the memory.
  • Specific examples of the above network may include a wired network and a wireless network.
  • the transmission device 705 includes a Network Interface Controller (NIC) that can be connected to other network devices and routers via a network cable to communicate with the Internet or a local area network.
  • transmission device 705 is a radio frequency (RF) sub-port that is configured to communicate with the Internet wirelessly.
  • the memory 703 is configured to store an application.
  • The processor 701 can invoke, by using the transmission device 705, the application stored in the memory 703 to perform the following steps: determining, based on the first data packet sent by the second client and received by the first client through the preset network, whether packet loss has occurred in the first media information sent by the second client to the first client through the preset network, where the first media information includes the first data packet and is the media information transmitted while the second client conducts an audio call or a video call with the first client; acquiring the network state information of the preset network when packet loss has occurred in the first media information; sending a retransmission request to the second client when the network state information meets the preset condition, where the retransmission request is used to request the second client to retransmit the second data packet lost from the first media information, and the preset condition indicates the network condition that the preset network must reach to retransmit the second data packet; and canceling the sending of the retransmission request to the second client when the network state information does not meet the preset condition.
  • The processor 701 is further configured to perform the following steps: after sending the retransmission request to the second client, receiving the second data packet sent by the second client, and generating second media information according to the first data packet and the second data packet; and, when the network state information does not meet the preset condition, generating third media information according to the first data packet.
  • The processor 701 is further configured to perform the following steps: after acquiring the network state information of the preset network, and before sending the retransmission request to the second client or canceling the sending of the retransmission request, determining whether the first network state of the preset network indicated by the network state information matches the second network state required for retransmitting the second data packet; if the first network state matches the second network state, determining that the network state information meets the preset condition; and if the first network state does not match the second network state, determining that the network state information does not meet the preset condition.
  • The processor 701 is further configured to: determine whether the difference between the bandwidth threshold and the currently used bandwidth is smaller than the first preset value; determine whether the current transmission delay is smaller than the transmission delay threshold; determine whether the current packet loss rate is smaller than the packet loss rate threshold; and determine whether the number of consecutive lost packets is smaller than the second preset value. A preset determination result indicates that the first network state matches the second network state, and the preset determination result includes at least one of the following: the difference between the bandwidth threshold and the currently used bandwidth is determined to be smaller than the first preset value; the current transmission delay is determined to be smaller than the transmission delay threshold; the current packet loss rate is determined to be smaller than the packet loss rate threshold; and the number of consecutive lost packets is determined to be smaller than the second preset value.
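The four judgments above can be sketched as follows. This is a hedged illustration: the comparisons are written exactly as the text words them, the field names, units, and the `required` parameter are hypothetical, and because the text says the preset determination result includes "at least one of" the four outcomes, which judgments must hold is left configurable rather than fixed.

```python
from dataclasses import dataclass


@dataclass
class NetworkState:
    used_bandwidth_kbps: float
    transmission_delay_ms: float
    packet_loss_rate: float
    consecutive_losses: int


@dataclass
class Thresholds:
    bandwidth_kbps: float
    first_preset_kbps: float
    transmission_delay_ms: float
    packet_loss_rate: float
    second_preset_losses: int


def evaluate_checks(state, th):
    """Evaluate the four judgments as the text words them."""
    return {
        "bandwidth": th.bandwidth_kbps - state.used_bandwidth_kbps < th.first_preset_kbps,
        "delay": state.transmission_delay_ms < th.transmission_delay_ms,
        "loss_rate": state.packet_loss_rate < th.packet_loss_rate,
        "consecutive": state.consecutive_losses < th.second_preset_losses,
    }


ALL_CHECKS = ("bandwidth", "delay", "loss_rate", "consecutive")


def meets_preset_condition(state, th, required=ALL_CHECKS):
    """True when every judgment named in `required` holds."""
    results = evaluate_checks(state, th)
    return all(results[name] for name in required)
```

When `meets_preset_condition` returns true, the first network state is treated as matching the second network state and the retransmission request is sent; otherwise the request is canceled.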
  • The embodiment of the present application provides a call method scheme: determining, based on the first data packet sent by the second client and received by the first client through the preset network, whether packet loss has occurred in the first media information sent by the second client to the first client through the preset network, where the first media information includes the first data packet; acquiring the network state information of the preset network when it is determined that packet loss has occurred in the first media information; and sending a retransmission request to the second client when the network state information meets the preset condition.
  • The terminal may be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.
  • FIG. 7 does not limit the structure of the above electronic device.
  • the terminal may also include more or fewer components (such as a network interface, display device, etc.) than shown in FIG. 7, or have a different configuration than that shown in FIG.
  • Embodiments of the present application also provide a computer storage medium.
  • The foregoing storage medium may be used to store computer-executable instructions such as program code for executing the call method.
  • the foregoing storage medium may be located on at least one of the plurality of network devices in the network shown in the foregoing embodiment.
  • the computer storage medium can be a non-transitory storage medium.
  • the storage medium is arranged to store program code for performing the following steps:
  • the first media information includes a first data packet, where the first media information is media information that is transmitted when the second client performs an audio call or a video call with the first client.
  • the storage medium is further configured to store program code for performing the following steps: after transmitting the retransmission request to the second client, receiving the second data packet sent by the second client; according to the first data packet and The second data packet generates second media information; and if the network state information does not satisfy the preset condition, the third media information is generated according to the first data packet.
  • The storage medium is further configured to store program code for performing the following steps: after acquiring the network state information of the preset network, and before sending the retransmission request to the second client or canceling the sending of the retransmission request, determining whether the first network state of the preset network indicated by the network state information matches the second network state required for retransmitting the second data packet; if the first network state matches the second network state, determining that the network state information meets the preset condition; and if the first network state does not match the second network state, determining that the network state information does not meet the preset condition.
  • The storage medium is further configured to store program code for performing the following steps: determining whether the difference between the bandwidth threshold and the currently used bandwidth is smaller than the first preset value; determining whether the current transmission delay is smaller than the transmission delay threshold; determining whether the current packet loss rate is smaller than the packet loss rate threshold; and determining whether the number of consecutive lost packets is smaller than the second preset value. A preset determination result indicates that the first network state matches the second network state, and the preset determination result includes at least one of the following: the difference between the bandwidth threshold and the currently used bandwidth is determined to be smaller than the first preset value; the current transmission delay is determined to be smaller than the transmission delay threshold; the current packet loss rate is determined to be smaller than the packet loss rate threshold; and the number of consecutive lost packets is determined to be smaller than the second preset value.
  • The foregoing storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a mobile hard disk, and a magnetic memory.
  • An embodiment of a call method is also provided below in conjunction with FIG. 8, including:
  • Step S1: Packet loss detection, which may include determining, according to the sequence index information in the packet header, whether any packet has been lost. If no packet loss is detected in step S1, no retransmission request is sent and the subsequent process continues; otherwise, the flow proceeds to step S2.
  • Step S2: Perform network characteristic analysis of the current network status.
  • Network characteristics include, but are not limited to, the bit rate in use, the estimated bandwidth, the packet loss rate, the jitter, the end-to-end transmission delay, and so on.
  • Step S3: Calculate the thresholds of the corresponding network parameters according to the result of the analysis in step S2.
  • The calculation of the thresholds includes, but is not limited to, the following. A bandwidth threshold is determined from the estimated bandwidth: for a given estimated bandwidth, when the bit rate in use exceeds the threshold, sending a retransmission request is not allowed.
  • A transmission delay threshold is determined from the network jitter: for a given jitter, when the transmission delay exceeds the threshold, sending a retransmission request is not allowed, because even if a request were sent at this point, the retransmitted response data would probably arrive too late to be used, and its utilization would be too low.
  • A packet loss rate threshold is determined from the historical packet loss rate and packet loss model analysis, which give the threshold that applies at the current packet loss rate. For example, in a network with insufficient bandwidth or a particularly high packet loss rate, the more data is sent, the more data is lost; sending retransmission requests that add to the network load is then useless or even harmful.
  • Step S4: Adjust the previously determined thresholds of the network parameters according to the utilization rate of retransmission requests.
  • The utilization rate here is one of the aforementioned preset parameters.
  • The ratio of received response data to retransmission requests is calculated. The historical data cached by client B has a length limit. If the transmission delay from client A to client B is too large, then by the time client B receives the retransmission request, the requested packet may already lie outside the cached data; client A's retransmission request then gets no response, and the ratio of received response data to retransmission requests becomes particularly low. Therefore, from this ratio, thresholds of the network parameters that keep the ratio above a certain value can be obtained.
  • After client B receives the retransmission request, it finds the corresponding data in the history cache and resends that data to client A as a response packet.
  • The response data may no longer meet the timing requirements of the real-time call when it arrives at client A, in which case the late packet has to be actively discarded. Response data is then received, but its utilization is too low. If the actual utilization stays low for a long period, the retransmission request frequency also needs to be reduced, that is, the relevant thresholds of the network parameters need to be raised.
  • Step S5: Analyze the signal characteristics of the transmitted data packets, for example unvoiced/voiced analysis, voice/silence analysis, and semantic importance analysis, and combine the result with the thresholds adjusted in step S4. For example, when bandwidth is sufficient, a retransmission request can be made whenever packet loss is detected; when bandwidth is insufficient, retransmission is requested only for important voice frames.
  • Step S6: Request decision. A comprehensive decision is made from the thresholds of the network parameters, the current network condition, the signal characteristics, and so on, to determine whether a retransmission request may be sent when there is packet loss. If retransmission is allowed, a retransmission request is sent; if not, sending the retransmission request is prohibited and the process returns to step S1.
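Steps S1, S4, and S6 above can be sketched as three small functions. This is an illustration under stated assumptions, not the patented procedure: 16-bit wrapping sequence numbers (as used by RTP) are assumed for S1; the S4 adjustment is shown only for the delay threshold, with a hypothetical multiplicative `factor` and a hypothetical `min_utilization`; and all function and parameter names are invented for the example.

```python
def missing_sequence_numbers(last_seq, new_seq, modulus=65536):
    """S1: list sequence numbers skipped between two received packets."""
    gap = (new_seq - last_seq) % modulus
    if gap == 0 or gap > modulus // 2:
        return []          # duplicate or out-of-order packet, not a loss
    return [(last_seq + k) % modulus for k in range(1, gap)]


def tighten_delay_threshold(delay_threshold_ms, responses_used, requests_sent,
                            min_utilization=0.5, factor=0.8):
    """S4: make requests rarer when few retransmitted packets are used."""
    if requests_sent == 0:
        return delay_threshold_ms
    utilization = responses_used / requests_sent
    if utilization < min_utilization:
        return delay_threshold_ms * factor
    return delay_threshold_ms


def allow_retransmission(used_rate_kbps, bandwidth_threshold_kbps,
                         delay_ms, delay_threshold_ms,
                         loss_rate, loss_threshold,
                         frame_is_important, bandwidth_is_ample):
    """S6: combine thresholds, network state and signal importance."""
    if used_rate_kbps > bandwidth_threshold_kbps:
        return False       # bit rate in use already exceeds the threshold
    if delay_ms > delay_threshold_ms:
        return False       # response would arrive too late to be used
    if loss_rate > loss_threshold:
        return False       # extra traffic would only add to the loss
    # S5: with ample bandwidth any loss may be requested,
    # otherwise only important voice frames.
    return bandwidth_is_ample or frame_is_important
```

In a receive loop, each entry returned by `missing_sequence_numbers` would be passed through `allow_retransmission` before a request is queued, and the utilization counters would be updated as responses arrive and are either played or discarded.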
  • the integrated portion of the above embodiment if implemented in the form of a software functional portion and sold or used as a standalone product, may be stored in the above computer readable storage medium.
  • The technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of instructions that cause one or more computer devices (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the disclosed client may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • The division into parts is only a logical functional division. In actual implementation there may be other divisions; for example, multiple parts or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, portion or sub-portion, and may be electrical or otherwise.
  • the parts described as separate parts may or may not be physically separated, and the parts displayed as parts may or may not be physical parts, that is, may be located in one place, or may be distributed to a plurality of network parts. Some or all of the parts may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional part in each embodiment of the present application may be integrated in one processing part, or each part may exist physically separately, or two or more parts may be integrated in one part.
  • the above integrated parts can be implemented in the form of hardware or in the form of software functional parts.
  • In the embodiments of the present application, the network status of the preset network that carries the retransmitted data packets is obtained in order to decide, according to the network condition, whether to request retransmission. This reduces the cases in which a large number of retransmission requests are still sent while the preset network is already congested and cause further congestion, so that the preset network reserves more resources for the transmission of new data and transmission efficiency improves; the solution therefore has a positive industrial effect.
  • In addition, computer-executable instructions such as the corresponding computer programs can be run in the terminal device, so the solution is readily achievable in industry.


Abstract

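The present application discloses a call method and apparatus. The method includes: determining, based on a first data packet sent by a second client and received by a first client through a preset network, whether packet loss has occurred in first media information sent by the second client to the first client through the preset network, where the first media information includes the first data packet; acquiring network state information of the preset network when it is determined that packet loss has occurred in the first media information; sending a retransmission request to the second client when the network state information meets a first preset condition, where the retransmission request is used to request the second client to retransmit a second data packet lost from the first media information; and canceling the sending of the retransmission request to the second client when the network state information does not meet the first preset condition. Embodiments of the present application further disclose a computer storage medium and a terminal.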

Description

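Call method, apparatus, computer storage medium, and terminal
The present application is filed on the basis of three Chinese patent applications with application numbers 201610844042.2, 201610940605.8, and 201610945642.8, and claims priority to those Chinese patent applications, the entire contents of which are incorporated into the present application.
Technical Field
The present application relates to the field of instant messaging, and in particular to a call method, an apparatus, a computer storage medium, and a terminal.
Background
With the development of society, the exchange of information has become increasingly important. To meet the need for timely exchange, instant messaging software such as WeChat and QQ has sprung up. Instant messaging software relies mainly on the Internet, so the quality of the network directly affects its communication quality.
At present, as network devices become widespread, the load on networks keeps growing. When a large number of devices use the network at the same time, the network becomes congested, which affects the network communication of each device, for example the instant messaging of instant messaging software, resulting in poor communication quality between users.
No effective solution has yet been proposed for the problem in the related art of poor instant messaging quality caused by network congestion.
Summary
Embodiments of the present application provide a call method and apparatus, to at least solve the technical problem in the related art of poor instant messaging quality caused by network congestion.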
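According to one aspect of the embodiments of the present application, a call method is provided, including: determining, based on a first data packet sent by a second client and received by a first client through a preset network, whether packet loss has occurred in first media information sent by the second client to the first client through the preset network, where the first media information includes the first data packet whose initial transmission succeeded, and the first media information is the media information transmitted while the second client conducts an audio call or a video call with the first client;
acquiring network state information of the preset network when it is determined that packet loss has occurred in the first media information;
determining a predetermined parameter for requesting the second client to retransmit a second data packet, where the second data packet is a retransmission of a data packet in the first media information whose transmission failed, and the predetermined parameter includes at least one of: a first probability threshold for successful retransmission and a second probability threshold for successful output of the second data packet;
determining, according to the predetermined parameter, a preset condition that the network state information must meet when retransmission is requested, where the preset condition indicates the network condition required for the probability that the preset network successfully retransmits the second data packet to be no less than the first probability threshold, and/or the network condition required for the probability that the successfully retransmitted second data packet can be successfully output to be no less than the second probability threshold;
sending a retransmission request to the second client when the network state information meets the preset condition; and
canceling the sending of the retransmission request to the second client when the network state information does not meet the preset condition.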
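According to another aspect of the embodiments of the present application, a call apparatus is further provided, including:
a first judging portion, configured to determine, based on a first data packet sent by a second client and received by a first client through a preset network, whether packet loss has occurred in first media information sent by the second client to the first client through the preset network, where the first media information includes the first data packet whose initial transmission succeeded, and the first media information is the media information transmitted while the second client conducts an audio call or a video call with the first client;
a parameter determining portion, configured to determine a predetermined parameter for requesting the second client to retransmit a second data packet, where the second data packet is a retransmission of a data packet in the first media information whose transmission failed; the predetermined parameter includes at least one of a retransmission availability parameter and an effective use parameter; the retransmission availability parameter indicates the probability that the second data packet can be successfully retransmitted; and the effective use parameter indicates the probability that the retransmitted second data packet is successfully output;
a condition determining portion, configured to determine, according to the predetermined parameter, a preset condition that the network state information must meet when retransmission is requested, where the preset condition indicates the network condition that the preset network must reach to successfully retransmit the second data packet, and/or the network condition under which the successfully retransmitted second data packet can be successfully output;
a first acquiring portion, configured to acquire network state information of the preset network when it is determined that packet loss has occurred in the first media information;
a first executing portion, configured to send a retransmission request to the second client when the network state information meets the preset condition, where the retransmission request is used to request the second client to retransmit the second data packet lost from the first media information, and the preset condition indicates the network condition that the preset network must reach to retransmit the second data packet; and
a second executing portion, configured to cancel the sending of the retransmission request to the second client when the network state information does not meet the first preset condition.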
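According to yet another aspect of the embodiments of the present application, a computer storage medium is further provided. The computer storage medium stores computer-executable instructions, and the computer-executable instructions are used to execute the foregoing call method.
According to still another aspect of the embodiments of the present application, a terminal is further provided, including:
a network interface, configured to connect to a server through a network;
a memory, configured to store computer-executable instructions; and
a processor, connected to the network interface and the memory respectively, and configured to implement the foregoing call method by executing the computer-executable instructions.
In the technical solution provided in the embodiments of the present application, whether packet loss has occurred in the first media information sent by the second client to the first client through the preset network is determined based on the first data packet sent by the second client and received by the first client through the preset network, where the first media information includes the first data packet and is the media information transmitted while the second client conducts an audio call or a video call with the first client. When it is determined that packet loss has occurred in the first media information, the network state information of the preset network is acquired. When the network state information meets the preset condition, a retransmission request is sent to the second client, where the retransmission request is used to request the second client to retransmit the second data packet lost from the first media information, and the preset condition indicates the network condition that the preset network must reach to retransmit the second data packet. When the network state information does not meet the preset condition, the sending of the retransmission request to the second client is canceled. When network conditions allow, the lost data packets are obtained through retransmission requests, which makes the media information more complete. The embodiments of the present application reduce the further congestion of the preset network that results from continuing to send retransmission requests when the network is already congested, and address the further blocking of media information transmission that occurs when congestion of the preset network is not relieved for a long time. For the network as a whole, congestion is therefore relieved and clients are provided with better media information transmission, which improves overall instant messaging quality.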
附图说明
此处所说明的附图用来提供对本申请的进一步理解,构成本申请的一部分,本申请的示意性实施例及其说明用于解释本申请,并不构成对本申请的不当限定。在附图中:
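Fig. 1 is a schematic diagram of the hardware environment of the call method according to an embodiment of this application;
Fig. 2 is a schematic diagram of a communication message transmission system according to an embodiment of this application;
Fig. 3A is a flowchart of an optional call method according to an embodiment of this application;
Fig. 3B is a flowchart of an optional call method according to an embodiment of this application;
Fig. 4 is a flowchart of an optional call method according to an embodiment of this application;
Fig. 5 is a schematic diagram of an optional call apparatus according to an embodiment of this application;
Fig. 6 is a schematic diagram of an optional call apparatus according to an embodiment of this application;
Fig. 7 is a structural block diagram of a terminal according to an embodiment of this application;
Fig. 8 is a schematic diagram of the hardware environment of the call method provided according to an embodiment of this application;
Fig. 9 is a schematic diagram of the hardware entities that exchange information in an embodiment of this application;
Fig. 10 is a schematic diagram of the implementation flow of a method of this application;
Fig. 11 is a schematic diagram of the implementation flow of another method of an embodiment of this application;
Fig. 12 is a schematic diagram of the implementation flow of yet another method of an embodiment of this application;
Fig. 13 is a schematic diagram of the implementation flow of yet another method of an embodiment of this application;
Fig. 14 is a schematic diagram of the composition of a system architecture of this application;
Fig. 15 is a schematic diagram of the end-to-end modules of a call in the prior art;
Fig. 16 and Fig. 17 are both schematic diagrams of call implementation;
Fig. 18 is a schematic diagram of a scenario to which this application is applied;
Fig. 19 and Fig. 20 are schematic diagrams comparing the results of de-jitter processing after an embodiment of this application is applied;
Fig. 21 is a flowchart of a call state detection method according to an exemplary embodiment;
Fig. 22 is a spectrogram of a mixed signal involved in an embodiment of this application;
Fig. 23 is a schematic diagram of a far-end signal attenuation flow involved in an embodiment of this application;
Fig. 24 is a schematic diagram of a correlation value calculation flow involved in an embodiment of this application;
Fig. 25 is a schematic diagram of a call state detection flow involved in an embodiment of this application;
Fig. 26 is a structural block diagram of a call state detection apparatus according to an embodiment of this application.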
具体实施方式
为了使本技术领域的人员更好地理解本申请方案,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,应当理解,以下所说明的优选实施例仅用于说明和解释本申请,并不用于限定本申请。
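Note that the terms "first", "second" and so on in the description, the claims and the above drawings of this application are used to distinguish similar objects, not necessarily to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be carried out in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion: for example, a process, method, system, product or device that contains a series of steps or parts is not necessarily limited to the steps or parts expressly listed, and may include other steps or parts that are not expressly listed or that are inherent to the process, method, product or device.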
根据本申请实施例,提供了一种通话方法的方法实施例。
可选地,在本实施例中,上述方法可以应用于如图1所示的由服务器102和终端104所构成的硬件环境中。如图1所示,服务器102通过网络与终端104进行连接,上述网络包括但不限于:广域网、城域网或局域网,终端104并不限定于PC、手机、平板电脑等。本申请实施例的方法可以由服务器102来执行,也可以由终端104来执行,还可以是由服务器102和终端104共同执行。其中,终端104执行本申请实施例的方法也可以是由安装在其上的客户端来执行。
实时音视频通话的总体框图如图2所示,客户端B将接收到的声卡采集到的数据进行编码、发送,通过网络传输到客户端A(即通过原有数据流);客户端A对数据(即原有数据流中传输的数据)进行接收、解码,将解码后的数据送到声卡进行播放。客户端A接收数据时,如果发现有丢包现象(通过步骤S31的丢包检测实现),就可以向客户端B发送重传请求(即重传请求数据流中的请求),客户端B接收到重传请求之后,将所需的数据重新发一遍给客户端A,重新发的这一份数据即重传响应数据流中的响应数据。
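This transmission scheme does not consider how different network characteristics affect actual retransmission. On some networks (such as bandwidth-limited networks), packet loss is caused by congestion. Sending a retransmission request on such a network, and having the peer retransmit data in response, adds further load to the network and makes the congestion worse. Retransmitting at that point wastes bandwidth, and the worsening congestion in turn increases packet loss and degrades call quality, creating a vicious circle.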
且该传输方法也未考虑语音实时通话的特性,实时通话对数据到达的时间有严格的要求,而重传是在检测到有丢包之后,再重新发送重传请求、等待对方重新发送响应数据,在这样的情况下,考虑到丢包检测、重传请求的发送以及响应数据的接收所需消耗的时间,这样经过一来一回就需要消耗一定的时间,如果这个时间过大,那么即使响应数据重新传到接收端了,对实时通信来说,也是没用的,这种网络状况下,重传数据的使用率会非常低,甚至根本不起作用。
为了考虑网络特性、语音实时通话特性对于即时通讯的影响,根据本申请实施例,提供了一种通话方法的方法实施例。
图3A是根据本申请实施例的一种可选的通话方法的流程图,如图3A所示,该方法可以包括以下步骤:
步骤S302,基于第一客户端通过预设网络接收到的第二客户端发送的第一数据包,判断第二客户端通过预设网络向第一客户端发送的第一媒体信息是否发生丢包,第一媒体信息包括第一数据包,第一媒体信息是第二客户端与第一客户端进行音频通话或视频通话时传输的媒体信息;这里的第一数据包为所述第一媒体信息中初次传输就成功的数据包,故简称为初传成功数据包。
步骤S304,在判断出第一媒体信息发生丢包的情况下,获取预设网络的网络状态信息;
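Step S306: when the network state information satisfies the preset condition, send a retransmission request to the second client, where the retransmission request asks the second client to retransmit the second data packet lost from the first media information, and the preset condition indicates the network condition that the preset network must reach to retransmit the second data packet;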
步骤S308,在网络状态信息不满足预设条件的情况下,取消向第二客户端发送重传请求。
通过上述步骤S302至步骤S308,在第一媒体信息发生丢包的情况下,根据网络状态信息判断是否发送重传请求,在网络状况较为理想的情况下发送重传请求以获取丢失的数据包,达到使媒体信息更为完整的目的,在网络状况不理想的情况下,不发送重传请求,以避免加剧网络的拥堵状况,可以解决了相关技术中由于网络拥堵造成的即时通讯质量较差的技术问题,进而达到提高即时通讯质量的技术效果。
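The flow of steps S302 to S308 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the class names, field names and returned strings are hypothetical, and the preset condition is reduced to two limits (used bitrate and transmission delay) that the text mentions as reasons for not sending a request.

```python
from dataclasses import dataclass


@dataclass
class NetworkState:
    """Snapshot of the preset network (field names are illustrative)."""
    used_kbps: float   # bitrate the call currently uses
    delay_ms: float    # current transmission delay


@dataclass
class PresetCondition:
    """Hypothetical limits the network state must stay within."""
    max_used_kbps: float
    max_delay_ms: float


def meets_preset_condition(state: NetworkState, cond: PresetCondition) -> bool:
    # Assumption: a request is worthwhile only while the used bitrate and
    # the delay both stay under their limits.
    return (state.used_kbps <= cond.max_used_kbps
            and state.delay_ms < cond.max_delay_ms)


def handle_loss(loss_detected: bool, state: NetworkState,
                cond: PresetCondition) -> str:
    if not loss_detected:                       # S302: nothing is missing
        return "no_action"
    # S304: the network state is consulted only after a loss is detected
    if meets_preset_condition(state, cond):
        return "send_retransmission_request"    # S306
    return "cancel_retransmission_request"      # S308
```

Keeping the condition check in its own function makes it possible to swap in the probability-based conditions of steps S3041 and S3042 without touching the control flow.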
在执行所述步骤S306之前,还需要确定出所述预设条件;在确定所述预设条件时可如图3B所示,包括以下步骤:
步骤S3041:确定请求所述第二客户端重传第二数据包的预定参数,其中,所述第二数据包为所述第一媒体信息中传输失败的数据包的重传数据包;所述预定参数包括:重传成功的第一概率阈值及成功输出所述第二数据包的第二概率阈值的至少其中之一;
步骤S3042:根据所述预定参数,确定请求重传时所述网络状况信息需要满足的预设条件,其中,所述预设条件用于指示所述预设网络成功重传所述第二数据包的概率不小于所述第一概率阈值所需的网络条件,和/或,用于指示成功重传的所述第二数据包能够成功被输出的概率不小于所述第二概率阈值所需的网络条件;
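In some cases the second client keeps the data packets it has sent to the first client in its cache only for a limited time. If the second client has already discarded the first media information, the second data packet cannot be obtained even when the retransmission request does reach the second client. In other cases, when the current network condition is very poor, the retransmission request itself may be lost on its way to the second client, and the loss of the request makes the retransmission fail. In this embodiment, parameters such as the probability of a successful retransmission request are therefore determined first, based on the transmission status of the first media information received so far.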
在本实施例中所述预设参数可为预先协商好的参数,也可以是根据第一客户端和第二客户端当前传输的第一媒体信息的类型动态确定的。例如,传输语音数据包和视频数据包对应的所述第一概率阈值和所述第二概率阈值就可以不同。
在本实施例中可以统计重传请求成功获得重传数据包的概率,只有当概率高于第一概率阈值时,才发送所述重传请求,以请求重传的数据包。
在一些情况下,虽然成功从所述第二客户端请求了重传的数据包,但是重传的第二数据包的输出时间已经过了,故这种重传数据包是没有必要请求的,故在本实施例中还可判断成功重传的所述第二数据包被输出的概率不小于第二概率阈值所需的网络条件。
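The retransmission request is sent only when the current network condition satisfies the first network condition or the second network condition. Compared with sending the retransmission request whenever packet loss occurs, without any restriction on network conditions, this effectively lowers how often retransmission requests are sent, reduces the further congestion that frequent requests cause on a congested network, and keeps as much of the useful bandwidth as possible for useful media information.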
例如,所述根据所述预定参数,确定请求重传时所述网络状况信息需要满足的预设条件,包括以下至少之一:
根据所述第二客户端缓存所述第一媒体信息的缓存时间,确定所述重传请求以不小于所述第一概率阈值在所述缓存时间内达到所述第二客户端所需的第一网络条件;
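determining, according to the output rate of media information in the first client, the second network condition required for the second data packet, after reaching the first client, to be output with a probability not less than the second probability threshold.
The length of time for which the second client caches the first media information may differ between transmission scenarios. In this embodiment, the first network condition needed to keep the probability of successfully retransmitting the second data packet at or above the first probability threshold is calculated from that cache duration, using a retransmission model.
In some cases a data packet whose retransmission was requested is retransmitted successfully, but by the time the first client receives it the moment at which that data had to be output has already passed, so the packet is never output. In this embodiment, the network condition needed for a requested packet to be both retransmitted successfully and actually used is therefore determined as well.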
故在本实施例中,会根据预定参数确定出当前网络状况信息需要满足的网络条件。
可选方式一:所述根据所述预定参数,确定请求重传时所述网络状况信息需要满足的预设条件,包括:
根据所述第二客户端缓存所述第一媒体信息的缓存时间,确定所述重传请求以不小于所述第一概率阈值在所述缓存时间内达到所述第二客户端所需的第一网络条件。
可选方式二:所述根据所述预定参数,确定请求重传时所述网络状况信息需要满足的预设条件,还包括:
根据所述第一客户端中媒体信息的输出速率,确定所述第二数据包达到所述第一客户端后以不小于所述第二概率阈值被输出所需的第二网络条件。
这里的媒体信息的输出速率针对语音可为单位时间内输出的语音数据包的个数,单位时间内输出的语音数据量。针对视频可为单位时间内输出的图像帧的帧数,即对应于帧率等。
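The client above may be a client used for communication, installed on a fixed device such as a computer or on a mobile device. Optionally, the client may be one with strict real-time requirements, that is, an instant messaging client, such as WeChat, QQ or another application that provides instant messaging services. Fixed devices may include desktop computers, smart TVs and the like. Mobile devices may include mobile phones, tablets, wearable devices and the like.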
所述预设网络为:客户端之间通讯用的网络,例如,连接两个客户端的互联网。例如,客户端A位于北京海淀区,客户端B位于北京朝阳区;连接所述客户端A和客户端B的服务器,也在海淀区及朝阳区有部署,则所述预定网络,可包括:连接所述海淀区及朝阳区的网络。总之,这里的预设网络可为所述第一媒体信息传输的网络。所述第一媒体信息可以为动态的多媒体信息,如视频、音频、GIF图片等,也可以为静态信息,如文字信息、静态图片等。
网络状态信息也即用于描述网络通信状态的信息,如网络传输速度、延迟等信息等。上述网络条件指传输第二数据包所需占用的最低网络资源和/或网络所需提供的最次的网络通信状态,如用于限定预设网络重传第二数据包所需达到的最小的网络传输速度、最小延迟等条件。
需要说明的是,在相关技术中只要发生丢包就会发送重传请求,由于此时网络堵塞较为严重,在发送重传请求的同时,无疑加剧了网络的堵塞状况,进而会造成更多的数据包丢失,且由于网络堵塞情况严重,即使收到了响应数据包,响应数据包的有效性也大大降低,起不到提高通讯质量的效果,相反,由于网络堵塞的加重,会造成更多的数据包丢失。而在本申请的技术方案中,在网络状况不理想的情况下,不发送重传请求,以避免加剧网络的拥堵状况,相对于在相关技术中采用的手段,可减少后续丢包现象的发生,进而相对地提高了通讯质量。
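Steps S302 to S308 may be executed by the client that receives the data packets (the first client); that is, the first client initiates the retransmission request to the second client according to its own needs. To reduce the running load of the first client, steps S302 to S308 may instead be executed by the application server to which the client belongs: the server monitors the packets received by the first client and, once packet loss is confirmed, requests the lost packets from the second client according to the network condition. The server here may be the client's server; for example, when the client is an instant messaging application, the server is the instant messaging application server.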
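Based on historical data, this application analyzes the characteristics of the current network and decides whether to send a retransmission request according to the network characteristics and the importance of the received voice data. It also adjusts the retransmission control policy in real time according to how well retransmitted data is used, so that bandwidth utilization and retransmission utilization are both optimized under a variety of network conditions. An optional implementation is shown in Fig. 3A: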
在步骤S302提供的技术方案中,基于第一客户端通过预设网络接收到的第二客户端发送的第一数据包,判断第二客户端通过预设网络向第一客户端发送的第一媒体信息是否发生丢包可以通过如下方式实现:根据第一数据包中的序号索引信息判断第一媒体信息是否发生丢包。
可选地,可以根据序号索引的连续性来确定,如收到了索引为7和9的数据包,那么可以确定索引为8的数据包丢失。另外,在数据包中会标识出对应于某一媒体信息的多个数据包的索引区间,例如,对于即时通讯应用中的一条语音,可以拆为100个数据包进行发送,那么在数据包中可以标识出该语音使用的索引区间为301至400,这样,在任意一个数据包丢失的时候均可以根据收到的数据包来确定。
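The continuity check on sequence indexes can be sketched as below. It is only an illustration: the function names are hypothetical, and index wrap-around and reordering, which a real receiver has to handle, are ignored.

```python
def find_missing(received, first, last):
    """Return the sequence indexes in [first, last] that were not received."""
    seen = set(received)
    return [i for i in range(first, last + 1) if i not in seen]


def gap_between(previous_index, current_index):
    """Indexes lost between two packets received one after the other."""
    return list(range(previous_index + 1, current_index))
```

With the index range of a message known (for example 301 to 400), `find_missing` also reports a loss at the very end of the range, which a pure neighbour comparison such as `gap_between` cannot see.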
在步骤S304提供的技术方案中,在判断出第一媒体信息发生丢包的情况下,获取预设网络的网络状态信息,获取的信息主要包括用于表征第一网络状态的当前使用带宽、当前传输时延、当前丢包率以及用于描述允许连续丢包数量的第二预设值。
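Note that the current used bandwidth above represents the bitrate currently in use. The used bitrate is the bitrate the current call actually uses and includes the send bitrate and the receive bitrate: the send bitrate is the total number of bytes sent divided by the call duration, and the receive bitrate is the total number of bytes received divided by the call duration. For example, if the estimated bandwidth (the bandwidth threshold) is far larger than the send bitrate in use, bandwidth is ample, and sending some extra retransmitted packets puts no pressure on the network. The estimated bandwidth is an estimate of the approximate bandwidth of the link during the current call and changes in real time.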
在判断网络状态信息所指示的预设网络的第一网络状态是否与重传第二数据包所需的第二网络状态匹配之前,根据预设网络的带宽信息确定带宽阈值;根据预设网络的网络抖动信息确定传输时延阈值;根据历史丢包率和丢包模型确定丢包率阈值。
丢包率包括长时丢包率(即通话开始到当前时刻为止的丢包率)、短时丢包率(如5秒内的丢包率,用来指示网络丢包率是否发生突变)、连续丢包个数的累计直方图(用来表征丢包模型,即是均匀丢包的网络类型、还是突发大丢包比较多的网络类型)。
传输时延,是指节点在发送数据时使数据块从节点进入到传输媒体所需的时间,即一个站点从开始发送数据帧到数据帧发送完毕所需要的全部时间(或者是接收站点接收另一站点发送的数据帧的全部时间)。
在步骤S306或S308提供的技术方案中,在获取预设网络的网络状态信息之后、且在向第二客户端发送重传请求或取消向第二客户端发送重传请求之前,判断网络状态信息所指示的预设网络的第一网络状态是否与重传第二数据包所需的第二网络状态匹配;在第一网络状态与第二网络状态匹配的情况下,判断出网络状态信息满足预设条件;在第一网络状态与第二网络状态不匹配的情况下,判断出网络状态信息不满足预设条件。
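Optionally, determining whether the first network state of the preset network indicated by the network state information matches the second network state required to retransmit the second data packet includes at least one of the following: determining whether the difference between the bandwidth threshold and the current used bandwidth is less than a first preset value; determining whether the current transmission delay is less than the transmission delay threshold; determining whether the current packet loss rate is less than the packet loss rate threshold; determining whether the number of consecutively lost packets is less than a second preset value. A preset determination result indicates that the first network state matches the second network state, and the preset determination result includes at least one of the following: the difference between the bandwidth threshold and the current used bandwidth is less than the first preset value; the current transmission delay is less than the transmission delay threshold; the current packet loss rate is less than the packet loss rate threshold; the number of consecutively lost packets is less than the second preset value.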
可选地,在向第二客户端发送重传请求之前,通过对第一数据包中的媒体信息段进行信号特征分析确定丢失的第二数据包的语音特征;在网络状态信息满足预设条件的情况下,向第二客户端发送重传请求包括:在网络状态信息满足预设条件,且语音特征包括浊音特征、语音特征以及语义特征中的至少一个的情况下,向第二客户端发送重传请求。
可选地,可对语音信号进行分析,如清音、浊音分析,语音、静音分析、语义重要性分析等,以调整网络参数阈值,比如,带宽足够时,只要检测到丢包就可以进行重传请求,带宽不够时,只对丢失的重要语音帧(即上述满足浊音特征、语音特征以及语义特征中的一个或多个的语音帧)进行重传请求。如对包括重要语义的语音数据包进行重传。
在一些实施例中,所述方法还包括:
在判断出所述第一媒体信息丢包的情况下,判断丢失的第二数据包的数据内容是否为预定类型;对应地,此时,所述步骤S304包括:
当所述数据内容为预定类型,获取所述预设网络的网络状态信息。
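When the first media information is video information, the video information includes key frames and non-key frames. When the data content of the second data packet is a non-key frame, its loss has little effect on playback of the first media information. In this embodiment, to simplify the terminal's operation and to ease congestion on the preset network, step S304 may be skipped when the data content is not of the predetermined type.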
在步骤S306或S308执行完毕之后,在向第二客户端发送重传请求或取消向第二客户端发送重传请求之后,该方法还包括以下至少之一:根据前一次确定的带宽阈值和预设网络的当前带宽信息重新确定当前的带宽阈值;在接收到的第二数据包的数量与发送的重传请求的数量的第一比值小于第三预设值的情况下,增大丢包率阈值,并减小传输时延阈值;在接收到的有效的第二数据包与接收到的所有第二数据包间的第二比值小于第四预设值的情况下,增大丢包率阈值,并减小传输时延阈值。
上述的有效的第二数据包是指满足实时性要求的数据包,即在丢失后的预设时间内收到的数据包。
需要说明的是,带宽阈值、丢包率阈值、传输时延阈值等阈值可以在初始的时候根据经验设置一个初始值,步骤S302至步骤S308初次执行使用的是各个阈值的初始值,在运行的过程中,可以根据网络情况和实际的反馈情况进行自调整,以达到提高语音通信质量的目的。
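When the packet loss rate threshold and the transmission delay threshold are changed, they are not adjusted by a very large amount at once. Each may be increased or decreased by a percentage (for example 10%) of its current value, which avoids over-adjustment and gives a smooth transition.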
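The stepwise threshold update can be sketched as follows. The function and parameter names are hypothetical, the 10% step is the example value from the text, and treating an empty denominator as "no evidence, no change" is an assumption of this sketch.

```python
def tighten(loss_threshold, delay_threshold_ms, step=0.10):
    """Raise the loss-rate threshold and lower the delay threshold by one step."""
    return loss_threshold * (1 + step), delay_threshold_ms * (1 - step)


def update_thresholds(loss_threshold, delay_threshold_ms,
                      requests_sent, responses_received, responses_usable,
                      third_preset, fourth_preset, step=0.10):
    # First ratio: retransmitted packets received per request sent.
    first_ratio = responses_received / requests_sent if requests_sent else 1.0
    # Second ratio: share of received retransmissions that arrived in time.
    second_ratio = (responses_usable / responses_received
                    if responses_received else 1.0)
    if first_ratio < third_preset or second_ratio < fourth_preset:
        return tighten(loss_threshold, delay_threshold_ms, step)
    return loss_threshold, delay_threshold_ms
```

Because each call moves the thresholds by a fraction of their current value, repeated poor feedback tightens the policy gradually instead of in one jump.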
在步骤S306或S308执行完毕之后,在向第二客户端发送重传请求之后,接收第二客户端发送的第二数据包;根据第一数据包和第二数据包生成第二媒体信息;或在网络状态信息不满足预设条件的情况下,根据第一数据包生成第三媒体信息。
在接收到第一媒体信息的所有数据包的情况下,即接收到每一个丢失的第二数据包的情况下,恢复生成的第二媒体信息即第一媒体信息,即可以恢复得到一段完整的语音;由于出现了语音缺失,即出现了丢包,第三媒体信息相较于第一媒体信息,质量会相对较低。
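In the above embodiments, to describe the retransmission mechanism more clearly, the retransmission control flow shown in Fig. 2 mainly includes: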
步骤S31,丢包检测,根据接收到的数据包的包头信息中的序号索引信息,判断是否有丢包,例如,当前数据包的序号索引为25,而前一数据包的序号索引为24,由于两个数据包的序号索引是连续的,根据序号索引可知没有发生丢包,若前一数据包的序号索引为22,由于两个数据包的序号索引不是连续的,根据序号索引可知发生丢包,且丢包的数量为2(即丢失数据包的序号索引为23和24)。
步骤S32,请求控制,如果在步骤S31中检测到有丢包发生,则向对方(如客户端B)发送重传请求。
步骤S33,响应控制,根据接收到的重传请求信息,在历史缓存数据中,确定将哪些数据进行重传。确定的依据包括:重传数据与已发送数据的长度间隔,所需重传数据的重要等级。
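In this flow, the request control of step S32 sends a retransmission request whenever packet loss is detected, and sending retransmission requests also consumes bandwidth. On some networks the extra bandwidth consumption may aggravate congestion and degrade call quality further, or, because of the real-time nature of a call, the retransmitted data may be used too rarely; in those cases the retransmission requests sent in step S32 are unnecessary. The response control of step S33 likewise takes no account of network characteristics or of how well retransmissions are used. In this retransmission control method, therefore, neither the utilization of retransmitted data nor the utilization of bandwidth is controlled according to the characteristics of the network.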
下面结合图4进一步地详述本申请的技术方案,如图4所示:
步骤S401,丢包检测,根据包头信息中的序号索引信息,判断是否有丢包。
步骤S402,丢包判决,即判断是否发生丢包,如果步骤S401中检测到没有丢包,就执行步骤S409,如果检测到发生丢包,则执行步骤S403。
步骤S403,进行网络特性分析,网络特性包括但不限于:使用码率、估计带宽、丢包率、网络抖动、端到端的传输时延等。
上述的使用码率指的是当前通话实际使用的码率,包括发送码率和接收码率,发送码率是发送的总字节数除以通话时长,接收码率是接收的总字节数除以通话时长,例如,估计的带宽是512kbps,当前使用的发送码率是100kbps,那么就说明带宽很充足,多发一些重传包也没关系,不会对网络造成压力。
估计带宽,估计的是当前通话时链路的大概带宽情况,是一个实时变化的值。
丢包率包括长时丢包率(即通话开始到当前时刻为止的丢包率)、短时丢包率(如5秒内的丢包率,用来指示网络丢包率是否发生突变)、连续丢包个数的累计直方图(用来表征丢包模型,即是均匀丢包的网络类型、还是突发大丢包比较多的网络类型)。
网络抖动,是QOS(Quality Of Service,服务质量)中的概念,是指分组延迟的变化程度,如果网络发生拥塞,排队延迟将影响端到端的延迟,并导致通过同一连接传输的分组延迟各不相同,而抖动就是用来描述这样的延迟变化的程度。
步骤S404,根据步骤S403中分析的结果,计算相应的网络参数阈值。
(1)确定带宽阈值,根据估计的带宽,当使用码率(即当前使用带宽,如接收码率、发送码率)大于一定阈值时,就不允许发送ARQ请求(即重传请求)。
(2)确定传输时延阈值,根据网络抖动,确定传输时延阈值;在一定的抖动下,当传输时延大于某个阈值的时候,就不允许发送ARQ请求,因为这时候即使发送了ARQ请求,重传过来的响应数据也可能用不上,导致利用率太低。
(3)丢包率阈值,根据历史丢包率大小和丢包模型的分析确定当前丢包率下的阈值。比如在某些带宽不够的网络下、或者丢包率特别大的网络下,发送的数据越多,丢的数据也越多,这时候再发送ARQ请求就会增加网络负担,也即发送ARQ请求也是无用或者有害的。
例如,假设估计带宽是512kbps,而当前使用码率是100kbps,那么说明带宽比较充足,检测到有丢包就可以发送重传请求;假设估计带宽是512kbps,使用码率是450kbps,说明剩余带宽不是很充足,这时候,只有丢包率大于15%,且连续丢包个数的累计直方图显示连续丢多个(如4个)以上包的比例比较大的时候,才发重传请求。之所以这样考虑,是因为丢包率比较低的时候,虽然听觉上通话质量会有所下降,但还是不影响语义的理解;而丢包率大到一定程度时,就会影响语义的接收。带宽不太够的时候,为了避免多发的重传包对网络造成冲击,只有当丢包率达到一定大的时候才发重传请求。
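The two cases in this example can be sketched as a small decision function. The 15% loss rate and the run length of 4 come from the text; the cut-off for "ample" bandwidth (`ample_ratio`) and for a "large" share of long loss runs (`burst_floor`) are not given there and are hypothetical values chosen for illustration.

```python
def burst_share(run_lengths, min_run=4):
    """Share of loss events in which at least min_run packets were lost in a row."""
    if not run_lengths:
        return 0.0
    return sum(1 for r in run_lengths if r >= min_run) / len(run_lengths)


def should_request(estimated_kbps, used_kbps, loss_rate, run_lengths,
                   ample_ratio=0.5, loss_floor=0.15, burst_floor=0.3):
    # Ample bandwidth: any detected loss may trigger a retransmission request.
    if used_kbps <= ample_ratio * estimated_kbps:
        return True
    # Tight bandwidth: request only when loss is heavy and bursty.
    return loss_rate > loss_floor and burst_share(run_lengths) >= burst_floor
```

Here `run_lengths` stands in for the cumulative histogram of consecutive losses: each entry is the length of one run of consecutively lost packets.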
步骤S405,统计重传请求的相关利用率。
(1)计算接收到的响应数据的数量与ARQ请求的数量之间的第一比值,客户端B缓存的历史数据是有一定的长度限制的,如果客户端A到客户端B的传输时延太大,客户端B收到的ARQ请求中携带的请求包数据信息已经在缓存数据之外,那么就不会对客户端A的ARQ请求进行响应,这时候计算出来的第一比值的数值就会特别低。为了避免客户端A发送太多ARQ请求而造成带宽浪费,需要降低ARQ请求的发送频率,即提高网络参数的相关阈值;
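(2) Calculate the actual utilization of the response data. After client B receives an ARQ request, it finds the corresponding data in its historical cache and sends it back to client A as a response packet. If the transmission delay from client B to client A is too large, the response data no longer meets the timing requirements of a real-time call when it reaches client A; it becomes a late packet that has to be discarded. Response data exists, but its utilization is too low. If the actual utilization stays low over a period of time, the ARQ request frequency also has to be lowered, that is, the relevant network parameter thresholds have to be raised.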
步骤S406,更新阈值。
由于网络带宽和传输时延等都是估计值,即使按估计带宽、使用码率、丢包率、传输时延等参数做了合理的控制,实际效果可能还是达不到理想效果,比如,可能带宽估计不够准确,增加了重传包之后,网络拥塞使得传输时延变大,发送了很多重传请求,但是收到的重传响应包很少,这时接收到的响应数据的数量与ARQ请求的数量之间的比值就会很低,比如发送了1000个重传请求,只收到了一个ARQ响应包,这个时候就要减少重传请求的频率。减少时不是一下减少很多,而是通过一步一步增加相关的网络参数来实现的,比如原来是丢包率大于10%、传输时延小于200ms时允许发送ARQ请求,现在提高门槛,只有丢包率大于20%、传输时延小于150ms时才允许发送ARQ请求。
步骤S407,信号特性分析。
对信号进行分析,如清音、浊音分析,语音、静音分析、语义重要性分析等,以调整步骤S406中的网络参数阈值,比如,带宽足够时,只要检测到丢包就可以进行重传请求,带宽不够时,只对丢失的重要语音帧进行重传请求。
比如,带宽估计是512kbps,使用码率是100kbps,说明带宽很充足,那么只要有丢包就可以发送重传请求;假设带宽估计是512kbps,而使用带宽是460kbps,说明带宽已经不是很充裕,那么只有发现丢失的包是重要信息的时候,才发送重传请求。
步骤S408,请求的综合判决。
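In the combined decision, more retransmission requests can be sent when bandwidth is ample; when bandwidth is not ample, a retransmission request is sent only when important information has been lost.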
步骤S409,不允许发送ARQ请求。
步骤S410,允许发送ARQ请求。
通过上述实施例,可使重传请求的发送自适应不同的网络特性,使得在各种网络环境下,带宽利用率和重传效率都达到最优。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和子部分并不一定是本申请所必须的。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到根据上述实施例的方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,或者网络设备等)执行本申请各个实施例所述的方法。
本申请还提供另一个实施例;所述通话方法还包括:
根据当前的所述网络状况信息和第二去抖动策略得到去抖动参数,根据所述去抖动参数设置用于传输所述音频通话或视频通话的通话数据的缓冲区容量,使所述音频通话或视频通话的时延符合预期。
在本申请实施例中通过第二去抖动策略可以获得去抖动参数,可以通过对语音通话或视频通话中的去抖动处理,提升通话质量。
可选地,所述方法还包括:采集离线网络数据,从所述离线网络数据中提取出用于表征网络特征的至少一个网络参数;根据所述至少一个网络参数构建网络模型,根据所述网络模型确定第一去抖动策略;根据用于评估音频通话或视频通话的通话质量的特征参数对所述第一去抖动策略进行修正,得到第二去抖动策略。
对所述第一去抖动策略进行修正的方式有多种,以下提供几种可选方式。
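Option 1:
Correcting the first de-jitter policy according to the characteristic parameter used to evaluate the call quality of the audio call or video call includes:
obtaining the historical data of the current audio call or video call;
correcting the first de-jitter policy according to the historical data of the current audio call or video call.
Option 2:
Correcting the first de-jitter policy according to the characteristic parameter used to evaluate the call quality of the audio call or video call includes:
obtaining the signal content of the current audio call or video call;
correcting the first de-jitter policy according to the signal content of the current audio call or video call.
Option 3:
Correcting the first de-jitter policy according to the characteristic parameter used to evaluate the call quality of the audio call or video call includes:
obtaining the perceptual listening result of the current audio call or video call;
correcting the first de-jitter policy according to the perceptual listening result.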
在一些实施例中,所述方法还包括:
采集本次音频通话或视频通话的通话数据时,获取终端设备的不同处理能力和/或作为通话媒介的应用的调度特性;
根据所述终端设备的不同处理能力和/或作为所述通话媒介的应用的调度特性对所述第一去抖动策略进行修正。
此外,所述方法还包括:
播放本次音频通话或视频通话的通话数据时,获取终端设备的不同处理能力和/或作为所述通话媒介的应用的调度特性;
根据所述终端设备的不同处理能力和/或作为所述通话媒介的应用的调度特性对所述第一去抖动策略进行修正。
本申请实施例的客户端对应于智能终端(如移动终端)可以以各种形式来实施。例如,本申请实施例中描述的移动终端可以包括诸如移动电话、智能电话、笔记本电脑、数字广播接收器、个人数字助理(PDA,Personal Digital Assistant)、平板电脑(PAD)、便携式多媒体播放器(PMP,Portable Media Player)、导航装置等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。下面,假设终端是移动终端。然而,本领域技术人员将理解的是,除了特别用于移动目的的元件之外,根据本申请的实施方式的构造也能够应用于固定类型的终端。
图9为本申请实施例中进行信息交互的各方硬件实体的示意图,图9中包括:终端设备1,服务器2、终端设备3。其中,终端设备1称为发送端设备,由终端设备11-14构成;终端设备3称为接收端设备,由终端设备31-35构成;服务器2用于执行去抖动的处理逻辑。终端设备通过有线网络或者无线网络与服务器进行信息交互。终端设备包括手机、台式机、PC机、一体机等类型。采用本申请实施例,终端设备1经由服务器2和终端设备3进行信息传输和交互。本申请的通话可为语音通话或视频通话。可选地,以Voip网络通话为例,终端设备11-14在本次Voip网络通话中,发送网络 数据,网络数据通过服务器2进行去抖动处理后,交由终端设备31-35进行播放,完成本次Voip网络通话。由于现有技术中采用单一的参数来构建去抖动策略并不精准,从而导致影响到Voip网络通话的通话质量,本申请实施例采用现网的离线网络数据,并从离线网络数据中提取出用于表征网络特征的至少一个网络参数,根据所述至少一个网络参数构建网络模型,使得根据所述网络模型确定的第一去抖动策略(或称去抖动策略)趋于精准。可选地,对执行去抖动处理的服务器2中的处理逻辑10包括:S1、采集离线网络数据,从所述离线网络数据中提取出用于表征网络特征的至少一个网络参数;S2、根据所述至少一个网络参数构建网络模型,根据所述网络模型确定第一去抖动策略;S3、根据用于评估如Voip通话的语音通话或视频通话质量的特征参数对所述第一去抖动策略进行修正,得到第二去抖动策略;S4、根据当前实时的网络情况和所述第二去抖动策略得到去抖动参数,根据所述去抖动参数设置用于传输如Voip通话的语音通话或视频通话数据的缓冲区大小,使如Voip通话的语音通话或视频通话的时延符合预期。
上述图9的例子只是实现本申请实施例的一个系统架构实例,本申请实施例并不限于上述图9所述的系统结构,基于上述图9所述的系统架构,提出本申请方法各个实施例。
本申请实施例的一种信息处理方法,如图10所示,所述方法包括:采集离线网络数据,从所述离线网络数据中提取出用于表征网络特征的至少一个网络参数,根据所述至少一个网络参数构建网络模型,以根据所述网络模型衡量或模拟Voip的通话质量,根据所述网络模型确定第一去抖动策略(101)。可选地,第一去抖动策略也可以称为初始去抖动策略,在实际应用中,通过不同的网络类型来收集大量现网相关网络数据,经离线训练构建得到该网络模型,该网络模型除了可以确定初始去抖动策略,由于基于该初始去抖动策略输出的相关参数包括去抖动参数和时延参数等,也可 以说,根据所述网络模型确定初始去抖动策略和相关参数,相关参数包括去抖动参数和时延参数。根据用于评估如Voip通话的语音通话或视频通话质量的特征参数(如本次通话的历史数据、本次通话的信号内容、本次通话的感知听觉结果等)对所述第一去抖动策略进行修正,得到第二去抖动策略(1021)。其中,就本次通话的历史数据而言,它可以反映本次通话网络特性;就本次通话的信号内容而言,它决定了当前帧是否为重要帧,语音数据内容为重要帧,需要重点关注,而静音数据内容无需重点关注,对不同内容,去抖动的处理是不同的;就感知听觉结果而言,不同的感知听觉结果对去抖动调整的方式和幅度是不同的。根据当前实时的网络情况和所述第二去抖动策略得到去抖动参数,根据所述去抖动参数设置用于传输如Voip通话的语音通话或视频通话数据的缓冲区大小,使如Voip通话的语音通话或视频通话的时延符合预期,趋于合理(103)。在实际应用中,根据由第二去抖动策略得到的去抖动参数确定去抖动缓冲区的大小,最后,基于该去抖动缓冲区的大小对缓存区数据进行调整。
采用本申请实施例,采集离线网络数据,从所述离线网络数据中提取出用于表征网络特征的至少一个网络参数;根据所述至少一个网络参数构建网络模型,根据所述网络模型确定第一去抖动策略,由于采用多个参数构建去抖动算法,从而对网络通话环境中的各种复杂情况进行了充分估计,得到的第一去抖动策略(或称初始去抖动策略)是趋于精准的,据此初始去抖动策略得到的相关参数,如去抖动参数等也趋于精准。为了进一步提高精准度,还根据用于评估如Voip通话的语音通话或视频通话质量的特征参数对所述第一去抖动策略进行修正,得到第二去抖动策略;根据当前实时的网络情况和所述第二去抖动策略得到去抖动参数,根据所述去抖动参数设置用于传输如Voip通话的语音通话或视频通话数据的缓冲区大小,使如Voip通话的语音通话或视频通话的时延符合预期,通过一系列去抖动策 略的优化,使得据此设置的缓冲区大小趋于合理,则据该缓冲区大小对网络通话质量进行改善具备可参考性,提高了网络通话质量。
这里需要指出的是,上述方法处理逻辑中的采集、策略确定、策略修正等逻辑不限定是位于发送端、接收端或服务器中,这些逻辑的部分或全部可以位于发送端、接收端或服务器。
本申请实施例的一种信息处理方法,如图11所示,所述方法包括:采集离线网络数据,从所述离线网络数据中提取出用于表征网络特征的至少一个网络参数,根据所述至少一个网络参数构建网络模型,以根据所述网络模型衡量或模拟Voip的通话质量,根据所述网络模型确定第一去抖动策略(201)。可选地,第一去抖动策略也可以称为初始去抖动策略,在实际应用中,通过不同的网络类型来收集大量现网相关网络数据,经离线训练构建得到该网络模型,该网络模型除了可以确定初始去抖动策略,由于基于该初始去抖动策略输出的相关参数包括去抖动参数和时延参数等,也可以说,根据所述网络模型确定初始去抖动策略和相关参数,相关参数包括去抖动参数和时延参数。获取本次通话的历史数据,将本次通话的历史数据作为用于评估如Voip通话的语音通话或视频通话质量的特征参数,根据所述本次通话的历史数据对所述第一去抖动策略进行修正,得到第二去抖动策略(202)。其中,就本次通话的历史数据而言,它可以反映本次通话网络特性,在单次通话中,根据本次通话的历史数据,可以调整所述第一去抖动策略中的网络参数设置,如去抖动参数和时延处理参数。根据当前实时的网络情况和所述第二去抖动策略得到去抖动参数,根据所述去抖动参数设置用于传输如Voip通话的语音通话或视频通话数据的缓冲区大小,使如Voip通话的语音通话或视频通话的时延符合预期,趋于合理(203)。在实际应用中,根据由第二去抖动策略得到的去抖动参数确定去抖动缓冲区的大小,最后,基于该去抖动缓冲区的大小对缓存区数据进行调整。
本申请实施例的一种信息处理方法,如图12所示,所述方法包括:采集离线网络数据,从所述离线网络数据中提取出用于表征网络特征的至少一个网络参数,根据所述至少一个网络参数构建网络模型,以根据所述网络模型衡量或模拟Voip的通话质量,根据所述网络模型确定第一去抖动策略(301)。可选地,第一去抖动策略也可以称为初始去抖动策略,在实际应用中,通过不同的网络类型来收集大量现网相关网络数据,经离线训练构建得到该网络模型,该网络模型除了可以确定初始去抖动策略,由于基于该初始去抖动策略输出的相关参数包括去抖动参数和时延参数等,也可以说,根据所述网络模型确定初始去抖动策略和相关参数,相关参数包括去抖动参数和时延参数。获取本次通话的信号内容,将本次通话的信号内容作为用于评估如Voip通话的语音通话或视频通话质量的特征参数,根据所述本次通话的信号内容对所述第一去抖动策略进行修正,得到第二去抖动策略(302)。其中,就本次通话的信号内容而言,它决定了当前帧是否为重要帧,语音数据内容为重要帧,需要重点关注,而静音数据内容无需重点关注,对不同内容,去抖动的处理是不同的,在单次通话中,可以调整所述第一去抖动策略中的网络参数设置,如去抖动参数和时延处理参数。当然,也可以在用于评估如Voip通话的语音通话或视频通话质量的特征参数(如本次通话的历史数据、本次通话的感知听觉结果等)对所述第一去抖动策略进行修正后,再根据所述本次通话的信号内容对所述修正后的去抖动策略进行再次修正,以提高去抖动策略的精度。根据当前实时的网络情况和所述第二去抖动策略得到去抖动参数,根据所述去抖动参数设置用于传输如Voip通话的语音通话或视频通话数据的缓冲区大小,使如Voip通话的语音通话或视频通话的时延符合预期,趋于合理(303)。在实际应用中,根据由第二去抖动策略得到的去抖动参数确定去抖动缓冲区的大小,最后,基于该去抖动缓冲区的大小对缓存区数据进行调整。
本申请实施例的一种信息处理方法,如图13所示,所述方法包括:采集离线网络数据,从所述离线网络数据中提取出用于表征网络特征的至少一个网络参数,根据所述至少一个网络参数构建网络模型,以根据所述网络模型衡量或模拟Voip的通话质量,根据所述网络模型确定第一去抖动策略(401)。可选地,第一去抖动策略也可以称为初始去抖动策略,在实际应用中,通过不同的网络类型来收集大量现网相关网络数据,经离线训练构建得到该网络模型,该网络模型除了可以确定初始去抖动策略,由于基于该初始去抖动策略输出的相关参数包括去抖动参数和时延参数等,也可以说,根据所述网络模型确定初始去抖动策略和相关参数,相关参数包括去抖动参数和时延参数。获取本次通话的感知听觉结果,也可以称为传统的感知听觉评价参数,将本次通话的感知听觉结果作为用于评估如Voip通话的语音通话或视频通话质量的特征参数,根据所述本次通话的感知听觉结果对所述第一去抖动策略进行修正,得到第二去抖动策略(402)。其中,就感知听觉结果而言,不同的感知听觉结果对去抖动调整的方式和幅度是不同的,在单次通话中,可以调整所述第一去抖动策略中的网络参数设置,如去抖动参数和时延处理参数。当然,也可以在用于评估如Voip通话的语音通话或视频通话质量的特征参数(如本次通话的历史数据、本次通话的信号内容等)对所述第一去抖动策略进行修正后,再根据所述本次通话的信号内容对所述修正后的去抖动策略进行再次修正,以提高去抖动策略的精度。根据当前实时的网络情况和所述第二去抖动策略得到去抖动参数,根据所述去抖动参数设置用于传输如Voip通话的语音通话或视频通话数据的缓冲区大小,使如Voip通话的语音通话或视频通话的时延符合预期,趋于合理(403)。在实际应用中,根据由第二去抖动策略得到的去抖动参数确定去抖动缓冲区的大小,最后,基于该去抖动缓冲区的大小对缓存区数据进行调整。
在实际应用中,除了在去抖动端的处理,在整个Voip网络通话中,还可以在发送端和接收端(或称播放端),分别根据设备的不同处理能力、应用程序线程的调度特性等设置不同的时延处理方法和参数,以便对第一去抖动策略继续进行修正,以提高去抖动策略的精度,如以下实施例所示。
就整个Voip网络通话中的发送端而言,本申请实施例的一种信息处理方法中,采集本次通话的如Voip通话的语音通话或视频通话数据时,获取终端设备的不同处理能力和/或作为所述如Voip通话的语音通话或视频通话媒介的应用的调度特性,根据所述终端设备的不同处理能力和/或作为所述如Voip通话的语音通话或视频通话媒介的应用的调度特性对所述第一去抖动策略进行修正。
就整个Voip网络通话中的接收端(或称播放端)而言,本申请实施例的一种信息处理方法中,播放本次通话的如Voip通话的语音通话或视频通话数据时,获取终端设备的不同处理能力和/或作为所述如Voip通话的语音通话或视频通话媒介的应用的调度特性,根据所述终端设备的不同处理能力和/或作为所述如Voip通话的语音通话或视频通话媒介的应用的调度特性对所述第一去抖动策略进行修正。
采用上述各个实施例,在实际应用中,可以先通过离线抓包,提取相应的参数表征网络特性,通过大量的离线训练,建立不同的网络模型参数,根据建立的网络参数模型决定初始的去抖动算法和相关参数,然后,根据当次通话的历史数据,对去抖动策略和相关参数进行调整。由于,在网络模型的建模上,考虑了整个通话过程中网络的总体特性,也考虑一段时间内的突发性,因此,能更精确的估计网络特性。
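As for the de-jitter policy, taking the system architecture shown in Fig. 9 as an example, server 2 keeps its de-jitter policy in the best working state when it performs de-jitter processing. Here JB_len is the buffer size, AD_up is the upper limit of the buffer, AD_dw is the lower limit of the buffer, and F1 to F4 are empirical values of the adjustment parameters. The details are as follows: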
一,JB_len>AD_up的情况下:
当JB_len>AD_up×F1时,如果当前帧信号内容是重要帧(如语音段),则对当前缓冲区数据进行压缩处理;如果当前帧为非重要数据(如静音数据),则直接将当前帧丢掉。当JB_len>AD_up×F2时(F1>F2),如果当前帧信号内容是重要帧(如语音段),则不对当前缓冲区数据进行任何处理;如果当前帧为非重要数据(如静音数据),则对当前缓冲区数据进行压缩处理。
压缩的幅度根据F1、F2的大小决定,每次压缩的幅度小于当前帧的数据长度。
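The reasoning is that both compressing the signal and dropping it outright damage call quality, and dropping a packet does more damage than compression. With a compression algorithm that works on a single packet, each compression removes less than one frame of data, so compression shortens the buffered data more slowly than dropping the current frame; in other words, it reduces end-to-end delay more slowly. Frames are therefore dropped outright only when the buffered data is very long and the current data is unimportant. When the buffered data is very long and the current data is important, the less damaging method, compression, is used to adjust the buffer length. When the buffered data exceeds a certain threshold but is not very long, and the current frame is important, nothing is done, which protects the call quality of speech segments as far as possible. The extra delay can be dealt with quickly once a silent segment arrives, so that end-to-end delay is reduced while the perceived call quality is protected as far as possible.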
二,当JB_len<AD_dw时:
当JB_len<AD_dw×F3时,如果当前帧是非重要帧,直接对当前帧进行重复拷贝,拷贝次数根据F3的大小决定;如果当前帧是重要帧,则对当前缓存区数据进行扩展处理。当JB_len<AD_dw×F4时(F3<F4),对当前缓冲区进行扩展处理。每次扩展的幅度,根据F3和F4的大小决定。
这样处理的依据,是因为虽然扩展或者直接拷贝数据,对声音也是一种损伤,但是相比因为缓冲区数据为空而造成的声音卡顿来说,这种损伤对通话体验的影响要小很多,所以当发现缓冲区数据长度小于调整下限时,原则上是快速响应、尽快调整缓冲区数据大小。
三,当AD_up>=JB_len>=AD_dw时:
这时,直接将缓冲区的数据解码后送入声卡设备,不做任何去抖动处理。
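The three cases above can be condensed into one decision function. This is a sketch under stated assumptions: the returned action names are hypothetical, and the ranges the text leaves open (JB_len above AD_up but not above AD_up×F2, or below AD_dw but not below AD_dw×F4) are treated here as "no adjustment".

```python
def jitter_action(jb_len, ad_up, ad_dw, f1, f2, f3, f4, important):
    """Pick a de-jitter action for the current frame (assumes f1 > f2, f3 < f4)."""
    if jb_len > ad_up:                       # case one: buffer too long
        if jb_len > ad_up * f1:
            return "compress" if important else "drop_frame"
        if jb_len > ad_up * f2:
            return "none" if important else "compress"
        return "none"                        # range not specified in the text
    if jb_len < ad_dw:                       # case two: buffer too short
        if jb_len < ad_dw * f3:
            return "expand" if important else "duplicate_frame"
        if jb_len < ad_dw * f4:
            return "expand"
        return "none"                        # range not specified in the text
    return "none"                            # case three: decode and play as is
```

For example, with AD_up = 200, AD_dw = 60, F1 = 1.5, F2 = 1.0, F3 = 0.5 and F4 = 1.0 (made-up values), a buffer of 350 holding a silent frame leads to `drop_frame`, while the same buffer holding a speech frame leads to `compress`.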
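In the policy adjustments of parts one and two above, whether expansion or compression is used also depends on the signal content and on the adjustment algorithm in use. For example, the expansion and compression algorithms are based on the pitch period, and music signals are not suited to them, so when the current signal is detected to be music instead of speech, the adjustment parameters (AD_up, AD_dw, F1 to F4) need to be adjusted accordingly. Also, too many consecutive expansions or compressions make playback sound fast or slow, so the adjustments in parts one and two must take the history of earlier adjustments into account (for example by limiting the maximum number of consecutive expansions or compressions), so that no speed-up or slow-down is audible.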
在一些实施例中,所述方法还包括:
判断所述第一客户端和所述第二客户端是否处于同时采集到声音的双讲状态;
当处于所述双讲状态时,对所述语音通话或所述视频通话进行提升通话质量的特定处理。
可选地,所述判断所述第一客户端和所述第二客户端是否处于同时采集到声音的双讲状态,包括:
根据所述第一媒体信息,获取所述第一客户端提供的远端信号,所述远端信号是根据语音通话的对端发送的声音信号所获得的信号;
对所述远端信号叠加超声波信号,获得叠加所述超声波信号后的混合信号,并通过扬声器部分播放所述混合信号;
获取所述第二客户端的近端信号,所述近端信号是第二客户端通过麦克风部分采集到的声音信号;
根据所述超声波信号确定所述混合信号中的第一信号段和所述近端信号中的第二信号段;
计算所述第一信号段与所述第二信号段之间的相关值;
当所述相关值小于预设的相关值阈值时,确定所述麦克风部分采集到所述近端信号时的通话状态为双讲状态。
可选地,所述对所述远端信号叠加超声波信号之前,还包括:
检测所述远端信号的功率值是否大于预设功率阈值;
当检测结果为所述远端信号的功率值大于所述预设功率阈值时,执行所述对所述远端信号叠加超声波信号的步骤。
可选地,所述获取远端信号,包括:
对接收到的所述第一媒体信息中的声音信号进行低通滤波,获得所述远端信号;
其中,所述低通滤波的截止频率低于所述超声波信号的最低频率。
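Optionally, determining the first signal segment in the mixed signal and the second signal segment in the near-end signal according to the ultrasonic signal includes: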
将所述近端信号中,承载目标数据信息的超声波信号所对应的时域上的信号确定为所述第二信号段;
确定最近播放的,且叠加有承载所述目标数据信息的超声波信号的混合信号的播放时间;
将所述混合信号中,在所述播放时间上播放的信号确定为所述第一信号段。
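Optionally, determining the first signal segment in the mixed signal and the second signal segment in the near-end signal according to the ultrasonic signal includes: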
将所述混合信号中,承载目标数据信息的超声波信号所对应的时域上的信号确定为所述第一信号段;
在所述第一信号段被播放后采集到的所述近端信号中,查询承载所述目标数据信息的超声波信号所对应的时域上的信号;
将查询获得的信号确定为所述第二信号段。
可选地,叠加在所述远端信号上的所述超声波信号所承载的数据信息在预定周期内不重复;
所述预定周期大于或者等于回声时延的最大值,所述回声时延是所述扬声器部分播放所述混合信号到所述麦克风部分采集到所述混合信号对应的回声之间的时延。
可选地,所述超声波信号所承载的数据信息包括若干个超声编码,每个所述超声编码由至少两个编码部分组成,且每个所述编码部分用于指示至少两个超声频点中的每个超声频点上是否存在信号。
可选地,所述计算所述第一信号段与所述第二信号段之间的相关值,包括:
分别获取所述第一信号段与所述第二信号段各自对应的功率谱;
对所述第一信号段与所述第二信号段各自对应的功率谱进行二值化处理,获得所述第一信号段与所述第二信号段各自对应的二值化数组;
计算所述第一信号段与所述第二信号段各自对应的二值化数组之间的相关值。
可选地,所述方法还包括:
在对所述远端信号叠加超声波信号之前,检测将所述远端信号和所述超声波信号叠加之后获得的声音信号的幅值是否超出预设的幅值范围;
若检测结果为所述声音信号的幅值超出所述预设的幅值范围,则按照预定的衰减策略对所述远端信号的幅值进行衰减处理。
图21是根据一示例性实施例示出的一种通话状态检测方法的流程图,该通话状态检测方法可以包括如下几个步骤:
步骤S201,接收语音通话的对端发送的声音信号。
终端在进行语音通话的过程中,可以接收通话的对端发送的声音信号,该声音信号可以是通过PSTN发送的声音信号,也可以是通过数据网络发送的声音信号。
步骤S202,对接收到的该声音信号进行低通滤波,获得远端信号。
其中,该远端信号是承载语音通话的对端发出的声音的信号,该低通滤波的截止频率低于超声波的最低频率。
在语音通话过程中,语音信号的正常频率比较低,通常在几百到几千赫兹之间,而在接收到的声音信号中,可能会携带一些高频的干扰信号,这些高频的干扰信号中可能存在超声波信号。而在本申请后续的步骤中,需要通过叠加超声波信号来实现信号检测和对齐,如果语音通话的对端发送的声音信号中携带超声波信号,则可能会对后续叠加的超声波信号造成干扰,影响信号对齐的准确性,进一步影响双讲状态检测的准确性,因此,在本申请实施例中,终端在接收到语音通话的对端发送的声音信号后,首先对该声音信号进行低通滤波,滤除接收到的声音信号中的高频干扰信号。其中,该低通滤波的截止频率需要低于超声波的最低频率,避免在后续步骤中对叠加在远端信号上的超声波信号造成干扰。
具体的,超声波信号的最低频率为20KHz,上述低通滤波的截止频率可以介于语音信号的正常频率与超声波信号的最低频率之间,比如,该截止频率可以是12KHz,即终端将接收到的声音信号中,低于12KHz的信号获取为远端信号。
步骤S203,检测该远端信号的功率值是否大于预设功率阈值,若是,进入步骤S204,否则,进入步骤S211。
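On the one hand, the embodiments of this application rely on the echo of the far-end signal reflected to the microphone part for signal alignment and correlation calculation, so the far-end signal must first be able to produce an echo that the microphone part picks up. In this embodiment, after obtaining the far-end signal, the terminal therefore first determines whether the power value of the far-end signal is greater than the preset power threshold. If it is, the power of the far-end signal is high, and the microphone part will pick up an echo after the signal is played through the speaker part. If the power value is not greater than the preset power threshold, the power of the far-end signal is low, and the microphone part may not pick up an echo after the signal is played through the speaker part.
On the other hand, the power value of the far-end signal is also used to determine whether the peer of the voice call is currently producing sound. If the power value of the far-end signal is greater than the preset power threshold, the peer is producing sound, for example the peer user is speaking, and the flow proceeds to step S204 for further detection. If the power value is not greater than the preset power threshold, the peer is producing no sound or only very quiet sound, for example the peer user is not speaking, and the flow proceeds to step S211.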
在本申请实施例中,在计算远端信号的功率值时,终端可以将远端信号以固定时长(例如20ms)进行分帧,并对每一个远端信号帧分别进行功率值计算,具体的,以计算第n帧的功率值为例,该第n帧的功率值的计算公式可以如下:
Figure PCTCN2017095309-appb-000001
其中,PX(n)为第n帧的功率值,M为帧长度,数值上等于远端信号的采样频率乘以20ms,x为远端信号的采样值。
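The formula itself survives here only as an image reference. One form consistent with the definitions given (frame power P_X(n), frame length M, sample values x) is the mean of the squared samples of the frame; this is an assumed reconstruction, and the original may differ in normalization or indexing:

```latex
P_X(n) = \frac{1}{M} \sum_{i=0}^{M-1} x^2(nM + i)
```

As a worked example of the frame length: if the far-end signal were sampled at 48 kHz (an assumed rate), a 20 ms frame would hold M = 48000 × 0.02 = 960 samples.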
步骤S204,对该远端信号叠加超声波信号,获得叠加该超声波信号后的混合信号。
常规麦克风部分采用48KHz的采样频率,根据香农采样定理,麦克风部分采集到的信号的最大频率为24KHz。为了使得麦克风部分能够采集到叠加有超声波信号的回声信号,在本申请实施例中,对远端信号叠加的超声波信号的频率需要低于麦克风部分采集到的信号的最大频率。具体比如,当麦克风部分的采样率为48KHz时,在远端信号上叠加的超声波信号的频率范围可设为20~22KHz。
可选的,为了便于后续检测和采集近端信号以及将混合信号与麦克风部分采集到的近端信号进行对齐,终端需要对叠加在远端信号上的超声波信号进行编码,以使得叠加在该远端语音信号上的该超声波信号所承载的数据信息在预定周期内不重复;该预定周期大于或者等于回声时延的最大值。
其中,该回声时延是该扬声器部分播放该混合信号到该麦克风部分采集到该混合信号对应的回声之间的时延。
可选的,该超声波信号所承载的数据信息用于指示该超声波信号对应的频点。比如,该超声波信号所承载的数据信息可以包括若干个超声编码,每个超声编码由至少两个编码部分组成,且每个编码部分用于指示至少两个超声频点中的每个超声频点上是否存在信号。
具体的,以每个超声编码由三个编码部分组成,每个编码部分用于指示三个超声频点中的每个超声频点上是否存在信号为例,超声波信号的编码设计可以如下:
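In this embodiment, take as an example a coding part built by assigning values to the three ultrasonic frequency points f0 (20400 Hz), f1 (21100 Hz) and f2 (21800 Hz). (In practice a coding part may also use more than three frequency points, and the number of coding parts can be determined from the maximum echo delay and the frame length; this embodiment uses three coding parts only as an illustration.) The ultrasonic signal corresponding to each coding part is given by: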
s=A*(b0sin(2πf0*t)+b1sin(2πf1*t)+b2sin(2πf2*t));
其中,A为超声波信号的幅值,t的取值范围为[0,M-1];b0、b1以及b2为对应的三个频点的赋值开关(即b0、b1以及b2的取值为0或1),因此,一个编码部分可以代表一个0~7的值,在一个超声编码中,第一个和第二个编码部分取值范围设定为1~7,而第三个编码部分设定为0,这样可以最多构造49个不同值的超声编码,利用这49个不同值的超声编码可以设计成一个大小为49的码表,当远端信号需要叠加超声信号时,按顺序读取该码表得到对应的超声编码,按照上述超声波信号公式构成超声波信号后,与远端信号进行叠加(将信号样点值,也就是信号的幅值相加即可);当按顺序读完最后一个码表数据后,下一次读取码表数据时,从码表第一个数据开始,这样循环读取码表数据构建超声波信号。其中,设定为0的编码部分用于指示叠加在远端信号上的相邻两个超声编码之间的边界,可选的,在实际应用中,在一个超声编码中,设定为0的编码部分也可以是第一个编码部分或者第二个编码部分。
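In this embodiment, when the ultrasonic signal is superimposed on the far-end signal, the ultrasonic signal of one coding part is superimposed on each 20 ms far-end signal frame, so every three adjacent far-end signal frames carry the ultrasonic signal of one ultrasonic code. Taking the case in which each coding part indicates its code value in binary, refer to Fig. 22, which shows a spectrogram of a mixed signal involved in an embodiment of this application. In Fig. 22, starting at 0.36 s, the terminal superimposes the ultrasonic signal of one coding part on each 0.02 s interval, and in every 0.06 s the last 0.02 s carries no ultrasonic signal; equivalently, the coding part superimposed on that last 0.02 s has the code value 0. The ultrasonic signal superimposed on each 0.06 s interval indicates one ultrasonic code, and within the predetermined period every ultrasonic code has a different code value. In Fig. 22, the code value of a coding part is given by the values of b2, b1 and b0, and the code value of an ultrasonic code is given by the code values of its three coding parts. From 0.36 s to 0.38 s there is no signal at f2 and there are signals at f1 and f0, so the coding part has the code value 011 (that is, 3). From 0.38 s to 0.40 s there are signals at f2 and f1 and no signal at f0, so the code value is 110 (that is, 6). From 0.40 s to 0.42 s there is no signal at f2, f1 or f0, so the code value is 000 (that is, 0). The ultrasonic code superimposed on the far-end signal from 0.36 s to 0.42 s therefore has the code value "360", and likewise the ultrasonic code superimposed from 0.42 s to 0.48 s has the code value "540".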
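The code table and the per-frame ultrasonic signal can be sketched as follows. The three frequencies, the amplitude 3000, the 20 ms frame and the 49-entry table come from the text; the function names are hypothetical, and dividing the sample index by a 48 kHz sampling rate inside the sine terms is an assumption of this sketch about how t is meant to be used.

```python
import math

FREQS_HZ = (20400.0, 21100.0, 21800.0)   # f0, f1, f2 from the text


def build_code_table():
    """All 49 codes: parts one and two take 1..7, part three is fixed to 0."""
    return [(a, b, 0) for a in range(1, 8) for b in range(1, 8)]


def part_bits(value):
    """Split a part value 0..7 into the switches (b2, b1, b0)."""
    return (value >> 2) & 1, (value >> 1) & 1, value & 1


def part_samples(value, amplitude=3000.0, fs=48000, frame_ms=20):
    """Ultrasonic samples of one coding part, laid over one frame."""
    b2, b1, b0 = part_bits(value)
    f0, f1, f2 = FREQS_HZ
    n = fs * frame_ms // 1000
    return [amplitude * (b0 * math.sin(2 * math.pi * f0 * t / fs)
                         + b1 * math.sin(2 * math.pi * f1 * t / fs)
                         + b2 * math.sin(2 * math.pi * f2 * t / fs))
            for t in range(n)]
```

Reading the table in order and wrapping around at the end gives the cyclic sequence of codes described above; the code "360" of Fig. 22 corresponds to the tuple `(3, 6, 0)`.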
可选的,终端在对该远端信号叠加超声波信号之前,还可以检测将该远端信号和该超声波信号叠加之后获得的声音信号的幅值是否超出预设的幅值范围;若该声音信号的幅值超出该预设的幅值范围,则按照预定的衰减策略对该远端信号的幅值进行衰减处理。
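In speech signal processing, a sample value is represented with 16 bits, which can express at most 2^16 different sample values, and each amplitude of the speech signal corresponds to one sample value. A speech signal whose amplitude lies within [-32768, 32767] can be represented accurately; one that goes outside this range cannot, which causes clipping distortion during playback. In this embodiment, to avoid distortion when the mixed signal is played after the ultrasonic signal is superimposed, a far-end signal whose amplitude is too large can be attenuated. Refer to Fig. 23, which shows a far-end signal attenuation flow involved in an embodiment of this application. As shown in Fig. 23, before the ultrasonic signal is superimposed on the far-end signal, it is first determined whether the amplitude of the sound signal obtained by superimposing the two would go outside [-32768, 32767]. If so, the sound signal would distort when played through the speaker part; the far-end signal is then attenuated according to the predetermined attenuation policy, and the check is repeated on the sum of the attenuated far-end signal and the ultrasonic signal. Once the amplitude of the resulting sound signal stays within [-32768, 32767], the far-end signal and the ultrasonic signal are superimposed to obtain the mixed signal.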
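Attenuating the far-end signal according to the predetermined attenuation policy may mean attenuating it by a predetermined attenuation ratio: each time the far-end signal is attenuated, its amplitude is multiplied by the attenuation ratio to obtain the attenuated far-end signal. The attenuation ratio is a positive number less than 1, for example 0.9 or 0.8.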
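The attenuate-and-recheck loop of Fig. 23 can be sketched as below. The 16-bit range and the ratio 0.9 come from the text; the function name, the cap on the number of rounds and the error raised when the cap is hit are assumptions added so that the sketch always terminates.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_with_attenuation(far_end, ultrasound, ratio=0.9, max_rounds=64):
    """Attenuate the far-end samples until far-end + ultrasound fits in 16 bits."""
    scaled = list(far_end)
    for _ in range(max_rounds):
        mixed = [a + b for a, b in zip(scaled, ultrasound)]
        if all(INT16_MIN <= s <= INT16_MAX for s in mixed):
            return mixed
        # Only the far-end signal is attenuated; the ultrasound is left as is.
        scaled = [a * ratio for a in scaled]
    raise ValueError("mixed signal still out of range after attenuation")
```

Leaving the ultrasonic component untouched keeps it detectable in the echo, which is why only the far-end samples are scaled.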
可选的,在本申请实施例中,超声波信号的幅值(即上述公式中的A)应该取一个适当值,以在终端能够准确检测出麦克风部分采集到的近端信号中的超声波信号的同时,避免超声波信号的幅值过高而导致超声叠加后的混合信号出现破音,从而影响通话效果,比如,超声波信号的幅值可以设置为3000。
步骤S205,通过扬声器部分播放该混合信号。
在本申请实施例中,终端通过扬声器部分播放该混合信号的同时,还将混合信号缓存在本地,以便后续进行信号对齐。
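Step S206: obtain a near-end signal, the near-end signal being the sound signal collected by the microphone part.
In the solution shown in this application, the near-end signal is the sound signal that the terminal collects through the microphone part. It contains the echo signal, which the microphone part collects after the sound played by the speaker part is reflected to it, and the sound signal produced locally at the terminal. In other words, the near-end signal collected by the microphone part contains the far-end signal played by the speaker part, the ultrasonic signal superimposed on the far-end signal, and the sound produced locally at the terminal (for example the voice of the terminal's user).
Step S207: determine the first signal segment in the mixed signal and the second signal segment in the near-end signal according to the ultrasonic signal.
The first signal segment is the signal over a certain time span of the mixed signal, and the second signal segment is the signal over a certain time span of the near-end signal.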
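In one possible implementation, the terminal first determines the second signal segment in the near-end signal and then determines the first signal segment in the mixed signal from the ultrasonic signal contained in the second signal segment. For example, the terminal may parse the data information carried by the ultrasonic signal contained in the near-end signal, take the time-domain signal of the near-end signal that corresponds to the ultrasonic signal carrying the target data information as the second signal segment, determine the playback time of the most recently played mixed signal on which an ultrasonic signal carrying the target data information was superimposed, and take the part of the mixed signal played at that playback time as the first signal segment.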
具体的,终端可以对麦克风采集信号的超声波频段进行分析,按照上述编码规则获取超声波信号的编码信息,比如,终端采用FFT(Fast Fourier Transformation,快速傅立叶变换)分析法对采集到的近端信号进行分析,确定采集到的近端信号中f0、f1以及f2这三个超声频点上的功率值,并检测这三个超声频点上的功率值是否大于某个阈值,若是,则说明对应的频点上有信号,否则认为对应的频点上无信号,进而解析出当前采集到的,承载一个完整的超声编码的相邻三帧近端信号Cap(i),该完整的超声编码即为上述目标数据信息,该相邻三帧近端信号Cap(i)即为上述第二信号段,同时搜索已播放的混合信号中,承载相同超声编码,且最近播放的相邻三帧混合信号Play(i),并将该混合信号Play(i)和当前采集的近端信号Cap(i)对齐,即当前采集的近端信号Cap(i)包含了Play(i)对应的回声信号,该混合信号Play(i)即为近端信号Cap(i)对应的第一信号段。其中,上述每一帧近端信号的时长与每一帧混合信号的时长与上述步骤中每一帧远端信号的时长相同,比如,都是20ms。终端在搜索上述混合信号Play(i)时,可以首先确定最近播放,且叠加有上述完整的超声编码的超声波信号的混合信号的播放时间,并将该播放时间上播放的信号确定为上述混合信号Play(i)。
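For example, suppose the terminal detects that, starting at 0.37 s, there are signals at the f2, f1 and f0 frequency points of the near-end signal, and that from 0.37 s to 0.43 s the ultrasonic signal at those frequency points corresponds to an ultrasonic code with the code value "360". The terminal looks up the mixed signal of Fig. 22 and finds that the ultrasonic code carried by the mixed signal from 0.36 s to 0.42 s also has the code value "360". It then concludes that the near-end signal collected from 0.37 s to 0.43 s and the mixed signal from 0.36 s to 0.42 s in Fig. 22 contain the same ultrasonic signal: the mixed signal played from 0.36 s to 0.42 s in Fig. 22 is the first signal segment, and the near-end signal collected from 0.37 s to 0.43 s is the second signal segment.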
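The lookup of the most recently played segment that carries a given code can be sketched as follows. The playback log as a list of (start time, code) pairs and both function names are hypothetical; they stand in for whatever buffer the terminal keeps of the mixed signal it has played.

```python
def latest_play_start(played, code):
    """Start time of the most recently played segment carrying `code`.

    `played` is a list of (start_time_s, code) pairs in playback order.
    """
    for start, played_code in reversed(played):
        if played_code == code:
            return start
    return None


def echo_delay(played, code, capture_start):
    """Delay between playing a code and capturing its echo, if it was played."""
    start = latest_play_start(played, code)
    return None if start is None else capture_start - start
```

Searching from the newest entry backwards is what makes a repeating code table usable: as long as the table period exceeds the maximum echo delay, the newest match is the segment that produced the echo.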
或者,在另一种可能的实现方式中,终端也可以先确定混合信号中的第一信号段,然后再根据第一信号段中包含的超声波信号来确定近端信号中的第二信号段。比如,在确定第一信号段和第二信号段时,终端可以将混合信号中,承载目标数据信息的超声波信号所对应的时域上的信号确定为第一信号段,并在第一信号段被播放后采集到的近端信号中,查询承载该目标数据信息的超声波信号所对应的时域上的信号,将查询获得的信号确定为该第二信号段。
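Specifically, the terminal may take three adjacent frames of the mixed signal, Play(ii), that carry a given ultrasonic code as the first signal segment, analyze the near-end signal that the microphone part collects after the first signal segment is played, and search that near-end signal for three adjacent frames, Cap(ii), that contain the same ultrasonic code as Play(ii). The near-end signal Cap(ii) is the second signal segment corresponding to the mixed signal Play(ii).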
步骤S208,计算第一信号段与第二信号段之间的相关值。
可选的,在计算该第一信号段与该第二信号段之间的相关值时,终端可以通过快速傅里叶变换分别计算该第一信号段与该第二信号段各自对应的功率谱,对第一信号段与第二信号段各自对应的功率谱进行二值化处理,获得第一信号段与第二信号段各自对应的二值化数组,并计算该第一信号段与该第二信号段各自对应的二值化数组之间的相关值。
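Optionally, to reduce the complexity of the correlation calculation, speed it up and lower the terminal's power consumption, when calculating the power spectra of the first and second signal segments in this embodiment the terminal may calculate them only over a specified frequency band. The specified band may be the band that contains most of the sound in a voice call, for example 500 Hz to 1200 Hz.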
其中,对一个信号(比如上述第一信号段或第二信号段)的功率谱进行二值化处理时,可以对该信号的功率谱进行平滑滤波,获得该信号的功率谱中每个频点上的功率平滑值,并根据该信号的功率谱,以及该信号的功率谱中每个频点上的功率平滑值,对该信号的功率谱进行二值化处理,获得该信号对应的二值化数组。
具体的,请参考图24,其示出了本申请实施例涉及的一种相关值计算的流程示意图,其中,终端对第一信号段进行做快速傅立叶变换,获得该第一信号段在500Hz~1200Hz上的功率谱Pp(j),其中,该功率谱Pp(j)表示第一信号段在500Hz~1200Hz中的各个频点上的功率,j的取值范围为[m1,m2],其中,
Figure PCTCN2017095309-appb-000002
其中,M是上述快速傅立叶变换点数的一半,fs是第一信号段的采样频率。
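The expressions for m1 and m2 survive here only as an image reference. Given that M is half the number of FFT points and fs is the sampling frequency, the spacing between frequency bins is fs/(2M), so one assumed reconstruction of the band edges for 500 Hz to 1200 Hz is:

```latex
m_1 = \left\lfloor \frac{500 \cdot 2M}{f_s} \right\rfloor, \qquad
m_2 = \left\lfloor \frac{1200 \cdot 2M}{f_s} \right\rfloor
```

As a worked example with assumed values fs = 48000 Hz and a 1024-point FFT (M = 512), the bin spacing is 46.875 Hz, which gives m1 = 10 and m2 = 25. The original may round differently.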
终端对Pp(j)进行平滑滤波,获得Ppsm(j),该Ppsm(j)表示Pp(j)中每个频点上的功率平滑值。终端根据Ppsm(j)对Pp(j)进行二值化,具体的,对于Pp(j)上的每个频点,比较该频点的功率值与该频点对应在Ppsm(j)中的功率平滑值的大小,若该频点的功率值大于该频点对应在Ppsm(j)中的功率平滑值,则将该频点的取值设置为1,否则,将该频点的取值设置为0,最后获得Pp(j)的二值化数组Ppb(j)。
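Correspondingly, the terminal performs a fast Fourier transform on the second signal segment to obtain its power spectrum Pc(j) over 500 Hz to 1200 Hz, and smooths Pc(j) to obtain Pcsm(j), which gives the smoothed power value at each frequency point of Pc(j). The terminal binarizes Pc(j) according to Pcsm(j) to obtain the binarized array Pcb(j) of Pc(j).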
最后,终端计算Ppb(j)和Pcb(j)之间的相关值,计算出的相关值即可以作为第一信号段和第二信号段在指定频段上的相关值。具体的相关值计算公式可以如下:
$$\mathrm{PCxor} = \frac{\sum_{k=m1}^{m2} \bigl(\mathrm{Ppb}(k)\ \mathrm{Xor}\ \mathrm{Pcb}(k)\bigr)}{m2 - m1 + 1};$$
其中,Xor为异或运算符。
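The binarization and the XOR-based value can be sketched as follows. The moving-average smoothing and its radius are assumptions, since the text does not fix the smoothing filter, and the function names are hypothetical. Note that XOR counts the bins in which the two arrays differ, so identical segments give 0; how this value maps onto the "correlation value" compared with the threshold in step S209 is not spelled out in this section.

```python
def smooth(power, radius=2):
    """Moving average over neighbouring bins (assumed smoothing filter)."""
    out = []
    for j in range(len(power)):
        lo, hi = max(0, j - radius), min(len(power), j + radius + 1)
        out.append(sum(power[lo:hi]) / (hi - lo))
    return out


def binarize(power, radius=2):
    """1 where a bin's power exceeds its smoothed value, else 0."""
    return [1 if p > s else 0 for p, s in zip(power, smooth(power, radius))]


def xor_ratio(bits_a, bits_b):
    """PCxor as written in the formula: share of bins whose bits differ."""
    if len(bits_a) != len(bits_b) or not bits_a:
        raise ValueError("binarized arrays must be non-empty and equally long")
    return sum(a ^ b for a, b in zip(bits_a, bits_b)) / len(bits_a)
```

Comparing binarized spectra instead of raw powers makes the result insensitive to the attenuation the echo undergoes between speaker and microphone, since only the shape of the spectrum is compared.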
Step S209: determine whether the correlation value is less than the preset correlation threshold; if so, go to step S210, otherwise go to step S211.
其中,上述相关值阈值可以是开发人员预先设置的阈值。
步骤S210,确定通话状态为双讲状态。
当上述步骤S209判断出相关值小于预设的相关值阈值时,可以确定麦克风部分采集到上述近端信号时的通话状态为双讲状态。
步骤S211,确定通话状态为非双讲状态。
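When step S203 detects that the power value of the far-end signal is not greater than the preset power threshold, the call state at the time the far-end signal was obtained can be determined to be the non-double-talk state; or, when step S209 determines that the correlation value is not less than the preset correlation threshold, the call state at the time the microphone part collected the near-end signal can be determined to be the non-double-talk state.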
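In this embodiment, when the speaker part of the terminal plays the mixed signal, the near-end signal collected by the terminal's microphone part contains the locally produced sound signal (for example the voice of the terminal's user) and the echo of the mixed signal that reaches the microphone part after reflection. The larger the correlation value calculated in step S208, the larger the share of the echo signal in the near-end signal and the smaller the share of the locally produced sound; the smaller the correlation value, the smaller the share of the echo and the larger the share of the locally produced sound. When the calculated correlation value is less than the preset correlation threshold, the locally produced sound can be taken to be strong, and the terminal's user is probably speaking; combined with the finding in step S203 that the power value of the far-end signal is greater than the preset power threshold, the call state corresponding to the near-end signal can be determined to be the double-talk state. When the calculated correlation value is not less than the preset correlation threshold, the locally produced sound can be taken to be weak and the terminal's user is probably not speaking, so the call state corresponding to the near-end signal can be determined to be the non-double-talk state.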
具体的,请参考图25,其示出了本申请实施例涉及的一种通话状态检测流程的示意图,如图25所示,终端接收到通话的对端发送的声音信号时,对接收到的声音信号进行低通滤波,获得远端信号,并判断远端信号的功率是否大于预设的功率阈值,若远端信号的功率不大于功率阈值,则确定当前通话状态为非双讲状态,若远端信号的功率大于功率阈值,则对远端信号叠加承载超声波信号,获得混合信号并存储;终端通过扬声器部分播放该混合信号,同时将麦克风部分采集到的声音信号获取为近端信号,通过解析近端信号中的超声波信号所携带的编码来与混合信号进行对齐,确定出包含相同超声波信号的,混合信号中的第一信号段以及近端信号中的第二信号段,并计算第一信号段和第二信号段之间的相关值,若计算出的相关值小于相关值阈值,则确定当前通话状态为双讲状态,否则,确定当前通话状态为非双讲状态。
综上所述,本申请实施例提供的通话状态检测方法,终端通过在远端信号中叠加的超声波信号以及麦克风部分采集到的近端信号中包含的超声波信号对混合信号和近端信号进行对齐,并通过对齐后的近端信号和混合信号之间的相关值判断通话状态是否为双讲状态,相比于对远端信号反射到达麦克风部分的过程中的幅度衰减情况进行估计的方案,本申请所示的方案能够提高对双讲状态检测的准确性。
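In addition, in the method provided by the embodiments of this application, the terminal aligns the mixed signal and the near-end signal by means of an ultrasonic signal that human hearing cannot perceive, which avoids interfering with the user's normal call.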
图26是根据一示例性实施例示出的一种通话状态检测装置的结构方框图。该通话状态检测装置,可以执行图21所示实施例中的全部或者部分步骤。该通话状态检测装置可以包括:
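A far-end signal acquisition part 801, configured to obtain a far-end signal, the far-end signal being a signal obtained from the sound signal sent by the peer of the voice call;
A signal superimposing part 802, configured to superimpose an ultrasonic signal on the far-end signal to obtain a mixed signal on which the ultrasonic signal is superimposed;
A playback part 803, configured to play the mixed signal through the speaker part;
A near-end signal acquisition part 804, configured to obtain a near-end signal, the near-end signal being the sound signal collected through the microphone part;
A signal determining part 805, configured to determine the first signal segment in the mixed signal and the second signal segment in the near-end signal according to the ultrasonic signal;
A correlation value calculation part 806, configured to calculate the correlation value between the first signal segment and the second signal segment;
A state determining part 807, configured to determine, when the correlation value is less than the preset correlation threshold, that the call state at the time the microphone part collected the near-end signal is the double-talk state.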
可选的,所述装置还包括:
功率检测部分,配置为在所述信号叠加部分对所述远端信号叠加超声波信号之前,检测所述远端信号的功率值是否大于预设功率阈值;
所述信号叠加部分,配置为当所述功率检测部分的检测结果为所述远端信号的功率值大于所述预设功率阈值时,执行所述对所述远端信号叠加超声波信号的步骤。
Optionally, the far-end signal acquisition part includes:
信号接收部分,用于接收所述对端发送的声音信号;
滤波部分,用于对接收到的所述声音信号进行低通滤波,获得所述远端信号;
其中,所述低通滤波的截止频率低于所述超声波信号的最低频率。
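Optionally, the data information carried by the ultrasonic signal superimposed on the far-end signal does not repeat within a predetermined period;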
所述预定周期大于或者等于回声时延的最大值,所述回声时延是所述扬声器部分播放所述混合信号到所述麦克风部分采集到所述混合信号对应的回声之间的时延。
可选的,所述信号确定部分,包括:
第一信号确定部分,配置为将所述近端信号中,承载目标数据信息的超声波信号所对应的时域上的信号确定为所述第二信号段;
播放时间确定部分,配置为确定最近播放的,且叠加有承载所述目标数据信息的超声波信号的混合信号的播放时间;
第二信号确定部分,配置为将所述混合信号中,在所述播放时间上播放的信号确定为所述第一信号段。
可选的,所述信号确定部分,包括:
第三信号确定部分,配置为将所述混合信号中,承载目标数据信息的超声波信号所对应的时域上的信号确定为所述第一信号段;
查询部分,配置为在所述第一信号段被播放后采集到的所述近端信号中,查询承载所述目标数据信息的超声波信号所对应的时域上的信号;
第四信号确定部分,配置为将所述查询部分查询获得的信号确定为所述第二信号段。
可选的,所述超声波信号所承载的数据信息用于指示所述超声波信号对应的频点。
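Optionally, the data information carried by the ultrasonic signal includes several ultrasonic codes, each ultrasonic code consists of at least two coding parts, and each coding part indicates whether a signal is present at each of at least two ultrasonic frequency points.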
可选的,所述相关值计算部分,包括:
功率谱获取部分,配置为分别获取所述第一信号段与所述第二信号段各自对应的功率谱;
二值化处理部分,配置为对所述第一信号段与所述第二信号段各自对应的功率谱进行二值化处理,获得所述第一信号段与所述第二信号段各自对应的二值化数组;
相关值计算部分,配置为计算所述第一信号段与所述第二信号段各自对应的二值化数组之间的相关值。
可选的,所述装置还包括:
幅值检测部分,配置为在所述信号叠加部分对所述远端信号叠加超声波信号之前,检测将所述远端信号和所述超声波信号叠加之后获得的声音信号的幅值是否超出预设的幅值范围;
衰减部分,用于当所述幅值检测部分的检测结果为所述声音信号的幅值超出所述预设的幅值范围时,按照预定的衰减策略对所述远端信号的幅值进行衰减处理。
综上所述,本申请实施例提供的通话状态检测装置,通过在远端信号中叠加的超声波信号以及麦克风部分采集到的近端信号中包含的超声波信号对混合信号和近端信号进行对齐,并通过对齐后的近端信号和混合信号之间的相关值判断通话状态是否为双讲状态,相比于对远端信号反射到达麦克风部分的过程中的幅度衰减情况进行估计的方案,本申请所示的方案能够提高对双讲状态检测的准确性。
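In addition, in the apparatus provided by the embodiments of this application, the terminal aligns the mixed signal and the near-end signal by means of an ultrasonic signal that human hearing cannot perceive, which avoids interfering with the user's normal call.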
本申请实施例还提供了一种用于实施上述通话方法的通话装置。图5是根据本申请实施例的一种可选的通话装置的示意图,如图5所示,该装置可以包括:第一判断部分52、第一获取部分54、第一执行部分56以及第二执行部分58。
第一判断部分52,配置为基于第一客户端通过预设网络接收到的第二客户端发送的第一数据包,判断第二客户端通过预设网络向第一客户端发送的第一媒体信息是否发生丢包,其中,第一媒体信息包括第一数据包,第一媒体信息是第二客户端与第一客户端进行音频通话或视频通话时传输的媒体信息;
第一获取部分54,配置为在判断出第一媒体信息发生丢包的情况下,获取预设网络的网络状态信息;
第一执行部分56,配置为在网络状态信息满足第一预设条件的情况下,向第二客户端发送重传请求,其中,重传请求用于请求第二客户端重传第一媒体信息中丢失的第二数据包,第一预设条件用于指示预设网络重传第二数据包所需达到的网络条件;
第二执行部分58,配置为在网络状态信息不满足第一预设条件的情况下,取消向第二客户端发送重传请求。
In this embodiment, the call apparatus further includes:
参数确定部分,配置为确定请求所述第二客户端重传第二数据包的预定参数,其中,所述第二数据包为所述第一媒体信息中传输失败的数据包的重传数据包;所述预定参数包括:重传成功的第一概率阈值及成功输出所述第二数据包的第二概率阈值的至少其中之一;
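A condition determining part, configured to determine, according to the predetermined parameter, the preset condition that the network state information must satisfy when retransmission is requested, where the preset condition indicates the network condition required for the probability that the preset network successfully retransmits the second data packet to be not less than the first probability threshold, and/or the network condition required for the probability that a successfully retransmitted second data packet can be output to be not less than the second probability threshold.
The parameter determining part and the condition determining part likewise correspond to a processor or processing circuit, and can be used to determine the preset condition that decides whether a retransmission request is currently sent.
Note that the first judging part 52 in this embodiment may be used to perform step S302 in Embodiment 1 of this application, the first acquisition part 54 may be used to perform step S304, the first execution part 56 may be used to perform step S306, and the second execution part 58 may be used to perform step S308.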
此处需要说明的是,上述子部分与对应的步骤所实现的示例和应用场景相同,但不限于上述实施例1所公开的内容。需要说明的是,上述子部分作为装置的一部分可以运行在如图1所示的硬件环境中,可以通过软件实现,也可以通过硬件实现。
通过上述子部分,在第一媒体信息发生丢包的情况下,根据网络状态信息判断是否发送重传请求,在网络状况较为理想的情况下发送重传请求以获取丢失的数据包,达到使媒体信息更为完整的目的,在网络状况不理想的情况下,不发送重传请求,以避免加剧网络的拥堵状况,可以解决了相关技术中由于网络拥堵造成的即时通讯质量较差的技术问题,进而达到提高即时通讯质量的技术效果。
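The client above may be a client used for communication, installed on a computer or a mobile device. Preferably, the client is one with strict real-time requirements, that is, an instant messaging client such as WeChat or QQ. The preset network is the network the clients use to communicate with each other. The media information may be dynamic multimedia information such as video, audio or GIF images, or static information such as text or still images. The network state information is information that describes the characteristics of the network, such as transmission speed and delay.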
需要说明的是,在相关技术中只要发生丢包就会发送重传请求,由于此时网络堵塞较为严重,在发送重传请求的同时,无疑加剧了网络的堵塞状况,进而会造成更多的数据包丢失,且由于网络堵塞情况严重,即使收到了响应数据包,响应数据包的有效性也大大降低,起不到提高通讯质量的效果,相反,由于网络堵塞的加重,会造成更多的数据包丢失。而在本申请的技术方案中,在网络状况不理想的情况下,不发送重传请求,以避免加剧网络的拥堵状况,相对于在相关技术中采用的手段,可减少后续丢包现象的发生,进而相对地提高了通讯质量。
上述的第一判断部分52、第一获取部分54、第一执行部分56以及第二执行部分58可以设置在第一客户端上,即第一客户端根据自身需求向第二客户端发起重传请求,为了降低第一客户端的运行负载,上述的第一判断部分52、第一获取部分54、第一执行部分56以及第二执行部分58也可以设置在应用服务器上,由服务器对第一客户端的数据包接收情况进行监控,在确定了丢包之后,根据网络情况来向第二客户端申请丢失的数据包,这里的服务器可以为客户端的服务器,如在客户端为即时通讯应用时,服务器为即时通讯应用服务器。
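This application analyzes the characteristics of the current network based on historical data, and decides whether to send a retransmission request according to the network characteristics and the importance of the received voice data. At the same time, the retransmission control strategy is adjusted in real time according to the utilization of the retransmitted data, so that both bandwidth utilization and retransmission utilization are optimized under various network conditions. For an optional implementation, refer to FIG. 3.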
可选地,第一判断部分还配置为根据第一数据包中的序号索引信息判断第一媒体信息是否发生丢包。
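Optionally, the determination may be based on the continuity of the sequence indices. For example, if the data packets with indices 7 and 9 are received, it can be determined that the data packet with index 8 is lost. In addition, each data packet identifies the index range of the multiple data packets that correspond to a given piece of media information. For example, a voice message in an instant-messaging application may be split into 100 data packets for sending, and each data packet may then identify that the voice message uses the index range 301 to 400. In this way, when any data packet is lost, the loss can be determined from the data packets that were received.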
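The index-continuity check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `find_missing_indices` and the idea of passing the index range as two integers are assumptions made for the example.

```python
def find_missing_indices(received, first_index, last_index):
    """Return the sorted indices in [first_index, last_index] that never arrived.

    `received` holds the sequence indices of the packets that did arrive;
    the range bounds stand in for the index range carried in each packet
    (hypothetical representation).
    """
    seen = set(received)
    return [i for i in range(first_index, last_index + 1) if i not in seen]


# Packets 7 and 9 arrived, so packet 8 is reported as lost.
print(find_missing_indices([7, 9], 7, 9))
```

Because the range is known from any received packet, a loss at the very end of the range (for example index 400 in the range 301 to 400) is detected as well, not only gaps between received indices.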
可选地,该装置还包括:第二获取部分,配置为在判断网络状态信息所指示的预设网络的第一网络状态是否与重传第二数据包所需的第二网络状态匹配之前,获取用于表征第一网络状态的当前使用带宽、当前传输时延、当前丢包率以及用于描述允许连续丢包数量的第二预设值;第三确定部分,用于根据预设网络的带宽信息确定带宽阈值;第四确定部分,用于根据预设网络的网络抖动信息确定传输时延阈值;第五确定部分,用于根据历史丢包率和丢包模型确定丢包率阈值。
可选地,该装置还包括:第二判断部分,配置为在获取预设网络的网络状态信息之后、且在向第二客户端发送重传请求或取消向第二客户端发送重传请求之前,判断网络状态信息所指示的预设网络的第一网络状态是否与重传第二数据包所需的第二网络状态匹配;第一确定部分,配置为在第一网络状态与第二网络状态匹配的情况下,判断出网络状态信息满足第一预设条件;第二确定部分,用于在第一网络状态与第二网络状态不匹配的情况下,判断出网络状态信息不满足第一预设条件。
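Optionally, the second judging part includes: a first judging sub-part, configured to judge whether the difference between the bandwidth threshold and the currently used bandwidth is less than a first preset value; a second judging sub-part, configured to judge whether the current transmission delay is less than the transmission delay threshold; a third judging sub-part, configured to judge whether the current packet loss rate is less than the packet loss rate threshold; and a fourth judging sub-part, configured to judge whether the number of consecutively lost packets is less than a second preset value. A preset judgment result indicates that the first network state matches the second network state, and the preset judgment result includes at least one of the following: the difference between the bandwidth threshold and the currently used bandwidth is judged to be less than the first preset value; the current transmission delay is judged to be less than the transmission delay threshold; the current packet loss rate is judged to be less than the packet loss rate threshold; the number of consecutively lost packets is judged to be less than the second preset value.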
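The four checks above can be sketched as a small decision function. All names here (`NetworkState`, `Thresholds`, `check_results`, `should_request_retransmission`) are hypothetical. The text only says the preset judgment result includes "at least one of" the four outcomes, so which checks must pass is left as the `required` parameter, an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class NetworkState:
    used_bandwidth: float     # currently used bandwidth
    delay: float              # current transmission delay
    loss_rate: float          # current packet loss rate, 0..1
    consecutive_losses: int   # number of consecutively lost packets


@dataclass
class Thresholds:
    bandwidth: float      # bandwidth threshold
    first_preset: float   # first preset value (bandwidth difference limit)
    delay: float          # transmission delay threshold
    loss_rate: float      # packet loss rate threshold
    second_preset: int    # second preset value (allowed consecutive losses)


def check_results(state, th):
    """Evaluate the four comparisons exactly as the text states them."""
    return {
        "bandwidth": (th.bandwidth - state.used_bandwidth) < th.first_preset,
        "delay": state.delay < th.delay,
        "loss_rate": state.loss_rate < th.loss_rate,
        "consecutive": state.consecutive_losses < th.second_preset,
    }


def should_request_retransmission(state, th, required=("delay", "loss_rate")):
    """Send a request only if every check named in `required` passes."""
    results = check_results(state, th)
    return all(results[name] for name in required)
```

Keeping the per-check results in a dictionary makes it easy to log which condition blocked a retransmission request.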
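As an optional embodiment, the apparatus further includes: a first updating part, configured to re-determine the current bandwidth threshold according to the previously determined bandwidth threshold and the current bandwidth information of the preset network, after the retransmission request is sent to the second client or sending it is cancelled; a second updating part, configured to increase the packet loss rate threshold and decrease the transmission delay threshold when a first ratio, of the number of received second data packets to the number of retransmission requests sent, is less than a third preset value; and a third updating part, configured to increase the packet loss rate threshold and decrease the transmission delay threshold when a second ratio, of the valid received second data packets to all received second data packets, is less than a fourth preset value.
It should be noted that, when the packet loss rate threshold and the transmission delay threshold are changed, they are not adjusted by a very large amount in one step. Instead, each threshold may be increased or decreased by a certain percentage (for example, 10%) of its current value, which avoids over-adjustment and achieves a smooth transition.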
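The percentage-step update can be sketched as below. The function name `adapt_thresholds`, its parameters, and the choice to apply a single step per call even when both ratios are low are assumptions of this example. The direction of each change (loss rate threshold up, delay threshold down) follows the text literally.

```python
def adapt_thresholds(loss_th, delay_th, requested, received, valid,
                     third_preset, fourth_preset, step=0.10):
    """Nudge both thresholds by `step` (10% by default) of their current value.

    first ratio  = received retransmitted packets / requests sent
    second ratio = valid retransmitted packets / received retransmitted packets
    """
    first_ratio = received / requested if requested else 1.0
    second_ratio = valid / received if received else 1.0
    if first_ratio < third_preset or second_ratio < fourth_preset:
        loss_th *= 1.0 + step    # increase the packet loss rate threshold
        delay_th *= 1.0 - step   # decrease the transmission delay threshold
    return loss_th, delay_th
```

Calling this once per measurement window moves the thresholds gradually, which is the smooth transition the paragraph above asks for.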
可选地,该装置还包括:第六确定部分,配置为在向第二客户端发送重传请求之前,通过对第一数据包中的媒体信息段进行信号特征分析确定丢失的第二数据包的语音特征;第一执行部分还配置为在网络状态信息满足预设条件,且语音特征包括浊音特征、语音特征以及语义特征中的至少一个的情况下,向第二客户端发送重传请求。
可选地,可对语音信号进行分析,如清音、浊音分析,语音、静音分析、语义重要性分析等,以调整网络参数阈值,比如,带宽足够时,只要检测到丢包就可以进行重传请求,带宽不够时,只对丢失的重要语音帧进行重传请求。如对包括重要语义的语音数据包进行重传。
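Optionally, as shown in FIG. 6, the apparatus further includes: a receiving part 60, configured to receive the second data packet sent by the second client after the retransmission request is sent to the second client; a first generating part 62, configured to generate second media information according to the first data packet and the second data packet; and a second generating part 64, configured to generate third media information according to the first data packet when the network state information does not satisfy the first preset condition.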
在接收到第一媒体信息的所有数据包的情况下,即接收到每一个丢失的第二数据包的情况下,恢复的第二媒体信息即第一媒体信息,即可以恢复得到一段完整的语音;由于出现了语音缺失,即出现了丢包,第三媒体信息相较于第一媒体信息,质量会相对较低。
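An information processing apparatus according to an embodiment of this application includes the following parts.
A collecting part, configured to collect offline network data and extract from it at least one network parameter that characterizes the network.
A strategy determining part, configured to build a network model from the at least one network parameter and determine a first de-jitter strategy according to the network model, so that VoIP call quality can be measured or simulated with the model. Optionally, the first de-jitter strategy may also be called the initial de-jitter strategy. In practice, a large amount of live-network data is collected for different network types, and the network model is built by offline training. Because the parameters output under the initial de-jitter strategy include de-jitter parameters and delay parameters, the network model can equally be said to determine the initial de-jitter strategy together with its related parameters.
A strategy correcting part, configured to correct the first de-jitter strategy according to characteristic parameters used to evaluate the quality of a voice or video call such as a VoIP call (for example, the history data of the current call, the signal content of the current call, or the perceptual listening result of the current call), to obtain a second de-jitter strategy. The history data of the current call reflects the network characteristics of that call. The signal content of the current call determines whether the current frame is an important frame: voice data is important and needs particular attention, while silence data does not, and de-jitter processing differs for different content. Different perceptual listening results call for different manners and magnitudes of de-jitter adjustment.
A buffer adjusting part, configured to obtain de-jitter parameters according to the current real-time network condition and the second de-jitter strategy, and to set, according to the de-jitter parameters, the size of the buffer used to transmit the voice or video call data, so that the call delay meets expectations. In practice, the size of the de-jitter buffer is determined from the de-jitter parameters obtained under the second de-jitter strategy, and the buffered data is then adjusted based on that size.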
采用本申请实施例,采集离线网络数据,从所述离线网络数据中提取出用于表征网络特征的至少一个网络参数;根据所述至少一个网络参数构建网络模型,根据所述网络模型确定第一去抖动策略,由于采用多个参数构建去抖动算法,从而对网络通话环境中的各种复杂情况进行了充分估计,得到的第一去抖动策略(或称初始去抖动策略)是趋于精准的,据此初始去抖动策略得到的相关参数,如去抖动参数等也趋于精准。为了进一步提高精准度,还根据用于评估如Voip通话的语音通话或视频通话质量的特征参数对所述第一去抖动策略进行修正,得到第二去抖动策略;根据当前实时的网络情况和所述第二去抖动策略得到去抖动参数,根据所述去抖动参数设置用于传输如Voip通话的语音通话或视频通话数据的缓冲区大小,使如Voip通话的语音通话或视频通话的时延符合预期,通过一系列去抖动策略的优化,使得据此设置的缓冲区大小趋于合理,则据该缓冲区大小对网络通话质量进行改善具备可参考性,提高了网络通话质量。
这里需要指出的是,上述装置中的采集部分、策略确定部分、策略修正部分不限定是位于发送端、接收端或服务器中,这些部分的部分或全部可以位于发送端、接收端或服务器。
在本申请一个实施方式中,所述策略修正部分,配置为:获取本次通话的历史数据;根据所述本次通话的历史数据对所述第一去抖动策略进行修正。
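In an implementation of this application, the strategy correcting part is configured to: obtain the signal content of the current call, and correct the first de-jitter strategy according to the signal content of the current call.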
在本申请一个实施方式中,所述策略修正部分,配置为:获取本次通话的感知听觉结果,根据所述感知听觉结果对所述第一去抖动策略进行修正。
在本申请一个实施方式中,所述装置还包括:通话采集部分,用于采集本次通话的如Voip通话的语音通话或视频通话数据。所述策略修正部分,配置为:触发采集本次通话的如Voip通话的语音通话或视频通话数据时,获取终端设备的不同处理能力和/或作为所述如Voip通话的语音通话或视频通话媒介的应用的调度特性,根据所述终端设备的不同处理能力和/或作为所述如Voip通话的语音通话或视频通话媒介的应用的调度特性对所述第一去抖动策略进行修正。
在本申请一个实施方式中,所述装置还包括:通话播放部分,配置为播放本次通话的如Voip通话的语音通话或视频通话数据。所述策略修正部分,配置为:触发播放本次通话的如Voip通话的语音通话或视频通话数据时,获取终端设备的不同处理能力和/或作为所述如Voip通话的语音通话或视频通话媒介的应用的调度特性,根据所述终端设备的不同处理能力和/或作为所述如Voip通话的语音通话或视频通话媒介的应用的调度特性对所述第一去抖动策略进行修正。
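An information processing system according to an embodiment of this application includes a sending end (or collecting end) 41, a de-jitter end 42, and a receiving end (or playback end) 43. The processing logic of the sending end (or collecting end) includes: collecting offline network data, extracting from it at least one network parameter that characterizes the network, and using the at least one network parameter to build a network model, where the network model is used to determine a first de-jitter strategy when transmitting voice or video call data such as VoIP call data; when collecting the voice or video call data of the current call, obtaining the differing processing capability of the terminal device and/or the scheduling characteristics of the application that carries the call; and correcting the first de-jitter strategy according to that processing capability and/or those scheduling characteristics.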
去抖动端的处理逻辑包括:根据至少一个网络参数构建网络模型,根据所述网络模型确定第一去抖动策略,所述至少一个网络参数,来源于从采集的离线网络数据中提取的用于表征网络特征的参数;根据用于评估网络协议语音如Voip通话的语音通话或视频通话质量的特征参数对所述第一去抖动策略进行修正,得到第二去抖动策略;根据当前实时的网络情况和所述第二去抖动策略得到去抖动参数,根据所述去抖动参数设置用于传输如Voip通话的语音通话或视频通话数据的缓冲区大小,使如Voip通话的语音通话或视频通话的时延符合预期,趋于合理。
在实际应用中,所述根据用于评估如Voip通话的语音通话或视频通话质量的特征参数对所述第一去抖动策略进行修正,包括:获取本次通话的历史数据,根据所述本次通话的历史数据对所述第一去抖动策略进行修正。
在实际应用中,所述根据用于评估如Voip通话的语音通话或视频通话质量的特征参数对所述第一去抖动策略进行修正,包括:获取本次通话的信号内容,根据所述本次通话的信号内容对所述第一去抖动策略进行修正。
在实际应用中,所述根据用于评估如Voip通话的语音通话或视频通话质量的特征参数对所述第一去抖动策略进行修正,包括:获取本次通话的感知听觉结果,根据所述感知听觉结果对所述第一去抖动策略进行修正。
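The processing logic of the receiving end (or playback end) includes: obtaining the first de-jitter strategy determined for transmitting voice or video call data such as VoIP call data, where the first de-jitter strategy is derived from a network model built from at least one network parameter, and the at least one network parameter comes from parameters that characterize the network and are extracted from collected offline network data; when playing the voice or video call data of the current call, obtaining the differing processing capability of the terminal device and/or the scheduling characteristics of the application that carries the call; and correcting the first de-jitter strategy according to that processing capability and/or those scheduling characteristics.
As shown in FIG. 14, the above information processing system includes a sending end (or collecting end) 41, a de-jitter end 42, and a receiving end (or playback end) 43.
The sending end (or collecting end) 41 includes: a collecting part 411, configured to collect offline network data, extract from it at least one network parameter that characterizes the network, and use the at least one network parameter to build a network model, where the network model is used to determine a first de-jitter strategy when transmitting voice or video call data such as VoIP call data; a call collecting part 412, configured to obtain, when collecting the voice or video call data of the current call, the differing processing capability of the terminal device and/or the scheduling characteristics of the application that carries the call; and a first strategy correcting part 413, configured to correct the first de-jitter strategy according to that processing capability and/or those scheduling characteristics.
The de-jitter end 42 includes: a strategy determining part 421, configured to build a network model from at least one network parameter and determine a first de-jitter strategy according to the network model, where the at least one network parameter comes from parameters that characterize the network and are extracted from collected offline network data; a second strategy correcting part 422, configured to correct the first de-jitter strategy according to characteristic parameters used to evaluate the quality of a voice or video call such as a VoIP call, to obtain a second de-jitter strategy; and a buffer adjusting part 423, configured to obtain de-jitter parameters according to the current real-time network condition and the second de-jitter strategy, and to set, according to the de-jitter parameters, the size of the buffer used to transmit the voice or video call data, so that the call delay meets expectations.
The receiving end (or playback end) 43 includes: an obtaining part 431, configured to obtain the first de-jitter strategy determined for transmitting voice or video call data such as VoIP call data, where the first de-jitter strategy is derived from a network model built from at least one network parameter, and the at least one network parameter comes from parameters that characterize the network and are extracted from collected offline network data; a call playing part 432, configured to obtain, when playing the voice or video call data of the current call, the differing processing capability of the terminal device and/or the scheduling characteristics of the application that carries the call; and a third strategy correcting part 433, configured to correct the first de-jitter strategy according to that processing capability and/or those scheduling characteristics.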
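For the processor used for data processing, the processing may be implemented with a microprocessor, a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA). The storage medium contains operation instructions, which may be computer-executable code; the steps of the information processing method in the above embodiments of this application are implemented through these operation instructions.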
这里需要指出的是:以上涉及终端和服务器项的描述,与上述方法描述是类似的,同方法的有益效果描述,不做赘述。对于本申请终端和服务器实施例中未披露的技术细节,请参照本申请方法流程描述的实施例所描述内容。
以一个现实应用场景为例对本申请实施例阐述如下:
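In a VoIP call scenario, the embodiments of this application may serve as a scheme for handling end-to-end delay in a voice or video call such as a VoIP call. The modules typically involved end to end in such a call are shown in FIG. 15. End-to-end delay is the time difference between speaker A starting to speak and listener B hearing the sound. VoIP-style voice or video call technology transmits data as packets over an IP network. Because of the inherent characteristics of IP networks, the time each packet takes to cross the network is not fixed, and this variation in transmission time is called jitter. Jitter can be reduced by reasonable route scheduling that selects a low-jitter link. For a link that has already been selected, jitter can be handled by adding buffering delay. If the buffering delay is too large, however, it increases the overall end-to-end delay and degrades the real-time call experience; if it is too small, the audio stutters and call quality suffers. Jitter is handled mainly in the "de-jitter & decode" module in FIG. 15.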
从图15可以看出,端到端的时延主要有:设备的缓存时延(主要是声卡采集的缓存时延、声卡播放的缓存时延)、Voip应用程序各模块处理的数据缓存时延(主要是去抖动模块产生的时延)、网络传输时延(不可控)。本申请实施例可以实现实时通话中降低端到端的时延,涉及到从采集到播放的各个环节,包括以下内容:
一,对于应用程序的去抖动模块:
a)根据不同的网络类型,收集大量现网相关数据,离线训练,建立网络模型,根据不同的大数据网络模型设置时延处理方法和参数;
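b) within a single call, adjusting the network parameter settings and the delay processing parameters from a) according to the history data of that call;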
c)单次通话中,根据感知听觉结果来调整b)中的时延处理参数;
d)单次通话中,根据信号内容调整b)中的时延处理参数;
二,对于设备:
根据设备的不同处理能力、应用程序线程的调度特性等设置不同的时延处理方法和参数。
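For the above application scenario, most existing schemes are de-jitter schemes aimed at network transmission. They are implemented in the "de-jitter & decode" module of FIG. 15, and their block diagrams are shown in FIG. 16 and FIG. 17 respectively.
As shown in FIG. 16, the flow of scheme one includes: determining a network jitter parameter that represents the current network jitter; adjusting the delay parameter of the jitter buffer (Jitter Buffer) according to the current network jitter parameter; and delaying the data packets in the Jitter Buffer according to the adjusted delay parameter. Optionally, the parameter representing the current network jitter is determined first: PktComeThisTime records the number of 10 ms packets that arrive at the Jitter Buffer each time, several PktComeThisTime values are recorded, and their maximum is denoted Pm. A parameter J representing the network jitter is then obtained as a weighted average of a series of Pm values, and the size of the Jitter Buffer is adjusted according to J.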
如图17所示,在方案二的实现流程中,首先,在接收端根据历史数据预测或者估计网络时延dn,同时,统计接收端的丢包率;然后,根据估计的网络时延和统计的丢包率,基于E-Model得到当前理想的去抖动缓冲区的大小;最后,基于缓冲区大小对缓存区数据进行调整。
上述两个方案所存在的问题包括:
1)网络估计方面:网络特性的估计对去抖动算法起着重要指导作用,在两个现有技术方案中,去抖动缓冲区的大小,都是依据当次通话的历史数据估计的网络特性决定的,虽然网络特性估计方法不同,但是共同的缺点都是使用的参数比较单一,对网络的复杂性模拟不够。
而采用本申请实施例,先通过离线抓包,提取相应的参数表征网络特性,通过大量的离线训练,建立不同的网络模型参数,根据建立的网络参数模型决定初始的去抖动算法和相关参数;然后,根据当次通话的历史数据,对去抖动算法和相关参数进行调整。同时,在网络模型的建模上,考虑了整个通话过程中网络的总体特性,也考虑一段时间内的突发性。这样,能更精确的估计网络特性。
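2) De-jitter algorithm. Regarding how buffered data is adjusted, scheme one adjusts the buffer solely according to the network estimate and does not consider how different data content affects auditory perception. As that scheme states, in some cases buffered data must be discarded to keep the delay bounded; the signal type at that moment is not considered, so voice and silence data alike are simply dropped, which is crude and gives a poor call experience. Scheme two uses the E-model for guidance, but within a single call the complexity of the E-model is too high and its practicality is limited. Moreover, the de-jitter algorithms in both schemes adjust in units of whole packets, which also limits flexibility.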
而采用本申请实施例,根据调整时刻信号的内容和传统的感知听觉评价参数来决定去抖动算法的选取,更灵活的处理,使得感知听觉上最终效果更好。
3)在采集和播放方面:上述两个技术方案都没有考虑到不同的采集、播放策略和线程调度对去抖动的影响,而采用本申请实施例,充分考虑了采集、播放策略和线程调度对去抖动的影响。
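For the above application scenario, the overall scheme of the embodiments of this application is shown in FIG. 18 and includes: determining, according to the current network estimate, the lower limit AD_dw and the upper limit AD_up for adjusting the current buffer size. Then, according to the current amount of buffered data JB_len, the values of AD_up and AD_dw, the current signal content, and a model of human auditory perception, the manner and magnitude of adjustment to the current buffer data are decided. At the same time, during capture and playback, the capture and playback strategies are adjusted according to device performance so that data is sent at a more even rate and requested from the buffer at a more even rate, which lets the de-jitter module work in its best state. The implementation is as follows:
1) When JB_len>AD_up:
When JB_len>AD_up×F1: if the current frame is an important frame (such as a voice segment), the current buffer data is compressed; if the current frame is non-important data (such as silence data), the current frame is dropped directly. When JB_len>AD_up×F2 (where F1>F2): if the current frame is an important frame (such as a voice segment), no processing is applied to the current buffer data; if the current frame is non-important data (such as silence data), the current buffer data is compressed.
The compression magnitude is determined by the values of F1 and F2, and each compression removes less than the data length of the current frame.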
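The rationale is that both compressing the signal and dropping it outright damage call quality, and dropping a packet does more damage than compression. With a compression algorithm that works on a single packet, each compression removes less than one frame of data, so compression shortens the buffered data more slowly than dropping the current frame, and therefore reduces end-to-end delay more slowly. Hence the frame is dropped directly only when the buffered data is very long and the current data is non-important. When the buffered data is very long but the current data is important, the buffer length is adjusted with the less damaging method, compression. When the buffered data exceeds a certain threshold but the current frame is important data, nothing is done, which protects the call quality of voice segments as far as possible. The extra delay can be removed quickly once a non-speech (silent) segment arrives, so that end-to-end delay is reduced while the perceived call quality is preserved as far as possible.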
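2) When JB_len<AD_dw:
When JB_len<AD_dw×F3: if the current frame is a non-important frame, the current frame is copied repeatedly, with the number of copies determined by the value of F3; if the current frame is an important frame, the current buffer data is expanded. When JB_len<AD_dw×F4 (where F3<F4), the current buffer is expanded. The magnitude of each expansion is determined by the values of F3 and F4.
The rationale is that, although expanding or directly copying data also damages the sound, this damage affects the call experience far less than the stuttering caused by an empty buffer. So when the buffered data length is found to be below the lower adjustment limit, the principle is to respond quickly and adjust the amount of buffered data as soon as possible.
3) When AD_up>=JB_len>=AD_dw:
In this case, the buffered data is decoded and sent directly to the sound card device, without any de-jitter processing.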
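The three cases 1) to 3) can be sketched as a single decision function. This is an illustrative reading under stated assumptions: the function name `choose_action` and the action labels are hypothetical, F1 > F2 >= 1 and F3 < F4 <= 1 are assumed, and the bands the text does not cover (between AD_up and AD_up×F2, and between AD_dw×F4 and AD_dw) are treated here as "no adjustment".

```python
def choose_action(jb_len, ad_up, ad_dw, f1, f2, f3, f4, important):
    """Pick a jitter-buffer adjustment from buffer length and frame importance.

    jb_len       : current amount of buffered data (JB_len)
    ad_up, ad_dw : upper and lower adjustment limits (AD_up, AD_dw)
    f1..f4       : scaling factors, assumed f1 > f2 >= 1 and f3 < f4 <= 1
    important    : True for an important frame (for example speech)
    """
    if jb_len > ad_up:                       # case 1: buffer too long
        if jb_len > ad_up * f1:
            return "compress" if important else "drop_frame"
        if jb_len > ad_up * f2:
            return "none" if important else "compress"
        return "none"                        # band not specified in the text
    if jb_len < ad_dw:                       # case 2: buffer too short
        if jb_len < ad_dw * f3:
            return "expand" if important else "repeat_frame"
        if jb_len < ad_dw * f4:
            return "expand"
        return "none"                        # band not specified in the text
    return "none"                            # case 3: decode and play as is


print(choose_action(500, 200, 60, 2.0, 1.2, 0.3, 0.8, important=False))
```

With AD_up = 200 and F1 = 2.0, a buffer of 500 units holding a silence frame exceeds AD_up×F1 = 400, so the sketch selects dropping the frame.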
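In the adjustment algorithms of 1) and 2), whether to expand or compress also depends on the signal content and the adjustment algorithm in use. For example, the expansion and compression algorithms are based on the pitch period, and music signals are not suited to them, so if the current signal is detected to be music rather than speech, the adjustment parameters (AD_up, AD_dw, F1~F4) also need to be adjusted appropriately.
At the same time, too many consecutive expansions or compressions produce an audible fast-forward or slow-play effect, so the adjustment algorithms in 1) and 2) also need to be adjusted according to the history of adjustments (for example, by capping the number of consecutive expansions or compressions), so that no fast-forward or slow-play effect is ultimately audible.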
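One way to cap consecutive expansions or compressions is a small stateful filter, sketched below. The class name `AdjustmentLimiter`, the action labels, and the default cap of 3 are all assumptions for illustration; the text only states that a maximum number of consecutive adjustments may be specified.

```python
class AdjustmentLimiter:
    """Suppress an adjustment once it has repeated too many times in a row."""

    def __init__(self, max_consecutive=3):
        self.max_consecutive = max_consecutive
        self.last = None   # kind of the current run ("compress" or "expand")
        self.run = 0       # length of the current run

    def filter(self, action):
        if action not in ("compress", "expand"):
            self.last, self.run = None, 0
            return action
        if action == self.last:
            self.run += 1
        else:
            self.last, self.run = action, 1
        if self.run > self.max_consecutive:
            # Skip this adjustment so the listener does not hear a speed change.
            self.last, self.run = None, 0
            return "none"
        return action
```

The limiter would sit between the buffer-length decision and the actual time-scale modification, passing through every action except a run that grows beyond the cap.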
本方案中,通过离线网络特征建模:通过离线抓包,分析大量现网数据,提取参数,建立不同的网络模型。
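For example, FIG. 19 and FIG. 20 take the "arrival time difference between two consecutive packets" from the offline data as one of the model's characteristic parameters. Compared with FIG. 20, the values in FIG. 19 fluctuate over a wider range, which indicates larger network jitter. In FIG. 19, where the overall jitter is larger, extremely large sudden jitter is rare; in FIG. 20, where the overall jitter is smaller, extremely large burst jitter is more frequent (the arrival time difference between consecutive packets exceeds 1000 ms many times). A conventional jitter value representing the network jitter at the current "moment" can be computed with, for example, the method in RFC 3550, but this is often not enough: although the overall jitter in FIG. 20 is smaller, it has more large bursts. The two network models of FIG. 19 and FIG. 20 can be told apart by computing, over the arrival time differences, a cumulative histogram, the variance, a smoothed envelope over the whole call, the number of bursts, and so on.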
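The contrast between a running jitter estimate and burst statistics can be sketched as follows. `rfc3550_jitter` follows the interarrival jitter recurrence of RFC 3550, J = J + (|D| - J)/16; `interarrival_stats`, its returned field names, and the 1000 ms burst limit as a parameter are assumptions for illustration, and the sample timestamps are made up.

```python
from statistics import pvariance


def rfc3550_jitter(send_ms, recv_ms):
    """Running interarrival jitter in the style of RFC 3550."""
    jitter = 0.0
    for i in range(1, len(recv_ms)):
        # Difference in transit time between consecutive packets.
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def interarrival_stats(recv_ms, burst_ms=1000.0):
    """Variance, burst count and largest gap of the arrival time differences."""
    gaps = [b - a for a, b in zip(recv_ms, recv_ms[1:])]
    return {
        "variance": pvariance(gaps),
        "bursts": sum(g > burst_ms for g in gaps),
        "max_gap": max(gaps),
    }


send = [0, 20, 40, 60]          # hypothetical send times in ms
recv = [100, 120, 1340, 1360]   # one packet is delayed by a large burst
print(rfc3550_jitter(send, recv), interarrival_stats(recv)["bursts"])
```

The smoothed jitter value decays after the burst, while the burst count keeps a record of it, which is why the paragraph above suggests combining several statistics to tell the two network models apart.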
除了“前后两个包到达的时间差”,还可以对连续丢包个数、整体丢包率、乱序率、乱序长度等等作为建模参数来进行分析。
根据当前通话历史的网络参数对去抖动参数的调整:根据步骤1)的结果,初步决定去抖动参数AD_up和AD_dw;然后,根据当前通话的历史数据,调整AD_up和AD_dw。
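For example, analysis of a large amount of offline data shows that different network types, such as 2G, 3G, 4G, and Wi-Fi, exhibit different broad trends in network characteristics. Compared with 4G, for instance, a 2G network is more prone to large jitter caused by network congestion. In that case, larger AD_up and AD_dw can be set for 2G than for 4G at initialization; then, according to the history data of the current call and the network parameters analyzed in 1), the sizes of AD_up and AD_dw and the parameters F1~F4 are adjusted for the specific network model. Even among Wi-Fi networks the characteristics differ. For a network type like that of FIG. 20, where overall jitter is small but large bursts are frequent, smaller AD_up and AD_dw can be set to keep the overall end-to-end delay small; but when JB_len<AD_dw, F3 and F4 can be adjusted to make the expansion strategy more aggressive (a larger expansion magnitude, or more data copied at a time) and the response faster, so that large burst jitter is countered better and sooner.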
根据信号内容对去抖动参数调整:根据当前信号的内容(音乐或者语音等)、重要程度(静音或者非静音等),对去抖动参数进行调整(即对AD_up和AD_dw、F1~F4进行调整)。如:音乐信号时,相同网络情况下,尽量使用较大的AD_up和AD_dw。总体原则是:重要帧的地方,尽量少做去抖动处理;缓冲区长度大于AD_up时,调整策略可以稍微缓一下,等到非重要帧再进行处理;而缓冲区长度小于AD_dw时,需要尽快调整,避免卡顿。尽量保证听觉感知质量的情况下,在一定必要时才做去抖动处理。
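Adjusting the de-jitter parameters according to auditory perception: when the signal is expanded, compressed, or otherwise adjusted in duration, the adjustment frequency must be controlled so that no fast-forward or slow-play effect is audible.
Adaptation of capture and playback devices: in FIG. 18, because devices differ in processing capability and applications differ in scheduling characteristics, packets may be sent at an uneven rate or with no regularity at all, whereas the de-jitter module is designed on the assumption that packets are sent at an even or regular rate. The evenness of the sending rate is determined mainly by the sound card's capture mode and the thread scheduling characteristics. For example, if sound card callbacks drive the application's encoding and sending, the interval between two sound card callbacks is uneven more often on Android devices than on iOS devices, and the lower the device's performance, the more often this happens. In this case, depending on device performance, either sound card callbacks or timer callbacks can be used to drive encoding and sending, so that packets are sent at more even intervals. Likewise, at the playback end, the application should request data from the buffer at as even a rate as possible; only then can the de-jitter module work in its best state and the end-to-end delay be minimized. As for differences in thread scheduling: on the same device, an audio-video call involves video capture, encoding, and decoding, and because the processing capability of a handheld device is limited, thread scheduling is less even than in an audio-only call. In this case, after the thread scheduling method has been fully optimized, the de-jitter parameters can be increased appropriately under the same network conditions to reduce stuttering.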
此处需要说明的是,上述子部分与对应的步骤所实现的示例和应用场景相同,但不限于上述实施例1所公开的内容。需要说明的是,上述子部分作为装置的一部分可以运行在如图1所示的硬件环境中,可以通过软件实现,也可以通过硬件实现,其中,硬件环境包括网络环境。
本申请实施例还提供了一种用于实施上述方法的服务器或终端。
图7是根据本申请实施例的一种终端的结构框图,如图7所示,该终端可以包括:一个或多个(图中仅示出一个)处理器701、存储器703、以及传输装置705(如上述实施例中的发送装置),如图7所示,该终端还可以包括输入输出设备707。
其中,存储器703可配置为存储软件程序以及子部分,如本申请实施例中的方法和装置对应的程序指令/子部分,处理器701通过运行存储在存储器703内的软件程序以及子部分,从而执行各种功能应用以及数据处理,即实现上述的方法。存储器703可包括高速随机存储器,还可以包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器703可进一步包括相对于处理器701远程设置的存储器,这些远程存储器可以通过网络连接至终端。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
所述处理器701可为中央处理器、微处理器、数字信号处理器、应用处理器或可编程阵列等。
所述处理器701可通过集成电路总线等与存储器703连接。
上述的传输装置705配置为经由一个网络接收或者发送数据,还可以用于处理器与存储器之间的数据传输。上述的网络具体实例可包括有线网络及无线网络。在一个实例中,传输装置705包括一个网络适配器(Network Interface Controller,NIC),其可通过网线与其他网络设备与路由器相连从而可与互联网或局域网进行通讯。在一个实例中,传输装置705为射频(Radio Frequency,RF)子部分,其配置为通过无线方式与互联网进行通讯。
其中,可选地,存储器703用于存储应用程序。
处理器701可以通过传输装置705调用存储器703存储的应用程序,以执行下述步骤:基于第一客户端通过预设网络接收到的第二客户端发送的第一数据包,判断第二客户端通过预设网络向第一客户端发送的第一媒体信息是否发生丢包,其中,第一媒体信息包括第一数据包,第一媒体信息是第二客户端与第一客户端进行音频通话或视频通话时传输的媒体信息;在判断出第一媒体信息发生丢包的情况下,获取预设网络的网络状态信息;在网络状态信息满足预设条件的情况下,向第二客户端发送重传请求,其中,重传请求用于请求第二客户端重传第一媒体信息中丢失的第二数据包,预设条件用于指示预设网络重传第二数据包所需达到的网络条件;在网络状态信息不满足预设条件的情况下,取消向第二客户端发送重传请求。
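The processor 701 is further configured to perform the following steps: after the retransmission request is sent to the second client, receiving the second data packet sent by the second client; generating second media information according to the first data packet and the second data packet; and, when the network state information does not satisfy the preset condition, generating third media information according to the first data packet.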
处理器701还配置为执行下述步骤:在获取预设网络的网络状态信息之后、且在向第二客户端发送重传请求或取消向第二客户端发送重传请求之前,判断网络状态信息所指示的预设网络的第一网络状态是否与重传第二数据包所需的第二网络状态匹配;在第一网络状态与第二网络状态匹配的情况下,判断出网络状态信息满足预设条件;在第一网络状态与第二网络状态不匹配的情况下,判断出网络状态信息不满足预设条件。
处理器701还配置为执行下述步骤:判断带宽阈值与当前使用带宽的差值是否小于第一预设值;判断当前传输时延是否小于传输时延阈值;判断当前丢包率是否小于丢包率阈值;判断连续丢包的数量是否小于第二预设值;其中,预设判断结果用于指示第一网络状态与第二网络状态匹配,预设判断结果包括以下至少之一:判断出带宽阈值与当前使用带宽的差值小于第一预设值;判断出当前传输时延小于传输时延阈值;判断出当前丢包率小于丢包率阈值;判断出连续丢包的数量小于第二预设值。
采用本申请实施例,提供了一种通话方法的方案。基于第一客户端通过预设网络接收到的第二客户端发送的第一数据包,判断第二客户端通过预设网络向第一客户端发送的第一媒体信息是否发生丢包,其中,第一媒体信息包括第一数据包;在判断出第一媒体信息发生丢包的情况下,获取预设网络的网络状态信息;在网络状态信息满足预设条件的情况下,向第二客户端发送重传请求,其中,重传请求用于请求第二客户端重传第一媒体信息中丢失的第二数据包;在网络状态信息不满足预设条件的情况下,取消向第二客户端发送重传请求,在网络情况允许的情况下,通过重传请求获取丢失的数据包,达到使媒体信息更为完整的目的,从而实现了提升即时通讯质量的技术效果,进而解决了相关技术中由于网络拥堵造成的即时通讯质量较差的技术问题。
可选地,本实施例中的具体示例可以参考上述实施例1和实施例2中所描述的示例,本实施例在此不再赘述。
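A person of ordinary skill in the art will understand that the structure shown in FIG. 7 is only illustrative. The terminal may be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Device, MID), or a PAD. FIG. 7 does not limit the structure of the above electronic apparatus. For example, the terminal may include more or fewer components than shown in FIG. 7 (such as a network interface or a display apparatus), or have a configuration different from that shown in FIG. 7.
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing the hardware of a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
An embodiment of this application further provides a computer storage medium. Optionally, in this embodiment, the storage medium may be used to store computer-executable instructions such as program code for performing the call method.
Optionally, in this embodiment, the storage medium may be located on at least one of the multiple network devices in the network shown in the above embodiments.
The computer storage medium may be a non-transitory storage medium.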
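Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
S1: based on a first data packet that the first client receives from the second client over the preset network, judge whether packet loss has occurred in the first media information that the second client sends to the first client over the preset network, where the first media information includes the first data packet and is the media information transmitted when the second client and the first client conduct an audio or video call;
S2: when it is determined that packet loss has occurred in the first media information, obtain the network state information of the preset network;
S3: when the network state information satisfies the preset condition, send a retransmission request to the second client, where the retransmission request asks the second client to retransmit the second data packet lost from the first media information, and the preset condition indicates the network condition that the preset network must reach to retransmit the second data packet;
S4: when the network state information does not satisfy the preset condition, cancel sending the retransmission request to the second client.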
可选地,存储介质还被设置为存储用于执行以下步骤的程序代码:在向第二客户端发送重传请求之后,接收第二客户端发送的第二数据包;根据第一数据包和第二数据包生成第二媒体信息;在网络状态信息不满足预设条件的情况下,根据第一数据包生成第三媒体信息。
可选地,存储介质还被设置为存储用于执行以下步骤的程序代码:在获取预设网络的网络状态信息之后、且在向第二客户端发送重传请求或取消向第二客户端发送重传请求之前,判断网络状态信息所指示的预设网络的第一网络状态是否与重传第二数据包所需的第二网络状态匹配;在第一网络状态与第二网络状态匹配的情况下,判断出网络状态信息满足预设条件;在第一网络状态与第二网络状态不匹配的情况下,判断出网络状态信息不满足预设条件。
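Optionally, the storage medium is further configured to store program code for performing the following steps: judging whether the difference between the bandwidth threshold and the currently used bandwidth is less than a first preset value; judging whether the current transmission delay is less than the transmission delay threshold; judging whether the current packet loss rate is less than the packet loss rate threshold; and judging whether the number of consecutively lost packets is less than a second preset value. A preset judgment result indicates that the first network state matches the second network state, and the preset judgment result includes at least one of the following: the difference between the bandwidth threshold and the currently used bandwidth is judged to be less than the first preset value; the current transmission delay is judged to be less than the transmission delay threshold; the current packet loss rate is judged to be less than the packet loss rate threshold; the number of consecutively lost packets is judged to be less than the second preset value.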
可选地,本实施例中的具体示例可以参考上述实施例1和实施例2中所描述的示例,本实施例在此不再赘述。
可选地,在本实施例中,上述存储介质可以包括但不限于:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
以下还结合图8提供一个通话方法的实施例,包括:
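Step S1: packet loss detection. For example, whether packets have been lost is judged according to the sequence index information in the packet header. If no packet loss is detected in step S1, no retransmission request is sent and the subsequent flow continues; otherwise, proceed to step S2.
Step S2: analyze the network characteristics of the current network condition. The network characteristics include, but are not limited to: bit rate in use, estimated bandwidth, packet loss rate, jitter, and end-to-end transmission delay.
Step S3: compute the relevant thresholds of the corresponding network parameters from the analysis results of step S2. Computing the relevant thresholds includes, but is not limited to, the following. Bandwidth threshold: determined from the estimated bandwidth; under given conditions, when the bit rate in use exceeds a certain threshold, sending a retransmission request is not allowed.
For example, the transmission delay threshold is determined from the network jitter. Under a given jitter, when the transmission delay exceeds a certain threshold, sending a retransmission request is not allowed, because even if a request were sent, the retransmitted response data would probably arrive too late to be used, and its utilization would be too low.
As another example, the packet loss rate threshold is determined for the current loss rate from the historical packet loss rate and an analysis of the packet loss model. On a network with insufficient bandwidth or an extremely high loss rate, the more data is sent, the more is lost, and sending retransmission requests that add to the network load is then useless or even harmful.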
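Step S4: adjust the previously determined network parameter thresholds according to the response utilization of retransmission requests. This utilization is one of the aforementioned predetermined parameters.
For example, compute the ratio of received response data to retransmission requests. The history data cached by client B has a length limit. If the transmission delay from client A to client B is too large, the packets named in a retransmission request may already have left client B's cache by the time the request arrives, so client B does not respond to client A's request, and the ratio of received response data to retransmission requests becomes very low. The network parameter thresholds needed to keep this ratio above a certain value can therefore be derived from the ratio.
As another example, to keep client A from wasting bandwidth by sending too many retransmission requests, the sending frequency of retransmission requests needs to be lowered, that is, the relevant network parameter thresholds need to be raised. Compute the actual utilization of response data: after client B receives a retransmission request and finds the corresponding data in its history cache, it resends the data to client A as a response packet. If the transmission delay from client B to client A is too large, the response data may no longer meet the real-time requirements of the call when it reaches client A; it becomes a late packet that must be discarded. Response data exists, but its utilization is too low. If the actual utilization stays low over a period of time, the retransmission request frequency also needs to be lowered, that is, the relevant network parameter thresholds need to be raised.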
这些相关阈值即描述了前述的网络条件。
步骤S5:传输的数据包对应的信号特性分析:对信号进行分析,如清音、浊音分析,语音、静音分析,语义重要性分析等,再利用步骤S4调整后的网络参数的相关阈值,比如,带宽足够时,只要检测到丢包就可以进行重传请求,带宽不够时,只对丢失的重要语音帧进行重传请求。
步骤S6:请求判决:根据网络参数的相关阈值、当前的网络状况、信号特性等进行综合判决,决定有丢包时,是否允许发送重传请求。若允许重传,则发送重传请求,若不允许重传,则禁止发送重传请求,并返回步骤S1。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
上述实施例中的集成的部分如果以软件功能部分的形式实现并作为独立的产品销售或使用时,可以存储在上述计算机可读取的存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在存储介质中,包括若干指令用以使得一台或多台计算机设备(可为个人计算机、服务器或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。
在本申请的上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本申请所提供的几个实施例中,应该理解到,所揭露的客户端,可通过其它的方式实现。其中,以上所描述的装置实施例仅仅是示意性的,例如所述部分的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个部分或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,部分或子部分的间接耦合或通信连接,可以是电性或其它的形式。
所述作为分离部件说明的部分可以是或者也可以不是物理上分开的,作为部分显示的部件可以是或者也可以不是物理部分,即可以位于一个地方,或者也可以分布到多个网络部分上。可以根据实际的需要选择其中的部分或者全部部分来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能部分可以集成在一个处理部分中,也可以是各个部分单独物理存在,也可以两个或两个以上部分集成在一个部分中。上述集成的部分既可以采用硬件的形式实现,也可以采用软件功能部分的形式实现。
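The above are only preferred implementations of this application. It should be noted that, for a person of ordinary skill in the art, any modification made in accordance with the principles of this application should be understood as falling within the protection scope of this application.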
工业实用性
本申请实施例中在请求数据包重发之前,会获取接收重传的数据包的预设网络的网络状况,以根据网络状况确定是否请求重传,这样的话,就可以减少因为预设网络状况在拥堵时,还接收到大量的重传请求,导致的网络的进一步拥堵,以使得预设网络保留更多的资源用于新传数据的传输,提升传输效率,从而具有积极的工业效果。且可以在终端设备中运行对应的计算机程序等计算机可执行指令,具有工业可实现性强的特点。

Claims (41)

  1. 一种通话方法,包括:
    基于第一客户端通过预设网络接收到的第二客户端发送的第一数据包,判断所述第二客户端通过所述预设网络向所述第一客户端发送的第一媒体信息是否发生丢包,其中,所述第一媒体信息包括初传成功的所述第一数据包,所述第一媒体信息是所述第二客户端与所述第一客户端进行音频通话或视频通话时传输的媒体信息;
    在判断出所述第一媒体信息发生丢包的情况下,获取所述预设网络的网络状态信息;
    确定请求所述第二客户端重传第二数据包的预定参数,其中,所述第二数据包为所述第一媒体信息中传输失败的数据包的重传数据包;所述预定参数包括:重传成功的第一概率阈值及成功输出所述第二数据包的第二概率阈值的至少其中之一;
    根据所述预定参数,确定请求重传时所述网络状况信息需要满足的预设条件,其中,所述预设条件用于指示所述预设网络成功重传所述第二数据包的概率不小于所述第一概率阈值所需的网络条件,和/或,用于指示成功重传的所述第二数据包能够成功被输出的概率不小于所述第二概率阈值所需的网络条件;
    在所述网络状态信息满足所述预设条件的情况下,向所述第二客户端发送重传请求;
    在所述网络状态信息不满足所述预设条件的情况下,取消向所述第二客户端发送所述重传请求。
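  2. The method according to claim 1, wherein
    after the retransmission request is sent to the second client, the method further comprises: receiving the second data packet sent by the second client; and generating second media information according to the first data packet and the second data packet;
    when the network state information does not satisfy the preset condition, the method further comprises: generating third media information according to the first data packet.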
  3. 根据权利要求1所述的方法,其中,在获取所述预设网络的网络状态信息之后、且在向所述第二客户端发送重传请求或取消向所述第二客户端发送所述重传请求之前,所述方法还包括:
    判断所述网络状态信息所指示的所述预设网络的第一网络状态是否与重传所述第二数据包所需的第二网络状态匹配;
    在所述第一网络状态与所述第二网络状态匹配的情况下,判断出所述网络状态信息满足所述预设条件;在所述第一网络状态与所述第二网络状态不匹配的情况下,判断出所述网络状态信息不满足所述预设条件。
  4. 根据权利要求3所述的方法,其中,判断所述网络状态信息所指示的所述预设网络的第一网络状态是否与重传所述第二数据包所需的第二网络状态匹配包括以下至少之一:
    判断带宽阈值与当前使用带宽的差值是否小于第一预设值;
    判断当前传输时延是否小于传输时延阈值;
    判断当前丢包率是否小于丢包率阈值;
    判断连续丢包的数量是否小于第二预设值;
    其中,预设判断结果用于指示所述第一网络状态与所述第二网络状态匹配,所述预设判断结果包括以下至少之一:判断出所述带宽阈值与所述当前使用带宽的差值小于所述第一预设值;判断出所述当前传输时延小于所述传输时延阈值;判断出所述当前丢包率小于所述丢包率阈值;判断出连续丢包的数量小于所述第二预设值。
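  5. The method according to claim 4, wherein, before judging whether the first network state of the preset network indicated by the network state information matches the second network state required for retransmitting the second data packet, the method further comprises:
    obtaining the currently used bandwidth, the current transmission delay, and the current packet loss rate that characterize the first network state, and the second preset value that describes the number of consecutive packet losses allowed;
    determining the bandwidth threshold according to bandwidth information of the preset network;
    determining the transmission delay threshold according to network jitter information of the preset network;
    determining the packet loss rate threshold according to a historical packet loss rate and a packet loss model.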
  6. 根据权利要求1至5中任一项所述的方法,其中,在向所述第二客户端发送重传请求或取消向所述第二客户端发送所述重传请求之后,所述方法还包括以下至少之一:
    根据前一次确定的带宽阈值和所述预设网络的当前带宽信息重新确定当前的带宽阈值;
    在接收到的所述第二数据包的数量与发送的所述重传请求的数量的第一比值小于第三预设值的情况下,增大丢包率阈值,并减小传输时延阈值;
    在接收到的有效的所述第二数据包与接收到的所有所述第二数据包间的第二比值小于第四预设值的情况下,增大所述丢包率阈值,并减小所述传输时延阈值。
  7. 根据权利要求1所述的方法,其中,
    在向所述第二客户端发送重传请求之前,所述方法还包括:通过对所述第一数据包中的媒体信息段进行信号特征分析确定丢失的所述第二数据包的语音特征;
    在所述网络状态信息满足预设条件的情况下,向所述第二客户端发送重传请求包括:在所述网络状态信息满足所述预设条件,且所述语音特征包括浊音特征、语音特征以及语义特征中的至少一个的情况下,向所述第二客户端发送重传请求。
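  8. The method according to claim 1, wherein judging whether packet loss has occurred in the first media information sent by the second client to the first client over the preset network comprises:
    judging, according to sequence index information in the first data packet, whether packet loss has occurred in the first media information.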
  9. 根据权利要求1所述的方法,其中,
    所述根据所述预定参数,确定请求重传时所述网络状况信息需要满足的预设条件,包括以下至少之一:
    根据所述第二客户端缓存所述第一媒体信息的缓存时间,确定所述重传请求以不小于所述第一概率阈值在所述缓存时间内达到所述第二客户端所需的第一网络条件;
    根据所述第一客户端中媒体信息的输出速率,确定所述第二数据包达到所述第一客户端后以不小于所述第二概率阈值被输出所需的第二网络条件。
  10. 根据权利要求1所述的方法,其中,所述方法还包括:
    根据当前的所述网络状况信息和第二去抖动策略得到去抖动参数,根据所述去抖动参数设置用于传输所述音频通话或视频通话的通话数据的缓冲区容量,使所述音频通话或视频通话的时延符合预期。
  11. 根据权利要求10所述的方法,其中,所述方法还包括:
    采集离线网络数据,从所述离线网络数据中提取出用于表征网络特征的至少一个网络参数;
    根据所述至少一个网络参数构建网络模型,根据所述网络模型确定第一去抖动策略;
    根据用于评估音频通话或视频通话的通话质量的特征参数对所述第一去抖动策略进行修正,得到第二去抖动策略。
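  12. The method according to claim 11, wherein
    correcting the first de-jitter strategy according to the characteristic parameters used to evaluate the call quality of the audio call or video call comprises:
    obtaining history data of the current audio call or video call;
    correcting the first de-jitter strategy according to the history data of the current audio call or video call.
  13. The method according to claim 10, wherein correcting the first de-jitter strategy according to the characteristic parameters used to evaluate the call quality of the audio call or video call comprises:
    obtaining signal content of the current audio call or video call;
    correcting the first de-jitter strategy according to the signal content of the current audio call or video call.
  14. The method according to claim 10, wherein correcting the first de-jitter strategy according to the characteristic parameters used to evaluate the call quality of the audio call or video call comprises:
    obtaining a perceptual listening result of the current audio call or video call;
    correcting the first de-jitter strategy according to the perceptual listening result.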
  15. 根据权利要求10所述的方法,其中,所述方法还包括:
    采集本次音频通话或视频通话的通话数据时,获取终端设备的不同处理能力和/或作为通话媒介的应用的调度特性;
    根据所述终端设备的不同处理能力和/或作为所述通话媒介的应用的调度特性对所述第一去抖动策略进行修正。
  16. 根据权利要求10所述的方法,其中,所述方法还包括:
    播放本次音频通话或视频通话的通话数据时,获取终端设备的不同处理能力和/或作为所述通话媒介的应用的调度特性;
    根据所述终端设备的不同处理能力和/或作为所述通话媒介的应用的调度特性对所述第一去抖动策略进行修正。
  17. 根据权利要求1或10所述的方法,其中,所述方法还包括:
    判断所述第一客户端和所述第二客户端是否处于同时采集到声音的双讲状态;
    当处于所述双讲状态时,对所述语音通话或所述视频通话进行提升通话质量的特定处理。
  18. 根据权利要求17所述的方法,其中,
    所述判断所述第一客户端和所述第二客户端是否处于同时采集到声音的双讲状态,包括:
    根据所述第一媒体信息,获取所述第一客户端提供的远端信号,所述远端信号是根据语音通话的对端发送的声音信号所获得的信号;
    对所述远端信号叠加超声波信号,获得叠加所述超声波信号后的混合信号,并通过扬声器部分播放所述混合信号;
    获取所述第二客户端的近端信号,所述近端信号是第二客户端通过麦克风部分采集到的声音信号;
    根据所述超声波信号确定所述混合信号中的第一信号段和所述近端信号中的第二信号段;
    计算所述第一信号段与所述第二信号段之间的相关值;
    当所述相关值小于预设的相关值阈值时,确定所述麦克风部分采集到所述近端信号时的通话状态为双讲状态。
  19. 根据权利要求18所述的方法,其中,所述对所述远端信号叠加超声波信号之前,还包括:
    检测所述远端信号的功率值是否大于预设功率阈值;
    当检测结果为所述远端信号的功率值大于所述预设功率阈值时,执行所述对所述远端信号叠加超声波信号的步骤。
  20. 根据权利要求18所述的方法,其中,所述获取远端信号,包括:
    对接收到的所述第一媒体信息中的声音信号进行低通滤波,获得所述远端信号;
    其中,所述低通滤波的截止频率低于所述超声波信号的最低频率。
  21. 根据权利要求18所述的方法,其中,所述根据所述超声波信号确定所述远端信号中的第一信号段和所述近端信号中的第二信号段,包括:
    将所述近端信号中,承载目标数据信息的超声波信号所对应的时域上的信号确定为所述第二信号段;
    确定最近播放的,且叠加有承载所述目标数据信息的超声波信号的混合信号的播放时间;
    将所述混合信号中,在所述播放时间上播放的信号确定为所述第一信号段。
  22. 根据权利要求18所述的方法,其中,所述根据所述超声波信号确定所述远端信号中的第一信号段和所述近端信号中的第二信号段,包括:
    将所述混合信号中,承载目标数据信息的超声波信号所对应的时域上的信号确定为所述第一信号段;
    在所述第一信号段被播放后采集到的所述近端信号中,查询承载所述目标数据信息的超声波信号所对应的时域上的信号;
    将查询获得的信号确定为所述第二信号段。
  23. 根据权利要求18所述的方法,其中,叠加在所述远端信号上的所述超声波信号所承载的数据信息在预定周期内不重复;
    所述预定周期大于或者等于回声时延的最大值,所述回声时延是所述扬声器部分播放所述混合信号到所述麦克风部分采集到所述混合信号对应的回声之间的时延。
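  24. The method according to any one of claims 18 to 23, wherein
    the data information carried by the ultrasonic signal comprises several ultrasonic codes, each ultrasonic code consists of at least two code parts, and each code part indicates whether a signal is present at each of at least two ultrasonic frequency points.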
  25. 根据权利要求18至23任一所述的方法,其中,所述计算所述第一信号段与所述第二信号段之间的相关值,包括:
    分别获取所述第一信号段与所述第二信号段各自对应的功率谱;
    对所述第一信号段与所述第二信号段各自对应的功率谱进行二值化处理,获得所述第一信号段与所述第二信号段各自对应的二值化数组;
    计算所述第一信号段与所述第二信号段各自对应的二值化数组之间的相关值。
  26. 根据权利要求18至23任一所述的方法,其特征在于,所述方法还包括:
    在对所述远端信号叠加超声波信号之前,检测将所述远端信号和所述超声波信号叠加之后获得的声音信号的幅值是否超出预设的幅值范围;
    若检测结果为所述声音信号的幅值超出所述预设的幅值范围,则按照预定的衰减策略对所述远端信号的幅值进行衰减处理。
  27. 一种通话装置,包括:
    第一判断部分,配置为基于第一客户端通过预设网络接收到的第二客户端发送的第一数据包,判断所述第二客户端通过所述预设网络向所述第一客户端发送的第一媒体信息是否发生丢包,其中,所述第一媒体信息包括初传成功的所述第一数据包,所述第一媒体信息是所述第二客户端与所述第一客户端进行音频通话或视频通话时传输的媒体信息;
    参数确定部分,配置为确定请求所述第二客户端重传第二数据包的预定参数,其中,所述第二数据包为所述第一媒体信息中传输失败的数据包的重传数据包;所述预定参数包括:重传成功的第一概率阈值及成功输出所述第二数据包的第二概率阈值的至少其中之一;
    条件确定部分,配置为根据所述预定参数,确定请求重传时所述网络状况信息需要满足的预设条件,其中,所述预设条件用于指示所述预设网络成功重传所述第二数据包的概率不小于所述第一概率阈值所需的网络条件,和/或,用于指示成功重传的所述第二数据包能够成功被输出的概率不小于所述第二概率阈值所需的网络条件;
    第一获取部分,配置为在判断出所述第一媒体信息发生丢包的情况下,获取所述预设网络的网络状态信息;
    第一执行部分,配置为在所述网络状态信息满足预设条件的情况下,向所述第二客户端发送重传请求;
    第二执行部分,配置为在所述网络状态信息不满足所述预设条件的情况下,取消向所述第二客户端发送所述重传请求。
  28. 根据权利要求27所述的装置,其中,所述装置还包括:
    接收部分,配置为在向所述第二客户端发送重传请求之后,接收所述第二客户端发送的所述第二数据包;
    第一生成部分,配置为根据所述第一数据包和所述第二数据包生成第二媒体信息;
    第二生成部分,配置为在所述网络状态信息不满足预设条件的情况下,根据所述第一数据包生成第三媒体信息。
  29. 根据权利要求27所述的装置,其中,所述装置还包括:
    第二判断部分,配置为在获取所述预设网络的网络状态信息之后、且在向所述第二客户端发送重传请求或取消向所述第二客户端发送所述重传请求之前,判断所述网络状态信息所指示的所述预设网络的第一网络状态是否与重传所述第二数据包所需的第二网络状态匹配;
    第一确定部分,配置为在所述第一网络状态与所述第二网络状态匹配的情况下,判断出所述网络状态信息满足所述第一预设条件;
    第二确定部分,配置为在所述第一网络状态与所述第二网络状态不匹配的情况下,判断出所述网络状态信息不满足所述第一预设条件。
  30. 根据权利要求29所述的装置,其中,所述第二判断部分包括:
    第一判断子部分,配置为判断带宽阈值与当前使用带宽的差值是否小于第一预设值;
    第二判断子部分,配置为判断当前传输时延是否小于传输时延阈值;
    第三判断子部分,配置为判断当前丢包率是否小于丢包率阈值;
    第四判断子部分,配置为判断连续丢包的数量是否小于第二预设值;
    其中,预设判断结果用于指示所述第一网络状态与所述第二网络状态匹配,所述预设判断结果包括以下至少之一:判断出所述带宽阈值与所述当前使用带宽的差值小于所述第一预设值;判断出所述当前传输时延小于所述传输时延阈值;判断出所述当前丢包率小于所述丢包率阈值;判断出连续丢包的数量小于所述第二预设值。
  31. 根据权利要求30所述的装置,其中,所述装置还包括:
    第二获取部分,配置为在判断所述网络状态信息所指示的所述预设网络的第一网络状态是否与重传所述第二数据包所需的第二网络状态匹配之前,获取用于表征所述第一网络状态的所述当前使用带宽、所述当前传输时延、所述当前丢包率以及用于描述允许连续丢包数量的所述第二预设值;
    第三确定部分,配置为根据所述预设网络的带宽信息确定所述带宽阈值;
    第四确定部分,配置为根据所述预设网络的网络抖动信息确定所述传输时延阈值;
    第五确定部分,配置为根据历史丢包率和丢包模型确定所述丢包率阈值。
  32. 根据权利要求27至31中任意一项所述的装置,其中,
    所述装置还包括:
    第一更新部分,配置为在向所述第二客户端发送重传请求或取消向所述第二客户端发送所述重传请求之后,根据前一次确定的带宽阈值和所述预设网络的当前带宽信息重新确定当前的带宽阈值;
    第二更新部分,配置为在接收到的所述第二数据包的数量与发送的所述重传请求的数量的第一比值小于第三预设值的情况下,增大丢包率阈值,并减小传输时延阈值;
    第三更新部分,配置为在接收到的有效的所述第二数据包与接收到的所有所述第二数据包间的第二比值小于第四预设值的情况下,增大所述丢包率阈值,并减小所述传输时延阈值。
  33. 根据权利要求27所述的装置,其中,
    所述装置还包括:
    第六确定部分,配置为在向所述第二客户端发送重传请求之前,通过对所述第一数据包中的媒体信息段进行信号特征分析确定丢失的所述第二数据包的语音特征;
    所述第一执行部分,还配置为在所述网络状态信息满足所述预设条件,且所述语音特征包括浊音特征、语音特征以及语义特征中的至少一个的情况下,向所述第二客户端发送重传请求。
  34. 根据权利要求27所述的装置,其中,
    所述第一判断部分,还配置为根据所述第一数据包中的序号索引信息判断所述第一媒体信息是否发生丢包。
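  35. The apparatus according to claim 27, wherein
    the condition determining part is configured to perform at least one of the following:
    determining, according to the cache time for which the second client caches the first media information, a first network condition required for the retransmission request to reach the second client within the cache time with a probability no less than the first probability threshold;
    determining, according to the output rate of media information in the first client, a second network condition required for the second data packet to be output, after it reaches the first client, with a probability no less than the second probability threshold.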
  36. 根据权利要求27所述的装置,其中,还包括:
    缓冲区调整部分,配置为根据当前实时的网络情况和所述第二去抖动策略得到去抖动参数,根据所述去抖动参数设置用于传输所述语音通话或视频通话的通话数据的缓冲区容量,使音频通话或视频通话的时延符合预期。
  37. 根据权利要求36所述的装置,其中,还包括:
    采集部分,配置为采集离线网络数据,从所述离线网络数据中提取出用于表征网络特征的至少一个网络参数;
    策略确定部分,配置为根据所述至少一个网络参数构建网络模型,根据所述网络模型确定第一去抖动策略;
    策略修正部分,配置为根据用于评估音频通话或视频通话的通话质量的特征参数对所述第一去抖动策略进行修正,得到第二去抖动策略。
  38. 根据权利要求27或36所述的装置,其中,还包括:
    状态确定部分,配置为判断所述第一客户端和所述第二客户端是否处于同时采集到声音的双讲状态;当处于所述双讲状态时,对所述语音通话或所述视频通话进行提升通话质量的特定处理。
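  39. The apparatus according to claim 38, further comprising:
    a far-end signal obtaining part, configured to obtain a far-end signal of the first client according to the first media information, where the far-end signal is a signal obtained from the sound signal sent by the peer end of the voice call;
    a signal superimposing part, configured to superimpose an ultrasonic signal on the far-end signal to obtain a mixed signal on which the ultrasonic signal is superimposed;
    a playing part, configured to play the mixed signal through a loudspeaker part;
    a near-end signal obtaining part, configured to obtain a near-end signal of the second client, where the near-end signal is a sound signal collected through a microphone part;
    a signal determining part, configured to determine, according to the ultrasonic signal, a first signal segment in the mixed signal and a second signal segment in the near-end signal;
    a correlation value calculating part, configured to calculate a correlation value between the first signal segment and the second signal segment;
    the state determining part is configured to determine, when the correlation value is less than a preset correlation value threshold, that the call state at the time the microphone part collected the near-end signal is the double-talk state.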
  40. 一种计算机存储介质,所述计算机存储介质中存储有计算机可执行指令,所述计算机可执行指令用于执行权利要求1至26任一项所述通话方法。
  41. 一种终端,包括:
    存储器,配置为存储计算机可执行指令;
    处理器,与所述存储器连接,配置为通过执行所述计算机可执行指令,实现权利要求1至26任一项所述通话方法。
PCT/CN2017/095309 2016-09-22 2017-07-31 通话方法、装置、计算机存储介质及终端 WO2018054171A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP17852241.3A EP3490199B1 (en) 2016-09-22 2017-07-31 Calling method and terminal
US16/208,473 US10693799B2 (en) 2016-09-22 2018-12-03 Calling method and device, computer storage medium, and terminal

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
CN201610844042.2 2016-09-22
CN201610844042.2A CN107864084B (zh) 2016-09-22 2016-09-22 数据包的传输方法和装置
CN201610940605.8A CN107979482B (zh) 2016-10-25 2016-10-25 一种信息处理方法、装置、发送端、去抖动端、接收端
CN201610940605.8 2016-10-25
CN201610945642.8A CN106506872B (zh) 2016-11-02 2016-11-02 通话状态检测方法及装置
CN201610945642.8 2016-11-02

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/208,473 Continuation US10693799B2 (en) 2016-09-22 2018-12-03 Calling method and device, computer storage medium, and terminal

Publications (1)

Publication Number Publication Date
WO2018054171A1 true WO2018054171A1 (zh) 2018-03-29

Family

ID=61690767

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/095309 WO2018054171A1 (zh) 2016-09-22 2017-07-31 通话方法、装置、计算机存储介质及终端

Country Status (3)

Country Link
US (1) US10693799B2 (zh)
EP (1) EP3490199B1 (zh)
WO (1) WO2018054171A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109361776A (zh) * 2018-12-18 2019-02-19 郑州云海信息技术有限公司 云计算系统中消息传输方法和装置
CN114027667A (zh) * 2021-12-01 2022-02-11 慕思健康睡眠股份有限公司 一种在离床状态判定方法、装置、智能床垫及介质
CN114173164A (zh) * 2021-12-18 2022-03-11 杭州视洞科技有限公司 一种基于国标gb28181协议的平滑推流方法

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073445B (zh) * 2016-11-18 2021-10-22 腾讯科技(深圳)有限公司 基于分布式流计算的背压处理方法和系统
CN110351201B (zh) * 2018-04-04 2021-09-14 华为技术有限公司 一种数据处理方法及装置
US10937418B1 (en) * 2019-01-04 2021-03-02 Amazon Technologies, Inc. Echo cancellation by acoustic playback estimation
US11133888B2 (en) * 2019-05-06 2021-09-28 Qualcomm Incorporated Codec configuration adaptation based on packet loss rate
KR20210123835A (ko) * 2020-04-06 2021-10-14 삼성전자주식회사 네트워크의 상태에 기반한 패킷의 재전송을 수행하는 전자 장치 및 전자 장치의 동작 방법
CN111478826A (zh) * 2020-06-09 2020-07-31 北京大米科技有限公司 丢包率确定方法、数据传输控制方法和数据传输系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7032153B1 (en) * 2000-11-28 2006-04-18 Nortel Networks Limited Dynamic automatic retransmission request in wireless access networks
CN102075312A (zh) * 2011-01-10 2011-05-25 西安电子科技大学 基于视频服务质量的混合选择重传方法
CN102710401A (zh) * 2012-05-30 2012-10-03 陈日清 一种用于高清视频无线传输的跨层自适应失真调制方法
CN102790666A (zh) * 2011-05-17 2012-11-21 华为终端有限公司 差错控制的方法、接收端、发送端和系统
CN103338471A (zh) * 2013-06-27 2013-10-02 南京邮电大学 基于模型的无线多跳网络服务质量指标评价方法

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3499548B1 (ja) * 2002-07-01 2004-02-23 松下電器産業株式会社 受信装置及び通信方法
AU2002350456A1 (en) * 2002-09-24 2004-04-19 Telefonaktiebolaget Lm Ericsson (Publ) Method and devices for error tolerant data transmission, wherein retransmission of erroneous data is performed up to the point where the remaining number of errors is acceptable
CN101114982A (zh) 2006-07-24 2008-01-30 互联天下科技发展(深圳)有限公司 一种基于IP网络的音视频QoS算法
FR2927749B1 (fr) * 2008-02-14 2010-12-17 Canon Kk Procede et dispositif de transmission de donnees, notamment video.
US20140133648A1 (en) * 2008-03-06 2014-05-15 Andrzej Czyzewski Method and apparatus for acoustic echo cancellation in voip terminal
US8300620B1 (en) * 2008-12-29 2012-10-30 Sprint Communications Company L.P. Dynamically tuning a timer mechanism according to radio frequency conditions
CN101656747A (zh) 2009-09-25 2010-02-24 深圳创维数字技术股份有限公司 流媒体数据的传输方法及系统
US8458548B2 (en) * 2009-12-22 2013-06-04 Intel Corporation Adaptive H-ARQ using outage capacity optimization
CN101895466B (zh) 2010-07-02 2013-03-20 北京交通大学 一种降低sctp多路径传输数据包乱序影响的方法
JP5682253B2 (ja) * 2010-11-22 2015-03-11 富士通株式会社 プログラムおよび通信装置
US8750494B2 (en) * 2011-08-17 2014-06-10 Alcatel Lucent Clock skew compensation for acoustic echo cancellers using inaudible tones
US9313250B2 (en) * 2013-06-04 2016-04-12 Tencent Technology (Shenzhen) Company Limited Audio playback method, apparatus and system
CN104808831B (zh) * 2014-01-29 2018-06-22 联发科技(新加坡)私人有限公司 数据共享方法、传送装置与接收装置
US9628411B2 (en) * 2014-02-21 2017-04-18 Dialogic Corporation Efficient packet processing at video receiver in multimedia communications over packet networks
US9516159B2 (en) 2014-11-04 2016-12-06 Apple Inc. System and method of double talk detection with acoustic echo and noise control
CN104579601B (zh) 2014-12-01 2019-02-12 华为技术有限公司 一种重传请求处理方法和装置
DE102015115681A1 (de) * 2015-09-17 2017-03-23 Intel IP Corporation Einrichtung und verfahren zum scheduling einer zuteilung eines satzes von verbindungsressourcen
ITUB20155144A1 (it) * 2015-10-16 2017-04-16 Univ Degli Studi Di Roma La Sapienza Roma ?metodo per gestire in modo adattivo e congiunto la politica di istradamento e la politica di ritrasmissione di un nodo in una rete sottomarina, ed i mezzi per la sua attuazione?
CN105280195B (zh) 2015-11-04 2018-12-28 腾讯科技(深圳)有限公司 语音信号的处理方法及装置
CN105847611B (zh) 2016-03-21 2020-02-11 腾讯科技(深圳)有限公司 一种回声时延检测方法、回声消除芯片及终端设备
CN105872156B (zh) 2016-05-25 2019-02-12 腾讯科技(深圳)有限公司 一种回声时延跟踪方法及装置

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
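CN109361776A (zh) * 2018-12-18 2019-02-19 郑州云海信息技术有限公司 Message transmission method and apparatus in a cloud computing system
CN114027667A (zh) * 2021-12-01 2022-02-11 慕思健康睡眠股份有限公司 Method and apparatus for determining in-bed/out-of-bed state, smart mattress and medium
CN114027667B (zh) * 2021-12-01 2023-08-15 慕思健康睡眠股份有限公司 Method and apparatus for determining in-bed/out-of-bed state, smart mattress and medium
CN114173164A (zh) * 2021-12-18 2022-03-11 杭州视洞科技有限公司 Smooth stream pushing method based on the national standard GB28181 protocol
CN114173164B (zh) * 2021-12-18 2023-08-15 杭州视洞科技有限公司 Smooth stream pushing method based on the national standard GB28181 protocol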

Also Published As

Publication number Publication date
EP3490199A1 (en) 2019-05-29
EP3490199B1 (en) 2021-07-21
US10693799B2 (en) 2020-06-23
EP3490199A4 (en) 2019-11-20
US20190104079A1 (en) 2019-04-04

Similar Documents

Publication Publication Date Title
WO2018054171A1 (zh) Call method, apparatus, computer storage medium and terminal
TWI439086B (zh) Jitter buffer adjustment technique
EP2984790B1 (en) Voip bandwidth management
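CN106664161B (zh) System and method for redundancy-based packet transmission error recovery
CN101854308B (zh) Network-adaptive implementation method for high-sound-quality service in a VoIP system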
WO2015160617A1 (en) Jitter buffer control based on monitoring of delay jitter and conversational dynamics
WO2014048127A1 (zh) Method and apparatus for voice quality monitoring
JP2003318966A (ja) Bit rate control method and apparatus for real-time communication
US20060146805A1 (en) Systems and methods of providing voice communications over packet networks
KR20150023351A (ko) User interaction monitoring technique for adaptive real-time communication
US20080049785A1 (en) Discontinuous transmission of speech signals
CN112821992A (zh) Data transmission method and apparatus, electronic device and storage medium
EP2408165B1 (en) Method and receiver for reliable detection of the status of an RTP packet stream
CN109361494B (zh) Audio data processing method, apparatus, device and storage medium
WO2014134789A1 (zh) Method and apparatus for handling service interruption
EP2158753B1 (en) Selection of audio signals to be mixed in an audio conference
WO2014207978A1 (ja) Transmitting device, receiving device and relay device
CN108540273B (zh) Method and apparatus for data packet retransmission
US20230146871A1 (en) Audio data processing method and apparatus, device, and storage medium
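WO2009015567A1 (fr) Method and system for detecting a data attribute, and data attribute analysis device
CN105827575B (zh) Transmission control method and apparatus, and electronic device
JP2008085822A (ja) Communication terminal device and packet transmission control method
JP4232553B2 (ja) Communication device, method and program therefor
KR100636278B1 (ko) System and method for guaranteeing voice QoS of a VoIP terminal
JP2015076846A (ja) Delay predictor, delay prediction method and program, communication device, communication method and program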

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17852241

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017852241

Country of ref document: EP

Effective date: 20190225

NENP Non-entry into the national phase

Ref country code: DE