CN111371957A - Redundancy control method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111371957A
Authority
CN
China
Prior art keywords
audio channel
uplink audio
uplink
mixing
downlink
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010452126.8A
Other languages
Chinese (zh)
Other versions
CN111371957B (en)
Inventor
梁俊斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Tencent Technology Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010452126.8A
Publication of CN111371957A
Application granted
Publication of CN111371957B
Legal status: Active (granted)

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 3/00: Automatic or semi-automatic exchanges
    • H04M 3/22: Arrangements for supervision, monitoring or testing
    • H04M 3/2236: Quality of speech transmission monitoring
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 1/00: Arrangements for detecting or preventing errors in the information received
    • H04L 1/004: Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L 1/0041: Arrangements at the transmitter end
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 3/00: Automatic or semi-automatic exchanges
    • H04M 3/42: Systems providing special services or facilities to subscribers
    • H04M 3/56: Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M 3/568: Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities; audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants

Abstract

The present application relates to the field of communications technologies, and in particular to a redundancy control method, apparatus, electronic device, and storage medium, for improving FEC coding efficiency at the sending end and reducing network bandwidth usage. The method comprises the following steps: acquiring the sound signal of each uplink audio channel; determining the packet loss rate of each uplink audio channel; predicting the mixing contribution of each uplink audio channel according to the participation of its sound signal in the downlink mixing signal of each downlink audio channel; and obtaining the target FEC redundancy of each uplink audio channel from its packet loss rate and mixing contribution. When determining the FEC redundancy, the method refers to the mixing contribution of each uplink audio channel rather than the packet loss rate alone, controlling the redundancy in a manner tailored to the characteristics of multi-person calls. This improves the FEC encoding efficiency of the sending end and improves overall multi-person call quality and experience as much as possible under a given network bandwidth.

Description

Redundancy control method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a redundancy control method and apparatus, an electronic device, and a storage medium.
Background
In practice, the quality of a multi-person call is mainly affected by network packet loss: instability of the transmission network causes packets to be lost in transit, so the sound at the receiving end stutters and drops out, and the listener's experience suffers. Many methods exist to combat network packet loss, including FEC (Forward Error Correction), PLC (Packet Loss Concealment), and ARQ (Automatic Repeat reQuest).
In an FEC-based scheme, the original data packets are encoded with a specific forward error correction code, the FEC-encoded data is packaged and sent to the mixing device, and the mixing device decodes the forward error correction code on reception, so the complete data at a lost-packet position can be recovered, ideally perfectly. However, FEC consumes additional bandwidth: the higher the FEC redundancy, the stronger the resistance to packet loss, but the greater the bandwidth use. At present there is no method that effectively controls FEC redundancy while also reducing bandwidth consumption.
Disclosure of Invention
The embodiment of the application provides a redundancy control method, a redundancy control device, an electronic device and a storage medium, which are used for improving the FEC encoding efficiency of a sending end and reducing the use of network bandwidth.
A first redundancy control method provided in an embodiment of the present application includes:
respectively acquiring the sound signal of each uplink audio channel, wherein the uplink audio channels, the downlink audio channels, and the call participant terminals are in one-to-one correspondence;
respectively determining the packet loss rate of each uplink audio channel;
predicting the mixing contribution of each uplink audio channel according to the participation of each sound signal in the downlink mixing signal of each downlink audio channel;
and respectively obtaining the target FEC redundancy of each uplink audio channel according to the packet loss rate and the audio mixing contribution of each uplink audio channel.
A second redundancy control method provided in an embodiment of the present application includes:
obtaining the target FEC redundancy of the corresponding uplink audio channel, wherein the target FEC redundancy is determined according to the mixing contribution and the packet loss rate of the uplink audio channel; the mixing contribution is determined, after the server respectively acquires the sound signal of each uplink audio channel, according to the participation of the sound signal of the uplink audio channel in the downlink mixing signal of each downlink audio channel; and the uplink audio channels, the downlink audio channels, and the call participant terminals are in one-to-one correspondence;
and performing FEC encoding on the voice-encoded sound signal of the uplink audio channel according to the target FEC redundancy.
The first redundancy control device provided by the embodiment of the application comprises:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for respectively acquiring sound signals of each uplink audio channel, and the uplink audio channel, the downlink audio channel and the call participation terminal of each channel are in one-to-one correspondence;
the determining unit is used for respectively determining the packet loss rate of each uplink audio channel;
a prediction unit, configured to predict a mixing contribution of each uplink audio channel according to a participation of each sound signal in a down-mixing signal in each downlink audio channel;
and the control unit is used for respectively obtaining the target FEC redundancy of each path of uplink audio channel according to the packet loss rate and the audio mixing contribution degree of each path of uplink audio channel.
The second redundancy control device provided by the embodiment of the application comprises:
a redundancy obtaining unit, configured to obtain the target FEC redundancy of the corresponding uplink audio channel, where the target FEC redundancy is determined according to the mixing contribution and the packet loss rate of the uplink audio channel; the mixing contribution is determined by the server, after it obtains the sound signal of each uplink audio channel, according to the participation of the sound signal of the uplink audio channel in the downlink mixing signal of each downlink audio channel; and the uplink audio channels, the downlink audio channels, and the call participant terminals are in one-to-one correspondence;
and the signal processing unit is used for carrying out FEC coding on the sound signals of the uplink audio channel after the sound coding according to the target FEC redundancy.
An electronic device provided by an embodiment of the present application includes a processor and a memory, where the memory stores program code that, when executed by the processor, causes the processor to perform the steps of any one of the redundancy control methods above.
An embodiment of the present application provides a computer-readable storage medium including program code that, when run on an electronic device, causes the electronic device to perform the steps of any one of the redundancy control methods above.
The beneficial effects of this application are as follows:
according to the redundancy control method, the redundancy control device, the electronic device and the storage medium provided by the embodiment of the application, the target FEC redundancy of each path of uplink audio channel is determined not directly based on the packet loss rate of each path of uplink audio channel, but by combining the audio mixing contribution of each path of uplink audio channel. Under the condition that the FEC redundancy of each uplink audio channel is directly determined according to the packet loss rate, some sound sources cannot be finally heard by a listener, but higher FEC redundancy can be used. According to the method and the device, the FEC redundancy is controlled in a targeted manner according to the characteristics of multi-person conversation, the FEC encoding efficiency of the sending end is improved, and the whole multi-person conversation quality and experience are improved as much as possible under a certain network bandwidth.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1A is an alternative schematic diagram of a server mixing scheme in an embodiment of the present application;
FIG. 1B is an alternative schematic diagram of a server routing scheme in an embodiment of the present application;
fig. 2 is a schematic diagram of an application scenario in an embodiment of the present application;
FIG. 3 is an alternative schematic diagram of a first redundancy control method in an embodiment of the present application;
fig. 4A is an alternative schematic diagram of another server mixing scheme in the embodiment of the present application;
FIG. 4B is an alternative schematic diagram of another server routing scheme in an embodiment of the present application;
FIG. 5 is an alternative schematic diagram of a second redundancy control method in an embodiment of the present application;
fig. 6A is a schematic diagram of a first alternative interaction implementation timing sequence in the embodiment of the present application;
FIG. 6B is a schematic diagram illustrating a second alternative interactive implementation timing sequence in the embodiment of the present application;
FIG. 7 is a schematic diagram of a first redundancy control apparatus according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a second redundancy control apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device in an embodiment of the present application;
fig. 10 is a schematic diagram of a hardware component of a server to which an embodiment of the present application is applied;
fig. 11 is a schematic diagram of a hardware component structure of a terminal device to which an embodiment of the present application is applied.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step based on the embodiments described in the present application are within the scope of the protection of the present application.
Some concepts related to the embodiments of the present application are described below.
1. Multi-person call: multiple parties participating in a call collect sound signals and perform various audio processing through different devices (terminals). The signals are then speech-coded, packed for network transmission, and sent over the network to a mixing device (server). The mixing device decodes the speech-coded data and superimposes the sounds (the mixing process), and finally each participant's terminal plays sound according to its downlink mixing signal.
2. FEC: forward error correction, also called forward error correction code, is a method to increase the reliability of data communication. In a one-way communication channel, once an error is found, its receiver will not be entitled to a transmission again. FEC is a method of transmitting lengthy information using data that will allow the receiver to reconstruct the data when errors occur in the transmission. In the embodiment of the application, forward error correction recovers channel packet loss through a redundant packet coding algorithm, and voice video blocking, screen splash and delay under a weak network environment are reduced. The FEC algorithm mainly used is Reed-Solomon codes (RS codes), Hamming codes, LDPC codes (Low Density Parity Check codes), XOR codes (exclusive OR).
3. Redundancy: in data transmission, the data code is abruptly changed due to attenuation or interference, and the interference resistance of the data code is improved. This necessitates that the length of the binary code of several bits is increased on the basis of the length of the original binary code, so that the corresponding data has a certain redundancy, also called margin. In this embodiment of the present application, the FEC redundancy refers to a redundancy degree when FEC encoding is performed on an audio signal, for example, an exclusive or algorithm is used for FEC, the redundancy is 20%, a ratio of a voice packet to a redundancy packet is 5:1, when encoding is described, a redundancy packet is generated for every 5 voice packets, the 5 voice packets and 1 redundancy packet may be considered as an FEC Group (FEC Group), and when an encoded audio signal is sent to a server, the 5 voice packets and the encoded 1 redundancy packet may be sent in a packet manner; when the server side has voice packet data loss, it can try to recover by using FEC in the current FEC Group.
4. Packet loss rate: packet loss in the network is random; a packet loss rate of 10%, for example, means that 10 out of every 100 packets (voice and redundancy packets alike) are lost at random. In the embodiments of the present application, a lower FEC redundancy is used when the packet loss rate is low, and a higher FEC redundancy when it is high, to resist network packet loss. When the original FEC redundancy is determined from the packet loss rate, for example a packet loss rate of 16.67%, selecting 20% redundancy can achieve distortion-free recovery, giving good, uninterrupted sound quality.
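The rule implied above — the chosen redundancy should be no less than the measured packet loss rate (e.g. 16.67% loss covered by 20% redundancy) — can be sketched as below. The candidate redundancy set is an assumption for illustration, not a set fixed by the patent.

```python
def pick_redundancy(loss_rate, candidates=(0.0, 0.10, 0.20, 0.25, 0.50)):
    """Pick the smallest candidate redundancy that is not below the measured
    packet loss rate (an assumed policy, not the patent's exact rule)."""
    for r in sorted(candidates):
        if r >= loss_rate:
            return r
    return max(candidates)   # loss exceeds every candidate: use the largest

assert pick_redundancy(0.1667) == 0.20   # the 16.67% example from the text
```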
5. Code rate: the number of data bits transmitted per unit time, typically expressed in kbps (kilobits per second). For audio, the higher the code rate, the smaller the compression ratio and the sound-quality loss, and the closer the result is to the source. In the embodiments of the present application, the higher the target FEC redundancy, the higher the speech coding rate, and the lower the target FEC redundancy, the lower the speech coding rate.
6. Audio channel: in the embodiments of the present application, each terminal device corresponds to one audio channel, divided into an uplink audio channel, over which the terminal device sends collected audio data to the server, and a downlink audio channel, over which the server sends information such as the mixed audio data or the routing result to the terminal device.
7. Target cumulative smoothed value: in the embodiments of the present application, a single value covering all downlink audio channels, obtained by smoothing the sum of the downlink mixing signals of all downlink audio channels over a certain period of time.
8. Contribution cumulative smoothed value: in the embodiments of the present application, each uplink audio channel has its own contribution cumulative smoothed value. For any uplink audio channel, it is obtained by accumulating, over the current period, the mixing contribution of that channel's sound signal to the downlink mixing signals of all downlink audio channels other than its own corresponding one, and then smoothing the accumulated contribution.
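A common way to realize such a cumulative smoothed value is exponential smoothing, sketched below. The smoothing factor and the per-period contribution figure are assumptions for illustration; the patent does not fix them at this point.

```python
def update_smoothed(prev, current, alpha=0.9):
    """One exponential-smoothing step; alpha is an assumed smoothing factor."""
    return alpha * prev + (1 - alpha) * current

# For uplink channel 1: accumulate this period's mixing contribution across
# all downlink channels except its own, then fold it into the smoothed value.
contrib = {1: 0.0}                      # smoothed value per uplink channel
period_contribution = 4.0               # assumed accumulated contribution
contrib[1] = update_smoothed(contrib[1], period_contribution)
```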
9. Routing state: indicates whether, in the server routing scheme, the server selects a given audio channel to participate in subsequent mixing. The server generally decides routing based on voice feature information such as the energy or signal-to-noise ratio of the sound signal; channels with low energy or a low signal-to-noise ratio will likely not be selected, while channels with higher energy and a higher signal-to-noise ratio will.
The following briefly introduces the design concept of the embodiments of the present application:
With the development of computer technology, speech processing technology has emerged. It is a general term for processing methods such as speech production, sound signal analysis, and speech recognition, and is also called digital sound signal processing. Speech processing brings great convenience to people's lives and work; for example, users can hold voice calls, whether two-person or multi-person, over a telecommunication network or the Internet.
In a multi-person call, the human ear's ability to distinguish simultaneous sounds from different sources is limited: it can generally identify at most about four simultaneous speakers, and when four or more people speak at once the mixed sound becomes confusing and unclear. To address this, server mixing schemes and server routing schemes for multi-person calls weight or filter the sound signals from the different participants, so that a limited number of voices are highlighted and non-essential or interfering signals are kept out of the mix. For example, in a routing scheme with 50 participants of whom 10 are speaking, if the preset maximum number of routed parties is 3, only 3 parties' voices are selected at any time, and the unselected call data is not forwarded to the receiving end.
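The routing screening described above, picking at most a preset number of parties, might look like the following sketch. The energy-times-SNR score is an illustrative heuristic, not the patent's actual routing algorithm, and the channel data is invented for the example.

```python
def select_routes(channels, max_routes=3):
    """Pick up to max_routes channels, preferring high energy and SNR.
    The score (energy * snr) is an assumed heuristic for illustration."""
    active = [c for c in channels if c["energy"] > 0]
    ranked = sorted(active, key=lambda c: c["energy"] * c["snr"], reverse=True)
    return [c["id"] for c in ranked[:max_routes]]

channels = [
    {"id": 1, "energy": 0.8, "snr": 20.0},
    {"id": 2, "energy": 0.1, "snr": 5.0},
    {"id": 3, "energy": 0.9, "snr": 25.0},
    {"id": 4, "energy": 0.0, "snr": 0.0},   # silent participant
    {"id": 5, "energy": 0.7, "snr": 18.0},
]
assert select_routes(channels) == [3, 1, 5]
```

Unselected channels' code streams are simply not forwarded, which is what makes their FEC redundancy wasted bandwidth in the schemes discussed below.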
The following describes a server mixing scheme and a server routing scheme in the related art solutions, respectively.
Fig. 1A is a schematic diagram of a server mixing scheme in an embodiment of the present application. In fig. 1A, each participant acquires a digital sound signal through a sound collection device, performs speech coding and FEC coding, packs the coded data for the network, and sends it to the server. The server receives the data packets, recovers any lost packets through FEC, performs speech decoding to obtain a PCM (Pulse Code Modulation) linear audio signal, and mixes the audio signals of the uplink audio channels according to a mixing algorithm. The downlink mixing signal corresponding to each downlink audio channel is then speech-coded a second time, packed for the network, and sent to each participant; each participant's device (call participant terminal) receives the twice-coded downlink mixing signal sent by the server, decodes it, and plays it.
Fig. 1B is a schematic diagram of a server routing scheme in the embodiment of the present application. Unlike the multi-party server mixing scheme of fig. 1A, the routing scheme does not decode and re-encode the coded sound signals sent by the participants. Instead, the sending end extracts the voice feature information needed for routing and packs the speech code stream, together with the extracted features, for sending to the server. Based on the voice feature information of each uplink audio channel's sound signal, the server decides which audio channels will ultimately participate in the call (i.e., are selected by the routing algorithm) and which will not. The speech code streams of the selected audio channels are forwarded to the receiving end, which speech-decodes the coded sound signals of the selected uplink audio channels, mixes them to obtain the downlink mixing signal, and finally plays it.
However, in the server mixing scheme of fig. 1A and the server routing scheme of fig. 1B, the FEC redundancy at the sending end depends only on the packet loss rate detected by the server: a lower FEC redundancy is used when the packet loss rate is low, and a higher one when it is high, to resist network packet loss. In reality, some sound sources are never heard by any listener, and for them FEC loss protection is useless yet still consumes network bandwidth. When the number of participants in a multi-person call is large, for example a very large voice conference with hundreds of parties, this bandwidth consumption can significantly increase service operating costs.
In view of this, embodiments of the present application provide a redundancy control method, apparatus, electronic device, and storage medium that, for the special application scenario of multi-person calls, track and predict the mixing contribution and routing state of each uplink audio channel, and adjust the target FEC redundancy at the sending end based on that mixing contribution. With this method, an uplink audio channel that has been inactive for some time uses a lower target FEC redundancy, or even turns FEC off (a target FEC redundancy of 0), while a channel with high recent activity is given a higher target FEC redundancy. This protects the sound quality of active channels, prevents uplink packet loss from a speaking party from degrading every listener's experience, saves network bandwidth, and reduces costs for users and operators.
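The idea of lowering or switching off FEC for inactive channels while preserving it for active ones can be sketched as below. The thresholds and the linear scaling are assumptions for illustration only; the patent's actual adjustment rule is developed in the detailed embodiments.

```python
def target_redundancy(original_redundancy, mix_contribution,
                      low=0.1, high=0.7):
    """Scale the loss-rate-based (original) redundancy by channel activity.
    The low/high thresholds and linear scaling are assumed for illustration."""
    if mix_contribution < low:         # inactive channel: turn FEC off
        return 0.0
    if mix_contribution > high:        # highly active: keep full redundancy
        return original_redundancy
    return original_redundancy * mix_contribution   # scale down in between

assert target_redundancy(0.25, 0.05) == 0.0    # silent channel, FEC off
assert target_redundancy(0.25, 0.9) == 0.25    # active channel, full FEC
```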
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it should be understood that the preferred embodiments described herein are merely for illustrating and explaining the present application, and are not intended to limit the present application, and that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
Fig. 2 is a schematic view of an application scenario according to an embodiment of the present application. The application scenario diagram includes four terminal devices 210 and one server 220. The terminal device 210 is a call participant terminal, and the terminal device 210 and the server 220 can communicate with each other through a communication network.
In an alternative embodiment, the communication network is a wired network or a wireless network. The terminal device 210 and the server 220 may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
In this embodiment, the terminal device 210 is an electronic device used by a user: a computing device running instant messaging or social software and websites, such as a personal computer, mobile phone, tablet computer, notebook, or e-book reader. Each terminal device 210 is connected to the server 220 through a network. The server 220 may be an independent physical server, a server cluster or distributed system of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
The redundancy control method in the embodiments of the present application mainly applies to multi-person call scenarios. For example, user A creates a multi-person call task through a terminal device 210, triggering a call creation instruction. After users B, C, and D each receive the call creation instruction through their terminal devices 210, they join the multi-person call created by user A based on that instruction. Each terminal device 210 then collects the sound signal produced by its local user during the call and, after speech coding and FEC coding, sends it to the server 220, which processes it with a mixing algorithm or routing algorithm. Throughout the call, the terminal devices 210 continuously collect sound signals and send them to the server 220, and the server continuously sends each downlink audio channel's downlink mixing signal, or the sound signals of the uplink audio channels selected by the routing state, to the corresponding terminal devices 210. The four terminal devices listed above are only an example; in practice there may be more.
Fig. 3 is an implementation flowchart of a redundancy control method provided in an embodiment of the present application and applied to a server. The method comprises the following steps:
s31: respectively acquiring sound signals of each uplink audio channel, wherein the uplink audio channel, the downlink audio channel and the call participation terminal are in one-to-one correspondence;
in the embodiment of the application, because the sound signal is a quasi-steady-state signal, the sound signal is processed by frames at times in the processing, each frame is about 20ms-30ms in length, and the sound signal is regarded as a steady-state signal in the interval. The call participant terminal needs to continuously collect the voice signal generated by the call member during the call, and actually this step is performed periodically, where the voice signal in one period refers to a plurality of continuous frames of voice signals generated in a period of time, such as a continuous 3-frame voice signal, and each frame may also be regarded as one period, which is not specifically limited in the embodiment of the present application.
In a multi-person call, each terminal device participating in the call corresponds to one uplink audio channel and one downlink audio channel. The uplink audio channel carries the sound signal collected by the terminal device to the server; after mixing or routing at the server, the mixing result or routing result is sent back to the terminal device over the downlink audio channel. The terminal device derives the downlink mixing signal of its downlink audio channel from the result returned by the server and plays audio accordingly.
For example, if 5 members participate in the current multi-person call, each using a different terminal device, then terminal device i (i = 1 to 5) corresponds to uplink audio channel i and downlink audio channel i.
S32: respectively determining the packet loss rate of each uplink audio channel;
After receiving the sound signal sent by each terminal device through the corresponding uplink audio channel, the server performs packet loss detection on the received sound signal to obtain the packet loss rate of each uplink audio channel; for example, the packet loss rate of uplink audio channel 1 is 10%, that of uplink audio channel 2 is 20%, that of uplink audio channel 3 is 15%, that of uplink audio channel 4 is 25%, and that of uplink audio channel 5 is 30%.
In this embodiment of the present application, the FEC redundancy of each uplink audio channel determined based only on the packet loss rate may be referred to as the original FEC redundancy; when determining it, it is generally ensured that the redundancy is not less than the packet loss rate. For example, if the packet loss rate of uplink audio channel 1 is 10%, its original FEC redundancy may be determined to be 15%; if the packet loss rate of uplink audio channel 2 is 20%, its original FEC redundancy may be 25%. The same applies to the other uplink audio channels: assume the original FEC redundancy of uplink audio channel 3 is 20%, that of uplink audio channel 4 is 30%, and that of uplink audio channel 5 is 35%.
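The mapping from measured packet loss rate to original FEC redundancy can be sketched as follows. The only constraint stated above is that the redundancy be no less than the packet loss rate; the fixed 5-point margin below is an illustrative assumption that happens to reproduce the numeric examples in the text:

```python
def original_fec_redundancy(packet_loss_rate: float, margin: float = 0.05) -> float:
    """Original FEC redundancy determined from the packet loss rate alone.

    The text only requires redundancy >= packet loss rate; the fixed
    `margin` is an illustrative assumption, capped at 100%.
    """
    return min(packet_loss_rate + margin, 1.0)

# The example channels from the text: loss rates 10%, 20%, 15%, 25%, 30%
loss_rates = [0.10, 0.20, 0.15, 0.25, 0.30]
print([round(original_fec_redundancy(r), 2) for r in loss_rates])
# [0.15, 0.25, 0.2, 0.3, 0.35]
```

These outputs match the original FEC redundancies assumed for uplink audio channels 1 to 5 above.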
S33: predicting the audio mixing contribution of each uplink audio channel according to the participation of each sound signal in the downlink audio channel of each channel;
S34: respectively obtaining the target FEC redundancy of each uplink audio channel according to the packet loss rate and the audio mixing contribution degree of each uplink audio channel.
The process of obtaining the target FEC redundancy of each uplink audio channel according to the packet loss rate and the audio mixing contribution of each uplink audio channel is actually a process of adjusting the original FEC redundancy determined based on the packet loss rate according to the audio mixing contribution.
In an optional implementation, for any uplink audio channel, if the audio mixing contribution degree of that channel is smaller than a preset threshold, the original FEC redundancy of that channel determined based on the packet loss rate is reduced to obtain its target FEC redundancy.
Assume the preset threshold is 0.5 and the mixing contribution degrees of uplink audio channels 1 to 5 are 0.5, 0.6, 0.2, 0.25, and 0.7, respectively. The mixing contribution degrees of uplink audio channels 3 and 4 are below the preset threshold, so the original FEC redundancy of these two channels needs to be reduced to obtain the corresponding target FEC redundancy.
For example, the original FEC redundancy of uplink audio channel 3 determined only from the packet loss rate is 20%, while the target FEC redundancy determined with the audio mixing contribution degree is 8%; the original FEC redundancy of uplink audio channel 4 determined only from the packet loss rate is 30%, while the target FEC redundancy determined with the audio mixing contribution degree is 15%.
In addition, when the mixing contribution degree of an uplink audio channel is not less than the preset threshold, the target FEC redundancy may be kept consistent with the original FEC redundancy, that is, the original FEC redundancy determined by the packet loss rate is kept unchanged. A high mixing contribution degree indicates that the uplink audio channel participates actively in the call during the current period, so its sound signal is more likely to be what listeners ultimately want to receive. To guarantee the quality of such a channel, when the mixing contribution degree of an uplink audio channel is higher than the preset threshold, the target FEC redundancy may be made higher than the original FEC redundancy determined based on the packet loss rate: that is, if the mixing contribution degree of any uplink audio channel is greater than the preset threshold, the original FEC redundancy of that channel is increased to obtain its target FEC redundancy.
For example, the original FEC redundancy of uplink audio channel 2 determined only from the packet loss rate is 25%, while the target FEC redundancy determined with the audio mixing contribution degree is 30%; the original FEC redundancy of uplink audio channel 5 determined only from the packet loss rate is 35%, while the target FEC redundancy determined with the audio mixing contribution degree is 49%.
It should be noted that the process of adjusting the original FEC redundancy to obtain the target FEC redundancy may be implemented based on a monotonically increasing target function, which will be described in detail below.
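One possible monotonically increasing adjustment is to scale the original redundancy linearly by the ratio of the mixing contribution degree to the threshold. This concrete form is an assumption for illustration, though it reproduces every numeric example given above:

```python
def target_fec_redundancy(original: float, contribution: float,
                          threshold: float = 0.5) -> float:
    """Adjust the original FEC redundancy by the mixing contribution degree.

    The scale factor contribution / threshold is monotonically increasing:
    below `threshold` the redundancy shrinks (down to zero at contribution 0),
    at the threshold it is unchanged, and above it it grows.  The linear
    form is an illustrative assumption, not the patent's defined function.
    """
    scale = contribution / threshold
    return min(original * scale, 1.0)   # never exceed 100% redundancy

# Channels 3 and 4 from the text (contributions 0.2 and 0.25 < 0.5) shrink:
print(round(target_fec_redundancy(0.20, 0.2), 2))   # 0.08
print(round(target_fec_redundancy(0.30, 0.25), 2))  # 0.15
# Channels 2 and 5 (contributions 0.6 and 0.7 > 0.5) grow:
print(round(target_fec_redundancy(0.25, 0.6), 2))   # 0.3
print(round(target_fec_redundancy(0.35, 0.7), 2))   # 0.49
```

A contribution degree of 0 yields a target redundancy of 0, matching the "even without redundancy" case described below.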
In addition, steps S31 to S34 in this embodiment form a loop: as time goes by, new sound signals are generated in the next period, at which point steps S31 to S34 are repeated, performing packet loss detection and mixing-contribution prediction on the newly generated sound signals and continuously updating the target FEC redundancy of each uplink audio channel.
In the above embodiment, the target FEC redundancy of each uplink audio channel is not determined directly from its packet loss rate alone, but in combination with its audio mixing contribution degree. For an uplink audio channel whose mixing contribution degree is low over a certain time, the original FEC redundancy is reduced, yielding a target FEC redundancy smaller than the original one, or even no redundancy at all, which effectively reduces network bandwidth usage; conversely, when the mixing contribution degree is high, the original FEC redundancy can be increased to obtain a higher target FEC redundancy. By controlling the FEC redundancy in this targeted manner according to the characteristics of a multi-person call, the FEC coding efficiency of the sending end is effectively improved when the sending-end terminal device performs FEC coding based on the target FEC redundancy, so that the overall multi-person call quality and experience are improved as much as possible under a given network bandwidth.
In this embodiment of the present application, the target FEC redundancy of each uplink audio channel is obtained according to the packet loss rate and the audio mixing contribution degree of that channel, which may specifically be done in the following two ways:
the first method is as follows: the target FEC redundancy is calculated by the server.
Optionally, the server obtains a target FEC redundancy of each uplink audio channel according to the packet loss rate and the audio mixing contribution of each uplink audio channel; and respectively sending the target FEC redundancy of each path of uplink audio channel to the call participation terminal of the corresponding uplink audio channel.
In this manner, when the target FEC redundancy of any one uplink audio channel changes, the server may send the target FEC redundancy of the uplink audio channel to the terminal device corresponding to the uplink audio channel, so that the terminal device performs FEC encoding on the audio signal of the uplink audio channel according to the received target FEC redundancy. If the target FEC redundancy of a certain uplink audio channel calculated in the current period does not change, the target FEC redundancy does not need to be sent to the terminal device, and the terminal device may continue to encode the audio signal according to the target FEC redundancy obtained last time.
For example, the current period is the second period, the target FEC redundancy of the uplink audio channel 1 is 20% in the first period, and the target FEC redundancy of the uplink audio channel 1 is still 20% in the second period, at this time, the target FEC redundancy of the uplink audio channel 1 does not change, and it is not necessary to resend the target FEC redundancy of the uplink audio channel 1 in the current period to the terminal device 1, and the terminal device 1 still performs FEC encoding using the target FEC redundancy used in the first period.
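The "send only on change" behavior described above can be sketched on the server side as follows (class and attribute names are illustrative assumptions; the log list stands in for an actual network send):

```python
class RedundancyNotifier:
    """Pushes a channel's target FEC redundancy to its terminal only
    when the value differs from the one sent in the previous period."""

    def __init__(self):
        self._last_sent = {}   # channel id -> last redundancy sent
        self.sent_log = []     # (channel, redundancy) pairs actually pushed

    def update(self, channel: int, redundancy: float) -> bool:
        """Return True if a send was needed for this period."""
        if self._last_sent.get(channel) == redundancy:
            return False       # unchanged: terminal keeps the old value
        self._last_sent[channel] = redundancy
        self.sent_log.append((channel, redundancy))  # stand-in for a send
        return True

notifier = RedundancyNotifier()
notifier.update(1, 0.20)   # first period: sent
notifier.update(1, 0.20)   # second period: unchanged, not resent
notifier.update(1, 0.25)   # third period: changed, sent again
print(notifier.sent_log)   # [(1, 0.2), (1, 0.25)]
```

The second period produces no traffic, matching the example above in which terminal device 1 keeps encoding with the redundancy received in the first period.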
In the above embodiment, the server directly calculates the target FEC redundancy of each uplink audio channel, and sends the target FEC redundancy to each call participant terminal when the redundancy changes, so that the FEC coding efficiency of the sending terminal can be significantly improved, and the overall multi-user call quality and experience can be improved as much as possible under a certain network bandwidth.
The second method comprises the following steps: the target FEC redundancy is calculated by the terminal device.
Optionally, the server sends the packet loss rate and the audio mixing contribution degree of each uplink audio channel to the call participating terminal corresponding to that channel, so that the terminal determines the target FEC redundancy from the received packet loss rate and mixing contribution degree, performs FEC encoding on the speech-coded sound signal according to the determined target FEC redundancy, and then packetizes and sends it to the server.
In this way, if the packet loss rate or the audio mixing contribution degree of any uplink audio channel changes in the current period compared with the previous period, the changed data is sent to the terminal device corresponding to that channel, so that the terminal device updates the target FEC redundancy accordingly. When the terminal device receives no changed packet loss rate or mixing contribution degree from the server, it does not need to recalculate the target FEC redundancy.
For example, in the first period the packet loss rate of uplink audio channel 1 is 20%, its mixing contribution degree is 0.4, and the calculated target FEC redundancy is 20%. In the second period the server finds that neither the packet loss rate nor the mixing contribution degree of the channel has changed, so the target FEC redundancy is still 20%; the server therefore does not need to resend the packet loss rate and mixing contribution degree of uplink audio channel 1 to terminal device 1, and terminal device 1 continues FEC encoding with the target FEC redundancy used in the first period, without recalculating it.
In the above embodiment, each terminal device calculates its own target FEC redundancy, which can effectively improve the efficiency of FEC regulation: the server side does not need to calculate the target FEC redundancy and can directly execute the subsequent process, effectively reducing the time delay. In addition, the FEC coding efficiency of the sending end can be significantly improved, and the overall multi-person call quality and experience are improved as much as possible under a given network bandwidth.
Compared with the schemes shown in fig. 1A and 1B, in the embodiment of the present application the FEC redundancy is not calculated from the packet loss rate alone; the target FEC redundancy is determined in combination with a prediction of the mixing contribution degree of each uplink audio channel. The following describes in detail the process of predicting the mixing contribution degree of each uplink audio channel, for the server mixing scheme shown in fig. 4A and the server routing scheme shown in fig. 4B:
In the embodiment of the application, for the server mixing scheme, the prediction is mainly based on tracking the process by which the server mixes the sound signals of the uplink audio channels to obtain the downlink audio mixing signal of each downlink audio channel. In the server mixing scheme, the server applies weights to the sound signals from the different uplink audio channels, so the mixing contribution degree of each uplink audio channel can be predicted from its contribution during mixing. The server mixing scheme and the server routing scheme are introduced below for the different cases:
For the server mixing scheme, as shown in fig. 4A, which is a schematic diagram of another server mixing scheme in this embodiment of the present application: compared with fig. 1A, the target FEC redundancy of the sending end is no longer controlled only by the packet loss rate received by the server, but is also cooperatively controlled by the result of tracking and predicting the downlink audio mixing signals. The participation degree of each sound signal in each downlink audio channel is determined according to the proportion of the sound signal of each uplink audio channel in the energy of the finally determined downlink audio mixing signal, yielding the mixing contribution degree of each uplink audio channel; the original FEC redundancy determined from the packet loss rate is then further adjusted according to this mixing contribution degree. In this way, when the proportion of the sound signal of a certain uplink audio channel in the final energy of the downlink audio mixing signal is high, the target FEC redundancy can be kept equal to, or even slightly higher than, the original FEC redundancy determined from the packet loss rate alone; conversely, when the energy of the sound signal of a certain uplink audio channel in the downlink audio mixing signal is low, the determined target FEC redundancy may be smaller than the original FEC redundancy determined only from the packet loss rate, or FEC redundancy may even be omitted entirely.
In an alternative embodiment, the mixing contribution of each uplink audio channel is predicted according to the participation of each sound signal in the downmix signal of each downlink audio channel, which specifically includes the following steps:
respectively acquiring downlink audio mixing signals of each downlink audio channel, wherein any downlink audio mixing signal is obtained by mixing audio signals of other uplink audio channels except the audio signal of the corresponding uplink audio channel; and predicting the mixing contribution degree of each uplink audio channel according to the contribution of the sound signal of each uplink audio channel in each downlink mixing signal.
For any downlink audio channel, the corresponding downlink audio mixing signal can be calculated as follows:

y_j(t) = Σ_{i=1, i≠j}^{M} w_{i,j} · x_i(t); (formula 1)

wherein y_j(t) is the mixed output signal of the jth party, i.e. the downlink audio mixing signal of the jth downlink audio channel; x_i(t) is the input signal of the ith party, i.e. the sound signal of the ith uplink audio channel; t denotes time; M is the number of effective voice parties participating in the multi-person call; and w_{i,j} is the mixing weight of the ith input, i.e. the weight applied to the sound signal of the ith uplink audio channel when calculating the downlink audio mixing signal of the jth downlink audio channel.

In the embodiment of the present application, for the jth downlink audio channel, the corresponding downlink audio mixing signal is obtained by mixing the sound signals of the uplink audio channels other than the jth one: the sound signals of the remaining M−1 uplink audio channels (i ≠ j) are each multiplied by the corresponding mixing weight w_{i,j} and then superimposed.
The mixing weight w_{i,j} of each uplink audio channel can be calculated in different ways, illustrated below:

Method one, the average weight method: the PCM sound signals of the uplink audio channels are linearly superimposed and then averaged, i.e. the mixing weight is w_{i,j} = 1/(M−1).

Method two, the alignment weight method: first calculate the maximum absolute sample value of the sound signal of each uplink audio channel, A_i = max_{T ≤ t < T+N} |x_i(t)|, and the maximum absolute value of the linearly superimposed downlink audio mixing signal of each downlink audio channel, A_j = max_{T ≤ t < T+N} |Σ_{i≠j} x_i(t)|, where T is the start time of a frame of the downlink audio mixing signal and N is the frame length. The mixing weight of each uplink audio channel is then:

w_{i,j} = L_j; (formula 2)

wherein L_j is used to adjust the amplitude of the final mixing output: considering that A_j may be greater than the limit value 2^(Q−1), in that case only L_j = 2^(Q−1)/A_j can be taken, otherwise overflow would result. That is, L_j is a coefficient that prevents the amplitude of the mixed downlink audio mixing signal from exceeding the digital range 2^(Q−1), where Q is the number of bits with which the sound signal or downlink audio mixing signal is encoded, generally 16.
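The mixing of formula 1 with the two weight methods can be sketched per frame as follows. This is a sketch under assumptions: PCM samples are plain integer lists, and L_j is taken as 1 whenever the linear sum does not exceed 2^(Q−1):

```python
def downmix(signals, j, q_bits=16, method="average"):
    """Downmix for downlink channel j: mix every uplink channel except j.

    signals: one PCM frame (list of ints) per uplink channel.
    """
    others = [s for i, s in enumerate(signals) if i != j]
    sums = [sum(samples) for samples in zip(*others)]  # linear superposition
    if method == "average":                            # w = 1 / (M - 1)
        return [s / len(others) for s in sums]
    # alignment weight: scale only if the linear sum would overflow 2^(Q-1)
    limit = 2 ** (q_bits - 1)
    peak = max(abs(s) for s in sums)
    l_j = limit / peak if peak > limit else 1.0        # assumed: 1 when safe
    return [s * l_j for s in sums]

# 3 uplink channels, 4-sample frames; downmix for downlink channel 1
frames = [[1000, -2000, 3000, 4000],
          [2000,  2000, 2000, 2000],
          [-500,   500, -500,  500]]
print(downmix(frames, j=0))  # [750.0, 1250.0, 750.0, 1250.0]
```

With the alignment method, a frame whose superimposed peak exceeds 32768 (Q = 16) is scaled down so its peak lands exactly at the limit, matching the overflow-prevention role of L_j.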
Based on the above formulas, for any uplink audio channel, the mixing contribution of its sound signal in a certain downlink audio mixing signal can be expressed as the product of the sound signal and the corresponding mixing weight:

c_{i,j}(t) = w_{i,j} · x_i(t); (formula 3)
In the embodiment of the application, the server may periodically obtain the downlink audio mixing signal of each downlink audio channel from the sound signals of the uplink audio channels and send it to the corresponding call participating terminal. Since real-time speech-feature calculations fluctuate considerably in practice, corresponding smoothing can be applied so that parameter fluctuations do not affect the final judgment. Therefore, when periodically calculating the mixing contribution degree of each uplink audio channel, the calculation result of the current period can be smoothed based on that of the previous period to obtain the corresponding mixing contribution degree.
In an alternative embodiment, the method for predicting the mixing contribution degree of each uplink audio channel according to the contribution of each uplink audio channel in each downmix signal includes the following specific steps:
First, the sum of the downlink audio mixing signals of all downlink audio channels in the current period is smoothed using the target cumulative smooth value of the previous period, yielding the target cumulative smooth value corresponding to the downlink audio mixing signals of all downlink audio channels in the current period. In the first period, this target cumulative smooth value is either the sum of the downlink audio mixing signals of all downlink audio channels in that period or a first preset value.

Next, for each uplink audio channel, the cumulative mixing contribution of its sound signal in the current period to the downlink audio mixing signals of all downlink audio channels other than its own is smoothed using the contribution cumulative smooth value of the previous period, yielding the contribution cumulative smooth value corresponding to the sound signal of that channel in the current period. In the first period, this contribution cumulative smooth value is either the mixing contribution of the sound signal in that period to the downlink audio mixing signals of the other downlink audio channels or a second preset value.

Finally, the ratio of the contribution cumulative smooth value of each uplink audio channel in the current period to the target cumulative smooth value of the current period is taken as the mixing contribution degree of that uplink audio channel in the current period.
In addition to setting the target cumulative smooth value or contribution cumulative smooth value of the first period to a preset value, special handling may also be applied to an initial stretch of time: for example, in the first N periods the target cumulative smooth value of the downlink audio mixing signals, or the contribution cumulative smooth value of the sound signals, is set to a preset value. Alternatively, in the first N periods the target cumulative smooth value of each period is simply the sum of the downlink audio mixing signals of all downlink audio channels in that period, and the contribution cumulative smooth value of each uplink audio channel is simply the mixing contribution of its sound signal in that period to the downlink audio mixing signals of the other downlink audio channels. For example, if the first N periods correspond to 100 frames, smoothing is not performed until after 100 frames, or fast smoothing is performed within the first 100 frames.
The target cumulative smooth value corresponding to the downlink audio mixing signals of all downlink audio channels in the current period can be calculated by the following formula 4:

S(n) = β · S(n−1) + (1−β) · Σ_{j=1}^{M} y_j(n); (formula 4)

When calculating the contribution cumulative smooth value corresponding to the sound signal of each uplink audio channel, for any uplink audio channel, the contribution cumulative smooth value of its sound signal in the current period can be calculated by the following formula 5:

C_i(n) = β · C_i(n−1) + (1−β) · Σ_{j=1, j≠i}^{M} c_{i,j}(n); (formula 5)

In formulas 4 and 5, S(n) is the target cumulative smooth value corresponding to the downlink audio mixing signals of all downlink audio channels in the current period, S(n−1) is that of the previous period, y_j(n) is the downlink audio mixing signal of the jth downlink audio channel in the current period, C_i(n) is the contribution cumulative smooth value corresponding to the sound signal of the ith uplink audio channel in the current period, and c_{i,j}(n) is the mixing contribution of the sound signal of the ith uplink audio channel in the downlink audio mixing signal of the jth downlink audio channel in the current period. The value ranges of i and j are both 1 to M, M is the number of call participating terminals, and β is a first preset parameter with value range (0, 1).

The sum of the downlink audio mixing signals of all downlink audio channels is Σ_{j=1}^{M} y_j(n).

For the ith uplink audio channel, its mixing contribution degree can then be expressed by the following formula 6:

ρ_i(n) = C_i(n) / S(n); (formula 6)
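The smoothing of formulas 4 to 6 can be sketched as follows. Assumptions: the per-period contributions c_{i,j} (formula 3) are already summarized into scalars by the caller (e.g. as frame magnitudes), and the smooth values start from zero instead of the first-period special handling described above:

```python
class MixContributionTracker:
    """Smoothed mixing-contribution degree per uplink channel (formulas 4-6).

    beta is the first preset parameter in (0, 1).  The zero initial state
    is a simplification of the first-period handling in the text.
    """

    def __init__(self, n_channels, beta=0.9):
        self.beta = beta
        self.s_total = 0.0            # target cumulative smooth value S(n)
        self.c = [0.0] * n_channels   # contribution cumulative smooth values C_i(n)

    def update(self, contributions):
        """contributions[i][j]: scalar contribution of uplink i in downmix j
        for the current period (0 where i == j).  Returns the per-channel
        mixing contribution degrees rho_i = C_i / S."""
        m = len(self.c)
        total = sum(contributions[i][j]
                    for j in range(m) for i in range(m) if i != j)
        self.s_total = self.beta * self.s_total + (1 - self.beta) * total
        for i in range(m):
            ci = sum(contributions[i][j] for j in range(m) if j != i)
            self.c[i] = self.beta * self.c[i] + (1 - self.beta) * ci
        return [ci / self.s_total if self.s_total else 0.0 for ci in self.c]

tracker = MixContributionTracker(n_channels=3, beta=0.9)
contrib = [[0, 4.0, 4.0],   # a loud uplink channel 1 dominates the downmixes
           [1.0, 0, 1.0],
           [0.5, 0.5, 0]]
degrees = tracker.update(contrib)
print([round(d, 2) for d in degrees])  # [0.73, 0.18, 0.09]
```

Because numerator and denominator use the same smoothing factor, the degrees of all channels sum to 1, so each value is the channel's share of the total mixed output.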
The above embodiment effectively avoids the situation in which an uplink audio channel whose sound signal participates little in the mixing still uses a high FEC redundancy, consuming more network bandwidth resources without any corresponding improvement in experience.
In addition, it should be noted that in this embodiment of the present application the server may also directly deliver the mixing result of each uplink audio channel to the corresponding terminal device, and the terminal device calculates the mixing contribution degree based on formulas 4, 5, and 6 listed above.
In the embodiment of the application, for the server routing scheme, the routing state of each uplink audio channel is tracked based on the server, and the corresponding mixing contribution degree is predicted based on the routing state of each uplink audio channel.
For example, as shown in fig. 4B, which is a schematic diagram of another server routing scheme in this embodiment of the present application: compared with fig. 1B, the target FEC redundancy of the sending end is no longer controlled only by the packet loss rate received by the server, but is also cooperatively controlled by the result of tracking and predicting the routing state. The mixing contribution degree of each uplink audio channel can be determined from its routing state, and the original FEC redundancy determined from the packet loss rate is then further adjusted according to this mixing contribution degree. In this way, when the predicted selection probability of a certain uplink audio channel is greater than a certain threshold, the adjusted target FEC redundancy can be kept equal to, or even slightly higher than, the original FEC redundancy determined from the packet loss rate; conversely, when the predicted selection probability of a certain uplink audio channel is smaller than the threshold, the adjusted target FEC redundancy can be smaller than the original FEC redundancy determined only from the packet loss rate, or FEC redundancy may even be omitted entirely.
The predicted selection probability may be determined from the speech feature information of the sound signal of an uplink audio channel, where the speech feature information may be the energy or signal-to-noise ratio of the sound signal. For example, the larger the energy or the higher the signal-to-noise ratio of a channel's sound signal, the greater the probability that the channel is selected, and the larger the predicted selection probability; conversely, the lower the energy or the smaller the signal-to-noise ratio, the smaller the probability that the channel is selected, and the smaller the predicted selection probability.
This implementation effectively avoids the situation in which an uplink audio channel that is not selected still uses a high FEC redundancy, consuming more network bandwidth resources without any corresponding improvement in experience.
In an alternative embodiment, the mixing contribution of each uplink audio channel is predicted according to the participation of each sound signal in the downmix signal of each downlink audio channel, which specifically includes the following steps:
analyzing the voice characteristic information of each voice signal to obtain a routing state of each uplink audio channel, wherein the routing state is used for indicating whether the server selects the voice signal of the uplink audio channel to participate in sound mixing processing; and predicting the sound mixing contribution degree of each uplink audio channel according to the routing state of each uplink audio channel.
When determining the routing state of each uplink audio channel from the speech feature information of each sound signal, a predicted selection probability can be determined from the speech feature information in the manner listed in the above embodiment. When the predicted selection probability of an uplink audio channel is greater than a certain threshold, its routing state can be determined as selected, i.e. its sound signal participates in the mixing of the current period; otherwise, when the predicted selection probability is not greater than the threshold, its routing state indicates not selected, i.e. its sound signal does not participate in the mixing of the current period.
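Such a selection decision can be sketched as follows, using frame energy as the speech feature; the threshold value and the choice of energy over signal-to-noise ratio are illustrative assumptions:

```python
def routing_state(frame, energy_threshold=1.0e6):
    """Return 1 (selected) if the channel's speech feature crosses the
    threshold, else 0 (not selected).

    Frame energy stands in for the speech feature information; a real
    implementation could equally use the signal-to-noise ratio.
    """
    energy = sum(x * x for x in frame)       # sum of squared PCM samples
    return 1 if energy > energy_threshold else 0

loud = [800] * 160    # active speech frame (160 samples)
quiet = [10] * 160    # near-silent frame
print(routing_state(loud), routing_state(quiet))  # 1 0
```

The loud frame participates in the current period's mixing; the near-silent one does not.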
In the embodiment of the application, the server periodically determines the routing state of each uplink audio channel from its sound signal, and outputs, based on the routing algorithm, a judgment result V_i indicating whether the ith uplink audio channel is selected: selection is defined as V_i = 1 and non-selection as V_i = 0. The sound signals of the uplink audio channels whose routing state is selected are sent to the corresponding call participating terminals, so that each call participating terminal obtains the downlink audio mixing signal of its corresponding downlink audio channel.
Suppose that in the current period uplink audio channels 1 and 2 do not participate in the mixing, i.e. V_i = 0 for i = 1, 2 and V_i = 1 for i = 3, 4, 5. When determining the downlink audio mixing signals of downlink audio channels 1 and 2, the sound signals of uplink audio channels 3 to 5 need to be mixed. Similarly, when determining the downlink audio mixing signal of downlink audio channel 3, the sound signals of uplink audio channels 4 and 5 need to be mixed; for downlink audio channel 4, the sound signals of uplink audio channels 3 and 5; and for downlink audio channel 5, the sound signals of uplink audio channels 3 and 4.
In an optional implementation manner, according to the routing state of each uplink audio channel, the mixing contribution of each uplink audio channel is predicted, and the specific process is as follows:
for any uplink audio channel, if the routing state of the uplink audio channel indicates that it is selected, smoothing the routing state smooth value of the uplink audio channel in the previous period according to a second preset parameter to obtain the routing state smooth value of the uplink audio channel in the current period, and taking this smooth value as the sound mixing contribution degree of the uplink audio channel in the current period; the value range of the second preset parameter is (0, 1).
Optionally, the smooth value of the routing state of the uplink audio channel in the current period may be calculated by the following formula:

S(t) = α·S(t-1) + (1-α); (formula 7)

where S(t) is the smooth value of the routing state of the uplink audio channel in the current period, S(t-1) is the smooth value of the routing state of the uplink audio channel in the previous period, and α is the second preset parameter with a value range of (0, 1), for example α = 0.1.
If the path selection state of the uplink audio channel indicates that the uplink audio channel is not selected, smoothing the path selection state smooth value of the uplink audio channel in the previous period according to a third preset parameter to obtain the path selection state smooth value of the uplink audio channel in the current period, and taking the path selection state smooth value as the sound mixing contribution degree of the uplink audio channel in the current period; the value range of the third preset parameter is also (0, 1), and the third preset parameter is greater than the second preset parameter.
Alternatively, the smooth value of the routing state of the uplink audio channel in the current period can be determined by the following formula:

S(t) = η·S(t-1); (formula 8)

where η is the third preset parameter with a value range of (0, 1), for example η = 0.98.
In addition, the routing state smooth value corresponding to the uplink audio channel in the first period can also be directly assigned a third preset value or a fourth preset value, where the fourth preset value may be smaller than the third preset value. Similar to the server mixing scheme, besides directly assigning the smooth value of the first period, the smooth values of the first N periods may also be directly assigned; for example, when the first N periods include 100 frames, the smoothing process starts after those 100 frames, or a fast smoothing is performed within the first 100 frames.
Through the above embodiment, the result of the current period is smoothed based on the result of the previous period, which effectively suppresses the influence of parameter fluctuation and improves accuracy.
In addition, it should be noted that, in this embodiment of the present application, the server may also directly issue the routing state of each uplink audio channel to the corresponding terminal device, and the terminal device calculates the smooth value of the routing state based on the listed formula 7 or formula 8, so as to determine the audio mixing contribution degree.
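The smoothing of formulas 7 and 8 can be sketched as below. Since the equations themselves are rendered only as images in the source, the fast-attack/slow-decay form is a reconstruction from the surrounding text; α = 0.1 and η = 0.98 are the example values given there:

```python
# Hedged sketch of routing-state smoothing: when a channel is selected,
# the smooth value rises quickly toward 1 (formula 7, parameter alpha);
# when it is not selected, the value decays slowly toward 0 (formula 8,
# parameter eta). Both parameters lie in (0, 1), with eta > alpha.
def smooth_routing_state(prev, selected, alpha=0.1, eta=0.98):
    if selected:
        return alpha * prev + (1.0 - alpha)   # formula 7: fast rise
    return eta * prev                          # formula 8: slow decay

# A channel that speaks for two periods, then falls silent for two.
s = 0.0
for sel in [True, True, False, False]:
    s = smooth_routing_state(s, sel)
```

The slow decay keeps the mixing contribution degree (and hence the FEC redundancy derived from it) from collapsing the instant a speaker pauses.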
In an optional implementation manner, the target FEC redundancy of each uplink audio channel is respectively obtained according to the packet loss ratio and the audio mixing contribution of each uplink audio channel, and the specific process is as follows:
an adjustment parameter is obtained for each uplink audio channel according to its audio mixing contribution degree, where the adjustment parameter is the value of a target function when the audio mixing contribution degree of the uplink audio channel is used as the input of that function; the target function is monotonically increasing, and its value is 1 when its input equals a preset threshold. The product of the adjustment parameter corresponding to each uplink audio channel and the original FEC redundancy determined based on the corresponding packet loss rate is then taken as the target FEC redundancy of that uplink audio channel.
The target function is a monotonically increasing function, such as a linear function f(x) = ax + b (a > 0). In the embodiment of the present application, taking the server audio mixing scheme as an example, the target FEC redundancy of the ith uplink audio channel in the current period may be expressed as formula 9:

R_i(t) = f(P_i(t))·r_i(t); (formula 9)

where R_i(t) is the target FEC redundancy of the ith uplink audio channel in the current period, P_i(t) is the mixing contribution of the ith uplink audio channel in the current period, and r_i(t) is the original FEC redundancy of the ith uplink audio channel in the current period.
In the above embodiment, since the target function f(x) is monotonically increasing, the target FEC redundancy increases as the mixing contribution ratio increases, and the speech coding rate therefore also increases with it. Here the mixing contribution ratio is the ratio of the contribution accumulated smooth value of the ith uplink audio channel to the target accumulated smooth value corresponding to all downlink audio channels, i.e. one representation of the mixing contribution degree in the embodiment of the present application.
Taking the server routing scheme as an example, the target FEC redundancy of the ith uplink audio channel in the current period can be expressed as formula 10:

R_i(t) = f(S_i(t))·r_i(t); (formula 10)

where R_i(t) is the target FEC redundancy of the ith uplink audio channel in the current period, S_i(t) is the routing state smooth value of the ith uplink audio channel in the current period, and r_i(t) is the original FEC redundancy of the ith uplink audio channel in the current period.
In the above embodiment, since the target function f(x) is monotonically increasing, the target FEC redundancy increases as the routing state smooth value increases, and the speech coding rate therefore also increases with it; the routing state smooth value is another representation of the mixing contribution degree in the embodiment of the present application.
It should be noted that when x (the input parameter, i.e. the mixing contribution degree in the embodiment of the present application) of the target function f(x) equals the preset threshold, f(x) = 1, which ensures that the target FEC redundancy equals the original FEC redundancy. In addition, since f(x) is monotonically increasing, when x is smaller than the preset threshold the target FEC redundancy is guaranteed to be smaller than the original FEC redundancy; when x is larger than the preset threshold, it is guaranteed to be larger. Alternatively, when x is larger than the preset threshold, the original FEC redundancy may simply be kept unchanged, i.e. the target FEC redundancy equals the original FEC redundancy. In the above embodiment, sound quality is ensured and the problem that uplink packet loss on the speaking side degrades the experience of all listeners is avoided; at the same time, the situation in which an uplink audio channel whose participation in mixing is low, or which is not selected at all, still uses a high FEC redundancy and consumes extra network bandwidth without a corresponding gain in experience is effectively avoided.
Based on the same inventive concept, the present application further provides a second redundancy control method, applied to a terminal device. Fig. 5 shows an implementation flowchart of this second redundancy control method provided in the embodiment of the present application; the specific implementation flow is as follows:
s51: and obtaining a target FEC redundancy of the corresponding uplink audio channels, wherein the target FEC redundancy is determined according to the audio mixing contribution and the packet loss rate of the uplink audio channels, the audio mixing contribution is determined according to the participation of the audio signals of the uplink audio channels in the downlink audio signals of each downlink audio channel, and the uplink audio channels, the downlink audio channels and the call participation terminals in each channel are in one-to-one correspondence.
There are two ways for the terminal device to obtain the target FEC redundancy of the corresponding uplink audio channel: one is to directly receive the target FEC redundancy sent by the server; the other is to receive the packet loss rate and the audio mixing contribution degree of the corresponding uplink audio channel issued by the server, and determine the target FEC redundancy based on them. The manner in which the terminal device determines the target FEC redundancy from the packet loss rate and the audio mixing contribution is consistent with the process described in the above embodiment: the corresponding adjustment parameter is obtained according to the audio mixing contribution of the uplink audio channel corresponding to the terminal device, and the product of that adjustment parameter and the original FEC redundancy determined based on the corresponding packet loss rate is used as the target FEC redundancy of the uplink audio channel corresponding to the terminal device. For specific implementations, reference may be made to the above embodiments; repeated details are not described again.
S52: and performing FEC encoding on the voice-encoded sound signal of the uplink audio channel according to the target FEC redundancy.
In this embodiment of the present application, the terminal device performs FEC encoding on subsequently generated sound signals according to the target FEC redundancy. For example, after the server receives the sound signal sent by terminal device 1 in the first period, determines the packet loss rate and the audio mixing contribution of uplink audio channel 1, calculates the target FEC redundancy, and sends it to terminal device 1, terminal device 1 may perform speech encoding on the sound signal collected in the second period and then FEC-encode that signal according to the received target FEC redundancy.
It should be noted that the server may send information such as the target FEC redundancy, the packet loss rate, and the mixing contribution degree to the terminal device through RTCP (RTP Control Protocol) packets. There may be a delay in this process, but redundancy fluctuation is not very frequent, generally changing on the second level or even the 10-second level or above, so it is also feasible for terminal device 1 to receive, at some time in the third period, the target FEC redundancy determined based on the sound signal of the first period. In addition, the setting of the period in this embodiment of the application may be adjusted based on the fluctuation of the redundancy, which is not specifically limited herein.
When the terminal device encodes the voice signal according to the target FEC redundancy, for example a target FEC redundancy of 20%, the ratio of voice packets to redundant packets is 5:1. During FEC encoding, one redundant packet can be generated for every 5 voice packets; these 5 voice packets and 1 redundant packet can be regarded as one FEC group, and the 5 voice packets together with the encoded redundant packet are packaged and transmitted. For example, voice packets No. 1 to No. 5 generate redundant packet No. 1, and these 6 packets are packaged and sent to the server; voice packets No. 6 to No. 10 generate redundant packet No. 2, and these 6 packets are packaged and sent to the server; and so on. When voice packet data is lost on the server side, recovery of the partially lost data packets can be attempted using the FEC packet in the current FEC group.
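The 5:1 grouping above can be sketched with a simple XOR parity packet. The patent does not specify the FEC code; XOR parity, which recovers exactly one lost packet per group, is a common choice and an assumption here:

```python
# Hedged sketch of one FEC group: 5 equal-length voice packets plus one
# XOR parity packet, as in the 20% redundancy example above.
def make_fec_group(voice_packets):
    """Return the 5 voice packets followed by their XOR parity packet."""
    parity = bytearray(len(voice_packets[0]))
    for pkt in voice_packets:
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return voice_packets + [bytes(parity)]

def recover_lost(group, lost_index):
    """Recover one missing packet by XOR-ing the five surviving packets."""
    recovered = bytearray(len(group[0]))
    for idx, pkt in enumerate(group):
        if idx != lost_index:
            for i, b in enumerate(pkt):
                recovered[i] ^= b
    return bytes(recovered)

group = make_fec_group([bytes([k] * 4) for k in range(1, 6)])
restored = recover_lost(group, 2)   # pretend voice packet No. 3 was lost
```

If two or more packets of the same group are lost, XOR parity cannot recover them, which matches the "can be attempted" wording above.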
In this embodiment, after the terminal device packages and sends the FEC-encoded sound signals to the server, the server may use the server mixing scheme shown in fig. 4A to perform mixing processing and then return a mixing result to the terminal device, or use the server routing scheme shown in fig. 4B to perform routing processing and then return a routing result to the terminal device. The terminal device obtains the downlink audio mixing signal of the corresponding downlink audio channel according to the audio mixing result returned by the server or the routing result, and performs voice playing according to the downlink audio mixing signal.
In the above embodiment, the original FEC redundancy determined according to the packet loss rate is adjusted in combination with the audio mixing contribution of each uplink audio channel, so that the original FEC redundancy of an uplink audio channel with a low audio mixing contribution is reduced within a certain time, yielding a target FEC redundancy smaller than the original FEC redundancy. By controlling the FEC redundancy in a targeted manner according to the characteristics of multi-person conversation, the FEC coding efficiency of the sending end is improved and the use of network bandwidth is reduced.
Fig. 6A is a schematic diagram illustrating an interaction timing chart of a redundancy control method based on a server mixing scheme in an embodiment of the present application. The specific implementation flow of the method is as follows:
step S61: the method comprises the steps that terminal equipment collects sound signals generated by a local user when the local user participates in a call;
step S62: the terminal equipment carries out voice coding on the collected sound signals;
step S63: the terminal equipment determines a target FEC redundancy according to the received packet loss rate and the received audio mixing contribution degree, and performs FEC encoding on the encoded sound signal according to the target FEC redundancy;
step S64: the terminal equipment packs the sound signals subjected to the FEC coding and sends the sound signals to the server;
step S65: the server detects packet loss of the sound signals of the uplink audio channels and returns the packet loss rate of the uplink audio channels to the corresponding terminal equipment;
step S66: the server decodes the sound signals of each uplink audio channel after FEC coding;
step S67: the server mixes the sound signals of other uplink audio channels except the sound signal of the corresponding uplink audio channel of each sound signal to obtain a downlink mixed sound signal of each downlink audio channel;
step S68: the server predicts the audio mixing contribution degree of each uplink audio channel according to the contribution of each uplink audio channel in each downlink audio mixing signal, and returns the audio mixing contribution degree of each uplink audio channel to the corresponding terminal equipment;
step S69: the server carries out secondary voice coding on the downlink audio mixing signal of each downlink audio channel;
step S610: the server sends the downlink audio mixing signal subjected to the secondary speech coding to corresponding terminal equipment;
step S611: the terminal equipment decodes the received downlink audio mixing signal;
step S612: and the terminal equipment plays the down-mixing sound signal.
Here, the relative order of steps S68 and S69 is not strictly limited in timing.
Fig. 6B is a schematic diagram illustrating an interaction timing chart of a redundancy control method based on a server routing scheme according to an embodiment of the present application. The specific implementation flow of the method is as follows:
step S61': the method comprises the steps that terminal equipment collects sound signals generated by a local user when the local user participates in a call;
step S62': the terminal equipment carries out voice coding on the collected sound signals;
step S63': the terminal equipment determines a target FEC redundancy according to the received packet loss rate and the received audio mixing contribution degree, and performs FEC encoding on the encoded sound signal according to the target FEC redundancy;
step S64': the terminal equipment packs the sound signals subjected to the FEC coding and sends the sound signals to the server;
step S65': the server detects packet loss of the sound signals of the uplink audio channels and returns the packet loss rate of the uplink audio channels to the corresponding terminal equipment;
step S66': the server analyzes the voice characteristic information of each sound signal to obtain the routing state of each uplink audio channel;
step S67': the server sends the routing state of each uplink audio channel to corresponding terminal equipment;
step S68': the terminal equipment performs voice decoding on the sound signal according to the routing state of the corresponding uplink audio channel;
step S69': the terminal equipment performs audio mixing on the sound signals of other uplink audio channels except the corresponding uplink audio channel according to the routing state of each uplink audio channel to correspondingly obtain downlink audio mixing signals;
step S610': and the terminal equipment plays the down-mixing sound signal.
Here, the relative order of steps S67' and S68' is not strictly limited in timing.
It should be noted that only one terminal device is shown in fig. 6A and fig. 6B of the embodiment of the present application; in practice there are multiple terminal devices participating in a call, which are not shown directly here, but the implementation of each terminal device is substantially the same. In addition, fig. 6A and 6B only illustrate the execution of one period; in practice these steps are executed in a loop.
Based on the same inventive concept, an embodiment of the present application further provides a redundancy control apparatus, as shown in fig. 7, which is a schematic structural diagram of a redundancy control apparatus 700 provided in the embodiment of the present application, and the redundancy control apparatus may include:
an obtaining unit 701, configured to obtain a sound signal of each uplink audio channel, where the uplink audio channel, the downlink audio channel, and the call participant terminal are in one-to-one correspondence;
a determining unit 702, configured to determine packet loss rates of uplink audio channels in each path respectively;
a predicting unit 703, configured to predict a mixing contribution of each uplink audio channel according to a participation of each sound signal in a down-mixing signal in each downlink audio channel;
a control unit 704, configured to obtain a target FEC redundancy of each uplink audio channel according to the packet loss ratio and the audio mixing contribution of each uplink audio channel.
Optionally, the control unit 704 is specifically configured to:
respectively obtaining the target FEC redundancy of each uplink audio channel according to the packet loss rate and the audio mixing contribution of each uplink audio channel;
and respectively sending the target FEC redundancy of each path of uplink audio channel to the call participation terminal of the corresponding uplink audio channel.
Optionally, the control unit 704 is specifically configured to:
and respectively sending the packet loss rate and the sound mixing contribution degree of each uplink audio channel to the call participation terminal corresponding to the uplink audio channel, so that the call participation terminal corresponding to each uplink audio channel determines the target FEC redundancy according to the received packet loss rate and the received sound mixing contribution degree.
Optionally, the prediction unit 703 is specifically configured to:
respectively acquiring downlink audio mixing signals of each downlink audio channel, wherein any downlink audio mixing signal is obtained by mixing audio signals of other uplink audio channels except the audio signal of the corresponding uplink audio channel;
and predicting the mixing contribution degree of each uplink audio channel according to the contribution of the sound signal of each uplink audio channel in each downlink mixing signal.
Optionally, the prediction unit 703 is specifically configured to:
analyzing the voice characteristic information of each voice signal to obtain a routing state of each uplink audio channel, wherein the routing state is used for indicating whether the server selects the voice signal of the uplink audio channel to participate in sound mixing processing;
and predicting the sound mixing contribution degree of each uplink audio channel according to the routing state of each uplink audio channel.
Optionally, periodically obtaining a downlink audio mixing signal of each downlink audio channel according to the sound signal of each uplink audio channel, and sending the downlink audio mixing signal to the corresponding call participation terminal;
the prediction unit 703 is specifically configured to:
according to the target accumulated smooth value corresponding to the downlink audio mixing signals of all downlink audio channels in the previous period, smoothing the sum of the downlink audio mixing signals of the downlink audio channels in the current period to obtain the target accumulated smooth value corresponding to the downlink audio mixing signals of all downlink audio channels in the current period, wherein the target accumulated smooth value corresponding to the downlink audio mixing signals of all downlink audio channels in the first period is the sum of the downlink audio mixing signals of all downlink audio channels in the first period or a first preset value; and
according to the contribution accumulated smooth value corresponding to the sound signal of each uplink audio channel in the previous period, smoothing the accumulated mixed sound contribution of each sound signal in the downlink mixed sound signals of other downlink audio channels except the corresponding downlink audio channel in the current period to obtain the contribution accumulated smooth value corresponding to the sound signal of each uplink audio channel in the current period, wherein the contribution accumulated smooth value corresponding to each uplink audio channel in the first period is as follows: the audio mixing contribution or a second preset value of each sound signal in the first period in the downlink audio mixing signals of other downlink audio channels except the corresponding downlink audio channel;
and respectively taking the ratio of the contribution accumulated smooth value corresponding to the sound signal of each uplink audio channel in the current period to the target accumulated smooth value corresponding to the downlink audio mixing signals of all the downlink audio channels in the current period as the audio mixing contribution degree of each uplink audio channel in the current period.
Optionally, the prediction unit 703 is specifically configured to:
calculating the target accumulated smooth value corresponding to the downlink audio mixing signals of all downlink audio channels in the current period by the following formula:

E(t) = β·E(t-1) + (1-β)·Σ_j D_j(t)

for any uplink audio channel, calculating the contribution accumulated smooth value corresponding to the sound signal of the uplink audio channel in the current period by the following formula:

C_i(t) = β·C_i(t-1) + (1-β)·Σ_{j≠i} w_ij(t)

where E(t) is the target accumulated smooth value corresponding to the downlink audio mixing signals of all downlink audio channels in the current period, E(t-1) is the target accumulated smooth value corresponding to the downlink audio mixing signals of all downlink audio channels in the previous period, D_j(t) is the downlink audio mixing signal of the jth downlink audio channel in the current period, C_i(t) is the contribution accumulated smooth value corresponding to the sound signal of the ith uplink audio channel in the current period, C_i(t-1) is the corresponding value in the previous period, w_ij(t) is the audio mixing contribution of the sound signal of the ith uplink audio channel in the jth downlink audio channel in the current period, the value ranges of i and j are both 1 to M, M is the number of call participation terminals, and β is a first preset parameter with a value range of (0, 1).
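The accumulated-smoothing computation of the mixing contribution degree described in the surrounding text can be sketched as below. The exact formulas appear only as images in the source, so the smoothing form, the β value, and the scalar "signal" magnitudes are assumptions:

```python
# Hedged sketch: both the per-channel contribution and the total of all
# downlink mix signals are smoothed with the same parameter beta; their
# ratio is the channel's mixing contribution degree.
def update_contribution(prev_c, prev_e, contribs_per_downlink,
                        downmix_totals, channel, beta=0.9):
    # E(t) = beta*E(t-1) + (1-beta) * sum_j D_j(t)
    e = beta * prev_e + (1 - beta) * sum(downmix_totals)
    # C_i(t) = beta*C_i(t-1) + (1-beta) * sum over j != i of w_ij(t)
    c = beta * prev_c + (1 - beta) * sum(
        w for j, w in contribs_per_downlink.items() if j != channel)
    return c, e, c / e   # mixing contribution degree of channel i

# Illustrative values for channel 1 across three downlink channels.
c, e, degree = update_contribution(
    prev_c=0.0, prev_e=1.0,
    contribs_per_downlink={1: 0.0, 2: 0.5, 3: 0.5},
    downmix_totals=[1.0, 1.0, 1.0], channel=1)
```

Excluding `j == channel` mirrors the rule that a channel's own downlink mix never contains its own uplink signal.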
Optionally, the routing state of each uplink audio channel is determined periodically according to the sound signal of each uplink audio channel, and the sound signal of the uplink audio channel corresponding to the selected routing state is sent to the corresponding call participant terminal, so that each call participant terminal obtains the downmix signal of the corresponding downlink audio channel;
the prediction unit 703 is specifically configured to:
for any uplink audio channel, if the routing state of the uplink audio channel indicates that it is selected, smoothing the routing state smooth value of the uplink audio channel in the previous period according to a second preset parameter to obtain the routing state smooth value of the uplink audio channel in the current period, and taking this smooth value as the sound mixing contribution degree of the uplink audio channel in the current period, wherein the routing state smooth value corresponding to the uplink audio channel in the first period is a third preset value;
if the routing state of the uplink audio channel indicates that it is not selected, smoothing the routing state smooth value of the uplink audio channel in the previous period according to a third preset parameter to obtain the routing state smooth value of the uplink audio channel in the current period, and taking this smooth value as the sound mixing contribution degree of the uplink audio channel in the current period, wherein the routing state smooth value corresponding to the uplink audio channel in the first period is a fourth preset value;
the value ranges of the second preset parameter and the third preset parameter are both (0, 1), and the third preset parameter is larger than the second preset parameter.
Optionally, the prediction unit 703 is specifically configured to:
if the routing state of the uplink audio channel indicates that it is selected, determining the routing state smooth value of the uplink audio channel in the current period by the following formula:

S(t) = α·S(t-1) + (1-α)

if the routing state of the uplink audio channel indicates that it is not selected, determining the routing state smooth value of the uplink audio channel in the current period by the following formula:

S(t) = η·S(t-1)

where S(t) is the routing state smooth value of the uplink audio channel in the current period, S(t-1) is the routing state smooth value of the uplink audio channel in the previous period, α is the second preset parameter, and η is the third preset parameter.
Optionally, the control unit 704 is specifically configured to:
for any path of uplink audio channel, if the audio mixing contribution of the uplink audio channel is smaller than a preset threshold, reducing the original FEC redundancy of the uplink audio channel determined based on the packet loss rate to obtain the target FEC redundancy of the uplink audio channel;
and if the audio mixing contribution degree of the uplink audio channel is greater than the preset threshold, increasing the original FEC redundancy of the uplink audio channel determined based on the packet loss rate to obtain the target FEC redundancy of the uplink audio channel.
Optionally, the control unit 704 is specifically configured to:
respectively obtaining an adjusting parameter corresponding to each uplink audio channel according to the audio mixing contribution degree of each uplink audio channel, wherein the adjusting parameter is a function value corresponding to a target function when the audio mixing contribution degree of the uplink audio channel is taken as an input parameter of the target function, the target function is a monotone increasing function, and the function value corresponding to the target function is 1 when the input parameter of the target function is a preset threshold;
and taking the product of the adjustment parameter corresponding to the uplink audio channel of each path and the original FEC redundancy determined based on the corresponding packet loss rate as the target FEC redundancy of the uplink audio channel of each path.
Based on the same inventive concept, an embodiment of the present application further provides a redundancy control apparatus, as shown in fig. 8, which is a schematic structural diagram of a second redundancy control apparatus 800 provided in the embodiment of the present application, and the redundancy control apparatus may include:
a redundancy obtaining unit 801, configured to obtain the target FEC redundancy of the corresponding uplink audio channel, where the target FEC redundancy is determined according to the mixing contribution and the packet loss rate of the uplink audio channel, the mixing contribution is determined according to the participation degree of the sound signal of the uplink audio channel in the downlink mixing signal of each downlink audio channel after the server obtains the sound signals of the uplink audio channels, and the uplink audio channels, downlink audio channels, and call participation terminals are in one-to-one correspondence;
the signal processing unit 802 is configured to perform FEC encoding on the voice-encoded sound signal of the uplink audio channel according to the target FEC redundancy.
For convenience of description, the above parts are separately described as modules (or units) according to functional division. Of course, the functionality of the various modules (or units) may be implemented in the same one or more pieces of software or hardware when implementing the present application.
Having described the redundancy control method and apparatus of the exemplary embodiments of the present application, an electronic device according to another exemplary embodiment of the present application is next described.
As will be appreciated by one skilled in the art, each aspect of the present application may be embodied as a system, a method, or a program product. Accordingly, each aspect of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
Fig. 9 is a block diagram illustrating an electronic device 900 according to an example embodiment, the apparatus comprising:
a processor 910;
a memory 920 for storing program code executable by the processor 910;
wherein the processor 910 is configured to execute the program code to implement a redundancy control method in an embodiment of the present disclosure, such as the steps shown in fig. 3 or the steps shown in fig. 5.
In an exemplary embodiment, a storage medium including program code is also provided, such as the memory 920 including program code; the program code is executable by the processor 910 of the electronic device 900 to perform the above method. Alternatively, the storage medium may be a non-transitory computer-readable storage medium, which may be, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.
In some possible embodiments, a server according to the present application may include at least one processing unit, and at least one storage unit. Wherein the storage unit stores program code which, when executed by the processing unit, causes the processing unit to perform the steps of the redundancy control method according to various exemplary embodiments of the present application described above in the present specification. For example, the processing unit may perform the steps as shown in fig. 3.
The server 100 according to this embodiment of the present application is described below with reference to fig. 10. The server 100 of fig. 10 is only an example and imposes no limitation on the functions or scope of use of the embodiments of the present application.
As shown in fig. 10, the server 100 is represented in the form of a general server. The components of the server 100 may include, but are not limited to: the at least one processing unit 101, the at least one memory unit 102, and a bus 103 connecting various system components (including the memory unit 102 and the processing unit 101).
Bus 103 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The storage unit 102 may include readable media in the form of volatile memory, such as a Random Access Memory (RAM) 1021 and/or a cache storage unit 1022, and may further include a Read Only Memory (ROM) 1023.
Storage unit 102 may also include a program/utility 1025 having a set (at least one) of program modules 1024, such program modules 1024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The server 100 may also communicate with one or more external devices 104 (e.g., a keyboard, a pointing device, etc.), with one or more devices that enable a user to interact with the server 100, and/or with any device (e.g., a router, a modem, etc.) that enables the server 100 to communicate with one or more other servers. Such communication may occur through an input/output (I/O) interface 105. Also, the server 100 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 106. As shown, the network adapter 106 communicates with the other modules of the server 100 over the bus 103. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the server 100, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Referring to fig. 11, the terminal device 1100 includes a display unit 1140, a processor 1180, and a memory 1120. The display unit 1140 includes a display panel 1141 for displaying information input by or provided to the user, the various object selection interfaces of the terminal device 1100, and the like; in this embodiment, the display unit 1140 is mainly used for displaying the interfaces of applications installed on the terminal device 1100, shortcut windows, and the like. Alternatively, the display panel 1141 may take the form of an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The processor 1180 is configured to read a computer program and then execute the method defined by it; for example, the processor 1180 reads the program of an application supporting multi-person calls, runs that application on the terminal device 1100, and displays the application's interface on the display unit 1140. The processor 1180 may include one or more general-purpose processors, and may further include one or more DSPs (Digital Signal Processors) for performing the relevant operations to implement the technical solutions provided in the embodiments of the present application.
The memory 1120 generally includes internal and external memory. The internal memory may be random access memory (RAM), read-only memory (ROM), cache memory (CACHE), and the like; the external memory may be a hard disk, an optical disk, a USB disk, a floppy disk, or a tape drive. The memory 1120 stores computer programs, including the application programs corresponding to the applications, and other data, which may include data generated after the operating system or the application programs run, including system data (e.g., configuration parameters of the operating system) and user data. The program instructions in the embodiments of the present application are stored in the memory 1120, and the processor 1180 executes them, thereby implementing the redundancy control method discussed above or the functions of the application discussed above.
Further, the display unit 1140 may also receive input numerical or character information and contact or non-contact touch gestures, and generate signal inputs related to the user settings and function control of the terminal device 1100. Specifically, in the embodiment of the present application, the display panel 1141, such as a touch screen, may collect touch operations performed by the user on or near it (e.g., operations performed on or near the display panel 1141 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a preset program. Alternatively, the display panel 1141 may include a touch detection device and a touch controller: the touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 1180, and can receive and execute commands sent by the processor 1180.
The display panel 1141 may be implemented by various types, such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. In addition to the display unit 1140, the terminal device 1100 may further include an input unit 1130, and the input unit 1130 may include, but is not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like. In fig. 11, an example is given in which the input unit 1130 includes an image input device 1131 and other input devices 1132.
In addition to the above, the terminal device 1100 may also include a power supply 1190 for powering the other modules, an audio circuit 1160, a near field communication module 1170, and RF circuits 1110. The terminal device 1100 may also include one or more sensors 1150, such as acceleration sensors, light sensors, and pressure sensors. The audio circuit 1160 includes a speaker 1161 and a microphone 1162; for example, when the user uses voice control, the terminal device 1100 may collect the user's voice through the microphone 1162, perform the corresponding control, and play a corresponding prompt tone through the speaker 1161 when the user needs to be prompted.
In some possible embodiments, aspects of the redundancy control method provided by the present application may also be implemented in the form of a program product comprising program code; when the program product runs on a computer device, the program code causes the computer device to perform the steps in the redundancy control method of the various exemplary embodiments of the present application described above in this specification, for example the steps shown in fig. 3 or fig. 5.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a command execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (15)

1. A redundancy control method, comprising:
respectively acquiring the sound signal of each uplink audio channel, wherein the uplink audio channels, the downlink audio channels, and the call participant terminals are in one-to-one correspondence;
respectively determining the packet loss rate of each uplink audio channel;
predicting the mixing contribution of each uplink audio channel according to the participation of each sound signal in the downlink mixing signal of each downlink audio channel; and
and respectively obtaining the target FEC redundancy of each uplink audio channel according to the packet loss rate and the audio mixing contribution of each uplink audio channel.
2. The method according to claim 1, wherein the obtaining the target FEC redundancy of each uplink audio channel according to the packet loss ratio and the audio mixing contribution of each uplink audio channel respectively comprises:
respectively obtaining the target FEC redundancy of each uplink audio channel according to the packet loss rate and the audio mixing contribution of each uplink audio channel;
and respectively sending the target FEC redundancy of each uplink audio channel to the call participation terminal of the corresponding uplink audio channel.
3. The method according to claim 1, wherein the obtaining the target FEC redundancy of each uplink audio channel according to the packet loss ratio and the audio mixing contribution of each uplink audio channel respectively comprises:
and respectively sending the packet loss rate and the audio mixing contribution degree of each uplink audio channel to the call participation terminal corresponding to the uplink audio channel, so that the call participation terminal corresponding to each uplink audio channel determines the target FEC redundancy according to the received packet loss rate and the received audio mixing contribution degree.
4. The method according to claim 1, wherein predicting the mixing contribution of each uplink audio channel according to the participation of each sound signal in the downlink mixing signal of each downlink audio channel comprises:
respectively acquiring the downlink mixing signal of each downlink audio channel, wherein any downlink mixing signal is obtained by mixing the sound signals of the uplink audio channels other than the corresponding uplink audio channel; and
predicting the mixing contribution of each uplink audio channel according to the contribution of the sound signal of each uplink audio channel to each downlink mixing signal.
5. The method according to claim 4, wherein the downlink mixing signal of each downlink audio channel is periodically obtained from the sound signals of the uplink audio channels and sent to the corresponding call participant terminal;
predicting the mixing contribution of each uplink audio channel according to the contribution of the sound signal of each uplink audio channel to each downlink mixing signal specifically comprises:
smoothing the sum of the downlink mixing signals of all downlink audio channels in the current period according to the target accumulated smooth value corresponding to the downlink mixing signals of all downlink audio channels in the previous period, to obtain the target accumulated smooth value for the current period, wherein the target accumulated smooth value in the first period is the sum of the downlink mixing signals of all downlink audio channels in the first period or a first preset value;
smoothing the accumulated mixing contribution of each sound signal in the current period to the downlink mixing signals of the downlink audio channels other than its corresponding downlink audio channel, according to the contribution accumulated smooth value corresponding to that sound signal in the previous period, to obtain the contribution accumulated smooth value for the current period, wherein the contribution accumulated smooth value in the first period is the mixing contribution of each sound signal in the first period or a second preset value; and
respectively taking, as the mixing contribution of each uplink audio channel in the current period, the ratio of the contribution accumulated smooth value corresponding to the sound signal of that uplink audio channel in the current period to the target accumulated smooth value for the current period.
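The two accumulators of claim 5 and the final ratio can be sketched with exponential smoothing. The smoothing rule and the factor `alpha` are assumptions for illustration, since the claim does not fix a concrete formula.

```python
def smooth(prev, value, alpha=0.9):
    """Generic exponential smoothing; the factor 0.9 is an assumed value."""
    return alpha * prev + (1 - alpha) * value

def mixing_contribution(prev_total, total_mix, prev_contrib, contrib):
    """Claim 5 sketch: smooth the sum of all downlink mixing signals and a
    channel's accumulated contribution to the other channels' mixes over
    periods, then return their ratio as this period's mixing contribution."""
    total_s = smooth(prev_total, total_mix)
    contrib_s = smooth(prev_contrib, contrib)
    return contrib_s / total_s, total_s, contrib_s
```

The returned smoothed values are carried forward as the "previous period" inputs of the next period, so short bursts of speech change the contribution only gradually.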
6. The method according to claim 1, wherein the predicting the mixing contribution of each upstream audio channel according to the participation of each sound signal in the downstream mixing signal of each downstream audio channel comprises:
analyzing the sound characteristic information of each sound signal to obtain the routing state of each uplink audio channel, wherein the routing state indicates whether the server selects the sound signal of that uplink audio channel to participate in the mixing processing; and
and predicting the sound mixing contribution degree of each uplink audio channel according to the routing state of each uplink audio channel.
7. The method according to claim 6, wherein the routing state of each uplink audio channel is periodically determined according to the sound signal of each uplink audio channel, and the sound signals of the uplink audio channels whose routing states are selected are sent to the corresponding call participant terminals, so that each call participant terminal obtains the downlink mixing signal of its corresponding downlink audio channel;
predicting the mixing contribution of each uplink audio channel according to the routing state of each uplink audio channel specifically comprises:
for any uplink audio channel, if the routing state of the uplink audio channel indicates that it is selected, smoothing the routing state smooth value of the uplink audio channel in the previous period according to a second preset parameter to obtain the routing state smooth value in the current period, and taking that smooth value as the mixing contribution of the uplink audio channel in the current period, wherein the routing state smooth value of the uplink audio channel in the first period is a third preset value;
if the routing state of the uplink audio channel indicates that it is not selected, smoothing the routing state smooth value of the uplink audio channel in the previous period according to a third preset parameter to obtain the routing state smooth value in the current period, and taking that smooth value as the mixing contribution of the uplink audio channel in the current period, wherein the routing state smooth value of the uplink audio channel in the first period is a fourth preset value;
wherein the value ranges of the second preset parameter and the third preset parameter are both (0, 1), the third preset parameter is greater than the second preset parameter, and the third preset value is greater than the fourth preset value.
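The routing-state smoothing of claim 7 can be sketched as follows. The claim fixes only the parameter constraints, not the update rule itself, so the rule below (rise toward 1 when selected, decay toward 0 otherwise) and the concrete parameter values are assumptions.

```python
def update_route_state(prev, selected, a_sel=0.5, a_unsel=0.9):
    """Claim 7 sketch with an assumed update rule: the smoothed routing
    state rises toward 1 while the channel is selected for mixing and
    decays toward 0 while it is not. Per the claim, both factors lie in
    (0, 1) and the unselected factor is the larger one, so a channel that
    drops out of the mix loses its contribution only gradually."""
    if selected:
        return a_sel * prev + (1 - a_sel) * 1.0
    return a_unsel * prev
```

Using the smoothed state directly as the mixing contribution means a channel that has been silent (unselected) for many periods ends up with a small contribution and, by claim 8, a reduced FEC redundancy.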
8. The method according to any one of claims 1 to 7, wherein obtaining the target FEC redundancy of each uplink audio channel according to the packet loss rate and the mixing contribution of each uplink audio channel comprises:
for any path of uplink audio channel, if the audio mixing contribution of the uplink audio channel is smaller than a preset threshold, reducing the original FEC redundancy of the uplink audio channel determined based on the packet loss rate to obtain the target FEC redundancy of the uplink audio channel;
if the audio mixing contribution of the uplink audio channel is greater than a preset threshold, increasing the original FEC redundancy of the uplink audio channel determined based on the packet loss rate to obtain the target FEC redundancy of the uplink audio channel.
9. The method according to claim 8, wherein the obtaining the target FEC redundancy of each uplink audio channel according to the packet loss ratio and the audio mixing contribution of each uplink audio channel respectively comprises:
respectively obtaining an adjustment parameter for each uplink audio channel according to the mixing contribution of that uplink audio channel, wherein the adjustment parameter is the value of a target function taking the mixing contribution of the uplink audio channel as its input, the target function is monotonically increasing, and its value is 1 when its input equals the preset threshold; and
taking the product of the adjustment parameter of each uplink audio channel and the original FEC redundancy determined based on the corresponding packet loss rate as the target FEC redundancy of that uplink audio channel.
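Claim 9 only constrains the adjustment function to be monotonically increasing with value 1 at the preset threshold; an exponential is one illustrative choice satisfying both constraints. The threshold and steepness constants here are assumed, not taken from the patent.

```python
import math

def adjustment(contribution, threshold=0.2, k=4.0):
    """Monotonically increasing and exactly 1 at the threshold, as claim 9
    requires; the exponential form and the constants are assumptions."""
    return math.exp(k * (contribution - threshold))

def target_fec_redundancy(original, contribution, threshold=0.2):
    """Claims 8-9 sketch: scale the loss-rate-based redundancy down for
    channels contributing less than the threshold and up for those above."""
    return original * adjustment(contribution, threshold)
```

The multiplicative form keeps the loss-rate estimate as the baseline and lets the mixing contribution redistribute FEC bandwidth toward the channels other participants actually hear.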
10. A redundancy control method, comprising:
obtaining a target FEC redundancy of a corresponding uplink audio channel, wherein the target FEC redundancy is determined according to the mixing contribution and the packet loss rate of the uplink audio channel; the mixing contribution is determined, after the server respectively obtains the sound signal of each uplink audio channel, according to the degree to which the sound signal of the uplink audio channel participates in the downlink mixing signal of each downlink audio channel; and the uplink audio channels, the downlink audio channels, and the call participant terminals are in one-to-one correspondence; and
and performing FEC encoding on the voice-encoded sound signal of the uplink audio channel according to the target FEC redundancy.
11. A redundancy control apparatus, comprising:
an acquisition unit, configured to respectively acquire the sound signal of each uplink audio channel, wherein the uplink audio channels, the downlink audio channels, and the call participant terminals are in one-to-one correspondence;
the determining unit is used for respectively determining the packet loss rate of each uplink audio channel;
a prediction unit, configured to predict the mixing contribution of each uplink audio channel according to the participation of each sound signal in the downlink mixing signal of each downlink audio channel; and
and the control unit is used for respectively obtaining the target FEC redundancy of each path of uplink audio channel according to the packet loss rate and the audio mixing contribution degree of each path of uplink audio channel.
12. The apparatus as claimed in claim 11, wherein said control unit is specifically configured to:
for any path of uplink audio channel, if the audio mixing contribution of the uplink audio channel is smaller than a preset threshold, reducing the original FEC redundancy of the uplink audio channel determined based on the packet loss rate to obtain the target FEC redundancy of the uplink audio channel;
if the audio mixing contribution of the uplink audio channel is greater than a preset threshold, increasing the original FEC redundancy of the uplink audio channel determined based on the packet loss rate to obtain the target FEC redundancy of the uplink audio channel.
13. A redundancy control apparatus, comprising:
a redundancy obtaining unit, configured to obtain a target FEC redundancy of a corresponding uplink audio channel, where the target FEC redundancy is determined according to the mixing contribution and the packet loss rate of the uplink audio channel; the mixing contribution is determined, after the server obtains the sound signal of each uplink audio channel, according to the degree to which the sound signal of the uplink audio channel participates in the downlink mixing signal of each downlink audio channel; and the uplink audio channels, the downlink audio channels, and the call participant terminals are in one-to-one correspondence; and
a signal processing unit, configured to perform FEC encoding on the speech-encoded sound signal of the uplink audio channel according to the target FEC redundancy.
14. An electronic device, comprising a processor and a memory, wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1-9 or to perform the steps of the method of claim 10.
15. A computer-readable storage medium, characterized in that it comprises program code for causing an electronic device to carry out the steps of the method of any one of claims 1-9, or to carry out the steps of the method of claim 10, when the program code is run on the electronic device.
CN202010452126.8A 2020-05-26 2020-05-26 Redundancy control method and device, electronic equipment and storage medium Active CN111371957B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010452126.8A CN111371957B (en) 2020-05-26 2020-05-26 Redundancy control method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111371957A true CN111371957A (en) 2020-07-03
CN111371957B CN111371957B (en) 2020-08-25

Family

ID=71211061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010452126.8A Active CN111371957B (en) 2020-05-26 2020-05-26 Redundancy control method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111371957B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111951813A (en) * 2020-07-20 2020-11-17 腾讯科技(深圳)有限公司 Voice coding control method, device and storage medium
CN113192520A (en) * 2021-07-01 2021-07-30 腾讯科技(深圳)有限公司 Audio information processing method and device, electronic equipment and storage medium
CN113192519A (en) * 2021-04-29 2021-07-30 北京达佳互联信息技术有限公司 Audio encoding method and apparatus, and audio decoding method and apparatus
CN114448588A (en) * 2022-01-14 2022-05-06 杭州网易智企科技有限公司 Audio transmission method and device, electronic equipment and computer readable storage medium
WO2023202250A1 (en) * 2022-04-18 2023-10-26 腾讯科技(深圳)有限公司 Audio transmission method and apparatus, terminal, storage medium and program product

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101030832A (en) * 2006-03-03 2007-09-05 华为技术有限公司 Method and system for realizing realtime transmission protocol message redundancy
WO2015015880A1 (en) * 2013-07-30 2015-02-05 ソニー株式会社 Information processing device, information processing method, and program
CN104469032A (en) * 2014-10-30 2015-03-25 苏州科达科技股份有限公司 Sound mixing processing method and system
CN104539816A (en) * 2014-12-25 2015-04-22 广州华多网络科技有限公司 Intelligent voice mixing method and device for multi-party voice communication
CN104917671A (en) * 2015-06-10 2015-09-16 腾讯科技(深圳)有限公司 Mobile terminal based audio processing method and device
CN105610635A (en) * 2016-02-29 2016-05-25 腾讯科技(深圳)有限公司 Voice code transmitting method and apparatus
CN105991577A (en) * 2015-02-11 2016-10-05 腾讯科技(深圳)有限公司 Voice communication processing method, voice communication processing system and cloud server
CN106452663A (en) * 2015-08-11 2017-02-22 阿里巴巴集团控股有限公司 Network communication data transmission method based on RTP protocol, and communication equipment
CN110545161A (en) * 2019-08-13 2019-12-06 河北远东通信系统工程有限公司 multimedia data real-time transmission method with redundancy
CN110838894A (en) * 2019-11-27 2020-02-25 腾讯科技(深圳)有限公司 Voice processing method, device, computer readable storage medium and computer equipment

Also Published As

Publication number Publication date
CN111371957B (en) 2020-08-25

Similar Documents

Publication Publication Date Title
CN111371957B (en) Redundancy control method and device, electronic equipment and storage medium
CN108011686B (en) Information coding frame loss recovery method and device
CN104040622B (en) System, method, equipment and the computer-readable media controlled for key threshold value
US10617362B2 (en) Machine learned optimizing of health activity for participants during meeting times
US8589153B2 (en) Adaptive conference comfort noise
CN105099949A (en) Jitter buffer control based on monitoring for dynamic states of delay jitter and conversation
CN111464262B (en) Data processing method, device, medium and electronic equipment
CN105099795A (en) Jitter buffer level estimation
CN111585776B (en) Data transmission method, device, equipment and computer readable storage medium
US9325853B1 (en) Equalization of silence audio levels in packet media conferencing systems
Silveira et al. Predicting packet loss statistics with hidden Markov models for FEC control
CN111628992B (en) Multi-person call control method and device, electronic equipment and storage medium
CN112449208B (en) Voice processing method and device
US20230124470A1 (en) Enhancing musical sound during a networked conference
US10701124B1 (en) Handling timestamp inaccuracies for streaming network protocols
US20070253557A1 (en) Methods And Apparatuses For Processing Audio Streams For Use With Multiple Devices
CN112751820A (en) Digital voice packet loss concealment using deep learning
US20200090648A1 (en) Maintaining voice conversation continuity
CN113192520B (en) Audio information processing method and device, electronic equipment and storage medium
EP3259906B1 (en) Handling nuisance in teleconference system
US11489620B1 (en) Loss recovery using streaming codes in forward error correction
CN113936669A (en) Data transmission method, system, device, computer readable storage medium and equipment
Wah et al. The design of VoIP systems with high perceptual conversational quality
US20230106959A1 (en) Loss recovery using streaming codes in forward error correction
CN116996622B (en) Voice data transmission method, device, equipment, medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40025790

Country of ref document: HK

TR01 Transfer of patent right

Effective date of registration: 20220211

Address after: 510310, No. 1 Brand Area, No. 397 Xingang Middle Road, Haizhu District, Guangzhou City, Guangdong Province

Patentee after: GUANGZHOU TENCENT TECHNOLOGY Co.,Ltd.

Address before: 35th Floor, Tencent Building, Keji Zhongyi Road, High-tech Zone, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.