CN117812053A - Voice data processing method, equipment and storage medium - Google Patents


Info

Publication number
CN117812053A
Authority
CN
China
Prior art keywords
voice, current frame, packet loss, data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311844838.4A
Other languages
Chinese (zh)
Inventor
李婧
胡小鹏
尚德建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Keda Special Video Co ltd
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Special Video Co ltd
Suzhou Keda Technology Co Ltd
Application filed by Suzhou Keda Special Video Co ltd, Suzhou Keda Technology Co Ltd filed Critical Suzhou Keda Special Video Co ltd
Priority to CN202311844838.4A
Publication of CN117812053A
Legal status: Pending


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in wireless communication networks

Abstract

The invention provides a voice data processing method, system, equipment, and storage medium. The method comprises the following steps: when the current decoding channel times out without reading the current frame of speech, judging whether the current frame is valid speech using at least one audio frame preceding it; if so, performing packet loss concealment on the current frame with a packet loss concealment model; if not, filling the current frame with the speech of the preceding audio frame. In this embodiment, the PLC model is invoked only when the speech read has timed out and the current frame is judged likely to be valid speech, rather than for every unread frame. This reduces the number of PLC invocations, lowers the model's occupancy of system resources, and improves the utilization of the chip on which the model is loaded. Moreover, because the PLC model runs at the receiving end, the method places fewer demands on the sender's audio encoder and more readily accommodates many terminal types in a mixed-audio conference.

Description

Voice data processing method, equipment and storage medium
Technical Field
The present invention relates to the field of speech processing technologies, and in particular to a voice data processing method, device, and storage medium.
Background
With the growing reach of internet technology, voice calls have become commonplace. Call quality is affected by network conditions: instability in the transmission network causes packet loss during transmission, so the received speech stutters and loses continuity, degrading the listener's experience.
How to handle packet loss in voice communication is therefore a technical problem the industry needs to solve.
It should be noted that the information disclosed in this background section is only intended to enhance understanding of the background of the invention, and may therefore include information that does not constitute prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
To address the problems in the prior art, the invention aims to provide a voice data processing method, device, and storage medium that overcome the low voice call quality of existing approaches.
The embodiment of the disclosure provides a voice data processing method, which comprises the following steps:
when the current decoding channel times out without reading the current frame of speech, judging whether the current frame is valid speech using at least one audio frame preceding the current frame, where the current decoding channel is a decoding channel participating in mixing;
if so, performing packet loss concealment on the current frame using a packet loss concealment model;
if not, filling the current frame with the speech of the audio frame immediately preceding it.
In some embodiments, the voice data processing method further comprises:
before judging, when the current decoding channel times out without reading the current frame of speech, whether the current frame is valid speech using at least one preceding audio frame: screening out, from a plurality of decoding channels, the current decoding channel bearing a packet loss flag, and judging whether that channel has timed out without reading the current frame of speech.
In some embodiments, the voice data processing method further comprises:
before judging whether the current decoding channel has timed out without reading the current frame of speech, triggering the current decoding channel, via the mixer's timer, to read the current frame of speech from the buffer.
In some embodiments, judging whether the current decoding channel has timed out without reading the current frame of speech comprises:
if the current decoding channel fails to read the current frame of speech on the first attempt, re-reading it at intervals until a target number of attempts is reached; if the frame still has not been read, judging that the current decoding channel has timed out without reading the current frame of speech.
In some embodiments, when the current decoding channel times out without reading the current frame of speech, judging whether the current frame is valid speech using at least one preceding audio frame comprises:
when the current decoding channel times out without reading the current frame of speech, and the channel's continuous packet loss duration before the current frame does not exceed a preset period, judging whether the current frame is valid speech using at least one audio frame preceding the current frame.
In some embodiments, the voice data processing method further comprises:
filling the current frame with silence when the continuous packet loss duration of the current decoding channel before the current frame exceeds the preset period.
In some embodiments, the voice data processing method further comprises:
if the current decoding channel successfully reads the current frame of speech, judging from the frame's identifier and timestamp whether it is continuous with the adjacent preceding frame;
if not, decoding the current frame of speech and then smoothing it;
if so, decoding the current frame of speech directly.
In some embodiments, the voice data processing method further comprises:
when the current frame of speech is continuous with the adjacent preceding frame and participates in mixing, computing algorithm parameters of the packet loss concealment model from the decoded data of the current frame, and updating the model with those parameters.
The embodiment of the disclosure also provides a voice data processing system, which comprises:
a first judging module that, when the current decoding channel times out without reading the current frame of speech, judges whether the current frame is valid speech using at least one preceding audio frame;
a compensation module that, if so, performs packet loss concealment on the current frame using a packet loss concealment model;
and a filling module that, if not, fills the current frame with the speech of the audio frame immediately preceding it.
The embodiment of the invention also provides electronic equipment, which comprises:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the above-described voice data processing method via execution of executable instructions.
The embodiment of the present invention also provides a computer-readable storage medium storing a program which, when executed, implements the steps of the above-described voice data processing method.
According to the voice data processing method, system, equipment, and storage medium above, the PLC model performs packet loss concealment only when the speech read has timed out and the current frame is judged likely to be valid speech, rather than for every unread frame. This reduces the number of PLC invocations, lowers the model's occupancy of system resources, and improves the utilization of the chip on which it is loaded. Moreover, because the PLC model runs at the receiving end, the method places fewer demands on the sender's audio encoder and more readily accommodates many terminal types in a mixed-audio conference.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings.
Fig. 1 shows one of the flowcharts of the voice data processing method of the embodiment of the present disclosure.
FIG. 2 shows a second flowchart of a voice data processing method according to an embodiment of the present disclosure.
FIG. 3 shows a third flowchart of a voice data processing method according to an embodiment of the present disclosure.
Fig. 4 shows a fourth flowchart of a voice data processing method of an embodiment of the present disclosure.
Fig. 5 shows one of the structural schematic diagrams of the voice data processing system of the embodiment of the present disclosure.
FIG. 6 shows a second schematic diagram of a voice data processing system according to an embodiment of the present disclosure.
Fig. 7 is a schematic structural view of the electronic device of the present invention.
Fig. 8 is a schematic structural view of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, for the purpose of making the objects, technical solutions and advantages of the present invention more apparent. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware forwarding modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
Furthermore, the flow shown in the drawings is merely illustrative and not necessarily all steps are included. For example, some steps may be decomposed, some steps may be combined or partially combined, and the order of actual execution may be changed according to actual situations.
In the related art, packet loss concealment techniques for voice communication fall mainly into two categories:
techniques that compensate at the sending end, such as forward error correction (FEC), redundant audio data (RED), and automatic retransmission;
and techniques that compensate at the receiving end, such as a packet loss concealment (PLC) module using methods like waveform correlation or noise filling.
Of these, the first increases the complexity of the sending end, while the second involves complex PLC algorithms that strain the performance of the receiving end, i.e., they increase the computational load of the PLC algorithm on the system.
The embodiment of the disclosure provides a voice data processing method, a voice data processing system, voice data processing equipment and a voice data processing storage medium, which reduce the operation burden of a PLC model, thereby improving the resource utilization rate of a voice data processing system.
Fig. 1 shows a flowchart of a voice data processing method according to an embodiment of the present disclosure; the method may be executed by a voice receiving device. A typical application scenario is a video conference, specifically decoding, mixing, and outputting the audio captured on site.
As shown in fig. 1, the voice data processing method includes, but is not limited to, the following steps:
step 110: when the current decoding channel times out without reading the current frame of speech, judging whether the current frame is valid speech using at least one audio frame preceding the current frame, where the current decoding channel is a decoding channel participating in mixing;
step 120: if so, performing packet loss concealment on the current frame using a packet loss concealment model;
step 130: if not, filling the current frame with the speech of the audio frame immediately preceding it.
In this embodiment, only the decoding channels participating in mixing perform packet loss concealment; channels not participating in mixing perform no loss processing, and the current decoding channel here is one that participates in mixing. On that basis, the PLC model is used only when the speech read has timed out and the current frame is judged likely to be valid speech; concealment is not applied to every channel or to every unread frame. This reduces the number of PLC invocations, lowers the model's occupancy of system resources, and improves the utilization of the chip on which it is loaded. Moreover, using a receiving-end PLC model places fewer demands on the sender's audio encoder, making it easier to accommodate many terminal types in a mixed-audio conference.
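The branching of steps 110 to 130 can be sketched as follows. `conceal_unread_frame`, its arguments, and the two callables are illustrative names we introduce, not identifiers from the patent:

```python
def conceal_unread_frame(prev_frames, is_valid_speech, plc_compensate):
    """Sketch of steps 110-130. prev_frames: recently decoded frames
    (newest last), each a list of PCM samples. is_valid_speech and
    plc_compensate stand in for the patent's validity check and PLC
    model; both are hypothetical callables."""
    if is_valid_speech(prev_frames[-1]):       # step 110: likely valid speech?
        return plc_compensate(prev_frames)     # step 120: run the PLC model
    return list(prev_frames[-1])               # step 130: repeat the last frame
```

The key point the sketch captures is that the (expensive) PLC callable runs only on the "valid speech" branch; the cheap fallback simply copies the previous frame.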
In one embodiment, the PLC model is built into an embedded microcontroller unit (MCU); reducing the number of PLC invocations improves the utilization of the embedded chip hosting the MCU while preserving voice call quality.
In one application scenario, a voice receiving end continuously receives a voice audio code stream and decodes the received audio code stream frame by frame through a decoding channel. In one embodiment, as shown in fig. 2, the voice data acquisition procedure includes, but is not limited to, the following steps:
step 210: receiving a voice code stream sent by a voice sender terminal;
step 220: monitoring whether voice packets received by a plurality of decoding channels within a certain period of time are lost;
step 230: if yes, marking a packet loss identifier for a decoding channel with a lost packet;
for example, the packet loss flag of the decoding channel with the lost packet is set to 1;
step 240: if not, clearing the packet loss identification of the decoding channel;
step 250: the voice data received in the process are all put into the buffer.
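The monitoring loop of steps 210 to 250 might look roughly like the sketch below. The `DecodeChannel` class, its sequence-gap test, and the windowing methods are assumptions for illustration; the patent does not specify how loss is detected:

```python
class DecodeChannel:
    """Per-channel state for loss monitoring (steps 210-250), assuming
    packets carry consecutive sequence numbers when none are lost."""

    def __init__(self):
        self.loss_flag = 0            # packet loss identifier (0 or 1)
        self.last_seq = None
        self.buffer = []              # step 250: pre-decoding buffer
        self._lost_in_window = False

    def start_window(self):
        # Begin a new monitoring period of configurable length.
        self._lost_in_window = False

    def on_packet(self, seq, payload):
        # A gap in sequence numbers indicates loss on this channel.
        if self.last_seq is not None and seq != self.last_seq + 1:
            self._lost_in_window = True
        self.last_seq = seq
        self.buffer.append(payload)   # all received data goes to the buffer

    def end_window(self):
        # Step 230 / 240: set the flag if loss occurred, else clear it.
        self.loss_flag = 1 if self._lost_in_window else 0
```

In this sketch the flag is recomputed every window, matching step 240's clearing of the identifier when no loss is observed.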
In this case, the embodiment of the present disclosure further provides a packet loss processing method shown in fig. 3:
step 310: screening out the current decoding channel bearing a packet loss flag from a plurality of decoding channels, and judging whether that channel has timed out without reading the current frame of speech;
step 320: if the channel has timed out without reading the current frame, and it is a decoding channel participating in mixing, judging whether the current frame is valid speech using at least one audio frame preceding the current frame;
step 330: if so, performing packet loss concealment on the current frame using a packet loss concealment model;
step 340: if not, filling the current frame with the speech of the audio frame immediately preceding it.
In this embodiment, the receiving side keeps statistics on voice packets over a given period and flags decoding channels according to whether loss occurred in that period, thereby screening out the channels that may need concealment under weak network conditions; concealment is then applied only to valid speech on channels participating in mixing. The PLC model therefore runs only for channels that both may have lost packets and participate in mixing, which reduces its computational load. The monitoring period can be set as needed.
In one embodiment, voice data is buffered before decoding; the decoding channel reads data from the buffer and decodes it, which helps control delay. Optionally, the receiving end has a timer that triggers a decoding event, upon which the corresponding decoding channel reads voice data from the buffer. The receiving end includes a mixer; the timer belongs to the mixer, which periodically triggers decoding events through it.
In this case, fig. 4 shows a voice data processing method according to an embodiment of the present disclosure, which includes, but is not limited to, the following steps:
step 410: the current decoding channel reads the current frame voice from the buffer area;
Specifically, the mixer's timer triggers the decoder's decoding event, and the decoding channel reads the voice data in the buffer. Having the mixer's timer drive the decoder lets the buffer hold all pre-decoding data in one place, which makes delay easier to control.
Step 420: judging whether the current decoding channel has a packet loss identifier or not;
if the result of the execution of step 420 is yes, that is, the current decoding channel has a packet loss identifier, step 430 is executed: judging whether the current decoding channel can read the current frame of voice or not;
if the execution result of step 420 is no, that is, the current decoding channel has no packet loss identifier, the obtained current frame of speech is directly decoded, whether to participate in mixing is determined based on the decoded data, and the data participating in mixing is input into the mixing module.
If the result of the execution of step 430 is yes, that is, the current decoding channel reads the current frame of speech, step 440 is executed: decoding the read current frame voice data;
and after decoding, based on the decoded data, performs step 450: judging whether the current frame of voice participates in mixing;
if the result of the execution of step 450 is yes, that is, the current frame of speech participates in the mixing, step 460 is executed: judging whether the read current frame voice is continuous with the last frame voice;
specifically, the identifier and timestamp of the current frame are obtained from the decoded data, and it is judged whether the current frame is continuous with the voice of the adjacent preceding audio frame;
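A minimal sketch of this continuity test, assuming RTP-style sequence numbers and timestamps with a fixed per-frame timestamp increment (the patent specifies neither the fields' encoding nor the increment):

```python
# Assumed timestamp increment per frame, e.g. 320 ticks for a 20 ms
# frame at 16 kHz; this value is illustrative.
FRAME_TS_INCREMENT = 320

def is_continuous(prev_seq, prev_ts, cur_seq, cur_ts,
                  frame_ts=FRAME_TS_INCREMENT):
    """Step 460 sketch: the frames are continuous iff both the sequence
    identifier and the timestamp advance by exactly one frame."""
    return cur_seq == prev_seq + 1 and cur_ts == prev_ts + frame_ts
```

A failed test would route the frame to the smoothing path described below rather than to direct decoding.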
if the result of step 460 is yes, that is, the read current frame is continuous with the previous frame, step 470 is executed: inferring the algorithm parameters of the PLC model from the decoded data;
the PLC model is updated with those parameters for subsequent concealment, while the current frame of speech is input into the mixing module;
in these steps, the PLC model's algorithm parameters are inferred back from the decoded data, i.e., a periodic-signal model is derived from the decoded samples; the PLC algorithm therefore applies to all audio coding formats and is easier to deploy;
if the result of step 460 is no, that is, the read current frame is discontinuous with the previous frame, the current frame and the previous frame are smoothed and then input into the mixing module;
smoothing discontinuous speech against the previous frame before mixing preserves call quality;
if the result of step 430 is no, that is, the current decoding channel has not read the current frame of speech, step 480 is executed: judging whether the read has timed out;
specifically, if the current decoding channel fails to read the current frame on the first attempt, it re-reads at intervals until the target number of attempts is reached; if the frame still has not been read, the channel is judged to have timed out without reading the current frame. If a read succeeds within the target number of attempts, no timeout is declared and the flow returns to step 430;
in one example, the retry interval may be 2 ms and the target number of attempts 3; both parameters can be adjusted as needed.
If the result of step 480 is yes, i.e. the current frame of speech is not read out after the timeout, step 490 is executed: judging whether the current decoding channel participates in audio mixing;
if the result of step 490 is yes, i.e. the current decoding channel participates in the mixing, step 4100 is executed: calculating continuous packet loss time length of the current frame;
if the current decoding channel does not participate in mixing, the flow returns with no processing for that channel; skipping PLC calls for frame loss on non-mixing channels reduces the model's usage frequency and improves the utilization of the chip on which it is loaded;
step 4110: judging whether the current continuous packet loss time length does not exceed a preset time period or not;
the continuous packet loss duration covers the span from the start of continuous concealment up to the current frame; if it exceeds the preset period, the loss is considered severe;
if the execution result of step 4110 is no, that is, the current continuous packet loss duration exceeds the preset time period, step 4150 is executed: filling mute data for the current frame;
if the result of the step 4110 is yes, that is, the current continuous packet loss duration does not exceed the preset time period, step 4120 is executed: judging whether the current frame voice is effective voice or not by using at least one audio frame voice before the current frame;
if the result of step 4120 is yes, that is, the current frame is valid speech, step 4130 is executed: invoking the PLC model to conceal the current frame, smoothing the concealed speech against the previous frame, and inputting the result into the mixing module;
in this step, smoothing the concealed speech before mixing preserves voice call quality;
if the result of step 4120 is no, that is, the current frame is invalid speech, step 4160 is executed: filling the current frame with the previous frame's voice data and inputting it into the mixing module.
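The smoothing applied before mixing could be a simple crossfade like the sketch below; the patent does not name the smoothing method, so the linear ramp is an assumption:

```python
def crossfade(prev_tail, cur_head):
    """Linearly crossfade the tail of the previous frame into the head
    of the concealed frame to avoid an audible discontinuity. Both
    inputs are equal-length lists of PCM samples over the overlap."""
    n = len(cur_head)
    out = []
    for i in range(n):
        w = (i + 1) / n                              # ramps up to 1.0
        out.append((1 - w) * prev_tail[i] + w * cur_head[i])
    return out
```

At the end of the overlap the output equals the new frame exactly, so playback continues seamlessly from the concealed data.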
In some scenarios a discontinuity between audio frames may be caused by network jitter, or the gap may already have been concealed; compensating again would add delay, so the discontinuity is only smoothed.
In other scenarios the audio frames are continuous, indicating the current frame is missing no voice data; the packet loss flag of the current decoding channel is then cleared, the current frame is decoded and passed through voice detection, and participation in mixing is decided according to the mixing strategy.
In an alternative, if the current decoding channel has already used the PLC model to conceal a target number of consecutive frames and the current frame still cannot be read, the continuous packet loss duration is deemed to exceed the preset period and silence is filled directly. The target number of frames may be 3 or another value.
This reflects the fact that PLC concealment attenuates energy to reduce correlation between successive signals, so after a few concealed frames the data approaches a silence packet anyway.
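That attenuation could be sketched as a per-frame gain that decays with the number of consecutive concealed frames; the exponential form and the decay factor are illustrative, not taken from the patent:

```python
def attenuate(frame, consecutive_losses, decay=0.5):
    """Scale a concealed frame so that output energy decays toward
    silence as losses accumulate. consecutive_losses counts concealed
    frames already emitted before this one (0 for the first)."""
    gain = decay ** consecutive_losses
    return [s * gain for s in frame]
```

With `decay=0.5`, the third consecutive concealed frame is already at a quarter of its original amplitude, which is why filling plain silence after a few frames loses little.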
In the embodiment above, step 4120 obtains the decoded data of one or more preceding audio frames; if that data shows valid speech, the current frame is judged likely to be valid speech as well, and concealment proceeds. Otherwise, if the current frame is invalid speech such as ambient or background noise, concealment is unnecessary and the current frame is filled with the previous frame's voice data.
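One plausible way to classify the preceding decoded frame as valid speech is a short-time energy plus zero-crossing-rate test; the patent does not specify the detector, so the features and thresholds below are assumptions:

```python
def looks_like_speech(frame, energy_thresh=1e-3, zcr_thresh=0.35):
    """Hedged sketch for step 4120: classify a decoded PCM frame
    (samples in [-1, 1]) as valid speech. Speech tends to have
    non-trivial energy and a moderate zero-crossing rate, while
    background noise is often low-energy or very high-ZCR."""
    n = len(frame)
    energy = sum(s * s for s in frame) / n
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (n - 1)
    return energy > energy_thresh and zcr < zcr_thresh
```

In the flow of Fig. 4, a True result for the preceding frame sends the lost frame to PLC concealment (step 4130), and a False result to previous-frame filling (step 4160).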
Fig. 5 is a schematic diagram of the structure of the voice data processing system of the present invention. As shown in fig. 5, the voice data processing system 500 of the present invention includes:
a first judging module 510, configured to judge, when the current decoding channel times out without reading the current frame of speech, whether the current frame is valid speech using at least one preceding audio frame, where the current decoding channel is a decoding channel participating in mixing;
a compensation module 520 that, if so, performs packet loss concealment on the current frame using a packet loss concealment model;
and a filling module 530 that, if not, fills the current frame with the audio frame immediately preceding it.
In an alternative embodiment, the first determining module 510 is specifically further configured to:
before judging, when the current decoding channel times out without reading the current frame of speech, whether the current frame is valid speech using at least one preceding audio frame: screen out, from a plurality of decoding channels, the current decoding channel bearing a packet loss flag, and judge whether that channel has timed out without reading the current frame of speech.
In an alternative embodiment, the first determining module 510 is specifically further configured to:
before judging whether the current decoding channel has timed out without reading the current frame of speech, trigger the current decoding channel, via the mixer's timer, to read the current frame of speech from the buffer.
In an alternative embodiment, the first determining module 510 is specifically configured to:
judging whether the current decoding channel has timed out without reading the current frame of speech, comprising:
if the current decoding channel fails to read the current frame of speech on the first attempt, re-reading it at intervals until a target number of attempts is reached; if the frame still has not been read, judging that the current decoding channel has timed out without reading the current frame of speech.
In an alternative embodiment, the first determining module 510 is specifically configured to:
when the current decoding channel times out without reading the current frame of speech, and the channel's continuous packet loss duration before the current frame does not exceed a preset period, judging whether the current frame is valid speech using at least one audio frame preceding the current frame.
In an alternative embodiment, the filling module 530 is specifically further configured to:
filling the current frame with silence when the continuous packet loss duration of the current decoding channel before the current frame exceeds the preset period.
In an alternative embodiment, the speech data processing system 600 shown in FIG. 6 further includes, in comparison to FIG. 5:
a second judging module 610, configured to judge, if the current decoding channel successfully reads the current frame of speech, whether it is continuous with the adjacent preceding frame from the frame's identifier and timestamp;
a smoothing module 620 that, if not, decodes the current frame of speech and then smooths it;
and a decoding module 630 that, if so, decodes the current frame of speech directly.
The voice data processing system 600 further includes a parameter updating module:
the parameter updating module is configured to, when the current frame of voice is continuous with the adjacent previous frame and participates in the mixing, calculate algorithm parameters of the packet loss compensation model from the decoded data of the current frame of voice, and update the packet loss compensation model with those parameters.
According to the embodiments of the disclosure, packet loss compensation with the PLC model is performed only when the voice reading time limit is exceeded and the current frame is judged likely to be valid voice; compensation is not performed for every unread frame. This reduces the number of times the PLC model is invoked, lowers its occupancy of system resources, and improves the utilization of the chip on which the PLC model is loaded. Moreover, since the PLC model runs at the receiving end, the requirements on the sending end's audio encoder are relaxed, which makes it easier for multiple types of terminals to join a voice mixing conference.
The embodiment of the invention also provides an electronic device, comprising a processor and a memory storing executable instructions of the processor, wherein the processor is configured to perform the steps of the voice data processing method by executing the executable instructions.
As described above, the electronic device of the invention uses the PLC model for packet loss compensation only when the voice reading time limit is exceeded and the current frame is judged likely to be valid voice, rather than compensating every unread frame. This reduces the number of times the PLC model is invoked, lowers its occupancy of system resources, and improves the utilization of the chip on which the PLC model is loaded. Moreover, since the PLC model runs at the receiving end, the requirements on the sending end's audio encoder are relaxed, which makes it easier for multiple types of terminals to join a voice mixing conference.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software, which may be referred to herein as a "circuit," "module," or "platform."
Fig. 7 is a schematic structural view of the electronic device of the present invention. An electronic device 700 according to this embodiment of the invention is described below with reference to fig. 7. The electronic device 700 shown in fig. 7 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 7, the electronic device 700 is embodied in the form of a general purpose computing device. Components of electronic device 700 may include, but are not limited to: at least one processing unit 710, at least one memory unit 720, a bus 730 connecting the different platform components (including memory unit 720 and processing unit 710), a display unit 740, and the like.
The storage unit stores program code executable by the processing unit 710, such that the processing unit 710 performs the steps according to various exemplary embodiments of the invention described in the voice data processing method section of this specification. For example, the processing unit 710 may perform the steps shown in any of the embodiments of fig. 1-4.
The storage unit 720 may include readable media in the form of volatile memory, such as a Random Access Memory (RAM) 721 and/or a cache memory 722, and may further include a Read Only Memory (ROM) 723.
The storage unit 720 may also include a program/utility 724 having a set (at least one) of program modules 725, such program modules 725 including, but not limited to: an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
Bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
The electronic device 700 may also communicate with one or more external devices 70 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 700, and/or any device (e.g., router, modem, etc.) that enables the electronic device 700 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 750. Also, electronic device 700 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 760. Network adapter 760 may communicate with other modules of electronic device 700 via bus 730. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 700, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage platforms, and the like.
The embodiment of the invention also provides a computer readable storage medium for storing a program, which when executed implements the steps of the voice data processing method. In some possible embodiments, the aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the above-mentioned speech data processing method section of this specification, when the program product is run on the terminal device.
As described above, when the program of the computer readable storage medium of this embodiment is executed, packet loss compensation with the PLC model is performed only when the voice reading time limit is exceeded and the current frame is judged likely to be valid voice; compensation is not performed for every unread frame. This reduces the number of times the PLC model is invoked, lowers its occupancy of system resources, and improves the utilization of the chip on which the PLC model is loaded. Moreover, since the PLC model runs at the receiving end, the requirements on the sending end's audio encoder are relaxed, which makes it easier for multiple types of terminals to join a voice mixing conference.
Fig. 8 is a schematic structural view of a computer-readable storage medium of the present invention. Referring to fig. 8, a program product 800 for implementing the above-described method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
In summary, the voice data processing method, system, device, and storage medium of the invention perform packet loss compensation with the PLC model only when the voice reading time limit is exceeded and the current frame is judged likely to be valid voice; compensation is not performed for every unread frame. This reduces the number of times the PLC model is invoked, lowers its occupancy of system resources, and improves the utilization of the chip on which the PLC model is loaded. Moreover, since the PLC model runs at the receiving end, the requirements on the sending end's audio encoder are relaxed, which makes it easier for multiple types of terminals to join a voice mixing conference.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and the specific implementation of the invention is not limited to these descriptions. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the concept of the invention, all of which should be considered to fall within the scope of protection of the invention.

Claims (10)

1. A voice data processing method, comprising:
when a current decoding channel has timed out without reading a current frame of voice, judging whether the current frame is valid voice by using the voice of at least one audio frame preceding the current frame, wherein the current decoding channel is a decoding channel participating in mixing;
if so, performing voice packet loss compensation on the current frame by using a packet loss compensation model;
if not, filling the current frame with the voice of the audio frame immediately preceding it.
2. The voice data processing method according to claim 1, characterized in that the voice data processing method further comprises:
before the judging, when the current decoding channel has timed out without reading the current frame of voice, of whether the current frame is valid voice by using the voice of at least one preceding audio frame: screening out, from a plurality of decoding channels, the current decoding channel carrying a packet loss flag, and judging whether the current decoding channel has timed out without reading the current frame of voice.
3. The voice data processing method according to claim 2, characterized in that the voice data processing method further comprises:
before judging whether the current decoding channel has timed out without reading the current frame of voice, triggering, through a mixer timer, the current decoding channel to read the current frame of voice from a buffer.
4. The method according to claim 2, wherein the judging of whether the current decoding channel has timed out without reading the current frame of voice comprises:
if the current decoding channel fails to read the current frame of voice on the first attempt, retrying the read at intervals until a target number of attempts is reached; if the current frame of voice is still not read, judging that the current decoding channel has timed out without reading it.
5. The method according to claim 1, wherein the judging, when the current decoding channel has timed out without reading the current frame of voice, of whether the current frame is valid voice by using the voice of at least one preceding audio frame comprises:
when the current decoding channel has timed out without reading the current frame of voice, and the continuous packet loss duration of the current decoding channel before the current frame does not exceed a preset time period, judging whether the current frame is valid voice by using the voice of at least one audio frame preceding the current frame.
6. The voice data processing method according to claim 5, characterized in that the voice data processing method further comprises:
filling the current frame with mute data when the continuous packet loss duration of the current decoding channel before the current frame exceeds the preset time period.
7. The voice data processing method according to claim 1, characterized in that the voice data processing method further comprises:
when the current decoding channel successfully reads the current frame of voice, judging whether the current frame of voice is continuous with the adjacent previous frame of voice according to the identifier and the timestamp of the current frame;
if not, decoding the current frame of voice and then performing smoothing;
if so, decoding the current frame of voice.
8. The voice data processing method according to claim 7, characterized in that the voice data processing method further comprises:
when the current frame of voice is continuous with the adjacent previous frame and participates in the mixing, calculating algorithm parameters of the packet loss compensation model from the decoded data of the current frame of voice, and updating the packet loss compensation model with the algorithm parameters.
9. An electronic device, comprising:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the speech data processing method of any of claims 1 to 8 via execution of the executable instructions.
10. A computer-readable storage medium storing a program, characterized in that the program when executed implements the steps of the speech data processing method of any one of claims 1 to 8.
CN202311844838.4A 2023-12-29 2023-12-29 Voice data processing method, equipment and storage medium Pending CN117812053A (en)

Priority Applications (1)

Application Number: CN202311844838.4A; Priority/Filing Date: 2023-12-29; Title: Voice data processing method, equipment and storage medium

Publications (1)

Publication Number: CN117812053A; Publication Date: 2024-04-02

Family ID: 90432811


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination