WO2023193506A1

WO2023193506A1 - Voice transmission method, terminal and computer-readable storage medium

Info

Publication number: WO2023193506A1
Application number: PCT/CN2023/071976
Authority: WO
Inventors: 颜蓓
Original assignee: 中兴通讯股份有限公司
Priority date: 2022-04-08
Filing date: 2023-01-12
Publication date: 2023-10-12
Also published as: CN116935870A

Abstract

A voice transmission method, a terminal and a computer-readable storage medium. The method comprises: acquiring a voice signal from a first terminal (S101); extracting feature information of the voice signal (S102); sending the feature information to a second terminal by means of a circuit switched domain (S103); and transmitting the voice signal to the second terminal by means of a packet switched domain, so that when a network parameter satisfies a preset condition, the second terminal repairs, according to the feature information received by means of the circuit switched domain, the voice signal received by means of the packet switched domain, and outputs the repaired voice signal (S104).

Description

Voice transmission method, terminal and computer-readable storage medium

Cross-references to related applications

This application is filed based on a Chinese patent application with application number 202210364802.5 and a filing date of April 8, 2022, and claims the priority of the Chinese patent application. The entire content of the Chinese patent application is hereby incorporated by reference into this application.

Technical field

Embodiments of the present application relate to but are not limited to the field of communications, and in particular, to a voice transmission method, a terminal and a computer-readable storage medium.

Background

In the early stages of network construction for a new generation of mobile communication technology, or in areas where base stations are sparsely distributed, voice transmission quality declines rapidly because of poor network quality. For example, in VONR (Voice over New Radio, that is, 5G voice) calls, a common problem is that when 5G (5th Generation Mobile Communication Technology) base stations are sparsely distributed, packet loss and delay increase sharply, and the voice suffers from heavy jitter and intermittent distortion. The voice quality can be far worse than in the CS domain, resulting in a poor user experience. At present, the way to handle voice quality degradation caused by poor 4G and 5G signals is to force the voice call to fall back to 3G or 2G. This approach avoids voice interruptions, but it cannot guarantee the high bandwidth and high sound quality of VONR, so users who place calls under 5G actually get only the call experience of 3G or 2G. Therefore, how to avoid the rapid decline in voice quality caused by poor conditions in a new-generation communication network has become an urgent problem to be solved.

Summary of the invention

The following is an overview of the topics described in detail in this article. This summary is not intended to limit the scope of the claims.

Embodiments of the present application provide a voice transmission method, a terminal and a computer-readable storage medium.

In a first aspect, embodiments of the present application provide a voice transmission method applied to a first terminal. The method includes: acquiring a voice signal of the first terminal; extracting characteristic information of the voice signal; sending the characteristic information to a second terminal through the circuit-switched domain; and transmitting the voice signal to the second terminal through the packet-switched domain, so that when a network parameter satisfies a preset condition, the second terminal repairs the voice signal received through the packet-switched domain according to the characteristic information received through the circuit-switched domain, and outputs the repaired voice signal.

In a second aspect, embodiments of the present application provide a voice transmission method applied to a second terminal. The method includes: receiving characteristic information sent by a first terminal through the circuit-switched domain, where the characteristic information is extracted from a voice signal of the first terminal; receiving the voice signal transmitted by the first terminal through the packet-switched domain; when a network parameter satisfies a preset condition, repairing the voice signal received through the packet-switched domain according to the characteristic information received through the circuit-switched domain; and outputting the repaired voice signal.

In a third aspect, embodiments of the present application provide a terminal, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the voice transmission method described in the first aspect or the voice transmission method described in the second aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium that stores a computer-executable program. The computer-executable program is used to cause a computer to execute the voice transmission method described in the first aspect or the voice transmission method described in the second aspect.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the application. The objectives and other advantages of the application may be realized and obtained by the structure particularly pointed out in the specification, claims and appended drawings.

Description of the drawings

The drawings are used to provide a further understanding of the technical solution of the present application and constitute a part of the specification. They are used to explain the technical solution of the present application together with the embodiments of the present application and do not constitute a limitation of the technical solution of the present application.

Figure 1 is a flow chart of a voice transmission method provided by an embodiment of the present application (first terminal side);

Figure 2 is a schematic diagram of content information, time domain information and frequency domain characteristic information corresponding to a single speech segment provided by an embodiment of the present application;

Figure 3 is a sub-flow chart of a voice transmission method provided by an embodiment of the present application;

Figure 4 is a flow chart of a voice transmission method provided by an embodiment of the present application (second terminal side);

Figure 5 is a schematic diagram of speech repair using time domain information provided by an embodiment of the present application;

Figure 6 is a schematic diagram of voice repair using content information and frequency domain characteristic information provided by an embodiment of the present application;

Figure 7 is a sub-flow chart of a voice transmission method provided by an embodiment of the present application;

Figure 8 is a schematic structural diagram of a voice transmission system provided by an embodiment of the present application;

Figure 9 is a schematic structural diagram of a terminal provided by an embodiment of the present application.

Detailed description

In order to make the purpose, technical solutions and advantages of the present application more clear, the present application will be further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not used to limit the present application.

It should be understood that in the description of the embodiments of this application, "multiple" (or "a plurality of") means two or more. "Greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it. Descriptions such as "first" and "second" are used only to distinguish technical features and cannot be understood as indicating or implying relative importance, the number of the indicated technical features, or the order of the indicated technical features.

Embodiments of the present application provide a voice transmission method, a terminal and a computer-readable storage medium. The voice signal of the first terminal is acquired, the characteristic information of the voice signal is extracted, the characteristic information is sent to the second terminal through the CS (Circuit Switched) domain, and the voice signal is transmitted to the second terminal through the PS (Packet Switched) domain, so that when the network parameters meet the preset conditions, the second terminal repairs the voice signal received in the PS domain based on the characteristic information received in the CS domain and outputs the repaired voice signal. In other words, the first terminal extracts feature information from the voice signal, sends the feature information to the second terminal through the CS domain, and transmits the original voice signal of the call through the PS domain. When the network signal is poor, the second terminal compares the characteristic information received in the CS domain with the voice signal received in the PS domain and repairs the defects and distortions of the voice signal. This completely restores all the voice information of the sender without reducing sound quality or introducing excessive delay. This application is therefore highly flexible: when the network signal is good, no repair is needed; when the network signal degrades, repair starts automatically, and the user does not perceive the impact of the signal degradation on call sound quality.

As shown in Figure 1, Figure 1 is a flow chart of a voice transmission method provided by an embodiment of the present application. The voice transmission method is applied to the first terminal. The voice transmission method includes but is not limited to the following steps:

S101, obtain the voice signal of the first terminal;

S102, extract feature information of the speech signal;

S103. Send the characteristic information to the second terminal through the circuit-switched domain;

S104, transmit the voice signal to the second terminal through the packet switching domain, so that when the network parameters meet the preset conditions, the second terminal repairs the voice signal received by the packet switching domain according to the characteristic information received by the circuit switching domain, and outputs the repaired speech signal.
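Steps S101 to S104 can be sketched as follows, assuming the voice signal has already been split into segments. The function name `transmit` and the callbacks `extract_features`, `send_cs` and `send_ps` are hypothetical stand-ins for the terminal's feature extractor and its CS-domain and PS-domain senders; this application does not specify them.

```python
from typing import Any, Callable, Iterable


def transmit(
    voice_segments: Iterable[bytes],
    extract_features: Callable[[int, bytes], Any],
    send_cs: Callable[[Any], None],
    send_ps: Callable[[int, bytes], None],
) -> int:
    """Send every segment on both paths; return the number of segments sent.

    All parameters are illustrative assumptions, not part of the application.
    """
    count = 0
    for seq, segment in enumerate(voice_segments):  # S101: acquired voice signal
        features = extract_features(seq, segment)   # S102: extract feature information
        send_cs(features)                           # S103: features via the CS domain
        send_ps(seq, segment)                       # S104: voice via the PS domain
        count += 1
    return count
```

Both paths carry the same sequence number in this sketch, which is one possible way to let the second terminal match features to voice segments.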

It can be understood that the terminal may include but is not limited to a mobile phone. The first terminal and the second terminal are used to represent two different terminals.

It can be understood that the feature information includes but is not limited to content information, time domain information and frequency domain characteristic information of the speech signal. Transmitting the content information prevents obvious loss, discontinuity and jitter in the voice; transmitting the frequency domain characteristic information reduces distortion of the voice; and transmitting the time domain information prevents the call experience of the two parties from degrading because of large delays when network delay increases sharply.

It can be understood that Figure 2 is a schematic diagram of the content information, time domain information and frequency domain characteristic information corresponding to a single voice segment. This voice segment is only an example; in actual operation, the duration of a voice segment can be determined by the specific encoding method, and the sampling rate can be selected flexibly. As shown in Figure 2, the content information that the user wants to express, such as "I will go home for dinner tonight", can be extracted from the original speech signal in the upper part of the figure. The timeline in the middle part of the figure is the time domain part of this speech and indicates the sequential position of this segment in the entire call. The frequency domain characteristic curve in the lower part of the figure characterizes the timbre and can be used to identify each person's speech. Because each person's frequency domain characteristic curve changes little, the amount of data transmitted for this part is very small.

It can be understood that, when extracting the content information, time domain information and frequency domain characteristic information of the speech signal, the number of extracted sampling points should match the VONR speech sampling rate, and the three parts of information at each sampling point should be encoded in one-to-one correspondence. Because all three parts of characteristic information are transmitted through the CS domain, the transmission is safe and stable, and almost no key information is lost.

It can be understood that the CS domain is a circuit-switched domain and is mainly responsible for voice services and video phone services; the PS domain is a packet-switched domain and is mainly responsible for data services.

It can be understood that this application can flexibly adjust the strategy according to the network signal quality. When the VONR network signal is good, no repair is needed. When the network signal degrades, repair starts automatically, and the user does not perceive the impact of the 5G signal degradation on call sound quality. The quality of the network signal can be judged through preset conditions. For example, when a network parameter of VONR is greater than a preset threshold, the network signal is considered poor, and the defective part of the voice signal needs to be repaired. Cases in which a VONR network parameter is greater than the preset threshold can include: the packet loss rate of the network is greater than 10%, or the increase in network delay is greater than 20 ms. If either condition is met, the network signal can be considered poor and the defective part of the voice signal needs to be repaired, where the defective part may include the missing part and the damaged part of the speech signal.

It can be understood that this application can first determine whether to activate the repair strategy based on network packet loss. The packet loss rate can be read in real time from the terminal's log, and the network delay can also be read in real time from information sent by the network. When packet loss or network delay reaches a certain level, the defective part of the voice signal is repaired. Any sentence of speech can be decomposed into three parts: content information, time domain information and frequency domain characteristic information. As long as these three parts are available, the speech can be restored with full fidelity. The content information, time domain information and frequency domain characteristic information of the voice are converted into binary digital information and transmitted through the CS domain. At the same time, the original voice analog signal, which has a high sampling rate and is very clear, is transmitted through 5G VONR. The other user's terminal compares the content information, time domain information and frequency domain characteristic information received in the CS domain with the voice information received in the PS domain and repairs defects and distortions, so as to completely restore all the voice information of the sending end without reducing sound quality or introducing excessive delay.

It can be understood that the voice signal of the first terminal is acquired, the feature information of the voice signal is extracted, the feature information is sent to the second terminal through the CS domain, and the voice signal is transmitted to the second terminal through the PS domain of VONR, so that when a network parameter of VONR is greater than the preset threshold, the second terminal repairs the voice signal received in the PS domain based on the characteristic information received in the CS domain and outputs the repaired voice signal. In other words, the first terminal extracts feature information from the voice signal, sends the feature information to the second terminal through the CS domain, and transmits the original voice signal through 5G VONR. When the network signal is poor, the second terminal compares the characteristic information received in the CS domain with the voice signal received in the PS domain and repairs the defects and distortions of the voice signal. This completely restores all the voice information of the sender without reducing sound quality or introducing excessive delay. This application is therefore highly flexible: when the VONR network signal is good, no repair is needed; when the network signal degrades, repair starts automatically, and the user does not perceive the impact of the 5G signal degradation on call sound quality.
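The preset condition can be illustrated with a minimal sketch. It uses the example thresholds given in the description (packet loss rate greater than 10%, delay increase greater than 20 ms); the names `NetworkParams` and `needs_repair` are hypothetical, and how the terminal actually obtains these measurements is outside the sketch.

```python
from dataclasses import dataclass

# Example thresholds from the description; a real terminal may use other values.
PACKET_LOSS_THRESHOLD = 0.10       # packet loss rate of 10%
DELAY_INCREASE_THRESHOLD_MS = 20   # network delay increase of 20 ms


@dataclass
class NetworkParams:
    """Hypothetical snapshot of VONR network measurements."""
    packet_loss_rate: float   # fraction in the range [0, 1]
    delay_increase_ms: float  # increase over a baseline delay, in milliseconds


def needs_repair(params: NetworkParams) -> bool:
    """Return True when either threshold is strictly exceeded."""
    return (
        params.packet_loss_rate > PACKET_LOSS_THRESHOLD
        or params.delay_increase_ms > DELAY_INCREASE_THRESHOLD_MS
    )
```

The comparison is strict because the description states that "greater than" excludes the stated number, so a loss rate of exactly 10% does not trigger repair in this sketch.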

As shown in Figure 3, step S103 may include but is not limited to the following sub-steps:

S301, convert feature information into digital information;

S302. Send digital information to the second terminal through the circuit-switched domain.

It can be understood that the first terminal converts the content information, time domain information and frequency domain characteristic information of the voice signal into binary digital information, and then sends the digital information to the second terminal through the CS domain. Digital conversion reduces the amount of transmitted data.

In summary, this application can adaptively reduce the jitter and distortion of voice caused by network loss and packet loss during VONR calls. The method keeps the call on VONR throughout, so the call is not forced to fall back to 3G or 2G because of a poor signal. It can also ensure that, with packet loss and delay added, the mobile phone still meets the GCF (Global Certification Forum) certification requirements for voice quality and overall delay. Because the adjustment is adaptive and flexible, users can hardly perceive the impact of VONR network quality on voice calls. This application can therefore be regarded as a good transitional solution for the unstable network conditions found in many areas in the early stage of 5G network construction, where base station distribution is insufficient. This application adaptively repairs the voice according to the packet loss or delay of the network, which ensures the punctuality, stability and coherence of voice information transmission without being constrained by 5G network signal quality.
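One way to picture sub-step S301 is the sketch below, which packs the three parts of feature information for one voice segment into bytes and unpacks them again. The record layout, the `SegmentFeatures` type and the `encode` and `decode` functions are assumptions made for illustration; this application does not define a binary format.

```python
import struct
from dataclasses import dataclass
from typing import List

HEADER = "!IHH"  # hypothetical header: sequence number, text length, envelope length


@dataclass
class SegmentFeatures:
    sequence: int          # time domain information: position of the segment in the call
    content: str           # content information: what was said
    envelope: List[float]  # frequency domain characteristic information (timbre)


def encode(features: SegmentFeatures) -> bytes:
    """Pack one feature record into binary digital information."""
    text = features.content.encode("utf-8")
    header = struct.pack(HEADER, features.sequence, len(text), len(features.envelope))
    body = struct.pack(f"!{len(features.envelope)}f", *features.envelope)
    return header + text + body


def decode(data: bytes) -> SegmentFeatures:
    """Unpack a record produced by encode()."""
    sequence, text_len, env_len = struct.unpack_from(HEADER, data, 0)
    offset = struct.calcsize(HEADER)
    content = data[offset:offset + text_len].decode("utf-8")
    offset += text_len
    envelope = list(struct.unpack_from(f"!{env_len}f", data, offset))
    return SegmentFeatures(sequence, content, envelope)
```

The envelope is stored as 32-bit floats, so values that are not exactly representable at that precision come back slightly rounded after decoding.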

As shown in Figure 4, Figure 4 is a flow chart of a voice transmission method provided by an embodiment of the present application. The voice transmission method is applied to the second terminal and includes but is not limited to the following steps:

S401. Receive feature information sent by the first terminal through the circuit-switched domain. The feature information is extracted from the voice signal of the first terminal;

S402, receive the voice signal transmitted by the first terminal through the packet switching domain;

S403. When the network parameters of VONR meet the preset conditions, repair the voice signal received by the packet switching domain according to the characteristic information received by the circuit switching domain;

S404, output the repaired voice signal.

It can be understood that the second terminal can determine whether the voice signal needs to be repaired based on the packet loss rate and the added delay in the network. After the content information, time domain information and frequency domain characteristic information of the voice signal are transmitted to the other terminal, they can be compared one by one with the original voice signal transmitted by VONR, because the sampling rate is exactly the same. When the network condition is poor, the other terminal can perform synchronized synthesis and repair, one by one, according to the encoding.

It can be understood that, as shown in Figure 5, the speech segments are sorted according to the time domain information received through the CS domain, where the speech signal is composed of multiple speech segments. Delay and jitter can scramble the order of speech segments, mix in invalid segments, and delay the arrival of valid segments. To deal with these problems, the time domain information transmitted through the CS domain can be used to reorder the scrambled speech segments, return each to its original position, and remove invalid segments.

It can be understood that, as shown in Figure 6, in order to solve the problem of partially missing and incomplete speech segment information caused by network loss, the content information and frequency domain characteristic information transmitted through the CS domain can be used to synthesize the speech segments one by one, so that the defective part of the speech signal is fully repaired, where the defective part may include the missing part and the damaged part of the speech signal.

It can be understood that this application can flexibly adjust the strategy according to the network signal quality. When the VONR network signal is good, no repair is needed. When the network signal degrades or is poor, repair can start automatically, and the user does not perceive any impact on call sound quality from the signal degradation. The quality of the network signal can be judged through preset conditions. For example, when a network parameter of VONR is greater than a preset threshold, the network signal is considered poor, and the defective part of the voice signal needs to be repaired. Cases in which a VONR network parameter is greater than the preset threshold can include: the packet loss rate of the network is greater than a preset percentage (for example, 10%), or the increase in network delay is greater than a preset delay (for example, 20 ms). If either condition is met, the network signal can be considered poor, and the defective part of the voice signal needs to be repaired.

It can be understood that the second terminal receives the feature information sent by the first terminal through the CS domain, where the feature information is extracted from the voice signal of the first terminal. The second terminal simultaneously receives the voice signal transmitted by the first terminal through the PS domain of VONR. When a network parameter of VONR is greater than the preset threshold, the second terminal repairs the voice signal received in the PS domain based on the characteristic information received in the CS domain and outputs the repaired voice signal. In other words, the first terminal extracts feature information from the voice signal, sends the feature information to the second terminal through the CS domain, and transmits the original voice signal through 5G VONR. When the network signal is poor, the second terminal compares the characteristic information received in the CS domain with the voice signal received in the PS domain and repairs the defects and distortions of the voice signal. This completely restores all the voice information of the sender without reducing sound quality or introducing excessive delay. This application is therefore highly flexible: when the VONR network signal is good, no repair is needed; when the network signal degrades, repair starts automatically, and the user does not perceive the impact of the 5G signal degradation on call sound quality.
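The reordering described for Figure 5 can be sketched as follows. The sketch assumes each PS-domain segment arrives tagged with a sequence number and that the time domain information from the CS domain yields the expected order of those numbers; both assumptions, and the function name `reorder_segments`, are illustrative only.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def reorder_segments(
    received: Sequence[Tuple[int, bytes]],
    expected_order: Sequence[int],
) -> List[Optional[bytes]]:
    """Place PS-domain segments back in the order given by the CS-domain timeline.

    Segments whose sequence number is not on the timeline are treated as
    invalid and dropped; duplicates keep the first arrival; positions with
    no segment are returned as None so that they can be repaired later.
    """
    valid = set(expected_order)
    by_sequence: Dict[int, bytes] = {}
    for sequence, payload in received:
        if sequence in valid and sequence not in by_sequence:
            by_sequence[sequence] = payload
    return [by_sequence.get(sequence) for sequence in expected_order]
```

For example, segments arriving in the order 2, 0, 99, 1 against an expected timeline of 0 to 3 are returned in order 0, 1, 2, with the unknown segment 99 removed and position 3 left as `None`.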

As shown in Figure 7, step S403 may include but is not limited to the following sub-steps:

S701. Compare the voice signal received in the packet switching domain against the content information, time domain information and frequency domain characteristic information of the voice signal received in the circuit switching domain, to determine the defective part of the voice signal received in the packet switching domain;

S702, repair the defective part.
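Sub-steps S701 and S702 can be sketched as below, under strong simplifying assumptions. The comparison rule (`is_damaged`) and the synthesis step (`synthesize`) are passed in as hypothetical callbacks, because this application does not specify how a segment is judged damaged or how speech is resynthesized from the feature information.

```python
from typing import Any, Callable, List, Optional, Sequence


def find_defects(
    ps_segments: Sequence[Optional[bytes]],
    cs_features: Sequence[Any],
    is_damaged: Callable[[bytes, Any], bool],
) -> List[int]:
    """S701: return the indices of missing (None) or damaged segments."""
    return [
        index
        for index, (segment, features) in enumerate(zip(ps_segments, cs_features))
        if segment is None or is_damaged(segment, features)
    ]


def repair(
    ps_segments: Sequence[Optional[bytes]],
    cs_features: Sequence[Any],
    is_damaged: Callable[[bytes, Any], bool],
    synthesize: Callable[[Any], bytes],
) -> List[Optional[bytes]]:
    """S702: replace each defective segment with one synthesized from its features."""
    repaired = list(ps_segments)
    for index in find_defects(ps_segments, cs_features, is_damaged):
        repaired[index] = synthesize(cs_features[index])
    return repaired
```

Intact segments are passed through unchanged, so only the defective part of the signal is touched.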

It can be understood that this application can first determine whether to activate the repair strategy based on network packet loss. The packet loss rate can be read in real time from the terminal's log, and the network delay can also be read in real time from information sent by the network. When packet loss or network delay reaches a certain level, the defective part of the voice signal is repaired. Any speech can be decomposed into three parts: content information, time domain information and frequency domain characteristic information. As long as these three parts are available, the speech can be restored with full fidelity. The content information, time domain information and frequency domain characteristic information of the voice are converted into binary digital information and transmitted through the CS domain. At the same time, the original voice analog signal, which has a high sampling rate and is very clear, is transmitted through 5G VONR. The other user's terminal compares the content information, time domain information and frequency domain characteristic information received in the CS domain with the voice information received in the PS domain and repairs defects and distortions, so as to completely restore all the voice information of the sending end without reducing sound quality or introducing excessive delay.

In summary, when a VONR voice call starts, this application begins to extract the information of the original voice signal, decomposes it into three parts (content information, time domain information and frequency domain characteristic information) and converts it into a digital signal. While the original voice information is transmitted through the PS domain of VONR, the extracted information is also continuously sent to the other party's mobile phone through the CS domain. The VONR network condition is then judged. If the packet loss rate is greater than the preset percentage or the increase in network delay is greater than the preset delay, the original voice signal is repaired once the other terminal has received both the original voice signal and the extracted information. Because the two have been encoded in one-to-one correspondence, no information is lost, and the timbre of the original voice signal can be restored without reducing the VONR user experience. If the VONR network is in good condition, repair is not started. This application is therefore flexible and adaptive, and can adjust the strategy according to the network quality. Transmitting the content information prevents obvious loss, discontinuity and jitter in the voice; transmitting the frequency domain characteristic information reduces distortion of the voice; and transmitting the time domain information prevents the call experience of both parties from degrading because of large delays when network delay increases sharply.

As shown in Figure 8, this embodiment of the present application also provides a voice transmission system.

The voice transmission system includes an information extraction module, a network condition judgment module and a voice repair module. The information extraction module extracts the speaker's content information, time domain information and frequency domain characteristic information. The number of extracted sampling points must match the VONR speech sampling rate, and the content information, time domain information and frequency domain characteristic information of each sampling point are encoded in one-to-one correspondence. Because the content information, time domain information and frequency domain characteristic information are all transmitted through the CS domain, the transmission is safe and stable, and almost no key information is lost. The network condition judgment module judges whether the voice repair module needs to be activated based on the packet loss rate and the added delay in the network. The voice repair module works as follows: after the content information, time domain information and frequency domain characteristic information are transmitted to the other party's terminal, they can be compared one by one with the original voice transmitted by VONR, because the sampling rate is exactly the same. When the network condition is poor, the other party's terminal can perform synchronized repair, one by one, according to the encoding.

When a VONR voice call starts, the voice transmission system begins to extract the information of the original voice signal, decomposes it into three parts (content information, time domain information and frequency domain characteristic information) and converts it into a digital signal. While the original voice signal is transmitted through the PS domain of VONR, the extracted information is also continuously sent to the other party's mobile phone through the CS domain. The VONR network condition is then judged. If the packet loss rate is greater than the preset percentage or the increase in network delay is greater than the preset delay, the original voice signal is repaired once the other terminal has received both the original voice signal and the extracted information. Because the two have been encoded in one-to-one correspondence, no information is lost, and the timbre of the original voice can be restored without reducing the VONR user experience. If the VONR network is in good condition, the repair module is not started. In this way, the entire voice transmission system is an adaptive system that can adjust the strategy according to the network quality. Transmitting the content information prevents obvious loss, discontinuity and jitter in the voice; transmitting the frequency domain characteristic information reduces distortion of the voice; and transmitting the time domain information prevents the call experience of both parties from degrading because of large delays when network delay increases sharply.

As shown in Figure 9, an embodiment of the present application also provides a terminal, which includes but is not limited to a mobile phone.

In one embodiment, the terminal includes: one or more processors and memories. In FIG. 9 , one processor and memory are taken as an example. The processor and memory can be connected through a bus or other means. Figure 9 takes the connection through a bus as an example.

As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer executable programs, such as the voice transmission method in the above embodiments of the present application. The processor implements the voice transmission method in the above embodiment of the present application by running non-transient software programs and programs stored in the memory.

The memory may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data required to execute the voice transmission method in the embodiments of the present application, and the like. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory may include memory located remotely relative to the processor, and such remote memory may be connected to the terminal through a network. Examples of such networks include but are not limited to the Internet, intranets, local area networks, mobile communication networks and combinations thereof.

The non-transitory software programs and programs required to implement the voice transmission method in the above embodiments of the present application are stored in the memory. When executed by one or more processors, they perform the voice transmission method in the above embodiments of the present application, for example, method steps S101 to S104 in FIG. 1 and method steps S301 to S302 in FIG. 3 described above, or method steps S401 to S404 in FIG. 4 and method steps S701 to S702 in FIG. 7 described above: the voice signal of the first terminal is acquired, the feature information of the voice signal is extracted, the feature information is sent to the second terminal through the CS domain, and the voice signal is transmitted to the second terminal through the PS domain of VONR, so that when the network parameter of VONR is greater than the preset threshold, the second terminal performs voice repair on the voice signal received through the PS domain based on the feature information received through the CS domain, and outputs the repaired voice signal. Based on this, the first terminal extracts feature information from the voice signal, transmits the feature information to the second terminal through the CS domain, and transmits the original voice signal to the second terminal through 5G VONR. When the network signal is poor, the second terminal compares the feature information received through the CS domain with the voice signal received through the PS domain to repair the defects and distortions of the voice signal. This completely restores all the voice information of the sender without any reduction in sound quality or excessive delay. This application is therefore highly flexible: when the VONR network signal is good, no repair is needed; when the network signal degrades, the repair starts automatically, and the user does not perceive any impact of the 5G network signal degradation on call sound quality.

In addition, embodiments of the present application also provide a computer-readable storage medium that stores a computer-executable program. When the computer-executable program is executed by one or more processors, for example by one of the processors shown in FIG. 9, it can cause the one or more processors to execute the voice transmission method in the embodiments of the present application, for example, method steps S101 to S104 in FIG. 1 and method steps S301 to S302 in FIG. 3 described above, or method steps S401 to S404 in FIG. 4 and method steps S701 to S702 in FIG. 7 described above: the voice signal of the first terminal is acquired, the feature information of the voice signal is extracted, the feature information is sent to the second terminal through the CS domain, and the voice signal is transmitted to the second terminal through the PS domain of VONR, so that when the network parameter of VONR is greater than the preset threshold, the second terminal performs voice repair on the voice signal received through the PS domain based on the feature information received through the CS domain, and outputs the repaired voice signal. Based on this, the first terminal extracts feature information from the voice signal, transmits the feature information to the second terminal through the CS domain, and transmits the original voice signal to the second terminal through 5G VONR. When the network signal is poor, the second terminal compares the feature information received through the CS domain with the voice signal received through the PS domain to repair the defects and distortions of the voice signal. This completely restores all the voice information of the sender without any reduction in sound quality or excessive delay. This application is therefore highly flexible: when the VONR network signal is good, no repair is needed; when the network signal degrades, the repair starts automatically, and the user does not perceive any impact of the 5G network signal degradation on call sound quality.

Embodiments of the present application include: acquiring the voice signal of the first terminal, extracting the feature information of the voice signal, sending the feature information to the second terminal through the circuit-switched domain, and transmitting the voice signal to the second terminal through the packet-switched domain, so that when the network parameter satisfies the preset condition, the second terminal repairs the voice signal received through the packet-switched domain based on the feature information received through the circuit-switched domain, and outputs the repaired voice signal. Based on this, the first terminal extracts feature information from the voice signal, transmits the feature information to the second terminal through the circuit-switched domain, and transmits the original voice signal of the call to the second terminal through the packet-switched domain. When the network signal is poor, the second terminal compares the feature information received through the circuit-switched domain with the voice signal received through the packet-switched domain to repair the defects and distortions of the voice signal. This completely restores all the voice information of the sending end without any reduction in sound quality or excessive delay. This application is therefore highly flexible: when the network signal is good, no repair is needed; when the network signal degrades, the repair starts automatically, and the user does not perceive any impact of the network signal degradation on call sound quality.
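The sending-side flow summarized above (acquire the voice signal, extract feature information, send it through the circuit-switched domain, send the voice through the packet-switched domain) can be sketched as follows. This is only an illustration under stated assumptions: the application does not specify how the content, time domain and frequency domain characteristic information are computed, so the stand-ins used here (a coarsely rounded copy of each frame, the frame start time, and the index of the dominant DFT bin) are hypothetical, as are the function names and the use of plain Python lists to represent the two channels.

```python
import cmath
from typing import Dict, List


def split_frames(samples: List[float], frame_len: int) -> List[List[float]]:
    """Split the sampled voice signal into consecutive frames."""
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]


def dominant_bin(frame: List[float]) -> int:
    """Return the index of the strongest non-DC DFT bin of one frame."""
    n = len(frame)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k


def extract_features(frames: List[List[float]], frame_len: int,
                     sample_rate: int) -> List[Dict]:
    """Build one numbered feature record per frame (stand-in features)."""
    return [
        {
            "seq": seq,                                # numbering shared with the PS domain
            "content": [round(x, 2) for x in frame],   # assumed: coarse copy of the frame
            "time_s": seq * frame_len / sample_rate,   # assumed: frame start time
            "frequency_bin": dominant_bin(frame),      # assumed: dominant DFT bin
        }
        for seq, frame in enumerate(frames)
    ]


def send_voice(samples: List[float], sample_rate: int, frame_len: int,
               cs_channel: List[Dict], ps_channel: List[Dict]) -> None:
    """Send features over the CS channel and voice frames over the PS channel.

    Both channels are modeled as plain lists (an assumption); the same
    sequence number is attached to a frame and to its feature record.
    """
    frames = split_frames(samples, frame_len)
    features = extract_features(frames, frame_len, sample_rate)
    cs_channel.extend(features)
    ps_channel.extend({"seq": seq, "frame": frame}
                      for seq, frame in enumerate(frames))
```

Because each feature record and each voice frame carry the same sequence number, the receiving side can match them one to one when it decides that repair is needed.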

Those of ordinary skill in the art can understand that all or some of the steps and systems in the methods disclosed above can be implemented as software, firmware, hardware, and appropriate combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable programs, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disk (DVD) or other optical disk storage, magnetic cassettes, tapes, disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Additionally, it is known to those of ordinary skill in the art that communication media typically embody a computer-readable program, data structure, program module or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

The above is a detailed description of several implementations of the present application, but the present application is not limited to the above-mentioned implementations. Those skilled in the art can also make various equivalent modifications or substitutions without departing from the essence of the present application, and such equivalent modifications or substitutions are included within the scope defined by the claims of this application.

Claims

1. A voice transmission method, applied to a first terminal, the method comprising:

acquiring a voice signal of the first terminal;

extracting feature information of the voice signal;

sending the feature information to a second terminal through a circuit-switched domain; and

transmitting the voice signal to the second terminal through a packet-switched domain, so that when a network parameter satisfies a preset condition, the second terminal repairs, according to the feature information received through the circuit-switched domain, the voice signal received through the packet-switched domain, and outputs the repaired voice signal.
2. The method according to claim 1, wherein the extracting feature information of the voice signal comprises extracting content information, time domain information and frequency domain characteristic information of the voice signal.
3. The method according to claim 2, wherein the sampling rate used to extract the content information, the time domain information and the frequency domain characteristic information of the voice signal is the same as the sampling rate used to transmit the voice signal through the packet-switched domain, and wherein the extracting content information, time domain information and frequency domain characteristic information of the voice signal comprises:

encoding, in one-to-one correspondence, the content information, the time domain information and the frequency domain characteristic information of the sampled voice signal.
4. The method according to claim 1, wherein the network parameter satisfying the preset condition comprises at least one of the following:

the network packet loss rate is greater than a preset percentage; or

the increase in network delay is greater than a preset delay.
5. The method according to claim 1, wherein the sending the feature information to a second terminal through a circuit-switched domain comprises:

converting the feature information into digital information; and

sending the digital information to the second terminal through the circuit-switched domain.
6. A voice transmission method, applied to a second terminal, the method comprising:

receiving feature information sent by a first terminal through a circuit-switched domain, wherein the feature information is extracted from a voice signal of the first terminal;

receiving the voice signal transmitted by the first terminal through a packet-switched domain;

when a network parameter satisfies a preset condition, repairing, according to the feature information received through the circuit-switched domain, the voice signal received through the packet-switched domain; and

outputting the repaired voice signal.
7. The method according to claim 6, wherein the feature information comprises content information, time domain information and frequency domain characteristic information of the voice signal, and the repairing, according to the feature information received through the circuit-switched domain, the voice signal received through the packet-switched domain comprises:

comparing the voice signal received through the packet-switched domain against the content information, time domain information and frequency domain characteristic information of the voice signal received through the circuit-switched domain, to determine a defective part of the voice signal received through the packet-switched domain; and

repairing the defective part.
8. The method according to claim 7, wherein the repairing, according to the feature information received through the circuit-switched domain, the voice signal received through the packet-switched domain further comprises:

sorting speech segments according to the time domain information received through the circuit-switched domain, wherein the voice signal is composed of a plurality of the speech segments.
9. A terminal, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the voice transmission method according to any one of claims 1 to 5, or the voice transmission method according to any one of claims 6 to 8.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer-executable program, and the computer-executable program is used to cause a computer to execute the voice transmission method according to any one of claims 1 to 5, or the voice transmission method according to any one of claims 6 to 8.