WO2023193506A1 - Voice transmission method, terminal and computer-readable storage medium - Google Patents

Voice transmission method, terminal and computer-readable storage medium

Info

Publication number
WO2023193506A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
domain
terminal
voice
voice signal
Application number
PCT/CN2023/071976
Other languages
French (fr)
Chinese (zh)
Inventor
颜蓓 (Yan Bei)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2023193506A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W36/00 Hand-off or reselection arrangements
    • H04W36/14 Reselecting a network or an air interface

Definitions

  • Embodiments of the present application relate to, but are not limited to, the field of communications, and in particular to a voice transmission method, a terminal and a computer-readable storage medium.
  • Embodiments of the present application provide a voice transmission method, a terminal and a computer-readable storage medium.
  • In a first aspect, embodiments of the present application provide a voice transmission method applied to a first terminal.
  • The method includes: acquiring a voice signal of the first terminal; extracting characteristic information of the voice signal; sending the characteristic information to a second terminal through the circuit-switched domain; and transmitting the voice signal to the second terminal through the packet-switched domain, so that, when network parameters meet a preset condition, the second terminal uses the characteristic information received through the circuit-switched domain to repair the voice signal received through the packet-switched domain and outputs the repaired voice signal.
  • In a second aspect, embodiments of the present application provide a voice transmission method applied to a second terminal.
  • The method includes: receiving characteristic information sent by the first terminal through the circuit-switched domain, where the characteristic information is extracted from the voice signal of the first terminal; receiving the voice signal transmitted by the first terminal through the packet-switched domain; when network parameters meet a preset condition, repairing the voice signal received through the packet-switched domain based on the characteristic information received through the circuit-switched domain; and outputting the repaired voice signal.
  • Embodiments of the present application further provide a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the voice transmission method described above when executing the computer program.
  • Embodiments of the present application further provide a computer-readable storage medium storing a computer-executable program, which is used to cause a computer to execute the method described in the first aspect.
  • Figure 1 is a flow chart of a voice transmission method provided by an embodiment of the present application (first terminal side);
  • Figure 2 is a schematic diagram of the content information, time domain information and frequency domain characteristic information corresponding to a single speech segment provided by an embodiment of the present application;
  • Figure 3 is a sub-flow chart of a voice transmission method provided by an embodiment of the present application;
  • Figure 4 is a flow chart of a voice transmission method provided by an embodiment of the present application (second terminal side);
  • Figure 5 is a schematic diagram of speech repair using time domain information provided by an embodiment of the present application;
  • Figure 6 is a schematic diagram of speech repair using content information and frequency domain characteristic information provided by an embodiment of the present application;
  • Figure 7 is a sub-flow chart of a voice transmission method provided by an embodiment of the present application;
  • Figure 8 is a schematic structural diagram of a voice transmission system provided by an embodiment of the present application;
  • Figure 9 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • In these embodiments, the voice signal of the first terminal is acquired and its characteristic information is extracted. The characteristic information is sent to the second terminal through the CS (circuit-switched) domain, and the voice signal is transmitted to the second terminal through the PS (packet-switched) domain. When the network parameters meet the preset condition, the second terminal repairs the voice signal received through the PS domain based on the characteristic information received through the CS domain, and outputs the repaired voice signal.
  • In other words, during the call the first terminal extracts feature information from the voice signal and sends it to the second terminal through the CS domain. The original voice signal is sent through the PS domain.
  • The second terminal compares the characteristic information received through the CS domain with the voice signal received through the PS domain and repairs defects and distortions in the voice signal. This restores the sender's voice information in full without degrading sound quality or adding excessive delay. The scheme is also flexible:
  • when the network signal is good, no repair is performed;
  • when the network signal degrades, repair starts automatically, so the user does not notice any impact of the signal drop on call sound quality.
  • Figure 1 is a flow chart of a voice transmission method provided by an embodiment of the present application.
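  • The voice transmission method is applied to the first terminal.
  • The voice transmission method includes, but is not limited to, the following steps: S101, acquiring the voice signal of the first terminal; S102, extracting feature information of the voice signal; S103, sending the feature information to the second terminal through the CS domain; and S104, transmitting the voice signal to the second terminal through the PS domain.
  • The terminal may include, but is not limited to, a mobile phone.
  • "First terminal" and "second terminal" denote two different terminals.
  • The feature information includes, but is not limited to, the content information, time domain information and frequency domain characteristic information of the voice signal.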
  • Transmitting the content information prevents obvious loss, discontinuity and jitter in the voice;
  • transmitting the frequency domain characteristic information reduces voice distortion;
  • and transmitting the time domain information prevents the call experience of both parties from degrading when network delay increases sharply.
  • Figure 2 is a schematic diagram of the content information, time domain information and frequency domain characteristic information corresponding to a single voice segment.
  • This voice segment is only an example.
  • The time length of the voice segment can be determined according to the specific encoding method, and the sampling rate can be selected flexibly.
  • As shown in Figure 2, the content information the user wants to express, such as "I will go home for dinner tonight", can be extracted from the original speech signal in the upper part of the figure. The timeline in the middle part of the figure is the time domain part of this speech and indicates the segment's position in the sequence of the whole call. The frequency domain characteristic curve in the lower part of the figure characterizes the timbre and can be used to identify each speaker's voice. A person's frequency domain characteristic curve changes little over time, so very little data needs to be transmitted for this part.
  • The number of extracted sampling points should match the VONR speech sampling rate, and the three parts of information at each sampling point should be encoded in one-to-one correspondence. All three parts of the characteristic information are transmitted through the CS domain, so their transmission is safe and stable, and almost no key information is lost.
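The three kinds of characteristic information in Figure 2 can be pictured as one small record per speech segment. The sketch below is a hypothetical illustration, not the patented encoding. `SegmentFeatures` and `band_energies` are invented names, and the content field is assumed to hold recognized text. The frequency domain characteristic curve is approximated by DFT magnitudes summed into a few bands, which is why it needs only a handful of numbers per segment.

```python
import cmath
import math
from dataclasses import dataclass


@dataclass
class SegmentFeatures:
    """Hypothetical per-segment record holding the three parts shown in Figure 2."""
    index: int             # time domain information: position of the segment in the call
    content: str           # content information, assumed here to be recognized text
    spectrum: list[float]  # frequency domain characteristic curve (coarse band energies)


def band_energies(frame: list[float], n_bands: int) -> list[float]:
    """Approximate a timbre curve: sum DFT magnitudes of one frame into n_bands equal bands.

    Only bins 0 .. len(frame)//2 - 1 are used (the upper half mirrors them for a real
    signal); bins left over when the half-spectrum does not divide evenly are ignored.
    """
    n = len(frame)
    half = n // 2
    mags = [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(half)
    ]
    width = half // n_bands
    return [sum(mags[b * width:(b + 1) * width]) for b in range(n_bands)]


# Usage: a 32-sample frame containing a pure tone at DFT bin 4.
frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
seg = SegmentFeatures(index=0, content="I will go home for dinner tonight",
                      spectrum=band_energies(frame, n_bands=4))
print(seg.index, [round(e, 3) for e in seg.spectrum])
```

The direct DFT keeps the example dependency-free. A real extractor would use an FFT and a perceptual band layout, and both choices are left open by the text.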
  • The CS domain is the circuit-switched domain, which is mainly responsible for voice services and video telephony services.
  • The PS domain is the packet-switched domain, which is mainly responsible for data services.
  • This application can flexibly adjust its strategy according to the network signal quality.
  • When the VONR network signal is good, no patching or repair is needed.
  • When the network signal fades, repair starts automatically, so the user does not notice the degradation of the 5G network at all.
  • The quality of the network signal can be judged through preset conditions. For example, when a network parameter of VONR is greater than its preset threshold, the network signal is considered poor, and the defective part of the voice signal needs to be patched and repaired.
  • A VONR network parameter exceeding its preset threshold may mean either of two conditions: the network packet loss rate is greater than 10%, or the increase in network delay is greater than 20 ms. If either condition is met, the network signal can be considered poor.
  • The defective part of the speech signal then needs to be patched and repaired. This part may include missing parts and damaged parts of the signal.
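The repair trigger described above is a simple either-or test. This is a minimal sketch that assumes the example thresholds from the text (10% packet loss, 20 ms delay increase) and treats "greater than" strictly. The names `RepairThresholds` and `needs_repair` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairThresholds:
    """Hypothetical preset thresholds; defaults are the example values from the text."""
    max_loss_rate: float = 0.10          # 10 % packet loss
    max_delay_increase_ms: float = 20.0  # 20 ms increase in network delay


def needs_repair(loss_rate: float, delay_increase_ms: float,
                 th: RepairThresholds = RepairThresholds()) -> bool:
    """Return True when EITHER network parameter exceeds its preset threshold."""
    return (loss_rate > th.max_loss_rate
            or delay_increase_ms > th.max_delay_increase_ms)


print(needs_repair(0.05, 8.0))   # False: good network, no patching
print(needs_repair(0.12, 8.0))   # True: packet loss too high, repair starts
print(needs_repair(0.05, 35.0))  # True: delay jump too large, repair starts
```

Making the thresholds a frozen dataclass keeps the default instance immutable. It also lets the preset percentage and preset delay be tuned per deployment without changing the check itself.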
  • This application can first determine, based on network packet loss, whether to activate the patching strategy.
  • The network packet loss rate can be read in real time from the terminal's log, and the network delay can be read in real time from information sent by the network. When packet loss and network delay reach a certain level, the defective part of the voice signal is repaired. Any utterance can be decomposed into three parts: content information, time domain information and frequency domain characteristic information. As long as these three parts are available, a speech signal can be restored with complete fidelity.
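The text says the packet loss rate comes from the terminal log and the delay from network information, but it does not say how either is computed. The sketch below shows one assumed way to derive them from received voice packets. It counts gaps in packet sequence numbers, ignoring 16-bit wraparound as a simplification. It measures the delay increase as the mean recent delay minus a baseline. Both helper names are hypothetical.

```python
from statistics import fmean


def loss_rate(received_seqs: list[int]) -> float:
    """Fraction of packets missing between the lowest and highest sequence number seen.

    Duplicates are counted once; sequence-number wraparound is not handled.
    """
    if not received_seqs:
        return 0.0
    unique = set(received_seqs)
    expected = max(unique) - min(unique) + 1
    return (expected - len(unique)) / expected


def delay_increase_ms(baseline_ms: float, recent_delays_ms: list[float]) -> float:
    """How much the mean of recent one-way delays exceeds a baseline delay."""
    return fmean(recent_delays_ms) - baseline_ms


seqs = [100, 101, 103, 104, 104, 107]   # 102, 105 and 106 never arrived
print(loss_rate(seqs))                   # 3 of 8 expected packets lost
print(delay_increase_ms(40.0, [58.0, 66.0, 62.0]))
```

Feeding these two values into a threshold check such as the 10% / 20 ms rule above is enough to decide whether the patching strategy should be activated.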
  • The voice content information, time domain information and frequency domain characteristic information are converted into binary digital information and transmitted through the CS domain. Meanwhile, the original high-sampling-rate, high-clarity voice signal is transmitted through 5G VONR.
  • The other user's smart terminal compares the content information, time domain information and frequency domain characteristic information received through the CS domain with the voice signal received through the PS domain. It then repairs defects and distortions so that the sender's voice information is fully restored, without loss of sound quality or excessive delay.
  • The second terminal performs voice repair on the voice signal received through the PS domain based on the characteristic information received through the CS domain, and outputs the repaired voice signal.
  • That is, the first terminal extracts feature information from the voice signal and sends it to the second terminal through the CS domain. The original voice signal is sent through 5G VONR.
  • The second terminal compares the characteristic information received through the CS domain with the voice signal received through the PS domain and repairs defects and distortions in the voice signal. This restores the sender's voice information in full without degrading sound quality or adding excessive delay. The scheme is also flexible:
  • when the VONR network signal is good, no repair is performed;
  • when the signal drops, repair starts automatically, and the user does not notice any impact of the 5G network signal drop on call sound quality.
  • Step S103 may include, but is not limited to, the following sub-steps:
  • The first terminal converts the content information, time domain information and frequency domain characteristic information of the voice signal into binary digital information, and then sends the digital information to the second terminal through the CS domain. Digital conversion reduces the amount of data to be transmitted.
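The source does not give the binary format used on the CS link. Purely as an illustration, the sketch below uses Python's `struct` module to pack one segment's features into a fixed layout with three fields:

- a 32-bit segment index (time domain information);
- a length-prefixed UTF-8 string (content information);
- a count-prefixed list of 32-bit floats (frequency domain characteristic curve).

The layout and the function names are assumptions.

```python
import struct

# Hypothetical wire layout, network byte order:
#   uint32 index | uint16 text length | UTF-8 text | uint8 band count | float32 * band count
_HEADER = struct.Struct("!IH")


def pack_features(index: int, content: str, spectrum: list[float]) -> bytes:
    """Serialize one segment's features into bytes for the CS-domain link."""
    text = content.encode("utf-8")
    return (_HEADER.pack(index, len(text)) + text
            + struct.pack("!B", len(spectrum))
            + struct.pack(f"!{len(spectrum)}f", *spectrum))


def unpack_features(buf: bytes) -> tuple[int, str, list[float]]:
    """Inverse of pack_features: recover (index, content, spectrum) from bytes."""
    index, n_text = _HEADER.unpack_from(buf, 0)
    offset = _HEADER.size
    content = buf[offset:offset + n_text].decode("utf-8")
    offset += n_text
    (n_bands,) = struct.unpack_from("!B", buf, offset)
    offset += 1
    spectrum = list(struct.unpack_from(f"!{n_bands}f", buf, offset))
    return index, content, spectrum


payload = pack_features(7, "home", [0.0, 16.0, 0.5])
print(len(payload), unpack_features(payload))
```

Float32 keeps the payload compact at the cost of precision, so values that are not exactly representable in 32 bits come back slightly rounded.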
  • This application can adaptively reduce the voice jitter and distortion caused by network degradation and packet loss during VONR calls.
  • The method keeps the call on VONR throughout, instead of being forced to fall back to 3G or 2G when the signal is poor. It also ensures that the mobile phone can meet the GCF (Global Certification Forum) certification requirements for voice quality and overall delay when packet loss and delay are added. Because the adjustment is adaptive and flexible, users can hardly perceive the effect of VONR network quality on voice calls. This application can therefore serve as a good transitional solution in the early stage of 5G network construction, when many areas have unstable network conditions because base stations are still sparse.
  • This application adaptively patches the voice according to network packet loss or delay. This keeps voice transmission timely, stable and coherent regardless of 5G network signal quality.
  • Figure 4 is a flow chart of a voice transmission method provided by an embodiment of the present application. The voice transmission method is applied to the second terminal and includes, but is not limited to, the following steps:
  • S401: receive the feature information sent by the first terminal through the circuit-switched domain,
  • where the feature information is extracted from the voice signal of the first terminal.
  • The second terminal can determine whether the voice signal needs to be repaired based on the packet loss rate and the added delay in the network.
  • The content information, time domain information and frequency domain characteristic information of the voice signal are transmitted to the other terminal.
  • Because the sampling rate is exactly the same, they can be compared one by one with the original voice signal transmitted over VONR.
  • The other terminal can then perform synchronized synthesis and repair one by one according to the encoding.
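The CS-domain features and the PS-domain frames share the same sampling rate and index, so they can be compared index by index. This is a minimal sketch that assumes frames are keyed by their time domain index and that a caller-supplied check decides whether a frame is "damaged". `find_defects` and the NaN-based check in the usage example are hypothetical.

```python
import math
from typing import Callable, Mapping, Sequence


def find_defects(cs_indices: Sequence[int],
                 ps_frames: Mapping[int, Sequence[float]],
                 is_damaged: Callable[[int, Sequence[float]], bool]
                 ) -> tuple[list[int], list[int]]:
    """Compare CS-announced segment indices with the frames received over PS.

    Returns (missing, damaged): indices that never arrived over PS, and indices
    that arrived but fail the damage check.
    """
    missing = [i for i in cs_indices if i not in ps_frames]
    damaged = [i for i in cs_indices if i in ps_frames and is_damaged(i, ps_frames[i])]
    return missing, damaged


# Usage: segment 1 was lost and segment 2 arrived corrupted (NaN samples).
cs_indices = [0, 1, 2, 3]
ps_frames = {0: [0.1, 0.2], 2: [math.nan, 0.3], 3: [0.0, -0.1]}
print(find_defects(cs_indices, ps_frames,
                   lambda i, frame: any(math.isnan(x) for x in frame)))
```

A practical damage check might instead compare each received frame's spectrum with the CS-domain frequency curve for the same index. The text leaves that criterion open.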
  • The speech signal is composed of multiple speech segments, which are sorted according to the time domain information received through the CS domain.
  • When network problems cause the speech segments to arrive out of order, the time domain information transmitted through the CS domain can be used to return these segments to their original positions and to remove invalid fragments.
  • The content information and frequency domain information transmitted through the CS domain can then be used to synthesize the speech segments one by one. This fully repairs the defective part of the speech signal, which may include missing parts and damaged parts.
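The two sub-steps above can be sketched as follows: first reorder by time domain information, then fill defects from content and frequency domain information. This is an assumed structure, not the patent's implementation. `reorder_and_repair` is hypothetical, and the actual synthesis from content and timbre information is left to a caller-supplied `synthesize` function.

```python
from typing import Callable, Iterable, Optional, TypeVar

Frame = TypeVar("Frame")


def reorder_and_repair(
    received: Iterable[tuple[int, Frame]],
    valid_indices: Iterable[int],
    synthesize: Callable[[int], Frame],
    is_damaged: Optional[Callable[[int, Frame], bool]] = None,
) -> list[Frame]:
    """Rebuild the segment sequence from PS-domain frames that may arrive out of order.

    received:      (time domain index, frame) pairs in arrival order
    valid_indices: segment indices announced through the CS domain
    synthesize:    builds a replacement frame for an index from the CS features
    is_damaged:    optional check that marks a received frame as unusable
    """
    valid = set(valid_indices)
    by_index: dict[int, Frame] = {}
    for idx, frame in received:
        # Drop invalid fragments (unknown index) and keep only the first copy of duplicates.
        if idx in valid and idx not in by_index:
            by_index[idx] = frame
    repaired = []
    for idx in sorted(valid):  # restore the original time order
        frame = by_index.get(idx)
        if frame is None or (is_damaged is not None and is_damaged(idx, frame)):
            frame = synthesize(idx)  # fill a missing or damaged segment
        repaired.append(frame)
    return repaired


arrivals = [(2, "seg2"), (0, "seg0"), (9, "stray"), (0, "seg0-dup")]
print(reorder_and_repair(arrivals, [0, 1, 2], synthesize=lambda i: f"synth{i}"))
```

Strings stand in for audio frames here. The same logic applies unchanged to sample arrays because the function never inspects frame contents itself.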
  • This application can flexibly adjust its strategy according to the network signal quality.
  • When the VONR network signal is good, no patching or repair is needed.
  • When the signal drops, repair starts automatically, and the user does not notice any impact of the signal drop on call sound quality.
  • The quality of the network signal can be judged through preset conditions. For example, when a network parameter of VONR is greater than its preset threshold, the network signal is considered poor, and the defective part of the voice signal needs to be patched and repaired.
  • A VONR network parameter exceeding its preset threshold may mean either of two conditions: the network packet loss rate is greater than a preset percentage (for example, 10%), or the increase in network delay is greater than a preset delay (for example, 20 ms). If either condition is met, the network signal can be considered poor, and the defective part of the voice signal needs to be patched and repaired.
  • The second terminal receives the feature information sent by the first terminal through the CS domain. The feature information is extracted from the voice signal of the first terminal.
  • The second terminal simultaneously receives the voice signal transmitted by the first terminal through the PS domain of VONR.
  • The second terminal performs voice repair on the voice signal received through the PS domain based on the characteristic information received through the CS domain, and outputs the repaired voice signal.
  • Step S403 may include, but is not limited to, the following sub-steps:
  • In this process, information is first extracted from the original voice signal, decomposed into three parts (content information, time domain information and frequency domain characteristic information), and converted into a digital signal.
  • While the original voice signal is transmitted through the PS domain of VONR, the extracted information is continuously sent to the other party's mobile phone through the CS domain. The VONR network condition is then judged. If the packet loss rate is greater than the preset percentage or the network delay increases by more than the preset delay, the other terminal patches the original voice signal after receiving it and the extracted information.
  • Transmitting the content information prevents obvious loss, discontinuity and jitter in the voice, and transmitting the frequency domain characteristic information reduces voice distortion. Transmitting the time domain information prevents the call experience of both parties from degrading when network delay increases sharply.
  • An embodiment of the present application also provides a voice transmission system.
  • The voice transmission system includes an information extraction module, a network condition judgment module and a voice repair module.
  • The information extraction module extracts the speaker's content information, time domain information and frequency domain characteristic information.
  • The number of extracted sampling points must be the same as the VONR speech sampling rate.
  • The content information, time domain information and frequency domain characteristic information of each sampling point are encoded in one-to-one correspondence. All three are transmitted through the CS domain, so their transmission is safe and stable, and almost no key information is lost.
  • The network condition judgment module determines whether the voice repair module needs to be activated, based on the packet loss rate and the added delay in the network.
  • After the content information, time domain information and frequency domain characteristic information reach the other party's terminal, the voice repair module compares them one by one with the original voice transmitted over VONR. This is possible because the sampling rate is exactly the same.
  • The other party's terminal can then perform synchronized patching one by one according to the serial number.
  • The voice transmission system extracts information from the original voice signal, decomposes it into three parts (content information, time domain information and frequency domain characteristic information), and converts it into a digital signal. While the original voice signal is transmitted through the PS domain of VONR, the extracted information is continuously sent to the other party's mobile phone through the CS domain.
  • The system then judges the VONR network condition. If the packet loss rate is greater than the preset percentage or the network delay increases by more than the preset delay, the other terminal patches the original voice signal after receiving it and the extracted information. Because the information has been encoded in one-to-one correspondence, no information is lost, and the timbre of the original voice can be restored without degrading the VONR user experience. If the VONR network is in good condition, the patching system does not start. The entire voice transmission system is therefore adaptive and can flexibly adjust its strategy according to network quality.
  • Transmitting the content information prevents obvious loss, discontinuity and jitter in the voice, and transmitting the frequency domain characteristic information reduces voice distortion. Transmitting the time domain information prevents the call experience of both parties from degrading when network delay increases sharply.
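The system's adaptive behavior is to patch only when the network is poor and otherwise pass the VONR audio through unchanged. The sketch below shows one way the three modules might fit together to do this. All names are hypothetical, and the thresholds are the example values from the text. Audio frames are stand-in strings, and each module is reduced to a few lines so the control flow stays visible.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    index: int                                            # time domain information
    content: str                                          # content information
    spectrum: list[float] = field(default_factory=list)   # frequency domain curve


def extract_features(segments: list[Segment]) -> dict[int, Segment]:
    """Information extraction module: one record per segment, keyed by index."""
    return {s.index: s for s in segments}


def network_is_poor(loss_rate: float, delay_increase_ms: float,
                    max_loss: float = 0.10, max_delay_ms: float = 20.0) -> bool:
    """Network condition judgment module: either threshold exceeded."""
    return loss_rate > max_loss or delay_increase_ms > max_delay_ms


def repair(ps_audio: dict[int, str], features: dict[int, Segment]) -> list[str]:
    """Voice repair module: fill each missing index from its CS-domain record."""
    return [ps_audio.get(i, f"<resynth:{features[i].content}>") for i in sorted(features)]


def receive(ps_audio: dict[int, str], features: dict[int, Segment],
            loss_rate: float, delay_increase_ms: float) -> list[str]:
    """Receiver side: patch only when the network judgment module reports a poor network."""
    if not network_is_poor(loss_rate, delay_increase_ms):
        return [ps_audio[i] for i in sorted(ps_audio)]  # good network: no patching
    return repair(ps_audio, features)


features = extract_features([Segment(0, "I will"), Segment(1, "go home"), Segment(2, "tonight")])
ps_audio = {0: "pcm0", 2: "pcm2"}  # segment 1 lost over the PS domain
print(receive(ps_audio, features, loss_rate=0.33, delay_increase_ms=5.0))
print(receive(ps_audio, features, loss_rate=0.02, delay_increase_ms=5.0))
```

The second call shows the trade-off of the adaptive design. Below the thresholds the repair path is skipped entirely, which saves processing at the cost of leaving isolated losses unpatched.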
  • An embodiment of the present application also provides a terminal, which includes, but is not limited to, a mobile phone.
  • The terminal includes one or more processors and a memory.
  • In Figure 9, one processor and one memory are taken as an example.
  • The processor and the memory can be connected through a bus or by other means.
  • Figure 9 takes the connection through a bus as an example.
  • The memory can be used to store non-transitory software programs and non-transitory computer-executable programs, such as the programs implementing the voice transmission method in the above embodiments of the present application.
  • The processor implements the voice transmission method in the above embodiments by running the non-transitory software programs and programs stored in the memory.
  • The memory may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function. The data storage area may store data required to execute the voice transmission method in the embodiments of the present application, and so on.
  • The memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device.
  • The memory may include memories located remotely relative to the processor, and these remote memories may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
  • The non-transitory software programs and programs required to implement the voice transmission method of the embodiments of the present application are stored in the memory.
  • When executed by the one or more processors, these programs perform the voice transmission method described above. Examples are method steps S101 to S104 in Figure 1 and method steps S301 to S302 in Figure 3, or method steps S401 to S404 in Figure 4 and method steps S701 to S702 in Figure 7.
  • That is, the voice signal of the first terminal is acquired, and its feature information is extracted and sent to the second terminal through the CS domain. The voice signal itself is transmitted to the second terminal through the PS domain of VONR. When the network parameters of VONR are greater than the preset threshold, the second terminal performs voice repair on the voice signal received through the PS domain based on the characteristic information received through the CS domain, and outputs the repaired voice signal.
  • Embodiments of the present application also provide a computer-readable storage medium storing a computer-executable program. The program may be executed by one or more control processors, for example by one of the processors in Figure 9. It causes the one or more processors to execute the voice transmission method in the embodiments of the present application. Examples are method steps S101 to S104 in Figure 1 and method steps S301 to S302 in Figure 3, or method steps S401 to S404 in Figure 4 and method steps S701 to S702 in Figure 7.
  • That is, the voice signal of the first terminal is acquired, and its characteristic information is extracted and sent to the second terminal through the CS domain. The voice signal itself is transmitted to the second terminal through the PS domain of VONR. When the network parameters of VONR are greater than the preset threshold, the second terminal performs voice repair on the voice signal received through the PS domain based on the characteristic information received through the CS domain, and outputs the repaired voice signal.
  • Embodiments of the present application include the following. The voice signal of the first terminal is acquired, its characteristic information is extracted, and the characteristic information is sent to the second terminal through the circuit-switched domain. The voice signal is transmitted to the second terminal through the packet-switched domain. When the network parameters meet the preset condition, the second terminal repairs the voice signal received through the packet-switched domain based on the characteristic information received through the circuit-switched domain, and outputs the repaired voice signal.
  • That is, during the call the first terminal extracts feature information from the voice signal and sends it to the second terminal through the circuit-switched domain. The original voice signal is sent through the packet-switched domain.
  • The second terminal compares the characteristic information received through the circuit-switched domain with the voice signal received through the packet-switched domain and repairs defects and distortions in the voice signal. This restores the sender's voice information in full without degrading sound quality or adding excessive delay. The scheme is also flexible:
  • when the network signal is good, no repair is performed;
  • when the signal drops, repair starts automatically, and the user does not notice any impact of the signal drop on call sound quality.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disk (DVD) or other optical disk storage, magnetic cassettes, tapes, disk storage or other magnetic storage devices, or may Any other medium used to store the desired information and that can be accessed by a computer.
  • communication media typically embodies a computer-readable program, data structure, program module or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media .

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A voice transmission method, a terminal and a computer-readable storage medium. The method comprises: acquiring a voice signal from a first terminal (S101); extracting feature information of the voice signal (S102); sending the feature information to a second terminal by means of a circuit switched domain (S103); and transmitting the voice signal to the second terminal by means of a packet switched domain, so that when a network parameter satisfies a preset condition, the second terminal repairs, according to the feature information received by means of the circuit switched domain, the voice signal received by means of the packet switched domain, and outputs the repaired voice signal (S104).

Description

Voice transmission method, terminal and computer-readable storage medium
Cross-references to related applications
This application is based on, and claims priority to, Chinese patent application No. 202210364802.5, filed on April 8, 2022, the entire content of which is incorporated herein by reference.
Technical field
Embodiments of the present application relate to, but are not limited to, the field of communications, and in particular to a voice transmission method, a terminal and a computer-readable storage medium.
Background
In the early stages of building out a new generation of mobile networks, or in areas where base stations are sparse, voice transmission quality drops rapidly as network quality deteriorates. For example, a common problem in VONR (Voice over New Radio, i.e. 5G calling) calls is that where 5G (fifth-generation mobile communication technology) base stations are sparse, network impairment causes packet loss and delay to rise sharply, and the voice suffers heavy jitter and intermittent distortion. Voice quality can then be far worse than in the CS domain, which badly hurts the user experience. The current remedy for voice-quality degradation caused by poor signal on 4G and 5G networks is to force voice calls to fall back to 3G or 2G. This avoids voice interruptions, but gives up the high bandwidth and high audio quality of VONR, so a user nominally calling on 5G actually gets a 3G or 2G call experience. How to prevent voice quality from collapsing when the new-generation network is in poor condition is therefore an urgent problem.
Summary of the invention
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of the claims.
Embodiments of the present application provide a voice transmission method, a terminal and a computer-readable storage medium.
In a first aspect, embodiments of the present application provide a voice transmission method applied to a first terminal. The method includes: acquiring a voice signal of the first terminal; extracting feature information of the voice signal; sending the feature information to a second terminal through a circuit-switched domain; and transmitting the voice signal to the second terminal through a packet-switched domain, so that when network parameters meet a preset condition, the second terminal repairs the voice signal received in the packet-switched domain according to the feature information received in the circuit-switched domain, and outputs the repaired voice signal.
In a second aspect, embodiments of the present application provide a voice transmission method applied to a second terminal. The method includes: receiving feature information sent by a first terminal through a circuit-switched domain, the feature information being extracted from a voice signal of the first terminal; receiving the voice signal transmitted by the first terminal through a packet-switched domain; when network parameters meet a preset condition, repairing the voice signal received in the packet-switched domain according to the feature information received in the circuit-switched domain; and outputting the repaired voice signal.
In a third aspect, embodiments of the present application provide a terminal including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the voice transmission method of the first aspect or the voice transmission method of the second aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer-executable program, the computer-executable program being used to cause a computer to execute the voice transmission method of the first aspect or the voice transmission method of the second aspect.
Additional features and advantages of the application will be set forth in the description that follows, and in part will be apparent from the description or may be learned by practicing the application. The objectives and other advantages of the application may be realized and obtained by the structures particularly pointed out in the description, the claims and the accompanying drawings.
Brief description of the drawings
The drawings provide a further understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments, they serve to explain the technical solution of the present application and do not limit it.
Figure 1 is a flow chart of a voice transmission method provided by an embodiment of the present application (first terminal side);
Figure 2 is a schematic diagram of the content information, time-domain information and frequency-domain characteristic information corresponding to a single speech segment, provided by an embodiment of the present application;
Figure 3 is a sub-flow chart of a voice transmission method provided by an embodiment of the present application;
Figure 4 is a flow chart of a voice transmission method provided by an embodiment of the present application (second terminal side);
Figure 5 is a schematic diagram of speech repair using time-domain information, provided by an embodiment of the present application;
Figure 6 is a schematic diagram of speech repair using content information and frequency-domain characteristic information, provided by an embodiment of the present application;
Figure 7 is a sub-flow chart of a voice transmission method provided by an embodiment of the present application;
Figure 8 is a schematic structural diagram of a voice transmission system provided by an embodiment of the present application;
Figure 9 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and are not intended to limit it.
It should be understood that in the description of the embodiments of this application, "multiple" means two or more; "greater than", "less than", "exceeding" and the like are understood as excluding the stated number, while "above", "below", "within" and the like are understood as including it. Terms such as "first" and "second" are used only to distinguish technical features and shall not be understood as indicating or implying relative importance, the number of the indicated technical features, or their order.
Embodiments of the present application provide a voice transmission method, a terminal and a computer-readable storage medium. The voice signal of the first terminal is acquired and its feature information is extracted; the feature information is sent to the second terminal through the CS (Circuit Switch) domain, and the voice signal is transmitted to the second terminal through the PS (Packet Switch) domain, so that when network parameters meet a preset condition, the second terminal repairs the voice signal received in the PS domain according to the feature information received in the CS domain, and outputs the repaired voice signal. In other words, the first terminal extracts feature information from the voice signal and sends it to the second terminal over the CS domain, while the original voice signal travels over the call. When the network signal is poor, the second terminal compares the feature information received in the CS domain with the voice signal received in the PS domain and uses it to patch missing parts and repair distortion, restoring all of the sender's voice information without reducing sound quality or adding excessive delay. The scheme is therefore highly flexible: when the network signal is good, no patching or repair is needed; when the signal fades, repair starts automatically, and the user does not notice any effect of the signal drop on call quality.
As shown in Figure 1, which is a flow chart of a voice transmission method provided by an embodiment of the present application, the method is applied to a first terminal and includes, but is not limited to, the following steps:
S101: Acquire the voice signal of the first terminal;
S102: Extract feature information of the voice signal;
S103: Send the feature information to the second terminal through the circuit-switched domain;
S104: Transmit the voice signal to the second terminal through the packet-switched domain, so that when network parameters meet a preset condition, the second terminal repairs the voice signal received in the packet-switched domain according to the feature information received in the circuit-switched domain, and outputs the repaired voice signal.
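The sender-side steps S101 to S104 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of the application: the fixed segment length, the four-band DFT magnitude summary standing in for the frequency-domain characteristic information, the `transcribe` hook standing in for content extraction, and the two Python lists standing in for the CS and PS domains are all hypothetical. Each PS segment is tagged with the same sequence number as its CS record so the receiver can match the two streams; the text does not specify how that matching is done.

```python
import cmath
import math
from dataclasses import dataclass


@dataclass
class SegmentFeatures:
    seq: int          # time-domain info: position of the segment within the call
    content: str      # content info; a real system would run a speech recognizer here
    envelope: list    # frequency-domain characteristic: coarse band magnitudes


def band_magnitudes(samples, n_bands=4):
    """Naive DFT magnitudes (bins 0..N/2-1), normalised by N and summed into n_bands bands."""
    n = len(samples)
    half = n // 2
    mags = [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
        for k in range(half)
    ]
    width = max(1, half // n_bands)
    return [sum(mags[b * width:(b + 1) * width]) for b in range(n_bands)]


def send(signal, seg_len, transcribe=lambda seg: ""):
    """S101-S104 (hypothetical): split the signal, extract features, route to two channels."""
    cs_channel = []  # stands in for the circuit-switched (CS) domain: features only
    ps_channel = []  # stands in for the packet-switched (PS/VONR) domain: raw audio
    for seq, start in enumerate(range(0, len(signal), seg_len)):
        seg = signal[start:start + seg_len]
        cs_channel.append(SegmentFeatures(seq, transcribe(seg), band_magnitudes(seg)))
        ps_channel.append((seq, seg))
    return cs_channel, ps_channel
```

The naive O(N²) DFT keeps the sketch dependency-free; a practical implementation would use an FFT and a perceptually motivated spectral envelope instead.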
It can be understood that the terminal may include, but is not limited to, a mobile phone. "First terminal" and "second terminal" denote two different terminals.
It can be understood that the feature information includes, but is not limited to, the content information, time-domain information and frequency-domain characteristic information of the voice signal. Transmitting the content information ensures that the voice does not suffer obvious loss, interruption or jitter; transmitting the frequency-domain characteristic information reduces voice distortion; and transmitting the time-domain information keeps the call experience of both parties from degrading when network delay rises sharply.
It can be understood that Figure 2 illustrates the content information, time-domain information and frequency-domain characteristic information corresponding to a single speech segment. The segment is only an example; in practice, its duration can be chosen flexibly according to the sampling rate of the specific codec. As shown in Figure 2, the content the user wants to express, for example "I'm coming home for dinner tonight", can be extracted from the original speech signal in the upper part of the figure. The time axis in the middle part is the time-domain part of this speech, indicating the segment's position in the sequence of the whole call. The frequency-domain characteristic curve in the lower part characterizes timbre and can be used to identify each speaker's voice; because a given person's curve changes little, the amount of data needed to transmit it is very small.
It can be understood that when the content information, time-domain information and frequency-domain characteristic information are extracted from the voice signal, the number of extracted sampling points should match the VONR voice sampling rate, and the three kinds of information at each sampling point are encoded in one-to-one correspondence. Because all three kinds of feature information are transmitted over the CS domain, their transmission is safe and stable, and key information is almost never lost.
It can be understood that the CS domain is the circuit-switched domain, mainly responsible for voice and video-telephony services, while the PS domain is the packet-switched domain, mainly responsible for data services.
It can be understood that this application can adjust its strategy flexibly according to network signal quality: when the VONR signal is good, no patching or repair is needed; when the signal fades, repair starts automatically, and the user does not notice any effect of the 5G signal drop on call quality. Signal quality can be judged against a preset condition. For example, when a VONR network parameter exceeds a preset threshold, the signal is considered poor and the defective part of the voice signal needs to be patched and repaired. A VONR network parameter exceeding its preset threshold may include the network packet loss rate exceeding 10% or the network delay increasing by more than 20 ms; meeting either condition is enough for the signal to be considered poor and for the defective part of the voice signal to be patched and repaired. The defective part may include missing parts and damaged parts of the voice signal.
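The preset condition in this example (packet loss rate greater than 10% or added delay greater than 20 ms, either one sufficing) can be expressed as a small predicate. The function and parameter names are hypothetical; the thresholds are the example values from the text, and the strict comparisons follow the convention stated in the description that "greater than" excludes the stated number.

```python
def needs_repair(loss_rate, delay_increase_ms,
                 max_loss_rate=0.10, max_delay_increase_ms=20.0):
    """Return True if either VONR network parameter exceeds its preset threshold.

    loss_rate:          fraction of packets lost (0.10 means 10 %)
    delay_increase_ms:  added network delay relative to normal conditions, in ms
    """
    return loss_rate > max_loss_rate or delay_increase_ms > max_delay_increase_ms
```

Exposing the thresholds as parameters matches the later wording of a "preset percentage" and "preset delay" rather than hard-coded values.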
It can be understood that this application may first decide, based on network packet loss, whether to start the patching strategy. The packet loss rate can be read from the terminal's log in real time, and the network delay can likewise be read in real time from information delivered by the network; when packet loss and delay reach a certain level, the defective part of the voice signal is repaired. Any utterance can be decomposed into three parts, namely content information, time-domain information and frequency-domain characteristic information, and with these three parts the utterance can be restored with full fidelity. The content, time-domain and frequency-domain characteristic information of the speech are converted into binary digital information and transmitted over the CS domain, while the original, high-sampling-rate, very clear voice signal is transmitted over 5G VONR. The far-end user's smart terminal compares the content, time-domain and frequency-domain characteristic information received in the CS domain with the voice received in the PS domain, then patches missing parts and repairs distortion, fully restoring the sender's voice information without reducing sound quality or adding excessive delay.
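The text reads the packet loss rate from the terminal log and the delay from information delivered by the network. Purely as an assumed alternative for illustration, a receiver could also estimate both figures itself: loss from gaps in RTP-style sequence numbers (similar in spirit to the cumulative-loss calculation in RFC 3550), and delay increase as the mean of recent delay samples minus a baseline taken while the call was healthy. The function names are hypothetical, and sequence-number wraparound is ignored to keep the sketch short.

```python
def loss_rate(received_seqs):
    """Fraction of packets missing between the lowest and highest sequence number seen.

    Duplicates are counted once; 16-bit sequence-number wraparound is ignored.
    """
    seen = set(received_seqs)
    if not seen:
        return 0.0
    expected = max(seen) - min(seen) + 1
    return 1.0 - len(seen) / expected


def delay_increase_ms(baseline_ms, recent_delays_ms):
    """Mean of recent delay samples minus a baseline measured under good conditions."""
    if not recent_delays_ms:
        return 0.0
    return sum(recent_delays_ms) / len(recent_delays_ms) - baseline_ms
```

For example, if packets 1 to 11 were sent and only packet 4 is missing, the estimated loss rate is 1/11, about 9.1%, which is still below the 10% example threshold.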
It can be understood that the voice signal of the first terminal is acquired, its feature information is extracted and sent to the second terminal through the CS domain, and the voice signal is transmitted to the second terminal through the PS domain of VONR, so that when a VONR network parameter exceeds the preset threshold, the second terminal repairs the voice signal received in the PS domain based on the feature information received in the CS domain and outputs the repaired voice signal. The first terminal thus sends the extracted feature information over the CS domain and the original voice signal over 5G VONR. When the network signal is poor, the second terminal compares the feature information received in the CS domain with the voice signal received in the PS domain to patch missing parts and repair distortion, restoring all of the sender's voice information without reducing sound quality or adding excessive delay. The scheme is therefore highly flexible: when the VONR signal is good, no patching or repair is needed; when the signal fades, repair starts automatically, and the user does not notice any effect of the 5G signal drop on call quality.
As shown in Figure 3, step S103 may include, but is not limited to, the following sub-steps:
S301: Convert the feature information into digital information;
S302: Send the digital information to the second terminal through the circuit-switched domain.
It can be understood that the first terminal converts the content information, time-domain information and frequency-domain characteristic information of the voice signal into binary digital information and then sends it to the second terminal through the CS domain; the digital conversion reduces the amount of data to be transmitted.
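Sub-steps S301 and S302 convert the feature information into binary digital information before it is sent over the CS domain. The application does not define a wire format; the layout below, a big-endian header (32-bit sequence number, 16-bit band count, 16-bit text length) followed by float32 band magnitudes and UTF-8 content text, is a hypothetical example built on Python's standard `struct` module.

```python
import struct

HEADER = "!IHH"  # seq (uint32), number of bands (uint16), content length in bytes (uint16)


def pack_features(seq, envelope, content):
    """Serialize one segment's feature record into bytes for the CS domain."""
    text = content.encode("utf-8")
    return (
        struct.pack(HEADER, seq, len(envelope), len(text))
        + struct.pack(f"!{len(envelope)}f", *envelope)  # band magnitudes as float32
        + text
    )


def unpack_features(data):
    """Inverse of pack_features; returns (seq, envelope, content)."""
    seq, n_bands, n_text = struct.unpack_from(HEADER, data, 0)
    offset = struct.calcsize(HEADER)
    envelope = list(struct.unpack_from(f"!{n_bands}f", data, offset))
    offset += struct.calcsize(f"!{n_bands}f")
    content = data[offset:offset + n_text].decode("utf-8")
    return seq, envelope, content
```

A record for sequence number 7 with three band values and the content "hi" packs into 8 + 12 + 2 = 22 bytes. Storing magnitudes as float32 halves their size compared with float64 at the cost of some precision; values such as 0.5 or 0.25 survive the round trip exactly, arbitrary values do not.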
In summary, this application can adaptively reduce the voice jitter and distortion caused by network impairment and packet loss in VONR calls. The method keeps the call on VONR throughout, rather than forcing it to fall back to 3G or 2G because of a poor signal, while still allowing the phone to meet GCF (Global Certification Forum) certification requirements for voice quality and overall delay under injected packet loss and delay. Because the adjustment is adaptive, users barely perceive the effect of VONR network quality on voice calls. The application therefore offers a good transitional solution for the many areas where, early in 5G deployment, too few base stations make network conditions unstable. By patching the voice adaptively according to network packet loss or delay, it keeps voice delivery timely, stable and continuous regardless of 5G signal quality.
As shown in Figure 4, which is a flow chart of a voice transmission method provided by an embodiment of the present application, the method is applied to a second terminal and includes, but is not limited to, the following steps:
S401: Receive the feature information sent by the first terminal through the circuit-switched domain, the feature information being extracted from the voice signal of the first terminal;
S402: Receive the voice signal transmitted by the first terminal through the packet-switched domain;
S403: When the VONR network parameters meet a preset condition, repair the voice signal received in the packet-switched domain according to the feature information received in the circuit-switched domain;
S404: Output the repaired voice signal.
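On the second terminal, steps S401 to S404 reduce to a branch on the preset condition. In the sketch below, the `repair` callable is a hypothetical hook for S403 (for instance, re-sequencing and gap filling as in Figures 5 and 6), the network check is passed in as a boolean, and PS packets are assumed to be `(seq, samples)` pairs; when the condition is not met, the PS audio is simply played as received.

```python
def receive(cs_features, ps_packets, network_bad, repair):
    """S401-S404 on the second terminal (hypothetical sketch).

    cs_features: feature records received over the CS domain (S401)
    ps_packets:  (seq, samples) tuples received over the PS domain (S402)
    network_bad: result of the preset-condition check on the network parameters
    repair:      callable(cs_features, ps_packets) -> list of sample lists (S403)
    """
    if network_bad:
        segments = repair(cs_features, ps_packets)
    else:
        # Good network: no patching or repair, play the PS audio as it arrived.
        segments = [samples for _, samples in ps_packets]
    # S404: concatenate the segments into one output signal.
    return [x for seg in segments for x in seg]
```

Keeping the repair logic behind a single hook mirrors the text's point that repair is only switched on when the network degrades.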
It can be understood that the terminal may include, but is not limited to, a mobile phone. "First terminal" and "second terminal" denote two different terminals.
It can be understood that the CS domain is the circuit-switched domain, mainly responsible for voice and video-telephony services, while the PS domain is the packet-switched domain, mainly responsible for data services.
It can be understood that the feature information includes, but is not limited to, the content information, time-domain information and frequency-domain characteristic information of the voice signal. Transmitting the content information ensures that the voice does not suffer obvious loss, interruption or jitter; transmitting the frequency-domain characteristic information reduces voice distortion; and transmitting the time-domain information keeps the call experience of both parties from degrading when network delay rises sharply.
It can be understood that the second terminal can decide whether the voice signal needs repair based on the packet loss rate and the added delay in the network. Once the content, time-domain and frequency-domain characteristic information of the voice signal reach the far-end terminal, they can be compared one by one with the original voice signal transmitted over VONR, because the sampling rates are identical. When network conditions are poor, the far-end terminal can synthesize and patch the signal point by point, synchronized according to the encoding.
It can be understood that, as shown in Figure 5, the speech segments are sorted according to the time-domain information from the CS domain, the voice signal being composed of multiple speech segments. Delay and jitter can scramble the order of the segments, mix invalid segments in among them, and make valid segments arrive late; the time-domain information transmitted over the CS domain can be used to re-sequence the scrambled segments, return each to its place, and discard the invalid ones.
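The re-sequencing of Figure 5 can be sketched as follows, assuming each PS segment carries a sequence number that can be matched against the time-domain information received over the CS domain. Segments whose number the CS stream does not announce are treated as invalid and dropped, and a late duplicate of an already-placed segment is ignored; both rules, and the function name, are assumptions made for the illustration.

```python
def reorder_segments(ps_packets, valid_seqs):
    """Re-sequence PS segments using CS time-domain info (cf. Figure 5).

    ps_packets: (seq, samples) tuples in arrival order, possibly shuffled by jitter
                and possibly containing invalid or duplicate segments.
    valid_seqs: sequence numbers announced over the CS domain.
    Returns {seq: samples} for the valid segments, in playback order.
    """
    valid = set(valid_seqs)
    kept = {}
    for seq, samples in ps_packets:
        if seq in valid and seq not in kept:  # drop invalid and duplicate segments
            kept[seq] = samples
    return dict(sorted(kept.items()))
```

Segments announced over CS that never arrived simply remain absent from the result, leaving them to the gap-filling step of Figure 6.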
It can be understood that, as shown in Figure 6, where network impairment leaves speech segments partly missing or incomplete, segments synthesized one by one from the content information and frequency-domain information transmitted over the CS domain can be used to fully repair the defective part of the voice signal, which may include missing parts and damaged parts.
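The gap filling of Figure 6 can be sketched as follows. Real synthesis would render the transmitted content in the speaker's timbre; here a crude, hypothetical stand-in, `synthesize_from_envelope`, sums one cosine per frequency band scaled by that band's magnitude, only to show where a synthesized segment slots in. Segments are keyed by sequence number, and the CS band magnitudes are assumed to be indexed the same way.

```python
import math


def synthesize_from_envelope(envelope, seg_len):
    """Hypothetical stand-in for speech synthesis: one cosine per band, placed at
    the band's centre DFT bin and scaled by the band magnitude."""
    if not envelope:
        return [0.0] * seg_len
    half = seg_len // 2
    width = max(1, half // len(envelope))
    out = [0.0] * seg_len
    for b, mag in enumerate(envelope):
        k = b * width + width // 2  # centre bin of band b
        for t in range(seg_len):
            out[t] += 2 * mag * math.cos(2 * math.pi * k * t / seg_len)
    return out


def fill_missing(segments, cs_envelopes, seg_len):
    """Return {seq: samples} covering every seq announced over CS, in order.
    Segments that arrived over PS are kept; absent ones are synthesized."""
    return {
        seq: segments[seq] if seq in segments else synthesize_from_envelope(env, seg_len)
        for seq, env in sorted(cs_envelopes.items())
    }
```

Audio that actually arrived is always preferred over synthesized audio, which matches the text's intent of repairing only the defective parts.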
It can be understood that this application can adjust its strategy flexibly according to network signal quality: when the VONR signal is good, patching and repair may be skipped; when the signal fades or is poor, repair can start automatically, and the user does not notice any effect of the signal drop on call quality. Signal quality can be judged against a preset condition. For example, when a VONR network parameter exceeds a preset threshold, the signal is considered poor and the defective part of the voice signal needs to be patched and repaired. A VONR network parameter exceeding its preset threshold may include the network packet loss rate exceeding a preset percentage (for example, 10%) or the network delay increase exceeding a preset delay (for example, 20 ms); meeting either condition is enough for the signal to be considered poor and for the defective part of the voice signal to be patched and repaired.
It can be understood that the second terminal receives the feature information sent by the first terminal through the CS domain, the feature information having been extracted from the first terminal's voice signal, and at the same time receives the voice signal transmitted by the first terminal through the PS domain of VONR. When a VONR network parameter exceeds the preset threshold, the second terminal repairs the voice signal received in the PS domain based on the feature information received in the CS domain and outputs the repaired voice signal. The first terminal thus sends the extracted feature information over the CS domain and the original voice signal over 5G VONR. When the network signal is poor, the second terminal compares the feature information received in the CS domain with the voice signal received in the PS domain to patch missing parts and repair distortion, restoring all of the sender's voice information without reducing sound quality or adding excessive delay. The scheme is therefore highly flexible: when the VONR signal is good, no patching or repair is needed; when the signal fades, repair starts automatically, and the user does not notice any effect of the 5G signal drop on call quality.
As shown in Figure 7, step S403 may include, but is not limited to, the following sub-steps:
S701: compare the voice signal received in the packet-switched domain against the content information, time-domain information and frequency-domain characteristic information of the voice signal received in the circuit-switched domain, to determine the defective part of the voice signal received in the packet-switched domain;
S702: repair the defective part.
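Sub-steps S701 and S702 can be illustrated with a deliberately simplified sketch. The application does not specify how defects are detected or repaired, so the following is an assumption throughout: the CS-domain feature stream is reduced to one RMS energy value per numbered frame, a PS-domain frame counts as defective if it is missing or its energy deviates too far from that reference, and a defective frame is replaced by the last good frame rescaled to the reference energy (a common packet-loss-concealment idea, not necessarily the authors' method). The function names and the tolerance value are hypothetical.

```python
import math
from typing import Dict, List, Optional

Frame = List[float]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def find_defects(ps_frames: Dict[int, Optional[Frame]],
                 cs_energy: Dict[int, float],
                 tolerance: float = 0.5) -> List[int]:
    """S701 sketch: compare PS-domain frames with the CS-domain reference energy.

    A frame is defective if it is missing/empty, or if its RMS differs from the
    CS reference by more than `tolerance` (relative error).
    """
    defects = []
    for idx in sorted(cs_energy):
        frame = ps_frames.get(idx)
        ref = cs_energy[idx]
        if not frame:
            defects.append(idx)
        elif ref > 0 and abs(rms(frame) - ref) / ref > tolerance:
            defects.append(idx)
    return defects


def repair(ps_frames: Dict[int, Optional[Frame]],
           cs_energy: Dict[int, float],
           defects: List[int],
           frame_len: int) -> List[float]:
    """S702 sketch: replace each defective frame with the last good frame,
    rescaled so that its RMS matches the CS-domain reference."""
    bad = set(defects)
    last_good: Frame = [0.0] * frame_len  # silence until a good frame is available
    output: List[float] = []
    for idx in sorted(cs_energy):
        if idx in bad:
            src = rms(last_good)
            gain = cs_energy[idx] / src if src > 0 else 0.0
            frame = [x * gain for x in last_good]
        else:
            frame = list(ps_frames[idx])
        output.extend(frame)
        last_good = frame
    return output
```

In this sketch a lost frame at the very start of a call stays silent, because there is no earlier good frame to rescale.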
It can be understood that the present application may first determine, according to network packet loss, whether to activate the patching strategy. The network packet loss rate can be read in real time from the terminal's log, and the network delay can also be read in real time from information delivered by the network. When network packet loss and network delay reach a certain level, the defective part of the voice signal is repaired. Since any utterance can be decomposed into three parts, namely content information, time-domain information and frequency-domain characteristic information, an utterance can be restored with full fidelity as long as these three parts are available. The content information, time-domain information and frequency-domain characteristic information of the voice are converted into binary digital information and transmitted through the CS domain, while the original, very clear, high-sampling-rate voice signal is simultaneously transmitted through 5G VONR. The other user's smart terminal compares the content information, time-domain information and frequency-domain characteristic information received in the CS domain with the voice information received in the PS domain, and further patches missing parts and repairs distortion, thereby fully restoring all the voice information of the sending end without any loss of sound quality and without excessive delay.
In summary, when a VONR voice call starts, the present application begins extracting information from the original voice signal, decomposing it into three parts (content information, time-domain information and frequency-domain characteristic information) and converting them into digital signals. While the original voice information is transmitted through the PS domain of VONR, the extracted information is continuously sent to the other party's mobile phone through the CS domain. The VONR network condition is then evaluated: if the packet loss rate is greater than the preset percentage, or the network delay has increased by more than the preset delay, the original voice signal is patched when the other terminal receives the original voice signal and the extracted information. Because a one-to-one coding correspondence has been established beforehand, no information is lost, and the timbre of the original voice signal can be restored without degrading the VONR user experience. If the VONR network condition is good, repair is not started. The present application is therefore highly flexible and adaptive, and can adjust its strategy according to network quality: transmitting the content information prevents noticeable loss, interruption and jitter in the voice; transmitting the frequency-domain characteristic information reduces voice distortion; and transmitting the time-domain information keeps the call experience of both parties from degrading when network delay rises sharply.
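As an illustration of the decomposition into three parts, the sketch below computes, for each frame, a time stamp and an RMS energy (time-domain information) and the first few DFT magnitudes (frequency-domain characteristic information). The application does not say how content information is obtained, so a voiced/unvoiced flag derived from the zero-crossing rate is used here as a clearly labeled stand-in. The sketch also works per frame rather than per sampling point, which is a simplification; `extract_features` and its parameters are hypothetical.

```python
import cmath
import math
from typing import Dict, List


def dft_magnitudes(frame: List[float], n_bins: int) -> List[float]:
    """Magnitudes of the first n_bins DFT coefficients (naive DFT, for clarity)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
            for k in range(n_bins)]


def extract_features(signal: List[float], sample_rate: int,
                     frame_len: int, n_bins: int = 4) -> List[Dict]:
    """Split the signal into numbered frames and extract three kinds of information."""
    features = []
    for seq, start in enumerate(range(0, len(signal) - frame_len + 1, frame_len)):
        frame = signal[start:start + frame_len]
        zero_crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append({
            "seq": seq,                                     # one-to-one numbering
            "time_ms": 1000.0 * start / sample_rate,        # time-domain: position
            "rms": math.sqrt(sum(x * x for x in frame) / frame_len),  # time-domain: energy
            "spectrum": dft_magnitudes(frame, n_bins),      # frequency-domain
            "voiced": zero_crossings < frame_len // 4,      # stand-in for content info
        })
    return features
```

For one period of a sine wave per frame, the energy concentrates in DFT bin 1, whose magnitude is half the frame length.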
As shown in Figure 8, an embodiment of the present application further provides a voice transmission system.
The voice transmission system includes an information extraction module, a network condition judgment module and a voice patching module.
The information extraction module extracts the speaker's content information, time-domain information and frequency-domain characteristic information. The number of extracted sampling points must be the same as the VONR voice sampling rate, and the content information, time-domain information and frequency-domain characteristic information of each sampling point are encoded in one-to-one correspondence. Because the content information, time-domain information and frequency-domain characteristic information are transmitted entirely through the CS domain, this transmission is very secure and stable, and key information is almost never lost.
The network condition judgment module determines, according to the packet loss rate and the added delay in the network, whether the voice patching module needs to be started.
After the content information, time-domain information and frequency-domain characteristic information have been transmitted to the other party's terminal, the voice patching module can compare them one by one with the original voice transmitted over VONR, because the sampling rates are exactly the same. When the network condition is poor, the other party's terminal performs synchronized synthesis and patching one by one according to the sequence numbers.
When a VONR voice call starts, the voice transmission system begins extracting information from the original voice signal, decomposing it into content information, time-domain information and frequency-domain characteristic information, and converting them into digital signals; while the original voice signal is transmitted through the PS domain of VONR, the extracted information is continuously sent to the other party's mobile phone through the CS domain. The VONR network condition is then evaluated: if the packet loss rate is greater than the preset percentage, or the network delay has increased by more than the preset delay, the original voice signal is patched when the other terminal receives the original voice signal and the extracted information. Because of the one-to-one encoding correspondence established beforehand, no information is lost and the timbre of the original voice can be restored without degrading the VONR user experience. If the VONR network condition is good, the patching system is not started. The entire voice transmission system is thus adaptive and can adjust its strategy according to network quality: transmitting the content information prevents noticeable loss, interruption and jitter in the voice; transmitting the frequency-domain characteristic information reduces voice distortion; and transmitting the time-domain information keeps the call experience of both parties from degrading when network delay rises sharply.
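The synchronized patching by sequence number can be sketched as an alignment step: the CS-domain stream defines the expected numbering, PS-domain packets are de-duplicated and put in numbered order (in the spirit of the sorting by time-domain information in claim 8), and any number absent from the PS domain is filled from a substitute built from CS-domain features. This is an assumption-laden sketch; the `substitute` callback and all other names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Frame = List[float]


def synchronize(ps_packets: List[Tuple[int, Frame]],
                cs_seqs: List[int],
                substitute: Callable[[int], Frame]) -> List[Tuple[int, Frame, str]]:
    """Align PS-domain packets to the CS-domain numbering and patch the gaps."""
    received: Dict[int, Frame] = {}
    for seq, payload in ps_packets:
        received.setdefault(seq, payload)  # keep the first copy of a duplicated packet
    aligned = []
    for seq in sorted(cs_seqs):            # CS-domain numbering defines the playout order
        if seq in received:
            aligned.append((seq, received[seq], "ps"))
        else:
            aligned.append((seq, substitute(seq), "patched"))
    return aligned
```

Out-of-order arrivals are handled by iterating over the sorted CS-domain numbers rather than over the PS-domain arrival order.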
As shown in Figure 9, an embodiment of the present application further provides a terminal, which includes but is not limited to a mobile phone.
In an embodiment, the terminal includes one or more processors and a memory; Figure 9 takes one processor and one memory as an example. The processor and the memory may be connected by a bus or in other ways; Figure 9 takes a bus connection as an example.
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs, such as programs implementing the voice transmission method in the above embodiments of the present application. The processor implements the voice transmission method in the above embodiments by running the non-transitory software programs and programs stored in the memory.
The memory may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function; the data storage area may store data required to execute the voice transmission method in the above embodiments of the present application, and the like. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory may include memory located remotely from the processor, and such remote memory may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The non-transitory software programs and programs required to implement the voice transmission method in the above embodiments of the present application are stored in the memory and, when executed by the one or more processors, perform that voice transmission method, for example, method steps S101 to S104 in Figure 1 and S301 to S302 in Figure 3 described above, or method steps S401 to S404 in Figure 4 and S701 to S702 in Figure 7 described above. That is, the voice signal of the first terminal is obtained, feature information of the voice signal is extracted, the feature information is sent to the second terminal through the CS domain, and the voice signal is transmitted to the second terminal through the PS domain of VONR, so that when a network parameter of VONR is greater than the preset threshold, the second terminal performs voice repair on the voice signal received in the PS domain according to the feature information received in the CS domain and outputs the repaired voice signal. On this basis, feature information is carried over the CS domain and the original voice signal over 5G VONR, so that when the network signal is poor the second terminal can compare the two to patch missing parts and repair distortion, restoring all the voice information of the sending end without loss of sound quality or excessive delay. When the VONR network signal is good, no patching or repair is needed; when the network signal fades, repair starts automatically, and the user does not perceive any effect of the degraded 5G network signal on call sound quality.
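The sender-side operations performed by the processor (obtain, extract, send over the CS domain, send over the PS domain) can be outlined as below. This is a sketch under assumptions: `queue.Queue` objects stand in for the CS-domain and PS-domain bearers, the feature extractor is passed in as a hypothetical callable, and the order of the four operations follows claim 1, since Figure 1 is not reproduced here.

```python
import queue
from typing import Callable, List


def send_call_audio(frames: List[List[float]],
                    extract: Callable[[int, List[float]], bytes],
                    cs_channel: queue.Queue,
                    ps_channel: queue.Queue) -> None:
    """Feed every captured frame to both bearers, tagged with the same sequence number."""
    for seq, frame in enumerate(frames):  # obtain the voice signal frame by frame
        features = extract(seq, frame)    # extract feature information
        cs_channel.put(features)          # feature information -> CS-domain stand-in
        ps_channel.put((seq, frame))      # original voice frame -> PS-domain stand-in
```

Tagging both streams with the same sequence number is the design choice that lets the receiver match features to voice frames one to one.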
In addition, an embodiment of the present application further provides a computer-readable storage medium storing a computer-executable program. When executed by one or more control processors, for example by one processor in Figure 9, the computer-executable program causes the one or more processors to perform the voice transmission method in the above embodiments of the present application, for example, method steps S101 to S104 in Figure 1 and S301 to S302 in Figure 3 described above, or method steps S401 to S404 in Figure 4 and S701 to S702 in Figure 7 described above. That is, the voice signal of the first terminal is obtained, feature information of the voice signal is extracted, the feature information is sent to the second terminal through the CS domain, and the voice signal is transmitted to the second terminal through the PS domain of VONR, so that when a network parameter of VONR is greater than the preset threshold, the second terminal performs voice repair on the voice signal received in the PS domain according to the feature information received in the CS domain and outputs the repaired voice signal. On this basis, feature information is carried over the CS domain and the original voice signal over 5G VONR, so that when the network signal is poor the second terminal can compare the two to patch missing parts and repair distortion, restoring all the voice information of the sending end without loss of sound quality or excessive delay. When the VONR network signal is good, no patching or repair is needed; when the network signal fades, repair starts automatically, and the user does not perceive any effect of the degraded 5G network signal on call sound quality.
Embodiments of the present application include: obtaining a voice signal of a first terminal, extracting feature information of the voice signal, sending the feature information to a second terminal through a circuit-switched domain, and transmitting the voice signal to the second terminal through a packet-switched domain, so that when a network parameter satisfies a preset condition, the second terminal repairs the voice signal received in the packet-switched domain according to the feature information received in the circuit-switched domain and outputs the repaired voice signal. On this basis, the first terminal extracts feature information from the voice signal and transmits it to the second terminal through the circuit-switched domain, while the original voice signal is transmitted over the call. When the network signal is poor, the second terminal compares the feature information received in the circuit-switched domain with the voice signal received in the packet-switched domain to patch missing parts and repair distortion, thereby fully restoring all the voice information of the sending end without any loss of sound quality and without excessive delay. The present application is therefore highly flexible: when the network signal is good, no patching or repair is needed; when the network signal fades, repair starts automatically, and the user does not perceive any effect of the degraded network signal on call sound quality.
Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the systems, may be implemented as software, firmware, hardware, or an appropriate combination thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable programs, data structures, program modules or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable programs, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The above is a detailed description of several implementations of the present application, but the present application is not limited to the above implementations. Those skilled in the art may make various equivalent modifications or substitutions without departing from the essence of the present application, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present application.

Claims (10)

  1. A voice transmission method, applied to a first terminal, the method comprising:
    obtaining a voice signal of the first terminal;
    extracting feature information of the voice signal;
    sending the feature information to a second terminal through a circuit-switched domain; and
    transmitting the voice signal to the second terminal through a packet-switched domain, so that, when a network parameter satisfies a preset condition, the second terminal repairs the voice signal received in the packet-switched domain according to the feature information received in the circuit-switched domain, and outputs the repaired voice signal.
  2. The method according to claim 1, wherein extracting the feature information of the voice signal comprises extracting content information, time-domain information and frequency-domain characteristic information of the voice signal.
  3. The method according to claim 2, wherein the sampling rate used to extract the content information, the time-domain information and the frequency-domain characteristic information of the voice signal is the same as the sampling rate used to transmit the voice signal through the packet-switched domain, and wherein extracting the content information, the time-domain information and the frequency-domain characteristic information of the voice signal comprises:
    encoding the content information, the time-domain information and the frequency-domain characteristic information of the sampled voice signal in one-to-one correspondence.
  4. The method according to claim 1, wherein the network parameter satisfying the preset condition comprises at least one of the following:
    a packet loss rate of the network is greater than a preset percentage; or
    an increase in network delay is greater than a preset delay.
  5. The method according to claim 1, wherein sending the feature information to the second terminal through the circuit-switched domain comprises:
    converting the feature information into digital information; and
    sending the digital information to the second terminal through the circuit-switched domain.
  6. A voice transmission method, applied to a second terminal, the method comprising:
    receiving feature information sent by a first terminal through a circuit-switched domain, the feature information being extracted from a voice signal of the first terminal;
    receiving the voice signal transmitted by the first terminal through a packet-switched domain;
    when a network parameter satisfies a preset condition, repairing the voice signal received in the packet-switched domain according to the feature information received in the circuit-switched domain; and
    outputting the repaired voice signal.
  7. The method according to claim 6, wherein the feature information comprises content information, time-domain information and frequency-domain characteristic information of the voice signal, and repairing the voice signal received in the packet-switched domain according to the feature information received in the circuit-switched domain comprises:
    comparing the voice signal received in the packet-switched domain against the content information, time-domain information and frequency-domain characteristic information of the voice signal received in the circuit-switched domain, to determine a defective part of the voice signal received in the packet-switched domain; and
    repairing the defective part.
  8. The method according to claim 7, wherein repairing the voice signal received in the packet-switched domain according to the feature information received in the circuit-switched domain further comprises:
    sorting voice segments according to the time-domain information received in the circuit-switched domain, wherein the voice signal consists of a plurality of the voice segments.
  9. A terminal, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the voice transmission method according to any one of claims 1 to 5, or the voice transmission method according to any one of claims 6 to 8.
  10. A computer-readable storage medium, storing a computer-executable program for causing a computer to execute the voice transmission method according to any one of claims 1 to 5, or the voice transmission method according to any one of claims 6 to 8.
PCT/CN2023/071976 2022-04-08 2023-01-12 Voice transmission method, terminal and computer-readable storage medium WO2023193506A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210364802.5A CN116935870A (en) 2022-04-08 2022-04-08 Voice transmission method, terminal and computer readable storage medium
CN202210364802.5 2022-04-08

Publications (1)

Publication Number Publication Date
WO2023193506A1 true WO2023193506A1 (en) 2023-10-12

Family

ID=88243912

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/071976 WO2023193506A1 (en) 2022-04-08 2023-01-12 Voice transmission method, terminal and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN116935870A (en)
WO (1) WO2023193506A1 (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012024844A1 (en) * 2010-08-27 2012-03-01 Qualcomm Incorporated Packet switched communication precedence at multi-mode communication device
KR20150025891A (en) * 2013-08-30 2015-03-11 에스케이텔레콤 주식회사 Method and apparatus for handing over call from packet switched domain to circuit switched domain
CN106102087A (en) * 2016-05-27 2016-11-09 维沃移动通信有限公司 A kind of audio communication method and mobile terminal
CN106358254A (en) * 2016-08-31 2017-01-25 广东欧珀移动通信有限公司 Network access control method and equipment
CN109040495A (en) * 2018-08-07 2018-12-18 奇酷互联网络科技(深圳)有限公司 Voice communication control method, device, mobile terminal and storage medium
CN111405622A (en) * 2020-03-20 2020-07-10 Oppo广东移动通信有限公司 Switching method and device based on voice quality, terminal and storage medium
CN111491290A (en) * 2020-04-14 2020-08-04 深圳市沃特沃德股份有限公司 Method, device and computer equipment for parallel transmission of network voice and PS domain data
CN111901841A (en) * 2020-07-21 2020-11-06 陕西银河景天电子有限责任公司 Method, server and storage medium for fusing and connecting CS domain and PS domain
CN111988821A (en) * 2019-05-22 2020-11-24 华为技术有限公司 Voice communication method and device
CN113035226A (en) * 2019-12-24 2021-06-25 中兴通讯股份有限公司 Voice call method, communication terminal, and computer-readable medium

Also Published As

Publication number Publication date
CN116935870A (en) 2023-10-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23784070

Country of ref document: EP

Kind code of ref document: A1