CN113035226B - Voice communication method, communication terminal and computer readable medium

Voice communication method, communication terminal and computer readable medium

Info

Publication number
CN113035226B
CN113035226B (application CN201911348597.8A)
Authority
CN
China
Prior art keywords
core network
voice
information
terminal
content information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911348597.8A
Other languages
Chinese (zh)
Other versions
CN113035226A (en)
Inventor
颜蓓
任鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201911348597.8A priority Critical patent/CN113035226B/en
Publication of CN113035226A publication Critical patent/CN113035226A/en
Application granted granted Critical
Publication of CN113035226B publication Critical patent/CN113035226B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 - Voice signal separating
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 - Speech to text systems
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques characterised by the analysis technique

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The disclosure provides a voice call method, comprising: acquiring voice content information of a first terminal through a first core network; acquiring voice characteristic information of a first terminal through a second core network; and restoring the original audio according to the voice content information and the voice characteristic information. The present disclosure also provides a communication terminal and a computer-readable medium.

Description

Voice communication method, communication terminal and computer readable medium
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to a voice call method, a communication terminal, and a computer readable medium.
Background
At present, the voice call services of most communication terminals are based on the Circuit Switched (CS) domain, and voice call quality depends on the sampling rate, the transmission rate and the audio bandwidth. The speech coding modes currently applied in the circuit switched domain include AMR-NB (audio bandwidth of 100 Hz-4 KHz, maximum transmission rate of 12.2 Kbps), AMR-WB (audio bandwidth of 100 Hz-8 KHz, maximum transmission rate of 23.85 Kbps) and EVS-SWB (audio bandwidth of 100 Hz-20 KHz, maximum transmission rate of 128 Kbps), whereas single-channel lossless audio requires a transmission bit rate of at least 192 Kbps. Therefore, for voice services, the maximum 128 Kbps rate of the circuit switched domain can only support a normal conversation, keep the speaker's voice clear and achieve relative fidelity; it cannot meet the transmission requirement of lossless audio, nor can it guarantee that environmental sounds, background sounds and other special sounds beyond the speech are restored at the opposite end of the call.
Disclosure of Invention
The present disclosure aims to solve at least one of the technical problems in the prior art, and proposes a voice call method, a communication terminal, and a computer readable medium.
To achieve the above object, in a first aspect, an embodiment of the present disclosure provides a voice call method, including:
acquiring voice content information of a first terminal through a first core network;
acquiring voice characteristic information of the first terminal through a second core network;
And restoring the original audio according to the voice content information and the voice characteristic information.
In a second aspect, an embodiment of the present disclosure provides another voice call method, including:
Acquiring original audio;
extracting voice content information from the original audio;
The voice content information is sent to a second terminal through a first core network;
and controlling a second core network to send the voice characteristic information to the second terminal.
In a third aspect, an embodiment of the present disclosure provides a communication terminal, including:
One or more processors;
A storage means for storing one or more programs;
The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the voice call method as described in any of the above embodiments.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the steps of the voice call method as described in any of the above embodiments.
The present disclosure has the following beneficial effects:
The embodiment of the disclosure provides a voice call method, a communication terminal and a computer readable medium, which can respectively transmit voice content information and voice characteristic information through different networks to improve the utilization rate of voice service resources of each network and realize high-quality voice call.
Drawings
Fig. 1 is a flowchart of a voice call method according to an embodiment of the disclosure;
Fig. 2 is a flowchart of another voice call method according to an embodiment of the disclosure;
Fig. 3 is a flowchart of yet another voice call method according to an embodiment of the disclosure;
Fig. 4 is a flowchart of a specific implementation of step S7 according to an embodiment of the disclosure;
Fig. 5 is a flowchart of yet another voice call method according to an embodiment of the disclosure;
Fig. 6 is a signaling diagram of yet another voice call method according to an embodiment of the disclosure;
Fig. 7 is a signaling diagram of still another voice call method according to an embodiment of the disclosure.
Detailed Description
In order to better understand the technical solutions of the present disclosure, the following describes in detail a voice call method, a communication terminal, and a computer readable medium provided by the present disclosure with reference to the accompanying drawings.
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. Accordingly, a first element, component, or module discussed below could be termed a second element, component, or module without departing from the teachings of the present disclosure.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The voice call method provided by the disclosure can be used for respectively transmitting the voice content information and the voice characteristic information to the call opposite terminal through different networks, improving the utilization rate of voice service resources of each network, and realizing high-quality performance of voice call.
Fig. 1 is a flowchart of a voice call method according to an embodiment of the disclosure. As shown in fig. 1, the method includes:
step S1, voice content information of a first terminal is obtained through a first core network.
In step S1, after the call is established, the voice content information of the first terminal is obtained. The voice content information is derived from the original audio, that is, the audio collected at the opposite end of the call: after extraction by voice recognition, what remains is readable voice content that carries no voice feature information.
In some embodiments, the first core network comprises: a 2G core network, a 3G core network, or a 4G core network. When the first core network is a 2G core network or a 3G core network, step S1 is to obtain voice content information of the first terminal through the first core network, and specifically includes:
the voice content information is acquired through a circuit switched domain of the first core network.
Specifically, the voice content information transmitted by the first terminal is received through the circuit switched domain of a 2G or 3G network. The circuit switched domain is responsible for voice services in 2G and 3G networks; a user's voice service occupies dedicated channel resources, which provides higher stability and security.
Correspondingly, when the first core network is a 4G core network, step S1 of acquiring, by the first core network, voice content information of the first terminal specifically includes:
The voice content information is acquired through the IP Multimedia Subsystem (IMS) of the first core network.
In some embodiments, on a 4G core network the voice call may be carried either by Circuit Switched Fallback (CS Fallback) or by the IP Multimedia Subsystem; with CS Fallback, the voice content information sent by the first terminal is in essence still received through the circuit switched domain of a 2G or 3G network.
And S2, acquiring voice characteristic information of the first terminal through the second core network.
In some embodiments, the voice characteristic information comprises: spectral characteristic information. In general, the spectral characteristic information characterizes the timbre of the speaker, corresponding to the timbre information of the original audio.
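As an illustration only (not part of the disclosure), the following minimal Python sketch shows one way such spectral characteristic information could be computed from captured audio; the frame size, hop length and the use of a plain magnitude spectrum are assumptions, and a practical implementation might use mel spectra or cepstral coefficients instead.

    import numpy as np

    def extract_spectral_features(audio: np.ndarray, frame_size: int = 1024, hop: int = 512) -> np.ndarray:
        # Per-frame magnitude spectra serve as a rough timbre (spectral) descriptor.
        window = np.hanning(frame_size)
        frames = []
        for start in range(0, len(audio) - frame_size + 1, hop):
            frame = audio[start:start + frame_size] * window
            frames.append(np.abs(np.fft.rfft(frame)))
        return np.array(frames)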
In some embodiments, the second core network comprises: 5G core network. Step S2, the step of obtaining the voice characteristic information of the first terminal through the second core network specifically comprises the following steps:
The voice feature information is acquired through a Packet Switched (PS) domain of the second core network.
Specifically, the voice characteristic information sent by the first terminal is received through the 5G core network. In general, neither the 4G core network nor the 5G core network contains a circuit switched domain; the packet switched domain is based on multiple users sharing channel resources, which yields a higher transmission rate and better resource utilization, but cannot guarantee that all data reaches the communication peer intact.
In the embodiment of the disclosure, because the information carried over the packet switched network is not the voice content information, even if part of it is lost in the packet switched network (for example, in an area with poor 5G coverage), the restoration and transmission of the information carried by the circuit switched network are not affected. The voice content information, in turn, is transmitted entirely through the circuit switched network, which is safe and stable and prevents the loss of key information during the call.
And S3, restoring the original audio according to the voice content information and the voice characteristic information.
In some embodiments, the voice content information and the voice feature information are synthesized by a corresponding speech synthesis algorithm to restore the original audio.
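The patent does not prescribe a particular synthesis algorithm; the Python sketch below is only a rough illustration of the idea in step S3, with synthesize_from_text standing in as a hypothetical text-to-speech front end whose output is reshaped by the transmitted spectral envelope.

    import numpy as np

    def synthesize_from_text(text: str, sample_rate: int = 16000) -> np.ndarray:
        # Hypothetical stand-in for a TTS engine: a flat-spectrum carrier whose
        # length grows with the amount of recognized content.
        rng = np.random.default_rng(0)
        return rng.standard_normal(sample_rate * max(1, len(text) // 10))

    def restore_audio(voice_content: str, spectral_features: np.ndarray) -> np.ndarray:
        # Reshape the carrier so its spectrum follows the transmitted timbre envelope.
        carrier = synthesize_from_text(voice_content)
        spectrum = np.fft.rfft(carrier)
        envelope = np.interp(
            np.linspace(0.0, 1.0, spectrum.shape[0]),
            np.linspace(0.0, 1.0, spectral_features.shape[1]),
            spectral_features.mean(axis=0),
        )
        shaped = envelope * spectrum / (np.abs(spectrum) + 1e-9)
        return np.fft.irfft(shaped)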
Fig. 2 is a flowchart of another voice call method according to an embodiment of the disclosure. As shown in fig. 2, the method is an alternative embodiment based on one embodiment of the method shown in fig. 1. Specifically, the method includes not only steps S1 to S2, but also step S301 and step S302, where step S302 is a specific embodiment of step S3. Only step S301 and step S302 will be described in detail below.
Step S301, acquiring the environmental audio information of the first terminal through a second core network.
The environmental audio information is the ambient sound, background sound, lossless music or other special sound that remains after the voice content has been extracted from the original audio by voice recognition.
Correspondingly, step S3, restoring the original audio according to the voice content information and the voice feature information, specifically includes:
Step S302, the original audio is restored according to the voice content information, the voice characteristic information and the environment audio information.
Fig. 3 is a flowchart of yet another voice call method according to an embodiment of the disclosure. As shown in fig. 3, the method includes:
and S4, acquiring original audio.
In some embodiments, after a voice call is established, the original audio from the local speaker is obtained by an audio collection device.
And S5, extracting voice content information from the original audio.
Wherein the voice content information is extracted from the original audio based on voice recognition and corresponding analysis techniques.
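As one possible way to perform this extraction (an assumption, not the patent's prescribed method), the sketch below uses the third-party speech_recognition package; any ASR engine that turns the captured audio into readable text would play the same role.

    import speech_recognition as sr

    def extract_voice_content(wav_path: str) -> str:
        # Return the readable speech content of a recorded call segment.
        recognizer = sr.Recognizer()
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source)
        # recognize_google is just one available engine; it needs network access.
        return recognizer.recognize_google(audio)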
And S6, transmitting the voice content information to the second terminal through the first core network.
In some embodiments, the first core network comprises: a 2G core network, a 3G core network, or a 4G core network. When the first core network is a 2G core network or a 3G core network, step S6, the step of sending the voice content information to the second terminal through the first core network, specifically includes:
the voice content information is transmitted over a circuit switched domain of the first core network.
Correspondingly, when the first core network is a 4G core network, step S6, the step of sending the voice content information to the second terminal through the first core network specifically includes:
the voice content information is transmitted through an IP multimedia system of the first core network.
And S7, controlling the second core network to send the voice characteristic information to the second terminal.
In some embodiments, step S7, the step of controlling the second core network to send the voice feature information to the second terminal specifically includes:
and sending a control instruction to the second core network to instruct the second core network to acquire corresponding voice characteristic information from a pre-stored database and send the voice characteristic information to the second terminal.
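The format of such a control instruction is not specified in the disclosure; the snippet below is a purely hypothetical illustration of what the terminal might send to the second core network, with all field names invented for the example.

    import json

    def build_feature_fetch_instruction(terminal_id: str, call_id: str) -> bytes:
        # Hypothetical message asking the packet switched core network to look up
        # the terminal's stored voice feature record and forward it to the peer.
        instruction = {
            "type": "SEND_STORED_VOICE_FEATURES",
            "terminal_id": terminal_id,
            "call_id": call_id,
        }
        return json.dumps(instruction).encode("utf-8")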
Fig. 4 is a flowchart of a specific implementation of step S7 in the embodiment of the disclosure. Wherein the second core network comprises: 5G core network. As shown in fig. 4, before the step of controlling the second core network to send the voice feature information to the second terminal in step S7, the method further includes:
step S701a, extracting voice feature information from the original audio.
Wherein, based on the voice recognition and the corresponding analysis technology, voice characteristic information is extracted from the original audio.
Accordingly, in step S7, the step of controlling the second core network to send the voice feature information to the second terminal specifically includes:
Step S702a, the voice feature information is sent to the second terminal through the second core network.
The voice characteristic information is sent to the second terminal through the second core network, namely the voice characteristic information extracted from the original audio in real time is sent to the second terminal through the second core network.
The embodiment of the disclosure provides a voice call method in which, during a voice call, the voice content information is transmitted to the call peer through the circuit switched network, ensuring that the speech reaches the peer completely, safely and stably, while the voice feature information is transmitted to the peer through the packet switched network, making effective use of voice service network resources and raising the transmission rate. As a result, even if part of the information carried over the packet switched network is lost, the overall call quality is not affected.
Fig. 5 is a flowchart of still another voice call method according to an embodiment of the disclosure. As shown in fig. 5, the method is an alternative embodiment based on one embodiment of the method shown in fig. 3. Specifically, the method includes not only steps S4 to S7 but also steps S8 to S10. Only steps S8 to S10 will be described in detail below.
And S8, extracting the environment audio information from the original audio.
The environmental audio information is the ambient sound, background sound or other special sound that remains after the information related to the speaker's voice has been extracted from the original audio based on voice recognition and corresponding analysis techniques.
And step S9, the environmental audio information is sent to the second terminal through the second core network.
Step S10, a synchronization instruction is sent to the first core network and the second core network to instruct the first core network and the second core network to respectively slice and number the voice content information and the environment audio information according to the synchronization instruction.
In step S10, the first core network corresponds to the circuit switched domain or the IP Multimedia Subsystem and the second core network corresponds to the packet switched domain. A synchronization instruction is sent to the first core network and the second core network, instructing them to slice and number the voice content information and the environmental audio information, respectively, according to the synchronization instruction. The second terminal thus receives the sliced and numbered voice content information and environmental audio information and can synthesize them synchronously according to the corresponding numbers. Moreover, even if part of the environmental audio information is lost, restoration can be performed according to the slice information and the corresponding numbers.
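A minimal sketch of this slice-and-number behaviour is given below; the slice length and the Slice record are illustrative assumptions, since the disclosure does not define a concrete framing format.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Slice:
        stream: str     # "voice_content" or "ambient_audio"
        seq: int        # slice number used to realign the two streams
        payload: bytes

    def slice_and_number(stream: str, data: bytes, slice_len: int = 1024) -> List[Slice]:
        # Cut a stream into fixed-size numbered slices so the receiver can pair
        # voice content slice N with ambient audio slice N and detect lost slices.
        return [
            Slice(stream=stream, seq=i, payload=data[off:off + slice_len])
            for i, off in enumerate(range(0, len(data), slice_len))
        ]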
The embodiment of the disclosure provides a voice call method in which the environmental audio information is sent to the call peer through the packet switched network, achieving lossless audio transmission while maintaining the transmission rate and improving voice call quality. Synchronized transmission of the individual data streams is guaranteed by the synchronization instruction, and on the basis of this synchronization, even if part of the information sent over the packet switched network is lost, it can be repaired by a corresponding algorithm.
Fig. 6 is a signaling diagram of still another voice call method according to an embodiment of the disclosure. As shown in fig. 6, includes:
BZ01, the first terminal acquires the first original audio (not shown in the figure).
BZ02, the first terminal extracts first voice content information, voice feature information, and first environmental audio information (not shown in the figure) from the first original audio.
The case where the information extracted from the first original audio includes the first environmental audio information is only an alternative implementation of the embodiment of the disclosure.
BZ101, the first terminal sends the first voice content information to the circuit switched domain (i.e., the circuit switched domain of a 2G or 3G network).
BZ102, the first terminal sends the voice feature information and the first environmental audio information to the packet switched domain (based on the 5G core network).
BZ2, the first terminal sends a synchronization instruction to the circuit switched domain and the packet switched domain.
BZ201, the circuit switched domain slices and numbers the first voice content information according to the synchronization instruction.
BZ2021, the packet switched domain slices and numbers the first environmental audio information according to the synchronization instruction.
BZ2022, the packet switched domain stores the voice feature information in a database.
BZ301, the circuit switched domain sends the sliced and numbered first voice content information to the second terminal.
BZ302, the packet switched domain sends the voice feature information and the sliced and numbered first environmental audio information to the second terminal.
BZ4, the second terminal restores the first original audio from the first voice content information, the voice feature information and the first environmental audio information based on the corresponding numbers.
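As a companion to the slicing sketch above, the following illustrative Python fragment shows how the second terminal in step BZ4 could pair the numbered slices of the two streams before synthesis; pairing a missing ambient slice with None is an assumption about how losses might be handled.

    from typing import Dict, Iterable, Tuple

    def pair_slices(content_slices: Iterable, ambient_slices: Iterable) -> Dict[int, Tuple]:
        # Align the sliced streams by their numbers; a lost ambient slice is paired
        # with None so the voice content can still be played (or the gap concealed).
        ambient_by_seq = {s.seq: s for s in ambient_slices}
        return {c.seq: (c, ambient_by_seq.get(c.seq)) for c in content_slices}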
Fig. 7 is a signaling diagram of still another voice call method according to an embodiment of the disclosure. As shown in fig. 7, includes:
BZ501, the first terminal acquires the second original audio (not shown in the figure).
BZ502, the first terminal extracts second voice content information (not shown in the figure) from the second original audio.
BZ601, the first terminal sends the second voice content information to the circuit switched domain.
BZ602, the first terminal sends control instructions to the packet switched domain.
BZ701, the circuit switched domain sends the second voice content information to the second terminal.
BZ7021, the packet switched domain acquires the voice characteristic information corresponding to the first terminal from a pre-stored database according to the control instruction.
BZ7022, the packet switched domain sends the voice characteristic information to the second terminal.
BZ8, the second terminal restores the second original audio from the second voice content information and the voice characteristic information.
The embodiment of the disclosure also provides a communication terminal, which comprises: one or more processors; a storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the voice call methods as in the embodiments described above.
The presently disclosed embodiments also provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the voice call methods of the embodiments described above.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, functional modules/units in the apparatus disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purpose of limitation. In some instances, it will be apparent to one skilled in the art that features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with other embodiments unless explicitly stated otherwise. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (9)

1. A voice call method, comprising:
Acquiring voice content information of a first terminal through a circuit switched domain or an IP Multimedia Subsystem of a first core network, wherein the voice content information comprises readable voice content, free of voice characteristic information, obtained by extracting the original audio through voice recognition;
Wherein the first core network comprises: a 2G core network, a 3G core network, or a 4G core network;
when the first core network is a 2G core network or a 3G core network, the step of acquiring the voice content information of the first terminal through the first core network specifically includes:
Acquiring the voice content information through a circuit switched domain of the first core network;
When the first core network is a 4G core network, the step of acquiring the voice content information of the first terminal through the first core network specifically includes:
Acquiring the voice content information through the IP Multimedia Subsystem of the first core network;
acquiring voice characteristic information of the first terminal through a packet switched domain of a second core network, wherein the second core network comprises a 5G core network and the voice characteristic information comprises spectral characteristic information; and
restoring the original audio according to the voice content information and the voice characteristic information.
2. The voice call method as claimed in claim 1, wherein, before the step of restoring the original audio from the voice content information and the voice feature information, further comprising:
acquiring environmental audio information of the first terminal through the second core network;
the step of restoring the original audio according to the voice content information and the voice characteristic information specifically comprises the following steps:
and restoring the original audio according to the voice content information, the voice characteristic information and the environment audio information.
3. A voice call method, comprising:
Acquiring original audio;
extracting voice content information from the original audio, wherein the voice content information comprises voice content which does not contain voice characteristic information after the original audio is extracted through voice recognition;
Transmitting the voice content information to a second terminal through a circuit switched domain of a first core network or an IP multimedia system; wherein the first core network comprises: a 2G core network, a 3G core network, or a 4G core network;
When the first core network is a 2G core network or a 3G core network, the step of sending the voice content information to the second terminal through the first core network specifically includes:
transmitting the voice content information over a circuit switched domain of the first core network;
When the first core network is a 4G core network, the step of sending the voice content information to the second terminal through the first core network specifically includes:
Transmitting the voice content information through an IP multimedia system of the first core network;
Controlling a packet switched domain of a second core network to send voice characteristic information to the second terminal, wherein the second core network comprises a 5G core network and the voice characteristic information comprises spectral characteristic information.
4. A voice call method according to claim 3, wherein, before the step of controlling the second core network to transmit the voice feature information to the second terminal, further comprising:
extracting voice characteristic information from the original audio;
The step of controlling the second core network to send the voice feature information to the second terminal specifically includes:
And transmitting the voice characteristic information to a second terminal through a second core network.
5. The voice call method as claimed in claim 3, wherein the step of controlling the second core network to transmit the voice feature information to the second terminal specifically comprises:
and sending a control instruction to the second core network to instruct the second core network to acquire the corresponding voice characteristic information from a pre-stored database, and sending the voice characteristic information to the second terminal.
6. The voice call method according to any one of claims 3 to 5, further comprising:
extracting environmental audio information from the original audio;
And sending the environmental audio information to the second terminal through the second core network.
7. The voice call method of claim 6, further comprising:
And sending a synchronization instruction to the first core network and the second core network to instruct the first core network and the second core network to respectively slice and number the voice content information and the environment audio information according to the synchronization instruction.
8. A communication terminal, comprising:
One or more processors;
A storage means for storing one or more programs;
The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the voice call method of any of claims 1-7.
9. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the steps of the voice call method according to any of claims 1-7.
CN201911348597.8A 2019-12-24 2019-12-24 Voice communication method, communication terminal and computer readable medium Active CN113035226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911348597.8A CN113035226B (en) 2019-12-24 2019-12-24 Voice communication method, communication terminal and computer readable medium


Publications (2)

Publication Number Publication Date
CN113035226A CN113035226A (en) 2021-06-25
CN113035226B true CN113035226B (en) 2024-04-23

Family

ID=76452088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911348597.8A Active CN113035226B (en) 2019-12-24 2019-12-24 Voice communication method, communication terminal and computer readable medium

Country Status (1)

Country Link
CN (1) CN113035226B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935870A (en) * 2022-04-08 2023-10-24 中兴通讯股份有限公司 Voice transmission method, terminal and computer readable storage medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5537509A (en) * 1990-12-06 1996-07-16 Hughes Electronics Comfort noise generation for digital communication systems
CN1323435A (en) * 1998-10-02 2001-11-21 国际商业机器公司 System and method for providing network coordinated conversational services
JP2003046607A (en) * 2001-08-02 2003-02-14 Kyocera Corp Voice communication terminal and voice communication system
EP1649677A1 (en) * 2003-07-11 2006-04-26 France Telecom Methods and devices for evaluating transmission times and for processing a voice signal received in a terminal connected to a packet network
JP2006319598A (en) * 2005-05-12 2006-11-24 Victor Co Of Japan Ltd Voice communication system
CN101060649A (en) * 2007-05-25 2007-10-24 成都索贝数码科技股份有限公司 High code rate data wireless reliable transmission method and its system
CN101316223A (en) * 2007-05-29 2008-12-03 华为技术有限公司 Mobile communication method, system and equipment
KR20140106479A (en) * 2014-07-28 2014-09-03 전소연 Method for producing lecture text data mobile terminal and monbile terminal using the same
CN104159323A (en) * 2014-09-05 2014-11-19 耿直 User terminal data transmission method and multi-channel transmission user terminal
CN105516635A (en) * 2015-10-28 2016-04-20 努比亚技术有限公司 Video call system, device and method
CN106231317A (en) * 2016-09-29 2016-12-14 三星电子(中国)研发中心 Video processing, coding/decoding method and device, VR terminal, audio/video player system
CN107564535A (en) * 2017-08-29 2018-01-09 中国人民解放军理工大学 A kind of distributed low rate speech call method
CN109065065A (en) * 2018-09-27 2018-12-21 南昌努比亚技术有限公司 Call method, mobile terminal and computer readable storage medium
CN109088813A (en) * 2018-07-19 2018-12-25 平安科技(深圳)有限公司 Verbal announcement transmission method and equipment based on biometric information
KR20190024361A (en) * 2017-08-31 2019-03-08 (주)인스파이어모바일 Management method for managing push to talk service and system using the same


Also Published As

Publication number Publication date
CN113035226A (en) 2021-06-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant