WO2022135237A1 - Voice processing method, terminal device, and storage medium - Google Patents

Info

Publication number
WO2022135237A1
Authority
WO
WIPO (PCT)
Prior art keywords
terminal device
information
phonetic
encoding
codec
Prior art date
Application number
PCT/CN2021/138389
Other languages
French (fr)
Chinese (zh)
Inventor
宁杰
申呈洁
鲍光照
张岳
渠畅
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2022135237A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/725 Cordless telephones

Definitions

  • the embodiments of the present application relate to the field of communication technologies, and in particular, to a voice processing method, a terminal device, and a storage medium.
  • When a user makes a voice call on a terminal device, the terminal device at the voice sender encodes the voice signal. Correspondingly, the terminal device at the voice receiver decodes the received data and restores it to voice.
  • Terminal devices currently adopt traditional speech coding methods such as waveform coding.
  • In a weak signal environment, the traditional speech coding method causes large signal distortion at the receiving end: the call becomes intermittent, noisy, or even silent, and the call effect is very poor.
  • Embodiments of the present application provide a voice processing method, terminal device, and storage medium.
  • both parties to a call can clearly communicate the pronunciation of words, and users can judge semantics by pronunciation, which improves the effect of the call.
  • In a first aspect, a voice processing method is provided, which is applied to a first terminal device that is in a call state with a second terminal device. The method includes: acquiring a voice signal input by a user; after determining that the first terminal device is currently in a weak signal environment and that both the first terminal device and the second terminal device support phonetic symbol encoding and decoding, extracting multiple groups of information from the voice signal, where each group of information includes the phonetic symbol, tone and duration of a single word, and phonetic symbol encoding and decoding refers to encoding and decoding the phonetic symbol, tone and duration; obtaining the encoded information corresponding to each group of information according to a coding table, where the coding table stores the correspondence between phonetic symbols, tones and durations and the encoded information; and sending the encoded information to the second terminal device.
  • the voice processing method provided in the first aspect can be applied to a voice sender who conducts a voice call in a weak signal environment.
  • the phonetic symbol, pitch and duration of each word in the user's speech are extracted and encoded, to obtain encoded information of a preset bit length corresponding to each word. Since the encoded information has a preset bit length, it can be transmitted at a very low bit rate in a weak signal environment.
  • the receiver can clearly restore the pronunciation of the sender user, so that the receiver user can acquire semantics according to the played pronunciation, which improves the call effect.
  • obtaining the encoded information corresponding to each group of information according to the coding table includes: for each group of information, determining whether the common index table includes the phonetic symbol in the group of information; if the common index table includes the phonetic symbol in the group of information, obtaining the encoded information according to the common index table; if the common index table does not include the phonetic symbol in the group of information, obtaining the encoded information according to the global index table.
  • the common index table is generated according to the common words of the user, and the number of common index values in the common index table is much smaller than the number of global index values in the global index table. The amount of search data is reduced, and the coding efficiency is improved.
  • determining that the first terminal device is currently in a weak signal environment includes: if it is determined that the target parameter satisfies the first preset condition, sending a first request message to the second terminal device, where the first request message is used to instruct the second terminal device to use phonetic symbol encoding and decoding, and the target parameter is used to indicate the signal state of the communication environment where the first terminal device is currently located; and receiving a first response message sent by the second terminal device, where the first response message is used to indicate that the second terminal device uses phonetic symbol encoding and decoding.
  • the first request message includes a hysteresis timer, and the hysteresis timer is used to indicate a delay time for the second terminal device to use phonetic symbol encoding and decoding.
  • In this way, time is reserved for switching the voice codec mode. After the hysteresis timer expires, the first terminal device and the second terminal device start using phonetic symbol encoding and decoding at the same time, which makes the codec mode switch smoother.
  • determining that the first terminal device is currently in a weak signal environment includes: receiving a second request message sent by the second terminal device, where the second request message is used to instruct the first terminal device to use phonetic symbol encoding and decoding; and sending a second response message to the second terminal device, where the second response message is used to indicate that the first terminal device uses phonetic symbol encoding and decoding.
  • the method further includes: if it is determined that the target parameter satisfies the second preset condition, sending a third request message to the second terminal device, where the third request message is used to instruct the second terminal device to use waveform encoding and decoding, and the target parameter is used to indicate the signal state of the communication environment where the first terminal device is currently located; and receiving a third response message sent by the second terminal device, where the third response message is used to indicate that the second terminal device uses waveform encoding and decoding.
  • the method further includes: receiving a fourth request message sent by the second terminal device, where the fourth request message is used to instruct the first terminal device to use waveform encoding and decoding; if it is determined that the target parameter satisfies the second preset condition , then send a fourth response message to the second terminal device, where the fourth response message is used to instruct the first terminal device to use waveform codec, and the target parameter is used to indicate the signal state of the communication environment where the first terminal device is currently located.
  • In this way, when the terminal devices are no longer in a weak signal environment, they can switch back to the traditional voice codec mode, which improves the call effect.
  • determining that the first terminal device supports phonetic symbol encoding and decoding includes: if it is determined that the phonetic symbol encoding and decoding function is not enabled on the first terminal device, generating and outputting prompt information; receiving a first instruction from the user; and enabling the function in response to the first instruction.
  • determining that both the first terminal device and the second terminal device support phonetic symbol encoding and decoding includes: sending first capability information to the second terminal device, where the first capability information is used to indicate that the first terminal device supports phonetic symbol encoding and decoding, and receiving first capability response information sent by the second terminal device, where the first capability response information is used to indicate that the second terminal device supports phonetic symbol encoding and decoding; or, receiving second capability information sent by the second terminal device, where the second capability information is used to indicate that the second terminal device supports phonetic symbol encoding and decoding, and sending second capability response information to the second terminal device, where the second capability response information is used to indicate that the first terminal device supports phonetic symbol encoding and decoding.
  • the method further includes: displaying a setting interface; receiving a user's operation in the setting interface; and in response to the operation, enabling the function of phonetic symbol encoding and decoding.
  • In a second aspect, a voice processing method is provided, which is applied to a second terminal device that is in a call state with a first terminal device. The method includes: receiving first information sent by the first terminal device; decoding the first information according to a coding table to obtain multiple groups of information, where each group of information includes the phonetic symbol, tone and duration of a single word, and the coding table stores the correspondence between phonetic symbols, tones and durations and the encoded information; generating a voice signal according to the multiple groups of information; and playing the voice signal with a preset sound.
  • the voice processing method provided in the second aspect can be applied to a voice receiver who conducts a voice call in a weak signal environment.
  • the voice receiver decodes the received information to obtain the phonetic symbol, pitch and duration of each word, thereby generating a complete and smooth voice signal and using the preset sound to play. Since the encoded information has a preset bit length, it can be transmitted at a very low bit rate in a weak signal environment.
  • the receiver can clearly restore the pronunciation of the sender user, so that the receiver user can acquire semantics according to the played pronunciation, which improves the call effect.
  • decoding the first information according to the coding table to obtain multiple groups of information includes: sequentially obtaining multiple pieces of encoded information from the first information, where the length of each piece of encoded information is a preset bit length; and, for each piece of encoded information, obtaining the phonetic symbol, tone and duration corresponding to the encoded information according to the coding table.
  • before receiving the first information sent by the first terminal device, the method further includes: receiving a first request message sent by the first terminal device, where the first request message is used to instruct the second terminal device to use phonetic symbol encoding and decoding, and phonetic symbol encoding and decoding refers to encoding and decoding phonetic symbols, tones and durations; and sending a first response message to the first terminal device, where the first response message is used to indicate that the second terminal device uses phonetic symbol encoding and decoding.
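The sequential extraction of fixed-length codes described above can be sketched as follows. This is a minimal illustration, not the patent's wire format: the big-endian bit order, the zero padding to a whole number of bytes, the known code count, and the function names `pack_codes` and `split_codes` are all assumptions; only the idea of a preset bit length (22 bits is the example value given later in the description) comes from the text.

```python
from typing import List

CODE_BITS = 22  # example preset bit length mentioned in the description


def pack_codes(codes: List[int], code_bits: int = CODE_BITS) -> bytes:
    """Concatenate fixed-length codes into bytes (assumed big-endian, zero-padded)."""
    value = 0
    for code in codes:
        if not 0 <= code < (1 << code_bits):
            raise ValueError("code does not fit in the preset bit length")
        value = (value << code_bits) | code
    total_bits = len(codes) * code_bits
    padding = (-total_bits) % 8  # pad up to a whole number of bytes
    return (value << padding).to_bytes((total_bits + padding) // 8, "big")


def split_codes(payload: bytes, count: int, code_bits: int = CODE_BITS) -> List[int]:
    """Sequentially read `count` codes of `code_bits` bits each from the payload."""
    total_bits = count * code_bits
    if total_bits > len(payload) * 8:
        raise ValueError("payload is too short for the requested number of codes")
    # Drop the trailing padding bits, then slice the codes off from the front.
    value = int.from_bytes(payload, "big") >> (len(payload) * 8 - total_bits)
    mask = (1 << code_bits) - 1
    return [(value >> (code_bits * (count - 1 - i))) & mask for i in range(count)]
```

Each code returned by `split_codes` would then be looked up in the coding table to recover the phonetic symbol, tone and duration of one word.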
  • the first request message includes a hysteresis timer, and the hysteresis timer is used to indicate a delay time for the second terminal device to use phonetic symbol encoding and decoding.
  • before receiving the first information sent by the first terminal device, the method further includes: if it is determined that the target parameter satisfies the first preset condition, sending a second request message to the first terminal device, where the second request message is used to instruct the first terminal device to use phonetic symbol encoding and decoding, phonetic symbol encoding and decoding refers to encoding and decoding phonetic symbols, tones and durations, and the target parameter is used to indicate the signal state of the communication environment where the second terminal device is currently located; and receiving a second response message sent by the first terminal device, where the second response message is used to indicate that the first terminal device uses phonetic symbol encoding and decoding.
  • the method further includes: receiving a third request message sent by the first terminal device, where the third request message is used to instruct the second terminal device to use waveform encoding and decoding; if it is determined that the target parameter satisfies the second preset condition , then send a third response message to the first terminal device, where the third response message is used to instruct the second terminal device to use waveform codec, and the target parameter is used to indicate the signal state of the communication environment where the first terminal device is currently located.
  • the method further includes: if it is determined that the target parameter satisfies the second preset condition, sending a fourth request message to the first terminal device, where the fourth request message is used to instruct the first terminal device to use waveform encoding and decoding; and receiving a fourth response message sent by the first terminal device, where the fourth response message is used to indicate that the first terminal device uses waveform encoding and decoding.
  • before receiving the first information sent by the first terminal device, the method further includes: receiving first capability information sent by the first terminal device, where the first capability information is used to indicate that the first terminal device supports phonetic symbol encoding and decoding, and sending first capability response information to the first terminal device, where the first capability response information is used to indicate that the second terminal device supports phonetic symbol encoding and decoding, and phonetic symbol encoding and decoding refers to encoding and decoding phonetic symbols, tones and durations; or, sending second capability information to the first terminal device, where the second capability information is used to indicate that the second terminal device supports phonetic symbol encoding and decoding, and receiving second capability response information sent by the first terminal device, where the second capability response information is used to indicate that the first terminal device supports phonetic symbol encoding and decoding.
  • the method further includes: displaying a setting interface; receiving a user's operation in the setting interface; and, in response to the operation, enabling the function of phonetic symbol encoding and decoding, where phonetic symbol encoding and decoding refers to encoding and decoding phonetic symbols, tones and durations.
  • an apparatus is provided, comprising units or means for performing the steps in any of the above aspects.
  • a terminal device including a processor, a memory, and a transceiver, where the transceiver is used for communicating with other devices, and the processor is used for calling a program stored in the memory to execute the method provided in any of the above aspects.
  • a computer-readable storage medium is provided, and instructions are stored in the computer-readable storage medium, and when the instructions are executed on a computer or a processor, the method provided in any of the above aspects is implemented.
  • In a sixth aspect, a program product is provided. The program product includes a computer program stored in a readable storage medium; at least one processor of a device can read the computer program from the readable storage medium, and the at least one processor executes the computer program to cause the device to implement the method provided in any of the above aspects.
  • the encoded information includes a first information component corresponding to phonetic symbols and tones, and a second information component corresponding to a duration.
  • the first information component includes a first information sub-component corresponding to a phonetic symbol and a second information sub-component corresponding to a tone.
  • the coding table includes a global index table and a common index table
  • the common index table is generated according to the number of words used by the user within a preset time period
  • the global index table includes a phonetic symbol and a global index of the phonetic symbol
  • the phonetic symbols included in the common index table have the common index value and the global index value of the phonetic symbol in the global index table.
  • the target parameter includes at least one of the following: the location information of the terminal device, the cell identifier of the cell currently accessed by the terminal device, the signal strength of the signal received by the terminal device or the voice packet loss rate.
  • FIG. 1 is a diagram of an application scenario to which an embodiment of the present application is applicable
  • Fig. 2 is a schematic diagram of the principle when a terminal device performs a voice call
  • FIG. 3 is a schematic diagram of a call effect using traditional voice codec in a weak signal environment
  • FIG. 4 is a schematic diagram of a call effect provided by an embodiment of the present application in a weak signal environment
  • FIG. 5 is a message interaction diagram of the voice processing method provided by the embodiment of the present application.
  • FIG. 6 is another message interaction diagram of the voice processing method provided by the embodiment of the present application.
  • FIG. 7 is another message interaction diagram of the voice processing method provided by the embodiment of the present application.
  • FIG. 8 is another message interaction diagram of the voice processing method provided by the embodiment of the present application.
  • FIG. 9 is another message interaction diagram of the voice processing method provided by the embodiment of the present application.
  • FIG. 10 is an interface diagram for setting the phonetic symbol encoding and decoding mode provided by the embodiment of the present application;
  • FIG. 11 is another message interaction diagram of the voice processing method provided by the embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • FIG. 14 is another schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • FIG. 1 is a diagram of an application scenario to which this embodiment of the present application is applicable.
  • user A uses terminal device 100
  • user B uses terminal device 200
  • user A and user B can conduct a voice call.
  • This embodiment of the present application does not limit the type of the terminal device.
  • examples of some terminal devices may be: mobile phones, tablet computers, PDAs, wearable devices, and the like.
  • FIG. 2 is a schematic diagram of the principle when a terminal device performs a voice call.
  • the terminal device 100 is the voice sender, and the terminal device 200 is the voice receiver.
  • the terminal device 100 acquires the voice signal input by the user A, performs voice encoding on the voice signal, and generates encoded information.
  • the encoded information is received by the terminal device 200 after being transmitted through the channel.
  • the terminal device 200 performs voice decoding on the received information, restores and generates a voice signal, and outputs it to user B.
  • FIG. 2 shows the voice codec part in the voice call process, and other processing processes are not limited.
  • Speech coding has a broad meaning and a narrow meaning. In a broad sense, it refers to an encoding method that includes speech encoding at the sender and speech decoding at the receiver.
  • the narrow meaning refers to speech encoding at the sender.
  • encoding in a broad sense is referred to as encoding and decoding.
  • the purpose of speech codec is to digitize the speech signal, compress the transmission bandwidth of the speech signal, and improve the transmission rate of the channel.
  • There are multiple speech encoding and decoding methods, for example, traditional speech encoding and decoding such as waveform encoding and decoding, feature encoding and decoding, and parameter encoding and decoding, as well as the phonetic symbol encoding and decoding in the embodiments of the present application.
  • the traditional speech encoding and decoding methods in the embodiments of the present application take waveform encoding and decoding as an example for description.
  • In a weak signal environment, the signal quality is poor. If the traditional voice encoding and decoding method is used, the transmitted voice code rate is reduced and the voice signal restored at the receiving end is heavily distorted, resulting in discontinuity, noise or even no sound.
  • In the embodiments of the present application, "the terminal device is in a weak signal environment" means that at least one of the two terminal devices in a call is in a weak signal environment.
  • For example, "the terminal device 100 is in a weak signal environment" covers the following three scenarios: scenario 1, the terminal device 100 itself is in a weak signal environment; scenario 2, the terminal device 200 talking to the terminal device 100 is in a weak signal environment; scenario 3, both the terminal device 100 and the terminal device 200 are in a weak signal environment.
  • the terminal device may acquire its own target parameters, and determine whether the terminal device is in a weak signal environment according to whether the target parameters satisfy a preset condition.
  • For example, when the target parameter satisfies the first preset condition, it is determined that the terminal device is in a weak signal environment, and when the target parameter satisfies the second preset condition, it is determined that the terminal device is not in a weak signal environment.
  • Different target parameters correspond to different first preset conditions and second preset conditions.
  • the target parameter may include at least one of the following: the location information of the terminal device, the cell identifier of the cell currently accessed by the terminal device, the signal strength of the signal received by the terminal device or the voice packet loss rate.
  • When the target parameter is the location information of the terminal device, the weak signal geographic range can be recorded in advance.
  • If the location information of the terminal device is within the weak signal geographic range, it is determined that the terminal device is in a weak signal environment.
  • If the location information of the terminal device is not within the weak signal geographic range, it is determined that the terminal device is not in a weak signal environment.
  • This embodiment of the present application does not limit the geographic range of weak signals, for example, some mountainous areas, bridges, and other areas where it is difficult to deploy stations.
  • When the target parameter is the cell identifier of the cell currently accessed by the terminal device, weak signal cell identifiers may be recorded in advance.
  • If the cell identifier of the cell currently accessed by the terminal device is a weak signal cell identifier, it is determined that the terminal device is in a weak signal environment.
  • If the cell identifier of the cell currently accessed by the terminal device is not a weak signal cell identifier, it is determined that the terminal device is not in a weak signal environment.
  • This embodiment of the present application does not limit the weak signal cell identifiers. For example, in chain-distributed scenarios such as high-speed rail, subway, or expressway, signal coverage blind spots are relatively fixed, and the identifiers of the cells with poor signal can be recorded as weak signal cell identifiers.
  • When the target parameter is the signal strength of the signal received by the terminal device, a first threshold and a second threshold may be preset, where the second threshold is greater than or equal to the first threshold.
  • If the signal strength of the signal received by the terminal device is less than or equal to the first threshold, it is determined that the terminal device is in a weak signal environment.
  • If the signal strength of the signal received by the terminal device is greater than or equal to the second threshold, it is determined that the terminal device is not in a weak signal environment.
  • This embodiment of the present application does not limit the values of the first threshold and the second threshold.
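The two-threshold check on signal strength can be sketched as below. The threshold values (-110 and -100, read as dBm) and the function name `update_weak_signal` are hypothetical, since the text does not limit the threshold values. Keeping the previous decision while the strength lies strictly between the two thresholds is also an assumption; the text only specifies the outcome at or beyond each threshold.

```python
def update_weak_signal(
    strength: float,
    currently_weak: bool,
    first_threshold: float = -110.0,   # hypothetical value
    second_threshold: float = -100.0,  # hypothetical value
) -> bool:
    """Return True if the device is considered to be in a weak signal environment."""
    if second_threshold < first_threshold:
        raise ValueError("second threshold must be >= first threshold")
    if strength <= first_threshold:
        return True   # first preset condition: weak signal environment
    if strength >= second_threshold:
        return False  # second preset condition: not a weak signal environment
    # Assumption: between the thresholds, keep the previous state so the
    # codec mode does not flip back and forth on small fluctuations.
    return currently_weak
```

With equal thresholds the function degenerates to a single cut-off, and a strength exactly at that value is treated as weak because the first condition is checked first.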
  • When the target parameter is the voice packet loss rate of the terminal device, a third threshold and a fourth threshold may be preset, and the third threshold is greater than or equal to the fourth threshold.
  • If the voice packet loss rate is greater than or equal to the fourth threshold, it is determined that the terminal device is in a weak signal environment.
  • If the voice packet loss rate is less than or equal to the third threshold, it is determined that the terminal device is not in a weak signal environment.
  • This embodiment of the present application does not limit the values of the third threshold and the fourth threshold.
  • the phonetic symbol encoding and decoding refers to encoding and decoding the phonetic symbol, pitch, and duration of a single word according to a coding table.
  • This embodiment of the present application does not limit the name of the phonetic codec, for example, it may also be called weak-signal high-definition speech codec.
  • Phonetic symbols can be based on single words (characters), multi-character words, sentences, or other units.
  • the embodiment of the present application is described by taking the word unit as an example, and each single word has three pieces of information, which are phonetic symbol, pitch, and duration respectively.
  • the duration can include short, medium and long. Short means less than 0.5 seconds, medium means greater than or equal to 0.5 seconds and less than 2 seconds, and long means greater than or equal to 2 seconds.
  • the duration may include four categories of 1 to 4. 1 means less than 0.5 seconds, 2 means greater than or equal to 0.5 seconds and less than 1 second, 3 means greater than or equal to 1 second and less than 2 seconds, 4 means greater than or equal to 2 seconds.
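The two duration classifications above map directly onto threshold checks. The following minimal sketch uses the boundaries stated in the text (0.5, 1 and 2 seconds); the function names and the rejection of negative durations are assumptions added for illustration.

```python
def duration_category_3(seconds: float) -> str:
    """Three-way classification: short (< 0.5 s), medium (0.5 s to < 2 s), long (>= 2 s)."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    if seconds < 0.5:
        return "short"
    if seconds < 2.0:
        return "medium"
    return "long"


def duration_category_4(seconds: float) -> int:
    """Four-way classification into categories 1 to 4."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    if seconds < 0.5:
        return 1
    if seconds < 1.0:
        return 2
    if seconds < 2.0:
        return 3
    return 4
```

Either variant yields a small number of categories, which is what lets the duration fit into a very short field of the encoded information.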
  • phonetic coding is performed on the phonetic symbol, tone and duration of each word according to the coding table to generate coding information with a preset bit length.
  • This embodiment of the present application does not limit the value of the preset bit length, for example, 22 bits.
  • For Chinese, a phonetic symbol refers to the pinyin of a Chinese character.
  • For example, "你好" ("hello") corresponds to 2 phonetic symbols, which are "ni" for "你" and "hao" for "好".
  • For English, a phonetic symbol refers to an English word.
  • For example, "Good morning" corresponds to 2 phonetic symbols, namely "good" and "morning".
  • Tones in English may include, but are not limited to, at least two of the following: affirmative, interrogative, rising, falling, rising and falling, falling and rising, flat, high, and low.
  • Coding table, coding information, global index table and common index table
  • the encoding table is stored in the terminal device, and the encoding table stores the correspondence between phonetic symbols, tones, duration and encoding information.
  • the terminal device completes phonetic symbol encoding or phonetic symbol decoding by looking up the coding table.
  • the correspondence between phonetic symbols, tones and durations and the encoded information can include at least one of the following: the correspondence between the phonetic symbols and the encoded information, the correspondence between the tones and the encoded information, the correspondence between the durations and the encoded information, the correspondence between a combination of phonetic symbols and tones and the encoded information, or the correspondence between a combination of phonetic symbols, tones and durations and the encoded information.
  • the coding table may include one table or at least two tables according to different correspondences. According to different correspondences, the encoded information of the preset bit length may include information components corresponding to different combinations of phonetic symbols, tones and durations.
  • the encoded information may include a first information component corresponding to phonetic symbols and tones, and a second information component corresponding to a duration.
  • the first information component includes a first information sub-component corresponding to a phonetic symbol and a second information sub-component corresponding to a tone. That is, phonetic symbols, pitches, and durations respectively correspond to information components.
  • the encoding table may include a global index table and a common index table.
  • the global index table is used for encoding by the voice sender and decoding by the voice receiver, and a common index table is used for encoding by the voice sender.
  • the global index table includes a phonetic symbol and a global index value of the phonetic symbol.
  • the global index table can be understood as a complete set of words or phonetic symbols in a certain language, and the value range of the global index value is large.
  • the commonly used index table is generated according to the number of words used by the user within a preset time period, and the embodiment of the present application does not limit the value of the preset time period, for example, 3 months, 6 months, or 1 year.
  • the phonetic symbols included in the common index table have the common index value and the global index value of the phonetic symbol in the global index table.
  • When the terminal device performs phonetic symbol encoding according to the coding table, it may first look up the common index table. Because the common index table is generated according to the user's common words, the number of common index values is small and the table lookup time is short, which improves the efficiency of phonetic symbol encoding. When no result is found in the common index table, the global index table is searched instead.
  • This embodiment of the present application does not limit the value range of the global index value and the phonetic symbol ordering. For example, common words are sorted first.
  • This embodiment of the present application does not limit the value range of the commonly used index values and the phonetic symbol ordering.
  • the number of phonetic symbols can be 5000
  • the commonly used index value can be 13 bits in length, representing a maximum of 8192 phonetic symbols, which can be sorted in descending order according to the usage frequency of single words within a preset time period.
  • the global index table and the commonly used index table may be updated periodically, and the update period is not limited in this embodiment of the present application.
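The relationship between the two tables can be sketched as follows: a small common index table built from the user's recent word usage is consulted first, and the global index table is the fallback. The table contents, index values, and the names `build_common_index` and `lookup_global_index` are hypothetical; plain dictionaries stand in for the tables, and the returned string is only there to show which table answered.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical global index table: phonetic symbol -> global index value.
GLOBAL_INDEX: Dict[str, int] = {"ni": 0, "hao": 1, "zao": 2, "shang": 3}


def build_common_index(
    usage: Iterable[str], global_table: Dict[str, int], size: int
) -> Dict[str, Tuple[int, int]]:
    """Map the `size` most used symbols to (common index value, global index value)."""
    counts = Counter(p for p in usage if p in global_table)
    # most_common() yields symbols in descending order of usage frequency.
    return {
        phonetic: (common_idx, global_table[phonetic])
        for common_idx, (phonetic, _) in enumerate(counts.most_common(size))
    }


def lookup_global_index(
    phonetic: str,
    common_table: Dict[str, Tuple[int, int]],
    global_table: Dict[str, int],
) -> Tuple[int, str]:
    """Look in the common index table first, then fall back to the global table."""
    entry = common_table.get(phonetic)
    if entry is not None:
        return entry[1], "common"
    if phonetic in global_table:
        return global_table[phonetic], "global"
    raise KeyError(phonetic)
```

For example, with the usage history `["ni", "hao", "ni", "zao", "ni", "hao"]` and `size=2`, the common table holds "ni" and "hao", so a lookup of "zao" falls through to the global table.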
  • the following takes the language of Chinese as an example to illustrate the coding table, coding information, global index table and common index table.
  • the coding table includes: a Chinese global index table, a duration table, and a common index table.
  • the encoded information includes a first information component and a second information component.
  • the Chinese global index table is used to indicate the correspondence between the combination of words, phonetic symbols and tones and the first information component (global index value) in the encoded information.
  • the global index value can be 20 bits long, representing at most 1,048,576 (2^20, about 1.05 million) words.
  • the duration table is used to indicate the correspondence between the duration and the second information component in the encoded information.
  • the common index table includes words, phonetic symbols, tones, duration, global index value and common index value. Exemplarily, it will be described in conjunction with Tables 1 to 3.
  • the encoded information is 22 bits long, wherein the first information component (global index value) is 20 bits long, and the second information component is 2 bits long.
  • the Chinese global index table can be understood as a complete set of Chinese words, and each word corresponds to a global index value.
  • the tone ranges from 1 to 4, representing the first to fourth tones, respectively.
  • the duration has three levels: short, medium, and long. The corresponding index value is 2 bits long and serves as the second information component.
  • the common index values uniquely distinguish each combination of word, phonetic symbol, tone and duration.
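As a hedged sketch of the 22-bit layout described above (a 20-bit global index value followed by a 2-bit duration index value), the two components can be packed into one integer. Placing the first information component in the high-order bits is an assumption of this example, as are the function names `pack` and `unpack`.

```python
from typing import Tuple

GLOBAL_BITS = 20    # first information component (global index value)
DURATION_BITS = 2   # second information component (duration index value)


def pack(global_index: int, duration_index: int) -> int:
    """Combine both components into one 22-bit code (layout is assumed)."""
    if not 0 <= global_index < (1 << GLOBAL_BITS):
        raise ValueError("global index does not fit in 20 bits")
    if not 0 <= duration_index < (1 << DURATION_BITS):
        raise ValueError("duration index does not fit in 2 bits")
    return (global_index << DURATION_BITS) | duration_index


def unpack(code: int) -> Tuple[int, int]:
    """Split a 22-bit code back into (global index, duration index)."""
    return code >> DURATION_BITS, code & ((1 << DURATION_BITS) - 1)
```

For example, `pack(5, 2)` places 5 in the upper 20 bits and 2 in the lower 2 bits, and `unpack` reverses the operation.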
  • Table 1 Chinese global index table
  • the encoding table includes: a Chinese global index table, a duration table, and a common index table.
  • the encoded information includes a first information component and a second information component.
  • the Chinese global index table can be found in Table 1
  • the duration table can be found in Table 2.
  • Common index tables can be found in Table 4. As shown in Table 4, the commonly used index values can uniquely distinguish the combination of words, phonetic symbols and tones.
  • the encoded information includes a first information component corresponding to phonetic symbols and tones, and a second information component corresponding to the duration. Because durations are encoded separately, the common index table does not need to distinguish durations, which further reduces the number of common index values and speeds up lookups in the common index table.
  • the encoding table includes: a Chinese global index table, a tone table, a duration table, and a common index table.
  • the encoded information includes a first information subcomponent, a second information subcomponent and a second information component.
  • the Chinese global index table is used to indicate the correspondence between the phonetic symbols and the first information sub-component (global index value) in the encoded information.
  • the tone table is used to indicate the correspondence between the tone and the second information sub-component in the encoded information (the index value of 2 bits in Table 6).
  • the tone ranges from 1 to 4, representing the first to fourth tones, respectively.
  • the duration table can be found in Table 2.
  • As shown in Table 7, the common index values uniquely distinguish different phonetic symbols.
  • the phonetic symbol, tone and duration are encoded and decoded separately, and differences between words are not considered. This further reduces the number of common index values and global index values, speeds up lookups in the common index table and the global index table, and improves the encoding and decoding efficiency.
  • each language corresponds to a global index table.
  • the multiple languages may include, but are not limited to, at least two of the following: Chinese, English, German, French, Japanese, Korean, or dialects.
  • the coding table includes three global index tables, which are Chinese global index table, English global index table and dialect global index table respectively.
  • the Chinese global index table can include 380,000 phonetic symbols, of which the common phonetic symbols can be 100,000, and the common phonetic symbols are sorted first.
  • the English global index table can include 280,000 phonetic symbols, of which the commonly used phonetic symbols can be 35,000, and the common phonetic symbols are sorted first.
  • the dialect global index table can include 100,000 phonetic symbols.
  • the encoding table includes a Chinese global index table and a dialect global index table.
  • the value range of the global index value in the Chinese global index table is 1 to 100, with a maximum of 100 phonetic symbols.
  • the global index value in the dialect global index table can be numbered from 101.
  • the sum of the number of phonetic symbols in all global index tables is less than a preset value, which is not limited in this embodiment of the present application, for example, 1 million.
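The numbering scheme above, in which each language's global index table takes a contiguous range and the total stays below a preset value, can be sketched as follows. The function name `allocate_ranges` and the choice to start numbering at 1 are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def allocate_ranges(sizes: List[Tuple[str, int]], limit: int) -> Dict[str, range]:
    """Assign each language a contiguous block of global index values.

    sizes: (language name, number of phonetic symbols) in allocation order.
    limit: preset value that the total number of symbols must stay below.
    """
    total = sum(count for _, count in sizes)
    if total >= limit:
        raise ValueError("total number of phonetic symbols exceeds the preset value")
    ranges: Dict[str, range] = {}
    start = 1  # assumed: numbering starts at 1, as in the 1-to-100 example
    for name, count in sizes:
        ranges[name] = range(start, start + count)
        start += count
    return ranges
```

With a Chinese table of 100 symbols followed by a dialect table, the dialect values start at 101, matching the example above.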
  • Terminal equipment supports phonetic codec
  • In one implementation, the terminal device supports the phonetic symbol encoding and decoding function by default: there is no related setting switch and no user setting is required, so the terminal device always supports phonetic symbol encoding and decoding.
  • In another implementation, the terminal device has the phonetic symbol encoding and decoding function together with a related setting switch that the user must set.
  • In that case, "the terminal device supports phonetic symbol encoding and decoding" means that the function is currently enabled on the terminal device through user settings. If the function is currently disabled, the terminal device does not support phonetic symbol encoding and decoding.
  • This embodiment of the present application does not limit the manner in which the user enables or disables the phonetic symbol encoding and decoding function of the terminal device. For example, it can be controlled by any of the following: voice control, preset gesture control, and control by touch operation in the relevant interface.
  • FIG. 3 is a schematic diagram of a call effect using traditional voice codec in a weak signal environment. As shown in Figure 3, user A and user B are currently in a weak signal environment, and for user B, the call is intermittent, and the effect is very poor.
  • the voice sender encodes the phonetic symbol, tone and duration of each word spoken by the user to obtain encoded information with a preset bit length. After the encoded information is transmitted through the channel, the voice receiver decodes the received information to obtain the phonetic symbol, tone and duration of each word, generates a complete and smooth voice signal, and plays it with a preset sound.
  • the speech processing method provided by the embodiment of the present application can transmit at an extremely low bit rate in a weak signal environment.
  • FIG. 4 is a schematic diagram of a call effect provided by an embodiment of the present application in a weak signal environment.
  • user A and user B are currently in a weak signal environment.
  • through the terminal device, user B can hear the pinyin pronunciation played with the preset sound, which clearly restores the pronunciation of the sender, user A. User B can judge the semantics from the pronunciation and understand user A's intention, which improves the call effect.
  • user A and user B are familiar with each other.
  • When user A or user B is in a weak signal environment and needs to talk, the voice processing method provided by the embodiment of the present application relies on fuzzy matching of homophones or approximate pronunciations. The receiving user does not need to recognize the exact word uttered by the sound source; relying on the familiarity between both parties, the receiving user can accurately understand what the other party really wants to say from the pronunciation played by the terminal device, which improves the call effect.
  • user A is in an emergency or dangerous environment and the signal is poor.
  • When user A talks with user B, user A can send a brief message to user B through the voice processing method provided in this embodiment of the present application.
  • user B can accurately understand the intention of user A according to the pronunciation played by the terminal device, which improves the call effect.
  • FIG. 5 is a message interaction diagram of the voice processing method provided by the embodiment of the present application.
  • This embodiment involves a first terminal device and a second terminal device, the first terminal device and the second terminal device are in a call state, the first terminal device is a voice sender, and the second terminal device is a voice receiver.
  • the voice processing method provided in this embodiment may include:
  • the first terminal device acquires a voice signal input by a user.
  • the voice signal input by the user is the voice signal corresponding to the user A's voice "Hello, I'm on the mountain" processed by the first terminal device.
  • each group of information includes the phonetic symbol, tone and duration of the single word.
  • determining that the first terminal device is currently in a weak signal environment may include: the first terminal device is in a weak signal environment, or the second terminal device is in a weak signal environment, or both the first terminal device and the second terminal device are in a weak signal environment.
  • For the weak signal environment and the manner of judging whether a terminal device is in a weak signal environment, refer to the above description; details are not repeated here.
  • multiple sets of information can be extracted from the speech signal by way of waveform comparison.
  • the waveform of the speech signal is segmented. For each segment, the waveform or its waveform features are compared with the waveforms or waveform features of all locally pre-stored words, and the word with the greatest similarity is determined as the word corresponding to that segment.
  • the waveform of the speech signal is segmented, and waveform extraction can be performed by word, or segmented by a specified length. This embodiment does not limit the value of the specified length, for example, 1 second.
  • the phonetic symbol encoding and decoding used in the embodiment of the present application only requires that the pronunciation of the words is close, whether the meanings of the words are consistent may not be considered, and the efficiency of extracting multiple sets of information can be improved by means of waveform comparison.
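The waveform comparison step can be sketched as a nearest-template search. The source does not name a similarity measure, so cosine similarity over fixed-length feature vectors is an assumption here, and the template values and the names `cosine_similarity` and `best_match` are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(segment: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the pre-stored word whose features are most similar to the segment."""
    return max(templates, key=lambda word: cosine_similarity(segment, templates[word]))


# Hypothetical locally pre-stored waveform features, keyed by phonetic symbol.
TEMPLATES = {"ni": [0.1, 0.9, 0.3], "hao": [0.8, 0.2, 0.5]}
```

Because only the pronunciation needs to be close, the search returns the closest-sounding entry without checking whether its meaning matches what was said.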
  • multiple sets of information can be extracted from the speech signal through a neural network model or a machine model.
  • a neural network model or a machine model is used for semantic recognition, and corresponding words are output according to the input speech signal.
  • the neural network model or the machine model is used for speech recognition, and the corresponding phonetic symbols and tones are output according to the input speech signal.
  • the duration of the word can be determined through energy detection.
  • the level of the duration and the energy threshold corresponding to each level can be preset, and the duration of the word is determined according to the comparison with multiple energy thresholds. Exemplarily, see Table 2 for the level of duration.
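One possible reading of the threshold comparison above is sketched below: a single measured energy value for the word is compared with preset boundaries to select a duration level. The threshold values, the idea that the measured quantity is one number per word, and the name `duration_level` are assumptions, not details from the source.

```python
from bisect import bisect_right

# Duration levels as described in the text; the thresholds are hypothetical.
LEVELS = ["short", "medium", "long"]
THRESHOLDS = [10.0, 20.0]  # boundaries between short/medium and medium/long


def duration_level(energy: float) -> str:
    """Map a measured energy value to a duration level via preset thresholds."""
    # bisect_right counts how many thresholds are less than or equal to energy.
    return LEVELS[bisect_right(THRESHOLDS, energy)]
```

The index of the returned level in `LEVELS` could then serve as the 2-bit duration index value.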
  • 6 groups of information can be extracted from the speech signal, namely the phonetic symbol, tone and duration of each of the six words "ni" (you), "hao" (good), "wo" (I), "zai" (at), "shan" (mountain) and "shang" (on).
  • For the word "ni", the phonetic symbol is ni, the tone is the third tone, and the duration is assumed to be short.
  • For the word "hao", the phonetic symbol is hao, the tone is the third tone, and the duration is assumed to be medium.
  • the first terminal device acquires the encoding information corresponding to each group of information according to the encoding table.
  • the coding table stores the correspondence between the phonetic symbols, tones and durations and the coding information. The coding information has a preset bit length; refer to the above description for details, which are not repeated here.
  • the first terminal device sends the encoded information to the second terminal device.
  • the encoded information is received by the second terminal device after being transmitted through the channel.
  • the information received by the second terminal device from the channel is called first information.
  • the second terminal device can receive the first information with a length of 132 bits from the channel.
  • the second terminal device decodes the first information according to the coding table to obtain multiple sets of information.
  • Each set of information includes the phonetic symbol, pitch and duration of the word.
  • Decoding the first information according to the coding table to obtain multiple sets of information may include:
  • a plurality of encoding information are sequentially acquired from the first information, and the length of the encoding information is a preset bit length.
  • the phonetic symbol, pitch and duration corresponding to the encoding information are obtained according to the encoding table.
  • the first information is 132 bits.
  • obtain the first 22-bit encoding information from the first information and obtain the phonetic symbol, tone and duration of the word corresponding to the encoding information according to the encoding table.
  • continue to acquire the second 22-bit encoding information from the first information and acquire the phonetic symbol, tone and duration of the word corresponding to the encoding information according to the encoding table.
  • the phonetic symbols, tones and duration of 6 single words are obtained.
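The sequential extraction of 22-bit codes from the 132-bit first information can be sketched as follows. Representing the received bits as a string of "0" and "1" characters and the function name `split_codes` are assumptions made to keep the example short.

```python
from typing import List

CODE_BITS = 22  # preset bit length of one piece of encoding information


def split_codes(first_information: str) -> List[int]:
    """Cut a received bit string into consecutive 22-bit codes, in order."""
    if len(first_information) % CODE_BITS != 0:
        raise ValueError("length is not a multiple of the preset bit length")
    return [
        int(first_information[i:i + CODE_BITS], 2)
        for i in range(0, len(first_information), CODE_BITS)
    ]
```

A 132-bit input yields 132 / 22 = 6 codes, one per word; each code is then looked up in the encoding table to recover the phonetic symbol, tone and duration.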
  • the second terminal device generates a voice signal according to the multiple sets of information.
  • Because each set of information includes the phonetic symbol, tone and duration of a single word, the pronunciation and duration of each word can be restored to synthesize a complete and smooth speech signal.
  • the second terminal device uses a preset sound to play the voice signal.
  • the preset voice is not limited in this embodiment, for example, it may be a male voice or a female voice.
  • the voice processing method provided in this embodiment can be applied to two terminal devices that conduct voice calls in a weak signal environment.
  • the phonetic symbol, pitch and duration of each word in the user's speech are extracted and encoded, to obtain encoded information of a preset bit length corresponding to each word.
  • the received information is decoded to obtain the phonetic symbol, tone and duration of each word, so as to generate a complete and smooth voice signal, which is played with the preset sound.
  • Because the encoded information has a preset bit length (22 bits per word in the example above), it can be transmitted at an extremely low bit rate in a weak signal environment.
  • The receiver can clearly restore the pronunciation of the sending user, and the receiving user can then obtain the semantics from the played pronunciation. This solves the problem of intermittent calls, noise or even silence that occurs with traditional speech encoding and decoding, and improves the call effect.
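To give a rough sense of the bit rate involved, the payload rate can be estimated from the code length and a speaking rate. The speaking rate of 4 words per second used in the usage note is purely an assumption for illustration, and the estimate ignores any transport or protocol overhead.

```python
def payload_bit_rate(bits_per_word: int, words_per_second: float) -> float:
    """Payload bits per second, excluding any transport overhead."""
    return bits_per_word * words_per_second
```

With 22 bits per word and an assumed 4 words per second, `payload_bit_rate(22, 4)` gives 88 bits per second; the six-word example sentence needs 6 × 22 = 132 bits in total.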
  • the encoding information corresponding to each group of information is obtained according to the encoding table, which may include:
  • the coding information is obtained according to the commonly used index table.
  • the encoding information is obtained according to the global index table.
  • Because the common index table is generated from the user's common words, the number of common index values in it is much smaller than the number of global index values in the global index table. Searching the common index table first therefore reduces the amount of data searched. If no match is found in the common index table, the global index table is searched, which improves the encoding efficiency.
  • the encoding table shown in Table 1 to Table 3 above is used.
  • the encoded information may include a first information component corresponding to phonetic symbols and tones, and a second information component corresponding to duration.
  • the first terminal device may first look up the common index table according to the single word and its phonetic symbol, tone and duration. If a match is found, the corresponding 20-bit global index value is used as the first information component. If not, the device searches the Chinese global index table according to the single word and its phonetic symbol and tone, and uses the corresponding 20-bit global index value as the first information component.
  • search is performed in the duration table according to the duration of the single word, and the corresponding 2-bit index value is used as the second information component, thereby obtaining 22-bit encoded information.
  • the second terminal device acquires 22 bits of encoded information, the first 20 bits are the first information component, and the last 2 bits are the second information component. Search in the Chinese global index table according to the first 20 bits to obtain the phonetic symbols and tones of the words. Search in the duration table according to the last 2 bits to obtain the duration of the word.
  • the coding table shown in Table 1, Table 2 and Table 4 above is another implementation manner.
  • When the first terminal device searches the common index table, it searches according to the single word and its phonetic symbol and tone.
  • Because the common index table does not consider the duration, the number of common index values is further reduced, the search is faster, and the encoding efficiency is improved.
  • the encoded information may include first information components corresponding to phonetic symbols and tones, and second information components corresponding to durations, where the first information components include first information subcomponents corresponding to phonetic symbols and second information subcomponents corresponding to tones.
  • the first terminal device can first search in the common index table according to the phonetic symbol of the single word, if found, the corresponding 18-bit global index value is used as the first information subcomponent, if not found, Then according to the phonetic symbol of the single word, the Chinese global index table is searched, and the corresponding 18-bit global index value is used as the first information sub-component.
  • the tone table is searched, and the corresponding 2-bit index value is used as the second information sub-component.
  • the second terminal device obtains 22 bits of encoded information, the first 18 bits are the first information subcomponent, the middle 2 bits are the second information subcomponent, and the last 2 bits are the second information component.
  • Search in the tone table according to the middle 2 bits to get the tone of the word.
  • Search in the duration table according to the last 2 bits to obtain the duration of the word.
  • the phonetic symbol, pitch and duration are separately encoded and decoded, the number of common index values and the number of global index values are further reduced, the search speed is faster, and the encoding and decoding efficiency is improved.
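The three-field layout described above (first 18 bits for the phonetic symbol, middle 2 bits for the tone, last 2 bits for the duration) can be sketched with shifts and masks. The function names `pack3` and `unpack3` are hypothetical, and the inputs are raw index values, since the actual contents of the tone and duration tables are not reproduced here.

```python
from typing import Tuple

SYMBOL_BITS = 18    # first information sub-component (global index value)
TONE_BITS = 2       # second information sub-component (tone index value)
DURATION_BITS = 2   # second information component (duration index value)


def pack3(symbol_index: int, tone_index: int, duration_index: int) -> int:
    """Pack the three index values into one 22-bit code."""
    if not 0 <= symbol_index < (1 << SYMBOL_BITS):
        raise ValueError("symbol index does not fit in 18 bits")
    if not 0 <= tone_index < (1 << TONE_BITS):
        raise ValueError("tone index does not fit in 2 bits")
    if not 0 <= duration_index < (1 << DURATION_BITS):
        raise ValueError("duration index does not fit in 2 bits")
    code = symbol_index << (TONE_BITS + DURATION_BITS)
    code |= tone_index << DURATION_BITS
    return code | duration_index


def unpack3(code: int) -> Tuple[int, int, int]:
    """Recover (symbol index, tone index, duration index) from a 22-bit code."""
    symbol_index = code >> (TONE_BITS + DURATION_BITS)
    tone_index = (code >> DURATION_BITS) & ((1 << TONE_BITS) - 1)
    duration_index = code & ((1 << DURATION_BITS) - 1)
    return symbol_index, tone_index, duration_index
```

Each recovered index is then looked up in its own table (global index table, tone table, duration table), which is what keeps every individual table small.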
  • an implementation manner of determining that the terminal device is in a weak signal environment in S502 is provided on the basis of the embodiment shown in FIG. 5 above. Through the negotiation between the first terminal device and the second terminal device, it is determined that phonetic symbol codec can be used when it is currently in a weak signal environment.
  • determining that the first terminal device is currently in a weak signal environment may include:
  • If the first terminal device determines that its target parameter satisfies the first preset condition, it sends a first request message to the second terminal device, where the first request message is used to instruct the second terminal device to use phonetic symbol encoding and decoding.
  • the target parameter of the first terminal device is used to indicate the signal state of the communication environment where the first terminal device is currently located.
  • For the target parameter and the first preset condition, refer to the above description of this application; details are not repeated here.
  • the second terminal device receives the first request message.
  • the second terminal device sends a first response message to the first terminal device, where the first response message is used to instruct the second terminal device to use phonetic symbol encoding and decoding.
  • the first terminal device as the voice sender determines that it is currently in a weak signal environment according to its own target parameters, and actively initiates a negotiation of codec mode switching to the second terminal device, thereby ensuring that phonetic symbol codec is used in a timely manner, improving conversation quality.
  • the first request message may include a first indication field, which is used to indicate that phonetic symbol codec is used.
  • This embodiment of the present application does not limit the name of the first indication field.
  • the first request message may include a hysteresis timer, and the hysteresis timer is used to indicate a delay time for the second terminal device to use phonetic symbol encoding and decoding.
  • the terminal device adopts the traditional voice codec by default.
  • time is reserved for the switching of the voice codec mode.
  • the first terminal device and the second terminal device use the phonetic symbol codec at the same time, which improves the codec mode switching effect.
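A request message carrying the indication field and the hysteresis timer might be modeled as below. The field names, the millisecond unit, and the assumption that both devices measure the delay from a shared reference time are all hypothetical; the source does not specify the message format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecSwitchRequest:
    use_phonetic_codec: bool  # stands in for the first indication field
    hysteresis_ms: int        # delay before the new codec takes effect


def codec_at(now_ms: int, sent_at_ms: int, request: CodecSwitchRequest) -> str:
    """Return the codec in effect at now_ms for a request sent at sent_at_ms."""
    if request.use_phonetic_codec and now_ms >= sent_at_ms + request.hysteresis_ms:
        return "phonetic"
    # Traditional voice codec remains the default until the delay has elapsed.
    return "traditional"
```

If both devices evaluate `codec_at` against the same reference time, they switch at the same moment, which is the purpose of reserving the delay.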
  • determining that the first terminal device is currently in a weak signal environment may include:
  • If the second terminal device determines that its target parameter satisfies the first preset condition, it sends a second request message to the first terminal device, where the second request message is used to instruct the first terminal device to use phonetic symbol encoding and decoding.
  • the target parameter of the second terminal device is used to indicate the signal state of the communication environment where the second terminal device is currently located.
  • For the target parameter and the first preset condition, refer to the above description of this application; details are not repeated here.
  • the first terminal device receives the second request message.
  • the second request message may include a first indication field, and reference may be made to the above description of the first indication field, which will not be repeated here.
  • the second request message may include a hysteresis timer, where the hysteresis timer is used to indicate a delay time for the first terminal device to use phonetic symbol encoding and decoding.
  • the first terminal device sends a second response message to the second terminal device, where the second response message is used to instruct the first terminal device to use phonetic symbol encoding and decoding.
  • the second terminal device, as the voice receiver, determines from its own target parameter that it is currently in a weak signal environment and actively initiates the negotiation of codec mode switching with the first terminal device, which ensures that phonetic symbol encoding and decoding is used in time and improves call quality.
  • the first response message or the second response message may be retransmitted.
  • the number of retransmissions may be set, and the specific value is not limited in this embodiment.
  • an implementation manner of determining that both the first terminal device and the second terminal device support phonetic symbol encoding and decoding in S502 is provided. Through capability negotiation between the first terminal device and the second terminal device, it is determined that both parties in the call support phonetic codec, and phonetic codec can be used when currently in a weak signal environment.
  • the first terminal device determines that both the first terminal device and the second terminal device support phonetic symbol encoding and decoding, which may include:
  • the first terminal device sends first capability information to the second terminal device, where the first capability information is used to indicate that the first terminal device supports phonetic symbol encoding and decoding.
  • the second terminal device receives the first capability information sent by the first terminal device.
  • the second terminal device sends first capability response information to the first terminal device, where the first capability response information is used to indicate that the second terminal device supports phonetic symbol encoding and decoding.
  • the first terminal device which is the voice sender, actively initiates capability negotiation with the second terminal device, so as to ensure that the phonetic symbol encoding and decoding method is adopted in time to improve the quality of the call.
  • the first terminal device determines that both the first terminal device and the second terminal device support phonetic symbol encoding and decoding, which may include:
  • the second terminal device sends second capability information to the first terminal device, where the second capability information is used to indicate that the second terminal device supports phonetic symbol encoding and decoding.
  • the first terminal device receives the second capability information sent by the second terminal device.
  • the first terminal device sends second capability response information to the second terminal device, where the second capability response information is used to indicate that the first terminal device supports phonetic symbol encoding and decoding.
  • the second terminal device which is the voice receiver, actively initiates capability negotiation with the first terminal device, so as to ensure that the phonetic symbol encoding and decoding method is adopted in time to improve the call quality.
  • The first capability information and the second capability information may be separate messages, or may be carried in existing messages. This embodiment does not limit the time of the capability negotiation process. For example, after the first terminal device establishes a connection with the second terminal device, capability negotiation may be performed during an alerting (ringing) message, and the ringing message may include a New audiocodec capability field to indicate whether the terminal device supports phonetic symbol encoding and decoding.
  • the first capability response information or the second capability response information may be retransmitted.
  • the number of retransmissions may be set, and the specific value is not limited in this embodiment.
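The retransmission of a capability response with a configurable limit can be sketched as a bounded retry loop. How a device learns that a transmission succeeded is not described in the source; here the hypothetical `send` callable is assumed to return True when the message is acknowledged.

```python
from typing import Callable


def send_with_retransmission(send: Callable[[], bool], max_retransmissions: int) -> bool:
    """Send once, then retransmit up to max_retransmissions times until success.

    `send` is a hypothetical callable that returns True on acknowledgement.
    """
    for _ in range(1 + max_retransmissions):
        if send():
            return True
    return False


def both_support(local_supports: bool, peer_supports: bool) -> bool:
    """Phonetic symbol codec is usable only when both call parties support it."""
    return local_supports and peer_supports
```

With `max_retransmissions` set to 2, the message is attempted at most three times in total.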
  • the terminal device supporting phonetic symbol encoding and decoding means that the phonetic symbol encoding and decoding function is currently enabled on the terminal device through user settings.
  • the voice processing method provided in this embodiment may further include:
  • the setting interface is displayed.
  • FIG. 10 is an interface diagram for setting a phonetic symbol encoding and decoding mode provided by an embodiment of the present application.
  • the terminal device currently displays a setting interface 1001, and the setting interface 1001 includes the function option “weak signal HD voice coding”, that is, the phonetic symbol coding and decoding function in the embodiment of the present application.
  • the state of the control 1010 can display whether the phonetic symbol codec function is enabled on the current terminal device.
  • the phonetic symbol codec function is turned off.
  • the user can perform a click operation on the control 1010.
  • the terminal device responds to the click operation to enable the phonetic symbol encoding and decoding function, as shown in (b) of FIG. 10 .
  • this embodiment does not limit the time for the user to set the phonetic symbol encoding and decoding function.
  • the voice processing method provided in this embodiment may further include:
  • a first instruction from the user is received.
  • the prompting voice may be played, the prompting music may be played, or a prompting box or prompting information may be popped up in the interface currently displayed by the terminal device.
  • an implementation manner of switching from phonetic symbol encoding and decoding to traditional speech encoding and decoding is provided on the basis of the foregoing embodiment.
  • the communication environment where the two terminal devices of the call are located changes in real time and will not always be in a weak signal environment.
  • the phonetic symbol encoding and decoding in the embodiment of the present application is more suitable for a weak signal environment and meets basic communication requirements.
  • When the communication environment improves, the terminal devices should switch back to the traditional voice codec in time to improve the user's call experience.
  • traditional voice codec can be used.
  • the voice processing method provided in this embodiment may further include:
  • If the first terminal device determines that its target parameter satisfies the second preset condition, it sends a third request message to the second terminal device, where the third request message is used to instruct the second terminal device to use waveform encoding and decoding.
  • the target parameter of the first terminal device is used to indicate the signal state of the communication environment where the first terminal device is currently located.
  • For the target parameter and the second preset condition, refer to the above description of this application; details are not repeated here.
  • the second terminal device receives the third request message.
  • the third request message may include a second indication field, which is used to indicate the use of traditional speech codec.
  • This embodiment of the present application does not limit the name of the second indication field.
  • For example, the name may be "back to HD".
  • the second indication field may also indicate the start time point of using traditional speech codec.
  • If the second terminal device determines that its target parameter satisfies the second preset condition, it sends a third response message to the first terminal device, where the third response message is used to instruct the second terminal device to use waveform encoding and decoding.
  • the target parameter of the second terminal device is used to indicate the signal state of the communication environment where the second terminal device is currently located.
  • For the target parameter and the second preset condition, refer to the above description of this application; details are not repeated here.
  • When the first terminal device, as the voice sender, determines that its signal environment has improved, it actively initiates a negotiation of codec mode switching with the second terminal device. When the second terminal device also determines that it is not currently in a weak signal environment, it returns a response message. This ensures that the traditional voice codec is used promptly when the communication environment is good, which improves call quality.
  • the voice processing method provided in this embodiment may further include:
  • If the second terminal device determines that its target parameter satisfies the second preset condition, it sends a fourth request message to the first terminal device, where the fourth request message instructs the first terminal device to use waveform encoding and decoding.
  • the first terminal device receives the fourth request message.
  • the fourth request message may include a second indication field, and reference may be made to the above description of the second indication field, which will not be repeated here.
  • If the first terminal device determines that its target parameter satisfies the second preset condition, it sends a fourth response message to the second terminal device, where the fourth response message indicates that the first terminal device uses waveform encoding and decoding.
  • When the second terminal device, as the voice receiver, determines that its signal environment has improved, it actively initiates a negotiation with the first terminal device to switch the codec mode. When the first terminal device also determines that it is not currently in a weak signal environment, it returns a response message. This ensures that the traditional voice codec is used promptly once the communication environment is good, which improves call quality.
  • the third response message or the fourth response message may be retransmitted.
  • the number of retransmissions may be set, and the specific value is not limited in this embodiment.
  • the terminal device includes corresponding hardware and/or software modules for executing each function.
  • In combination with the algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
  • the terminal device may be divided into functional modules according to the foregoing method examples.
  • each functional module may be divided according to each function, or two or more functions may be integrated into one processing module.
  • the division of modules in the embodiments of the present application is schematic, and is only a logical function division, and there may be other division manners in actual implementation.
  • the names of the modules in the embodiments of the present application are schematic, and the names of the modules are not limited in actual implementation.
  • FIG. 13 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the terminal device may include: a sending module 1301 , a processing module 1302 and a receiving module 1303 .
  • The sending module 1301 is used to send data to other devices, for example, the encoding information, the first request message, the second response message, the third request message, the fourth response message, the first capability information, or the second capability response information.
  • The receiving module 1303 is used to receive data from other devices, for example, the first information, the second request message, the first response message, the fourth request message, the third response message, the second capability information, or the first capability response information.
  • The processing module 1302 is used to obtain the voice signal input by the user, extract multiple groups of information from the voice signal, and obtain the encoding information corresponding to each group of information according to the encoding table; and to decode the first information according to the encoding table to obtain multiple groups of information, generate a voice signal from the multiple groups of information, play the voice signal using a preset sound, and so on.
  • FIG. 14 shows another structure of a terminal device provided by an embodiment of the present application.
  • the terminal device includes: a processor 1401 , a receiver 1402 , a transmitter 1403 , a memory 1404 , and a bus 1405 .
  • the processor 1401 includes one or more processing cores, and the processor 1401 executes various functional applications and information processing by running software programs and modules.
  • the receiver 1402 and the transmitter 1403 may be implemented as a communication component, which may be a baseband chip.
  • the memory 1404 is connected to the processor 1401 through the bus 1405 .
  • the memory 1404 may be configured to store at least one program instruction, and the processor 1401 may be configured to execute the at least one program instruction, so as to implement the technical solutions of the foregoing embodiments.
  • the implementation principle and technical effect thereof are similar to the related embodiments of the above method, and are not repeated here.
  • the processor can read the software program in the memory, interpret and execute the instructions of the software program, and process the data of the software program.
  • The processor performs baseband processing on the data to be sent and outputs the baseband signal to the control circuit.
  • The control circuit performs radio frequency processing on the baseband signal and sends the radio frequency signal out through the antenna in the form of electromagnetic waves.
  • the control circuit receives the radio frequency signal through the antenna, converts the radio frequency signal into a baseband signal, and outputs the baseband signal to the processor, which converts the baseband signal into data and processes the data.
  • FIG. 14 only shows one memory and one processor. In an actual terminal, there may be multiple processors and memories.
  • the memory may also be referred to as a storage medium or a storage device, etc., which is not limited in this embodiment of the present application.
  • the processor may include a baseband processor and a central processing unit.
  • the baseband processor is mainly used to process communication data
  • the central processing unit is mainly used to execute software programs and process data of the software programs.
  • the baseband processor and the central processing unit may be integrated into one processor, or may be independent processors, which are interconnected through technologies such as a bus.
  • a terminal may include multiple baseband processors to adapt to different network standards
  • a terminal may include multiple central processors to enhance its processing capability
  • various components of the terminal may be connected through various buses.
  • the baseband processor can also be expressed as a baseband processing circuit or a baseband processing chip.
  • the central processing unit can also be expressed as a central processing circuit or a central processing chip.
  • the function of processing the communication protocol and communication data may be built in the processor, or may be stored in the memory in the form of a software program, and the processor executes the software program to realize the baseband processing function.
  • the memory can be integrated into the processor or independent of the processor.
  • the memory includes a cache, which can store frequently accessed data/instructions.
  • the processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, which can implement or
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • The memory may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as random-access memory (RAM).
  • The memory may also be, without limitation, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the memory in this embodiment of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.
  • the methods provided by the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented in software it can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, network equipment, user equipment, or other programmable apparatus.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., digital video discs (DVD)), or semiconductor media (e.g., SSD).
  • the embodiments of the present application provide a computer program product, which enables the terminal to execute the technical solutions in the foregoing embodiments when the computer program product runs on a terminal.
  • the implementation principle and technical effect thereof are similar to those of the above-mentioned related embodiments, which will not be repeated here.
  • the embodiments of the present application provide a computer-readable storage medium, on which program instructions are stored, and when the program instructions are executed by a terminal, the terminal executes the technical solutions of the foregoing embodiments.
  • the implementation principle and technical effect thereof are similar to those of the above-mentioned related embodiments, which will not be repeated here.


Abstract

A voice processing method, a terminal device, and a storage medium. A first terminal device and a second terminal device are in a call state. The method comprises: the first terminal device obtains a voice signal input by a user (S501); when determining that the first terminal device is currently in the weak signal environment, and determining that both the first terminal device and the second terminal device support phonetic alphabet encoding and decoding, extracting a plurality of groups of information from the voice signal, each group of information comprising a phonetic alphabet, a tone, and a duration of a single word (S502); obtaining, according to an encoding table, encoding information corresponding to each group of information (S503); and sending the encoding information to the second terminal device (S504). The encoding table stores the correspondence between the phonetic alphabet, the tone, and the duration, and the encoding information. According to the voice processing method, transmission is carried out at an extremely low code rate in the weak signal environment, both parties of the call can clearly convey the pronunciation of the words, the user determines the semantic meaning according to the pronunciation, and the call effect is improved.

Description

Voice processing method, terminal device, and storage medium
This application claims priority to Chinese Patent Application No. 202011568861.1, filed with the China National Intellectual Property Administration on December 25, 2020 and entitled "Voice processing method, terminal device, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of this application relate to the field of communication technologies, and in particular, to a voice processing method, a terminal device, and a storage medium.
Background
When a user makes a voice call using a terminal device, the terminal device on the voice sending side performs speech encoding on the voice signal. Correspondingly, the terminal device on the voice receiving side performs speech decoding on the received data to restore it to speech.
Currently, terminal devices use traditional speech coding methods such as waveform coding. However, when the communication environment is poor, traditional speech coding causes severe signal distortion at the receiving end: the call is intermittent, noisy, or even silent, and call quality is very poor.
Summary
Embodiments of this application provide a voice processing method, a terminal device, and a storage medium. When the communication environment of the terminal device is poor, both parties to a call can still clearly convey the pronunciation of each word, and the user infers the meaning from the pronunciation, which improves call quality.
According to a first aspect, a voice processing method is provided. The method is applied to a first terminal device that is in a call with a second terminal device, and includes: obtaining a voice signal input by a user; when it is determined that the first terminal device is currently in a weak signal environment and that both the first terminal device and the second terminal device support phonetic symbol encoding and decoding, extracting multiple groups of information from the voice signal, where each group of information includes the phonetic symbol, tone, and duration of a single word, and phonetic symbol encoding and decoding refers to encoding and decoding the phonetic symbol, tone, and duration; obtaining, according to an encoding table, the encoding information corresponding to each group of information, where the encoding table stores the correspondence between phonetic symbols, tones, and durations and the encoding information; and sending the encoding information to the second terminal device.
The voice processing method provided in the first aspect can be applied to the voice sender in a voice call conducted in a weak signal environment. On the sender side, the phonetic symbol, tone, and duration of each word in the user's speech are extracted and encoded to obtain, for each word, encoding information of a preset bit length. Because the encoding information has a preset bit length, it can be transmitted at an extremely low bit rate in a weak signal environment. Because the pronunciation and duration of each word are encoded, the receiver can clearly reproduce the pronunciation of the sending user, so that the receiving user understands the meaning from the played pronunciation, which improves call quality.
In a possible implementation, obtaining the encoding information corresponding to each group of information according to the encoding table includes: for each group of information, determining whether the common index table includes the phonetic symbol in the group; if the common index table includes the phonetic symbol, obtaining the encoding information according to the common index table; and if the common index table does not include the phonetic symbol, obtaining the encoding information according to the global index table.
In this implementation, the common index table is generated from the user's frequently used words, and the number of common index values in the common index table is far smaller than the number of global index values in the global index table. Searching the common index table first reduces the amount of data to be searched and improves encoding efficiency.
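The two-level lookup described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the table contents, the names `COMMON_INDEX`, `GLOBAL_INDEX`, and `lookup_index`, and the choice to return the table name alongside the index are all assumptions made for the example.

```python
from typing import Dict, Tuple

# Hypothetical tables mapping a phonetic symbol to an index value.
# The entries are invented for illustration only.
GLOBAL_INDEX: Dict[str, int] = {"ni": 0, "hao": 1, "ma": 2, "xie": 3}
COMMON_INDEX: Dict[str, int] = {"ni": 0, "hao": 1}  # small, user-specific subset


def lookup_index(phonetic: str) -> Tuple[str, int]:
    """Return (table_name, index), trying the common index table first."""
    if phonetic in COMMON_INDEX:
        return "common", COMMON_INDEX[phonetic]
    # Fall back to the global index table; a KeyError signals an unknown symbol.
    return "global", GLOBAL_INDEX[phonetic]
```

Returning the table name is only a convenience for the sketch; how the receiver learns which table was used is not modeled here.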
In a possible implementation, determining that the first terminal device is currently in a weak signal environment includes: if it is determined that a target parameter satisfies a first preset condition, sending a first request message to the second terminal device, where the first request message instructs the second terminal device to use phonetic symbol encoding and decoding, and the target parameter indicates the signal state of the communication environment in which the first terminal device is currently located; and receiving a first response message sent by the second terminal device, where the first response message indicates that the second terminal device uses phonetic symbol encoding and decoding.
In a possible implementation, the first request message includes a hysteresis timer, and the hysteresis timer indicates the delay before the second terminal device starts using phonetic symbol encoding and decoding.
In this implementation, the hysteresis timer reserves time for switching the voice codec mode. After the hysteresis timer expires, the first terminal device and the second terminal device start using phonetic symbol encoding and decoding at the same time, which makes the switch between codec modes smoother.
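As a rough sketch of the hysteresis timer, each terminal can record the pending codec and apply it only once the delay has elapsed. The message layout, the class names `FirstRequest` and `CodecSwitcher`, and the millisecond clock passed in by the caller are assumptions for illustration; the source does not specify them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FirstRequest:
    """Hypothetical first request message carrying a hysteresis timer."""
    codec: str          # requested codec, e.g. "phonetic"
    hysteresis_ms: int  # delay before the new codec takes effect


class CodecSwitcher:
    """Tracks the active codec and one pending switch on a terminal."""

    def __init__(self, active: str = "waveform") -> None:
        self.active = active
        self._pending: Optional[Tuple[str, int]] = None  # (codec, switch time)

    def schedule(self, request: FirstRequest, now_ms: int) -> None:
        # Start the hysteresis timer: the switch happens when it expires.
        self._pending = (request.codec, now_ms + request.hysteresis_ms)

    def tick(self, now_ms: int) -> str:
        # Apply the pending switch once the timer has expired.
        if self._pending is not None and now_ms >= self._pending[1]:
            self.active = self._pending[0]
            self._pending = None
        return self.active
```

If both terminals schedule the same request against a shared time reference, they change codec at the same instant, which is the effect the timer is meant to achieve.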
In a possible implementation, determining that the first terminal device is currently in a weak signal environment includes: receiving a second request message sent by the second terminal device, where the second request message instructs the first terminal device to use phonetic symbol encoding and decoding; and sending a second response message to the second terminal device, where the second response message indicates that the first terminal device uses phonetic symbol encoding and decoding.
In a possible implementation, the method further includes: if it is determined that the target parameter satisfies a second preset condition, sending a third request message to the second terminal device, where the third request message instructs the second terminal device to use waveform encoding and decoding, and the target parameter indicates the signal state of the communication environment in which the first terminal device is currently located; and receiving a third response message sent by the second terminal device, where the third response message indicates that the second terminal device uses waveform encoding and decoding.
In a possible implementation, the method further includes: receiving a fourth request message sent by the second terminal device, where the fourth request message instructs the first terminal device to use waveform encoding and decoding; and if it is determined that the target parameter satisfies the second preset condition, sending a fourth response message to the second terminal device, where the fourth response message indicates that the first terminal device uses waveform encoding and decoding, and the target parameter indicates the signal state of the communication environment in which the first terminal device is currently located.
In the above implementations, when neither the first terminal device nor the second terminal device is in a weak signal environment, the devices can switch back to the traditional voice codec mode, which improves call quality.
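The switch-back negotiation above can be sketched as a two-sided check. The example assumes, purely for illustration, that the target parameter is a received signal strength in dBm and that the second preset condition is "stronger than a threshold"; the threshold value and the function names are hypothetical.

```python
SECOND_THRESHOLD_DBM = -100.0  # hypothetical threshold for "no longer weak"


def meets_second_condition(signal_dbm: float) -> bool:
    """Assumed form of the second preset condition."""
    return signal_dbm > SECOND_THRESHOLD_DBM


def negotiate_waveform(requester_dbm: float, responder_dbm: float) -> str:
    """Return the codec both terminals use after one negotiation attempt."""
    if not meets_second_condition(requester_dbm):
        return "phonetic"  # the requester never sends the request message
    if not meets_second_condition(responder_dbm):
        return "phonetic"  # the responder does not return a response message
    return "waveform"      # request sent and response returned


print(negotiate_waveform(-90.0, -85.0))
```

Either terminal can play the requester role, which matches the third and fourth request messages being symmetric.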
In a possible implementation, determining that the first terminal device supports phonetic symbol encoding and decoding includes: if it is determined that the phonetic symbol encoding and decoding function is not enabled on the first terminal device, generating and outputting prompt information; receiving a first instruction from the user; and enabling the function according to the first instruction.
In a possible implementation, determining that both the first terminal device and the second terminal device support phonetic symbol encoding and decoding includes: sending first capability information to the second terminal device, where the first capability information indicates that the first terminal device supports phonetic symbol encoding and decoding, and receiving first capability response information sent by the second terminal device, where the first capability response information indicates that the second terminal device supports phonetic symbol encoding and decoding; or, receiving second capability information sent by the second terminal device, where the second capability information indicates that the second terminal device supports phonetic symbol encoding and decoding, and sending second capability response information to the second terminal device, where the second capability response information indicates that the first terminal device supports phonetic symbol encoding and decoding.
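A minimal sketch of the capability exchange follows, assuming a simple message with a single support flag. The `CapabilityMessage` type and the `both_support_phonetic` helper are hypothetical names introduced for this example; the source does not define a message format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CapabilityMessage:
    """Hypothetical capability (or capability response) message."""
    sender: str
    supports_phonetic_codec: bool


def both_support_phonetic(sent: CapabilityMessage,
                          response: Optional[CapabilityMessage]) -> bool:
    """True only if the sent capability and the received response both indicate support."""
    if response is None:  # no capability response arrived
        return False
    return sent.supports_phonetic_codec and response.supports_phonetic_codec
```

The same check applies in either direction, whichever terminal sends its capability information first.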
In a possible implementation, the method further includes: displaying a settings interface; receiving an operation performed by the user in the settings interface; and in response to the operation, enabling the phonetic symbol encoding and decoding function.
According to a second aspect, a voice processing method is provided. The method is applied to a second terminal device that is in a call with a first terminal device, and includes: receiving first information sent by the first terminal device; decoding the first information according to an encoding table to obtain multiple groups of information, where each group of information includes the phonetic symbol, tone, and duration of a single word, and the encoding table stores the correspondence between phonetic symbols, tones, and durations and the encoding information; generating a voice signal from the multiple groups of information; and playing the voice signal using a preset sound.
The voice processing method provided in the second aspect can be applied to the voice receiver in a voice call conducted in a weak signal environment. After the encoding information sent by the voice sender is transmitted over the channel, the voice receiver decodes the received information to obtain the phonetic symbol, tone, and duration of each word, generates a complete and fluent voice signal, and plays it using a preset sound. Because the encoding information has a preset bit length, it can be transmitted at an extremely low bit rate in a weak signal environment. Because the pronunciation and duration of each word are encoded, the receiver can clearly reproduce the pronunciation of the sending user, so that the receiving user understands the meaning from the played pronunciation, which improves call quality.
In a possible implementation, decoding the first information according to the encoding table to obtain multiple groups of information includes: sequentially obtaining multiple pieces of encoding information from the first information, where the length of each piece of encoding information is the preset bit length; and for each piece of encoding information, obtaining the corresponding phonetic symbol, tone, and duration according to the encoding table.
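The decoding step can be sketched as slicing the received bits into fixed-length codes and looking each one up. The 4-bit code length, the table entries, and the use of a string of "0" and "1" characters to represent the first information are assumptions chosen to keep the example short.

```python
from typing import Dict, List, Tuple

CODE_BITS = 4  # hypothetical preset bit length

# Hypothetical encoding table: code value -> (phonetic symbol, tone, duration in ms).
DECODE_TABLE: Dict[int, Tuple[str, int, int]] = {
    0b0001: ("ni", 3, 200),
    0b0010: ("hao", 3, 300),
}


def decode_first_information(bits: str) -> List[Tuple[str, int, int]]:
    """Split a bit string into fixed-length codes and decode each one."""
    if len(bits) % CODE_BITS != 0:
        raise ValueError("length is not a multiple of the preset bit length")
    groups = []
    for start in range(0, len(bits), CODE_BITS):
        code = int(bits[start:start + CODE_BITS], 2)
        groups.append(DECODE_TABLE[code])  # KeyError for an unknown code
    return groups
```

Because every code has the same length, the receiver needs no delimiters between words, which is one reason a preset bit length suits a very low bit rate.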
In a possible implementation, before receiving the first information sent by the first terminal device, the method further includes: receiving a first request message sent by the first terminal device, where the first request message instructs the second terminal device to use phonetic symbol encoding and decoding, and phonetic symbol encoding and decoding refers to encoding and decoding the phonetic symbol, tone, and duration; and sending a first response message to the first terminal device, where the first response message indicates that the second terminal device uses phonetic symbol encoding and decoding.
In a possible implementation, the first request message includes a hysteresis timer, and the hysteresis timer indicates the delay before the second terminal device starts using phonetic symbol encoding and decoding.
In a possible implementation, before receiving the first information sent by the first terminal device, the method further includes: if it is determined that a target parameter satisfies a first preset condition, sending a second request message to the first terminal device, where the second request message instructs the first terminal device to use phonetic symbol encoding and decoding, phonetic symbol encoding and decoding refers to encoding and decoding the phonetic symbol, tone, and duration, and the target parameter indicates the signal state of the communication environment in which the second terminal device is currently located; and receiving a second response message sent by the first terminal device, where the second response message indicates that the first terminal device uses phonetic symbol encoding and decoding.
In a possible implementation, the method further includes: receiving a third request message sent by the first terminal device, where the third request message instructs the second terminal device to use waveform encoding and decoding; and if it is determined that the target parameter satisfies a second preset condition, sending a third response message to the first terminal device, where the third response message indicates that the second terminal device uses waveform encoding and decoding, and the target parameter indicates the signal state of the communication environment in which the first terminal device is currently located.
In a possible implementation, the method further includes: if it is determined that the target parameter satisfies the second preset condition, sending a fourth request message to the first terminal device, where the fourth request message instructs the first terminal device to use waveform encoding and decoding; and receiving a fourth response message sent by the first terminal device, where the fourth response message indicates that the first terminal device uses waveform encoding and decoding.
In a possible implementation, before receiving the first information sent by the first terminal device, the method further includes: receiving first capability information sent by the first terminal device, where the first capability information indicates that the first terminal device supports phonetic symbol encoding and decoding, and sending first capability response information to the first terminal device, where the first capability response information indicates that the second terminal device supports phonetic symbol encoding and decoding, and phonetic symbol encoding and decoding refers to encoding and decoding the phonetic symbol, tone, and duration; or, sending second capability information to the first terminal device, where the second capability information indicates that the second terminal device supports phonetic symbol encoding and decoding, and receiving second capability response information sent by the first terminal device, where the second capability response information indicates that the first terminal device supports phonetic symbol encoding and decoding.
In a possible implementation, the method further includes: displaying a settings interface; receiving an operation performed by the user in the settings interface; and in response to the operation, enabling the phonetic symbol encoding and decoding function, where phonetic symbol encoding and decoding refers to encoding and decoding the phonetic symbol, tone, and duration.
According to a third aspect, an apparatus is provided, including units or means for performing the steps in any of the foregoing aspects.
According to a fourth aspect, a terminal device is provided, including a processor, a memory, and a transceiver, where the transceiver is configured to communicate with other devices, and the processor is configured to invoke a program stored in the memory to perform the method provided in any of the foregoing aspects.
According to a fifth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions, and when the instructions are run on a computer or a processor, the method provided in any of the foregoing aspects is implemented.
According to a sixth aspect, a program product is provided. The program product includes a computer program stored in a readable storage medium. At least one processor of a device can read the computer program from the readable storage medium, and the at least one processor executes the computer program so that the device implements the method provided in any of the foregoing aspects.
In any of the foregoing aspects, in a possible implementation, the encoded information includes a first information component corresponding to the phonetic symbol and the tone, and a second information component corresponding to the duration.
In a possible implementation, the first information component includes a first information sub-component corresponding to the phonetic symbol and a second information sub-component corresponding to the tone.
In a possible implementation, the coding table includes a global index table and a common index table. The common index table is generated according to the number of times the user uses each single character within a preset time period. The global index table includes phonetic symbols and the global index value of each phonetic symbol. Each phonetic symbol included in the common index table has a common index value and the global index value of that phonetic symbol in the global index table.
In a possible implementation, the target parameter includes at least one of the following: location information of the terminal device, the cell identifier of the cell currently accessed by the terminal device, the signal strength of the signal received by the terminal device, or the voice packet loss rate.
Brief Description of the Drawings
FIG. 1 is a diagram of an application scenario to which an embodiment of the present application is applicable;
FIG. 2 is a schematic diagram of the principle of a voice call between terminal devices;
FIG. 3 is a schematic diagram of call quality in a weak signal environment when traditional speech encoding and decoding is used;
FIG. 4 is a schematic diagram of call quality in a weak signal environment according to an embodiment of the present application;
FIG. 5 is a message interaction diagram of the voice processing method provided by an embodiment of the present application;
FIG. 6 is another message interaction diagram of the voice processing method provided by an embodiment of the present application;
FIG. 7 is yet another message interaction diagram of the voice processing method provided by an embodiment of the present application;
FIG. 8 is yet another message interaction diagram of the voice processing method provided by an embodiment of the present application;
FIG. 9 is yet another message interaction diagram of the voice processing method provided by an embodiment of the present application;
FIG. 10 is a diagram of an interface for setting the phonetic symbol encoding and decoding mode according to an embodiment of the present application;
FIG. 11 is yet another message interaction diagram of the voice processing method provided by an embodiment of the present application;
FIG. 12 is yet another message interaction diagram of the voice processing method provided by an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a terminal device provided by an embodiment of the present application;
FIG. 14 is another schematic structural diagram of a terminal device provided by an embodiment of the present application.
Detailed Description
The embodiments of the present application are described below with reference to the accompanying drawings.
By way of example, FIG. 1 is a diagram of an application scenario to which an embodiment of the present application is applicable. As shown in FIG. 1, user A uses terminal device 100, user B uses terminal device 200, and user A and user B can conduct a voice call. The embodiments of the present application do not limit the type of terminal device. Examples of terminal devices include mobile phones, tablet computers, handheld computers, and wearable devices.
FIG. 2 is a schematic diagram of the principle of a voice call between terminal devices. As shown in FIG. 1 and FIG. 2, when user A speaks and user B listens, terminal device 100 is the voice sender and terminal device 200 is the voice receiver. Terminal device 100 acquires the speech signal input by user A and performs speech encoding on it to generate encoded information. The encoded information is transmitted over the channel and received by terminal device 200. Terminal device 200 performs speech decoding on the received information, restores the speech signal, and outputs it to user B.
It should be noted that FIG. 2 shows only the speech encoding and decoding part of the voice call process and does not limit the other processing steps.
The concepts used in the embodiments of the present application are described below.
1. Speech coding
Speech coding has a broad meaning and a narrow meaning. In the broad sense, it refers to a coding method that includes speech encoding at the sender and speech decoding at the receiver. In the narrow sense, it refers only to speech encoding at the sender. To distinguish the two, coding in the broad sense is referred to as encoding and decoding (codec) in the embodiments of the present application.
The purpose of speech encoding and decoding is to digitize the speech signal, compress the transmission bandwidth of the speech signal, and improve the transmission rate of the channel.
Speech encoding and decoding can be implemented in many ways, for example, traditional approaches such as waveform encoding and decoding, feature encoding and decoding, and parameter encoding and decoding, as well as the phonetic symbol encoding and decoding of the embodiments of the present application.
For ease of description, the embodiments of the present application use waveform encoding and decoding as the example of a traditional speech encoding and decoding method.
2. Weak signal environment
When a terminal device is in a weak signal environment, the signal quality is poor. If a traditional speech encoding and decoding method is used, the transmitted speech bit rate drops, the speech signal restored at the receiving end is heavily distorted, and the call becomes intermittent, noisy, or even silent.
It should be noted that, in the embodiments of the present application, a terminal device being in a weak signal environment means that either of the two terminal devices in a call is in a weak signal environment. Taking terminal device 100 in FIG. 1 as an example, terminal device 100 being in a weak signal environment covers the following three scenarios: scenario 1, terminal device 100 is in a weak signal environment; scenario 2, terminal device 200, which is in a call with terminal device 100, is in a weak signal environment; scenario 3, both terminal device 100 and terminal device 200 are in a weak signal environment.
The terminal device may acquire its own target parameter and determine whether it is in a weak signal environment according to whether the target parameter satisfies a preset condition. In the embodiments of the present application, when the target parameter satisfies the first preset condition, it is determined that the terminal device is in a weak signal environment; when the target parameter satisfies the second preset condition, it is determined that the terminal device is not in a weak signal environment. Different target parameters correspond to different first and second preset conditions. Optionally, the target parameter may include at least one of the following: location information of the terminal device, the cell identifier of the cell currently accessed by the terminal device, the signal strength of the signal received by the terminal device, or the voice packet loss rate.
Optionally, the target parameter is the location information of the terminal device, and a weak-signal geographic range may be recorded in advance. When the location of the terminal device is within the weak-signal geographic range, it is determined that the terminal device is in a weak signal environment. Conversely, when the location of the terminal device is not within the weak-signal geographic range, it is determined that the terminal device is not in a weak signal environment. The embodiments of the present application do not limit the weak-signal geographic range; examples are areas where base stations are difficult to deploy, such as some mountainous areas and bridges.
Optionally, the target parameter is the cell identifier of the cell currently accessed by the terminal device, and weak-signal cell identifiers may be recorded in advance. When the cell identifier of the cell currently accessed by the terminal device is a weak-signal cell identifier, it is determined that the terminal device is in a weak signal environment. Conversely, when the cell identifier of the cell currently accessed by the terminal device is not a weak-signal cell identifier, it is determined that the terminal device is not in a weak signal environment. The embodiments of the present application do not limit the weak-signal cell identifiers; for example, in linearly distributed scenarios such as high-speed railways, subways, or expressways, coverage blind spots are relatively fixed, and the identifiers of the cells with poor signal there may be used.
Optionally, the target parameter is the signal strength of the signal received by the terminal device. A first threshold and a second threshold may be preset, where the second threshold is greater than or equal to the first threshold. When the signal strength of the signal received by the terminal device is less than or equal to the first threshold, it is determined that the terminal device is in a weak signal environment. When the signal strength is greater than or equal to the second threshold, it is determined that the terminal device is not in a weak signal environment. The embodiments of the present application do not limit the values of the first threshold and the second threshold.
Optionally, the target parameter is the voice packet loss rate of the terminal device. A third threshold and a fourth threshold may be preset, where the third threshold is greater than or equal to the fourth threshold. When the voice packet loss rate is greater than or equal to the fourth threshold, it is determined that the terminal device is in a weak signal environment. When the voice packet loss rate is less than or equal to the third threshold, it is determined that the terminal device is not in a weak signal environment. The embodiments of the present application do not limit the values of the third threshold and the fourth threshold.
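The signal-strength variant of this check can be sketched as follows. This is only an illustrative sketch, not the method of the application: the class name, the dBm unit, and both threshold values are hypothetical, and keeping the previous state when the signal strength lies between the two thresholds is an assumption, since the text defines only the two boundary conditions.

```python
from dataclasses import dataclass


@dataclass
class WeakSignalDetector:
    """Tracks the weak-signal state from received signal strength (illustrative)."""

    first_threshold_dbm: float = -110.0   # hypothetical: at or below this -> weak signal
    second_threshold_dbm: float = -100.0  # hypothetical: at or above this -> not weak
    weak: bool = False

    def __post_init__(self) -> None:
        # The text requires the second threshold to be >= the first threshold.
        if self.second_threshold_dbm < self.first_threshold_dbm:
            raise ValueError("second threshold must be >= first threshold")

    def update(self, signal_strength_dbm: float) -> bool:
        """Feed one measurement and return True if the device is in a weak signal environment."""
        if signal_strength_dbm <= self.first_threshold_dbm:
            self.weak = True       # first preset condition satisfied
        elif signal_strength_dbm >= self.second_threshold_dbm:
            self.weak = False      # second preset condition satisfied
        # Between the thresholds neither condition holds; the previous
        # state is kept (an assumption, not stated in the text).
        return self.weak
```

Choosing a second threshold strictly greater than the first leaves a band in which the state does not change, which would avoid rapid switching between codecs when the signal hovers near a single boundary.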
3. Phonetic symbol encoding and decoding
In the embodiments of the present application, phonetic symbol encoding and decoding refers to encoding and decoding the phonetic symbol, tone, and duration of a single character according to a coding table. The embodiments of the present application do not limit the name of phonetic symbol encoding and decoding; for example, it may also be called weak-signal high-definition speech coding.
A phonetic symbol may correspond to a single character, or to a word, a sentence, or another unit. The embodiments of the present application use the single character as the unit for illustration. Each single character has three pieces of information: a phonetic symbol, a tone, and a duration.
The embodiments of the present application do not limit how durations are classified. For example, the duration may fall into three classes: short, medium, and long. Short means less than 0.5 seconds, medium means greater than or equal to 0.5 seconds and less than 2 seconds, and long means greater than or equal to 2 seconds. As another example, the duration may fall into four classes numbered 1 to 4: 1 means less than 0.5 seconds, 2 means greater than or equal to 0.5 seconds and less than 1 second, 3 means greater than or equal to 1 second and less than 2 seconds, and 4 means greater than or equal to 2 seconds.
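The three-class example above can be written as a small function. This is a minimal sketch under the boundaries given in the example (0.5 seconds and 2 seconds); the function name is hypothetical.

```python
def classify_duration(seconds: float) -> str:
    """Map a measured duration to the short/medium/long classes of the example."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    if seconds < 0.5:
        return "short"    # less than 0.5 s
    if seconds < 2.0:
        return "medium"   # at least 0.5 s and less than 2 s
    return "long"         # 2 s or more
```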
At the voice sender, the phonetic symbol, tone, and duration of each single character are encoded according to the coding table to generate encoded information of a preset bit length. The embodiments of the present application do not limit the preset bit length; for example, it may be 22 bits. The encoded information is transmitted over the channel to the voice receiver, which decodes the received information according to the coding table to obtain the phonetic symbol, tone, and duration of the single character.
Phonetic symbols and tones are defined differently in different languages, which is not limited in the embodiments of the present application. For example, in Chinese, a phonetic symbol is the pinyin of a Chinese character. "你好" ("hello") corresponds to two phonetic symbols: the pinyin "ni" of "你" and the pinyin "hao" of "好". Chinese has four tones: the level tone (阴平, first tone), the rising tone (阳平, second tone), the falling-rising tone (上声, third tone), and the falling tone (去声, fourth tone). As another example, in English, a phonetic symbol is an English word. "Good morning" corresponds to two phonetic symbols: "good" and "morning". Tones in English may include, but are not limited to, at least two of the following: affirmative, interrogative, rising, falling, rise-fall, fall-rise, level, high, and low.
4. Coding table, encoded information, global index table, and common index table
The coding table is stored in the terminal device and holds the correspondence between phonetic symbols, tones, and durations on the one hand and encoded information on the other. The terminal device performs phonetic symbol encoding or phonetic symbol decoding by looking up the coding table. Optionally, the correspondence between phonetic symbols, tones, and durations and the encoded information may include at least one of the following: a correspondence between phonetic symbols and encoded information; a correspondence between tones and encoded information; a correspondence between durations and encoded information; a correspondence between combinations of phonetic symbol and tone and encoded information; or a correspondence between combinations of phonetic symbol, tone, and duration and encoded information. Depending on the correspondences used, the coding table may consist of one table or at least two tables, and the encoded information of the preset bit length may include information components corresponding to different combinations of phonetic symbol, tone, and duration.
Optionally, the encoded information may include a first information component corresponding to the phonetic symbol and the tone, and a second information component corresponding to the duration.
Optionally, the first information component includes a first information sub-component corresponding to the phonetic symbol and a second information sub-component corresponding to the tone. That is, the phonetic symbol, the tone, and the duration each have their own information component.
Optionally, considering that a user's commonly used vocabulary is limited, the coding table may include a global index table and a common index table in order to save table lookup time and improve the efficiency of phonetic symbol encoding and decoding. The global index table is used by the voice sender for encoding and by the voice receiver for decoding; the common index table is used by the voice sender for encoding. The global index table includes phonetic symbols and the global index value of each phonetic symbol. It can be understood as the complete set of single characters or phonetic symbols in a given language, so the value range of the global index value is large. The common index table is generated according to the number of times the user uses each single character within a preset time period. The embodiments of the present application do not limit the length of the preset time period; for example, it may be 3 months, 6 months, or 1 year. Each phonetic symbol included in the common index table has a common index value and the global index value of that phonetic symbol in the global index table.
When performing phonetic symbol encoding according to the coding table, the terminal device may look up the common index table first. Because the common index table is generated from the user's commonly used characters, it contains few common index values and the lookup time is short, which improves the efficiency of phonetic symbol encoding and decoding. If no result is found in the common index table, the terminal device then looks up the global index table to perform the encoding.
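The two-level lookup described above can be sketched as follows. The table contents, the key layout (pinyin and tone), and the function name are illustrative assumptions, not the actual tables of the application. Python dictionaries have near-constant lookup time, so the sketch shows only the order of the lookups, not the time saving that a smaller table would bring in a real implementation.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # (phonetic symbol, tone), an assumed key layout

# Illustrative global index table: every known key has a global index value.
GLOBAL_INDEX: Dict[Key, int] = {
    ("hao", 3): 0x00001,
    ("huai", 4): 0x00002,
    ("duo", 1): 0x00003,
    ("shao", 3): 0x00004,
}

# Illustrative common index table: a small subset of keys, each carrying
# its common index value and its global index value.
COMMON_INDEX: Dict[Key, Tuple[int, int]] = {
    ("hao", 3): (0x0001, 0x00001),
    ("duo", 1): (0x0002, 0x00003),
}


def lookup_global_index(key: Key) -> int:
    """Return the global index value, trying the common index table first."""
    entry = COMMON_INDEX.get(key)
    if entry is not None:
        _common_value, global_value = entry
        return global_value
    # Not a commonly used entry: fall back to the global index table.
    # A key missing from both tables raises KeyError.
    return GLOBAL_INDEX[key]
```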
The embodiments of the present application do not limit the value range of the global index value or the ordering of the phonetic symbols. For example, commonly used characters may be placed first.
The embodiments of the present application do not limit the value range of the common index value or the ordering of the phonetic symbols. For example, the number of phonetic symbols may be 5000 and the common index value may be 13 bits long, representing at most 8192 phonetic symbols; the entries may be sorted in descending order of how often each single character was used within the preset time period.
Optionally, the global index table and the common index table may be updated periodically. The embodiments of the present application do not limit the update period.
The following uses Chinese as the example language to illustrate the coding table, the encoded information, the global index table, and the common index table.
Optionally, in one implementation, the coding table includes a Chinese global index table, a duration table, and a common index table. The encoded information includes a first information component and a second information component. The Chinese global index table indicates the correspondence between combinations of single character, phonetic symbol, and tone and the first information component (the global index value) of the encoded information. The global index value may be 20 bits long, representing at most about 1.04 million single characters. The duration table indicates the correspondence between the duration and the second information component of the encoded information. The common index table includes the single character, phonetic symbol, tone, duration, global index value, and common index value. This is illustrated with Tables 1 to 3. Assume that the encoded information is 22 bits long, of which the first information component (the global index value) is 20 bits and the second information component is 2 bits. As shown in Table 1, the Chinese global index table can be understood as the complete set of Chinese single characters, each of which has a global index value. The tone takes the values 1 to 4, representing the first to fourth tones respectively. As shown in Table 2, the duration has three classes, short, medium, and long; the corresponding index value is 2 bits long and is the second information component. As shown in Table 3, a common index value uniquely identifies a combination of single character, phonetic symbol, tone, and duration.
Table 1: Chinese global index table
Single character | Phonetic symbol | Tone | Global index value (20 bit)
好 (good) | hao | 3 | 0x00001
坏 (bad) | huai | 4 | 0x00002
多 (many) | duo | 1 | 0x00003
少 (few) | shao | 3 | 0x00004
Table 2: Duration table
Duration | Index value (2 bit, binary)
Short | 00
Medium | 01
Long | 10
Table 3: Common index table
Single character | Phonetic symbol | Tone | Duration | Global index value (20 bit) | Common index value (13 bit)
好 (good) | hao | 3 | Short | 0x00001 | 0x0001
好 (good) | hao | 3 | Medium | 0x00001 | 0x0002
坏 (bad) | huai | 4 | Long | 0x00002 | 0x0003
多 (many) | duo | 1 | Short | 0x00003 | 0x0004
少 (few) | shao | 3 | Long | 0x00004 | 0x0005
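The 22-bit layout assumed in this implementation (a 20-bit global index value plus a 2-bit duration index) can be sketched as a pair of pack and unpack helpers. The duration codes follow Table 2. The bit order, with the first information component in the high 20 bits, is an assumption, since the text does not specify it, and the function names are hypothetical.

```python
from typing import Tuple

# Duration codes from Table 2 (binary 11 is not assigned there).
DURATION_CODE = {"short": 0b00, "medium": 0b01, "long": 0b10}
DURATION_NAME = {code: name for name, code in DURATION_CODE.items()}


def pack(global_index: int, duration: str) -> int:
    """Combine a 20-bit global index value and a 2-bit duration code into 22 bits."""
    if not 0 <= global_index < (1 << 20):
        raise ValueError("global index value must fit in 20 bits")
    # Assumed layout: global index in the high 20 bits, duration in the low 2 bits.
    return (global_index << 2) | DURATION_CODE[duration]


def unpack(code: int) -> Tuple[int, str]:
    """Split 22-bit encoded information into (global index value, duration class)."""
    if not 0 <= code < (1 << 22):
        raise ValueError("encoded information must fit in 22 bits")
    duration_code = code & 0b11
    if duration_code not in DURATION_NAME:
        raise ValueError("unassigned duration code")
    return code >> 2, DURATION_NAME[duration_code]
```

With the sample entries of Table 1, `pack(0x00001, "medium")` would encode 好 (hao, third tone) spoken with medium duration, and `unpack` at the receiver would recover the global index value for lookup in the global index table.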
Optionally, in another implementation, the coding table includes a Chinese global index table, a duration table, and a common index table. The encoded information includes a first information component and a second information component. For the Chinese global index table, see Table 1; for the duration table, see Table 2; for the common index table, see Table 4. As shown in Table 4, a common index value uniquely identifies a combination of single character, phonetic symbol, and tone.
In this implementation, the encoded information includes a first information component corresponding to the phonetic symbol and the tone, and a second information component corresponding to the duration. Because the duration is encoded separately, the common index table does not need to distinguish between durations, which further reduces the number of common index values and speeds up lookups in the common index table.
Table 4: Common index table
Single character | Phonetic symbol | Tone | Global index value (20 bit) | Common index value (13 bit)
好 (good) | hao | 3 | 0x00001 | 0x0001
坏 (bad) | huai | 4 | 0x00002 | 0x0002
多 (many) | duo | 1 | 0x00003 | 0x0003
少 (few) | shao | 3 | 0x00004 | 0x0004
Optionally, in yet another implementation, the coding table includes a Chinese global index table, a tone table, a duration table, and a common index table. The encoded information includes a first information sub-component, a second information sub-component, and a second information component. As shown in Table 5, the Chinese global index table indicates the correspondence between phonetic symbols and the first information sub-component (the global index value) of the encoded information. As shown in Table 6, the tone table indicates the correspondence between tones and the second information sub-component of the encoded information (the 2-bit index value in Table 6). The tone takes the values 1 to 4, representing the first to fourth tones respectively. For the duration table, see Table 2. As shown in Table 7, a common index value uniquely identifies a phonetic symbol.
In this implementation, the phonetic symbol, the tone, and the duration are encoded and decoded separately, and differences between individual characters are not considered. This further reduces the number of common index values and the number of global index values, speeds up lookups in the common index table and the global index table, and improves encoding and decoding efficiency.
Table 5: Chinese global index table
Phonetic symbol | Global index value (18 bit)
hao | 0x00001
huai | 0x00002
duo | 0x00003
shao | 0x00004
Table 6: Tone table
Tone | Index value (2 bit, binary)
1 | 00
2 | 01
3 | 10
4 | 11
Table 7: Common index table
Phonetic symbol | Global index value (18 bit) | Common index value (12 bit)
hao | 0x00001 | 0x0001
huai | 0x00002 | 0x0002
duo | 0x00003 | 0x0003
shao | 0x00004 | 0x0004
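With the phonetic symbol, tone, and duration encoded separately, one unit of encoded information can be sketched as three bit fields: an 18-bit global index value (Table 5), a 2-bit tone index (Table 6), and a 2-bit duration index (Table 2). The field order used below is an assumption, since the text does not specify it, and the function names are hypothetical.

```python
from typing import Tuple

# Sample entries from Table 5 (phonetic symbol -> 18-bit global index value).
PHONETIC_INDEX = {"hao": 0x00001, "huai": 0x00002, "duo": 0x00003, "shao": 0x00004}
PHONETIC_NAME = {value: name for name, value in PHONETIC_INDEX.items()}
# Duration codes from Table 2 (binary 11 is not assigned there).
DURATION_CODE = {"short": 0b00, "medium": 0b01, "long": 0b10}
DURATION_NAME = {code: name for name, code in DURATION_CODE.items()}


def encode(phonetic: str, tone: int, duration: str) -> int:
    """Build 22-bit encoded information: [18-bit phonetic | 2-bit tone | 2-bit duration]."""
    if tone not in (1, 2, 3, 4):
        raise ValueError("tone must be 1 to 4")
    tone_code = tone - 1  # Table 6: tones 1..4 map to 00, 01, 10, 11
    return (PHONETIC_INDEX[phonetic] << 4) | (tone_code << 2) | DURATION_CODE[duration]


def decode(code: int) -> Tuple[str, int, str]:
    """Recover (phonetic symbol, tone, duration class) from the encoded information."""
    duration_code = code & 0b11
    if duration_code not in DURATION_NAME:
        raise ValueError("unassigned duration code")
    tone = ((code >> 2) & 0b11) + 1
    return PHONETIC_NAME[code >> 4], tone, DURATION_NAME[duration_code]
```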
It should be noted that when multiple languages are supported, each language has its own global index table. Optionally, the multiple languages may include, but are not limited to, at least two of the following: Chinese, English, German, French, Japanese, Korean, or dialects.
For example, the coding table includes three global index tables: a Chinese global index table, an English global index table, and a dialect global index table. The Chinese global index table may include 380,000 phonetic symbols, of which 100,000 may be commonly used, with the commonly used phonetic symbols placed first. The English global index table may include 280,000 phonetic symbols, of which 35,000 may be commonly used, with the commonly used phonetic symbols placed first. The dialect global index table may include 100,000 phonetic symbols.
It should be noted that the embodiments of the present application do not limit the value range of the global index value in each global index table. Optionally, all global index tables may share a single numbering space, which makes lookups fast during phonetic symbol encoding and decoding. For example, the coding table includes a Chinese global index table and a dialect global index table. If the global index values in the Chinese global index table range from 1 to 100, allowing at most 100 phonetic symbols, the global index values in the dialect global index table may be numbered starting from 101.
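The unified numbering in the example above can be sketched as assigning each global index table a consecutive, non-overlapping range of global index values. The function names and the table sizes passed in are illustrative assumptions.

```python
from typing import Dict


def assign_ranges(capacities: Dict[str, int], start: int = 1) -> Dict[str, range]:
    """Give each global index table a consecutive block of global index values."""
    ranges: Dict[str, range] = {}
    next_value = start
    for name, capacity in capacities.items():  # dicts keep insertion order
        ranges[name] = range(next_value, next_value + capacity)
        next_value += capacity
    return ranges


def table_for(index_value: int, ranges: Dict[str, range]) -> str:
    """Find which global index table a global index value belongs to."""
    for name, value_range in ranges.items():
        if index_value in value_range:
            return name
    raise KeyError(index_value)
```

With `assign_ranges({"chinese": 100, "dialect": 50})`, the Chinese table would occupy values 1 to 100 and the dialect table would start at 101, so a receiver could identify the table from the index value alone.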
Optionally, to ensure the efficiency of phonetic symbol encoding and decoding, the total number of phonetic symbols across all global index tables is less than a preset value. The embodiments of the present application do not limit the preset value; for example, it may be 1 million.
5. The terminal device supports phonetic symbol encoding and decoding
Optionally, in one implementation, the terminal device supports phonetic symbol encoding and decoding by default. There is no related setting switch and no user setting is required, so the terminal device supports phonetic symbol encoding and decoding.
Optionally, in another implementation, the terminal device has the phonetic symbol encoding and decoding function together with a related setting switch that the user must set. In this case, "the terminal device supports phonetic symbol encoding and decoding" means that the function is currently enabled on the terminal device through a user setting. If the function is currently disabled, the terminal device does not support phonetic symbol encoding and decoding. The embodiments of this application do not limit how the user enables or disables the function. For example, any of the following may be used: voice control, a preset gesture, or a touch operation in the relevant interface.
At present, a terminal device in a voice call usually uses a traditional voice codec, for example waveform coding or feature coding. When the signal in the communication environment of the terminal device deteriorates, the transmitted voice bit rate drops and the waveform features become sparser. As a result, the waveform restored at the receiving end is heavily distorted relative to the original waveform, the call becomes intermittent, noisy, or even silent, and the call quality is very poor. Exemplarily, FIG. 3 is a schematic diagram of the call effect when a traditional voice codec is used in a weak signal environment. As shown in FIG. 3, user A and user B are currently in a weak signal environment; for user B, the call is intermittent and the call quality is very poor.
In the voice processing method provided by the embodiments of this application, when either of the two terminal devices in a voice call is in a weak signal environment, the voice sender encodes the phonetic symbol, tone, and duration of each word spoken by the user to obtain encoded information of a preset bit length. After the encoded information is transmitted over the channel, the voice receiver decodes the received information to obtain the phonetic symbol, tone, and duration of each word, generates a complete and fluent voice signal, and plays it with a preset voice. The method can therefore transmit at an extremely low bit rate in a weak signal environment. Moreover, because the pronunciation and duration of each word are encoded, the receiver can clearly restore the pronunciation of the sending user and obtain the meaning from the played pronunciation. This solves the problem of intermittent, noisy, or even silent calls and improves call quality. Exemplarily, FIG. 4 is a schematic diagram of the call effect provided by an embodiment of this application in a weak signal environment. As shown in FIG. 4, user A and user B are currently in a weak signal environment. User B can hear the pinyin pronunciation played by the terminal device with the preset voice, which clearly restores the pronunciation of the sender, user A. User B can then judge the meaning from the pronunciation and understand user A's intention, which improves call quality.
Some application scenarios are illustrated below.
Optionally, in one application scenario, user A and user B know each other well. When user A or user B is in a weak signal environment and needs to make a call, the voice processing method provided by the embodiments of this application relies on fuzzy matching of homophones or similar pronunciations. The receiving user does not need the exact words uttered by the sound source to be recognized; relying on the familiarity between the two parties, the receiving user can accurately understand what the other party really wants to say from the pronunciation played by the terminal device, which improves call quality.
Optionally, in another application scenario, user A is in an emergency or dangerous environment with a poor signal. After user A connects a call with user B, the voice processing method provided by the embodiments of this application allows user A to convey short, important words to user B. User B can accurately understand user A's intention from the pronunciation played by the terminal device, which improves call quality.
The technical solutions of this application are described in detail below through specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
The terms "first", "second", "third", "fourth", and so on (if any) in the embodiments of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
FIG. 5 is a message interaction diagram of the voice processing method provided by an embodiment of this application. This embodiment involves a first terminal device and a second terminal device that are in a call; the first terminal device is the voice sender and the second terminal device is the voice receiver. As shown in FIG. 5, the voice processing method provided in this embodiment may include:
S501. The first terminal device acquires a voice signal input by a user.
For example, in FIG. 4, the voice signal input by the user is the voice signal obtained after the first terminal device processes user A's speech "你好,我在山上" ("Hello, I'm on the mountain").
S502. When the first terminal device determines that it is currently in a weak signal environment and that both the first terminal device and the second terminal device support phonetic symbol encoding and decoding, it extracts multiple groups of information from the voice signal. Each group of information includes the phonetic symbol, tone, and duration of a single word.
Here, the first terminal device determining that it is currently in a weak signal environment may cover any of the following: the first terminal device is in a weak signal environment, the second terminal device is in a weak signal environment, or both are in a weak signal environment. For the weak signal environment and for how to judge whether a terminal device is in one, refer to the description above; details are not repeated here.
For what it means for a terminal device to support phonetic symbol encoding and decoding, refer to the description above; details are not repeated here.
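The condition in S502 can be written as a small predicate. The sketch below is illustrative only; the function name and the four boolean inputs are assumptions, and how each flag is obtained (signal measurement, capability negotiation) is outside this snippet.

```python
def should_use_phonetic_codec(first_weak, second_weak,
                              first_supports, second_supports):
    """Return True when S502 would switch to phonetic symbol encoding."""
    # "Currently in a weak signal environment" holds if either side is weak.
    in_weak_signal = first_weak or second_weak
    # Both parties in the call must support the phonetic symbol codec.
    both_support = first_supports and second_supports
    return in_weak_signal and both_support
```

For instance, a weak signal at the receiver alone is enough to trigger the switch, provided both devices support the codec.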
Optionally, the multiple groups of information may be extracted from the voice signal by waveform comparison. Specifically, the waveform of the voice signal is divided into segments. For each segment, the waveform or its waveform features are compared with the locally pre-stored waveforms or waveform features of all words, and the word with the highest similarity is determined as the word corresponding to that segment. Optionally, the waveform of the voice signal may be segmented word by word or by a specified length. This embodiment does not limit the specified length; for example, it may be 1 second.
Because the phonetic symbol encoding and decoding used in the embodiments of this application only requires the pronunciation of the words to be close and does not need the meaning of the words to match, waveform comparison can improve the efficiency of extracting the multiple groups of information.
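The matching step can be sketched as a nearest-template search. The text does not specify a similarity measure, so cosine similarity over short feature vectors is used here purely as an assumption; the function names and the toy templates are likewise hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_matching_word(segment, templates):
    """Return the stored word whose template is most similar to the segment."""
    return max(templates, key=lambda word: cosine_similarity(segment, templates[word]))

# Hypothetical pre-stored waveform features for two words.
templates = {"ni": [1.0, 0.0, 0.0], "hao": [0.0, 1.0, 0.0]}
matched = best_matching_word([0.9, 0.1, 0.0], templates)
```

Only closeness of pronunciation matters here, so the search returns the best-scoring entry without any check on meaning.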
Optionally, the multiple groups of information may be extracted from the voice signal by a neural network model or a machine learning model. Optionally, the model is used for semantic recognition and outputs the corresponding words for the input voice signal. Optionally, the model is used for speech recognition and outputs the corresponding phonetic symbols and tones for the input voice signal.
Optionally, the duration of a single word may be determined by energy detection. For example, duration levels and an energy threshold for each level may be preset, and the duration of the word is determined by comparison with the multiple energy thresholds. Exemplarily, the duration levels are shown in Table 2.
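One way to read the threshold comparison is sketched below. Table 2 is not reproduced in this section, so the snippet returns a numeric level only; the threshold values, the use of summed squared amplitude as the energy measure, and the function name are all assumptions.

```python
from bisect import bisect_right

def duration_level(samples, thresholds):
    """Map a word's signal energy to a duration level index.

    thresholds must be sorted in ascending order; the result is the
    number of thresholds that the energy meets or exceeds.
    """
    energy = sum(s * s for s in samples)
    return bisect_right(thresholds, energy)

# Hypothetical thresholds separating four levels (0 to 3).
THRESHOLDS = [1.0, 4.0, 9.0]
level = duration_level([1.0, 1.0, 1.0], THRESHOLDS)
```

Three thresholds yield four levels, which matches a 2-bit duration index.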
For example, in FIG. 4, six groups of information can be extracted from the voice signal: the phonetic symbol, tone, and duration of each of the words "你", "好", "我", "在", "山", and "上". For the word "你", the phonetic symbol is ni, the tone is the third tone, and the duration is assumed to be short. For the word "好", the phonetic symbol is hao, the tone is the third tone, and the duration is assumed to be medium.
S503. The first terminal device obtains the encoded information corresponding to each group of information according to the coding table.
The coding table stores the correspondence between phonetic symbols, tones, and durations on the one hand and encoded information on the other. The encoded information has a preset bit length; refer to the description above, which is not repeated here.
S504. The first terminal device sends the encoded information to the second terminal device.
Correspondingly, the encoded information is received by the second terminal device after being transmitted over the channel. In this embodiment, the information that the second terminal device receives from the channel is called the first information.
For example, in FIG. 4, assume that the preset bit length of the encoded information is 22 bits. The first terminal device sends six pieces of encoded information to the second terminal device, corresponding to the words "你", "好", "我", "在", "山", and "上", for a total of 22 × 6 = 132 bits. Correspondingly, after the encoded information is transmitted over the channel, the second terminal device can receive first information that is 132 bits long.
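The arithmetic above can be checked with a short packing sketch. The code values 1 to 6 are placeholders, not real table entries, and representing the payload as a string of '0' and '1' characters is an assumption made for readability.

```python
CODE_BITS = 22  # preset bit length assumed in the FIG. 4 example

def pack_codes(codes, code_bits=CODE_BITS):
    """Concatenate fixed-width codes into one bit string."""
    for code in codes:
        if not 0 <= code < (1 << code_bits):
            raise ValueError(f"code {code} does not fit in {code_bits} bits")
    return "".join(format(code, f"0{code_bits}b") for code in codes)

# Six placeholder codes, one per word of the example sentence.
payload = pack_codes([1, 2, 3, 4, 5, 6])
payload_bits = len(payload)
```

Because every code has the same width, the receiver needs no delimiters to find word boundaries.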
S505. The second terminal device decodes the first information according to the coding table to obtain multiple groups of information. Each group of information includes the phonetic symbol, tone, and duration of a single word.
Optionally, decoding the first information according to the coding table to obtain multiple groups of information may include:
Obtaining multiple pieces of encoded information in sequence from the first information, where each piece of encoded information has the preset bit length.
For each piece of encoded information, obtaining the corresponding phonetic symbol, tone, and duration according to the coding table.
For example, in FIG. 4, the first information is 132 bits long. The first 22-bit piece of encoded information is obtained from the first information, and the phonetic symbol, tone, and duration of the corresponding word are obtained according to the coding table. Then the second 22-bit piece of encoded information is obtained from the first information, and the phonetic symbol, tone, and duration of the corresponding word are obtained according to the coding table. This continues until the first information has been fully decoded and the phonetic symbols, tones, and durations of the six words have been obtained.
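The sequential extraction can be sketched as fixed-width slicing. As an assumption for illustration, the first information is modeled as a string of '0' and '1' characters, and the table lookup that follows each slice is omitted.

```python
def split_codes(bitstring, code_bits=22):
    """Split received bits into consecutive fixed-width integer codes."""
    if len(bitstring) % code_bits != 0:
        raise ValueError("length is not a multiple of the preset bit length")
    return [
        int(bitstring[start:start + code_bits], 2)
        for start in range(0, len(bitstring), code_bits)
    ]

# Build a 132-bit example from six placeholder codes, then split it again.
sent = [7, 42, 1000, 3, 99, 5]
first_information = "".join(format(code, "022b") for code in sent)
received = split_codes(first_information)
```

Each resulting integer would then be looked up in the coding table to recover one word's phonetic symbol, tone, and duration.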
S506. The second terminal device generates a voice signal according to the multiple groups of information.
Because each group of information includes the phonetic symbol, tone, and duration of a single word, the pronunciation and duration of each word can be restored, and a complete and fluent voice signal can be synthesized.
S507. The second terminal device plays the voice signal with a preset voice.
This embodiment does not limit the preset voice; for example, it may be a male voice or a female voice.
It can be seen that the voice processing method provided in this embodiment can be applied to two terminal devices conducting a voice call in a weak signal environment. The voice sender extracts and encodes the phonetic symbol, tone, and duration of each word in the user's speech to obtain encoded information of a preset bit length for each word. After the encoded information is transmitted over the channel, the voice receiver decodes the received information to obtain the phonetic symbol, tone, and duration of each word, generates a complete and fluent voice signal, and plays it with a preset voice. Because the encoded information has a preset bit length, it can be transmitted at an extremely low bit rate in a weak signal environment. By encoding and decoding the pronunciation and duration of each word, the receiver can clearly restore the pronunciation of the sending user and obtain the meaning from the played pronunciation. This solves the intermittent, noisy, or even silent calls that occur with traditional voice codecs and improves call quality.
Optionally, in S503, obtaining the encoded information corresponding to each group of information according to the coding table may include:
For each group of information, determining whether the common index table includes the phonetic symbol in that group of information.
If the common index table includes the phonetic symbol in that group of information, obtaining the encoded information according to the common index table.
If the common index table does not include the phonetic symbol in that group of information, obtaining the encoded information according to the global index table.
The common index table is generated from the words the user commonly uses, so the number of common index values in it is far smaller than the number of global index values in the global index table. Searching the common index table first therefore reduces the amount of data to search. Only if the phonetic symbol is not found in the common index table is the global index table searched, which improves encoding efficiency.
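The two-stage lookup can be sketched with two dictionaries. The table contents and index values below are invented for illustration, and using a Python `dict` is an assumption about the data structure, which the text does not specify.

```python
def lookup_global_index(phonetic, common_table, global_table):
    """Return (global index value, table used), searching the common table first."""
    if phonetic in common_table:
        return common_table[phonetic], "common"
    # Fall back to the much larger global index table.
    return global_table[phonetic], "global"

# Hypothetical tables: the common table is a small subset of the global one.
global_table = {"ni": 17, "hao": 23, "shan": 4096}
common_table = {"ni": 17, "hao": 23}

hit = lookup_global_index("ni", common_table, global_table)
miss = lookup_global_index("shan", common_table, global_table)
```

Both paths return the same kind of value, so the rest of the encoder does not need to know which table answered.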
Depending on how the coding table is implemented, phonetic symbol encoding and decoding are performed differently. S503 and S505 are described below through examples.
Optionally, one implementation uses the coding table shown in Tables 1 to 3 above. The encoded information may include a first information component corresponding to the phonetic symbol and tone, and a second information component corresponding to the duration. In S503, for each group of information, the first terminal device may first search the common index table according to the word and its phonetic symbol, tone, and duration. If an entry is found, the corresponding 20-bit global index value is used as the first information component. If no entry is found, the first terminal device searches the Chinese global index table according to the word and its phonetic symbol and tone, and uses the corresponding 20-bit global index value as the first information component. It then searches the duration table according to the duration of the word and uses the corresponding 2-bit index value as the second information component, which yields 22 bits of encoded information. Correspondingly, in S505, the second terminal device obtains 22 bits of encoded information, of which the first 20 bits are the first information component and the last 2 bits are the second information component. It searches the Chinese global index table according to the first 20 bits to obtain the phonetic symbol and tone of the word, and searches the duration table according to the last 2 bits to obtain the duration of the word.
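The 20 + 2 bit layout can be sketched with shifts and masks. The index values used in the example are placeholders, and treating the code as a single integer with the duration in the low bits is an assumption consistent with "first 20 bits, last 2 bits".

```python
INDEX_BITS = 20     # first information component: phonetic symbol and tone
DURATION_BITS = 2   # second information component: duration

def encode_word(global_index, duration_index):
    """Pack a 20-bit global index and a 2-bit duration index into 22 bits."""
    if not 0 <= global_index < (1 << INDEX_BITS):
        raise ValueError("global index does not fit in 20 bits")
    if not 0 <= duration_index < (1 << DURATION_BITS):
        raise ValueError("duration index does not fit in 2 bits")
    return (global_index << DURATION_BITS) | duration_index

def decode_word(code):
    """Split a 22-bit code into (global index, duration index)."""
    return code >> DURATION_BITS, code & ((1 << DURATION_BITS) - 1)

code = encode_word(5, 1)
```

A 20-bit index covers 1,048,576 values, which is enough for the example limit of 1 million phonetic symbols mentioned earlier.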
Optionally, another implementation uses the coding table shown in Table 1, Table 2, and Table 4 above. This implementation differs from the previous one in that, when searching the common index table, the first terminal device searches according to the word and its phonetic symbol and tone only. Because the common index table does not take duration into account, the number of common index values is further reduced, the search is faster, and encoding efficiency is improved.
Optionally, yet another implementation uses the coding table shown in Table 2, Table 5, Table 6, and Table 7 above. The encoded information may include a first information component corresponding to the phonetic symbol and tone, and a second information component corresponding to the duration. The first information component includes a first information subcomponent corresponding to the phonetic symbol and a second information subcomponent corresponding to the tone. In S503, for each group of information, the first terminal device may first search the common index table according to the phonetic symbol of the word. If an entry is found, the corresponding 18-bit global index value is used as the first information subcomponent. If no entry is found, the first terminal device searches the Chinese global index table according to the phonetic symbol of the word and uses the corresponding 18-bit global index value as the first information subcomponent. It then searches the tone table according to the tone of the word and uses the corresponding 2-bit index value as the second information subcomponent. Finally, it searches the duration table according to the duration of the word and uses the corresponding 2-bit index value as the second information component, which yields 18 + 2 + 2 = 22 bits of encoded information. Correspondingly, in S505, the second terminal device obtains 22 bits of encoded information, of which the first 18 bits are the first information subcomponent, the middle 2 bits are the second information subcomponent, and the last 2 bits are the second information component. It searches the Chinese global index table according to the first 18 bits to obtain the phonetic symbol of the word, searches the tone table according to the middle 2 bits to obtain the tone of the word, and searches the duration table according to the last 2 bits to obtain the duration of the word.
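The 18 + 2 + 2 bit layout can be illustrated in the same way. The field values are placeholders, and the helper names are assumptions; only the field widths and their order come from the text.

```python
PHONETIC_BITS, TONE_BITS, DURATION_BITS = 18, 2, 2

def encode_fields(phonetic_index, tone_index, duration_index):
    """Pack phonetic (18 bits), tone (2 bits) and duration (2 bits) into 22 bits."""
    fields = (
        (phonetic_index, PHONETIC_BITS),
        (tone_index, TONE_BITS),
        (duration_index, DURATION_BITS),
    )
    code = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        code = (code << width) | value  # append this field after the previous ones
    return code

def decode_fields(code):
    """Return (phonetic index, tone index, duration index) from a 22-bit code."""
    duration_index = code & 0b11
    tone_index = (code >> DURATION_BITS) & 0b11
    phonetic_index = code >> (TONE_BITS + DURATION_BITS)
    return phonetic_index, tone_index, duration_index

packed = encode_fields(3, 2, 1)
```

Keeping the three fields independent means each lookup table only has to cover its own value range.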
In this implementation, the phonetic symbol, tone, and duration are encoded and decoded separately, so the numbers of common index values and global index values are further reduced, the search is faster, and encoding and decoding efficiency is improved.
Optionally, another embodiment of this application builds on the embodiment shown in FIG. 5 and provides an implementation of determining, in S502, that the terminal device is in a weak signal environment. Through negotiation between the first terminal device and the second terminal device, it is determined that the devices are currently in a weak signal environment and that phonetic symbol encoding and decoding can be used.
Optionally, in one implementation, as shown in FIG. 6, the first terminal device determining that it is currently in a weak signal environment may include:
S601. If the first terminal device determines that its target parameter satisfies a first preset condition, it sends a first request message to the second terminal device, where the first request message is used to instruct the second terminal device to use phonetic symbol encoding and decoding.
The target parameter of the first terminal device indicates the signal state of the communication environment in which the first terminal device is currently located. For the target parameter and the first preset condition, refer to the description above; details are not repeated here.
Correspondingly, the second terminal device receives the first request message.
S602. The second terminal device sends a first response message to the first terminal device, where the first response message indicates that the second terminal device will use phonetic symbol encoding and decoding.
In this implementation, the first terminal device, as the voice sender, determines from its own target parameter that it is currently in a weak signal environment and actively initiates negotiation with the second terminal device to switch the codec mode. This ensures that phonetic symbol encoding and decoding is adopted in time and improves call quality.
Optionally, the first request message may include a first indication field used to indicate that phonetic symbol encoding and decoding is to be used. The embodiments of this application do not limit the name of the first indication field.
Optionally, the first request message may include a hysteresis timer, which indicates the delay before the second terminal device starts using phonetic symbol encoding and decoding.
Usually, a terminal device uses a traditional voice codec by default. Setting a hysteresis timer reserves time for switching the voice codec mode. After the hysteresis timer expires, the first terminal device and the second terminal device start using phonetic symbol encoding and decoding at the same time, which makes the switch between codec modes smoother.
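The role of the hysteresis timer can be sketched as follows. The message layout, the field names `use_phonetic_codec` and `hysteresis_ms`, and the assumption that both devices measure the delay from a shared reference time are all hypothetical; the text only states that the request may carry an indication field and a timer.

```python
from dataclasses import dataclass

@dataclass
class CodecSwitchRequest:
    use_phonetic_codec: bool  # stands in for the first indication field
    hysteresis_ms: int        # delay before the switch takes effect

def active_codec(request, reference_ms, now_ms):
    """Return the codec a device should use at time now_ms."""
    switch_at = reference_ms + request.hysteresis_ms
    if request.use_phonetic_codec and now_ms >= switch_at:
        return "phonetic"
    return "traditional"  # default codec until the timer expires

request = CodecSwitchRequest(use_phonetic_codec=True, hysteresis_ms=200)
before = active_codec(request, reference_ms=1000, now_ms=1100)
after = active_codec(request, reference_ms=1000, now_ms=1200)
```

If both devices evaluate the same request against the same reference time, they change codec at the same instant, which is the effect the timer is meant to achieve.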
Optionally, in another implementation, as shown in FIG. 7, the first terminal device determining that it is currently in a weak signal environment may include:
S701. If the second terminal device determines that its target parameter satisfies the first preset condition, it sends a second request message to the first terminal device, where the second request message is used to instruct the first terminal device to use phonetic symbol encoding and decoding.
The target parameter of the second terminal device indicates the signal state of the communication environment in which the second terminal device is currently located. For the target parameter and the first preset condition, refer to the description above; details are not repeated here.
Correspondingly, the first terminal device receives the second request message.
Optionally, the second request message may include the first indication field; refer to the description of the first indication field above, which is not repeated here.
Optionally, the second request message may include a hysteresis timer, which indicates the delay before the first terminal device starts using phonetic symbol encoding and decoding.
S702. The first terminal device sends a second response message to the second terminal device, where the second response message indicates that the first terminal device will use phonetic symbol encoding and decoding.
In this implementation, the second terminal device, as the voice receiver, determines from its own target parameter that it is currently in a weak signal environment and actively initiates negotiation with the first terminal device to switch the codec mode. This ensures that phonetic symbol encoding and decoding is adopted in time and improves call quality.
Optionally, in the above process, if the first response message or the second response message is not received successfully, it may be retransmitted. Optionally, a number of retransmissions may be set; this embodiment does not limit the specific value.
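A bounded retransmission loop might look like the sketch below. The text does not say which side retries or how a failed reception is detected, so the snippet assumes the requesting device repeats the exchange and that a missing response is represented by `None`; `request_with_retries` and its parameters are hypothetical names.

```python
def request_with_retries(send_request, max_retries):
    """Repeat the exchange until a response arrives or retries are exhausted.

    Returns (response, attempts); response is None if every attempt failed.
    """
    attempts = 0
    while attempts <= max_retries:
        attempts += 1
        response = send_request()
        if response is not None:
            return response, attempts
    return None, attempts

# Simulated channel: two lost responses followed by a successful one.
replies = iter([None, None, "use-phonetic-codec"])
result = request_with_retries(lambda: next(replies), max_retries=3)
```

If the limit is reached without a response, a device could simply stay on the traditional codec.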
可选的,在本申请的又一个实施例中,在上述图5所示实施例的基础上提供了S502中确定第一终端设备与第二终端设备均支持音标编解码的实现方式。通过第一终端设备与第二终端设备之间的能力协商,确定通话双方均支持音标编解码,当前处于弱信号环境时可以采用音标编解码。Optionally, in another embodiment of the present application, based on the embodiment shown in FIG. 5 above, an implementation manner of determining that both the first terminal device and the second terminal device support phonetic symbol encoding and decoding in S502 is provided. Through capability negotiation between the first terminal device and the second terminal device, it is determined that both parties in the call support phonetic codec, and phonetic codec can be used when currently in a weak signal environment.
可选的,在一种实现方式中,如图8所示,第一终端设备确定第一终端设备与第二终端设备均支持音标编解码,可以包括:Optionally, in an implementation manner, as shown in FIG. 8 , the first terminal device determines that both the first terminal device and the second terminal device support phonetic symbol encoding and decoding, which may include:
S801、第一终端设备向第二终端设备发送第一能力信息,第一能力信息用于指示第一终端设备支持音标编解码。S801. The first terminal device sends first capability information to the second terminal device, where the first capability information is used to indicate that the first terminal device supports phonetic symbol encoding and decoding.
相应的,第二终端设备接收第一终端设备发送的第一能力信息。Correspondingly, the second terminal device receives the first capability information sent by the first terminal device.
S802、第二终端设备向第一终端设备发送第一能力响应信息,第一能力响应信息用于指示第二终端设备支持音标编解码。S802. The second terminal device sends first capability response information to the first terminal device, where the first capability response information is used to indicate that the second terminal device supports phonetic codec codec.
在本实现方式中,作为语音发送方的第一终端设备主动向第二终端设备发起能力协商,从而确保及时采用音标编解码方式,提升通话质量。In this implementation manner, the first terminal device, which is the voice sender, actively initiates capability negotiation with the second terminal device, so as to ensure that the phonetic symbol encoding and decoding method is adopted in time to improve the quality of the call.
Optionally, in another implementation, as shown in FIG. 9, the first terminal device determining that both the first terminal device and the second terminal device support the phonetic symbol codec may include:
S901. The second terminal device sends second capability information to the first terminal device, where the second capability information is used to indicate that the second terminal device supports the phonetic symbol codec.
Correspondingly, the first terminal device receives the second capability information sent by the second terminal device.
S902. The first terminal device sends second capability response information to the second terminal device, where the second capability response information is used to indicate that the first terminal device supports the phonetic symbol codec.
In this implementation, the second terminal device, as the voice receiver, actively initiates capability negotiation with the first terminal device, which ensures that the phonetic symbol codec is adopted in time and improves call quality.
It should be noted that the first capability information and the second capability information may be separate messages or may be carried in existing messages. This embodiment does not limit when the capability negotiation takes place. For example, after the first terminal device establishes a connection with the second terminal device, capability negotiation may be performed during the alerting (ringing) message, and the alerting message may include a New audiocodec capability field that indicates whether the terminal device supports the phonetic symbol codec.
Optionally, in the above process, if the first capability response information or the second capability response information is not received successfully, retransmission may be performed. Optionally, a number of retransmissions may be set, and this embodiment does not limit the specific value.
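As a non-limiting illustration, the capability negotiation of S801/S802 (or S901/S902) together with the optional retransmission can be sketched as follows. The names `CapabilityInfo`, `negotiate`, `send_and_wait`, and `make_lossy_peer`, as well as the default of three attempts, are assumptions introduced here for illustration only; the embodiment specifies neither a message layout nor a retransmission count, and the sketch further assumes that it is the initiating device that resends its capability information when no response arrives.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CapabilityInfo:
    """Capability message; the field name is hypothetical."""
    supports_phonetic_codec: bool


def negotiate(
    local_supported: bool,
    send_and_wait: Callable[[CapabilityInfo], Optional[CapabilityInfo]],
    max_attempts: int = 3,  # assumed value; the embodiment leaves it open
) -> bool:
    """Return True only if both ends report phonetic symbol codec support."""
    if not local_supported:
        return False
    request = CapabilityInfo(supports_phonetic_codec=True)
    for _ in range(max_attempts):
        response = send_and_wait(request)  # None models a lost response
        if response is not None:
            return response.supports_phonetic_codec
    return False  # no response received: treat the peer as unsupported


def make_lossy_peer(lost: int, peer_supported: bool):
    """Toy transport that drops the first `lost` responses (illustration only)."""
    state = {"calls": 0}

    def send_and_wait(request: CapabilityInfo) -> Optional[CapabilityInfo]:
        state["calls"] += 1
        if state["calls"] <= lost:
            return None
        return CapabilityInfo(supports_phonetic_codec=peer_supported)

    return send_and_wait
```

With these assumptions, `negotiate(True, make_lossy_peer(2, True))` succeeds on the third attempt, whereas a peer whose first three responses are lost is treated as not supporting the codec.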
Optionally, in a scenario where the user can set the terminal device to turn the phonetic symbol codec function on or off, the terminal device supporting the phonetic symbol codec means that, through a user setting, the phonetic symbol codec function is currently turned on in the terminal device.
Optionally, if the phonetic symbol codec function is not currently turned on in the terminal device, the user can turn it on through a setting. The voice processing method provided in this embodiment may further include:
Displaying a setting interface.
Receiving an operation performed by the user in the setting interface.
In response to the operation, turning on the phonetic symbol codec function.
Exemplarily, FIG. 10 is an interface diagram for setting the phonetic symbol codec mode according to an embodiment of the present application. As shown in (a) of FIG. 10, the terminal device currently displays a setting interface 1001, and the setting interface 1001 includes the function option "weak signal HD voice coding", that is, the phonetic symbol codec function in the embodiments of the present application. The state of the control 1010 shows whether the phonetic symbol codec function is currently turned on in the terminal device. In (a) of FIG. 10, the phonetic symbol codec function is turned off. The user can click the control 1010, and in response to the click operation the terminal device turns on the phonetic symbol codec function, as shown in (b) of FIG. 10.
It should be noted that this embodiment does not limit when the user sets the phonetic symbol codec function.
Optionally, if the phonetic symbol codec function is not currently turned on in the terminal device, and the terminal device determines that it is currently in a weak signal environment, the voice processing method provided in this embodiment may further include:
Generating and outputting prompt information.
Receiving a first instruction from the user.
Turning on the phonetic symbol codec function according to the first instruction.
This embodiment does not limit how the prompt information is implemented. For example, a prompt voice may be played, prompt music may be played, or a prompt box or prompt message may pop up in the interface currently displayed by the terminal device.
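As a non-limiting illustration, the prompt flow above can be sketched as follows. The names `Terminal`, `maybe_prompt_and_enable`, and `prompt_user`, and the prompt wording, are hypothetical; the callback stands in for whichever prompt form (voice, music, or pop-up) is used and returns True when the user's first instruction is to turn the function on.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Terminal:
    """Minimal terminal state; the attribute name is hypothetical."""
    phonetic_codec_enabled: bool = False


def maybe_prompt_and_enable(
    terminal: Terminal,
    in_weak_signal: bool,
    prompt_user: Callable[[str], bool],
) -> bool:
    """Prompt only when the function is off and the signal is weak.

    Returns the resulting on/off state of the phonetic symbol codec function.
    """
    if not terminal.phonetic_codec_enabled and in_weak_signal:
        # The user's answer plays the role of the "first instruction".
        if prompt_user("Weak signal detected. Turn on the phonetic symbol codec?"):
            terminal.phonetic_codec_enabled = True
    return terminal.phonetic_codec_enabled
```

In this sketch the user is not prompted when the function is already on or when the signal is not weak, which matches the two preconditions stated above.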
Optionally, in yet another embodiment of the present application, based on the foregoing embodiments, an implementation is provided for switching from the phonetic symbol codec to a traditional speech codec. The communication environment of the two terminal devices in a call changes in real time, and the devices are not always in a weak signal environment. The phonetic symbol codec in the embodiments of the present application is better suited to a weak signal environment, where it meets basic communication requirements. When the communication environment improves, the devices should switch back to the traditional speech codec in time to improve the user's call experience. Through negotiation between the first terminal device and the second terminal device, it is determined that the traditional speech codec can be used when the current environment is not a weak signal environment.
Optionally, in one implementation, as shown in FIG. 11, the voice processing method provided in this embodiment may further include:
S1101. If the first terminal device determines that the target parameter of the first terminal device satisfies a second preset condition, the first terminal device sends a third request message to the second terminal device, where the third request message is used to instruct the second terminal device to use the waveform codec.
The target parameter of the first terminal device is used to indicate the signal state of the communication environment in which the first terminal device is currently located. For the target parameter and the second preset condition, refer to the description earlier in this application; details are not repeated here.
Correspondingly, the second terminal device receives the third request message.
Optionally, the third request message may include a second indication field, which is used to indicate that the traditional speech codec is to be used. The embodiments of the present application do not limit the name of the second indication field. For example, the name may be back to HD. Optionally, the second indication field may also indicate the point in time at which use of the traditional speech codec starts.
S1102. If the second terminal device determines that the target parameter of the second terminal device satisfies the second preset condition, the second terminal device sends a third response message to the first terminal device, where the third response message is used to indicate that the second terminal device uses the waveform codec.
The target parameter of the second terminal device is used to indicate the signal state of the communication environment in which the second terminal device is currently located. For the target parameter and the second preset condition, refer to the description earlier in this application; details are not repeated here.
In this implementation, when the first terminal device, as the voice sender, determines that its signal environment has improved, it actively initiates negotiation of a codec switch with the second terminal device. When the second terminal device also determines that it is not currently in a weak signal environment, it returns a response message. This ensures that the traditional speech codec is adopted in time when the communication environment is good, which improves call quality.
Optionally, in another implementation, as shown in FIG. 12, the voice processing method provided in this embodiment may further include:
S1201. If the second terminal device determines that the target parameter of the second terminal device satisfies the second preset condition, the second terminal device sends a fourth request message to the first terminal device, where the fourth request message is used to instruct the first terminal device to use the waveform codec.
Correspondingly, the first terminal device receives the fourth request message.
Optionally, the fourth request message may include the second indication field. For details, refer to the above description of the second indication field; details are not repeated here.
S1202. If the first terminal device determines that the target parameter of the first terminal device satisfies the second preset condition, the first terminal device sends a fourth response message to the second terminal device, where the fourth response message is used to indicate that the first terminal device uses the waveform codec.
In this implementation, when the second terminal device, as the voice receiver, determines that its signal environment has improved, it actively initiates negotiation of a codec switch with the first terminal device. When the first terminal device also determines that it is not currently in a weak signal environment, it returns a response message. This ensures that the traditional speech codec is adopted in time when the communication environment is good, which improves call quality.
Optionally, in the above process, if the third response message or the fourth response message is not received successfully, retransmission may be performed. Optionally, a number of retransmissions may be set, and this embodiment does not limit the specific value.
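As a non-limiting illustration, the two-sided check in S1101/S1102 (or S1201/S1202) can be sketched as follows. The second preset condition is defined elsewhere in the application and is not reproduced in this section, so the signal strength and packet loss thresholds below are placeholders chosen purely for illustration, as are the names `TargetParameter`, `SwitchRequest`, `initiator_step`, `responder_step`, and `select_codec`. Only the example field name "back to HD" comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TargetParameter:
    """Two of the possible target parameters; units are assumptions."""
    signal_strength_dbm: float
    packet_loss_rate: float


def meets_second_condition(
    p: TargetParameter,
    min_dbm: float = -100.0,  # placeholder threshold, not from the application
    max_loss: float = 0.05,   # placeholder threshold, not from the application
) -> bool:
    """Stand-in for the second preset condition (signal no longer weak)."""
    return p.signal_strength_dbm >= min_dbm and p.packet_loss_rate <= max_loss


@dataclass
class SwitchRequest:
    """Request to return to the waveform codec."""
    indication: str = "back to HD"       # example name of the second indication field
    start_time_ms: Optional[int] = None  # optional start point; layout is assumed


def initiator_step(local: TargetParameter) -> Optional[SwitchRequest]:
    """S1101/S1201: send a request only if the local condition holds."""
    return SwitchRequest() if meets_second_condition(local) else None


def responder_step(request: Optional[SwitchRequest], local: TargetParameter) -> bool:
    """S1102/S1202: respond only if a request arrived and the local condition holds."""
    return request is not None and meets_second_condition(local)


def select_codec(initiator: TargetParameter, responder: TargetParameter) -> str:
    """Return the codec in use after one negotiation round."""
    agreed = responder_step(initiator_step(initiator), responder)
    return "waveform" if agreed else "phonetic"
```

Under these assumptions the call returns to the waveform codec only when both terminal devices report that they have left the weak signal environment; if either side still fails the condition, the phonetic symbol codec remains in use.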
It can be understood that, to implement the above functions, the terminal device includes hardware and/or software modules corresponding to each function. In combination with the algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may, in combination with the embodiments, use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
In the embodiments of the present application, the terminal device may be divided into functional modules according to the foregoing method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. It should be noted that the division of modules in the embodiments of the present application is schematic and is merely a division by logical function; other divisions are possible in an actual implementation. It should also be noted that the names of the modules in the embodiments of the present application are schematic, and the names of the modules are not limited in an actual implementation.
In the case where each functional module corresponds to one function, FIG. 13 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in FIG. 13, the terminal device may include a sending module 1301, a processing module 1302, and a receiving module 1303.
The sending module 1301 is configured to send data to other devices, for example, the encoded information, the first request message, the second response message, the third request message, the fourth response message, the first capability information, or the second capability response information.
The receiving module 1303 is configured to receive data from other devices, for example, the first information, the second request message, the first response message, the fourth request message, the third response message, the second capability information, or the first capability response information.
The processing module 1302 is configured to: obtain the voice signal input by the user, extract multiple groups of information from the voice signal, and obtain the encoded information corresponding to each group of information according to the coding table; and decode the first information according to the coding table to obtain multiple groups of information, generate a voice signal according to the multiple groups of information, play the voice signal using a preset voice, and so on.
It should be noted that all relevant content of the steps in the above method embodiments can be cited in the functional description of the corresponding functional module, and details are not repeated here.
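As a non-limiting illustration, the table-based encoding and decoding performed by the processing module can be sketched as follows. Everything concrete in the sketch is an assumption introduced for illustration: the three-entry toy table, the bit widths, the 50 ms duration step, and the names `encode_group` and `decode_code`. The application only states that the coding table stores the correspondence between phonetic symbols, tones, and durations and the encoded information.

```python
from typing import Dict, List, Tuple

# One group of information: (phonetic symbol, tone, duration in milliseconds).
Group = Tuple[str, int, int]

# Toy coding table; a real table would cover every phonetic symbol.
PHONETIC_TABLE: List[str] = ["ni", "hao", "ma"]
PHONETIC_INDEX: Dict[str, int] = {s: i for i, s in enumerate(PHONETIC_TABLE)}

TONE_BITS = 3          # assumed width; tones 0..7 fit
DURATION_BITS = 4      # assumed width; 16 duration levels
DURATION_STEP_MS = 50  # assumed quantization step


def encode_group(group: Group) -> int:
    """Pack one group into an integer code: symbol index | tone | duration level."""
    symbol, tone, duration_ms = group
    # Quantize the duration and clamp it to the largest representable level.
    level = min(duration_ms // DURATION_STEP_MS, (1 << DURATION_BITS) - 1)
    code = PHONETIC_INDEX[symbol]
    code = (code << TONE_BITS) | tone
    code = (code << DURATION_BITS) | level
    return code


def decode_code(code: int) -> Group:
    """Invert encode_group; the duration comes back quantized."""
    level = code & ((1 << DURATION_BITS) - 1)
    code >>= DURATION_BITS
    tone = code & ((1 << TONE_BITS) - 1)
    symbol = PHONETIC_TABLE[code >> TONE_BITS]
    return (symbol, tone, level * DURATION_STEP_MS)
```

In this sketch each spoken character costs a single small integer rather than a block of waveform samples, which is the property that makes such a scheme attractive under weak signal conditions. The receiving side recovers the phonetic symbol and tone exactly and the duration to within one quantization step.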
Referring to FIG. 14, which shows another structure of a terminal device according to an embodiment of the present application, the terminal device includes a processor 1401, a receiver 1402, a transmitter 1403, a memory 1404, and a bus 1405. The processor 1401 includes one or more processing cores, and executes various functional applications and information processing by running software programs and modules. The receiver 1402 and the transmitter 1403 may be implemented as one communication component, which may be a baseband chip. The memory 1404 is connected to the processor 1401 through the bus 1405. The memory 1404 may be configured to store at least one program instruction, and the processor 1401 is configured to execute the at least one program instruction to implement the technical solutions of the foregoing embodiments. The implementation principle and technical effects are similar to those of the above method embodiments and are not repeated here.
After the terminal is powered on, the processor can read the software program in the memory, interpret and execute the instructions of the software program, and process the data of the software program. When data needs to be sent through the antenna, the processor performs baseband processing on the data to be sent and outputs a baseband signal to the control circuit; the control circuit performs radio frequency processing on the baseband signal and sends the resulting radio frequency signal outward through the antenna in the form of electromagnetic waves. When data is sent to the terminal, the control circuit receives a radio frequency signal through the antenna, converts the radio frequency signal into a baseband signal, and outputs the baseband signal to the processor, which converts the baseband signal into data and processes the data.
Those skilled in the art can understand that, for ease of description, FIG. 14 shows only one memory and one processor. An actual terminal may have multiple processors and memories. The memory may also be referred to as a storage medium, a storage device, or the like, which is not limited in the embodiments of the present application.
As an optional implementation, the processor may include a baseband processor and a central processing unit. The baseband processor is mainly used to process communication data, and the central processing unit is mainly used to execute software programs and process the data of the software programs. Those skilled in the art can understand that the baseband processor and the central processing unit may be integrated into one processor, or may be independent processors interconnected through a bus or similar technology. Those skilled in the art can understand that the terminal may include multiple baseband processors to adapt to different network standards, the terminal may include multiple central processing units to enhance its processing capability, and the components of the terminal may be connected through various buses. The baseband processor may also be referred to as a baseband processing circuit or a baseband processing chip. The central processing unit may also be referred to as a central processing circuit or a central processing chip. The function of processing the communication protocol and communication data may be built into the processor, or may be stored in the memory in the form of a software program that the processor executes to implement the baseband processing function. The memory may be integrated into the processor or may be independent of the processor. The memory includes a cache, which can store frequently accessed data and instructions.
In the embodiments of the present application, the processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
In the embodiments of the present application, the memory may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or may be a volatile memory, such as a random-access memory (RAM). The memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory in the embodiments of the present application may also be a circuit or any other apparatus capable of implementing a storage function, and is used to store program instructions and/or data. The methods provided in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, an SSD), or the like.
An embodiment of the present application provides a computer program product that, when run on a terminal, causes the terminal to execute the technical solutions in the foregoing embodiments. The implementation principle and technical effects are similar to those of the related embodiments above and are not repeated here.
An embodiment of the present application provides a computer-readable storage medium storing program instructions that, when executed by a terminal, cause the terminal to execute the technical solutions of the foregoing embodiments. The implementation principle and technical effects are similar to those of the related embodiments above and are not repeated here.
In summary, the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (29)

  1. A voice processing method, applied to a first terminal device, wherein the first terminal device and a second terminal device are in a call state, the method comprising:
    obtaining a voice signal input by a user;
    when it is determined that the current environment is a weak signal environment and that both the first terminal device and the second terminal device support a phonetic symbol codec, extracting multiple groups of information from the voice signal, wherein each group of information comprises a phonetic symbol, a tone, and a duration of a single character, and the phonetic symbol codec refers to encoding and decoding of phonetic symbols, tones, and durations;
    obtaining, according to a coding table, encoded information corresponding to each group of information, wherein the coding table stores correspondences between phonetic symbols, tones, and durations and encoded information; and
    sending the encoded information to the second terminal device.
  2. The method according to claim 1, wherein the encoded information comprises a first information component corresponding to the phonetic symbol and the tone, and a second information component corresponding to the duration.
  3. The method according to claim 2, wherein the first information component comprises a first information sub-component corresponding to the phonetic symbol and a second information sub-component corresponding to the tone.
  4. The method according to claim 1, wherein the coding table comprises a global index table and a common index table, the common index table is generated according to the number of times single characters are used by the user within a preset time period, the global index table comprises phonetic symbols and a global index value of each phonetic symbol, and each phonetic symbol included in the common index table has a common index value and the global index value of that phonetic symbol in the global index table.
  5. The method according to claim 4, wherein the obtaining, according to a coding table, encoded information corresponding to each group of information comprises:
    for each group of information, determining whether the common index table includes the phonetic symbol in the group of information;
    if the common index table includes the phonetic symbol in the group of information, obtaining the encoded information according to the common index table; and
    if the common index table does not include the phonetic symbol in the group of information, obtaining the encoded information according to the global index table.
  6. The method according to any one of claims 1-5, wherein the determining that the current environment is a weak signal environment comprises:
    if it is determined that a target parameter satisfies a first preset condition, sending a first request message to the second terminal device, wherein the first request message is used to instruct the second terminal device to use the phonetic symbol codec, and the target parameter is used to indicate the signal state of the communication environment in which the first terminal device is currently located; and
    receiving a first response message sent by the second terminal device, wherein the first response message is used to indicate that the second terminal device uses the phonetic symbol codec.
  7. The method according to claim 6, wherein the first request message comprises a hysteresis timer, and the hysteresis timer is used to indicate a delay time before the second terminal device uses the phonetic symbol codec.
  8. The method according to any one of claims 1-5, wherein the determining that the current environment is a weak signal environment comprises:
    receiving a second request message sent by the second terminal device, wherein the second request message is used to instruct the first terminal device to use the phonetic symbol codec; and
    sending a second response message to the second terminal device, wherein the second response message is used to indicate that the first terminal device uses the phonetic symbol codec.
  9. 根据权利要求1-8中任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1-8, wherein the method further comprises:
    若确定目标参数满足第二预设条件,则向所述第二终端设备发送第三请求消息,所述第三请求消息用于指示所述第二终端设备使用波形编解码,所述目标参数用于指示所述第一终端设备当前所处通信环境的信号状态;If it is determined that the target parameter satisfies the second preset condition, a third request message is sent to the second terminal device, where the third request message is used to instruct the second terminal device to use waveform encoding and decoding, and the target parameter uses to indicate the signal state of the communication environment where the first terminal device is currently located;
    接收所述第二终端设备发送的第三响应消息,所述第三响应消息用于指示所述第二终端设备使用所述波形编解码。A third response message sent by the second terminal device is received, where the third response message is used to instruct the second terminal device to use the waveform codec.
  10. The method according to any one of claims 1-8, wherein the method further comprises:
    receiving a fourth request message sent by the second terminal device, where the fourth request message is used to instruct the first terminal device to use waveform encoding and decoding;
    if it is determined that a target parameter satisfies a second preset condition, sending a fourth response message to the second terminal device, where the fourth response message is used to indicate that the first terminal device uses the waveform encoding and decoding, and the target parameter is used to indicate a signal state of the communication environment in which the first terminal device is currently located.
  11. The method according to claim 6, 9, or 10, wherein the target parameter includes at least one of the following: location information of the first terminal device, a cell identifier of the cell currently accessed by the first terminal device, a signal strength of a signal received by the first terminal device, or a voice packet loss rate.
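The switching decision described in claims 9 to 11 can be sketched roughly as follows. This is a minimal Python illustration, not the claimed implementation: the claims only refer to a "first preset condition" and a "second preset condition", so the thresholds, the choice of signal strength and voice packet loss rate as the two parameters, and the names `TargetParameter` and `codec_to_request` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the claims do not give any values.
WEAK_SIGNAL_DBM = -110.0   # at or below this, the signal is treated as weak
GOOD_SIGNAL_DBM = -100.0   # at or above this, the signal is treated as recovered
MAX_LOSS_RATE = 0.10       # voice packet loss rate treated as too high


@dataclass
class TargetParameter:
    signal_strength_dbm: float
    voice_packet_loss_rate: float


def meets_first_condition(p: TargetParameter) -> bool:
    """Assumed form of the first preset condition: the environment looks weak."""
    return (p.signal_strength_dbm <= WEAK_SIGNAL_DBM
            or p.voice_packet_loss_rate >= MAX_LOSS_RATE)


def meets_second_condition(p: TargetParameter) -> bool:
    """Assumed form of the second preset condition: the environment has recovered."""
    return (p.signal_strength_dbm >= GOOD_SIGNAL_DBM
            and p.voice_packet_loss_rate < MAX_LOSS_RATE)


def codec_to_request(current_codec: str, p: TargetParameter) -> Optional[str]:
    """Return the codec to request from the peer, or None to keep the current one."""
    if current_codec == "waveform" and meets_first_condition(p):
        return "phonetic"
    if current_codec == "phonetic" and meets_second_condition(p):
        return "waveform"
    return None
```

The gap between the two signal thresholds is a design choice of this sketch: it keeps a device near the boundary from requesting a codec change on every measurement.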
  12. The method according to any one of claims 1-11, wherein determining that the first terminal device supports phonetic symbol encoding and decoding comprises:
    if it is determined that the phonetic symbol encoding and decoding function is not enabled on the first terminal device, generating and outputting prompt information;
    receiving a first instruction from the user;
    enabling the function according to the first instruction.
  13. The method according to any one of claims 1-11, wherein the determining that both the first terminal device and the second terminal device support phonetic symbol encoding and decoding comprises:
    sending first capability information to the second terminal device, where the first capability information is used to indicate that the first terminal device supports phonetic symbol encoding and decoding;
    receiving first capability response information sent by the second terminal device, where the first capability response information is used to indicate that the second terminal device supports phonetic symbol encoding and decoding;
    or,
    receiving second capability information sent by the second terminal device, where the second capability information is used to indicate that the second terminal device supports phonetic symbol encoding and decoding;
    sending second capability response information to the second terminal device, where the second capability response information is used to indicate that the first terminal device supports phonetic symbol encoding and decoding.
  14. The method according to any one of claims 1-11, wherein the method further comprises:
    displaying a settings interface;
    receiving an operation performed by the user in the settings interface;
    in response to the operation, enabling the phonetic symbol encoding and decoding function.
  15. A voice processing method, applied to a second terminal device, wherein the second terminal device is in a call state with a first terminal device, the method comprising:
    receiving first information sent by the first terminal device;
    decoding the first information according to a coding table to obtain multiple groups of information, where each group of information includes the phonetic symbol, tone, and duration of a single character, and the coding table stores correspondences between phonetic symbols, tones, and durations and encoded information;
    generating a voice signal according to the multiple groups of information;
    playing the voice signal using a preset voice.
  16. The method according to claim 15, wherein the decoding of the first information according to the coding table to obtain multiple groups of information comprises:
    sequentially obtaining multiple pieces of encoded information from the first information, where the length of each piece of encoded information is a preset bit length;
    for each piece of encoded information, obtaining, according to the coding table, the phonetic symbol, tone, and duration corresponding to that piece of encoded information.
  17. The method according to claim 16, wherein the encoded information comprises a first information component corresponding to the phonetic symbol and the tone, and a second information component corresponding to the duration.
  18. The method according to claim 17, wherein the first information component comprises a first information sub-component corresponding to the phonetic symbol and a second information sub-component corresponding to the tone.
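The decoding steps of claims 16 to 18 can be illustrated with a small Python sketch. The bit widths (9 bits for the phonetic symbol, 3 for the tone, 4 for the duration, 16 in total), the 50 ms duration step, the toy `PHONETIC_TABLE`, and the function names are assumptions chosen for the example; the claims state only that each piece of encoded information has a preset bit length and contains these components.

```python
# Hypothetical layout of one 16-bit piece of encoded information:
# [ phonetic symbol: 9 bits | tone: 3 bits | duration: 4 bits ]
PHONETIC_BITS = 9
TONE_BITS = 3
DURATION_BITS = 4
CODE_BYTES = (PHONETIC_BITS + TONE_BITS + DURATION_BITS) // 8

DURATION_STEP_MS = 50                  # assumed duration quantisation step
PHONETIC_TABLE = {0: "ni", 1: "hao"}   # toy coding table: index -> phonetic symbol


def encode_group(phonetic_index: int, tone: int, duration_steps: int) -> int:
    """Pack one (phonetic symbol, tone, duration) group into a fixed-length code."""
    return ((phonetic_index << (TONE_BITS + DURATION_BITS))
            | (tone << DURATION_BITS)
            | duration_steps)


def decode_first_information(data: bytes) -> list:
    """Split the payload into fixed-length codes and look each one up."""
    groups = []
    usable = len(data) - len(data) % CODE_BYTES  # ignore a trailing partial code
    for offset in range(0, usable, CODE_BYTES):
        code = int.from_bytes(data[offset:offset + CODE_BYTES], "big")
        duration_steps = code & ((1 << DURATION_BITS) - 1)   # second component
        tone = (code >> DURATION_BITS) & ((1 << TONE_BITS) - 1)  # tone sub-component
        phonetic_index = code >> (TONE_BITS + DURATION_BITS)  # phonetic sub-component
        groups.append((PHONETIC_TABLE[phonetic_index], tone,
                       duration_steps * DURATION_STEP_MS))
    return groups
```

For example, packing `encode_group(0, 3, 4)` and `encode_group(1, 3, 6)` into four bytes and passing them to `decode_first_information` yields the groups `("ni", 3, 200)` and `("hao", 3, 300)`, which a speech synthesiser could then render.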
  19. The method according to claim 15, wherein the coding table includes a global index table and a common index table, the common index table is generated according to the number of times the user uses single characters within a preset time period, the global index table includes phonetic symbols and the global index values of the phonetic symbols, and each phonetic symbol included in the common index table has a common index value and the global index value of that phonetic symbol in the global index table.
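As a rough illustration of the two tables in claim 19, the Python sketch below derives a common index table from usage counts. The toy `GLOBAL_INDEX` table, the table size parameter, the tie-breaking rule, and the function name `build_common_index_table` are assumptions; the claim says only that the common table is generated from how often the user uses single characters within a preset time period.

```python
from collections import Counter

# Toy global index table: phonetic symbol -> global index value (hypothetical).
GLOBAL_INDEX = {"ni": 0, "hao": 1, "ma": 2, "wo": 3}


def build_common_index_table(recent_phonetics, size):
    """Map the most frequently used phonetic symbols to short common indexes.

    recent_phonetics: phonetic symbols the user produced in the time window.
    Returns {phonetic symbol: (common index value, global index value)}.
    """
    counts = Counter(recent_phonetics)
    # Most used first; ties are broken by global index so the result is stable.
    ranked = sorted(counts, key=lambda p: (-counts[p], GLOBAL_INDEX[p]))[:size]
    return {p: (common_index, GLOBAL_INDEX[p])
            for common_index, p in enumerate(ranked)}
```

A smaller common table would allow frequent symbols to be sent with fewer bits, which is presumably the motivation for keeping both tables, although the claim itself does not say so.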
  20. The method according to any one of claims 15-19, wherein before the receiving of the first information sent by the first terminal device, the method further comprises:
    receiving a first request message sent by the first terminal device, where the first request message is used to instruct the second terminal device to use phonetic symbol encoding and decoding, and the phonetic symbol encoding and decoding refers to encoding and decoding phonetic symbols, tones, and durations;
    sending a first response message to the first terminal device, where the first response message is used to indicate that the second terminal device uses the phonetic symbol encoding and decoding.
  21. The method according to claim 20, wherein the first request message includes a hysteresis timer, and the hysteresis timer is used to indicate a delay time before the second terminal device uses the phonetic symbol encoding and decoding.
  22. The method according to any one of claims 15-19, wherein before the receiving of the first information sent by the first terminal device, the method further comprises:
    if it is determined that a target parameter satisfies a first preset condition, sending a second request message to the first terminal device, where the second request message is used to instruct the first terminal device to use the phonetic symbol encoding and decoding, the phonetic symbol encoding and decoding refers to encoding and decoding phonetic symbols, tones, and durations, and the target parameter is used to indicate a signal state of the communication environment in which the second terminal device is currently located;
    receiving a second response message sent by the first terminal device, where the second response message is used to indicate that the first terminal device uses the phonetic symbol encoding and decoding.
  23. The method according to any one of claims 15-22, wherein the method further comprises:
    receiving a third request message sent by the first terminal device, where the third request message is used to instruct the second terminal device to use waveform encoding and decoding;
    if it is determined that a target parameter satisfies a second preset condition, sending a third response message to the first terminal device, where the third response message is used to indicate that the second terminal device uses the waveform encoding and decoding, and the target parameter is used to indicate a signal state of the communication environment in which the first terminal device is currently located.
  24. The method according to any one of claims 15-22, wherein the method further comprises:
    if it is determined that a target parameter satisfies a second preset condition, sending a fourth request message to the first terminal device, where the fourth request message is used to instruct the first terminal device to use waveform encoding and decoding;
    receiving a fourth response message sent by the first terminal device, where the fourth response message is used to indicate that the first terminal device uses the waveform encoding and decoding.
  25. The method according to any one of claims 22-24, wherein the target parameter includes at least one of the following: location information of the second terminal device, a cell identifier of the cell currently accessed by the second terminal device, a signal strength of a signal received by the second terminal device, or a voice packet loss rate.
  26. The method according to any one of claims 15-25, wherein before the receiving of the first information sent by the first terminal device, the method further comprises:
    receiving first capability information sent by the first terminal device, where the first capability information is used to indicate that the first terminal device supports phonetic symbol encoding and decoding;
    sending first capability response information to the first terminal device, where the first capability response information is used to indicate that the second terminal device supports phonetic symbol encoding and decoding, and the phonetic symbol encoding and decoding refers to encoding and decoding phonetic symbols, tones, and durations;
    or,
    sending second capability information to the first terminal device, where the second capability information is used to indicate that the second terminal device supports phonetic symbol encoding and decoding;
    receiving second capability response information sent by the first terminal device, where the second capability response information is used to indicate that the first terminal device supports phonetic symbol encoding and decoding.
  27. The method according to any one of claims 15-25, wherein the method further comprises:
    displaying a settings interface;
    receiving an operation performed by a user in the settings interface;
    in response to the operation, enabling the phonetic symbol encoding and decoding function, where the phonetic symbol encoding and decoding refers to encoding and decoding phonetic symbols, tones, and durations.
  28. A terminal device, comprising a processor, a memory, and a transceiver, wherein the transceiver is configured to communicate with other devices, and the processor is configured to invoke a program stored in the memory to perform the method according to any one of claims 1-27.
  29. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and when the computer instructions are run on a terminal device, the terminal device is caused to perform the method according to any one of claims 1-27.
PCT/CN2021/138389 2020-12-25 2021-12-15 Voice processing method, terminal device, and storage medium WO2022135237A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011568861.1A CN114694662A (en) 2020-12-25 2020-12-25 Voice processing method, terminal device and storage medium
CN202011568861.1 2020-12-25

Publications (1)

Publication Number Publication Date
WO2022135237A1 true WO2022135237A1 (en) 2022-06-30

Family

ID=82129873

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/138389 WO2022135237A1 (en) 2020-12-25 2021-12-15 Voice processing method, terminal device, and storage medium

Country Status (2)

Country Link
CN (1) CN114694662A (en)
WO (1) WO2022135237A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101835204A (en) * 2010-03-01 2010-09-15 华为技术有限公司 Voice communication method, equipment and system
JP2016066878A (en) * 2014-09-24 2016-04-28 沖電気工業株式会社 Communication system, communication state determination method, transmitting/receiving device, and program
CN108881182A (en) * 2018-05-30 2018-11-23 上海携程商务有限公司 The networking telephone realization method and system of mobile terminal based on IOS
CN108966250A (en) * 2018-06-29 2018-12-07 努比亚技术有限公司 Weak signal call method, mobile terminal and computer readable storage medium
CN110149167A (en) * 2019-05-05 2019-08-20 Oppo广东移动通信有限公司 Method and device for dynamically adjusting codes, mobile terminal and storage medium
CN110913073A (en) * 2019-11-27 2020-03-24 深圳传音控股股份有限公司 Voice processing method and related equipment

Also Published As

Publication number Publication date
CN114694662A (en) 2022-07-01

Similar Documents

Publication Publication Date Title
US11227129B2 (en) Language translation device and language translation method
US9547642B2 (en) Voice to text to voice processing
JP6113302B2 (en) Audio data transmission method and apparatus
WO2020063146A1 (en) Data transmission method and system, and bluetooth headphone
CN110493123B (en) Instant messaging method, device, equipment and storage medium
CN110692055A (en) Keyword group detection using audio watermarking
US20200118569A1 (en) Conference sound box and conference recording method, apparatus, system and computer storage medium
CN109036406A (en) A kind of processing method of voice messaging, device, equipment and storage medium
WO2020238058A1 (en) Voice transmission method and apparatus, computer device and storage medium
CN102067209B (en) Methods and systems for simplifying copying and pasting transcriptions generated from a dictation based speech-to-text system
CN103646645B (en) A kind of method exported based on voice translation text
WO2020237886A1 (en) Voice and text conversion transmission method and system, and computer device and storage medium
CN103514882A (en) Voice identification method and system
CN110351419B (en) Intelligent voice system and voice processing method thereof
CN112530417B (en) Voice signal processing method and device, electronic equipment and storage medium
JP2022101663A (en) Human-computer interaction method, device, electronic apparatus, storage media and computer program
CN108769891A (en) A kind of audio frequency transmission method and mobile translation equipment
CN108418791A (en) Communication means and mobile terminal with addition caption function
US11580954B2 (en) Systems and methods of handling speech audio stream interruptions
US11328131B2 (en) Real-time chat and voice translator
WO2022135237A1 (en) Voice processing method, terminal device, and storage medium
JP2017068061A (en) Communication terminal and voice recognition system
EP3113175A1 (en) Method for converting text to individual speech, and apparatus for converting text to individual speech
JP2023529699A (en) clear text echo
CN112562688A (en) Voice transcription method, device, recording pen and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21909233; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21909233; Country of ref document: EP; Kind code of ref document: A1)
Kind code of ref document: A1