CN112788187A

CN112788187A - Audio data playing method, device, equipment, storage medium, program and terminal

Info

Publication number: CN112788187A
Application number: CN202011563770.9A
Authority: CN
Inventors: 葛永亮; 梅杰; 贺学焱
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2021-05-11

Abstract

The application provides an audio data playing method, apparatus, device, storage medium, program and terminal, relating to artificial intelligence fields such as voice technology and the Internet of Vehicles. The method comprises: in response to determining that there is audio blocking during a call, determining, in a received audio data packet, a valid audio segment that includes a human voice; and playing the valid audio segment in the audio data packet. Because only the valid audio segment is played when audio is blocked, the received audio data can be played in a shorter time without losing valid information, so that the electronic device can maintain normal call quality even when network quality is poor.

Description

Audio data playing method, device, equipment, storage medium, program and terminal

Technical Field

The present application relates to artificial intelligence technologies in computer technology, such as voice technology and the Internet of Vehicles, and in particular to an audio data playing method, apparatus, device, storage medium, program, and terminal.

Background

An Internet phone, also called a VoIP (Voice over Internet Protocol) phone, is a mode of communication in which the other party's fixed-line or mobile phone is dialed directly over the Internet. Because Internet telephony is cheaper than conventional telephony, it is widely used.

When an Internet phone is used, network quality directly affects call quality. If network quality is poor, audio reception becomes blocked and the received audio cannot be played normally.

Therefore, how to keep the audio data transmitted by an Internet phone playing normally when network quality is poor is a technical problem that those skilled in the art need to solve.

Disclosure of Invention

The application provides an audio data playing method, apparatus, device, storage medium, program and terminal, which aim to solve the prior-art problem of poor Internet-phone call quality when network quality is poor.

According to a first aspect of the present application, there is provided a method for playing audio data during a call, including:

in response to determining that there is audio blocking during the call, determining, in a received audio data packet, a valid audio segment that includes a human voice; and

playing the valid audio segment in the audio data packet.

According to a second aspect of the present application, there is provided an apparatus for playing audio data during a call, comprising:

a determining unit, configured to determine, in response to determining that there is audio blocking during a call, a valid audio segment including a human voice in a received audio data packet; and

a playing unit, configured to play the valid audio segment in the audio data packet.

According to a third aspect of the present application, there is provided an electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method for playing audio data during a call according to the first aspect.

According to a fourth aspect of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to execute the method for playing audio data during a call according to the first aspect.

According to a fifth aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method for playing audio data during a call as described in the first aspect.

According to a sixth aspect of the present application, there is provided a terminal having a call function, comprising the apparatus for playing audio data during a call according to the second aspect.

The audio data playing method, apparatus, device, storage medium, program and terminal provided by the application include: in response to determining that there is audio blocking during the call, determining a valid audio segment including a human voice in the received audio data packet; and playing the valid audio segment in the audio data packet. Because only the valid audio segment is played when audio is blocked, the received audio data can be played in a shorter time without losing valid information, so that the electronic device can maintain normal call quality even when network quality is poor.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. In the drawings:

FIG. 1 is a diagram of an application scenario according to an exemplary embodiment of the present application;

FIG. 2 is a flowchart of a method for playing audio data during a call according to an exemplary embodiment of the present application;

FIG. 3 is a schematic diagram of determining a valid audio segment provided by the present application;

FIG. 4 is a flowchart of a method for playing audio data during a call according to another exemplary embodiment of the present application;

FIG. 5 is a schematic diagram of determining a valid audio segment according to an exemplary embodiment of the present application;

FIG. 6 is a flowchart of a method for playing audio data during a call according to another exemplary embodiment of the present application;

FIG. 7 is a schematic diagram of determining combined audio data according to an exemplary embodiment of the present application;

FIG. 8 is a schematic structural diagram of an apparatus for playing audio data during a call according to an exemplary embodiment of the present application;

FIG. 9 is a schematic structural diagram of an apparatus for playing audio data during a call according to another exemplary embodiment of the present application;

FIG. 10 is a block diagram of an electronic device according to an exemplary embodiment of the present application.

Detailed Description

The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

An Internet phone allows a user to directly dial the other party's fixed-line or mobile phone over the Internet. As shown in FIG. 1, a first terminal 11 and a second terminal 12 can hold a call over the Internet.

The first terminal 11 and the second terminal 12 are connected via a network, so poor network quality leads to poor call quality. For example, when network quality is poor, audio data packets sent from the first terminal 11 to the second terminal 12 are blocked and cannot reach the second terminal 12 in time; the blocked packets may then arrive at the second terminal 12 all at once.

To release its buffer as soon as possible, the second terminal 12 may play the received audio data packets at n-times speed, which makes the audio play too fast, so that the user hears a rushed, garbled sound. Alternatively, the second terminal 12 may discard some of the received audio data packets, so that the user does not hear that part of the audio content.

Therefore, when the network quality is poor, the call quality of the network phone also becomes poor.

To solve the above technical problem, in the method provided by the present application, when audio blocking is found during a call, valid audio segments containing a human voice are identified in the received audio data packets, and only the valid audio is played. In this way, the length of audio data to be played is shortened without losing useful information, and the user can still hear the useful information in the call when network quality is poor.

The present application provides a scheme for playing audio data during a call, applied in artificial intelligence fields of computer technology such as voice technology and the Internet of Vehicles, to optimize call quality when network quality is poor.

Fig. 2 is a flowchart illustrating an audio data playing method in a call process according to an exemplary embodiment of the present application.

As shown in FIG. 2, the method for playing audio data during a call provided by the present application includes:

Step 201, in response to determining that there is audio blocking during the call, determining a valid audio segment including a human voice in the received audio data packet.

The method provided by the present application can be applied to an electronic device having computing and communication capabilities; for example, the electronic device may be a computer running communication software, a fixed-line telephone, or a mobile phone.

Specifically, the electronic device may receive a call request from another terminal and respond to it. For example, user A may use a first terminal to send a call request to the electronic device executing the solution of the present application, and user B, who uses the electronic device, may operate it to accept the call request sent by the first terminal.

Further, the other terminal may collect an audio signal in real time and transmit it to the electronic device. For example, user A may speak into a microphone of the first terminal, and the first terminal may transmit the recorded audio signal to the electronic device over the network.

In practice, the other terminal collects audio signals in real time regardless of whether the user is speaking. For example, if user A does not speak during the first 3 seconds of the call and speaks from the 4th to the 10th second, the first terminal may send a 10-second audio signal to the electronic device, of which 3 seconds contain no human voice.

When the network is unobstructed, the electronic device can play the received audio data frame by frame. For example, if the first terminal sends 5 frames of data to the electronic device in sequence, the electronic device can play the 5 frames in sequence.

Specifically, if the electronic device receives multiple audio data packets at the same time, it may determine that audio is currently blocked. For example, if the electronic device receives 10 audio data packets per millisecond when the network is unobstructed but receives 15 audio data packets within 1 millisecond, it may determine that there is audio blocking during the current call.

However, when audio is blocked, if the electronic device plays the received audio data in sequence, the buffer cannot be released in time, and the device may even be unable to receive further audio data from the network normally. For example, the first terminal sends 10 seconds of audio data to the electronic device, but the data is blocked in the network and cannot be transmitted to the electronic device normally. While the audio is blocked, the first terminal keeps sending audio data, for example another 10 seconds of it.

When the blocked audio data (for example, the first 10 seconds) arrives at the electronic device, the 10 seconds of audio data subsequently sent by the first terminal may arrive as well, so that the electronic device cannot play the audio data normally.

One reason the electronic device cannot play the audio data normally is that the received audio data is too long to be played in the available time. Therefore, in the method provided by the present application, valid audio segments are determined in the received audio data packets.

Specifically, the electronic device may detect the portions of the audio data packet that include a human voice and the portions that do not, and take the portions that include a human voice as the valid audio segments of the packet.

In one embodiment, the electronic device may extract each valid audio segment from the audio data packet, thereby obtaining only the portions of the packet that include a human voice.

In another embodiment, the electronic device may remove the portions that do not include a human voice from the audio data packet, thereby obtaining only the portions that do.

FIG. 3 is a schematic diagram of determining a valid audio segment provided by the present application.

As shown in FIG. 3, for an audio data packet 31, the electronic device may determine a valid audio segment 32 that includes a human voice; if the audio data packet includes multiple valid audio segments, the electronic device may determine each of them as a valid audio segment 32.

The electronic device may further splice the determined valid audio segments 32 to obtain a complete valid audio segment 33 corresponding to the audio data packet.
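The extraction and splicing of FIG. 3 can be illustrated with a minimal Python sketch. It assumes that some voice detector has already labeled each fixed-length frame of the packet as voiced or not (the `mask`); the function names and the per-frame representation are hypothetical and not taken from the application.

```python
from typing import List, Sequence, Tuple


def voiced_runs(mask: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return [start, end) frame index pairs for each contiguous run of voiced frames."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, voiced in enumerate(mask):
        if voiced and start is None:
            start = i                      # a valid segment (32) begins
        elif not voiced and start is not None:
            runs.append((start, i))        # the segment ended at the previous frame
            start = None
    if start is not None:
        runs.append((start, len(mask)))    # segment extends to the end of the packet
    return runs


def splice_valid_segments(samples: Sequence[float], mask: Sequence[bool],
                          frame_len: int) -> List[float]:
    """Cut out each valid segment and splice them in order (segment 33 in FIG. 3)."""
    spliced: List[float] = []
    for start, end in voiced_runs(mask):
        spliced.extend(samples[start * frame_len:end * frame_len])
    return spliced
```

For example, with 12 samples split into six 2-sample frames and the mask `[False, True, True, False, True, False]`, the two voiced runs cover samples 2 to 5 and 8 to 9, which are spliced into `[2, 3, 4, 5, 8, 9]`.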

Step 202, playing the valid audio segment in the audio data packet.

Further, if the electronic device receives multiple audio data packets, it may determine the valid audio segment of each audio data packet.

In practice, the electronic device can play the determined valid audio segments. If it has determined valid audio segments for multiple audio data packets, it can play them in the order in which the packets were received.

For example, if the electronic device receives 3 audio data packets A, B, C when the audio is blocked, the electronic device may determine corresponding valid audio segments a, b, and c for the 3 audio data packets, and play a, b, and c in sequence.

In this embodiment, when audio is blocked, the electronic device extracts the valid audio segments from the received audio data packets and thereby shortens the audio data. Audio data has a time attribute: for example, 3 seconds of audio data played over 3 seconds sounds normal, whereas the same 3 seconds of audio data played within 1 second sounds abnormal.

With the method provided by the present application, when audio is blocked, the received audio data packet is shortened without losing valid information, which reduces the time needed to play it at normal speed. Because the valid audio segment to be played is shorter, it can be played normally in a shorter time.

In summary, the method for playing audio data during a call includes: in response to determining that there is audio blocking during the call, determining a valid audio segment including a human voice in the received audio data packet; and playing the valid audio segment in the audio data packet. Because the method plays only the valid audio segment when audio is blocked, the received audio data can be played in a shorter time without losing valid information, so that the electronic device maintains normal call quality even when network quality is poor.

Fig. 4 is a flowchart illustrating an audio data playing method in a call process according to another exemplary embodiment of the present application.

As shown in FIG. 4, the method for playing audio data during a call provided by the present application includes:

Step 401A, if the number of audio data packets received per unit time exceeds a preset number, determining that there is audio blocking during the call.

Specifically, a preset number may be set to represent the number of audio data packets that the electronic device receives per unit time under normal network conditions; for example, the electronic device should receive 10 audio data packets per unit time.

The unit time may be set as required. For example, it may be the time in which the electronic device receives one frame of audio data packet under normal network conditions, in which case the preset number may be 1.

In practice, the unit time may be, for example, 1 ms or 5 ms, and can be set as required.

If the number of audio data packets received by the electronic device per unit time exceeds the preset number, it can be considered that there is audio blocking during the current call and that the audio data packets are arriving at the electronic device in a burst.

In this embodiment, the electronic device can determine whether there is audio blocking during the call while it receives audio data packets. Audio blocking can be detected merely by monitoring the received audio data packets, without additional data processing, so that audio data packets received under blocking can be processed in time.
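A minimal Python sketch of the check in step 401A follows, assuming packet arrival times are available in milliseconds and using a sliding window of one unit time. The class name, the sliding-window choice, and the millisecond timestamps are illustrative assumptions; the application only specifies comparing the count per unit time with a preset number.

```python
from collections import deque


class PacketRateMonitor:
    """Hypothetical detector: flags blocking when more than `max_packets`
    packets arrive within any window of `unit_ms` milliseconds."""

    def __init__(self, unit_ms: float, max_packets: int) -> None:
        self.unit_ms = unit_ms                # the "unit time"
        self.max_packets = max_packets        # the "preset number"
        self._arrivals: deque = deque()       # arrival times inside the current window

    def on_packet(self, arrival_ms: float) -> bool:
        """Record one packet arrival; return True if audio blocking is detected."""
        self._arrivals.append(arrival_ms)
        # Keep only arrivals in the window (arrival_ms - unit_ms, arrival_ms].
        while self._arrivals and self._arrivals[0] <= arrival_ms - self.unit_ms:
            self._arrivals.popleft()
        return len(self._arrivals) > self.max_packets
```

With a unit time of 1 ms and a preset number of 10, as in the example above, a burst of 15 packets within 1 ms is flagged from the 11th packet onward, while packets arriving at the normal pace are never flagged.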

Step 401B, if the space occupied by the received audio data packets exceeds a preset buffer space, determining that there is audio blocking during the call.

Specifically, the electronic device has a buffer space for storing audio data packets, and it plays the received audio data packets while receiving new ones. Under normal conditions, packets flow into and out of the buffer, so the space occupied by the stored audio data packets does not exceed the preset buffer space.

However, if the space occupied by the received audio data packets exceeds the preset buffer space, it can be considered that the electronic device has received too many audio data packets to play in time. In this case, there is audio blocking during the current call: the audio data packets have arrived at the electronic device in a burst and cannot be played in time.

In practice, whether there is audio blocking during the current call can be determined using step 401A, step 401B, or both.

In this embodiment, no additional data processing is required: the electronic device can detect audio blocking during the call merely by monitoring the preset buffer space that stores the audio data packets, so that audio data packets received under blocking can be processed in time.
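The buffer check of step 401B, and its "and/or" combination with step 401A, might look like the following Python sketch. The `JitterBuffer` class, byte-based accounting, and the `audio_blocked` helper are hypothetical; the application does not specify how buffer occupancy is measured.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JitterBuffer:
    """Hypothetical receive buffer that tracks how much audio it holds."""
    capacity_bytes: int                              # the "preset buffer space"
    packets: List[bytes] = field(default_factory=list)

    def push(self, packet: bytes) -> None:
        self.packets.append(packet)                  # packet received (ingress)

    def pop(self) -> bytes:
        return self.packets.pop(0)                   # oldest packet played (egress)

    def occupied_bytes(self) -> int:
        return sum(len(p) for p in self.packets)

    def is_blocked(self) -> bool:
        """Step 401B: blocking if stored packets exceed the preset buffer space."""
        return self.occupied_bytes() > self.capacity_bytes


def audio_blocked(rate_exceeded: bool, buffer: JitterBuffer,
                  use_rate: bool = True, use_buffer: bool = True) -> bool:
    """Combine step 401A and/or step 401B: any enabled check is enough."""
    return (use_rate and rate_exceeded) or (use_buffer and buffer.is_blocked())
```

Here `rate_exceeded` would come from a step 401A check; either check can be disabled to use the other on its own.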

Step 402, in response to determining that there is audio blocking during the call, performing validity detection on the received audio data packets.

If the electronic device determines that there is audio blocking during the call, it can perform validity detection on the received audio data packets.

Specifically, when audio blocking is determined, the electronic device may perform validity detection on the audio data packets stored in the preset buffer space.

Validity detection means detecting the valid information and invalid information contained in an audio data packet: valid information is an audio portion with a human voice, and invalid information is an audio portion without a human voice.

In this way, the electronic device is able to identify audio segments in the audio data packets that include human voices.

Step 403, removing invalid audio segments from the audio data packet according to the detection result to obtain the valid audio segment, where the detection result includes the valid information and invalid information of the audio data packet.

In practice, the electronic device may determine each invalid audio segment in the audio data packet according to the detection result and remove it, thereby retaining the valid audio segments. The valid audio segments have a shorter total duration, so the electronic device can play the valid portion of the audio data packet in a shorter time.

The electronic device may directly cut the invalid audio segments out of the audio data packet.

FIG. 5 is a schematic diagram of determining a valid audio segment according to an exemplary embodiment of the present application.

As shown in FIG. 5, the electronic device receives an audio data packet 51, performs validity detection on it, and cuts out the invalid audio segments to obtain the valid audio segment 52.
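The cutting in FIG. 5 can be sketched in Python by representing the detection result as a list of half-open sample ranges flagged invalid. That representation and the function name are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def cut_invalid(samples: Sequence[float],
                invalid: Sequence[Tuple[int, int]]) -> List[float]:
    """Remove sample ranges [start, end) flagged invalid; keep the rest in order."""
    kept: List[float] = []
    pos = 0
    for start, end in sorted(invalid):
        start = max(start, pos)            # tolerate overlapping or nested ranges
        kept.extend(samples[pos:start])    # valid audio before this invalid range
        pos = max(pos, end)                # skip over the invalid range
    kept.extend(samples[pos:])             # trailing valid audio
    return kept
```

For instance, cutting the ranges `(0, 3)` and `(6, 8)` out of the samples `0..9` leaves `[3, 4, 5, 8, 9]`, which corresponds to the valid audio segment 52.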

In this embodiment, the electronic device shortens the audio data packet, and the shortened valid audio segment loses no valid information. Therefore, the electronic device can play the valid audio segment at normal speed.

Step 404, playing the valid audio segment in the audio data packet at a preset speed multiplier.

Specifically, the method provided by the present application may further set a preset speed multiplier, for example a value between 1 and 2, and the electronic device may play the valid audio segment at that multiplied speed.

Because the valid audio segment is shorter than the complete audio data packet, playing it at the multiplied speed further reduces playback time, so that the next audio data packet can be played as soon as possible and the buffer space of the electronic device is released as soon as possible.
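The application does not say how the multiplied-speed playback is produced. The Python sketch below uses naive linear-interpolation resampling, which is a swapped-in technique that also raises pitch by the same factor; practical systems would more likely use a pitch-preserving time-scale modification such as WSOLA. The function names and the [1, 2] range check are assumptions based on the example value above.

```python
from typing import List, Sequence


def play_duration_s(num_samples: int, sample_rate: int, rate: float = 1.0) -> float:
    """Wall-clock time needed to play num_samples at the given speed multiplier."""
    return num_samples / sample_rate / rate


def naive_speed_up(samples: Sequence[float], rate: float) -> List[float]:
    """Resample by linear interpolation so playback is `rate` times faster.

    Not pitch-preserving: pitch rises by the same factor as speed.
    """
    if not 1.0 <= rate <= 2.0:
        raise ValueError("preset speed multiplier assumed to lie in [1, 2]")
    n_out = int(len(samples) / rate)
    out: List[float] = []
    for i in range(n_out):
        x = i * rate                       # fractional read position in the input
        j = int(x)                         # always < len(samples) because i < n_out
        frac = x - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1.0 - frac) + nxt * frac)
    return out
```

For example, 3 seconds of 16 kHz audio (48000 samples) played at a multiplier of 1.5 takes `play_duration_s(48000, 16000, 1.5)`, that is, 2 seconds.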

Fig. 6 is a flowchart illustrating an audio data playing method in a call process according to another exemplary embodiment of the present application.

As shown in FIG. 6, the method for playing audio data during a call provided by the present application includes:

Step 601, in response to determining that there is audio blocking during the call, determining a valid audio segment including a human voice in the received audio data packet.

Step 601 is implemented on the same principle and in a similar manner as step 201, and is not described again.

Step 602, obtaining a second audio data packet.

In the method provided by the present application, after the electronic device finishes processing the current audio data packet, it can obtain the next audio data packet, referred to as the second audio data packet.

For example, if the electronic device determines that there is audio blocking during the call, it may perform step 601 on each currently buffered audio data packet.

Specifically, the electronic device may obtain the second audio data packet received after those audio data packets.

Step 603, combining the valid audio segment and the second audio data packet to obtain combined audio data.

Furthermore, the electronic device may combine the valid audio segment and the second audio data packet to obtain combined audio data.

In one embodiment, the electronic device may directly splice the valid audio segment with the data in the second audio data packet to obtain the combined audio data.

In practice, directly splicing the valid audio segment with the second audio data packet reduces the amount of data processing on the electronic device, so that the device can quickly play the received audio data packets with less processing and release its preset buffer space.

In another embodiment, the electronic device may perform validity detection on the second audio data packet to obtain a second valid audio segment in the second audio data packet, and splice the valid audio segment and the second valid audio segment to obtain the combined audio data.

In this embodiment, when the electronic device determines that there is audio blocking, it performs validity detection not only on the currently buffered audio data packets but also on the subsequently received second audio data packet.

Specifically, the electronic device may determine a second valid audio segment in the second audio data packet, and concatenate the valid audio segment and the second valid audio segment to obtain the combined audio data.
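The two embodiments of step 603 differ only in whether the second packet passes through validity detection before splicing. A minimal Python sketch, in which the detector is an injected function and the toy threshold detector used in the example is purely hypothetical:

```python
from typing import Callable, List, Sequence

# Hypothetical detector: takes a packet's samples, returns only its valid samples.
Detector = Callable[[Sequence[float]], List[float]]


def combine(valid_segment: Sequence[float], second_packet: Sequence[float],
            detect: Detector, detect_second: bool) -> List[float]:
    """Step 603: build combined audio data from a valid segment and the next packet."""
    if detect_second:
        # Second embodiment: run validity detection on the second packet too.
        tail = detect(second_packet)
    else:
        # First embodiment: splice the raw second packet directly (less processing).
        tail = list(second_packet)
    return list(valid_segment) + tail
```

With a toy detector that keeps samples whose magnitude exceeds 0.1, combining `[0.5, 0.6]` with `[0.0, 0.9, 0.0]` yields `[0.5, 0.6, 0.0, 0.9, 0.0]` in the first embodiment and `[0.5, 0.6, 0.9]` in the second. The first trades a longer result for less processing.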

Fig. 7 is a diagram illustrating determining combined audio data according to an exemplary embodiment of the present application.

As shown in FIG. 7, when audio is blocked, the electronic device may determine a valid audio segment 72 from the buffered audio data packet 71. The electronic device may also obtain a second audio data packet 73 received afterwards and determine a second valid audio segment 74 in it.

Further, the electronic device may splice the valid audio segment 72 with the second valid audio segment 74 to obtain combined audio data 75.

For example, suppose the total duration of the audio data packet 71 and the second audio data packet 73 is t1, and the total duration of the valid audio segment 72 and the second valid audio segment 74 is t2. Playing the audio data packet 71 and the second audio data packet 73 in full takes time t1, whereas by extracting the valid audio segments the electronic device needs only time t2 to play the valid information they contain.
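The comparison can be written out as follows, where T_{71}, T_{72}, T_{73}, T_{74} denote the durations of packet 71, segment 72, packet 73 and segment 74 (notation introduced here for illustration), and r is the preset speed multiplier of step 604. Since each valid segment is cut from its own packet, T_{72} \le T_{71} and T_{74} \le T_{73}:

```latex
t_1 = T_{71} + T_{73}, \qquad
t_2 = T_{72} + T_{74} \le t_1, \qquad
t_{\text{play}} = \frac{t_2}{r}, \quad 1 \le r \le 2 .
```

With purely illustrative numbers, two blocked 10-second packets (t_1 = 20 s) containing 12 s of speech in total (t_2 = 12 s), played at r = 1.5, need 12 / 1.5 = 8 s of playback instead of 20 s.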

With this implementation, the electronic device can quickly play the received audio data packets, avoiding the problem that the preset buffer space cannot be released in time because of a backlog of audio data packets caused by network transmission blocking.

Step 604, playing the combined audio data at the preset speed multiplier.

In practice, the electronic device can play the combined audio data at the preset speed multiplier.

This matters because, if the valid audio segment is long, the second audio data packet received afterwards cannot be played in time once the valid audio segment finishes, and even subsequently received audio data packets cannot be played in time.

Specifically, in any of the above embodiments, performing validity detection on an audio data packet by the electronic device includes:

extracting spectral feature information from the audio data packet; and determining, according to the spectral feature information and preset human-voice spectrum information, the valid audio segments of the audio data packet that include a human voice and the invalid audio segments that do not.

Further, when performing validity detection on any audio data packet, the electronic device may extract its spectral feature information; for example, the energy value of the audio at each instant in the audio data packet may be obtained and used as the spectral feature information.

In practice, human-voice spectrum information can be preset, and the spectral feature information of the audio data packet can be compared with it to determine the valid portions of the packet that include a human voice and the invalid portions that do not.
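The application leaves the exact features and the preset human-voice spectrum information open. The Python sketch below assumes two per-frame features, mean power and the share of spectral energy inside a 300 to 3400 Hz "voice band", and treats the preset information as two thresholds. The band limits, thresholds and function names are all hypothetical, and the plain DFT is written out for clarity rather than speed.

```python
import cmath
import math
from typing import List, Sequence


def band_energy_ratio(frame: Sequence[float], sample_rate: int,
                      lo_hz: float = 300.0, hi_hz: float = 3400.0) -> float:
    """Fraction of the frame's spectral energy inside [lo_hz, hi_hz] (plain DFT)."""
    n = len(frame)
    total = in_band = 0.0
    for k in range(n // 2 + 1):            # non-negative frequency bins of a real signal
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2
        total += power
        if lo_hz <= k * sample_rate / n <= hi_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0


def classify_frames(samples: Sequence[float], sample_rate: int, frame_len: int,
                    min_energy: float, min_ratio: float) -> List[bool]:
    """Mark each frame valid (voice) or invalid by comparing features with presets."""
    mask: List[bool] = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len   # mean power of the frame
        voiced = energy >= min_energy and band_energy_ratio(frame, sample_rate) >= min_ratio
        mask.append(voiced)
    return mask
```

At 8 kHz with 10 ms frames, a 1000 Hz tone passes both checks, silence fails the energy check, and a 3800 Hz tone fails the band check. A real detector would need features that separate speech from in-band noise, which this sketch does not attempt.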

With this method, the valid audio segments in the audio data packets received by the electronic device can be identified and only those segments played, which shortens audio playback time without losing information.

Fig. 8 is a schematic structural diagram of an audio data playing apparatus during a call according to an exemplary embodiment of the present application.

As shown in FIG. 8, the apparatus 800 for playing audio data during a call provided by the present application includes:

a determining unit 810, configured to determine, in response to determining that there is audio blocking during a call, a valid audio segment including a human voice in a received audio data packet; and

a playing unit 820, configured to play the valid audio segment in the audio data packet.

The apparatus for playing audio data provided by the present application includes a determining unit and a playing unit. The determining unit is configured to determine, in response to determining that there is audio blocking during a call, a valid audio segment including a human voice in a received audio data packet; the playing unit is configured to play the valid audio segment in the audio data packet. Because the apparatus plays only the valid audio segment when audio is blocked, the received audio data can be played in a shorter time without losing valid information, so that the electronic device maintains normal call quality even when network quality is poor.

Fig. 9 is a schematic structural diagram of an audio data playing apparatus during a call according to another exemplary embodiment of the present application.

As shown in fig. 9, the present application provides an apparatus 900 for playing audio data during a call, wherein the determining unit 810 includes:

a detection module 811, configured to perform validity detection on the audio data packet;

a cutting module 812, configured to remove the invalid audio segment from the audio data packet according to the detection result to obtain the valid audio segment, where the detection result includes the valid information and invalid information of the audio data packet.
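The cutting step can be illustrated with a short sketch, assuming the detection result is represented as one boolean per fixed-length frame (True for valid, False for invalid). The frame representation and the name `cut_invalid_segments` are hypothetical; the application does not specify the granularity of the detection result.

```python
import numpy as np


def cut_invalid_segments(samples: np.ndarray, valid_mask,
                         frame_len: int = 320) -> np.ndarray:
    """Remove frames flagged invalid and splice the remaining frames in order.

    `valid_mask` holds one boolean per frame (the assumed detection result);
    trailing samples that do not fill a whole frame are discarded.
    """
    mask = np.asarray(valid_mask, dtype=bool)
    frames = samples[: len(mask) * frame_len].reshape(len(mask), frame_len)
    # Keep only the valid frames and flatten them back into one segment.
    return frames[mask].reshape(-1)
```

Cutting at frame boundaries keeps the operation cheap; a production implementation might also cross-fade at the splice points to avoid audible clicks.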

The playing unit 820 is specifically configured to:

play the valid audio segment in the audio data packet at a preset magnification (playback speed multiplier).
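Playback at a preset magnification can be approximated, for illustration, by resampling the segment so that it lasts 1/rate as long. This sketch uses plain linear interpolation, which also raises the pitch; it is a stand-in chosen for brevity, not the technique of the application, and a pitch-preserving time-scale modification method (for example WSOLA or a phase vocoder) would likely be preferred in practice. The function name `play_at_rate` is hypothetical.

```python
import numpy as np


def play_at_rate(samples: np.ndarray, rate: float) -> np.ndarray:
    """Time-compress `samples` by `rate` (e.g. 1.5 means 1.5x speed).

    Naive linear-interpolation resampling: duration shrinks by `rate`,
    but pitch rises by the same factor.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    n_out = int(np.ceil(len(samples) / rate))
    # Read the input at positions 0, rate, 2*rate, ...
    src_positions = np.arange(n_out) * rate
    return np.interp(src_positions, np.arange(len(samples)), samples)
```

For example, 100 samples played at 2x become 50 samples, and at 1.5x become 67 samples.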

The apparatus further includes:

an obtaining unit 830, configured to obtain a second audio data packet;

and the playing unit 820 includes:

a combining module 821, configured to combine the valid audio segment and the second audio data packet to obtain combined audio data;

and a playing module 822, configured to play the combined audio data at a preset magnification.

In one implementation, the combining module 821 is specifically configured to:

splice the valid audio segment with the data in the second audio data packet to obtain the combined audio data.

In another implementation, the combining module 821 is specifically configured to:

perform validity detection on the second audio data packet to obtain a second valid audio segment in the second audio data packet;

and splice the valid audio segment and the second valid audio segment to obtain the combined audio data.
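Both combining variants can be expressed in one small function, sketched below under the assumption that audio is held as NumPy sample arrays. The optional `detect_valid` callable (hypothetical) stands in for validity detection on the second packet: when it is omitted, the second packet is appended unchanged (first variant); when it is supplied, only the second valid audio segment is appended (second variant).

```python
from typing import Callable, Optional

import numpy as np


def combine_segments(valid_segment: np.ndarray, second_packet: np.ndarray,
                     detect_valid: Optional[Callable[[np.ndarray], np.ndarray]] = None,
                     ) -> np.ndarray:
    """Splice the first packet's valid segment with the second packet's data."""
    # First variant: raw second packet. Second variant: its valid segment only.
    tail = second_packet if detect_valid is None else detect_valid(second_packet)
    return np.concatenate([valid_segment, tail])
```

The second variant trades extra detection work for a shorter combined segment, which in turn shortens playback at the preset magnification.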

If the number of audio data packets received per unit time exceeds a preset number, the determining unit 810 determines that audio blocking exists during the call.
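One way to read the packet-count criterion: after a network stall, delayed packets tend to arrive in a burst, so counting arrivals within a sliding window reveals the backlog. The sketch below assumes a one-second window and a hypothetical `PacketRateMonitor` class; the application does not specify the unit of time or the preset number.

```python
from collections import deque


class PacketRateMonitor:
    """Flags audio blocking when more than `max_packets` arrive within `window_s` seconds."""

    def __init__(self, max_packets: int, window_s: float = 1.0) -> None:
        self.max_packets = max_packets
        self.window_s = window_s
        self._arrivals = deque()

    def on_packet(self, arrival_time_s: float) -> bool:
        """Record one packet arrival; return True if blocking is detected."""
        self._arrivals.append(arrival_time_s)
        # Discard arrivals that have slid out of the window.
        while self._arrivals and self._arrivals[0] <= arrival_time_s - self.window_s:
            self._arrivals.popleft()
        return len(self._arrivals) > self.max_packets
```

For instance, with 20 ms packets a steady stream delivers about 50 packets per second, so a threshold somewhat above that value would flag bursts without reacting to normal traffic.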

Likewise, if the space occupied by the received audio data packets exceeds a preset buffer space, the determining unit 810 determines that audio blocking exists during the call.
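The buffer-space criterion can be sketched as a receive buffer that tracks the total size of the packets waiting to be played. The class name `ReceiveBuffer` and the use of byte counts as the measure of "space occupied" are assumptions for illustration.

```python
from collections import deque


class ReceiveBuffer:
    """Hypothetical receive buffer that reports blocking above a preset byte limit."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self._packets = deque()
        self._size = 0  # total bytes currently buffered

    def push(self, packet: bytes) -> None:
        """Called when a packet arrives from the network."""
        self._packets.append(packet)
        self._size += len(packet)

    def pop(self) -> bytes:
        """Called by the playback side to consume the oldest packet."""
        packet = self._packets.popleft()
        self._size -= len(packet)
        return packet

    def is_blocked(self) -> bool:
        return self._size > self.max_bytes
```

Unlike the arrival-rate check, this criterion also reflects how fast playback drains the buffer, since `pop` reduces the occupied space.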

The detection module 811 and/or the combining module 821 are specifically configured to:

extract spectral feature information from the audio data packet;

and determine, according to the spectral feature information and preset human-voice spectrum information, the valid audio segment of the audio data packet that contains human voice and the invalid audio segment that does not.

An embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the method for playing audio data in a call process as described above is implemented.

An embodiment of the present application further provides a terminal with a call function, including the apparatus for playing audio data during a call shown in fig. 8 or 9.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.

Fig. 10 is a block diagram of an electronic device for implementing the audio data playing method during a call according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 10, the electronic apparatus 1000 includes: one or more processors 1001, memory 1002, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 10 illustrates an example of one processor 1001.

The memory 1002 is a non-transitory computer readable storage medium provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor executes the audio data playing method in the call process provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the audio data playing method during a call provided by the present application.

The memory 1002, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the audio data playing method during a call in the embodiment of the present application (e.g., the determining unit 810, the playing unit 820 shown in fig. 8). The processor 1001 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 1002, that is, implements the audio data playing method during a call in the above method embodiment.

The memory 1002 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the electronic device 1000 that implements the audio data playing method during a call, and the like. Further, the memory 1002 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1002 may optionally include memory located remotely from the processor 1001, and such remote memory may be connected to the electronic device 1000 over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device 1000 for implementing the audio data playing method during a call may further include an input device 1003 and an output device 1004. The processor 1001, the memory 1002, the input device 1003, and the output device 1004 may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 10.

The input device 1003 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device 1000, and may be, for example, a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, or a joystick. The output device 1004 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the difficult management and weak service scalability of traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; no limitation is imposed here as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

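1. An audio data playing method in a call process, comprising the following steps:

if it is determined that audio blocking exists during a call, determining a valid audio segment comprising a human voice in a received audio data packet; and

playing the valid audio segment in the audio data packet.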

2. The method of claim 1, wherein said determining a valid audio segment comprising a human voice in a received audio data packet comprises:

performing validity detection on the audio data packet;

according to the detection result, removing the invalid audio segment from the audio data packet to obtain the valid audio segment; wherein, the detection result includes valid information and invalid information of the audio data packet.

3. The method of claim 1, wherein playing the valid audio segment in the audio data packet comprises:

playing the valid audio segment in the audio data packet at a preset magnification.

4. The method of claim 1, further comprising:

acquiring a second audio data packet;

wherein playing the valid audio segment in the audio data packet comprises:

combining the valid audio segment and the second audio data packet to obtain combined audio data;

and playing the combined audio data at a preset magnification.

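5. The method of claim 4, wherein combining the valid audio segment and the second audio data packet to obtain combined audio data comprises:

splicing the valid audio segment with the data in the second audio data packet to obtain the combined audio data.

6. The method of claim 4, wherein combining the valid audio segment and the second audio data packet to obtain combined audio data comprises:

performing validity detection on the second audio data packet to obtain a second valid audio segment in the second audio data packet;

and splicing the valid audio segment and the second valid audio segment to obtain the combined audio data.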

7. The method according to any one of claims 1 to 6, wherein if the number of audio packets received in a unit time exceeds a predetermined number value, it is determined that there is audio blocking during the call.

8. The method according to any one of claims 1 to 6, wherein if the space occupied by the received audio data packet exceeds a preset buffer space, it is determined that there is an audio blocking condition during the call.

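9. The method of claim 2 or 6, wherein performing validity detection on the audio data packet comprises:

extracting spectral feature information from the audio data packet;

and determining, according to the spectral feature information and preset human voice spectrum information, the valid audio segment comprising the human voice and the invalid audio segment without the human voice in the audio data packet.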

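10. An audio data playing apparatus in a call process, comprising:

a determining unit configured to determine, if audio blocking is determined to exist during a call, a valid audio segment comprising a human voice in a received audio data packet; and

a playing unit configured to play the valid audio segment in the audio data packet.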

11. The apparatus of claim 10, wherein the determining unit comprises:

the detection module is configured to perform validity detection on the audio data packet;

the cutting module is used for removing the invalid audio segment from the audio data packet according to the detection result to obtain the valid audio segment; wherein, the detection result includes valid information and invalid information of the audio data packet.

12. The apparatus according to claim 10, wherein the playing unit is specifically configured to:

play the valid audio segment in the audio data packet at a preset magnification.

13. The apparatus of claim 10, further comprising:

an acquisition unit configured to acquire a second audio data packet;

the playback unit includes:

the combination module is configured to combine the valid audio segment and the second audio data packet to obtain combined audio data;

and the playing module is configured to play the combined audio data at a preset magnification.

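14. The apparatus of claim 13, wherein the combination module is specifically configured to:

splice the valid audio segment with the data in the second audio data packet to obtain the combined audio data.

15. The apparatus of claim 13, wherein the combination module is specifically configured to:

perform validity detection on the second audio data packet to obtain a second valid audio segment in the second audio data packet;

and splice the valid audio segment and the second valid audio segment to obtain the combined audio data.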

16. The apparatus according to any one of claims 10-15, wherein the determining unit determines that there is an audio blocking condition during the call if the number of audio packets received per unit time exceeds a preset number value.

17. The apparatus according to any one of claims 10 to 15, wherein the determining unit determines that there is an audio blocking condition during a call if a space occupied by the received audio data packet exceeds a preset buffer space.

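18. The apparatus of claim 11 or 15, wherein the detection module and/or the combination module are specifically configured to:

extract spectral feature information from the audio data packet;

and determine, according to the spectral feature information and preset human voice spectrum information, the valid audio segment comprising the human voice and the invalid audio segment without the human voice in the audio data packet.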

19. An electronic device, comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.

20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-9.

21. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-9.

22. A terminal having a call function, comprising: an apparatus for playing audio data during a call as claimed in any one of claims 10-18.