CN108449506B

CN108449506B - Voice call data processing method and device, storage medium and mobile terminal

Info

Publication number: CN108449506B
Application number: CN201810201879.4A
Authority: CN
Inventors: 郑志勇; 柳明; 李智豪
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2018-03-12
Filing date: 2018-03-12
Publication date: 2020-04-17
Anticipated expiration: 2038-03-12
Also published as: CN108449506A

Abstract

The embodiments of the application disclose a voice call data processing method and device, a storage medium, and a mobile terminal. The method comprises: after a voice call group in a preset application program is successfully established, acquiring first sound data collected by a microphone of the mobile terminal during the current time period; when it is detected that a howling prevention processing event is triggered, separating the human voice from the background sound in the first sound data and attenuating the separated background sound; and mixing the attenuated background sound with the separated human voice to obtain first uplink voice call data, and sending the first uplink voice call data to a server corresponding to the preset application program. With this technical solution, once the voice call group of the preset application program in the mobile terminal has been established and the howling prevention processing event is detected to be triggered, howling prevention processing can be applied promptly to the uplink voice call data of the current mobile terminal.

Description

Voice call data processing method and device, storage medium and mobile terminal

Technical Field

The embodiments of the application relate to the technical field of voice calls, and in particular to a voice call data processing method, a voice call data processing device, a storage medium, and a mobile terminal.

Background

At present, with the rapid spread of mobile terminals, devices such as mobile phones and tablet computers have become indispensable communication tools. Communication between mobile terminal users is becoming increasingly diverse and is no longer limited to the traditional telephone and short message services long provided by mobile network operators; in many scenarios, users prefer internet-based communication, such as the voice chat and video chat functions of various social applications.

In addition, application programs (APPs) on mobile terminals are increasingly capable, and many of them include a voice call function that makes it easy for users of the same APP to communicate. Taking game applications as an example, some games that require interaction between players have a built-in voice communication function, allowing a user to talk with other players while playing on a mobile terminal. During such a voice call, however, the sound data contains many kinds of sound, such as the speech of each player, sound produced by the application itself (e.g., game background music or sound effects), and other sounds from the environment in which the mobile terminal is located. Because this mixture is complex, howling easily occurs, which seriously degrades the user experience.

Disclosure of Invention

The embodiment of the application provides a voice call data processing method, a voice call data processing device, a storage medium and a mobile terminal, which can perform howling prevention processing in a targeted manner after a voice call function in a mobile terminal application program is started.

In a first aspect, an embodiment of the present application provides a voice call data processing method, including:

after a voice call group in a preset application program is successfully established, acquiring first sound data collected by a microphone of a mobile terminal during the current time period;

when it is detected that a howling prevention processing event is triggered, performing a human voice and background sound separation operation on the first sound data, wherein the separation operation comprises: acquiring each sound source position corresponding to the first sound data; determining, among the sound source positions, a sound source position whose distance to the mobile terminal is smaller than a first preset distance value as a target sound source position; and taking the sound corresponding to the target sound source position in the first sound data as the human voice, and taking the sound remaining after the human voice is separated from the first sound data as the background sound;

attenuating the separated background sound;

and mixing the attenuated background sound with the separated human voice to obtain first uplink voice call data, and sending the first uplink voice call data to a server corresponding to the preset application program.

In a second aspect, an embodiment of the present application provides a voice call data processing apparatus, including:

the voice data acquisition module, configured to acquire first sound data collected by a microphone of the mobile terminal during the current time period after a voice call group in a preset application program is successfully established;

a sound data separation module, configured to, when it is detected that a howling prevention processing event is triggered, perform a human voice and background sound separation operation on the first sound data, where the separation operation includes: acquiring each sound source position corresponding to the first sound data; determining, among the sound source positions, a sound source position whose distance to the mobile terminal is smaller than a first preset distance value as a target sound source position; and taking the sound corresponding to the target sound source position in the first sound data as the human voice, and taking the sound remaining after the human voice is separated from the first sound data as the background sound;

the background sound weakening module, configured to attenuate the separated background sound;

and the sound mixing processing module, configured to mix the attenuated background sound with the separated human voice to obtain first uplink voice call data, and to send the first uplink voice call data to the server corresponding to the preset application program.

In a third aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a voice call data processing method according to an embodiment of the present application.

In a fourth aspect, an embodiment of the present application provides a mobile terminal, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the voice call data processing method according to the embodiment of the present application.

According to the voice call data processing scheme provided in the embodiments of the application, after a voice call group in a preset application program of a mobile terminal is successfully established, first sound data collected by a microphone of the mobile terminal during the current time period is obtained; when it is detected that a howling prevention processing event is triggered, the human voice and background sound in the first sound data are separated according to the sound source positions corresponding to the first sound data; the background sound is attenuated; the attenuated background sound and the separated human voice are mixed to obtain uplink voice call data; and the uplink voice call data is sent to a server corresponding to the preset application program. With this technical solution, once the voice call group is established and the howling prevention processing event is detected to be triggered, howling prevention processing can be applied promptly to the uplink voice call data of the current mobile terminal, reducing the inconvenience that howling causes users.

Drawings

Fig. 1 is a schematic flowchart of a voice call data processing method according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of another voice call data processing method according to an embodiment of the present application;

Fig. 3 is a schematic flowchart of another voice call data processing method according to an embodiment of the present application;

Fig. 4 is a block diagram of a voice call data processing apparatus according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a mobile terminal according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of another mobile terminal according to an embodiment of the present application.

Detailed Description

The technical solution of the application is further explained below through specific embodiments with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein merely illustrate the application and do not limit it. It should also be noted that, for convenience of description, the drawings show only the structures related to the present application rather than all structures.

Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.

Fig. 1 is a flowchart illustrating a voice call data processing method according to an embodiment of the present application, where the method may be executed by a voice call data processing apparatus, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in a mobile terminal. As shown in fig. 1, the method includes:

Step 101: After a voice call group in a preset application program is successfully established, acquire first sound data collected by the microphone of the mobile terminal during the current time period.

For example, the mobile terminal in the embodiments of the present application may include mobile devices such as mobile phones and tablet computers. The preset application may be any application with a built-in group voice call function, such as an online game application, an online classroom application, a video conference application, or another application that requires multi-person collaboration.

For example, a voice call group may include 2 members, but in most cases it includes 3 or more members, enabling voice calls among 3 or more mobile terminals. The voice call group may be established at a user's initiative through the preset application program on the mobile terminal, and once it is established, all mobile terminals in the group can communicate with each other. Generally, when the mobile terminal is not in mute mode or earphone mode, it can be regarded as being in play-out (loudspeaker) mode: each user's voice is collected by the microphone of that user's mobile terminal and, after network transmission and processing, is played through the speakers of the other users' mobile terminals. Taking a game application as an example, when players need to cooperate as a team, the team voice function can be enabled. If a team has 5 players, then once the voice call group is established, the 5 players can talk with each other and each player can hear what the other 4 players say at the same time, which makes it convenient to communicate while playing.

Generally, when the mobile terminal is in play-out mode, the sound collected by its microphone includes not only the user's speech but may also include sound emitted by the preset application program and played through the speaker (such as background music), ambient sound, and the speech of other members of the voice call group played through the speaker. Consequently, when multiple mobile terminals send data containing the various sounds they each collected to the same mobile terminal over the network (for example, a voice call group includes 5 mobile terminals, 4 of which send their collected sound to the server, and the server forwards the sound data of these 4 terminals to the 5th), these sounds are mixed and played on that mobile terminal, which can produce howling.

In the embodiment of the present application, the current time period can be understood as the window of a preset duration that ends at the current moment. Its length may be determined by factors such as the configuration and data processing capability of the mobile terminal and the timeliness requirements of the voice call; the embodiments of the present application do not limit it. For example, it may be 300 milliseconds, or any duration between 100 milliseconds and 1 second.
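As an illustration of the "current time period", the sketch below keeps only the most recent window of microphone samples in a fixed-length buffer. It is a minimal sketch, assuming mono samples delivered in chunks at a sample rate `fs`; the class name `CurrentPeriodBuffer` and the 300 ms default are illustrative choices, not part of the described method.

```python
from collections import deque

import numpy as np


class CurrentPeriodBuffer:
    """Holds the most recent `period_ms` of mono samples (hypothetical helper)."""

    def __init__(self, fs: int, period_ms: int = 300):
        # Samples in one "current time period", e.g. 16 kHz * 300 ms = 4800.
        self.samples = deque(maxlen=fs * period_ms // 1000)

    def push(self, chunk) -> None:
        # Once maxlen is reached, the oldest samples drop off the front automatically.
        self.samples.extend(chunk)

    def snapshot(self) -> np.ndarray:
        # The current time period as a float array, oldest sample first.
        return np.array(self.samples, dtype=float)
```

Calling `snapshot()` after each `push()` yields the first sound data for the window ending at that moment.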

For example, the first sound data may include the speech of the user of the current mobile terminal collected during the current time period, and may also include ambient sound from the environment in which the mobile terminal is currently located. In addition, when the speaker or receiver of the mobile terminal plays downlink voice call data, not only does the user hear it, but the microphone of the mobile terminal may also pick it up; that is, the first sound data also includes sound corresponding to the downlink voice call data. Accordingly, if the downlink voice call data contains howling, the first sound data collected by the microphone also contains that howling.

Step 102: When it is detected that the howling prevention processing event is triggered, perform a human voice and background sound separation operation on the first sound data.

The separation operation comprises: acquiring each sound source position corresponding to the first sound data; determining, among the sound source positions, a sound source position whose distance to the mobile terminal is smaller than a first preset distance value as a target sound source position; and taking the sound corresponding to the target sound source position in the first sound data as the human voice, and taking the sound remaining after the human voice is separated from the first sound data as the background sound.

In the embodiment of the present application, in order to perform howling prevention processing at an appropriate time, the conditions under which the howling prevention processing event is triggered may be set in advance. To ensure that howling prevention is applied promptly, the event may be triggered immediately after the voice call group in the preset application program is successfully established. Optionally, to make howling prevention more targeted and avoid the extra power consumption of the processing, scenes in which howling is likely to occur can be identified through theoretical analysis or research, a suitable preset scene can be defined, and the howling prevention processing event is triggered when the mobile terminal is detected to be in that preset scene.

In the embodiment of the application, when it is detected that the howling prevention processing event is triggered, the human voice and background sound in the first sound data are separated, which lays the foundation for howling prevention processing of the first sound data. The separation proceeds as follows: the sound source positions corresponding to the first sound data are obtained; a sound source position whose distance from the mobile terminal is smaller than the first preset distance value is taken as the target sound source position; the sound corresponding to the target sound source position in the first sound data is taken as the human voice; and what remains after the human voice is separated from the first sound data is taken as the background sound. The advantage is that the human voice and background sound contained in the first sound data can be separated quickly.

For example, a mobile terminal usually has a microphone array (two or more microphones), which can be used to determine the position of each sound source in the first sound data collected by the microphones, i.e., the sound source positions. A sound source whose distance from the current mobile terminal is smaller than the first preset distance value is taken as the target sound source, and the sound it emits is taken as the human voice. Because the user's position relative to the mobile terminal is usually fixed and the distance between them is short, the first preset distance value may be set to 0.5 meter or 1 meter: a sound source close to the mobile terminal (for example, less than 1 meter or less than 0.5 meter away) is selected from the sound source positions as the target sound source, the sound it emits is taken as the human voice, and the sound remaining after the human voice is separated from the first sound data is taken as the background sound. It should be noted that the embodiments of the present application do not specifically limit how the sound source positions corresponding to the first sound data are determined. For example, spectrum analysis may be performed on the first sound data to determine the frequency and bandwidth of each sound source it contains, and the sound source positions may then be determined based on those frequencies and bandwidths.
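The distance rule of the separation operation can be sketched as follows. This is a minimal sketch that assumes an upstream localization and source-separation stage (which the method leaves open) has already produced, for each sound source, an estimated distance to the terminal and that source's waveform; `SourceEstimate` and `split_voice_background` are hypothetical names.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SourceEstimate:
    distance_m: float   # estimated distance from the mobile terminal (assumed given)
    signal: np.ndarray  # this source's separated waveform (assumed given)


def split_voice_background(first_sound, sources, first_preset_distance_m=0.5):
    """Sources closer than the first preset distance are treated as human voice;
    whatever remains of the first sound data is treated as background sound."""
    first_sound = np.asarray(first_sound, dtype=float)
    voice = np.zeros_like(first_sound)
    for src in sources:
        if src.distance_m < first_preset_distance_m:
            voice += src.signal
    background = first_sound - voice
    return voice, background
```

Because the background is defined as the remainder, an imperfect source estimate moves energy between the two outputs rather than discarding it.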

Step 103: Attenuate the separated background sound.

For example, the separated background sound may be attenuated by lowering its gain to reduce its volume, or by filtering it. Once the background sound is attenuated, its volume drops and the feedback loop in which the sound grows ever louder is broken, which effectively suppresses howling caused by the background sound.
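Gain-based attenuation, the first option above, amounts to multiplying the background samples by a factor below 1. The sketch assumes floating-point samples; the -20 dB default is an illustrative value, not one given in the text.

```python
import numpy as np


def attenuate_background(background, gain_db: float = -20.0) -> np.ndarray:
    """Scale the background sound by a gain in decibels (negative means quieter)."""
    factor = 10.0 ** (gain_db / 20.0)  # amplitude ratio, e.g. -20 dB -> 0.1
    return np.asarray(background, dtype=float) * factor
```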

Step 104: Mix the attenuated background sound with the separated human voice to obtain first uplink voice call data, and send the first uplink voice call data to the server corresponding to the preset application program.

In the embodiment of the application, mixing the attenuated background sound with the separated human voice amounts to performing howling prevention processing on the first sound data, and yields the first uplink voice call data. In other words, the first sound data after howling prevention processing serves as the first uplink voice call data, which is sent to the server corresponding to the preset application program. The advantage of this arrangement is that when the server forwards the uplink voice call data to the other mobile terminals in the voice call group, howling caused by background sound in the sound data they receive can be effectively avoided.
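The mixing in step 104 can be as simple as sample-wise addition. The sketch below assumes both signals are float arrays at the same sample rate with full scale at ±1.0, and hard-clips the sum to that range; the function name is hypothetical, and a production mixer might use a limiter instead of clipping.

```python
import numpy as np


def mix_uplink(voice, attenuated_background) -> np.ndarray:
    """Sum the separated human voice and the attenuated background sound,
    truncating to the shorter signal and clipping to the [-1.0, 1.0] range."""
    n = min(len(voice), len(attenuated_background))
    mixed = (np.asarray(voice[:n], dtype=float)
             + np.asarray(attenuated_background[:n], dtype=float))
    return np.clip(mixed, -1.0, 1.0)
```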

According to the voice call data processing method provided by the embodiments of the application, after a voice call group in a preset application program of a mobile terminal is successfully established, first sound data collected by a microphone of the mobile terminal during the current time period is obtained; when it is detected that a howling prevention processing event is triggered, the human voice and background sound in the first sound data are separated according to the sound source positions corresponding to the first sound data; the background sound is attenuated; the attenuated background sound and the separated human voice are mixed to obtain uplink voice call data; and the uplink voice call data is sent to the server corresponding to the preset application program. With this technical solution, once the voice call group is established and the howling prevention processing event is detected to be triggered, howling prevention processing can be applied promptly to the uplink voice call data of the current mobile terminal, reducing the inconvenience that howling causes users.

In multi-user voice scenarios, the inventors found that even after the background sound in the sound data collected by the microphone of the mobile terminal in the current time period has been attenuated, howling is still likely to occur when the attenuated sound data is highly similar to the attenuated sound data of the previous time period, that is, when the uplink voice call data obtained by the mobile terminal for the current time period is highly similar to that for the previous time period. This is because if the current mobile terminal keeps playing the same sound data over a certain period and sends it to the server, the server forwards the same sound data collected during that period to the other mobile terminals in the voice call group; when those terminals play the same sound data, the superposition of the repeated data amplifies the sound multiplicatively and produces howling. Therefore, in the embodiments of the present application, it is further necessary to determine the similarity between the uplink voice call data of the current time period and that of the previous time period, judge whether this similarity exceeds a certain similarity threshold, and if so, perform howling prevention processing on the uplink voice call data of the current time period.
Optionally, the uplink voice call data of the current time period may instead be superimposed, in simulation, with the uplink voice call data of the previous time period, that is, the uplink data of the two adjacent time periods is simulated as being played simultaneously; the superimposed sound data is then checked for howling, and if howling is present, further howling prevention processing is performed on the uplink voice call data of the current time period.

In some embodiments, sending the first uplink voice call data to the server corresponding to the preset application program includes: comparing the first uplink voice call data with pre-stored second uplink voice call data to determine the similarity between them, where the second uplink voice call data is the uplink voice call data obtained by the mobile terminal in the previous time period; when the similarity is greater than a preset similarity value, performing howling prevention processing on the first uplink voice call data to obtain target uplink voice call data; and sending the target uplink voice call data to the server corresponding to the preset application program. The advantage of this arrangement is that the similarity of the uplink voice call data obtained in two adjacent time periods can be determined simply and quickly, and hence whether the uplink voice call data of the current time period needs howling prevention processing.

Illustratively, the pre-stored second uplink voice call data is obtained. It may be stored in a buffer of the uplink voice channel of the mobile terminal, and it is the sound segment obtained by attenuating the background sound in the sound data collected by the microphone of the mobile terminal in the previous time period, that is, the uplink voice call data obtained by the mobile terminal in the previous time period. The second uplink voice call data is not fixed but is updated every preset time period, for example every 300 milliseconds; in that case it is the background-attenuated version of the sound data collected in the previous 300 milliseconds. Likewise, the first uplink voice call data is the uplink voice call data of the current 300 milliseconds, and the similarity between the two is determined by comparing the first uplink voice call data with the previous 300 milliseconds of uplink voice call data. The first uplink voice call data may be compared as a whole with the second uplink voice call data as a whole, and the comparison result taken as the similarity of the two adjacent uplink segments. The greater the similarity, the more alike the first and second uplink voice call data are, that is, the more identical or similar sound content the two contain. When the similarity between them (for example, 0.7) is greater than a preset similarity threshold (for example, 0.5), howling prevention processing is performed on the first uplink voice call data to obtain target uplink voice call data, which is sent to the server corresponding to the preset application program.
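The text does not fix a similarity measure for the whole-segment comparison. As one possible choice, the sketch below uses the cosine similarity of the two segments (truncated to equal length and clamped at 0) and compares it with the example threshold of 0.5; the metric and the function names are assumptions.

```python
import numpy as np

PRESET_SIMILARITY = 0.5  # example threshold taken from the text


def similarity(current, previous) -> float:
    """Cosine similarity of two uplink segments, clamped to [0, 1]."""
    n = min(len(current), len(previous))
    a = np.asarray(current[:n], dtype=float)
    b = np.asarray(previous[:n], dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # silence on either side: treat as dissimilar
    return max(0.0, float(np.dot(a, b) / denom))


def needs_howling_prevention(current, previous, threshold=PRESET_SIMILARITY) -> bool:
    """True when the current segment is too similar to the previous one."""
    return similarity(current, previous) > threshold
```

A zero-lag comparison like this misses content that recurs with a time shift; catching those would require searching over time lags.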

Optionally, comparing the first uplink voice call data with the pre-stored second uplink voice call data and determining the similarity between them includes: dividing the first uplink voice call data into data blocks and comparing each data block with the second uplink voice call data to obtain a sub-similarity for that data block; and summing the sub-similarities to obtain the similarity between the first uplink voice call data and the second uplink voice call data. Correspondingly, performing howling prevention processing on the first uplink voice call data includes: determining, in the first uplink voice call data, a preset number of target audio signals with the largest sub-similarities; and attenuating or removing the target audio signals. The advantage of this arrangement is that the similarity between the first and second uplink voice call data can be determined accurately.

For example, the first uplink voice call data is divided into blocks of a preset unit length, such as 30 milliseconds. If the first uplink voice call data is the sound segment of the current 300-millisecond time period and the preset unit length is 30 milliseconds, it can be divided into 10 data blocks. Each of the 10 data blocks is compared with the second uplink voice call data to obtain 10 corresponding sub-similarities, and the sum of the 10 sub-similarities is taken as the similarity between the first and second uplink voice call data. Alternatively, the average of the sub-similarities may be used as the similarity. When this similarity is high (greater than the preset similarity value), the target audio signals with the preset number of largest sub-similarities are determined in the first uplink voice call data and are attenuated or removed. The first uplink voice call data with the target audio signals attenuated or removed is taken as the target uplink voice call data and sent to the server corresponding to the preset application program. The preset number may be set according to actual requirements, for example 3. In that case, of the 10 sub-similarities obtained from the 10 data blocks, the three largest are selected as the target similarities, the data in the first uplink voice call data corresponding to the target similarities is taken as the target audio signals, and these are attenuated or removed.
Attenuation may reduce the volume of a target audio signal by adjusting its gain, while removal may filter out the target audio signal directly.
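The block-wise comparison can be sketched as follows. It is a minimal sketch under these assumptions: each block's sub-similarity is its best cosine similarity against any block-length window of the previous segment, scanned with a hypothetical stride `hop`; `gain=0.0` stands for removal and `0 < gain < 1` for attenuation. The function names are illustrative.

```python
import numpy as np


def sub_similarities(current, previous, block_len, hop):
    """Split `current` into non-overlapping blocks; for each block return the best
    cosine similarity against any `block_len` window of `previous` (stride `hop`)."""
    current = np.asarray(current, dtype=float)
    previous = np.asarray(previous, dtype=float)
    n_blocks = len(current) // block_len
    sims = np.zeros(n_blocks)
    for k in range(n_blocks):
        blk = current[k * block_len:(k + 1) * block_len]
        blk_norm = np.linalg.norm(blk)
        if blk_norm == 0.0:
            continue  # a silent block has similarity 0
        for start in range(0, len(previous) - block_len + 1, hop):
            win = previous[start:start + block_len]
            win_norm = np.linalg.norm(win)
            if win_norm > 0.0:
                sims[k] = max(sims[k], float(np.dot(blk, win)) / (blk_norm * win_norm))
    return sims


def suppress_top_blocks(current, sims, block_len, preset_number=3, gain=0.0):
    """Scale the `preset_number` blocks with the largest sub-similarity by `gain`."""
    out = np.asarray(current, dtype=float).copy()
    for k in np.argsort(sims)[::-1][:preset_number]:
        out[k * block_len:(k + 1) * block_len] *= gain
    return out
```

With 30 ms blocks at 16 kHz, `block_len` is 480 samples and a 300 ms segment yields 10 sub-similarities. If their sum is used as the overall similarity, the preset similarity value must be on that scale (up to the number of blocks); the average stays in [0, 1].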

In some embodiments, sending the first uplink voice call data to the server corresponding to the preset application program includes: superimposing, in simulation (analog superposition), the first uplink voice call data and the pre-stored second uplink voice call data to obtain analog sound data, where the second uplink voice call data is the uplink voice call data obtained by the mobile terminal in the previous time period; when it is determined that a howling point exists in the analog sound data, performing howling prevention processing on the first uplink voice call data to obtain target uplink voice call data; and sending the target uplink voice call data to the server corresponding to the preset application program. The advantage of this arrangement is that it can be accurately determined whether playing the uplink voice call data of the two adjacent time periods together would produce howling, and therefore whether further howling prevention processing is needed for the uplink voice call data of the current time period.

Illustratively, the first uplink voice call data of the current time period and the second uplink voice call data of the previous time period are superimposed to obtain analog sound data. It is then judged whether the analog sound data exhibits howling characteristics; if so, it is determined that a howling point exists in the analog sound data. Howling characteristics may include energy concentration, periodicity, and a frequency higher than a preset frequency threshold. Alternatively, howling detection may be performed on the superimposed data to determine whether a howling point, that is, a howling sound, exists in the analog sound data.
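A crude version of the superposition and of the energy-concentration and frequency checks might look like this. It is a sketch under stated assumptions: energy concentration is measured as the ratio of the strongest spectral bin to the mean bin power of a Hann-windowed spectrum, both thresholds are illustrative, and periodicity is not checked here. The function names are hypothetical.

```python
import numpy as np


def simulate_superposition(current, previous) -> np.ndarray:
    """Add the current and previous uplink segments as if played at the same time."""
    n = min(len(current), len(previous))
    return np.asarray(current[:n], dtype=float) + np.asarray(previous[:n], dtype=float)


def has_howling_features(x, fs, min_freq_hz=1000.0, concentration_db=20.0) -> bool:
    """Energy concentration: strongest bin vs. mean bin power (Hann-windowed).
    Frequency condition: the strongest bin lies above min_freq_hz."""
    x = np.asarray(x, dtype=float)
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    mean_power = spectrum.mean()
    if mean_power == 0.0:
        return False  # silence cannot howl
    peak = int(np.argmax(spectrum))
    peak_freq = np.fft.rfftfreq(len(x), d=1.0 / fs)[peak]
    ratio_db = 10.0 * np.log10(spectrum[peak] / mean_power)
    return bool(peak_freq > min_freq_hz and ratio_db > concentration_db)
```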

In some embodiments, whether a howling sound exists in the analog sound data may be determined in either of the following ways:

First, the analog sound data is divided into blocks, and for each data block, the suspected howling points in that block are determined using a preset analysis method. When there are a plurality of suspected howling point groups exhibiting a periodic characteristic, and the energy values of the suspected howling points rise in the order of the data blocks they belong to, it is determined that a howling sound exists in the analog sound data. A suspected howling point group consists of suspected howling points in consecutive adjacent data blocks whose frequency differences are within a preset range, where the number of those consecutive adjacent data blocks reaches a preset consecutive threshold.

Second, the analog sound data is divided into M data blocks. The data blocks are analyzed in sequence with the preset analysis method for suspected howling points, and the data block in which a suspected howling point first appears is determined as the starting data block. Then, the n consecutive data blocks beginning at the starting data block are taken in turn as the data segment to be analyzed, the suspected howling point in the current data segment is determined with the preset analysis method, and when the frequency differences between the suspected howling points of the N data segments are within a preset range, it is determined that a howling sound exists in the analog sound data. Here n = 2, 3, …, N, with 2 ≤ N ≤ M; every data segment begins at the starting point of the starting data block, and the starting data block itself is the first data segment.

Of course, other methods may also be used in the embodiments of the present application to determine whether a howling sound exists in the analog sound data; the present application is not limited in this respect. The two modes above are explained in detail below by way of example.

In the first mode, the analog sound data may be divided into blocks of a preset unit length, for example 40 milliseconds. If the analog sound data obtained by superposition lasts 600 milliseconds and the preset unit length is 40 milliseconds, the analog sound data is divided into 15 data blocks.
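The block division can be sketched as follows, assuming the analog sound data is a list of PCM samples at 16 kHz (the sampling rate used in the example below) and that a trailing partial block, which the text does not address, is simply dropped. `split_into_blocks` is a hypothetical name.

```python
def split_into_blocks(samples, sample_rate=16000, block_ms=40):
    """Split samples into consecutive blocks of block_ms milliseconds.

    A trailing partial block is dropped (an assumption; the text does not cover it).
    """
    block_len = sample_rate * block_ms // 1000  # 640 samples for 40 ms at 16 kHz
    n_blocks = len(samples) // block_len
    return [samples[i * block_len:(i + 1) * block_len] for i in range(n_blocks)]


# 600 ms at 16 kHz is 9600 samples, which gives 15 blocks of 640 samples.
blocks = split_into_blocks([0] * 9600)
print(len(blocks), len(blocks[0]))  # 15 640
```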

The preset analysis method is not specifically limited in the embodiments of the present application. For example, it may include: in the frequency domain, obtaining a candidate frequency point in the high-frequency region whose energy value is higher than a preset energy threshold; calculating the energy difference value over a preset number of frequency points around the candidate; and, when the energy difference value is greater than a preset difference threshold, determining the candidate as a suspected howling point. The high-frequency region is the frequency range above a preset frequency threshold.

Specifically, the current data block may first be transformed from the time domain to the frequency domain to facilitate spectral analysis. The transform is not limited in the embodiments of the present application; a Fourier transform, such as the Fast Fourier Transform (FFT), may be used. Taking 40 ms as an example, 40 ms of audio data (16-bit, 16 kHz sampling rate) occupies 40 × 16 × 16 / 8 = 1280 bytes, which is suitable for spectrum analysis with 1024 frequency points. After FFT processing, the analyzed frequency range is 0 to 16 kHz / 2 = 8 kHz, and the step size is (16 kHz / 2) / 1024 ≈ 7.8 Hz, that is, about 8 Hz.
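The arithmetic above and the per-block transform can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: to obtain about 1024 frequency points over 0 to 8 kHz (a step of 16000 / 2048 = 7.8125 Hz), the 640-sample block is zero-padded to a 2048-point FFT, because an n-point transform of a real signal yields n / 2 + 1 distinct bins spaced fs / n apart. Energies are expressed in dB relative to a full-scale sine, which is also an assumed scaling. `fft` and `block_spectrum_db` are hypothetical names, and the recursive radix-2 FFT stands in for whatever FFT routine a real implementation would use.

```python
import cmath
import math

# 40 ms x 16 samples/ms x 16 bits / 8 bits per byte = 1280 bytes
BLOCK_BYTES = 40 * 16 * 16 // 8


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def block_spectrum_db(block, sample_rate=16000, n_fft=2048):
    """Return (step_hz, energies_db) for bins 0 .. n_fft / 2 of one 16-bit PCM block."""
    if len(block) > n_fft:
        raise ValueError("block is longer than the FFT size")
    padded = [s / 32768.0 for s in block] + [0.0] * (n_fft - len(block))
    spectrum = fft(padded)
    step_hz = sample_rate / n_fft
    # For a bin-centred sine over L samples, 2|X[k]| / L approximates its amplitude,
    # so a full-scale tone reads about 0 dB.
    energies_db = [20.0 * math.log10(2.0 * abs(spectrum[k]) / len(block) + 1e-12)
                   for k in range(n_fft // 2 + 1)]
    return step_hz, energies_db


# A 3 kHz tone at half of full scale peaks at bin 3000 / 7.8125 = 384, about -6 dB.
tone = [round(16384 * math.sin(2 * math.pi * 3000 * n / 16000)) for n in range(640)]
step, db = block_spectrum_db(tone)
peak = max(range(len(db)), key=db.__getitem__)
print(BLOCK_BYTES, step, peak, round(db[peak], 1))  # 1280 7.8125 384 -6.0
```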

In the embodiments of the present application, a frequency threshold may be preset as the boundary between the high-frequency region and the other regions. The preset frequency threshold can be set according to actual conditions, for example according to the frequency of the human voice and the frequencies at which howling tends to occur, such as 1 kHz, 1.5 kHz or 2 kHz. If the preset frequency threshold is 2 kHz, the part above 2 kHz is the high-frequency region. Howling generally occurs in the high-frequency region and is loud (that is, its energy value is high), so the suspected howling points in a data block can be determined quickly from the distribution of energy values.

For example, the energy value of each frequency point in a data block is obtained; candidate frequency points whose energy value exceeds the preset energy threshold are then found in the high-frequency region, and the energy difference value over a preset number of frequency points around each candidate is calculated. The preset energy threshold and the preset number can be set as required; for example, the preset energy threshold may be −10 dB and the preset number may be 8 (4 below and 4 above the candidate). With a step size of about 8 Hz, if the candidate is at 3362 Hz, the surrounding frequency points are at about 3330 Hz, 3338 Hz, 3346 Hz, 3354 Hz, 3370 Hz, 3378 Hz, 3386 Hz and 3394 Hz. The energy difference value measures how much the candidate differs from the surrounding frequency points; it may be, for example, the difference between the maximum and minimum energy values, the energy variance, or the energy standard deviation, and the present application is not limited in this respect. The preset difference threshold corresponds to the chosen energy difference value; for example, when the energy difference value is the energy variance, the preset difference threshold is a preset variance threshold. When the energy difference value exceeds the preset difference threshold, the candidate stands out clearly from its neighbors and is very likely a howling point, so it is determined to be a suspected howling point.
This allows suspected howling points to be identified quickly and accurately, laying the foundation for further judging whether a howling sound exists in the analog sound data.
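The candidate test of the first preset analysis method can be sketched as below. It works on a list of per-bin energies in dB (for example from an FFT of one data block) with a known bin step. Several choices are assumptions, not the patent's: the energy difference value is taken as maximum minus minimum energy over the candidate and its 4 + 4 neighbours, a candidate must also be the local maximum of that window, and the 20 dB difference threshold is a placeholder. The function name is hypothetical.

```python
def suspected_howling_point(energies_db, step_hz, freq_threshold_hz=2000.0,
                            energy_threshold_db=-10.0, neighbours=4,
                            diff_threshold_db=20.0):
    """Return (frequency_hz, energy_db) of the strongest suspected howling point, or None.

    energies_db[k] is the energy of bin k, whose frequency is k * step_hz.
    """
    first_high_bin = int(freq_threshold_hz / step_hz) + 1  # first bin above the threshold
    best = None
    for k in range(max(first_high_bin, neighbours), len(energies_db) - neighbours):
        energy = energies_db[k]
        if energy <= energy_threshold_db:
            continue  # not loud enough to be a candidate
        window = energies_db[k - neighbours:k + neighbours + 1]
        if energy < max(window):
            continue  # a louder bin nearby, so not a local peak (assumption)
        if max(window) - min(window) <= diff_threshold_db:
            continue  # does not stand out from its neighbours
        if best is None or energy > best[1]:
            best = (k * step_hz, energy)  # keep the candidate with the highest energy
    return best


spectrum = [-60.0] * 1025
spectrum[100] = 0.0   # loud, but 781.25 Hz is below 2 kHz, so it is ignored
spectrum[430] = -5.0  # 430 * 7.8125 = 3359.375 Hz
print(suspected_howling_point(spectrum, 7.8125))  # (3359.375, -5.0)
```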

If a data block contains several candidate frequency points, the suspected howling point may be determined from the candidate with the highest energy.

In addition, the preset analysis method may alternatively include: in the frequency domain, obtaining a first frequency point with the largest energy value in the high-frequency region and a second frequency point with the largest energy value in the low-frequency region; and, when the first frequency point meets a preset suspected howling condition, determining the first frequency point as the suspected howling point in the current data block. The preset suspected howling condition is that the energy value of the first frequency point is greater than a preset energy threshold and the energy difference between the first and second frequency points is greater than a preset difference threshold.

Specifically, the current data block may first be transformed from the time domain to the frequency domain to facilitate spectral analysis. A division frequency may also be preset as the boundary between the high-frequency region and the low-frequency region. It can be set according to actual conditions, for example according to the frequency of the human voice and the frequencies at which howling tends to occur, such as 1 kHz, 1.5 kHz or 2 kHz. If the preset division frequency is 2 kHz, the part above 2 kHz is the high-frequency region and the part at or below 2 kHz is the low-frequency region.

Illustratively, the energy value of each frequency point in the data block is obtained; the first frequency point with the largest energy value is found in the high-frequency region and the second frequency point with the largest energy value in the low-frequency region. If the energy value of the first frequency point is greater than the preset energy threshold (e.g., −30 dB) and exceeds the energy value of the second frequency point by more than the preset difference threshold (e.g., 60), the first frequency point can be regarded as a suspected howling point in the current data block. This likewise allows suspected howling points to be identified quickly and accurately, laying the foundation for further judging whether a howling sound exists in the analog sound data.
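The second preset analysis method reduces to comparing the loudest bin on each side of the division frequency. A minimal sketch, assuming per-bin energies in dB with a known bin step and treating the example difference threshold of 60 as dB (the text gives no unit); the function name is hypothetical.

```python
def band_peak_howling_point(energies_db, step_hz, split_hz=2000.0,
                            energy_threshold_db=-30.0, diff_threshold_db=60.0):
    """Return (frequency_hz, energy_db) of the high-band peak if it is a suspected
    howling point, otherwise None. Bins at or below split_hz form the low band."""
    split_bin = int(split_hz / step_hz)  # last bin whose frequency is <= split_hz
    low, high = energies_db[:split_bin + 1], energies_db[split_bin + 1:]
    if not low or not high:
        return None
    k_low = max(range(len(low)), key=low.__getitem__)     # second frequency point
    k_high = max(range(len(high)), key=high.__getitem__)  # first frequency point
    first, second = high[k_high], low[k_low]
    if first > energy_threshold_db and first - second > diff_threshold_db:
        return ((split_bin + 1 + k_high) * step_hz, first)
    return None


energies = [-100.0] * 1025
energies[50] = -95.0   # loudest low-band bin (390.625 Hz)
energies[500] = -20.0  # loudest high-band bin (3906.25 Hz)
print(band_peak_howling_point(energies, 7.8125))  # (3906.25, -20.0)
```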

Illustratively, the above preset analysis method is applied to each data block to determine whether it contains a suspected howling point; if so, the suspected howling point is recorded, and it is then further determined whether the current analog sound data contains a howling sound.

It can be understood that a suspected howling point in a single data block does not mean that the whole analog sound data contains a howling sound, because certain special sounds may be misidentified. For example, the sound data being analyzed may contain harsh sounds produced by friction between objects; such sounds are usually high-pitched and loud and are therefore likely to be flagged as suspected howling, but they are brief and are not howling. A further determination is therefore needed.

In the embodiments of the present application, the distribution of the suspected howling points across the data blocks is analyzed. Suspected howling points with small frequency differences in several consecutive adjacent data blocks form a suspected howling point group; that is, a suspected howling point group consists of suspected howling points in consecutive adjacent data blocks whose frequency differences are within a preset range, where the number of those consecutive data blocks reaches a preset consecutive threshold. The preset consecutive threshold can be set according to actual conditions, for example 3; the preset range for the frequency difference can likewise be set according to actual conditions, for example 40 Hz. The inventors found that howling generally persists over a short time, recurs periodically, and grows gradually louder. Therefore, in the embodiments of the present application, the determination condition is that there are a plurality of (that is, two or more) suspected howling point groups exhibiting a periodic characteristic and that the energy values of the suspected howling points rise in the order of the data blocks they belong to. If the condition is satisfied, a howling sound is determined to exist in the current analog sound data, so that howling can be identified quickly and accurately.

For example, assume the analog sound data is divided into 15 data blocks. If suspected howling points with frequencies in the interval (A − 40 Hz, A + 40 Hz) are detected in 10 data blocks, namely the 1st, 2nd, 3rd, 5th, 7th, 8th, 9th, 13th, 14th and 15th, the suspected howling points in every 2 adjacent data blocks form a suspected howling point group; the 5 resulting groups exhibit a periodic characteristic and the energy values of the suspected howling points increase in sequence, so it is determined that the analog sound data contains a howling sound. As another example, if suspected howling points with frequencies in the interval (B − 40 Hz, B + 40 Hz) are detected only in the 1st and 2nd data blocks, they form a single suspected howling point group; since there is only one group, no periodic characteristic is present, and it is determined that the analog sound data does not contain a howling sound.
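The decision rule of the first mode can be sketched as follows, taking as input one entry per data block: the (frequency in Hz, energy in dB) of its suspected howling point, or None. The text leaves several details open, so these are assumptions: a group is a run of consecutive blocks whose frequencies stay within the preset range of the run's first point, "periodic" means the group start positions are evenly spaced to within one block, and the ascending trend is checked on each group's mean energy. The function name is hypothetical.

```python
def howling_by_groups(points, max_freq_diff=40.0, min_run=3, period_tol=1):
    """points[i] is (freq_hz, energy_db) for data block i, or None."""
    groups = []  # each group is a list of (block_index, freq_hz, energy_db)
    run = []
    for i, p in enumerate(list(points) + [None]):  # trailing None flushes the last run
        if p is not None and run and abs(p[0] - run[0][1]) <= max_freq_diff:
            run.append((i, p[0], p[1]))
            continue
        if len(run) >= min_run:  # long enough to count as a suspected howling point group
            groups.append(run)
        run = [(i, p[0], p[1])] if p is not None else []
    if len(groups) < 2:
        return False  # a single group shows no periodic characteristic
    starts = [g[0][0] for g in groups]
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    periodic = max(gaps) - min(gaps) <= period_tol
    means = [sum(e for _, _, e in g) / len(g) for g in groups]
    rising = all(b > a for a, b in zip(means, means[1:]))
    return periodic and rising


pts = [None] * 15
for start, freq, energy in [(0, 3000.0, -20.0), (5, 3010.0, -15.0), (10, 3005.0, -10.0)]:
    for j in range(3):
        pts[start + j] = (freq, energy)
print(howling_by_groups(pts))  # True
```

Setting `min_run=2` reproduces the pairwise grouping used in the worked example above.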

For the second mode, the block processing and the preset analysis method are the same as in the first mode and are not described again here.

Specifically, the first data block is analyzed with the preset analysis method for a suspected howling point. If one exists, a suspected howling point has appeared for the first time, and the first data block is determined as the starting data block. If not, the next data block becomes the new current data block and is analyzed with the preset analysis method in the same way. These steps are repeated until a data block containing a suspected howling point is found and determined as the starting data block; if none of the M data blocks contains a suspected howling point, it is determined that the current analog sound data contains no howling sound.

Taking the above block division as an example, M is 15 and 2 ≤ N ≤ 15. In spectrum analysis, the length of the analyzed data affects the result, which may be inaccurate when there are few data points; analyzing again with longer data therefore acts as a correction step and allows howling to be determined more accurately. The specific value of N is not limited in the present application. Assuming N is 4 and a data block is 40 ms long, the time range of the starting data block may be recorded as 0 to 40 ms. Since the starting data block has already been analyzed and serves as the first data segment, counting starts from n = 2: the second data segment covers 0 to 80 ms, the third 0 to 120 ms, and the fourth 0 to 160 ms.

Illustratively, the preset range may be set according to actual conditions, for example 40 Hz (which, with the step size of about 8 Hz above, corresponds to 5 steps). If the frequencies of the suspected howling points found in the 4 data segments are A, B, C and D, and they all lie within 40 Hz of one another, it is determined that a howling sound exists in the analog sound data.
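The second mode can be sketched as a scan for the starting data block followed by re-analysis of growing segments. Here `analyze` is a caller-supplied, hypothetical function that returns the frequency of the suspected howling point found in a run of samples (or None), standing in for the preset analysis method; returning False when fewer than N blocks remain after the starting block is an assumption the text does not cover.

```python
from typing import Callable, List, Optional, Sequence


def howling_by_segments(blocks: Sequence[List[float]],
                        analyze: Callable[[List[float]], Optional[float]],
                        n_max: int = 4, max_freq_diff: float = 40.0) -> bool:
    """blocks are the M data blocks in time order; n_max is N."""
    m = len(blocks)
    if not 2 <= n_max <= m:
        raise ValueError("N must satisfy 2 <= N <= M")
    # Find the starting data block: the first block with a suspected howling point.
    start, first_freq = None, None
    for i, block in enumerate(blocks):
        freq = analyze(block)
        if freq is not None:
            start, first_freq = i, freq
            break
    if start is None:
        return False  # no suspected howling point in any of the M blocks
    freqs = [first_freq]  # the starting data block is the first data segment
    for n in range(2, n_max + 1):
        if start + n > m:
            return False  # not enough data left to build segment n (assumption)
        segment = [s for b in blocks[start:start + n] for s in b]  # n blocks from the start
        freq = analyze(segment)
        if freq is None:
            return False
        freqs.append(freq)
    return max(freqs) - min(freqs) <= max_freq_diff


def toy_analyze(seg):
    """Toy stand-in: reports a point whenever the segment contains a large sample."""
    return 3000.0 + len(seg) if max(seg) > 0.5 else None


blocks = [[0.0] * 4] + [[1.0] * 4] * 5
print(howling_by_segments(blocks, toy_analyze))  # True
```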

When it is determined that a howling sound exists in the analog sound data, the suspected howling point is determined to be a howling point.

In some embodiments, performing howling prevention processing on the first uplink voice call data includes: attenuating or removing the audio signal corresponding to the howling point in the first uplink voice call data. The advantage of this arrangement is that the howling point in the uplink voice call data is targeted precisely, while other valid or important sound in the uplink voice call data is not attenuated or filtered out.
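The text does not say how the audio signal at the howling point is attenuated or removed. One common technique for this is a second-order IIR notch filter centred on the howling frequency. The sketch below uses the notch coefficients from Robert Bristow-Johnson's Audio EQ Cookbook and blends the notched output with the input, so that `depth=1.0` removes the component and a smaller `depth` only weakens it. The function name, the Q value and the depth parameter are assumptions for illustration.

```python
import math


def notch_attenuate(samples, f0, sample_rate=16000.0, q=10.0, depth=1.0):
    """Attenuate the component at f0 Hz with a second-order IIR notch.

    depth=1.0 removes the component; 0 < depth < 1 only weakens it.
    Coefficients follow the notch filter of the RBJ Audio EQ Cookbook.
    """
    w0 = 2.0 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b0, b1, b2 = 1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0
    a1, a2 = -2.0 * cos_w0 / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0  # filter state: previous inputs and outputs
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2  # Direct Form I
        x2, x1 = x1, x
        y2, y1 = y1, y
        # x - depth * (x - y) scales the notched-out component by (1 - depth).
        out.append(x - depth * (x - y))
    return out


fs, f0 = 16000.0, 3000.0
tone = [math.sin(2 * math.pi * f0 * n / fs) for n in range(1280)]
removed = notch_attenuate(tone, f0, fs, depth=1.0)   # howling tone removed
weakened = notch_attenuate(tone, f0, fs, depth=0.5)  # howling tone halved
```

Because the notch is narrow (bandwidth roughly f0 / Q), speech energy away from the howling frequency passes almost unchanged, which matches the stated goal of not weakening other sound.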

In some embodiments, performing howling prevention processing on the first uplink voice call data to obtain the target uplink voice call data when the similarity is greater than the preset similarity value includes: when the similarity is greater than the preset similarity value, performing simulated superposition processing on the first uplink voice call data and the pre-stored second uplink voice call data to obtain simulated voice call data, and, when a howling point exists in the simulated voice call data, performing howling prevention processing on the first uplink voice call data to obtain the target uplink voice call data. The advantage of this arrangement is that it can be determined quickly and accurately whether further howling prevention processing is required for the first uplink voice call data.

In some embodiments, detecting that the howling prevention processing event is triggered includes: when the voice call group contains a target mobile terminal whose distance from the mobile terminal is smaller than a second preset distance value, determining that the howling prevention processing event is triggered. In multi-person voice scenarios, the inventors found that howling is very likely to occur when two mobile terminals are close to each other. Suppose mobile terminals A and B in the voice call group are close together. The speaker of terminal A amplifies and plays the received sound collected by the microphone of terminal B; because the terminals are close, that sound is picked up again by the microphone of terminal B and sent back to terminal A, where it is amplified and played again. This easily forms positive-feedback amplification of the sound and produces howling. Therefore, in the embodiments of the present application, it may be determined whether any other mobile terminal in the voice call is close to the current mobile terminal; if so, the howling prevention processing event is triggered, and the triggering is detected. The second preset distance value may be, for example, 20 meters or 10 meters, and may be set according to actual requirements.

In the embodiments of the present application, there are many possible ways to determine whether the voice call group contains a target mobile terminal whose distance from the mobile terminal is smaller than the second preset distance value; they are not limited, and several are given below for illustration.

1. Playing a preset sound clip in a preset manner and receiving feedback information from the other mobile terminals in the voice call group, where the feedback information includes the result of the other mobile terminals attempting to capture the sound signal corresponding to the preset sound clip; and determining, according to the feedback information, whether the voice call group contains a target mobile terminal whose distance from the mobile terminal is smaller than the second preset distance value.

The advantage of this arrangement is that whether a target mobile terminal exists can be judged quickly and accurately, and hence whether the howling prevention processing event needs to be triggered. Illustratively, a pre-recorded or pre-acquired sound clip may be played through the speaker at a preset volume, or an ultrasonic clip with a preset frequency and preset intensity may be emitted by an ultrasonic transmitter. The preset volume, or the preset frequency and intensity, can be set according to the second preset distance value. The result in the feedback information may indicate whether the other mobile terminal was able to capture the sound signal; if another mobile terminal captures the sound signal corresponding to the preset sound clip, the distance between the two mobile terminals is smaller than the second preset distance value. The feedback information can be forwarded by the server corresponding to the preset online game application. In addition, the feedback information may include attributes of the captured sound signal, such as its intensity. Because the intensity of the sound played by the mobile terminal is known, and sound attenuates as it propagates (the farther it travels, the greater the attenuation), the distance between the other mobile terminal and the current mobile terminal can be estimated from the intensity information in the feedback information and compared with the second preset distance value.
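Estimating the distance from the reported sound intensity can be sketched with the free-field inverse-square law, under which the sound pressure level falls by 20 · log10(r / r_ref) dB at distance r, so r = r_ref · 10^((L_ref − L) / 20). This ignores reflections, obstacles and microphone differences, so it is only a rough assumption; the reference level at 1 m and the function names are hypothetical.

```python
from typing import Optional


def estimate_distance_m(received_db: float, reference_db: float,
                        reference_distance_m: float = 1.0) -> float:
    """Invert L = L_ref - 20 * log10(r / r_ref) for r (free-field assumption)."""
    return reference_distance_m * 10 ** ((reference_db - received_db) / 20.0)


def is_target_by_sound(received_db: Optional[float], reference_db: float,
                       max_distance_m: float) -> bool:
    """received_db is None when the other terminal could not capture the clip."""
    if received_db is None:
        return False
    return estimate_distance_m(received_db, reference_db) < max_distance_m


# Played at 80 dB (referenced to 1 m) and received at 60 dB: about 10 m away.
print(estimate_distance_m(60.0, 80.0))  # 10.0
```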

2. Acquiring first positioning information of the mobile terminal and second positioning information of the other mobile terminals in the voice call group; and determining, according to the first and second positioning information, whether the voice call group contains a target mobile terminal whose distance from the mobile terminal is smaller than the second preset distance value.

The advantage of this arrangement is that mobile terminals generally have a positioning function, so the positioning information can be used to judge quickly and accurately whether a target mobile terminal exists, and hence whether the howling prevention processing event needs to be triggered. For example, the mobile terminal may obtain positioning information through the Global Positioning System (GPS) or the BeiDou satellite navigation system, or through base-station or network positioning. The positioning information may include latitude and longitude coordinates. The second positioning information of the other mobile terminals in the voice call group can be forwarded to the current mobile terminal by the server corresponding to the preset online game application. The current mobile terminal compares its own first positioning information with each piece of second positioning information forwarded by the server and judges whether the corresponding distance is smaller than the second preset distance value.
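Comparing latitude and longitude pairs needs a distance formula, which the text does not name. A minimal sketch using the haversine great-circle formula with a mean Earth radius of 6371 km (adequate at the tens-of-meters scale discussed here, given typical positioning error); the function names and the dictionary of forwarded positions are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_target_terminals(own, others, max_distance_m=20.0):
    """own is (lat, lon); others maps a terminal id to its (lat, lon)."""
    return [tid for tid, pos in others.items() if haversine_m(*own, *pos) < max_distance_m]


own = (22.5, 113.9)
others = {"A": (22.5001, 113.9), "B": (22.501, 113.9)}  # about 11 m and 111 m away
print(find_target_terminals(own, others, max_distance_m=20.0))  # ['A']
```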

3. Acquiring first WiFi information of the hotspot to which the mobile terminal is connected and second WiFi information of the hotspots to which the other mobile terminals in the voice call group are connected; and determining, according to the first and second WiFi information, whether the voice call group contains a target mobile terminal whose distance from the mobile terminal is smaller than the second preset distance value.

The advantage of this arrangement is that, to save data costs, users generally make voice calls over a WiFi hotspot, and this can be exploited to judge quickly and accurately whether a target mobile terminal exists, and hence whether the howling prevention processing event needs to be triggered. For example, the WiFi information may include attribute information of the WiFi hotspot, such as its name and its Media Access Control (MAC) address, and may also include the WiFi signal strength. The effective signal range of a WiFi hotspot is limited, generally about 50 meters. If the second preset distance value is greater than this effective signal range, whether a target mobile terminal exists can be determined by checking whether any piece of second WiFi information has the same hotspot attribute information as the first WiFi information; if so, a target mobile terminal exists in the voice call group. In other words, when another mobile terminal in the voice call group is connected to the same WiFi hotspot as the current mobile terminal, that terminal may be regarded as the target mobile terminal.
In addition, if the second preset distance value is smaller than the effective signal range of the WiFi hotspot, for example 10 meters, the distance from each mobile terminal connected to the same hotspot to the hotspot can be further estimated from the WiFi signal strength; from these, the distance between the two mobile terminals can be determined and compared with the second preset distance value.
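Both WiFi checks can be sketched as below. Matching the hotspot by its MAC address (BSSID) follows the text; the signal-strength branch uses the common log-distance path-loss model d = 10^((P_1m − RSSI) / (10 n)), whose reference power P_1m = −40 dBm and exponent n = 2.5 are placeholder assumptions that would need calibration. Since two terminals at distances d1 and d2 from the same hotspot are at most d1 + d2 apart, the sketch treats d1 + d2 < limit as a conservative test. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WifiInfo:
    bssid: str                        # MAC address of the connected hotspot
    rssi_dbm: Optional[float] = None  # received signal strength, if reported


def ap_distance_m(rssi_dbm: float, rssi_at_1m_dbm: float = -40.0,
                  path_loss_exponent: float = 2.5) -> float:
    """Log-distance path-loss model solved for distance (placeholder parameters)."""
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10.0 * path_loss_exponent))


def is_target_by_wifi(own: WifiInfo, other: WifiInfo, max_distance_m: float,
                      hotspot_range_m: float = 50.0) -> bool:
    if own.bssid.lower() != other.bssid.lower():
        return False  # not on the same hotspot
    if max_distance_m > hotspot_range_m:
        return True   # the text treats a shared hotspot as close enough in this case
    if own.rssi_dbm is None or other.rssi_dbm is None:
        return False  # cannot refine the estimate without signal strength
    # Triangle inequality: the two terminals are at most d1 + d2 apart.
    return ap_distance_m(own.rssi_dbm) + ap_distance_m(other.rssi_dbm) < max_distance_m


me = WifiInfo("AA:BB:CC:DD:EE:FF", -40.0)     # about 1 m from the hotspot
near = WifiInfo("aa:bb:cc:dd:ee:ff", -50.0)   # about 2.5 m from the hotspot
print(is_target_by_wifi(me, near, max_distance_m=10.0))  # True
```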

4. Acquiring third sound data captured by the microphone and the downlink voice call data of the mobile terminal, where the third sound data does not contain the sound played by the speaker of the mobile terminal; and determining whether the voice call group contains a target mobile terminal whose distance from the mobile terminal is smaller than the second preset distance value according to whether the third sound data and the downlink voice call data contain the voice of the same person.

The advantage of this arrangement is that whether a target mobile terminal exists can be determined quickly and accurately without other information (such as the positioning information or WiFi information above), and hence whether to trigger the howling prevention processing event. Illustratively, keeping the sound played by the speaker of the mobile terminal out of the third sound data can be achieved in either of two ways: the speaker is off while the third sound data and the downlink voice call data are acquired; or the speaker is on during that time, and the third sound data is obtained by filtering the sound played by the speaker out of all the sound data captured by the microphone. Suppose two users holding mobile terminals are close to each other: a first user uses a first mobile terminal and a second user uses a second mobile terminal. The first user's voice is captured by the microphone of the first mobile terminal and sent to the second mobile terminal, so the downlink voice call data of the second mobile terminal contains the first user's voice. Because the two users are close, the first user's voice is also captured directly by the microphone of the second mobile terminal. For the second mobile terminal, the third sound data captured by its microphone and its downlink voice call data therefore contain the voice of the same person (the first user), so it is determined that the distance between the first and second mobile terminals in the voice call group is smaller than the second preset distance value; that is, for the second mobile terminal, the first mobile terminal is a target mobile terminal.

It can be understood that any one of the above ways, or a combination of several, may be selected according to the actual situation to determine whether a target mobile terminal exists; the embodiments of the present application are not limited in this respect. In addition, the judgment of whether a target mobile terminal exists may be performed by the server corresponding to the preset online game application: when the server determines that a target mobile terminal exists, it sends a determination result to the mobile terminal instructing it to trigger the howling prevention processing event. Correspondingly, the method in the embodiments of the present application further includes receiving the determination result sent by the server corresponding to the preset online game application, and triggering the howling prevention processing event when the determination result indicates that the voice call group contains a target mobile terminal whose distance from the mobile terminal is smaller than the second preset distance value. For the server's specific determination process, refer to the determination methods above; it is not described in detail in this embodiment of the present application.

Optionally, before the attenuated background sound and the separated human voice are mixed to obtain the first uplink voice call data, the method further includes: enhancing the separated human voice. Correspondingly, mixing the attenuated background sound and the separated human voice to obtain the first uplink voice call data includes: mixing the attenuated background sound with the enhanced human voice to obtain the first uplink voice call data. The advantage of this arrangement is that other users in the voice call group can hear the voice of the user of the current mobile terminal more clearly, while the uplink voice call data of the current mobile terminal still receives timely howling prevention processing.
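Assuming the human voice and the background sound have already been separated into equal-length 16-bit PCM frames, the attenuation, enhancement and mixing steps can be sketched as per-sample gains followed by a saturating sum. Plain gains are a simplification (real voice enhancement may involve equalization or noise suppression), and the gain values and the function name are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_uplink(voice, background, voice_gain=1.5, background_gain=0.3):
    """Enhance the voice, attenuate the background, and mix into one int16 frame."""
    if len(voice) != len(background):
        raise ValueError("voice and background must have the same length")
    mixed = []
    for v, b in zip(voice, background):
        sample = round(voice_gain * v + background_gain * b)
        mixed.append(max(INT16_MIN, min(INT16_MAX, sample)))  # saturate to int16
    return mixed


print(mix_uplink([1000, 20000], [1000, 10000]))  # [1800, 32767]
```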

Fig. 2 is a schematic flowchart of another voice call data processing method according to an embodiment of the present application. Taking an online game application as an example of the preset application, the method includes the following steps:

Step 201, detecting that the voice call group in the preset game application has been successfully established.

For example, in a team battle game such as Honor of Kings, two teams (red and blue) of 5 players each fight against each other, and the 5 players on each team need to communicate to coordinate their tactics. Many players therefore choose to enable the in-team voice call function; for example, after one player requests to enable it, the voice call group is successfully established. From then on, each of the 5 players on the same team can hear the other 4 players speaking. Players generally set their mobile terminals to speaker (hands-free) mode, which makes it easier to play.

Step 202, acquiring first sound data acquired by a microphone of the mobile terminal in the current time period.

Step 203, judging whether a target mobile terminal with the distance between the target mobile terminal and the mobile terminal being smaller than a second preset distance value exists in the voice call group, if so, executing step 204; otherwise, step 203 is repeated.

And step 204, acquiring each sound source position corresponding to the first sound data.

And step 205, determining the sound source position with the distance to the mobile terminal smaller than the first preset distance value in each sound source position as the target sound source position.

Step 206, regarding the sound corresponding to the target sound source position in the first sound data as a human voice, and regarding the sound obtained by separating the human voice from the first sound data as a background sound.

And step 207, weakening the separated background sound.

Step 208: Mixing the attenuated background sound with the separated human voice to obtain the first uplink voice call data.

Step 209: Dividing the first uplink voice call data into blocks, and comparing each data block with the second uplink voice call data to obtain the sub-similarity corresponding to that data block.

Step 210: Summing the sub-similarities to obtain the similarity between the first uplink voice call data and the second uplink voice call data.
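The application does not define how a data block is compared with the second uplink voice call data. One plausible reading, sketched below in Python as an assumption, takes the sub-similarity of a block as its best normalized correlation (cosine similarity) against any equal-length window of the previous period's data, and then sums the sub-similarities as in step 210. The block length is a hypothetical parameter.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (zero-lag normalized correlation) of two equal-length frames."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def block_similarity(first: Sequence[float], second: Sequence[float],
                     block_len: int) -> Tuple[float, List[float]]:
    """Split `first` into non-overlapping blocks, score each block against every
    window of `second`, and return (summed similarity, per-block sub-similarities).
    Trailing samples that do not fill a whole block are ignored."""
    first, second = list(first), list(second)
    subs: List[float] = []
    for start in range(0, len(first) - block_len + 1, block_len):
        block = first[start:start + block_len]
        best = 0.0  # negative correlation is treated as "not similar"
        for lag in range(len(second) - block_len + 1):
            best = max(best, cosine(block, second[lag:lag + block_len]))
        subs.append(best)
    return sum(subs), subs
```

Because the similarity is a sum, the preset similarity value used in step 211 would need to scale with the number of blocks.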

The second uplink voice call data is the uplink voice call data obtained by the mobile terminal in the previous time period.

Step 211: Determining whether the similarity is greater than a preset similarity value; if so, executing step 212; otherwise, executing step 214.

Step 212: Determining, in the first uplink voice call data, a preset number of target audio signals with the highest sub-similarities.

Step 213: Attenuating or removing the target audio signals in the first uplink voice call data to obtain target uplink voice call data, and sending the target uplink voice call data to the server corresponding to the preset application program.
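Steps 212 and 213 can be sketched as ranking the data blocks by sub-similarity and scaling the preset number of top-ranked blocks. The Python sketch assumes that each target audio signal is one data block from the blocking step; `preset_number` and `gain` are hypothetical parameters, with `gain=0.0` standing for removal and 0 < `gain` < 1 for attenuation.

```python
from typing import List, Sequence


def suppress_top_blocks(first: Sequence[float], subs: Sequence[float], block_len: int,
                        preset_number: int = 2, gain: float = 0.0) -> List[float]:
    """Scale the `preset_number` blocks with the highest sub-similarity by `gain`."""
    out = list(first)
    ranked = sorted(range(len(subs)), key=lambda i: subs[i], reverse=True)
    for idx in ranked[:preset_number]:
        for i in range(idx * block_len, (idx + 1) * block_len):
            out[i] *= gain  # 0.0 removes the block; 0 < gain < 1 attenuates it
    return out
```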

Step 214: Sending the first uplink voice call data to the server corresponding to the preset application program.

In this embodiment of the application, after the voice call group in the game application is successfully established and the first uplink voice call data for the current time period is obtained, the first uplink voice call data is divided into blocks, and each data block is compared with the pre-stored second uplink voice call data for the previous time period to determine the similarity between the two. When the similarity is high, a preset number of target audio signals with the highest sub-similarities in the first uplink voice call data are attenuated or removed. Howling prevention processing is thus applied to the uplink voice call data of the current mobile terminal in time, which prevents howling from interfering with the game, addresses a pain point for game players, and makes the mobile terminal more fully featured.

Fig. 3 is a schematic flow chart of another voice call data detection method according to an embodiment of the present application. Taking an online game application as the preset application program, the method includes the following steps:

Step 301: Detecting that the voice call group in the preset game application has been successfully established.

Step 302: Acquiring the first sound data collected by the microphone of the mobile terminal in the current time period.

Step 303: Determining whether the voice call group contains a target mobile terminal whose distance to the mobile terminal is smaller than a second preset distance value; if so, executing step 304; otherwise, repeating step 303.

Step 304: Acquiring each sound source position corresponding to the first sound data.

Step 305: Determining, among the sound source positions, a sound source position whose distance to the mobile terminal is smaller than a first preset distance value as the target sound source position.

Step 306: Taking the sound corresponding to the target sound source position in the first sound data as the human voice, and taking the sound that remains after the human voice is separated from the first sound data as the background sound.

Step 307: Attenuating the separated background sound.

Step 308: Enhancing the separated human voice.

Step 309: Mixing the attenuated background sound with the enhanced human voice to obtain the first uplink voice call data.

Step 310: Performing simulated superposition of the first uplink voice call data and the pre-stored second uplink voice call data to obtain simulated voice call data.

The second uplink voice call data is the uplink voice call data obtained by the mobile terminal in the previous time period.

Step 311: Determining whether a howling point exists in the simulated voice call data; if so, executing step 312; otherwise, executing step 313.
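The application does not say how the simulated superposition is computed or how a howling point is recognized. The Python sketch below assumes the superposition is a clipped sample-wise sum and substitutes a common howling-detection criterion, the peak-to-average power ratio (PAPR) of the spectrum: the strongest frequency bin is reported as a howling point when its PAPR exceeds a threshold. The 10 dB threshold is a placeholder.

```python
import cmath
import math
from typing import List, Optional, Sequence


def superpose(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Simulated superposition: sample-wise sum, clipped to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, a + b)) for a, b in zip(first, second)]


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT power for bins 0..N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def find_howling_point(frame: Sequence[float], sample_rate: float,
                       papr_db_threshold: float = 10.0) -> Optional[float]:
    """Return the frequency (Hz) of the strongest bin if its PAPR exceeds the
    threshold, otherwise None."""
    p = power_spectrum(frame)
    avg = sum(p) / len(p)
    if avg == 0.0:
        return None  # silent frame: nothing to detect
    k = max(range(len(p)), key=p.__getitem__)
    papr_db = 10.0 * math.log10(p[k] / avg)
    return k * sample_rate / len(frame) if papr_db > papr_db_threshold else None
```

A tonal peak that stands far above the average spectral power is characteristic of acoustic feedback, whereas speech and broadband noise spread their power over many bins; practical detectors often combine PAPR with further criteria.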

Step 312: Attenuating or removing the audio signal corresponding to the howling point in the first uplink voice call data to obtain target uplink voice call data, and sending the target uplink voice call data to the server corresponding to the preset application program.
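One standard way to attenuate the audio signal at a detected howling frequency is a second-order notch filter. The Python sketch below uses the notch biquad from Robert Bristow-Johnson's Audio EQ Cookbook as a swapped-in technique; the application itself does not name a filter, and the quality factor `q` is an assumed value.

```python
import math
from typing import List, Sequence


def notch(frame: Sequence[float], sample_rate: float, f0: float, q: float = 5.0) -> List[float]:
    """Apply an RBJ-cookbook notch biquad centred on the howling frequency f0."""
    w0 = 2.0 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    # Coefficients normalized by a0.
    b0, b1, b2 = 1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0
    a1, a2 = -2.0 * cos_w0 / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out: List[float] = []
    for x in frame:
        # Direct Form I difference equation.
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

A higher `q` narrows the notch so less of the surrounding speech is removed, at the cost of a longer settling time after the filter is switched in.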

Step 313: Sending the first uplink voice call data to the server corresponding to the preset application program.

It should be noted that this embodiment does not limit the execution order of step 307 and step 308: step 307 may be executed before step 308, step 308 may be executed before step 307, or the two steps may be executed simultaneously.

In this embodiment, after the voice call group in the game application is successfully established and the first uplink voice call data for the current time period is obtained, simulated superposition is performed on the first uplink voice call data and the second uplink voice call data for the previous time period. When a howling point exists in the superposed data, the audio signal corresponding to the howling point in the first uplink voice call data is attenuated or removed. Howling prevention processing is thus applied to the uplink voice call data of the current mobile terminal in time, which prevents howling from interfering with the game, addresses a pain point for game players, and makes the mobile terminal more fully featured.

Fig. 4 is a block diagram of a voice call data processing apparatus according to an embodiment of the present application. The apparatus may be implemented in software and/or hardware, is generally integrated in a mobile terminal, and can perform howling prevention processing on voice call data by executing the voice call data processing method. As shown in Fig. 4, the apparatus includes:

a sound data acquisition module 401, configured to acquire the first sound data collected by the microphone of the mobile terminal in the current time period after the voice call group in the preset application program is successfully established;

a sound data separation module 402, configured to, when it is detected that the howling prevention processing event is triggered, perform a human voice and background sound separation operation on the first sound data, where the separation operation includes: acquiring each sound source position corresponding to the first sound data; determining, among the sound source positions, a sound source position whose distance to the mobile terminal is smaller than a first preset distance value as the target sound source position; and taking the sound corresponding to the target sound source position in the first sound data as the human voice, and the sound that remains after the human voice is separated from the first sound data as the background sound;

a background sound attenuating module 403, configured to attenuate the separated background sound;

and a sound mixing processing module 404, configured to mix the attenuated background sound with the separated human voice to obtain first uplink voice call data, and send the first uplink voice call data to the server corresponding to the preset application program.

After the voice call group of the preset application program in the mobile terminal is successfully established, the voice call data processing apparatus provided in this embodiment can, once a howling prevention processing event is detected as triggered, apply howling prevention processing to the uplink voice call data of the current mobile terminal in time, thereby reducing the inconvenience that howling causes users.

Optionally, sending the first uplink voice call data to the server corresponding to the preset application program includes:

comparing the first uplink voice call data with pre-stored second uplink voice call data to determine the similarity between the first uplink voice call data and the second uplink voice call data, where the second uplink voice call data is the uplink voice call data obtained by the mobile terminal in the previous time period;

when the similarity is larger than a preset similarity value, performing howling prevention processing on the first uplink voice call data to obtain target uplink voice call data;

and sending the target uplink voice call data to a server corresponding to the preset application program.

Optionally, comparing the first uplink voice call data with second uplink voice call data stored in advance, and determining a similarity between the first uplink voice call data and the second uplink voice call data includes:

dividing the first uplink voice call data into blocks, and comparing each data block with the second uplink voice call data to obtain the sub-similarity corresponding to that data block;

summing the sub-similarities to obtain the similarity between the first uplink voice call data and the second uplink voice call data;

and performing howling prevention processing on the first uplink voice call data includes:

determining a preset number of target audio signals with the highest sub-similarities in the first uplink voice call data;

and attenuating or removing the target audio signals.

Optionally, sending the first uplink voice call data to the server corresponding to the preset application program includes:

performing simulated superposition of the first uplink voice call data and pre-stored second uplink voice call data to obtain simulated voice call data, where the second uplink voice call data is the uplink voice call data obtained by the mobile terminal in the previous time period;

when it is determined that a howling point exists in the simulated voice call data, performing howling prevention processing on the first uplink voice call data to obtain target uplink voice call data; and

sending the target uplink voice call data to the server corresponding to the preset application program.

Optionally, performing howling prevention processing on the first uplink voice call data includes:

attenuating or removing the audio signal corresponding to the howling point in the first uplink voice call data.

Optionally, detecting that the howling prevention processing event is triggered includes:

determining whether the voice call group contains a target mobile terminal whose distance to the mobile terminal is smaller than a second preset distance value, and if so, determining that the howling prevention processing event is triggered.

Optionally, the apparatus further comprises:

a human voice enhancement module, configured to enhance the separated human voice before the attenuated background sound and the separated human voice are mixed to obtain the first uplink voice call data;

and the sound mixing processing module is configured to:

mix the attenuated background sound with the enhanced human voice to obtain the first uplink voice call data.

Optionally, the preset application program is an online game application program.

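Embodiments of the present application also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a voice call data processing method, the method including:

after a voice call group in a preset application program is successfully established, acquiring first sound data collected by a microphone of the mobile terminal in the current time period;

when it is detected that a howling prevention processing event is triggered, separating the human voice and the background sound in the first sound data;

attenuating the separated background sound; and

mixing the attenuated background sound with the separated human voice to obtain first uplink voice call data, and sending the first uplink voice call data to a server corresponding to the preset application program.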

Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disks, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory and magnetic media (e.g., hard disks) or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or in a different second computer system connected to the first computer system through a network (such as the Internet). The second computer system may provide program instructions to the first computer for execution. The term "storage medium" may include two or more storage media that may reside in different locations, such as in different computer systems connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.

Of course, the computer-executable instructions contained in the storage medium provided in the embodiments of the present application are not limited to the voice call data processing operations described above, and may also perform related operations in the voice call data processing method provided in any embodiment of the present application.

An embodiment of the present application provides a mobile terminal in which the voice call data processing apparatus provided by the embodiments of the present application can be integrated. Fig. 5 is a schematic structural diagram of a mobile terminal according to an embodiment of the present application. The mobile terminal 500 may include a memory 501, a processor 502, and a computer program stored in the memory 501 and executable by the processor 502, where the processor 502 implements the voice call data processing method according to the embodiments of the present application when executing the computer program.

After the voice call group of the preset application program in the mobile terminal is successfully established, the mobile terminal provided by this embodiment can, once a howling prevention processing event is detected as triggered, apply howling prevention processing to the uplink voice call data of the current mobile terminal in time, thereby reducing the inconvenience that howling causes users.

Fig. 6 is a schematic structural diagram of another mobile terminal provided in an embodiment of the present application. The mobile terminal may include: a housing (not shown), a memory 601, a central processing unit (CPU) 602 (also called a processor, hereinafter referred to as the CPU), a circuit board (not shown), and a power supply circuit (not shown). The circuit board is arranged in the space enclosed by the housing; the CPU 602 and the memory 601 are disposed on the circuit board; the power supply circuit supplies power to each circuit or device of the mobile terminal; the memory 601 stores executable program code; and the CPU 602 reads the executable program code stored in the memory 601 and runs the corresponding computer program to implement the following steps:

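after a voice call group in a preset application program is successfully established, acquiring first sound data collected by a microphone of the mobile terminal in the current time period;

when it is detected that a howling prevention processing event is triggered, separating the human voice and the background sound in the first sound data;

attenuating the separated background sound; and

mixing the attenuated background sound with the separated human voice to obtain first uplink voice call data, and sending the first uplink voice call data to a server corresponding to the preset application program.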

The mobile terminal further includes: a peripheral interface 603, an RF (Radio Frequency) circuit 605, an audio circuit 606, a speaker 611, a power management chip 608, an input/output (I/O) subsystem 609, other input/control devices 610, a touch screen 612, and an external port 604, which communicate through one or more communication buses or signal lines 607.

It should be understood that the illustrated mobile terminal 600 is merely one example of a mobile terminal and that the mobile terminal 600 may have more or fewer components than shown, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.

The mobile terminal for processing voice call data provided in this embodiment is described in detail below, taking a mobile phone as an example.

The memory 601 can be accessed by the CPU 602, the peripheral interface 603, and the like. The memory 601 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.

The peripheral interface 603 may connect the input and output peripherals of the device to the CPU 602 and the memory 601.

The I/O subsystem 609 may connect input and output peripherals on the device, such as the touch screen 612 and the other input/control devices 610, to the peripheral interface 603. The I/O subsystem 609 may include a display controller 6091 and one or more input controllers 6092 for controlling the other input/control devices 610. The one or more input controllers 6092 receive electrical signals from, or send electrical signals to, the other input/control devices 610, which may include physical buttons (push buttons, rocker buttons, etc.), dials, slide switches, joysticks, and click wheels. It is worth noting that the input controller 6092 may be connected to any of the following: a keyboard, an infrared port, a USB interface, or a pointing device such as a mouse.

The touch screen 612 is the input and output interface between the mobile terminal and the user, and displays visual output to the user, which may include graphics, text, icons, video, and the like.

The display controller 6091 in the I/O subsystem 609 receives electrical signals from, or sends electrical signals to, the touch screen 612. The touch screen 612 detects a contact on the touch screen, and the display controller 6091 converts the detected contact into an interaction with a user interface object displayed on the touch screen 612, thereby implementing human-computer interaction; the user interface object displayed on the touch screen 612 may be an icon for running a game, an icon for connecting to a corresponding network, or the like. It is worth mentioning that the device may also include a touchpad, which is a touch-sensitive surface that does not display visual output, or an extension of the touch-sensitive surface formed by the touch screen.

The RF circuit 605 is mainly used to establish communication between the mobile phone and the wireless network (i.e., the network side) and to receive and send data between them, for example to send and receive short messages and e-mails. Specifically, the RF circuit 605 receives and sends RF signals, also called electromagnetic signals: it converts electrical signals into electromagnetic signals or electromagnetic signals into electrical signals, and communicates with communication networks and other devices through them. The RF circuit 605 may include known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chipset, a Subscriber Identity Module (SIM), and so on.

The audio circuit 606 is mainly used to receive audio data from the peripheral interface 603, convert the audio data into an electric signal, and transmit the electric signal to the speaker 611.

The speaker 611 is used to convert the voice signal received by the handset from the wireless network through the RF circuit 605 into sound and play the sound to the user.

The power management chip 608 is used to supply and manage power for the hardware connected to the CPU 602, the I/O subsystem, and the peripheral interface.

The voice call data processing apparatus, the storage medium, and the mobile terminal provided in the above embodiments can execute the voice call data processing method provided in any embodiment of the present application, and have the corresponding functional modules and beneficial effects. For technical details not described in the above embodiments, reference may be made to the voice call data processing method provided in any embodiment of the present application.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present application and the technical principles employed. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the appended claims.

Claims

1. A voice call data processing method is characterized by comprising the following steps:

after a voice call group in a preset application program is successfully established, acquiring first sound data acquired by a microphone of a mobile terminal at the current time period, wherein the first sound data comprises at least one of sound data acquired by the microphone at the current time period, sound data of a user speaking corresponding to the mobile terminal, environment sound of the environment where the mobile terminal is located and sound data played by a loudspeaker of the mobile terminal;

when detecting that a howling prevention processing event is triggered, performing a human voice and background sound separation operation on the first sound data, wherein the separation operation comprises: acquiring each sound source position corresponding to the first sound data; determining, among the sound source positions, a sound source position whose distance to the mobile terminal is smaller than a first preset distance value as a target sound source position; and taking the sound corresponding to the target sound source position in the first sound data as a human voice, and the sound remaining after the human voice is separated from the first sound data as a background sound, wherein acquiring each sound source position corresponding to the first sound data comprises: performing spectrum analysis on the first sound data to determine each sound source frequency and corresponding bandwidth contained in the first sound data; and determining the respective sound source positions based on the respective sound source frequencies and corresponding bandwidths;

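attenuating the separated background sound; and

mixing the attenuated background sound and the separated human voice to obtain first uplink voice call data, and sending the first uplink voice call data to a server corresponding to the preset application program.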

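2. The method according to claim 1, wherein sending the first uplink voice call data to a server corresponding to the preset application program comprises:

comparing the first uplink voice call data with pre-stored second uplink voice call data to determine a similarity between the first uplink voice call data and the second uplink voice call data, wherein the second uplink voice call data is the uplink voice call data obtained by the mobile terminal in the previous time period;

when the similarity is greater than a preset similarity value, performing howling prevention processing on the first uplink voice call data to obtain target uplink voice call data; and

sending the target uplink voice call data to the server corresponding to the preset application program.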

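3. The method according to claim 2, wherein comparing the first uplink voice call data with the pre-stored second uplink voice call data to determine the similarity between the first uplink voice call data and the second uplink voice call data comprises:

dividing the first uplink voice call data into blocks, and comparing each data block with the second uplink voice call data to obtain a sub-similarity corresponding to the data block; and

summing the sub-similarities to obtain the similarity between the first uplink voice call data and the second uplink voice call data;

and wherein performing howling prevention processing on the first uplink voice call data comprises:

determining a preset number of target audio signals with the highest sub-similarities in the first uplink voice call data; and

attenuating or removing the target audio signals.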

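4. The method according to claim 1, wherein sending the first uplink voice call data to a server corresponding to the preset application program comprises:

performing simulated superposition of the first uplink voice call data and pre-stored second uplink voice call data to obtain simulated voice call data, wherein the second uplink voice call data is the uplink voice call data obtained by the mobile terminal in the previous time period;

when it is determined that a howling point exists in the simulated voice call data, performing howling prevention processing on the first uplink voice call data to obtain target uplink voice call data; and

sending the target uplink voice call data to the server corresponding to the preset application program.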

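5. The method according to claim 4, wherein performing howling prevention processing on the first uplink voice call data comprises:

attenuating or removing the audio signal corresponding to the howling point in the first uplink voice call data.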

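6. The method of claim 1, wherein detecting that a howling prevention processing event is triggered comprises:

determining whether a target mobile terminal whose distance to the mobile terminal is smaller than a second preset distance value exists in the voice call group, and if so, determining that the howling prevention processing event is triggered.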

7. The method of claim 1, wherein before mixing the attenuated background sound and the separated human voice to obtain the first uplink voice call data, the method further comprises:

enhancing the separated human voice;

and mixing the attenuated background sound and the separated human voice to obtain the first uplink voice call data comprises:

mixing the attenuated background sound and the enhanced human voice to obtain the first uplink voice call data.

8. The method of claim 1, wherein the preset application program is an online game application.

9. A voice call data processing apparatus, comprising:

a sound data acquisition module, configured to acquire first sound data collected by a microphone of a mobile terminal in the current time period after a voice call group in a preset application program is successfully established, wherein the first sound data comprises at least one of sound data collected by the microphone in the current time period, the speech of the user of the mobile terminal, ambient sound of the environment where the mobile terminal is located, and sound data played by a loudspeaker of the mobile terminal;

a sound data separation module, configured to, when it is detected that a howling prevention processing event is triggered, perform a human voice and background sound separation operation on the first sound data, wherein the separation operation comprises: acquiring each sound source position corresponding to the first sound data; determining, among the sound source positions, a sound source position whose distance to the mobile terminal is smaller than a first preset distance value as a target sound source position; and taking the sound corresponding to the target sound source position in the first sound data as a human voice, and the sound remaining after the human voice is separated from the first sound data as a background sound, wherein acquiring each sound source position corresponding to the first sound data comprises: performing spectrum analysis on the first sound data to determine each sound source frequency and corresponding bandwidth contained in the first sound data; and determining the respective sound source positions based on the respective sound source frequencies and corresponding bandwidths;

a background sound attenuation module, configured to attenuate the separated background sound; and

a sound mixing processing module, configured to mix the attenuated background sound and the separated human voice to obtain first uplink voice call data, and send the first uplink voice call data to a server corresponding to the preset application program.

10. A computer-readable storage medium on which a computer program is stored, the program, when being executed by a processor, implementing a voice call data processing method according to any one of claims 1 to 8.

11. A mobile terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the voice call data processing method according to any one of claims 1 to 8 when executing the computer program.