CN113284500B - Audio processing method, device, electronic equipment and storage medium

Info

Publication number: CN113284500B
Application number: CN202110547910.1A
Authority: CN
Inventors: 林俊潜; 谢静贤
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2021-05-19
Filing date: 2021-05-19
Publication date: 2024-02-06
Anticipated expiration: 2041-05-19
Also published as: CN113284500A

Abstract

The application discloses an audio processing method, an audio processing apparatus, an electronic device, and a storage medium. The audio processing method is applied to a first electronic device in audio transmission, the first electronic device being connected to a second electronic device in the audio transmission. The method comprises: receiving first audio data sent by the second electronic device during the audio transmission, wherein the first audio data is audio data obtained after the second electronic device performs voice enhancement processing, according to first voiceprint information, on audio data collected during the audio transmission; performing voice enhancement processing on the first audio data based on the stored first voiceprint information; and playing the first audio data after the voice enhancement processing. The method makes it easier for a user to hear the human voice clearly in an audio transmission scenario.

Description

Audio processing method, device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of audio processing technologies, and in particular, to an audio processing method, an apparatus, an electronic device, and a storage medium.

Background

With rapid progress in technology and living standards, electronic devices (such as smartphones and tablet computers) are widely used, and more and more users use them for video calls, voice calls, watching live video, and similar scenarios. In these scenarios the environment at the transmitting end is often noisy, so that when the receiving end plays the audio collected by the transmitting end, the user at the receiving end cannot clearly hear the voice of the user at the transmitting end, which seriously affects the user experience.

Disclosure of Invention

In view of the above problems, the present application proposes an audio processing method, an apparatus, an electronic device, and a storage medium.

In a first aspect, an embodiment of the present application provides an audio processing method, applied to a first electronic device in audio transmission, where the first electronic device is connected to a second electronic device in the audio transmission, and the method includes: receiving first audio data sent by the second electronic device during the audio transmission, wherein the first audio data is audio data obtained after the second electronic device performs voice enhancement processing, according to first voiceprint information, on audio data collected during the audio transmission; performing voice enhancement processing on the first audio data based on the stored first voiceprint information; and playing the first audio data after the voice enhancement processing.

In a second aspect, an embodiment of the present application provides an audio processing apparatus, applied to a first electronic device in audio transmission, where the first electronic device is connected to a second electronic device in the audio transmission, and the apparatus includes: an audio receiving module, configured to receive first audio data sent by the second electronic device during the audio transmission, wherein the first audio data is audio data obtained after the second electronic device performs voice enhancement processing, according to first voiceprint information, on audio data collected during the audio transmission; a voice enhancement module, configured to perform voice enhancement processing on the first audio data based on the stored first voiceprint information; and an audio playing module, configured to play the first audio data after the voice enhancement processing.

In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a memory; one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more application programs configured to perform the audio processing method provided in the first aspect.

In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored therein program code that is callable by a processor to perform the audio processing method provided in the first aspect described above.

According to the scheme provided by the application, first audio data sent by the second electronic device during the audio transmission is received, the first audio data being audio data obtained after the second electronic device performs voice enhancement processing, according to the first voiceprint information, on the audio data collected during the audio transmission. Voice enhancement processing is then performed on the first audio data based on the stored first voiceprint information, and the first audio data after the voice enhancement processing is played. In this way, the sending end of the audio transmission performs voice enhancement processing on the collected audio data, and the receiving end performs voice enhancement processing again on the received audio data, so that the human voice is highlighted in the played audio and the user can more easily hear the voice of the user at the sending end.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 shows a schematic diagram of an application environment provided in an embodiment of the present application.

Fig. 2 shows another schematic diagram of an application environment provided in an embodiment of the present application.

Fig. 3 shows a flow chart of an audio processing method according to an embodiment of the present application.

Fig. 4 shows a flow chart of an audio processing method according to another embodiment of the present application.

Fig. 5 shows a schematic view of an interface provided in another embodiment of the present application.

Fig. 6 illustrates another interface schematic provided by another embodiment of the present application.

Fig. 7 shows a flow chart of an audio processing method according to a further embodiment of the present application.

Fig. 8 illustrates an interface schematic provided in another embodiment of the present application.

Fig. 9 shows a schematic view of an interface provided in another embodiment of the present application.

Fig. 10 shows a schematic view of an interface provided in another embodiment of the present application.

Fig. 11 shows a schematic view of an interface provided in another embodiment of the present application.

Fig. 12 shows a schematic view of an interface provided in another embodiment of the present application.

Fig. 13 shows a schematic view of an interface provided in another embodiment of the present application.

Fig. 14 shows a schematic view of an interface provided in another embodiment of the present application.

Fig. 15 shows a schematic view of an interface provided in another embodiment of the present application.

Fig. 16 shows a schematic view of an interface provided in another embodiment of the present application.

Fig. 17 shows a block diagram of an audio processing device according to an embodiment of the present application.

Fig. 18 is a block diagram of an electronic device for performing an audio processing method according to an embodiment of the present application.

Fig. 19 is a storage unit for storing or carrying program code for implementing an audio processing method according to an embodiment of the present application.

Detailed Description

In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application.

Currently, with rapid progress in technology and living standards, electronic devices offer more and more functions, and users rely on them to meet a variety of needs. In daily use, people frequently use electronic devices for video calls, voice calls, watching live video, and similar scenarios. In these scenarios the collected audio data is transmitted so that, after the receiving end of the audio transmission plays the received audio data, the user at the receiving end obtains the information that the user at the sending end wants to convey.

Through long-term research, the inventor found that in an audio transmission scenario the sound collected by the transmitting end usually contains many types of sound, such as the voice of each player speaking, the sound of application programs, and other sounds in the environment where the mobile terminal is located. Because the collected sound is complex, when the receiving end plays the audio collected by the transmitting end, the user cannot clearly hear the voice of the user at the transmitting end, which seriously affects the user experience.

In order to solve the above problems, the inventor proposes the audio processing method, apparatus, electronic device, and storage medium provided by the embodiments of the present application. The sending end of the audio transmission performs voice enhancement processing on the collected audio data, and the receiving end performs voice enhancement processing again on the received audio data, so that the human voice is highlighted in the played audio, the user can more easily hear the voice of the user at the sending end, and the user experience is improved. The specific audio processing method is described in detail in the following embodiments.

The following describes an application environment of the audio processing method in the embodiment of the present application.

Fig. 1 is a schematic diagram of an application environment of an audio processing method according to an embodiment of the present application, and referring to fig. 1, the application environment includes a first electronic device 10, a second electronic device 20, and a server 30. The first electronic device 10 and the second electronic device 20 are connected to the server 30 through a wireless network or a wired network, and data interaction between the server 30 and the first electronic device 10 and the second electronic device 20 is enabled, so that the first electronic device 10 and the second electronic device 20 can communicate through the server 30.

Optionally, the server 30 may be a stand-alone physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), big data, and an artificial intelligence platform, but is not limited thereto.

Optionally, the first electronic device 10 and the second electronic device 20 may be mobile terminals (e.g., smartphones, tablet computers, smart watches, smart bracelets), notebook computers, PCs, servers, and the like, but are not limited thereto. Those skilled in the art will appreciate that the number of electronic devices in the application environment may be larger or smaller. For example, there may be only 2 electronic devices (the first electronic device 10 and the second electronic device 20), or there may be tens or hundreds of electronic devices or more, in which case the application environment also includes other electronic devices. The number and type of electronic devices are not limited in the embodiments of the present application.

In some embodiments, the first electronic device 10 and the second electronic device 20 may be devices used by users, and may perform audio transmission according to a user operation, for example, a voice call, a video call, a video live broadcast, a voice live broadcast, and the like. During audio transmission, each of the first electronic device and the second electronic device can perform voice enhancement processing both on the collected audio data to be sent to the other party and on the received audio data, so that the human voice in the audio data is enhanced at both the sending end and the receiving end and is better highlighted in the played audio.

Fig. 2 is another schematic diagram of an application environment of the audio processing method according to the embodiment of the present application. Referring to fig. 2, the application environment includes a plurality of electronic devices 100 (only 2 are shown in the figure), a management server 200, and a signaling server 300, where the electronic devices 100 may communicate with the management server 200 and the signaling server 300 to complete data interaction. In addition, the management server 200 and the signaling server 300 communicate with each other. The management server 200, which is a server of a service platform for the audio transmission scenario, may be used to provide services in audio transmission for the electronic devices 100; the signaling server 300 may be used to establish web real-time communication between the electronic devices 100 to complete data interaction between the electronic devices 100. Web real-time communication, WebRTC, is an open-source project promoted by Google that aims to provide a simple JavaScript interface for browser and mobile web applications, giving them RTC (Real-Time Communications) capability.

In some embodiments, before audio transmission between the electronic devices 100, a WebRTC communication connection between the management server 200 and the signaling server 300 may be established. The WebRTC connection is a P2P (peer-to-peer) connection, which ensures real-time communication between the two ends and thus real-time performance during audio transmission. Likewise, during audio transmission, each electronic device can perform voice enhancement processing both on the collected audio data to be sent to the other party and on the received audio data, so that the human voice in the audio data is enhanced at both the sending end and the receiving end and is better highlighted in the played audio.

Referring to fig. 3, fig. 3 is a flow chart illustrating an audio processing method according to an embodiment of the present application. In a specific embodiment, the audio processing method is applied to a first electronic device, and the first electronic device is connected with a second electronic device in audio transmission. The flow shown in fig. 3 is described in detail below. The audio processing method may specifically include the following steps:

Step S110: receiving first audio data sent by the second electronic device during the audio transmission, wherein the first audio data is audio data obtained after the second electronic device performs voice enhancement processing, according to the first voiceprint information, on the audio data collected during the audio transmission.

The audio transmission between the first electronic device and the second electronic device may be audio transmission in a voice call scene, a video call scene, a voice live broadcast scene, a video live broadcast scene, and the like, and a specific scene to which the audio transmission in the embodiment of the present application is applied may not be limited.

In this embodiment of the present application, in the audio transmission process between the first electronic device and the second electronic device, before the second electronic device transmits the collected audio data to the first electronic device, the second electronic device may perform, according to the first voiceprint information, a voice enhancement process on the collected audio data, and then send the first audio data after the voice enhancement process to the first electronic device. Accordingly, the first electronic device may receive the audio data sent by the second electronic device. The voice enhancement processing refers to enhancing voice in audio data so that the voice can be highlighted when the audio data is played, thereby facilitating users to hear the voice in the audio.

In some embodiments, the first voiceprint information may be voiceprint information set for a user's voice that needs to be subjected to voice enhancement, and stored in the second electronic device, so that the second electronic device performs voice enhancement processing on a corresponding voice in the collected audio data. The first voiceprint information can be pre-recorded by a user of the second electronic device, or can be voiceprint information acquired in real time in the audio transmission process.

In some embodiments, the first voiceprint information may include voiceprint information of one user, or may include voiceprint information of a plurality of users. When the first voiceprint information comprises voiceprint information of a user, in the audio transmission process, the second electronic equipment carries out voice enhancement processing on user voice corresponding to the voiceprint information; when the first voiceprint information comprises voiceprint information of a plurality of users, in the audio transmission process, the second electronic equipment carries out voice enhancement processing on voices of the plurality of users corresponding to the voiceprint information. The specific selection and setting of the first voiceprint information may be set according to an actual scenario and requirements of users, for example, when users make a one-to-one private call, the first voiceprint information may include voiceprint information of only one user, and for example, when a remote video conference is performed, if a plurality of users are in a conference scenario, the first voiceprint information may include voiceprint information of the plurality of users.
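One way to picture the single-user and multi-user cases is as a small container of per-user voiceprint entries. The sketch below is illustrative only: the class names `Voiceprint` and `VoiceprintInfo`, the representation of a voiceprint as a feature vector, and the cosine-similarity match with a 0.8 threshold are all assumptions and are not specified in this application.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Voiceprint:
    """Hypothetical voiceprint entry: a user label plus a feature vector."""
    user_id: str
    embedding: List[float]


@dataclass
class VoiceprintInfo:
    """Hypothetical 'first voiceprint information' holding one or several users."""
    entries: List[Voiceprint] = field(default_factory=list)

    def match(self, embedding: List[float], threshold: float = 0.8) -> Optional[str]:
        """Return the most similar enrolled user, or None if none reaches the threshold."""
        best_user, best_score = None, threshold
        for entry in self.entries:
            score = cosine_similarity(entry.embedding, embedding)
            if score >= best_score:
                best_user, best_score = entry.user_id, score
        return best_user


# One-to-one private call: a single enrolled user.
private_call = VoiceprintInfo([Voiceprint("alice", [1.0, 0.0, 0.0])])
# Remote video conference: several users share one device.
conference = VoiceprintInfo([
    Voiceprint("alice", [1.0, 0.0, 0.0]),
    Voiceprint("bob", [0.0, 1.0, 0.0]),
])
```

With this layout, the same enhancement routine can serve both scenarios: it enhances any voice for which `match` returns a user and leaves the rest untouched.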

Step S120: performing voice enhancement processing on the first audio data based on the stored first voiceprint information.

In this embodiment of the present application, the first electronic device may also store the above first voiceprint information, so that after receiving the first audio data sent by the second electronic device, the first electronic device can use the first voiceprint information to perform the voice enhancement processing again on the received first audio data.

In some embodiments, the first electronic device performs voice enhancement processing on the first audio data based on the first voiceprint information, which may be to enhance, according to the first voiceprint information, a voice corresponding to the first voiceprint information in the first audio data, so as to highlight the voice corresponding to the first voiceprint information.

Step S130: playing the first audio data after the voice enhancement processing.

In the embodiment of the application, after the first electronic device performs the voice enhancement processing on the received first audio data, the first audio data after the voice enhancement processing can be played through the audio output unit, so that a user can hear the played audio.

In the audio processing method provided by the embodiment of the invention, because the first audio data is subjected to the voice enhancement processing at the transmitting end and the receiving end, the voice is doubly enhanced, so that the voice corresponding to the first voiceprint information can be highlighted, and the voice of the opposite party can be conveniently heard by the user.
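The receiver-side flow of steps S110 to S130 can be sketched as a short pipeline. This is a minimal sketch under stated assumptions: the function name `handle_incoming_audio`, the sample-list representation of audio, and the `enhance` and `play` callables are hypothetical stand-ins for the device's actual enhancement routine and audio output unit.

```python
from typing import Callable, List

Samples = List[float]


def handle_incoming_audio(
    first_audio: Samples,
    stored_voiceprint: object,
    enhance: Callable[[Samples, object], Samples],
    play: Callable[[Samples], None],
) -> Samples:
    """Receiver-side flow once S110 has delivered `first_audio`."""
    # S120: enhance the voice again, using the voiceprint stored on this device.
    enhanced = enhance(first_audio, stored_voiceprint)
    # S130: hand the enhanced audio to the audio output unit.
    play(enhanced)
    return enhanced


# Stand-ins for illustration only: a fixed 1.5x gain and a list acting as the "speaker".
played: List[Samples] = []
result = handle_incoming_audio(
    first_audio=[0.1, -0.2, 0.3],
    stored_voiceprint="voiceprint-of-sender",
    enhance=lambda audio, voiceprint: [s * 1.5 for s in audio],
    play=played.append,
)
```

Passing the enhancement and playback steps in as callables keeps the flow itself independent of how a particular device separates or amplifies the voice.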

Referring to fig. 4, fig. 4 is a flow chart illustrating an audio processing method according to another embodiment of the present application. The audio processing method is applied to a first electronic device, and the first electronic device is connected with a second electronic device in audio transmission. The flow shown in fig. 4 is described in detail below. The audio processing method may specifically include the following steps:

Step S210: in response to an operation of turning on the voice enhancement processing, turning on the voice enhancement processing so as to perform voice enhancement processing on received audio data.

In the embodiment of the application, the first electronic device may detect a control operation of a user, so as to start the voice enhancement processing when detecting a start operation of the voice enhancement processing, so that the voice enhancement processing may be performed on the received audio data.

In some embodiments, the first electronic device may display a control interface for audio transmission, where the control interface includes a first control for controlling the activation of the voice enhancement process; and responding to the opening operation of the first control, and opening the voice enhancement processing to perform the voice enhancement processing on the received audio data.

In this embodiment, the first electronic device may display a control interface for audio transmission, where the control interface is used to control audio transmission, so as to meet a control requirement of a user on audio transmission. The control interface may include at least a first control for controlling the opening of the voice enhancement processing, so that a user may open the voice enhancement processing of the received audio data according to the requirement thereof. And when the first electronic equipment detects the opening operation of the first control, the voice enhancement processing can be started.

In a possible implementation manner, the opening operation of the first control may be a clicking operation of the first control, a pressing operation that meets a preset pressing condition, a sliding operation that meets a preset sliding condition, or the like, and the specific opening operation may not be limited. Wherein the preset pressing condition may include at least one of the following conditions: the pressing time length corresponding to the pressing operation is longer than the first time length; the pressing area corresponding to the pressing operation is larger than the preset area; the pressing operation corresponds to a pressing pressure value greater than a preset pressure value. The preset sliding condition may include at least one of the following conditions: the sliding distance corresponding to the sliding operation is larger than the preset sliding distance; the sliding track corresponding to the sliding operation meets the preset sliding track.
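The preset pressing condition can be sketched as a set of optional checks. The sketch below rests on assumptions: the names `PressEvent` and `PressCondition`, the units, and the threshold values are hypothetical, and it adopts one possible reading of "at least one of the following conditions", namely that a device configures one or more checks and every configured check must pass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PressEvent:
    duration_s: float  # how long the control was pressed
    area_mm2: float    # contact area of the press
    pressure: float    # normalized pressure value


@dataclass
class PressCondition:
    """A field left as None means that check is not part of the preset condition."""
    min_duration_s: Optional[float] = None
    min_area_mm2: Optional[float] = None
    min_pressure: Optional[float] = None

    def is_met(self, event: PressEvent) -> bool:
        checks = [
            (self.min_duration_s, event.duration_s),
            (self.min_area_mm2, event.area_mm2),
            (self.min_pressure, event.pressure),
        ]
        active = [(limit, value) for limit, value in checks if limit is not None]
        # At least one check must be configured, and every configured check must pass.
        return bool(active) and all(value > limit for limit, value in active)


long_press = PressCondition(min_duration_s=0.5)
firm_long_press = PressCondition(min_duration_s=0.5, min_pressure=0.6)
```

A preset sliding condition (sliding distance, sliding track) could be modeled the same way with its own optional fields.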

For example, referring to fig. 5, the audio transmission scenario is a video call scenario, and the first electronic device may display a video call interface A1 that includes a control A2 for displaying more functions. Referring to fig. 6, after a selection operation on the control A2 is detected, more functions may be displayed in the video call interface A1: switch camera, hands-free, voice conversion, voice enhancement, and voice sharing. The control corresponding to voice enhancement is the first control A3; when a selection operation on the control A3 is detected, the voice enhancement processing may be turned on in response, so that voice enhancement processing is performed on the audio data received during the video call.

In a possible implementation manner, before responding to the opening operation on the first control, the first electronic device may receive prompt information sent by the second electronic device and then display the prompt information on the control interface. The prompt information is used to prompt the user to turn on the voice enhancement processing, and is sent by the second electronic device when it detects that the volume of the environmental noise in its current environment is greater than a preset volume. Therefore, when the second electronic device detects that its environment is noisy, it proactively prompts the first electronic device to turn on the voice enhancement processing, so that the user can promptly control the first electronic device to do so.
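The sender-side check that triggers the prompt can be sketched as follows. This is an assumption-laden illustration: measuring the noise volume as an RMS level in dBFS, the -30 dBFS preset, and the message fields `type` and `noise_dbfs` are all hypothetical; the application only states that a prompt is sent when the environmental noise volume exceeds a preset volume.

```python
import math
from typing import List, Optional


def rms_dbfs(samples: List[float]) -> float:
    """RMS level of samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def build_prompt(noise_frame: List[float], preset_dbfs: float = -30.0) -> Optional[dict]:
    """Return a prompt message for the peer if ambient noise exceeds the preset volume."""
    level = rms_dbfs(noise_frame)
    if level > preset_dbfs:
        # Hypothetical message layout; a real system would define its own format.
        return {"type": "suggest_voice_enhancement", "noise_dbfs": round(level, 1)}
    return None


noisy_prompt = build_prompt([0.5, -0.5, 0.5, -0.5])   # loud environment
quiet_prompt = build_prompt([0.001, -0.001, 0.001])   # quiet environment
```

The receiving device would display the returned message on its control interface; when `build_prompt` returns `None`, nothing is sent.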

Of course, in this embodiment, the first control may also be used for turning off the voice enhancement process, and the first electronic device may turn off the voice enhancement process when detecting an operation for the first control. For example, in the video call process, when the environmental noise of the sending end is smaller, the sending end may send a corresponding prompt message to the first electronic device, and the first electronic device may correspondingly output the prompt message, and when detecting that the user operates the first control, may close the voice enhancement processing.

In other embodiments, after displaying the control interface for audio transmission, the first electronic device may also detect a sliding operation in the control interface, and turn on the voice enhancement processing if the detected sliding operation satisfies a set sliding condition for triggering the voice enhancement processing. Optionally, the set sliding condition may include at least one of the following: the sliding position is within a set sliding area; the sliding duration is longer than a set duration; the sliding track matches a set sliding track. In this way, the user can turn on the voice enhancement processing with a corresponding sliding operation in the control interface, which improves the user experience.

Of course, after the voice enhancement process is turned on, when the detected sliding operation again satisfies the set sliding condition triggering the voice enhancement process to be turned on, the voice enhancement process may be controlled to be turned off.
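The on/off behavior described for the sliding operation amounts to a toggle. A minimal sketch, assuming a hypothetical `VoiceEnhancementSwitch` class and assuming that the check of the set sliding condition has already been reduced to a boolean:

```python
class VoiceEnhancementSwitch:
    """Tracks whether receiver-side voice enhancement is currently on."""

    def __init__(self) -> None:
        self.enabled = False

    def on_slide(self, meets_set_condition: bool) -> bool:
        """Flip the state on a qualifying slide; ignore any other slide."""
        if meets_set_condition:
            self.enabled = not self.enabled
        return self.enabled


switch = VoiceEnhancementSwitch()
states = [
    switch.on_slide(True),   # qualifying slide: turns enhancement on
    switch.on_slide(False),  # non-qualifying slide: state unchanged
    switch.on_slide(True),   # qualifying slide again: turns enhancement off
]
```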

Of course, the specific manner of controlling the opening of the voice enhancement processing in the embodiment of the present application may not be limited, for example, the first electronic device may also control the opening of the voice enhancement processing when detecting the voice control instruction for opening the voice enhancement processing in the audio transmission process.

Step S220: receiving the first voiceprint information sent by the second electronic device.

In the embodiment of the application, during audio transmission, the second electronic device may also send to the first electronic device the first voiceprint information according to which it performs the voice enhancement processing on the collected audio data; the first electronic device may receive the first voiceprint information and store it accordingly.

In some embodiments, the second electronic device may send the first voiceprint information to the first electronic device after detecting a sharing operation for sharing the voiceprint. Optionally, the second electronic device may display a control interface for audio transmission that includes a control for sharing the voiceprint; when the second electronic device detects an operation on this control, it may send the first voiceprint information to the first electronic device.

Step S230: receiving first audio data sent by the second electronic device during the audio transmission, wherein the first audio data is audio data obtained after the second electronic device performs voice enhancement processing, according to the first voiceprint information, on the audio data collected during the audio transmission.

In this embodiment, for step S230, reference may be made to the content of the foregoing embodiment, which is not described again here.

Step S240: performing voice enhancement processing on the first audio data based on the stored first voiceprint information.

In this embodiment of the present application, when the first electronic device performs voice enhancement processing on the first audio data based on the stored first voiceprint information, it may determine the corresponding human voice in the first audio data based on the first voiceprint information and enhance the determined human voice.

In some embodiments, the first electronic device may separate, based on the first voiceprint information, the human voice audio corresponding to the first voiceprint information from the first audio data, and then mix the human voice audio with the first audio data to obtain the first audio data after the voice enhancement processing. It can be understood that once the human voice audio corresponding to the first voiceprint information has been separated and mixed back into the first audio data, the human voice is superimposed on itself and is therefore highlighted and enhanced.
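The separate-then-mix idea can be sketched with plain sample lists. This is only an illustration under assumptions: `enhance_by_overlay` is a hypothetical name, the voiceprint-based separation is represented by a `separate_voice` callable supplied by the caller (no real separation model is implemented here), and clipping to [-1, 1] is an added safeguard not mentioned in the application.

```python
from typing import Callable, List

Samples = List[float]


def enhance_by_overlay(
    first_audio: Samples,
    separate_voice: Callable[[Samples], Samples],
) -> Samples:
    """Mix the separated voice back into the original audio, clipping to [-1, 1]."""
    voice = separate_voice(first_audio)
    mixed = [a + v for a, v in zip(first_audio, voice)]
    return [max(-1.0, min(1.0, s)) for s in mixed]


# Toy demonstration: the "separator" simply returns a voice track known in advance.
voice = [0.2, 0.0, -0.2]
noise = [0.05, 0.05, 0.05]
first_audio = [v + n for v, n in zip(voice, noise)]
enhanced = enhance_by_overlay(first_audio, separate_voice=lambda audio: voice)
```

In the toy data the voice component ends up doubled while the noise component is unchanged, which is the superposition effect the paragraph describes.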

In one possible implementation manner, after the voice audio is separated, the first electronic device may perform amplitude enhancement on the voice audio, and perform amplitude attenuation on other audio after the voice audio is separated in the first audio data; and then mixing the voice audio subjected to the amplitude enhancement with other audio subjected to the amplitude attenuation to obtain first audio data subjected to the voice enhancement processing.
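A simple way to picture this variant is with fixed gains in place of the models described in the surrounding text. The sketch below is an assumption, not the method of this application: `remix_with_gains` is a hypothetical name, the "other audio" is taken to be the additive residual (first audio minus separated voice), and the gains 1.5 and 0.5 are arbitrary example values.

```python
from typing import List

Samples = List[float]


def remix_with_gains(
    first_audio: Samples,
    voice: Samples,
    voice_gain: float = 1.5,
    other_gain: float = 0.5,
) -> Samples:
    """Boost the separated voice, attenuate the residual, and mix them together."""
    if len(first_audio) != len(voice):
        raise ValueError("first_audio and voice must have the same length")
    # Assumed residual model: whatever is left after removing the voice.
    other = [a - v for a, v in zip(first_audio, voice)]
    mixed = [voice_gain * v + other_gain * o for v, o in zip(voice, other)]
    # Clip so the boosted signal stays within the valid sample range.
    return [max(-1.0, min(1.0, s)) for s in mixed]


remixed = remix_with_gains([0.25, 0.05, -0.15], [0.2, 0.0, -0.2])
```

Because the residual is attenuated rather than dropped, some of the original ambience remains in the output.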

Optionally, to enhance the amplitude of the human voice audio, the human voice audio may be input into an amplitude enhancement model. The amplitude enhancement model may be a model obtained by training a deep neural network on noisy voice samples, so as to suppress noise and reduce signal distortion. Specifically, the amplitude enhancement model may output, according to the input human voice audio, the voice phase information to be enhanced, and the human voice audio is then overlap-added according to that voice phase information to obtain the amplitude-enhanced human voice audio.

Optionally, to attenuate the amplitude of the other audio, the other audio may likewise be input into an amplitude attenuation model. The amplitude attenuation model may also be a model obtained by training a deep neural network on voice samples; it may output, according to the input other audio, the voice phase information to be attenuated, and the amplitude of the other audio is then attenuated according to that voice phase information. In this embodiment, the other audio is attenuated in amplitude rather than discarded entirely, so that the real sound-collection environment is preserved.

Optionally, after the amplitude-enhanced human voice audio is mixed with the amplitude-attenuated other audio to obtain the first audio data after voice enhancement processing, the first audio data may further be smoothed so that the resulting sound is more natural.
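The enhance, attenuate, mix and smooth sequence above can be illustrated with a simplified sketch. Note that it deliberately replaces the trained amplitude enhancement and amplitude attenuation models with fixed scalar gains, and uses a plain moving average for the smoothing step; the function names, gain values and window size are all assumptions chosen for illustration.

```python
from typing import List

def moving_average(samples: List[float], window: int) -> List[float]:
    """Smooth with a centered moving average, truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out

def enhance_voice(voice: List[float], original: List[float],
                  voice_gain: float = 2.0, other_gain: float = 0.5,
                  smooth_window: int = 1) -> List[float]:
    """Boost the voice, attenuate (not discard) the rest, mix, then smooth."""
    if len(voice) != len(original):
        raise ValueError("voice and original must have the same length")
    # "Other audio" is what remains once the voice is removed from the original.
    other = [o - v for o, v in zip(original, voice)]
    # Fixed gains stand in for the trained models described in the text.
    mixed = [voice_gain * v + other_gain * r for v, r in zip(voice, other)]
    # Clip to the assumed normalized sample range.
    mixed = [max(-1.0, min(1.0, s)) for s in mixed]
    return moving_average(mixed, smooth_window)

result = enhance_voice([0.25, 0.0], [0.5, 0.25])
```

Keeping `other_gain` above zero mirrors the point made in the text: the background is weakened but still present, so the listener retains a sense of the real acquisition environment.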

In some embodiments, if the first electronic device has not stored the first voiceprint information, that is, the voiceprint information according to which the second electronic device performs voice enhancement processing on the collected audio data, the first electronic device does not perform voice enhancement processing after receiving the audio data sent by the second electronic device.

In some embodiments, after receiving the first audio data, the first electronic device may obtain, from the second electronic device, an environmental sound volume of an environment in which the second electronic device is located; then comparing the volume of the environmental sound with a volume threshold, and if the volume of the environmental sound is larger than the volume threshold, carrying out voice enhancement processing on the received first audio data again according to the first voice print information; if the volume of the environmental sound is not greater than the volume threshold, the received first audio data may not be subjected to the voice enhancement processing again.

In one possible implementation manner, when the environmental sound volume of the environment where the second electronic device is located is greater than the volume threshold, the first electronic device may further determine an enhancement amplitude for the voice enhancement according to the environmental sound volume, and then perform the voice enhancement on the first audio data according to the enhancement amplitude. Optionally, in the above embodiment in which the human voice audio is separated, amplitude-enhanced and then mixed with the amplitude-attenuated other audio, the human voice audio may be amplitude-enhanced according to the enhancement amplitude. The enhancement amplitude may be positively correlated with the environmental sound volume.
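The threshold check and the positively correlated enhancement amplitude can be sketched as below. The text only requires a positive correlation; the linear mapping, the decibel units, the cap and every default value here are assumptions for illustration, and `enhancement_gain` is a hypothetical name.

```python
from typing import Optional

def enhancement_gain(ambient_db: float,
                     threshold_db: float = 60.0,
                     base_gain: float = 1.0,
                     gain_per_db: float = 0.05,
                     max_gain: float = 3.0) -> Optional[float]:
    """Return None when no second enhancement pass is needed, else a gain."""
    if ambient_db <= threshold_db:
        # Ambient volume not greater than the threshold: skip re-enhancement.
        return None
    # Louder environment -> larger gain (assumed linear), capped at max_gain.
    gain = base_gain + gain_per_db * (ambient_db - threshold_db)
    return min(gain, max_gain)

quiet = enhancement_gain(55.0)   # below threshold
noisy = enhancement_gain(70.0)   # 10 dB above threshold
louder = enhancement_gain(80.0)  # 20 dB above threshold
```

The returned gain could, for instance, be used as the amplification factor applied to the separated human voice audio.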

Step S250: playing the first audio data after the voice enhancement processing.

According to the audio processing method, the first electronic device can control the starting of voice enhancement processing according to the requirements of the user, and the first electronic device can receive voice print information shared by the second electronic device, so that voice enhancement processing can be carried out on audio data in audio transmission at a sending end and a receiving end, voice is doubly enhanced, voice corresponding to the first voice print information can be highlighted, and the user can hear the voice of the other party conveniently.

Referring to fig. 7, fig. 7 is a flow chart illustrating an audio processing method according to another embodiment of the present application. The audio processing method is applied to the electronic device, and will be described in detail with respect to the flowchart shown in fig. 7, where the audio processing method specifically includes the following steps:

Step S310: receiving first audio data sent by the second electronic device during the audio transmission, wherein the first audio data is audio data obtained by the second electronic device after performing voice enhancement processing, according to the first voiceprint information, on the audio data collected during the audio transmission.

Step S320: performing voice enhancement processing on the first audio data based on the stored first voiceprint information.

Step S330: playing the first audio data after the voice enhancement processing.

In the embodiment of the present application, the steps S310 to S330 may refer to the content of the foregoing embodiment, and are not described herein.

Step S340: sending second voiceprint information to the second electronic device.

In this embodiment of the present application, during the audio transmission the first electronic device may also perform voice enhancement processing on the audio data that it collects for transmission to the second electronic device. The first electronic device may send the second voiceprint information to the second electronic device, so that after receiving the transmitted audio data, the second electronic device can perform voice enhancement processing on it again according to the second voiceprint information.

In some embodiments, the first electronic device sends the second voice print information to the second electronic device in response to a sharing operation for sharing the voice print.

Optionally, the first electronic device displays a control interface of the audio transmission, wherein the control interface comprises a second control for sharing voiceprints; in response to a selection operation on the second control, the first electronic device sends the second voiceprint information to the second electronic device.

In this embodiment, the selection operation on the second control may be a click operation on the second control, a press operation that satisfies a specified pressing condition, a slide operation that satisfies a specified sliding condition, or the like, and the specific selection operation is not limited. The specified pressing condition may include at least one of the following: the pressing duration corresponding to the press operation is longer than a first duration; the pressing area corresponding to the press operation is larger than a specified area; the pressing pressure value corresponding to the press operation is greater than a specified pressure value. The specified sliding condition may include at least one of the following: the sliding distance corresponding to the slide operation is greater than a sliding distance threshold; the sliding track corresponding to the slide operation matches a specified sliding track.
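As one way to read the pressing condition, the sketch below treats the specified pressing condition as a set of optional thresholds, and a press operation satisfies it only if every configured threshold is exceeded. This interpretation, the class name `PressCondition`, the field names and the units are all assumptions; the application does not prescribe them.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PressCondition:
    """Hypothetical container: any subset of the thresholds may be configured."""
    min_duration_s: Optional[float] = None
    min_area_mm2: Optional[float] = None
    min_pressure: Optional[float] = None

def press_satisfies(cond: PressCondition, duration_s: float,
                    area_mm2: float, pressure: float) -> bool:
    """True if at least one threshold is configured and all of them are exceeded."""
    checks = []
    if cond.min_duration_s is not None:
        checks.append(duration_s > cond.min_duration_s)
    if cond.min_area_mm2 is not None:
        checks.append(area_mm2 > cond.min_area_mm2)
    if cond.min_pressure is not None:
        checks.append(pressure > cond.min_pressure)
    return bool(checks) and all(checks)

cond = PressCondition(min_duration_s=0.5, min_pressure=2.0)
long_firm_press = press_satisfies(cond, duration_s=0.8, area_mm2=10.0, pressure=3.0)
long_light_press = press_satisfies(cond, duration_s=0.8, area_mm2=10.0, pressure=1.0)
```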

Referring to fig. 6 again, after a selection operation on the control A2 is detected, more functions may be displayed in the video call interface A1: switch camera, hands-free, voice change, voice enhancement and voiceprint sharing. The control corresponding to voiceprint sharing is the second control A4, and when a selection operation on the second control A4 is detected, the second voiceprint information may be sent to the second electronic device in response to the operation.

Of course, the specific manner of controlling voiceprint sharing in the embodiments of the present application may not be limited.

In some embodiments, before the first electronic device shares the second voiceprint information to the second electronic device, the first electronic device may further determine whether preset voiceprint information exists in the first electronic device; if the preset voiceprint information exists, taking the preset voiceprint information as second voiceprint information; if the preset voiceprint information does not exist, voiceprint feature extraction is carried out according to the audio data acquired by the first electronic equipment in the audio transmission to obtain second voiceprint information.
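The choice between preset voiceprint information and on-the-fly extraction can be sketched as a simple fallback. The voiceprint is represented here as a list of floats and the extractor is passed in as a callable; both are assumptions for illustration, and `toy_extract` is a deliberately trivial stand-in, not a real voiceprint feature extractor.

```python
from typing import Callable, List, Optional

Voiceprint = List[float]

def resolve_second_voiceprint(
    preset: Optional[Voiceprint],
    captured_audio: List[float],
    extract: Callable[[List[float]], Voiceprint],
) -> Voiceprint:
    """Use the preset voiceprint if one exists, otherwise extract one."""
    if preset is not None:
        return preset
    # No preset: fall back to extracting features from audio captured
    # during the current audio transmission.
    return extract(captured_audio)

def toy_extract(audio: List[float]) -> Voiceprint:
    # Placeholder features (mean and peak), NOT a real voiceprint algorithm.
    return [sum(audio) / len(audio), max(abs(s) for s in audio)]

captured = [0.5, -0.5, 0.25, -0.25]
from_preset = resolve_second_voiceprint([1.0, 2.0], captured, toy_extract)
from_audio = resolve_second_voiceprint(None, captured, toy_extract)
```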

In one possible implementation, the first electronic device may extract voiceprint features from the collected audio data according to a pre-trained voiceprint recognition model. In the training stage of the voiceprint recognition model, voice samples are preprocessed, voiceprint features are extracted from them using a voiceprint feature extraction algorithm, and a neural network is then trained on the voice samples and their corresponding voiceprint features to obtain the voiceprint recognition model. Optionally, the voice samples may be collected so that they are evenly distributed across gender, age and geographic region, so as to improve the accuracy of the voiceprint recognition model.

When the voiceprint recognition model is used to extract voiceprint features from the collected audio data, the collected audio data may first be preprocessed to remove background noise; the processed audio data is then input into the voiceprint recognition model, which recognizes the voiceprint features, and the second voiceprint information is obtained from them.

In one possible implementation manner, if the first electronic device has preset voiceprint information corresponding to a plurality of users, the user may be prompted to select preset voiceprint information according to which the voice enhancement processing in the audio transmission is performed, so that the selected preset voiceprint information is used as the second voiceprint information.

In a possible implementation manner, when the first electronic device responds to the selection operation for the second control, if the preset voiceprint information does not exist, a prompting message may be output in the control interface to prompt the user to perform voiceprint extraction according to the subsequently input voice.

For example, referring to fig. 8, in the video call scenario, a prompt box A5 may be output in the video call interface A1, where the prompt content of the prompt box A5 is: "Your voiceprint has not been recorded before. Extract your voiceprint from your speech?" The prompt box A5 also includes controls for the user to confirm whether to perform voiceprint extraction, and when an operation by which the user confirms voiceprint extraction is detected, voiceprint feature extraction may be performed on the collected audio data to obtain the second voiceprint information.

Further, if the user declines voiceprint extraction, prompt information may be output to prompt the user to record a voiceprint after the video call ends. For example, referring to fig. 9, a prompt message may be output in the video call interface A1: "You rejected the current voiceprint extraction request, so your voiceprint cannot be shared. Please record your personal voiceprint after the call ends." In addition, after the call ends, a prompt message may again be output to remind the user to record the voiceprint. For example, referring to fig. 10, after the call ends, an "unread message" may be displayed in the video call application interface B1: "Please record your personal voiceprint."

Optionally, when the first electronic device extracts the voiceprint feature according to the collected audio data, the first electronic device may further output a prompt message after the voiceprint information is successfully extracted, so as to prompt the user that the voiceprint information has been successfully extracted. For example, referring to fig. 11, a prompt message may be output in the video call interface A1: the voiceprint recognition is completed, the voiceprints are successfully shared, and the opposite user can receive clearer voice output.

In some embodiments, the first electronic device may acquire the environmental volume of the environment in which the first electronic device is located, and output prompt information when the environmental volume is greater than a set volume threshold, so as to prompt the user to share the second voiceprint information, so that the second electronic device can perform voice enhancement processing on the received audio data again.

In some embodiments, the first electronic device and the second electronic device may perform voice enhancement processing on the collected audio data only when they have shared their voiceprint information with each other. In this embodiment, the first electronic device may obtain from the second electronic device the environmental sound volume of the environment where the second electronic device is located, and if the environmental sound volume is greater than a preset threshold, may send prompt information to the second electronic device to prompt the second electronic device to share the first voiceprint information and to perform voice enhancement processing on the collected audio data. Likewise, when the first electronic device detects that the environmental sound volume of its own environment is greater than the preset threshold, it may prompt the user to share the second voiceprint information, and perform voice enhancement processing on the collected audio data once the second voiceprint information has been shared.

In this embodiment, the embodiments concerning turning voice enhancement processing on and off may be combined with the embodiments concerning voiceprint sharing. For example, in a video call scenario, if the first electronic device has not turned on voice enhancement processing and has not shared the second voiceprint information, the first electronic device sends the originally collected audio data to the second electronic device, and does not perform voice enhancement processing on the audio data sent by the second electronic device; in this case, even if the first electronic device receives the first voiceprint information shared by the second electronic device, it does not perform voice enhancement processing on the received audio data.

For another example, in the video call scenario, if the first electronic device has not turned on voice enhancement processing but has shared the second voiceprint information, the first electronic device sends the voice-enhanced audio data to the second electronic device, and does not perform voice enhancement processing on the audio data sent by the second electronic device; in this case, even if the first electronic device receives the first voiceprint information shared by the second electronic device, it does not perform voice enhancement processing on the received audio data.

For another example, in the video call scenario, if the first electronic device has turned on voice enhancement processing but has not shared the second voiceprint information, the first electronic device sends the originally collected audio data to the second electronic device, and may perform voice enhancement processing on the audio data sent by the second electronic device; in this case, if the first electronic device receives the first voiceprint information shared by the second electronic device, it may perform voice enhancement processing on the received audio data; if the first electronic device does not receive the first voiceprint information shared by the second electronic device, it does not perform voice enhancement processing on the received audio data.

For another example, in the video call scenario, if the first electronic device has turned on voice enhancement processing and has shared the second voiceprint information, the first electronic device sends the voice-enhanced audio data to the second electronic device, and may perform voice enhancement processing on the audio data sent by the second electronic device; in this case, if the first electronic device receives the first voiceprint information shared by the second electronic device, it may perform voice enhancement processing on the received audio data; if the first electronic device does not receive the first voiceprint information shared by the second electronic device, it does not perform voice enhancement processing on the received audio data.
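The four video call examples above reduce to two independent decisions, which the following sketch makes explicit. It is a summary of those examples under the reading that outgoing audio is enhanced whenever the second voiceprint information has been shared, and incoming audio is enhanced only when enhancement is turned on and the first voiceprint information has been received; the names `EnhancementPlan` and `plan_enhancement` are hypothetical.

```python
from typing import NamedTuple

class EnhancementPlan(NamedTuple):
    enhance_outgoing: bool  # enhance audio collected by the first device
    enhance_incoming: bool  # re-enhance audio received from the second device

def plan_enhancement(enhancement_on: bool,
                     shared_second_voiceprint: bool,
                     received_first_voiceprint: bool) -> EnhancementPlan:
    return EnhancementPlan(
        # Outgoing audio is enhanced whenever the device has shared its voiceprint.
        enhance_outgoing=shared_second_voiceprint,
        # Incoming audio is enhanced only if the feature is on AND the
        # peer's voiceprint is available.
        enhance_incoming=enhancement_on and received_first_voiceprint,
    )

# Enhancement on, own voiceprint not shared, peer voiceprint received.
example = plan_enhancement(True, False, True)
```

For the third example in the text (enhancement on, second voiceprint information not shared), `example` indicates that the original audio is sent while the received audio is enhanced.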

In some embodiments, after the audio transmission is finished, the first electronic device may delete the first voiceprint information shared by the second electronic device, and the second electronic device may also delete the second voiceprint information shared by the first electronic device, so as to prevent privacy disclosure of the user.

In some embodiments, if the application program used for the audio transmission is run for the first time, the first electronic device may output prompt information to inform the user that the application program provides a voice enhancement processing function and to prompt the user to record voiceprint information.

Step S350: acquiring second audio data collected by the first electronic device during the audio transmission.

Step S360: performing voice enhancement processing on the second audio data based on the second voiceprint information.

In this embodiment of the present application, after the first electronic device obtains the second audio data to be sent to the second electronic device, it may perform voice enhancement processing on the second audio data based on the second voiceprint information. For the manner of the voice enhancement processing, reference may be made to the manner in which the first electronic device performs voice enhancement processing on the first audio data in the foregoing embodiment, which is not described herein again.

Step S370: sending the second audio data after the voice enhancement processing to the second electronic device, wherein the second electronic device is configured to perform voice enhancement processing again, according to the second voiceprint information, on the audio data sent by the first electronic device.

In the embodiment of the application, after the collected audio data is subjected to voice enhancement processing, the second audio data after the voice enhancement processing can be sent to the second electronic device, and the second electronic device can carry out voice enhancement processing on the second audio data again according to the second voice print information, so that a user can hear the voice of the user of the first electronic device in the playing audio conveniently.

In some embodiments, if multiple second electronic devices perform audio transmission with the first electronic device during the audio transmission, the user may want to record only the voice of the user of a particular electronic device. For example, during a multi-user video conference, the user may want to record only what the speaker says and not what the other participants say. In that case, the voiceprint information of the designated user can be used to separate the human voice audio of the designated user from the received audio data and record it.

Optionally, because recording involves information security, if the user of a second electronic device wants to record the audio of the user of the first electronic device, prompt information may be output in the control interface of the audio transmission to ask the user whether to allow the recording, and if the user confirms that recording is allowed, the second electronic device may be allowed to record.

In an exemplary multi-user video call scenario, if user A of the first electronic device wants to record the voices of users B and C, but users B and C have not yet allowed recording through their respective second electronic devices, the first electronic device may send a recording request to the second electronic devices corresponding to users B and C. If user B inputs an instruction allowing the recording through the second electronic device and user C inputs an instruction refusing the recording through the second electronic device, the first electronic device may record only the audio pushed by the electronic device of user B. If both users B and C allow the first electronic device to record through their second electronic devices, the first electronic device may record the audio of users B and C separately, or record the combined audio of users B and C.
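The consent-gated recording in the scenario above can be sketched as follows. Per-user audio is assumed to be already separated into one sample list per user, combined recording is modeled as a sample-wise sum, and the function names and the `"B+C"` key format are hypothetical choices for illustration.

```python
from typing import Dict, List

def allowed_users(requested: List[str], consent: Dict[str, bool]) -> List[str]:
    """Keep only the requested users who explicitly allowed recording."""
    return [user for user in requested if consent.get(user, False)]

def record(streams: Dict[str, List[float]], users: List[str],
           combined: bool) -> Dict[str, List[float]]:
    """Record each user separately, or sum their audio into one track."""
    if not combined:
        return {user: list(streams[user]) for user in users}
    length = max((len(streams[user]) for user in users), default=0)
    mixed = [0.0] * length
    for user in users:
        for i, sample in enumerate(streams[user]):
            mixed[i] += sample
    return {"+".join(users): mixed}

streams = {"B": [0.25, 0.5], "C": [0.25, -0.25]}  # made-up separated audio
only_b = allowed_users(["B", "C"], {"B": True, "C": False})
separate = record(streams, only_b, combined=False)
both = allowed_users(["B", "C"], {"B": True, "C": True})
merged = record(streams, both, combined=True)
```

A user who never answered the request is treated like a refusal here, which is a conservative assumption rather than something the text specifies.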

Optionally, a third control for allowing others to record may be displayed in the control interface of the audio transmission, and when an operation on the third control is detected, prompt information may be output to ask the user whether to allow others to record. For example, referring to fig. 12 and fig. 13 together, in the video call interface A1, after an operation on the third control A6 for "record permission" is detected, the prompt content "All users will have the right to record your voice separately during the call. Continue?" may be displayed. If an agreeing operation of the user is received, the devices of other users are allowed to obtain the recording permission; otherwise, the devices of other users are not allowed to obtain the recording permission.

Optionally, a fourth control for separate recording may be displayed in the control interface of the audio transmission, and when an operation on the fourth control is detected, the users available for recording may be displayed so that the user can select whom to record. For example, referring to fig. 12 and fig. 14, in the video call interface A1, after an operation on the fourth control A7 for "separate recording" is detected, a prompt "Please select the user you want to record" may be displayed in the video call interface A1, together with selectable user options. Further, referring to fig. 15, if the user selects multiple users to record, the user may be prompted in the video call interface A1 to select a recording mode, where the recording mode may include "separate recording" and "combined recording". Correspondingly, referring to fig. 16, the electronic device of a user selected for recording may receive a corresponding recording request, and that electronic device may output prompt information in the video call interface A1 to ask the user whether to allow the recording.

According to the audio processing method, the first audio data is subjected to voice enhancement processing at both the sending end and the receiving end, so that the human voice is doubly enhanced, the human voice corresponding to the first voiceprint information can be highlighted, and the user can hear the voice of the other party more easily. In addition, an implementation of separate recording is provided, which allows a user to record only the speech of the specific users that the user wants to record.

Referring to fig. 17, a block diagram of an audio processing apparatus 400 according to an embodiment of the present application is shown. The audio processing apparatus 400 is applied to the first electronic device described above, and the first electronic device is connected to the second electronic device in audio transmission. The audio processing apparatus 400 includes: an audio receiving module 410, a human voice enhancement module 420, and an audio playing module 430. The audio receiving module 410 is configured to receive first audio data sent by the second electronic device during the audio transmission, where the first audio data is audio data obtained by the second electronic device after performing voice enhancement processing, according to first voiceprint information, on audio data collected during the audio transmission; the human voice enhancement module 420 is configured to perform voice enhancement processing on the first audio data based on the stored first voiceprint information; the audio playing module 430 is configured to play the first audio data after the voice enhancement processing.

In some embodiments, the audio processing apparatus 400 further comprises a human voice enhancement control module. The human voice enhancement control module is configured to: in response to an operation for turning on voice enhancement processing, turn on the voice enhancement processing so as to perform voice enhancement processing on the received audio data.

In some implementations, the voice enhancement control module may be configured to: displaying a control interface of the audio transmission, wherein the control interface comprises a first control for controlling the opening of the voice enhancement processing; and responding to the opening operation of the first control, and opening the voice enhancement processing to perform the voice enhancement processing on the received audio data.

In one possible implementation, the audio processing apparatus 400 further includes: the prompt message receiving module and the prompt message display module. The prompt information receiving module is used for receiving prompt information sent by the second electronic equipment, the prompt information is used for prompting the starting of the voice enhancement processing, and the prompt information is sent by the second electronic equipment when the volume of the environmental noise of the current environment is detected to be larger than the preset volume; the prompt information display module is used for displaying the prompt information on the control interface.

In some embodiments, the audio processing apparatus 400 further comprises a voiceprint receiving module. The voiceprint receiving module is configured to receive the first voiceprint information sent by the second electronic device.

In some embodiments, the audio processing apparatus 400 further includes a voiceprint sending module, an audio acquisition module, and an audio sending module. The voiceprint sending module is configured to send second voiceprint information to the second electronic device; the audio acquisition module is configured to acquire second audio data collected by the first electronic device during the audio transmission; the human voice enhancement module is configured to perform voice enhancement processing on the second audio data based on the second voiceprint information; the audio sending module is configured to send the second audio data after the voice enhancement processing to the second electronic device, and the second electronic device is configured to perform voice enhancement processing again, according to the second voiceprint information, on the audio data sent by the first electronic device.

In some embodiments, the voiceprint sending module can be configured to: and sending second voice print information to the second electronic equipment in response to a sharing operation for sharing voice print.

In one possible implementation, the audio processing apparatus 400 may further include an interface display module. The interface display module is configured to display a control interface of the audio transmission, and the control interface comprises a second control for sharing voiceprints; the voiceprint sending module may be configured to: in response to a selection operation on the second control, send the second voiceprint information to the second electronic device.

In one possible implementation, the audio processing apparatus 400 may further include: the device comprises a voiceprint judging module, a first determining module and a second determining module. The voiceprint judging module is used for judging whether preset voiceprint information exists in the first electronic equipment; the first determining module is used for taking preset voiceprint information as second voiceprint information if the preset voiceprint information exists; and the second determining module is used for extracting voiceprint characteristics according to the audio data acquired by the first electronic equipment in the audio transmission to obtain second voiceprint information if the preset voiceprint information does not exist.

In some embodiments, the human voice enhancement module 420 may include an audio separation unit and a mixing unit. The audio separation unit is used for separating voice audio corresponding to the first voiceprint information from the first audio data based on the first voiceprint information; and the sound mixing unit is used for mixing the voice audio with the first audio data to obtain first audio data after voice enhancement processing.

In one possible implementation, the human voice enhancement module 420 may further include an amplitude enhancement unit and an amplitude attenuation unit. The amplitude enhancement unit is configured to perform amplitude enhancement on the human voice audio; the amplitude attenuation unit is configured to perform amplitude attenuation on the other audio that remains in the first audio data after the human voice audio is separated. The mixing unit may be configured to: mix the amplitude-enhanced human voice audio with the amplitude-attenuated other audio to obtain the first audio data after voice enhancement processing.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus and modules described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.

In several embodiments provided herein, the coupling of the modules to each other may be electrical, mechanical, or other.

In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.

In summary, according to the scheme provided by the application, the first audio data sent by the second electronic device in the audio transmission process is received, the first audio data is audio data obtained after the second electronic device performs voice enhancement processing on the audio data collected in the audio transmission process according to the first voiceprint information, then the voice enhancement processing is performed on the first audio data based on the stored first voiceprint information, the first audio data subjected to the voice enhancement processing is played, and therefore the voice enhancement processing is performed on the collected audio data by the sending end of the audio transmission, and meanwhile, the voice enhancement processing is performed on the received audio data again by the receiving end, so that the voice can be highlighted in the played audio, and the user can hear the voice of the user corresponding to the sending end conveniently.

Referring to fig. 18, a block diagram of an electronic device according to an embodiment of the present application is shown. The electronic device 100 may be an electronic device capable of running an application program, such as a smart phone, a tablet computer, a smart watch, smart glasses, or a notebook computer. The electronic device 100 in this application may include one or more of the following components: a processor 110, a memory 120, and one or more application programs, wherein the one or more application programs may be stored in the memory 120 and configured to be executed by the one or more processors 110, the one or more application programs being configured to perform the method described in the foregoing method embodiments.

Processor 110 may include one or more processing cores. The processor 110 uses various interfaces and lines to connect the various parts of the electronic device 100, and performs the various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and by invoking data stored in the memory 120. Optionally, the processor 110 may be implemented in hardware in at least one of the following forms: digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), programmable logic array (Programmable Logic Array, PLA). The processor 110 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is responsible for rendering and drawing display content; the modem handles wireless communications. It will be appreciated that the modem may also not be integrated into the processor 110 and may instead be implemented separately by a standalone communication chip.

The memory 120 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the method embodiments described above, and the like. The data storage area may store data created by the electronic device 100 in use (e.g., phonebook, audio and video data, chat log data), and the like.

Referring to fig. 19, a block diagram of a computer readable storage medium according to an embodiment of the present application is shown. The computer readable storage medium 800 has stored therein program code which can be invoked by a processor to perform the methods described in the foregoing method embodiments.

The computer readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium 800 comprises a non-transitory computer-readable storage medium. The computer readable storage medium 800 has storage space for program code 810 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code 810 may, for example, be compressed in a suitable form.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, one of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. An audio processing method, applied to a first electronic device in audio transmission, the first electronic device being connected to a second electronic device in the audio transmission, the method comprising:

displaying a control interface of the audio transmission, wherein the control interface comprises a first control for controlling the starting of the voice enhancement processing;

in response to an opening operation on the first control, starting the voice enhancement processing so as to perform voice enhancement processing on received audio data;

receiving first audio data sent by the second electronic device during the audio transmission, wherein the first audio data is audio data obtained by the second electronic device performing voice enhancement processing, according to first voiceprint information, on audio data collected during the audio transmission;

performing voice enhancement processing on the first audio data based on the stored first voiceprint information;

and playing the first audio data after the voice enhancement processing.

2. The method of claim 1, wherein prior to the displaying the control interface of the audio transmission, the method further comprises:

receiving prompt information sent by the second electronic device, wherein the prompt information is used for prompting the starting of voice enhancement processing, and the prompt information is sent by the second electronic device when it detects that the volume of the environmental noise of the current environment is greater than a preset volume;

and displaying the prompt information on the control interface.

3. The method of claim 1, wherein prior to the voice enhancement processing of the first audio data based on the stored first voiceprint information, the method further comprises:

and receiving the first voiceprint information sent by the second electronic device.

4. The method according to claim 1, wherein the method further comprises:

sending second voiceprint information to the second electronic device;

acquiring second audio data collected by the first electronic device during the audio transmission;

performing voice enhancement processing on the second audio data based on the second voiceprint information;

and sending the second audio data after the voice enhancement processing to the second electronic device, wherein the second electronic device is used for performing voice enhancement processing again, according to the second voiceprint information, on the audio data sent by the first electronic device.

5. The method of claim 4, wherein the sending the second voiceprint information to the second electronic device comprises:

and sending the second voiceprint information to the second electronic device in response to a sharing operation for sharing a voiceprint.

6. The method of claim 5, wherein prior to the sending second voiceprint information to the second electronic device in response to the sharing operation for sharing voiceprints, the method further comprises:

displaying a control interface of the audio transmission, wherein the control interface comprises a second control for sharing voiceprints;

the sending, in response to a sharing operation for sharing voiceprint, second voiceprint information to the second electronic device includes:

and in response to a selection operation on the second control, sending the second voiceprint information to the second electronic device.

7. The method of claim 4, wherein prior to the sending the second voiceprint information to the second electronic device, the method further comprises:

judging whether preset voiceprint information exists in the first electronic device;

if the preset voiceprint information exists, taking the preset voiceprint information as the second voiceprint information;

and if the preset voiceprint information does not exist, performing voiceprint feature extraction on the audio data collected by the first electronic device in the audio transmission to obtain the second voiceprint information.

8. The method according to any one of claims 1-7, wherein the performing voice enhancement processing on the first audio data based on the stored first voiceprint information comprises:

based on the first voiceprint information, separating voice audio corresponding to the first voiceprint information from the first audio data;

and mixing the voice audio with the first audio data to obtain first audio data after voice enhancement processing.

9. The method of claim 8, wherein prior to the mixing the voice audio with the first audio data to obtain first audio data after voice enhancement processing, the method further comprises:

performing amplitude enhancement on the voice audio;

performing amplitude attenuation on the other audio remaining in the first audio data after the voice audio is separated;

the mixing the voice audio with the first audio data to obtain first audio data after voice enhancement processing comprises:

and mixing the voice audio after the amplitude enhancement with the other audio after the amplitude attenuation to obtain the first audio data after the voice enhancement processing.

10. An audio processing apparatus, applied to a first electronic device in audio transmission, the first electronic device being connected to a second electronic device in the audio transmission, the apparatus comprising: a voice enhancement control module, an audio receiving module, a voice enhancement module and an audio playing module, wherein,

the voice enhancement control module is used for displaying a control interface of the audio transmission, wherein the control interface comprises a first control for controlling the starting of voice enhancement processing; and, in response to an opening operation on the first control, starting the voice enhancement processing so as to perform voice enhancement processing on received audio data;

the audio receiving module is used for receiving first audio data sent by the second electronic device during the audio transmission, wherein the first audio data is audio data obtained by the second electronic device performing voice enhancement processing, according to first voiceprint information, on audio data collected during the audio transmission;

the voice enhancement module is used for performing voice enhancement processing on the first audio data based on the stored first voiceprint information;

the audio playing module is used for playing the first audio data after the voice enhancement processing.

11. An electronic device, comprising:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-9.

12. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a program code, which is callable by a processor for executing the method according to any one of claims 1-9.