CN113284500A - Audio processing method and device, electronic equipment and storage medium - Google Patents

Audio processing method and device, electronic equipment and storage medium

Info

Publication number
CN113284500A
Authority
CN
China
Prior art keywords
audio
audio data
electronic device
enhancement processing
voiceprint information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110547910.1A
Other languages
Chinese (zh)
Other versions
CN113284500B (en)
Inventor
林俊潜
谢静贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110547910.1A
Publication of CN113284500A
Application granted
Publication of CN113284500B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude for improving intelligibility

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Telephone Function (AREA)

Abstract

The application discloses an audio processing method and apparatus, an electronic device, and a storage medium. The audio processing method is applied to a first electronic device in audio transmission, where the first electronic device is connected to a second electronic device in the audio transmission. The method includes: receiving first audio data sent by the second electronic device during the audio transmission, where the first audio data is obtained by the second electronic device performing human voice enhancement processing, according to first voiceprint information, on audio data it acquires during the audio transmission; performing human voice enhancement processing on the first audio data based on the stored first voiceprint information; and playing the first audio data after the human voice enhancement processing. The method makes it easier for the user to hear the human voice clearly in audio transmission scenarios.

Description

Audio processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of audio processing technologies, and in particular, to an audio processing method and apparatus, an electronic device, and a storage medium.
Background
With the rapid advance of technology and living standards, electronic devices (such as smartphones and tablet computers) are widely used, and more and more users rely on them for video calls, voice calls, watching live video, and other scenarios. In these scenarios, the environment at the data sending end may be noisy, so that after the receiving end plays the audio collected by the sending end, the user cannot clearly hear the voice of the user at the sending end, which seriously degrades the user experience.
Disclosure of Invention
In view of the foregoing, the present application provides an audio processing method, an apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides an audio processing method, which is applied to a first electronic device in audio transmission, where the first electronic device is connected to a second electronic device in the audio transmission, and the method includes: receiving first audio data sent by the second electronic device in the audio transmission process, wherein the first audio data is obtained by performing human voice enhancement processing on the audio data acquired by the second electronic device in the audio transmission process according to first voiceprint information; based on the stored first voiceprint information, carrying out human voice enhancement processing on the first audio data; and playing the first audio data after the human sound enhancement processing.
In a second aspect, an embodiment of the present application provides an audio processing apparatus, which is applied to a first electronic device in audio transmission, where the first electronic device is connected to a second electronic device in the audio transmission, and the apparatus includes: the audio receiving module is used for receiving first audio data sent by the second electronic device in the audio transmission process, wherein the first audio data is obtained by performing human voice enhancement processing on the audio data acquired by the second electronic device in the audio transmission process according to first voiceprint information; the voice enhancement module is used for performing voice enhancement processing on the first audio data based on the stored first voiceprint information; the audio playing module is used for playing the first audio data after the human voice enhancement processing.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a memory; one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the audio processing method provided by the first aspect above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a program code is stored in the computer-readable storage medium, and the program code can be called by a processor to execute the audio processing method provided in the first aspect.
In the solution provided by the present application, the first electronic device receives, during audio transmission, first audio data sent by the second electronic device, where the first audio data is obtained by the second electronic device performing human voice enhancement processing, according to first voiceprint information, on audio data collected during the audio transmission; the first electronic device then performs human voice enhancement processing on the first audio data based on the stored first voiceprint information and plays the processed first audio data. In this way, the sending end of the audio transmission performs human voice enhancement processing on the collected audio data, and the receiving end performs human voice enhancement processing again on the received audio data, so that the human voice stands out in the played audio and the user can clearly hear the voice of the user at the sending end.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 shows a schematic diagram of an application environment provided by an embodiment of the present application.
Fig. 2 shows another schematic diagram of an application environment provided by the embodiment of the present application.
FIG. 3 shows a flow diagram of an audio processing method according to one embodiment of the present application.
Fig. 4 shows a flow diagram of an audio processing method according to another embodiment of the present application.
Fig. 5 shows a schematic interface diagram provided in another embodiment of the present application.
Fig. 6 shows another interface schematic provided by another embodiment of the present application.
Fig. 7 shows a flow diagram of an audio processing method according to yet another embodiment of the present application.
Fig. 8 shows a schematic interface diagram provided in another embodiment of the present application.
Fig. 9 shows a schematic interface diagram provided in another embodiment of the present application.
Fig. 10 shows a schematic view of an interface provided by another embodiment of the present application.
Fig. 11 shows a schematic interface diagram provided in another embodiment of the present application.
Fig. 12 shows a schematic interface diagram provided in another embodiment of the present application.
Fig. 13 shows a schematic interface diagram provided in another embodiment of the present application.
Fig. 14 shows a schematic interface diagram provided in another embodiment of the present application.
Fig. 15 shows a schematic interface diagram provided in another embodiment of the present application.
Fig. 16 shows a schematic interface diagram provided in another embodiment of the present application.
FIG. 17 shows a block diagram of an audio processing device according to an embodiment of the application.
Fig. 18 is a block diagram of an electronic device for executing an audio processing method according to an embodiment of the present application.
Fig. 19 is a storage unit according to an embodiment of the present application, configured to store or carry program code for implementing an audio processing method according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
At present, with the rapid advance of technology and living standards, electronic devices offer more and more functions, and users can use them to meet a wide range of needs. In daily use, people frequently engage in scenarios such as video calls, voice calls, and watching live video, in which collected audio data is transmitted, so that after the receiving end of the audio transmission plays the received audio data, the user at the receiving end can obtain the information the user at the transmitting end wants to convey.
The inventor found that in an audio transmission scenario, the sound collected by the sending end usually contains many types of sound, for example the speech of each speaker, the sound of an application program, and other sounds in the environment where the mobile terminal is located. Because the sound is complex, after the receiving end plays the audio collected by the sending end, the user cannot clearly hear the voice of the user at the sending end, which seriously degrades the user experience.
In view of the above problems, the inventor provides the audio processing method and apparatus, electronic device, and storage medium of the embodiments of the present application, with which the sending end of an audio transmission performs human voice enhancement processing on the collected audio data and the receiving end performs human voice enhancement processing again on the received audio data, so that the human voice is highlighted in the played audio, making it easier for the user to hear the voice of the user at the sending end and improving the user experience. The specific audio processing method is described in detail in the following embodiments.
An application environment of the audio processing method in the embodiment of the present application is described below.
Fig. 1 is a schematic diagram illustrating an application environment of an audio processing method provided in an embodiment of the present application, and referring to fig. 1, the application environment includes a first electronic device 10, a second electronic device 20, and a server 30. The first electronic device 10 and the second electronic device 20 are connected to the server 30 through a wireless network or a wired network, and the server 30 and the first electronic device 10 and the second electronic device 20 can perform data interaction, so that the first electronic device 10 and the second electronic device 20 can communicate through the server 30.
Optionally, the server 30 may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms, but is not limited thereto.
Alternatively, the first electronic device 10 and the second electronic device 20 may be mobile terminals (e.g., smartphones, tablet computers, smart watches, smart bands, etc.), notebook computers, desktop computers, servers, and the like, but are not limited thereto. Those skilled in the art will appreciate that the number of electronic devices in the application environment may be greater or smaller; for example, there may be only two electronic devices (the first electronic device 10 and the second electronic device 20), or tens, hundreds, or more, in which case the implementation environment also includes other electronic devices. The number and type of electronic devices are not limited in the embodiments of the present application.
In some embodiments, the first electronic device 10 and the second electronic device 20 may be devices used by a user, and the first electronic device 10 and the second electronic device 20 may perform a usage scenario of audio transmission according to an operation of the user, for example, a scenario of a voice call, a video call, a live video, a live voice broadcast, and the like. In the process of audio transmission, the first electronic device and the second electronic device can perform voice enhancement processing on the collected audio data used for being transmitted to the other party and the received audio data, so that the audio data is subjected to voice enhancement at the transmitting end and the receiving end, and voice can be well highlighted in the played audio.
Fig. 2 shows another schematic diagram of an application environment of the audio processing method provided in the embodiment of the present application. Referring to fig. 2, the application environment includes a plurality of electronic devices 100 (only 2 are shown in the figure), a management server 200, and a signaling server 300, where the electronic devices 100 may communicate with the management server and the signaling server to complete data interaction. In addition, the management server 200 communicates with the signaling server 300. The management server 200, as the server of the service platform for the audio transmission scenario, may provide services for the electronic devices 100 during audio transmission; the signaling server 300 may be used to establish network real-time communication (WebRTC) between the electronic devices 100 so as to complete data interaction between them. WebRTC is an open-source project launched by Google that aims to provide browsers and mobile web applications with a simple JavaScript interface for Real-Time Communications (RTC).
In some embodiments, before audio transmission between the electronic devices 100, a WebRTC communication connection between them can be established through the management server 200 and the signaling server 300. The WebRTC communication connection is a P2P (peer-to-peer) connection, which ensures the real-time performance of the communication between the two devices and therefore of the audio transmission. Likewise, during audio transmission, each electronic device can perform human voice enhancement processing both on the audio data it collects for transmission to the other party and on the audio data it receives, so that the audio data is voice-enhanced at both the transmitting end and the receiving end and the human voice is well highlighted in the played audio.
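As a point of reference only, the following Python sketch shows how such a peer-to-peer connection could be negotiated through a signaling server. The `signaling` object (with `send`/`receive` methods wrapping the signaling server 300) is a hypothetical helper, and the aiortc library is used merely as one illustrative WebRTC implementation; neither is specified by the patent.

```python
from aiortc import RTCPeerConnection, RTCSessionDescription

async def connect_as_caller(signaling):
    """Negotiate a P2P audio connection; `signaling` relays messages via the signaling server."""
    pc = RTCPeerConnection()
    pc.addTransceiver("audio")                      # this call will send and receive audio
    offer = await pc.createOffer()                  # describe local media capabilities
    await pc.setLocalDescription(offer)
    await signaling.send({"sdp": pc.localDescription.sdp,
                          "type": pc.localDescription.type})
    answer = await signaling.receive()              # answer relayed from the other device
    await pc.setRemoteDescription(
        RTCSessionDescription(sdp=answer["sdp"], type=answer["type"]))
    return pc                                       # audio tracks now flow peer-to-peer
```

Once the handshake completes, the audio frames themselves travel directly between the two devices rather than through the servers, which is what preserves the real-time behavior described above.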
Referring to fig. 3, fig. 3 is a flow chart illustrating an audio processing method according to an embodiment of the present application. In a specific embodiment, the audio processing method is applied to a first electronic device, and the first electronic device is connected with a second electronic device in audio transmission. As will be described in detail with respect to the flow shown in fig. 3, the audio processing method may specifically include the following steps:
step S110: and receiving first audio data sent by second electronic equipment in the audio transmission process, wherein the first audio data is obtained by performing human voice enhancement processing on the audio data acquired by the second electronic equipment in the audio transmission process according to first voiceprint information.
The audio transmission between the first electronic device and the second electronic device may be audio transmission in a voice call scene, a video call scene, a voice live broadcast scene, a video live broadcast scene, and the like, and a specific scene to which the audio transmission is applied in the embodiment of the present application may not be limited.
In the embodiment of the application, in the process of audio transmission between the first electronic device and the second electronic device, before the second electronic device transmits the collected audio data to the first electronic device, the collected audio data can be subjected to voice enhancement processing according to the first voiceprint information, and then the first audio data subjected to the voice enhancement processing is sent to the first electronic device. Accordingly, the first electronic device may receive the audio data transmitted by the second electronic device. The voice enhancement processing is to enhance the voice in the audio data so that the voice can be highlighted when the audio data is played, thereby facilitating the user to hear the voice in the played audio.
In some embodiments, the first voiceprint information may be voiceprint information set for a user voice needing to be subjected to vocal enhancement, and is stored in the second electronic device, so that the second electronic device performs vocal enhancement processing on a corresponding vocal in the acquired audio data. The first voiceprint information may be pre-entered by a user of the second electronic device, or may be voiceprint information acquired in real time in an audio transmission process.
In some embodiments, the first voiceprint information may include the voiceprint information of one user or of multiple users. When the first voiceprint information includes the voiceprint information of one user, the second electronic device performs human voice enhancement processing, during the audio transmission, on the voice of that user; when it includes the voiceprint information of multiple users, the second electronic device performs human voice enhancement processing on the voices of those users. The specific content of the first voiceprint information may be set according to the actual scenario and the users' requirements. For example, in a one-to-one private call, the first voiceprint information may include the voiceprint information of only one user; in a remote video conference in which multiple users share the same conference room, the first voiceprint information may include the voiceprint information of those users.
Step S120: and performing human voice enhancement processing on the first audio data based on the stored first voiceprint information.
In this embodiment of the application, the first electronic device may also store the first voiceprint information, so that after receiving the first audio data sent by the second electronic device, the first electronic device may perform the human voice enhancement processing on the received first audio data again by using the first voiceprint information.
In some embodiments, the first electronic device performs a voice enhancement process on the first audio data based on the first voiceprint information, which may be to enhance a voice corresponding to the first voiceprint information in the first audio data according to the first voiceprint information so as to highlight the voice corresponding to the first voiceprint information.
Step S130: and playing the first audio data after the human sound enhancement processing.
In this embodiment of the application, after performing the voice enhancement processing on the received first audio data, the first electronic device may play the first audio data after the voice enhancement processing through the audio output unit, so that the user can hear the played audio.
In the audio processing method provided by the embodiment of the application, since the first audio data is subjected to the voice enhancement processing at the sending end and the receiving end, the voice is enhanced doubly, so that the voice corresponding to the first voiceprint information can be highlighted, and the user can hear the voice of the other party conveniently.
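A minimal sketch of steps S110 to S130 on the receiving device is given below. The `receive_frame`, `enhance_voice`, and `play` callables are hypothetical placeholders for the transport, enhancement, and audio-output components described in the embodiments, not components defined by the patent.

```python
def receiver_loop(connection, stored_voiceprint, receive_frame, enhance_voice, play):
    """Run the receiving-side flow of Fig. 3 for the duration of the audio transmission."""
    while connection.is_active():
        # Step S110: receive first audio data already enhanced once by the sending end.
        first_audio = receive_frame(connection)
        if first_audio is None:
            break
        # Step S120: enhance the voice matching the stored first voiceprint information again.
        enhanced = enhance_voice(first_audio, stored_voiceprint)
        # Step S130: play the doubly enhanced audio through the audio output unit.
        play(enhanced)
```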
Referring to fig. 4, fig. 4 is a flowchart illustrating an audio processing method according to another embodiment of the present application. The audio processing method is applied to a first electronic device, the first electronic device is connected to a second electronic device in audio transmission, and as will be described in detail with reference to the flow shown in fig. 4, the audio processing method may specifically include the following steps:
step S210: and responding to the opening operation of the human voice enhancement processing, and opening the human voice enhancement processing to perform human voice enhancement processing on the received audio data.
In this embodiment of the application, the first electronic device may detect a control operation of a user, so that when detecting an opening operation of the human voice enhancement processing, the human voice enhancement processing is opened, and thus, the human voice enhancement processing may be performed on the received audio data.
In some embodiments, the first electronic device may display a control interface for audio transmission, the control interface including a first control for controlling activation of the human voice enhancement process; and responding to the starting operation of the first control, starting the human voice enhancement processing so as to perform human voice enhancement processing on the received audio data.
In this embodiment, the first electronic device may display a control interface of the audio transmission, where the control interface is used to control the audio transmission to meet the control requirement of the user on the audio transmission. The control interface can at least comprise a first control used for controlling the opening of the human voice enhancement processing, so that a user can determine whether to open the human voice enhancement processing on the received audio data according to the requirement of the user. When the first electronic device detects the opening operation of the first control, the human voice enhancement processing can be started.
In a possible implementation manner, the opening operation of the first control may be a click operation of the first control, a press operation meeting a preset press condition, a slide operation meeting a preset slide condition, and the like, and the specific opening operation may not be limited. Wherein the preset pressing condition may include at least one of the following conditions: the pressing duration corresponding to the pressing operation is longer than the first duration; the pressing area corresponding to the pressing operation is larger than the preset area; and the pressing pressure value corresponding to the pressing operation is greater than the preset pressure value. The preset slip condition may include at least one of the following conditions: the sliding distance corresponding to the sliding operation is greater than the preset sliding distance; and the sliding track corresponding to the sliding operation meets the preset sliding track.
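For illustration, a small sketch of how such preset pressing and sliding conditions might be checked is shown below; all threshold values and the `Gesture` structure are illustrative assumptions, not values taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Gesture:
    kind: str                      # "click", "press" or "slide"
    duration_s: float = 0.0
    area_px: float = 0.0
    pressure: float = 0.0
    distance_px: float = 0.0
    matches_preset_track: bool = False

def is_open_operation(gesture: Gesture) -> bool:
    """Return True if the gesture counts as the opening operation on the first control."""
    if gesture.kind == "click":
        return True
    if gesture.kind == "press":                     # any one preset pressing condition suffices
        return (gesture.duration_s > 0.8            # longer than the first duration
                or gesture.area_px > 2000           # larger than the preset area
                or gesture.pressure > 0.6)          # greater than the preset pressure value
    if gesture.kind == "slide":                     # any one preset sliding condition suffices
        return (gesture.distance_px > 150           # greater than the preset sliding distance
                or gesture.matches_preset_track)    # satisfies the preset sliding track
    return False
```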
For example, referring to fig. 5, when the audio transmission scenario is a video call, the first electronic device may display a video call interface A1, where the video call interface includes a control A2 for showing more functions. Referring to fig. 6, after a selection operation on the control A2 is detected, more functions can be shown in the video call interface A1: camera switching, hands-free, voice conversion, voice enhancement, and voiceprint sharing. The control corresponding to voice enhancement is the first control A3; when a selection operation on the control A3 is detected, the human voice enhancement processing can be started in response to that operation, so as to perform human voice enhancement processing on the audio data received in the video call.
In a possible implementation manner, before the first electronic device starts the human voice enhancement processing in response to the start operation on the first control, the first electronic device may receive prompt information sent by the second electronic device, where the prompt information is used to prompt the user to start the human voice enhancement processing and is sent by the second electronic device when it detects that the ambient noise volume of its current environment is greater than a preset volume; the prompt information is then displayed on the control interface. In this way, when the second electronic device detects that the ambient noise of its environment is loud, it actively prompts the first electronic device to start the human voice enhancement processing, so that the user can control the first electronic device to start the processing in time.
Of course, in this embodiment, the first control may also be used to turn off the human voice enhancement processing, and the first electronic device may turn off the processing when an operation on the first control is detected. For example, during a video call, when the ambient noise at the sending end is low, the sending end may send corresponding prompt information to the first electronic device, the first electronic device may output the prompt information, and the human voice enhancement processing may be turned off when the user's operation on the first control is detected.
In other embodiments, after the first electronic device displays the control interface for audio transmission, it may also detect a sliding operation in the control interface and start the human voice enhancement processing if the detected sliding operation satisfies a set sliding condition that triggers the start of the processing. Optionally, the set sliding condition may include at least one of the following: the sliding area lies within a set sliding region, the sliding duration is greater than a set duration, and the sliding track satisfies a set sliding track. In this way, the user can control the start of the human voice enhancement processing with a corresponding sliding operation in the control interface, which improves the user experience.
Of course, after the human voice enhancement processing has been started, it may be turned off again when a sliding operation satisfying the set sliding condition is detected.
Of course, a specific manner of controlling the start of the human voice enhancement processing in the embodiment of the present application may not be limited, and for example, the first electronic device may also control the start of the human voice enhancement processing when detecting a voice control instruction for starting the human voice enhancement processing in the audio transmission process.
Step S220: and receiving the first voiceprint information sent by the second electronic equipment.
In the embodiment of the application, during the audio transmission, the second electronic device performs human voice enhancement processing on the acquired audio data according to the first voiceprint information, and it may also send the first voiceprint information to the first electronic device; correspondingly, the first electronic device can receive and store the first voiceprint information.
In some embodiments, the first voiceprint information may be sent to the first electronic device by the second electronic device after detecting a sharing operation for sharing a voiceprint. Optionally, the second electronic device may display a control interface for audio transmission, the control interface may include a control for sharing a voiceprint, and after the second electronic device detects an operation of the control, the first voiceprint information may be sent to the first electronic device.
Step S230: and receiving first audio data sent by the second electronic device in the audio transmission process, wherein the first audio data is obtained by performing human voice enhancement processing on the audio data acquired by the second electronic device in the audio transmission process according to the first voiceprint information.
In the embodiment of the present application, step S230 may refer to the contents of the foregoing embodiments, and is not described herein again.
Step S240: and performing human voice enhancement processing on the first audio data based on the stored first voiceprint information.
In this embodiment of the application, the first electronic device performs a voice enhancement process on the first audio data based on the stored first voiceprint information, which may be to determine a corresponding voice from the first audio data based on the first voiceprint information, and enhance the determined voice.
In some embodiments, the first electronic device may separate, based on the first voiceprint information, a human voice audio corresponding to the first voiceprint information from the first audio data; and mixing the voice audio with the first audio data to obtain the first audio data after the voice enhancement processing. Therefore, after the voice audio corresponding to the first voiceprint information is separated, the voice audio and the first audio data are mixed, so that the voice audio can be superposed, and the voice is highlighted and enhanced.
In one possible implementation, after separating the human voice audio, the first electronic device may perform amplitude enhancement on the human voice audio, and perform amplitude attenuation on other audio after separating the human voice audio in the first audio data; and then mixing the human voice audio subjected to amplitude enhancement with other audio subjected to amplitude attenuation to obtain first audio data subjected to human voice enhancement processing.
Alternatively, to perform amplitude enhancement on the human voice audio, the human voice audio may be input into an amplitude enhancement model. The amplitude enhancement model may be obtained by training a deep neural network with noisy speech samples, so as to suppress noise and reduce signal distortion. Specifically, the amplitude enhancement model may output the speech phase information to be enhanced according to the input human voice audio, and the human voice audio is then overlap-added according to that phase information to obtain the amplitude-enhanced human voice audio.
Optionally, to perform amplitude attenuation on the other audio, the other audio may likewise be input into an amplitude attenuation model. The amplitude attenuation model may also be obtained by training a deep neural network with speech samples; it may output the speech phase information to be attenuated according to the input other audio and then attenuate the amplitude of the other audio according to that phase information. In this embodiment, attenuating the other audio rather than discarding it completely preserves the real speech capture environment.
Optionally, after the human voice audio subjected to amplitude enhancement and other audio subjected to amplitude attenuation are mixed to obtain first audio data subjected to human voice enhancement, smoothing may be performed on the first audio data to make the obtained sound more natural.
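A minimal numpy sketch of this receiver-side pipeline is shown below, assuming mono audio: separate the voice matching the voiceprint, boost its amplitude, attenuate the residual audio, remix, and smooth. The `separate_by_voiceprint` callable stands in for the separation and enhancement models, and the gain and window values are illustrative assumptions.

```python
import numpy as np

def enhance_first_audio(first_audio: np.ndarray,
                        voiceprint,
                        separate_by_voiceprint,
                        voice_gain: float = 1.8,
                        other_gain: float = 0.4,
                        smooth_win: int = 5) -> np.ndarray:
    """Return the first audio data after human voice enhancement processing (mono samples in [-1, 1])."""
    # Separate the human voice audio corresponding to the first voiceprint information.
    voice = separate_by_voiceprint(first_audio, voiceprint)
    other = first_audio - voice              # residual: application sounds, ambient noise, etc.
    # Amplitude enhancement of the voice and amplitude attenuation of the rest;
    # the rest is kept, not discarded, to preserve the real capture environment.
    mixed = voice_gain * voice + other_gain * other
    # Simple moving-average smoothing so the remixed audio sounds more natural.
    kernel = np.ones(smooth_win) / smooth_win
    smoothed = np.convolve(mixed, kernel, mode="same")
    # Avoid clipping after the gain is applied.
    return np.clip(smoothed, -1.0, 1.0)
```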
In some embodiments, if the first electronic device does not store the first voiceprint information, that is, the second electronic device has not shared the voiceprint information according to which it performs human voice enhancement processing on the acquired audio data, the first electronic device does not perform human voice enhancement processing again after receiving the audio data sent by the second electronic device.
In some embodiments, after receiving the first audio data, the first electronic device may obtain, from the second electronic device, the ambient sound volume of the environment in which the second electronic device is located, and compare it with a volume threshold. If the ambient sound volume is greater than the volume threshold, the first electronic device performs human voice enhancement processing again on the received first audio data according to the first voiceprint information; otherwise, it may skip the additional human voice enhancement processing.
In a possible implementation manner, when the volume of the environmental sound of the environment where the second electronic device is located is greater than the volume threshold, the first electronic device may further determine an enhancement amplitude of the human voice enhancement according to the volume of the environmental sound, and then perform the human voice enhancement on the first audio data according to the enhancement amplitude. Alternatively, in the above embodiment in which the human voice audio is separated and subjected to amplitude enhancement, and then mixed with other audio whose amplitude is attenuated, the human voice audio may be subjected to amplitude enhancement according to the enhancement range. Wherein, the enhanced amplitude can be positively correlated with the volume of the environmental sound.
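A short sketch of this gating rule is given below. The decibel threshold and the linear mapping from ambient volume to gain are illustrative assumptions; the patent only requires that the enhancement amplitude be positively correlated with the ambient sound volume.

```python
def decide_enhancement(ambient_volume_db: float,
                       volume_threshold_db: float = 55.0,
                       base_gain: float = 1.2,
                       gain_per_db: float = 0.05,
                       max_gain: float = 3.0):
    """Return (should_enhance, voice_gain) for the received first audio data."""
    if ambient_volume_db <= volume_threshold_db:
        return False, 1.0                      # play the audio as received
    # Enhancement amplitude positively correlated with the sender's ambient volume.
    gain = base_gain + gain_per_db * (ambient_volume_db - volume_threshold_db)
    return True, min(gain, max_gain)
```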
Step S250: and playing the first audio data after the human sound enhancement processing.
In this embodiment of the application, after performing the voice enhancement processing on the received first audio data, the first electronic device may play the first audio data after the voice enhancement processing through the audio output unit, so that the user can hear the played audio.
In the audio processing method provided by the embodiment of the application, the first electronic device can control the opening of the voice enhancement processing according to the requirements of the user, and the first electronic device can receive the voiceprint information shared by the second electronic device, so that the voice enhancement processing can be performed on the audio data in the audio transmission at the sending end and the receiving end, the voice is enhanced doubly, the voice corresponding to the first voiceprint information can be highlighted, and the user can hear the voice of the other side conveniently.
Referring to fig. 7, fig. 7 is a flow chart illustrating an audio processing method according to another embodiment of the present application. The audio processing method is applied to the electronic device, and will be described in detail with reference to the flow shown in fig. 7, and the audio processing method may specifically include the following steps:
step S310: and receiving first audio data sent by the second electronic device in the audio transmission process, wherein the first audio data is obtained by performing human voice enhancement processing on the audio data acquired by the second electronic device in the audio transmission process according to the first voiceprint information.
Step S320: and performing human voice enhancement processing on the first audio data based on the stored first voiceprint information.
Step S330: and playing the first audio data after the human sound enhancement processing.
In the embodiment of the present application, steps S310 to S330 may refer to the contents of the foregoing embodiments, and are not described herein again.
Step S340: and sending the second voiceprint information to the second electronic equipment.
In this embodiment of the application, the first electronic device may also perform, during the audio transmission, human voice enhancement processing on the audio data it collects for transmission to the second electronic device. The first electronic device may send the second voiceprint information to the second electronic device, so that after receiving the audio data sent by the first electronic device, the second electronic device can perform human voice enhancement processing on it again according to the second voiceprint information.
In some embodiments, the first electronic device sends the second voiceprint information to the second electronic device in response to a sharing operation for sharing the voiceprint.
Optionally, the first electronic device displays a control interface of the audio transmission, where the control interface includes a second control for sharing voiceprints; and responding to the selection operation of the second control, and sending second voiceprint information to the second electronic equipment.
In this embodiment, the selection operation on the second control may be a click operation on the second control, a press operation meeting a specified press condition, a slide operation meeting a specified slide condition, or the like, and the specific selection operation may not be limited. Wherein specifying the compression condition may include at least one of: the pressing duration corresponding to the pressing operation is longer than the first duration; the pressing area corresponding to the pressing operation is larger than the designated area; and the pressing pressure value corresponding to the pressing operation is greater than the specified pressure value. The specified slip condition may include at least one of the following conditions: the sliding distance corresponding to the sliding operation is larger than a sliding distance threshold value; and the sliding track corresponding to the sliding operation meets the specified sliding track.
Referring again to fig. 6, upon detecting a selection operation on the control A2, more functions may be shown in the video call interface A1: camera switching, hands-free, voice conversion, voice enhancement, and personal voiceprint sharing. The control corresponding to voiceprint sharing is the second control A4; when a selection operation on the second control A4 is detected, the second voiceprint information is sent to the second electronic device in response to that operation.
Of course, the specific manner of controlling voiceprint sharing in the embodiment of the present application may not be limited.
In some embodiments, before the first electronic device shares the second voiceprint information with the second electronic device, the first electronic device may further determine whether preset voiceprint information exists in the first electronic device; if the preset voiceprint information exists, the preset voiceprint information is used as second voiceprint information; and if the preset voiceprint information does not exist, extracting the voiceprint characteristics according to the audio data collected by the first electronic equipment in the audio transmission to obtain second voiceprint information.
In one possible implementation, the first electronic device may extract voiceprint features from the acquired audio data using a pre-trained voiceprint recognition model. In the training stage of the voiceprint recognition model, voice samples are preprocessed, voiceprint features are extracted with a voiceprint feature extraction algorithm, and a neural network is then trained on the voice samples and their corresponding voiceprint features to obtain the voiceprint recognition model. Optionally, the voice samples may be drawn evenly across gender, age group, and geographic region, so as to improve the accuracy of the voiceprint recognition model.
When the voiceprint recognition model is used to extract voiceprint features from the collected audio data, the collected audio data may first be preprocessed to remove background noise and other interfering sounds, and the processed data is then input into the voiceprint recognition model; the recognized voiceprint features constitute the second voiceprint information.
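A minimal sketch of this extraction step is shown below. The `denoise` preprocessor and the `voiceprint_model` embedding network are hypothetical placeholders, not APIs defined by the patent; the patent only specifies preprocessing followed by recognition with a pre-trained model.

```python
import numpy as np

def extract_voiceprint(raw_audio: np.ndarray, denoise, voiceprint_model) -> np.ndarray:
    """Extract second voiceprint information (an embedding) from audio captured during the call."""
    # Preprocessing: normalize the amplitude before noise removal.
    audio = raw_audio / (np.max(np.abs(raw_audio)) + 1e-8)
    audio = denoise(audio)                    # remove background noise and other interference
    # The recognition model maps the cleaned speech to a fixed-length voiceprint vector.
    embedding = voiceprint_model(audio)
    return embedding / (np.linalg.norm(embedding) + 1e-8)   # unit-length voiceprint
```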
In a possible implementation manner, if there are a plurality of preset voiceprint information corresponding to the user in the first electronic device, the user may be prompted to select the preset voiceprint information according to the voice enhancement processing in the audio transmission, so that the selected preset voiceprint information is used as the second voiceprint information.
In a possible implementation manner, when the first electronic device responds to the selection operation for the second control, if preset voiceprint information does not exist, a prompt message may be output in the control interface to prompt the user to perform voiceprint extraction according to subsequently input voice.
For example, referring to fig. 8, in the video call scenario, a prompt box A5 may be output in the video call interface A1, with the prompt content: you have not entered a voiceprint before; extract a voiceprint from your subsequent speech? The prompt box A5 further includes a control for the user to confirm whether to perform voiceprint extraction, and when the user's confirmation of the voiceprint extraction operation is detected, voiceprint feature extraction can be performed on the collected audio data to obtain the second voiceprint information.
Further, if the user declines the voiceprint extraction, prompt information may be output after the video call ends to remind the user to enter a voiceprint. For example, referring to fig. 9, a prompt may be output in the video call interface A1: you have declined the current voiceprint extraction request and cannot share a voiceprint; please enter your personal voiceprint after the call ends. In addition, after the call ends, prompt information may be output to remind the user to enter a personal voiceprint; for example, referring to fig. 10, an unread message "Please enter a personal voiceprint" may be displayed after the call ends.
Optionally, when the first electronic device extracts voiceprint features from the acquired audio data, prompt information may be output after the voiceprint information is successfully extracted, so as to notify the user. For example, referring to fig. 11, a prompt message may be output in the video call interface A1: voiceprint recognition is complete and the voiceprint has been shared successfully; the other user can now receive clearer voice output.
In some embodiments, the first electronic device may obtain an ambient volume of an environment where the first electronic device is located, and output a prompt message to prompt the user to share the first voiceprint information when the ambient volume is greater than a set volume threshold, so that the second electronic device can perform the human voice enhancement processing on the received audio data again.
In some embodiments, the first electronic device and the second electronic device may perform human voice enhancement processing on the acquired audio data only when they have shared their voiceprint information with each other. In this embodiment, the first electronic device may obtain, from the second electronic device, the ambient sound volume of the environment in which the second electronic device is located; if that volume is greater than a preset threshold, it may send prompt information to the second electronic device, prompting it to share the first voiceprint information and to perform human voice enhancement processing on the audio data it acquires. Of course, when the first electronic device detects that its own ambient sound volume is greater than the preset threshold, it may likewise prompt the user to share the second voiceprint information and perform human voice enhancement processing on the acquired audio data once the second voiceprint information has been shared.
In this embodiment, the turning on and off of the vocal enhancement processing and the embodiment of voiceprint sharing may be combined. For example, in a video call scene, if the first electronic device does not start the voice enhancement processing and does not share the second voiceprint information, the first electronic device sends the originally acquired audio data to the second electronic device, and does not perform the voice enhancement processing on the audio data sent by the second electronic device; at this time, if the first electronic device receives the first voiceprint information shared by the second electronic device, the received audio data is not subjected to the human voice enhancement processing.
For another example, in a video call scene, if the first electronic device does not start the voice enhancement processing and shares the second voiceprint information, the first electronic device sends the audio data after the voice enhancement processing to the second electronic device, and does not perform the voice enhancement processing on the audio data sent by the second electronic device; at this time, if the first electronic device receives the first voiceprint information shared by the second electronic device, the received audio data is not subjected to the human voice enhancement processing.
For another example, in a video call scene, if the first electronic device starts the voice enhancement processing but does not share the second voiceprint information, the first electronic device sends the originally acquired audio data to the second electronic device, but performs the voice enhancement processing on the audio data sent by the second electronic device; at this time, if the first electronic device receives the first voiceprint information shared by the second electronic device, the received audio data may be subjected to voice enhancement processing; and if the first electronic equipment does not receive the first voiceprint information shared by the second electronic equipment, the received audio data is not subjected to voice enhancement processing.
For another example, in a video call scene, if the first electronic device starts the voice enhancement processing and shares the second voiceprint information, the first electronic device sends the audio data after the voice enhancement processing to the second electronic device, and can perform the voice enhancement processing on the audio data sent by the second electronic device; at this time, if the first electronic device receives the first voiceprint information shared by the second electronic device, the received audio data may be subjected to voice enhancement processing; and if the first electronic equipment does not receive the first voiceprint information shared by the second electronic equipment, the received audio data is not subjected to voice enhancement processing.
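The four video-call cases enumerated above reduce to a simple decision rule, sketched below. The field names are illustrative, but the behavior follows the cases described in the text: outgoing audio is pre-enhanced only when the device's own voiceprint has been shared, and incoming audio is re-enhanced only when enhancement is switched on and the peer's voiceprint has been received.

```python
from dataclasses import dataclass

@dataclass
class CallState:
    enhancement_enabled: bool        # first control turned on?
    shared_own_voiceprint: bool      # second voiceprint information shared?
    received_peer_voiceprint: bool   # first voiceprint information received?

def call_behaviour(state: CallState):
    """Return (enhance_outgoing, enhance_incoming) for the first electronic device."""
    enhance_outgoing = state.shared_own_voiceprint
    enhance_incoming = state.enhancement_enabled and state.received_peer_voiceprint
    return enhance_outgoing, enhance_incoming
```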
In some embodiments, after the audio transmission is finished, the first electronic device may delete the first voiceprint information shared by the second electronic device, and the second electronic device may also delete the second voiceprint information shared by the first electronic device, so as to prevent the user privacy from being revealed.
In some embodiments, if the application program used for the audio transmission is run for the first time, the first electronic device may output prompt information to inform the user that the application program provides the human voice enhancement function and to prompt the user to enter voiceprint information.
Step S350: and acquiring second audio data acquired by the first electronic equipment in the audio transmission process.
Step S360: and performing human voice enhancement processing on the second audio data based on the second voiceprint information.
In this embodiment of the application, after the first electronic device acquires the second audio data to be sent to the second electronic device, it may perform human voice enhancement processing on the second audio data based on the second voiceprint information. For details, reference may be made to the manner in which the first electronic device performs human voice enhancement processing on the first audio data in the foregoing embodiments, which is not repeated here.
Step S370: and sending the second audio data subjected to the human voice enhancement processing to the second electronic equipment, wherein the second electronic equipment is used for carrying out the human voice enhancement processing again on the audio data sent by the first electronic equipment according to the second voiceprint information.
In this embodiment of the application, after performing human voice enhancement processing on the collected audio data, the first electronic device sends the processed second audio data to the second electronic device, and the second electronic device can perform human voice enhancement processing on the second audio data again according to the second voiceprint information, so that its user can clearly hear the voice of the user of the first electronic device in the played audio.
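A minimal sketch of steps S350 to S370 on the sending side is given below, mirroring the receiver loop shown earlier. The `capture_frame`, `enhance_voice`, and `send_frame` callables are hypothetical placeholders for the capture, enhancement, and transport components.

```python
def sender_loop(connection, second_voiceprint, capture_frame, enhance_voice, send_frame):
    """Run the sending-side flow of Fig. 7 for the duration of the audio transmission."""
    while connection.is_active():
        # Step S350: acquire second audio data during the audio transmission.
        second_audio = capture_frame()
        if second_audio is None:
            break
        # Step S360: human voice enhancement based on the second voiceprint information.
        enhanced = enhance_voice(second_audio, second_voiceprint)
        # Step S370: send it; the second electronic device will enhance it again on receipt.
        send_frame(connection, enhanced)
```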
In some embodiments, when there are multiple second electronic devices performing audio transmission with the first electronic device, a user may want to record only the voice of the user of a particular electronic device; for example, during a multi-person video conference, only the speech of the main speaker should be recorded and the speech of the other participants should not. In this case, the voice audio of the specified user can be separated from the received audio data using that user's voiceprint information and recorded.
Optionally, since recording involves information security, if the user of a second electronic device wants to record the audio of the user of the first electronic device, prompt information may be output in the control interface for audio transmission to ask the user whether to allow the recording, and the second electronic device is allowed to record only if the user confirms that recording is allowed.
For example, in a multi-person video call, suppose user A of the first electronic device wants to record the voices of users B and C, but users B and C have not yet allowed recording through their respective second electronic devices; the first electronic device then sends a recording request to the second electronic devices of users B and C. If user B inputs an instruction allowing recording through his second electronic device and user C inputs an instruction refusing it, the first electronic device can record only the audio pushed by user B's device; if both users B and C allow the recording, the first electronic device may record the audio of users B and C separately, or record their combined audio.
Optionally, a third control for allowing others to record may be displayed in the control interface for audio transmission, and when an operation on the third control is detected, prompt information may be output asking the user whether to allow others to record. For example, referring to fig. 12 and fig. 13, after an operation on the third control A6 ("allow recording") is detected in the video call interface A1, the prompt content "All users will have the right to record your voice individually during the call. Continue?" may be displayed. If the user agrees, the devices of the other users are granted recording permission; otherwise, they are not.
Optionally, a fourth control for separate recording may be displayed in the control interface of the audio transmission, and when an operation on the fourth control is detected, a list of selectable users may be output for the user to choose whom to record. For example, referring to fig. 12 and fig. 14, in the video call interface A1, after an operation on the fourth control A7 "Separate recording" is detected, a prompt "Please select the users to be recorded" may be displayed in the video call interface A1, together with selectable user options. Further, referring to fig. 15, if the user selects multiple users to record, the user may be prompted in the video call interface A1 to select a recording mode, where the recording mode may include "separate recording" and "combined recording". Correspondingly, referring to fig. 16, the electronic device of each user selected for recording may receive the corresponding recording request and may output a prompt message in its video call interface A1 asking its user whether to allow the recording.
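A sketch of the two recording modes ("separate recording" and "combined recording") is given below, assuming the per-speaker tracks have already been separated into NumPy arrays; the naming scheme for the output files is an assumption.

```python
# Choose between one recording per selected user and a single combined recording.
import numpy as np

def record_tracks(tracks: dict[str, np.ndarray], mode: str = "separate") -> dict[str, np.ndarray]:
    if mode == "separate":
        # one recording per selected user
        return {f"{name}.wav": audio for name, audio in tracks.items()}
    # "combined": mix the selected users into a single recording
    mixed = np.mean(np.stack(list(tracks.values())), axis=0)
    return {"combined.wav": mixed}

tracks = {"B": np.random.uniform(-1, 1, 16000), "C": np.random.uniform(-1, 1, 16000)}
print(list(record_tracks(tracks, "separate")))   # ['B.wav', 'C.wav']
print(list(record_tracks(tracks, "combined")))   # ['combined.wav']
```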
According to the audio processing method provided in the embodiments of the application, because the first audio data undergoes human voice enhancement processing at both the sending end and the receiving end, the human voice is enhanced twice, so that the voice corresponding to the first voiceprint information is highlighted and the user can hear the other party more clearly. In addition, a separate-recording implementation is provided, which meets the need to record only the speech of selected users.
Referring to fig. 17, a block diagram of an audio processing apparatus 400 according to an embodiment of the present disclosure is shown. The audio processing apparatus 400 is applied to the above-mentioned first electronic device, which is connected to the second electronic device in the audio transmission. The audio processing apparatus 400 includes: an audio receiving module 410, a human voice enhancement module 420, and an audio playing module 430. The audio receiving module 410 is configured to receive first audio data sent by the second electronic device during the audio transmission, where the first audio data is obtained by the second electronic device performing human voice enhancement processing, according to first voiceprint information, on audio data collected during the audio transmission; the human voice enhancement module 420 is configured to perform human voice enhancement processing on the first audio data based on the stored first voiceprint information; and the audio playing module 430 is configured to play the first audio data after the human voice enhancement processing.
In some embodiments, the audio processing apparatus 400 further includes a human voice enhancement control module. The human voice enhancement control module is configured to: in response to a turn-on operation for the human voice enhancement processing, turn on the human voice enhancement processing so as to perform human voice enhancement processing on the received audio data.
In some embodiments, the human voice enhancement control module may be configured to: display a control interface of the audio transmission, where the control interface includes a first control for controlling turning on of the human voice enhancement processing; and in response to a turn-on operation on the first control, turn on the human voice enhancement processing so as to perform human voice enhancement processing on the received audio data.
In a possible implementation, the audio processing apparatus 400 further includes a prompt information receiving module and a prompt information display module. The prompt information receiving module is configured to receive prompt information sent by the second electronic device, where the prompt information is used to prompt the user to turn on human voice enhancement processing and is sent when the second electronic device detects that the volume of the ambient noise in its current environment is greater than a preset volume; the prompt information display module is configured to display the prompt information on the control interface.
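The noise-triggered prompt could be implemented roughly as follows on the sending device: compute the level of a captured ambient frame and, if it exceeds the preset volume, send the prompt to the peer. The dBFS threshold and function names are illustrative assumptions.

```python
# Detect a noisy environment and suggest that the peer enable human voice enhancement.
import numpy as np

PRESET_DBFS = -30.0   # assumed "preset volume" threshold

def ambient_level_dbfs(frame: np.ndarray) -> float:
    rms = np.sqrt(np.mean(np.square(frame))) + 1e-12
    return 20.0 * np.log10(rms)

def maybe_prompt_peer(frame: np.ndarray) -> bool:
    noisy = ambient_level_dbfs(frame) > PRESET_DBFS
    if noisy:
        print("send prompt: environment is noisy, suggest turning on voice enhancement")
    return noisy

maybe_prompt_peer(np.random.uniform(-0.5, 0.5, 16000))   # a loud frame triggers the prompt
```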
In some embodiments, the audio processing apparatus 400 further includes a voiceprint receiving module. The voiceprint receiving module is configured to receive the first voiceprint information sent by the second electronic device.
In some embodiments, the audio processing apparatus 400 further includes a voiceprint sending module, an audio acquisition module, and an audio sending module. The voiceprint sending module is configured to send second voiceprint information to the second electronic device; the audio acquisition module is configured to acquire second audio data collected by the first electronic device during the audio transmission; the human voice enhancement module 420 is configured to perform human voice enhancement processing on the second audio data based on the second voiceprint information; and the audio sending module is configured to send the second audio data after the human voice enhancement processing to the second electronic device, where the second electronic device is configured to perform human voice enhancement processing again on the audio data sent by the first electronic device according to the second voiceprint information.
In some embodiments, the voiceprint sending module may be configured to: in response to a sharing operation for sharing a voiceprint, send the second voiceprint information to the second electronic device.
In a possible implementation, the audio processing apparatus 400 may further include an interface display module. The interface display module is configured to display a control interface of the audio transmission, where the control interface includes a second control for sharing a voiceprint; and the voiceprint sending module may be configured to: in response to a selection operation on the second control, send the second voiceprint information to the second electronic device.
In a possible implementation, the audio processing apparatus 400 may further include a voiceprint judging module, a first determining module, and a second determining module. The voiceprint judging module is configured to judge whether preset voiceprint information exists in the first electronic device; the first determining module is configured to use the preset voiceprint information as the second voiceprint information if the preset voiceprint information exists; and the second determining module is configured to extract voiceprint features from the audio data collected by the first electronic device during the audio transmission to obtain the second voiceprint information if the preset voiceprint information does not exist.
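The logic of these three modules might be sketched as follows: use the preset voiceprint if one exists on the first electronic device, and otherwise extract a voiceprint from audio collected during the transmission. extract_voiceprint() is a crude stand-in for a real speaker-embedding model and is an assumption, not the patent's implementation.

```python
# Obtain the second voiceprint information: preset if available, else derived from call audio.
from typing import Optional
import numpy as np

def extract_voiceprint(audio: np.ndarray, dim: int = 128) -> np.ndarray:
    """Placeholder speaker embedding: a real system would use an x-vector/ECAPA-style model."""
    chunks = np.array_split(audio, dim)
    return np.array([np.sqrt(np.mean(np.square(c))) for c in chunks])

def get_second_voiceprint(preset: Optional[np.ndarray],
                          collected_audio: np.ndarray) -> np.ndarray:
    if preset is not None:                           # preset voiceprint exists on the first device
        return preset
    return extract_voiceprint(collected_audio)       # otherwise derive it from the collected audio

voiceprint = get_second_voiceprint(None, np.random.uniform(-1, 1, 16000))
```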
In some embodiments, the human voice enhancement module 420 may include an audio separation unit and a mixing unit. The audio separation unit is configured to separate, based on the first voiceprint information, the human voice audio corresponding to the first voiceprint information from the first audio data; the mixing unit is configured to mix the human voice audio with the first audio data to obtain the first audio data after the human voice enhancement processing.
In a possible implementation, the human voice enhancement module 420 may further include an amplitude enhancement unit and an amplitude attenuation unit. The amplitude enhancement unit is configured to perform amplitude enhancement on the human voice audio; the amplitude attenuation unit is configured to perform amplitude attenuation on the other audio remaining after the human voice audio is separated from the first audio data. The mixing unit may be configured to: mix the amplitude-enhanced human voice audio with the amplitude-attenuated other audio to obtain the first audio data after the human voice enhancement processing.
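Putting the separation, amplitude, and mixing units together, the receive-side enhancement might look like the sketch below. separate_voice() is a placeholder for a voiceprint-conditioned separator, and the gain values are illustrative assumptions rather than values from the patent.

```python
# Separate the target voice, boost it, attenuate the residual, then mix back together.
import numpy as np

def separate_voice(first_audio: np.ndarray, voiceprint: np.ndarray):
    """Placeholder: return (voice_audio, other_audio). A real separator would use
    `voiceprint` to isolate the matching speaker."""
    voice = first_audio * 0.5          # stand-in "voice" component
    other = first_audio - voice        # residual, i.e. the "other audio"
    return voice, other

def enhance_first_audio(first_audio: np.ndarray, voiceprint: np.ndarray,
                        voice_gain: float = 2.0, other_gain: float = 0.3) -> np.ndarray:
    voice, other = separate_voice(first_audio, voiceprint)
    mixed = voice * voice_gain + other * other_gain    # amplitude enhance / attenuate, then mix
    peak = max(np.max(np.abs(mixed)), 1.0)
    return mixed / peak                                # avoid clipping on playback

enhanced = enhance_first_audio(np.random.uniform(-1, 1, 16000), np.random.rand(128))
```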
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, the coupling between the modules may be electrical, mechanical or other type of coupling.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
To sum up, in the solution provided by the present application, the first electronic device receives first audio data sent by the second electronic device during the audio transmission, where the first audio data is obtained by the second electronic device performing human voice enhancement processing, according to first voiceprint information, on audio data collected during the audio transmission; the first electronic device then performs human voice enhancement processing on the first audio data based on the stored first voiceprint information and plays the processed first audio data. In this way, while the sending end of the audio transmission performs human voice enhancement processing on the collected audio data, the receiving end performs human voice enhancement processing on the received audio data again, so that the human voice is highlighted in the played audio and the user can clearly hear the voice of the user at the sending end.
Referring to fig. 18, a block diagram of an electronic device according to an embodiment of the present application is shown. The electronic device 100 may be any electronic device capable of running applications, such as a smartphone, a tablet computer, a smart watch, smart glasses, or a notebook computer. The electronic device 100 in the present application may include one or more of the following components: a processor 110, a memory 120, and one or more applications, where the one or more applications may be stored in the memory 120 and configured to be executed by the one or more processors 110, and the one or more applications are configured to perform the methods described in the foregoing method embodiments.
The processor 110 may include one or more processing cores. The processor 110 connects the various parts of the electronic device 100 using various interfaces and lines, and performs the various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and by calling data stored in the memory 120. Optionally, the processor 110 may be implemented in hardware in at least one of the forms of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 110 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 110 and may instead be implemented by a separate communication chip.
The memory 120 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 120 may be used to store instructions, programs, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the foregoing method embodiments, and the like. The data storage area may store data created by the electronic device 100 during use (such as a phone book, audio and video data, and chat log data).
Referring to fig. 19, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable storage medium 800 stores program code that can be called by a processor to perform the methods described in the foregoing method embodiments.
The computer-readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 800 includes a non-volatile computer-readable storage medium. The computer-readable storage medium 800 has storage space for program code 810 that performs any of the method steps described above. The program code can be read from or written into one or more computer program products. The program code 810 may, for example, be compressed in a suitable form.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not necessarily depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (14)

1. An audio processing method applied to a first electronic device in audio transmission, wherein the first electronic device is connected with a second electronic device in the audio transmission, and the method comprises the following steps:
receiving first audio data sent by the second electronic device in the audio transmission process, wherein the first audio data is obtained by performing human voice enhancement processing on the audio data acquired by the second electronic device in the audio transmission process according to first voiceprint information;
based on the stored first voiceprint information, carrying out human voice enhancement processing on the first audio data;
and playing the first audio data after the human sound enhancement processing.
2. The method of claim 1, wherein prior to the performing of the vocal enhancement processing on the first audio data based on the stored first voiceprint information, the method further comprises:
and responding to the opening operation of the human voice enhancement processing, and opening the human voice enhancement processing to perform human voice enhancement processing on the received audio data.
3. The method of claim 2, wherein the turning on the human voice enhancement processing for human voice enhancement processing of the received audio data in response to the turning on operation of the human voice enhancement processing comprises:
displaying a control interface of the audio transmission, wherein the control interface comprises a first control used for controlling the opening of the human voice enhancement processing;
and responding to the starting operation of the first control, starting the human voice enhancement processing so as to perform human voice enhancement processing on the received audio data.
4. The method of claim 2, wherein before the turning on the human voice enhancement processing in response to the turning on operation of the human voice enhancement processing to perform the human voice enhancement processing on the received audio data, the method further comprises:
receiving prompt information sent by the second electronic device, wherein the prompt information is used for prompting to start voice enhancement processing, and the prompt information is sent when the second electronic device detects that the volume of the environmental noise of the current environment is larger than a preset volume;
and displaying the prompt information on the control interface.
5. The method of claim 1, wherein prior to the performing of the vocal enhancement processing on the first audio data based on the stored first voiceprint information, the method further comprises:
and receiving the first voiceprint information sent by the second electronic equipment.
6. The method of claim 1, further comprising:
sending second voiceprint information to the second electronic device;
acquiring second audio data acquired by the first electronic equipment in the audio transmission process;
performing human voice enhancement processing on the second audio data based on the second voiceprint information;
and sending the second audio data subjected to the human voice enhancement processing to the second electronic equipment, wherein the second electronic equipment is used for carrying out the human voice enhancement processing again on the audio data sent by the first electronic equipment according to the second voiceprint information.
7. The method of claim 6, wherein sending the second voiceprint information to the second electronic device comprises:
and responding to the sharing operation for sharing the voiceprint, and sending second voiceprint information to the second electronic equipment.
8. The method of claim 7, wherein before the sending second voiceprint information to the second electronic device in response to the sharing operation for sharing the voiceprint, the method further comprises:
displaying a control interface of the audio transmission, wherein the control interface comprises a second control used for sharing voiceprints;
the responding to the sharing operation for sharing the voiceprint and sending the second voiceprint information to the second electronic equipment comprises the following steps:
and responding to the selection operation of the second control, and sending second voiceprint information to the second electronic equipment.
9. The method of claim 6, wherein prior to the sending the second voiceprint information to the second electronic device, the method further comprises:
judging whether preset voiceprint information exists in the first electronic equipment or not;
if the preset voiceprint information exists, taking the preset voiceprint information as second voiceprint information;
and if the preset voiceprint information does not exist, extracting the voiceprint characteristics according to the audio data collected by the first electronic equipment in the audio transmission to obtain second voiceprint information.
10. The method according to any one of claims 1-9, wherein performing the vocal enhancement processing on the first audio data based on the stored first voiceprint information comprises:
based on the first voiceprint information, separating the human voice audio corresponding to the first voiceprint information from the first audio data;
and mixing the voice audio and the first audio data to obtain the first audio data after the voice enhancement processing.
11. The method according to claim 10, wherein before the mixing the human voice audio with the first audio data to obtain the human voice enhanced processed first audio data, the method further comprises:
performing amplitude enhancement on the human voice audio;
carrying out amplitude attenuation on other audios after the human voice audio is separated from the first audio data;
the mixing the voice audio and the first audio data to obtain the first audio data after the voice enhancement processing includes:
and mixing the voice audio subjected to amplitude enhancement and the other audio subjected to amplitude attenuation to obtain first audio data subjected to voice enhancement processing.
12. An audio processing apparatus, applied to a first electronic device in audio transmission, the first electronic device being connected to a second electronic device in the audio transmission, the apparatus comprising: an audio receiving module, a human voice enhancing module and an audio playing module, wherein,
the audio receiving module is configured to receive first audio data sent by the second electronic device in the audio transmission process, where the first audio data is obtained by performing, by the second electronic device, human voice enhancement processing on audio data acquired in the audio transmission process according to first voiceprint information;
the voice enhancement module is used for performing voice enhancement processing on the first audio data based on the stored first voiceprint information;
the audio playing module is used for playing the first audio data after the human voice enhancement processing.
13. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the method of any of claims 1-11.
14. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 11.
CN202110547910.1A 2021-05-19 2021-05-19 Audio processing method, device, electronic equipment and storage medium Active CN113284500B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110547910.1A CN113284500B (en) 2021-05-19 2021-05-19 Audio processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113284500A true CN113284500A (en) 2021-08-20
CN113284500B CN113284500B (en) 2024-02-06

Family

ID=77280030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110547910.1A Active CN113284500B (en) 2021-05-19 2021-05-19 Audio processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113284500B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102664022A (en) * 2012-04-01 2012-09-12 北京风灵创景科技有限公司 Mobile terminal and method for optimizing conversation tone quality of mobile terminal
US20180069958A1 (en) * 2015-05-28 2018-03-08 Line Corporation Systems, non-transitory computer-readable media and methods for voice quality enhancement
CN106920559A (en) * 2017-03-02 2017-07-04 奇酷互联网络科技(深圳)有限公司 The optimization method of conversation voice, device and call terminal
US20190068773A1 (en) * 2017-08-31 2019-02-28 Uniti, Llc Enhanced calling systems and methods
CN108449507A (en) * 2018-03-12 2018-08-24 广东欧珀移动通信有限公司 Voice communication data processing method, device, storage medium and mobile terminal
CN108449506A (en) * 2018-03-12 2018-08-24 广东欧珀移动通信有限公司 Voice communication data processing method, device, storage medium and mobile terminal
CN109065066A (en) * 2018-09-29 2018-12-21 广东小天才科技有限公司 A kind of call control method, device and equipment
CN110970057A (en) * 2018-09-29 2020-04-07 华为技术有限公司 Sound processing method, device and equipment
WO2020211735A1 (en) * 2019-04-19 2020-10-22 华为技术有限公司 Method for using enhanced function of electronic device and related apparatus
CN111986689A (en) * 2020-07-30 2020-11-24 维沃移动通信有限公司 Audio playing method, audio playing device and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113726936A (en) * 2021-08-30 2021-11-30 联想(北京)有限公司 Audio data processing method and device
CN113726936B (en) * 2021-08-30 2023-10-24 联想(北京)有限公司 Audio data processing method and device
CN114466242A (en) * 2022-01-27 2022-05-10 海信视像科技股份有限公司 Display device and audio processing method

Also Published As

Publication number Publication date
CN113284500B (en) 2024-02-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant