CN114141260A - Audio processing method, device, equipment and storage medium - Google Patents


Info

Publication number
CN114141260A
CN114141260A (application number CN202111415444.8A)
Authority
CN
China
Prior art keywords
audio
processing
user
target
audio information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111415444.8A
Other languages
Chinese (zh)
Inventor
黄灵
郗恩延
赵偲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111415444.8A
Publication of CN114141260A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003: Changing voice quality, e.g. pitch or formants
    • G10L21/007: Changing voice quality, e.g. pitch or formants, characterised by the process used
    • G10L21/013: Adapting to target pitch

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure relates to an audio processing method, apparatus, device, and storage medium. Original audio information of a user may be collected in response to an input first user operation, and while the original audio is being collected it is subjected to quasi-real-time noise reduction and sound variation processing. This solves the problem in the related art that noise reduction and sound variation processing are performed only after the user finishes audio input, forcing the user to wait a long time. In addition, the processed audio signal is output as soon as it is obtained, so the user can experience the noise-reduced and sound-varied audio effect in real time.

Description

Audio processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to an audio processing method, apparatus, device, and storage medium.
Background
Currently, scenarios involving audio input include voice communication and the creation and editing of audio/video files. Voice communication scenarios include, but are not limited to, fixed-line telephony, mobile telephony, audio/video chat, voicemail, and webcast; creation and editing of audio/video files includes, but is not limited to, dubbing videos and recording audio files. In such scenarios, to meet user requirements, noise reduction processing is usually performed on the collected user audio to remove noise and improve signal quality, and sound variation processing may also be performed to change the timbre or pitch of the voice and make the audio input scenario more engaging.
In the related art, noise reduction, sound variation, and other processing are performed on the input audio information only after the user completes audio input, resulting in long waiting times that are detrimental to the user experience.
Disclosure of Invention
The present disclosure provides an audio processing method, apparatus, device, and storage medium, to at least solve the problem in the related art that performing noise reduction, sound variation, and other processing only after the user completes audio input results in long waiting times that are detrimental to the user experience. The technical solution of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided an audio processing method, including: in response to an input first user operation, acquiring original audio information of a user; in the process of acquiring the original audio information, performing preset processing on the acquired original audio information to obtain a first target audio signal, wherein the preset processing includes noise reduction processing according to preset noise reduction processing parameters and sound variation processing according to preset sound variation processing parameters; and outputting the first target audio signal to a target device, the target device including the device on the user side and/or other devices in communication with the device on the user side.
With reference to the first aspect, in a possible implementation manner of the first aspect, performing the preset processing on the acquired original audio information in the process of acquiring the original audio information includes: each time an audio information segment of a preset duration is acquired, performing the preset processing on the audio information segment to obtain a first target audio signal corresponding to the audio information segment; and outputting the first target audio signal to the target device includes: outputting the first target audio signal corresponding to each audio information segment to the target device as soon as it is obtained.
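The segment-wise, quasi-real-time pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the segment length, the `denoise` placeholder (a crude noise gate), and the `change_voice` placeholder (a plain gain standing in for real pitch/timbre processing) are all hypothetical.

```python
from typing import Iterable, Iterator, List

CHUNK_SAMPLES = 1024  # hypothetical "preset duration", expressed in samples


def denoise(chunk: List[float]) -> List[float]:
    """Placeholder noise reduction: a crude gate that zeroes very quiet samples."""
    return [0.0 if abs(s) < 0.01 else s for s in chunk]


def change_voice(chunk: List[float], pitch_factor: float) -> List[float]:
    """Placeholder sound variation: a plain gain, standing in for pitch/timbre work."""
    return [s * pitch_factor for s in chunk]


def process_stream(samples: Iterable[float], pitch_factor: float = 1.2) -> Iterator[List[float]]:
    """Collect fixed-size segments and emit each processed segment as soon as it
    is full, rather than waiting for the whole recording to finish."""
    buffer: List[float] = []
    for s in samples:
        buffer.append(s)
        if len(buffer) == CHUNK_SAMPLES:
            yield change_voice(denoise(buffer), pitch_factor)  # output immediately
            buffer = []
    if buffer:  # flush the final partial segment when capture stops
        yield change_voice(denoise(buffer), pitch_factor)
```

The point of the sketch is only the ordering: each segment is processed and handed to the output while later audio is still being captured, so the user-visible latency is bounded by the segment duration rather than the recording length.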
With reference to the first aspect, in a possible implementation manner of the first aspect, before acquiring the original audio information of the user in response to the input first user operation, the method further includes: displaying an audio acquisition interface, wherein the audio acquisition interface includes a sound variation control; in response to a user operation on the sound variation control, displaying at least one sound variation option, wherein different sound variation options correspond to different sound variation processing parameters, and the sound variation processing parameters include pitch parameters and/or timbre parameters; and in response to an input operation of selecting a first target sound variation option, determining the pitch parameter and/or timbre parameter corresponding to the first target sound variation option as the preset sound variation processing parameter, so as to process the pitch and/or timbre of the original audio information according to the pitch parameter and/or timbre parameter corresponding to the first target sound variation option.
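The mapping from sound variation options to preset processing parameters might look like the following sketch; the option names and parameter values are purely hypothetical, since the patent does not disclose concrete options or numbers.

```python
class VoiceChangeSettings:
    """Holds the preset sound variation parameters chosen from an option list.

    The options below are illustrative stand-ins, not disclosed values.
    """

    OPTIONS = {
        "uncle": {"pitch": 0.8, "timbre": "dark"},
        "loli": {"pitch": 1.5, "timbre": "bright"},
        "robot": {"pitch": 1.0, "timbre": "metallic"},
    }

    def __init__(self) -> None:
        # Default: leave the voice unchanged until the user picks an option.
        self.preset = {"pitch": 1.0, "timbre": "neutral"}

    def select(self, option: str) -> dict:
        """Selecting a target option fixes its pitch/timbre as the preset parameters."""
        self.preset = dict(self.OPTIONS[option])
        return self.preset
```

Selecting a different option at any time (including mid-capture, as the next implementation manner describes) simply replaces the preset, so segments acquired afterwards are processed with the new parameters.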
With reference to the first aspect, in a possible implementation manner of the first aspect, the method further includes: in the process of acquiring the original audio information, in response to an input operation of selecting a second target sound variation option, determining the pitch parameter and/or timbre parameter corresponding to the second target sound variation option as the preset sound variation processing parameter, so as to process, according to the pitch parameter and/or timbre parameter corresponding to the second target sound variation option, the pitch and/or timbre of the original audio information acquired after the user selects the second target sound variation option.
With reference to the first aspect, in a possible implementation manner of the first aspect, the audio acquisition interface includes a recording interface, a webcast interface, or an audio call interface; the first user operation includes a user operation input in the recording interface to trigger the start of recording, a user operation input in the webcast interface to trigger the start of a live broadcast, or a user operation input in the audio call interface to initiate or accept a call request.
With reference to the first aspect, in a possible implementation manner of the first aspect, the method further includes: in response to an input second user operation, stopping acquiring the original audio information of the user and displaying an audio processing interface, wherein the audio processing interface displays at least one sound variation option, different sound variation options correspond to different sound variation processing parameters, and the sound variation processing parameters include pitch parameters and/or timbre parameters; and in response to an input operation of selecting a third target sound variation option, performing sound variation processing on the first target audio signal or the original audio information according to the pitch parameter and/or timbre parameter corresponding to the third target sound variation option, to obtain a second target audio signal.
With reference to the first aspect, in a possible implementation manner of the first aspect, the method further includes: in response to the input first user operation, acquiring a noise reduction setting parameter, wherein the noise reduction setting parameter indicates whether the noise reduction function is on or off; if the noise reduction setting parameter indicates that the noise reduction function is on, performing the preset processing on the acquired original audio information in the process of acquiring the original audio information; and if the noise reduction setting parameter indicates that the noise reduction function is off, performing only the sound variation processing on the acquired original audio information in the process of acquiring the original audio information.
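The branching on the noise reduction setting parameter can be illustrated with a short sketch; `denoise` and `change_voice` here are caller-supplied placeholders, not the disclosed algorithms.

```python
from typing import Callable, List

Processor = Callable[[List[float]], List[float]]


def process_segment(
    segment: List[float],
    noise_reduction_on: bool,
    denoise: Processor,
    change_voice: Processor,
) -> List[float]:
    """Sound variation is always applied; noise reduction only when the
    noise reduction setting parameter indicates the function is on."""
    if noise_reduction_on:
        segment = denoise(segment)
    return change_voice(segment)
```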
With reference to the first aspect, in a possible implementation manner of the first aspect, a noise reduction control in an on state or an off state is further displayed on the audio acquisition interface, where the noise reduction control in the on state indicates that the noise reduction function is on, and the noise reduction control in the off state indicates that the noise reduction function is off; before acquiring the original audio information of the user in response to the input first user operation, the method further includes: in response to an input operation on the noise reduction control in the on state, displaying the noise reduction control in the off state and configuring the noise reduction setting parameter as a parameter indicating that the noise reduction function is off; or, in response to an input operation on the noise reduction control in the off state, displaying the noise reduction control in the on state and configuring the noise reduction setting parameter as a parameter indicating that the noise reduction function is on.
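The toggling behaviour of the noise reduction control might be modelled as in this small sketch; the class and attribute names are illustrative assumptions, not part of the disclosure.

```python
class NoiseReductionControl:
    """Models the on/off noise reduction control: each tap flips the displayed
    state and the stored noise reduction setting parameter together."""

    def __init__(self, on: bool = True) -> None:
        self.on = on  # True: control drawn in the on state, noise reduction enabled

    def tap(self) -> bool:
        """A user operation on the control flips its state and returns the new setting."""
        self.on = not self.on
        return self.on
```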
According to a second aspect of the embodiments of the present disclosure, there is provided an audio processing apparatus including an audio acquisition unit, an audio processing unit, and an audio output unit; the audio acquisition unit is used for responding to the input first user operation and acquiring original audio information of a user; the audio processing unit is used for performing preset processing on the acquired original audio information to obtain a first target audio signal in the process that the audio acquisition unit acquires the original audio information, wherein the preset processing comprises noise reduction processing performed according to preset noise reduction processing parameters and sound variation processing performed according to preset sound variation processing parameters; the audio output unit is configured to output the first target audio signal to a target device, where the target device includes the device on the user side and/or another device in communication with the device on the user side.
With reference to the second aspect, in a possible implementation manner of the second aspect, the audio processing unit is specifically configured to: each time an audio information segment of a preset duration is acquired, perform the preset processing on the audio information segment to obtain a first target audio signal corresponding to the audio information segment; and the audio output unit is specifically configured to output the first target audio signal corresponding to each audio information segment to the target device as soon as it is obtained.
With reference to the second aspect, in a possible implementation manner of the second aspect, the apparatus further includes an interface display unit and a parameter setting unit; the interface display unit is configured to display an audio acquisition interface, wherein the audio acquisition interface includes a sound variation control, and to display, in response to a user operation on the sound variation control, at least one sound variation option, wherein different sound variation options correspond to different sound variation processing parameters, and the sound variation processing parameters include pitch parameters and/or timbre parameters; the parameter setting unit is configured to, in response to an input operation of selecting a first target sound variation option, determine the pitch parameter and/or timbre parameter corresponding to the first target sound variation option as the preset sound variation processing parameter, so as to process the pitch and/or timbre of the original audio information according to the pitch parameter and/or timbre parameter corresponding to the first target sound variation option.
With reference to the second aspect, in a possible implementation manner of the second aspect, the parameter setting unit is further configured to: in the process of acquiring the original audio information, in response to an input operation of selecting a second target sound variation option, determine the pitch parameter and/or timbre parameter corresponding to the second target sound variation option as the preset sound variation processing parameter, so that the audio processing unit processes, according to the pitch parameter and/or timbre parameter corresponding to the second target sound variation option, the pitch and/or timbre of the original audio information acquired after the user selects the second target sound variation option.
With reference to the second aspect, in a possible implementation manner of the second aspect, the audio acquisition interface includes a recording interface, a webcast interface, or an audio call interface; the first user operation includes a user operation input in the recording interface to trigger the start of recording, a user operation input in the webcast interface to trigger the start of a live broadcast, or a user operation input in the audio call interface to initiate or accept a call request.
With reference to the second aspect, in a possible implementation manner of the second aspect, the audio acquisition unit is further configured to stop acquiring the original audio information of the user in response to an input second user operation; the interface display unit is further configured to display an audio processing interface in response to the input second user operation, wherein the audio processing interface displays at least one sound variation option, different sound variation options correspond to different sound variation processing parameters, and the sound variation processing parameters include pitch parameters and/or timbre parameters; the audio processing unit is further configured to, in response to an input operation of selecting a third target sound variation option, perform sound variation processing on the first target audio signal or the original audio information according to the pitch parameter and/or timbre parameter corresponding to the third target sound variation option, to obtain a second target audio signal.
With reference to the second aspect, in a possible implementation manner of the second aspect, the audio processing unit is specifically configured to: in response to the input first user operation, acquire a noise reduction setting parameter, wherein the noise reduction setting parameter indicates whether the noise reduction function is on or off; if the noise reduction setting parameter indicates that the noise reduction function is on, perform the preset processing on the acquired original audio information in the process of acquiring the original audio information; and if the noise reduction setting parameter indicates that the noise reduction function is off, perform only the sound variation processing on the acquired original audio information in the process of acquiring the original audio information.
With reference to the second aspect, in a possible implementation manner of the second aspect, a noise reduction control in an on state or an off state is further displayed on the audio acquisition interface, the noise reduction control in the on state indicating that the noise reduction function is on, and the noise reduction control in the off state indicating that the noise reduction function is off; the interface display unit is further configured to display the noise reduction control in the off state in response to an input operation on the noise reduction control in the on state, and to display the noise reduction control in the on state in response to an input operation on the noise reduction control in the off state; the parameter setting unit is further configured to configure the noise reduction setting parameter as a parameter indicating that the noise reduction function is off in response to the input operation on the noise reduction control in the on state, and to configure the noise reduction setting parameter as a parameter indicating that the noise reduction function is on in response to the input operation on the noise reduction control in the off state.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor, a memory for storing processor-executable instructions; wherein the processor is configured to execute the instructions to implement the audio processing method as provided by the first aspect and any one of its possible designs.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a server, enable the server to perform the audio processing method as provided by the first aspect and any one of its possible designs.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when run on a server, cause the server to perform the audio processing method as provided by the first aspect and any one of its possible designs.
The technical solution provided by the embodiments of the present disclosure brings at least the following beneficial effects: the audio processing method provided by the embodiments of the present disclosure can collect original audio information of a user in response to an input first user operation, and perform quasi-real-time noise reduction and sound variation processing on the original audio while it is being collected. This solves the problem in the related art that noise reduction and sound variation processing are performed only after the user finishes audio input, forcing the user to wait a long time. In addition, the processed audio signal is output as soon as it is obtained, so the user can experience the noise-reduced and sound-varied audio effect in real time.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a schematic diagram illustrating an audio interaction system architecture according to an exemplary embodiment;
FIG. 2 is a schematic diagram illustrating an electronic device according to an exemplary embodiment;
FIG. 3 is a flow diagram illustrating an audio processing method according to an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating an audio capture interface in accordance with an exemplary embodiment;
FIG. 5 is a schematic diagram of another audio capture interface shown in accordance with an exemplary embodiment;
FIG. 6 is a schematic diagram of another audio capture interface shown in accordance with an exemplary embodiment;
FIG. 7 is a schematic diagram of another audio capture interface shown in accordance with an exemplary embodiment;
FIG. 8 is a schematic diagram of another audio capture interface shown in accordance with an exemplary embodiment;
FIG. 9 illustrates an audio information processing flow in accordance with an exemplary embodiment;
FIG. 10 is a schematic diagram illustrating an audio processing interface in accordance with an exemplary embodiment;
FIG. 11 is a block diagram illustrating an audio processing device according to an example embodiment;
fig. 12 is a schematic diagram illustrating a server architecture according to an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In addition, in the description of the embodiments of the present disclosure, unless otherwise specified, "/" indicates "or"; for example, A/B may indicate A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, A and B exist simultaneously, or B exists alone. In addition, in the description of the embodiments of the present disclosure, "a plurality" means two or more.
The audio processing method provided by the embodiments of the present disclosure can be applied to an audio processing apparatus, which is configured to execute the method so that a user obtains a better listening experience. The audio processing apparatus may be included in a server, included in an electronic device, or be independent of both the server and the electronic device.
The audio processing method provided by the embodiment of the disclosure can also be applied to an audio interaction system, where the audio interaction system includes a server and at least one terminal device, and the terminal device is configured to execute the audio processing method provided by the embodiment of the disclosure. Fig. 1 is a schematic diagram of an audio interactive system, as shown in fig. 1, the audio interactive system 10 includes a terminal device 101 and a server 102, and the terminal device 101 can communicate with the server 102.
The terminal device 101 may be any computer device on which a content community application (such as Kuaishou or Kuaiying) can be installed and used, including but not limited to a mobile phone, a tablet computer, a desktop computer, a notebook computer, a vehicle-mounted terminal, a handheld terminal, an Augmented Reality (AR) device, a Virtual Reality (VR) device, and the like. The terminal device may perform human-computer interaction with a user through one or more of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction, or a handwriting device.
The server 102 may be a single server, a server cluster composed of a plurality of servers, or a cloud computing service center, which is not limited in this disclosure. The server 102 is mainly configured to store data related to the content community application installed on the terminal device 101, and may send corresponding data to the terminal device 101 when receiving a data acquisition request sent by the terminal device 101. For example, the terminal device 101 may request interface data of an application from the server 102 to display an application interface according to the interface data.
As shown in fig. 1, the terminal device 101 may interact with the server 102. For example, the terminal device 101 may acquire interface data of an application from the server 102 to display an application interface according to the interface data; the application interface is, for example, the audio acquisition interface or the audio processing interface described in the following embodiments. The terminal device 101 may also transmit target audio data corresponding to an obtained target audio signal to the server 102, or may complete audio output on its own side after obtaining the target audio signal. For example, after obtaining the target audio signal, the terminal device 101 sends it to a power amplifier and a speaker inside the device, so as to output the target audio signal through the power amplifier and the speaker.
For example, in a recording scenario, a user records sound through a recording application on an electronic device. During recording, the user can hear his or her own voice through an in-ear monitoring (ear-return) channel; this sound is the target audio signal output in real time after the electronic device processes the user's original audio information in real time.
In fig. 1, the audio interaction system further comprises at least one other terminal device 103. The terminal device 103 may communicate with the server 102 and also with the terminal device 101. The terminal devices 101 and 103 may implement data interaction through the server 102, and may also communicate through other manners, such as through a short-range wireless transmission technology, or through other network devices in the internet. After obtaining the target audio signal, the terminal device 101 may also output the target audio signal to another terminal device through the server 102, for example, the server 102 sends the target audio data to the terminal device 103, and the terminal device 103 completes outputting the target audio signal.
Illustratively, in a live-streaming scenario, the anchor broadcasts through a live application on the terminal device 101, and a viewer watches the stream through a live application on the terminal device 103. During the live broadcast, the terminal device 103 may receive and output the audio collected and processed by the terminal device 101.
The server 102 is configured to receive and store target audio data sent by the terminal device 101. And the device is also used for sending corresponding target audio data to the terminal device when receiving a request of the terminal device for certain target audio data.
The terminal device 103 is configured to receive the target audio data sent by the server 102, analyze the target audio data to obtain a target audio signal, and output the target audio signal through a power amplifier and a speaker which are built in or external to the device.
It is understood that the terminal device 101 and the server 102 may be provided independently or integrated in a single device, and the disclosure is not limited thereto.
Both the terminal device 101 and the server 102 can be implemented by the electronic device 20 shown in fig. 2. As shown in fig. 2, a schematic structural diagram of an electronic device to which the technical solution provided by the embodiment of the present disclosure is applied is shown. The electronic device 20 in fig. 2 includes but is not limited to: a processor 201, a memory 202, an input unit 203, an interface unit 204, a power supply 205, and a display unit 206.
In some embodiments, the electronic device 20 further includes an audio collector, an audio processor, and an audio output module. Wherein, the audio collector is used for collecting audio information, such as a voice receiver; an audio processor may be included in the processor 201 for processing the acquired audio information, such as noise reduction processing and sound change processing; the audio output module comprises a power amplifier and a loudspeaker and is used for outputting audio signals transmitted by the audio processor.
The processor 201 is a control center of the electronic device, connects various parts of the whole electronic device by using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 202 and calling data stored in the memory 202, thereby performing overall monitoring of the electronic device. Processor 201 may include one or more processing units; optionally, the processor 201 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 201.
The memory 202 may be used to store software programs as well as various data. The memory 202 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one functional unit, and the like. Further, the memory 202 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Alternatively, the memory 202 may be a non-transitory computer readable storage medium, for example, a read-only memory (ROM), a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The input unit 203 may be a keyboard, a touch screen, or the like.
The interface unit 204 is an interface for connecting an external device to the electronic device 20. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 204 may be used to receive input (e.g., data information) from an external device and transmit the received input to one or more elements within the electronic device 20, or may be used to transmit data between the electronic device 20 and the external device.
A power source 205 (e.g., a battery) may be used to power the various components. Optionally, the power source 205 may be logically connected to the processor 201 through a power management system, so as to implement functions of managing charging, discharging, power consumption management, and the like through the power management system.
The display unit 206 is used to display a User Interface (UI). The user interface may include graphics, text, icons, video, and any combination thereof. In the case where the display unit 206 is a display screen, the display unit 206 also has the ability to capture touch signals on or over the surface of the display unit 206. The touch signal may be input to the processor 201 as a control signal for processing. At this time, the display unit 206 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard.
In some embodiments, the display unit 206 may be disposed on the front panel of the terminal device 101; the display unit 206 may be made of a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), or the like. In the case where the electronic device 20 is a terminal device, the terminal device includes the display unit 206.
Alternatively, the computer instructions in the embodiments of the present disclosure may also be referred to as application program codes or systems, which are not specifically limited by the embodiments of the present disclosure.
It should be noted that the electronic device shown in fig. 2 is only an example, and does not limit the electronic devices to which the embodiments of the present disclosure are applicable. In actual implementation, the electronic device may include more or fewer components than those shown in fig. 2.
It should be noted that the audio processing method provided by the embodiment of the present disclosure may be applied to various scenes involving audio input, which are not limited to the recording scene and the live scene mentioned above and may also include, for example, voice communication scenes such as landline telephones, mobile phones, audio/video chat, and voicemail.
For real-time communication scenes such as webcast live streaming, mobile phone calls, and audio/video chat, the sound heard by the user at one or both ends of the communication link is an audio signal output by the terminal device based on the original audio information, that is, audio information that has undergone neither noise reduction nor specific sound-effect processing. Therefore, if the natural environment is noisy or the device hardware is limited, the user's listening experience will be poor, and user demands for interest and personalization, such as sound change requirements, cannot be met.
For a non-real-time interactive scene such as a recording scene, after the user finishes audio input, the user can process the complete input original audio information through an operation to generate a recording work with a specific audio effect. However, this audio processing is time-consuming, so the user inevitably has to wait a long time, which harms the user experience. In addition, with such post-processing the recording device cannot give real-time audio feedback; that is, during recording the user cannot obtain a real-time audio experience, which is uninteresting and also makes it inconvenient for the user to adjust the sound-effect settings in time.
In order to enable a user to obtain a better listening experience, enrich the interest of audio input scenes, and meet user requirements, the embodiment of the disclosure provides an audio processing method, which performs preset processing in real time on the original audio information of a user as it is being acquired to obtain a processed target audio signal, and outputs the target audio signal to a target device. Therefore, in real-time communication scenes such as webcast live streaming, mobile phone calls, and audio/video chat, users at one or both ends of the communication link can obtain a better listening experience; in non-real-time interactive scenes such as recording, the user can experience the specific audio effect in advance during recording, making it convenient to adjust the sound-effect settings in time according to satisfaction with the current effect, and after the audio input is finished a work file can be generated without waiting.
Fig. 3 is a flowchart illustrating an audio processing method according to an exemplary embodiment of the disclosure, and as shown in fig. 3, the method may include:
S301, in response to the input first user operation, acquiring original audio information of a user.
It should be noted that the original audio information of the user refers to the sound on the user side collected by the electronic device on the user side, and this sound may be a sound emitted by the user, an environmental sound, and/or environmental noise. In other words, the original audio information of the user does not necessarily contain the user's voice.
In some possible implementations, the user inputs the first user operation by voice interaction with the electronic device. That is, the first user operation may be a voice control instruction input by the user based on a voice control function of the electronic device. For example, the user first wakes up a voice application on the electronic device through a specific voice password, and after the voice application is started, the user may talk to the voice application and input a voice control instruction for instructing it to start collecting the original audio information. For example, the voice control instruction may be a control instruction for triggering the start of recording, a control instruction for triggering the start of live broadcasting, a control instruction for accepting a call request sent by another device, or a control instruction for initiating a call request to another device.
In other possible implementations, the user inputs the first user operation by interacting with the electronic device through an interface. That is, the first user operation may be an operation performed by the user on an interactive interface object of some application interface, such as a touch operation, a long-press operation, a pressure operation, and the like. For convenience of explanation, the application interface capable of receiving the first user operation is referred to as an audio capture interface. The audio capture interface includes a specific interactive interface object, and when the electronic device detects a user operation on this specific interactive object, it determines that the first user operation has been received.
The audio acquisition interface includes, but is not limited to, a recording interface provided by an application having a recording function, a network live broadcast interface provided by an application having a live broadcast function, and an audio/video call interface provided by an application having an audio/video call function.
In some implementations, the first user operation may be a user operation input in the recording interface to trigger the start of recording. Fig. 4 is a recording interface 40 exemplarily illustrated in the present disclosure, specifically an application interface for dubbing a video to be dubbed, such as a dubbing page of the "quick movie" application. As shown in fig. 4, the interface 40 includes a first display area 41, a second display area 42, and a third display area 43. The first display area 41 is used for displaying some function controls, such as a recording control 411 for triggering the start of recording and a completion control 412 for triggering the stop of recording. The second display area 42 is used for displaying a frame sequence 421 of the video to be dubbed and a recording time 422. Before recording starts, the first N frames (N is greater than or equal to 1) of the video to be dubbed are pre-loaded in the right area of the second display area 42; after recording starts, each frame in the frame sequence moves in turn from the right side to the left side of the second display area 42 as the recording duration increases, realizing a dynamic display of the frame sequence. The third display area 43 is used for displaying, in an enlarged manner, the frame corresponding to the current recording time; for example, when the recording time reaches 5.5 seconds, the frame with the timestamp of 5.5 seconds is displayed in the third display area 43.
When the recording interface shown in fig. 4 is displayed, a user may input a corresponding control instruction by clicking a function control to instruct the electronic device to implement a corresponding function. For example, the user may input a first user operation by clicking the recording control 411. And the electronic equipment responds to the user operation and starts to collect the original audio information of the user through the audio collector.
In some implementation cases, the first user operation may also be a user operation, input on the webcast interface, that triggers the start of a live broadcast. Fig. 5 is a network live broadcast interface exemplarily shown in an embodiment of the present disclosure, specifically a live preview interface displayed before a user starts a live broadcast, such as the live preview interface of the "quick-handed" application. As shown in fig. 5, the interface 50 includes a local preview screen 51 of the camera and a plurality of function controls floating on the preview screen 51, such as a toggle control 52 for switching the camera, a beautification control 53 for beautifying images, a sharing control 54 for sharing the live broadcast address, an extension control 55 for adjusting more functions, a control 56 for triggering the start of the live broadcast, and a plurality of live broadcast options, such as "live video broadcast", "live game broadcast", "chat room", and "live voice broadcast".
When the webcast interface shown in fig. 5 is displayed, a user may click a function control and input a corresponding control instruction to instruct the electronic device to implement a corresponding function. For example, the user may enter a first user action for triggering the start of a live broadcast by clicking on control 56. And the electronic equipment responds to the user operation, starts to acquire the original audio information of the user, continues to acquire the image information of the user through the camera and establishes a live channel with the server.
In addition, the first user operation may be a user operation that is input in the audio call interface and instructs to initiate or accept a call request. Fig. 6 is an audio call interface exemplarily shown in an embodiment of the present disclosure, which is specifically an interface displayed after an audio and video call application on an electronic device receives an audio call request sent by a friend device. As shown in fig. 6, the interface 60 includes buddy information 61 to send the request, a control 62 to deny the request, and a control 63 to accept the request.
While the audio call interface shown in fig. 6 is displayed, the user can input a first user operation for instructing acceptance of the call request by clicking the control 63. The electronic equipment responds to the user operation, establishes an audio call link with the friend equipment, and starts to collect the original audio information of the user through the audio collector.
S302, in the process of collecting the original audio information, the collected original audio information is subjected to preset processing to obtain a target audio signal.
The human ear distinguishes human voices along two dimensions: pitch and timbre. For example, the terms "high pitch", "middle pitch" and "low pitch" used in daily life refer to differences in pitch. When pitches are identical, one can still distinguish voices by timbre. Sound change processing changes the timbre and/or pitch of the audio according to specific parameters and algorithms, so that the output sound is perceptually different from the original sound. Illustratively, a male voice may be changed into a female voice, or a young child's voice into an old person's voice. In the embodiment of the present disclosure, the preset processing includes sound variation processing performed according to preset sound variation processing parameters. By performing quasi-real-time sound variation processing on the collected original audio during collection, the method solves the problem in the related art that sound variation processing is performed only after the user finishes audio input, causing a long user wait.
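To make the pitch dimension described above concrete, the following Python sketch shifts pitch by naive resampling. The function names and the 8 kHz sample rate are illustrative assumptions; real voice changers combine resampling with time-scale modification so that duration is preserved.

```python
import math

def generate_tone(freq_hz, duration_s, sample_rate=8000):
    """Generate a pure sine tone as a list of float samples."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def pitch_shift_resample(samples, ratio):
    """Shift pitch by resampling with linear interpolation.

    ratio > 1 raises the pitch (and, in this naive form, also
    shortens the signal, which time-scale modification would fix).
    """
    out_len = int(len(samples) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

Shifting a 440 Hz tone with a ratio of 2.0 doubles its zero-crossing rate, i.e. the perceived pitch rises by an octave while the sample count halves.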
The preset sound variation processing parameter may be a sound variation processing parameter preset by a user selecting a sound variation option displayed in an interface used for setting a sound variation function before the user triggers audio acquisition. At least one sound variation option is displayed in the interface, different sound variation options correspond to different sound variation processing parameters, and further, different sound variation options correspond to different sound variation styles.
In some embodiments, the interface for setting the sound-changing function is an audio capture interface, such as: the recording interface shown in fig. 4, the live network interface shown in fig. 5, and the audio chat interface shown in fig. 6.
In a more specific implementation, the audio capture interface includes a voicing control in addition to the interactive interface object for inputting the first user operation. The user can trigger the electronic equipment to display selectable sound variation options by operating the sound variation control, and then a desired sound variation option is selected according to the sound variation style shown by each displayed sound variation option, so that the sound variation processing parameter setting operation is completed. Through the design of the front-end interactive interface, a user can select a certain sound variation option before triggering and collecting audio, namely, the sound variation function is preset. On one hand, the equipment can obtain the sound changing processing parameters in advance according to the preset sound changing function of the user, so that the collected original audio can be changed in real time in the process of collecting the original audio. On the other hand, the sound variation function and the sound variation effect generated by each sound variation option can be displayed to the user in advance instead of being displayed to the user after the audio input is finished, so that the interest of the user is increased.
Taking the sound recording interface shown in fig. 4 as an example, the first display area 41 includes a sound changing control 413 in addition to the sound recording control 411. When the user operates the voicing control 413, the electronic device displays at least one voicing option in the interface 40 in response to the user operation. Fig. 7 is another sound recording interface exemplarily illustrated in the embodiment of the present disclosure, specifically the interface displayed after the user clicks the sound change control 413 in the interface 40. As shown in fig. 7, the interface 70 displays a plurality of voicing options, each option name indicating a voicing style, such as: "Luoli", "uncle", "lovely", "yellow man", etc. The "Luoli" option is in the selected state, which means that the sound variation processing parameter currently preset by the user is the one corresponding to the "Luoli" option. In addition, an "original sound" option is included. It should be understood that when the user selects the "original sound" option, the sound change function is turned off, and the electronic device will not change the sound of the original audio information after acquiring it.
In the embodiment of the disclosure, in response to receiving an operation in which the user selects a sound variation option, that option is displayed in the selected state, and the other sound variation options remain in the unselected state. For example, the "Luoli" option in fig. 7 is shown in the selected state, while the other voicing options are shown in the unselected state. It should be understood that the "selected state" and "unselected state" referred to in the embodiments of the present disclosure are two different display states of an interface object, whose purpose is to provide a visual prompt so that the user understands the currently set sound variation style. It should be noted that the embodiments of the present disclosure do not limit the display styles corresponding to the "selected state" and the "unselected state".
In some embodiments, during the process of acquiring the original audio, the user may modify the previously set sound variation parameters by inputting an operation to select another sound variation option. For the sake of distinction, the sound variation option selected by the user before triggering audio acquisition is referred to as a first target sound variation option, and the sound variation option selected by the user during audio acquisition is referred to as a second target sound variation option. Based on this, the audio processing method provided by the embodiment of the present disclosure further includes: in the process of acquiring the original audio information, in response to an input operation of selecting the second target sound variation option, determining the sound variation processing parameter corresponding to the second target sound variation option as the preset sound variation processing parameter, so as to perform sound variation processing on the subsequently acquired original audio information according to that parameter. The subsequently acquired original audio information refers to the original audio information acquired after the user selects the second target sound variation option. Thus, if the user selected a sound variation option (the first target sound variation option) before starting to collect the original audio, the user can switch to another sound variation option (the second target sound variation option) while the original audio is being collected, and can therefore change the option at any time according to satisfaction with the actual sound variation effect each option produces.
If the user does not start the sound change function before the original audio is collected, the user can operate and select a sound change option in the process of collecting the original audio so as to experience a real-time sound change effect.
Following the example above, the first target voicing option selected by the user prior to triggering audio capture is "Luoli", and the user may select "uncle" during audio capture. At this time, the electronic device performs sound variation processing on the subsequently acquired original audio information according to the sound variation processing parameters corresponding to the "uncle" option.
In some embodiments, the original audio information is backed up after the original audio information of the user is collected. For example, a copy of the original audio information is saved in a cache. If the operation that the user selects the second target sound change option is received in the audio acquisition process, on one hand, sound change processing is carried out on subsequently acquired original audio information according to the sound change processing parameters corresponding to the second target sound change option, and on the other hand, sound change processing is carried out on previously acquired original audio information which is backed up in the cache according to the sound change processing parameters corresponding to the second target sound change option. Therefore, if the user changes the sound variation processing parameters in the recording process, the target audio file can be generated based on the sound variation processing parameters corresponding to the sound variation option selected by the user last, and the sound variation effects of all the audio sections in the target audio file are ensured to be consistent.
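The backup-and-reprocess behavior described above can be sketched as follows. `VoiceChangeSession` and its tag-only `_process` stand-in are hypothetical names invented for illustration; the real voice-change algorithm is omitted.

```python
class VoiceChangeSession:
    """Backs up raw segments so a mid-recording style switch can be
    applied retroactively, keeping the final file's effect uniform."""

    def __init__(self, style):
        self.style = style
        self.raw_cache = []   # backup copy of every captured segment
        self.processed = []   # sound-varied segments

    def capture(self, segment):
        self.raw_cache.append(segment)
        self.processed.append(self._process(segment))

    def change_style(self, new_style):
        # Re-run the (stand-in) voice changer over the backed-up audio
        # so every segment reflects the most recently chosen option.
        self.style = new_style
        self.processed = [self._process(seg) for seg in self.raw_cache]

    def _process(self, segment):
        # Stand-in for the real algorithm: tag the data with the style.
        return (self.style, segment)
```

Switching from "Luoli" to "uncle" mid-capture leaves every processed segment tagged with "uncle", mirroring the consistency guarantee described above.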
In the disclosed embodiment, the sound variation processing parameters include a pitch parameter for changing the pitch of the original audio and/or a timbre parameter for changing the timbre of the original audio. The pitch and/or timbre of the original audio signal can be changed by processing the original audio signal based on a preset sound variation algorithm together with the pitch parameter and/or timbre parameter. Based on the prior art, it is clear to a person skilled in the art how to select a suitable sound variation algorithm and how to design, from the algorithm and the sound variation processing parameters, a voice changer that performs sound variation processing on the original audio information. For example, the pitch of the original audio signal may be changed using a time-domain method, a frequency-domain method, or a parametric method, and the timbre may be changed by a spectral-envelope shifting and shaping algorithm. The disclosed embodiments are not limited in this regard.
In some embodiments, in the process of acquiring the original audio information, the preset processing performed on the acquired original audio information further includes noise reduction processing performed according to preset noise reduction processing parameters. In the process of collecting the original audio, the collected original audio is subjected to quasi-real-time noise reduction processing, and the problem that in the related art, after a user finishes audio input, the noise reduction processing is carried out, so that the waiting time of the user is long is solved.
In specific implementation, a filter is configured according to predetermined filter parameters required for reducing or eliminating noise signals; in the process of collecting the original audio, a pre-configured filter is utilized to carry out real-time filtering processing on a noise signal contained in the original audio. It should be noted that, it is clear for those skilled in the art how to recognize the noise signal in the original audio signal and how to configure the filter parameters according to the prior art, and details of this disclosure are omitted.
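As one hedged example of such a pre-configured filter, the single-parameter recursive low-pass below attenuates rapidly alternating (high-frequency) noise while passing slowly varying signal. This is only a sketch; production noise reduction would derive the filter coefficients from the identified noise characteristics, e.g. via spectral subtraction or adaptive filtering.

```python
def one_pole_lowpass(samples, alpha=0.2):
    """Minimal recursive low-pass: y[n] = alpha*x[n] + (1-alpha)*y[n-1].

    Smaller alpha filters more aggressively. The filter "configuration"
    here is just the choice of alpha, standing in for the predetermined
    filter parameters mentioned above.
    """
    out = []
    y = 0.0
    for x in samples:
        y = alpha * x + (1 - alpha) * y
        out.append(y)
    return out
```

Fed a sample-rate-frequency noise pattern (alternating plus or minus one), the output settles near zero, while a constant (low-frequency) input passes through essentially unchanged.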
In some embodiments, the user may turn the noise reduction function on or off in the interface for setting the noise reduction function before triggering audio acquisition. If the user starts the noise reduction function, the electronic device performs noise reduction processing on the original audio information according to preset noise reduction processing parameters before performing sound change processing on the acquired original audio information. If the user turns off the noise reduction function, the electronic device will not perform noise reduction processing on the original audio information.
In some embodiments, the interface for turning on or off the noise reduction function is an audio capture interface, such as: the recording interface shown in fig. 4, the live network interface shown in fig. 5, and the audio chat interface shown in fig. 6.
In a more specific implementation manner, the audio capture interface further includes a noise reduction control, i.e., a noise reduction function switch, in addition to the interactive interface object for inputting the first user operation and the sound variation control. The noise reduction control has two display states: an on state and an off state. The noise reduction control in the on state indicates that the noise reduction function is on, and the noise reduction control in the off state indicates that it is off, so the user can tell from the display state of the control whether the noise reduction function is enabled. The noise reduction setting parameter stored in the operating system of the electronic device differs according to whether the noise reduction function is on or off, and the electronic device can determine the on/off state of the function based on that parameter. Through this design of the front-end interactive interface, the user can turn the noise reduction function on or off before triggering audio collection, that is, the noise reduction function is preset. On the one hand, the device can determine from the user's advance setting whether noise reduction needs to be performed simultaneously with collection. On the other hand, the noise reduction function can be presented to the user in advance rather than after the audio input is completed, thereby increasing the user's interest.
Referring to fig. 4, the first display area 41 of the recording interface includes a noise reduction control 414 in addition to the recording control 411 and the sound changing control 413. In fig. 4, the noise reduction control 414 is shown in the off state, indicating that the noise reduction function is off at this time. When the audio capture interface shown in fig. 4 is displayed, the user may turn on the noise reduction function by operating the noise reduction control in the off state. In response to the input operation on the noise reduction control in the off state, the electronic device displays the noise reduction control in the on state and configures the noise reduction setting parameter as a parameter indicating that the noise reduction function is turned on.
Fig. 8 is another sound recording interface exemplarily illustrated in the embodiment of the present disclosure; unlike the sound recording interface illustrated in fig. 4, in fig. 8 the noise reduction control 414 is displayed in the on state. When the audio capture interface shown in fig. 8 is displayed, the user may turn off the noise reduction function by operating the noise reduction control in the on state. In response to the input operation on the noise reduction control in the on state, the electronic device displays the noise reduction control in the off state and configures the noise reduction setting parameter as a parameter indicating that the noise reduction function is turned off.
Based on this, in some embodiments, the audio processing method provided by the embodiments of the present disclosure further includes: in response to the input first user operation, acquiring the noise reduction setting parameter; and if the noise reduction setting parameter indicates that the noise reduction function is on, performing noise reduction processing on the original audio information according to the preset noise reduction processing parameters before performing sound variation processing on the acquired original audio information.
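The conditional chain just described (check the noise reduction setting parameter, denoise if enabled, then apply sound variation) might look like the sketch below. The two `apply_*` helpers and the `settings` keys are assumptions for illustration, not the actual implementation.

```python
def apply_noise_reduction(segment, params):
    """Placeholder: real code would filter per the configured parameters."""
    return [x * params.get("gain", 1.0) for x in segment]

def apply_voice_change(segment, option):
    """Placeholder: real code would run the sound variation algorithm."""
    return {"style": option, "data": segment}

def preset_process(segment, settings):
    """Mirror the order in step S302: denoise first (if enabled),
    then perform sound variation unless the 'original sound' option
    is selected."""
    if settings.get("noise_reduction"):
        segment = apply_noise_reduction(segment, settings.get("nr_params", {}))
    option = settings.get("voicing_option")
    if option and option != "original":
        segment = apply_voice_change(segment, option)
    return segment
```

With noise reduction off and the "original" option selected, the segment passes through untouched, matching the behavior described for the disabled functions.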
In the specific implementation of step S302, each time an audio information segment of a preset duration is collected, that segment is subjected to the preset processing to obtain the target audio signal corresponding to it, and each time a segment has been processed, an audio signal is output based on the processed segment. Such segmented processing requires fewer processing resources and a shorter processing time, so each segment can be processed quickly and output in time, achieving the technical effect of collecting, processing, and outputting simultaneously and allowing the user to experience the processed audio effect in real time.
Fig. 9 is a schematic diagram illustrating an audio processing flow according to an exemplary embodiment of the disclosure. As shown in fig. 9, the electronic device 20 continuously collects the original audio information of the user, that is, it continuously collects audio information segment 1, audio information segment 2, ..., audio information segment N, and so on, with each segment having the same duration. Each time an audio information segment is collected, sound variation processing is performed on it; before the sound variation processing of a given segment, noise reduction processing may also be performed on that segment. The target audio signals corresponding to the respective audio information segments are thereby obtained.
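The segmented collect-process-output flow of fig. 9 can be sketched as below. The 160-sample segment length (20 ms at an assumed 8 kHz sample rate) and the function names are illustrative assumptions.

```python
SEGMENT_SAMPLES = 160  # e.g. 20 ms of audio at an 8 kHz sample rate

def segment_stream(samples, seg_len=SEGMENT_SAMPLES):
    """Split a continuous capture into fixed-length segments
    (the final segment may be shorter)."""
    for start in range(0, len(samples), seg_len):
        yield samples[start:start + seg_len]

def run_pipeline(samples, process, emit):
    """Emit each processed segment as soon as it is available, so
    output lags capture by only one segment rather than by the
    whole recording."""
    for seg in segment_stream(samples):
        emit(process(seg))
```

Here `process` stands for the preset processing chain (noise reduction then sound variation) and `emit` for delivery to the target device's output path.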
S303, outputting the target audio signal to a target device, where the target device includes a device on the user side and/or another device in communication with the device on the user side.
Illustratively, the device on the user side may be the terminal device 101 in the audio interactive system shown in fig. 1, and the other device in communication with the device on the user side may be the terminal device 103 in the audio interactive system shown in fig. 1.
In some application scenarios, after the terminal device 101 obtains the target audio signal, the audio signal output is completed on the terminal device 101 side. Specifically, after the target audio signal is obtained, the target audio signal is output through a built-in or external power amplifier and a loudspeaker. For example, in a recording scene, a user records sound through a recording application on the terminal device 101, and during the recording process, the user can hear his/her own sound through the ear return channel, where the sound is a target audio signal output in real time after the terminal device 101 performs real-time processing on the original audio information of the user.
In other application scenarios, after the terminal device 101 obtains the target audio signal, it sends the corresponding target audio data to the server 102, and the server 102 forwards the target audio data to the terminal device 103, so that the output of the target audio signal is completed on the terminal device 103. The terminal device 103 is configured to receive the target audio data sent by the server 102, parse the target audio data to obtain the target audio signal, and output the target audio signal through a power amplifier and a loudspeaker that are built into or external to the device. For example, in a live streaming scene, the anchor conducts a webcast through a live application on the terminal device 101, and a viewer watches the stream through a live application on the terminal device 103. During the live broadcast, the audio signal output by the terminal device 103 is the target audio signal obtained by the terminal device 101.
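The relay path in this scenario (terminal 101 uploads the processed audio, server 102 forwards it, terminal 103 receives and plays it) can be modeled minimally with in-memory queues standing in for the two network links. The three function names and the byte-level payload format are assumptions for illustration only.

```python
import queue

# In-memory queues stand in for the network links in this sketch.
uplink = queue.Queue()    # terminal 101 -> server 102
downlink = queue.Queue()  # server 102 -> terminal 103

def anchor_terminal_send(target_audio_bytes: bytes) -> None:
    """Terminal 101: upload the processed target audio data."""
    uplink.put(target_audio_bytes)

def server_forward() -> None:
    """Server 102: relay the target audio data to the viewer's terminal."""
    downlink.put(uplink.get())

def viewer_terminal_receive() -> bytes:
    """Terminal 103: receive the target audio data for parsing and playback."""
    return downlink.get()
```

A real deployment would use a streaming protocol (e.g. RTMP or WebRTC) rather than queues, but the forwarding topology is the same.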
Referring to fig. 9, in a specific implementation of step S303, each time the target audio signal corresponding to an audio information segment is obtained, it is output to the target device.
Based on the above embodiments, it can be seen that the audio processing method provided by the embodiments of the present disclosure performs preset processing on the original audio information in real time while it is being collected, obtains a processed target audio signal, and outputs that signal to the target device. This solves the problem in the related art of long user waiting times caused by performing sound changing processing only after the user has finished audio input. In addition, because the output audio signal is produced from the processed audio information, the user can experience the processed audio effect in real time.
Therefore, in real-time communication scenarios such as live webcasts, phone calls, and audio or video chats, users at one or both ends of the communication link obtain a better listening experience. In non-real-time interaction scenarios such as recording, the user can preview the audio effect during the recording process, which makes it convenient to adjust the sound effect settings in time according to satisfaction with the current effect, and no waiting is required to generate the work file after audio input is finished.
In some embodiments, the user may perform a second voicing of the captured audio information after completing the audio input. Specifically, the audio processing method provided by the embodiment of the present disclosure further includes:
S304, in response to an input second user operation, stopping collecting the original audio information of the user, and displaying an audio processing interface, where the audio processing interface displays at least one sound variation option, and different sound variation options correspond to different sound variation processing parameters.
S305, in response to an input operation of selecting a third target sound variation option, performing sound variation processing on the target audio signal or the original audio information according to the tone parameter and/or tone color parameter corresponding to the third target sound variation option, to obtain a new target audio signal.
In some possible implementations, the user inputs the second user operation by voice interaction with the electronic device. That is, the second user operation may be a voice control instruction input by the user based on the voice control function of the electronic device. For example, the voice control instruction may be a control instruction for triggering the end of recording, or a control instruction for triggering the end of live broadcasting.
In other possible implementations, the user may input the second user operation by interacting with the electronic device through an interface. Taking the recording interface shown in fig. 4 as an example, the user may click the completion control 412 to input the second user operation.
Fig. 10 is an exemplary audio processing interface according to an embodiment of the present disclosure, specifically the interface displayed after the user clicks the completion control 412 in fig. 4. Referring to fig. 10, the audio processing interface 100 includes a generation control 1001 for triggering generation of a target file and a plurality of sound change options, such as: "Luoli", "uncle", "lovely", "yellow man", etc. At this time, the user may select the third target sound variation option and, by operating the generation control 1001, instruct the electronic device to process the target audio signal obtained in S302, or the original audio information backed up in the cache, according to the sound variation processing parameters corresponding to the third target sound variation option, so as to obtain a new target audio signal. For convenience of explanation, in some embodiments of the present disclosure, the target audio signal obtained in S302 is referred to as a first target audio signal, and the target audio signal obtained in S305 is referred to as a second target audio signal.
In other embodiments, after the second user operation is received, the electronic device stays in the audio capture interface without jumping to the audio processing interface. In these embodiments, the user may select the third target voicing option from the voicing options presented in the audio capture interface to perform the second voicing processing on the audio information.
As can be seen from S304 and S305 above, regardless of whether the user selected a sound variation option before or during the original audio acquisition, after the acquisition is finished the user may either change the previously selected sound variation option or select a sound variation option for the first time, in order to obtain an audio signal subjected to sound variation processing.
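The second voicing of S304/S305 amounts to re-applying a chosen option's pitch (and/or timbre) parameters to already-captured audio. A minimal sketch follows; the option names, the parameter values, and the naive resampling pitch shift are all assumptions, since the patent does not disclose a concrete sound-changing algorithm.

```python
import numpy as np

# Hypothetical mapping from sound variation options to processing parameters;
# the patent does not disclose concrete values.
VOICING_OPTIONS = {
    "loli":  {"pitch_factor": 1.6},   # higher pitch
    "uncle": {"pitch_factor": 0.7},   # lower pitch
}

def apply_voicing(audio: np.ndarray, option: str) -> np.ndarray:
    """Re-apply voice changing to already-captured audio (the second voicing).

    Naive pitch shift by resampling, which also changes duration; production
    systems would preserve duration (e.g. phase vocoder or PSOLA).
    """
    factor = VOICING_OPTIONS[option]["pitch_factor"]
    idx = np.arange(0, len(audio), factor)
    return np.interp(idx, np.arange(len(audio)), audio)
```

Consistent with S305, the input to `apply_voicing` could be either the first target audio signal or the cached original audio information.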
The audio processing method provided by the embodiments of the present disclosure further includes generating a target audio file based on the audio information subjected to the preset processing, storing the target audio file in the memory, and/or uploading the target audio file to the server. It should be understood that the processes of generating, storing, and uploading the target audio file by the electronic device may each be triggered by a specific user operation. For example, the user may trigger them by operating a particular function control on the audio capture interface.
The foregoing describes the scheme provided by the embodiments of the present disclosure, primarily from a methodological perspective. To implement the above functions, an embodiment of the present disclosure further provides an audio processing apparatus, which includes the hardware structures and/or software modules/units corresponding to each function. Those skilled in the art will readily appreciate that the various exemplary method steps described in connection with the embodiments disclosed herein can be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
Fig. 11 is a block diagram illustrating the structure of an audio processing apparatus according to an exemplary embodiment. As shown in fig. 11, the audio processing apparatus may include an audio acquisition unit 111, an audio processing unit 112, and an audio output unit 113. The audio acquisition unit 111 is configured to collect original audio information of the user in response to an input first user operation; for example, the audio acquisition unit 111 is configured to perform S301 in the embodiment shown in fig. 3. The audio processing unit 112 is configured to perform preset processing on the original audio information while it is being collected by the audio acquisition unit, to obtain a first target audio signal, where the preset processing includes sound change processing performed according to preset sound change processing parameters; for example, the audio processing unit 112 is configured to execute S302 in the embodiment shown in fig. 3. The audio output unit 113 is configured to output the first target audio signal to a target device, where the target device includes the device on the user side and/or another device in communication with the device on the user side; for example, the audio output unit 113 is configured to perform S303 in the embodiment shown in fig. 3.
In some embodiments, the preset processing further includes noise reduction processing according to preset noise reduction processing parameters.
In some embodiments, the audio processing unit 112 is specifically configured to: when an audio information segment with preset duration is acquired, carrying out preset processing on the audio information segment to obtain a first target audio signal corresponding to the audio information segment; the audio output unit 113 is specifically configured to output the first target audio signal corresponding to the audio information segment to a target device after each first target audio signal corresponding to the audio information segment is obtained.
In some embodiments, the audio processing apparatus further includes an interface display unit 114 and a parameter setting unit 115; an interface display unit 114, configured to display an audio acquisition interface, where the audio acquisition interface includes a sound change control; responding to the operation of a user on the sound variation control, and displaying at least one sound variation option, wherein different sound variation options correspond to different sound variation processing parameters, and the sound variation processing parameters comprise tone parameters and/or tone color parameters; the parameter setting unit 115 is configured to, in response to an input operation of selecting a first target variant option, determine the pitch parameter and/or the timbre parameter corresponding to the first target variant option as the preset variant sound processing parameter, so as to process the pitch and/or the timbre of the original audio information according to the pitch parameter and/or the timbre parameter corresponding to the first target variant option.
In some embodiments, the parameter setting unit 115 is further configured to: in the process of collecting the original audio information, in response to an input operation of selecting a second target variant sound option, determining a tone parameter and/or a tone parameter corresponding to the second target variant sound option as the preset variant sound processing parameter, so that the audio processing unit processes the tone and/or the tone of the original audio information collected after the user selects the second target variant sound option according to the tone parameter and/or the tone parameter corresponding to the second target variant sound option.
In some embodiments, the audio acquisition interface comprises a recording interface, a live network interface, and an audio call interface; the first user operation comprises user operation which is input in the recording interface and triggers to start recording, user operation which is input in the network live broadcast interface and triggers to start live broadcast, and user operation which is input in the audio call interface and indicates to initiate or accept a call request.
In some embodiments, the audio capturing unit 111 is further configured to stop capturing the original audio information of the user in response to the input of the second user operation; the interface display unit 114 is further configured to display an audio processing interface in response to the input second user operation, where the audio processing interface displays at least one voicing option, and different voicing options correspond to different voicing parameters, and the voicing parameters include a pitch parameter and/or a tone color parameter; the audio processing unit 112 is further configured to, in response to an input operation of selecting a third target sound variation option, perform sound variation processing on the first target audio signal or the original audio information according to a tone parameter and/or a tone parameter corresponding to the third target sound variation option, so as to obtain a second target audio signal.
In some embodiments, the audio processing unit 112 is specifically configured to: responding to an input first user operation, and acquiring a noise reduction setting parameter, wherein the noise reduction setting parameter is used for representing the opening or closing of the noise reduction function; and if the noise reduction setting parameter represents that the noise reduction function is started, performing noise reduction processing on the original audio information according to a preset noise reduction processing parameter before performing sound change processing on the acquired original audio information.
In some embodiments, a noise reduction control in an on state or an off state is further displayed on the audio acquisition interface, the noise reduction control in the on state indicates that a noise reduction function is on, and the noise reduction control in the off state indicates that the noise reduction function is off; the interface display unit 114 is further configured to display the noise reduction control in an off state in response to an input operation on the noise reduction control in the on state, and display the noise reduction control in an on state in response to an input operation on the noise reduction control in the off state; the parameter setting unit 115 is further configured to configure the noise reduction setting parameter as a parameter representing that the noise reduction function is turned off in response to the input operation on the noise reduction control in the on state; and responding to the input operation of the noise reduction control in the closed state, and configuring the noise reduction setting parameters as parameters for representing the opening of a noise reduction function.
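The coupling described above between the noise reduction control's displayed state and the noise reduction setting parameter can be sketched as a small state object. The class and method names are assumptions for illustration; the patent describes only the behavior, not an API.

```python
class NoiseReductionToggle:
    """Models the on/off noise reduction control on the audio capture interface."""

    def __init__(self, enabled: bool = True):
        self.enabled = enabled  # the noise reduction setting parameter

    def tap(self) -> None:
        """Operating the control flips its displayed state and the parameter."""
        self.enabled = not self.enabled

    def maybe_denoise(self, segment, denoise_fn):
        # Noise reduction runs before sound changing only when the function is on.
        return denoise_fn(segment) if self.enabled else segment
```

This mirrors S302's behavior: when the parameter indicates the function is off, the collected audio goes straight to the sound changing step unmodified.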
The audio processing device provided by the embodiments of the present disclosure can collect the original audio information of a user in response to an input first user operation, and perform near-real-time sound changing processing on the audio as it is collected. This solves the problem in the related art of long user waiting times caused by performing sound changing processing only after the user has finished audio input. In addition, because the output audio signal is produced from the processed audio information, the user can experience the sound changing effect in real time.
With regard to the apparatus in the above-described embodiment, the specific manner in which each unit performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
Fig. 12 is a schematic structural diagram of a server provided by the present disclosure. As shown in fig. 12, the server 80 may include at least one processor 801 and a memory 803 for storing processor-executable instructions. The processor 801 is configured to execute the instructions in the memory 803 to implement the methods in the above-described embodiments.
Additionally, server 80 may include a communication bus 802 and at least one communication interface 804.
The processor 801 may be a Central Processing Unit (CPU), a micro-processing unit, an ASIC, or one or more integrated circuits for controlling the execution of programs according to the present disclosure.
The communication bus 802 may include a path that conveys information between the aforementioned components.
The communication interface 804 may be any device, such as a transceiver, for communicating with other devices or communication networks, such as an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), etc.
The memory 803 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), optical disk storage (including compact disk read-only memory (CD-ROM), laser disk, digital versatile disk, Blu-ray disk, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and connected to the processor by a bus, or may be integrated with the processor.
The memory 803 is used for storing instructions for performing the disclosed aspects and is controlled in execution by the processor 801. The processor 801 is configured to execute instructions stored in the memory 803 to implement the functions of the disclosed method.
As an example, in conjunction with fig. 11, the functions implemented by the audio acquisition unit 111, the audio processing unit 112, and the audio output unit 113 in the audio processing apparatus are the same as those of the processor 801 in fig. 12.
In particular implementations, processor 801 may include one or more CPUs such as CPU0 and CPU1 in fig. 12, for example, as an example.
In particular implementations, as an embodiment, server 80 may include multiple processors, such as processor 801 and processor 807 in fig. 12. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In particular implementations, server 80 may also include an output device 805 and an input device 806, as one embodiment. The output device 805 is in communication with the processor 801 and may display information in a variety of ways. For example, the output device 805 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device 806 is in communication with the processor 801 and can accept user input in a variety of ways. For example, the input device 806 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
Those skilled in the art will appreciate that the architecture shown in fig. 12 does not constitute a limitation on server 80, and may include more or fewer components than shown, or combine certain components, or employ a different arrangement of components.
In addition, the present disclosure also provides a computer-readable storage medium; when instructions in the storage medium are executed by a processor of a server, they enable the server to perform the audio processing method provided in the above embodiments.
In addition, the present disclosure also provides a computer program product comprising computer instructions which, when run on a server, cause the server to perform the audio processing method as provided in the above embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An audio processing method, comprising:
in response to the input first user operation, acquiring original audio information of a user;
in the process of collecting the original audio information, carrying out preset processing on the collected original audio information to obtain a first target audio signal, wherein the preset processing comprises noise reduction processing according to preset noise reduction processing parameters and sound variation processing according to preset sound variation processing parameters;
outputting the first target audio signal to a target device, the target device including the user-side device and/or other devices in communication with the user-side device.
2. The audio processing method according to claim 1, wherein in the process of acquiring the original audio information, performing preset processing on the acquired original audio information includes:
when an audio information segment with preset duration is acquired, carrying out preset processing on the audio information segment to obtain a first target audio signal corresponding to the audio information segment;
outputting the first target audio signal to a target device, comprising:
and outputting the first target audio signal corresponding to the audio information segment to target equipment after each first target audio signal corresponding to the audio information segment is obtained.
3. The audio processing method according to claim 1, wherein before capturing the original audio information of the user in response to the first user operation input, further comprising:
displaying an audio acquisition interface, wherein the audio acquisition interface comprises a sound changing control;
responding to the operation of a user on the sound variation control, and displaying at least one sound variation option, wherein different sound variation options correspond to different sound variation processing parameters, and the sound variation processing parameters comprise tone parameters and/or tone color parameters;
in response to the input operation of selecting a first target sound variation option, determining the tone parameter and/or the tone color parameter corresponding to the first target sound variation option as the preset sound variation processing parameter, so as to process the tone and/or the tone color of the original audio information according to the tone parameter and/or the tone color parameter corresponding to the first target sound variation option.
4. The audio processing method of claim 3, further comprising:
in the process of collecting the original audio information, in response to an input operation of selecting a second target variant option, determining a tone parameter and/or a tone parameter corresponding to the second target variant option as the preset variant processing parameter, so as to process the tone and/or tone of the original audio information collected after the user selects the second target variant option according to the tone parameter and/or tone parameter corresponding to the second target variant option.
5. The audio processing method according to claim 3, wherein the audio acquisition interface comprises a recording interface, a live network interface and an audio call interface; the first user operation comprises user operation which is input in the recording interface and triggers to start recording, user operation which is input in the network live broadcast interface and triggers to start live broadcast, and user operation which is input in the audio call interface and indicates to initiate or accept a call request.
6. The audio processing method according to any of claims 1-5, characterized in that the method further comprises:
in response to the input second user operation, stopping collecting original audio information of a user, and displaying an audio processing interface, wherein the audio processing interface displays at least one sound variation option, different sound variation options correspond to different sound variation processing parameters, and the sound variation processing parameters comprise tone parameters and/or tone parameters;
and responding to the input operation of selecting a third target sound variation option, and performing sound variation processing on the first target audio signal or the original audio information according to a tone parameter and/or a tone parameter corresponding to the third target sound variation option to obtain a second target audio signal.
7. The audio processing device is characterized by comprising an audio acquisition unit, an audio processing unit and an audio output unit;
the audio acquisition unit is used for responding to the input first user operation and acquiring original audio information of a user;
the audio processing unit is used for performing preset processing on the acquired original audio information to obtain a first target audio signal in the process that the audio acquisition unit acquires the original audio information, wherein the preset processing comprises noise reduction processing according to preset noise reduction processing parameters and sound variation processing according to preset sound variation processing parameters;
the audio output unit is configured to output the first target audio signal to a target device, where the target device includes the device on the user side and/or another device in communication with the device on the user side.
8. An electronic device, comprising: a processor, a memory for storing instructions executable by the processor; wherein the processor is configured to execute instructions to implement the audio processing method of any of claims 1-6.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of a server, enable the server to perform the audio processing method of any of claims 1-6.
10. A computer program product comprising instructions, characterized in that the computer program product comprises computer instructions which, when run on a server, cause the server to perform the audio processing method according to any of claims 1-6.
CN202111415444.8A 2021-11-25 2021-11-25 Audio processing method, device, equipment and storage medium Pending CN114141260A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111415444.8A CN114141260A (en) 2021-11-25 2021-11-25 Audio processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111415444.8A CN114141260A (en) 2021-11-25 2021-11-25 Audio processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114141260A true CN114141260A (en) 2022-03-04

Family

ID=80392114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111415444.8A Pending CN114141260A (en) 2021-11-25 2021-11-25 Audio processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114141260A (en)

Similar Documents

Publication Publication Date Title
KR101988279B1 (en) Operating Method of User Function based on a Face Recognition and Electronic Device supporting the same
WO2016177296A1 (en) Video generation method and apparatus
CN106791893A (en) Net cast method and device
US10586131B2 (en) Multimedia conferencing system for determining participant engagement
CN110401810B (en) Virtual picture processing method, device and system, electronic equipment and storage medium
CN107888965A (en) Image present methods of exhibiting and device, terminal, system, storage medium
JP2019186931A (en) Method and device for controlling camera shooting, intelligent device, and computer storage medium
CN109660873B (en) Video-based interaction method, interaction device and computer-readable storage medium
CN110798622B (en) Shared shooting method and electronic equipment
WO2019071808A1 (en) Video image display method, apparatus and system, terminal device, and storage medium
WO2021190404A1 (en) Conference establishment and conference creation method, device and system, and storage medium
CN113259583B (en) Image processing method, device, terminal and storage medium
WO2021051588A1 (en) Data processing method and apparatus, and apparatus used for data processing
CN110660403B (en) Audio data processing method, device, equipment and readable storage medium
US20230403413A1 (en) Method and apparatus for displaying online interaction, electronic device and computer readable medium
WO2023087929A1 (en) Assisted photographing method and apparatus, and terminal and computer-readable storage medium
CN114141260A (en) Audio processing method, device, equipment and storage medium
CN113827953B (en) Game control system
CN116016817A (en) Video editing method, device, electronic equipment and storage medium
US11729489B2 (en) Video chat with plural users using same camera
US20230132415A1 (en) Machine learning-based audio manipulation using virtual backgrounds for virtual meetings
CN116501227B (en) Picture display method and device, electronic equipment and storage medium
CN115379250B (en) Video processing method, device, computer equipment and storage medium
CN115474080B (en) Wired screen-throwing control method and device
WO2024032111A9 (en) Data processing method and apparatus for online conference, and device, medium and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination