WO2019101015A1 - Audio signal processing method, apparatus, and storage medium - Google Patents

Audio signal processing method, apparatus, and storage medium

Info

Publication number
WO2019101015A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
signal
target song
spectrum
short
Prior art date
Application number
PCT/CN2018/115928
Other languages
English (en)
French (fr)
Inventor
肖纯智
Original Assignee
广州酷狗计算机科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州酷狗计算机科技有限公司
Priority to US16/617,900 (US10964300B2)
Priority to EP18881136.8A (EP3614383A4)
Publication of WO2019101015A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/366 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems with means for modifying or correcting the external signal, e.g. pitch correction, reverberation, changing a singer's voice
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/005 Musical accompaniment, i.e. complete instrumental rhythm synthesis added to a performed melody, e.g. as output by drum machines
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/066 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for pitch analysis as part of wider processing for musical purposes, e.g. transcription, musical performance evaluation; Pitch recognition, e.g. in polyphonic sounds; Estimation or use of missing fundamental

Definitions

  • the present invention relates to the field of terminal technologies, and in particular, to an audio signal processing method, apparatus, and storage medium.
  • The terminal supports not only basic communication applications but also entertainment applications.
  • The user can use the entertainment applications installed on the terminal for entertainment.
  • For example, the terminal supports a K song application, and the user can record songs through the K song application installed on the terminal.
  • When the terminal records a target song through the K song application, the terminal directly collects an audio signal of the user singing the target song, and uses the collected audio signal as the audio signal of the target song.
  • Because the user's audio signal is used directly as the audio signal of the target song, the quality of the audio signal of the target song recorded by the terminal is poor when the user sings poorly.
  • the invention provides an audio signal processing method, device and storage medium, which can solve the problem of poor quality of a recorded audio signal.
  • the technical solutions are as follows:
  • the present invention provides an audio signal processing method, the method comprising:
  • the extracting the timbre information of the user from the first audio signal includes:
  • the acquiring the pitch information of the standard audio signal of the target song includes:
  • the extracting the pitch information of the standard audio signal from the standard audio signal includes:
  • the standard audio signal is an audio signal of a designated user singing the target song, and the designated user is the original singer of the target song or a singer whose pitch satisfies a condition.
  • the generating, according to the timbre information and the pitch information, a second audio signal of the target song including:
  • the synthesizing the timbre information and the pitch information into a third short-time spectrum signal includes:
  • Formula 1: $Y_i(k) = E_i(k) \cdot H_i(k)$
  • where $Y_i(k)$ is the spectral value of the $i$-th frame spectral signal in the third short-time spectrum signal, $E_i(k)$ is the excitation component of the $i$-th frame spectrum, and $H_i(k)$ is the envelope value of the $i$-th frame spectrum.
  • the present invention provides an audio signal processing apparatus, the apparatus comprising:
  • a first acquiring module configured to acquire a first audio signal of a user singing a target song
  • An extracting module configured to extract timbre information of the user from the first audio signal
  • a second acquiring module configured to acquire pitch information of a standard audio signal of the target song
  • a generating module configured to generate a second audio signal of the target song according to the timbre information and the pitch information.
  • the extracting module is further configured to perform framing processing on the first audio signal to obtain a framed first audio signal; perform windowing processing on the framed first audio signal, and perform a short-time Fourier transform on the audio signal located in the window to obtain a first short-time spectrum signal; and extract a first spectral envelope of the first audio signal from the first short-time spectrum signal, and use the first spectral envelope as the timbre information.
  • the second acquiring module is further configured to acquire a standard audio signal of the target song according to a song identifier of the target song, and extract the pitch information of the standard audio signal from the standard audio signal; or,
  • the second acquiring module is further configured to acquire, according to the song identifier of the target song, the pitch information of the standard audio signal of the target song from the correspondence between the song identifier and the pitch information of the standard audio signal.
  • the second acquiring module is further configured to perform framing processing on the standard audio signal to obtain a framed second audio signal; perform windowing processing on the framed second audio signal, and perform a short-time Fourier transform on the audio signal located in the window to obtain a second short-time spectrum signal; extract a second spectral envelope of the standard audio signal from the second short-time spectrum signal; and generate an excitation spectrum of the standard audio signal according to the second short-time spectrum signal and the second spectral envelope, and use the excitation spectrum as the pitch information of the standard audio signal.
  • the standard audio signal is an audio signal of a designated user singing the target song, and the designated user is the original singer of the target song or a singer whose pitch satisfies a condition.
  • the generating module is further configured to synthesize the timbre information and the pitch information into a third short-time spectrum signal, and perform an inverse Fourier transform on the third short-time spectrum signal to obtain a second audio signal of the target song.
  • the generating module is further configured to determine the third short-time spectrum signal by using the following Formula 1, according to the second spectrum envelope corresponding to the timbre information and the excitation spectrum corresponding to the pitch information:
  • Formula 1: $Y_i(k) = E_i(k) \cdot H_i(k)$
  • where $Y_i(k)$ is the spectral value of the $i$-th frame spectral signal in the third short-time spectrum signal, $E_i(k)$ is the excitation component of the $i$-th frame spectrum, and $H_i(k)$ is the envelope value of the $i$-th frame spectrum.
  • the present invention provides an audio signal processing apparatus including a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the audio signal processing method described in any of the possible implementations of the first aspect.
  • the present invention provides a storage medium, where the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the audio signal processing method described in any of the possible implementations of the first aspect.
  • the timbre information of the user is extracted from the first audio signal of the user singing the target song, the pitch information of the standard audio signal of the target song is obtained, and the second audio signal of the target song is generated according to the timbre information and the pitch information. Since the second audio signal of the target song is generated based on the pitch information of the standard audio signal and the timbre information of the user, a high-quality audio signal is generated even if the user sings poorly, thereby improving the quality of the generated audio signal.
  • FIG. 1 is a flowchart of a method for processing audio signals according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for processing audio signals according to an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of an apparatus for processing audio signals according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • An embodiment of the present invention provides an audio signal processing method. Referring to FIG. 1, the method includes:
  • Step 101: Acquire a first audio signal of a user singing a target song.
  • Step 102: Extract the timbre information of the user from the first audio signal.
  • Step 103: Acquire pitch information of a standard audio signal of the target song.
  • Step 104: Generate a second audio signal of the target song according to the timbre information and the pitch information.
  • extracting the timbre information of the user from the first audio signal includes:
  • a first spectral envelope of the first audio signal is extracted from the first short-time spectrum signal, and the first spectral envelope is used as the timbre information.
  • acquiring the pitch information of the standard audio signal of the target song includes:
  • the pitch information of the standard audio signal of the target song is obtained from the correspondence relationship between the song identification and the pitch information of the standard audio signal according to the song identification of the target song.
  • the pitch information of the standard audio signal is extracted from the standard audio signal, including:
  • the standard audio signal is an audio signal of a designated user singing the target song, and the designated user is the original singer of the target song or a singer whose pitch satisfies a condition.
  • the second audio signal of the target song is generated according to the timbre information and the pitch information, including:
  • the timbre information and the pitch information are combined into a third short-time spectrum signal, including:
  • Formula 1: $Y_i(k) = E_i(k) \cdot H_i(k)$
  • where $Y_i(k)$ is the spectral value of the $i$-th frame spectral signal in the third short-time spectrum signal, $E_i(k)$ is the excitation component of the $i$-th frame spectrum, and $H_i(k)$ is the envelope value of the $i$-th frame spectrum.
  • the timbre information of the user is extracted from the first audio signal of the user singing the target song, the pitch information of the standard audio signal of the target song is obtained, and the second audio signal of the target song is generated according to the timbre information and the pitch information. Since the second audio signal of the target song is generated based on the pitch information of the standard audio signal and the timbre information of the user, a high-quality audio signal is generated even if the user sings poorly, thereby improving the quality of the generated audio signal.
  • the embodiment of the present invention provides an audio signal processing method.
  • the execution body of the method is a client of a specified application or a terminal on which the client is installed.
  • the specified application may be an application for recording audio signals or a social application.
  • the application for recording audio signals can be a camera, a video camera, a tape recorder, or a K song application.
  • the social application can be an instant messaging application or a live application.
  • the terminal can be any device capable of processing audio signals, such as a mobile phone device, a PAD (Portable Android Device) device, or a computer device.
  • The following description takes the terminal as the execution body and the K song application as the specified application. Referring to FIG. 2, the method includes:
  • Step 201: The terminal acquires a first audio signal of the user singing the target song.
  • To generate a high-quality audio signal of the target song for the user, the terminal first acquires the first audio signal of the user singing the target song. The first audio signal may be an audio signal currently recorded by the terminal, an audio signal stored in the local audio library, or an audio signal sent by a friend of the user.
  • the source of the first audio signal is not specifically limited.
  • the target song may be any song. In the embodiment of the present invention, the target song is not specifically limited.
  • When the first audio signal is an audio signal currently recorded by the terminal, this step may be: the terminal acquires the song identifier of the target song selected by the user; upon detecting a recording start instruction, the terminal starts collecting the audio signal; upon detecting a recording end instruction, the terminal stops collecting the audio signal and uses the collected audio signal as the first audio signal.
  • the main interface of the terminal includes a plurality of song identifiers; the user may select a song among the plurality of song identifiers, and the terminal acquires the song identifier of the song selected by the user and determines it as the song identifier of the target song.
  • the main interface of the terminal further includes a search input box and a search button; the user can enter the song identifier of the target song in the search input box and trigger the search button to search for the target song.
  • When the terminal detects that the search button is triggered, it determines the song identifier entered in the search input box as the song identifier of the target song.
  • the song identifier may be the name of the song or an identifier of the singer who sings the song, and the singer identifier may be the singer's name or nickname.
  • When the first audio signal is an audio signal stored in the local audio library, this step may be: the terminal acquires the song identifier of the target song selected by the user and, according to the song identifier of the target song, acquires from the local audio library the first audio signal of the user singing the target song.
  • the correspondence between the song identification and the audio signal is stored in the local audio library.
  • the terminal acquires the first audio signal of the target song from the correspondence between the song identifier and the audio signal according to the song identifier of the target song.
  • the local audio library stores the song identifiers and audio signals of the songs that the user has sung.
  • When the first audio signal is an audio signal sent by a friend of the user, this step may be: the terminal selects, in the chat dialog box between the user and the friend, the first audio signal sent by the friend.
  • Step 202: The terminal extracts the timbre information of the user from the first audio signal.
  • The timbre information includes the timbre. This step can be implemented by the following steps (1) to (3):
  • (1) The terminal performs framing processing on the first audio signal to obtain a framed first audio signal.
  • the terminal performs framing processing on the first audio signal by using the first preset frame length and the first preset frame shift to obtain the framed first audio signal.
  • After framing, each frame of the first audio signal has a duration in the time domain equal to the first preset frame length, and for two adjacent frames, the difference between the end time of the previous frame and the start time of the next frame is the first preset frame shift.
  • the first preset frame length and the first preset frame shift may be set and changed as needed.
  • the first preset frame length and the first preset frame shift are not specifically limited.
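  • A minimal sketch of the framing step, assuming NumPy, a frame length and frame shift expressed in samples, and the common convention that the frame shift is the hop between the start positions of consecutive frames. The function name `frame_signal` and the example values are illustrative and do not come from the source.

```python
import numpy as np

def frame_signal(signal, frame_len, frame_shift):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        # Zero-pad short inputs so that one full frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    # Trailing samples that do not fill a whole frame are dropped.
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    # Frame i covers samples [i * frame_shift, i * frame_shift + frame_len).
    starts = frame_shift * np.arange(n_frames)
    return signal[starts[:, None] + np.arange(frame_len)[None, :]]

# Ten samples, frame length 4, frame shift 2 -> four frames of four samples.
frames = frame_signal(np.arange(10.0), frame_len=4, frame_shift=2)
print(frames.shape)  # (4, 4)
```

  • With a shift smaller than the frame length, adjacent frames overlap, which is what later allows the signal to be rebuilt by overlap-add.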
  • (2) The terminal performs windowing processing on the framed first audio signal, and performs a short-time Fourier transform on the audio signal located in the window to obtain a first short-time spectrum signal.
  • the framed first audio signal is windowed using a Hamming window. As the window moves, a short-time Fourier transform is performed on the audio signal located in the window, converting the time-domain audio signal into a frequency-domain audio signal to obtain the first short-time spectrum signal.
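  • The windowing and short-time Fourier transform described above can be sketched as follows. This is an illustration under assumptions, not the patented implementation: it uses NumPy's `np.hamming` and `np.fft.rfft`, and the function name `short_time_spectrum`, the frame length of 256 samples, and the frame shift of 64 samples are all hypothetical choices.

```python
import numpy as np

def short_time_spectrum(signal, frame_len=256, frame_shift=64):
    """Hamming-windowed short-time Fourier transform; returns (frames, bins)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal must contain at least one full frame")
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    spectrum = np.empty((n_frames, frame_len // 2 + 1), dtype=complex)
    for i in range(n_frames):
        start = i * frame_shift
        # Window the frame, then transform it to the frequency domain.
        spectrum[i] = np.fft.rfft(signal[start:start + frame_len] * window)
    return spectrum

# A sinusoid completing 8 cycles per 256-sample frame peaks at frequency bin 8.
n = np.arange(1024)
spec = short_time_spectrum(np.sin(2 * np.pi * 8 * n / 256))
peak_bin = int(np.argmax(np.abs(spec[0])))
print(spec.shape, peak_bin)  # (13, 129) 8
```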
  • (3) The terminal extracts a first spectral envelope of the first audio signal from the first short-time spectrum signal, and uses the first spectral envelope as the timbre information of the user.
  • the terminal uses a cepstrum method to extract a first spectral envelope of the first audio signal from the first short-time spectrum signal.
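  • The source names the cepstrum method but does not detail it. The sketch below shows one common variant, cepstral liftering, for a single frame: take the log-magnitude spectrum, transform it to the cepstral domain, keep only the low-quefrency coefficients, and transform back. The function name `cepstral_envelope` and the cutoff `n_coeffs` are assumptions made for illustration.

```python
import numpy as np

def cepstral_envelope(frame_spectrum, n_coeffs=30):
    """Estimate the spectral envelope of one rfft frame by cepstral liftering."""
    log_mag = np.log(np.abs(frame_spectrum) + 1e-10)
    # Real cepstrum: inverse transform of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag)
    # Keep only low-quefrency coefficients (and their mirror images), which
    # describe the slowly varying envelope rather than the harmonic detail.
    lifter = np.zeros_like(cepstrum)
    lifter[:n_coeffs] = 1.0
    if n_coeffs > 1:
        lifter[-(n_coeffs - 1):] = 1.0
    smoothed_log_mag = np.fft.rfft(cepstrum * lifter).real
    return np.exp(smoothed_log_mag)

rng = np.random.default_rng(0)
frame = rng.standard_normal(64) * np.hamming(64)
spectrum = np.fft.rfft(frame)
envelope = cepstral_envelope(spectrum, n_coeffs=8)
print(envelope.shape)  # (33,)
```

  • A smaller `n_coeffs` gives a smoother envelope; keeping every coefficient simply reproduces the magnitude spectrum.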
  • Step 203: The terminal acquires the pitch information of the standard audio signal of the target song.
  • the terminal may extract the pitch information from the standard audio signal of the target song when performing this step, that is, the following first implementation manner.
  • the terminal may also extract the pitch information of the target song in advance, and directly obtain the pitch information of the standard audio signal of the stored target song in this step, that is, the following second implementation manner.
  • the server may also extract the pitch information of the target song in advance. In this step, the terminal acquires the pitch information of the standard audio signal of the target song from the server, that is, the third implementation manner.
  • In the first implementation manner, this step can be implemented by the following steps (1) to (2):
  • (1) The terminal acquires a standard audio signal of the target song according to the song identifier of the target song.
  • The song library of the terminal stores a plurality of song identifiers in association with standard audio signals; in this step, the terminal acquires the standard audio signal of the target song from the correspondence between song identifiers and standard audio signals in the song library according to the song identifier of the target song.
  • The standard audio signal of the target song stored in the song library is an audio signal of a designated user singing the target song. The designated user is the original singer of the target song or a singer whose pitch satisfies a condition.
  • Alternatively, a plurality of song identifiers and audio signal libraries are stored in association in the terminal, and the audio signal library corresponding to any song includes a plurality of audio signals of that song.
  • In this case, the terminal acquires the audio signal library of the target song from the correspondence between song identifiers and audio signal libraries according to the song identifier of the target song, and acquires, from the audio signal library, the standard audio signal of a singer whose pitch meets the condition.
  • The step of the terminal acquiring the standard audio signal of the singer whose pitch meets the condition from the audio signal library may be: the terminal determines the pitch of each audio signal in the audio signal library and, according to the pitch of each audio signal, selects from the audio signal library an audio signal sung by a designated user whose pitch meets the condition.
  • A singer whose pitch meets the condition is a singer whose pitch is greater than a preset threshold, or the singer whose pitch is the highest among a plurality of singers.
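  • The selection rule above can be sketched in a few lines. The source does not say how a singer's pitch is reduced to a single comparable number, so the per-singer scalar in `pitch_by_singer` is an assumption, as are the function name and the singer names.

```python
def select_standard_singer(pitch_by_singer, threshold=None):
    """Return a singer whose pitch meets the condition, or None if none does."""
    if not pitch_by_singer:
        return None
    if threshold is not None:
        # Condition 1: the first singer whose pitch exceeds the preset threshold.
        for singer, pitch in pitch_by_singer.items():
            if pitch > threshold:
                return singer
        return None
    # Condition 2: the singer with the highest pitch among all singers.
    return max(pitch_by_singer, key=pitch_by_singer.get)

library = {"singer_a": 0.72, "singer_b": 0.91, "singer_c": 0.85}
print(select_standard_singer(library))                 # singer_b
print(select_standard_singer(library, threshold=0.8))  # singer_b
```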
  • the song library may not be stored in the terminal, and the terminal acquires a standard audio signal of the target song from the server.
  • the step of the terminal acquiring the standard audio signal of the target song according to the song identifier of the target song may be: the terminal sends a first acquisition request to the server, where the first acquisition request carries the song identifier of the target song; the server receives the first acquisition request, acquires the standard audio signal of the target song according to the song identifier of the target song, and sends the standard audio signal of the target song to the terminal.
  • Since a plurality of singers may sing the target song, the server stores standard audio signals of the target song sung by a plurality of singers. In this step, the user can also specify the singer.
  • In that case, the first acquisition request may further carry a user identifier of the designated user; the server acquires the standard audio signal of the designated user singing the target song according to the user identifier of the designated user and the song identifier of the target song, and sends that standard audio signal to the terminal.
  • (2) The terminal extracts the pitch information of the standard audio signal from the standard audio signal.
  • the standard audio signal includes a spectrum envelope and an excitation spectrum
  • the spectrum envelope indicates timbre information
  • the excitation spectrum indicates pitch information.
  • the pitch information includes pitch and length.
  • The terminal performs framing processing on the standard audio signal to obtain a framed second audio signal.
  • the terminal performs framing processing on the standard audio signal by using the second preset frame length and the second preset frame shift to obtain the framed second audio signal.
  • After framing, each frame of the second audio signal has a duration in the time domain equal to the second preset frame length, and for two adjacent frames, the difference between the end time of the previous frame and the start time of the next frame is the second preset frame shift.
  • the second preset frame length and the first preset frame length may be the same or different; the second preset frame shift and the first preset frame shift may be the same or different.
  • the second preset frame length and the second preset frame shift are both set and changed as needed. In the embodiment of the present invention, the second preset frame length and the second preset frame shift are not specifically limited.
  • the terminal performs windowing processing on the framed second audio signal, and performs a short-time Fourier transform on the audio signal located in the window to obtain a second short-time spectrum signal.
  • the framed second audio signal is windowed using a Hamming window. As the window moves, a short-time Fourier transform is performed on the audio signal located in the window, converting the time-domain audio signal into a frequency-domain audio signal to obtain the second short-time spectrum signal.
  • the terminal extracts a second spectral envelope of the standard audio signal from the second short-time spectrum signal.
  • the terminal uses a cepstrum method to extract a second spectral envelope of the standard audio signal from the second short-term spectral signal.
  • the terminal generates an excitation spectrum of the standard audio signal according to the second short-term spectrum signal and the second spectrum envelope, and uses the excitation spectrum as the pitch information of the standard audio signal.
  • For each frame spectrum, the terminal determines the excitation component of the frame spectrum according to its spectral value and envelope value, and the excitation components of all frame spectra form the excitation spectrum.
  • Specifically, the terminal determines the ratio of the spectral value of the frame spectrum to its envelope value, and takes the ratio as the excitation component of the frame spectrum.
  • the excitation component of the spectrum of the $i$-th frame is $E_i(k) = Y_i(k) / H_i(k)$, where $i$ is the frame number, $Y_i(k)$ is the spectral value of the $i$-th frame spectrum, and $H_i(k)$ is its envelope value.
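  • As a small numerical illustration of the ratio described above, the sketch below divides a complex short-time spectrum by a real-valued envelope, frame by frame and bin by bin. The function name `excitation_spectrum` and the small floor `eps` that guards against division by zero are assumptions, and the input values are made up.

```python
import numpy as np

def excitation_spectrum(spectrum, envelope, eps=1e-10):
    """Divide each frame's spectral values by the matching envelope values."""
    spectrum = np.asarray(spectrum, dtype=complex)
    # Floor the envelope to avoid division by zero in silent bins.
    envelope = np.maximum(np.asarray(envelope, dtype=float), eps)
    return spectrum / envelope

spectrum = np.array([[2 + 2j, 4 + 0j], [0 + 1j, 3 + 0j]])  # 2 frames, 2 bins
envelope = np.array([[2.0, 4.0], [1.0, 3.0]])
excitation = excitation_spectrum(spectrum, envelope)
print(excitation)
```

  • Multiplying the result by the same envelope returns the original spectrum, which is the relationship the synthesis step relies on.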
  • In the second implementation manner, the terminal extracts in advance the pitch information of the standard audio signal of each song in the song library, and stores the song identifier of each song in association with the pitch information of its standard audio signal.
  • the terminal acquires the pitch information of the standard audio signal of the target song from the correspondence relationship between the song identifier and the pitch information of the standard audio signal according to the song identifier of the target song.
  • the terminal may also synthesize the second audio signal of the target song from the pitch information of a friend of the user singing the target song and the timbre information of the user.
  • the step of the terminal acquiring the pitch information of the standard audio signal of the target song may be:
  • the terminal acquires an audio signal sent by the user's friend user, and uses the audio signal sent by the friend user as a standard audio signal, and extracts the pitch information of the standard audio signal from the standard audio signal.
  • In the third implementation manner, step 203 may be: the terminal sends a second acquisition request to the server, where the second acquisition request carries the song identifier of the target song and is used to acquire the pitch information of the standard audio signal of the target song.
  • the server receives the second acquisition request, acquires the pitch information of the standard audio signal of the target song according to the song identifier of the target song, and sends the pitch information of the standard audio signal of the target song to the terminal; the terminal receives the pitch information of the standard audio signal of the target song.
  • the server acquires the pitch information of the standard audio signal of the target song, and associates the song identifier of the target song with the pitch information of the standard audio signal of the target song.
  • the server can also extract and store in advance the pitch information of the standard audio signals of the plurality of singers singing the target song.
  • the user can also specify the singer.
  • In that case, the second acquisition request further carries the user identifier of the designated user; the server acquires the pitch information of the standard audio signal of the designated user singing the target song according to the user identifier of the designated user and the song identifier of the target song, and sends that pitch information to the terminal.
  • the step of extracting the pitch information of the standard audio signal of the target song by the terminal and the step of extracting the pitch information of the standard audio signal of the target song by the server may be the same or different, which is not specifically limited in the embodiment of the present invention.
  • In the embodiment of the present invention, not only can the pitch information of the original singer or a high-level singer be synthesized with the timbre information of the user into a high-quality song work, but the audio signal of a friend of the user can also be used as the reference audio signal, so that the pitch information of the friend singing the target song and the timbre information of the user are synthesized into a high-quality song work, which makes the application more entertaining.
  • Step 204: The terminal generates a second audio signal of the target song according to the timbre information and the pitch information.
  • This step can be implemented by the following steps (1) and (2):
  • (1) The terminal synthesizes the timbre information and the pitch information into a third short-time spectrum signal.
  • the terminal determines the third short-time spectrum signal by the following Formula 1 according to the second spectrum envelope and the excitation spectrum.
  • Formula 1: $Y_i(k) = E_i(k) \cdot H_i(k)$
  • where $Y_i(k)$ is the spectral value of the spectrum of the $i$-th frame in the third short-time spectrum signal, $E_i(k)$ is the excitation component of the spectrum of the $i$-th frame, and $H_i(k)$ is the envelope value of the spectrum of the $i$-th frame.
  • (2) The terminal performs an inverse Fourier transform on the third short-time spectrum signal to obtain a second audio signal of the target song.
  • the terminal performs the inverse Fourier transform on the third short-time spectrum signal, converting it into a time-domain signal to obtain the second audio signal of the target song.
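  • A hedged sketch of the synthesis step: multiply an envelope by an excitation spectrum frame by frame, inverse-transform each frame, and rebuild the time-domain signal by overlap-add. The source only says that an inverse Fourier transform is applied; the overlap-add with Hamming-window normalisation, the function name `synthesize`, and the requirement that both inputs share the same frame count and bin count are assumptions of this sketch. In the demonstration, the magnitude spectrum plus a small offset stands in for a real spectral envelope.

```python
import numpy as np

def synthesize(envelope, excitation, frame_len, frame_shift):
    """Multiply envelope and excitation per frame, then invert by overlap-add."""
    # Per-frame, per-bin product of excitation component and envelope value.
    spectrum = np.asarray(excitation) * np.asarray(envelope)
    frames = np.fft.irfft(spectrum, n=frame_len, axis=1)
    n_frames = frames.shape[0]
    window = np.hamming(frame_len)
    output = np.zeros(frame_len + frame_shift * (n_frames - 1))
    weight = np.zeros_like(output)
    for i in range(n_frames):
        start = i * frame_shift
        output[start:start + frame_len] += frames[i]
        weight[start:start + frame_len] += window
    # Overlapping windowed frames sum to signal * weight, so divide it out.
    return output / weight

# Round trip: analyse a signal, split it into envelope and excitation, rebuild.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
window = np.hamming(256)
spec = np.array([np.fft.rfft(x[s:s + 256] * window) for s in range(0, 769, 64)])
envelope = np.abs(spec) + 1e-3   # crude stand-in for a spectral envelope
excitation = spec / envelope
y = synthesize(envelope, excitation, frame_len=256, frame_shift=64)
print(np.allclose(x, y))  # True
```

  • To combine two different recordings, the envelope would come from the user's signal and the excitation from the standard signal, which requires the two to be time-aligned to the same number of frames.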
  • It should be noted that the process may end once the terminal has generated the second audio signal of the target song. Alternatively, after generating the second audio signal, the terminal may perform step 205 to process it.
  • Step 205: The terminal receives an operation instruction for the second audio signal, and processes the second audio signal according to the operation instruction.
  • After the terminal generates the second audio signal of the target song, the user may trigger an operation instruction for the second audio signal on the terminal; the operation instruction may be a storage instruction, a first sharing instruction, or a second sharing instruction.
  • The storage instruction instructs the terminal to store the second audio signal.
  • The first sharing instruction instructs the terminal to share the second audio signal with a target user.
  • The second sharing instruction instructs the terminal to share the second audio signal to the user's information display platform.
  • (I): When the operation instruction is a storage instruction, the step in which the terminal processes the second audio signal according to the operation instruction may be: the terminal stores the second audio signal in a designated storage space according to the operation instruction.
  • The designated storage space may be an audio library local to the terminal, or a storage space in the cloud server corresponding to the user's account.
  • When the designated storage space is the storage space in the cloud server corresponding to the user's account, the step in which the terminal stores the second audio signal in the designated storage space may be: the terminal sends a storage request to the cloud server, the storage request carrying the user identifier and the second audio signal.
  • The cloud server receives the storage request and stores the second audio signal in the storage space corresponding to the user identifier.
  • Before the terminal stores the second audio signal in that storage space, the cloud server authenticates the terminal; the subsequent storage process is performed only after authentication succeeds.
  • The step in which the cloud server authenticates the terminal may be: the terminal sends a verification request to the cloud server, the verification request carrying the user account and the user password of the user.
  • The cloud server receives the verification request sent by the terminal. When the user account and the user password match, verification of the user succeeds; when they do not match, verification fails.
  • In the embodiment of the present invention, the user is authenticated before the second audio signal is stored in the cloud server, and storage proceeds only after verification succeeds, which improves the security of the second audio signal.
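The verify-then-store flow described above can be sketched as follows. This is a toy in-memory stand-in for the cloud server, written only to show the ordering of the two steps; the class `CloudStore` and its methods are hypothetical and are not named in the source, and a real service would keep password hashes rather than plain passwords.

```python
import hmac


class CloudStore:
    """Toy in-memory stand-in for the cloud server (hypothetical)."""

    def __init__(self, accounts):
        self._accounts = dict(accounts)  # user account -> password
        self._storage = {}               # user account -> stored audio signals

    def verify(self, account, password):
        """Return True only when the account exists and the password matches."""
        expected = self._accounts.get(account)
        if expected is None:
            return False
        # Constant-time comparison of the two byte strings.
        return hmac.compare_digest(expected.encode(), password.encode())

    def store(self, account, password, audio):
        """Store the audio for the account, but only after verification passes."""
        if not self.verify(account, password):
            return False
        self._storage.setdefault(account, []).append(audio)
        return True

    def stored(self, account):
        """Return a copy of the audio signals stored for the account."""
        return list(self._storage.get(account, []))
```

For example, `CloudStore({"alice": "s3cret"}).store("alice", "wrong", b"...")` returns `False` and stores nothing.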
  • (II): When the operation instruction is a first sharing instruction, the step in which the terminal processes the second audio signal according to the operation instruction may be: the terminal acquires the target user selected by the user, and sends the second audio signal and the user identifier of the target user to the server.
  • The server receives the second audio signal and the user identifier of the target user, and sends the second audio signal to the terminal corresponding to the target user according to the user identifier of the target user.
  • The target user includes at least one user and/or at least one group.
  • (III): When the operation instruction is a second sharing instruction, the step in which the terminal processes the second audio signal according to the operation instruction may be: the terminal sends the second audio signal and the user identifier of the user to the server.
  • The server receives the second audio signal and the user identifier of the user, and shares the second audio signal to the user's information display platform according to the user identifier of the user.
  • The user identifier may be a user account registered by the user with the server in advance.
  • The group identifier may be a group name, a two-dimensional code, or the like. It should be noted that, in the embodiment of the present invention, an audio signal processing function is added to the social application, which enriches the functions of the social application and improves the user experience.
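The three operation instructions of step 205 amount to a small dispatch, sketched below. The enum `Operation`, the function `handle_operation`, and the dictionary layout of the returned request are all hypothetical illustrations; the source does not define a message format.

```python
from enum import Enum


class Operation(Enum):
    STORE = "store"                          # storage instruction
    SHARE_TO_USER = "share_to_user"          # first sharing instruction
    SHARE_TO_PLATFORM = "share_to_platform"  # second sharing instruction


def handle_operation(operation, audio, user_id, target_ids=None):
    """Return a description of the request the terminal would issue."""
    if operation is Operation.STORE:
        return {"action": "store", "user_id": user_id, "audio": audio}
    if operation is Operation.SHARE_TO_USER:
        if not target_ids:
            raise ValueError("a first sharing instruction needs at least one target")
        # Targets may be individual users and/or groups.
        return {"action": "send", "targets": list(target_ids), "audio": audio}
    if operation is Operation.SHARE_TO_PLATFORM:
        return {"action": "post", "user_id": user_id, "audio": audio}
    raise ValueError(f"unknown operation: {operation!r}")
```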
  • In the embodiment of the present invention, the timbre information of the user is extracted from the first audio signal of the user singing the target song, the pitch information of the standard audio signal of the target song is acquired, and the second audio signal of the target song is generated according to the timbre information and the pitch information. Because the second audio signal of the target song is generated from the pitch information of the standard audio signal and the timbre information of the user, a high-quality audio signal is generated even if the user's singing skill is poor, which improves the quality of the generated audio signal.
  • An embodiment of the present invention provides an audio signal processing apparatus, which is applied in a terminal and is configured to perform the steps performed by the terminal in the foregoing audio signal processing method. Referring to FIG. 3, the apparatus includes:
  • a first acquiring module 301, configured to acquire a first audio signal of a user singing a target song;
  • an extracting module 302, configured to extract the timbre information of the user from the first audio signal;
  • a second obtaining module 303, configured to acquire pitch information of a standard audio signal of the target song;
  • a generating module 304, configured to generate a second audio signal of the target song according to the timbre information and the pitch information.
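The way the four modules cooperate can be sketched as a small pipeline. This is only an illustration of the data flow: the class `AudioSignalProcessor` and its constructor arguments are hypothetical, and each module is represented by a plain callable supplied by the caller.

```python
class AudioSignalProcessor:
    """Wires four callables that play the roles of modules 301 to 304."""

    def __init__(self, acquire_first, extract_timbre, obtain_pitch, generate):
        self.acquire_first = acquire_first    # role of first acquiring module 301
        self.extract_timbre = extract_timbre  # role of extracting module 302
        self.obtain_pitch = obtain_pitch      # role of second obtaining module 303
        self.generate = generate              # role of generating module 304

    def process(self, song_id):
        """Run the four stages in order and return the second audio signal."""
        first_signal = self.acquire_first(song_id)
        timbre = self.extract_timbre(first_signal)
        pitch = self.obtain_pitch(song_id)
        return self.generate(timbre, pitch)
```

Keeping the stages as injected callables mirrors the remark in the text that the functions may be distributed across different functional modules as needed.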
  • In a possible implementation, the extracting module 302 is further configured to perform framing processing on the first audio signal to obtain a framed first audio signal; perform windowing on the framed first audio signal, and perform a short-time Fourier transform on the audio signal within the window to obtain a first short-time spectrum signal; and extract a first spectrum envelope of the first audio signal from the first short-time spectrum signal, using the first spectrum envelope as the timbre information.
  • In a possible implementation, the second obtaining module 303 is further configured to obtain the standard audio signal of the target song according to the song identifier of the target song, and extract the pitch information of the standard audio signal from the standard audio signal; or
  • the second obtaining module 303 is further configured to obtain, according to the song identifier of the target song, the pitch information of the standard audio signal of the target song from the correspondence between song identifiers and pitch information of standard audio signals.
  • In a possible implementation, the second obtaining module 303 is further configured to perform framing processing on the standard audio signal to obtain a framed second audio signal; perform windowing on the framed second audio signal, and perform a short-time Fourier transform on the audio signal within the window to obtain a second short-time spectrum signal; extract a second spectrum envelope of the standard audio signal from the second short-time spectrum signal; and generate an excitation spectrum of the standard audio signal according to the second short-time spectrum signal and the second spectrum envelope, using the excitation spectrum as the pitch information of the standard audio signal.
  • In a possible implementation, the standard audio signal is an audio signal of a specified user singing the target song, and the specified user is the original singer of the target song or a singer whose pitch accuracy satisfies a condition.
  • In a possible implementation, the generating module 304 is further configured to synthesize the timbre information and the pitch information into a third short-time spectrum signal, and perform an inverse Fourier transform on the third short-time spectrum signal to obtain the second audio signal of the target song.
  • In a possible implementation, the generating module 304 is further configured to determine the third short-time spectrum signal by the following Formula 1, according to the second spectrum envelope corresponding to the timbre information and the excitation spectrum corresponding to the pitch information;
  • Formula 1: $Y_i(k) = E_i(k) \cdot \hat{H}_i(k)$
  • Here $Y_i(k)$ is the spectral value of the i-th frame spectrum signal in the third short-time spectrum signal, $E_i(k)$ is the excitation component of the i-th frame spectrum, and $\hat{H}_i(k)$ is the envelope value of the i-th frame spectrum.
  • In the embodiment of the present invention, the timbre information of the user is extracted from the first audio signal of the user singing the target song, the pitch information of the standard audio signal of the target song is acquired, and the second audio signal of the target song is generated according to the timbre information and the pitch information. Because the second audio signal of the target song is generated from the pitch information of the standard audio signal and the timbre information of the user, a high-quality audio signal is generated even if the user's singing skill is poor, which improves the quality of the generated audio signal.
  • It should be noted that, when the audio signal processing apparatus provided by the foregoing embodiment processes an audio signal, the division into the functional modules above is only an example. In actual applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or part of the functions described above.
  • In addition, the audio signal processing apparatus and the audio signal processing method provided by the foregoing embodiments belong to the same concept; for the specific implementation process, see the method embodiment, which is not repeated here.
  • FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • the terminal can be used to implement the functions performed by the terminal in the method of processing an audio signal shown in the above embodiments. Specifically:
  • The terminal 400 may include an RF (Radio Frequency) circuit 410, a memory 420 including one or more computer-readable storage media, an input unit 430, a display unit 440, a sensor 450, an audio circuit 460, a transmission module 470, a processor 480 including one or more processing cores, a power supply 490, and other components.
  • Those skilled in the art will understand that the terminal structure shown in FIG. 4 does not constitute a limitation on the terminal, which may include more or fewer components than those illustrated, combine certain components, or use a different component arrangement. Specifically:
  • The RF circuit 410 can be used to receive and transmit signals while sending and receiving information or during a call. Specifically, after receiving downlink information from a base station, it passes the information to one or more processors 480 for processing; in addition, it sends uplink data to the base station.
  • The RF circuit 410 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like.
  • RF circuitry 410 can also communicate with the network and other terminals via wireless communication.
  • The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System of Mobile communication), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), and the like.
  • The memory 420 can be used to store software programs and modules, such as those corresponding to the terminal shown in the above exemplary embodiments; the processor 480 executes various functional applications and data processing, such as implementing video-based interaction, by running the software programs and modules stored in the memory 420.
  • The memory 420 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application required for at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created through use of the terminal 400 (such as audio data and a phone book).
  • In addition, the memory 420 can include high-speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Accordingly, the memory 420 may also include a memory controller to provide the processor 480 and the input unit 430 with access to the memory 420.
  • the input unit 430 can be configured to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function controls.
  • input unit 430 can include touch-sensitive surface 431 as well as other input terminals 432.
  • The touch-sensitive surface 431, also referred to as a touch display or trackpad, can collect touch operations performed by the user on or near it (for example, operations performed on or near the touch-sensitive surface 431 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a preset program.
  • Optionally, the touch-sensitive surface 431 can include two parts: a touch detection device and a touch controller.
  • The touch detection device detects the user's touch position, detects the signal generated by the touch operation, and passes the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 480, and can receive and execute commands sent by the processor 480.
  • the touch sensitive surface 431 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic waves.
  • the input unit 430 can also include other input terminals 432.
  • other input terminals 432 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, and the like.
  • Display unit 440 can be used to display information entered by the user or information provided to the user and various graphical user interfaces of terminal 400, which can be constructed from graphics, text, icons, video, and any combination thereof.
  • the display unit 440 may include a display panel 441.
  • the display panel 441 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like.
  • Further, the touch-sensitive surface 431 can cover the display panel 441. When the touch-sensitive surface 431 detects a touch operation on or near it, it passes the operation to the processor 480 to determine the type of the touch event, and the processor 480 then provides a corresponding visual output on the display panel 441 according to the type of the touch event.
  • Although in FIG. 4 the touch-sensitive surface 431 and the display panel 441 are implemented as two separate components to provide input and output functions, in some embodiments the touch-sensitive surface 431 can be integrated with the display panel 441 to provide the input and output functions.
  • Terminal 400 may also include at least one type of sensor 450, such as a light sensor, motion sensor, and other sensors.
  • Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor may adjust the brightness of the display panel 441 according to the ambient light, and the proximity sensor may turn off the display panel 441 and/or the backlight when the terminal 400 is moved to the ear.
  • As one kind of motion sensor, the gravity acceleration sensor can detect the magnitude of acceleration in all directions (usually three axes) and, when stationary, the magnitude and direction of gravity.
  • The terminal 400 can also be configured with other sensors, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described again here.
  • the audio circuit 460, the speaker 461, and the microphone 462 can provide an audio interface between the user and the terminal 400.
  • On one hand, the audio circuit 460 can transmit the electrical signal converted from received audio data to the speaker 461, which converts it into a sound signal for output; on the other hand, the microphone 462 converts a collected sound signal into an electrical signal, which the audio circuit 460 receives and converts into audio data. The audio data is then output to the processor 480 for processing and sent via the RF circuit 410 to, for example, another terminal, or output to the memory 420 for further processing.
  • the audio circuit 460 may also include an earbud jack to provide communication of the peripheral earphones with the terminal 400.
  • the terminal 400 can help the user to send and receive emails, browse web pages, access streaming media, etc. through the transmission module 470, which provides the user with wireless or wired broadband Internet access.
  • Although FIG. 4 shows the transmission module 470, it can be understood that the module is not an essential part of the terminal 400 and may be omitted as needed without changing the essence of the invention.
  • The processor 480 is the control center of the terminal 400. It connects the various parts of the entire terminal through various interfaces and lines, and performs the various functions of the terminal 400 and processes data by running or executing the software programs and/or modules stored in the memory 420 and invoking the data stored in the memory 420, thereby monitoring the terminal as a whole.
  • the processor 480 may include one or more processing cores; preferably, the processor 480 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application, and the like.
  • the modem processor primarily handles wireless communications. It can be understood that the above modem processor may not be integrated into the processor 480.
  • the terminal 400 also includes a power source 490 (such as a battery) that supplies power to the various components.
  • the power source can be logically coupled to the processor 480 through a power management system to manage functions such as charging, discharging, and power management through the power management system.
  • Power supply 490 may also include any one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
  • the terminal 400 may further include a camera, a Bluetooth module, and the like, and details are not described herein.
  • Specifically, in this embodiment, the display unit of the terminal 400 is a touch screen display, and the terminal 400 further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for implementing the operations performed by the terminal in the above embodiments.
  • In an exemplary embodiment, a computer-readable storage medium storing a computer program is further provided, such as a memory storing a computer program; when the computer program is executed by a processor, the audio signal processing method in the above embodiments is implemented.
  • For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • a person skilled in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or may be instructed by a program to execute related hardware, and the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

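An audio signal processing method, apparatus, and storage medium, belonging to the field of terminal technologies. The method includes: acquiring a first audio signal of a user singing a target song (101); extracting timbre information of the user from the first audio signal (102); acquiring pitch information of a standard audio signal of the target song (103); and generating a second audio signal of the target song according to the timbre information and the pitch information (104). Because the second audio signal of the target song is generated from the pitch information of the standard audio signal and the timbre information of the user, a high-quality audio signal is generated even if the user's singing skill is poor, which improves the quality of the generated audio signal.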

Description

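Audio signal processing method, apparatus, and storage medium
Technical Field
The present invention relates to the field of terminal technologies, and in particular to an audio signal processing method, apparatus, and storage medium.
Background
With the development of terminal technologies, terminals support more and more applications: not only applications providing basic communication functions, but also applications providing entertainment functions. A user can carry out entertainment activities through an entertainment application installed on the terminal. For example, the terminal supports a karaoke application, and the user can record songs through the karaoke application installed on the terminal.
Currently, when the terminal records a target song through the karaoke application, the terminal directly collects the audio signal of the user singing the target song and uses the collected audio signal as the audio signal of the target song.
In the process of implementing the present invention, the inventor found that the prior art has at least the following problem:
The above method directly uses the user's audio signal as the audio signal of the target song; however, when the user's singing skill is poor, the quality of the audio signal of the target song recorded by the terminal is poor.
Summary
The present invention provides an audio signal processing method, apparatus, and storage medium, which can solve the problem of poor quality of recorded audio signals. The technical solutions are as follows: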
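In a first aspect, the present invention provides an audio signal processing method, the method including:
acquiring a first audio signal of a user singing a target song;
extracting timbre information of the user from the first audio signal;
acquiring pitch information of a standard audio signal of the target song; and
generating a second audio signal of the target song according to the timbre information and the pitch information.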
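In a possible implementation, extracting the timbre information of the user from the first audio signal includes:
performing framing processing on the first audio signal to obtain a framed first audio signal;
performing windowing on the framed first audio signal, and performing a short-time Fourier transform on the audio signal within the window to obtain a first short-time spectrum signal; and
extracting a first spectrum envelope of the first audio signal from the first short-time spectrum signal, and using the first spectrum envelope as the timbre information.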
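In a possible implementation, acquiring the pitch information of the standard audio signal of the target song includes:
acquiring the standard audio signal of the target song according to a song identifier of the target song, and extracting the pitch information of the standard audio signal from the standard audio signal; or
acquiring, according to the song identifier of the target song, the pitch information of the standard audio signal of the target song from a correspondence between song identifiers and pitch information of standard audio signals.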
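In a possible implementation, extracting the pitch information of the standard audio signal from the standard audio signal includes:
performing framing processing on the standard audio signal to obtain a framed second audio signal;
performing windowing on the framed second audio signal, and performing a short-time Fourier transform on the audio signal within the window to obtain a second short-time spectrum signal;
extracting a second spectrum envelope of the standard audio signal from the second short-time spectrum signal; and
generating an excitation spectrum of the standard audio signal according to the second short-time spectrum signal and the second spectrum envelope, and using the excitation spectrum as the pitch information of the standard audio signal.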
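In a possible implementation, the standard audio signal is an audio signal of a specified user singing the target song, and the specified user is the original singer of the target song or a singer whose pitch accuracy satisfies a condition.
In a possible implementation, generating the second audio signal of the target song according to the timbre information and the pitch information includes:
synthesizing the timbre information and the pitch information into a third short-time spectrum signal; and
performing an inverse Fourier transform on the third short-time spectrum signal to obtain the second audio signal of the target song.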
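In a possible implementation, synthesizing the timbre information and the pitch information into the third short-time spectrum signal includes:
determining the third short-time spectrum signal by the following Formula 1, according to the second spectrum envelope corresponding to the timbre information and the excitation spectrum corresponding to the pitch information;
Formula 1: $Y_i(k) = E_i(k) \cdot \hat{H}_i(k)$
where $Y_i(k)$ is the spectral value of the i-th frame spectrum signal in the third short-time spectrum signal, $E_i(k)$ is the excitation component of the i-th frame spectrum, and $\hat{H}_i(k)$ is the envelope value of the i-th frame spectrum.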
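In a second aspect, the present invention provides an audio signal processing apparatus, the apparatus including:
a first acquiring module, configured to acquire a first audio signal of a user singing a target song;
an extracting module, configured to extract timbre information of the user from the first audio signal;
a second obtaining module, configured to acquire pitch information of a standard audio signal of the target song; and
a generating module, configured to generate a second audio signal of the target song according to the timbre information and the pitch information.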
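In a possible implementation, the extracting module is further configured to perform framing processing on the first audio signal to obtain a framed first audio signal; perform windowing on the framed first audio signal, and perform a short-time Fourier transform on the audio signal within the window to obtain a first short-time spectrum signal; and extract a first spectrum envelope of the first audio signal from the first short-time spectrum signal, using the first spectrum envelope as the timbre information.
In a possible implementation, the second obtaining module is further configured to acquire the standard audio signal of the target song according to the song identifier of the target song, and extract the pitch information of the standard audio signal from the standard audio signal; or
the second obtaining module is further configured to acquire, according to the song identifier of the target song, the pitch information of the standard audio signal of the target song from the correspondence between song identifiers and pitch information of standard audio signals.
In a possible implementation, the second obtaining module is further configured to perform framing processing on the standard audio signal to obtain a framed second audio signal; perform windowing on the framed second audio signal, and perform a short-time Fourier transform on the audio signal within the window to obtain a second short-time spectrum signal; extract a second spectrum envelope of the standard audio signal from the second short-time spectrum signal; and generate an excitation spectrum of the standard audio signal according to the second short-time spectrum signal and the second spectrum envelope, using the excitation spectrum as the pitch information of the standard audio signal.
In a possible implementation, the standard audio signal is an audio signal of a specified user singing the target song, and the specified user is the original singer of the target song or a singer whose pitch accuracy satisfies a condition.
In a possible implementation, the generating module is further configured to synthesize the timbre information and the pitch information into a third short-time spectrum signal, and perform an inverse Fourier transform on the third short-time spectrum signal to obtain the second audio signal of the target song.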
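In a possible implementation, the generating module is further configured to determine the third short-time spectrum signal by the following Formula 1, according to the second spectrum envelope corresponding to the timbre information and the excitation spectrum corresponding to the pitch information;
Formula 1: $Y_i(k) = E_i(k) \cdot \hat{H}_i(k)$
where $Y_i(k)$ is the spectral value of the i-th frame spectrum signal in the third short-time spectrum signal, $E_i(k)$ is the excitation component of the i-th frame spectrum, and $\hat{H}_i(k)$ is the envelope value of the i-th frame spectrum.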
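In a third aspect, the present invention provides an audio signal processing apparatus including a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the audio processing method according to any possible implementation of the first aspect.
In a fourth aspect, the present invention provides a storage medium storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the audio processing method according to any possible implementation of the first aspect.
In the embodiment of the present invention, the timbre information of the user is extracted from the first audio signal of the user singing the target song, the pitch information of the standard audio signal of the target song is acquired, and the second audio signal of the target song is generated according to the timbre information and the pitch information. Because the second audio signal of the target song is generated from the pitch information of the standard audio signal and the timbre information of the user, a high-quality audio signal is generated even if the user's singing skill is poor, which improves the quality of the generated audio signal.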
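Brief Description of the Drawings
FIG. 1 is a flowchart of an audio signal processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of an audio signal processing method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an audio signal processing apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.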
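An embodiment of the present invention provides an audio signal processing method. Referring to FIG. 1, the method includes:
Step 101: Acquire a first audio signal of a user singing a target song.
Step 102: Extract timbre information of the user from the first audio signal.
Step 103: Acquire pitch information of a standard audio signal of the target song.
Step 104: Generate a second audio signal of the target song according to the timbre information and the pitch information.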
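In a possible implementation, extracting the timbre information of the user from the first audio signal includes:
performing framing processing on the first audio signal to obtain a framed first audio signal;
performing windowing on the framed first audio signal, and performing a short-time Fourier transform on the audio signal within the window to obtain a first short-time spectrum signal; and
extracting a first spectrum envelope of the first audio signal from the first short-time spectrum signal, and using the first spectrum envelope as the timbre information.
In a possible implementation, acquiring the pitch information of the standard audio signal of the target song includes:
acquiring the standard audio signal of the target song according to the song identifier of the target song, and extracting the pitch information of the standard audio signal from the standard audio signal; or
acquiring, according to the song identifier of the target song, the pitch information of the standard audio signal of the target song from the correspondence between song identifiers and pitch information of standard audio signals.
In a possible implementation, extracting the pitch information of the standard audio signal from the standard audio signal includes:
performing framing processing on the standard audio signal to obtain a framed second audio signal;
performing windowing on the framed second audio signal, and performing a short-time Fourier transform on the audio signal within the window to obtain a second short-time spectrum signal;
extracting a second spectrum envelope of the standard audio signal from the second short-time spectrum signal; and
generating an excitation spectrum of the standard audio signal according to the second short-time spectrum signal and the second spectrum envelope, and using the excitation spectrum as the pitch information of the standard audio signal.
In a possible implementation, the standard audio signal is an audio signal of a specified user singing the target song, and the specified user is the original singer of the target song or a singer whose pitch accuracy satisfies a condition.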
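In a possible implementation, generating the second audio signal of the target song according to the timbre information and the pitch information includes:
synthesizing the timbre information and the pitch information into a third short-time spectrum signal; and
performing an inverse Fourier transform on the third short-time spectrum signal to obtain the second audio signal of the target song.
In a possible implementation, synthesizing the timbre information and the pitch information into the third short-time spectrum signal includes:
determining the third short-time spectrum signal by the following Formula 1, according to the second spectrum envelope corresponding to the timbre information and the excitation spectrum corresponding to the pitch information;
Formula 1: $Y_i(k) = E_i(k) \cdot \hat{H}_i(k)$
where $Y_i(k)$ is the spectral value of the i-th frame spectrum signal in the third short-time spectrum signal, $E_i(k)$ is the excitation component of the i-th frame spectrum, and $\hat{H}_i(k)$ is the envelope value of the i-th frame spectrum.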
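In the embodiment of the present invention, the timbre information of the user is extracted from the first audio signal of the user singing the target song, the pitch information of the standard audio signal of the target song is acquired, and the second audio signal of the target song is generated according to the timbre information and the pitch information. Because the second audio signal of the target song is generated from the pitch information of the standard audio signal and the timbre information of the user, a high-quality audio signal is generated even if the user's singing skill is poor, which improves the quality of the generated audio signal.
An embodiment of the present invention provides an audio signal processing method. The method is performed by a client of a specified application or by a terminal on which the client is installed. The specified application may be an application for recording audio signals or a social application. The application for recording audio signals may be a camera, a video camera, a recorder, a karaoke application, or the like. The social application may be an instant messaging application or a live-streaming application. The terminal may be any device capable of processing audio signals, such as a mobile phone, a PAD (Portable Android Device, tablet computer), or a computer. In this embodiment, the description takes the terminal as the executing entity and a karaoke application as the specified application. Referring to FIG. 2, the method includes: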
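Step 201: The terminal acquires a first audio signal of a user singing a target song.
When the terminal generates a high-quality audio signal of the target song for the user, the terminal first acquires the first audio signal of the user singing the target song. The first audio signal may be an audio signal currently recorded by the terminal, an audio signal stored in a local audio library, or an audio signal sent by a friend of the user. The embodiment of the present invention does not specifically limit the source of the first audio signal. The target song may be any song, and the embodiment of the present invention does not specifically limit the target song either.
(I): When the first audio signal is an audio signal currently recorded by the terminal, this step may be: the terminal acquires the song identifier of the target song selected by the user; when a recording start instruction is detected, the terminal starts collecting an audio signal; when a recording end instruction is detected, the terminal stops collecting and uses the collected audio signal as the first audio signal.
In a possible implementation, the main interface of the terminal includes multiple song identifiers; the user can select a song from them, and the terminal acquires the song identifier of the selected song and determines it as the song identifier of the target song. In another possible implementation, the main interface of the terminal further includes a search input box and a search button; the user can enter the song identifier of the target song in the search input box and search for the target song with the search button. Accordingly, when the terminal detects that the search button is triggered, it determines the song identifier entered in the search input box as the song identifier of the target song. The song identifier may be the name of the song or an identifier of the singer who sings the song; the singer identifier may be the singer's name, nickname, or the like.
(II): When the first audio signal is an audio signal stored in the local audio library, this step may be: the terminal acquires the song identifier of the target song selected by the user and, according to that song identifier, acquires from the local audio library the first audio signal of the user singing the target song.
The local audio library stores a correspondence between song identifiers and audio signals. Accordingly, the terminal acquires the first audio signal of the target song from that correspondence according to the song identifier of the target song. The local audio library stores the song identifiers and audio signals of songs the user has already sung.
(III): When the first audio signal is an audio signal sent by a friend of the user, this step may be: the terminal selects, in the chat dialog between the user and the friend, the first audio signal sent by the friend.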
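Step 202: The terminal extracts the timbre information of the user from the first audio signal.
The first audio signal includes a spectrum envelope and an excitation spectrum; the spectrum envelope indicates timbre information, and the excitation spectrum indicates pitch information. The timbre information includes the timbre. This step can be implemented through the following steps (1) to (3):
(1): The terminal performs framing processing on the first audio signal to obtain a framed first audio signal.
The terminal frames the first audio signal with a first preset frame length and a first preset frame shift to obtain the framed first audio signal. Each frame of the first audio signal obtained by framing lasts for the first preset frame length in the time domain, and for two adjacent frames of the first audio signal, the difference between the end time of the previous frame and the start time of the next frame is the first preset frame shift.
Both the first preset frame length and the first preset frame shift can be set and changed as needed; the embodiment of the present invention does not specifically limit either.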
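The framing in step (1) can be sketched as follows. This is a minimal illustration under stated assumptions: frame length and frame shift are expressed in samples, the frame shift is taken to be the offset between the starts of consecutive frames (the usual convention, which differs from the literal wording above that measures it from the end of the previous frame), and trailing samples that do not fill a whole frame are dropped. The function name `split_into_frames` is hypothetical.

```python
def split_into_frames(samples, frame_length, frame_shift):
    """Split a 1-D sample sequence into fixed-length, possibly overlapping frames."""
    if frame_length <= 0 or frame_shift <= 0:
        raise ValueError("frame_length and frame_shift must be positive")
    frames = []
    start = 0
    # Advance by frame_shift until a full frame no longer fits.
    while start + frame_length <= len(samples):
        frames.append(list(samples[start:start + frame_length]))
        start += frame_shift
    return frames
```

With a frame shift smaller than the frame length, consecutive frames overlap, which is the typical setting before windowing.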
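(2): The terminal performs windowing on the framed first audio signal, and performs a short-time Fourier transform on the audio signal within the window to obtain a first short-time spectrum signal.
In the embodiment of the present invention, a Hamming window is used to window the framed first audio signal. As the window moves, a short-time Fourier transform is performed on the audio signal within the window, converting the time-domain audio signal into a frequency-domain audio signal, to obtain the first short-time spectrum signal.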
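Windowing and the per-frame transform in step (2) can be sketched as follows. This is a pure-Python illustration that assumes the signal has already been split into equal-length frames; the function names are hypothetical, and the quadratic-time DFT stands in for the FFT a real implementation would use.

```python
import cmath
import math


def hamming_window(length):
    """Hamming coefficients w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n_points = len(frame)
    return [
        sum(
            frame[n] * cmath.exp(-2j * math.pi * k * n / n_points)
            for n in range(n_points)
        )
        for k in range(n_points)
    ]


def short_time_spectrum(frames):
    """Apply a Hamming window to each frame, then transform it."""
    spectra = []
    for frame in frames:
        window = hamming_window(len(frame))
        spectra.append(dft([s * w for s, w in zip(frame, window)]))
    return spectra
```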
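(3): The terminal extracts the first spectrum envelope of the first audio signal from the first short-time spectrum signal, and uses the first spectrum envelope as the timbre information of the user.
The terminal uses the cepstrum method to extract the first spectrum envelope of the first audio signal from the first short-time spectrum signal.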
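One common form of the cepstrum method is low-quefrency liftering, sketched below for a single frame. The source names the cepstrum method but gives no details, so everything here is an assumption: the function name `cepstral_envelope`, the parameter `num_coeffs` (how many low-quefrency coefficients to keep), and the magnitude floor used to avoid taking the logarithm of zero. The sketch also assumes the frame's magnitude spectrum is symmetric, as it is for a real-valued signal.

```python
import cmath
import math


def _dft(values, inverse=False):
    """Naive O(N^2) DFT; with inverse=True, the inverse DFT including the 1/N factor."""
    n_points = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(
            values[n] * cmath.exp(sign * 2j * math.pi * k * n / n_points)
            for n in range(n_points)
        )
        for k in range(n_points)
    ]
    return [v / n_points for v in out] if inverse else out


def cepstral_envelope(spectrum, num_coeffs):
    """Estimate the magnitude envelope of one frame's spectrum."""
    n_points = len(spectrum)
    floor = 1e-12  # avoids log(0) on empty bins
    log_mag = [math.log(max(abs(v), floor)) for v in spectrum]
    # Real cepstrum: inverse DFT of the log-magnitude spectrum.
    cepstrum = [c.real for c in _dft(log_mag, inverse=True)]
    # Keep the low-quefrency coefficients and their mirror images.
    liftered = [
        c if (q < num_coeffs or q > n_points - num_coeffs) else 0.0
        for q, c in enumerate(cepstrum)
    ]
    # Back to the log-spectral domain, then undo the logarithm.
    return [math.exp(v.real) for v in _dft(liftered)]
```

A small `num_coeffs` gives a smoother envelope; keeping all coefficients reproduces the magnitude spectrum itself.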
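Step 203: The terminal acquires the pitch information of the standard audio signal of the target song.
In the embodiment of the present invention, the terminal may extract the pitch information from the standard audio signal of the target song at the time of this step, which is the first implementation below. The terminal may also extract the pitch information of the target song in advance and, in this step, directly acquire the stored pitch information of the standard audio signal of the target song, which is the second implementation below. Alternatively, the server may extract the pitch information of the target song in advance, and in this step the terminal acquires the pitch information of the standard audio signal of the target song from the server, which is the third implementation below.
For the first implementation, this step can be implemented through the following steps (1) and (2):
(1): The terminal acquires the standard audio signal of the target song according to the song identifier of the target song.
In a possible implementation, the song library of the terminal stores multiple song identifiers in association with standard audio signals; in this step, the terminal acquires the standard audio signal of the target song from the correspondence between song identifiers and standard audio signals in the song library, according to the song identifier of the target song. The standard audio signal of the target song stored in the song library is an audio signal of a specified user singing the target song. The specified user is the original singer of the target song or a singer whose pitch accuracy satisfies a condition.
The terminal stores multiple songs in association with audio signal libraries, and the audio signal library corresponding to any song includes multiple audio signals of that song. In this step, the terminal acquires the audio signal library of the target song from the correspondence between song identifiers and audio signal libraries according to the song identifier of the target song, and acquires from that audio signal library the standard audio signal of a singer whose pitch accuracy satisfies the condition.
The step in which the terminal acquires, from the audio signal library, the standard audio signal of a singer whose pitch accuracy satisfies the condition may be: the terminal determines the pitch accuracy of each audio signal in the audio signal library and, according to those pitch accuracies, selects from the library an audio signal sung by a specified user whose pitch accuracy satisfies the condition.
A singer whose pitch accuracy satisfies the condition is a singer whose pitch accuracy is greater than a preset threshold, or the singer with the highest pitch accuracy among multiple singers.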
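The selection rule above can be sketched as follows. The source does not say how pitch accuracy is measured, so the sketch assumes each candidate recording already carries a numeric score. The function name `select_standard_signal`, the tuple layout, and the fallback to the most accurate candidate when none exceeds the threshold are all assumptions.

```python
def select_standard_signal(candidates, threshold=None):
    """Pick a reference recording from (singer_id, pitch_accuracy, audio) tuples.

    With a threshold, return the first candidate whose accuracy exceeds it;
    without one, or when none exceeds it, return the most accurate candidate.
    """
    if not candidates:
        raise ValueError("no candidate recordings")
    if threshold is not None:
        for candidate in candidates:
            if candidate[1] > threshold:
                return candidate
    return max(candidates, key=lambda candidate: candidate[1])
```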
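In another possible implementation, the terminal may not store a song library and instead acquires the standard audio signal of the target song from the server. Accordingly, the step in which the terminal acquires the standard audio signal of the target song according to the song identifier of the target song may be: the terminal sends a first obtaining request to the server, the first obtaining request carrying the song identifier of the target song; the server receives the first obtaining request, acquires the standard audio signal of the target song according to the song identifier, and sends it to the terminal.
It should be noted that, because multiple singers may have sung the target song, the server stores standard audio signals of the target song sung by multiple singers. In this step, the user may also specify a singer. Accordingly, the first obtaining request may further carry the user identifier of the specified user; the server acquires, according to the user identifier of the specified user and the song identifier of the target song, the standard audio signal of the specified user singing the target song, and sends it to the terminal.
(2): The terminal extracts the pitch information of the standard audio signal from the standard audio signal.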
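The standard audio signal includes a spectrum envelope and an excitation spectrum; the spectrum envelope indicates timbre information, and the excitation spectrum indicates pitch information. The pitch information includes pitch and duration. Accordingly, this step can be implemented through the following steps (2-1) to (2-4):
(2-1): The terminal performs framing processing on the standard audio signal to obtain a framed second audio signal.
The terminal frames the standard audio signal with a second preset frame length and a second preset frame shift to obtain the framed second audio signal. Each frame of the second audio signal obtained by framing lasts for the second preset frame length in the time domain, and for two adjacent frames of the second audio signal, the difference between the end time of the previous frame and the start time of the next frame is the second preset frame shift.
The second preset frame length may or may not be the same as the first preset frame length, and the second preset frame shift may or may not be the same as the first preset frame shift. Both the second preset frame length and the second preset frame shift can be set and changed as needed; the embodiment of the present invention does not specifically limit either.
(2-2): The terminal performs windowing on the framed second audio signal, and performs a short-time Fourier transform on the audio signal within the window to obtain a second short-time spectrum signal.
In the embodiment of the present invention, a Hamming window is used to window the framed second audio signal. As the window moves, a short-time Fourier transform is performed on the audio signal within the window, converting the time-domain audio signal into a frequency-domain audio signal, to obtain the second short-time spectrum signal.
(2-3): The terminal extracts the second spectrum envelope of the standard audio signal from the second short-time spectrum signal.
The terminal uses the cepstrum method to extract the second spectrum envelope of the standard audio signal from the second short-time spectrum signal.
(2-4): The terminal generates the excitation spectrum of the standard audio signal according to the second short-time spectrum signal and the second spectrum envelope, and uses the excitation spectrum as the pitch information of the standard audio signal.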
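For each frame of the spectrum, the terminal determines the excitation component of that frame according to its spectral value and envelope value, and the excitation components of all frames form the excitation spectrum. Specifically, the terminal determines the ratio of the frame's spectral value to its envelope value and takes that ratio as the frame's excitation component.
For example, if the spectral value of the i-th frame spectrum is $X_i(k)$ and the envelope value of the i-th frame spectrum is $H_i(k)$, then the excitation component of the i-th frame spectrum is
$E_i(k) = X_i(k) / H_i(k)$
where i is the frame number.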
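The per-frame ratio can be sketched as follows. The sketch assumes the spectra and envelopes are stored as lists of frames, each a list of per-bin values; the function name `excitation_spectrum` and the small floor that guards against division by zero are assumptions that the source does not mention.

```python
def excitation_spectrum(spectra, envelopes, floor=1e-12):
    """Compute E_i(k) = X_i(k) / H_i(k) for every frame i and bin k."""
    if len(spectra) != len(envelopes):
        raise ValueError("spectra and envelopes need the same number of frames")
    return [
        # Clamp the envelope from below so an empty bin cannot divide by zero.
        [x / max(h, floor) for x, h in zip(frame_spec, frame_env)]
        for frame_spec, frame_env in zip(spectra, envelopes)
    ]
```

Because the envelope is real and positive, dividing by it rescales each bin's magnitude while leaving its phase unchanged.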
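For the second implementation, the terminal extracts in advance the pitch information of the standard audio signal of each song in the song library, and stores the correspondence between the song identifier and the pitch information of each song. Accordingly, in this step, the terminal acquires, according to the song identifier of the target song, the pitch information of the standard audio signal of the target song from the correspondence between song identifiers and pitch information of standard audio signals.
It should be noted that the process by which the terminal extracts the pitch information of the standard audio signal of each song in the song library is the same as the process described above for extracting the pitch information of the standard audio signal of the target song, and is not repeated here.
In the embodiment of the present invention, the terminal may also synthesize the second audio signal of the target song from the pitch information of a friend of the user singing the target song and the timbre information of the user. Accordingly, the step in which the terminal acquires the pitch information of the standard audio signal of the target song may be:
the terminal acquires an audio signal sent by a friend of the user, uses that audio signal as the standard audio signal, and extracts the pitch information of the standard audio signal from it.
For the third implementation, step 203 may be: the terminal sends a second obtaining request to the server, the second obtaining request carrying the song identifier of the target song and being used to acquire the pitch information of the standard audio signal of the target song. The server receives the second obtaining request, acquires the pitch information of the standard audio signal of the target song according to the song identifier, and sends it to the terminal; the terminal receives the pitch information of the standard audio signal of the target song.
It should be noted that, before this step, the server acquires the pitch information of the standard audio signal of the target song and stores the song identifier of the target song in association with that pitch information.
It should also be noted that the server may extract and store in advance the pitch information of the standard audio signals of multiple singers singing the target song. In this step, the user may also specify a singer. Accordingly, the second obtaining request further carries the user identifier of the specified user; the server acquires, according to the user identifier of the specified user and the song identifier of the target song, the pitch information of the standard audio signal of the specified user singing the target song, and sends that pitch information to the terminal.
The step in which the server extracts the pitch information of the standard audio signal of the target song and the step in which the terminal extracts it may be the same or different, which is not specifically limited in the embodiment of the present invention.
In the embodiment of the present invention, not only can the pitch information of the original singer or of a highly skilled singer be combined with the user's timbre information to synthesize a high-quality song, but the audio signal of the user's friend can also serve as the reference audio signal, so that the pitch information of the friend singing the target song and the user's timbre information are synthesized into a high-quality song, which makes the feature more entertaining.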
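Step 204: The terminal generates the second audio signal of the target song according to the timbre information and the pitch information.
This step can be implemented through the following steps (1) and (2):
(1): The terminal synthesizes the timbre information and the pitch information into a third short-time spectrum signal.
The terminal determines the third short-time spectrum signal by the following Formula 1, according to the second spectrum envelope and the excitation spectrum.
Formula 1: $Y_i(k) = E_i(k) \cdot \hat{H}_i(k)$
where $Y_i(k)$ is the spectral value of the i-th frame spectrum in the third short-time spectrum signal, $E_i(k)$ is the excitation component of the i-th frame spectrum, and $\hat{H}_i(k)$ is the envelope value of the i-th frame spectrum.
(2): The terminal performs an inverse Fourier transform on the third short-time spectrum signal to obtain the second audio signal of the target song.
The inverse Fourier transform converts the third short-time spectrum signal into a time-domain signal, which is the second audio signal of the target song.
It should be noted that the process may end once the terminal has generated the second audio signal of the target song. Alternatively, after generating the second audio signal, the terminal may perform step 205 to process it.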
步骤205:终端接收对第二音频信号的操作指令,根据该操作指令,对第二音频信号进行处理。
当终端生成该目标歌曲的第二音频信号时,用户可以向终端触发对该第二音频信号的操作指令,该操作指令可以为存储指令、第一分享指令和第二分享指令。存储指令用于指示终端存储第二音频信号,第一分享指令用于指示终端将第二音频信号分享给目标用户,第二分享指令用于指示终端将第二音频信号分享到用户的信息展示平台。
(一):当该操作指令为存储指令时,终端根据该操作指令,对第二音频信号进行处理的步骤可以为:终端根据该操作指令,将该第二音频信号存储到指定存储空间中。其中,指定存储空间可以为终端本地的音频库,也可以为云服务器中的该用户的用户账号对应的存储空间。
当该指定存储空间为云服务器中的该用户的用户账号对应的存储空间时,终端根据该操作指令,将该第二音频信号存储到指定存储空间中的步骤可以为:终端向云服务器发送存储请求,该存储请求携带用户标识和该第二音频信号。云服务器接收该存储请求,根据该用户标识,将第二音频信号存储到该用户标识对应的存储空间中。
在终端将第二音频信号存储到云服务器中的该用户的用户账户对应的存储空间之前,云服务器对终端进行身份验证;在身份验证通过后,才进行后续的存储过程。其中,云服务器对终端进行身份验证的步骤可以为:终端向云服务 器发送验证请求,该验证请求携带该用户的用户账号和用户密码。云服务器接收终端发送的验证请求,当该用户账号和该用户密码匹配时,对该用户验证通过;当该用户账号和该用户密码不匹配时,对该用户验证不通过。
在本发明实施例中,将第二音频信号存储到云服务器之前,先对用户进行身份验证,在验证通过后,才进行后续的存储过程,从而提高了第二音频信号的安全性。
(二):当该操作指令为第一分享指令时,终端根据该操作指令,对第二音频信号进行处理的步骤可以为:终端获取用户选择的目标用户,向服务器发送第二音频信号和该目标用户的用户标识。服务器接收第二音频信号和该目标用户的用户标识,根据该目标用户的用户标识,将第二音频信号发送给该目标用户对应的终端。其中,目标用户包括至少一个用户和/或至少一个群组。
(三):当该操作指令为第二分享指令时,终端根据该操作指令,对第二音频信号进行处理的步骤可以为:终端向服务器发送该第二音频信号和该用户的用户标识。服务器接收第二音频信号和该用户的用户标识,根据该用户的用户标识,将该第二音频信号分享到该用户的信息展示平台中。
其中,用户标识可以为用户事先在服务器中注册的用户账号等。群组标识可以为群组名称、二维码等。需要说明的是,在本发明实施例中,在社交应用中增加了处理音频信号的功能,丰富了社交应用的功能,且提高了用户体验。
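The three operation instructions of step 205 amount to a small dispatch on the instruction type. The sketch below only builds the request that the terminal would send; the enum members, the dictionary fields, and the function name `handle_operation` are hypothetical illustrations, not an interface defined by the source.

```python
from enum import Enum


class Operation(Enum):
    STORE = "store"                      # storage instruction
    SHARE_TO_TARGET = "first_share"      # first sharing instruction
    SHARE_TO_PLATFORM = "second_share"   # second sharing instruction


def handle_operation(op, audio, user_id, target_ids=()):
    """Build the (hypothetical) request for the given operation instruction."""
    if op is Operation.STORE:
        # Storage request: carries the user identifier and the second audio signal.
        return {"action": "store", "user_id": user_id, "audio": audio}
    if op is Operation.SHARE_TO_TARGET:
        # Targets may be users and/or groups; at least one is required.
        if not target_ids:
            raise ValueError("first sharing instruction needs at least one target")
        return {"action": "forward", "targets": list(target_ids), "audio": audio}
    if op is Operation.SHARE_TO_PLATFORM:
        # Shared to the information display platform of the user.
        return {"action": "publish", "user_id": user_id, "audio": audio}
    raise ValueError(f"unknown operation: {op!r}")
```

Authentication against the cloud server, which the text requires before a cloud store, is left out here and would run before the storage request is sent.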
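In the embodiments of the present invention, the user's timbre information is extracted from the first audio signal of the user singing the target song, the pitch information of the standard audio signal of the target song is acquired, and the second audio signal of the target song is generated according to the timbre information and the pitch information. Because the second audio signal is generated from the pitch information of the standard audio signal and the timbre information of the user, a high-quality audio signal is generated even if the user's singing skill is poor, which improves the quality of the generated audio signal.
An embodiment of the present invention provides an audio signal processing apparatus. The apparatus is applied in a terminal and is configured to perform the steps performed by the terminal in the above method for processing an audio signal. Referring to FIG. 3, the apparatus includes:
a first acquisition module 301, configured to acquire a first audio signal of a user singing a target song;
an extraction module 302, configured to extract the timbre information of the user from the first audio signal;
a second acquisition module 303, configured to acquire the pitch information of the standard audio signal of the target song; and
a generation module 304, configured to generate the second audio signal of the target song according to the timbre information and the pitch information.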
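The four-module structure can be read as a simple pipeline. The sketch below wires four callables together in the order the modules are listed; the class name, field names, and the choice of plain callables are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AudioSignalProcessingDevice:
    """Hypothetical composition of modules 301 to 304 as plain callables."""
    acquire_first_signal: Callable[[], Any]   # first acquisition module 301
    extract_timbre: Callable[[Any], Any]      # extraction module 302
    acquire_pitch: Callable[[str], Any]       # second acquisition module 303
    generate: Callable[[Any, Any], Any]       # generation module 304

    def process(self, song_id: str) -> Any:
        """Run the modules in order and return the second audio signal."""
        first_signal = self.acquire_first_signal()
        timbre = self.extract_timbre(first_signal)
        pitch = self.acquire_pitch(song_id)
        return self.generate(timbre, pitch)
```

Keeping each module behind a callable mirrors the remark in the description that the functions may be redistributed among different modules as needed.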
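In a possible implementation, the extraction module 302 is further configured to: perform framing on the first audio signal to obtain a framed first audio signal; perform windowing on the framed first audio signal and perform a short-time Fourier transform on the audio signal located within the window to obtain a first short-time spectrum signal; and extract a first spectrum envelope of the first audio signal from the first short-time spectrum signal, using the first spectrum envelope as the timbre information.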
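The framing, windowing, and short-time Fourier transform chain can be sketched with the standard library alone. The envelope estimator shown here (a circular moving average of the magnitude spectrum) is a stand-in chosen for brevity, not the method of the source; cepstral smoothing or linear prediction are common alternatives. All function names, the Hann window, and the hop size are assumptions.

```python
import cmath
import math


def frames_of(signal, frame_len, hop):
    """Split the signal into overlapping frames of frame_len samples."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]


def dft(frame):
    """Naive DFT of one frame (an FFT would be used in practice)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def short_time_spectrum(signal, frame_len, hop):
    """Window each frame and transform it: one complex spectrum per frame."""
    window = hann(frame_len)
    return [dft([x * w for x, w in zip(frame, window)])
            for frame in frames_of(signal, frame_len, hop)]


def spectral_envelope(spectrum_frame, half_width=2):
    """Assumed estimator: smooth the magnitude spectrum over neighboring bins."""
    n = len(spectrum_frame)
    mags = [abs(v) for v in spectrum_frame]
    return [sum(mags[(k + d) % n] for d in range(-half_width, half_width + 1))
            / (2 * half_width + 1)
            for k in range(n)]
```

Applying `spectral_envelope` to each frame of `short_time_spectrum(...)` gives one envelope per frame, which is the role the first spectrum envelope plays as timbre information.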
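In a possible implementation, the second acquisition module 303 is further configured to acquire the standard audio signal of the target song according to the song identifier of the target song, and to extract the pitch information of the standard audio signal from the standard audio signal; or,
the second acquisition module 303 is further configured to acquire, according to the song identifier of the target song, the pitch information of the standard audio signal of the target song from a correspondence between song identifiers and pitch information of standard audio signals.
In a possible implementation, the second acquisition module 303 is further configured to: perform framing on the standard audio signal to obtain a framed second audio signal; perform windowing on the framed second audio signal and perform a short-time Fourier transform on the audio signal located within the window to obtain a second short-time spectrum signal; extract a second spectrum envelope of the standard audio signal from the second short-time spectrum signal; and generate the excitation spectrum of the standard audio signal according to the second short-time spectrum signal and the second spectrum envelope, using the excitation spectrum as the pitch information of the standard audio signal.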
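One way to read "generate the excitation spectrum according to the second short-time spectrum signal and the second spectrum envelope" is a per-bin division, which is what a source-filter model suggests. This section does not spell out the operation, so the division, the function name, and the small floor that guards against a zero envelope are all assumptions.

```python
def excitation_spectrum(spectrum_frame, envelope_frame, floor=1e-12):
    """Assumed relation: E_i(k) = X_i(k) / H_i(k) for every bin k of one frame.

    spectrum_frame: complex bin values X_i(k) of the short-time spectrum.
    envelope_frame: non-negative envelope values H_i(k) for the same bins.
    floor: lower bound on the divisor to avoid division by zero (hypothetical).
    """
    return [x / max(h, floor) for x, h in zip(spectrum_frame, envelope_frame)]
```

Under this reading, multiplying the excitation by an envelope again returns a spectrum; using the envelope of a different singer at that point is what changes the timbre while keeping the pitch.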
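In a possible implementation, the standard audio signal is an audio signal of a designated user singing the target song, and the designated user is the original singer of the target song or a singer whose pitch accuracy satisfies a condition.
In a possible implementation, the generation module 304 is further configured to synthesize the timbre information and the pitch information into a third short-time spectrum signal, and to perform an inverse Fourier transform on the third short-time spectrum signal to obtain the second audio signal of the target song.
In a possible implementation, the generation module 304 is further configured to determine the third short-time spectrum signal through the following Formula 1, according to the second spectrum envelope corresponding to the timbre information and the excitation spectrum corresponding to the pitch information;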
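Formula 1:
$$Y_i(k) = E_i(k) \cdot \hat{H}_i(k)$$
where $Y_i(k)$ is the spectrum value of the $i$-th frame spectrum signal in the third short-time spectrum signal, $E_i(k)$ is the excitation component of the $i$-th frame spectrum, and $\hat{H}_i(k)$ is the envelope value of the $i$-th frame spectrum.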
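In the embodiments of the present invention, the user's timbre information is extracted from the first audio signal of the user singing the target song, the pitch information of the standard audio signal of the target song is acquired, and the second audio signal of the target song is generated according to the timbre information and the pitch information. Because the second audio signal is generated from the pitch information of the standard audio signal and the timbre information of the user, a high-quality audio signal is generated even if the user's singing skill is poor, which improves the quality of the generated audio signal.
It should be noted that the division into the above functional modules is only an example of how the audio signal processing apparatus provided by the above embodiment processes audio signals. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or part of the functions described above. In addition, the audio signal processing apparatus provided by the above embodiment and the audio signal processing method embodiments belong to the same concept; for the specific implementation, see the method embodiments, which are not repeated here.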
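FIG. 4 is a schematic structural diagram of a terminal provided by an embodiment of the present invention. The terminal may be used to implement the functions performed by the terminal in the method for processing an audio signal shown in the above embodiments. Specifically:
The terminal 400 may include components such as an RF (Radio Frequency) circuit 410, a memory 420 including one or more computer-readable storage media, an input unit 430, a display unit 440, a sensor 450, an audio circuit 460, a transmission module 470, a processor 480 including one or more processing cores, and a power supply 490. Those skilled in the art will understand that the terminal structure shown in FIG. 4 does not limit the terminal, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components. In particular:
The RF circuit 410 may be used to receive and send signals while information is being sent or received or during a call. In particular, after receiving downlink information from a base station, it hands the information to one or more processors 480 for processing; in addition, it sends uplink data to the base station. Typically, the RF circuit 410 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 410 may communicate with networks and other terminals through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System of Mobile communication), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), and the like.
The memory 420 may be used to store software programs and modules, such as the software programs and modules corresponding to the terminal shown in the above exemplary embodiments. By running the software programs and modules stored in the memory 420, the processor 480 performs various functional applications and data processing, such as implementing video-based interaction. The memory 420 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the terminal 400 (such as audio data or a phone book), and the like. In addition, the memory 420 may include a high-speed random access memory and may also include a non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Accordingly, the memory 420 may further include a memory controller to provide the processor 480 and the input unit 430 with access to the memory 420.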
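The input unit 430 may be used to receive input digit or character information and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control. Specifically, the input unit 430 may include a touch-sensitive surface 431 and other input terminals 432. The touch-sensitive surface 431, also called a touch display screen or touchpad, may collect touch operations by the user on or near it (such as operations performed by the user on or near the touch-sensitive surface 431 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection apparatus according to a preset program. Optionally, the touch-sensitive surface 431 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the touch position of the user, detects the signal produced by the touch operation, and passes the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 480, and can receive and execute commands sent by the processor 480. In addition, the touch-sensitive surface 431 may be implemented in several types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch-sensitive surface 431, the input unit 430 may include other input terminals 432. Specifically, the other input terminals 432 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys or a power key), a trackball, a mouse, a joystick, and the like.
The display unit 440 may be used to display information input by the user or information provided to the user, as well as the various graphical user interfaces of the terminal 400. These graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof. The display unit 440 may include a display panel 441, which may optionally be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like. Further, the touch-sensitive surface 431 may cover the display panel 441. When the touch-sensitive surface 431 detects a touch operation on or near it, the operation is passed to the processor 480 to determine the type of the touch event, and the processor 480 then provides the corresponding visual output on the display panel 441 according to the type of the touch event. Although in FIG. 4 the touch-sensitive surface 431 and the display panel 441 implement the input and output functions as two separate components, in some embodiments the touch-sensitive surface 431 and the display panel 441 may be integrated to implement the input and output functions.
The terminal 400 may further include at least one sensor 450, such as a light sensor, a motion sensor, or another sensor. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor may adjust the brightness of the display panel 441 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 441 and/or the backlight when the terminal 400 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally along three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the attitude of the phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in functions related to vibration recognition (such as a pedometer or tap detection). Other sensors that the terminal 400 may also be equipped with, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described here.
The audio circuit 460, a speaker 461, and a microphone 462 may provide an audio interface between the user and the terminal 400. The audio circuit 460 may transmit the electrical signal converted from received audio data to the speaker 461, which converts it into a sound signal for output. Conversely, the microphone 462 converts a collected sound signal into an electrical signal, which the audio circuit 460 receives and converts into audio data; the audio data is then output to the processor 480 for processing and sent via the RF circuit 410 to, for example, another terminal, or output to the memory 420 for further processing. The audio circuit 460 may also include an earphone jack to provide communication between a peripheral headset and the terminal 400.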
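Through the transmission module 470, the terminal 400 can help the user send and receive e-mail, browse web pages, access streaming media, and so on; the module provides the user with wireless or wired broadband Internet access. Although FIG. 4 shows the transmission module 470, it will be understood that it is not an essential part of the terminal 400 and may be omitted as needed without changing the essence of the invention.
The processor 480 is the control center of the terminal 400. It connects the various parts of the entire phone through various interfaces and lines, and performs the various functions of the terminal 400 and processes data by running or executing the software programs and/or modules stored in the memory 420 and by calling the data stored in the memory 420, thereby monitoring the phone as a whole. Optionally, the processor 480 may include one or more processing cores. Preferably, the processor 480 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It will be understood that the modem processor may also not be integrated into the processor 480.
The terminal 400 further includes a power supply 490 (such as a battery) that powers the components. Preferably, the power supply may be logically connected to the processor 480 through a power management system, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management system. The power supply 490 may further include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
Although not shown, the terminal 400 may also include a camera, a Bluetooth module, and the like, which are not described here. Specifically, in this embodiment, the display unit of the terminal 400 is a touch-screen display, and the terminal 400 further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs contain instructions for performing the operations performed by the terminal in the above embodiments.
In an exemplary embodiment, a computer-readable storage medium storing a computer program is further provided, for example a memory storing a computer program; when executed by a processor, the computer program implements the method for processing an audio signal in the above embodiments. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.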

Claims (16)

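  1. An audio signal processing method, characterized in that the method comprises:
    acquiring a first audio signal of a user singing a target song;
    extracting timbre information of the user from the first audio signal;
    acquiring pitch information of a standard audio signal of the target song; and
    generating a second audio signal of the target song according to the timbre information and the pitch information.
  2. The method according to claim 1, characterized in that extracting the timbre information of the user from the first audio signal comprises:
    performing framing on the first audio signal to obtain a framed first audio signal;
    performing windowing on the framed first audio signal, and performing a short-time Fourier transform on the audio signal located within the window to obtain a first short-time spectrum signal; and
    extracting a first spectrum envelope of the first audio signal from the first short-time spectrum signal, and using the first spectrum envelope as the timbre information.
  3. The method according to claim 1, characterized in that acquiring the pitch information of the standard audio signal of the target song comprises:
    acquiring the standard audio signal of the target song according to a song identifier of the target song, and extracting the pitch information of the standard audio signal from the standard audio signal; or,
    acquiring, according to the song identifier of the target song, the pitch information of the standard audio signal of the target song from a correspondence between song identifiers and pitch information of standard audio signals.
  4. The method according to claim 3, characterized in that extracting the pitch information of the standard audio signal from the standard audio signal comprises:
    performing framing on the standard audio signal to obtain a framed second audio signal;
    performing windowing on the framed second audio signal, and performing a short-time Fourier transform on the audio signal located within the window to obtain a second short-time spectrum signal;
    extracting a second spectrum envelope of the standard audio signal from the second short-time spectrum signal; and
    generating an excitation spectrum of the standard audio signal according to the second short-time spectrum signal and the second spectrum envelope, and using the excitation spectrum as the pitch information of the standard audio signal.
  5. The method according to any one of claims 1 to 4, characterized in that the standard audio signal is an audio signal of a designated user singing the target song, and the designated user is the original singer of the target song or a singer whose pitch accuracy satisfies a condition.
  6. The method according to any one of claims 1 to 4, characterized in that generating the second audio signal of the target song according to the timbre information and the pitch information comprises:
    synthesizing the timbre information and the pitch information into a third short-time spectrum signal; and
    performing an inverse Fourier transform on the third short-time spectrum signal to obtain the second audio signal of the target song.
  7. The method according to claim 6, characterized in that synthesizing the timbre information and the pitch information into the third short-time spectrum signal comprises:
    determining the third short-time spectrum signal through the following Formula 1, according to a second spectrum envelope corresponding to the timbre information and an excitation spectrum corresponding to the pitch information;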
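    Formula 1:
    $$Y_i(k) = E_i(k) \cdot \hat{H}_i(k)$$
    where $Y_i(k)$ is the spectrum value of the $i$-th frame spectrum signal in the third short-time spectrum signal, $E_i(k)$ is the excitation component of the $i$-th frame spectrum, and $\hat{H}_i(k)$ is the envelope value of the $i$-th frame spectrum.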
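  8. An audio signal processing apparatus, characterized in that the apparatus comprises:
    a first acquisition module, configured to acquire a first audio signal of a user singing a target song;
    an extraction module, configured to extract timbre information of the user from the first audio signal;
    a second acquisition module, configured to acquire pitch information of a standard audio signal of the target song; and
    a generation module, configured to generate a second audio signal of the target song according to the timbre information and the pitch information.
  9. The apparatus according to claim 8, characterized in that
    the extraction module is further configured to: perform framing on the first audio signal to obtain a framed first audio signal; perform windowing on the framed first audio signal and perform a short-time Fourier transform on the audio signal located within the window to obtain a first short-time spectrum signal; and extract a first spectrum envelope of the first audio signal from the first short-time spectrum signal, using the first spectrum envelope as the timbre information.
  10. The apparatus according to claim 8, characterized in that
    the second acquisition module is further configured to acquire the standard audio signal of the target song according to a song identifier of the target song, and to extract the pitch information of the standard audio signal from the standard audio signal; or,
    the second acquisition module is further configured to acquire, according to the song identifier of the target song, the pitch information of the standard audio signal of the target song from a correspondence between song identifiers and pitch information of standard audio signals.
  11. The apparatus according to claim 10, characterized in that
    the second acquisition module is further configured to: perform framing on the standard audio signal to obtain a framed second audio signal; perform windowing on the framed second audio signal and perform a short-time Fourier transform on the audio signal located within the window to obtain a second short-time spectrum signal; extract a second spectrum envelope of the standard audio signal from the second short-time spectrum signal; and generate an excitation spectrum of the standard audio signal according to the second short-time spectrum signal and the second spectrum envelope, using the excitation spectrum as the pitch information of the standard audio signal.
  12. The apparatus according to any one of claims 8 to 11, characterized in that the standard audio signal is an audio signal of a designated user singing the target song, and the designated user is the original singer of the target song or a singer whose pitch accuracy satisfies a condition.
  13. The apparatus according to any one of claims 8 to 11, characterized in that
    the generation module is further configured to synthesize the timbre information and the pitch information into a third short-time spectrum signal, and to perform an inverse Fourier transform on the third short-time spectrum signal to obtain the second audio signal of the target song.
  14. The apparatus according to claim 13, characterized in that
    the generation module is further configured to determine the third short-time spectrum signal through the following Formula 1, according to a second spectrum envelope corresponding to the timbre information and an excitation spectrum corresponding to the pitch information;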
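    Formula 1:
    $$Y_i(k) = E_i(k) \cdot \hat{H}_i(k)$$
    where $Y_i(k)$ is the spectrum value of the $i$-th frame spectrum signal in the third short-time spectrum signal, $E_i(k)$ is the excitation component of the $i$-th frame spectrum, and $\hat{H}_i(k)$ is the envelope value of the $i$-th frame spectrum.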
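  15. An audio signal processing apparatus, characterized by comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, wherein the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the audio processing method according to any one of claims 1 to 7.
  16. A storage medium, characterized in that the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, wherein the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the audio processing method according to any one of claims 1 to 7.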
PCT/CN2018/115928 2017-11-21 2018-11-16 Audio signal processing method, apparatus and storage medium WO2019101015A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/617,900 US10964300B2 (en) 2017-11-21 2018-11-16 Audio signal processing method and apparatus, and storage medium thereof
EP18881136.8A EP3614383A4 (en) 2017-11-21 2018-11-16 AUDIO DATA PROCESSING METHOD AND APPARATUS AND STORAGE MEDIUM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711168514.8 2017-11-21
CN201711168514.8A CN107863095A (zh) 2017-11-21 2017-11-21 Audio signal processing method, apparatus and storage medium

Publications (1)

Publication Number Publication Date
WO2019101015A1 (zh) 2019-05-31

Family

ID=61702429

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/115928 WO2019101015A1 (zh) 2017-11-21 2018-11-16 Audio signal processing method, apparatus and storage medium

Country Status (4)

Country Link
US (1) US10964300B2 (zh)
EP (1) EP3614383A4 (zh)
CN (1) CN107863095A (zh)
WO (1) WO2019101015A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020159607A1 (en) * 2001-04-26 2002-10-31 Ford Jeremy M. Method for using source content information to automatically optimize audio signal
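CN101645268A (zh) * 2009-08-19 2010-02-10 李宋 Computer real-time analysis system for singing and instrumental performance
CN105872253A (zh) * 2016-05-31 2016-08-17 腾讯科技(深圳)有限公司 Live broadcast sound processing method and mobile terminal
CN106652986A (zh) * 2016-12-08 2017-05-10 腾讯音乐娱乐(深圳)有限公司 Song audio splicing method and device
CN107863095A (zh) * 2017-11-21 2018-03-30 广州酷狗计算机科技有限公司 Audio signal processing method, apparatus and storage medium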

Also Published As

Publication number Publication date
US20200143779A1 (en) 2020-05-07
US10964300B2 (en) 2021-03-30
EP3614383A1 (en) 2020-02-26
EP3614383A4 (en) 2020-07-15
CN107863095A (zh) 2018-03-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18881136

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018881136

Country of ref document: EP

Effective date: 20191121

NENP Non-entry into the national phase

Ref country code: DE