WO2019233361A1 - Method and device for adjusting volume of music

Method and device for adjusting volume of music

Info

Publication number
WO2019233361A1
WO2019233361A1 (application PCT/CN2019/089758, CN2019089758W)
Authority
WO
WIPO (PCT)
Prior art keywords
music
noise
played
neural network
volume
Prior art date
Application number
PCT/CN2019/089758
Other languages
French (fr)
Chinese (zh)
Inventor
姚青山
秦宇
喻浩文
卢峰
Original Assignee
安克创新科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 安克创新科技股份有限公司
Publication of WO2019233361A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0316 Speech enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude
    • G10L 21/0324 Details of processing therefor
    • G10L 21/034 Automatic adjustment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L 25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Definitions

  • Embodiments of the present invention relate to the field of sound, and more particularly, to a method and device for adjusting volume of music.
  • Sound quality is a subjective evaluation of audio quality. Generally, sound quality is divided into dozens of indicators; volume (also called loudness) is one of the important ones.
  • The volume affects how well people receive the information in the music.
  • the volume setting is generally related to the ambient sound. For example, the volume of music in a noisy environment is generally higher than the volume of music in a quiet environment.
  • The current volume setting is mainly adjusted by the user manually, which adds operational complexity and affects the user experience.
  • In addition, some existing automatic volume adjustment technologies generally consider only environmental noise parameters, so their automatic adjustment ability is limited.
  • In fact, an individual user's volume preference is related to many factors, such as the type of music: people may set different volumes when listening to different styles of music, and different types of environmental noise affect the volume setting differently.
  • Other factors include personal preferences, personal hearing, audio playback device parameters, and so on.
  • A volume model must take all of these factors into account to achieve better performance.
  • Embodiments of the present invention provide a method and device for automatically adjusting the volume of music, which can adjust the volume of music based on deep learning, simplify user operations, and thereby improve the user experience.
  • In a first aspect, a method for adjusting the volume of music is provided, including: obtaining a time-domain waveform of the music to be played and a time-domain waveform of noise of the playback environment; using a pre-trained neural network to obtain a volume setting of the music to be played according to the two time-domain waveforms; and using the volume setting to adjust the volume of the music to be played.
  • the method further includes:
  • If the number of readjustment instructions from the specific user reaches a preset value, the volume adjusted by the specific user is used as a training sample, learning is performed on the basis of the parameters of the baseline model to obtain an updated model, and the updated model replaces the baseline model.
  • the pre-trained neural network includes a music style neural network, a noise category identification neural network, and a volume adjustment neural network.
  • the process of obtaining the volume setting of the music to be played includes:
  • the style vector of the music to be played, the category of the noise, the energy characteristics of the music to be played, and the energy characteristics of the noise are input to the volume adjustment neural network to obtain the volume setting of the music to be played.
  • a process of obtaining a style vector of the music to be played includes:
  • the characteristics of the music to be played are input to the music style neural network to obtain the style vector of the music to be played.
  • the process of obtaining the category of the noise includes:
  • the characteristics of the noise are input to the noise category identification neural network to obtain the category of the noise.
  • the energy characteristics of the music to be played include the average amplitude of the music to be played, and the process of obtaining the energy characteristics of the music to be played includes: calculating the absolute value of the amplitude of each point in the time-domain waveform of the music to be played, and dividing by the total number of points to obtain the average amplitude of the music to be played.
  • the energy characteristic of the noise includes an average amplitude of the noise
  • a process of obtaining the energy characteristic of the noise includes:
  • the absolute value of the amplitude of each point in the time domain waveform of the noise is calculated, and then divided by the total number of points to obtain the average amplitude of the noise.
  • Before using the music style neural network, the method further includes:
  • the music style neural network is obtained through training.
  • each music training data in the music training data set has a music style vector
  • the music style vector of the music training data is obtained in the following manner:
  • a music style vector of each music training data is determined according to the annotation matrix.
  • the determining of a music style vector of each music training data according to the annotation matrix includes: decomposing the annotation matrix into a product of a first matrix and a second matrix; and determining each row vector of the first matrix as the music style vector of the corresponding music training data.
  • Before using the noise category identification neural network, the method further includes:
  • the noise class identification neural network is obtained through training.
  • the time-domain waveform of the noise is collected by a pickup device of a user audio playback device.
  • the method further includes: playing the music to be played after the volume is adjusted.
  • In a second aspect, a device for adjusting the volume of music is provided; the device is configured to implement the steps of the method described in the first aspect or any implementation manner, and the device includes:
  • An acquisition module for acquiring a time-domain waveform of music to be played and a time-domain waveform of noise of a playback environment
  • a determining module configured to obtain a volume setting of the music to be played according to the time domain waveform of the music to be played and the time domain waveform of the noise by using a pre-trained neural network
  • An adjustment module is used to adjust the volume of the music to be played using the volume setting.
  • In a third aspect, a device for adjusting the volume of music is provided, which includes a memory, a processor, and a computer program stored on the memory and running on the processor. When the processor executes the computer program, the steps of the method described in the foregoing first aspect or any implementation manner are implemented.
  • In a fourth aspect, a computer storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the steps of the method described in the foregoing first aspect or any implementation manner are implemented.
  • The embodiment of the present invention uses a pre-trained neural network including a music style neural network, a noise category identification neural network, and a volume adjustment neural network. Because this network takes into account factors that affect the user's current volume preference, such as the category of the environmental noise and the style of the music, it can automatically adjust the volume of the music the user is about to play, which greatly simplifies the user's operation and improves the user experience. In addition, the model can be adjusted again according to a specific user's volume preference, and a volume adjustment model dedicated to that user can be obtained through online learning; this dedicated model can then be used to automatically set the volume of the music that the specific user wants to play.
  • FIG. 1 is a schematic flowchart of obtaining a music style vector of music training data according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a labeling matrix in an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a method for adjusting volume of music in an embodiment of the present invention
  • FIG. 4 is another schematic flowchart of a method for adjusting volume of music in an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of a user readjusting the volume on the basis of the automatically obtained volume setting according to an embodiment of the present invention
  • FIG. 6 is a schematic flowchart of obtaining a volume adjustment model dedicated to a specific user through online learning based on a baseline model in an embodiment of the present invention
  • FIG. 7 is a schematic flowchart of obtaining a volume adjustment model dedicated to a specific user in an embodiment of the present invention.
  • FIG. 8 is a schematic block diagram of a device for adjusting volume of music in an embodiment of the present invention.
  • FIG. 9 is another schematic block diagram of a device for adjusting volume of music in an embodiment of the present invention.
  • Deep learning is a machine learning method that uses deep neural networks to learn features of data with complex models, and intelligently organizes low-level features of data to form more advanced abstract forms. Because deep learning has strong feature extraction and modeling capabilities for complex data that is difficult to abstract and model manually, deep learning is an effective implementation method for tasks such as adaptive adjustment of sound quality that are difficult to model manually.
  • An embodiment of the present invention provides a pre-trained neural network, which includes a musical style neural network, a noise category identification neural network, and a volume adjustment neural network. Each will be explained below.
  • a musical style neural network is constructed based on deep learning.
  • the musical style neural network is trained based on the music training data set.
  • the music training data set includes a large amount of music training data, and a single music training data is described in detail below.
  • the music training data is music data, including the characteristics of the music training data, which can be used as the input of the neural network; it also includes the music style vector of the music training data, which can be used as the output of the neural network.
  • the original music waveform is a time-domain waveform
  • the time-domain waveform may be divided into frames, and feature extraction is performed on each frame to obtain the characteristics of the music training data.
  • For example, the short-time Fourier transform (STFT) can be used for feature extraction, and the extracted features can be Mel-frequency cepstral coefficients (MFCC).
  • the features obtained by feature extraction here and thereafter may be expressed as a feature tensor, for example, as an N-dimensional feature vector; or, the extracted features may also be expressed in other forms. It is not limited here.
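  • As an illustration only (this sketch is not part of the original disclosure), the following Python snippet shows one common way to divide a time-domain waveform into frames and extract a simple STFT magnitude feature per frame. The frame length, hop size, and Hann window are assumptions; MFCC features could likewise be computed with an audio library.

```python
import numpy as np

def frame_signal(waveform, frame_len=1024, hop=512):
    """Split a 1-D time-domain waveform into overlapping frames (assumed sizes)."""
    n_frames = 1 + (len(waveform) - frame_len) // hop
    return np.stack([waveform[i * hop: i * hop + frame_len] for i in range(n_frames)])

def stft_features(waveform, frame_len=1024, hop=512):
    """Per-frame magnitude spectrum: a simple STFT-based feature vector per frame."""
    frames = frame_signal(waveform, frame_len, hop)
    window = np.hanning(frame_len)                        # Hann window is an assumption
    return np.abs(np.fft.rfft(frames * window, axis=1))   # shape: (n_frames, frame_len//2 + 1)

# usage: features for a 3-second, 16 kHz mono waveform (random stand-in for decoded audio)
features = stft_features(np.random.randn(3 * 16000))      # shape: (92, 513)
```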
  • the music style vector of the music training data can be obtained by referring to the method shown in FIG. 1, and the process includes:
  • the style annotation information of different users may be the same or different.
  • some users may label it as “Folk Music”
  • some users may label it as “Popular”
  • some users may label it as “Folk Music” and “Beisheng”, and so on.
  • the number of different style annotations can be obtained. As an example, referring to FIG. 2, for "My Motherland”, the number of annotations of "Folk Music” is 12, the number of annotations of "Popular” is 3, and the number of annotations of "Beisheng” is 10.
  • a labeling matrix may be generated based on labeling information of a plurality of music training data.
  • the rows of the labeling matrix may represent labeling information of a certain music training data, for example, each row represents a "style label” of the corresponding music training data.
  • the columns of the labeling matrix represent styles. Referring to FIG. 2, a labeling matrix can be generated from the labeling information of "My Motherland", "Qilixiang", "Coral Sea", and "Ten Send Red Army".
  • FIG. 2 is only schematic: although only 4 pieces of music training data and 4 styles are shown therein, the present invention is not limited thereto, and the annotation matrix may be obtained based on a larger amount of music training data and a larger number of styles.
  • a music style vector can be extracted from the annotation matrix.
  • a vector corresponding to a row of music training data in the labeling matrix may be used as its music style vector.
  • For example, for "My Motherland", the music style vector is [12, 3, 0, 10].
  • a vector corresponding to a row of music training data in the labeling matrix may be normalized as its music style vector.
  • For "My Motherland", the normalized music style vector is [12/25, 3/25, 0, 10/25].
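  • As a worked illustration of the annotation matrix (not part of the original disclosure), the sketch below builds the matrix row by row and normalizes the "My Motherland" row. Only that row's counts come from the example above; the other rows and the fourth style name are hypothetical placeholders.

```python
import numpy as np

# rows = songs, columns = style annotation counts
# styles: ["Folk Music", "Popular", "OtherStyle (hypothetical)", "Beisheng"]
annotation_matrix = np.array([
    [12, 3, 0, 10],   # "My Motherland" (counts from the example above)
    [ 0, 9, 5,  0],   # "Qilixiang"          (hypothetical counts)
    [ 1, 7, 6,  0],   # "Coral Sea"          (hypothetical counts)
    [11, 2, 0,  8],   # "Ten Send Red Army"  (hypothetical counts)
], dtype=float)

raw_style_vector = annotation_matrix[0]                               # [12, 3, 0, 10]
normalized_style_vector = raw_style_vector / raw_style_vector.sum()  # [12/25, 3/25, 0, 10/25]
```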
  • Alternatively, a music style vector can be extracted from the annotation matrix using algorithms that include, but are not limited to, matrix decomposition, factorization machines, or word vectorization algorithms. The music style vector obtained in this way has a smaller dimension; that is, a denser music style vector can be obtained.
  • the vectors of each row in the labeling matrix are sparse vectors.
  • some of the values are positive integers, and the rest are 0.
  • the labeling matrix is also a sparse matrix.
  • the labeling matrix may be decomposed into a first matrix multiplied by a second matrix.
  • the rows of the first matrix represent music style vectors corresponding to the music training data, which can be regarded as compression of style labels in the form of sparse vectors.
  • the music style vector of "My Motherland" is [1.2, 3.7, 3.1]
  • the music style vector of "Ten Send Red Army" is [1.8, 4.0, 4.1].
  • the cosine similarity between the two vectors is high, so it can be determined that "My Motherland" and "Ten Send Red Army" are similar music.
  • the second matrix represents the weight of each item of the first matrix (the specific values of the elements of the second matrix are not shown in FIG. 2). Specifically, each column of the second matrix corresponds to a music style, and the values in that column represent the weights relating that style to the elements of the first matrix.
  • FIG. 2 is only schematic. Although it shows that the dimension of the number of columns of the labeling matrix is 4 and the dimension of the obtained music style vector is 3, the present invention is not limited thereto. For example, in practical applications, the dimensions of matrices and vectors can be larger.
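  • The decomposition described above can be realized, for example, with non-negative matrix factorization; the following sketch (an assumption, not the patent's prescribed algorithm) factors an annotation matrix into a dense first matrix and a weight (second) matrix, then compares two songs by cosine similarity. The 3-dimensional size and the resulting numbers are illustrative only.

```python
import numpy as np
from sklearn.decomposition import NMF

# (songs x styles) count matrix from the previous sketch
annotation_matrix = np.array([[12, 3, 0, 10], [0, 9, 5, 0], [1, 7, 6, 0], [11, 2, 0, 8]], dtype=float)

model = NMF(n_components=3, init="random", random_state=0, max_iter=500)
first_matrix = model.fit_transform(annotation_matrix)   # rows: dense music style vectors
second_matrix = model.components_                        # weights of each style per component

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# a high similarity suggests two songs share a style, as in the
# "My Motherland" / "Ten Send Red Army" example above
similarity = cosine_similarity(first_matrix[0], first_matrix[3])
```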
  • the music style vector of each music training data can be obtained. Taking the features as input and the music style vector as output, the music style neural network is trained until convergence, and then a trained music style neural network can be obtained.
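  • A minimal sketch of such training, assuming fixed-size per-frame features and a mean-squared-error fit of the style vector (the layer sizes, feature dimension, and optimizer settings are assumptions, not values from the patent):

```python
import torch
import torch.nn as nn

feature_dim, style_dim = 513, 3          # assumed sizes, matching the earlier sketches

music_style_net = nn.Sequential(          # features in, music style vector out
    nn.Linear(feature_dim, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, style_dim),
)
optimizer = torch.optim.Adam(music_style_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(features, style_vectors):
    """One gradient step; features: (batch, feature_dim), style_vectors: (batch, style_dim)."""
    optimizer.zero_grad()
    loss = loss_fn(music_style_net(features), style_vectors)
    loss.backward()
    optimizer.step()
    return loss.item()

# usage with random stand-in data; in practice this loop runs until convergence
train_step(torch.randn(32, feature_dim), torch.rand(32, style_dim))
```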
  • a noise class recognition neural network is also constructed based on deep learning.
  • the noise class recognition neural network is trained based on the noise training data set.
  • the noise training data set includes a large amount of noise training data, and a single noise training data is described in detail below.
  • the noise training data is noise data, including the characteristics of the noise training data, which can be used as the input of the neural network; it also includes the noise category of the noise training data, which can be used as the output of the neural network.
  • the original noise waveform is a time-domain waveform
  • the time-domain waveform may be divided into frames, and feature extraction is performed on each frame to obtain the characteristics of the noise training data.
  • Feature extraction may be performed through the short-time Fourier transform (STFT), and the extracted features may be Mel-frequency cepstral coefficients (MFCC).
  • each noise training data may be labeled with a noise category to which it belongs.
  • Noise categories may include, but are not limited to, airports, pedestrian streets, buses, shopping malls, restaurants, and the like.
  • the method of marking is not limited in the present invention. For example, "000" may be used to indicate an airport, "001" a pedestrian street, and "010" a bus, etc.; other marking methods may also be used, which are not listed here one by one.
  • one noise training data may be marked by one user or multiple users, and the noise categories marked by different users may be the same or different.
  • The category marked most often can be determined as the noise category to which that noise training data belongs. For example, suppose noise training data A is labeled as "000" by m1 users, as "001" by m2 users, and as "010" by m3 users. If m1 > m2 and m1 > m3, it can be determined that the noise category to which noise training data A belongs is "000".
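  • A small helper illustrating this majority-vote rule (illustrative only; the category codes follow the "000"/"001"/"010" example above):

```python
from collections import Counter

def resolve_noise_label(user_labels):
    """Return the category chosen by the most users (the majority-vote rule above)."""
    return Counter(user_labels).most_common(1)[0][0]

# m1 = 5 users marked "000" (airport), m2 = 2 marked "001", m3 = 1 marked "010"
resolve_noise_label(["000"] * 5 + ["001"] * 2 + ["010"])   # -> "000"
```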
  • Taking the characteristics of the noise training data as input and the noise category as output, the noise category recognition neural network is trained until convergence, and the trained noise category recognition neural network is obtained.
  • a volume adjustment neural network is also constructed based on deep learning.
  • the volume adjustment neural network is obtained by training according to a training data set.
  • the training data set includes a large amount of training data, and the training data set may be a user behavior set, such as collecting data of multiple users listening to music in various environments.
  • the single training data is explained in detail below.
  • When a user listens to music in an environment and sets the volume, the corresponding data can be acquired as one piece of training data.
  • the time domain waveform of the music can be obtained according to the music being played by the user
  • the time domain waveform of the ambient noise can be obtained through the pickup device of the playback terminal used by the user
  • the user's volume setting can be obtained.
  • the acquiring the time-domain waveform of the music may include: acquiring the time-domain waveform of the music from a client used by the user. Alternatively, it may include: acquiring music information of the music from a client used by the user, and acquiring the time-domain waveform of the music from a music database on the server according to the music information, so that the transmission amount can be reduced.
  • the music information may include at least one of a song title, a singer, an album, and the like. It can be understood that the music information described in the embodiment of the present invention is only exemplary, and it may include other information, such as duration, format, etc., which are not listed here one by one.
  • The pickup device may be, for example, a headset microphone or a mobile phone microphone, which is not limited here.
  • the volume may be expressed as a percentage, or the volume may also be expressed in other manners, which is not limited in the present invention.
  • the characteristics of the music included in the training data can be obtained based on the time-domain waveform of the music included in the training data.
  • the time-domain waveform of the music can be divided into frames, and feature extraction is performed on each frame to obtain the characteristics of the music.
  • the characteristics of the music are input to the aforementioned music style neural network, and a style vector of the music can be obtained.
  • If the style vectors obtained for different frames differ, the style vectors of these frames may be averaged, and the averaged style vector may be used as the style vector of the music.
  • the "average” used herein is a result value obtained by averaging a plurality of style vector items (or values).
  • the "average” can also obtain the result value through other calculation methods, such as a weighted average, in which the weights of different items can be equal or different, and the embodiment of the present invention does not limit the average method.
  • Similarly, the characteristics of the noise can be obtained based on the time-domain waveform of the noise included in the training data. Specifically, the time-domain waveform of the noise can be divided into frames, and feature extraction is performed on each frame to obtain the characteristics of the noise. Then, the characteristics of the noise are input to the aforementioned noise category identification neural network to obtain the category of the noise. For example, if the categories obtained for different frames differ, the categories of these frames can be counted, and the category with the largest count is used as the category of the noise.
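  • The two aggregation rules just described (averaging per-frame style vectors, and taking the most frequent per-frame noise category) could look like the following sketch; uniform frame weights and the stand-in values are assumptions:

```python
import numpy as np
from collections import Counter

def aggregate_style(frame_style_vectors, weights=None):
    """Average the per-frame style vectors (optionally weighted) into a single vector."""
    return np.average(np.asarray(frame_style_vectors), axis=0, weights=weights)

def aggregate_noise_category(frame_categories):
    """Pick the noise category that occurs most often across frames."""
    return Counter(frame_categories).most_common(1)[0][0]

# usage with stand-in per-frame outputs
style_vector = aggregate_style([[0.5, 0.1, 0.4], [0.6, 0.1, 0.3], [0.4, 0.2, 0.4]])
noise_category = aggregate_noise_category(["bus", "bus", "mall", "bus"])   # -> "bus"
```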
  • Music energy characteristics can be obtained based on the time-domain waveform of the music included in the training data.
  • the embodiment of the present invention does not limit the manner of calculating the energy characteristics of music.
  • the energy characteristics of music can be calculated according to the amplitude of each point of the time-domain waveform of the music.
  • the music energy feature may include the average amplitude of music.
  • the absolute value of the amplitude of each point in the time-domain waveform of the music may be calculated, and then divided by the total number of points to obtain the average music amplitude. That is, the arithmetic mean of the absolute amplitudes of all points in the time-domain waveform of the music can be used as the music energy feature.
  • the geometric mean or weighted mean of the amplitudes of all points in the time domain waveform of the music may be used as the music energy feature.
  • the amplitudes of all points in the time-domain waveform of the music may be taken as natural logarithms and then arithmetically averaged as the music energy feature.
  • the energy characteristics of music can also be obtained by other calculation methods, which is not limited in the present invention.
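  • For concreteness, a sketch of the average-amplitude feature and two of the alternatives mentioned above (the small epsilon added before the logarithm is an assumption to avoid log(0)):

```python
import numpy as np

def average_amplitude(waveform):
    """Arithmetic mean of |amplitude| over all points: the average amplitude above."""
    return float(np.mean(np.abs(waveform)))

def log_mean_amplitude(waveform, eps=1e-12):
    """Natural log of |amplitude|, then arithmetically averaged (alternative feature)."""
    return float(np.mean(np.log(np.abs(waveform) + eps)))   # eps is an assumption

def weighted_mean_amplitude(waveform, weights):
    """Weighted mean of |amplitude| (another alternative mentioned above)."""
    return float(np.average(np.abs(waveform), weights=weights))

music_energy = average_amplitude(np.random.randn(16000))     # stand-in waveform
```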
  • the noise energy characteristics can be obtained based on the time-domain waveform of the noise included in the training data.
  • the embodiment of the present invention does not limit the manner of calculating the noise energy characteristics.
  • the noise energy characteristics may be calculated according to the amplitude of each point of the time domain waveform of the noise.
  • the noise energy characteristic may include the average amplitude of the noise.
  • the absolute value of the amplitude of each point in the time domain waveform of the noise may be calculated, and then divided by the total number of points to obtain the average amplitude of the noise. That is, the arithmetic mean of the absolute amplitudes of all points in the time domain waveform of the noise can be used as the noise energy feature.
  • a geometric mean or a weighted mean of the amplitudes of all points in the time domain waveform of the noise may be used as the noise energy feature.
  • the amplitudes of all points in the time domain waveform of the noise may be taken as natural logarithms and then arithmetically averaged as the noise energy characteristic.
  • the noise energy characteristics can also be obtained by other calculation methods, which is not limited in the present invention.
  • In this way, for each training data, the style vector of the music, the category of the noise, the music energy feature, and the noise energy feature can be obtained, together with the user's volume setting.
  • Taking the style vector of the music, the category of the noise, the music energy feature, and the noise energy feature as input and the user's volume setting as output, the volume adjustment neural network is trained until convergence, and the trained volume adjustment neural network is obtained.
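  • A minimal sketch of this training step, assuming the noise category is one-hot encoded and the volume setting is expressed as a 0-1 fraction of the maximum (the layer sizes, category count, and encoding are assumptions):

```python
import torch
import torch.nn as nn

style_dim, n_noise_categories = 3, 8               # assumed sizes
input_dim = style_dim + n_noise_categories + 2      # + music energy + noise energy

volume_net = nn.Sequential(
    nn.Linear(input_dim, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),                 # volume as a 0-1 fraction (e.g. 0.65 = 65%)
)
optimizer = torch.optim.Adam(volume_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def make_input(style_vec, noise_category, music_energy, noise_energy):
    """Concatenate the four inputs described above into one feature vector."""
    noise_one_hot = torch.zeros(n_noise_categories)
    noise_one_hot[noise_category] = 1.0
    return torch.cat([style_vec, noise_one_hot, torch.tensor([music_energy, noise_energy])])

def train_step(inputs, user_volumes):
    optimizer.zero_grad()
    loss = loss_fn(volume_net(inputs), user_volumes)
    loss.backward()
    optimizer.step()
    return loss.item()

# usage with a single stand-in sample whose recorded user volume was 65%
x = make_input(torch.rand(style_dim), noise_category=2, music_energy=0.12, noise_energy=0.05)
train_step(x.unsqueeze(0), torch.tensor([[0.65]]))
```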
  • An embodiment of the present invention provides a method for adjusting volume of music. As shown in FIG. 3, a flowchart of the method includes:
  • the pre-trained neural network may include a musical style neural network, a noise category identification neural network, and a volume adjustment neural network.
  • Specifically, the music style neural network, the noise category identification neural network, and the volume adjustment neural network may be used to obtain the volume setting of the music to be played.
  • The music style neural network, the noise category identification neural network, and the volume adjustment neural network can be the aforementioned trained networks, respectively. It is understandable that the aforementioned training process is generally performed on the server side (i.e., the cloud).
  • the method shown in FIG. 3 may be executed by a server (that is, the cloud), or may be executed by a client.
  • the client can directly obtain the time domain waveform of the music to be played. If the music to be played is online music, the client can obtain the time domain waveform of the music to be played from the server. In addition, the time-domain waveform of the noise in the environment can be obtained by the pickup device of the client. Before S220, the client can obtain the pre-trained music style neural network, noise category identification neural network, and volume adjustment neural network from the server.
  • the server receives the music to be played from the client to obtain the time domain waveform of the music to be played.
  • the server receives the music information of the music to be played from the client.
  • the music information here may include at least one of the song name, singer, album, and so on. The server acquires the music to be played from the music database on the server side according to the music information, thereby obtaining the time-domain waveform of the music to be played.
  • the server can also receive the time-domain waveform of the ambient noise collected by the client's pickup device from the client.
  • S220 may include:
  • S2201: The time-domain waveform of the music to be played can be divided into frames, and feature extraction is performed on each frame to obtain the characteristics of the music to be played. Then, the characteristics of the music to be played can be input to the music style neural network to obtain the style vector of the music to be played.
  • the method for feature extraction may include, but is not limited to, STFT, MFCC, and the like.
  • the extracted features may be amplitude spectrum, log spectrum, energy spectrum, etc., which is not limited in the present invention.
  • S2202: The time-domain waveform of the noise can be divided into frames, and feature extraction is performed on each frame to obtain the characteristics of the noise. The characteristics of the noise can then be input to the noise category identification neural network to obtain the category of the noise.
  • the method for feature extraction may include, but is not limited to, STFT, MFCC, and the like.
  • the extracted features may be amplitude spectrum, log spectrum, energy spectrum, etc., which is not limited in the present invention.
  • S2203: Obtain an energy characteristic of the music to be played according to the time-domain waveform of the music to be played.
  • the energy characteristics of the music may include the average amplitude of the music.
  • the absolute value of the amplitude of each point of the time-domain waveform of the music to be played can be calculated, and then divided by the total number of points to obtain the average amplitude of the music to be played.
  • a geometric average or a weighted average of the amplitudes of all points of the time-domain waveform of the music to be played may be used as the energy feature of the music to be played.
  • the amplitudes of all points of the time-domain waveform of the music to be played may be taken as natural logarithms and then arithmetically averaged as the energy characteristic of the music to be played.
  • S2204: Obtain an energy characteristic of the noise according to the time-domain waveform of the noise. The energy characteristics of the noise may include the average amplitude of the noise.
  • the absolute value of the amplitude of each point in the time domain waveform of the noise can be calculated, and then divided by the total number of points to obtain the average amplitude of the noise.
  • a geometric average or a weighted average of the amplitudes of all points of the time domain waveform of the noise may be used as the energy characteristic of the noise.
  • the amplitudes of all points in the time domain waveform of the noise may be taken as natural logarithms and then arithmetically averaged as the energy characteristic of the noise.
  • the embodiment of the present invention does not limit the execution order of S2201 to S2204.
  • the four steps S2201-S2204 can be executed in parallel.
  • S2201 and S2202 can be executed sequentially or in parallel, and then S2203 and S2204 can be executed sequentially or in parallel.
  • S2204 and S2203 can be executed sequentially or in parallel, and then S2201 and S2202 can be executed sequentially or in parallel.
  • S2201 and S2203 may be executed sequentially or in parallel, and then S2202 and S2204 may be executed sequentially or in parallel.
  • S2201-S2204 can be executed in any order, and no longer listed here.
  • S2205: Input the style vector of the music to be played, the category of the noise, the energy characteristics of the music to be played, and the energy characteristics of the noise to the volume adjustment neural network to obtain the volume setting of the music to be played.
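  • Pulling the earlier sketches together, an illustrative (non-normative) end-to-end pipeline mirroring S2201-S2205 might look as follows; it reuses the helper functions and networks defined in the sketches above, and the stand-in noise classifier defined here is an assumption:

```python
import torch

# stand-in noise classifier (assumed): per-frame features -> one logit per noise category
noise_net = torch.nn.Sequential(torch.nn.Linear(513, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8))

def set_volume_for_playback(music_waveform, noise_waveform):
    """Illustrative pipeline: waveforms in, volume setting (0-1 fraction) out."""
    # S2201: frame the music, extract features, get per-frame style vectors, average them
    music_feats = torch.as_tensor(stft_features(music_waveform), dtype=torch.float32)
    with torch.no_grad():
        frame_styles = music_style_net(music_feats)
    style_vec = torch.as_tensor(aggregate_style(frame_styles.numpy()), dtype=torch.float32)

    # S2202: frame the noise, extract features, take the majority noise category
    noise_feats = torch.as_tensor(stft_features(noise_waveform), dtype=torch.float32)
    with torch.no_grad():
        frame_cats = noise_net(noise_feats).argmax(dim=1)
    noise_category = aggregate_noise_category(frame_cats.tolist())

    # S2203 / S2204: energy features of the music and of the noise
    music_energy = average_amplitude(music_waveform)
    noise_energy = average_amplitude(noise_waveform)

    # S2205: feed everything to the volume adjustment neural network
    x = make_input(style_vec, noise_category, music_energy, noise_energy).unsqueeze(0)
    with torch.no_grad():
        return float(volume_net(x))
```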
  • In this way, the embodiment of the present invention adopts a pre-trained neural network including a music style neural network, a noise category identification neural network, and a volume adjustment neural network, which considers factors such as the environmental noise category and the music style that influence the user's current volume preference. It can therefore automatically adjust the volume of the user's music to be played, which greatly simplifies the user's operation and improves the user experience.
  • the trained volume adjustment neural network may be referred to as a volume adjustment baseline neural network or may be referred to as a volume adjustment baseline model.
  • the user's preferences can be considered, and the volume adjustment neural network for specific users can be obtained through online learning.
  • the volume adjustment neural network in S2205 may be a volume adjustment baseline model, and in S230, the volume setting determined by S2205 may be used to adjust the volume of the music to be played. And, after S230, the adjusted volume can be used to play the music to be played.
  • If the volume setting obtained in S230 is satisfactory to the user, it can be used to play the music to be played, and the above-mentioned volume adjustment baseline model is also a dedicated volume adjustment model suitable for that user.
  • Alternatively, the volume obtained in S230 may not be satisfactory to the user; in that case, after S230, the user may adjust the volume again on this basis to obtain the desired volume.
  • This process can be shown in Figure 5.
  • a volume adjustment model dedicated to a specific user can be obtained through online learning based on a user's readjustment based on a pre-trained neural network.
  • the process may include:
  • a pre-trained neural network is used as a baseline model.
  • the corresponding volume setting can be obtained using the baseline model.
  • the baseline model can be updated online using the specific user's readjustment instructions (that is, the user's feedback on the volume setting) until the user gives little or no feedback, and the model finally obtained in S320 can be determined to be the volume adjustment model dedicated to that specific user.
  • the model is a volume adjustment model dedicated to a specific user.
  • the dedicated model can be used to automatically set the volume for the music played by a specific user without manual adjustment by the user, thereby improving the user experience.
  • The number of readjustments performed by the specific user being less than a preset value may mean that the frequency of readjustment by the specific user is less than a preset frequency.
  • For example, the preset frequency may be equal to N0/N; that is, among N pieces of music played, the number of pieces that the specific user has readjusted is less than N0.
  • a volume adjustment model dedicated to a specific user can be obtained through online learning based on the volume adjustment baseline model and according to readjustment by a specific user. After that, the volume adjustment model dedicated to a specific user can be used to automatically set the volume of the music to be played that the specific user wants to play, reducing user operations and improving the user experience.
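  • A minimal sketch of this online-learning loop, assuming the user's readjusted volumes are collected as (input, volume) samples and that the dedicated model is a fine-tuned copy of the baseline; the learning rate, epoch count, and stopping threshold are assumptions:

```python
import copy
import torch

def online_update(baseline_net, user_samples, lr=1e-4, epochs=5):
    """Fine-tune a copy of the baseline on the specific user's readjusted volumes."""
    user_net = copy.deepcopy(baseline_net)            # start from the baseline parameters
    optimizer = torch.optim.Adam(user_net.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for inputs, adjusted_volume in user_samples:  # inputs: (1, input_dim); volume: (1, 1)
            optimizer.zero_grad()
            loss = loss_fn(user_net(inputs), adjusted_volume)
            loss.backward()
            optimizer.step()
    return user_net

def keep_learning(n_readjusted, n_played, preset_frequency=0.1):
    """Continue online learning while the readjustment frequency (N0/N) is not yet below the preset value."""
    return n_played > 0 and (n_readjusted / n_played) >= preset_frequency
```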
  • FIG. 8 is a schematic block diagram of a device for adjusting volume of music according to an embodiment of the present invention.
  • the device 30 shown in FIG. 8 includes an acquisition module 310, a determination module 320, and an adjustment module 330.
  • the obtaining module 310 is configured to obtain a time-domain waveform of music to be played and a time-domain waveform of noise of a playback environment.
  • the determining module 320 is configured to obtain a volume setting of the music to be played according to a time domain waveform of the music to be played and a time domain waveform of the noise by using a pre-trained neural network.
  • the adjusting module 330 is configured to use the volume setting to adjust the volume of the music to be played.
  • the device 30 shown in FIG. 8 may be a server side (that is, the cloud).
  • the device 30 may further include a training module for obtaining the pre-trained neural network through training based on the training data set.
  • the device 30 may include a training module for obtaining a volume adjustment neural network dedicated to the specific user through online learning.
  • Specifically, the pre-trained neural network may be used as a baseline model, and the following steps are repeated until the number of readjustment instructions from the specific user is less than a preset value: for the music being played, use the baseline model to obtain the corresponding volume setting; obtain the specific user's readjustment instruction for that volume setting; and, if the number of readjustment instructions from the specific user reaches the preset value, use the volumes adjusted by the specific user as training samples, learn on the basis of the baseline model to obtain an updated model, and replace the baseline model with the updated model. The updated model finally obtained is the volume adjustment neural network dedicated to the specific user.
  • the pre-trained neural network includes: a musical style neural network, a noise category identification neural network, and a volume adjustment neural network.
  • The determining module 320 may be specifically configured to use the music style neural network, the noise category identification neural network, and the volume adjustment neural network to obtain the volume setting of the music to be played according to the time-domain waveform of the music to be played and the time-domain waveform of the noise.
  • the determination module 320 may include a style vector determination unit, a noise category determination unit, a music energy feature determination unit, a noise energy feature determination unit, and a volume determination unit.
  • a style vector determining unit is configured to obtain a style vector of the music to be played according to a time-domain waveform of the music to be played by using the music style neural network.
  • The noise category determination unit is configured to use the noise category identification neural network to obtain the category of the noise according to the time-domain waveform of the noise.
  • the music energy characteristic determining unit is configured to obtain an energy characteristic of the music to be played according to a time-domain waveform of the music to be played.
  • the noise energy characteristic determining unit is configured to obtain an energy characteristic of the noise according to a time-domain waveform of the noise.
  • The volume determining unit is configured to input the style vector of the music to be played, the category of the noise, the energy characteristics of the music to be played, and the energy characteristics of the noise to the volume adjustment neural network to obtain the volume setting of the music to be played.
  • The style vector determining unit is specifically configured to: divide the time-domain waveform of the music to be played into frames, and extract features from each frame to obtain the characteristics of the music to be played; and input the characteristics of the music to be played to the music style neural network to obtain the style vector of the music to be played.
  • The noise category determination unit is specifically configured to: divide the time-domain waveform of the noise into frames, and extract features from each frame to obtain the characteristics of the noise; and input the characteristics of the noise to the noise category identification neural network to obtain the category of the noise.
  • the energy characteristic of the music to be played includes the average amplitude of the music to be played.
  • The music energy characteristic determining unit is specifically configured to calculate the absolute value of the amplitude of each point of the time-domain waveform of the music to be played, and divide by the total number of points to obtain the energy characteristics of the music to be played.
  • the energy characteristic of the noise includes an average amplitude of the noise
  • The noise energy characteristic determining unit is specifically configured to calculate the absolute value of the amplitude of each point in the time-domain waveform of the noise, and divide by the total number of points to obtain the energy characteristics of the noise.
  • the device 30 further includes a training module, configured to obtain the music style neural network through training based on the music training data set.
  • each music training data in the music training data set has a music style vector.
  • The training module obtains the music style vector of the music training data in the following way: acquiring style annotation information of a plurality of music training data from a large number of users, generating an annotation matrix based on the style annotation information, and determining the music style vector of each music training data according to the annotation matrix.
  • the labeling matrix is decomposed into a product of a first matrix and a second matrix; and each row vector of the first matrix is determined as a music style vector of corresponding music training data.
  • the device 30 further includes a training module, configured to obtain the noise category identification neural network through training based on the noise training data set.
  • the time-domain waveform of the noise acquired by the acquiring module 310 is acquired by a pickup device of the client.
  • the device 30 further includes a playback module for playing the music to be played after the volume is adjusted.
  • the device 30 shown in FIG. 8 can be used to implement the foregoing method for adjusting volume of music. To avoid repetition, details are not described herein again.
  • an embodiment of the present invention further provides another device for adjusting volume of music, which includes a memory, a processor, and a computer program stored on the memory and running on the processor. The steps of the method shown previously are carried out when the program is executed.
  • the processor may obtain the time domain waveform of the music to be played and the time domain waveform of the noise of the playback environment; according to the time domain waveform of the music to be played and the time domain waveform of the noise, use a pre-trained neural network To obtain the volume setting of the music to be played; use the volume setting to adjust the volume of the music to be played.
  • the pre-trained neural network includes a music style neural network, a noise category identification neural network, and a volume adjustment neural network.
  • the processor can also learn online to obtain a volume-adjusting neural network dedicated to a specific user.
  • The device for adjusting the volume of music in the embodiment of the present invention may include one or more processors, one or more memories, input devices, and output devices, and these components are interconnected through a bus system and/or another form of connection mechanism. It should be noted that the device may also have other components and structures as required.
  • the processor may be a central processing unit (CPU) or other form of processing unit having data processing capabilities and / or instruction execution capabilities, and may control other components in the device to perform desired functions.
  • the memory may include one or more computer program products, and the computer program product may include various forms of computer-readable storage media, such as volatile memory and / or non-volatile memory.
  • the volatile memory may include, for example, a random access memory (RAM) and / or a cache memory.
  • the non-volatile memory may include, for example, a read-only memory (ROM), a hard disk, a flash memory, and the like.
  • One or more computer program instructions may be stored on the computer-readable storage medium, and the processor may run the program instructions to implement the client functions (implemented by the processor) in the embodiments of the present invention described below, and/or other desired functions.
  • Various application programs and various data can also be stored in the computer-readable storage medium.
  • the input device may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
  • the output device may output various information (for example, images or sounds) to the outside (for example, a user), and may include one or more of a display, a speaker, and the like.
  • the input device / output device may be an external device and communicate with the processor through a wired or wireless manner.
  • an embodiment of the present invention also provides a computer storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the steps of the aforementioned method for adjusting volume can be implemented.
  • the computer storage medium is a computer-readable storage medium.
  • In summary, the embodiment of the present invention uses a pre-trained neural network including a music style neural network, a noise category identification neural network, and a volume adjustment neural network. Because this network takes into account factors that affect the user's current volume preference, such as the category of the environmental noise and the style of the music, it can automatically adjust the volume of the music the user is about to play, which greatly simplifies the user's operation and improves the user experience. In addition, the model can be adjusted again according to a specific user's volume preference, and a volume adjustment model dedicated to that user can be obtained through online learning, so that this dedicated model can be used to automatically set the volume of the music that the specific user wants to play.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the unit is only a logical function division.
  • For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in various embodiments of the present invention.
  • The foregoing storage media include: USB flash drives, mobile hard disks, read-only memories (ROM), random access memories (RAM), magnetic disks, optical discs, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

A method and device for adjusting a volume of music. The method comprises: obtaining a time domain waveform of music to be played and a time domain waveform of noise in a playback environment (S210); using, according to the time domain waveform of the music and the time domain waveform of the noise, a pre-trained neural network to obtain volume settings for the music (S220); and using the volume settings to adjust a volume of the music (S230). The invention employs a pre-trained neural network comprising a music style neural network, a noise class recognition neural network and a volume adjustment neural network, and takes into consideration factors, such as an ambient noise class and a music style, which influence a current user volume preference, such that a volume of music to be played by a user can be automatically adjusted, thereby maximally simplifying an operation procedure for the user, and improving user experience.

Description

Method and device for adjusting the volume of music
This application claims priority to the Chinese invention patent application filed on June 5, 2018 with application number 201810583114.1 and entitled "Method and Device for Adjusting Volume of Music".
Technical Field
Embodiments of the present invention relate to the field of sound, and more particularly, to a method and device for adjusting the volume of music.
Background
Sound quality is a subjective evaluation of audio quality. Generally, sound quality is divided into dozens of indicators; volume (also called loudness) is one of the important ones. The volume affects how well people receive the information in the music. The volume setting is generally related to the ambient sound; for example, the volume of music in a noisy environment is generally higher than in a quiet environment.
At present, volume is mainly adjusted by the user manually, which adds operational complexity and affects the user experience. In addition, some existing automatic volume adjustment technologies generally consider only environmental noise parameters, so their automatic adjustment ability is limited. In fact, an individual user's volume preference is related to many factors, such as the type of music: people may set different volumes when listening to different styles of music, and different types of environmental noise affect the volume setting differently. Other factors include personal preferences, personal hearing, audio playback device parameters, and so on. A volume model must take all of these factors into account to achieve better performance.
Summary of the Invention
Embodiments of the present invention provide a method and device for automatically adjusting the volume of music, which can adjust the volume of music based on deep learning and simplify user operations, thereby improving the user experience.
In a first aspect, a method for adjusting the volume of music is provided, including:
obtaining a time-domain waveform of music to be played and a time-domain waveform of noise of the playback environment;
obtaining a volume setting of the music to be played according to the time-domain waveform of the music to be played and the time-domain waveform of the noise by using a pre-trained neural network; and
using the volume setting to adjust the volume of the music to be played.
In an implementation manner of the present invention, the method further includes:
using the pre-trained neural network as a baseline model;
repeating the following steps until the number of readjustment instructions from a specific user is less than a preset value:
for the music being played, using the baseline model to obtain a corresponding volume setting;
obtaining the specific user's readjustment instruction for the corresponding volume setting; and
if the number of readjustment instructions from the specific user reaches the preset value, using the volume adjusted by the specific user as training samples, learning on the basis of the parameters of the baseline model to obtain an updated model, and replacing the baseline model with the updated model.
In an implementation manner of the present invention, the pre-trained neural network includes a music style neural network, a noise category identification neural network, and a volume adjustment neural network.
In an implementation manner of the present invention, the process of obtaining the volume setting of the music to be played includes:
obtaining a style vector of the music to be played according to the time-domain waveform of the music to be played by using the music style neural network;
obtaining a category of the noise according to the time-domain waveform of the noise by using the noise category identification neural network;
obtaining an energy characteristic of the music to be played according to the time-domain waveform of the music to be played;
obtaining an energy characteristic of the noise according to the time-domain waveform of the noise; and
inputting the style vector of the music to be played, the category of the noise, the energy characteristic of the music to be played, and the energy characteristic of the noise to the volume adjustment neural network to obtain the volume setting of the music to be played.
In an implementation manner of the present invention, the process of obtaining the style vector of the music to be played includes:
dividing the time-domain waveform of the music to be played into frames, and performing feature extraction on each frame to obtain the characteristics of the music to be played; and
inputting the characteristics of the music to be played to the music style neural network to obtain the style vector of the music to be played.
In an implementation manner of the present invention, the process of obtaining the category of the noise includes:
dividing the time-domain waveform of the noise into frames, and performing feature extraction on each frame to obtain the characteristics of the noise; and
inputting the characteristics of the noise to the noise category identification neural network to obtain the category of the noise.
In an implementation manner of the present invention, the energy characteristic of the music to be played includes the average amplitude of the music to be played, and the process of obtaining the energy characteristic of the music to be played includes:
calculating the absolute value of the amplitude of each point of the time-domain waveform of the music to be played, and dividing by the total number of points to obtain the average amplitude of the music to be played.
In an implementation manner of the present invention, the energy characteristic of the noise includes the average amplitude of the noise, and the process of obtaining the energy characteristic of the noise includes:
calculating the absolute value of the amplitude of each point of the time-domain waveform of the noise, and dividing by the total number of points to obtain the average amplitude of the noise.
在本发明的一种实现方式中,在使用音乐风格神经网络之前,还包括:In an implementation manner of the present invention, before using a musical style neural network, the method further includes:
基于音乐训练数据集,通过训练得到所述音乐风格神经网络。Based on the music training data set, the music style neural network is obtained through training.
在本发明的一种实现方式中,所述音乐训练数据集中的每个音乐训练数据具有音乐风格向量,所述音乐训练数据的音乐风格向量通过以下方式得到:In an implementation manner of the present invention, each music training data in the music training data set has a music style vector, and the music style vector of the music training data is obtained in the following manner:
获取大量用户对多个音乐训练数据的风格标注信息,并基于所述风格标注信息生成标注矩阵;Acquiring style annotation information of a large number of users on multiple music training data, and generating a annotation matrix based on the style annotation information;
根据所述标注矩阵确定各个音乐训练数据的音乐风格向量。A music style vector of each music training data is determined according to the annotation matrix.
在本发明的一种实现方式中,所述根据所述标注矩阵确定各个音乐训练数据的音乐风格向量,包括:In an implementation manner of the present invention, the determining a music style vector of each music training data according to the annotation matrix includes:
将所述标注矩阵分解为第一矩阵与第二矩阵的乘积;Decomposing the labeling matrix into a product of a first matrix and a second matrix;
将所述第一矩阵的各个行向量确定为对应的音乐训练数据的音乐风格向量。Each row vector of the first matrix is determined as a music style vector of the corresponding music training data.
在本发明的一种实现方式中,在使用噪声类别辨识神经网络之前,还包括:In an implementation manner of the present invention, before using the noise category identification neural network, the method further includes:
基于噪声训练数据集,通过训练得到所述噪声类别辨识神经网络。Based on the noise training data set, the noise class identification neural network is obtained through training.
在本发明的一种实现方式中,所述噪声的时域波形是由用户音频播放设备的拾音设备采集的。In an implementation manner of the present invention, the time-domain waveform of the noise is collected by a pickup device of a user audio playback device.
在本发明的一种实现方式中,还包括:In an implementation manner of the present invention, the method further includes:
将音量调节后的待播放音乐进行播放。Play the music to be played after the volume is adjusted.
第二方面,提供了一种对音乐进行音量调节的设备,所述设备用于实现前述第一方面或任一实现方式所述方法的步骤,所述设备包括:In a second aspect, a device for volume adjustment of music is provided, the device is configured to implement the steps of the method described in the first aspect or any implementation manner, and the device includes:
获取模块,用于获取待播放音乐的时域波形以及播放环境的噪声的时域波形;An acquisition module for acquiring a time-domain waveform of music to be played and a time-domain waveform of noise of a playback environment;
确定模块,用于根据所述待播放音乐的时域波形以及所述噪声的时域波形,使用预先训练好的神经网络,得到所述待播放音乐的音量设置;A determining module, configured to obtain a volume setting of the music to be played according to the time domain waveform of the music to be played and the time domain waveform of the noise by using a pre-trained neural network;
调节模块,用于使用所述音量设置调节所述待播放音乐的音量。An adjustment module is used to adjust the volume of the music to be played using the volume setting.
According to a third aspect, a device for adjusting the volume of music is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the method described in the foregoing first aspect or any of its implementation manners.
第四方面,提供了一种计算机存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现前述第一方面或任一实现方式所述方法的步骤。According to a fourth aspect, a computer storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the steps of the method according to the foregoing first aspect or any implementation manner are implemented.
It can be seen that the embodiments of the present invention use a pre-trained neural network comprising a music style neural network, a noise category identification neural network and a volume adjustment neural network, which takes into account factors that influence the user's current volume preference, such as the noise category of the environment and the style of the music, and can therefore adjust the volume of the user's music to be played automatically. This greatly simplifies the user's operations and improves the user experience. Furthermore, the volume can be readjusted according to a specific user's volume preference, and a volume adjustment model dedicated to that user can be obtained through online learning. The dedicated model can then be used to automatically set the volume of the music that the specific user wants to play.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
图1是本发明实施例的得到音乐训练数据的音乐风格向量的示意性流程图;FIG. 1 is a schematic flowchart of obtaining a music style vector of music training data according to an embodiment of the present invention; FIG.
图2是本发明实施例中标注矩阵的示意图;2 is a schematic diagram of a labeling matrix in an embodiment of the present invention;
图3是本发明实施例中对音乐进行音量调节的方法的示意性流程图;3 is a schematic flowchart of a method for adjusting volume of music in an embodiment of the present invention;
图4是本发明实施例中对音乐进行音量调节的方法的另一示意性流程 图;4 is another schematic flowchart of a method for adjusting volume of music in an embodiment of the present invention;
图5是本发明实施例中对用户在音量设置基础上再次调节的示意性流程图;5 is a schematic flowchart of readjusting a user based on a volume setting according to an embodiment of the present invention;
图6是本发明实施例中基于基线模型通过在线学习得到专用于特定用户的音量调节模型的示意性流程图;6 is a schematic flowchart of obtaining a volume adjustment model dedicated to a specific user through online learning based on a baseline model in an embodiment of the present invention;
图7是本发明实施例中得到专用于特定用户的音量调节模型的示意性流程图;7 is a schematic flowchart of obtaining a volume adjustment model dedicated to a specific user in an embodiment of the present invention;
图8是本发明实施例中对音乐进行音量调节的设备的示意性框图;8 is a schematic block diagram of a device for adjusting volume of music in an embodiment of the present invention;
图9是本发明实施例中对音乐进行音量调节的设备的另一示意性框图。FIG. 9 is another schematic block diagram of a device for adjusting volume of music in an embodiment of the present invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
深度学习(Deep Learning)是一种机器学习方法,其应用深层神经网络对具有复杂模型的数据进行特征学习,并将数据低层次特征进行智能组织,形成更高级抽象形式。由于深度学习对人工难以抽象并建模的复杂数据具有较强的特征提取和建模能力,对音质自适应调整这类较难进行人工建模的任务,深度学习是一种有效的实现方法。Deep learning is a machine learning method that uses deep neural networks to learn features of data with complex models, and intelligently organizes low-level features of data to form more advanced abstract forms. Because deep learning has strong feature extraction and modeling capabilities for complex data that is difficult to abstract and model manually, deep learning is an effective implementation method for tasks such as adaptive adjustment of sound quality that are difficult to model manually.
本发明实施例提供了一种预先训练好的神经网络,其包括音乐风格神经网络、噪声类别辨识神经网络以及音量调节神经网络。下面将分别进行阐述。An embodiment of the present invention provides a pre-trained neural network, which includes a musical style neural network, a noise category identification neural network, and a volume adjustment neural network. Each will be explained below.
本发明实施例中基于深度学习构建了一种音乐风格神经网络。该音乐风格神经网络是根据音乐训练数据集进行训练得到的。其中,音乐训练数据集中包括大量的音乐训练数据,下面对单个音乐训练数据进行详细阐述。In the embodiment of the present invention, a musical style neural network is constructed based on deep learning. The musical style neural network is trained based on the music training data set. Among them, the music training data set includes a large amount of music training data, and a single music training data is described in detail below.
音乐训练数据是音乐数据,包括该音乐训练数据的特征,其可以作为神经网络的输入;还包括该音乐训练数据的音乐风格向量,其可以作为神经网络的输出。The music training data is music data, including the characteristics of the music training data, which can be used as the input of the neural network; it also includes the music style vector of the music training data, which can be used as the output of the neural network.
Exemplarily, for a piece of music training data, the original music waveform is a time-domain waveform. The time-domain waveform may be divided into frames, and feature extraction may be performed on each frame to obtain the features of the music training data. Optionally, as an example, feature extraction may be performed by means of the Short-Time Fourier Transform (STFT), and the extracted features may be Mel Frequency Cepstrum Coefficients (MFCC). It should be understood that the feature extraction manner described herein is only illustrative, and other features such as an amplitude spectrum, a logarithmic spectrum or an energy spectrum may also be obtained; they are not listed one by one here. Optionally, in the embodiments of the present invention, the features obtained by feature extraction here and below may be expressed in the form of a feature tensor, for example an N-dimensional feature vector, or may be expressed in other forms, which is not limited here.
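Purely as an illustrative sketch (not part of the claimed subject matter), the framing and feature-extraction step described above could be implemented as follows; the use of the librosa library, the sample rate and the frame parameters are assumptions made for illustration only.

```python
# Illustrative sketch only: frame a time-domain waveform and extract per-frame MFCCs.
# librosa, the sample rate and the frame parameters are assumptions for illustration.
import librosa
import numpy as np

def extract_features(waveform: np.ndarray, sr: int = 44100) -> np.ndarray:
    # librosa performs the framing internally via its STFT; n_fft and hop_length
    # control the frame length and the frame shift.
    mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=20,
                                n_fft=2048, hop_length=512)
    return mfcc.T  # shape (num_frames, 20): one feature vector per frame
```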
示例性地,可以参照如图1所示的方法得到音乐训练数据的音乐风格向量,该过程包括:Exemplarily, the music style vector of the music training data can be obtained by referring to the method shown in FIG. 1, and the process includes:
S101,获取用户对多个音乐训练数据的风格标注信息,并基于风格标注信息生成标注矩阵。S101. Acquire style annotation information of a plurality of music training data by a user, and generate a annotation matrix based on the style annotation information.
For a given piece of music training data, the style annotation information from different users may be the same or different. For example, for the song "My Motherland", some users may label it as "folk music", some users may label it as "pop", some users may label it as both "folk music" and "bel canto", and so on. By collecting the style annotation information of multiple users, the number of annotations for each style can be obtained. As an example, referring to FIG. 2, for "My Motherland", the number of "folk music" annotations is 12, the number of "pop" annotations is 3, and the number of "bel canto" annotations is 10.
Further, an annotation matrix may be generated based on the annotation information of multiple pieces of music training data. Each row of the annotation matrix may represent the annotation information of one piece of music training data, that is, each row represents the "style label" of the corresponding music training data, and each column of the annotation matrix represents one style. Referring to FIG. 2, the annotation matrix generated from the annotation information of "My Motherland", "Qilixiang", "Coral Sea" and "Ten Send Red Army" can be expressed as:
[Annotation matrix of FIG. 2: a 4x4 matrix of annotation counts, with one row per song and one column per style; the row for "My Motherland" is [12, 3, 0, 10].]
It should be understood that FIG. 2 is only schematic. Although it shows only 4 pieces of music training data and 4 styles, the present invention is not limited thereto, and the annotation matrix may be generated based on a larger number of pieces of music training data and a larger number of styles.
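For illustration only, the following sketch shows one way the annotation count matrix could be assembled from per-user style labels; the style names (the fourth style, shown as "rock", is a pure placeholder) and the song list follow the FIG. 2 example.

```python
# Illustrative sketch only: assemble the annotation count matrix from per-user labels.
import numpy as np

STYLES = ["folk", "pop", "rock", "bel canto"]   # "rock" is a placeholder for the unnamed style
SONGS = ["My Motherland", "Qilixiang", "Coral Sea", "Ten Send Red Army"]

def build_annotation_matrix(user_labels):
    """user_labels: iterable of (song, style) pairs, one pair per user annotation."""
    matrix = np.zeros((len(SONGS), len(STYLES)))
    for song, style in user_labels:
        matrix[SONGS.index(song), STYLES.index(style)] += 1
    return matrix
```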
S102,根据标注矩阵确定各个音乐训练数据的音乐风格向量。S102. Determine a music style vector of each music training data according to the annotation matrix.
Specifically, the music style vectors may be extracted from the annotation matrix. As one example, the vector of the row corresponding to a piece of music training data in the annotation matrix may be used directly as its music style vector; for "My Motherland", the music style vector is then [12, 3, 0, 10]. As another example, the row vector corresponding to a piece of music training data may first be normalized and then used as its music style vector; for "My Motherland", the music style vector is then [12/25, 3/25, 0, 10/25]. It can be understood that the music style vectors obtained in these two examples have a large dimension and are sparse vectors. As yet another example, the sparsity of the annotation matrix may be taken into account and the music style vectors may be extracted from it; the extraction algorithms include, but are not limited to, matrix factorization, factorization machines or word-vectorization algorithms. The music style vectors obtained in this example have a smaller dimension, that is, denser music style vectors can be obtained.
图2中以矩阵分解为例阐述该提取的过程。标注矩阵中每一行的向量均为稀疏的向量。例如针对某特定的音乐训练数据的风格标签,其中的某些值是正整数,而其余的均为0,很少会出现风格标签中所有项都为正整数的情况,也就是说,某特定的音乐训练数据一般只对应一种或几种风格。因此该标注矩阵也是稀疏矩阵,可以通过对该稀疏矩阵进行提取使得每个音乐训练数据的音乐风格向量的维度小于标注矩阵的列数,并且能够更好地反映不同音乐训练数据之间的相关度。In Figure 2, matrix extraction is used as an example to illustrate the extraction process. The vectors of each row in the labeling matrix are sparse vectors. For example, for a certain style label of music training data, some of the values are positive integers, and the rest are 0. It is rare that all items in the style labels are positive integers, that is, a specific Music training data generally corresponds to only one or several styles. Therefore, the labeling matrix is also a sparse matrix. By extracting the sparse matrix, the dimension of the music style vector of each music training data is smaller than the number of columns of the labeling matrix, and it can better reflect the correlation between different music training data. .
Referring to FIG. 2, the annotation matrix may be decomposed into a first matrix multiplied by a second matrix. Each row of the first matrix represents the music style vector of the corresponding music training data, which can be regarded as a compression of the sparse style label. As shown by the first matrix in FIG. 2, the music style vector of "My Motherland" is [1.2, 3.7, 3.1] and the music style vector of "Ten Send Red Army" is [1.8, 4.0, 4.1]; since the cosine similarity between these two vectors is high, it can be determined that "My Motherland" and "Ten Send Red Army" are similar pieces of music.
第二矩阵是表示第一矩阵各项的权重(图2中未示出第二矩阵的各个元素的具体值)。具体地,第二矩阵的每一列对于一个音乐风格,一列中的数值表征该音乐风格类对第一矩阵中各个元素的权重。The second matrix is a weight representing each item of the first matrix (specific values of each element of the second matrix are not shown in FIG. 2). Specifically, each column of the second matrix is for a music style, and the values in one column represent the weight of the music style class to each element in the first matrix.
It can be understood that the annotation matrix can be restored by multiplying the first matrix by the second matrix, and that the annotation matrix displays the various annotated styles more intuitively. It can also be understood that FIG. 2 is only schematic: although it shows an annotation matrix with 4 columns and music style vectors of dimension 3, the present invention is not limited thereto. For example, in practical applications, the dimensions of the matrices and vectors can be larger.
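As a non-limiting sketch of the matrix-decomposition route, non-negative matrix factorization (one possible choice among the decomposition algorithms mentioned above) could be applied as follows; the component count k = 3 mirrors the FIG. 2 example, and scikit-learn is an assumed tool choice.

```python
# Illustrative sketch only: extract dense style vectors by factorizing the annotation matrix.
import numpy as np
from sklearn.decomposition import NMF

def style_vectors_from_annotations(annotation_matrix: np.ndarray, k: int = 3):
    model = NMF(n_components=k, init="nndsvda", max_iter=500)
    first_matrix = model.fit_transform(annotation_matrix)  # rows = music style vectors
    second_matrix = model.components_                       # per-style weights
    # first_matrix @ second_matrix approximately restores the annotation matrix
    return first_matrix, second_matrix

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # used to judge whether two pieces of music (e.g. the two songs above) are similar
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```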
如此,针对每一个音乐训练数据,均可以通过特征提取得到其特征。通 过图1和图2所示的过程,可以得到每个音乐训练数据的音乐风格向量。将特征作为输入,并将音乐风格向量作为输出,对音乐风格神经网络进行训练直到收敛,便可以得到训练好的音乐风格神经网络。In this way, for each piece of music training data, its features can be obtained through feature extraction. Through the process shown in Fig. 1 and Fig. 2, the music style vector of each music training data can be obtained. Taking the features as input and the music style vector as output, the music style neural network is trained until convergence, and then a trained music style neural network can be obtained.
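A minimal sketch of such a music style network is given below, assuming a Keras implementation; the layer sizes and the regression formulation (mean-squared error against the style vector) are illustrative assumptions, not requirements of the present disclosure.

```python
# Illustrative sketch only: a small music style network mapping per-frame features
# to a style vector. Keras, the layer sizes and the MSE loss are assumptions.
import tensorflow as tf

def build_style_network(feature_dim: int = 20, style_dim: int = 3) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(feature_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(style_dim),  # regression output: the music style vector
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Training until convergence (features as input, style vectors as targets):
# build_style_network().fit(frame_features, frame_style_targets, epochs=..., batch_size=...)
```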
本发明实施例中还基于深度学习构建了一种噪声类别辨识神经网络。该噪声类别辨识神经网络是根据噪声训练数据集进行训练得到的。其中,噪声训练数据集中包括大量的噪声训练数据,下面对单个噪声训练数据进行详细阐述。In the embodiment of the present invention, a noise class recognition neural network is also constructed based on deep learning. The noise class recognition neural network is trained based on the noise training data set. Among them, the noise training data set includes a large amount of noise training data, and a single noise training data is described in detail below.
噪声训练数据是噪声数据,包括该噪声训练数据的特征,其可以作为神经网络的输入;还包括该噪声训练数据的噪声类别,其可以作为神经网络的输出。The noise training data is noise data, including the characteristics of the noise training data, which can be used as the input of the neural network; it also includes the noise category of the noise training data, which can be used as the output of the neural network.
Exemplarily, for a piece of noise training data, the original noise waveform is a time-domain waveform. The time-domain waveform may be divided into frames, and feature extraction may be performed on each frame to obtain the features of the noise training data. Optionally, as an example, feature extraction may be performed by means of the Short-Time Fourier Transform (STFT), and the extracted features may be Mel Frequency Cepstrum Coefficients (MFCC). It should be understood that the feature extraction manner described herein is only illustrative, and other features such as an amplitude spectrum, a logarithmic spectrum or an energy spectrum may also be obtained; they are not listed one by one here.
示例性地,可以为每个噪声训练数据标记其所属的噪声类别。噪声类别可以包括但不限于机场、步行街、公交车、商场、餐厅等。本发明对标记的方式不做限定,例如,可以用“000”表示机场,“001”表示步行街,“010”表示公交车等;也可以采用其他方式进行标记,这里不再一一罗列。For example, each noise training data may be labeled with a noise category to which it belongs. Noise categories may include, but are not limited to, airports, pedestrian streets, buses, shopping malls, restaurants, and the like. The method of marking is not limited in the present invention. For example, "000" may be used to indicate an airport, "001" to indicate a pedestrian street, and "010" to indicate a bus, etc .; other methods may also be used for marking, which are not listed here one by one.
For ease of understanding, an example is given here to illustrate one implementation of the labeling. Specifically, one piece of noise training data may be labeled by one user or by multiple users, and the noise categories labeled by different users may be the same or different. After the labels of multiple users for one piece of noise training data are obtained, the category labeled most often can be determined as the noise category to which that piece of noise training data belongs. For example, suppose the noise training data A is labeled "000" by m1 users, "001" by m2 users and "010" by m3 users; if m1 > m2 and m1 > m3, it can be determined that the noise category to which the noise training data A belongs is "000".
如此,针对每一个噪声训练数据,均可以通过特征提取得到其特征,并标记出其所属的噪声类别。将特征作为输入,并将噪声类别作为输出,对噪声类别辨识神经网络进行训练直到收敛,便可以得到训练好的噪声类别辨识 神经网络。In this way, for each noise training data, its features can be obtained through feature extraction, and the noise category to which it belongs is marked. Taking the features as input and the noise category as output, the noise category recognition neural network is trained until convergence, and the trained noise category recognition neural network can be obtained.
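Analogously, a minimal sketch of the noise category identification network might look as follows, again assuming Keras; the number of categories and the layer sizes are illustrative assumptions.

```python
# Illustrative sketch only: a small noise category classifier over per-frame features.
import tensorflow as tf

def build_noise_classifier(feature_dim: int = 20, num_classes: int = 5) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(feature_dim,)),
        tf.keras.layers.Dense(num_classes, activation="softmax"),  # one unit per category
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```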
本发明实施例中还基于深度学习构建了一种音量调节神经网络。该音量调节神经网络是根据训练数据集进行训练得到的。其中,训练数据集中包括大量的训练数据,该训练数据集可以是用户行为集,如可以通过采集多个用户在各种环境下听音乐的数据等。In the embodiment of the present invention, a volume adjustment neural network is also constructed based on deep learning. The volume adjustment neural network is obtained by training according to a training data set. The training data set includes a large amount of training data, and the training data set may be a user behavior set, such as collecting data of multiple users listening to music in various environments.
下面对单个训练数据进行详细阐述。示例性地,某用户在某环境下听某音乐时,可以获取该数据作为训练数据。具体的,可以根据用户正在播放的音乐获取该音乐的时域波形,可以通过用户所使用的播放终端的拾音设备获取所处的环境的噪声的时域波形,并且可以获取用户的音量设置等。The single training data is explained in detail below. Exemplarily, when a user listens to certain music in a certain environment, the data can be acquired as training data. Specifically, the time domain waveform of the music can be obtained according to the music being played by the user, the time domain waveform of the ambient noise can be obtained through the pickup device of the playback terminal used by the user, and the user's volume setting can be obtained. .
其中,获取音乐的时域波形可以包括:从用户使用的客户端获取该音乐的时域波形。或者,可以包括:从用户使用的客户端获取该音乐的音乐信息,并根据该音乐信息从服务器端的音乐数据库中获取该音乐的时域波形,如此能够减少传输量。其中,音乐信息可以包括歌名、歌手、专辑等中的至少一项。可理解,本发明实施例中所述的音乐信息仅仅是示例性的,其可以包括其他信息,诸如时长、格式等,这里不再一一罗列。The acquiring the time-domain waveform of the music may include: acquiring the time-domain waveform of the music from a client used by the user. Alternatively, it may include: acquiring music information of the music from a client used by the user, and acquiring the time-domain waveform of the music from a music database on the server according to the music information, so that the transmission amount can be reduced. The music information may include at least one of a song title, a singer, an album, and the like. It can be understood that the music information described in the embodiment of the present invention is only exemplary, and it may include other information, such as duration, format, etc., which are not listed here one by one.
其中,拾音设备诸如耳机麦克、手机麦克等,这里不作限定。其中,可以获取用户对音量的调节指令或者获取在稳定播放该音乐时用户所设置的稳定音量。可选地,该音量可以用百分比表示,或者,音量也可以用其他方式表示,本发明对此不限定。Among them, pickup devices such as a headset microphone and a mobile phone microphone are not limited here. Among them, it is possible to obtain a volume adjustment instruction of the user or obtain a stable volume set by the user when the music is stably played. Optionally, the volume may be expressed as a percentage, or the volume may also be expressed in other manners, which is not limited in the present invention.
可以基于训练数据所包括的音乐的时域波形得到该音乐的特征。具体的,可以对该音乐的时域波形进行分帧,并对分帧后的每帧进行特征提取从而得到该音乐的特征。随后,将该音乐的特征输入至前述的音乐风格神经网络,便可以得到该音乐的风格向量。示例性地,如果不同的帧所得到的音乐的风格向量不同,可以通过对这些帧得到的风格向量进行平均,将平均后的风格向量作为该音乐的风格向量。应注意,这里所使用的“平均”是将多个风格向量项(或值)进行均值计算得到结果值。例如,可以为算术平均。然而,可理解,“平均”也可以通过其他计算方式得到结果值,如加权平均,其中不同项的权重可以相等或不等,本发明实施例对平均的方式不作限定。The characteristics of the music included in the training data can be obtained based on the time-domain waveform of the music included in the training data. Specifically, the time-domain waveform of the music can be framed, and feature extraction is performed on each frame after the framed frame to obtain the characteristics of the music. Then, the characteristics of the music are input to the aforementioned music style neural network, and a style vector of the music can be obtained. Exemplarily, if the style vectors of the music obtained in different frames are different, the style vectors obtained in these frames may be averaged, and the averaged style vector may be used as the style vector of the music. It should be noted that the "average" used herein is a result value obtained by averaging a plurality of style vector items (or values). For example, it can be arithmetic mean. However, it can be understood that the "average" can also obtain the result value through other calculation methods, such as a weighted average, in which the weights of different items can be equal or different, and the embodiment of the present invention does not limit the average method.
可以基于训练数据所包括的噪声的时域波形得到该噪声的特征。具体的,可以对该噪声的时域波形进行分帧,并对分帧后的每帧进行特征提取从而得到该噪声的特征。随后,将该噪声的特征输入至前述的噪声类别辨识神 经网络,便可以得到该噪声的类别。示例性地,如果不同的帧所得到的噪声的类别不同,可以通过对这些帧得到的类别进行分类统计,将数量最多的一个类别作为该噪声的类别。The characteristics of the noise can be obtained based on the time-domain waveform of the noise included in the training data. Specifically, the time-domain waveform of the noise can be framed, and feature extraction is performed on each frame after the framed frame to obtain the characteristics of the noise. Then, the characteristics of the noise are input to the aforementioned noise category identification neural network, and the category of the noise can be obtained. Exemplarily, if the types of noise obtained from different frames are different, the categories obtained from these frames may be classified and counted, and the category with the largest number is used as the category of the noise.
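The per-frame aggregation described in the two preceding paragraphs (averaging the frame-wise style vectors, and taking a majority vote over the frame-wise noise categories) could be sketched as follows; the helper names are hypothetical.

```python
# Illustrative sketch only: aggregate per-frame outputs into a single result per clip.
import numpy as np

def aggregate_style(frame_style_vectors: np.ndarray) -> np.ndarray:
    # arithmetic mean over frames; a weighted mean would equally fit the description
    return frame_style_vectors.mean(axis=0)

def aggregate_noise_class(frame_classes: np.ndarray) -> int:
    # majority vote over the per-frame category predictions
    values, counts = np.unique(frame_classes, return_counts=True)
    return int(values[np.argmax(counts)])
```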
可以基于训练数据所包括的音乐的时域波形得到音乐能量特征。本发明实施例对计算音乐能量特征的方式不作限定,例如可以根据音乐的时域波形的各点的幅度来计算音乐能量特征。作为一例,该音乐能量特征可以包括音乐平均幅度,具体地可以计算该音乐的时域波形的每一点的幅度的绝对值,然后再除以总点数得到音乐平均幅度。也就是说,可以将该音乐的时域波形的所有点的幅度的算术平均作为音乐能量特征。作为另一例,也可以将该音乐的时域波形的所有点的幅度的几何平均或加权平均作为音乐能量特征。作为再一例,也可以将该音乐的时域波形的所有点的幅度取自然对数后再进行算术平均作为该音乐能量特征。当然,也可以通过其他的计算方法得到音乐能量特征,本发明对此不限定。Music energy characteristics can be obtained based on the time-domain waveform of the music included in the training data. The embodiment of the present invention does not limit the manner of calculating the energy characteristics of music. For example, the energy characteristics of music can be calculated according to the amplitude of each point of the time-domain waveform of the music. As an example, the music energy feature may include the average amplitude of music. Specifically, the absolute value of the amplitude of each point in the time-domain waveform of the music may be calculated, and then divided by the total number of points to obtain the average music amplitude. That is, the arithmetic mean of the amplitudes of all points in the time domain waveform of the music can be used as the music energy feature. As another example, the geometric mean or weighted mean of the amplitudes of all points in the time domain waveform of the music may be used as the music energy feature. As yet another example, the amplitudes of all points in the time-domain waveform of the music may be taken as natural logarithms and then arithmetically averaged as the music energy feature. Of course, the energy characteristics of music can also be obtained by other calculation methods, which is not limited in the present invention.
可以基于训练数据所包括的噪声的时域波形得到噪声能量特征。本发明实施例对计算噪声能量特征的方式不作限定,例如可以根据噪声的时域波形的各点的幅度来计算噪声能量特征。作为一例,该噪声能量特征可以包括噪声平均幅度,具体地可以计算该噪声的时域波形的每一点的幅度的绝对值,然后再除以从点数得到噪声平均幅度。也就是说,可以将该噪声的时域波形的所有点的幅度的算术平均作为噪声能量特征。作为另一例,也可以将该噪声的时域波形的所有点的幅度的几何平均或加权平均作为噪声能量特征。作为再一例,也可以将该噪声的时域波形的所有点的幅度取自然对数后再进行算术平均作为该噪声能量特征。当然,也可以通过其他的计算方法得到噪声能量特征,本发明对此不限定。The noise energy characteristics can be obtained based on the time-domain waveform of the noise included in the training data. The embodiment of the present invention does not limit the manner of calculating the noise energy characteristics. For example, the noise energy characteristics may be calculated according to the amplitude of each point of the time domain waveform of the noise. As an example, the noise energy characteristic may include the average amplitude of the noise. Specifically, the absolute value of the amplitude of each point in the time domain waveform of the noise may be calculated, and then divided by the number of points to obtain the average amplitude of the noise. That is, the arithmetic mean of the amplitudes of all points in the time domain waveform of the noise can be used as the noise energy feature. As another example, a geometric mean or a weighted mean of the amplitudes of all points in the time domain waveform of the noise may be used as the noise energy feature. As yet another example, the amplitudes of all points in the time domain waveform of the noise may be taken as natural logarithms and then arithmetically averaged as the noise energy characteristic. Of course, the noise energy characteristics can also be obtained by other calculation methods, which is not limited in the present invention.
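As a sketch of the energy features, the mean absolute amplitude and the natural-logarithm variant mentioned above could be computed as follows; the epsilon guard against log(0) is an added assumption.

```python
# Illustrative sketch only: the energy features described above.
import numpy as np

def average_amplitude(waveform: np.ndarray) -> float:
    # sum of the absolute amplitudes divided by the number of sample points
    return float(np.mean(np.abs(waveform)))

def log_average_amplitude(waveform: np.ndarray, eps: float = 1e-12) -> float:
    # natural-logarithm variant; eps is an added guard against log(0)
    return float(np.mean(np.log(np.abs(waveform) + eps)))
```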
如此,针对每一个训练数据,均可以得到音乐的风格向量、噪声的类别、音乐能量特征、噪声能量特征,并获取用户的音量设置。将音乐的风格向量、噪声的类别、音乐能量特征、噪声能量特征作为输入,将音量设置作为输出,对音量调节神经网络进行训练直到收敛,便可以得到训练好的音量调节神经网络。In this way, for each training data, the style vector of the music, the type of noise, the characteristics of the music energy, and the characteristics of the noise energy can be obtained, and the user's volume setting can be obtained. Taking the style vector of the music, the category of the noise, the characteristics of the music energy, and the characteristics of the noise energy as the input and the volume setting as the output, the volume adjustment neural network is trained until convergence, and the trained volume adjustment neural network can be obtained.
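A minimal sketch of the volume adjustment network, and of how its input vector could be assembled, is given below, again assuming Keras; representing the noise category as a one-hot vector and the volume as a 0 to 1 fraction are illustrative choices not prescribed by the present disclosure.

```python
# Illustrative sketch only: the volume adjustment network and its input vector.
import numpy as np
import tensorflow as tf

def build_volume_network(style_dim: int = 3, num_noise_classes: int = 5) -> tf.keras.Model:
    input_dim = style_dim + num_noise_classes + 2  # + music energy + noise energy
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(input_dim,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # volume setting as a fraction
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def make_input(style_vec, noise_class, music_energy, noise_energy, num_noise_classes=5):
    one_hot = np.eye(num_noise_classes)[noise_class]
    return np.concatenate([style_vec, one_hot, [music_energy, noise_energy]])
```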
本发明实施例提供了一种对音乐进行音量调节的方法,如图3所示为该方法的流程图,包括:An embodiment of the present invention provides a method for adjusting volume of music. As shown in FIG. 3, a flowchart of the method includes:
S210,获取待播放音乐的时域波形以及播放环境的噪声的时域波形;S210. Obtain the time domain waveform of the music to be played and the time domain waveform of the noise of the playback environment;
S220,根据所述待播放音乐的时域波形以及所述噪声的时域波形,使用预先训练好的神经网络,得到所述待播放音乐的音量设置;S220. Use a pre-trained neural network to obtain a volume setting of the music to be played according to the time domain waveform of the music to be played and the time domain waveform of the noise;
S230,使用所述音量设置调节所述待播放音乐的音量。S230. Use the volume setting to adjust the volume of the music to be played.
The pre-trained neural network may include a music style neural network, a noise category identification neural network and a volume adjustment neural network. Specifically, in S220, the music style neural network, the noise category identification neural network and the volume adjustment neural network may be used to obtain the volume setting of the music to be played according to the time-domain waveform of the music to be played and the time-domain waveform of the noise. These may respectively be the aforementioned trained music style neural network, trained noise category identification neural network and trained volume adjustment neural network. It can be understood that the aforementioned training process is generally performed on the server side (that is, in the cloud).
图3所示的方法可以由服务器端(即云端)执行,或者可以由客户端执行。The method shown in FIG. 3 may be executed by a server (that is, the cloud), or may be executed by a client.
在由客户端执行的实施例中,在S210中,若待播放音乐是客户端本地音乐,则客户端可以直接获取该待播放音乐的时域波形。若待播放音乐是在线音乐,则客户端可以从服务器端获取该待播放音乐的时域波形。另外,还可以由客户端的拾音设备获取所处的环境的噪声的时域波形。在S220之前,客户端可以从服务器端获取预先训练好的音乐风格神经网络、噪声类别辨识神经网络以及音量调节神经网络。In the embodiment performed by the client, in S210, if the music to be played is the client's local music, the client can directly obtain the time domain waveform of the music to be played. If the music to be played is online music, the client can obtain the time domain waveform of the music to be played from the server. In addition, the time-domain waveform of the noise in the environment can be obtained by the pickup device of the client. Before S220, the client can obtain the pre-trained music style neural network, noise category identification neural network, and volume adjustment neural network from the server.
In the embodiment executed by the server side, in S210, if the music to be played is local music on the client, the server side (that is, the cloud) receives the music to be played from the client, thereby obtaining the time-domain waveform of the music to be played. If the music to be played is music stored on the server side, for example in a music database on the server side, the server side (that is, the cloud) receives from the client the music information of the music to be played, where the music information may include at least one of the song title, the singer, the album and so on; the music to be played is then obtained from the music database on the server side according to the music information, thereby obtaining its time-domain waveform. In addition, the server side may also receive from the client the time-domain waveform of the environmental noise collected by the pickup device of the client.
示例性地,如图4所示,S220可以包括:Exemplarily, as shown in FIG. 4, S220 may include:
S2201,根据所述待播放音乐的时域波形,使用音乐风格神经网络,得到所述待播放音乐的风格向量。S2201. Use a music style neural network to obtain a style vector of the music to be played according to the time-domain waveform of the music to be played.
具体地,可以对待播放音乐的时域波形进行分帧,并对分帧后的每帧进 行特征提取,得到该待播放音乐的特征。随后可以将该待播放音乐的特征输入至音乐风格神经网络,得到该待播放音乐的风格向量。Specifically, the time-domain waveform of the music to be played can be framed, and feature extraction is performed for each frame after the framed frame to obtain the characteristics of the music to be played. Then, the features of the music to be played can be input to a music style neural network to obtain a style vector of the music to be played.
其中,特征提取的方法可以包括但不限于STFT、MFCC等。所提取的特征可以为幅度谱、对数谱、能量谱等,本发明对此不限定。The method for feature extraction may include, but is not limited to, STFT, MFCC, and the like. The extracted features may be amplitude spectrum, log spectrum, energy spectrum, etc., which is not limited in the present invention.
S2202,根据所述噪声的时域波形,使用噪声类别辨识神经网络,得到所述噪声的类别。S2202. Use a noise category identification neural network to obtain the category of the noise according to the time-domain waveform of the noise.
具体地,可以对噪声的时域波形进行分帧,并对分帧后的每帧进行特征提取,得到该噪声的特征。随后可以将该噪声的特征输入至噪声类别辨识神经网络,得到该噪声的类别。Specifically, the time domain waveform of the noise can be framed, and feature extraction is performed on each frame after the framed frame to obtain the characteristics of the noise. The characteristics of the noise can then be input to a noise category identification neural network to obtain the category of the noise.
其中,特征提取的方法可以包括但不限于STFT、MFCC等。所提取的特征可以为幅度谱、对数谱、能量谱等,本发明对此不限定。The method for feature extraction may include, but is not limited to, STFT, MFCC, and the like. The extracted features may be amplitude spectrum, log spectrum, energy spectrum, etc., which is not limited in the present invention.
S2203,根据所述待播放音乐的时域波形得到所述待播放音乐的能量特征。S2203: Obtain an energy characteristic of the music to be played according to a time-domain waveform of the music to be played.
可选地,音乐的能量特征可以包括音乐的平均幅度。可以计算该待播放音乐的时域波形的每一点的幅度的绝对值,然后再除以总点数得到该待播放音乐的平均幅度。Alternatively, the energy characteristics of the music may include the average amplitude of the music. The absolute value of the amplitude of each point of the time-domain waveform of the music to be played can be calculated, and then divided by the total number of points to obtain the average amplitude of the music to be played.
可选地,可以将该待播放音乐的时域波形的所有点的幅度的几何平均或加权平均作为该待播放音乐的能量特征。Optionally, a geometric average or a weighted average of the amplitudes of all points of the time-domain waveform of the music to be played may be used as the energy feature of the music to be played.
可选地,可以将该待播放音乐的时域波形的所有点的幅度取自然对数后再进行算术平均作为该待播放音乐的能量特征。Optionally, the amplitudes of all points of the time-domain waveform of the music to be played may be taken as natural logarithms and then arithmetically averaged as the energy characteristic of the music to be played.
S2204,根据所述噪声的时域波形得到所述噪声的能量特征。S2204. Obtain an energy characteristic of the noise according to a time-domain waveform of the noise.
可选地,噪声的能量特征可以包括噪声的平均幅度。可以计算该噪声的时域波形的每一点的幅度的绝对值,然后再除以总点数得到该噪声的平均幅度。Alternatively, the energy characteristics of the noise may include the average amplitude of the noise. The absolute value of the amplitude of each point in the time domain waveform of the noise can be calculated, and then divided by the total number of points to obtain the average amplitude of the noise.
可选地,可以将该噪声的时域波形的所有点的幅度的几何平均或加权平均作为该噪声的能量特征。Optionally, a geometric average or a weighted average of the amplitudes of all points of the time domain waveform of the noise may be used as the energy characteristic of the noise.
可选地,可以将该噪声的时域波形的所有点的幅度取自然对数后再进行算术平均作为该噪声的能量特征。Alternatively, the amplitudes of all points in the time domain waveform of the noise may be taken as natural logarithms and then arithmetically averaged as the energy characteristic of the noise.
应注意,尽管图4中按照S2201至S2204示出了该过程,然而本发明实施例对S2201至S2204的执行顺序不做限定。例如,S2201-S2204四个步骤可以并行执行。例如,可以先依次执行或并行执行S2201和S2202,然后再 依次执行或并行执行S2203和S2204。例如,可以先依次执行或并行执行S2204和S2203,然后再依次执行或并行执行S2201和S2202。例如,可以先依次执行或并行执行S2201和S2203,然后再依次执行或并行执行S2202和S2204。也就是说,S2201-S2204可以以任意顺序执行,这里不再一一罗列。It should be noted that although the process is shown in FIG. 4 according to S2201 to S2204, the embodiment of the present invention does not limit the execution order of S2201 to S2204. For example, the four steps S2201-S2204 can be executed in parallel. For example, S2201 and S2202 can be executed sequentially or in parallel, and then S2203 and S2204 can be executed sequentially or in parallel. For example, S2204 and S2203 can be executed sequentially or in parallel, and then S2201 and S2202 can be executed sequentially or in parallel. For example, S2201 and S2203 may be executed sequentially or in parallel, and then S2202 and S2204 may be executed sequentially or in parallel. In other words, S2201-S2204 can be executed in any order, and no longer listed here.
S2205,将所述待播放音乐的风格向量、所述噪声的类别、所述待播放音乐的能量特征、所述噪声的能量特征输入至音量调节神经网络,得到所述待播放音乐的音量设置。S2205: input the style vector of the music to be played, the category of the noise, the energy characteristics of the music to be played, and the energy characteristics of the noise to a volume adjustment neural network to obtain a volume setting of the music to be played.
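Tying S2201 to S2205 together, an end-to-end sketch is shown below for illustration only; it reuses the hypothetical helpers and models introduced in the earlier sketches (extract_features, aggregate_style, aggregate_noise_class, average_amplitude, make_input and the three assumed Keras models) and is not a definitive implementation.

```python
# Illustrative sketch only: S2201-S2205 end to end, under the earlier assumptions.
import numpy as np

def volume_setting(music_wave, noise_wave, style_net, noise_net, volume_net, sr=44100):
    music_feats = extract_features(music_wave, sr)   # S2201: frame and featurize the music
    noise_feats = extract_features(noise_wave, sr)   # S2202: frame and featurize the noise
    style_vec = aggregate_style(style_net.predict(music_feats))
    noise_cls = aggregate_noise_class(np.argmax(noise_net.predict(noise_feats), axis=1))
    music_energy = average_amplitude(music_wave)      # S2203
    noise_energy = average_amplitude(noise_wave)      # S2204
    x = make_input(style_vec, noise_cls, music_energy, noise_energy)
    return float(volume_net.predict(x[np.newaxis, :])[0, 0])  # S2205: volume setting
```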
It can be seen that the embodiments of the present invention use a pre-trained neural network comprising a music style neural network, a noise category identification neural network and a volume adjustment neural network, which takes into account multiple factors that influence the user's current volume preference, such as the noise category of the environment and the style of the music, and can therefore adjust the volume of the user's music to be played automatically. This greatly simplifies the user's operations and improves the user experience.
Different users have different volume preferences. For example, some people like the surging feeling of a high volume, while others like to fall asleep to low-volume music before sleeping; an elderly person may need a high volume because of hearing loss, whereas a low volume may be sufficient for a young person. The above training of the volume adjustment neural network does not take the differences between individual users into account, so the trained volume adjustment neural network may be referred to as a volume adjustment baseline neural network, or as a volume adjustment baseline model.
在该音量调节基线模型的基础上,可以考虑用户的使用偏好,通过在线学习得到针对特定用户的音量调节神经网络。Based on the baseline model of volume adjustment, the user's preferences can be considered, and the volume adjustment neural network for specific users can be obtained through online learning.
示例性地,S2205中的音量调节神经网络可以是音量调节基线模型,S230中可以使用S2205所确定的音量设置调节待播放音乐的音量。并且,在S230之后,可以使用该调节后的音量播放待播放音乐。Exemplarily, the volume adjustment neural network in S2205 may be a volume adjustment baseline model, and in S230, the volume setting determined by S2205 may be used to adjust the volume of the music to be played. And, after S230, the adjusted volume can be used to play the music to be played.
可理解,若S230所得到的音量设置使用户感到满意,则可以使用该音量设置播放待播放音乐,并且,上述的音量调节基线模型同时也是适合该用户的专有音量调节模型。然而,考虑到不同用户对音量的不同偏好,S230所得到的音量不一定是用户所满意的,因此,S230之后,用户可能会在此基础上再次进行音量调节,以得到该用户所期望的音量。该过程可以如图5所示。It can be understood that if the volume setting obtained by S230 is satisfactory to the user, the volume setting can be used to play the music to be played, and the above-mentioned volume adjustment baseline model is also a proprietary volume adjustment model suitable for the user. However, considering the different preferences of different users for the volume, the volume obtained by S230 may not be satisfactory to the user. Therefore, after S230, the user may adjust the volume again on this basis to obtain the desired volume of the user. . This process can be shown in Figure 5.
本发明实施例可以在预先训练好的神经网络的基础上,基于用户的再次调节,通过在线学习得到专用于特定用户的音量调节模型。具体地,如图6 所示,该过程可以包括:In the embodiment of the present invention, a volume adjustment model dedicated to a specific user can be obtained through online learning based on a user's readjustment based on a pre-trained neural network. Specifically, as shown in FIG. 6, the process may include:
S310,将预先训练好的神经网络作为基线模型。S310. A pre-trained neural network is used as a baseline model.
S320,重复执行以下步骤,直到特定用户的再次调节指令的次数小于预设值:S320. Repeat the following steps until the number of times the specific user adjusts the instruction again is less than a preset value:
S3201,对在播放音乐,可以使用基线模型得到相应的音量设置。S3201, for the music being played, the corresponding volume setting can be obtained using the baseline model.
S3202,获取特定用户对S3201中的音量设置的再次调节指令。S3202. Obtain a readjustment instruction of the volume setting in S3201 by a specific user.
S3203: if the number of readjustment instructions of the specific user reaches a preset value, the volumes adjusted by the specific user are used as training samples, learning is performed on the basis of the baseline model to obtain an updated model, and the baseline model is replaced with the updated model.
It can be understood that, in S320, the baseline model can be learned online from the specific user's readjustment instructions (that is, the user's feedback on the volume settings) until the user rarely gives, or no longer gives, such feedback; the model finally obtained in S320 can then be determined to be the volume adjustment model dedicated to that specific user. In other words, when the user no longer, or only rarely, readjusts the volume settings determined by the model finally obtained in S320, that model is the volume adjustment model dedicated to the specific user. After that, the dedicated model can be used to automatically set the volume for the music played by the specific user without manual adjustment by the user, thereby improving the user experience.
具体地,假设特定用户播放N个音乐,则可以使用音量调节基线模型得到对应的N个音量设置。如果随后该特定用户对其中的部分音量设置不满意,则会进行再次调节,假设特定用户对其中的N1个音乐的音量进行了再次调节。如果N1大于预设值(假设为N0),则可以使用这N1个音乐作为训练样本,在音量调节基线模型的基础上进行训练,得到训练后的模型,将其称为模型M(T=1)。其中,T可以表示针对特定用户进行在线训练的批次。在此之后,该特定用户播放音乐时,可以使用模型M(T=1)而不再使用音量调节基线模型。具体地,假设特定用户播放N个音乐,则可以使用模型M(T=1)得到对应的N个音量设置,如果随后该特定用户对其中的部分音量设置不满意,则会进行再次调节,假设特定用户对其中的N2个音乐的音量进行了再次调节。如果N2大于预设值(假设为N0),则可以使用这N2个音乐作为训练样本,在模型M(T=1)的基础上进行训练,得到训练后的模型,将其称为模型M(T=2)。在此之后,该特定用户播放音乐时,可以使用模型M(T=2)而不再使用音量调节基线模型和模型M(T=1)……以此类推,直到得到模型M(T=n)。在此之后,该特定用户播放音乐时,可以使用模型M(T=n)。也就 是说,可以使用M(T=n)得到对应的音量设置。如果特定用户对此次得到的音量设置都满意,不再做再次调节,则模型M(T=n)即为针对该特定用户的专用于特定用户的音量调节模型。或者,即使特定用户对其中部分音量设置不满意,但是该特定用户进行再次调节的数量小于预设值,则模型M(T=n)为针对该特定用户的专用于特定用户的音量调节模型。示例性地,该过程可以参见图7所示。Specifically, assuming that a particular user plays N pieces of music, the volume adjustment baseline model can be used to obtain corresponding N volume settings. If the specific user is not satisfied with some of the volume settings later, it will be adjusted again, assuming that the specific user has adjusted the volume of the N1 music again. If N1 is greater than the preset value (assuming N0), you can use this N1 music as a training sample to train on the basis of the volume adjustment baseline model to get the trained model, which is called model M (T = 1 ). Among them, T may represent a batch of online training for a specific user. After that, when the particular user plays music, the model M (T = 1) can be used instead of the baseline model for volume adjustment. Specifically, if a specific user plays N pieces of music, then the model M (T = 1) can be used to obtain the corresponding N volume settings. If the specific user is not satisfied with some of the volume settings, it will be adjusted again, assuming The specific user adjusted the volume of the N2 music again. If N2 is greater than a preset value (assuming N0), you can use these N2 music as training samples to train on the basis of model M (T = 1), get the trained model, and call it model M ( T = 2). After that, when the particular user plays music, the model M (T = 2) can be used instead of the volume adjustment baseline model and model M (T = 1) ... and so on, until the model M (T = n ). After that, the model M (T = n) can be used when the specific user plays music. That is, you can use M (T = n) to get the corresponding volume setting. If a specific user is satisfied with the volume settings obtained this time and does not perform adjustment again, the model M (T = n) is a volume adjustment model dedicated to the specific user for the specific user. Alternatively, even if a specific user is not satisfied with some of the volume settings, but the number of readjustments made by the specific user is less than a preset value, the model M (T = n) is a volume adjustment model dedicated to the specific user for the specific user. Exemplarily, this process can be shown in FIG. 7.
其中,特定用户进行再次调节的数量小于预设值可以是指,特定用户进行再次调节的频率小于预设频率,举例来说,该预设频率可以等于N0/N。例如,使用模型M(T=n)得到N个音乐的音量设置,该特定用户进行再次调节的音乐的数量小于N0。或者,例如,使用模型M(T=n)得到NN个音乐的音量设置,该特定用户进行再次调节的音乐的数量小于NN*N0/N。则说明该特定用户再次调节的频率小于预设频率。Wherein, the number of readjustments performed by a specific user is less than a preset value, which may mean that the frequency of readjustment performed by a specific user is less than a preset frequency. For example, the preset frequency may be equal to N0 / N. For example, using the model M (T = n) to obtain the volume settings of N pieces of music, the number of pieces of music that the specific user has adjusted again is less than N0. Or, for example, using the model M (T = n) to obtain the volume settings of NN pieces of music, the number of pieces of music that the specific user performs adjustment again is less than NN * N0 / N. It means that the frequency that the specific user adjusts again is less than the preset frequency.
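A sketch of this online-learning loop is given below for illustration only; the batch structure, the helper names and the fine-tuning call are assumptions, with n0 playing the role of the preset value N0 and each batch corresponding to N played songs.

```python
# Illustrative sketch only: the per-user online-learning loop described above.
import numpy as np

def personalize(model, play_batches, n0: int):
    """play_batches: iterable of lists of (input_vector, readjusted_volume or None)."""
    for batch in play_batches:
        feedback = [(x, v) for x, v in batch if v is not None]  # songs the user readjusted
        if len(feedback) < n0:
            break  # the user rarely readjusts: the current model is the dedicated model
        xs = np.stack([x for x, _ in feedback])
        ys = np.array([v for _, v in feedback])
        model.fit(xs, ys, epochs=5, verbose=0)  # fine-tune on the user's own settings
    return model  # user-specific volume adjustment model M(T=n)
```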
由此可见,本发明实施例可以在音量调节基线模型的基础上,根据特定用户的再次调节,通过在线学习得到专用于特定用户的音量调节模型。在此之后,可以使用该专用于特定用户的音量调节模型,对特定用户想要播放的待播放音乐自动进行音量设置,减少了用户的操作,提升了用户体验。It can be seen that, in the embodiment of the present invention, a volume adjustment model dedicated to a specific user can be obtained through online learning based on the volume adjustment baseline model and according to readjustment by a specific user. After that, the volume adjustment model dedicated to a specific user can be used to automatically set the volume of the music to be played that the specific user wants to play, reducing user operations and improving the user experience.
图8是本发明实施例的对音乐进行音量调节的设备的一个示意性框图。图8所示的设备30包括获取模块310、确定模块320和调节模块330。FIG. 8 is a schematic block diagram of a device for adjusting volume of music according to an embodiment of the present invention. The device 30 shown in FIG. 8 includes an acquisition module 310, a determination module 320, and an adjustment module 330.
获取模块310用于获取待播放音乐的时域波形以及播放环境的噪声的时域波形。The obtaining module 310 is configured to obtain a time-domain waveform of music to be played and a time-domain waveform of noise of a playback environment.
确定模块320用于根据所述待播放音乐的时域波形以及所述噪声的时域波形,使用预先训练好的神经网络,得到所述待播放音乐的音量设置。The determining module 320 is configured to obtain a volume setting of the music to be played according to a time domain waveform of the music to be played and a time domain waveform of the noise by using a pre-trained neural network.
调节模块330用于使用所述音量设置调节所述待播放音乐的音量。The adjusting module 330 is configured to use the volume setting to adjust the volume of the music to be played.
作为一种实现方式,图8所示的设备30可以为服务器端(即云端)。可选地,该设备30还可以包括训练模块,用于基于训练数据集,通过训练得到所述预先训练好的神经网络。As an implementation manner, the device 30 shown in FIG. 8 may be a server side (that is, the cloud). Optionally, the device 30 may further include a training module for obtaining the pre-trained neural network through training based on the training data set.
作为一种实现方式,设备30可以包括训练模块,用于通过在线学习得到专用于所述特定用户的音量调节神经网络。As an implementation manner, the device 30 may include a training module for obtaining a volume adjustment neural network dedicated to the specific user through online learning.
具体地:可以将所述预先训练好的神经网络作为基线模型。重复执行以下步骤,直到特定用户的再次调节指令的次数小于预设值:对在播放音乐,使用所述基线模型得到相应的音量设置;获取所述特定用户对所述相应的音 量设置的再次调节指令;若所述特定用户的再次调节指令的次数达到预设值,则将所述特定用户调节后的音量作为训练样本,在所述基线模型的基础上进行学习,得到更新后的模型,并用所述更新后的模型替换基线模型。则最终得到的更新后的模型即为专用于所述特定用户的音量调节神经网络。Specifically: the pre-trained neural network may be used as a baseline model. Repeat the following steps until the number of times the specific user readjusts the instruction is less than the preset value: for the music being played, use the baseline model to obtain the corresponding volume setting; obtain the specific user's readjustment of the corresponding volume setting Instruction; if the number of times that the specific user adjusts the instruction again reaches a preset value, the volume adjusted by the specific user is used as a training sample, learning is performed on the basis of the baseline model, and an updated model is obtained and used The updated model replaces the baseline model. Then the updated model finally obtained is a volume adjustment neural network dedicated to the specific user.
作为一种实现方式,所述预先训练好的神经网络包括:音乐风格神经网络、噪声类别辨识神经网络以及音量调节神经网络。确定模块320可以具体用于:根据所述待播放音乐的时域波形以及所述噪声的时域波形,使用音乐风格神经网络、噪声类别辨识神经网络以及音量调节神经网络,得到所述待播放音乐的音量设置。As an implementation manner, the pre-trained neural network includes: a musical style neural network, a noise category identification neural network, and a volume adjustment neural network. The determining module 320 may be specifically configured to use the music style neural network, the noise category recognition neural network, and the volume adjustment neural network to obtain the music to be played according to the time domain waveform of the music to be played and the time domain waveform of the noise. Volume setting.
可选地,确定模块320可以包括风格向量确定单元、噪声类别确定单元、音乐能量特征确定单元、噪声能量特征确定单元以及音量确定单元。Optionally, the determination module 320 may include a style vector determination unit, a noise category determination unit, a music energy feature determination unit, a noise energy feature determination unit, and a volume determination unit.
风格向量确定单元用于根据所述待播放音乐的时域波形,使用所述音乐风格神经网络,得到所述待播放音乐的风格向量。A style vector determining unit is configured to obtain a style vector of the music to be played according to a time-domain waveform of the music to be played by using the music style neural network.
The noise category determination unit is configured to use the noise category identification neural network to obtain the category of the noise according to the time-domain waveform of the noise.
音乐能量特征确定单元用于根据所述待播放音乐的时域波形得到所述待播放音乐的能量特征。The music energy characteristic determining unit is configured to obtain an energy characteristic of the music to be played according to a time-domain waveform of the music to be played.
噪声能量特征确定单元用于根据所述噪声的时域波形得到所述噪声的能量特征。The noise energy characteristic determining unit is configured to obtain an energy characteristic of the noise according to a time-domain waveform of the noise.
The volume determination unit is configured to input the style vector of the music to be played, the category of the noise, the energy feature of the music to be played and the energy feature of the noise to the volume adjustment neural network to obtain the volume setting of the music to be played.
其中,风格向量确定单元具体用于:对所述待播放音乐的时域波形进行分帧,并对分帧后的每帧进行特征提取,得到所述待播放音乐的特征;将所述待播放音乐的特征输入至所述音乐风格神经网络,得到所述该待播放音乐的风格向量。The style vector determining unit is specifically configured to frame the time-domain waveform of the music to be played, and extract features from each frame after the frame to obtain the characteristics of the music to be played; Music characteristics are input to the music style neural network to obtain the style vector of the music to be played.
其中,噪声类别确定单元具体用于:对所述噪声的时域波形进行分帧,并对分帧后的每帧进行特征提取,得到所述噪声的特征;将所述噪声的特征输入至所述噪声类别辨识神经网络,得到所述噪声的类别。The noise category determination unit is specifically configured to: frame the time-domain waveform of the noise, and extract features from each frame after the frame to obtain the characteristics of the noise; and input the characteristics of the noise to all The noise category identification neural network is used to obtain the category of the noise.
The energy feature of the music to be played includes the average amplitude of the music to be played, and the music energy feature determination unit is specifically configured to: take the absolute value of the amplitude at each point of the time-domain waveform of the music to be played, and divide the sum of these values by the total number of points to obtain the energy feature of the music to be played.
The energy feature of the noise includes the average amplitude of the noise, and the noise energy feature determination unit is specifically configured to: take the absolute value of the amplitude at each point of the time-domain waveform of the noise, and divide the sum of these values by the total number of points to obtain the energy feature of the noise.
作为一种实现方式,设备30还包括训练模块,用于:基于音乐训练数据集,通过训练得到所述音乐风格神经网络。As an implementation manner, the device 30 further includes a training module, configured to obtain the music style neural network through training based on the music training data set.
Each piece of music training data in the music training data set has a music style vector. The training module obtains the music style vector of a piece of music training data in the following manner: acquiring the style annotation information of a large number of users for multiple pieces of music training data, and generating an annotation matrix based on the style annotation information; and determining the music style vector of each piece of music training data according to the annotation matrix.
具体地,将所述标注矩阵分解为第一矩阵与第二矩阵的乘积;将所述第一矩阵的各个行向量确定为对应的音乐训练数据的音乐风格向量。Specifically, the labeling matrix is decomposed into a product of a first matrix and a second matrix; and each row vector of the first matrix is determined as a music style vector of corresponding music training data.
In one implementation, the device 30 further includes a training module configured to obtain the noise category identification neural network through training based on a noise training data set.
Exemplarily, the time-domain waveform of the noise acquired by the acquiring module 310 is collected by a sound pickup device of the client.
In one implementation, the device 30 further includes a playback module configured to play the music to be played after its volume has been adjusted.
The device 30 shown in FIG. 8 can be used to implement the foregoing method for adjusting the volume of music; to avoid repetition, the details are not repeated here.
As shown in FIG. 9, an embodiment of the present invention further provides another device for adjusting the volume of music, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the steps of the method described above are implemented.
Specifically, the processor may acquire the time-domain waveform of the music to be played and the time-domain waveform of the noise of the playback environment; obtain the volume setting of the music to be played from the time-domain waveform of the music to be played and the time-domain waveform of the noise by using a pre-trained neural network; and adjust the volume of the music to be played using the volume setting. The pre-trained neural network includes a music style neural network, a noise category identification neural network, and a volume adjustment neural network.
The processor may also obtain, through online learning, a volume adjustment neural network dedicated to a specific user.
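A minimal sketch of such online learning is shown below, following the procedure recited in claim 2: the baseline model's parameters are copied, the user's manual readjustments are collected, and once their number reaches a preset value the parameters are fine-tuned on those samples. The linear stand-in model, the threshold of 5, and the class name OnlineVolumeTuner are assumptions for illustration; the disclosure itself uses the volume adjustment neural network.

```python
import numpy as np

class OnlineVolumeTuner:
    """Sketch of user-specific online learning: start from the baseline model's
    parameters and fine-tune them on the volumes a specific user sets manually.
    A linear model stands in for the volume adjustment neural network."""

    def __init__(self, baseline_w, baseline_b, preset_count=5, lr=0.05):
        self.w = np.array(baseline_w, dtype=float)   # parameters copied from the baseline model
        self.b = float(baseline_b)
        self.preset_count = preset_count             # number of readjustments that triggers an update
        self.lr = lr
        self.samples = []                            # (feature vector, user-adjusted volume) pairs

    def predict(self, x):
        return float(np.dot(x, self.w) + self.b)

    def record_readjustment(self, x, user_volume):
        """Called whenever the user manually overrides the suggested volume."""
        self.samples.append((np.asarray(x, dtype=float), float(user_volume)))
        if len(self.samples) >= self.preset_count:
            self._fine_tune()
            self.samples.clear()                     # the updated model becomes the new baseline

    def _fine_tune(self, epochs=200):
        X = np.stack([s[0] for s in self.samples])
        y = np.array([s[1] for s in self.samples])
        for _ in range(epochs):                      # plain gradient descent on squared error
            err = X @ self.w + self.b - y
            self.w -= self.lr * (X.T @ err) / len(y)
            self.b -= self.lr * err.mean()

# Usage: the features are the same inputs the volume adjustment network receives
tuner = OnlineVolumeTuner(baseline_w=np.zeros(9), baseline_b=0.5)
x = np.random.default_rng(0).random(9)
tuner.record_readjustment(x, user_volume=0.8)
print(tuner.predict(x))
```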
Exemplarily, the device for adjusting the volume of music in an embodiment of the present invention may include one or more processors, one or more memories, an input apparatus, and an output apparatus, and these components are interconnected through a bus system and/or another form of connection mechanism. It should be noted that the device may also have other components and structures as required.
The processor may be a central processing unit (CPU) or another form of processing unit having data processing capability and/or instruction execution capability, and may control other components in the device to perform desired functions.
The memory may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, or flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor may run the program instructions to implement the client functions (implemented by the processor) in the embodiments of the present invention described herein and/or other desired functions. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input apparatus may be an apparatus used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output apparatus may output various information (for example, images or sounds) to the outside (for example, to a user), and may include one or more of a display, a speaker, and the like.
The input apparatus and the output apparatus may be external apparatuses that communicate with the processor in a wired or wireless manner.
In addition, an embodiment of the present invention further provides a computer storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the volume adjustment method described above can be implemented. For example, the computer storage medium is a computer-readable storage medium.
It can be seen that the embodiments of the present invention use a pre-trained neural network that includes a music style neural network, a noise category identification neural network, and a volume adjustment neural network, and that takes into account factors affecting the user's current volume preference, such as the noise category of the environment and the style of the music, so that the volume of the music the user wants to play can be adjusted automatically. This greatly simplifies the user's operation and improves the user experience. Moreover, the volume can be readjusted according to the volume preference of a specific user, and a volume adjustment model dedicated to that user can be obtained through online learning, so that this dedicated model can then be used to automatically set the volume of the music the specific user wants to play.
A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered as going beyond the scope of the present invention.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is merely a logical functional division, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above descriptions are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (17)

1. A method for adjusting the volume of music, comprising:
    acquiring a time-domain waveform of music to be played and a time-domain waveform of noise of a playback environment;
    obtaining a volume setting of the music to be played from the time-domain waveform of the music to be played and the time-domain waveform of the noise by using a pre-trained neural network; and
    adjusting the volume of the music to be played using the volume setting.
2. The method according to claim 1, further comprising:
    using the pre-trained neural network as a baseline model; and
    repeating the following steps until the number of readjustment instructions from a specific user is less than a preset value:
    obtaining, for music being played, a corresponding volume setting using the baseline model;
    acquiring a readjustment instruction of the specific user for the corresponding volume setting; and
    if the number of readjustment instructions from the specific user reaches the preset value, using the volumes adjusted by the specific user as training samples, learning on the basis of the parameters of the baseline model to obtain an updated model, and replacing the baseline model with the updated model.
3. The method according to claim 1, wherein the pre-trained neural network comprises a music style neural network, a noise category identification neural network, and a volume adjustment neural network.
4. The method according to claim 3, wherein obtaining the volume setting of the music to be played comprises:
    obtaining a style vector of the music to be played from the time-domain waveform of the music to be played by using the music style neural network;
    obtaining a category of the noise from the time-domain waveform of the noise by using the noise category identification neural network;
    obtaining an energy characteristic of the music to be played from the time-domain waveform of the music to be played;
    obtaining an energy characteristic of the noise from the time-domain waveform of the noise; and
    inputting the style vector of the music to be played, the category of the noise, the energy characteristic of the music to be played, and the energy characteristic of the noise into the volume adjustment neural network to obtain the volume setting of the music to be played.
5. The method according to claim 4, wherein obtaining the style vector of the music to be played comprises:
    dividing the time-domain waveform of the music to be played into frames and performing feature extraction on each frame to obtain features of the music to be played; and
    inputting the features of the music to be played into the music style neural network to obtain the style vector of the music to be played.
6. The method according to claim 4, wherein obtaining the category of the noise comprises:
    dividing the time-domain waveform of the noise into frames and performing feature extraction on each frame to obtain features of the noise; and
    inputting the features of the noise into the noise category identification neural network to obtain the category of the noise.
7. The method according to claim 4, wherein the energy characteristic of the music to be played comprises an average amplitude of the music to be played, and obtaining the energy characteristic of the music to be played comprises:
    taking the absolute value of the amplitude at each point of the time-domain waveform of the music to be played and dividing the sum of these values by the total number of points to obtain the average amplitude of the music to be played.
8. The method according to claim 4, wherein the energy characteristic of the noise comprises an average amplitude of the noise, and obtaining the energy characteristic of the noise comprises:
    taking the absolute value of the amplitude at each point of the time-domain waveform of the noise and dividing the sum of these values by the total number of points to obtain the average amplitude of the noise.
9. The method according to claim 3, further comprising, before the music style neural network is used:
    obtaining the music style neural network through training based on a music training data set.
10. The method according to claim 9, wherein each music training data item in the music training data set has a music style vector, and the music style vector of the music training data is obtained in the following manner:
    acquiring style annotation information of a large number of users on a plurality of music training data items, and generating an annotation matrix based on the style annotation information; and
    determining the music style vector of each music training data item according to the annotation matrix.
11. The method according to claim 10, wherein determining the music style vector of each music training data item according to the annotation matrix comprises:
    decomposing the annotation matrix into a product of a first matrix and a second matrix; and
    determining each row vector of the first matrix as the music style vector of the corresponding music training data item.
12. The method according to claim 3, further comprising, before the noise category identification neural network is used:
    obtaining the noise category identification neural network through training based on a noise training data set.
13. The method according to claim 1, wherein the time-domain waveform of the noise is collected by a sound pickup device of a client.
14. The method according to any one of claims 1 to 13, further comprising:
    playing the music to be played after the volume has been adjusted.
15. A device for adjusting the volume of music, wherein the device is configured to implement the method according to any one of claims 1 to 14, and the device comprises:
    an acquiring module, configured to acquire a time-domain waveform of music to be played and a time-domain waveform of noise of a playback environment;
    a determining module, configured to obtain a volume setting of the music to be played from the time-domain waveform of the music to be played and the time-domain waveform of the noise by using a pre-trained neural network; and
    an adjusting module, configured to adjust the volume of the music to be played using the volume setting.
16. A device for adjusting the volume of music, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 14 when executing the computer program.
17. A computer storage medium on which a computer program is stored, wherein the steps of the method according to any one of claims 1 to 14 are implemented when the computer program is executed by a processor.
PCT/CN2019/089758 2018-06-05 2019-06-03 Method and device for adjusting volume of music WO2019233361A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810583114.1 2018-06-05
CN201810583114.1A CN109147816B (en) 2018-06-05 2018-06-05 Method and equipment for adjusting volume of music

Publications (1)

Publication Number Publication Date
WO2019233361A1 true WO2019233361A1 (en) 2019-12-12

Family

ID=64802002

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089758 WO2019233361A1 (en) 2018-06-05 2019-06-03 Method and device for adjusting volume of music

Country Status (2)

Country Link
CN (1) CN109147816B (en)
WO (1) WO2019233361A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109147816B (en) * 2018-06-05 2021-08-24 安克创新科技股份有限公司 Method and equipment for adjusting volume of music
CN110012386B (en) * 2019-03-29 2021-05-11 维沃移动通信有限公司 Volume adjusting method of terminal and terminal
CN112118485B (en) * 2020-09-22 2022-07-08 英华达(上海)科技有限公司 Volume self-adaptive adjusting method, system, equipment and storage medium
CN113823318A (en) * 2021-06-25 2021-12-21 腾讯科技(深圳)有限公司 Multiplying power determining method based on artificial intelligence, volume adjusting method and device
CN116208700B (en) * 2023-04-25 2023-07-21 深圳市华卓智能科技有限公司 Control method and system for communication between mobile phone and audio equipment

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2101411B1 (en) * 2008-03-12 2016-06-01 Harman Becker Automotive Systems GmbH Loudness adjustment with self-adaptive gain offsets
CN102446504B (en) * 2010-10-08 2013-10-09 华为技术有限公司 Voice/Music identifying method and equipment
CN102664017B (en) * 2012-04-25 2013-05-08 武汉大学 Three-dimensional (3D) audio quality objective evaluation method
CN102842310A (en) * 2012-08-10 2012-12-26 上海协言科学技术服务有限公司 Method for extracting and utilizing audio features for repairing Chinese national folk music audios
CN105159066B (en) * 2015-06-18 2017-11-07 同济大学 A kind of intelligent music Room regulation and control method and regulation device
KR20170030384A (en) * 2015-09-09 2017-03-17 삼성전자주식회사 Apparatus and Method for controlling sound, Apparatus and Method for learning genre recognition model
US9571628B1 (en) * 2015-11-13 2017-02-14 International Business Machines Corporation Context and environment aware volume control in telephonic conversation
CN105845120B (en) * 2016-05-24 2023-08-01 广东禾川电机科技有限公司 Silencer, atomizer and silencer screw design method
CN106502618B (en) * 2016-10-21 2020-10-13 深圳市冠旭电子股份有限公司 Hearing protection method and device
CN107436751A (en) * 2017-08-18 2017-12-05 广东欧珀移动通信有限公司 volume adjusting method, device, terminal device and storage medium
CN107564538A (en) * 2017-09-18 2018-01-09 武汉大学 The definition enhancing method and system of a kind of real-time speech communicating
CN107682561A (en) * 2017-11-10 2018-02-09 广东欧珀移动通信有限公司 volume adjusting method, device, terminal and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160125869A1 (en) * 2014-11-05 2016-05-05 Voyetra Turtle Beach, Inc. HEADSET WITH USER CONFIGURABLE NOISE CANCELLATION vs AMBIENT NOISE PICKUP
CN106027809A (en) * 2016-07-27 2016-10-12 维沃移动通信有限公司 Volume adjusting method and mobile terminal
CN106374864A (en) * 2016-09-29 2017-02-01 深圳市茁壮网络股份有限公司 Volume adjustment method and device
CN107886943A (en) * 2017-11-21 2018-04-06 广州势必可赢网络科技有限公司 A kind of method for recognizing sound-groove and device
CN109147816A (en) * 2018-06-05 2019-01-04 安克创新科技股份有限公司 The method and apparatus of volume adjustment is carried out to music

Also Published As

Publication number Publication date
CN109147816B (en) 2021-08-24
CN109147816A (en) 2019-01-04

Similar Documents

Publication Publication Date Title
US11790934B2 (en) Deep learning based method and system for processing sound quality characteristics
WO2019233361A1 (en) Method and device for adjusting volume of music
US11875807B2 (en) Deep learning-based audio equalization
US9691379B1 (en) Selecting from multiple content sources
CN104768049B (en) Method, system and computer readable storage medium for synchronizing audio data and video data
WO2020155490A1 (en) Method and apparatus for managing music based on speech analysis, and computer device
Muthusamy et al. Particle swarm optimization based feature enhancement and feature selection for improved emotion recognition in speech and glottal signals
CN106898339B (en) Song chorusing method and terminal
WO2019137392A1 (en) File classification processing method and apparatus, terminal, server, and storage medium
CN104091596A (en) Music identifying method, system and device
Haque et al. An analysis of content-based classification of audio signals using a fuzzy c-means algorithm
CN110853606A (en) Sound effect configuration method and device and computer readable storage medium
TW202223804A (en) Electronic resource pushing method and system
CN116132875B (en) Multi-mode intelligent control method, system and storage medium for hearing-aid earphone
JP6233625B2 (en) Audio processing apparatus and method, and program
CN113032616B (en) Audio recommendation method, device, computer equipment and storage medium
WO2019233359A1 (en) Method and device for transparency processing of music
CN113395577A (en) Sound changing playing method and device, storage medium and electronic equipment
Dutta et al. A hierarchical approach for silence/speech/music classification
JP7230085B2 (en) Method and device, electronic device, storage medium and computer program for processing sound
Astapov et al. Acoustic event mixing to multichannel AMI data for distant speech recognition and acoustic event classification benchmarking
CN115565508A (en) Song matching method and device, electronic equipment and storage medium
CN114664316A (en) Audio restoration method, device, equipment and medium based on automatic pickup
JP6169526B2 (en) Specific voice suppression device, specific voice suppression method and program
CN117174082A (en) Training and executing method, device, equipment and storage medium of voice wake-up model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19814093

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19814093

Country of ref document: EP

Kind code of ref document: A1