CN107045867B - Automatic composition method and device and terminal equipment


Info

Publication number
CN107045867B
Authority
CN
China
Prior art keywords
frame
music
value
note
difference
Prior art date
Legal status
Active
Application number
CN201710175115.8A
Other languages
Chinese (zh)
Other versions
CN107045867A (en)
Inventor
何江聪
潘青华
胡国平
胡郁
刘庆峰
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201710175115.8A
Publication of CN107045867A
Application granted
Publication of CN107045867B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 Details of electrophonic musical instruments
    • G10H 1/0008 Associated control or indicating means
    • G10H 1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00 characterised by the type of extracted parameters
    • G10L 25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00 characterised by the analysis technique
    • G10L 25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00 characterised by the analysis technique using neural networks
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00 specially adapted for particular use

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The application provides an automatic composition method, an automatic composition device and terminal equipment, wherein the automatic composition method comprises the following steps: receiving a music file of a front piece of music to be predicted, wherein the music file of the front piece of music to be predicted comprises audio data or music description information of the front piece of music to be predicted; extracting frame-level audio features of music corresponding to the music file; according to the frame-level audio features and a pre-constructed music frequency band feature combination model, obtaining frame-level audio features carrying frequency band information; and obtaining the predicted music according to the frame-level audio features carrying the frequency band information and a pre-constructed music prediction model so as to realize automatic composition. The method and the device can realize automatic composition, further improve the efficiency and feasibility of automatic composition, and reduce the influence of subjective factors on automatic composition.

Description

Automatic composition method and device and terminal equipment
Technical Field
The present application relates to the field of audio signal processing technologies, and in particular, to an automatic composition method, an automatic composition device, and a terminal device.
Background
With the application of computer technology to music processing, computer music has come into being. As a new form of art, computer music has gradually penetrated various aspects of music creation, instrument performance, education and entertainment. Automatic music composition using artificial intelligence technology is a relatively new research direction within computer music and has received considerable attention from researchers in related fields in recent years.
Existing automatic composition methods based on artificial intelligence technology mainly fall into two categories: automatic composition based on heuristic search and automatic composition based on genetic algorithms. However, automatic composition based on heuristic search is only suitable for short pieces of music, since the search efficiency decreases exponentially as the length of the music grows, so the feasibility of the method is poor for longer pieces; automatic composition based on genetic algorithms inherits some typical defects of genetic algorithms, such as strong dependence on the initial population and difficulty in accurately selecting suitable genetic operators.
Disclosure of Invention
The present application aims to solve at least one of the technical problems in the related art to some extent.
To this end, a first object of the present application is to propose an automatic composition method. The method realizes automatic composition by constructing the music frequency band characteristic combination model and the music prediction model, is a brand-new automatic composition method, and solves the problems of low efficiency, poor feasibility, large subjective influence and the like in the prior art.
A second object of the present application is to provide an automatic composition device.
A third object of the present application is to provide a terminal device.
A fourth object of the present application is to propose a storage medium containing computer executable instructions.
In order to achieve the above object, an automatic composition method according to an embodiment of the first aspect of the present application includes: receiving a music file of a front piece of music to be predicted, wherein the music file of the front piece of music to be predicted comprises audio data or music description information of the front piece of music to be predicted; extracting frame-level audio features of music corresponding to the music file; according to the frame-level audio features and a pre-constructed music frequency band feature combination model, obtaining frame-level audio features carrying frequency band information; and obtaining the predicted music according to the frame-level audio features carrying the frequency band information and a pre-constructed music prediction model so as to realize automatic composition.
According to the automatic composition method, after a music file of a front section of music to be predicted is received, frame-level audio features of the music corresponding to the music file are extracted, then frame-level audio features carrying frequency band information are obtained according to the frame-level audio features and a pre-constructed music frequency band feature combination model, and finally predicted music is obtained according to the frame-level audio features carrying frequency band information and a pre-constructed music prediction model, so that automatic composition can be achieved, the efficiency and feasibility of automatic composition can be improved, and the influence of subjective factors on automatic composition is reduced.
In order to achieve the above object, an automatic composition apparatus according to an embodiment of the second aspect of the present application includes: the device comprises a receiving module, a prediction module and a prediction module, wherein the receiving module is used for receiving a music file of a front piece of music to be predicted, and the music file of the front piece of music to be predicted comprises audio data or music description information of the front piece of music to be predicted; the extracting module is used for extracting the frame-level audio features of the music corresponding to the music file received by the receiving module; the obtaining module is used for obtaining frame-level audio features carrying frequency band information according to the frame-level audio features and a pre-constructed music frequency band feature combination model; and obtaining the predicted music according to the frame-level audio features carrying the frequency band information and a pre-constructed music prediction model so as to realize automatic composition.
In the automatic composition device according to the embodiment of the application, after the receiving module receives a music file of a piece of music to be predicted, the extracting module extracts frame-level audio features of the music corresponding to the music file, the obtaining module obtains the frame-level audio features carrying band information according to the frame-level audio features and a pre-constructed music band feature combination model, and obtains the predicted music according to the frame-level audio features carrying band information and the pre-constructed music prediction model, so that automatic composition can be realized, the efficiency and feasibility of automatic composition can be improved, and the influence of subjective factors on automatic composition can be reduced.
In order to achieve the above object, a terminal device according to an embodiment of the third aspect of the present application includes: one or more processors; storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods as described above.
To achieve the above object, a fourth aspect of the present application provides a storage medium containing computer-executable instructions for performing the method as described above when executed by a computer processor.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of one embodiment of an automatic composition method of the present application;
FIG. 2 is a flow chart of another embodiment of an automatic composition method of the present application;
FIG. 3 is a schematic diagram of one embodiment of a topology in an automatic composition method of the present application;
FIG. 4 is a flow chart of yet another embodiment of an automatic composition method of the present application;
FIG. 5 is a schematic representation of energy value coordinates in an automatic composition method of the present application;
FIG. 6 is a flow chart of yet another embodiment of an automatic composition method of the present application;
FIG. 7 is a flow chart of yet another embodiment of an automatic composition method of the present application;
FIG. 8 is a schematic diagram of another embodiment of a topology in an automatic composition method of the present application;
FIG. 9 is a schematic structural diagram of an embodiment of an automatic composition device according to the present application;
fig. 10 is a schematic structural view of another embodiment of the automatic composition device of the present application;
fig. 11 is a schematic structural diagram of an embodiment of a terminal device according to the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. On the contrary, the embodiments of the application include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
Fig. 1 is a flowchart of an embodiment of an automatic composition method according to the present application, and as shown in fig. 1, the automatic composition method may include:
step 101, receiving a music file of a previous piece of music to be predicted, where the music file of the previous piece of music to be predicted includes audio data or music description information of the previous piece of music to be predicted.
The audio data or the music description information of the preceding piece of music to be predicted refers to the audio data or the music description information of a given section of music, and then the following piece of music can be predicted according to the audio data or the music description information of the given section of music.
The music description information may be generally converted into audio data, and the music description information may be a Musical Instrument Digital Interface (MIDI) file or the like.
And 102, extracting the frame-level audio features of the music corresponding to the music file.
And 103, obtaining frame-level audio features carrying frequency band information according to the frame-level audio features and a pre-constructed music frequency band feature combination model.
And step 104, obtaining the predicted music according to the frame-level audio features carrying the frequency band information and a pre-constructed music prediction model so as to realize automatic composition.
According to the automatic composition method, after a music file of a front section of music to be predicted is received, frame-level audio features of the music corresponding to the music file are extracted, then frame-level audio features carrying frequency band information are obtained according to the frame-level audio features and a pre-constructed music frequency band feature combination model, and finally predicted music is obtained according to the frame-level audio features carrying frequency band information and a pre-constructed music prediction model, so that automatic composition can be realized, the efficiency and feasibility of automatic composition can be improved, and the influence of subjective factors on automatic composition is reduced.
Fig. 2 is a flowchart of another embodiment of the automatic composition method of the present application, as shown in fig. 2, before step 103, the method may further include:
step 201, collecting music files and converting the music files into audio files with the same format.
Specifically, a large amount of training data can be obtained by crawling a large number of music files on the internet, where the music files may be audio data or music description information, such as MIDI files. The music files can then be converted into audio files with the same format, where the format only needs to satisfy the requirement of the Fast Fourier Transform (FFT), for example ".pcm" or ".wav"; the format of the audio file is not limited in this embodiment, and ".pcm" is taken as an example for explanation. It should be noted that if a music file is music description information, such as a MIDI file, the MIDI file must first be converted into an audio file and then into an audio file in ".pcm" format.
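Purely as an illustration of this collection-and-conversion step (not part of the patent text), the following Python sketch shows one way such a conversion could be scripted, assuming the external tools fluidsynth (to render MIDI to audio) and ffmpeg are available; the sample rate, SoundFont path and directory name are hypothetical.

    import subprocess
    from pathlib import Path

    SAMPLE_RATE = 16000          # hypothetical target sample rate
    SOUNDFONT = "default.sf2"    # hypothetical SoundFont used to render MIDI files

    def to_pcm(src: Path, dst: Path) -> None:
        """Convert an arbitrary collected music file to raw 16-bit mono PCM."""
        if src.suffix.lower() in (".mid", ".midi"):
            # Music description information (MIDI) is first rendered to audio ...
            wav = src.with_suffix(".wav")
            subprocess.run(["fluidsynth", "-ni", SOUNDFONT, str(src),
                            "-F", str(wav), "-r", str(SAMPLE_RATE)], check=True)
            src = wav
        # ... and every audio file is then converted to the same raw ".pcm" format.
        subprocess.run(["ffmpeg", "-y", "-i", str(src), "-f", "s16le",
                        "-acodec", "pcm_s16le", "-ac", "1",
                        "-ar", str(SAMPLE_RATE), str(dst)], check=True)

    for path in Path("corpus").iterdir():
        to_pcm(path, path.with_suffix(".pcm"))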
Step 202, extracting the frame-level audio features of the audio file.
Step 203, determining the topological structure of the music frequency band feature combination model.
Specifically, the topological structure is a hedged neural network structure. In this embodiment, a hedged Recurrent Neural Network (RNN) is taken as an example: the topological structure includes two independent RNNs and a connection unit, as shown in fig. 3, which is a schematic diagram of an embodiment of the topological structure in the automatic composition method of the present application. The two independent RNNs, named LF_RNN and HF_RNN, are used for low-band multi-frequency feature combination and high-band multi-frequency feature combination, respectively.
The input of LF_RNN is, for a given frame T_m, the energy value E(T_m, F_i) taken from the low-frequency end, i = 1, 2, …, k, with k = 1, 2, …, N/2 (N is the number of FFT points), together with the output L_{i-1} of LF_RNN at the previous frequency point; the output of LF_RNN is L_i, denoting the energy value of the i-th frequency point of frame T_m after the low-frequency information has been taken into account.
Similarly, the input of HF_RNN is the energy value E(T_m, F_j) of frame T_m taken from the high-frequency end, j = N/2, N/2-1, …, k, where k = 1, 2, …, N/2 (N is the number of FFT points), together with the output H_{j+1} of HF_RNN at the previous frequency point; the output of HF_RNN is H_j, denoting the energy value of the j-th frequency point of frame T_m after the high-frequency information has been taken into account.
The connection unit is the "concatenate" block in fig. 3: when i = j = k, the two outputs are connected to form N(T_m, F_k), i.e. the energy value of the k-th frequency point of frame T_m that takes the information of the other frequency points into account.
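The following PyTorch sketch illustrates this topology; it is a minimal reading of the text above, and the cell type, hidden size and variable names are assumptions rather than details given in the patent. LF_RNN scans the energy values of one frame from the lowest frequency point up to point k, HF_RNN scans from the highest frequency point down to point k, and the connection unit concatenates the two outputs to produce N(T_m, F_k).

    import torch
    import torch.nn as nn

    class BandFeatureCombiner(nn.Module):
        """Two frequency-direction RNNs (LF_RNN, HF_RNN) joined by a concatenate unit."""

        def __init__(self, hidden: int = 32):
            super().__init__()
            self.lf_rnn = nn.RNNCell(1, hidden)     # scans E(T_m, F_1) ... E(T_m, F_k), low to high
            self.hf_rnn = nn.RNNCell(1, hidden)     # scans E(T_m, F_{N/2}) ... E(T_m, F_k), high to low
            self.concat = nn.Linear(2 * hidden, 1)  # connection unit producing N(T_m, F_k)

        def forward(self, energies: torch.Tensor) -> torch.Tensor:
            # energies: (batch, N/2) energy values of one frame at every frequency point
            batch, n_half = energies.shape
            outputs = []
            for k in range(n_half):
                l = energies.new_zeros(batch, self.lf_rnn.hidden_size)
                h = energies.new_zeros(batch, self.hf_rnn.hidden_size)
                for i in range(k + 1):                    # low-frequency side, up to point k
                    l = self.lf_rnn(energies[:, i:i + 1], l)
                for j in range(n_half - 1, k - 1, -1):    # high-frequency side, down to point k
                    h = self.hf_rnn(energies[:, j:j + 1], h)
                outputs.append(self.concat(torch.cat([l, h], dim=1)))
            return torch.cat(outputs, dim=1)              # (batch, N/2): N(T_m, F_k) for every k

In practice the two scans could be realised as a single bidirectional pass over the frequency axis; the explicit per-k loops above simply mirror the description and are quadratic in the number of frequency points.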
And step 204, training the music frequency band characteristic combination model according to the determined topological structure and the frame-level audio characteristics.
Specifically, when the music band feature combination model is trained, the training algorithm may be a neural network model training algorithm such as the Back Propagation (BP) algorithm; the training algorithm is not limited in this embodiment.
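For example, a back-propagation training loop for the combination model sketched above might look as follows; the loss, optimiser and the reconstruction-style target are assumptions made only for illustration, since the patent does not fix them.

    import torch
    import torch.nn as nn

    def train_band_model(model: nn.Module, frames: torch.Tensor,
                         epochs: int = 10, lr: float = 1e-3) -> None:
        """frames: (num_frames, N/2) frame-level energy features extracted from the corpus."""
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for batch in frames.split(64):       # mini-batches of frames
                opt.zero_grad()
                combined = model(batch)          # N(T_m, F_k) for each frame in the batch
                loss = loss_fn(combined, batch)  # assumed reconstruction-style target
                loss.backward()                  # back propagation
                opt.step()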
Fig. 4 is a flowchart of another embodiment of the automatic composition method of the present application, and as shown in fig. 4, in the embodiment shown in fig. 2 of the present application, step 202 may include:
step 401, performing fast fourier transform of a fixed number of points on the audio file according to a frame.
Specifically, an FFT with a fixed number of points may be applied, frame by frame, to the audio file in ".pcm" format.
And step 402, calculating the energy value of each frame of the audio file at each frequency point according to the result of the fast Fourier transform.
Fig. 5 is a schematic diagram of the energy value coordinates in the automatic composition method of the present application, showing the energy value of each frame at each frequency point, where the horizontal axis t denotes the time-series frame, the vertical axis f denotes the frequency point, the coordinate E(t, f) denotes the energy value, M denotes the total number of frames, and N denotes the number of FFT points.
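As an illustration of steps 401 and 402, the energy matrix E(t, f) of fig. 5 can be computed with a few lines of NumPy; the FFT point count, hop size, sample format and file name below are assumptions, not values fixed by the patent.

    import numpy as np

    N = 512          # fixed number of FFT points (assumed value)
    HOP = 512        # non-overlapping frames are assumed here

    # Read the raw ".pcm" file; 16-bit little-endian mono samples are assumed.
    samples = np.fromfile("music.pcm", dtype="<i2").astype(np.float64)

    M = len(samples) // HOP                          # total number of frames
    frames = samples[:M * HOP].reshape(M, HOP)

    spectrum = np.fft.rfft(frames, n=N, axis=1)      # fixed-point-count FFT per frame
    E = np.abs(spectrum[:, :N // 2]) ** 2            # E(t, f): squared magnitude used as the energy value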
And step 403, determining the attribution of the notes of each frame according to the energy value.
Specifically, at each frequency point, the first frame and the second frame of the audio file are determined to belong to a first note; it is then judged whether the absolute value of a first difference is smaller than a second difference, where the first difference is the difference between the energy value of the third frame of the audio file and the average of the energy values of the first and second frames, and the second difference is the difference between the maximum and minimum of the energy values of the first and second frames; if so, the third frame of the audio file is determined to belong to the first note, and the note attribution of the fourth frame through the last frame is judged backwards in sequence in the same way.
If the absolute value of the first difference is greater than or equal to the second difference, the third frame of the audio file is taken as the beginning of a second note and the fourth frame of the audio file is determined to belong to the second note; starting from the fifth frame of the audio file, it is judged whether the absolute value of a third difference is smaller than a fourth difference, where the third difference is the difference between the energy value of the fifth frame and the average of the energy values of the third and fourth frames, and the fourth difference is the difference between the maximum and minimum of the energy values of the third and fourth frames; the note attribution of the fifth frame is determined in the same way as that of the third frame, and the process is repeated until the note attribution of the last frame of the audio file has been determined.
That is, determining the note attribution of each frame may proceed as follows. At each frequency point: frames T_1 and T_2 are regarded as belonging to the first note, and attribution is judged starting from frame T_3. If |E(T_3, F_1) - E_mean(T_1, T_2)| < (E_max(T_1, T_2) - E_min(T_1, T_2)) is satisfied, frame T_3 belongs to the first note, and the attribution of each subsequent frame is judged backwards in turn in the same way, where E_mean(T_1, T_2), E_max(T_1, T_2) and E_min(T_1, T_2) denote the average, maximum and minimum of the energy values of frames T_1 to T_2, respectively. Otherwise, frame T_3 is taken as the start of the second note and frame T_4 is determined to belong to the second note; judgement then resumes from frame T_5, still using the formula |E(T_5, F_1) - E_mean(T_3, T_4)| < (E_max(T_3, T_4) - E_min(T_3, T_4)) to determine the note attribution of frame T_5, and so on until the note attribution of all frames has been determined.
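The attribution rule can be written out directly; the sketch below follows the formula above at a single frequency point (a hypothetical helper, using zero-based array indexing, so frame T_1 corresponds to energy[0]).

    import numpy as np

    def assign_notes(energy: np.ndarray) -> np.ndarray:
        """energy: energy values of frames T_1..T_M at one frequency point.
        Returns a note index for every frame (0 = first note, 1 = second note, ...)."""
        M = len(energy)
        notes = np.zeros(M, dtype=int)   # frames T_1 and T_2 belong to the first note
        start = 0                        # first frame of the note currently being grown
        m = 2                            # judgement starts from frame T_3
        while m < M:
            prev = energy[start:m]       # frames already attributed to the current note
            if abs(energy[m] - prev.mean()) < prev.max() - prev.min():
                notes[m] = notes[m - 1]          # frame stays in the current note
                m += 1
            else:
                notes[m] = notes[m - 1] + 1      # this frame starts a new note ...
                start = m
                if m + 1 < M:
                    notes[m + 1] = notes[m]      # ... and the next frame is attributed to it as well
                m += 2                           # judgement resumes two frames later
        return notes

In the patent the same judgement is carried out independently at every frequency point.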
And step 404, calculating the energy value of each note, and acquiring the frame-level audio features according to the energy value of each note.
Fig. 6 is a flowchart of another embodiment of the automatic composition method of the present application, and as shown in fig. 6, in the embodiment shown in fig. 4 of the present application, step 404 may include:
step 601, calculating the energy average value of all frames contained in each note as the energy value of each note.
Step 602, normalizing the energy value of each frame included in each note to the energy value of the corresponding note.
Step 603, the notes with energy values less than a predetermined threshold are filtered out to obtain frame-level audio features.
The predetermined threshold may be set according to system performance and/or implementation requirements, and the size of the predetermined threshold is not limited in this embodiment.
In this embodiment, the energy of a note is defined as the mean energy of all frames contained in the note, so the mean energy of all frames contained in each note can be calculated as the energy value E(i) of that note, and the energy value of each frame contained in the note is then normalized to the energy value of the note.
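A sketch of this note-level processing (steps 601 to 603) is given below; the threshold value is an assumed placeholder, since the patent leaves it to system performance and implementation requirements.

    import numpy as np

    def note_level_features(E: np.ndarray, notes: np.ndarray,
                            threshold: float = 1e-6) -> np.ndarray:
        """E: per-frame energy values at one frequency point; notes: note index of every frame."""
        normalized = np.empty_like(E)
        keep = np.ones(len(E), dtype=bool)
        for n in np.unique(notes):
            idx = notes == n
            note_energy = E[idx].mean()      # E(i): mean energy of all frames of note i
            normalized[idx] = note_energy    # normalize each frame to its note's energy value
            if note_energy < threshold:      # filter out notes whose energy is below the threshold
                keep[idx] = False
        return normalized[keep]              # remaining frame-level audio features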
It should be noted that, in the embodiment shown in fig. 2 of the present application, steps 201 to 204 may be executed sequentially with steps 101 to 102, or may be executed concurrently with steps 101 to 102, which is not limited in this application.
Fig. 7 is a flowchart of another embodiment of the automatic composition method of the present application, as shown in fig. 7, in the embodiment shown in fig. 1 of the present application, before step 104, the method may further include:
step 701, determining a topological structure of a music prediction model.
In this embodiment, the music prediction model adopts an RNN model, as shown in fig. 8; fig. 8 is a schematic diagram of another embodiment of the topological structure in the automatic composition method of the present application. The input of the RNN model shown in fig. 8 is the output N(T_m, F_k) of the music band feature combination model together with the model output h_m of the previous frame, and its output is the energy value N(T_{m+1}, F_k) of the next frame.
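A minimal PyTorch sketch of this prediction topology follows; the cell type, hidden size and the frame-by-frame generation loop are assumptions used only to illustrate how the model consumes N(T_m, F_k) together with the previous-frame output h_m and emits N(T_{m+1}, F_k).

    import torch
    import torch.nn as nn

    class MusicPredictor(nn.Module):
        """RNN mapping N(T_m, F_k) and the previous-frame output to N(T_{m+1}, F_k)."""

        def __init__(self, n_freq: int, hidden: int = 128):
            super().__init__()
            self.cell = nn.RNNCell(n_freq, hidden)
            self.out = nn.Linear(hidden, n_freq)

        def forward(self, banded_frames: torch.Tensor) -> torch.Tensor:
            # banded_frames: (num_frames, n_freq), output of the band feature combination model
            h = banded_frames.new_zeros(1, self.cell.hidden_size)
            preds = []
            for frame in banded_frames:
                h = self.cell(frame.unsqueeze(0), h)   # state h_m carries the previous-frame output
                preds.append(self.out(h))              # predicted N(T_{m+1}, F_k)
            return torch.cat(preds, dim=0)

        @torch.no_grad()
        def compose(self, banded_frames: torch.Tensor, n_new: int) -> torch.Tensor:
            """Feed the preceding music, then generate n_new further frames recursively."""
            h = banded_frames.new_zeros(1, self.cell.hidden_size)
            frame = None
            for f in banded_frames:
                h = self.cell(f.unsqueeze(0), h)
                frame = self.out(h)
            generated = []
            for _ in range(n_new):
                generated.append(frame)
                h = self.cell(frame, h)
                frame = self.out(h)
            return torch.cat(generated, dim=0)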
Step 702, training the music prediction model according to the output of the music band feature combination model and the determined topological structure.
It should be noted that step 701 and step 702 may be executed successively with step 101 to step 103, or may be executed in parallel with step 101 to step 103, which is not limited in this embodiment.
The automatic composition method can realize automatic composition, further improve the efficiency and feasibility of automatic composition, reduce the influence of subjective factors on automatic composition, is a brand new automatic composition method, and solves the problems of low efficiency, poor feasibility, large subjective influence and the like in the prior art.
Fig. 9 is a schematic structural diagram of an embodiment of an automatic composition device according to the present application, where the automatic composition device in the present embodiment may be used as a terminal device or a part of a terminal device to implement the automatic composition method provided by the present application. The terminal device may be a client device or a server device, and the form of the terminal device is not limited in the present application.
As shown in fig. 9, the automatic composition apparatus may include: a receiving module 91, an extracting module 92 and an obtaining module 93;
the receiving module 91 is configured to receive a music file of a to-be-predicted previous-segment music, where the music file of the to-be-predicted previous-segment music includes audio data or music description information of the to-be-predicted previous-segment music; the audio data or the music description information of the preceding piece of music to be predicted refers to the audio data or the music description information of a given section of music, and then the following piece of music can be predicted according to the audio data or the music description information of the given section of music. The music description information may be converted into audio data, and the music description information may be a MIDI file or the like.
An extracting module 92, configured to extract frame-level audio features of music corresponding to the music file received by the receiving module 91;
an obtaining module 93, configured to obtain a frame-level audio feature carrying band information according to the frame-level audio feature and a pre-constructed music band feature combination model; and obtaining the predicted music according to the frame-level audio features carrying the frequency band information and a pre-constructed music prediction model so as to realize automatic composition.
In the automatic composition device, after the receiving module 91 receives a music file of a front section of music to be predicted, the extracting module 92 extracts frame-level audio features of the music corresponding to the music file, the obtaining module 93 obtains the frame-level audio features carrying band information according to the frame-level audio features and a pre-constructed music band feature combination model, and obtains the predicted music according to the frame-level audio features carrying band information and the pre-constructed music prediction model, so that automatic composition can be realized, the efficiency and feasibility of automatic composition can be improved, and the influence of subjective factors on automatic composition can be reduced.
Fig. 10 is a schematic structural view of another embodiment of the automatic composing device of the present application, which is different from the automatic composing device shown in fig. 9 in that the automatic composing device shown in fig. 10 may further include: a collection module 94, a conversion module 95, a determination module 96, and a training module 97;
a collecting module 94, configured to collect music files before the obtaining module 93 obtains the frame-level audio features carrying the frequency band information;
a conversion module 95 for converting the music files collected by the collection module 94 into audio files of the same format;
specifically, the collection module 94 may obtain a large amount of training data by crawling a large amount of music files on the internet, where the music files may be audio data or music description information, such as: MIDI files, and the like. Then, the conversion module 95 may convert the music file into an audio file with the same format, where the format of the audio file only needs to satisfy the requirement of performing FFT, for example: ". PCM or WAV, etc., the format of the audio file is not limited in this embodiment, and the present embodiment takes the format of" PCM "as an example for explanation. It should be noted that: if the music file is music description information, such as a MIDI file, it is necessary to convert the MIDI file into an audio file first, and then into an audio file in the ". PCM" format.
The extracting module 92 is further configured to extract the frame-level audio features of the audio file converted by the converting module 95.
A determining module 96 for determining a topological structure of the music band feature combination model; specifically, the topology determined by the determining module 96 is a hedged neural network structure. In this embodiment, the hedged RNN is taken as an example: the topology includes two independent RNNs and a connection unit, as shown in fig. 3, and the two independent RNNs, named LF_RNN and HF_RNN, are used for low-band multi-frequency feature combination and high-band multi-frequency feature combination, respectively.
The input of LF_RNN is, for a given frame T_m, the energy value E(T_m, F_i) taken from the low-frequency end, i = 1, 2, …, k, with k = 1, 2, …, N/2 (N is the number of FFT points), together with the output L_{i-1} of LF_RNN at the previous frequency point; the output of LF_RNN is L_i, denoting the energy value of the i-th frequency point of frame T_m after the low-frequency information has been taken into account.
Similarly, the input of HF_RNN is the energy value E(T_m, F_j) of frame T_m taken from the high-frequency end, j = N/2, N/2-1, …, k, where k = 1, 2, …, N/2 (N is the number of FFT points), together with the output H_{j+1} of HF_RNN at the previous frequency point; the output of HF_RNN is H_j, denoting the energy value of the j-th frequency point of frame T_m after the high-frequency information has been taken into account.
The connection unit is the "concatenate" block in fig. 3: when i = j = k, the two outputs are connected to form N(T_m, F_k), i.e. the energy value of the k-th frequency point of frame T_m that takes the information of the other frequency points into account.
And a training module 97, configured to train the music band feature combination model according to the topology determined by the determining module 96 and the frame-level audio features extracted by the extracting module 92. Specifically, when the training module 97 trains the music band feature combination model, the training algorithm used may be a neural network model training algorithm, such as a BP algorithm, and the training algorithm used in this embodiment is not limited.
In this embodiment, the extracting module 92 may include: a transformation sub-module 921, a calculation sub-module 922, a determination sub-module 923 and an acquisition sub-module 924;
the transform submodule 921 is configured to perform fast fourier transform of a fixed number of points on the audio file by frame; specifically, the transform sub-module 921 may perform a fixed number of FFT on the audio file in ". PCM" format per frame.
The calculating submodule 922 is configured to calculate an energy value of each frame of the audio file at each frequency point according to a result of the fast Fourier transform of the transform submodule 921; fig. 5 is a schematic diagram showing the coordinate representation of the energy value of each frame at each frequency point, wherein the horizontal axis t represents a time-series frame, the vertical axis f represents a frequency point, the coordinate E(t, f) represents the energy value, M represents the total number of frames, and N represents the number of FFT points.
A determining submodule 923, configured to determine a note attribute of each frame according to the energy value calculated by the calculating submodule 922.
The calculating submodule 922 is further used for calculating an energy value of each note;
the obtaining sub-module 924 is configured to obtain the frame-level audio features according to the energy value of each note calculated by the calculating sub-module 922.
The calculating submodule 922 is specifically configured to calculate an energy average value of all frames included in each note as an energy value of each note; normalizing the energy value of each frame included by each note into the energy value of the corresponding note;
the obtaining sub-module 924 is specifically configured to filter out notes with energy values smaller than a predetermined threshold value to obtain frame-level audio features. The predetermined threshold may be set according to system performance and/or implementation requirements, and the size of the predetermined threshold is not limited in this embodiment.
In this embodiment, the energy of a note is defined as the energy mean of all frames contained in the note, so that the energy mean of all frames contained in each note can be calculated as the energy value e (i) of each note, and then the energy value of each frame contained in each note is normalized to the energy value of the note.
In this embodiment, the determining sub-module 923 may include: a note determining unit 9231 and a judging unit 9232;
a note determining unit 9231, configured to determine that the first frame and the second frame of the audio file belong to a first note at each frequency point;
a judging unit 9232 configured to judge whether the absolute value of the first difference is smaller than the second difference; the first difference value is the difference between the energy value of the third frame of the audio file and the average value of the energy values from the first frame to the second frame of the audio file, and the second difference value is the difference between the maximum value and the minimum value of the energy values from the first frame to the second frame of the audio file;
the note determining unit 9231 is further configured to determine that the third frame of the audio file belongs to the first note and sequentially determine backwards that the notes of the fourth frame are to be attributed to the last frame when the absolute value of the first difference is smaller than the second difference.
A note determining unit 9231, further configured to take a third frame of the audio file as a start of a second note and determine that a fourth frame of the audio file belongs to the second note when an absolute value of the first difference is greater than or equal to a second difference;
the determining unit 9232 is further configured to determine whether an absolute value of a third difference value is smaller than a fourth difference value from a fifth frame of the audio file, where the third difference value is a difference between an energy value of the fifth frame of the audio file and an average value of energy values of third to fourth frames of the audio file, and the fourth difference value is a difference between a maximum value and a minimum value of the energy values of the third to fourth frames of the audio file; and determining the attribution of the musical note of the fifth frame in the same way of judging the attribution of the musical note of the third frame, and repeating the steps until the attribution of the musical note of the last frame of the audio file is determined.
That is, the determining sub-module 923 may determine the note attribution of each frame as follows. At each frequency point: the note determining unit 9231 regards frames T_1 and T_2 as belonging to the first note, and the judging unit 9232 judges attribution starting from frame T_3. If |E(T_3, F_1) - E_mean(T_1, T_2)| < (E_max(T_1, T_2) - E_min(T_1, T_2)) is satisfied, frame T_3 belongs to the first note, and the attribution of each subsequent frame is judged backwards in turn in the same way, where E_mean(T_1, T_2), E_max(T_1, T_2) and E_min(T_1, T_2) denote the average, maximum and minimum of the energy values of frames T_1 to T_2, respectively. Otherwise, frame T_3 is taken as the start of the second note and frame T_4 is determined to belong to the second note; judgement then resumes from frame T_5, still using the formula |E(T_5, F_1) - E_mean(T_3, T_4)| < (E_max(T_3, T_4) - E_min(T_3, T_4)) to determine the note attribution of frame T_5, and so on until the note attribution of all frames has been determined.
Further, the automatic composition apparatus may further include: a determination module 96 and a training module 97;
a determining module 96, configured to determine a topological structure of the music prediction model before the obtaining module 93 obtains the predicted music; in this embodiment, the topological structure of the music prediction model determined by the determining module 96 is an RNN model, and as shown in fig. 8, the input of the RNN model is the output N (T) of the music frequency band feature combination modelm,Fk) And the output h of the previous frame modelmThe output is the energy value N (T) of the next framem+1,Fk)。
And a training module 97, configured to train the music prediction model according to the output of the music band feature combination model and the topological structure determined by the determining module 96.
The automatic composition device can realize automatic composition, further improve the efficiency and feasibility of automatic composition, reduce the influence of subjective factors on automatic composition, is a brand new automatic composition method, and solves the problems of low efficiency, poor feasibility, large subjective influence and the like in the prior art.
Fig. 11 is a schematic structural diagram of an embodiment of a terminal device according to the present application, where the terminal device in the present application may implement the automatic composition method provided in the present application, the terminal device may be a client device or a server device, and the present application does not limit the form of the terminal device. The terminal device may include: one or more processors; storage means for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the automatic composition method provided by the present application.
Fig. 11 shows a block diagram of an exemplary terminal device 12 suitable for use in implementing embodiments of the present application. The terminal device 12 shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 11, the terminal device 12 is represented in the form of a general-purpose computing device. The components of terminal device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. These architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, to name a few.
Terminal device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by terminal device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system Memory 28 may include computer system readable media in the form of volatile Memory, such as Random Access Memory (RAM) 30 and/or cache Memory 32. Terminal device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 11, and commonly referred to as a "hard drive"). Although not shown in FIG. 11, a disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a Compact disk Read Only memory (CD-ROM), a Digital versatile disk Read Only memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the application.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally execute the automated composition method of the embodiments described herein.
Terminal device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with terminal device 12, and/or with any devices (e.g., network card, modem, etc.) that enable terminal device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Furthermore, the terminal device 12 can also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 20. As shown in fig. 11, the network adapter 20 communicates with the other modules of the terminal device 12 via the bus 18. It should be understood that although not shown in fig. 11, other hardware and/or software modules may be used in conjunction with terminal device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing, such as implementing the automatic composition method provided herein, by executing programs stored in the system memory 28.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present application, "a plurality" means two or more unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit (ASIC) having appropriate combinational logic gate circuits, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), and the like.
The present application also provides a storage medium containing computer-executable instructions for performing the automatic composition method provided herein when executed by a computer processor.
The storage media described above, which comprise computer-executable instructions, may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM) or flash Memory, an optical fiber, a portable compact disc Read Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (16)

1. An automatic composition method, comprising:
receiving a music file of a front piece of music to be predicted, wherein the music file of the front piece of music to be predicted comprises audio data or music description information of the front piece of music to be predicted;
extracting frame-level audio features of music corresponding to the music file;
obtaining frame-level audio features carrying frequency band information according to the frame-level audio features and a pre-constructed music frequency band feature combination model, wherein the music frequency band feature combination model is obtained according to the frame-level audio features of the audio files and the topological structure training of the music frequency band feature combination model;
and obtaining the predicted music according to the frame-level audio features carrying the frequency band information and a pre-constructed music prediction model so as to realize automatic composition, wherein the music prediction model is obtained by combining the output of the model and the topological structure training of the music prediction model according to the music frequency band features.
2. The method of claim 1, wherein before obtaining the frame-level audio features carrying band information according to the frame-level audio features and the pre-constructed music band feature combination model, the method further comprises:
collecting music files and converting the music files into audio files with the same format;
extracting frame-level audio features of the audio file;
determining a topological structure of the music frequency band feature combination model;
and training the music frequency band characteristic combination model according to the determined topological structure and the frame-level audio characteristics.
3. The method of claim 2, wherein extracting the frame-level audio features of the audio file comprises:
carrying out fast Fourier transform of a fixed point number on the audio file according to frames;
calculating the energy value of each frame of the audio file at each frequency point according to the result of the fast Fourier transform;
determining the attribution of the notes of each frame according to the energy value;
and calculating the energy value of each note, and acquiring the frame-level audio features according to the energy value of each note.
4. A method as recited in claim 3, wherein said determining a note attribute for each frame as a function of the energy value comprises:
determining, at each frequency point, that a first frame and a second frame of the audio file belong to a first note;
judging whether the absolute value of the first difference is smaller than the second difference; the first difference value is the difference between the energy value of the third frame of the audio file and the average value of the energy values of the first frame to the second frame of the audio file, and the second difference value is the difference between the maximum value and the minimum value of the energy values of the first frame to the second frame of the audio file;
if yes, determining that the third frame of the audio file belongs to the first musical note, and sequentially judging backwards whether the musical notes of the fourth frame till the last frame belong to.
5. The method of claim 4, wherein after determining whether the absolute value of the first difference is less than the absolute value of the second difference, further comprising:
if the absolute value of the first difference is greater than or equal to the second difference, taking a third frame of the audio file as the beginning of a second note and determining that a fourth frame of the audio file belongs to the second note;
judging whether the absolute value of a third difference value is smaller than a fourth difference value from a fifth frame of the audio file, wherein the third difference value is the difference between the energy value of the fifth frame of the audio file and the average value of the energy values of the third frame to the fourth frame of the audio file, and the fourth difference value is the difference between the maximum value and the minimum value of the energy values of the third frame to the fourth frame of the audio file; and until the attribution of the note of the last frame of the audio file is determined.
6. The method of claim 3, wherein calculating the energy value of each note, and wherein obtaining the frame-level audio features according to the energy value of each note comprises:
calculating the energy mean value of all frames contained in each note as the energy value of each note;
normalizing the energy value of each frame included by each note to the energy value of the corresponding note;
and filtering out notes with energy values less than a predetermined threshold to obtain the frame-level audio features.
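A short sketch of claim 6, under the assumption that "filtering out" low-energy notes means zeroing their frames and that the threshold is a free parameter.

    import numpy as np

    def note_level_features(band_energy, labels, threshold=1e-3):
        """The energy value of a note is the mean energy of its frames; each
        frame's energy is then normalized to (replaced by) its note's energy,
        and notes below the threshold are dropped (zeroed here)."""
        features = np.zeros_like(band_energy)
        for note_id in np.unique(labels):
            idx = labels == note_id
            note_energy = band_energy[idx].mean()    # energy value of the note
            if note_energy >= threshold:             # keep only sufficiently strong notes
                features[idx] = note_energy          # frames take the note's energy value
        return features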
7. The method according to claim 1, wherein before obtaining the predicted music according to the frame-level audio features carrying the frequency band information and a pre-constructed music prediction model, the method further comprises:
determining a topological structure of a music prediction model;
and training the music prediction model according to the music frequency band features, the output of the model and the determined topological structure.
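The claims leave the topology of the music prediction model open; a common choice for sequence continuation, shown below as an assumption rather than as the patented design, is a recurrent network trained to predict the next frame of band-combined features.

    import torch
    import torch.nn as nn

    class MusicPredictor(nn.Module):
        """A possible topology for the music prediction model: an LSTM over
        band-combined frame features that predicts the following frame.
        Layer sizes and the next-frame objective are assumptions."""
        def __init__(self, feat_dim=88, hidden=256):
            super().__init__()
            self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
            self.out = nn.Linear(hidden, feat_dim)

        def forward(self, x):                 # x: (batch, frames, feat_dim)
            h, _ = self.rnn(x)
            return self.out(h)                # predicted next-frame features

    def train_step(model, optimizer, batch):
        """Teacher-forced step: frames 0..T-2 predict frames 1..T-1."""
        inputs, targets = batch[:, :-1, :], batch[:, 1:, :]
        loss = nn.functional.mse_loss(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()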
8. An automatic composition device, comprising:
the device comprises a receiving module, an extracting module and an obtaining module, wherein the receiving module is used for receiving a music file of a preceding piece of music to be predicted, and the music file of the preceding piece of music comprises audio data or music description information of the preceding piece of music to be predicted;
the extracting module is used for extracting the frame-level audio features of the music corresponding to the music file received by the receiving module;
the obtaining module is used for obtaining frame-level audio features carrying band information according to the frame-level audio features and a pre-constructed music band feature combination model, wherein the music band feature combination model is obtained by training according to the frame-level audio features of the audio files and the topological structure of the music band feature combination model; and for obtaining the predicted music according to the frame-level audio features carrying the frequency band information and a pre-constructed music prediction model so as to realize automatic composition, wherein the music prediction model is obtained by training according to the music frequency band features, the output of the model and the topological structure of the music prediction model.
9. The apparatus of claim 8, further comprising: the device comprises a collection module, a conversion module, a determination module and a training module;
the collecting module is used for collecting music files before the obtaining module obtains the frame-level audio features carrying the frequency band information;
the conversion module is used for converting the music files collected by the collection module into audio files with the same format;
the extraction module is also used for extracting the frame-level audio features of the audio file converted by the conversion module;
the determining module is used for determining a topological structure of the music frequency band feature combination model;
and the training module is used for training the music frequency band feature combination model according to the topological structure determined by the determining module and the frame-level audio features extracted by the extracting module.
10. The apparatus of claim 9, wherein the extraction module comprises:
the transformation submodule is used for performing fast Fourier transformation of fixed points on the audio file according to frames;
the computing submodule is used for computing the energy value of each frame of the audio file at each frequency point according to the result of the fast Fourier transform of the transform submodule;
the determining submodule is used for determining the attribution of the musical notes of each frame according to the energy value calculated by the calculating submodule;
the calculating submodule is also used for calculating the energy value of each note;
and the acquisition submodule is used for acquiring the frame-level audio features according to the energy value of each note calculated by the calculation submodule.
11. The apparatus of claim 10, wherein the determination submodule comprises:
a note determining unit for determining that a first frame and a second frame of the audio file belong to a first note at each frequency point;
a judging unit configured to judge whether an absolute value of the first difference is smaller than a second difference; the first difference value is the difference between the energy value of the third frame of the audio file and the average value of the energy values of the first frame to the second frame of the audio file, and the second difference value is the difference between the maximum value and the minimum value of the energy values of the first frame to the second frame of the audio file;
and the note determining unit is further configured to determine that the third frame of the audio file belongs to the first note when the absolute value of the first difference is smaller than the second difference, and to sequentially determine, frame by frame, the note attribution of the fourth frame through the last frame.
12. The apparatus of claim 11,
the note determining unit is further configured to take a third frame of the audio file as a start of a second note and determine that a fourth frame of the audio file belongs to the second note when the absolute value of the first difference is greater than or equal to the second difference;
the judging unit is further configured to judge, from a fifth frame of the audio file, whether the absolute value of a third difference is smaller than a fourth difference, wherein the third difference is the difference between the energy value of the fifth frame of the audio file and the average value of the energy values of the third to fourth frames of the audio file, and the fourth difference is the difference between the maximum value and the minimum value of the energy values of the third to fourth frames of the audio file, continuing until the note attribution of the last frame of the audio file is determined.
13. The apparatus of claim 10,
the calculation submodule is specifically used for calculating the energy mean value of all frames contained in each note as the energy value of each note, and for normalizing the energy value of each frame included in each note to the energy value of the corresponding note;
the obtaining submodule is specifically configured to filter out notes with energy values smaller than a predetermined threshold value, so as to obtain frame-level audio features.
14. The apparatus of claim 8, further comprising: a determining module and a training module;
the determining module is configured to determine a topological structure of the music prediction model before the obtaining module obtains the predicted music;
and the training module is used for training the music prediction model according to the music frequency band features, the output of the model and the topological structure determined by the determining module.
15. A terminal device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-7.
16. A storage medium containing computer-executable instructions for performing the method of any one of claims 1-7 when executed by a computer processor.
CN201710175115.8A 2017-03-22 2017-03-22 Automatic composition method and device and terminal equipment Active CN107045867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710175115.8A CN107045867B (en) 2017-03-22 2017-03-22 Automatic composition method and device and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710175115.8A CN107045867B (en) 2017-03-22 2017-03-22 Automatic composition method and device and terminal equipment

Publications (2)

Publication Number Publication Date
CN107045867A CN107045867A (en) 2017-08-15
CN107045867B (en) 2020-06-02

Family

ID=59544865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710175115.8A Active CN107045867B (en) 2017-03-22 2017-03-22 Automatic composition method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN107045867B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108538301B (en) * 2018-02-13 2021-05-07 吟飞科技(江苏)有限公司 Intelligent digital musical instrument based on neural network audio technology
CN109192187A (en) * 2018-06-04 2019-01-11 平安科技(深圳)有限公司 Composing method, system, computer equipment and storage medium based on artificial intelligence
CN110660375B (en) * 2018-06-28 2024-06-04 北京搜狗科技发展有限公司 Method, device and equipment for generating music
CN109285560B (en) * 2018-09-28 2021-09-03 北京奇艺世纪科技有限公司 Music feature extraction method and device and electronic equipment
CN109727590B (en) * 2018-12-24 2020-09-22 成都嗨翻屋科技有限公司 Music generation method and device based on recurrent neural network
CN109872709B (en) * 2019-03-04 2020-10-02 湖南工程学院 New music generation method with low similarity based on note complex network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050972A (en) * 2013-03-14 2014-09-17 雅马哈株式会社 Sound signal analysis apparatus, sound signal analysis method and sound signal analysis program
CN104282300A (en) * 2013-07-05 2015-01-14 中国移动通信集团公司 Non-periodic component syllable model building and speech synthesizing method and device
CN105374347A (en) * 2015-09-22 2016-03-02 中国传媒大学 A mixed algorithm-based computer-aided composition method for popular tunes in regions south of the Yangtze River

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6979767B2 (en) * 2002-11-12 2005-12-27 Medialab Solutions Llc Systems and methods for creating, modifying, interacting with and playing musical compositions
US7925502B2 (en) * 2007-03-01 2011-04-12 Microsoft Corporation Pitch model for noise estimation
US20170046973A1 (en) * 2015-10-27 2017-02-16 Thea Kuddo Preverbal elemental music: multimodal intervention to stimulate auditory perception and receptive language acquisition

Also Published As

Publication number Publication date
CN107045867A (en) 2017-08-15

Similar Documents

Publication Publication Date Title
CN107045867B (en) Automatic composition method and device and terminal equipment
CN107221326B (en) Voice awakening method and device based on artificial intelligence and computer equipment
US20200357427A1 (en) Voice Activity Detection Using A Soft Decision Mechanism
US10261965B2 (en) Audio generation method, server, and storage medium
KR102128926B1 (en) Method and device for processing audio information
EP3255633B1 (en) Audio content recognition method and device
CN102956237B (en) The method and apparatus measuring content consistency
US20220398835A1 (en) Target detection system suitable for embedded device
JP2016520879A (en) Speech data recognition method, device and server for distinguishing local rounds
CN111276124B (en) Keyword recognition method, device, equipment and readable storage medium
CN108021635A (en) The definite method, apparatus and storage medium of a kind of audio similarity
CN110111811A (en) Audio signal detection method, device and storage medium
US20220215839A1 (en) Method for determining voice response speed, related device and computer program product
CN113271386B (en) Howling detection method and device, storage medium and electronic equipment
US20070011001A1 (en) Apparatus for predicting the spectral information of voice signals and a method therefor
CN110889010A (en) Audio matching method, device, medium and electronic equipment
CN112420079A (en) Voice endpoint detection method and device, storage medium and electronic equipment
CN116805012A (en) Quality assessment method and device for multi-mode knowledge graph, storage medium and equipment
CN113223487A (en) Information identification method and device, electronic equipment and storage medium
CN114399992B (en) Voice instruction response method, device and storage medium
CN113409792B (en) Voice recognition method and related equipment thereof
JP2021526669A (en) Voice feature extraction device, voice feature extraction method, and program
CN111899729B (en) Training method and device for voice model, server and storage medium
CN114220415A (en) Audio synthesis method and device, electronic equipment and storage medium
EP3872808A1 (en) Voice processing apparatus, voice processing method, and computer-readable recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant