CN113158642A - Information processing method, information processing device, electronic equipment and storage medium - Google Patents
Information processing method, information processing device, electronic equipment and storage medium Download PDFInfo
- Publication number
- CN113158642A CN113158642A CN202110448290.6A CN202110448290A CN113158642A CN 113158642 A CN113158642 A CN 113158642A CN 202110448290 A CN202110448290 A CN 202110448290A CN 113158642 A CN113158642 A CN 113158642A
- Authority
- CN
- China
- Prior art keywords
- melody
- information
- phrase
- sequences
- obtaining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 230000010365 information processing Effects 0.000 title claims abstract description 27
- 238000003672 processing method Methods 0.000 title claims abstract description 17
- 238000000034 method Methods 0.000 claims abstract description 31
- 238000012545 processing Methods 0.000 claims abstract description 29
- 238000012549 training Methods 0.000 claims abstract description 20
- 238000009423 ventilation Methods 0.000 claims abstract description 11
- 238000002372 labelling Methods 0.000 claims abstract description 9
- 230000015654 memory Effects 0.000 claims description 20
- 238000010276 construction Methods 0.000 claims description 11
- 230000006870 function Effects 0.000 claims description 10
- 239000013598 vector Substances 0.000 claims description 9
- 238000013480 data collection Methods 0.000 claims description 4
- 238000013145 classification model Methods 0.000 claims 1
- 238000000605 extraction Methods 0.000 claims 1
- 238000005516 engineering process Methods 0.000 description 6
- 238000010586 diagram Methods 0.000 description 5
- 238000004891 communication Methods 0.000 description 4
- 238000004590 computer program Methods 0.000 description 4
- 238000012986 modification Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 230000033764 rhythmic process Effects 0.000 description 3
- 230000011218 segmentation Effects 0.000 description 3
- 238000013528 artificial neural network Methods 0.000 description 2
- 230000001413 cellular effect Effects 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 239000004973 liquid crystal related substance Substances 0.000 description 2
- 230000015572 biosynthetic process Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000010295 mobile communication Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000011002 quantification Methods 0.000 description 1
- 238000013139 quantization Methods 0.000 description 1
- 230000029058 respiratory gaseous exchange Effects 0.000 description 1
- 230000001953 sensory effect Effects 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 230000002194 synthesizing effect Effects 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Auxiliary Devices For Music (AREA)
Abstract
The application discloses an information processing method, an information processing device, electronic equipment and a storage medium, and the specific implementation scheme is as follows: according to melody information and a pre-trained phrase division model, melody sentence breaking processing is carried out on the melody information on the basis of a multi-level threshold value, and multi-level phrase information forming the melody information is obtained; wherein the labeling information for training the phrase division model includes: and the singing ventilation point when the song is sung based on the melody information is taken as phrase marking information obtained at the division moment. By the aid of the method and the device, automatic division of the phrases can be rapidly achieved.
Description
Technical Field
The present application relates to the field of digital music, and in particular, to an information processing method and apparatus, an electronic device, and a storage medium.
Background
Since the information revolution, the way music and multimedia are spread has changed in a short time. This variety of qualities has led to a dramatic increase in market demand for various types of music: a great deal of original music is required regardless of whether the song, album, MV, karaoke, which is a major element of popular music or artistic creation, or short video, advertisement, animation, trailer, and movie works using music as an auxiliary, or radio, anchor, public space music using music as background content. In the automatic composition technology of a computer, melody punctuation (namely phrase division) needs to be carried out on the melody if subsequent applications such as word matching and the like need to be carried out on the automatically created melody, and the automatic phrase division can be carried out and is very important in the applications such as automatic composition, singing voice synthesis and the like. How to quickly realize the automatic division of the phrases is a technical problem to be urgently solved.
Disclosure of Invention
The application provides an information processing method, an information processing device, electronic equipment and a storage medium.
According to an aspect of the present application, there is provided an information processing method including:
according to melody information and a pre-trained phrase division model, melody sentence breaking processing is carried out on the melody information on the basis of a multilevel threshold value, and multilevel phrase information forming the melody information is obtained; wherein the labeling information used for training the phrase division model comprises: and the singing ventilation point when the song is performed based on the melody information is taken as phrase marking information obtained at the division moment.
According to another aspect of the present application, there is provided an information processing apparatus including:
the sentence break processing module is used for carrying out melody sentence break processing on the melody information based on a multilevel threshold value according to the melody information and a pre-trained phrase dividing model to obtain multilevel phrase information forming the melody information; wherein the labeling information for training the phrase division model includes: and the singing ventilation point when the song is sung based on the melody information is taken as phrase marking information obtained at the division moment.
According to another aspect of the present application, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as provided by any one of the embodiments of the present application.
According to another aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method provided by any one of the embodiments of the present application.
By adopting the method and the device, melody sentence breaking processing can be carried out on the melody information based on the multilevel threshold value according to the melody information and a pre-trained phrase division model, so that multilevel phrase information forming the melody information is obtained; wherein the labeling information for training the phrase division model includes: and the singing ventilation point when the song is performed based on the rhythm information is used as phrase marking information obtained at the dividing moment, so that the phrase can be quickly and automatically divided.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic flow chart diagram of an information processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a configuration of an information processing apparatus according to an embodiment of the present application;
fig. 3 is a block diagram of an electronic device for implementing the information processing method according to the embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. The term "at least one" herein means any combination of any one or more of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C. The terms "first" and "second" used herein refer to and distinguish one from another in the similar art, without necessarily implying a sequence or order, or implying only two, such as first and second, to indicate that there are two or more of the features, first and second, and the first feature may be one or more of the other features.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present application.
According to an embodiment of the present application, an information processing method is provided, and fig. 1 is a flowchart of the information processing method according to the embodiment of the present application, and the method can be applied to an information processing apparatus, for example, in the case where the apparatus can be deployed in a terminal or a server or other processing device for execution, phrase division or the like can be executed. Among them, the terminal may be a User Equipment (UE), a mobile device, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, and the like. In some possible implementations, the method may also be implemented by a processor calling computer readable instructions stored in a memory. As shown in fig. 1, includes:
s101, according to melody information and a pre-trained phrase division model, melody sentence breaking processing is carried out on the melody information on the basis of a multi-level threshold value, and multi-level phrase information forming the melody information is obtained; wherein, the labeling information for training the phrase division model includes: and the singing ventilation point when the song is sung based on the melody information is taken as phrase marking information obtained at the division moment.
In one example, for automatically created melody information, if word matching is required to realize subsequent applications such as synthesizing singing voice, melody punctuation processing (or phrase information dividing processing) needs to be performed on the melody information. The phrase information is a basic structural unit having characteristics constituting a musical piece, for example, four phrases of lyrics correspond to four phrases. Because the annotation information in the sample data set for training the phrase dividing model is phrase annotation information obtained by performing phrase annotation by using the singing ventilation point when a song sings as the dividing time, rather than performing phrase annotation according to an obvious logic rule, if the obtained multistage phrase information contains two-stage phrase information (namely, first-stage phrase information and second-stage phrase information) as an example, the ventilation probability of the second-stage phrase information is lower than that of the first-stage phrase information, so that the probability distribution output corresponding to the model of the pre-trained phrase dividing model obtained after the phrase annotation information is trained in the application has an obvious difference between the first-stage phrase information and the second-stage phrase information, and the hierarchical phrase dividing becomes possible. Finally, automatic multi-level phrase division can be achieved by performing the melody break processing (e.g., multi-level phrase division based on multi-level thresholds). Further, for melody information in which it is difficult to divide phrases, a judgment can also be given.
By adopting the method and the device, melody sentence breaking processing can be carried out on the melody information based on the multilevel threshold value according to the melody information and a pre-trained phrase division model, so that multilevel phrase information forming the melody information is obtained; wherein the labeling information for training the phrase division model includes: and the singing ventilation point when the song is performed based on the rhythm information is used as phrase marking information obtained at the dividing moment, so that the phrase can be quickly and automatically divided.
In one embodiment, the method further comprises: obtaining music score information; extracting musical piece construction information containing phrase information from the music score information according to a preset beat; obtaining a music segment structure according to the music segment construction information, and collecting data by taking the music segment structure as a unit to obtain a sample data set for training the phrase division model; wherein the sample data set comprises: and marking information for the phrase.
In one embodiment, the data collection in units of the music piece structure includes: and collecting melody representations transposed to preset positions in the music segment structure, wherein the melody representations are used for describing melody conditions at different dividing time. For example, the music piece structure needs to collect melody representations transposed to predetermined positions (e.g., after C major key or a minor key).
In one embodiment, the musical piece structure includes a plurality of melody sequences divided according to the melody information; wherein, in the case where the bar is a first bar, the position of the first bar is determined based on a first chord at which the music piece structure starts.
In one embodiment, the method further comprises: in the course of training the phrase dividing model, obtaining a plurality of melody sequences obtained by dividing the melody information and a plurality of position sequences (Pos) respectively corresponding to the melody sequences (such as M) according to the sample data set; inputting the melody sequences and the position sequences into the phrase division model to obtain probabilities of vectors corresponding to the melody sequences respectively, wherein the probabilities are used for representing the probability that each melody sequence is the beginning of the multi-level phrase information; and performing back propagation of the loss function until convergence based on the probability to obtain the pre-trained phrase division model.
In one embodiment, the method for performing melody sentence segmentation processing on the melody information based on a multilevel threshold according to the melody information and a pre-trained phrase segmentation model to obtain multilevel phrase information constituting the melody information includes: obtaining the probability of a plurality of vectors respectively corresponding to a plurality of melody sequences according to the melody information and a pre-trained phrase division model; under the condition that the probability is greater than a first-level phrase threshold value, extracting a plurality of first sub-melody sequences matched with the current situation from the plurality of melody sequences, and obtaining a plurality of first-level phrase information based on the plurality of first sub-melody sequences; under the condition that the probability is greater than a secondary phrase threshold and smaller than a primary phrase threshold, extracting a plurality of second sub melody sequences matched with the current situation from the plurality of first sub melody sequences, and obtaining a plurality of pieces of secondary phrase information based on the plurality of second sub melody sequences; and obtaining the multi-level phrase information according to the plurality of first-level phrase information and the plurality of second-level phrase information.
Application example:
in the computer automatic composition technology, it is very important to be able to automatically divide phrases for melodies. At present, no similar automatic melody phrase dividing technology exists, and only automatic phrase breaking technologies of voice and text exist. It is assumed that there is a rule-based melody automatic phrase dividing technique, and for example, a conditional judgment on whether to break a phrase is made based on information such as a duration of a rhythm middle sound and a current phrase length. However, the rule-based automatic sentence-breaking technology is difficult to cover various melody situations, and particularly for automatically generated melodies, the melody is often not obvious, the melody is difficult to divide by using simple logic, and situations such as automatic division and improper division occur. In addition, it is difficult to perform phrase division in multiple stages.
In view of the above problems, by applying the processing flow of an application example in the embodiment of the present application, the melody may be automatically divided by a pre-trained phrase division model, and a segment of melody M ═ M represented by N numerical values0…mN-1Automatic phrase segmentation is performed, i.e. a set of strictly increasing position indices P ═ P needs to be found0…pKIndicating that the melody is divided into K sentencesLevel clauses in which piThe index of the position of the beginning of the ith e { 0.,. K-1} sentence melody. In particular, for ease of presentation, contract pkN. Wherein the i ∈ { 0., K-1} sentence has a melody ofAnd with Q ═ Q0…qK-1,pi<qi<pi+1-1 indicates that in the ith sentence can be divided into two second-level phrasesFor the case that the first-level phrase or the second-level phrase cannot be divided, an automatic judgment is made. The method specifically comprises the following steps:
first, data collection, representation, preprocessing and modeling
And collecting the music scores, and recording the music scores by taking music pieces as units to form a data set. One musical section needs to record the melody after being shifted to C major key or a minor key, the melody needs to be divided by the breathing point when the song sings for carrying out phrase marking, the first minor bar position of the melody and the total minor bar number b, wherein the first minor bar position is judged according to the first chord started by the musical section, the first beat of the first minor bar is specified as the 0 moment of the musical section, the position difference of the melody head relative to the first minor bar is recorded as delta st, if the melody of the musical section is started before the first minor bar, the melody of the advanced part is called as the weak start melody, and the delta st is less than 0; if the first beat or later of the first bar begins, no weak spin-up law exists, and the delta st is more than or equal to 0. Maximum allowed weak rise length ST in data setmax32, Δ ST ≧ STmax。
It is determined that the quantization length per beat SPQ is 4, for example, for a song of four beats, the melody of each measure is expressed by 16 numerical values, which is emphasized in "representation of equidistant discretization" for the song melody, that is: the song melody is divided into 4 moments uniformly on the time scale in each beat, no matter how complex, and m can be as follows at the 4 momentsiThe formula (2) records the melody condition at this moment. Each beat is 4 moments, each bar is 4 beats, namely 16 moments, and m of 16 melodies corresponds toiThe value is obtained. In addition, the application is not limited to musical blocksThe beat number is not limited to the target tone number of the tone shift register and the tone shift, the SPQ value is not limited, and the mark is not necessarily expressed in equal parts, as follows miThe calculation formula of (2) may also be varied, and it is within the scope of the present application as long as the melody can be expressed in a simple "quantitative" manner.
SPQ quantification is adopted, in consideration of weak starting, the melody of a music section needs to be represented by N numerical values, N is 16 multiplied by b plus delta st, only the music sections which accord with delta st less than 16 and more than or equal to-32 and b is more than or equal to 4 are limited to be selected, and the melody of a music section is represented by M0…mN-1Wherein m isiI ∈ {0, 1.,. N-1} describes the time of dayThe melody situation of time. If there is a note or rest, the start time is not equal to a certain tiThen adjust its start time to be equal to a t in its vicinityi. The entire melody may be moved in octaves so that the melody satisfies the following formula:
the melody value has a total of NmClass 62.
A position sequence Pos ═ Pos corresponding to the melody sequence can be obtained0...posN-1Wherein:
posi=i+Δst+STmax
the sentence-punctuation mark of melody can be expressed as a vector S ═ S corresponding to melody sequence one by one0…sN-1. Wherein,
second, training model
The model may be an attention network Transformer, and the probability sequence Y-Y of S may be calculated from the sequence M and Pos0…yN-1,yiIs a melody miProbability of the beginning of a phrase
Y=Transformer(M,Pos)
Training a model, adjusting a model parameter theta, and optimizing the following loss function:
Thirdly, sentence breaking is carried out by utilizing the model
The Pos sequence is calculated for a given melody sequence M and Δ st.
Using the trained model Transformer, the Y sequence is calculated from the formula Y ═ Transformer (M, Pos).
A first-level phrase threshold a is set to 0.9.
Finding out all items with probability value larger than A in Y sequence, and recording the total number of the items as K, i.e. dividing K first-level phrases, and obtaining p by subscripts at positions of the items according to increasing order0…pK-1Which satisfies:
Searching the position subscript with the highest probability in the Y sequence except the first position, i.e. the first position of the first-level phrase, in the ith sentence, wherein the subscript is the required position subscriptQ of (a) to (b)i:
A second level phrase threshold B is set to 0.3.
If it isThen the i-th sentence may be considered to have no secondary phrase or the secondary phrase may be considered to be insignificant.
The second-level phrase division as described above is performed on the ith e {0, 1., K-1} first-level phrases, and Q is obtained0…qK-1. Thus, a first-level phrase division P and a second-level phrase division Q of the melody sequence are obtained.
Setting an acceptable maximum first-level phrase number N according to the data set and the application sceneA_maxMinimum number of first-stage phrases NA_min. If it isThe melody phrase number may be considered to be unsatisfactory.
Setting a suspected phrase threshold value C of 0.6 and a suspected proportion RC0.5. If in the interval [ A, C ] in the sequence Y]Is greater than RCThe melody phrase classification is not obvious, and phrase classification is difficult.
By adopting the method and the device, the flow of phrase division is carried out by using the neural network, the data-driven hierarchical phrase division is realized, because the phrase marking is carried out according to the singing air exchange point instead of the explicit logic rule, and the probability of secondary phrase air exchange is lower than that of the primary phrase, the probability given by the trained neural network can be obviously different between the primary phrase and the secondary phrase, so that the hierarchical phrase division is possible, the automatic phrase division technology without rules is realized, and the multilevel phrase division can be carried out according to the set threshold value. For melodies in which it is difficult to divide phrases, judgments may also be given.
According to an embodiment of the present application, there is provided an information processing apparatus, and fig. 2 is a schematic diagram of a configuration of the information processing apparatus according to the embodiment of the present application, as shown in fig. 2, including: a sentence break processing module 51, configured to perform melody sentence break processing on the melody information based on a multilevel threshold according to the melody information and a pre-trained phrase division model, so as to obtain multilevel phrase information constituting the melody information; wherein the labeling information for training the phrase division model includes: and the singing ventilation point when the song is sung based on the melody information is taken as phrase marking information obtained at the division moment.
In one embodiment, the method further comprises: the music score acquisition module is used for acquiring music score information; the music piece construction module is used for extracting music piece construction information containing phrase information from the music score information according to a preset beat; the sample set collection module is used for obtaining a music segment structure according to the music segment construction information, and performing data collection by taking the music segment structure as a unit to obtain a sample data set used for training the phrase division model; wherein the sample data set comprises: and marking information for the phrase.
In one embodiment, the sample set collection module is configured to: and collecting melody representations transferred to preset positions in the music segment structure, wherein the melody representations are used for describing rotation conditions of different division moments.
In one embodiment, the musical piece structure includes a plurality of melody sequences divided according to the melody information; further comprising: a determination module configured to: in the case where the bar is the first bar, the position of the first bar is determined based on the first chord at which the music piece structure starts.
In one embodiment, the method further comprises: the first processing module is used for obtaining a plurality of melody sequences obtained by dividing aiming at the melody information and a plurality of position sequences respectively corresponding to the melody sequences according to the sample data set in the course of training the phrase dividing model; a second processing module, configured to input the multiple melody sequences and the multiple position sequences into the phrase division model, and obtain probabilities of multiple vectors corresponding to the multiple melody sequences, where the probabilities are used to represent a probability that each melody sequence is a beginning of the multi-level phrase information; and the third processing module is used for performing back propagation of the loss function until convergence on the basis of the probability to obtain the pre-trained phrase division model.
In an embodiment, the sentence break processing module is configured to: obtaining probabilities of a plurality of vectors respectively corresponding to a plurality of melody sequences according to the melody information and a phrase division model trained in advance; under the condition that the probability is larger than a first-level phrase threshold value, extracting a plurality of first sub-melody sequences matched with the current situation from the plurality of melody sequences, and obtaining a plurality of first-level phrase information based on the plurality of first sub-melody sequences; under the condition that the probability is greater than a secondary phrase threshold and smaller than a primary phrase threshold, extracting a plurality of second sub melody sequences matched with the current situation from the plurality of first sub melody sequences, and obtaining a plurality of pieces of secondary phrase information based on the plurality of second sub melody sequences; and obtaining the multi-level phrase information according to the plurality of first-level phrase information and the plurality of second-level phrase information.
The functions of each module in each apparatus in the embodiment of the present application may refer to corresponding descriptions in the above method, and are not described herein again.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 3 is a block diagram of an electronic device for implementing the information processing method according to the embodiment of the present application. The electronic device may be the aforementioned deployment device or proxy device. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 3, the electronic apparatus includes: one or more processors 801, memory 802, and interfaces for connecting the various components, including a high speed interface and a low speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories and multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 3, a processor 801 is taken as an example.
The memory 802 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor executes the information processing method provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the information processing method provided by the present application.
The memory 802, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the information processing methods in the embodiments of the present application. The processor 801 executes various functional applications of the server and data processing by running the non-transitory software programs, instructions, and modules stored in the memory 802, that is, implements the information processing method in the above-described method embodiments.
The memory 802 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 802 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 802 optionally includes memory located remotely from the processor 801, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the information processing method may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or in another manner, which is exemplified in fig. 3.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 804 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions are possible, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (14)
1. An information processing method, characterized in that the method comprises:
according to melody information and a pre-trained phrase division model, melody sentence breaking processing is carried out on the melody information on the basis of a multi-level threshold value, and multi-level phrase information forming the melody information is obtained; wherein the labeling information for training the phrase division model includes: and the singing ventilation point when the song is sung based on the melody information is taken as phrase marking information obtained at the division moment.
2. The method of claim 1, further comprising:
obtaining music score information;
extracting musical piece construction information containing phrase information from the music score information according to a preset beat;
obtaining a music segment structure according to the music segment construction information, and collecting data by taking the music segment structure as a unit to obtain a sample data set for training the phrase division model; wherein the sample data set comprises: and marking information for the phrase.
3. The method of claim 2, wherein the collecting data in units of the music piece structure comprises:
and collecting melody representations transferred to preset positions in the music segment structure, wherein the melody representations are used for describing melody conditions at different dividing moments.
4. The method of claim 2, wherein the musical piece structure comprises a plurality of melody sequences divided for the melody information; wherein,
in the case where the bar is the first bar, the position of the first bar is determined based on the first chord at which the music piece structure starts.
5. The method of claim 2, further comprising:
in the course of training the phrase dividing model, obtaining a plurality of melody sequences obtained by dividing aiming at the melody information and a plurality of position sequences respectively corresponding to the melody sequences according to the sample data set;
inputting the melody sequences and the position sequences into the phrase division model to obtain probabilities of vectors corresponding to the melody sequences respectively, wherein the probabilities are used for representing the probability that each melody sequence is the beginning of the multilevel phrase information;
and performing back propagation of the loss function until convergence based on the probability to obtain the pre-trained phrase division model.
6. The method according to any one of claims 1 to 5, wherein the obtaining of the multilevel phrase information constituting the melody information by performing melody sentence-breaking processing on the melody information based on a multilevel threshold according to the melody information and a pre-trained phrase classification model comprises:
obtaining the probability of a plurality of vectors respectively corresponding to a plurality of melody sequences according to the melody information and a pre-trained phrase division model;
under the condition that the probability is greater than a first-level phrase threshold value, extracting a plurality of first sub-melody sequences matched with the current condition from the plurality of melody sequences, and obtaining a plurality of first-level phrase information based on the plurality of first sub-melody sequences;
under the condition that the probability is greater than a secondary phrase threshold and smaller than a primary phrase threshold, extracting a plurality of second sub melody sequences matched with the current situation from the plurality of first sub melody sequences, and obtaining a plurality of pieces of secondary phrase information based on the plurality of second sub melody sequences;
and obtaining the multi-level phrase information according to the plurality of first-level phrase information and the plurality of second-level phrase information.
7. An information processing apparatus characterized in that the apparatus comprises:
the sentence break processing module is used for carrying out melody sentence break processing on the melody information based on a multilevel threshold value according to the melody information and a pre-trained phrase dividing model to obtain multilevel phrase information forming the melody information; wherein the labeling information for training the phrase division model includes: and the singing ventilation point when the song is sung based on the melody information is taken as phrase marking information obtained at the division moment.
8. The apparatus of claim 7, further comprising:
the music score acquisition module is used for acquiring music score information;
the music piece construction extraction module is used for extracting music piece construction information containing phrase information from the music score information according to a preset beat;
the sample set collection module is used for obtaining a music segment structure according to the music segment construction information, and performing data collection by taking the music segment structure as a unit to obtain a sample data set used for training the phrase division model; wherein the sample data set comprises: and marking information for the phrase.
9. The apparatus of claim 8, wherein the sample set collection module is configured to:
and collecting melody representations transferred to preset positions in the music segment structure, wherein the melody representations are used for describing melody conditions at different dividing moments.
10. The apparatus according to claim 8, wherein the musical piece structure comprises a plurality of melody sequences divided for the melody information;
further comprising: a determination module configured to: in the case where the bar is the first bar, the position of the first bar is determined based on the first chord at which the music piece structure starts.
11. The apparatus of claim 8, further comprising:
the first processing module is used for obtaining a plurality of melody sequences obtained by dividing aiming at the melody information and a plurality of position sequences respectively corresponding to the melody sequences according to the sample data set in the course of training the phrase dividing model;
a second processing module, configured to input the multiple melody sequences and the multiple position sequences into the phrase division model, and obtain probabilities of multiple vectors corresponding to the multiple melody sequences, where the probabilities are used to represent a probability that each melody sequence is the beginning of the multi-level phrase information;
and the third processing module is used for carrying out back propagation of the loss function until convergence based on the probability to obtain the pre-trained phrase division model.
12. The apparatus according to any of claims 7-11, wherein the sentence break processing module is configured to:
obtaining the probability of a plurality of vectors respectively corresponding to a plurality of melody sequences according to the melody information and a pre-trained phrase division model;
under the condition that the probability is greater than a first-level phrase threshold value, extracting a plurality of first sub-melody sequences matched with the current condition from the plurality of melody sequences, and obtaining a plurality of first-level phrase information based on the plurality of first sub-melody sequences;
under the condition that the probability is greater than a secondary phrase threshold and smaller than a primary phrase threshold, extracting a plurality of second sub melody sequences matched with the current situation from the plurality of first sub melody sequences, and obtaining a plurality of pieces of secondary phrase information based on the plurality of second sub melody sequences;
and obtaining the multi-level phrase information according to the plurality of first-level phrase information and the plurality of second-level phrase information.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110448290.6A CN113158642A (en) | 2021-04-25 | 2021-04-25 | Information processing method, information processing device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110448290.6A CN113158642A (en) | 2021-04-25 | 2021-04-25 | Information processing method, information processing device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113158642A true CN113158642A (en) | 2021-07-23 |
Family
ID=76870244
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110448290.6A Pending CN113158642A (en) | 2021-04-25 | 2021-04-25 | Information processing method, information processing device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113158642A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113920969A (en) * | 2021-10-09 | 2022-01-11 | 北京灵动音科技有限公司 | Information processing method, information processing device, electronic equipment and storage medium |
CN113920968A (en) * | 2021-10-09 | 2022-01-11 | 北京灵动音科技有限公司 | Information processing method, information processing device, electronic equipment and storage medium |
-
2021
- 2021-04-25 CN CN202110448290.6A patent/CN113158642A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113920969A (en) * | 2021-10-09 | 2022-01-11 | 北京灵动音科技有限公司 | Information processing method, information processing device, electronic equipment and storage medium |
CN113920968A (en) * | 2021-10-09 | 2022-01-11 | 北京灵动音科技有限公司 | Information processing method, information processing device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110717339B (en) | Semantic representation model processing method and device, electronic equipment and storage medium | |
KR20210116379A (en) | Method, apparatus for text generation, device and storage medium | |
CN112542155B (en) | Song synthesis method, model training method, device, equipment and storage medium | |
CN110264991A (en) | Training method, phoneme synthesizing method, device, equipment and the storage medium of speech synthesis model | |
JP6541673B2 (en) | Real time voice evaluation system and method in mobile device | |
CN112365876B (en) | Method, device and equipment for training speech synthesis model and storage medium | |
JP2008175955A (en) | Indexing device, method and program | |
CN113158642A (en) | Information processing method, information processing device, electronic equipment and storage medium | |
CN112489676A (en) | Model training method, device, equipment and storage medium | |
CN108766451B (en) | Audio file processing method and device and storage medium | |
CN108304373A (en) | Construction method, device, storage medium and the electronic device of semantic dictionary | |
Samsekai Manjabhat et al. | Raga and tonic identification in carnatic music | |
CN112541076A (en) | Method and device for generating extended corpus of target field and electronic equipment | |
CN112925912B (en) | Text processing method, synonymous text recall method and apparatus | |
CN113920968B (en) | Information processing method, information processing device, electronic equipment and storage medium | |
JP2020003535A (en) | Program, information processing method, electronic apparatus and learnt model | |
CN117711427A (en) | Audio processing method and device | |
Bhattacharya et al. | A multimodal approach towards emotion recognition of music using audio and lyrical content | |
CN112989109A (en) | Music structure analysis method, electronic equipment and storage medium | |
CN110708619B (en) | Word vector training method and device for intelligent equipment | |
KR100542757B1 (en) | Automatic expansion Method and Device for Foreign language transliteration | |
CN108255917A (en) | Image management method, equipment and electronic equipment | |
CN117290515A (en) | Training method of text annotation model, method and device for generating text graph | |
CN113920969A (en) | Information processing method, information processing device, electronic equipment and storage medium | |
CN111428487B (en) | Model training method, lyric generation method, device, electronic equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |