CN113158642A

CN113158642A - Information processing method, information processing device, electronic equipment and storage medium

Info

Publication number: CN113158642A
Application number: CN202110448290.6A
Authority: CN
Inventors: 孙炜岳; 吴健; 韩毅
Original assignee: Beijing Smart Sound Technology Co ltd
Current assignee: Beijing Smart Sound Technology Co ltd
Priority date: 2021-04-25
Filing date: 2021-04-25
Publication date: 2021-07-23

Abstract

The application discloses an information processing method, an information processing device, electronic equipment and a storage medium. The specific implementation scheme is as follows: according to melody information and a pre-trained phrase division model, melody sentence breaking processing is performed on the melody information on the basis of multi-level thresholds to obtain the multi-level phrase information constituting the melody information; wherein the annotation information used for training the phrase division model includes phrase annotation information obtained by taking the breathing points during singing of a song based on the melody information as the division moments. With the method and the device, automatic phrase division can be achieved quickly.

Description

Information processing method, information processing device, electronic equipment and storage medium

Technical Field

The present application relates to the field of digital music, and in particular, to an information processing method and apparatus, an electronic device, and a storage medium.

Background

Since the information revolution, the ways in which music and multimedia are disseminated have changed rapidly. This diversification has led to a sharp increase in market demand for music of all kinds: a great deal of original music is required, whether for songs, albums, MVs and karaoke, in which popular music or artistic creation is the main element; for short videos, advertisements, animations, trailers and films, in which music plays a supporting role; or for radio, live streaming and public spaces, in which music serves as background content. In computer automatic composition, if subsequent applications such as lyric matching are to be carried out on an automatically created melody, melody sentence breaking (namely, phrase division) needs to be performed on the melody. Automatic phrase division is therefore very important in applications such as automatic composition and singing voice synthesis. How to quickly achieve automatic phrase division is a technical problem to be solved urgently.

Disclosure of Invention

The application provides an information processing method, an information processing device, electronic equipment and a storage medium.

According to an aspect of the present application, there is provided an information processing method including:

performing, according to melody information and a pre-trained phrase division model, melody sentence breaking processing on the melody information based on multi-level thresholds to obtain multi-level phrase information constituting the melody information; wherein the annotation information used for training the phrase division model comprises: phrase annotation information obtained by taking the breathing points during singing of a song based on the melody information as the division moments.

According to another aspect of the present application, there is provided an information processing apparatus including:

a sentence break processing module, configured to perform, according to melody information and a pre-trained phrase division model, melody sentence breaking processing on the melody information based on multi-level thresholds to obtain multi-level phrase information constituting the melody information; wherein the annotation information used for training the phrase division model comprises: phrase annotation information obtained by taking the breathing points during singing of a song based on the melody information as the division moments.

According to another aspect of the present application, there is provided an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as provided by any one of the embodiments of the present application.

According to another aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method provided by any one of the embodiments of the present application.

With the method and the device, melody sentence breaking processing can be performed on the melody information based on multi-level thresholds according to the melody information and a pre-trained phrase division model, so that multi-level phrase information constituting the melody information is obtained. Because the annotation information used for training the phrase division model includes phrase annotation information obtained by taking the breathing points during singing of a song based on the melody information as the division moments, phrases can be divided quickly and automatically.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:

FIG. 1 is a schematic flow chart diagram of an information processing method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a configuration of an information processing apparatus according to an embodiment of the present application;

FIG. 3 is a block diagram of an electronic device for implementing the information processing method according to the embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted for clarity and conciseness.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B and C may mean including any one or more elements selected from the group consisting of A, B and C. The terms "first" and "second" herein are used to refer to and distinguish between multiple similar technical terms; they do not imply a sequence or order, nor do they limit the number to two. For example, a first feature and a second feature indicate that there are two types of features or two features, and there may be one or more first features and one or more second features.

Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present application.

According to an embodiment of the present application, an information processing method is provided. FIG. 1 is a schematic flowchart of the information processing method according to the embodiment of the present application. The method can be applied to an information processing apparatus; for example, the apparatus can be deployed in a terminal, a server or another processing device to perform phrase division and the like. The terminal may be a User Equipment (UE), a mobile device, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, and the like. In some possible implementations, the method may also be implemented by a processor calling computer readable instructions stored in a memory. As shown in FIG. 1, the method includes:

S101: according to melody information and a pre-trained phrase division model, melody sentence breaking processing is performed on the melody information on the basis of multi-level thresholds to obtain multi-level phrase information constituting the melody information; wherein the annotation information used for training the phrase division model includes: phrase annotation information obtained by taking the breathing points during singing of a song based on the melody information as the division moments.

In one example, for automatically created melody information, if lyric matching is required for subsequent applications such as singing voice synthesis, melody sentence breaking processing (or phrase information division processing) needs to be performed on the melody information. Phrase information is a basic structural unit with distinctive features that makes up a musical section; for example, four lines of lyrics correspond to four phrases. The annotation information in the sample data set used to train the phrase division model is phrase annotation information obtained by taking the breathing points during singing of a song as the division moments, rather than by annotating phrases according to an explicit logical rule. Taking multi-level phrase information that contains two levels (namely, first-level phrase information and second-level phrase information) as an example, the probability of breathing at a second-level phrase is lower than at a first-level phrase. As a result, the probability distribution output by the phrase division model trained on this annotation information differs markedly between first-level and second-level phrase information, which makes hierarchical phrase division possible. Finally, automatic multi-level phrase division can be achieved by performing the melody sentence breaking processing (e.g., multi-level phrase division based on multi-level thresholds). Further, for melody information in which phrases are difficult to divide, a judgment to that effect can also be given.

In one embodiment, the method further comprises: obtaining music score information; extracting musical piece construction information containing phrase information from the music score information according to a preset beat; obtaining a music piece structure according to the musical piece construction information, and collecting data in units of the music piece structure to obtain a sample data set for training the phrase division model; wherein the sample data set includes the phrase annotation information.

In one embodiment, the data collection in units of the music piece structure includes: collecting melody representations transposed to a preset key in the music piece structure, wherein the melody representations are used for describing the melody at different division moments. For example, for each music piece structure, the melody representation after transposition to a preset key (e.g., C major or A minor) needs to be collected.
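The transposition step can be illustrated with a minimal sketch. It assumes, purely for illustration, that pitches are MIDI note numbers, that a rest is stored as `None`, and that the tonic pitch class and mode of the section are already known; the helper name `transpose_to_reference` is hypothetical and not part of the application.

```python
# Assumption: pitches are MIDI note numbers; None marks a rest.
TARGET_TONIC = {"major": 0, "minor": 9}  # pitch classes of C and of A


def transpose_to_reference(pitches, tonic_pc, mode):
    """Shift every pitch so the tonic lands on C (major) or A (minor)."""
    shift = (TARGET_TONIC[mode] - tonic_pc) % 12
    if shift > 6:
        # Prefer the smaller move, e.g. -5 semitones rather than +7.
        shift -= 12
    return [None if p is None else p + shift for p in pitches]


# G major fragment (G4 B4 D5 rest G5) moved to C major.
g_major_fragment = [67, 71, 74, None, 79]
c_major_fragment = transpose_to_reference(g_major_fragment, tonic_pc=7, mode="major")
print(c_major_fragment)
```

Choosing the smaller of the two possible shifts keeps the transposed melody close to its original register; a real pipeline might pick the direction differently.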

In one embodiment, the music piece structure includes a plurality of melody sequences divided according to the melody information, wherein the position of the first bar is determined based on the first chord with which the music piece structure starts.

In one embodiment, the method further comprises: in the course of training the phrase division model, obtaining, according to the sample data set, a plurality of melody sequences (e.g., M) obtained by dividing the melody information and a plurality of position sequences (e.g., Pos) respectively corresponding to the melody sequences; inputting the melody sequences and the position sequences into the phrase division model to obtain probabilities of vectors respectively corresponding to the melody sequences, wherein the probabilities represent the probability that each melody sequence is the beginning of the multi-level phrase information; and performing back propagation of the loss function based on the probabilities until convergence to obtain the pre-trained phrase division model.

In one embodiment, performing melody sentence breaking processing on the melody information based on multi-level thresholds according to the melody information and a pre-trained phrase division model to obtain multi-level phrase information constituting the melody information includes: obtaining probabilities of a plurality of vectors respectively corresponding to a plurality of melody sequences according to the melody information and the pre-trained phrase division model; in the case that the probability is greater than a first-level phrase threshold, extracting from the plurality of melody sequences a plurality of first sub-melody sequences that satisfy this condition, and obtaining a plurality of pieces of first-level phrase information based on the plurality of first sub-melody sequences; in the case that the probability is greater than a second-level phrase threshold and smaller than the first-level phrase threshold, extracting from the plurality of first sub-melody sequences a plurality of second sub-melody sequences that satisfy this condition, and obtaining a plurality of pieces of second-level phrase information based on the plurality of second sub-melody sequences; and obtaining the multi-level phrase information according to the plurality of pieces of first-level phrase information and the plurality of pieces of second-level phrase information.
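The two threshold conditions in this embodiment can be sketched as a simple per-position labelling. This is only an illustration: the function name `label_levels` is hypothetical, the probabilities are made up, and the threshold values passed in are example choices rather than values fixed by the embodiment.

```python
def label_levels(probs, first_threshold, second_threshold):
    """Label each position: 1 = first-level start, 2 = second-level start, 0 = neither."""
    levels = []
    for y in probs:
        if y > first_threshold:
            levels.append(1)
        elif y > second_threshold:
            # Between the two thresholds; a value exactly equal to the
            # first-level threshold also falls here in this sketch.
            levels.append(2)
        else:
            levels.append(0)
    return levels


# Made-up model outputs for six melody positions.
probs = [0.97, 0.05, 0.41, 0.10, 0.93, 0.02]
levels = label_levels(probs, first_threshold=0.9, second_threshold=0.3)
print(levels)
```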

Application example:

In computer automatic composition, the ability to divide a melody into phrases automatically is very important. At present, there is no comparable technology for automatic melody phrase division; only automatic sentence-breaking technologies for speech and text exist. Suppose there were a rule-based technique for automatic melody phrase division that, for example, decides whether to break a phrase based on information such as the durations of the notes in the melody and the current phrase length. Such a rule-based technique can hardly cover the variety of melodic situations. For automatically generated melodies in particular, the phrase structure is often not obvious and is difficult to divide using simple logic, so that division fails or is improper. In addition, multi-level phrase division is difficult to carry out.

In view of the above problems, by applying the processing flow of an application example of the embodiment of the present application, a melody can be divided automatically by a pre-trained phrase division model. Performing automatic phrase division on a melody $M = m_0 \dots m_{N-1}$ represented by $N$ values means finding a set of strictly increasing position indices $P = p_0 \dots p_K$, indicating that the melody is divided into $K$ first-level phrases, where $p_i$ is the position index of the beginning of the $i$-th phrase, $i \in \{0, \dots, K-1\}$. For ease of presentation, let $p_K = N$. The melody of the $i$-th phrase, $i \in \{0, \dots, K-1\}$, is $m_{p_i} \dots m_{p_{i+1}-1}$.

In addition, $Q = q_0 \dots q_{K-1}$, with $p_i < q_i < p_{i+1} - 1$, indicates that the $i$-th phrase can be divided into two second-level phrases, $m_{p_i} \dots m_{q_i - 1}$ and $m_{q_i} \dots m_{p_{i+1}-1}$.
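As a small worked example of this notation, the sketch below slices a stand-in melody using index lists P and Q. It assumes that q_i marks the first position of the second of the two second-level phrases; the helper name `split_phrases` and the sample values are hypothetical.

```python
def split_phrases(melody, P, Q):
    """Slice a melody into first-level phrases and their two second-level parts.

    P holds K+1 start indices with P[K] == len(melody); Q holds K split indices.
    """
    result = []
    for i in range(len(P) - 1):
        phrase = melody[P[i]:P[i + 1]]   # m_{p_i} ... m_{p_{i+1}-1}
        left = melody[P[i]:Q[i]]         # first second-level phrase
        right = melody[Q[i]:P[i + 1]]    # second second-level phrase
        result.append((phrase, left, right))
    return result


melody = list(range(12))  # stand-in values m_0 ... m_11, so N = 12
P = [0, 5, 12]            # K = 2 first-level phrases, with p_K = N
Q = [2, 9]                # one split index per first-level phrase
phrases = split_phrases(melody, P, Q)
for phrase, left, right in phrases:
    print(phrase, left, right)
```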

For the case that the first-level phrase or the second-level phrase cannot be divided, an automatic judgment is made. The method specifically comprises the following steps:

First, data collection, representation, preprocessing and modeling

Music scores are collected and recorded in units of musical sections to form a data set. For each musical section, the following need to be recorded: the melody after transposition to C major or A minor; the phrase annotation, in which the melody is divided at the breathing points during singing of the song; the position of the first bar of the melody; and the total number of bars $b$. The position of the first bar is determined according to the first chord with which the musical section starts, and the first beat of the first bar is defined as time 0 of the musical section. The offset of the start of the melody relative to the first bar is denoted $\Delta st$. If the melody of the musical section starts before the first bar, the part that comes early is called the pickup (weak-start) melody, and $\Delta st < 0$; if the melody starts on or after the first beat of the first bar, there is no pickup melody, and $\Delta st \ge 0$. The maximum pickup length allowed in the data set is $ST_{max} = 32$, that is, $\Delta st \ge -ST_{max}$.

The quantization length per beat is set to SPQ = 4. For example, for a song with four beats per bar, the melody of each bar is represented by 16 values. The emphasis here is on an "equidistant discretized representation" of the song melody: however complex the melody is, each beat is divided evenly on the time scale into 4 moments, and at each of these moments the melody is recorded as $m_i$ according to the formula below. With 4 moments per beat and 4 beats per bar, each bar has 16 moments, corresponding to 16 melody values $m_i$. In addition, the present application does not limit the number of beats per bar of the musical section, whether transposition is performed or its target key, or the value of SPQ; the representation need not be equally spaced, and the formula for $m_i$ below may also vary. Any scheme that expresses the melody in a simple "quantized" manner falls within the scope of the present application.

With SPQ quantization, and taking the pickup into account, the melody of a musical section is represented by $N$ values, $N = 16 \times b + \Delta st$. Only musical sections satisfying $-32 \le \Delta st < 16$ and $b \ge 4$ are selected. The melody of a musical section is represented as $M = m_0 \dots m_{N-1}$, where $m_i$, $i \in \{0, 1, \dots, N-1\}$, describes the melody at time $t_i$. If the start time of a note or rest is not equal to any $t_i$, its start time is adjusted to a nearby $t_i$. The entire melody may be shifted by whole octaves so that the melody satisfies the following formula:

The melody values fall into a total of $N_m = 62$ classes.
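The adjustment of note and rest start times to the quantization grid can be sketched as follows. This is a minimal illustration under stated assumptions: onsets are given in beats relative to time 0 of the section, the grid points are taken to be multiples of 1/SPQ of a beat, and the nearest grid point is chosen; the helper name `snap_to_grid` is hypothetical. The encoding of the melody values themselves is not covered here.

```python
import math

SPQ = 4  # quantization steps per beat, as in the text


def snap_to_grid(onset_beats, spq=SPQ):
    """Return the index of the grid step nearest to an onset (ties round up)."""
    # math.floor(x + 0.5) avoids the round-half-to-even behaviour of round().
    return math.floor(onset_beats * spq + 0.5)


# Onsets in beats; a negative onset would belong to a pickup before time 0.
onsets = [0.0, 0.26, 0.5, 1.1, 1.74]
grid = [snap_to_grid(t) for t in onsets]
print(grid)
```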

A position sequence $Pos = pos_0 \dots pos_{N-1}$ corresponding to the melody sequence can be obtained, where:

$pos_i = i + \Delta st + ST_{max}$
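The position formula can be checked with a few lines of code. The sketch below assumes Δst is expressed in quantization steps and uses the stated limit ST_max = 32; the function name `position_sequence` is hypothetical.

```python
ST_MAX = 32  # maximum allowed pickup length, from the text


def position_sequence(n, delta_st, st_max=ST_MAX):
    """Compute pos_i = i + delta_st + st_max for i in 0..n-1."""
    if delta_st < -st_max:
        raise ValueError("pickup is longer than the allowed maximum")
    return [i + delta_st + st_max for i in range(n)]


# A melody that starts two steps before the first bar (delta_st = -2).
pos = position_sequence(6, delta_st=-2)
print(pos)
```

With this offset, the step that falls on the first beat of the first bar always receives position 32, regardless of how long the pickup is, so positions stay non-negative and aligned across sections.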

The phrase-break annotation of the melody can be expressed as a vector $S = s_0 \dots s_{N-1}$ in one-to-one correspondence with the melody sequence, where $s_i$ indicates whether $m_i$ is the beginning of a phrase.

Second, model training

The model may be an attention network (Transformer). The probability sequence $Y = y_0 \dots y_{N-1}$ of $S$ may be calculated from the sequences $M$ and $Pos$, where $y_i$ is the probability that the melody value $m_i$ is the beginning of a phrase:

$Y = \mathrm{Transformer}(M, Pos)$

The model is trained by adjusting the model parameters $\theta$ to optimize the following loss function:

wherein

Represents a data set, D represents data of a musical piece, and M represents a melody in D.

Third, sentence breaking using the model

For a given melody sequence $M$ and $\Delta st$, the $Pos$ sequence is calculated.

Using the trained Transformer model, the $Y$ sequence is calculated from the formula $Y = \mathrm{Transformer}(M, Pos)$.

A first-level phrase threshold $A = 0.9$ is set.

Find all items in the $Y$ sequence whose probability value is greater than $A$, and denote their total number by $K$; that is, $K$ first-level phrases are obtained. Arranging the position subscripts of these items in increasing order gives $p_0 \dots p_{K-1}$, which satisfy:

$y_{p_i} > A, \quad i \in \{0, \dots, K-1\}$

Setting $p_K = N$ gives the desired sequence $P = p_0 \dots p_K$, where the $i$-th first-level phrase is $m_{p_i} \dots m_{p_{i+1}-1}$.

Within the $i$-th phrase, search the $Y$ sequence for the position subscript with the highest probability, excluding the first position (that is, the starting position of the first-level phrase). This subscript is the desired $q_i$.

A second-level phrase threshold $B = 0.3$ is set.

If $y_{q_i}$ is not greater than $B$, the $i$-th phrase may be considered to have no second-level phrase, or its second-level phrases may be considered insignificant.

If $y_{q_i}$ is greater than $B$, then $q_i$ is the position subscript of the desired second-level phrase.

The second-level phrase division described above is performed on each of the first-level phrases $i \in \{0, 1, \dots, K-1\}$, yielding $Q = q_0 \dots q_{K-1}$. A first-level phrase division $P$ and a second-level phrase division $Q$ of the melody sequence are thus obtained.
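The division procedure can be sketched end to end, starting from a probability sequence Y that is assumed to have been produced by the trained model. The function name `divide_phrases` and the sample probabilities are hypothetical; the sketch stores `None` where a first-level phrase has no significant second-level split, and it searches every position of a phrase except its first, which is one possible reading of the search range.

```python
def divide_phrases(Y, first_threshold=0.9, second_threshold=0.3):
    """Return (P, Q): first-level start indices plus N, and one split per phrase."""
    n = len(Y)
    # First level: every position whose probability exceeds threshold A.
    starts = [i for i, y in enumerate(Y) if y > first_threshold]
    P = starts + [n]  # convention p_K = N
    Q = []
    for i in range(len(starts)):
        lo, hi = P[i] + 1, P[i + 1]  # skip the phrase's own first position
        if lo >= hi:
            Q.append(None)  # one-element phrase, nothing to split
            continue
        q = max(range(lo, hi), key=lambda j: Y[j])
        # Second level: accept the best candidate only above threshold B.
        Q.append(q if Y[q] > second_threshold else None)
    return P, Q


# Made-up model output for a ten-step melody.
Y = [0.95, 0.10, 0.20, 0.50, 0.10, 0.92, 0.05, 0.10, 0.20, 0.10]
P, Q = divide_phrases(Y)
print(P, Q)
```

In this example the first phrase is split at position 3, while the second phrase has no candidate above the second-level threshold. Positions before the first detected start, if any, are not assigned to a phrase in this sketch.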

According to the data set and the application scenario, an acceptable maximum number of first-level phrases $N_{A_{max}}$ and minimum number of first-level phrases $N_{A_{min}}$ are set. If $K > N_{A_{max}}$ or $K < N_{A_{min}}$, the number of phrases in the melody may be considered unsatisfactory.

A suspected phrase threshold $C = 0.6$ and a suspected proportion $R_C = 0.5$ are set. If the proportion of items in the sequence $Y$ that fall in the interval between $C$ and $A$ is greater than $R_C$, the phrase division of the melody is not obvious and phrase division is difficult.
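The two judgments on hard-to-divide melodies can be combined in a small check. This is a sketch under explicit assumptions: the text does not state what the suspected proportion is measured against, so the code assumes it is the share of suspected items among all items at or above C; the bounds `n_min` and `n_max` are placeholder values, and the function name `diagnose` is hypothetical.

```python
def diagnose(Y, a=0.9, c=0.6, r_c=0.5, n_min=2, n_max=8):
    """Return a list of reasons why the phrase division may be unreliable."""
    k = sum(1 for y in Y if y > a)               # number of first-level phrases
    suspected = sum(1 for y in Y if c <= y <= a)  # items between C and A
    # Assumption: the proportion is taken over all items at or above C.
    candidates = k + suspected
    issues = []
    if not n_min <= k <= n_max:
        issues.append("phrase count out of range")
    if candidates and suspected / candidates > r_c:
        issues.append("phrase division not obvious")
    return issues


unclear = diagnose([0.95, 0.70, 0.65, 0.10, 0.80, 0.92])
too_few = diagnose([0.95, 0.10, 0.10])
print(unclear, too_few)
```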

With the present application, phrase division is performed by a neural network, realizing data-driven hierarchical phrase division. Because phrases are annotated according to the breathing points during singing rather than explicit logical rules, and the probability of breathing at a second-level phrase is lower than at a first-level phrase, the probabilities given by the trained neural network differ markedly between first-level and second-level phrases, which makes hierarchical phrase division possible. An automatic phrase division technique that requires no rules is thus achieved, and multi-level phrase division can be carried out according to the set thresholds. For melodies in which phrases are difficult to divide, a judgment to that effect can also be given.

According to an embodiment of the present application, an information processing apparatus is provided. FIG. 2 is a schematic diagram of the configuration of the information processing apparatus according to the embodiment of the present application. As shown in FIG. 2, the apparatus includes: a sentence break processing module 51, configured to perform melody sentence breaking processing on the melody information based on multi-level thresholds according to the melody information and a pre-trained phrase division model, so as to obtain multi-level phrase information constituting the melody information; wherein the annotation information used for training the phrase division model includes: phrase annotation information obtained by taking the breathing points during singing of a song based on the melody information as the division moments.

In one embodiment, the apparatus further comprises: a music score acquisition module, configured to acquire music score information; a music piece construction module, configured to extract musical piece construction information containing phrase information from the music score information according to a preset beat; and a sample set collection module, configured to obtain a music piece structure according to the musical piece construction information, and to collect data in units of the music piece structure to obtain a sample data set used for training the phrase division model; wherein the sample data set includes the phrase annotation information.

In one embodiment, the sample set collection module is configured to: collect melody representations transposed to a preset key in the music piece structure, wherein the melody representations are used for describing the melody at different division moments.

In one embodiment, the music piece structure includes a plurality of melody sequences divided according to the melody information, and the apparatus further comprises: a determination module, configured to determine the position of the first bar based on the first chord with which the music piece structure starts.

In one embodiment, the apparatus further comprises: a first processing module, configured to obtain, according to the sample data set and in the course of training the phrase division model, a plurality of melody sequences obtained by dividing the melody information and a plurality of position sequences respectively corresponding to the melody sequences; a second processing module, configured to input the plurality of melody sequences and the plurality of position sequences into the phrase division model to obtain probabilities of a plurality of vectors corresponding to the plurality of melody sequences, where the probabilities represent the probability that each melody sequence is the beginning of the multi-level phrase information; and a third processing module, configured to perform back propagation of the loss function based on the probabilities until convergence to obtain the pre-trained phrase division model.

In one embodiment, the sentence break processing module is configured to: obtain probabilities of a plurality of vectors respectively corresponding to a plurality of melody sequences according to the melody information and the pre-trained phrase division model; in the case that the probability is greater than a first-level phrase threshold, extract from the plurality of melody sequences a plurality of first sub-melody sequences that satisfy this condition, and obtain a plurality of pieces of first-level phrase information based on the plurality of first sub-melody sequences; in the case that the probability is greater than a second-level phrase threshold and smaller than the first-level phrase threshold, extract from the plurality of first sub-melody sequences a plurality of second sub-melody sequences that satisfy this condition, and obtain a plurality of pieces of second-level phrase information based on the plurality of second sub-melody sequences; and obtain the multi-level phrase information according to the plurality of pieces of first-level phrase information and the plurality of pieces of second-level phrase information.

The functions of each module in each apparatus in the embodiment of the present application may refer to corresponding descriptions in the above method, and are not described herein again.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.

FIG. 3 is a block diagram of an electronic device for implementing the information processing method according to the embodiment of the present application. The electronic device may be the aforementioned deployment device or proxy device. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in FIG. 3, the electronic device includes: one or more processors 801, a memory 802, and interfaces for connecting the various components, including a high speed interface and a low speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In FIG. 3, one processor 801 is taken as an example.

The memory 802 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor executes the information processing method provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the information processing method provided by the present application.

The memory 802, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the information processing methods in the embodiments of the present application. The processor 801 executes various functional applications of the server and data processing by running the non-transitory software programs, instructions, and modules stored in the memory 802, that is, implements the information processing method in the above-described method embodiments.

The memory 802 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 802 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 802 optionally includes memory located remotely from the processor 801, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device for the information processing method may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or in another manner; connection by a bus is taken as an example in FIG. 3.

The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 804 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be understood that the various forms of flows shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, and no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions are possible, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. An information processing method, characterized in that the method comprises:

performing, according to melody information and a pre-trained phrase division model, melody sentence-breaking processing on the melody information based on a multi-level threshold, to obtain multi-level phrase information constituting the melody information; wherein the labeling information for training the phrase division model includes: phrase marking information obtained by taking, as division moments, the singing ventilation (breath) points when a song is sung based on the melody information.

2. The method of claim 1, further comprising:

obtaining music score information;

extracting music piece construction information containing phrase information from the music score information according to a preset beat;

obtaining a music piece structure according to the music piece construction information, and collecting data by taking the music piece structure as a unit to obtain a sample data set for training the phrase division model; wherein the sample data set comprises the phrase marking information.

3. The method of claim 2, wherein the collecting data in units of the music piece structure comprises:

collecting melody representations transferred to preset positions in the music piece structure, wherein the melody representations are used for describing melody conditions at different division moments.

4. The method of claim 2, wherein the music piece structure comprises a plurality of melody sequences divided for the melody information; wherein,

in the case where the bar is the first bar, the position of the first bar is determined based on the first chord at which the music piece structure starts.

5. The method of claim 2, further comprising:

in the course of training the phrase division model, obtaining, according to the sample data set, a plurality of melody sequences obtained by dividing the melody information and a plurality of position sequences respectively corresponding to the plurality of melody sequences;

inputting the melody sequences and the position sequences into the phrase division model to obtain probabilities of vectors corresponding to the melody sequences respectively, wherein the probabilities are used for representing the probability that each melody sequence is the beginning of the multilevel phrase information;

and performing back propagation of a loss function based on the probabilities until convergence, to obtain the pre-trained phrase division model.
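The training steps of claim 5 can be illustrated with a minimal Python sketch. The claim does not fix a model architecture, so the sketch substitutes a single-layer logistic model, for which back propagation reduces to one gradient step per epoch. The function names (`train_phrase_model`, `predict_start_probs`), the feature layout (one melody feature plus one position value per melody sequence), the learning rate, the convergence tolerance, and the toy data are all assumptions made for illustration and do not come from the application.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_start_probs(melody: Sequence[Sequence[float]],
                        positions: Sequence[float],
                        weights: Sequence[float],
                        bias: float) -> List[float]:
    """Probability that each melody sequence is the beginning of a phrase."""
    probs = []
    for m, pos in zip(melody, positions):
        x = list(m) + [pos]  # melody features joined with the position value
        probs.append(_sigmoid(sum(w * v for w, v in zip(weights, x)) + bias))
    return probs


def train_phrase_model(melody: Sequence[Sequence[float]],
                       positions: Sequence[float],
                       labels: Sequence[int],
                       lr: float = 0.5,
                       tol: float = 1e-6,
                       max_epochs: int = 5000) -> Tuple[List[float], float]:
    """Gradient descent on binary cross-entropy until the loss stops changing."""
    dim = len(melody[0]) + 1
    weights, bias = [0.0] * dim, 0.0
    inputs = [list(m) + [pos] for m, pos in zip(melody, positions)]
    n = len(labels)
    prev_loss = float("inf")
    for _ in range(max_epochs):
        probs = predict_start_probs(melody, positions, weights, bias)
        loss = -sum(y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
                    for p, y in zip(probs, labels)) / n
        if abs(prev_loss - loss) < tol:  # assumed convergence criterion
            break
        prev_loss = loss
        # For sigmoid + cross-entropy, the gradient at the output is (p - y).
        errors = [p - y for p, y in zip(probs, labels)]
        for j in range(dim):
            grad = sum(e * x[j] for e, x in zip(errors, inputs)) / n
            weights[j] -= lr * grad
        bias -= lr * sum(errors) / n
    return weights, bias


# Invented toy data: one melody feature per sequence and its position in the bar.
# Label 1 marks a sequence annotated as a phrase start (e.g. at a breath point).
toy_melody = [[1.0], [0.0], [0.1], [0.0], [0.9], [0.1], [0.0], [0.2]]
toy_positions = [0.0, 0.25, 0.5, 0.75, 0.0, 0.25, 0.5, 0.75]
toy_labels = [1, 0, 0, 0, 1, 0, 0, 0]

weights, bias = train_phrase_model(toy_melody, toy_positions, toy_labels)
start_probs = predict_start_probs(toy_melody, toy_positions, weights, bias)
```

A deeper network would replace the closed-form gradient with layer-by-layer back propagation, but the loop structure (forward pass, loss, gradient update, convergence check) would stay the same.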

6. The method according to any one of claims 1 to 5, wherein the performing, according to the melody information and the pre-trained phrase division model, melody sentence-breaking processing on the melody information based on a multi-level threshold, to obtain the multi-level phrase information constituting the melody information, comprises:

obtaining the probability of a plurality of vectors respectively corresponding to a plurality of melody sequences according to the melody information and a pre-trained phrase division model;

under the condition that the probability is greater than a first-level phrase threshold, extracting a plurality of first sub-melody sequences satisfying this condition from the plurality of melody sequences, and obtaining a plurality of pieces of first-level phrase information based on the plurality of first sub-melody sequences;

under the condition that the probability is greater than a second-level phrase threshold and smaller than the first-level phrase threshold, extracting a plurality of second sub-melody sequences satisfying this condition from the plurality of first sub-melody sequences, and obtaining a plurality of pieces of second-level phrase information based on the plurality of second sub-melody sequences;

and obtaining the multi-level phrase information according to the plurality of first-level phrase information and the plurality of second-level phrase information.
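The two-level thresholding of claim 6 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `split_multilevel`, the representation of phrases as half-open index ranges over the melody sequences, the rule that the first sequence always opens a phrase, and the threshold values in the usage line are hypothetical and are not taken from the application.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open range [start, end) over melody sequences


def split_multilevel(probs: Sequence[float],
                     first_level_threshold: float,
                     second_level_threshold: float) -> List[Tuple[Span, List[Span]]]:
    """Group melody sequences into first-level phrases and nested second-level phrases.

    probs[i] is the model's probability that melody sequence i begins a phrase.
    """
    if not second_level_threshold < first_level_threshold:
        raise ValueError("second-level threshold must be below the first-level one")
    n = len(probs)
    if n == 0:
        return []

    # First level: a phrase starts wherever the probability exceeds the
    # first-level threshold. Assumption: sequence 0 always starts a phrase.
    first_starts = sorted({0} | {i for i, p in enumerate(probs)
                                 if p > first_level_threshold})
    first_spans = list(zip(first_starts, first_starts[1:] + [n]))

    result = []
    for start, end in first_spans:
        # Second level: inside one first-level phrase, split where the
        # probability lies between the two thresholds.
        sub_starts = [start] + [i for i in range(start + 1, end)
                                if second_level_threshold < probs[i] < first_level_threshold]
        sub_spans = list(zip(sub_starts, sub_starts[1:] + [end]))
        result.append(((start, end), sub_spans))
    return result


# Hypothetical probabilities for eight melody sequences and hypothetical thresholds.
phrases = split_multilevel([0.95, 0.1, 0.6, 0.2, 0.9, 0.3, 0.55, 0.1], 0.8, 0.5)
```

Searching for second-level boundaries only inside each first-level span mirrors the claim's wording that the second sub-melody sequences are extracted from the first sub-melody sequences.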

7. An information processing apparatus characterized in that the apparatus comprises:

8. The apparatus of claim 7, further comprising:

the music score acquisition module is used for acquiring music score information;

the music piece construction extraction module is used for extracting music piece construction information containing phrase information from the music score information according to a preset beat;

the sample set collection module is used for obtaining a music piece structure according to the music piece construction information, and performing data collection by taking the music piece structure as a unit to obtain a sample data set used for training the phrase division model; wherein the sample data set comprises the phrase marking information.

9. The apparatus of claim 8, wherein the sample set collection module is configured to:

10. The apparatus according to claim 8, wherein the music piece structure comprises a plurality of melody sequences divided for the melody information;

further comprising: a determination module configured to: in the case where the bar is the first bar, the position of the first bar is determined based on the first chord at which the music piece structure starts.

11. The apparatus of claim 8, further comprising:

the first processing module is used for obtaining, according to the sample data set in the course of training the phrase division model, a plurality of melody sequences obtained by dividing the melody information and a plurality of position sequences respectively corresponding to the plurality of melody sequences;

a second processing module, configured to input the multiple melody sequences and the multiple position sequences into the phrase division model, and obtain probabilities of multiple vectors corresponding to the multiple melody sequences, where the probabilities are used to represent a probability that each melody sequence is the beginning of the multi-level phrase information;

and the third processing module is used for performing back propagation of a loss function based on the probabilities until convergence, to obtain the pre-trained phrase division model.

12. The apparatus according to any of claims 7-11, wherein the sentence break processing module is configured to:

13. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.

14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.