CN111401038A - Text processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111401038A
Authority
CN
China
Prior art keywords
lyric
rewriting
word
text
pair
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010121213.5A
Other languages
Chinese (zh)
Other versions
CN111401038B (en
Inventor
曹绍升
杨轶斐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010121213.5A priority Critical patent/CN111401038B/en
Publication of CN111401038A publication Critical patent/CN111401038A/en
Application granted granted Critical
Publication of CN111401038B publication Critical patent/CN111401038B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries

Abstract

The embodiments of this specification disclose a text processing method, a text processing apparatus, an electronic device and a storage medium. Sentence pair extraction processing can be performed on lyric texts to obtain K groups of lyric rewrite word pairs, and K corresponding rewrite word pair candidate sets are then generated from the K groups of lyric rewrite word pairs. Each rewrite word pair candidate set comprises more than one lyric rewrite word pair for the same source word, together with the rewrite probability of each of those lyric rewrite word pairs.

Description

Text processing method and device, electronic equipment and storage medium
Technical Field
Embodiments of the present specification relate to text processing technologies, and in particular, to a text processing method and apparatus, an electronic device, and a storage medium.
Background
Creating lyrics requires the creator to have a certain level of literary skill and life experience, and the creator's inspiration also greatly influences the quality of the lyrics. With the continuous development of AI (Artificial Intelligence) technology, AI has been applied to many aspects of life and work; for example, AI can help musicians or music fans create better lyrics or tunes.
Disclosure of Invention
Embodiments of the present description provide a text processing method, an apparatus, an electronic device, and a storage medium, which implement fast and accurate automatic rewriting of rhyme texts such as lyrics.
In a first aspect, an embodiment of the present specification provides a lyric text processing method, which is applied to an electronic device, and the method includes: acquiring a lyric text set, and performing sentence pair extraction processing on each lyric text in the lyric text set to obtain a lyric sentence pair set, wherein the lyric sentence pair set comprises more than one lyric sentence pair; performing word pair extraction processing on each lyric sentence pair in the lyric sentence pair set to obtain K groups of lyric rewriting word pairs, wherein each group of lyric rewriting word pairs comprises more than one lyric rewriting word pair for the same source word, and K is a positive integer; determining the rewriting probability of each lyric rewriting word pair in the K groups of lyric rewriting word pairs, and generating K rewriting word pair candidate sets corresponding to the K groups of lyric rewriting word pairs according to the K groups of lyric rewriting word pairs and the rewriting probability of each lyric rewriting word pair, wherein each rewriting word pair candidate set comprises more than one lyric rewriting word pair for the same source word and the rewriting probability of each lyric rewriting word pair in the rewriting word pair candidate set.
In a second aspect, an embodiment of the present specification provides a lyric rewriting method, which is applied to an electronic device, and includes: receiving a lyric rewriting request of a user, the lyric rewriting request specifying a target lyric text to be rewritten; for each word in the target lyric text, determining a target rewriting word pair candidate set for the word from K rewriting word pair candidate sets, and rewriting the word according to the target rewriting word pair candidate set to obtain a new lyric text corresponding to the target lyric text, wherein the K rewriting word pair candidate sets are obtained according to the lyric text processing method of the first aspect; and presenting the new lyric text to the user.
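The per-word lookup of the second aspect can be illustrated with a minimal Python sketch. The function name `rewrite_lyric`, the dictionary layout of the candidate sets, the choice of the highest-probability target word, and leaving words without a candidate set unchanged are all assumptions made for illustration; the text above does not fix how a target word is selected from a candidate set.

```python
def rewrite_lyric(words, candidate_sets):
    """Rewrite a segmented lyric line word by word.

    words: list of words of the target lyric line.
    candidate_sets: hypothetical layout {source_word: {target_word: probability}}.
    """
    new_words = []
    for word in words:
        candidates = candidate_sets.get(word)
        if not candidates:
            # Assumption: a word with no candidate set is kept as-is.
            new_words.append(word)
            continue
        # Assumption: pick the target word with the highest rewriting probability.
        new_words.append(max(candidates, key=candidates.get))
    return new_words
```

Sampling a target word in proportion to its rewriting probability would be an equally plausible selection rule and would give more varied output.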
In a third aspect, an embodiment of the present specification provides a method for rewriting a rhyme text, including: obtaining a rhyme text set, and performing sentence pair extraction processing on each rhyme text in the rhyme text set to obtain a sentence pair set, wherein the sentence pair set comprises more than one sentence pair, and each sentence pair comprises two adjacent sentences; performing word pair extraction processing on each sentence pair in the sentence pair set to generate K groups of rhyme text rewriting word pairs, wherein each group of rhyme text rewriting word pairs comprises more than one rhyme text rewriting word pair for the same source word, and K is a positive integer; determining the rewriting probability of each rhyme text rewriting word pair in the K groups of rhyme text rewriting word pairs, and generating K rewriting word pair candidate sets corresponding to the K groups of rhyme text rewriting word pairs according to the K groups of rhyme text rewriting word pairs and the rewriting probability of each rhyme text rewriting word pair, wherein each rewriting word pair candidate set comprises more than one rhyme text rewriting word pair for the same source word and the rewriting probability of each rhyme text rewriting word pair in the rewriting word pair candidate set; and rewriting a target rhyme text according to a plurality of rewriting word pair candidate sets among the K rewriting word pair candidate sets to obtain a new rhyme text corresponding to the target rhyme text.
In a fourth aspect, an embodiment of the present specification provides a lyric text processing apparatus, which is applied to an electronic device, and the apparatus includes: a first sentence pair extraction unit, configured to acquire a lyric text set and perform sentence pair extraction processing on each lyric text in the lyric text set to obtain a lyric sentence pair set, wherein the lyric sentence pair set comprises more than one lyric sentence pair; a first word pair extraction unit, configured to perform word pair extraction processing on each lyric sentence pair in the lyric sentence pair set to obtain K groups of lyric rewriting word pairs, wherein each group of lyric rewriting word pairs comprises more than one lyric rewriting word pair for the same source word, and K is a positive integer; and a first word pair set generation unit, configured to determine the rewriting probability of each lyric rewriting word pair in the K groups of lyric rewriting word pairs, and generate K rewriting word pair candidate sets corresponding to the K groups of lyric rewriting word pairs according to the K groups of lyric rewriting word pairs and the rewriting probability of each lyric rewriting word pair, wherein each rewriting word pair candidate set comprises more than one lyric rewriting word pair for the same source word and the rewriting probability of each lyric rewriting word pair in the rewriting word pair candidate set.
In a fifth aspect, an embodiment of the present specification provides a lyric rewriting apparatus, which is applied to an electronic device, and includes: a receiving unit, configured to receive a lyric rewriting request of a user, the lyric rewriting request specifying a target lyric text to be rewritten; a rewriting unit, configured to determine, for each word in the target lyric text, a target rewriting word pair candidate set for the word from the K rewriting word pair candidate sets, and rewrite the word according to the target rewriting word pair candidate set to obtain a new lyric text corresponding to the target lyric text, where the K rewriting word pair candidate sets are obtained according to the lyric text processing method of the first aspect; and a presentation unit, configured to present the new lyric text to the user.
In a sixth aspect, an embodiment of the present specification provides an apparatus for rewriting a rhyme text, including: a second sentence pair extraction unit, configured to acquire a rhyme text set and perform sentence pair extraction processing on each rhyme text in the rhyme text set to obtain a sentence pair set, wherein the sentence pair set comprises more than one sentence pair, and each sentence pair comprises two adjacent sentences; a second word pair extraction unit, configured to perform word pair extraction processing on each sentence pair in the sentence pair set to generate K groups of rhyme text rewriting word pairs, where each group of rhyme text rewriting word pairs includes more than one rhyme text rewriting word pair for the same source word, and K is a positive integer; a second word pair set generation unit, configured to determine the rewriting probability of each rhyme text rewriting word pair in the K groups of rhyme text rewriting word pairs, and generate K rewriting word pair candidate sets corresponding to the K groups of rhyme text rewriting word pairs according to the K groups of rhyme text rewriting word pairs and the rewriting probability of each rhyme text rewriting word pair, where each rewriting word pair candidate set includes more than one rhyme text rewriting word pair for the same source word and the rewriting probability of each rhyme text rewriting word pair in the rewriting word pair candidate set; and a text rewriting unit, configured to rewrite a target rhyme text according to a plurality of rewriting word pair candidate sets among the K rewriting word pair candidate sets to obtain a new rhyme text corresponding to the target rhyme text.
In a seventh aspect, an embodiment of the present specification provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method in any one of the first to third aspects when executing the program.
In an eighth aspect, the present specification provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method according to any one of the first to third aspects.
One or more technical solutions provided in the embodiments of the present description at least achieve the following technical effects or advantages:
The technical solution provided by the embodiments of this specification obtains K groups of lyric rewriting word pairs from a lyric text set and generates K corresponding rewriting word pair candidate sets from those K groups, wherein each rewriting word pair candidate set comprises more than one lyric rewriting word pair for the same source word and the rewriting probability of each of those lyric rewriting word pairs. K rewriting word pair candidate sets for K different source words are thus generated automatically from lyric texts. The generated candidate sets can be used to determine replacement words for lyrics quickly and accurately, so that lyrics are rewritten quickly and accurately, which helps improve both the accuracy and the efficiency of lyric rewriting.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only examples of the embodiments of the present specification, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a lyric text processing method provided in an embodiment of the present specification;
Fig. 2 is a flowchart of a lyric rewriting method provided in an embodiment of the present specification;
Fig. 3 is a flowchart of a method for rewriting a rhyme text according to an embodiment of the present specification;
Fig. 4 is a schematic structural diagram of a lyric text processing apparatus according to an embodiment of the present specification;
Fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present specification.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions in the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in the present specification without any creative effort belong to the protection scope of the embodiments in the present specification.
In the embodiments of the present specification, the term "plurality" means "two or more"; the term "and/or" merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone.
In a first aspect, an embodiment of the present disclosure provides a lyric text processing method, which is applied to an electronic device, where the electronic device includes a server side of any music platform or a client side of any music platform, and the electronic device may also be any electronic device that separately implements a lyric text processing function. Referring to fig. 1, a method for processing a lyric text provided in an embodiment of the present specification at least includes the following steps:
S100: acquiring a lyric text set.
In step S100, the process of acquiring the lyric text set specifically includes: extracting original lyric texts from the song libraries of one or more music platforms, and forming the lyric text set based on the extracted original lyric texts.
In the embodiment of the present specification, each original lyric text is a lyric content file, for example, a song includes a lyric content file for displaying lyrics and an audio file for playing, and in the embodiment of the present specification, the lyric content file is extracted from a song library.
Specifically, the original lyric texts may be extracted only from a lyric library managed by the server side that executes the lyric text processing method. In addition, in order to obtain more original lyric texts, original lyric texts can also be extracted from the major lyric libraries on the Internet.
Specifically, forming the lyric text set based on the extracted original lyric texts includes: preprocessing each extracted original lyric text to obtain a corresponding preprocessed lyric text, the preprocessed lyric texts corresponding one-to-one to the original lyric texts; and selecting, from the preprocessed lyric texts, each preprocessed lyric text that exceeds a preset text data amount. All preprocessed lyric texts exceeding the preset text data amount together form the lyric text set.
Specifically, the preset text data amount may be a preset number of lyrics. For example, the preset text data amount may be three lyrics, in which case each preprocessed lyric text that does not exceed three lyrics is deleted, and each preprocessed lyric text of more than three lyrics is retained to form the lyric text set. The preset text data amount may also be a preset text size; for example, each preprocessed lyric text whose size in bytes exceeds a certain KB value is retained to form the lyric text set.
The preprocessing of each original lyric text includes dividing the original lyric text into lyric sentences. Specifically, each original lyric text can be divided into individual lyrics at specific delimiters such as line feeds and punctuation marks. Each resulting lyric is then segmented into words, either directly or after the non-Chinese characters and symbols in the lyric have been deleted, so that each lyric is converted from a continuous character sequence into a sequence of words (or of Chinese words) in their original order.
In the embodiments of this specification, after the lyric text set is obtained, for each preprocessed lyric text in the lyric text set, each lyric that does not exceed a preset number of words is deleted from the preprocessed lyric text to obtain the lyric text corresponding to that preprocessed lyric text. For example, each lyric of no more than two words is deleted from every preprocessed lyric text.
By deleting each preprocessed lyric text not exceeding the preset text data amount and deleting each sentence of lyrics not exceeding the preset word number, the subsequent processing of meaningless lyrics is avoided, and the computing resources are saved.
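The preprocessing and filtering described for step S100 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the threshold names `MIN_LINES` and `MIN_WORDS`, their values, the delimiter set, and the whitespace-based `segment` stand-in (a real system would use a Chinese word segmenter) are all hypothetical and not taken from the specification.

```python
import re

# Hypothetical thresholds; the specification only says they are preset.
MIN_LINES = 3  # a text is kept only if it has more than this many lyrics
MIN_WORDS = 2  # a lyric is kept only if it has more than this many words


def split_lines(raw_text):
    """Divide an original lyric text into lyrics at line feeds and punctuation."""
    parts = re.split(r"[\n,.;!?，。；！？]+", raw_text)
    return [p.strip() for p in parts if p.strip()]


def segment(line):
    """Stand-in word segmenter (assumption): split on whitespace."""
    return line.split()


def build_lyric_text_set(raw_texts):
    """Preprocess each original text, drop short texts, then drop short lyrics."""
    lyric_texts = []
    for raw in raw_texts:
        lines = [segment(line) for line in split_lines(raw)]
        if len(lines) <= MIN_LINES:
            continue  # text does not exceed the preset text data amount
        lines = [words for words in lines if len(words) > MIN_WORDS]
        lyric_texts.append(lines)
    return lyric_texts
```

Each returned lyric text is a list of lyrics, and each lyric is a list of words, which is the shape assumed by the later steps.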
S102, performing sentence pair extraction processing on each lyric text in the lyric text set to obtain a lyric sentence pair set, wherein the lyric sentence pair set comprises more than one lyric sentence pair, and each lyric sentence pair comprises two adjacent lyrics.
Lyric sentence pairs can be extracted each time a lyric text is obtained, so that the lyric sentence pair set is continuously updated online; the rewriting word pair candidate sets are then updated online from the continuously updated lyric sentence pair set, which increases the number of rewriting word pairs in the candidate sets and improves their accuracy. Alternatively, sentence pair extraction processing can be performed on each lyric text after collection of the lyric text set is complete, which suits offline generation of the rewriting word pair candidate sets.
In step S102, for each lyric text in the lyric text set, obtaining every two adjacent lyrics in the lyric text to form each lyric sentence pair in the lyric text; and forming a lyric sentence pair set according to the lyric sentence pair of each lyric text in the lyric text set.
Specifically, for each lyric text in the lyric text set, the lyric sentence pairs in the lyric text are obtained as follows: the lyric text is scanned, the currently scanned lyric and the next lyric are extracted in sequence and combined into a lyric sentence pair, and this is repeated until the whole lyric text has been scanned, yielding every lyric sentence pair in the lyric text. In an optional implementation, in order to ensure the accuracy of word alignment, the lyric sentence pairs that meet a preset alignment condition are selected from the lyric sentence pairs of each lyric text in the lyric text set to form the lyric sentence pair set. Specifically, the preset alignment condition may be that the two lyrics of a lyric sentence pair contain the same number of words; that is, in each lyric text in the lyric text set, the lyric sentence pairs whose two lyrics have the same number of words are retained, and the lyric sentence pairs whose two lyrics have different numbers of words are deleted.
Next, the following lyric text is taken as an example to illustrate step S102:
"you have said that love can not drift far with wind
If you have lost it for years
Ever my heart string
We will meet the above
As long as you are beside
Can go back to the origin
I will love you never to be tired
Disadvantages in life
Whether can hear
Love of our two
Autumn of one person
Then will find out
Feel your face
Whether you hear "
The following 13 lyric sentence pairs are obtained from the lyric text ("|||" marks the boundary between the two lyrics of a lyric sentence pair and is not necessarily a symbol that actually exists in the lyric sentence pair):
1. You have said that love can not drift far with wind ||| If you have lost it for years
2. If you have lost it for years ||| Ever my heart string
3. Ever my heart string ||| We will meet the above
4. We will meet the above ||| As long as you are beside
5. As long as you are beside ||| Can go back to the origin
6. Can go back to the origin ||| I will love you never to be tired
7. I will love you never to be tired ||| Disadvantages in life
8. Disadvantages in life ||| Whether can hear
9. Whether can hear ||| Love of our two
10. Love of our two ||| Autumn of one person
11. Autumn of one person ||| Then will find out
12. Then will find out ||| Feel your face
13. Feel your face ||| Whether you hear
The lyric sentence pairs with different numbers of words in the 13 sentence pairs are deleted, the lyric sentence pairs with the same number of words are reserved and are used as the lyric sentence pairs in the lyric sentence pair set, and the reserved lyric sentence pairs are as follows:
1. You have said that love can not drift far with wind ||| If you have lost it for years
3. Ever my heart string ||| We will meet the above
4. We will meet the above ||| As long as you are beside
8. Disadvantages in life ||| Whether can hear
9. Whether can hear ||| Love of our two
10. Love of our two ||| Autumn of one person
11. Autumn of one person ||| Then will find out
12. Then will find out ||| Feel your face
Referring to the above example, lyric sentence pairs with the same number of words are extracted from each lyric text in the lyric text set as lyric sentence pairs in the lyric sentence pair set. Therefore, the lyric sentence pair set contains abundant and various lyric sentence pairs.
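Step S102 and the optional alignment condition can be sketched in Python as follows. This is a minimal illustration, assuming each lyric text is already a list of lyrics and each lyric is a list of words; the function names are hypothetical.

```python
def extract_sentence_pairs(lyric_text):
    """Pair every lyric with the lyric that follows it."""
    return [(lyric_text[i], lyric_text[i + 1]) for i in range(len(lyric_text) - 1)]


def keep_aligned(pairs):
    """Preset alignment condition: both lyrics contain the same number of words."""
    return [(first, second) for first, second in pairs if len(first) == len(second)]


def build_pair_set(lyric_texts):
    """Collect the retained lyric sentence pairs of every lyric text."""
    pair_set = []
    for text in lyric_texts:
        pair_set.extend(keep_aligned(extract_sentence_pairs(text)))
    return pair_set
```

A lyric text of n lyrics yields n - 1 candidate pairs, which matches the 13 pairs obtained from the 14 lyrics in the example above.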
After the lyric sentence pair set is obtained through step S102, step S104 may be performed: performing word pair extraction processing on each lyric sentence pair in the lyric sentence pair set to obtain K groups of lyric rewriting word pairs.
For example, the process of extracting word pairs from the lyric sentence pair consisting of the t-th lyric and the (t+1)-th lyric is as follows: for each word of the t-th lyric, the word aligned with it is found among the words of the (t+1)-th lyric. For example, if the t-th lyric contains the words A1, B1 and C1, and the (t+1)-th lyric contains the words A2, B2 and C2, then the word A2 aligned with the word A1, the word B2 aligned with the word B1, and the word C2 aligned with the word C1 are found among the words of the (t+1)-th lyric.
Step S104 specifically includes the following steps S1041 to S1043.
Step S1041: performing word alignment on each lyric sentence pair in the lyric sentence pair set through a trained word alignment model to obtain an original rewrite word pair set corresponding to the lyric sentence pair set, wherein the trained word alignment model is obtained by training on lyric samples.
Specifically, each lyric sentence pair in the lyric sentence pair set is input into the trained word alignment model, and the trained word alignment model outputs the lyric rewrite word pairs of that lyric sentence pair. For example, for the input lyric sentence pair "we two - of - love ||| one person - of - autumn", three corresponding lyric rewrite word pairs are output: "we two ||| one person", "of ||| of" and "love ||| autumn". In the embodiments of the present specification, purely for ease of reading, "|||" separates the two words of a lyric rewrite word pair and "-" marks the word boundaries within each lyric of a lyric sentence pair; neither is necessarily a symbol that actually exists.
Specifically, for each lyric sentence pair in the lyric sentence pair set, word pairs may be extracted using an SMT (Statistical Machine Translation) model trained on lyric samples or an NMT (Neural Machine Translation) model trained on lyric samples, and each lyric rewrite word pair of the lyric sentence pair is output. The SMT model may be an SMT word alignment tool such as mkcls or GIZA++, and the NMT model may be a sequence-to-sequence (seq2seq) model, a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), an attention-based model, and the like.
In order to exclude rewrite word pairs whose two words do not actually correspond, the rewrite word pairs are refined by executing S1042. S1042: deleting each lyric rewrite word pair in the original rewrite word pair set that does not meet a preset rewrite condition to obtain a lyric rewrite word pair set.
In an embodiment of this specification, deleting each lyric rewrite word pair in the original rewrite word pair set that does not meet the preset rewrite condition specifically includes: for the M lyric rewrite word pairs generated from the j-th lyric sentence pair, deleting each of the M lyric rewrite word pairs that does not meet the preset rewrite condition, to obtain the lyric rewrite word pairs of the j-th lyric sentence pair that meet the preset rewrite condition, where j takes the values 1 to N in sequence, N is the number of lyric sentence pairs in the lyric sentence pair set, and M is an integer greater than 1 denoting the number of lyric rewrite word pairs generated from the j-th lyric sentence pair. It should be noted that different lyric sentence pairs may generate different numbers of lyric rewrite word pairs.
In a specific implementation, deleting each of the M lyric rewrite word pairs that does not meet the preset rewrite condition proceeds as follows: if the M lyric rewrite word pairs output for the j-th lyric sentence pair do not satisfy the alignment rule, every lyric rewrite word pair generated from the j-th lyric sentence pair is deleted; if they satisfy the alignment rule, then for each lyric rewrite word pair generated from the j-th lyric sentence pair it is determined whether the two words of the lyric rewrite word pair contain the same number of characters; if so, the lyric rewrite word pair is retained, and otherwise it is deleted.
It should be noted that the M lyric rewrite word pairs output for the j-th lyric sentence pair satisfy the alignment rule when they correspond one-to-one in word order. That is, if the j-th lyric sentence pair consists of a lyric sentence k1 and a lyric sentence k2, the alignment rule is satisfied when the first word of k1 corresponds to the first word of k2, the second word of k1 corresponds to the second word of k2, and so on, until the last word of k1 corresponds to the last word of k2. Conversely, if the F-th word of the lyric sentence k1 corresponds to a word other than the F-th word of the lyric sentence k2, the lyric sentence k1 and the lyric sentence k2 do not satisfy the alignment rule.
For example, for the lyric sentence pair "we two - of - love ||| one person - of - autumn", the output lyric rewrite word pairs are "we two ||| one person", "of ||| of" and "love ||| autumn". The three output lyric rewrite word pairs correspond one-to-one in word order, which indicates that the alignment rule is satisfied. For another example, for the lyric sentence pair "give only. we. love in this season. autumn", the output lyric rewrite word pairs are "let's the season of | | |", "we | | | autumn" and "love | | | |". These three output lyric rewrite word pairs do not correspond one-to-one in word order, which indicates that the alignment rule is not satisfied.
In the embodiments of this specification, the accuracy of the lyric rewrite word pairs is ensured by deleting from the original rewrite word pair set the lyric rewrite word pairs that do not satisfy the alignment rule and the lyric rewrite word pairs whose source word and target word contain different numbers of characters.
In the embodiments of the present specification, a lyric rewrite word pair is a pair of words formed by a source word and a target word. The source word is the word in the lyric rewrite word pair that is used to locate a word to be rewritten and replaced; specifically, the source word may be the word to be rewritten itself or a word related to it, such as the word aligned with it in the preceding lyric. The target word is the word in the lyric rewrite word pair that is used as the replacement. For example, in the three lyric rewrite word pairs "we two ||| one person", "of ||| of" and "love ||| autumn", the words "we two", "of" and "love" are source words, and the words "one person", "of" and "autumn" are target words.
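The filtering of step S1042 can be sketched in Python as follows. This is a minimal illustration under assumptions: the output of the word alignment model is represented here as a list of `(source_index, target_index)` links, which is a hypothetical format, and `len()` on a word string stands in for its character count. The tokens in the usage note are placeholders, not lyrics from the specification.

```python
def satisfies_alignment_rule(src_words, tgt_words, links):
    """True when the i-th source word is linked to the i-th target word for every i."""
    n = len(src_words)
    return len(tgt_words) == n and sorted(links) == [(i, i) for i in range(n)]


def filter_word_pairs(src_words, tgt_words, links):
    """Apply the preset rewrite condition to the word pairs of one sentence pair."""
    if not satisfies_alignment_rule(src_words, tgt_words, links):
        return []  # discard every word pair generated from this sentence pair
    # Keep only pairs whose source and target words have the same character count.
    return [(s, t) for s, t in zip(src_words, tgt_words) if len(s) == len(t)]
```

For instance, `filter_word_pairs(["ab", "c", "de"], ["xy", "z", "uvw"], [(0, 0), (1, 1), (2, 2)])` keeps the first two pairs and drops the third because its words differ in length, while any crossing link causes all pairs of that sentence pair to be dropped.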
S1043: group the lyric rewrite word pairs in the lyric rewrite word pair set according to the source word of each pair, to obtain K groups of lyric rewrite word pairs.
When the lyric rewrite word pairs in the set are grouped, the more than one lyric rewrite word pairs for the same source word are placed in the same group, and the target words of the pairs within a group differ from one another. For example, "love ||| autumn", "love ||| heartstrings", "love ||| next door", and so on are all lyric rewrite word pairs for the source word "love", so they are placed in the same group. It should be noted that lyric rewrite word pairs in different groups have different source words but may have the same target word.
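The grouping in step S1043 can be sketched with a dictionary keyed by source word. The function name and the tuple representation of a word pair are assumptions made for illustration only.

```python
from collections import defaultdict


def group_by_source(pairs):
    """Group (source, target) pairs by source word.

    Within a group each target word appears once, so the targets of a
    group differ from one another; the same target may still appear in
    groups belonging to different source words.
    """
    groups = defaultdict(list)
    for source, target in pairs:
        if target not in groups[source]:
            groups[source].append(target)
    return dict(groups)


groups = group_by_source([
    ("love", "autumn"),
    ("love", "heartstrings"),
    ("love", "autumn"),        # repeated pair, already in the group
    ("hope", "autumn"),        # same target under a different source word
])
print(len(groups), groups)     # K is the number of distinct source words
```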
After the K groups of lyric rewrite word pairs are generated, step S106 is executed: determine the rewrite probability of each lyric rewrite word pair in the K groups, and generate K rewrite word pair candidate sets corresponding to the K groups according to the lyric rewrite word pairs in the K groups and their rewrite probabilities, wherein each rewrite word pair candidate set comprises more than one lyric rewrite word pair for the same source word and the rewrite probability of each lyric rewrite word pair in that candidate set.
In an optional implementation, the rewrite probability of each lyric rewrite word pair in the K groups is determined as follows: for each group of lyric rewrite word pairs, determine the generation count of each lyric rewrite word pair in the group, and determine the rewrite probability of each lyric rewrite word pair in the group according to those generation counts.
Specifically, the generation count of each lyric rewrite word pair in the original rewrite word pair set is obtained by counting. In the embodiments of this specification, word pair extraction is performed on each lyric sentence pair in the lyric sentence pair set in turn, and each time a given lyric rewrite word pair is generated, its generation count is increased by 1. This continues until lyric rewrite word pairs have been generated for the last lyric sentence pair in the set, which yields the generation count of each lyric rewrite word pair in the original rewrite word pair set.
For example, suppose that before word pair extraction is performed on the lyric sentence pair "prevent me from saying goodbye ||| imagine you are by my side" of lyric text A, the lyric rewrite word pair "prevent me ||| imagine" has been generated 3 times and "I say ||| you are" has been generated 1 time. After word pair extraction is performed on this lyric sentence pair, the generation count of "prevent me ||| imagine" increases to 4, that of "I say ||| you are" increases to 2, and that of "goodbye ||| by my side" increases to 1.
The rewrite probability of each lyric rewrite word pair in a group is obtained as follows, taking the i-th lyric rewrite word pair as an example: the generation count of the i-th lyric rewrite word pair is divided by the total generation count of all lyric rewrite word pairs in the group to which it belongs, and the quotient is taken as the rewrite probability of the i-th lyric rewrite word pair.
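The counting and normalization described above can be sketched as follows. The function names and the data layout (a word pair as a `(source, target)` tuple) are assumptions for illustration; the counts 70, 29, and 1 are the ones used in the examples of this specification.

```python
from collections import Counter, defaultdict


def count_pairs(pairs_per_sentence_pair):
    """Accumulate generation counts over all processed sentence pairs."""
    counts = Counter()
    for pairs in pairs_per_sentence_pair:
        counts.update(pairs)  # each generated pair adds 1 to its count
    return counts


def rewrite_probabilities(counts):
    """Divide each pair's count by the total count of its source-word group."""
    totals = defaultdict(int)
    for (source, _target), n in counts.items():
        totals[source] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}


counts = Counter({
    ("love", "autumn"): 70,
    ("love", "heartstrings"): 29,
    ("love", "next door"): 1,
})
for pair, p in rewrite_probabilities(counts).items():
    print(pair, f"{p:.2%}")
```

Because each group is normalized separately, the probabilities within one candidate set sum to 1 regardless of how frequent the source word is overall.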
The K rewrite word pair candidate sets corresponding to the K groups of lyric rewrite word pairs can be generated from the lyric rewrite word pairs and their rewrite probabilities in several ways, which are described in turn below.
Implementation 1: for each group of lyric rewrite word pairs, associate each lyric rewrite word pair in the group with its rewrite probability to obtain the rewrite word pair candidate set for that group. In this implementation, no lyric rewrite word pair is deleted from any group, which preserves the completeness of the lyric rewrite word pairs.
To simplify and optimize the rewrite word pair candidate sets, Implementation 2 may be adopted instead:
For each group of lyric rewrite word pairs, associate each lyric rewrite word pair in the group with its rewrite probability, and delete from the group each lyric rewrite word pair whose generation count is less than a preset count threshold and/or each lyric rewrite word pair whose rewrite probability is less than a preset probability threshold, to obtain the rewrite word pair candidate set corresponding to that group.
Implementation 2 has the following two more specific variants:
In one optional variant, each lyric rewrite word pair whose generation count is less than the preset count threshold is deleted from the group before the rewrite probabilities of the group are calculated. The rewrite probability of each remaining lyric rewrite word pair is then determined according to the generation counts of the remaining pairs, where the remaining pairs are those left after the deletion. For example, for a group consisting of "love ||| autumn, generated 70 times", "love ||| heartstrings, generated 29 times", and "love ||| next door, generated 1 time", after "love ||| next door" is deleted, the rewrite probabilities of the remaining pairs are determined as "love ||| autumn, 70.71%" and "love ||| heartstrings, 29.29%" (that is, 70/99 and 29/99). The corresponding rewrite word pair candidate set is therefore "love ||| autumn, 70.71%; heartstrings, 29.29%".
In another optional variant, each lyric rewrite word pair whose generation count is less than the preset count threshold is deleted after the rewrite probabilities of the group are calculated. That is, the rewrite probability of each lyric rewrite word pair in the group is first determined according to the generation counts of all pairs in the group. For example, for a group consisting of "love ||| autumn, generated 70 times", "love ||| heartstrings, generated 29 times", and "love ||| next door, generated 1 time", the rewrite probabilities are determined as "love ||| autumn, 70%", "love ||| heartstrings, 29%", and "love ||| next door, 1%". After "love ||| next door, 1%" is deleted, the corresponding rewrite word pair candidate set is "love ||| autumn, 70%; heartstrings, 29%".
Either variant of deleting the pairs below the preset count threshold may be combined with deleting, after the rewrite probabilities of the group are calculated, the lyric rewrite word pairs whose rewrite probability is lower than the preset probability threshold. Alternatively, only the lyric rewrite word pairs whose rewrite probability is lower than the preset probability threshold are deleted from each group to obtain the corresponding rewrite word pair candidate set.
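The difference between the two variants (pruning before or after the probabilities are calculated) can be sketched as follows. The function name, the `prune_first` flag, and the threshold value of 2 are assumptions for illustration; the specification does not state concrete threshold values.

```python
def candidate_set(counts, min_count=0, min_prob=0.0, prune_first=True):
    """Build one candidate set from {target word: generation count}.

    prune_first=True  : drop low-count pairs, then normalize the rest.
    prune_first=False : normalize over all pairs, then drop low-count pairs.
    In both cases pairs below min_prob are dropped after normalization.
    """
    if prune_first:
        counts = {t: n for t, n in counts.items() if n >= min_count}
    total = sum(counts.values())
    if total == 0:
        return {}
    probs = {t: n / total for t, n in counts.items()}
    return {t: p for t, p in probs.items()
            if counts[t] >= min_count and p >= min_prob}


love = {"autumn": 70, "heartstrings": 29, "next door": 1}

before = candidate_set(love, min_count=2, prune_first=True)
after = candidate_set(love, min_count=2, prune_first=False)
print({t: round(p * 100, 2) for t, p in before.items()})
print({t: round(p * 100, 2) for t, p in after.items()})
```

Pruning first renormalizes the surviving pairs so that they sum to 100%, whereas pruning afterwards keeps the original proportions and leaves the removed probability mass unassigned.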
After the K rewrite word pair candidate sets are generated in step S106, the lyric rewrite processing is performed on the target lyric text according to the generated K rewrite word pair candidate sets to obtain a new lyric text, and the lyric content of the new lyric text is different from that of the target lyric text.
In an implementation, the target lyric text may be one or more inputted lyrics, or may be a complete lyric, such as an imported lyric file.
Specifically, for each word in the target lyric text, a target rewriting word pair candidate set for the word is determined from the K rewriting word pair candidate sets, and the word is rewritten according to the target rewriting word pair candidate set to obtain a new lyric text corresponding to the target lyric text, wherein the lyric content of the new lyric text is different from that of the target lyric text.
Specifically, the generated K rewrite word pair candidate sets may be stored in a rewrite word pair database. When a lyric rewrite task for any target lyric text is received, the lyric words of each lyric sentence in the target lyric text are acquired in turn. For the currently acquired lyric word, the rewrite word pair candidate set for that word is read from the rewrite word pair database, and the lyric rewrite word pair with the highest rewrite probability in that candidate set is used to rewrite the word. This is repeated until every lyric word of every lyric sentence in the target lyric text has been rewritten, which yields the new lyric text corresponding to the target lyric text.
For ease of explanation, lyric rewriting is illustrated below using three rewrite word pair candidate sets formed through steps S100 to S106 (it should be understood that the number of candidate sets generated in an actual implementation is much larger than 3):
Rewrite word pair candidate set 1: "love ||| autumn, 70.71%; heartstrings, 29.29%";
Rewrite word pair candidate set 2: "heartstrings ||| meet, 82%; plain face, 2%; childhood, 16%";
Rewrite word pair candidate set 3: "by my side ||| before, 49%; encounter, 44%; fingertip, 7%".
The lyric words of a lyric sentence of the target lyric text, for example "the time with you by my side", are acquired one at a time, in forward or reverse order: "by my side", "with you", "time". Each currently acquired lyric word is matched against the source words of the rewrite word pair candidate sets in the rewrite word pair database to find the candidate set for that word. For example, for the lyric word "by my side", rewrite word pair candidate set 3 is determined: "by my side ||| before, 49%; encounter, 44%; fingertip, 7%". "By my side" is then replaced with "before", the target word with the highest rewrite probability, to form "the time with you before".
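The lookup-and-replace step can be sketched as follows, using the three example candidate sets. The in-memory dictionary stands in for the rewrite word pair database, the English keys are illustrative renderings of the example words, and leaving a word unchanged when no candidate set matches is an assumption, since the specification does not say how unmatched words are handled.

```python
# Hypothetical in-memory stand-in for the rewrite word pair database:
# source word -> {target word: rewrite probability}
CANDIDATE_SETS = {
    "love": {"autumn": 0.7071, "heartstrings": 0.2929},
    "heartstrings": {"meet": 0.82, "plain face": 0.02, "childhood": 0.16},
    "by my side": {"before": 0.49, "encounter": 0.44, "fingertip": 0.07},
}


def rewrite_words(words, candidate_sets):
    """Replace each word with the highest-probability target, if any."""
    rewritten = []
    for word in words:
        candidates = candidate_sets.get(word)
        if candidates:
            rewritten.append(max(candidates, key=candidates.get))
        else:
            rewritten.append(word)  # assumption: keep unmatched words as-is
    return rewritten


print(rewrite_words(["by my side", "with you", "time"], CANDIDATE_SETS))
```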
As can be seen from the above description of the specific implementation process, the embodiments of this specification can automatically generate, from lyric texts, K rewrite word pair candidate sets for different source words. The generated candidate sets can then be used to quickly and accurately determine the replacement words for rewriting lyrics, which helps improve both the accuracy and the efficiency of lyric rewriting.
In a second aspect, based on the same inventive concept as the foregoing lyric text processing method, an embodiment of this specification provides a lyric rewriting method applied to an electronic device. The electronic device may implement the lyric rewriting method described below as a standalone function, or may run a target client that implements the lyric rewriting method offline. Alternatively, the electronic device runs a target server that interacts with a corresponding client and responds to a lyric rewrite request initiated by the client, so as to implement the lyric rewriting method online. As shown in FIG. 2, the lyric rewriting method includes the following steps:
S200: receive a lyric rewrite request from a user, the lyric rewrite request specifying a target lyric text to be rewritten.
Specifically, the target lyric text may be lyric text entered by the user or a lyric file imported by the user, after which a lyric rewrite request for the target lyric text is initiated.
S202: for each word in the target lyric text, determine a target rewrite word pair candidate set for the word from the K rewrite word pair candidate sets, and rewrite the word according to the target rewrite word pair candidate set to obtain a new lyric text corresponding to the target lyric text. The K rewrite word pair candidate sets are obtained according to the foregoing lyric text processing method embodiments; for brevity, the implementation details of obtaining them are not repeated here.
S204: present the new lyric text to the user.
The lyric rewriting process of S200 to S204 is illustrated below:
For example, a user inputs two lyric phrases, "sudden thoughts, a little solitary", and a lyric rewrite request for them is generated. The target client or target server receives the lyric rewrite request and, in response, performs the following process to complete the lyric rewriting:
1. Search the word pair database to determine, among the K rewrite word pair candidate sets, the candidate set for "sudden", and rewrite "sudden" into "day and night" according to the rewrite probabilities of the rewrite word pairs in that candidate set ("day and night" is the target word of the rewrite word pair with the highest rewrite probability, "sudden ||| day and night, 90%").
2. Search the word pair database to determine, among the K rewrite word pair candidate sets, the candidate set for "of", and rewrite "of" according to the rewrite probabilities of the rewrite word pairs in that candidate set (the replacement is the target word of the rewrite word pair with the highest rewrite probability, 98%).
3. Search the word pair database to determine, among the K rewrite word pair candidate sets, the candidate set for "thought", and rewrite "thought" according to the rewrite probabilities of the rewrite word pairs in that candidate set (the replacement is the target word of the rewrite word pair with the highest rewrite probability, 70%).
4. Search the word pair database to determine, among the K rewrite word pair candidate sets, the candidate set for "solitary", and rewrite "solitary" according to the rewrite probabilities of the rewrite word pairs in that candidate set (the replacement is the target word of the rewrite word pair with the highest rewrite probability, 70%).
5. Search the word pair database to determine, among the K rewrite word pair candidate sets, the candidate set for "silence", and rewrite "silence" according to the rewrite probabilities of the rewrite word pairs in that candidate set (the replacement is the target word of the rewrite word pair with the highest rewrite probability, 70%). The two new lyric phrases finally obtained for the input are: "thoughts at day and night, lonely".
With this lyric rewriting method, the lyrics that the user wants to rewrite can be rewritten automatically, quickly, and accurately, which improves lyric quality and avoids the tedium and error-proneness of manual rewriting.
In a third aspect, based on the same inventive concept, an embodiment of this specification provides a rhyming text rewriting method which, as shown in FIG. 3, includes the following steps:
S300: obtain a rhyming text set, and perform sentence pair extraction on each rhyming text in the rhyming text set to obtain a sentence pair set, wherein the sentence pair set comprises more than one sentence pair, each sentence pair comprises two adjacent sentences, and the rhyming text set comprises more than one rhyming text with different text contents;
S302: perform word pair extraction on each sentence pair in the sentence pair set to generate K groups of rhyming text rewrite word pairs, wherein each group of rhyming text rewrite word pairs comprises more than one rhyming text rewrite word pair for the same source word, and K is a positive integer;
S304: determine the rewrite probability of each rhyming text rewrite word pair in the K groups of rhyming text rewrite word pairs, and generate K rewrite word pair candidate sets corresponding to the K groups according to the K groups of rhyming text rewrite word pairs and the rewrite probability of each rhyming text rewrite word pair, wherein each rewrite word pair candidate set comprises more than one rhyming text rewrite word pair for the same source word and the rewrite probability of each rhyming text rewrite word pair in that candidate set;
S306: rewrite a target rhyming text according to a plurality of the K rewrite word pair candidate sets to obtain a new rhyming text corresponding to the target rhyming text.
Specifically, the rhyming text may be a lyric text, an ancient poetry text, a modern poetry text, a recitation text, or the like. Regardless of the type of rhyming text, the processing is the same as or similar to that of a lyric text. For specific implementation details of the rhyming text rewriting method, refer to the description of the foregoing lyric text processing method embodiments; for brevity, they are not repeated here.
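How steps S300 to S306 chain together can be sketched end to end as follows. This is a simplified illustration under stated assumptions: the trained word alignment model is replaced by a position-by-position stand-in that only handles adjacent sentences with the same number of words, no threshold-based deletion is applied, and all function names and sample words are hypothetical.

```python
from collections import Counter, defaultdict


def build_candidate_sets(texts):
    """texts: list of texts; each text is a list of sentences; each
    sentence is a list of already segmented words."""
    counts = Counter()
    for sentences in texts:
        # S300: every two adjacent sentences form a sentence pair.
        for first, second in zip(sentences, sentences[1:]):
            if len(first) != len(second):
                continue  # stand-in aligner cannot align these
            # S302: stand-in for word alignment, pairing words by position.
            counts.update(zip(first, second))
    # S304: normalize counts within each source-word group.
    totals = defaultdict(int)
    for (source, _target), n in counts.items():
        totals[source] += n
    candidate_sets = defaultdict(dict)
    for (source, target), n in counts.items():
        candidate_sets[source][target] = n / totals[source]
    return dict(candidate_sets)


def rewrite(sentence, candidate_sets):
    """S306: replace each word by its most probable target, if it has one."""
    return [max(candidate_sets[w], key=candidate_sets[w].get)
            if w in candidate_sets else w
            for w in sentence]


poem = [["red", "rose"], ["blue", "sky"], ["red", "sun"]]
sets_ = build_candidate_sets([poem])
print(rewrite(["red", "sky", "moon"], sets_))
```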
In a fourth aspect, based on the same inventive concept as the aforementioned lyric text processing method, an embodiment of the present specification provides a lyric text processing apparatus, as shown with reference to fig. 4, including:
a first sentence pair extracting unit 401, configured to obtain a lyric text set, and perform sentence pair extraction processing on each lyric text in the lyric text set to obtain a set of lyric sentence pairs, where the set of lyric sentence pairs includes more than one lyric sentence pair;
a first word pair extraction unit 402, configured to perform word pair extraction on each lyric sentence pair in the lyric sentence pair set to obtain K groups of lyric rewrite word pairs, where each group of lyric rewrite word pairs includes more than one lyric rewrite word pair for the same source word, and K is a positive integer;
a first word pair set generating unit 403, configured to determine a rewriting probability of each lyric rewriting word pair in the K groups of lyric rewriting word pairs, and generate K rewriting word pair candidate sets corresponding to the K groups of lyric rewriting word pairs according to the K groups of lyric rewriting word pairs and the rewriting probability of each lyric rewriting word pair, where each rewriting word pair candidate set includes more than one lyric rewriting word pair for the same source word and a rewriting probability of each lyric rewriting word pair in the rewriting word pair candidate set.
In an optional implementation, the first word pair extraction unit 402 includes:
a word alignment subunit, configured to perform word alignment on each lyric sentence pair in the lyric sentence pair set through a trained word alignment model to obtain an original rewrite word pair set corresponding to the lyric sentence pair set, where the trained word alignment model is obtained by training on lyric samples;
a word pair deleting subunit, configured to delete each lyric rewrite word pair in the original rewrite word pair set that does not meet a preset rewrite condition, to obtain a lyric rewrite word pair set;
and a word pair grouping subunit, configured to group the lyric rewrite word pairs in the lyric rewrite word pair set according to the source word of each lyric rewrite word pair, to obtain the K groups of lyric rewrite word pairs.
In an optional implementation manner, the first word pair set generating unit 403 is specifically configured to:
determining the generation times of each lyric rewriting word pair in the K groups of lyric rewriting word pairs;
determining the rewriting probability of each lyric rewriting word pair in the group of lyric rewriting word pairs according to the generation times of each lyric rewriting word pair in the group of lyric rewriting word pairs aiming at each group of lyric rewriting word pairs;
and, for each group of lyric rewrite word pairs in the K groups, deleting from the group each lyric rewrite word pair whose generation count is less than a preset count threshold and/or each lyric rewrite word pair whose rewrite probability is less than a preset probability threshold, to obtain the rewrite word pair candidate set corresponding to that group.
In an optional implementation manner, the first sentence pair extraction unit 401 includes:
a lyric text acquisition subunit to:
acquiring more than one original lyric text;
preprocessing the more than one original lyric text to obtain more than one corresponding preprocessed lyric text, wherein the preprocessing comprises word segmentation processing of each sentence of lyrics in each lyric text;
determining each preprocessed lyric text exceeding a preset text data amount from the more than one preprocessed lyric text to form a lyric text set;
and, for each preprocessed lyric text in the lyric text set, deleting each lyric sentence in that preprocessed lyric text whose number of words does not exceed a preset number of words, to obtain the lyric text corresponding to that preprocessed lyric text.
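The preprocessing and filtering performed by the lyric text acquisition subunit can be sketched as follows. Several points are assumptions made for illustration: the text data amount is measured in number of lines, the threshold values are arbitrary, and whitespace splitting stands in for word segmentation (Chinese lyrics would need a real word segmenter).

```python
def build_lyric_text_set(raw_texts, min_lines=2, min_words=2,
                         segment=str.split):
    """Return a list of lyric texts, each a list of segmented sentences.

    raw_texts : list of strings, one lyric sentence per line
    min_lines : a text is kept only if it has MORE than this many lines
    min_words : a sentence is kept only if it has MORE than this many words
    segment   : word segmentation function (placeholder: whitespace split)
    """
    text_set = []
    for raw in raw_texts:
        # Preprocess: segment every non-empty lyric sentence into words.
        lines = [segment(line) for line in raw.splitlines() if line.strip()]
        # Keep only texts that exceed the preset data amount.
        if len(lines) <= min_lines:
            continue
        # Delete sentences that do not exceed the preset number of words.
        text_set.append([words for words in lines if len(words) > min_words])
    return text_set


songs = ["a b c\nd e\nf g h i\nj k l", "x y z\nu v w"]
print(build_lyric_text_set(songs))
```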
In an optional implementation manner, the first sentence pair extraction unit 401 includes:
a text processing subunit to:
for each lyric text in the lyric text set, acquiring every two adjacent lyric sentences in the lyric text to form the lyric sentence pairs of that lyric text;
and forming the lyric sentence pair set from the lyric sentence pairs of each lyric text in the lyric text set.
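Forming lyric sentence pairs from adjacent sentences can be sketched as follows. The function names are hypothetical, and a lyric text is assumed to be an ordered list of sentences.

```python
def sentence_pairs(lyric_text):
    """Pair every sentence with the sentence that immediately follows it."""
    return list(zip(lyric_text, lyric_text[1:]))


def sentence_pair_set(lyric_text_set):
    """Collect the sentence pairs of every lyric text into one list."""
    pairs = []
    for lyric_text in lyric_text_set:
        pairs.extend(sentence_pairs(lyric_text))
    return pairs


print(sentence_pair_set([["line 1", "line 2", "line 3"], ["only line"]]))
```

A text with n sentences yields n - 1 pairs, so a text with a single sentence contributes none, and pairs never span two different lyric texts.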
In an optional embodiment, the apparatus further comprises:
and the lyric rewriting unit is used for determining a target rewriting word pair candidate set aiming at the words from the K rewriting word pair candidate sets aiming at each word in the target lyric text, and rewriting the words according to the target rewriting word pair candidate set to obtain a new lyric text corresponding to the target lyric text, wherein the lyric content of the new lyric text is different from that of the target lyric text.
In a fifth aspect, based on the same inventive concept as that of the foregoing lyric text processing method embodiment, an embodiment of the present specification provides a lyric rewriting apparatus applied to an electronic device, the apparatus including:
a receiving unit, configured to receive a lyric rewrite request from a user, the lyric rewrite request specifying a target lyric text to be rewritten;
the rewriting unit is configured to determine, for each word in the target lyric text, a target rewriting word pair candidate set for the word from the K rewriting word pair candidate sets, and rewrite the word according to the target rewriting word pair candidate set to obtain a new lyric text corresponding to the target lyric text, where the K rewriting word pair candidate sets are obtained according to the foregoing lyric text processing method embodiment, and for simplicity of the specification, implementation details of obtaining the K rewriting word pair candidate sets are not described herein again;
and the presentation unit is used for presenting the new lyric text to the user.
In a sixth aspect, based on the same inventive concept as the aforementioned lyric text processing method, an embodiment of the present specification provides a rhyme text rewriting device, including:
a second sentence pair extraction unit, configured to obtain a rhyming text set and perform sentence pair extraction on each rhyming text in the rhyming text set to obtain a sentence pair set, where the sentence pair set includes more than one sentence pair, and each sentence pair includes two adjacent sentences;
a second word pair extraction unit, configured to perform word pair extraction processing on each statement pair in the statement pair set to generate K sets of rhyme text rewritten word pairs, where each set of rhyme text rewritten word pairs includes more than one rhyme text rewritten word pair for the same source word, and K is a positive integer;
a second word pair set generating unit, configured to determine a rewriting probability of each rhyme text rewriting word pair in the K sets of rhyme text rewriting word pairs, and generate K rewriting word pair candidate sets corresponding to the K sets of rhyme text rewriting word pairs according to the K sets of rhyme text rewriting word pairs and the rewriting probability of each rhyme text rewriting word pair, where each rewriting word pair candidate set includes more than one rhyme text rewriting word pair for the same rewriting source word, and a rewriting probability of each rhyme text rewriting word pair in the rewriting word pair candidate set;
and the text rewriting unit is used for rewriting the target rhyme text according to the plurality of rewritten word pair candidate sets in the K rewritten word pair candidate sets to obtain a new rhyme text corresponding to the target rhyme text.
The specific functions of the above devices, and the modules thereof, have been described in detail in the embodiments of the lyric text processing method provided in the embodiments of the present specification, and will not be described in detail here.
In a seventh aspect, based on the same inventive concept as the lyric text processing method, the lyric rewriting method, and the rhyme text rewriting method in the foregoing embodiments, an embodiment of this specification further provides an electronic device, as shown in fig. 5, including a memory 504, a processor 502, and a computer program stored in the memory 504 and operable on the processor 502, where the processor 502 implements the steps of any one of the foregoing lyric text processing method, the lyric rewriting method, and the rhyme text rewriting method when executing the program.
FIG. 5 shows a bus architecture (represented by bus 500). Bus 500 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors, represented by processor 502, and memory, represented by memory 504. Bus 500 may also link together various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface 506 provides an interface between bus 500 and the receiver 501 and transmitter 503. The receiver 501 and the transmitter 503 may be the same element, that is, a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 502 is responsible for managing bus 500 and general processing, and the memory 504 may be used to store data used by the processor 502 in performing operations.
In an eighth aspect, based on the inventive concept similar to the foregoing lyric text processing method embodiment, this specification embodiment further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the foregoing lyric text processing method embodiment.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (18)

1. A lyric text processing method is applied to an electronic device and comprises the following steps:
acquiring a lyric text set, and performing sentence pair extraction processing on each lyric text in the lyric text set to obtain a lyric sentence pair set, wherein the lyric sentence pair set comprises more than one lyric sentence pair;
performing word pair extraction processing on each lyric sentence pair in the lyric sentence pair set to obtain K groups of lyric rewrite word pairs, wherein each group of lyric rewrite word pairs comprises more than one lyric rewrite word pair for the same source word, and K is a positive integer;
determining the rewriting probability of each lyric rewriting word pair in the K groups of lyric rewriting word pairs, and generating K rewriting word pair candidate sets corresponding to the K groups of lyric rewriting word pairs according to the K groups of lyric rewriting word pairs and the rewriting probability of each lyric rewriting word pair, wherein each rewriting word pair candidate set comprises more than one lyric rewriting word pair aiming at the same source word and the rewriting probability of each lyric rewriting word pair in the rewriting word pair candidate sets.
2. The method of claim 1, wherein said performing word pair extraction processing on each lyric sentence pair in the lyric sentence pair set to obtain K groups of lyric rewriting word pairs comprises:
performing word alignment on each lyric sentence pair in the lyric sentence pair set through a trained word alignment model to obtain an original rewriting word pair set corresponding to the lyric sentence pair set, wherein the trained word alignment model is obtained by training on lyric samples;
deleting each lyric rewriting word pair in the original rewriting word pair set that does not meet a preset rewriting condition, to obtain a lyric rewriting word pair set;
and grouping the lyric rewriting word pairs in the lyric rewriting word pair set according to the source word of each lyric rewriting word pair, to obtain the K groups of lyric rewriting word pairs.
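The filtering and grouping steps of claim 2 can be illustrated with a minimal Python sketch. It assumes the word alignment model has already produced (source word, target word) pairs; the function names and the concrete rewriting condition (both words non-empty and different from each other) are hypothetical, since the claim only refers to "a preset rewriting condition".

```python
from collections import defaultdict


def meets_rewrite_condition(source, target):
    # Hypothetical preset condition: both words are non-empty and differ.
    return bool(source) and bool(target) and source != target


def group_by_source(aligned_pairs):
    """Drop pairs failing the condition, then group the rest by source word."""
    groups = defaultdict(list)
    for source, target in aligned_pairs:
        if meets_rewrite_condition(source, target):
            groups[source].append((source, target))
    return dict(groups)


# Assumed output of a word alignment step (illustrative data only).
aligned = [
    ("love", "heart"),
    ("love", "love"),    # identical words: removed by the condition
    ("night", "light"),
    ("love", "dove"),
    ("love", "heart"),   # repeated pairs are kept so they can be counted later
]
groups = group_by_source(aligned)
K = len(groups)  # number of distinct source words
```

In this sketch K is simply the number of distinct source words that survive filtering, and repeated pairs are deliberately retained because their frequency is used when rewriting probabilities are determined.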
3. The method of claim 1, wherein said determining the rewriting probability of each lyric rewriting word pair in the K groups of lyric rewriting word pairs comprises:
determining the number of times each lyric rewriting word pair in the K groups of lyric rewriting word pairs was generated;
for each group of lyric rewriting word pairs, determining the rewriting probability of each lyric rewriting word pair in the group according to the number of times each lyric rewriting word pair in the group was generated;
and wherein said generating K rewriting word pair candidate sets corresponding to the K groups of lyric rewriting word pairs according to the K groups of lyric rewriting word pairs and the rewriting probability of each lyric rewriting word pair comprises:
for each group of lyric rewriting word pairs in the K groups of lyric rewriting word pairs, deleting from the group each lyric rewriting word pair whose number of generations is less than a preset generation count threshold and/or each lyric rewriting word pair whose rewriting probability is less than a preset probability threshold, to obtain the rewriting word pair candidate set corresponding to the group.
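As a rough illustration of claim 3, the sketch below derives a rewriting probability for each pair from its generation count and then applies both thresholds. The claim only says the probability is determined "according to" the generation counts; taking it as the pair's count divided by the total count of its group is an assumption, as are the function name and the default threshold values.

```python
from collections import Counter


def build_candidate_set(group, min_count=2, min_prob=0.2):
    """Return {pair: probability} for one group of same-source word pairs."""
    counts = Counter(group)          # generation count of each pair
    total = sum(counts.values())
    candidates = {}
    for pair, n in counts.items():
        prob = n / total             # assumed relative-frequency estimate
        # Keep only pairs that pass both (hypothetical) thresholds.
        if n >= min_count and prob >= min_prob:
            candidates[pair] = prob
    return candidates


group = (
    [("love", "heart")] * 3
    + [("love", "dove")] * 2
    + [("love", "glove")]            # generated once: below min_count
)
candidates = build_candidate_set(group)
```

Here the probabilities are computed before filtering and are not renormalized afterwards; whether the surviving probabilities should be rescaled to sum to one is left open by the claim.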
4. The method of claim 1, wherein said acquiring the lyric text set comprises:
acquiring more than one original lyric text;
preprocessing the more than one original lyric text to obtain more than one corresponding preprocessed lyric text, wherein the preprocessing comprises word segmentation processing of each lyric sentence in each lyric text;
selecting, from the more than one preprocessed lyric text, each preprocessed lyric text that exceeds a preset text data amount, to form the lyric text set;
and, for each preprocessed lyric text in the lyric text set, deleting each lyric sentence in the preprocessed lyric text whose word count does not exceed a preset number of words, to obtain the lyric text corresponding to the preprocessed lyric text.
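The preprocessing of claim 4 might look like the following Python sketch. Several points are assumptions made only for illustration: whitespace splitting stands in for a real word segmenter (Chinese lyrics would need a dedicated one), the "text data amount" is measured as the number of lyric sentences, and the function name and threshold defaults are hypothetical.

```python
def preprocess(raw_texts, min_lines=2, min_words=2):
    """Segment, keep sufficiently long texts, and drop very short sentences."""
    # Word segmentation: one list of words per non-empty lyric sentence.
    segmented = [
        [line.split() for line in text.splitlines() if line.strip()]
        for text in raw_texts
    ]
    # Keep only texts whose size exceeds the preset amount (here: line count).
    kept = [text for text in segmented if len(text) > min_lines]
    # Delete sentences whose word count does not exceed the preset number.
    return [[line for line in text if len(line) > min_words] for text in kept]


raw = [
    "I love the night\noh\nI love the light",  # 3 sentences: kept
    "la la",                                   # 1 sentence: dropped
]
lyric_text_set = preprocess(raw)
```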
5. The method of claim 1, wherein said performing sentence pair extraction processing on each lyric text in the lyric text set to obtain a lyric sentence pair set comprises:
for each lyric text in the lyric text set, taking every two adjacent lyric sentences in the lyric text to form the lyric sentence pairs of the lyric text;
and forming the lyric sentence pair set from the lyric sentence pairs of each lyric text in the lyric text set.
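The adjacent-sentence pairing of claim 5 can be sketched in a few lines of Python. The function names are hypothetical, and each lyric text is assumed to be given as an ordered list of lyric sentences.

```python
def sentence_pairs(lines):
    """Pair every lyric sentence with the one that immediately follows it."""
    return list(zip(lines, lines[1:]))


def sentence_pair_set(texts):
    """Collect the adjacent-sentence pairs of every lyric text."""
    pairs = []
    for lines in texts:
        pairs.extend(sentence_pairs(lines))
    return pairs


pairs = sentence_pair_set([["a", "b", "c"], ["x"]])
```

A lyric text with n sentences yields n - 1 pairs under this reading, so a single-sentence text contributes none.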
6. The method of any one of claims 1-5, further comprising, after said generating K rewriting word pair candidate sets corresponding to the K groups of lyric rewriting word pairs:
and for each word in the target lyric text, determining a target rewriting word pair candidate set for the word from the K rewriting word pair candidate sets, and rewriting the word according to the target rewriting word pair candidate set to obtain a new lyric text corresponding to the target lyric text, wherein the lyric content of the new lyric text is different from that of the target lyric text.
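The per-word rewriting of claim 6 could be sketched as below. The candidate sets are assumed to be stored as a mapping from source word to {target word: rewriting probability}; choosing the highest-probability target and leaving words without a candidate set unchanged are illustrative assumptions, since the claim does not fix a selection rule.

```python
def rewrite_lyrics(words, candidate_sets):
    """Rewrite each word using the candidate set keyed by that source word."""
    rewritten = []
    for word in words:
        candidates = candidate_sets.get(word)
        if candidates:
            # Assumed selection rule: pick the most probable target word.
            rewritten.append(max(candidates, key=candidates.get))
        else:
            # No candidate set for this word: keep it as it is.
            rewritten.append(word)
    return rewritten


candidate_sets = {
    "love": {"heart": 0.6, "dove": 0.4},
    "night": {"light": 1.0},
}
new_lyrics = rewrite_lyrics(["I", "love", "the", "night"], candidate_sets)
```

Sampling a target in proportion to its rewriting probability would be an alternative selection rule that yields more varied output.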
7. A lyric rewriting method, applied to an electronic device, comprising:
receiving a lyric rewriting request from a user, the lyric rewriting request specifying a target lyric text to be rewritten;
for each word in the target lyric text, determining a target rewriting word pair candidate set for the word from K rewriting word pair candidate sets, and rewriting the word according to the target rewriting word pair candidate set to obtain a new lyric text corresponding to the target lyric text, wherein the K rewriting word pair candidate sets are obtained according to the lyric text processing method of any one of claims 1-5;
presenting the new lyrics text to the user.
8. A rhyme text rewriting method, comprising:
obtaining a rhyme text set, and performing sentence pair extraction processing on each rhyme text in the rhyme text set to obtain a sentence pair set, wherein the sentence pair set comprises more than one sentence pair, and each sentence pair comprises two adjacent sentences;
performing word pair extraction processing on each sentence pair in the sentence pair set to generate K groups of rhyme text rewriting word pairs, wherein each group of rhyme text rewriting word pairs comprises more than one rhyme text rewriting word pair for the same source word, and K is a positive integer;
determining the rewriting probability of each rhyme text rewriting word pair in the K groups of rhyme text rewriting word pairs, and generating K rewriting word pair candidate sets corresponding to the K groups of rhyme text rewriting word pairs according to the K groups of rhyme text rewriting word pairs and the rewriting probability of each rhyme text rewriting word pair, wherein each rewriting word pair candidate set comprises more than one rhyme text rewriting word pair for the same source word and the rewriting probability of each rhyme text rewriting word pair in the rewriting word pair candidate set;
and rewriting a target rhyme text according to a plurality of rewriting word pair candidate sets among the K rewriting word pair candidate sets to obtain a new rhyme text corresponding to the target rhyme text.
9. A lyric text processing apparatus, applied to an electronic device, comprising:
a first sentence pair extraction unit, configured to acquire a lyric text set and perform sentence pair extraction processing on each lyric text in the lyric text set to obtain a lyric sentence pair set, wherein the lyric sentence pair set comprises more than one lyric sentence pair;
a first word pair extraction unit, configured to perform word pair extraction processing on each lyric sentence pair in the lyric sentence pair set to obtain K groups of lyric rewriting word pairs, wherein each group of lyric rewriting word pairs comprises more than one lyric rewriting word pair for the same source word, and K is a positive integer;
and a first word pair set generation unit, configured to determine the rewriting probability of each lyric rewriting word pair in the K groups of lyric rewriting word pairs, and generate K rewriting word pair candidate sets corresponding to the K groups of lyric rewriting word pairs according to the K groups of lyric rewriting word pairs and the rewriting probability of each lyric rewriting word pair, wherein each rewriting word pair candidate set comprises more than one lyric rewriting word pair for the same source word and the rewriting probability of each lyric rewriting word pair in the rewriting word pair candidate set.
10. The apparatus of claim 9, wherein the first word pair extraction unit comprises:
a word alignment subunit, configured to perform word alignment on each lyric sentence pair in the lyric sentence pair set through a trained word alignment model to obtain an original rewriting word pair set corresponding to the lyric sentence pair set, wherein the trained word alignment model is obtained by training on lyric samples;
a word pair deleting subunit, configured to delete each lyric rewriting word pair in the original rewriting word pair set that does not meet a preset rewriting condition, to obtain a lyric rewriting word pair set;
and a word pair grouping subunit, configured to group the lyric rewriting word pairs in the lyric rewriting word pair set according to the source word of each lyric rewriting word pair, to obtain the K groups of lyric rewriting word pairs.
11. The apparatus of claim 9, wherein the first word pair set generation unit is specifically configured to:
determine the number of times each lyric rewriting word pair in the K groups of lyric rewriting word pairs was generated;
for each group of lyric rewriting word pairs, determine the rewriting probability of each lyric rewriting word pair in the group according to the number of times each lyric rewriting word pair in the group was generated;
and, for each group of lyric rewriting word pairs in the K groups of lyric rewriting word pairs, delete from the group each lyric rewriting word pair whose number of generations is less than a preset generation count threshold and/or each lyric rewriting word pair whose rewriting probability is less than a preset probability threshold, to obtain the rewriting word pair candidate set corresponding to the group.
12. The apparatus of claim 9, wherein the first sentence pair extraction unit comprises:
a lyric text acquisition subunit, configured to:
acquire more than one original lyric text;
preprocess the more than one original lyric text to obtain more than one corresponding preprocessed lyric text, wherein the preprocessing comprises word segmentation processing of each lyric sentence in each lyric text;
select, from the more than one preprocessed lyric text, each preprocessed lyric text that exceeds a preset text data amount, to form the lyric text set;
and, for each preprocessed lyric text in the lyric text set, delete each lyric sentence in the preprocessed lyric text whose word count does not exceed a preset number of words, to obtain the lyric text corresponding to the preprocessed lyric text.
13. The apparatus of claim 9, wherein the first sentence pair extraction unit comprises:
a text processing subunit, configured to:
for each lyric text in the lyric text set, take every two adjacent lyric sentences in the lyric text to form the lyric sentence pairs of the lyric text;
and form the lyric sentence pair set from the lyric sentence pairs of each lyric text in the lyric text set.
14. The apparatus of any one of claims 9-13, further comprising:
a lyric rewriting unit, configured to determine, for each word in the target lyric text, a target rewriting word pair candidate set for the word from the K rewriting word pair candidate sets, and rewrite the word according to the target rewriting word pair candidate set to obtain a new lyric text corresponding to the target lyric text, wherein the lyric content of the new lyric text is different from that of the target lyric text.
15. A lyric rewriting apparatus, applied to an electronic device, comprising:
a receiving unit, configured to receive a lyric rewriting request from a user, the lyric rewriting request specifying a target lyric text to be rewritten;
a rewriting unit, configured to determine, for each word in the target lyric text, a target rewriting word pair candidate set for the word from K rewriting word pair candidate sets, and rewrite the word according to the target rewriting word pair candidate set to obtain a new lyric text corresponding to the target lyric text, wherein the K rewriting word pair candidate sets are obtained according to the lyric text processing method of any one of claims 1 to 5;
and a presentation unit, configured to present the new lyric text to the user.
16. A rhyme text rewriting apparatus, comprising:
a second sentence pair extraction unit, configured to acquire a rhyme text set and perform sentence pair extraction processing on each rhyme text in the rhyme text set to obtain a sentence pair set, wherein the sentence pair set comprises more than one sentence pair, and each sentence pair comprises two adjacent sentences;
a second word pair extraction unit, configured to perform word pair extraction processing on each sentence pair in the sentence pair set to generate K groups of rhyme text rewriting word pairs, wherein each group of rhyme text rewriting word pairs comprises more than one rhyme text rewriting word pair for the same source word, and K is a positive integer;
a second word pair set generation unit, configured to determine the rewriting probability of each rhyme text rewriting word pair in the K groups of rhyme text rewriting word pairs, and generate K rewriting word pair candidate sets corresponding to the K groups of rhyme text rewriting word pairs according to the K groups of rhyme text rewriting word pairs and the rewriting probability of each rhyme text rewriting word pair, wherein each rewriting word pair candidate set comprises more than one rhyme text rewriting word pair for the same source word and the rewriting probability of each rhyme text rewriting word pair in the rewriting word pair candidate set;
and a text rewriting unit, configured to rewrite a target rhyme text according to a plurality of rewriting word pair candidate sets among the K rewriting word pair candidate sets to obtain a new rhyme text corresponding to the target rhyme text.
17. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1-8 when executing the program.
18. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202010121213.5A 2020-02-26 2020-02-26 Text processing method, device, electronic equipment and storage medium Active CN111401038B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010121213.5A CN111401038B (en) 2020-02-26 2020-02-26 Text processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010121213.5A CN111401038B (en) 2020-02-26 2020-02-26 Text processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111401038A true CN111401038A (en) 2020-07-10
CN111401038B CN111401038B (en) 2023-10-27

Family

ID=71430391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010121213.5A Active CN111401038B (en) 2020-02-26 2020-02-26 Text processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111401038B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002157245A (en) * 2000-11-16 2002-05-31 Nippon Telegr & Teleph Corp <Ntt> Method and device for analyzing syntax and storage medium with syntax analysis program stored therein
US6490549B1 (en) * 2000-03-30 2002-12-03 Scansoft, Inc. Automatic orthographic transformation of a text stream
JP2010230948A (en) * 2009-03-27 2010-10-14 Hitachi East Japan Solutions Ltd Content distribution system and text display method
US20140358519A1 (en) * 2013-06-03 2014-12-04 Xerox Corporation Confidence-driven rewriting of source texts for improved translation
CN108710607A (en) * 2018-04-17 2018-10-26 达而观信息科技(上海)有限公司 Text Improvement and device
CN109117475A (en) * 2018-07-02 2019-01-01 武汉斗鱼网络科技有限公司 A kind of method and relevant device of text rewriting
CN109815493A (en) * 2019-01-09 2019-05-28 厦门大学 A kind of modeling method that the intelligence hip-hop music lyrics generate
CN110717010A (en) * 2018-06-27 2020-01-21 北京嘀嘀无限科技发展有限公司 Text processing method and system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023217019A1 (en) * 2022-05-07 2023-11-16 北京有竹居网络技术有限公司 Text processing method, apparatus, and system, and storage medium and electronic device
CN116011430A (en) * 2023-03-22 2023-04-25 暗链科技(深圳)有限公司 Vowel duplication elimination method, nonvolatile readable storage medium and electronic equipment
CN116011430B (en) * 2023-03-22 2024-04-02 暗链科技(深圳)有限公司 Vowel duplication elimination method, nonvolatile readable storage medium and electronic equipment

Also Published As

Publication number Publication date
CN111401038B (en) 2023-10-27

Similar Documents

Publication Publication Date Title
US11501182B2 (en) Method and apparatus for generating model
CN108874878B (en) Knowledge graph construction system and method
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
US20200193217A1 (en) Method for determining sentence similarity
CN112270379A (en) Training method of classification model, sample classification method, device and equipment
CN110555095A (en) Man-machine conversation method and device
CN111310440B (en) Text error correction method, device and system
CN111309910A (en) Text information mining method and device
CN112163424A (en) Data labeling method, device, equipment and medium
CN112711950A (en) Address information extraction method, device, equipment and storage medium
CN113204967B (en) Resume named entity identification method and system
CN113722493A (en) Data processing method, device, storage medium and program product for text classification
CN112528637A (en) Text processing model training method and device, computer equipment and storage medium
CN111401038A (en) Text processing method and device, electronic equipment and storage medium
CN113704410A (en) Emotion fluctuation detection method and device, electronic equipment and storage medium
CN111368066B (en) Method, apparatus and computer readable storage medium for obtaining dialogue abstract
CN113934834A (en) Question matching method, device, equipment and storage medium
CN113705207A (en) Grammar error recognition method and device
CN113704393A (en) Keyword extraction method, device, equipment and medium
CN115017271B (en) Method and system for intelligently generating RPA flow component block
CN116680387A (en) Dialogue reply method, device, equipment and storage medium based on retrieval enhancement
CN115600605A (en) Method, system, equipment and storage medium for jointly extracting Chinese entity relationship
CN111476003B (en) Lyric rewriting method and device
CN113988048A (en) Emotional cause pair extraction method based on multi-wheel machine reading understanding
CN114896966A (en) Method, system, equipment and medium for positioning grammar error of Chinese text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant