WO2022183923A1

WO2022183923A1 - Phrase generation method and apparatus, and computer readable storage medium

Info

Publication number: WO2022183923A1
Application number: PCT/CN2022/077155
Authority: WO
Inventors: 朱鹏军; 巨荣辉; 崔明; 葛一迪; 刘朋樟
Original assignee: 北京沃东天骏信息技术有限公司; 北京京东世纪贸易有限公司
Priority date: 2021-03-03
Filing date: 2022-02-22
Publication date: 2022-09-09
Also published as: CN113761114A

Abstract

The present disclosure relates to the technical field of computers, and relates to a phrase generation method and apparatus, and a computer readable storage medium. The method of the present disclosure comprises: for each obtained initial phrase, determining the part-of-speech and order of each word in the initial phrase to obtain a part-of-speech combination of the initial phrase, wherein the part-of-speech combination is the part-of-speech of each word arranged according to the order of each word; selecting one or more part-of-speech combinations according to the number of times of occurrence of each part-of-speech combination; selecting, from the words of an alternative text, a word that conforms to the part-of-speech in the selected part-of-speech combination, and according to the selected part-of-speech combination, generating a phrase as an alternative phrase; and according to the closeness degree of each word in each alternative phrase, selecting the alternative phrase as a generated phrase.

Description

Phrase generation method, apparatus and computer readable storage medium

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based on, and claims priority to, CN application No. 202110234468.7, filed on March 3, 2021, the disclosure of which is hereby incorporated into this application in its entirety.

TECHNICAL FIELD

The present disclosure relates to the field of computer technology, and in particular, to a phrase generation method, apparatus, and computer-readable storage medium.

BACKGROUND

Objects on Internet platforms are often described using a few phrases, for example, "whitening and moisturizing", "outdoor barbecue", and so on. These phrases can be displayed as object tags, can provide an index for search, and can provide writing material for generation tasks such as text generation. For example, a search index of SKUs can be constructed from combinations of "phrase + product word", so that when users are guided to search for related keywords, related products can be located quickly.

These phrases are fixed fragments of two or more words that form a certain combination and are often used together in different sentences. The method for generating phrases on the Internet platform known to the inventors is to artificially set some rules for word combination, and combine words according to the rules to obtain phrases.

SUMMARY OF THE INVENTION

According to some embodiments of the present disclosure, a method for generating a phrase is provided, including: for each acquired initial phrase, determining the part of speech and order of each participle in the initial phrase to obtain a part-of-speech combination of the initial phrase, wherein the part-of-speech combination is the parts of speech of the participles arranged in the order of the participles; selecting one or more part-of-speech combinations according to the number of occurrences of each part-of-speech combination; screening, from the participles of a candidate text, participles that match the parts of speech in the selected part-of-speech combination, and generating a phrase according to the selected part-of-speech combination as a candidate phrase; and selecting candidate phrases as generated phrases according to the closeness of the participles in each candidate phrase.

In some embodiments, selecting candidate phrases as generated phrases according to the closeness of the participles in each candidate phrase includes: for each candidate phrase, determining the degree of closeness of the participles in the candidate phrase according to the number of times each participle in the candidate phrase appears individually in a preset text and the number of times the participles appear consecutively in the preset text; and selecting candidate phrases whose degree of closeness is not lower than a closeness threshold as the generated phrases.

In some embodiments, for each candidate phrase, the degree of closeness of the participles in the candidate phrase is the ratio of the probability that the participles appear consecutively in the preset text to the product of the probabilities that each participle appears individually in the preset text.

In some embodiments, selecting one or more part-of-speech combinations according to the number of occurrences of each part-of-speech combination includes: for each part-of-speech combination, determining the weight of the part-of-speech combination according to the number of occurrences of the part-of-speech combination and the maximum and minimum numbers of occurrences among the part-of-speech combinations; and selecting one or more part-of-speech combinations whose weight is not lower than a weight threshold.

In some embodiments, the method further includes: when the generated phrases include multiple phrases with the same participles in different orders, determining the probability of occurrence of the participle sequence of each of the multiple phrases; determining the fluency of each phrase according to the probability of occurrence of its participle sequence; and selecting, according to the fluency of each phrase, one or more phrases with which to update the generated phrases.

In some embodiments, determining the occurrence probability of the word segmentation sequence of each phrase in the plurality of phrases includes: inputting the word segmentation sequence of each phrase into a pre-trained natural language processing model to obtain the occurrence probability of the word segmentation sequence of each phrase.

In some embodiments, determining the fluency of each phrase according to the probability of occurrence of the word segmentation sequence of each phrase includes: for each phrase, taking the reciprocal of the probability of occurrence of the word segmentation sequence of the phrase and raising it to a power determined by the number of word segmentations, to obtain the fluency of the phrase.

In some embodiments, the method further includes: selecting a plurality of phrases in the training corpus as initial phrases according to the similarity between each phrase in the training corpus and the seed phrase.

In some embodiments, selecting a plurality of phrases in the training corpus as initial phrases according to the similarity between each phrase in the training corpus and the seed phrase includes: determining the vectors of each phrase in the training corpus and of the seed phrase respectively; determining the similarity between each phrase in the training corpus and the seed phrase according to the similarity between the vector of each phrase in the training corpus and the vector of the seed phrase; and selecting a plurality of phrases whose similarity to the seed phrase is not lower than a similarity threshold as initial phrases.

In some embodiments, the method further includes: selecting a plurality of participles in the training corpus as initial participles according to the similarity between each participle in the training corpus and a first seed participle; and combining each initial participle with a second seed participle to obtain a plurality of seed phrases.

According to other embodiments of the present disclosure, a phrase generation device is provided, comprising: a part-of-speech combination determination module, configured to determine, for each acquired initial phrase, the part of speech and order of each participle in the initial phrase to obtain a part-of-speech combination of the initial phrase, wherein the part-of-speech combination is the parts of speech of the participles arranged in the order of the participles; a part-of-speech combination selection module, configured to select one or more part-of-speech combinations according to the number of occurrences of each part-of-speech combination; a candidate phrase generation module, configured to screen, from the participles of a candidate text, participles that match the parts of speech in the selected part-of-speech combination, and to generate a phrase according to the selected part-of-speech combination as a candidate phrase; and a phrase generation module, configured to select candidate phrases as generated phrases according to the closeness of the participles in each candidate phrase.

According to further embodiments of the present disclosure, a phrase generation apparatus is provided, comprising: a processor; and a memory coupled to the processor for storing instructions, wherein when the instructions are executed by the processor, the processor executes the phrase generation method of any of the foregoing embodiments.

According to further embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the phrase generation method of any of the foregoing embodiments.

Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.

DESCRIPTION OF THE DRAWINGS

The accompanying drawings described herein are used to provide a further understanding of the present disclosure and constitute a part of this application, and the exemplary embodiments of the present disclosure and their descriptions are configured to explain the present disclosure and do not constitute an improper limitation of the present disclosure.

FIG. 1 shows a schematic flowchart of a phrase generation method according to some embodiments of the present disclosure.

FIG. 2 shows a schematic flowchart of a phrase generation method according to other embodiments of the present disclosure.

FIG. 3 shows a schematic flowchart of a method for generating phrases according to further embodiments of the present disclosure.

FIG. 4 shows a schematic structural diagram of a phrase generating apparatus according to some embodiments of the present disclosure.

FIG. 5 shows a schematic structural diagram of a phrase generating apparatus according to other embodiments of the present disclosure.

FIG. 6 shows a schematic structural diagram of a phrase generating apparatus according to further embodiments of the present disclosure.

DETAILED DESCRIPTION

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, but not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application or uses in any way. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present disclosure.

The inventor found that the rules set manually are not necessarily very general, and may generate a large number of phrases with poor quality, for example, the words in the phrases are completely irrelevant, and the meaning of the expression is unclear.

A technical problem to be solved by the present disclosure is: how to improve the quality and efficiency of phrase generation.

The present disclosure provides a phrase generation method, which will be described below with reference to FIGS. 1 to 3 .

FIG. 1 is a flowchart of some embodiments of the disclosed phrase generation method. As shown in FIG. 1 , the method of this embodiment includes steps S102 to S108.

In step S102, for each acquired initial phrase, the parts of speech and the sequence of each participle in the initial phrase are determined to obtain a part-of-speech combination of the initial phrase.

First, a plurality of initial phrases are acquired, and these initial phrases may be generated based on a small number of seed words. The method for generating the initial phrases will be described in subsequent embodiments. In addition, the initial phrase can also be preset. The initial phrase may be a phrase representing a preset dimension, such as a time dimension, a food dimension, or a cosmetic dimension, and the like. For example, phrases in the time dimension may include words that describe time, such as Spring Festival, Mid-Autumn Festival, and the like. Initial phrases of different dimensions can be selected according to actual requirements, so that the subsequently generated phrases belong to the same dimension as the initial phrase.

In some embodiments, word segmentation and part-of-speech tagging are performed for each initial phrase to obtain the parts of speech of each participle in the initial phrase, and the parts of speech of each participle are combined according to the order of each participle to obtain the part-of-speech combination of the initial phrase. That is, the part-of-speech combination is the part-of-speech of each participle arranged in the order of each participle. Word segmentation and part-of-speech tagging can be performed on each initial phrase using existing natural language processing (NLP) algorithms. For example, part-of-speech tagging is used to determine participles as nouns, verbs, and the like. Table 1 shows examples of part-of-speech combinations of some phrases.

Table 1

As shown in Table 1, t represents a time word, v represents a verb, n represents a noun, r represents a pronoun, and a represents an adjective. For example, the part-of-speech combination of giving gifts on the Mid-Autumn Festival is t-v, that is, time-word-verb.
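The derivation of a part-of-speech combination can be sketched in a few lines of Python. This is only an illustration, not the implementation of the disclosure: it assumes word segmentation and tagging have already been done by some NLP tool, and the sample phrases, their tags, and the helper name `pos_combination` are hypothetical.

```python
from collections import Counter

# Hypothetical pre-tagged initial phrases: each phrase is a list of
# (word segment, part-of-speech tag) pairs, as a tagger might return them.
initial_phrases = [
    [("中秋节", "t"), ("送礼", "v")],
    [("春节", "t"), ("送礼", "v")],
    [("春节", "t"), ("礼盒", "n")],
]


def pos_combination(tagged_phrase):
    """Join the tags of the segments in their original order, e.g. 't-v'."""
    return "-".join(tag for _word, tag in tagged_phrase)


# Number of occurrences of each part-of-speech combination (input to step S104).
combination_counts = Counter(pos_combination(p) for p in initial_phrases)
print(combination_counts.most_common())
```

The resulting counts are what step S104 consumes when it weights and selects combinations.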

In step S104, one or more part-of-speech combinations are selected according to the number of occurrences of each part-of-speech combination.

The more times a part-of-speech combination appears, the higher its probability of being selected. In some embodiments, for each part-of-speech combination, the weight of the part-of-speech combination is determined according to the number of occurrences of the part-of-speech combination and the maximum and minimum numbers of occurrences among the part-of-speech combinations; one or more part-of-speech combinations whose weight is not lower than the weight threshold are selected. For example, formula (1) is used to determine the weight of each part-of-speech combination.

In formula (1), x_i represents the number of occurrences (frequency) of part-of-speech combination i, where i is a positive integer, x_max represents the maximum number of occurrences among the part-of-speech combinations, and x_min represents the minimum number of occurrences among the part-of-speech combinations. Table 2 shows the numbers of occurrences and the weights of various part-of-speech combinations.

Table 2

No.	Part-of-speech combination	Occurrences	Weight
1	t-v	150	0.99
2	t-n	138	0.93
3	t-a	109	0.886
4	v-n-r-n-a	26	0.45
5	…	…	…

For example, the weight threshold can be set to 0.35, and part-of-speech combinations with a weight higher than 0.35 are selected. The weight threshold can be set according to actual needs. The higher the weight threshold, the more frequently the selected part-of-speech combinations occur, that is, the more general the selected part-of-speech combinations and the subsequently generated phrases are, and the higher the efficiency. The lower the weight threshold, the more part-of-speech combinations are selected and the more phrases are finally generated, which can cover more types of phrases and richer phrases.
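The effect of the threshold can be illustrated with the four rows of Table 2. The sketch below takes the weights as given rather than recomputing them, applies the "not lower than" rule stated in the embodiments, and uses a hypothetical helper name `select_combinations`.

```python
# Occurrence counts and weights copied from Table 2. The weights are taken
# as given; this sketch does not attempt to recompute them.
table_2 = {
    "t-v": (150, 0.99),
    "t-n": (138, 0.93),
    "t-a": (109, 0.886),
    "v-n-r-n-a": (26, 0.45),
}


def select_combinations(table, weight_threshold):
    """Keep the combinations whose weight is not lower than the threshold."""
    return [combo for combo, (_count, weight) in table.items()
            if weight >= weight_threshold]


# A low threshold keeps every listed combination; a high one keeps only
# the most frequent, most general combinations.
print(select_combinations(table_2, 0.35))
print(select_combinations(table_2, 0.9))
```

Raising the threshold from 0.35 to 0.9 shrinks the selection from all four listed combinations to the two most frequent ones.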

In step S106 , a word segment that matches the part of speech in the selected part-of-speech combination is screened from each participle of the candidate text, and a phrase is generated according to the selected part-of-speech combination as a candidate phrase.

First, candidate texts are obtained, which can be titles on an online platform, search texts, or commented articles. Word segmentation and part-of-speech tagging are performed on each candidate text to obtain the participles in each candidate text and their parts of speech. According to the selected part-of-speech combinations, the participles and parts of speech in the candidate text are traversed to form candidate phrases, yielding a candidate phrase set. For example, based on the product title "New Year gift box, New Year gift box, Spring Festival gift box, New Year gift box, high-end gift box, enterprise group purchase, customized logo, gift gift, New Year's gift, New Year's gift box, customized nut gift box, Customized D-Double fruit tray basket [health gift box] gift box", candidate phrases can be obtained: "New Year's custom gift box (t-v-n), Spring Festival gift box (a-n), Spring Festival high-end gift box (t-a-n), New Year's goods wholesale (n-v), Health enterprise (n-v), Health gift box (v-n), New Year gift package (t-n), Customized enterprise (v-n)…", and so on.
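One possible way to traverse a tagged candidate text is sketched below. The text only states that the participles and parts of speech are traversed according to the selected combinations, so the strategy used here (grouping distinct segments by tag and taking their Cartesian product in the order required by each combination) is an assumption, as are the function name `generate_candidates` and the sample tags.

```python
from itertools import product


def generate_candidates(tagged_text, combinations):
    """Form candidate phrases whose tags follow a selected combination.

    tagged_text: list of (segment, tag) pairs from one candidate text.
    combinations: selected part-of-speech combinations such as 't-a-n'.
    """
    # Group distinct segments by tag, preserving first-seen order.
    by_tag = {}
    for word, tag in tagged_text:
        words = by_tag.setdefault(tag, [])
        if word not in words:
            words.append(word)

    candidates = []
    for combo in combinations:
        tags = combo.split("-")
        if any(tag not in by_tag for tag in tags):
            continue  # the text has no segment for one of the required tags
        for words in product(*(by_tag[tag] for tag in tags)):
            if len(set(words)) == len(words):  # do not reuse a segment
                candidates.append((words, combo))
    return candidates


# Hypothetical tagged title fragment.
title = [("春节", "t"), ("高档", "a"), ("礼盒", "n"), ("定制", "v")]
for words, combo in generate_candidates(title, ["t-n", "t-a-n"]):
    print("".join(words), combo)
```

This over-generates on purpose: unrelated segments are paired freely here and are only filtered out later by the closeness screening of step S108.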

In step S108, according to the closeness of each word segment in each candidate phrase, the candidate phrase is selected as the generated phrase.

The degree of closeness of the participles in a candidate phrase indicates the degree of correlation or interdependence of the participles. The higher the degree of correlation or interdependence, the higher the degree of closeness. In some embodiments, for each candidate phrase, the degree of closeness of the participles in the candidate phrase is determined according to the number of times each participle in the candidate phrase appears individually in the preset text and the number of times the participles appear consecutively in the preset text; the candidate phrases whose degree of closeness is not lower than the closeness threshold are selected as the generated phrases.

The preset text can be selected high-quality text, for example: comment text provided by users whose credit is higher than a credit threshold, whose number of friends is higher than a friend-count threshold, or whose number of followers is higher than a follower-count threshold; comment text whose number of comments is higher than a comment-count threshold or whose number of views is higher than a view-count threshold; search text or titles whose number of searches is higher than a search-count threshold; or titles whose number of views is higher than a view-count threshold. Such preset text better matches users' behavioral habits, so evaluating the closeness of the participles of a candidate phrase based on the preset text is more accurate.

For each candidate phrase, the fewer times each participle appears individually in the preset text, and the more times the participles appear consecutively in the preset text, the higher the degree of closeness of the participles. In some embodiments, for each candidate phrase, the degree of closeness of the participles in the candidate phrase is the ratio of the probability that the participles appear consecutively in the preset text to the product of the probabilities that each participle appears individually in the preset text. The probability or number of times that the participles appear consecutively in the preset text may be counted without distinguishing the order of the participles. For example, if "Spring Festival gift box" appears once and "gift box Spring Festival" appears once, then "Spring Festival" and "gift box" appear consecutively twice. The closeness of the participles in the candidate phrase can be expressed by the following formula.

$$\mathrm{closeness}(u, v, \ldots) = \frac{P(u, v, \ldots)}{P(u)\,P(v)\cdots} \tag{2}$$

In formula (2), P(u,v,...) represents the probability that the participles in the candidate phrase appear consecutively in the preset text, and P(u), P(v), ... respectively represent the probabilities that each participle appears in the preset text.

For example, suppose the preset text contains 1 million words (including repetitions), among which "Spring Festival" appears 80,000 times, "gift box" appears 65,000 times, and "Spring Festival" and "gift box" appear together 50,000 times. The probabilities of "Spring Festival", "gift box", and "Spring Festival gift box" can then be calculated respectively: P(Spring Festival) = 0.08, P(gift box) = 0.065, and P(Spring Festival, gift box) = 0.05, and the closeness of "Spring Festival" and "gift box" is obtained based on formula (2).
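The arithmetic of this example can be checked with a short script. It is a sketch that reads the closeness as the ratio described in the embodiments (probability of consecutive occurrence divided by the product of the individual probabilities); the function name `closeness` is hypothetical.

```python
from math import prod


def closeness(joint_count, segment_counts, total_words):
    """Ratio of the joint probability to the product of individual probabilities."""
    p_joint = joint_count / total_words
    p_each = [count / total_words for count in segment_counts]
    return p_joint / prod(p_each)


# Counts from the example: 1,000,000 words in the preset text,
# "Spring Festival" 80,000 times, "gift box" 65,000 times, together 50,000 times.
value = closeness(50_000, [80_000, 65_000], 1_000_000)
print(round(value, 2))  # 0.05 / (0.08 * 0.065), about 9.62
```

A value well above 1 means the two participles occur together far more often than they would if they were independent, which is the property the closeness threshold screens for.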

In the method of the above embodiment, a plurality of initial phrases are obtained first, the part-of-speech combination of each initial phrase is determined, and then one or more part-of-speech combinations are selected according to the number of occurrences of various part-of-speech combinations. Generate alternative phrases from alternative text based on selected part-of-speech combinations. Further based on the closeness of each participle in the candidate phrase, the candidate phrase is selected as the generated phrase. The method of the above embodiment can select a large number of candidate phrases from the candidate text based on a small number of initial phrases, and the part-of-speech combination of these candidate phrases is more general and more logical, and further according to the participle of the candidate phrases. The degree of closeness is screened, so that the degree of closeness of the word segmentation in the final generated phrase is higher, avoiding completely irrelevant word segmentation to form a phrase, and improving the quality and efficiency of phrase generation.

In some embodiments, at least one of a title of the object and an index to search is generated according to the generated phrase.

The phrases generated by the method of the above embodiment are more reasonable, and the quality and efficiency are improved. To further improve the quality and efficiency of the phrases, the present disclosure further screens the generated phrases. Other embodiments of the phrase generation method of the present disclosure are described below with reference to FIG. 2.

FIG. 2 is a flow chart of other embodiments of the method for generating phrases of the present disclosure. As shown in FIG. 2, after step S108, the method of this embodiment further includes: steps S202-S206.

In step S202, when the generated phrase includes multiple phrases with the same participle and different order of the participles, the probability of occurrence of the participle sequence of each phrase in the multiple phrases is determined.

For example, "Spring Festival gift box" and "Gift box Spring Festival" are two phrases with the same participles in different orders. Based on the method of the foregoing embodiment, the two have the same degree of closeness, but "Spring Festival gift box" is the more fluent, higher-quality phrase. Therefore, the following steps are performed to select higher-quality phrases. For example, the participle sequence of each phrase is input into a pre-trained natural language processing model to obtain the probability of occurrence of the participle sequence of each phrase. The natural language processing model is, for example, an N-Gram model; an existing model can be used, and details are not repeated here. The natural language processing model can be trained in advance using a corpus of the Internet platform.

In step S204, the fluency of each phrase is determined according to the probability of occurrence of the word segmentation sequence of each phrase.

The higher the probability of occurrence of the participle sequence of a phrase, the higher the fluency of the phrase. For example, for each phrase, the reciprocal of the probability of occurrence of the participle sequence of the phrase is raised to a power determined by the number of participles to obtain the fluency of the phrase. Formula (3) can be used to determine the fluency of a phrase.

In formula (3), P(w_1 w_2 ... w_N) represents the probability of occurrence of the word segmentation sequence of the phrase.

In step S206, one or more phrases are selected and updated to the generated phrases according to the fluency of each phrase.

For example, a phrase with the highest degree of fluency among the phrases is selected to be updated as the generated phrase, or one or more phrases with a degree of fluency higher than a threshold of fluency are selected to be updated as the generated phrase.
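Steps S202 to S206 can be sketched with a toy bigram model. Several things here are assumptions rather than the disclosure's method: the tiny corpus is hypothetical, add-one smoothing is used only to avoid zero probabilities, and the fluency score is read as the standard perplexity (the N-th root of the reciprocal of the sequence probability), under which a lower score corresponds to a more probable, more fluent ordering.

```python
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over segmented sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sequence_probability(segments, unigrams, bigrams):
    """P(w_1 ... w_N) under a bigram model with add-one smoothing (an assumption)."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + segments + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
    return prob


def perplexity(segments, unigrams, bigrams):
    """Reciprocal of the sequence probability, taken to the 1/N power."""
    p = sequence_probability(segments, unigrams, bigrams)
    return (1.0 / p) ** (1.0 / len(segments))


# Hypothetical segmented corpus standing in for the platform corpus.
corpus = [["春节", "礼盒"], ["春节", "礼盒", "定制"], ["礼盒", "定制"]]
unigrams, bigrams = train_bigram(corpus)

# Two generated phrases with the same segments in different orders.
variants = [["春节", "礼盒"], ["礼盒", "春节"]]
best = min(variants, key=lambda v: perplexity(v, unigrams, bigrams))
print("".join(best))
```

Because the competing phrases contain the same number of segments, choosing the lowest perplexity is equivalent to choosing the ordering with the highest sequence probability.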

In the method of the above embodiment, selecting a phrase with a higher degree of fluency further improves the quality and efficiency of phrase generation.

The following describes how the initial phrase in the foregoing embodiment is generated in conjunction with FIG. 3 .

FIG. 3 is a flowchart of further embodiments of the disclosed phrase generation method. As shown in Fig. 3, the method of this embodiment includes steps S302-S306.

In step S302, according to the similarity between each participle in the training corpus and the first seed participle, a plurality of participles in the training corpus are selected as initial participles.

First, the training corpus can be preprocessed. The training corpus may be various corpora obtained from the Internet platform, or may be obtained from an open-source corpus. Preprocessing mainly includes converting traditional Chinese to simplified Chinese, converting uppercase to lowercase, and deleting special characters (such as @, #, &, etc.). Word segmentation and part-of-speech tagging are then performed on the preprocessed training corpus. Similar preprocessing can also be performed on the candidate texts and preset texts in the foregoing embodiments, and word segmentation and part-of-speech tagging can also be performed on them in advance.

The initial phrase may represent a phrase of a preset dimension, and the first seed segment may also be a seed segment representing a preset dimension. Taking the time dimension as an example, the first seed participles are, for example, Women's Day, Children's Day, Dragon Boat Festival, Memorial Day, and so on.

In some embodiments, the similarity between each participle in the training corpus and the first seed participle is determined, and participles whose similarity is not lower than a participle similarity threshold are selected as initial participles. For example, the vectors of each participle in the training corpus and of the first seed participle are determined respectively; the similarity between each participle in the training corpus and the first seed participle is determined according to the similarity between the vector of each participle and the vector of the first seed participle; and participles whose similarity to the first seed participle is not lower than the participle similarity threshold are selected as initial participles.

Each word segment in the training corpus and the first seed segment can be input into a pre-trained word vector model to obtain the vector of each segment and the vector of the first seed segment. The word vector model may be an existing model, such as a BERT model, and is not limited to this example. The similarity between the vector of each segment and the vector of the first seed segment can be calculated using cosine similarity. For example, the following formula is used to calculate the similarity between each participle and the first seed participle.

$$\mathrm{sim}(s_i, s_j) = \frac{s_i \cdot s_j}{\lVert s_i \rVert \, \lVert s_j \rVert} \tag{4}$$

In formula (4), s_i represents the vector of the i-th participle, i is a positive integer, and s_j represents the vector of the first seed participle. For example, according to the above method, participles similar to the first seed participle "Mid-Autumn Festival" can be obtained, as shown in Table 3.

Table 3

Id	Similar word	Similarity
1	端午节 (Dragon Boat Festival)	0.87729775
2	国庆节 (National Day)	0.79689615
3	重阳节 (Double Ninth Festival)	0.78241974
4	七夕节 (Qixi Festival)	0.73759442
5	…	…
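The similarity screening of step S302 can be sketched with plain Python. The two-dimensional vectors below are made up for illustration (a real word vector model would produce much longer vectors), and the function names and the threshold value are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_similar(vectors, seed_vector, threshold):
    """Keep the segments whose similarity to the seed is not lower than the threshold."""
    return [word for word, vec in vectors.items()
            if cosine_similarity(vec, seed_vector) >= threshold]


# Toy vectors standing in for the output of a word vector model.
seed_vector = [1.0, 0.0]            # e.g. the first seed segment "中秋节"
segment_vectors = {
    "端午节": [0.9, 0.1],            # points in nearly the same direction
    "礼盒": [0.0, 1.0],              # orthogonal to the seed
}
print(select_similar(segment_vectors, seed_vector, 0.7))
```

The same two functions apply unchanged to phrase vectors in step S306, since the text states that phrase similarity can be determined in the same or a similar way.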

In step S304, each initial participle is combined with the second seed participle to obtain a plurality of seed phrases.

There may be multiple second seed word segments, and each initial word segment may be combined with each second seed word segment to obtain multiple seed phrases.
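The pairing in step S304 amounts to a Cartesian product of the two lists. In the sketch below the initial segments come from the festival names mentioned in the text, while the second seed segments and the choice to place the initial segment first are illustrative assumptions.

```python
from itertools import product

# Initial segments mined in step S302 (festival names from the text).
initial_segments = ["中秋节", "端午节", "国庆节"]
# Hypothetical second seed segments ("give gifts", "gift box").
second_seed_segments = ["送礼", "礼盒"]

# Every initial segment is combined with every second seed segment.
seed_phrases = [first + second
                for first, second in product(initial_segments, second_seed_segments)]
print(seed_phrases)
```

With three initial segments and two second seed segments this yields six seed phrases, which is how a handful of seeds expands into many queries against the training corpus.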

In step S306, according to the similarity between each phrase in the training corpus and the seed phrase, a plurality of phrases in the training corpus are selected as initial phrases.

In some embodiments, the vectors of each phrase in the training corpus and of the seed phrase are determined respectively; the similarity between each phrase in the training corpus and the seed phrase is determined according to the similarity between the vector of each phrase in the training corpus and the vector of the seed phrase; and a plurality of phrases whose similarity to the seed phrase is not lower than the similarity threshold are selected as initial phrases. The vector of each phrase and the vector of the seed phrase can be determined using the word vector model, and the method of determining the similarity between each phrase and the seed phrase can be the same as or similar to the method of determining the similarity between each word segment and the first seed segment.

The method of the above embodiment can mine a large number of initial phrases from the training corpus based on a small number of first seed segmentations and second seed segmentations for subsequent generation of phrases, thereby improving the quality and richness of phrases.

The present disclosure also provides an apparatus for generating a phrase, which will be described below with reference to FIG. 4.

FIG. 4 is a block diagram of some embodiments of the disclosed phrase generating apparatus. As shown in FIG. 4 , the apparatus 40 in this embodiment includes: a part-of-speech combination determination module 410 , a part-of-speech combination selection module 420 , a candidate phrase generation module 430 , and a phrase generation module 440 .

The part-of-speech combination determining module 410 is configured to, for each acquired initial phrase, determine the part-of-speech and order of each participle in the initial phrase, and obtain a part-of-speech combination of the initial phrase, wherein the part-of-speech combination is the part-of-speech of each participle arranged in the order of each participle.

In some embodiments, the part-of-speech combination determining module 410 is configured to select a plurality of phrases in the training corpus as initial phrases according to the similarity between each phrase in the training corpus and the seed phrase.

In some embodiments, the part-of-speech combination determination module 410 is configured to determine the vectors of each phrase in the training corpus and of the seed phrase respectively; determine the similarity between each phrase in the training corpus and the seed phrase according to the similarity between the vector of each phrase in the training corpus and the vector of the seed phrase; and select a plurality of phrases whose similarity to the seed phrase is not lower than the similarity threshold as initial phrases.

In some embodiments, the part-of-speech combination determination module 410 is configured to select a plurality of word segments in the training corpus as initial word segments according to the similarity between each word segment in the training corpus and the first seed segment; and combine each initial word segment with the second seed segment to obtain a plurality of seed phrases.

The part-of-speech combination selection module 420 is configured to select one or more part-of-speech combinations according to the number of occurrences of each part-of-speech combination.

In some embodiments, the part-of-speech combination selection module 420 is configured to, for each part-of-speech combination, determine the weight of the part-of-speech combination according to the number of occurrences of that combination and the maximum and minimum numbers of occurrences among all part-of-speech combinations; and select one or more part-of-speech combinations whose weight is not lower than the weight threshold.
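One plausible reading of a weight built from a count together with the maximum and minimum counts is min-max normalization. The sketch below assumes that form, (count - min) / (max - min); the formula, the function names, and the sample counts are illustrative assumptions rather than the formula of the disclosure.

```python
def combination_weights(counts):
    """Min-max normalize occurrence counts into weights in [0, 1] (assumed formula)."""
    lowest, highest = min(counts.values()), max(counts.values())
    if highest == lowest:
        # All combinations occur equally often; treat them as equally weighted.
        return {combo: 1.0 for combo in counts}
    return {
        combo: (n - lowest) / (highest - lowest)
        for combo, n in counts.items()
    }

def select_combinations(counts, weight_threshold):
    """Keep combinations whose weight is not lower than the weight threshold."""
    weights = combination_weights(counts)
    return [combo for combo, w in weights.items() if w >= weight_threshold]

# Hypothetical occurrence counts of three part-of-speech combinations.
counts = {("n", "v"): 50, ("v", "v"): 30, ("a", "n"): 10}
selected = select_combinations(counts, 0.5)
```

With these counts the weights are 1.0, 0.5 and 0.0, so a threshold of 0.5 keeps the first two combinations.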

The candidate phrase generation module 430 is configured to screen out, from the participles of the candidate text, the participles that match the parts of speech in the selected part-of-speech combination, and to generate a phrase according to the selected part-of-speech combination as a candidate phrase.
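A minimal sketch of this step, assuming the candidate text has already been segmented and tagged: participles are grouped by part of speech, and one participle is placed in each slot of the selected combination. The function name, the tags, and the rule that a participle is not reused within one phrase are assumptions made for illustration.

```python
from itertools import product

def generate_candidates(tagged_participles, combination):
    """Fill each part-of-speech slot of `combination` with a matching participle."""
    by_pos = {}
    for participle, pos in tagged_participles:
        bucket = by_pos.setdefault(pos, [])
        if participle not in bucket:  # ignore repeated participles
            bucket.append(participle)
    slots = [by_pos.get(pos, []) for pos in combination]
    # Assumption: the same participle is not used twice in one phrase.
    return [c for c in product(*slots) if len(set(c)) == len(c)]

# Hypothetical tagged participles of a candidate text.
tagged = [("outdoor", "n"), ("camping", "v"), ("barbecue", "v")]
noun_verb = generate_candidates(tagged, ("n", "v"))
verb_verb = generate_candidates(tagged, ("v", "v"))
```

If no participle matches one of the slots, no candidate phrase is produced for that combination.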

The phrase generation module 440 is configured to select candidate phrases as the generated phrases according to the closeness of each word segment in each candidate phrase.

In some embodiments, the phrase generation module 440 is configured to, for each candidate phrase, determine the closeness of the participles in the candidate phrase according to the number of times each participle in the candidate phrase appears individually in the preset text and the number of times the participles appear consecutively in the preset text; and select the candidate phrases whose closeness is not lower than the closeness threshold as the generated phrases.
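The closeness measure can be sketched as the ratio of the probability that the participles appear consecutively to the product of the probabilities that each appears on its own, as recited in the claims. How the probabilities are estimated is not stated here; dividing each count by the number of participles in the preset text is an assumption, as are the function name and the sample text.

```python
def closeness(phrase, text_participles):
    """Ratio of the consecutive-occurrence probability to the product of the
    individual occurrence probabilities (probabilities estimated as count / total)."""
    n = len(phrase)
    total = len(text_participles)
    if total == 0:
        return 0.0
    consecutive = sum(
        1
        for i in range(total - n + 1)
        if tuple(text_participles[i:i + n]) == tuple(phrase)
    )
    p_consecutive = consecutive / total
    p_independent = 1.0
    for participle in phrase:
        p_independent *= text_participles.count(participle) / total
    return p_consecutive / p_independent if p_independent > 0 else 0.0

# Hypothetical segmented preset text.
preset = ["outdoor", "barbecue", "grill", "outdoor", "barbecue", "set",
          "outdoor", "tent", "barbecue", "sauce"]
score = closeness(("outdoor", "barbecue"), preset)
```

A candidate phrase would then be kept when its score is not lower than the closeness threshold. Participles that never appear consecutively receive a score of 0.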

In some embodiments, the phrase generation module 440 is further configured to, when the generated phrases include multiple phrases that contain the same participles in different orders, determine the probability of occurrence of the participle sequence of each of the multiple phrases; determine the fluency of each phrase according to the probability of occurrence of its participle sequence; and, according to the fluency of each phrase, select one or more of the phrases and update the generated phrases accordingly.

In some embodiments, the phrase generation module 440 is configured to input the word segmentation sequence of each phrase into a pre-trained natural language processing model to obtain the probability of occurrence of the word segmentation sequence of each phrase.

In some embodiments, the phrase generation module 440 is configured to, for each phrase, take the reciprocal of the probability of occurrence of the participle sequence of the phrase and take its root, with the number of participles as the degree of the root, to obtain the fluency of the phrase.
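The computation described above resembles the standard perplexity of a language model. The sketch below assumes that reading: the fluency value is the N-th root of the reciprocal of the sequence probability, where N is the number of participles, and a lower value is treated as more fluent. The function names and the probabilities are hypothetical; in practice the probabilities would come from the pre-trained natural language processing model.

```python
def fluency(sequence_probability, num_participles):
    """Perplexity-style score: (1 / P) ** (1 / N). Lower is assumed to be more fluent."""
    if sequence_probability <= 0.0 or num_participles <= 0:
        raise ValueError("probability and participle count must be positive")
    return (1.0 / sequence_probability) ** (1.0 / num_participles)

def most_fluent(orderings):
    """Pick the ordering with the lowest perplexity-style score.

    `orderings` maps a participle tuple to its (hypothetical) sequence probability.
    """
    return min(orderings, key=lambda seq: fluency(orderings[seq], len(seq)))

# Hypothetical probabilities for two orderings of the same participles.
orderings = {
    ("whitening", "moisturizing"): 0.25,
    ("moisturizing", "whitening"): 0.04,
}
best = most_fluent(orderings)
```

Taking the N-th root normalizes for phrase length, so phrases with different numbers of participles can be compared on the same scale.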

The phrase generating apparatuses in the embodiments of the present disclosure may each be implemented by various computing devices or computer systems, which will be described below in conjunction with FIG. 5 and FIG. 6 .

FIG. 5 is a block diagram of other embodiments of the disclosed phrase generating apparatus. As shown in FIG. 5, the apparatus 50 of this embodiment includes a memory 510 and a processor 520 coupled to the memory 510, the processor 520 being configured to execute the phrase generation method in any of the embodiments of the present disclosure based on instructions stored in the memory 510.

The memory 510 may include, for example, a system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, an application program, a boot loader, a database, and other programs.

FIG. 6 is a structural diagram of other embodiments of the phrase generating apparatus of the present disclosure. As shown in FIG. 6, the apparatus 60 of this embodiment includes a memory 610 and a processor 620, which are similar to the memory 510 and the processor 520, respectively. It may also include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650, the memory 610, and the processor 620 can be connected, for example, through a bus 660. The input/output interface 630 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networked devices; for example, it can be connected to a database server or a cloud storage server. The storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.

As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each process and/or block in the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus configured to implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps configured to implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The above descriptions are only preferred embodiments of the present disclosure, and are not intended to limit the present disclosure. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims

A phrase generation method including:

For each obtained initial phrase, determine the part of speech and order of each participle in the initial phrase, and obtain the part of speech combination of the initial phrase, wherein the part of speech combination is the part of speech of each participle arranged in the order of each participle;

According to the number of occurrences of each part-of-speech combination, select one or more part-of-speech combinations;

Screen out the participles that match the parts of speech in the selected part-of-speech combination from the participles of the candidate text, and generate a phrase according to the selected part-of-speech combination as a candidate phrase;

According to the closeness of each participle in each candidate phrase, the candidate phrase is selected as the generated phrase.
The phrase generation method according to claim 1, wherein, according to the closeness of each participle in each candidate phrase, selecting the candidate phrase as the generated phrase comprises:

For each candidate phrase, determine the degree of closeness of each participle in the candidate phrase according to the number of times each participle in the candidate phrase appears in the preset text respectively and the number of times each participle continuously appears in the preset text;

Candidate phrases whose closeness is not lower than the closeness threshold are selected as the generated phrases.
The phrase generation method according to claim 2, wherein, for each candidate phrase, the closeness of the participles in the candidate phrase is the ratio of the probability that the participles appear consecutively in the preset text to the product of the probabilities that each participle appears individually in the preset text.
The phrase generation method according to claim 1, wherein, according to the number of occurrences of each part-of-speech combination, selecting one or more part-of-speech combinations comprises:

For each part-of-speech combination, determine the weight of the part-of-speech combination according to the number of occurrences of the part-of-speech combination, the maximum number and the minimum number of occurrences of each part-of-speech combination;

Select one or more part-of-speech combinations whose weight is not lower than the weight threshold.
The phrase generation method according to claim 1, further comprising:

In the case where the generated phrase includes multiple phrases with the same participle and different order of the participles, determining the probability of occurrence of the participle sequence of each phrase in the multiple phrases;

Determine the fluency of each phrase according to the probability of occurrence of the word segmentation sequence of each phrase;

According to the fluency of each phrase, one or more phrases are selected and updated to the generated phrases.
The phrase generation method according to claim 5, wherein said determining the probability of occurrence of the word segmentation sequence of each phrase in the plurality of phrases comprises:

The word segmentation sequence of each phrase is input into the pre-trained natural language processing model, and the probability of occurrence of the segmentation sequence of each phrase is obtained.
The phrase generation method according to claim 5, wherein the determining the fluency of each phrase according to the probability of occurrence of the word segmentation sequence of each phrase comprises:

For each phrase, take the reciprocal of the probability of occurrence of the participle sequence of the phrase and take its root, with the number of participles as the degree of the root, to obtain the fluency of the phrase.
The phrase generation method according to claim 1, further comprising:

According to the similarity between each phrase in the training corpus and the seed phrase, multiple phrases in the training corpus are selected as initial phrases.
The phrase generation method according to claim 8, wherein, according to the similarity between each phrase in the training corpus and the seed phrase, selecting a plurality of phrases in the training corpus as initial phrases comprises:

Determine the vectors of each phrase and seed phrase in the training corpus respectively;

Determine the similarity between each phrase in the training corpus and the seed phrase according to the similarity between the vector of each phrase in the training corpus and the vector of the seed phrase;

Multiple phrases whose similarity to the seed phrase is not lower than the similarity threshold are selected as initial phrases.
The phrase generation method according to claim 8, further comprising:

According to the similarity between each participle in the training corpus and the first seed participle, multiple participles in the training corpus are selected as initial participles;

Combine each initial participle with the second seed participle to obtain multiple seed phrases.
A phrase generating device, comprising:

The part-of-speech combination determination module is used to determine, for each acquired initial phrase, the part of speech and the order of each participle in the initial phrase, and obtain the part-of-speech combination of the initial phrase, wherein the part-of-speech combination is the parts of speech of the participles arranged in the order of the participles;

The part-of-speech combination selection module is used to select one or more part-of-speech combinations according to the number of occurrences of each part-of-speech combination;

The candidate phrase generation module is used to screen out the participles that match the parts of speech in the selected part-of-speech combination from the participles of the candidate text, and generate a phrase according to the selected part-of-speech combination as a candidate phrase;

The phrase generation module is used for selecting candidate phrases as the generated phrases according to the closeness of each participle in each candidate phrase.
A phrase generating device, comprising:

a processor; and

a memory coupled to the processor for storing instructions which, when executed by the processor, cause the processor to perform the phrase generation method of any one of claims 1-10.
A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method of any one of claims 1-10.