WO2024011813A1 - Text expansion method, device, equipment and medium

Text expansion method, device, equipment and medium

Info

Publication number
WO2024011813A1
WO2024011813A1 PCT/CN2022/134086 CN2022134086W WO2024011813A1 WO 2024011813 A1 WO2024011813 A1 WO 2024011813A1 CN 2022134086 W CN2022134086 W CN 2022134086W WO 2024011813 A1 WO2024011813 A1 WO 2024011813A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
expanded
target
preset
entity
Application number
PCT/CN2022/134086
Other languages
English (en)
French (fr)
Inventor
郭振华
徐聪
赵雅倩
范宝余
贾麒
刘璐
金良
Original Assignee
山东海量信息技术研究院
Application filed by 山东海量信息技术研究院
Publication of WO2024011813A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • The present application relates to the field of short text expansion, and in particular to a text expansion method, device, equipment and medium.
  • Short text semantic expansion expands short texts with limited semantic information into long texts with richer semantic information. It can be applied to text rewriting, automatic text generation, data enhancement, text classification and other scenarios.
  • Existing short text expansion methods mainly aim to expand the feature words in short texts. For example, the short Weibo text "Friends for Life" can be expanded into the long text "My puppy and I play quietly every day, we are good friends forever" after adding the tag "Dog", or into "My best friend and I talk about everything and we are lifelong friends" after adding the tag "Bestie".
  • Two tasks similar to short text semantic expansion are text expansion, in the sense of corpus augmentation, and short text feature expansion.
  • The former mainly expands a small amount of text into a large amount of similar text with diverse sentence patterns and faithful semantics, while the latter expands the limited semantic features of short text into more dimensions.
  • All three tasks are text enhancement tasks.
  • The main methods include manual annotation, word replacement, syntax trees, back-translation and neural networks.
  • Manual annotation is the main method of early corpus expansion. The quality of the expanded corpus is very high, but the work cycle is long and the cost is high.
  • The word replacement method expands the text corpus by replacing non-core words in the text with their synonyms, and by inserting and deleting function words and particles. This method is fast and convenient, but the expanded text shows little variety in sentence structure.
  • The back-translation method is a text enhancement method that has been used frequently in recent years. It constructs enhanced data for the source language by translating the source text into another language and then translating the result back into the source language. Although back-translation can generate corpora with different sentence patterns, when the text contains colloquial words, misspelled words or domain-specific vocabulary, it can easily change the semantics of the generated sentences.
  • The syntax tree method mainly analyzes the syntactic dependencies and semantic roles of the text, and changes the sentence structure through compiled transformation rules.
  • the purpose of this application is to provide a text expansion method, device, equipment and medium that can expand short text into long text with rich semantics and consistent emotions.
  • the specific plan is as follows:
  • This application discloses a text expansion method, including:
  • the target noun is determined from the text to be expanded, including:
  • a preset part-of-speech tagging tool is used to perform part-of-speech tagging on the text to be expanded to obtain the text to be expanded with part-of-speech tags, including:
  • noun part-of-speech tags include NN tags and NNP tags, and a phrase whose part-of-speech tag is a noun part-of-speech is determined from the text to be expanded with a part-of-speech tag as the target noun, including:
  • determining the text to be expanded and determining the target noun from the text to be expanded includes:
  • entity expansion and semantic expansion are performed on the target noun to determine the target expansion entity and target expansion semantics, including:
  • determining text to be expanded and determining target nouns from the text to be expanded to generate a noun list includes:
  • the method further includes:
  • the knowledge graph is used to determine the hypernym-hyponym relationship list of the target noun, including:
  • generating an entity extension list based on the hypernym-hyponym relationship list and the target extended entity, and generating a semantic extension list based on the hypernym-hyponym relationship list and the target extended semantics, includes:
  • the method before determining the text to be expanded and determining the target noun from the text to be expanded, the method further includes:
  • determine the text to be expanded including:
  • the preset classification rules include a preset text character length, text is collected from a preset social platform, and the preset classification rules are used to classify the text into short text and long text, including:
  • the method before combining the target extended entities and the target extended semantics in pairs and calculating the correlation score between the corresponding target extended entities and the target extended semantics in each combination, the method further includes:
  • the verb phrases and noun phrases in the same long text are determined as relevant phrases, and the relevant phrases are used as training data and input into the preset language representation model for training to obtain the trained model.
  • the target extended entity and the target extended semantics are combined in pairs, and the correlation score between the corresponding target extended entity and the target extended semantics in each combination is calculated, including:
  • the combination of the text to be expanded and the relevance score that satisfies the first preset condition is input into the preset text generation model, including:
  • for each target extended entity, the combinations whose correlation scores rank within the top preset number of groups are input, together with the text to be expanded, into the preset text generation model.
  • the combination of the text to be expanded and the relevance score that satisfies the first preset condition is input into the preset text generation model, including:
  • the preset similarity condition includes a preset semantic similarity
  • the expanded text whose semantic similarity satisfies the preset similarity condition is determined from the expanded text and output as the target expanded text, including:
  • the expanded text whose semantic similarity is greater than the preset semantic similarity is determined from the expanded text and is output as the target expanded text.
  • the expanded text whose semantic similarity is greater than the preset semantic similarity is determined from the expanded text and output as the target expanded text, including:
  • This application also discloses a text expansion device, including:
  • the target noun determination module is used to determine the text to be expanded and determine the target noun from the text to be expanded;
  • the entity semantic expansion module is used to perform entity expansion and semantic expansion of the target noun to determine the target expanded entity and target expanded semantics;
  • the entity semantic combination module is used to combine target extended entities and target extended semantics in pairs, and calculate the correlation score between the corresponding target extended entities and target extended semantics in each combination;
  • the text expansion module is used to input the text to be expanded, together with the combinations whose correlation scores satisfy the first preset condition, into the preset text generation model to obtain the expanded text output by the preset text generation model;
  • the target expanded text output module is used to evaluate the semantic similarity between the expanded text and the text to be expanded using the preset text semantic similarity evaluation model, and to determine, from the expanded text, the expanded text whose semantic similarity satisfies the preset similarity condition and output it as the target expanded text.
  • This application also discloses an electronic device, including:
  • a memory, used to store a computer program;
  • a processor, used to execute the computer program to implement the aforementioned text expansion method.
  • This application also discloses a non-volatile readable storage medium for storing a computer program; wherein when the computer program is executed by a processor, the steps of the previously disclosed text expansion method are implemented.
  • the text to be expanded is determined, and the target noun is determined from the text to be expanded.
  • the target noun is entity expanded and semantic expanded to determine the target expanded entity and the target expanded semantics, and the target expanded entity and the target expanded semantics are combined.
  • Figure 1 is a flow chart of a text expansion method provided by this application.
  • Figure 2 is a flow chart of a specific text expansion method provided by this application.
  • Figure 3 is a schematic flow chart of a method provided by this application.
  • Figure 4 is a flow chart of a specific text expansion method provided by this application.
  • Figure 5 is a system framework diagram provided by this application.
  • Figure 6 is a schematic structural diagram of a text expansion device provided by this application.
  • Figure 7 is a structural diagram of an electronic device provided by this application.
  • short text can be expanded into long text with rich semantics and consistent emotions, which not only ensures rich semantics but also ensures semantic consistency, thereby improving the accuracy of text expansion.
  • the embodiment of the present application discloses a text expansion method. See Figure 1.
  • the method includes:
  • Step S11 Determine the text to be expanded and determine the target noun from the text to be expanded.
  • The text to be expanded is determined first, and the target noun is then determined from it. Determining the target noun from the text to be expanded includes: using a preset part-of-speech tagging tool to perform part-of-speech tagging on the text to be expanded to obtain the text to be expanded with part-of-speech tags; and determining, from the text to be expanded with part-of-speech tags, the phrases whose part-of-speech tag is a noun part of speech as the target nouns.
  • The part-of-speech tagging tool stanza can be used to perform part-of-speech analysis on the text to be expanded T, and all words tagged NN (noun, singular or mass) or NNP (proper noun, singular) are extracted as target nouns.
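  • The extraction rule above can be sketched as follows. This is a minimal sketch, assuming that tagging has already been done (for example by stanza) and that each token arrives as a (word, tag) pair using Penn Treebank tags. The function name `extract_target_nouns` and the sample tagging are illustrative and do not come from the application.

```python
from typing import Iterable, List, Tuple

# Noun tags named in the text: NN (noun, singular or mass), NNP (proper noun, singular).
NOUN_TAGS = {"NN", "NNP"}


def extract_target_nouns(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the words tagged NN or NNP, in order, without duplicates."""
    seen = set()
    nouns = []
    for word, tag in tagged:
        if tag in NOUN_TAGS and word not in seen:
            seen.add(word)
            nouns.append(word)
    return nouns


# Hand-written tagging for illustration only (not real tagger output).
sample = [("My", "PRP$"), ("friend", "NN"), ("visits", "VBZ"), ("Beijing", "NNP")]
noun_list = extract_target_nouns(sample)
print(noun_list)
```

  • Keeping the tagger outside the function means the same filter works with any tool that emits Penn Treebank style tags.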
  • Determining the text to be expanded and determining the target nouns from the text to be expanded includes: determining the text to be expanded and determining the target nouns from the text to be expanded to generate a noun list. Correspondingly, performing entity expansion and semantic expansion on the target nouns to determine the target extended entity and target extended semantics includes: performing entity expansion and semantic expansion on the target nouns in the noun list to determine the target extended entity and target extended semantics. It can be understood that the target nouns determined from the text to be expanded can be stored in the noun list; when a target noun is to be expanded, it is taken from the noun list and then expanded.
  • Step S12 Perform entity expansion and semantic expansion on the target noun to determine the target expanded entity and target expanded semantics.
  • the target noun will be expanded in both entity expansion and semantic expansion.
  • The possible generated target extended entities are: pet, dog, girlfriend, partner, confidant; the possible generated target extended semantics are: chat, rely on, play games.
  • Step S13 Combine the target extended entity and the target extended semantics in pairs, and calculate the correlation score between the corresponding target extended entity and the target extended semantics in each combination.
  • the target extended entity and the target extended semantics are combined in pairs, and the correlation score between the target extended entity and the target extended semantics in each combination is calculated.
  • the trained artificial intelligence model may be used to calculate the correlation score for each group.
  • Selecting the combinations used for the spliced sequence of the preset text generation model may also include: sorting the correlation scores in descending order, and selecting the top N groups as the combinations for the spliced sequence that is next input into the preset text generation model.
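  • The pairwise combination and descending sort described in step S13 can be sketched as follows. This is only an illustration under stated assumptions: `toy_scorer` and its made-up scores stand in for the trained model, and the names `score_combinations` and `top_n` are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple

Combo = Tuple[str, str, float]  # (entity, semantics, correlation score)


def score_combinations(
    entities: List[str],
    semantics: List[str],
    scorer: Callable[[str, str], float],
) -> List[Combo]:
    """Pair every entity with every semantic phrase and score each pair."""
    return [(e, s, scorer(e, s)) for e, s in product(entities, semantics)]


def top_n(combos: List[Combo], n: int) -> List[Combo]:
    """Sort by score in descending order and keep the first n combinations."""
    return sorted(combos, key=lambda c: c[2], reverse=True)[:n]


# Made-up scores standing in for the output of a trained correlation model.
TOY_SCORES = {("dog", "play games"): 0.9, ("confidant", "chat"): 0.8, ("dog", "chat"): 0.2}


def toy_scorer(entity: str, semantic: str) -> float:
    return TOY_SCORES.get((entity, semantic), 0.0)


combos = score_combinations(["dog", "confidant"], ["chat", "play games"], toy_scorer)
best = top_n(combos, 2)
print(best)
```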
  • Step S14 Input the text to be expanded, together with the combinations whose correlation scores satisfy the first preset condition, into the preset text generation model to obtain the expanded text output by the preset text generation model.
  • Inputting the text to be expanded and the combinations whose correlation scores satisfy the first preset condition into the preset text generation model includes: using a preset splicing method to splice the text to be expanded with the combinations whose correlation scores satisfy the first preset condition to generate a spliced sequence; and inputting the spliced sequence into the preset text generation model.
  • The text to be expanded is spliced with the combinations obtained in the previous step to generate a spliced sequence, and the spliced sequence is then input into a pre-trained text generation model; the long text with richer semantics generated by the model is the expanded text.
  • the preset text generation model includes but is not limited to the GPT3 model (Generative Pre-trained Transformer 3, autoregressive language model).
  • Step S15 Use the preset text semantic similarity evaluation model to evaluate the semantic similarity between the expanded text and the text to be expanded, and determine, from the expanded text, the expanded text whose semantic similarity satisfies the preset similarity condition and output it as the target expanded text.
  • the semantic similarity between the expanded text and the text to be expanded can be calculated through the preset text semantic similarity evaluation model, and then the top N expanded texts can be selected.
  • N is a positive integer and can be set or changed at will according to user needs.
  • the preset text semantic similarity evaluation model includes but is not limited to a DSSM (Deep Structured Semantic Model) model.
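  • The top-N selection in step S15 can be illustrated with the sketch below. It is not DSSM: a bag-of-words cosine similarity is swapped in as a crude stand-in for the learned similarity model so that the example stays self-contained, and the function names and sample sentences are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a crude stand-in for a learned model."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_expanded(source: str, candidates: List[str], n: int) -> List[Tuple[str, float]]:
    """Score each candidate against the source text and return the top n."""
    scored = [(c, cosine_similarity(source, c)) for c in candidates]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:n]


# Illustrative inputs only.
ranked = select_expanded(
    "friends for life",
    ["the weather is cold today", "my dog and i are friends for life"],
    n=2,
)
print(ranked[0][0])
```

  • A threshold on the score could replace the top-N cut where the preset similarity condition is a minimum similarity rather than a count.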
  • the text to be expanded is determined, and the target noun is determined from the text to be expanded.
  • the target noun is entity expanded and semantic expanded to determine the target extended entity and the target extended semantics, and the target extended entity and the target extended semantics are combined.
  • the expanded text that meets the preset similarity condition is output as the target expanded text.
  • This application uses entity expansion and semantic expansion to accurately expand the short text to be expanded. It also uses a preset text generation model and a preset text semantic similarity evaluation model to perform text generation and text similarity evaluation, which improves the efficiency of text expansion and solves the problems of insufficient semantic richness and semantic change when short text is expanded. This application ensures both semantic richness and semantic consistency, and improves the accuracy of text expansion.
  • Figure 2 is a flow chart of a specific text expansion method provided by an embodiment of the present application. As shown in Figure 2, the method includes:
  • Step S21 Determine the text to be expanded, and use stanza to perform part-of-speech tagging on the text to be expanded to obtain the text to be expanded with part-of-speech tags, and then determine the phrase whose part-of-speech tag is a noun part-of-speech from the text to be expanded with part-of-speech tags as the target noun.
  • the part-of-speech tagging tool stanza can be used to perform part-of-speech tagging on the text to be expanded.
  • Step S22 Perform entity expansion and semantic expansion on the target noun to determine the target expanded entity and target expanded semantics.
  • For more specific processing of step S22, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be described again here.
  • Step S23 Use the knowledge graph to determine the hypernym-hyponym relationship list of the target noun, generate an entity extension list based on the hypernym-hyponym relationship list and the target extended entity, and generate a semantic extension list based on the hypernym-hyponym relationship list and the target extended semantics.
  • Using the knowledge graph to determine the hypernym-hyponym relationship list of the target noun includes: using the retrieval interface of ConceptNet to retrieve the hypernym-hyponym relationships of the target noun to determine the hypernym-hyponym relationship list of the target noun.
  • Generating an entity extension list based on the hypernym-hyponym relationship list and the target extended entity, and generating a semantic extension list based on the hypernym-hyponym relationship list and the target extended semantics, includes: extracting the tail entities whose relationship is the preset first relationship in the hypernym-hyponym relationship list to form the entity extension list; and extracting the tail entities whose relationship is the preset second relationship in the hypernym-hyponym relationship list to form the semantic extension list.
  • Each noun in the noun list can be queried through ConceptNet's retrieval API (Application Programming Interface) to obtain a list of hypernym-hyponym relationships for all nouns, from which the relationships "is a subevents of (about..)", "Types of (type of..)", "Parts of (part of..)", "Symbols of (symbol..)", "is a type of (..)" are then extracted.
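  • The tail-entity extraction in step S23 can be sketched as follows. The edge dictionaries are assumed to follow the shape of ConceptNet API results (a `rel` and an `end` object, each with a `label`), but the edges here are hand-made rather than fetched. The text does not say which relations are the preset first and second relationships, so the two relation sets below are assumptions chosen for illustration.

```python
from typing import Dict, Iterable, List, Set

# Assumed relation sets; the text only calls them the preset first and second relationships.
ENTITY_RELATIONS: Set[str] = {"IsA", "PartOf"}
SEMANTIC_RELATIONS: Set[str] = {"HasSubevent", "UsedFor"}


def tail_entities(edges: Iterable[Dict], relations: Set[str]) -> List[str]:
    """Collect the tail (end) labels of edges whose relation is in `relations`."""
    tails: List[str] = []
    for edge in edges:
        if edge["rel"]["label"] in relations:
            label = edge["end"]["label"]
            if label not in tails:
                tails.append(label)
    return tails


# Hand-made edges for the noun "friend"; not real query results.
edges = [
    {"rel": {"label": "IsA"}, "end": {"label": "person"}},
    {"rel": {"label": "HasSubevent"}, "end": {"label": "chat"}},
    {"rel": {"label": "RelatedTo"}, "end": {"label": "enemy"}},
    {"rel": {"label": "IsA"}, "end": {"label": "confidant"}},
]
entity_list = tail_entities(edges, ENTITY_RELATIONS)
semantic_list = tail_entities(edges, SEMANTIC_RELATIONS)
print(entity_list, semantic_list)
```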
  • Step S24 Combine the target extended entity and the target extended semantics in pairs, and calculate the correlation score between the corresponding target extended entity and the target extended semantics in each combination.
  • For more specific processing of step S24, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be described again here.
  • Step S25 Input the text to be expanded, together with the combinations whose correlation scores satisfy the first preset condition, into the preset text generation model to obtain the expanded text output by the preset text generation model.
  • For more specific processing of step S25, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be described again here.
  • Step S26 Use the preset text semantic similarity evaluation model to evaluate the semantic similarity between the expanded text and the text to be expanded, and determine, from the expanded text, the expanded text whose semantic similarity satisfies the preset similarity condition and output it as the target expanded text.
  • For more specific processing of step S26, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be described again here.
  • Figure 3 is a schematic flow chart of a method proposed in this embodiment.
  • ConceptNet is used to expand the entities and semantics of the target noun, and the correlation between the extended entities and the extended semantics is then calculated. The combinations whose correlation scores satisfy the first preset condition are input into the preset text generation model to generate long text. The text similarity model is then used to score the output long texts, the long texts are sorted by score, and the long text for output and display is finally determined.
  • The text to be expanded is first determined, and stanza is used to perform part-of-speech tagging on it to obtain the text to be expanded with part-of-speech tags; the phrases whose part-of-speech tag is a noun part of speech are then determined from the tagged text as the target nouns. Then, entity expansion and semantic expansion are performed on the target noun to determine the target extended entity and target extended semantics.
  • The next step is to combine the target extended entity and the target extended semantics in pairs, and calculate the correlation score between the corresponding target extended entity and the target extended semantics in each combination.
  • The text to be expanded, together with the combinations whose correlation scores satisfy the first preset condition, is input into the preset text generation model to obtain the expanded text output by the preset text generation model.
  • The preset text semantic similarity evaluation model is used to evaluate the semantic similarity between the expanded text and the text to be expanded, and the expanded text whose semantic similarity meets the preset similarity condition is determined from the expanded text and output as the target expanded text.
  • The hypernym-hyponym relationship list of the target noun is determined through the knowledge graph, an entity extension list is generated based on the hypernym-hyponym relationship list and the target extended entity, and a semantic extension list is generated based on the hypernym-hyponym relationship list and the target extended semantics.
  • The method proposed in this application uses a concept graph to extract relevant entities and semantics as candidate extensions, and uses a semantic relevance evaluation model to score the finally generated long text, which ensures the semantic and emotional consistency of the expanded long text and improves the accuracy of text expansion.
  • Figure 4 is a flow chart of a specific text expansion method provided by an embodiment of the present application. As shown in Figure 4, the method includes:
  • Step S31 Collect text from the preset social platform, and use preset classification rules to classify the text into short text and long text, then determine the short text as the text to be expanded, and determine the target noun from the text to be expanded.
  • Text can be collected from a preset social platform; text with a length of 10 Chinese characters or fewer is defined as short text, and text longer than 10 characters is defined as long text. The short text is then used as the text to be expanded, and the target noun is determined from it.
  • the preset social platforms include but are not limited to Weibo.
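  • The length-based classification rule in step S31 can be sketched as follows. The threshold of 10 comes from the example above; counting every character with `len`, including punctuation, is an assumption, and the function name and sample strings are illustrative.

```python
from typing import Iterable, List, Tuple

SHORT_TEXT_MAX_CHARS = 10  # threshold given in the example above


def classify_texts(
    texts: Iterable[str], max_chars: int = SHORT_TEXT_MAX_CHARS
) -> Tuple[List[str], List[str]]:
    """Split texts into (short, long) by character count."""
    short: List[str] = []
    long_: List[str] = []
    for text in texts:
        # Assumption: every character counts toward the length, punctuation included.
        (short if len(text) <= max_chars else long_).append(text)
    return short, long_


# Made-up sample posts for illustration.
short_texts, long_texts = classify_texts(["一生的朋友", "我和我的小狗每天都一起玩耍，我们是永远的好朋友"])
print(short_texts, long_texts)
```

  • The short texts would go on to be expanded, while the long texts serve as training material for the correlation model.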
  • Step S32 Perform entity expansion and semantic expansion on the target noun to determine the target expanded entity and target expanded semantics.
  • For more specific processing of step S32, reference may be made to the corresponding content disclosed in the foregoing embodiments, which will not be described again here.
  • Step S33 Use a preset part-of-speech tagging tool to perform part-of-speech tagging on the long text to obtain long text with part-of-speech tags, and then determine from the long text with part-of-speech tags the phrases whose part-of-speech tags are verb part-of-speech and noun part-of-speech.
  • A preset part-of-speech tagging tool can be used to perform part-of-speech tagging on the long text to obtain the long text with part-of-speech tags, and the phrases whose part-of-speech tags are verb or noun parts of speech are then determined from the tagged long text.
  • Step S34 Determine the verb phrases and noun phrases in the same long text as relevant phrases, and use the relevant phrases as training data and input them into the preset language representation model for training to obtain a trained model.
  • The correlation between a verb and a noun appearing in the same article can be set to 1, and otherwise to 0. The relevant phrases are then used as training data and input into the BERT (Bidirectional Encoder Representation from Transformers, a pre-trained language representation model) model for training to obtain the correlation calculation model M_r.
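  • The labelling rule in step S34 (1 for a verb and a noun that occur in the same long text, 0 otherwise) can be sketched as follows. The sketch only builds the labelled pairs and does not fine-tune BERT. Treating any tag that starts with "VB" as a verb is an assumption, since the text names only the noun tags, and the function name and sample data are illustrative.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Tagged = List[Tuple[str, str]]  # one long text as (word, tag) pairs


def build_training_pairs(texts: List[Tagged]) -> Dict[Tuple[str, str], int]:
    """Label each (verb, noun) pair 1 if both occur in one text, else 0."""
    positives: Set[Tuple[str, str]] = set()
    all_verbs: Set[str] = set()
    all_nouns: Set[str] = set()
    for tagged in texts:
        verbs = {w for w, t in tagged if t.startswith("VB")}  # assumed verb tags
        nouns = {w for w, t in tagged if t in {"NN", "NNP"}}
        positives.update(product(verbs, nouns))
        all_verbs |= verbs
        all_nouns |= nouns
    return {pair: int(pair in positives) for pair in product(all_verbs, all_nouns)}


# Two hand-tagged toy "long texts" for illustration.
pairs = build_training_pairs([
    [("dog", "NN"), ("play", "VB")],
    [("friend", "NN"), ("chat", "VB")],
])
print(sorted(pairs.items()))
```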
  • Step S35 Input the pairwise combinations of the target extended entity and the target extended semantics into the trained model, and obtain the correlation score between the corresponding target extended entity and the target extended semantics in each combination output by the trained model.
  • The target extended entity and the target extended semantics can be input into the trained correlation calculation model M_r to obtain the correlation score, output by M_r, between the target extended entity and the target extended semantics in each combination.
  • Step S36 Input the text to be expanded, together with the combinations whose correlation scores satisfy the first preset condition, into the preset text generation model to obtain the expanded text output by the preset text generation model.
  • Inputting the text to be expanded and the combinations whose correlation scores satisfy the first preset condition into the preset text generation model includes: determining the correlation scores of all combinations of target extended entities and target extended semantics, and sorting all combinations by correlation score; for each target extended entity, determining the combinations whose correlation scores rank within the top preset number of groups; and inputting those combinations, together with the text to be expanded, into the preset text generation model.
  • All correlation scores corresponding to the same target extended entity can be sorted, and the combinations with the top ten correlation scores are selected, for example e_1: [s_1, s_2, ..., s_10]. These are spliced with the text to be expanded to obtain the spliced sequence [T, e_i, s_j], and the spliced sequence is input into the preset text generation model GPT-3 to obtain the output long text T'.
  • the preset quantity in this step can be set and changed at will according to user needs.
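  • The per-entity selection and splicing in step S36 can be sketched as follows. The text does not specify the preset splicing method, so joining [T, e_i, s_j] with a " [SEP] " separator is an assumption, as are the function names and the made-up scores.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def top_k_per_entity(scores: Dict[Tuple[str, str], float], k: int) -> Dict[str, List[str]]:
    """For each entity, keep its k highest-scoring semantic phrases."""
    grouped: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (entity, semantic), score in scores.items():
        grouped[entity].append((semantic, score))
    return {
        entity: [s for s, _ in sorted(items, key=lambda p: p[1], reverse=True)[:k]]
        for entity, items in grouped.items()
    }


def splice(text: str, selected: Dict[str, List[str]], sep: str = " [SEP] ") -> List[str]:
    """Build one spliced sequence [T, e_i, s_j] per kept combination."""
    # Assumption: the separator is a placeholder for the unspecified splicing method.
    return [sep.join([text, e, s]) for e, sems in selected.items() for s in sems]


# Made-up correlation scores for illustration.
scores = {
    ("dog", "play games"): 0.9,
    ("dog", "chat"): 0.2,
    ("dog", "rely on"): 0.5,
    ("confidant", "chat"): 0.8,
}
selected = top_k_per_entity(scores, k=2)
sequences = splice("Friends for Life", selected)
print(sequences)
```

  • Each spliced sequence would then be passed to the text generation model as one prompt.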
  • Step S37 Use the preset text semantic similarity evaluation model to evaluate the semantic similarity between the expanded text and the text to be expanded, and determine, from the expanded text, the expanded text whose semantic similarity meets the preset similarity condition and output it as the target expanded text.
  • The text to be expanded T and the expanded text T' generated in step S36 can be input into the text similarity calculation model DSSM to obtain the similarity score between the text to be expanded and each expanded text, and a preset number of expanded texts with the highest similarity scores can then be selected as the final target expanded text for output. It is understandable that the preset number in this step can be set and changed at will according to the user's wishes.
  • Figure 5 is a system framework diagram proposed in this embodiment.
  • A preset language representation model is trained using long text, and the trained model is used to evaluate the correlation scores between the target extended entities and the target extended semantics of the text to be expanded.
  • The acquired long text is part-of-speech tagged to identify nouns and verbs; after the correlation between these verbs and nouns is determined, the preset model is trained to obtain the trained correlation model M_r.
  • After the short text is obtained, the entity set containing the target nouns is determined first. Entity expansion and semantic expansion of the target nouns are then performed based on the concept graph in knowledge graph technology, and the expanded entities and expanded semantics are input into the correlation model M_r to determine the correlation score of each entity-semantic combination. After the scores are sorted, the sorted entity-semantic combinations are input into the preset text generation model GPT3 to obtain the long text output by GPT3. Finally, the DSSM model is used to rank the long texts output by GPT3 by their similarity to the text to be expanded, and the top 5 long texts are determined.
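  • The flow of the system framework can be sketched as a single function that chains the stages. This is a sketch under stated assumptions: every stage is passed in as a callable, the lambdas in the usage example are stubs rather than real models, the " | " splicing format is hypothetical, and the cut on combinations is global here rather than per entity.

```python
from typing import Callable, List, Tuple


def expand_text(
    text: str,
    find_nouns: Callable[[str], List[str]],
    expand_noun: Callable[[str], Tuple[List[str], List[str]]],
    correlation: Callable[[str, str], float],
    generate: Callable[[str], str],
    similarity: Callable[[str, str], float],
    top_combos: int = 10,
    top_outputs: int = 5,
) -> List[str]:
    """Run the stages in order and return the best expanded texts."""
    entities: List[str] = []
    semantics: List[str] = []
    for noun in find_nouns(text):
        ents, sems = expand_noun(noun)
        entities += ents
        semantics += sems
    # Rank all entity-semantic pairs by correlation and keep the best ones.
    combos = sorted(
        ((e, s) for e in entities for s in semantics),
        key=lambda c: correlation(*c),
        reverse=True,
    )[:top_combos]
    # Splice each kept pair with the source text and generate one candidate each.
    outputs = [generate(f"{text} | {e} | {s}") for e, s in combos]
    # Rank candidates by similarity to the source text.
    return sorted(outputs, key=lambda o: similarity(text, o), reverse=True)[:top_outputs]


# Stub stages for illustration only; none of these is a real model.
result = expand_text(
    "friends",
    find_nouns=lambda t: ["friend"],
    expand_noun=lambda n: (["dog"], ["play", "chat"]),
    correlation=lambda e, s: 1.0 if s == "play" else 0.5,
    generate=lambda seq: seq.upper(),
    similarity=lambda a, b: 1.0 if "PLAY" in b else 0.0,
    top_outputs=1,
)
print(result)
```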
  • Text is first collected from a preset social platform and classified into short text and long text using preset classification rules; the short text is then determined as the text to be expanded, the target noun is determined from the text to be expanded, and entity expansion and semantic expansion are performed on the target noun to determine the target extended entity and target extended semantics.
  • A preset part-of-speech tagging tool is then used to perform part-of-speech tagging on the long text to obtain long text with part-of-speech tags, and the phrases whose part-of-speech tags are verb or noun parts of speech are determined from the tagged long text.
  • The verb phrases and noun phrases in the same long text are determined as relevant phrases, and the relevant phrases are used as training data and input into the preset language representation model for training to obtain the trained model.
  • Pairwise combinations of target extended entities and target extended semantics are input into the trained model, and the correlation score between the corresponding target extended entity and target extended semantics in each combination output by the trained model is obtained.
  • The text to be expanded, together with the combinations whose correlation scores satisfy the first preset condition, is input into the preset text generation model to obtain the expanded text output by the preset text generation model.
  • The long text obtained from the preset social platform can be used to train the preset language representation model, and the trained model can be used to evaluate the correlation scores between the target extended entities and the target extended semantics of the text to be expanded, which improves the applicability of this method.
  • semantic richness is ensured while semantic consistency is ensured, thereby improving the accuracy of text expansion.
  • a text expansion device which may specifically include:
  • the target noun determination module 11 is used to determine the text to be expanded and determine the target noun from the text to be expanded;
  • the entity semantic expansion module 12 is used to perform entity expansion and semantic expansion of the target noun to determine the target expanded entity and target expanded semantics;
  • the entity semantic combination module 13 is used to combine the target expanded entities and target expanded semantics in pairs, and calculate the correlation score between the target expanded entity and target expanded semantics in each combination;
  • the text expansion module 14 is used to input the text to be expanded and the combinations whose correlation score satisfies the first preset condition into the preset text generation model to obtain the expanded text output by the preset text generation model;
  • the target expanded text output module 15 is used to evaluate the semantic similarity between the expanded text and the text to be expanded using a preset text semantic similarity evaluation model, and to determine, from the expanded text, the expanded text whose semantic similarity satisfies the preset similarity condition and output it as the target expanded text.
  • the text to be expanded is determined, and the target noun is determined from the text to be expanded.
  • the target noun is entity expanded and semantic expanded to determine the target expanded entity and the target expanded semantics, and the target expanded entity and the target expanded semantics are combined.
  • the target noun determination module 11 may include:
  • the first part-of-speech tagging unit is used to use a preset part-of-speech tagging tool to perform part-of-speech tagging on the text to be expanded, so as to obtain the text to be expanded with part-of-speech tags;
  • the target noun extraction unit is used to determine the phrase whose part-of-speech tag is the noun part-of-speech as the target noun from the text to be expanded with the part-of-speech tag.
  • the first part-of-speech tagging unit includes:
  • the part-of-speech tagging tool application unit is used to use stanza to perform part-of-speech tagging on the text to be expanded, so as to obtain the text to be expanded with part-of-speech tags.
  • the target noun determination module 11 includes:
  • a noun list generation unit used to determine the text to be expanded and determine the target nouns from the text to be expanded to generate a noun list
  • the entity semantics extension module 12 includes:
  • the entity semantic expansion unit is used to perform entity expansion and semantic expansion of the target noun in the noun list to determine the target expansion entity and target expansion semantics.
  • the text expansion device further includes:
  • the list determination unit is used to determine the hypernym-hyponym relationship list of the target noun using the knowledge graph;
  • the first list generation unit is used to generate an entity expansion list based on the hypernym-hyponym relationship list and the target expanded entity;
  • the second list generation unit is used to generate a semantic expansion list based on the hypernym-hyponym relationship list and the target expanded semantics.
  • the list determination unit includes:
  • the noun retrieval unit is used to retrieve the hypernym-hyponym relationships of the target noun through the retrieval interface of ConceptNet, to determine the hypernym-hyponym relationship list of the target noun.
  • the first list generation unit and the second list generation unit include:
  • the first tail entity extraction unit is used to extract the tail entities whose relationship in the hypernym-hyponym relationship list is the preset first relationship, to form the entity expansion list;
  • the second tail entity extraction unit is used to extract the tail entities whose relationship in the hypernym-hyponym relationship list is the preset second relationship, to form the semantic expansion list.
  • the text expansion device further includes:
  • the text collection unit is used to collect text from preset social platforms and use preset classification rules to classify the text into short text and long text;
  • the target noun determination module 11 includes:
  • the short text determination unit is used to determine the short text as the text to be expanded.
  • the text expansion device further includes:
  • the long text tagging unit is used to use the preset part-of-speech tagging tool to tag long texts to obtain long texts with part-of-speech tags, and then determine the part-of-speech tags from the long text with part-of-speech tags as verb part-of-speech and noun part-of-speech phrases;
  • the model training unit is used to determine verb phrases and noun phrases in the same long text as correlated phrases, and to input the correlated phrases as training data into the preset language representation model for training, to obtain the trained model.
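  • the text expansion device is further used to input the target expanded entities and target expanded semantics into the trained model in pairs, and to obtain the correlation score between the target expanded entity and target expanded semantics in each combination output by the trained model.
  • the entity semantic combination module 13 also includes:
  • the first text input unit is used to input the target expanded entities and target expanded semantics into the trained model in pairs;
  • the score output unit is used to obtain the correlation score between the target expanded entity and the target expanded semantics in each combination output by the trained model.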
  • the text expansion module 14 includes:
  • the score sorting unit is used to determine the relevance scores of all combinations of the target extended entity and the target extended semantics, and sort all combinations according to the relevance scores;
  • a combination determination unit used to determine, for each target expanded entity, the combinations whose correlation scores rank within the top preset number;
  • the second text input unit is used to input the top-ranked combinations of each target expanded entity, together with the text to be expanded, into the preset text generation model.
  • the text expansion module 14 includes:
  • a sequence splicing unit used to splice the text to be expanded with the combinations whose correlation score meets the first preset condition, using a preset splicing method, to generate a spliced sequence;
  • the sequence input unit is used to input the spliced sequence into the preset text generation model.
  • Figure 7 is a structural diagram of an electronic device 20 according to an exemplary embodiment. The content in the figure cannot be considered as any limitation on the scope of use of the present application.
  • FIG. 7 is a schematic structural diagram of an electronic device 20 provided by an embodiment of the present application.
  • the electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a display screen 24, an input and output interface 25, a communication interface 26 and a communication bus 27.
  • the memory 22 is used to store computer programs, and the computer programs are loaded and executed by the processor 21 to implement relevant steps in the text expansion method disclosed in any of the foregoing embodiments.
  • the electronic device 20 in this embodiment may specifically be an electronic computer.
  • the power supply 23 is used to provide operating voltage for each hardware device on the electronic device 20; the communication interface 26 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows may be any communication protocol applicable to the technical solution of this application, which is not specifically limited here; the input and output interface 25 is used to obtain external input data or output data to the outside, and its specific interface type can be selected according to specific application needs, which is not specifically limited here.
  • the memory 22, as a carrier for resource storage, can be a read-only memory, a random access memory, a magnetic disk or an optical disk, and the resources stored thereon can include an operating system 221, a computer program 222, virtual machine data 223, etc.
  • the virtual machine data 223 can include a wide variety of data.
  • the storage method can be temporary storage or permanent storage.
  • the operating system 221 is used to manage and control each hardware device and computer program 222 on the electronic device 20, which can be Windows Server, Netware, Unix, Linux, etc.
  • the computer program 222 may further include computer programs that can be used to complete other specific tasks.
  • this application also discloses a non-volatile readable storage medium.
  • the non-volatile readable storage medium mentioned here includes random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, magnetic disks, optical disks, or any other form of storage medium known in the technical field.
  • software modules may be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

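The present application discloses a text expansion method, device, equipment and medium, relating to the field of short text expansion. The method includes: determining a text to be expanded and determining a target noun in the text to be expanded (S11); performing entity expansion and semantic expansion on the target noun to determine target expanded entities and target expanded semantics (S12); combining the target expanded entities and target expanded semantics in pairs and calculating the correlation score of each combination (S13); inputting the text to be expanded and the combinations whose correlation scores satisfy a first preset condition into a preset text generation model to obtain expanded text (S14); and evaluating the semantic similarity between the expanded text and the text to be expanded using a preset text semantic similarity evaluation model, and outputting, as the target expanded text, the expanded text whose semantic similarity satisfies a preset similarity condition (S15). The method can expand a short text into a long text that is semantically rich and emotionally consistent, improving the accuracy of text expansion.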

Description

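Text expansion method, device, equipment and medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202210829003.0, filed with the China Patent Office on July 15, 2022 and entitled "Text expansion method, device, equipment and medium", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of short text expansion, and in particular to a text expansion method, device, equipment and medium.
Background
The purpose of short text semantic expansion is to expand a short text with limited semantic information into a long text with richer semantic information. It can be applied to scenarios such as text rewriting, automatic text generation, data augmentation and text classification. Existing short text expansion methods mainly aim to expand the feature words in the short text. For example, the Weibo short text "friends for life" (一辈子的朋友) can, after the label "dog" is added, be expanded into the long text "My puppy and I play happily every day; we are good friends forever", or, after the label "best friend" (闺蜜) is added, into "My best friend and I talk about everything; we are friends for life".
Tasks similar to this short text expansion are text augmentation and short text augmentation: the former mainly expands a small amount of text into a large number of similar texts with varied sentence patterns and faithful semantics, while the latter extends the limited semantic features of a short text to more dimensions. All three tasks are text enhancement tasks, and the main methods include manual annotation, word replacement, syntax trees, back-translation and neural networks. Manual annotation was the main way of expanding corpora in the early days; the resulting corpus is of high quality, but the work cycle is long and the cost is high. Word replacement expands a text corpus by replacing non-core words with their synonyms, inserting or deleting function words and particles, and similar operations; it is fast and convenient, but the expanded texts have uniform sentence patterns. Back-translation is a text enhancement method widely used in recent years: it translates the source language into another language and then translates the resulting sentence back into the source language, thereby constructing augmented data in the source language. Although back-translation can generate corpora with different sentence patterns, it easily changes the semantics of the generated sentences when the text contains colloquial words, misspelled words or domain-specific vocabulary. The syntax tree method analyzes the syntactic dependencies and semantic roles of the text and changes the sentence pattern by applying pre-compiled transformation rules.
It can be seen that, in short text expansion, how to avoid expanded text whose semantic variation is not rich enough and whose semantic information is easily altered is a problem to be solved in this field.
Summary of the invention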
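In view of this, the purpose of the present application is to provide a text expansion method, device, equipment and medium that can expand a short text into a long text that is semantically rich and emotionally consistent. The specific solution is as follows:
The present application discloses a text expansion method, including:
determining a text to be expanded, and determining a target noun from the text to be expanded;
performing entity expansion and semantic expansion on the target noun to determine target expanded entities and target expanded semantics;
combining the target expanded entities and target expanded semantics in pairs, and calculating the correlation score between the target expanded entity and the target expanded semantics in each combination;
inputting the text to be expanded and the combinations whose correlation scores satisfy a first preset condition into a preset text generation model, to obtain the expanded text output by the preset text generation model;
evaluating the semantic similarity between the expanded text and the text to be expanded using a preset text semantic similarity evaluation model, and determining, from the expanded text, the expanded text whose semantic similarity satisfies a preset similarity condition and outputting it as the target expanded text.
In some embodiments, determining the target noun from the text to be expanded includes:
performing part-of-speech tagging on the text to be expanded using a preset part-of-speech tagging tool, to obtain the text to be expanded with part-of-speech tags;
determining, from the text to be expanded with part-of-speech tags, the phrases whose part-of-speech tag is a noun tag as the target nouns.
In some embodiments, performing part-of-speech tagging on the text to be expanded using a preset part-of-speech tagging tool, to obtain the text to be expanded with part-of-speech tags, includes:
performing part-of-speech tagging on the text to be expanded using stanza, to obtain the text to be expanded with part-of-speech tags.
In some embodiments, the noun part-of-speech tags include the NN tag and the NNP tag, and determining, from the text to be expanded with part-of-speech tags, the phrases whose part-of-speech tag is a noun tag as the target nouns includes:
extracting, from the text to be expanded with part-of-speech tags, the phrases with the NN tag or the NNP tag as the target nouns.
In some embodiments, determining the text to be expanded and determining the target noun from the text to be expanded includes:
determining the text to be expanded, and determining the target nouns from the text to be expanded to generate a noun list;
correspondingly, performing entity expansion and semantic expansion on the target noun to determine the target expanded entities and target expanded semantics includes:
performing entity expansion and semantic expansion on the target nouns in the noun list to determine the target expanded entities and target expanded semantics.
In some embodiments, determining the text to be expanded and determining the target nouns from the text to be expanded to generate a noun list includes:
determining the text to be expanded, and determining the target nouns from the text to be expanded;
generating an entity set from the target nouns, and a noun list corresponding to the entity set.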
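In some embodiments, after determining the target expanded entities and target expanded semantics, the method further includes:
determining a hypernym-hyponym relationship list of the target noun using a knowledge graph;
generating an entity expansion list based on the hypernym-hyponym relationship list and the target expanded entities;
generating a semantic expansion list based on the hypernym-hyponym relationship list and the target expanded semantics.
In some embodiments, determining the hypernym-hyponym relationship list of the target noun using a knowledge graph includes:
retrieving the hypernym-hyponym relationships of the target noun through the retrieval interface of ConceptNet, to determine the hypernym-hyponym relationship list of the target noun.
In some embodiments, generating the entity expansion list based on the hypernym-hyponym relationship list and the target expanded entities, and generating the semantic expansion list based on the hypernym-hyponym relationship list and the target expanded semantics, includes:
extracting the tail entities whose relationship in the hypernym-hyponym relationship list is a preset first relationship, to form the entity expansion list;
extracting the tail entities whose relationship in the hypernym-hyponym relationship list is a preset second relationship, to form the semantic expansion list.
In some embodiments, before determining the text to be expanded and determining the target noun from the text to be expanded, the method further includes:
collecting text from a preset social platform, and classifying the text into short text and long text using a preset classification rule;
correspondingly, determining the text to be expanded includes:
determining the short text as the text to be expanded.
In some embodiments, the preset classification rule includes a preset text character length, and collecting text from the preset social platform and classifying the text into short text and long text using the preset classification rule includes:
collecting at least one text from the preset social platform, the text having a text character length;
classifying text whose character length is less than or equal to the preset text character length as short text;
classifying text whose character length is greater than the preset text character length as long text.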
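In some embodiments, before combining the target expanded entities and target expanded semantics in pairs and calculating the correlation score between the target expanded entity and the target expanded semantics in each combination, the method further includes:
performing part-of-speech tagging on the long text using a preset part-of-speech tagging tool, to obtain the long text with part-of-speech tags, and then determining, from the long text with part-of-speech tags, the phrases whose part-of-speech tags are verb tags and noun tags;
determining the verb phrases and noun phrases in the same long text as correlated phrases, and inputting the correlated phrases as training data into a preset language representation model for training, to obtain a trained model.
In some embodiments, combining the target expanded entities and target expanded semantics in pairs and calculating the correlation score between the target expanded entity and the target expanded semantics in each combination includes:
inputting the target expanded entities and target expanded semantics into the trained model in pairs;
obtaining the correlation score, output by the trained model, between the target expanded entity and the target expanded semantics in each combination.
In some embodiments, inputting the text to be expanded and the combinations whose correlation scores satisfy the first preset condition into the preset text generation model includes:
determining the correlation scores of all combinations of the target expanded entities and target expanded semantics, and sorting all combinations by correlation score;
determining, for each target expanded entity, the combinations whose correlation scores rank within the top preset number;
inputting the top-ranked combinations of each target expanded entity, together with the text to be expanded, into the preset text generation model.
In some embodiments, inputting the text to be expanded and the combinations whose correlation scores satisfy the first preset condition into the preset text generation model includes:
splicing the text to be expanded with the combinations whose correlation scores satisfy the first preset condition using a preset splicing method, to generate a spliced sequence;
inputting the spliced sequence into the preset text generation model.
In some embodiments, the preset similarity condition includes a preset semantic similarity, and determining, from the expanded text, the expanded text whose semantic similarity satisfies the preset similarity condition and outputting it as the target expanded text includes:
determining, from the expanded text, the expanded text whose semantic similarity is greater than the preset semantic similarity and outputting it as the target expanded text.
In some embodiments, determining, from the expanded text, the expanded text whose semantic similarity is greater than the preset semantic similarity and outputting it as the target expanded text includes:
determining the expanded texts whose semantic similarity is greater than the preset semantic similarity, and sorting these expanded texts by semantic similarity in descending order;
selecting the top preset number of expanded texts and outputting them as the target expanded texts.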
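The present application also discloses a text expansion device, including:
a target noun determination module, used to determine a text to be expanded and determine a target noun from the text to be expanded;
an entity semantic expansion module, used to perform entity expansion and semantic expansion on the target noun to determine target expanded entities and target expanded semantics;
an entity semantic combination module, used to combine the target expanded entities and target expanded semantics in pairs, and calculate the correlation score between the target expanded entity and the target expanded semantics in each combination;
a text expansion module, used to input the text to be expanded and the combinations whose correlation scores satisfy a first preset condition into a preset text generation model, to obtain the expanded text output by the preset text generation model;
a target expanded text output module, used to evaluate the semantic similarity between the expanded text and the text to be expanded using a preset text semantic similarity evaluation model, and to determine, from the expanded text, the expanded text whose semantic similarity satisfies a preset similarity condition and output it as the target expanded text.
The present application also discloses an electronic device, including:
a memory, used to store a computer program;
a processor, used to execute the computer program to implement the aforementioned text expansion method.
The present application also discloses a non-volatile readable storage medium, used to store a computer program, wherein the computer program, when executed by a processor, implements the steps of the text expansion method disclosed above.
In the present application, a text to be expanded is first determined and a target noun is determined from it; entity expansion and semantic expansion are performed on the target noun to determine target expanded entities and target expanded semantics; the target expanded entities and target expanded semantics are combined in pairs, and the correlation score between the entity and the semantics in each combination is calculated; the text to be expanded and the combinations whose correlation scores satisfy the first preset condition are input into the preset text generation model to obtain the expanded text it outputs; and the preset text semantic similarity evaluation model is used to evaluate the semantic similarity between the expanded text and the text to be expanded, and the expanded text whose semantic similarity satisfies the preset similarity condition is output as the target expanded text. In this way, the present application uses entity expansion and semantic expansion to expand the short text precisely, and uses the preset text generation model and the preset text semantic similarity evaluation model for text generation and text similarity evaluation, which improves the efficiency of text expansion and addresses the problems of insufficient semantic richness and semantic change in short text expansion. The present application ensures semantic consistency while ensuring semantic richness, improving the accuracy of text expansion.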
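Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Figure 1 is a flow chart of a text expansion method provided by the present application;
Figure 2 is a flow chart of a specific text expansion method provided by the present application;
Figure 3 is a schematic flow diagram of a method provided by the present application;
Figure 4 is a flow chart of a specific text expansion method provided by the present application;
Figure 5 is a system framework diagram provided by the present application;
Figure 6 is a schematic structural diagram of a text expansion device provided by the present application;
Figure 7 is a structural diagram of an electronic device provided by the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
In the prior art, during short text expansion, the semantic variation of the expanded text is not rich enough and the semantic information is easily altered. The present application can expand a short text into a long text that is semantically rich and emotionally consistent, ensuring semantic consistency while ensuring semantic richness, and thereby improving the accuracy of text expansion.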
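An embodiment of the present application discloses a text expansion method. Referring to Figure 1, the method includes:
Step S11: determine the text to be expanded, and determine the target noun from the text to be expanded.
In this embodiment, after the text to be expanded T is determined, the target noun is determined from it. Determining the target noun from the text to be expanded includes: performing part-of-speech tagging on the text to be expanded using a preset part-of-speech tagging tool, to obtain the text to be expanded with part-of-speech tags; and determining, from the text to be expanded with part-of-speech tags, the phrases whose part-of-speech tag is a noun tag as the target nouns. In a specific implementation, the part-of-speech tagging tool stanza can be used to analyze the parts of speech of the text to be expanded T, and all nouns tagged NN (noun, singular or mass) or NNP (proper noun, singular) are extracted as target nouns. In a specific implementation, an entity set E = {e_1, e_2, ..., e_n} can also be generated from the extracted target nouns.
Determining the text to be expanded and determining the target noun from it includes: determining the text to be expanded, and determining the target nouns from it to generate a noun list. Correspondingly, performing entity expansion and semantic expansion on the target noun to determine the target expanded entities and target expanded semantics includes: performing entity expansion and semantic expansion on the target nouns in the noun list. It can be understood that the target nouns determined from the text to be expanded can be stored in a noun list; when the target nouns are expanded, they can be taken from the noun list and then expanded.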
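The noun extraction in step S11 can be sketched as a filter over tagger output. This is a minimal illustration, not the applicant's implementation: it assumes the tagger has already produced (word, tag) pairs (with stanza, such tags would typically be read from each word's `xpos` field), and the function name `extract_target_nouns` and the sample sentence are hypothetical.

```python
from typing import Iterable, List, Tuple

# Tags treated as nouns in the description: NN (singular or mass) and NNP (proper, singular).
NOUN_TAGS = {"NN", "NNP"}

def extract_target_nouns(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the words tagged NN or NNP, in order of first appearance, without duplicates."""
    seen = set()
    nouns = []
    for word, tag in tagged:
        if tag in NOUN_TAGS and word not in seen:
            seen.add(word)
            nouns.append(word)
    return nouns

# Hypothetical tagger output for the text "my friend Tom likes games".
tagged_text = [("my", "PRP$"), ("friend", "NN"), ("Tom", "NNP"),
               ("likes", "VBZ"), ("games", "NNS")]
noun_list = extract_target_nouns(tagged_text)
print(noun_list)  # ['friend', 'Tom']
```

The plural "games" (NNS) is skipped because the description names only the NN and NNP tags; widening `NOUN_TAGS` would be a design choice outside what the text states.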
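Step S12: perform entity expansion and semantic expansion on the target noun to determine the target expanded entities and target expanded semantics.
This step expands the target noun in two respects: entity expansion and semantic expansion. In some specific implementations, if the target noun is "friend" (朋友), the target expanded entities that may be generated are: pet, dog, girlfriend, partner, confidant; and the target expanded semantics that may be generated are: chatting, relying on, playing games.
Step S13: combine the target expanded entities and target expanded semantics in pairs, and calculate the correlation score between the target expanded entity and the target expanded semantics in each combination.
In this embodiment, the target expanded entities and target expanded semantics are combined in pairs, and the correlation score between the target expanded entity and the target expanded semantics in each combination is calculated. In some specific implementations, a trained artificial intelligence model can be used to calculate the correlation score of each combination.
In this embodiment, after the correlation score between the target expanded entity and the target expanded semantics in each combination is calculated, the method may further include: sorting the correlation scores in descending order, and selecting the preset top N combinations for the spliced sequence that is input into the preset text generation model in the next step.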
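The pairwise combination and descending sort of step S13 can be sketched as follows. The scorer is passed in as a function because the description only says a trained model produces the score; `toy_scorer` and its values are made-up stand-ins, and `score_combinations` is a hypothetical name.

```python
from itertools import product
from typing import Callable, List, Tuple

Scored = Tuple[str, str, float]

def score_combinations(entities: List[str], semantics: List[str],
                       scorer: Callable[[str, str], float]) -> List[Scored]:
    """Pair every entity with every semantics item and sort by score, highest first."""
    combos = [(e, s, scorer(e, s)) for e, s in product(entities, semantics)]
    return sorted(combos, key=lambda c: c[2], reverse=True)

# Toy scorer standing in for the trained model; the values are invented for illustration.
TOY_SCORES = {("dog", "play games"): 0.9, ("partner", "chat"): 0.8, ("dog", "chat"): 0.2}

def toy_scorer(entity: str, semantic: str) -> float:
    return TOY_SCORES.get((entity, semantic), 0.0)

ranked = score_combinations(["dog", "partner"], ["chat", "play games"], toy_scorer)
top_two = ranked[:2]  # "preset top N" with N = 2
print(top_two)  # [('dog', 'play games', 0.9), ('partner', 'chat', 0.8)]
```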
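Step S14: input the text to be expanded and the combinations whose correlation scores satisfy the first preset condition into the preset text generation model, to obtain the expanded text output by the preset text generation model.
Inputting the text to be expanded and the combinations whose correlation scores satisfy the first preset condition into the preset text generation model includes: splicing the text to be expanded with those combinations using a preset splicing method to generate a spliced sequence, and inputting the spliced sequence into the preset text generation model. It can be understood that, in this embodiment, the text to be expanded is spliced with the combinations obtained in the previous step to generate the spliced sequence, which is then input into the pre-trained text generation model to obtain the semantically richer long text generated by the model, that is, the expanded text. In some specific implementations, the preset text generation model includes, but is not limited to, the GPT3 model (Generative Pre-trained Transformer 3, an autoregressive language model).
Step S15: evaluate the semantic similarity between the expanded text and the text to be expanded using the preset text semantic similarity evaluation model, and determine, from the expanded text, the expanded text whose semantic similarity satisfies the preset similarity condition and output it as the target expanded text.
After the expanded texts generated by the preset text generation model are obtained, the semantic similarity between each expanded text and the text to be expanded can be calculated by the preset text semantic similarity evaluation model, and the top N expanded texts in the ranking can then be selected as the final output long texts, where N is a positive integer that can be set or changed freely according to user needs. In some specific implementations, the preset text semantic similarity evaluation model includes, but is not limited to, the DSSM (Deep Structured Semantic Model).
In this embodiment, a text to be expanded is first determined and a target noun is determined from it; entity expansion and semantic expansion are performed on the target noun to determine target expanded entities and target expanded semantics; the target expanded entities and target expanded semantics are combined in pairs, and the correlation score between the entity and the semantics in each combination is calculated; the text to be expanded and the combinations whose correlation scores satisfy the first preset condition are input into the preset text generation model to obtain the expanded text it outputs; and the preset text semantic similarity evaluation model is used to evaluate the semantic similarity between the expanded text and the text to be expanded, and the expanded text whose semantic similarity satisfies the preset similarity condition is output as the target expanded text. In this way, the present application uses entity expansion and semantic expansion to expand the short text precisely, and uses the preset text generation model and the preset text semantic similarity evaluation model for text generation and text similarity evaluation, which improves the efficiency of text expansion and addresses the problems of insufficient semantic richness and semantic change in short text expansion. The present application ensures semantic consistency while ensuring semantic richness, improving the accuracy of text expansion.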
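Figure 2 is a flow chart of a specific text expansion method provided by an embodiment of the present application. Referring to Figure 2, the method includes:
Step S21: determine the text to be expanded, perform part-of-speech tagging on it using stanza to obtain the text to be expanded with part-of-speech tags, and then determine, from the text to be expanded with part-of-speech tags, the phrases whose part-of-speech tag is a noun tag as the target nouns.
In some specific implementations of this embodiment, the part-of-speech tagging tool stanza can be used to perform part-of-speech tagging on the text to be expanded.
Step S22: perform entity expansion and semantic expansion on the target noun to determine the target expanded entities and target expanded semantics.
For the more specific processing of step S22, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
Step S23: determine the hypernym-hyponym relationship list of the target noun using a knowledge graph, generate an entity expansion list based on the hypernym-hyponym relationship list and the target expanded entities, and generate a semantic expansion list based on the hypernym-hyponym relationship list and the target expanded semantics.
Determining the hypernym-hyponym relationship list of the target noun using a knowledge graph includes: retrieving the hypernym-hyponym relationships of the target noun through the retrieval interface of ConceptNet, to determine the hypernym-hyponym relationship list of the target noun.
Generating the entity expansion list based on the hypernym-hyponym relationship list and the target expanded entities, and generating the semantic expansion list based on the hypernym-hyponym relationship list and the target expanded semantics, includes: extracting the tail entities whose relationship in the hypernym-hyponym relationship list is a preset first relationship, to form the entity expansion list; and extracting the tail entities whose relationship in the hypernym-hyponym relationship list is a preset second relationship, to form the semantic expansion list.
In this embodiment, each noun in the noun list can be queried through the retrieval API (Application Programming Interface) of ConceptNet to obtain the hypernym-hyponym relationship lists of all nouns. The tail entities of five classes of relationships in the list, "is a subevents of" (about ...), "Types of" (types of ...), "Parts of" (part of ...), "Symbols of" (symbolizes ...) and "is a type of" (a kind of ...), are then extracted to form the entity expansion list E_expand = {ex_1, ex_2, ..., ex_n}. For example, the entities in the concept graph that match these five relationships for "friend" (朋友) are {pet, dog, girlfriend, partner, confidant, kitten, ...}. The tail entities and relationships of eight classes of relationships in the list, "Subevents of" (brings about), "is capable of" (is able to), "Things motivated by" (is driven to do), "Location of" (place), "wants" (wants), "is motivated by" (is driven by), "Things that make you want" (makes you need), "Causes of" (cause) and "makes you want…" (makes you want to do), also need to be extracted to form the semantic expansion list S_expand = {s_1, s_2, ..., s_n}. For example, the semantics in the concept graph that match these eight relationships for "friend" are {"chatting", "relying on", "playing games", ...}.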
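The split of retrieved relationships into the two expansion lists can be sketched as a filter on relationship labels. This sketch assumes the edges have already been retrieved and flattened into (head, relationship, tail) triples; the label strings are copied from the description as written (which enumerates nine labels for the second group), and they may not match the identifiers a real ConceptNet response uses. `split_expansions` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (head, relationship label, tail entity)

# Preset first relationships: their tail entities form the entity expansion list.
FIRST_RELATIONS = {"is a subevents of", "Types of", "Parts of", "Symbols of", "is a type of"}

# Preset second relationships: their tails (with the relationship) form the semantic expansion list.
SECOND_RELATIONS = {"Subevents of", "is capable of", "Things motivated by", "Location of",
                    "wants", "is motivated by", "Things that make you want",
                    "Causes of", "makes you want"}

def split_expansions(edges: Iterable[Edge]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (entity expansion list, semantic expansion list) for the given edges."""
    entity_list: List[str] = []
    semantic_list: List[Tuple[str, str]] = []
    for _head, relation, tail in edges:
        if relation in FIRST_RELATIONS:
            entity_list.append(tail)
        elif relation in SECOND_RELATIONS:
            semantic_list.append((relation, tail))
        # Edges with any other relationship are ignored.
    return entity_list, semantic_list

# Hypothetical edges for the target noun "friend".
edges = [("friend", "Types of", "dog"),
         ("friend", "is capable of", "chat"),
         ("friend", "related to", "pal")]
e_expand, s_expand = split_expansions(edges)
print(e_expand)  # ['dog']
print(s_expand)  # [('is capable of', 'chat')]
```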
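Step S24: combine the target expanded entities and target expanded semantics in pairs, and calculate the correlation score between the target expanded entity and the target expanded semantics in each combination.
For the more specific processing of step S24, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
Step S25: input the text to be expanded and the combinations whose correlation scores satisfy the first preset condition into the preset text generation model, to obtain the expanded text output by the preset text generation model.
For the more specific processing of step S25, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
Step S26: evaluate the semantic similarity between the expanded text and the text to be expanded using the preset text semantic similarity evaluation model, and determine, from the expanded text, the expanded text whose semantic similarity satisfies the preset similarity condition and output it as the target expanded text.
For the more specific processing of step S26, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
Figure 3 is a schematic flow diagram of the method proposed in this embodiment. First, ConceptNet is used to perform entity expansion and semantic expansion on the target noun; then the correlation between the expanded entities and expanded semantics is calculated, and the combinations whose correlation scores satisfy the first preset condition are input into the preset text generation model to generate long texts; next, a text similarity model scores the output long texts, the texts are sorted by score, and the long texts to be output and displayed are finally determined.
In this embodiment, the text to be expanded is first determined and tagged using stanza to obtain the text to be expanded with part-of-speech tags, and the phrases whose part-of-speech tag is a noun tag are determined as the target nouns. Entity expansion and semantic expansion are then performed on the target nouns to determine the target expanded entities and target expanded semantics. Next, a knowledge graph is used to determine the hypernym-hyponym relationship list of the target noun, an entity expansion list is generated based on that list and the target expanded entities, and a semantic expansion list is generated based on that list and the target expanded semantics. The target expanded entities and target expanded semantics are then combined in pairs, and the correlation score between the entity and the semantics in each combination is calculated. The text to be expanded and the combinations whose correlation scores satisfy the first preset condition are input into the preset text generation model to obtain the expanded text it outputs. Finally, the preset text semantic similarity evaluation model evaluates the semantic similarity between the expanded text and the text to be expanded, and the expanded text whose semantic similarity satisfies the preset similarity condition is output as the target expanded text. The method proposed in the present application, which uses the concept graph to extract related entities and semantics as candidate expansions and uses a semantic correlation evaluation model to score the finally generated long texts, can ensure the semantic and emotional consistency of the expanded long texts and improves the accuracy of text expansion.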
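Figure 4 is a flow chart of a specific text expansion method provided by an embodiment of the present application. Referring to Figure 4, the method includes:
Step S31: collect text from a preset social platform and classify it into short text and long text using a preset classification rule, then determine the short text as the text to be expanded, and determine the target noun from the text to be expanded.
In this embodiment, text can be collected from the preset social platform; text whose Chinese character length is less than or equal to 10 is defined as short text, and text whose length is greater than 10 is defined as long text. The short text is then used as the text to be expanded, and the target noun is determined from it. In this embodiment, the preset social platform includes, but is not limited to, Weibo.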
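The length-based classification in step S31 can be sketched in a few lines. The threshold of 10 comes from the description; counting every character with `len` (punctuation included) is a simplifying assumption, and `classify_by_length` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

def classify_by_length(texts: Iterable[str], max_short_len: int = 10) -> Tuple[List[str], List[str]]:
    """Split texts into (short, long): short if length <= max_short_len, long otherwise."""
    short_texts: List[str] = []
    long_texts: List[str] = []
    for text in texts:
        if len(text) <= max_short_len:
            short_texts.append(text)
        else:
            long_texts.append(text)
    return short_texts, long_texts

# Sample posts taken from the background section of the description.
posts = ["一辈子的朋友", "我和闺蜜无话不谈,我们是一辈子的朋友"]
short_texts, long_texts = classify_by_length(posts)
print(short_texts)  # ['一辈子的朋友']
```

Under this scheme the short texts become the texts to be expanded, while the long texts are kept as training material for the correlation model described in the following steps.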
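Step S32: perform entity expansion and semantic expansion on the target noun to determine the target expanded entities and target expanded semantics.
For the more specific processing of step S32, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
Step S33: perform part-of-speech tagging on the long text using a preset part-of-speech tagging tool to obtain the long text with part-of-speech tags, and then determine, from the long text with part-of-speech tags, the phrases whose part-of-speech tags are verb tags and noun tags.
In this embodiment, a preset part-of-speech tagging tool can be used to tag the long text, and the phrases tagged as verbs and as nouns are then determined from the tagged long text.
Step S34: determine the verb phrases and noun phrases in the same long text as correlated phrases, and input the correlated phrases as training data into the preset language representation model for training, to obtain the trained model.
In this embodiment, the correlation between a verb and a noun that appear in the same text can be set to 1, and to 0 otherwise. The correlated phrases are then input as training data into the BERT (Bidirectional Encoder Representation from Transformers, a pre-trained language representation model) model for training, yielding the correlation calculation model M_r.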
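The labeling rule of step S34 (1 for a noun and verb that share a long text, 0 otherwise) can be sketched as a data-preparation function. This is only an illustration of how such labeled pairs might be assembled; the BERT training itself is not shown. It assumes Penn Treebank style tags in which noun tags start with "NN" and verb tags with "VB", and `build_training_pairs` is a hypothetical name.

```python
from itertools import product
from typing import List, Set, Tuple

Tagged = List[Tuple[str, str]]  # one long text as (word, tag) pairs

def build_training_pairs(tagged_texts: List[Tagged]) -> List[Tuple[str, str, int]]:
    """Label each (noun, verb) pair 1 if both occur in the same long text, else 0."""
    positives: Set[Tuple[str, str]] = set()
    all_nouns: Set[str] = set()
    all_verbs: Set[str] = set()
    for tagged in tagged_texts:
        nouns = {w for w, t in tagged if t.startswith("NN")}  # assumed noun tag prefix
        verbs = {w for w, t in tagged if t.startswith("VB")}  # assumed verb tag prefix
        all_nouns |= nouns
        all_verbs |= verbs
        positives |= set(product(nouns, verbs))
    return [(n, v, 1 if (n, v) in positives else 0)
            for n in sorted(all_nouns) for v in sorted(all_verbs)]

# Two hypothetical tagged long texts.
corpus = [[("dog", "NN"), ("play", "VB")],
          [("friend", "NN"), ("chat", "VB")]]
pairs = build_training_pairs(corpus)
print(pairs)
# [('dog', 'chat', 0), ('dog', 'play', 1), ('friend', 'chat', 1), ('friend', 'play', 0)]
```

Enumerating every cross-text pair as a negative is one possible reading of "0 otherwise"; on a large corpus the negatives would likely need to be sampled rather than enumerated.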
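Step S35: input the target expanded entities and target expanded semantics into the trained model in pairs, and obtain the correlation score, output by the trained model, between the target expanded entity and the target expanded semantics in each combination.
In this step, the target expanded entities and target expanded semantics can be input in pairs into the trained correlation calculation model M_r, and the correlation score between the target expanded entity and the target expanded semantics in each combination, as output by M_r, is obtained.
Step S36: input the text to be expanded and the combinations whose correlation scores satisfy the first preset condition into the preset text generation model, to obtain the expanded text output by the preset text generation model.
Inputting the text to be expanded and the combinations whose correlation scores satisfy the first preset condition into the preset text generation model includes: determining the correlation scores of all combinations of the target expanded entities and target expanded semantics, and sorting all combinations by correlation score; determining, for each target expanded entity, the combinations whose correlation scores rank within the top preset number; and inputting the top-ranked combinations of each target expanded entity, together with the text to be expanded, into the preset text generation model.
In some specific implementations, all correlation scores corresponding to the same target expanded entity can be sorted, and the ten combinations with the highest correlation scores selected, e_1: [s_1, s_2, ..., s_10]; these are spliced with the text to be expanded to obtain the spliced sequence [T, e_i, s_j], which is input into the preset text generation model GPT-3 to obtain the output long text T'. It can be understood that the preset number in this step can be set and changed freely according to user needs.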
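The per-entity top-k selection and the splicing into [T, e_i, s_j] can be sketched as below. The description does not specify the "preset splicing method", so joining with a " [SEP] " separator is purely an assumption, as are the names `top_k_per_entity` and `splice`; the call to the generation model is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def top_k_per_entity(scored: List[Tuple[str, str, float]], k: int = 10) -> Dict[str, List[str]]:
    """For each entity, keep the k semantics items with the highest correlation score."""
    by_entity: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for entity, semantic, score in scored:
        by_entity[entity].append((semantic, score))
    return {entity: [s for s, _ in sorted(items, key=lambda x: x[1], reverse=True)[:k]]
            for entity, items in by_entity.items()}

def splice(text: str, selected: Dict[str, List[str]], sep: str = " [SEP] ") -> List[str]:
    """Build one spliced sequence [T, e_i, s_j] per selected combination."""
    return [sep.join((text, entity, semantic))
            for entity, semantics in selected.items() for semantic in semantics]

# Hypothetical scored combinations (entity, semantics, correlation score).
scored = [("dog", "play games", 0.9), ("dog", "chat", 0.2), ("partner", "chat", 0.8)]
selected = top_k_per_entity(scored, k=1)
sequences = splice("friends for life", selected)
print(sequences)
# ['friends for life [SEP] dog [SEP] play games', 'friends for life [SEP] partner [SEP] chat']
```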
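Step S37: evaluate the semantic similarity between the expanded text and the text to be expanded using the preset text semantic similarity evaluation model, and determine, from the expanded text, the expanded text whose semantic similarity satisfies the preset similarity condition and output it as the target expanded text.
In this step, the text to be expanded T and the expanded texts T' generated in step S36 can be input into the text similarity calculation model DSSM to obtain the similarity score between the text to be expanded and each expanded text; the top preset number of expanded texts can then be selected and output as the final target expanded texts. It can be understood that the preset number in this step can be set and changed freely according to the user's wishes.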
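The final filtering and ranking can be sketched with the similarity model injected as a function. The character-overlap measure `char_jaccard` below is only a runnable stand-in for a trained model such as DSSM, not an approximation of it; the threshold and all names are hypothetical.

```python
from typing import Callable, List

def select_outputs(source: str, candidates: List[str],
                   similarity: Callable[[str, str], float],
                   min_similarity: float, top_n: int) -> List[str]:
    """Keep candidates whose similarity to source exceeds the threshold, best first, at most top_n."""
    scored = [(c, similarity(source, c)) for c in candidates]
    kept = [(c, s) for c, s in scored if s > min_similarity]
    kept.sort(key=lambda x: x[1], reverse=True)
    return [c for c, _ in kept[:top_n]]

def char_jaccard(a: str, b: str) -> float:
    """Stand-in similarity: Jaccard overlap of the character sets of a and b."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

# Toy strings: similarities to "abc" are 0.75, 0.0 and 0.6 respectively.
outputs = select_outputs("abc", ["abcd", "xyz", "abcde"], char_jaccard,
                         min_similarity=0.5, top_n=2)
print(outputs)  # ['abcd', 'abcde']
```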
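Figure 5 is a system framework diagram proposed in this embodiment. In the figure, long text is used to train the preset language representation model, and the trained model is used to evaluate the correlation scores between the target expanded entities and target expanded semantics of the text to be expanded. First, the nouns and verbs in the acquired long texts are tagged with their parts of speech, and after the correlation between these verbs and nouns is determined, the preset model is trained to obtain the trained correlation model M_r. After a short text is acquired, the entity set containing the target nouns is determined first; entity expansion and semantic expansion are then performed on the target nouns based on the concept graph from knowledge graph technology, and the expanded entities and expanded semantics are input into the correlation model M_r to determine the correlation score of each entity-semantics combination. After the scores are sorted, the sorted entity-semantics combinations are input into the preset text generation model GPT3, and the long texts output by GPT3 are obtained. Finally, the DSSM model ranks the long texts output by GPT3 by their similarity to the text to be expanded, and the top 5 long texts are determined.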
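The overall flow of Figure 5 can be sketched as one function that wires the stages together. Every stage is passed in as a callable because the description leaves the concrete tagger, knowledge graph lookup, correlation model, generator and similarity model open; the stubs below are invented placeholders so that the sketch runs, and `expand_text` is a hypothetical name.

```python
from typing import Callable, List, Tuple

def expand_text(text: str,
                extract_nouns: Callable[[str], List[str]],
                expand_noun: Callable[[str], Tuple[List[str], List[str]]],
                relevance: Callable[[str, str], float],
                generate: Callable[[str, str, str], str],
                similarity: Callable[[str, str], float],
                per_entity: int = 10, top_n: int = 5) -> List[str]:
    """Short text -> nouns -> expansions -> top combinations -> generation -> ranked output."""
    candidates: List[str] = []
    for noun in extract_nouns(text):
        entities, semantics = expand_noun(noun)
        for entity in entities:
            # Rank the semantics for this entity by correlation, highest first.
            ranked = sorted(semantics, key=lambda s: relevance(entity, s), reverse=True)
            for semantic in ranked[:per_entity]:
                candidates.append(generate(text, entity, semantic))
    # Rank generated texts by similarity to the original short text.
    candidates.sort(key=lambda c: similarity(text, c), reverse=True)
    return candidates[:top_n]

# Placeholder stages; real ones would wrap a tagger, ConceptNet, M_r, a generator and DSSM.
result = expand_text(
    "friends",
    extract_nouns=lambda t: ["friend"],
    expand_noun=lambda n: (["dog"], ["play", "chat"]),
    relevance=lambda e, s: {"play": 0.9, "chat": 0.1}[s],
    generate=lambda t, e, s: f"{t}|{e}|{s}",
    similarity=lambda t, c: 1.0 if c.endswith("play") else 0.5,
    per_entity=2, top_n=1,
)
print(result)  # ['friends|dog|play']
```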
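In this embodiment, text is first collected from the preset social platform and classified into short text and long text using the preset classification rule; the short text is determined as the text to be expanded, the target noun is determined from it, and entity expansion and semantic expansion are performed on the target noun to determine the target expanded entities and target expanded semantics. The long text is then tagged using the preset part-of-speech tagging tool, and the phrases tagged as verbs and as nouns are determined from the tagged long text. Next, the verb phrases and noun phrases in the same long text are determined as correlated phrases and input as training data into the preset language representation model for training, to obtain the trained model. The target expanded entities and target expanded semantics are then input into the trained model in pairs, and the correlation score between the entity and the semantics in each combination, as output by the trained model, is obtained. Finally, the text to be expanded and the combinations whose correlation scores satisfy the first preset condition are input into the preset text generation model to obtain the expanded text it outputs. With the text expansion method proposed in this embodiment, the long text obtained from the preset social platform can be used to train the preset language representation model, and the trained model can be used to evaluate the correlation scores between the target expanded entities and target expanded semantics of the text to be expanded, which improves the applicability of the method. Moreover, in expanding the short text, this embodiment ensures semantic consistency while ensuring semantic richness, improving the accuracy of text expansion.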
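Referring to Figure 6, an embodiment of the present application discloses a text expansion device, which may specifically include:
a target noun determination module 11, used to determine the text to be expanded and determine the target noun from the text to be expanded;
an entity semantic expansion module 12, used to perform entity expansion and semantic expansion on the target noun to determine the target expanded entities and target expanded semantics;
an entity semantic combination module 13, used to combine the target expanded entities and target expanded semantics in pairs, and calculate the correlation score between the target expanded entity and the target expanded semantics in each combination;
a text expansion module 14, used to input the text to be expanded and the combinations whose correlation scores satisfy the first preset condition into the preset text generation model, to obtain the expanded text output by the preset text generation model;
a target expanded text output module 15, used to evaluate the semantic similarity between the expanded text and the text to be expanded using the preset text semantic similarity evaluation model, and to determine, from the expanded text, the expanded text whose semantic similarity satisfies the preset similarity condition and output it as the target expanded text.
In the present application, a text to be expanded is first determined and a target noun is determined from it; entity expansion and semantic expansion are performed on the target noun to determine target expanded entities and target expanded semantics; the target expanded entities and target expanded semantics are combined in pairs, and the correlation score between the entity and the semantics in each combination is calculated; the text to be expanded and the combinations whose correlation scores satisfy the first preset condition are input into the preset text generation model to obtain the expanded text it outputs; and the preset text semantic similarity evaluation model is used to evaluate the semantic similarity between the expanded text and the text to be expanded, and the expanded text whose semantic similarity satisfies the preset similarity condition is output as the target expanded text. In this way, the present application uses entity expansion and semantic expansion to expand the short text precisely, and uses the preset text generation model and the preset text semantic similarity evaluation model for text generation and text similarity evaluation, which improves the efficiency of text expansion, addresses the problems of insufficient semantic richness and semantic change in short text expansion, ensures semantic consistency while ensuring semantic richness, and improves the accuracy of text expansion.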
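In some specific embodiments, the target noun determination module 11 may include:
a first part-of-speech tagging unit, used to perform part-of-speech tagging on the text to be expanded using a preset part-of-speech tagging tool, to obtain the text to be expanded with part-of-speech tags;
a target noun extraction unit, used to determine, from the text to be expanded with part-of-speech tags, the phrases whose part-of-speech tag is a noun tag as the target nouns.
In some specific embodiments, the first part-of-speech tagging unit includes:
a part-of-speech tagging tool application unit, used to perform part-of-speech tagging on the text to be expanded using stanza, to obtain the text to be expanded with part-of-speech tags.
In some specific embodiments, the target noun determination module 11 includes:
a noun list generation unit, used to determine the text to be expanded and determine the target nouns from the text to be expanded to generate a noun list;
correspondingly, the entity semantic expansion module 12 includes:
an entity semantic expansion unit, used to perform entity expansion and semantic expansion on the target nouns in the noun list to determine the target expanded entities and target expanded semantics.
In some specific embodiments, the text expansion device further includes:
a list determination unit, used to determine the hypernym-hyponym relationship list of the target noun using a knowledge graph;
a first list generation unit, used to generate an entity expansion list based on the hypernym-hyponym relationship list and the target expanded entities;
a second list generation unit, used to generate a semantic expansion list based on the hypernym-hyponym relationship list and the target expanded semantics.
In some specific embodiments, the list determination unit includes:
a noun retrieval unit, used to retrieve the hypernym-hyponym relationships of the target noun through the retrieval interface of ConceptNet, to determine the hypernym-hyponym relationship list of the target noun.
In some specific embodiments, the first list generation unit and the second list generation unit include:
a first tail entity extraction unit, used to extract the tail entities whose relationship in the hypernym-hyponym relationship list is the preset first relationship, to form the entity expansion list;
a second tail entity extraction unit, used to extract the tail entities whose relationship in the hypernym-hyponym relationship list is the preset second relationship, to form the semantic expansion list.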
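In some specific embodiments, the text expansion device further includes:
a text collection unit, used to collect text from the preset social platform and classify the text into short text and long text using the preset classification rule;
correspondingly, the target noun determination module 11 includes:
a short text determination unit, used to determine the short text as the text to be expanded.
In some specific embodiments, the text expansion device further includes:
a long text tagging unit, used to perform part-of-speech tagging on the long text using the preset part-of-speech tagging tool to obtain the long text with part-of-speech tags, and then to determine, from the long text with part-of-speech tags, the phrases whose part-of-speech tags are verb tags and noun tags;
a model training unit, used to determine the verb phrases and noun phrases in the same long text as correlated phrases, and to input the correlated phrases as training data into the preset language representation model for training, to obtain the trained model.
In some specific embodiments, the text expansion device further includes:
inputting the target expanded entities and target expanded semantics into the trained model in pairs;
obtaining the correlation score, output by the trained model, between the target expanded entity and the target expanded semantics in each combination.
In some specific embodiments, the entity semantic combination module 13 further includes:
a first text input unit, used to input the target expanded entities and target expanded semantics into the trained model in pairs;
a score output unit, used to obtain the correlation score, output by the trained model, between the target expanded entity and the target expanded semantics in each combination.
In some specific embodiments, the text expansion module 14 includes:
a score sorting unit, used to determine the correlation scores of all combinations of the target expanded entities and target expanded semantics, and to sort all combinations by correlation score;
a combination determination unit, used to determine, for each target expanded entity, the combinations whose correlation scores rank within the top preset number;
a second text input unit, used to input the top-ranked combinations of each target expanded entity, together with the text to be expanded, into the preset text generation model.
In some specific embodiments, the text expansion module 14 includes:
a sequence splicing unit, used to splice the text to be expanded with the combinations whose correlation scores satisfy the first preset condition using a preset splicing method, to generate a spliced sequence;
a sequence input unit, used to input the spliced sequence into the preset text generation model.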
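Further, an embodiment of the present application also discloses an electronic device. Figure 7 is a structural diagram of an electronic device 20 according to an exemplary embodiment; the content of the figure should not be regarded as any limitation on the scope of use of the present application.
Figure 7 is a schematic structural diagram of an electronic device 20 provided by an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a display screen 24, an input/output interface 25, a communication interface 26 and a communication bus 27. The memory 22 is used to store a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the text expansion method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in this embodiment may specifically be an electronic computer.
In this embodiment, the power supply 23 is used to provide operating voltage for each hardware device on the electronic device 20; the communication interface 26 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows may be any communication protocol applicable to the technical solution of the present application, which is not specifically limited here; the input/output interface 25 is used to obtain external input data or output data to the outside, and its specific interface type can be selected according to specific application needs, which is not specifically limited here.
In addition, the memory 22, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk or an optical disk, and the resources stored on it may include an operating system 221, a computer program 222 and virtual machine data 223; the virtual machine data 223 may include a wide variety of data. The storage may be temporary or permanent.
The operating system 221 is used to manage and control each hardware device on the electronic device 20 and the computer program 222, and may be Windows Server, Netware, Unix, Linux, etc. In addition to the computer program that can be used to carry out the text expansion method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that can be used to complete other specific tasks.
Further, the present application also discloses a non-volatile readable storage medium. The non-volatile readable storage medium mentioned here includes random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, magnetic disks, optical disks, or any other form of storage medium known in the technical field. The computer program, when executed by a processor, implements the text expansion method disclosed above. For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method. Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the technical field.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "includes a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The text expansion method, device, equipment and storage medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help in understanding the method of the present application and its core idea. At the same time, for a person of ordinary skill in the art, there will be changes in the specific implementations and scope of application based on the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.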

Claims (20)

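  1. A text expansion method, characterized by comprising:
    determining a text to be expanded, and determining a target noun from the text to be expanded;
    performing entity expansion and semantic expansion on the target noun to determine target expanded entities and target expanded semantics;
    combining the target expanded entities and the target expanded semantics in pairs, and calculating the correlation score between the target expanded entity and the target expanded semantics in each combination;
    inputting the text to be expanded and the combinations whose correlation scores satisfy a first preset condition into a preset text generation model, to obtain the expanded text output by the preset text generation model;
    evaluating the semantic similarity between the expanded text and the text to be expanded using a preset text semantic similarity evaluation model, and determining, from the expanded text, the expanded text whose semantic similarity satisfies a preset similarity condition and outputting it as the target expanded text.
  2. The text expansion method according to claim 1, characterized in that determining the target noun from the text to be expanded comprises:
    performing part-of-speech tagging on the text to be expanded using a preset part-of-speech tagging tool, to obtain the text to be expanded with part-of-speech tags;
    determining, from the text to be expanded with part-of-speech tags, the phrases whose part-of-speech tag is a noun tag as the target nouns.
  3. The text expansion method according to claim 2, characterized in that performing part-of-speech tagging on the text to be expanded using a preset part-of-speech tagging tool, to obtain the text to be expanded with part-of-speech tags, comprises:
    performing part-of-speech tagging on the text to be expanded using stanza, to obtain the text to be expanded with part-of-speech tags.
  4. The text expansion method according to claim 2, characterized in that the noun part-of-speech tags include the NN tag and the NNP tag, and determining, from the text to be expanded with part-of-speech tags, the phrases whose part-of-speech tag is a noun tag as the target nouns comprises:
    extracting, from the text to be expanded with the part-of-speech tags, the phrases with the NN tag or the NNP tag as the target nouns.
  5. The text expansion method according to claim 1, characterized in that determining the text to be expanded and determining the target noun from the text to be expanded comprises:
    determining the text to be expanded, and determining the target nouns from the text to be expanded to generate a noun list;
    correspondingly, performing entity expansion and semantic expansion on the target noun to determine the target expanded entities and target expanded semantics comprises:
    performing entity expansion and semantic expansion on the target nouns in the noun list to determine the target expanded entities and target expanded semantics.
  6. The text expansion method according to claim 5, characterized in that determining the text to be expanded and determining the target nouns from the text to be expanded to generate a noun list comprises:
    determining the text to be expanded, and determining the target nouns from the text to be expanded;
    generating an entity set from the target nouns, and a noun list corresponding to the entity set.
  7. The text expansion method according to claim 6, characterized in that, after determining the target expanded entities and target expanded semantics, the method further comprises:
    determining a hypernym-hyponym relationship list of the target noun using a knowledge graph;
    generating an entity expansion list based on the hypernym-hyponym relationship list and the target expanded entities;
    generating a semantic expansion list based on the hypernym-hyponym relationship list and the target expanded semantics.
  8. The text expansion method according to claim 7, characterized in that determining the hypernym-hyponym relationship list of the target noun using a knowledge graph comprises:
    retrieving the hypernym-hyponym relationships of the target noun through the retrieval interface of ConceptNet, to determine the hypernym-hyponym relationship list of the target noun.
  9. The text expansion method according to claim 7, characterized in that generating the entity expansion list based on the hypernym-hyponym relationship list and the target expanded entities, and generating the semantic expansion list based on the hypernym-hyponym relationship list and the target expanded semantics, comprises:
    extracting the tail entities whose relationship in the hypernym-hyponym relationship list is a preset first relationship, to form the entity expansion list;
    extracting the tail entities whose relationship in the hypernym-hyponym relationship list is a preset second relationship, to form the semantic expansion list.
  10. The text expansion method according to claim 1, characterized in that, before determining the text to be expanded and determining the target noun from the text to be expanded, the method further comprises:
    collecting text from a preset social platform, and classifying the text into short text and long text using a preset classification rule;
    correspondingly, determining the text to be expanded comprises:
    determining the short text as the text to be expanded.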
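  11. The text expansion method according to claim 10, characterized in that the preset classification rule includes a preset text character length, and collecting text from the preset social platform and classifying the text into short text and long text using the preset classification rule comprises:
    collecting at least one text from the preset social platform, the text having a text character length;
    classifying text whose character length is less than or equal to the preset text character length as the short text;
    classifying text whose character length is greater than the preset text character length as the long text.
  12. The text expansion method according to claim 10, characterized in that, before combining the target expanded entities and the target expanded semantics in pairs and calculating the correlation score between the target expanded entity and the target expanded semantics in each combination, the method further comprises:
    performing part-of-speech tagging on the long text using a preset part-of-speech tagging tool to obtain the long text with part-of-speech tags, and determining, from the long text with part-of-speech tags, the phrases whose part-of-speech tags are verb tags and noun tags;
    determining the verb phrases and noun phrases in the same long text as correlated phrases, and inputting the correlated phrases as training data into a preset language representation model for training, to obtain a trained model.
  13. The text expansion method according to claim 12, characterized in that combining the target expanded entities and the target expanded semantics in pairs and calculating the correlation score between the target expanded entity and the target expanded semantics in each combination comprises:
    inputting the target expanded entities and the target expanded semantics into the trained model in pairs;
    obtaining the correlation score, output by the trained model, between the target expanded entity and the target expanded semantics in each combination.
  14. The text expansion method according to claim 1, characterized in that inputting the text to be expanded and the combinations whose correlation scores satisfy the first preset condition into the preset text generation model comprises:
    determining the correlation scores of all combinations of the target expanded entities and the target expanded semantics, and sorting all the combinations by correlation score;
    determining, for each target expanded entity, the combinations whose correlation scores rank within the top preset number;
    inputting the top-ranked combinations of each target expanded entity, together with the text to be expanded, into the preset text generation model.
  15. The text expansion method according to any one of claims 1 to 14, characterized in that inputting the text to be expanded and the combinations whose correlation scores satisfy the first preset condition into the preset text generation model comprises:
    splicing the text to be expanded with the combinations whose correlation scores satisfy the first preset condition using a preset splicing method, to generate a spliced sequence;
    inputting the spliced sequence into the preset text generation model.
  16. The text expansion method according to claim 15, characterized in that the preset similarity condition includes a preset semantic similarity, and determining, from the expanded text, the expanded text whose semantic similarity satisfies the preset similarity condition and outputting it as the target expanded text comprises:
    determining, from the expanded text, the expanded text whose semantic similarity is greater than the preset semantic similarity and outputting it as the target expanded text.
  17. The text expansion method according to claim 16, characterized in that determining, from the expanded text, the expanded text whose semantic similarity is greater than the preset semantic similarity and outputting it as the target expanded text comprises:
    determining the expanded texts whose semantic similarity is greater than the preset semantic similarity, and sorting these expanded texts by semantic similarity in descending order;
    selecting the top preset number of expanded texts and outputting them as the target expanded texts.
  18. A text expansion device, characterized by comprising:
    a target noun determination module, used to determine a text to be expanded and determine a target noun from the text to be expanded;
    an entity semantic expansion module, used to perform entity expansion and semantic expansion on the target noun to determine target expanded entities and target expanded semantics;
    an entity semantic combination module, used to combine the target expanded entities and the target expanded semantics in pairs, and calculate the correlation score between the target expanded entity and the target expanded semantics in each combination;
    a text expansion module, used to input the text to be expanded and the combinations whose correlation scores satisfy a first preset condition into a preset text generation model, to obtain the expanded text output by the preset text generation model;
    a target expanded text output module, used to evaluate the semantic similarity between the expanded text and the text to be expanded using a preset text semantic similarity evaluation model, and to determine, from the expanded text, the expanded text whose semantic similarity satisfies a preset similarity condition and output it as the target expanded text.
  19. An electronic device, characterized by comprising a processor and a memory, wherein the processor, when executing the computer program stored in the memory, implements the text expansion method according to any one of claims 1 to 17.
  20. A non-volatile readable storage medium, characterized in that it is used to store a computer program, wherein the computer program, when executed by a processor, implements the text expansion method according to any one of claims 1 to 17.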
PCT/CN2022/134086 2022-07-15 2022-11-24 Text expansion method, device, equipment and medium WO2024011813A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210829003.0A CN114912448B (zh) 2022-07-15 2022-07-15 Text expansion method, device, equipment and medium
CN202210829003.0 2022-07-15

Publications (1)

Publication Number Publication Date
WO2024011813A1 true WO2024011813A1 (zh) 2024-01-18

Family

ID=82771900

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/134086 WO2024011813A1 (zh) 2022-07-15 2022-11-24 Text expansion method, device, equipment and medium

Country Status (2)

Country Link
CN (1) CN114912448B (zh)
WO (1) WO2024011813A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114912448B (zh) * 2022-07-15 2022-12-09 山东海量信息技术研究院 一种文本扩展方法、装置、设备及介质
CN115620726A (zh) * 2022-10-09 2023-01-17 京东科技信息技术有限公司 语音文本生成方法、语音文本生成模型的训练方法、装置
CN115964487A (zh) * 2022-12-22 2023-04-14 南阳理工学院 基于自然语言的论文标签补充方法、装置及存储介质

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567290A (zh) * 2010-12-30 2012-07-11 百度在线网络技术(北京)有限公司 用于对待处理的短文本信息进行扩展的方法、装置和设备
WO2016117920A1 (ko) * 2015-01-20 2016-07-28 한국과학기술원 지식표현 확장 방법 및 장치
JP2017078919A (ja) * 2015-10-19 2017-04-27 日本電信電話株式会社 単語拡張装置、分類装置、機械学習装置、方法、及びプログラム
CN107180026A (zh) * 2017-05-02 2017-09-19 苏州大学 一种基于词嵌入语义映射的事件短语学习方法及装置
US20180157640A1 (en) * 2016-12-06 2018-06-07 Electronics And Telecommunications Research Institute System and method for automatically expanding input text
CN110222707A (zh) * 2019-04-28 2019-09-10 平安科技(深圳)有限公司 一种文本数据增强方法及装置、电子设备
CN111027312A (zh) * 2019-12-12 2020-04-17 中金智汇科技有限责任公司 文本扩充方法、装置、电子设备及可读存储介质
CN111930891A (zh) * 2020-07-31 2020-11-13 中国平安人寿保险股份有限公司 基于知识图谱的检索文本扩展方法及相关装置
CN114385791A (zh) * 2022-01-14 2022-04-22 平安科技(深圳)有限公司 基于人工智能的文本扩充方法、装置、设备及存储介质
CN114912448A (zh) * 2022-07-15 2022-08-16 山东海量信息技术研究院 一种文本扩展方法、装置、设备及介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IN2014CH01007A (zh) * 2014-02-27 2015-09-04 Accenture Global Services Ltd
US10157220B2 (en) * 2015-07-23 2018-12-18 International Business Machines Corporation Context sensitive query expansion
CN109271514B (zh) * 2018-09-14 2022-03-15 华南师范大学 短文本分类模型的生成方法、分类方法、装置及存储介质
CN113392647B (zh) * 2020-11-25 2024-04-26 腾讯科技(深圳)有限公司 一种语料生成的方法、相关装置、计算机设备及存储介质
CN112651235A (zh) * 2020-12-24 2021-04-13 北京搜狗科技发展有限公司 一种诗歌生成的方法及相关装置
CN112487827A (zh) * 2020-12-28 2021-03-12 科大讯飞华南人工智能研究院(广州)有限公司 问题回答方法及电子设备、存储装置
CN114580436A (zh) * 2022-03-02 2022-06-03 重庆邮电大学 一种基于语义和词扩展的社交用户主题分析方法及系统

Also Published As

Publication number Publication date
CN114912448A (zh) 2022-08-16
CN114912448B (zh) 2022-12-09

Similar Documents

Publication Publication Date Title
CN110442718B (zh) Sentence processing method, device, server and storage medium
US11327978B2 (en) Content authoring
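TWI746690B (zh) Method, device and server for generating answers to natural language questions
WO2024011813A1 (zh) Text expansion method, device, equipment and medium
WO2018000272A1 (zh) Corpus generation device and method
CN110457708B (zh) Artificial intelligence-based vocabulary mining method, device, server and storage medium
CN112364660B (zh) Corpus text processing method, device, computer equipment and storage medium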
US20170262783A1 (en) Team Formation
US20220343082A1 (en) System and method for ensemble question answering
Derwojedowa et al. Words, concepts and relations in the construction of Polish WordNet
CN111104803B (zh) Semantic understanding processing method, device, equipment and readable storage medium
US20240185734A1 (en) Methods, Systems, Devices, and Software for Managing and Conveying Knowledge
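CN112506945A (zh) Knowledge graph-based adaptive learning guidance method and system
CN111553138B (zh) Auxiliary writing method and device for standardizing content-structure documents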
Hong et al. Automatically extracting word relationships as templates for pun generation
Johnson et al. Exploiting social information in grounded language learning via grammatical reduction
Gang et al. Chinese intelligent chat robot based on the AIML language
US20190318220A1 (en) Dispersed template-based batch interaction with a question answering system
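KR101072100B1 (ko) Document processing apparatus and method for extracting expressions and descriptions
CN111046168A (zh) Method, device, electronic equipment and medium for generating patent summary information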
Lee Natural Language Processing: A Textbook with Python Implementation
US20230111052A1 (en) Self-learning annotations to generate rules to be utilized by rule-based system
Talita et al. Challenges in building domain ontology for minority languages
CN111126066A (zh) Neural network-based method and device for determining Chinese rhetorical devices
Ramos et al. A QA System for learning Python

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22950921
Country of ref document: EP
Kind code of ref document: A1