CN110059163B - Method and device for generating template, electronic equipment and computer readable medium - Google Patents


Info

Publication number
CN110059163B
Authority
CN
China
Prior art keywords
template
text
candidate
keyword
word
Prior art date
Legal status
Active
Application number
CN201910356347.2A
Other languages
Chinese (zh)
Other versions
CN110059163A (en)
Inventor
王德瑞
谷伟波
贠挺
刘霏暄
陈国庆
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910356347.2A
Publication of CN110059163A
Application granted
Publication of CN110059163B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The present disclosure provides a method of generating a template, comprising: determining the slot types and a lexicon (word bank) for each slot type; a template expansion step: searching a preset training text library for candidate texts that include at least one keyword and generating templates from at least part of the candidate texts, where, in the template generated from a candidate text, the content other than the keywords becomes fixed content and the position of each keyword becomes a slot of the type whose lexicon contains that keyword; and a keyword expansion step: searching the training text library for matching texts whose fixed content matches a template, taking the word at the position of each slot in a matching text as a candidate word for that slot type, and adding at least part of the candidate words as keywords to the lexicon of that slot type. The disclosure also provides a device for generating a template, an electronic device, and a computer-readable medium.

Description

Method and device for generating template, electronic equipment and computer readable medium
Technical Field
The embodiment of the disclosure relates to the technical field of text generation, and in particular, to a method and an apparatus for generating a template, an electronic device, and a computer-readable medium.
Background
Text generation is a technique for producing fluent, coherent text (such as a title, a sentence, or a paragraph) from existing (a priori) information. One way to generate text is with templates: a template consists of fixed content (fixed text) and slots, each slot has a type, and during generation each slot is filled with a keyword from the lexicon corresponding to its type, producing a text composed of the fixed content and the keywords.
Existing templates are generated manually: people summarize the patterns of existing texts, extract texts that share common features, replace the substitutable words with slots of the corresponding types to obtain templates, and set up a lexicon for each type.
Manual template generation is inefficient, time-consuming, and costly, and it limits the number and diversity of templates. Moreover, template quality depends heavily on the skill of the person doing the work and on the quality of the existing texts, and human error easily creeps in, further reducing template quality.
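To make the slot-filling idea concrete, here is a minimal Python sketch. It assumes a template is a tuple of fixed-text strings and `Slot` objects, and that each lexicon is a list of keywords keyed by slot type; `Slot`, `fill_template`, and this representation are illustrative choices, not something the disclosure specifies.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Slot:
    """A slot in a template; `type` names the lexicon it is filled from."""
    type: str


def fill_template(template, lexicons):
    """Yield every text obtained by filling each slot with a keyword
    from the lexicon of that slot's type."""
    slot_types = [part.type for part in template if isinstance(part, Slot)]
    # One keyword per slot, over all combinations of keywords.
    for choice in product(*(lexicons[t] for t in slot_types)):
        words = iter(choice)
        yield "".join(next(words) if isinstance(part, Slot) else part
                      for part in template)


template = ("The ", Slot("adjective"), " scientist ", Slot("name"),
            " made a significant contribution to physics.")
lexicons = {"adjective": ["great", "famous"], "name": ["Einstein"]}
for text in fill_template(template, lexicons):
    print(text)
```

Enumerating every combination is only one possible policy; the disclosure does not specify how a keyword is chosen for each slot.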
Disclosure of Invention
The embodiment of the disclosure provides a method and a device for generating a template, an electronic device and a computer readable medium.
In a first aspect, an embodiment of the present disclosure provides a method for generating a template, where the template is used for text generation, and the method includes:
determining the slot types and a lexicon for each slot type, where each lexicon includes at least one keyword;
a template expansion step: searching a preset training text library for candidate texts that include at least one keyword, and generating templates from at least part of the candidate texts, where, in the template generated from a candidate text, the content other than the keywords is converted into fixed content, and the position of each keyword is converted into a slot of the type corresponding to that keyword;
and a keyword expansion step: searching the training text library for matching texts whose fixed content matches a template, taking the word in a matching text at the position of a slot in the template as a candidate word for that slot type, and adding at least part of the candidate words as keywords to the lexicon corresponding to that slot type.
In some embodiments, after the keyword expanding step, the method further comprises:
judging whether a preset first condition is met, and if not, returning to the template expansion step.
In some embodiments, the first condition comprises:
in the most recent predetermined number of template expansion steps, the number of newly generated templates is less than or equal to a first threshold;
in the most recent predetermined number of keyword expansion steps, the number of candidate words added as new keywords is less than or equal to a second threshold.
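The iterate-until-stable loop described above can be sketched as follows. The callbacks `expand_templates` and `expand_keywords` are hypothetical stand-ins for one run of each step and are assumed to return how many new templates or keywords they produced. The sketch also assumes that "the predetermined number of times" is a sliding window of recent rounds, that counts are summed over that window, and that both requirements must hold; the `max_rounds` safety cap is an addition not found in the disclosure.

```python
from collections import deque
from typing import Callable


def bootstrap(expand_templates: Callable[[], int],
              expand_keywords: Callable[[], int],
              window: int = 3,
              first_threshold: int = 0,
              second_threshold: int = 0,
              max_rounds: int = 100) -> int:
    """Alternate the template and keyword expansion steps until the first
    condition holds; return the number of rounds executed."""
    recent_templates = deque(maxlen=window)  # new templates per recent round
    recent_keywords = deque(maxlen=window)   # new keywords per recent round
    for round_no in range(1, max_rounds + 1):
        recent_templates.append(expand_templates())
        recent_keywords.append(expand_keywords())
        # Judge only once a full window of rounds is available.
        if (len(recent_templates) == window
                and sum(recent_templates) <= first_threshold
                and sum(recent_keywords) <= second_threshold):
            return round_no
    return max_rounds  # safety cap (assumption, not in the disclosure)
```

With a window of 3 and zero thresholds, the loop stops at the first round that completes three consecutive rounds producing nothing new.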
In some embodiments, generating the template from at least part of the candidate texts includes: judging whether the template generated from a given candidate text satisfies a preset second condition, and if so, generating the template from that candidate text; the second condition includes:
the number of slot types included in the template generated from the candidate text is greater than or equal to a third threshold.
In some embodiments, generating the template from at least part of the candidate texts includes: judging whether the template generated from a given candidate text satisfies a preset second condition, and if so, generating the template from that candidate text; the second condition includes:
the number of texts in the training text library whose fixed content matches the template generated from the candidate text is greater than or equal to a fourth threshold.
In some embodiments, adding at least part of the candidate words as keywords to the lexicon corresponding to the slot type includes: judging whether a given candidate word satisfies a third condition, and if so, adding it as a keyword to the lexicon corresponding to that slot type; the third condition includes:
the number of times the word has been taken as a candidate word for that slot type is greater than or equal to a fifth threshold.
In some embodiments, after the template expansion step, the method further includes:
classifying the templates according to the slots they contain.
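The disclosure does not define the classification criterion beyond "according to the slots". One plausible reading, sketched below under that assumption, groups templates by the set of slot types they contain; `Slot` and `classify_by_slots` are illustrative names, and templates are assumed to be tuples of fixed strings and `Slot` objects.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    type: str


def classify_by_slots(templates):
    """Group templates by the (unordered) set of slot types they contain."""
    groups = defaultdict(list)
    for template in templates:
        key = frozenset(p.type for p in template if isinstance(p, Slot))
        groups[key].append(template)
    return dict(groups)
```

Grouping by slot-type set means two templates land in the same class whenever they can be filled from the same lexicons, regardless of the order of their slots.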
In some embodiments, after the keyword expanding step, the method further comprises:
filling each slot of each template with a keyword from the lexicon of the corresponding type to obtain an intermediate text, where each keyword together with the text adjoining it forms a description combination in the intermediate text;
searching the training text library for each description combination, and treating a description combination whose number of search results is greater than or equal to a sixth threshold as a common description combination;
and taking an intermediate text in which all description combinations are common description combinations as a text generated by the template.
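A minimal sketch of this filtering step follows. Because "the text connected with the keyword" is not defined precisely, the sketch assumes a description combination is the keyword plus the first whitespace-delimited word of the fixed text that follows the slot, and it treats "number of search results" as the number of library texts containing the combination as a substring. All names are hypothetical; for unsegmented languages such as Chinese, a word segmenter would be needed instead of `str.split`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    type: str


def fill_with_combos(template, assignment):
    """Fill every slot from `assignment` (slot type -> keyword) and return the
    intermediate text plus one description combination per slot."""
    text, combos = [], []
    for i, part in enumerate(template):
        if isinstance(part, Slot):
            word = assignment[part.type]
            nxt = template[i + 1] if i + 1 < len(template) else ""
            following = nxt.split() if isinstance(nxt, str) else []
            text.append(word)
            # Assumed definition: keyword + first word of the following fixed text.
            combos.append(word + " " + following[0] if following else word)
        else:
            text.append(part)
    return "".join(text), combos


def generate_if_common(template, assignment, library, sixth_threshold=1):
    """Return the intermediate text only if every description combination
    occurs in at least `sixth_threshold` texts of the training library."""
    text, combos = fill_with_combos(template, assignment)
    for combo in combos:
        hits = sum(combo in doc for doc in library)  # number of search results
        if hits < sixth_threshold:
            return None
    return text
```

Rejecting the whole intermediate text when any single combination is rare reflects the requirement that all description combinations be common.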
In a second aspect, an embodiment of the present disclosure provides an apparatus for generating a template, including:
a setting module, configured to determine the slot types and a lexicon for each slot type, where each lexicon includes at least one keyword;
a template expansion module, configured to search a preset training text library for candidate texts that include at least one keyword and to generate templates from at least part of the candidate texts, where, in the template generated from a candidate text, the content other than the keywords is converted into fixed content, and the position of each keyword is converted into a slot of the type corresponding to the lexicon containing that keyword;
and a keyword expansion module, configured to search the training text library for matching texts whose fixed content matches a template, take the word in a matching text at the position of a slot in the template as a candidate word for that slot type, and add at least part of the candidate words as keywords to the lexicon corresponding to that slot type.
In some embodiments, the apparatus further comprises:
a judging module, configured to judge whether a preset first condition is met and, if not, to make the template expansion module start working again.
In some embodiments, the first condition comprises:
in the most recent predetermined number of runs of the template expansion module, the number of newly generated templates is less than or equal to a first threshold;
in the most recent predetermined number of runs of the keyword expansion module, the number of candidate words added as new keywords is less than or equal to a second threshold.
In some embodiments, the template expansion module is configured to judge whether the template generated from a given candidate text satisfies a preset second condition and, if so, to generate the template from that candidate text; the second condition includes:
the number of slot types included in the template generated from the candidate text is greater than or equal to a third threshold.
In some embodiments, the template expansion module is configured to judge whether the template generated from a given candidate text satisfies a preset second condition and, if so, to generate the template from that candidate text; the second condition includes:
the number of texts in the training text library whose fixed content matches the template generated from the candidate text is greater than or equal to a fourth threshold.
In some embodiments, the keyword expansion module is configured to judge whether a given candidate word satisfies a third condition and, if so, to add it as a keyword to the lexicon corresponding to that slot type; the third condition includes:
the number of times the word has been taken as a candidate word for that slot type is greater than or equal to a fifth threshold.
In some embodiments, the apparatus further comprises:
a classification module, configured to classify the templates according to the slots they contain.
In some embodiments, the apparatus further includes a generation module, configured to:
fill each slot of each template with a keyword from the lexicon of the corresponding type to obtain an intermediate text, where each keyword together with the text adjoining it forms a description combination in the intermediate text;
search the training text library for each description combination, and treat a description combination whose number of search results is greater than or equal to a sixth threshold as a common description combination;
and take an intermediate text in which all description combinations are common description combinations as a text generated by the template.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
one or more processors;
a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement any of the above methods of generating a template.
In a fourth aspect, the present disclosure provides a computer-readable medium storing a computer program which, when executed by a processor, implements any one of the above methods for generating a template.
The method of the embodiments of the disclosure generates templates (and keywords) automatically, without manual intervention. Template generation is therefore efficient and inexpensive, the quality of the generated templates does not depend on the skill of an operator, the quality requirements on the training text library are modest, and errors introduced by manual work are avoided, so the resulting templates, and the texts generated from them, are of good quality.
Furthermore, by imposing certain conditions while generating templates and keywords, the method ensures that the generated templates and keywords reflect common modes of expression, so the templates better match people's usual expression habits, are of higher quality, and can generate higher-quality text.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. The above and other features and advantages will become more apparent to those skilled in the art by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:
FIG. 1 is a flowchart of a method for generating a template according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of another method for generating a template provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating a method for generating a template according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of some steps in another method for generating a template according to an embodiment of the disclosure;
FIG. 5 is a flowchart of some steps in another method for generating a template according to an embodiment of the disclosure;
FIG. 6 is a block diagram illustrating an apparatus for generating a template according to an embodiment of the present disclosure;
FIG. 7 is a block diagram illustrating another apparatus for generating a template according to an embodiment of the present disclosure.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present disclosure, the method and apparatus for generating a template, the electronic device, and the computer readable medium provided in the present disclosure are described in detail below with reference to the accompanying drawings.
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but they may be embodied in different forms and should not be construed as limited to the embodiments set forth in the disclosure. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As used in this disclosure, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
When the terms "comprises," "comprising," "includes," and/or "made from …" are used in this disclosure, they specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Embodiments of the present disclosure may be described with reference to plan and/or cross-sectional views in light of idealized schematic illustrations of the present disclosure. Accordingly, the example illustrations can be modified in accordance with manufacturing techniques and/or tolerances.
Embodiments of the present disclosure are not limited to the embodiments shown in the drawings, but include modifications of configurations formed based on a manufacturing process. Thus, the regions illustrated in the figures have schematic properties, and the shapes of the regions shown in the figures illustrate specific shapes of regions of elements, but are not intended to be limiting.
Unless otherwise defined, all terms (including technical and scientific terms) used in this disclosure have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Explanation of terms
In the embodiments of the present disclosure, unless otherwise specified, the following technical terms should be understood as follows:
Text refers to information composed of words and symbols that expresses a certain meaning, such as a phrase (e.g., a title), a sentence, or a paragraph.
A word is a unit of text that expresses an independent, definite meaning; a word may be a single character, a phrase, and so on.
Text generation means generating fluent, coherent text from existing information; the generated text may be a phrase (such as a title), a sentence, a paragraph, and so on, and it may be unconstrained or may have a specific form and specific content belonging to a specific field.
Template, slot, slot type, lexicon, and keyword: a template is a model for generating text; each template includes fixed content (fixed text) and slots; each slot has a type, indicating that a word of the corresponding type should be filled into it; each slot type corresponds to a lexicon containing a number of keywords that can be filled into slots of that type; when text is generated, each slot is filled with a keyword from the lexicon of its type, producing a text composed of the fixed content and the keywords.
A text has fixed content matching a template when the text conforms to the form of the template (i.e., matches the template): the text contains the same words as the template's fixed content, and at each position corresponding to a slot of the template it contains words that do not belong to the fixed content; these are the words of the corresponding slots.
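One way to test whether a text has fixed content matching a template is to compile the template into a regular expression in which the fixed content must appear verbatim and each slot matches a non-empty run of characters. This is an assumed implementation strategy, not one stated in the disclosure; `Slot`, `to_regex`, and `match` are illustrative names, and templates are assumed to be tuples of fixed strings and `Slot` objects.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    type: str


def to_regex(template):
    # Fixed content must appear verbatim; each slot holds at least one character.
    return re.compile("".join("(.+?)" if isinstance(p, Slot) else re.escape(p)
                              for p in template))


def match(template, text):
    """Return [(slot type, word), ...] if the text has fixed content matching
    the template, else None."""
    m = to_regex(template).fullmatch(text)
    if m is None:
        return None
    slot_types = [p.type for p in template if isinstance(p, Slot)]
    return list(zip(slot_types, m.groups()))
```

Non-greedy groups give earlier slots the shortest filler that still lets the whole text match; if fixed content can also occur inside a slot word, a text may be split differently than intended.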
Fig. 1 is a flowchart of a method of generating a template according to an embodiment of the present disclosure.
In a first aspect, an embodiment of the present disclosure provides a method for generating a template.
The templates generated by the method of the embodiment of the disclosure are used for generating texts, that is, each generated template includes fixed content (fixed text) and a Slot (Slot), each Slot has a certain type, and when a text is generated, the Slot is filled with a keyword in a lexicon corresponding to the type of the Slot, so that a text composed of the fixed content and the keyword is obtained.
Referring to fig. 1, a method for generating a template provided by an embodiment of the present disclosure includes:
S101, determining the slot types and a lexicon for each slot type, where each lexicon includes at least one keyword.
The slot types that may appear in templates are set manually, such as [adjective slot], [verb slot], and so on (hereinafter, square brackets "[ ]" denote a slot).
For example, if there are n slot types in total, all of them form a set S = {s1, s2, s3, …, sn}, where si is the slot of the i-th type (i ≤ n).
Meanwhile, some keywords (initial keywords) are set manually in the lexicon of each slot type; for example, the lexicon corresponding to the [adjective slot] is set to include the keyword "great".
S102, template expansion step: search a preset training text library for candidate texts that include at least one keyword, and generate templates from at least part of the candidate texts, where, in the template generated from a candidate text, the content other than the keywords is converted into fixed content, and the position of each keyword is converted into a slot of the type corresponding to the lexicon containing that keyword.
The training text library includes a number of preset texts used to generate templates. Its specific content can be chosen according to the templates to be generated; for example, when templates for a specific field are to be generated (such as templates for legal documents or for comments), the training text library includes texts of that field (such as legal documents or comment articles).
In this step, templates are found using keywords: the keywords in the current lexicons (which may be only the initial keywords, or the initial keywords plus subsequently added ones) are used to search the training text library for possible templates.
Specifically, this step searches the training text library for the keywords; if a text (such as a sentence) includes at least one keyword, it is taken as a candidate text. Templates are then generated from at least part of the retrieved candidate texts: the content at the position of each keyword in a candidate text is replaced with a slot of the keyword's type, and the rest of the candidate text is kept as fixed content, yielding a template composed of fixed content and slots.
Clearly, when new keywords appear, new candidate texts may be found and new templates generated, so this step expands the set of templates by means of the new keywords.
In this step, the slot types included in each generated template form a set T = {sa1, sa2, …}, where sai is the i-th slot type included in the template. The set T is necessarily a subset of the set S, i.e., T ⊆ S.
For example, assume the training text library contains the text: "The great scientist Einstein published special relativity and made a significant contribution to the development of physics." If the lexicon corresponding to the [adjective slot] contains the keyword "great", the lexicon corresponding to the [name slot] contains "Einstein", and the lexicon corresponding to the [scientific theory slot] contains "special relativity", this text can be selected as a candidate text and converted into template 1:
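The conversion of a candidate text into a template can be sketched as below. It assumes lexicons map each slot type to a set of keywords, that each keyword belongs to only one lexicon, and that plain substring search locates keywords (a system for Chinese text might instead match against segmented words). `Slot` and `text_to_template` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    type: str


def text_to_template(text, lexicons):
    """Convert a candidate text into a template: every keyword occurrence becomes
    a Slot of the keyword's type; everything else stays as fixed content.
    Returns None when the text contains no keyword (it is not a candidate)."""
    word_type = {w: t for t, words in lexicons.items() for w in words}
    if not word_type:
        return None
    # Longer keywords first, so at any position a longer keyword is preferred.
    pattern = re.compile("|".join(
        re.escape(w) for w in sorted(word_type, key=len, reverse=True)))
    parts, pos = [], 0
    for m in pattern.finditer(text):
        if m.start() > pos:
            parts.append(text[pos:m.start()])  # fixed content before the keyword
        parts.append(Slot(word_type[m.group()]))
        pos = m.end()
    if not parts:
        return None
    if pos < len(text):
        parts.append(text[pos:])  # trailing fixed content
    return tuple(parts)
```

With lexicons for adjective, name, and theory slots, a sentence about Einstein becomes a template with three slots; adding a discipline lexicon containing "physics" turns the trailing "physics" into a fourth slot, giving a different template from the same candidate text.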
The [adjective slot] scientist [name slot] published [scientific theory slot] and made a significant contribution to the development of physics.
It should be understood that as the number of keywords grows, more candidate texts contain keywords, and more templates, with more slots, are obtained from them.
For example, if the lexicon corresponding to the [discipline slot] contains the keyword "physics" and the lexicon corresponding to the [adjective slot] contains the keyword "great", the above candidate text is converted into template 2:
The [adjective slot] scientist [name slot] published [scientific theory slot] and made a significant contribution to the development of [discipline slot].
It should be understood that template 1 and template 2 above, although derived from the same candidate text, are two different templates.
It should be understood that the number of templates actually obtained each time this step is executed depends on the specific way it is carried out. For example, all keywords may be used to search for candidate texts, in which case the generated templates may include existing ones, and identical templates are then merged; alternatively, only texts that include at least one new keyword may be used as candidate texts, ensuring that all generated templates are new.
In some embodiments, generating templates from at least part of the candidate texts includes: judging whether the template generated from a given candidate text satisfies a preset second condition, and if so, generating the template from that candidate text; the second condition includes:
the number of slot types included in the template generated from the candidate text is greater than or equal to a third threshold;
and/or
the number of texts in the training text library whose fixed content matches the template generated from the candidate text is greater than or equal to a fourth threshold.
Since the number of candidate texts containing keywords is usually large and most of them are unsuitable for generating templates, templates may be generated only from candidate texts that satisfy the second condition.
Specifically, the second condition may include "the number of slot types included in the template generated from the candidate text is greater than or equal to a third threshold". That is, after a candidate text is converted into a template, the candidate text is actually used to generate the template only if the number of slot types in that template reaches a predetermined value (for example, half the total number of slot types); this ensures that every template finally obtained has multiple fillable positions and can generate many different texts.
Specifically, the second condition may include "the number of texts in the training text library whose fixed content matches the template generated from the candidate text is greater than or equal to a fourth threshold". That is, the candidate text is actually used to generate the template only if, after it is converted into a template, enough texts whose fixed content matches that template can be found in the training text library; this ensures that the resulting template is a common mode of expression.
Of course, it should be understood that the second condition may include only one of the above two requirements, or both.
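The two requirements of the second condition can be checked as sketched below. Passing `None` for a threshold switches that requirement off, mirroring the "and/or" wording. Matching uses an assumed regular-expression strategy (fixed content verbatim, each slot a non-empty run of characters), and because the candidate text itself is in the library, it counts as one match. `Slot`, `count_matching_texts`, and `meets_second_condition` are illustrative names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slot:
    type: str


def count_matching_texts(template, library):
    """Number of library texts whose fixed content matches the template."""
    regex = re.compile("".join("(.+?)" if isinstance(p, Slot) else re.escape(p)
                               for p in template))
    return sum(regex.fullmatch(doc) is not None for doc in library)


def meets_second_condition(template, library,
                           third_threshold: Optional[int] = None,
                           fourth_threshold: Optional[int] = None) -> bool:
    """A threshold of None disables that requirement, so either requirement
    can be used alone or both together."""
    if third_threshold is not None:
        slot_types = {p.type for p in template if isinstance(p, Slot)}
        if len(slot_types) < third_threshold:
            return False
    if fourth_threshold is not None:
        if count_matching_texts(template, library) < fourth_threshold:
            return False
    return True
```

Checking the cheap slot-type count first avoids scanning the library for templates that would be rejected anyway.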
S103, keyword expansion step: searching a matching text with fixed content matched with the template in the training text library, taking a word of a slot position in the corresponding template in the matching text as a candidate word of the slot position of the type, and adding at least part of the candidate word as a keyword into the word library corresponding to the slot position of the type.
After some templates are obtained, texts (matched texts) matched with the fixed contents of the templates are continuously searched in the training text base, the fixed contents of the matched texts are the same as the templates, certain words (candidate words) are necessarily arranged at positions corresponding to the slots in the templates, and the candidate words are probably keywords corresponding to the word bases of the slots of the types, so that at least part of the candidate words can be used as keywords to be typed into the corresponding word bases.
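The collection of candidate words in the keyword expansion step can be sketched as follows, assuming templates are tuples of fixed strings and `Slot` objects and that matching is done with regular expressions (fixed content verbatim, each slot a non-empty run of characters). The sketch counts (slot type, word) pairs across all matching texts so the counts can later feed a frequency threshold; `Slot` and `collect_candidate_words` are hypothetical names.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    type: str


def collect_candidate_words(templates, library):
    """Count how often each word fills a slot of each type across all texts
    in the training library that match a template. Keys are (slot type, word)."""
    counts = Counter()
    for template in templates:
        regex = re.compile("".join("(.+?)" if isinstance(p, Slot) else re.escape(p)
                                   for p in template))
        slot_types = [p.type for p in template if isinstance(p, Slot)]
        for doc in library:
            m = regex.fullmatch(doc)
            if m:  # doc is a matching text; its slot words are candidate words
                counts.update(zip(slot_types, m.groups()))
    return counts
```

Texts that do not match any template contribute nothing, so only words that appear in a template-shaped context become candidates.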
Obviously, when a new template exists, a new keyword is likely to be found, so that the step can realize the expansion of the keyword through the new template.
In order to avoid the word stock from being too large, an upper limit value can be set for the number of keywords in each word stock (or the total number of keywords in all the word stocks) according to the scale of the training text stock.
Illustratively, after the scientist [ named slot ] who gets the above template "[ adjective slot ] publishes [ scientific theory slot ], and makes a significant contribution to the development of physics ], if the training text library includes a text" famous scientist newton publishes three laws of newtons, and makes a significant contribution to the development of physics ", the text will be matched as a matching text, and the word" famous "therein can be selected as a keyword corresponding to [ named slot ], and" newton "can be selected as a keyword corresponding to [ named slot ].
In other words, the lexicon for the [adjective slot] gains the new keyword "famous" and the lexicon for the [name slot] gains the new keyword "Newton", so the lexicons are expanded.
Naturally, the more templates exist, the more matching texts can be found, and the more keywords can be discovered.
For example, if the training text library contains the text "The famous scientist Newton published Newton's three laws and made an important contribution to the development of mechanics", it does not match the template "The [adjective slot] scientist [name slot] published [scientific theory slot] and made a significant contribution to the development of physics" above. However, if there is also a template "The [adjective slot] scientist [name slot] published [scientific theory slot] and made an [adjective slot] contribution to the development of [discipline slot]", the text matches that template, yielding the keyword "mechanics" for the [discipline slot] and the keyword "important" for the [adjective slot].
Depending on how the step is carried out, the number of keywords actually obtained in each execution also differs. For example, all templates can be used for matching, in which case some of the keywords found may duplicate existing keywords and the duplicates are merged; alternatively, only the new templates may be used for matching, in which case most of the keywords found are new.
In some embodiments, adding at least some of the candidate words as keywords to the lexicon corresponding to the type of slot includes: determining whether each candidate word satisfies a preset third condition and, if so, adding that candidate word as a keyword to the lexicon corresponding to the type of slot. The third condition includes: the number of times the word has been taken as a candidate word for that type of slot is greater than or equal to a fifth threshold.
Not every word selected as a candidate word is suitable as a keyword, so only candidate words that satisfy the third condition may be taken as keywords.
Specifically, the third condition may be that "the number of times the word has been taken as a candidate word for that type of slot is greater than or equal to a fifth threshold". That is, if a word is identified as a candidate word for the same type of slot in several different matching texts, it is a relatively common expression for that type of slot and is suitable for adding to the corresponding lexicon as a keyword.
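A sketch of the third-condition check follows. It assumes the keyword expansion step reports each observation as a (matching text, slot type, candidate word) triple and, following the wording "in several different matching texts", counts distinct matching texts per (slot type, word) pair, so one text matched by several templates is counted once. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def select_keywords(observations, fifth_threshold):
    """observations: iterable of (matching_text, slot_type, candidate_word).

    A candidate word becomes a keyword of a slot type when it fills that slot
    type in at least `fifth_threshold` distinct matching texts.
    """
    texts_per_pair = defaultdict(set)
    for text, slot_type, word in observations:
        texts_per_pair[(slot_type, word)].add(text)
    keywords = defaultdict(set)
    for (slot_type, word), texts in texts_per_pair.items():
        if len(texts) >= fifth_threshold:
            keywords[slot_type].add(word)
    return dict(keywords)
```

Raising the threshold trades recall for precision: rare, possibly accidental fillers of a slot are kept out of the lexicon.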
The method of the embodiments of the present disclosure can therefore generate templates (and keywords) by itself without manual intervention. Template generation is efficient and inexpensive; the quality of the generated templates does not depend on the skill of an operator and is free of human error; the quality requirements on the training text library are low; and the resulting templates, as well as the texts generated from them, are of good quality.
Human language follows certain regularities: the same topic is often expressed in a few fixed ways (using the same sentence pattern or the same words). For example, article titles generally use a summarizing style, and comments on an issue tend to use a few common expressions for negative opinions and a few for positive ones.
Common expressions therefore recur in texts. Moreover, within one expression, part of the content is often the same (corresponding to the fixed content of a template), while the other part is chosen from several specific alternatives depending on the situation (corresponding to the slots of a template). By setting conditions (such as the second and third conditions) during the generation of templates and keywords, the generated templates and keywords can be kept in line with common ways of expression, so that the templates better match people's general expression habits, are of higher quality, and generate higher-quality text.
Fig. 2 is a flow chart of another method of generating a template according to an embodiment of the present disclosure.
In some embodiments, referring to fig. 2, after the above keyword expansion step (S103), the method further includes:
S104: determining whether a preset first condition is satisfied and, if not, returning to the template expansion step (S102).
That is, after the template expansion step and the keyword expansion step have produced new templates and new keywords, it is determined whether the preset first condition is currently satisfied:
if not, the template generation process is not finished, so the method returns to step S102. When step S102 is performed again, more new templates can be generated from the keywords newly found in the preceding step S103, and when step S103 is then performed again, more new keywords can be obtained from those new templates.
If so, the template generation process is finished and the method can end.
Thus, step S102 generates a certain number of new templates, and step S103 obtains some new keywords from them. When the method returns to step S102, the number of keywords has increased (some keywords were added in the preceding step S103), so step S102 can obtain further new templates; and when step S103 is performed again, the number of templates has increased (some templates were added in the preceding step S102), so new keywords can continue to be obtained.
Thus, referring to fig. 3, with the method of the embodiments of the present disclosure, only the preset slot types and a small number of initial keywords are needed to "cold start" steps S102 and S103. Performing steps S102 and S103 repeatedly then forms a "template expansion, keyword expansion" loop in which the numbers of templates and keywords grow in turn, iteratively; the loop ends, and template generation is complete, once enough templates and keywords have been obtained.
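The cold-started loop of steps S102 to S104 can be sketched end to end as below. This is a simplified illustration under stated assumptions, not the disclosed implementation: templates use the hypothetical `[slot-type]` notation, keywords are located by whole-word regex matching (longest keyword first), the second condition is reduced to the slot-type count, the third condition to a plain occurrence count, and the first condition to "no new templates and no new keywords in the latest round". All function names and default thresholds are hypothetical.

```python
import re
from collections import Counter

# Hypothetical notation: a slot is written as [slot-type] inside a template string.
SLOT_PATTERN = re.compile(r"\[([^\]]+)\]")


def make_template(text, lexicon):
    """S102 sketch: keyword occurrences become slots, everything else stays fixed."""
    slot_of = {kw: slot for slot, kws in lexicon.items() for kw in kws}
    if not slot_of:
        return text, set()
    # Longest keywords first, so a longer keyword wins over one that is its prefix.
    # \b assumes keywords begin and end with word characters.
    alternation = "|".join(re.escape(kw) for kw in sorted(slot_of, key=len, reverse=True))
    used = set()

    def to_slot(match):
        slot = slot_of[match.group(0)]
        used.add(slot)
        return f"[{slot}]"

    return re.sub(rf"\b(?:{alternation})\b", to_slot, text), used


def extract_candidates(template, corpus):
    """S103 sketch: words filling the slot positions of texts with matching fixed content."""
    parts = SLOT_PATTERN.split(template)
    pattern = re.compile("(.+?)".join(re.escape(p) for p in parts[0::2]))
    for text in corpus:
        match = pattern.fullmatch(text)
        if match:
            yield from zip(parts[1::2], match.groups())


def generate(corpus, seed_lexicon, third_threshold=2, fifth_threshold=1, max_rounds=20):
    lexicon = {slot: set(kws) for slot, kws in seed_lexicon.items()}
    templates = set()
    for _ in range(max_rounds):
        # S102: template expansion, keeping templates with enough slot types.
        new_templates = set()
        for text in corpus:
            template, used = make_template(text, lexicon)
            if len(used) >= third_threshold and template not in templates:
                new_templates.add(template)
        templates |= new_templates
        # S103: keyword expansion, keeping words that fill a slot often enough.
        counts = Counter(pair for t in templates for pair in extract_candidates(t, corpus))
        new_keywords = 0
        for (slot, word), n in counts.items():
            if n >= fifth_threshold and word not in lexicon.setdefault(slot, set()):
                lexicon[slot].add(word)
                new_keywords += 1
        # S104: stop once a round adds neither templates nor keywords.
        if not new_templates and new_keywords == 0:
            break
    return templates, lexicon


corpus = [
    "The famous scientist Newton made a significant contribution to physics.",
    "The great scientist Einstein made a significant contribution to physics.",
    "The great scientist Einstein published the theory of relativity.",
]
templates, lexicon = generate(corpus, {"adjective": {"famous"}, "name": {"Newton"}})
for template in sorted(templates):
    print(template)
```

Seeded only with "famous" and "Newton", the first round turns the first text into a template, keyword expansion then finds "great" and "Einstein" in the second text, and the second round uses those keywords to turn the third text into a new template; the third round finds nothing new and the loop stops.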
In some embodiments, the above first condition comprises:
in the template expansion steps of the most recent predetermined number of iterations, the number of new templates generated is less than or equal to a first threshold;
in the keyword expansion steps of the most recent predetermined number of iterations, the number of candidate words that became new keywords is less than or equal to a second threshold.
When the growth in the numbers of templates and keywords over the most recent predetermined number of iterations (one or more) stays at or below the thresholds (equivalently, their growth rate is below a threshold), the system can be considered to have stabilized and template generation to be essentially complete, so the loop can end.
The predetermined number of iterations and the thresholds can be set as required. For example, the first condition may be satisfied when neither templates nor keywords increase at all in one iteration (i.e., the thresholds are 0), or when the increases in templates and keywords stay below a non-zero value over several iterations.
The specific first condition above is only an example; whether to continue the loop can also be decided from other parameters. For example, the first condition may be that "the numbers of templates and keywords found reach predetermined values", or that "the number of times the template expansion step and the keyword expansion step have been performed reaches a predetermined value", and so on; these are not described in detail here.
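The first condition, together with the alternative stopping rules just mentioned, could be checked as follows. Whether the thresholds apply to each of the last few iterations or to their total is not fixed by the text; this sketch assumes the per-iteration reading, and all names are hypothetical.

```python
def first_condition_met(history, rounds, first_threshold, second_threshold):
    """history[i] = (new_templates, new_keywords) produced by iteration i.

    Assumed reading: each of the last `rounds` iterations must stay at or
    below both thresholds (a cumulative reading would sum them instead).
    """
    if len(history) < rounds:
        return False  # not enough iterations observed yet
    return all(
        new_templates <= first_threshold and new_keywords <= second_threshold
        for new_templates, new_keywords in history[-rounds:]
    )


def alternative_condition_met(num_templates, num_keywords, iterations,
                              target_templates, target_keywords, max_iterations):
    """Alternative rules: target sizes reached, or iteration budget spent."""
    reached_targets = num_templates >= target_templates and num_keywords >= target_keywords
    return reached_targets or iterations >= max_iterations
```

Setting `rounds=1` and both thresholds to 0 reproduces the simplest variant, "nothing new in the last iteration".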
FIG. 4 is a flow chart of another method of generating a template according to an embodiment of the present disclosure.
In some embodiments, referring to fig. 4, after the template expansion step (S102), the method further includes:
S105: classifying the templates according to the slots in them.
The templates obtained above can be clustered according to their slots to obtain several classes of templates, where each class contains different ways of expressing the same content. When text is to be generated, templates of a suitable class can then be selected as required, yielding high-quality text that better meets the requirements.
Templates can be classified by their slots in various ways, for example according to the types of the slots. For convenience, a number is assigned to each type of slot as follows:
[adjective slot]: 1
[name slot]: 2
[scientific theory slot]: 3
[discipline slot]: 4
Thus, the slot distribution of the template "The [adjective slot] scientist [name slot] published [scientific theory slot] and made a significant contribution to the development of physics" can be summarized as 1-2-3, and that of the template "The [adjective slot] scientist [name slot] published [scientific theory slot] and made an [adjective slot] contribution to the development of [discipline slot]" as 1-2-3-1-4. The slot types in the two templates follow similar distributions, so the two templates can be placed in the same class.
The specific way of classifying templates by their slots may also vary. For example, templates that contain a slot of a certain type may be placed in one class, or templates with the same number of slots may be placed in one class; these options are not described in detail here.
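A sketch of step S105 using the slot numbering above: each template is reduced to its sequence of slot-type codes, and templates are grouped by a key function computed from that sequence. The bracketed slot notation, the code table and the function names are assumptions. The key functions implement the two grouping rules just mentioned (containing a given slot type, same number of slots) plus grouping by the set of slot types; a similarity measure between whole sequences is left open because the text does not define one.

```python
import re
from collections import defaultdict

SLOT_PATTERN = re.compile(r"\[([^\]]+)\]")
# Codes follow the numbering listed above; the slot names are a hypothetical notation.
SLOT_CODES = {"adjective": 1, "name": 2, "scientific theory": 3, "discipline": 4}


def slot_sequence(template):
    """Slot-type codes in the order the slots appear, e.g. (1, 2, 3)."""
    return tuple(SLOT_CODES[slot] for slot in SLOT_PATTERN.findall(template))


def classify(templates, key):
    """Group templates by an arbitrary key computed from their slots."""
    groups = defaultdict(list)
    for template in templates:
        groups[key(template)].append(template)
    return dict(groups)


# Grouping rules written as key functions.
def slot_count(template):
    return len(slot_sequence(template))


def slot_type_set(template):
    return frozenset(slot_sequence(template))


def has_name_slot(template):
    return SLOT_CODES["name"] in slot_sequence(template)
```

Keeping the grouping rule as a plain key function lets the same `classify` call serve every rule, including a manual subdivision of slot types.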
Slots of the same type may also be subdivided further manually. For example, for the [adjective slot], in some templates the slot originally held an adjective describing a scientist (i.e., the word in the candidate text from which the template was generated), while in other templates it held an adjective describing a politician; templates containing this slot can therefore be placed in different classes according to the kind of word the slot originally held.
This classification by the words a slot originally held can also be achieved at a finer level through how the slot types and initial keywords are set. For example, a [scientist adjective slot] and a [politician adjective slot] can be defined directly, and the lexicon of each can be given the initial keywords typically used to describe scientists and politicians, respectively.
FIG. 5 is a flow chart of another method of generating a template according to an embodiment of the present disclosure.
In some embodiments, referring to fig. 5, after the above keyword expansion step (S103), the method further includes:
S1061: filling the keywords in the corresponding lexicon into each type of slot in the template to obtain an intermediate text, in which each keyword and the text adjacent to it form a description combination.
After some templates and keywords have been obtained, the corresponding keywords can be filled into each slot of a template to obtain an intermediate text, in which each keyword and the text adjacent to it (either fixed content or another keyword) form a description combination.
S1062: retrieving each description combination in the training text library; a description combination whose number of retrieval results is greater than or equal to a sixth threshold is a common description combination.
Even keywords of the same type (keywords in the lexicon for the same type of slot) differ in what they can be combined with: some can be combined with certain other content while others cannot. For example, "great" and "significant" are both keywords for the [adjective slot], yet "great scientist Einstein" is a reasonable combination while "significant scientist Einstein" is not.
Therefore, after the keywords are filled in to form the description combinations, each combination can be searched for in the training text library. Only combinations whose retrieval results reach the preset number are considered correct (common description combinations); a combination whose retrieval results fall short of that number is considered incorrect.
S1063: taking the intermediate texts in which all description combinations are common description combinations as the texts generated from the template.
In some intermediate texts, some or all of the description combinations may be incorrect; such intermediate texts are then incorrect as well and should not be used as final results of text generation. Only intermediate texts whose description combinations are all correct are therefore taken as the result of text generation.
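Steps S1061 to S1063 can be sketched as below. Two points are assumptions rather than details from the text: a description combination is taken to be the keyword plus one neighbouring word of fixed content on each side, and "retrieval" is replaced by a plain substring count over the training text library, where a real system would query a search index. The `[slot-type]` notation and function names are hypothetical.

```python
import re
from itertools import product

SLOT_PATTERN = re.compile(r"\[([^\]]+)\]")


def intermediate_texts(template, lexicon):
    """S1061: fill every slot with every keyword of its type.

    Yields (intermediate_text, description_combinations).
    """
    parts = SLOT_PATTERN.split(template)
    fixed, slots = parts[0::2], parts[1::2]
    for choice in product(*(sorted(lexicon[s]) for s in slots)):
        pieces, combos = [fixed[0]], []
        for i, keyword in enumerate(choice):
            pieces += [keyword, fixed[i + 1]]
            # Assumed window: one word of fixed content on each side of the keyword.
            left = re.search(r"\S+\s*$", fixed[i])
            right = re.match(r"\s*\S+", fixed[i + 1])
            combos.append((left.group() if left else "") + keyword
                          + (right.group() if right else ""))
        yield "".join(pieces), combos


def generate_texts(template, lexicon, corpus, sixth_threshold):
    """S1062 + S1063: keep texts whose combinations are all common descriptions."""
    def hits(combo):  # stand-in for a retrieval engine: count texts containing combo
        return sum(combo in text for text in corpus)

    return [text for text, combos in intermediate_texts(template, lexicon)
            if all(hits(c) >= sixth_threshold for c in combos)]
```

For example, with the template "The [adjective] scientist [name] won the prize.", the adjectives "great" and "significant", the names "Curie" and "Einstein", and a library containing "The great scientist Einstein was born in Ulm." and "The scientist Curie won the prize twice.", a sixth threshold of 1 keeps only "The great scientist Curie won the prize.": every "significant" combination has no hits, and "scientist Einstein won" does not occur.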
In other words, when the templates obtained by the method of the embodiments of the present disclosure are used for text generation, incorrect description combinations can be filtered out by retrieval, improving the quality of the generated text.
In terms of implementation, steps S1061 to S1063 can be performed before any user actually requests text generation: all possible description combinations are traversed and all correct results are cached. When a user later requests text generation, the required text only needs to be selected from the cache and output, which greatly shortens the generation time perceived by the user and improves efficiency.
Alternatively, the required text can be generated on demand according to steps S1061 to S1063 each time a user requests text generation.
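The precompute-then-serve strategy, with on-demand generation as a fallback, could be organised as a small cache like the one below. The class is a hypothetical illustration; `generate` stands for any function that runs steps S1061 to S1063 for one template and returns its texts.

```python
class PrecomputedTexts:
    """Hypothetical cache of generated texts, keyed by template."""

    def __init__(self, generate):
        self._generate = generate  # runs S1061-S1063 for a single template
        self._cache = {}

    def warm_up(self, templates):
        """Generate and store texts before any user request arrives."""
        for template in templates:
            self._cache[template] = self._generate(template)

    def texts_for(self, template):
        """Serve from the cache; generate on demand if the template was not warmed up."""
        if template not in self._cache:
            self._cache[template] = self._generate(template)
        return self._cache[template]
```

Warming up moves the cost of retrieval out of the request path; the on-demand branch covers templates added after the warm-up.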
Fig. 6 is a block diagram of an apparatus for generating a template according to an embodiment of the present disclosure.
In a second aspect, an embodiment of the present disclosure provides an apparatus for generating a template, referring to fig. 6, which includes:
a setting module configured to determine multiple types of slots and the lexicon corresponding to each type of slot, where the lexicon corresponding to each type of slot includes at least one keyword;
a template expansion module configured to search a preset training text library for candidate texts that include at least one keyword and to generate a template from at least some of the candidate texts, where the content of a candidate text other than the keywords becomes the fixed content of the generated template, and the position of each keyword in the candidate text becomes, in the generated template, a slot of the type corresponding to the lexicon that contains the keyword;
and a keyword expansion module configured to search the training text library for matching texts whose fixed content matches a template, take the word in a matching text at the position of a slot in the corresponding template as a candidate word for that type of slot, and add at least some of the candidate words as keywords to the lexicon corresponding to that type of slot.
In some embodiments, referring to fig. 7, the apparatus further comprises:
a judging module configured to determine whether a preset first condition is satisfied and, if not, to cause the template expansion module to start working.
In some embodiments, the first condition comprises:
in the most recent predetermined number of runs of the template expansion module, the number of new templates generated is less than or equal to a first threshold;
in the most recent predetermined number of runs of the keyword expansion module, the number of candidate words that became new keywords is less than or equal to a second threshold.
In some embodiments, the template expansion module is configured to determine whether the template generated from a candidate text satisfies a preset second condition and, if so, to generate the template from that candidate text; the second condition includes:
the number of slot types included in the template generated from the candidate text is greater than or equal to a third threshold.
In some embodiments, the template expansion module is configured to determine whether the template generated from a candidate text satisfies a preset second condition and, if so, to generate the template from that candidate text; the second condition includes:
the number of texts in the training text library whose fixed content matches that of the template generated from the candidate text is greater than or equal to a fourth threshold.
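The two requirements of the second condition can be checked independently, as sketched below; passing `None` for a threshold disables that requirement, reflecting that the second condition may include either requirement or both. The `[slot-type]` notation, the regex-based notion of matching fixed content and the function names are assumptions.

```python
import re

SLOT_PATTERN = re.compile(r"\[([^\]]+)\]")


def matching_text_count(template, corpus):
    """Number of texts whose fixed content matches the template's fixed content."""
    fixed = SLOT_PATTERN.split(template)[0::2]
    pattern = re.compile("(.+?)".join(re.escape(p) for p in fixed))
    return sum(1 for text in corpus if pattern.fullmatch(text))


def second_condition(template, corpus, third_threshold=None, fourth_threshold=None):
    """True if the template passes every enabled requirement."""
    slot_types = set(SLOT_PATTERN.findall(template))
    if third_threshold is not None and len(slot_types) < third_threshold:
        return False
    if fourth_threshold is not None and matching_text_count(template, corpus) < fourth_threshold:
        return False
    return True
```

The slot-type check is cheap and can run first; the matching-text count scans the library and is the costlier of the two.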
In some embodiments, the keyword expansion module is configured to determine whether each candidate word satisfies a third condition and, if so, to add that candidate word as a keyword to the lexicon corresponding to that type of slot; the third condition includes:
the number of times the candidate word is taken as a candidate word for the slot of the type is greater than or equal to a fifth threshold.
In some embodiments, referring to fig. 7, the apparatus further comprises:
a classification module configured to classify the templates according to the slots in them.
In some embodiments, referring to fig. 7, the apparatus further comprises a generation module configured to:
fill the keywords in the corresponding lexicon into each type of slot in a template to obtain intermediate texts, in each of which every keyword and the text adjacent to it form a description combination;
retrieve each description combination in the training text library and take the description combinations whose number of retrieval results is greater than or equal to a sixth threshold as common description combinations;
take the intermediate texts in which all description combinations are common description combinations as the texts generated from the template.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
one or more processors;
a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement any of the above methods of generating a template.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium storing a computer program which, when executed by a processor, implements any of the above methods for generating a template.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. 
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
The present disclosure has disclosed example embodiments and, although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (12)

1. A method of generating a template, the template being a template for text generation, the template comprising: a plurality of slots and fixed content; the method comprises the following steps:
determining multiple types of slots and the lexicon corresponding to each type of slot, wherein the lexicon corresponding to each type of slot comprises at least one keyword;
a template expansion step: searching a preset training text library for candidate texts comprising a plurality of keywords, and generating the template from at least some of the candidate texts, wherein content of a candidate text other than the keywords is converted into fixed content of the template generated from that candidate text, and the position of each keyword in the candidate text is converted, in that template, into a slot of the type corresponding to the lexicon containing the keyword;
and a keyword expansion step: searching the training text library for matching texts whose fixed content matches the template, taking the word in a matching text that corresponds to a slot in the template as a candidate word for that type of slot, and adding at least some of the candidate words as keywords to the lexicon corresponding to that type of slot;
after the keyword expansion step, the method further comprises:
determining whether a preset first condition is satisfied and, if not, returning to the template expansion step;
after the keyword expansion step, the method further comprises:
filling the keywords in the corresponding lexicon into each type of slot in the template to obtain an intermediate text, wherein each keyword and the text adjacent to it form a description combination in the intermediate text;
retrieving each description combination in the training text library, and taking the description combinations whose number of retrieval results is greater than or equal to a sixth threshold as common description combinations;
taking the intermediate texts in which all description combinations are common description combinations as the texts generated by the template;
wherein generating the template from at least some of the candidate texts comprises: determining whether the template generated from a candidate text satisfies a preset second condition and, if so, generating the template from that candidate text; the second condition comprises:
the number of slot types included in the template generated from the candidate text is greater than or equal to a third threshold.
2. The method of claim 1, wherein the first condition comprises:
in the template expansion steps of the most recent predetermined number of iterations, the number of new templates generated is less than or equal to a first threshold;
in the keyword expansion steps of the most recent predetermined number of iterations, the number of candidate words that became new keywords is less than or equal to a second threshold.
3. The method of claim 1, wherein generating the template from at least some of the candidate texts comprises: determining whether the template generated from a candidate text satisfies a preset second condition and, if so, generating the template from that candidate text; the second condition comprises:
in the training text library, the number of texts whose fixed content matches that of the template generated from the candidate text is greater than or equal to a fourth threshold.
4. The method of claim 1, wherein adding at least some of the candidate words as keywords to the lexicon corresponding to the type of slot comprises: determining whether each candidate word satisfies a third condition and, if so, adding that candidate word as a keyword to the lexicon corresponding to that type of slot; the third condition comprises:
the number of times the word has been taken as a candidate word for that type of slot is greater than or equal to a fifth threshold.
5. The method of claim 1, further comprising, after the template expansion step:
classifying the templates according to the slots in each template.
6. An apparatus for generating a template, the template comprising: a plurality of slots and fixed content, comprising:
a setting module configured to determine multiple types of slots and the lexicon corresponding to each type of slot, wherein the lexicon corresponding to each type of slot comprises at least one keyword;
a template expansion module configured to search a preset training text library for candidate texts comprising a plurality of keywords and to generate the template from at least some of the candidate texts, wherein content of a candidate text other than the keywords is converted into fixed content of the template generated from that candidate text, and the position of each keyword in the candidate text is converted, in that template, into a slot of the type corresponding to the lexicon containing the keyword;
a keyword expansion module configured to search the training text library for matching texts whose fixed content matches the template, take the word in a matching text that corresponds to a slot in the template as a candidate word for that type of slot, and add at least some of the candidate words as keywords to the lexicon corresponding to that type of slot;
a judging module configured to determine whether a preset first condition is satisfied and, if not, to cause the template expansion module to start working;
a generation module configured to:
fill the keywords in the corresponding lexicon into each type of slot in the template to obtain an intermediate text, wherein each keyword and the text adjacent to it form a description combination in the intermediate text;
retrieve each description combination in the training text library, and take the description combinations whose number of retrieval results is greater than or equal to a sixth threshold as common description combinations;
take the intermediate texts in which all description combinations are common description combinations as the texts generated by the template;
wherein the template expansion module is configured to determine whether the template generated from a candidate text satisfies a preset second condition and, if so, to generate the template from that candidate text; the second condition comprises:
the number of slot types included in the template generated from the candidate text is greater than or equal to a third threshold.
7. The apparatus of claim 6, wherein the first condition comprises:
in the most recent predetermined number of runs of the template expansion module, the number of new templates generated is less than or equal to a first threshold;
in the most recent predetermined number of runs of the keyword expansion module, the number of candidate words that became new keywords is less than or equal to a second threshold.
8. The apparatus of claim 6, wherein the template expansion module is configured to determine whether the template generated from a candidate text satisfies a preset second condition and, if so, to generate the template from that candidate text; the second condition comprises:
the number of texts in the training text library whose fixed content matches that of the template generated from the candidate text is greater than or equal to a fourth threshold.
9. The apparatus of claim 6, wherein the keyword expansion module is configured to determine whether each candidate word satisfies a third condition and, if so, to add that candidate word as a keyword to the lexicon corresponding to that type of slot; the third condition comprises:
the number of times the word has been taken as a candidate word for that type of slot is greater than or equal to a fifth threshold.
10. The apparatus of claim 6, further comprising:
a classification module configured to classify the templates according to the slots in them.
11. An electronic device, comprising:
one or more processors;
storage means having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to carry out the method according to any one of claims 1 to 5.
12. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 5.
CN201910356347.2A 2019-04-29 2019-04-29 Method and device for generating template, electronic equipment and computer readable medium Active CN110059163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910356347.2A CN110059163B (en) 2019-04-29 2019-04-29 Method and device for generating template, electronic equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910356347.2A CN110059163B (en) 2019-04-29 2019-04-29 Method and device for generating template, electronic equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN110059163A 2019-07-26
CN110059163B 2022-05-13

Family

ID=67321702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910356347.2A Active CN110059163B (en) 2019-04-29 2019-04-29 Method and device for generating template, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN110059163B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110609991B (en) * 2019-09-10 2023-09-19 卓尔智联(武汉)研究院有限公司 Text generation method, electronic device and storage medium
CN110827831A (en) * 2019-11-15 2020-02-21 广州洪荒智能科技有限公司 Voice information processing method, device, equipment and medium based on man-machine interaction
CN111159999B (en) * 2019-12-05 2023-04-18 中移(杭州)信息技术有限公司 Method and device for filling word slot, electronic equipment and storage medium
CN111400484B (en) * 2020-03-20 2023-06-02 支付宝(杭州)信息技术有限公司 Keyword extraction method and system
CN111488450A (en) * 2020-04-08 2020-08-04 北京字节跳动网络技术有限公司 Method and device for generating keyword library and electronic equipment
CN112560425B (en) * 2020-12-24 2024-04-09 北京百度网讯科技有限公司 Template generation method and device, electronic equipment and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN103268339A (en) * 2013-05-17 2013-08-28 中国科学院计算技术研究所 Recognition method and system of named entities in microblog messages

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN101369265A (en) * 2008-01-14 2009-02-18 北京百问百答网络技术有限公司 Method and system for automatically generating semantic template of problem
CN101576907A (en) * 2009-03-03 2009-11-11 杜小勇 System and method for acquiring product parameters
CN103186509B (en) * 2011-12-29 2016-03-30 北京百度网讯科技有限公司 The extensive method and apparatus of asterisk wildcard class template, the extensive method and system of common template
CN103324622A (en) * 2012-03-21 2013-09-25 北京百度网讯科技有限公司 Method and device for automatic generating of front page abstract
US10037360B2 (en) * 2016-06-20 2018-07-31 Rovi Guides, Inc. Approximate template matching for natural language queries

Also Published As

Publication number Publication date
CN110059163A (en) 2019-07-26

Similar Documents

Publication Publication Date Title
CN110059163B (en) Method and device for generating template, electronic equipment and computer readable medium
US8150822B2 (en) On-line iterative multistage search engine with text categorization and supervised learning
CN111324728A (en) Text event abstract generation method and device, electronic equipment and storage medium
CN108829780B (en) Text detection method and device, computing equipment and computer readable storage medium
Toda et al. A probabilistic approach for automatically filling form-based web interfaces
CN111444320A (en) Text retrieval method and device, computer equipment and storage medium
CN104484377B (en) Replace dictionary generating method and device
US20110282858A1 (en) Hierarchical Content Classification Into Deep Taxonomies
US10078629B2 (en) Tabular data compilation
CN103198149A (en) Method and system for query error correction
CN110134799B (en) BM25 algorithm-based text corpus construction and optimization method
CN110263127A (en) Text search method and device is carried out based on user query word
CN111753167B (en) Search processing method, device, computer equipment and medium
CN115422372A (en) Knowledge graph construction method and system based on software test
CN116151220A (en) Word segmentation model training method, word segmentation processing method and device
CN115248839A (en) Knowledge system-based long text retrieval method and device
CN109815328B (en) Abstract generation method and device
Emu et al. An efficient approach for keyphrase extraction from english document
Lin et al. Enhanced BERT-based ranking models for spoken document retrieval
CN111966869A (en) Phrase extraction method and device, electronic equipment and storage medium
CN111931041A (en) Label recommendation method and device, electronic equipment and storage medium
CN107609006B (en) Search optimization method based on local log research
CN110874408A (en) Model training method, text recognition device and computing equipment
CN115129890A (en) Feedback data map generation method and generation device, question answering device and refrigerator
CN115017267A (en) Unsupervised semantic retrieval method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant