CN110765753B - Document generation method, system, computer device and storage medium - Google Patents

Publication number
CN110765753B
Authority
CN
China
Prior art keywords
entity
document
information
knowledge graph
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911372226.3A
Other languages
Chinese (zh)
Other versions
CN110765753A (en
Inventor
胡盼盼
胡浩
赵茜
利啟东
高玮
杨超龙
黄聿
梁容铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Bozhilin Robot Co Ltd
Original Assignee
Guangdong Bozhilin Robot Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Bozhilin Robot Co Ltd
Priority to CN201911372226.3A
Publication of CN110765753A
Application granted
Publication of CN110765753B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a document generation method, a system, a computer device and a storage medium. The method comprises the following steps: acquiring a document first draft obtained by filling key information into a document template; identifying, through an entity sequence model, the entities in the document first draft that belong to a long-document knowledge graph; retrieving the knowledge graph of each entity from the long-document knowledge graph and integrating its information into natural language to obtain an expansion statement; and adding the expansion statement to the document first draft to generate the target document. Because the expansion statement is derived from a long-document knowledge graph of a designated field, its expansion direction is confined to that field; at the same time, the breadth of the long-document knowledge graph offers multiple expansion directions, which improves the efficiency of creating long documents.

Description

Document generation method, system, computer device and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a document generating method, a system, a computer device, and a storage medium.
Background
Writing a document manually often takes a great deal of time and effort to collect material and plan the writing; meeting the document requirements of different scenarios in this way is time-consuming and laborious.
In recent years, with the continuous development of artificial intelligence, intelligent writing has been applied in various fields and has improved writing efficiency. Intelligent writing can also generate articles from writing templates according to key variables in a specific field, for example weather forecasts, stock market reports and sports event reports. Writing templates make it possible to outline, in a targeted way, the expression style of articles in the field.
In real estate marketing, a large number of long documents are needed to help promote real estate projects. With a fixed set of writing templates for real estate long documents, only long documents in the few specified styles can be generated, and the resulting documents are largely rigid and formulaic, so the efficiency of long-document creation is low.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a document generation method, system, computer device and storage medium that address the above technical drawbacks, in particular the low efficiency of document creation.
A document generation method comprising the steps of:
acquiring a document first draft obtained by filling key information into a document template;
identifying, through an entity sequence model, the entities in the document first draft that belong to a long-document knowledge graph, wherein the entity sequence model is obtained by training on articles annotated with entities;
retrieving the knowledge graph of the entity from the long-document knowledge graph, and integrating the information of the knowledge graph of the entity into natural language to obtain an expansion statement;
and adding the expansion statement to the document first draft to generate a target document.
In one embodiment, the step of obtaining the document first draft obtained by filling the key information into the document template includes:
matching the document template according to variable information and key information input by a user; receiving supplementary information input by a user into the document template; and generating the document first draft according to the key information, the supplementary information and the document template.
In one embodiment, before the step of obtaining the document first draft obtained by filling the key information into the document template, the method further comprises:
labeling variable information for each document in the document database; dividing the document into content modules of various types, and extracting a writing template from each content module; marking the key information type for the content template according to the key information in the content template;
and matching the variable information and the key information input by the user against the labeled variable information and key information types to obtain the document template.
In one embodiment, before the step of retrieving the knowledge-graph of the entity from the long-document knowledge-graph, the method further includes:
performing named entity recognition and relation extraction on each document in a document database to obtain the entities and entity relations in the document;
and constructing the long-document knowledge graph according to the variable information and key information types labeled on the document database, the designated expertise, and the entities and entity relations in the documents.
In one embodiment, before the step of identifying, through the entity sequence model, the entities in the document first draft that belong to the long-document knowledge graph, the method further comprises:
inputting document samples annotated with the corresponding entities of the long-document knowledge graph into an initial entity sequence model to train the entity sequence model.
In one embodiment, the step of retrieving the knowledge graph of the entity from the long-document knowledge graph, and integrating the information of the knowledge graph of the entity into natural language to obtain an expansion statement includes:
retrieving a relation entity connected with the entity and the relation information between the entity and the relation entity, and acquiring the entity attributes of the relation entity; searching a database for the feature information associated with the entity, the relation information and the entity attributes; and integrating the feature information and the relation information into natural language to generate the expansion statement.
In one embodiment, the expansion statement comprises a first-level expansion statement;
the step of retrieving the knowledge graph of the entity from the long-document knowledge graph, and integrating the information of the knowledge graph of the entity into natural language to obtain an expansion statement comprises the following steps:
retrieving, from the long-document knowledge graph, the first-level relation entity directly connected with the entity and the first-level relation information between the first-level relation entity and the entity; acquiring the entity attributes of the entity and of the first-level relation entity in the long-document knowledge graph; and integrating the first-level relation entity, the first-level relation information and the entity attributes into natural language to obtain a first-level expansion statement.
In one embodiment, the expansion statement further comprises a multi-level expansion statement;
the step of retrieving the knowledge graph of the entity from the long-document knowledge graph, and integrating the information of the knowledge graph of the entity into natural language to obtain an expansion statement further comprises:
retrieving, from the long-document knowledge graph, the multi-level relation entity indirectly connected with the entity and the multi-level relation information between the multi-level relation entity and the entity; acquiring the entity attributes of the entity and of the multi-level relation entity in the long-document knowledge graph; and integrating the multi-level relation entity, the multi-level relation information and the entity attributes into natural language to obtain a multi-level expansion statement.
In one embodiment, after the step of retrieving the knowledge-graph of the entity from the long document knowledge-graph, the method further includes:
generating expansion suggestion information according to the entities and entity relations in the knowledge graph of the entity; returning the expansion suggestion information to the user, and receiving an expansion text input by the user according to the expansion suggestion information; and adding the expansion text to the document first draft to generate the target document.
In one embodiment, after the step of generating the target document, further comprising:
performing text error-correction checking and forbidden-word checking on the target document; if grammar errors exist in the target document, marking the grammar errors with underlines; if forbidden words exist in the target document, framing the forbidden words with boxes.
In one embodiment, the types of content modules include: one or more of properties, decorations, gardens, project planning, municipal adaptations, intellectualization, education, house description, product advantages, land, business, traffic, theme, instruction, gifts, and gifts and discounts.
In one embodiment, the document database stores real estate promotion documents, and the variable information includes one or more of advertisement purpose, project attributes, writing style and type.
In one embodiment, the types of entities in the long-document knowledge graph include one or more of a building district, a transportation facility, a school, a sports fitness facility, a hospital, a bank, a property, a decoration, a house.
In one embodiment, the types of entity relations include one or more of: the distance between the building cell and a transportation facility, school, hospital or bank; and the ancillary relation between the building cell and a sports fitness facility, property, decoration or house type.
A document generation system, comprising:
the acquisition module is used for acquiring a document first draft obtained by filling key information into a document template;
the identification module is used for identifying, through an entity sequence model, the entities in the document first draft that belong to a long-document knowledge graph, wherein the entity sequence model is obtained by training on articles annotated with entities;
the retrieving module is used for retrieving the knowledge graph of the entity from the long-document knowledge graph, and integrating the information of the knowledge graph of the entity into natural language to obtain an expansion statement;
and the generation module is used for adding the expansion statement to the document first draft to generate a target document.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the document generation method of any of the embodiments described above when the computer program is executed.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the document generation method of any of the embodiments described above.
According to the document generation method, system, computer device and storage medium described above, the entities in the document first draft are identified through the entity sequence model, so that expandable items are found on the basis of the long-document knowledge graph. The knowledge graph of each entity is retrieved, its information is integrated into an expansion statement, and the expansion statement is added to the document first draft to generate the expanded target document. The long-document knowledge graph confines the expansion direction to its designated field; at the same time, its breadth offers multiple expansion directions, which improves the efficiency of long-document creation.
Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice.
Drawings
The foregoing and/or additional aspects and advantages will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a document template in one embodiment;
FIG. 2 is a flow chart of a document generation method in one embodiment;
FIG. 3 is a schematic diagram of a knowledge graph in one embodiment;
FIG. 4 is a schematic diagram of a knowledge graph for a real estate project document application in one embodiment;
FIG. 5 is a schematic diagram of marking grammar errors and forbidden words in one embodiment;
FIG. 6 is a schematic diagram of a document template in yet another embodiment;
FIG. 7 is a diagram of expansion suggestion information in one embodiment;
FIG. 8 is a schematic diagram of a document generation system in one embodiment;
FIG. 9 is a schematic diagram of the internal structure of a computer device in one embodiment.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of illustrating the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Fig. 1 is a schematic diagram of a document template in one embodiment, in which "X" represents fixed text and an underline represents key information that needs to be supplied.
In one embodiment, as shown in Fig. 2, which is a flowchart of a document generation method in one embodiment, a document generation method is provided. The method may be applied to a computer device and may specifically include the following steps:
Step S210: acquiring a document first draft obtained by filling key information into a document template.
In this step, the document template may be selected by the user. After the key information to be filled into the document template is determined, it is filled into the corresponding positions of the template to obtain the document first draft. For example, the key information may be entered by the user to complete the document template, or it may be looked up in corresponding pre-stored data.
Specifically, in one embodiment, the step S210 of acquiring the document first draft obtained by filling key information into the document template may include:
S211: matching the document template according to the variable information and the key information input by the user.
The variable information and key information provided by the user are obtained and compared with the labels of the document templates, and the document templates with a high degree of correlation are matched. If several document templates are matched, they can be displayed to the user, and one or more document templates are determined according to the user's selection.
S212: receiving supplementary information for the document template input by the user.
The part to be supplemented is determined according to the key information and the matched document template, the user is prompted to input the supplementary information, and the supplementary information for the document template is received once the user has entered it.
S213: generating the document first draft according to the key information, the supplementary information and the document template.
The key information and the supplementary information are filled into the corresponding positions of the document template to generate a complete document first draft.
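Steps S211 to S213 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Template` class, the `{slot}` placeholder syntax, the label-overlap score and the sample labels are all assumptions introduced here for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Template:
    """A document template; labels and the {slot} syntax are illustrative assumptions."""
    labels: set
    text: str

    def slots(self):
        # Slot names in order of first appearance, e.g. {project_name}
        seen = []
        for name in re.findall(r"\{(\w+)\}", self.text):
            if name not in seen:
                seen.append(name)
        return seen


def match_templates(templates, user_labels, top_k=3):
    """S211: rank templates by the number of labels shared with the user's input."""
    scored = [(len(t.labels & user_labels), i, t) for i, t in enumerate(templates)]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, stable on ties
    return [t for _, _, t in scored[:top_k]]


def missing_slots(template, key_info):
    """S212: slots the user still has to supply as supplementary information."""
    return [s for s in template.slots() if s not in key_info]


def fill_template(template, key_info, supplementary):
    """S213: fill key and supplementary information into the template."""
    values = {**key_info, **supplementary}
    unfilled = [s for s in template.slots() if s not in values]
    if unfilled:
        raise ValueError(f"missing values for slots: {unfilled}")
    return re.sub(r"\{(\w+)\}", lambda m: str(values[m.group(1)]), template.text)


templates = [
    Template({"opening", "residential"}, "{project_name} is located in {district}, close to {landmark}."),
    Template({"discount"}, "Buy now at {project_name} and enjoy {offer}."),
]
best = match_templates(templates, {"residential", "opening"})[0]
key_info = {"project_name": "Sky Curtain", "district": "District A"}
todo = missing_slots(best, key_info)
draft = fill_template(best, key_info, {"landmark": "First Primary School"})
```

Here `todo` lists the slots that the user would be prompted for in S212, and `draft` is the resulting document first draft. A real system would likely score templates with something richer than label overlap.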
Step S220: identifying, through an entity sequence model, the entities in the document first draft that belong to the long-document knowledge graph, wherein the entity sequence model is obtained by training on articles annotated with entities.
In this step, the entities in the document first draft that can be classified into the long-document knowledge graph are identified, so that the expandable items on which document expansion can be based are found for the document first draft.
An identified entity may belong to the key information in the document first draft or to the document template in the document first draft; both the key information and the content of the document template can serve as expandable items for document expansion.
Each knowledge graph has a corresponding designated field, and the entities and entity relations in the long-document knowledge graph are feature information of that field. Because the long-document knowledge graph and the entity sequence model based on it are confined to the designated field, the expandable items are determined within that field and the expansion direction is effectively guided.
The entity sequence model is trained on articles annotated with entities. When the document first draft is input to the trained entity sequence model, the model outputs the entities in the document first draft. The entity sequence model may be trained using a Bi-LSTM+CRF model, a Bi-GRU+CRF model, or the CRF++ tool (Bi-LSTM: bidirectional long short-term memory network; CRF: conditional random field; Bi-GRU: bidirectional gated recurrent unit).
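Sequence-labeling models of this kind typically emit one tag per token. The sketch below shows only the post-processing that turns such tags into entities; it assumes a BIO tagging scheme and hypothetical tag names (`CELL`, `SUBWAY`), neither of which is stated in the source, and it does not train or run any model.

```python
def decode_bio(tokens, tags):
    """Group tokens into (entity_text, entity_type) pairs from BIO tags.

    A stray I- tag that does not continue the current entity is treated as outside.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the previous one if any
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities


tokens = ["Sky", "Curtain", "is", "near", "a", "subway", "station"]
tags = ["B-CELL", "I-CELL", "O", "O", "O", "B-SUBWAY", "I-SUBWAY"]
found = decode_bio(tokens, tags)
```

Each decoded entity type would then be looked up in the long-document knowledge graph to decide whether it is an expandable item.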
Step S230: retrieving the knowledge graph of the entity from the long-document knowledge graph, and integrating the information of the knowledge graph of the entity into natural language to obtain an expansion statement.
In this step, the knowledge graph of the entity is retrieved to obtain the entities and entity relations associated with that entity. The associated entities and their corresponding entity relations are then organized into an expansion statement according to natural language rules, which completes the expansion of the expandable item (the entity) along the expandable directions (the associated entities and entity relations). The long-document knowledge graph of the designated field limits the expansion direction of the expansion statement. Fig. 3 is a schematic diagram of a knowledge graph in one embodiment and shows the knowledge graph of entity 1; Fig. 4 is a schematic diagram of a knowledge graph for a real estate project document application in one embodiment and shows the knowledge graph of the entity "cell".
Step S240: adding the expansion statement to the document first draft to generate the target document.
In this step, the expansion statement is added to the document first draft to expand it.
The position at which an expansion statement is placed may be chosen by the user, and the user may also choose whether or not to add it to the document first draft. If an expansion statement does not read smoothly or lacks information, the user can modify it or add information.
According to this document generation method, the entities in the document first draft are identified through the entity sequence model, so that expandable items are found on the basis of the long-document knowledge graph. The knowledge graph of each entity is retrieved, its information is integrated into an expansion statement, and the expansion statement is added to the document first draft to generate the expanded target document. The long-document knowledge graph confines the expansion direction to its designated field; at the same time, its breadth offers multiple expansion directions, which improves the efficiency of long-document creation.
In one embodiment, after the step S240 of generating the target document, the method further includes:
Step S251: performing text error-correction checking and forbidden-word checking on the target document; Step S252: if grammar errors exist in the target document, marking the grammar errors with underlines; Step S253: if forbidden words exist in the target document, framing the forbidden words with boxes.
The document generation method can perform intelligent error correction and forbidden-word checking to ensure the quality of the generated text. The text error-correction check judges whether grammar errors exist in the target document and marks suspected errors with underlines; the forbidden-word check detects whether forbidden words are used in the target document and frames suspected forbidden words with boxes. For example, Fig. 5 is a schematic diagram of marking grammar errors and forbidden words in one embodiment. When the target document is displayed back to the user, the underlines and boxes can be displayed at the same time to alert the user to suspected errors and suspected forbidden words.
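The forbidden-word part of this check can be sketched as a simple dictionary scan. This is an illustrative assumption rather than the method of the source: the word list is hypothetical, matching is case-sensitive, square brackets stand in for the drawn box, and the grammar check is not covered.

```python
import re


def find_forbidden(text, forbidden_words):
    """Return (start, end, word) for every forbidden-word occurrence, in text order."""
    if not forbidden_words:
        return []
    # Longest first, so a longer phrase wins over a shorter word it contains
    ordered = sorted(forbidden_words, key=len, reverse=True)
    pattern = "|".join(re.escape(w) for w in ordered)
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, text)]


def render_boxes(text, hits, left="[", right="]"):
    """Wrap each hit in markers; a plain-text stand-in for framing it with a box."""
    out, pos = [], 0
    for start, end, word in hits:
        out.append(text[pos:start])
        out.append(f"{left}{word}{right}")
        pos = end
    out.append(text[pos:])
    return "".join(out)


text = "The best homes at the lowest price."
hits = find_forbidden(text, {"best", "lowest price"})  # hypothetical word list
marked = render_boxes(text, hits)
```

Returning character offsets rather than only the marked string lets a user interface draw the box itself and keep the original text unchanged.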
In one embodiment, the step S230 of retrieving the knowledge graph of the entity from the long-document knowledge graph and integrating the information of the knowledge graph of the entity into natural language to obtain the expansion statement may include:
S231: retrieving the relation entity connected with the entity and the relation information between the entity and the relation entity, and acquiring the entity attributes of the relation entity.
The relation entity connected with the entity is determined, the entity attributes of the entity and of the relation entity are retrieved, and the relation information between the entity and the relation entity is retrieved.
For example, "sky curtain" in the document first draft is matched with the entity "cell" in the long-document knowledge graph (as shown in Fig. 4). Taking the entity "cell" as an example, the attributes of "cell" include "name"; when the knowledge graph of "cell" is retrieved, the relation entities connected with "cell" include "school", "subway station" and "sports supporting facility". The attributes of "school" include "scale" and "nature", and the relation information between "cell" and "school" is "distance"; the attributes of "subway station" include "name" and "line number", and the relation information between "cell" and "subway station" is "distance"; the attributes of "sports supporting facility" include "floor area", "type" and "target group", and the relation information between "cell" and "sports supporting facility" is "presence".
S232: searching for the feature information associated with the entity, the relation information and the entity attributes.
The feature information corresponding to the entity under each of its relations is searched for in the database, or the user is prompted to provide the associated feature information.
The feature information is the concrete, real-world information associated with the entity. Taking the matched entity "sky curtain" as an example, the database is searched for the "school", "subway station" and "sports supporting facility" located near "sky curtain"; the "scale" and "nature" of the "school", the "name" and "line number" of the "subway station", and the "floor area", "type" and "target group" of the "sports supporting facility" are then determined. For example, the search may find that First Primary School is located within 500 meters of "sky curtain", that it is a public school, and that its scale is 60 teaching classes and 3000 students.
S233: integrating the feature information and the relation information into natural language to generate an expansion statement.
The relation information and the feature information are combined to generate an expansion statement. Taking the feature information of "school" and the relation "distance" as an example, the generated expansion statement is: "First Primary School is 500 meters away". A matching expansion template may also be invoked to integrate the feature information and the relation information, for example ""sky curtain" is adjacent to First Primary School".
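Steps S231 to S233 can be sketched with plain dictionaries standing in for the knowledge graph, the feature database and the expansion templates. All three structures and their names are assumptions made for illustration; the school values follow the example in the text, while the subway values are invented placeholders.

```python
# Schema level: relation entities connected to "cell" and the relation to each
# (loosely modeled on the Fig. 4 example).
SCHEMA = {
    "cell": [("school", "distance"), ("subway station", "distance")],
}

# Instance-level feature information, standing in for the database searched in S232.
# The subway entry is an invented placeholder, not data from the source.
FEATURES = {
    ("sky curtain", "school"): {
        "name": "First Primary School", "distance_m": 500,
        "nature": "public", "scale": "60 teaching classes and 3000 students",
    },
    ("sky curtain", "subway station"): {
        "name": "Station A", "line": "Line 1", "distance_m": 800,
    },
}

# Hypothetical expansion templates keyed by (relation entity type, relation).
TEMPLATES = {
    ("school", "distance"): "{entity} is {distance_m} meters from {name}, a {nature} school with {scale}.",
    ("subway station", "distance"): "{entity} is {distance_m} meters from {name} on {line}.",
}


def expansion_statements(entity_name, entity_type):
    """Generate one expansion statement per relation that has both data and a template."""
    statements = []
    for related_type, relation in SCHEMA.get(entity_type, []):   # S231
        features = FEATURES.get((entity_name, related_type))     # S232
        template = TEMPLATES.get((related_type, relation))
        if features is None or template is None:
            # No data: the described method could prompt the user here instead
            continue
        statements.append(template.format(entity=entity_name, **features))  # S233
    return statements


result = expansion_statements("sky curtain", "cell")
```

Keeping the schema, the facts and the wording in separate tables means that new relation types can be supported by adding entries rather than code.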
The relationships with the entity connections may include a direct connected primary relationship and an indirect connected multi-stage relationship. For example, the relationship entities and relationship information between them and entity attributes may be in several levels of relationships of the entities or in specified levels of relationships. The first-stage expansion sentence and the second-stage expansion sentence are expanded by taking the knowledge maps of the entities under the first-stage relationship and the second-stage relationship as examples, respectively, see the following examples.
In one embodiment, the expansion statement comprises a first level expansion statement, which is an expansion statement directly related to the entity. In step S230, the step of retrieving the knowledge spectrum of the entity from the long-text knowledge spectrum and integrating the information of the knowledge spectrum of the entity into natural language to obtain the extended sentence may include:
s2301: and calling the first-level relation entity directly connected with the entity and the first-level relation information between the first-level relation entity and the entity from the long-document knowledge graph. S2302: and acquiring entity attributes of the entity and the first-level relation entity in the long-document knowledge graph. S2303: and integrating the first-level relation entity, the first-level relation information and the entity attribute into natural language to obtain a first-level expansion statement.
According to the document generation method, document expansion can be performed according to the most relevant first-level relation, the relevance of expansion content and document manuscript is improved, and the range and efficiency of long document expansion are improved.
In one embodiment, the expansion statement further comprises a multi-level expansion statement, which is an expansion statement indirectly related to the entity. In step S230, the step of retrieving the knowledge graph of the entity from the long-document knowledge graph and integrating the information of the knowledge graph of the entity into natural language to obtain the expansion statement may further include:
S2304: Retrieve, from the long-document knowledge graph, the multi-level relation entities indirectly connected to the entity and the multi-level relation information between those entities and the entity.
S2305: Acquire the entity attributes of the entity and of the multi-level relation entities in the long-document knowledge graph.
S2306: Integrate the multi-level relation entities, the multi-level relation information, and the entity attributes into natural language to obtain the multi-level expansion statement.
With this document generation method, the document can be expanded along indirectly related multi-level relationships. On top of the first-level relationships, this adds deeper, more comprehensive, and more varied expansion directions, enriches the ideas available for document creation, and improves the efficiency of creating long documents.
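One way to picture steps S2304 and S2306 is a breadth-first walk that collects entities two or more hops away from the starting entity. The sketch below is an assumption-laden illustration: the triple format, the function names, the `max_depth` parameter, and the sentence pattern are all hypothetical, and attribute lookup (S2305) is omitted for brevity.

```python
from collections import deque

# Hypothetical triples; names are illustrative only.
TRIPLES = [
    ("Sunrise Garden", "is near", "Metro Line 3"),
    ("Metro Line 3", "stops at", "Central Station"),
    ("Central Station", "is next to", "City Mall"),
]


def multi_level_relations(entity, triples, max_depth=2):
    """Return {indirect_entity: [triples on the path]} for 2..max_depth hops."""
    # Undirected adjacency: entity -> [(neighbour, linking triple), ...].
    adjacency = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((tail, (head, relation, tail)))
        adjacency.setdefault(tail, []).append((head, (head, relation, tail)))

    paths = {entity: []}
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        if len(paths[current]) == max_depth:
            continue  # do not walk beyond the requested level
        for neighbor, triple in adjacency.get(current, []):
            if neighbor not in paths:
                paths[neighbor] = paths[current] + [triple]
                queue.append(neighbor)
    # S2304: keep only indirectly connected entities (two or more hops away).
    return {e: p for e, p in paths.items() if len(p) >= 2}


def multi_level_expansion(entity, triples, max_depth=2):
    """S2306: join each path into one sentence with a fixed pattern."""
    sentences = []
    for target, path in multi_level_relations(entity, triples, max_depth).items():
        chain = ", and ".join(f"{h} {r} {t}" for h, r, t in path)
        sentences.append(f"{chain}, which links {entity} to {target}.")
    return sentences
```

With `max_depth=2` the walk reaches Central Station but not City Mall; raising `max_depth` to 3 brings City Mall in as a third-level relation entity.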
In one embodiment, before the step of obtaining the document first draft obtained by filling the key information into the document template, the method may further include:
S201: Label variable information for each document in the document database.
S202: Divide each document into content modules of various types, and extract a writing template from each content module.
Writing templates are extracted from a large amount of real, labeled document data. For each document, the variable information corresponding to each section must be divided and labeled, together with the content module to which each section's content belongs.
S203: Label the key information types for each content template according to the key information in that content template.
The key information in every content module must be labeled. Key information is the special information specific to a given document; once that information is removed, the remaining content is the writing template. Filling user-defined key information into the template generates a document specific to that user.
S204: Match the variable information and key information input by the user against the labeled variable information and key information types to obtain the document template.
According to the document generation method, the document template is extracted, and the matched document template is provided according to the user input information, so that the document generation efficiency is improved.
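The matching in S204 and the subsequent filling of key information can be sketched as follows. This is only an illustration under stated assumptions: the template records, their field names (`variables`, `key_info_types`, `text`), the `$slot` placeholder syntax, and the sample values are hypothetical and not taken from the method.

```python
import string

# Hypothetical labeled templates; every field name and value is illustrative.
TEMPLATES = [
    {
        "variables": {"purpose": "promotion", "style": "down-to-earth"},
        "key_info_types": {"certification", "exclusive_service"},
        "text": "Our property team holds $certification and offers $exclusive_service.",
    },
    {
        "variables": {"purpose": "promotion", "style": "light luxury"},
        "key_info_types": {"decoration_style", "designer"},
        "text": "Interiors in $decoration_style style, created by $designer.",
    },
]


def match_template(variables, key_info, templates):
    """S204: return the first template whose labels agree with the user input."""
    for template in templates:
        same_variables = all(
            template["variables"].get(name) == value
            for name, value in variables.items()
        )
        covers_key_info = set(key_info) <= template["key_info_types"]
        if same_variables and covers_key_info:
            return template
    return None


def fill_template(template, key_info):
    """Fill key information; unfilled slots stay visible as $placeholders."""
    return string.Template(template["text"]).safe_substitute(key_info)
```

Using `safe_substitute` leaves any slot the user has not yet supplied in place, so the remaining `$placeholders` show which parts of the draft still need supplementary information.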
Taking document templates for real estate project documents as an example, the document database contains a large number of real estate project documents, and each complete document must be labeled with its advertising purpose, project attribute, writing style, type, and the like. FIG. 6 is a schematic diagram of a document template in yet another embodiment. The first content module shown in FIG. 6 is a writing template of the value output class, labeled promotion (advertising purpose), Chinese-style house (project attribute), down-to-earth style (writing style), and hard advertising (type), with slots at certain positions to be filled with the property management certification qualifications and exclusive service information.
Taking real estate project documents as an example, the types of the content modules include one or more of property management, decoration, gardens, project planning, municipal supporting facilities, intelligent systems, education, house description, product advantages, location, business district, traffic, theme messaging, gifts, and giveaways and discounts.
The document database stores real estate promotional documents, and the variable information comprises one or more of advertising purpose, project attribute, writing style, and type.
The types of the entities in the long-document knowledge graph comprise one or more of residential communities, transportation facilities, schools, sports and fitness facilities, hospitals, banks, property management, decoration, and house types.
The types of entity relationships include one or more of the distance between a residential community and transportation facilities, schools, hospitals, or banks, and the affiliation between a residential community and sports and fitness facilities, property management, decoration, or house types.
The variable information may include the advertising purpose, project attribute, writing style, document type, and the content modules of the value output class, brand value class, regular promotion class, and node promotion class. The key information may include the characteristic information of the various types of content modules in the application field.
Taking the characteristic information of real estate project documents as an example, the classes of the content modules comprise one or more of the value output class, brand value class, regular promotion class, and node promotion class.
The information of the value output class content module includes one or more of property management information, decoration information, garden information, project planning information, municipal supporting facilities information, intelligent systems information, education information, house description information, product advantage information, location information, business district information, and traffic information. For example, property management information includes certification qualifications, exclusive services, and the like; decoration information includes decoration price, decoration style, designer, and the like; garden information includes style, area, highlights, and the like; project planning information includes floor area ratio, site area, greening rate, building area, and the like; municipal supporting facilities information includes government investment, municipal information, and the like; intelligent systems information includes fresh air, water purification, access control, security, and the like; education information includes primary schools, junior middle schools, high schools, universities, and the like; house description information includes house type features, area, rooms and halls, and the like; product advantage information includes house type features, area, rooms and halls, and the like; location information includes hospitals, banks, entertainment and living facilities, leisure facilities, and the like; business district information includes hospitals, banks, entertainment, living and leisure facilities, shopping centers, and the like; traffic information includes subways, highways, expressways, urban rail, and the like.
The information of the brand value class content module may be selected from document-library content such as group introduction, group structure, development history, honors received, business scope, group philosophy, development goals, corporate culture, social responsibility, public welfare activities, group college, and the like.
The information of the regular promotion class content module may include one or more of theme messaging information, gift information, and giveaway and discount information. The information of the node promotion class content module includes one or more of theme messaging information, gift information, remaining-unit discount information, and giveaway and discount information. For example, theme messaging information includes the theme, promotion means, promotion intensity, and the like; giveaway and discount information includes discounts, giveaways, and the like; gift information includes the gift and the conditions for receiving it; remaining-unit discount information includes home-purchase offers, discounts, and the like.
Taking real estate project documents as an example, the project attributes may include Chinese-style house, ordinary house, quality conventional house, livable conventional house, seaside, hot spring, mountain-and-water landscape, apartment, office building, and the like; the writing styles may include down-to-earth style, nouveau-riche style, light-luxury style, artistic style, youthful style, holiday leisure style, office business style, investment appreciation style, British aristocratic style, business style, emotional appeal style, and the like; the content modules of the value output class may include project-related information such as property management, decoration, gardens, project planning, municipal supporting facilities, intelligent systems, education, house description, product advantages, location, business district, traffic, and the like.
In one embodiment, before the step of retrieving the knowledge-graph in which the entity is located from the long-document knowledge-graph, the method may further include:
S261: Apply named entity recognition and relation extraction to each document in the document database to obtain the entities and entity relations in the documents.
The existing documents in the document database are processed with named entity recognition and relation extraction tools to identify the entities in them and extract the entity relations among those entities, and the resulting entities and entity relations are collected.
S262: Construct the long-document knowledge graph according to the variable information and key information types labeled on the document database, the specified domain knowledge, and the entities and entity relations in the documents.
The entities and entity relations that can be incorporated into the long-document knowledge graph are determined according to the variable information and key information types labeled on the document database, the specified domain knowledge, and the entities and entity relations in the documents, and the long-document knowledge graph is constructed from them.
For example, in real estate project documents, "development name", "community name", "nearby subway line", "nearby school name", "decoration style", etc. may be used as entity types; "name", "type", etc. may be used as entity attributes; and "distance between two places", "mode of transport between two places", etc. may be used as entity relationship types. Finally, the results of the above work are integrated into the knowledge graph.
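The graph construction step can be sketched in Python, assuming the named entity recognition and relation extraction tools have already produced their output. The input tuple formats, the `build_knowledge_graph` name, the dictionary-based graph layout, and the sample values are illustrative assumptions rather than anything specified by the method.

```python
def build_knowledge_graph(entities, relations):
    """Assemble a simple graph from extracted entities and relations.

    entities:  iterable of (name, entity_type) pairs
    relations: iterable of (head, relation_type, tail, value) tuples
    """
    graph = {"nodes": {}, "edges": []}
    for name, entity_type in entities:
        # Entity attributes such as "name" and "type" are stored on the node.
        graph["nodes"][name] = {"name": name, "type": entity_type}
    for head, relation_type, tail, value in relations:
        # Skip relations whose endpoints were not recognised as entities.
        if head in graph["nodes"] and tail in graph["nodes"]:
            graph["edges"].append(
                {"head": head, "type": relation_type, "tail": tail, "value": value}
            )
    return graph


# Hypothetical extraction output for one real estate document.
ENTITIES = [
    ("Sunrise Garden", "community name"),
    ("Metro Line 3", "nearby subway line"),
]
RELATIONS = [
    ("Sunrise Garden", "distance between two places", "Metro Line 3", "500 m"),
    ("Sunrise Garden", "distance between two places", "Unknown Place", "2 km"),
]
GRAPH = build_knowledge_graph(ENTITIES, RELATIONS)
```

Here the second relation is dropped because "Unknown Place" was never recognised as an entity, which mirrors the idea that only entities and relations judged suitable are incorporated into the graph.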
In one embodiment, before the step of identifying, through the entity sequence model, the entities of the long-document knowledge graph in the document draft, the method may further include:
Inputting document samples labeled with the corresponding entities in the long-document knowledge graph into an initial entity sequence model to train the entity sequence model.
The text content of each document sample is labeled with entities, specifically the entities that appear in the long-document knowledge graph. The purpose of the entity sequence model is to identify the entities contained in a piece of text, in particular the entities in the long-document knowledge graph. When the document draft is input to the trained entity sequence model, the model outputs the entities in the document draft, for example as output labels indicating which entities the draft contains. From the output entities, the position of each entity in the document draft can be determined, and the expansion statement is added at that position, which makes the expanded document read more fluently. In the visual display of the document draft, each entity can also be underlined at its position to prompt the user.
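How entity positions drive underlining can be shown with a small sketch. Note that it swaps in plain dictionary matching as a stand-in for the trained entity sequence model, which is an assumption made purely so the example is runnable; the function names and the plain-text `~` marker line are likewise hypothetical.

```python
import re


def find_entities(draft, known_entities):
    """Return (start, end, entity) spans, preferring longer names."""
    names = sorted(known_entities, key=len, reverse=True)
    if not names:
        return []
    # Longest-first alternation so "Metro Line 3" wins over "Metro".
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(draft)]


def underline(draft, spans):
    """Render a plain-text marker line with '~' under each entity span."""
    marks = [" "] * len(draft)
    for start, end, _ in spans:
        marks[start:end] = "~" * (end - start)
    return draft + "\n" + "".join(marks).rstrip()


# Hypothetical draft and entity list.
DRAFT = "Sunrise Garden is close to Metro Line 3."
SPANS = find_entities(DRAFT, {"Sunrise Garden", "Metro Line 3", "Metro"})
```

The spans give both the prompt positions for the user interface and the insertion points for expansion statements; a real system would obtain them from the model's output labels instead of string matching.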
In one embodiment, after the step of retrieving the knowledge-graph of the entity from the long document knowledge-graph in step S230, the method may further include:
S271: Generate expansion suggestion information according to the entity and the entity relationships in the knowledge graph of the entity.
An expansion suggestion template is retrieved, and the entity and entity relationships are added to the expansion suggestion template to obtain the expansion suggestion information.
S272: Return the expansion suggestion information to the user, and receive the expansion text input by the user according to the expansion suggestion information.
FIG. 7 is a schematic diagram of expansion suggestion information in one embodiment. As shown in FIG. 7, the expansion suggestion for "Tian Zhu" is returned and displayed to the user: when the user's cursor moves over "Tian Zhu", the corresponding expansion suggestion is displayed. The other underlined text likewise has corresponding expansion suggestions. The user inputs expansion text that supplements and expands the document draft according to the relevant expansion suggestions.
S273: Add the expansion text to the document first draft to generate the target document.
With this document generation method, once the expansion text written according to the expansion suggestion information has been added to the document draft, the expansion of the draft is complete and the target document is obtained. The user can freely expand and create documents of different lengths according to the expansion suggestion information to suit different scenarios, which improves the efficiency of creating long documents.
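Steps S271 and S273 can be sketched as below. The suggestion wording in `SUGGESTION_TEMPLATE`, the function names, the triple format, and the rule of inserting the user's text after the sentence that mentions the entity are all assumptions chosen for illustration.

```python
SUGGESTION_TEMPLATE = "You could expand on how {entity} {relation} {related}."


def build_suggestions(entity, triples, template=SUGGESTION_TEMPLATE):
    """S271: one suggestion per relation that starts at `entity`."""
    return [
        template.format(entity=head, relation=relation, related=tail)
        for head, relation, tail in triples
        if head == entity
    ]


def apply_expansion(draft, entity, expansion_text):
    """S273: insert the user's text after the sentence that mentions `entity`."""
    position = draft.find(entity)
    if position == -1:
        return draft  # entity not present; leave the draft unchanged
    sentence_end = draft.find(".", position)
    if sentence_end == -1:
        return draft + " " + expansion_text
    return draft[: sentence_end + 1] + " " + expansion_text + draft[sentence_end + 1 :]


# Hypothetical data.
TRIPLES = [("Sunrise Garden", "is near", "Metro Line 3")]
DRAFT = "Sunrise Garden is now on sale. Visit us today."
```

Step S272, returning the suggestion and receiving the user's text, is an interactive step and is represented here only by the `expansion_text` argument.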
In one embodiment, the present application further provides a document generation system. FIG. 8 is a schematic structural diagram of the document generation system in one embodiment. As shown in FIG. 8, the system may specifically include an acquisition module 810, an identification module 820, a retrieval module 830, and a generation module 840, wherein:
The acquisition module 810 is configured to acquire a document first draft obtained by filling key information into a document template.
After the user selects a document template and the key information to be filled in is determined, the acquisition module 810 may fill the key information into the corresponding positions of the document template to obtain the document draft. For example, the user may input the key information that supplements the document template, or the key information may be obtained by searching corresponding pre-stored data.
The identification module 820 is configured to identify, through an entity sequence model, the entities of the long-document knowledge graph in the document draft, where the entity sequence model is obtained by training on documents labeled with entities.
The identification module 820 identifies the entity which can be classified into the long document knowledge graph in the document first draft, so as to identify the extensible item which can be used for document extension based on the long document knowledge graph for the document first draft.
The identified entity can belong to key information in the document draft or can belong to a document template in the document draft, and the key information and the content of the document template can be used as extensible items for document expansion.
Each knowledge graph has a corresponding specified domain. The entities and entity relationships in the long-document knowledge graph are characteristic information of that domain, and both the long-document knowledge graph and the entity sequence model based on it are confined to that domain, so the expandable items can be determined within the specified domain and the expansion direction can be guided effectively.
The retrieving module 830 is configured to retrieve a knowledge graph of the entity from the long-document knowledge graph, and integrate information of the knowledge graph of the entity into a natural language to obtain an extended sentence.
The retrieval module 830 retrieves the knowledge graph of the entity to obtain the entities and entity relationships related to that entity, organizes the related entities and the corresponding entity relationships into an expansion statement according to natural language rules, and thereby expands the expandable item (the entity) in its expandable directions (the related entities and entity relationships).
The generating module 840 is configured to add an extension sentence to the document draft, and generate a target document.
The generating module 840 adds the expanded sentence to the document draft to realize the expansion of the document draft.
The user may choose where to place the expansion statement, or whether to add it to the document draft at all. If the expansion statement does not read smoothly or lacks information, the user may modify it and add information.
In the above document generation system, the entities in the document first draft are identified through the entity sequence model, so that expandable items are found based on the long-document knowledge graph; the knowledge graph of each entity is retrieved and its information is integrated into expansion statements; and the expansion statements are added to the document first draft to generate the expanded target document. The long-document knowledge graph keeps the expansion direction within its specified domain, while its breadth provides diverse expansion directions, which improves the efficiency of creating long documents.
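The cooperation of the four modules can be pictured with a toy Python class. Everything here is a simplified assumption: the class and method names, the `{slot}` template syntax, the substring matching that stands in for the entity sequence model, and the rule of appending expansion statements at the end of the draft.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentGenerationSystem:
    """Toy counterpart of modules 810 to 840; all behaviour is simplified."""

    # Hypothetical graph: entity -> list of (relation, related entity).
    graph: dict = field(default_factory=dict)

    def acquire(self, template, key_info):
        """Module 810: fill key information into the template."""
        return template.format(**key_info)

    def identify(self, draft):
        """Module 820: stand-in for the entity sequence model (substring match)."""
        return [entity for entity in self.graph if entity in draft]

    def retrieve(self, entity):
        """Module 830: turn the entity's relations into expansion statements."""
        return [f"{entity} {rel} {other}." for rel, other in self.graph[entity]]

    def generate(self, draft, statements):
        """Module 840: append the expansion statements to the draft."""
        return " ".join([draft, *statements])

    def run(self, template, key_info):
        draft = self.acquire(template, key_info)
        statements = [s for e in self.identify(draft) for s in self.retrieve(e)]
        return self.generate(draft, statements)


# Hypothetical usage.
SYSTEM = DocumentGenerationSystem(graph={"Sunrise Garden": [("is near", "Metro Line 3")]})
TARGET = SYSTEM.run("{name} is now on sale.", {"name": "Sunrise Garden"})
```

Keeping each module as a separate method makes it straightforward to replace the stand-ins, for example swapping `identify` for a trained sequence labeling model, without touching the rest of the pipeline.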
For specific limitations of the document generation system, reference may be made to the limitations of the document generation method above, which are not repeated here. Each module in the above document generation system may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
FIG. 9 is a schematic diagram of the internal structure of the computer device in one embodiment. As shown in FIG. 9, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected by a system bus. The non-volatile storage medium of the computer device stores an operating system, a database, and a computer program; when the computer program is executed by the processor, the processor can implement a document generation method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may store a computer program which, when executed by the processor, causes the processor to perform a document generation method. The network interface of the computer device is used to connect to and communicate with a terminal. Those skilled in the art will appreciate that the structure shown in FIG. 9 is merely a block diagram of the part of the structure related to the present application and does not limit the computer device to which the present application is applied; a particular computer device may include more or fewer components than shown, may combine some of the components, or may have a different arrangement of components.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the document generation method of any of the embodiments described above when the computer program is executed by the processor.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the document generation method of any of the embodiments described above.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
The foregoing is only a partial embodiment of the present application, and it should be noted that, for a person skilled in the art, several improvements and modifications can be made without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (10)

1. A document generation method, comprising the steps of:
acquiring a document first draft obtained by filling key information into a document template;
identifying, through an entity sequence model, the entity of a long-document knowledge graph in the document first draft, wherein the entity sequence model is obtained by training on documents labeled with entities;
retrieving the knowledge graph of the entity from the long-document knowledge graph, and integrating the information of the knowledge graph of the entity into natural language to obtain an expansion statement;
adding the expansion statement to the document first draft to generate a target document;
the step of acquiring the document first draft obtained by filling the key information into the document template comprises the following steps:
matching the document template according to variable information and key information input by a user;
determining parts to be supplemented according to the key information and the matched document template, and prompting a user to input supplementary information;
receiving supplementary information input by a user into the document template;
generating the document first draft according to the key information, the supplementary information and the document template;
after the step of retrieving the knowledge graph of the entity from the long-document knowledge graph, the method further comprises:
generating expansion suggestion information according to the entity and entity relations in the knowledge graph of the entity;
returning the expansion suggestion information to a user, and receiving an expansion text input by the user according to the expansion suggestion information;
adding the expansion text to the document first draft to generate the target document;
after the step of generating the target document, the method further comprises:
performing text error-correction checking and prohibited-word checking on the target document;
if grammar errors exist in the target document, marking the grammar errors with underlines;
if prohibited words exist in the target document, framing the prohibited words with boxes;
wherein, before the step of retrieving the knowledge graph of the entity from the long-document knowledge graph, the method further comprises:
applying named entity recognition and relation extraction to each document in a document database to obtain the entities and entity relations in the documents;
constructing the long-document knowledge graph according to the variable information and key information types labeled on the document database, the specified domain knowledge, and the entities and entity relations in the documents.
2. The document generating method according to claim 1, wherein before the step of obtaining the document draft obtained by filling the document template with the key information, further comprising:
labeling variable information for each document in the document database;
dividing various types of content modules from the document, and respectively extracting a writing template from each content module;
marking the key information type for the content template according to the key information in the content template;
and matching the variable information and the key information input by the user with the marked variable information and key information types to obtain the document template.
3. The document generating method according to claim 2, further comprising, before the step of identifying the entity of the document knowledge-graph in the document draft by an entity sequence model:
inputting a document sample marked with the corresponding entity in the long document knowledge graph into an initial entity sequence model, and training the entity sequence model.
4. The document generating method according to claim 1, wherein the step of retrieving the knowledge-graph of the entity from the long document knowledge-graph and integrating the information of the knowledge-graph of the entity into a natural language to obtain an extended sentence includes:
retrieving a relation entity connected with the entity and relation information between the entity and the relation entity, and acquiring entity attributes of the relation entity;
searching a database for characteristic information related to the entity, the relation information, and the entity attributes;
integrating the characteristic information and the relation information into natural language to generate the expansion statement.
5. The document generating method according to claim 1, wherein the extension sentence includes a first-level extension sentence;
the step of retrieving the knowledge graph of the entity from the long-document knowledge graph, and integrating the information of the knowledge graph of the entity into natural language to obtain an expansion statement comprises the following steps:
retrieving, from the long-document knowledge graph, the first-level relation entity directly connected with the entity and the first-level relation information between the first-level relation entity and the entity;
acquiring entity attributes of the entity and the first-level relation entity in the long-document knowledge graph;
and integrating the first-level relation entity, the first-level relation information and the entity attribute into a natural language to obtain a first-level expansion statement.
6. The document generating method according to claim 5, wherein the extension sentence further comprises a multi-stage extension sentence;
the step of retrieving the knowledge graph of the entity from the long-document knowledge graph, and integrating the information of the knowledge graph of the entity into natural language to obtain an expansion statement further comprises:
retrieving, from the long-document knowledge graph, the multi-level relation entity indirectly connected with the entity and the multi-level relation information between the multi-level relation entity and the entity;
acquiring entity attributes of the entity and the multi-level relation entity in the long-document knowledge graph;
and integrating the multi-level relation entity, the multi-level relation information and the entity attribute into a natural language to obtain a multi-level expansion statement.
7. The document generating method according to claim 2, wherein the types of the content modules include one or more of property management, decoration, gardens, project planning, municipal supporting facilities, intelligent systems, education, house description, product advantages, location, business district, traffic, theme messaging, gifts, and giveaways and discounts;
the document database stores real estate promotional documents, and the variable information comprises one or more of advertising purpose, project attribute, writing style, and type;
the types of the entities in the long-document knowledge graph comprise one or more of residential communities, transportation facilities, schools, sports and fitness facilities, hospitals, banks, property management, decoration, and house types;
the types of entity relationships include one or more of the distance between a residential community and transportation facilities, schools, hospitals, or banks, and the affiliation between a residential community and sports and fitness facilities, property management, decoration, or house types.
8. A document generation system for performing the steps of the document generation method of any one of the preceding claims 1 to 7, the document generation system comprising:
the acquisition module is used for acquiring a document first draft obtained by filling key information into the document template;
the identification module is used for identifying, through an entity sequence model, the entity of the long-document knowledge graph in the document first draft, wherein the entity sequence model is obtained by training on documents labeled with entities;
the retrieving module is used for retrieving the knowledge graph of the entity from the long-document knowledge graph, and integrating the information of the knowledge graph of the entity into natural language to obtain an expansion statement;
the generation module is used for adding the expansion statement to the document first draft to generate a target document;
wherein the generation module is further used for:
matching the document template according to variable information and key information input by a user;
determining parts to be supplemented according to the key information and the matched document template, and prompting a user to input supplementary information;
receiving supplementary information input by a user into the document template;
generating the document first draft according to the key information, the supplementary information and the document template;
generating expansion suggestion information according to the entity and entity relation in the knowledge graph of the entity;
returning the expansion suggestion information to a user, and receiving an expansion text input by the user according to the expansion suggestion information;
adding the expansion text to the document first draft to generate the target document;
performing text error-correction checking and prohibited-word checking on the target document;
if grammar errors exist in the target document, marking the grammar errors with underlines;
if prohibited words exist in the target document, framing the prohibited words with boxes.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the document generation method of any one of claims 1 to 7 when the computer program is executed.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the document generation method of any one of claims 1 to 7.
CN201911372226.3A 2019-12-27 2019-12-27 Document generation method, system, computer device and storage medium Active CN110765753B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911372226.3A CN110765753B (en) 2019-12-27 2019-12-27 Document generation method, system, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911372226.3A CN110765753B (en) 2019-12-27 2019-12-27 Document generation method, system, computer device and storage medium

Publications (2)

Publication Number Publication Date
CN110765753A CN110765753A (en) 2020-02-07
CN110765753B true CN110765753B (en) 2023-07-14

Family

ID=69341567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911372226.3A Active CN110765753B (en) 2019-12-27 2019-12-27 Document generation method, system, computer device and storage medium

Country Status (1)

Country Link
CN (1) CN110765753B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553436A (en) * 2020-04-23 2021-10-26 广东博智林机器人有限公司 Knowledge graph updating method and device, electronic equipment and storage medium
CN111666746B (en) * 2020-06-05 2023-09-29 中国银行股份有限公司 Conference summary generation method and device, electronic equipment and storage medium
CN111930959B (en) * 2020-07-14 2024-02-09 上海明略人工智能(集团)有限公司 Method and device for generating text by map knowledge
CN112348638B (en) * 2020-11-09 2024-02-20 上海秒针网络科技有限公司 Activity document recommending method and device, electronic equipment and storage medium
CN112507128A (en) * 2020-12-07 2021-03-16 云南电网有限责任公司普洱供电局 Content filling prompting method for power distribution network operation file and related equipment
CN112733515B (en) * 2020-12-31 2022-11-11 贝壳技术有限公司 Text generation method and device, electronic equipment and readable storage medium
CN113935306A (en) * 2021-09-14 2022-01-14 有米科技股份有限公司 Method and device for processing advertising pattern template
CN114997131A (en) * 2022-05-19 2022-09-02 北京沃东天骏信息技术有限公司 File generation method, model training device, file generation device, file training equipment and storage medium
CN117350271A (en) * 2023-09-28 2024-01-05 上海臣道网络科技有限公司 AI content generation method and service cloud platform based on large language model
CN118095293A (en) * 2024-04-24 2024-05-28 卓世未来(天津)科技有限公司 Text extension method and system based on large language model

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018072563A1 (en) * 2016-10-18 2018-04-26 中兴通讯股份有限公司 Knowledge graph creation method, device, and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9406020B2 (en) * 2012-04-02 2016-08-02 Taiger Spain Sl System and method for natural language querying
CN106844322A (en) * 2017-01-22 2017-06-13 百度在线网络技术(北京)有限公司 Intelligent article generation method and device
CN106970749A (en) * 2017-02-06 2017-07-21 广东小天才科技有限公司 A kind of writing method and device based on mobile terminal
CN108563620A (en) * 2018-04-13 2018-09-21 上海财梵泰传媒科技有限公司 The automatic writing method of text and system
CN110309320B (en) * 2019-06-28 2021-04-06 浙江传媒学院 NBA basketball news automatic generation method combining NBA event knowledge map

Also Published As

Publication number Publication date
CN110765753A (en) 2020-02-07

Similar Documents

Publication Publication Date Title
CN110765753B (en) Document generation method, system, computer device and storage medium
CN110633409B (en) Automobile news event extraction method integrating rules and deep learning
Nerlich et al. Theory and language of climate change communication
Trajber et al. Promoting climate change transformation with young people in Brazil: Participatory action research through a looping approach
CN109902288A (en) Intelligent clause analysis method, device, computer equipment and storage medium
CN104615616A (en) Group recommendation method and system
CN111881262A (en) Text emotion analysis method based on multi-channel neural network
CN112417100A (en) Knowledge graph in Liaodai historical culture field and construction method of intelligent question-answering system thereof
CN107798123A (en) Knowledge base and its foundation, modification, intelligent answer method, apparatus and equipment
Mattern Cloud and field
Gomez et al. Self-supervised learning from web data for multimodal retrieval
CN110110218B (en) Identity association method and terminal
CN112966053B (en) Knowledge graph-based marine field expert database construction method and device
Poole Ecolinguistics, GIS, and corpus linguistics for the analysis of the Rosemont Copper mine debate
Zhang et al. The theme park industry in China: A research review
CN112529615A (en) Method, device, equipment and computer readable storage medium for automatically generating advertisement
CN115018549A (en) Method for generating advertisement file, device, equipment, medium and product thereof
CN113312498B (en) Text information extraction method for embedding knowledge graph by undirected graph
CN103377381B (en) The method and apparatus identifying the contents attribute of image
CN112668335A (en) Method for identifying and extracting business license structured information by using named entity
CN115422920B (en) Method for identifying dispute focus of referee document based on BERT and GAT
Juvan et al. Towards a GIS analysis of literary cultures: The making of the Slovenian ethnoscape through literature
CN112989068B (en) Knowledge graph construction method for Tang poetry knowledge and Tang poetry knowledge question-answering system
CN101568917A (en) Generating chinese language banners
CN114510943A (en) Incremental named entity identification method based on pseudo sample playback

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant