CN110765753A

CN110765753A - Method, system, computer device and storage medium for generating file

Info

Publication number: CN110765753A
Application number: CN201911372226.3A
Authority: CN
Inventors: 胡盼盼; 胡浩; 赵茜; 利啟东; 高玮; 杨超龙; 黄聿; 梁容铭
Original assignee: Guangdong Bozhilin Robot Co Ltd
Current assignee: Guangdong Bozhilin Robot Co Ltd
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2020-02-07
Anticipated expiration: 2039-12-27
Also published as: CN110765753B

Abstract

The application provides a document generation method, a system, a computer device and a storage medium. The method comprises: acquiring a first draft of a document obtained by filling key information into a document template; identifying, through an entity sequence model, entities in the first draft that belong to a long-document knowledge graph; retrieving the knowledge graph in which each entity is located from the long-document knowledge graph and integrating its information into natural language to obtain expanded sentences; and adding the expanded sentences to the first draft to generate a target document. Because the long-document knowledge graph belongs to a specified field, it confines the expansion direction of the expanded sentences to that field. Because the graph is also broad, it offers multiple expansion directions, which improves the efficiency of creating long documents.

Description

Method, system, computer device and storage medium for generating file

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a document generation method, a document generation system, a computer device, and a storage medium.

Background

Writing a document manually often takes considerable time and thought to collect material and plan the text so that it meets the requirements of different scenarios, which is time-consuming and labor-intensive.

In recent years, with the continuous development of artificial intelligence, intelligent writing has been applied in various fields and has improved the efficiency of document writing. Intelligent writing can generate articles from writing templates according to key variables in a specific field, for example weather forecasts, stock market reports and sports event reports. A writing template outlines, in a targeted way, the expression style of articles in that field.

In real estate marketing, a large number of long documents are needed to promote real estate projects. However, only a few specified styles of long real estate documents can be generated from a few fixed writing templates, so the resulting documents are rigid and formulaic, and the efficiency of creating long documents is low.

Disclosure of Invention

In view of the above, it is necessary to provide a document generation method, a system, a computer device, and a storage medium that address the above technical drawbacks, in particular the inefficiency of document creation.

A document generation method comprises the following steps:

acquiring a first draft of the document obtained by filling key information into a document template;

identifying, through an entity sequence model, entities in the first draft that belong to a long-document knowledge graph, wherein the entity sequence model is trained on articles in which the entities are annotated;

retrieving the knowledge graph in which the entity is located from the long-document knowledge graph, and integrating the information of that knowledge graph into natural language to obtain an expanded sentence;

and adding the expanded sentence to the first draft of the document to generate a target document.

In one embodiment, the step of acquiring the first draft of the document obtained by filling the key information into the document template includes:

matching the document template according to variable information and key information input by a user; receiving supplementary information for the document template input by the user; and generating the first draft of the document according to the key information, the supplementary information and the document template.

In one embodiment, before the step of acquiring the first draft of the document obtained by filling the key information into the document template, the method further comprises:

marking variable information on each document in a document database; dividing each document into content modules of various types, and extracting a writing template from each content module; marking key information types for the content template according to the key information in the content template;

and matching the variable information and the key information input by the user against the marked variable information and the marked key information types to obtain the document template.

In one embodiment, before the step of retrieving the knowledge-graph of the entity from the long-document knowledge-graph, the method further comprises:

performing named entity recognition and relationship extraction on each document in the document database to obtain the entities and entity relationships in the document;

and constructing the long-document knowledge graph according to the variable information and key information types labeled in the document database, specified professional knowledge, and the entities and entity relationships in the documents.

In one embodiment, before the step of identifying the entity of the long-document knowledge graph in the first draft of the document through the entity sequence model, the method further comprises:

inputting document samples, in whose content the corresponding entities of the long-document knowledge graph are marked, into an initial entity sequence model, and training the entity sequence model.

In one embodiment, the step of retrieving the knowledge graph in which the entity is located from the long-document knowledge graph, and integrating the information of that knowledge graph into natural language to obtain the expanded sentence, includes:

retrieving the relationship entities connected with the entity and the relationship information between the entity and the relationship entities, and acquiring the entity attributes of the relationship entities; searching a database for characteristic information of the entity associated with the relationship entities, the relationship information and the entity attributes; and integrating the characteristic information and the relationship information into natural language to generate the expanded sentence.

In one embodiment, the expanded sentences comprise first-level expanded sentences;

the step of retrieving the knowledge graph in which the entity is located from the long-document knowledge graph, and integrating the information of that knowledge graph into natural language to obtain an expanded sentence, comprises:

retrieving, from the long-document knowledge graph, the first-level relationship entities directly connected with the entity and the first-level relationship information between them; acquiring the entity attributes of the entity and of the first-level relationship entities in the long-document knowledge graph; and integrating the first-level relationship entities, the first-level relationship information and the entity attributes into natural language to obtain a first-level expanded sentence.

In one embodiment, the expanded sentences further comprise multi-level expanded sentences;

the step of retrieving the knowledge graph in which the entity is located from the long-document knowledge graph, and integrating the information of that knowledge graph into natural language to obtain an expanded sentence, further comprises:

retrieving, from the long-document knowledge graph, the multi-level relationship entities indirectly connected with the entity and the multi-level relationship information between them; acquiring the entity attributes of the entity and of the multi-level relationship entities in the long-document knowledge graph; and integrating the multi-level relationship entities, the multi-level relationship information and the entity attributes into natural language to obtain a multi-level expanded sentence.

In one embodiment, after the step of retrieving the knowledge graph in which the entity is located from the long-document knowledge graph, the method further comprises:

generating expansion suggestion information according to the entities and entity relationships in the knowledge graph in which the entity is located; returning the expansion suggestion information to the user, and receiving expanded text input by the user according to the expansion suggestion information; and adding the expanded text to the first draft of the document to generate the target document.

In one embodiment, after the step of generating the target document, the method further comprises:

performing a text error-correction check and a forbidden-word check on the target document; if the target document contains a grammar error, marking the grammar error with an underline; and if the target document contains a forbidden word, enclosing the forbidden word in a box.

In one embodiment, the types of content modules include one or more of: property, decoration, garden, project planning, municipal support, intelligence, education, family descriptions, product advantages, segments, trade circles, traffic, topic terms, gifts, and discount offers.

In one embodiment, the document database stores building promotion documents, and the variable information comprises one or more of advertising purpose, project attribute, literary style and type.

In one embodiment, the types of entities in the long-document knowledge graph include one or more of a building district, a transportation facility, a school, a sports fitness facility, a hospital, a bank, a property, a decoration, and a house type.

In one embodiment, the types of entity relationship include the distance between the building district and a transportation facility, school, hospital or bank, and further include one or more affiliation relationships between the building district and a sports fitness facility, property, decoration or house type.

A document generation system comprising:

the acquisition module is configured to acquire a first draft of the document obtained by filling key information into a document template;

the identification module is configured to identify, through an entity sequence model, entities in the first draft that belong to a long-document knowledge graph, wherein the entity sequence model is trained on articles in which the entities are annotated;

the calling module is configured to retrieve the knowledge graph in which the entity is located from the long-document knowledge graph and to integrate the information of that knowledge graph into natural language to obtain an expanded sentence;

and the generating module is configured to add the expanded sentence to the first draft of the document to generate a target document.

A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the document generation method of any of the above embodiments when executing the computer program.

A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the document generation method according to any one of the above embodiments.

According to the above document generation method, system, computer device and storage medium, the entities in the first draft of the document are identified through the entity sequence model, so that expandable items are found on the basis of the long-document knowledge graph. The knowledge graph in which each entity is located is retrieved, its information is integrated into expanded sentences, and the expanded sentences are added to the first draft to generate an expanded target document. The long-document knowledge graph confines the expansion direction to the specified field to which it belongs. At the same time, its breadth provides multiple expansion directions, which improves the efficiency of long-document creation.

Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice.

Drawings

The foregoing and/or additional aspects and advantages will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic diagram of a document template in one embodiment;

FIG. 2 is a flowchart of a document generation method in one embodiment;

FIG. 3 is a schematic diagram of a knowledge graph in one embodiment;

FIG. 4 is a diagram of a knowledge graph applied to real estate project documents in one embodiment;

FIG. 5 is a diagram illustrating the marking of grammar errors and forbidden words in one embodiment;

FIG. 6 is a schematic diagram of a document template in yet another embodiment;

FIG. 7 is a diagram of expansion suggestion information in one embodiment;

FIG. 8 is a schematic diagram of the document generation system in one embodiment;

FIG. 9 is a diagram showing an internal configuration of a computer device according to an embodiment.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.

It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

FIG. 1 is a schematic diagram of a document template in one embodiment. In the document template, "X" represents fixed text, and underlining represents key information that needs to be supplemented.

In one embodiment, as shown in FIG. 2, which is a flowchart of a document generation method, a document generation method is provided. The method may be applied to a computer device and may specifically include the following steps:

Step S210: acquiring a first draft of the document obtained by filling key information into a document template.

In this step, the document template may be selected by the user. After the key information to be filled into the document template is determined, it is filled into the corresponding positions of the template to obtain the first draft of the document. For example, the key information may be input by the user to supplement the document template, or it may be obtained by looking up corresponding pre-stored data.

Specifically, in one embodiment, the step in step S210 of acquiring the first draft obtained by filling the key information into the document template may include:

S211: matching the document template according to the variable information and the key information input by the user.

The variable information and key information provided by the user are obtained and compared with the labels of the document templates, and several document templates with a high degree of correlation are matched. If several document templates are matched, they may be displayed to the user, and one or more of them are determined according to the user's selection.

S212: receiving the supplementary information for the document template input by the user.

The parts that still need to be supplemented are determined according to the key information and the matched document template, the user is prompted to input the supplementary information, and the supplementary information input by the user is then received.

S213: generating the first draft of the document according to the key information, the supplementary information and the document template.

The key information and the supplementary information are filled into the corresponding positions of the document template to generate a complete first draft of the document.
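Steps S211 to S213 can be illustrated with a minimal Python sketch. It is not taken from the application: the `DocumentTemplate` class, the label-overlap score, the `{slot}` placeholder syntax and the sample templates are all assumptions made for illustration, and the application does not specify how the degree of correlation is computed.

```python
from dataclasses import dataclass


@dataclass
class DocumentTemplate:
    """Hypothetical template record; field names are illustrative only."""
    name: str
    variable_tags: set    # labels such as advertising purpose or style
    key_info_types: set   # slot names that appear in `text`
    text: str             # fixed text with {slot} placeholders


def match_templates(templates, variable_info, key_info, top_k=3):
    """S211 (assumed scoring): rank templates by the number of shared labels."""
    def score(t):
        return (len(t.variable_tags & set(variable_info))
                + len(t.key_info_types & set(key_info)))
    ranked = sorted(templates, key=score, reverse=True)
    return [t for t in ranked if score(t) > 0][:top_k]


def missing_slots(template, key_info):
    """S212: slots the user still has to supply as supplementary information."""
    return sorted(template.key_info_types - set(key_info))


def fill_template(template, key_info, supplementary_info):
    """S213: fill key and supplementary information into the template."""
    return template.text.format(**{**key_info, **supplementary_info})


templates = [
    DocumentTemplate("education", {"promotion"}, {"project", "school"},
                     "{project} is adjacent to {school}."),
    DocumentTemplate("traffic", {"brand"}, {"project", "station"},
                     "{project} is a short walk from {station}."),
]
key_info = {"project": "Tianqifu"}
best = match_templates(templates, ["promotion"], key_info)[0]
draft = fill_template(best, key_info, {"school": "First Elementary School"})
```

Here `missing_slots(best, key_info)` returns `["school"]`, which is what the user would be prompted for in S212, and `draft` becomes "Tianqifu is adjacent to First Elementary School."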

Step S220: identifying, through an entity sequence model, the entities in the first draft of the document that belong to the long-document knowledge graph, wherein the entity sequence model is trained on articles in which entities are annotated.

In this step, the entities in the first draft that can be mapped to the long-document knowledge graph are identified, so as to determine the expandable items of the first draft that can be expanded on the basis of the long-document knowledge graph.

An identified entity may belong to the key information in the first draft or to the document template in the first draft; the content of both can serve as expandable items for document expansion.

Each knowledge graph has a corresponding specified field. The entities and entity relationships in the long-document knowledge graph are characteristic information of that field, and both the long-document knowledge graph and the entity sequence model based on it are confined to that field. The expandable items can therefore be determined within the specified field, which effectively guides the expansion direction.

The entity sequence model is trained on articles in which entities are annotated. The first draft of the document is input to the trained entity sequence model, which outputs the entities in the first draft. The entity sequence model may be trained using a Bi-LSTM + CRF model, a Bi-GRU + CRF model, or the CRF++ tool (Bi-LSTM: bidirectional long short-term memory network; CRF: conditional random field; Bi-GRU: bidirectional gated recurrent unit).
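The application does not state the output format of the entity sequence model. Sequence taggers of the Bi-LSTM + CRF kind commonly emit one BIO tag per token, so the following sketch assumes that format and shows only how such tags could be turned into entities. The tag names `CELL` and `SCHOOL` and the sample sentence are hypothetical.

```python
def decode_bio(tokens, tags):
    """Collect (entity_text, entity_type) pairs from per-token BIO tags."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close the one in progress, if any.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O" or an inconsistent "I-" tag ends the entity in progress.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities


tokens = ["Tianqifu", "is", "near", "First", "Elementary", "School"]
tags = ["B-CELL", "O", "O", "B-SCHOOL", "I-SCHOOL", "I-SCHOOL"]
entities = decode_bio(tokens, tags)
```

With these inputs, `entities` is `[("Tianqifu", "CELL"), ("First Elementary School", "SCHOOL")]`; each pair is a candidate expandable item to look up in the knowledge graph.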

Step S230: retrieving the knowledge graph in which the entity is located from the long-document knowledge graph, and integrating the information of that knowledge graph into natural language to obtain an expanded sentence.

In this step, the knowledge graph in which the entity is located is retrieved to obtain the entities and entity relationships related to the entity. The related entities and their corresponding entity relationships are organized into expanded sentences according to natural language rules, which completes the expansion of the expandable item (the entity) in the expandable directions (the related entities and entity relationships). The expansion direction of the expanded sentences is thus defined by the long-document knowledge graph of the specified field. FIG. 3 is a schematic diagram of a knowledge graph in one embodiment and shows the knowledge graph in which entity 1 is located; FIG. 4 is a diagram of a knowledge graph applied to real estate project documents in one embodiment and shows the knowledge graph in which the entity "cell" is located.

Step S240: adding the expanded sentence to the first draft of the document to generate a target document.

In this step, the expanded sentence is added to the first draft so as to expand it.

The expanded sentence may be placed at a position selected by the user, or the user may add it to the draft manually. If the expanded sentence does not read smoothly or lacks information, the user may modify it and add information.

According to this document generation method, the entities in the first draft of the document are identified through the entity sequence model, so that expandable items are found on the basis of the long-document knowledge graph. The knowledge graph in which each entity is located is retrieved, its information is integrated into expanded sentences, and the expanded sentences are added to the first draft to generate an expanded target document. The long-document knowledge graph confines the expansion direction to the specified field to which it belongs. At the same time, its breadth provides multiple expansion directions, which improves the efficiency of long-document creation.

In one embodiment, after the step of generating the target document in step S240, the method further includes:

Step S251: performing a text error-correction check and a forbidden-word check on the target document; step S252: if the target document contains a grammar error, marking the grammar error with an underline; step S253: if the target document contains a forbidden word, enclosing the forbidden word in a box.

The document generation method can perform intelligent error correction and forbidden-word detection to ensure the quality of the generated text. The text error-correction check determines whether the target document contains grammar errors and marks suspected errors with underlines; the forbidden-word check detects whether forbidden words are used in the target document and encloses suspected forbidden words in boxes. For example, FIG. 5 is a schematic diagram of the marking of grammar errors and forbidden words in one embodiment: when the target document is returned to the user and displayed, the underlines and boxes may be displayed at the same time to prompt the user about the suspected errors and suspected forbidden words.
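The forbidden-word check of step S253 can be sketched as a plain dictionary lookup. This is an assumption: the application does not say how forbidden words are detected, and the word list below is hypothetical. Double square brackets stand in for the box drawn in FIG. 5, and the grammar check of step S252 is not covered.

```python
import re


def mark_forbidden(text, forbidden_words):
    """Wrap each forbidden word in [[...]] and return the marked text and hits."""
    if not forbidden_words:
        return text, []
    # Try longer phrases first so a phrase wins over a shorter word inside it.
    ordered = sorted(forbidden_words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered))
    hits = [m.group(0) for m in pattern.finditer(text)]
    marked = pattern.sub(lambda m: "[[" + m.group(0) + "]]", text)
    return marked, hits


marked, hits = mark_forbidden(
    "The best homes at the lowest price.",
    ["best", "lowest price"],  # hypothetical forbidden-word list
)
```

Here `marked` is "The [[best]] homes at the [[lowest price]]." and `hits` lists the two matches in order. Matching is by substring, which suits text without word boundaries but would need a word-boundary rule for English.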

In one embodiment, the step in step S230 of retrieving the knowledge graph in which the entity is located from the long-document knowledge graph, and integrating the information of that knowledge graph into natural language to obtain the expanded sentence, may include:

S231: retrieving the relationship entities connected with the entity and the relationship information between the entity and the relationship entities, and acquiring the entity attributes of the relationship entities.

The relationship entities connected with the entity are determined, the relationship entities and their entity attributes are retrieved, and the relationship information between the entity and each relationship entity is retrieved.

For example, "Tianqifu" in the first draft of the document is matched to the entity "cell" (a residential community) in the long-document knowledge graph shown in FIG. 4, and the attributes of "cell" include "name". The knowledge graph in which "cell" is located is retrieved, and the relationship entities connected with "cell" include "school", "subway station" and "sports supporting facility". The attributes of "school" include "scale" and "nature", and the relationship information between "cell" and "school" is "distance"; the attributes of "subway station" include "name" and "line number", and the relationship information between "cell" and "subway station" is "distance"; the attributes of "sports supporting facility" include "occupied area", "type" and "suitable crowd", and the relationship information between "cell" and "sports supporting facility" is "owns".

S232: searching for characteristic information of the entity associated with the relationship entities, the relationship information and the entity attributes.

The characteristic information corresponding to the entities related to the entity is searched for in the database, or the user is prompted to provide the associated characteristic information.

The characteristic information is the specific, real information related to the entity. Taking the matched entity "Tianqifu" as an example, the "school" and "subway station" near "Tianqifu" and the "sports supporting facilities" owned by "Tianqifu" are searched for in the database; the "scale" and "nature" of the "school", the "name" and "line number" of the "subway station", and the "occupied area", "type" and "suitable crowd" of the "sports supporting facilities" are then determined. For example, the search may find a matched school within 500 meters of Tianqifu: First Elementary School, which is a public school with 60 teaching classes and 3000 students.

S233: integrating the characteristic information and the relationship information into natural language to generate the expanded sentence.

The relationship information and the characteristic information are combined to generate the expanded sentence. Taking the characteristic information of the school and the distance as an example, the generated expanded sentence may be "First Elementary School is 500 meters away". A matching expansion template may also be invoked to integrate the characteristic information and the relationship information, for example "Tianqifu is adjacent to First Elementary School".
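Steps S231 to S233 can be sketched in Python under stated assumptions. The dictionary-based graph, the attribute table and the per-relation sentence templates below are illustrative stand-ins, not the data model of the application, and the "fitness center" entry is invented for the example. The school values follow the Tianqifu example in the text.

```python
# Hypothetical instance-level graph: entity -> outgoing edges (S231).
GRAPH = {
    "Tianqifu": [
        {"relation": "distance", "value": "500 meters",
         "tail": "First Elementary School", "tail_type": "school"},
        {"relation": "owns", "value": None,
         "tail": "fitness center", "tail_type": "sports facility"},
    ],
}

# Hypothetical characteristic information looked up per related entity (S232).
ATTRIBUTES = {
    "First Elementary School": {"nature": "public", "classes": 60, "students": 3000},
    "fitness center": {"kind": "indoor"},
}

# Assumed expansion templates, keyed by (relation, related-entity type).
SENTENCE_TEMPLATES = {
    ("distance", "school"): ("{tail}, a {nature} school with {classes} classes "
                             "and {students} students, is {value} from {head}."),
    ("owns", "sports facility"): "{head} has its own {tail}.",
}


def expand_entity(head, graph, attributes, templates):
    """S233: render one expanded sentence per edge that has a template."""
    sentences = []
    for edge in graph.get(head, []):
        template = templates.get((edge["relation"], edge["tail_type"]))
        if template is None:
            continue  # no wording known for this relation; skip it
        fields = dict(attributes.get(edge["tail"], {}))
        fields.update(head=head, tail=edge["tail"], value=edge["value"])
        sentences.append(template.format(**fields))
    return sentences


sentences = expand_entity("Tianqifu", GRAPH, ATTRIBUTES, SENTENCE_TEMPLATES)
```

For this data the function yields "First Elementary School, a public school with 60 classes and 3000 students, is 500 meters from Tianqifu." and "Tianqifu has its own fitness center." Fixed templates are only one way to produce the natural language; the application leaves the integration method open.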

The relationships connected to an entity may include first-level relationships, which are directly connected, and multi-level relationships, which are indirectly connected. For example, it is possible to specify how many levels of relationships of the entity are retrieved, or to retrieve the relationship entities under a specified level of relationship together with the relationship information between them and their entity attributes. The following embodiments take the knowledge graph of the entity under first-level and multi-level relationships as examples to describe first-level expanded sentences and multi-level expanded sentences.

In one embodiment, the expanded sentences include first-level expanded sentences, which are expanded sentences directly related to the entity. In step S230, the step of retrieving the knowledge graph in which the entity is located from the long-document knowledge graph, and integrating the information of that knowledge graph into natural language to obtain the expanded sentence, may include:

S2301: retrieving, from the long-document knowledge graph, the first-level relationship entities directly connected with the entity and the first-level relationship information between them. S2302: acquiring the entity attributes of the entity and of the first-level relationship entities in the long-document knowledge graph. S2303: integrating the first-level relationship entities, the first-level relationship information and the entity attributes into natural language to obtain a first-level expanded sentence.

This document generation method expands the document according to the most relevant first-level relationships, which improves the relevance between the expanded content and the first draft of the document and improves the scope and efficiency of long-document expansion.

In one embodiment, the expanded sentences further comprise multi-level expanded sentences, which are expanded sentences indirectly related to the entity. In step S230, the step of retrieving the knowledge graph in which the entity is located from the long-document knowledge graph, and integrating the information of that knowledge graph into natural language to obtain an expanded sentence, may further include:

S2304: retrieving, from the long-document knowledge graph, the multi-level relationship entities indirectly connected with the entity and the multi-level relationship information between them. S2305: acquiring the entity attributes of the entity and of the multi-level relationship entities in the long-document knowledge graph. S2306: integrating the multi-level relationship entities, the multi-level relationship information and the entity attributes into natural language to obtain a multi-level expanded sentence.

This document generation method can also expand the document according to indirectly related multi-level relationships. On the basis of the first-level relationships, this adds deeper, more comprehensive and more diverse expansion directions, enriches the ideas available for document creation, and improves the efficiency of long-document creation.
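One way to realize the distinction between first-level and multi-level relationships is a breadth-first traversal with a depth limit. The sketch below assumes this approach; the application does not name a traversal algorithm, and the graph contents are hypothetical.

```python
from collections import deque


def related_entities(graph, start, max_depth):
    """Return {entity: (level, relation_path)} for entities within max_depth hops.

    Level 1 corresponds to first-level (directly connected) relationship
    entities; levels 2 and above correspond to multi-level ones.
    """
    found = {}
    seen = {start}
    queue = deque([(start, 0, [])])
    while queue:
        node, depth, path = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested level
        for relation, neighbor in graph.get(node, []):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            found[neighbor] = (depth + 1, path + [relation])
            queue.append((neighbor, depth + 1, path + [relation]))
    return found


# Hypothetical graph: entity -> list of (relation, neighbor) pairs.
graph = {
    "Tianqifu": [("distance", "Subway Station A")],
    "Subway Station A": [("on line", "Line 3")],
    "Line 3": [("reaches", "City Center")],
}
first_level = related_entities(graph, "Tianqifu", 1)
two_levels = related_entities(graph, "Tianqifu", 2)
```

`first_level` contains only "Subway Station A", while `two_levels` also contains "Line 3" with the relation path `["distance", "on line"]`. The path can then be rendered into a multi-level expanded sentence in the same way as a first-level one.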

In an embodiment, before the step of obtaining the first manuscript obtained by filling the key information into the manuscript template, the method may further include:

s201: and marking variable information on each file in the file database. S202: various types of content modules are divided from the file, and writing templates are extracted from the content modules respectively.

The extraction basis of the writing template is a large amount of real document data with labels, and for each document, the variable information corresponding to each document is divided and labeled, and the content module to which the content in each document belongs is indicated.

S203: and marking the key information type for the content template according to the key information in the content template.

Any content module also needs to be labeled with key information therein. The key information revelation is special information exclusive to the file, the special information is extracted, and the rest content is the writing template. And the special file can be generated by filling the key information defined by the user.

S204: and matching the variable information and the key information input by the user with the marked variable information and the marked key information type to obtain the file template.

The document generation method extracts document templates in advance and provides the template that matches the information input by the user, thereby improving document generation efficiency.
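The matching in step S204 can be sketched as a lookup over labeled templates. This is only an illustration under stated assumptions: the template records, the label names (`purpose`, `style`) and the placeholder syntax are hypothetical, and exact-match comparison stands in for whatever matching rule the application actually uses.

```python
# Hypothetical labeled templates: variable information, key information
# types, and the writing template text with placeholders.
TEMPLATES = [
    {
        "variables": {"purpose": "promotion", "style": "light luxury"},
        "key_types": {"certification", "service"},
        "text": "Our property team holds {certification} and offers {service}.",
    },
    {
        "variables": {"purpose": "promotion", "style": "youth"},
        "key_types": {"subway"},
        "text": "Only minutes from {subway}.",
    },
]


def match_template(templates, variables, key_info):
    """Return the first template whose labels match the user input."""
    for template in templates:
        same_variables = template["variables"] == variables
        # Every key information type the template needs must be supplied.
        keys_available = template["key_types"] <= set(key_info)
        if same_variables and keys_available:
            return template
    return None


def fill_template(template, key_info):
    """Fill the key information into the template to get a first draft."""
    return template["text"].format(**key_info)
```

For example, matching `{"purpose": "promotion", "style": "youth"}` with key information `{"subway": "Line 3"}` selects the second template, and `fill_template` produces the first draft text.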

Take the document template applied to real estate project documents as an example. The document database contains a large number of real estate project documents, and each of them needs to be labeled as a whole with its advertising purpose, project attribute, literary style, type and the like. FIG. 6 is a diagram of a document template in yet another embodiment. The first content module shown in FIG. 6 is a writing template of the value output class that carries the tags promotional, Chinese family (project attribute), ethnic style and hard advertisement (type), and that is filled at certain positions with the property's certification qualifications and exclusive service information.

Taking real estate project documents as an example, the types of the content modules include one or more of: property, decoration, garden, project planning, municipal support, intelligence, education, house type descriptions, product advantages, district, business district, traffic, topic thesaurus, gifts, and giveaway and discount offers.

The document database stores real estate promotion documents, and the variable information includes one or more of advertising purpose, project attribute, literary style and type.

The types of the entities in the long document knowledge graph include one or more of a building district, a traffic facility, a school, a sports fitness facility, a hospital, a bank, a property, decoration and a house type.

The types of the entity relationships include the distance between the building district and a traffic facility, a school, a hospital or a bank, and also include one or more of the affiliation relationships between the building district and a sports fitness facility, a property, decoration and a house type.

The variable information may include the advertising purpose, project attribute, literary style and type, as well as the content modules of the value output class, brand value class, regular promotion class and node promotion class. The key information may include the characteristic information of the various types of content modules in the application field.

Taking the characteristic information of real estate project documents as an example, the types of the content modules include one or more of a value output class, a brand value class, a regular promotion class and a node promotion class.

The information of the value output class content module includes one or more of: property information, decoration information, garden information, project planning information, municipal support information, intelligent system information, education information, house type description information, product advantage information, district information, business district information and traffic information. For example, the property information includes certification qualifications, exclusive services and the like; the decoration information includes decoration price, decoration style, designer and the like; the garden information includes style, area, highlights and the like; the project planning information includes plot ratio, floor area, greening rate, building area and the like; the municipal support information includes government investment, municipal administration and the like; the intelligent system information includes fresh air, purified water, access control, security and the like; the education information includes primary schools, junior high schools, universities and the like; the house type description information includes house type characteristics, area, halls and the like; the product advantage information includes house type characteristics, area, halls and the like; the district information includes hospitals, banks, entertainment, living and leisure facilities and the like; the business district information includes hospitals, banks, entertainment, living and leisure facilities, shopping centers and the like; the traffic information includes subways, highways, expressways, urban rail, high-speed lines and the like.

The information for the brand value class content module can be selected from content in the document library such as group introduction, group structure, development history, honors received, service scope, group philosophy, development goals, corporate culture, social responsibility, public welfare activities, group milestones and the like.

The information of the regular promotion class content module may include one or more of topic thesaurus information, gift information, and giveaway and discount offer information. The information of the node promotion class content module includes one or more of topic thesaurus information, gift information, remaining-unit special offer information, and giveaway and discount offer information. For example, the topic thesaurus information includes the topic, the promotion means, the promotion intensity and the like; the giveaway and discount offer information includes discounts, giveaways and the like; the gift information includes the gift and the conditions for receiving it; the remaining-unit special offer information includes the preferential offers, discounts and the like for the housing units.

Taking real estate project documents as an example, the project attributes may include Chinese luxury homes, general luxury homes, quality type conventional homes, life type conventional homes, coastal areas, hot springs, mountains, apartments, office buildings and the like; the literary style may include a human style, a local style, a light luxury style, a literary style, a youth style, a vacation and leisure style, an office business style, an investment upgrade style, an English noble style, a business style, an emotional appeal style and the like; the content modules of the value output class may include project-related information such as property, decoration, garden, project planning, municipal support, intelligence, education, house type descriptions, product advantages, district, business district, traffic and the like.

In one embodiment, before the step of retrieving the knowledge-graph of the entity from the long-document knowledge-graph, the method may further include:

S261: Apply named entity recognition and relationship extraction to each document in the document database to obtain the entities and entity relationships in the documents.

The existing documents in the document database are processed with named entity recognition and relationship extraction tools, which recognize the entities in the documents and extract the relationships between them; the resulting entities and entity relationships are then collected.

S262: Construct the long document knowledge graph according to the variable information and key information types labeled in the document database, the specified professional knowledge, and the entities and entity relationships in the documents.

The entities and entity relationships that can be incorporated into the long document knowledge graph are determined according to the variable information and key information types labeled in the document database, the specified professional knowledge, and the entities and entity relationships in the documents, and the long document knowledge graph is constructed from them.

For example, in real estate project documents, "building name", "community name", "nearby subway line", "nearby school name", "decoration style" and the like can be used as entity types; "name", "type" and the like can be used as entity attributes; and "distance between two places", "means of transportation between two places" and the like can be used as entity relationship types. Finally, the results of the above work are integrated into a knowledge graph.
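The construction step can be sketched as filtering extracted entities and relationships against the allowed types and storing what remains as nodes and edges. The sketch below assumes that named entity recognition and relationship extraction have already produced the inputs; all names, types and sample values are hypothetical.

```python
from collections import defaultdict


def build_graph(entities, triples, allowed_types, allowed_relations):
    """Keep only entities and relations of allowed types; return (nodes, edges)."""
    # nodes: entity name -> attribute dictionary (must include "type")
    nodes = {
        name: attrs
        for name, attrs in entities.items()
        if attrs.get("type") in allowed_types
    }
    # edges: head entity -> list of (relation, tail entity, properties)
    edges = defaultdict(list)
    for head, relation, tail, props in triples:
        if relation in allowed_relations and head in nodes and tail in nodes:
            edges[head].append((relation, tail, props))
    return nodes, dict(edges)


# Hypothetical output of entity recognition and relationship extraction.
entities = {
    "Garden Estate": {"type": "building name"},
    "Metro Line 1": {"type": "nearby subway line"},
    "Mr. Li": {"type": "person"},  # not an allowed type, will be dropped
}
triples = [
    ("Garden Estate", "distance", "Metro Line 1", {"meters": 500}),
    ("Garden Estate", "sold by", "Mr. Li", {}),
]
nodes, edges = build_graph(
    entities,
    triples,
    allowed_types={"building name", "nearby subway line"},
    allowed_relations={"distance"},
)
```

Here the allowed type sets play the role of the labeled variable information, key information types and specified professional knowledge that decide what enters the graph.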

In one embodiment, before the step of identifying the entity of the long document knowledge graph in the first manuscript of the document through the entity sequence model, the method further comprises the following steps:

Inputting document samples, whose content is labeled with the corresponding entities of the long document knowledge graph, into an initial entity sequence model to train the entity sequence model.

The content of the document samples is labeled with entities, specifically the entities that appear in the long document knowledge graph. The purpose of the entity sequence model is to identify the entities contained in a piece of text, and in particular the entities in the long document knowledge graph. The entity sequence model may output the entities in the first draft; for example, the output tags identify which entities the first draft contains. The position of each entity in the first draft can then be determined from the output, and the expansion statement is added at that position, which makes the expanded document read more fluently. When the first draft is displayed, the entities can be underlined at their positions to prompt the user.
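How entity positions drive the placement of expansion statements can be sketched as follows. Note the substitution: a simple dictionary lookup stands in for the trained entity sequence model, which the application does not specify in detail. The function names, the sample text and the rule of inserting after the sentence that contains the entity are all assumptions made for illustration.

```python
def find_entities(text, known_entities):
    """Locate known entities in the text; returns sorted (start, end, name) spans."""
    spans = []
    # Try longer names first so that overlapping shorter names are skipped.
    for name in sorted(known_entities, key=len, reverse=True):
        start = text.find(name)
        while start != -1:
            end = start + len(name)
            if all(end <= s or start >= e for s, e, _ in spans):
                spans.append((start, end, name))
            start = text.find(name, end)
    return sorted(spans)


def insert_expansions(text, spans, expansions):
    """Insert each expansion after the sentence containing its entity."""
    # Work from right to left so earlier offsets stay valid after insertion.
    for start, end, name in sorted(spans, reverse=True):
        if name not in expansions:
            continue
        sentence_end = text.find(".", end)
        pos = len(text) if sentence_end == -1 else sentence_end + 1
        text = text[:pos] + " " + expansions[name] + text[pos:]
    return text


draft = "Garden Estate is now on sale. Visit us today."
spans = find_entities(draft, {"Garden Estate"})
expanded = insert_expansions(
    draft, spans, {"Garden Estate": "It is 500 m from Metro Line 1."}
)
```

The same spans could also be used to underline the entities when the first draft is displayed.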

In one embodiment, after the step of retrieving the knowledge-graph of the entity from the long document knowledge-graph in step S230, the method may further include:

S271: Generate expansion suggestion information according to the entities and entity relationships in the knowledge graph where the entity is located.

An expansion suggestion template is retrieved, and the entities and entity relationships are filled into it to obtain the expansion suggestion information.

S272: Return the expansion suggestion information to the user, and receive the expansion text input by the user according to the expansion suggestion information.

FIG. 7 is a schematic diagram of the expansion suggestion information in an embodiment. As shown in FIG. 7, an expansion suggestion for "tianfang fu" is returned and displayed to the user: when the user's cursor moves over that text, the corresponding expansion suggestion is displayed. The other underlined text likewise has corresponding expansion suggestions. Following the relevant suggestions, the user inputs expansion text that supplements and expands the first draft of the document.

S273: Add the expansion text to the first draft of the document to generate the target document.

In this method for generating a long document, the first draft is expanded by adding the expansion text that the user supplies in response to the expansion suggestion information, which yields the target document. Because the user can freely expand the draft according to the suggestions and create documents of different lengths, the method suits different scenarios and improves the efficiency of long document creation.
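Steps S271 to S273 can be sketched as filling a suggestion template and then merging the user's text into the draft. The template wording, function names and sample data are hypothetical; inserting the user's text directly after the entity is one possible placement, chosen here only for illustration.

```python
# Hypothetical expansion suggestion template.
SUGGESTION_TEMPLATE = "Consider expanding '{entity}' with its {relation}: {neighbor}."


def build_suggestions(entity, edges):
    """Fill each (relation, neighbor) pair of the entity into the template."""
    return [
        SUGGESTION_TEMPLATE.format(entity=entity, relation=relation, neighbor=neighbor)
        for relation, neighbor in edges
    ]


def apply_user_text(draft, entity, user_text):
    """Insert the user's expansion text right after the entity in the draft."""
    index = draft.find(entity)
    if index == -1 or not user_text:
        return draft  # nothing to expand
    pos = index + len(entity)
    return draft[:pos] + " " + user_text + draft[pos:]


suggestions = build_suggestions(
    "Garden Estate", [("nearby subway line", "Metro Line 1")]
)
target = apply_user_text(
    "Garden Estate is now on sale.",
    "Garden Estate",
    "(500 m from Metro Line 1)",
)
```

In an interactive editor, `suggestions` would be shown when the cursor moves over the underlined entity, and `apply_user_text` would run once the user submits the expansion text.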

In an embodiment, the present application further provides a document generation system. FIG. 8 is a schematic structural diagram of the document generation system in an embodiment. As shown in FIG. 8, the system may specifically include an obtaining module 810, an identification module 820, a retrieving module 830 and a generating module 840, wherein:

the obtaining module 810 is configured to obtain a first manuscript of the document obtained by filling the key information into the document template.

The obtaining module 810 may obtain the first draft by selecting a document template, determining the key information to be filled into it, and filling the key information into the corresponding positions of the template. For example, the user may input the key information that completes the document template, or the key information may be obtained by looking up corresponding pre-stored data.

The identification module 820 is configured to identify, through an entity sequence model, the entities of the long document knowledge graph in the first draft of the document, where the entity sequence model is obtained by training on articles labeled with entities.

The identification module 820 identifies the entities in the first draft that belong to the long document knowledge graph, thereby identifying the extensible items of the first draft that can be expanded based on the long document knowledge graph.

The retrieving module 830 is configured to retrieve the knowledge graph of the entity from the long-document knowledge graph, and integrate the information of the knowledge graph of the entity into a natural language to obtain an expanded statement.

The retrieving module 830 retrieves the knowledge graph of the entity, obtains the entity and the entity relationship related to the entity, organizes the related entity and the entity relationship corresponding to the related entity into an extension statement according to the natural language rule, and completes the extension of the extensible item (entity) in the extensible direction (related entity and entity relationship).

The generating module 840 is configured to add an extension statement to the first manuscript of the document to generate a target document.

The generating module 840 adds the expansion statements into the primary manuscript to realize expansion of the primary manuscript.

The document generation system identifies the entities in the first draft through the entity sequence model and thereby finds the items that can be expanded based on the long document knowledge graph. It retrieves the knowledge graph where each entity is located, integrates its information into expansion statements, and adds the expansion statements to the first draft to generate the expanded target document. This keeps the expansion direction within the designated field covered by the long document knowledge graph, while the breadth of that knowledge graph provides multiple expansion directions and improves the efficiency of long document creation.
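The cooperation of the four modules can be sketched as a small pipeline. This is a structural illustration only: each module is modeled as an injected callable, and the class name, field names and the toy callables in the usage example are hypothetical rather than the system's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DocumentGenerationSystem:
    """Wires together stand-ins for modules 810, 820, 830 and 840."""

    obtain: Callable[[str, Dict[str, str]], str]   # obtaining module 810
    identify: Callable[[str], List[str]]           # identification module 820
    retrieve: Callable[[str], str]                 # retrieving module 830
    generate: Callable[[str, List[str]], str]      # generating module 840

    def run(self, template: str, key_info: Dict[str, str]) -> str:
        draft = self.obtain(template, key_info)
        entities = self.identify(draft)
        statements = [self.retrieve(entity) for entity in entities]
        return self.generate(draft, statements)


# Toy stand-ins for each module, for illustration only.
system = DocumentGenerationSystem(
    obtain=lambda template, key_info: template.format(**key_info),
    identify=lambda draft: [e for e in ["Metro Line 1"] if e in draft],
    retrieve=lambda entity: f"{entity} runs every 5 minutes.",
    generate=lambda draft, statements: " ".join([draft] + statements),
)
result = system.run("{name} is near Metro Line 1.", {"name": "Garden Estate"})
```

Keeping the modules as separate callables mirrors the description that they may be implemented in software, hardware or a combination of both.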

For the specific limitations of the document generation system, reference may be made to the limitations of the document generation method above, which are not repeated here. The modules of the document generation system may be implemented in whole or in part by software, hardware or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to each module.

FIG. 9 is a schematic diagram of the internal structure of a computer device in one embodiment. As shown in FIG. 9, the computer device includes a processor, a non-volatile storage medium, a memory and a network interface connected by a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and a computer program that, when executed by the processor, causes the processor to implement a document generation method. The processor of the computer device provides computing and control capability and supports the operation of the whole computer device. The memory of the computer device may store a computer program that, when executed by the processor, causes the processor to perform a document generation method. The network interface of the computer device is used to connect to and communicate with a terminal. Those skilled in the art will appreciate that the architecture shown in FIG. 9 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.

A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the document generation method according to any of the above embodiments when executing the computer program.

A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the document generation method according to any of the above embodiments.

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims

1. A document generation method is characterized by comprising the following steps:

2. The method of claim 1, wherein the step of obtaining a first draft of the document obtained by populating the document template with the key information comprises:

matching the document template according to variable information and key information input by a user;

receiving the supplementary information of the file template input by the user;

and generating the primary manuscript of the document according to the key information, the supplementary information and the document template.

3. The method of claim 1, wherein the step of obtaining a first manuscript of a document obtained by filling key information into a document template further comprises:

marking variable information on each file in a file database;

dividing various types of content modules from the file, and extracting writing templates from the content modules respectively;

marking key information types for the content template according to the key information in the content template;

4. The document generation method of claim 3, wherein before the step of retrieving the knowledge-graph of the entity from the long document knowledge-graph, further comprising:

5. The method of claim 4, wherein prior to the step of identifying the entity of the long document knowledge graph in the first manuscript of the document by the entity sequence model, the method further comprises:

6. The method of claim 1, wherein the step of retrieving the knowledge graph of the entity from the long document knowledge graph and integrating the information of the knowledge graph of the entity into natural language to obtain the expanded statement comprises:

calling relationship entities connected with the entities and relationship information among the relationship entities, and acquiring entity attributes of the relationship entities;

searching a database for characteristic information of the entity associated with the relationship entity, the relationship information and the entity attribute;

and integrating the characteristic information and the relationship information into natural language to generate the expanded statement.

7. The method of generating a document according to claim 1, wherein the extension sentence includes a primary extension sentence;

calling primary relationship entities directly connected with the entities and primary relationship information among the primary relationship entities from the long document knowledge graph;

acquiring entity attributes of the entity and the primary relation entity in the long file knowledge graph;

and integrating the primary relationship entity, the primary relationship information and the entity attribute into a natural language to obtain a primary extension statement.

8. The document generation method according to claim 7, wherein the extension sentence further comprises a multi-level extension sentence;

calling multi-level relation entities indirectly connected with the entities and multi-level relation information among the entities from the long document knowledge graph;

acquiring entity attributes of the entity and the multi-level relation entity in the long file knowledge graph;

and integrating the multi-level relational entity, the multi-level relational information and the entity attribute into a natural language to obtain the multi-level extension statement.

9. The document generation method of claim 1, wherein after the step of retrieving the knowledge-graph of the entity from the long document knowledge-graph, the method further comprises:

generating extension suggestion information according to the entity and the entity relation in the knowledge graph where the entity is located;

returning the extended suggestion information to the user, and receiving an extended text input by the user according to the extended suggestion information;

and adding the extended text in the primary manuscript of the file to generate the target file.

10. The method of claim 1, further comprising, after the step of generating the target document:

carrying out text error correction inspection and forbidden word inspection on the target file;

if the target file has a grammar error, marking the grammar error by underlining;

and if the target document contains forbidden words, marking the forbidden words with a surrounding box.

11. The document generation method of claim 4, wherein the types of the content modules comprise one or more of: property, decoration, garden, project planning, municipal support, intelligence, education, house type descriptions, product advantages, district, business district, traffic, topic thesaurus, gifts, and giveaway and discount offers;

the document database stores real estate promotion documents, and the variable information comprises one or more of advertising purpose, project attribute, literary style and type;

the types of the entity relationships comprise the distance between the building district and a traffic facility, a school, a hospital or a bank, and also comprise one or more of the affiliation relationships between the building district and a sports fitness facility, a property, decoration and a house type.

12. A document generation system, comprising:

13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the document generation method of any one of claims 1 to 11 when executing the computer program.

14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the document generation method according to any one of claims 1 to 11.