CN116933757A - Document generation method and system applying language artificial intelligence

Info

Publication number: CN116933757A
Application number: CN202311187668.7A
Authority: CN
Inventors: 蓝建敏; 池沐霖; 李观春; 徐泳坚
Original assignee: Excellence Information Technology Co ltd
Current assignee: Excellence Information Technology Co ltd
Priority date: 2023-09-15
Filing date: 2023-09-15
Publication date: 2023-10-24
Anticipated expiration: 2043-09-15
Also published as: CN116933757B

Abstract

The application provides a document generation method and a system applying language artificial intelligence, wherein an information extraction model is used for extracting the relation of a text document to obtain a plurality of triples to form a triplet set, a text document is used for carrying out fine tuning training on a pre-training language model to obtain a generation model, the generation model is used for complementing the template document to obtain a complement text, a semantic condensation reaction is carried out on the triplet set according to the complement text to obtain a text reaction coefficient, and the complement text is condensed according to the text reaction coefficient, so that the safety and quality of text generation are better ensured.

Description

Document generation method and system applying language artificial intelligence

Technical Field

The application belongs to the field of processing optimization, and particularly relates to a document generation method and system applying language artificial intelligence.

Background

Generating a document with language artificial intelligence means using a computer system to automatically produce a document that meets grammatical, logical and semantic requirements, by means of natural language processing, machine learning, deep learning and related technologies. The technology has broad application prospects in fields such as law, public service, medical treatment and finance. Natural language processing (NLP) can be used for tasks such as lexical analysis, syntactic analysis and semantic understanding, with models trained on large amounts of text data, but the prior art still has difficulty understanding complex knowledge and context. When generating a long document, a model may produce logic errors, inconsistencies or passages that lack context. For documents in a particular field, obtaining a large amount of high-quality training data is a challenge, and a lack of domain-specific data may cause the generated results to deviate from what is expected. Generated documents may involve plagiarism, or contain inappropriate, illegal or biased content, so suitable regulatory mechanisms and algorithms are needed to ensure the reliability and compliance of the documents. The data set used by the model may also carry sample bias and tendency, which can make the generated document biased, discriminatory or unfair; this needs particular attention when generating documents on topics that affect information monitoring. Patent document CN113868391A provides a legal document generation method based on a knowledge graph; although it can determine the target judgment result for a case to be processed from the case knowledge graph, it is difficult for it to control the generation of inappropriate or biased content.
Patent document CN113420143A provides a document abstract generation method; although it can perform context semantic analysis on a target text based on document entity elements to obtain context semantic vectors of those elements, it is difficult for it to capture a preset multi-hop knowledge relationship, and it also has difficulty avoiding sample bias and tendency.

Disclosure of Invention

The application aims to provide a document generation method and a document generation system applying language artificial intelligence, which are used for solving one or more technical problems in the prior art and at least providing a beneficial selection or creation condition.

To achieve the above object, according to an aspect of the present application, there is provided a document generation method applying language artificial intelligence, the method comprising the steps of:

inputting a text document;

using an information extraction model to extract the relation of the text document to obtain a plurality of triples to form a triplet set;

inputting a template document;

using the generation model to complement the template document to obtain a complement text;

carrying out semantic condensation reaction on the triplet set according to the complement text to obtain a text reaction coefficient;

condensing the complement text according to the text reaction coefficient.

Further, the text document entered is string data representing one or more articles.

Further, the information extraction model is an information extraction model based on a pre-training language model, and the generation model is a generation model obtained by performing fine-tuning training on the pre-training language model according to the text document;

in some embodiments, to save training costs, the information extraction model may be implemented by performing zero-shot information extraction through chat with ChatGPT, while in other embodiments, to ensure data security and independence, a Chinese information extraction framework built on BERT (e.g., Bert-NER) may be used.

Further, each triple in the triplet set is a three-element array of character strings, every character string in a triple occurs in the input text document, and the triples in the triplet set are mutually distinct. A triple has the form (Subject, Predicate, Object): the Subject at the head is the head entity, the Object at the end is the tail entity, and the Predicate in the middle is the entity relation between the two entities; Subject, Predicate and Object are all character strings.
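The three constraints on the triplet set (string fields, strings drawn from the input text document, mutual distinctness) can be illustrated with a minimal sketch. The names `Triple` and `build_triple_set` and the sample sentences are hypothetical and are not taken from the application; the candidate triples stand in for the output of an information extraction model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One (Subject, Predicate, Object) relation; every field is a plain string."""
    subject: str
    predicate: str
    obj: str


def build_triple_set(candidates, source_text):
    """Keep candidates whose strings all occur in source_text, dropping repeats."""
    seen = set()
    result = []
    for subject, predicate, obj in candidates:
        triple = Triple(subject, predicate, obj)
        in_source = all(part in source_text for part in (subject, predicate, obj))
        if in_source and triple not in seen:
            seen.add(triple)
            result.append(triple)
    return result


text = "The committee approved the budget. The mayor chaired the committee."
raw = [
    ("committee", "approved", "budget"),
    ("committee", "approved", "budget"),  # repeated candidate, dropped
    ("mayor", "chaired", "committee"),
    ("mayor", "vetoed", "budget"),        # "vetoed" is not in the text, dropped
]
triples = build_triple_set(raw, text)
```

A frozen dataclass is hashable, which is what allows the `seen` set to enforce distinctness while the list preserves extraction order.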

Further, the template document is a text containing a plurality of gap-filling positions, and the complement text is composed of a plurality of different lemmas (a lemma here denotes a token, of character string type), each lemma corresponding to one gap-filling position. Gap-filling positions are never directly connected: interval characters lie between them. Two gap-filling positions that have only interval characters between them, and no other gap-filling position, are called adjacent gap-filling positions, and the lemmas corresponding to adjacent gap-filling positions are adjacent lemmas.

Further, the method for using the generation model to complement the template document to obtain the complement text is: using the masking mechanism of the pre-training language model, the generation model complements the template document to obtain the complement text.
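As a rough illustration of how a template with gap-filling positions yields a complement text and pairs of adjacent lemmas, the sketch below splits a template on a placeholder and fills each gap through a caller-supplied function. The `[MASK]` marker, the function `complete_template` and the canned fills are assumptions made for the example; in the application the fills would come from the fine-tuned generation model, which is not reproduced here.

```python
MASK = "[MASK]"  # hypothetical marker for a gap-filling position


def complete_template(template, fill_gap):
    """Fill every gap; return the complement text and the adjacent-lemma groups.

    fill_gap(n, left, right) is a stand-in for the generation model: it gets the
    gap index and the text immediately before and after the gap.
    """
    pieces = template.split(MASK)  # text before, between and after the gaps
    lemmas = [fill_gap(n, pieces[n], pieces[n + 1]) for n in range(len(pieces) - 1)]
    completed = pieces[0] + "".join(l + p for l, p in zip(lemmas, pieces[1:]))
    # pieces[n + 1] holds the interval characters between gap n and gap n + 1
    adjacent = [
        (lemmas[n], pieces[n + 1], lemmas[n + 1])
        for n in range(len(lemmas) - 1)
    ]
    return completed, adjacent


canned = ["the mayor", "the clerk", "the council"]
template = "Chaired by [MASK], recorded by [MASK], approved by [MASK]."
completed, adjacent = complete_template(template, lambda n, left, right: canned[n])
```

Each element of `adjacent` groups two adjacent lemmas with the interval characters between them, which is the shape the later steps work on.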

Further, according to the complement text, carrying out semantic condensation reaction on the triplet set, and obtaining a text reaction coefficient by the following method:

creating a semantic embedding function, wherein the semantic embedding function converts a character string input into the semantic embedding function into a semantic vector with a fixed dimension size for output;

the number of dimensions of the semantic vectors is k, the sequence number of each dimension in the semantic vectors is v, v ∈ [1, k], and the semantic similarity between semantic vectors is a value between 0 and 1;

for two adjacent lemmas, the interval characters between them are acquired, and the three-element array formed by the two adjacent lemmas and the interval characters between them is called an adjacent tuple;

taking a set formed by all adjacent tuples as an adjacent tuple set;

in each adjacent tuple, the two lemmas and the interval characters are each converted into a semantic vector by the semantic embedding function; the semantic similarity between the semantic vector of each of the two lemmas and the semantic vector of the interval characters is calculated; the two semantic similarities are multiplied and the square root of the product is taken as the deviation weight of the adjacent tuple; and the value of each dimension of the semantic vector of the interval characters is multiplied by the deviation weight to obtain the relation correction vector,

the semantic similarities between the semantic vectors of the two lemmas and the semantic vector of the interval characters are recorded as y1 and y2, the semantic vector of the interval characters is Gvec, the value of the dimension with sequence number v in Gvec is Gvec[v], and the relation correction vector is Malec,

$$\mathrm{Malec}[v] = \mathrm{Gvec}[v] \times \sqrt{y1 \times y2}, \quad v \in [1, k],$$

in Malec, the value Gvec[v] × √(y1 × y2) of each dimension can be computed in parallel, unlike high-complexity calculations that require matrix decomposition of the semantic vectors; this makes it easier to accelerate the calculation on distributed computing equipment, relieves the problems caused by the long running time of large-scale pre-training models, and allows documents to be generated on a large scale;
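The deviation weight and relation correction vector described above can be sketched as follows. The embedding used here is only a toy stand-in (hashed character bigrams with an assumed k = 16), chosen so that the example runs without any model; the application's semantic embedding function is not specified at this level, and the names `embed`, `similarity` and `relation_correction_vector` are hypothetical.

```python
import hashlib
import math

K = 16  # assumed number of dimensions k, for illustration only


def embed(text, k=K):
    """Toy stand-in for the semantic embedding function: hashed character bigrams."""
    vec = [0.0] * k
    padded = f" {text} "
    for a, b in zip(padded, padded[1:]):
        digest = hashlib.md5((a + b).encode("utf-8")).digest()
        vec[digest[0] % k] += 1.0
    return vec


def similarity(u, v):
    """Cosine similarity; the components are non-negative, so it lies in [0, 1]."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return min(1.0, dot / norm) if norm else 0.0


def relation_correction_vector(lemma_a, interval, lemma_b):
    """Scale each dimension of the interval vector by sqrt(y1 * y2)."""
    gvec = embed(interval)
    y1 = similarity(embed(lemma_a), gvec)
    y2 = similarity(embed(lemma_b), gvec)
    weight = math.sqrt(y1 * y2)  # deviation weight of the adjacent tuple
    return [g * weight for g in gvec], weight


malec, weight = relation_correction_vector("the mayor", ", recorded by ", "the clerk")
```

Because every dimension is scaled by the same scalar, the list comprehension has no cross-dimension dependency, which is the property the text relies on for parallel computation.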

in the triplet set, the head entity, the entity relation and the tail entity of each triple are each converted into a semantic vector by the semantic embedding function; the semantic vector of the head entity in the triple is recorded as Subvec, the semantic vector of the tail entity in the triple is recorded as Objvec, and the semantic vector of the entity relation in the triple is recorded as Relvec,

the semantic similarity of Subvec and Relvec is calculated as SmR, and the semantic similarity of Objvec and Relvec is calculated as OmR,

calculating a semantic transition value of the triplet, wherein the semantic transition value has a plurality of scores, the number of scores of the semantic transition value is consistent with the number of dimensions of a semantic vector, the sequence number of scores of the semantic transition value is consistent with the sequence number of dimensions of the semantic vector, the semantic transition value is Benec, the score with the sequence number v in Benec is Benec [ v ], and the calculation formula of Benec [ v ] is:

，

it should be noted that the semantic transition value Benec should not be regarded as a vector: the order of its scores is not ordered and fixed in the way the dimensions of a semantic vector are. In the embodiment provided by the application, one state of the semantic transition value is selected for ease of calculation, namely that the number of scores equals the number of dimensions of the semantic vector and the scores are labelled with the sequence numbers of those dimensions. The number of scores may also differ from this; preferably it is greater than or equal to the number of dimensions of the semantic vector, and the scores may be unordered, so that the posterior probability of an entity node of the knowledge graph across multiple hop paths can be fully represented. In each score, the posterior probability of a transfer connection between the head entity and the tail entity is extracted by dividing the combination of the dimensional components of the two sides by their respective semantic similarities with the entity relation;

wherein Subvec[v] represents the value of the dimension with sequence number v in Subvec, and Relvec[v] represents the value of the dimension with sequence number v in Relvec.

In the triplet set, each triple has a corresponding text reaction coefficient for each adjacent tuple, and each adjacent tuple has a corresponding text reaction coefficient for each triple;

for each adjacent tuple, the text reaction coefficient of the adjacent tuple with respect to each triple in the triplet set is calculated, specifically as follows:

the number of triples in the triplet set is recorded as n, the sequence number of a triple in the triplet set is recorded as i, and the triple with sequence number i in the triplet set is recorded as Triple(i),

the number of adjacent tuples in the adjacent tuple set is recorded as m, the sequence number of an adjacent tuple in the adjacent tuple set is recorded as j, and the adjacent tuple with sequence number j in the adjacent tuple set is recorded as Tokens(j),

for Tokens(j), the corresponding relation correction vector is calculated as Malec(j), where the value of the dimension with sequence number v in Malec(j) is Malec(j)[v]; for each Triple(i), the corresponding semantic transition value is calculated as Benec(i), where the score with sequence number v in Benec(i) is Benec(i)[v],

here, the sequence numbers of elements in a set are denoted by parentheses (), and dimensions, components or scores are denoted by square brackets [];

to distinguish the loop over v in Benec(i)[v] from the loop over v in Malec(j)[v], the symbol v in Malec(j)[v] is replaced with v1 to give Malec(j)[v1], and the symbol v in Benec(i)[v] is replaced with v2 to give Benec(i)[v2]; only the symbol is replaced, and v1 and v2 each still range over the original interval [1, k], so that the sequence numbers of the dimensions in Malec(j) and the sequence numbers of the scores in Benec(i) are enumerated independently of each other, realizing a double nested loop, and the text reaction coefficient Condes(j, i) of Tokens(j) with respect to Triple(i) is calculated:

，

simplifying the denominator of the formula can obtain:

，

in the prior art, calculation between vectors or tensors is generally carried out dimension by dimension, which is a single-hop calculation over corresponding dimensions. Triples in a knowledge base, however, have multi-hop relations, and so do the gap-filling positions in the corresponding template document, so the dimension-wise calculation of the prior art is not suited to multi-hop relations. The double nested loop measures the posterior probability of a connection between gap-filling positions in the template document together with the mathematical characteristics of the paths by which entities in the triples reach further entities through entity relations, which helps measure the multi-hop relations of the complement text in the template document.

Further, according to the text reaction coefficient, the method for condensing the complement text comprises the following steps:

condensing the complement text means narrowing, over mass data built from many template documents, the alignment between the large set of adjacent tuples and the triples in the knowledge base from a large scale to a small scale. In some embodiments, the document generation method applying language artificial intelligence is applied to document generation for a public service institution, whose database contains a large number of conference record texts, speech texts and the like. The meaning a word carries in such background text (context) is very likely to differ from its general meaning (general common sense), and automatically generated documents are often used in the public service field; relying on the general meaning would very probably affect the result, so the complement text needs to be condensed first to ensure the safety and quality of text generation;

for each adjacent tuple, the text reaction coefficients of the triples with respect to that adjacent tuple are collected, and the triple whose text reaction coefficient is the minimum among them is selected as the closest triple;

calculating the semantic similarity between each of the two adjacent lemmas in the adjacent tuple and the interval characters between them, and calculating the semantic similarity between each of the head entity and the tail entity in the closest triple and the entity relation in the closest triple; if the semantic similarities of the lemmas are lower than the semantic similarities of the entities, the lemmas are judged to be lemmas to be condensed. Specifically: first calculate the value R1 of the semantic similarity between the head entity and the entity relation in the closest triple and the value R2 of the semantic similarity between the tail entity and the entity relation in the closest triple; then, for the two lemmas and the interval characters in the adjacent tuple, calculate the values S1 and S2 of the semantic similarity between each lemma and the interval characters; finally compare them to judge whether S1 is smaller than both R1 and R2 and S2 is smaller than both R1 and R2, and if so, take the two lemmas as lemmas to be condensed;

the closest triple is taken as the condensation triple (this step effectively screens out risky gap-filling positions, avoids sample bias and tendency, and better ensures the safety and quality of text generation);

storing and outputting the lemmas to be condensed and the condensation triples;

marking the lemmas to be condensed, displaying the condensation triple, and offering the condensation triple as an optional replacement.
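The selection of the closest triple and the S1, S2 versus R1, R2 comparison can be sketched as below. The function names and all numeric values are hypothetical; the text reaction coefficients and similarities are assumed to have been computed already.

```python
def closest_triple_index(coefficients):
    """Index i of the triple with the smallest text reaction coefficient."""
    return min(range(len(coefficients)), key=coefficients.__getitem__)


def needs_condensing(s1, s2, r1, r2):
    """True when S1 and S2 are each smaller than both R1 and R2."""
    return s1 < r1 and s1 < r2 and s2 < r1 and s2 < r2


# Hypothetical coefficients of one adjacent tuple with respect to three triples.
condes_row = [0.42, 0.17, 0.80]
closest = closest_triple_index(condes_row)

# Hypothetical similarities: lemma vs. interval (S1, S2), entity vs. relation (R1, R2).
flag_low = needs_condensing(0.20, 0.30, 0.60, 0.50)
flag_mixed = needs_condensing(0.20, 0.55, 0.60, 0.50)
```

In the second call S2 exceeds R2, so the pair is not flagged: the rule requires both lemma similarities to fall below both entity similarities.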

The application also provides a document generation system applying language artificial intelligence, which comprises: a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the document generation method applying language artificial intelligence when executing the computer program. The system can run on computing devices such as a desktop computer, a notebook computer, a palm computer or a cloud data center; the runnable system may include, but is not limited to, a processor, a memory and a server cluster; and the processor executes the computer program to run in the following units of the system:

the triplet set unit is used for extracting the relation of the text document by using an information extraction model to obtain a plurality of triples to form a triplet set;

the text completion unit is used for complementing the template document by using the generation model to obtain a complement text;

the text reaction coefficient unit is used for carrying out semantic condensation reaction on the triplet set according to the completed text to obtain a text reaction coefficient;

and the text condensation unit is used for condensing the complement text according to the text reaction coefficient.
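One way to picture the four units is as pluggable callables wired in sequence. This is only a structural sketch: the class name `DocumentGenerationSystem`, the field names and the stub implementations are hypothetical, and the stubs do not perform real extraction, generation or condensation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class DocumentGenerationSystem:
    """Four units of the system, each supplied as a callable."""
    extract_triples: Callable[[str], Sequence[tuple]]            # triplet set unit
    complete_text: Callable[[str], str]                          # text completion unit
    reaction_coefficients: Callable[[str, Sequence[tuple]], Sequence[Sequence[float]]]
    condense: Callable[[str, Sequence[Sequence[float]]], str]    # text condensation unit

    def run(self, text_document, template_document):
        triples = self.extract_triples(text_document)
        completed = self.complete_text(template_document)
        coefficients = self.reaction_coefficients(completed, triples)
        return self.condense(completed, coefficients)


# Stub units, for illustration only.
system = DocumentGenerationSystem(
    extract_triples=lambda text: [("mayor", "chaired", "committee")],
    complete_text=lambda template: template.replace("[MASK]", "the mayor"),
    reaction_coefficients=lambda completed, triples: [[0.0] * len(triples)],
    condense=lambda completed, coefficients: completed,
)
result = system.run("The mayor chaired the committee.", "Chaired by [MASK].")
```

Keeping each unit behind a callable makes it possible to swap, for example, a hosted extraction model for a locally deployed one without touching the rest of the flow.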

The beneficial effects of the application are as follows: the application provides a document generation method and a system applying language artificial intelligence, wherein an information extraction model is used for extracting the relation of a text document to obtain a plurality of triples to form a triplet set, a text document is used for carrying out fine tuning training on a pre-training language model to obtain a generation model, the generation model is used for complementing the template document to obtain a complement text, a semantic condensation reaction is carried out on the triplet set according to the complement text to obtain a text reaction coefficient, and the complement text is condensed according to the text reaction coefficient, so that the safety and quality of text generation are better ensured.

Drawings

The above and other features of the present application will become more apparent from the detailed description of the embodiments thereof given in conjunction with the accompanying drawings, in which like reference characters designate like or similar elements, and it is apparent that the drawings in the following description are merely some examples of the present application, and other drawings may be obtained from these drawings without inventive effort to those of ordinary skill in the art, in which:

FIG. 1 is a flow chart of a method for generating a document using language artificial intelligence;

FIG. 2 is a system architecture diagram of a document generation system employing language artificial intelligence.

Detailed Description

The conception, specific structure, and technical effects produced by the present application will be clearly and completely described below with reference to the embodiments and the drawings to fully understand the objects, aspects, and effects of the present application. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.

In the description of the present application, "several" means one or more and "a plurality" means two or more; "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it. Descriptions using "first" and "second" serve only to distinguish technical features and should not be construed as indicating or implying relative importance, the number of the technical features indicated, or their order of precedence.

FIG. 1 is a flowchart of the document generation method applying language artificial intelligence according to the present application; a document generation method and system applying language artificial intelligence according to an embodiment of the present application are described below with reference to FIG. 1.

The application provides a document generation method applying language artificial intelligence, which specifically comprises the following steps:

condensing the complement text according to the text reaction coefficient.

in some embodiments, to save training costs, the information extraction model may be implemented by performing zero-shot information extraction through chat with ChatGPT ([1] Xiang Wei, Xingyu Cui, Ning Cheng, Xiaobin Wang, Xin Zhang, Shen Huang, Pengjun Xie, Jinan Xu, Yufeng Chen, Meishan Zhang, "Zero-Shot Information Extraction via Chatting with ChatGPT", arXiv, 2023), while in other embodiments, to ensure data security and independence, a Chinese information extraction framework built on BERT (e.g., Bert-NER) may be used.

the number of dimensions of the semantic vectors is k, the sequence number of each dimension in the semantic vectors is v, v ∈ [1, k], and the semantic similarity between semantic vectors is a value between 0 and 1;

in the prior art, calculation between vectors or tensors is generally carried out dimension by dimension, which is a single-hop calculation over corresponding dimensions. Triples in a knowledge base, however, have multi-hop relations, and so do the gap-filling positions in the corresponding template document, so the dimension-wise calculation of the prior art is not suited to multi-hop relations. The double nested loop in the text reaction coefficient calculation measures the posterior probability of a hop connection between gap-filling positions in the template document together with the mathematical characteristics of the paths by which entities in the triples reach further entities through entity relations, which helps measure the multi-hop relations of the complement text in the template document.

calculating the semantic similarity between each of the two adjacent lemmas in the adjacent tuple and the interval characters between them, and calculating the semantic similarity between each of the head entity and the tail entity in the closest triple and the entity relation in the closest triple; if the semantic similarities of the lemmas are lower than the semantic similarities of the entities, the lemmas are judged to be lemmas to be condensed and the closest triple is taken as the condensation triple; this step effectively screens out risky gap-filling positions, avoids sample bias and tendency, and better ensures the safety and quality of text generation;

The document generation system applying language artificial intelligence runs on any computing device among a desktop computer, a notebook computer, a palm computer or a cloud data center, and the computing device comprises: a processor, a memory, and a computer program stored in the memory and running on the processor, the processor implementing the steps of the document generation method applying language artificial intelligence when executing the computer program; the runnable system may include, but is not limited to, a processor, a memory and a server cluster.

As shown in FIG. 2, a document generation system applying language artificial intelligence according to an embodiment of the present application includes: a processor, a memory, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above embodiment of the document generation method applying language artificial intelligence when executing the computer program, and the processor executing the computer program to run in the following units of the system:

Preferably, any variable in the present application that is not explicitly defined can be a manually set threshold; preferably, for numerical calculation between physical quantities with different units, dimensionless processing and normalization can be applied to unify the numerical relationship between the different physical quantities, so that the linear or probabilistic relationship of their numerical distributions can be better measured.

The document generation system applying language artificial intelligence can run on computing devices such as a desktop computer, a notebook computer, a palm computer or a cloud data center. The document generation system applying language artificial intelligence comprises, but is not limited to, a processor and a memory. It will be appreciated by those skilled in the art that the examples given are merely examples of the document generation method and system applying language artificial intelligence and do not limit them; the system may include more or fewer components than the examples, combine certain components, or use different components; for example, the document generation system applying language artificial intelligence may further include input and output devices, network access devices, buses, etc.

The processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete component gate or transistor logic devices, discrete hardware components, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is a control center of the document generation system of the application language artificial intelligence, and various interfaces and lines are used to connect various sub-areas of the entire document generation system of the application language artificial intelligence.

The memory may be used to store the computer program and/or module, and the processor may implement the functions of the document generation method and system applying language artificial intelligence by running or executing the computer program and/or module stored in the memory and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, phonebook, etc.) created according to the use of the handset, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, Flash Card, at least one disk storage device, flash memory device, or other volatile solid-state storage device.

The application provides a document generation method and a system applying language artificial intelligence, wherein an information extraction model is used for extracting the relation of a text document to obtain a plurality of triples to form a triplet set, a text document is used for carrying out fine tuning training on a pre-training language model to obtain a generation model, the generation model is used for complementing the template document to obtain a complement text, a semantic condensation reaction is carried out on the triplet set according to the complement text to obtain a text reaction coefficient, and the complement text is condensed according to the text reaction coefficient, so that the safety and quality of text generation are better ensured. When template documents are completed using the method of the application, the F1-score rises to 0.85, compared with 0.67 without the method.

Although the present application has been described in considerable detail and with particularity with respect to several described embodiments, it is not intended to be limited to any such detail or embodiment or to any particular embodiment, but is to be construed so as to effectively cover the intended scope of the application. Furthermore, the foregoing describes the application in terms of embodiments foreseen by the inventors for the purpose of providing a useful description, although insubstantial modifications of the application not presently foreseen may nonetheless represent equivalents thereof.

Claims

1. A document generation method employing language artificial intelligence, the method comprising the steps of:

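inputting a text document;

performing relation extraction on the text document by using an information extraction model to obtain a plurality of triples, which form a triplet set;

performing fine-tuning training on a pre-training language model by using the text document to obtain a generation model;

inputting a template document;

complementing the template document by using the generation model to obtain a complement text;

performing a semantic condensation reaction on the triplet set according to the complement text to obtain a text reaction coefficient;

condensing the complement text according to the text reaction coefficient.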

2. The method for document generation using language artificial intelligence of claim 1, wherein the text document input is character string data representing one or more articles.

3. The document generation method using language artificial intelligence according to claim 1, wherein the information extraction model is an information extraction model based on a pre-training language model, and the generation model is a generation model obtained by performing fine-tuning training on the pre-training language model according to the text document.

4. The method for generating a document using language artificial intelligence according to claim 1, wherein each triple in the triplet set is a three-element array of character strings, every character string in a triple is taken from the input text document, and the triples in the triplet set are mutually distinct.
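
The constraints on the triplet set stated in claim 4 can be illustrated with a minimal Python sketch. The function name `build_triple_set`, the (head entity, relation, tail entity) ordering, and the sample sentence are assumptions made purely for illustration; the claim itself only requires three strings per triple, all drawn from the input text, with no duplicate triples.

```python
from typing import List, Tuple

# Assumed ordering: (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def build_triple_set(candidates: List[Triple], text: str) -> List[Triple]:
    """Keep triples whose three strings all occur in `text`, dropping duplicates."""
    seen = set()
    result = []
    for triple in candidates:
        if len(triple) != 3:
            continue  # not a three-element array
        if not all(part in text for part in triple):
            continue  # some string is not taken from the input text
        if triple in seen:
            continue  # triples must be mutually distinct
        seen.add(triple)
        result.append(triple)
    return result


# Toy input, invented for illustration only.
text = "Alice signed the contract with Bob in Guangzhou."
candidates = [
    ("Alice", "signed", "contract"),
    ("Alice", "signed", "contract"),  # duplicate, dropped
    ("Alice", "sued", "Bob"),         # "sued" does not occur in the text, dropped
]
triples = build_triple_set(candidates, text)
```

In practice the candidate triples would come from the information extraction model; the filter above only checks the structural properties named in the claim.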

5. The method for generating a document using language artificial intelligence according to claim 1, wherein the template document is a text containing a plurality of gap-filling positions; the complement text is composed of a plurality of different lemmas, each lemma corresponding to one gap-filling position; the gap-filling positions are not contiguous, any two of them being separated by intervening characters; two gap-filling positions that have only intervening characters, and no other gap-filling position, between them are called adjacent gap-filling positions; and the lemmas corresponding to adjacent gap-filling positions are adjacent lemmas.
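
The notion of adjacent gap-filling positions in claim 5 can be sketched as follows. The gap marker `[MASK]`, the function name `adjacent_gaps`, and the sample template are hypothetical; the application does not state how gap-filling positions are marked in the template document.

```python
import re
from typing import List, Tuple

GAP = "[MASK]"  # hypothetical marker for a gap-filling position


def adjacent_gaps(template: str) -> List[Tuple[int, int, str]]:
    """Return (left gap index, right gap index, intervening text) for each adjacent pair."""
    spans = [m.span() for m in re.finditer(re.escape(GAP), template)]
    pairs = []
    for idx in range(len(spans) - 1):
        # Text between the end of one gap and the start of the next one.
        between = template[spans[idx][1]:spans[idx + 1][0]]
        if not between:
            raise ValueError("gap-filling positions must not be contiguous")
        pairs.append((idx, idx + 1, between))
    return pairs


# Toy template, invented for illustration only.
template = "The plaintiff [MASK] filed a claim against [MASK] on [MASK]."
pairs = adjacent_gaps(template)
```

Because gap-filling positions are never contiguous, consecutive gaps in reading order are exactly the adjacent pairs, and the intervening text returned here is the "text spaced by the two adjacent lemmas" used later in the condensation step.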

6. The document generation method of claim 1, wherein the generation model is used to complement the template document to obtain the complement text as follows: the masking mechanism of the pre-training language model is used so that the generation model complements the template document to obtain the complement text.

7. The method for generating a document using linguistic artificial intelligence according to claim 5, wherein the method for performing semantic condensation reaction on the triplet set according to the complement text to obtain the text reaction coefficient comprises the steps of:

creating a semantic embedding function, wherein the semantic embedding function converts a character string input into the semantic embedding function into a semantic vector with a fixed dimension size for output; the number of dimensions of the semantic vector is k;

taking the set formed by all adjacent tuples, each consisting of two adjacent lemmas, as the adjacent tuple set;

[formula not reproduced in the source text]

wherein Subvec[v] denotes the value of the dimension with index v in Subvec, and Relvec[v] denotes the value of the dimension with index v in Relvec;

for Tokens(j), calculating the corresponding relation correction vector Malec(j), wherein the value of the dimension with index v in Malec(j) is Malec(j)[v]; calculating the semantic transition value Benec(i) corresponding to each Triple(i), wherein the score with index v in Benec(i) is Benec(i)[v]; in order to distinguish the traversal of v in Malec(j)[v] from that in Benec(i)[v], the index v is renamed v1 when traversing Malec(j), giving Malec(j)[v1], and renamed v2 when traversing Benec(i), giving Benec(i)[v2]; v1 and v2 are merely renamed symbols that still range over the original interval [1, k], so that the dimensions of Malec(j) and the scores of Benec(i) are enumerated independently, forming a doubly nested loop; and calculating the text reaction coefficient Condes(j, i) of Tokens(j) with respect to Triple(i):

[formula not reproduced in the source text]

thereby obtaining, for each adjacent tuple, the text reaction coefficient with respect to each triple.
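
Two structural elements of claim 7 can be sketched in Python: a semantic embedding function with a fixed output dimension k, and the independent enumeration of v1 and v2 over the dimensions of Malec(j) and the scores of Benec(i). The formula for Condes(j, i) is not reproduced in this text, so the sketch does not implement it; the hashing-based `embed`, the caller-supplied `term` function, and the summation in `double_loop` are placeholders assumed for illustration only.

```python
import hashlib
import math
from typing import Callable, List


def embed(text: str, k: int = 8) -> List[float]:
    """Toy stand-in for a semantic embedding: hash character bigrams into k buckets."""
    vec = [0.0] * k
    for a, b in zip(text, text[1:]):
        digest = hashlib.md5((a + b).encode("utf-8")).digest()
        vec[digest[0] % k] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # L2-normalize so that every non-empty output has unit length.
    return [x / norm for x in vec] if norm else vec


def double_loop(malec: List[float], benec: List[float],
                term: Callable[[float, float], float]) -> float:
    """Enumerate v1 over malec and v2 over benec independently (nested loop)."""
    total = 0.0
    for v1 in range(len(malec)):        # v1 ranges over the dimensions of Malec(j)
        for v2 in range(len(benec)):    # v2 ranges over the scores of Benec(i)
            total += term(malec[v1], benec[v2])  # placeholder per-pair term
    return total


vector = embed("contract", k=8)
pairwise = double_loop([1.0, 2.0], [3.0, 4.0], lambda m, b: m * b)
```

A real system would replace `embed` with the output of a pre-trained language model and `term` with the per-pair expression from the application's formula.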

8. The document generation method using language artificial intelligence according to claim 6 or 7, wherein the method for condensing the complement text according to the text reaction coefficient is as follows:

for each of the two adjacent lemmas in an adjacent tuple, calculating the semantic similarity between the lemma and the text lying between the two adjacent lemmas; calculating the semantic similarity between the head entity and the entity relationship in the closest triple, and the semantic similarity between the tail entity and the entity relationship in the closest triple; and, if a lemma in the adjacent tuple has a semantic similarity to the intervening text that is lower than both of these two similarities, judging that lemma to be a lemma to be condensed and taking the closest triple as the condensation triple;

marking the lemmas to be condensed, and displaying the condensation triple for replacement.
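
The decision rule of claim 8 can be sketched as below. Cosine similarity is assumed as the semantic similarity measure, and the function names and the two-dimensional vectors are invented for illustration; the claim does not name a particular similarity measure, and the selection of the closest triple is taken as given.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def lemmas_to_condense(lemma_vecs: List[Vector], between_vec: Vector,
                       head_vec: Vector, rel_vec: Vector,
                       tail_vec: Vector) -> List[int]:
    """Indices of lemmas whose similarity to the intervening text is lower than
    both the head-relation and the tail-relation similarity of the closest triple."""
    head_sim = cosine(head_vec, rel_vec)
    tail_sim = cosine(tail_vec, rel_vec)
    flagged = []
    for idx, vec in enumerate(lemma_vecs):
        sim = cosine(vec, between_vec)
        if sim < head_sim and sim < tail_sim:
            flagged.append(idx)
    return flagged


# Hand-made vectors: lemma 0 matches the intervening text, lemma 1 does not.
flagged = lemmas_to_condense(
    lemma_vecs=[(1.0, 0.0), (0.0, 1.0)],
    between_vec=(1.0, 0.0),
    head_vec=(1.0, 1.0),
    rel_vec=(1.0, 0.0),
    tail_vec=(1.0, 0.5),
)
```

A flagged lemma would then be marked in the complement text and the closest triple displayed as the condensation triple for replacement.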

9. A document generation system employing language artificial intelligence, wherein the system runs on a computing device that is any one of a desktop computer, a notebook computer, or a cloud data center, the computing device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the document generation method employing language artificial intelligence as claimed in any one of claims 1 to 7.