CN116796796A - GPT architecture-based automatic document generation method and device - Google Patents

GPT architecture-based automatic document generation method and device

Info

Publication number
CN116796796A
Authority
CN
China
Prior art keywords
gpt
text
training
language model
architecture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310009751.9A
Other languages
Chinese (zh)
Inventor
马延美
刘学谦
王来奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Fangcun Wuyou Technology Development Co ltd
Original Assignee
Beijing Fangcun Wuyou Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Fangcun Wuyou Technology Development Co ltd filed Critical Beijing Fangcun Wuyou Technology Development Co ltd
Priority to CN202310009751.9A
Publication of CN116796796A
Legal status: Pending (Current)

Landscapes

  • Machine Translation (AREA)

Abstract

The application discloses a method and a device for automatically generating documents based on a GPT architecture. The GPT architecture-based automatic document generation method comprises the following steps: acquiring a trained GPT-OD language model; acquiring text information input by a user; and inputting the text information into the trained GPT-OD language model to obtain the document information output by the GPT-OD language model. The method applies data preprocessing to the document corpus: a simple and efficient data preprocessing strategy is used to filter the large-scale document corpus, which improves the document generation capability of the model.

Description

GPT architecture-based automatic document generation method and device
Technical Field
The application relates to the technical field of text generation, and in particular to a GPT architecture-based automatic document generation method and device.
Background
The existing automatic document generation technology mainly comprises three mainstream methods: automatic document generation based on grammatical and syntactic rules, automatic document generation based on retrieval, and automatic document generation based on shallow deep networks such as RNN/LSTM. The three methods are described in detail below.
The respective disadvantages of these three existing technologies are analyzed as follows:
(1) The rule-based method requires a large number of manually designed rule templates and extensive annotation by linguistic experts, and the generated documents lack diversity, which weakens the effect of automatic document generation.
(2) The retrieval-based method uses text retrieval and ranking techniques to select suitable documents from a document corpus. Because it recommends existing documents to the user, the sentences are fluent. Its drawbacks are that it cannot generate new text, and that retrieval and ranking may only capture surface-level semantic relevance, making it difficult to capture the real meaning.
(3) The deep-learning-based generation method mainly uses an encoder-decoder structure to generate replies; a typical technique is the Seq2Seq network structure. Its advantage is that no rules are needed: how to generate text can be learned automatically from existing dialogue text, and the deep neural network can learn the semantic mapping from input data to output text end to end without manual feature engineering. However, deep neural models tend to have a large number of parameters, while most text generation datasets are very small, so deep neural networks easily overfit on these datasets and cannot generalize in practical applications.
It is therefore desirable to have a solution that solves or at least alleviates the above-mentioned drawbacks of the prior art.
Disclosure of Invention
The application aims to provide a GPT architecture-based automatic document generation method to solve at least one of the above technical problems.
The application provides the following scheme:
according to one aspect of the present application, there is provided a GPT architecture-based automatic document generation method, including:
acquiring a trained GPT-OD language model;
acquiring text information input by a user;
and inputting the text information into the trained GPT-OD language model so as to obtain the document information output by the GPT-OD language model.
Optionally, before obtaining the trained GPT-OD language model, the GPT architecture-based automatic document generation method further includes:
acquiring a training set;
and training a GPT-2 language model with the training set to obtain the GPT-OD language model.
Optionally, the training set includes a plurality of text sets;
the training of the GPT-2 language model by the training set includes:
preprocessing each text set in the training set;
and training the GPT-2 language model according to the preprocessed text set.
Optionally, the preprocessing of each text set in the training set includes:
processing each text set as follows:
acquiring the number of line-break characters in the text set;
judging, according to the number of line-break characters, whether the text set exceeds a preset number of lines; and if not,
deleting the text set that does not exceed the preset number of lines.
Optionally, the preprocessing of each text set in the training set further includes:
judging, according to the number of line-break characters, whether the text set exceeds the preset number of lines; and if so,
judging whether the character count of the text set is smaller than a first preset character count; and if so,
deleting the text set whose character count is smaller than the first preset character count.
Optionally, the preprocessing of each text set in the training set further includes:
judging whether the character count of the text set is smaller than the first preset character count; and if not,
judging, according to the line-break characters and the character count of the text set, whether there are 5 consecutive lines each containing fewer than 15 characters; and if so,
deleting the text set in which each of 5 consecutive lines contains fewer than 15 characters.
Optionally, the preprocessing of each text set in the training set further includes:
judging, according to the line-break characters and the character count of the text set, whether there are 5 consecutive lines each containing fewer than 15 characters; and if not,
identifying the text set and judging whether it contains at least two sequence numbers; and if so,
judging whether the sequence numbers are consecutive; and if not,
deleting the text set whose sequence numbers are not consecutive.
Optionally, the preprocessing of each text set in the training set further includes:
judging whether the sequence numbers are consecutive; and if so,
acquiring a preset character database, wherein the preset character database includes at least one preset character;
checking each line of the text set to judge whether the line includes none of the preset characters in the preset character database; and if so,
deleting the line that includes none of the preset characters in the preset character database.
Optionally, the training of the GPT-2 language model according to the preprocessed text sets includes:
assigning a weight value to each text set during the training process;
and resampling each text set according to its weight value during training.
The application also provides a GPT architecture-based automatic document generation device, which comprises:
the GPT-OD language model acquisition module is used for acquiring a trained GPT-OD language model;
the text information acquisition module is used for acquiring text information input by a user;
and the document information acquisition module is used for inputting the text information into the trained GPT-OD language model so as to acquire the document information output by the GPT-OD language model.
The GPT architecture-based automatic document generation method provided by the application has the following advantages:
data preprocessing for the document corpus: a simple and efficient data preprocessing strategy is used to filter the large-scale document corpus, which improves the document generation capability of the model;
resampling strategy in GPT-OD: a high-quality document has a higher sampling probability and is trained on more times, and vice versa, which yields a better training result.
Drawings
Fig. 1 is a flow chart of a method for automatically generating a document based on a GPT architecture according to an embodiment of the application.
Fig. 2 is a block diagram of an electronic device for performing the GPT architecture-based automatic document generation method according to an embodiment of the present application.
FIG. 3 is a diagram of a GPT series model architecture in one embodiment of the application.
Fig. 4 is a schematic diagram of a generation effect of a method for automatically generating a document based on a GPT architecture according to an embodiment of the present application.
Fig. 5 is a schematic diagram of another generation effect of the automatic document generation method based on the GPT architecture according to an embodiment of the present application.
Fig. 6 is a schematic diagram of another generation effect of the automatic document generation method based on the GPT architecture according to an embodiment of the present application.
Fig. 7 is a schematic diagram of a loss function curve of the automatic document generation method based on the GPT architecture of the present application.
Detailed Description
The following describes the technical solutions in the embodiments of the present application clearly and completely with reference to the accompanying drawings, which show some but not all of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
Fig. 1 is a flow chart of a method for automatically generating a document based on a GPT architecture according to an embodiment of the application.
The automatic document generation method based on the GPT architecture shown in FIG. 1 comprises the following steps:
step 1: acquiring a trained GPT-OD language model;
step 2: acquiring text information input by a user;
step 3: and inputting the text information into the trained GPT-OD language model so as to obtain the document information output by the GPT-OD language model.
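A minimal sketch of these three steps is given below, assuming the fine-tuned GPT-OD checkpoint is stored in Hugging Face format under a hypothetical local path; the tokenizer class and sampling parameters are illustrative assumptions, since the patent does not disclose them.

```python
# Minimal sketch of steps 1-3; the checkpoint path, tokenizer choice and
# sampling parameters are illustrative assumptions, not values from the patent.
from transformers import BertTokenizerFast, GPT2LMHeadModel

model_path = "./gpt-od"  # hypothetical local checkpoint
tokenizer = BertTokenizerFast.from_pretrained(model_path)  # Chinese GPT-2 checkpoints often ship a BERT-style tokenizer
model = GPT2LMHeadModel.from_pretrained(model_path)        # step 1: acquire the trained GPT-OD model
model.eval()

# Step 2: text information input by the user.
user_text = "关于加强科研项目管理工作的通知"

# Step 3: feed the text into the trained GPT-OD model and decode the generated document.
input_ids = tokenizer(user_text, return_tensors="pt").input_ids
output_ids = model.generate(
    input_ids,
    max_length=512,            # GPT-2 handles sequences of up to 1024 tokens
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```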
In this embodiment, before obtaining the trained GPT-OD language model, the GPT architecture-based automatic document generation method further includes:
acquiring a training set;
and training a GPT-2 language model with the training set to obtain the GPT-OD language model.
In this embodiment, the training set includes a plurality of text sets;
the training of the GPT-2 language model by the training set includes:
preprocessing each text set in the training set;
and training the GPT-2 language model according to the preprocessed text set.
In this embodiment, the preprocessing of each text set in the training set includes:
processing each text set as follows:
acquiring the number of line-break characters in the text set;
judging, according to the number of line-break characters, whether the text set exceeds a preset number of lines; and if not,
deleting the text set that does not exceed the preset number of lines.
In this embodiment, the preprocessing of each text set in the training set further includes:
judging, according to the number of line-break characters, whether the text set exceeds the preset number of lines; and if so,
judging whether the character count of the text set is smaller than a first preset character count; and if so,
deleting the text set whose character count is smaller than the first preset character count.
In this embodiment, the preprocessing of each text set in the training set further includes:
judging whether the character count of the text set is smaller than the first preset character count; and if not,
judging, according to the line-break characters and the character count of the text set, whether there are 5 consecutive lines each containing fewer than 15 characters; and if so,
deleting the text set in which each of 5 consecutive lines contains fewer than 15 characters.
In this embodiment, the preprocessing of each text set in the training set further includes:
judging, according to the line-break characters and the character count of the text set, whether there are 5 consecutive lines each containing fewer than 15 characters; and if not,
identifying the text set and judging whether it contains at least two sequence numbers; and if so,
judging whether the sequence numbers are consecutive; and if not,
deleting the text set whose sequence numbers are not consecutive.
In this embodiment, the preprocessing of each text set in the training set further includes:
judging whether the sequence numbers are consecutive; and if so,
acquiring a preset character database, wherein the preset character database includes at least one preset character;
checking each line of the text set to judge whether the line includes none of the preset characters in the preset character database; and if so,
deleting the line that includes none of the preset characters in the preset character database.
In this embodiment, the training of the GPT-OD language model according to the preprocessed text sets includes:
assigning a weight value to each text set during the training process;
and resampling each text set according to its weight value during training.
The application will now be described in further detail by way of examples, which should not be construed as limiting the application in any way.
Step 1: acquiring a trained GPT-OD language model;
step 2: acquiring text information input by a user;
step 3: and inputting the text information into the trained GPT-OD language model so as to obtain the document information output by the GPT-OD language model.
Referring to fig. 4 to 6: in the embodiment shown in fig. 4, if we input a sentence, the description on the right-hand side of fig. 4 can be obtained through the GPT-OD language model of the present application.
In the embodiment shown in fig. 5, if we enter several keywords separated by punctuation marks, the description on the right-hand side of fig. 5 can be obtained through the GPT-OD language model of the present application.
In the embodiment shown in fig. 6, if we input a passage, for example, "research in modern engineering and technological science with multidisciplinary integration has been greatly enhanced, driving the development of basic science and engineering technology and forming a complete modern science and technology system", the description on the right-hand side of fig. 6 can be obtained through the GPT-OD language model of the present application.
In this embodiment, the data set used to train the GPT-OD language model of the present application mainly includes two parts: an open-source Chinese corpus such as CLUECorpusSmall, and about 230,000 public document files collected in a legally compliant manner. The following filtering strategies are mainly adopted when cleaning the training set:
files with fewer than 5 lines, fewer than 500 characters, or with 5 consecutive lines each containing fewer than 15 characters are filtered out;
files in which the sequence numbers are not consecutive are filtered out;
lines that contain none of the preset characters (punctuation marks such as '。', '：', '；' or any ordinal character from 一 (one) to 五十 (fifty)) are filtered out.
Specifically, the preprocessing logic for each text set in the training set is as follows:
First, about 230,000 document files are collected, in a legally compliant manner, from publicly disclosed document data.
1. According to the line-break characters in each file, files with fewer than 5 lines are filtered out;
2. files containing fewer than 500 characters are filtered out;
3. according to the line-break characters and the character count of each file, files containing 5 consecutive lines each with fewer than 15 characters are filtered out;
4. the sequence numbers (一 to 五十, i.e. one to fifty) in each file are extracted, and files in which the sequence numbers are not consecutive are filtered out;
5. each line in each file is checked, and any line that contains none of the preset characters (punctuation marks such as '。', '：', '；' or any ordinal character from 一 to 五十) is filtered out.
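A minimal sketch of these five filtering rules is given below, assuming each file is held as a plain-text string. The exact preset character list and the handling of ordinals are not fully specified in the description, so the reconstructed character set and the simplified ordinal range (一 to 十 only) are assumptions for illustration.

```python
# Sketch of the five filtering rules; the preset character list and the
# simplified ordinal handling (一..十 only) are assumptions for illustration.
import re

CN_ORDINALS = "一二三四五六七八九十"
ORDINAL_VALUE = {ch: i + 1 for i, ch in enumerate(CN_ORDINALS)}
PRESET_CHARS = set("。：；、") | set(CN_ORDINALS)   # reconstructed "preset character database"

def clean_file(doc: str):
    """Return the cleaned document, or None if the whole file is filtered out."""
    lines = doc.split("\n")
    # Rule 1: fewer than 5 lines (counted via line-break characters).
    if len(lines) < 5:
        return None
    # Rule 2: fewer than 500 characters in total.
    if len(doc) < 500:
        return None
    # Rule 3: 5 consecutive lines, each containing fewer than 15 characters.
    for i in range(len(lines) - 4):
        if all(len(line) < 15 for line in lines[i:i + 5]):
            return None
    # Rule 4: at least two sequence numbers ("一、", "二、", ...) that are not consecutive.
    ordinals = [ORDINAL_VALUE[m] for m in re.findall(r"([一二三四五六七八九十])、", doc)]
    if len(ordinals) >= 2 and ordinals != list(range(ordinals[0], ordinals[0] + len(ordinals))):
        return None
    # Rule 5: drop every line that contains none of the preset characters.
    kept = [line for line in lines if any(ch in PRESET_CHARS for ch in line)]
    return "\n".join(kept)
```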
Referring to fig. 3, in this embodiment, the GPT-2 language model was trained by OpenAI on WebText, a 40 GB dataset crawled from the web. Its architecture is built only from Transformer decoder blocks and can process sequences of at most 1024 tokens; each token flows through all decoder blocks along its path. The full GPT-2 model has 1.5 billion parameters, and the network structure is shown in fig. 3 below.
The model architecture based on GPT-2 is a standard Transformer with 12 layers, 16 attention heads, a hidden dimension of 1024 and an FFN dimension of 4096. Training uses a cross-entropy loss with a label smoothing rate of 0.1, and each batch contains at most 4096 tokens. In the experiments, the Adam optimizer with betas (0.9, 0.98), 2000 warmup updates and an inverse square root learning rate scheduler were used; the maximum learning rate was set to 5e-5, the number of training epochs to 5 and the batch size to 32, and the model was saved every 2000 training steps. Our model was trained on 8 NVIDIA Tesla V100 GPUs.
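The following sketch reproduces this optimization setup in PyTorch with the Hugging Face GPT-2 implementation. The framework choice, the configuration field values and the scheduler implementation are assumptions based on the figures reported above; the original training code is not disclosed in the patent.

```python
# Sketch of the reported training setup (12 layers, 16 heads, hidden 1024,
# FFN 4096, Adam betas (0.9, 0.98), 2000 warmup updates, inverse square root
# schedule, peak LR 5e-5, label smoothing 0.1). Framework and field names are
# assumptions; the original implementation is not disclosed.
import torch
from torch.optim.lr_scheduler import LambdaLR
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(n_layer=12, n_head=16, n_embd=1024, n_inner=4096, n_positions=1024)
model = GPT2LMHeadModel(config)

optimizer = torch.optim.Adam(model.parameters(), lr=5e-5, betas=(0.9, 0.98))
warmup_steps = 2000

def inv_sqrt_schedule(step: int) -> float:
    # Linear warmup for 2000 updates, then decay proportional to 1/sqrt(step).
    step = max(step, 1)
    if step < warmup_steps:
        return step / warmup_steps
    return (warmup_steps / step) ** 0.5

scheduler = LambdaLR(optimizer, lr_lambda=inv_sqrt_schedule)

# Cross-entropy loss with a label smoothing rate of 0.1 (padding positions ignored).
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1, ignore_index=-100)
```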
Because different files have different importance for training, a weight value is assigned to each text set during training, and the text sets are resampled according to their weight values. Specifically, we designed a data resampling strategy that lets high-quality files appear more frequently while maintaining a balance such that every file appears at least once during the whole training process. Whereas previous methods sample randomly from the training corpus during training, the resampling technique samples according to the weight of each file: the higher the quality of a document, the higher its sampling probability and the more times it is trained on, and vice versa. The resampling calculation formula is as follows:
w = w1 × w2 × w3 (1)
where w1 is the weight obtained from the total character count of the text set, w2 is the weight obtained from the number of lines in the text set, and w3 is the weight obtained from the average number of characters per line in the text set. The overall weight w of a file is the product of the three weights; the larger the weight, the more important the file, and vice versa. Here N1 is the total number of characters in the text set, N2 is the number of lines in the text set, and N3 is the average number of characters per line in the text set.
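A sketch of this resampling strategy is given below. Equation (1) only states that w is the product of w1, w2 and w3; how the counts N1, N2 and N3 are mapped to the component weights is not specified, so normalizing each count by its maximum over the corpus is an assumption made here for illustration.

```python
# Sketch of the weighted resampling strategy in equation (1); the mapping from
# N1, N2, N3 to w1, w2, w3 (normalization by the corpus maximum) is assumed.
import random

def counts(doc: str):
    lines = [l for l in doc.split("\n") if l]
    n1 = len(doc)                    # N1: total number of characters in the file
    n2 = len(lines)                  # N2: number of lines in the file
    n3 = n1 / max(n2, 1)             # N3: average number of characters per line
    return n1, n2, n3

def file_weights(corpus):
    stats = [counts(doc) for doc in corpus]
    maxima = [max(s[i] for s in stats) for i in range(3)]
    # w = w1 * w2 * w3; the larger the weight, the more important the file.
    return [(s[0] / maxima[0]) * (s[1] / maxima[1]) * (s[2] / maxima[2]) for s in stats]

def resample(corpus, num_samples):
    weights = file_weights(corpus)
    # Every file appears at least once; high-weight (high-quality) files are drawn more often.
    sampled = list(corpus)
    extra = max(num_samples - len(corpus), 0)
    sampled += random.choices(corpus, weights=weights, k=extra)
    return sampled
```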
During training, the model is optimized iteratively. The iterative process is as follows: first, large-scale pre-training based on the GPT-2 architecture is performed on Chinese corpora such as CLUECorpusSmall; then, continued pre-training is performed on the large collection of preprocessed document corpus to obtain the current model, GPT-OD. The loss function curve during model training is shown in fig. 7.
The technical scheme of the application is trained on large-scale document data disclosed in a legally compliant manner, and is the first domestic automatic document generation model that can be applied to actual online business;
the technical scheme of the application can adjust, to a certain extent, the diversity, number and topic of the generated documents through the temperature sampling technique, as illustrated in the sketch below;
the technical scheme of the application is highly robust: it accepts any topic and any characters within a length of 1024 tokens, which fully supports online business scenarios;
the technical scheme of the application can serve online business with a model of only 100M parameters; the model is compact without losing performance.
The application also provides a document automatic generation device based on the GPT framework, which comprises a GPT-OD language model acquisition module, a text information acquisition module and a document information acquisition module, wherein the GPT-OD language model acquisition module is used for acquiring a trained GPT-OD language model; the text information acquisition module is used for acquiring text information input by a user; and the document information acquisition module is used for inputting the text information into the trained GPT-OD language model so as to acquire document information output by the GPT-OD language model.
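For illustration, the cooperation of the three modules can be sketched as follows; the class and method names are hypothetical and only show how the modules interact, not the actual implementation of the device.

```python
# Hypothetical sketch of the three modules of the generation device; names are
# assumptions for illustration only.
class GPTODDocumentGenerator:
    def __init__(self, model, tokenizer):
        # GPT-OD language model acquisition module: holds the trained model.
        self.model = model
        self.tokenizer = tokenizer

    def acquire_text(self, user_input: str) -> str:
        # Text information acquisition module: receives the user's input text.
        return user_input.strip()

    def generate_document(self, user_input: str) -> str:
        # Document information acquisition module: feeds the text into GPT-OD
        # and returns the generated document information.
        input_ids = self.tokenizer(self.acquire_text(user_input), return_tensors="pt").input_ids
        output_ids = self.model.generate(
            input_ids, max_length=512, do_sample=True,
            pad_token_id=self.tokenizer.pad_token_id,
        )
        return self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
```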
Fig. 2 is a block diagram of an electronic device according to one or more embodiments of the present application.
As shown in fig. 2, the present application also discloses an electronic device, including: the device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus; the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the GPT architecture-based document automatic generation method.
The present application also provides a computer-readable storage medium storing a computer program executable by an electronic device, which when run on the electronic device causes the electronic device to perform the steps of a GPT architecture-based document automatic generation method.
The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The electronic device includes a hardware layer, an operating system layer running on top of the hardware layer, and an application layer running on top of the operating system. The hardware layer comprises a central processing unit (CPU), a memory management unit (MMU), a memory, and other hardware. The operating system may be any one or more computer operating systems that implement control of the electronic device via processes, such as a Linux operating system, a Unix operating system, an Android operating system, an iOS operating system, or a Windows operating system. In addition, in the embodiment of the present application, the electronic device may be a handheld device such as a smartphone or a tablet computer, or an electronic device such as a desktop computer or a portable computer, which is not particularly limited in the embodiment of the present application.
The execution body controlled by the electronic device in the embodiment of the application can be the electronic device or a functional module in the electronic device, which can call a program and execute the program. The electronic device may obtain firmware corresponding to the storage medium, where the firmware corresponding to the storage medium is provided by the vendor, and the firmware corresponding to different storage media may be the same or different, which is not limited herein. After the electronic device obtains the firmware corresponding to the storage medium, the firmware corresponding to the storage medium can be written into the storage medium, specifically, the firmware corresponding to the storage medium is burned into the storage medium. The process of burning the firmware into the storage medium may be implemented by using the prior art, and will not be described in detail in the embodiment of the present application.
The electronic device may further obtain a reset command corresponding to the storage medium, where the reset command corresponding to the storage medium is provided by the provider, and the reset commands corresponding to different storage media may be the same or different, which is not limited herein.
At this time, the storage medium of the electronic device is a storage medium in which the corresponding firmware is written, and the electronic device may respond to a reset command corresponding to the storage medium in which the corresponding firmware is written, so that the electronic device resets the storage medium in which the corresponding firmware is written according to the reset command corresponding to the storage medium. The process of resetting the storage medium according to the reset command may be implemented in the prior art, and will not be described in detail in the embodiments of the present application.
For convenience of description, the above devices are described as being functionally divided into various units and modules. Of course, the functions of the units, modules may be implemented in one or more pieces of software and/or hardware when implementing the application.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For simplicity of explanation, the methodologies are shown and described as a series of acts; however, it should be understood and appreciated by those of ordinary skill in the art that the methodologies are not limited by the order of the acts, as some acts may occur in a different order or concurrently. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and the acts involved are not necessarily required by the embodiments of the application.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) to perform the method according to the embodiments or some parts of the embodiments of the present application.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (10)

1. The automatic document generation method based on the GPT architecture is characterized by comprising the following steps of:
acquiring a trained GPT-OD language model;
acquiring text information input by a user;
and inputting the text information into the trained GPT-OD language model so as to obtain the document information output by the GPT-OD language model.
2. The GPT architecture-based automatic document generation method of claim 1, wherein prior to obtaining the trained GPT-OD language model, the GPT architecture-based automatic document generation method further comprises:
acquiring a training set;
and training a GPT-2 language model with the training set to obtain the GPT-OD language model.
3. The GPT architecture-based automatic document generation method of claim 2, wherein the training set comprises a plurality of text sets;
the training of the GPT-2 language model by the training set includes:
preprocessing each text set in the training set;
and training the GPT-2 language model according to the preprocessed text set.
4. The GPT architecture-based automatic document generation method of claim 3, wherein the preprocessing of each text set in the training set comprises:
processing each text set as follows:
acquiring the number of line-break characters in the text set;
judging, according to the number of line-break characters, whether the text set exceeds a preset number of lines; and if not,
deleting the text set that does not exceed the preset number of lines.
5. The GPT architecture-based automatic document generation method of claim 4, wherein the preprocessing of each text set in the training set further comprises:
judging, according to the number of line-break characters, whether the text set exceeds the preset number of lines; and if so,
judging whether the character count of the text set is smaller than a first preset character count; and if so,
deleting the text set whose character count is smaller than the first preset character count.
6. The GPT architecture-based automatic document generation method of claim 5, wherein the preprocessing of each text set in the training set further comprises:
judging whether the character count of the text set is smaller than the first preset character count; and if not,
judging, according to the line-break characters and the character count of the text set, whether there are 5 consecutive lines each containing fewer than 15 characters; and if so,
deleting the text set in which each of 5 consecutive lines contains fewer than 15 characters.
7. The GPT architecture-based automatic document generation method of claim 6, wherein the preprocessing of each text set in the training set further comprises:
judging, according to the line-break characters and the character count of the text set, whether there are 5 consecutive lines each containing fewer than 15 characters; and if not,
identifying the text set and judging whether it contains at least two sequence numbers; and if so,
judging whether the sequence numbers are consecutive; and if not,
deleting the text set whose sequence numbers are not consecutive.
8. The GPT architecture-based automatic document generation method of claim 7, wherein the preprocessing of each text set in the training set further comprises:
judging whether the sequence numbers are consecutive; and if so,
acquiring a preset character database, wherein the preset character database includes at least one preset character;
checking each line of the text set to judge whether the line includes none of the preset characters in the preset character database; and if so,
deleting the line that includes none of the preset characters in the preset character database.
9. The GPT architecture-based automatic document generation method of claim 8, wherein training the GPT-OD language model according to the preprocessed text sets comprises:
assigning a weight value to each text set during the training process;
and resampling each text set according to its weight value during training.
10. The automatic document generation device based on the GPT architecture is characterized by comprising:
the GPT-OD language model acquisition module is used for acquiring a trained GPT-OD language model;
the text information acquisition module is used for acquiring text information input by a user;
and the document information acquisition module is used for inputting the text information into the trained GPT-OD language model so as to acquire the document information output by the GPT-OD language model.
CN202310009751.9A 2023-01-04 2023-01-04 GPT architecture-based automatic document generation method and device Pending CN116796796A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310009751.9A CN116796796A (en) 2023-01-04 2023-01-04 GPT architecture-based automatic document generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310009751.9A CN116796796A (en) 2023-01-04 2023-01-04 GPT architecture-based automatic document generation method and device

Publications (1)

Publication Number Publication Date
CN116796796A 2023-09-22

Family

ID=88046729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310009751.9A Pending CN116796796A (en) 2023-01-04 2023-01-04 GPT architecture-based automatic document generation method and device

Country Status (1)

Country Link
CN (1) CN116796796A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117592436A (en) * 2023-11-23 2024-02-23 知学云(北京)科技股份有限公司 Automatic document generation system based on artificial intelligence technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination