CN111414735A - Text data generation method and device - Google Patents
Text data generation method and device
- Publication number
- CN111414735A CN111414735A CN202010166957.9A CN202010166957A CN111414735A CN 111414735 A CN111414735 A CN 111414735A CN 202010166957 A CN202010166957 A CN 202010166957A CN 111414735 A CN111414735 A CN 111414735A
- Authority
- CN
- China
- Prior art keywords
- text data
- theme
- deep learning
- learning model
- sentences
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The embodiment of the application discloses a text data generation method and device. The method comprises the following steps: acquiring text data corresponding to a preset theme to obtain a sample data set of the theme; establishing a deep learning model for generating text data of the theme by using the sample data set of the theme, wherein the deep learning model records a language logic expression relationship; and after receiving a text generation request for the theme, generating corresponding text data by using the deep learning model of the theme.
Description
Technical Field
The present invention relates to the field of information processing, and in particular, to a method and an apparatus for generating text data.
Background
In business scenarios of the e-commerce industry and new media, a large amount of text information such as news, introductions, and promotional copy is needed as an important basis for information propagation and diffusion. In the related art, a given text set is randomly sampled and the existing text data is spliced and integrated to generate a result text with the specified content; alternatively, the text is segmented and then generated by random selection and integration.
The readability of text generated by random sampling is poor, and its content overlaps heavily with other texts.
Disclosure of Invention
In order to solve the above technical problem, embodiments of the present application provide a method and an apparatus for generating text data.
In order to achieve the purpose of the embodiment of the present application, an embodiment of the present application provides a method for generating text data, including:
acquiring text data corresponding to a preset theme to obtain a sample data set of the theme;
establishing a deep learning model for generating text data of the theme by using the sample data set of the theme, wherein the deep learning model records a language logic expression relationship;
and after receiving a text generation request for the theme, generating corresponding text data by using the deep learning model of the theme.
In an exemplary embodiment, the obtaining text data corresponding to a preset theme includes:
acquiring text data on a website by using a preset text acquisition tool;
and classifying the collected text data according to the theme of the text data to obtain the text data corresponding to the theme.
In an exemplary embodiment, said building a deep learning model for generating text data of said topic using said sample dataset of said topic comprises:
identifying a language logic expression relation between words and sentences in each text data in the sample data set;
performing cross combination on words and sentences in at least two text data according to the language logic expression relation to obtain new words and sentences;
and establishing the deep learning model according to the language logic expression relation and the new words and sentences.
In an exemplary embodiment, the generating corresponding text data using the deep learning model of the topic includes:
acquiring keyword information of the theme;
inquiring a target word and sentence which accords with the description content of the preset keyword from words and sentences pre-stored in the deep learning model of the theme;
and controlling the deep learning model to arrange and combine the target words and sentences according to a language logic expression relation acquired in advance to obtain text data corresponding to the keywords.
In an exemplary embodiment, before the controlling the deep learning model to arrange and combine the target words and sentences according to the pre-obtained language logic expression relationship, the method further includes:
acquiring target text data including the keywords;
and identifying the target text data by using the deep learning model to obtain a language logic expression relation of the target text information, and performing cross combination on words and sentences in at least two target text data according to the language logic expression relation of the target text information to obtain new target words and sentences.
A generation apparatus of text data, comprising:
the acquisition module is used for acquiring text data corresponding to a preset theme to obtain a sample data set of the theme;
the establishing module is used for establishing a deep learning model for generating text data of the theme by utilizing the sample data set of the theme, wherein the deep learning model records a language logic expression relationship;
and the generating module is used for generating corresponding text data by using the deep learning model of the theme after receiving a text generating request for the theme.
In one exemplary embodiment, the obtaining module includes:
the acquisition unit is used for acquiring text data on a website by using a preset text acquisition tool;
and the classification unit is used for classifying the collected text data according to the theme of the text data to obtain the text data corresponding to the theme.
In one exemplary embodiment, the establishing module includes:
the identification unit is used for identifying the language logic expression relation between words and sentences in each text data in the sample data set;
the combination unit is used for performing cross combination on words and sentences in at least two text data according to the language logic expression relation to obtain new words and sentences;
and the establishing unit is used for establishing the deep learning model according to the language logic expression relation and the new words and sentences.
In one exemplary embodiment, the generating module includes:
a first obtaining unit, configured to obtain keyword information of the topic;
the query unit is used for querying a target word and sentence which accords with the description content of the preset keyword from words and sentences pre-stored in the deep learning model of the theme;
and the control unit is used for controlling the deep learning model to arrange and combine the target words and sentences according to a pre-acquired language logic expression relation to obtain text data corresponding to the keywords.
In an exemplary embodiment, the generating module further comprises:
a second obtaining unit, configured to obtain target text data including the keywords before the target words and sentences are arranged and combined;
and the processing unit is used for identifying the target text data by using the deep learning model to obtain a language logic expression relation of the target text information, and performing cross combination on words and sentences in at least two target text data according to the language logic expression relation of the target text information to obtain new target words and sentences.
According to the scheme provided by the embodiments of the application, text data corresponding to a preset theme is acquired to obtain a sample data set of the theme; a deep learning model for generating text data of the theme is established using the sample data set, the deep learning model recording a language logic expression relationship; and after a text generation request for the theme is received, corresponding text data is generated using the deep learning model of the theme. Because the deep learning model is trained on the acquired sample data set and the specified theme text is generated using the language logic expression relationship recorded in the model, the readability of the text is improved.
Additional features and advantages of the embodiments of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the application. The objectives and other advantages of the embodiments of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments of the present application and are incorporated in and constitute a part of this specification; they illustrate embodiments of the present application and, together with the description, serve to explain the embodiments without limiting them.
Fig. 1 is a flowchart of a text data generation method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a two-layer LSTM framework provided by an embodiment of the present application;
fig. 3 is a block diagram of a text data generation device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that, in the embodiments of the present application, features in the embodiments and the examples may be arbitrarily combined with each other without conflict.
In the process of implementing the present application, the inventors found that the random concatenation generation method based on the given text in the related art has the following problems:
1. The content of the generated results is limited: because existing schemes draw only from a given text set, the generated result is also confined to that set, and when the text set is small the generated texts are highly repetitive.
2. The coherence of the generated text is poor: random-splicing generation does not consider the semantic continuity between preceding and following text, so the result is incoherent and has poor readability.
In summary, the random concatenation generation method based on a given text produces poor results, and its limitations and incoherence are difficult to eliminate technically, so the present application proposes the following solution:
Fig. 1 is a flowchart of a text data generation method according to an embodiment of the present application. As shown in fig. 1, the method includes:
Step 101: acquiring text data corresponding to a preset theme to obtain a sample data set of the theme.
in one exemplary embodiment, the topic is a specific word, which is an expanded description of the textual content around the word; for example, the subject term corresponding to the text data describing a river may be the name of the river.
In an exemplary embodiment, the text data corresponding to the theme may be manually read, extracted, or acquired by using a preset text reading tool.
In an exemplary embodiment, the obtaining text data corresponding to a preset theme includes:
acquiring text data on a website by using a preset text acquisition tool;
and classifying the collected text data according to the theme of the text data to obtain the text data corresponding to the theme.
The text collection tool may be a web crawler tool;
the method comprises the steps of acquiring data of a preset website in real time by using a text acquisition tool to obtain text data, classifying the text data according to a theme to obtain the text data of the theme, completing real-time updating of a sample data set, and reducing the repetition degree of subsequently generated text data.
Step 102: establishing a deep learning model for generating text data of the theme by using the sample data set of the theme, wherein the deep learning model records a language logic expression relationship;
aiming at the problem of text readability cross in the related technology, through training of the sample data set, an expression rule of language logic can be obtained, and an operation basis is provided for subsequently generated texts with readability.
In an exemplary embodiment, said building a deep learning model for generating text data of said topic using said sample dataset of said topic comprises:
identifying a language logic expression relation between words and sentences in each text data in the sample data set;
performing cross combination on words and sentences in at least two text data according to the language logic expression relation to obtain new words and sentences;
and establishing the deep learning model according to the language logic expression relation and the new words and sentences.
After the language logic expression relationship is obtained, words and sentences in at least two text data can be combined in a cross mode to obtain new words and sentences which have readability and are not repeated, and data support is provided for subsequently generated new text data.
The modes of cross combination include:
selecting at least one word from each text data and combining them to obtain a new combination; or,
selecting at least one word from each text data, combining them, modifying the combined content, and using the modified combination as the new combination.
For example, ABC is used to describe the word X in text 1, DEF is used to describe the word X in text 2, and GH is used to describe the word X in text 3, where A to H represent different words. After the contents of the 3 texts are cross-combined, the content describing the word X can be various expressions such as AEH and CFG. Alternatively, after obtaining expressions such as AEH and CFG, some of the newly combined content may be adjusted; for example, A in AEH is modified to obtain A', which is then combined with EH to obtain A'EH.
As can be seen from the above example, the repetition degree of the text data after the recombination and the previous text data is significantly reduced.
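As a rough illustration (not the patent's implementation), the cross combination of fragments from several texts could be sketched as follows, assuming each text has already been split into fragments; the helper name is hypothetical:

```python
import random

def cross_combine(descriptions):
    """descriptions: one list of fragments per source text; take one fragment from each."""
    return "".join(random.choice(fragments) for fragments in descriptions)

# Texts 1-3 describe the word X with fragments A-C, D-F and G-H respectively.
text1, text2, text3 = ["A", "B", "C"], ["D", "E", "F"], ["G", "H"]
print(cross_combine([text1, text2, text3]))   # e.g. "AEH" or "CFG"
```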
Step 103: after receiving a text generation request for the theme, generating corresponding text data by using the deep learning model of the theme.
In an exemplary embodiment, by means of the language logic expression relationship recorded in the deep learning model, and by using the text data in the collected sample data set to generate new text data, it can be ensured that the content in the new text data conforms to the language logic expression relationship and has readability.
In an exemplary embodiment, the generating corresponding text data using the deep learning model of the topic includes:
acquiring keyword information of the theme;
inquiring a target word and sentence which accords with the description content of the preset keyword from words and sentences pre-stored in the deep learning model of the theme;
and controlling the deep learning model to arrange and combine the target words and sentences according to a language logic expression relation acquired in advance to obtain text data corresponding to the keywords.
The keyword information can be used to constrain the theme and determine the focus of its description; for example, when the theme word is a certain scenic spot, the keyword information can be humanities and history, landscape, catering and accommodation, and the like. Screening the pre-stored words and sentences with the keywords brings the generated text data closer to the user's needs and improves the content accuracy of the text data.
In an exemplary embodiment, before the controlling the deep learning model to arrange and combine the target words and sentences according to the pre-obtained language logic expression relationship, the method further includes:
acquiring target text data including the keywords;
and identifying the target text data by using the deep learning model to obtain a language logic expression relation of the target text information, and performing cross combination on words and sentences in at least two target text data according to the language logic expression relation of the target text information to obtain new target words and sentences.
Target text data that includes the keywords better matches the user's needs, and by analyzing this target text data, text data that more closely meets the user's requirements can be obtained.
According to the method provided by the embodiments of the application, text data corresponding to a preset theme is acquired to obtain a sample data set of the theme; a deep learning model for generating text data of the theme is established using the sample data set, the deep learning model recording a language logic expression relationship; and after a text generation request for the theme is received, corresponding text data is generated using the deep learning model of the theme. Because the deep learning model is trained on the acquired sample data set and the specified theme text is generated using the language logic expression relationship recorded in the model, the readability of the text is improved.
The method provided by the embodiments of the present application is explained as follows:
the L STM (L ong Short Term Memory) algorithm used in the invention belongs to one type of RNN (neural network) and is good at processing and analyzing events with longer time intervals and delays in a time sequence, and is usually used for problems of speech recognition, language translation, stock prediction and the like.
The method comprises the key steps of data acquisition, data processing, model training, and text generation; after a trained LSTM model is obtained, the complete generation method is finalized in combination with specific business conditions.
1. A data acquisition step: automatically acquiring a data set by using a Python crawler, and crawling text data on a certain website;
the Python crawler is used for acquiring data, and comments on a preset website can be crawled. The method can be realized by using a current module of Python, calls 10 threads to crawl data, and stores the data in a csv file format.
2. Data processing step: cleaning the acquired data set, removing non-standard text content such as special characters, punctuation, and line breaks, unifying the text format, and dividing the data into a training set and a validation set;
the method comprises the steps of obtaining a csv file, extracting special characters in the text, and obtaining a model for building L STM model.
3. Model training step: building an LSTM model based on the obtained data set to complete the training of the model;
the deep learning model is created by adopting Keras, wherein the Keras is an advanced neural network API and supports TensorFlow, Theano and CNTK, and some complex neural network models can be easily built through the Keras, so that the deep learning model is a commonly used deep learning framework with better performance.
In order to automatically generate highly readable comment text, the method builds an LSTM model on the comment data set obtained in the previous step.
When building the LSTM model, each character is first mapped to a number. The model's input feature vector is formed from the numbers corresponding to the preceding 10 characters, and the target variable is the number corresponding to the character that follows them. The txt file contains 1949 distinct characters in total (including Chinese characters and punctuation); processing in this way yields 41402 samples, which are fed into the LSTM model.
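The sample construction described above could be sketched as follows; the corpus file name is a placeholder for the cleaned text from the previous step, and the figures in the comments simply echo those quoted in the application:

```python
import numpy as np

# Placeholder file name for the cleaned corpus produced by the data processing step.
corpus = open("comments_clean.txt", encoding="utf-8").read()
chars = sorted(set(corpus))                      # 1949 distinct characters in the application's data
char_to_id = {c: i for i, c in enumerate(chars)}

seq_len = 10                                     # 10 preceding characters per sample
X, y = [], []
for i in range(len(corpus) - seq_len):
    X.append([char_to_id[c] for c in corpus[i:i + seq_len]])
    y.append(char_to_id[corpus[i + seq_len]])    # the character that follows the window

X = np.array(X)                                  # shape: (num_samples, 10)
y = np.array(y)                                  # 41402 samples in the application's example
```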
The steps of establishing the LSTM model are as follows:
FIG. 2 is a schematic diagram of the two-layer LSTM framework provided by an embodiment of the present application. As shown in FIG. 2, each layer of the two-layer LSTM framework has 128 hidden nodes, and batch_size is set to 64 (i.e., 64 samples are taken for each training step). A Dropout layer is created, which effectively alleviates overfitting when the model has many parameters and little sample data, achieving a regularization effect to a certain extent. The last layer is a Softmax layer, which turns the task into a multi-class classification problem; the loss function is computed with cross entropy, and the model parameters are updated by back propagation.
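Continuing the sketch above, a Keras version of the described two-layer LSTM might look like this; the embedding size, dropout rate, optimizer, and number of epochs are assumptions, since the application only specifies 128 hidden nodes per layer, a Dropout layer, a Softmax output, cross-entropy loss, back propagation, and batch_size 64:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

vocab_size = len(chars)                  # from the sample-construction sketch above

model = Sequential([
    Embedding(vocab_size, 64, input_length=seq_len),  # embedding size is an assumption
    LSTM(128, return_sequences=True),                 # first LSTM layer, 128 hidden nodes
    LSTM(128),                                        # second LSTM layer, 128 hidden nodes
    Dropout(0.2),                                     # alleviates overfitting
    Dense(vocab_size, activation="softmax"),          # multi-class output over characters
])

# Cross-entropy loss, parameters updated by back propagation, 64 samples per batch.
# The optimizer and epoch count are assumptions not stated in the application.
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, batch_size=64, epochs=20)
```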
Using the LSTM to analyze the text data brings the following advantages:
(1) the accuracy for classification is high;
(2) the algorithm has strong parallel processing capability and can be used for distributed storage and learning;
(3) it has good fault tolerance and robustness to noise in the data, and can closely approximate complex nonlinear relationships;
(4) its short-term memory function captures the internal relationships within the text, avoiding treating each character as an isolated individual.
4. Text generation step: generating a target text by using the model obtained by training.
After the LSTM model is obtained by training, the result text is generated using the trained model. Because the input feature vector during training corresponds to the preceding 10 characters, the length of the input seed in the generation stage is also 10 characters.
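A generation loop consistent with this description might look as follows; the greedy argmax choice and the output length are assumptions, as the application only fixes the 10-character seed length:

```python
import numpy as np

id_to_char = {i: c for c, i in char_to_id.items()}

def generate(seed, length=200):
    """seed: a 10-character starting string; length: number of characters to append."""
    assert len(seed) == seq_len
    window = [char_to_id[c] for c in seed]
    result = seed
    for _ in range(length):
        probs = model.predict(np.array([window]), verbose=0)[0]
        next_id = int(np.argmax(probs))      # greedy choice; random sampling is also possible
        result += id_to_char[next_id]
        window = window[1:] + [next_id]      # slide the 10-character window forward
    return result

print(generate(corpus[:seq_len]))            # use the first 10 characters of the corpus as the seed
```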
Second, comparing the generated text with the texts in the sample data set shows that the model does not simply copy and edit the samples, but selectively organizes logically connected texts from the data set, thereby reducing text repetition.
When the data set is limited, the texts generated across multiple training runs are highly similar; however, as the number of automatically collected samples continues to grow, the similarity between generated texts decreases and a more ideal training result is obtained.
The method automatically crawls text to expand the sample data set, is applicable to business scenarios of automatic text generation for various themes, trains an LSTM model on the automatically acquired data set, and generates a target text of the specified theme with better readability.
Fig. 3 is a block diagram of a text data generation device according to an embodiment of the present application. As shown in fig. 3, the apparatus shown in fig. 3 includes:
the acquisition module is used for acquiring text data corresponding to a preset theme to obtain a sample data set of the theme;
the establishing module is used for establishing a deep learning model for generating text data of the theme by utilizing the sample data set of the theme, wherein the deep learning model records a language logic expression relationship;
and the generating module is used for generating corresponding text data by using the deep learning model of the theme after receiving a text generating request for the theme.
In one exemplary embodiment, the obtaining module includes:
the acquisition unit is used for acquiring text data on a website by using a preset text acquisition tool;
and the classification unit is used for classifying the collected text data according to the theme of the text data to obtain the text data corresponding to the theme.
In one exemplary embodiment, the establishing module includes:
the identification unit is used for identifying the language logic expression relation between words and sentences in each text data in the sample data set;
the combination unit is used for performing cross combination on words and sentences in at least two text data according to the language logic expression relation to obtain new words and sentences;
and the establishing unit is used for establishing the deep learning model according to the language logic expression relation and the new words and sentences.
In one exemplary embodiment, the generating module includes:
a first obtaining unit, configured to obtain keyword information of the topic;
the query unit is used for querying a target word and sentence which accords with the description content of the preset keyword from words and sentences pre-stored in the deep learning model of the theme;
and the control unit is used for controlling the deep learning model to arrange and combine the target words and sentences according to a pre-acquired language logic expression relation to obtain text data corresponding to the keywords.
In an exemplary embodiment, the generating module further comprises:
a second obtaining unit, configured to obtain target text data including the keywords before the target words and sentences are arranged and combined;
and the processing unit is used for identifying the target text data by using the deep learning model to obtain a language logic expression relation of the target text information, and performing cross combination on words and sentences in at least two target text data according to the language logic expression relation of the target text information to obtain new target words and sentences.
According to the device provided by the embodiments of the application, text data corresponding to a preset theme is acquired to obtain a sample data set of the theme; a deep learning model for generating text data of the theme is established using the sample data set, the deep learning model recording a language logic expression relationship; and after a text generation request for the theme is received, corresponding text data is generated using the deep learning model of the theme. Because the deep learning model is trained on the acquired sample data set and the specified theme text is generated using the language logic expression relationship recorded in the model, the readability of the text is improved.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
Claims (10)
1. A method for generating text data, comprising:
acquiring text data corresponding to a preset theme to obtain a sample data set of the theme;
establishing a deep learning model for generating text data of the theme by using the sample data set of the theme, wherein the deep learning model records a language logic expression relationship;
and after receiving a text generation request for the theme, generating corresponding text data by using the deep learning model of the theme.
2. The method according to claim 1, wherein the obtaining text data corresponding to a preset theme comprises:
acquiring text data on a website by using a preset text acquisition tool;
and classifying the collected text data according to the theme of the text data to obtain the text data corresponding to the theme.
3. The method of claim 1, wherein said building a deep learning model for generating text data of said topic using said sample dataset of said topic comprises:
identifying a language logic expression relation between words and sentences in each text data in the sample data set;
performing cross combination on words and sentences in at least two text data according to the language logic expression relation to obtain new words and sentences;
and establishing the deep learning model according to the language logic expression relation and the new words and sentences.
4. The method of claim 3, wherein generating corresponding text data using the deep learning model of the topic comprises:
acquiring keyword information of the theme;
inquiring a target word and sentence which accords with the description content of the preset keyword from words and sentences pre-stored in the deep learning model of the theme;
and controlling the deep learning model to arrange and combine the target words and sentences according to a language logic expression relation acquired in advance to obtain text data corresponding to the keywords.
5. The method according to claim 4, wherein before controlling the deep learning model to arrange and combine the target words and sentences according to the pre-obtained language logic expression relationship, the method further comprises:
acquiring target text data including the keywords;
and identifying the target text data by using the deep learning model to obtain a language logic expression relation of the target text information, and performing cross combination on words and sentences in at least two target text data according to the language logic expression relation of the target text information to obtain new target words and sentences.
6. An apparatus for generating text data, comprising:
the acquisition module is used for acquiring text data corresponding to a preset theme to obtain a sample data set of the theme;
the establishing module is used for establishing a deep learning model for generating text data of the theme by utilizing the sample data set of the theme, wherein the deep learning model records a language logic expression relationship;
and the generating module is used for generating corresponding text data by using the deep learning model of the theme after receiving a text generating request for the theme.
7. The apparatus of claim 6, wherein the obtaining module comprises:
the acquisition unit is used for acquiring text data on a website by using a preset text acquisition tool;
and the classification unit is used for classifying the collected text data according to the theme of the text data to obtain the text data corresponding to the theme.
8. The apparatus of claim 6, wherein the establishing module comprises:
the identification unit is used for identifying the language logic expression relation between words and sentences in each text data in the sample data set;
the combination unit is used for performing cross combination on words and sentences in at least two text data according to the language logic expression relation to obtain new words and sentences;
and the establishing unit is used for establishing the deep learning model according to the language logic expression relation and the new words and sentences.
9. The apparatus of claim 8, wherein the generating module comprises:
a first obtaining unit, configured to obtain keyword information of the topic;
the query unit is used for acquiring the keyword information of the theme; inquiring a target word and sentence which accords with the description content of the preset keyword from words and sentences pre-stored in the deep learning model of the theme;
and the control unit is used for controlling the deep learning model to arrange and combine the target words and sentences according to a pre-acquired language logic expression relation to obtain text data corresponding to the keywords.
10. The apparatus of claim 9, wherein the generating module further comprises:
a second obtaining unit, configured to obtain target text data including the keywords before the target words and sentences are arranged and combined;
and the processing unit is used for identifying the target text data by using the deep learning model to obtain a language logic expression relation of the target text information, and performing cross combination on words and sentences in at least two target text data according to the language logic expression relation of the target text information to obtain new target words and sentences.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010166957.9A CN111414735B (en) | 2020-03-11 | 2020-03-11 | Text data generation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010166957.9A CN111414735B (en) | 2020-03-11 | 2020-03-11 | Text data generation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111414735A true CN111414735A (en) | 2020-07-14 |
CN111414735B CN111414735B (en) | 2024-03-22 |
Family
ID=71491096
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010166957.9A Active CN111414735B (en) | 2020-03-11 | 2020-03-11 | Text data generation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111414735B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111984845A (en) * | 2020-08-17 | 2020-11-24 | 江苏百达智慧网络科技有限公司 | Website wrongly-written character recognition method and system |
CN112699643A (en) * | 2020-12-23 | 2021-04-23 | 车智互联(北京)科技有限公司 | Method for generating language model and method for automatically generating article |
CN117033934A (en) * | 2023-08-02 | 2023-11-10 | 中信联合云科技有限责任公司 | Content generation method and device based on artificial intelligence |
RU2821835C1 (en) * | 2023-07-18 | 2024-06-26 | Общество с ограниченной ответственностью "Открытый код" | Method of text generation based on machine learning |
WO2024136504A1 (en) * | 2022-12-22 | 2024-06-27 | Cj Olivenetworks Co., Ltd. | Artificial intelligence-based creative phrase generation system and method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107644085A (en) * | 2017-09-22 | 2018-01-30 | 百度在线网络技术(北京)有限公司 | The generation method and device of competitive sports news |
CN107797982A (en) * | 2016-08-31 | 2018-03-13 | 百度在线网络技术(北京)有限公司 | For identifying the method, apparatus and equipment of text type |
DE102019000433A1 (en) * | 2018-04-23 | 2019-10-24 | Adobe Inc. | Generate a topic-based summary of a text content |
CN110750975A (en) * | 2019-10-21 | 2020-02-04 | 北京明略软件系统有限公司 | Introduction text generation method and device |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107797982A (en) * | 2016-08-31 | 2018-03-13 | 百度在线网络技术(北京)有限公司 | For identifying the method, apparatus and equipment of text type |
CN107644085A (en) * | 2017-09-22 | 2018-01-30 | 百度在线网络技术(北京)有限公司 | The generation method and device of competitive sports news |
DE102019000433A1 (en) * | 2018-04-23 | 2019-10-24 | Adobe Inc. | Generate a topic-based summary of a text content |
CN110390009A (en) * | 2018-04-23 | 2019-10-29 | 奥多比公司 | Generate the summary based on theme of content of text |
CN110750975A (en) * | 2019-10-21 | 2020-02-04 | 北京明略软件系统有限公司 | Introduction text generation method and device |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111984845A (en) * | 2020-08-17 | 2020-11-24 | 江苏百达智慧网络科技有限公司 | Website wrongly-written character recognition method and system |
CN111984845B (en) * | 2020-08-17 | 2023-10-31 | 江苏百达智慧网络科技有限公司 | Website wrongly written word recognition method and system |
CN112699643A (en) * | 2020-12-23 | 2021-04-23 | 车智互联(北京)科技有限公司 | Method for generating language model and method for automatically generating article |
CN112699643B (en) * | 2020-12-23 | 2024-04-19 | 车智互联(北京)科技有限公司 | Method for generating language model and automatic article generation method |
WO2024136504A1 (en) * | 2022-12-22 | 2024-06-27 | Cj Olivenetworks Co., Ltd. | Artificial intelligence-based creative phrase generation system and method |
RU2821835C1 (en) * | 2023-07-18 | 2024-06-26 | Общество с ограниченной ответственностью "Открытый код" | Method of text generation based on machine learning |
CN117033934A (en) * | 2023-08-02 | 2023-11-10 | 中信联合云科技有限责任公司 | Content generation method and device based on artificial intelligence |
CN117033934B (en) * | 2023-08-02 | 2024-04-19 | 中信联合云科技有限责任公司 | Content generation method and device based on artificial intelligence |
Also Published As
Publication number | Publication date |
---|---|
CN111414735B (en) | 2024-03-22 |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | GR01 | Patent grant | 