CN112667815A - Text processing method and device, computer readable storage medium and processor - Google Patents

Info

Publication number
CN112667815A
Authority
CN
China
Prior art keywords
text
topic
content
classification model
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011625158.XA
Other languages
Chinese (zh)
Inventor
李健
谢园园
陈明
武卫东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sinovoice Technology Co Ltd
Original Assignee
Beijing Sinovoice Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sinovoice Technology Co Ltd
Priority to CN202011625158.XA
Publication of CN112667815A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text processing method, a text processing device, a computer-readable storage medium and a processor. The method comprises: acquiring a text; and inputting the text into a topic classification model, which outputs a classification result corresponding to the text, wherein the topic classification model is obtained through machine learning training on multiple groups of training data, and each group of training data comprises a text and the classification result corresponding to that text. The invention solves the technical problem that the prior art, which only performs cluster analysis and analyzes public opinion trends, cannot enable new media personnel to quickly find the topics they need.

Description

Text processing method and device, computer readable storage medium and processor
Technical Field
The invention relates to the field of text data processing, in particular to a text processing method, a text processing device, a computer readable storage medium and a processor.
Background
Most new media practitioners use search engines and platforms such as Baidu, 360, Sogou and Weibo to find writing material and trending topics. However, faced with a mass of miscellaneous news material, they spend a great deal of time extracting useful topics and reading the core points of each text before the material can be worked into the required articles. The usual implementation is a network public opinion analysis system based on text semantic correlation, which performs cluster analysis of dynamic data. However, the prior art has the following disadvantages: it cannot help new media operators classify topics effectively by field, automatically summarize the main idea of an article, or show the public opinion trend of a topic; it cannot rewrite articles according to a set audience group, keywords and the like; and it cannot revise a finished article, for example by checking for wrongly written words or checking the logical relationship between contexts.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide a text processing method, a text processing device, a computer-readable storage medium and a processor, which at least solve the technical problem that the prior art, which only performs cluster analysis and analyzes public opinion trends, cannot enable new media personnel to quickly find the topics they need.
According to an aspect of an embodiment of the present invention, there is provided a text processing method, including: acquiring a text; and inputting the text into a topic classification model, which outputs a classification result corresponding to the text, wherein the topic classification model is obtained through machine learning training on multiple groups of training data, and each group of training data comprises a text and the classification result corresponding to that text.
Optionally, the topic classification model includes at least one of: a section/industry classification model, a topic type model and a customized model, and inputting the text into the topic classification model and outputting, by the topic classification model, the classification result corresponding to the text comprises: in the case that the topic classification model is the section/industry classification model, identifying the news section/industry corresponding to the text according to the section/industry classification model; in the case that the topic classification model is the topic type model, identifying the subdivided topic corresponding to the text according to the topic type model; and in the case that the topic classification model is the customized model, identifying the customized topic corresponding to the text according to the customized model.
Optionally, the method further comprises: analyzing the text content of the text to obtain an analysis processing result, wherein the analysis processing comprises at least one of the following steps: content representation, weight calculation and content selection; and organizing content according to the analysis processing result to obtain the abstract content of the text.
Optionally, the method further comprises: performing article modification on the text content of the text, wherein the performing article modification on the text content of the text comprises: adjusting the text content of the text according to different analysis angles, wherein the analysis angles include at least one of: language accuracy, chapter structure, theme.
Optionally, adjusting the text content of the text according to different analysis angles includes: in the case that the analysis angle is language accuracy, adjusting the text content of the text based on wrongly written character correction, grammar detection, idiom detection and literacy detection; in the case that the analysis angle is chapter structure, adjusting the structure and viewpoint of the text content of the text; and in the case that the analysis angle is theme, identifying the theme of the text content of the text.
Optionally, the method further comprises: rewriting the text content of the text as an article, wherein the rewriting comprises: determining an update type of the text content of the text; and when the update type changes, replanning the text content of the text.
Optionally, replanning the text content of the text comprises at least one of: performing document planning on the text content of the text, wherein the document planning comprises at least one of: determining the content to be generated and determining the document structure; performing micro-planning on the text content of the text, wherein the micro-planning comprises at least one of: sentence generation, inter-sentence linkage, paragraph generation and title generation; and performing surface-layer rewriting on the text content of the text, wherein the surface-layer rewriting comprises at least one of: touch-up rewriting and formatting.
According to another aspect of the embodiments of the present invention, there is also provided a text processing apparatus including: an acquisition module, configured to acquire a text; and a first processing module, configured to input the text into a topic classification model, which outputs a classification result corresponding to the text, wherein the topic classification model is obtained through machine learning training on multiple groups of training data, and each group of training data comprises a text and the classification result corresponding to that text; the topic classification model includes at least one of: a section/industry classification model, a topic type model and a customized model.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the text processing method described in any one of the above.
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a program, where the program executes the text processing method described in any one of the above.
In the embodiments of the invention, a text is acquired and input into a topic classification model, which outputs a classification result corresponding to the text, wherein the topic classification model is obtained through machine learning training on multiple groups of training data, and each group of training data comprises a text and the classification result corresponding to that text. Because the classification result corresponding to the text is identified by the topic classification model, topic classification is achieved quickly and accurately, which solves the technical problem that the prior art, which only performs cluster analysis and analyzes public opinion trends, cannot enable new media personnel to quickly find the topics they need.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram of a method of text processing according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a text processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of description, some nouns or terms appearing in the present invention will be described in detail below.
Natural Language Understanding (NLU for short): a semantic understanding engine that gives a computer system natural language processing capability. It is used to understand article content, extract an abstract of that content and, drawing on a large-scale corpus, perform secondary article creation based on word segmentation, syntactic analysis, semantic association and entity recognition technologies.
Data mining: using scientific methods from fields such as mathematics, statistics, artificial intelligence and neural networks, including memory-based reasoning, cluster analysis, association analysis, decision trees, neural networks and genetic algorithms, implicit and previously unknown relationships, patterns and trends of potential value for decision-making are mined from large amounts of data, and this knowledge and these rules are used to build decision support models, providing methods, tools and processes for predictive decision support. The invention mainly uses the clustering and classification functions.
Article modification: through NLU technology, words in a database can be compared with words in the article to find wrongly written or misused words; through NLU technology, a sentence can be cut into characters, the characters combined into words and the parts of speech tagged, after which the sentence structure is judged against a sentence pattern model and checked for correctness; and viewpoint extraction is used to summarize paragraphs into core content, after which the logical relationships among the core content are analyzed by deep learning in order to check the logical relationship between contexts.
Article rewriting: by reading a large number of articles with NLU deep learning technology, a set of fixed writing templates can be summarized for each field, and coherent articles are then generated intelligently by capturing data from the material.
Example 1
In accordance with an embodiment of the present invention, there is provided an embodiment of a text processing method. It should be noted that the steps illustrated in the flowchart of the figure may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from the one given here.
Fig. 1 is a flowchart of a text processing method according to an embodiment of the present invention, as shown in fig. 1, the text processing method includes the steps of:
Step S102, acquiring a text;
The text may come from different website pages; for example, the text on a website page may be obtained using crawler technology.
Step S104, inputting the text into a topic classification model, which outputs a classification result corresponding to the text, wherein the topic classification model is obtained through machine learning training on multiple groups of training data, and each group of training data comprises a text and the classification result corresponding to that text.
As an alternative embodiment, the topic classification model may be obtained by training a semi-supervised or unsupervised model on big data, labeled data, customized data and the like, wherein the training process may use techniques such as speech vectors, deep learning and machine learning.
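The training described in step S104 can be illustrated with a minimal sketch. The description does not name a specific algorithm, so the multinomial naive Bayes classifier below is a swapped-in stand-in chosen only because it fits in a few lines; the class name, the whitespace tokenizer and the four training samples are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace tokenization; Chinese text would need a word segmenter.
    return text.lower().split()


class NaiveBayesTopicClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, samples):
        # samples: iterable of (text, label) pairs, i.e. the groups of training data.
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in samples:
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: each group is a text and its classification result.
TRAINING_DATA = [
    ("stock market shares fall bank", "finance"),
    ("central bank raises interest rate", "finance"),
    ("team wins match final goal", "sports"),
    ("coach praises player after match", "sports"),
]

model = NaiveBayesTopicClassifier().fit(TRAINING_DATA)
print(model.predict("bank interest rate decision"))
print(model.predict("player scores goal in final"))
```

A production system would presumably replace both the tokenizer and the classifier; the sketch only shows the shape of the data, with a text going in and a classification result coming out.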
As an alternative embodiment, the topic classification model includes, but is not limited to, a section/industry classification model, a topic type model, and a customized model. In the specific implementation process, any one model can be adopted, and a mode of combining a plurality of models can also be adopted.
Through the above steps, a text is acquired and input into a topic classification model, which outputs a classification result corresponding to the text, wherein the topic classification model is obtained through machine learning training on multiple groups of training data, and each group of training data comprises a text and the classification result corresponding to that text. Because the classification result corresponding to the text is identified by the topic classification model, topic classification is achieved quickly and accurately, which solves the technical problem that the prior art, which only performs cluster analysis and analyzes public opinion trends, cannot enable new media personnel to quickly find the topics they need.
Optionally, the topic classification model includes at least one of: a section/industry classification model, a topic type model and a customized model, and inputting the text into the topic classification model and outputting, by the topic classification model, the classification result corresponding to the text comprises: in the case that the topic classification model is a section/industry classification model, identifying the news section/industry corresponding to the text according to the section/industry classification model; in the case that the topic classification model is a topic type model, identifying the subdivided topic corresponding to the text according to the topic type model; and in the case that the topic classification model is a customized model, identifying the customized topic corresponding to the text according to the customized model.
As an alternative embodiment, the collected trending topics can be classified, and corresponding topic labels are applied automatically according to the text content. For example, texts may be labeled by section category (e.g., finance, sports, science, people's livelihood), by industry classification (e.g., automobile, luxury, diet, general well-being), and with multiple subdivided sub-topic tags (e.g., natural disasters, food safety, regulation violations, senior management changes). Meanwhile, various customized topics can be quickly added and identified by using classification techniques from big data mining.
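The combination of a section/industry model, a topic type model and a customized model can be sketched as a set of independent labelers applied to the same text. The keyword rules below are placeholders for trained models, and every name (`make_keyword_model`, `MODELS`, `label_text`, the `"brand watch"` topic) is hypothetical.

```python
def make_keyword_model(rules, default=None):
    """Build a toy classifier: the first label whose keyword occurs in the text wins."""
    def classify(text):
        lowered = text.lower()
        for label, keywords in rules.items():
            if any(keyword in lowered for keyword in keywords):
                return label
        return default
    return classify


# Assumption: keyword rules stand in for the three trained models.
MODELS = {
    "section": make_keyword_model(
        {"finance": ["bank", "stock"], "sports": ["match", "league"]}
    ),
    "topic_type": make_keyword_model(
        {"natural disaster": ["flood", "earthquake"],
         "food safety": ["contaminated", "recall"]}
    ),
    "custom": make_keyword_model({"brand watch": ["acme"]}),
}


def label_text(text, models=MODELS, enabled=None):
    """Apply any one model, or several in combination, and collect their labels."""
    names = enabled if enabled is not None else list(models)
    labels = {}
    for name in names:
        result = models[name](text)
        if result is not None:
            labels[name] = result
    return labels


print(label_text("Acme bank branch closed by flood"))
print(label_text("League match postponed", enabled=["section"]))
```

Keeping each model behind the same one-argument interface means a customized topic can be added by registering one more entry, without touching the others.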
As an alternative embodiment, the method further includes: constructing a keyword map based on the media big data; determining the occurrence frequency of the keywords according to the keyword map; and determining the keywords as hot topics under the condition that the frequency of occurrence of the keywords is greater than or equal to the preset frequency. By the method, the hot topics can be quickly and accurately found from the media big data. In addition, in order to enable the hot topics to be more visual, the hot topics can be visualized, wherein the frequency, the ranking and the like of the hot topics can be displayed, and the hot topics can be marked by adopting different colors.
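The frequency test for hot topics can be sketched as follows. The keyword map is modeled here as a simple counter, and counting each keyword once per document is an assumption made for the illustration; `find_hot_topics` and the sample documents are hypothetical.

```python
from collections import Counter


def find_hot_topics(keyword_lists, min_frequency):
    """Return (keyword, frequency) pairs that meet the threshold, ranked by frequency."""
    keyword_map = Counter()
    for keywords in keyword_lists:
        # Assumption: count a keyword once per document so one long article cannot dominate.
        keyword_map.update(set(keywords))
    hot = [(kw, n) for kw, n in keyword_map.items() if n >= min_frequency]
    # Descending frequency, then alphabetical order for a stable ranking.
    return sorted(hot, key=lambda item: (-item[1], item[0]))


# Hypothetical keywords extracted from four news items.
DOCS = [
    ["typhoon", "flood", "rescue"],
    ["typhoon", "insurance"],
    ["flood", "typhoon"],
    ["earnings", "insurance"],
]

print(find_hot_topics(DOCS, min_frequency=2))
# → [('typhoon', 3), ('flood', 2), ('insurance', 2)]
```

The ranked list already carries the frequency and ranking that the visualization step would display.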
As an alternative embodiment, the mass of news is treated as a huge text database. Using the semantic recognition and viewpoint extraction functions of data mining, keywords and core content such as information sources, places, people and events are extracted as tags, and the news items are then sorted into the corresponding categories according to those tags and in time order. When a new media operator needs to retrieve information about a certain topic, a list containing the relevant content can be provided quickly.
It should be noted that topic classification is performed by data mining techniques, and the main idea of each text is summarized by natural language understanding techniques. When an operator needs to retrieve information about a certain topic, a list containing the relevant content can be provided quickly, thereby saving new media operators a great deal of reading time.
Optionally, the method further includes: analyzing the text content of the text to obtain an analysis processing result, wherein the analysis processing comprises at least one of the following steps: content representation, weight calculation and content selection; and organizing the content according to the analysis processing result to obtain the abstract content of the text.
As an optional embodiment, analysis processing such as content representation, weight calculation and content selection may be performed on the text content of the text to obtain an analysis processing result, and content organization is then performed according to the analysis processing result to obtain the abstract content of the text. It should be noted that the requirements on content organization include, but are not limited to, generality, readability and conciseness. In addition, through natural language processing technology, the main idea of the text is summarized automatically, the core points of the text can be grasped quickly, and the abstract is generated automatically, which saves new media operators a great deal of reading time.
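The four stages named above (content representation, weight calculation, content selection, content organization) map naturally onto a frequency-based extractive summarizer. The sketch below is one common way to realize them, not necessarily the method used in the embodiment; the scoring rule and the function name `summarize` are assumptions.

```python
import re
from collections import Counter


def summarize(text, max_sentences=2):
    # Content representation: split into sentences and bags of lowercase words.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    bags = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    # Weight calculation: score a sentence by the average frequency of its words.
    freq = Counter(word for bag in bags for word in bag)
    scores = [sum(freq[w] for w in bag) / len(bag) if bag else 0.0 for bag in bags]
    # Content selection: keep the highest-scoring sentences (ties go to the earlier one).
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = sorted(ranked[:max_sentences])
    # Content organization: restore the original order so the abstract reads coherently.
    return " ".join(sentences[i] for i in chosen)


SAMPLE = (
    "The flood hit the city. "
    "The flood closed the city schools. "
    "Officials spoke briefly."
)
print(summarize(SAMPLE, max_sentences=1))
```

Raw frequency favors common function words, so a real system would at least remove stop words or use TF-IDF weights.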
Optionally, the method further includes: performing article modification on the text content of the text, wherein the performing article modification on the text content of the text comprises: adjusting the text content of the text according to different analysis angles, wherein the analysis angles comprise at least one of the following: language accuracy, chapter structure, theme.
As an alternative embodiment, article modification of the text content of the text requires adjusting the text content from different analysis angles, wherein the analysis angles include, but are not limited to, language accuracy, chapter structure and theme. It should be noted that language accuracy analysis includes, but is not limited to, wrongly written character correction, grammar detection, idiom detection and allusion detection; chapter structure analysis includes, but is not limited to, whether the structure is clear and whether the viewpoint is clear; and theme analysis includes, but is not limited to, theme recognition.
Optionally, adjusting the text content of the text according to different analysis angles includes: in the case that the analysis angle is language accuracy, adjusting the text content of the text based on wrongly written character correction, grammar detection, idiom detection and literacy detection; in the case that the analysis angle is chapter structure, adjusting the structure and viewpoint of the text content of the text; and in the case that the analysis angle is theme, identifying the theme of the text content of the text.
As an optional embodiment, based on syntactic analysis in natural language understanding, a DNN language model and text error correction, articles uploaded by new media operators are learned and evaluated by machine to judge whether the sentences are fluent, whether the wording is reasonable, and whether the wording and phrasing conform to objective habits of language expression. Through the error correction function, the article is checked for wrongly written characters, erroneous segments are identified, an error prompt is given, and a correction is suggested.
As an alternative embodiment, to check for wrongly written words, words in a database can be compared with words in the article by natural language understanding techniques; to check sentence structure, natural language processing techniques can cut a sentence into characters, combine the characters into words and tag the parts of speech, after which the sentence structure is judged against a sentence pattern model; and to check the logical relationship between contexts, viewpoint extraction is applied to summarize paragraphs into core content, after which the logical relationships among the core content are analyzed by deep learning techniques for natural language understanding.
Optionally, the method further includes: rewriting the text content of the text as an article, wherein the rewriting comprises: determining an update type of the text content of the text; and when the update type changes, replanning the text content of the text.
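The first of these checks, comparing words in the article against words in a database, can be sketched as a lexicon lookup. The lexicon, the confusion table and the function name `check_words` are hypothetical, and a real checker would also need word segmentation and context, which this sketch omits.

```python
def check_words(words, lexicon, confusions):
    """Flag words absent from the lexicon and attach a suggestion when one is known."""
    findings = []
    for position, word in enumerate(words):
        if word in lexicon:
            continue
        findings.append(
            {
                "position": position,
                "word": word,
                # None means the word is unknown and no correction is on record.
                "suggestion": confusions.get(word),
            }
        )
    return findings


# Assumption: a tiny word database and a table of known miswritings.
LEXICON = {"the", "report", "was", "received"}
CONFUSIONS = {"recieved": "received"}

print(check_words("the report was recieved".split(), LEXICON, CONFUSIONS))
# → [{'position': 3, 'word': 'recieved', 'suggestion': 'received'}]
```

Returning positions rather than rewriting the text in place matches the described behavior of giving an error prompt and a suggested correction.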
As an alternative embodiment, the update types include, but are not limited to, data update, topic update, hot topic, list update, and the like. In the specific implementation process, data updating needs to be judged in real time or intermittently; periodically judging topic updating and list updating; the hot topic is sudden and needs to be updated irregularly.
As an optional embodiment, articles are rewritten on the basis of natural language understanding technology, using independently developed Chinese word segmentation, syntactic analysis, semantic association and entity recognition technologies combined with a continuously accumulated large-scale industry corpus.
As an alternative embodiment, deep learning techniques for natural language understanding may be used to read a large number of articles and summarize a set of fixed writing templates for each field, after which data captured from the material is used to intelligently generate a coherent article. For example, in rewriting a financial article, a financial-field template is summarized; according to the set audience groups, keywords and the like, information such as time, place, scoring conditions and character highlights is captured from the material through similarity analysis and viewpoint extraction and filled into the template, so that news products are produced quickly and new media operators are left time and room for in-depth research and writing.
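The template-filling step can be illustrated with the standard library's `string.Template`. The template wording and the field names (`date`, `company`, `revenue`, `change`) are invented for this sketch; in the described system the template would be summarized from many articles and the facts captured from the material.

```python
from string import Template

# Hypothetical financial-field template; the placeholders are illustrative only.
FINANCE_TEMPLATE = Template(
    "On $date, $company reported revenue of $revenue, $change from a year earlier."
)


def fill_template(template, facts):
    # safe_substitute leaves unknown placeholders in place instead of raising KeyError,
    # so a missing fact stays visible for an editor to resolve.
    return template.safe_substitute(facts)


facts = {
    "date": "3 March",
    "company": "Acme",
    "revenue": "10 million yuan",
    "change": "up 5%",
}
print(fill_template(FINANCE_TEMPLATE, facts))
```

Using `safe_substitute` rather than `substitute` is a design choice: a partially filled draft is usually more useful to an operator than a failed generation.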
Through the implementation mode, the article rewriting can be realized in time.
Optionally, replanning the text content of the text comprises at least one of: performing document planning on the text content of the text, wherein the document planning comprises at least one of: determining the content to be generated and determining the document structure; performing micro-planning on the text content of the text, wherein the micro-planning comprises at least one of: sentence generation, inter-sentence linkage, paragraph generation and title generation; and performing surface-layer rewriting on the text content of the text, wherein the surface-layer rewriting comprises at least one of: touch-up rewriting and formatting.
As an alternative embodiment, the replanning of the text content of the text includes, but is not limited to, document planning, micro-planning, surface-layer rewriting, and the like. Wherein, the document planning includes but is not limited to determining the generated content, determining the document structure, etc.; the micro-programming includes but is not limited to sentence generation, inter-sentence linkage, paragraph generation, title generation, etc.; the surface overwrites include, but are not limited to, touch-up overwrites, formatting, and the like.
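The three stages of replanning can be sketched as a small pipeline. The fact keys (`what`, `when`, `where`), the sentence patterns and the function names are all assumptions made for illustration; they only show how document planning, micro-planning and surface-layer rewriting hand results to one another.

```python
def plan_document(facts):
    # Document planning: decide which content to generate and in what order.
    order = ["what", "when", "where"]
    return [(key, facts[key]) for key in order if key in facts]


def micro_plan(plan):
    # Micro-planning: generate one sentence per planned item and derive a title.
    phrases = {
        "what": "{}",
        "when": "It happened on {}",
        "where": "The location was {}",
    }
    sentences = [phrases[key].format(value) for key, value in plan]
    title = plan[0][1] if plan else ""
    return title, sentences


def surface_rewrite(title, sentences):
    # Surface-layer rewriting: touch up capitalization and punctuation, apply a format.
    polished = [s[0].upper() + s[1:].rstrip(".") + "." for s in sentences if s]
    return title.title() + "\n\n" + " ".join(polished)


def replan(facts):
    title, sentences = micro_plan(plan_document(facts))
    return surface_rewrite(title, sentences)


print(replan({"what": "a bridge opened", "where": "Wuhan", "when": "1 May"}))
```

Because each stage takes only the previous stage's output, any one of them can be replaced, for example by a learned sentence generator, without changing the others.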
As an alternative embodiment, data acquisition and information input are required first. This means digesting the various data related to the writing output: data relevant to the target output can be found among data and materials of many forms, which may be APIs, or data, algorithms and services in various formats. Second, the data needs to be analyzed: the data and its internal associations and relationships are analyzed, a reasonable data structure is found to express them, and the data and the expression of the target output are summarized. Next, an output structure needs to be constructed: for output objects of different types and target requirements, the output structure and the semantic representation of the output result need to be defined reasonably, and a user portrait can also be introduced for personalized expression. Semantic representation cannot be separated from the constraints and support of a knowledge graph, so the data is represented within the knowledge graph framework of the output object. Finally, the output is optimized and presented, including wording and phrasing, language polishing and decoration with visual elements. Typical output scenarios include chat sessions, long texts, abstracts, short news items, news reports, stories, visual charts, microblog posts and titles, and the optimization direction and method differ between application modes.
As an alternative embodiment, the article modification and rewriting are performed by natural language understanding technology, for example, the rewriting of the article is performed by setting audience groups, keywords, and the like; the written article is modified by the system, such as checking wrongly written characters, checking the context logic relationship and the like, so that the creation time of new media operators is saved.
Example 2
According to another aspect of the embodiments of the present invention, there is also provided a text processing apparatus, and fig. 2 is a schematic diagram of a text processing apparatus according to an embodiment of the present invention, as shown in fig. 2, the text processing apparatus includes: an acquisition module 22 and a first processing module 24. The text processing apparatus will be described in detail below.
The acquisition module 22 is configured to acquire a text; the first processing module 24 is connected to the acquisition module 22 and is configured to input the text into a topic classification model, which outputs a classification result corresponding to the text, wherein the topic classification model is obtained through machine learning training on multiple groups of training data, and each group of training data comprises a text and the classification result corresponding to that text; the topic classification model includes at least one of: a section/industry classification model, a topic type model and a customized model.
In the above embodiment, the text processing device identifies the classification result corresponding to the text through the topic classification model, so that topic classification is achieved quickly and accurately, which solves the technical problem that the prior art, which only performs cluster analysis and analyzes public opinion trends, cannot enable new media personnel to quickly find the topics they need.
It should be noted that the above modules may be implemented by software or hardware, for example, for the latter, the following may be implemented: the modules can be located in the same processor; and/or the modules are located in different processors in any combination.
It should be noted here that the above-mentioned obtaining module 22 and the first processing module 24 correspond to steps S102 to S104 in embodiment 1, and the above-mentioned modules are the same as examples and application scenarios realized by the corresponding steps, but are not limited to what is disclosed in embodiment 1 above. It should be noted that the modules described above as part of an apparatus may be implemented in a computer system such as a set of computer-executable instructions.
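When the modules are implemented in software, the connection between the two can be sketched as two small classes. The class and method names and the `keyword_model` placeholder are hypothetical; the sketch only mirrors the correspondence with steps S102 and S104.

```python
from typing import Callable, Iterable, List, Tuple


class AcquisitionModule:
    """Counterpart of step S102: obtains texts from some source."""

    def __init__(self, source: Iterable[str]):
        self._source = source

    def acquire(self) -> List[str]:
        # Drop empty entries and surrounding whitespace before classification.
        return [text.strip() for text in self._source if text.strip()]


class FirstProcessingModule:
    """Counterpart of step S104: feeds each text to a topic classification model."""

    def __init__(self, acquisition: AcquisitionModule, model: Callable[[str], str]):
        self._acquisition = acquisition
        self._model = model

    def run(self) -> List[Tuple[str, str]]:
        return [(text, self._model(text)) for text in self._acquisition.acquire()]


def keyword_model(text: str) -> str:
    # Placeholder for a trained topic classification model.
    return "sports" if "match" in text.lower() else "other"


pipeline = FirstProcessingModule(
    AcquisitionModule(["  Match report ", "", "Rate cut"]), keyword_model
)
print(pipeline.run())
# → [('Match report', 'sports'), ('Rate cut', 'other')]
```

Passing the model in as a callable keeps the processing module independent of how the model was trained.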
Optionally, the topic classification model includes at least one of: a section/industry classification model, a topic type model and a customized model, and the first processing module 24 includes: a first identification unit, configured to identify the news section/industry corresponding to the text according to the section/industry classification model when the topic classification model is the section/industry classification model; a second identification unit, configured to identify the subdivided topic corresponding to the text according to the topic type model when the topic classification model is the topic type model; and a third identification unit, configured to identify the customized topic corresponding to the text according to the customized model when the topic classification model is the customized model.
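The selection among the three identification units can be pictured as a simple dispatch on the model kind. The following is a minimal sketch; the three keyword-based functions are hypothetical stand-ins for trained models, and the dictionary keys and function names are assumptions introduced for illustration.

```python
from typing import Callable, Dict

# A model maps a text to a label. The three models below are keyword-based
# stand-ins (assumptions), not the trained models of the embodiment.
Model = Callable[[str], str]


def section_industry_model(text: str) -> str:
    return "finance" if "stock" in text.lower() else "society"


def topic_type_model(text: str) -> str:
    return "earnings report" if "quarterly" in text.lower() else "general news"


def customized_model(text: str) -> str:
    return "watched brand" if "acme" in text.lower() else "not tracked"


IDENTIFICATION_UNITS: Dict[str, Model] = {
    "section/industry": section_industry_model,  # first identification unit
    "topic type": topic_type_model,              # second identification unit
    "customized": customized_model,              # third identification unit
}


def identify(text: str, model_kind: str) -> str:
    """Route the text to the identification unit for the given model kind."""
    if model_kind not in IDENTIFICATION_UNITS:
        raise ValueError(f"unknown topic classification model: {model_kind}")
    return IDENTIFICATION_UNITS[model_kind](text)
```

Keeping the units in a lookup table means a further model kind can be added without touching the dispatch logic.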
Optionally, the apparatus further includes: a second processing module, configured to analyze the text content of the text to obtain an analysis processing result, where the analysis processing includes at least one of: content representation, weight calculation and content selection; and a third processing module, configured to organize content according to the analysis processing result to obtain the abstract content of the text.
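One way to read the four stages (content representation, weight calculation, content selection, content organization) is as a frequency-based extractive summarizer. The sketch below is an assumption about how those stages might be realized; the embodiment does not specify the weighting scheme, and the function name and sentence-splitting rule are hypothetical.

```python
import re
from collections import Counter


def summarize(text, max_sentences=2):
    # Content representation: split into sentences and represent each
    # sentence as a list of lowercase word tokens.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]

    # Weight calculation: a sentence's weight is the average document-wide
    # frequency of its tokens (an assumed scheme, chosen for simplicity).
    frequency = Counter(token for tokens in tokenized for token in tokens)
    weights = [
        sum(frequency[t] for t in tokens) / len(tokens) if tokens else 0.0
        for tokens in tokenized
    ]

    # Content selection: keep the indices of the highest-weighted sentences.
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    chosen = ranked[:max_sentences]

    # Content organization: restore the original document order.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Restoring document order in the last step keeps the abstract readable even though selection is driven purely by weight.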
Optionally, the apparatus further includes: a fourth processing module, configured to perform article modification on the text content of the text, where the fourth processing module includes: an adjusting unit, configured to adjust the text content of the text according to different analysis angles, where the analysis angles include at least one of: language accuracy, chapter structure and theme.
Optionally, the adjusting unit includes: a first adjusting subunit, configured to adjust the text content of the text based on wrongly written character correction, grammar detection, idiom detection and literacy detection when the analysis angle is language accuracy; a second adjusting subunit, configured to adjust the structure and viewpoint of the text content of the text when the analysis angle is chapter structure; and an identification subunit, configured to identify the theme of the text content of the text when the analysis angle is theme.
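The adjusting unit can likewise be sketched as a dispatch on the analysis angle. Everything below is a simplified stand-in under stated assumptions: the confusion table, the stop-word list, the paragraph normalization used for chapter structure, and the most-frequent-word heuristic used for theme identification are hypothetical and are not taken from the embodiment.

```python
import re
from collections import Counter

CONFUSION_TABLE = {"teh": "the", "recieve": "receive"}  # hypothetical entries
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "about"}


def adjust_language_accuracy(text):
    # Stand-in for wrongly written character correction: replace known
    # misspellings (the replacement is always lowercase in this sketch).
    def fix(match):
        word = match.group(0)
        return CONFUSION_TABLE.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", fix, text)


def adjust_chapter_structure(text):
    # Stand-in for structure adjustment: normalize paragraph breaks so that
    # paragraphs are separated by exactly one blank line.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n\n".join(paragraphs)


def identify_theme(text):
    # Stand-in for theme identification: the most frequent non-stop word.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return Counter(words).most_common(1)[0][0] if words else ""


SUBUNITS = {
    "language accuracy": adjust_language_accuracy,  # first adjusting subunit
    "chapter structure": adjust_chapter_structure,  # second adjusting subunit
    "theme": identify_theme,                        # identification subunit
}


def adjusting_unit(text, analysis_angle):
    return SUBUNITS[analysis_angle](text)
```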
Optionally, the apparatus further includes: a fifth processing module, configured to perform article rewriting on the text content of the text, where the fifth processing module includes: a determining unit, configured to determine an update type of the text content of the text; and a replanning unit, configured to replan the text content of the text when the update type is changed.
Optionally, the replanning unit includes at least one of: a first planning subunit, configured to perform document planning on the text content of the text, where the document planning includes at least one of: determining the content to be generated and determining the document structure; a second planning subunit, configured to perform micro-planning on the text content of the text, where the micro-planning includes at least one of: sentence generation, sentence linking, paragraph generation and title generation; and a third planning subunit, configured to perform surface-layer rewriting on the text content of the text, where the surface-layer rewriting includes at least one of: retouching and matching a format drawing.
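The three planning stages resemble a conventional text-generation pipeline, and can be sketched as three functions applied in sequence. The input format (a topic plus a list of short fact strings), the linking phrase, and all function names are assumptions made for this sketch; the embodiment does not prescribe them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DocumentPlan:
    topic: str
    sections: List[List[str]]  # each section is an ordered list of facts


def document_planning(topic: str, facts: List[str], facts_per_section: int = 2) -> DocumentPlan:
    # Determine the content to generate (drop empty facts) and the document
    # structure (group the remaining facts into sections).
    content = [fact.strip() for fact in facts if fact.strip()]
    sections = [content[i:i + facts_per_section]
                for i in range(0, len(content), facts_per_section)]
    return DocumentPlan(topic=topic, sections=sections)


def micro_planning(plan: DocumentPlan) -> Tuple[str, List[str]]:
    # Title generation, sentence generation, sentence linking and
    # paragraph generation (one paragraph per planned section).
    title = f"update on {plan.topic}"
    paragraphs = []
    for section in plan.sections:
        sentences = [fact.rstrip(".") for fact in section]
        paragraphs.append("; in addition, ".join(sentences))
    return title, paragraphs


def surface_rewriting(title: str, paragraphs: List[str]) -> str:
    # Retouching: capitalize each paragraph and close it with a period.
    polished = [p[0].upper() + p[1:] + "." for p in paragraphs if p]
    return title.title() + "\n\n" + "\n\n".join(polished)


def replan(topic: str, facts: List[str]) -> str:
    title, paragraphs = micro_planning(document_planning(topic, facts))
    return surface_rewriting(title, paragraphs)
```

Separating the stages this way lets any one of them be replaced independently, which matches the "at least one of" wording of the replanning unit.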
Example 3
According to another aspect of the embodiments of the present invention, a computer-readable storage medium is also provided. The computer-readable storage medium includes a stored program, and when the program runs, it controls the device on which the computer-readable storage medium is located to execute any one of the text processing methods described above.
Optionally, in this embodiment, the computer-readable storage medium may be located in any one of a group of computer terminals in a computer network and/or in any one of a group of mobile terminals, and the computer-readable storage medium includes a stored program.
Optionally, the program, when executed, controls the device on which the computer-readable storage medium is located to perform the following functions: obtaining a text; and inputting the text into a topic classification model, which outputs a classification result corresponding to the text, where the topic classification model is obtained through machine learning training using multiple groups of training data, and each group of training data includes a text and the classification result corresponding to that text.
Example 4
According to another aspect of the embodiments of the present invention, a processor is also provided. The processor is configured to run a program, and the program, when run, executes any one of the text processing methods described above.
An embodiment of the present invention provides a device that includes a processor, a memory, and a program stored in the memory and executable on the processor. When executing the program, the processor implements the following steps: obtaining a text; and inputting the text into a topic classification model, which outputs a classification result corresponding to the text, where the topic classification model is obtained through machine learning training using multiple groups of training data, and each group of training data includes a text and the classification result corresponding to that text.
The present invention also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: obtaining a text; and inputting the text into a topic classification model, which outputs a classification result corresponding to the text, where the topic classification model is obtained through machine learning training using multiple groups of training data, and each group of training data includes a text and the classification result corresponding to that text.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and various other media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. A method of text processing, comprising:
acquiring a text;
inputting the text into a topic classification model, and outputting a classification result corresponding to the text by the topic classification model, wherein the topic classification model is obtained through machine learning training using multiple groups of training data, and each group of training data comprises: a text and the classification result corresponding to that text.
2. The method of claim 1, wherein the topic classification model comprises at least one of: a section/industry classification model, a topic type model and a customized model, and wherein inputting the text into the topic classification model and outputting the classification result corresponding to the text by the topic classification model comprises:
when the topic classification model is the section/industry classification model, identifying a news section/industry corresponding to the text according to the section/industry classification model;
when the topic classification model is the topic type model, identifying a subdivided topic corresponding to the text according to the topic type model;
and when the topic classification model is the customized model, identifying a customized topic corresponding to the text according to the customized model.
3. The method of claim 1, further comprising:
analyzing the text content of the text to obtain an analysis processing result, wherein the analysis processing comprises at least one of the following steps: content representation, weight calculation and content selection;
and organizing content according to the analysis processing result to obtain the abstract content of the text.
4. The method of claim 1, further comprising:
performing article modification on the text content of the text, wherein the performing article modification on the text content of the text comprises: adjusting the text content of the text according to different analysis angles, wherein the analysis angles include at least one of: language accuracy, chapter structure, theme.
5. The method of claim 1, wherein adjusting the text content of the text according to different analysis angles comprises:
under the condition that the analysis angle is language accuracy, adjusting the text content of the text based on wrongly written character correction, grammar detection, idiom detection and literacy detection;
under the condition that the analysis angle is a chapter structure, adjusting the structure and viewpoint of the text content of the text;
and under the condition that the analysis angle is a theme, identifying the theme of the text content of the text.
6. The method of claim 1, further comprising:
performing article rewriting on the text content of the text, wherein performing article rewriting on the text content of the text comprises: determining an update type of the text content of the text; and when the update type is changed, replanning the text content of the text.
7. The method of claim 6, wherein replanning the textual content of the text comprises at least one of:
performing document planning on the text content of the text, wherein the document planning comprises at least one of the following: determining generated content and determining a document structure;
performing micro-planning on the text content of the text, wherein the micro-planning comprises at least one of: sentence generation, sentence linking, paragraph generation and title generation;
performing surface-layer rewriting on the text content of the text, wherein the surface-layer rewriting comprises at least one of: retouching and matching a format drawing.
8. A text processing apparatus, comprising:
an obtaining module, configured to obtain a text;
a first processing module, configured to input the text into a topic classification model and output a classification result corresponding to the text by the topic classification model, wherein the topic classification model is obtained through machine learning training using multiple groups of training data, and each group of training data comprises: a text and the classification result corresponding to that text; the topic classification model comprises at least one of: a section/industry classification model, a topic type model and a customized model.
9. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus on which the computer-readable storage medium is located to perform the text processing method according to any one of claims 1 to 7.
10. A processor, characterized in that the processor is configured to run a program, wherein the program is configured to execute the text processing method according to any one of claims 1 to 7 when running.
CN202011625158.XA 2020-12-30 2020-12-30 Text processing method and device, computer readable storage medium and processor Pending CN112667815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011625158.XA CN112667815A (en) 2020-12-30 2020-12-30 Text processing method and device, computer readable storage medium and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011625158.XA CN112667815A (en) 2020-12-30 2020-12-30 Text processing method and device, computer readable storage medium and processor

Publications (1)

Publication Number Publication Date
CN112667815A true CN112667815A (en) 2021-04-16

Family

ID=75412587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011625158.XA Pending CN112667815A (en) 2020-12-30 2020-12-30 Text processing method and device, computer readable storage medium and processor

Country Status (1)

Country Link
CN (1) CN112667815A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012027679A (en) * 2010-07-23 2012-02-09 Mitsubishi Electric Information Systems Corp Text creating system and text creating program
US20180336175A1 (en) * 2017-05-17 2018-11-22 Media Gazelle Inc. Method and System for Semantically Generating and Digitally Publishing Articles
CN108563620A (en) * 2018-04-13 2018-09-21 上海财梵泰传媒科技有限公司 The automatic writing method of text and system
CN109446505A (en) * 2018-10-31 2019-03-08 广东小天才科技有限公司 Model essay generation method and system
CN111125354A (en) * 2018-10-31 2020-05-08 北京国双科技有限公司 Text classification method and device
CN110119786A (en) * 2019-05-20 2019-08-13 北京奇艺世纪科技有限公司 Text topic classification method and device
CN110750637A (en) * 2019-08-15 2020-02-04 中国平安财产保险股份有限公司 Text abstract extraction method and device, computer equipment and storage medium
CN111274776A (en) * 2020-01-21 2020-06-12 中国搜索信息科技股份有限公司 Article generation method based on keywords
CN111813936A (en) * 2020-06-28 2020-10-23 深圳壹账通智能科技有限公司 News information presentation method based on deep learning and related equipment
CN111883136A (en) * 2020-07-30 2020-11-03 潘忠鸿 Rapid writing method and device based on artificial intelligence

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988978A (en) * 2021-04-27 2021-06-18 河南金明源信息技术有限公司 Case trend analysis system in key field of public welfare litigation
CN112988978B (en) * 2021-04-27 2024-03-26 河南金明源信息技术有限公司 Case trend analysis system in important field of public service litigation
CN113688206A (en) * 2021-08-25 2021-11-23 平安国际智慧城市科技股份有限公司 Text recognition-based trend analysis method, device, equipment and medium
CN114298012A (en) * 2021-12-31 2022-04-08 中国电子科技集团公司电子科学研究院 Optimization method for generating long text scientific and technological information model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination