CN110750637B - Text abstract extraction method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN110750637B
CN110750637B (application CN201910753710.4A)
Authority
CN
China
Prior art keywords
text
processed
training
category
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910753710.4A
Other languages
Chinese (zh)
Other versions
CN110750637A (en)
Inventor
张思亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN201910753710.4A
Publication of CN110750637A
Application granted
Publication of CN110750637B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/35 Clustering; Classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text abstract extraction method, an apparatus, computer equipment and a storage medium. The method comprises the following steps: processing a text to be processed with a target text classification model obtained by pre-training, to obtain the category of the text to be processed; performing the following loop on the text to be processed until every sentence in it has been deleted once: randomly deleting one not-yet-deleted sentence from the text to be processed to obtain a residual text; processing the residual text with the target text classification model to obtain the category of the residual text; judging whether the category of the residual text is the same as that of the text to be processed and, if not, restoring the deleted sentence to the text to be processed; and taking the residual text obtained after the loop ends as the target text abstract. The method derives the abstract from the overall semantics of the text and improves the accuracy of text abstract extraction.

Description

Text abstract extraction method, device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a text abstract extraction method, an apparatus, computer equipment and a storage medium.
Background
A text abstract is a short, coherent passage that reflects the central content of a text and helps readers save time when facing large volumes of material. Automatic text summarization uses a computer to analyze lengthy text through a series of text-processing techniques, extract its main ideas and generate a brief, general summary, helping users locate the content they want.
Automatic text summarization is a research hotspot in the field of natural language processing and, according to how the summary content is produced, is divided into extractive summarization and generative (abstractive) summarization. Because generative techniques are still immature, industry generally uses extractive methods: the text is preprocessed (word segmentation, stop-word removal, etc.), a text matrix is built with the TF-IDF algorithm, sentence scores are computed, and the highest-scoring sentences are selected as the abstract. However, such methods stay at the literal level and ignore the semantic relations of the context; the extracted abstract lacks coherence, cannot capture key content from context, and fails to meet user needs.
Disclosure of Invention
To overcome the above defects of the prior art, the invention provides a text abstract extraction method, a text abstract extraction apparatus, computer equipment and a storage medium, aiming to solve the problem that the prior art does not exploit the semantic relations of the context when extracting an abstract.
To achieve the above object, the invention provides a text abstract extraction method comprising the following steps:
processing a text to be processed with a target text classification model obtained by pre-training, to obtain the category of the text to be processed;
performing the following loop on the text to be processed until every sentence in it has been deleted once:
randomly deleting one not-yet-deleted sentence from the text to be processed to obtain a residual text;
processing the residual text with the target text classification model to obtain the category of the residual text;
judging whether the category of the residual text is the same as that of the text to be processed and, if not, restoring the deleted sentence to the text to be processed;
and taking the residual text obtained after the loop ends as the target text abstract.
Further, the target text classification model is obtained through the following training steps:
collecting a sample data set, wherein the sample data set comprises a plurality of training texts, each labeled with its corresponding category;
dividing the sample data set into a training set and a validation set at a predetermined ratio;
training the target text classification model based on the training set;
and validating the target text classification model based on the validation set; if the validation passes, the training ends.
Further, the text to be processed and the training texts are complaint texts.
Further, the categories of the text to be processed and the training texts include timeliness not met, price objection, service attitude, and the like.
Further, the target text classification model is a TextCNN model, which includes an embedding layer, a convolutional layer, a pooling layer, a fully connected layer and a Softmax classification layer.
Further, processing the text to be processed with the pre-trained target text classification model comprises the following steps:
vectorizing the text to be processed through the embedding layer to obtain the word vectors of the text to be processed;
convolving the word vectors of the text to be processed through the convolutional layer to extract the features of the text to be processed;
pooling the features of the text to be processed through the pooling layer to obtain the reduced-dimension features of the text to be processed;
passing the reduced-dimension features of the text to be processed to the Softmax classification layer through the fully connected layer;
and processing the reduced-dimension features of the text to be processed through the Softmax classification layer to obtain the category of the text to be processed.
Further, the text abstract extraction method also comprises: preprocessing the text to be processed before it is processed with the pre-trained target text classification model.
To achieve the above object, the invention further provides a text abstract extraction apparatus, comprising:
a category acquisition module, configured to process a text to be processed with a pre-trained target text classification model to obtain the category of the text to be processed;
a loop pruning module, configured to perform the following loop on the text to be processed until every sentence in it has been deleted once:
randomly deleting one not-yet-deleted sentence from the text to be processed to obtain a residual text;
processing the residual text with the target text classification model to obtain the category of the residual text;
judging whether the category of the residual text is the same as that of the text to be processed and, if not, restoring the deleted sentence to the text to be processed;
and an abstract acquisition module, configured to take the residual text obtained after the loop ends as the target text abstract.
Further, the text abstract extraction apparatus also comprises a model training module for training the target text classification model, the model training module comprising:
a sample data set collection unit, configured to collect a sample data set comprising a plurality of training texts, each labeled with its corresponding category;
a sample data set division unit, configured to divide the sample data set into a training set and a validation set at a predetermined ratio;
a training unit, configured to train the target text classification model based on the training set;
and a validation unit, configured to validate the target text classification model based on the validation set; if the validation passes, the training ends.
Further, the text to be processed and the training texts are complaint texts.
Further, the categories of the text to be processed and the training texts include timeliness not met, price objection, service attitude, and the like.
Further, the target text classification model is a TextCNN model, which includes an embedding layer, a convolutional layer, a pooling layer, a fully connected layer and a Softmax classification layer.
Further, the category acquisition module is specifically configured to:
vectorize the text to be processed through the embedding layer to obtain the word vectors of the text to be processed;
convolve the word vectors of the text to be processed through the convolutional layer to extract the features of the text to be processed;
pool the features of the text to be processed through the pooling layer to obtain the reduced-dimension features of the text to be processed;
pass the reduced-dimension features of the text to be processed to the Softmax classification layer through the fully connected layer;
and process the reduced-dimension features of the text to be processed through the Softmax classification layer to obtain the category of the text to be processed.
Further, the text abstract extraction apparatus also comprises a preprocessing module, configured to preprocess the text to be processed before it is processed with the pre-trained target text classification model.
To achieve the above object, the invention also provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the aforementioned method when executing the computer program.
To achieve the above object, the invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the aforementioned method.
The above technical scheme brings the following beneficial effects:
The invention deletes sentences from the text to be processed in a random loop and checks whether the text category after each deletion is the same as before it. If it is, the deleted sentence contributes little to the semantics of the text and stays deleted; if not, the deleted sentence contributes much to the semantics of the text and is restored. When every sentence in the text has been tried, the remaining text is the abstract. Because this process is built on a classification model, and the classification model is trained on semantics, the abstract obtained by the invention reflects the overall semantics of the text, i.e., it truly summarizes the whole text at the semantic level. In addition, sentences are deleted in random order, which ensures that key semantics are not affected by sentence position, improving the accuracy of the generated abstract while keeping the text-processing speed acceptable.
Drawings
FIG. 1 is a flow chart of one embodiment of a text summarization method of the present invention;
FIG. 2 is a block diagram of one embodiment of a text summarization apparatus of the present invention;
FIG. 3 is a hardware architecture diagram of one embodiment of a computer device of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and embodiments, so that its objects, technical solutions and advantages become clearer. It should be understood that the specific embodiments described here are for illustration only and are not intended to limit the scope of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of the invention.
Example 1
As shown in FIG. 1, the invention provides a text abstract extraction method comprising the following steps:
S0: Train a target text classification model on a collected sample data set. The training process comprises:
S01: Collect a sample data set comprising a plurality of training texts, each labeled with its corresponding category. In this embodiment, the training texts may be complaint texts. For example, if a vehicle insurance company needs to quickly obtain a complaint abstract from a customer's complaint text, the collected sample data set should contain complaint texts labeled with different categories, including but not limited to timeliness not met, price objection and service attitude. It should be appreciated that, beyond complaint texts, corresponding sample data sets may be collected for other application scenarios as needed.
S02: Divide the collected sample data set into a training set and a validation set at a predetermined ratio, e.g., 80% for the training set and 20% for the validation set.
S03: Train the target text classification model on the training set using a gradient descent algorithm. In the invention, the target text classification model is preferably the commonly used TextCNN model, a convolutional neural network for text classification that includes an embedding layer, a convolutional layer, a pooling layer, a fully connected layer and a Softmax classification layer.
S04: On the validation set, check whether the accuracy, precision, recall, F1 score and other metrics of the trained target text classification model meet preset conditions. If so, training ends; otherwise, increase the number of training texts in the training set and retrain the model.
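The bookkeeping in steps S02 and S04, an 80/20 split followed by a metric gate on the validation set, can be sketched as follows. This is a minimal illustration: the classifier itself, the seed and the sample data are stand-ins, not the patent's trained model.

```python
import random


def split_dataset(samples, train_ratio=0.8, seed=42):
    """Step S02: shuffle labeled samples and split them at a predetermined
    ratio into a training set and a validation set (80/20 by default)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Step S04: accuracy, precision, recall and F1 for one category,
    computed one-vs-rest from validation-set predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

If the returned metrics fall below the preset thresholds, more training texts are collected and the model is retrained, as step S04 describes.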
S1: Obtain the text to be processed, which may be a complaint text, such as the complaint text of a vehicle insurance customer.
S2: Process the text to be processed with the trained target text classification model (the TextCNN model) to obtain its category, through the following steps:
S21: Vectorize the text to be processed through the embedding layer of the TextCNN model to obtain the word vectors of the text to be processed;
S22: Convolve the word vectors of the text to be processed through the convolutional layer of the TextCNN model to extract the features of the text to be processed;
S23: Pool the features of the text to be processed through the pooling layer of the TextCNN model to obtain the reduced-dimension features of the text to be processed;
S24: Pass the reduced-dimension features of the text to be processed to the Softmax classification layer through the fully connected layer of the TextCNN model;
S25: Compute, through the Softmax classification layer of the TextCNN model, the probability of each category label from the reduced-dimension features of the text to be processed, and take the label with the highest probability as the category of the text to be processed.
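The forward pass of steps S21 to S25 can be sketched in NumPy as below. All dimensions and weights are illustrative stand-ins (a real TextCNN is trained with gradient descent as in step S03 and typically uses several kernel widths and a nonlinearity); the sketch only shows how embed, convolve, pool, connect and classify chain together.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, EMB, FILTERS, KERNEL, CLASSES = 100, 8, 4, 3, 3  # toy sizes

W_emb = rng.normal(size=(VOCAB, EMB))             # embedding layer weights
W_conv = rng.normal(size=(FILTERS, KERNEL, EMB))  # convolution filters
W_fc = rng.normal(size=(FILTERS, CLASSES))        # fully connected layer weights


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def textcnn_forward(token_ids):
    """Forward pass mirroring S21-S25: embed, convolve, max-pool, FC, softmax."""
    x = W_emb[token_ids]                           # S21: (seq_len, EMB) word vectors
    seq_len = len(token_ids)
    feats = np.empty((FILTERS, seq_len - KERNEL + 1))
    for f in range(FILTERS):                       # S22: 1-D convolution over the sequence
        for i in range(seq_len - KERNEL + 1):
            feats[f, i] = np.sum(x[i:i + KERNEL] * W_conv[f])
    pooled = feats.max(axis=1)                     # S23: max-pooling reduces dimension
    logits = pooled @ W_fc                         # S24: fully connected layer
    probs = softmax(logits)                        # S25: probability per category label
    return int(np.argmax(probs)), probs            # highest-probability label wins
```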
S3: Sentence splitting is performed on the text to be processed. Specifically, the invention may split the text on sentence-level punctuation marks such as the period "。", the exclamation mark "！" and the question mark "？". For example, suppose the text to be processed is the following complaint: "I applied for non-accident rescue, and there was only one phone call in between telling me it would take two hours, but in the end more than four hours passed; I am dissatisfied and lodging a complaint. The customer called to say they no longer need our rescue and will arrange rescue themselves. The customer contacted the Ann-Lian rescue hotline 028-65200801 several times with no answer, and asks the department for an explanation. Please have the organization handle this and reply as soon as possible, thank you!" After sentence splitting, the following four sentences are obtained: sentence 1 is "I applied for non-accident rescue, and there was only one phone call in between telling me it would take two hours, but in the end more than four hours passed; I am dissatisfied and lodging a complaint."; sentence 2 is "The customer called to say they no longer need our rescue and will arrange rescue themselves."; sentence 3 is "The customer contacted the Ann-Lian rescue hotline 028-65200801 several times with no answer, and asks the department for an explanation."; sentence 4 is "Please have the organization handle this and reply as soon as possible, thank you!".
After sentence splitting is completed, a deletion flag bit is set for each sentence, with initial value 0; a flag bit of 0 indicates that the corresponding sentence has not been deleted.
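The sentence splitting of step S3 and the per-sentence flag bits can be sketched with a regular expression over the sentence-ending punctuation named above (handling of quotation marks and ellipses is omitted for brevity):

```python
import re

# Split *after* sentence-level punctuation, keeping the mark with its sentence.
SENT_END = re.compile(r"(?<=[。！？.!?])")


def split_sentences(text):
    """Step S3: split the text to be processed into sentences."""
    return [s for s in SENT_END.split(text) if s.strip()]


sentences = split_sentences("第一句。第二句！第三句？")
deleted = [0] * len(sentences)  # deletion flag bits, initialized to 0 (not deleted)
```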
S4: Randomly select one not-yet-deleted sentence from the text to be processed and delete it, obtaining a residual text. After the sentence is deleted, it is marked as deleted so that it is not selected again when this step repeats. In this embodiment, marking a sentence as deleted means setting its deletion flag bit to 1; a flag bit of 1 indicates that the corresponding sentence has been deleted.
S5: Process the residual text with the target text classification model, i.e., the TextCNN model, to obtain the category of the residual text. The specific flow is as follows:
S51: Vectorize the residual text through the embedding layer of the TextCNN model to obtain the word vectors of the residual text;
S52: Convolve the word vectors of the residual text through the convolutional layer of the TextCNN model to extract the features of the residual text;
S53: Pool the features of the residual text through the pooling layer of the TextCNN model to obtain the reduced-dimension features of the residual text;
S54: Pass the reduced-dimension features of the residual text to the Softmax classification layer through the fully connected layer of the TextCNN model;
S55: Compute, through the Softmax classification layer of the TextCNN model, the probability of each category label for the residual text, and take the label with the highest probability as the category of the residual text.
S6: Judge whether the category of the residual text obtained by deleting the sentence is the same as the category of the text to be processed. If so, the deleted sentence is unimportant to the overall semantics of the text to be processed, i.e., it should be excluded from the target text abstract, and step S8 is executed; if not, step S7 is executed.
S7: If the category of the residual text differs from that of the text to be processed, the deleted sentence is important to the overall semantics of the text, i.e., it should not be excluded from the target text abstract. Therefore, the deleted sentence is restored to the text to be processed, and step S8 is executed.
S8: Judge whether every sentence in the text to be processed has been deleted once, i.e., whether all deletion flag bits are 1. If so, execute step S9; otherwise, return to step S4 for the next iteration.
S9: Take the residual text obtained after every sentence in the text to be processed has been tried as the target text abstract to be extracted.
One application scenario of the invention is as follows. Suppose a text to be processed X contains four sentences A, B, C and D, and the category obtained after processing X with the target text classification model is M. Using the method, sentence D is first deleted at random; if the category of the text with D removed is still M, D is unimportant to X and may be deleted, leaving a residual text of sentences A, B and C. Sentence C is then deleted at random from the residual text; if the category without C is no longer M, C is important to X and cannot be deleted, so C is restored and the residual text still contains A, B and C. The loop then continues to randomly delete not-yet-tried sentences; since C has already been tried, it is not selected again. The residual text left after every sentence in X has been tried is taken as the abstract. Taking the complaint text given in step S3 as an example, suppose the category obtained from the TextCNN model is "timeliness not met"; the category changes after deleting sentence 1, but remains "timeliness not met" after deleting sentence 2, 3 or 4. This indicates that sentence 1 is critical to the complaint text while sentences 2 to 4 are not and are removed from its abstract, so the abstract of the complaint text is sentence 1.
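The loop of steps S4 to S9, including the A/B/C/D scenario above, can be sketched as follows. Here `classify` is a stand-in for the trained TextCNN classifier; any function mapping a text to a category label works.

```python
import random


def extract_summary(sentences, classify, seed=0):
    """Steps S4-S9: try deleting each sentence in random order and keep it
    only if its removal changes the predicted category of the text."""
    rng = random.Random(seed)
    base = classify("".join(sentences))     # category of the full text (step S2)
    kept = [True] * len(sentences)          # which sentences remain in the text
    order = list(range(len(sentences)))
    rng.shuffle(order)                      # S4: random deletion order
    for i in order:                         # each sentence is tried exactly once (S8)
        kept[i] = False                     # tentatively delete sentence i
        remaining = "".join(s for s, k in zip(sentences, kept) if k)
        if classify(remaining) != base:     # S6: did the category change?
            kept[i] = True                  # S7: important sentence, restore it
    # S9: the surviving sentences form the target text abstract
    return "".join(s for s, k in zip(sentences, kept) if k)
```

For example, with a toy classifier that labels any text containing "A" as category "M", only sentence "A。" changes the category when removed, so it alone survives as the abstract regardless of the random deletion order.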
It can thus be seen that the invention deletes sentences from the text to be processed in a random loop and checks whether the text category after each deletion matches the category before it: if it does, the deleted sentence contributes little to the semantics of the text and stays deleted; otherwise it contributes much and is restored. When every sentence has been tried, the abstract of the text is obtained. Because the invention is built on a classification model, and the classification model is trained on semantics, the resulting abstract reflects the overall semantics of the text, i.e., it truly summarizes the whole text at the semantic level, improving the accuracy of the generated abstract while keeping the text-processing speed acceptable.
As a preferred scheme of this embodiment, the invention also preprocesses the obtained text to be processed before executing step S2, in particular by stop-word filtering: detecting whether any word in the text to be processed matches a stop word in a preset stop-word list and, if so, deleting the matched word. It should be understood that stop words are generally function words without concrete meaning, such as the Chinese particles "地", "得" and "了".
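The stop-word filtering described above is a simple set-membership check. In the sketch below the stop-word list is a tiny illustrative stand-in; a real system would load a full preset table.

```python
# Hypothetical miniature stop-word list (illustrative only).
STOP_WORDS = {"的", "地", "得", "了", "着"}


def filter_stop_words(tokens):
    """Delete every token that matches an entry in the preset stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]
```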
It should be noted that, for simplicity of description, this embodiment is presented as a series of actions, but those skilled in the art will understand that the invention is not limited by the order of actions, as some steps may be performed in another order or simultaneously. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily all required by the invention.
Example two
As shown in FIG. 2, this embodiment provides a text abstract extraction apparatus 10, comprising:
a model training module 11, configured to train the target text classification model;
a category acquisition module 12, configured to process a text to be processed with the pre-trained target text classification model to obtain the category of the text to be processed, where the text to be processed may be a complaint text;
a loop pruning module 13, configured to perform the following loop on the text to be processed until every sentence in it has been deleted once:
randomly deleting one not-yet-deleted sentence from the text to be processed to obtain a residual text;
processing the residual text with the target text classification model to obtain the category of the residual text;
judging whether the category of the residual text is the same as that of the text to be processed and, if not, restoring the deleted sentence to the text to be processed;
and an abstract acquisition module 14, configured to take the residual text obtained after the loop ends as the target text abstract.
In this embodiment, the model training module 11 comprises:
a sample data set collection unit, configured to collect a sample data set comprising a plurality of training texts, each labeled with its corresponding category, where the training texts may be complaint texts;
a sample data set division unit, configured to divide the sample data set into a training set and a validation set at a predetermined ratio;
a training unit, configured to train the target text classification model based on the training set;
and a validation unit, configured to validate the target text classification model based on the validation set; if the validation passes, training ends, and if not, the number of training texts in the training set is increased and the model is retrained.
In this embodiment, the target text classification model is a TextCNN model, which includes an embedding layer, a convolutional layer, a pooling layer, a fully connected layer and a Softmax classification layer.
In this embodiment, the category acquisition module 12 is specifically configured to:
vectorize the text to be processed through the embedding layer of the TextCNN model to obtain the word vectors of the text to be processed;
convolve the word vectors of the text to be processed through the convolutional layer of the TextCNN model to extract the features of the text to be processed;
pool the features of the text to be processed through the pooling layer of the TextCNN model to obtain the reduced-dimension features of the text to be processed;
pass the reduced-dimension features of the text to be processed to the Softmax classification layer through the fully connected layer of the TextCNN model;
and compute, through the Softmax classification layer of the TextCNN model, the probability of each category label from the reduced-dimension features of the text to be processed, taking the label with the highest probability as the category of the text to be processed.
In this embodiment, the text abstract extraction apparatus 10 may further comprise a preprocessing module, configured to preprocess the text to be processed before it is processed with the pre-trained target text classification model, in particular by stop-word filtering: detecting whether any word in the text to be processed matches a stop word in a preset stop-word list and, if so, deleting the matched word. It should be understood that stop words are generally function words without concrete meaning, such as the Chinese particles "地", "得" and "了".
Those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments and that the modules referred to are not necessarily essential to the invention.
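The sentence-deletion loop performed by the loop pruning processing module (set out in the claims below) can be sketched as follows. The classifier here is a hypothetical stand-in for the trained TextCNN model, and the function and variable names are illustrative, not taken from the patent.

```python
import random

random.seed(1)

def extract_summary(sentences, classify):
    """Loop pruning: try deleting each sentence once, in random order, and
    keep a deletion only if the remaining text still has the same category."""
    original_category = classify(sentences)
    kept = list(sentences)
    # Visit every sentence exactly once, in random order.
    for sentence in random.sample(sentences, len(sentences)):
        trial = [s for s in kept if s is not sentence]
        if classify(trial) == original_category:
            kept = trial          # deletion preserved the category: commit it
        # otherwise the sentence is "restored", i.e. kept stays unchanged
    return kept                   # residual text = target text abstract

# Hypothetical classifier standing in for the TextCNN model: the category is
# "complaint" if any sentence mentions a refund, else "other".
def toy_classify(sentences):
    return "complaint" if any("refund" in s for s in sentences) else "other"

doc = ["I called twice.", "No one answered.", "I demand a refund."]
summary = extract_summary(doc, toy_classify)
print(summary)  # only the sentence the category depends on survives
```

With this toy classifier the two sentences that do not affect the category are pruned, leaving the category-bearing sentence as the abstract, which is exactly the semantics the claims describe.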
Example III
The invention also provides a computer device capable of executing programs, such as a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of a plurality of servers). The computer device 20 of this embodiment includes at least, but is not limited to, a memory 21 and a processor 22 that can be communicatively connected to each other via a system bus, as shown in fig. 3. It should be noted that fig. 3 shows only a computer device 20 having the components 21-22, but it should be understood that not all of the illustrated components need be implemented; more or fewer components may be implemented instead.
In this embodiment, the memory 21 (i.e., a readable storage medium) includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 20, such as a hard disk or internal memory of the computer device 20. In other embodiments, the memory 21 may instead be an external storage device of the computer device 20, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device 20. Of course, the memory 21 may also include both an internal storage unit and an external storage device of the computer device 20. In this embodiment, the memory 21 is typically used to store the operating system and the various types of application software installed on the computer device 20, such as the program code of the text digest extraction apparatus 10 of the second embodiment. Further, the memory 21 may be used to temporarily store various types of data that have been output or are to be output.
The processor 22 may in some embodiments be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is typically used to control the overall operation of the computer device 20. In this embodiment, the processor 22 is configured to run the program code stored in the memory 21 or to process data, for example to run the text digest extraction apparatus 10, thereby implementing the text digest extraction method of the first embodiment.
Example IV
The present invention also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, or an App application store, on which a computer program is stored that performs a corresponding function when executed by a processor. The computer-readable storage medium of this embodiment is used to store the text digest extraction apparatus 10, which, when executed by a processor, implements the text digest extraction method of the first embodiment.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware alone, although in many cases the former is the preferred implementation.
The foregoing description covers only preferred embodiments of the present invention and is not intended to limit its scope; any equivalent structure or equivalent process transformation made using the contents of the present specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, is likewise encompassed within the scope of the present invention.

Claims (10)

1. A text abstract extraction method, characterized by comprising the following steps:
processing a text to be processed by using a target text classification model obtained in advance based on semantic training, to obtain the category of the text to be processed;
performing the following loop processing on the text to be processed until all sentences in the text to be processed have been deleted:
randomly deleting one not-yet-deleted sentence from the text to be processed to obtain a residual text;
processing the residual text by using the target text classification model to obtain the category of the residual text;
judging whether the category of the residual text is the same as the category of the text to be processed, and if not, restoring the deleted sentence to the text to be processed;
and taking the residual text obtained after the loop processing ends as a target text abstract.
2. The text abstract extraction method of claim 1, wherein the target text classification model is trained by:
collecting a sample data set, wherein the sample data set comprises a plurality of training texts and each training text is marked with its corresponding category;
dividing the sample data set into a training set and a verification set according to a preset proportion;
training the target text classification model on the training set;
and verifying the target text classification model on the verification set, and if the verification passes, ending the training.
3. The text abstract extraction method of claim 2, wherein the text to be processed and the training texts are complaint texts.
4. The text abstract extraction method of claim 3, wherein the categories of the text to be processed and the training texts include age-out, price objection, and service attitude.
5. The text abstract extraction method of claim 1, wherein the target text classification model is a TextCNN model, and the TextCNN model comprises an embedding layer, a convolution layer, a pooling layer, a fully connected layer, and a Softmax classification layer.
6. The text abstract extraction method of claim 5, wherein processing the text to be processed using the target text classification model obtained in advance based on semantic training comprises the steps of:
vectorizing the text to be processed through the embedding layer to obtain word vectors of the text to be processed;
performing convolution processing on the word vectors of the text to be processed through the convolution layer to extract features of the text to be processed;
pooling the features of the text to be processed through the pooling layer to obtain dimension-reduced features of the text to be processed;
transmitting the dimension-reduced features of the text to be processed to the Softmax classification layer through the fully connected layer;
and processing the dimension-reduced features of the text to be processed through the Softmax classification layer to obtain the category of the text to be processed.
7. The text abstract extraction method of claim 1, further comprising: preprocessing the text to be processed before processing it with the pre-trained target text classification model.
8. A text digest extraction apparatus, comprising:
a category obtaining module, configured to process a text to be processed by using a target text classification model obtained in advance based on semantic training, to obtain the category of the text to be processed;
a loop pruning processing module, configured to perform the following loop processing on the text to be processed until all sentences in the text to be processed have been deleted:
randomly deleting one not-yet-deleted sentence from the text to be processed to obtain a residual text;
processing the residual text by using the target text classification model to obtain the category of the residual text;
judging whether the category of the residual text is the same as the category of the text to be processed, and if not, restoring the deleted sentence to the text to be processed;
and an abstract obtaining module, configured to take the residual text obtained after the loop processing ends as a target text abstract.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN201910753710.4A 2019-08-15 2019-08-15 Text abstract extraction method, device, computer equipment and storage medium Active CN110750637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910753710.4A CN110750637B (en) 2019-08-15 2019-08-15 Text abstract extraction method, device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN110750637A CN110750637A (en) 2020-02-04
CN110750637B true CN110750637B (en) 2024-05-24

Family

ID=69275839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910753710.4A Active CN110750637B (en) 2019-08-15 2019-08-15 Text abstract extraction method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110750637B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112667815A (en) * 2020-12-30 2021-04-16 北京捷通华声科技股份有限公司 Text processing method and device, computer readable storage medium and processor
CN113761175A (en) * 2021-02-01 2021-12-07 北京沃东天骏信息技术有限公司 Text processing method and device, electronic equipment and storage medium
CN113033216B (en) * 2021-03-03 2024-05-28 东软集团股份有限公司 Text preprocessing method and device, storage medium and electronic equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1040267A (en) * 1996-07-26 1998-02-13 Nec Corp Document summary viewer
CN101446940A (en) * 2007-11-27 2009-06-03 北京大学 Method and device of automatically generating a summary for document set
CN106133772A (en) * 2013-12-18 2016-11-16 Google Inc. Annotating videos with entities based on comment summaries
KR20170089369A (en) * 2016-01-26 2017-08-03 주식회사 마커 Method for automatic summarizing document by user learning
WO2018036555A1 (en) * 2016-08-25 2018-03-01 Tencent Technology (Shenzhen) Co., Ltd. Session processing method and apparatus
CN109376242A (en) * 2018-10-18 2019-02-22 西安工程大学 Text classification algorithm based on Recognition with Recurrent Neural Network variant and convolutional neural networks
CN109492091A (en) * 2018-09-28 2019-03-19 科大国创软件股份有限公司 A kind of complaint work order intelligent method for classifying based on convolutional neural networks
CN110069624A (en) * 2019-04-28 2019-07-30 北京小米智能科技有限公司 Text handling method and device
KR20190090944A (en) * 2018-01-26 2019-08-05 주식회사 두유비 System and method for machine learning to sort sentence importance and generating summary sentence based on keyword importance


Also Published As

Publication number Publication date
CN110750637A (en) 2020-02-04

Similar Documents

Publication Publication Date Title
CN110704633B Named entity recognition method, device, computer equipment and storage medium
CN110347835B (en) Text clustering method, electronic device and storage medium
CN110502608B (en) Man-machine conversation method and man-machine conversation device based on knowledge graph
CN110781276B (en) Text extraction method, device, equipment and storage medium
CN109815487B (en) Text quality inspection method, electronic device, computer equipment and storage medium
CN110362822B (en) Text labeling method, device, computer equipment and storage medium for model training
CN110750637B (en) Text abstract extraction method, device, computer equipment and storage medium
CN110750965B English text sequence labeling method, system and computer equipment
CN110334186B (en) Data query method and device, computer equipment and computer readable storage medium
CN111858843B (en) Text classification method and device
CN111783471B (en) Semantic recognition method, device, equipment and storage medium for natural language
CN110866115B (en) Sequence labeling method, system, computer equipment and computer readable storage medium
CN112052682A (en) Event entity joint extraction method and device, computer equipment and storage medium
CN112632278A (en) Labeling method, device, equipment and storage medium based on multi-label classification
CN110321426B (en) Digest extraction method and device and computer equipment
CN114238629A (en) Language processing method and device based on automatic prompt recommendation and terminal
CN112052305A (en) Information extraction method and device, computer equipment and readable storage medium
CN109461016B (en) Data scoring method, device, computer equipment and storage medium
CN111831920A (en) User demand analysis method and device, computer equipment and storage medium
CN111126056B (en) Method and device for identifying trigger words
CN113239702A (en) Intention recognition method and device and electronic equipment
CN114238602A (en) Dialogue analysis method, device, equipment and storage medium based on corpus matching
CN112581297B (en) Information pushing method and device based on artificial intelligence and computer equipment
CN110705258A (en) Text entity identification method and device
CN116166858A (en) Information recommendation method, device, equipment and storage medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant