CN110750637A - Text abstract extraction method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN110750637A
CN110750637A (application number CN201910753710.4A)
Authority
CN
China
Prior art keywords
text, processed, training, processing, residual
Prior art date
Legal status
Granted
Application number
CN201910753710.4A
Other languages
Chinese (zh)
Other versions
CN110750637B (en)
Inventor
张思亮
Current Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN201910753710.4A
Publication of CN110750637A
Application granted
Publication of CN110750637B
Active legal status
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/34: Browsing; Visualisation therefor
    • G06F 16/345: Summarisation for human users
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a text abstract extraction method and device, computer equipment and a storage medium, wherein the method comprises the following steps: processing a text to be processed by using a pre-trained target text classification model to obtain the category of the text to be processed; executing the following loop processing on the text to be processed until all sentences in the text to be processed have been deleted: randomly deleting one undeleted sentence from the text to be processed to obtain a residual text; processing the residual text by using the target text classification model to obtain the category of the residual text; judging whether the category of the residual text is the same as that of the text to be processed, and if not, restoring the deleted sentence to the text to be processed; and taking the residual text obtained after the loop processing as the target text abstract. By deriving the abstract from the overall semantics of the text, the invention improves the accuracy of text abstract extraction.

Description

Text abstract extraction method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of natural language processing, in particular to a text abstract extraction method, a text abstract extraction device, computer equipment and a storage medium.
Background
The abstract is a simple and coherent short text capable of reflecting the central content of a certain text, and can help people to shorten the reading time when reading a large amount of texts. The automatic text summarization technology is used for analyzing and processing a lengthy text by utilizing a series of text processing technologies through a computer, extracting main central ideas of the text, generating a brief and generalized summary, and helping a user to locate the content desired by the user.
The automatic text summarization technology is a research hotspot in the field of natural language processing, and is divided into extractive summarization and generative summarization according to how the summary content is produced. At present, the generative technology is not mature, and the industry commonly generates the abstract by an extractive method. However, such methods work at the surface level of the text: the semantic relations of the context are not utilized, the extracted abstract lacks coherence, key content cannot be selected according to the context, and user requirements cannot be met.
Disclosure of Invention
In view of the above deficiencies of the prior art, the present invention provides a method, an apparatus, a computer device and a storage medium for extracting a text abstract, so as to solve the problem that the prior art does not extract an abstract by using a context semantic relationship.
In order to achieve the above object, the present invention provides a text abstract extracting method, which comprises the following steps:
processing a text to be processed by using a target text classification model obtained by pre-training to obtain the category of the text to be processed;
and executing the following cyclic processing on the text to be processed until all sentences in the text to be processed are deleted:
randomly deleting a sentence which is not deleted from the text to be processed to obtain a residual text;
processing the residual texts by using the target text classification model to obtain the categories of the residual texts;
judging whether the type of the residual text is the same as that of the text to be processed or not, and if not, restoring the deleted sentences to the text to be processed;
and taking the residual text obtained after the loop processing as the target text abstract.
Further, the target text classification model is obtained by training through the following steps:
collecting a sample data set, wherein the sample data set comprises a plurality of training texts, and each training text is marked with a corresponding category;
dividing the sample data set into a training set and a verification set according to a preset proportion;
training to obtain the target text classification model based on the training set;
and verifying the target text classification model based on the verification set, and finishing training if the verification is passed.
Further, the text to be processed and the training text are complaint texts.
Further, the categories of the text to be processed and the training text include timeliness failure, price dispute, service attitude and the like.
Further, the target text classification model is a TextCNN model that includes an embedding layer, a convolutional layer, a pooling layer, a fully-connected layer, and a Softmax classification layer.
Further, the step of processing the text to be processed by using the pre-trained target text classification model is as follows:
vectorizing the text to be processed through the embedding layer to obtain a word vector of the text to be processed;
performing convolution processing on the word vectors of the text to be processed through the convolution layer to extract the features of the text to be processed;
performing pooling processing on the features of the text to be processed through the pooling layer to obtain the dimensionality reduction features of the text to be processed;
transmitting the dimensionality reduction features of the text to be processed to the Softmax classification layer through the full connection layer;
and processing the dimensionality reduction features of the text to be processed through the Softmax classification layer to obtain the category of the text to be processed.
Further, the text abstract extraction method further comprises the following steps: and preprocessing the text to be processed before processing the text to be processed by using a target text classification model obtained by pre-training.
In order to achieve the above object, the present invention further provides a text abstract extracting apparatus, including:
the category acquisition module is used for processing the text to be processed by utilizing a target text classification model obtained by pre-training to obtain the category of the text to be processed;
the cyclic deletion processing module is used for executing the following cyclic processing on the text to be processed until all sentences in the text to be processed are deleted:
randomly deleting a sentence which is not deleted from the text to be processed to obtain a residual text;
processing the residual texts by using the target text classification model to obtain the categories of the residual texts;
judging whether the type of the residual text is the same as that of the text to be processed or not, and if not, restoring the deleted sentences to the text to be processed;
and the abstract acquisition module is used for acquiring the residual text obtained after the circulation processing is finished as the abstract of the target text.
Further, the text abstract extracting device further comprises: a model training module for training the target text classification model, the model training module comprising:
the system comprises a sample data set acquisition unit, a data processing unit and a data processing unit, wherein the sample data set is used for acquiring a sample data set, the sample data set comprises a plurality of training texts, and each training text is labeled with a corresponding category label;
the sample data set dividing unit is used for dividing the sample data set into a training set and a verification set according to a preset proportion;
the training unit is used for training to obtain the target text classification model based on the training set;
and the verification unit is used for verifying the target text classification model based on the verification set, and if the verification passes, the training is finished.
Further, the text to be processed and the training text are complaint texts.
Further, the categories of the text to be processed and the training text include timeliness failure, price dispute, service attitude and the like.
Further, the target text classification model is a TextCNN model that includes an embedding layer, a convolutional layer, a pooling layer, a fully-connected layer, and a Softmax classification layer.
Further, the category acquisition module is specifically configured to:
vectorizing the text to be processed through the embedding layer to obtain a word vector of the text to be processed;
performing convolution processing on the word vectors of the text to be processed through the convolution layer to extract the features of the text to be processed;
performing pooling processing on the features of the text to be processed through the pooling layer to obtain the dimensionality reduction features of the text to be processed;
transmitting the dimensionality reduction features of the text to be processed to the Softmax classification layer through the full connection layer;
and processing the dimensionality reduction features of the text to be processed through the Softmax classification layer to obtain the category of the text to be processed.
Further, the text abstract extracting device further comprises: and the preprocessing module is used for preprocessing the text to be processed before processing the text to be processed by utilizing the target text classification model obtained by pre-training.
In order to achieve the above object, the present invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the aforementioned method when executing the computer program.
In order to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the aforementioned method.
By adopting the technical scheme, the invention has the following beneficial effects:
the method comprises the steps of deleting sentences in a text to be processed through random circulation, calculating whether the text type after the sentences are deleted is the same as that before the sentences are deleted, if so, indicating that the deleted sentences have small semantic contribution to the text and should be deleted, otherwise, indicating that the deleted sentences have large semantic contribution to the text and should not be deleted, recovering the deleted sentences in the text, and obtaining the abstract of the text when all the sentences in the text are deleted. Since the above process is implemented based on the classification model which is trained based on the semantic meaning, the abstract obtained based on the present invention is an abstract combined with the overall semantic meaning of the text, i.e., the abstract can truly outline the overall information of the text from the semantic aspect. In addition, the invention randomly deletes sentences when deleting sentences, ensures that key semantics are not influenced by sequence, and improves the accuracy of text abstract generation while giving consideration to the performance of text processing speed.
Drawings
FIG. 1 is a flow chart of one embodiment of the text abstract extraction method of the present invention;
FIG. 2 is a block diagram of one embodiment of the text abstract extraction apparatus of the present invention;
FIG. 3 is a hardware architecture diagram of one embodiment of the computer device of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
As shown in fig. 1, the present invention provides a text abstract extracting method, which specifically includes the following steps:
s0, training according to the collected sample data set to obtain a target text classification model, wherein the specific training process comprises the following steps:
And S01, collecting a sample data set, wherein the sample data set comprises a plurality of training texts, and each training text is labeled with a corresponding category. In this embodiment, the training text may be a complaint text. For example, assuming that a vehicle insurance company needs to quickly obtain a complaint summary from a customer's complaint text, the collected sample data set should contain complaint texts labeled with different categories, including but not limited to timeliness failure, price dispute, and service attitude. It should be understood that, for application scenarios other than complaint texts, corresponding sample data sets may be collected according to different needs.
And S02, dividing the collected sample data set into a training set and a verification set according to a preset proportion, wherein the training set accounts for 80% and the verification set accounts for 20%.
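The dataset split of steps S01 and S02 can be sketched as follows. This is an illustrative sketch only: the function name, the fixed random seed, and the (text, category) pair format are assumptions, not details taken from the patent.

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Shuffle labeled samples and split them into a training set and a
    verification set according to a preset proportion (80%/20% here)."""
    shuffled = samples[:]                      # leave the caller's list intact
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

With the default ratio, a set of ten labeled texts yields eight training samples and two verification samples.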
And S03, training by adopting a gradient descent algorithm based on the training set to obtain the target text classification model. In the invention, the target text classification model is preferably a commonly used text classification model, namely the TextCNN model, which classifies texts by using a convolutional neural network and comprises an embedding layer, a convolutional layer, a pooling layer, a fully-connected layer and a Softmax classification layer.
And S04, verifying, based on the verification set, whether performance metrics of the trained target text classification model, such as accuracy, precision, recall and F1 score, meet preset conditions; if so, the target text classification model passes verification and the training is finished; otherwise, the number of training texts in the training set is increased and the target text classification model is retrained.
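The verification check of step S04 can be sketched as follows for a single category; the helper name and the threshold dictionary are assumptions of this sketch, and a real validation run would also aggregate these metrics across all categories.

```python
def passes_validation(y_true, y_pred, positive, thresholds):
    """Compute accuracy, precision, recall and F1 score for one category
    and compare them against preset conditions (minimum thresholds)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    metrics = {"accuracy": accuracy, "precision": precision,
               "recall": recall, "f1": f1}
    # Training finishes only when every metric meets its preset condition.
    return all(metrics[k] >= v for k, v in thresholds.items()), metrics
```

If the check fails, the embodiment enlarges the training set and retrains the model.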
S1, obtaining a text to be processed, wherein the text to be processed can be a complaint text, for example, a complaint text of a car insurance customer.
S2, processing the text to be processed by using the trained target text classification model (TextCNN model) to obtain the category of the text to be processed, which is implemented by the following steps:
S21, vectorizing the text to be processed through the embedding layer of the TextCNN model to obtain word vectors of the text to be processed;
S22, performing convolution processing on the word vectors of the text to be processed through the convolutional layer of the TextCNN model to extract the features of the text to be processed;
S23, performing pooling processing on the features of the text to be processed through the pooling layer of the TextCNN model to obtain the dimensionality reduction features of the text to be processed;
S24, transmitting the dimensionality reduction features of the text to be processed to the Softmax classification layer through the fully-connected layer of the TextCNN model;
and S25, calculating, through the Softmax classification layer of the TextCNN model, the probability of the text to be processed corresponding to each classification label according to the dimensionality reduction features, and taking the classification label with the maximum probability as the category of the text to be processed.
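Steps S21 to S25 amount to a single forward pass through the five layers. The NumPy sketch below uses random (untrained) weights purely to show the data flow; the dimensions and the weight initialisation are assumptions, not the patent's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def textcnn_forward(token_ids, vocab_size=1000, embed_dim=16,
                    num_filters=8, kernel_size=3, num_classes=3):
    """One forward pass through the five layers of steps S21-S25:
    embedding -> convolution -> max pooling -> fully-connected -> Softmax."""
    # S21: the embedding layer turns token ids into word vectors
    E = rng.standard_normal((vocab_size, embed_dim))
    x = E[token_ids]                                   # (seq_len, embed_dim)
    # S22: a 1-D convolution over the sequence extracts n-gram features
    W = rng.standard_normal((num_filters, kernel_size, embed_dim))
    seq_len = len(token_ids)
    conv = np.array([[np.sum(x[i:i + kernel_size] * W[f])
                      for i in range(seq_len - kernel_size + 1)]
                     for f in range(num_filters)])     # (filters, positions)
    conv = np.maximum(conv, 0)                         # ReLU activation
    # S23: max pooling reduces each filter map to one dimensionality-reduced feature
    pooled = conv.max(axis=1)                          # (filters,)
    # S24: the fully-connected layer projects the features to class scores
    Wd = rng.standard_normal((num_classes, num_filters))
    logits = Wd @ pooled
    # S25: Softmax yields a probability for each classification label
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

probs = textcnn_forward([3, 17, 42, 7, 99])
```

The predicted category is the label with the maximum probability, i.e. `probs.argmax()`.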
And S3, performing sentence segmentation on the text to be processed. Specifically, the invention may segment the text to be processed at sentence-level punctuation marks, such as the period "。", the exclamation mark "!" and the question mark "?". For example, suppose the text to be processed is the following complaint: "I applied for non-accident rescue; there was only one telephone contact in between, telling me to wait two hours, but I ended up waiting more than four hours and nobody came, and I am dissatisfied and wish to complain about this. The customer says: I do not need rescue now and have found someone to help on my own. I contacted Ann Union rescue (028-65200801) several times but nobody answered, and the customer requires the company to give an explanation. Please have the relevant department handle and reply as soon as possible, thanks!" After sentence segmentation, the following four sentences are obtained: the 1st sentence is "I applied for non-accident rescue; there was only one telephone contact in between, telling me to wait two hours, but I ended up waiting more than four hours and nobody came, and I am dissatisfied and wish to complain about this."; the 2nd sentence is "The customer says: I do not need rescue now and have found someone to help on my own."; the 3rd sentence is "I contacted Ann Union rescue (028-65200801) several times but nobody answered, and the customer requires the company to give an explanation."; the 4th sentence is "Please have the relevant department handle and reply as soon as possible, thanks!".
After sentence segmentation is finished, a corresponding deletion flag bit is set for each sentence; the initial value of the flag bit is 0, and a value of 0 indicates that the corresponding sentence has not been deleted.
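The sentence segmentation and deletion flag bits of step S3 can be sketched as follows; the regular expression and the dictionary representation are assumptions of this sketch.

```python
import re

# Split after sentence-level punctuation: Chinese and ASCII period,
# exclamation mark and question mark.
SENTENCE_END = re.compile(r"(?<=[。！？.!?])")

def split_sentences(text):
    """Return the sentences of `text`, each with a deletion flag bit
    initialised to 0 (0 = not deleted, 1 = deleted)."""
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    return [{"sentence": s, "deleted": 0} for s in sentences]
```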
And S4, randomly selecting an undeleted sentence from the text to be processed and deleting it to obtain the residual text. After the sentence is selected and deleted, it is marked as deleted so that it will not be selected again when this step is subsequently repeated. In this embodiment, marking a sentence as deleted means setting the deletion flag bit of the sentence to 1; when the deletion flag bit is 1, the corresponding sentence has been deleted.
S5, processing the residual text by using the target text classification model, namely the TextCNN model, to obtain the category of the residual text; the specific flow is as follows:
S51, vectorizing the residual text through the embedding layer of the TextCNN model to obtain word vectors of the residual text;
S52, performing convolution processing on the word vectors of the residual text through the convolutional layer of the TextCNN model to extract the features of the residual text;
S53, performing pooling processing on the features of the residual text through the pooling layer of the TextCNN model to obtain the dimensionality reduction features of the residual text;
S54, transmitting the dimensionality reduction features of the residual text to the Softmax classification layer through the fully-connected layer of the TextCNN model;
and S55, calculating the probability of the residual text corresponding to each classification label through the Softmax classification layer of the TextCNN model, and taking the classification label with the maximum probability as the category of the residual text.
S6, determining whether the category of the residual text obtained by deleting the sentence is the same as the category of the text to be processed. If so, the deleted sentence is not important to the overall semantics of the text to be processed, i.e. the sentence should be excluded from the target text abstract, and step S8 is executed; if not, step S7 is executed.
S7, if the category of the residual text is different from the category of the text to be processed, the deleted sentence is important to the overall semantics of the text, i.e. the sentence should not be excluded from the target text abstract. Therefore, the deleted sentence is restored to the text to be processed, and step S8 is executed.
S8, determining whether all sentences in the text to be processed have been deleted, that is, determining whether all deletion flags of all sentences are 1, if yes, executing step S9, otherwise, returning to step S4 to execute the next loop processing.
And S9, taking the residual text finally obtained after all sentences in the text to be processed are deleted as the target text abstract to be extracted.
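Steps S4 to S9 amount to the following loop. Here `classify` stands in for the trained TextCNN model and is an assumption of this sketch: any callable mapping a text string to a category label will do.

```python
import random

def extract_summary(sentences, classify, seed=0):
    """Randomly delete one untried sentence per iteration (S4); if the
    category of the residual text differs from the original category,
    restore the sentence (S7); stop when every sentence has been tried
    (S8) and return the residual text as the abstract (S9)."""
    rnd = random.Random(seed)
    original = classify("".join(sentences))    # category of the full text
    removed, tried = set(), set()
    while len(tried) < len(sentences):
        i = rnd.choice([k for k in range(len(sentences)) if k not in tried])
        tried.add(i)                           # deletion flag bit set to 1
        residual = "".join(s for k, s in enumerate(sentences)
                           if k not in removed and k != i)
        if classify(residual) == original:
            removed.add(i)                     # same category: stays deleted
        # different category: the sentence is restored (kept out of `removed`)
    return "".join(s for k, s in enumerate(sentences) if k not in removed)
```

With a stub classifier that labels any text mentioning "rescue" as the complaint category, only the rescue sentence survives the loop, matching the worked example below.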
One application scenario of the present invention is as follows. Suppose a text X to be processed includes four sentences A, B, C and D, and the category obtained by processing the text with the target text classification model is M. When the method is applied, sentence D is randomly deleted first; if the category of the text after deleting sentence D is still M, sentence D is not important to text X and can be deleted, leaving a residual text comprising sentences A, B and C. Next, sentence C is randomly deleted from the residual text; if the category of the text after deleting sentence C is no longer M, sentence C is important to text X and cannot be deleted, so sentence C is restored and the residual text still comprises sentences A, B and C. The loop then continues to randomly delete sentences that have not yet been tried (sentence C, having already been tried, is not deleted again), and once every sentence in text X has been processed, the residual text is taken as the abstract. Taking the complaint text given in step S3 as an example, assume the category obtained after the TextCNN model processes the text is "timeliness failure", the category changes after the 1st sentence is deleted, and the category is still "timeliness failure" after the 2nd, 3rd or 4th sentence is deleted. This indicates that the 1st sentence is critical to the complaint text while the 2nd to 4th sentences are non-critical and should be excluded from the abstract, so the abstract of the complaint text is the 1st sentence.
It can be seen that the invention randomly deletes sentences from the text to be processed in a loop and checks whether the category of the text after each deletion is the same as before. If the category is the same, the deleted sentence contributes little to the semantics of the text and should be deleted; otherwise, the deleted sentence contributes much to the semantics of the text and should not be deleted, so it is restored to the text. When every sentence in the text has been processed, the abstract of the text is obtained. Since the invention is realized with a classification model trained on semantics, the abstract obtained reflects the overall semantics of the text, i.e. it can truly outline the overall information of the text from the semantic aspect, improving the accuracy of text abstract generation while maintaining text processing speed.
As a preferable scheme of this embodiment, before executing step S2, the method further includes preprocessing the acquired text to be processed, such as stop word filtering: detecting whether a word in the text to be processed matches a stop word in a preset stop word table, and if so, deleting the matched word. It will be understood that stop words are generally words without substantive meaning, such as particles and auxiliary words.
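The stop word filtering described above can be sketched as follows; the contents of the stop-word table are illustrative assumptions, and a real deployment would load a full list.

```python
# Illustrative stop-word table; a real deployment would load a full list.
STOP_WORDS = {"的", "了", "在", "the", "a", "of"}

def filter_stop_words(words, stop_words=STOP_WORDS):
    """Delete every word that matches an entry in the preset stop-word
    table, keeping the original order of the remaining words."""
    return [w for w in words if w not in stop_words]
```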
It should be noted that, for the sake of simplicity, the present embodiment is described as a series of acts, but those skilled in the art should understand that the present invention is not limited by the described order of acts, because some steps can be performed in other orders or simultaneously according to the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Example two
As shown in fig. 2, the present embodiment provides a text abstract extracting apparatus 10, which includes:
the model training module 11 is used for training to obtain a target text classification model;
the category obtaining module 12 is configured to process a text to be processed by using a target text classification model obtained through pre-training to obtain the category of the text to be processed, where the text to be processed may be a complaint text;
the loop deletion processing module 13 is configured to perform the following loop processing on the text to be processed until all sentences in the text to be processed are deleted:
randomly deleting a sentence which is not deleted from the text to be processed to obtain a residual text;
processing the residual texts by using the target text classification model to obtain the categories of the residual texts;
judging whether the types of the residual texts are the same as the types of the texts to be processed, if not, restoring the deleted sentences to the texts to be processed;
and the abstract acquiring module 14 is configured to acquire a remaining text obtained after the loop processing is finished as a target text abstract.
In this embodiment, the model training module 11 includes:
the sample data set acquisition unit is used for acquiring a sample data set, the sample data set comprises a plurality of training texts, and each training text is marked with a corresponding category, wherein the training texts can be complaint texts;
the sample data set dividing unit is used for dividing the sample data set into a training set and a verification set according to a preset proportion;
the training unit is used for training to obtain a target text classification model based on a training set;
and the verification unit is used for verifying the target text classification model based on the verification set; if the verification passes, the training is finished; if not, the number of training texts in the training set is increased and the target text classification model is retrained.
In this embodiment, the target text classification model is a TextCNN model, which includes an embedding layer, a convolutional layer, a pooling layer, a fully-connected layer, and a Softmax classification layer.
In this embodiment, the category obtaining module 12 is specifically configured to:
vectorizing the text to be processed through the embedding layer of the TextCNN model to obtain word vectors of the text to be processed;
performing convolution processing on the word vectors of the text to be processed through the convolutional layer of the TextCNN model to extract the features of the text to be processed;
performing pooling processing on the features of the text to be processed through the pooling layer of the TextCNN model to obtain the dimensionality reduction features of the text to be processed;
transmitting the dimensionality reduction features of the text to be processed to the Softmax classification layer through the fully-connected layer of the TextCNN model;
and calculating, through the Softmax classification layer of the TextCNN model, the probability of the text to be processed corresponding to each classification label according to the dimensionality reduction features, and taking the classification label with the maximum probability as the category of the text to be processed.
In this embodiment, the text abstract extracting apparatus 10 may further include a preprocessing module, configured to preprocess the text to be processed before it is processed by the pre-trained target text classification model, such as stop word filtering: detecting whether a word in the text to be processed matches a stop word in a preset stop word list, and if so, deleting the matched word. It will be understood that stop words are generally words without substantive meaning, such as particles and auxiliary words.
It should also be understood by those skilled in the art that the embodiments described in the specification are preferred embodiments and that the modules referred to are not necessarily essential to the invention.
Example three
The present invention also provides a computer device, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of a plurality of servers) capable of executing programs. The computer device 20 of the present embodiment at least includes, but is not limited to: a memory 21 and a processor 22, which may be communicatively coupled to each other via a system bus, as shown in FIG. 3. It is noted that FIG. 3 only shows the computer device 20 with components 21-22, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
In the present embodiment, the memory 21 (i.e., a readable storage medium) includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 20, such as a hard disk or a memory of the computer device 20. In other embodiments, the memory 21 may also be an external storage device of the computer device 20, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash memory card (Flash Card) provided on the computer device 20. Of course, the memory 21 may also include both the internal and external storage devices of the computer device 20. In this embodiment, the memory 21 is generally used for storing an operating system and various application software installed on the computer device 20, such as the program code of the text abstract extracting apparatus 10 of the second embodiment. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 22 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip in some embodiments. The processor 22 is typically used to control the overall operation of the computer device 20. In this embodiment, the processor 22 is configured to run the program code stored in the memory 21 or to process data, for example, to run the text abstract extracting apparatus 10, so as to implement the text abstract extracting method of the first embodiment.
Example four
The present invention also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, or an application store, on which a computer program is stored, which when executed by a processor implements a corresponding function. The computer-readable storage medium of this embodiment is used for storing the text abstract extracting apparatus 10, and when executed by a processor, implements the text abstract extracting method of the first embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that the above embodiment method can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better embodiment.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the present specification and drawings, or used directly or indirectly in other related fields, are included in the scope of the present invention.

Claims (10)

1. A text abstract extraction method is characterized by comprising the following steps:
processing a text to be processed by using a target text classification model obtained by pre-training to obtain the category of the text to be processed;
and executing the following cyclic processing on the text to be processed until all sentences in the text to be processed are deleted:
randomly deleting a sentence that has not yet been deleted from the text to be processed to obtain a residual text;
processing the residual text by using the target text classification model to obtain the category of the residual text;
judging whether the category of the residual text is the same as the category of the text to be processed, and if not, restoring the deleted sentence to the text to be processed;
and taking the residual text obtained after the loop processing as the target text abstract.
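The loop of claim 1 can be sketched as follows. Here `classify` is a toy placeholder for the trained target text classification model, and the keyword rule and sample sentences are illustrative assumptions.

```python
import random

random.seed(1)

def classify(sentences):
    """Placeholder for the trained target text classification model:
    a toy rule that labels the text by a dominant keyword."""
    text = " ".join(sentences)
    return "price" if "price" in text else "service"

def extract_summary(sentences):
    """Randomly try deleting each sentence once; keep a deletion only if
    the category of the residual text matches that of the full text."""
    category = classify(sentences)        # category of the text to be processed
    remaining = list(sentences)
    not_tried = list(sentences)
    while not_tried:                      # until every sentence has been tried
        s = random.choice(not_tried)      # randomly pick an untried sentence
        not_tried.remove(s)
        residual = [x for x in remaining if x is not s]
        if classify(residual) == category:
            remaining = residual          # category unchanged: keep the deletion
        # otherwise the deleted sentence is restored (remaining is unchanged)
    return remaining                      # residual text = target text abstract

doc = ["The agent was polite.",
       "But the price charged was wrong.",
       "I want the price corrected."]
summary = extract_summary(doc)
```

With this toy classifier, sentences that do not affect the "price" category are deleted, and the abstract keeps exactly one price-related sentence.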
2. The method for extracting a text abstract according to claim 1, wherein the target text classification model is obtained by training through the following steps:
collecting a sample data set, wherein the sample data set comprises a plurality of training texts, and each training text is labeled with a corresponding category;
dividing the sample data set into a training set and a verification set according to a preset proportion;
training to obtain the target text classification model based on the training set;
and verifying the target text classification model based on the verification set, and finishing training if the verification is passed.
3. The method of claim 2, wherein the text to be processed and the training text are complaint texts.
4. The method according to claim 3, wherein the categories of the text to be processed and the training text include timeout, price disagreement, and service attitude.
5. The text summarization extraction method of claim 1 wherein the target text classification model is a TEXTCNN model comprising an embedding layer, a convolution layer, a pooling layer, a fully-connected layer, and a Softmax classification layer.
6. The method for extracting a text abstract according to claim 5, wherein the step of processing the text to be processed by using the pre-trained target text classification model comprises:
vectorizing the text to be processed through the embedding layer to obtain word vectors of the text to be processed;
performing convolution processing on the word vectors of the text to be processed through the convolution layer to extract features of the text to be processed;
performing pooling processing on the features of the text to be processed through the pooling layer to obtain dimension reduction features of the text to be processed;
transmitting the dimension reduction features of the text to be processed to the Softmax classification layer through the fully-connected layer;
and processing the dimension reduction features of the text to be processed through the Softmax classification layer to obtain the category of the text to be processed.
7. The method of claim 1, further comprising: preprocessing the text to be processed before processing the text to be processed by using the target text classification model obtained through pre-training.
8. An apparatus for extracting a text abstract, comprising:
the category acquisition module is used for processing the text to be processed by utilizing a target text classification model obtained by pre-training to obtain the category of the text to be processed;
the cyclic deletion processing module is used for executing the following cyclic processing on the text to be processed until all sentences in the text to be processed are deleted:
randomly deleting a sentence that has not yet been deleted from the text to be processed to obtain a residual text;
processing the residual text by using the target text classification model to obtain the category of the residual text;
judging whether the category of the residual text is the same as the category of the text to be processed, and if not, restoring the deleted sentence to the text to be processed;
and the abstract acquisition module is used for taking the residual text obtained after the loop processing is finished as the target text abstract.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented by the processor when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN201910753710.4A 2019-08-15 2019-08-15 Text abstract extraction method, device, computer equipment and storage medium Active CN110750637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910753710.4A CN110750637B (en) 2019-08-15 2019-08-15 Text abstract extraction method, device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN110750637A true CN110750637A (en) 2020-02-04
CN110750637B CN110750637B (en) 2024-05-24

Family

ID=69275839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910753710.4A Active CN110750637B (en) 2019-08-15 2019-08-15 Text abstract extraction method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110750637B (en)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1040267A (en) * 1996-07-26 1998-02-13 Nec Corp Document summary viewer
CN101446940A (en) * 2007-11-27 2009-06-03 北京大学 Method and device of automatically generating a summary for document set
CN106133772A (en) * 2013-12-18 2016-11-16 谷歌公司 With entity mark video based on comment summary
KR20170089369A (en) * 2016-01-26 2017-08-03 주식회사 마커 Method for automatic summarizing document by user learning
WO2018036555A1 (en) * 2016-08-25 2018-03-01 腾讯科技(深圳)有限公司 Session processing method and apparatus
CN109376242A (en) * 2018-10-18 2019-02-22 西安工程大学 Text classification algorithm based on Recognition with Recurrent Neural Network variant and convolutional neural networks
CN109492091A (en) * 2018-09-28 2019-03-19 科大国创软件股份有限公司 A kind of complaint work order intelligent method for classifying based on convolutional neural networks
CN110069624A (en) * 2019-04-28 2019-07-30 北京小米智能科技有限公司 Text handling method and device
KR20190090944A (en) * 2018-01-26 2019-08-05 주식회사 두유비 System and method for machine learning to sort sentence importance and generating summary sentence based on keyword importance


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112667815A (en) * 2020-12-30 2021-04-16 北京捷通华声科技股份有限公司 Text processing method and device, computer readable storage medium and processor
CN113033216A (en) * 2021-03-03 2021-06-25 东软集团股份有限公司 Text preprocessing method and device, storage medium and electronic equipment
CN113033216B (en) * 2021-03-03 2024-05-28 东软集团股份有限公司 Text preprocessing method and device, storage medium and electronic equipment



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant