CN114239555A - Training method of keyword extraction model and related device - Google Patents


Info

Publication number
CN114239555A
CN114239555A
Authority
CN
China
Prior art keywords
data
error
model
training
keyword extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111602825.7A
Other languages
Chinese (zh)
Inventor
李电祥
陈学珉
毛骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Changsheng Computer Technology Co ltd
Original Assignee
Shanghai Changsheng Computer Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Changsheng Computer Technology Co ltd filed Critical Shanghai Changsheng Computer Technology Co ltd
Priority to CN202111602825.7A priority Critical patent/CN114239555A/en
Publication of CN114239555A publication Critical patent/CN114239555A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/237 - Lexical tools
    • G06F 40/247 - Thesauruses; Synonyms
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a training method for a keyword extraction model, which comprises the following steps: performing error correction processing on original data to obtain error-corrected data; performing data enhancement processing on the error-corrected data based on a synonym vocabulary to obtain first input data; performing adversarial-learning processing on the error-corrected data based on projected gradient descent to obtain second input data; and training, on the first input data and the second input data, a keyword extraction model based on attention and an improved Bi-LSTM structure to obtain the trained keyword model. By performing feature extraction with attention and the improved Bi-LSTM structure, key information can be effectively extracted from the data, improving the performance and effect of keyword extraction. The application also discloses a training apparatus for the keyword extraction model, a server, and a computer-readable storage medium, which have the same beneficial effects.

Description

Training method of keyword extraction model and related device
Technical Field
The present application relates to the field of natural language processing, and in particular, to a method and an apparatus for training a keyword extraction model, a server, and a computer-readable storage medium.
Background
With the continuous development of AI (Artificial Intelligence) technology, its range of application keeps expanding. Keyword extraction is an important component of intelligent quality inspection: keywords can be quickly extracted from long, complicated sentences, and corresponding dialogue and guidance suggestions can be given according to those keywords. Extracting keywords accurately and quickly is therefore of great value and significance in the field of intelligent quality inspection.
In the related art, TF-IDF (term frequency-inverse document frequency, a weighting technique commonly used in information retrieval and data mining) is the most basic and easiest-to-understand keyword extraction method, but measuring a word's importance in an article only by word frequency is not comprehensive enough: sometimes important words do not occur often, and the computation reflects neither position information nor a word's importance in its context. LDA (Latent Dirichlet Allocation) represents documents as probability distributions over topics and extracts corresponding keywords through analysis; it is a very good keyword extraction technique, but the algorithm also has shortcomings: topic identification is weak, the topics are difficult to interpret, and topic words can be ambiguous.
Therefore, how to improve the accuracy of keyword extraction is a key issue for those skilled in the art.
Disclosure of Invention
The application aims to provide a training method, a training device, a server and a computer readable storage medium of a keyword extraction model so as to improve the performance and accuracy of keyword extraction.
In order to solve the above technical problem, the present application provides a training method for a keyword extraction model, including:
carrying out error correction processing on the original data to obtain error-corrected data;
performing data enhancement processing on the error-corrected data based on the synonym vocabulary to obtain first input data;
performing adversarial-learning processing on the error-corrected data based on projected gradient descent to obtain second input data;
and training the first input data and the second input data by adopting a keyword extraction model based on attention and an improved Bi-LSTM structure to obtain the trained keyword model.
Optionally, the method further includes:
acquiring data to be processed;
and processing the data to be processed by adopting the trained keyword model to obtain a keyword extraction set.
Optionally, processing the data to be processed by using the trained keyword model to obtain a keyword extraction set, including:
vectorizing the data to be processed to obtain a multi-dimensional matrix representation;
performing primary feature extraction on the multi-dimensional matrix representation through a Bi-LSTM module to obtain primary features;
performing deep feature extraction on the primary features through the attention layer to obtain deep features;
and processing the deep features to obtain the keyword extraction set.
Optionally, performing vectorization representation processing on the data to be processed to obtain a multi-dimensional matrix representation, including:
performing character-granularity vectorization and word-granularity vectorization on the data to be processed, respectively, to obtain a character matrix representation and a word matrix representation;
taking the character matrix representation and the word matrix representation as the multi-dimensional matrix representation.
Optionally, performing error correction processing on the original data to obtain error-corrected data, including:
recognizing original voice data through a voice recognition technology to obtain text data;
carrying out error detection on the text data based on the word segmentation result of the text data to obtain an error position candidate set;
and performing error correction on the text data based on the error position candidate set to obtain the corrected data.
Optionally, performing data enhancement processing on the error-corrected data based on the synonym vocabulary to obtain first input data, including:
constructing a standard synonym vocabulary and a professional synonym vocabulary based on database data;
performing a data enhancement operation on the error-corrected data using the standard synonym vocabulary and the professional synonym vocabulary to obtain the first input data; wherein the data enhancement operation comprises at least: synonym replacement, random deletion, random exchange, back-translation, noise injection, and grammar transformation.
Optionally, performing adversarial-learning processing on the error-corrected data based on projected gradient descent to obtain the second input data includes:
performing noise generation processing on the error-corrected data based on projected gradient descent to obtain a noise component;
adding the noise component to the error-corrected data to obtain the second input data.
The present application further provides a training device for keyword extraction model, including:
the data error correction module is used for carrying out error correction processing on the original data to obtain error-corrected data;
the data enhancement module is used for carrying out data enhancement processing on the error-corrected data based on the synonym vocabulary to obtain first input data;
the adversarial learning module is used for performing adversarial-learning processing on the error-corrected data based on projected gradient descent to obtain second input data;
and the model training module is used for training the first input data and the second input data by adopting a keyword extraction model based on attention and an improved Bi-LSTM structure to obtain the trained keyword model.
The present application further provides a server, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the training method as described above when executing the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the training method as described above.
The application provides a training method for a keyword extraction model, which comprises the following steps: performing error correction processing on original data to obtain error-corrected data; performing data enhancement processing on the error-corrected data based on a synonym vocabulary to obtain first input data; performing adversarial-learning processing on the error-corrected data based on projected gradient descent to obtain second input data; and training, on the first input data and the second input data, a keyword extraction model based on attention and an improved Bi-LSTM structure to obtain the trained keyword model.
By performing data enhancement and adversarial-learning processing on the error-corrected data and then training the keyword extraction model, the generalization ability and robustness of the model are improved while the amount of data is increased, alleviating the problems of insufficient or imbalanced data; and by performing feature extraction with attention and the improved Bi-LSTM structure, key information can be effectively extracted from the data, improving the performance and effect of keyword extraction.
The application also provides a training apparatus for the keyword extraction model, a server, and a computer-readable storage medium, which have the same beneficial effects and are not repeated here.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a method for training a keyword extraction model according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of another method for training a keyword extraction model according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a keyword extraction model provided in the embodiment of the present application;
FIG. 4 is a schematic structural diagram of a Bi-LSTM module of the keyword extraction model provided in the embodiment of the present application;
fig. 5 is a schematic structural diagram of an attention layer of the keyword extraction model according to the embodiment of the present application;
fig. 6 is a schematic structural diagram of a training apparatus for a keyword extraction model according to an embodiment of the present disclosure.
Detailed Description
The core of the application is to provide a training method, a training device, a server and a computer readable storage medium of a keyword extraction model so as to improve the performance and accuracy of keyword extraction.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the related art, TF-IDF is the most basic and simplest keyword extraction method, but measuring a word's importance in an article only by word frequency is not comprehensive enough: sometimes important words do not occur often, and the computation reflects neither position information nor a word's importance in context. LDA represents documents as probability distributions over topics and extracts corresponding keywords through analysis; it is a very good keyword extraction technique, but the algorithm also has shortcomings: topic identification is weak, the topics are difficult to interpret, and topic words can be ambiguous.
Therefore, in the training method of the keyword extraction model provided by the application, data enhancement and adversarial-learning processing are performed on the error-corrected data before the keyword extraction model is trained, so that the generalization ability and robustness of the model are improved while the amount of data is increased, alleviating the problems of insufficient or imbalanced data; in addition, by performing feature extraction with attention and the improved Bi-LSTM structure, key information can be effectively extracted from the data, improving the performance and effect of keyword extraction.
The following describes a method for training a keyword extraction model according to an embodiment of the present application.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for training a keyword extraction model according to an embodiment of the present disclosure.
In this embodiment, the method may include:
s101, carrying out error correction processing on original data to obtain error-corrected data;
it can be seen that this step aims to perform error correction processing on the original data to obtain error-corrected data. The original data obtained is not uniform in source, and the accuracy of the obtaining and converting mode is not uniform, so that the text errors are easy to occur in the data. Therefore, in order to maintain the validity of the data, in this embodiment, the data is first subjected to error correction processing to obtain error-corrected data.
The process of performing error correction on the original data may adopt any error correction method provided in the prior art, and may adopt the following error correction processing method, which is not limited herein.
Further, the step may include:
step 1, recognizing original voice data through a voice recognition technology to obtain text data;
step 2, carrying out error detection on the text data based on the word segmentation result of the text data to obtain an error position candidate set;
and 3, carrying out error correction on the text data based on the error position candidate set to obtain error-corrected data.
It can be seen that the present alternative scheme mainly explains how error correction is performed. In the alternative scheme, original voice data are recognized through a voice recognition technology to obtain text data, error detection is carried out on the text data based on word segmentation results of the text data to obtain an error position candidate set, and error correction is carried out on the text data based on the error position candidate set to obtain error-corrected data.
That is, in this alternative, the corresponding error positions are first detected, and error correction is then performed based on the detected positions. The error detection process first segments the text data into words and then detects errors based on the segmentation result. This improves the accuracy and effectiveness of error detection on the text data and realizes a fast error detection mode.
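The detect-then-correct flow above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the vocabulary, the confusion dictionary, and the pick-first-candidate rule are assumptions (a real system would segment Chinese text and rank candidates by language-model perplexity, as the detailed embodiment describes).

```python
# Toy two-step error correction: detect suspected error positions from the
# token sequence, then correct them from a (hypothetical) confusion dictionary.

VOCAB = {"keyword", "extraction", "model", "training"}          # assumed vocabulary
CONFUSION = {"modle": ["model"], "trainning": ["training"]}     # assumed confusion sets

def detect_errors(tokens):
    """Return indices of tokens not in the vocabulary: the error-position candidate set."""
    return [i for i, tok in enumerate(tokens) if tok not in VOCAB]

def correct(tokens):
    """Replace each suspected token with its best confusion-set candidate, if any."""
    out = list(tokens)
    for i in detect_errors(tokens):
        candidates = CONFUSION.get(tokens[i], [])
        if candidates:
            # a real system would rank candidates by language-model perplexity
            out[i] = candidates[0]
    return out

print(correct(["keyword", "extraction", "modle", "trainning"]))
# → ['keyword', 'extraction', 'model', 'training']
```

The error-position candidate set from `detect_errors` mirrors the "suspected error position candidate set" of step 2; `correct` mirrors step 3.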
S102, performing data enhancement processing on the error-corrected data based on the synonym vocabulary to obtain first input data;
on the basis of S101, this step aims to perform data enhancement processing on the error-corrected data based on the synonym vocabulary, resulting in first input data. The data enhancement process is a process of generating a large amount of data based on a small amount of data, can effectively improve the generalization capability and robustness of the trained model, and solves the problems of insufficient data or unbalanced data.
In this step, data enhancement processing is mainly performed through the synonym table, that is, the content in the synonym table is adopted to perform certain modification on the text data, so as to increase the number of data.
Further, the step may include:
step 1, constructing a standard synonym vocabulary and a professional synonym vocabulary based on database data;
step 2, performing a data enhancement operation on the error-corrected data using the standard synonym vocabulary and the professional synonym vocabulary to obtain the first input data; wherein the data enhancement operation comprises at least: synonym replacement, random deletion, random exchange, back-translation, noise injection, and grammar transformation.
It can be seen that this alternative mainly illustrates how data enhancement is performed. In this alternative, a standard synonym vocabulary and a professional synonym vocabulary are constructed based on database data, and the data enhancement operation is performed on the error-corrected data using both vocabularies to obtain the first input data.
Synonym replacement: words are randomly sampled from the sentence in a certain proportion and replaced with synonyms, common words from the common-word list and professional words from the professional-word list, with a variable probability p, where p follows a normal distribution over the range 0 to 0.3.
Random deletion: during data enhancement it is generally observed that the semantic features lie within a certain window around the target entity's context, so to avoid accidentally damaging keywords and their context features, the professional vocabulary is treated specially: the tagging guarantees it is never randomly deleted. Each remaining word in the sentence is deleted with a variable probability p, where p follows a normal distribution over 0 to 0.3. Numbers are also left untouched, because numbers in collection dialogues have definite meaning.
Random exchange: for each word in the sentence, word order is randomly swapped with a variable probability p, where p follows a normal distribution over 0 to 0.3.
Back-translation: paraphrasing the text via machine translation, typically by translating through an intermediate language such as English, French, or Italian and back.
Noise injection: noise is randomly injected into the text so that the model is more robust to perturbations, for example by shuffling the sentence or randomly occupying a word with a placeholder.
Grammar transformation: converting an active sentence into a passive one, or a passive sentence into an active one, by grammatical means.
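The token-level operations above (synonym replacement, random deletion with protected terms and numbers, random exchange) can be sketched as one pass. This is a minimal sketch under stated assumptions: the synonym table, the protected professional-term set, and the capped-Gaussian draw for p are illustrative, not the patent's actual resources.

```python
import random

SYNONYMS = {"quick": ["fast", "rapid"], "issue": ["problem"]}   # assumed synonym table
PROTECTED = {"overdue"}   # professional terms protected by the tagging (assumption)

def augment(tokens, seed=0):
    """One augmentation pass: synonym replacement, random deletion, one random swap."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        # variable probability p, capped at 0.3 as in the description above
        p = min(abs(rng.gauss(0, 0.1)), 0.3)
        if tok in PROTECTED or tok.isdigit():
            out.append(tok)                            # never touch protected terms or numbers
        elif tok in SYNONYMS and rng.random() < p:
            out.append(rng.choice(SYNONYMS[tok]))      # synonym replacement
        elif rng.random() < p:
            continue                                   # random deletion
        else:
            out.append(tok)
    if len(out) > 1 and rng.random() < 0.3:            # random exchange of one adjacent pair
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out

print(augment(["overdue", "30", "days", "quick", "issue"], seed=1))
```

Protected terms and digits always survive the pass (a swap only reorders them), which is the invariant the random-deletion paragraph requires.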
S103, performing adversarial-learning processing on the error-corrected data based on projected gradient descent to obtain second input data;
Following S102, this step aims to perform adversarial-learning processing on the error-corrected data based on projected gradient descent, obtaining second input data.
Adversarial learning adds carefully chosen, artificially synthesized noise to the natural input data of a deep neural network in order to confuse it, thereby strengthening the network's resistance to interference and improving the model's robustness.
The projected-gradient-descent method is PGD (Projected Gradient Descent), a large-scale constrained-optimization method. It applies the perturbation repeatedly in small steps, improving the effect of adversarial learning.
Further, the step may include:
step 1, performing noise generation processing on the corrected data based on a projection gradient descent mode to obtain a noise component;
and step 2, adding a noise component into the error-corrected data to obtain second input data.
It can be seen that this alternative mainly explains how adversarial examples are generated. In this alternative, noise generation processing is performed on the error-corrected data based on projected gradient descent to obtain a noise component, and the noise component is added to the error-corrected data to obtain the second input data.
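The two steps above (generate a noise component by projected gradient steps, then add it to the data) can be sketched on a toy continuous input. The quadratic loss and the step sizes are assumptions for illustration; in the real model the perturbation is applied to the embedding, as the detailed embodiment explains.

```python
# Minimal PGD sketch: repeatedly step along the sign of the loss gradient and
# project the accumulated perturbation back into an epsilon-ball.

def pgd_perturb(x, grad_fn, epsilon=0.5, alpha=0.2, steps=5):
    """Return x plus a PGD noise component bounded by epsilon per coordinate."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        g = grad_fn([xi + di for xi, di in zip(x, delta)])
        # small ascent step in the gradient's sign direction
        delta = [di + alpha * (1 if gi >= 0 else -1) for di, gi in zip(delta, g)]
        # projection: clip each coordinate back into [-epsilon, epsilon]
        delta = [max(-epsilon, min(epsilon, di)) for di in delta]
    return [xi + di for xi, di in zip(x, delta)]

# toy loss L(x) = sum(x_i^2), whose gradient is 2x
adv = pgd_perturb([1.0, -1.0], lambda v: [2 * vi for vi in v])
print(adv)  # → [1.5, -1.5]
```

The projection step is what distinguishes PGD from a single-shot perturbation: the noise can take many small steps but never leaves the allowed range.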
And S104, training the first input data and the second input data by adopting a keyword extraction model based on attention and an improved Bi-LSTM structure to obtain a trained keyword model.
Building on S102 and S103, this step aims to train, on the first input data and the second input data, a keyword extraction model based on attention and an improved Bi-LSTM (Bi-directional Long Short-Term Memory) structure, obtaining the trained keyword model.
To fully capture information from adjacent positions, the traditional Bi-LSTM structure is modified: in the backward LSTM of the Bi-LSTM, the input at time t is no longer only the embedding-layer vector at time t; the embedding-layer vector at time t+1 is also taken into account, and a sliding-window operation increases the coupling between adjacent vectors. This realizes the attention-plus-improved-Bi-LSTM structure.
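One way to read the modified backward-LSTM input described above is a width-2 sliding window over the embedding sequence: at step t the backward pass sees the embedding at t concatenated with the embedding at t+1. A sketch under that reading follows; the 2-dimensional toy embeddings and the zero padding for the final step are assumptions.

```python
# Build the backward-LSTM input sequence: concat(e_t, e_{t+1}) at each step,
# padding the last step (which has no successor) with zeros.

def backward_inputs(embeddings):
    """embeddings: list of equal-length vectors; returns one concatenated input per step."""
    pad = [0.0] * len(embeddings[0])
    return [embeddings[t] + (embeddings[t + 1] if t + 1 < len(embeddings) else pad)
            for t in range(len(embeddings))]

seq = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(backward_inputs(seq))
# → [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0], [5.0, 6.0, 0.0, 0.0]]
```

Each step's input thus carries its neighbor's embedding, which is the "compactness of adjacent vectors" the modification aims for.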
Further, the keyword extraction model comprises: a vector conversion layer, a Bi-LSTM layer, an attention layer, and a label filter layer.
The vector conversion layer converts the input into vectorized form.
The Bi-LSTM layer performs primary feature extraction.
The attention layer performs deep feature extraction.
The label filter layer performs label feature elimination.
Further, this embodiment may further include:
step 1, acquiring data to be processed;
and 2, processing the data to be processed by adopting the trained keyword model to obtain a keyword extraction set.
It can be seen that this alternative mainly illustrates that the trained keyword model can also be used for inference. In this alternative, the data to be processed are acquired and then processed with the trained keyword model to obtain a keyword extraction set.
Further, step 2 in the last alternative may include:
step 1, vectorization representation processing is carried out on data to be processed to obtain multi-dimensional matrix representation;
step 2, performing primary feature extraction on the multi-dimensional matrix representation through a Bi-LSTM module to obtain primary features;
step 3, carrying out deep layer feature extraction on the primary features through the attention layer to obtain deep layer features;
and 4, processing the deep features to obtain a keyword extraction set.
It can be seen that this alternative mainly explains how data are processed with the keyword model. In this alternative, vectorization is performed on the data to be processed to obtain a multi-dimensional matrix representation; primary feature extraction is performed on the multi-dimensional matrix representation through the Bi-LSTM module to obtain primary features; deep feature extraction is performed on the primary features through the attention layer to obtain deep features; and the deep features are processed to obtain the keyword extraction set.
The processing of the deep features to obtain the keyword extraction set can be realized by the label filter layer: the prediction result of the character-vector model and the prediction result of the word-vector model are intersected to obtain the finally predicted keyword output, i.e., the keyword extraction set.
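Under that reading, the intersection step can be sketched in a few lines; the function name and the example keywords are illustrative assumptions.

```python
# Intersect the two granularities' predictions: a keyword is emitted only if
# both the character-level and the word-level models predict it.

def final_keywords(char_preds, word_preds):
    """Return the sorted intersection of the two prediction sets."""
    return sorted(set(char_preds) & set(word_preds))

print(final_keywords(["overdue", "repayment", "the"], ["repayment", "overdue", "loan"]))
# → ['overdue', 'repayment']
```

Requiring agreement between the two granularities acts as the label-filtering step, discarding predictions that only one representation supports.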
Further, step 1 in the last alternative may include:
step 1, performing character-granularity vectorization and word-granularity vectorization on the data to be processed, respectively, to obtain a character matrix representation and a word matrix representation;
step 2, taking the character matrix representation and the word matrix representation as the multi-dimensional matrix representation.
It can be seen that this alternative mainly describes how vector conversion is performed. In this alternative, character-granularity vectorization and word-granularity vectorization are performed on the data to be processed, respectively, to obtain a character matrix representation and a word matrix representation, and the two together serve as the multi-dimensional matrix representation.
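A toy illustration of producing both granularities of matrix representation follows. The arithmetic "embeddings" are placeholders for trained embedding tables, which the patent does not specify; only the shape of the output (one vector per character, one per word) reflects the description.

```python
# Character-granularity and word-granularity vectorization with stand-in embeddings.

def char_matrix(text):
    """One toy 2-d vector per character."""
    return [[float(ord(c) % 7), float(ord(c) % 11)] for c in text]

def word_matrix(tokens):
    """One toy 2-d vector per word."""
    return [[float(len(w)), float(sum(ord(c) for c in w) % 13)] for w in tokens]

sentence = "overdue loan"
multi = {"char": char_matrix(sentence.replace(" ", "")),
         "word": word_matrix(sentence.split())}
print(len(multi["char"]), len(multi["word"]))  # 11 character vectors, 2 word vectors
```

Both matrices are kept, so the downstream Bi-LSTM module can consume the multi-dimensional representation at either granularity.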
In summary, in this embodiment, data enhancement and adversarial-learning processing are performed on the error-corrected data before the keyword extraction model is trained, so that the generalization ability and robustness of the model are improved while the amount of data is increased, alleviating the problems of insufficient or imbalanced data; moreover, by performing feature extraction with attention and the improved Bi-LSTM structure, key information can be effectively extracted from the data, improving the performance and effect of keyword extraction.
The method for training the keyword extraction model provided in the present application is further described with reference to another specific embodiment.
Referring to fig. 2, fig. 2 is a flowchart of another method for training a keyword extraction model according to an embodiment of the present disclosure.
In this embodiment, the original data are real collection (debt-collection) data from a company's actual business, and error correction must be performed on text converted from speech by ASR (Automatic Speech Recognition). Common error types include: homophone errors, such as "eye" for "glasses"; confusable-sound errors, such as "wandering girl" for "cowherd girl"; and similar-shape character errors, such as the two near-identical "sorghum" characters. Chinese error correction can be roughly divided into two steps: error detection, then error correction. In the error detection part, the text is segmented with jieba; because sentences contain wrongly written characters, the segmentation often splits incorrectly, so errors are detected at both character granularity and word granularity, and the suspected errors of the two granularities are merged into a candidate set of suspected error positions. In the error correction part, all suspected error positions are traversed, words at those positions are replaced using a similar-pronunciation dictionary and a similar-shape dictionary, the perplexity of each resulting sentence is computed with a language model, and the candidate results are compared and ranked to obtain the best result.
Data augmentation is the process of generating a large amount of data from a small amount of data; it improves the generalization ability and robustness of a model and addresses insufficient or imbalanced data. This embodiment accomplishes data enhancement as follows. First, two vocabularies are constructed, a common-synonym list and a professional-synonym list, and words appearing in the professional vocabulary are tagged with BIOES, where B denotes Begin, I Intermediate, E End, O Other, and S Single character. Data enhancement is then achieved through synonym replacement, random deletion, random exchange, back-translation, noise injection, and grammar transformation.
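The BIOES protection tagging mentioned above can be sketched like this; ASCII tokens stand in for Chinese words, and one tag character is produced per character of a term, which is an assumption about how the scheme is applied.

```python
# Tag professional-vocabulary terms with BIOES so that augmentation can skip
# protected spans; all other tokens get O per character.

def bioes_tag(tokens, professional):
    """Return one tag string per token: B/I/E across a multi-character
    professional term, S for a single-character one, O elsewhere."""
    tags = []
    for tok in tokens:
        if tok in professional:
            tags.append("S" if len(tok) == 1 else "B" + "I" * (len(tok) - 2) + "E")
        else:
            tags.append("O" * len(tok))
    return tags

print(bioes_tag(["ab", "x", "loan"], {"loan", "x"}))  # → ['OO', 'S', 'BIIE']
```

The random-deletion operation described earlier can then check the tag and refuse to delete anything marked B/I/E/S.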
Wherein synonym replacement: randomly extracting words with a certain proportion from sentences to complete synonym replacement, replacing common words with a common word list, replacing professional words with words of a professional word list, and replacing with an indefinite probability p. Wherein the probability p satisfies a normal distribution of 0 to 0.3.
Random deletion: during data enhancement it was observed that semantic features generally lie within a window around the target entity, so to avoid accidentally destroying keywords and their context features, professional vocabulary receives special treatment — the BIOES tags guarantee that it is never randomly deleted. Every other word in the sentence is deleted with a variable probability p, where p follows a normal distribution over [0, 0.3]. Numbers are also left untouched, since numbers in collection scenarios carry precise meaning.
Random swapping: each word in the sentence is swapped with an adjacent word with a variable probability p, where p follows a normal distribution over [0, 0.3].
Back translation: the text is paraphrased by machine-translating it into another language and back; languages such as English, French and Italian are typically used as the intermediate language.
Noise injection: noise is randomly injected into the text — for example, shuffling a sentence, or masking a random word with a placeholder — which makes the model more robust to perturbations.
Grammar transformation: an active sentence is converted into a passive sentence, or a passive sentence into an active one, by grammatical rules.
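Three of the operations above — synonym replacement, random deletion with professional vocabulary and digits protected, and random swapping — can be sketched as follows. For simplicity the probability p is passed in per call rather than sampled from the [0, 0.3] distribution the embodiment describes, and the synonym tables are placeholders.

```python
import random

# Sketches of three augmentation operations. `synonyms` maps a word to its
# replacement candidates (common or professional list); `protected` is the
# set of BIOES-marked professional tokens that must survive deletion.

def synonym_replace(tokens, synonyms, p, rng):
    return [rng.choice(synonyms[t]) if t in synonyms and rng.random() < p else t
            for t in tokens]

def random_delete(tokens, protected, p, rng):
    kept = [t for t in tokens
            if t in protected or t.isdigit() or rng.random() >= p]
    return kept or tokens  # never delete the whole sentence

def random_swap(tokens, p, rng):
    out = list(tokens)
    for i in range(len(out) - 1):
        if rng.random() < p:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out
```

Back translation, noise injection, and grammar transformation need external translation or parsing machinery and are not sketched here.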
In NLP (Natural Language Processing), the input to adversarial learning is discrete, so the perturbation cannot be added directly to the original data; instead it is applied to the continuous embedding. This embodiment adopts PGD (Projected Gradient Descent), an improvement on FGM (Fast Gradient Method): compared with FGM's single-step attack, PGD uses a "small step, many steps" strategy. Concretely, each step performs one forward and backward propagation, computes the perturbation from the gradient (grad), and adds it to the embedding; if the accumulated perturbation exceeds the allowed range, it is projected back into that range.
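The "small step, many steps, project back" inner loop can be illustrated numerically on a stand-in loss. This is not the embodiment's embedding-level implementation — the loss here is simply L(x + δ) = ‖x + δ‖², chosen because its gradient is known in closed form — but the projection logic is the same.

```python
import numpy as np

# Minimal PGD inner loop. Each of K small steps moves delta along the
# gradient direction (to *maximize* the loss), then projects delta back
# into the epsilon-ball — the strategy contrasted with FGM's single step.

def pgd_perturbation(x, grad_fn, epsilon=0.5, alpha=0.1, steps=5):
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x + delta)
        delta = delta + alpha * g / (np.linalg.norm(g) + 1e-12)
        norm = np.linalg.norm(delta)
        if norm > epsilon:                      # project back into the ball
            delta = delta * (epsilon / norm)
    return delta

# Stand-in loss ||v||^2 has gradient 2v.
x = np.array([1.0, 0.0])
delta = pgd_perturbation(x, lambda v: 2 * v)
```

In training, `grad_fn` would be the gradient of the task loss with respect to the embedding, obtained from the backward pass.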
Adversarial learning adds carefully chosen, artificially synthesized noise to the natural input data of a deep neural network in order to confuse it, thereby strengthening the network's resistance to interference and improving the model's robustness.
The basic idea of adversarial learning is the min-max formulation, as follows:
$$\min_{\theta}\ \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\|\delta\|\le\epsilon} L(\theta,\ x+\delta,\ y)\right]$$
where the inner max is a maximization process: within the perturbation range ‖δ‖ ≤ ε, it searches the sample space for the adversarial samples that maximize the loss L. An adversarial sample is the original sample x (the input) combined with a perturbation term δ obtained by some means. The outer min operates over the sample set D: by updating the model parameters θ, the expected loss of the model over the adversarial sample set is minimized.
Thus the original data passes through the error-correction module, the data-enhancement module, and the adversarial-learning module; after this preprocessing, the data has been cleaned and augmented before training begins.
In a more specific application scenario, the sklearn module can be used to randomly sample the data and divide it into a training set, a test set and a validation set at a ratio of 7:2:1. The training-set data is used to train the deep learning model; the validation-set data is used for verification, avoiding the over-fitting problem and ensuring the model's robustness and generalization ability; the test-set data is used to predict unknown data; and continuous model adjustment and parameter tuning improve the model's accuracy, recall and F1 value.
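The 7:2:1 split can be obtained with sklearn's `train_test_split` applied twice; a dependency-free sketch of the same idea is:

```python
import random

# Shuffle indices once, then slice 70% / 20% / 10% for train / test / val.

def split_721(data, seed=42):
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    n_train = int(len(data) * 0.7)
    n_test = int(len(data) * 0.2)
    train = [data[i] for i in idx[:n_train]]
    test = [data[i] for i in idx[n_train:n_train + n_test]]
    val = [data[i] for i in idx[n_train + n_test:]]
    return train, test, val
```

With sklearn the equivalent is two calls: first split off 70% for training, then split the remaining 30% into 2/3 test and 1/3 validation.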
Referring to fig. 3, fig. 3 is a schematic structural diagram of a keyword extraction model according to an embodiment of the present application.
As shown in fig. 3, Input is the original text data after the preceding data processing; a tokenization step is required to convert Chinese characters into vectors. Chinese cannot be segmented by spaces the way English can; moreover, there is no universal segmentation standard and new words are continuously created, so no general-purpose Chinese word-segmentation tool exists, and a new segmentation method and strategy usually has to be built for each task and domain. This embodiment therefore does not rely on word segmentation. Instead, embeddings are built at two different granularities, character granularity and word granularity. For character granularity, the skip-gram variant of the word2vec model is used — that is, given the character Wi, its context characters are predicted. The character vocabulary is typically small relative to the word vocabulary, which makes individual character semantics less precise than word semantics, but also makes character granularity better at modelling words that appear rarely in training or are absent from the vocabulary (OOV, out of vocabulary); for word-granularity vectors, the BERT (Bidirectional Encoder Representations from Transformers) model is used. The input passes through both the word2vec and the BERT embedding to obtain character-granularity and word-granularity vector representations, which are connected by a [SEP] flag to form a brand-new spliced vector representation.
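The skip-gram objective mentioned above — predict the context tokens given the center token Wi — reduces, at the data level, to generating (center, context) training pairs within a window. A minimal sketch of that pair generation (the embeddings themselves would come from training word2vec or from BERT):

```python
# Generate skip-gram (center, context) pairs: for each position i, every
# other token within `window` positions of i becomes a prediction target.

def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

Training word2vec then amounts to fitting embeddings so that each center token scores its paired context tokens highly.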
Referring to fig. 4, fig. 4 is a schematic structural diagram of a Bi-LSTM module of the keyword extraction model according to the embodiment of the present application.
The spliced vector then enters the Bi-LSTM module for primary feature extraction. A commonly used single-layer LSTM (Long Short-Term Memory network) structure extracts only surface-level features: it can learn information front-to-back but not back-to-front. For example, in "this package is not well suited for me, and the previous package was fine", "not well suited" modifies "package", but a forward-only pass cannot learn this comparative information. The Bi-LSTM structure captures bidirectional semantic dependencies better by concatenating the features learned by a forward LSTM layer and a backward LSTM layer, completing the processing of bidirectional information. When examining collection-urging sentences, it was found that adjacent words are very closely related; to capture neighbouring information fully, the Bi-LSTM structure is modified. Specifically, as shown by the oblique lines in fig. 4, in the backward LSTM of the Bi-LSTM, the input at time t consists not only of the embedding-layer vector and the input at time t, but also — newly added — of the embedding-layer vector at time t+1; this sliding-window operation increases the cohesion of adjacent vectors.
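The sliding-window modification to the backward pass can be sketched as input construction: each backward-LSTM input concatenates the embedding at time t with the embedding at time t+1 (the last position, which has no t+1 neighbour, is zero-padded here — the padding choice is an assumption for illustration).

```python
import numpy as np

# Build the modified backward-LSTM inputs: concatenate e_t with e_{t+1}
# along the feature axis, zero-padding the final step.

def backward_inputs(embeddings):
    T, d = embeddings.shape
    nxt = np.vstack([embeddings[1:], np.zeros((1, d))])
    return np.concatenate([embeddings, nxt], axis=1)  # shape (T, 2d)
```

The backward LSTM then consumes these (T, 2d) rows in reverse time order instead of the plain (T, d) embeddings.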
Referring to fig. 5, fig. 5 is a schematic structural diagram of an attention layer of a keyword extraction model according to an embodiment of the present application.
After the Bi-LSTM layer's feature extraction, the information enters the attention layer for deep information extraction. The attention layer can be understood simply as shifting the model's focus from the global input to the core key regions. The attention mechanism works much like the way a person looks at a picture: we do not take in the entire image, but concentrate attention on its focal point. The mechanism also has three advantages: fewer parameters, higher speed, and better effect. Specifically, its structure makes model complexity lower than that of CNN (Convolutional Neural Networks) and RNN (Recurrent Neural Networks), with fewer parameters and correspondingly lower computational demands; it removes RNN's inability to parallelize, so the model can compute in parallel and runs faster; and because attention targets the key points, it can grasp the important information even in a long text. The structure of attention is shown in fig. 5 and can be explained in three steps: first, compute the similarity between Q (query) and K (key) to obtain weights; second, normalize the weights into directly usable weights; third, compute the weighted sum of the weights and V (value). The idea of the attention structure can thus be summed up as "weighted sum": where the RNN era relied on rote sequential memorization, a model equipped with attention learns to grasp the essentials, achieving an integrated understanding of the text.
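The three steps just listed — similarity of Q and K, normalization, weighted sum over V — correspond to standard scaled dot-product attention, sketched here in NumPy:

```python
import numpy as np

# Scaled dot-product attention in three steps matching the text:
#   (1) similarity of Q and K, (2) softmax normalization into usable
#   weights, (3) weighted sum over V.

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # step 1: similarity
    scores = scores - scores.max(axis=-1, keepdims=True)  # stability shift
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # step 2
    return weights @ V                              # step 3: weighted sum
```

Each output row is a convex combination of the rows of V, weighted by how strongly the corresponding query matches each key.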
The last layer of the model structure adds a label-filter layer, mainly to handle cases where the predictions of the character-vector and word-vector branches are inconsistent. As the number of training epochs grows, the model may over-fit, producing more predicted keywords than actually exist; therefore the intersection of the character-granularity prediction result and the word-granularity prediction result is taken as the final predicted keyword output.
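The label-filter step itself is a set intersection; a one-function sketch (with made-up keyword strings) is:

```python
# Keep only keywords predicted by BOTH granularity branches, curbing the
# over-prediction that accompanies over-fitting. Output is sorted for
# determinism; the example keywords are invented.

def label_filter(char_preds, word_preds):
    return sorted(set(char_preds) & set(word_preds))
```

Keywords that only one branch proposes are treated as likely over-fitting artifacts and dropped.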
Therefore, in this embodiment, data enhancement and adversarial learning are applied to the error-corrected data before the keyword extraction model is trained. This increases the data volume while improving the model's generalization ability and robustness and mitigating data insufficiency and imbalance; and by using attention together with the improved Bi-LSTM structure for feature extraction, key information can be extracted from the data effectively, improving the performance and effect of keyword extraction.
The following introduces the training apparatus for the keyword extraction model provided in the embodiments of the present application; the training apparatus described below and the training method described above may be referred to in correspondence with each other.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a training apparatus for keyword extraction models according to an embodiment of the present disclosure.
In this embodiment, the apparatus may include:
a data error correction module 100, configured to perform error correction processing on original data to obtain error-corrected data;
the data enhancement module 200 is configured to perform data enhancement processing on the error-corrected data based on the synonym vocabulary to obtain first input data;
the adversarial learning module 300 is configured to perform adversarial learning processing on the error-corrected data based on a projected gradient descent approach to obtain second input data;
and the model training module 400 is configured to train the keyword extraction model, based on attention and the improved Bi-LSTM structure, with the first input data and the second input data to obtain a trained keyword model.
Optionally, the apparatus may further include:
the extraction module is used for acquiring data to be processed; and processing the data to be processed by adopting the trained keyword model to obtain a keyword extraction set.
Optionally, the data error correction module 100 is specifically configured to recognize original voice data through a voice recognition technology to obtain text data; carrying out error detection on the text data based on the word segmentation result of the text data to obtain an error position candidate set; and performing error correction on the text data based on the error position candidate set to obtain error-corrected data.
Optionally, the data enhancement module 200 is specifically configured to construct a standard synonym vocabulary and a professional synonym vocabulary based on database data, and to perform data enhancement operations on the error-corrected data using the standard synonym vocabulary and the professional synonym vocabulary to obtain first input data; wherein the data enhancement operations comprise at least: synonym replacement, random deletion, random swapping, back translation, noise injection, and grammar transformation.
Optionally, the adversarial learning module 300 is specifically configured to perform noise generation processing on the error-corrected data based on a projected gradient descent approach to obtain a noise component, and to add the noise component to the error-corrected data to obtain second input data.
An embodiment of the present application further provides a server, including:
a memory for storing a computer program;
a processor for implementing the steps of the training method as described in the above embodiments when executing the computer program.
Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the training method according to the above embodiments.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The present application provides a method, an apparatus, a server and a computer readable storage medium for training a keyword extraction model. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.

Claims (10)

1. A method for training a keyword extraction model is characterized by comprising the following steps:
carrying out error correction processing on the original data to obtain error-corrected data;
performing data enhancement processing on the error-corrected data based on the synonym vocabulary to obtain first input data;
performing adversarial learning processing on the error-corrected data based on a projected gradient descent approach to obtain second input data;
and training a keyword extraction model based on attention and an improved Bi-LSTM structure with the first input data and the second input data to obtain the trained keyword model.
2. The training method of claim 1, further comprising:
acquiring data to be processed;
and processing the data to be processed by adopting the trained keyword model to obtain a keyword extraction set.
3. The training method according to claim 2, wherein processing the data to be processed by using the trained keyword model to obtain a keyword extraction set comprises:
vectorizing the data to be processed to obtain a multi-dimensional matrix representation;
performing primary feature extraction on the multi-dimensional matrix representation through a Bi-LSTM module to obtain primary features;
performing deep feature extraction on the primary features through the attention layer to obtain deep features;
and processing the deep features to obtain the keyword extraction set.
4. A training method as claimed in claim 3, wherein vectorizing the data to be processed to obtain a multi-dimensional matrix representation comprises:
performing vectorized representation at character granularity and vectorized representation at word granularity on the data to be processed, respectively, to obtain a character matrix representation and a word matrix representation;
taking the character matrix representation and the word matrix representation as the multi-dimensional matrix representation.
5. The training method of claim 1, wherein the error correction processing of the original data to obtain error-corrected data comprises:
recognizing original voice data through a voice recognition technology to obtain text data;
carrying out error detection on the text data based on the word segmentation result of the text data to obtain an error position candidate set;
and performing error correction on the text data based on the error position candidate set to obtain the corrected data.
6. The training method of claim 1, wherein performing data enhancement processing on the error-corrected data based on a synonym vocabulary to obtain first input data comprises:
constructing a standard synonym word list and a professional synonym word list based on database data;
performing data enhancement operations on the error-corrected data by adopting the standard synonym vocabulary and the professional synonym vocabulary to obtain first input data; wherein the data enhancement operations comprise at least: synonym replacement, random deletion, random swapping, back translation, noise injection, grammar transformation.
7. The training method of claim 1, wherein performing adversarial learning processing on the error-corrected data based on a projected gradient descent approach to obtain second input data comprises:
performing noise generation processing on the error-corrected data based on the projection gradient descent mode to obtain a noise component;
adding the noise component to the error-corrected data to obtain the second input data.
8. A training device for a keyword extraction model is characterized by comprising:
the data error correction module is used for carrying out error correction processing on the original data to obtain error-corrected data;
the data enhancement module is used for carrying out data enhancement processing on the error-corrected data based on the synonym vocabulary to obtain first input data;
the adversarial learning module is used for performing adversarial learning processing on the error-corrected data based on a projected gradient descent approach to obtain second input data;
and the model training module is used for training a keyword extraction model based on attention and an improved Bi-LSTM structure with the first input data and the second input data to obtain the trained keyword model.
9. A server, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the training method as claimed in any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the training method according to any one of claims 1 to 7.
CN202111602825.7A 2021-12-24 2021-12-24 Training method of keyword extraction model and related device Pending CN114239555A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111602825.7A CN114239555A (en) 2021-12-24 2021-12-24 Training method of keyword extraction model and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111602825.7A CN114239555A (en) 2021-12-24 2021-12-24 Training method of keyword extraction model and related device

Publications (1)

Publication Number Publication Date
CN114239555A true CN114239555A (en) 2022-03-25

Family

ID=80762828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111602825.7A Pending CN114239555A (en) 2021-12-24 2021-12-24 Training method of keyword extraction model and related device

Country Status (1)

Country Link
CN (1) CN114239555A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114970523A (en) * 2022-05-20 2022-08-30 浙江省科技信息研究院 Topic prompting type keyword extraction method based on text semantic enhancement
CN114970523B (en) * 2022-05-20 2022-11-29 浙江省科技信息研究院 Topic prompting type keyword extraction method based on text semantic enhancement

Similar Documents

Publication Publication Date Title
CN113011533B (en) Text classification method, apparatus, computer device and storage medium
CN110096570B (en) Intention identification method and device applied to intelligent customer service robot
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN106599032B (en) Text event extraction method combining sparse coding and structure sensing machine
CN108549637A (en) Method for recognizing semantics, device based on phonetic and interactive system
CN110347787B (en) Interview method and device based on AI auxiliary interview scene and terminal equipment
CN112905795A (en) Text intention classification method, device and readable medium
US20220129450A1 (en) System and method for transferable natural language interface
CN115357719B (en) Power audit text classification method and device based on improved BERT model
CN115599901B (en) Machine question-answering method, device, equipment and storage medium based on semantic prompt
CN113962219A (en) Semantic matching method and system for knowledge retrieval and question answering of power transformer
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
CN112200664A (en) Repayment prediction method based on ERNIE model and DCNN model
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN114781651A (en) Small sample learning robustness improving method based on contrast learning
CN115544303A (en) Method, apparatus, device and medium for determining label of video
CN114742069A (en) Code similarity detection method and device
CN112488111B (en) Indication expression understanding method based on multi-level expression guide attention network
CN114239555A (en) Training method of keyword extraction model and related device
CN113705207A (en) Grammar error recognition method and device
CN116680407A (en) Knowledge graph construction method and device
CN110287396A (en) Text matching technique and device
CN113742445B (en) Text recognition sample obtaining method and device and text recognition method and device
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
CN112364633B (en) Character error acquisition and correction method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination