CN114781611A - Natural language processing method, language model training method and related equipment - Google Patents

Natural language processing method, language model training method and related equipment

Info

Publication number
CN114781611A
Authority
CN
China
Prior art keywords
language model
semantic
training
corpus
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210423601.8A
Other languages
Chinese (zh)
Inventor
王伟
张黔
陈焕坤
郑毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Runlian Software System Shenzhen Co Ltd
Original Assignee
Runlian Software System Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Runlian Software System Shenzhen Co Ltd filed Critical Runlian Software System Shenzhen Co Ltd
Priority to CN202210423601.8A priority Critical patent/CN114781611A/en
Publication of CN114781611A publication Critical patent/CN114781611A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the technical field of artificial intelligence, and discloses a natural language processing method, a language model training method and related equipment thereof, wherein the language model training method comprises the following steps: obtaining a corpus set; performing feature extraction on the corpus by using a plurality of feature extraction models to obtain a plurality of feature vectors corresponding to each document in the corpus; obtaining semantic vectors corresponding to the documents based on a plurality of feature vectors corresponding to the documents; clustering semantic vectors corresponding to all documents in the corpus by using a clustering model to obtain a plurality of semantic clusters; respectively training the language model by adopting reinforcement learning according to each semantic cluster to finally obtain parameters of the trained language model corresponding to each semantic cluster; and determining a final language model according to the parameters of the trained language model corresponding to each semantic cluster. The method and the device improve the training efficiency of the language model and reduce the resource consumption in the training process.

Description

Natural language processing method, language model training method and related equipment
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a natural language processing method, a language model training method, and related devices.
Background
At present, language models based on deep neural networks have achieved outstanding results in fields such as semantic understanding and text generation. However, most current language models rely on supervised learning supported by massive data, and model capacity keeps growing, which brings huge challenges to training and deployment. On the one hand, the amount of training sample data often reaches the TB level; on the other hand, in the prior art the existing training corpus is usually used directly, which greatly increases training time and resource consumption. Therefore, how to improve model training efficiency, shorten training time, and reduce resource consumption has become an urgent problem to be solved.
Disclosure of Invention
The application provides a natural language processing method, a language model training method and related equipment thereof, which are used for solving the problems of low model training efficiency and high consumption in the prior art.
In a first aspect, the present application provides a method for training a language model, including:
obtaining a corpus set;
performing feature extraction on the corpus by using a plurality of feature extraction models to obtain a plurality of feature vectors corresponding to each document in the corpus;
obtaining semantic vectors corresponding to the documents based on the plurality of feature vectors corresponding to the documents;
clustering semantic vectors corresponding to all the documents in the corpus by using a clustering model to obtain a plurality of semantic clusters;
respectively training the language model by adopting reinforcement learning according to each semantic cluster to finally obtain the parameters of the trained language model corresponding to each semantic cluster;
and determining a final language model according to the parameters of the trained language model corresponding to each semantic cluster.
Further, the multiple feature extraction models include a hidden feature extraction model, a topic feature extraction model and an entity feature extraction model, and the feature extraction of the corpus by using the multiple feature extraction models to obtain multiple feature vectors corresponding to each document in the corpus includes:
performing hidden feature extraction on each document in the corpus through the hidden feature extraction model to obtain a first feature vector corresponding to each document;
performing topic feature extraction on each document in the corpus by using the topic feature extraction model to obtain a second feature vector corresponding to each document;
and utilizing the entity feature extraction model to extract entity features of the documents in the corpus to obtain third feature vectors corresponding to the documents.
Further, the extracting the theme features of the documents in the corpus by using the theme feature extraction model to obtain the second feature vector corresponding to each document includes:
performing subject term extraction on each document in the corpus through the subject feature extraction model to obtain a plurality of subject terms and arranging the subject terms;
vectorizing the arranged plurality of subject terms through a Bert model under a subject feature extraction model to obtain a second feature vector corresponding to each document.
Further, the extracting the entity features of the documents in the corpus by using the entity feature extraction model to obtain a third feature vector corresponding to each document includes:
identifying entities in each document and relationships among the entities through a named entity identification technology and a relationship extraction technology in an entity feature extraction model;
constructing a knowledge graph based on the entities and the relationship between the entities;
and performing feature extraction on the knowledge graph through a graph convolution neural network in the entity feature extraction model to obtain a third feature vector.
Further, the obtaining of the semantic vector corresponding to each document based on the plurality of feature vectors corresponding to each document includes:
obtaining the weights of the first feature vector, the second feature vector and the third feature vector based on an analytic hierarchy process;
and according to the weights of the first feature vector, the second feature vector and the third feature vector, carrying out weighted summation on the first feature vector, the second feature vector and the third feature vector to obtain a semantic vector corresponding to the document.
Further, the training the language model by adopting reinforcement learning according to each semantic cluster includes:
in each training period, when the performance index of a language model corresponding to a semantic cluster reaches a preset threshold value, acquiring the current state information of the language model, and broadcasting the state information of the language model to the language model corresponding to each semantic cluster;
after the language model corresponding to each semantic cluster receives the state information, updating the parameters of the language model, and selecting a processing path according to a selection probability; the selection probability is obtained by processing the plurality of semantic vectors used in the training period through a deep learning neural network;
giving different benefits according to the processing path selected by the language model corresponding to each semantic cluster;
obtaining the total income of the training period according to the income of each language model;
and the deep learning neural network adjusts parameters according to the total income and trains through a plurality of training periods until the total income is converged.
Further, the determining the final language model according to the parameters of the trained language model corresponding to each semantic cluster includes:
after all training periods are finished, summarizing the final gradient data of the language model corresponding to each semantic cluster to a trainer corresponding to the same language model;
the trainer carries out average processing on the final gradient data corresponding to all the language models to obtain an average gradient;
and sending the average gradient to the language model corresponding to each semantic cluster to update the parameters of the language model to obtain the final language model.
In a second aspect, the present application further provides a natural language processing method, including:
acquiring text data to be processed;
and processing the text data to be processed according to the final language model to obtain a processing result corresponding to the text data to be processed.
In a third aspect, the present application further provides a language model training device, including:
the acquisition module is used for acquiring the corpus set;
the feature extraction module is used for performing feature extraction on the corpus by using a plurality of feature extraction models to obtain a plurality of feature vectors corresponding to all documents in the corpus;
a merging module, configured to obtain a semantic vector corresponding to each document based on the plurality of feature vectors corresponding to each document;
the clustering module is used for clustering semantic vectors corresponding to the documents in the corpus to obtain a plurality of semantic clusters;
the training module is used for training the language model by adopting reinforcement learning according to each semantic cluster to finally obtain parameters of the trained language model corresponding to each semantic cluster;
and the determining module is used for determining the final language model according to the parameters of the trained language model corresponding to each semantic cluster.
In a fourth aspect, the present application further provides a computer device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of language model training as described above.
In a fifth aspect, the present application further provides a non-transitory computer-readable storage medium having computer-readable instructions stored thereon, which when executed by a processor, implement the language model training method as described above.
Compared with the prior art, the natural language processing method, the language model training method and the related equipment thereof provided by the embodiment of the application have the following beneficial effects:
the method comprises the steps of extracting the characteristics of a corpus by using a plurality of characteristic extraction models through obtaining the corpus, obtaining a plurality of characteristic vectors corresponding to all documents in the corpus, and realizing multi-dimensional extraction of text characteristics in the corpus; obtaining semantic vectors corresponding to the documents based on the plurality of feature vectors corresponding to the documents, combining the plurality of feature vectors to obtain corresponding semantic vectors, realizing integration of text features, clustering the semantic vectors corresponding to the documents in the corpus by using a clustering model to obtain a plurality of semantic clusters, training the language model by adopting reinforcement learning according to the semantic clusters respectively to obtain parameters of the trained language model corresponding to the semantic clusters, and determining the final language model according to the parameters of the trained language model corresponding to the semantic clusters. By utilizing the strength degree of semantic association in corpus centralization, different semantic clusters are divided for parallel training, and a reinforced learning thought is adopted, so that the language model learns more deep language rules as soon as possible, the training time is shortened, the model convergence is accelerated, and the training overhead of the language model is reduced.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the description below are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without inventive efforts.
FIG. 1 is a schematic flowchart of a language model training method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a language model training method according to another embodiment of the present application;
FIG. 3 is a flowchart of one embodiment of step S220 in FIG. 2;
FIG. 4 is a flowchart of one embodiment of step S230 in FIG. 2;
FIG. 5 is a flowchart of another embodiment of step S5 of FIG. 1;
FIG. 6 is a block diagram of a language model training apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the foregoing drawings are used for distinguishing between different objects and not for describing a particular sequential order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. One skilled in the art will explicitly or implicitly appreciate that the embodiments described herein can be combined with other embodiments.
The application provides a language model training method. Referring to fig. 1, fig. 1 is a schematic flowchart of a language model training method according to an embodiment of the present application.
In this embodiment, the language model training method includes:
s1, obtaining a corpus set;
Specifically, the corpus set can be obtained directly from a database, or obtained directly from other connected systems. The corpus set is a collection of document corpora used for language model training, and each document carries a label.
S2, performing feature extraction on the corpus by using a plurality of feature extraction models to obtain a plurality of feature vectors corresponding to each document in the corpus;
specifically, a first feature vector, a second feature vector and a third feature vector corresponding to each document in the corpus are obtained by extracting features of the corpus through a hidden feature extraction model, a topic feature extraction model and an entity feature extraction model.
Further, as shown in fig. 2, the multiple feature extraction models include a hidden feature extraction model, a topic feature extraction model, and an entity feature extraction model, and the extracting features of the corpus by using the multiple feature extraction models to obtain multiple feature vectors corresponding to each document in the corpus includes:
s210, extracting the hidden features of the documents in the corpus through the hidden feature extraction model to obtain first feature vectors corresponding to the documents;
s220, extracting the theme features of the documents in the corpus by using the theme feature extraction model to obtain second feature vectors corresponding to the documents;
and S230, performing entity feature extraction on each document in the corpus by using the entity feature extraction model to obtain a third feature vector corresponding to each document.
Specifically, the implicit feature extraction model is obtained by directly extracting implicit features of a document to obtain a first feature vector, the first feature vector is a feature vector of a document level, and the implicit feature extraction model can be obtained by training based on a TextCNN model;
the topic feature extraction model is used for extracting keywords in the document, and after the keywords are sequenced, vectorization is carried out to obtain a second feature vector, wherein the second feature vector is a whole vector of a topic word of the document;
the entity feature extraction model is used for extracting entities in all documents to construct a knowledge graph, and then extracting feature vectors in the knowledge graph by using a graph convolution neural network to obtain the third feature vectors.
The document is processed by utilizing various feature extraction models, so that feature vectors of multiple dimensions can be obtained, and text features can be better embodied.
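For illustration, the following is a minimal sketch of how the three extractors could be combined per document. The encode interface and the way the three models are wired together are assumptions of the sketch; the application only specifies the three kinds of features (hidden features via a TextCNN-based model, topic features via topic words and a Bert model, entity features via a knowledge graph and a graph convolution network).

```python
from typing import Dict, List
import numpy as np

def extract_document_features(documents: List[str],
                              hidden_model,   # e.g. a TextCNN-based encoder
                              topic_model,    # topic-word + Bert vectorizer
                              entity_model    # NER + knowledge graph + GCN pipeline
                              ) -> List[Dict[str, np.ndarray]]:
    """Return, for each document, the three feature vectors of steps
    S210-S230: hidden, topic and entity features."""
    features = []
    for doc in documents:
        features.append({
            "hidden": hidden_model.encode(doc),  # first feature vector
            "topic": topic_model.encode(doc),    # second feature vector
            "entity": entity_model.encode(doc),  # third feature vector
        })
    return features
```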
Still further, as shown in fig. 3, the performing, by using the topic feature extraction model, topic feature extraction on each document in the corpus to obtain a second feature vector corresponding to each document includes:
s221, extracting subject terms of each document in the corpus through the subject feature extraction model to obtain a plurality of subject terms and arranging the subject terms;
s222, vectorizing the arranged plurality of subject terms through a Bert model under a subject feature extraction model to obtain a second feature vector corresponding to each document.
Specifically, subject words are extracted from each document in the corpus through the subject feature extraction model, with the number of subject words set according to needs; the plurality of subject words are spliced and ordered to form a sequence, the sequence is input into a trained Bert model, and the Bert model converts it to output the second feature vector.
The topic feature extraction model is used for extracting topic words, after the topic words are sorted, the Bert model is input for vectorization processing, so that the topic features of the document are obtained, and the model can learn more internal language rules earlier during subsequent training.
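As an illustration of steps S221-S222, the following sketch uses jieba's TF-IDF keyword extraction as a stand-in for the subject-word extractor and the HuggingFace bert-base-chinese checkpoint for vectorization; both choices, and the use of the [CLS] embedding, are assumptions, since the application only requires that topic words be extracted, ordered, and converted by a Bert model.

```python
import jieba.analyse
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def topic_feature_vector(document: str, top_k: int = 10) -> torch.Tensor:
    # Extract and order the top-k subject words (S221).
    topic_words = jieba.analyse.extract_tags(document, topK=top_k)
    sequence = " ".join(topic_words)
    # Vectorize the ordered subject-word sequence with Bert (S222);
    # here the [CLS] embedding is taken as the second feature vector.
    inputs = tokenizer(sequence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = bert(**inputs)
    return outputs.last_hidden_state[:, 0].squeeze(0)
```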
Still further, as shown in fig. 4, the performing, by using the entity feature extraction model, entity feature extraction on each document in the corpus to obtain a third feature vector corresponding to each document includes:
s231, identifying entities in the documents and relations among the entities through a named entity identification technology and a relation extraction technology in an entity feature extraction model;
s232, constructing a knowledge graph based on the entities and the relationship among the entities;
and S233, performing feature extraction on the knowledge graph through a graph convolution neural network in the entity feature extraction model to obtain a third feature vector.
Specifically, entities in each document are identified by a named entity recognition technique; in this application, a Bert-BiLSTM-CRF model is adopted for identifying the entities in the documents. The relations between the entities are extracted by a relation extraction technique, and after the relations between the entities are obtained, the embedding vectors of the relations can be computed by adopting TransE or its subsequent improved variants.
A knowledge graph is constructed by building entity-relation-entity triples from the identified entities and relations.
Performing feature extraction on the knowledge graph through a graph convolution neural network in the entity feature extraction model to obtain a third feature vector, wherein the number of layers of the graph convolution neural network can be set according to needs. For example, when there are n vertices, i.e. n entities, in the knowledge graph and the embedded vector of each vertex has dimension m, a matrix $X \in \mathbb{R}^{n \times m}$ is defined, and the in-out degree matrix $M$ is defined from the adjacency matrix $A$ of the nodes in the knowledge graph as the diagonal matrix with entries

$$D_{gg} = \sum_{e} A_{ge}.$$

The layer-by-layer propagation of the graph convolution network is

$$L_{j+1} = \sigma\left(M^{-\frac{1}{2}} A M^{-\frac{1}{2}} L_{j} W_{j}\right), \qquad L_{0} = X,$$

wherein $j$ denotes the layer number of the graph convolution network, $D_{gg}$ denotes the data of the g-th row and g-th column of the in-degree matrix, namely the data on the diagonal, $A_{ge}$ denotes the data of the g-th row and e-th column of the adjacency matrix, $W_{0}$ (and generally $W_{j}$) is a weight matrix, and $\sigma$ is the activation function (ReLU, Sigmoid, etc. can be used); the output of the last layer is the third feature vector.
the relation between the entity and the entity is extracted from the document to obtain the knowledge graph, and the feature extraction is carried out on the knowledge graph by utilizing the graph convolution neural network, so that the model can learn more internal language rules earlier during subsequent training.
S3, obtaining semantic vectors corresponding to the documents based on the feature vectors corresponding to the documents;
specifically, weighted summation is performed according to a first feature vector, a second feature vector and a third feature vector corresponding to each document, so as to obtain a semantic vector corresponding to each document.
Further, the obtaining of the semantic vector corresponding to each document based on the plurality of feature vectors corresponding to each document includes:
obtaining the weights of the first feature vector, the second feature vector and the third feature vector based on an analytic hierarchy process;
and according to the weights of the first feature vector, the second feature vector and the third feature vector, carrying out weighted summation on the first feature vector, the second feature vector and the third feature vector to obtain a semantic vector corresponding to the document.
Specifically, the weights of the first feature vector, the second feature vector and the third feature vector are obtained through an analytic hierarchy process; the Analytic Hierarchy Process (AHP) is a decision-making method that performs qualitative and quantitative analysis by decomposing the elements relevant to the decision into levels such as targets, criteria and schemes. According to the weights of the first feature vector, the second feature vector and the third feature vector, the three feature vectors are weighted and summed to obtain the semantic vector corresponding to the document.
And weighting and summing the feature vectors based on the weight obtained by the analytic hierarchy process to obtain semantic vectors corresponding to the documents, so that the complete extraction of the document features is realized, and the model can learn more internal language rules earlier during subsequent training.
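A minimal sketch of the weighted fusion follows; the weight values shown are placeholders, and it is assumed the three feature vectors have already been mapped to a common dimension (the application does not fix the dimensions).

```python
import numpy as np

def semantic_vector(v_hidden, v_topic, v_entity, w1=0.5, w2=0.3, w3=0.2):
    """Weighted sum of the three feature vectors into one semantic vector.
       Assumes the weights come from the analytic hierarchy process and that
       the three vectors share the same dimension."""
    assert abs(w1 + w2 + w3 - 1.0) < 1e-6
    return (w1 * np.asarray(v_hidden)
            + w2 * np.asarray(v_topic)
            + w3 * np.asarray(v_entity))
```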
S4, clustering semantic vectors corresponding to the documents in the corpus by using a clustering model to obtain a plurality of semantic clusters;
specifically, semantic vectors corresponding to each document are clustered by using a clustering model, and since the number of trainers may be limited in the application in the following process, the clustering model used in the application is a K-means clustering model, wherein K is the number of trainers. And when the number of the trainers is not limited, processing can be performed by adopting a mean shift clustering model and the like, and clustering is performed according to the real situation of the semantic vector corresponding to each document.
The K-means clustering model is a clustering analysis algorithm for iterative solution, and the method comprises the steps of dividing data into K groups in advance, randomly selecting K objects as initial clustering centers, calculating the distance between each object and each seed clustering center, and allocating each object to the nearest clustering center. The cluster centers and the objects assigned to them represent a cluster.
The mean shift clustering model is a sliding window based algorithm to find dense regions of data points. This is a centroid based algorithm that locates the center point of each group/class by updating the candidate points for the center point to the mean of the points within the sliding window. And then removing similar windows of the candidate windows to finally form a central point set and corresponding groups.
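As a sketch of step S4, scikit-learn's KMeans can stand in for the K-means clustering model described above, with K set to the number of trainers; the library choice is an assumption of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_semantic_vectors(semantic_vectors: np.ndarray, num_trainers: int):
    """semantic_vectors: one row per document. Returns the semantic-cluster
    label of each document and the document indices grouped per cluster."""
    labels = KMeans(n_clusters=num_trainers, n_init=10).fit_predict(semantic_vectors)
    clusters = {k: np.where(labels == k)[0] for k in range(num_trainers)}
    return labels, clusters
```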
S5, respectively training the language model by adopting reinforcement learning according to each semantic cluster, and finally obtaining the parameters of the trained language model corresponding to each semantic cluster;
specifically, the semantic vectors in each semantic cluster are used for training the language model respectively, that is, when there are a plurality of semantic clusters, a plurality of language models are trained correspondingly at the same time, and the training mode is performed in a reinforcement learning mode, so that the total profit is converged and maximized finally, and the parameters of the trained language model corresponding to each semantic cluster are obtained finally.
Further, as shown in fig. 5, the training the language model by reinforcement learning according to each semantic cluster includes:
s501, in each training period, when the performance index of a language model corresponding to a semantic cluster reaches a preset threshold value, acquiring the current state information of the language model, and broadcasting the state information of the language model to the language models corresponding to the semantic clusters;
s502, after the language model corresponding to each semantic cluster receives the state information, updating the parameters of the language model, and selecting a processing path according to the selection probability; the selection probability is obtained by processing the semantic vectors used in the training period through a deep learning neural network;
s503, giving different benefits according to the processing path selected by the language model corresponding to each semantic cluster;
s504, obtaining the total income of the training period according to the income of each language model;
and S505, the deep learning neural network adjusts parameters according to the total income, and training is carried out in a plurality of training periods until the total income is converged.
Specifically, one language model is trained for each semantic cluster, that is, each language model is trained with the semantic vectors in a different semantic cluster. In each training period, the performance index of each language model is monitored throughout the training process; when the performance index of a language model reaches a preset threshold value, the current state information of that language model is acquired, wherein the state information comprises the number Ns of samples used by the time its training finishes and its current gradient information, and this state information is broadcast to the language models corresponding to the other semantic clusters.
After receiving the state information, the other language models have three processing paths:

1) End the training of this period immediately, and record the current local performance index value (denoted F1_o) and the number of samples already used, Nt; compute the average of the gradient sent by the other party and the current local gradient, update the local loss function, and obtain a new performance index value (denoted F1_N). If F1_o is less than F1_N, one benefit is given; otherwise, another benefit is given (in either case the benefit is computed from the recorded performance index values and the sample counts Nt and Ns).

2) Record the current local performance index value (denoted F2_o); compute the average of the gradient sent by the other party and the current local gradient, update the local loss function, and continue training until the end of the training period, obtaining a new performance index value (denoted F2_N). Let the number of samples used in a full training period be Nb. If F2_o is less than F2_N, one benefit is given; otherwise, another benefit is given.

3) Record the current local performance index value (denoted F3_o); compute the average of the gradient sent by the other party and the current local gradient, and update the local loss function; on the basis of the number of samples Nt already used by the local training, randomly select a further ΔN samples, train on them, and then end the training period, obtaining a new performance index value (denoted F3_N). If F3_o is less than F3_N, one benefit is given; otherwise, another benefit is given.
A processing path is selected according to the selection probability; the selection probability is obtained by processing the semantic vectors used in the training period through a deep learning neural network. The specific process is as follows:
The policy gradient method from the field of reinforcement learning is adopted for optimization, so that the total training benefit is maximized. A multi-layer neural network whose output classes correspond to the 3 processing paths is trained. Taking a two-layer network as an example, the input is a vector v (i.e., a semantic vector used in the training period); with a hidden-layer weight matrix w1, a ReLU activation function and a bias b1, the output is o1 = relu(w1·v + b1); with a second hidden-layer weight matrix w2 and bias b2, o2 = relu(w2·o1 + b2); the probabilities of taking each processing path are then obtained through a softmax layer, o3 = softmax(o2). More hidden layers can be adopted in practice to obtain better results. The processing path with the maximum probability is selected for processing, and the corresponding benefit is obtained;
After a training period is finished, the benefits corresponding to the language models are summed up to obtain the total training benefit of that period; across the training periods the overall training gain is obtained according to the following formula:

$$S = S_{1} + \sum_{i=1}^{n-1} \gamma^{i}\, S_{i+1},$$

wherein $\gamma$ is the gain attenuation coefficient, $n$ is the number of training periods, $i$ runs from 1 to $(n-1)$, and $S_{t}$ is the total training benefit of the $t$-th training period.
The deep learning neural network adjusts its parameters according to the total benefit of each training period, and training proceeds over a plurality of training periods until the total benefit converges. The network update can be optimized by methods such as stochastic gradient descent (SGD) and Adam.
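For illustration, the discounted accumulation of per-period benefits and a generic policy-gradient style parameter update might look like the following sketch; the return computation matches the reconstruction above, while the update rule and learning rate are assumptions (in practice SGD or Adam would be used, as stated).

```python
import numpy as np

def overall_gain(period_benefits, gamma=0.9):
    """period_benefits: [S_1, S_2, ..., S_n]; discounted total gain."""
    return sum((gamma ** t) * s for t, s in enumerate(period_benefits))

def policy_gradient_step(theta, grad_log_prob, gain, lr=1e-3):
    """Generic REINFORCE-style ascent on the expected gain:
       theta <- theta + lr * gain * d/d(theta) log pi(chosen path)."""
    return theta + lr * gain * grad_log_prob
```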
The training samples are divided into different semantic clusters for parallel training by using the strength of semantic association between training corpora, and a reinforcement learning mode is adopted, so that the model learns more internal language rules as early as possible, the convergence of the model is accelerated, and the training overhead of the language model is reduced.
In other embodiments of the present application, a training agent may be configured for the language model corresponding to each semantic cluster; the agent is responsible for executing the training algorithm program, applying for resources, and communicating with the other agents throughout the training process.
And S6, determining the final language model according to the parameters of the trained language model corresponding to each semantic cluster.
Parameters of the trained language model corresponding to each semantic cluster, mainly gradient data, are sent to the same language model for gathering, averaging is carried out to obtain an average gradient, the average gradient is sent to the language model corresponding to each semantic cluster for parameter updating, the updated language models are gathered, and a final language model is obtained.
The final language model can complete tasks such as machine translation, part of speech tagging, syntactic analysis and classification.
Further, the determining the final language model according to the parameters of the trained language model corresponding to each semantic cluster includes:
after all training periods are finished, summarizing the final gradient data of the language model corresponding to each semantic cluster to a trainer corresponding to the same language model;
the trainer carries out average processing according to the final gradient data corresponding to all the language models to obtain an average gradient;
and sending the average gradient to the language model corresponding to each semantic cluster so as to update the parameters of the language model to obtain the final language model.
Specifically, after all training periods are finished, the final gradient data of the language model corresponding to each semantic cluster are gathered at the trainer corresponding to one and the same language model; for example, they may be gathered at the trainer of the language model with the lowest workload at that time. The average of all gradients is then calculated, the average gradient is sent to the language model corresponding to each semantic cluster, the parameters of the language models are updated, and the updated language models are combined to obtain the final language model.
And finally, the average gradient is calculated, and each language model updates the parameters thereof by using the average gradient, so that the language model is further optimized.
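The gradient-averaging step can be sketched as follows; the flat gradient vectors and the SGD-style parameter update are assumptions of the sketch, since the application only specifies computing the average gradient and sending it back to every cluster's language model.

```python
import numpy as np

def average_gradients(cluster_gradients):
    """cluster_gradients: list of flattened gradient vectors,
    one per semantic cluster's trained language model."""
    return np.mean(np.stack(cluster_gradients, axis=0), axis=0)

def apply_average_gradient(params, avg_grad, lr=1e-3):
    """Update one language model's parameters with the averaged gradient."""
    return params - lr * avg_grad
```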
In the method, a corpus set is obtained and feature extraction is performed on it with a plurality of feature extraction models to obtain a plurality of feature vectors corresponding to each document in the corpus, realizing multi-dimensional extraction of the text features in the corpus; semantic vectors corresponding to the documents are obtained based on the plurality of feature vectors corresponding to each document, i.e. the plurality of feature vectors are combined into corresponding semantic vectors, realizing integration of the text features; the semantic vectors corresponding to the documents in the corpus are clustered with a clustering model to obtain a plurality of semantic clusters; the language model is trained with reinforcement learning according to each semantic cluster respectively to obtain the parameters of the trained language model corresponding to each semantic cluster; and the final language model is determined according to these parameters. By exploiting the strength of semantic association within the corpus, the corpus is divided into different semantic clusters for parallel training, and the idea of reinforcement learning is adopted, so that the language model learns more of the deeper language rules as early as possible, the training time is shortened, model convergence is accelerated, and the training overhead of the language model is reduced.
The embodiment of the present application further provides a natural language processing method, where the method includes:
acquiring text data to be processed;
and processing the text data to be processed according to the final language model to obtain a processing result corresponding to the text data to be processed.
Specifically, the text data to be processed is acquired and processed according to the trained final language model. The complete final language model can be used to process the text data directly; alternatively, the text data is first classified to determine which semantic cluster it belongs to, and the language model corresponding to that semantic cluster within the final language model is used to process it, yielding the corresponding processing result.
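The two inference modes described above could be wired up as in the following sketch; the classifier that maps input text to a semantic cluster and the per-cluster model registry are illustrative assumptions.

```python
def process_text(text, final_model, cluster_classifier=None, cluster_models=None):
    """Process text with the merged final model, or route it to the model
    of the semantic cluster it belongs to, as described above."""
    if cluster_classifier is not None and cluster_models is not None:
        cluster_id = cluster_classifier(text)    # which semantic cluster?
        return cluster_models[cluster_id](text)  # cluster-specific processing
    return final_model(text)                     # default: merged final model
```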
Furthermore, the language model training method can quickly learn more internal language rules by performing corresponding training according to the labels carried by each text in the corpus, thereby accelerating model convergence and reducing the training overhead of the language model.
By adopting the final language model, better processing results are output at a higher speed.
The present embodiment further provides a language model training device, as shown in fig. 6, which is a functional block diagram of the language model training device according to the present application.
The language model training device 100 may be installed in an electronic device. According to the implemented functions, the language model training device 100 may include an obtaining module 101, a feature extraction module 102, a merging module 103, a clustering module 104, a training module 105, and a determination module 106. A module, also referred to as a unit in this application, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
an obtaining module 101, configured to obtain a corpus set;
a feature extraction module 102, configured to perform feature extraction on the corpus by using multiple feature extraction models to obtain multiple feature vectors corresponding to each document in the corpus;
further, the multiple feature extraction models include a hidden feature extraction model, a topic feature extraction model and an entity feature extraction model, and the feature extraction module 102 includes a first extraction submodule, a second extraction submodule and a third extraction submodule;
the first extraction submodule is configured to perform implicit feature extraction on each document in the corpus through the implicit feature extraction model to obtain a first feature vector corresponding to each document;
the second extraction submodule is used for extracting the theme features of the documents in the corpus by using the theme feature extraction model to obtain second feature vectors corresponding to the documents;
and the third extraction submodule is used for extracting the entity features of the documents in the corpus by using the entity feature extraction model to obtain a third feature vector corresponding to each document.
Through the cooperation of the first extraction submodule, the second extraction submodule and the third extraction submodule, the documents are processed by utilizing various feature extraction models, so that feature vectors of multiple dimensions can be obtained, and text features can be better embodied.
Still further, the second extraction sub-module further comprises a subject extraction unit and a vectorization unit;
the topic extraction unit is used for extracting topic words from the documents in the corpus through the topic feature extraction model to obtain a plurality of topic words and arranging the topic words;
the vectorization unit is configured to perform vectorization processing on the arranged multiple topic words through a Bert model under a topic feature extraction model to obtain a second feature vector corresponding to each document.
Through the cooperation of the theme extraction unit and the vectorization unit, the theme word is extracted by using the theme feature extraction model, and after the theme word is sequenced, the Bert model is input for vectorization processing, so that the theme feature of the document is obtained, and the model can learn more internal language rules earlier during subsequent training.
Still further, the third extraction submodule further comprises an entity extraction unit, a construction unit and a graph convolution extraction unit;
the entity extraction unit is used for identifying entities in the documents and the relationships among the entities through a named entity identification technology and a relationship extraction technology in an entity feature extraction model;
the construction unit is used for constructing a knowledge graph based on the entities and the relationship among the entities;
and the graph convolution extraction unit is used for extracting the features of the knowledge graph through a graph convolution neural network in the entity feature extraction model to obtain a third feature vector.
Through the cooperation of the entity extraction unit, the construction unit and the graph convolution extraction unit, the relation between the entity and the entity is extracted from the document to obtain the knowledge graph, and the graph convolution neural network is used for extracting the characteristics of the knowledge graph, so that the model can learn more internal language rules earlier during subsequent training.
A merging module 103, configured to obtain a semantic vector corresponding to each document based on the plurality of feature vectors corresponding to each document;
further, the merging module 103 includes a weight obtaining sub-module and a weighted summation sub-module;
the weight obtaining submodule is used for obtaining the weights of the first feature vector, the second feature vector and the third feature vector based on an analytic hierarchy process;
and the weighted summation submodule is used for carrying out weighted summation on the first feature vector, the second feature vector and the third feature vector according to the weights of the first feature vector, the second feature vector and the third feature vector to obtain the semantic vector corresponding to the document.
The weight obtaining submodule and the weighted summation submodule are matched, the weight is obtained based on an analytic hierarchy process, the feature vector is weighted and summed based on the weight, the semantic vector corresponding to the document is obtained, therefore, the complete extraction of the document feature is realized, and the model can learn more internal language rules earlier during subsequent training.
A clustering module 104, configured to cluster semantic vectors corresponding to the documents in the corpus by using a clustering model to obtain a plurality of semantic clusters;
the training module 105 is configured to train the language model by reinforcement learning according to each semantic cluster, and finally obtain parameters of the trained language model corresponding to each semantic cluster;
further, the training module 105 includes a broadcasting sub-module, a path selection sub-module, a corresponding processing sub-module, a profit calculation sub-module, and a parameter adjusting sub-module;
the broadcasting submodule is used for acquiring the state information of the language model at the moment when the performance index of the language model corresponding to a semantic cluster reaches a preset threshold value in each training period, and broadcasting the state information of the language model to the language model corresponding to each semantic cluster;
the path selection submodule is used for updating the parameters of the language model after the language model corresponding to each semantic cluster receives the state information, and selecting a processing path according to the selection probability; the selection probability is obtained by processing the semantic vectors used in the training period through a deep learning neural network;
the corresponding processing sub-module is used for giving different benefits according to the processing path selected by the language model corresponding to each semantic cluster;
the income calculation submodule is used for obtaining the total income of the training period according to the income of each language model;
and the parameter adjusting sub-module is used for adjusting parameters of the deep learning neural network according to the total income, and training the deep learning neural network through a plurality of training periods until the total income is converged.
Through the cooperation of the broadcasting submodule, the path selection submodule, the corresponding processing submodule, the profit calculation submodule and the parameter adjusting submodule, training samples are divided into different semantic clusters for parallel training by utilizing the strength of semantic association between training corpora, and a reinforcement learning mode is adopted, so that the model learns more internal language rules as early as possible, the model convergence is accelerated, and the training overhead of the language model is reduced.
And the determining module 106 is configured to determine a final language model according to the parameters of the trained language model corresponding to each semantic cluster.
Further, the determining module 106 includes a summarizing sub-module, an averaging sub-module, and a sending sub-module;
the summarization submodule is used for summarizing the final gradient data of the language model corresponding to each semantic cluster to a trainer corresponding to the same language model after all training periods are finished;
the average submodule is used for the trainer to carry out average processing according to the final gradient data corresponding to all the language models to obtain an average gradient;
and the sending submodule is used for sending the average gradient to the language model corresponding to each semantic cluster so as to update the parameters of the language model and obtain the final language model.
And finally, the average gradient is calculated through the matching of the collecting sub-module, the averaging sub-module and the sending sub-module, and each language model updates the parameters thereof by using the average gradient, so that the language model is further optimized.
By adopting the above device, through the cooperative use of the acquisition module 101, the feature extraction module 102, the merging module 103, the clustering module 104, the training module 105 and the determination module 106, the language model training device 100 performs feature extraction on the corpus with a plurality of feature extraction models to obtain a plurality of feature vectors corresponding to each document in the corpus, realizing multi-dimensional extraction of the text features in the corpus; semantic vectors corresponding to the documents are obtained based on the plurality of feature vectors corresponding to each document, i.e. the plurality of feature vectors are combined into corresponding semantic vectors, realizing integration of the text features; the semantic vectors corresponding to the documents in the corpus are clustered with a clustering model to obtain a plurality of semantic clusters; the language model is trained with reinforcement learning according to each semantic cluster respectively to obtain the parameters of the trained language model corresponding to each semantic cluster; and the final language model is determined according to these parameters. By exploiting the strength of semantic association within the corpus, the corpus is divided into different semantic clusters for parallel training, and the idea of reinforcement learning is adopted, so that the language model learns more of the deeper language rules as early as possible, the training time is shortened, model convergence is accelerated, and the training overhead of the language model is reduced.
The present embodiment also provides a natural language processing apparatus, which can be installed in an electronic device. According to the implemented functions, the natural language processing device may include a data acquisition module and a processing module. A module, which may also be referred to as a unit in this application, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions of the respective modules/units are as follows:
the data acquisition module is used for acquiring text data to be processed;
and the processing module is used for processing the text data to be processed according to the final language model to obtain a processing result corresponding to the text data to be processed.
Through the cooperation of the data acquisition module and the processing module, the final language model is adopted, so that better processing results are output at a higher speed.
The embodiment of the application further provides computer equipment. Referring to fig. 7, fig. 7 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 communicatively connected to each other via a system bus. It is noted that only a computer device 4 having components 41-43 is shown, but it should be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and its hardware includes but is not limited to a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), magnetic memory, magnetic disks, optical disks, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both an internal storage unit of the computer device 4 and an external storage device thereof. In this embodiment, the memory 41 is generally used for storing an operating system and various application software installed on the computer device 4, such as computer readable instructions of a language model training method. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as computer readable instructions for executing the language model training method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
In this embodiment, when the processor executes the computer readable instructions stored in the memory, the steps of the language model training method of the above embodiments are implemented: a corpus set is obtained and feature extraction is performed on it with a plurality of feature extraction models to obtain a plurality of feature vectors corresponding to each document in the corpus, realizing multi-dimensional extraction of the text features in the corpus; semantic vectors corresponding to the documents are obtained based on the plurality of feature vectors corresponding to each document, i.e. the plurality of feature vectors are combined into corresponding semantic vectors, realizing integration of the text features; the semantic vectors corresponding to the documents in the corpus are clustered with a clustering model to obtain a plurality of semantic clusters; the language model is trained with reinforcement learning according to each semantic cluster respectively to obtain the parameters of the trained language model corresponding to each semantic cluster; and the final language model is determined according to these parameters. By exploiting the strength of semantic association within the corpus, the corpus is divided into different semantic clusters for parallel training, and the idea of reinforcement learning is adopted, so that the language model learns more of the deeper language rules as early as possible, the training time is shortened, model convergence is accelerated, and the training overhead of the language model is reduced.
The embodiment of the present application further provides a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, so that the at least one processor executes the steps of the language model training method: a corpus set is obtained, and feature extraction is performed on the corpus set by using multiple feature extraction models to obtain multiple feature vectors corresponding to each document in the corpus set, thereby implementing multidimensional extraction of the text features in the corpus set; a semantic vector corresponding to each document is obtained based on the plurality of feature vectors corresponding to that document, so that merging the plurality of feature vectors into a semantic vector integrates the text features; the semantic vectors corresponding to the documents in the corpus set are clustered by using a clustering model to obtain a plurality of semantic clusters; the language model is trained by reinforcement learning according to each semantic cluster to obtain the parameters of the trained language model corresponding to each semantic cluster; and the final language model is determined according to the parameters of the trained language model corresponding to each semantic cluster. By exploiting the varying strength of the semantic associations within the corpus set, the corpus is divided into different semantic clusters for parallel training, and a reinforcement learning approach is adopted, so that the language model learns deeper language rules sooner; this shortens the training time, accelerates model convergence, and reduces the training overhead of the language model.
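For readers approaching the method from an implementation standpoint, the following is an illustrative, non-limiting sketch of the overall flow described above: several feature extractors each produce a per-document vector, the vectors are merged into a single semantic vector per document, and the semantic vectors are clustered into semantic clusters. The TF-IDF and truncated-SVD extractors, the concatenation step, and the KMeans clusterer below are stand-ins chosen for brevity; the embodiments above contemplate hidden, topic, and entity feature extraction models, a weighted combination, and any suitable clustering model.

```python
# Minimal sketch of the corpus set -> feature vectors -> semantic vectors -> semantic clusters flow.
# TF-IDF / SVD stand in for the hidden, topic, and entity feature extraction models.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

corpus = [
    "the language model is trained on clustered documents",
    "semantic vectors are obtained by merging feature vectors",
    "reinforcement learning shortens training time",
    "knowledge graphs capture entities and their relations",
]

# Two stand-in "feature extraction models", each yielding one vector per document.
tfidf_vecs = TfidfVectorizer().fit_transform(corpus).toarray()      # surface-level features
svd_vecs = TruncatedSVD(n_components=2).fit_transform(tfidf_vecs)   # latent features

# Merge the per-document feature vectors into one semantic vector
# (simple concatenation here; claim 5 describes a weighted sum instead).
semantic_vecs = np.hstack([tfidf_vecs, svd_vecs])

# Cluster the semantic vectors into semantic clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(semantic_vecs)
for doc, cluster in zip(corpus, kmeans.labels_):
    print(cluster, doc)
```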
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, and certainly may also be implemented by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solutions of the present application, or the portions thereof that contribute over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
The language model training device, the computer device, and the computer-readable storage medium according to the above embodiments of the present application have the same technical effects as the language model training method according to the above embodiments, which are therefore not described in detail again herein.
It is to be understood that the above-described embodiments are merely illustrative of some, but not all, embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the application without limiting its scope. This application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or some of their features may be replaced by equivalents. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the present application.

Claims (11)

1. A method for language model training, the method comprising:
obtaining a corpus set;
performing feature extraction on the corpus by using a plurality of feature extraction models to obtain a plurality of feature vectors corresponding to each document in the corpus;
obtaining semantic vectors corresponding to the documents based on the plurality of feature vectors corresponding to the documents;
clustering semantic vectors corresponding to all the documents in the corpus by using a clustering model to obtain a plurality of semantic clusters;
respectively training the language model by adopting reinforcement learning according to each semantic cluster to finally obtain the parameters of the trained language model corresponding to each semantic cluster;
and determining a final language model according to the parameters of the trained language model corresponding to each semantic cluster.
2. The method according to claim 1, wherein the plurality of feature extraction models include a hidden feature extraction model, a topic feature extraction model, and an entity feature extraction model, and the extracting the features of the corpus using the plurality of feature extraction models to obtain a plurality of feature vectors corresponding to each document in the corpus comprises:
extracting hidden features of the documents in the corpus through the hidden feature extraction model to obtain first feature vectors corresponding to the documents;
performing topic feature extraction on each document in the corpus by using the topic feature extraction model to obtain a second feature vector corresponding to each document;
and utilizing the entity feature extraction model to extract entity features of the documents in the corpus to obtain third feature vectors corresponding to the documents.
3. The method according to claim 2, wherein the extracting the topic feature of each document in the corpus by using the topic feature extraction model to obtain the second feature vector corresponding to each document comprises:
performing topic word extraction on each document in the corpus through the topic feature extraction model to obtain a plurality of topic words, and arranging the topic words;
vectorizing the arranged plurality of topic words through a Bert model in the topic feature extraction model to obtain the second feature vector corresponding to each document.
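As an illustrative, non-limiting sketch of the topic-feature path described in claim 3 (not part of the claims): topic words are first extracted and ordered, then the ordered topic words are encoded with a Bert model and pooled into the second feature vector. The top TF-IDF terms below stand in for the topic feature extraction model, and mean pooling of the token states is an assumption; neither choice is prescribed by the claims.

```python
# Sketch: topic words -> ordered sequence -> BERT encoding -> second feature vector.
# Top TF-IDF terms stand in for the topic feature extraction model.
import torch
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import BertTokenizer, BertModel

def topic_feature_vector(document: str, top_k: int = 5) -> torch.Tensor:
    # 1) Extract and arrange topic words (stand-in: highest TF-IDF terms of the document).
    vec = TfidfVectorizer()
    weights = vec.fit_transform([document]).toarray()[0]
    terms = vec.get_feature_names_out()
    topic_words = [terms[i] for i in weights.argsort()[::-1][:top_k]]

    # 2) Vectorize the arranged topic words with a BERT model and mean-pool the token states.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
    inputs = tokenizer(" ".join(topic_words), return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # shape (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)             # second feature vector, shape (768,)

print(topic_feature_vector("language models learn deep language rules from clustered corpora").shape)
```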
4. The method according to claim 2, wherein the extracting the entity features of the documents in the corpus by using the entity feature extraction model to obtain a third feature vector corresponding to each document comprises:
identifying entities in the documents and relations among the entities through a named entity recognition technology and a relation extraction technology in the entity feature extraction model;
constructing a knowledge graph based on the entities and the relationship among the entities;
and performing feature extraction on the knowledge graph through a graph convolution neural network in the entity feature extraction model to obtain a third feature vector.
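A loose sketch, under stated assumptions, of the entity-feature path in claim 4: the entities and relations would normally come from named entity recognition and relation extraction, but here a small hand-written entity/relation list stands in for those steps so the example stays self-contained. The graph convolution is a single standard propagation layer, H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W), over random initial node features with mean pooling at the end; the claims fix neither the network depth nor the pooling.

```python
# Sketch: (entities, relations) -> adjacency matrix -> one graph-convolution layer -> third feature vector.
import numpy as np

# Stand-in output of named entity recognition and relation extraction.
entities = ["language model", "corpus", "semantic cluster", "reinforcement learning"]
relations = [("language model", "corpus"),
             ("corpus", "semantic cluster"),
             ("language model", "reinforcement learning")]

idx = {e: i for i, e in enumerate(entities)}
n = len(entities)

# Knowledge graph as an adjacency matrix with self-loops.
A = np.eye(n)
for head, tail in relations:
    A[idx[head], idx[tail]] = A[idx[tail], idx[head]] = 1.0

# Symmetric normalization D^{-1/2} A D^{-1/2}.
d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = d_inv_sqrt @ A @ d_inv_sqrt

# One GCN propagation step over random initial node features, then mean pooling
# to obtain a document-level third feature vector.
rng = np.random.default_rng(0)
H = rng.standard_normal((n, 16))       # initial node features
W = rng.standard_normal((16, 8))       # layer weights (would be learned in practice)
H_next = np.maximum(A_hat @ H @ W, 0)  # ReLU(D^-1/2 A D^-1/2 H W)
third_feature_vector = H_next.mean(axis=0)
print(third_feature_vector.shape)      # (8,)
```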
5. The method of claim 2, wherein obtaining the semantic vector corresponding to each of the documents based on the plurality of feature vectors corresponding to each of the documents comprises:
obtaining the weights of the first feature vector, the second feature vector and the third feature vector based on an analytic hierarchy process;
and according to the weights of the first feature vector, the second feature vector and the third feature vector, carrying out weighted summation on the first feature vector, the second feature vector and the third feature vector to obtain a semantic vector corresponding to the document.
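The weighting in claim 5 can be illustrated with a small worked example: in the analytic hierarchy process, the three feature vectors are compared pairwise, the weights are taken from the normalized principal eigenvector of the comparison matrix, and the semantic vector is the weighted sum of the three vectors. The comparison values below are invented for illustration, and the three vectors are assumed to share one dimension; the claims prescribe neither.

```python
# Sketch: analytic hierarchy process weights -> weighted sum of the three feature vectors.
import numpy as np

# Pairwise comparison matrix for (hidden, topic, entity) features; values are illustrative only.
# C[i, j] > 1 means feature i is judged more important than feature j.
C = np.array([[1.0,  2.0, 4.0],
              [0.5,  1.0, 2.0],
              [0.25, 0.5, 1.0]])

# AHP weights: normalized principal eigenvector of the comparison matrix.
eigvals, eigvecs = np.linalg.eig(C)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
weights = principal / principal.sum()   # roughly [0.57, 0.29, 0.14] for this matrix

# Weighted sum of the first, second, and third feature vectors (equal dimension assumed).
first, second, third = (np.random.default_rng(i).standard_normal(8) for i in range(3))
semantic_vector = weights[0] * first + weights[1] * second + weights[2] * third
print(weights, semantic_vector.shape)
```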
6. The method for training the language model according to claim 1, wherein the training the language model by reinforcement learning according to each semantic cluster comprises:
in each training period, when the performance index of the language model corresponding to a semantic cluster reaches a preset threshold value, acquiring the current state information of the language model, and broadcasting the state information of the language model to the language model corresponding to each semantic cluster;
after the language model corresponding to each semantic cluster receives the state information, updating parameters of the language model, and selecting a processing path according to a selection probability; the selection probability is obtained by processing, through a deep learning neural network, a plurality of semantic vectors used in the training period;
giving different rewards according to the processing path selected by the language model corresponding to each semantic cluster;
obtaining the total reward of the training period according to the reward of each language model;
and adjusting, by the deep learning neural network, its parameters according to the total reward, and training through a plurality of training periods until the total reward converges.
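A loose, non-limiting sketch of the per-period coordination in claim 6, with heavy simplifications: each semantic cluster has its own copy of the language model; once a copy's performance crosses the threshold its state is broadcast; every copy then chooses one of a few processing paths from selection probabilities produced by a small policy network over the period's semantic vectors; the rewards of the chosen paths are summed into a total reward, which drives a REINFORCE-style update of the policy network across periods. The reward scheme, the two paths, and the network shape are assumptions for illustration only.

```python
# Sketch: policy network selects a processing path per cluster; the total reward of the
# period drives a REINFORCE-style update of the policy network.
import torch
import torch.nn as nn

n_clusters, vec_dim, n_paths = 3, 8, 2           # e.g. path 0 = keep training, path 1 = stop early
policy = nn.Sequential(nn.Linear(vec_dim, 16), nn.ReLU(), nn.Linear(16, n_paths))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

def path_reward(path: int, improved: bool) -> float:
    # Illustrative reward scheme: continuing training pays off only while the model still improves.
    return 1.0 if (path == 0 and improved) or (path == 1 and not improved) else -1.0

for period in range(50):                          # training periods
    semantic_vecs = torch.randn(n_clusters, vec_dim)    # semantic vectors used in this period (stand-in)
    probs = torch.softmax(policy(semantic_vecs), dim=-1)  # selection probability per cluster
    dist = torch.distributions.Categorical(probs)
    paths = dist.sample()                         # processing path chosen for each cluster's model

    improved = torch.rand(n_clusters) > 0.5       # stand-in signal from each cluster's language model
    rewards = [path_reward(p.item(), i.item()) for p, i in zip(paths, improved)]
    total_reward = sum(rewards)                   # total reward of the training period

    # Policy-gradient update using the total reward; convergence of total_reward across
    # periods would be the stopping criterion.
    loss = -dist.log_prob(paths).sum() * total_reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```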
7. The method for training a language model according to claim 1, wherein the determining the final language model according to the parameters of the trained language model corresponding to each semantic cluster comprises:
after all training periods are finished, aggregating the final gradient data of the language model corresponding to each semantic cluster to a trainer corresponding to the same language model;
averaging, by the trainer, the final gradient data corresponding to all the language models to obtain an average gradient;
and sending the average gradient to the language model corresponding to each semantic cluster so as to update the parameters of the language model and obtain the final language model.
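The final aggregation in claim 7 resembles synchronous data-parallel training: each semantic cluster's copy of the model reports its final gradients, the trainer averages them parameter by parameter, and the averaged gradient is applied to produce the final model. The sketch below assumes the copies share one architecture and uses a single plain SGD step with an illustrative learning rate; the claims do not fix the optimizer.

```python
# Sketch: average the final gradients of the per-cluster model copies and apply them once.
import torch
import torch.nn as nn

def make_model() -> nn.Module:
    return nn.Linear(8, 2)

# One model copy per semantic cluster, all sharing the same architecture.
copies = [make_model() for _ in range(3)]
for copy_model in copies:                       # produce "final gradient data" in each copy
    out = copy_model(torch.randn(4, 8))
    out.pow(2).mean().backward()

# Trainer side: average the gradients parameter by parameter and update the final model.
final_model = make_model()
with torch.no_grad():
    for name, param in final_model.named_parameters():
        grads = [dict(c.named_parameters())[name].grad for c in copies]
        avg_grad = torch.stack(grads).mean(dim=0)
        param -= 0.1 * avg_grad                 # one update with the average gradient (lr = 0.1)
```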
8. A method of natural language processing, the method comprising:
acquiring text data to be processed;
processing the text data to be processed by using the final language model obtained by the language model training method according to any one of claims 1 to 7, to obtain a processing result corresponding to the text data to be processed.
9. An apparatus for training a language model, the apparatus comprising:
the acquisition module is used for acquiring the corpus set;
the feature extraction module is used for extracting features of the corpus by using various feature extraction models to obtain a plurality of feature vectors corresponding to each document in the corpus;
a merging module, configured to obtain a semantic vector corresponding to each document based on the plurality of feature vectors corresponding to each document;
the clustering module is used for clustering semantic vectors corresponding to all the documents in the corpus to obtain a plurality of semantic clusters;
the training module is used for respectively training the language model by adopting reinforcement learning according to each semantic cluster to finally obtain the parameters of the trained language model corresponding to each semantic cluster;
and the determining module is used for determining the final language model according to the parameters of the trained language model corresponding to each semantic cluster.
10. A computer device, characterized in that the computer device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer readable instructions which, when executed by the processor, implement the language model training method of any one of claims 1 to 7.
11. A computer-readable storage medium having computer-readable instructions stored thereon, which when executed by a processor implement the language model training method of any one of claims 1 to 7.
CN202210423601.8A 2022-04-21 2022-04-21 Natural language processing method, language model training method and related equipment Pending CN114781611A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210423601.8A CN114781611A (en) 2022-04-21 2022-04-21 Natural language processing method, language model training method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210423601.8A CN114781611A (en) 2022-04-21 2022-04-21 Natural language processing method, language model training method and related equipment

Publications (1)

Publication Number Publication Date
CN114781611A true CN114781611A (en) 2022-07-22

Family

ID=82430834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210423601.8A Pending CN114781611A (en) 2022-04-21 2022-04-21 Natural language processing method, language model training method and related equipment

Country Status (1)

Country Link
CN (1) CN114781611A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115495541A (en) * 2022-11-18 2022-12-20 深译信息科技(珠海)有限公司 Corpus database, corpus database maintenance method, apparatus, device and medium
CN115495541B (en) * 2022-11-18 2023-04-07 深译信息科技(珠海)有限公司 Corpus database, corpus database maintenance method, apparatus, device and medium
CN116629270A (en) * 2023-06-12 2023-08-22 广州市南方人力资源评价中心有限公司 Subjective question scoring method and device based on examination big data and text semantics
CN116629270B (en) * 2023-06-12 2024-02-02 广州市南方人力资源评价中心有限公司 Subjective question scoring method and device based on examination big data and text semantics
CN116991986A (en) * 2023-09-28 2023-11-03 之江实验室 Language model light weight method, device, computer equipment and storage medium
CN116991986B (en) * 2023-09-28 2024-01-09 之江实验室 Language model light weight method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN114781611A (en) Natural language processing method, language model training method and related equipment
CN112685504B (en) Production process-oriented distributed migration chart learning method
CN113158554B (en) Model optimization method and device, computer equipment and storage medium
CN112785005B (en) Multi-objective task assistant decision-making method and device, computer equipment and medium
CN111241992B (en) Face recognition model construction method, recognition method, device, equipment and storage medium
CN113806582B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
CN111198970A (en) Resume matching method and device, electronic equipment and storage medium
CN109492093A (en) File classification method and electronic device based on gauss hybrid models and EM algorithm
CN113392209A (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN112199600A (en) Target object identification method and device
CN112819024B (en) Model processing method, user data processing method and device and computer equipment
CN110990627A (en) Knowledge graph construction method and device, electronic equipment and medium
CN110929119A (en) Data annotation method, device, equipment and computer storage medium
CN112418291A (en) Distillation method, device, equipment and storage medium applied to BERT model
CN114897085A (en) Clustering method based on closed subgraph link prediction and computer equipment
CN112668633B (en) Adaptive graph migration learning method based on fine granularity field
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
CN113870863A (en) Voiceprint recognition method and device, storage medium and electronic equipment
CN112244863A (en) Signal identification method, signal identification device, electronic device and readable storage medium
CN108830302B (en) Image classification method, training method, classification prediction method and related device
CN115129902B (en) Media data processing method, device, equipment and storage medium
CN114648005A (en) Multi-fragment machine reading understanding method and device for multitask joint learning
CN114610953A (en) Data classification method, device, equipment and storage medium
CN114826921B (en) Dynamic network resource allocation method, system and medium based on sampling subgraph
CN116431757B (en) Text relation extraction method based on active learning, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination