CN116502640B - Text characterization model training method and device based on context - Google Patents

Text characterization model training method and device based on context

Info

Publication number
CN116502640B
CN116502640B CN202310779760.6A CN202310779760A
Authority
CN
China
Prior art keywords
word
words
training
loss
characterization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310779760.6A
Other languages
Chinese (zh)
Other versions
CN116502640A (en)
Inventor
孙海亮
暴宇健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202310779760.6A
Publication of CN116502640A
Application granted
Publication of CN116502640B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 - Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure relates to the technical field of machine learning, and provides a context-based text characterization model training method and device. The method comprises the following steps: determining a low-dimensional vector characterization of each word through a nonlinear coding network; determining context latent vectors of the first K words through a long-short-term memory network according to the low-dimensional vector characterizations of the first K words; passing the context latent vector of the K-th word through N-K fully connected neural networks respectively to obtain predictive vector characterizations of the (K+1)-th word to the N-th word; calculating a self-supervision loss according to the low-dimensional vector characterizations and the predictive vector characterizations of the (K+1)-th word to the N-th word; determining positive samples and negative samples of the first K words from the training data set, and calculating a contrast learning loss according to the low-dimensional vector characterizations of the first K words and of their positive and negative samples; and updating model parameters of the text characterization model according to the self-supervision loss and the contrast learning loss so as to complete training of the text characterization model.

Description

Text characterization model training method and device based on context
Technical Field
The disclosure relates to the technical field of machine learning, in particular to a text characterization model training method and device based on context.
Background
Text characterization represents text information numerically, and text encoding is one text characterization method. Text characterization may be used for text recognition, text processing, text transmission, text marking, and the like. To improve the efficiency of text characterization, machine learning may be utilized. However, currently used machine learning models, such as diffusion models and generative adversarial networks, do not consider the relevance of the text context when characterizing text, and the commonly used loss functions provide only weak guidance for training a text characterization model.
In implementing the disclosed concept, the inventors found that at least the following technical problems exist in the related art: the relevance of the text context is not considered when the model characterizes text, and the loss function provides only weak guidance for model training.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide a context-based text characterization model training method and apparatus, an electronic device, and a computer readable storage medium, so as to solve the problems in the prior art that the relevance of the text context is not considered when the model characterizes text and that the loss function provides only weak guidance for model training.
In a first aspect of the embodiments of the present disclosure, a context-based text characterization model training method is provided, including: constructing a nonlinear coding network by utilizing a plurality of embedding layers, and constructing a text characterization model by utilizing the nonlinear coding network, a long-short-term memory network and N-K fully connected neural networks, wherein each fully connected neural network is randomly initialized and is followed by an activation function; acquiring a training data set, dividing the training texts in the training data set into words, and recording the number of words obtained by division as N, wherein the position of each word in the training text determines the ordering between the word and the other words; determining a low-dimensional vector characterization of each word through the nonlinear coding network; determining context latent vectors of the first K words through the long-short-term memory network according to the low-dimensional vector characterizations of the first K words, wherein the context latent vector of each word is related to the low-dimensional vector characterization of the word and the context latent vector of the preceding word; passing the context latent vector of the K-th word through the N-K fully connected neural networks respectively to obtain predictive vector characterizations of the (K+1)-th word to the N-th word; calculating a self-supervision loss according to the low-dimensional vector characterizations and the predictive vector characterizations of the (K+1)-th word to the N-th word; determining positive samples and negative samples of the first K words from the training data set, and calculating a contrast learning loss according to the low-dimensional vector characterizations of the first K words and of their positive and negative samples; and updating model parameters of the text characterization model according to the self-supervision loss and the contrast learning loss so as to complete training of the text characterization model.
In a second aspect of the embodiments of the present disclosure, a context-based text characterization model training device is provided, including: a construction module configured to construct a nonlinear coding network by utilizing a plurality of embedding layers, and construct a text characterization model by utilizing the nonlinear coding network, a long-short-term memory network and N-K fully connected neural networks, wherein each fully connected neural network is randomly initialized and is followed by an activation function; a segmentation module configured to acquire a training data set, divide the training texts in the training data set into words, and record the number of words obtained by division as N, wherein the position of each word in the training text determines the ordering between the word and the other words; a first determining module configured to determine a low-dimensional vector characterization of each word through the nonlinear coding network; a second determining module configured to determine context latent vectors of the first K words through the long-short-term memory network according to the low-dimensional vector characterizations of the first K words, wherein the context latent vector of each word is related to the low-dimensional vector characterization of the word and the context latent vector of the preceding word; a third determining module configured to pass the context latent vector of the K-th word through the N-K fully connected neural networks respectively to obtain predictive vector characterizations of the (K+1)-th word to the N-th word; a first calculation module configured to calculate a self-supervision loss according to the low-dimensional vector characterizations and the predictive vector characterizations of the (K+1)-th word to the N-th word; a second calculation module configured to determine positive samples and negative samples of the first K words from the training data set, and calculate a contrast learning loss according to the low-dimensional vector characterizations of the first K words and of their positive and negative samples; and an updating module configured to update the model parameters of the text characterization model according to the self-supervision loss and the contrast learning loss so as to complete training of the text characterization model.
In a third aspect of the disclosed embodiments, an electronic device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the disclosed embodiments, a computer-readable storage medium is provided, which stores a computer program which, when executed by a processor, implements the steps of the above-described method.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects. The embodiments of the present disclosure construct a nonlinear coding network by utilizing a plurality of embedding layers, and construct a text characterization model by utilizing the nonlinear coding network, a long-short-term memory network and N-K fully connected neural networks, wherein each fully connected neural network is randomly initialized and is followed by an activation function; acquire a training data set, divide the training texts in the training data set into words, and record the number of words obtained by division as N, wherein the position of each word in the training text determines the ordering between the word and the other words; determine a low-dimensional vector characterization of each word through the nonlinear coding network; determine context latent vectors of the first K words through the long-short-term memory network according to the low-dimensional vector characterizations of the first K words, wherein the context latent vector of each word is related to the low-dimensional vector characterization of the word and the context latent vector of the preceding word; pass the context latent vector of the K-th word through the N-K fully connected neural networks respectively to obtain predictive vector characterizations of the (K+1)-th word to the N-th word; calculate a self-supervision loss according to the low-dimensional vector characterizations and the predictive vector characterizations of the (K+1)-th word to the N-th word; determine positive samples and negative samples of the first K words from the training data set, and calculate a contrast learning loss according to the low-dimensional vector characterizations of the first K words and of their positive and negative samples; and update the model parameters of the text characterization model according to the self-supervision loss and the contrast learning loss to complete training of the text characterization model. The embodiments can therefore solve the problems in the prior art that the relevance of the text context is not considered when the model characterizes text and that the loss function provides only weak guidance for model training, so that the text characterization information can fully represent the information in the text and the guiding effect of the loss function on model training is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are required for the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a flow chart diagram (I) of a context-based text characterization model training method provided by an embodiment of the present disclosure;
FIG. 2 is a flow chart diagram (II) of a context-based text characterization model training method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a context-based text characterization model training device provided by an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
Fig. 1 is a flow chart diagram (I) of a context-based text characterization model training method according to an embodiment of the disclosure. The context-based text characterization model training method of fig. 1 may be performed by a computer or a server, or by software on a computer or a server. As shown in fig. 1, the context-based text characterization model training method includes:
s101, constructing a nonlinear coding network by utilizing a plurality of embedded layers, and constructing a text characterization model by utilizing the nonlinear coding network, a long-short-term memory network and N-K fully-connected neural networks, wherein each fully-connected neural network is randomly initialized, and an activation function is connected to each fully-connected neural network;
s102, acquiring a training data set, dividing training texts in the training data set according to words, and recording the number of words obtained by division as N, wherein the position of each word in the training texts determines the sequence between the word and other words;
s103, determining low-dimensional vector characterization of each word through a nonlinear coding network;
s104, determining context latent vectors of the first K words through a long-short term memory network according to low-dimensional vector characterization of the first K words, wherein the context latent vector of each word is related to the low-dimensional vector characterization of the word and the context latent vector of the previous word of the word;
s105, the upper and lower Wen Qian vectors of the Kth word are respectively subjected to N-K fully connected neural networks to obtain predictive vector characterization from the Kth+1th word to the Nth word;
s106, calculating self-supervision loss according to the low-dimensional vector characterization and the predictive vector characterization of the (K+1) -th word to the (N) -th word;
s107, positive samples and negative samples of the first K words are determined from the training data set, and contrast learning loss is calculated according to the first K words and low-dimensional vector characterization of the positive samples and the negative samples of the first K words;
and S108, updating model parameters of the text characterization model according to the self-supervision loss and the contrast learning loss so as to complete training of the text characterization model.
The long-short-term memory network is an LSTM (Long Short-Term Memory) network. The nonlinear coding network is followed by the long-short-term memory network, the long-short-term memory network is followed by N-K mutually parallel fully connected neural networks, and each fully connected neural network is followed by a ReLU activation function, so that the text characterization model is obtained. Each fully connected neural network may be a single layer. Because each fully connected neural network is randomly initialized, the parameters within the plurality of fully connected neural networks are all different. The training data set has a plurality of training texts; for ease of understanding, the processing of one training text is described. There is an ordering among the words obtained by dividing the training text: the earlier a word appears in the training text, the earlier its position in the ordering. For example, a training text is divided into nine words, numbered No. 1 through No. 9 in order; all words before No. 3 are No. 1 and No. 2, and the word immediately before No. 3 is No. 2.
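For illustration, a minimal sketch of such a text characterization model in PyTorch is given below. This is an assumed implementation, not the embodiment itself: the class name, layer sizes, and the Tanh nonlinearity inside the coding network are illustrative choices.

```python
# Sketch: nonlinear coding network (embedding layers) + LSTM + N-K fully connected heads.
import torch
import torch.nn as nn

class TextCharacterizationModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, n_words, k_words):
        super().__init__()
        # Nonlinear coding network built from embedding layers (nonlinearity assumed).
        self.encoder = nn.Sequential(
            nn.Embedding(vocab_size, embed_dim),
            nn.Linear(embed_dim, hidden_dim),
            nn.Tanh(),
        )
        # Long-short-term memory network over the first K low-dimensional characterizations.
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        # N-K randomly initialized fully connected networks, each followed by ReLU.
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
            for _ in range(n_words - k_words)
        )
        self.k = k_words

    def forward(self, word_ids):                    # word_ids: (batch, N) word indices
        z = self.encoder(word_ids)                  # low-dimensional characterizations (batch, N, H)
        c, _ = self.lstm(z[:, :self.k, :])          # context latent vectors of the first K words
        c_k = c[:, -1, :]                           # context latent vector of the K-th word
        preds = [head(c_k) for head in self.heads]  # predictive characterizations of words K+1..N
        return z, c, torch.stack(preds, dim=1)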
N is the number of words and K is a set fixed value. For example, if N is 9 and K is 4, then one training text is divided into nine ordered words and there are 5 fully connected neural networks. All words pass through the nonlinear coding network to obtain the low-dimensional vector characterization of each word, and the low-dimensional vector characterizations of the first 4 words pass through the long-short-term memory network to obtain the context latent vectors of the first 4 words (each word has a context latent vector). The context latent vector of the 4th word is passed through the 5 fully connected neural networks respectively, and the 5 results are used as the predictive vector characterizations of the last 5 words (each word has a predictive vector characterization). The self-supervision loss is calculated according to the low-dimensional vector characterizations and the predictive vector characterizations of the last 5 words. The contrast learning loss is calculated according to the low-dimensional vector characterizations of the first 4 words and of the positive and negative samples of the first 4 words (each word has one low-dimensional vector characterization). Positive and negative samples of the first K words are determined from the divided training data set. The positive and negative samples of a word are determined according to the distance between words; for example, the positive sample of the 4th word may be the 5th word and the negative sample of the 4th word may be the 8th word, that is, the (K+1)-th word is taken as the positive sample of the K-th word, and the (K+4)-th word is taken as the negative sample of the K-th word.
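Continuing the N = 9, K = 4 example, a hedged usage sketch is shown below, reusing the TextCharacterizationModel sketch above. The batch size, vocabulary size and dimensions are arbitrary, and the positive/negative indices simply encode the (K+1)-th / (K+4)-th convention of the example.

```python
# Usage sketch for the nine-word example (indices below are 0-based Python indices).
import torch

model = TextCharacterizationModel(vocab_size=30000, embed_dim=128,
                                  hidden_dim=64, n_words=9, k_words=4)
word_ids = torch.randint(0, 30000, (2, 9))  # a batch of two training texts, nine words each
z, c, preds = model(word_ids)               # z: (2, 9, 64), c: (2, 4, 64), preds: (2, 5, 64)

K, N = 4, 9
anchor = K - 1      # the 4th word
positive = K        # the 5th word, taken as the positive sample of the 4th word
negative = K + 3    # the 8th word, taken as the negative sample of the 4th word
```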
Note that since the 1st word has no preceding word, the context latent vector of the 1st word is only related to the low-dimensional vector characterization of the 1st word.
Optionally, the training texts in the training data set are divided according to a preset length to obtain N small texts, wherein the position of each small text in the training text determines the ordering between the small text and the other small texts; a low-dimensional vector characterization of each small text is determined; context latent vectors of the first K small texts are determined, and predictive vector characterizations of the (K+1)-th small text to the N-th small text are determined; a self-supervision loss is calculated according to the low-dimensional vector characterizations and the predictive vector characterizations of the (K+1)-th small text to the N-th small text; and positive and negative samples of the first K small texts are determined from the training data set, and a contrast learning loss is calculated according to the low-dimensional vector characterizations of the first K small texts and of their positive and negative samples.
According to the technical scheme provided by the embodiments of the present disclosure, a nonlinear coding network is constructed by utilizing a plurality of embedding layers, and a text characterization model is constructed by utilizing the nonlinear coding network, a long-short-term memory network and N-K fully connected neural networks, wherein each fully connected neural network is randomly initialized and is followed by an activation function; a training data set is acquired, the training texts in the training data set are divided into words, and the number of words obtained by division is recorded as N, wherein the position of each word in the training text determines the ordering between the word and the other words; a low-dimensional vector characterization of each word is determined through the nonlinear coding network; context latent vectors of the first K words are determined through the long-short-term memory network according to the low-dimensional vector characterizations of the first K words, wherein the context latent vector of each word is related to the low-dimensional vector characterization of the word and the context latent vector of the preceding word; the context latent vector of the K-th word is passed through the N-K fully connected neural networks respectively to obtain predictive vector characterizations of the (K+1)-th word to the N-th word; a self-supervision loss is calculated according to the low-dimensional vector characterizations and the predictive vector characterizations of the (K+1)-th word to the N-th word; positive samples and negative samples of the first K words are determined from the training data set, and a contrast learning loss is calculated according to the low-dimensional vector characterizations of the first K words and of their positive and negative samples; and the model parameters of the text characterization model are updated according to the self-supervision loss and the contrast learning loss so as to complete training of the text characterization model. In the present application, the context latent vector of each word is related to the context latent vector of the preceding word (that is, the context association is considered in the text characterization), and the self-supervision loss and the contrast learning loss are combined. By adopting the above technical means, the problems in the prior art that the context association cannot be considered when the model characterizes text and that the loss function provides only weak guidance for model training can be solved, so that the text characterization information can fully represent the information in the text and the guiding effect of the loss function on model training is improved.
In an alternative embodiment, after acquiring the training data set, the method further comprises: dividing the training data set into a first training set, a second training set and a third training set according to a preset proportion; and, in the text characterization model: calculating the contrast learning loss of the first training set, freezing the model parameters of the long-short-term memory network and all the fully connected neural networks, and updating the model parameters of the nonlinear coding network according to the contrast learning loss of the first training set so as to complete the first-stage training of the text characterization model; calculating the self-supervision loss of the second training set, freezing the model parameters of the nonlinear coding network, and updating the model parameters of the long-short-term memory network and all the fully connected neural networks according to the self-supervision loss of the second training set so as to complete the second-stage training of the text characterization model; and calculating the contrast learning loss and the self-supervision loss of the third training set, and updating the model parameters of the text characterization model according to the contrast learning loss and the self-supervision loss of the third training set so as to complete the third-stage training of the text characterization model.
In this embodiment of the present application, the contrast learning loss of the first training set, the self-supervision loss of the second training set, and the contrast learning loss and self-supervision loss of the third training set are calculated in the same manner as the corresponding losses in the previous embodiment, which is not repeated here.
In the first-stage training, the model parameters of the long-short-term memory network and of all the fully connected neural networks are frozen, and the model parameters of the nonlinear coding network are updated according to the contrast learning loss of the first training set; the long-short-term memory network and the fully connected neural networks are not involved in this process. After the first-stage training is finished, the model parameters of the long-short-term memory network and of all the fully connected neural networks are thawed and the second-stage training is started: the model parameters of the nonlinear coding network are frozen, and the model parameters of the long-short-term memory network and of all the fully connected neural networks are updated according to the self-supervision loss of the second training set; the nonlinear coding network participates in this process, because the input of the long-short-term memory network uses the output of the nonlinear coding network, but its model parameters are not updated. After the second-stage training is finished, the model parameters of the nonlinear coding network are thawed and the third-stage training is started: the model parameters of the text characterization model are updated according to the contrast learning loss and the self-supervision loss of the third training set (in the third-stage training, the model parameters of the nonlinear coding network, the long-short-term memory network and all the fully connected neural networks are all updated). After the third-stage training, training of the text characterization model is determined to be completed.
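A hedged sketch of this freeze/thaw schedule is given below. The helper names, the Adam optimizer, the loss callables and the three data loaders are illustrative assumptions standing in for the first, second and third training sets.

```python
# Sketch of the three-stage training schedule with parameter freezing (assumed helpers).
import torch

def set_requires_grad(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

def train_stage(model, loader, compute_loss, frozen=(), epochs=1, lr=1e-3):
    for m in frozen:                      # freeze the listed sub-networks for this stage
        set_requires_grad(m, False)
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=lr)
    for _ in range(epochs):
        for batch in loader:
            loss = compute_loss(model, batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    for m in frozen:                      # thaw after the stage finishes
        set_requires_grad(m, True)

# Stage 1: contrast learning loss; LSTM and all fully connected heads frozen.
# train_stage(model, first_loader, contrast_stage_loss, frozen=(model.lstm, model.heads))
# Stage 2: self-supervision loss; nonlinear coding network frozen.
# train_stage(model, second_loader, self_supervision_stage_loss, frozen=(model.encoder,))
# Stage 3: both losses together; nothing frozen.
# train_stage(model, third_loader, total_stage_loss)
```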
Fig. 2 is a flow chart diagram (II) of a context-based text characterization model training method according to an embodiment of the present disclosure. As shown in fig. 2, the method includes:
s201, extracting word vector features of the first K words by using a word vector model;
s202, determining the comprehensive vector of each word according to the word vector characteristics of the word and all words before the word;
s203, calculating word characterization loss according to the context latent vectors and the comprehensive vectors of the first K words;
s204, freezing model parameters of the long-term memory network and all the fully-connected neural networks, and updating model parameters of the nonlinear coding network according to the contrast learning loss so as to complete the first-stage training of the text characterization model;
s205, freezing model parameters of a nonlinear coding network and all fully-connected neural networks, and updating model parameters of a long-term and short-term memory network according to word representation loss so as to complete second-stage training of a text representation model;
s206, freezing model parameters of the nonlinear coding network and the long-short-term memory network, and updating model parameters of all the fully-connected neural networks according to self-supervision loss to complete the third-stage training of the text characterization model.
The word vector model is word2vec. The word vector feature of the 1st word is the comprehensive vector of the 1st word; the comprehensive vector of the 3rd word is determined according to the word vector features of the 1st, 2nd, and 3rd words. The word vector features of a word and of all the words before it may be concatenated together as the comprehensive vector of the word. The word characterization loss between the context latent vector and the comprehensive vector may be calculated using a cross-entropy loss function.
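One possible reading of this word characterization loss is sketched below. It assumes the concatenated word2vec features are zero-padded to a common length, projected to the latent dimension by an assumed linear layer, and compared to the context latent vectors with a cross entropy over softmax distributions; none of these specifics are fixed by the embodiment.

```python
# Hedged sketch of the comprehensive vectors and the word characterization loss.
import torch
import torch.nn.functional as F

def comprehensive_vectors(word_vecs):
    # word_vecs: (K, D) word2vec features of the first K words.
    # Comprehensive vector of the j-th word = concatenation of the features of
    # words 1..j, zero-padded on the right to a fixed length K*D.
    K, D = word_vecs.shape
    out = torch.zeros(K, K * D)
    for j in range(K):
        out[j, : (j + 1) * D] = word_vecs[: j + 1].reshape(-1)
    return out

def word_characterization_loss(context_latents, word_vecs, projection):
    # context_latents: (K, H) context latent vectors from the LSTM.
    # projection: an assumed torch.nn.Linear(K*D, H) mapping comprehensive vectors to H.
    targets = F.softmax(projection(comprehensive_vectors(word_vecs)), dim=-1)
    log_probs = F.log_softmax(context_latents, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()  # cross entropy with soft targets
```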
The multi-stage training in the embodiment of the present application is similar to that in the previous embodiment, and will not be described again.
The low-dimensional vector characterization and the context latent vector of the K-th word are calculated by the following formulas:
z_K = Genc(x_K);
c_K = Gar(z_K, c_(K-1));
where Genc() represents the nonlinear coding network, x_K represents the K-th word, z_K represents the low-dimensional vector characterization of the K-th word, Gar() represents the long-short-term memory network, c_K represents the context latent vector of the K-th word, and c_(K-1) represents the context latent vector of the (K-1)-th word.
The low-dimensional vector characterizations and context latent vectors of the other words are calculated in the same way.
The self-supervision loss L_self is calculated by the following formula:
L_self = Σ_{i=1}^{N-K} MSE(ẑ_(K+i), z_(K+i));
where MSE() is the mean square error function, ẑ_(K+i) represents the predictive vector characterization of the (K+i)-th word, z_(K+i) represents the low-dimensional vector characterization of the (K+i)-th word, i is a natural number, and the value of i is between 1 and N-K.
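A minimal sketch of this self-supervision loss, under the summation form reconstructed above:

```python
# Self-supervision loss: MSE between predictive and low-dimensional characterizations
# of words K+1 .. N (both tensors have one row per predicted word).
import torch.nn.functional as F

def self_supervision_loss(pred_chars, low_dim_chars):
    # pred_chars, low_dim_chars: (N - K, H)
    return sum(F.mse_loss(pred_chars[i], low_dim_chars[i])
               for i in range(pred_chars.shape[0]))
```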
The contrast learning loss L_con is calculated by the following formula:
L_con = Σ_{j=1}^{K} L_tri(z_j, z_(p_j), z_(n_j));
where L_tri() is the triplet loss function; j, p_j and n_j are natural numbers; j takes values from 1 to K; the p_j-th word is the positive sample of the j-th word and the n_j-th word is the negative sample of the j-th word; p_j takes a value between 2 and K+1 and n_j takes a value between K+1 and N; z_j is the low-dimensional vector characterization of the j-th word, z_(p_j) is the low-dimensional vector characterization of the p_j-th word, and z_(n_j) is the low-dimensional vector characterization of the n_j-th word.
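A sketch of the contrast learning loss using PyTorch's triplet margin loss is given below. The positive index one position after the anchor and the negative index four positions after follow the earlier example and are assumptions, not a fixed rule of the embodiment.

```python
# Contrast learning loss: one triplet term per anchor word 1..K (0-based index j).
import torch
import torch.nn.functional as F

def contrast_learning_loss(low_dim_chars, K, margin=1.0):
    # low_dim_chars: (N, H) low-dimensional characterizations of all N words.
    N = low_dim_chars.shape[0]
    loss = low_dim_chars.new_zeros(())
    for j in range(K):
        pos = j + 1                  # positive sample: the word right after the anchor
        neg = min(j + 4, N - 1)      # negative sample: a word four positions later, clipped to N
        loss = loss + F.triplet_margin_loss(low_dim_chars[j:j + 1],
                                            low_dim_chars[pos:pos + 1],
                                            low_dim_chars[neg:neg + 1],
                                            margin=margin)
    return loss
```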
Updating the model parameters of the text characterization model according to the self-supervision loss and the contrast learning loss comprises: calculating a total loss by weighting the self-supervision loss and the contrast learning loss, and updating the model parameters of the text characterization model according to the total loss; the weight adjustment factor used in the weighting has a value between 0 and 1 and can be set as needed.
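If the weighting is taken to be a convex combination (an assumption: the embodiment only fixes that there is a single weight adjustment factor between 0 and 1), a total-loss sketch looks like this:

```python
# Total loss as an assumed convex combination of the two losses.
def total_loss(self_supervision, contrast_learning, lam=0.5):
    # lam: weight adjustment factor in (0, 1), set as needed.
    return lam * self_supervision + (1.0 - lam) * contrast_learning
```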
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 3 is a schematic diagram of a context-based text characterization model training device provided by an embodiment of the present disclosure. As shown in fig. 3, the context-based text characterization model training device includes:
a construction module 301 configured to construct a nonlinear coding network using a plurality of embedding layers, and construct a text characterization model using the nonlinear coding network, the long-short-term memory network, and N-K fully connected neural networks, wherein each fully connected neural network is randomly initialized and is followed by an activation function;
the segmentation module 302 is configured to obtain a training data set, divide a training text in the training data set according to words, and record the number of the words obtained by division as N, wherein the position of each word in the training text determines the sequence between the word and other words;
a first determining module 303 configured to determine a low-dimensional vector characterization of each word through the nonlinear coding network;
a second determining module 304 configured to determine context latent vectors of the first K words through the long-short term memory network based on the low-dimensional vector representations of the first K words, wherein the context latent vector of each word is related to the low-dimensional vector representation of the word and the context latent vector of the preceding word of the word;
the third determining module 305 is configured to pass the upper and lower Wen Qian vectors of the kth word through N-K fully-connected neural networks to obtain predictive vector representations of the (k+1) th word to the nth word;
a first calculation module 306 configured to calculate a self-supervision loss according to the low-dimensional vector characterizations and the predictive vector characterizations of the (K+1)-th word to the N-th word;
a second calculation module 307 configured to determine positive samples and negative samples of the first K words from the training data set, and calculate a contrast learning loss according to the low-dimensional vector characterizations of the first K words and of their positive and negative samples;
an updating module 308 configured to update model parameters of the text characterization model based on the self-supervision loss and the contrast learning loss to complete training of the text characterization model.
According to the technical scheme provided by the embodiments of the present disclosure, a nonlinear coding network is constructed by utilizing a plurality of embedding layers, and a text characterization model is constructed by utilizing the nonlinear coding network, a long-short-term memory network and N-K fully connected neural networks, wherein each fully connected neural network is randomly initialized and is followed by an activation function; a training data set is acquired, the training texts in the training data set are divided into words, and the number of words obtained by division is recorded as N, wherein the position of each word in the training text determines the ordering between the word and the other words; a low-dimensional vector characterization of each word is determined through the nonlinear coding network; context latent vectors of the first K words are determined through the long-short-term memory network according to the low-dimensional vector characterizations of the first K words, wherein the context latent vector of each word is related to the low-dimensional vector characterization of the word and the context latent vector of the preceding word; the context latent vector of the K-th word is passed through the N-K fully connected neural networks respectively to obtain predictive vector characterizations of the (K+1)-th word to the N-th word; a self-supervision loss is calculated according to the low-dimensional vector characterizations and the predictive vector characterizations of the (K+1)-th word to the N-th word; positive samples and negative samples of the first K words are determined from the training data set, and a contrast learning loss is calculated according to the low-dimensional vector characterizations of the first K words and of their positive and negative samples; and the model parameters of the text characterization model are updated according to the self-supervision loss and the contrast learning loss so as to complete training of the text characterization model. In the present application, the context latent vector of each word is related to the context latent vector of the preceding word (that is, the context association is considered in the text characterization), and the self-supervision loss and the contrast learning loss are combined. By adopting the above technical means, the problems in the prior art that the context association cannot be considered when the model characterizes text and that the loss function provides only weak guidance for model training can be solved, so that the text characterization information can fully represent the information in the text and the guiding effect of the loss function on model training is improved.
Optionally, the updating module 308 is configured to divide the training data set into a first training set, a second training set and a third training set according to a preset proportion; and, in the text characterization model: calculate the contrast learning loss of the first training set, freeze the model parameters of the long-short-term memory network and all the fully connected neural networks, and update the model parameters of the nonlinear coding network according to the contrast learning loss of the first training set so as to complete the first-stage training of the text characterization model; calculate the self-supervision loss of the second training set, freeze the model parameters of the nonlinear coding network, and update the model parameters of the long-short-term memory network and all the fully connected neural networks according to the self-supervision loss of the second training set so as to complete the second-stage training of the text characterization model; and calculate the contrast learning loss and the self-supervision loss of the third training set, and update the model parameters of the text characterization model according to the contrast learning loss and the self-supervision loss of the third training set so as to complete the third-stage training of the text characterization model.
Optionally, the updating module 308 is configured to extract word vector features of the first K words using a word vector model; determine a comprehensive vector of each word according to the word vector features of the word and of all the words before it; calculate a word characterization loss according to the context latent vectors and the comprehensive vectors of the first K words; freeze model parameters of the long-short-term memory network and all the fully connected neural networks, and update the model parameters of the nonlinear coding network according to the contrast learning loss so as to complete the first-stage training of the text characterization model; freeze model parameters of the nonlinear coding network and all the fully connected neural networks, and update the model parameters of the long-short-term memory network according to the word characterization loss so as to complete the second-stage training of the text characterization model; and freeze model parameters of the nonlinear coding network and the long-short-term memory network, and update the model parameters of all the fully connected neural networks according to the self-supervision loss so as to complete the third-stage training of the text characterization model.
Optionally, the first determining module 303 is configured to calculate the low-dimensional vector characterization and the context latent vector of the K-th word by the following formulas:
z_K = Genc(x_K);
c_K = Gar(z_K, c_(K-1));
where Genc() represents the nonlinear coding network, x_K represents the K-th word, z_K represents the low-dimensional vector characterization of the K-th word, Gar() represents the long-short-term memory network, c_K represents the context latent vector of the K-th word, and c_(K-1) represents the context latent vector of the (K-1)-th word.
The low-dimensional vector characterizations and context latent vectors of the other words are calculated in the same way.
Optionally, the first calculation module 306 is configured to calculate the self-supervision loss L_self by the following formula:
L_self = Σ_{i=1}^{N-K} MSE(ẑ_(K+i), z_(K+i));
where MSE() is the mean square error function, ẑ_(K+i) represents the predictive vector characterization of the (K+i)-th word, z_(K+i) represents the low-dimensional vector characterization of the (K+i)-th word, i is a natural number, and the value of i is between 1 and N-K.
Optionally, the second calculation module 307 is configured to calculate the contrast learning loss L_con by the following formula:
L_con = Σ_{j=1}^{K} L_tri(z_j, z_(p_j), z_(n_j));
where L_tri() is the triplet loss function; j, p_j and n_j are natural numbers; j takes values from 1 to K; the p_j-th word is the positive sample of the j-th word and the n_j-th word is the negative sample of the j-th word; p_j takes a value between 2 and K+1 and n_j takes a value between K+1 and N; z_j is the low-dimensional vector characterization of the j-th word, z_(p_j) is the low-dimensional vector characterization of the p_j-th word, and z_(n_j) is the low-dimensional vector characterization of the n_j-th word.
Optionally, the updating module 308 is configured to calculate a total loss by weighting the self-supervision loss and the contrast learning loss, and to update the model parameters of the text characterization model according to the total loss; the weight adjustment factor used in the weighting has a value between 0 and 1 and can be set as needed.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
Fig. 4 is a schematic diagram of an electronic device 4 provided by an embodiment of the present disclosure. As shown in fig. 4, the electronic apparatus 4 of this embodiment includes: a processor 401, a memory 402 and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps of the various method embodiments described above are implemented by processor 401 when executing computer program 403. Alternatively, the processor 401, when executing the computer program 403, performs the functions of the modules/units in the above-described apparatus embodiments.
The electronic device 4 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 4 may include, but is not limited to, a processor 401 and a memory 402. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the electronic device 4 and is not limiting of the electronic device 4 and may include more or fewer components than shown, or different components.
The processor 401 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 402 may be an internal storage unit of the electronic device 4, for example, a hard disk or a memory of the electronic device 4. The memory 402 may also be an external storage device of the electronic device 4, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 4. Memory 402 may also include both internal storage units and external storage devices of electronic device 4. The memory 402 is used to store computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the methods of the above-described embodiments by instructing related hardware through a computer program; the computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, the computer readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are merely for illustrating the technical solution of the present disclosure, and are not limiting thereof; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included in the scope of the present disclosure.

Claims (10)

1. A text characterization model training method based on context, comprising:
constructing a nonlinear coding network by utilizing a plurality of embedding layers, and constructing a text characterization model by utilizing the nonlinear coding network, a long-short-term memory network and N-K fully connected neural networks, wherein each fully connected neural network is randomly initialized and is followed by an activation function;
acquiring a training data set, dividing training texts in the training data set according to words, and recording the number of words obtained by dividing as N, wherein the position of each word in the training texts determines the sequence between the word and other words;
determining a low-dimensional vector representation of each word through the nonlinear coding network;
determining context latent vectors of the first K words through the long-short-term memory network according to the low-dimensional vector representation of the first K words, wherein the context latent vector of each word is related to the low-dimensional vector representation of the word and the context latent vector of the previous word of the word;
passing the context latent vector of the K-th word through the N-K fully connected neural networks respectively to obtain predictive vector characterizations of the (K+1)-th word to the N-th word;
calculating a self-supervision loss according to the low-dimensional vector characterizations and the predictive vector characterizations of the (K+1)-th word to the N-th word;
positive samples and negative samples of the first K words are determined from the training data set, and a contrast learning loss is calculated according to the first K words and low-dimensional vector characterization of the positive samples and the negative samples of the first K words;
and updating model parameters of the text characterization model according to the self-supervision loss and the contrast learning loss so as to complete training of the text characterization model.
2. The method of claim 1, wherein after acquiring the training data set, the method further comprises:
dividing the training data set into a first training set, a second training set and a third training set according to a preset proportion;
in the text characterization model:
calculating the contrast learning loss of the first training set, freezing the model parameters of the long-short-term memory network and all the fully connected neural networks, and updating the model parameters of the nonlinear coding network according to the contrast learning loss of the first training set so as to complete the first-stage training of the text characterization model;
calculating the self-supervision loss of the second training set, freezing the model parameters of the nonlinear coding network, and updating the model parameters of the long-short-term memory network and all the fully connected neural networks according to the self-supervision loss of the second training set so as to complete the second-stage training of the text characterization model;
and calculating the contrast learning loss and the self-supervision loss of the third training set, and updating the model parameters of the text characterization model according to the contrast learning loss and the self-supervision loss of the third training set so as to complete the third-stage training of the text characterization model.
3. The method according to claim 1, wherein the method further comprises:
in the text characterization model:
freezing model parameters of the long-short-term memory network and all fully connected neural networks, and updating the model parameters of the nonlinear coding network according to the contrast learning loss so as to complete the first-stage training of the text characterization model;
freezing model parameters of the nonlinear coding network and all fully connected neural networks, and updating the model parameters of the long-short-term memory network according to a word characterization loss so as to complete the second-stage training of the text characterization model;
and freezing model parameters of the nonlinear coding network and the long-short-term memory network, and updating model parameters of all the fully-connected neural networks according to the self-supervision loss to complete the third-stage training of the text characterization model.
4. A method according to claim 3, characterized in that the method further comprises:
extracting word vector features of the first K words by using a word vector model;
determining a comprehensive vector of each word according to the word vector features of the word and of all the words before it;
and calculating word characterization loss according to the context latent vectors and the comprehensive vectors of the first K words.
5. The method according to claim 1, wherein the self-supervision loss L_self is calculated by the following formula:
L_self = Σ_{i=1}^{N-K} MSE(ẑ_(K+i), z_(K+i));
where MSE() is the mean square error function, ẑ_(K+i) represents the predictive vector characterization of the (K+i)-th word, z_(K+i) represents the low-dimensional vector characterization of the (K+i)-th word, i is a natural number, and the value of i is between 1 and N-K.
6. The method of claim 1, wherein the contrast learning loss L_con is calculated by the following formula:
L_con = Σ_{j=1}^{K} L_tri(z_j, z_(p_j), z_(n_j));
where L_tri() is the triplet loss function; j, p_j and n_j are natural numbers; j takes values from 1 to K; the p_j-th word is the positive sample of the j-th word and the n_j-th word is the negative sample of the j-th word; p_j takes a value between 2 and K+1 and n_j takes a value between K+1 and N; z_j is the low-dimensional vector characterization of the j-th word, z_(p_j) is the low-dimensional vector characterization of the p_j-th word, and z_(n_j) is the low-dimensional vector characterization of the n_j-th word.
7. The method of claim 1, wherein updating model parameters of the text characterization model based on the self-supervised and the comparative learning losses comprises:
calculating a total loss by weighting the self-supervision loss and the contrast learning loss, and updating the model parameters of the text characterization model according to the total loss, wherein the weight adjustment factor used in the weighting has a value between 0 and 1.
8. A context-based text characterization model training device, comprising:
the construction module is configured to construct a nonlinear coding network by utilizing a plurality of embedding layers, and construct a text characterization model by utilizing the nonlinear coding network, a long-short-term memory network and N-K fully connected neural networks, wherein each fully connected neural network is randomly initialized and is followed by an activation function;
the segmentation module is configured to acquire a training data set, divide training texts in the training data set according to words, and record the number of the words obtained by division as N, wherein the position of each word in the training texts determines the sequence between the word and other words;
a first determining module configured to determine a low-dimensional vector characterization of each word through the nonlinear coding network;
a second determining module configured to determine context latent vectors of the first K words through the long-short term memory network according to the low-dimensional vector representation of the first K words, wherein the context latent vector of each word is related to the low-dimensional vector representation of the word and the context latent vector of the preceding word of the word;
the third determining module is configured to enable the upper Wen Qian vector and the lower Wen Qian vector of the K word to respectively pass through N-K fully-connected neural networks to obtain predictive vector characterization of the K+1th word to the N word;
a first calculation module configured to calculate a self-supervising penalty from the low-dimensional vector characterizations and the predictive vector characterizations of the k+1th term through the nth term;
the second calculation module is configured to determine positive samples and negative samples of the first K words from the training data set, and calculate contrast learning loss according to the first K words and low-dimensional vector characterization of the positive samples and the negative samples of the first K words;
and the updating module is configured to update model parameters of the text characterization model according to the self-supervision loss and the contrast learning loss so as to complete training of the text characterization model.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.
CN202310779760.6A 2023-06-29 2023-06-29 Text characterization model training method and device based on context Active CN116502640B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310779760.6A CN116502640B (en) 2023-06-29 2023-06-29 Text characterization model training method and device based on context

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310779760.6A CN116502640B (en) 2023-06-29 2023-06-29 Text characterization model training method and device based on context

Publications (2)

Publication Number Publication Date
CN116502640A CN116502640A (en) 2023-07-28
CN116502640B true CN116502640B (en) 2023-12-12

Family

ID=87317060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310779760.6A Active CN116502640B (en) 2023-06-29 2023-06-29 Text characterization model training method and device based on context

Country Status (1)

Country Link
CN (1) CN116502640B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992773A (en) * 2019-03-20 2019-07-09 华南理工大学 Term vector training method, system, equipment and medium based on multi-task learning
WO2020227651A1 (en) * 2019-05-09 2020-11-12 Automobilia Ii, Llc Methods, systems and computer program products for media processing and display
CN114548321A (en) * 2022-03-05 2022-05-27 昆明理工大学 Self-supervision public opinion comment viewpoint object classification method based on comparative learning
CN116127953A (en) * 2023-04-18 2023-05-16 之江实验室 Chinese spelling error correction method, device and medium based on contrast learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992773A (en) * 2019-03-20 2019-07-09 华南理工大学 Term vector training method, system, equipment and medium based on multi-task learning
WO2020227651A1 (en) * 2019-05-09 2020-11-12 Automobilia Ii, Llc Methods, systems and computer program products for media processing and display
CN114548321A (en) * 2022-03-05 2022-05-27 昆明理工大学 Self-supervision public opinion comment viewpoint object classification method based on comparative learning
CN116127953A (en) * 2023-04-18 2023-05-16 之江实验室 Chinese spelling error correction method, device and medium based on contrast learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition; Baoguang Shi et al.; arXiv:1507.05717v1; pp. 1-9 *
Word Representation Learning Based on Description Constraints (基于描述约束的词表示学习); 冶忠林 et al.; Journal of Chinese Information Processing (中文信息学报); Vol. 33, No. 4; pp. 29-36 *

Also Published As

Publication number Publication date
CN116502640A (en) 2023-07-28

Similar Documents

Publication Publication Date Title
US11829874B2 (en) Neural architecture search
CN110366734B (en) Optimizing neural network architecture
WO2020224219A1 (en) Chinese word segmentation method and apparatus, electronic device and readable storage medium
US20200265192A1 (en) Automatic text summarization method, apparatus, computer device, and storage medium
WO2021114840A1 (en) Scoring method and apparatus based on semantic analysis, terminal device, and storage medium
CN105022754B (en) Object classification method and device based on social network
US20130138589A1 (en) Exploiting sparseness in training deep neural networks
WO2021089012A1 (en) Node classification method and apparatus for graph network model, and terminal device
CN109710921B (en) Word similarity calculation method, device, computer equipment and storage medium
CN116362351A (en) Method and device for training pre-training language model by using noise disturbance
CN111339775A (en) Named entity identification method, device, terminal equipment and storage medium
CN114020950B (en) Training method, device, equipment and storage medium for image retrieval model
CN108475346B (en) Neural random access machine
CN110046344B (en) Method for adding separator and terminal equipment
CN116127925B (en) Text data enhancement method and device based on destruction processing of text
CN116502640B (en) Text characterization model training method and device based on context
US20180129916A1 (en) Statistical max pooling with deep learning
CN111339308A (en) Training method and device of basic classification model and electronic equipment
CN111401069A (en) Intention recognition method and intention recognition device for conversation text and terminal
CN115062769A (en) Knowledge distillation-based model training method, device, equipment and storage medium
CN114358023A (en) Intelligent question-answer recall method and device, computer equipment and storage medium
CN114595641A (en) Method and system for solving combined optimization problem
CN116523028B (en) Image characterization model training method and device based on image space position
JP2018081294A (en) Acoustic model learning device, voice recognition device, acoustic model learning method, voice recognition method, and program
Lee et al. Improved model adaptation approach for recognition of reduced-frame-rate continuous speech

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant