WO2022257453A1 - Paraphrase analysis model training method, apparatus, terminal device, and storage medium - Google Patents

Paraphrase analysis model training method, apparatus, terminal device, and storage medium

Info

Publication number
WO2022257453A1
WO2022257453A1 (PCT/CN2022/071358, CN2022071358W)
Authority
WO
WIPO (PCT)
Prior art keywords
vector
text
target
loss function
training
Prior art date
Application number
PCT/CN2022/071358
Other languages
English (en)
French (fr)
Inventor
赵盟盟
王媛
吴文哲
王磊
苏亮州
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022257453A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Definitions

  • the present application belongs to the technical field of artificial intelligence, and in particular relates to a paraphrase analysis model training method, device, terminal equipment and storage medium.
  • the text paraphrase analysis task is one of the common indicators to evaluate the performance of the model in the field of natural language processing. Specifically, two paragraphs of text are input into the model, and the model is processed to predict whether the meaning expressed by the two paragraphs of text is the same meaning, that is, the similarity of interpretation between the two paragraphs of text.
  • One of the purposes of the embodiments of the present application is to provide a paraphrase analysis model training method, device, terminal equipment and storage medium, aiming to solve the technical problem of low prediction accuracy of the text paraphrase analysis model trained in the prior art.
  • the embodiment of the present application provides a paraphrase analysis model training method, the method includes:
  • the training sample includes at least two paragraphs of text
  • the training sample is input into the first network structure of the pre-training model to obtain the target embedding vector of the training sample;
  • the pre-training model also includes a high-pass filter layer and a second network structure;
  • the target training model is used to process any two input paragraphs of text to be recognized and output the paraphrase similarity of the two paragraphs of text to be recognized.
  • the embodiment of the present application provides a paraphrase analysis model training device, the device includes:
  • An acquisition module configured to acquire training samples, the training samples at least including two paragraphs of text;
  • the first input module is used to input the training sample into the first network structure of the pre-training model to obtain the target embedding vector of the training sample;
  • the pre-training model also includes a high-pass filter layer and a second network structure;
  • the second input module is used to input the target embedding vector to the high-pass filter layer for information noise filtering to obtain a filter vector;
  • a third input module configured to input the target embedding vector and the filtering vector into the second network structure for vector processing to obtain the target loss function value of the training sample
  • the training module is used to perform backpropagation training on the pre-training model based on the target loss function value to obtain a target training model, and the target training model is used to process any two input paragraphs of text to be recognized and output the paraphrase similarity of the two paragraphs of text to be recognized.
  • the third aspect of the embodiments of the present application provides a terminal device, including: a memory, a processor, and a computer program stored in the memory and operable on the processor, and the processor executes the computer program Realized when:
  • the training sample includes at least two paragraphs of text
  • the training sample is input into the first network structure of the pre-training model to obtain the target embedding vector of the training sample;
  • the pre-training model also includes a high-pass filter layer and a second network structure;
  • the target training model is used to process any two input paragraphs of text to be recognized and output the paraphrase similarity of the two paragraphs of text to be recognized.
  • the fourth aspect of the embodiments of the present application provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it realizes:
  • the training sample includes at least two paragraphs of text
  • the training sample is input into the first network structure of the pre-training model to obtain the target embedding vector of the training sample;
  • the pre-training model also includes a high-pass filter layer and a second network structure;
  • the target training model is used to process any two input paragraphs of text to be recognized and output the paraphrase similarity of the two paragraphs of text to be recognized.
  • the fifth aspect of the embodiments of the present application also provides a computer program product.
  • the computer program product is run on a terminal device, the terminal device is executed to realize:
  • the training sample includes at least two paragraphs of text
  • the training sample is input into the first network structure of the pre-training model to obtain the target embedding vector of the training sample;
  • the pre-training model also includes a high-pass filter layer and a second network structure;
  • the target training model is used to process any two input paragraphs of text to be recognized and output the paraphrase similarity of the two paragraphs of text to be recognized.
  • the embodiment of the present application includes the following advantages:
  • by using an existing first network structure that can perform vector processing on text, a target embedding vector containing the paraphrase information between the two paragraphs of text can be obtained initially, reducing the time needed to redesign the first network structure that vectorizes text in the training model.
  • high-pass filtering is performed on the target embedding vector to reduce the interference of information noise in the target embedding vector to the model.
  • model processing is performed based on the target embedding vector and a more accurate filter vector to obtain the target loss function value.
  • the pre-training model can not only maximize the retention of the feature information between the original two texts during the vector processing process, but also enhance the key information in the two texts based on the filter vector.
  • various learning parameters and weight parameters in the pre-training model are fine-tuned to obtain the target training model, so as to improve the prediction accuracy of the target training model.
  • Fig. 1 is the realization flowchart of a kind of paraphrase analysis model training method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of an implementation of S102 of a paraphrase analysis model training method provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of an implementation of S1023 of a paraphrase analysis model training method provided by an embodiment of the present application
  • Fig. 4 is a schematic diagram of an implementation of S104 of a paraphrase analysis model training method provided by an embodiment of the present application
  • Fig. 5 is a schematic diagram of an implementation of S1043 of a paraphrase analysis model training method provided by an embodiment of the present application
  • Fig. 6 is a flow chart of realizing a paraphrase analysis model training method provided by another embodiment of the present application.
  • Fig. 7 is a schematic structural diagram of a paraphrase analysis model training device provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the paraphrase analysis model training method provided by the embodiments of the present application can be applied to terminal devices such as tablet computers, notebook computers, ultra-mobile personal computers (UMPC), and netbooks.
  • FIG. 1 shows an implementation flowchart of a paraphrase analysis model training method provided by an embodiment of the present application. The method includes the following steps:
  • the above training samples may be obtained in advance from a plurality of existing practical applications in different industries and/or research fields.
  • corresponding sample data sets are pre-collected in practical applications in different industries and/or research fields, and each sample data set includes corresponding training samples.
  • the above-mentioned sample data set is specifically a data set used for paraphrase similarity analysis training.
  • the above sample data sets specifically include but not limited to MNLI data set, SNLI data set and SICK data set.
  • when the above-mentioned training samples are samples used for paraphrase similarity analysis, each training sample needs to include two paragraphs of text as well as the actual result (a similar result or a dissimilar result) between the two paragraphs of text. Based on this, after the training model receives a training sample, it can perform model processing on it and output a prediction result. Afterwards, a training loss value is calculated based on the prediction result and the actual result to update the trained model.
  • the language of the above two paragraphs of text includes but not limited to Chinese, English and other languages.
  • existing language translation technology can be used to translate the language of one of the texts to obtain two texts in the same language.
  • the above-mentioned pre-training model may be an existing model or a newly designed network model, which is not limited.
  • the above-mentioned pre-training model is specifically an existing Bert model improved by Transformer, which is usually used to perform text paraphrase analysis tasks to reduce the training time of the pre-training model.
  • the target embedding vector is a feature vector representing the paraphrase relationship between two pieces of text. That is, after inputting the training sample into the first network structure and performing vector processing on it, the obtained target embedding vector can be used to represent the paraphrase relationship between two paragraphs of text. Then, the second network structure and high-pass filtering layer can process the target embedding vector and output the prediction result of the paraphrase similarity between two pieces of text. Finally, the second network structure can also calculate the corresponding loss function value based on the prediction result and the actual result of the training sample, and iteratively update the pre-training model.
  • the above-mentioned first network structure is the structure that processes the training samples to obtain the target embedding vector; it can be the network structure in the existing Bert model that processes text to obtain a text feature vector, which is not described further here.
  • the above-mentioned second network structure may specifically include a Dropout layer, a Linear layer, a Softmax layer, and a loss function calculation layer.
  • the above-mentioned Dropout layer can solve the problem of model overfitting in the process of model training.
  • the linear function in the above Linear layer can make the model gradually converge during the training process.
  • the activation function in the above-mentioned Softmax layer can be calculated based on the input feature vector, and output the prediction result (paraphrase similarity) between two pieces of text in the training sample.
  • the loss function calculation layer can calculate the loss function value based on the predicted results and the actual results of the training samples.
  • the actual result of a training sample usually participates in the calculation as a parameter taking the value 1 or 0.
  • the loss function in the above-mentioned loss function layer may specifically be a cross-entropy loss function
  • the above-mentioned softmax function may specifically be:
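  • The formula itself is rendered only as an image in the published application; judging from the surrounding description of xi and f(xi), it is presumably the standard softmax form:

```latex
f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
```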
  • xi is the initial probability, predicted by the pre-training model, that the training sample belongs to the i-th class of prediction result
  • f(xi) is the final probability that the training sample belongs to the i-th class of prediction result, obtained by applying the activation function to the above-mentioned initial probability.
  • in binary classification there are only two classes for i; in multi-class prediction, the number of classes for i is not limited.
  • the final output probability f(xi) can also be regarded as the paraphrase similarity of the two paragraphs of text output by the pre-training model.
  • after obtaining the initial probability xi, this embodiment further corrects it through the above activation function f(xi), so as to improve the expressive capability that the second network structure contributes to the overall pre-training model.
  • the above-mentioned high-pass filter layer may be used to perform information noise filtering processing on the input target embedding vector, so as to remove low-frequency components in the target embedding vector and retain high-frequency components therein.
  • the above S102 has already explained the high-pass filter layer, which will not be further described.
  • the above-mentioned high-frequency components usually contain more accurate vector information in the target embedding vector, so that the obtained filter vector can better reflect the paraphrase information contained between the two texts.
  • the above-mentioned high-pass filter layer includes a high-pass filter, which can pass high-frequency signals normally while processing signals, and block and weaken low-frequency signals lower than a set threshold. But the range of blocking and attenuation will vary according to different frequencies and different filtering procedures (purposes).
  • the high-pass filter layer can convert the target embedding vector into a frequency signal for representation, and the signal is formed by superimposing signals between multiple different frequencies. Wherein, converting the vector into a signal can be realized by using an existing test vector conversion technology, which will not be described in detail. Afterwards, for the high-frequency signal obtained after high-pass filter processing, it can be converted into a vector again based on the above-mentioned conversion technology for subsequent model processing.
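  • As a rough illustration of this idea (not the patent's exact procedure), the following NumPy sketch converts a target embedding vector to the frequency domain, zeroes out the components below an assumed cutoff, and converts the result back into a filter vector; the 768-dimensional vector and the cutoff of 4 frequency bins are illustrative assumptions:

```python
import numpy as np

def high_pass_filter(embedding: np.ndarray, cutoff: int = 4) -> np.ndarray:
    """Convert an embedding vector to the frequency domain, block the
    lowest-frequency components, and convert back to a vector."""
    spectrum = np.fft.rfft(embedding)                      # vector -> frequency-domain signal
    spectrum[:cutoff] = 0.0                                # block/attenuate low-frequency components
    return np.fft.irfft(spectrum, n=embedding.shape[-1])   # back to a filter vector

# toy target embedding vector of two paragraphs of text
target_embedding = np.random.randn(768)
filter_vector = high_pass_filter(target_embedding)
```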
  • the second network structure has been explained above, and will not be further described. What needs to be added is that the target embedding vector and filter vector can be simultaneously input into the second network structure for model processing, and the result obtained is the prediction result between two texts output by the pre-training model. Afterwards, the pre-training model is calculated based on the prediction results and the actual results of the training samples to obtain the target loss function value.
  • the terminal device may also respectively input the target embedding vector and the filter vector into the second network structure, and at this time, two kinds of prediction results output by the pre-training model may be correspondingly obtained.
  • the second network structure can calculate each prediction result and the actual result respectively, and obtain two kinds of loss function values correspondingly. Finally, the second network structure may add the two loss function values to obtain the target loss function value, which is not limited.
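  • A minimal PyTorch sketch of the second variant just described (separate forward passes, two cross-entropy losses summed into the target loss function value); the tensor shapes and the binary similar/dissimilar labelling are assumptions made for illustration only:

```python
import torch
import torch.nn.functional as F

# Logits produced by the second network structure for the same training sample:
# one forward pass on the target embedding vector, one on the filter vector.
logits_original = torch.randn(1, 2, requires_grad=True)  # [batch, 2 classes]
logits_filtered = torch.randn(1, 2, requires_grad=True)
label = torch.tensor([1])  # actual result: 1 = similar, 0 = dissimilar

loss_original = F.cross_entropy(logits_original, label)  # original loss function value
loss_filtered = F.cross_entropy(logits_filtered, label)  # filtering loss function value
target_loss = loss_original + loss_filtered               # target loss function value
target_loss.backward()  # a backpropagation training step would follow from this value
```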
  • the filtered filter vector and the target embedding vector are input into the second network structure for processing, so that while the feature information of the text is preserved to the maximum extent, the key information in the two paragraphs of text can also be enhanced based on the filter vector.
  • the pre-training model can perform backpropagation training based on the value, so as to update the learning parameters and weight parameters of each network layer in the pre-training model to obtain the target training model.
  • the target training model obtained based on the above method is specifically a binary classification model, which can be used to perform model processing on the input two pieces of text to be recognized, and output the similarity of the interpretation of the two pieces of text to be recognized.
  • the target training model can refer to the above steps of S102 and S103 to process the two texts to be recognized to obtain the target embedding vector and filter vector of the two texts to be recognized. Afterwards, when the target training model inputs the target embedding vector and filter vector to the second network structure, the current target training model does not need to perform backpropagation training, that is, it does not need to calculate the loss function value. Therefore, the target training model only needs to input the target embedding vector and filter vector to the Dropout layer, Linear layer, and Softmax layer in the second network structure for processing to obtain the prediction result.
  • by using an existing first network structure that can perform vector processing on text, a target embedding vector containing the paraphrase information between the two paragraphs of text can be obtained initially, reducing the time needed to redesign the first network structure that vectorizes text in the training model. Then, high-pass filtering is performed on the target embedding vector to reduce the interference of information noise in the target embedding vector to the model. After that, model processing is performed based on the target embedding vector and a more accurate filter vector to obtain the target loss function value. In this way, the pre-training model can not only maximize the retention of the feature information between the original two texts during the vector processing process, but also enhance the key information in the two texts based on the filter vector. Finally, according to the target loss function value, various learning parameters and weight parameters in the pre-training model are fine-tuned to obtain the target training model, so as to improve the prediction accuracy of the target training model.
  • the training sample is input into the first network structure of the pre-training model to obtain the target embedding vector of the training sample, which specifically includes the following sub-steps S1021-1024, which are described in detail as follows:
  • S1022. Determine the text content between the start symbol and the division symbol as the first text, and determine the text content after the division symbol as the second text.
  • the above-mentioned starting symbol and dividing symbol can be set by the user according to the actual situation, including but not limited to letters, numbers, etc., and this embodiment does not limit the expression forms of the starting symbol and dividing symbol.
  • the terminal device may determine the text between the SEP symbol and the CLS symbol as the first text, and determine the text after the CLS symbol as the second text.
  • the above-mentioned first embedding vector may be a vector obtained after processing each word segment in the first text.
  • the first word vector of the first participle may be obtained; and, based on the start symbol, the first word position vector of the first participle in the first text is determined.
  • comprehensive processing is performed based on the first word vector, the first word position vector, and the preset embedding vector of the first text to obtain a first word segmentation embedding vector of the first word segmentation.
  • the above three vectors are summed to obtain the first word segmentation embedding vector that can represent the first word segmentation.
  • the above steps are performed for each first word segment to obtain the first word segment embedding vectors of each first word segment in the first text. Based on this, it can be considered that the above-mentioned first embedding vector is composed of first word-segment embedding vectors of each first word-segment in the above-mentioned first text.
  • both the above-mentioned first embedding vector and the second embedding vector can be expressed in a specific digital form. Based on this, after obtaining the first embedding vector and the second embedding vector, the average value of the numbers between the two vectors can be used as the target embedding vector of the training sample, which is not limited. At this point, it can be understood that the target embedding vector is obtained based on the processing of the first embedding vector and the second embedding vector, therefore, it can be considered that the target embedding vector contains vector information of both embedding vectors.
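  • One straightforward reading of "the average value of the numbers between the two vectors" is an element-wise mean, sketched below with NumPy; the 768-dimensional vector size is an assumption for illustration:

```python
import numpy as np

first_embedding = np.random.randn(768)    # first embedding vector of the first text
second_embedding = np.random.randn(768)   # second embedding vector of the second text

# target embedding vector = element-wise mean of the two embedding vectors
target_embedding = (first_embedding + second_embedding) / 2.0
```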
  • the first text includes at least one first word segment
  • the second text includes at least one second word segment
  • the first embedding vector is composed of the word-segment embedding vectors corresponding to the first word segments
  • the second embedding vector is composed of the word-segment embedding vectors corresponding to the second word segments; in S1023, inputting the first text and the second text into the first network structure to obtain the first embedding vector of the first text and the second embedding vector of the second text specifically includes the following sub-steps S10231-S10234, which are described in detail as follows:
  • the first text and the second text each include at least one participle
  • the first text includes multiple first participles as an example for explanation.
  • the terminal device may perform text segmentation on the first text based on a preset word vector library to obtain multiple first segmentation words.
  • the word vector library contains multiple word segments, and each word segment corresponds to a unique word vector. Based on this, the terminal device may first take the entire first text as a word segment and compare it against the word vector library. If there is no corresponding word segment, it removes the first character or the last character and compares the remaining text as a word segment against the word vector library, until each character in the first text is matched with a corresponding word segment and word vector. At this time, the corresponding word segments are the first word segments.
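  • The matching procedure described above can be sketched roughly as a greedy longest-match against the word vector library; the sketch below uses only the drop-the-last-character variant (the text also allows dropping the first character), and the toy vocabulary is purely illustrative:

```python
def segment(text: str, vocab: set) -> list:
    """Greedy longest-match segmentation against a word-vector vocabulary."""
    segments = []
    while text:
        piece = text
        # shrink the candidate until it matches an entry in the vocabulary
        while len(piece) > 1 and piece not in vocab:
            piece = piece[:-1]          # drop the last character and retry
        segments.append(piece)
        text = text[len(piece):]
    return segments

vocab = {"机器", "学习", "模型"}
print(segment("机器学习模型", vocab))   # ['机器', '学习', '模型']
```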
  • the word order of the first segmentation among multiple first segmentations included in the first text may be determined based on the segmentation result. After that, the word order is used as the word position vector of the first participle.
  • S10233 Perform vector addition processing according to the first word position vector, the first word vector, and the preset embedding information of the first text to obtain a first word segment embedding vector of the first word segment; and, according to The second word position vector, the second word vector, and the preset embedding information of the second text are subjected to vector sum processing to obtain a second word segment embedding vector of the second word segment.
  • the above-mentioned first pre-embedded vector is used to distinguish the text to which the first word segment belongs, and can be set by the user according to the actual situation. It should be noted that, for multiple first word segments in the first text, the first pre-embedding vectors corresponding to each first word segment are consistent.
  • the above-mentioned first word vector, first word position vector and preset embedding vector can all be represented in a specific digital form. Therefore, for the three kinds of vectors of the above-mentioned first participle, the summed value of the above-mentioned three kinds of vectors can be used to represent the first participle embedding vector of the first participle. Finally, after determining the first word-segment embedding vectors respectively corresponding to the multiple first word-segments in the first text, the first embedding vector of the first text is obtained.
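  • In other words, each word-segment embedding is simply the element-wise sum of its word vector, word position vector, and preset (text-identifying) embedding vector; a minimal NumPy sketch, with an assumed 768-dimensional size, is:

```python
import numpy as np

dim = 768
word_vector = np.random.randn(dim)      # first word vector (from the word vector library)
position_vector = np.random.randn(dim)  # first word position vector (word order in the first text)
preset_vector = np.random.randn(dim)    # preset embedding vector marking which text the segment belongs to

# first word-segment embedding vector = sum of the three vectors
word_segment_embedding = word_vector + position_vector + preset_vector
```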
  • the process of obtaining the second embedding vector of the second text is similar to the above process of obtaining the first embedding vector, for details, please refer to the above description.
  • the target embedding vector and the filter vector are respectively input into the second network structure for vector processing to obtain the target loss function value of the training sample, which specifically includes the following sub-steps S1041-S1043, detailed as follows:
  • the above-mentioned second network structure has been explained in the above-mentioned S102, which will not be further described. It should be noted that, for the above-mentioned target embedding vector and filter vector, the second network structure processes the two vectors separately to obtain corresponding first prediction results and second prediction results.
  • the filter vector is obtained based on processing the target embedding vector. Therefore, compared with the first prediction result, it can also be considered that the second prediction result is closer to the actual result of the training samples.
  • the above cross entropy is used to evaluate the difference between the probability distribution (prediction result) predicted by the current pre-training model and the real distribution (real result).
  • reducing the cross-entropy loss can improve the prediction accuracy of the pre-training model.
  • when Sigmoid or Softmax is used as the activation function in the second network structure, using cross entropy to calculate the loss function can avoid the slow iterative updating of the pre-training model that occurs when the squared loss function is used.
  • the above calculation of the target loss function value based on the original loss function and the filtering loss function may specifically be: calculating the sum of the original loss function and the filtering loss function to obtain the target loss function.
  • the above-mentioned calculation of the target loss function may also go through the following sub-steps S10431-S10432, which are described in detail as follows:
  • the filter vector can enhance the key information in the two paragraphs of text. Therefore, for the above two loss functions, it can be considered that the preset second weight value is greater than the preset first weight value. In this way, the target loss function obtained by calculating the preset first weight value and the preset second weight value can better complete the training of the pre-training model, so as to obtain a target training model with high prediction accuracy.
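  • Under this weighting scheme the target loss can be written as the weighted sum below, where w1 and w2 denote the preset first and second weight values (these symbols are introduced here for clarity and do not appear in the original text):

```latex
L_{\text{target}} = w_1 \, L_{\text{original}} + w_2 \, L_{\text{filter}}, \qquad w_2 > w_1
```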
  • since the first network structure is the network structure in the existing Bert model that performs vector processing on text, it can be considered a mature network structure. Based on this, when iteratively updating the model parameters of the pre-training model, only the model parameters in the second network structure and the high-pass filter layer may be iteratively updated, which ensures that the trained target training model has a certain prediction accuracy while also reducing the training time of the target training model.
  • the method further includes the following steps S11-S12, described in detail as follows:
  • the above-mentioned cosine similarity uses the cosine of the angle between two vectors in the vector space as a measure of the difference between two individuals (two pieces of text to be recognized); the closer the value is to 1, the more similar the two vectors can be considered, and the more similar the two pieces of text to be recognized are.
  • the target training model is obtained through the above method, which is suitable for identifying the similarity of interpretation between two pieces of text.
  • when the target training model needs to recognize multiple pieces of text to be recognized, it is necessary to perform the above method on each piece of text to be recognized.
  • the target embedding vector to be recognized of the text to be recognized can be obtained.
  • the target vector to be recognized is input to the high-pass filter layer to obtain the filtered vector of the text to be recognized.
  • the target training model can calculate the cosine similarity of the filter vectors of pairs of texts to be recognized as the paraphrase similarity of pairs of texts to be recognized.
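  • A minimal NumPy sketch of this pairwise cosine-similarity computation over filter vectors; the 768-dimensional vector size is an assumption for illustration:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two filter vectors; values closer to 1 indicate more similar paraphrase."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

filter_vec_1 = np.random.randn(768)   # filter vector of the first text to be recognized
filter_vec_2 = np.random.randn(768)   # filter vector of the second text to be recognized
paraphrase_similarity = cosine_similarity(filter_vec_1, filter_vec_2)
```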
  • the target training model can not only realize the analysis of the paraphrase similarity between two paragraphs of text to be recognized, but also recognize and predict the paraphrase similarity between multiple texts.
  • FIG. 7 is a structural block diagram of a paraphrase analysis model training device provided by an embodiment of the present application.
  • the modules included in the paraphrase analysis model training device in this embodiment are used to execute the steps in the embodiments corresponding to FIG. 1 to FIG. 6; for details, refer to FIG. 1 to FIG. 6 and the related descriptions in the embodiments corresponding to FIG. 1 to FIG. 6.
  • the paraphrase analysis model training device 700 includes: an acquisition module 710, a first input module 720, a second input module 730, a third input module 740 and a training module 750, wherein:
  • the acquiring module 710 is configured to acquire training samples, where the training samples include at least two paragraphs of text.
  • the first input module 720 is used to input the training sample into the first network structure of the pre-training model to obtain the target embedding vector of the training sample; the pre-training model also includes a high-pass filter layer and a second network structure .
  • the second input module 730 is configured to input the target embedding vector to the high-pass filter layer to perform information noise filtering to obtain a filter vector.
  • the third input module 740 is configured to input the target embedding vector and the filtering vector into the second network structure for vector processing to obtain the target loss function value of the training sample.
  • the training module 750 is configured to perform backpropagation training on the pre-training model based on the target loss function value to obtain a target training model, and the target training model is used to process any two input paragraphs of text to be recognized and output the paraphrase similarity of the two paragraphs of text to be recognized.
  • the first input module 720 is also used for:
  • identifying the start symbol and the segmentation symbol in the training sample; determining the text content between the start symbol and the segmentation symbol as the first text, and determining the text content after the segmentation symbol as the second text;
  • inputting the first text and the second text into the first network structure to obtain the first embedding vector of the first text and the second embedding vector of the second text; calculating the mean value of the first embedding vector and the second embedding vector, and using the mean value as the target embedding vector of the training sample.
  • the first text includes at least one first word segmentation
  • the second text includes at least one second word segmentation
  • the first embedding vector is composed of word segmentation embedding vectors corresponding to the first word segmentation
  • the second embedding vector is composed of the word segmentation embedding vectors corresponding to the second word segmentation
  • the first input module 720 is also used for:
  • for any first word segmentation of the first text, determining the first word vector of the first word segmentation; and, for any second word segmentation of the second text, determining the second word vector of the second word segmentation; determining the first word position vector of the first word segmentation in the first text relative to the start symbol; and determining the second word position vector of the second word segmentation in the second text relative to the segmentation symbol;
  • performing vector sum processing according to the first word position vector, the first word vector, and the preset embedding information of the first text to obtain the first word segmentation embedding vector of the first word segmentation; performing vector sum processing according to the second word position vector, the second word vector, and the preset embedding information of the second text to obtain the second word segmentation embedding vector of the second word segmentation; generating the first embedding vector based on the first word segmentation embedding vector of the first word segmentation; and generating the second embedding vector based on the second word segmentation embedding vector of the second word segmentation.
  • the third input module 740 is also used for:
  • inputting the target embedding vector into the second network structure for vector processing to obtain a first prediction result in which the second network structure predicts the paraphrase similarity of the two paragraphs of text; inputting the filter vector into the second network structure for vector processing to obtain a second prediction result in which the second network structure predicts the paraphrase similarity of the two paragraphs of text; using the preset cross-entropy loss function to calculate the first prediction result and the second prediction result respectively to obtain an original loss function and a filtering loss function; and calculating the target loss function value according to the original loss function and the filtering loss function.
  • the third input module 740 is also used for:
  • the training module 750 is also used for:
  • the target training model includes the first network structure, an updated second network structure, and an updated high-pass filter layer.
  • the paraphrase analysis model training device 700 also includes:
  • the fourth input module is configured to sequentially input the text to be recognized into the first network structure and the high-pass filter layer in the target training model for any text to be recognized to obtain a filter vector of the text to be recognized.
  • a calculation module configured to calculate, based on the filter vectors of the multiple pieces of text to be recognized, the cosine similarity between the filter vectors of any two pieces of the text to be recognized, where the cosine similarity is used to represent the paraphrase similarity of any two pieces of the text to be recognized.
  • each module is used to execute each step in the embodiments corresponding to FIG. 1 to FIG. 6; each step has been explained in detail in the above embodiments, so for details please refer to FIG. 1 to FIG. 6 and the related descriptions in the embodiments corresponding to FIG. 1 to FIG. 6, and details are not repeated here.
  • Fig. 8 is a structural block diagram of a terminal device provided by another embodiment of the present application.
  • the terminal device 800 of this embodiment includes: a processor 810, a memory 820, and a computer program 830 stored in the memory 820 and operable on the processor 810, for example, a program implementing a paraphrase analysis model training method.
  • when the processor 810 executes the computer program 830, the steps in each of the above-mentioned paraphrase analysis model training method embodiments are implemented, such as S101 to S105 shown in FIG. 1.
  • when the processor 810 executes the computer program 830, the functions of the modules in the above embodiment corresponding to FIG. 7 are realized, for example, the functions of the units 710 to 750 shown in FIG. 7, specifically as follows:
  • a terminal device including a memory, a processor, and a computer program stored in the memory and operable on the processor.
  • the processor executes the computer program, it realizes:
  • the training sample includes at least two paragraphs of text
  • the training sample is input into the first network structure of the pre-training model to obtain the target embedding vector of the training sample;
  • the pre-training model also includes a high-pass filter layer and a second network structure;
  • the target training model is used to process any two input paragraphs of text to be recognized and output the paraphrase similarity of the two paragraphs of text to be recognized.
  • identifying the start symbol and the segmentation symbol in the training sample; determining the text content between the start symbol and the segmentation symbol as the first text, and determining the text content after the segmentation symbol as the second text;
  • inputting the first text and the second text into the first network structure to obtain the first embedding vector of the first text and the second embedding vector of the second text; calculating the mean value of the first embedding vector and the second embedding vector, and using the mean value as the target embedding vector of the training sample.
  • the first text includes at least one first word segmentation
  • the second text includes at least one second word segmentation
  • the first embedding vector is composed of word segmentation embedding vectors corresponding to the first word segmentation
  • the second embedding vector is composed of the word segmentation embedding vector corresponding to the second word segmentation
  • for any first word segmentation of the first text, determining the first word vector of the first word segmentation; and, for any second word segmentation of the second text, determining the second word vector of the second word segmentation; determining the first word position vector of the first word segmentation in the first text relative to the start symbol; and determining the second word position vector of the second word segmentation in the second text relative to the segmentation symbol;
  • performing vector sum processing according to the first word position vector, the first word vector, and the preset embedding information of the first text to obtain the first word segmentation embedding vector of the first word segmentation; performing vector sum processing according to the second word position vector, the second word vector, and the preset embedding information of the second text to obtain the second word segmentation embedding vector of the second word segmentation; generating the first embedding vector based on the first word segmentation embedding vector of the first word segmentation; and generating the second embedding vector based on the second word segmentation embedding vector of the second word segmentation.
  • inputting the target embedding vector into the second network structure for vector processing to obtain a first prediction result in which the second network structure predicts the paraphrase similarity of the two paragraphs of text; inputting the filter vector into the second network structure for vector processing to obtain a second prediction result in which the second network structure predicts the paraphrase similarity of the two paragraphs of text; using the preset cross-entropy loss function to calculate the first prediction result and the second prediction result respectively to obtain an original loss function and a filtering loss function; and calculating the target loss function value according to the original loss function and the filtering loss function.
  • the target training model includes the first network structure, an updated second network structure, and an updated high-pass filter layer.
  • for any text to be recognized, the text to be recognized is sequentially input into the first network structure and the high-pass filter layer in the target training model to obtain the filter vector of the text to be recognized; based on the filter vectors of the multiple pieces of text to be recognized, the cosine similarity between the filter vectors of any two pieces of the text to be recognized is calculated respectively, and the cosine similarity is used to represent the paraphrase similarity of any two pieces of the text to be recognized.
  • a computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it realizes:
  • the training sample includes at least two paragraphs of text
  • the training sample is input into the first network structure of the pre-training model to obtain the target embedding vector of the training sample;
  • the pre-training model also includes a high-pass filter layer and a second network structure;
  • the target training model is used to process any two input paragraphs of text to be recognized and output the paraphrase similarity of the two paragraphs of text to be recognized.
  • identifying the start symbol and the segmentation symbol in the training sample; determining the text content between the start symbol and the segmentation symbol as the first text, and determining the text content after the segmentation symbol as the second text;
  • inputting the first text and the second text into the first network structure to obtain the first embedding vector of the first text and the second embedding vector of the second text; calculating the mean value of the first embedding vector and the second embedding vector, and using the mean value as the target embedding vector of the training sample.
  • the first text includes at least one first word segmentation
  • the second text includes at least one second word segmentation
  • the first embedding vector is composed of word segmentation embedding vectors corresponding to the first word segmentation
  • the second embedding vector is composed of the word segmentation embedding vector corresponding to the second word segmentation
  • for any first word segmentation of the first text, determining the first word vector of the first word segmentation; and, for any second word segmentation of the second text, determining the second word vector of the second word segmentation; determining the first word position vector of the first word segmentation in the first text relative to the start symbol; and determining the second word position vector of the second word segmentation in the second text relative to the segmentation symbol;
  • performing vector sum processing according to the first word position vector, the first word vector, and the preset embedding information of the first text to obtain the first word segmentation embedding vector of the first word segmentation; performing vector sum processing according to the second word position vector, the second word vector, and the preset embedding information of the second text to obtain the second word segmentation embedding vector of the second word segmentation; generating the first embedding vector based on the first word segmentation embedding vector of the first word segmentation; and generating the second embedding vector based on the second word segmentation embedding vector of the second word segmentation.
  • inputting the target embedding vector into the second network structure for vector processing to obtain a first prediction result in which the second network structure predicts the paraphrase similarity of the two paragraphs of text; inputting the filter vector into the second network structure for vector processing to obtain a second prediction result in which the second network structure predicts the paraphrase similarity of the two paragraphs of text; using the preset cross-entropy loss function to calculate the first prediction result and the second prediction result respectively to obtain an original loss function and a filtering loss function; and calculating the target loss function value according to the original loss function and the filtering loss function.
  • the target training model includes the first network structure, an updated second network structure, and an updated high-pass filter layer.
  • for any text to be recognized, the text to be recognized is sequentially input into the first network structure and the high-pass filter layer in the target training model to obtain the filter vector of the text to be recognized; based on the filter vectors of the multiple pieces of text to be recognized, the cosine similarity between the filter vectors of any two pieces of the text to be recognized is calculated respectively, and the cosine similarity is used to represent the paraphrase similarity of any two pieces of the text to be recognized.
  • the computer program 830 can be divided into one or more modules, and one or more modules are stored in the memory 820 and executed by the processor 810 to complete the present application.
  • One or more modules may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 830 in the terminal device 800 .
  • the computer program 830 can be divided into an acquisition module, a first input module, a second input module, a third input module and a training module, and the specific functions of each module are as above.
  • the computer-readable storage medium may be an internal storage unit of the terminal device described in the foregoing embodiments, for example, a hard disk or a memory of the terminal device.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the terminal device 800 may include, but not limited to, a processor 810 and a memory 820 .
  • FIG. 8 is only an example of a terminal device 800 and does not constitute a limitation on the terminal device 800; it may include more or fewer components than those shown in the figure, or combine certain components, or have different components.
  • a terminal device may also include an input and output device, a network access device, a bus, and the like.
  • the processor 810 may be a central processing unit, or another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete hardware component, and the like.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the memory 820 may be an internal storage unit of the terminal device 800, such as a hard disk or memory of the terminal device 800.
  • the memory 820 may also be an external storage device of the terminal device 800, such as a plug-in hard disk, a smart memory card, or a flash memory card equipped on the terminal device 800.
  • the memory 820 may also include both an internal storage unit of the terminal device 800 and an external storage device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

This application is applicable to the technical field of artificial intelligence and provides a paraphrase analysis model training method, apparatus, terminal device, and storage medium. The method includes: acquiring a training sample containing two paragraphs of text; inputting the training sample into a first network structure of a pre-training model to obtain a target embedding vector of the training sample; inputting the target embedding vector into a high-pass filter layer for information noise filtering to obtain a filter vector; inputting the target embedding vector and the filter vector separately into a second network structure for vector processing to obtain a target loss function value; and performing backpropagation training on the pre-training model based on the target loss function value to obtain a target training model. With this method, key information can be enhanced during vector processing of the text to be recognized, which helps improve the prediction accuracy of the target training model.

Description

Paraphrase analysis model training method, apparatus, terminal device, and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 9, 2021, with application number 202110642143.2 and invention title "Paraphrase analysis model training method, apparatus, terminal device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application belongs to the technical field of artificial intelligence, and in particular relates to a paraphrase analysis model training method, apparatus, terminal device, and storage medium.
Background
The text paraphrase analysis task is one of the common indicators for evaluating model performance in the field of natural language processing. Specifically, two paragraphs of text are input into a model, and the model predicts whether the two paragraphs of text express the same meaning, that is, the paraphrase similarity between the two paragraphs of text.
At present, in the process of training a text paraphrase analysis model, vector processing (such as extracting the text's paraphrase) is usually performed directly on a large amount of manually annotated text data for model training. However, the inventors realized that in the above vector processing of text data, the interference information commonly present in the text data is not handled, so the prediction accuracy of the final model is low.
Technical Problem
One of the purposes of the embodiments of this application is to provide a paraphrase analysis model training method, apparatus, terminal device, and storage medium, aiming to solve the technical problem that the text paraphrase analysis model trained in the prior art has low prediction accuracy.
Technical Solution
To solve the above technical problem, the technical solution adopted by the embodiments of this application is as follows:
In a first aspect, an embodiment of this application provides a paraphrase analysis model training method, the method including:
acquiring a training sample, the training sample including at least two paragraphs of text;
inputting the training sample into a first network structure of a pre-training model to obtain a target embedding vector of the training sample, the pre-training model further including a high-pass filter layer and a second network structure;
inputting the target embedding vector into the high-pass filter layer for information noise filtering to obtain a filter vector;
inputting the target embedding vector and the filter vector separately into the second network structure for vector processing to obtain a target loss function value of the training sample;
performing backpropagation training on the pre-training model based on the target loss function value to obtain a target training model, the target training model being used to process any two input paragraphs of text to be recognized and output the paraphrase similarity of the two paragraphs of text to be recognized.
In a second aspect, an embodiment of this application provides a paraphrase analysis model training apparatus, the apparatus including:
an acquisition module, configured to acquire a training sample, the training sample including at least two paragraphs of text;
a first input module, configured to input the training sample into a first network structure of a pre-training model to obtain a target embedding vector of the training sample, the pre-training model further including a high-pass filter layer and a second network structure;
a second input module, configured to input the target embedding vector into the high-pass filter layer for information noise filtering to obtain a filter vector;
a third input module, configured to input the target embedding vector and the filter vector separately into the second network structure for vector processing to obtain a target loss function value of the training sample;
a training module, configured to perform backpropagation training on the pre-training model based on the target loss function value to obtain a target training model, the target training model being used to process any two input paragraphs of text to be recognized and output the paraphrase similarity of the two paragraphs of text to be recognized.
In a third aspect, an embodiment of this application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the computer program, implements:
acquiring a training sample, the training sample including at least two paragraphs of text;
inputting the training sample into a first network structure of a pre-training model to obtain a target embedding vector of the training sample, the pre-training model further including a high-pass filter layer and a second network structure;
inputting the target embedding vector into the high-pass filter layer for information noise filtering to obtain a filter vector;
inputting the target embedding vector and the filter vector separately into the second network structure for vector processing to obtain a target loss function value of the training sample;
performing backpropagation training on the pre-training model based on the target loss function value to obtain a target training model, the target training model being used to process any two input paragraphs of text to be recognized and output the paraphrase similarity of the two paragraphs of text to be recognized.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements:
acquiring a training sample, the training sample including at least two paragraphs of text;
inputting the training sample into a first network structure of a pre-training model to obtain a target embedding vector of the training sample, the pre-training model further including a high-pass filter layer and a second network structure;
inputting the target embedding vector into the high-pass filter layer for information noise filtering to obtain a filter vector;
inputting the target embedding vector and the filter vector separately into the second network structure for vector processing to obtain a target loss function value of the training sample;
performing backpropagation training on the pre-training model based on the target loss function value to obtain a target training model, the target training model being used to process any two input paragraphs of text to be recognized and output the paraphrase similarity of the two paragraphs of text to be recognized.
In a fifth aspect, an embodiment of this application further provides a computer program product which, when run on a terminal device, causes the terminal device to implement:
acquiring a training sample, the training sample including at least two paragraphs of text;
inputting the training sample into a first network structure of a pre-training model to obtain a target embedding vector of the training sample, the pre-training model further including a high-pass filter layer and a second network structure;
inputting the target embedding vector into the high-pass filter layer for information noise filtering to obtain a filter vector;
inputting the target embedding vector and the filter vector separately into the second network structure for vector processing to obtain a target loss function value of the training sample;
performing backpropagation training on the pre-training model based on the target loss function value to obtain a target training model, the target training model being used to process any two input paragraphs of text to be recognized and output the paraphrase similarity of the two paragraphs of text to be recognized.
Beneficial Effects
Compared with the prior art, the embodiments of this application have the following advantages:
In the embodiments of this application, by using an existing first network structure that can perform vector processing on text to vector-process the training sample, a target embedding vector containing the paraphrase information between the two paragraphs of text can be obtained initially, which reduces the time needed to redesign the first network structure that vectorizes text in the training model. Then, high-pass filtering is performed on the target embedding vector to reduce the interference of information noise in the target embedding vector on the model. After that, model processing is performed based on the target embedding vector and the more accurate filter vector to obtain the target loss function value. In this way, during vector processing the pre-training model can not only retain the feature information between the original two paragraphs of text to the greatest extent, but also enhance the key information in the two paragraphs of text based on the filter vector. Finally, the various learning parameters and weight parameters in the pre-training model are fine-tuned according to the target loss function value to obtain the target training model, thereby improving the prediction accuracy of the target training model.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments or exemplary techniques are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is an implementation flowchart of a paraphrase analysis model training method provided by an embodiment of this application;
Fig. 2 is a schematic diagram of an implementation of S102 of a paraphrase analysis model training method provided by an embodiment of this application;
Fig. 3 is a schematic diagram of an implementation of S1023 of a paraphrase analysis model training method provided by an embodiment of this application;
Fig. 4 is a schematic diagram of an implementation of S104 of a paraphrase analysis model training method provided by an embodiment of this application;
Fig. 5 is a schematic diagram of an implementation of S1043 of a paraphrase analysis model training method provided by an embodiment of this application;
Fig. 6 is an implementation flowchart of a paraphrase analysis model training method provided by another embodiment of this application;
Fig. 7 is a schematic structural diagram of a paraphrase analysis model training apparatus provided by an embodiment of this application;
Fig. 8 is a schematic structural diagram of a terminal device provided by an embodiment of this application.
Detailed Description of the Embodiments
In the following description, for the purpose of illustration rather than limitation, specific details such as particular system structures and techniques are set forth in order to provide a thorough understanding of the embodiments of this application. However, it should be clear to those skilled in the art that this application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary details do not obscure the description of this application.
It should be understood that, when used in this specification and the appended claims, the term "comprise" indicates the presence of the described features, wholes, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or sets thereof.
In addition, in the description of this specification and the appended claims, the terms "first", "second", "third", and the like are used only to distinguish the descriptions and cannot be understood as indicating or implying relative importance.
The paraphrase analysis model training method provided by the embodiments of this application can be applied to terminal devices such as tablet computers, notebook computers, ultra-mobile personal computers (UMPC), and netbooks; the embodiments of this application do not impose any limitation on the specific type of the terminal device.
Referring to Fig. 1, Fig. 1 shows an implementation flowchart of a paraphrase analysis model training method provided by an embodiment of this application, and the method includes the following steps:
S101. Acquire a training sample, the training sample including at least two paragraphs of text.
In one embodiment, the above training sample may be obtained in advance from existing practical applications in a plurality of different industries and/or research fields. Exemplarily, corresponding sample data sets are collected in advance in the practical applications of different industries and/or research fields, and each sample data set contains corresponding training samples. In this embodiment, the above sample data set is specifically a data set used for paraphrase similarity analysis training, and specifically includes but is not limited to the MNLI data set, the SNLI data set, and the SICK data set.
In one embodiment, when the above training samples are samples used for paraphrase similarity analysis, each training sample needs to include two paragraphs of text as well as the actual result between the two paragraphs of text (a similar result or a dissimilar result). Based on this, after the training model receives a training sample, it can perform model processing on it and output a prediction result. Afterwards, a training loss value is calculated based on the prediction result and the actual result to update the training model.
In one embodiment, the languages of the above two paragraphs of text include but are not limited to Chinese, English, and other languages. In addition, when the languages of the two paragraphs of text are not the same, an existing language translation technique can be used to translate the language of one of the texts to obtain two paragraphs of text in the same language.
S102. Input the training sample into a first network structure of a pre-training model to obtain a target embedding vector of the training sample; the pre-training model further includes a high-pass filter layer and a second network structure.
In one embodiment, the above pre-training model may be an existing model or a newly designed network model, which is not limited here. In this embodiment, the above pre-training model is specifically an existing Bert model improved from the Transformer; this model is usually used to perform text paraphrase analysis tasks, so as to reduce the training time of the pre-training model.
In one embodiment, the above target embedding vector is a feature vector representing the paraphrase relationship between the two paragraphs of text. That is, after the training sample is input into the first network structure and vector processing is performed on it, the obtained target embedding vector can be used to represent the paraphrase relationship between the two paragraphs of text. Then, the second network structure and the high-pass filter layer can process the target embedding vector and output a prediction result of the paraphrase similarity between the two paragraphs of text. Finally, the second network structure can also calculate a corresponding loss function value based on this prediction result and the actual result of the training sample, and iteratively update the pre-training model.
In one embodiment, the above first network structure is the structure that processes the training sample to obtain the target embedding vector; it may be the network structure in the existing Bert model that processes text to obtain a text feature vector, which is not described in detail here. The above second network structure may specifically include a Dropout layer, a Linear layer, a Softmax layer, and a loss function calculation layer. Among them, the Dropout layer can solve the problem of model overfitting during model training. The linear function in the Linear layer can make the model gradually converge during training. The activation function in the Softmax layer can perform calculations based on the input feature vector and output the prediction result (paraphrase similarity) between the two paragraphs of text in the training sample. Afterwards, the loss function calculation layer can calculate the loss function value based on the prediction result and the actual result of the training sample.
The actual result of a training sample usually participates in the calculation as a parameter taking the value 1 or 0. The loss function in the above loss function layer may specifically be a cross-entropy loss function, and the above softmax function is specifically:
Figure PCTCN2022071358-appb-000001
Here, xi is the initial probability, predicted by the pre-training model, that the training sample belongs to the i-th class of prediction result, and f(xi) is the final probability, obtained by applying the activation function to the above initial probability, that the training sample belongs to the i-th class of prediction result. It should be noted that in binary classification there are only two classes for i, whereas in multi-class prediction the number of classes for i is not limited. At this point, the final output probability f(xi) can also be regarded as the paraphrase similarity of the two paragraphs of text output by the pre-training model. It should be added that after obtaining the initial probability xi of the i-th class of prediction result, this embodiment further corrects it through the above activation function f(xi), so as to improve the expressive capability that the second network structure contributes to the overall pre-training model.
In one embodiment, the above high-pass filter layer may be used to perform information noise filtering on the input target embedding vector, so as to remove the low-frequency components in the target embedding vector and retain the high-frequency components.
S103. Input the target embedding vector into the high-pass filter layer for information noise filtering to obtain a filter vector.
In one embodiment, the high-pass filter layer has already been explained in S102 above and is not described again. It should be added that the above high-frequency components usually contain the more accurate vector information in the target embedding vector, so that the obtained filter vector can better reflect the paraphrase information contained between the two paragraphs of text.
In one embodiment, the above high-pass filter layer contains a high-pass filter which, when processing a signal, lets high-frequency signals pass normally while blocking and attenuating low-frequency signals below a set threshold; the extent of the blocking and attenuation varies with the frequency and with the filtering procedure (purpose). In a specific application, when the target embedding vector is input into the high-pass filter layer, the high-pass filter layer can convert the target embedding vector into a signal represented in the frequency domain, the signal being a superposition of signals at multiple different frequencies. Converting a vector into a signal can be implemented using an existing test-vector conversion technique, which is not described in detail here. Afterwards, the high-frequency signal obtained after processing by the high-pass filter can be converted back into a vector based on the above conversion technique for subsequent model processing.
S104、将所述目标嵌入向量和所述滤波向量分别输入至所述第二网络结构中进行向量处理,得到所述训练样本的目标损失函数值。
在一实施例中,上述已对第二网络结构进行解释,对此不再进行说明。需要补充的是,目标嵌入向量和滤波向量可以同时输入至第二网络结构中进行模型处理,得到的为预训练模型输出的两段文本的之间预测结果。之后,预训练模型在基于该预测结果和训练样本的实际结果进行计算,得到目标损失函数值。或者,终端设备也可以分别将目标嵌入向量和滤波向量输入至第二网络结构中,此时可对应得到预训练模型输出的两种预测结果。之后,第二网络结构可将每种预测结果分别和实际结果进行计算,对应得到两种损失函数值。最后,第二网络结构可将两种损失函数值进行加和,得到目标损失函数值,对此不作限定。
需要特别说明的是,对目标嵌入向量进行信息噪音过滤处理后,虽然可精确的保留其两段文本之间的释义信息,然而,在实际情况下,该信息噪音过滤处理还将不可避免的损 失两段文本之间的部分特征信息。基于此,在本实施例中,将过滤后的滤波向量和目标嵌入向量输入至第二网络结构中进行处理,可在最大化的保留文本的特征信息时,还可基于滤波向量实现对两段文本中关键信息的增强。
S105、基于所述目标损失函数值,对所述预训练模型进行反向传播训练,得到目标训练模型,所述目标训练模型用于对任意输入的两段待识别文本进行处理,输出所述两段待识别文本的释义相似度。
在一实施例中,在得到上述目标损失函数值后,预训练模型可基于该数值进行反向传播训练,以更新预训练模型中各网络层的学习参数和权重参数,得到目标训练模型。另外,基于上述方法得到的目标训练模型具体为二分类模型,其可用于对输入的两段待识别文本进行模型处理,输出两段待识别文本的释义相似度。
需要补充的是,在进行释义相似度预测时,目标训练模型对两段待识别文本的处理过程具体可参照上述S102以及S103的步骤,得到两段待识别文本的目标嵌入向量和滤波向量。之后,目标训练模型在将目标嵌入向量和滤波向量输入至第二网络结构时,因当前目标训练模型不需要进行反向传播训练,也即不需要进行损失函数值计算。因此,目标训练模型只需将目标嵌入向量和滤波向量输入至第二网络结构中的Dropout层、Linear层、Softmax层进行处理,即可得到预测结果。
在本实施例中，通过采用已有的、可对文本进行向量处理的第一网络结构，对训练样本进行向量处理后，可初步得到包含两段文本之间释义信息的目标嵌入向量，减少重新设计对文本进行向量处理的网络结构所需的时间。而后，对目标嵌入向量进行高通滤波处理，以减少目标嵌入向量中信息噪声对模型的干扰。之后，基于目标嵌入向量和更为精确的滤波向量进行模型处理，得到目标损失函数值。以此，可使预训练模型在向量处理过程中，不仅可最大化地保留原始两段文本之间的特征信息，还可基于滤波向量实现对两段文本中关键信息的增强。最后，根据该目标损失函数值对预训练模型中的多种学习参数和权重参数进行微调，得到目标训练模型，以提高目标训练模型的预测准确率。
参照图2,在一实施例中,在S102将所述训练样本输入至预训练模型的第一网络结构中,得到所述训练样本的目标嵌入向量,具体包括如下子步骤S1021-1024,详述如下:
S1021、识别所述训练样本中的起始符号以及分割符号。
S1022、将所述起始符号与所述分割符号之间的文本内容确定为第一文本,以及将处于所述分割符号之后的文本内容确定为第二文本。
在一实施例中,上述起始符号和分割符号均可为用户根据实际情况进行设置,其包括但不限于字母、数字等形式,本实施例对起始符号和分割符号的表现形式不作限定。
在本实施例中，因上述S102中已说明使用已有的Bert模型对训练样本进行向量处理，基于此，可采用Bert模型中用于标识训练样本的起始符号CLS和分割符号SEP进行标识。之后，终端设备可将CLS符号与SEP符号之间的文本确定为第一文本，以及将SEP符号之后的文本确定为第二文本。
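作为示意，下述代码给出按照起始符号与分割符号切分训练样本的一种最简写法（仅为示意性片段，符号 [CLS]、[SEP] 及示例文本均为说明用途而假设）：

```python
def split_sample(tokens):
    # tokens：形如 ["[CLS]", 第一文本各分词..., "[SEP]", 第二文本各分词...] 的序列
    start = tokens.index("[CLS]")
    sep = tokens.index("[SEP]")
    first_text = tokens[start + 1:sep]   # 起始符号与分割符号之间为第一文本
    second_text = tokens[sep + 1:]       # 分割符号之后为第二文本
    return first_text, second_text

first, second = split_sample(["[CLS]", "今天", "天气", "好", "[SEP]", "今日", "天气", "不错"])
```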
S1023、将所述第一文本和所述第二文本输入至所述第一网络结构中,得到所述第一文本的第一嵌入向量,以及所述第二文本的第二嵌入向量。
在一实施例中，上述第一嵌入向量可以为对第一文本中各个分词进行处理后得到的向量。示例性的，对于第一文本中的任一第一分词，可获取第一分词的第一词向量；以及，基于起始符号，确定第一分词在第一文本中的第一词位置向量。之后，对第一词向量、第一词位置向量以及第一文本的预设嵌入向量进行加和处理，得到可表示第一分词的第一分词嵌入向量。最后，对每个第一分词执行上述步骤，得到第一文本中各个第一分词的第一分词嵌入向量。基于此，可认为上述第一嵌入向量即由上述第一文本中各个第一分词的第一分词嵌入向量组成。
可以理解的是,得到上述第二文本的第二嵌入向量的过程与上述得到第一嵌入向量的 过程相似,具体可参照上述说明。
S1024、计算所述第一嵌入向量和所述第二嵌入向量的均值,并将所述均值作为所述训练样本的目标嵌入向量。
在一实施例中，上述第一嵌入向量和第二嵌入向量均可通过具体的数值形式进行表示。基于此，在得到第一嵌入向量和第二嵌入向量后，可采用两个向量对应数值的平均值，作为训练样本的目标嵌入向量，对此不作限定。此时，可以理解的是，因该目标嵌入向量是基于第一嵌入向量和第二嵌入向量进行处理得到，因此，可认为目标嵌入向量同时包含了两个嵌入向量的向量信息。
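作为示意，下述代码给出由两段文本的嵌入向量求均值得到目标嵌入向量的最简写法（仅为示意性片段，假设使用PyTorch，向量维度为假设值）：

```python
import torch

first_embedding = torch.randn(768)    # 第一文本的第一嵌入向量（示意）
second_embedding = torch.randn(768)   # 第二文本的第二嵌入向量（示意）

# 目标嵌入向量取两者均值，使其同时包含两段文本的向量信息
target_embedding = (first_embedding + second_embedding) / 2
```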
参照图3,在一实施例中,所述第一文本包括至少一个第一分词,所述第二文本包括至少一个第二分词,所述第一嵌入向量由所述第一分词对应的分词嵌入向量组成,所述第二嵌入向量由所述第二分词对应的分词嵌入向量组成;在S1023将所述第一文本和所述第二文本输入至所述第一网络结构中,得到所述第一文本的第一嵌入向量,以及所述第二文本的第二嵌入向量中,具体包括如下子步骤S10231-10234,详述如下:
S10231、针对所述第一文本的任一第一分词,确定所述第一分词的第一词向量;以及,针对所述第二文本的任一第二分词,确定所述第二分词的第二词向量。
S10232、确定所述第一分词在所述第一文本中相对于所述起始符号的第一词位置向量;以及,确定所述第二分词在所述第二文本中相对于所述分割符号的第二词位置向量。
在一实施例中，第一文本和第二文本均至少包括一个分词，以下以第一文本包含多个第一分词为例进行解释说明。示例性的，终端设备可基于预先设置的词向量库，对第一文本进行文本分词，得到多个第一分词。其中，词向量库包含有多个分词，且每个分词均对应有唯一的词向量。基于此，终端设备可先将整个第一文本作为一个分词，在词向量库中进行比对；若未比对到相应的分词，则去掉第一个字符或最后一个字符，并将剩余的文本作为一个分词在词向量库中继续进行比对，直至第一文本中的每个字符均匹配有相应的分词以及词向量。此时，相应的分词即为第一分词。
在一实施例中,在对第一文本进行文本分词后,可基于分词结果确定第一分词在第一文本所包含多个第一分词中的词顺序。之后,将词顺序作为第一分词的词位置向量。
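作为示意，下述代码给出上述基于词向量库的最长匹配分词及词位置确定思路的一种最简实现（仅为示意性片段，词向量库与文本内容均为假设示例，实际分词策略以上述描述为准）：

```python
def segment(text: str, vocab: set) -> list:
    # 从整段文本开始比对，未命中则不断缩短末尾字符，实现最长匹配分词
    tokens, start = [], 0
    while start < len(text):
        end = len(text)
        while end > start and text[start:end] not in vocab:
            end -= 1
        if end == start:           # 单字也未命中时，按单字切分
            end = start + 1
        tokens.append(text[start:end])
        start = end
    return tokens

vocab = {"今天", "天气", "很", "好"}
tokens = segment("今天天气很好", vocab)
# tokens = ["今天", "天气", "很", "好"]，其词顺序 0,1,2,3 即可作为各分词的词位置
positions = list(range(len(tokens)))
```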
S10233、根据所述第一词位置向量、所述第一词向量以及所述第一文本的预设嵌入信息进行向量加和处理,得到所述第一分词的第一分词嵌入向量;以及,根据所述第二词位置向量、所述第二词向量以及所述第二文本的预设嵌入信息进行向量加和处理,得到所述第二分词的第二分词嵌入向量。
S10234、基于所述第一分词的第一分词嵌入向量生成所述第一嵌入向量;以及,基于所述第二分词的第二分词嵌入向量生成所述第二嵌入向量。
在一实施例中，上述第一文本的预设嵌入向量（即上述预设嵌入信息）用于区分第一分词所归属的文本，可以由用户根据实际情况进行设置。需要说明的是，对于第一文本中的多个第一分词，每个第一分词对应的预设嵌入向量均相同。
需要补充的是，基于上述S10231-S10233的说明，可认为上述第一词向量、第一词位置向量以及预设嵌入向量均可以具体的数值形式进行表示。因此，对于上述第一分词的三种向量，可将三种向量的加和结果作为第一分词的第一分词嵌入向量。最后，在确定第一文本中多个第一分词分别对应的第一分词嵌入向量后，即得到第一文本的第一嵌入向量。
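作为示意，下述代码给出词向量、词位置向量与预设嵌入向量加和得到分词嵌入向量的一种最简写法（仅为示意性片段，假设使用PyTorch，向量维度、词表大小、各编号取值均为假设值）：

```python
import torch
import torch.nn as nn

hidden_dim, vocab_size, max_len = 768, 21128, 512
word_emb = nn.Embedding(vocab_size, hidden_dim)   # 词向量表
pos_emb = nn.Embedding(max_len, hidden_dim)       # 词位置向量表
seg_emb = nn.Embedding(2, hidden_dim)             # 预设嵌入向量（区分第一/第二文本）

token_id = torch.tensor([101])   # 某个第一分词在词表中的编号（假设值）
position = torch.tensor([3])     # 该分词在第一文本中的词顺序
segment = torch.tensor([0])      # 0 表示归属第一文本，1 表示归属第二文本

# 三种向量加和，得到该分词的分词嵌入向量
token_embedding = word_emb(token_id) + pos_emb(position) + seg_emb(segment)
```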
在一实施例中,得到上述第二文本的第二嵌入向量的过程与上述得到第一嵌入向量的过程相似,具体可参照上述说明。
参照图4,在一实施例中,在S104所述将所述目标嵌入向量和所述滤波向量分别输入至所述第二网络结构中进行向量处理,得到所述训练样本的目标损失函数值中,具体包括如下子步骤S1041-1043,详述如下:
S1041、将所述目标嵌入向量输入所述第二网络结构中进行向量处理,得到所述第二网 络结构预测所述两段文本的释义相似度的第一预测结果;以及,将所述滤波向量输入所述第二网络结构中进行向量处理,得到所述第二网络结构预测所述两段文本的释义相似度的第二预测结果。
在一实施例中,上述第二网络结构已在上述S102中进行解释,对此不再进行说明。需要说明的是,对于上述目标嵌入向量和滤波向量,第二网络结构是分别对两种向量进行处理,以分别得到对应的第一预测结果和第二预测结果。
可以理解的是，因滤波向量是基于对目标嵌入向量进行信息噪音过滤处理得到，因此，相比于第一预测结果，可认为第二预测结果更接近于训练样本的实际结果。
S1042、采用预设的交叉熵损失函数分别对所述第一预测结果和所述第二预测结果进行计算,得到原始损失函数和滤波损失函数。
S1043、根据所述原始损失函数和所述滤波损失函数计算所述目标损失函数值。
在一实施例中，上述交叉熵用来评估当前预训练模型预测的概率分布（预测结果）与真实分布（实际结果）的差异情况，减少交叉熵损失即可提高预训练模型的预测准确率。需要补充的是，当第二网络结构中使用Sigmoid或Softmax作为激活函数时，相比于使用其他损失函数（如平方损失函数），使用交叉熵计算损失函数，可以解决利用平方损失函数时预训练模型迭代更新慢的问题。
在一实施例中，上述根据原始损失函数和滤波损失函数计算所述目标损失函数值，具体可为：计算原始损失函数和滤波损失函数之和，得到目标损失函数值。然而，参照图5，在另一实施例中，上述目标损失函数值的计算也可以通过如下子步骤S10431-S10432实现，详述如下：
S10431、基于原始损失函数对应的预设第一权重值,计算修正后的原始损失函数;以及,基于滤波损失函数对应的预设第二权重值,计算修正后的滤波损失函数。
S10432、将所述修正后的原始损失函数与所述修正后的滤波损失函数之和作为所述目标损失函数值。
在一实施例中，上述已说明滤波向量可实现对两段文本中关键信息的增强。因此，对于上述两种损失函数，可将预设第二权重值设置为大于预设第一权重值。以此，经过预设第一权重值和预设第二权重值计算得到的目标损失函数值，可更好地完成对预训练模型的训练，以得到预测准确率高的目标训练模型。
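作为示意，下述代码给出分别对两种预测结果计算交叉熵损失、再按预设权重加权求和得到目标损失函数值的一种最简写法（仅为示意性片段，假设使用PyTorch，各得分与权重取值仅为满足"第二权重大于第一权重"的假设值）：

```python
import torch
import torch.nn.functional as F

# 目标嵌入向量、滤波向量经第二网络结构Linear层得到的两类得分（假设值）
logits_orig = torch.tensor([[2.0, 0.5]], requires_grad=True)
logits_filt = torch.tensor([[2.4, 0.3]], requires_grad=True)
label = torch.tensor([0])        # 实际结果：0 表示两段文本释义相似

# 分别计算原始损失函数值与滤波损失函数值（交叉熵）
loss_orig = F.cross_entropy(logits_orig, label)
loss_filt = F.cross_entropy(logits_filt, label)

# 预设第一权重值与预设第二权重值（假设值）
w1, w2 = 0.4, 0.6
target_loss = w1 * loss_orig + w2 * loss_filt   # 目标损失函数值
target_loss.backward()                          # 据此进行反向传播训练（示意）
```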
在一实施例中，在得到目标损失函数值后，通常是对预训练模型所有的权重参数和学习参数进行更新。然而，在本实施例中，因第一网络结构为已有的Bert模型中对文本进行向量处理的网络结构，可认为该第一网络结构为已成熟的网络结构。基于此，在对预训练模型的模型参数进行迭代更新时，可只对第二网络结构和高通滤波层中的模型参数进行迭代更新，在保证训练后的目标训练模型具备一定的预测准确率的同时，还可减少对目标训练模型的训练时间。
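作为示意，下述代码给出"仅对第二网络结构和高通滤波层中的模型参数进行迭代更新"的一种最简写法（仅为示意性片段，假设使用PyTorch，first_network、filter_layer、second_network 为与上文各结构对应的假设模块，此处以简单层示意，学习率等超参数亦为假设值）：

```python
import torch
import torch.nn as nn

first_network = nn.Linear(768, 768)   # 第一网络结构（示意，实际为Bert的向量处理结构）
filter_layer = nn.Identity()          # 高通滤波层（示意）
second_network = nn.Linear(768, 2)    # 第二网络结构（示意）

# 冻结第一网络结构的参数，使其不参与迭代更新
for param in first_network.parameters():
    param.requires_grad = False

# 优化器只更新高通滤波层与第二网络结构中的可学习参数
optimizer = torch.optim.Adam(
    list(filter_layer.parameters()) + list(second_network.parameters()), lr=2e-5)

# target_loss 为按上文方式计算得到的目标损失函数值（此处仅示意）
target_loss = second_network(filter_layer(first_network(torch.randn(1, 768)))).sum()
optimizer.zero_grad()
target_loss.backward()   # 反向传播
optimizer.step()         # 迭代更新模型参数
```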
参照图6,在一实施例中,在生成所述目标训练模型后,若输入至所述目标训练模型的待识别文本的数量超过两段,所述方法还包括如下步骤S11-S12,详述如下:
S11、针对任一待识别文本,将所述待识别文本依次输入所述目标训练模型中的第一网络结构和高通滤波层,得到所述待识别文本的滤波向量。
S12、基于多段待识别文本的滤波向量,分别计算任意两段所述待识别文本的滤波向量的余弦相似度,所述余弦相似度用于表示任意两段所述待识别文本的释义相似度。
在一实施例中，上述余弦相似度是以向量空间中两个向量夹角的余弦值作为衡量两个个体（两段待识别文本）间差异大小的度量，值越接近1，可认为两个向量越相似，即表示两段待识别文本之间越相似。
在一实施例中，通过上述方法得到的目标训练模型，适用于对两段文本进行释义相似度识别。然而，若目标训练模型需要对超过两段的多段待识别文本进行释义相似度识别，则需对每段待识别文本分别进行如下处理。
具体的，针对任一待识别文本，将该段待识别文本输入至目标训练模型的第一网络结构后，可得到该待识别文本的待识别目标嵌入向量。之后，将待识别目标嵌入向量输入至高通滤波层，得到该待识别文本的滤波向量。此时，因滤波向量包含有待识别目标嵌入向量中更为精确的向量信息，因此，目标训练模型可计算两两待识别文本的滤波向量的余弦相似度，作为两两待识别文本的释义相似度。由此，目标训练模型不仅可实现两段待识别文本之间的释义相似度分析，还可对多段文本之间的释义相似度进行识别预测。
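作为示意，下述代码给出计算多段待识别文本滤波向量两两之间余弦相似度的一种最简写法（仅为示意性片段，假设使用PyTorch，滤波向量以随机向量代替）：

```python
import torch
import torch.nn.functional as F

# 假设已得到4段待识别文本各自的滤波向量，形状为 [4, 768]
filtered_vectors = torch.randn(4, 768)

# 先对各滤波向量做L2归一化，再做矩阵乘法，即得两两余弦相似度矩阵
normalized = F.normalize(filtered_vectors, dim=-1)
similarity_matrix = normalized @ normalized.T   # [4, 4]，值越接近1表示释义越相似

print(similarity_matrix)
```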
请参阅图7,图7是本申请实施例提供的一种释义分析模型训练装置的结构框图。本实施例中释义分析模型训练装置包括的各模块用于执行图1至图6对应的实施例中的各步骤。具体请参阅图1至图6以及图1至图6所对应的实施例中的相关描述。为了便于说明,仅示出了与本实施例相关的部分。参见图7,释义分析模型训练装置700包括:获取模块710、第一输入模块720、第二输入模块730、第三输入模块740和训练模块750,其中:
获取模块710,用于获取训练样本,所述训练样本至少包括两段文本。
第一输入模块720,用于将所述训练样本输入至预训练模型的第一网络结构中,得到所述训练样本的目标嵌入向量;所述预训练模型还包括高通滤波层和第二网络结构。
第二输入模块730,用于将所述目标嵌入向量输入至所述高通滤波层进行信息噪音过滤处理,得到滤波向量。
第三输入模块740,用于将所述目标嵌入向量和所述滤波向量分别输入至所述第二网络结构中进行向量处理,得到所述训练样本的目标损失函数值。
训练模块750,用于基于所述目标损失函数值,对所述预训练模型进行反向传播训练,得到目标训练模型,所述目标训练模型用于对任意输入的两段待识别文本进行处理,输出所述两段待识别文本的释义相似度。
在一实施例中,第一输入模块720还用于:
识别所述训练样本中的起始符号以及分割符号;将所述起始符号与所述分割符号之间的文本内容确定为第一文本,以及将处于所述分割符号之后的文本内容确定为第二文本;将所述第一文本和所述第二文本输入至所述第一网络结构中,得到所述第一文本的第一嵌入向量,以及所述第二文本的第二嵌入向量;计算所述第一嵌入向量和所述第二嵌入向量的均值,并将所述均值作为所述训练样本的目标嵌入向量。
在一实施例中,第一文本包括至少一个第一分词,所述第二文本包括至少一个第二分词,所述第一嵌入向量由所述第一分词对应的分词嵌入向量组成,所述第二嵌入向量由所述第二分词对应的分词嵌入向量组成;第一输入模块720还用于:
针对所述第一文本的任一第一分词,确定所述第一分词的第一词向量;以及,针对所述第二文本的任一第二分词,确定所述第二分词的第二词向量;确定所述第一分词在所述第一文本中相对于所述起始符号的第一词位置向量;以及,确定所述第二分词在所述第二文本中相对于所述分割符号的第二词位置向量;根据所述第一词位置向量、所述第一词向量以及所述第一文本的预设嵌入信息进行向量加和处理,得到所述第一分词的第一分词嵌入向量;以及,根据所述第二词位置向量、所述第二词向量以及所述第二文本的预设嵌入信息进行向量加和处理,得到所述第二分词的第二分词嵌入向量;基于所述第一分词的第一分词嵌入向量生成所述第一嵌入向量;以及,基于所述第二分词的第二分词嵌入向量生成所述第二嵌入向量。
在一实施例中,第三输入模块740还用于:
将所述目标嵌入向量输入所述第二网络结构中进行向量处理,得到所述第二网络结构预测所述两段文本的释义相似度的第一预测结果;以及,将所述滤波向量输入所述第二网络结构中进行向量处理,得到所述第二网络结构预测所述两段文本的释义相似度的第二预测结果;采用预设的交叉熵损失函数分别对所述第一预测结果和所述第二预测结果进行计算,得到原始损失函数和滤波损失函数;根据所述原始损失函数和所述滤波损失函数计算所述目标损失函数值。
在一实施例中,第三输入模块740还用于:
基于所述原始损失函数对应的预设第一权重值,计算修正后的原始损失函数;以及,基于所述滤波损失函数对应的预设第二权重值,计算修正后的滤波损失函数;将所述修正后的原始损失函数与所述修正后的滤波损失函数之和作为所述目标损失函数值。
在一实施例中,训练模块750还用于:
基于所述目标损失函数值,依次对所述第二网络结构和所述高通滤波层中的模型参数进行迭代更新,得到所述目标训练模型,所述目标训练模型包括所述第一网络结构、更新后的第二网络结构和更新后的高通滤波层。
在一实施例中,释义分析模型训练装置700还包括:
第四输入模块,用于针对任一待识别文本,将所述待识别文本依次输入所述目标训练模型中的第一网络结构和高通滤波层,得到所述待识别文本的滤波向量。
计算模块,用于基于多段待识别文本的滤波向量,分别计算任意两段所述待识别文本的滤波向量的余弦相似度,所述余弦相似度用于表示任意两段所述待识别文本的释义相似度。
应当理解的是,图7示出的释义分析模型训练装置的结构框图中,各模块用于执行图1至图6对应的实施例中的各步骤,而对于图1至图6对应的实施例中的各步骤已在上述实施例中进行详细解释,具体请参阅图1至图6以及图1至图6所对应的实施例中的相关描述,此处不再赘述。
图8是本申请另一实施例提供的一种终端设备的结构框图。如图8所示，该实施例的终端设备800包括：处理器810、存储器820以及存储在存储器820中并可在处理器810上运行的计算机程序830，例如释义分析模型训练方法的程序。处理器810执行计算机程序830时实现上述释义分析模型训练方法各实施例中的步骤，例如图1所示的S101至S105。或者，处理器810执行计算机程序830时实现上述图7对应的实施例中各模块的功能，例如，图7所示的模块710至750的功能，具体如下所述：
一种终端设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现:
获取训练样本,所述训练样本至少包括两段文本;
将所述训练样本输入至预训练模型的第一网络结构中,得到所述训练样本的目标嵌入向量;所述预训练模型还包括高通滤波层和第二网络结构;
将所述目标嵌入向量输入至所述高通滤波层进行信息噪音过滤处理,得到滤波向量;
将所述目标嵌入向量和所述滤波向量分别输入至所述第二网络结构中进行向量处理,得到所述训练样本的目标损失函数值;
基于所述目标损失函数值,对所述预训练模型进行反向传播训练,得到目标训练模型,所述目标训练模型用于对任意输入的两段待识别文本进行处理,输出所述两段待识别文本的释义相似度。
在一个实施例中,所述处理器执行所述计算机程序时还实现:
识别所述训练样本中的起始符号以及分割符号;将所述起始符号与所述分割符号之间的文本内容确定为第一文本,以及将处于所述分割符号之后的文本内容确定为第二文本;将所述第一文本和所述第二文本输入至所述第一网络结构中,得到所述第一文本的第一嵌入向量,以及所述第二文本的第二嵌入向量;计算所述第一嵌入向量和所述第二嵌入向量的均值,并将所述均值作为所述训练样本的目标嵌入向量。
在一个实施例中,所述第一文本包括至少一个第一分词,所述第二文本包括至少一个第二分词,所述第一嵌入向量由所述第一分词对应的分词嵌入向量组成,所述第二嵌入向量由所述第二分词对应的分词嵌入向量组成;所述处理器执行所述计算机程序时还实现:
针对所述第一文本的任一第一分词，确定所述第一分词的第一词向量；以及，针对所述第二文本的任一第二分词，确定所述第二分词的第二词向量；确定所述第一分词在所述第一文本中相对于所述起始符号的第一词位置向量；以及，确定所述第二分词在所述第二文本中相对于所述分割符号的第二词位置向量；根据所述第一词位置向量、所述第一词向量以及所述第一文本的预设嵌入信息进行向量加和处理，得到所述第一分词的第一分词嵌入向量；以及，根据所述第二词位置向量、所述第二词向量以及所述第二文本的预设嵌入信息进行向量加和处理，得到所述第二分词的第二分词嵌入向量；基于所述第一分词的第一分词嵌入向量生成所述第一嵌入向量；以及，基于所述第二分词的第二分词嵌入向量生成所述第二嵌入向量。
在一个实施例中,所述处理器执行所述计算机程序时还实现:
将所述目标嵌入向量输入所述第二网络结构中进行向量处理,得到所述第二网络结构预测所述两段文本的释义相似度的第一预测结果;以及,将所述滤波向量输入所述第二网络结构中进行向量处理,得到所述第二网络结构预测所述两段文本的释义相似度的第二预测结果;采用预设的交叉熵损失函数分别对所述第一预测结果和所述第二预测结果进行计算,得到原始损失函数和滤波损失函数;根据所述原始损失函数和所述滤波损失函数计算所述目标损失函数值。
在一个实施例中,所述处理器执行所述计算机程序时还实现:
基于所述原始损失函数对应的预设第一权重值,计算修正后的原始损失函数;以及,基于所述滤波损失函数对应的预设第二权重值,计算修正后的滤波损失函数;将所述修正后的原始损失函数与所述修正后的滤波损失函数之和作为所述目标损失函数值。
在一个实施例中,所述处理器执行所述计算机程序时还实现:
基于所述目标损失函数值,依次对所述第二网络结构和所述高通滤波层中的模型参数进行迭代更新,得到所述目标训练模型,所述目标训练模型包括所述第一网络结构、更新后的第二网络结构和更新后的高通滤波层。
在一个实施例中,所述处理器执行所述计算机程序时还实现:
针对任一待识别文本,将所述待识别文本依次输入所述目标训练模型中的第一网络结构和高通滤波层,得到所述待识别文本的滤波向量;基于多段待识别文本的滤波向量,分别计算任意两段所述待识别文本的滤波向量的余弦相似度,所述余弦相似度用于表示任意两段所述待识别文本的释义相似度。
一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行时实现:
获取训练样本,所述训练样本至少包括两段文本;
将所述训练样本输入至预训练模型的第一网络结构中,得到所述训练样本的目标嵌入向量;所述预训练模型还包括高通滤波层和第二网络结构;
将所述目标嵌入向量输入至所述高通滤波层进行信息噪音过滤处理,得到滤波向量;
将所述目标嵌入向量和所述滤波向量分别输入至所述第二网络结构中进行向量处理,得到所述训练样本的目标损失函数值;
基于所述目标损失函数值,对所述预训练模型进行反向传播训练,得到目标训练模型,所述目标训练模型用于对任意输入的两段待识别文本进行处理,输出所述两段待识别文本的释义相似度。
在一个实施例中,所述计算机程序被处理器执行时还实现:
识别所述训练样本中的起始符号以及分割符号;将所述起始符号与所述分割符号之间的文本内容确定为第一文本,以及将处于所述分割符号之后的文本内容确定为第二文本;将所述第一文本和所述第二文本输入至所述第一网络结构中,得到所述第一文本的第一嵌入向量,以及所述第二文本的第二嵌入向量;计算所述第一嵌入向量和所述第二嵌入向量的均值,并将所述均值作为所述训练样本的目标嵌入向量。
在一个实施例中,所述第一文本包括至少一个第一分词,所述第二文本包括至少一个第二分词,所述第一嵌入向量由所述第一分词对应的分词嵌入向量组成,所述第二嵌入向量由所述第二分词对应的分词嵌入向量组成;所述计算机程序被处理器执行时还实现:
针对所述第一文本的任一第一分词,确定所述第一分词的第一词向量;以及,针对所述第二文本的任一第二分词,确定所述第二分词的第二词向量;确定所述第一分词在所述第一文本中相对于所述起始符号的第一词位置向量;以及,确定所述第二分词在所述第二文本中相对于所述分割符号的第二词位置向量;根据所述第一词位置向量、所述第一词向量以及所述第一文本的预设嵌入信息进行向量加和处理,得到所述第一分词的第一分词嵌入向量;以及,根据所述第二词位置向量、所述第二词向量以及所述第二文本的预设嵌入信息进行向量加和处理,得到所述第二分词的第二分词嵌入向量;基于所述第一分词的第一分词嵌入向量生成所述第一嵌入向量;以及,基于所述第二分词的第二分词嵌入向量生成所述第二嵌入向量。
在一个实施例中,所述计算机程序被处理器执行时还实现:
将所述目标嵌入向量输入所述第二网络结构中进行向量处理,得到所述第二网络结构预测所述两段文本的释义相似度的第一预测结果;以及,将所述滤波向量输入所述第二网络结构中进行向量处理,得到所述第二网络结构预测所述两段文本的释义相似度的第二预测结果;采用预设的交叉熵损失函数分别对所述第一预测结果和所述第二预测结果进行计算,得到原始损失函数和滤波损失函数;根据所述原始损失函数和所述滤波损失函数计算所述目标损失函数值。
在一个实施例中,所述计算机程序被处理器执行时还实现:
基于所述原始损失函数对应的预设第一权重值,计算修正后的原始损失函数;以及,基于所述滤波损失函数对应的预设第二权重值,计算修正后的滤波损失函数;将所述修正后的原始损失函数与所述修正后的滤波损失函数之和作为所述目标损失函数值。
在一个实施例中,所述计算机程序被处理器执行时还实现:
基于所述目标损失函数值,依次对所述第二网络结构和所述高通滤波层中的模型参数进行迭代更新,得到所述目标训练模型,所述目标训练模型包括所述第一网络结构、更新后的第二网络结构和更新后的高通滤波层。
在一个实施例中,所述计算机程序被处理器执行时还实现:
针对任一待识别文本,将所述待识别文本依次输入所述目标训练模型中的第一网络结构和高通滤波层,得到所述待识别文本的滤波向量;基于多段待识别文本的滤波向量,分别计算任意两段所述待识别文本的滤波向量的余弦相似度,所述余弦相似度用于表示任意两段所述待识别文本的释义相似度。
示例性的,计算机程序830可以被分割成一个或多个模块,一个或者多个模块被存储在存储器820中,并由处理器810执行,以完成本申请。一个或多个模块可以是能够完成特定功能的一系列计算机程序指令段,该指令段用于描述计算机程序830在终端设备800中的执行过程。例如,计算机程序830可以被分割成获取模块、第一输入模块、第二输入模块、第三输入模块和训练模块,各模块具体功能如上。所述计算机可读存储介质可以是前述实施例所述的终端设备的内部存储单元,例如所述终端设备的硬盘或内存。所述计算机可读存储介质可以是非易失性,也可以是易失性。
终端设备800可包括,但不仅限于,处理器810、存储器820。本领域技术人员可以理解,图8仅仅是终端设备800的示例,并不构成对终端设备800的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件,例如终端设备还可以包括输入输出设备、网络接入设备、总线等。
所称处理器810可以是中央处理单元,还可以是其他通用处理器、数字信号处理器、专用集成电路、现成可编程门阵列或者其他可编程逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者是任何常规的处理器等。
存储器820可以是终端设备800的内部存储单元，例如终端设备800的硬盘或内存。存储器820也可以是终端设备800的外部存储设备，例如终端设备800上配备的插接式硬盘，智能存储卡，闪存卡等。进一步地，存储器820还可以既包括终端设备800的内部存储单元也包括外部存储设备。
以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种释义分析模型训练方法,其中,包括:
    获取训练样本,所述训练样本至少包括两段文本;
    将所述训练样本输入至预训练模型的第一网络结构中,得到所述训练样本的目标嵌入向量;所述预训练模型还包括高通滤波层和第二网络结构;
    将所述目标嵌入向量输入至所述高通滤波层进行信息噪音过滤处理,得到滤波向量;
    将所述目标嵌入向量和所述滤波向量分别输入至所述第二网络结构中进行向量处理,得到所述训练样本的目标损失函数值;
    基于所述目标损失函数值,对所述预训练模型进行反向传播训练,得到目标训练模型,所述目标训练模型用于对任意输入的两段待识别文本进行处理,输出所述两段待识别文本的释义相似度。
  2. 如权利要求1所述的释义分析模型训练方法,其中,所述将所述训练样本输入至预训练模型的第一网络结构中,得到所述训练样本的目标嵌入向量,包括:
    识别所述训练样本中的起始符号以及分割符号;
    将所述起始符号与所述分割符号之间的文本内容确定为第一文本,以及将处于所述分割符号之后的文本内容确定为第二文本;
    将所述第一文本和所述第二文本输入至所述第一网络结构中,得到所述第一文本的第一嵌入向量,以及所述第二文本的第二嵌入向量;
    计算所述第一嵌入向量和所述第二嵌入向量的均值,并将所述均值作为所述训练样本的目标嵌入向量。
  3. 如权利要求2所述的释义分析模型训练方法,其中,所述第一文本包括至少一个第一分词,所述第二文本包括至少一个第二分词,所述第一嵌入向量由所述第一分词对应的分词嵌入向量组成,所述第二嵌入向量由所述第二分词对应的分词嵌入向量组成;
    所述将所述第一文本和所述第二文本输入至所述第一网络结构中,得到所述第一文本的第一嵌入向量,以及所述第二文本的第二嵌入向量,包括:
    针对所述第一文本的任一第一分词,确定所述第一分词的第一词向量;以及,针对所述第二文本的任一第二分词,确定所述第二分词的第二词向量;
    确定所述第一分词在所述第一文本中相对于所述起始符号的第一词位置向量;以及,确定所述第二分词在所述第二文本中相对于所述分割符号的第二词位置向量;
    根据所述第一词位置向量、所述第一词向量以及所述第一文本的预设嵌入信息进行向量加和处理,得到所述第一分词的第一分词嵌入向量;以及,根据所述第二词位置向量、所述第二词向量以及所述第二文本的预设嵌入信息进行向量加和处理,得到所述第二分词的第二分词嵌入向量;
    基于所述第一分词的第一分词嵌入向量生成所述第一嵌入向量;以及,基于所述第二分词的第二分词嵌入向量生成所述第二嵌入向量。
  4. 如权利要求1-3任一所述的释义分析模型训练方法,其中,所述将所述目标嵌入向量和所述滤波向量分别输入至所述第二网络结构中进行向量处理,得到所述训练样本的目标损失函数值,包括:
    将所述目标嵌入向量输入所述第二网络结构中进行向量处理,得到所述第二网络结构预测所述两段文本的释义相似度的第一预测结果;以及,将所述滤波向量输入所述第二网络结构中进行向量处理,得到所述第二网络结构预测所述两段文本的释义相似度的第二预测结果;
    采用预设的交叉熵损失函数分别对所述第一预测结果和所述第二预测结果进行计算,得到原始损失函数和滤波损失函数;
    根据所述原始损失函数和所述滤波损失函数计算所述目标损失函数值。
  5. 如权利要求4所述的释义分析模型训练方法,其中,所述根据所述原始损失函数和所述滤波损失函数计算所述目标损失函数值,包括:
    基于所述原始损失函数对应的预设第一权重值,计算修正后的原始损失函数;以及,基于所述滤波损失函数对应的预设第二权重值,计算修正后的滤波损失函数;
    将所述修正后的原始损失函数与所述修正后的滤波损失函数之和作为所述目标损失函数值。
  6. 如权利要求1-3任一所述的释义分析模型训练方法,其中,所述基于所述目标损失函数值,对所述预训练模型进行反向传播训练,得到目标训练模型,包括:
    基于所述目标损失函数值,依次对所述第二网络结构和所述高通滤波层中的模型参数进行迭代更新,得到所述目标训练模型,所述目标训练模型包括所述第一网络结构、更新后的第二网络结构和更新后的高通滤波层。
  7. 如权利要求1-3任一所述的释义分析模型训练方法,其中,在生成所述目标训练模型后,若输入至所述目标训练模型的待识别文本的数量超过两段,所述方法还包括:
    针对任一待识别文本,将所述待识别文本依次输入所述目标训练模型中的第一网络结构和高通滤波层,得到所述待识别文本的滤波向量;
    基于多段待识别文本的滤波向量,分别计算任意两段所述待识别文本的滤波向量的余弦相似度,所述余弦相似度用于表示任意两段所述待识别文本的释义相似度。
  8. 一种释义分析模型训练装置,其中,包括:
    获取模块,用于获取训练样本,所述训练样本至少包括两段文本;
    第一输入模块,用于将所述训练样本输入至预训练模型的第一网络结构中,得到所述训练样本的目标嵌入向量;所述预训练模型还包括高通滤波层和第二网络结构;
    第二输入模块,用于将所述目标嵌入向量输入至所述高通滤波层进行信息噪音过滤处理,得到滤波向量;
    第三输入模块,用于将所述目标嵌入向量和所述滤波向量分别输入至所述第二网络结构中进行向量处理,得到所述训练样本的目标损失函数值;
    训练模块,用于基于所述目标损失函数值,对所述预训练模型进行反向传播训练,得到目标训练模型,所述目标训练模型用于对任意输入的两段待识别文本进行处理,输出所述两段待识别文本的释义相似度。
  9. 一种终端设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,其中,所述处理器执行所述计算机程序时实现:
    获取训练样本,所述训练样本至少包括两段文本;
    将所述训练样本输入至预训练模型的第一网络结构中,得到所述训练样本的目标嵌入向量;所述预训练模型还包括高通滤波层和第二网络结构;
    将所述目标嵌入向量输入至所述高通滤波层进行信息噪音过滤处理,得到滤波向量;
    将所述目标嵌入向量和所述滤波向量分别输入至所述第二网络结构中进行向量处理,得到所述训练样本的目标损失函数值;
    基于所述目标损失函数值,对所述预训练模型进行反向传播训练,得到目标训练模型,所述目标训练模型用于对任意输入的两段待识别文本进行处理,输出所述两段待识别文本的释义相似度。
  10. 根据权利要求9所述的终端设备,其中,所述处理器执行所述计算机程序时还实现:
    识别所述训练样本中的起始符号以及分割符号;
    将所述起始符号与所述分割符号之间的文本内容确定为第一文本,以及将处于所述分割符号之后的文本内容确定为第二文本;
    将所述第一文本和所述第二文本输入至所述第一网络结构中,得到所述第一文本的第一嵌入向量,以及所述第二文本的第二嵌入向量;
    计算所述第一嵌入向量和所述第二嵌入向量的均值,并将所述均值作为所述训练样本的目标嵌入向量。
  11. 根据权利要求10所述的终端设备,其中,所述第一文本包括至少一个第一分词,所述第二文本包括至少一个第二分词,所述第一嵌入向量由所述第一分词对应的分词嵌入向量组成,所述第二嵌入向量由所述第二分词对应的分词嵌入向量组成,所述处理器执行所述计算机程序时还实现:
    针对所述第一文本的任一第一分词,确定所述第一分词的第一词向量;以及,针对所述第二文本的任一第二分词,确定所述第二分词的第二词向量;
    确定所述第一分词在所述第一文本中相对于所述起始符号的第一词位置向量;以及,确定所述第二分词在所述第二文本中相对于所述分割符号的第二词位置向量;
    根据所述第一词位置向量、所述第一词向量以及所述第一文本的预设嵌入信息进行向量加和处理,得到所述第一分词的第一分词嵌入向量;以及,根据所述第二词位置向量、所述第二词向量以及所述第二文本的预设嵌入信息进行向量加和处理,得到所述第二分词的第二分词嵌入向量;
    基于所述第一分词的第一分词嵌入向量生成所述第一嵌入向量;以及,基于所述第二分词的第二分词嵌入向量生成所述第二嵌入向量。
  12. 根据权利要求9-11任一所述的终端设备,其中,所述处理器执行所述计算机程序时还实现:
    将所述目标嵌入向量输入所述第二网络结构中进行向量处理,得到所述第二网络结构预测所述两段文本的释义相似度的第一预测结果;以及,将所述滤波向量输入所述第二网络结构中进行向量处理,得到所述第二网络结构预测所述两段文本的释义相似度的第二预测结果;
    采用预设的交叉熵损失函数分别对所述第一预测结果和所述第二预测结果进行计算,得到原始损失函数和滤波损失函数;
    根据所述原始损失函数和所述滤波损失函数计算所述目标损失函数值。
  13. 根据权利要求12所述的终端设备,其中,所述处理器执行所述计算机程序时还实现:
    基于所述原始损失函数对应的预设第一权重值,计算修正后的原始损失函数;以及,基于所述滤波损失函数对应的预设第二权重值,计算修正后的滤波损失函数;
    将所述修正后的原始损失函数与所述修正后的滤波损失函数之和作为所述目标损失函数值。
  14. 根据权利要求9-11任一所述的终端设备,其中,所述处理器执行所述计算机程序时还实现:
    基于所述目标损失函数值,依次对所述第二网络结构和所述高通滤波层中的模型参数进行迭代更新,得到所述目标训练模型,所述目标训练模型包括所述第一网络结构、更新后的第二网络结构和更新后的高通滤波层。
  15. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其中,所述计算机程序被处理器执行时实现:
    获取训练样本,所述训练样本至少包括两段文本;
    将所述训练样本输入至预训练模型的第一网络结构中,得到所述训练样本的目标嵌入向量;所述预训练模型还包括高通滤波层和第二网络结构;
    将所述目标嵌入向量输入至所述高通滤波层进行信息噪音过滤处理,得到滤波向量;
    将所述目标嵌入向量和所述滤波向量分别输入至所述第二网络结构中进行向量处理,得到所述训练样本的目标损失函数值;
    基于所述目标损失函数值，对所述预训练模型进行反向传播训练，得到目标训练模型，所述目标训练模型用于对任意输入的两段待识别文本进行处理，输出所述两段待识别文本的释义相似度。
  16. 根据权利要求15所述的计算机可读存储介质,其中,所述计算机程序被处理器执行时还实现:
    识别所述训练样本中的起始符号以及分割符号;
    将所述起始符号与所述分割符号之间的文本内容确定为第一文本,以及将处于所述分割符号之后的文本内容确定为第二文本;
    将所述第一文本和所述第二文本输入至所述第一网络结构中,得到所述第一文本的第一嵌入向量,以及所述第二文本的第二嵌入向量;
    计算所述第一嵌入向量和所述第二嵌入向量的均值,并将所述均值作为所述训练样本的目标嵌入向量。
  17. 根据权利要求16所述的计算机可读存储介质,其中,所述第一文本包括至少一个第一分词,所述第二文本包括至少一个第二分词,所述第一嵌入向量由所述第一分词对应的分词嵌入向量组成,所述第二嵌入向量由所述第二分词对应的分词嵌入向量组成,所述计算机程序被处理器执行时还实现:
    针对所述第一文本的任一第一分词,确定所述第一分词的第一词向量;以及,针对所述第二文本的任一第二分词,确定所述第二分词的第二词向量;
    确定所述第一分词在所述第一文本中相对于所述起始符号的第一词位置向量;以及,确定所述第二分词在所述第二文本中相对于所述分割符号的第二词位置向量;
    根据所述第一词位置向量、所述第一词向量以及所述第一文本的预设嵌入信息进行向量加和处理,得到所述第一分词的第一分词嵌入向量;以及,根据所述第二词位置向量、所述第二词向量以及所述第二文本的预设嵌入信息进行向量加和处理,得到所述第二分词的第二分词嵌入向量;
    基于所述第一分词的第一分词嵌入向量生成所述第一嵌入向量;以及,基于所述第二分词的第二分词嵌入向量生成所述第二嵌入向量。
  18. 根据权利要求15-17任一所述的计算机可读存储介质,其中,所述计算机程序被处理器执行时还实现:
    将所述目标嵌入向量输入所述第二网络结构中进行向量处理,得到所述第二网络结构预测所述两段文本的释义相似度的第一预测结果;以及,将所述滤波向量输入所述第二网络结构中进行向量处理,得到所述第二网络结构预测所述两段文本的释义相似度的第二预测结果;
    采用预设的交叉熵损失函数分别对所述第一预测结果和所述第二预测结果进行计算,得到原始损失函数和滤波损失函数;
    根据所述原始损失函数和所述滤波损失函数计算所述目标损失函数值。
  19. 根据权利要求18所述的计算机可读存储介质，其中，所述计算机程序被处理器执行时还实现：
    基于所述原始损失函数对应的预设第一权重值,计算修正后的原始损失函数;以及,基于所述滤波损失函数对应的预设第二权重值,计算修正后的滤波损失函数;
    将所述修正后的原始损失函数与所述修正后的滤波损失函数之和作为所述目标损失函数值。
  20. 根据权利要求15-17任一所述的计算机可读存储介质，其中，所述计算机程序被处理器执行时还实现：
    基于所述目标损失函数值,依次对所述第二网络结构和所述高通滤波层中的模型参数进行迭代更新,得到所述目标训练模型,所述目标训练模型包括所述第一网络结构、更新后的第二网络结构和更新后的高通滤波层。
PCT/CN2022/071358 2021-06-09 2022-01-11 释义分析模型训练方法、装置、终端设备及存储介质 WO2022257453A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110642143.2 2021-06-09
CN202110642143.2A CN113377909B (zh) 2021-06-09 2021-06-09 释义分析模型训练方法、装置、终端设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022257453A1 true WO2022257453A1 (zh) 2022-12-15

Family

ID=77573163

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071358 WO2022257453A1 (zh) 2021-06-09 2022-01-11 释义分析模型训练方法、装置、终端设备及存储介质

Country Status (2)

Country Link
CN (1) CN113377909B (zh)
WO (1) WO2022257453A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117689354A (zh) * 2024-02-04 2024-03-12 芯知科技(江苏)有限公司 基于云服务的招聘信息的智能处理方法及平台

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377909B (zh) * 2021-06-09 2023-07-11 平安科技(深圳)有限公司 释义分析模型训练方法、装置、终端设备及存储介质
CN114065768B (zh) * 2021-12-08 2022-12-09 马上消费金融股份有限公司 特征融合模型的训练、文本处理方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110895553A (zh) * 2018-08-23 2020-03-20 国信优易数据有限公司 语义匹配模型训练方法、语义匹配方法及答案获取方法
US20200242304A1 (en) * 2017-11-29 2020-07-30 Tencent Technology (Shenzhen) Company Limited Text recommendation method and apparatus, and electronic device
CN111859960A (zh) * 2020-07-27 2020-10-30 中国平安人寿保险股份有限公司 基于知识蒸馏的语义匹配方法、装置、计算机设备和介质
CN112328786A (zh) * 2020-11-03 2021-02-05 平安科技(深圳)有限公司 基于bert的文本分类方法、装置、计算机设备及存储介质
CN113377909A (zh) * 2021-06-09 2021-09-10 平安科技(深圳)有限公司 释义分析模型训练方法、装置、终端设备及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948646A (zh) * 2019-01-24 2019-06-28 西安交通大学 一种时序数据相似度度量方法及度量系统
US11520993B2 (en) * 2019-07-24 2022-12-06 Nec Corporation Word-overlap-based clustering cross-modal retrieval
CN112214335B (zh) * 2020-10-13 2023-12-01 重庆工业大数据创新中心有限公司 基于知识图谱和相似度网络的Web服务发现方法
CN112597324A (zh) * 2020-12-15 2021-04-02 武汉工程大学 一种基于相关滤波的图像哈希索引构建方法、系统及设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200242304A1 (en) * 2017-11-29 2020-07-30 Tencent Technology (Shenzhen) Company Limited Text recommendation method and apparatus, and electronic device
CN110895553A (zh) * 2018-08-23 2020-03-20 国信优易数据有限公司 语义匹配模型训练方法、语义匹配方法及答案获取方法
CN111859960A (zh) * 2020-07-27 2020-10-30 中国平安人寿保险股份有限公司 基于知识蒸馏的语义匹配方法、装置、计算机设备和介质
CN112328786A (zh) * 2020-11-03 2021-02-05 平安科技(深圳)有限公司 基于bert的文本分类方法、装置、计算机设备及存储介质
CN113377909A (zh) * 2021-06-09 2021-09-10 平安科技(深圳)有限公司 释义分析模型训练方法、装置、终端设备及存储介质

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HE HUA, LIN JIMMY: "Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement", PROCEEDINGS OF THE 2016 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, STROUDSBURG, PA, USA, 17 June 2016 (2016-06-17), Stroudsburg, PA, USA, pages 937 - 948, XP093013487, DOI: 10.18653/v1/N16-1108 *
SAPUTRO WAHYU FAQIH; DJAMAL ESMERALDA C.; ILYAS RIDWAN: "Paraphrase Identification Between Two Sentence Using Support Vector Machine", 2019 INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS (ICEEI), IEEE, 9 July 2019 (2019-07-09), pages 406 - 411, XP033708269, DOI: 10.1109/ICEEI47359.2019.8988874 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117689354A (zh) * 2024-02-04 2024-03-12 芯知科技(江苏)有限公司 基于云服务的招聘信息的智能处理方法及平台
CN117689354B (zh) * 2024-02-04 2024-04-19 芯知科技(江苏)有限公司 基于云服务的招聘信息的智能处理方法及平台

Also Published As

Publication number Publication date
CN113377909A (zh) 2021-09-10
CN113377909B (zh) 2023-07-11

Similar Documents

Publication Publication Date Title
WO2022257453A1 (zh) 释义分析模型训练方法、装置、终端设备及存储介质
CN112084327B (zh) 在保留语义的同时对稀疏标注的文本文档的分类
US10839315B2 (en) Method and system of selecting training features for a machine learning algorithm
EP3227836B1 (en) Active machine learning
Meng et al. Research on denoising sparse autoencoder
KR102250728B1 (ko) 샘플 처리 방법, 장치, 기기 및 저장 매체
CN110097096B (zh) 一种基于tf-idf矩阵和胶囊网络的文本分类方法
CN108898181B (zh) 一种图像分类模型的处理方法、装置及存储介质
US10824808B2 (en) Robust key value extraction
CN112418320A (zh) 一种企业关联关系识别方法、装置及存储介质
WO2022227214A1 (zh) 分类模型训练方法、装置、终端设备及存储介质
CN114428860A (zh) 院前急救病例文本的识别方法、装置、终端及存储介质
CN113780418A (zh) 一种数据的筛选方法、系统、设备和存储介质
CN112445914A (zh) 文本分类方法、装置、计算机设备和介质
EP4227855A1 (en) Graph explainable artificial intelligence correlation
CN116680401A (zh) 文档处理方法、文档处理装置、设备及存储介质
WO2016090625A1 (en) Scalable web data extraction
CN115115920A (zh) 一种数据训练方法及装置
CN115472179A (zh) 面向数字音频删除和插入篡改操作自动检测方法及系统
Richard et al. Densenets for time series classification: towards automation of time series pre-processing with cnns
CN113297376A (zh) 基于元学习的法律案件风险点识别方法及系统
CN112183103A (zh) 融合不同预训练词向量的卷积神经网络实体关系抽取方法
CN116431757B (zh) 基于主动学习的文本关系抽取方法、电子设备及存储介质
CN117171653B (zh) 一种识别信息关系的方法、装置、设备及存储介质
WO2022153710A1 (en) Training apparatus, classification apparatus, training method, classification method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22819061

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE