CN116362292A - Text classification model training method and device, text classification method and device - Google Patents

Text classification model training method and device, text classification method and device

Info

Publication number
CN116362292A
Authority
CN
China
Prior art keywords
text
classification model
classification
target
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211729559.9A
Other languages
Chinese (zh)
Inventor
邓其春
马金龙
吴文亮
黎子骏
张政统
王伟喆
曾锐鸿
盘子圣
焦南凯
兰翔
徐志坚
谢睿
陈光尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Quwan Network Technology Co Ltd
Original Assignee
Guangzhou Quwan Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Quwan Network Technology Co Ltd
Priority to CN202211729559.9A
Publication of CN116362292A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/353 Clustering; Classification into predefined classes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application discloses a text classification model training method and device and a text classification method and device. The text classification model training method comprises: determining a training text set, wherein the training text set comprises unlabeled offending texts, offending texts with marked forbidden categories, and unlabeled normal texts; generating training data by using hidden characters and a target text selected from the training text set; inputting the training data into a text classification model to obtain the target characters predicted by the text classification model and the classification results predicted by the text classification model based on the training data; and adjusting the parameters of the text classification model based on a text semantic loss value and a classification loss value until both loss values meet preset conditions, so as to obtain a trained text classification model. The text classification model trained by the text classification model training method provided by the application therefore has a stronger semantic analysis capability and a stronger capability to distinguish offending text.

Description

Text classification model training method and device, text classification method and device
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular, to a method and apparatus for training a text classification model, and a method and apparatus for classifying text.
Background
With the rapid development of internet technology, reading text information on the internet has become a common form of entertainment. However, as the threshold for publishing text information on the internet has fallen, the internet carries a large amount of offending text that is unsuitable for display to users, especially underage users. Such offending text therefore requires a certain degree of treatment to protect the physical and mental health of users, especially underage users. The premise of processing offending text is to find it among a large amount of text information.
Based on this, in order to find offending text among a large amount of text information, a classification model capable of identifying offending text may be introduced to classify each piece of text information.
Disclosure of Invention
In view of this, the present application provides a text classification model training method and apparatus, and a text classification method and apparatus for training a classification model capable of identifying offensive text, and classifying each text information.
In order to achieve the above object, the following solutions have been proposed:
a text classification model training method, comprising:
determining a training text set, wherein the training text set comprises a plurality of unlabeled illegal texts, a plurality of illegal texts with marked forbidden categories and a plurality of unlabeled normal texts;
sequentially selecting target texts from the training text set;
generating training data by using preset hidden characters and the target text, wherein the training data is the target text with partial characters replaced by the hidden characters;
inputting the training data into a text classification model to obtain target characters predicted by the text classification model and classification results predicted by the text classification model based on the training data, wherein the classification results are classification results corresponding to a plurality of forbidden categories, and the text classification model is a text classification model to be trained;
calculating a text semantic loss value of the text classification model according to the target characters and the target text;
calculating a classification loss value of the text classification model according to the classification result and the target text;
and adjusting parameters of the text classification model based on the text semantic loss value and the classification loss value until the text semantic loss value and the classification loss value accord with preset conditions, so as to obtain a trained text classification model.
Optionally, the method further comprises:
and adjusting the bias parameters and the weight parameters of the trained text classification model to obtain a processed text classification model, wherein the output of the processed text classification model is a classification result corresponding to the input text.
Optionally, adjusting the bias parameter and the weight parameter of the trained text classification model to obtain a processed text classification model, which includes:
and adjusting weight parameters and bias parameters related to the predicted target characters and the predicted classification results in the trained text classification model to obtain a processed text classification model.
Optionally, adjusting a weight parameter and a bias parameter related to the predicted target character and the predicted classification result in the trained text classification model to obtain a processed text classification model, which includes:
adjusting weight parameters and bias parameters related to the predicted target characters and the predicted classification results in the trained text classification model by using a preset adjusting formula to obtain a processed text classification model;
the adjustment formula is as follows:
H = BERT(X; θ)
H′ = Slice₁(H)
H″ = LN(GELU(H′W₁ + B₁))
E′ = Slice₂(ETable)
S = H″E′ᵀ + B₂
P = Softmax(S)
wherein X represents the input of the trained text classification model, θ represents the weight parameters of the trained text classification model, BERT represents semantic encoding with the text classification model, H represents the semantic vectors encoded by the text classification model, Slice₁ represents the operation of intercepting the encoded semantic vectors, H′ represents the semantic vectors corresponding to the classification results, H″ represents the semantic vectors after the fully connected layer, LN represents the Layer Normalization operation, GELU represents the Gaussian error linear unit activation function, W₁ represents the weight parameters of the fully connected layer, B₁ represents the bias parameters of the fully connected layer, E′ represents the vector matrix corresponding to the target characters only, Slice₂ represents the operation of intercepting from the dictionary the vector matrix corresponding to the target characters, ETable represents the vector matrix of the dictionary, S represents the scores of the classification results corresponding to the forbidden categories, E′ᵀ represents the transpose of the vector matrix corresponding to the target characters only, B₂ represents the bias parameters of the dimension transformation (dimension raising), P represents the probabilities of the classification results corresponding to the forbidden categories, and Softmax represents obtaining probabilities with a Softmax function; the dictionary is a pre-established character database.
Optionally, generating training data by using preset hidden characters and the target text includes:
replacing part of characters of the target text by using the hidden characters to obtain a replaced text;
if the target text does not have a label, directly taking the replacement text as the training data;
If the target text has a label, processing the replacement text by using a preset text template to obtain training data.
Optionally, the text template includes a fixed sequence and a specific position corresponding to the forbidden categories, and further includes a specific position for replacing the text;
processing the replacement text by using a preset text template to obtain training data, wherein the processing comprises the following steps:
determining a classification result corresponding to each forbidden class by using the labeling label of the target text corresponding to the replacement text;
determining the sequence of each classification result based on the fixed sequence corresponding to the forbidden classes in the text template;
forming a classification result combination according to the sequence of the classification results;
and generating training data based on specific positions corresponding to the forbidden categories in the text template, specific positions of the replacement text in the text template, the classification result combination and the replacement text.
Optionally, the generating training data based on specific positions corresponding to the forbidden categories in the text template, specific positions of the replaced text in the text template, the classification result combination and the replaced text includes:
combining the classification result combination and the replacement text according to the specific positions corresponding to the forbidden categories in the text template and the specific position of the replacement text, to obtain combined data;
and adding preset prefix characters into the combined data, adding suffix characters between a classification result of the combined data and the replacement text to obtain training data, so that after the training data is input into the text classification model, the text classification model distinguishes and identifies the classification result combination and the replacement text based on the prefix characters and the suffix characters.
A text classification model training device, comprising:
the determining unit is used for determining a training text set, wherein the training text set comprises a plurality of unlabeled illegal texts, a plurality of illegal texts with marked forbidden categories and a plurality of unlabeled normal texts;
the selecting unit is used for sequentially selecting target texts from the training text set;
the generation unit is used for generating training data by utilizing preset hidden characters and the target text, wherein the training data is the target text with partial characters replaced by the hidden characters;
The classifying unit is used for inputting the training data into a text classifying model to obtain target characters predicted by the text classifying model and classifying results predicted by the text classifying model based on the training data, wherein the classifying results are classifying results corresponding to a plurality of forbidden categories, and the text classifying model is a text classifying model to be trained;
the calculating unit is used for calculating a text semantic loss value of the text classification model according to the target characters and the target text;
the utilization unit is used for calculating a classification loss value of the text classification model according to the classification result and the target text;
and the adjusting unit is used for adjusting the parameters of the text classification model based on the text semantic loss value and the classification loss value until the text semantic loss value and the classification loss value accord with preset conditions, so as to obtain the trained text classification model.
A text classification method, comprising:
acquiring text information to be classified;
and classifying the text information to be classified by using the text classification model trained by the text classification model training method to obtain a classification result, wherein the classification result comprises a classification result corresponding to a plurality of forbidden categories.
A text classification device, comprising:
the text acquisition unit is used for acquiring text information to be classified;
the information classification unit is used for classifying the text information to be classified by the text classification model trained by the text classification model training method to obtain a classification result, wherein the classification result comprises classification results corresponding to a plurality of forbidden categories.
According to the technical scheme, the text classification model training method determines a training text set, wherein the training text set comprises a plurality of unlabeled illegal texts, a plurality of illegal texts with marked forbidden categories, and a plurality of unlabeled normal texts; sequentially selects target texts from the training text set; generates training data by using preset hidden characters and the target text, wherein the training data is the target text with part of its characters replaced by the hidden characters; and inputs the training data into a text classification model to obtain the target characters predicted by the text classification model and the classification results predicted by the text classification model based on the training data, wherein the classification results are classification results corresponding to a plurality of forbidden categories. Through this process, the semantic analysis capability of the text classification model can be trained by hiding part of the characters, and by mixing normal texts and illegal texts during training, the text classification model can learn the distinction between normal texts and illegal samples, which improves its discrimination capability. The method then calculates a text semantic loss value of the text classification model according to the target characters and the target text; calculates a classification loss value of the text classification model according to the classification results and the target text; and adjusts the parameters of the text classification model based on the text semantic loss value and the classification loss value until the text semantic loss value and the classification loss value meet preset conditions, so as to obtain a trained text classification model.
Through the process, the parameters of the text classification model can be adjusted by using the loss value, so that the purposes of improving the semantic analysis capability of the text classification model and the illegal text recognition capability are achieved. Therefore, the text classification model obtained by training by the text classification model training method provided by the application has higher semantic analysis capability and illegal text distinguishing capability.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to the provided drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flowchart of a text classification model training method disclosed in the present application;
fig. 2 is a structural block diagram of a training device for text classification model according to an embodiment of the present application;
fig. 3 is a block diagram of a hardware structure of a text classification model training device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The text classification model training method provided by the application can be applied to numerous general or special computing device environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor devices, distributed computing environments that include any of the above devices or devices, and the like.
The text classification model training method of the present application is described in detail below with reference to fig. 1, and includes the following steps:
step S1, determining a training text set.
Specifically, the training text set may include a plurality of unlabeled offending texts, a plurality of offending texts with marked offending categories, and a plurality of unlabeled normal texts.
The proportion among unlabeled offensive texts, offensive texts with marked forbidden categories and unlabeled normal texts in the training text set can be set according to actual training requirements, and generally speaking, offensive texts with marked forbidden categories can be main components in the training text set.
The training text set may include offensive text corresponding to a plurality of offensive categories, and the same offensive text may correspond to one or more offensive categories. The same illicit category may correspond to multiple illicit text.
The forbidden categories can be set according to actual requirements.
And S2, selecting target texts from the training text set in sequence.
Specifically, target texts can be sequentially selected from the training text set, and a target text may be an unlabeled offending text, an offending text with marked forbidden categories, or an unlabeled normal text.
And S3, generating training data by utilizing the preset hidden characters and the target text, wherein the training data is the target text with partial characters replaced by the hidden characters.
Specifically, part of the characters in the selected target text can be hidden. In particular, part of the characters in the target text can be masked with the hidden characters, or replaced with randomly selected characters, so as to achieve the purpose of hiding.
Training data can be generated from the preset hidden characters and the target text in various ways. For example, part of the characters in the target text can be replaced at random to generate the training data; alternatively, sensitive words in the target text can be determined and replaced with the hidden characters to generate the training data.
Wherein the characters in the target text that are replaced by hidden characters may account for 15% of the total characters in the target text.
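This masking step can be sketched as follows; the hidden character "[MASK]" and the function name are illustrative assumptions, not specified by the patent:

```python
import random

MASK = "[MASK]"      # hidden character; the actual character is not specified
MASK_RATE = 0.15     # 15% of the characters, as stated above

def mask_text(text, rate=MASK_RATE, seed=None):
    """Replace a random fraction of the characters with the hidden character.

    Returns the masked character list, the original character list, and the
    masked positions, which together are enough to build training data and
    later score the predicted target characters.
    """
    rng = random.Random(seed)
    chars = list(text)
    n_mask = max(1, int(len(chars) * rate))
    positions = sorted(rng.sample(range(len(chars)), n_mask))
    masked = list(chars)
    for i in positions:
        masked[i] = MASK
    return masked, chars, positions
```

With a 10-character text, one character (10 × 0.15, truncated, floored at 1) is hidden; the original characters are kept alongside so that the semantic loss can later compare predictions against them.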
And S4, inputting the training data into a text classification model to obtain target characters predicted by the text classification model and classification results predicted by the text classification model based on the training data.
In particular, the text classification model may be a text classification model that requires training.
The training data can be input into the text classification model for training of the text classification model, and target characters and classification results output by the text classification model are obtained.
The classification result is predicted by the text classification model based on the training data and the target characters predicted for the training data.
The target characters may be characters obscured by hidden characters in the target text predicted by the text classification model based on the training data.
The classification results are binary classification results corresponding to the multiple forbidden categories. For example, if the multiple forbidden categories are a first forbidden type and a second forbidden type, and each binary result is either yes or no, then the classification results indicate whether the text belongs to the first forbidden type and whether it belongs to the second forbidden type.
If the target text corresponding to the training data is a normal text, the binary classification results corresponding to all forbidden categories are "no".
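The mapping from marked forbidden categories to per-category binary results can be sketched as follows (the category and function names here are illustrative, not from the patent):

```python
CATEGORIES = ["first_forbidden_type", "second_forbidden_type"]  # hypothetical names

def binary_results(marked_categories, categories=CATEGORIES):
    """One yes/no binary result per forbidden category.

    A normal text maps to "no" for every category; an offending text marked
    with forbidden categories maps those categories to "yes".
    """
    return {c: ("yes" if c in marked_categories else "no") for c in categories}
```

For example, a normal text yields "no" for every category, while a text marked with the second forbidden type yields "yes" for that category only.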
And S5, calculating a text semantic loss value of the text classification model according to the target characters and the target text.
Specifically, the hidden characters in the training data can be replaced with the predicted target characters to obtain a predicted text, and the cross-entropy loss between the predicted text and the target text can then be calculated as the text semantic loss value of the text classification model.
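The substitution step can be sketched as follows, assuming the training data is a character list and the predicted characters are given in position order (the hidden character and all names are illustrative assumptions):

```python
def build_predicted_text(training_tokens, predicted_chars, mask_char="[MASK]"):
    """Replace each hidden character in the training data with the character
    predicted by the text classification model, yielding the predicted text
    that is compared against the target text for the cross-entropy loss."""
    preds = iter(predicted_chars)
    return "".join(next(preds) if t == mask_char else t for t in training_tokens)
```

For instance, `build_predicted_text(["a", "[MASK]", "c"], ["b"])` reconstructs the text "abc".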
And S6, calculating a classification loss value of the text classification model according to the classification result and the target text.
Specifically, it is first determined whether the target text has an annotation label. If a label exists, the classification loss value of the text classification model is calculated according to the distance between the classification results and the label. If the target text is a normal text and the classification results indicate that the training data matches no forbidden category, the classification loss value is determined to be 0. If the target text is an offending text without an annotation label and the classification results indicate that the training data matches a forbidden category, the classification loss value is likewise directly determined to be 0.
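A minimal sketch of these rules follows, assuming per-category probabilities and binary cross-entropy as the distance measure for the labeled case; the text does not specify the distance measure or the handling of inconsistent unlabeled predictions, so those choices are assumptions:

```python
import math

def classification_loss(pred_probs, label=None, is_offending=False):
    """Sketch of the step-S6 rules (names and distance measure assumed).

    pred_probs   -- per-category predicted probability of "forbidden"
    label        -- per-category 0/1 annotation when one exists, else None
    is_offending -- coarse type of an unlabeled target text
    """
    eps = 1e-9
    if label is not None:
        # labeled text: binary cross-entropy between prediction and label
        return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                    for y, p in zip(label, pred_probs)) / len(label)
    matched = any(p >= 0.5 for p in pred_probs)
    if (is_offending and matched) or (not is_offending and not matched):
        return 0.0  # prediction consistent with the coarse type: zero loss
    # The inconsistent case is unspecified in the text; one plausible choice
    # is to penalise the most confident wrong-direction probability.
    return max(pred_probs) if not is_offending else 1.0 - max(pred_probs)
```

The zero-loss branches encode the two unlabeled cases from the paragraph above; only the labeled case and the inconsistent fallback actually produce gradient signal.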
And step S7, adjusting parameters of the text classification model based on the text semantic loss value and the classification loss value until the text semantic loss value and the classification loss value accord with preset conditions, and obtaining the trained text classification model.
Specifically, the parameters of the text classification model may be adjusted based on the magnitudes of the text semantic loss value and the classification loss value until both loss values fall below a threshold.
The threshold value can be preset according to actual requirements, and different accuracy requirements can be corresponding to different threshold values.
A loss function may also be employed to calculate the loss when training the text classification model using the training data, the target characters and the classification results; the loss function is as follows:
L = −(1/N) Σᵢ mᵢ Σⱼ yᵢⱼ log pᵢⱼ
wherein i indexes the i-th character of the training data; j indexes the j-th character in the dictionary, the target characters are selected from the dictionary, and the dictionary may include 21128 characters; L represents the loss value corresponding to a single piece of training data; N represents the number of characters in the training data that are replaced by hidden characters; mᵢ indicates whether the i-th character in the training data is replaced by a hidden character; yᵢⱼ indicates whether the j-th character in the dictionary is the label for the i-th position; and pᵢⱼ represents the probability that the j-th character in the dictionary is the target character at the i-th position.
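The masked-character cross-entropy loss described above can be sketched directly in Python, with the mask flags standing for mᵢ, one-hot dictionary labels for yᵢⱼ, and predicted dictionary distributions for pᵢⱼ (a toy two-character dictionary stands in for the 21128-character one):

```python
import math

def mlm_loss(mask_flags, label_onehots, probs, eps=1e-12):
    """Average cross-entropy over the N characters replaced by hidden
    characters: L = -(1/N) * sum_i m_i * sum_j y_ij * log(p_ij)."""
    n = sum(mask_flags)
    total = 0.0
    for m, y, p in zip(mask_flags, label_onehots, probs):
        if not m:
            continue  # only masked positions contribute to the loss
        total -= sum(yj * math.log(pj + eps) for yj, pj in zip(y, p))
    return total / n
```

With one masked position whose correct character receives probability 0.5, the loss is log 2, matching the formula term by term.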
According to the technical scheme, through the text classification model training method, the semantic analysis capability of the text classification model can be trained through hiding part of characters, and the text classification model can learn the distinction between the normal text and the offending sample by adopting a mixed training mode of the normal text and the offending text, so that the discrimination capability of the text classification model is improved, and the parameters of the text classification model can be adjusted by using the loss value, so that the purposes of improving the semantic analysis capability of the text classification model and the offending text recognition capability are achieved. Therefore, the text classification model obtained by training by the text classification model training method provided by the application has higher semantic analysis capability and illegal text distinguishing capability.
In addition, by mixing unlabeled offending texts, offending texts with marked forbidden categories, and unlabeled normal texts in the training text set, both supervised learning and self-supervised learning of the text classification model can be realized. In other words, semi-supervised learning can be carried out by making full use of both labeled and unlabeled texts, further improving the reliability and learning capacity of the text classification model.
In some embodiments of the present application, it is considered that although the trained text classification model can predict target characters, in the actual prediction process the model is only required to predict classification results, not characters. Therefore, after the text classification model is trained, a processing step can be added that removes the character-prediction function of the model, so as to accelerate the prediction of classification results. This processing step is described in detail below:
and S8, adjusting the bias parameters and the weight parameters of the trained text classification model to obtain a processed text classification model, wherein the output of the processed text classification model is a classification result corresponding to the input text.
Specifically, parameters of the text classification model may be adjusted to implement that the output of the text classification model is only a classification result, where the classification result includes classification results corresponding to a plurality of forbidden categories.
The adjusted parameter may be a bias parameter and a weight parameter.
Compared with the previous embodiment, this embodiment adds a step of adjusting the bias parameters and weight parameters of the text classification model. Through this step, the target-character prediction performed by the text classification model can be removed, reducing the prediction difficulty and the amount of computation in the case where the model only outputs classification results, thereby accelerating the prediction of classification results and improving the prediction efficiency of the text classification model.
In some embodiments of the present application, the process of adjusting the bias parameters and the weight parameters of the trained text classification model in step S8 to obtain the processed text classification model is described in detail, and the steps are as follows:
and S80, adjusting weight parameters and bias parameters related to the predicted target characters and the predicted classification results in the trained text classification model to obtain a processed text classification model.
Specifically, in the process of adjusting the bias parameters and weight parameters of the text classification model, only the parameters related to predicting the target characters and the predicted classification results need to be adjusted.
From the above technical solution, it can be seen that this embodiment provides an optional manner of adjusting the parameters of the text classification model, in which only the parameters related to the predicted target characters and the predicted classification results are adjusted, further improving the efficiency of adjusting the text classification model and, in turn, its classification efficiency.
In some embodiments of the present application, in step S80, the process of adjusting the weight parameters and bias parameters related to the predicted target characters and the predicted classification result in the trained text classification model to obtain the processed text classification model is described in detail, and the steps are as follows:
s800, adjusting weight parameters and bias parameters related to the predicted target characters and the predicted classification result in the trained text classification model by using a preset adjusting formula to obtain a processed text classification model.
Specifically, the adjustment formulas are as follows:

H = BERT(X; θ)

H_cls = Slice_1(H)

H_fc = LN(GELU(H_cls W_1 + B_1))

E_tgt = Slice_2(ETable)

S = H_fc E_tgt^T + B_2

P = Softmax(S)

Wherein X represents the input of the trained text classification model, θ represents the weight parameters of the trained text classification model, BERT represents semantic encoding with the text classification model, H represents the semantic vector encoded by the text classification model, Slice_1 represents intercepting the semantic vector encoded by the text classification model, H_cls represents the semantic vector corresponding to the classification results, H_fc represents the semantic vector after full connection, LN represents the Layer Normalization operation, GELU represents the Gaussian error linear unit activation function, W_1 represents the weight parameters of the fully connected layer, B_1 represents the bias parameters of the fully connected layer, ETable represents the vector matrix of the dictionary, Slice_2 represents intercepting from the dictionary the vector matrix corresponding to the target characters, E_tgt represents the vector matrix corresponding to the target characters only, E_tgt^T represents the transpose of E_tgt, B_2 represents the bias parameters of the dimension transformation, S represents the scores of the classification results corresponding to the forbidden categories, P represents the probabilities of the classification results corresponding to the forbidden categories, and Softmax represents computing the probabilities with the Softmax function. The dictionary is a pre-established character database, and the target characters are selected from the dictionary.

Wherein H ∈ R^(b×s×d), H_cls ∈ R^(b×k×d), ETable ∈ R^(v×d), E_tgt ∈ R^(t×d), and S ∈ R^(b×k×t), where b represents the batch size, s represents the sequence length, d represents the vector dimension, k represents the number of forbidden categories, t represents the number of target characters, v represents the dictionary size, and typically v is much larger than s.
From the above technical solution, this embodiment provides an optional manner for adjusting the parameters of the text classification model. Through this process, the parameter adjustment of the text classification model can be better achieved, thereby achieving the purpose of improving the prediction efficiency of the text classification model.
After the parameters of the trained text classification model are adjusted with the above adjustment formulas, the number of operations performed by the text classification model and the large number of power-function computations in the softmax function can be reduced, so the prediction efficiency of the text classification model can be improved. Verification shows that after the parameters of the text classification model are adjusted, the accuracy of the classification results predicted by the text classification model does not decrease; therefore, the text classification model obtained in this embodiment improves prediction efficiency while guaranteeing discrimination accuracy.
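As a rough illustration of how this adjustment prunes the prediction head, the following NumPy sketch computes scores only for the k answer positions and only for the target-character rows of the dictionary's embedding table, instead of the full sequence and the full vocabulary. All names (`pruned_head`, `target_ids`) and the assumption that the answer positions are the first k tokens are illustrative, not taken from the patent.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (feature) dimension.
    mu = x.mean(-1, keepdims=True)
    sigma = x.std(-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def gelu(x):
    # Tanh approximation of the Gaussian error linear unit.
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def pruned_head(H, W1, B1, ETable, target_ids, B2, k):
    # Slice_1: keep only the k positions holding the per-category answers.
    H_cls = H[:, :k, :]                       # (b, k, d)
    # Fully connected transform with LayerNorm, as in the adjustment formulas.
    H_fc = layer_norm(gelu(H_cls @ W1 + B1))  # (b, k, d)
    # Slice_2: keep only the embedding rows of the target characters.
    E_tgt = ETable[target_ids]                # (t, d)
    S = H_fc @ E_tgt.T + B2[target_ids]       # (b, k, t) scores
    # Softmax over the t target characters only, not the full dictionary.
    e = np.exp(S - S.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

b, s, d, k, v = 2, 32, 16, 5, 1000
rng = np.random.default_rng(0)
H = rng.standard_normal((b, s, d))            # stand-in for BERT(X; θ)
P = pruned_head(H, rng.standard_normal((d, d)), np.zeros(d),
                rng.standard_normal((v, d)), np.array([7, 8]), np.zeros(v), k)
print(P.shape)  # (2, 5, 2)
```

With t = 2 target characters (e.g., "yes"/"no"), the softmax is computed over 2 entries per category instead of the full v-entry vocabulary, which is where the saving in power-function computations comes from.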
In some embodiments of the present application, the process in step S3 of generating training data by using preset hidden characters and the target text, where the training data is the target text with part of its characters replaced by hidden characters, is described in detail. The steps are as follows:
S30, replacing part of characters of the target text by using the hidden characters to obtain a replaced text.
Specifically, the replacement text can be obtained in various manners. For example, part of the characters of the target text can be randomly replaced with hidden characters to obtain the replacement text; alternatively, the sensitive words in the target text can be replaced with hidden characters to obtain the replacement text.
And S31, if the target text does not have a label, directly taking the replacement text as the training data.
Specifically, when the target text has no label, part of its characters are replaced with hidden characters, and the resulting replacement text can be used directly as training data.
S32, if the target text has a label, processing the replacement text by using a preset text template to obtain training data.
Specifically, if a label exists in the target text, the text template can be used to process the replacement text to obtain training data, so that the classification results for the forbidden categories appear at fixed positions in each piece of training data.
From the above technical solution, this embodiment provides an optional manner for generating training data. Training data generated in this manner share a similar format, which facilitates semi-supervised learning of the text classification model during training and accelerates the training process.
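Steps S30–S32 above can be sketched as follows. The `[MASK]` hidden character, the 15% masking ratio, and the function names are assumptions chosen for illustration, not values taken from the patent:

```python
import random

MASK = "[MASK]"  # assumed hidden character

def make_replacement_text(target_text, mask_ratio=0.15, seed=None):
    # S30: randomly replace a fraction of the characters with the hidden character.
    rng = random.Random(seed)
    chars = list(target_text)
    n = max(1, int(len(chars) * mask_ratio))
    for i in rng.sample(range(len(chars)), n):
        chars[i] = MASK
    return "".join(chars)

def make_training_data(target_text, label=None, template=None):
    replaced = make_replacement_text(target_text, seed=0)
    if label is None:
        return replaced               # S31: unlabeled text is used directly
    return template(label, replaced)  # S32: labeled text goes through the template

print(make_training_data("hello world"))
```

For labeled texts, `template` stands for the text-template processing detailed in the following steps.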
In some embodiments of the present application, in step S32, if the target text has a tag, the process of processing the replacement text by using a preset text template to obtain training data is described in detail, and the steps are as follows:
s320, determining a classification result corresponding to each forbidden class by using the labeling label of the target text corresponding to the replacement text.
Specifically, the labeling tag of the target text may indicate the forbidden class matched with the target text, and a classification result corresponding to each forbidden class is determined according to the forbidden class matched with the target text, for example, if the labeling tag of the target text indicates that the forbidden class matched with the target text is a fifth forbidden class, and the forbidden class which can be distinguished by the text classification model is a first forbidden class, a second forbidden class, a third forbidden class, a fourth forbidden class and a fifth forbidden class, then the classification result corresponding to the first forbidden class may be no, the classification result corresponding to the second forbidden class may be no, the classification result corresponding to the third forbidden class may be no, the classification result corresponding to the fourth forbidden class may be no, and the classification result corresponding to the fifth forbidden class may be yes.
S321, determining the sequence of each classification result based on the fixed sequence corresponding to the forbidden classes in the text template.
Specifically, the order of the binary classification results corresponding to each forbidden category that the text classification model can identify can be determined according to the fixed order of the forbidden categories in the text template. For example, when the forbidden categories that the text classification model can identify are the first forbidden type, the second forbidden type, the third forbidden type, the fourth forbidden type, and the fifth forbidden type, the order of the binary classification results corresponding to each of these five types needs to be determined.
S322, forming a two-classification result combination according to the sequence of the two-classification results.
Specifically, the classification results may be combined according to the order determined for them; the combined result is the binary classification result combination. For example, if the fixed order of the forbidden categories in the text template is the first forbidden type, the second forbidden type, the third forbidden type, the fourth forbidden type, and the fifth forbidden type, then the "no" of the first forbidden type comes first, the "no" of the second forbidden type comes second, the "no" of the third forbidden type comes third, the "no" of the fourth forbidden type comes fourth, and the "yes" of the fifth forbidden type comes fifth, so the resulting binary classification result combination is "no no no no yes".
S323, training data is generated based on specific positions corresponding to the forbidden categories in the text template, specific positions of the replacement text in the text template, the classification result combination and the replacement text.
Specifically, the binary classification result combination and the replacement text are combined according to the specific positions corresponding to the forbidden categories in the text template and the specific position of the replacement text, to generate training data. For example, if the positions for the forbidden categories in the text template are at the left end of the replacement text, the binary classification result combination can be placed before the replacement text to generate the training data.
According to the technical scheme, the embodiment provides an optional combination mode of the training data, through the mode, in the process of training the text classification model, the text classification model can distinguish the combination of the alternative text and the classification result according to the positions of each component of the training data, and the training efficiency of the text classification model is improved.
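As a minimal sketch of steps S320–S322, assuming the English words "yes"/"no" stand in for the binary classification result characters and that the category names are illustrative:

```python
def binary_combination(matched_categories, all_categories):
    # One "yes"/"no" answer per forbidden category, arranged in the
    # fixed order the text template prescribes (S320-S322).
    return ["yes" if c in matched_categories else "no" for c in all_categories]

cats = ["first", "second", "third", "fourth", "fifth"]
print(binary_combination({"fifth"}, cats))  # ['no', 'no', 'no', 'no', 'yes']
```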
In some embodiments of the present application, the process of generating training data in step S323 based on specific positions corresponding to the forbidden categories in the text template, specific positions of the substituted text in the text template, the combination of the classification results, and the substituted text is described in detail, and the steps are as follows:
S3230, combining the two classification result combinations and the replacement text according to specific positions corresponding to the forbidden categories and specific positions of the replacement text in the text template to obtain combined data.
Specifically, the combination of the classification result and the replacement text can be integrated according to the forbidden classification and the position of the replacement text in the text template, and the obtained result is combination data, wherein the combination data comprises the combination of the classification result and the replacement text.
S3231, adding preset prefix characters into the combined data, and adding suffix characters between the classification result of the combined data and the replacement text to obtain training data, so that after the training data is input into the text classification model, the text classification model distinguishes and identifies the classification result combination and the replacement text based on the prefix characters and the suffix characters.
Specifically, a prefix character and a suffix character may be preset, specific contents of the prefix character and the suffix character may be set according to actual requirements, the prefix character may include a character string indicating a starting meaning, and the training data may be composed of "prefix character+dichotomy result combination+suffix character+substitution text". Thus, the data between the prefix characters and the suffix characters are the two classification result combinations, and the data after the suffix characters are the substitution text.
As can be seen from the above technical solution, the present embodiment provides an alternative way of generating training data by using the alternative text and the binary result combination, by which the binary result combination and the alternative text in the training data can be better distinguished by using the prefix character and the suffix character, so that training of the text classification model can be better completed.
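Step S3231 can be sketched as below. `[CLS]` and `[SEP]` are assumed prefix and suffix characters chosen for illustration; as noted above, the actual characters can be set according to requirements:

```python
PREFIX = "[CLS]"  # assumed prefix character marking the start
SUFFIX = "[SEP]"  # assumed suffix character separating answers from text

def compose_training_data(combination, replacement_text):
    # prefix + binary result combination + suffix + replacement text:
    # everything between PREFIX and SUFFIX is the answer combination,
    # everything after SUFFIX is the replacement text.
    return PREFIX + "".join(combination) + SUFFIX + replacement_text

sample = compose_training_data(["no", "no", "yes"], "some [MASK] text")
print(sample)  # [CLS]nonoyes[SEP]some [MASK] text
```

Because the positions of the prefix and suffix are fixed, the model can tell the answer combination apart from the replacement text by position alone.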
The text classification method provided by the embodiments of the present application will be described in detail below. The text classification model obtained by the training above can be applied in the text classification method provided below, and the text classification method described below and the text classification model training method provided above may be referred to correspondingly.
The specific steps of the text classification method can be as follows:
s1, acquiring text information to be classified.
Specifically, text information to be classified can be acquired.
The text information to be classified can be obtained from the Internet, for example, from chat records in a live broadcast room or from a chat interface.
S2, classifying the text information to be classified by using the text classification model trained by the text classification model training method provided by any embodiment to obtain a classification result, wherein the classification result comprises classification results corresponding to a plurality of forbidden categories.
Specifically, the text information to be classified can be input into the text classification model obtained through the training, a classification result predicted by the text classification model is obtained, the classification result can comprise classification results corresponding to a plurality of forbidden categories, and whether the text information to be classified belongs to the illegal text can be known through the classification result.
From the above technical solution, this embodiment provides a text classification method. Through this process, illegal texts can be distinguished from normal texts, so that the illegal texts can be processed, helping to protect the physical and mental health of underage users and to promote healthy Internet use.
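A hedged sketch of how the classification step might be consumed downstream; the probability threshold, the category names, and the stand-in model are assumptions for illustration, not part of the claimed method:

```python
def classify(model, text, categories, threshold=0.5):
    # The processed model returns one probability per forbidden category;
    # a text is treated as illegal if any category passes the threshold.
    probs = model(text)
    per_category = {c: p >= threshold for c, p in zip(categories, probs)}
    return per_category, any(per_category.values())

toy_model = lambda text: [0.1, 0.2, 0.05, 0.9, 0.3]  # stand-in for the trained model
results, is_illegal = classify(toy_model, "example chat message",
                               ["first", "second", "third", "fourth", "fifth"])
print(is_illegal)  # True
```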
The text classification model training device provided in the embodiments of the present application is described below, and the text classification model training device described below and the text classification model training method described above may be referred to correspondingly.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a training device for text classification model according to an embodiment of the present application.
As shown in fig. 2, the text classification model training apparatus may include:
a determining unit 1, configured to determine a training text set, where the training text set includes a plurality of unlabeled offensive texts, a plurality of offensive texts with marked forbidden categories, and a plurality of unlabeled normal texts;
A selecting unit 2, configured to sequentially select target texts from the training text set;
a generating unit 3, configured to generate training data by using preset hidden characters and the target text, where the training data is a target text in which part of characters are replaced by hidden characters;
the classifying unit 4 is used for inputting the training data into a text classifying model to obtain target characters predicted by the text classifying model and classifying results predicted by the text classifying model based on the training data, wherein the classifying results are classifying results corresponding to a plurality of forbidden categories;
a calculating unit 5, configured to calculate a text semantic loss value of the text classification model according to the target character and the target text;
a utilization unit 6, configured to calculate a classification loss value of the text classification model according to the classification result and the target text;
and the adjusting unit 7 is configured to adjust parameters of the text classification model based on the text semantic loss value and the classification loss value until the text semantic loss value and the classification loss value meet preset conditions, thereby obtaining a trained text classification model.
Optionally, the text classification model training apparatus may further include:
and the parameter adjusting unit is used for adjusting the bias parameters and the weight parameters of the trained text classification model to obtain a processed text classification model, and the output of the processed text classification model is a classification result corresponding to the input text.
Alternatively, the parameter adjusting unit may include:
and the weight parameter adjusting unit is used for adjusting weight parameters and bias parameters related to the predicted target characters and the predicted classification results in the trained text classification model to obtain a processed text classification model.
Alternatively, the weight parameter adjusting unit may include:
the formula utilization unit is used for adjusting, by using a preset adjustment formula, the weight parameters and bias parameters related to the predicted target characters and the predicted classification results in the trained text classification model, to obtain a processed text classification model;
the adjustment formulas are as follows:

H = BERT(X; θ)

H_cls = Slice_1(H)

H_fc = LN(GELU(H_cls W_1 + B_1))

E_tgt = Slice_2(ETable)

S = H_fc E_tgt^T + B_2

P = Softmax(S)

wherein X represents the input of the trained text classification model, θ represents the weight parameters of the trained text classification model, BERT represents semantic encoding with the text classification model, H represents the semantic vector encoded by the text classification model, Slice_1 represents intercepting the semantic vector encoded by the text classification model, H_cls represents the semantic vector corresponding to the classification results, H_fc represents the semantic vector after full connection, LN represents the Layer Normalization operation, GELU represents the Gaussian error linear unit activation function, W_1 represents the weight parameters of the fully connected layer, B_1 represents the bias parameters of the fully connected layer, ETable represents the vector matrix of the dictionary, Slice_2 represents intercepting from the dictionary the vector matrix corresponding to the target characters, E_tgt represents the vector matrix corresponding to the target characters only, E_tgt^T represents the transpose of E_tgt, B_2 represents the bias parameters of the dimension transformation, S represents the scores of the classification results corresponding to the forbidden categories, P represents the probabilities of the classification results corresponding to the forbidden categories, and Softmax represents computing the probabilities with the Softmax function.
Alternatively, the generating unit may include:
a character replacing unit, configured to replace a part of characters of the target text with hidden characters, so as to obtain a replaced text;
the label judging unit is used for directly taking the replacement text as the training data if the target text does not have a label;
and the text processing unit is used for processing the replacement text by using a preset text template if the target text has a label, so as to obtain training data.
Alternatively, the text processing unit may include:
the classification result determining unit is used for determining a classification result corresponding to each forbidden class by using the labeling label of the target text corresponding to the replacement text;
the sequence determining unit is used for determining the sequence of each classification result based on the fixed sequence corresponding to the forbidden classes in the text template;
the two-classification result combination unit is used for forming two-classification result combinations according to the sequence of the two-classification results;
the position utilization unit is used for generating training data based on specific positions corresponding to the forbidden categories in the text template, specific positions of the replaced text in the text template, the classification result combination and the replaced text.
Alternatively, the location utilization unit may include:
the first position utilization unit is used for combining the two classification result combinations and the replacement text according to specific positions corresponding to the forbidden categories in the text template and specific positions of the replacement text to obtain combined data;
and the second position utilization unit is used for adding preset prefix characters into the combined data and adding suffix characters between the classification result of the combined data and the replacement text to obtain training data, so that after the training data is input into the text classification model, the text classification model distinguishes and identifies the classification result combination and the replacement text based on the prefix characters and the suffix characters.
The text classification model training device provided by the embodiment of the application can be applied to text classification model training equipment, such as a PC terminal, a cloud platform, a server cluster and the like. Optionally, fig. 3 shows a block diagram of a hardware structure of the text classification model training apparatus, and referring to fig. 3, the hardware structure of the text classification model training apparatus may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete communication with each other through the communication bus 4;
processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention, etc.;
the memory 3 may comprise a high-speed RAM memory, and may further comprise a non-volatile memory or the like, such as at least one magnetic disk memory;
wherein the memory stores a program, the processor is operable to invoke the program stored in the memory, the program operable to:
Determining a training text set, wherein the training text set comprises a plurality of unlabeled illegal texts, a plurality of illegal texts with marked forbidden categories and a plurality of unlabeled normal texts;
sequentially selecting target texts from the training text set;
generating training data by using preset hidden characters and the target text, wherein the training data is the target text with partial characters replaced by the hidden characters;
inputting the training data into a text classification model to obtain target characters predicted by the text classification model and classification results predicted by the text classification model based on the training data, wherein the classification results are classification results corresponding to a plurality of forbidden categories;
calculating a text semantic loss value of the text classification model according to the target characters and the target text;
calculating a classification loss value of the text classification model according to the classification result and the target text;
and adjusting parameters of the text classification model based on the text semantic loss value and the classification loss value until the text semantic loss value and the classification loss value accord with preset conditions, so as to obtain a trained text classification model.
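The program steps above amount to the following training loop. This is shown with a dummy model whose losses shrink each step purely to illustrate the stopping condition; nothing here is the actual implementation, and all names are placeholders:

```python
class DummyModel:
    # Stand-in for the text classification model; the shrinking loss
    # only demonstrates how the preset condition ends training.
    def __init__(self):
        self.loss = 1.0
    def forward(self, data):
        return "target chars", "classification results"
    def semantic_loss(self, pred_chars, target_text):
        return self.loss
    def classification_loss(self, pred_classes, label):
        return self.loss
    def step(self, total_loss):
        self.loss *= 0.5  # parameter adjustment shrinks both losses here

def train(model, texts, max_steps=100, tol=0.1):
    for step in range(max_steps):
        text, label = texts[step % len(texts)]     # select target texts in turn
        pred_chars, pred_classes = model.forward(text)
        sem = model.semantic_loss(pred_chars, text)
        cls = model.classification_loss(pred_classes, label)
        model.step(sem + cls)                      # adjust parameters on both losses
        if sem < tol and cls < tol:                # preset condition met
            break
    return step + 1

steps = train(DummyModel(), [("text a", None), ("text b", {"cat5"})])
print(steps)  # 5
```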
Alternatively, the refinement function and the extension function of the program may be described with reference to the above.
The embodiment of the application also provides a readable storage medium, which can store a program suitable for being executed by a processor, the program being configured to:
determining a training text set, wherein the training text set comprises a plurality of unlabeled illegal texts, a plurality of illegal texts with marked forbidden categories and a plurality of unlabeled normal texts;
sequentially selecting target texts from the training text set;
generating training data by using preset hidden characters and the target text, wherein the training data is the target text with partial characters replaced by the hidden characters;
inputting the training data into a text classification model to obtain target characters predicted by the text classification model and classification results predicted by the text classification model based on the training data, wherein the classification results are classification results corresponding to a plurality of forbidden categories;
calculating a text semantic loss value of the text classification model according to the target characters and the target text;
calculating a classification loss value of the text classification model according to the classification result and the target text;
And adjusting parameters of the text classification model based on the text semantic loss value and the classification loss value until the text semantic loss value and the classification loss value accord with preset conditions, so as to obtain a trained text classification model.
Alternatively, the refinement function and the extension function of the program may be described with reference to the above.
The text classification device provided in the embodiments of the present application will be described in detail below, and the text classification device described below may be referred to with the text classification method provided above.
The text classification apparatus may include:
the text acquisition unit is used for acquiring text information to be classified;
the information classification unit is used for classifying the text information to be classified by using a text classification model trained by a text classification model training method to obtain a classification result, wherein the classification result comprises classification results corresponding to a plurality of forbidden categories.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Various embodiments of the present application may be combined with one another. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for training a text classification model, comprising:
determining a training text set, wherein the training text set comprises a plurality of unlabeled illegal texts, a plurality of illegal texts with marked forbidden categories and a plurality of unlabeled normal texts;
sequentially selecting target texts from the training text set;
Generating training data by using preset hidden characters and the target text, wherein the training data is the target text with partial characters replaced by the hidden characters;
inputting the training data into a text classification model to obtain target characters predicted by the text classification model and classification results predicted by the text classification model based on the training data, wherein the classification results are classification results corresponding to a plurality of forbidden categories;
calculating a text semantic loss value of the text classification model according to the target characters and the target text;
calculating a classification loss value of the text classification model according to the classification result and the target text;
and adjusting parameters of the text classification model based on the text semantic loss value and the classification loss value until the text semantic loss value and the classification loss value accord with preset conditions, so as to obtain a trained text classification model.
2. The text classification model training method of claim 1, further comprising:
and adjusting the bias parameters and the weight parameters of the trained text classification model to obtain a processed text classification model, wherein the output of the processed text classification model is a classification result corresponding to the input text.
3. The method for training a text classification model according to claim 2, wherein the adjusting the bias parameter and the weight parameter of the trained text classification model to obtain the processed text classification model comprises:
and adjusting weight parameters and bias parameters related to the predicted target characters and the predicted classification results in the trained text classification model to obtain a processed text classification model.
4. A method of training a text classification model according to claim 3, wherein adjusting the weight parameters and bias parameters associated with the predicted target character and the predicted classification result in the trained text classification model to obtain a processed text classification model comprises:
adjusting weight parameters and bias parameters related to the predicted target characters and the predicted classification results in the trained text classification model by using a preset adjusting formula to obtain a processed text classification model;
the adjustment formulas are as follows:

H = BERT(X; θ)

H_cls = Slice_1(H)

H' = LN(GELU(H_cls · W_1 + B_1))

E_t = Slice_2(E)

S = H' · E_t^T + B_2

P = Softmax(S)

wherein X represents the input of the trained text classification model, θ represents the weight parameters of the trained text classification model, BERT represents semantic encoding with the text classification model, H represents the semantic vector encoded by the text classification model, Slice_1 represents truncating the semantic vector encoded by the text classification model, H_cls represents the semantic vector corresponding to the classification results, H' represents the semantic vector after full connection, LN represents the Layer Normalization operation, GELU represents the Gaussian error linear unit activation function, W_1 represents the weight parameters of the fully connected layer, B_1 represents the bias parameters of the fully connected layer, E_t represents the vector matrix corresponding to the target characters only, Slice_2 represents truncating the vector matrix corresponding to the target characters from the dictionary, E represents the vector matrix of the dictionary, S represents the scores of the classification results corresponding to the forbidden categories, E_t^T represents the transpose of the vector matrix corresponding to the target characters only, B_2 represents the bias parameters of the dimension transformation (upscaling), P represents the probabilities of the classification results corresponding to the forbidden categories, Softmax represents computing the probabilities with the Softmax function, and the dictionary is a pre-established character database.
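The scoring pipeline described by claim 4's formulas can be sketched in plain NumPy. The BERT encoding is mocked with random vectors, and all sizes, names, and target-character indices below are illustrative assumptions, not the patent's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

hidden, seq_len = 8, 5           # toy sizes (a real BERT hidden size would be 768)
vocab = 20                       # toy dictionary (character database) size
target_ids = np.array([3, 7])    # hypothetical dictionary rows of the target characters

H = rng.normal(size=(seq_len, hidden))   # stands in for H = BERT(X; θ)
W1 = rng.normal(size=(hidden, hidden))   # fully connected layer weights W_1
B1 = np.zeros(hidden)                    # fully connected layer bias B_1
E = rng.normal(size=(vocab, hidden))     # dictionary vector matrix E
B2 = np.zeros(len(target_ids))           # dimension-transformation bias B_2

def gelu(x):  # Gaussian error linear unit (tanh approximation)
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-5):  # LN over the last axis
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

h_cls = H[0]                              # Slice_1: vector at the classification position
h_fc = layer_norm(gelu(h_cls @ W1 + B1))  # fully connected + GELU + LN
E_t = E[target_ids]                       # Slice_2: rows of E for the target characters only
scores = h_fc @ E_t.T + B2                # S: scores over the target characters
probs = softmax(scores)                   # P: probabilities of the classification results
```

Restricting the output layer to the target-character rows of the dictionary matrix (Slice_2) keeps the final matmul small compared to scoring the whole vocabulary.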
5. The text classification model training method of claim 1, wherein generating training data using preset hidden characters and the target text comprises:
replacing part of characters of the target text by using the hidden characters to obtain a replaced text;
if the target text does not have a label, directly taking the replacement text as the training data;
if the target text has a label, processing the replacement text by using a preset text template to obtain training data.
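A minimal sketch of the character-replacement step in claim 5, assuming a `[MASK]` hidden character and a 15% replacement ratio — both are illustrative choices the claim leaves unspecified:

```python
import random

MASK = "[MASK]"  # assumed preset hidden character

def mask_text(text, ratio=0.15, seed=0):
    """Replace a fraction of the characters of `text` with the hidden character."""
    rng = random.Random(seed)
    chars = list(text)
    n = max(1, int(len(chars) * ratio))        # hide at least one character
    for i in rng.sample(range(len(chars)), n): # positions chosen without replacement
        chars[i] = MASK
    return "".join(chars)
```

The resulting replacement text is used directly as training data when the target text is unlabeled, or fed into the text template when it is labeled.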
6. The method for training a text classification model according to claim 5, wherein the text template comprises a fixed order and specific positions corresponding to a plurality of forbidden categories, and further comprises a specific position for the replacement text;
processing the replacement text by using a preset text template to obtain training data, wherein the processing comprises the following steps:
determining a binary classification result corresponding to each forbidden category by using the annotation label of the target text corresponding to the replacement text;
determining the order of each binary classification result based on the fixed order corresponding to the forbidden categories in the text template;
forming a binary classification result combination according to the order of the binary classification results;
and generating training data based on the specific positions corresponding to the forbidden categories in the text template, the specific position of the replacement text in the text template, the binary classification result combination and the replacement text.
7. The method for training a text classification model according to claim 6, wherein generating training data based on the specific positions corresponding to the plurality of forbidden categories in the text template, the specific position of the replacement text in the text template, the binary classification result combination and the replacement text comprises:
combining the binary classification result combination and the replacement text according to the specific positions corresponding to the forbidden categories in the text template and the specific position of the replacement text to obtain combined data;
and adding preset prefix characters to the combined data, and adding suffix characters between the binary classification result combination in the combined data and the replacement text to obtain training data, so that after the training data is input into the text classification model, the text classification model distinguishes and identifies the binary classification result combination and the replacement text based on the prefix characters and the suffix characters.
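The template construction of claims 6 and 7 can be illustrated with a small builder. The prefix/suffix characters, the category names, and their fixed order below are hypothetical stand-ins for the patent's preset values:

```python
PREFIX = "[CLS]"                            # assumed preset prefix character
SUFFIX = "[SEP]"                            # assumed preset suffix character
CATEGORIES = ["abuse", "ads", "politics"]   # assumed fixed order of forbidden categories

def build_training_sample(masked_text, labels):
    """Combine binary results (in the fixed category order) with the replaced text."""
    # one 0/1 result per forbidden category, in the template's fixed order
    results = "".join(str(labels.get(c, 0)) for c in CATEGORIES)
    # prefix marks the start; suffix separates the result combination from the text
    return f"{PREFIX}{results}{SUFFIX}{masked_text}"
```

For example, `build_training_sample("te[MASK]t", {"ads": 1})` yields `"[CLS]010[SEP]te[MASK]t"`: the model can locate the binary result combination between the prefix and suffix, and the replacement text after the suffix.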
8. A text classification model training device, comprising:
the acquisition unit is used for acquiring a training text set, wherein the training text set comprises a plurality of unlabeled illegal texts, a plurality of illegal texts annotated with forbidden categories, and a plurality of unlabeled normal texts;
the selecting unit is used for sequentially selecting target texts from the training text set;
the generation unit is used for generating training data by utilizing preset hidden characters and the target text, wherein the training data is the target text with partial characters replaced by the hidden characters;
the classification unit is used for inputting the training data into a text classification model to obtain target characters predicted by the text classification model and classification results predicted by the text classification model based on the training data, wherein the classification results are classification results corresponding to a plurality of forbidden categories;
the calculating unit is used for calculating a text semantic loss value of the text classification model according to the target characters and the target text;
the utilization unit is used for calculating a classification loss value of the text classification model according to the classification result and the target text;
and the adjusting unit is used for adjusting the parameters of the text classification model based on the text semantic loss value and the classification loss value until the text semantic loss value and the classification loss value accord with preset conditions, so as to obtain the trained text classification model.
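The adjusting unit's stopping criterion — iterate until both the text semantic loss and the classification loss meet preset conditions — can be sketched as follows. The thresholds, decay rates, and the mocked loss trajectory are arbitrary illustrations, not values from the patent:

```python
def train_until_converged(step_fn, sem_thresh=0.1, cls_thresh=0.1, max_steps=1000):
    """Run training steps until both loss values fall below their thresholds."""
    for step in range(max_steps):
        sem_loss, cls_loss = step_fn(step)       # one parameter-adjustment step
        if sem_loss < sem_thresh and cls_loss < cls_thresh:
            return step, sem_loss, cls_loss      # both preset conditions met
    return max_steps, sem_loss, cls_loss

# mock step function: both losses decay geometrically as training proceeds
step, sem, cls = train_until_converged(lambda t: (1.0 * 0.9**t, 2.0 * 0.85**t))
```

Requiring both losses to pass their thresholds (rather than their sum) ensures the model neither forgets the masked-character prediction task nor underfits the forbidden-category classification task.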
9. A method of text classification, comprising:
acquiring text information to be classified;
classifying the text information to be classified by using the text classification model trained by the text classification model training method according to any one of claims 1 to 7 to obtain a classification result, wherein the classification result comprises classification results corresponding to a plurality of forbidden classes.
10. A text classification device, comprising:
the text acquisition unit is used for acquiring text information to be classified;
the information classification unit is used for classifying the text information to be classified by using the text classification model trained by the text classification model training method according to any one of claims 1-7 to obtain a classification result, wherein the classification result comprises classification results corresponding to a plurality of forbidden categories.
CN202211729559.9A 2022-12-30 2022-12-30 Text classification model training method and device, text classification method and device Pending CN116362292A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211729559.9A CN116362292A (en) 2022-12-30 2022-12-30 Text classification model training method and device, text classification method and device

Publications (1)

Publication Number Publication Date
CN116362292A true CN116362292A (en) 2023-06-30

Family

ID=86926325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211729559.9A Pending CN116362292A (en) 2022-12-30 2022-12-30 Text classification model training method and device, text classification method and device

Country Status (1)

Country Link
CN (1) CN116362292A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117874239A (en) * 2024-03-11 2024-04-12 腾讯科技(深圳)有限公司 Content generation method, device, equipment and storage medium
CN117874239B (en) * 2024-03-11 2024-06-11 腾讯科技(深圳)有限公司 Content generation method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination