CN115329784B - Sentence paraphrase generation system based on a pre-trained model - Google Patents

Sentence paraphrase generation system based on a pre-trained model

Info

Publication number
CN115329784B
CN115329784B (application CN202211245822.7A)
Authority
CN
China
Prior art keywords
sentence
model
generation
module
chinese
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211245822.7A
Other languages
Chinese (zh)
Other versions
CN115329784A (en)
Inventor
谢冰
尹越
袭向明
宋伟
朱世强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202211245822.7A priority Critical patent/CN115329784B/en
Publication of CN115329784A publication Critical patent/CN115329784A/en
Application granted granted Critical
Publication of CN115329784B publication Critical patent/CN115329784B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F 40/58 — Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F 16/355 — Information retrieval of unstructured textual data; clustering/classification; class or cluster creation or modification
    • G06F 40/205 — Natural language analysis; parsing
    • G06F 40/247 — Lexical tools; thesauruses; synonyms
    • G06F 40/30 — Semantic analysis

Abstract

The invention discloses a sentence paraphrase generation system based on pre-trained models. The system comprises a paraphrase generation module, a fluency filtering module and a semantic filtering module. The paraphrase generation module, which produces the candidate paraphrases, consists of a translation generation module, a model generation module and a synonym replacement generation module: the translation generation module generates paraphrases by direct translation and by back-translation; the model generation module generates paraphrases both by directly training a Chinese paraphrase generation model and by indirectly producing Chinese paraphrases with an English paraphrase generation model; and the synonym replacement generation module generates paraphrases by replacing words in the original sentence with synonyms.

Description

Sentence paraphrase generation system based on a pre-trained model
Technical Field
The invention belongs to the technical field of natural language processing, and in particular relates to a sentence paraphrase generation system based on pre-trained models.
Background
The paraphrase generation task is to generate sentences that have the same meaning as an original sentence but a different wording; the generated sentence is called a paraphrase of the original sentence. Paraphrase generation plays an important role in tasks such as question answering, translation and natural language generation. In a question-answering system, the questions entered by a user can be expanded by paraphrase generation, making it easier to match similar questions in the question-answering library. When training a translation model, paraphrase generation can augment the training data and the labelled data. In natural language generation, paraphrasing the generated sentences yields richer and more diverse expressions.
Current paraphrase generation methods are mainly rule-based, statistics-based and neural-network-based.
Rule-based methods rewrite the original sentence according to hand-written rules, changing its wording and structure while keeping its meaning, for example by inserting appropriate modal particles at suitable positions, changing the word order, or replacing words with synonyms. Synonym replacement can produce rich paraphrases, but for a polysemous word the intended sense depends on the context of the original sentence, and some of its synonyms will not fit that sense. Blindly replacing the original word with such a synonym changes the meaning of the generated sentence or makes it ungrammatical.
Statistics-based methods are mainly paraphrase generation methods based on statistical machine translation. Paraphrase generation can be regarded as a translation task in which the source language and the target language are the same: a statistical machine translation model translates the original sentence into a sentence with the same meaning but a different wording. With the development of deep learning, translation models trained with neural networks have surpassed statistical machine translation models, so generating paraphrases with a neural translation model is the better choice.
Deep learning also provides new ideas and methods for paraphrase generation: a pre-trained language model can be fine-tuned to generate paraphrases. A language model pre-trained on a large corpus has a strong ability to extract general features of text, so fine-tuning such a model on a paraphrase generation data set can be very effective. As deep learning develops, more and more pre-trained models are open-sourced; their strong feature extraction greatly benefits downstream tasks, making them valuable resources worth exploiting.
Natural language processing has received increasing attention with the development of artificial intelligence and deep learning, and paraphrase generation, as one of its research directions, is being studied by more and more scholars. At present there is less research on Chinese paraphrase generation than on English paraphrase generation; the latter has a longer technical accumulation and a broader research community. Open-source frameworks and models are already available that can generate high-quality English paraphrases, and these are also resources worth exploiting.
At present the paraphrase generation task lacks a good automatic evaluation method. The most convincing method is manual evaluation, but it is time-consuming, labour-intensive and costly. How to automatically evaluate the quality of generated paraphrases, in particular whether a paraphrase is fluent and whether its meaning matches the original sentence, is a problem worth further study.
Disclosure of Invention
Aiming at the shortcomings of the prior art, the invention provides a sentence paraphrase generation system based on pre-trained models.
In order to achieve this purpose, the technical scheme of the invention is as follows:
A sentence paraphrase generation system based on pre-trained models comprises a paraphrase generation module, a fluency filtering module and a semantic filtering module connected in sequence; the paraphrase generation module is used to generate paraphrases; the fluency filtering module is used to compute the fluency of each paraphrase and keep, by filtering, the paraphrases whose fluency is not below a threshold; the semantic filtering module is used to compute the semantic similarity between each paraphrase and the original sentence and keep, by filtering, the paraphrases whose semantic similarity is not below a threshold.
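The three-stage flow above can be summarised in a few lines of Python. This is a minimal sketch, not the patent's implementation: the generator, fluency and similarity functions are assumed to be supplied by the modules described below, and the default thresholds are illustrative placeholders.

from typing import Callable, List

def paraphrase_pipeline(sentence: str,
                        generators: List[Callable[[str], List[str]]],
                        fluency_fn: Callable[[str], float],
                        similarity_fn: Callable[[str, str], float],
                        flu_threshold: float = 0.5,
                        sim_threshold: float = 0.85) -> List[str]:
    # Stage 1: paraphrase generation - union of all generator outputs, de-duplicated.
    candidates = {p for gen in generators for p in gen(sentence) if p != sentence}
    # Stage 2: fluency filtering - keep paraphrases whose fluency is not below the threshold.
    fluent = [p for p in candidates if fluency_fn(p) >= flu_threshold]
    # Stage 3: semantic filtering - keep paraphrases semantically close enough to the original.
    return [p for p in fluent if similarity_fn(sentence, p) >= sim_threshold]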
Further, the paraphrase generation module comprises a translation generation module, a model generation module and a synonym replacement generation module; the translation generation module generates paraphrases in two ways, direct translation and back-translation, the model generation module generates paraphrases with language models, and the synonym replacement generation module generates paraphrases by replacing words in the original sentence with synonyms.
Furthermore, the direct-translation mode generates paraphrases by translating the input sentence into classical Chinese, Cantonese and traditional Chinese, while the back-translation mode generates paraphrases by translating the input Chinese sentence into English and then translating the English sentence back into Chinese; the back-translation mode also uses a multilingual translation model to translate the input sentence into German and then translate the German sentence back into Chinese to generate paraphrases.
Furthermore, the model generation module generates paraphrases in two ways: directly training a Chinese paraphrase generation model, and indirectly generating Chinese paraphrases with an English paraphrase generation model. In the indirect mode, the Chinese sentence is translated into English, the English sentence is fed into the English paraphrase generation model to produce English paraphrases, and the English paraphrases are translated back into Chinese to obtain Chinese paraphrases.
Further, the synonym replacement generation module comprises a word segmentation sub-module, a named entity recognition sub-module, a replaceable word filtering sub-module, a synonym lookup sub-module, a synonym filtering sub-module and a synonym replacement sub-module.
Further, the replaceable word filtering sub-module of the synonym replacement generation module treats the set of entity types consisting of work titles (nw), other proper nouns (nz), punctuation marks (w), person names (PER), location names (LOC), organization names (ORG) and time expressions (TIME) as the set of non-replaceable entity types; entities whose type belongs to this set are not subjected to synonym replacement.
Further, the synonym filtering sub-module of the synonym replacement generation module introduces a masked language model (Masked Language Model) pre-trained on a large corpus to filter synonyms, in the following steps:
S1: replace the original word in the sentence with masks of the same length as the synonym to obtain a masked sentence;
S2: feed the masked sentence into the pre-trained masked language model, compute the probability that the output at each mask position generates the corresponding character of the synonym, and take the geometric mean of these per-character probabilities as the confidence;
S3: set a confidence threshold;
S4: filter out synonyms whose confidence is below the threshold, leaving the synonyms suited to the current context.
Furthermore, the fluency filtering module computes fluency with a pre-trained masked language model, sets a fluency threshold, and keeps fluent paraphrases by filtering. Fluency is computed by masking the characters of the sentence one at a time and computing the probability that the model generates the original character at the masked position; a perplexity is computed from these probabilities, and the value obtained by subtracting the perplexity from 1 is mapped into the interval [0, 1] with an exponential function:

flu(s) = exp(1 − PPL(s)),  PPL(s) = exp(−(1/N) Σ_{i=1}^{N} log p(s_i | s with its i-th character masked))

where flu is the fluency, PPL is the perplexity, N is the sentence length, i indexes the i-th position, p is the generation probability, s is the sentence and s_i is the i-th character of the sentence.
Furthermore, the semantic filtering module encodes sentences with a Sentence-BERT model to obtain sentence vectors and judges whether two sentences have the same meaning from the cosine similarity between their vectors; a cosine threshold is set, and two sentences are considered to have the same meaning if and only if the cosine similarity between their sentence vectors is greater than or equal to the threshold.
The semantic filtering module trains an independent Sentence-BERT model, unrelated to sentence generation, to judge semantics. A data set containing strong negative samples and weak negative samples is constructed to train the model, and the triplet objective function used to train the Sentence-BERT model is improved into the following form:

loss′ = max(‖s_a − s_p‖ − min(‖s_a − s_n‖, ‖s_p − s_n‖) + ε, 0)

where loss′ is the improved loss value, s_a, s_p and s_n are the sentence vectors of sentences a, p and n respectively; sentence p is a positive example of a, i.e. p has the same meaning as a; sentence n is a negative example of a, i.e. n differs from a in meaning; ‖·‖ denotes a distance measure and ε is the set margin value.
The invention has the beneficial effects that:
(1) The system makes full use of open-source pre-trained model resources: it uses translation models to generate paraphrases by direct translation and by back-translation, uses a multilingual translation model to improve the efficiency and diversity of paraphrase generation, fine-tunes a pre-trained language model to train a Chinese paraphrase generation model, and applies an English paraphrase generation model to Chinese paraphrase generation.
(2) During synonym replacement, a set of entity types is treated as the set of non-replaceable entity types, and entities whose type belongs to this set are not replaced with synonyms; this prevents the replacement of proper nouns and key information from changing the meaning of the sentence.
(3) A masked language model (Masked Language Model) pre-trained on a large corpus is introduced to filter synonyms, keeping only synonyms suited to the current context and preventing synonyms that do not fit the context from making the sentence disfluent or changing its meaning.
(4) Fluency is computed with a pre-trained masked language model, a fluency threshold is set, and only paraphrases whose fluency is not below the threshold are kept, filtering out disfluent sentences and retaining fluent paraphrases.
(5) An independent Sentence-BERT model, unrelated to sentence generation, is trained to judge semantics, so that the generation process does not influence the evaluation result. Multiple measures ensure that high-quality paraphrases are generated, including collecting the set of non-replaceable entity types, defining the synonym confidence and the fluency computation, and improving the Sentence-BERT loss function.
Drawings
FIG. 1 is a block diagram of the system according to the invention;
FIG. 2 is a structural diagram of the paraphrase generation module;
FIG. 3 is a structural diagram of the translation generation module;
FIG. 4 is a block diagram of the direct-translation generation module;
FIG. 5 is a block diagram of the back-translation generation module;
FIG. 6 is a block diagram of the model generation module;
FIG. 7 is a structural diagram of the synonym replacement generation module;
FIG. 8 is a structural diagram of the synonym filtering module.
Detailed Description
The sentence paraphrase generation system based on pre-trained models provided by the invention is described in detail below with reference to the accompanying drawings.
The sentence paraphrase generation system based on pre-trained models of the invention comprises a paraphrase generation module, a fluency filtering module and a semantic filtering module connected in sequence. The paraphrase generation module generates candidate paraphrases; the fluency filtering module computes the fluency of each paraphrase and keeps only paraphrases whose fluency is not below a threshold; the semantic filtering module computes the semantic similarity between each paraphrase and the original sentence and keeps only paraphrases whose similarity is not below a threshold.
The paraphrase generation module comprises a translation generation module, a model generation module and a synonym replacement generation module; the translation generation module generates paraphrases by direct translation and by back-translation, the model generation module generates paraphrases with language models, and the synonym replacement generation module generates paraphrases by replacing words in the original sentence with synonyms.
The direct-translation mode generates paraphrases by translating the input sentence into classical Chinese, Cantonese and traditional Chinese, while the back-translation mode generates paraphrases by translating the input Chinese sentence into English and then translating the English sentence back into Chinese; the back-translation mode also uses a multilingual translation model to translate the input sentence into German and then translate the German sentence back into Chinese to generate paraphrases.
The model generation module generates paraphrases in two ways: directly training a Chinese paraphrase generation model, and indirectly generating Chinese paraphrases with an English paraphrase generation model. In the indirect mode, the Chinese sentence is translated into English, the English sentence is fed into the English paraphrase generation model to produce English paraphrases, and the English paraphrases are translated back into Chinese to obtain Chinese paraphrases.
The synonym replacement generation module comprises a word segmentation sub-module, a named entity recognition sub-module, a replaceable word filtering sub-module, a synonym lookup sub-module, a synonym filtering sub-module and a synonym replacement sub-module.
The replaceable word filtering sub-module of the synonym replacement generation module treats the set of entity types consisting of work titles (nw), other proper nouns (nz), punctuation marks (w), person names (PER), location names (LOC), organization names (ORG) and time expressions (TIME) as the set of non-replaceable entity types; entities whose type belongs to this set are not subjected to synonym replacement.
The synonym filtering sub-module of the synonym replacement generation module introduces a masked language model (Masked Language Model) pre-trained on a large corpus to filter synonyms, in the following steps:
S1: replace the original word in the sentence with masks of the same length as the synonym to obtain a masked sentence;
S2: feed the masked sentence into the pre-trained masked language model, compute the probability that the output at each mask position generates the corresponding character of the synonym, and take the geometric mean of these per-character probabilities as the confidence;
S3: set a confidence threshold;
S4: filter out synonyms whose confidence is below the threshold, leaving the synonyms suited to the current context.
The fluency filtering module computes fluency with the pre-trained masked language model, sets a fluency threshold, and keeps fluent paraphrases by filtering. Fluency is computed by masking the characters of the sentence one at a time and computing the probability that the model generates the original character at the masked position; a perplexity is computed from these probabilities, and the value obtained by subtracting the perplexity from 1 is mapped into the interval [0, 1] with an exponential function:

flu(s) = exp(1 − PPL(s)),  PPL(s) = exp(−(1/N) Σ_{i=1}^{N} log p(s_i | s with its i-th character masked))

where flu is the fluency, PPL is the perplexity, N is the sentence length, i indexes the i-th position, p is the generation probability, s is the sentence and s_i is the i-th character of the sentence.
The semantic filtering module encodes sentences with a Sentence-BERT model to obtain sentence vectors and judges whether two sentences have the same meaning from the cosine similarity between their vectors; a cosine threshold is set, and two sentences are considered to have the same meaning if and only if the cosine similarity between their sentence vectors is greater than or equal to the threshold.
The semantic filtering module trains an independent Sentence-BERT model, unrelated to sentence generation, to judge semantics. A data set containing strong negative samples and weak negative samples is constructed to train the model, and the triplet objective function used to train the Sentence-BERT model is improved into the following form:

loss′ = max(‖s_a − s_p‖ − min(‖s_a − s_n‖, ‖s_p − s_n‖) + ε, 0)

where loss′ is the improved loss value, s_a, s_p and s_n are the sentence vectors of sentences a, p and n respectively; sentence p is a positive example of a, i.e. p has the same meaning as a; sentence n is a negative example of a, i.e. n differs from a in meaning; ‖·‖ denotes a distance measure and ε is the set margin value.
Example 1
The pre-trained model names referred to in this Embodiment 1 are the names of models in the Hugging Face model hub. A model can be loaded by name through the Hugging Face transformers library. For example, the model Vamsi/T5_Paraphrase_Paws can be loaded as follows:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("Vamsi/T5_Paraphrase_Paws")
model = AutoModelForSeq2SeqLM.from_pretrained("Vamsi/T5_Paraphrase_Paws")
When a sentence is fed into a model or a sentence is generated by a model, expressions such as "feed the masked sentence into the pre-trained masked language model", "compute the probability that the model output at the masked position generates the original character" and "feed the English sentence into the Vamsi/T5_Paraphrase_Paws model to generate English paraphrases" are used for brevity and to avoid repetition. The steps of adding the separators ([CLS], [SEP]), mapping characters to integer indices, padding to a fixed length, building the token type id list (token_type_ids) and attention mask list (attention_mask) required as model input, and mapping integer indices back to characters are omitted. These omitted steps are obvious to practitioners in the field and need not be described in further detail.
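The omitted preprocessing can be illustrated with a short, hedged sketch: a single Hugging Face tokenizer call performs all of the steps listed above. The Hub id hfl/chinese-roberta-wwm-ext is assumed here for the chinese-roberta-wwm-ext model used later in this embodiment.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")  # assumed Hub id
# One call adds [CLS]/[SEP], maps characters to integer indices, pads to a fixed
# length, and builds the token_type_ids and attention_mask lists needed as model input.
batch = tokenizer(["图中有多少人"], padding="max_length", max_length=16,
                  truncation=True, return_tensors="pt")
print(batch["input_ids"], batch["token_type_ids"], batch["attention_mask"])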
The sentence paraphrase generation system based on pre-trained models takes simplified-Chinese Mandarin sentences as input and outputs sentences with the same meaning but different wording. The implementation steps are:
1. build a module that calls the Baidu translation API to translate Mandarin Chinese into classical Chinese, Cantonese and traditional Chinese;
2. implement Chinese-to-English translation based on the Helsinki-NLP/opus-mt-zh-en model and English-to-Chinese translation based on the Helsinki-NLP/opus-mt-en-zh model;
3. implement Chinese-to-German translation based on the Helsinki-NLP/opus-mt-zh-de model, and German-to-Chinese translation into Mandarin, Cantonese and local dialects based on the Helsinki-NLP/opus-mt-de-ZH model;
4. collect sentence pairs with the same meaning by crawling and manual annotation as a Chinese paraphrase generation data set, and train a Chinese paraphrase generation model by fine-tuning the thu-coai/CDial-GPT_LCCC-large model;
5. implement English paraphrase generation based on the Vamsi/T5_Paraphrase_Paws model; combined with the translation functions of step 2, translate the Chinese sentence into English, feed the English sentence into the Vamsi/T5_Paraphrase_Paws model to generate English paraphrases, and translate the English paraphrases back into Chinese;
6. use the Baidu LAC (Lexical Analysis of Chinese) tool to segment sentences and recognize named entities, and mark entities whose type is work title (nw), other proper noun (nz), punctuation (w), person name (PER), location name (LOC), organization name (ORG) or time (TIME) as non-replaceable; mark all other words as replaceable;
7. implement synonym lookup based on the Extended Synonym Forest of the Harbin Institute of Technology Information Retrieval Laboratory, and look up synonyms of the words marked as replaceable;
8. implement the computation of the probability of generating a word at a masked position based on the chinese-roberta-wwm-ext model: feed in the masked sentence, obtain the probability of generating each character of the synonym at the mask positions, and compute the geometric mean as the confidence; set the confidence threshold to 0.0015 and filter out synonyms whose confidence is below the threshold;
9. implement replacing the original words in the sentence with their synonyms;
10. implement fluency computation based on the chinese-roberta-wwm-ext model, set a fluency threshold, and filter out paraphrases whose fluency is below the threshold;
11. using the paraphrase data collected in step 4, construct a strong negative sample and a weak negative sample for each sentence to build a new data set, and train a Sentence-BERT model on the new data set with the improved triplet objective function;
12. encode sentences with the model trained in step 11 and compute the cosine similarity between the vectors of the original sentence and each paraphrase; set the cosine threshold to 0.85 and filter out paraphrases whose cosine similarity with the original sentence is below the threshold.
As shown in FIG. 1, the sentence paraphrase generation system based on pre-trained models of the invention comprises a paraphrase generation module, a fluency filtering module and a semantic filtering module. The paraphrase generation module generates candidate paraphrases; the fluency filtering module computes the fluency of each paraphrase and keeps only paraphrases whose fluency is not below a threshold; the semantic filtering module computes the semantic similarity between each paraphrase and the original sentence and keeps only paraphrases whose similarity is not below a threshold.
The structure of the paraphrase generation module is shown in FIG. 2; it comprises a translation generation module, a model generation module and a synonym replacement generation module.
The translation generation module is shown in FIG. 3 and comprises a direct-translation generation module and a back-translation generation module. In this embodiment the input is simplified-Chinese Mandarin. Direct-translation generation produces a paraphrase with a single translation step, while back-translation generation first translates the Chinese sentence into another language and then translates it back into Chinese.
The direct-translation generation module is shown in FIG. 4; its target languages are classical Chinese, Cantonese and traditional Chinese. The input is translated into classical Chinese with the open-source model raynardj/wenyanwen-chinese-translate-to-ancient, and into Cantonese and traditional Chinese with the Baidu general translation API.
The back-translation generation module is shown in FIG. 5. It uses the Helsinki-NLP/opus-mt-zh-en model to translate Chinese into English and the Helsinki-NLP/opus-mt-en-zh model to translate the English back into Chinese to generate a paraphrase. A multilingual translation model can translate the input into several target languages, so using one for paraphrase generation yields richer paraphrases and improves efficiency. This embodiment uses the Helsinki-NLP/opus-mt-de-ZH model, a multilingual model that translates German into Mandarin, Cantonese and local dialects. To use it, the Helsinki-NLP/opus-mt-zh-de model first translates the Chinese into German, and the Helsinki-NLP/opus-mt-de-ZH model then translates the German into Mandarin, Cantonese and other Chinese varieties to generate paraphrases.
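The German-pivot back-translation described above can be sketched with the transformers library. This is an illustrative sketch rather than the patent's code; in particular, the target-language prefix token accepted by the multilingual Helsinki-NLP/opus-mt-de-ZH model is an assumption and should be checked against the model card.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def load(name):
    return AutoTokenizer.from_pretrained(name), AutoModelForSeq2SeqLM.from_pretrained(name)

zh_de_tok, zh_de_model = load("Helsinki-NLP/opus-mt-zh-de")   # Chinese -> German
de_zh_tok, de_zh_model = load("Helsinki-NLP/opus-mt-de-ZH")   # German -> Chinese varieties

def translate(text, tok, model):
    batch = tok([text], return_tensors="pt", padding=True)
    out = model.generate(**batch, num_beams=4, max_length=128)
    return tok.batch_decode(out, skip_special_tokens=True)[0]

def back_translate_via_german(sentence, target_token=">>cmn_Hans<<"):
    german = translate(sentence, zh_de_tok, zh_de_model)
    # The ">>lang<<" prefix selects the target Chinese variety of the multilingual
    # de->ZH model; the exact token names are an assumption, not taken from the patent.
    return translate(f"{target_token} {german}", de_zh_tok, de_zh_model)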
As shown in FIG. 6, the model generation module includes two methods: directly training a Chinese paraphrase generation model, and indirectly generating Chinese paraphrases with an English paraphrase generation model. The GPT module in FIG. 6 is the Chinese paraphrase generation model obtained by fine-tuning the thu-coai/CDial-GPT_LCCC-large model. thu-coai/CDial-GPT_LCCC-large is an open-source Chinese language model that, after training on the LCCC-large data set, can generate high-quality dialogue responses. Sentence pairs with the same meaning but different wording are collected by web crawling and manual annotation as a Chinese paraphrase data set, and the thu-coai/CDial-GPT_LCCC-large model is fine-tuned on this data set to obtain the Chinese paraphrase generation model.
At present English paraphrase generation has been studied more deeply than Chinese paraphrase generation, and several open-source models and frameworks can generate high-quality English paraphrases. Although these resources were built for English, they are still valuable for the Chinese task. To exploit them, the most natural idea is to translate the Chinese sentence into English, feed the English sentence into an English paraphrase generation model to obtain English paraphrases, and translate the English paraphrases back into Chinese. This embodiment selects the Vamsi/T5_Paraphrase_Paws model: the translation method of the translation generation module translates the Chinese into English, the English sentence is fed into the Vamsi/T5_Paraphrase_Paws model to generate English paraphrases, and the same translation method translates the English paraphrases into Chinese to obtain Chinese paraphrases.
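The Chinese-English-Chinese pipeline just described can be sketched as follows. The "paraphrase: ... </s>" prompt format for Vamsi/T5_Paraphrase_Paws follows the usage commonly shown for that model and is an assumption here; models are reloaded on every call only for brevity.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

def seq2seq(name, text, num_return_sequences=1):
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    batch = tok([text], return_tensors="pt")
    out = model.generate(**batch, num_beams=5, max_length=64,
                         num_return_sequences=num_return_sequences)
    return tok.batch_decode(out, skip_special_tokens=True)

def chinese_paraphrase_via_english(sentence, n=3):
    english = seq2seq("Helsinki-NLP/opus-mt-zh-en", sentence)[0]            # Chinese -> English
    en_paras = seq2seq("Vamsi/T5_Paraphrase_Paws",
                       "paraphrase: " + english + " </s>",
                       num_return_sequences=n)                              # English paraphrases
    return [seq2seq("Helsinki-NLP/opus-mt-en-zh", p)[0] for p in en_paras]  # English -> Chinese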
The synonym replacement generation module is shown in FIG. 7 and comprises a word segmentation module, a named entity recognition module, a replaceable word filtering module, a synonym lookup module, a synonym filtering module and a synonym replacement module. The idea of generating paraphrases by synonym replacement is to replace words in the sentence with synonyms, so that the wording changes while the meaning stays the same. However, the meaning of a word depends on the context of the sentence in which it occurs, and in some cases synonym replacement changes that meaning. For example, in a sentence asking when "联想" (Lenovo) was listed, the word "联想" taken as a common noun ("association of ideas") has synonyms such as "感想" (reflections), "想象" (imagination), "设想" (conception) and "遐想" (reverie); but in that sentence "联想" is an organization name, and replacing it with any of these synonyms changes the meaning. To avoid this, a set of non-replaceable entity types is compiled, and entity words whose type belongs to this set are not replaced with synonyms. The sentence is segmented and named entities are recognized with the Baidu LAC (Lexical Analysis of Chinese) tool. The set consisting of the entity types work title (nw), other proper noun (nz), punctuation (w), person name (PER), location name (LOC), organization name (ORG) and time (TIME) is taken as the set of non-replaceable entity types, and entities whose type belongs to this set are not subjected to synonym replacement.
For the replaceable words, synonyms are looked up in the Extended Synonym Forest of the Harbin Institute of Technology Information Retrieval Laboratory. This resource extends the laboratory's original Tongyici Cilin and contains 77,343 words. In the text of the extended word forest, words on the same line either share the same sense or are strongly related in sense, so synonyms can be looked up from it: given a word, the lines containing that word are queried, and the other words on those lines are taken as its synonyms. For example, looking up "黄豆" (soybean) yields synonyms such as "毛豆" (green soybean) and "大豆" (soya bean).
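A small sketch of the synonym lookup is given below. The file format assumed here (one line per entry: a category code followed by space-separated words, with codes ending in "=" marking true synonym lines) follows the commonly distributed text version of the Extended Synonym Forest and is an assumption, not a detail taken from the patent.

from collections import defaultdict

def load_cilin(path="cilin_ex.txt"):
    word2rows, rows = defaultdict(set), []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue
            code, words = parts[0], parts[1:]
            if not code.endswith("="):   # "#"/"@" lines list related words, not synonyms
                continue
            rows.append(words)
            for w in words:
                word2rows[w].add(len(rows) - 1)
    return word2rows, rows

def synonyms(word, word2rows, rows):
    return {w for rid in word2rows.get(word, ()) for w in rows[rid] if w != word}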
However, for a polysemous word the intended sense depends on its context, so some synonyms looked up in the word forest may not fit the current context. For example, "几" ("several") and "或多或少" ("more or less") can both be listed as synonyms of "多少" ("how many / how much"); yet in the sentence "how many people are in the picture" the original word cannot be replaced by "或多或少", and in the sentence "he is more or less angry" it cannot be replaced by "几". The queried synonyms therefore need to be filtered so that only synonyms suited to the current context remain. A sentence remains fluent after the original word is replaced by a synonym that fits the context, whereas a synonym that does not fit makes it disfluent; from a statistical point of view, a synonym suited to the current context should have a higher probability of occurring in that context than one that is not. Based on this idea, a masked language model (Masked Language Model) pre-trained on a large corpus is introduced to filter synonyms. Specifically, the original word in the sentence is replaced by masks of the same length as the synonym, yielding a masked sentence. The masked sentence is fed into the pre-trained masked language model, the probability that the output at each mask position generates the corresponding character of the synonym is computed, and the geometric mean of these per-character probabilities is taken as the confidence:

conf(w) = (Π_{i=1}^{n} p(w_i | masked sentence))^{1/n}

where conf is the confidence, n is the mask length, p is the generation probability, i indexes the i-th mask position, w is the synonym and w_i is the i-th character of the synonym.
Once the synonym confidence is defined, a confidence threshold is set and synonyms whose confidence falls below it are filtered out, leaving the synonyms suited to the current context. Take the sentence "how many people are in the picture" as an example, with "多少" ("how many") as the original word. Replacing the original word with masks of the same length as each candidate synonym gives the masked sentences: for the one-character synonym "几", the masked sentence contains a single [MASK]; for the four-character synonym "或多或少" ("more or less"), it contains four [MASK] tokens. Each masked sentence is fed into the pre-trained masked language model (chinese-roberta-wwm-ext in this embodiment) and the probability of generating each character of the synonym at the corresponding mask position is computed, as shown in FIG. 8. The confidence of "几" is the probability of generating "几" at its single mask position, and the confidence of "或多或少" is the geometric mean of the probabilities of its four characters at the four mask positions. With the threshold conf = 0.0015, the confidence of "几" lies above the threshold and that of "或多或少" below it, so "几" is kept and "或多或少" is filtered out.
After the synonyms suited to the context are obtained, the synonym replacement sub-module replaces the original words with their synonyms to generate paraphrases. For example, for the sentence "how many people are in the picture", synonyms are found for "picture" and for "how many", and the sub-module generates six paraphrases by substituting these synonyms individually and in combination.
The paraphrase generation module thus generates paraphrases by translation generation, model generation and synonym replacement. After de-duplication the paraphrases are fed into the fluency filtering module, which filters out disfluent candidates and keeps fluent paraphrases. As in the synonym filtering module, fluency is computed with a pre-trained masked language model: the characters of the sentence are masked one at a time and the probability of generating the original character at the masked position is computed; a perplexity is computed from these probabilities, and the value obtained by subtracting the perplexity from 1 is mapped into the interval [0, 1] with an exponential function:

flu(s) = exp(1 − PPL(s)),  PPL(s) = exp(−(1/N) Σ_{i=1}^{N} log p(s_i | s with its i-th character masked))

where flu is the fluency, PPL is the perplexity, N is the sentence length, i indexes the i-th position, p is the generation probability, s is the sentence and s_i is the i-th character of the sentence.
In this embodiment the auto-encoding model chinese-roberta-wwm-ext is selected to build the fluency filtering module. Take the sentence "how many people are in the picture" as an example: the first character is replaced by [MASK], the masked sentence is fed into the model, and the probability of generating the original first character at the [MASK] position is read from the model output; then the second character is masked and the probability of regenerating it is read; and so on for every position in the sentence. The fluency is then computed from these probabilities with the formula above. The same computation is carried out for a disfluent scrambled variant of the sentence. With the fluency threshold set, the fluency of "how many people are in the picture" lies above the threshold and the sentence passes the filter, while the fluency of the scrambled variant lies below the threshold and it is filtered out.
After fluent paraphrases are obtained by filtering, it must also be ensured that each paraphrase has the same meaning as the original sentence. One approach generates the paraphrase with a generation model, obtains sentence vectors of the original sentence and the paraphrase from the same generation model, and judges whether their meanings match from the cosine similarity between the vectors. This approach does not separate the generation task from the evaluation task: the generation model produced the paraphrase from the original sentence, which already implies that the model considers their meanings the same, so using sentence vectors from the same model to judge the semantics biases the judgement towards "same meaning". To prevent the generation process from influencing the evaluation result, the semantic filtering module should train an independent model, unrelated to sentence generation, to judge semantics.
Using the Sentence-BERT architecture, a data set is constructed to train this model, starting from the data set built for training the Chinese paraphrase model. Each sentence pair in that data set is a positive sample. For each sentence, one sentence randomly sampled from the data set is paired with it as a negative sample; since most randomly paired sentences differ in both meaning and wording, such a pair is called a strong negative sample. To train the model better, for each sentence a pair that is similar in wording but different in meaning is also selected from the data set as a negative sample, called a weak negative sample. The selection works as follows: extract the tf-idf vector of every sentence, recall the 10 most similar sentences by the cosine similarity of the tf-idf vectors, and manually pick from these 10 a sentence whose meaning differs from the current sentence; if none of the 10 recalled sentences differs in meaning, continue recalling the next 10 sentences by cosine value and screen again until a sentence with a different meaning is found.
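The tf-idf recall step used to mine weak-negative candidates can be sketched as below; the character n-gram featurization is an assumption made so that no Chinese word segmenter is needed, and a human annotator still makes the final choice among the recalled candidates.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recall_weak_negative_candidates(sentences, top_k=10):
    vec = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))
    tfidf = vec.fit_transform(sentences)
    sims = cosine_similarity(tfidf)
    np.fill_diagonal(sims, -1.0)           # exclude each sentence itself
    return {i: [sentences[j] for j in np.argsort(-sims[i])[:top_k]]
            for i in range(len(sentences))}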
After the new data set is constructed, training of the model begins. The Sentence-BERT model can be trained with a classification objective function (Classification Objective Function), a regression objective function (Regression Objective Function) or a triplet objective function (Triplet Objective Function). The conventional triplet objective minimizes the loss

loss = max(‖s_a − s_p‖ − ‖s_a − s_n‖ + ε, 0)

where loss is the loss value, s_a, s_p and s_n are the sentence vectors of sentences a, p and n respectively; sentence p is a positive example of a, i.e. p has the same meaning as a; sentence n is a negative example of a, i.e. n differs from a in meaning; ‖·‖ denotes a distance measure and ε is the set margin value.
The margin ε set in this loss function only requires that s_p be at least ε closer to s_a than s_n is. The loss can therefore pull s_a away from s_n, but it is not sufficient to pull s_p away from s_n. For example, when the distance between s_a and s_n is 2ε and s_p lies exactly midway between s_a and s_n, the loss is 0 and the model is no longer optimized during training. If sentence p is later given as input and sentences with the same meaning as p are to be retrieved, sentences a and n are then equally close to p in the sentence vector space, so it cannot be judged from the vector distances which of a and n has the same meaning as p. But p is a positive example of a, i.e. p and a have the same meaning; they can be regarded as equivalent, and interchanging p and a should not, and must not, change the value of the loss function. To embody this idea, the triplet objective function is modified into the form

loss′ = max(‖s_a − s_p‖ − min(‖s_a − s_n‖, ‖s_p − s_n‖) + ε, 0)

where loss′ is the improved loss value and the remaining symbols have the same meaning as in the conventional triplet objective. Now, when the distance between s_a and s_n is 2ε and s_p lies midway between s_a and s_n, the loss equals ε rather than 0, so the model continues to be optimized during training. The improved expression is symmetric under interchanging sentences a and p, and it requires ‖s_a − s_p‖ to be at least ε smaller than both ‖s_a − s_n‖ and ‖s_p − s_n‖. As a result, the improved triplet objective draws sentences with the same meaning closer together in the sentence vector space and pushes sentences with different meanings further apart.
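Under the reconstruction given above, the improved objective can be written in a few lines of PyTorch. This is a sketch consistent with the described properties (symmetry in a and p, and a margin of ε against both negative distances), not a verbatim copy of the patent's formula, which appears only as an image in the original text.

import torch

def improved_triplet_loss(s_a, s_p, s_n, epsilon=1.0):
    # Require ||s_a - s_p|| to be at least epsilon smaller than BOTH ||s_a - s_n|| and ||s_p - s_n||.
    d_ap = torch.norm(s_a - s_p, dim=-1)
    d_an = torch.norm(s_a - s_n, dim=-1)
    d_pn = torch.norm(s_p - s_n, dim=-1)
    return torch.clamp(d_ap - torch.minimum(d_an, d_pn) + epsilon, min=0).mean()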
A Sentence-BERT model is obtained by training with the classification objective function, the regression objective function and the improved triplet objective function. A cosine threshold is then set, and two sentences are considered to have the same meaning when the cosine similarity between their sentence vectors is greater than or equal to the threshold. Take the original sentence "how many people are in the picture" and the candidate paraphrases "there are several people in the picture" and "how many trees are in the picture" as an example: the cosine similarity between the original sentence and the first candidate is 0.86368716, and that with the second candidate is 0.14648251. With the cosine threshold set to 0.85, the first candidate is kept as a paraphrase of the original sentence and "how many trees are in the picture" is filtered out.
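The final cosine filter can be sketched as follows, assuming a sentence-transformers-style encoder whose encode method returns a 1-D sentence vector; the 0.85 threshold is the value used in this embodiment.

import torch
import torch.nn.functional as F

def is_same_meaning(encoder, sent_a, sent_b, threshold=0.85):
    v_a = torch.as_tensor(encoder.encode(sent_a), dtype=torch.float)
    v_b = torch.as_tensor(encoder.encode(sent_b), dtype=torch.float)
    return F.cosine_similarity(v_a, v_b, dim=0).item() >= threshold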
The modules above are assembled to build the paraphrase generation system of this embodiment; examples of paraphrases generated by the system are shown in Table 1.

Table 1 Generation examples of this embodiment
The above description covers only the preferred embodiments of the invention and is not intended to limit the invention; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (8)

1. A sentence paraphrase generation system based on pre-trained models, characterized in that: the system comprises a paraphrase generation module, a fluency filtering module and a semantic filtering module connected in sequence; the paraphrase generation module is used to generate paraphrases; the fluency filtering module is used to compute the fluency of each paraphrase and keep, by filtering, the paraphrases whose fluency is not below a threshold; the semantic filtering module is used to compute the semantic similarity between each paraphrase and the original sentence and keep, by filtering, the paraphrases whose semantic similarity is not below a threshold;
the fluency filtering module computes fluency with a pre-trained masked language model, sets a fluency threshold and keeps fluent paraphrases by filtering; fluency is computed by masking the characters of the sentence one at a time and computing the probability that the model generates the original character at the masked position; a perplexity is computed from these probabilities, and the value obtained by subtracting the perplexity from 1 is mapped into the interval [0, 1] with an exponential function:

flu(s) = exp(1 − PPL(s)),  PPL(s) = exp(−(1/N) Σ_{i=1}^{N} log p(s_i | s with its i-th character masked))

where flu is the fluency, PPL is the perplexity, N is the sentence length, i indexes the i-th position, p is the generation probability, s is the sentence and s_i is the i-th character of the sentence;
the semantic filtering module trains an independent Sentence-BERT model, unrelated to sentence generation, to judge semantics; a data set containing strong negative samples and weak negative samples is constructed to train the model, and the triplet objective function used in training the Sentence-BERT model is improved into the following form:

loss′ = max(‖s_a − s_p‖ − min(‖s_a − s_n‖, ‖s_p − s_n‖) + ε, 0)

where loss′ is the improved loss value, s_a, s_p and s_n are the sentence vectors of sentences a, p and n respectively; sentence p is a positive example of a, i.e. p has the same meaning as a; sentence n is a negative example of a, i.e. n differs from a in meaning; ‖·‖ denotes a distance measure and ε is the set margin value.
2. The sentence paraphrase generation system based on pre-trained models according to claim 1, characterized in that: the paraphrase generation module comprises a translation generation module, a model generation module and a synonym replacement generation module; the translation generation module generates paraphrases by direct translation and by back-translation, the model generation module generates paraphrases with language models, and the synonym replacement generation module generates paraphrases by replacing words in the original sentence with synonyms.
3. The sentence paraphrase generation system based on pre-trained models according to claim 2, characterized in that: the direct-translation mode generates paraphrases by translating the input sentence into Chinese, and the back-translation mode uses a multilingual translation model to translate the input Chinese sentence into a foreign-language sentence and then translates the foreign-language sentence back into Chinese to generate paraphrases.
4. The sentence paraphrase generation system based on pre-trained models according to claim 2, characterized in that: the model generation module generates paraphrases in two ways, directly training a Chinese paraphrase generation model and indirectly generating Chinese paraphrases with an English paraphrase generation model; in the indirect mode, the Chinese sentence is translated into English, the English sentence is fed into the English paraphrase generation model to produce English paraphrases, and the English paraphrases are translated back into Chinese to obtain Chinese paraphrases.
5. The sentence paraphrase generation system based on pre-trained models according to claim 2, characterized in that: the synonym replacement generation module comprises a word segmentation sub-module, a named entity recognition sub-module, a replaceable word filtering sub-module, a synonym lookup sub-module, a synonym filtering sub-module and a synonym replacement sub-module.
6. The sentence paraphrase generation system based on pre-trained models according to claim 5, characterized in that: the replaceable word filtering sub-module treats the set consisting of work titles, other proper nouns, punctuation marks, person names, location names, organization names and time expressions as the set of non-replaceable entity types, and entities whose type belongs to this set are not replaced with synonyms.
7. The sentence paraphrase generation system based on pre-trained models according to claim 5, characterized in that: the synonym filtering sub-module introduces a masked language model pre-trained on a large corpus to filter synonyms, in the following steps:
S1: replace the original word in the sentence with masks of the same length as the synonym to obtain a masked sentence;
S2: feed the masked sentence into the pre-trained masked language model, compute the probability that the output at each mask position generates the corresponding character of the synonym, and take the geometric mean of these per-character probabilities as the confidence;
S3: set a confidence threshold;
S4: filter out synonyms whose confidence is below the threshold, leaving the synonyms suited to the current context.
8. The sentence paraphrase generation system based on pre-trained models according to claim 1, characterized in that: the semantic filtering module encodes sentences with the Sentence-BERT model to obtain sentence vectors and judges whether two sentences have the same meaning from the cosine similarity between their sentence vectors; a cosine threshold is set, and the meanings are considered the same if and only if the cosine similarity between the sentence vectors is greater than or equal to the threshold.
CN202211245822.7A 2022-10-12 2022-10-12 Sentence paraphrase generation system based on a pre-trained model Active CN115329784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211245822.7A CN115329784B (en) Sentence paraphrase generation system based on a pre-trained model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211245822.7A CN115329784B (en) Sentence paraphrase generation system based on a pre-trained model

Publications (2)

Publication Number Publication Date
CN115329784A CN115329784A (en) 2022-11-11
CN115329784B true CN115329784B (en) 2023-04-07

Family

ID=83915134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211245822.7A Active CN115329784B (en) Sentence paraphrase generation system based on a pre-trained model

Country Status (1)

Country Link
CN (1) CN115329784B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102455786A (en) * 2010-10-25 2012-05-16 三星电子(中国)研发中心 System and method for optimizing Chinese sentence input method
CN109558597A (en) * 2018-12-17 2019-04-02 北京百度网讯科技有限公司 Text interpretation method and device, equipment and storage medium
CN113590786A (en) * 2021-07-28 2021-11-02 平安科技(深圳)有限公司 Data prediction method, device, equipment and storage medium
CN114417825A (en) * 2022-01-19 2022-04-29 上海一者信息科技有限公司 English synonym recommendation method fusing dictionary and context information

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070043553A1 (en) * 2005-08-16 2007-02-22 Microsoft Corporation Machine translation models incorporating filtered training data
CN102650987A (en) * 2011-02-25 2012-08-29 北京百度网讯科技有限公司 Machine translation method and device both based on source language repeat resource
CN110555203B (en) * 2018-05-31 2023-05-30 北京百度网讯科技有限公司 Text replication method, device, server and storage medium
CN110543639B (en) * 2019-09-12 2023-06-02 扬州大学 English sentence simplification algorithm based on pre-training transducer language model
CN111027331B (en) * 2019-12-05 2022-04-05 百度在线网络技术(北京)有限公司 Method and apparatus for evaluating translation quality
CN111814451A (en) * 2020-05-21 2020-10-23 北京嘀嘀无限科技发展有限公司 Text processing method, device, equipment and storage medium
CN113553824A (en) * 2021-07-07 2021-10-26 临沂中科好孕智能技术有限公司 Sentence vector model training method
CN113971394A (en) * 2021-10-26 2022-01-25 上海交通大学 Text repeat rewriting system
CN114416984A (en) * 2022-01-12 2022-04-29 平安科技(深圳)有限公司 Text classification method, device and equipment based on artificial intelligence and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102455786A (en) * 2010-10-25 2012-05-16 三星电子(中国)研发中心 System and method for optimizing Chinese sentence input method
CN109558597A (en) * 2018-12-17 2019-04-02 北京百度网讯科技有限公司 Text interpretation method and device, equipment and storage medium
CN113590786A (en) * 2021-07-28 2021-11-02 平安科技(深圳)有限公司 Data prediction method, device, equipment and storage medium
CN114417825A (en) * 2022-01-19 2022-04-29 上海一者信息科技有限公司 English synonym recommendation method fusing dictionary and context information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Semi-supervised Neural Machine Translation with Data Selection Based on Sentence-level BLEU; Ye Shaolin et al.; Pattern Recognition and Artificial Intelligence; 2017-10-15 (No. 10); full text *

Also Published As

Publication number Publication date
CN115329784A (en) 2022-11-11

Similar Documents

Publication Publication Date Title
CN109359294B (en) Ancient Chinese translation method based on neural machine translation
WO2022057116A1 (en) Transformer deep learning model-based method for translating multilingual place name root into chinese
Cotterell et al. Labeled morphological segmentation with semi-markov models
CN110209818B (en) Semantic sensitive word and sentence oriented analysis method
Bertaglia et al. Exploring word embeddings for unsupervised textual user-generated content normalization
Patel et al. ES2ISL: an advancement in speech to sign language translation using 3D avatar animator
CN111339772B (en) Russian text emotion analysis method, electronic device and storage medium
Sun et al. VCWE: visual character-enhanced word embeddings
Tawfik et al. Morphology-aware word-segmentation in dialectal Arabic adaptation of neural machine translation
Liu Research on the development of computer intelligent proofreading system based on the perspective of English translation application
CN115858750A (en) Power grid technical standard intelligent question-answering method and system based on natural language processing
CN108491399A (en) Chinese to English machine translation method based on context iterative analysis
CN113408307B (en) Neural machine translation method based on translation template
Zhu Deep learning for Chinese language sentiment extraction and analysis
CN115329784B (en) Sentence paraphrase generation system based on a pre-trained model
CN109960782A (en) A kind of Tibetan language segmenting method and device based on deep neural network
Li et al. New word discovery algorithm based on n-gram for multi-word internal solidification degree and frequency
Jafar Tafreshi et al. A novel approach to conditional random field-based named entity recognition using Persian specific features
CN113886521A (en) Text relation automatic labeling method based on similar vocabulary
CN112085985A (en) Automatic student answer scoring method for English examination translation questions
CN110569510A (en) method for identifying named entity of user request data
Satpathy et al. Analysis of Learning Approaches for Machine Translation Systems
Romero et al. Towards text simplification in spanish: A brief overview of deep learning approaches for text simplification
Nathani et al. Part of Speech Tagging for a Resource Poor Language: Sindhi in Devanagari Script using HMM and CRF
Seresangtakul et al. Thai-Isarn dialect parallel corpus construction for machine translation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant