CN118132672A - Model pre-training method and device, storage medium and electronic equipment
- Publication number
- CN118132672A (application number CN202410131223.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- path
- training
- under
- training data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Databases & Information Systems (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention provides a model pre-training method and device, a storage medium and electronic equipment. The method includes: compressing the initial data sources under each path of data in each training data to obtain compressed data sources under each path of data in each training data; masking the compressed data sources under each path of data in each training data, and calling the initial language model corresponding to each path of data to encode the mask data sources under the corresponding path of data in each training data, so as to obtain encoded data sources under each path of data in each training data; calculating a model loss value under each path of data based on the encoded data sources under each path of data in each training data and the compressed data sources under each path of data in each training data, and optimizing the model parameters in the initial language model corresponding to the corresponding path of data according to the model loss value under each path of data, so as to obtain an intermediate language model corresponding to each path of data and thereby determine a target language model corresponding to each path of data. The embodiment of the invention can conveniently perform model pre-training so as to improve model pre-training efficiency.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a model pre-training method and apparatus, a storage medium, and an electronic device.
Background
Currently, pre-trained language models have received wide attention in natural language processing. A pre-trained language model refers to a model that is pre-trained on a large amount of unsupervised data to obtain better semantic representations before training on a downstream task. However, the related art suffers from low model pre-training efficiency when performing model pre-training (i.e., modeling), especially long-text modeling. Based on this, there is currently no good solution for how to conveniently perform model pre-training so as to improve model pre-training efficiency.
Disclosure of Invention
In view of the above, the embodiment of the invention provides a model pre-training method and device, a storage medium and electronic equipment, so as to solve problems such as the low model pre-training efficiency of the related art; that is, the embodiment of the invention can conveniently perform model pre-training to improve model pre-training efficiency, and can effectively improve the training and inference speed of the model.
According to an aspect of the present invention, there is provided a model pre-training method, the method comprising:
acquiring a training data set, wherein one training data comprises an initial data source of an object under each path of data in M paths of data, one path of data corresponds to one language model, and M is a positive integer;
respectively compressing the initial data sources under each path of data in each training data included in the training data set to obtain compressed data sources under each path of data in each training data, wherein the number of compressed semantic representations in one compressed data source is smaller than the number of initial semantic representations in the corresponding initial data source;
masking the compressed data sources under each path of data in each training data respectively to obtain mask data sources under each path of data in each training data, calling the initial language model corresponding to each path of data respectively, and encoding the mask data sources under the corresponding path of data in each training data to obtain encoded data sources under each path of data in each training data;
and calculating a model loss value under each path of data based on the encoded data sources under each path of data in each training data and the compressed data sources under each path of data in each training data, and optimizing model parameters in the initial language model corresponding to the corresponding path of data according to the model loss value under each path of data to obtain an intermediate language model corresponding to each path of data, so as to determine a target language model corresponding to each path of data based on the intermediate language model corresponding to each path of data.
According to another aspect of the present invention, there is provided a model pre-training apparatus, the apparatus comprising:
an acquisition unit and a processing unit, wherein the acquisition unit is configured to acquire a training data set, one training data comprises an initial data source of an object under each path of data in M paths of data, one path of data corresponds to one language model, and M is a positive integer;
The processing unit is used for respectively compressing the initial data sources under each path of data in each training data included in the training data set to obtain compressed data sources under each path of data in each training data, and the number of compressed semantic representations in one compressed data source is smaller than the number of initial semantic representations in the corresponding initial data source;
The processing unit is further configured to mask the compressed data sources under each path of data in each training data respectively, obtain mask data sources under each path of data in each training data, call an initial language model corresponding to each path of data respectively, and encode the mask data sources under the corresponding path of data in each training data, so as to obtain encoded data sources under each path of data in each training data;
The processing unit is further configured to calculate a model loss value under each path of data based on the encoded data source under each path of data in each training data and the compressed data source under each path of data in each training data, and optimize model parameters in an initial language model corresponding to each path of data according to the model loss value under each path of data, so as to obtain an intermediate language model corresponding to each path of data, so as to determine a target language model corresponding to each path of data based on the intermediate language model corresponding to each path of data.
According to another aspect of the present invention, there is provided an electronic device comprising a processor and a memory storing a program, wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the above-mentioned method.
According to another aspect of the present invention, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the above-mentioned method.
After the training data set is acquired, the embodiment of the invention can respectively compress the initial data sources under each path of data in each training data included in the training data set to obtain compressed data sources under each path of data in each training data, wherein one training data comprises the initial data sources of an object under each path of data in M paths of data, one path of data corresponds to one language model, and M is a positive integer; the number of compressed semantic representations in one compressed data source is smaller than the number of initial semantic representations in the corresponding initial data source. Then, the compressed data sources under each path of data in each training data can be respectively masked to obtain mask data sources under each path of data in each training data, the initial language model corresponding to each path of data is respectively called, and the mask data sources under the corresponding path of data in each training data are encoded to obtain encoded data sources under each path of data in each training data. Based on this, the model loss value under each path of data can be calculated based on the encoded data sources under each path of data in each training data and the compressed data sources under each path of data in each training data, and the model parameters in the initial language model corresponding to the corresponding path of data are optimized according to the model loss value under each path of data to obtain an intermediate language model corresponding to each path of data, so that a target language model corresponding to each path of data is determined based on the intermediate language model corresponding to each path of data. Therefore, the embodiment of the invention can conveniently perform model pre-training to improve model pre-training efficiency, and can effectively improve the training and inference speed of the model.
Drawings
Further details, features and advantages of the invention are disclosed in the following description of exemplary embodiments with reference to the following drawings, in which:
FIG. 1 illustrates a flow diagram of a model pre-training method according to an exemplary embodiment of the invention;
FIG. 2 shows a schematic diagram of a pre-training according to an exemplary embodiment of the present invention;
FIG. 3 illustrates a flow diagram of another model pre-training method according to an exemplary embodiment of the present invention;
FIG. 4 shows a schematic diagram of contrastive learning according to an exemplary embodiment of the invention;
FIG. 5 shows a schematic block diagram of a model pre-training apparatus according to an exemplary embodiment of the present invention;
fig. 6 shows a block diagram of an exemplary electronic device that can be used to implement an embodiment of the invention.
Detailed Description
Embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the invention. It should be understood that the drawings and embodiments of the invention are for illustration purposes only and are not intended to limit the scope of the present invention.
It should be understood that the various steps recited in the method embodiments of the present invention may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the invention is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like herein are merely used for distinguishing between different devices, modules, or units and not for limiting the order or interdependence of the functions performed by such devices, modules, or units.
It should be noted that references to "a", "an" and "a plurality of" in this disclosure are illustrative rather than limiting, and those skilled in the art will appreciate that they should be construed as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the devices in the embodiments of the present invention are for illustrative purposes only and are not intended to limit the scope of such messages or information.
It should be noted that the execution body of the model pre-training method provided by the embodiment of the present invention may be one or more electronic devices, which is not limited in the present invention; the electronic device may be a terminal (i.e., a client) or a server, and when the execution body includes a plurality of electronic devices that include at least one terminal and at least one server, the model pre-training method provided by the embodiment of the present invention may be executed jointly by the terminal and the server. Accordingly, the terminals referred to herein may include, but are not limited to: smart phones, tablet computers, notebook computers, desktop computers, intelligent voice interaction devices, and the like. The server mentioned herein may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data and artificial intelligence platforms, and so on.
Based on the above description, an embodiment of the present invention proposes a model pre-training method that can be performed by the above-mentioned electronic device (terminal or server); or the model pre-training method may be performed jointly by the terminal and the server. For convenience of explanation, the model pre-training method is executed by the electronic device in the following description; as shown in fig. 1, the model pre-training method may include the following steps S101-S104:
s101, acquiring a training data set, wherein one training data comprises an initial data source of an object under each path of data in M paths of data, one path of data corresponds to one language model, and M is a positive integer.
Wherein one training data corresponds to one object; correspondingly, the training data set may comprise training data of each object of at least one object, i.e., the training data set may comprise an initial data source of each object under each path of data. Alternatively, the data source of an object under each path of data may also be referred to as the data source under each path of data in the training data of the corresponding object; that is, one training data may include the initial data source under each path of data, and the data source under each path of data in one training data refers to the data source, under each path of data, of the object corresponding to that training data. For example, the compressed data source under each path of data in one training data may refer to the compressed data source, under each path of data, of the object corresponding to that training data; the mask data source under each path of data in one training data may refer to the mask data source, under each path of data, of the object corresponding to that training data; the encoded data source under each path of data in one training data may refer to the encoded data source, under each path of data, of the object corresponding to that training data; and so on.
Alternatively, an object may be a user, a commodity (e.g., a book, etc.), or the like; the embodiment of the present invention is not limited thereto. Optionally, each object in at least one object may belong to the same type, for example, may be all users or all books, etc.
It should be understood that, when the value of M is 1, the embodiment of the present invention relates to one-way data, so as to implement the following determination of a target language model corresponding to one-way data; when the value of M is greater than 1, the data may be expanded to multiple paths of data, so as to implement the following determination of the target language model corresponding to each path of data, that is, the number of target language models may be multiple, one path of data corresponds to one target language model, and so on.
For example, assuming that the value of M is 2 and one object is a user, one path of data may be attribute tag data (such as gender, age, etc.) of the user, that is, an initial data source under one path of data may be used to describe attribute information of the user; accordingly, the other path of data may be search text and/or browsing text of the user, or the like, i.e. the initial data source under the other path of data may be used to describe the search content and/or browsing content of the user, or the like.
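As a purely illustrative sketch (the field names below are assumptions and not part of the publication), one training data for this M = 2 example might be laid out as follows, with one initial data source per path of data:

```python
# Illustrative layout of one training data for the M = 2 user example.
# "attribute_path" and "text_path" are hypothetical names; the publication only
# requires that each training data hold one initial data source per path of data.
training_data = {
    "object_id": "user_001",                               # the object (here: a user)
    "attribute_path": "gender: female; age: 28",           # path 1: attribute tag data of the user
    "text_path": "science fiction novels; hiking boots",   # path 2: search / browsing text of the user
}

# The training data set is then simply a collection of such per-object entries.
training_data_set = [training_data]
```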
In the embodiment of the present invention, the above-mentioned acquisition manner of the training data set may include, but is not limited to, the following several ways:
the first acquisition mode is as follows: the electronic device may store a plurality of training data in its own memory space, in which case the electronic device may select at least one training data from the plurality of training data and add the at least one training data to the training data set such that the training data set includes the at least one training data.
The second acquisition mode is as follows: the electronic device may obtain a training data download link and add training data downloaded based on the training data download link to the training data set such that the training data set includes training data downloaded based on the training data download link.
The third acquisition mode is as follows: the electronic device may obtain a training text set that includes at least one training text, and each training text may include a training sub-text of an object under each path of data. In this case, for any training text in the training text set, the electronic device may perform word segmentation processing on the training sub-text under each path of data in that training text to obtain a word segmentation result under each path of data in that training text, where one word segmentation result may include a plurality of tokens. The electronic device may then perform vectorization processing on the word segmentation result under each path of data in that training text to obtain an initial data source under each path of data in that training text, where the initial data source under any path of data in that training text may include an initial semantic representation of each token under that path of data. The initial data sources under each path of data in that training text may then be added to the training data set, so that the initial data sources under each path of data in that training text serve as one training data in the training data set, and so on.
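A minimal sketch of this third acquisition mode, assuming a whitespace word segmentation and a randomly initialized embedding table as the vectorization step (both stand in for whatever tokenizer and vectorization are actually used):

```python
import torch

def build_initial_data_source(training_sub_text, embedding, vocab):
    """Turn one training sub-text (one path of data) into its initial data source,
    i.e. one initial semantic representation per token."""
    tokens = training_sub_text.split()                       # word segmentation (placeholder tokenizer)
    ids = torch.tensor([vocab.setdefault(t, len(vocab)) for t in tokens])
    return embedding(ids)                                    # vectorization -> (num_tokens, hidden_dim)

# Usage: build the initial data source under each path of data of one training text.
vocab = {}
embedding = torch.nn.Embedding(10_000, 128)                  # hypothetical vocabulary / hidden size
training_text = {"attribute_path": "gender female age 28",
                 "text_path": "science fiction novels hiking boots"}
initial_data_sources = {path: build_initial_data_source(sub_text, embedding, vocab)
                        for path, sub_text in training_text.items()}
```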
It should be noted that one training data may correspond to one training text, and one path of data in one training data corresponds to one training sub-text in the corresponding training text. The text length of one training sub-text may be any length, which is not limited in the embodiment of the present invention; the text length of one training sub-text may be the number of tokens in the corresponding training sub-text. Alternatively, when the text length of a text is greater than a preset length threshold, the text may be regarded as a long text; the preset length threshold may be set empirically or according to actual requirements, which is not limited in the embodiment of the present invention.
Optionally, when the training sub-text in the training text corresponding to each training data is a long text, the embodiment of the present invention may implement long text modeling, which is specifically shown below, and is not described herein again.
S102, respectively compressing initial data sources under each path of data in each training data included in the training data set to obtain compressed data sources under each path of data in each training data, wherein the number of compressed semantic representations in one compressed data source is smaller than the number of initial semantic representations in the corresponding initial data source.
In the embodiment of the invention, for any training data in the training data set and any path of data in the M paths of data, the electronic device can compress the initial data source under that path of data in that training data to obtain the compressed data source under that path of data in that training data; that is, the initial data source, under that path of data, of the object corresponding to that training data can be compressed to obtain the compressed data source, under that path of data, of the object corresponding to that training data. The compressed data source under any path of data in any training data can comprise a plurality of compressed semantic representations, i.e., a plurality of compressed semantic representations under that path of data in that training data. Accordingly, the initial data source under any path of data in any training data may include a plurality of initial semantic representations under that path of data in that training data.
It should be noted that, the number of compressed semantic representations under any path of data in any training data may be smaller than the number of initial semantic representations under any path of data in any training data; optionally, when the training sub-text corresponding to any path of data in any training data is a long text, the number of compressed semantic representations under any path of data in any training data may be far smaller than the number of initial semantic representations under any path of data in any training data, so as to effectively compress the text length.
S103, masking the compressed data sources under each path of data in each training data respectively to obtain the mask data sources under each path of data in each training data, calling initial language models corresponding to each path of data respectively, and encoding the mask data sources under the corresponding paths of data in each training data to obtain the encoded data sources under each path of data in each training data.
Wherein one mask data source may include the mask semantic representations of each of H mask units; that is, the electronic device may mask the compressed semantic representation of each mask unit in one compressed data source to obtain the corresponding mask data source. Accordingly, in one mask data source, the semantic representation of each semantic unit other than the H mask units may be the compressed semantic representation of the corresponding semantic unit in the corresponding compressed data source, where H is a positive integer. Also, one encoded data source may include the encoded semantic representation of each semantic unit, and thus may include the encoded semantic representation of each of the H mask units. One semantic representation corresponds to one semantic unit, and one semantic unit may be a mask unit or a semantic unit other than a mask unit.
It should be noted that the mask units in any two data sources may be the same or different; that is, the number of mask units in any two data sources may be the same or different, which is not limited in the embodiment of the present invention; it should be appreciated that the masking units in the data sources (e.g., compressed data source and encoded data source, etc.) in the same path of data in the same training data are the same.
For example, assuming that one compressed data source includes 4 compressed semantic representations, and the mask units among the 4 compressed semantic representations include the 3rd semantic unit (i.e., the semantic unit corresponding to the 3rd compressed semantic representation), the electronic device may mask the 3rd semantic unit in the compressed data source, thereby obtaining a mask semantic representation of the 3rd semantic unit, so as to implement masking of the compressed data source and obtain the mask data source corresponding to the compressed data source; the mask data source corresponding to the compressed data source may include: the compressed semantic representation of the 1st semantic unit, the compressed semantic representation of the 2nd semantic unit, the mask semantic representation of the 3rd semantic unit, and the compressed semantic representation of the 4th semantic unit.
Alternatively, a language model may be a FLASH model (an efficient long-text model), a BERT model (a deep bi-directional pre-training model), or the like; the embodiment of the present invention is not limited thereto. In the case of long-text modeling, a FLASH model may be preferable as the language model. Alternatively, the model parameters in an initial language model may be generated randomly, set empirically, or set according to actual requirements, which is not limited in the embodiment of the present invention.
S104, calculating model loss values under each path of data based on the coding data sources under each path of data in each training data and the compression data sources under each path of data in each training data, optimizing model parameters in an initial language model corresponding to each path of data according to the model loss values under each path of data, and obtaining an intermediate language model corresponding to each path of data so as to determine a target language model corresponding to each path of data based on the intermediate language model corresponding to each path of data.
In one embodiment, for the mth path of data in the M paths of data, m ∈ [1, M], the electronic device may determine, from the encoded data source under the mth path of data in each training data, the encoded semantic representation of each mask unit under the mth path of data in the corresponding training data (i.e., the encoded semantic representation of each mask unit under the mth path of data may be determined for each object). Then, correspondingly, the model loss value under the mth path of data can be calculated based on the encoded semantic representation of each mask unit under the mth path of data in each training data and the compressed data source under the mth path of data in the corresponding training data. Specifically, the electronic device may traverse each training data in the training data set, taking the currently traversed training data as the current training data; then calculate the model loss value under the mth path of data in the current training data by using the encoded semantic representation of each mask unit under the mth path of data in the current training data and the compressed data source under the mth path of data in the current training data; and, after traversing each training data in the training data set, obtain the model loss value under the mth path of data in each training data and perform weighted summation on these model loss values to obtain the model loss value under the mth path of data. Optionally, the weight values involved in the weighted summation processes in the embodiment of the present invention may be set empirically or according to actual requirements, which is not limited in the embodiment of the present invention; for example, when the weight values of the model loss values under the mth path of data in the respective training data are the same, an averaging operation or a summation operation may be performed on the model loss values under the mth path of data in the respective training data, and so on.
Optionally, when the encoded semantic representation of each mask unit under the mth path of data in the current training data and the compressed data source under the mth path of data in the current training data are used to calculate the model loss value under the mth path of data in the current training data, then, for any mask unit under the mth path of data in the current training data, the encoded semantic representation of that mask unit and the compressed data source under the mth path of data in the current training data can be used to calculate the model loss value under that mask unit, so as to obtain the model loss value under each mask unit under the mth path of data in the current training data; weighted summation (such as an averaging operation or a summation operation) is then performed on the model loss values under each mask unit under the mth path of data in the current training data to obtain the model loss value under the mth path of data in the current training data. Specifically, the electronic device may calculate the model loss value under any mask unit using equation 1.1:
Here Ex may be an expectation over the encoded data source (i.e., over each encoded semantic representation) under the mth path of data in the current training data, or an expectation over the compressed data source under the mth path of data in the current training data, which is not limited in the embodiment of the present invention; correspondingly, f(x) may be the encoded semantic representation of any mask unit under the mth path of data in the current training data, f(x+) may be the compressed semantic representation of that mask unit under the mth path of data in the current training data, W may be the number of compressed semantic representations under the mth path of data in the current training data, f(x_j) may be the compressed semantic representation of the jth semantic unit, other than that mask unit, under the mth path of data in the current training data, and L may be the corresponding loss value, and so on.
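Equation 1.1 itself does not survive in this text of the publication; judging from the term definitions above, it is presumably an InfoNCE-style contrastive loss of roughly the following form (this reconstruction, including the dot-product similarity and the absence of a temperature factor, is an assumption):

L = -\mathbb{E}_x \left[ \log \frac{\exp(f(x) \cdot f(x^{+}))}{\sum_{j=1}^{W} \exp(f(x) \cdot f(x_j))} \right]

where, consistent with the example below, the denominator may run over the W compressed semantic representations under the mth path of data in the current training data, the positive f(x+) included.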
Illustratively, as shown in FIG. 2, assume that the number of compressed semantic representations under the mth path of data in the current training data is 4, and that the mask unit under the mth path of data in the current training data is the 3rd semantic unit; in this case, f(x+) may be the compressed semantic representation of the 3rd semantic unit under the mth path of data in the current training data. At this time, the value of W may be 4; the 1st semantic unit under the mth path of data in the current training data may be the 1st semantic unit, other than the mask unit, under the mth path of data in the current training data; the 2nd semantic unit may be the 2nd semantic unit other than the mask unit; and the 4th semantic unit may be the 3rd semantic unit other than the mask unit.
Therefore, the embodiment of the invention can perform contrastive learning between the encoded semantic representation of any mask unit under the mth path of data in the current training data and the compressed data source under the mth path of data in the current training data, so as to pull the post-mask semantic representation produced by the encoder (i.e., the encoded semantic representation) closer to the underlying pre-mask semantic representation (i.e., the compressed semantic representation of that mask unit), while pushing the encoded semantic representation of that mask unit away from the compressed semantic representations of the semantic units other than that mask unit.
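A minimal sketch of this contrastive learning step for one path of data, assuming dot-product similarity and no temperature factor (both assumptions), with the per-mask-unit losses averaged into a per-training-data loss and then into the per-path model loss value as described above:

```python
import torch
import torch.nn.functional as F

def mask_unit_loss(encoded, compressed, mask_idx):
    """Contrastive (InfoNCE-style) loss for one mask unit under one path of data.
    encoded:    (W, d) encoded semantic representations from the encoder
    compressed: (W, d) compressed semantic representations before masking
    mask_idx:   index of the mask unit whose representation was masked out"""
    anchor = encoded[mask_idx]                      # f(x): encoded representation of the mask unit
    logits = compressed @ anchor                    # similarity of f(x) to every f(x_j)
    target = torch.tensor(mask_idx)                 # the positive is the pre-mask compressed representation
    return F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))

def path_model_loss(per_training_data):
    """Model loss value under one path of data: average the per-mask-unit losses within
    each training data, then average (equal-weight weighted summation) over the training data."""
    per_data = [torch.stack([mask_unit_loss(enc, comp, i) for i in mask_ids]).mean()
                for enc, comp, mask_ids in per_training_data]
    return torch.stack(per_data).mean()

# Usage with toy shapes: 2 training data, W = 4 semantic units, hidden size 8.
toy = [(torch.randn(4, 8), torch.randn(4, 8), [2]),      # 3rd semantic unit masked
       (torch.randn(4, 8), torch.randn(4, 8), [0, 3])]   # 1st and 4th semantic units masked
print(path_model_loss(toy))
```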
In another embodiment, for the mth path of data in the M paths of data, the electronic device may calculate the model loss value under the mth path of data based on the encoded data source under the mth path of data in each training data and the compressed data source under the mth path of data in each training data; specifically, for any coding semantic representation under the mth path of data in any training data in the training data set, a model loss value under a semantic unit corresponding to any coding semantic representation can be calculated based on any coding semantic representation under the mth path of data in any training data and a compressed data source under the mth path of data in any training data, so as to obtain a model loss value of each semantic unit under the mth path of data in any training data, and the model loss values of each semantic unit under the mth path of data in any training data are weighted and summed to obtain the model loss value under the mth path of data in any training data. Based on this, the model loss value for the mth data can be calculated based on the model loss value for the mth data in each training data.
The specific manner of calculating the model loss value under the semantic unit corresponding to any coding semantic representation is the same as the specific manner of calculating the model loss value under any mask unit based on any coding semantic representation under the mth path of data in any training data and the compressed data source under the mth path of data in any training data, and the embodiments of the present invention will not be repeated.
Further, when optimizing the model parameters in the initial language model corresponding to each path of data according to the model loss value under that path of data to obtain the intermediate language model corresponding to each path of data, the electronic device may, for the mth path of data in the M paths of data, optimize the model parameters in the initial language model corresponding to the mth path of data according to the model loss value under the mth path of data, so as to obtain the intermediate language model corresponding to the mth path of data.
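A sketch of this per-path optimization, assuming one independent encoder (a placeholder linear layer here) and one Adam optimizer per path of data; the optimizer choice and learning rate are assumptions:

```python
import torch

M, hidden = 2, 8
# One initial language model (placeholder encoder) and one optimizer per path of data.
models = [torch.nn.Linear(hidden, hidden) for _ in range(M)]
optims = [torch.optim.Adam(m.parameters(), lr=1e-4) for m in models]

def optimize_per_path(path_losses):
    """Optimize the model parameters of the initial language model of the mth path of data
    using only the model loss value under the mth path of data."""
    for m, loss in enumerate(path_losses):
        optims[m].zero_grad()
        loss.backward()
        optims[m].step()

# Usage with dummy losses that depend on each path's own model parameters.
dummy_losses = [model(torch.randn(1, hidden)).pow(2).mean() for model in models]
optimize_per_path(dummy_losses)
```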
After the training data set is acquired, the embodiment of the invention can respectively compress the initial data sources under each path of data in each training data included in the training data set to obtain compressed data sources under each path of data in each training data, wherein one training data comprises the initial data sources of an object under each path of data in M paths of data, one path of data corresponds to one language model, and M is a positive integer; the number of compressed semantic representations in one compressed data source is smaller than the number of initial semantic representations in the corresponding initial data source. Then, the compressed data sources under each path of data in each training data can be respectively masked to obtain mask data sources under each path of data in each training data, the initial language model corresponding to each path of data is respectively called, and the mask data sources under the corresponding path of data in each training data are encoded to obtain encoded data sources under each path of data in each training data. Based on this, the model loss value under each path of data can be calculated based on the encoded data sources under each path of data in each training data and the compressed data sources under each path of data in each training data, and the model parameters in the initial language model corresponding to the corresponding path of data are optimized according to the model loss value under each path of data to obtain an intermediate language model corresponding to each path of data, so that a target language model corresponding to each path of data is determined based on the intermediate language model corresponding to each path of data. Therefore, the embodiment of the invention can conveniently perform model pre-training to improve model pre-training efficiency, and can effectively improve the training and inference speed of the model.
Based on the above description, the embodiment of the invention also provides a more specific model pre-training method. Accordingly, the model pre-training method may be performed by the above-mentioned electronic device (terminal or server); or the model pre-training method may be performed jointly by the terminal and the server. For convenience of explanation, the model pre-training method is executed by the electronic device in the following description; referring to fig. 3, the model pre-training method may include the following steps S301 to S308:
S301, acquiring a training data set, wherein one training data comprises an initial data source of an object under each path of data in M paths of data, one path of data corresponds to one language model, and M is a positive integer.
S302, compressing initial data sources under each path of data in each training data included in the training data set to obtain compressed data sources under each path of data in each training data, wherein the number of compressed semantic representations in one compressed data source is smaller than the number of initial semantic representations in the corresponding initial data source.
Specifically, for any training data in the training data set and any path of data in the M paths of data, the electronic device may determine a text compression length, and divide the initial data source under that path of data in that training data into N semantic representation groups to be compressed according to the text compression length, wherein the number of semantic representations in one semantic representation group to be compressed is smaller than or equal to the text compression length; the number of semantic representations (i.e., initial semantic representations here) in each of the first N-1 semantic representation groups to be compressed may be equal to the text compression length, and the number of semantic representations in the Nth semantic representation group may be less than or equal to the text compression length. Further, compression processing can be performed on each of the N semantic representation groups to be compressed to obtain a compressed semantic representation of each group, thereby obtaining the compressed data source under that path of data in that training data. Alternatively, a semantic representation group to be compressed may be treated as a sentence, so as to implement sentence-level compression and, further, the sentence-level masking described below.
Alternatively, the text compression length may be set empirically or according to actual requirements, which is not limited in the embodiment of the present invention. Optionally, the value of N may be the number of sentences in the text corresponding to any path of data in any training data; in this case the number of text compression lengths may be equal to N, and one text compression length is the number of tokens in one sentence, that is, one text compression length may correspond to one sentence, and so on; the embodiment of the present invention is not limited thereto.
For example, as shown in FIG. 2, assume that the text compression length is 3 and the text length corresponding to the initial data source under any path of data in any training data is 12 tokens, with one token corresponding to one initial semantic representation. In this case, the initial semantic representations of every 3 tokens may be taken as one semantic representation group to be compressed, so that every 3 tokens are compressed into 1 representation unit (i.e., semantic unit); that is, 3 initial semantic representations may be compressed into 1 compressed semantic representation. Correspondingly, the number of compressed semantic representations under that path of data in that training data may be 4 (i.e., the text representation length may be 4), where the text representation length refers to the number of semantic representations used to represent one text. It can be seen that when the initial data source is a long-text data source, embodiments of the present invention can compress the long-text data source into a corresponding short-text data source (i.e., compressed data source).
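A sketch of this compression step, assuming mean pooling as the compression module (the publication also allows a CNN semantic compression model instead, as described below):

```python
import torch

def compress(initial_data_source, text_compression_length):
    """Divide the initial semantic representations into groups of at most
    `text_compression_length` and compress each group into one compressed
    semantic representation by mean pooling."""
    groups = torch.split(initial_data_source, text_compression_length, dim=0)   # N groups
    return torch.stack([group.mean(dim=0) for group in groups])                 # (N, d)

# Usage matching the example above: 12 tokens, compression length 3 -> 4 compressed representations.
initial_data_source = torch.randn(12, 128)
compressed_data_source = compress(initial_data_source, 3)
print(compressed_data_source.shape)   # torch.Size([4, 128])
```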
Optionally, the compressed data sources under each path of data in each training data are obtained by compressing the initial data sources under each path of data in each training data with an initial compression model; that is, the electronic device may call the initial compression model to compress the initial data sources under each path of data in each training data. Alternatively, a compression model may be a CNN (Convolutional Neural Network) semantic compression model (also referred to as a semantic compression module) or a mean pooling (averaging) module, which is not limited in the embodiment of the present invention.
In the embodiment of the invention, when the compression model is a CNN semantic compression model, the electronic equipment can optimize model parameters in the initial compression model based on model loss values under each path of data to obtain an intermediate compression model; determining a target compression model based on the intermediate compression model; specifically, the model parameters in the intermediate compression model may be continuously optimized until a compression convergence condition is reached, e.g., the number of iterations (i.e., the number of optimizations) reaches a first preset number of iterations, or the compression loss value is less than a first preset loss threshold, the compression convergence condition may be determined to be reached, and so on. Alternatively, the first preset iteration number and the first preset loss threshold may be set empirically, or may be set according to actual requirements, which is not limited in the embodiment of the present invention.
The compression loss value may be determined based on the model loss value under each path of data, that is, the electronic device may determine the compression loss value based on the model loss value under each path of data; based on this, the model parameters in the initial compression model can be optimized in a direction to reduce the compression loss value.
In one embodiment, when determining the compression loss value based on the model loss value under each path of data, the electronic device may perform weighted summation on the model loss value under each path of data to obtain a weighted summation loss value, and use the weighted summation loss value as the compression loss value; optionally, the weight value of the model loss value under each path of data in the weighted summation process may be set empirically, or may be set according to actual requirements, which is not limited in the embodiment of the present invention.
In another embodiment, the electronic device may select a maximum model loss value from the model loss values under each path of data, and use the selected model loss value as the compression loss value, and so on.
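A sketch of the two options for the compression loss value described above; the equal weights in the first option are an assumption:

```python
import torch

def compression_loss(path_model_losses, weights=None, use_max=False):
    """Option 1 (default): weighted summation of the model loss values under each path of data.
    Option 2 (use_max=True): maximum model loss value among the paths of data."""
    losses = torch.stack(list(path_model_losses))
    if use_max:
        return losses.max()
    if weights is None:
        weights = torch.full_like(losses, 1.0 / len(losses))   # assumed equal weights
    return (losses * weights).sum()

# Usage with two paths of data.
print(compression_loss([torch.tensor(0.8), torch.tensor(1.3)]))                  # weighted summation
print(compression_loss([torch.tensor(0.8), torch.tensor(1.3)], use_max=True))    # maximum
```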
S303, masking the compressed data sources under each path of data in each training data respectively to obtain the mask data sources under each path of data in each training data, calling initial language models corresponding to each path of data respectively, and encoding the mask data sources under the corresponding paths of data in each training data to obtain the encoded data sources under each path of data in each training data.
Specifically, for any training data in the training data set and any path of data in the M paths of data, the electronic device can determine a mask probability and, according to the mask probability, determine at least one mask unit from the compressed data source under that path of data in that training data, wherein one mask unit corresponds to one compressed semantic representation. Based on this, the compressed semantic representations of the determined mask units can be masked respectively to obtain the mask data source under that path of data in that training data, wherein the mask data source under that path of data in that training data comprises the mask semantic representations of the determined mask units and, correspondingly, may also comprise the compressed semantic representations of each semantic unit other than the determined mask units under that path of data in that training data.
Optionally, the mask probability may be set empirically or according to actual requirements, which is not limited in the embodiment of the present invention; an exemplary mask probability may be 15%, in which case each compressed semantic unit is masked with a probability of 15%, i.e., 15% of the compressed semantic representations of the semantic units may be randomly masked out, and so on.
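A sketch of the sentence-level masking step, assuming the 15% mask probability above and a zero vector as the placeholder mask representation (a learnable mask vector could equally be used; both choices are assumptions):

```python
import torch

def mask_compressed(compressed_data_source, mask_probability=0.15, mask_vector=None):
    """Randomly select mask units among the compressed semantic representations and replace
    their representations, returning the mask data source and the indices of the mask units."""
    if mask_vector is None:
        mask_vector = torch.zeros(compressed_data_source.size(-1))             # placeholder mask representation
    keep = torch.rand(compressed_data_source.size(0)) >= mask_probability      # True where a unit is kept
    mask_data_source = torch.where(keep.unsqueeze(-1), compressed_data_source, mask_vector)
    mask_units = (~keep).nonzero(as_tuple=True)[0].tolist()
    return mask_data_source, mask_units

# Usage: mask the 4 compressed semantic representations of one path of one training data.
mask_data_source, mask_units = mask_compressed(torch.randn(4, 128))
```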
Therefore, the embodiment of the invention masks the compressed semantic representations, so that the method mentioned in the embodiment of the invention can be regarded as a model pre-training method with sentence-level masking, i.e., a sentence-level mask pre-training method, which can improve the semantic representation capability of single-path data. On this basis, pre-training of the target language model can be implemented in the masked language model (MLM) manner, i.e., unsupervised masked-language-model pre-training can be performed on the compressed semantic representations, thereby improving pre-training efficiency and further improving the model performance of the target language model. Alternatively, the language model may correspond to two training tasks: one may be a task that uses MLM to enable a Transformer (a model that uses the attention mechanism to increase the model training speed) to fuse bi-directional features (i.e., a pre-training task), and the other may be a fine-tuning task.
Alternatively, a language model may include an encoder; optionally, during the pre-training process, a language model may also include a masking language model (an output layer); then, correspondingly, in downstream tasks, a language model may include an encoder and NLP (natural language processing) layer (another output layer); the encoder included in the language model in the downstream task may be an encoder obtained through a pre-training process.
S304, calculating model loss values under each path of data based on the coding data sources under each path of data in each training data and the compression data sources under each path of data in each training data, and optimizing model parameters in an initial language model corresponding to the corresponding path of data according to the model loss values under each path of data to obtain an intermediate language model corresponding to each path of data.
S305, based on the training data set, iteratively optimizing model parameters in the intermediate language model corresponding to each path of data until convergence conditions are reached, and obtaining the language model to be associated corresponding to each path of data.
It should be understood that after the intermediate language model corresponding to each path of data is obtained, model parameters in the intermediate language model corresponding to each path of data may be continuously optimized until a convergence condition is reached.
Optionally, when the iteration number reaches the second preset iteration number, it is determined that a convergence condition is reached, where the iteration number of the language model to be associated corresponding to each path of data may be the same or different, which is not limited by the embodiment of the present invention; that is, the second preset iteration times corresponding to any two paths of data may be the same or different, which is not limited in the embodiment of the present invention. Or determining that a convergence condition is reached when the model loss value reaches a second preset loss threshold value, and the like; it should be noted that the second preset loss threshold value corresponding to any two paths of data may be the same or different, which is not limited by the embodiment of the present invention. It can be seen that one path of data can correspond to one convergence condition, and the convergence conditions corresponding to any two paths of data can be the same or different, which is not limited by the embodiment of the invention; it should be understood that, for any one path of data in the M paths of data, when a convergence condition corresponding to any one path of data is reached, a language model to be associated corresponding to any one path of data can be obtained. Optionally, the second preset iteration times and the second preset loss threshold corresponding to each path of data may be set empirically, or may be set according to actual requirements, which is not limited in the embodiment of the present invention.
It should be understood that reaching the convergence condition corresponding to any path of data means that the number of iterations reaches the second preset number of iterations corresponding to that path of data; when the second preset numbers of iterations corresponding to the paths of data are the same, the numbers of training iterations of the initial language models corresponding to the paths of data may be the same, that is, the language models to be associated corresponding to the paths of data may be obtained at the same time.
Optionally, when determining the compression loss value based on the model loss value under each path of data, if any path of data has reached a convergence condition at the last iteration, that is, the model loss value of any path of data under the current iteration is null, the compression loss value may be determined based on the model loss value that is not null in the model loss values under each path of data; or may determine that a compression convergence condition is reached when any of the data reaches the convergence condition, and so on.
In the embodiment of the invention, when the model parameters in the intermediate language model corresponding to each path of data are iteratively optimized based on the training data set, if the compression model is a CNN semantic compression model, the compression model may be updated during the iterative process, and the compressed data source under any path of data in any training data may then be updated along with the compression model; in this case, the above-mentioned step of compressing the initial data sources under each path of data in each training data included in the training data set to obtain the compressed data sources under each path of data in each training data can be executed iteratively, so as to implement iterative training of the intermediate language model corresponding to each path of data. Alternatively, if the compression model is a mean pooling module, the compression model is not updated during the iterative process, and the compressed data source under any path of data in any training data remains unchanged; in this case, the above-mentioned step of masking the compressed data sources under each path of data in each training data to obtain the mask data sources under each path of data in each training data can be executed iteratively, so as to implement iterative training of the intermediate language model corresponding to each path of data, and so on.
S306, determining current compressed data sources under each path of data in each training data, and determining data sources to be coded under each path of data in corresponding training data based on the current compressed data sources under each path of data in each training data.
In one embodiment, the electronic device may invoke a compression model under the current system time, and perform compression processing on the initial data sources under each path of data in each training data, to obtain a current compressed data source under each path of data in each training data. Alternatively, the compression model at the current system time may be the target compression model (i.e., the compression model when the compression convergence condition is reached), or may be the compression model when the compression convergence condition is not reached (at this time, model training of the compression model may be continued based on the following associated loss value, that is, the following associated loss value may be used as the compression loss value at the current system time to perform model training), or the like; the embodiment of the present invention is not limited thereto.
In another embodiment, the electronic device may use the compressed data source under each path of data in each training data as the current compressed data source under the corresponding path of data in the corresponding training data, so as to determine the current compressed data source under each path of data in each training data, and so on. For example, when the compression model is a mean pooling module, the compressed data source under each path of data in each training data remains unchanged, so that the compressed data source under each path of data in each training data can be directly used as the current compressed data source under the corresponding path of data in the corresponding training data.
Optionally, the electronic device may use the current compressed data source under any path of data in any training data as a data source to be encoded under any path of data in any training data; or masking the current compressed data source under any path of data in any training data according to the masking probability to obtain the current masking data source under any path of data in any training data, using the current masking data source under any path of data in any training data as the data source to be encoded under any path of data in any training data, and the like.
S307, calling the language model to be associated corresponding to each path of data, and respectively encoding the data sources to be encoded under the corresponding path of data in each training data to obtain the current encoded data sources under each path of data in each training data.
Optionally, the electronic device may call an encoder in the language model to be associated corresponding to each path of data, and encode the data source to be encoded under the corresponding path of data in each training data, for example, when one data source to be encoded is a corresponding current compressed data source; or the encoder and the mask language model in the language model to be associated corresponding to each path of data (namely, the whole language model to be associated is called) can be called, and the data sources to be encoded under the corresponding path of data in each training data are respectively encoded, for example, when one data source to be encoded is the corresponding current mask data source, and the like; the embodiment of the present invention is not limited thereto.
S308, calculating an association loss value based on the current coding data source of each path of data in each training data, optimizing model parameters in the to-be-associated language model corresponding to each path of data according to the direction of reducing the association loss value, and obtaining the to-be-associated intermediate language model corresponding to each path of data so as to determine the target language model corresponding to each path of data based on the to-be-associated intermediate language model corresponding to each path of data.
Specifically, when calculating the association loss value based on the current coding data source under each path of data in each training data, the electronic device may determine at least one comparison learning group based on the current coding data source under each path of data in each training data, where one comparison learning group includes the current coding data sources under any two paths of data in each training data; then, each contrast learning group in at least one contrast learning group can be traversed, the currently traversed contrast learning group is used as a current contrast learning group, and two paths of data corresponding to the current contrast learning group are used as first path of data and second path of data. Based on the method, the loss value under the current comparison learning group can be calculated based on the current coding data source under the first path of data in each training data and the current coding data source under the second path of data in each training data; after traversing each contrast learning group in at least one contrast learning group, obtaining a loss value under each contrast learning group, and calculating an associated loss value based on the loss value under each contrast learning group. Optionally, the loss values under each comparison learning group can be weighted and summed (such as mean value operation or summation operation) to obtain an associated loss value; alternatively, the weights of the loss values under each comparison learning group may be set empirically, or may be set according to actual requirements, which is not limited in the embodiment of the present invention.
In one embodiment, when calculating a loss value under a current comparison learning group based on a current coding data source under first path data in each training data and a current coding data source under second path data in each training data, for any current coding semantic representation under first path data in any training data, a loss value of any current coding semantic representation can be calculated based on any current coding semantic representation under first path data in any training data and a current coding data source under second path data in each training data, so as to obtain a loss value of each current coding semantic representation under first path data in each training data; correspondingly, the loss value of each current coding semantic representation under the first path of data in each training data can be weighted and summed (such as mean value operation or summation operation) to obtain the loss value under the current comparison learning group.
Optionally, the electronic device may use formula 1.1 to calculate the loss value of any current encoded semantic representation, based on that current encoded semantic representation under the first path of data in any training data and the current encoded data sources under the second path of data in each training data. For example, W−1 may be the total number of current encoded semantic representations under the second path of data in the training data other than that training data, E_x may be the expectation over the current encoded semantic representations under the first path of data in each training data or the expectation over the current encoded semantic representations under the second path of data in each training data, f(x) may be that current encoded semantic representation under the first path of data in that training data, f(x⁺) may be each current encoded semantic representation under the second path of data in that training data (in this case there may be a plurality of f(x⁺), and the numerator is then a sum of a plurality of products), and f(x_j) may be the current encoded semantic representation of the j-th unit under the second path of data in the training data other than that training data.
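Formula 1.1 itself is given earlier in this disclosure and is not reproduced here. As a hedged illustration only, a contrastive loss consistent with the description above (positives taken from the second path of the same training data, negatives from the second path of the other training data) can be sketched as follows; the temperature parameter is an added assumption and is not part of the disclosure:

```python
import torch
import torch.nn.functional as F

def unit_level_contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """Loss of one current encoded semantic representation under the first path of data.

    anchor:    (d,)   one encoded semantic representation under the first path of data
    positives: (P, d) encoded semantic representations under the second path of the same training data
    negatives: (N, d) encoded semantic representations under the second path of the other training data
    """
    anchor = F.normalize(anchor, dim=-1)
    positives = F.normalize(positives, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos = torch.exp(anchor @ positives.T / temperature).sum()   # numerator: sum over the positives
    neg = torch.exp(anchor @ negatives.T / temperature).sum()   # negatives enter the denominator
    return -torch.log(pos / (pos + neg))
```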
In another embodiment, when calculating the loss value under the current comparison learning group based on the current encoded data sources under the first path of data in each training data and the current encoded data sources under the second path of data in each training data, semantic representation integration may be performed on the current encoded data source under the first path of data in each training data respectively (that is, semantic representation integration may be performed on the current encoded semantic representations under the first path of data in any training data), so as to obtain a semantic representation integration result under the first path of data in each training data, and semantic representation integration may be performed on the current encoded data source under the second path of data in each training data respectively, so as to obtain a semantic representation integration result under the second path of data in each training data. Based on the semantic representation integration result under the first path of data in any training data and the semantic representation integration results under the second path of data in each training data, the loss value of that training data under the current comparison learning group can be calculated, so as to obtain the loss value of each training data under the current comparison learning group; correspondingly, a weighted summation can be performed on the loss values of the respective training data under the current comparison learning group to obtain the loss value under the current comparison learning group, and so on. Optionally, the semantic representation integration may refer to semantic representation stitching, that is, the current encoded semantic representations in the same current encoded data source may be stitched together; or the semantic representation integration may refer to a mean operation, that is, a mean operation may be performed on the current encoded semantic representations in the same current encoded data source, and so on; the embodiment of the present invention is not limited thereto.
Optionally, the electronic device may use formula 1.1 to calculate the loss value of any training data under the current comparison learning group, based on the semantic representation integration result under the first path of data in that training data and the semantic representation integration results under the second path of data in each training data. For example, W may be the number of training data in the training data set, E_x may be the expectation over the semantic representation integration results under the first path of data in each training data or the expectation over the semantic representation integration results under the second path of data in each training data, f(x) may be the semantic representation integration result under the first path of data in that training data, f(x⁺) may be the semantic representation integration result under the second path of data in that training data, and f(x_j) may be the semantic representation integration result under the second path of data in the j-th training data in the training data set other than that training data.
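A minimal sketch of the two semantic representation integration options mentioned above (stitching, i.e. concatenation, and the mean operation); the function name and the mode argument are illustrative assumptions:

```python
def integrate(encoded_source, mode="mean"):
    """Integrate the current encoded semantic representations of one data source.

    encoded_source: (L, d) tensor, one row per current encoded semantic representation.
    mode "mean" averages the representations; mode "concat" stitches them into one vector.
    """
    if mode == "mean":
        return encoded_source.mean(dim=0)    # (d,)
    if mode == "concat":
        return encoded_source.reshape(-1)    # (L * d,)
    raise ValueError(f"unknown integration mode: {mode}")
```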
Further, when determining the target language model corresponding to each path of data based on the intermediate language model to be associated corresponding to each path of data, the electronic device may continue to perform model training (i.e., model pre-training) on the intermediate language model to be associated corresponding to each path of data until reaching the association convergence condition (e.g., the iteration number reaches a third preset iteration number or the association loss value is smaller than a third preset loss threshold value, etc.), to obtain the target language model corresponding to each path of data. Optionally, the third preset iteration number and the third preset loss threshold may be set empirically, or may be set according to actual requirements, which is not limited in the embodiment of the present invention.
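For illustration only (the function and parameter names are assumptions), the association convergence condition described above can be checked with a helper of the following form:

```python
def association_converged(iteration, association_loss,
                          third_preset_iterations, third_preset_loss_threshold):
    """Return True once the iteration count reaches the third preset iteration number
    or the association loss value falls below the third preset loss threshold."""
    return (iteration >= third_preset_iterations
            or association_loss < third_preset_loss_threshold)
```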
Therefore, the embodiment of the invention can perform contrast learning on the encoded semantic representations of each path of data, and can effectively pull the semantic space representations of the different paths of data closer, as shown in fig. 4; on this basis, after the semantic representation of the single-path data has been modeled (that is, after the language model to be associated corresponding to each path of data is obtained), the embodiment of the invention can make the subsequent merging of the semantic representations of the multi-path data sources more reasonable.
Optionally, the encoder in the target language model corresponding to each path of data may be used in the downstream task to determine a task language model corresponding to each path of data based on the encoder in the target language model corresponding to each path of data, where the task language model corresponding to each path of data may include the encoder in the target language model corresponding to the corresponding path of data. Optionally, the electronic device may further perform fine adjustment on the task language model corresponding to each path of data, so as to obtain a target task language model corresponding to each path of data; the target task language model corresponding to each path of data can be applied to a target task, and the target task can be any downstream task, which is not limited in the embodiment of the present invention.
Optionally, after determining the coded data sources under each path of data in the target data through the target language model or the target task language model corresponding to each path of data, the electronic device further performs data fusion (such as sampling or averaging the coded data sources under each path of data) on the coded data sources under each path of data in the target data, so as to obtain a fused data source (including at least one fused semantic representation) of the target data, and so on; it should be noted that, the embodiment of the present invention is not limited to the specific implementation manner of data fusion. Optionally, the target data may include an initial data source of the target object under each path of data, and the target object may be any object, which is not limited in the embodiment of the present invention.
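As one possible illustration of the data fusion step (averaging is only one of the options named above, and the function name is an assumption):

```python
import torch

def fuse_paths(encoded_sources):
    """Fuse the coded data sources of the target data across the M paths of data.

    encoded_sources: list of M tensors, each of shape (L, d); the sketch assumes the
    paths have been aligned to the same length L and simply averages them element-wise.
    """
    return torch.stack(encoded_sources, dim=0).mean(dim=0)   # fused data source, shape (L, d)
```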
After the training data set is obtained, the embodiment of the invention can respectively compress the initial data sources under each path of data in each training data included in the training data set to obtain the compressed data sources under each path of data in each training data, wherein one training data comprises the initial data sources of an object under each path of data in M paths of data, one path of data corresponds to one language model, and M is a positive integer; the number of compressed semantic representations in one compressed data source is less than the number of initial semantic representations in the corresponding initial data source. Then, the compressed data sources under each path of data in each training data can be respectively masked to obtain the mask data sources under each path of data in each training data; the initial language model corresponding to each path of data is respectively called, and the mask data sources under the corresponding path of data in each training data are encoded to obtain the encoded data sources under each path of data in each training data. Based on this, the model loss value under each path of data can be calculated based on the encoded data source under each path of data in each training data and the compressed data source under each path of data in each training data, and the model parameters in the initial language model corresponding to the corresponding path of data are optimized according to the model loss value under each path of data to obtain the intermediate language model corresponding to each path of data. Further, based on the training data set, the model parameters in the intermediate language model corresponding to each path of data can be iteratively optimized until the convergence condition is reached, so as to obtain the language model to be associated corresponding to each path of data; the current compressed data sources under each path of data in each training data are determined, and the data sources to be encoded under each path of data in the corresponding training data are determined based on the current compressed data sources under each path of data in each training data. Correspondingly, the language model to be associated corresponding to each path of data can be called, and the data sources to be encoded under the corresponding path of data in each training data are respectively encoded to obtain the current encoded data sources under each path of data in each training data; the association loss value is calculated based on the current encoded data source under each path of data in each training data, and the model parameters in the language model to be associated corresponding to each path of data are optimized in the direction of reducing the association loss value to obtain the intermediate language model to be associated corresponding to each path of data, so that the target language model corresponding to each path of data is determined based on the intermediate language model to be associated corresponding to each path of data.
Therefore, the embodiment of the invention can realize unsupervised pre-training on compressed text, that is, unsupervised pre-training of the mask language model can be carried out on the compressed semantic representations, so that the problem of slow training and reasoning caused by long text and/or multi-path data is solved through the compression processing; model pre-training is thereby facilitated, the model pre-training efficiency is improved, the training and reasoning speed of the model can be effectively improved, and the semantic representation capability for single-path data can be improved. In addition, when the value of M is greater than 1, the method mentioned in the embodiment of the invention can serve as a pre-training method that combines multiple paths of data sources: the semantic space representations of different data can be aligned, that is, the encoded semantic representations under the respective paths of data can be pulled closer, so that different data sources of the same object have a stronger association, that is, they can be in a complementary relationship; a better final semantic representation (that is, the fused semantic representation) can therefore be obtained, that is, the accuracy of the semantic representation can be improved. On the basis of this pre-training method, the semantic representations of the multiple paths of data can be aligned through the contrast learning technology, and the generalization capability on downstream tasks can be effectively improved.
Based on the description of the related embodiments of the model pre-training method, the embodiments of the present invention further provide a model pre-training apparatus, which may be a computer program (including program code) running in an electronic device; as shown in fig. 5, the model pre-training apparatus may comprise an acquisition unit 501 and a processing unit 502. The model pre-training apparatus may perform the model pre-training method shown in fig. 1 or fig. 3, i.e. the model pre-training apparatus may operate the above units:
an obtaining unit 501, configured to obtain a training data set, where one training data includes an initial data source of an object under each path of data in M paths of data, and one path of data corresponds to one language model, and M is a positive integer;
the processing unit 502 is configured to perform compression processing on initial data sources under each path of data in each training data included in the training data set, so as to obtain compressed data sources under each path of data in each training data, where the number of compressed semantic representations in one compressed data source is smaller than the number of initial semantic representations in the corresponding initial data source;
the processing unit 502 is further configured to mask the compressed data sources under each path of data in each training data, obtain mask data sources under each path of data in each training data, call an initial language model corresponding to each path of data, and encode the mask data sources under corresponding paths of data in each training data, so as to obtain encoded data sources under each path of data in each training data;
The processing unit 502 is further configured to calculate a model loss value under each path of data based on the encoded data source under each path of data in each training data and the compressed data source under each path of data in each training data, and optimize model parameters in an initial language model corresponding to each path of data according to the model loss value under each path of data, so as to obtain an intermediate language model corresponding to each path of data, so as to determine a target language model corresponding to each path of data based on the intermediate language model corresponding to each path of data.
In one embodiment, when the processing unit 502 performs compression processing on the initial data sources under each path of data in each training data included in the training data set to obtain the compressed data sources under each path of data in each training data set, the processing unit may be specifically configured to:
determining a text compression length according to any training data in the training data set and any data in the M paths of data;
According to the text compression length, carrying out data division on an initial data source under any path of data in any training data to obtain N semantic representation groups to be compressed, wherein the number of semantic representations in one semantic representation group to be compressed is smaller than or equal to the text compression length;
and respectively compressing each semantic representation group to be compressed in the N semantic representation groups to be compressed to obtain compressed semantic representations of each semantic representation group to be compressed so as to obtain a compressed data source under any path of data in any training data.
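A minimal sketch of this compression processing, assuming mean pooling as the per-group compressor (the actual compression model of the embodiment may differ, and the function name is an assumption):

```python
import torch

def compress_initial_source(initial_source, text_compression_length):
    """Compress one initial data source under one path of data.

    initial_source: (T, d) tensor of initial semantic representations.
    The source is divided into N groups of at most `text_compression_length`
    representations, and each group is reduced to one compressed semantic representation.
    """
    groups = torch.split(initial_source, text_compression_length, dim=0)
    return torch.stack([g.mean(dim=0) for g in groups], dim=0)   # (N, d) compressed data source
```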
In another embodiment, when the processing unit 502 masks the compressed data sources under the respective paths of data in the respective training data to obtain the mask data sources under the respective paths of data in the respective training data, the processing unit may be specifically configured to:
determining mask probability, and determining at least one mask unit from compressed data sources under any path of data in any training data according to the mask probability, wherein one mask unit corresponds to one compressed semantic representation;
And masking the compressed semantic representations of the determined masking units respectively to obtain masking data sources under any path of data in any one piece of training data, wherein the masking data sources under any path of data in any piece of training data comprise the masking semantic representations of the determined masking units.
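A minimal sketch of this masking step; the shared mask embedding and the guarantee of at least one mask unit are illustrative assumptions:

```python
import torch

def mask_compressed_source(compressed_source, mask_probability, mask_embedding):
    """Mask a compressed data source under one path of data.

    compressed_source: (N, d) compressed semantic representations.
    Each compressed unit is selected as a mask unit with probability `mask_probability`,
    and selected units are replaced by a shared (d,) mask embedding.
    """
    positions = torch.rand(compressed_source.size(0)) < mask_probability
    if not positions.any():                                   # keep at least one mask unit
        positions[torch.randint(compressed_source.size(0), (1,))] = True
    masked = compressed_source.clone()
    masked[positions] = mask_embedding
    return masked, positions
```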
In another embodiment, an encoded data source includes encoded semantic representations of each of H mask units, H being a positive integer; the processing unit 502 may be specifically configured to, when calculating the model loss value for each path of data based on the encoded data source for each path of data in each of the training data and the compressed data source for each path of data in each of the training data, respectively:
for the mth path of data in the M paths of data, determining the coding semantic representation of each mask unit under the mth path of data in the corresponding training data from the coding data sources under the mth path of data in each training data, wherein m ∈ [1, M];
Calculating a model loss value under the mth path of data based on the coding semantic representation of each mask unit under the mth path of data in each training data and a compressed data source under the mth path of data in the corresponding training data;
The processing unit 502 may be specifically configured to, when optimizing model parameters in the initial language model corresponding to the corresponding path data according to the model loss values under the path data to obtain an intermediate language model corresponding to the path data:
and optimizing model parameters in the initial language model corresponding to the mth path of data according to the model loss value under the mth path of data to obtain an intermediate language model corresponding to the mth path of data.
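As an illustration only, one simple realization of this per-path model loss is a reconstruction objective between each mask unit's encoded semantic representation and the corresponding original compressed semantic representation; the MSE form is an assumed stand-in, since the disclosure does not fix the concrete loss function:

```python
import torch.nn.functional as F

def model_loss_for_path(encoded_mask_units, compressed_source, mask_positions):
    """Model loss under the m-th path of data for one training data.

    encoded_mask_units: (H, d) encoded semantic representations of the H mask units.
    compressed_source:  (N, d) compressed data source under the same path, before masking.
    mask_positions:     boolean (N,) marking which compressed units were masked (sums to H).
    """
    targets = compressed_source[mask_positions]       # original representations of the mask units
    return F.mse_loss(encoded_mask_units, targets)
```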
In another embodiment, when determining the target language model corresponding to each path of data based on the intermediate language model corresponding to each path of data, the processing unit 502 may be specifically configured to:
Based on the training data set, iteratively optimizing model parameters in the intermediate language model corresponding to each path of data until convergence conditions are reached, and obtaining a language model to be associated corresponding to each path of data;
Determining current compressed data sources under the data of each path in each training data, and determining data sources to be coded under the data of each path in corresponding training data based on the current compressed data sources under the data of each path in each training data;
calling a language model to be associated corresponding to each path of data, and respectively encoding data sources to be encoded under corresponding paths of data in each training data to obtain current encoding data sources under each path of data in each training data;
Calculating an association loss value based on the current coding data source under each path of data in each training data, and optimizing model parameters in the to-be-associated language model corresponding to each path of data according to the direction of reducing the association loss value to obtain the to-be-associated intermediate language model corresponding to each path of data so as to determine a target language model corresponding to each path of data based on the to-be-associated intermediate language model corresponding to each path of data.
In another embodiment, the processing unit 502 may be specifically configured to, when calculating the association loss value based on the current encoded data source under each path of data in each of the training data:
Determining at least one comparison learning group based on the current coding data sources under the data paths in the training data, wherein one comparison learning group comprises the current coding data sources under any two data paths in the training data;
traversing each contrast learning group in the at least one contrast learning group, taking the currently traversed contrast learning group as a current contrast learning group, and taking two paths of data corresponding to the current contrast learning group as first path of data and second path of data;
Calculating a loss value under the current comparison learning group based on the current coding data source under the first path of data in each training data and the current coding data source under the second path of data in each training data;
after traversing each contrast learning group in the at least one contrast learning group, obtaining a loss value under each contrast learning group, and calculating the associated loss value based on the loss value under each contrast learning group.
In another embodiment, the compressed data source under each path of data in each training data is obtained by performing compression processing on the initial data source under each path of data in each training data through an initial compression model, and the processing unit 502 is further configured to:
optimizing model parameters in the initial compression model based on the model loss values of each path of data to obtain an intermediate compression model;
A target compression model is determined based on the intermediate compression model.
According to one embodiment of the invention, the steps involved in the method of fig. 1 or 3 may be performed by the various units in the model pre-training apparatus shown in fig. 5. For example, step S101 shown in fig. 1 may be performed by the acquisition unit 501 shown in fig. 5, and steps S102 to S104 may each be performed by the processing unit 502 shown in fig. 5. As another example, step S301 shown in fig. 3 may be performed by the acquisition unit 501 shown in fig. 5, steps S302-S308 may each be performed by the processing unit 502 shown in fig. 5, and so on.
According to another embodiment of the present invention, each unit in the model pre-training apparatus shown in fig. 5 may be separately or completely combined into one or several other units, or some unit(s) thereof may be further split into a plurality of units with smaller functions, which may achieve the same operation without affecting the implementation of the technical effects of the embodiments of the present invention. The above units are divided based on logic functions, and in practical applications, the functions of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the present invention, any model pre-training apparatus may also include other units, and in practical applications, these functions may also be implemented with assistance from other units, and may be implemented by cooperation of multiple units.
According to another embodiment of the present invention, the model pre-training apparatus shown in fig. 5 may be constructed, and the model pre-training method of the embodiment of the present invention may be implemented, by running a computer program (including program code) capable of executing the steps involved in the respective methods shown in fig. 1 or fig. 3 on a general-purpose electronic device, such as a computer, that includes processing elements such as a central processing unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM). The computer program may be recorded on, for example, a computer storage medium, and loaded into and run in the above-described electronic device through the computer storage medium.
After the training data set is obtained, the embodiment of the invention can respectively compress the initial data sources under each path of data in each training data included in the training data set to obtain the compressed data sources under each path of data in each training data, wherein one training data comprises the initial data sources of an object under each path of data in M paths of data, one path of data corresponds to one language model, and M is a positive integer; the number of compressed semantic representations in one compressed data source is less than the number of initial semantic representations in the corresponding initial data source. Then, the compressed data sources under the data in each path in each training data can be respectively masked to obtain the mask data sources under the data in each path in each training data, the initial language model corresponding to each path of data is respectively called, and the mask data sources under the corresponding path of data in each training data are encoded to obtain the encoded data sources under each path of data in each training data. Based on the above, the model loss value under each path of data can be calculated based on the coding data source under each path of data in each training data and the compression data source under each path of data in each training data, and the model parameters in the initial language model corresponding to the corresponding path of data are optimized according to the model loss value under each path of data, so as to obtain the intermediate language model corresponding to each path of data, and the target language model corresponding to each path of data is determined based on the intermediate language model corresponding to each path of data. Therefore, the embodiment of the invention can conveniently perform model pre-training to improve the model pre-training efficiency, and can effectively improve the training and reasoning speed of the model.
Based on the description of the method embodiment and the apparatus embodiment, the exemplary embodiment of the present invention further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor for causing the electronic device to perform a method according to an embodiment of the invention when executed by the at least one processor.
The exemplary embodiments of the present invention also provide a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the present invention.
The exemplary embodiments of the invention also provide a computer program product comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the invention.
Referring to fig. 6, a block diagram of an electronic device 600 that may be a server or a client of the present invention will now be described, which is an example of a hardware device that may be applied to aspects of the present invention. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 6, the electronic device 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic device 600 can also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the electronic device 600 are connected to the I/O interface 605, including: an input unit 606, an output unit 607, a storage unit 608, and a communication unit 609. The input unit 606 may be any type of device capable of inputting information to the electronic device 600, and may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 607 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 608 may include, but is not limited to, magnetic disks and optical disks. The communication unit 609 allows the electronic device 600 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth devices, Wi-Fi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the various methods and processes described above. For example, in some embodiments, the model pre-training method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609. In some embodiments, the computing unit 601 may be configured to perform the model pre-training method by any other suitable means (e.g., by means of firmware).
Program code for carrying out methods of the present invention may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It is also to be understood that the foregoing is merely illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.
Claims (10)
1. A method of model pre-training, comprising:
acquiring a training data set, wherein one training data set comprises an initial data source of an object under each path of data in M paths of data, one path of data corresponds to one language model, and M is a positive integer;
Respectively compressing initial data sources under each path of data in each training data included in the training data set to obtain compressed data sources under each path of data in each training data, wherein the number of compressed semantic representations in one compressed data source is smaller than the number of initial semantic representations in the corresponding initial data source;
Masking the compressed data sources under each path of data in each training data respectively to obtain masking data sources under each path of data in each training data, calling initial language models corresponding to each path of data respectively, and encoding the masking data sources under the corresponding paths of data in each training data to obtain encoded data sources under each path of data in each training data;
And calculating model loss values under each path of data based on the coding data sources under each path of data in each training data and the compression data sources under each path of data in each training data, and optimizing model parameters in an initial language model corresponding to each path of data according to the model loss values under each path of data to obtain an intermediate language model corresponding to each path of data so as to determine a target language model corresponding to each path of data based on the intermediate language model corresponding to each path of data.
2. The method according to claim 1, wherein the compressing the initial data sources under the respective paths of data in the respective training data included in the training data set to obtain the compressed data sources under the respective paths of data in the respective training data includes:
determining a text compression length according to any training data in the training data set and any data in the M paths of data;
According to the text compression length, carrying out data division on an initial data source under any path of data in any training data to obtain N semantic representation groups to be compressed, wherein the number of semantic representations in one semantic representation group to be compressed is smaller than or equal to the text compression length;
and respectively compressing each semantic representation group to be compressed in the N semantic representation groups to be compressed to obtain compressed semantic representations of each semantic representation group to be compressed so as to obtain a compressed data source under any path of data in any training data.
3. The method according to claim 2, wherein the masking the compressed data sources under the respective paths of data in the respective training data to obtain the masked data sources under the respective paths of data in the respective training data includes:
determining mask probability, and determining at least one mask unit from compressed data sources under any path of data in any training data according to the mask probability, wherein one mask unit corresponds to one compressed semantic representation;
And masking the compressed semantic representations of the determined masking units respectively to obtain masking data sources under any path of data in any one piece of training data, wherein the masking data sources under any path of data in any piece of training data comprise the masking semantic representations of the determined masking units.
4. A method according to any one of claims 1-3, wherein one source of encoded data comprises encoded semantic representations of each of H mask units, H being a positive integer; the calculating the model loss value under each path of data based on the coding data source under each path of data in each training data and the compression data source under each path of data in each training data respectively includes:
for the mth path of data in the M paths of data, determining the coding semantic representation of each mask unit under the mth path of data in the corresponding training data from the coding data sources under the mth path of data in each training data, wherein m ∈ [1, M];
Calculating a model loss value under the mth path of data based on the coding semantic representation of each mask unit under the mth path of data in each training data and a compressed data source under the mth path of data in the corresponding training data;
the optimizing the model parameters in the initial language model corresponding to the corresponding path data according to the model loss value under the path data to obtain the intermediate language model corresponding to the path data comprises the following steps:
and optimizing model parameters in the initial language model corresponding to the mth path of data according to the model loss value under the mth path of data to obtain an intermediate language model corresponding to the mth path of data.
5. A method according to any one of claims 1-3, wherein determining the target language model corresponding to each path of data based on the intermediate language model corresponding to each path of data comprises:
Based on the training data set, iteratively optimizing model parameters in the intermediate language model corresponding to each path of data until convergence conditions are reached, and obtaining a language model to be associated corresponding to each path of data;
Determining current compressed data sources under the data of each path in each training data, and determining data sources to be coded under the data of each path in corresponding training data based on the current compressed data sources under the data of each path in each training data;
calling a language model to be associated corresponding to each path of data, and respectively encoding data sources to be encoded under corresponding paths of data in each training data to obtain current encoding data sources under each path of data in each training data;
Calculating an association loss value based on the current coding data source under each path of data in each training data, and optimizing model parameters in the to-be-associated language model corresponding to each path of data according to the direction of reducing the association loss value to obtain the to-be-associated intermediate language model corresponding to each path of data so as to determine a target language model corresponding to each path of data based on the to-be-associated intermediate language model corresponding to each path of data.
6. The method of claim 5, wherein said calculating an associated loss value based on a current source of encoded data for each of said paths of data in said respective training data comprises:
Determining at least one comparison learning group based on the current coding data sources under the data paths in the training data, wherein one comparison learning group comprises the current coding data sources under any two data paths in the training data;
traversing each contrast learning group in the at least one contrast learning group, taking the currently traversed contrast learning group as a current contrast learning group, and taking two paths of data corresponding to the current contrast learning group as first path of data and second path of data;
Calculating a loss value under the current comparison learning group based on the current coding data source under the first path of data in each training data and the current coding data source under the second path of data in each training data;
after traversing each contrast learning group in the at least one contrast learning group, obtaining a loss value under each contrast learning group, and calculating the associated loss value based on the loss value under each contrast learning group.
7. A method according to any one of claims 1 to 3, wherein the compressed data sources for each of the paths of data in each of the training data are obtained by compressing the initial data sources for each of the paths of data in each of the training data using an initial compression model, the method further comprising:
optimizing model parameters in the initial compression model based on the model loss values of each path of data to obtain an intermediate compression model;
A target compression model is determined based on the intermediate compression model.
8. A model pre-training apparatus, the apparatus comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a training data set, one training data comprises an initial data source of an object under each path of data in M paths of data, one path of data corresponds to one language model, and M is a positive integer;
The processing unit is used for respectively compressing the initial data sources under each path of data in each training data included in the training data set to obtain compressed data sources under each path of data in each training data, and the number of compressed semantic representations in one compressed data source is smaller than the number of initial semantic representations in the corresponding initial data source;
The processing unit is further configured to mask the compressed data sources under each path of data in each training data respectively, obtain mask data sources under each path of data in each training data, call an initial language model corresponding to each path of data respectively, and encode the mask data sources under the corresponding path of data in each training data, so as to obtain encoded data sources under each path of data in each training data;
The processing unit is further configured to calculate a model loss value under each path of data based on the encoded data source under each path of data in each training data and the compressed data source under each path of data in each training data, and optimize model parameters in an initial language model corresponding to each path of data according to the model loss value under each path of data, so as to obtain an intermediate language model corresponding to each path of data, so as to determine a target language model corresponding to each path of data based on the intermediate language model corresponding to each path of data.
9. An electronic device, comprising:
a processor; and
A memory in which a program is stored,
Wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method according to any of claims 1-7.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410131223.5A CN118132672A (en) | 2024-01-30 | 2024-01-30 | Model pre-training method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118132672A true CN118132672A (en) | 2024-06-04 |
Legal Events

Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |