CN117688176A - Pseudo language family clustering method and device based on multilingual pre-training large model - Google Patents
Pseudo language family clustering method and device based on multilingual pre-training large model
- Publication number
- CN117688176A (application CN202311653724.1A)
- Authority
- CN
- China
- Prior art keywords
- language
- similarity
- pairs
- pool
- shared
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F16/35—Clustering; Classification (information retrieval of unstructured textual data)
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0499—Feedforward networks
- G06N3/084—Backpropagation, e.g. using gradient descent
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to the technical field of text machine translation, in particular to a pseudo language family clustering method and device based on a multilingual pre-training large model, wherein the method comprises the following steps: establishing a shared language pool; calculating a Fisher information matrix of the language pairs in the shared language pool based on the multi-language pre-training large model, and obtaining a characterization result of the language pairs in the shared language pool; calculating the similarity between the language pairs according to the characterization result to obtain a similarity value; and sorting the similarity among the language pairs according to the similarity value, and selecting auxiliary language pairs conforming to the boundary value according to the preset boundary value to complete pseudo language family clustering based on the multilingual pre-training large model. The invention uses the capability of the multi-language pre-training to characterize the language pairs, more effectively selects and clusters auxiliary languages, improves generalization of the auxiliary languages among different models and data sets, and finally improves translation quality of low-resource language pairs under multi-language co-training.
Description
Technical Field
The invention relates to the technical field of machine translation, in particular to a pseudo language family clustering method and device based on a multilingual pre-training large model.
Background
Neural Machine Translation (NMT) has become the dominant Machine Translation (MT) paradigm in academic research and commercial use. In recent years, the NMT framework has been found to integrate multiple languages naturally, and research on MT systems covering multiple languages has therefore increased dramatically. Researchers refer to NMT systems that handle translation of more than one language pair as Multilingual NMT (MNMT) systems. The ultimate goal of MNMT research is to develop a single model that translates between as many languages as possible by efficiently utilizing the available language resources. Although MNMT brings encouraging improvements in translation quality, these models all rely on large parallel corpora. Since such corpora exist for only a few language pairs, translation performance for most low-resource languages is far from expectations. Related studies have shown that, for low-resource language translation, multilingual co-training can outperform conventional fine-tuning in some cases by introducing additional auxiliary language pairs during the fine-tuning stage. However, subsequent studies further indicate that co-training does not always bring a positive effect and can even degrade translation quality, depending on the choice of auxiliary language pairs.
Recent research at home and abroad has shown that fine-tuning a model with languages similar to the target language can improve the translation quality of the target language pair even without using the target language pair's own data, which further indicates that there is a synergistic effect between language pairs. However, not every language pair yields the same benefit under co-training, so screening the co-training language pairs becomes a key step in improving MNMT translation quality for low-resource language pairs. Languages within a language family generally share a common territory and linguistic background, so they tend to be more similar at the character or word level; from a linguistic perspective, such language pairs share more similar linguistic features such as script and grammar. Current academic research in this field falls mainly into two directions. On the one hand, researchers often integrate different kinds of prior knowledge, including language similarity, resource availability, language typology, task-specific requirements and so on. On the other hand, researchers apply language embeddings, representing each language with an embedding vector and clustering the languages in the embedding space. For example, a language embedding (Language Embedding) layer is added to the model; after multilingual training, an embedding vector is built for each language pair and language families are then built through hierarchical clustering, thereby improving the translation quality of the language pairs. Alternatively, an Adapter structure is embedded in the model while keeping the pre-trained parameters unchanged, and translation quality is improved by training language-family Adapters on the downstream task.
Although these methods can improve the translation quality of language pairs, they face certain difficulties in practical applications. In particular, training new models or modifying the model structure makes these methods more complicated, and they are difficult to reproduce when the original structure and data of large language models are hard to obtain.
Disclosure of Invention
In order to solve the technical problem in the prior art that training a new model or changing the structure of the model complicates these methods and makes them hard to reproduce when the original structure and data of a large language model are difficult to obtain, the embodiments of the invention provide a pseudo language family clustering method and device based on a multilingual pre-training large model. The technical scheme is as follows:
in one aspect, a method for clustering pseudo language families based on a multilingual pre-training large model is provided, the method is implemented by a pseudo language family clustering device based on the multilingual pre-training large model, and the method comprises:
s1, establishing a shared language pool;
s2, calculating a Fisher information matrix of the language pairs in the shared language pool based on the multi-language pre-training large model, and obtaining a characterization result of the language pairs in the shared language pool;
S3, calculating the similarity between the language pairs according to the characterization result to obtain a similarity value;
s4, sorting the similarity among the language pairs according to the similarity value, selecting auxiliary language pairs conforming to the boundary value according to the preset boundary value, and completing pseudo language family clustering based on the multilingual pre-training large model.
Optionally, in step S1, establishing the shared language pool includes:
acquiring a TED data set;
extracting multiple languages in the TED data set, translating the multiple languages into language pairs of English as a basic data set, and establishing a shared language pool.
Optionally, in step S2, based on the multilingual pre-training large model, calculating a Fisher information matrix of the language pairs in the shared language pool to obtain a characterization result of the language pairs in the shared language pool includes:
acquiring a parallel corpus corresponding to the languages in the shared language pool, and equally dividing data in the parallel corpus into j small batch data sets;
sequentially inputting the small batch data sets into a multilingual pre-training large model, and outputting a Fisher information matrix of each small batch data set;
calculating an average Fisher information matrix of each small batch data set after one input round, and taking the average Fisher information matrix as an estimated value to obtain the Fisher information weight of each small batch data set;
And characterizing the distribution of the corresponding language pairs in the shared language pool according to the weight of the Fisher information.
Optionally, in step S3, calculating the similarity between the language pairs according to the characterization result to obtain a similarity value, including:
obtaining a characterization result;
and calculating the distance between the target language pair and the auxiliary language pair by adopting a mean square error method; the closer the distance, the higher the similarity.
Optionally, in step S3, calculating the similarity between the language pairs according to the characterization result to obtain a similarity value, including:
and calculating the KL divergence from the auxiliary language pair to the target language pair by using the Fisher information matrix to obtain the distance between the target language pair and the auxiliary language pair; the closer the distance, the higher the similarity.
Optionally, in step S3, calculating the similarity between the language pairs according to the characterization result to obtain a similarity value, including:
selecting the top-K parameters and assigning them a value of 1, and assigning a value of 0 to the remaining parameters, to create a Fisher information mask;
and calculating the distance between the target language pair and the auxiliary language pair according to the number of parameters activated simultaneously and the number of parameters activated in the target direction; the closer the distance, the higher the similarity.
Optionally, in step S4, the similarity between the language pairs is ordered according to the similarity value, and an auxiliary language pair conforming to the boundary value is selected according to the preset boundary value, so as to complete the pseudo language family clustering based on the multilingual pre-training large model, including:
Traversing and calculating the similarity between all language pairs;
descending order according to the similarity between the language pairs;
presetting an initial searching radius, and defining a boundary range according to the initial searching radius;
integrating the nearest language pair in the boundary range into an auxiliary language list;
updating the search radius according to the similarity between the latest added language pair and the target language pair;
and repeatedly updating the search radius until no new language pair is added, obtaining the clustered pseudo language family, and completing pseudo language family clustering based on the multilingual pre-training large model.
In another aspect, a pseudo language family clustering device based on a multilingual pre-training large model is provided, the device is applied to a pseudo language family clustering method based on the multilingual pre-training large model, and the device comprises:
the language pool module is used for establishing a shared language pool;
the characterization module is used for calculating a Fisher information matrix of the language pairs in the shared language pool based on the multi-language pre-training large model to obtain a characterization result of the language pairs in the shared language pool;
the similarity calculation module is used for calculating the similarity between the language pairs according to the characterization result to obtain a similarity value;
And the clustering module is used for sequencing the similarity among the language pairs according to the similarity value, selecting auxiliary language pairs conforming to the boundary value according to the preset boundary value, and completing pseudo language family clustering based on the multilingual pre-training large model.
In another aspect, a pseudo language family clustering apparatus based on a multilingual pre-training large model is provided, the pseudo language family clustering apparatus based on the multilingual pre-training large model including: a processor; a memory having stored thereon computer readable instructions which, when executed by the processor, implement any of the pseudo language family clustering methods based on a multilingual pre-training large model as described above.
In another aspect, a computer-readable storage medium having stored therein at least one instruction loaded and executed by a processor to implement any of the above-described pseudo language family clustering methods based on a multilingual pre-training large model is provided.
The technical scheme provided by the embodiment of the invention has the beneficial effects that at least:
aiming at the limitation in the prior art that additional prior knowledge is needed or the model architecture must be modified, the invention provides a clustering method that constructs more effective pseudo language families for multilingual collaborative training. The core objective is to characterize the language pairs using the capability of the multilingual pre-trained model itself, to select and cluster the auxiliary languages more effectively and improve their generalization across different models and data sets, and finally to improve the translation quality of low-resource language pairs under multilingual co-training.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a pseudo language family clustering method based on a multilingual pre-training large model provided by an embodiment of the invention;
FIG. 2 is a schematic illustration of a language pair (XX-en) provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of a distribution of 40% higher Fisher information parameters in a model structure provided by an embodiment of the present invention;
FIG. 4 is a block diagram of a pseudo language family clustering device based on a multilingual pre-training large model provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a pseudo language family clustering device based on a multilingual pre-training large model according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is described below with reference to the accompanying drawings.
In embodiments of the invention, words such as "exemplary," "such as" and the like are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the term use of an example is intended to present concepts in a concrete fashion. Furthermore, in embodiments of the present invention, the meaning of "and/or" may be that of both, or may be that of either, optionally one of both.
In the embodiments of the present invention, "image" and "picture" may sometimes be used interchangeably, and it should be noted that their meaning is consistent when the distinction is not emphasized. "of", "corresponding" and "corresponding" are sometimes used interchangeably, and it should be noted that their meaning is consistent when the distinction is not emphasized.
In the embodiments of the present invention, a subscript such as W1 may sometimes be written in a non-subscript form such as W1; the meaning is consistent when the distinction is not emphasized.
In order to make the technical problems, technical solutions and advantages to be solved more apparent, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.
The embodiment of the invention provides a pseudo language family clustering method based on a multi-language pre-training large model, which can be realized by pseudo language family clustering equipment based on the multi-language pre-training large model, and the pseudo language family clustering equipment based on the multi-language pre-training large model can be a terminal or a server. The process flow of the pseudo language family clustering method based on the multilingual pre-training large model as shown in fig. 1 can comprise the following steps:
S101, establishing a shared language pool;
in a possible implementation manner, in step S101, the establishing a shared language pool includes:
acquiring a TED data set;
extracting multiple languages in the TED data set, translating the multiple languages into language pairs of English as a basic data set, and establishing a shared language pool.
In one possible implementation, for the language pair characterization study, a shared language pool is first created, containing both high-resource and low-resource language pairs and spanning multiple language families. The invention selects the TED data set and uses 17 language-to-English (en for short) translation directions as the basic data set of the invention. These language pairs together constitute a pool of candidate shared languages, which will be used in subsequent selection as candidate auxiliary languages for the low-resource languages. These languages span seven different language families: Balto-Slavic, Austronesian, Indo-Iranian, Turkic, Japonic, Koreanic and Germanic, as shown in fig. 2.
S102, calculating a Fisher information matrix of the language pairs in the shared language pool based on the multi-language pre-training large model, and obtaining a characterization result of the language pairs in the shared language pool;
In a possible implementation manner, in step S102, based on the multilingual pre-training large model, a Fisher information matrix of the language pairs in the shared language pool is calculated, and a characterization result of the language pairs in the shared language pool is obtained, including:
acquiring a parallel corpus corresponding to the languages in the shared language pool, and equally dividing data in the parallel corpus into j small batch data sets;
sequentially inputting the small batch data sets into a multilingual pre-training large model, and outputting a Fisher information matrix of each small batch data set;
calculating an average Fisher information matrix of each small batch data set after one input round, and taking the average Fisher information matrix as an estimated value to obtain the Fisher information weight of each small batch data set;
and characterizing the distribution of the corresponding language pairs in the shared language pool according to the weight of the Fisher information.
The invention uses the FIM to evaluate the parameters of the pre-trained model and calculates the Fisher information weight of each parameter. The size of the weight represents the importance of the parameter: a parameter with a large weight is sensitive to a specific translation direction, i.e., it requires substantial weight updates during the fine-tuning stage. The Fisher information is therefore an important indicator for evaluating the importance and potential value of a specific parameter. Essentially, it quantifies the variance of the first derivative of the log-likelihood function. By measuring this quantity, the necessity of fine-tuning a particular parameter in a subsequent task can be inferred. The original calculation formula is as follows:
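A standard form of this formula, written here so as to be consistent with the symbol definitions below, is

$$F_{\theta} = \mathbb{E}_{X,Y}\left[\nabla_{\theta}\log p(Y\mid X;\theta)\,\nabla_{\theta}\log p(Y\mid X;\theta)^{T}\right]$$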
Wherein X and Y represent the input and output of the model, respectively; θ represents the parameters of the model; p represents the probability distribution of the output Y given the input X and the parameters θ; T represents the matrix transpose; and E represents the expectation. For the i-th parameter, using a diagonal matrix helps to estimate the Fisher information matrix:
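A per-parameter (diagonal) form consistent with this description is

$$F_{\theta}^{i} = \mathbb{E}_{X,Y}\left[\left(\frac{\partial}{\partial\theta_{i}}\log p(Y\mid X;\theta)\right)^{2}\right]$$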
While using a diagonal matrix helps to estimate the FIM, obtaining an accurate probability estimate remains a difficult task. In view of this, we approximate the FIM using the following formula:
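An empirical approximation over the data set D, of the form referred to below as equation (3), is

$$\hat{F}_{\theta}^{i} = \frac{1}{|D|}\sum_{(x,y)\in D}\left(\frac{\partial}{\partial\theta_{i}}\log p(y\mid x;\theta)\right)^{2}$$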
Here, D represents the entire data set and |D| represents the number of samples; the data set is divided into j mini-batches of equal size and input into the model sequentially. The invention inputs the parallel corpus corresponding to each language pair into the model and calculates the FIM of each mini-batch within a single round (epoch). During this pass, the FIM of each mini-batch is accumulated using equation (3), without back-propagating parameter updates. After the epoch is completed, the average of the per-mini-batch FIMs is taken as the final estimate.
Further, the invention analyzes the distribution of the parameters with high Fisher information in the structure of the pre-trained model. The distribution of the top 40% of parameters was observed in the study, as shown in fig. 3. The pre-trained model is divided into five parts: the encoder self-attention layer (encoder attention layer, E_a for short), the encoder fully connected layer (E_f), the decoder self-attention layer (D_a), the decoder cross-attention layer (D_c) and the decoder fully connected layer (D_f). Since more than 60% of these parameters are distributed in the feed-forward networks (FFN), the invention selects the FFN layers as the layers whose FIM is calculated and used to measure similarity.
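As an illustrative sketch of the estimation procedure above (PyTorch-style; the model interface, the parameter-name filter and the data loader are assumptions for illustration, not part of the invention), the squared gradients of the log-likelihood are accumulated over the mini-batches of one epoch, without any parameter update, and then averaged:

```python
import torch

def estimate_fisher_weights(model, batches, ffn_only=True):
    """Approximate the diagonal FIM of one language pair's parallel corpus.

    `model` is assumed to be a seq2seq translation model whose forward pass
    returns the cross-entropy loss (negative log-likelihood) of the batch;
    `batches` is a list of input dicts covering one epoch of the corpus.
    """
    # Keep only the parameters to characterize (FFN layers by default).
    params = {n: p for n, p in model.named_parameters()
              if p.requires_grad and (not ffn_only or "fc" in n or "ffn" in n)}
    fisher = {n: torch.zeros_like(p) for n, p in params.items()}

    for batch in batches:
        model.zero_grad()
        loss = model(**batch).loss      # NLL of the mini-batch
        loss.backward()                 # gradients only; parameters are never updated
        for n, p in params.items():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2   # accumulate per-batch FIM (eq. 3)

    # Average over the mini-batches as the final estimate of the Fisher weights.
    return {n: f / max(len(batches), 1) for n, f in fisher.items()}
```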
S103, calculating the similarity between the language pairs according to the characterization result to obtain a similarity value;
in a possible implementation manner, in step S103, calculating the similarity between the language pairs according to the characterization result to obtain a similarity value includes:
obtaining a characterization result;
and calculating the distance between the language pairs in the shared language pool and the target language pair by adopting a mean square error method; the closer the distance, the higher the similarity.
In one possible implementation, the mean square error (Mean Square Error, MSE) is used. The calculation formula is as follows, where t and a are the target language pair and an auxiliary language pair respectively, S_(t,a) is the distance between t and a (the closer the distance, the higher the similarity), F is the FIM, and |F_t| represents the number of parameters.
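A mean-squared-error form consistent with these symbols is

$$S_{(t,a)} = \frac{1}{|F_{t}|}\sum_{i=1}^{|F_{t}|}\left(F_{t}^{i} - F_{a}^{i}\right)^{2}$$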
In a possible implementation manner, in step S103, calculating the similarity between the language pairs according to the characterization result to obtain a similarity value includes:
and calculating the KL divergence between the language pairs in the shared language pool and the target language pair by using the Fisher information matrix to obtain the distance between the language pairs in the shared language pool and the target language pair; the closer the distance, the higher the similarity.
In one possible embodiment, the KL divergence (Kullback-Leibler divergence, abbreviated KL) is used: the FIM is used directly to calculate the KL divergence between the language pairs in the shared language pool and the target language pair, thereby representing the distance between language pairs more accurately. The calculation formula is as follows, where the notation is consistent with that used for MSE and |·| represents the absolute value.
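One reconstruction consistent with this description, treating the normalized Fisher weights of each language pair as a distribution (the normalization is an assumption made here so that the divergence is well defined), is

$$S_{(t,a)} = \left|\sum_{i=1}^{|F_{t}|}\hat{F}_{a}^{i}\log\frac{\hat{F}_{a}^{i}}{\hat{F}_{t}^{i}}\right|,\qquad \hat{F}^{i} = \frac{F^{i}}{\sum_{j}F^{j}}$$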
In a possible implementation manner, in step S103, calculating the similarity between the language pairs according to the characterization result to obtain a similarity value includes:
selecting the top-K parameters and assigning them a value of 1, and assigning a value of 0 to the remaining parameters, to create a Fisher information mask;
and calculating the distance between the target language pair and the auxiliary language pair according to the number of parameters activated simultaneously and the number of parameters activated in the target direction; the closer the distance, the higher the similarity.
In a possible implementation, overlap similarity (Overlap Similarity, abbreviated Overlap) is used. Unlike the first two calculation methods, overlap similarity does not use the FIM directly; instead, a Fisher information mask (fisher information mask, abbreviated M) is created by selecting the top-K parameters and assigning them a value of 1, and assigning a value of 0 to the remaining parameters. The calculation is as follows, where overlap and activation represent the number of parameters activated in both directions simultaneously and the number of parameters activated in the target direction, respectively.
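A reconstruction consistent with this description, with M_t and M_a the Fisher information masks of the target and auxiliary pairs, is

$$S_{(t,a)} = \frac{\mathrm{overlap}}{\mathrm{activation}} = \frac{\sum_{i}\mathbb{1}\left[M_{t}^{i}=1 \wedge M_{a}^{i}=1\right]}{\sum_{i}\mathbb{1}\left[M_{t}^{i}=1\right]}$$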
In the Overlap method, the invention uses 40% as the default value of K, because this value achieves the best translation effect.
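A minimal sketch of the mask construction and overlap score under these assumptions (the flattened Fisher weight vectors and the function names are illustrative):

```python
import torch

def fisher_mask(fisher_vec: torch.Tensor, top_ratio: float = 0.4) -> torch.Tensor:
    """Return a 0/1 mask marking the top `top_ratio` fraction of parameters by Fisher weight."""
    k = max(1, int(top_ratio * fisher_vec.numel()))
    threshold = torch.topk(fisher_vec, k).values.min()
    return (fisher_vec >= threshold).float()

def overlap_similarity(fisher_target: torch.Tensor, fisher_aux: torch.Tensor) -> float:
    """Overlap similarity: jointly activated parameters over parameters activated for the target."""
    m_t, m_a = fisher_mask(fisher_target), fisher_mask(fisher_aux)
    overlap = (m_t * m_a).sum()     # parameters activated in both directions
    activation = m_t.sum()          # parameters activated in the target direction
    return (overlap / activation).item()
```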
S104, sorting the similarity among the language pairs according to the similarity value, and selecting auxiliary language pairs conforming to the boundary value according to the preset boundary value to complete pseudo language family clustering based on the multilingual pre-training large model.
In a possible implementation manner, in step S104, the method includes sorting the similarities between the language pairs according to the similarity values, selecting the auxiliary language pairs conforming to the boundary values according to the preset boundary values, and completing the pseudo language family clustering based on the multilingual pre-training large model, including:
traversing and calculating the similarity between all language pairs;
Descending order according to the similarity between the language pairs;
presetting an initial searching radius, and defining a boundary range according to the initial searching radius;
integrating the nearest language pair in the boundary range into an auxiliary language list;
updating the search radius according to the similarity between the latest added language pair and the target language pair;
and repeatedly updating the search radius until no new language pair is added, obtaining the clustered pseudo language family, and completing pseudo language family clustering based on the multilingual pre-training large model.
In a possible embodiment, the invention designs a simple algorithm to select the auxiliary languages. First, the similarity of the other language pairs to the target language pair is calculated using the similarity measures described above, and the language pairs are ranked by similarity; then an initial search radius is set. Within this predefined boundary, the closest language pairs are integrated into the auxiliary language list. The radius is then adjusted based on the similarity of the newly added language pair to the target language pair. This process is repeated until no new language pair is added. The invention refers to such a clustered language family as a pseudo language family. The algorithm for selecting the auxiliary language pairs is as follows:
1. Arrange the language pairs by similarity to the target pair, from most to least similar (MSE and KL in ascending order, Overlap in descending order), to create a list L;
2. Initialize Gap = |L[1] − L[0]| and add the first language to the auxiliary list;
3. Iterate from i = 2 to the end of L; in each loop the language is chosen as follows:
a) If |L[i-1] − L[i]| < Gap, add the i-th language to the auxiliary list and update Gap = |L[i-1] − L[i]|;
b) If |L[i-1] − L[i]| < Gap, add the i-th language to the auxiliary list and update Gap;
c) If |L[i-1] − L[i]| > Gap, terminate the loop;
4. The language pairs in the auxiliary list, together with the target pair, constitute the pseudo language family of the target language pair.
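A minimal sketch of this selection procedure (assuming `distance(t, a)` is one of the measures above with smaller values meaning closer, e.g. the negated Overlap score; the Gap update of step 3 b) is folded into the rule of step 3 a) for simplicity):

```python
def cluster_pseudo_family(target, candidates, distance):
    """Select auxiliary language pairs forming the pseudo language family of `target`.

    `distance(t, a)` should return a value where smaller means more similar
    (for Overlap similarity, pass the negated score).
    """
    if not candidates:
        return [target]

    # 1. Rank candidate pairs from most to least similar to the target pair.
    ranked = sorted(candidates, key=lambda c: distance(target, c))
    values = [distance(target, c) for c in ranked]

    family = [ranked[0]]                              # 2. add the closest language pair
    gap = abs(values[1] - values[0]) if len(values) > 1 else 0.0

    # 3. Keep expanding while consecutive similarity gaps stay within the search radius.
    for i in range(1, len(ranked)):
        diff = abs(values[i] - values[i - 1])
        if diff <= gap:                               # within the boundary: accept and shrink the radius
            family.append(ranked[i])
            gap = diff
        else:                                         # step 3 c): gap exceeded, stop expanding
            break

    # 4. The auxiliary list together with the target pair forms the pseudo language family.
    return [target] + family
```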
In a possible embodiment, to evaluate the method of the invention, the following baselines were designed based on the M2M100-418M model:
Pre-trained: directly translating the target language pair with the pre-trained model, without any fine-tuning;
FT (fine-tune): fine-tuning the base model with the bilingual data of the target language pair;
LF (language family): fine-tuning using the traditional language families divided in fig. 2, with temperature-based sampling and the temperature set to 1.5;
LF+FT: based on the LF method, further fine-tuning with the data of the target language pair.
For the training phase, the batch size is set to 4096. The method provided by the invention up-samples the training data so that the proportion of each language in each mini-batch is the same, except for Hindi (hi for short); when using the Overlap method, the same sampling scheme as LF is used. Optimization uses the Adam optimizer with β1 = 0.98, β2 = 0.98 and ε = 10e-6, and the learning rate is lr = 3e-5. In the invention, translation quality is evaluated with the BLEU (bilingual evaluation understudy) score.
TABLE 1 pseudo language family under different method choices
Table 2 BLEU scores for various low resource language pairs under different approaches
Table 1 shows the pseudo language families clustered by the method of the present invention for the low-resource language pairs in the shared language pool, and Table 2 shows the test results on the TED data set for translating the test languages Farsi (fas), Hindi (hi), Bengali (bn), Indonesian (id) and Malay (ms) into English.
Models (1) to (4) represent the baselines under currently common fine-tuning, and models (5) to (7) represent implementations of the present method using the different calculation methods.
The invention uses the three measures MSE, KL and Overlap to calculate the distance or similarity between language pairs, clusters the pseudo language families accordingly, and verifies them on low-resource language pairs. The evaluation results show that all three calculation methods further improve the BLEU score, the final gains are comparable, and model (7) achieves the best improvement.
FIG. 4 is a block diagram illustrating a pseudo language family clustering apparatus used for the pseudo language family clustering method based on a multilingual pre-training large model, according to an exemplary embodiment. Referring to fig. 4, the apparatus includes a language pool module 410, a characterization module 420, a similarity calculation module 430, and a clustering module 440. For ease of illustration, fig. 4 shows only the main components of the pseudo language family clustering apparatus 400:
a language pool module 410 for creating a shared language pool;
the characterization module 420 is configured to calculate a Fisher information matrix of the language pairs in the shared language pool based on the multilingual pre-training large model, and obtain a characterization result of the language pairs in the shared language pool;
the similarity calculation module 430 is configured to calculate a similarity between the language pairs according to the characterization result, and obtain a similarity value;
The clustering module 440 is configured to sort the similarities between the language pairs according to the similarity values, select an auxiliary language pair conforming to the boundary values according to the preset boundary values, and complete pseudo language family clustering based on the multilingual pre-training large model.
Optionally, a language pool module 410 for obtaining a TED dataset;
extracting multiple languages in the TED data set, translating the multiple languages into language pairs of English as a basic data set, and establishing a shared language pool.
Optionally, the characterization module 420 is configured to obtain a parallel corpus corresponding to a language in the shared language pool, and divide data in the parallel corpus into j small batch data sets;
sequentially inputting the small batch data sets into a multilingual pre-training large model, and outputting a Fisher information matrix of each small batch data set;
calculating an average Fisher information matrix of each small batch data set after one input round, and taking the average Fisher information matrix as an estimated value to obtain the Fisher information weight of each small batch data set;
and characterizing the distribution of the corresponding language pairs in the shared language pool according to the weight of the Fisher information.
Optionally, a similarity calculation module 430 is configured to obtain a characterization result;
and calculating the distance between the language pairs in the shared language pool and the target language pair by adopting a mean square error method; the closer the distance, the higher the similarity.
Optionally, the similarity calculation module 430 is configured to calculate the KL divergence between the language pairs in the shared language pool and the target language pair by using the Fisher information matrix, so as to obtain the distance between the language pairs in the shared language pool and the target language pair; the closer the distance, the higher the similarity.
Optionally, the similarity calculation module 430 is configured to select the top-K parameters and assign them a value of 1, and assign a value of 0 to the remaining parameters, to create a Fisher information mask;
and calculating the distance between the language pairs in the shared language pool and the target language pair according to the number of parameters activated simultaneously and the number of parameters activated in the target direction; the closer the distance, the higher the similarity.
Optionally, a clustering module 440 for computing the similarity between all language pairs in a traversal manner;
descending order according to the similarity between the language pairs;
presetting an initial searching radius, and defining a boundary range according to the initial searching radius;
integrating the nearest language pair in the boundary range into an auxiliary language list;
updating the search radius according to the similarity between the latest added language pair and the target language pair;
and repeatedly updating the search radius until no new language pair is added, obtaining the clustered pseudo language family, and completing pseudo language family clustering based on the multilingual pre-training large model.
Aiming at the limitation in the prior art that additional prior knowledge is needed or the model architecture must be modified, the invention provides a clustering method that constructs more effective pseudo language families for multilingual collaborative training. The core objective is to characterize the language pairs using the capability of the multilingual pre-trained model itself, to select and cluster the auxiliary languages more effectively and improve their generalization across different models and data sets, and finally to improve the translation quality of low-resource language pairs under multilingual co-training.
Fig. 5 is a schematic structural diagram of a pseudo language family clustering device based on a multilingual pre-training large model according to an embodiment of the present invention, where, as shown in fig. 5, the pseudo language family clustering device based on the multilingual pre-training large model may include the pseudo language family clustering device based on the multilingual pre-training large model shown in fig. 4. Optionally, a pseudo language family clustering device 510 based on a multilingual pre-trained large model may include a processor 2001.
Optionally, the pseudo language family clustering device 510 based on a multilingual pre-trained large model may also include a memory 2002 and a transceiver 2003.
The processor 2001 may be connected to the memory 2002 and the transceiver 2003 via a communication bus, for example.
The following describes the components of the pseudo language family clustering device 510 based on the multilingual pre-training large model in detail with reference to fig. 5:
the processor 2001 is a control center of the pseudo language family clustering device 510 based on a multilingual pre-training large model, and may be one processor or a collective name for a plurality of processing elements. For example, the processor 2001 is one or more central processing units (central processing unit, CPU), but may also be an application-specific integrated circuit (application specific integrated circuit, ASIC), or one or more integrated circuits configured to implement embodiments of the present invention, such as: one or more digital signal processors (digital signal processor, DSP), or one or more field programmable gate arrays (field programmable gate array, FPGA).
Alternatively, the processor 2001 may perform various functions of the pseudo language family clustering device 510 based on the multilingual pre-training large model by running or executing a software program stored in the memory 2002, and invoking data stored in the memory 2002.
In a particular implementation, the processor 2001 may include one or more CPUs, such as CPU0 and CPU1 shown in fig. 5, as an example.
In a particular implementation, as one embodiment, the pseudo language family clustering device 510 based on a multilingual pre-trained large model may also include multiple processors, such as the processor 2001 and the processor 2004 shown in fig. 5. Each of these processors may be a single-core processor (single-CPU) or a multi-core processor (multi-CPU). A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
The memory 2002 is used for storing a software program for executing the solution of the present invention, and is controlled by the processor 2001 to execute the solution, and the specific implementation may refer to the above method embodiment, which is not described herein again.
Alternatively, the memory 2002 may be a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (random access memory, RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (electrically erasable programmable read-only memory, EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without limitation. The memory 2002 may be integrated with the processor 2001, or may exist separately and be coupled to the processor 2001 via an interface circuit (not shown in fig. 5) of the pseudo language family clustering device 510 based on a multilingual pre-trained large model, which is not specifically limited by the embodiments of the present invention.
A transceiver 2003 for communicating with a network device or with a terminal device.
Alternatively, transceiver 2003 may include a receiver and a transmitter (not separately shown in fig. 5). The receiver is used for realizing the receiving function, and the transmitter is used for realizing the transmitting function.
Alternatively, the transceiver 2003 may be integrated with the processor 2001, or may exist separately, and be coupled to the processor 2001 through an interface circuit (not shown in fig. 5) of the pseudo language family clustering device 510 based on a multilingual pre-trained large model, as embodiments of the present invention are not particularly limited.
It should be noted that the structure of the pseudo language family clustering device 510 based on the multilingual pre-training large model shown in fig. 5 does not constitute a limitation on the device; an actual device may include more or fewer components than illustrated, may combine some components, or may use a different arrangement of components.
In addition, for the technical effects of the pseudo language family clustering apparatus 400 based on the multilingual pre-training large model, reference may be made to the technical effects of the pseudo language family clustering method based on the multilingual pre-training large model described in the above method embodiments, which are not repeated herein.
It is to be appreciated that the processor 2001 in embodiments of the invention may be a central processing unit (central processing unit, CPU) which may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should also be appreciated that the memory in embodiments of the present invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. The volatile memory may be random access memory (random access memory, RAM) which acts as an external cache. By way of example but not limitation, many forms of random access memory (random access memory, RAM) are available, such as Static RAM (SRAM), dynamic Random Access Memory (DRAM), synchronous Dynamic Random Access Memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced Synchronous Dynamic Random Access Memory (ESDRAM), synchronous Link DRAM (SLDRAM), and direct memory bus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware (e.g., circuitry), firmware, or any other combination. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center by wired (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more sets of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean: there are three cases, a alone, a and B together, and B alone, wherein a, B may be singular or plural. In addition, the character "/" herein generally indicates that the associated object is an "or" relationship, but may also indicate an "and/or" relationship, and may be understood by referring to the context.
In the present invention, "at least one" means one or more, and "a plurality" means two or more. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, device and unit described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another device, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A pseudo language family clustering method based on a multilingual pre-training large model, the method comprising:
s1, establishing a shared language pool;
s2, calculating a Fisher information matrix of the language pairs in the shared language pool based on the multi-language pre-training large model, and obtaining a characterization result of the language pairs in the shared language pool;
s3, calculating the similarity between the language pairs according to the characterization result to obtain a similarity value;
s4, sorting the similarity among the language pairs according to the similarity value, selecting auxiliary language pairs conforming to the boundary value according to a preset boundary value, and completing pseudo language family clustering based on a multilingual pre-training large model.
2. The method according to claim 1, wherein in the step S1, the step of creating the shared language pool includes:
Acquiring a TED data set;
extracting multiple languages in the TED data set, translating the multiple languages into language pairs of English to serve as a basic data set, and establishing a shared language pool.
3. The method according to claim 1, wherein in the step S2, based on the multilingual pre-training large model, a Fisher information matrix of the language pairs in the shared language pool is calculated, and the characterization result of the language pairs in the shared language pool is obtained, including:
acquiring a parallel corpus corresponding to the languages in the shared language pool, and equally dividing the data in the parallel corpus into j small batch data sets;
sequentially inputting the small batch data sets into a multilingual pre-training large model, and outputting a Fisher information matrix of each small batch data set;
after one full input round, calculating the average of the Fisher information matrices over the small batch data sets, and taking the average Fisher information matrix as the estimated value to obtain the Fisher information weight;
and characterizing the distribution of the corresponding language pairs in the shared language pool according to the Fisher information weight.
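A hedged sketch of this characterization step is given below, using a diagonal empirical Fisher estimate: the squared loss gradients accumulated over the j small batch data sets and averaged at the end of the round. It assumes a HuggingFace-style model whose forward call returns a loss; the exact estimator used in the patent may differ.

```python
# Sketch: diagonal empirical Fisher for one language pair, averaged over j mini-batches.
# Assumes a HuggingFace-style model whose forward call returns an object with a .loss field.
import torch

def fisher_information_weights(model, mini_batches, device="cpu"):
    model.to(device).train()
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    for batch in mini_batches:                       # the j small batch data sets
        model.zero_grad()
        loss = model(**{k: v.to(device) for k, v in batch.items()}).loss
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2    # squared gradient = diagonal Fisher term
    j = max(len(mini_batches), 1)
    return {n: f / j for n, f in fisher.items()}     # average over the round as the estimate
```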
4. The method according to claim 3, wherein in the step S3, the step of calculating the similarity between the language pairs according to the characterization result to obtain a similarity value includes:
Obtaining a characterization result; selecting a target language pair;
and calculating the distance between each language pair in the shared language pool and the target language pair by a mean square error method, wherein the smaller the distance, the higher the similarity.
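A minimal sketch of this mean-square-error comparison, assuming the characterization results from claim 3 are dictionaries of per-parameter Fisher weights over the same model:

```python
import torch

def mse_distance(fisher_a: dict, fisher_b: dict) -> float:
    """Mean squared error between two Fisher-weight characterizations (same model, same keys)."""
    total, count = 0.0, 0
    for name, tensor_a in fisher_a.items():
        diff = tensor_a - fisher_b[name]
        total += diff.pow(2).sum().item()
        count += diff.numel()
    return total / count        # smaller distance -> higher similarity
```

The negative distance (or its reciprocal) can then serve as the similarity value that is sorted in step S4.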
5. The method according to claim 3, wherein in the step S3, the step of calculating the similarity between the language pairs according to the characterization result to obtain a similarity value includes:
selecting a target language pair;
and calculating, using the Fisher information matrix, the KL divergence between the language pairs in the shared language pool and the target language pair to obtain the distance between them, wherein the smaller the distance, the higher the similarity.
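One plausible reading of this step, sketched below, normalizes each Fisher characterization into a probability distribution over parameters and computes KL(target || candidate); the exact formulation in the patent may differ, so treat this as an assumption.

```python
import torch

def kl_distance(fisher_target: dict, fisher_candidate: dict, eps: float = 1e-12) -> float:
    """KL divergence between normalized Fisher-weight distributions; smaller -> more similar."""
    p = torch.cat([v.flatten() for v in fisher_target.values()])
    q = torch.cat([v.flatten() for v in fisher_candidate.values()])
    p = p / (p.sum() + eps)                     # normalize to a distribution over parameters
    q = q / (q.sum() + eps)
    return torch.sum(p * torch.log((p + eps) / (q + eps))).item()
```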
6. The method according to claim 3, wherein in the step S3, the step of calculating the similarity between the language pairs according to the characterization result to obtain a similarity value includes:
selecting a target language pair;
selecting the top-K parameters and assigning them a value of 1, and assigning a value of 0 to the remaining parameters, to create a Fisher information mask;
and calculating the distance between each language pair in the shared language pool and the target language pair according to the number of parameters activated simultaneously and the number of parameters activated for the target language pair, wherein the smaller the distance, the higher the similarity.
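A sketch of this mask-based variant: the top-K Fisher parameters are set to 1 and the rest to 0, and two language pairs are compared by how many parameters are activated in both masks relative to the number activated for the target pair. K is an assumed hyperparameter, not a value given in the claims.

```python
import torch

def fisher_mask(fisher: dict, k: int) -> torch.Tensor:
    """Binary mask with value 1 for the top-K Fisher parameters and 0 elsewhere."""
    flat = torch.cat([v.flatten() for v in fisher.values()])
    mask = torch.zeros_like(flat)
    mask[torch.topk(flat, k).indices] = 1.0
    return mask

def mask_similarity(mask_target: torch.Tensor, mask_candidate: torch.Tensor) -> float:
    """Share of the target pair's activated parameters also activated for the candidate."""
    both_active = (mask_target * mask_candidate).sum().item()   # activated simultaneously
    return both_active / mask_target.sum().item()               # higher -> more similar
```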
7. The method according to claim 4, 5 or 6, wherein in step S4, the similarity between the language pairs is ranked according to the similarity value, the auxiliary language pair conforming to the boundary value is selected according to a preset boundary value, and the completion of the pseudo language family clustering based on the multilingual pre-training large model includes:
traversing and calculating the similarity between all language pairs;
sorting the language pairs in descending order of similarity;
presetting an initial search radius, and defining a boundary range according to the initial search radius;
integrating the nearest language pair in the boundary range into an auxiliary language list;
updating the search radius according to the similarity between the latest added language pair and the target language pair;
and repeatedly updating the search radius until no new language pairs are added, thereby obtaining the clustered pseudo language family and completing pseudo language family clustering based on the multilingual pre-training large model.
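A sketch of this search-radius expansion follows. The rule for updating the radius from the latest added pair is not fully specified above, so the shrinking factor used here is an illustrative assumption.

```python
def pseudo_family(similarities: dict[str, float], target: str,
                  init_radius: float, decay: float = 0.9) -> list[str]:
    """Grow a pseudo language family around `target`.

    `similarities` maps each candidate language pair to its similarity with the
    target pair (higher = closer); candidates are scanned in descending order.
    """
    ranked = sorted((lp for lp in similarities if lp != target),
                    key=lambda lp: similarities[lp], reverse=True)
    family, radius = [target], init_radius
    last_sim = similarities[ranked[0]] if ranked else 0.0   # similarity of latest added pair
    for lp in ranked:
        if last_sim - similarities[lp] <= radius:   # candidate lies within the boundary range
            family.append(lp)
            last_sim = similarities[lp]             # update reference to the latest added pair
            radius *= decay                         # assumed radius-update rule
        else:
            break                                   # no new language pairs can be expanded
    return family
```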
8. A pseudo-language family clustering device based on a multilingual pre-training large model, the device comprising:
the language pool module is used for establishing a shared language pool;
the characterization module is used for calculating a Fisher information matrix of the language pairs in the shared language pool based on the multi-language pre-training large model to obtain a characterization result of the language pairs in the shared language pool;
The similarity calculation module is used for calculating the similarity between the language pairs according to the characterization result to obtain a similarity value;
and the clustering module is used for sorting the similarity among the language pairs according to the similarity value, selecting auxiliary language pairs conforming to a preset boundary value, and completing pseudo language family clustering based on the multilingual pre-training large model.
9. A pseudo language family clustering device based on a multilingual pre-training large model, the pseudo language family clustering device based on the multilingual pre-training large model comprising:
a processor;
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the method of any of claims 1 to 7.
10. A computer readable storage medium having stored therein program code which is callable by a processor to perform the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311653724.1A CN117688176B (en) | 2023-12-04 | 2023-12-04 | Pseudo language family clustering method and device based on multilingual pre-training large model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311653724.1A CN117688176B (en) | 2023-12-04 | 2023-12-04 | Pseudo language family clustering method and device based on multilingual pre-training large model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117688176A true CN117688176A (en) | 2024-03-12 |
CN117688176B CN117688176B (en) | 2024-09-24 |
Family
ID=90134409
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311653724.1A Active CN117688176B (en) | 2023-12-04 | 2023-12-04 | Pseudo language family clustering method and device based on multilingual pre-training large model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117688176B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2007217900A1 (en) * | 2006-02-17 | 2007-08-30 | Google Llc | Encoding and adaptive, scalable accessing of distributed models |
CN112257468A (en) * | 2020-11-03 | 2021-01-22 | 沈阳雅译网络技术有限公司 | Method for improving translation performance of multi-language neural machine |
US20210174019A1 (en) * | 2019-12-10 | 2021-06-10 | Beijing Xiaomi Mobile Software Co., Ltd. | Method, device and storage medium for training machine translation model |
CN114048760A (en) * | 2021-09-27 | 2022-02-15 | 中国科学院自动化研究所 | Multi-language machine translation model training method, multi-language translation method and device |
CN116957070A (en) * | 2023-03-31 | 2023-10-27 | 腾讯科技(深圳)有限公司 | Multitasking training method and device, storage medium and electronic equipment |
CN117094334A (en) * | 2023-08-21 | 2023-11-21 | 腾讯科技(深圳)有限公司 | Data processing method, device and equipment based on large language model |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2007217900A1 (en) * | 2006-02-17 | 2007-08-30 | Google Llc | Encoding and adaptive, scalable accessing of distributed models |
US20210174019A1 (en) * | 2019-12-10 | 2021-06-10 | Beijing Xiaomi Mobile Software Co., Ltd. | Method, device and storage medium for training machine translation model |
CN112257468A (en) * | 2020-11-03 | 2021-01-22 | 沈阳雅译网络技术有限公司 | Method for improving translation performance of multi-language neural machine |
CN114048760A (en) * | 2021-09-27 | 2022-02-15 | 中国科学院自动化研究所 | Multi-language machine translation model training method, multi-language translation method and device |
CN116957070A (en) * | 2023-03-31 | 2023-10-27 | 腾讯科技(深圳)有限公司 | Multitasking training method and device, storage medium and electronic equipment |
CN117094334A (en) * | 2023-08-21 | 2023-11-21 | 腾讯科技(深圳)有限公司 | Data processing method, device and equipment based on large language model |
Non-Patent Citations (1)
Title |
---|
薛擎天; 李军辉; 贡正仙: "Multilingual Unsupervised Neural Machine Translation" (多语言的无监督神经机器翻译), Journal of Xiamen University (Natural Science Edition), no. 02, 23 March 2020 (2020-03-23), pages 50-55 *
Also Published As
Publication number | Publication date |
---|---|
CN117688176B (en) | 2024-09-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11604956B2 (en) | Sequence-to-sequence prediction using a neural network model | |
Anastasopoulos et al. | Pushing the limits of low-resource morphological inflection | |
US10997503B2 (en) | Computationally efficient neural network architecture search | |
CN111444320B (en) | Text retrieval method and device, computer equipment and storage medium | |
US11093719B2 (en) | Machine translation method and apparatus | |
US11468324B2 (en) | Method and apparatus with model training and/or sequence recognition | |
EP3420491B1 (en) | Differentially private iteratively reweighted least squares | |
US11106873B2 (en) | Context-based translation retrieval via multilingual space | |
US11657802B2 (en) | Utilizing a dynamic memory network for state tracking | |
CN110175336B (en) | Translation method and device and electronic equipment | |
CN112905735A (en) | Method and apparatus for natural language processing | |
US11734339B2 (en) | Generating embeddings in a multimodal embedding space for cross-lingual digital image retrieval | |
CN107391682B (en) | Knowledge verification method, knowledge verification apparatus, and storage medium | |
WO2020000764A1 (en) | Hindi-oriented multi-language mixed input method and device | |
WO2022042297A1 (en) | Text clustering method, apparatus, electronic device, and storage medium | |
WO2013127060A1 (en) | Techniques for transliterating input text from a first character set to a second character set | |
US20230223112A1 (en) | Retrosynthesis using neural networks | |
Smania et al. | Conditional distribution modeling as an alternative method for covariates simulation: comparison with joint multivariate normal and bootstrap techniques | |
Cao et al. | CircSSNN: circRNA-binding site prediction via sequence self-attention neural networks with pre-normalization | |
CN117688176B (en) | Pseudo language family clustering method and device based on multilingual pre-training large model | |
US10180938B2 (en) | Assisted free form decision definition using rules vocabulary | |
CN110795562A (en) | Map optimization method, device, terminal and storage medium | |
Pölsterl et al. | Scalable, axiomatic explanations of deep alzheimer’s diagnosis from heterogeneous data | |
WO2018066083A1 (en) | Learning program, information processing device and learning method | |
Wu et al. | Problems in the estimation of the key parameters using MLE in lung cancer screening |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||