CN113312453A - Model pre-training system for cross-language dialogue understanding - Google Patents
- Publication number: CN113312453A (application CN202110667409.9A)
- Authority: CN (China)
- Prior art keywords: module, dialogue, word, language, cross
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/3329—Natural language query formulation or dialogue systems
- G06F16/3337—Translation of the query language, e.g. Chinese to English
- G06F16/3344—Query execution using natural language analysis
- G06F16/35—Clustering; Classification
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Abstract
The invention relates to a model pre-training system for cross-language dialogue understanding. It aims to solve the problems that, in existing cross-language dialogue understanding scenarios, annotated corpora for low-resource languages are scarce, so model training is limited, an accurate dialogue understanding system cannot be obtained, and user utterances cannot be answered accurately. The model pre-training system for cross-language dialogue understanding comprises: a data acquisition module, a dialogue domain label sorting and merging module, a training corpus sorting module, a target language determining module, a static dictionary determining module, a word replacement module, a coding module, a word replacement prediction module, a sample dialogue domain prediction module, an integral model acquisition module, a training module, and a cross-language dialogue understanding downstream task fine-tuning module. The invention is used in the field of cross-language dialogue understanding.
Description
Technical Field
The invention relates to a model pre-training system for cross-language dialogue understanding, relates to a cross-language model pre-training system in the field of natural language processing, and relates to a dialogue understanding model training system in the field of natural language processing.
Background
Currently, human-machine dialogue systems have become a leading research hotspot in the industry due to their huge practical value and prospects. As early as the 1960s, Professor Joseph Weizenbaum of the Massachusetts Institute of Technology began developing the human-machine dialogue system ELIZA (Weizenbaum J. ELIZA - a computer program for the study of natural language communication between man and machine [J]. Communications of the ACM, 1966, 9(1): 36-45.), which could mimic the responses of a psychotherapist and provide help to patients with psychological illnesses. In the years that followed, thanks to the rapid development of natural language processing (Chowdhury G. Natural language processing [J]. Annual Review of Information Science and Technology, 2003, 37(1): 51-89.) and deep learning (LeCun Y, Bengio Y, Hinton G. Deep learning [J]. Nature, 2015, 521(7553): 436-444.), human-machine dialogue systems for various purposes have been developed. The core module behind these human-machine dialogue systems is the dialogue understanding system.
A dialogue understanding system can understand the user's intention and give corresponding replies and help, e.g. for weather inquiries, flight booking, meal ordering, smart-home device control, and in-vehicle voice control. Many dialogue understanding systems are already deployed on mobile phones or smart home devices, but most of them only support widely used languages such as Chinese and English. Likewise, academic work on pre-training dialogue understanding models (Wu C S, Hoi S, Socher R, et al. TOD-BERT: Pre-trained natural language understanding for task-oriented dialogue [J]. arXiv preprint arXiv:2004.06871, 2020.) is also limited to English, and cross-language scenarios are rarely studied. An important reason is that annotated dialogue understanding corpora in low-resource languages are scarce; how to effectively use the existing dialogue understanding corpora to assist training in cross-language scenarios is therefore a problem that urgently needs solving.
Disclosure of Invention
The invention aims to solve the problems that, in existing cross-language dialogue understanding scenarios, annotated corpora for low-resource languages are scarce, so model training is limited, an accurate dialogue understanding system cannot be obtained, and user utterances cannot be answered accurately; it provides a model pre-training system oriented to cross-language dialogue understanding.
A model pre-training system for cross-language dialogue understanding comprises:
the system comprises a data acquisition module, a dialogue domain label sorting and merging module, a training corpus sorting module, a target language determining module, a static dictionary determining module, a word replacement module, a coding module, a word replacement prediction module, a sample dialogue domain prediction module, an integral model acquisition module, a training module, and a cross-language dialogue understanding downstream task fine-tuning module;
the data acquisition module is used for collecting an English data set in the labeled dialogue understanding field;
the dialogue domain label sorting and merging module is used for sorting dialogue domain labels marked on all data sets in the data acquisition module and merging dialogue domain labels with the same meaning on different data sets;
the training corpus sorting module is used for segmenting the dialogue corpora in all datasets collected by the data acquisition module, taking the user utterance and the system reply of one dialogue turn as one sample, tokenizing the user utterance and the system reply separately, and attaching a dialogue domain label to each sample using the label information merged by the dialogue domain label sorting and merging module;
the target language determining module is used for determining a target language;
the static dictionary determining module is used for respectively collecting static dictionaries translated from English vocabulary to various target languages according to the target languages determined by the target language determining module;
the word replacement module is used for randomly selecting a certain proportion of English words in each dialogue-domain-labeled sample from the training corpus sorting module, randomly picking one of the target languages determined by the target language determining module for each selected word, translating each selected word into the corresponding word of that language using the static dictionary collected by the static dictionary determining module, replacing the English word with the target-language word, and keeping the original English word as the label to be predicted;
the coding module obtains a coded representation of the processed sample in the word replacement module by using a cross-language coding model;
the word replacement prediction module uses a fully-connected neural network that, from the encoded representation of each word in the sample obtained by the coding module, computes over the dictionary the probability of the replaced word, and computes a cross-entropy loss against the labels to be predicted from the word replacement module;
the sample dialogue domain prediction module uses a fully-connected neural network that judges the dialogue domain of the sample from the whole-sentence encoded representation obtained by the coding module, and computes a cross-entropy loss against the dialogue domain label annotated in the training corpus sorting module;
the integral model acquisition module adds the cross-entropy loss obtained by the word replacement prediction module to the cross-entropy loss obtained by the sample dialogue domain prediction module to obtain the final loss;
through the final loss, back-propagation is performed on the integral model and the parameters of the integral model are updated;
the integral model in the integral model acquisition module consists of the cross-language coding model in the coding module, the fully-connected neural network in the word replacement prediction module, and the fully-connected neural network in the sample dialogue domain prediction module, taken as a whole;
the training module trains an integral model in the integral model acquisition module by using the processed data in the training corpus sorting module and the word replacement module;
and the cross-language dialogue understanding downstream task fine-tuning module uses the integral model trained by the training module as a pre-training model and completes tasks in the cross-language dialogue understanding field based on the pre-training model.
The invention has the beneficial effects that:
the invention provides a model pre-training system for cross-language dialogue understanding, which does not depend on cross-language labeled dialogue understanding data and can pre-train a dialogue understanding model in a cross-language scene only by utilizing the existing English data. In addition, the invention designs a self-supervision task, and utilizes a dictionary to automatically label, so that the model can learn the mapping relation between English words and words of other languages which are translation pairs in the pre-training process, thereby improving the overall expression between other languages and English on the pre-training model. Particularly, the invention also summarizes the dialogue domain labels in different English dialogue understanding data sets, and trains the model by using the labeled information, so that the model can learn the special knowledge of the dialogue understanding domain in the pre-training process. The method solves the problems that in the existing cross-language dialogue understanding scene, due to the fact that corpus of a small language is scarce, model training effect is limited, an accurate dialogue understanding system cannot be obtained, and accurate reply cannot be completed to user words.
The invention is evaluated on a dialogue language understanding dataset in ten languages: Arabic, German, Spanish, French, Italian, Malay, Polish, Russian, Thai, and Turkish. The dataset covers the two most classic subtasks in dialogue understanding: intent recognition and slot extraction. Experimental results show that a model pre-trained by this method outperforms the baseline model when trained on the downstream task.
The dialogue language understanding dataset in the ten languages is trained with five random seeds each, the average over the five seeds is taken as the result, and the averages over the ten languages are compared. The model pre-trained by this method reaches an intent recognition accuracy of 93.73%, 4.17 points above the baseline model; a slot extraction F1 of 66.80%, 3.03 points above the baseline; and an overall accuracy of joint intent and slot prediction of 38.01%, 3.6 points above the baseline. The clear gains on all metrics show that the proposed system is highly effective for pre-training cross-language dialogue understanding models.
Drawings
FIG. 1 is a Sankey diagram of the merged dialogue domain label categorization results across the dialogue understanding datasets.
Detailed Description
The first embodiment is as follows: the model pre-training system for cross-language dialogue understanding includes:
the system comprises a data acquisition module, a dialogue domain label sorting and merging module, a training corpus sorting module, a target language determining module, a static dictionary determining module, a word replacement module, a coding module, a word replacement prediction module, a sample dialogue domain prediction module, an integral model acquisition module, a training module, and a cross-language dialogue understanding downstream task fine-tuning module;
the data acquisition module is used for collecting an English data set in the labeled dialogue understanding field;
Eight classic public English dialogue understanding datasets were collected: CamRest676 (Tsung-Hsien Wen, David Vandyke, Nikola Mrkšić, Milica Gašić, Lina M Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young. 2016. A network-based end-to-end trainable task-oriented dialogue system. arXiv preprint arXiv:1604.04562.), WOZ (Nikola Mrkšić, Diarmuid Ó Séaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young. 2016. Neural belief tracker: Data-driven dialogue state tracking. arXiv preprint arXiv:1606.03777.), SMD (Mihail Eric and Christopher D Manning. 2017. Key-value retrieval networks for task-oriented dialogue. arXiv preprint arXiv:1705.05414.), MSR-E2E (Xiujun Li, Sarah Panda, JJ (Jingjing) Liu, and Jianfeng Gao. 2018. Microsoft dialogue challenge: Building end-to-end task-completion dialogue systems. In SLT 2018.), Taskmaster (Bill Byrne, Karthik Krishnamoorthi, Chinnadhurai Sankar, Arvind Neelakantan, Daniel Duckworth, Semih Yavuz, Ben Goodrich, Amit Dubey, Andy Cedilnik, and Kyu-Young Kim. 2019. Taskmaster-1: Toward a realistic and diverse dialog dataset. arXiv preprint arXiv:1909.05358.), Schema (Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, and Pranav Khaitan. 2019. Towards scalable multi-domain conversational agents: The schema-guided dialogue dataset. arXiv preprint arXiv:1909.05855.), MetaLWOZ (Sungjin Lee, Hannes Schulz, Adam Atkinson, Jianfeng Gao, Kaheer Suleman, Layla El Asri, Mahmoud Adada, Minlie Huang, Shikhar Sharma, Wendy Tay, and Xiujun Li. 2019. Multi-domain task-completion dialog challenge. In Dialog System Technology Challenges 8.), and MultiWOZ (Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan, and Milica Gašić. 2018. MultiWOZ - a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling. arXiv preprint arXiv:1810.00278.).
The dialogue domain label sorting and merging module is used for collating the dialogue domain labels annotated on all datasets in the data acquisition module (e.g., weather inquiry, flight booking, meal ordering, smart-home device control, in-vehicle voice control) and merging dialogue domain labels with the same meaning across different datasets;
the training corpus sorting module is used for segmenting the dialogue corpora in all datasets collected by the data acquisition module, taking the user utterance and the system reply of one dialogue turn as one sample, tokenizing the user utterance and the system reply on whitespace, and attaching a dialogue domain label to each sample using the label information merged by the dialogue domain label sorting and merging module;
the target language determining module is used for determining a target language in the pre-training process according to the research current situations in the academic world and the industry and the use range and frequency of each international language;
Ten representative languages were manually selected from those used around the world: Arabic, German, Spanish, French, Italian, Malay, Polish, Russian, Thai, and Turkish;
the static dictionary determining module is used for respectively collecting static dictionaries translated from English vocabulary to various target languages according to the target languages determined by the target language determining module;
the word replacement module is used for randomly selecting a certain proportion of English words in each dialogue-domain-labeled sample from the training corpus sorting module, randomly picking one of the target languages determined by the target language determining module for each selected word, translating each selected word into the corresponding word of that language using the static dictionary collected by the static dictionary determining module, replacing the English word with the target-language word, and keeping the original English words (the randomly selected ones) as the labels to be predicted;
the coding module obtains a coded representation of the processed sample in the word replacement module by using a cross-language coding model;
the word replacement prediction module uses a fully-connected neural network (the fully-connected neural networks of the word replacement prediction module and the sample dialogue domain prediction module are distinct, with different parameters) that, from the encoded representation of each word in the sample obtained by the coding module, computes over the dictionary the probability of the replaced word, and computes a cross-entropy loss against the labels to be predicted from the word replacement module;
the sample dialogue domain prediction module uses a fully-connected neural network (distinct from that of the word replacement prediction module, with different parameters) that judges the dialogue domain of the sample from the whole-sentence encoded representation obtained by the coding module (one sample is the user utterance and system reply of one dialogue turn; it contains multiple words that form one whole sentence), and computes a cross-entropy loss against the dialogue domain labels annotated in the training corpus sorting module;
the integral model acquisition module adds the cross-entropy loss obtained by the word replacement prediction module to the cross-entropy loss obtained by the sample dialogue domain prediction module to obtain the final loss;
through the final loss, back-propagation is performed on the integral model and the parameters of the integral model are updated;
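The loss combination above can be sketched in a few lines. The logits below are toy values rather than real model outputs, and the dictionary and label-set sizes are illustrative; the point is that the two heads each produce a cross-entropy loss which the integral model acquisition module simply sums before back-propagation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold class."""
    return -math.log(probs[gold_index])

# Toy logits: the word-replacement head scores a 3-word dictionary for one
# replaced position; the sample dialogue domain head scores 2 domain labels.
word_logits = [2.0, 0.5, -1.0]   # gold replaced word is index 0
domain_logits = [0.1, 1.5]       # gold dialogue domain is index 1

word_loss = cross_entropy(softmax(word_logits), 0)
domain_loss = cross_entropy(softmax(domain_logits), 1)

# The integral model acquisition module simply sums the two losses; the
# result is then back-propagated through the whole model.
final_loss = word_loss + domain_loss
```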
the integral model in the integral model acquisition module consists of the cross-language coding model in the coding module, the fully-connected neural network in the word replacement prediction module, and the fully-connected neural network in the sample dialogue domain prediction module, taken as a whole;
the training module trains an integral model in the integral model acquisition module by using the processed data in the training corpus sorting module and the word replacement module;
the cross-language dialogue understanding downstream task fine-tuning module uses the integral model trained by the training module as a pre-training model and completes tasks in the cross-language dialogue understanding field based on it;
tasks in the cross-language dialogue understanding field include cross-language dialogue language understanding, cross-language intent detection, cross-language dialogue state tracking, cross-language dialogue act prediction, cross-language response selection, and so on. The parameters of the pre-training model are used as the initialization parameters of a BERT-architecture model for each of these tasks; each model is then trained on its task, and the trained models complete cross-language dialogue language understanding, cross-language intent detection, cross-language dialogue state tracking, cross-language dialogue act prediction, cross-language response selection, and other tasks.
The second embodiment is as follows: this embodiment differs from the first in the specific process of the dialogue domain label sorting and merging module, which collates the dialogue domain labels annotated on all datasets in the data acquisition module (e.g., weather inquiry, flight booking, meal ordering, smart-home device control, in-vehicle voice control) and merges dialogue domain labels with the same meaning across different datasets; the specific process is as follows:
Step 2.1: collate the dialogue domain labels annotated on all datasets in the data acquisition module: CamRest676 has 1 dialogue domain label, WOZ has 1, SMD has 3, MSR-E2E has 3, Taskmaster has 6, Schema has 17, MetaLWOZ has 47, and MultiWOZ has 6;
Step 2.2: through manual screening, dialogue domain labels with the same meaning on different datasets are grouped into the same category; the result is shown in FIG. 1. In FIG. 1, the text on the left gives the name of each dataset and its number of samples, the text on the right gives the name of each merged dialogue domain and the number of samples it contains, and the totals on the two sides are equal; an arc connecting left and right indicates that part of the samples of the dataset on the left carry the dialogue domain label on the right, with the arc's width proportional to that part's share of all samples. After collation, the 8 datasets yield 59 merged dialogue domain labels.
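The manual merging step can be mirrored with a simple synonym map. The label names below are illustrative, not the patent's actual 59 categories; the point is that each dataset-specific label is mapped to one canonical dialogue domain.

```python
# Hypothetical synonym map: dataset-specific domain labels on the left are
# merged into one canonical label on the right, mirroring the manual
# screening described above (label names here are illustrative).
MERGE_MAP = {
    "restaurant_booking": "restaurant",
    "find_restaurant": "restaurant",
    "book_flight": "flight",
    "flights": "flight",
}

def canonical_domain(label):
    """Map a dataset-specific dialogue-domain label to its merged category."""
    return MERGE_MAP.get(label, label)

# Collect the set of merged domains over some raw labels.
merged = sorted({canonical_domain(l) for l in
                 ["restaurant_booking", "find_restaurant", "book_flight", "weather"]})
```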
Other steps and parameters are the same as those in the first embodiment.
The third embodiment: this embodiment differs from the first and second in the specific process of the training corpus sorting module, which segments the dialogue corpora in all datasets collected by the data acquisition module, takes the user utterance and the system reply of one dialogue turn as one sample, tokenizes them on whitespace, and attaches a dialogue domain label to each sample using the label information merged in the second embodiment; the specific process is as follows:
Step 3.1: the dialogue understanding corpora in the datasets collected by the data acquisition module are multi-turn dialogues, and each dialogue can be expressed as D = {U1, R1, …, UN, RN};
where N is the number of dialogue turns, U1 and R1 are respectively the user utterance and the system reply of turn 1, and UN and RN are respectively the user utterance and the system reply of turn N;
taking the user utterance and the system reply of one turn as a sample, tokenizing them on whitespace, inserting the separator [SEP] between them and the identifier [CLS] at the start of the sentence to represent global information, yields the sample S = {[CLS], u1, u2, …, ui, [SEP], r1, r2, …, rj};
where u1 and r1 are respectively the 1st word of the user utterance and of the system reply, u2 and r2 respectively the 2nd word, ui is the i-th word of the user utterance, rj is the j-th word of the system reply, i is the tokenized length of the user utterance, and j is the tokenized length of the system reply;
Step 3.2: a dialogue domain label is attached to each sample using the label information merged by the dialogue domain label sorting and merging module (since some samples collected in step 1 are annotated with several dialogue domain labels and the invention only considers the single-label case, any sample belonging to several dialogue domains is discarded); each labeled sample is expressed as:
S = {Stokens = [CLS], u1, u2, …, ui, [SEP], r1, r2, …, rj; Sdomain = d},
where d is the dialogue domain label corresponding to the sample, Stokens is the processed input token sequence of the sample (its tokens are the words u1, u2, …, r1, r2, … plus the special symbols [CLS] and [SEP]), and Sdomain is the dialogue domain label of each sample;
the processed samples total 457,555;
other steps and parameters are the same as those in the first or second embodiment.
The fourth concrete implementation mode: the difference between this embodiment and the first to third embodiments is that the static dictionary determining module is configured to collect, according to the target language determined by the target language determining module, static dictionaries translated from english vocabulary to each target language; the specific process is as follows:
The dictionaries translated from English into each target language are downloaded from the website https://github.
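Publicly released static bilingual dictionaries are commonly distributed as plain-text files with one "source-word target-word" pair per line; assuming the downloaded dictionaries follow that layout (an assumption, since the exact file format is not stated here), loading one is straightforward:

```python
def load_static_dictionary(lines):
    """Parse a bilingual dictionary given as 'english_word target_word' lines
    (a common plain-text layout for static-dictionary releases; the exact
    format of the downloaded files is an assumption)."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed or empty lines
        src, tgt = parts
        mapping.setdefault(src, tgt)  # keep the first translation of each word
    return mapping

# In practice `lines` would come from open(path, encoding="utf-8").
en_de = load_static_dictionary(["book buchen", "table tisch", "", "book reservieren"])
```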
Other steps and parameters are the same as those in one of the first to third embodiments.
The fifth embodiment: this embodiment differs from the first to fourth in the specific process of the word replacement module, which randomly selects a certain proportion of English words in each dialogue-domain-labeled sample from the training corpus sorting module, randomly picks one of the target languages determined by the target language determining module for each selected word, translates the word into that language with the static dictionary collected by the static dictionary determining module, substitutes the translation for the English word, and keeps the original English word as the label to be predicted; the specific process is as follows:
the randomly selected proportion is set to p% (15%);
Meanwhile, an array Sgoldens is created to store the gold labels the model must predict; it has the same length as Stokens and is initialized with [PAD] placeholders, i.e. Sgoldens = [PAD], …, [PAD].
In addition, an array Smasks is created to store the positions of the words replaced for the model; it has the same length as Stokens and is initialized with zeros, i.e. Smasks = 0, …, 0;
For Stokens of each dialogue-domain-labeled sample from the training corpus sorting module, a random number between 0 and 1 is generated for each token t; if the random number is less than p%, a target language is selected with equal probability from the 10 languages determined by the target language determining module, t is translated into the corresponding word tx of that language using the static dictionary collected by the static dictionary determining module, tx is substituted for t at its position in the sample, the replaced t is stored at the same position of Sgoldens as the label to be predicted, and the value of Smasks at that position is set to 1, where
t ∈ {t | t ∈ Stokens, t ≠ [CLS], t ≠ [SEP]}.
An example of a sample after word replacement is
S = {Stokens = [CLS], …, uk′, …, [SEP], …, rl′, …, rm′, …; Sgoldens = [PAD], …, uk, …, rl, …, rm, …, [PAD]; Smasks = 0, …, 1, …, 1, …, 1, …, 0},
where uk′ denotes the target-language word that replaced uk at position k of the user utterance, rl′ the target-language word that replaced rl at position l of the system reply, and rm′ the target-language word that replaced rm at position m of the system reply.
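The replacement procedure can be sketched as follows. The dictionary contents are toy stand-ins for the downloaded static dictionaries, and `p` is exposed as a parameter (the patent fixes it at 15%).

```python
import random

def replace_words(tokens, dictionaries, rng, p=0.15):
    """Build (Stokens, Sgoldens, Smasks) for one sample: swap roughly a
    proportion p of the English tokens for a translation in a randomly
    chosen target language, keeping the original word as the gold label.

    `dictionaries` maps language -> {english_word: translated_word},
    a stand-in for the downloaded static dictionaries."""
    s_tokens = list(tokens)
    s_goldens = ["[PAD]"] * len(tokens)
    s_masks = [0] * len(tokens)
    languages = list(dictionaries)
    for i, t in enumerate(tokens):
        if t in ("[CLS]", "[SEP]"):
            continue  # special symbols are never replaced
        if rng.random() < p:
            lang = rng.choice(languages)       # equal-probability language pick
            translated = dictionaries[lang].get(t)
            if translated is None:
                continue  # word missing from the static dictionary: keep it
            s_tokens[i] = translated
            s_goldens[i] = t                   # original word = label to predict
            s_masks[i] = 1
    return s_tokens, s_goldens, s_masks
```

With `p=1.0` and a single target language the behavior is deterministic, which makes the three output arrays easy to inspect.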
Other steps and parameters are the same as in one of the first to fourth embodiments.
The sixth embodiment: this embodiment differs from the first to fifth embodiments in that S_goldens, S_tokens, and S_masks all have the same length, but S_goldens contains a replaced t only at the positions of replaced words; all other positions are [PAD], meaning that no prediction is required there.
Other steps and parameters are the same as those in one of the first to fifth embodiments.
The seventh embodiment: the difference between this embodiment and one of the first to sixth embodiments is that the encoding module uses a cross-language encoding model to obtain an encoded representation of the processed samples in the word replacement module; the specific process is as follows:
XLM-RoBERTa-base (Conneau A, Khandelwal K, Goyal N, et al. Unsupervised cross-lingual representation learning at scale [J]. arXiv preprint arXiv:1911.02116, 2019.) is selected as the cross-language coding model; the S_tokens processed by word replacement in the word replacement module is encoded to obtain the encoded representation of each token:
h_[CLS], h_{u_1}, …, h_[SEP], h_{r_1}, … = Cross_Lingual_Encoder(S_tokens)
wherein Cross_Lingual_Encoder is the cross-language coding model; h_[CLS] and h_[SEP] denote the encoded representations of the [CLS] and [SEP] tags after encoding by the cross-language coding model; h_{u_1} denotes the encoded representation of u_1 after encoding by the cross-language coding model; and h_{r_1} denotes the encoded representation of r_1 after encoding by the cross-language coding model.
Other steps and parameters are the same as those in one of the first to sixth embodiments.
The eighth embodiment: this embodiment differs from the first to seventh embodiments in that the word replacement prediction module uses a fully-connected neural network (the fully-connected neural networks of the word replacement prediction module and of the dialogue domain prediction module are different networks with different parameters), calculates from the encoded representation of each word in the sample obtained by the encoding module the probability of each possibly replaced word in the dictionary, and computes the cross entropy loss against the labels to be predicted in the word replacement module; the specific process is as follows:
Step 8.1: using a fully-connected neural network, calculate from the encoded representation of each word in the sample obtained by the encoding module the probability of each possibly replaced word in the dictionary:
z_i = softmax(W h_i + b)
wherein W is the weight of the fully-connected neural network, b is its bias, h_i is the encoded representation of the i-th position obtained in the encoding module, and z_i is the predicted probability distribution over the dictionary for the word at the i-th position (the word at position 1 yields z_1, the word at position x yields z_x; the word replacement task predicts the replaced word at each position separately);
Step 8.2: through the S_goldens and S_masks constructed in the word replacement module, calculate the cross entropy loss of the Word Replacement task (WR for short):
L_i^{WR} = -Σ_{k=1}^{V} ŷ_{i,k} log z_{i,k}
L^{WR} = Σ_i S_masks[i] · L_i^{WR}
wherein V is the size of the vocabulary, z_{i,k} denotes the predicted probability of the k-th word at the i-th position, ŷ_{i,k} is the true label of the k-th word at the i-th position (0 or 1, where 1 denotes that the k-th word is consistent with the word at the i-th position of S_goldens, otherwise 0), and L_i^{WR} is the cross entropy loss at position i; i ranges over the positions stored in S_masks, S_masks[i] denotes the value at the i-th position of S_masks, and L^{WR} is the sum of the losses over the positions of all replaced words.
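A minimal NumPy sketch of steps 8.1 and 8.2, assuming words are already mapped to vocabulary indices (the function and array names are illustrative, not from the patent):

```python
import numpy as np

def wr_loss(H, W, b, goldens_ids, masks):
    """Word Replacement (WR) loss: a softmax over the vocabulary at every
    position, with cross entropy summed only where masks[i] == 1.

    H: (seq_len, hidden) encoded representations from the encoding module
    W: (vocab, hidden) weight, b: (vocab,) bias of the fully-connected layer
    goldens_ids: vocabulary index of the original (gold) word per position
    """
    logits = H @ W.T + b                          # (seq_len, vocab)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    z = np.exp(logits)
    z /= z.sum(axis=1, keepdims=True)             # z[i, k] = predicted probability
    pos = np.arange(len(masks))
    losses = -np.log(z[pos, goldens_ids])         # per-position cross entropy
    return float((losses * np.asarray(masks)).sum())
```

Positions with mask 0 contribute nothing, matching the [PAD] convention of S_goldens.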
Other steps and parameters are the same as those in one of the first to seventh embodiments.
The ninth embodiment: this embodiment differs from the first to eighth embodiments in that the dialogue domain prediction module uses a fully-connected neural network (different from that of the word replacement prediction module, with different parameters), judges the dialogue domain of a sample from the encoded representation of the whole sample sentence obtained by the encoding module (one sample is the user utterance and system reply of one round of dialogue; a sample comprises multiple words forming one whole sentence), and computes the cross entropy loss against the dialogue domain label annotated in the training corpus sorting module; the specific process is as follows:
Step 9.1: using a fully-connected neural network, calculate the probability of the dialogue domain to which the sample belongs from the encoded representation h_[CLS] of the identifier [CLS] obtained by the encoding module:
z = softmax(W h_[CLS] + b)
wherein W and b are the weight and bias of this fully-connected neural network;
Step 9.2: calculate the cross entropy loss of the dialogue Domain Classification task (Domain Classifier, DC for short) through the dialogue domain labels annotated in the training corpus sorting module.
Other steps and parameters are the same as those in one to eight of the embodiments.
The tenth embodiment: this embodiment differs from the first to ninth embodiments in that, in step 9.2, the cross entropy loss of the Domain Classification task (Domain Classifier, DC for short) is calculated through the dialogue domain labels annotated in the training corpus sorting module:
L^{DC} = -Σ_{i=1}^{D} ŷ_i log z_i
wherein D is the number of dialogue domain labels collected in the dialogue domain label sorting and merging module, z_i is the predicted probability of the i-th dialogue domain label in z, and ŷ_i is the true label of the current sample for the i-th dialogue domain (0 or 1, where 1 denotes that the i-th dialogue domain label is consistent with S_domain, otherwise 0).
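A minimal sketch of the DC loss under the same simplifying assumptions as the WR sketch (illustrative names, NumPy in place of a deep-learning framework):

```python
import numpy as np

def dc_loss(h_cls, W, b, domain_id):
    """Domain Classification (DC) loss: a softmax over the D dialogue domains
    computed from the [CLS] representation, cross entropy on the gold domain."""
    logits = W @ h_cls + b
    logits -= logits.max()     # numerical stability
    z = np.exp(logits)
    z /= z.sum()
    return float(-np.log(z[domain_id]))

# The overall model acquisition module then adds the two terms,
# L = L_WR + L_DC, and back propagation updates all parameters jointly.
```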
Other steps and parameters are the same as those in one of the first to ninth embodiments.
The following examples were used to demonstrate the beneficial effects of the present invention:
the first embodiment is as follows:
This embodiment selects dialogue language understanding as the downstream task in the dialogue understanding field. Given a cross-language dialogue language understanding dataset, the task is to classify the intent of an utterance and to extract the corresponding slots in the sentence. The task is prepared according to the following steps:
collecting English dialogue language understanding data, translating the English dialogue language understanding data into a cross-language field, and labeling translated texts for training, verifying and testing models;
We downloaded the SNIPS dataset (Alice Coucke, Alaa Saade, Adrien Ball, Théodore Bluche, Alexandre Caulier, David Leroy, Clément Doumouro, Thibault Gisselbrecht, Francesco Caltagirone, Thibaut Lavril, et al. 2018. Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces. arXiv preprint arXiv:1805.10190.), extracted 700 samples (100 per intent, 7 intents in total) and split them into two halves (350 samples each, 50 per intent), and simultaneously extracted a random test set from its validation set (350 samples in total, 50 per intent).
The extracted training, validation, and test sets (1050 samples in total) were translated by experts into Arabic, German, Spanish, French, Italian, Malay, Polish, Russian, Thai, and Turkish, 10 languages in total; the original intent labels were kept and the slots were re-annotated for the model.
Setting a baseline cross-language pre-training model;
XLM-RoBERTA-base was chosen as the baseline cross-language pre-training model for this example.
Step three, setting a dialogue language understanding task model architecture;
Our model adopts a pipelined overall architecture. The overall model is composed of two models: an intent classification model and a slot extraction model.
Step four, training an intention classification model;
Step 4.1: obtain the encoded representation of a sample using the cross-language pre-training model:
h_[CLS], h_1, …, h_k = Cross_Lingual_Encoder(Input)
wherein Input is the input sample, k is the sample length, h_[CLS] is the encoded representation at the sample's [CLS] tag, h_1 is the encoded representation of the first word in the sample, and h_k is the encoded representation of the k-th word in the sample;
Step 4.2: using a fully-connected neural network, calculate the probability of the intent label of the current sample:
z = softmax(W h_[CLS] + b)
wherein W is the weight of the fully-connected neural network and b is its bias;
Step 4.3: calculate the cross entropy loss through the predicted probability from step 4.2:
L^{intent} = -Σ_{i=1}^{I} ŷ_i log z_i
wherein I is the number of intents summarized in step two, ŷ_i is the true label of the current sample for the i-th intent (0 or 1, where 1 denotes that the i-th intent label is the gold label of the sample, otherwise 0), and z_i is the model's predicted probability for the i-th intent label;
Step 4.4: perform back propagation with the loss calculated in step 4.3 and update the model parameters;
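Steps 4.2 to 4.4 can be sketched as a single gradient step of the intent classification head; the names and the plain-NumPy SGD update are illustrative assumptions, not the patent's training code:

```python
import numpy as np

def intent_step(h_cls, W, b, gold, lr=0.1):
    """One training step of the intent classifier: softmax over I intents
    from h_[CLS], cross entropy on the gold intent, and an in-place SGD
    update of the fully-connected layer."""
    logits = W @ h_cls + b
    logits -= logits.max()
    z = np.exp(logits)
    z /= z.sum()
    loss = float(-np.log(z[gold]))
    grad = z.copy()
    grad[gold] -= 1.0                 # d loss / d logits for softmax + CE
    W -= lr * np.outer(grad, h_cls)   # backprop into the linear layer
    b -= lr * grad
    return loss
```

Repeated steps on a sample drive the loss down, which is the behavior step 4.4 relies on; in the real system the encoder parameters are updated as well.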
step five, training a slot position extraction model;
Step 5.1: obtain the encoded representation of a sample using the cross-language pre-training model:
h_[CLS], h_1, …, h_k = Cross_Lingual_Encoder(Input)
wherein Input is the input sample, k is the sample length, h_[CLS] is the encoded representation at the sample's [CLS] tag, h_1 is the encoded representation of the first word in the sample, and h_k is the encoded representation of the k-th word in the sample;
Step 5.2: create a separate fully-connected neural network for each intent, and predict the probability of the slot label at each token position of the current sample through the gold intent label of the sample:
z_k = softmax(W_i h_k + b_i)
wherein W_i is the weight of the fully-connected neural network corresponding to the i-th intent label, b_i is the bias of the fully-connected neural network for the i-th intent label, h_k is the encoded representation of the word at position k in the sample, and z_k is the predicted slot probability of that word after passing through the model;
Step 5.3: calculate the cross entropy loss through the predicted probabilities from step 5.2:
L^{slot} = -Σ_{k=1}^{L} Σ_{s=1}^{S_i} ŷ_{k,s} log z_{k,s}
wherein L is the sample length, S_i is the number of slot labels corresponding to the i-th intent label, ŷ_{k,s} is the true label of the s-th slot at position k of the current sample (0 or 1, where 1 denotes that the s-th slot label is the gold slot label at position k of the sample, otherwise 0), and z_{k,s} is the model's predicted probability of the s-th slot label at position k;
Step 5.4: perform back propagation with the loss calculated in step 5.3 and update the model parameters;
the training processes of the intention classification model and the slot extraction model in the fourth step and the fifth step are mutually independent, and the two trained models form the integral model in the third step.
Step six: predict the final result and calculate the metrics;
Step 6.1: predict the final result;
First, the intent label of the sample is predicted with the trained intent classification model; then, using that prediction, the slot labels are predicted on the sample with the corresponding fully-connected neural network of the slot extraction model trained in step five.
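The pipeline inference of step 6.1 can be sketched as follows; the tuple layout of the classifier heads is an illustrative assumption:

```python
import numpy as np

def pipeline_predict(h_cls, H, intent_net, slot_nets):
    """Step-6.1 pipeline: first predict the intent from h_[CLS], then label
    every token with the slot network tied to that predicted intent."""
    Wi, bi, intents = intent_net
    intent = intents[int(np.argmax(Wi @ h_cls + bi))]
    Ws, bs, slots = slot_nets[intent]            # per-intent slot classifier
    tags = [slots[int(j)] for j in (H @ Ws.T + bs).argmax(axis=1)]
    return intent, tags
```

Because the slot network is chosen by the *predicted* intent, an intent error propagates to the slots, which is exactly what the Overall Acc metric in step 6.2 captures.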
Step 6.2: calculate the metrics;
Let the number of samples whose intent is predicted correctly be C_Intent and the total number of samples be A; then the intent recognition accuracy is Intent Acc = C_Intent / A.
Over all tokens of all samples, let TP be the number of correctly predicted slot labels, FP the number of incorrectly predicted slot labels, and FN the number of gold slots that were not predicted; then the slot extraction F1 value (Slot F1) is calculated as P = TP / (TP + FP), R = TP / (TP + FN), Slot F1 = 2PR / (P + R).
Let the number of samples for which the intent and all slots are predicted correctly be C_Overall; then the overall recognition accuracy is Overall Acc = C_Overall / A.
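One common set-based way to compute the Slot F1 of step 6.2, assuming slots are represented as (start, end, label) spans (a convention chosen here for illustration; the patent counts at the slot level):

```python
def slot_f1(gold_spans, pred_spans):
    """Slot F1 from sets of (start, end, label) spans: precision over the
    predicted spans, recall over the gold spans, harmonic mean of the two."""
    tp = len(gold_spans & pred_spans)   # correctly predicted slots
    fp = len(pred_spans - gold_spans)   # wrongly predicted slots
    fn = len(gold_spans - pred_spans)   # gold slots that were missed
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```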
To mitigate the result fluctuation caused by the small amount of test data, experiments were run on the training set with 5 different random seeds; the mean of each metric for each language over the 5 seeds was computed, and finally the average result over the 10 languages is reported.
The final experimental results on the test set are shown in table 1.
TABLE 1. Average experimental results of the dialogue language understanding task in ten languages
The best results are shown in bold in the table.
Where the first row of experimental results shows our experimental results on the baseline model.
The second row shows the experimental results of a model pre-training system oriented to cross-language dialogue understanding according to the present invention.
The third row shows the experimental results after the word replacement method in the above scheme of the present invention is changed to a Masked Language Model (Devlin J, Chang M W, Lee K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding [J]. arXiv preprint arXiv:1810.04805, 2018.).
The fourth row shows the experimental results after the dialogue domain classification fully-connected neural network in the scheme of the invention is removed.
The fifth row shows the experimental results after the word replacement method in the above scheme of the present invention is changed into Masked Language Model and the classified fully-connected neural network in the dialogue domain is removed.
As can be seen from the experimental results of the ablation experiments in the third, fourth and fifth rows of Table 1, all parts in the scheme of the present invention are indispensable, and the combined training of the word replacement model and the classification model in the dialogue domain can make the model effect better.
As can be seen from Table 1, the intent recognition accuracy of the cross-language dialogue understanding pre-training model trained by the present method is improved by 4.17% over the baseline model, the slot extraction F1 value is improved by 3.03%, and the overall accuracy of joint intent and slot prediction is improved by 3.60%. This demonstrates that the method can significantly improve the overall effectiveness of a cross-language dialogue understanding model.
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.
Claims (10)
1. A model pre-training system for cross-language conversational understanding, characterized by: the system comprises:
the system comprises a data acquisition module, a dialogue field label sorting and merging module, a training corpus sorting module, a target language type determining module, a static dictionary determining module, a word replacing module, a coding module, a word replacing and predicting module, a sample belonging dialogue field predicting module, an integral model acquiring module, a training module and a cross-language dialogue understanding field downstream task fine-tuning module;
the data acquisition module is used for collecting an English data set in the labeled dialogue understanding field;
the dialogue domain label sorting and merging module is used for sorting dialogue domain labels marked on all data sets in the data acquisition module and merging dialogue domain labels with the same meaning on different data sets;
the training corpus sorting module is used for dividing conversation corpuses in all data sets collected by the data acquisition module, taking user words and system replies in a round of conversation as a sample, respectively segmenting the user words and the system replies, and simultaneously labeling a conversation field label for each sample by utilizing conversation field label information combined in the conversation field label sorting and combining module;
the target language determining module is used for determining a target language;
the static dictionary determining module is used for respectively collecting static dictionaries translated from English vocabulary to various target languages according to the target languages determined by the target language determining module;
the word replacing module is used for randomly selecting a certain proportion of English words on each sample marked with the dialogue field labels in the training corpus sorting module, randomly selecting a language from the target language determined in the target language determining module for each randomly selected word, translating each randomly selected word to a word corresponding to the target language by using a static dictionary collected by the static dictionary determining module, replacing the English word with the word corresponding to the target language, and simultaneously keeping the original English word as a label to be predicted;
the coding module obtains a coded representation of the processed sample in the word replacement module by using a cross-language coding model;
the word replacement prediction module uses a fully-connected neural network to calculate, from the encoded representation of each word in the sample obtained by the encoding module, the probability of each possibly replaced word in the dictionary, and calculates the cross entropy loss through the labels to be predicted in the word replacement module;
the dialogue domain prediction module uses a fully-connected neural network to judge, from the encoded representation of the whole sample sentence obtained by the encoding module, the dialogue domain to which the sample belongs, and calculates the cross entropy loss through the dialogue domain label annotated in the training corpus sorting module;
the integral model acquisition module adds the cross entropy loss obtained by the word replacement prediction module and the cross entropy loss obtained by the dialogue field prediction module to which the sample belongs to obtain the final loss;
through the final loss, performing back propagation on the integral model and updating parameters of the integral model;
the overall model in the overall model acquisition module is the whole consisting of the cross-language coding model in the encoding module, the fully-connected neural network in the word replacement prediction module, and the fully-connected neural network in the dialogue domain prediction module;
the training module trains an integral model in the integral model acquisition module by using the processed data in the training corpus sorting module and the word replacement module;
and the downstream task fine-tuning module in the cross-language dialogue understanding field uses the whole model trained by the training module as a pre-training model, and completes the tasks in the cross-language dialogue understanding field based on the pre-training model.
2. The model pre-training system for cross-language dialogue understanding according to claim 1, wherein: the dialogue domain label sorting and merging module is used for sorting dialogue domain labels marked on all data sets in the data acquisition module and merging dialogue domain labels with the same meaning on different data sets; the specific process is as follows:
step two, sorting all data sets in the data acquisition module to have marked conversation field labels;
and step two, classifying the conversation field labels with the same meaning on different data sets into the same category through manual screening.
3. The model pre-training system for cross-language dialogue understanding according to claim 2, wherein: the training corpus sorting module is used for dividing conversation corpuses in all data sets collected by the data acquisition module, taking user words and system replies in a round of conversation as a sample, segmenting words of the user words and the system replies respectively, and labeling a conversation field label for each sample by using the conversation field label information combined in the step two; the specific process is as follows:
Step 3.1: the dialogue understanding corpora in the datasets collected by the data acquisition module are multi-turn dialogues, and each dialogue can be expressed as D = {U_1, R_1, …, U_N, R_N};
wherein N denotes the number of dialogue rounds, U_1 and R_1 denote the user utterance and system reply of round 1, respectively, and U_N and R_N denote the user utterance and system reply of round N, respectively;
taking the user utterance and the system reply in one round of dialogue as a sample, segmenting the user utterance and the system reply respectively, inserting a separator [SEP] between the user utterance and the system reply, and inserting an identifier [CLS] at the beginning of the sentence to represent global information, resulting in a sample S = {[CLS], u_1, u_2, …, u_i, [SEP], r_1, r_2, …, r_j};
wherein u_1 and r_1 denote the 1st word in the user utterance and the system reply, respectively; u_2 and r_2 denote the 2nd word in the user utterance and the system reply, respectively; u_i denotes the i-th word in the user utterance and r_j denotes the j-th word in the system reply; i denotes the length of the user utterance after word segmentation; j denotes the length of the system reply after word segmentation;
Step 3.2: label each sample with a dialogue domain tag using the dialogue domain label information merged in the dialogue domain label sorting and merging module, wherein each sample labeled with a dialogue domain tag is represented as:
S = {S_tokens = [CLS], u_1, u_2, …, u_i, [SEP], r_1, r_2, …, r_j; S_domain = d},
wherein d is the dialogue domain label corresponding to the sample, S_tokens is the processed input token sequence in each sample, and S_domain is the dialogue domain label of each sample.
4. A model pre-training system for cross-language dialogue understanding according to claim 3, wherein: the static dictionary determining module is used for respectively collecting static dictionaries translated from English vocabulary to various target languages according to the target languages determined by the target language determining module; the specific process is as follows:
Dictionaries translating English to the target languages are downloaded from https://github.com/facebookresearch/MUSE.
5. The model pre-training system for cross-language dialogue understanding according to claim 4, wherein: the word replacing module is used for randomly selecting a certain proportion of English words on each sample marked with the dialogue field labels in the training corpus sorting module, randomly selecting a language from the target language determined in the target language determining module for each randomly selected word, translating each randomly selected word to a word corresponding to the target language by using a static dictionary collected by the static dictionary determining module, replacing the English word with the word corresponding to the target language, and simultaneously keeping the original English word as a label to be predicted; the specific process is as follows:
setting the randomly selected proportion as p%;
creating an S_goldens array for storing the labels to be predicted, and initializing it with a [PAD] placeholder array, i.e. S_goldens = [PAD], …, [PAD];
creating an S_masks array for storing the position information of replaced words, and initializing it with an all-0 array, i.e. S_masks = 0, …, 0;
for each token t in S_tokens of each sample labeled with a dialogue domain tag in the training corpus sorting module, generating a random number in [0, 1]; if the random number is less than p%, translating t to the corresponding word t_x in the randomly selected target language using the static dictionary collected from the static dictionary determining module, replacing t with t_x at that position in the sample, storing the replaced t at the same position in S_goldens as the label to be predicted, and setting the value of S_masks at this position to 1;
t ∈ {t | t ∈ S_tokens, t ≠ [CLS], t ≠ [SEP]}
an example of a sample after word substitution is
S = {S_tokens = [CLS], …, u_k^x, …, [SEP], …, r_l^x, …, r_m^x, …; S_goldens = [PAD], …, u_k, …, r_l, …, r_m, …, [PAD]; S_masks = 0, …, 1, …, 1, …, 1, …, 0}
wherein u_k^x denotes the target-language word after replacement at position k in the user utterance, r_l^x denotes the target-language word after replacement at position l in the system reply, and r_m^x denotes the target-language word after replacement at position m in the system reply.
6. A model pre-training system for cross-language dialogue understanding according to claim 4 or 5, characterized in that: S_goldens, S_tokens, and S_masks all have the same length, but S_goldens contains a replaced t only at the positions of replaced words; all other positions are [PAD], meaning that no prediction is required there.
7. The cross-language dialogue understanding-oriented model pre-training system of claim 6, wherein: the encoding module obtains an encoded representation of the processed samples in the word replacement module using a cross-language encoding model; the specific process is as follows:
selecting XLM-RoBERTa-base as the cross-language coding model, and encoding the S_tokens processed by word replacement in the word replacement module to obtain the encoded representation of each token:
h_[CLS], h_{u_1}, …, h_[SEP], h_{r_1}, … = Cross_Lingual_Encoder(S_tokens)
wherein Cross_Lingual_Encoder is the cross-language coding model; h_[CLS] and h_[SEP] denote the encoded representations of the [CLS] and [SEP] tags after encoding by the cross-language coding model; h_{u_1} denotes the encoded representation of u_1 after encoding by the cross-language coding model; and h_{r_1} denotes the encoded representation of r_1 after encoding by the cross-language coding model.
8. The cross-language dialogue understanding-oriented model pre-training system of claim 7, wherein: the word replacement prediction module uses a fully-connected neural network, the encoding expression of each word in the sample obtained by the encoding module calculates the probability of the word which is possibly replaced in the dictionary, and the cross entropy loss is calculated through the label to be predicted in the word replacement module; the specific process is as follows:
Step 8.1: using a fully-connected neural network, calculating from the encoded representation of each word in the sample obtained by the encoding module the probability of each possibly replaced word in the dictionary:
z_i = softmax(W h_i + b)
wherein W is the weight of the fully-connected neural network, b is its bias, h_i is the encoded representation of the i-th position obtained in the encoding module, and z_i is the predicted probability for the word at the i-th position;
Step 8.2: through the S_goldens and S_masks constructed in the word replacement module, calculating the cross entropy loss of the word replacement task:
L_i^{WR} = -Σ_{k=1}^{V} ŷ_{i,k} log z_{i,k}
wherein V is the size of the vocabulary, z_{i,k} denotes the predicted probability of the k-th word at the i-th position, ŷ_{i,k} denotes the true label of the k-th word at the i-th position, and L_i^{WR} is the cross entropy loss at position i;
9. The cross-language dialogue understanding-oriented model pre-training system of claim 8, wherein: the dialogue domain prediction module to which the sample belongs uses a fully-connected neural network, the dialogue domain to which the sample belongs is judged by the coding expression of the whole sentence of the sample obtained by the coding module, and the cross entropy loss is calculated through the dialogue domain label marked in the training corpus sorting module; the specific process is as follows:
Step 9.1: using a fully-connected neural network, calculating the probability of the dialogue domain to which the sample belongs from the encoded representation h_[CLS] of the identifier [CLS] obtained by the encoding module:
z = softmax(W h_[CLS] + b)
Step 9.2: calculating the cross entropy loss of the dialogue domain classification task through the dialogue domain labels annotated in the training corpus sorting module.
10. The cross-language dialogue understanding-oriented model pre-training system of claim 9, wherein in step 9.2 the cross entropy loss of the dialogue domain classification task is calculated through the dialogue domain labels annotated in the training corpus sorting module:
L^{DC} = -Σ_{i=1}^{D} ŷ_i log z_i
wherein D is the number of dialogue domain labels collected in the dialogue domain label sorting and merging module, z_i is the predicted probability of the i-th dialogue domain label in z, and ŷ_i is the true label of the current sample for the i-th dialogue domain (0 or 1, where 1 denotes that the i-th dialogue domain label is consistent with S_domain, otherwise 0).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110667409.9A CN113312453B (en) | 2021-06-16 | 2021-06-16 | Model pre-training system for cross-language dialogue understanding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110667409.9A CN113312453B (en) | 2021-06-16 | 2021-06-16 | Model pre-training system for cross-language dialogue understanding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113312453A true CN113312453A (en) | 2021-08-27 |
CN113312453B CN113312453B (en) | 2022-09-23 |
Family
ID=77379146
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110667409.9A Active CN113312453B (en) | 2021-06-16 | 2021-06-16 | Model pre-training system for cross-language dialogue understanding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113312453B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115455981A (en) * | 2022-11-11 | 2022-12-09 | 合肥智能语音创新发展有限公司 | Semantic understanding method, device, equipment and storage medium for multi-language sentences |
CN116628160A (en) * | 2023-05-24 | 2023-08-22 | 中南大学 | Task type dialogue method, system and medium based on multiple knowledge bases |
CN116805004A (en) * | 2023-08-22 | 2023-09-26 | 中国科学院自动化研究所 | Zero-resource cross-language dialogue model training method, device, equipment and medium |
CN117149987A (en) * | 2023-10-31 | 2023-12-01 | 中国科学院自动化研究所 | Training method and device for multilingual dialogue state tracking model |
CN117648430A (en) * | 2024-01-30 | 2024-03-05 | 南京大经中医药信息技术有限公司 | Dialogue type large language model supervision training evaluation system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108960317A (en) * | 2018-06-27 | 2018-12-07 | 哈尔滨工业大学 | Across the language text classification method with Classifier combination training is indicated based on across language term vector |
CN109213851A (en) * | 2018-07-04 | 2019-01-15 | 中国科学院自动化研究所 | Across the language transfer method of speech understanding in conversational system |
CN111326138A (en) * | 2020-02-24 | 2020-06-23 | 北京达佳互联信息技术有限公司 | Voice generation method and device |
-
2021
- 2021-06-16 CN CN202110667409.9A patent/CN113312453B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108960317A (en) * | 2018-06-27 | 2018-12-07 | 哈尔滨工业大学 | Across the language text classification method with Classifier combination training is indicated based on across language term vector |
CN109213851A (en) * | 2018-07-04 | 2019-01-15 | 中国科学院自动化研究所 | Across the language transfer method of speech understanding in conversational system |
CN111326138A (en) * | 2020-02-24 | 2020-06-23 | 北京达佳互联信息技术有限公司 | Voice generation method and device |
Non-Patent Citations (1)
Title |
---|
DECHUAN TENG等: "《INJECTING WORD INFORMATION WITH MULTI-LEVEL WORD ADAPTER FOR CHINESE SPOKEN LANGUAGE UNDERSTANDING》", 《2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115455981A (en) * | 2022-11-11 | 2022-12-09 | 合肥智能语音创新发展有限公司 | Semantic understanding method, device, equipment and storage medium for multilingual sentences |
CN115455981B (en) * | 2022-11-11 | 2024-03-19 | 合肥智能语音创新发展有限公司 | Semantic understanding method, device, equipment and storage medium for multilingual sentences |
CN116628160A (en) * | 2023-05-24 | 2023-08-22 | 中南大学 | Task-oriented dialogue method, system and medium based on multiple knowledge bases |
CN116628160B (en) * | 2023-05-24 | 2024-04-19 | 中南大学 | Task-oriented dialogue method, system and medium based on multiple knowledge bases |
CN116805004A (en) * | 2023-08-22 | 2023-09-26 | 中国科学院自动化研究所 | Zero-resource cross-language dialogue model training method, device, equipment and medium |
CN116805004B (en) * | 2023-08-22 | 2023-11-14 | 中国科学院自动化研究所 | Zero-resource cross-language dialogue model training method, device, equipment and medium |
CN117149987A (en) * | 2023-10-31 | 2023-12-01 | 中国科学院自动化研究所 | Training method and device for multilingual dialogue state tracking model |
CN117149987B (en) * | 2023-10-31 | 2024-02-13 | 中国科学院自动化研究所 | Training method and device for multilingual dialogue state tracking model |
CN117648430A (en) * | 2024-01-30 | 2024-03-05 | 南京大经中医药信息技术有限公司 | Supervised training evaluation system for conversational large language models |
CN117648430B (en) * | 2024-01-30 | 2024-04-16 | 南京大经中医药信息技术有限公司 | Supervised training evaluation system for conversational large language models |
Also Published As
Publication number | Publication date |
---|---|
CN113312453B (en) | 2022-09-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113312453B (en) | Model pre-training system for cross-language dialogue understanding | |
CN108614875B (en) | Chinese sentiment polarity classification method based on a global average pooling convolutional neural network | |
CN106202010B (en) | Method and apparatus for building syntax trees of legal texts based on deep neural networks | |
CN110969020B (en) | Chinese named entity recognition method, system and medium based on CNN and an attention mechanism | |
CN109635280A (en) | Annotation-based event extraction method | |
CN107729309A (en) | Method and device for deep-learning-based Chinese semantic analysis | |
CN106980608A (en) | Chinese electronic health record word segmentation and named entity recognition method and system | |
CN110083700A (en) | Enterprise public opinion sentiment classification method and system based on convolutional neural networks | |
CN109960728B (en) | Method and system for recognizing named entities in open-domain conference information | |
CN109359293A (en) | Neural-network-based Mongolian named entity recognition method and recognition system | |
CN110851599B (en) | Automatic scoring method for Chinese compositions and teaching assistance system | |
CN112733866A (en) | Network construction method for improving the correctness of controllable image text descriptions | |
CN110046356B (en) | Label-embedding-based multi-label emotion classification method for microblog text | |
CN111222318B (en) | Trigger word recognition method based on a dual-channel bidirectional LSTM-CRF network | |
CN110263325A (en) | Automatic Chinese word segmentation device | |
CN108563725A (en) | Chinese symptom and sign component recognition method | |
CN111523420A (en) | Header classification and header list semantic recognition method based on a multitask deep neural network | |
CN110472245A (en) | Multi-label emotion intensity prediction method based on hierarchical convolutional neural networks | |
CN110222338A (en) | Organization name entity recognition method | |
CN113128232A (en) | Named entity recognition method based on ALBERT and multi-word information embedding | |
CN109977402A (en) | Named entity recognition method and system | |
CN109446523A (en) | Entity attribute extraction model based on BiLSTM and conditional random fields | |
CN114254645A (en) | Artificial intelligence assisted writing system | |
CN113312918B (en) | Legal named entity recognition method fusing radical vectors with word segmentation and a capsule network | |
CN114003700A (en) | Method and system for processing session information, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||