CN113312453A - Model pre-training system for cross-language dialogue understanding - Google Patents

Model pre-training system for cross-language dialogue understanding

Info

Publication number
CN113312453A
Authority
CN
China
Prior art keywords
module
dialogue
word
language
cross
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110667409.9A
Other languages
Chinese (zh)
Other versions
CN113312453B (en)
Inventor
车万翔
李祺欣
覃立波
刘挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN202110667409.9A priority Critical patent/CN113312453B/en
Publication of CN113312453A publication Critical patent/CN113312453A/en
Application granted granted Critical
Publication of CN113312453B publication Critical patent/CN113312453B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3337 Translation of the query language, e.g. Chinese to English
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Abstract

The invention relates to a model pre-training system for cross-language dialogue understanding. The invention aims to solve the problems that, in existing cross-language dialogue understanding scenarios, labeled corpora for low-resource languages are scarce, so the model training effect is limited, an accurate dialogue understanding system cannot be obtained, and accurate replies to user utterances cannot be given. The model pre-training system for cross-language dialogue understanding comprises: a data acquisition module, a dialogue field label sorting and merging module, a training corpus sorting module, a target language type determining module, a static dictionary determining module, a word replacing module, a coding module, a word replacing and predicting module, a sample belonging dialogue field predicting module, an integral model acquiring module, a training module and a cross-language dialogue understanding field downstream task fine-tuning module. The invention is used in the field of cross-language dialogue understanding.

Description

Model pre-training system for cross-language dialogue understanding
Technical Field
The invention relates to a model pre-training system for cross-language dialogue understanding, and in particular to a cross-language model pre-training system and a dialogue understanding model training system in the field of natural language processing.
Background
Currently, human-machine dialogue systems have become a leading research hotspot in industry because of their huge practical value and prospects. In fact, as early as the 1960s, Professor Joseph Weizenbaum of the Massachusetts Institute of Technology began developing the human-machine dialogue system ELIZA (Weizenbaum J. ELIZA - a computer program for the study of natural language communication between man and machine [J]. Communications of the ACM, 1966, 9(1): 36-45.), which could mimic the responses of a psychotherapist and provide assistance to patients with psychological illnesses. In the years since, human-machine dialogue systems for various purposes have been developed, thanks to the rapid development of natural language processing (Chowdhury G. Natural language processing [J]. Annual Review of Information Science and Technology, 2003, 37(1): 51-89.) and deep learning (LeCun Y, Bengio Y, Hinton G. Deep learning [J]. Nature, 2015, 521(7553): 436-444.). The most important module behind these human-machine dialogue systems is the dialogue understanding system.
The dialogue understanding system is able to understand the user's intention and give corresponding replies and help, such as weather inquiry, flight reservation, meal ordering, device control for smart homes, voice control for vehicle-mounted devices, and so on. At present, industry has deployed many dialogue understanding systems on mobile phones or smart home devices, but most of them only support widely used languages such as Chinese and English. Likewise, model pre-training for dialogue understanding systems in academia (Wu C S, Hoi S, Socher R, et al. TOD-BERT: Pre-trained natural language understanding for task-oriented dialogue [J]. arXiv preprint arXiv:2004.06871, 2020.) has been limited to English, and the cross-language scenario has rarely been studied. An important reason for this situation is that labeled dialogue understanding corpora for low-resource languages are scarce, so how to effectively utilize existing dialogue understanding corpora to assist training in the cross-language scenario is a problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to solve the problems that, in existing cross-language dialogue understanding scenarios, labeled corpora for low-resource languages are scarce, so the model training effect is limited, an accurate dialogue understanding system cannot be obtained, and accurate replies to user utterances cannot be given, and provides a model pre-training system for cross-language dialogue understanding.
A model pre-training system for cross-language dialogue understanding comprises:
the system comprises a data acquisition module, a dialogue field label sorting and merging module, a training corpus sorting module, a target language type determining module, a static dictionary determining module, a word replacing module, a coding module, a word replacing and predicting module, a sample belonging dialogue field predicting module, an integral model acquiring module, a training module and a cross-language dialogue understanding field downstream task fine-tuning module;
the data acquisition module is used for collecting an English data set in the labeled dialogue understanding field;
the dialogue domain label sorting and merging module is used for sorting dialogue domain labels marked on all data sets in the data acquisition module and merging dialogue domain labels with the same meaning on different data sets;
the training corpus sorting module is used for segmenting the dialogue corpora in all data sets collected by the data acquisition module, taking the user utterance and the system reply in one round of dialogue as one sample, tokenizing the user utterance and the system reply respectively, and at the same time labeling each sample with a dialogue domain label by using the dialogue domain label information merged in the dialogue domain label sorting and merging module;
the target language determining module is used for determining a target language;
the static dictionary determining module is used for respectively collecting static dictionaries translated from English vocabulary to various target languages according to the target languages determined by the target language determining module;
the word replacing module is used for randomly selecting a certain proportion of English words on each sample marked with the dialogue field labels in the training corpus sorting module, randomly selecting a language from the target language determined in the target language determining module for each randomly selected word, translating each randomly selected word to a word corresponding to the target language by using a static dictionary collected by the static dictionary determining module, replacing the English word with the word corresponding to the target language, and simultaneously keeping the original English word as a label to be predicted;
the coding module obtains a coded representation of the processed sample in the word replacement module by using a cross-language coding model;
the word replacement prediction module uses a fully-connected neural network to calculate, from the encoded representation of each word in the sample obtained by the coding module, the probability over the dictionary of the word that may have been replaced, and calculates the cross-entropy loss against the label to be predicted retained in the word replacement module;
the dialogue domain prediction module to which the sample belongs uses a fully-connected neural network to judge the dialogue domain to which the sample belongs from the encoded representation of the whole sample sentence obtained by the coding module, and calculates the cross-entropy loss against the dialogue domain label annotated in the training corpus sorting module;
the integral model acquisition module adds the cross-entropy loss obtained by the word replacement prediction module and the cross-entropy loss obtained by the dialogue domain prediction module to which the sample belongs to obtain the final loss;
through the final loss, back propagation is performed on the integral model and the parameters of the integral model are updated;
the integral model in the integral model acquisition module consists of the cross-language coding model in the coding module, the fully-connected neural network in the word replacement prediction module, and the fully-connected neural network in the dialogue domain prediction module to which the sample belongs;
the training module trains an integral model in the integral model acquisition module by using the processed data in the training corpus sorting module and the word replacement module;
and the downstream task fine-tuning module in the cross-language dialogue understanding field uses the whole model trained by the training module as a pre-training model, and completes the tasks in the cross-language dialogue understanding field based on the pre-training model.
The invention has the beneficial effects that:
the invention provides a model pre-training system for cross-language dialogue understanding, which does not depend on cross-language labeled dialogue understanding data and can pre-train a dialogue understanding model in a cross-language scene only by utilizing the existing English data. In addition, the invention designs a self-supervision task, and utilizes a dictionary to automatically label, so that the model can learn the mapping relation between English words and words of other languages which are translation pairs in the pre-training process, thereby improving the overall expression between other languages and English on the pre-training model. Particularly, the invention also summarizes the dialogue domain labels in different English dialogue understanding data sets, and trains the model by using the labeled information, so that the model can learn the special knowledge of the dialogue understanding domain in the pre-training process. The method solves the problems that in the existing cross-language dialogue understanding scene, due to the fact that corpus of a small language is scarce, model training effect is limited, an accurate dialogue understanding system cannot be obtained, and accurate reply cannot be completed to user words.
The invention is evaluated on a dialogue language understanding dataset in ten languages: Arabic, German, Spanish, French, Italian, Malay, Polish, Russian, Thai and Turkish. The dataset covers the two most classical subtasks in the dialogue understanding field: intent recognition and slot extraction. Experimental results show that the model pre-trained by the present method obtains better results than the baseline model when trained on the downstream task.
The invention trains on the dialogue language understanding dataset in each of the ten languages with five random seeds, takes the average over the five seeds as the result for each language, and then averages over the ten languages. The model pre-trained by the present method achieves an intent recognition accuracy of 93.73%, a 4.17% improvement over the baseline model; a slot extraction F1 value of 66.80%, a 3.03% improvement over the baseline model; and an overall intent and slot prediction accuracy of 38.01%, a 3.60% improvement over the baseline model. The large improvement in every metric shows that the proposed system is very effective for pre-training cross-language dialogue understanding models.
Drawings
FIG. 1 is a Sankey diagram of the dialogue domain label summarization and categorization results over multiple dialogue understanding datasets.
Detailed Description
The first embodiment is as follows: the model pre-training system for cross-language dialogue understanding includes:
the system comprises a data acquisition module, a dialogue field label sorting and merging module, a training corpus sorting module, a target language type determining module, a static dictionary determining module, a word replacing module, a coding module, a word replacing and predicting module, a sample belonging dialogue field predicting module, an integral model acquiring module, a training module and a cross-language dialogue understanding field downstream task fine-tuning module;
the data acquisition module is used for collecting an English data set in the labeled dialogue understanding field;
Eight classical public English dialogue understanding datasets widely used in industry were collected, including CamRest676 (Tsung-Hsien Wen, David Vandyke, Nikola Mrkšić, Milica Gašić, Lina M Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young. 2016. A network-based end-to-end trainable task-oriented dialogue system. arXiv preprint arXiv:1604.04562.), WOZ (Nikola Mrkšić, Diarmuid Ó Séaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young. 2016. Neural belief tracker: Data-driven dialogue state tracking. arXiv preprint arXiv:1606.03777.), SMD (Mihail Eric and Christopher D Manning. 2017. Key-value retrieval networks for task-oriented dialogue. arXiv preprint arXiv:1705.05414.), MSR-E2E (Xiujun Li, Sarah Panda, JJ (Jingjing) Liu, and Jianfeng Gao. 2018. Microsoft dialogue challenge: Building end-to-end task-completion dialogue systems. In SLT 2018.), Taskmaster (Bill Byrne, Karthik Krishnamoorthi, Chinnadhurai Sankar, Arvind Neelakantan, Daniel Duckworth, Semih Yavuz, Ben Goodrich, Amit Dubey, Andy Cedilnik, and Kyu-Young Kim. 2019. Taskmaster-1: Toward a realistic and diverse dialog dataset. arXiv preprint arXiv:1909.05358.), Schema (Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, and Pranav Khaitan. 2019. Towards scalable multi-domain conversational agents: The schema-guided dialogue dataset. arXiv preprint arXiv:1909.05855.), MetaLWOZ (Sungjin Lee, Hannes Schulz, Adam Atkinson, Jianfeng Gao, Kaheer Suleman, Layla El Asri, Mahmoud Adada, Minlie Huang, Shikhar Sharma, Wendy Tay, and Xiujun Li. 2019. Multi-domain task-completion dialog challenge. In Dialog System Technology Challenges 8.), and MultiWOZ (Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Inigo Casanueva, Stefan Ultes, Osman Ramadan, and Milica Gašić. 2018. MultiWOZ - a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling. arXiv preprint arXiv:1810.00278.).
The dialogue domain label sorting and merging module is used for sorting dialogue domain labels marked on all data sets in the data acquisition module (such as weather inquiry, scheduled flight, meal ordering, intelligent household equipment control, vehicle-mounted equipment voice control and the like), and merging dialogue domain labels with the same meaning on different data sets (such as weather inquiry, scheduled flight, meal ordering, intelligent household equipment control, vehicle-mounted equipment voice control and the like);
the training corpus sorting module is used for segmenting the dialogue corpora in all data sets collected by the data acquisition module, taking the user utterance and the system reply in one round of dialogue as one sample, tokenizing the user utterance and the system reply by whitespace, and labeling each sample with a dialogue domain label by using the dialogue domain label information merged in the dialogue domain label sorting and merging module;
the target language determining module is used for determining a target language in the pre-training process according to the research current situations in the academic world and the industry and the use range and frequency of each international language;
we manually selected 10 representative languages from among the languages used in various countries: Arabic, German, Spanish, French, Italian, Malay, Polish, Russian, Thai and Turkish;
the static dictionary determining module is used for respectively collecting static dictionaries translated from English vocabulary to various target languages according to the target languages determined by the target language determining module;
the word replacing module is used for randomly selecting a certain proportion of English words on each sample marked with the dialogue field labels in the training corpus sorting module, randomly selecting a language from the target language determined in the target language determining module for each randomly selected word, translating each randomly selected word to a word corresponding to the target language by using a static dictionary collected by the static dictionary determining module, replacing the English words with the words corresponding to the target language, and simultaneously keeping original English words (randomly selecting a certain proportion of English words on each sample marked with the dialogue field labels in the training corpus sorting module) as labels to be predicted;
the coding module obtains a coded representation of the processed sample in the word replacement module by using a cross-language coding model;
the word replacement prediction module uses a fully-connected neural network (the fully-connected neural networks of the word replacement prediction module and of the dialogue domain prediction module to which the sample belongs are different and have different parameters) to calculate, from the encoded representation of each word in the sample obtained by the coding module, the probability over the dictionary of the word that may have been replaced, and calculates the cross-entropy loss against the label to be predicted retained in the word replacement module;
the dialogue domain prediction module to which the sample belongs uses a fully-connected neural network (different from, and with different parameters than, the one in the word replacement prediction module) to judge the dialogue domain to which the sample belongs from the encoded representation of the whole sample sentence obtained by the coding module (one sample is the user utterance and the system reply in one round of dialogue; a sample contains several words, which form one sentence, so one sample is one whole sentence), and calculates the cross-entropy loss against the dialogue domain label annotated in the training corpus sorting module;
the integral model acquisition module adds the cross-entropy loss obtained by the word replacement prediction module and the cross-entropy loss obtained by the dialogue domain prediction module to which the sample belongs to obtain the final loss;
through the final loss, back propagation is performed on the integral model and the parameters of the integral model are updated;
the integral model in the integral model acquisition module consists of the cross-language coding model in the coding module, the fully-connected neural network in the word replacement prediction module, and the fully-connected neural network in the dialogue domain prediction module to which the sample belongs;
the training module trains an integral model in the integral model acquisition module by using the processed data in the training corpus sorting module and the word replacement module;
the downstream task fine tuning module in the cross-language dialogue understanding field uses the whole model trained by the training module as a pre-training model, and completes tasks in the cross-language dialogue understanding field based on the pre-training model;
tasks in the cross-language dialogue understanding field include cross-language dialogue language understanding, cross-language intent recognition, cross-language dialogue state tracking, cross-language dialogue behavior prediction, cross-language reply selection, and the like. The parameters of the pre-trained model are used as the initialization parameters of BERT-architecture models for cross-language dialogue language understanding, cross-language intent recognition, cross-language dialogue state tracking, cross-language dialogue behavior prediction, cross-language reply selection and other tasks; these models are then trained separately to obtain the corresponding trained models, thereby completing the tasks of cross-language dialogue language understanding, cross-language intent recognition, cross-language dialogue state tracking, cross-language dialogue behavior prediction, cross-language reply selection, and so on.
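For illustration, a minimal sketch of this initialization step, assuming the HuggingFace Transformers library and a hypothetical directory name for the saved pre-trained encoder (the invention does not prescribe a particular toolkit):

```python
# Illustrative sketch: reuse the pre-trained cross-language encoder to initialize a
# downstream dialogue-understanding model. The directory name is hypothetical.
from transformers import AutoModel

# after pre-training, the encoder would have been saved, e.g. with
# encoder.save_pretrained("cross-lingual-dialogue-pretrained/")
downstream_encoder = AutoModel.from_pretrained("cross-lingual-dialogue-pretrained/")
# a task head (intent recognition, slot filling, dialogue state tracking, ...) is then
# stacked on top of this encoder and the whole model is fine-tuned on the target task.
```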
The second embodiment is as follows: this embodiment differs from the first embodiment in that the dialogue domain label sorting and merging module is configured to sort the dialogue domain labels annotated on all data sets in the data acquisition module (for example, weather inquiry, flight booking, meal ordering, device control for smart homes, voice control for vehicle-mounted devices, and the like) and to merge dialogue domain labels having the same meaning on different data sets; the specific process is as follows:
Step 2.1: the dialogue domain labels annotated on all data sets in the data acquisition module are sorted; CamRest676 has 1 dialogue domain label, WOZ has 1, SMD has 3, MSR-E2E has 3, Taskmaster has 6, Schema has 17, MetaLWOZ has 47, and MultiWOZ has 6;
Step 2.2: through manual screening, dialogue domain labels with the same meaning on different data sets are grouped into the same category; the categorization result is shown in FIG. 1. In FIG. 1, the text on the left gives the name of each data set and its number of samples, and the text on the right gives the name of each merged dialogue domain and the number of samples it contains; the sums of the numbers on the two sides are equal. An arc connecting the left and right sides indicates that a portion of the samples in the data set on the left is assigned the dialogue domain label on the right, and the width of the arc indicates the proportion of those samples among all samples. After sorting and merging, the 8 data sets yield 59 dialogue domain labels in total.
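As a concrete illustration of this merging step, the sketch below (Python; the dataset and label strings are hypothetical examples, not the actual 59 merged labels) shows how a manually curated mapping can be applied:

```python
# Map each (dataset, raw domain label) pair to its manually merged category.
DOMAIN_MERGE_MAP = {
    ("SMD", "weather"): "weather_query",
    ("Schema", "Weather_1"): "weather_query",
    ("MultiWOZ", "restaurant"): "restaurant_booking",
    ("MSR-E2E", "restaurant"): "restaurant_booking",
    # ... remaining pairs curated by manual screening
}

def merge_domain_label(dataset_name: str, raw_label: str) -> str:
    """Return the merged dialogue-domain label for a dataset-specific label."""
    return DOMAIN_MERGE_MAP.get((dataset_name, raw_label), raw_label)
```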
Other steps and parameters are the same as those in the first embodiment.
The third embodiment is as follows: this embodiment differs from the first or second embodiment in that the training corpus sorting module is used for segmenting the dialogue corpora in all data sets collected by the data acquisition module, taking the user utterance and the system reply in one round of dialogue as one sample, tokenizing the user utterance and the system reply by whitespace, and labeling each sample with a dialogue domain label by using the dialogue domain label information merged in the second embodiment; the specific process is as follows:
Step 3.1: the dialogue understanding corpora in the data sets collected by the data acquisition module consist of multi-turn dialogues, and each dialogue can be expressed as D = {U_1, R_1, ..., U_N, R_N};

where N is the number of dialogue turns, U_1 and R_1 are the user utterance and the system reply of the 1st turn, and U_N and R_N are the user utterance and the system reply of the N-th turn;

the user utterance and the system reply of one turn are taken as one sample and tokenized by whitespace; a separator [SEP] is inserted between them and an identifier [CLS] is inserted at the beginning of the sentence to represent global information, giving a sample S = {[CLS], u_1, u_2, ..., u_i, [SEP], r_1, r_2, ..., r_j};

where u_1 and r_1 are the 1st word in the user utterance and in the system reply respectively, u_2 and r_2 are the 2nd word in the user utterance and in the system reply respectively, u_i is the i-th word in the user utterance, r_j is the j-th word in the system reply, i is the length of the tokenized user utterance, and j is the length of the tokenized system reply;

Step 3.2: each sample is labeled with a dialogue domain label by using the dialogue domain label information merged in the dialogue domain label sorting and merging module (since some of the samples collected in step 1 are annotated with several dialogue domain labels and the invention only considers the single-label case, samples that belong to several dialogue domains after labeling are discarded); each sample labeled with a dialogue domain label is expressed as:

S = {S_tokens = [CLS], u_1, u_2, ..., u_i, [SEP], r_1, r_2, ..., r_j; S_domain = d},

where d is the dialogue domain label of the sample, S_tokens is the processed input token sequence of the sample (the tokens are the words u_1, u_2, ..., r_1, r_2, ... together with [CLS] and [SEP]), and S_domain is the dialogue domain label of the sample;

the finished samples amount to 457,555 in total;
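For illustration, a minimal sketch of how one such sample can be built from a single dialogue turn (Python; the dictionary field names are assumptions, while the [CLS]/[SEP] layout, whitespace tokenization and the single-domain filter follow the description above):

```python
def build_sample(user_utterance: str, system_reply: str, domains: list):
    """Build one training sample, or return None if the turn carries several domain labels."""
    if len(domains) != 1:                      # only single-domain samples are kept
        return None
    tokens = ["[CLS]"] + user_utterance.split() + ["[SEP]"] + system_reply.split()
    return {"tokens": tokens, "domain": domains[0]}

sample = build_sample("i need a cheap restaurant in the north",
                      "what kind of food would you like",
                      ["restaurant_booking"])
```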
other steps and parameters are the same as those in the first or second embodiment.
The fourth embodiment is as follows: this embodiment differs from the first to third embodiments in that the static dictionary determining module is configured to collect, according to the target languages determined by the target language determining module, static dictionaries that translate English vocabulary into each target language; the specific process is as follows:
The dictionaries that translate English into each target language are downloaded from https://github.com/facebookresearch/MUSE.
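For illustration, a small sketch for loading such a static dictionary, assuming the usual MUSE ground-truth dictionary format of one "english_word target_word" pair per line (an assumption about the file layout, not a statement from the patent):

```python
from collections import defaultdict

def load_static_dict(path: str) -> dict:
    """Load an English-to-target-language dictionary; a word may have several translations."""
    en2tgt = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                en2tgt[parts[0]].append(parts[1])
    return dict(en2tgt)

# e.g. en_de = load_static_dict("en-de.txt")   # hypothetical file name
```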
Other steps and parameters are the same as those in one of the first to third embodiments.
The fifth embodiment is as follows: this embodiment differs from the first to fourth embodiments in that the word replacement module is configured to randomly select a certain proportion of English words in each sample labeled with a dialogue domain label in the training corpus sorting module, randomly select a language from the target languages determined by the target language determining module for each selected word, translate each selected word into the corresponding word of that target language using the static dictionary collected by the static dictionary determining module, replace the English word with the corresponding target-language word, and at the same time retain the original English word as the label to be predicted; the specific process is as follows:
The randomly selected proportion is set to p% (here 15%);

at the same time, an array S_goldens of the same length as S_tokens is created to store the gold labels that the model should predict, and is initialized with the placeholder [PAD], i.e. S_goldens = {[PAD], ..., [PAD]};

in addition, an array S_masks of the same length as S_tokens is created to store the positions of the replaced words, and is initialized with all zeros, i.e. S_masks = {0, ..., 0};

for each token t in the S_tokens of every sample labeled with a dialogue domain label in the training corpus sorting module, where t ∈ {t | t ∈ S_tokens, t ≠ [CLS], t ≠ [SEP]}, a random number between 0 and 1 is generated; if the random number is smaller than p%, a target language is selected with equal probability from the 10 languages determined in the target language determining module, t is translated into the corresponding word t_x of the selected target language using the static dictionary collected in the static dictionary determining module, t is replaced by t_x at its position in the sample, the replaced t is stored at the same position of S_goldens as the label to be predicted, and the value of S_masks at that position is set to 1;

an example of a sample after word replacement is

S = {S_tokens = [CLS], u_1, ..., u_k^x, ..., u_i, [SEP], r_1, ..., r_l^x, ..., r_m^x, ..., r_j; S_goldens = [PAD], ..., u_k, ..., r_l, ..., r_m, ..., [PAD]; S_masks = 0, ..., 1, ..., 1, ..., 1, ..., 0}

where u_k^x is the target-language word that replaces u_k at position k of the user utterance, r_l^x is the target-language word that replaces r_l at position l of the system reply, and r_m^x is the target-language word that replaces r_m at position m of the system reply.
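A minimal sketch of this word-replacement (code-switching) step in Python; the 15% ratio and the equal-probability language choice follow the description above, while the handling of words missing from the dictionary is an added assumption:

```python
import random

def replace_words(tokens, dictionaries, languages, p=0.15):
    """tokens: S_tokens; dictionaries: {language: english->translations}; returns S_tokens', S_goldens, S_masks."""
    goldens = ["[PAD]"] * len(tokens)          # S_goldens, initialized with [PAD]
    masks = [0] * len(tokens)                  # S_masks, initialized with zeros
    out = list(tokens)
    for i, t in enumerate(tokens):
        if t in ("[CLS]", "[SEP]"):            # special tokens are never replaced
            continue
        if random.random() < p:
            lang = random.choice(languages)    # equal probability over the 10 target languages
            translations = dictionaries[lang].get(t.lower())
            if not translations:               # assumption: skip words absent from the dictionary
                continue
            out[i] = random.choice(translations)   # t_x, the target-language word
            goldens[i] = t                     # original English word kept as the label
            masks[i] = 1
    return out, goldens, masks
```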
Other steps and parameters are the same as in one of the first to fourth embodiments.
The sixth embodiment is as follows: this embodiment differs from the first to fifth embodiments in that S_goldens, S_tokens and S_masks all have the same length, but S_goldens holds the replaced t only at the positions of the replaced words, and all other positions are [PAD], meaning that no prediction is required there.
Other steps and parameters are the same as those in one of the first to fifth embodiments.
The seventh embodiment is as follows: this embodiment differs from the first to sixth embodiments in that the coding module uses a cross-language coding model to obtain the encoded representation of the samples processed by the word replacement module; the specific process is as follows:

XLM-RoBERTa-base (Conneau A, Khandelwal K, Goyal N, et al. Unsupervised cross-lingual representation learning at scale [J]. arXiv preprint arXiv:1911.02116, 2019.) is selected as the cross-language coding model, and the S_tokens processed by the word replacement module is encoded to obtain the encoded representation of every token:

H = {h_[CLS], h_u1, ..., h_ui, h_[SEP], h_r1, ..., h_rj} = Cross_Lingual_Encoder(S_tokens)

where Cross_Lingual_Encoder is the cross-language coding model, h_[CLS] and h_[SEP] are the encoded representations of the [CLS] and [SEP] tokens after encoding by the cross-language coding model, h_u1 is the encoded representation of u_1 after encoding by the cross-language coding model, and h_r1 is the encoded representation of r_1 after encoding by the cross-language coding model.
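For illustration, a hedged sketch of the coding module using xlm-roberta-base from HuggingFace Transformers (an assumed implementation choice; the patent only names the model). XLM-R's own <s>/</s> special tokens play the role of [CLS]/[SEP], and each word is represented by its first sub-token:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")

def encode(words):
    """words: the word list of one code-switched sample; returns (h_cls, per-word encodings)."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    hidden = encoder(**enc).last_hidden_state[0]        # (num_subtokens, 768)
    h_cls = hidden[0]                                   # sentence-level <s> representation
    first = {}                                          # first sub-token index of every word
    for pos, wid in enumerate(enc.word_ids()):
        if wid is not None and wid not in first:
            first[wid] = pos
    h_words = hidden[[first[w] for w in sorted(first)]] # one vector per original word
    return h_cls, h_words
```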
Other steps and parameters are the same as those in one of the first to sixth embodiments.
The eighth embodiment is as follows: this embodiment differs from the first to seventh embodiments in that the word replacement prediction module uses a fully-connected neural network (the fully-connected neural networks of the word replacement prediction module and of the dialogue domain prediction module to which the sample belongs are different and have different parameters) to calculate, from the encoded representation of each word in the sample obtained by the coding module, the probability over the dictionary of the word that may have been replaced, and calculates the cross-entropy loss against the label to be predicted retained in the word replacement module; the specific process is as follows:

Step 8.1: using a fully-connected neural network, the probability over the dictionary of the possibly replaced word is calculated from the encoded representation of each word in the sample obtained by the coding module:

z_i = softmax(W · h_i + b)

where W is the weight of the fully-connected neural network, b is the bias of the fully-connected neural network, h_i is the encoded representation of the i-th position obtained in the coding module, and z_i is the predicted probability distribution for the word at the i-th position (the prediction for the word at the 1st position is z_1, for the word at the x-th position z_x; the word replacement task predicts the replaced word separately for each position);

Step 8.2: through the S_goldens and S_masks constructed in the word replacement module, the cross-entropy loss of the word replacement task (Word Replacement, WR for short) is calculated:

L_WR^(i) = - Σ_{k=1}^{V} y_{i,k} · log z_{i,k}

where V is the size of the vocabulary, z_{i,k} is the predicted probability of the k-th word at the i-th position, y_{i,k} is the true label (0 or 1) of the k-th word at the i-th position (1 means the k-th word is identical to the word at the i-th position of S_goldens, otherwise 0), and L_WR^(i) is the cross-entropy loss at position i;

L_WR = Σ_{i : S_masks[i] = 1} L_WR^(i)

where i ranges over the positions of the replaced words stored in S_masks, S_masks[i] is the value at the i-th position of S_masks, and L_WR is the sum of the losses over the positions of all replaced words.
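A minimal sketch of the word-replacement prediction head and its masked cross-entropy loss in PyTorch, following the formulas above (the hidden size, the use of ignore_index, and mean rather than sum reduction are implementation assumptions):

```python
import torch
import torch.nn as nn

class WordReplacementHead(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size)          # W and b of the formula
        self.loss_fn = nn.CrossEntropyLoss(ignore_index=-100)   # averaged over replaced positions

    def forward(self, hidden, golden_ids, masks):
        """hidden: (seq, H) encodings; golden_ids: (seq,) vocabulary ids of S_goldens; masks: (seq,) S_masks."""
        logits = self.proj(hidden)                               # z_i before the softmax
        targets = golden_ids.clone()
        targets[masks == 0] = -100                               # [PAD] positions contribute no loss
        return self.loss_fn(logits, targets)                     # L_WR (mean instead of sum)
```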
Other steps and parameters are the same as those in one of the first to seventh embodiments.
The ninth embodiment is as follows: this embodiment differs from the first to eighth embodiments in that the dialogue domain prediction module to which the sample belongs uses a fully-connected neural network (different from, and with different parameters than, the one in the word replacement prediction module) to judge the dialogue domain to which the sample belongs from the encoded representation of the whole sample sentence obtained by the coding module (one sample is the user utterance and the system reply in one round of dialogue; a sample contains several words, which form one sentence, so one sample is one whole sentence), and calculates the cross-entropy loss against the dialogue domain label annotated in the training corpus sorting module; the specific process is as follows:

Step 9.1: using a fully-connected neural network, the probability of the dialogue domain to which the sample belongs is calculated from the encoded representation h_[CLS] of the identifier [CLS] in the sample obtained by the coding module:

z' = softmax(W' · h_[CLS] + b')

where W' is the weight of the fully-connected neural network and b' is the bias of the fully-connected neural network;

Step 9.2: the cross-entropy loss of the dialogue domain classification task (Domain Classifier, DC for short) is calculated through the dialogue domain label annotated in the training corpus sorting module.
Other steps and parameters are the same as those in the first to eighth embodiments.
The tenth embodiment is as follows: this embodiment differs from the first to ninth embodiments in that, in Step 9.2, the cross-entropy loss of the dialogue domain classification task (Domain Classifier, DC for short) is calculated through the dialogue domain label annotated in the training corpus sorting module as follows:

L_DC = - Σ_{i=1}^{D} y'_i · log z'_i

where D is the number of dialogue domain labels collected in the dialogue domain label sorting and merging module, z'_i is the predicted probability of the i-th dialogue domain label in z', and y'_i is the true label (0 or 1) of the i-th dialogue domain for the current sample (1 means the i-th dialogue domain label is consistent with S_domain, otherwise 0).
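A corresponding sketch of the dialogue-domain classification head over the [CLS] representation, with its own parameters W' and b' (separate from the word-replacement head); the 59-label default is taken from the second embodiment:

```python
import torch.nn as nn

class DomainHead(nn.Module):
    def __init__(self, hidden_size: int, num_domains: int = 59):
        super().__init__()
        self.proj = nn.Linear(hidden_size, num_domains)   # W' and b'
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, h_cls, domain_id):
        """h_cls: (H,) encoding of [CLS]; domain_id: scalar tensor with the gold domain index."""
        logits = self.proj(h_cls.unsqueeze(0))            # z'
        return self.loss_fn(logits, domain_id.view(1))    # L_DC
```

The final loss of the integral model acquisition module is then simply the sum of the two head losses, which is back-propagated to update the cross-language encoder and both fully-connected networks.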
Other steps and parameters are the same as those in one of the first to ninth embodiments.
The following examples were used to demonstrate the beneficial effects of the present invention:
Example one is as follows:
This example selects the dialogue language understanding task as the downstream task in the dialogue understanding field. Given a dialogue language understanding dataset in the cross-language setting, the task is to classify the intent of an utterance and to extract the corresponding slots in the sentence. The task is prepared according to the following steps:
Step one, collecting English dialogue language understanding data, translating it into the cross-language setting, and annotating the translated texts for training, validating and testing the model;
We downloaded the SNIPS dataset (Alice Coucke, Alaa Saade, Adrien Ball, Théodore Bluche, Alexandre Caulier, David Leroy, Clément Doumouro, Thibault Gisselbrecht, Francesco Caltagirone, Thibaut Lavril, et al. 2018. Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces. arXiv preprint arXiv:1805.10190.), extracted 700 samples from its validation set (100 samples per intent, 7 intents in total), split them into two halves (350 samples each, 50 per intent) as the training and validation sets, and at the same time randomly extracted a test set of 350 samples (50 per intent).
For the extracted training, validation and test sets (1050 samples in total), experts were asked to translate them into Arabic, German, Spanish, French, Italian, Malay, Polish, Russian, Thai and Turkish (10 languages in total), keeping the original intent labels and re-annotating the slots in the translated sentences for use by the model.
Step two, setting the baseline cross-language pre-training model;
XLM-RoBERTa-base was chosen as the baseline cross-language pre-training model for this example.
Step three, setting a dialogue language understanding task model architecture;
Our overall model adopts a pipeline architecture and consists of two models: an intent classification model and a slot extraction model.
Step four, training an intention classification model;
step four, obtaining the coding representation of the sample by using the cross-language pre-training model
Figure BDA0003117425670000111
Where Input is the Input sample, k is the sample length, h[CLS]Is a sample [ CLS]The coded representation at the label is represented by,
Figure BDA0003117425670000112
for the coded representation of the first word in the sample,
Figure BDA0003117425670000121
is the coded representation of the kth word in the sample;
step four, using a full-connection neural network to calculate the probability of the intention label of the current sample
Figure BDA0003117425670000122
Wherein
Figure BDA0003117425670000123
B is the weight of the fully-connected neural network, b is the bias of the fully-connected neural network;
step four and step three, calculating cross entropy loss through the prediction probability in the step four and the step two
Figure BDA0003117425670000124
Wherein I is the number of intents summarized in step two,
Figure BDA0003117425670000125
for the true label of the current sample to the i-th intention (0 or 1, where 1 means that the i-th intention label is the golden label of the sample, and vice versa is 0), ziA predicted probability for the model for the ith intention tag;
fourthly, performing back propagation through the loss calculated in the fourth step and the third step and updating model parameters;
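A minimal sketch of one intent-classification training step, assuming `encoder` is the pre-trained cross-language model, `intent_head` is a torch.nn.Linear layer over the [CLS] representation, and the learning rate is an arbitrary choice:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(intent_head.parameters()), lr=2e-5)

def intent_train_step(enc_inputs, intent_labels):
    """enc_inputs: a tokenized batch; intent_labels: (B,) gold intent ids."""
    h_cls = encoder(**enc_inputs).last_hidden_state[:, 0]   # [CLS]/<s> representation
    loss = F.cross_entropy(intent_head(h_cls), intent_labels)
    optimizer.zero_grad()
    loss.backward()                                          # Step 4.4: back-propagate
    optimizer.step()                                         # and update the parameters
    return loss.item()
```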
Step five, training the slot extraction model;

Step 5.1: the encoded representation of the input sample is obtained using the cross-language pre-training model:

H = {h_[CLS], h_1, ..., h_k}

where k is the sample length, h_[CLS] is the encoded representation at the [CLS] token of the sample, h_1 is the encoded representation of the first word in the sample, and h_k is the encoded representation of the k-th word in the sample;

Step 5.2: a separate fully-connected neural network is created for each intent, and the probability of the slot label at every token position of the current sample is predicted through the network corresponding to the gold intent label of the sample:

z_k = softmax(W_i · h_k + b_i)

where W_i is the weight of the fully-connected neural network corresponding to the i-th intent label, b_i is the bias of the fully-connected neural network corresponding to the i-th intent label, h_k is the encoded representation of the word at position k of the sample, and z_k is the predicted slot probability distribution of that word;

Step 5.3: the cross-entropy loss is calculated from the predicted probabilities in Step 5.2:

L_Slot = - Σ_{k=1}^{L} Σ_{s=1}^{S_i} y_{k,s} · log z_{k,s}

where L is the sample length, S_i is the number of slot labels corresponding to the i-th intent label, y_{k,s} is the true label (0 or 1) of the s-th slot at position k of the current sample (1 means that the s-th slot label is the gold slot label at position k of the sample, otherwise 0), and z_{k,s} is the predicted probability of the model for the s-th slot label at position k of the current sample;

Step 5.4: back-propagation is performed through the loss calculated in Step 5.3 and the model parameters are updated;

the training processes of the intent classification model in step four and the slot extraction model in step five are independent of each other, and the two trained models together form the overall model of step three.
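For illustration, a sketch of the slot extraction model with one fully-connected head per intent, selected by the sample's intent label (the gold intent during training, the predicted intent at test time); names and shapes are assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class PerIntentSlotTagger(nn.Module):
    def __init__(self, hidden_size: int, slots_per_intent: dict):
        super().__init__()
        # one fully-connected network per intent, each with its own slot label inventory
        self.heads = nn.ModuleDict(
            {str(i): nn.Linear(hidden_size, n) for i, n in slots_per_intent.items()})

    def forward(self, word_encodings, intent_id, slot_labels=None):
        """word_encodings: (L, H); intent_id: intent index; slot_labels: (L,) gold slot ids or None."""
        logits = self.heads[str(intent_id)](word_encodings)   # z_k for every position
        if slot_labels is None:
            return logits.argmax(-1)                          # slot predictions at test time
        return F.cross_entropy(logits, slot_labels)           # cross-entropy over all positions
```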
Step six, predicting the final results and calculating the metrics;

Step 6.1: predicting the final results;

first, the intent label of a sample is predicted with the intent classification model trained in step four; then, using the predicted intent, the slot labels of the sample are predicted with the corresponding fully-connected neural network of the slot extraction model trained in step five.
Sixthly, calculating indexes;
let the number of mean predictions correct for all samples be CIntentIf the total number of samples is A, the Intent recognition accuracy (Intent Acc) is
Figure BDA0003117425670000131
Assuming that the number of correct Slot tag predictions is TP, the number of incorrect predictions is FP, and the number of unpredicted slots is FN in all tokens of all samples, the calculation method of Slot extraction F1 value (Slot F1) is as follows:
Figure BDA0003117425670000132
Figure BDA0003117425670000133
Figure BDA0003117425670000134
let C be the number of all samples for which the intent and all slots are predicted correctlyOverallWhen the total number of samples is A, the Overall recognition accuracy (Overall Acc) is
Figure BDA0003117425670000135
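The three metrics can be computed as in the following sketch (token-level counts for the slot F1, per the definitions above; input formats are assumptions):

```python
def intent_accuracy(pred_intents, gold_intents):
    """Intent Acc = C_Intent / A."""
    return sum(p == g for p, g in zip(pred_intents, gold_intents)) / len(gold_intents)

def slot_f1(tp: int, fp: int, fn: int) -> float:
    """Slot F1 from token-level true-positive / false-positive / false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def overall_accuracy(pred_frames, gold_frames):
    """Overall Acc: a sample counts only if the intent and every slot are correct."""
    return sum(p == g for p, g in zip(pred_frames, gold_frames)) / len(gold_frames)
```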
In order to smooth out the result fluctuations caused by the small amount of test data, experiments are run with 5 different random seeds on the training set, the average of each metric for each language over the 5 seeds is computed, and finally the average experimental result over the 10 languages is reported.
The final experimental results on the test set are shown in table 1.
Table 1: Average experimental results of the dialogue language understanding task over ten languages
[Table 1 is reproduced as an image in the original document; its five rows correspond to the settings described below.]
The best results are shown in bold in the table.
Where the first row of experimental results shows our experimental results on the baseline model.
The second row shows the experimental results of a model pre-training system oriented to cross-language dialogue understanding according to the present invention.
The third row shows the experimental results when the word replacement method in the above scheme of the invention is changed to a Masked Language Model (Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding [J]. arXiv preprint arXiv:1810.04805, 2018.).
The fourth row shows the experimental results after the dialogue domain classification fully-connected neural network in the scheme of the invention is removed.
The fifth row shows the experimental results when the word replacement method in the above scheme of the invention is changed to a Masked Language Model and the dialogue domain classification fully-connected neural network is removed.
As can be seen from the ablation results in the third, fourth and fifth rows of Table 1, every part of the proposed scheme is indispensable, and jointly training the word replacement task and the dialogue domain classification task makes the model more effective.
As can be seen from Table 1, the intention recognition accuracy of the cross-language dialogue understanding pre-training model trained by the method is improved by 4.17% compared with that of the baseline model, the slot extraction F1 value is improved by 3.03% compared with that of the baseline model, and the accuracy of the whole intention and slot prediction is improved by 3.60% compared with that of the baseline model. The method also proves that the overall effect of the cross-language dialogue understanding model can be remarkably improved.
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.

Claims (10)

1. A model pre-training system for cross-language conversational understanding, characterized by: the system comprises:
the system comprises a data acquisition module, a dialogue field label sorting and merging module, a training corpus sorting module, a target language type determining module, a static dictionary determining module, a word replacing module, a coding module, a word replacing and predicting module, a sample belonging dialogue field predicting module, an integral model acquiring module, a training module and a cross-language dialogue understanding field downstream task fine-tuning module;
the data acquisition module is used for collecting an English data set in the labeled dialogue understanding field;
the dialogue domain label sorting and merging module is used for sorting dialogue domain labels marked on all data sets in the data acquisition module and merging dialogue domain labels with the same meaning on different data sets;
the training corpus sorting module is used for segmenting the dialogue corpora in all data sets collected by the data acquisition module, taking the user utterance and the system reply in one round of dialogue as one sample, tokenizing the user utterance and the system reply respectively, and at the same time labeling each sample with a dialogue domain label by using the dialogue domain label information merged in the dialogue domain label sorting and merging module;
the target language determining module is used for determining a target language;
the static dictionary determining module is used for respectively collecting static dictionaries translated from English vocabulary to various target languages according to the target languages determined by the target language determining module;
the word replacing module is used for randomly selecting a certain proportion of English words on each sample marked with the dialogue field labels in the training corpus sorting module, randomly selecting a language from the target language determined in the target language determining module for each randomly selected word, translating each randomly selected word to a word corresponding to the target language by using a static dictionary collected by the static dictionary determining module, replacing the English word with the word corresponding to the target language, and simultaneously keeping the original English word as a label to be predicted;
the coding module obtains a coded representation of the processed sample in the word replacement module by using a cross-language coding model;
the word replacement prediction module uses a fully-connected neural network to calculate, from the encoded representation of each word in the sample obtained by the coding module, the probability over the dictionary of the word that may have been replaced, and calculates the cross-entropy loss against the label to be predicted retained in the word replacement module;
the dialogue domain prediction module to which the sample belongs uses a fully-connected neural network to judge the dialogue domain to which the sample belongs from the encoded representation of the whole sample sentence obtained by the coding module, and calculates the cross-entropy loss against the dialogue domain label annotated in the training corpus sorting module;
the integral model acquisition module adds the cross-entropy loss obtained by the word replacement prediction module and the cross-entropy loss obtained by the dialogue domain prediction module to which the sample belongs to obtain the final loss;
through the final loss, back propagation is performed on the integral model and the parameters of the integral model are updated;
the integral model in the integral model acquisition module consists of the cross-language coding model in the coding module, the fully-connected neural network in the word replacement prediction module, and the fully-connected neural network in the dialogue domain prediction module to which the sample belongs;
the training module trains an integral model in the integral model acquisition module by using the processed data in the training corpus sorting module and the word replacement module;
and the downstream task fine-tuning module in the cross-language dialogue understanding field uses the whole model trained by the training module as a pre-training model, and completes the tasks in the cross-language dialogue understanding field based on the pre-training model.
2. The model pre-training system for cross-language dialogue understanding according to claim 1, wherein: the dialogue domain label sorting and merging module is used for sorting dialogue domain labels marked on all data sets in the data acquisition module and merging dialogue domain labels with the same meaning on different data sets; the specific process is as follows:
Step 2.1: sorting the dialogue domain labels annotated on all data sets in the data acquisition module;
Step 2.2: classifying dialogue domain labels with the same meaning on different data sets into the same category through manual screening.
3. The model pre-training system for cross-language dialogue understanding according to claim 2, wherein: the training corpus sorting module is used for segmenting the dialogue corpora in all data sets collected by the data acquisition module, taking the user utterance and the system reply in one round of dialogue as one sample, tokenizing the user utterance and the system reply respectively, and labeling each sample with a dialogue domain label by using the dialogue domain label information merged in step two; the specific process is as follows:
Step 3.1: the dialogue understanding corpora in the data sets collected by the data acquisition module consist of multi-turn dialogues, and each dialogue can be expressed as D = {U_1, R_1, ..., U_N, R_N};

where N is the number of dialogue turns, U_1 and R_1 are the user utterance and the system reply of the 1st turn, and U_N and R_N are the user utterance and the system reply of the N-th turn;

the user utterance and the system reply of one turn are taken as one sample and tokenized respectively; a separator [SEP] is inserted between them and an identifier [CLS] is inserted at the beginning of the sentence to represent global information, giving a sample S = {[CLS], u_1, u_2, ..., u_i, [SEP], r_1, r_2, ..., r_j};

where u_1 and r_1 are the 1st word in the user utterance and in the system reply respectively, u_2 and r_2 are the 2nd word in the user utterance and in the system reply respectively, u_i is the i-th word in the user utterance, r_j is the j-th word in the system reply, i is the length of the tokenized user utterance, and j is the length of the tokenized system reply;

Step 3.2: each sample is labeled with a dialogue domain label by using the dialogue domain label information merged in the dialogue domain label sorting and merging module; each sample labeled with a dialogue domain label is expressed as:

S = {S_tokens = [CLS], u_1, u_2, ..., u_i, [SEP], r_1, r_2, ..., r_j; S_domain = d},

where d is the dialogue domain label of the sample, S_tokens is the processed input token sequence of the sample, and S_domain is the dialogue domain label of the sample.
4. A model pre-training system for cross-language dialogue understanding according to claim 3, wherein: the static dictionary determining module is used for respectively collecting static dictionaries translated from English vocabulary to various target languages according to the target languages determined by the target language determining module; the specific process is as follows:
downloading the dictionaries that translate English to each target language from https://github.com/facebookresearch/MUSE.
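For illustration, a MUSE ground-truth dictionary file stores one source-target word pair per line, so it could be loaded into an English-to-target-language lookup like this; the file name en-zh.txt is an assumption about a local copy.

```python
# Load one MUSE bilingual dictionary file into a plain dict (first translation kept).
def load_muse_dict(path: str) -> dict:
    en2tgt = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue
            en2tgt.setdefault(parts[0], parts[1])
    return en2tgt

en_zh = load_muse_dict("en-zh.txt")   # hypothetical local copy of the en-zh dictionary
```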
5. The model pre-training system for cross-language dialogue understanding according to claim 4, wherein: the word replacement module is used for randomly selecting a certain proportion of English words in each sample labeled with a dialogue domain label in the training corpus sorting module, randomly selecting, for each selected word, one language from the target languages determined in the target language determining module, translating the selected word to the corresponding word in that target language by using the static dictionary collected by the static dictionary determining module, replacing the English word with the target-language word, and at the same time keeping the original English word as a label to be predicted; the specific process is as follows:
setting the randomly selected proportion to p%;
creating an array S_goldens for storing the labels to be predicted, and initializing it with the placeholder [PAD], i.e. S_goldens = [PAD], ..., [PAD];
creating an array S_masks for storing the position information of the replaced words, and initializing it with all zeros, i.e. S_masks = 0, ..., 0;
for each word t in S_tokens of each sample labeled with a dialogue domain label in the training corpus sorting module, generating a random number between 0 and 1; if the random number is less than p%, translating t to the corresponding word t^x in the randomly selected target language by using the static dictionary collected in the static dictionary determining module, replacing t with t^x at its position in the sample, storing the replaced t in S_goldens as the label to be predicted, and at the same time setting the value of this position in S_masks to 1;
t ∈ {t | t ∈ S_tokens, t ≠ [CLS], t ≠ [SEP]}
an example of a sample after word replacement is
S = {S_tokens = [CLS], u_1, ..., u_k^x, ..., u_i, [SEP], r_1, ..., r_l^x, ..., r_m^x, ..., r_j;
S_goldens = [PAD], ..., u_k, ..., r_l, ..., r_m, ..., [PAD]; S_masks = 0, ..., 1, ..., 1, ..., 1, ..., 0}
wherein u_k^x represents the target-language word obtained by word replacement of u_k at the k-th position in the user utterance, r_l^x represents the target-language word obtained by word replacement of r_l at the l-th position in the system reply, and r_m^x represents the target-language word obtained by word replacement of r_m at the m-th position in the system reply.
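The replacement procedure of claim 5 could be sketched as follows, assuming dictionaries maps each target-language code to a static English-to-target dictionary like the one loaded above; here p is passed as a fraction rather than a percentage, and words absent from the dictionary are simply skipped, a choice the claim does not specify.

```python
import random

def replace_words(tokens, dictionaries, p=0.15):
    """Code-switch a fraction p of the words; return (S_tokens, S_goldens, S_masks)."""
    goldens = ["[PAD]"] * len(tokens)   # S_goldens: original words to be predicted
    masks = [0] * len(tokens)           # S_masks: 1 where a word was replaced
    out = list(tokens)                  # S_tokens after replacement
    for i, t in enumerate(tokens):
        if t in ("[CLS]", "[SEP]"):
            continue
        if random.random() < p:
            lang = random.choice(list(dictionaries))   # random target language
            t_x = dictionaries[lang].get(t)
            if t_x is None:                            # word not in the static dictionary
                continue
            out[i] = t_x
            goldens[i] = t
            masks[i] = 1
    return out, goldens, masks
```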
6. A model pre-training system for cross-language dialogue understanding according to claim 4 or 5, characterized in that: said S_goldens, S_tokens and S_masks all have the same length, but S_goldens holds the replaced word t only at the positions of replaced words, and all other positions are [PAD], meaning that no prediction is required.
7. The cross-language dialogue understanding-oriented model pre-training system of claim 6, wherein: the encoding module obtains an encoded representation of the processed samples in the word replacement module using a cross-language encoding model; the specific process is as follows:
selecting XLM-RoBERTa-base as the cross-language coding model, and encoding the S_tokens processed in the word replacement module to obtain the coded representation of each token:
h_[CLS], h_{u_1}, ..., h_{u_i}, h_[SEP], h_{r_1}, ..., h_{r_j} = Cross_Lingual_Encoder(S_tokens)
wherein Cross_Lingual_Encoder is the cross-language coding model, h_[CLS] and h_[SEP] respectively represent the coded representations of the [CLS] and [SEP] tags after encoding by the cross-language coding model, h_{u_1} represents the coded representation of u_1 after encoding by the cross-language coding model, and h_{r_1} represents the coded representation of r_1 after encoding by the cross-language coding model.
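With the Hugging Face transformers library, the encoding in claim 7 could look roughly like the sketch below; note that XLM-RoBERTa's tokenizer uses <s> and </s> where the claims write [CLS] and [SEP], and the example sentence pair is invented.

```python
import torch
from transformers import XLMRobertaTokenizer, XLMRobertaModel

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
encoder = XLMRobertaModel.from_pretrained("xlm-roberta-base")

# Encode one (user utterance, system reply) pair; the tokenizer inserts the
# <s>/</s> special tokens that play the role of [CLS]/[SEP] in the claims.
inputs = tokenizer("book a table for two", "which restaurant do you prefer",
                   return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, hidden_size)
h_cls = hidden[:, 0]                               # coded representation of the sentence start
```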
8. The cross-language dialogue understanding-oriented model pre-training system of claim 7, wherein: the word replacement prediction module uses a fully-connected neural network to calculate, from the coded representation of each word in the sample obtained by the encoding module, the probability of each possibly replaced word in the vocabulary, and calculates the cross entropy loss against the labels to be predicted in the word replacement module; the specific process is as follows:
step eight-one, using a fully-connected neural network, calculating from the coded representation of each word in the sample obtained by the encoding module the probability of each possibly replaced word in the vocabulary:
p_i = softmax(W · h_i + b)
wherein W is the weight of the fully-connected neural network, b is the bias of the fully-connected neural network, h_i is the coded representation of the i-th position obtained in the encoding module, and p_i is the predicted word probability distribution at the i-th position;
step eight-two, calculating the cross entropy loss of the word replacement task through S_goldens and S_masks constructed in the word replacement module:
CE_i = -∑_{k=1}^{y} ŷ_{i,k} · log(p_{i,k})
wherein y is the size of the vocabulary, p_{i,k} represents the predicted probability of the k-th word at the i-th position, ŷ_{i,k} represents the true label of the k-th word at the i-th position, and CE_i is the cross entropy loss at the i-th position;
L_replace = ∑_{i: S_masks[i]=1} CE_i
wherein i is the position of a replaced word stored in S_masks, S_masks[i] denotes the value at the i-th position of S_masks, and L_replace is the sum of the losses over the positions of all replaced words.
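A sketch of how steps eight-one and eight-two could be computed in PyTorch, assuming the golden labels have already been converted to vocabulary ids and the mask zeroes out positions that need no prediction; all function and tensor names are assumptions.

```python
import torch.nn as nn

def word_replacement_loss(hidden, golden_ids, masks, word_head: nn.Linear):
    """hidden: (batch, seq, d); golden_ids: (batch, seq) vocabulary ids of S_goldens;
    masks: (batch, seq) with 1 at replaced positions (S_masks)."""
    logits = word_head(hidden)                       # fully-connected head: (batch, seq, vocab)
    ce = nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)),            # softmax + cross entropy per position
        golden_ids.view(-1),
        reduction="none").view(masks.shape)
    return (ce * masks.float()).sum()                # sum losses only over replaced positions
```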
9. The cross-language dialogue understanding-oriented model pre-training system of claim 8, wherein: the dialogue domain prediction module to which the sample belongs uses a fully-connected neural network to judge, from the coded representation of the whole sentence of the sample obtained by the encoding module, the dialogue domain to which the sample belongs, and calculates the cross entropy loss through the dialogue domain label annotated in the training corpus sorting module; the specific process is as follows:
step nine-one, using a fully-connected neural network, calculating the probability of the dialogue domain to which the sample belongs from the coded representation h_[CLS] of the identifier [CLS] in the sample obtained by the encoding module:
p_domain = softmax(W' · h_[CLS] + b')
wherein W' is the weight of the fully-connected neural network, b' is the bias of the fully-connected neural network, and p_domain is the predicted probability distribution over the dialogue domains;
step nine-two, calculating the cross entropy loss of the dialogue domain classification task through the dialogue domain labels annotated in the training corpus sorting module.
10. The cross-language dialogue understanding-oriented model pre-training system of claim 9, wherein: in step nine-two, the cross entropy loss of the dialogue domain classification task is calculated through the dialogue domain labels annotated in the training corpus sorting module:
L_domain = -∑_{i=1}^{D} ŷ_i^{domain} · log(p_i^{domain})
wherein D is the number of dialogue domain labels summarized in the dialogue domain label sorting and merging module, p_i^{domain} is the predicted probability of the i-th dialogue domain label, and ŷ_i^{domain} is the true label of the current sample for the i-th dialogue domain.
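Correspondingly, the dialogue domain classification loss of claims 9 and 10 could be sketched as below; the head and variable names are assumptions, kept consistent with the earlier sketches.

```python
import torch.nn as nn

def domain_loss(h_cls, domain_labels, domain_head: nn.Linear):
    """h_cls: (batch, d) coded representation of [CLS]; domain_labels: (batch,) domain ids."""
    logits = domain_head(h_cls)                                  # (batch, D) scores over domains
    return nn.functional.cross_entropy(logits, domain_labels)   # softmax + cross entropy
```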
CN202110667409.9A 2021-06-16 2021-06-16 Model pre-training system for cross-language dialogue understanding Active CN113312453B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110667409.9A CN113312453B (en) 2021-06-16 2021-06-16 Model pre-training system for cross-language dialogue understanding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110667409.9A CN113312453B (en) 2021-06-16 2021-06-16 Model pre-training system for cross-language dialogue understanding

Publications (2)

Publication Number Publication Date
CN113312453A true CN113312453A (en) 2021-08-27
CN113312453B CN113312453B (en) 2022-09-23

Family

ID=77379146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110667409.9A Active CN113312453B (en) 2021-06-16 2021-06-16 Model pre-training system for cross-language dialogue understanding

Country Status (1)

Country Link
CN (1) CN113312453B (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960317A (en) * 2018-06-27 2018-12-07 哈尔滨工业大学 Across the language text classification method with Classifier combination training is indicated based on across language term vector
CN109213851A (en) * 2018-07-04 2019-01-15 中国科学院自动化研究所 Across the language transfer method of speech understanding in conversational system
CN111326138A (en) * 2020-02-24 2020-06-23 北京达佳互联信息技术有限公司 Voice generation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DECHUAN TENG et al.: "Injecting Word Information with Multi-Level Word Adapter for Chinese Spoken Language Understanding", 2021 IEEE International Conference on Acoustics, Speech and Signal Processing *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115455981A (en) * 2022-11-11 2022-12-09 合肥智能语音创新发展有限公司 Semantic understanding method, device, equipment and storage medium for multi-language sentences
CN115455981B (en) * 2022-11-11 2024-03-19 合肥智能语音创新发展有限公司 Semantic understanding method, device and equipment for multilingual sentences and storage medium
CN116628160A (en) * 2023-05-24 2023-08-22 中南大学 Task type dialogue method, system and medium based on multiple knowledge bases
CN116628160B (en) * 2023-05-24 2024-04-19 中南大学 Task type dialogue method, system and medium based on multiple knowledge bases
CN116805004A (en) * 2023-08-22 2023-09-26 中国科学院自动化研究所 Zero-resource cross-language dialogue model training method, device, equipment and medium
CN116805004B (en) * 2023-08-22 2023-11-14 中国科学院自动化研究所 Zero-resource cross-language dialogue model training method, device, equipment and medium
CN117149987A (en) * 2023-10-31 2023-12-01 中国科学院自动化研究所 Training method and device for multilingual dialogue state tracking model
CN117149987B (en) * 2023-10-31 2024-02-13 中国科学院自动化研究所 Training method and device for multilingual dialogue state tracking model
CN117648430A (en) * 2024-01-30 2024-03-05 南京大经中医药信息技术有限公司 Dialogue type large language model supervision training evaluation system
CN117648430B (en) * 2024-01-30 2024-04-16 南京大经中医药信息技术有限公司 Dialogue type large language model supervision training evaluation system

Also Published As

Publication number Publication date
CN113312453B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
CN113312453B (en) Model pre-training system for cross-language dialogue understanding
CN108614875B (en) Chinese emotion tendency classification method based on global average pooling convolutional neural network
CN106202010B (en) Method and apparatus based on deep neural network building Law Text syntax tree
CN110969020B (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN109635280A (en) A kind of event extraction method based on mark
CN107729309A (en) A kind of method and device of the Chinese semantic analysis based on deep learning
CN106980608A (en) A kind of Chinese electronic health record participle and name entity recognition method and system
CN110083700A (en) A kind of enterprise's public sentiment sensibility classification method and system based on convolutional neural networks
CN109960728B (en) Method and system for identifying named entities of open domain conference information
CN109359293A (en) Mongolian name entity recognition method neural network based and its identifying system
CN110851599B (en) Automatic scoring method for Chinese composition and teaching assistance system
CN112733866A (en) Network construction method for improving text description correctness of controllable image
CN110046356B (en) Label-embedded microblog text emotion multi-label classification method
CN111222318B (en) Trigger word recognition method based on double-channel bidirectional LSTM-CRF network
CN110263325A (en) Chinese automatic word-cut
CN108563725A (en) A kind of Chinese symptom and sign composition recognition methods
CN111523420A (en) Header classification and header list semantic identification method based on multitask deep neural network
CN110472245A (en) A kind of multiple labeling emotional intensity prediction technique based on stratification convolutional neural networks
CN110222338A (en) A kind of mechanism name entity recognition method
CN113128232A (en) Named entity recognition method based on ALBERT and multi-word information embedding
CN109977402A (en) A kind of name entity recognition method and system
CN109446523A (en) Entity attribute extraction model based on BiLSTM and condition random field
CN114254645A (en) Artificial intelligence auxiliary writing system
CN113312918B (en) Word segmentation and capsule network law named entity identification method fusing radical vectors
CN114003700A (en) Method and system for processing session information, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant