CN114003700A - Method and system for processing session information, electronic device and storage medium - Google Patents


Info

Publication number
CN114003700A
CN114003700A (application CN202111257225.1A)
Authority
CN
China
Prior art keywords
dialogue, sentence, entity, training, model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111257225.1A
Other languages
Chinese (zh)
Inventor
Wang Mengting (王梦婷)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Nuonuo Network Technology Co ltd
Original Assignee
Zhejiang Nuonuo Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Nuonuo Network Technology Co ltd filed Critical Zhejiang Nuonuo Network Technology Co ltd
Priority to CN202111257225.1A priority Critical patent/CN114003700A/en
Publication of CN114003700A publication Critical patent/CN114003700A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/3344 Query execution using natural language analysis
    • G06F 16/35 Clustering; Classification
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/20 Ensemble learning
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a method for processing dialogue information, comprising the following steps: obtaining the dialogue corpus generated by multiple rounds of dialogue, and performing intent labeling and entity labeling on the corpus to obtain training data; building a sentence feature extraction module; building a stacking ensemble learning module framework from a plurality of pre-trained language models, and connecting the sentence feature extraction module with the framework to obtain a multi-intent entity joint extraction model; inputting the training data into the multi-intent entity joint extraction model for training; and, if dialogue information to be answered is received, extracting the semantic understanding result of that dialogue information with the trained multi-intent entity joint extraction model, so that the recognition accuracy of semantic understanding can be improved. The application also discloses a system for processing dialogue information, a storage medium and an electronic device, which have the same beneficial effects.

Description

Method and system for processing session information, electronic device and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a method and a system for processing dialog information, an electronic device, and a storage medium.
Background
Natural language processing is a sub-field of artificial intelligence concerned with the ability of machines to understand and interpret human language, and it has a wide range of application scenarios, such as sentiment analysis, automatic summarization, and dialogue systems. By function, dialogue systems fall mainly into two types, task-oriented and chit-chat, and consist of five technical modules: speech recognition (ASR), semantic understanding (NLU), dialogue management (DM), natural language generation (NLG), and speech synthesis (TTS).
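As a rough illustration of how the five modules chain together, the stubs below pass one user turn through ASR, NLU, DM, NLG and TTS in order; every function body is a placeholder standing in for a real component, and all names are illustrative assumptions rather than anything specified by this application.

```python
# Minimal sketch of the five-module dialogue-system pipeline.
# Each stage is a stub: real systems would run acoustic models,
# NLU models, dialogue policies, generation models and vocoders here.

def asr(audio):           # speech recognition: audio -> text
    return audio["transcript"]

def nlu(text):            # semantic understanding: text -> intents + entities
    return {"text": text, "intents": [], "entities": []}

def dm(semantics):        # dialogue management: choose a system action
    return {"action": "reply", "semantics": semantics}

def nlg(action):          # natural language generation: action -> reply text
    return f"OK: {action['semantics']['text']}"

def tts(reply):           # speech synthesis: text -> audio (stubbed)
    return {"transcript": reply}

def dialogue_turn(audio):
    return tts(nlg(dm(nlu(asr(audio)))))

print(dialogue_turn({"transcript": "book a ticket"}))
```

The sketch only fixes the data flow between modules; the NLU stub is the stage that the rest of this application elaborates.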
The technical difficulty of a dialogue system lies mainly in semantic understanding (NLU) and dialogue management (DM); the NLU module is the premise and guarantee for the normal execution of DM, and aims to convert spoken language into a structured semantic representation, which mainly comprises two tasks: intent classification and entity recognition. In the related art, a joint model is usually used to extract intents and entities. In a real dialogue scene, however, a user may express one or more intents across multiple sentences, while most current dialogue systems split the dialogue text into single sentences for analysis during semantic understanding, which makes the understanding of the dialogue intent inaccurate.
Therefore, how to improve the recognition accuracy of semantic understanding is a technical problem that needs to be solved by those skilled in the art at present.
Disclosure of Invention
The application aims to provide a dialogue information processing method, a dialogue information processing system, an electronic device and a storage medium, which can improve the recognition accuracy of semantic understanding.
In order to solve the above technical problem, the present application provides a method for processing dialog information, including:
obtaining the dialogue corpus generated by multiple rounds of dialogue, and performing intent labeling and entity labeling on the dialogue corpus to obtain training data;
building a sentence feature extraction module; the sentence feature extraction module is used for extracting multi-turn dialogue corpus feature vectors of the dialogue corpus; the multi-round dialogue corpus feature vector comprises vocabulary features and sentence correlation features with dialogue sequences;
building a stacking ensemble learning module framework by using a plurality of pre-trained language models, and connecting the sentence feature extraction module with the stacking ensemble learning module framework to obtain a multi-intent entity joint extraction model;
inputting the training data into the multi-intent entity joint extraction model for training;
and if the dialog information to be answered is received, extracting a semantic understanding result of the dialog information to be answered by using the trained multi-intention entity joint extraction model.
Optionally, building the sentence feature extraction module includes:
building a sentence feature extraction module comprising a target pre-training language model, a self-attention network, a transformer encoder network layer and an LSTM network layer;
correspondingly, the process of the sentence feature extraction module for extracting the multi-turn dialogue corpus feature vectors of the dialogue corpus includes:
extracting the vocabulary characteristics of each dialogue sentence by using the target pre-training language model;
inputting the vocabulary characteristics into the self-attention network to obtain the context vocabulary characteristics of each dialog sentence;
inputting the context vocabulary characteristics of all the dialogue sentences into the Transformer encoder network layer to obtain inter-sentence correlation characteristics;
inputting the inter-sentence correlation characteristics into the LSTM network layer to obtain the inter-sentence correlation characteristics with the dialogue sequence;
and fusing the vocabulary characteristics and the association characteristics among the sentences with the conversation sequence to obtain the multi-round conversation corpus characteristic vector.
Optionally, extracting the vocabulary features of each dialog sentence by using the target pre-training language model includes:
determining a word segmentation processing result of each dialog sentence;
and inputting the word segmentation processing result of each dialog sentence into the target pre-training language model to obtain the vocabulary characteristics.
Optionally, the stacking ensemble learning module framework includes a base model layer and an ensemble learning layer;
the base model layer comprises a plurality of pre-trained language models and is used for training in a K-fold cross validation manner to obtain an intent recognition model set and an entity extraction model set; the ensemble learning layer is used for learning a meta-learner from the intent recognition model set and the entity extraction model set to obtain an intent loss and an entity loss;
correspondingly, the method also comprises the following steps:
and performing parameter adjustment on the stacked ensemble learning module framework according to the intention loss and the entity loss.
Optionally, the training is performed in a K-fold cross validation manner to obtain an intention recognition model set and an entity extraction model set, including:
respectively training each pre-training language model by using the multi-round dialogue corpus feature vectors in a K-fold cross validation mode to obtain an intention recognition result and an entity extraction result output by each pre-training language model;
summarizing all the intention recognition results to obtain the intention recognition model set;
and summarizing all the entity extraction results to obtain the entity extraction model set.
Optionally, performing intent labeling and entity labeling on the dialog corpus to obtain training data, including:
performing word segmentation processing on each dialogue sentence of the dialogue corpus;
performing intention labeling and entity labeling on the word segmentation processing result of each conversation sentence to obtain a labeled sentence comprising an intention label and an entity label;
and dividing the marked sentences into the training data and the test data according to a preset proportion.
Optionally, after the training data is input into the multi-intent entity joint extraction model for training, the method further includes:
and verifying the trained multi-intention entity combined extraction model by using the test data to obtain a model evaluation result.
The present application also provides a system for processing dialog information, the system comprising:
the system comprises a labeling module, a training module and a processing module, wherein the labeling module is used for acquiring dialogue linguistic data generated by multiple rounds of dialogue and performing intention labeling and entity labeling on the dialogue linguistic data to obtain training data;
the model building module is used for building a sentence feature extraction module; the sentence feature extraction module is used for extracting multi-turn dialogue corpus feature vectors of the dialogue corpus; the multi-round dialogue corpus feature vector comprises vocabulary features and sentence correlation features with dialogue sequences;
the model connection module is used for building a stacking ensemble learning module framework by using a plurality of pre-trained language models and connecting the sentence feature extraction module with the stacking ensemble learning module framework to obtain a multi-intent entity joint extraction model;
the model training module is used for inputting the training data into the multi-intent entity joint extraction model for training;
and the information processing module is used for extracting the semantic understanding result of the dialogue information to be answered by using the trained multi-intention entity joint extraction model if the dialogue information to be answered is received.
The present application also provides a storage medium having stored thereon a computer program that, when executed, implements the steps performed by the above-described method of processing dialog information.
The application also provides an electronic device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps executed by the processing method of the dialogue information when calling the computer program in the memory.
The application provides a method for processing dialogue information, comprising the following steps: obtaining the dialogue corpus generated by multiple rounds of dialogue, and performing intent labeling and entity labeling on the dialogue corpus to obtain training data; building a sentence feature extraction module, which is used for extracting the multi-turn dialogue corpus feature vector of the dialogue corpus, the feature vector comprising lexical features and inter-sentence association features with the dialogue order; building a stacking ensemble learning module framework by using a plurality of pre-trained language models, and connecting the sentence feature extraction module with the framework to obtain a multi-intent entity joint extraction model; inputting the training data into the multi-intent entity joint extraction model for training; and, if dialogue information to be answered is received, extracting its semantic understanding result with the trained multi-intent entity joint extraction model.
The method extracts the multi-turn dialogue corpus feature vector of the dialogue corpus with a sentence feature extraction module; this feature vector comprises lexical features and inter-sentence association features with the dialogue order. Because the feature vector simultaneously fuses lexical feature information, intra-sentence context information and inter-sentence context information, deeper intent and entity information can be extracted than from a single word or a single sentence. Connecting the sentence feature extraction module with the stacking ensemble learning module framework yields a multi-intent entity joint extraction model, which can take the context information of the dialogue corpus into account and identify all intents and entities in the corpus as well as the relations between them. The trained model extracts the semantic understanding result of the dialogue information to be answered while considering context information, so the recognition accuracy of semantic understanding can be improved. The application also provides a system for processing dialogue information, an electronic device and a storage medium, which have the same beneficial effects and are not described again here.
Drawings
In order to more clearly illustrate the embodiments of the present application, the drawings needed for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a method for processing session information according to an embodiment of the present application;
fig. 2 is a flowchart of a multi-intent entity joint extraction method based on multi-turn dialog information according to an embodiment of the present application;
fig. 3 is a block diagram of a context-aware-based dialog sentence feature extraction module according to an embodiment of the present application;
fig. 4 is a block diagram of a multi-intent entity joint extraction model based on stack integration according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a system for processing dialog messages according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a processing method of session information according to an embodiment of the present application.
The specific steps may include:
s101: obtaining dialogue linguistic data generated by multiple rounds of dialogue, and performing intention labeling and entity labeling on the dialogue linguistic data to obtain training data;
the embodiment can be applied to a dialogue system, the dialogue corpus can be multi-turn dialogue content between the user and the dialogue system, and after the dialogue corpus generated by multi-turn dialogue is obtained, intention labeling and entity labeling can be performed on all the dialogue corpuses to obtain training data. The training data refers to sentences to which intention labels and entity labels are added.
S102: building a sentence feature extraction module;
in this embodiment, a sentence feature extraction model may be built by using a plurality of models, and then a sentence feature extraction module is used to extract a plurality of rounds of dialogue corpus feature vectors of the dialogue corpus, where the plurality of rounds of dialogue corpus feature vectors include vocabulary features and inter-sentence association features with a dialogue sequence.
As a possible implementation, the multi-turn dialogue corpus feature vector of the dialogue corpus may be extracted as follows: build a sentence feature extraction module comprising a target pre-trained language model, a self-attention network, a Transformer encoder network layer and an LSTM network layer; extract the lexical features of each dialogue sentence with the target pre-trained language model; input the lexical features into the self-attention network to obtain the context lexical features of each dialogue sentence; input the context lexical features of all the dialogue sentences into the Transformer encoder network layer to obtain inter-sentence correlation features; input the inter-sentence correlation features into the LSTM network layer to obtain the inter-sentence correlation features with the dialogue order; and fuse the lexical features with the ordered inter-sentence correlation features to obtain the multi-turn dialogue corpus feature vector.
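The four-stage pipeline just described can be sketched end to end. In the sketch below, BERT, the self-attention network, the Transformer encoder and the LSTM are each replaced by a tiny numpy stand-in, so only the data flow and tensor shapes reflect the description; all layer internals are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, d = 3, 5, 8          # dialogue turns, tokens per sentence, hidden size

# Stage 1 (stand-in for BERT): token-level lexical features per sentence
H = rng.normal(size=(T, N, d))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Stage 2 (stand-in for the self-attention network): pool each sentence's
# tokens into one context-aware sentence vector
def attend(h):                               # h: (N, d)
    scores = softmax(h @ h.mean(axis=0))     # crude query = mean token
    return scores @ h                        # (d,)

S = np.stack([attend(h) for h in H])         # (T, d)

# Stage 3 (stand-in for the Transformer encoder): sentence-to-sentence
# attention mixes turns, giving order-agnostic inter-sentence features
A = softmax(S @ S.T, axis=-1)                # (T, T) association weights
Z = A @ S

# Stage 4 (stand-in for the LSTM): a causal running average injects
# dialogue order, since each turn only sees earlier turns
C = np.cumsum(Z, axis=0) / np.arange(1, T + 1)[:, None]

# Fusion: concatenate lexical-based and ordered inter-sentence features
V = np.concatenate([S, C], axis=-1)          # multi-turn corpus feature vectors
print(V.shape)                               # (T, 2d)
```

The only claim this sketch makes is structural: lexical features feed intra-sentence attention, then cross-sentence mixing, then an order-aware layer, and the two views are fused per turn.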
Further, the present embodiment may extract the vocabulary features of the dialog sentences by: determining a word segmentation processing result of each dialog sentence; and inputting the word segmentation processing result of each dialog sentence into the target pre-training language model to obtain the vocabulary characteristics.
S103: building a stacking ensemble learning module framework by using a plurality of pre-trained language models, and connecting the sentence feature extraction module with the stacking ensemble learning module framework to obtain a multi-intent entity joint extraction model;
the stacking ensemble learning module framework comprises a basic model layer and an ensemble learning layer; the basic model layer comprises a plurality of pre-training language models and is used for training in a K-fold cross validation mode to obtain an intention recognition model set and an entity extraction model set; and the integrated learning layer is used for learning a meta-learner according to the intention identification model set and the entity extraction model set to obtain intention loss and entity loss. The embodiment may perform parameter adjustment on the stacked ensemble learning module framework according to the intention loss and the entity loss.
Further, the embodiment may generate the intention recognition model set and the entity extraction model set by the following ways: respectively training each pre-training language model by using the multi-round dialogue corpus feature vectors in a K-fold cross validation mode to obtain an intention recognition result and an entity extraction result output by each pre-training language model; summarizing all the intention recognition results to obtain the intention recognition model set; and summarizing all the entity extraction results to obtain the entity extraction model set.
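The base-layer scheme described here, where each model produces out-of-fold predictions under K-fold cross validation and those predictions are stacked column-wise as input for the meta-learner, can be sketched as follows. The "base models" below are trivial fixed linear scorers, not real pre-trained language models, and per-fold fitting is deliberately omitted; only the fold bookkeeping is real.

```python
import numpy as np

def kfold_indices(n, k):
    """Split range(n) into k folds; yield (train_idx, val_idx) per fold."""
    folds = np.array_split(np.arange(n), k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

def make_base_model(seed, d):
    # Stand-in for a pre-trained language model: a fixed random scorer.
    w = np.random.default_rng(seed).normal(size=d)
    return lambda X: X @ w           # "predict" an intent score per sample

n, d, k = 12, 4, 3
X = np.random.default_rng(1).normal(size=(n, d))

# Base layer: every model scores only the samples of its held-out fold,
# so each training sample gets exactly one out-of-fold prediction per model.
base_models = [make_base_model(s, d) for s in range(3)]
meta_features = np.zeros((n, len(base_models)))
for m, model in enumerate(base_models):
    for train_idx, val_idx in kfold_indices(n, k):
        # A real implementation would fit `model` on train_idx first.
        meta_features[val_idx, m] = model(X[val_idx])

print(meta_features.shape)           # one column per base model
```

The `meta_features` matrix is what the ensemble-learning layer would train the meta-learner on; computing the intent and entity losses from that meta-learner is outside this sketch.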
S104: inputting the training data into the multi-intent entity joint extraction model for training;
This embodiment can input the training data into the multi-intent entity joint extraction model for training, so that the model learns to identify multiple intents, multiple entities, and the relations between entities and intents.
S105: and if the dialog information to be answered is received, extracting a semantic understanding result of the dialog information to be answered by using the trained multi-intention entity joint extraction model.
The dialog information to be answered may include a plurality of rounds of dialog sentences, and the dialog information to be answered may be input into the multi-intent entity joint extraction model, so that the multi-intent entity joint extraction model extracts a semantic understanding result of the dialog information to be answered. After the semantic understanding result is obtained, a corresponding reply sentence can be generated according to the semantic understanding result.
In this embodiment, a sentence feature extraction module is used to extract the multi-turn dialogue corpus feature vector of the dialogue corpus, which comprises lexical features and inter-sentence association features carrying the dialogue order. This feature vector simultaneously fuses lexical feature information, intra-sentence context information and inter-sentence context information, so deeper intent and entity information can be extracted than from a single word or a single sentence. The sentence feature extraction module is connected with the stacking ensemble learning module framework to obtain a multi-intent entity joint extraction model, which can take the context information of the dialogue corpus into account and identify all intents and entities in the corpus as well as the relations between them. The trained model extracts the semantic understanding result of the dialogue information to be answered while considering context information, so the recognition accuracy of semantic understanding can be improved.
As a possible implementation, the training data may be obtained as follows: perform word segmentation processing on each dialogue sentence of the dialogue corpus; perform intent labeling and entity labeling on the word segmentation result of each dialogue sentence to obtain labeled sentences comprising intent labels and entity labels; and divide the labeled sentences into the training data and the test data according to a preset ratio.
Further, after the training data is input into the multi-intent entity joint extraction model for training, the trained model can be verified with the test data to obtain a model evaluation result. If the evaluation result does not meet the preset condition, training of the multi-intent entity joint extraction model continues; if it does, the model is judged to be fully trained.
The flow described in the above embodiment is now illustrated by an embodiment from practical application. Referring to fig. 2, fig. 2 is a flowchart of a multi-intent entity joint extraction method based on multi-turn dialogue information provided in an embodiment of the present application, which performs data preprocessing on the Chinese dialogue corpus together with intent labeling and entity labeling. After data preprocessing, dialogue sentence features are extracted from the Chinese dialogue corpus, and a semantic understanding result is finally obtained by using the stacking ensemble learning framework. The specific implementation steps are as follows:
step 1: acquiring and sorting Chinese dialogue linguistic data, and marking intentions and entities of the linguistic data according to dialogue context information;
step 2: a dialogue sentence feature extraction module based on context sensing is set up, and feature extraction is carried out on the original corpus;
and step 3: building a stacking integrated learning module framework based on a plurality of Chinese pre-training language basic models;
and 4, step 4: connecting the sentence characteristic extraction module with the stacking integrated learning module to form a final multi-intention entity joint extraction model structure;
and 5: inputting training data into a model for training, and storing the model;
step 6: and inputting the test data into the model for model verification to obtain a model evaluation result.
This embodiment provides a sentence feature extraction method based on multi-turn dialogue information, which fuses lexical feature information, intra-sentence context information and inter-sentence context information, and can extract deeper intent and entity information than methods based on a single word or a single sentence. This embodiment also provides a multi-intent entity joint extraction method based on stacking ensembles; ablation studies and comparison experiments show that the model achieves higher recognition accuracy than a single model.
The multi-intent entity joint extraction model structure is applicable to task-oriented intelligent customer service scenes: for example, the multiple intents of a user can be accurately identified through multiple rounds of inquiry, the corresponding entity information filled in, and the task accurately executed before replying to the user. The input and output data of the multi-intent entity joint extraction model structure are illustrated below:
the input data is as follows:
u: i want to book a day to go to the air ticket in shanghai.
B: good, ask for a day's points?
U: 10 am in the morning, how do the weather in the open sky?
B: preferably, a ticket for going to Shanghai at 10 am of tomorrow is reserved for you, the weather of Shanghai in tomorrow is clear, and the air temperature is 15-0 ℃.
The output data are as follows:
u: intention is: booking an air ticket; entity: departure place (Hangzhou), arrival place (Shanghai), and date (tomorrow).
B: intention is: inquiring the departure time; entity: none.
U: intention is: providing departure time and weather inquiry; entity: time (10 am), date (tomorrow), location (shanghai).
B: intention is: booking air tickets, and inquiring weather; entity: time (10 am), date (tomorrow), location (shanghai), and temperature (10-15 deg.).
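One possible way to hold the output above in code is a per-turn record of intents and entities. The field and label names below are illustrative assumptions, since the application does not fix a serialization format.

```python
# Structured form of the example semantic understanding result (names assumed).
nlu_result = [
    {"speaker": "U", "intents": ["book_flight"],
     "entities": {"departure": "Hangzhou", "arrival": "Shanghai",
                  "date": "tomorrow"}},
    {"speaker": "B", "intents": ["ask_departure_time"], "entities": {}},
    {"speaker": "U", "intents": ["give_departure_time", "query_weather"],
     "entities": {"time": "10 am", "date": "tomorrow", "location": "Shanghai"}},
    {"speaker": "B", "intents": ["book_flight", "query_weather"],
     "entities": {"time": "10 am", "date": "tomorrow", "location": "Shanghai",
                  "temperature": "10-15 °C"}},
]

# A multi-intent turn simply carries more than one intent label.
multi_intent_turns = [t for t in nlu_result if len(t["intents"]) > 1]
print(len(multi_intent_turns))
```

Representing each turn as a list of intents (rather than a single label) is what distinguishes this multi-intent output from a conventional single-intent NLU result.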
As a possible implementation manner, the operation of labeling the corpus in step 1 may include the following steps:
step 1.1: and acquiring the Chinese dialogue corpus.
In the embodiment, the real Chinese dialogue corpus in the relevant task scene of the intelligent office assistant can be used.
Step 1.2: for each multi-turn dialogue corpus X in the original corpus obtained in step 1.1i={x1,x2,...xTDividing each sentence into words according to word list to obtain sentence xiIs represented by a vector of { w }1,w2,...,wNAnd f, wherein T is the number of sentences in the multi-turn conversation, and N is the length of the sentences, namely the number of words after word segmentation.
Step 1.3: for each multi-turn dialog sample XiEach sentence x in (1)iPerforming intent and entity tagging, e.g. sentence χiThe labeled intention label is
Figure BDA0003324278510000091
Entity tag is
Figure BDA0003324278510000092
Wherein the content of the first and second substances,
Figure BDA0003324278510000093
Yafor all intention tag sets, YsIs a set of all entity tags.
Step 1.4: and dividing the marked data into a training set and a testing set according to the ratio of 9: 1.
As a possible implementation, the feature extraction in step 2 may include the following steps:
Step 2.1: input each sentence x_i of the dialogue corpus vector representation X_i obtained in step 1.2 into a base language model (here the Chinese pre-trained language model BERT) to obtain a sentence feature representation H_i = {h_1, h_2, ..., h_T} based on lexical information.
Step 2.2: input the lexical feature representation h_i of each sentence x_i into a self-attention network to obtain a sentence feature representation S_i = {s_1, s_2, ..., s_T} carrying the contextual vocabulary information within the sentence.
The self-attention layer is computed as follows, where h_j is the feature representation of word w_j in sentence x_i, and W, u_w and b_w are trainable parameters shared over the words w_j of sentence x_i:

u_j = tanh(W h_j + b_w)

α_j = exp(u_j^T u_w) / Σ_k exp(u_k^T u_w)

s_i = Σ_j α_j h_j

Here the weight α_j represents the degree of association between each word and all the other words.
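A minimal numpy sketch of the attention computation above, assuming the standard additive-attention form with randomly initialized parameters; in practice W, b_w and u_w would be learned during training:

```python
import numpy as np

# Sketch of the sentence-internal self-attention of step 2.2, following the
# formulas above; dimensions and random initialization are assumptions.
def self_attention(H, W, b_w, u_w):
    """H: (N, d) word features h_j of one sentence; returns (s_i, alpha)."""
    U = np.tanh(H @ W + b_w)            # u_j = tanh(W h_j + b_w)
    scores = U @ u_w                    # u_j^T u_w
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()         # softmax over the N words
    s = alpha @ H                       # s_i = sum_j alpha_j h_j
    return s, alpha

rng = np.random.default_rng(0)
N, d = 6, 8                             # 6 words, feature size 8
H = rng.normal(size=(N, d))
W, b_w, u_w = rng.normal(size=(d, d)), rng.normal(size=d), rng.normal(size=d)
s, alpha = self_attention(H, W, b_w, u_w)
print(s.shape)  # (8,)
```

The weights alpha sum to 1 and give each word's degree of association with the rest of the sentence, matching the role of α_j in the text.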
Step 2.3: input all sentence feature representations S_i of a dialogue together into the Transformer Encoder network layer, so that the feature representations carry the mutual-association context information among the sentences. Here the dialogue features S_i denote the sentence-level representations of a dialogue sample (each sample comprising several sentences), i.e., the lexical feature representation of the dialogue sample X_i.
Step 2.4: because the training samples are multi-turn dialogue corpora, the order information given by the sequence of dialogue turns is very important. The feature representation obtained in step 2.3, i.e., the vector representation output by the Transformer Encoder layer, carries only unordered association information among sentences and cannot capture their one-way sequential relation, so it is further input into an LSTM network layer to obtain a feature representation C_i = {c_1, c_2, ..., c_T} with multi-turn dialogue order information.
Step 2.5: finally, the lexically informed feature representation H_i obtained in step 2.1 and the context-informed feature representation obtained in step 2.4 are fused to obtain the final multi-turn dialogue corpus feature vector. This vector represents not only the semantic information of the vocabulary but also the context and structure information expressed by the whole sentence, together with the cross-sentence semantic context of the multi-turn dialogue scene.
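The data flow of steps 2.3-2.5 can be sketched in simplified form; here a scaled dot-product mixing stands in for the Transformer Encoder and a plain recurrent pass stands in for the LSTM, so this illustrates only the flow of features, not the trained layers themselves:

```python
import numpy as np

# Minimal numpy sketch of steps 2.3-2.5: mix sentence features across the
# dialogue (stand-in for the Transformer Encoder), inject turn order with a
# simple recurrent pass (stand-in for the LSTM), then fuse with the lexical
# features of step 2.1. Real models use trained layers instead.
def cross_sentence_attention(S):
    """S: (T, d) sentence features; returns order-agnostic context features."""
    scores = S @ S.T / np.sqrt(S.shape[1])      # scaled dot-product attention
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    A = scores / scores.sum(axis=1, keepdims=True)
    return A @ S

def recurrent_pass(X, W_h, W_x):
    """Left-to-right pass adding turn order (simplified RNN, not a full LSTM)."""
    T, d = X.shape
    h, out = np.zeros(d), []
    for t in range(T):                          # turn order matters here
        h = np.tanh(X[t] @ W_x + h @ W_h)
        out.append(h)
    return np.stack(out)

rng = np.random.default_rng(1)
T, d = 4, 8                                     # 4 turns, feature size 8
H = rng.normal(size=(T, d))                     # lexical features (step 2.1)
S = cross_sentence_attention(H)                 # inter-sentence context (step 2.3)
C = recurrent_pass(S, rng.normal(size=(d, d)) * 0.1,
                   rng.normal(size=(d, d)) * 0.1)  # order info (step 2.4)
fused = np.concatenate([H, C], axis=1)          # fusion (step 2.5)
print(fused.shape)  # (4, 16)
```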
Referring to fig. 3, fig. 3 is a structural diagram of a context-aware dialogue sentence feature extraction module according to an embodiment of the present application. As shown in fig. 3, the module works in three steps: first, the lexical feature information of a sentence is extracted through a pre-trained language model; then, the sentence-internal context information is extracted through a self-attention layer, and the inter-sentence association and order information is obtained through a Transformer Encoder and an LSTM; finally, the lexical feature information and the context feature information are fused to obtain the final feature representation X1' of the sentence.
Referring to fig. 4, fig. 4 is a block diagram of a stacking-based multi-intent entity joint extraction model structure according to an embodiment of the present application, where the framework includes a base model layer and an ensemble learning layer. As shown in fig. 4, the method is based on a stacking ensemble strategy: several Chinese pre-trained language models are used as base models, model training is performed through K-fold cross validation to obtain an intent recognition model set and an entity extraction model set respectively, and the outputs of the two model sets are then input into a meta-learner, namely a TimeDistributedDense layer, to obtain the intent recognition loss and the entity extraction loss. As a possible implementation, building the stacked ensemble learning module framework in step 3 may include the following steps:
Step 3.1: first build the base model layer; the number of base models is set to 3 to avoid over-long training and overfitting caused by an over-complicated model;
Step 3.2: because the application scenario is Chinese multi-turn dialogue and the training corpus is Chinese dialogue text, the base models are chosen as the Chinese pre-trained language models BERT, ELECTRA and ERNIE 2.0;
Step 3.3: each base model is trained by K-fold cross validation; considering model scalability, K is set to 5;
Step 3.4: the output features of all base models are input into a TimeDistributedDense layer (the ensemble learning layer) to learn a meta-learner, i.e., intent recognition and entity extraction are performed at the vocabulary level respectively, finally yielding the intent loss and the entity loss, from which the total model loss is obtained. Parameters of the stacked ensemble learning module framework can be adjusted based on the total model loss until it falls below a threshold.
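A toy sketch of the stacking strategy of steps 3.1-3.4, with three identical least-squares models standing in for BERT, ELECTRA and ERNIE 2.0; the out-of-fold prediction scheme and the meta-learner fit follow the K-fold stacking idea, while all concrete model choices here are assumptions:

```python
import numpy as np

# Toy stacking ensemble: three base "models" produce out-of-fold predictions
# via 5-fold cross validation, and a meta-learner is fit on the stacked
# predictions. Least squares stands in for the pre-trained language models.
def kfold_indices(n, k=5):
    idx = np.arange(n)
    return [(np.concatenate([idx[:i * n // k], idx[(i + 1) * n // k:]]),
             idx[i * n // k:(i + 1) * n // k]) for i in range(k)]

def oof_predictions(model_fn, X, y, k=5):
    preds = np.zeros(len(X))
    for train_idx, val_idx in kfold_indices(len(X), k):
        w = model_fn(X[train_idx], y[train_idx])    # "train" the base model
        preds[val_idx] = X[val_idx] @ w             # predict the held-out fold
    return preds

def least_squares(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
y = X @ rng.normal(size=6)
base_models = [least_squares] * 3                   # 3 base models (step 3.1)
meta_features = np.stack([oof_predictions(m, X, y) for m in base_models], axis=1)
meta_w = least_squares(meta_features, y)            # the meta-learner (step 3.4)
print(meta_features.shape)  # (100, 3)
```

Using out-of-fold predictions as the meta-learner's input is what keeps the ensemble layer from overfitting to base-model training data, which is the usual motivation for combining stacking with K-fold cross validation.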
This embodiment can apply different preprocessing to different languages; for example, an English corpus does not need word segmentation. An appropriate pre-trained language model is selected for the ensemble according to the corpus; for an English corpus, for example, BERT, RoBERTa, ALBERT and the like can be chosen.
As a possible implementation, the training operation in step 4 may include the following steps:
Step 4.1: input the output vector of the feature extraction module into the base model layer of the stacking ensemble framework;
Step 4.2: regarding the main model hyper-parameters, the training batch size of BERT is 16, the batch size of ELECTRA and ERNIE 2.0 is 32, and the learning rate of the TimeDistributedDense layer is 0.001.
As a possible implementation, the evaluation in step 6 may proceed as follows: the evaluation indexes include precision, recall and the comprehensive evaluation index F1, calculated as below, where TP denotes positive samples predicted as positive, FP denotes negative samples predicted as positive, and FN denotes positive samples predicted as negative;
P (precision) = TP/(TP + FP);
R (recall) = TP/(TP + FN);
F1 (comprehensive evaluation index) = 2PR/(P + R).
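The three evaluation indexes translate directly into code; the zero-denominator guards below are an addition of this sketch, not something discussed in the text:

```python
# Direct implementation of the evaluation indexes above.
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0   # P = TP/(TP+FP)

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0   # R = TP/(TP+FN)

def f1(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0  # F1 = 2PR/(P+R)

# Example: 8 true positives, 2 false positives, 2 false negatives.
p, r = precision(8, 2), recall(8, 2)
print(round(p, 2), round(r, 2), round(f1(p, r), 2))  # 0.8 0.8 0.8
```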
Moreover, most existing systems can only perform single-intent and slot recognition on a single sentence, whereas in real scenarios a single sentence often expresses multiple intents.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a system for processing dialog messages according to an embodiment of the present application, where the system may include:
the labeling module 501 is configured to obtain a dialogue corpus generated by multiple rounds of dialogues, and perform intention labeling and entity labeling on the dialogue corpus to obtain training data;
a model building module 502 for building a sentence feature extraction module; the sentence feature extraction module is used for extracting multi-turn dialogue corpus feature vectors of the dialogue corpus; the multi-round dialogue corpus feature vector comprises vocabulary features and sentence correlation features with dialogue sequences;
the model connection module 503 is configured to build a stacked ensemble learning module framework by using multiple pre-training language models, and connect the sentence feature extraction module with the stacked ensemble learning module framework to obtain a multi-intent entity joint extraction model;
a model training module 504, configured to input the training data into the multi-intent entity joint extraction model for training;
and the information processing module 505 is configured to, if the dialog information to be answered is received, extract a semantic understanding result of the dialog information to be answered by using the trained multi-intent entity joint extraction model.
In this embodiment, the sentence feature extraction module is used to extract the multi-turn dialogue corpus feature vectors of the dialogue corpus, where each feature vector includes the lexical features and the inter-sentence association features with dialogue order. The multi-turn dialogue corpus feature vectors fuse lexical feature information, sentence-internal context information and inter-sentence context information at the same time, so deeper intent and entity information can be extracted than from a single word or a single sentence alone. In this embodiment, the sentence feature extraction module is connected with the stacked ensemble learning module framework to obtain the multi-intent entity joint extraction model, which can take the context information of the dialogue corpus into account and can identify all intents and entities in the dialogue corpus as well as the relations between them. The trained multi-intent entity joint extraction model extracts the semantic understanding result of the dialogue information to be answered, and since this extraction process takes the context information into account, the accuracy of semantic understanding can be improved.
Further, the model building module 502 is configured to build a sentence feature extraction module including a target pre-training language model, a self-attention network, a transformer encoder network layer, and an LSTM network layer;
correspondingly, the process of the sentence feature extraction module for extracting the multi-turn dialogue corpus feature vectors of the dialogue corpus includes:
extracting the vocabulary characteristics of each dialogue sentence by using the target pre-training language model;
inputting the vocabulary characteristics into the self-attention network to obtain the context vocabulary characteristics of each dialog sentence;
inputting the context vocabulary characteristics of all the dialogue sentences into the transformer encoder network layer to obtain inter-sentence correlation characteristics;
inputting the inter-sentence correlation characteristics into the LSTM network layer to obtain the inter-sentence correlation characteristics with the dialogue sequence;
and fusing the vocabulary characteristics and the association characteristics among the sentences with the conversation sequence to obtain the multi-round conversation corpus characteristic vector.
Further, the method also comprises the following steps:
the word segmentation module is used for determining the word segmentation processing result of each dialogue sentence; and the word segmentation processing result of each dialog sentence is input into the target pre-training language model to obtain the vocabulary characteristics.
Further, the stacked ensemble learning module framework comprises a base model layer and an ensemble learning layer;
the basic model layer comprises a plurality of pre-training language models and is used for training in a K-fold cross validation mode to obtain an intention recognition model set and an entity extraction model set; the integrated learning layer is used for learning a meta-learner according to the intention identification model set and the entity extraction model set to obtain intention loss and entity loss;
correspondingly, the method also comprises the following steps:
and the parameter adjusting module is used for carrying out parameter adjustment on the stacking integrated learning module framework according to the intention loss and the entity loss.
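A minimal sketch of how the intent loss and entity loss reported by the ensemble learning layer might be combined into the total model loss used for parameter adjustment; the equal weighting and the threshold value are assumptions of this sketch, since the application does not fix them:

```python
# Sketch of the parameter-adjustment criterion: the total loss is taken here
# as the unweighted sum of intent loss and entity loss, and adjustment
# continues until it drops below a threshold (both choices are assumptions).
def total_loss(intent_loss, entity_loss):
    return intent_loss + entity_loss

def should_continue_training(intent_loss, entity_loss, threshold=0.1):
    return total_loss(intent_loss, entity_loss) >= threshold

print(should_continue_training(0.3, 0.2))    # True
print(should_continue_training(0.04, 0.03))  # False
```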
Further, the method also comprises the following steps:
the model set summarizing module is used for respectively training each pre-training language model by using the multi-turn dialogue corpus feature vectors in a K-fold cross validation mode to obtain an intention recognition result and an entity extraction result output by each pre-training language model; the system is also used for summarizing all the intention recognition results to obtain the intention recognition model set; and the entity extraction model set is also used for summarizing all the entity extraction results to obtain the entity extraction model set.
Further, the labeling module 501 is configured to perform word segmentation processing on each dialogue sentence of the dialogue corpus; the labeling module is also used for performing intent labeling and entity labeling on the word segmentation processing result of each dialogue sentence to obtain labeled sentences comprising intent labels and entity labels; and is also used for dividing the labeled sentences into the training data and the test data according to a preset proportion.
Further, the method also comprises the following steps:
and the evaluation module is used for verifying the trained multi-intention entity joint extraction model by using the test data after the training data is input into the multi-intention entity joint extraction model for training to obtain a model evaluation result.
Since the embodiment of the system part corresponds to the embodiment of the method part, the embodiment of the system part is described with reference to the embodiment of the method part, and is not repeated here.
The present application also provides a storage medium having a computer program stored thereon, which when executed, may implement the steps provided by the above-described embodiments. The storage medium may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The application further provides an electronic device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided by the foregoing embodiments when calling the computer program in the memory. Of course, the electronic device may also include various network interfaces, power supplies, and the like.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for processing session information, comprising:
obtaining dialogue linguistic data generated by multiple rounds of dialogue, and performing intention labeling and entity labeling on the dialogue linguistic data to obtain training data;
building a sentence feature extraction module; the sentence feature extraction module is used for extracting multi-turn dialogue corpus feature vectors of the dialogue corpus; the multi-round dialogue corpus feature vector comprises vocabulary features and sentence correlation features with dialogue sequences;
building a stacked ensemble learning module framework by using a plurality of pre-training language models, and connecting the sentence feature extraction module with the stacked ensemble learning module framework to obtain a multi-intent entity joint extraction model;
inputting the training data into the multi-intent entity joint extraction model for training;
and if the dialogue information to be answered is received, extracting a semantic understanding result of the dialogue information to be answered by using the trained multi-intent entity joint extraction model.
2. The method for processing dialogue information according to claim 1, wherein the sentence construction feature extraction module includes:
building a sentence feature extraction module comprising a target pre-training language model, a self-attention network, a transformer encoder network layer and an LSTM network layer;
correspondingly, the process of the sentence feature extraction module for extracting the multi-turn dialogue corpus feature vectors of the dialogue corpus includes:
extracting the vocabulary characteristics of each dialogue sentence by using the target pre-training language model;
inputting the vocabulary characteristics into the self-attention network to obtain the context vocabulary characteristics of each dialog sentence;
inputting the context vocabulary characteristics of all the dialogue sentences into the transformer encoder network layer to obtain inter-sentence correlation characteristics;
inputting the inter-sentence correlation characteristics into the LSTM network layer to obtain the inter-sentence correlation characteristics with the dialogue sequence;
and fusing the vocabulary characteristics and the association characteristics among the sentences with the conversation sequence to obtain the multi-round conversation corpus characteristic vector.
3. The method for processing dialogue information according to claim 2, wherein the extracting vocabulary features of each dialogue sentence by using the target pre-training language model comprises:
determining a word segmentation processing result of each dialog sentence;
and inputting the word segmentation processing result of each dialog sentence into the target pre-training language model to obtain the vocabulary characteristics.
4. The dialogue information processing method according to claim 1, wherein the stacked ensemble learning module framework comprises a base model layer and an ensemble learning layer;
the basic model layer comprises a plurality of pre-training language models and is used for training in a K-fold cross validation mode to obtain an intention recognition model set and an entity extraction model set; the integrated learning layer is used for learning a meta-learner according to the intention identification model set and the entity extraction model set to obtain intention loss and entity loss;
correspondingly, the method also comprises the following steps:
and performing parameter adjustment on the stacked ensemble learning module framework according to the intention loss and the entity loss.
5. The method for processing dialogue information according to claim 4, wherein the training by means of K-fold cross validation to obtain an intention recognition model set and an entity extraction model set comprises:
respectively training each pre-training language model by using the multi-round dialogue corpus feature vectors in a K-fold cross validation mode to obtain an intention recognition result and an entity extraction result output by each pre-training language model;
summarizing all the intention recognition results to obtain the intention recognition model set;
and summarizing all the entity extraction results to obtain the entity extraction model set.
6. The method for processing dialogue information according to claim 1, wherein the performing intent labeling and entity labeling on the dialogue corpus to obtain training data comprises:
performing word segmentation processing on each dialogue sentence of the dialogue corpus;
performing intention labeling and entity labeling on the word segmentation processing result of each conversation sentence to obtain a labeled sentence comprising an intention label and an entity label;
and dividing the marked sentences into the training data and the test data according to a preset proportion.
7. The method for processing dialogue information according to claim 6, further comprising, after inputting the training data into the multi-intent entity joint extraction model for training:
and verifying the trained multi-intention entity combined extraction model by using the test data to obtain a model evaluation result.
8. An apparatus for processing dialogue information, comprising:
the system comprises a labeling module, a training module and a processing module, wherein the labeling module is used for acquiring dialogue linguistic data generated by multiple rounds of dialogue and performing intention labeling and entity labeling on the dialogue linguistic data to obtain training data;
the model building module is used for building a sentence feature extraction module; the sentence feature extraction module is used for extracting multi-turn dialogue corpus feature vectors of the dialogue corpus; the multi-round dialogue corpus feature vector comprises vocabulary features and sentence correlation features with dialogue sequences;
the model connection module is used for building a stacked ensemble learning module framework by utilizing a plurality of pre-training language models and connecting the sentence feature extraction module with the stacked ensemble learning module framework to obtain a multi-intent entity joint extraction model;
the model training module is used for inputting the training data into the multi-intent entity joint extraction model for training;
and the information processing module is used for extracting the semantic understanding result of the dialogue information to be answered by using the trained multi-intent entity joint extraction model if the dialogue information to be answered is received.
9. An electronic device, comprising a memory in which a computer program is stored and a processor that implements the steps of the method for processing dialogue information according to any one of claims 1 to 7 when the processor calls the computer program in the memory.
10. A storage medium having stored thereon computer-executable instructions which, when loaded and executed by a processor, carry out the steps of a method of processing dialog messages according to any one of claims 1 to 7.
CN202111257225.1A 2021-10-27 2021-10-27 Method and system for processing session information, electronic device and storage medium Pending CN114003700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111257225.1A CN114003700A (en) 2021-10-27 2021-10-27 Method and system for processing session information, electronic device and storage medium


Publications (1)

Publication Number Publication Date
CN114003700A true CN114003700A (en) 2022-02-01

Family

ID=79924391



Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115601771A (en) * 2022-12-01 2023-01-13 广州数说故事信息科技有限公司(Cn) Business order identification method, device, medium and terminal equipment based on multi-mode data
CN116910224A (en) * 2023-09-13 2023-10-20 四川金信石信息技术有限公司 Method and system for extracting switching operation information based on large language model
CN116910224B (en) * 2023-09-13 2023-11-21 四川金信石信息技术有限公司 Method and system for extracting switching operation information based on large language model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination