CN116362242A - Small sample slot value extraction method, device, equipment and storage medium - Google Patents

Small sample slot value extraction method, device, equipment and storage medium

Info

Publication number
CN116362242A
CN116362242A
Authority
CN
China
Prior art keywords
feature vector
value extraction
slot value
slot
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310259317.6A
Other languages
Chinese (zh)
Inventor
周喜
杨奉毅
杨雅婷
马博
董瑞
艾比布拉·阿塔伍拉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinjiang Technical Institute of Physics and Chemistry of CAS
Original Assignee
Xinjiang Technical Institute of Physics and Chemistry of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinjiang Technical Institute of Physics and Chemistry of CAS
Priority to CN202310259317.6A
Publication of CN116362242A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a small sample slot value extraction method, device, equipment and storage medium. The method obtains a slot value extraction data set, processes it, and constructs a small sample slot value extraction data set; trains a slot value extraction model on the basic domain using all data in the auxiliary set to obtain a historical information encoder; fuses the generated historical information feature vector, meta knowledge feature vector and part-of-speech information feature vector to obtain an enhanced feature vector for each word, and from these derives a prototype vector representation of each target slot; and computes the emission score and transition score of each sentence in the query set, calculates the probability of the slot to which each word belongs, and extracts the slot values in the sentence. The method and device fully migrate a large amount of knowledge from similar domains, reduce the degree to which the model forgets historical information, adapt effectively to the slot value extraction task in small sample scenarios, and improve the accuracy of small sample slot value extraction.

Description

Small sample slot value extraction method, device, equipment and storage medium
Technical Field
The invention relates to the field of natural language processing within information technology, in particular to slot value extraction and small sample (few-shot) learning. Specifically, the invention provides a small sample slot value extraction method, device, equipment and storage medium.
Background
Slot value extraction (slot filling) is a key task in man-machine dialog systems; its purpose is to identify slot values in user utterances. In recent years, with the rapid development of deep learning, the slot value extraction task has also advanced greatly, and researchers have proposed a series of effective algorithms, which typically require a large amount of data. However, in the initial stages of dialog system development, dialog text in the target domain is often difficult to collect, and frequently only a small number of data samples are available. In this case, because training data for the target task is limited, data-driven methods suffer severe overfitting on the task, so the model must be able to learn and generalize from a small number of samples. To address the shortage of training samples, researchers, inspired by the human ability to learn new things quickly, proposed small sample learning algorithms, hoping to learn a model with good discrimination ability for unknown categories using out-of-domain knowledge and a small amount of labeled data. Typically, small sample learning adopts a meta-learning training strategy, which requires auxiliary data from a large number of other tasks or domains. The basic idea is to simulate, at training time, the small sample situation encountered at test time, so that every training task takes the form of a small sample task; these are called meta-tasks. The model thus learns a new meta-task at each training step, and after training on a large number of them it can handle a new small sample task well. Such conventional small sample learning models aim to minimize the loss over many different meta-tasks rather than focusing on the specific labels within each meta-task.
The existing small sample slot value extraction methods have the following problems:
1) In practical slot value extraction tasks, multiple domains usually share overlapping labels; this common phenomenon causes the label space of the actual task to deviate from the setting of the traditional small sample task, and so leads the model to forget historical information;
2) Part of speech is a general, cross-domain linguistic feature that can provide useful guidance under small sample conditions, yet existing models do not take it into account.
To solve these problems in conventional small sample slot value extraction methods, the invention provides a small sample slot value extraction method, device, equipment and storage medium. The invention proposes a meta-task construction strategy based on domain migration and part-of-speech migration, together with a two-stage model training framework to realize meta-task training. The new meta-tasks can fully migrate a large amount of knowledge from similar domains, reduce the degree to which the model forgets historical information, and effectively improve slot value extraction accuracy in the target domain under small sample scenarios.
Disclosure of Invention
The invention aims to provide a small sample slot value extraction method, device, equipment and storage medium. The method obtains a slot value extraction data set, processes it, and constructs a small sample slot value extraction data set; trains a slot value extraction model on the basic domain using all data in the auxiliary set to obtain a historical information encoder; extracts the semantic information of words in the support set and query set, using the historical information encoder, a meta knowledge encoder and a part-of-speech information encoder respectively to obtain each word's historical information feature vector, meta knowledge feature vector and part-of-speech information feature vector; fuses the three feature vectors to obtain an enhanced feature vector for each word, and from these derives a prototype vector representation of each target slot; and computes the emission score and transition score of each sentence in the query set, uses them to calculate the probability of the slot to which each word belongs, and extracts the slot values in the sentence. The method fully migrates a large amount of knowledge from similar domains, reduces the degree to which the model forgets historical information, adapts effectively to the slot value extraction task in small sample scenarios, and improves the accuracy of small sample slot value extraction.
The invention discloses a small sample slot value extraction method, which comprises the following steps:
a. obtaining a slot value extraction data set, processing the data set, and constructing a small sample slot value extraction data set, wherein constructing the small sample slot value extraction data set means dividing the whole data set into a training set, a verification set and a test set; sentences in the training set, verification set and test set belong to different domains; during training, verification and testing, several groups of different domain migration meta-tasks are constructed, each comprising a support set, a query set and an auxiliary set; the auxiliary set consists of all data of the basic domain corresponding to the current target domain; the basic domain is the domain in the training set most similar to the target domain;
b. using all data in the auxiliary set from step a, employing an independent BERT language model as the historical information encoder to encode words as feature vectors, training a slot value extraction model on the basic domain with a conditional random field framework, saving the historical information encoder after training is completed, and freezing its current parameters;
c. extracting semantic information of words on a support set and a query set in the step a, using a historical information encoder in the step b to encode and obtain historical information feature vectors of the words, using a meta knowledge encoder to encode and obtain meta knowledge feature vectors of the words, and using a part-of-speech information encoder to obtain part-of-speech information feature vectors of the words;
d. fusing the historical information feature vector, meta knowledge feature vector and part-of-speech information feature vector generated in step c to obtain word enhancement feature vectors, and averaging all word enhancement feature vectors corresponding to each slot in the support set; the resulting average vector serves as the prototype vector representation of that slot;
e. computing the emission score and transition score of each sentence in the query set with a conditional random field framework, using them to calculate the probability of the slot to which each word belongs, and extracting the slot values in the sentence.
In step c, the historical information encoder is the BERT language model trained in step b, and the meta knowledge encoder and part-of-speech information encoder are independent BERT language models.
A small sample slot value extraction device corresponding to the method of claims 1-2, the device comprising a small sample slot value extraction data set construction module, a basic domain slot value extraction module, a semantic encoder module, a feature fusion module and a conditional random field module, wherein:
small sample slot value extraction dataset construction module: used to obtain a slot value extraction data set, process it, and construct the small sample slot value extraction data set; its data set segmentation unit divides the whole data set into a training set, a verification set and a test set, and its domain migration meta-task construction unit constructs several groups of different domain migration meta-tasks during training and testing, each comprising a support set, a query set and an auxiliary set;
basic domain slot value extraction module: trains a slot value extraction model on the basic domain using all data in the auxiliary set; it comprises a basic domain coding unit, which encodes words into feature vectors with an independent BERT language model, and a basic domain conditional random field unit, which computes the emission and transition scores and determines the slot of each word in the sentence.
Semantic encoder module: extracts the semantic information of words in the support set and query set and encodes each word into three different high-dimensional feature vectors: a historical information feature vector, a meta knowledge feature vector and a part-of-speech information feature vector. It comprises a historical information encoding unit, which encodes words into historical information feature vectors using the basic domain coding unit of the trained basic domain slot value extraction module; a meta knowledge encoding unit, which encodes words into meta knowledge feature vectors using an independent BERT language model; and a part-of-speech information encoding unit, which encodes words into part-of-speech information feature vectors using an independent BERT language model;
and a feature fusion module: fuses the generated historical information feature vector, meta knowledge feature vector and part-of-speech information feature vector to obtain an enhanced feature vector for each word, and from these a prototype vector representation of each target slot. It comprises a task adaptation unit, which determines the weights of the historical information feature vector and meta knowledge feature vector and fuses them into a task adaptation feature vector; a part-of-speech adaptation unit, which determines the weights of the part-of-speech information feature vector and meta knowledge feature vector and fuses them into a part-of-speech adaptation feature vector; an enhancement feature generation unit, which averages the task adaptation feature vector and part-of-speech adaptation feature vector to obtain the enhanced feature vector; and a prototype vector generation unit, which averages all word enhancement feature vectors corresponding to each slot in the support set, the resulting average vector serving as the prototype vector representation of that slot;
conditional random field module: computes the emission score and transition score of each sentence in the query set and determines the slot of each word in the sentence. It comprises an emission score calculation unit, which computes the similarity between a word's enhanced feature vector and a slot prototype to obtain the word's emission score; a transition score calculation unit, which obtains the transition scores between slots through training; and a slot value extraction unit, which calculates the probability of the slot to which each word belongs from the emission and transition scores and extracts the slot values in the sentence.
An electronic device, the device comprising: at least one processor; at least one GPU computing card; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor or by the at least one GPU computing card to enable the at least one processor or the at least one GPU computing card to perform the method of claims 1-2.
A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method recited in claims 1-2.
The small sample slot value extraction method, device, equipment and storage medium can complete the slot value extraction task in small sample scenarios and improve the accuracy of slot value extraction under small sample conditions.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings. The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of constructing a small sample slot value extraction dataset according to the present invention;
FIG. 3 is a flow chart of a method of domain migration meta-task construction of the present invention;
FIG. 4 is a flow chart of a method for encoding words into feature vectors according to the present invention;
FIG. 5 is a block diagram of a method for encoding words into feature vectors in accordance with the present invention;
FIG. 6 is a flow chart of a method for extracting slot values of a conditional random field framework in accordance with the present invention;
FIG. 7 is a flow chart of a method of fusing word history information features, meta knowledge features, and part-of-speech information features in accordance with the present invention;
FIG. 8 is a block diagram of a method of fusing word history information features, meta knowledge features, and part-of-speech information features in accordance with the present invention;
FIG. 9 is a schematic diagram of a small sample slot value extraction device according to the present invention;
FIG. 10 is a block diagram of an electronic device of the small sample slot value extraction method of the present invention.
Detailed Description
In order to better understand the solution of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. Various details of the embodiments of the present application are included to facilitate understanding, and they should be considered merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Examples
The small sample slot value extraction method includes the following steps:
a. obtaining a slot value extraction data set, processing the data set, and constructing a small sample slot value extraction data set, wherein the whole data set is divided into a training set, a verification set and a test set, and sentences in the training set, verification set and test set belong to different domains; during training, verification and testing, several groups of different domain migration meta-tasks are constructed, each comprising a support set, a query set and an auxiliary set; the auxiliary set consists of all data of the basic domain corresponding to the current target domain; the basic domain is the domain in the training set most similar to the target domain;
b. using all data in the auxiliary set from step a, employing an independent BERT language model as the historical information encoder to encode words as feature vectors, training a slot value extraction model on the basic domain with a conditional random field framework, saving the historical information encoder after training is completed, and freezing its current parameters;
c. extracting the semantic information of words in the support set and query set from step a, using the historical information encoder from step b to obtain each word's historical information feature vector, a meta knowledge encoder to obtain its meta knowledge feature vector, and a part-of-speech information encoder to obtain its part-of-speech information feature vector, wherein the historical information encoder is the BERT language model trained in step b, and the meta knowledge encoder and part-of-speech information encoder are independent BERT language models;
d. fusing the historical information feature vector, meta knowledge feature vector and part-of-speech information feature vector generated in step c to obtain word enhancement feature vectors, and averaging all word enhancement feature vectors corresponding to each slot in the support set; the resulting average vector serves as the prototype vector representation of that slot;
e. computing the emission score and transition score of each sentence in the query set with a conditional random field framework, using them to calculate the probability of the slot to which each word belongs, and extracting the slot values in the sentence;
the device consists of a small sample slot value extraction data set construction module, a basic domain slot value extraction module, a semantic encoder module, a feature fusion module and a conditional random field module, wherein:
small sample slot value extraction dataset construction module: used to obtain a slot value extraction data set, process it, and construct the small sample slot value extraction data set; its data set segmentation unit divides the whole data set into a training set, a verification set and a test set, and its domain migration meta-task construction unit constructs several groups of different domain migration meta-tasks during training and testing, each comprising a support set, a query set and an auxiliary set;
basic domain slot value extraction module: trains a slot value extraction model on the basic domain using all data in the auxiliary set; it comprises a basic domain coding unit, which encodes words into feature vectors with an independent BERT language model, and a basic domain conditional random field unit, which computes the emission and transition scores and determines the slot of each word in the sentence;
semantic encoder module: extracts the semantic information of words in the support set and query set and encodes each word into three different high-dimensional feature vectors: a historical information feature vector, a meta knowledge feature vector and a part-of-speech information feature vector. It comprises a historical information encoding unit, which encodes words into historical information feature vectors using the basic domain coding unit of the trained basic domain slot value extraction module; a meta knowledge encoding unit, which encodes words into meta knowledge feature vectors using an independent BERT language model; and a part-of-speech information encoding unit, which encodes words into part-of-speech information feature vectors using an independent BERT language model;
and a feature fusion module: fuses the generated historical information feature vector, meta knowledge feature vector and part-of-speech information feature vector to obtain an enhanced feature vector for each word, and from these a prototype vector representation of each target slot. It comprises a task adaptation unit, which determines the weights of the historical information feature vector and meta knowledge feature vector and fuses them into a task adaptation feature vector; a part-of-speech adaptation unit, which determines the weights of the part-of-speech information feature vector and meta knowledge feature vector and fuses them into a part-of-speech adaptation feature vector; an enhancement feature generation unit, which averages the task adaptation feature vector and part-of-speech adaptation feature vector to obtain the enhanced feature vector; and a prototype vector generation unit, which averages all word enhancement feature vectors corresponding to each slot in the support set, the resulting average vector serving as the prototype vector representation of that slot;
conditional random field module: computes the emission score and transition score of each sentence in the query set and determines the slot of each word in the sentence. It comprises an emission score calculation unit, which computes the similarity between a word's enhanced feature vector and a slot prototype to obtain the word's emission score; a transition score calculation unit, which obtains the transition scores between slots through training; and a slot value extraction unit, which calculates the probability of the slot to which each word belongs from the emission and transition scores and extracts the slot values in the sentence;
an electronic device, the device comprising: at least one processor; at least one GPU computing card; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor or by the at least one GPU computing card to enable the at least one processor or the at least one GPU computing card to perform the method of claims 1-2;
a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method recited in claims 1-2;
FIG. 1 is a flow chart of a small sample slot value extraction method that can be applied to slot value extraction in dialog systems under small sample conditions; the method may be performed by a small sample slot value extraction device implemented in software and/or hardware. Referring to FIG. 1, the small sample slot value extraction method includes:
obtaining a slot value extraction data set, processing the data set, and constructing a small sample slot value extraction data set; in one embodiment, the slot value extraction dataset is a dialog dataset containing multiple slot values across multiple domains;
illustratively, the slot value extraction dataset is the benchmark slot value extraction dataset SNIPS, which contains seven different domains, such as reserving restaurants, querying the weather and playing music; each domain contains various slots, for example the restaurant reservation domain contains slots such as time, restaurant name and number of people;
the construction of the small sample slot value extraction dataset is shown in fig. 2, and specifically comprises the following steps:
dividing the whole data set into a training set, a verification set and a test set, wherein sentences in the training set, verification set and test set belong to different domains;
illustratively, in the SNIPS data set, all samples of five domains (including restaurant reservation) belong to the training set, all samples of the weather query domain belong to the verification set, and all samples of the music playing domain belong to the test set;
constructing several domain migration meta-tasks from the training set for model training, where the meta-tasks aim to train the generalization ability of the model by simulating small sample scenarios; the domain migration meta-tasks are constructed in C-way K-shot form and comprise a support set, a query set and an auxiliary set;
constructing several domain migration meta-tasks from the verification set for model verification, and saving the model that performs best on the verification set for the final model test;
during the model training iteration, 1000 meta-tasks are constructed in total; after every 100 training meta-tasks, validation is performed on the verification set's meta-tasks to obtain the model's loss score on the verification set, and after training is completed the model that performs best on the verification set is taken as the model used in testing, a schedule sketched below;
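As a rough illustration of this episodic schedule, the following sketch assumes hypothetical helpers sample_meta_task, train_on_task and evaluate_loss, a PyTorch-style model, and an arbitrary number of validation meta-tasks per check; none of these names come from the patent itself:

```python
import copy

def episodic_training(model, train_domains, val_domains):
    """Run 1000 training meta-tasks, validate every 100, keep the best model."""
    best_val_loss = float("inf")
    best_state = None
    for step in range(1, 1001):                      # 1000 meta-tasks in total
        task = sample_meta_task(train_domains)       # support/query/auxiliary sets
        train_on_task(model, task)                   # one meta-task update
        if step % 100 == 0:                          # validate every 100 tasks
            # 20 validation meta-tasks is an arbitrary illustrative choice
            val_tasks = [sample_meta_task(val_domains) for _ in range(20)]
            val_loss = sum(evaluate_loss(model, t) for t in val_tasks) / len(val_tasks)
            if val_loss < best_val_loss:
                best_val_loss = val_loss
                best_state = copy.deepcopy(model.state_dict())
    if best_state is not None:
        model.load_state_dict(best_state)            # best model used at test time
    return model
```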
constructing several domain migration meta-tasks from the test set for model testing; in the test stage, a number of domain migration meta-tasks are likewise constructed, and the final slot value extraction accuracy is computed as the average of the slot value extraction accuracies over these meta-tasks;
the construction of a domain migration meta-task is shown in FIG. 3 and specifically includes:
constructing a support set from the training set or the test set: taking the construction of a domain migration meta-task in the training stage as an example, a domain is first randomly selected from the whole training set as the target domain, and N sentences from the target domain are selected as the support set. Assuming the target domain has C different slots and the slot value extraction task must be completed with K samples per slot, the N sentences must satisfy two conditions: (1) every slot appearing in the N sentences appears at least K times; (2) after removing any one sentence, at least one slot appears fewer than K times. The N samples form the support set

S = \{(x^{(i)}, y^{(i)})\}_{i=1}^{N},

where x^{(i)} and y^{(i)} denote the i-th sentence in the support set and its corresponding slot label sequence;
illustratively, in the SNIPS dataset, if the final goal is to extract slot values for new sentences given 5 samples per slot, then K=5; taking the training stage as an example, 1 domain is first selected from the 5 domains in the training set as the target domain, for example restaurant reservation, which has 10 different slots, i.e. C=10; a total of 30 sentences are selected from the restaurant reservation domain as the support set, i.e. N=30; each slot appears at least 5 times in the 30 sentences, and after removing any one sentence at least one slot appears fewer than 5 times; a greedy selection satisfying these conditions is sketched below;
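The two selection conditions can be met greedily; the sketch below is one plausible reading of them (it is not taken from the patent itself), treating each sentence as a (words, labels) pair with "O" marking non-slot words:

```python
import random

def sample_support_set(sentences, K, seed=0):
    """Greedily pick sentences until every slot type appears >= K times,
    then enforce minimality: drop any sentence whose removal still keeps
    all slots at >= K occurrences (condition 2)."""
    rng = random.Random(seed)
    pool = sentences[:]
    rng.shuffle(pool)
    support, counts = [], {}
    slot_types = {l for _, labels in pool for l in labels if l != "O"}
    for words, labels in pool:
        if all(counts.get(s, 0) >= K for s in slot_types):
            break
        # add the sentence only if it contains some slot still under K
        if any(counts.get(l, 0) < K for l in labels if l != "O"):
            support.append((words, labels))
            for l in labels:
                if l != "O":
                    counts[l] = counts.get(l, 0) + 1
    # minimality pass over a copy, since we mutate `support`
    for sent in support[:]:
        _, labels = sent
        if all(counts[l] - labels.count(l) >= K for l in set(labels) if l != "O"):
            support.remove(sent)
            for l in labels:
                if l != "O":
                    counts[l] -= 1
    return support
```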
constructing a query set from the training set or test set: taking meta-task construction in the training stage as an example, L samples are drawn from the remaining samples of the C slots in the target domain to form the query set

Q = \{(x^{(i)}, y^{(i)})\}_{i=1}^{L},

where x^{(i)} and y^{(i)} denote the i-th sentence in the query set and its corresponding slot label sequence;
illustratively, in the SNIPS dataset, 40 sentences are randomly drawn as the query set from the remaining samples of the restaurant reservation target domain, i.e. L=40;
selecting a basic domain from the training set according to the current target domain, and constructing an auxiliary set: the domain in the training set most similar to the current target domain, judged by the domains' slot labels, is selected as the basic domain, and all sentences of the basic domain form the auxiliary set

A = \{(x_b^{(i)}, y_b^{(i)})\}_{i=1}^{M},

where x_b^{(i)} and y_b^{(i)} denote the i-th sentence in the auxiliary set and its corresponding slot label sequence, and M is the total number of sentences in the auxiliary set;
illustratively, in the SNIPS data set, the training-set domain whose slot labels overlap most with those of restaurant reservation is movie reservation, so movie reservation is the basic domain for restaurant reservation, and all 1000 sentences of the movie reservation domain form the auxiliary set, i.e. M=1000; a sketch of this overlap-based selection follows;
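A minimal sketch of the basic-domain selection, assuming each candidate domain is represented by a hypothetical dict with "name", "slots" and "sentences" keys, and measuring similarity simply as the number of shared slot labels (the description only says "most similar ... according to the slot labels", so the exact criterion is an assumption):

```python
def choose_basic_domain(target_slots, training_domains):
    """Pick the training-set domain with the largest slot-label overlap;
    all of its sentences then form the auxiliary set A."""
    def overlap(domain):
        return len(set(target_slots) & set(domain["slots"]))
    best = max(training_domains, key=overlap)
    return best["name"], best["sentences"]   # auxiliary set = all sentences
```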
completing the construction of the domain migration meta-task: the goal of the domain migration meta-task is to determine the correct slot label of each word of the sentences in the query set Q, based on a large number of samples in the auxiliary set A and a small number of samples in the support set S; the loss function is

\mathcal{L} = \sum_{(x, y) \in Q} \ell(y^{*}, y),

where y^{*} is the slot label sequence predicted by the model for sentence x, and y is the true slot label sequence of sentence x;
training a slot value extraction model on the basic domain with all data in the auxiliary set to obtain the historical information encoder: an independent BERT language model is adopted as the historical information encoder to encode words into feature vectors; a slot value extraction model is trained on the basic domain with a conditional random field framework; after training is completed the historical information encoder is saved and its current parameters are frozen;
the word coding method is shown in fig. 4, and specifically comprises the following steps:
a separate BERT language model is used as the encoder: taking a history information Encoder as an example, a separate BERT language model is used as the history information Encoder, denoted Basic Encoder (BE);
for each sentence, encoding the sentence into a token sequence, adding a special mark [ CLS ] at the beginning position of the token sequence, and adding a special mark [ SEP ] at the end position;
inputting the token sequence into the BERT model, and taking the output at the corresponding position as the feature vector of the word;
illustratively, as shown in FIG. 5, the sentence is first encoded as a token sequence; note that a single word may be encoded as multiple tokens. After the special tags [CLS] and [SEP] are added, the token sequence is input into the BERT model, and the final output at a word's corresponding position is taken as that word's feature vector; a minimal encoding sketch follows;
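A minimal encoding sketch using the HuggingFace transformers library; the bert-base-uncased checkpoint and the take-the-first-sub-token convention for multi-token words are assumptions, since the description fixes neither:

```python
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def encode_words(words):
    """Encode a pre-tokenized sentence; [CLS]/[SEP] are added automatically,
    and each word's vector is the output at its first sub-token."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]      # (num_tokens, 768)
    vectors, seen = [], set()
    for idx, word_id in enumerate(enc.word_ids()):     # maps tokens -> words
        if word_id is not None and word_id not in seen:
            seen.add(word_id)
            vectors.append(hidden[idx])
    return torch.stack(vectors)                        # (num_words, 768)

# e.g. encode_words(["book", "a", "table", "for", "two"]) -> 5 vectors
```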
the flow of the conditional random field framework slot value extraction method is shown in FIG. 6 and specifically includes:
acquiring the semantic information of words: following the word encoding method above, for each sentence x_b in the auxiliary set, each word (x_b)_j is encoded into a historical information feature vector e_j by the historical information encoder, e_j = BE((x_b)_j);
calculating the emission score of a sentence: the emission score is computed as

f_E(x_b, y_b) = \sum_{j=1}^{n} \mathrm{sim}(e_j, c_{y_j}),

where n is the total number of words in sentence x_b, \mathrm{sim}(\cdot,\cdot) is a vector similarity, and c_{y_j} is the prototype vector of slot label y_j, computed as the average of the historical information feature vectors of all words in the support set belonging to that slot;
calculating the transition score of a sentence: the transition score is computed as

F_T(x_b, y_b) = \sum_{j=2}^{n} f_T(y_{j-1}, y_j), \qquad f_T(y_{j-1}, y_j) = p(y_j \mid y_{j-1}),

where the dependency p(y_j \mid y_{j-1}) between labels is learned by a neural network;
training the model by minimizing the loss function: the probability of the final label sequence y_b is

p(y_b \mid x_b, S) = \frac{\exp\!\big(f_E(x_b, y_b) + \lambda F_T(x_b, y_b)\big)}{\sum_{y' \in Y_b} \exp\!\big(f_E(x_b, y') + \lambda F_T(x_b, y')\big)},

where Y_b is the set of candidate slot label sequences over the auxiliary set's labels, and λ is a parameter weighting the two scores, typically set to 1; the loss function of the conditional random field model is L_b = -\log p(y_b \mid x_b, S), as sketched below;
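The scoring and loss just described can be sketched as follows; the dot-product similarity, the transition parameterization as a learned matrix, and the forward-algorithm normalization are assumptions filling in details the text leaves open:

```python
import torch
import torch.nn as nn

class ProtoCRF(nn.Module):
    """Sketch: emission = similarity of a word vector to its slot prototype,
    transition = learned label-pair score, CRF loss = -log p(y | x, S)."""
    def __init__(self, num_labels, lam=1.0):
        super().__init__()
        self.trans = nn.Parameter(torch.zeros(num_labels, num_labels))
        self.lam = lam

    def sentence_score(self, word_vecs, labels, prototypes):
        emit = sum(word_vecs[j] @ prototypes[labels[j]] for j in range(len(labels)))
        trans = sum(self.trans[labels[j - 1], labels[j]] for j in range(1, len(labels)))
        return emit + self.lam * trans

    def loss(self, word_vecs, labels, prototypes):
        # log partition over all label sequences via the forward algorithm
        emissions = word_vecs @ prototypes.T            # (n_words, n_labels)
        alpha = emissions[0]
        for j in range(1, word_vecs.size(0)):
            alpha = torch.logsumexp(
                alpha.unsqueeze(1) + self.lam * self.trans, dim=0) + emissions[j]
        log_z = torch.logsumexp(alpha, dim=0)
        return log_z - self.sentence_score(word_vecs, labels, prototypes)

# prototypes[k] = mean of support-set word vectors labelled with slot k
```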
extracting the semantic information of words in the support set and query set, and acquiring each word's historical information features, meta knowledge features and part-of-speech information features: the historical information encoder is used to obtain a word's historical information feature vector, a meta knowledge encoder its meta knowledge feature vector, and a part-of-speech information encoder its part-of-speech information feature vector; the historical information encoder is the BERT language model trained with the basic domain slot value extraction model, and the meta knowledge encoder and part-of-speech information encoder are independent BERT language models;
illustratively, taking a sentence x in the support set as an example, the historical information encoder is denoted Basic Encoder (BE), the meta knowledge encoder Meta Encoder (ME), and the part-of-speech information encoder POS Encoder (PE); for each word x_i in sentence x, the three encoders generate the corresponding semantic feature vectors: the historical information feature vector e_i^{B} = BE(x_i), the meta knowledge feature vector e_i^{M} = ME(x_i), and the part-of-speech information feature vector e_i^{P} = PE(x_i);
fusing the generated semantic features of each word to obtain its enhanced feature vector, and from these the prototype vector representations of the slots: the generated historical information feature vector, meta knowledge feature vector and part-of-speech information feature vector are fused into a word enhancement feature vector, and all word enhancement feature vectors corresponding to each slot in the support set are averaged; the resulting average vector serves as the prototype vector representation of that slot;
referring to fig. 7, the flow chart of the word history information feature, meta knowledge feature and part-of-speech information feature fusion method specifically includes:
a task adaptation network formed by two fully connected layers determines the weights of the historical information feature vector and the meta knowledge feature vector, which are fused to obtain the task adaptation feature vector; a part-of-speech adaptation network formed by two fully connected layers determines the weights of the part-of-speech information feature vector and the meta knowledge feature vector, which are fused to obtain the part-of-speech adaptation feature vector; the task adaptation feature vector and part-of-speech adaptation feature vector are averaged to obtain the enhanced feature vector;
illustratively, as in FIG. 8, for a word x_i with historical information feature vector e_i^{B}, meta knowledge feature vector e_i^{M} and part-of-speech information feature vector e_i^{P}, the task adaptation feature vector is

e_i^{task} = \lambda_i^{task} \odot e_i^{B} + (1 - \lambda_i^{task}) \odot e_i^{M},

where ⊙ denotes the element-wise product of two vectors; the weight \lambda_i^{task} is computed as

\lambda_i^{task} = \sigma\big(W_2\,\sigma(W_1 [e_i^{B}; e_i^{M}] + d_1) + d_2\big),

where σ is the sigmoid activation, and W_1, d_1, W_2, d_2 are respectively the weight of linear layer a, the bias of linear layer a, the weight of linear layer b and the bias of linear layer b. Similarly, the part-of-speech adaptation feature vector is

e_i^{pos} = \lambda_i^{pos} \odot e_i^{P} + (1 - \lambda_i^{pos}) \odot e_i^{M},

with weight

\lambda_i^{pos} = \sigma\big(W_4\,\sigma(W_3 [e_i^{P}; e_i^{M}] + d_3) + d_4\big),

where W_3, d_3, W_4, d_4 are respectively the weight of linear layer c, the bias of linear layer c, the weight of linear layer d and the bias of linear layer d. The enhanced feature vector of the word is finally

e_i = \tfrac{1}{2}\,(e_i^{task} + e_i^{pos});
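A sketch of this fusion step as a PyTorch module; the hidden width and the exact gate wiring (sigmoid gates over the concatenated vector pair) are assumptions consistent with, but not dictated by, the two-fully-connected-layer description above:

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Gate the history vector against the meta knowledge vector (task
    adaptation), gate the part-of-speech vector against the meta knowledge
    vector (part-of-speech adaptation), then average the two results."""
    def __init__(self, dim, hidden=256):                # hidden width assumed
        super().__init__()
        self.task_gate = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.Sigmoid(),   # linear layer a
            nn.Linear(hidden, dim), nn.Sigmoid())       # linear layer b
        self.pos_gate = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.Sigmoid(),   # linear layer c
            nn.Linear(hidden, dim), nn.Sigmoid())       # linear layer d

    def forward(self, e_hist, e_meta, e_pos):
        lam_t = self.task_gate(torch.cat([e_hist, e_meta], dim=-1))
        e_task = lam_t * e_hist + (1 - lam_t) * e_meta  # element-wise product
        lam_p = self.pos_gate(torch.cat([e_pos, e_meta], dim=-1))
        e_posa = lam_p * e_pos + (1 - lam_p) * e_meta
        return 0.5 * (e_task + e_posa)                  # enhanced feature vector
```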
calculating the emission score and transition score of each sentence in the query set with the conditional random field framework, and finally extracting the slot values in the sentence: the conditional random field slot value extraction method used on the query set is identical to the one used on the auxiliary set, and finally the Viterbi algorithm is used to determine the sentence's slot values, as sketched below;
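A standard Viterbi decoding sketch over the emission and weighted transition scores; the patent names the algorithm but gives no implementation details:

```python
import torch

def viterbi_decode(emissions, trans, lam=1.0):
    """emissions: (n_words, n_labels) tensor; trans[i, j] scores label i -> j.
    Returns the highest-scoring slot label index sequence."""
    n, _ = emissions.shape
    score = emissions[0]
    back = []
    for j in range(1, n):
        # total[i, l] = best score ending at word j-1 with label i, then label l
        total = score.unsqueeze(1) + lam * trans + emissions[j]
        score, idx = total.max(dim=0)
        back.append(idx)
    best = [int(score.argmax())]
    for idx in reversed(back):                 # follow back-pointers
        best.append(int(idx[best[-1]]))
    return list(reversed(best))                # best slot label per word
```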
FIG. 9 is a schematic structural diagram of a small sample slot value extraction device according to an embodiment of the present invention, including: a small sample slot value extraction data set construction module, a basic domain slot value extraction module, a semantic encoder module, a feature fusion module and a conditional random field module; wherein:
the small sample slot value extraction data set construction module is used for acquiring a slot value extraction data set, processing the data set and constructing the small sample slot value extraction data set;
the basic domain slot value extraction module is used to train a slot value extraction model on the basic domain using all data in the auxiliary set;
the semantic encoder module is used for extracting semantic information of words on the support set and the query set, and encoding the words into three different high-dimensional feature vectors which are historical information feature vectors, meta knowledge feature vectors and part-of-speech information feature vectors respectively;
the feature fusion module is used for fusing the generated historical information feature vector, the meta knowledge feature vector and the part-of-speech information feature vector to obtain an enhanced feature vector of the word, and further obtaining a prototype vector representation of the target slot position;
the conditional random field module is used to calculate the emission score and transition score of each sentence in the query set and determine the slot of each word in the sentence;
the small sample slot value extraction data set construction module comprises: the data set segmentation unit is used for dividing the whole data set into a training set, a verification set and a test set;
the data set segmentation unit is specifically characterized in that sentences in the training set, verification set and test set belong to different domains;
the domain migration meta-task construction unit is used to construct several groups of different domain migration meta-tasks during training and testing, each comprising a support set, a query set and an auxiliary set;
the basic domain slot value extraction module comprises: a basic domain coding unit for encoding words into feature vectors using an independent BERT language model;
the basic domain conditional random field unit is used to calculate emission scores and transition scores and determine the slot of each word in the sentence;
the semantic encoder module comprises: a historical information encoding unit for encoding words into historical information feature vectors using the basic domain coding unit of the trained basic domain slot value extraction module;
a meta knowledge encoding unit for encoding words into meta knowledge feature vectors using an independent BERT language model;
a part-of-speech information encoding unit for encoding words into part-of-speech information feature vectors using an independent BERT language model;
the feature fusion module comprises: the task adaptation unit is used for determining weights of the historical information feature vector and the meta knowledge feature vector, and fusing the historical information feature vector and the meta knowledge feature vector to obtain a task adaptation feature vector;
the part-of-speech adaptation unit is used to determine the weights of the part-of-speech information feature vector and the meta knowledge feature vector, and to fuse them into a part-of-speech adaptation feature vector;
the enhancement feature generation unit is used for averaging the task adaptation feature vector and the part-of-speech adaptation feature vector to obtain an enhancement feature vector;
the prototype vector generation unit is used for averaging all word enhancement feature vectors corresponding to each slot position on the support set, and the obtained average vector is used as a prototype vector representation of the slot position;
the conditional random field module comprises: the emission score calculation unit is used for calculating the similarity between the word enhancement feature vector and the slot prototype to obtain the emission score of the word;
the transition score calculation unit is used to obtain the transition scores between slots through training;
the slot value extraction unit is used to calculate the probability of the slot to which each word belongs from the emission score and transition score, and to extract the slot values in the sentence;
the invention also provides an electronic device and a readable storage medium; FIG. 10 shows a block diagram of an electronic device for the small sample slot value extraction method; electronic device here refers to a wide variety of modern electronic digital computers, for example personal computers, portable computers and various server devices; the components shown here, and their interconnections and functions, are examples only;
as shown in FIG. 10, the electronic device includes: one or more multi-core processors, one or more GPU computing cards, and memory; to allow interaction with the electronic device it further includes an input device and an output device; these components are interconnected and communicate through buses;
the memory is a non-transitory computer readable storage medium provided by the invention; the memory stores instructions executable by the at least one multi-core processor or the at least one GPU computing card, so that the small sample slot value extraction method provided herein can be performed; the non-transitory computer readable storage medium of the invention stores computer instructions for causing a computer to execute the small sample slot value extraction method provided by the invention;
the input device is used to provide and receive control signals input to the electronic device by the user, and includes a keyboard for generating digital or character information and a mouse for generating other control signals; the output device provides feedback from the electronic device to the user, including a display that presents execution results or progress.
This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.

Claims (5)

1. A small sample slot value extraction method, characterized by comprising the following steps:
a. obtaining a slot value extraction data set, processing the data set, and constructing a small sample slot value extraction data set, wherein constructing the small sample slot value extraction data set means dividing the whole data set into a training set, a verification set and a test set; sentences in the training set, verification set and test set belong to different domains; during training, verification and testing, several groups of different domain migration meta-tasks are constructed, each comprising a support set, a query set and an auxiliary set; the auxiliary set consists of all data of the basic domain corresponding to the current target domain; the basic domain is the domain in the training set most similar to the target domain;
b. using all data in the auxiliary set from step a, employing an independent BERT language model as the historical information encoder to encode words as feature vectors, training a slot value extraction model on the basic domain with a conditional random field framework, saving the historical information encoder after training is completed, and freezing its current parameters;
c. extracting semantic information of words on a support set and a query set in the step a, using a historical information encoder in the step b to encode and obtain historical information feature vectors of the words, using a meta knowledge encoder to encode and obtain meta knowledge feature vectors of the words, and using a part-of-speech information encoder to obtain part-of-speech information feature vectors of the words;
d. fusing the historical information feature vector, meta knowledge feature vector and part-of-speech information feature vector generated in step c to obtain word enhancement feature vectors, and averaging all word enhancement feature vectors corresponding to each slot in the support set; the resulting average vector serves as the prototype vector representation of that slot;
e. computing the emission score and transition score of each sentence in the query set with a conditional random field framework, using them to calculate the probability of the slot to which each word belongs, and extracting the slot values in the sentence.
2. The method of claim 1, wherein the history information encoder in step c is a BERT language model trained in step b, and the meta knowledge encoder and the part-of-speech information encoder are independent BERT language models.
3. A small sample slot value extraction device corresponding to the method of claims 1-2, characterized in that the device consists of a small sample slot value extraction data set construction module, a basic domain slot value extraction module, a semantic encoder module, a feature fusion module and a conditional random field module, wherein:
small sample slot value extraction dataset construction module: used to obtain a slot value extraction data set, process it, and construct the small sample slot value extraction data set; its data set segmentation unit divides the whole data set into a training set, a verification set and a test set, and its domain migration meta-task construction unit constructs several groups of different domain migration meta-tasks during training and testing, each comprising a support set, a query set and an auxiliary set;
basic domain slot value extraction module: trains a slot value extraction model on the basic domain using all data in the auxiliary set; it comprises a basic domain coding unit, which encodes words into feature vectors with an independent BERT language model, and a basic domain conditional random field unit, which computes the emission and transition scores and determines the slot of each word in the sentence;
semantic encoder module: extracts the semantic information of words in the support set and query set and encodes each word into three different high-dimensional feature vectors: a historical information feature vector, a meta knowledge feature vector and a part-of-speech information feature vector. It comprises a historical information encoding unit, which encodes words into historical information feature vectors using the basic domain coding unit of the trained basic domain slot value extraction module; a meta knowledge encoding unit, which encodes words into meta knowledge feature vectors using an independent BERT language model; and a part-of-speech information encoding unit, which encodes words into part-of-speech information feature vectors using an independent BERT language model;
and a feature fusion module: fuses the generated historical information feature vector, meta knowledge feature vector and part-of-speech information feature vector to obtain an enhanced feature vector for each word, and from these a prototype vector representation of each target slot. It comprises a task adaptation unit, which determines the weights of the historical information feature vector and meta knowledge feature vector and fuses them into a task adaptation feature vector; a part-of-speech adaptation unit, which determines the weights of the part-of-speech information feature vector and meta knowledge feature vector and fuses them into a part-of-speech adaptation feature vector; an enhancement feature generation unit, which averages the task adaptation feature vector and part-of-speech adaptation feature vector to obtain the enhanced feature vector; and a prototype vector generation unit, which averages all word enhancement feature vectors corresponding to each slot in the support set, the resulting average vector serving as the prototype vector representation of that slot;
conditional random field module: computes the emission score and transition score of each sentence in the query set and determines the slot of each word in the sentence. It comprises an emission score calculation unit, which computes the similarity between a word's enhanced feature vector and a slot prototype to obtain the word's emission score; a transition score calculation unit, which obtains the transition scores between slots through training; and a slot value extraction unit, which calculates the probability of the slot to which each word belongs from the emission and transition scores and extracts the slot values in the sentence.
4. An electronic device, the device comprising: at least one processor; at least one GPU computing card; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor or by the at least one GPU computing card to enable the at least one processor or the at least one GPU computing card to perform the method of claims 1-2.
5. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method recited in claims 1-2.
CN202310259317.6A 2023-03-17 2023-03-17 Small sample slot value extraction method, device, equipment and storage medium Pending CN116362242A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310259317.6A CN116362242A (en) 2023-03-17 2023-03-17 Small sample slot value extraction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310259317.6A CN116362242A (en) 2023-03-17 2023-03-17 Small sample slot value extraction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116362242A 2023-06-30

Family

ID=86926664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310259317.6A Pending CN116362242A (en) 2023-03-17 2023-03-17 Small sample slot value extraction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116362242A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116865840A (en) * 2023-09-01 2023-10-10 中国人民解放军战略支援部队航天工程大学 Giant constellation operation management-oriented software robot cluster system and construction method
CN116865840B (en) * 2023-09-01 2023-12-08 中国人民解放军战略支援部队航天工程大学 Giant constellation operation management-oriented software robot cluster system and construction method

Similar Documents

Publication Publication Date Title
CN110334354B (en) Chinese relation extraction method
CN111078836B (en) Machine reading understanding method, system and device based on external knowledge enhancement
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
CN112487820B (en) Chinese medical named entity recognition method
CN110990555B (en) End-to-end retrieval type dialogue method and system and computer equipment
CN111985239A (en) Entity identification method and device, electronic equipment and storage medium
CN113254610B (en) Multi-round conversation generation method for patent consultation
CN113268609A (en) Dialog content recommendation method, device, equipment and medium based on knowledge graph
CN109933792A (en) Viewpoint type problem based on multi-layer biaxially oriented LSTM and verifying model reads understanding method
WO2019235103A1 (en) Question generation device, question generation method, and program
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
CN112200664A (en) Repayment prediction method based on ERNIE model and DCNN model
CN115186147B (en) Dialogue content generation method and device, storage medium and terminal
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN113743099A (en) Self-attention mechanism-based term extraction system, method, medium and terminal
CN115292463A (en) Information extraction-based method for joint multi-intention detection and overlapping slot filling
CN111966811A (en) Intention recognition and slot filling method and device, readable storage medium and terminal equipment
CN116362242A (en) Small sample slot value extraction method, device, equipment and storage medium
CN113705207A (en) Grammar error recognition method and device
CN113723111B (en) Small sample intention recognition method, device, equipment and storage medium
CN115600597A (en) Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium
CN113704466B (en) Text multi-label classification method and device based on iterative network and electronic equipment
CN113012685B (en) Audio recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination