CN116204616A - Artificial intelligence question-answering method based on semantic training algorithm - Google Patents
Abstract
The invention relates to artificial intelligence technology, and in particular to an artificial intelligence question-answering method based on a semantic training algorithm, which comprises the following steps: S1.1: collecting corpus data; S1.2: preprocessing the collected corpus data; S1.3: constructing a semantic training model based on the collected corpus data; S1.4: performing model optimization training on the constructed semantic training model. According to the invention, the collected corpus data is desensitized, which prevents the leakage of sensitive data and ensures the safety of artificial intelligence question answering. The constructed semantic training model undergoes mask training and sequential-logic training, which improve the learning capacity of the model, the extensibility of the artificial intelligence question-answering system, and the accuracy and reply rate of the answers, so that questions can be answered more efficiently. The method can be applied to a variety of intelligent fields.
Description
Technical Field
The invention relates to artificial intelligence technology, and in particular to an artificial intelligence question-answering method based on a semantic training algorithm.
Background
Natural language processing (NLP) is the scientific study of processing the media, such as language and writing, that human beings use to transmit information. It is an important field within today's highly active area of artificial intelligence, and dialogue systems are an important research direction within NLP; the central question is how to give a computer the intelligence to interact with human beings, which makes it both an important task for artificial intelligence and a very challenging one. In 1950, Turing published an evaluation method for computer systems in "Computing Machinery and Intelligence", named the "Turing test", which for the first time gave computer intelligence a clear target: to measure the level of machine intelligence by means of human-machine conversation. It attracted broad attention from scholars. Current dialogue systems mainly comprise question-answering systems, which help users answer questions, and task-oriented dialogue systems, which mainly provide users with corresponding operation prompts for specified scenario tasks. A question-answering system is chiefly a large-scale system that provides knowledge queries for users: it processes a question entered by a user, analyses its key content, and retrieves or generates an answer among the existing candidate question-answer pairs.
A traditional task-oriented system mainly obtains the intent input from the user side, applies a series of vectorization steps to it using existing text-processing methods, converts the input information into a vector representation that the machine system can recognize, computes matching scores between this vector and the candidate answers, selects among the existing candidate answers according to a preset matching strategy and the computed scores, and returns the corresponding answer to the user. Such a system treats the dialogue as a pipeline. This approach mostly requires manual annotation of semantic features during the dialogue, which consumes a great deal of manpower and material resources and is therefore costly. Moreover, being task-oriented, it can only complete the work of the specific scenario it was designed for, cannot be applied to other fields, and has poor extensibility.
Disclosure of Invention
The invention aims to remedy the defects of the background art by providing an artificial intelligence question-answering method based on a semantic training algorithm.
The technical scheme adopted by the invention is as follows:
the artificial intelligence question-answering method based on the semantic training algorithm comprises the following steps:
s1.1: collecting corpus data;
s1.2: carrying out data preprocessing on the collected corpus data;
s1.3: constructing a semantic training model based on the collected corpus data;
s1.4: and carrying out model optimization training on the constructed semantic training model.
As a preferred technical scheme of the invention: in step S1.2, error detection and correction processing and desensitization processing are performed on the collected corpus data.
As a preferred technical scheme of the invention: in step S1.3, the input corpus data is defined as $D=\{u_1,u_2,\dots,u_m\}$, where $m$ is the total number of dialogue turns, $i\in[1,m]$ indexes the $i$-th turn, and $u_i$ is the utterance replied in the $i$-th turn; further, $u_i=\{w_i^1,w_i^2,\dots,w_i^{n_i}\}$, where $n_i$ is the length of the utterance replied in the $i$-th turn and $w_i^j$, $j\in[1,n_i]$, is the $j$-th character of that utterance.
As a preferred technical scheme of the invention: in S1.3, the input of the semantic training model comprises token encoding, segment encoding, and position encoding, and their sum is taken as the input. The token encoding looks up the character embedding table $E_t\in\mathbb{R}^{V\times d}$ ($d$ being the embedding dimension), where $V$ is the vocabulary size; the segment encoding looks up the segment embedding table $E_s\in\mathbb{R}^{S\times d}$, where $S$ is the maximum number of segments; and the position encoding looks up the position embedding table $E_p\in\mathbb{R}^{N\times d}$, where $N$ is the sequence length of the entire dialogue, i.e. $N=\sum_{i=1}^{m}n_i$, with $m$ the total number of dialogue turns, $i\in[1,m]$ indexing the $i$-th turn, and $n_i$ the length of the utterance replied in the $i$-th turn.

The total input of the semantic training model is then obtained as

$$e_{ij} = t_{ij} + s_{ij} + p_{ij}$$

where $e_{ij}$, the input of the semantic training model, is the embedding vector corresponding to each position, and $t_{ij}$, $s_{ij}$, $p_{ij}$ are the token, segment, and position embeddings at that position, $i\in[1,m]$, $j\in[1,n_i]$. Feature extraction gives

$$E_{ij} = \mathrm{transformer}(e_{ij})$$

where $E_{ij}$ is the output vector for each character of the sequence.
As a preferred technical scheme of the invention: the embedding vector corresponding to each position is recognized by a nonlinear classifier, and mask training is used to judge whether embedding vectors at the other positions are masked and what actual text the masks correspond to.
As a preferred technical scheme of the invention: the mask training is as follows:
for the recovered utterancePredicting characters by a nonlinear character classifier, the formula is as follows:
wherein ,representing +.>Predicted value of E tT Transpose of the character-embedded table, b 1 Is a bias parameter of the nonlinear classifier.
As a preferred technical scheme of the invention: in the mask-training process, the mask-training loss function $L_1(\theta,\theta_1)$ is expressed as

$$L_1(\theta,\theta_1) = -\sum_{m'=1}^{M}\log p\big(\hat w_{m'}=w_{m'}\big)$$

where $\theta$ denotes the encoder parameters of the semantic training model, $\theta_1$ the parameters of the nonlinear character classifier, $M$ the number of masked characters in the input sequence (indexed by $m'$), and $V$ the vocabulary size over which the prediction is made.
As a preferred technical scheme of the invention: for the utterance $u_i$ replied in the $i$-th turn, sequential-logic training is performed: the first embedding vector $E_{i1}$ of the $i$-th segment is used to predict the turn $\hat r_i$ that the segment occupies in the dialogue, and the prediction is compared with the actual turn:

$$\hat r_i = \mathrm{softmax}\big(E_{i1}\,W_2 + b_2\big)$$

where $\hat r_i$ is the predicted turn of the $i$-th segment, $W_2$ is the unit vector of the segment embedding table, and $b_2$ is the bias parameter of the nonlinear classifier.
As a preferred technical scheme of the invention: in the sequential-logic training, the sequential-logic-training loss function $L_2(\theta,\theta_2)$ is expressed as

$$L_2(\theta,\theta_2) = -\sum_{e'=1}^{E}\log p\big(\hat r_{e'}=r_{e'}\big)$$

where $\theta$ denotes the encoder parameters of the semantic training model, $\theta_2$ the parameters of the nonlinear round classifier, $\hat r_{e'}$ the predicted turn of the $e'$-th segment, $E$ the number of segments of the dialogue (indexed by $e'$), and $S$ the maximum number of segments.
As a preferred technical scheme of the invention: in step S1.4, the total loss function $L$ of the semantic training model is obtained as

$$L = L_1(\theta,\theta_1) + L_2(\theta,\theta_2)$$

and the semantic training model is trained with minimization of the loss function as the objective, where $\theta$ denotes the encoder parameters of the semantic training model, $\theta_1$ the parameters of the nonlinear character classifier, $\theta_2$ the parameters of the nonlinear round classifier, $L_1(\theta,\theta_1)$ the mask-training loss function, and $L_2(\theta,\theta_2)$ the sequential-logic-training loss function.
Compared with the prior art, the artificial intelligence question-answering method based on the semantic training algorithm has the following beneficial effects:

The collected corpus data is desensitized, which prevents the leakage of sensitive data and ensures the safety of artificial intelligence question answering. By constructing a semantic training model and carrying out mask training and sequential-logic training on it, the learning capacity of the model and the extensibility of the artificial intelligence question-answering system are improved, as are the accuracy and reply rate of the answers, so that questions can be answered more efficiently; the method can be applied to a variety of intelligent fields.
Drawings
FIG. 1 is a flow chart of a method of a preferred embodiment of the present invention.
Detailed Description
It should be noted that, in the absence of conflict, the embodiments and the features of the embodiments may be combined with each other. The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, and not all, of the embodiments of the invention; all other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to fig. 1, a preferred embodiment of the present invention provides an artificial intelligence question-answering method based on a semantic training algorithm, comprising the steps of:
s1.1: collecting corpus data;
s1.2: carrying out data preprocessing on the collected corpus data;
s1.3: constructing a semantic training model based on the collected corpus data;
s1.4: and carrying out model optimization training on the constructed semantic training model.
In step S1.2, error detection and correction processing and desensitization processing are performed on the collected corpus data.
In step S1.3, the input corpus data is defined as $D=\{u_1,u_2,\dots,u_m\}$, where $m$ is the total number of dialogue turns, $i\in[1,m]$ indexes the $i$-th turn, and $u_i$ is the utterance replied in the $i$-th turn; further, $u_i=\{w_i^1,w_i^2,\dots,w_i^{n_i}\}$, where $n_i$ is the length of the utterance replied in the $i$-th turn and $w_i^j$, $j\in[1,n_i]$, is the $j$-th character of that utterance.
In S1.3, the input of the semantic training model comprises token encoding, segment encoding, and position encoding, and their sum is taken as the input. The token encoding looks up the character embedding table $E_t\in\mathbb{R}^{V\times d}$ ($d$ being the embedding dimension), where $V$ is the vocabulary size; the segment encoding looks up the segment embedding table $E_s\in\mathbb{R}^{S\times d}$, where $S$ is the maximum number of segments; and the position encoding looks up the position embedding table $E_p\in\mathbb{R}^{N\times d}$, where $N$ is the sequence length of the entire dialogue, i.e. $N=\sum_{i=1}^{m}n_i$, with $m$ the total number of dialogue turns and $n_i$ the length of the utterance replied in the $i$-th turn.

The total input of the semantic training model is then obtained as

$$e_{ij} = t_{ij} + s_{ij} + p_{ij}$$

where $e_{ij}$, the input of the semantic training model, is the embedding vector corresponding to each position, and $t_{ij}$, $s_{ij}$, $p_{ij}$ are the token, segment, and position embeddings at that position, $i\in[1,m]$, $j\in[1,n_i]$. Feature extraction gives

$$E_{ij} = \mathrm{transformer}(e_{ij})$$

where $E_{ij}$ is the output vector for each character of the sequence.
The embedding vector corresponding to each position is recognized by a nonlinear classifier, and mask training is used to judge whether embedding vectors at the other positions are masked and what actual text the masks correspond to.
The mask training is as follows:

For the replied utterance $u_i=\{w_i^1,\dots,w_i^{n_i}\}$, the masked characters are predicted by a nonlinear character classifier, with the formula

$$p(\hat w_i^j) = \mathrm{softmax}\big(E_{ij}\,E_t^{\mathsf{T}} + b_1\big)$$

where $\hat w_i^j$ is the predicted value of the masked character $w_i^j$, $E_t^{\mathsf{T}}$ is the transpose of the character embedding table, and $b_1$ is the bias parameter of the nonlinear classifier.
In the mask-training process, the mask-training loss function $L_1(\theta,\theta_1)$ is expressed as

$$L_1(\theta,\theta_1) = -\sum_{m'=1}^{M}\log p\big(\hat w_{m'}=w_{m'}\big)$$

where $\theta$ denotes the encoder parameters of the semantic training model, $\theta_1$ the parameters of the nonlinear character classifier, $M$ the number of masked characters in the input sequence (indexed by $m'$), and $V$ the vocabulary size over which the prediction is made.
For the utterance $u_i$ replied in the $i$-th turn, sequential-logic training is performed: the first embedding vector $E_{i1}$ of the $i$-th segment is used to predict the turn $\hat r_i$ that the segment occupies in the dialogue, and the prediction is compared with the actual turn:

$$\hat r_i = \mathrm{softmax}\big(E_{i1}\,W_2 + b_2\big)$$

where $\hat r_i$ is the predicted turn of the $i$-th segment, $W_2$ is the unit vector of the segment embedding table, and $b_2$ is the bias parameter of the nonlinear classifier.
In the sequential-logic training, the sequential-logic-training loss function $L_2(\theta,\theta_2)$ is expressed as

$$L_2(\theta,\theta_2) = -\sum_{e'=1}^{E}\log p\big(\hat r_{e'}=r_{e'}\big)$$

where $\theta$ denotes the encoder parameters of the semantic training model, $\theta_2$ the parameters of the nonlinear round classifier, $\hat r_{e'}$ the predicted turn of the $e'$-th segment, $E$ the number of segments of the dialogue (indexed by $e'$), and $S$ the maximum number of segments.
In step S1.4, the total loss function $L$ of the semantic training model is obtained as

$$L = L_1(\theta,\theta_1) + L_2(\theta,\theta_2)$$

and the semantic training model is trained with minimization of the loss function as the objective, where $\theta$ denotes the encoder parameters of the semantic training model, $\theta_1$ the parameters of the nonlinear character classifier, $\theta_2$ the parameters of the nonlinear round classifier, $L_1(\theta,\theta_1)$ the mask-training loss function, and $L_2(\theta,\theta_2)$ the sequential-logic-training loss function.
In this embodiment, corpus data is collected, error detection and correction are performed on it, and desensitization processing is applied. In the desensitization processing, digits appearing in the corpus data (such as identity card numbers, payment collection numbers, bank card numbers, and house numbers) and sensitive information such as names and addresses can be randomly replaced to prevent information leakage.
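The desensitization step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the concrete patterns (runs of six or more digits standing in for ID, payment, bank-card, and house numbers, plus an explicit name list) and the `[MASKED]` placeholder are assumptions made here for the example.

```python
import random
import re

def desensitize(text, name_list=()):
    """Randomly replace long digit runs and listed names to mask sensitive data.

    The rules here are illustrative assumptions; the patent does not fix them.
    """
    def mask_digits(match):
        # Replace every digit with a random digit, preserving the run length.
        return "".join(str(random.randint(0, 9)) for _ in match.group())

    # Runs of 6+ digits: ID cards, payment/bank-card numbers, house numbers, etc.
    text = re.sub(r"\d{6,}", mask_digits, text)
    # Listed names/addresses are replaced with a placeholder token.
    for name in name_list:
        text = text.replace(name, "[MASKED]")
    return text

sample = "Zhang San, ID 110105199001011234, card 6222021234567890."
clean = desensitize(sample, name_list=["Zhang San"])
```

In practice the replacement rules would be tuned to the corpus, but the principle, random substitution of sensitive spans before training, is as described in the embodiment.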
The input corpus data is defined as $D=\{u_1,u_2,\dots,u_m\}$, where $m$ is the total number of dialogue turns, $i\in[1,m]$ indexes the $i$-th turn, and $u_i$ is the utterance replied in the $i$-th turn; further, $u_i=\{w_i^1,w_i^2,\dots,w_i^{n_i}\}$, where $n_i$ is the length of the utterance replied in the $i$-th turn and $w_i^j$, $j\in[1,n_i]$, is the $j$-th character of that utterance.
The input of the semantic training model comprises token encoding, segment encoding, and position encoding, and their sum is taken as the input. The token encoding looks up the character embedding table $E_t\in\mathbb{R}^{V\times d}$ ($d$ being the embedding dimension), where $V$ is the vocabulary size; the segment encoding looks up the segment embedding table $E_s\in\mathbb{R}^{S\times d}$, where $S$ is the maximum number of segments; and the position encoding looks up the position embedding table $E_p\in\mathbb{R}^{N\times d}$, where $N$ is the sequence length of the entire dialogue, i.e. $N=\sum_{i=1}^{m}n_i$, with $m$ the total number of dialogue turns and $n_i$ the length of the utterance replied in the $i$-th turn.

The total input of the semantic training model is then obtained as

$$e_{ij} = t_{ij} + s_{ij} + p_{ij}$$

where $e_{ij}$, the input of the semantic training model, is the embedding vector corresponding to each position, and $t_{ij}$, $s_{ij}$, $p_{ij}$ are the token, segment, and position embeddings at that position, $i\in[1,m]$, $j\in[1,n_i]$. Feature extraction gives

$$E_{ij} = \mathrm{transformer}(e_{ij})$$

where $E_{ij}$ is the output vector for each character of the sequence.
The embedding vector corresponding to each position is recognized by a nonlinear classifier, and mask training is used to judge whether embedding vectors at the other positions are masked and what actual text the masks correspond to.
For the replied utterance $u_i=\{w_i^1,w_i^2,w_i^3,w_i^4,\dots,w_i^{n_i}\}$, if the 2nd and 3rd characters are selected as masks, masking them gives $u_i=\{w_i^1,[\mathrm{MASK}],[\mathrm{MASK}],w_i^4,\dots,w_i^{n_i}\}$. The masked characters are predicted by a nonlinear character classifier, with the formula

$$p(\hat w_i^j) = \mathrm{softmax}\big(E_{ij}\,E_t^{\mathsf{T}} + b_1\big)$$

where $\hat w_i^j$ is the predicted value of the masked character $w_i^j$, $E_t^{\mathsf{T}}$ is the transpose of the character embedding table, and $b_1$ is the bias parameter of the nonlinear classifier.
Through mask learning on the characters, the learning capacity of the semantic training model is improved, and the answer efficiency of the semantic training model is improved.
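The mask-prediction step can be sketched as a toy NumPy illustration under stated assumptions: a random stand-in for the transformer output $E_{ij}$ and a small vocabulary. It shows the softmax over scores against the transposed character embedding table $E_t^{\mathsf T}$ plus bias $b_1$, and the cross-entropy term that such a position contributes to the mask loss $L_1$.

```python
import numpy as np

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    return np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
V, d = 100, 8                  # vocabulary size, embedding dimension (assumed)
E_t = rng.normal(size=(V, d))  # character embedding table
b_1 = np.zeros(V)              # classifier bias b_1

# Stand-in for the transformer output E_ij at one masked position.
E_ij = rng.normal(size=d)

# p = softmax(E_ij @ E_t^T + b_1): a distribution over all vocabulary characters.
p = softmax(E_ij @ E_t.T + b_1)
predicted_char = int(p.argmax())

# The cross-entropy term this position contributes to the mask loss L_1.
true_char = 42
loss_term = -np.log(p[true_char])
```

Reusing the character embedding table as the classifier weights ties input and output representations, which is why the formula multiplies by $E_t^{\mathsf T}$ rather than by a separate weight matrix.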
Interactive information has sequential-logic structure; for example, an answer may in fact be a reply to a sentence from earlier in the interaction. Therefore, for the utterance $u_i$ replied in the $i$-th turn, sequential-logic training is performed: the first embedding vector $E_{i1}$ of the $i$-th segment is used to predict the turn $\hat r_i$ that the segment occupies in the dialogue, and the prediction is compared with the actual turn:

$$\hat r_i = \mathrm{softmax}\big(E_{i1}\,W_2 + b_2\big)$$

where $\hat r_i$ is the predicted turn of the $i$-th segment, $W_2$ is the unit vector of the segment embedding table, and $b_2$ is the bias parameter of the nonlinear classifier.
Sequential-logic training on the replied utterances improves the accuracy and reply rate of the replies.
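The turn-prediction step can be sketched similarly. Again a toy under assumptions: $E_{i1}$ is a random stand-in for the transformer output of a segment's first character, and $W_2$ is drawn randomly here rather than tied to the segment embedding table as the text describes.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    return np.exp(z) / np.exp(z).sum()

rng = np.random.default_rng(2)
S, d = 4, 8                     # maximum number of segments, embedding dimension
W_2 = rng.normal(size=(d, S))   # round-classifier weights (random stand-in)
b_2 = np.zeros(S)               # classifier bias b_2

# Stand-in for E_i1, the output vector of the first character of segment i.
E_i1 = rng.normal(size=d)

# Predict which turn of the dialogue this segment occupies.
r_hat = softmax(E_i1 @ W_2 + b_2)
predicted_turn = int(r_hat.argmax())

# The cross-entropy term this segment contributes to the loss L_2.
actual_turn = 1
loss_term = -np.log(r_hat[actual_turn])
```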
From the two training processes, the loss functions are obtained respectively:

$$L_1(\theta,\theta_1) = -\sum_{m'=1}^{M}\log p\big(\hat w_{m'}=w_{m'}\big)$$

where $\theta$ denotes the encoder parameters of the semantic training model, $\theta_1$ the parameters of the nonlinear character classifier, $M$ the number of masked characters in the input sequence (indexed by $m'$), and $V$ the vocabulary size; and

$$L_2(\theta,\theta_2) = -\sum_{e'=1}^{E}\log p\big(\hat r_{e'}=r_{e'}\big)$$

where $\theta_2$ denotes the parameters of the nonlinear round classifier, $\hat r_{e'}$ the predicted turn of the $e'$-th segment, $E$ the number of segments of the dialogue (indexed by $e'$), and $S$ the maximum number of segments.
The total loss function of the semantic training model is obtained as

$$L = L_1(\theta,\theta_1) + L_2(\theta,\theta_2)$$

and the semantic training model is trained with minimization of the loss function as the objective, where $\theta$ denotes the encoder parameters of the semantic training model, $\theta_1$ the parameters of the nonlinear character classifier, $\theta_2$ the parameters of the nonlinear round classifier, $L_1(\theta,\theta_1)$ the mask-training loss function, and $L_2(\theta,\theta_2)$ the sequential-logic-training loss function. This improves the accuracy of the model and reduces its error.
The model can be further trained by expanding the collected corpus data so as to improve the question-answer effect of the semantic training model.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. This manner of description is adopted merely for clarity; the specification should be taken as a whole, and the technical solutions in the embodiments may be combined appropriately to form other implementations that will be understood by those skilled in the art.
Claims (10)
1. An artificial intelligence question-answering method based on a semantic training algorithm, characterized by comprising the following steps:
s1.1: collecting corpus data;
s1.2: carrying out data preprocessing on the collected corpus data;
s1.3: constructing a semantic training model based on the collected corpus data;
s1.4: and carrying out model optimization training on the constructed semantic training model.
2. The artificial intelligence question-answering method based on semantic training algorithm according to claim 1, wherein: in step S1.2, error detection and correction processing and desensitization processing are performed on the collected corpus data.
3. The artificial intelligence question-answering method based on semantic training algorithm according to claim 1, wherein: in step S1.3, the input corpus data is defined as $D=\{u_1,u_2,\dots,u_m\}$, where $m$ is the total number of dialogue turns, $i\in[1,m]$ indexes the $i$-th turn, and $u_i$ is the utterance replied in the $i$-th turn; further, $u_i=\{w_i^1,w_i^2,\dots,w_i^{n_i}\}$, where $n_i$ is the length of the utterance replied in the $i$-th turn and $w_i^j$, $j\in[1,n_i]$, is the $j$-th character of that utterance.
4. The artificial intelligence question-answering method based on semantic training algorithm according to claim 3, wherein: in S1.3, the input of the semantic training model comprises token encoding, segment encoding, and position encoding, and their sum is taken as the input. The token encoding looks up the character embedding table $E_t\in\mathbb{R}^{V\times d}$ ($d$ being the embedding dimension), where $V$ is the vocabulary size; the segment encoding looks up the segment embedding table $E_s\in\mathbb{R}^{S\times d}$, where $S$ is the maximum number of segments; and the position encoding looks up the position embedding table $E_p\in\mathbb{R}^{N\times d}$, where $N$ is the sequence length of the entire dialogue, i.e. $N=\sum_{i=1}^{m}n_i$, with $m$ the total number of dialogue turns and $n_i$ the length of the utterance replied in the $i$-th turn.

The total input of the semantic training model is obtained as $$e_{ij}=t_{ij}+s_{ij}+p_{ij}$$ where $e_{ij}$, the input of the semantic training model, is the embedding vector corresponding to each position, $i\in[1,m]$, $j\in[1,n_i]$; feature extraction gives $$E_{ij}=\mathrm{transformer}(e_{ij})$$ where $E_{ij}$ is the output vector for each character of the sequence.
5. The artificial intelligence question-answering method based on semantic training algorithm according to claim 4, wherein: the embedding vector corresponding to each position is recognized by a nonlinear classifier, and mask training is used to judge whether embedding vectors at the other positions are masked and what actual text the masks correspond to.
6. The artificial intelligence question-answering method based on semantic training algorithm according to claim 5, wherein the mask training is as follows: for the replied utterance $u_i=\{w_i^1,\dots,w_i^{n_i}\}$, the masked characters are predicted by a nonlinear character classifier, with the formula $$p(\hat w_i^j)=\mathrm{softmax}\big(E_{ij}\,E_t^{\mathsf{T}}+b_1\big)$$ where $\hat w_i^j$ is the predicted value of the masked character, $E_t^{\mathsf{T}}$ is the transpose of the character embedding table, and $b_1$ is the bias parameter of the nonlinear classifier.
7. The artificial intelligence question-answering method based on semantic training algorithm according to claim 6, wherein: in the mask-training process, the mask-training loss function $L_1(\theta,\theta_1)$ is expressed as $$L_1(\theta,\theta_1)=-\sum_{m'=1}^{M}\log p\big(\hat w_{m'}=w_{m'}\big)$$ where $\theta$ denotes the encoder parameters of the semantic training model, $\theta_1$ the parameters of the nonlinear character classifier, $M$ the number of masked characters in the input sequence (indexed by $m'$), and $V$ the vocabulary size.
8. The artificial intelligence question-answering method based on semantic training algorithm according to claim 7, wherein: for the utterance $u_i$ replied in the $i$-th turn, sequential-logic training is performed: the first embedding vector $E_{i1}$ of the $i$-th segment is used to predict the turn $\hat r_i$ that the segment occupies in the dialogue, $$\hat r_i=\mathrm{softmax}\big(E_{i1}\,W_2+b_2\big)$$ where $W_2$ is the unit vector of the segment embedding table and $b_2$ is the bias parameter of the nonlinear classifier.
9. The artificial intelligence question-answering method based on semantic training algorithm according to claim 8, wherein: in the sequential-logic training, the sequential-logic-training loss function $L_2(\theta,\theta_2)$ is expressed as $$L_2(\theta,\theta_2)=-\sum_{e'=1}^{E}\log p\big(\hat r_{e'}=r_{e'}\big)$$ where $\theta_2$ denotes the parameters of the nonlinear round classifier, $E$ is the number of segments of the dialogue, and $S$ is the maximum number of segments.
10. The artificial intelligence question-answering method based on semantic training algorithm according to claim 1, wherein: in the step S1.4, a total loss function L of the semantic training model is obtained:
L = L_1(θ, θ_1) + L_2(θ, θ_2)
training the semantic training model with minimization of the loss function as the target, wherein θ is the encoder parameter of the semantic training model, θ_1 is the nonlinear character classifier parameter, θ_2 is the nonlinear round classifier parameter, L_1(θ, θ_1) is the loss function of mask training, and L_2(θ, θ_2) is the loss function of sequential logic training.
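The total objective is simply the sum of the two terms; a minimal sketch with toy stand-in values for the mask-training and sequential-logic losses:

```python
def total_loss(L1, L2):
    """L = L1(θ, θ1) + L2(θ, θ2): the joint objective minimized when
    training the semantic training model, combining the mask-training
    term and the sequential-logic term."""
    return L1 + L2

# toy values standing in for the two loss terms
L = total_loss(0.434, 0.312)
```

Both terms share the encoder parameters θ, so minimizing L trains the encoder on the two tasks jointly while θ_1 and θ_2 remain task-specific.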
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211711312.4A CN116204616A (en) | 2022-12-29 | 2022-12-29 | Artificial intelligence question-answering method based on semantic training algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116204616A true CN116204616A (en) | 2023-06-02 |
Family
ID=86508635
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211711312.4A Pending CN116204616A (en) | 2022-12-29 | 2022-12-29 | Artificial intelligence question-answering method based on semantic training algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116204616A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115358243A (en) * | 2022-07-27 | 2022-11-18 | 上海浦东发展银行股份有限公司 | Training method, device, equipment and storage medium for multi-round dialogue recognition model |
CN115391512A (en) * | 2022-08-30 | 2022-11-25 | 上海浦东发展银行股份有限公司 | Training method, device, equipment and storage medium of dialogue language model |
Similar Documents
Publication | Title | |
---|---|---|
CN110134771B (en) | Implementation method of multi-attention-machine-based fusion network question-answering system | |
CN110781680B (en) | Semantic similarity matching method based on twin network and multi-head attention mechanism | |
CN110134946B (en) | Machine reading understanding method for complex data | |
CN111738016A (en) | Multi-intention recognition method and related equipment | |
CN112101044B (en) | Intention identification method and device and electronic equipment | |
CN114926150B (en) | Digital intelligent auditing method and device for transformer technology compliance assessment | |
CN113742733B (en) | Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type | |
CN112800184B (en) | Short text comment emotion analysis method based on Target-Aspect-Opinion joint extraction | |
CN112037773A (en) | N-optimal spoken language semantic recognition method and device and electronic equipment | |
CN113723105A (en) | Training method, device and equipment of semantic feature extraction model and storage medium | |
CN114676255A (en) | Text processing method, device, equipment, storage medium and computer program product | |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
CN114239574A (en) | Miner violation knowledge extraction method based on entity and relationship joint learning | |
CN112328748A (en) | Method for identifying insurance configuration intention | |
CN111597816A (en) | Self-attention named entity recognition method, device, equipment and storage medium | |
CN114492460A (en) | Event causal relationship extraction method based on derivative prompt learning | |
CN112488111B (en) | Indication expression understanding method based on multi-level expression guide attention network | |
CN113065352B (en) | Method for identifying operation content of power grid dispatching work text | |
CN116595023A (en) | Address information updating method and device, electronic equipment and storage medium | |
CN116341519A (en) | Event causal relation extraction method, device and storage medium based on background knowledge | |
CN116483314A (en) | Automatic intelligent activity diagram generation method | |
CN109960782A (en) | A kind of Tibetan language segmenting method and device based on deep neural network | |
CN114416991A (en) | Method and system for analyzing text emotion reason based on prompt | |
CN116204616A (en) | Artificial intelligence question-answering method based on semantic training algorithm | |
CN114461779A (en) | Case writing element extraction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||