CN114036272A - Semantic analysis method and system for dialog system, electronic device and storage medium - Google Patents


Info

Publication number
CN114036272A
Authority
CN
China
Legal status
Pending
Application number
CN202111271655.9A
Other languages
Chinese (zh)
Inventor
江豪
肖龙源
李稀敏
李威
Current Assignee
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Application filed by Xiamen Kuaishangtong Technology Co Ltd

Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G06F16/35 Clustering; Classification
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F40/30 Semantic analysis
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a semantic analysis method and system for a dialogue system, an electronic device and a storage medium. The semantic analysis method comprises the following steps: step a, obtaining dialogue data and preprocessing it to obtain corpus information to be trained; step b, training a word2vec model with the corpus information to be trained; step c, constructing a semantic analysis model based on the word2vec model; and step d, inputting the corpus information to be analyzed into the semantic analysis model, which comprises a word2vec embedding layer, a BiLSTM layer, a CDW layer and a linear classification layer. The invention can distinguish user semantics simply and efficiently, provide accurate semantic information, and provide reliable guidance for the next action of an intelligent dialogue system.

Description

Semantic analysis method and system for dialog system, electronic device and storage medium
Technical Field
The invention relates to the field of artificial intelligence, in particular to a semantic analysis method and system for a dialog system, an electronic device and a storage medium.
Background
In an intelligent dialogue system, the semantic analysis result determines the next state of the dialogue, so correctly analyzing the semantics of the user's utterances is very important. For example, in an intelligent medical dialogue system, if the semantic analysis result is an active inquiry by the user, the next state of the dialogue is to answer the user; if the result is a passive answer by the user, the next state is to summarize the symptoms/diseases, or to further provide accurate treatment/examination suggestions, and so on.
In general, a user's dialogue semantics can be distinguished by whether the utterance is a question: questions are active inquiries and declarative sentences are passive answers. However, owing to the particularities of Chinese dialogue, it is generally difficult to distinguish a user's semantics simply by whether the utterance is a question. For example, "I want to consult about the symptoms of XX disease" is a declarative sentence, but it is actually an active inquiry by the user.
In the prior art, sentence-pattern matching based on rule templates or on machine learning is used to distinguish user semantics. However, both approaches can only determine whether the user's utterance is a question; they cannot correctly interpret inquiries phrased as declarative sentences. Their accuracy is therefore low, and they cannot provide reliable semantic guidance for the intelligent dialogue system.
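To make the limitation concrete, a question-form rule of the kind described above can be sketched as follows. This is a hypothetical illustration, not a rule taken from the prior art: it assumes that a "question" is any sentence containing a question mark or ending in one of the interrogative particles 吗, 呢 or 么, and shows that the declarative inquiry from the example is labeled as a passive answer.

```python
import re

# Hypothetical rule template in the spirit of the prior art: a sentence counts
# as an active query (1) only if it looks like a question, i.e. it contains a
# question mark or ends with an interrogative particle.
QUESTION_PATTERN = re.compile(r"[?？]|[吗呢么]$")


def rule_based_label(sentence: str) -> int:
    """Return 1 (active query) if the sentence looks like a question, else 0."""
    return 1 if QUESTION_PATTERN.search(sentence.strip()) else 0


# "I want to consult about the symptoms of XX disease": declarative in form,
# but semantically an active query, so the rule gets it wrong.
print(rule_based_label("我想咨询XX病的症状"))  # 0 (intended label: 1)
print(rule_based_label("XX病有什么症状？"))    # 1
```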
Disclosure of Invention
The main object of the invention is to provide a semantic analysis method and system for a dialog system, an electronic device and a storage medium, which can distinguish user semantics simply and efficiently, provide accurate semantic information, and provide reliable guidance for the next action of an intelligent dialog system.
To achieve the above object, the invention provides a semantic analysis method for a dialog system, comprising the following steps: step a, obtaining dialogue data and preprocessing it to obtain corpus information to be trained; step b, training a word2vec model with the corpus information to be trained; step c, constructing a semantic analysis model based on the word2vec model; step d, inputting corpus information to be analyzed into the semantic analysis model, which comprises a word2vec embedding layer, a BiLSTM layer, a CDW layer and a linear classification layer. The semantic analysis process specifically comprises: d1. the word2vec embedding layer extracts word-vector information of the corpus information to be analyzed, and the BiLSTM layer obtains context information of the corpus to be analyzed; d2. the CDW layer obtains semantic information of the corpus to be analyzed from its word-vector information and context information; d3. the linear classification layer classifies the semantic information to obtain a binary result, 1 or 0, as the semantic analysis result, where 1 represents an active query and 0 represents a passive answer.
Optionally, the preprocessing includes removing stop words, removing useless characters, and removing emoticons.
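A minimal preprocessing sketch is shown below. Everything concrete in it is an assumption for illustration: the stop-word set is a tiny hypothetical sample, URLs and HTML tags stand in for "useless characters", and emoticons are approximated by common emoji code-point ranges plus bracketed chat codes such as [微笑] (which would also strip any short bracketed text). Stop words are removed at token level because Chinese stop words are words; the text does not say whether this happens before or after segmentation.

```python
import re
from typing import Iterable, List, Set

# Hypothetical stop-word list; a real system would load a curated lexicon.
STOP_WORDS: Set[str] = {"的", "了", "啊", "呢", "吧"}

URL_PATTERN = re.compile(r"https?://\S+")          # links treated as useless characters
TAG_PATTERN = re.compile(r"<[^>]+>")               # leftover HTML tags
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")  # approximate emoji ranges
CHAT_EMOTICON_PATTERN = re.compile(r"\[[^\[\]]{1,6}\]")  # bracketed codes such as [微笑]


def clean_text(text: str) -> str:
    """Remove useless characters and emoticons, then normalize whitespace."""
    for pattern in (URL_PATTERN, TAG_PATTERN, EMOJI_PATTERN, CHAT_EMOTICON_PATTERN):
        text = pattern.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Drop stop words from an already segmented sentence."""
    return [tok for tok in tokens if tok not in stop_words]


print(clean_text("我头痛😀[微笑] <br>http://example.com/a"))  # 我头痛
print(remove_stop_words(["我", "的", "头痛", "了"]))            # ['我', '头痛']
```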
Optionally, step b comprises the following steps: b1. performing entity recognition on the preprocessed corpus information to be trained with an NER algorithm to determine the entities it contains; b2. segmenting the preprocessed corpus information with the Jieba segmenter and counting the word frequency T of each segmentation result; b3. manually merging and retaining entities that were not kept intact in the segmentation result; b4. training and saving the word2vec model with the Gensim package.
Optionally, in step b, training uses only segmentation results whose word frequency T is greater than or equal to 5.
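The frequency cutoff can be sketched in plain Python as below; `filter_by_frequency` is a hypothetical helper and the toy corpus is invented for illustration. Gensim's `Word2Vec` class also accepts a `min_count` argument that ignores words whose total frequency is lower, so in practice the cutoff can be delegated to the trainer.

```python
from collections import Counter
from typing import Iterable, List


def filter_by_frequency(tokenized_corpus: Iterable[List[str]], min_freq: int = 5) -> List[List[str]]:
    """Keep only tokens whose corpus-wide frequency T satisfies T >= min_freq."""
    corpus = [list(sentence) for sentence in tokenized_corpus]
    freq = Counter(token for sentence in corpus for token in sentence)
    return [[token for token in sentence if freq[token] >= min_freq] for sentence in corpus]


corpus = [["头痛", "发烧"]] * 5 + [["咳嗽", "头痛"]]
filtered = filter_by_frequency(corpus)
print(filtered[-1])  # ['头痛']  ("咳嗽" occurs once, "头痛" six times)
```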
Optionally, the semantic analysis model further comprises a Dropout layer and a LayerNorm layer; the corpus information to be analyzed passes sequentially through the word2vec embedding layer, the Dropout layer, the BiLSTM layer, the LayerNorm layer, the CDW layer and the linear classification layer.
Optionally, the step d2 specifically includes the following steps:
d21. calculating a first weight $u_{it}$ for each character:
$$u_{it} = \tanh(W_w h_{it} + b_w)$$
where i denotes the i-th sentence, t denotes the t-th character in the i-th sentence, $h_{it}$ is the output for the t-th character of the i-th sentence after the LayerNorm layer, $W_w$ is the weight corresponding to $h_{it}$, and $b_w$ is the corresponding bias;
d22. calculating the distance relationship $\mathrm{SRD}_{it}$ between each character and the central word:
[Formula for $\mathrm{SRD}_{it}$: image not reproduced in this text]
where $P_a$ is the position of the central word, the central word is one of the symptom, disease or examination entities contained in the i-th sentence, and m is a threshold value;
d23. obtaining a second weight $u'_{it}$ for each character from the threshold parameter σ and the distance relationship $\mathrm{SRD}_{it}$ between the character and the central word:
[Formula for $u'_{it}$: image not reproduced in this text]
where n is the length of the i-th sentence;
d24. computing the feature vector $s_i$ of the whole sentence:
[Formula for $s_i$: image not reproduced in this text]
where $\theta_{it}$ is the contribution of the t-th character in the i-th sentence to the semantic information;
d25. obtaining a binary classification result from the feature vector $s_i$ of the whole sentence: 1 for an active query and 0 for a passive answer.
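Steps d21 to d23 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the formula images for $\mathrm{SRD}_{it}$ and $u'_{it}$ are not reproduced in this text, so the sketch assumes the form used by the Local Context Focus (LCF) mechanism from aspect-based sentiment analysis, namely $\mathrm{SRD}_{it} = |t - P_a| - \lfloor m/2 \rfloor$, and $u'_{it} = u_{it}$ if $\mathrm{SRD}_{it} \le \sigma$, otherwise $u'_{it} = \frac{n - (\mathrm{SRD}_{it} - \sigma)}{n} u_{it}$. The function names, the toy weights and the list-of-lists vector representation are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def first_weight(h_it: Vector, W_w: Matrix, b_w: Vector) -> Vector:
    """d21: u_it = tanh(W_w h_it + b_w) for one character position."""
    return [
        math.tanh(sum(w * h for w, h in zip(row, h_it)) + b)
        for row, b in zip(W_w, b_w)
    ]


def srd(n: int, p_a: int, m: int = 10) -> List[int]:
    """d22 (ASSUMED LCF-style form): SRD_it = |t - P_a| - floor(m / 2)."""
    return [abs(t - p_a) - m // 2 for t in range(n)]


def cdw_weights(u: List[Vector], srd_values: List[int], sigma: int = 5) -> List[Vector]:
    """d23 (ASSUMED LCF-style form): keep u_it near the central word and
    scale it down linearly once SRD_it exceeds sigma."""
    n = len(u)
    weighted = []
    for u_it, d in zip(u, srd_values):
        scale = 1.0 if d <= sigma else (n - (d - sigma)) / n
        weighted.append([scale * x for x in u_it])
    return weighted


# Toy sentence of 12 characters, central word at position 0, 2-dim features.
H_i = [[0.5, -0.2]] * 12                  # stand-in for LayerNorm outputs h_it
W_w = [[1.0, 0.0], [0.0, 1.0]]            # hypothetical learned weight
b_w = [0.0, 0.0]                          # hypothetical learned bias
U = [first_weight(h, W_w, b_w) for h in H_i]
U_prime = cdw_weights(U, srd(len(H_i), p_a=0, m=10), sigma=5)
```

Under these assumptions, with m = 10 and σ = 5 every character within ten positions of the central word keeps its full weight, and the character at position 11 is scaled by 11/12.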
Optionally, the threshold m is 10, and the threshold parameter σ is 5.
In addition, corresponding to the above semantic analysis method for a dialog system, the invention provides a semantic analysis system comprising a text acquisition module, a model training module, a semantic analysis model building module and a semantic analysis module, wherein the text acquisition module is used for acquiring dialogue data and preprocessing it to obtain corpus information to be trained;
the model training module is used for training a word2vec model by adopting the corpus information to be trained;
the semantic analysis model building module is used for building a semantic analysis model based on the word2vec model;
the semantic analysis module is used for inputting the corpus information to be analyzed into the semantic analysis model, which comprises a word2vec embedding layer, a BiLSTM layer, a CDW layer and a linear classification layer: the word2vec embedding layer extracts word-vector information of the corpus information to be analyzed, and the BiLSTM layer obtains context information of the corpus to be analyzed; the CDW layer obtains semantic information of the corpus to be analyzed from its word-vector information and context information; and the linear classification layer classifies the semantic information to obtain a binary result, 1 or 0, as the semantic analysis result, where 1 represents an active query and 0 represents a passive answer.
Also corresponding to the semantic analysis method for a dialog system, the invention provides an electronic device comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the semantic analysis method for a dialog system.
The invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the dialog system semantic analysis method.
The invention has the beneficial effects that:
(1) through the semantic analysis model, the invention can distinguish user semantics simply and efficiently, correctly separate the user's active queries from passive answers, provide accurate semantic information, and, based on the semantic analysis result, provide reliable guidance for the next action of the intelligent dialogue system;
(2) after the user's semantic analysis result is obtained through the semantic analysis model, the intelligent dialogue system can design the dialogue flow according to the result, which improves the fluency and professionalism of the dialogue system;
(3) the invention segments words by combining the NER algorithm with Jieba, which avoids the situation where a general-purpose segmentation tool cannot correctly segment entity content appearing in a specific field (such as the medical field); moreover, this combination preserves domain-specific terms of that field (such as medical terms) as fully as possible, which improves the performance of the word2vec model;
(4) according to the invention, the generalization of the semantic analysis model is improved through the Dropout layer and the LayerNorm layer.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a simplified flow diagram of a semantic analysis method for a dialog system according to the present invention;
FIG. 2 is a schematic diagram of the semantic analysis model of the dialog system according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, the semantic analysis method for a dialog system can distinguish user semantics simply and efficiently, correctly separate the user's active queries from passive answers, provide accurate semantic information, and, based on the semantic analysis result, provide reliable guidance for the next action of the intelligent dialog system.
The invention discloses a semantic analysis method for a dialog system, wherein the dialog system is preferably a medical dialog system, and the semantic analysis method specifically comprises the following steps:
step a, obtaining dialogue data in the medical field and preprocessing it to obtain corpus information to be trained; preferably, the preprocessing includes, but is not limited to, removing stop words, removing useless characters and removing emoticons;
step b, training a word2vec model with the corpus information to be trained;
step c, constructing a semantic analysis model based on the word2vec model (the model structure is shown in FIG. 2);
step d, inputting corpus information to be analyzed into the semantic analysis model, which comprises a word2vec embedding layer, a BiLSTM layer, a CDW layer and a linear classification layer; the semantic analysis process specifically comprises the following steps:
d1. the word2vec embedding layer extracts word-vector information of the corpus information to be analyzed, and the BiLSTM layer obtains context information of the corpus to be analyzed;
d2. the CDW layer obtains semantic information of the corpus to be analyzed from its word-vector information and context information;
d3. the linear classification layer classifies the semantic information and outputs the semantic analysis result. Preferably, the linear classification layer uses a binary classifier whose output is 1 (representing an active query) or 0 (representing a passive answer); this output is the semantic analysis result.
Preferably, if the semantic analysis result is an active query, the next action of the medical dialogue system is to answer the query; if it is a passive answer, the next action is to analyze the answer or to continue questioning the user.
After the user's semantic analysis result is obtained through the semantic analysis model, the intelligent medical dialogue system can design the dialogue flow according to the result, which improves the fluency and professionalism of the dialogue system.
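The routing rule above can be sketched as a small dispatcher. The action names and the `needs_more_information` flag are hypothetical placeholders for whatever dialogue-management logic decides between analyzing a passive answer and asking a follow-up question; the text does not specify that logic.

```python
ACTIVE_QUERY = 1
PASSIVE_ANSWER = 0


def next_action(label: int, needs_more_information: bool = True) -> str:
    """Map the semantic analysis result to the dialogue system's next action.

    `needs_more_information` is a hypothetical flag standing in for whatever
    logic decides whether a passive answer is enough to analyze.
    """
    if label == ACTIVE_QUERY:
        return "answer_query"
    if label == PASSIVE_ANSWER:
        return "ask_follow_up" if needs_more_information else "analyze_answer"
    raise ValueError(f"unexpected semantic label: {label!r}")


print(next_action(1))                                # answer_query
print(next_action(0, needs_more_information=False))  # analyze_answer
```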
In addition, because a self-trained word2vec model is used instead of a large-scale pre-trained model, the computational efficiency of the model is improved while its effectiveness is maintained.
Furthermore, the Context-features Dynamic Weighting (CDW) model captures the local semantic information of a sentence well, which improves the accuracy of the model. Specifically, the CDW model obtains semantic information from the word vectors of the central word and its context, where the central words are entities such as diseases and symptoms contained in the corpus to be analyzed.
In this embodiment, step b includes the following steps:
b1. performing entity recognition on the preprocessed corpus information to be trained with an NER algorithm to determine the medical entities it contains;
b2. segmenting the preprocessed corpus information with the Jieba segmenter and counting the word frequency T of each segmentation result; preferably, only segmentation results with a word frequency T greater than or equal to 5 are used for training;
b3. manually merging and retaining medical entities that were not kept intact in the segmentation result;
b4. training the word2vec model with the Gensim package and saving it for subsequent use.
By combining the NER algorithm with Jieba segmentation, the invention avoids the situation where a general-purpose segmentation tool cannot correctly segment medical content such as symptoms and diseases.
For a piece of corpus, the medical entities it contains are first found through an existing NER service; the corpus is then segmented with Jieba, and medical entities that the segmentation failed to keep intact are manually merged and retained. In this way, domain-specific medical terms are preserved as fully as possible, which improves the performance of the word2vec model.
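The text performs step b3 manually. The sketch below automates it under stated assumptions: given the tokens produced by the segmenter (for example the list returned by `jieba.lcut(text)`) and the entity strings returned by an NER service, it greedily merges runs of adjacent tokens whose concatenation equals a known entity, preferring the longest match. `merge_entities` is a hypothetical helper; registering the entities with `jieba.add_word` before segmenting is an alternative way to keep them intact.

```python
from typing import Iterable, List


def merge_entities(tokens: List[str], entities: Iterable[str]) -> List[str]:
    """Greedily merge runs of adjacent tokens whose concatenation equals a known
    entity; the longest match wins, so an entity split into several tokens is
    restored as one token."""
    entity_set = set(entities)
    max_len = max((len(e) for e in entity_set), default=0)
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        best_j = i + 1  # default: keep the single token unchanged
        text = ""
        for j in range(i, len(tokens)):
            text += tokens[j]
            if len(text) > max_len:
                break  # no entity can be this long
            if text in entity_set and j > i:
                best_j = j + 1
        merged.append("".join(tokens[i:best_j]))
        i = best_j
    return merged


# Hypothetical segmenter output that split two disease names.
tokens = ["我", "有", "偏", "头痛", "和", "高", "血压"]
print(merge_entities(tokens, {"偏头痛", "高血压"}))
# ['我', '有', '偏头痛', '和', '高血压']
```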
As shown in FIG. 2, the semantic analysis model of the invention mainly comprises a word2vec embedding layer, a Dropout layer, a BiLSTM layer, a LayerNorm layer, a CDW layer and a linear classification layer. The corpus information to be analyzed passes sequentially through the word2vec embedding layer, the Dropout layer, the BiLSTM layer, the LayerNorm layer, the CDW layer and the linear classification layer.
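The Dropout layer (after the embedding layer) and the LayerNorm layer (after the BiLSTM layer) are standard components; a framework such as PyTorch provides them as `nn.Dropout` and `nn.LayerNorm`. The framework-free sketch below only shows what each does to a single feature vector. The defaults `p = 0.1` and `eps = 1e-5` are assumptions, since the text gives no values.

```python
import math
import random
from typing import List, Optional


def layer_norm(x: List[float], gamma: Optional[List[float]] = None,
               beta: Optional[List[float]] = None, eps: float = 1e-5) -> List[float]:
    """Normalize one feature vector to zero mean and unit variance, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    denom = math.sqrt(var + eps)
    gamma = [1.0] * n if gamma is None else gamma
    beta = [0.0] * n if beta is None else beta
    return [g * (v - mean) / denom + b for v, g, b in zip(x, gamma, beta)]


def dropout(x: List[float], p: float = 0.1, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero each element with probability p and rescale the
    survivors by 1 / (1 - p) during training; identity at inference time."""
    if not training or p == 0.0:
        return list(x)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


y = layer_norm([1.0, 2.0, 3.0])
print([round(v, 3) for v in y])  # [-1.225, 0.0, 1.225]
```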
In this embodiment, the CDW layer obtains the semantic information of the corpus to be analyzed through the following steps:
d21. calculating a first weight $u_{it}$ for each character:
$$u_{it} = \tanh(W_w h_{it} + b_w)$$
where i denotes the i-th sentence, t denotes the t-th character in the i-th sentence, $h_{it}$ is the output for the t-th character of the i-th sentence after the LayerNorm layer, $W_w$ is the weight corresponding to $h_{it}$, and $b_w$ is the corresponding bias;
d22. calculating the distance relationship $\mathrm{SRD}_{it}$ between each character and the central word:
[Formula for $\mathrm{SRD}_{it}$: image not reproduced in this text]
where $P_a$ is the position of the central word, the central word is one of the symptom, disease or examination entities contained in the i-th sentence, and m is a threshold value; preferably, m is 10;
d23. obtaining a second weight $u'_{it}$ for each character from the threshold parameter σ and the distance relationship $\mathrm{SRD}_{it}$ between the character and the central word. Updating the weight through this distance relationship keeps the weights of important characters unchanged and makes the weight smaller the farther a character is from the central word, which reduces the influence of extraneous information on the final prediction and improves the accuracy of the model;
[Formula for $u'_{it}$: image not reproduced in this text]
where n is the length of the i-th sentence; preferably, σ is 5.
d24. computing the feature vector $s_i$ of the whole sentence:
[Formula for $s_i$: image not reproduced in this text]
where $\theta_{it}$ is the contribution of the t-th character in the i-th sentence to the semantic information; the farther a character or word is from the central word, the smaller its contribution;
d25. obtaining a binary classification result from the feature vector $s_i$ of the whole sentence: 1 for an active query and 0 for a passive answer.
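The formula for $s_i$ in step d24 is not reproduced in this text. Because $u_{it} = \tanh(W_w h_{it} + b_w)$ matches the word-level attention of hierarchical attention networks, one plausible reading, assumed in the sketch below, is $\theta_{it} = \operatorname{softmax}_t(u'^{\top}_{it} u_w)$ and $s_i = \sum_t \theta_{it} h_{it}$, where $u_w$ is a learned context vector that does not appear in the text. All names and toy values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sentence_vector(h: List[Vector], u_prime: List[Vector], u_w: Vector) -> Tuple[Vector, List[float]]:
    """ASSUMED attention pooling: theta_it = softmax_t(u'_it . u_w),
    s_i = sum_t theta_it * h_it."""
    scores = [sum(a * b for a, b in zip(u_t, u_w)) for u_t in u_prime]
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    theta = [e / total for e in exps]
    dim = len(h[0])
    s_i = [sum(theta[t] * h[t][k] for t in range(len(h))) for k in range(dim)]
    return s_i, theta


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]         # stand-in for LayerNorm outputs h_it
u_prime = [[0.9, 0.1], [0.2, 0.0], [0.5, 0.5]]  # hypothetical CDW-weighted u'_it
u_w = [1.0, 1.0]                                 # hypothetical context vector
s_i, theta = sentence_vector(h, u_prime, u_w)
```

Under this reading, the CDW down-weighting in step d23 shrinks the attention scores of characters far from the central word toward zero, so their influence on $s_i$ depends on the sign of the learned scores rather than being reduced unconditionally.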
The invention also provides a corresponding semantic analysis system, comprising: a text acquisition module, which acquires dialogue data and preprocesses it to obtain corpus information to be trained; a model training module, used for training a word2vec model with the corpus information to be trained; a semantic analysis model building module, used for building a semantic analysis model based on the word2vec model; and a semantic analysis module, used for inputting the corpus information to be analyzed into the semantic analysis model, which comprises a word2vec embedding layer, a BiLSTM layer, a CDW layer and a linear classification layer. The word2vec embedding layer extracts word-vector information of the corpus information to be analyzed, and the BiLSTM layer obtains context information of the corpus to be analyzed; the CDW layer obtains semantic information of the corpus to be analyzed from its word-vector information and context information; and the linear classification layer classifies the semantic information to obtain a binary result, 1 or 0, as the semantic analysis result, where 1 represents an active query and 0 represents a passive answer.
The method is mainly applied to intelligent medical dialogue systems: it analyzes complex sentences in the user's medical dialogue, provides accurate semantic information, determines whether the user's input is an active inquiry or a passive answer, and thereby provides reliable guidance for the next action of the intelligent dialogue system. The semantic analysis model of the invention can determine the semantics of a user's input sentence, that is, whether it queries or answers about the symptom, disease or examination entity it contains.
For example, if the user inputs "I want to consult about the symptoms of XX disease", the model can determine that the user is querying about XX disease; if the user inputs "I do not have XX symptom", the model can determine that the user is answering about XX symptom.
After obtaining the user's semantic information through the semantic analysis model, the intelligent medical dialogue system can design the dialogue flow accordingly: if the user's input is an inquiry, the system needs to answer it; if the input is a passive answer, the system can analyze the answer or continue questioning the user. This improves the fluency and professionalism of the dialogue system.
The invention also provides an electronic device, which comprises at least one processor and a memory which is in communication connection with the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a dialog system semantic analysis method.
In this embodiment, a computer-readable storage medium is further provided, in which a computer program is stored, where the computer program is executed by a processor to implement the dialog system semantic analysis method.
Also, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
While the above description shows and describes the preferred embodiments of the present invention, it is to be understood that the invention is not limited to the forms disclosed herein, but is not to be construed as excluding other embodiments and is capable of use in various other combinations, modifications, and environments and is capable of changes within the scope of the inventive concept as expressed herein, commensurate with the above teachings, or the skill or knowledge of the relevant art. And that modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A dialogue system semantic analysis method is characterized by comprising the following steps:
step a, obtaining dialogue data, and preprocessing the dialogue data to obtain corpus information to be trained;
step b, training a word2vec model with the corpus information to be trained;
step c, constructing a semantic analysis model based on the word2vec model;
step d, inputting corpus information to be analyzed into the semantic analysis model, wherein the semantic analysis model comprises a word2vec embedding layer, a BiLSTM layer, a CDW layer and a linear classification layer; the specific semantic analysis process comprises the following steps:
d1. the word2vec embedding layer extracts word-vector information of the corpus information to be analyzed, and the BiLSTM layer obtains context information of the corpus to be analyzed;
d2. the CDW layer obtains semantic information of the corpus to be analyzed from its word-vector information and context information;
d3. the linear classification layer classifies the semantic information to obtain a binary result, 1 or 0, as the semantic analysis result, wherein 1 represents an active query and 0 represents a passive answer.
2. The semantic analysis method of a dialog system according to claim 1, characterized in that: the preprocessing includes removing stop words, removing useless characters, and removing emoticons.
3. The semantic analysis method of a dialog system according to claim 1, characterized in that: the step b comprises the following steps:
b1. performing entity recognition on the preprocessed corpus information to be trained with an NER algorithm to determine the entities contained in the corpus information to be trained;
b2. segmenting the preprocessed corpus information with the Jieba segmenter and counting the word frequency T of each segmentation result;
b3. manually merging and retaining entities that were not kept intact in the segmentation result;
b4. training and saving the word2vec model with the Gensim package.
4. A dialog system semantic analysis method according to claim 3, characterized in that: in the step b, training is only carried out on the word segmentation result with the word frequency T being more than or equal to 5.
5. The semantic analysis method of a dialog system according to claim 1, characterized in that: the semantic analysis model further comprises a Dropout layer and a LayerNorm layer;
the corpus information to be analyzed passes sequentially through the word2vec embedding layer, the Dropout layer, the BiLSTM layer, the LayerNorm layer, the CDW layer and the linear classification layer.
6. The semantic analysis method of a dialog system according to claim 5, characterized in that: the step d2 specifically comprises the following steps:
d21. calculating a first weight u for each wordit
uit=tanh(Wwhit+bw);
Wherein i represents the ith sentence, t represents the t character in the ith sentence, and hitFor the output of the t character in the ith sentence after passing through the LayerNorm layer, WwIs hitCorresponding weight, bwIs hitA corresponding offset;
d22. calculating the distance relation $SRD_{it}$ between each character and the central word:
[Formula image FDA0003328186500000021 not reproduced in the text]
where $P_a$ is the position of the central word, the central word being one of the symptom, disease or examination entities contained in the $i$-th sentence, and $m$ is a threshold;
d23. obtaining a second weight $u'_{it}$ for each word based on the threshold parameter $\sigma$ and the distance relation $SRD_{it}$ between each character and the central word:
[Formula image FDA0003328186500000022 not reproduced in the text]
where $n$ is the length of the $i$-th sentence;
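The formulas for $SRD_{it}$ and the second weight survive only as images in the source. The symbols the claim does name ($P_a$, threshold $m$, threshold parameter $\sigma$, sentence length $n$) match the context-dynamic-weighting (CDW) scheme of the LCF aspect-sentiment model, so the sketch below borrows LCF's definitions as an explicit assumption, not as the patent's formulas: $SRD_{it} = |t - P_a| - \lfloor m/2 \rfloor$, and $u'_{it} = u_{it}$ when $SRD_{it} \le \sigma$, otherwise $u'_{it} = (1 - (SRD_{it} - \sigma)/n)\,u_{it}$. Positions are 0-based, and the defaults use $m = 10$ and $\sigma = 5$ from claim 7.

```python
def srd(t, p_a, m=10):
    """ASSUMED (LCF-style) distance: |t - P_a| - floor(m / 2)."""
    return abs(t - p_a) - m // 2


def second_weight(u_it, t, p_a, n, m=10, sigma=5):
    """ASSUMED (LCF-style CDW) second weight u'_it.

    Full weight when SRD <= sigma; otherwise scaled by 1 - (SRD - sigma) / n.
    """
    d = srd(t, p_a, m)
    scale = 1.0 if d <= sigma else 1.0 - (d - sigma) / n
    return [scale * v for v in u_it]


def second_weights(U_i, p_a, m=10, sigma=5):
    """Apply the weighting to every character t = 0..n-1 of sentence i."""
    n = len(U_i)
    return [second_weight(u, t, p_a, n, m, sigma) for t, u in enumerate(U_i)]


U_i = [[0.4]] * 30            # toy first weights for a 30-character sentence
out = second_weights(U_i, p_a=10)
print(out[10], out[20], out[25])  # out[25] is scaled by 1 - 5/30
```

Under these assumed formulas, characters within $\sigma + \lfloor m/2 \rfloor = 10$ positions of the central word keep their full weight, and weights decay linearly with distance beyond that.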
d24. computing the feature vector $s_i$ of the whole sentence:
[Formula image FDA0003328186500000031 not reproduced in the text]
where $\theta_{it}$ is the contribution of the $t$-th character in the $i$-th sentence to the semantic information;
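Step d24 pools the sentence into $s_i$ using per-character contributions $\theta_{it}$, but the pooling formula is an image in the source. The sketch assumes the hierarchical-attention pattern that the form $u_{it} = \tanh(W_w h_{it} + b_w)$ is usually paired with: $\theta_{it}$ is a softmax over the scores $u'_{it} \cdot u_w$ with a learned context vector $u_w$ (hypothetical; not named in the claim), and $s_i = \sum_t \theta_{it} h_{it}$.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def sentence_vector(H, U2, u_w):
    """ASSUMED attention pooling for d24.

    H:   LayerNorm outputs h_it of one sentence (list of vectors).
    U2:  second weights u'_it (list of vectors).
    u_w: hypothetical learned context vector.
    Returns (s_i, theta), where theta[t] is the contribution of character t.
    """
    scores = [sum(a * b for a, b in zip(u, u_w)) for u in U2]
    theta = softmax(scores)
    dim = len(H[0])
    s_i = [sum(theta[t] * H[t][k] for t in range(len(H))) for k in range(dim)]
    return s_i, theta


s_i, theta = sentence_vector([[1.0, 0.0], [0.0, 1.0]], [[1.0], [1.0]], [1.0])
print(s_i)  # [0.5, 0.5]
```

Equal scores give uniform contributions, so $s_i$ reduces to the mean of the $h_{it}$; characters the CDW step down-weights receive a smaller $\theta_{it}$.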
d25. obtaining a binary classification result from the sentence feature vector $s_i$: 1 for an active query and 0 for a passive answer.
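Step d25 (like step d3) maps $s_i$ to a 1/0 label. A sketch assuming the linear classification layer emits one logit per class and the label is the argmax; the weight values in the usage lines are hypothetical:

```python
LABELS = {1: "active query", 0: "passive answer"}


def classify(s_i, W, b):
    """ASSUMED linear classification layer: logits = W s_i + b, label = argmax.

    W has two rows: row 0 scores class 0 (passive answer),
    row 1 scores class 1 (active query).
    """
    logits = [sum(w * x for w, x in zip(row, s_i)) + bias for row, bias in zip(W, b)]
    label = max(range(len(logits)), key=logits.__getitem__)
    return label, LABELS[label]


W = [[1.0, -1.0], [-1.0, 1.0]]  # hypothetical learned weights
print(classify([0.2, 0.9], W, [0.0, 0.0]))  # (1, 'active query')
```

A single logit passed through a sigmoid and thresholded at 0.5 would be an equally valid reading of "two classification results"; the two-logit form simply matches a standard two-class linear layer trained with cross-entropy.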
7. The semantic analysis method of a dialog system according to claim 6, characterized in that: the threshold m is 10, and the threshold parameter σ is 5.
8. A semantic analysis system, the system comprising:
the text acquisition module acquires dialogue data and performs preprocessing to obtain corpus information to be trained;
the model training module is used for training a word2vec model by adopting the corpus information to be trained;
the semantic analysis model building module is used for building a semantic analysis model based on the word2vec model;
the semantic analysis module is used for inputting the corpus information to be analyzed into the semantic analysis model, the semantic analysis model comprising a word2vec embedding layer, a BiLSTM layer, a CDW layer and a linear classification layer, wherein the word2vec embedding layer extracts word vector information of the corpus information to be analyzed; the BiLSTM layer acquires context information of the corpus to be analyzed; the CDW layer acquires semantic information of the corpus to be analyzed from its word vector information and context information; and the linear classification layer classifies the semantic information and outputs a binary result, 1 or 0, as the semantic analysis result, where 1 denotes an active query and 0 denotes a passive answer.
9. An electronic device comprising at least one processor and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the dialog system semantic analysis method of any of claims 1 to 7.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the dialog system semantic analysis method according to any one of claims 1 to 7.
CN202111271655.9A 2021-10-29 2021-10-29 Semantic analysis method and system for dialog system, electronic device and storage medium Pending CN114036272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111271655.9A CN114036272A (en) 2021-10-29 2021-10-29 Semantic analysis method and system for dialog system, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111271655.9A CN114036272A (en) 2021-10-29 2021-10-29 Semantic analysis method and system for dialog system, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN114036272A (en) 2022-02-11

Family

ID=80135836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111271655.9A Pending CN114036272A (en) 2021-10-29 2021-10-29 Semantic analysis method and system for dialog system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN114036272A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947918A (en) * 2019-03-12 2019-06-28 Nanjing University of Posts and Telecommunications Semantic analysis towards intelligent customer service session operational scenarios
US20200065389A1 (en) * 2017-10-10 2020-02-27 Tencent Technology (Shenzhen) Company Limited Semantic analysis method and apparatus, and storage medium
CN110990543A (en) * 2019-10-18 2020-04-10 Ping An Technology (Shenzhen) Co., Ltd. Intelligent conversation generation method and device, computer equipment and computer storage medium
CN111104799A (en) * 2019-10-16 2020-05-05 Ping An Life Insurance Company of China, Ltd. Text information representation method and system, computer equipment and storage medium
CN111581966A (en) * 2020-04-30 2020-08-25 South China Normal University Context feature fusion aspect level emotion classification method and device
CN111914556A (en) * 2020-06-19 2020-11-10 Hefei University of Technology Emotion guiding method and system based on emotion semantic transfer map
CN112836053A (en) * 2021-03-05 2021-05-25 Sany Heavy Industry Co., Ltd. Man-machine conversation emotion analysis method and system for industrial field
CN112883714A (en) * 2021-03-17 2021-06-01 Guangxi Normal University ABSC task syntactic constraint method based on dependency graph convolution and transfer learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
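Lin Jianghao; Zhou Yongmei; Yang Aimin; Chen Jin: "Sentiment feature vector extraction method based on semantic similarity" (基于语义相似度的情感特征向量提取方法), Computer Science (计算机科学), no. 010, 31 December 2017 (2017-12-31) *
Chen Shudong; Luo Chao; Ouyang Xiaoye; Li Wei: "Semantically enhanced Chinese named entity recognition algorithm based on dynamic dictionary matching" (基于动态词典匹配的语义增强中文命名实体识别算法), Radio Engineering (无线电工程), no. 007, 31 December 2021 (2021-12-31) *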

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination