CN112699663A - Semantic understanding system based on combination of multiple algorithms - Google Patents
- Publication number
- CN112699663A (application number CN202110019975.9A)
- Authority
- CN
- China
- Prior art keywords
- semantic
- words
- algorithm module
- matching
- deep learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a semantic understanding system based on a combination of multiple algorithms, comprising a semantic grammar algorithm module and a deep learning algorithm module. The semantic grammar algorithm module performs word segmentation on sentences and answers; after the words of a sentence have been segmented, it associates each word with its near-synonyms, similar words and words with the same semantics, and stores them in a database for later semantic grammar matching. The deep learning algorithm module is trained using a BERT model, and the trained result is stored for later matching. The semantic grammar algorithm module and the deep learning algorithm module each calculate a similarity score, and the result with the highest similarity score is fed back to the client as the final matching result. The invention addresses the low semantic understanding accuracy of existing human-machine dialogue.
Description
Technical Field
The invention relates to the technical field of semantic understanding, in particular to a semantic understanding system based on combination of multiple algorithms.
Background
Because Chinese sentences are complex, their meaning is difficult for a machine to understand, and no single algorithm recognizes and understands them well. Combining several different algorithms to recognize and understand a sentence at the same time gives better results. Semantic understanding technology that combines multiple algorithms is mainly applied to intent recognition in human-machine dialogue scenarios, where combining several algorithms allows the semantics of a sentence to be understood better and improves understanding accuracy. The result of the algorithm with the highest score is selected for use, so that the actual semantics of the sentence are approximated more closely.
Disclosure of Invention
Therefore, the invention provides a semantic understanding system based on a combination of multiple algorithms, to solve the problem of low semantic understanding accuracy in existing human-machine dialogue.
In order to achieve the above purpose, the invention provides the following technical solution:
The invention discloses a semantic understanding system based on a combination of multiple algorithms, comprising a semantic grammar algorithm module and a deep learning algorithm module. The semantic grammar algorithm module performs word segmentation on sentences and answers; after the words of a sentence have been segmented, it associates each word with its near-synonyms, similar words and words with the same semantics, and stores them in a database for later semantic grammar matching. The deep learning algorithm module is trained using a BERT model, and the trained result is stored for later matching. The semantic grammar algorithm module and the deep learning algorithm module each calculate a similarity score, and the result with the highest similarity score is fed back to the client as the final matching result.
Further, the semantic grammar algorithm module segments sentences into words according to a configured vocabulary, which records common general-purpose words and professional terms.
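The vocabulary-driven segmentation described above can be illustrated with a minimal sketch. The patent does not name a segmentation algorithm, so forward maximum matching is used here purely as an assumption; the function name `segment` and the sample vocabulary are hypothetical.

```python
def segment(sentence, vocabulary):
    """Forward maximum matching (an assumed technique, not stated in the patent):
    at each position, take the longest substring found in the vocabulary."""
    max_len = max((len(word) for word in vocabulary), default=1)
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until a match is found.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            # A single character is always accepted so the loop makes progress.
            if length == 1 or candidate in vocabulary:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical configured vocabulary mixing general words and domain terms.
VOCABULARY = {"如何", "修改", "密码", "登录", "登录密码"}
print(segment("如何修改登录密码", VOCABULARY))  # ['如何', '修改', '登录密码']
```

Because the longest entry wins, the domain term "登录密码" is kept whole instead of being split into "登录" and "密码", which is one reason a configured vocabulary of professional terms matters.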
Further, the semantic grammar algorithm module performs word segmentation on the basis of the vocabulary. After the words of a sentence have been segmented, it associates each word with its near-synonyms, similar words and words with the same semantics. It splits each frequently asked question (FAQ) item and the answer to the corresponding question into a semantic grammar composed of several words for entry into the system, and stores the split semantic grammar in a database for matching.
Further, after receiving an actual human-machine dialogue sentence, the semantic grammar algorithm module splits the question sentence into words and matches the split words against the words in the database by similarity to obtain a matching score.
Further, through training, the deep learning algorithm module learns the latent semantic rules in text from a large amount of manually labeled text data to generate a deep learning semantic model.
Further, the deep learning semantic model performs automatic intent recognition on the sentence text of newly input human-machine dialogue.
Further, after the intent has been recognized, the deep learning semantic model calculates a similarity matching score from the recognition result.
Further, the system compares the matching score calculated by the semantic grammar algorithm module with the matching score calculated by the deep learning algorithm module, and takes the matching result of the algorithm with the highest similarity score as the final output.
The invention has the following advantages:
The invention discloses a semantic understanding system based on a combination of multiple algorithms. The system performs semantic understanding matching through both a semantic grammar algorithm module and a deep learning algorithm module, calculates the matching similarity score of each of the two algorithms, and takes the matching result of the algorithm with the highest score as the final semantic understanding result. Compared with using a single algorithm, the improvement is more pronounced and semantic understanding is more accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, proportions, sizes and the like shown in this specification are provided only to accompany the disclosed content so that those skilled in the art can understand and read it. They are not intended to limit the conditions under which the invention can be implemented and therefore have no substantive technical significance. Any structural modification, change of proportion or adjustment of size that does not affect the effects the invention can produce or the purposes it can achieve shall still fall within the scope covered by the technical content disclosed herein.
FIG. 1 is a flow chart of a semantic understanding system based on a combination of multiple algorithms according to an embodiment of the present invention.
Detailed Description
The present invention is described below by way of particular embodiments. Other advantages and features of the invention will become apparent to those skilled in the art from the content disclosed in this specification. The described embodiments are merely exemplary and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Examples
This embodiment discloses a semantic understanding system based on a combination of multiple algorithms, comprising a semantic grammar algorithm module and a deep learning algorithm module. The semantic grammar algorithm module performs word segmentation on sentences and answers; after the words of a sentence have been segmented, it associates each word with its near-synonyms, similar words and words with the same semantics, and stores them in a database for later semantic grammar matching. The deep learning algorithm module is trained using a BERT model, and the trained result is stored for later matching. The semantic grammar algorithm module and the deep learning algorithm module each calculate a similarity score, and the result with the highest similarity score is fed back to the client as the final matching result.
The semantic grammar algorithm module segments sentences into words according to a configured vocabulary, which records common general-purpose words and professional terms. Working from this vocabulary, the module associates each word of a segmented sentence with its near-synonyms, similar words and words with the same semantics. It splits each frequently asked question (FAQ) item and the answer to the corresponding question into a semantic grammar composed of several words for entry into the system, and stores the split semantic grammar in a database for matching.
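A minimal sketch of the storage step follows, assuming an SQLite database and a synonym table that maps each variant to one canonical word. The patent specifies neither the database nor the schema, so the table names `faq` and `grammar_entry`, the `SYNONYMS` mapping and the helper `store_faq` are all hypothetical.

```python
import sqlite3

# Hypothetical synonym table: each variant maps to one canonical word.
SYNONYMS = {"更改": "修改", "变更": "修改", "口令": "密码"}


def canonical(word):
    """Return the canonical form of a word, or the word itself if unknown."""
    return SYNONYMS.get(word, word)


def store_faq(conn, faq_id, question_words, answer):
    """Store one FAQ item as a semantic grammar: one row per canonical word."""
    conn.execute("INSERT INTO faq (id, answer) VALUES (?, ?)", (faq_id, answer))
    conn.executemany(
        "INSERT INTO grammar_entry (faq_id, word) VALUES (?, ?)",
        [(faq_id, canonical(word)) for word in question_words],
    )


conn = sqlite3.connect(":memory:")  # in-memory database for illustration only
conn.execute("CREATE TABLE faq (id INTEGER PRIMARY KEY, answer TEXT)")
conn.execute("CREATE TABLE grammar_entry (faq_id INTEGER, word TEXT)")

# The question words are assumed to be already segmented.
store_faq(conn, 1, ["如何", "更改", "口令"], "在设置页面修改密码。")
rows = conn.execute(
    "SELECT word FROM grammar_entry WHERE faq_id = 1 ORDER BY rowid"
).fetchall()
print([row[0] for row in rows])  # ['如何', '修改', '密码']
```

Normalizing to a canonical word at storage time is one possible design; an alternative that is equally consistent with the text is to store the original words and expand synonyms at query time.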
Sentences generated during human-machine dialogue are input into the semantic grammar algorithm module. After receiving an actual dialogue sentence, the module splits the question sentence into words and matches the split words against the words in the database by similarity to obtain a matching score.
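The matching score can be sketched as follows. The patent does not give a scoring formula, so Jaccard overlap between synonym-normalized word sets is an assumption made for illustration; `SYNONYMS`, `match_score`, `best_match` and the sample entries are hypothetical names and data.

```python
# Hypothetical synonym table used to normalize words before comparison.
SYNONYMS = {"更改": "修改", "口令": "密码"}


def normalize(words):
    """Map every word to its canonical form and drop duplicates."""
    return {SYNONYMS.get(word, word) for word in words}


def match_score(query_words, entry_words):
    """Jaccard overlap in [0, 1] between two normalized word sets (assumed metric)."""
    query, entry = normalize(query_words), normalize(entry_words)
    if not query or not entry:
        return 0.0
    return len(query & entry) / len(query | entry)


def best_match(query_words, entries):
    """Return (faq_id, score) for the stored entry that scores highest."""
    faq_id = max(entries, key=lambda key: match_score(query_words, entries[key]))
    return faq_id, match_score(query_words, entries[faq_id])


# Stored semantic grammars, keyed by a hypothetical FAQ identifier.
ENTRIES = {
    "reset_password": ["如何", "修改", "密码"],
    "login_failure": ["无法", "登录"],
}
print(best_match(["怎么", "更改", "口令"], ENTRIES))  # ('reset_password', 0.5)
```

The query shares two of four distinct normalized words with the first entry, which gives 0.5, and none with the second.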
Through training, the deep learning algorithm module learns the latent semantic rules in text from a large amount of manually labeled text data to generate a deep learning semantic model. Training uses a BERT model, and the trained result is stored.
BERT stands for Bidirectional Encoder Representations from Transformers, that is, the encoder of a bidirectional Transformer; the encoder is used because a decoder cannot access the information that is to be predicted. The main innovations of the model lie in its pre-training method, which uses two tasks, Masked LM and Next Sentence Prediction, to capture word-level and sentence-level representations respectively.
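To make the Masked LM task concrete, here is a minimal sketch of the token corruption step. The proportions (15% of tokens selected, of which 80% become `[MASK]`, 10% become a random token and 10% stay unchanged) follow the original BERT pre-training recipe rather than anything stated in the patent; the function name `mask_tokens` and the sample vocabulary are hypothetical.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt a token sequence for Masked LM training.

    Returns (corrupted_tokens, labels); labels hold the original token at
    selected positions and None elsewhere, so the loss covers only those.
    """
    if rng is None:
        rng = random.Random()
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            labels.append(token)  # the model must recover this token
            draw = rng.random()
            if draw < 0.8:
                corrupted.append("[MASK]")           # 80%: mask symbol
            elif draw < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random token
            else:
                corrupted.append(token)              # 10%: left unchanged
        else:
            labels.append(None)
            corrupted.append(token)
    return corrupted, labels


VOCAB = ["我", "想", "修", "改", "密", "码", "登", "录"]
tokens = ["我", "想", "修", "改", "密", "码"]
corrupted, labels = mask_tokens(tokens, VOCAB, rng=random.Random(0))
print(corrupted, labels)
```

Because the model sees context on both sides of each selected position, the task fits a bidirectional encoder; Next Sentence Prediction, the second task, is not shown here.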
The deep learning semantic model performs automatic intent recognition on the sentence text of newly input human-machine dialogue. After the intent has been recognized, the corresponding semantic understanding result is matched, and a similarity matching score is calculated from the recognition result.
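One way to picture this step is to compare a sentence embedding with one reference embedding per intent. The sketch below assumes cosine similarity as the score and uses hand-written three-dimensional vectors in place of real model output; the patent does not specify the scoring function, and `recognize_intent` and `INTENT_VECTORS` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recognize_intent(query_vector, intent_vectors):
    """Return (intent, score) for the intent whose reference vector is closest."""
    scores = {name: cosine(query_vector, vec) for name, vec in intent_vectors.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Placeholder embeddings; in practice these would come from the trained model.
INTENT_VECTORS = {
    "reset_password": [1.0, 0.0, 0.0],
    "check_balance": [0.0, 1.0, 0.0],
}
intent, score = recognize_intent([0.6, 0.8, 0.0], INTENT_VECTORS)
print(intent, round(score, 3))
```

A fine-tuned classifier that outputs a probability per intent would serve the same role; the embedding comparison is shown only because it needs no external library.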
After the semantic grammar algorithm module and the deep learning algorithm module have each calculated a similarity matching score, the system compares the two scores and takes the matching result of the algorithm with the highest similarity score as the final output. Compared with using a single algorithm, the improvement is more pronounced and semantic understanding is more accurate.
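The final selection reduces to taking the higher of two scores. In this sketch the `MatchResult` type and `select_final` function are hypothetical, and two assumptions are made that the patent leaves open: both scores lie on a comparable scale, and a tie goes to the semantic grammar result.

```python
from typing import NamedTuple


class MatchResult(NamedTuple):
    module: str   # which algorithm module produced the result
    answer: str   # matched semantic understanding result
    score: float  # similarity score, assumed comparable across modules


def select_final(grammar_result, deep_result):
    """Return the result with the higher score; max() keeps the first on a tie."""
    return max((grammar_result, deep_result), key=lambda result: result.score)


grammar = MatchResult("semantic_grammar", "reset_password", 0.50)
deep = MatchResult("deep_learning", "reset_password", 0.80)
final = select_final(grammar, deep)
print(final.module, final.score)  # deep_learning 0.8
```

If the two modules score on different scales, some calibration would be needed before the comparison is meaningful; the source does not describe such a step.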
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.
Claims (8)
1. A semantic understanding system based on a combination of multiple algorithms, characterized in that the system comprises a semantic grammar algorithm module and a deep learning algorithm module, wherein the semantic grammar algorithm module and the deep learning algorithm module are used to: perform word segmentation on sentences and answers; after the words of a sentence have been segmented, associate each word with its near-synonyms, similar words and words with the same semantics, and store them in a database for later semantic grammar matching; train using a BERT model and store the trained result for later matching; calculate corresponding similarity scores with the semantic grammar algorithm module and the deep learning algorithm module; and feed back the result with the highest similarity score to a client as the final matching result.
2. The semantic understanding system based on a combination of multiple algorithms according to claim 1, characterized in that the semantic grammar algorithm module segments sentences into words according to a configured vocabulary, and the vocabulary records common general-purpose words and professional terms.
3. The semantic understanding system based on a combination of multiple algorithms according to claim 2, characterized in that the semantic grammar algorithm module performs word segmentation on the basis of the vocabulary, associates each word with its near-synonyms, similar words and words with the same semantics after the words of a sentence have been segmented, splits each frequently asked question (FAQ) item and the answer to the corresponding question into a semantic grammar composed of several words for entry into the system, and stores the split semantic grammar in the database for matching.
4. The semantic understanding system based on a combination of multiple algorithms according to claim 1, characterized in that, after receiving an actual human-machine dialogue sentence, the semantic grammar algorithm module splits the question sentence into words and matches the split words against the words in the database by similarity to obtain a matching score.
5. The semantic understanding system based on a combination of multiple algorithms according to claim 1, characterized in that, through training, the deep learning algorithm module learns the latent semantic rules in text from a large amount of manually labeled text data to generate a deep learning semantic model.
6. The semantic understanding system based on a combination of multiple algorithms according to claim 1, characterized in that the deep learning semantic model performs automatic intent recognition on the sentence text of newly input human-machine dialogue.
7. The semantic understanding system based on a combination of multiple algorithms according to claim 1, characterized in that, after the intent has been recognized, the deep learning semantic model calculates a similarity matching score from the recognition result.
8. The semantic understanding system based on a combination of multiple algorithms according to claim 1, characterized in that the system compares the matching score calculated by the semantic grammar algorithm module with the matching score calculated by the deep learning algorithm module, and takes the matching result of the algorithm with the highest similarity score as the final output.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110019975.9A CN112699663A (en) | 2021-01-07 | 2021-01-07 | Semantic understanding system based on combination of multiple algorithms |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110019975.9A CN112699663A (en) | 2021-01-07 | 2021-01-07 | Semantic understanding system based on combination of multiple algorithms |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112699663A (en) | 2021-04-23 |
Family
ID=75513225
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110019975.9A (Pending), CN112699663A (en) | Semantic understanding system based on combination of multiple algorithms | 2021-01-07 | 2021-01-07 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112699663A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106897263A (en) * | 2016-12-29 | 2017-06-27 | 北京光年无限科技有限公司 | Robot dialogue exchange method and device based on deep learning |
CN107436864A (en) * | 2017-08-04 | 2017-12-05 | 逸途(北京)科技有限公司 | A kind of Chinese question and answer semantic similarity calculation method based on Word2Vec |
CN109657232A (en) * | 2018-11-16 | 2019-04-19 | 北京九狐时代智能科技有限公司 | A kind of intension recognizing method |
CN110008323A (en) * | 2019-03-27 | 2019-07-12 | 北京百分点信息科技有限公司 | A kind of the problem of semi-supervised learning combination integrated study, equivalence sentenced method for distinguishing |
CN110136699A (en) * | 2019-07-10 | 2019-08-16 | 南京硅基智能科技有限公司 | A kind of intension recognizing method based on text similarity |
CN110532566A (en) * | 2019-09-03 | 2019-12-03 | 山东浪潮通软信息科技有限公司 | A kind of implementation method that vertical field Question sentence parsing calculates |
CN110705296A (en) * | 2019-09-12 | 2020-01-17 | 华中科技大学 | Chinese natural language processing tool system based on machine learning and deep learning |
CN111026843A (en) * | 2019-12-02 | 2020-04-17 | 北京智乐瑟维科技有限公司 | Artificial intelligent voice outbound method, system and storage medium |
CN111581354A (en) * | 2020-05-12 | 2020-08-25 | 金蝶软件(中国)有限公司 | FAQ question similarity calculation method and system |
CN112632259A (en) * | 2020-12-30 | 2021-04-09 | 中通天鸿(北京)通信科技股份有限公司 | Automatic dialog intention recognition system based on linguistic rule generation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||