CN112699663A - Semantic understanding system based on combination of multiple algorithms - Google Patents

Semantic understanding system based on combination of multiple algorithms

Info

Publication number
CN112699663A
Authority
CN
China
Prior art keywords
semantic
words
algorithm module
matching
deep learning
Prior art date
Legal status
Pending
Application number
CN202110019975.9A
Other languages
Chinese (zh)
Inventor
娄鑫
Current Assignee
Icsoc Beijing Communication Technology Co ltd
Original Assignee
Icsoc Beijing Communication Technology Co ltd
Priority date
2021-01-07
Filing date
2021-01-07
Publication date
2021-04-23
Application filed by Icsoc Beijing Communication Technology Co ltd
Priority to CN202110019975.9A
Publication of CN112699663A

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
              • G06F40/205 Parsing
                • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
              • G06F40/237 Lexical tools
                • G06F40/247 Thesauruses; Synonyms
              • G06F40/279 Recognition of textual entities
                • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
                • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
            • G06F40/30 Semantic analysis
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/33 Querying
                • G06F16/332 Query formulation
                  • G06F16/3329 Natural language query formulation or dialogue systems
                • G06F16/3331 Query processing
                  • G06F16/334 Query execution
                    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a semantic understanding system based on a combination of multiple algorithms. The system comprises a semantic grammar algorithm module and a deep learning algorithm module. Sentences and answers are segmented into words. Near-synonyms, similar words and words with the same meaning are associated with each segmented word, and the results are stored in a database for later semantic grammar matching. A BERT model is trained, and the trained result is stored for later matching. The semantic grammar algorithm module and the deep learning algorithm module each calculate a similarity score. The result with the highest similarity score is returned to the client as the final matching result. The invention addresses the low semantic understanding accuracy of existing human-machine dialogue.

Description

Semantic understanding system based on combination of multiple algorithms
Technical Field
The invention relates to the technical field of semantic understanding, in particular to a semantic understanding system based on combination of multiple algorithms.
Background
Chinese sentences are complex, so their meaning is difficult for a machine to understand, and no single algorithm recognizes and understands them well. Combining several different algorithms, however, allows such sentences to be recognized and understood jointly. Semantic understanding that combines multiple algorithms is mainly applied to intent recognition in human-machine dialogue, where the combination helps the system grasp the meaning of a sentence and improves understanding accuracy. The result of the highest-scoring algorithm is used, so that the output comes closer to the actual meaning of the sentence.
Disclosure of Invention
Accordingly, the invention provides a semantic understanding system based on a combination of multiple algorithms to solve the problem of low semantic understanding accuracy in existing human-machine dialogue.
To achieve the above purpose, the invention provides the following technical solution:
The invention discloses a semantic understanding system based on a combination of multiple algorithms, comprising a semantic grammar algorithm module and a deep learning algorithm module. Sentences and answers are segmented into words. Near-synonyms, similar words and words with the same meaning are associated with each segmented word, and the results are stored in a database for later semantic grammar matching. A BERT model is trained, and the trained result is stored for later matching. The semantic grammar algorithm module and the deep learning algorithm module each calculate a similarity score. The result with the highest similarity score is returned to the client as the final matching result.
Furthermore, the semantic grammar algorithm module segments sentences into words according to a configured vocabulary, which records common general words and professional terms.
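Vocabulary-driven segmentation can be illustrated with forward maximum matching (FMM), a common dictionary-based technique for Chinese. The text does not name its segmentation algorithm. FMM, the function name `segment` and the sample vocabulary below are therefore illustrative assumptions, not the system's actual implementation.

```python
from __future__ import annotations


def segment(sentence: str, vocabulary: set[str], max_len: int | None = None) -> list[str]:
    """Forward maximum matching: at each position take the longest vocabulary word.

    Characters not covered by any vocabulary word are emitted one at a time.
    """
    if max_len is None:
        # The longest vocabulary entry bounds the search window.
        max_len = max((len(w) for w in vocabulary), default=1)
    max_len = max(max_len, 1)  # guard against a zero-width window
    words: list[str] = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until a match (or a single character).
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    # Hypothetical vocabulary mixing general words and a domain term.
    vocab = {"我", "想", "查询", "话费", "余额", "话费余额"}
    print(segment("我想查询话费余额", vocab))  # → ['我', '想', '查询', '话费余额']
```

Greedy longest-match is simple and fast, but it can mis-split ambiguous character sequences. A production segmenter would usually add some form of disambiguation.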
Furthermore, after segmenting a sentence on the basis of the vocabulary, the semantic grammar algorithm module does three things. It associates near-synonyms, similar words and words with the same meaning with each segmented word. It splits each frequently asked question (FAQ) item and the answer to that question into a semantic grammar entry composed of multiple words. It then stores the resulting semantic grammar entries in a database for matching.
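The following minimal sketch shows how the synonym association and FAQ storage step might be represented. It assumes that a semantic grammar entry is a list of "slots", one per segmented question word, and that each slot holds that word plus its associated words. The synonym table, the `grammar` table schema and all function names are hypothetical. An in-memory SQLite database stands in for the unspecified database.

```python
from __future__ import annotations

import json
import sqlite3

# Hypothetical synonym table: word -> near-synonyms / similar words / same-meaning words.
SYNONYMS: dict[str, set[str]] = {
    "查询": {"查", "查看", "询问"},
    "话费": {"电话费", "通信费"},
    "余额": {"剩余金额"},
}


def build_entry(question_words: list[str], synonyms: dict[str, set[str]]) -> list[list[str]]:
    """One slot per segmented word: the word itself followed by its associated words."""
    return [[w] + sorted(synonyms.get(w, set()) - {w}) for w in question_words]


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) table holding grammar entries and their FAQ answers."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS grammar ("
        "id INTEGER PRIMARY KEY, entry TEXT NOT NULL, answer TEXT NOT NULL)"
    )


def store_faq(conn: sqlite3.Connection, question_words: list[str], answer: str,
              synonyms: dict[str, set[str]]) -> int:
    """Store the grammar entry (as JSON) with its FAQ answer and return the new row id."""
    entry = build_entry(question_words, synonyms)
    cur = conn.execute(
        "INSERT INTO grammar (entry, answer) VALUES (?, ?)",
        (json.dumps(entry, ensure_ascii=False), answer),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    store_faq(conn, ["查询", "话费", "余额"], "示例答案", SYNONYMS)  # placeholder answer text
    for entry_json, answer in conn.execute("SELECT entry, answer FROM grammar"):
        print(json.loads(entry_json), answer)
```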
Furthermore, when the semantic grammar algorithm module receives an actual human-machine dialogue sentence, it splits the question sentence into words. It then matches the split words against the words in the database for similarity, producing a matching score.
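The text does not define how the matching score is computed. The sketch below assumes a synonym-aware, Dice-style overlap between the user's segmented question and each stored entry. Here an entry is a list of slots, and each slot is a list of interchangeable words. The scoring formula and the function names are illustrative assumptions only.

```python
from __future__ import annotations


def grammar_match_score(user_words: list[str], entry: list[list[str]]) -> float:
    """Synonym-aware Dice-style overlap in [0, 1] between a question and one entry.

    A slot is matched when any of its interchangeable words occurs among the user's
    words; each user word can satisfy at most one slot.
    """
    if not user_words or not entry:
        return 0.0
    remaining = list(user_words)
    matched = 0
    for slot in entry:
        for word in slot:
            if word in remaining:
                remaining.remove(word)  # consume it so it cannot match a second slot
                matched += 1
                break
    return 2 * matched / (len(user_words) + len(entry))


def best_grammar_match(user_words: list[str],
                       entries: list[list[list[str]]]) -> tuple[int | None, float]:
    """Return (index, score) of the best-scoring entry, or (None, 0.0) if nothing matches."""
    best_index: int | None = None
    best_score = 0.0
    for index, entry in enumerate(entries):
        score = grammar_match_score(user_words, entry)
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score


if __name__ == "__main__":
    # Hypothetical stored entries (slot lists) and an already-segmented user question.
    entries = [
        [["你好", "您好"]],
        [["查询", "查", "查看"], ["话费", "电话费"], ["余额", "剩余金额"]],
    ]
    user = ["我", "想", "查看", "电话费", "余额"]
    print(best_grammar_match(user, entries))  # → (1, 0.75)
```

Counting both the user's words and the entry's slots in the denominator penalizes long questions that only partially overlap an entry. A plain "fraction of slots matched" would not do that.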
Further, the deep learning algorithm module is trained on a large amount of manually labeled text data to learn the latent semantic rules in the text, producing a deep learning semantic model.
Further, the deep learning semantic model automatically recognizes the intent of newly entered human-machine dialogue sentences.
Further, after recognizing the intent, the deep learning semantic model calculates a similarity matching score for the recognition result.
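The following minimal sketch covers intent recognition followed by a similarity score. It assumes a sentence vector (for example, a BERT [CLS] embedding) has already been produced by the trained model. It also assumes each intent is represented by a prototype vector, such as the mean embedding of its labeled examples. The vector representations, the intent names and the use of cosine similarity are all assumptions; the text does not specify how the score is derived.

```python
from __future__ import annotations

import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recognize_intent(sentence_vec: list[float],
                     intent_prototypes: dict[str, list[float]]) -> tuple[str | None, float]:
    """Pick the intent whose prototype is most similar; return (intent, score)."""
    best_intent: str | None = None
    best_score = -math.inf
    for intent, prototype in intent_prototypes.items():
        score = cosine(sentence_vec, prototype)
        if score > best_score:
            best_intent, best_score = intent, score
    if best_intent is None:  # no intents configured
        return None, 0.0
    return best_intent, best_score


if __name__ == "__main__":
    # Hypothetical 3-d prototypes; real BERT embeddings have hundreds of dimensions.
    prototypes = {
        "query_balance": [1.0, 0.0, 0.0],
        "transfer_to_agent": [0.0, 1.0, 0.0],
    }
    print(recognize_intent([0.9, 0.1, 0.0], prototypes))
```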
Further, the system compares the matching score calculated by the semantic grammar algorithm module with the matching score calculated by the deep learning algorithm module. It takes the matching result of the algorithm with the highest similarity score as the final output.
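The final arbitration step amounts to taking the maximum over the per-module scores. The sketch below assumes each module reports a result carrying a numeric score. `MatchResult`, `arbitrate` and the sample scores are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MatchResult:
    source: str   # which module produced the result, e.g. "semantic_grammar"
    answer: str   # matched FAQ answer or recognized intent returned to the client
    score: float  # similarity score computed by that module


def arbitrate(results: list[MatchResult]) -> MatchResult | None:
    """Return the result with the highest similarity score, or None if there are none.

    On a tie, max() keeps the first result in the list.
    """
    if not results:
        return None
    return max(results, key=lambda r: r.score)


if __name__ == "__main__":
    grammar = MatchResult("semantic_grammar", "FAQ answer", 0.75)     # hypothetical score
    deep = MatchResult("deep_learning", "intent: query_balance", 0.92)  # hypothetical score
    print(arbitrate([grammar, deep]).source)  # → deep_learning
```

The two modules compute their scores differently. Comparing the scores directly therefore presumes they lie on a comparable scale, and the text does not say how such calibration is done.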
The invention has the following advantages:
The disclosed system performs semantic understanding matching with both a semantic grammar algorithm module and a deep learning algorithm module. It calculates the matching similarity score of each algorithm and outputs the matching result of the highest-scoring algorithm as the final semantic understanding result. Compared with using a single algorithm, this combination yields more accurate semantic understanding.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are merely exemplary, and those of ordinary skill in the art can derive other embodiments from them without inventive effort.
The structures, proportions, sizes and the like shown in the drawings of this specification serve only to accompany the content disclosed in the specification, so that those skilled in the art can understand and read it. They are not intended to limit the conditions under which the invention can be implemented and therefore carry no substantive technical significance. Any structural modification, change in proportion or adjustment of size that does not affect the effects and objectives achievable by the invention still falls within the scope covered by the technical content disclosed herein.
FIG. 1 is a flow chart of a semantic understanding system based on a combination of multiple algorithms according to an embodiment of the present invention.
Detailed Description
The present invention is described by way of particular embodiments, and other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. The described embodiments are merely exemplary, and the invention is not intended to be limited to the particular embodiments disclosed. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
Examples
This embodiment discloses a semantic understanding system based on a combination of multiple algorithms, comprising a semantic grammar algorithm module and a deep learning algorithm module. Sentences and answers are segmented into words. Near-synonyms, similar words and words with the same meaning are associated with each segmented word, and the results are stored in a database for later semantic grammar matching. A BERT model is trained, and the trained result is stored for later matching. The semantic grammar algorithm module and the deep learning algorithm module each calculate a similarity score. The result with the highest similarity score is returned to the client as the final matching result.
The semantic grammar algorithm module segments sentences into words according to a configured vocabulary, which records common general words and professional terms. After segmentation, the module associates near-synonyms, similar words and words with the same meaning with each segmented word. It splits each frequently asked question (FAQ) item and the answer to that question into a semantic grammar entry composed of multiple words, and stores the resulting semantic grammar entries in a database for matching.
Sentences produced during human-machine dialogue are fed to the semantic grammar algorithm module. On receiving an actual dialogue sentence, the module splits the question sentence into words. It then performs similarity matching between the split words and the words in the database to obtain a matching score.
The deep learning algorithm module is trained on a large amount of manually labeled text data to learn the latent semantic rules in the text and generate a deep learning semantic model. The model is trained with BERT, and the trained result is stored.
BERT stands for Bidirectional Encoder Representations from Transformers, i.e. the encoder of a bidirectional Transformer. Only the encoder is used because a decoder cannot access the information it is meant to predict. The model's main innovations lie in its pre-training method, which uses two tasks, Masked LM and Next Sentence Prediction, to capture word-level and sentence-level representations respectively.
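The Masked LM corruption step can be sketched as follows. It follows the scheme described in the original BERT paper: about 15% of tokens are selected, and of those, 80% are replaced with [MASK], 10% with a random token and 10% left unchanged. Per-token Bernoulli selection is a simplification of how positions are chosen. The function name and toy vocabulary are hypothetical, and Next Sentence Prediction is not shown.

```python
from __future__ import annotations

import random

MASK = "[MASK]"


def mask_tokens(tokens: list[str], vocab: list[str], rng: random.Random,
                mask_prob: float = 0.15) -> tuple[list[str], list[str | None]]:
    """Apply BERT-style Masked LM corruption to one token sequence.

    Returns (inputs, labels): labels[i] is the original token at selected positions
    (the prediction target) and None at positions the loss should ignore.
    """
    inputs: list[str] = []
    labels: list[str | None] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)              # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                inputs.append(tok)               # 10%: keep the original token
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels


if __name__ == "__main__":
    rng = random.Random(0)
    toy_vocab = list("我想查询话费余额办理业务")  # character-level toy vocabulary
    inputs, labels = mask_tokens(list("我想查询话费余额"), toy_vocab, rng)
    print(inputs)
    print(labels)
```

Keeping some selected tokens unchanged or randomized means the model cannot rely on seeing [MASK] at every position it must predict. This matters because [MASK] never appears during fine-tuning.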
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (8)

1. A semantic understanding system based on a combination of multiple algorithms, the system comprising a semantic grammar algorithm module and a deep learning algorithm module, wherein the semantic grammar algorithm module and the deep learning algorithm module are used to: perform word segmentation on sentences and answers; associate near-synonyms, similar words and words with the same meaning with each word obtained by segmenting the sentences, and store them in a database for later semantic grammar matching; train a BERT model and store the trained result for later matching; calculate corresponding similarity scores with the semantic grammar algorithm module and the deep learning algorithm module; and return the result with the highest similarity score to a client as the final matching result.
2. The system according to claim 1, wherein the semantic grammar algorithm module segments the sentences according to a configured vocabulary, and the vocabulary records common general words and professional terms.
3. The semantic understanding system based on a combination of multiple algorithms according to claim 2, wherein the semantic grammar algorithm module performs word segmentation on the basis of the vocabulary, associates similar words and words with the same meaning with each word after a sentence has been fully segmented, splits common question items and the answers (FAQ) to the corresponding questions into semantic grammar entries each composed of multiple words, and stores the split semantic grammar in the database for matching.
4. The semantic understanding system based on a combination of multiple algorithms according to claim 1, wherein, after receiving an actual human-machine dialogue sentence, the semantic grammar algorithm module splits the question sentence and performs similarity matching between the split words and the words in the database to obtain the matching score.
5. The semantic understanding system based on a combination of multiple algorithms according to claim 1, wherein the deep learning algorithm module is trained on a large amount of manually labeled text data to learn the latent semantic rules in the text and generate a deep learning semantic model.
6. The semantic understanding system based on a combination of multiple algorithms according to claim 1, characterized in that the deep learning semantic model performs automatic intent recognition on the sentence text of a newly entered human-machine dialogue.
7. The semantic understanding system based on a combination of multiple algorithms according to claim 1, wherein, after recognizing the intent, the deep learning semantic model calculates a similarity matching score for the recognition result.
8. The semantic understanding system based on a combination of multiple algorithms according to claim 1, wherein the system compares the matching score calculated by the semantic grammar algorithm module with the matching score calculated by the deep learning algorithm module, and takes the algorithm matching result with the highest similarity score as the final output result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110019975.9A CN112699663A (en) 2021-01-07 2021-01-07 Semantic understanding system based on combination of multiple algorithms

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110019975.9A CN112699663A (en) 2021-01-07 2021-01-07 Semantic understanding system based on combination of multiple algorithms

Publications (1)

Publication Number Publication Date
CN112699663A true CN112699663A (en) 2021-04-23

Family

ID=75513225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110019975.9A Pending CN112699663A (en) 2021-01-07 2021-01-07 Semantic understanding system based on combination of multiple algorithms

Country Status (1)

Country Link
CN (1) CN112699663A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897263A (en) * 2016-12-29 2017-06-27 北京光年无限科技有限公司 Robot dialogue exchange method and device based on deep learning
CN107436864A (en) * 2017-08-04 2017-12-05 逸途(北京)科技有限公司 A kind of Chinese question and answer semantic similarity calculation method based on Word2Vec
CN109657232A (en) * 2018-11-16 2019-04-19 北京九狐时代智能科技有限公司 A kind of intension recognizing method
CN110008323A (en) * 2019-03-27 2019-07-12 北京百分点信息科技有限公司 A kind of the problem of semi-supervised learning combination integrated study, equivalence sentenced method for distinguishing
CN110136699A (en) * 2019-07-10 2019-08-16 南京硅基智能科技有限公司 A kind of intension recognizing method based on text similarity
CN110532566A (en) * 2019-09-03 2019-12-03 山东浪潮通软信息科技有限公司 A kind of implementation method that vertical field Question sentence parsing calculates
CN110705296A (en) * 2019-09-12 2020-01-17 华中科技大学 Chinese natural language processing tool system based on machine learning and deep learning
CN111026843A (en) * 2019-12-02 2020-04-17 北京智乐瑟维科技有限公司 Artificial intelligent voice outbound method, system and storage medium
CN111581354A (en) * 2020-05-12 2020-08-25 金蝶软件(中国)有限公司 FAQ question similarity calculation method and system
CN112632259A (en) * 2020-12-30 2021-04-09 中通天鸿(北京)通信科技股份有限公司 Automatic dialog intention recognition system based on linguistic rule generation

Similar Documents

Publication Publication Date Title
US11314921B2 (en) Text error correction method and apparatus based on recurrent neural network of artificial intelligence
CN109146610B (en) Intelligent insurance recommendation method and device and intelligent insurance robot equipment
US10176804B2 (en) Analyzing textual data
US10347244B2 (en) Dialogue system incorporating unique speech to text conversion method for meaningful dialogue response
CN108766414B (en) Method, apparatus, device and computer-readable storage medium for speech translation
CN109145276A (en) A kind of text correction method after speech-to-text based on phonetic
US10515292B2 (en) Joint acoustic and visual processing
US11093110B1 (en) Messaging feedback mechanism
CN112784696B (en) Lip language identification method, device, equipment and storage medium based on image identification
KR101581816B1 (en) Voice recognition method using machine learning
WO2020186712A1 (en) Voice recognition method and apparatus, and terminal
Vinnarasu et al. Speech to text conversion and summarization for effective understanding and documentation
US20150178274A1 (en) Speech translation apparatus and speech translation method
CN114120985A (en) Pacifying interaction method, system and equipment of intelligent voice terminal and storage medium
Chen et al. Towards unsupervised automatic speech recognition trained by unaligned speech and text only
CN107123419A (en) The optimization method of background noise reduction in the identification of Sphinx word speeds
Chandak et al. Streaming language identification using combination of acoustic representations and ASR hypotheses
CN107562907B (en) Intelligent lawyer expert case response device
CN113535925A (en) Voice broadcasting method, device, equipment and storage medium
CN112632259A (en) Automatic dialog intention recognition system based on linguistic rule generation
CN107609096B (en) Intelligent lawyer expert response method
CN112699663A (en) Semantic understanding system based on combination of multiple algorithms
Stoyanchev et al. Localized error detection for targeted clarification in a virtual assistant
CN110807370B (en) Conference speaker identity noninductive confirmation method based on multiple modes
CN108877781B (en) Method and system for searching film through intelligent voice

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination