CN112149410A - Semantic recognition method and device, computer equipment and storage medium


Info

Publication number
CN112149410A
Authority
CN
China
Prior art keywords
recognition
sentence
semantic
model
matching
Legal status
Pending
Application number
CN202010795138.0A
Other languages
Chinese (zh)
Inventor
夏海兵
林昊
佘丽丽
毛宇
吴仕灿
Current Assignee
Merchants Union Consumer Finance Co Ltd
Original Assignee
Merchants Union Consumer Finance Co Ltd
Application filed by Merchants Union Consumer Finance Co Ltd
Priority to CN202010795138.0A
Publication of CN112149410A


Classifications

    • G06F40/279: Handling natural language data; natural language analysis; recognition of textual entities
    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F40/30: Handling natural language data; semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to a semantic recognition method, a semantic recognition device, a computer device and a storage medium. The method comprises the following steps: obtaining a sentence to be recognized; carrying out complete matching recognition on the sentence to be recognized through a complete matching model to obtain a complete matching recognition result; when the complete matching recognition result is recognition failure, performing key semantic recognition on the sentence to be recognized through a key semantic model to obtain a key semantic recognition result; when the key semantic recognition result is recognition failure, performing real-time matching recognition on the sentence to be recognized through a real-time matching model to obtain a real-time matching recognition result; when the real-time matching recognition result is recognition failure, performing supervised classification recognition on the sentence to be recognized through a supervised classification model to obtain a supervised classification recognition result; and obtaining the semantics of the sentence to be recognized according to the supervised classification recognition result. By adopting the method, the recognition accuracy can be improved, and the recognition speed and recognition stability can be ensured.

Description

Semantic recognition method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a semantic recognition method, an apparatus, a computer device, and a storage medium.
Background
With the development of natural language processing technology, semantic recognition techniques have emerged. To interact with customers efficiently and smoothly in the face of the massive volume of online business requests raised in real scenarios, a semantic recognition technique is needed that offers high operation speed and recognition accuracy and that can be updated online in real time.
In traditional semantic recognition technology, a rule template is usually adopted to realize semantic recognition in interactive service scenarios. The rule template performs semantic recognition according to logical combinations of keywords, so most customer utterances containing the keyword sets can be recognized; however, this approach suffers from high recall but low recognition accuracy, complex configuration, large labor input and poor portability across service scenarios. With the rise of deep learning technology and the accumulation of large amounts of text data, semantic recognition through deep learning models can effectively reduce the complexity of keyword configuration and achieve high semantic recognition accuracy; however, deep learning has the problems that misjudged samples are difficult to correct during retraining and that recognition accuracy is easily affected by changes in network parameters.
Therefore, the traditional semantic recognition technology has the problems of low operation speed, low recognition accuracy and poor model stability.
Disclosure of Invention
Based on this, it is necessary to provide a semantic recognition method, an apparatus, a computer device, and a storage medium for solving the technical problems of slow operation speed, low recognition accuracy, and poor model stability.
A method of semantic recognition, the method comprising:
obtaining a sentence to be recognized;
carrying out complete matching recognition on the sentence to be recognized through a complete matching model to obtain a complete matching recognition result;
when the complete matching recognition result is recognition failure, performing key semantic recognition on the sentence to be recognized through a key semantic model to obtain a key semantic recognition result;
when the key semantic recognition result is recognition failure, performing real-time matching recognition on the sentence to be recognized through a real-time matching model to obtain a real-time matching recognition result;
when the real-time matching recognition result is recognition failure, performing supervised classification recognition on the sentence to be recognized through a supervised classification model to obtain a supervised classification recognition result;
and obtaining the semantics of the sentence to be recognized according to the supervision classification recognition result.
In one embodiment, the performing complete matching recognition on the sentence to be recognized through a complete matching model to obtain a complete matching recognition result includes:
obtaining a statement template and template semantics corresponding to the statement template;
judging whether a target statement template matched with the sentence to be recognized can be found in the statement template;
if so, obtaining the semantics of the sentence to be recognized according to the template semantics of the target statement template;
if not, outputting the complete matching recognition result as recognition failure.
In one embodiment, the performing key semantic recognition on the sentence to be recognized through the key semantic model to obtain a key semantic recognition result includes:
acquiring a keyword logic rule and rule semantics corresponding to the keyword logic rule;
judging whether a target logic rule matched with the sentence to be recognized can be found in the keyword logic rule;
if so, obtaining the semantics of the sentence to be recognized according to the rule semantics of the target logic rule;
if not, outputting the key semantic recognition result as recognition failure.
In one embodiment, the real-time matching model comprises a text matching model and a semantic matching model; the real-time matching recognition of the sentence to be recognized through the real-time matching model to obtain a real-time matching recognition result, which comprises the following steps:
determining a text similar sentence corresponding to the sentence to be recognized according to the text matching model;
obtaining semantic similarity between the text similar statement and the statement to be recognized through the semantic matching model;
judging whether the semantic similarity exceeds a preset similarity threshold;
if so, selecting a target sentence from the text similar sentences, and obtaining the semantics of the sentence to be recognized according to the semantics of the target sentence; the target sentence is a text similar sentence of which the semantic similarity meets a preset condition;
if not, outputting the real-time matching recognition result as recognition failure.
In one embodiment, the performing supervised classification and recognition on the sentence to be recognized through a supervised classification model to obtain a supervised classification and recognition result includes:
obtaining model semantics of the supervised classification model;
calculating the semantic matching degree of the sentence to be recognized with respect to the model semantics;
judging whether the semantic matching degree exceeds a preset matching degree threshold value or not;
if so, selecting a target semantic from the model semantics, and obtaining the semantic of the sentence to be recognized according to the target semantic; the target semantics are model semantics of which the semantic matching degree meets a preset condition;
if not, outputting the supervision classification recognition result as recognition failure.
A real-time matching model training method for semantic recognition is disclosed, wherein the real-time matching model comprises a text matching model and a semantic matching model; the method comprises the following steps:
acquiring a training sample sentence;
obtaining training sample text similar sentences corresponding to the training sample sentences according to a pre-trained text matching model;
obtaining training sample sentence pairs according to the training sample sentences and the training sample text similar sentences;
training a semantic matching model to be trained based on the training sample sentence pair to obtain a pre-trained semantic matching model; and the pre-trained text matching model and the pre-trained semantic matching model are used for carrying out real-time matching recognition on the sentence to be recognized when the key semantic recognition result is recognition failure so as to obtain a real-time matching recognition result.
A supervised classification model training method for semantic recognition, the method comprising:
acquiring a training sample sentence;
performing word segmentation processing on the training sample sentences to obtain a word set of the training sample sentences;
resampling the word set to obtain resampled words;
coding the resampled words to obtain resampled word codes;
training a supervised classification model to be trained based on the resampled word codes to obtain a pre-trained supervised classification model; and the pre-trained supervised classification model is used for carrying out supervised classification recognition on the sentences to be recognized when the real-time matching recognition result is recognition failure, so as to obtain a supervised classification recognition result.
A semantic recognition apparatus, the apparatus comprising:
the acquisition module is used for acquiring the sentence to be recognized;
the complete matching recognition module is used for carrying out complete matching recognition on the sentence to be recognized through a complete matching model to obtain a complete matching recognition result;
the key semantic recognition module is used for carrying out key semantic recognition on the sentence to be recognized through a key semantic model to obtain a key semantic recognition result when the complete matching recognition result is recognition failure;
the real-time matching recognition module is used for carrying out real-time matching recognition on the sentence to be recognized through a real-time matching model when the key semantic recognition result is recognition failure, so as to obtain a real-time matching recognition result;
the supervised classification recognition module is used for carrying out supervised classification recognition on the sentences to be recognized through a supervised classification model to obtain a supervised classification recognition result when the real-time matching recognition result is recognition failure;
and the output module is used for obtaining the semantics of the sentence to be recognized according to the supervision classification recognition result.
A real-time matching model training apparatus for semantic recognition, the apparatus comprising:
the acquisition module is used for acquiring training sample sentences;
the text similarity module is used for obtaining training sample text similar sentences corresponding to the training sample sentences according to a pre-trained text matching model;
the training sample generation module is used for obtaining training sample sentence pairs according to the training sample sentences and the training sample text similar sentences;
the training module is used for training a semantic matching model to be trained based on the training sample sentence pair to obtain a pre-trained semantic matching model; and the pre-trained text matching model and the pre-trained semantic matching model are used for carrying out real-time matching recognition on the sentence to be recognized when the key semantic recognition result is recognition failure so as to obtain a real-time matching recognition result.
A supervised classification model training apparatus for semantic recognition, the apparatus comprising:
the acquisition module is used for acquiring training sample sentences;
the word segmentation module is used for carrying out word segmentation processing on the training sample sentences to obtain a word set of the training sample sentences;
the resampling module is used for resampling the word set to obtain resampled words;
the coding module is used for coding the resampling words to obtain resampling word codes;
the training module is used for training a supervised classification model to be trained based on the resampled word codes to obtain a pre-trained supervised classification model; and the pre-trained supervised classification model is used for carrying out supervised classification recognition on the sentences to be recognized when the real-time matching recognition result is recognition failure, so as to obtain a supervised classification recognition result.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
obtaining a sentence to be recognized;
carrying out complete matching recognition on the sentence to be recognized through a complete matching model to obtain a complete matching recognition result;
when the complete matching recognition result is recognition failure, performing key semantic recognition on the sentence to be recognized through a key semantic model to obtain a key semantic recognition result;
when the key semantic recognition result is recognition failure, performing real-time matching recognition on the sentence to be recognized through a real-time matching model to obtain a real-time matching recognition result;
when the real-time matching recognition result is recognition failure, performing supervised classification recognition on the sentence to be recognized through a supervised classification model to obtain a supervised classification recognition result;
and obtaining the semantics of the sentence to be recognized according to the supervision classification recognition result.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
obtaining a sentence to be recognized;
carrying out complete matching recognition on the sentence to be recognized through a complete matching model to obtain a complete matching recognition result;
when the complete matching recognition result is recognition failure, performing key semantic recognition on the sentence to be recognized through a key semantic model to obtain a key semantic recognition result;
when the key semantic recognition result is recognition failure, performing real-time matching recognition on the sentence to be recognized through a real-time matching model to obtain a real-time matching recognition result;
when the real-time matching recognition result is recognition failure, performing supervised classification recognition on the sentence to be recognized through a supervised classification model to obtain a supervised classification recognition result;
and obtaining the semantics of the sentence to be recognized according to the supervision classification recognition result.
The semantic recognition method, the semantic recognition device, the computer equipment and the storage medium acquire the sentence to be recognized, complete matching recognition is carried out on the sentence to be recognized through the complete matching model to obtain a complete matching recognition result, and the recognition accuracy rate when the sentence to be recognized is completely matched with the model sample can be ensured; when the complete matching recognition result is recognition failure, performing key semantic recognition on the sentence to be recognized through the key semantic model to obtain a key semantic recognition result, so that the recognition accuracy of key semantics can be ensured; when the key semantic recognition result is recognition failure, performing real-time matching recognition on the sentence to be recognized through the real-time matching model to obtain a real-time matching recognition result, updating the sample library in real time, and ensuring the recognition stability and the recognition accuracy rate of non-key semantics; when the real-time matching recognition result is recognition failure, the sentence to be recognized is subjected to supervised classification recognition through the supervised classification model to obtain a supervised classification recognition result, and the semantics of the sentence to be recognized is obtained according to the supervised classification recognition result, so that the recognition accuracy can be further improved, and the recognition speed and the recognition stability are ensured.
Drawings
FIG. 1 is a diagram of an application environment of a semantic recognition method in one embodiment;
FIG. 2 is a flow diagram that illustrates a semantic recognition method in one embodiment;
FIG. 3 is a flow diagram of a semantic recognition method in another embodiment;
FIG. 4 is a schematic flow chart diagram illustrating a method for training a real-time matching model for semantic recognition in one embodiment;
FIG. 5 is a schematic flow chart diagram illustrating a supervised classification model training methodology for semantic recognition in one embodiment;
FIG. 6 is a block diagram of a semantic recognition apparatus in one embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The semantic recognition method provided by the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a semantic recognition method is provided, which is described by taking the method as an example applied to the server 104 in fig. 1, and includes the following steps:
step S210, obtaining the sentence to be recognized.
In a specific implementation, a piece of speech or a piece of text may be input to the terminal 102, the terminal 102 transmits the speech or the text to the server 104, and the server 104 may use the received text as a sentence to be recognized, or convert the received speech into a text and use the text as the sentence to be recognized.
And step S220, carrying out complete matching recognition on the sentence to be recognized through the complete matching model to obtain a complete matching recognition result.
In a specific implementation, the server 104 may input the sentence to be recognized into the complete matching model. The server 104 may pre-store a plurality of sample sentences of the complete matching model and the sentence intentions corresponding to the sample sentences, where the sample sentences may be carefully labeled texts in a sample library, or carefully labeled sentences with a high occurrence frequency, for example, the texts ranked in the top 1000 by text frequency (Top 1000 data). After the sentence to be recognized is input into the complete matching model, complete matching recognition can be performed against the sample sentences. If the sentence to be recognized completely matches a certain sample sentence, that is, it is exactly the same as that sample sentence, the sample sentence can be taken as the target sentence, the sentence intention corresponding to the target sentence can be taken as the semantics of the sentence to be recognized, and the complete matching model can output the sentence intention of the target sentence, thereby completing the semantic recognition of the sentence to be recognized.
And step S230, when the complete matching recognition result is recognition failure, performing key semantic recognition on the sentence to be recognized through the key semantic model to obtain a key semantic recognition result.
The key semantic model is a model capable of performing key semantic recognition through a rule template.
In a specific implementation, the complete matching model may fail to find a sample sentence that completely matches the sentence to be recognized; in that case the complete matching model can output the recognition result as recognition failure, and the server 104, on reading the failure result, can input the sentence to be recognized into the key semantic model. The key semantic model may adopt a rule template: the server 104 may pre-store a keyword logic collection of the rule template and the sentence intention corresponding to each keyword logic in the collection. When the sentence to be recognized satisfies a certain keyword logic, that keyword logic can be taken as the target logic rule, the sentence intention corresponding to the target logic rule can be taken as the semantics of the sentence to be recognized, and the key semantic model can output the sentence intention of the target logic rule, thereby completing the semantic recognition of the sentence to be recognized.
It should be noted that, because the rule template has the problem of high recall but low accuracy, only key semantic rules may be set in the key semantic model; in that case, only the keyword logic and sentence intentions of the key semantics need to be stored in the server 104. The key semantic model can also be used to accumulate new intention samples: when a sentence to be recognized that cannot be recognized by the rule template occurs, the server 104 can store the sentence as a new sample in a new-sample database, and the sample libraries of the key semantic model, the real-time matching model or the supervised classification model can then be updated by recognizing and labeling the new samples.
And S240, when the key semantic recognition result is recognition failure, performing real-time matching recognition on the sentence to be recognized through the real-time matching model to obtain a real-time matching recognition result.
The real-time matching model is a model capable of real-time online updating.
In a specific implementation, the key semantic model may fail to find a keyword logic that matches the sentence to be recognized; in that case the key semantic model can output the recognition result as recognition failure, and the server 104, on reading the failure result, can input the sentence to be recognized into the real-time matching model. The real-time matching model can adopt a similarity model based on BM25 (Best Matching 25) and a twin network model (Siamese Network). The similarity model can first recall, through the BM25 algorithm, the N sample sentences from the sample library whose character or word features are most similar to those of the sentence to be recognized, then use the twin neural network to extract semantic features, and finally use cosine similarity to calculate the semantic similarity between the sentence to be recognized and the N sample sentences. The sample sentence with the highest similarity is selected through re-ranking; if its similarity reaches a preset threshold, that sample sentence can be taken as the target sentence, the sentence intention corresponding to the target sentence can be taken as the semantics of the sentence to be recognized, and the real-time matching model can output the sentence intention of the target sentence to complete the semantic recognition of the sentence to be recognized.
BM25 is an algorithm based on the probabilistic retrieval model and can be used to evaluate the relevance between a search query (query) and a document (D). The score between the query and D is composed of three parts: the relevance between each word and D, the relevance between each word and the query, and the weight of each word; the score between the query and D is obtained by summing the scores of the individual words.
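For reference, the standard Okapi BM25 scoring function (a common formulation, not quoted from the patent itself) sums a per-term contribution over the query terms:

$$\mathrm{score}(Q,D)=\sum_{q_i\in Q}\mathrm{IDF}(q_i)\cdot\frac{f(q_i,D)\,(k_1+1)}{f(q_i,D)+k_1\left(1-b+b\,\frac{|D|}{\mathrm{avgdl}}\right)}$$

where $f(q_i,D)$ is the frequency of term $q_i$ in document $D$, $|D|$ is the document length, $\mathrm{avgdl}$ is the average document length in the corpus, and $k_1$, $b$ are tuning parameters (typical values are $k_1$ between 1.2 and 2.0 and $b\approx0.75$). The three parts mentioned above correspond to the IDF weight of each word, the word-document relevance factor, and, in weighted variants, a word-query relevance factor.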
A twin (Siamese) neural network is a neural network architecture containing two or more identical sub-networks, where identical means having the same configuration, parameters and weights; parameter updates are applied to the sub-networks jointly.
Note that the BM25 algorithm may be replaced by algorithms such as BoW (Bag-of-Words), VSM (Vector Space Model), TF-IDF (Term Frequency-Inverse Document Frequency), Jaccard similarity, or SimHash (locality-sensitive hashing), and the twin network model may be replaced by models such as ESIM (Enhanced Sequential Inference Model), BiMPM (Bilateral Multi-Perspective Matching), ABCNN (Attention-Based Convolutional Neural Network), DIIN (Densely Interactive Inference Network), or DRCN (Densely-connected Recurrent and Co-attentive Network).
And step S250, when the real-time matching recognition result is recognition failure, performing supervised classification recognition on the sentence to be recognized through the supervised classification model to obtain a supervised classification recognition result.
In a specific implementation, the real-time matching model may fail to find a sample sentence that matches the sentence to be recognized (that is, one whose similarity reaches the preset threshold); in that case the real-time matching model can output the recognition result as recognition failure, and the server 104, on reading the failure result, can input the sentence to be recognized into the supervised classification model. The server 104 may store a plurality of sentence intentions in advance. The supervised classification model may determine, through a TextCNN (Text Convolutional Neural Network) algorithm, the probability of the sentence to be recognized for each sentence intention; if the highest probability reaches a preset threshold, the sentence intention with the highest probability value may be selected as the target semantics, and the supervised classification model may output the target semantics, thereby completing the semantic recognition of the sentence to be recognized.
The TextCNN algorithm may be replaced with DCNN (Deep Convolutional Neural Network), LSTM (Long Short-Term Memory), RCNN (Recurrent Convolutional Neural Network), FastText (a fast text classification model), an Attention-based model, or the like.
And step S260, obtaining the semantics of the sentence to be recognized according to the result of the supervised classification recognition.
In a specific implementation, if the probability calculated by the supervised classification model is higher than the preset threshold, the sentence intention with the highest probability value can be selected as the target semantics, and the supervised classification model can output the target semantics to complete the semantic recognition of the sentence to be recognized. If the probability calculated by the supervised classification model is lower than the preset threshold, this indicates that recognition has failed through the complete matching model, the key semantic model, the real-time matching model and the supervised classification model, and semantic recognition cannot be realized; in that case the server 104 may output the semantic recognition result as "other".
According to the semantic recognition method, the sentence to be recognized is obtained, the sentence to be recognized is subjected to complete matching recognition through the complete matching model, a complete matching recognition result is obtained, and the recognition accuracy rate when the sentence to be recognized is completely matched with the model sample can be ensured; when the complete matching recognition result is recognition failure, performing key semantic recognition on the sentence to be recognized through the key semantic model to obtain a key semantic recognition result, so that the recognition accuracy of key semantics can be ensured; when the key semantic recognition result is recognition failure, performing real-time matching recognition on the sentence to be recognized through the real-time matching model to obtain a real-time matching recognition result, updating the sample library in real time, and ensuring the recognition stability and the recognition accuracy rate of non-key semantics; when the real-time matching recognition result is recognition failure, the sentence to be recognized is subjected to supervised classification recognition through the supervised classification model to obtain a supervised classification recognition result, and the semantics of the sentence to be recognized is obtained according to the supervised classification recognition result, so that the recognition accuracy can be further improved, and the recognition speed and the recognition stability are ensured.
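A minimal sketch of the cascade described in steps S210 to S260 is given below; the model objects, their recognize methods, and the FAIL sentinel are illustrative assumptions, not interfaces defined by the patent.

```python
# Illustrative cascade: each stage only runs when the previous one fails.
FAIL = None  # assumed sentinel for "recognition failure"

def recognize_semantics(sentence, exact_model, keyword_model,
                        realtime_model, classifier_model):
    """Run the four-stage recognition cascade and return the semantics."""
    for model in (exact_model, keyword_model, realtime_model, classifier_model):
        result = model.recognize(sentence)  # assumed per-model interface
        if result is not FAIL:
            return result
    # No stage recognized the sentence: output the intention name "other".
    return "other"
```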
In an embodiment, the step S220 may specifically include: obtaining a statement template and template semantics corresponding to the statement template; judging whether a target statement template matched with the sentence to be recognized can be found in the statement template; if so, obtaining the semantics of the sentence to be recognized according to the template semantics of the target statement template; if not, outputting the complete matching recognition result as recognition failure.
In a specific implementation, the server may input the sentence to be recognized into the complete matching model. The server may obtain in advance a plurality of statement templates of the complete matching model and the template semantics corresponding to the statement templates, and store them; the statement templates may be carefully labeled texts in a sample library, or carefully labeled sentences with a high occurrence frequency, for example, the texts ranked in the top 1000 by text frequency (Top 1000 data). After the sentence to be recognized is input into the complete matching model, complete matching recognition can be performed against the statement templates: a target statement template that completely matches the sentence to be recognized is searched for among the statement templates. If the target statement template can be found, the template semantics of the target statement template can be taken as the semantics of the sentence to be recognized and output; otherwise, if the target statement template cannot be found, recognition failure can be determined, and the complete matching model can output the complete matching recognition result as recognition failure.
For example, the server may store a plurality of sentences and their corresponding sentence intentions in advance, and when a sentence identical to the sentence to be recognized is found in the plurality of sentences, the sentence may be used as a target sentence, and the sentence intention corresponding to the target sentence is the semantic meaning of the sentence to be recognized.
In the embodiment, whether a target sentence template matched with a sentence to be recognized can be found in the sentence template is judged by obtaining the sentence template and the template semantics corresponding to the sentence template, if so, the semantics of the sentence to be recognized is obtained according to the template semantics of the target sentence template, and if not, a complete matching recognition result is output as a recognition failure, so that when the sentence to be recognized is matched with the sentence in the sentence template, a higher semantic recognition accuracy can be obtained, and the semantic recognition stability and the recognition speed are improved.
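As an illustrative sketch of this complete matching step (the table contents and names below are assumptions for illustration, not part of the claimed method), the pre-stored templates can be held in a mapping from statement template to template semantics:

```python
# Assumed pre-stored table: statement template -> template semantics (intent name).
TEMPLATE_SEMANTICS = {
    "I want to check my bill": "query_bill",      # illustrative entries
    "How do I repay early": "early_repayment",
}

def exact_match_recognize(sentence):
    """Return the template semantics on an exact hit, otherwise None (failure)."""
    return TEMPLATE_SEMANTICS.get(sentence.strip())  # None == recognition failure
```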
In an embodiment, the step S230 may specifically include: acquiring a keyword logic rule and rule semantics corresponding to the keyword logic rule; judging whether a target logic rule matched with the sentence to be recognized can be found in the keyword logic rules; if so, obtaining the semantics of the sentence to be recognized according to the rule semantics of the target logic rule; if not, outputting the key semantic recognition result as recognition failure.
In a specific implementation, when the server reads that the complete matching model has output the complete matching recognition result as recognition failure, the server can input the sentence to be recognized into the key semantic model. The key semantic model can adopt a rule template: the server can pre-store keyword logic rules and the rule semantics corresponding to the keyword logic rules, and a target logic rule matched with the sentence to be recognized can be searched for among the keyword logic rules. When the sentence to be recognized satisfies a certain keyword logic rule, that keyword logic rule can be taken as the target logic rule, and the rule semantics corresponding to the target logic rule can be taken as the semantics of the sentence to be recognized; if no target logic rule can be found among the keyword logic rules, recognition failure can be determined, and the key semantic model can output the key semantic recognition result as recognition failure.
For example, the server may pre-store a plurality of keyword logic rules and semantics corresponding to the keyword logic rules, extract keywords from the sentence to be recognized, and if the extracted keywords meet a certain keyword logic rule, may use the semantics corresponding to the keyword logic rules as the semantics of the sentence to be recognized.
It should be noted that, because the rule template has the problem of high recall but low accuracy, only key semantic rules may be set in the key semantic model; in that case, only the keyword logic rules and rule semantics of the key semantics need to be stored in the server. The key semantic model can also be used to accumulate new intention samples: when a sentence to be recognized that cannot be recognized by the rule template appears, the server can store the sentence as a new sample in a new-sample database, and the sample libraries of the key semantic model, the real-time matching model or the supervised classification model can be updated by recognizing and labeling the new samples.
In the embodiment, whether a target logic rule matched with a sentence to be recognized can be found in the keyword logic rule is judged by obtaining the keyword logic rule and the rule semantics corresponding to the keyword logic rule, if so, the semantics of the sentence to be recognized is obtained according to the rule semantics of the target logic rule, and if not, the key semantic recognition result is output as recognition failure, so that the semantic recognition speed can be improved.
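A minimal sketch of keyword-logic matching is shown below, assuming each rule is an AND-combination of required keywords with an optional set of excluded keywords; this rule format and the example rules are assumptions for illustration only:

```python
# Assumed rule format: (required keywords, excluded keywords, rule semantics).
KEYWORD_RULES = [
    ({"repay", "today"}, {"cannot"}, "confirm_repayment"),   # illustrative rules
    ({"lower", "interest"}, set(), "rate_negotiation"),
]

def keyword_rule_recognize(sentence):
    """Return the rule semantics of the first matching keyword logic rule."""
    words = set(sentence.lower().split())
    for required, excluded, semantics in KEYWORD_RULES:
        if required <= words and not (excluded & words):
            return semantics
    return None  # recognition failure
```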
In an embodiment, the step S240 may specifically include: determining text similar sentences corresponding to the sentences to be recognized according to the text matching model; obtaining semantic similarity between the text similar sentences and the sentences to be recognized through a semantic matching model; judging whether the semantic similarity exceeds a preset similarity threshold; if so, selecting a target sentence from the text similar sentences, and obtaining the semantics of the sentence to be recognized according to the semantics of the target sentence; the target sentence is a text similar sentence of which the semantic similarity accords with a preset condition; if not, outputting the real-time matching identification result as identification failure.
In a specific implementation, when the server reads that the key semantic recognition result output by the key semantic model is recognition failure, the server can input the sentence to be recognized into the real-time matching model. The real-time matching model can comprise a text matching model and a semantic matching model. The N text similar sentences whose character or word features are most similar to those of the sentence to be recognized can be recalled from the sample library through the text matching model; semantic features are then extracted through the semantic matching model, and the semantic similarity between the sentence to be recognized and each text similar sentence is calculated. Whether the semantic similarity exceeds a preset similarity threshold is then judged: if it exceeds the similarity threshold, the text similar sentence with the highest semantic similarity can be taken, through sorting, as the target sentence, and the sentence intention corresponding to the target sentence can be taken as the semantics of the sentence to be recognized; otherwise, if the semantic similarity is lower than the similarity threshold, recognition failure can be determined, and the real-time matching model can output the real-time matching recognition result as recognition failure.
In practical application, the text matching model may be BM25 and the semantic matching model may be a twin neural network model. The real-time matching model may recall (Retrieval) the N samples (Ranking N) whose character or word features are most similar to those of the sentence to be recognized from the sample library using the BM25 algorithm, then extract semantic features using the twin neural network, and finally calculate the cosine similarity between the semantic features of the sentence to be recognized and the Ranking N samples. The samples are re-sorted according to similarity (Re-Ranking), the sample with the highest similarity is taken, and the intention of that sample is output as the semantics of the sentence to be recognized if its similarity reaches the set threshold.
In the embodiment, the text similar sentences corresponding to the sentences to be recognized are determined according to the text matching model, the semantic similarity between the text similar sentences and the sentences to be recognized is obtained through the semantic matching model, whether the semantic similarity exceeds a preset similarity threshold value is judged, if yes, target sentences are selected from the text similar sentences, the semantics of the sentences to be recognized are obtained according to the semantics of the target sentences, the target sentences are the text similar sentences of which the semantic similarity meets preset conditions, if not, the real-time matching recognition result is output as recognition failure, a sample library can be updated in real time, and the recognition stability and the recognition accuracy of non-key semantics are ensured.
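The recall-then-rerank flow of the real-time matching model might look as follows; bm25_recall, encode, and the 0.8 threshold are illustrative assumptions, since the patent only fixes the BM25 recall, the twin-network feature extraction, the cosine scoring, and a preset threshold:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def realtime_match(sentence, bm25_recall, encode, n=10, threshold=0.8):
    """BM25 recall, twin-network encoding, cosine re-ranking; None on failure."""
    candidates = bm25_recall(sentence, n)            # [(sample_sentence, intent), ...]
    if not candidates:
        return None
    query_vec = encode(sentence)                     # twin-network sentence vector
    scored = [(cosine(query_vec, encode(s)), intent) for s, intent in candidates]
    best_score, best_intent = max(scored)
    return best_intent if best_score >= threshold else None  # None == failure
```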
In an embodiment, the step S250 may specifically include: obtaining model semantics of the supervised classification model; calculating the semantic matching degree of the sentence to be recognized with respect to the model semantics; judging whether the semantic matching degree exceeds a preset matching degree threshold; if so, selecting a target semantics from the model semantics, and obtaining the semantics of the sentence to be recognized according to the target semantics, the target semantics being the model semantics whose semantic matching degree meets a preset condition; if not, outputting the supervised classification recognition result as recognition failure.
In a specific implementation, when the server reads that the real-time matching model has output the real-time matching recognition result as recognition failure, the server can input the sentence to be recognized into the supervised classification model. The server can pre-store a plurality of sentence intentions as the model semantics. The supervised classification model can judge, through the TextCNN algorithm, the probability of the sentence to be recognized for each model semantics as the semantic matching degree, and compare the semantic matching degree with a preset matching degree threshold. If the semantic matching degree exceeds the matching degree threshold, the model semantics with the highest semantic matching degree can be selected as the target semantics; if the semantic matching degree is lower than the matching degree threshold, recognition failure can be determined, and the supervised classification model can output the supervised classification recognition result as recognition failure.
In the embodiment, the model semantics of the supervised classification model are obtained, the semantic matching degree of the to-be-recognized sentence to the model semantics is counted, whether the semantic matching degree exceeds a preset matching degree threshold value is judged, if yes, the target semantics are selected from the model semantics, the semantics of the to-be-recognized sentence are obtained according to the target semantics, the target semantics are the model semantics of which the semantic matching degree meets preset conditions, if not, the supervised classification recognition result is output as recognition failure, and the accuracy of semantic recognition can be further improved.
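Thresholding the classifier output could be sketched as below; model.predict, the intent list, and the 0.5 threshold are assumptions for illustration:

```python
import numpy as np

def classify_with_threshold(model, encoded_sentence, intent_names, threshold=0.5):
    """Pick the Top 1 intent if its probability clears the threshold, else fail."""
    # encoded_sentence: 1-D integer array produced by the text-to-sequence encoding
    probs = model.predict(encoded_sentence[np.newaxis, :])[0]  # per-intent probabilities
    top = int(np.argmax(probs))
    return intent_names[top] if probs[top] >= threshold else None  # None == failure
```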
In order to facilitate those skilled in the art to further understand the embodiments of the present application, the following description will be made with reference to the flow chart of the semantic recognition method in fig. 3. The semantic recognition method integrates complete matching, rule matching, similarity models and classification models from top to bottom to obtain a deep semantic recognition integrated model, and specifically comprises the following steps:
step S310 (complete matching), performing semantic recognition through complete matching (Perfect Match), including outputting a corresponding intention name when the text to be recognized completely matches or hits the text in Top1000 data (data with text frequency ranking in Top 1000) and bandwidth data (data in the sample library) which are subjected to refined labeling. The format of the sample can be (send, label), the send is a sentence, and the label is an intention of manual labeling.
Step S320 (rule matching), performing semantic recognition through rule matching: when the text to be recognized satisfies the keyword logic set, the corresponding intention name is output. It should be noted that, because rule matching has the problem of high recall but low accuracy, the integrated model can retain only the rules of key intentions in rule matching, so as to guarantee the recognition rate of key intentions; meanwhile, rule matching can be used for the accumulation of new intention samples.
Step S330 (similarity model), performing semantic recognition through a similarity model: a similarity calculation method combining BM25 with a twin neural network structure is provided. The N samples (Ranking N) whose character or word features are most similar to the text to be recognized are recalled (Retrieval) from the sample library using the BM25 algorithm; semantic features are then extracted using the twin neural network; finally, the cosine similarity between the semantic features of the text to be recognized and the Ranking N samples is calculated, the Top 1 (most similar) sample is obtained by re-ranking (Re-Ranking), and the intention of the Top 1 sample is output if its similarity reaches the set threshold. The BM25 model and the twin neural network may employ the following training steps:
In step S331, data preprocessing is performed, which may specifically include:
a) <sample data (sent, label)> generation: the samples accumulated in the actual production database after manual labeling are used as <sample data (sent, label)>, where the format of a sample is (sent, label), sent is a sentence, and label is the manually labeled intention;
b) jieba word segmentation and stop-word removal;
c) text-to-sequence encoding: the segmented, stop-word-removed sent is encoded with the Tokenizer of Keras, and the encoding length of sent is normalized to 30 with pad_sequences of Keras; if the sent encoding is longer than 30, the first 30 codes are kept, and if it is shorter than 30, zeros are prepended so that the encoding length is normalized to 30 (see the encoding sketch after this list);
d) <semantic pair sample data (sent1, sent2, label)> generation: on the basis of the trained BM25 model, for each sample in <sample data (sent, label)>, the Top n samples whose character or word features are most similar are recalled from <sample data (sent, label)>; if the manually labeled intentions are the same, the label of the semantic pair is 1, otherwise it is 0, thereby generating <semantic pair sample data (sent1, sent2, label)> containing m × n semantic pairs, where m is the number of samples in <sample data (sent, label)> and n is the number of most-similar samples recalled.
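A sketch of the encoding in step c) using the Keras utilities named above (assuming TensorFlow 2.x Keras; the keep-first-30 truncation and zero pre-padding follow the description, while the function name and space-joined token input are assumptions):

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

def encode_sentences(texts, maxlen=30, tokenizer=None):
    """Encode sentences (tokens joined by spaces, e.g. after jieba) to length 30."""
    if tokenizer is None:
        tokenizer = Tokenizer()
        tokenizer.fit_on_texts(texts)                # word -> index, counting from 1
    seqs = tokenizer.texts_to_sequences(texts)
    # keep the first 30 codes when longer, pad with 0 in front when shorter
    encoded = pad_sequences(seqs, maxlen=maxlen, padding="pre", truncating="post")
    return encoded, tokenizer
```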
Step S332, performing model training, which may specifically include:
e) BM25 model training: the BM25 model is obtained by training on the samples after jieba word segmentation and stop-word removal in step S331 b);
f) twin network model training: the model structure comprises an input layer (Input), a word embedding layer (Embedding), a flattening layer (Flatten), 3 fully connected layers (Dense, each followed by a dropout layer (Dropout)), and a matching layer (Cosine). The Dense-based twin network structure is trained with the encoded <semantic pair sample data (sent1, sent2, label)> (see the model sketch after this list).
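A sketch of the Dense-based twin network in f), assuming TensorFlow 2.x Keras; the vocabulary size, embedding dimension, layer widths, dropout rate and loss are illustrative assumptions, while the layer sequence (Input, Embedding, Flatten, three Dense+Dropout blocks, cosine matching) follows the description:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_siamese(vocab_size=10000, embed_dim=128, maxlen=30):
    """Twin (shared-weight) encoder with a cosine-similarity matching layer."""
    encoder = tf.keras.Sequential([
        layers.Embedding(vocab_size, embed_dim),
        layers.Flatten(),
        layers.Dense(256, activation="relu"), layers.Dropout(0.3),
        layers.Dense(128, activation="relu"), layers.Dropout(0.3),
        layers.Dense(64, activation="relu"), layers.Dropout(0.3),
    ])
    sent1 = layers.Input(shape=(maxlen,), dtype="int32")
    sent2 = layers.Input(shape=(maxlen,), dtype="int32")
    v1, v2 = encoder(sent1), encoder(sent2)             # shared parameters and weights
    cos = layers.Dot(axes=1, normalize=True)([v1, v2])  # cosine matching layer
    model = Model([sent1, sent2], cos)
    model.compile(optimizer="adam", loss="mse")         # label: 1 same intent, 0 otherwise
    return model
```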
jieba is a Python Chinese word segmentation component; it supports three segmentation modes (accurate mode, full mode and search-engine mode) and supports traditional Chinese text and custom dictionaries.
Keras is a high-level neural network API (Application Programming Interface) and deep learning library written in pure Python that can run on TensorFlow, Theano or CNTK back ends.
Tokenizer is a Keras class for vectorizing text, i.e., converting text into a sequence of dictionary word indices (counting from 1).
pad_sequences is a sequence padding function in Keras that can normalize sequences of indefinite length into sequences of a fixed length.
Step S340 (classification model), performing semantic recognition through a classification model: the TextCNN algorithm is used to judge the probability of the customer utterance text for each intention; the Top 1 probability is taken, and the intention corresponding to the Top 1 probability is output if it reaches the set threshold. The TextCNN model can adopt the following training steps:
step S341, performing data preprocessing, which may specifically include:
g) data preprocessing: eliminating samples with sentence length exceeding 40, and unifying and standardizing intention categories;
h) generating a training set and a test set: <sample data (sent, label)> is randomly split 8:2 into a training set and a test set, i.e., 80% of the samples are randomly selected as the training set and the remaining 20% as the test set;
i) word segmentation: the segmentation method is configurable, including character-level segmentation and jieba word segmentation;
j) training-set resampling: because samples of different intentions are unbalanced, i.e., the proportions of the different classes differ greatly, which interferes with the learning process of the algorithm, random oversampling is adopted. A resampling threshold is obtained by setting a random oversampling ratio threshold (for example, 1 or 0.2) and multiplying it by the sample size of the intention with the maximum sample size: if an intention's sample size is greater than or equal to the threshold, no resampling is performed; if it is smaller than the threshold, sampling with replacement is performed, and the number of added samples is (the maximum intention sample size × 0.2) minus that intention's sample size;
k) encoding: Txt encoding (text encoding) is adopted: the sent of the training set is encoded with the Tokenizer of Keras, and the sent length is normalized to 30 with pad_sequences of Keras (if the sent length exceeds 30, the first 30 codes are kept; if it is 30 or less, zeros are prepended so that the encoding length is normalized to 30); the test-set sent is then encoded with the Tokenizer fitted on the training set, and its encoding length is normalized in the same way. The label of the training set is digitally encoded (e.g., to 0, 1, 2, ...) using label encoding, followed by one-hot encoding; the test-set label is then encoded with the digital encoder fitted on the training set and one-hot encoded (see the label-encoding sketch after this list).
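Step k)'s label handling could be sketched as follows; sklearn's LabelEncoder is one possible choice for the digital encoding and is an assumption, while to_categorical is the Keras one-hot utility:

```python
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

def encode_labels(train_labels, test_labels):
    """Digitally encode intent labels (0, 1, 2, ...) then one-hot encode them."""
    encoder = LabelEncoder()
    y_train = to_categorical(encoder.fit_transform(train_labels))
    # the test set reuses the digital encoder fitted on the training set
    y_test = to_categorical(encoder.transform(test_labels),
                            num_classes=len(encoder.classes_))
    return y_train, y_test, encoder
```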
Step S342, performing model training, which may specifically include:
l) building and training the TextCNN model structure: the model structure comprises an input layer (Input Layer), a word embedding layer (Embedding), a dropout layer (Dropout), a convolution layer (Conv, with 4 parallel convolution branches whose kernel sizes may be [2, 3, 4, 5]), a max pooling layer (MaxPooling), a flattening layer (Flatten), a concatenation layer (Concatenate) and 2 fully connected layers (Dense); the TextCNN model structure can be trained with the encoded training-set data (see the model sketch after this list);
m) testing and model parameter optimization: and testing the TextCNN model by using the test set, and adjusting the model parameters according to the test result.
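A sketch of the TextCNN structure in l), assuming TensorFlow 2.x Keras; the vocabulary size, embedding dimension, filter count, dense width and dropout rate are illustrative assumptions, while the layer sequence (Input, Embedding, Dropout, four parallel convolution branches with kernel sizes [2, 3, 4, 5], MaxPooling, Flatten, Concatenate, two Dense layers) follows the description:

```python
from tensorflow.keras import layers, Model

def build_textcnn(vocab_size=10000, embed_dim=128, maxlen=30, num_intents=20):
    """TextCNN with four parallel convolution branches of kernel sizes 2 to 5."""
    inp = layers.Input(shape=(maxlen,), dtype="int32")
    x = layers.Embedding(vocab_size, embed_dim)(inp)
    x = layers.Dropout(0.3)(x)
    branches = []
    for k in (2, 3, 4, 5):                       # the four convolution kernel sizes
        c = layers.Conv1D(128, k, activation="relu")(x)
        c = layers.MaxPooling1D(pool_size=maxlen - k + 1)(c)
        branches.append(layers.Flatten()(c))
    x = layers.Concatenate()(branches)
    x = layers.Dense(256, activation="relu")(x)
    out = layers.Dense(num_intents, activation="softmax")(x)  # per-intent probabilities
    model = Model(inp, out)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```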
In step S350, when no semantics are recognized through the above complete matching, rule matching, similarity model and classification model, the intention name "other" may be output.
The semantic recognition method integrates complete matching, rule matching, the similarity model and the classification model from top to bottom and adds an output "other" module, yielding a deep semantic recognition integrated model in which the sub-models complement each other. It can address the problems of traditional semantic recognition such as complex configuration, large labor input, poor service portability and low semantic recognition accuracy. Complete matching retains the recognition accuracy of theoretically up to 100% and high stability; the keyword matching of rule matching handles key intention recognition and the early accumulation of new intention samples; the similarity model can respond to sample-library updates in a timely manner and be updated online in real time while ensuring the stability and generalization ability of the updated model; and the classification model based on supervised deep learning with TextCNN can greatly improve the semantic recognition accuracy.
In one embodiment, as shown in fig. 4, there is provided a real-time matching model training method for semantic recognition, comprising the following steps:
step S410, obtaining training sample sentences;
step S420, obtaining training sample text similar sentences corresponding to the training sample sentences according to the pre-trained text matching model;
step S430, obtaining training sample sentence pairs according to the training sample sentences and the training sample text similar sentences;
step S440, training the semantic matching model to be trained based on the training sample sentence pair to obtain a pre-trained semantic matching model; and the pre-trained text matching model and the pre-trained semantic matching model are used for carrying out real-time matching recognition on the sentence to be recognized when the key semantic recognition result is recognition failure so as to obtain a real-time matching recognition result.
In a specific implementation, the real-time matching model may be composed of a text matching model and a semantic matching model, where the text matching model may be the best matching model BM25 and the semantic matching model may be a twin network model. Manually labeled sentence samples can be used as sample data to obtain the set <sample data (sent, label)>, where sent is a training sample sentence and label is the manually labeled sentence intention. Word segmentation and stop-word removal are performed on the training sample sentences sent to obtain the words of the training sample sentences, and the text matching model to be trained is trained on these words to obtain the pre-trained text matching model. The Tokenizer of Keras is used to encode the words of the training sample sentences, and pad_sequences of Keras is used to normalize the encoding length to 30; for example, if the encoding length of sent is greater than 30, the first 30 codes are kept, and if it is not greater than 30, zeros are prepended so that the encoding length becomes 30. Based on the trained text matching model, for each sample in <sample data (sent, label)>, the n samples whose character or word features are most similar can be recalled from <sample data (sent, label)> as training sample text similar sentences; the original sample and a recalled sample can form a sentence pair (a training sample sentence pair), the pair can be labeled label = 1 if the manually labeled intentions of the original sample and the recalled sample are the same, and label = 0 if they are different, thereby generating a set of <semantic pair sample data (sent1, sent2, label)> containing m × n pairs, where m is the number of samples in <sample data (sent, label)> and n is the number of most-similar samples recalled. The semantic matching model to be trained is trained on the encoded <semantic pair sample data (sent1, sent2, label)> to obtain the pre-trained semantic matching model.
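The construction of the <semantic pair sample data (sent1, sent2, label)> described above might be sketched as follows; recall_top_n stands in for the trained BM25 model's recall step and is an assumption:

```python
def build_semantic_pairs(samples, recall_top_n, n=5):
    """Build (sent1, sent2, label) pairs; label is 1 when the intents agree.

    samples: list of (sent, label) tuples from the manually labeled sample data.
    recall_top_n: function returning the n most similar (sent, label) samples.
    """
    pairs = []
    for sent, intent in samples:
        for other_sent, other_intent in recall_top_n(sent, n):
            pairs.append((sent, other_sent, 1 if intent == other_intent else 0))
    return pairs  # m * n semantic pairs in total
```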
In this embodiment, training sample sentences are obtained; training sample text similar sentences corresponding to the training sample sentences are obtained according to the pre-trained text matching model; training sample sentence pairs are obtained according to the training sample sentences and the training sample text similar sentences; and the semantic matching model to be trained is trained based on the training sample sentence pairs to obtain a pre-trained semantic matching model. The pre-trained text matching model and the pre-trained semantic matching model are used for performing real-time matching recognition on the sentence to be recognized when the key semantic recognition result is recognition failure, so as to obtain a real-time matching recognition result.
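A possible twin (siamese) network for the semantic matching model is sketched below, assuming a TensorFlow/Keras implementation in which both sentences of a pair share one encoder; the embedding size, LSTM width, vocabulary size and the concatenation-based comparison head are illustrative choices, not parameters fixed by this embodiment.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_twin_network(vocab_size=10000, maxlen=30, embed_dim=128):
    # Shared encoder: the same weights encode both sentences of a pair.
    encoder = tf.keras.Sequential([
        layers.Embedding(vocab_size, embed_dim),
        layers.Bidirectional(layers.LSTM(64)),
    ])
    sent1 = layers.Input(shape=(maxlen,))
    sent2 = layers.Input(shape=(maxlen,))
    merged = layers.Concatenate()([encoder(sent1), encoder(sent2)])
    out = layers.Dense(1, activation="sigmoid")(merged)  # 1: same intention, 0: different
    return Model(inputs=[sent1, sent2], outputs=out)

model = build_twin_network()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit([codes1, codes2], pair_labels)  # trained on the encoded sentence pairs
```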
In one embodiment, as shown in fig. 5, there is provided a supervised classification model training method for semantic recognition, comprising the steps of:
step S510, obtaining a training sample sentence;
step S520, performing word segmentation processing on the training sample sentences to obtain word sets of the training sample sentences;
step S530, resampling is carried out on the word set to obtain resampled words;
step S540, coding the resampling words to obtain resampling word codes;
step S550, training the supervised classification model to be trained based on the resampled word codes to obtain a pre-trained supervised classification model; and the pre-trained supervised classification model is used for carrying out supervised classification recognition on the sentences to be recognized when the real-time matching recognition result is recognition failure so as to obtain a supervised classification recognition result.
In a specific implementation, the supervised classification model can be obtained by training a TextCNN model. After the server acquires a group of training sample sentences, samples whose sentence length is greater than 40 may be removed from the training sample sentences, the intention types are normalized, 80% of the training sample sentences are extracted as a training set and 20% as a test set, and word segmentation (for example, jieba word segmentation) is performed on the training sample sentences to obtain the word sets of the training sample sentences. The word sets in the training set are resampled to obtain resampled words, text coding or ID coding is performed on the resampled words to obtain resampled word codes, and the supervised classification model to be trained is then trained with the resampled word codes to obtain a pre-trained supervised classification model. The pre-trained supervised classification model is tested with the training sample sentences in the test set, and the model parameters of the pre-trained supervised classification model are adjusted according to the test result.
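A minimal TextCNN sketch for the supervised classification model is given below, assuming TensorFlow/Keras; the convolution window sizes, vocabulary size and number of intention classes are illustrative assumptions, and the data preparation (length filtering, 80/20 split, segmentation and resampling) is as described above.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_textcnn(vocab_size=10000, maxlen=40, embed_dim=128, num_classes=20):
    inputs = layers.Input(shape=(maxlen,))
    embedded = layers.Embedding(vocab_size, embed_dim)(inputs)
    # Convolutions over several window sizes, each followed by global max pooling.
    pooled = []
    for window in (2, 3, 4):
        conv = layers.Conv1D(filters=128, kernel_size=window, activation="relu")(embedded)
        pooled.append(layers.GlobalMaxPooling1D()(conv))
    features = layers.Dropout(0.5)(layers.Concatenate()(pooled))
    outputs = layers.Dense(num_classes, activation="softmax")(features)
    return Model(inputs, outputs)

model = build_textcnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_codes, train_labels, validation_data=(test_codes, test_labels))
```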
In practical application, because the proportions of samples with different intentions (or classes) are usually very different, which may interfere with the training process, a random over-sampling method may be adopted: a resampling threshold is obtained by setting a threshold coefficient, so that resampling is not performed when the sample size exceeds the resampling threshold, and resampling is performed when the sample size does not exceed the resampling threshold. For example, the threshold coefficient may be set to 1: if the sample size of an intention is greater than or equal to the sample size of the intention with the maximum sample size, no resampling is performed; if it is less, resampling with replacement is performed, and the number of samples drawn is the sample size of the intention with the maximum sample size minus the sample size of that intention. The threshold coefficient may also be set to 0.2: if the sample size of an intention (or class) is greater than or equal to the sample size of the intention with the maximum sample size × 0.2, no resampling is performed; if it is less, resampling with replacement is performed, and the number of samples drawn is the sample size of the intention with the maximum sample size × 0.2 minus the sample size of that intention.
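The thresholded random over-sampling described above can be sketched as follows; the (words, label) sample format and the default coefficient are illustrative assumptions.

```python
import random
from collections import defaultdict

def oversample(samples, coefficient=0.2, seed=0):
    """samples: list of (sentence_words, label); returns samples plus resampled copies."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    # Resampling threshold = sample size of the largest intention x threshold coefficient.
    threshold = max(len(group) for group in by_label.values()) * coefficient
    result = []
    for group in by_label.values():
        result.extend(group)
        if len(group) < threshold:                       # below the resampling threshold
            extra = int(threshold) - len(group)          # draw up to the threshold
            result.extend(rng.choices(group, k=extra))   # resampling with replacement
    return result
```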
In the embodiment, the training sample sentences are obtained, word segmentation processing is performed on the training sample sentences to obtain word sets of the training sample sentences, the word sets are resampled to obtain resampled words, the resampled words are encoded to obtain resampled word codes, the supervised classification model to be trained is trained based on the resampled word codes to obtain the pre-trained supervised classification model, the recognition accuracy of the pre-trained supervised classification model is high, and the accuracy of semantic recognition can be improved.
It should be understood that although the various steps in the flow charts of fig. 2-5 are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated otherwise herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2-5 may include multiple sub-steps or multiple stages, which are not necessarily completed at the same moment but may be performed at different moments; the order of execution of these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided a semantic recognition apparatus 600, including: an obtaining module 601, a complete matching identification module 602, a key semantic identification module 603, a real-time matching identification module 604, a supervision classification identification module 605 and an output module 606, wherein:
an obtaining module 601, configured to obtain a sentence to be identified;
a complete matching recognition module 602, configured to perform complete matching recognition on the sentence to be recognized through a complete matching model, so as to obtain a complete matching recognition result;
the key semantic recognition module 603 is configured to, when the complete matching recognition result is recognition failure, perform key semantic recognition on the to-be-recognized sentence through the key semantic model to obtain a key semantic recognition result;
the real-time matching identification module 604 is configured to, when the key semantic identification result is an identification failure, perform real-time matching identification on the sentence to be identified through the real-time matching model to obtain a real-time matching identification result;
a supervised classification recognition module 605, configured to perform supervised classification recognition on the to-be-recognized sentence through the supervised classification model when the real-time matching recognition result is recognition failure, so as to obtain a supervised classification recognition result;
and the output module 606 is used for obtaining the semantics of the sentence to be recognized according to the supervision classification recognition result.
In an embodiment, the above complete matching recognition module 602 is further configured to obtain statement templates and the template semantics corresponding to the statement templates; judge whether a target statement template matching the sentence to be recognized can be found among the statement templates; if so, obtain the semantics of the sentence to be recognized according to the template semantics of the target statement template; and if not, output the complete matching recognition result as recognition failure.
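As a minimal illustration of this complete matching step, the statement templates can be held in a dictionary keyed by template text; the example templates and the light normalisation of the input are assumptions made for illustration.

```python
# template -> template semantics (toy examples)
templates = {"我要还款": "repay", "查询账单": "query_bill"}

def full_match(sentence, templates):
    key = sentence.strip()        # assumed light normalisation of the sentence to be recognized
    return templates.get(key)     # None means the complete matching recognition failed
```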
In an embodiment, the key semantic recognition module 603 is further configured to obtain keyword logic rules and the rule semantics corresponding to the keyword logic rules; judge whether a target logic rule matching the sentence to be recognized can be found among the keyword logic rules; if so, obtain the semantics of the sentence to be recognized according to the rule semantics of the target logic rule; and if not, output the key semantic recognition result as recognition failure.
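One possible encoding of the keyword logic rules is sketched below, in which a rule matches when every keyword group contributes at least one keyword (AND across groups, OR within a group); this rule format and the example rule are assumptions for illustration, not the rule language of this application.

```python
# Each rule: (keyword group, keyword group, ..., rule semantics).
rules = [
    ({"账单", "费用"}, {"查询", "查"}, "query_bill"),
]

def keyword_match(sentence, rules):
    for *groups, semantics in rules:
        # AND across groups, OR within each group.
        if all(any(keyword in sentence for keyword in group) for group in groups):
            return semantics
    return None                    # key semantic recognition failed
```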
In an embodiment, the real-time matching recognition module 604 is further configured to determine text similar sentences corresponding to the sentence to be recognized according to the text matching model; obtain the semantic similarity between the text similar sentences and the sentence to be recognized through the semantic matching model; judge whether the semantic similarity exceeds a preset similarity threshold; if so, select a target sentence from the text similar sentences and obtain the semantics of the sentence to be recognized according to the semantics of the target sentence, the target sentence being a text similar sentence whose semantic similarity meets a preset condition; and if not, output the real-time matching recognition result as recognition failure.
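The threshold decision of this real-time matching step can be sketched as follows; recall_similar and semantic_score stand in for the text matching model and the semantic matching model, and the default similarity threshold is an assumed value.

```python
def realtime_match(sentence, recall_similar, semantic_score, threshold=0.8):
    """recall_similar(sentence) -> list of (similar_text, semantics) candidates."""
    candidates = recall_similar(sentence)
    scored = [(semantic_score(sentence, text), semantics) for text, semantics in candidates]
    if not scored:
        return None                                   # no text similar sentence recalled
    best_score, best_semantics = max(scored, key=lambda item: item[0])
    # Accept the target sentence only when its similarity exceeds the preset threshold.
    return best_semantics if best_score > threshold else None
```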
In one embodiment, the supervised classification recognition module 605 is further configured to obtain the model semantics of the supervised classification model; calculate the semantic matching degree of the sentence to be recognized with respect to the model semantics; judge whether the semantic matching degree exceeds a preset matching degree threshold; if so, select a target semantic from the model semantics and obtain the semantics of the sentence to be recognized according to the target semantic, the target semantic being a model semantic whose semantic matching degree meets a preset condition; and if not, output the supervised classification recognition result as recognition failure.
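The matching-degree decision of the supervised classification step can be sketched as follows, assuming the classifier outputs one probability per intention class; the default threshold and the id_to_semantics mapping are illustrative.

```python
import numpy as np

def classify(sentence_code, model, id_to_semantics, threshold=0.5):
    """sentence_code: one encoded sentence; model: the trained supervised classification model."""
    probs = model.predict(np.asarray([sentence_code]), verbose=0)[0]
    best = int(np.argmax(probs))
    if probs[best] <= threshold:
        return None                    # matching degree below threshold: recognition failure
    return id_to_semantics[best]       # target semantics of the sentence to be recognized
```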
For the specific definition of the semantic recognition device, reference may be made to the above definition of the semantic recognition method, which is not repeated here. Each module in the semantic recognition device can be implemented wholly or partially by software, hardware or a combination thereof. The above modules may be embedded in hardware form in, or independent of, a processor in the computer device, or may be stored in software form in a memory in the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a real-time matching model training device for semantic recognition is provided, which includes: the device comprises an acquisition module, a text similarity module, a training sample generation module and a training module, wherein:
the acquisition module is used for acquiring training sample sentences;
the text similarity module is used for obtaining training sample text similar sentences corresponding to the training sample sentences according to the pre-trained text matching model;
the training sample generation module is used for obtaining training sample sentence pairs according to the training sample sentences and the training sample text similar sentences;
the training module is used for training the semantic matching model to be trained based on the training sample sentence pair to obtain a pre-trained semantic matching model; and the pre-trained text matching model and the pre-trained semantic matching model are used for carrying out real-time matching recognition on the sentence to be recognized when the key semantic recognition result is recognition failure so as to obtain a real-time matching recognition result.
In one embodiment, a supervised classification model training apparatus for semantic recognition is provided, including: the device comprises an acquisition module, a word segmentation module, a resampling module, a coding module and a training module, wherein:
the acquisition module is used for acquiring training sample sentences;
the word segmentation module is used for carrying out word segmentation processing on the training sample sentences to obtain a word set of the training sample sentences;
the resampling module is used for resampling the word set to obtain resampled words;
the coding module is used for coding the resampling words to obtain resampling word codes;
the training module is used for training the supervised classification model to be trained based on the resampled word codes to obtain a pre-trained supervised classification model; and the pre-trained supervised classification model is used for carrying out supervised classification recognition on the sentences to be recognized when the real-time matching recognition result is recognition failure so as to obtain a supervised classification recognition result.
For the specific limitations of the real-time matching model training device and the supervised classification model training device for semantic recognition, reference may be made to the limitations of the real-time matching model training method and the supervised classification model training method above, which are not repeated here. Each module in the real-time matching model training device and the supervised classification model training device can be implemented wholly or partially by software, hardware or a combination thereof. The above modules may be embedded in hardware form in, or independent of, a processor in the computer device, or may be stored in software form in a memory in the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server and whose internal structure may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing semantic recognition data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement a semantic recognition method.
Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: obtaining a sentence to be identified; carrying out complete matching recognition on the sentence to be recognized through a complete matching model to obtain a complete matching recognition result; when the complete matching recognition result is recognition failure, performing key semantic recognition on the sentence to be recognized through the key semantic model to obtain a key semantic recognition result; when the key semantic recognition result is recognition failure, performing real-time matching recognition on the sentence to be recognized through a real-time matching model to obtain a real-time matching recognition result; when the real-time matching recognition result is recognition failure, performing supervised classification recognition on the sentences to be recognized through a supervised classification model to obtain a supervised classification recognition result; and obtaining the semantics of the sentence to be recognized according to the result of the supervised classification recognition.
In one embodiment, the processor, when executing the computer program, further performs the steps of: obtaining a statement template and template semantics corresponding to the statement template; judging whether a target statement template matched with the statement to be recognized can be found in the statement template; if so, obtaining the semantics of the sentence to be recognized according to the template semantics of the target sentence template; if not, outputting a complete matching identification result as identification failure.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a keyword logic rule and rule semantics corresponding to the keyword logic rule; judging whether a target logic rule matched with the statement to be identified can be found in the keyword logic rules or not; if so, obtaining the semantics of the sentence to be recognized according to the rule semantics of the target logic rule; if not, outputting the key semantic recognition result as recognition failure.
In one embodiment, the processor, when executing the computer program, further performs the steps of: determining text similar sentences corresponding to the sentences to be recognized according to the text matching model; obtaining semantic similarity between the text similar sentences and the sentences to be recognized through a semantic matching model; judging whether the semantic similarity exceeds a preset similarity threshold; if so, selecting a target sentence from the text similar sentences, and obtaining the semantics of the sentence to be recognized according to the semantics of the target sentence; the target sentence is a text similar sentence of which the semantic similarity accords with a preset condition; if not, outputting the real-time matching identification result as identification failure.
In one embodiment, the processor, when executing the computer program, further performs the steps of: obtaining model semantics of a supervision classification model; counting the semantic matching degree of the statement to be recognized for the model semantics; judging whether the semantic matching degree exceeds a preset matching degree threshold value or not; if so, selecting a target semantic from the model semantics, and obtaining the semantic of the sentence to be recognized according to the target semantic; the target semantics is the model semantics of which the semantic matching degree accords with the preset condition; if not, outputting the result of the supervised classification identification as the identification failure.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: obtaining a sentence to be identified; carrying out complete matching recognition on the sentence to be recognized through a complete matching model to obtain a complete matching recognition result; when the complete matching recognition result is recognition failure, performing key semantic recognition on the sentence to be recognized through the key semantic model to obtain a key semantic recognition result; when the key semantic recognition result is recognition failure, performing real-time matching recognition on the sentence to be recognized through a real-time matching model to obtain a real-time matching recognition result; when the real-time matching recognition result is recognition failure, performing supervised classification recognition on the sentences to be recognized through a supervised classification model to obtain a supervised classification recognition result; and obtaining the semantics of the sentence to be recognized according to the result of the supervised classification recognition.
In one embodiment, the computer program when executed by the processor further performs the steps of: obtaining a statement template and template semantics corresponding to the statement template; judging whether a target statement template matched with the statement to be recognized can be found in the statement template; if so, obtaining the semantics of the sentence to be recognized according to the template semantics of the target sentence template; if not, outputting a complete matching identification result as identification failure.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a keyword logic rule and rule semantics corresponding to the keyword logic rule; judging whether a target logic rule matched with the statement to be identified can be found in the keyword logic rules or not; if so, obtaining the semantics of the sentence to be recognized according to the rule semantics of the target logic rule; if not, outputting the key semantic recognition result as recognition failure.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining text similar sentences corresponding to the sentences to be recognized according to the text matching model; obtaining semantic similarity between the text similar sentences and the sentences to be recognized through a semantic matching model; judging whether the semantic similarity exceeds a preset similarity threshold; if so, selecting a target sentence from the text similar sentences, and obtaining the semantics of the sentence to be recognized according to the semantics of the target sentence; the target sentence is a text similar sentence of which the semantic similarity accords with a preset condition; if not, outputting the real-time matching identification result as identification failure.
In one embodiment, the computer program when executed by the processor further performs the steps of: obtaining model semantics of a supervision classification model; counting the semantic matching degree of the statement to be recognized for the model semantics; judging whether the semantic matching degree exceeds a preset matching degree threshold value or not; if so, selecting a target semantic from the model semantics, and obtaining the semantic of the sentence to be recognized according to the target semantic; the target semantics is the model semantics of which the semantic matching degree accords with the preset condition; if not, outputting the result of the supervised classification identification as the identification failure.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: acquiring a training sample sentence; obtaining training sample text similar sentences corresponding to the training sample sentences according to a pre-trained text matching model; obtaining training sample sentence pairs according to the training sample sentences and the training sample text similar sentences; training a semantic matching model to be trained based on the training sample sentence pair to obtain a pre-trained semantic matching model; and the pre-trained text matching model and the pre-trained semantic matching model are used for carrying out real-time matching recognition on the sentence to be recognized when the key semantic recognition result is recognition failure so as to obtain a real-time matching recognition result.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: acquiring a training sample sentence; obtaining training sample text similar sentences corresponding to the training sample sentences according to the pre-trained text matching model; obtaining training sample sentence pairs according to the training sample sentences and the training sample text similar sentences; training a semantic matching model to be trained based on the training sample sentence pair to obtain a pre-trained semantic matching model; and the pre-trained text matching model and the pre-trained semantic matching model are used for carrying out real-time matching recognition on the sentence to be recognized when the key semantic recognition result is recognition failure so as to obtain a real-time matching recognition result.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: acquiring a training sample sentence; performing word segmentation processing on the training sample sentences to obtain a word set of the training sample sentences; resampling the word set to obtain resampled words; coding the resampling words to obtain resampling word codes; training a to-be-trained supervised classification model based on the resampled word codes to obtain a pre-trained supervised classification model; and the pre-trained supervised classification model is used for carrying out supervised classification recognition on the sentences to be recognized when the real-time matching recognition result is recognition failure so as to obtain a supervised classification recognition result.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: acquiring a training sample sentence; performing word segmentation processing on the training sample sentences to obtain a word set of the training sample sentences; resampling the word set to obtain resampled words; coding the resampling words to obtain resampling word codes; training a to-be-trained supervised classification model based on the resampled word codes to obtain a pre-trained supervised classification model; and the pre-trained supervised classification model is used for carrying out supervised classification recognition on the sentences to be recognized when the real-time matching recognition result is recognition failure so as to obtain a supervised classification recognition result.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by instructing relevant hardware through a computer program, which may be stored in a non-volatile computer-readable storage medium and which, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method of semantic recognition, the method comprising:
obtaining a sentence to be identified;
carrying out complete matching recognition on the sentence to be recognized through a complete matching model to obtain a complete matching recognition result;
when the complete matching recognition result is recognition failure, performing key semantic recognition on the sentence to be recognized through a key semantic model to obtain a key semantic recognition result;
when the key semantic recognition result is recognition failure, performing real-time matching recognition on the sentence to be recognized through a real-time matching model to obtain a real-time matching recognition result;
when the real-time matching recognition result is recognition failure, performing supervised classification recognition on the sentence to be recognized through a supervised classification model to obtain a supervised classification recognition result;
and obtaining the semantics of the sentence to be recognized according to the supervision classification recognition result.
2. The method of claim 1, wherein the performing full-match recognition on the sentence to be recognized through a full-match model to obtain a full-match recognition result comprises:
obtaining a statement template and template semantics corresponding to the statement template;
judging whether a target statement template matched with the statement to be recognized can be found in the statement template;
if so, obtaining the semantics of the sentence to be recognized according to the template semantics of the target sentence template;
if not, outputting the complete matching identification result as identification failure.
3. The method according to claim 1, wherein the performing key semantic recognition on the sentence to be recognized through a key semantic model to obtain a key semantic recognition result comprises:
acquiring a keyword logic rule and rule semantics corresponding to the keyword logic rule;
judging whether a target logic rule matched with the statement to be identified can be found in the keyword logic rule;
if so, obtaining the semantics of the statement to be recognized according to the rule semantics of the target logic rule;
if not, outputting the key semantic recognition result as recognition failure.
4. The method of claim 1, wherein the real-time matching model comprises a pre-trained text matching model and a pre-trained semantic matching model; the real-time matching recognition of the sentence to be recognized through the real-time matching model to obtain a real-time matching recognition result, which comprises the following steps:
determining a text similar sentence corresponding to the sentence to be recognized according to the text matching model;
obtaining semantic similarity between the text similar statement and the statement to be recognized through the semantic matching model;
judging whether the semantic similarity exceeds a preset similarity threshold;
if so, selecting a target sentence from the text similar sentences, and obtaining the semantics of the sentence to be recognized according to the semantics of the target sentence; the target sentence is a text similar sentence of which the semantic similarity meets a preset condition;
if not, outputting the real-time matching identification result as identification failure.
5. The method according to claim 1, wherein the performing supervised classification recognition on the sentence to be recognized through a supervised classification model to obtain a supervised classification recognition result comprises:
obtaining model semantics of the supervised classification model;
counting the semantic matching degree of the statement to be recognized to the model semantics;
judging whether the semantic matching degree exceeds a preset matching degree threshold value or not;
if so, selecting a target semantic from the model semantics, and obtaining the semantic of the sentence to be recognized according to the target semantic; the target semantics are model semantics of which the semantic matching degree meets a preset condition;
if not, outputting the supervision classification recognition result as recognition failure.
6. A real-time matching model training method for semantic recognition is characterized in that the real-time matching model comprises a text matching model and a semantic matching model; the method comprises the following steps:
acquiring a training sample sentence;
obtaining training sample text similar sentences corresponding to the training sample sentences according to a pre-trained text matching model;
obtaining training sample sentence pairs according to the training sample sentences and the training sample text similar sentences;
training a semantic matching model to be trained based on the training sample sentence pair to obtain a pre-trained semantic matching model; and the pre-trained text matching model and the pre-trained semantic matching model are used for carrying out real-time matching recognition on the sentence to be recognized when the key semantic recognition result is recognition failure so as to obtain a real-time matching recognition result.
7. A supervised classification model training method for semantic recognition, the method comprising:
acquiring a training sample sentence;
performing word segmentation processing on the training sample sentences to obtain a word set of the training sample sentences;
resampling the word set to obtain resampled words;
coding the resampled words to obtain resampled word codes;
training a supervised classification model to be trained based on the resampled word codes to obtain a pre-trained supervised classification model; and the pre-trained supervised classification model is used for carrying out supervised classification recognition on the sentences to be recognized when the real-time matching recognition result is recognition failure, so as to obtain a supervised classification recognition result.
8. A semantic recognition apparatus, the apparatus comprising:
the acquisition module is used for acquiring the sentence to be identified;
the complete matching recognition module is used for carrying out complete matching recognition on the sentence to be recognized through a complete matching model to obtain a complete matching recognition result;
the key semantic recognition module is used for carrying out key semantic recognition on the sentence to be recognized through a key semantic model to obtain a key semantic recognition result when the complete matching recognition result is recognition failure;
the real-time matching identification module is used for carrying out real-time matching identification on the sentence to be identified through a real-time matching model when the key semantic identification result is identification failure so as to obtain a real-time matching identification result;
the supervised classification recognition module is used for carrying out supervised classification recognition on the sentences to be recognized through a supervised classification model to obtain a supervised classification recognition result when the real-time matching recognition result is recognition failure;
and the output module is used for obtaining the semantics of the sentence to be recognized according to the supervision classification recognition result.
9. A real-time matching model training apparatus for semantic recognition, the apparatus comprising:
the acquisition module is used for acquiring training sample sentences;
the text similarity module is used for obtaining training sample text similar sentences corresponding to the training sample sentences according to a pre-trained text matching model;
the training sample generation module is used for obtaining training sample sentence pairs according to the training sample sentences and the training sample text similar sentences;
the training module is used for training a semantic matching model to be trained based on the training sample sentence pair to obtain a pre-trained semantic matching model; and the pre-trained text matching model and the pre-trained semantic matching model are used for carrying out real-time matching recognition on the sentence to be recognized when the key semantic recognition result is recognition failure so as to obtain a real-time matching recognition result.
10. A supervised classification model training apparatus for semantic recognition, the apparatus comprising:
the acquisition module is used for acquiring training sample sentences;
the word segmentation module is used for carrying out word segmentation processing on the training sample sentences to obtain a word set of the training sample sentences;
the resampling module is used for resampling the word set to obtain resampled words;
the coding module is used for coding the resampling words to obtain resampling word codes;
the training module is used for training a supervised classification model to be trained based on the resampled word codes to obtain a pre-trained supervised classification model; and the pre-trained supervised classification model is used for carrying out supervised classification recognition on the sentences to be recognized when the real-time matching recognition result is recognition failure, so as to obtain a supervised classification recognition result.
CN202010795138.0A 2020-08-10 2020-08-10 Semantic recognition method and device, computer equipment and storage medium Pending CN112149410A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010795138.0A CN112149410A (en) 2020-08-10 2020-08-10 Semantic recognition method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010795138.0A CN112149410A (en) 2020-08-10 2020-08-10 Semantic recognition method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112149410A true CN112149410A (en) 2020-12-29

Family

ID=73887862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010795138.0A Pending CN112149410A (en) 2020-08-10 2020-08-10 Semantic recognition method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112149410A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113268578A (en) * 2021-06-24 2021-08-17 中国平安人寿保险股份有限公司 Text semantic recognition method and device, computer equipment and storage medium
CN115793923A (en) * 2023-02-09 2023-03-14 深圳市泛联信息科技有限公司 Human-computer interface motion track identification method, system, equipment and medium
CN116757203A (en) * 2023-08-16 2023-09-15 杭州北冥星火科技有限公司 Natural language matching method, device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110264439A1 (en) * 2008-02-29 2011-10-27 Ichiko Sata Information processing device, method and program
CN104516986A (en) * 2015-01-16 2015-04-15 青岛理工大学 Method and device for recognizing sentence
CN110196977A (en) * 2019-05-31 2019-09-03 广西南宁市博睿通软件技术有限公司 A kind of intelligence alert inspection processing system and method
CN110334201A (en) * 2019-07-18 2019-10-15 中国工商银行股份有限公司 A kind of intension recognizing method, apparatus and system
CN111310438A (en) * 2020-02-20 2020-06-19 齐鲁工业大学 Chinese sentence semantic intelligent matching method and device based on multi-granularity fusion model

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110264439A1 (en) * 2008-02-29 2011-10-27 Ichiko Sata Information processing device, method and program
CN104516986A (en) * 2015-01-16 2015-04-15 青岛理工大学 Method and device for recognizing sentence
CN110196977A (en) * 2019-05-31 2019-09-03 广西南宁市博睿通软件技术有限公司 A kind of intelligence alert inspection processing system and method
CN110334201A (en) * 2019-07-18 2019-10-15 中国工商银行股份有限公司 A kind of intension recognizing method, apparatus and system
CN111310438A (en) * 2020-02-20 2020-06-19 齐鲁工业大学 Chinese sentence semantic intelligent matching method and device based on multi-granularity fusion model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI ZHIXIANG ET AL.: "危机管理专题研究 (Special Topics in Crisis Management)", National Defense Industry Press, pages: 104 - 108 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113268578A (en) * 2021-06-24 2021-08-17 中国平安人寿保险股份有限公司 Text semantic recognition method and device, computer equipment and storage medium
CN113268578B (en) * 2021-06-24 2023-08-29 中国平安人寿保险股份有限公司 Text semantic recognition method and device, computer equipment and storage medium
CN115793923A (en) * 2023-02-09 2023-03-14 深圳市泛联信息科技有限公司 Human-computer interface motion track identification method, system, equipment and medium
CN116757203A (en) * 2023-08-16 2023-09-15 杭州北冥星火科技有限公司 Natural language matching method, device, computer equipment and storage medium
CN116757203B (en) * 2023-08-16 2023-11-10 杭州北冥星火科技有限公司 Natural language matching method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111753060B (en) Information retrieval method, apparatus, device and computer readable storage medium
US11017178B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
CN111985228B (en) Text keyword extraction method, text keyword extraction device, computer equipment and storage medium
CN112149410A (en) Semantic recognition method and device, computer equipment and storage medium
CN112800170A (en) Question matching method and device and question reply method and device
CN111428028A (en) Information classification method based on deep learning and related equipment
CN111078847A (en) Power consumer intention identification method and device, computer equipment and storage medium
CN112395875A (en) Keyword extraction method, device, terminal and storage medium
CN112347223B (en) Document retrieval method, apparatus, and computer-readable storage medium
CN112115232A (en) Data error correction method and device and server
CN116628186B (en) Text abstract generation method and system
CN111143507A (en) Reading understanding method based on composite problems
CN112632261A (en) Intelligent question and answer method, device, equipment and storage medium
CN114298055B (en) Retrieval method and device based on multilevel semantic matching, computer equipment and storage medium
CN111145914A (en) Method and device for determining lung cancer clinical disease library text entity
CN113821587A (en) Text relevance determination method, model training method, device and storage medium
CN117076946A (en) Short text similarity determination method, device and terminal
CN111460114A (en) Retrieval method, device, equipment and computer readable storage medium
CN112749530B (en) Text encoding method, apparatus, device and computer readable storage medium
CN112149424A (en) Semantic matching method and device, computer equipment and storage medium
CN110442759B (en) Knowledge retrieval method and system, computer equipment and readable storage medium
CN114881003A (en) Text similarity recognition method and device and application
CN112836043A (en) Long text clustering method and device based on pre-training language model
CN114548083B (en) Title generation method, device, equipment and medium
KR102474977B1 (en) Method for providing automatic answering service and system therefor

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant after: Zhaolian Consumer Finance Co.,Ltd.

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant before: MERCHANTS UNION CONSUMER FINANCE Co.,Ltd.

Country or region before: China

CB02 Change of applicant information