CN113158680B - Corpus processing and intention recognition method and device - Google Patents

Corpus processing and intention recognition method and device

Info

Publication number
CN113158680B
CN113158680B (granted publication of application CN202110304661.3A)
Authority
CN
China
Prior art keywords
intention
intent
recognition
recognized
corpus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110304661.3A
Other languages
Chinese (zh)
Other versions
CN113158680A (en)
Inventor
孙譞
詹舒波
李红玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xinfang Communication Technology Co ltd
Original Assignee
Beijing Xinfang Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xinfang Communication Technology Co ltd filed Critical Beijing Xinfang Communication Technology Co ltd
Priority to CN202110304661.3A priority Critical patent/CN113158680B/en
Publication of CN113158680A publication Critical patent/CN113158680A/en
Application granted granted Critical
Publication of CN113158680B publication Critical patent/CN113158680B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/247Thesauruses; Synonyms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Machine Translation (AREA)

Abstract

Corpus processing and intent recognition methods, systems, devices and computer readable storage media are disclosed. The corpus processing and intent recognition method comprises the following steps: obtaining corpus sample data and processing it; performing model training with at least two algorithms based on the corpus sample data to generate an intent recognition model; inputting a sentence to be recognized; performing intent recognition on the sentence to be recognized with each of the at least two algorithms of the intent recognition model, obtaining a corresponding recognition result for each algorithm; and performing evidence analysis on the recognition results of the algorithms to determine the final intent. Through this scheme of corpus processing and intent recognition, corpus features are strengthened, corpus data is completed, and the technical problems of high false recognition and missed recognition rates in the prior art are solved.

Description

Corpus processing and intention recognition method and device
Technical Field
The invention relates to the field of natural language intention recognition, in particular to a method and a device for corpus processing and intention recognition.
Background
Intent recognition is one of the most important topics in natural language processing: a technology that recognizes the true intent behind a user's input so that the processing corresponding to that intent can be carried out, and one receiving growing attention and application. In the prior art, intent recognition is generally realized with deep learning methods such as neural networks and bert; in practice, however, training a model with a high recognition rate is difficult. First, on the training-data side: 1. data is insufficient; deep learning needs a large amount of corpus with complete features and clear semantics in order to learn sufficient semantic features, and insufficient corpus prevents training a high-recognition-rate model, a key problem hindering the wide application of deep learning. 2. data is unbalanced; some classifications have much data and others little, which harms the algorithm's classification and recognition, so imbalance between classifications must be reduced as much as possible. Second, on the side of algorithm limitations: given an input sentence, the algorithm outputs a probability list in which each value is the probability that the sentence belongs to a certain class; the maximum probability is taken and compared with a set threshold, and if it exceeds the threshold a class answer is obtained, otherwise recognition fails. Recognition near the threshold line has a high error rate: raising the threshold increases missed recognition, and lowering it increases false recognition.
Above the threshold line false recognition is still possible, and below it missed recognition is possible. The prior art processes with a single algorithm, such as lstm or bert alone; different algorithms have different internal structures and different calculation methods for the data, their results are not always consistent, and their accuracy differs, a combined effect of the corpus and the algorithm itself. A single algorithm therefore has high false recognition and missed recognition rates and lacks corroborating evidence.
Disclosure of Invention
In view of the above technical problems, it has been found that although each algorithm adopts a different internal structure and different calculation methods to process the data, so that the results are not always consistent and the accuracy differs, mutual confirmation and corroboration by different methods greatly improves the certainty of intent recognition. To this end, the invention provides a method, a device and a computer readable storage medium for corpus processing and intent recognition.
In a first aspect of the present invention, a method for corpus processing and intent recognition includes:
S1: acquiring corpus sample data, and processing the corpus sample data by using a sentence pattern template;
S2: based on the corpus sample data, performing model training by adopting at least two algorithms to generate an intention recognition model;
S3: inputting a sentence to be identified;
S4: carrying out intention recognition of at least two algorithms on the statement to be recognized by utilizing the intention recognition model, and respectively obtaining a corresponding recognition result of each algorithm;
S5: and performing evidence analysis based on the corresponding identification result of each algorithm to determine the final intention.
In a second aspect of the present invention, an apparatus for corpus processing and intent recognition includes:
the sample acquisition module, used to acquire corpus sample data and process it;
the intent recognition model generation module, used to perform model training with at least two algorithms based on the corpus sample data to generate an intent recognition model;
the input module, used to input the sentence to be recognized;
the intent recognition module, used to perform intent recognition on the sentence to be recognized with each of the at least two algorithms of the intent recognition model, obtaining a corresponding recognition result for each algorithm;
and the analysis module, used to perform evidence analysis based on the recognition result of each algorithm to determine the final intent.
In a third aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, characterized in that the non-transitory computer readable storage medium stores computer instructions for causing a computer to perform any of the methods of the first aspect.
The invention discloses a method, a device and a computer readable storage medium for corpus processing and intent recognition. The method comprises the following steps: obtaining corpus sample data and processing it; performing model training with at least two algorithms based on the corpus sample data to generate an intent recognition model; inputting a sentence to be recognized; performing intent recognition on the sentence to be recognized with each of the at least two algorithms of the intent recognition model, obtaining a corresponding recognition result for each algorithm; and performing evidence analysis on the recognition results of the algorithms to determine the final intent.
In the general prior-art approach of collecting a corpus and handing it to a single algorithm, the trained model is often inaccurate, for two reasons. First, the corpus: some word and sentence patterns go unrecognized because they were never put into the corpus, or were put in but are too sparse and never reinforced, so the algorithm neither memorizes nor learns their features. Second, the algorithm itself: different algorithms process the same corpus in different ways, and their results are inconsistent. The application provides a targeted solution. Whereas current industry practice makes little use of templates and mostly collects raw sentences, the application emphasizes the idea and method of template processing and presents the templates to the algorithms: it addresses the corpus problem by strengthening the corpus with sentence pattern templates, and it addresses the algorithm problem by not trusting the result of a single algorithm but having multiple methods corroborate one another.
In this application, addressing the high false recognition and missed recognition caused in the prior art by single-algorithm prediction over a corpus of merely collected, piled-up sentences, intent measurement and recognition are performed with multiple algorithms whose results corroborate one another, greatly improving the reliability and accuracy of the final recognition result. Meanwhile, the training corpus is described with sentence pattern templates that define optional words and synonyms, expanding the corpus sentences and enriching and perfecting the semantic features, which solves the problems of insufficient and unbalanced corpus.
The foregoing is only an overview of the technical solution of the present invention. In order that the technical means of the invention may be more clearly understood and implemented in accordance with the specification, and in order that the above and other objects, features and advantages of the invention may be more readily apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method of corpus processing and intent recognition in accordance with one embodiment of the present invention;
FIG. 2 is a schematic diagram of corpus template processing and model training according to one embodiment of the invention;
FIG. 3 is a corpus structural diagram of one embodiment of the invention;
FIG. 4 is a block diagram of a computer-readable storage medium for corpus processing and intent recognition according to yet another embodiment of the present invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the invention with reference to specific examples. It will be apparent that the described embodiments are only some, not all, embodiments of the invention. The invention may also be practiced or applied through other, different embodiments, and the details of this description may be modified or varied in various ways without departing from the spirit of the invention. It should be noted that the following embodiments, and features within them, may be combined with each other without conflict. All other embodiments obtained by those skilled in the art from the embodiments of the invention without inventive effort fall within the scope of the invention.
It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should also be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present invention by way of illustration, and only the components related to the present invention are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
Fig. 1 is a flowchart of a corpus processing and intent recognition method according to an embodiment of the present invention. The method can be performed by a provided corpus processing and intention recognition device, which can be implemented as software or as a combination of software and hardware, and the corpus processing and intention recognition device can be integrated in some electronic equipment in a data processing system, such as a server or a terminal device. As shown in fig. 1, the corpus processing and intention recognition method includes the following steps:
step S1, corpus sample data are obtained, and sentence pattern templates are adopted to process the corpus sample data.
Optional words are used to expand the corpus volume and strengthen corpus features. A corpus is generally annotated by selecting sentences transcribed from recording files or by writing sentences manually, so that the corpus file contains a large number of (sentence, class name) data pairs.
The corpus is instead defined with a corpus template that specifies optional words and synonyms, from which replacement sentences are generated, strengthening the key features of each classification and expanding the sentences. The structure is shown in fig. 3.
In one embodiment, the template file may be in json format, and the same classification intent may be composed of multiple files. However, the template file format and the number of files are not limited thereto; any suitable format and number of files may be used.
In the above structure:
intent: the intent classification name;
sentence: a sentence;
option_words: which words in the sentence are optional;
synonym_words: which words in the sentence have synonyms.
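The concrete schema of fig. 3 is not reproduced in this text; as an illustration of the four fields just listed, a hypothetical json template file might look like this (the layout, key nesting and example values are all assumptions for illustration, not the patent's actual format):

```json
{
  "intent": "negative",
  "sentences": [
    {
      "sentence": "well then I will not answer",
      "option_words": ["well then"],
      "synonym_words": {
        "will not": ["cannot", "never", "must not"],
        "answer": ["accept"]
      }
    }
  ]
}
```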
According to the template definition, the sentence "the bar me will not answer" is expanded by its optional words and synonyms as follows:
I will not answer
I will not accept
I cannot answer
I cannot accept
I never answer
I never accept
I must not answer
I must not accept
I determine that I do not answer
I determine not to accept
I do not answer
I don't accept
Calculate that the bar me cannot answer
Calculating that the bar me does not accept
Calculate the impossible answer of the bar me
Calculate that I am not likely to accept
Calculate that the bar me never answers
Calculate that I never accept
Calculate that I must not answer
The bar me must not accept
Calculating that the bar me confirms that the bar me cannot answer
Calculating that bar me determines not to accept
Calculating the non-answer of the bar me
Calculate that I don't accept
The user can not answer the bar by pulling
The user can not accept the bar
The user cannot answer the bar by pulling
Not possible to accept by pulling over the bar
The user can never answer the bar by pulling the bar upside down
Draw down the bar I never accept
The user can certainly not answer the bar by pulling the bar upside down
The pull-down bar does not accept me
Pulling over the bar I determines that he will not answer
Pull down bar me determines not to accept
Draw back bar I'm answer
Pull down bar me not accepted
The bar me cannot answer
Does not accept by the bar me
Has no possibility of answering by the bar me
Not be accepted by the bar me
Has no answer to the bar me
Have no way of acceptance by the bar me
If the bar me is affirmed, the answer is not given
Do not accept if the bar me is sure
Bar me has determined that he will not answer
Bar me determines that it is not accepted
Has no answer to the bar me
Go beyond I's acceptance of
So that the bar me cannot answer
So that the bar does not accept me
Thus, the bar me cannot answer
Thus, the bar me cannot accept
So that the bar me never responds
So that the bar me never accepts
Thus, the bar I must not answer
So that the bar me must not accept
In this way, the bar me determines that the bar me cannot answer
In this way the bar me determines that it will not accept
Thus, the bar me does not answer
So that I do not accept
As can be seen, one template sentence becomes 60 sentences after expansion; the expanded sentences express the semantic features more vividly and completely, fully presenting the sentence pattern structure, synonyms and optional words to the algorithm.
Describing the training corpus with sentence pattern templates, including the descriptions of optional words and synonyms, strengthens the sentence features of each class and expands the sentences, solving the problems of insufficient and unbalanced corpus.
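The expansion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name is invented, expansion works on single whitespace-separated tokens only, each optional word may be independently dropped, and each listed synonym is independently substituted.

```python
from itertools import product

def expand_template(sentence, optional_words, synonym_words):
    """Expand one template sentence into many training sentences.

    optional_words: set of words that may be dropped from the sentence.
    synonym_words:  dict mapping a word to its list of synonyms.
    (Names mirror the patent's option_words / synonym_words fields;
    the expansion rules here are illustrative assumptions.)
    """
    choices = []
    for tok in sentence.split():
        variants = [tok]
        if tok in synonym_words:
            variants = [tok] + list(synonym_words[tok])
        if tok in optional_words:
            variants = variants + [""]  # "" means the word is omitted
        choices.append(variants)
    # Cartesian product over per-token variants, dropping omitted words.
    expanded = {" ".join(w for w in combo if w) for combo in product(*choices)}
    return sorted(expanded)
```

For example, a 5-word sentence with one optional word and two words having one synonym each expands into 2 x 2 x 2 = 8 sentences; richer templates, like the one above, reach 60.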
Step S2: based on corpus sample data, model training is carried out by adopting at least two algorithms, and an intention recognition model is generated.
In one embodiment, as shown in fig. 2, deep learning algorithms are used to train the model on the sentence-pattern-template corpus, with at least two deep learning algorithms training the intent recognition model. For example, bert and bi-lstm are current mainstream intent recognition methods with high recognition rates; the corpus expanded from the sentence pattern templates is input in turn into a bert sub-model and a bi-lstm sub-model for separate training, and the finally generated intent recognition model comprises the recognition calculations of at least two algorithms. Well-trained parameter coefficients yield high prediction accuracy. Training determines which intent classifications are defined and which corpus corresponds to each classification; whether the corpus is complete directly influences the training result, and a well-trained model has a high intent recognition rate.
The algorithms adopted in model training are not limited to mainstream intent recognition algorithms such as bert and bi-lstm; any other algorithm suitable for intent recognition, such as CNN, is applicable to the invention.
Step S3: and inputting a sentence Sen to be identified, and preprocessing.
In one embodiment, the preprocessing includes removing stop words. A stop-word dictionary is created, containing mainly adverbs, adjectives and conjunctions; the stop-word list is maintained, and during feature extraction the stop words are removed and the core words extracted.
The preprocessing is not limited to stop-word removal; it also includes any other data preprocessing, such as data normalization and sentence regularization.
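As a minimal sketch of the stop-word step, assuming whitespace tokenization and an illustrative (invented) stop-word list:

```python
# Illustrative stop-word list; a real deployment would maintain a much
# larger dictionary of adverbs, adjectives and conjunctions.
STOP_WORDS = {"well", "really", "just", "very", "um"}

def preprocess(sentence, stop_words=STOP_WORDS):
    """Lowercase, tokenize on whitespace, drop stop words, keep core words."""
    return [t for t in sentence.lower().split() if t not in stop_words]
```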
Step S4: and carrying out intention recognition of at least two algorithms on the sentence Sen to be recognized, and respectively obtaining corresponding recognition results of each algorithm.
In one embodiment, intent recognition of the sentence Sen by a plurality of algorithms comprises the following specific steps:
Step S4.1: measure intent relatedness, judging and identifying all intent classifications relevant to the input sentence.
In one embodiment, the intent relatedness measure is applied to the sentence Sen using combined techniques: for example, keyword regularization, question-type detection and/or negation judgment are used together to analyze which specific intent classes the sentence Sen is related to, and the resulting relevant intent classes are organized and stored in various forms, such as a relevant-intent list.
In one embodiment, an intent database is built, storing all intents and intent classes obtained from the training of the intent recognition model based on the corpus template.
In one embodiment, the intent template may be employed to describe, organize, and store the intents and intent categories.
In one embodiment, the intent relevance measure is performed with a relevance measure rule file that identifies all intent classifications the sentence Sen may relate to. The rule file defines the key features of each intent classification by keywords, keyword combinations, whether the sentence is a question, and so on.
For example:
<regex intent="not right" pattern="[error|mistake]" not="[no|none]"/> indicates that a sentence containing the keyword 'error' or 'mistake', while not containing 'no' or 'none', is related to the intent 'not right'.
<regex intent="query contact" pattern="[find you]" and="[what|how]" questionIntent="question"/> indicates that a sentence containing the keyword 'find you' together with 'what' or 'how', and which is a question sentence, is related to the intent 'query contact'. Question-sentence discrimination is completed by an independent model.
The intent relevance measure does not determine which intent class the sentence belongs to, but which classes the sentence Sen is related to, for use in the subsequent evidence analysis.
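Rules of the kind shown above can be evaluated with ordinary regular expressions. The following sketch is an assumption about how such a rule file might be applied, with hypothetical rule entries modeled on the two <regex .../> examples; it reports only which intents a sentence is related to, not which intent it is:

```python
import re

# Hypothetical rules modeled on the <regex .../> entries in the text.
# A sentence is *related* to an intent when `pattern` matches, any `and`
# pattern also matches, and no `not` pattern matches.
RULES = [
    {"intent": "not_right",     "pattern": r"error|mistake", "not": r"no|none"},
    {"intent": "query_contact", "pattern": r"find you",      "and": r"what|how"},
]

def relevant_intents(sentence, rules=RULES):
    """Return the list of intent classes the sentence may be related to."""
    related = []
    for rule in rules:
        if not re.search(rule["pattern"], sentence):
            continue                      # required keyword absent
        if "and" in rule and not re.search(rule["and"], sentence):
            continue                      # co-occurring keyword absent
        if "not" in rule and re.search(rule["not"], sentence):
            continue                      # excluded keyword present
        related.append(rule["intent"])
    return related
```

A question-sentence condition like questionIntent would hook in the independent question-discrimination model the text mentions; it is omitted here.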
In one embodiment, the intent classifications include four classes: affirmative, negative, request repeated explanation, and query identity.
Step S4.2: perform intent prediction on the sentence Sen based on the intent recognition model trained in step S2.
In one embodiment, the intent recognition model comprises at least two intent recognition sub-models, such as a bert sub-model and a bi-lstm sub-model; the bert sub-model and the bi-lstm sub-model are called respectively to perform intent prediction on the sentence Sen.
In one embodiment, bert sub-model intent prediction is made for the sentence Sen; for example, the predicted intent distribution is: query identity 95%, affirmative 3%, negative 1%, request repeated explanation 1%. With a threshold line of 85%, the bert sub-model predicts the intent 'query identity'. Similarly, bi-lstm sub-model intent prediction on the sentence Sen yields intent probabilities from which an intent result is predicted.
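The thresholded prediction of a single sub-model reduces to an argmax plus a threshold test. A minimal sketch (the function name and the dictionary representation of the probability list are assumptions):

```python
def predict_intent(prob_dist, threshold=0.85):
    """Return the highest-probability intent if it clears the threshold.

    prob_dist: {intent_name: probability} as output by one sub-model.
    Returns None when recognition fails (maximum below the threshold line).
    """
    intent, prob = max(prob_dist.items(), key=lambda kv: kv[1])
    return intent if prob >= threshold else None
```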
Step S5: and performing evidence analysis based on the corresponding recognition result of each algorithm, and determining the final intention of the sentence to be recognized.
In one embodiment, the intent relatedness measure of step S4.1 judges and identifies all intent classifications relevant to the sentence Sen, obtaining a first intent recognition result;
intent prediction on the sentence Sen by the bert sub-model and the bi-lstm sub-model of step S4.2 obtains a second and a third intent recognition result respectively.
In one embodiment, mutual evidence analysis is performed on the first, second and third intent recognition results to identify the final result. For example, a confident intent is one on which at least two of the three intent recognition results agree; if the highest-probability intents of the second and third results exceed the threshold, are identical, and fall within the range of intent classes determined by the first result, the mutually corroborated intent is confident.
In one embodiment, the specific process of the mutual evidence analysis over the first, second and third intent recognition results is as follows:
the first intent recognition result, obtained from the intent relatedness measure of the sentence Sen, comprises the list of intents related to the sentence, each such intent counted as 1;
for the second intent recognition result, obtained from bert sub-model intent prediction of the sentence Sen, the two highest-probability intents are taken: count 1 if the probability is above the threshold, count 0.5 if it is below the threshold but above 0.6;
for the third intent recognition result, obtained from bi-lstm sub-model intent prediction of the sentence Sen, the two highest-probability intents are likewise taken: count 1 above the threshold, count 0.5 below the threshold but above 0.6.
The intent counts are accumulated, defining:
confident intent: the maximum-count intent is unique and its count is 2 or 3;
intent requiring further resolution: two intents share the maximum count and each has a count of 2.
The 0.5 count for probabilities below the threshold but above 0.6 addresses missed recognition: for example, if the highest-probability intents of the second and third results are below the threshold but above 0.6, are identical, and fall within the range of intent classes determined by the first result, that intent has a count of 2 and is likewise a confident intent.
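The counting scheme of this embodiment can be sketched directly from the definitions above. The function names are invented and the 85% threshold is taken from the earlier example; the counts of 1 and 0.5 and the confident-intent rule follow the text:

```python
THRESHOLD = 0.85  # threshold line assumed from the earlier example

def score_votes(relevance_list, model_predictions, threshold=THRESHOLD):
    """Accumulate intent counts for the mutual-evidence analysis.

    relevance_list: intents from the relatedness measure (count 1 each).
    model_predictions: one list per sub-model of (intent, probability)
    pairs; the top two are taken, counting 1 above the threshold and
    0.5 when below the threshold but above 0.6.
    """
    counts = {}
    for intent in relevance_list:
        counts[intent] = counts.get(intent, 0) + 1
    for preds in model_predictions:
        top2 = sorted(preds, key=lambda p: p[1], reverse=True)[:2]
        for intent, prob in top2:
            if prob >= threshold:
                counts[intent] = counts.get(intent, 0) + 1
            elif prob > 0.6:
                counts[intent] = counts.get(intent, 0) + 0.5
    return counts

def decide(counts):
    """Confident intent: unique maximum count of at least 2; else None
    (a tie or weak evidence, requiring further resolution)."""
    if not counts:
        return None
    best = max(counts.values())
    winners = [i for i, c in counts.items() if c == best]
    return winners[0] if len(winners) == 1 and best >= 2 else None
```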
Step S6: when the evidence analysis determines multiple intent results, perform decision judgment processing.
In one embodiment, the evidence analysis of step S5 may find that two intents both have high certainty, for example when the highest-probability intents of the second and third results both exceed the threshold but are inconsistent, while both fall within the range of intent classes determined by the first result; one of the two must then be chosen. The decision adopts a phrase matching method: the sentences under the two candidate intents are compared pairwise against the input sentence, the sentence with the most similar matching semantics is found, and its intent is confirmed. To prevent comparing too many sentences, a word index is used to pick out only the sentences related to the input sentence. Sentence semantic comparison uses an independent model.
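The patent uses an independent model for sentence semantic comparison; as a stand-in, the tie-break can be illustrated with simple word-overlap (Jaccard) similarity between the input sentence and the candidate intents' corpus sentences. Function names and the similarity measure are assumptions:

```python
def jaccard(a, b):
    """Word-overlap similarity between two whitespace-tokenized sentences."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def tie_break(sentence, intent_corpora):
    """Choose between tied intents by matching the input sentence against
    each intent's corpus sentences and returning the intent of the most
    similar sentence (the patent uses a dedicated semantic model here)."""
    best_intent, best_score = None, -1.0
    for intent, sentences in intent_corpora.items():
        for s in sentences:
            score = jaccard(sentence, s)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent
```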
In this application, addressing the high false recognition and missed recognition caused in the prior art by single-algorithm prediction over a corpus of merely collected, piled-up sentences, intent measurement and recognition are performed with multiple algorithms whose results corroborate one another, greatly improving the reliability and accuracy of the final recognition result. Meanwhile, the training corpus is described with sentence pattern templates, the number of corpus sentences is expanded through optional words and synonyms, and the semantic features are enriched and perfected, solving the problems of insufficient and unbalanced corpus.
Yet another embodiment of the present invention provides a method flowchart for corpus processing and intention recognition. In this embodiment, mutual evidence analysis is performed based on the first, second, and third intention recognition results to determine the final result. For example, a confident intent is an intent on which at least two of the three intention recognition results agree; if the highest-probability intents of the second and third intention recognition results both exceed the threshold, are identical, and belong to the range of intent classes determined by the first intention recognition result, the mutually corroborated intent is confident.
In one embodiment, mutual evidence analysis is performed based on the first, second and third intention recognition results, and the specific process for determining the final result is as follows:
The first intention recognition result, obtained by intention-correlation measurement on the sentence Sen to be recognized, comprises a list of intents related to the sentence, each with a count of 1;
The second intention recognition result, obtained by intention prediction on the sentence Sen to be recognized with the BERT sub-model, keeps the two intents with the largest probabilities, counting 1 for an intent above the threshold and 0.5 for an intent below the threshold but above 0.6.
The third intention recognition result, obtained by intention prediction on the sentence Sen to be recognized with the BI-LSTM sub-model, keeps the two intents with the largest probabilities, counting 1 for an intent above the threshold and 0.5 for an intent below the threshold but above 0.6.
The intent counts are then accumulated, with the following definitions:
Confident intent: the intent with the largest count is unique and its count is 2 or 3;
Re-decision intent: there are two intents with the largest count and each of these two intents has a count of 2;
The count of 0.5, assigned to intents whose probability falls below the threshold but above 0.6, exists to resolve misrecognition. For example, when the highest-probability intent in the second and third intention recognition results falls below the threshold but above 0.6, is identical in both, and belongs to the range of intent classes determined by the first intention recognition result, that intent accumulates a count of 2 and is likewise treated as a confident intent.
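The count-accumulation scheme above can be sketched as follows; the concrete threshold value of 0.8 and the return convention are assumptions, the source only requiring that the threshold lie above 0.6:

```python
def mutual_evidence(first_intents, second_top2, third_top2, threshold=0.8):
    """Accumulate evidence counts across the three recognition results.

    first_intents: intent list from the correlation measurement (count 1 each).
    second_top2 / third_top2: [(intent, probability), ...] top-2 predictions
    of the BERT and BI-LSTM sub-models.
    threshold: assumed confidence threshold (must exceed 0.6).
    Returns ("confident", [intent]), ("re-decide", [intent_a, intent_b]),
    or ("unknown", []).
    """
    counts = {}
    for intent in first_intents:          # correlation result: count 1 each
        counts[intent] = counts.get(intent, 0) + 1.0
    for intent, prob in second_top2 + third_top2:
        if prob >= threshold:             # above threshold: count 1
            counts[intent] = counts.get(intent, 0) + 1.0
        elif prob > 0.6:                  # below threshold but above 0.6: count 0.5
            counts[intent] = counts.get(intent, 0) + 0.5
    if not counts:
        return "unknown", []
    top = max(counts.values())
    winners = [i for i, c in counts.items() if c == top]
    if len(winners) == 1 and top >= 2:
        return "confident", winners       # unique maximum with count 2 or 3
    if len(winners) == 2 and top == 2:
        return "re-decide", winners       # two tied intents, each count 2
    return "unknown", []
```

Note how the 0.5 rule plays out: an intent listed by the correlation measurement (1.0) and predicted by both sub-models at, say, 0.7 and 0.65 (0.5 each) reaches a count of 2 and is still confident.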
The system further comprises a decision judgment module, configured to carry out decision judgment processing when the mutual evidence analysis determines multiple intention results;
In one embodiment, the mutual evidence analysis may determine that two intents both have high certainty; for example, the highest-probability intents in the second and third intention recognition results both exceed the threshold, are inconsistent with each other, and both belong to the range of intent classes determined by the first intention recognition result. A further decision between the two is then required. The decision adopts a phrase-matching method: short sentences with the same meaning are matched among all sentences belonging to the two intents, and the intent is confirmed if a match is found. Specifically, the input sentence is compared pairwise against all sentences belonging to the two candidate intents, and the sentence with the most similar semantics is found to confirm the intent. To prevent an excessive number of sentences from being compared, a word-indexing method is adopted to pick out only the sentences related to the input sentence. Sentence semantic comparison adopts an independent model.
The general processing approach of the prior art, namely collecting a corpus and handing it to a single algorithm, yields models that are persistently inaccurate, for two reasons: first, corpus problems: some sentences are found and added to the corpus, but the corpus is not reinforced, so the algorithm does not memorize their features; second, problems of the algorithm itself. The present application provides a targeted solution. For the corpus problem, it emphasizes the idea and method of template processing, in contrast with current industry practice, where few useful templates are presented to the algorithms and most corpora are merely collected sentences, and reinforces the corpus with sentence-pattern templates. For the algorithm problem, the result of a single algorithm is not trusted; instead, multiple methods corroborate one another.
The method thus addresses the high misrecognition and missed-recognition rates that arise in the prior art, where corpora are built by stacking collected sentences and handed to a single prediction algorithm. Intention measurement and recognition are carried out with multiple algorithms whose results corroborate one another, greatly improving the reliability and accuracy of the recognition result. Meanwhile, the training corpus is described with sentence-pattern templates: the number of corpus sentences is expanded through optional words and synonyms, enriching and perfecting the semantic features and alleviating the problems of insufficient and unbalanced corpora.
Fig. 4 is a schematic diagram of a computer-readable storage medium according to an embodiment of the invention. As shown in fig. 4, a computer-readable storage medium 40 according to an embodiment of the present invention has stored thereon non-transitory computer-readable instructions 41. When the non-transitory computer-readable instructions 41 are executed by a processor, all or part of the steps of the corpus processing and intention recognition method of the embodiments of the invention described above are performed.
The computer readable medium of the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire corpus sample data and process the corpus sample data with sentence-pattern templates; perform model training with at least two algorithms based on the corpus sample data to generate an intention recognition model; receive an input sentence to be recognized; carry out intention recognition of the at least two algorithms on the sentence to be recognized with the intention recognition model, obtaining a corresponding recognition result for each algorithm; and perform mutual evidence analysis based on the recognition result corresponding to each algorithm to determine the final intention.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present invention may be implemented in software or in hardware. The name of the unit does not in any way constitute a limitation of the unit itself, for example the first acquisition unit may also be described as "unit acquiring at least two internet protocol addresses".
The above description is only illustrative of the preferred embodiments of the present invention and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to the specific combinations of technical features described above, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the disclosure; for example, solutions in which the above features are interchanged with (but not limited to) technical features disclosed herein having similar functions.

Claims (7)

1. A method for corpus processing and intent recognition, comprising:
S1: acquiring corpus sample data, and processing the corpus sample data by using a sentence pattern template;
S2: based on the corpus sample data, performing model training by adopting at least two algorithms to generate an intention recognition model;
S3: inputting a sentence to be identified;
S4: carrying out intention recognition of at least two algorithms on the statement to be recognized by utilizing the intention recognition model, and respectively obtaining a corresponding recognition result of each algorithm;
The step S4 includes the following specific steps:
measuring intention correlation, wherein intention classifications are defined by keyword, query-type and/or negative-judgment features, and all intention classifications relevant to the sentence to be recognized are judged and identified to obtain a first intention recognition result;
based on the intention recognition model trained in the step S2, performing intention prediction on the sentence Sen to be recognized to obtain the corresponding recognition result of each algorithm, wherein intention recognition is performed on the sentence to be recognized with the first sub-model to obtain a second intention recognition result, and intention recognition is performed on the sentence to be recognized with the second sub-model to obtain a third intention recognition result;
S5: based on the corresponding recognition result of each algorithm, performing mutual evidence analysis to determine the final intention;
step S5 includes performing mutual evidence analysis based on the first intention recognition result, the second intention recognition result and the third intention recognition result to determine the final intention of the sentence to be recognized;
based on the first, second and third intention recognition results, performing mutual evidence analysis, wherein the specific process for recognizing the final result is as follows:
1) The first intention recognition result obtained by carrying out intention correlation measurement on the sentence Sen to be recognized comprises a sentence-related intention list, wherein each intention count is 1;
2) The second intention recognition result, obtained by intention prediction on the sentence Sen to be recognized with the BERT sub-model, keeps the two intents with the largest probabilities, counting 1 for an intent above the threshold and 0.5 for an intent below the threshold but above 0.6;
3) The third intention recognition result, obtained by intention prediction on the sentence Sen to be recognized with the BI-LSTM sub-model, keeps the two intents with the largest probabilities, counting 1 for an intent above the threshold and 0.5 for an intent below the threshold but above 0.6;
4) Accumulating intent counts, defining:
the confident intent is defined as: the intent with the largest count is unique and its count is 2 or 3;
the re-decision intent is defined as: there are two intents with the largest count and the count of each of these two intents is 2.
2. The method according to claim 1, wherein at least two algorithms are used for model training in the step S2 to generate the intention recognition model, and the at least two algorithms used for model training are deep learning algorithms.
3. The method of claim 2, wherein the deep learning algorithm comprises BI-LSTM, BERT, or CNN.
4. The method according to claim 1, further comprising, after the mutual evidence analysis is performed in the step S5 based on the recognition result corresponding to each algorithm to determine the final intention:
S6: when the mutual evidence analysis determines multiple intention results, carrying out decision judgment processing.
5. The method of claim 1, wherein step S1 further comprises preprocessing a sentence to be recognized after the sentence to be recognized is input.
6. A device for corpus processing and intent recognition, comprising:
The sample acquisition module is used for acquiring corpus sample data and processing the corpus sample data through a sentence pattern template;
The intention recognition model generation module is used for carrying out model training by adopting at least two algorithms based on the corpus sample data to generate an intention recognition model;
the input module is used for inputting sentences to be identified;
The intention recognition module is used for carrying out intention recognition of at least two algorithms on the sentence to be recognized by utilizing the intention recognition model, respectively obtaining a corresponding recognition result of each algorithm; and is specifically used for: measuring intention correlation, wherein intention classifications are defined by keyword, query-type and/or negative-judgment features, and all intention classifications relevant to the sentence to be recognized are judged and identified to obtain a first intention recognition result;
based on the trained intention recognition model, performing intention prediction on the sentence Sen to be recognized to obtain the corresponding recognition result of each algorithm, wherein intention recognition is performed on the sentence to be recognized with the first sub-model to obtain a second intention recognition result, and intention recognition is performed on the sentence to be recognized with the second sub-model to obtain a third intention recognition result;
the analysis module is used for performing evidence analysis based on the corresponding identification result of each algorithm to determine the final intention;
the analysis module is specifically configured to perform mutual evidence analysis based on the first intention recognition result, the second intention recognition result and the third intention recognition result, and determine the final intention of the sentence to be recognized;
based on the first, second and third intention recognition results, performing mutual evidence analysis, wherein the specific process for recognizing the final result is as follows:
1) The first intention recognition result obtained by carrying out intention correlation measurement on the sentence Sen to be recognized comprises a sentence-related intention list, wherein each intention count is 1;
2) The second intention recognition result, obtained by intention prediction on the sentence Sen to be recognized with the BERT sub-model, keeps the two intents with the largest probabilities, counting 1 for an intent above the threshold and 0.5 for an intent below the threshold but above 0.6;
3) The third intention recognition result, obtained by intention prediction on the sentence Sen to be recognized with the BI-LSTM sub-model, keeps the two intents with the largest probabilities, counting 1 for an intent above the threshold and 0.5 for an intent below the threshold but above 0.6;
4) Accumulating intent counts, defining:
the confident intent is defined as: the intent with the largest count is unique and its count is 2 or 3;
the re-decision intent is defined as: there are two intents with the largest count and the count of each of these two intents is 2.
7. A computer-readable storage medium storing non-transitory computer-readable instructions that, when executed by a computer, cause the computer to perform the corpus processing and intent recognition method of any of claims 1-4.
CN202110304661.3A 2021-03-23 2021-03-23 Corpus processing and intention recognition method and device Active CN113158680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110304661.3A CN113158680B (en) 2021-03-23 2021-03-23 Corpus processing and intention recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110304661.3A CN113158680B (en) 2021-03-23 2021-03-23 Corpus processing and intention recognition method and device

Publications (2)

Publication Number Publication Date
CN113158680A CN113158680A (en) 2021-07-23
CN113158680B true CN113158680B (en) 2024-05-07

Family

ID=76887914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110304661.3A Active CN113158680B (en) 2021-03-23 2021-03-23 Corpus processing and intention recognition method and device

Country Status (1)

Country Link
CN (1) CN113158680B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710941A (en) * 2018-12-29 2019-05-03 上海点融信息科技有限责任公司 User's intension recognizing method and device based on artificial intelligence
WO2019153522A1 (en) * 2018-02-09 2019-08-15 卫盈联信息技术(深圳)有限公司 Intelligent interaction method, electronic device, and storage medium
WO2020143844A1 (en) * 2019-01-10 2020-07-16 深圳Tcl新技术有限公司 Intent analysis method and apparatus, display terminal, and computer readable storage medium
CN111563208A (en) * 2019-01-29 2020-08-21 株式会社理光 Intention identification method and device and computer readable storage medium
CN111563209A (en) * 2019-01-29 2020-08-21 株式会社理光 Intention identification method and device and computer readable storage medium
CN111581361A (en) * 2020-04-22 2020-08-25 腾讯科技(深圳)有限公司 Intention identification method and device
CN111597320A (en) * 2020-05-26 2020-08-28 成都晓多科技有限公司 Intention recognition device, method, equipment and storage medium based on hierarchical classification
CN111708873A (en) * 2020-06-15 2020-09-25 腾讯科技(深圳)有限公司 Intelligent question answering method and device, computer equipment and storage medium
CN112163074A (en) * 2020-09-11 2021-01-01 北京三快在线科技有限公司 User intention identification method and device, readable storage medium and electronic equipment
CN112364167A (en) * 2020-11-20 2021-02-12 携程计算机技术(上海)有限公司 Deep learning-based intention recognition method, system, device and storage medium
CN112395390A (en) * 2020-11-17 2021-02-23 平安科技(深圳)有限公司 Training corpus generation method of intention recognition model and related equipment thereof
CN112435657A (en) * 2019-08-26 2021-03-02 深圳市优必选科技股份有限公司 Speech recognition method, speech recognition device, computer equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104951458B (en) * 2014-03-26 2019-03-01 华为技术有限公司 Help processing method and equipment based on semantics recognition
US9858262B2 (en) * 2014-09-17 2018-01-02 International Business Machines Corporation Information handling system and computer program product for identifying verifiable statements in text

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019153522A1 (en) * 2018-02-09 2019-08-15 卫盈联信息技术(深圳)有限公司 Intelligent interaction method, electronic device, and storage medium
CN109710941A (en) * 2018-12-29 2019-05-03 上海点融信息科技有限责任公司 User's intension recognizing method and device based on artificial intelligence
WO2020143844A1 (en) * 2019-01-10 2020-07-16 深圳Tcl新技术有限公司 Intent analysis method and apparatus, display terminal, and computer readable storage medium
CN111563208A (en) * 2019-01-29 2020-08-21 株式会社理光 Intention identification method and device and computer readable storage medium
CN111563209A (en) * 2019-01-29 2020-08-21 株式会社理光 Intention identification method and device and computer readable storage medium
CN112435657A (en) * 2019-08-26 2021-03-02 深圳市优必选科技股份有限公司 Speech recognition method, speech recognition device, computer equipment and storage medium
CN111581361A (en) * 2020-04-22 2020-08-25 腾讯科技(深圳)有限公司 Intention identification method and device
CN111597320A (en) * 2020-05-26 2020-08-28 成都晓多科技有限公司 Intention recognition device, method, equipment and storage medium based on hierarchical classification
CN111708873A (en) * 2020-06-15 2020-09-25 腾讯科技(深圳)有限公司 Intelligent question answering method and device, computer equipment and storage medium
CN112163074A (en) * 2020-09-11 2021-01-01 北京三快在线科技有限公司 User intention identification method and device, readable storage medium and electronic equipment
CN112395390A (en) * 2020-11-17 2021-02-23 平安科技(深圳)有限公司 Training corpus generation method of intention recognition model and related equipment thereof
CN112364167A (en) * 2020-11-20 2021-02-12 携程计算机技术(上海)有限公司 Deep learning-based intention recognition method, system, device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
User Intention Recognition Method for Online Health Communities Based on BERT-BiGRU-Attention; Chi Haiyang; Yan Xin; Zhou Feng; Xu Guangyi; Zhang Lei; Journal of Hebei University of Science and Technology; 2020-06-15 (Issue 03); full text *
Research on the Application of Deep Learning Algorithms in Question Intent Classification; Yang Zhiming; Wang Laiqi; Wang Yong; Computer Engineering and Applications (Issue 10); full text *
A Multi-Intent Recognition Model Combining Syntactic Features and Convolutional Neural Networks; Yang Chunni; Feng Chaosheng; Journal of Computer Applications; 2018-03-20 (Issue 07); full text *

Also Published As

Publication number Publication date
CN113158680A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN110096570B (en) Intention identification method and device applied to intelligent customer service robot
CN108304372B (en) Entity extraction method and device, computer equipment and storage medium
CN109657230B (en) Named entity recognition method and device integrating word vector and part-of-speech vector
CN110188347B (en) Text-oriented method for extracting cognitive relationship between knowledge topics
US9195646B2 (en) Training data generation apparatus, characteristic expression extraction system, training data generation method, and computer-readable storage medium
CN110727779A (en) Question-answering method and system based on multi-model fusion
CN112069298A (en) Human-computer interaction method, device and medium based on semantic web and intention recognition
US20120078918A1 (en) Information Relation Generation
CN110347787B (en) Interview method and device based on AI auxiliary interview scene and terminal equipment
CN114064918A (en) Multi-modal event knowledge graph construction method
CN112732871B (en) Multi-label classification method for acquiring client intention labels through robot induction
CN109271524B (en) Entity linking method in knowledge base question-answering system
CN111222330B (en) Chinese event detection method and system
CN113593661A (en) Clinical term standardization method, device, electronic equipment and storage medium
CN115274086B (en) Intelligent diagnosis guiding method and system
CN114003682A (en) Text classification method, device, equipment and storage medium
CN116842194A (en) Electric power semantic knowledge graph system and method
Celikyilmaz et al. A graph-based semi-supervised learning for question-answering
CN115935194A (en) Visual and text cross-modal matching method based on consensus embedding space and similarity
CN111209373A (en) Sensitive text recognition method and device based on natural semantics
CN115905187B (en) Intelligent proposition system oriented to cloud computing engineering technician authentication
CN115757775B (en) Text inclusion-based trigger word-free text event detection method and system
CN113158680B (en) Corpus processing and intention recognition method and device
CN116719840A (en) Medical information pushing method based on post-medical-record structured processing
WO2023060634A1 (en) Case concatenation method and apparatus based on cross-chapter event extraction, and related component

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant