CN116468050A - Text robot training method, system, equipment and medium based on deep learning - Google Patents

Text robot training method, system, equipment and medium based on deep learning

Info

Publication number
CN116468050A
CN116468050A
Authority
CN
China
Prior art keywords
corpus
data
training
intention
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310538987.1A
Other languages
Chinese (zh)
Inventor
吴强
田凤占
江世林
周光杰
陈钰枫
徐金安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
T&i Net Communication Co ltd
Beijing Jiaotong University
Original Assignee
T&i Net Communication Co ltd
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by T&i Net Communication Co ltd, Beijing Jiaotong University filed Critical T&i Net Communication Co ltd
Priority to CN202310538987.1A priority Critical patent/CN116468050A/en
Publication of CN116468050A publication Critical patent/CN116468050A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text robot training method, system, equipment and medium based on deep learning, wherein the method comprises the following steps: sorting a preset number of seed corpora based on the original intents, and importing the seed corpora into a text robot knowledge base; identifying problem data in all intents and correcting the problem data, where all intents include the original intents and the seed corpora; performing batch expansion writing on all intents; identifying whether the expanded intents contain confusion corpus and, when they do, correcting the confusion corpus; training a text robot model through the original intents, the seed corpora and the corrected expanded intents; verifying whether the text robot model meets expectations; and, when it does not, retraining the text robot model by performing expansion writing on a single intent or adjusting or expanding a single corpus until it meets expectations. Through this processing scheme, a text robot can be trained efficiently and accurately.

Description

Text robot training method, system, equipment and medium based on deep learning
Technical Field
The invention relates to the technical field of information, in particular to a text robot training method, system, equipment and medium based on deep learning.
Background
The operation of a text robot includes building and tuning it in the cold-start stage and, after it goes online, perfecting uncovered scenes, supplementing corpus and optimizing question-and-answer effects, so as to guarantee a good generalization effect in the robot's specific service scene and achieve the expected question-answering accuracy.
Contrary to the general public's understanding of robots, a robot's effect is not already in place when the product ships; a great deal of manpower is required to build it and continuously optimize it before the expected effect is achieved. In the traditional robot operation method, robot construction, tuning and scene perfection rely entirely on the experience of a robot trainer and are completed in a purely manual fashion, such as supplementing corpus one sentence at a time, testing and verifying single question sentences, and inspecting session quality.
In recent years, many enterprises that purchased AI (Artificial Intelligence) robots have found that the robots do not reach the effect expected at the time of purchase. The essential reason is that robot operation requires substantial manpower, that the people invested must understand the enterprise's business, and that they must also deeply understand the implementation principle of the purchased robot; only when these three conditions are met can the robot be operated so that its question-answering effect meets the needs of the real business. Meeting all three conditions is quite difficult for an enterprise, which is one of the main reasons preventing robot products from being truly usable and well used in enterprises.
Therefore, the conventional text robot training method remains inconvenient and deficient in use and needs further improvement. How to create a new text robot training method has become a goal of improvement in the industry.
Disclosure of Invention
In view of the above, embodiments of the present disclosure provide a text robot training method based on deep learning, which at least partially solves the problems existing in the prior art.
In a first aspect, embodiments of the present disclosure provide a text robot training method based on deep learning, the method including the steps of:
sorting a preset number of seed corpora based on the original intents, and importing the seed corpora into a text robot knowledge base;
identifying problem data in all intents and correcting the problem data; wherein all intents include the original intent and the seed corpus;
performing batch expansion writing on all intents to obtain the expanded intents;
identifying whether the expanded intents contain confusion corpus; when confusion corpus is contained, correcting the confusion corpus to obtain the corrected expanded intents;
training a text robot model through the original intents, the seed corpora and the corrected expanded intents;
verifying whether the text robot model meets expectations; when the text robot model does not meet expectations, retraining the text robot model by performing expansion writing on a single intent or adjusting or expanding a single corpus until the text robot model meets expectations.
According to a specific implementation of an embodiment of the disclosure, the method further includes:
the text robot model meeting the expectations is released and put on line;
screening user questions that the text robot model cannot identify, and performing cluster analysis on these questions through a hierarchical clustering function to obtain a cluster analysis result;
providing suggestions according to the clustering analysis result; the suggestion includes creating a new intent based on the cluster analysis result or adding the cluster analysis result to a corpus of existing intents;
each time an intent is created or modified, problem data in all intents is identified and corrected.
According to a specific implementation of an embodiment of the present disclosure, the problem data in all intents includes: at least one of FAQ answer similarity data, confusion intention data, and confusion corpus data.
According to a specific implementation manner of the embodiment of the present disclosure, the batch expansion writing of all intents includes:
cleaning, clustering and resampling the original data to construct training data; the original data is data collected from business;
training a write-expansion model through the training data, the standard autoregressive cross entropy loss function and the label smoothing strategy, and generating sentences;
filtering the generated sentences by calculating BLEU values among the sentences;
and sequencing the filtered sentences through a preset sequencing index.
According to a specific implementation of an embodiment of the disclosure, the method further includes:
mixing a preset proper noun list with the original training data to jointly train the model, obtaining a corpus expansion model with proper noun preservation.
According to a specific implementation manner of the embodiment of the present disclosure, the preset number of seed corpora is 3-5.
In a second aspect, embodiments of the present disclosure provide a text robot training system based on deep learning, the system comprising:
the data processing module, configured to sort a preset number of seed corpora based on the original intents and import the seed corpora into the robot knowledge base; identify problem data in all intents and correct the problem data; wherein all intents include the original intents and the seed corpora;
the expansion writing module, configured to perform batch expansion writing on all intents to obtain the expanded intents; identify whether the expanded intents contain confusion corpus; when confusion corpus is contained, correct the confusion corpus to obtain the corrected expanded intents;
a verification module, configured to train a text robot model with the original intents, the seed corpora and the corrected expanded intents; verify whether the robot model meets expectations; when the robot model does not meet expectations, retrain the robot model by performing expansion writing on a single intent or adjusting or expanding a single corpus until the robot model meets expectations.
According to a specific implementation of an embodiment of the disclosure, the system further includes:
the expanding-writing model training module is configured to clean, cluster and resample the original data to construct training data; the original data is data collected from business; training a write-expansion model through the training data, the standard autoregressive cross entropy loss function and the label smoothing strategy, and generating sentences; filtering the generated sentences by calculating BLEU values among the sentences; and sequencing the filtered sentences through a preset sequencing index.
In a third aspect, embodiments of the present disclosure further provide an electronic device, including:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, which when executed by the at least one processor, cause the at least one processor to perform the deep learning based text robot training method of the first aspect or any implementation of the first aspect.
In a fourth aspect, the presently disclosed embodiments also provide a non-transitory computer-readable storage medium storing computer instructions that, when executed by at least one processor, cause the at least one processor to perform the deep learning based text robot training method of the foregoing first aspect or any implementation of the first aspect.
In a fifth aspect, embodiments of the present disclosure also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the deep learning based text robot training method of the first aspect or any implementation of the first aspect.
According to the deep learning based text robot training method provided here, various deep learning algorithms and industry-accumulated data are combined with the construction, optimization and operation of a robot. Through steps such as corpus batch expansion (a deep learning model trained on industry-accumulated data), health degree inspection (clustering and classification) and unknown question clustering (unsupervised machine learning and pre-trained similarity calculation), the operation work of the robot is simplified and its question-answering accuracy is improved. The traditional robot construction method is reduced to the level of writing seed corpus (3-5 sentences), shortening a construction effort that used to take a week or longer to 1-3 days. The health degree inspection automatically checks the health of the knowledge base, handles confusion intents and confusion corpus, and guarantees independence among intents, which is more efficient and direct than the traditional way of manually asking questions one by one to verify and tune the robot. Through unknown question clustering, knowledge not yet covered by the robot is extracted, and knowledge with high similarity in the knowledge base is recommended based on similarity calculation, quickly completing knowledge backflow and improving the coverage of the knowledge base; question-answering accuracy is thereby guaranteed from multiple directions, more efficiently than the traditional way of discovering online problems by manually screening conversation histories.
Drawings
The foregoing is merely an overview of the present invention, and the present invention is further described in detail below with reference to the accompanying drawings and detailed description.
Fig. 1 is a schematic flow chart of a text robot training method based on deep learning according to an embodiment of the disclosure;
fig. 2 is a flowchart of a text robot training method based on deep learning according to an embodiment of the disclosure;
fig. 3 is a schematic structural diagram of a text robot training system based on deep learning according to an embodiment of the disclosure; and
fig. 4 is a schematic diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Other advantages and effects of the present disclosure will become readily apparent to those skilled in the art from the following disclosure, which describes embodiments of the present disclosure by way of specific examples. It will be apparent that the described embodiments are merely some, but not all embodiments of the present disclosure. The disclosure may be embodied or practiced in other different specific embodiments, and details within the subject specification may be modified or changed from various points of view and applications without departing from the spirit of the disclosure. It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict. All other embodiments, which can be made by one of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure are intended to be within the scope of this disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
In addition, in the following description, specific details are provided in order to provide a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
The embodiment of the invention provides a text robot training method based on deep learning: original intents (also called standard questions) are preset, and several seed corpora are then manually configured for each original intent, the seed corpora being several different ways of asking that express the intent; based on the standard questions and the seed corpora, more synonymous sentences with different expressions are automatically expanded in batches (the expanded corpus); the expanded corpus is checked and corrected; and finally the text robot model is trained with the original intents (standard questions), the seed corpora and the expanded corpus. Through corpus batch expansion, health degree inspection and unknown question clustering, this solves the problem that a trainer who understands both the business and the technology must invest a great deal of energy when a conventional robot lands in an enterprise.
Fig. 1 is a schematic diagram of a text robot training method flow based on deep learning according to an embodiment of the disclosure.
Fig. 2 is a block flow diagram of a text robot training method based on deep learning corresponding to fig. 1.
As shown in fig. 1, at step S110, a preset number of seed corpora are sorted based on the original intent, and the seed corpora are imported into a text robot knowledge base.
More specifically, a robot trainer or client sorts 3 to 5 seed corpora for the original intent, and imports them into a robot (or text robot) knowledge base through Excel.
Next, the process proceeds to step S120.
At step S120, identifying problem data in all intents and correcting the problem data; wherein all intents include the original intent and the seed corpus.
In the embodiment of the invention, the problem data in all intents comprises: at least one of FAQ answer similarity data, confusion intention data, and confusion corpus data.
More specifically, through the health degree checking function, FAQ (Frequently Asked Questions) answer similarity data, confusion intents and confusion corpus are identified, and problem intents are corrected by merging and splitting. That is, if the seed corpora under two intents are highly similar, the system prompts and recommends that the implementer either merge the two intents or further modify the seed corpora so that the two intents are better distinguished; if the answers corresponding to two intents are highly similar, the system suggests merging the two intents.
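For illustration only (a sketch, not the patent's actual implementation), such a cross-intent similarity check might look as follows; the encoder name, the example intents and the 0.85 threshold are assumptions:

```python
# A minimal sketch of flagging confusable intents by seed-corpus similarity.
# Assumptions: sentence-transformers is available; the model name, the toy
# intents and the 0.85 threshold are illustrative.
from itertools import combinations

from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

intents = {
    "reset_password": ["how do I reset my password", "forgot my password"],
    "change_password": ["how do I change my password", "modify login password"],
}

for (name_a, corpus_a), (name_b, corpus_b) in combinations(intents.items(), 2):
    sims = util.cos_sim(encoder.encode(corpus_a), encoder.encode(corpus_b))
    if sims.max().item() > 0.85:
        print(f"confusable: {name_a} / {name_b} -> suggest merging or revising")
```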
Next, the process goes to step S130.
At step S130, batch write expansion is performed on all intents to obtain write expanded intents.
Through the one-key expansion function, batch expansion writing is performed on all intents, and at most 750 corpora are expanded for each FAQ and intent.
In the embodiment of the invention, the batch expansion of all intents comprises the following steps:
cleaning, clustering and resampling the original data to construct training data; the original data is data collected from business;
training a write-expansion model through the training data, the standard autoregressive cross entropy loss function and the label smoothing strategy, and generating sentences;
filtering the generated sentences by calculating BLEU values among the sentences;
and sequencing the filtered sentences through a preset sequencing index.
In an embodiment of the present invention, the method further includes:
and mixing a preset proper noun list with the original training data to jointly train the model to obtain a proper noun reserved corpus expansion model.
More specifically, what is done here is corpus paraphrasing (expansion writing), and the paraphrasing ability mainly depends on a paraphrase model. The process of obtaining the paraphrase model is as follows:
Given a sentence, the purpose of paraphrase generation is to generate paraphrased sentences that differ from the original sentence in wording or syntax while preserving the original semantics. In recent years, with the rapid development of deep neural network models, paraphrase generation has gradually shifted from traditional methods to deep neural network methods. Currently, most existing paraphrase generation models are based on the Seq2Seq architecture consisting of an encoder and a decoder. The main purpose of the encoder is to extract semantic information, i.e. a vector representation of the contextual semantics of each word. The decoder then generates a paraphrased sentence from the vectors given by the encoder: in each decoding step, the encoder output is used in a forward pass to obtain a distribution over the vocabulary, and a suitable word is sampled from that distribution. Common decoding strategies include greedy decoding, beam search decoding, nucleus sampling decoding and the like. By comparing the existing open-source generative Seq2Seq pre-trained models MASS (Masked Sequence To Sequence Pre-training), T5 (Text-To-Text Transfer Transformer) and BART (Bidirectional and Auto-Regressive Transformers), we chose the BART model as the base model for fine-tuning, based on its pre-training task.
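Although the patent itself gives no code, the fine-tuning flow detailed below (data preparation, model setup with label smoothing) could be compressed into a minimal sketch like the following; the checkpoint name "fnlp/bart-base-chinese", the single toy sentence pair and all hyper-parameters are assumptions for illustration, not the project's actual configuration:

```python
# A minimal sketch of fine-tuning BART on similar-sentence pairs.
# Assumptions: Hugging Face transformers/datasets; the checkpoint name is one
# example Chinese BART; the toy pair stands in for the real training data.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

name = "fnlp/bart-base-chinese"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

pairs = Dataset.from_dict({"src": ["怎么重置密码?"], "tgt": ["密码忘了怎么改?"]})

def preprocess(batch):
    enc = tokenizer(batch["src"], truncation=True, max_length=64)
    enc["labels"] = tokenizer(batch["tgt"], truncation=True,
                              max_length=64)["input_ids"]
    return enc

train_set = pairs.map(preprocess, batched=True, remove_columns=["src", "tgt"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="./paraphrase-bart",
                                  num_train_epochs=3,
                                  label_smoothing_factor=0.05),
    train_dataset=train_set,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```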
The specific flow of the model fine tuning and evaluation is as follows:
(1) Data preparation
Our goal is to take an original sentence as input and generate sentences with the same semantics but diverse expressions, so a large number of similar sentence pairs must be constructed as training data. The original data is data collected from business and contains a large number of similar sentences; the training data is constructed by cleaning, clustering and resampling this data.
Data cleaning: purely numeric text and part of the noise corpus are filtered out, and text containing telephone numbers, ID card numbers and email addresses is desensitized.
Data clustering: because the data is collected from different services, there are duplicate corpora and a large number of similar corpora across services. Duplicates are removed by deduplication filtering, and similar corpora are then consolidated by density clustering to form the original similar-corpus base; the density clustering is implemented by calling the Fast Clustering method in the Sentence-Transformers framework (see the sketch after the sampling step below).
Data sampling: in the original similar corpus, the number of corpora in different similar-corpus clusters varies greatly. To ensure generalization of the model, a sampling strategy is set and the data of different clusters are sampled separately, so that the numbers of similar sentence pairs formed from different clusters are distributed as evenly as possible in the training data, yielding the final training data.
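For illustration, the deduplication and density clustering steps could be sketched with the Fast Clustering utility (community_detection) in the Sentence-Transformers framework; the encoder name, example corpus and thresholds are assumptions:

```python
# A minimal sketch of deduplication plus density clustering of raw corpus.
# Assumptions: the model name, example sentences and thresholds are illustrative.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

corpus = list(dict.fromkeys([          # deduplicate while keeping order
    "how do I reset my password",
    "forgot password, how can I change it",
    "where can I download my invoice",
    "invoice download entrance",
]))

embeddings = encoder.encode(corpus, convert_to_tensor=True)
# Fast Clustering: groups sentences whose cosine similarity exceeds threshold.
clusters = util.community_detection(embeddings, threshold=0.75,
                                    min_community_size=2)
for ids in clusters:
    print([corpus[i] for i in ids])
```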
(2) Model setup
The standard autoregressive cross entropy loss function is adopted in training. To improve the generalization of the model and prevent overfitting, a label smoothing strategy is added: when calculating the loss, a smoothed label vector is used instead of the conventional one-hot encoded label vector. Let K be the total number of classes, α be the smoothing hyper-parameter (here we take 0.05), i = target denote the true target class, and i ≠ target denote any non-target class; the smoothed label vector y' is then

y'_i = 1 - α,        if i = target,
y'_i = α / (K - 1),  if i ≠ target.
Therefore, the smoothed label distribution is equivalent to adding noise to the true distribution, which prevents the model from becoming overconfident about correct labels and keeps the gap between the predicted outputs for positive and negative samples from growing too large, avoiding overfitting and improving the generalization capability of the model.
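For illustration, a minimal PyTorch sketch of this label-smoothed cross entropy, directly following the formula above (shapes and values are illustrative):

```python
# A minimal sketch of the label-smoothed cross entropy loss defined above.
# Assumptions: plain PyTorch; vocabulary size and batch shape are illustrative.
import torch

def smoothed_cross_entropy(logits, targets, alpha=0.05):
    K = logits.size(-1)
    # y'_i = 1 - alpha for i = target, alpha / (K - 1) otherwise
    smooth = torch.full_like(logits, alpha / (K - 1))
    smooth.scatter_(-1, targets.unsqueeze(-1), 1.0 - alpha)
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(smooth * log_probs).sum(dim=-1).mean()

logits = torch.randn(8, 50265)            # 8 decoding steps, vocab-size logits
targets = torch.randint(0, 50265, (8,))   # gold token ids
loss = smoothed_cross_entropy(logits, targets)
```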
During prediction generation, the model is decoded based on Top_n and Top_p sampling strategies. Top_n means that in each decoding step, the model samples from the n most probable words in the word probability distribution at the current moment and uses the sampled word as the output of the current time step; Top_p means that in each decoding step, words are selected from the distribution in descending order of probability until their cumulative probability reaches p, and probability-based sampling is then performed over this set. In actual use, different strategies can be adopted to improve the diversity of the generated sentences: increasing the values of the Top_n and Top_p sampling parameters, setting a temperature smoothing parameter, enabling random dropout during decoding, and the like.
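For illustration, a minimal decoding sketch with the Hugging Face generate API; the checkpoint path "./paraphrase-bart" is hypothetical and the sampling values are illustrative:

```python
# A minimal sketch of Top_n / Top_p sampling decoding for paraphrase generation.
# Assumptions: "./paraphrase-bart" is a hypothetical fine-tuned checkpoint;
# top_k, top_p and temperature values are illustrative.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./paraphrase-bart")
model = AutoModelForSeq2SeqLM.from_pretrained("./paraphrase-bart")

inputs = tokenizer("怎么重置密码?", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,          # sampling instead of greedy/beam decoding
    top_k=50,                # Top_n: keep the n most probable words
    top_p=0.95,              # Top_p: nucleus sampling over cumulative mass p
    temperature=1.2,         # temperature smoothing for extra diversity
    num_return_sequences=10,
    max_length=64,
)
candidates = tokenizer.batch_decode(outputs, skip_special_tokens=True)
```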
(3) Post-processing of generated corpus
Because of the randomness in decoding, sentences with high mutual similarity exist within the same batch of generated sentences, i.e. sentences share too many repeated words and do not meet the requirement of expression diversity. Therefore, the generated sentences are first filtered, mainly by calculating the BLEU values between sentences; if a BLEU value is higher than a set threshold, one of the two sentences is deleted, yielding a batch of sentences with diverse expressions.
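For illustration, this pairwise-BLEU filter could be sketched as follows, assuming the sacrebleu package; the threshold of 60 on the [0, 100] scale is illustrative:

```python
# A minimal sketch of the pairwise-BLEU deduplication filter described above.
# Assumptions: sacrebleu is available; the threshold value is illustrative.
import sacrebleu

def filter_by_bleu(sentences, threshold=60.0):
    kept = []
    for sent in sentences:
        # keep a sentence only if it is not too similar to any kept sentence
        if all(sacrebleu.sentence_bleu(sent, [k]).score < threshold
               for k in kept):
            kept.append(sent)
    return kept
```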
In practical application, in order to let a customer quickly select several usable expanded sentences, a ranking index is designed to sort the filtered sentences so that high-quality sentences are ranked first. The ranking index is as follows:
sort_metric = a * similarity(x, yi) + b * BLEU,
BLEU = C * (0.1 * bleu(x, yi))^2 * e^(-T)
where similarity(x, yi) is the semantic similarity between the generated sentence yi and the original sentence x, with value range [-1, 1]; bleu(x, yi) is the geometric mean of the 1-gram (single-word) and 2-gram (word-pair) precisions in the BLEU value of the generated sentence yi against the original sentence x, with value range [0, 100], the final BLEU term having value range [0, 1]; C and T are control parameters that mainly control the diversity of the top-ranked sentences; and a and b are weight parameters that adjust the relative importance of semantic similarity and sentence diversity in the ranking order. When computing the semantic similarity, the sentences are first encoded and the cosine similarity between them is computed to obtain similarity(x, yi); the BLEU value is computed by calling the Python sacreBLEU package.
The design of the ranking index mainly considers the customer's quality requirements for expanded sentences: a batch of sentences with similar meaning but diverse expression is desired. Whether the semantics are similar is measured by computing similarity(x, yi) of the two sentences; the higher the similarity between the generated sentence and the original sentence, the better. Expression diversity is measured by computing bleu(x, yi) of the two sentences, whose value lies in [0, 100]. Generally, the larger the bleu value, the more words the generated sentence repeats from the original sentence, i.e. the worse the expression diversity; the smaller the bleu value, the better the diversity of the generated sentence, but the more the semantic similarity tends to degrade. Ideally, therefore, high-quality corpus lies around the median of the bleu value range, i.e. the semantics remain highly similar while expression diversity is guaranteed. By adjusting the parameters in the BLEU function, the generated sentences can be steered toward the expected ideal effect.
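For illustration, a minimal sketch of this ranking index, assuming sentence-transformers for the similarity term and sacreBLEU restricted to 1-/2-grams; the parameter values a, b, C and T here are placeholders, not the tuned values selected below:

```python
# A minimal sketch of sort_metric = a * similarity(x, yi) + b * BLEU.
# Assumptions: the encoder name, example sentences and all parameter values
# are illustrative; BLEU is limited to 1-/2-grams as the text specifies.
import math

from sacrebleu.metrics import BLEU
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
bleu12 = BLEU(max_ngram_order=2, effective_order=True)

def sort_metric(x, yi, a=0.6, b=0.4, C=1.0, T=1.0):
    sim = util.cos_sim(encoder.encode(x), encoder.encode(yi)).item()
    raw = bleu12.sentence_score(yi, [x]).score         # in [0, 100]
    bleu_term = C * (0.1 * raw) ** 2 * math.exp(-T)    # the BLEU term above
    return a * sim + b * bleu_term

x = "怎么重置密码?"
candidates = ["密码忘了怎么改?", "如何重新设置登录密码?"]
ranked = sorted(candidates, key=lambda yi: sort_metric(x, yi), reverse=True)
```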
In order to select suitable parameters, a batch of sentences is generated for a given sentence and their availability is manually judged and labeled; by computing the Spearman correlation coefficient between the scores of different parameter combinations and the manual labeling results, a suitable group of parameters is selected as the final parameters of the ranking index. Once the ranking index is determined, when a number of sentences to generate is specified, the sentences extracted after ranking are guaranteed to meet the quality requirement.
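For illustration, this parameter selection step could be sketched with scipy (all numbers are made up):

```python
# A minimal sketch of choosing (a, b) by Spearman correlation with human
# usability labels. Assumptions: scipy; all values are illustrative.
from scipy.stats import spearmanr

human = [0.9, 0.2, 0.6, 0.8, 0.4]          # manual usability marks, 5 sentences
param_grid = {                             # sort_metric scores per (a, b) choice
    (0.7, 0.3): [0.80, 0.30, 0.50, 0.90, 0.40],
    (0.3, 0.7): [0.50, 0.60, 0.40, 0.70, 0.80],
}
best = max(param_grid,
           key=lambda p: spearmanr(param_grid[p], human).correlation)
print("chosen (a, b):", best)
```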
(4) Model evaluation
In order to verify the effectiveness of the generative model, experimental verification was carried out on two implemented projects. The experimental results show that when about 3 seed sentences are provided manually for each intent or standard question and the intent recognition model is trained directly with the corpus expanded by the generative model, the final intent recognition accuracy exceeds that of a model trained with manually configured corpus.
The experiments prove the usability of the corpus expansion model. In practical application, an implementer only needs to write 3 to 5 question sentences for each intent or standard question, and a large amount of usable corpus can then be expanded as training data for the intent recognition model, accelerating project implementation.
(5) Corpus expansion model with proper noun preservation
When a sentence is generated by decoding, the model generates word by word, which may mean that proper nouns in the original sentence cannot be reproduced directly by decoding. For example, in the original sentence "How does a village-through driver apply?", "village-through" is a proper noun; in the generated sentences, "village-through" may not appear intact, or the sentence may become meaningless. Therefore, in order to decode proper nouns intact, a hard copy strategy is adopted and a new model is retrained.
1) Data preparation
A proper noun table is compiled from the existing training data set, and the proper nouns in a sentence pair are simultaneously replaced with a special token [special_token], i.e. the special token appears at both the source end and the target end. A portion of training data is constructed in this way and mixed with the original training data to jointly train the model.
2) Model setup
The model is trained with the standard autoregressive loss function. During prediction generation, sentences without proper nouns are input directly into the model for decoding; for sentences containing proper nouns, the proper nouns must first be replaced with [special_token] before decoding, then the [special_token] in the generated sentences is replaced with the original proper nouns, and finally the generated sentences are post-processed to obtain corpus that meets the requirements.
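For illustration, the replace-and-restore post-processing around decoding could be sketched as follows; the placeholder token string and the one-entry noun list are assumptions:

```python
# A minimal sketch of the hard-copy strategy: mask proper nouns before
# decoding, restore them afterwards. Assumptions: the [special_token] string
# and the proper noun list are illustrative.
SPECIAL = "[special_token]"
PROPER_NOUNS = ["village-through"]      # hypothetical proper noun table

def mask_proper_nouns(sentence):
    found = [n for n in PROPER_NOUNS if n in sentence]
    for noun in found:
        sentence = sentence.replace(noun, SPECIAL)
    return sentence, found

def restore_proper_nouns(sentence, found):
    for noun in found:
        sentence = sentence.replace(SPECIAL, noun, 1)
    return sentence

masked, nouns = mask_proper_nouns("How does a village-through driver apply?")
# ... feed `masked` to the proper-noun-preserving paraphrase model here ...
generated = restore_proper_nouns(masked, nouns)   # put the nouns back intact
```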
Next, the process goes to step S140.
At step S140, identifying whether the expanded intents contain confusion corpus; when confusion corpus is contained, correcting the confusion corpus to obtain the corrected expanded intents.
Whether the expanded intents contain confusion corpus is checked again through the health degree checking function; when no confusion corpus is present, the process goes directly to step S150; when confusion corpus exists, it is corrected.
Next, the process goes to step S150.
At step S150, a text robot model is trained with the original intents, the seed corpora and the corrected expanded intents.
More specifically, after correction, the training button is clicked, and the text robot model is trained with the original intents, the seed corpora and the corrected expanded intents.
Next, the process goes to step S160.
At step S160, verifying whether the text robot model meets expectations; when the text robot model does not meet expectations, retraining the text robot model by performing expansion writing on a single intent or adjusting or expanding a single corpus until the text robot model meets expectations.
More specifically, after training is completed, whether the robot model meets expectations is verified through the debugging interface: some questions are input at random to test whether the robot model can successfully recognize the intent information contained in the user's input question and reply with the corresponding answer. For intents that do not meet expectations, adjustment or expansion is performed through expansion writing of the single intent or of a single corpus, and step S120 is then repeated.
In an embodiment of the present invention, the method further includes:
the text robot model meeting the expectations is released and put on line;
screening user questions that the text robot model cannot identify, and performing cluster analysis on these questions through a hierarchical clustering function to obtain a cluster analysis result;
providing suggestions according to the clustering analysis result; the suggestion includes creating a new intent based on the cluster analysis result or adding the cluster analysis result to a corpus of existing intents;
each time an intent is created or modified, problem data in all intents is identified and corrected.
More specifically, after the test passes, the robot model is trained again and released online. After going online, the questions that the robot cannot identify are screened out of the user questions, and cluster analysis is performed on these unknown questions through the existing hierarchical clustering function. For the clustering result, according to the degree of similarity between each cluster and the current intent clusters, the implementer is advised either to create a new intent for the cluster or to add the cluster to the corpus of a current intent, providing suggestions for supplementing and perfecting the knowledge base; step S140 is then repeated, forming a closed loop of intelligent operation.
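For illustration, this unknown-question clustering could be sketched as follows, assuming sentence-transformers embeddings with scipy hierarchical clustering; the model name, questions and distance threshold are assumptions:

```python
# A minimal sketch of hierarchical clustering of unknown user questions.
# Assumptions: sentence-transformers + scipy; all names and values illustrative.
from collections import defaultdict

from scipy.cluster.hierarchy import fcluster, linkage
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
unknown = ["how to get an invoice", "where is my invoice",
           "change delivery address", "modify my shipping address"]

embeddings = encoder.encode(unknown)                 # numpy array
Z = linkage(embeddings, method="average", metric="cosine")
labels = fcluster(Z, t=0.4, criterion="distance")    # cut at cosine distance 0.4

clusters = defaultdict(list)
for question, label in zip(unknown, labels):
    clusters[label].append(question)
for label, questions in clusters.items():
    print(label, questions)   # each cluster suggests a new or extended intent
```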
Fig. 3 shows a deep learning based text robot training system 300 provided by the present invention, including a data processing module 310, an expansion writing module 320 and a verification module 330.
The data processing module 310 is configured to sort a preset number of seed corpora based on the original intents and import the seed corpora into the robot knowledge base; identify problem data in all intents and correct the problem data; wherein all intents include the original intents and the seed corpora.
The expansion writing module 320 is configured to perform batch expansion writing on all intents to obtain the expanded intents; identify whether the expanded intents contain confusion corpus; and, when confusion corpus is contained, correct the confusion corpus to obtain the corrected expanded intents.
The verification module 330 is configured to train a text robot model with the original intents, the seed corpora and the corrected expanded intents; verify whether the robot model meets expectations; and, when the robot model does not meet expectations, retrain the robot model by performing expansion writing on a single intent or adjusting or expanding a single corpus until the robot model meets expectations.
In an embodiment of the present invention, the system further includes: the expanding-writing model training module is configured to clean, cluster and resample the original data to construct training data; the original data is data collected from business; training a write-expansion model through the training data, the standard autoregressive cross entropy loss function and the label smoothing strategy, and generating sentences; filtering the generated sentences by calculating BLEU values among the sentences; and sequencing the filtered sentences through a preset sequencing index.
Referring to fig. 4, the disclosed embodiment also provides an electronic device 40, comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the deep learning based text robot training method of the foregoing method embodiments.
The disclosed embodiments also provide a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the deep learning based text robot training method of the foregoing method embodiments.
The disclosed embodiments also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the deep learning based text robot training method of the foregoing method embodiments.
Referring now to fig. 4, a schematic diagram of an electronic device 40 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 4 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 4, the electronic device 40 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 401, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage means 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data required for the operation of the electronic device 40 are also stored. The processing device 401, the ROM 402, and the RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
In general, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; an output device 407 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 408 including, for example, magnetic tape, hard disk, etc.; and a communication device 409. The communication means 409 may allow the electronic device 40 to communicate with other devices wirelessly or by wire to exchange data. While an electronic device 40 having various means is shown in the figures, it should be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communications device 409, or from storage 408, or from ROM 402. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 401.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects an internet protocol address from the at least two internet protocol addresses and returns the internet protocol address; receiving an Internet protocol address returned by the node evaluation equipment; wherein the acquired internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The name of the unit does not in any way constitute a limitation of the unit itself, for example the first acquisition unit may also be described as "unit acquiring at least two internet protocol addresses".
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
The foregoing is merely specific embodiments of the disclosure, but the protection scope of the disclosure is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the disclosure are intended to be covered by the protection scope of the disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A text robot training method based on deep learning, the method comprising the steps of:
sorting a preset number of seed corpora based on the original intents, and importing the seed corpora into a text robot knowledge base;
identifying problem data in all intents and correcting the problem data; wherein all intents include the original intent and the seed corpus;
performing batch expansion writing on all intents to obtain the expanded intents;
identifying whether the expanded intents contain confusion corpus; when confusion corpus is contained, correcting the confusion corpus to obtain the corrected expanded intents;
training a text robot model through the original intents, the seed corpora and the corrected expanded intents;
verifying whether the text robot model meets expectations; when the text robot model does not meet expectations, retraining the text robot model by performing expansion writing on a single intent or adjusting or expanding a single corpus until the text robot model meets expectations.
2. The text robot training method based on deep learning of claim 1, further comprising:
the text robot model meeting the expectations is released and put on line;
screening user questions that the text robot model cannot identify, and performing cluster analysis on these questions through a hierarchical clustering function to obtain a cluster analysis result;
providing suggestions according to the clustering analysis result; the suggestion includes creating a new intent based on the cluster analysis result or adding the cluster analysis result to a corpus of existing intents;
each time an intent is created or modified, problem data in all intents is identified and corrected.
3. The text robot training method based on deep learning as claimed in claim 1 or 2, wherein the problem data in all intents includes: at least one of FAQ answer similarity data, confusion intention data, and confusion corpus data.
4. The text robot training method based on deep learning of claim 1 or 2, wherein the batch expansion of all intents comprises:
cleaning, clustering and resampling the original data to construct training data; the original data is data collected from business;
training a write-expansion model through the training data, the standard autoregressive cross entropy loss function and the label smoothing strategy, and generating sentences;
filtering the generated sentences by calculating BLEU values among the sentences;
and sequencing the filtered sentences through a preset sequencing index.
5. The deep learning based text robot training method of claim 4, further comprising:
and mixing a preset proper noun list with the original training data to jointly train the model to obtain a proper noun reserved corpus expansion model.
6. The text robot training method based on deep learning of claim 1, wherein the preset number of seed corpora is 3-5.
7. A text robotic training system based on deep learning, the system comprising:
the data processing module, configured to sort a preset number of seed corpora based on the original intents and import the seed corpora into the robot knowledge base; identify problem data in all intents and correct the problem data; wherein all intents include the original intents and the seed corpora;
the expansion writing module, configured to perform batch expansion writing on all intents to obtain the expanded intents; identify whether the expanded intents contain confusion corpus; when confusion corpus is contained, correct the confusion corpus to obtain the corrected expanded intents;
a verification module, configured to train a text robot model with the original intents, the seed corpora and the corrected expanded intents; verify whether the robot model meets expectations; when the robot model does not meet expectations, retrain the robot model by performing expansion writing on a single intent or adjusting or expanding a single corpus until the robot model meets expectations.
8. The deep learning based text robot training system of claim 7, further comprising:
the expanding-writing model training module is configured to clean, cluster and resample the original data to construct training data; the original data is data collected from business; training a write-expansion model through the training data, the standard autoregressive cross entropy loss function and the label smoothing strategy, and generating sentences; filtering the generated sentences by calculating BLEU values among the sentences; and sequencing the filtered sentences through a preset sequencing index.
9. An electronic device, comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, which when executed by the at least one processor, cause the at least one processor to perform the deep learning based text robot training method of any of claims 1 to 6.
10. A non-transitory computer-readable storage medium storing computer instructions that, when executed by at least one processor, cause the at least one processor to perform the deep learning based text robot training method of any of claims 1 to 6.
CN202310538987.1A 2023-05-12 2023-05-12 Text robot training method, system, equipment and medium based on deep learning Pending CN116468050A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310538987.1A CN116468050A (en) 2023-05-12 2023-05-12 Text robot training method, system, equipment and medium based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310538987.1A CN116468050A (en) 2023-05-12 2023-05-12 Text robot training method, system, equipment and medium based on deep learning

Publications (1)

Publication Number Publication Date
CN116468050A true CN116468050A (en) 2023-07-21

Family

ID=87185192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310538987.1A Pending CN116468050A (en) 2023-05-12 2023-05-12 Text robot training method, system, equipment and medium based on deep learning

Country Status (1)

Country Link
CN (1) CN116468050A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination