CN110458207A - A kind of corpus Intention Anticipation method, corpus labeling method and electronic equipment - Google Patents
A kind of corpus Intention Anticipation method, corpus labeling method and electronic equipment
- Publication number
- CN110458207A CN110458207A CN201910669701.7A CN201910669701A CN110458207A CN 110458207 A CN110458207 A CN 110458207A CN 201910669701 A CN201910669701 A CN 201910669701A CN 110458207 A CN110458207 A CN 110458207A
- Authority
- CN
- China
- Prior art keywords
- corpus
- prediction
- initial
- samples
- intention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor › G06F16/30—Information retrieval of unstructured textual data › G06F16/35—Clustering; Classification › G06F16/355—Class or cluster creation or modification
- G06F18/00—Pattern recognition › G06F18/20—Analysing › G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation › G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/00—Pattern recognition › G06F18/20—Analysing › G06F18/24—Classification techniques › G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Abstract
The present invention relates to natural language processing and provides a corpus intent prediction method comprising the steps of: training N prediction models based on preprocessed samples; predicting the corpus to be predicted with each prediction model to obtain N prediction results; and matching the N prediction results against a preset rule to determine the intent information corresponding to the corpus to be predicted, where N is an odd number greater than or equal to 3. The preset rule comprises: if an identical prediction result exists among the N prediction results and its count is greater than N/2, that prediction result is determined to be the intent information corresponding to the corpus to be predicted. The method provided by this embodiment realizes intent prediction for corpora and improves prediction accuracy, so that repetitive manual processing can be greatly reduced. In addition, the present invention also provides a corpus tagging method and an electronic device.
Description
Technical Field
The present invention relates to natural language processing technologies, and in particular, to a corpus intent prediction method, a corpus tagging method, and an electronic device.
Background
A corpus is the basic resource of corpus linguistics research and the main resource of empirical approaches to language study. Traditional corpora are mainly applied to lexicography, language teaching, traditional linguistic research, and statistics- or example-based research in natural language processing. With the development of internet big data and artificial intelligence technology, corpora have found ever wider application.
A corpus has three characteristics. First, it stores language material that actually occurred in real language use, such as user messages and customer-service dialogues collected directly from web pages. Second, a corpus is a basic resource that carries linguistic knowledge, but it is not itself equivalent to that knowledge. Third, raw corpora become useful resources only after processing, which may include removing dirty data, semantic tagging, and part-of-speech tagging. When corpora are tagged, each piece of corpus data is still labeled largely by hand, and because corpus data often contains a large amount of duplicated material, tagging the repeated corpora consumes a great deal of manpower.
Take the training corpus of an intent recognition classifier as an example: training a medical-cosmetology industry intent recognition classifier with a supervised learning algorithm requires a large amount of labeled corpora. Most of this labeling is done manually. In most cases the corpora are not processed in advance and contain a large amount of duplicated data; if the duplicates are not filtered out, labeling efficiency suffers and manpower is wasted.
Disclosure of Invention
In order to solve the above problem, an embodiment of the present invention provides a corpus intent prediction method, including: training N prediction models based on preprocessed samples; predicting the corpus to be predicted with each prediction model to obtain N prediction results; and matching the N prediction results against a preset rule to determine the intent information corresponding to the corpus to be predicted, where N is an odd number greater than or equal to 3. The preset rule comprises: if an identical prediction result exists among the N prediction results and its count is greater than N/2, determining that prediction result to be the intent information corresponding to the corpus to be predicted.
In one implementation, the sample preprocessing method comprises: collecting initial corpus data; performing intent recognition on the initial corpus data based on regular expressions; selecting N equal parts of the initial corpus data containing the target intents; and performing word segmentation and text vectorization on the N equal parts of initial corpus data to obtain N equal parts of samples.
In one implementation, performing intent recognition on the initial corpus data based on regular expressions includes: collecting intent information and the corresponding keywords; and constructing the regular expressions based on the target intents and the corresponding keywords.
In one embodiment, selecting N equal parts of the initial corpus data containing the target intents comprises: determining the target intents contained in all the initial corpus data; dividing the initial corpus data containing the same target intent into N equal parts; and selecting one part for each target intent and merging them, to obtain N equal parts of initial corpus data containing the target intents.
In one embodiment, training N prediction models based on the preprocessed samples includes: constructing N initial prediction models based on different algorithms; and training each initial prediction model on the preprocessed samples to obtain the N prediction models.
In one implementation, the method further comprises: periodically performing iterative training on each prediction model, and stopping the iterative training when the accuracy of every prediction model exceeds a preset threshold; if the count of the identical result is smaller than N/2, recording the sample and its manually identified result as an iteration sample for every prediction model; if the count is greater than N/2, recording the sample and the identical prediction result as an iteration sample for the prediction models whose predictions differed.
The corpus intent prediction method provided by the invention can therefore automatically predict corpus data and obtain the corresponding intent information, saving labor cost and improving data processing efficiency. The method predicts the corpus to be predicted with N prediction models and determines its intent information by a voting scheme over the prediction results, improving the accuracy of the result. Furthermore, in constructing the N prediction models, different algorithms are selected for the initial prediction models and the training samples are preprocessed to ensure sample balance, further improving the accuracy of the prediction results. Meanwhile, through periodic iteration, the prediction precision of the models can be continuously improved, the accuracy of the results ensured, and the method can adapt to the expanding range of corpora to be predicted.
In addition, the invention also provides a corpus tagging method, comprising the steps of: performing intent prediction on the corpus to be processed based on the corpus intent prediction method above to obtain intent information; and tagging the corpus to be processed based on the intent information, thereby providing an auxiliary reference for manual tagging.
The present invention further provides an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the corpus intent prediction method described above.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
One or more embodiments are illustrated by way of example in the accompanying drawings, which correspond to the figures in which like reference numerals refer to similar elements and which are not to scale unless otherwise specified.
FIG. 1 is a flow chart illustrating a method for predicting corpus intent according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a sample preprocessing method according to a first embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention more apparent, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the various embodiments to provide a better understanding of the present application; the technical solution claimed in the present application can nevertheless be implemented without these technical details, and with various changes and modifications based on the following embodiments.
The first embodiment of the present invention is a corpus intent prediction method, which will be described in detail with reference to the drawings.
Referring to fig. 1, fig. 1 is a flowchart illustrating a corpus intent prediction method according to a first embodiment of the present invention.
As shown in fig. 1, the corpus intent prediction method provided by the present invention includes the following steps:
Step 101: training to obtain N prediction models based on the preprocessed samples.
In the embodiment of the present invention, intent recognition is implemented mainly by a plurality of prediction models, where N is an odd number greater than or equal to 3. In the construction of the prediction models, the training samples may be acquired by the method shown in fig. 2, which is a flowchart of the sample preprocessing method in the first embodiment of the present invention.
As shown in fig. 2, the method for preprocessing the sample may comprise the following steps:
Step 201: collecting initial corpus data.
The corpus data can be obtained from the network, a service database, or other channels. Preferably, corpus data related to the application scenario is selected as the initial corpus data based on the requirements of the actual application. After the initial corpus data is obtained, it can be screened, cleaned, and otherwise processed to filter out invalid data.
And 202, performing intention identification on the initial corpus data based on the regular expression.
Since the initial corpus data may contain non-target data, that is, data not containing target intention information. Specifically, in an actual application scenario, effective intention information is limited, and what is called effective means that a machine can process the intention information, so that the intention information can be realized based on a regular expression when the intention of the initial corpus data is recognized.
The method for constructing the regular expression can comprise the following steps: and collecting intention information and corresponding keywords, and constructing the regular expression based on the target intention and the corresponding keywords.
For example, corpora containing the query-price intent may include keywords such as: expense, cost, presumably, generally, need, total, possibly, and how much. A corresponding regular expression can then be constructed along the lines of:
(expense|cost|presumably|generally|need|total|possibly).*(how much)
On this basis, the corpora in the application scenario (that is, the industry corpora) can be summarized manually or by other means to obtain the keywords corresponding to each piece of target intent information, and regular expressions for recognizing the target intent information are constructed from them. The initial corpus data is then matched against each regular expression to determine the target intent information it corresponds to. The target intents may be selected from all intents contained in the initial corpus data, or set based on actual requirements. Because the intents and keywords are collected from industry corpora, they fit the application scenario well, the target intents can be obtained quickly, and the prediction results of the trained models stay within the target range.
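As a minimal sketch of the keyword-to-regular-expression construction described above (the intent names, keywords, and function names below are illustrative assumptions, not taken from the patent):

```python
import re

# Hypothetical intent -> keyword table; real keywords would be summarized
# manually from the industry corpora, as the description suggests.
INTENT_KEYWORDS = {
    "query price": ["cost", "expense", "price", "how much"],
    "after-sale consultation": ["refund", "return", "warranty"],
}

def build_patterns(intent_keywords):
    """Compile one alternation regex per target intent from its keyword list."""
    return {
        intent: re.compile("|".join(re.escape(k) for k in kws), re.IGNORECASE)
        for intent, kws in intent_keywords.items()
    }

def recognize_intent(text, patterns):
    """Return the first target intent whose pattern matches, else None.

    None marks the corpus as invalid data containing no target intent.
    """
    for intent, pattern in patterns.items():
        if pattern.search(text):
            return intent
    return None

patterns = build_patterns(INTENT_KEYWORDS)
print(recognize_intent("How much does the treatment cost?", patterns))  # query price
```

In practice each target intent would get its own, richer expression (as in the query-price example above) rather than a bare keyword alternation.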
In other embodiments of the present invention, the regular expressions may be derived by summarizing a wider range of corpus data; the more complete the keyword collection, the higher the recognition accuracy of the regular expressions.
Step 203: selecting N equal parts of the initial corpus data containing the target intents.
Through regular-expression recognition, the target intent corresponding to each piece of initial corpus data can be determined, so the initial corpus data can be screened by target intent. The specific process may comprise: determining the target intents contained in all the initial corpus data; dividing the initial corpus data containing the same target intent into N equal parts; and selecting one part for each target intent and merging them, to obtain N equal parts of initial corpus data containing the target intents.
For example, suppose 10000 pieces of initial corpus data are recognized with the regular expressions, and it is determined that 4000 pieces contain the target intent "query price", 2000 pieces contain "discount query", 3000 pieces contain "product consultation", 400 pieces contain "after-sale consultation", and 600 pieces are invalid data, that is, data containing no target intent. If N equals 4, the data of each target intent type can be divided into 4 equal parts and then merged into 4 data sets, each comprising 1000 query-price corpora, 500 discount-query corpora, 750 product-consultation corpora, and 100 after-sale-consultation corpora, thereby obtaining 4 equal parts of initial corpus data containing the target intents. Dividing the initial corpus data equally ensures, to a certain extent, the integrity of the samples.
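The equal-part selection in this example can be sketched as follows (the helper name and the placeholder strings are hypothetical; real items would be the corpus texts themselves):

```python
def split_into_n_parts(corpus_by_intent, n):
    """Stratified N-way split: cut each intent's corpora into n equal parts,
    then merge part i of every intent into sample set i (remainders dropped)."""
    parts = [[] for _ in range(n)]
    for items in corpus_by_intent.values():
        size = len(items) // n
        for i in range(n):
            parts[i].extend(items[i * size:(i + 1) * size])
    return parts

# Counts from the example above: 4000 / 2000 / 3000 / 400 valid corpora.
corpus = {
    "query price": ["p%d" % i for i in range(4000)],
    "discount query": ["d%d" % i for i in range(2000)],
    "product consultation": ["c%d" % i for i in range(3000)],
    "after-sale consultation": ["a%d" % i for i in range(400)],
}
parts = split_into_n_parts(corpus, 4)
print(len(parts[0]))  # 1000 + 500 + 750 + 100 = 2350
```

Each of the four resulting parts keeps the same intent proportions, which is the sample-balance property the description relies on.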
Step 204: performing word segmentation on the N equal parts of initial corpus data, followed by text vectorization, to obtain N equal parts of samples.
After the N equal parts of initial corpus data are obtained, word segmentation and text vectorization are performed on each part, yielding N equal parts of samples for training the prediction models.
By the method described in the above steps 201 to 204, the training sample can be preprocessed, so that the effectiveness and integrity of the sample can be improved.
After obtaining N equal portions of the preprocessed samples, the prediction model may be trained based on the samples, which specifically includes:
first, N initial prediction models are constructed based on different algorithms.
The initial prediction models may be constructed based on binary-classification, multi-class, or deep learning algorithms, including: naive Bayes, support vector machines, random forests, XGBoost, convolutional neural networks, and so on. The specific choice of algorithm can be based on actual requirements; the embodiment of the invention imposes no limitation.
Then, training each initial prediction model based on the preprocessed samples respectively to obtain the N prediction models.
Specifically, each of the N equal parts of samples can be used to train one initial prediction model; that is, the training samples used by the initial prediction models differ, but their number and the target intents they contain are consistent. The specific training method may be any existing model training method selected as needed; the embodiment of the present invention imposes no limitation.
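Under the assumption that scikit-learn stands in for the unspecified algorithm implementations, constructing and training several differently-built models might be sketched as follows (the texts, labels, and the particular choice of three algorithms are illustrative, and for brevity all models share one tiny training set rather than one equal part each):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier

texts = ["how much does it cost", "what is the total price",
         "can i get a refund", "how do i return this item"]
labels = ["query price", "query price",
          "after-sale consultation", "after-sale consultation"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# N = 3 models, one per algorithm; in the patent's scheme each model would
# instead be trained on its own equal part of the preprocessed samples.
models = [MultinomialNB(), LinearSVC(), RandomForestClassifier(random_state=0)]
for model in models:
    model.fit(X, labels)

query = vectorizer.transform(["what is the total cost"])
predictions = [m.predict(query)[0] for m in models]
print(predictions)
```

The N per-model prediction results collected this way are what the preset voting rule of step 103 operates on.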
By the method, N prediction models can be obtained by training based on the preprocessed samples.
Step 102: predicting the corpus to be predicted with each prediction model to obtain N prediction results.
When the corpus to be predicted is to be predicted, it can be predicted with each of the N prediction models respectively, thereby obtaining N prediction results.
Step 103: matching the N prediction results against the preset rule and determining the intent information corresponding to the corpus to be predicted.
It can be understood that, since the N prediction models are constructed with different algorithms and trained on different samples, their prediction accuracy may differ to some extent, and the N prediction results may therefore differ (they may be partly different, or all the same). To improve the accuracy of the result, a voting scheme can be adopted to determine the intent information corresponding to the corpus to be predicted.
Specifically, the preset rule may comprise: if an identical prediction result exists among the N prediction results and its count is greater than N/2, determining that prediction result to be the intent information corresponding to the corpus to be predicted.
That is, a voting scheme determines the intent information corresponding to the corpus to be predicted from the N prediction results. To further ensure the accuracy of the result, a threshold can be set for deciding whether a correct intent exists among the N prediction results. In this embodiment the threshold is N/2: only when more than half of the prediction results are identical is the intent information of the corpus to be predicted determined; if no result occurs more than half of the time, the correct intent information cannot be determined and the prediction is considered invalid.
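The preset voting rule can be sketched as follows (the function name is illustrative; returning None stands for "prediction invalid"):

```python
from collections import Counter

def vote(predictions):
    """Return the majority prediction if its count exceeds N/2, else None."""
    n = len(predictions)
    label, count = Counter(predictions).most_common(1)[0]
    return label if count > n / 2 else None

print(vote(["query price", "query price", "discount query"]))  # 2 > 3/2 -> query price
print(vote(["query price", "discount query", "product consultation"]))  # no majority -> None
```

Because N is odd, a count can never equal exactly N/2, so the strict "greater than" comparison cleanly separates majority from no-majority cases.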
In the embodiment of the present invention, the prediction result for each corpus to be predicted can be recorded for subsequent model iteration.
Specifically, since the N prediction models are trained on only the initial samples before being put into use, there is considerable room to improve their prediction accuracy. To further ensure the accuracy of the prediction results, iterative training may be performed periodically on each prediction model. The samples used for each round of iterative training may include the samples used for the initial training as well as records derived from prediction.
Specifically, for a given prediction result: if the count of the identical result is smaller than N/2, the sample and its manually identified result are recorded as an iteration sample for every prediction model; if the count is greater than N/2, the sample and the identical prediction result are recorded as iteration samples for the prediction models whose predictions differed.
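The recording rule in the preceding paragraph can be sketched as follows (the function name and the returned mapping shape are assumptions made for illustration):

```python
from collections import Counter

def iteration_records(sample, predictions, manual_label=None):
    """Map model index -> (sample, correct label) for the next iteration round.

    No majority (agreement below N/2): every model receives the manually
    identified label. Majority exists: only the dissenting models receive
    the majority label as their iteration sample.
    """
    n = len(predictions)
    label, count = Counter(predictions).most_common(1)[0]
    if count > n / 2:
        return {i: (sample, label)
                for i, p in enumerate(predictions) if p != label}
    return {i: (sample, manual_label) for i in range(n)}

print(iteration_records("how much is it", ["price", "price", "refund"]))
```

In the majority case the models that voted with the majority get no new sample, which keeps the iteration corpus focused on each model's actual mistakes.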
Iterative training is then performed on each prediction model with these samples; when the accuracy of every prediction model exceeds a preset threshold, the iterative training can be stopped and a new round of corpus prediction begins. It can be understood that a new round of corpus prediction again yields iteration samples for the next round of iterative training, and the addition of new types of corpora to be predicted can, to a certain extent, expand the prediction range of the prediction models.
The setting of the iteration cycle may be based on a fixed time period, or may be determined based on an actual data amount or the number of iteration samples obtained from the prediction result.
If, after several rounds of iterative training, the prediction results of the prediction models are essentially consistent, the iteration can be stopped.
With the corpus intent prediction method described above, the corpus to be predicted can be predicted to obtain the corresponding intent information, saving the cost of manual identification and improving data processing efficiency.
Based on the same inventive concept, a second embodiment of the present invention provides a corpus tagging method. The method may specifically comprise:
First, intent recognition is performed on the corpus to be processed to obtain the corresponding intent information; for the specific method, refer to the corpus intent prediction method provided in the embodiment of fig. 1, which is not repeated here.
Then, the corpus to be processed is tagged based on the obtained intent information.
The method provided by this embodiment realizes automatic recognition and automatic tagging of corpus intents, so that tagged corpus data can be obtained and used directly in other application scenarios, or used as a reference that speeds up manual tagging.
Another embodiment of the invention relates to an electronic device comprising at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the corpus intent prediction method in the embodiment shown in fig. 1.
Where the memory and processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting together one or more of the various circuits of the processor and the memory. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor.
The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory may be used to store data used by the processor in performing operations.
Yet another embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program, when executed by a processor, implements the above-described method embodiments.
Those skilled in the art can understand that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions enabling a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (8)
1. A method for predicting corpus intent, the method comprising the steps of:
training to obtain N prediction models based on the preprocessed samples;
predicting the corpus to be predicted respectively based on each prediction model to obtain N prediction results;
matching a preset rule based on the N prediction results, and determining intent information corresponding to the corpus to be predicted;
wherein N is an odd number of 3 or more;
the preset rule comprises:
and if the same prediction results exist in the N prediction results and the same number is larger than N/2, determining that the same prediction results are the intention information corresponding to the corpus to be predicted.
2. The method of claim 1, wherein the sample is pre-processed by a method comprising:
collecting initial corpus data;
performing intention recognition on the initial corpus data based on a regular expression;
selecting N equal parts of the initial corpus data containing the target intention;
and performing word segmentation on the N equal parts of initial corpus data, and performing text vectorization to obtain N equal parts of samples.
3. The method according to claim 2, wherein the method for performing intent recognition on the initial corpus data based on regular expressions comprises:
collecting intention information and corresponding keywords;
and constructing the regular expression based on the target intention and the corresponding key words.
4. The method of claim 2, wherein said selecting N equal portions of said initial corpus data containing a target intent comprises:
determining the target intention contained in all the initial corpus data;
and respectively dividing the initial corpus data containing the same target intention into N equal parts, and respectively selecting one part from the initial corpus data containing different target intentions to merge to obtain the N equal parts of the initial corpus data containing the target intention.
5. The method of claim 1, wherein the training of the N predictive models based on the preprocessed samples comprises:
constructing N initial prediction models based on different algorithms;
and training each initial prediction model based on the preprocessed samples to obtain the N prediction models.
6. The method of claim 1, further comprising the steps of:
periodically carrying out iterative training on each prediction model;
stopping the iterative training when the accuracy of each prediction model exceeds a preset threshold;
if the number of identical results is smaller than N/2, recording the samples and the corresponding manual-identification results as iterative samples for each prediction model;
if the number of identical results is larger than N/2, recording the sample and the identical prediction result as iterative samples for the prediction models whose predictions differ.
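For illustration only: the two routing branches of claim 6 can be sketched as follows — when a majority exists, only the dissenting models receive the sample (labeled with the majority result); otherwise every model receives it with a manual label. All names are assumptions.

```python
from collections import Counter

def collect_iteration_samples(sample, predictions, manual_label=None):
    """Route one sample into per-model retraining sets per claim 6.

    Returns {model_index: (sample, label)} mapping each model that
    should retrain on this sample to the label it should use.
    """
    n = len(predictions)
    label, count = Counter(predictions).most_common(1)[0]
    if count > n / 2:
        # majority exists: only models whose prediction differed retrain
        return {i: (sample, label)
                for i, pred in enumerate(predictions) if pred != label}
    # no majority: every model retrains on the manually identified label
    return {i: (sample, manual_label) for i in range(n)}
```

This concentrates new training data on exactly the models that were wrong, so periodic retraining corrects disagreement without re-labeling the whole corpus.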
7. A corpus tagging method, comprising the steps of:
performing intention prediction on a corpus to be processed by using the corpus intention prediction method according to any one of claims 1 to 6, to obtain the intention information;
and labeling the corpus to be processed based on the intention information.
8. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the corpus intent prediction method according to any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910669701.7A CN110458207A (en) | 2019-07-24 | 2019-07-24 | A kind of corpus Intention Anticipation method, corpus labeling method and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110458207A true CN110458207A (en) | 2019-11-15 |
Family
ID=68483217
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910669701.7A Pending CN110458207A (en) | 2019-07-24 | 2019-07-24 | A kind of corpus Intention Anticipation method, corpus labeling method and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110458207A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109522556A (en) * | 2018-11-16 | 2019-03-26 | 北京九狐时代智能科技有限公司 | A kind of intension recognizing method and device |
CN109871543A (en) * | 2019-03-12 | 2019-06-11 | 广东小天才科技有限公司 | Intention acquisition method and system |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11537660B2 (en) | 2020-06-18 | 2022-12-27 | International Business Machines Corporation | Targeted partial re-enrichment of a corpus based on NLP model enhancements |
CN111914936A (en) * | 2020-08-05 | 2020-11-10 | 平安科技(深圳)有限公司 | Data feature enhancement method and device for corpus data and computer equipment |
CN111914936B (en) * | 2020-08-05 | 2023-05-09 | 平安科技(深圳)有限公司 | Data characteristic enhancement method and device for corpus data and computer equipment |
CN112069786A (en) * | 2020-08-25 | 2020-12-11 | 北京字节跳动网络技术有限公司 | Text information processing method and device, electronic equipment and medium |
WO2022183547A1 (en) * | 2021-03-03 | 2022-09-09 | 平安科技(深圳)有限公司 | Corpus intention recognition method and apparatus, storage medium, and computer device |
CN113742399A (en) * | 2021-09-07 | 2021-12-03 | 天之翼(苏州)科技有限公司 | Data tracing method and system based on cloud edge cooperation |
CN113742399B (en) * | 2021-09-07 | 2023-10-17 | 天之翼(苏州)科技有限公司 | Cloud edge collaboration-based data tracing method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110458207A (en) | A kind of corpus Intention Anticipation method, corpus labeling method and electronic equipment | |
CN106874292B (en) | Topic processing method and device | |
CN108875059B (en) | Method and device for generating document tag, electronic equipment and storage medium | |
CN109086265B (en) | Semantic training method and multi-semantic word disambiguation method in short text | |
CN110516063A (en) | A kind of update method of service system, electronic equipment and readable storage medium storing program for executing | |
CN110019668A (en) | A kind of text searching method and device | |
CN104111925B (en) | Item recommendation method and device | |
CN112464656A (en) | Keyword extraction method and device, electronic equipment and storage medium | |
CN109947902A (en) | A kind of data query method, apparatus and readable medium | |
CN104298776A (en) | LDA model-based search engine result optimization system | |
CN111401065A (en) | Entity identification method, device, equipment and storage medium | |
US11594054B2 (en) | Document lineage management system | |
CN115526171A (en) | Intention identification method, device, equipment and computer readable storage medium | |
Chen et al. | Improved Naive Bayes with optimal correlation factor for text classification | |
CN110019670A (en) | A kind of text searching method and device | |
CN114676346A (en) | News event processing method and device, computer equipment and storage medium | |
CN114282513A (en) | Text semantic similarity matching method and system, intelligent terminal and storage medium | |
US20230142351A1 (en) | Methods and systems for searching and retrieving information | |
AU2019290658B2 (en) | Systems and methods for identifying and linking events in structured proceedings | |
CN116933782A (en) | E-commerce text keyword extraction processing method and system | |
CN115328945A (en) | Data asset retrieval method, electronic device and computer-readable storage medium | |
Shahade et al. | Deep learning approach-based hybrid fine-tuned Smith algorithm with Adam optimiser for multilingual opinion mining | |
Patel et al. | Optimized Text Summarization Using Abstraction and Extraction | |
CN118094019B (en) | Text associated content recommendation method and device and electronic equipment | |
Andersen et al. | More Sustainable Text Classification via Uncertainty Sampling and a Human-in-the-Loop |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191115 |