CN111241258A - Data cleaning method and device, computer equipment and readable storage medium - Google Patents
Data cleaning method and device, computer equipment and readable storage medium
- Publication number: CN111241258A
- Application number: CN202010016777.2A
- Authority: CN (China)
- Prior art keywords: knowledge point, feature vector, point problem, vector sequence, sub
- Prior art date: 2020-01-08
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/3329—Natural language query formulation or dialogue systems
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F16/35—Clustering; Classification (information retrieval of unstructured textual data)
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
Abstract
The embodiment of the invention provides a data cleaning method and device, computer equipment and a readable storage medium. The method comprises the following steps: acquiring data to be cleaned, and, for each knowledge point, forming a knowledge point pair from the main knowledge point problem and each sub knowledge point problem; for each knowledge point pair, inputting the main knowledge point problem and the sub knowledge point problem into a bert pre-training model, and outputting the context semantic feature vector sequence of the main knowledge point problem and the context semantic feature vector sequence of the sub knowledge point problem; for each knowledge point pair, inputting the two context semantic feature vector sequences into an attention mechanism model, and outputting a semantic matching value of the knowledge point pair according to the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem; and determining whether the knowledge point pair is dirty data according to the semantic matching degree.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a data cleaning method, a data cleaning device, computer equipment and a readable storage medium.
Background
With the rapid development of the internet industry in recent years, more and more internet platforms have begun to use intelligent question-answering systems to serve online customer consultation. However, the effectiveness of an intelligent question-answering system depends on its underlying knowledge base, which stores knowledge points sorted out from online real-time data. The knowledge base contains a plurality of knowledge points, and each knowledge point comprises a plurality of sub knowledge points and corresponding answers. The intelligent question-answering system outputs the best answer from the knowledge base according to the customer's question.
Therefore, for the intelligent question-answering system, the accuracy of the knowledge base determines its application effect. Meanwhile, dirty data exists in a knowledge base compiled from online data, so verifying the knowledge points of the knowledge base and ensuring the correctness of the answers is very important. Data detection and cleaning are therefore needed for the knowledge base.
At present, data detection and cleaning methods have insufficient text semantic understanding capability, which affects the accuracy of data detection and cleaning.
Disclosure of Invention
The embodiment of the invention provides a data cleaning method, which aims to solve the technical problem of low accuracy in data cleaning in the prior art. The method comprises the following steps:
acquiring data to be cleaned, and aiming at each knowledge point, respectively forming a knowledge point pair by the main knowledge point problem and each sub knowledge point problem;
aiming at each knowledge point pair, respectively inputting a main knowledge point problem and a sub knowledge point problem into a bert pre-training model, outputting a context semantic feature vector sequence of the main knowledge point problem, and outputting a context semantic feature vector sequence of the sub knowledge point problem, wherein the bert pre-training model is obtained by training based on sample data of data to be cleaned;
aiming at each knowledge point pair, inputting a context semantic feature vector sequence of a main knowledge point problem and a context semantic feature vector sequence of a sub knowledge point problem into an attention mechanism model, and outputting a semantic matching value of the knowledge point pair according to the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem, wherein the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem comprise feature information representing the global importance degree of a feature vector;
and determining whether the knowledge point pair is dirty data or not according to the semantic matching degree.
The embodiment of the invention also provides a data cleaning device, which is used for solving the technical problem of low accuracy in data cleaning in the prior art. The device includes:
the data acquisition module is used for acquiring data to be cleaned and respectively forming knowledge point pairs by the main knowledge point problem and each sub knowledge point problem aiming at each knowledge point;
the vector extraction module is used for respectively inputting the main knowledge point problem and the sub knowledge point problem into a bert pre-training model aiming at each knowledge point pair, outputting a context semantic feature vector sequence of the main knowledge point problem and outputting a context semantic feature vector sequence of the sub knowledge point problem, wherein the bert pre-training model is obtained by training based on sample data of data to be cleaned;
the semantic matching degree calculation module is used for inputting the context semantic feature vector sequence of the main knowledge point problem and the context semantic feature vector sequence of the sub knowledge point problem into an attention mechanism model aiming at each knowledge point pair, and outputting a semantic matching degree value of the knowledge point pair according to the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem, wherein the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem comprise feature information representing the global importance degree of a feature vector;
and the data cleaning module is used for determining whether the knowledge point pair is dirty data or not according to the semantic matching degree.
The embodiment of the invention also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements any one of the data cleaning methods described above when executing the computer program, so as to solve the technical problem of low accuracy in data cleaning in the prior art.
An embodiment of the present invention further provides a computer-readable storage medium, which stores a computer program for executing any one of the data cleaning methods described above, so as to solve the technical problem of low accuracy in data cleaning in the prior art.
In the embodiment of the invention, a knowledge point pair is formed from the main knowledge point problem and each sub knowledge point problem. The context semantic feature vector sequences of the main knowledge point problem and the sub knowledge point problem in the knowledge point pair are extracted based on a bert pre-training model, so that the extraction relies on the semantic understanding capability of deep learning. An attention mechanism model then processes the two context semantic feature vector sequences and outputs a semantic matching value of the knowledge point pair according to the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem, where both feature vector sequences comprise feature information representing the global importance degree of each feature vector. Compared with the prior art, this data cleaning method detects and cleans data based on the semantic understanding capability of deep learning, which is beneficial to improving the accuracy of data cleaning, improving the processing efficiency of data detection and cleaning, and reducing the input cost of manpower and material resources.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flow chart of a data cleansing method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a computer device according to an embodiment of the present invention;
fig. 3 is a block diagram of a data cleansing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
In an embodiment of the present invention, a data cleansing method is provided, as shown in fig. 1, the method including:
step 102: acquiring data to be cleaned, and aiming at each knowledge point, respectively forming a knowledge point pair by the main knowledge point problem and each sub knowledge point problem;
step 104: aiming at each knowledge point pair, respectively inputting a main knowledge point problem and a sub knowledge point problem into a bert pre-training model, outputting a context semantic feature vector sequence of the main knowledge point problem, and outputting a context semantic feature vector sequence of the sub knowledge point problem, wherein the bert pre-training model is obtained by training based on sample data of data to be cleaned;
step 106: aiming at each knowledge point pair, inputting a context semantic feature vector sequence of a main knowledge point problem and a context semantic feature vector sequence of a sub knowledge point problem into an attention mechanism model, and outputting a semantic matching value of the knowledge point pair according to the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem, wherein the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem comprise feature information representing the global importance degree of a feature vector;
step 108: and determining whether the knowledge point pair is dirty data or not according to the semantic matching degree.
As can be seen from the flow shown in fig. 1, in the embodiment of the invention, a knowledge point pair is formed from the main knowledge point problem and each sub knowledge point problem. The context semantic feature vector sequences of the main knowledge point problem and the sub knowledge point problem in the knowledge point pair are extracted based on a bert pre-training model, realizing extraction of the context semantic feature vector sequences based on deep learning. An attention mechanism model then processes the two context semantic feature vector sequences and outputs a semantic matching value of the knowledge point pair according to the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem, where both feature vector sequences comprise feature information representing the global importance degree of each feature vector. Compared with the prior art, this data cleaning method detects and cleans data based on the semantic understanding capability of deep learning, which is beneficial to improving the accuracy of data cleaning, improving the processing efficiency of data detection and cleaning, and reducing the input cost of manpower and material resources.
In specific implementation, the data cleaning method can be used for cleaning knowledge point data; for example, it can be used for cleaning the data of a knowledge base.
In specific implementation, the main knowledge point problem may be a question in a standard form for a certain knowledge point, and a sub knowledge point problem may be a question in a non-standard form for that knowledge point. For a certain knowledge point, there may be one main knowledge point problem A and a plurality of sub knowledge point problems a, b, c, d, etc. For example, in an insurance application scenario, the main knowledge point problem may be: "Excuse me, can I buy insurance?"; one sub knowledge point problem may be: "I want to buy insurance", and another sub knowledge point problem may be: "What classes of insurance are there?". Therefore, for one knowledge point, the main knowledge point problem and each sub knowledge point problem constitute a knowledge point pair, i.e., there may be multiple knowledge point pairs, for example (A, a), (A, b), (A, c), etc.
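For illustration, the following minimal Python sketch shows how such knowledge point pairs could be assembled; the dictionary layout (main_question / sub_questions) is an assumed representation, not part of the original disclosure.

```python
# Minimal sketch of knowledge point pair construction.
# The dictionary layout (main_question / sub_questions) is an assumption.
knowledge_base = [
    {
        "main_question": "Excuse me, can I buy insurance?",   # main problem A
        "sub_questions": [
            "I want to buy insurance",                        # sub problem a
            "What classes of insurance are there?",           # sub problem b
        ],
    },
]

def build_knowledge_point_pairs(kb):
    """Pair the main knowledge point problem with every sub knowledge point problem."""
    pairs = []
    for point in kb:
        for sub_q in point["sub_questions"]:
            pairs.append((point["main_question"], sub_q))     # (A, a), (A, b), ...
    return pairs

pairs = build_knowledge_point_pairs(knowledge_base)
print(pairs)
```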
In specific implementation, before the data cleaning method is carried out, deep learning training can be performed on a neural network or various machine learning components by using historical data of the application scene, to obtain the bert pre-training model and the attention mechanism model. This makes the data cleaning method easy to extend horizontally to new application fields: for whichever application scene the data cleaning method needs to be applied to, the bert pre-training model and the attention mechanism model are trained with the corresponding application scene data as samples. The samples comprise positive samples and negative samples. For example, a positive sample can be a knowledge point pair whose main knowledge point problem and sub knowledge point problem match, with the semantic matching value set to 1; a negative sample can be a knowledge point pair whose main knowledge point problem and sub knowledge point problem do not match, with the semantic matching value set to 0. The bert pre-training model and the attention mechanism model are obtained by repeated training on the positive samples and negative samples. A bert pre-training model and attention mechanism model with strong semantic understanding capability can be obtained by training on a small amount of sample data.
In specific implementation, the trained bert pre-training model is used as an embedding layer to perform token embedding coding on the tokens in each sentence, so that the main knowledge point problem and the sub knowledge point problem each generate a word vector sequence through the bert pre-training model. The BERT parameters are trained during the training process of the bert pre-training model; through fine-tuning, the bert pre-training model can be connected to the subsequent interactive semantic understanding model, and the word vector sequence generated by the bert pre-training model can be further trained through a bidirectional long short-term memory (Bi-LSTM) neural network.
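As a concrete illustration of this embedding step, the sketch below uses the Hugging Face transformers library as a stand-in BERT implementation; the checkpoint name bert-base-chinese and the fixed length of 32 tokens are assumptions, not values given in the patent.

```python
import torch
from transformers import BertModel, BertTokenizer

# Assumed checkpoint; the patent only specifies "a bert pre-training model".
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def word_vector_sequence(question: str, max_len: int = 32) -> torch.Tensor:
    """Token-embed a question into a fixed-length word vector sequence."""
    enc = tokenizer(question, padding="max_length", truncation=True,
                    max_length=max_len, return_tensors="pt")
    with torch.no_grad():
        out = bert(**enc)
    return out.last_hidden_state          # shape: (1, max_len, 768)

main_seq = word_vector_sequence("Excuse me, can I buy insurance?")
sub_seq = word_vector_sequence("I want to buy insurance")
```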
In specific implementation, the trained bert pre-training model is used to extract the context semantic feature vector sequence based on the semantic understanding ability of deep learning, for example, for each knowledge point pair, the main knowledge point problem and the sub knowledge point problem are respectively input into the bert pre-training model, the context semantic feature vector sequence of the main knowledge point problem is output, and the context semantic feature vector sequence of the sub knowledge point problem is output, which includes:
The main knowledge point problem q_i and each sub knowledge point problem q_in (the sub knowledge point problem set can be represented as Q_i = {q_i1, q_i2, ..., q_in}) are respectively input into the bert pre-training model, and the bert pre-training model executes the following steps:

outputting a fixed-length word vector sequence for the main knowledge point problem q_i, and outputting a fixed-length word vector sequence for each sub knowledge point problem q_in;

for each knowledge point pair in which the word vector sequence of the sub knowledge point problem has the same length as the word vector sequence of the main knowledge point problem, inputting the word vector sequence of the main knowledge point problem into a bidirectional long short-term memory network and outputting the context semantic feature vector sequence of the main knowledge point problem, and inputting the word vector sequence of the sub knowledge point problem into the bidirectional long short-term memory network and outputting the context semantic feature vector sequence of the sub knowledge point problem.
In specific implementation, a bidirectional long short-term memory (Bi-LSTM) unit can be adopted in the trained bert pre-training model to extract the context semantic feature vector sequences of the main knowledge point problem and the sub knowledge point problem. Specifically, in the Bi-LSTM neural network, for each time t, the outputs h_fw and h_bw of the two LSTM units, which process the forward text word vector sequence and the reversed text word vector sequence respectively, are spliced into the final feature vector output by the Bi-LSTM neural network at time t, whose dimension is 2 times the dimension of the feature vector output by a single LSTM unit:

h_t = [h_fw, h_bw]

where h_fw represents the output of the LSTM unit processing the text word vector sequence in forward order, h_bw represents the output of the LSTM unit processing the reversed text word vector sequence, and h_t represents the feature vector output by the Bi-LSTM network at time t (i.e., a feature vector in the context semantic feature vector sequence of the main knowledge point problem or the sub knowledge point problem).
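A runnable sketch of this Bi-LSTM encoder follows; PyTorch's bidirectional LSTM already splices h_fw and h_bw internally, so the output dimension is twice the per-direction hidden size, as described above. The hidden size of 128 is an arbitrary assumption.

```python
import torch
import torch.nn as nn

word_dim, hidden_dim, seq_len = 768, 128, 32   # hidden_dim is an assumed value
bilstm = nn.LSTM(input_size=word_dim, hidden_size=hidden_dim,
                 batch_first=True, bidirectional=True)

# Stand-in for a BERT word vector sequence of one question.
word_vectors = torch.randn(1, seq_len, word_dim)
context_vectors, _ = bilstm(word_vectors)

# Each time step t yields h_t = [h_fw, h_bw], so the feature dimension
# is 2 * hidden_dim, as the text describes.
print(context_vectors.shape)                   # torch.Size([1, 32, 256])
```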
In specific implementation, in order to further enhance the semantic understanding ability during the data cleaning process, in this embodiment, a trained attention mechanism model is used to calculate the semantic matching value of a knowledge point pair, for example, for each knowledge point pair, inputting the context semantic feature vector sequence of the main knowledge point problem and the context semantic feature vector sequence of the sub knowledge point problem into the attention mechanism model, and outputting the semantic matching value of the knowledge point pair according to the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem, including:
The context semantic feature vector sequence of the main knowledge point problem q_i and the context semantic feature vector sequence of the sub knowledge point problem q_in are input into the attention mechanism model, and the following steps are executed through the attention mechanism model:

calculating the attention weight a_qt of each feature vector in the context semantic feature vector sequence of the main knowledge point problem, and calculating the attention weight of each feature vector in the context semantic feature vector sequence of the sub knowledge point problem;

taking the attention weights as the feature information, weighting each feature vector in the context semantic feature vector sequence of the main knowledge point problem by its corresponding attention weight a_qt to obtain the feature vector sequence S_q of the main knowledge point problem, and weighting each feature vector in the context semantic feature vector sequence of the sub knowledge point problem by its corresponding attention weight to obtain the feature vector sequence of the sub knowledge point problem. Specifically, the attention weighting of each feature vector in the two context semantic feature vector sequences can be performed through the following formula:

s_t = a_t · h_t

where h_t is the feature vector output by the bidirectional long short-term memory network at each time t (i.e., each feature vector in the context semantic feature vector sequence), a_t (denoting collectively a_qt and the corresponding sub knowledge point weight) is the attention weight of the feature vector output at time t, and s_t is the new weighted feature vector at time t.
And outputting the semantic matching value of the knowledge point pair according to the characteristic vector sequence of the main knowledge point problem and the characteristic vector sequence of the sub knowledge point problem.
In specific implementation, the attention mechanism model executes the following steps to calculate the attention weight of each feature vector in the context semantic feature vector sequence of the main knowledge point problem, and calculate the attention weight of each feature vector in the context semantic feature vector sequence of the sub knowledge point problem:
The feature vector of the last time state in the context semantic feature vector sequence of the main knowledge point problem q_i and the feature vector of the last time state in the context semantic feature vector sequence of the sub knowledge point problem q_in are vector-spliced to obtain the background information; that is, for each knowledge point pair, the two last-time-state feature vectors are spliced into one vector, and the resulting background information encodes the semantic feature vectors of all earlier time states of the main knowledge point problem and the sub knowledge point problem.

The dimension of the background information is then reduced to half, so that it is consistent with the dimension of the context semantic feature vector sequences output by the bidirectional long short-term memory network. This can be realized through a fully connected layer of the attention mechanism model; the background information after dimension reduction is denoted bkg.

Next, the similarity values between the background information and the feature vector at each time in the context semantic feature vector sequence of the main knowledge point problem are calculated; the similarity values corresponding to all feature vectors in the context semantic feature vector sequence of the main knowledge point problem form the similarity vector Sim_q of the main knowledge point problem. Likewise, the similarity values between the background information and the feature vector at each time in the context semantic feature vector sequence of the sub knowledge point problem are calculated, and these form the similarity vector of the sub knowledge point problem. Specifically, the similarity value sim_t between the background information and the feature vector h_t output by the bidirectional long short-term memory network at each time t can be calculated by the following formula: sim_t = bkg · h_t.

The attention weight of each feature vector in the context semantic feature vector sequence of the main knowledge point problem is calculated according to the similarity vector Sim_q of the main knowledge point problem, and the attention weight of each feature vector in the context semantic feature vector sequence of the sub knowledge point problem is calculated according to the similarity vector of the sub knowledge point problem. Specifically, a softmax computation can be introduced into the attention mechanism model to complete this function: the similarity vectors are numerically converted, and while the data is normalized, the intrinsic mechanism of softmax highlights the weight of globally important text feature information. The attention weight of each feature vector in the two context semantic feature vector sequences can be calculated by the following formula:

a_t = exp(sim_t) / Σ_{k=1}^{N} exp(sim_k)

where a_t represents the attention weight of the feature vector at time t, sim_t represents the similarity value corresponding to the feature vector at time t, and N represents the total number of times, which equals the number of tokens of the text.
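The following sketch implements the four steps above (splicing the last-state vectors, halving the dimension with a fully connected layer, dot-product similarity, softmax); all tensor dimensions are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

dim, seq_len = 256, 32                                  # assumed dimensions
fc_half = torch.nn.Linear(2 * dim, dim)                 # dimension-halving layer

def attention_weights(main_ctx: torch.Tensor, sub_ctx: torch.Tensor):
    """main_ctx, sub_ctx: (seq_len, dim) context semantic feature vector sequences."""
    # 1. Splice the last-time-state feature vectors of both sequences.
    bkg = torch.cat([main_ctx[-1], sub_ctx[-1]], dim=-1)    # (2*dim,)
    # 2. Reduce the background information to half its dimension.
    bkg = fc_half(bkg)                                      # (dim,)
    # 3. Dot-product similarity sim_t = bkg . h_t for every time t.
    sim_main, sim_sub = main_ctx @ bkg, sub_ctx @ bkg       # (seq_len,) each
    # 4. Softmax turns the similarity vectors into attention weights a_t.
    return F.softmax(sim_main, dim=0), F.softmax(sim_sub, dim=0)

main_ctx, sub_ctx = torch.randn(seq_len, dim), torch.randn(seq_len, dim)
a_main, a_sub = attention_weights(main_ctx, sub_ctx)
```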
In specific implementation, the attention mechanism model outputs the semantic matching value of the knowledge point pair according to the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem by executing the following steps:
calculating the similarity value between each feature vector in the feature vector sequence of the main knowledge point problem and each feature vector in the feature vector sequence of the sub knowledge point problem, wherein the similarity values form a similarity matrix; specifically, the similarity value can be calculated by the following formula:

sim_ij = s_qi · s'_qj

where s_qi represents the i-th feature vector of the feature vector sequence S_q of the main knowledge point problem, s'_qj represents the j-th feature vector of the feature vector sequence of the sub knowledge point problem, and sim_ij represents the similarity value between the feature vectors s_qi and s'_qj; the similarity values form the similarity matrix sim.
According to the descending order of the similarity values, the feature vectors corresponding to a preset number of similarity values are taken to form the semantic matching feature vector of the knowledge point pair. Specifically, the trained attention mechanism model can complete this function through K-max pooling: the feature vectors corresponding to the K largest similarity values in the similarity matrix sim are selected to form the semantic matching feature vector, which represents the knowledge point pair for text semantic matching.
Finally, the semantic matching feature vector is input into a fully connected layer of the attention mechanism model, and the semantic matching value of the knowledge point pair is output. Specifically, the trained attention mechanism model completes this function through the fully connected layer, finally performing binary classification of text semantic matching with a softmax classifier and outputting the semantic matching value of the knowledge point pair, where different semantic matching values represent different judgment results. When the attention mechanism model is trained, a gradient descent method can be used to train the weights against the obtained prediction results (match and mismatch), and the neural network model with the best training effect is stored for the intelligent knowledge base system.
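Continuing the sketches above, a minimal version of this matching head: attention-weighted sequences, pairwise similarity matrix, K-max pooling, and a fully connected layer with softmax for the binary match/mismatch decision. The value K = 4 is an assumption; the patent only requires a preset number.

```python
import torch
import torch.nn.functional as F

K = 4                                            # assumed preset number of similarity values
fc_match = torch.nn.Linear(K, 2)                 # fully connected layer -> match / mismatch

def semantic_matching_value(main_ctx, sub_ctx, a_main, a_sub):
    s_q = a_main.unsqueeze(1) * main_ctx         # s_t = a_t * h_t (main problem sequence S_q)
    s_sub = a_sub.unsqueeze(1) * sub_ctx         # weighted sub problem sequence
    sim = s_q @ s_sub.T                          # similarity matrix, sim_ij = s_qi . s'_qj
    top_k, _ = sim.flatten().topk(K)             # K-max pooling over the similarity matrix
    probs = F.softmax(fc_match(top_k), dim=-1)   # binary text semantic matching
    return probs[1].item()                       # semantic matching degree value M in [0, 1]

M = semantic_matching_value(main_ctx, sub_ctx, a_main, a_sub)
```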
In specific implementation, the semantic matching degree value M output for a knowledge point pair may be a numerical value from 0 to 1. When the semantic matching degree M is greater than a set threshold value, the pair is considered matched; when the semantic matching degree M is smaller than the set threshold value, the pair is considered unmatched and is regarded as dirty data, which can be directly returned to the working pool, and the correct answer matching the dirty data then needs to be further audited and cleaned manually. Therefore, the risk that the accuracy rate of the intelligent question-answering system declines due to non-standard construction and arrangement of the knowledge base can be effectively avoided, so that the intelligent question-answering system can match the corresponding problems and return correct answers.
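A sketch of this decision rule follows; the threshold value of 0.5 is an assumption, as the patent only specifies "a set threshold".

```python
THRESHOLD = 0.5   # assumed value; the patent only specifies "a set threshold"

def classify_pair(matching_degree_m: float, threshold: float = THRESHOLD) -> str:
    """Route a knowledge point pair based on its semantic matching degree M."""
    if matching_degree_m > threshold:
        return "matched"    # keep the pair in the knowledge base
    return "dirty"          # return to the working pool for manual audit and cleaning

print(classify_pair(0.92))  # matched
print(classify_pair(0.13))  # dirty
```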
In specific implementation, the data cleaning method can be used for the question-answer knowledge base of an online intelligent question-answering robot or system. For example, in an insurance application scenario, an online intelligent question-answering robot or system serving a micro-insurance channel faces insurance business consultation from a large number of micro-credit users, and the question-answer knowledge base can be enriched by continuously collecting online business data. As the business volume expands, the question-answer knowledge base accumulates a huge amount of dirty data, and manual sorting alone is inefficient and ineffective. Using the data cleaning method to perform data detection and cleaning on the question-answer knowledge base, with transfer learning on the BERT pre-training model based on a small amount of data and global feature information extracted by the attention mechanism model, greatly improves the text semantic understanding capability, improves the convenience and efficiency of detecting and cleaning the question-answer knowledge base, keeps the performance of the online intelligent question-answering system optimal, and extends conveniently and horizontally within the insurance field.
In this embodiment, a computer device is provided, as shown in fig. 2, and includes a memory 202, a processor 204, and a computer program stored on the memory and executable on the processor, and the processor implements any of the data cleansing methods described above when executing the computer program.
In particular, the computer device may be a computer terminal, a server or a similar computing device.
In the present embodiment, there is provided a computer-readable storage medium storing a computer program for executing any of the data cleansing methods described above.
In particular, computer-readable storage media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer-readable storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable storage media do not include transitory computer readable media such as modulated data signals and carrier waves.
Based on the same inventive concept, the embodiment of the present invention further provides a data cleaning apparatus, as described in the following embodiments. Because the principle of solving the problems of the data cleaning device is similar to that of the data cleaning method, the implementation of the data cleaning device can refer to the implementation of the data cleaning method, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 3 is a block diagram of a data cleansing apparatus according to an embodiment of the present invention, and as shown in fig. 3, the apparatus includes:
the data acquisition module 302 is configured to acquire data to be cleaned, and for each knowledge point, form a knowledge point pair from a main knowledge point problem and each sub-knowledge point problem respectively;
the vector extraction module 304 is configured to input the main knowledge point problem and the sub knowledge point problem into a bert pre-training model respectively for each knowledge point pair, output a context semantic feature vector sequence of the main knowledge point problem, and output a context semantic feature vector sequence of the sub knowledge point problem, where the bert pre-training model is obtained by training based on sample data of data to be cleaned;
a semantic matching degree calculation module 306, configured to input, for each knowledge point pair, a context semantic feature vector sequence of the main knowledge point problem and a context semantic feature vector sequence of the sub knowledge point problem into an attention mechanism model, and output a semantic matching degree value of the knowledge point pair according to the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem, where the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem include feature information indicating a global importance degree of a feature vector;
and the data cleaning module 308 is configured to determine whether the knowledge point pair is dirty data according to the semantic matching degree.
In an embodiment, the vector extraction module is specifically configured to, for each knowledge point pair, input a main knowledge point problem and a sub knowledge point problem into a bert pre-training model respectively, and execute the following steps through the bert pre-training model:
respectively inputting the main knowledge point problem and the sub knowledge point problem into a bert pre-training model, outputting a word vector sequence with fixed length of the main knowledge point problem, and outputting a word vector sequence with fixed length of each sub knowledge point problem;
aiming at the knowledge point pairs with the same length between the word vector sequence of the sub-knowledge point problem and the word vector sequence of the main knowledge point problem, the word vector sequence of the main knowledge point problem is input into a bidirectional long-short time memory network, the context semantic feature vector sequence of the main knowledge point problem is output, the word vector sequence of the sub-knowledge point problem is input into the bidirectional long-short time memory network, and the context semantic feature vector sequence of the sub-knowledge point problem is output.
In an embodiment, the semantic matching degree calculation module is specifically configured to input the context semantic feature vector sequence of the main knowledge point problem and the context semantic feature vector sequence of the sub knowledge point problem into an attention mechanism model, and execute the following steps through the attention mechanism model:
calculating the attention weight of each feature vector in the context semantic feature vector sequence of the main knowledge point problem, and calculating the attention weight of each feature vector in the context semantic feature vector sequence of the sub knowledge point problem;
taking the attention weight as the feature information, carrying out corresponding attention weight weighting on each feature vector in the context semantic feature vector sequence of the main knowledge point problem to obtain a feature vector sequence of the main knowledge point problem, and carrying out corresponding attention weight weighting on each feature vector in the context semantic feature vector sequence of the sub knowledge point problem to obtain a feature vector sequence of the sub knowledge point problem;
and outputting the semantic matching value of the knowledge point pair according to the characteristic vector sequence of the main knowledge point problem and the characteristic vector sequence of the sub knowledge point problem.
In one embodiment, the semantic matching degree calculation module calculates the attention weight of each feature vector in the context semantic feature vector sequence of the main knowledge point problem and calculates the attention weight of each feature vector in the context semantic feature vector sequence of the sub knowledge point problem by the attention mechanism model according to the following steps:
performing vector splicing on the feature vector of the last moment state in the context semantic feature vector sequence of the main knowledge point problem and the context semantic feature vector sequence of the sub knowledge point problem to obtain background information;
reducing the dimension of the background information to half;
calculating the similarity value between the background information and the feature vector of each moment in the context semantic feature vector sequence of the main knowledge point problem, wherein the similarity value corresponding to each feature vector in the context semantic feature vector sequence of the main knowledge point problem forms the similarity vector of the main knowledge point problem, calculating the similarity value between the background information and the feature vector of each moment in the context semantic feature vector sequence of the sub knowledge point problem, and the similarity value corresponding to each feature vector in the context semantic feature vector sequence of the sub knowledge point problem forms the similarity vector of the sub knowledge point problem;
and calculating the attention weight of each feature vector in the context semantic feature vector sequence of the main knowledge point problem according to the similarity vector of the main knowledge point problem, and calculating the attention weight of each feature vector in the context semantic feature vector sequence of the sub knowledge point problem according to the similarity vector of the sub knowledge point problem.
In one embodiment, the semantic matching degree calculation module calculates the similarity value between the background information and the feature vector at each moment by using the following formula through the attention mechanism model:
sim_t = bkg · h_t

where bkg represents the background information, h_t represents the feature vector at time t, and sim_t represents the similarity value between the background information and the feature vector h_t at time t.
In one embodiment, the semantic matching degree calculating module calculates the attention weight of the feature vector by the following formula:
a_t = exp(sim_t) / Σ_{k=1}^{N} exp(sim_k)

where a_t represents the attention weight of the feature vector at time t, sim_t represents the similarity value corresponding to the feature vector at time t, and N represents the total number of times.
In one embodiment, the semantic matching degree calculating module is further configured to calculate a similarity value between each feature vector in the feature vector sequence of the main knowledge point problem and each feature vector in the feature vector sequence of the sub knowledge point problem, and each similarity value constitutes a similarity matrix;
according to the descending order of the similarity values, feature vectors corresponding to a plurality of preset similarity values are taken to form semantic matching feature vectors of the knowledge point pair;
and inputting the semantic matching feature vector into a full connection layer of the attention mechanism model, and outputting a semantic matching value of the knowledge point pair.
The embodiment of the invention achieves the following technical effects: a knowledge point pair is formed from the main knowledge point problem and each sub knowledge point problem; the context semantic feature vector sequences of the main knowledge point problem and the sub knowledge point problem in the knowledge point pair are extracted based on a bert pre-training model, realizing extraction of the context semantic feature vector sequences based on the semantic understanding capability of deep learning; an attention mechanism model then processes the two context semantic feature vector sequences and outputs a semantic matching value of the knowledge point pair according to the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem, where both feature vector sequences comprise feature information representing the global importance degree of each feature vector. Compared with the prior art, this data cleaning method detects and cleans data based on the semantic understanding capability of deep learning, which is beneficial to improving the accuracy of data cleaning, improving the processing efficiency of data detection and cleaning, and reducing the input cost of manpower and material resources.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the invention described above may be implemented by a general purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases the steps shown or described may be performed in an order different from that described herein. They may also be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A method for data cleansing, comprising:
acquiring data to be cleaned, and aiming at each knowledge point, respectively forming a knowledge point pair by the main knowledge point problem and each sub knowledge point problem;
aiming at each knowledge point pair, respectively inputting a main knowledge point problem and a sub knowledge point problem into a bert pre-training model, outputting a context semantic feature vector sequence of the main knowledge point problem, and outputting a context semantic feature vector sequence of the sub knowledge point problem, wherein the bert pre-training model is obtained by training based on sample data of data to be cleaned;
aiming at each knowledge point pair, inputting a context semantic feature vector sequence of a main knowledge point problem and a context semantic feature vector sequence of a sub knowledge point problem into an attention mechanism model, and outputting a semantic matching value of the knowledge point pair according to the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem, wherein the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem comprise feature information representing the global importance degree of a feature vector;
and determining whether the knowledge point pair is dirty data or not according to the semantic matching degree.
2. The data cleaning method of claim 1, wherein for each knowledge point pair, inputting a principal knowledge point problem and a sub-knowledge point problem into a bert pre-training model, respectively, outputting a context semantic feature vector sequence of the principal knowledge point problem, and outputting a context semantic feature vector sequence of the sub-knowledge point problem, comprises:
respectively inputting the main knowledge point problem and the sub knowledge point problem into a bert pre-training model, outputting a word vector sequence with fixed length of the main knowledge point problem, and outputting a word vector sequence with fixed length of each sub knowledge point problem;
aiming at the knowledge point pairs with the same length between the word vector sequence of the sub-knowledge point problem and the word vector sequence of the main knowledge point problem, the word vector sequence of the main knowledge point problem is input into a bidirectional long-short time memory network, the context semantic feature vector sequence of the main knowledge point problem is output, the word vector sequence of the sub-knowledge point problem is input into the bidirectional long-short time memory network, and the context semantic feature vector sequence of the sub-knowledge point problem is output.
3. The data cleaning method of claim 1 or 2, wherein for each knowledge point pair, inputting the context semantic feature vector sequence of the main knowledge point problem and the context semantic feature vector sequence of the sub knowledge point problem into an attention mechanism model, and outputting the semantic matching value of the knowledge point pair according to the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem, comprises:
inputting the context semantic feature vector sequence of the main knowledge point problem and the context semantic feature vector sequence of the sub knowledge point problem into an attention mechanism model, and executing the following steps through the attention mechanism model:
calculating the attention weight of each feature vector in the context semantic feature vector sequence of the main knowledge point problem, and calculating the attention weight of each feature vector in the context semantic feature vector sequence of the sub knowledge point problem;
taking the attention weight as the feature information, carrying out corresponding attention weight weighting on each feature vector in the context semantic feature vector sequence of the main knowledge point problem to obtain a feature vector sequence of the main knowledge point problem, and carrying out corresponding attention weight weighting on each feature vector in the context semantic feature vector sequence of the sub knowledge point problem to obtain a feature vector sequence of the sub knowledge point problem;
and outputting the semantic matching value of the knowledge point pair according to the characteristic vector sequence of the main knowledge point problem and the characteristic vector sequence of the sub knowledge point problem.
4. The data cleaning method of claim 3, wherein calculating the attention weight of each feature vector in the context semantic feature vector sequence of the main knowledge point problem and calculating the attention weight of each feature vector in the context semantic feature vector sequence of the sub knowledge point problem comprises:
performing vector splicing on the feature vector of the last moment state in the context semantic feature vector sequence of the main knowledge point problem and the context semantic feature vector sequence of the sub knowledge point problem to obtain background information;
reducing the dimension of the background information to half;
calculating the similarity value between the background information and the feature vector of each moment in the context semantic feature vector sequence of the main knowledge point problem, wherein the similarity value corresponding to each feature vector in the context semantic feature vector sequence of the main knowledge point problem forms the similarity vector of the main knowledge point problem, calculating the similarity value between the background information and the feature vector of each moment in the context semantic feature vector sequence of the sub knowledge point problem, and the similarity value corresponding to each feature vector in the context semantic feature vector sequence of the sub knowledge point problem forms the similarity vector of the sub knowledge point problem;
and calculating the attention weight of each feature vector in the context semantic feature vector sequence of the main knowledge point problem according to the similarity vector of the main knowledge point problem, and calculating the attention weight of each feature vector in the context semantic feature vector sequence of the sub knowledge point problem according to the similarity vector of the sub knowledge point problem.
5. The data cleansing method according to claim 4, wherein the similarity value between the background information and the feature vector at each time is calculated by the following formula:
sim_t = bkg · h_t

wherein bkg represents the background information, h_t represents the feature vector at time t, and sim_t represents the similarity value between the background information and the feature vector h_t at time t.
6. The data cleaning method according to claim 4, wherein, for each feature vector in the context semantic feature vector sequence of the main knowledge point problem and each feature vector in the context semantic feature vector sequence of the sub knowledge point problem, the attention weight of the feature vector is calculated by the following formula:

a_t = exp(sim_t) / Σ_{i=1}^{N} exp(sim_i)

wherein a_t represents the attention weight of the feature vector at moment t, sim_t represents the similarity value corresponding to the feature vector at moment t, and N represents the total number of moments.
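Assuming the weights are indeed the softmax of the similarity values, as reconstructed above (the published text omits the formula image), the computation is a few lines of Python; subtracting the maximum before exponentiating is a standard numerical-stability trick, not something the patent specifies:

```python
import numpy as np

def attention_weights(sim):
    """a_t = exp(sim_t) / sum_i exp(sim_i), computed stably."""
    e = np.exp(sim - sim.max())
    return e / e.sum()

sim = np.array([0.8, 2.1, -0.3, 1.4])
a = attention_weights(sim)
assert np.isclose(a.sum(), 1.0)  # the N weights form a distribution
```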
7. The data cleaning method of claim 3, wherein outputting, for each knowledge point pair, the semantic matching value of the knowledge point pair according to the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem comprises:
calculating the similarity value between each feature vector in the feature vector sequence of the main knowledge point problem and each feature vector in the feature vector sequence of the sub knowledge point problem, the similarity values forming a similarity matrix;
taking, in descending order of similarity value, the feature vectors corresponding to a preset number of the largest similarity values to form the semantic matching feature vector of the knowledge point pair;
and inputting the semantic matching feature vector into a full connection layer of the attention mechanism model, and outputting a semantic matching value of the knowledge point pair.
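A sketch of claim 7 under stated assumptions: raw dot products for the pairwise similarities, k for the preset number of similarity values, spliced vector pairs as the semantic matching feature vector, and a randomly initialised fully connected layer with a sigmoid standing in for the trained layer of the attention mechanism model.

```python
import numpy as np

rng = np.random.default_rng(3)
F_main = rng.normal(size=(12, 768))  # weighted feature vector sequence, main
F_sub = rng.normal(size=(9, 768))    # weighted feature vector sequence, sub

S = F_main @ F_sub.T  # similarity matrix, shape (12, 9)

# Take the feature vector pairs behind the k largest similarity values.
k = 4
flat = np.argsort(S, axis=None)[::-1][:k]  # flat indices, descending
rows, cols = np.unravel_index(flat, S.shape)
match_vec = np.concatenate([np.concatenate([F_main[r], F_sub[c]])
                            for r, c in zip(rows, cols)])  # (k * 1536,)

# Full connection layer -> scalar semantic matching value; random weights
# and a sigmoid stand in for the trained layer.
W_fc = rng.normal(size=match_vec.shape[0]) * 0.01
score = 1.0 / (1.0 + np.exp(-(W_fc @ match_vec)))
```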
8. A data cleaning apparatus, comprising:
the data acquisition module is used for acquiring data to be cleaned and respectively forming knowledge point pairs by the main knowledge point problem and each sub knowledge point problem aiming at each knowledge point;
the vector extraction module is used for respectively inputting the main knowledge point problem and the sub knowledge point problem into a bert pre-training model aiming at each knowledge point pair, outputting a context semantic feature vector sequence of the main knowledge point problem and outputting a context semantic feature vector sequence of the sub knowledge point problem, wherein the bert pre-training model is obtained by training based on sample data of data to be cleaned;
the semantic matching degree calculation module is used for inputting the context semantic feature vector sequence of the main knowledge point problem and the context semantic feature vector sequence of the sub knowledge point problem into an attention mechanism model aiming at each knowledge point pair, and outputting a semantic matching degree value of the knowledge point pair according to the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem, wherein the feature vector sequence of the main knowledge point problem and the feature vector sequence of the sub knowledge point problem comprise feature information representing the global importance degree of a feature vector;
and the data cleaning module is used for determining whether the knowledge point pair is dirty data or not according to the semantic matching degree.
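To show how the four modules of claim 8 fit together, here is a toy end-to-end pipeline; every class body is an assumption made for illustration (random vectors instead of bert pre-training model outputs, a sigmoid of a dot product instead of the trained attention mechanism model), not the patent's implementation:

```python
import math
import numpy as np

class VectorExtractionModule:
    """Stand-in for the bert pre-training model: random feature sequences."""
    def extract(self, main_q, sub_q):
        rng = np.random.default_rng(len(main_q) + len(sub_q))
        return rng.normal(size=(12, 768)), rng.normal(size=(9, 768))

class SemanticMatchModule:
    """Toy matching model: sigmoid of a scaled last-state dot product."""
    def match(self, H_main, H_sub):
        sim = float(H_main[-1] @ H_sub[-1]) / H_main.shape[1]
        return 1.0 / (1.0 + math.exp(-sim))

class DataCleaningModule:
    """Flags a knowledge point pair as dirty data below a match threshold."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold
    def is_dirty(self, score):
        return score < self.threshold

def clean(knowledge_points):
    extractor = VectorExtractionModule()
    matcher = SemanticMatchModule()
    cleaner = DataCleaningModule()
    dirty = []
    for main_q, sub_qs in knowledge_points.items():
        for sub_q in sub_qs:  # one knowledge point pair per sub question
            H_m, H_s = extractor.extract(main_q, sub_q)
            if cleaner.is_dirty(matcher.match(H_m, H_s)):
                dirty.append((main_q, sub_q))
    return dirty

print(clean({"How do I reset my password?": ["Reset password steps",
                                             "What is the weather today?"]}))
```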
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the data cleaning method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed, performs the data cleaning method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010016777.2A CN111241258A (en) | 2020-01-08 | 2020-01-08 | Data cleaning method and device, computer equipment and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111241258A true CN111241258A (en) | 2020-06-05 |
Family
ID=70864833
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111241258A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04160536A (en) * | 1990-10-25 | 1992-06-03 | Toshiba Corp | Knowledge correcting device |
CN108846077A (en) * | 2018-06-08 | 2018-11-20 | 泰康保险集团股份有限公司 | Semantic matching method, device, medium and the electronic equipment of question and answer text |
CN109472305A (en) * | 2018-10-31 | 2019-03-15 | 国信优易数据有限公司 | Answer quality determines model training method, answer quality determination method and device |
CN109726396A (en) * | 2018-12-20 | 2019-05-07 | 泰康保险集团股份有限公司 | Semantic matching method, device, medium and the electronic equipment of question and answer text |
CN110532399A (en) * | 2019-08-07 | 2019-12-03 | 广州多益网络股份有限公司 | Knowledge mapping update method, system and the device of object game question answering system |
CN110532400A (en) * | 2019-09-04 | 2019-12-03 | 江苏苏宁银行股份有限公司 | Knowledge base maintenance method and device based on text classification prediction |
Non-Patent Citations (1)
Title |
---|
Zhang Xinhua: "Research on Knowledge Similarity and Conflict Detection Based on Semantic Reasoning" (基于语义推理的知识相似性与冲突检测研究), 《信息科技辑》 (Information Science and Technology Series) *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113064887A (en) * | 2021-03-22 | 2021-07-02 | 平安银行股份有限公司 | Data management method, device, equipment and storage medium |
CN113064887B (en) * | 2021-03-22 | 2023-12-08 | 平安银行股份有限公司 | Data management method, device, equipment and storage medium |
CN113138982A (en) * | 2021-05-25 | 2021-07-20 | 黄柱挺 | Big data cleaning method |
CN116303406A (en) * | 2023-05-16 | 2023-06-23 | 河北中废通网络技术有限公司 | Method and device for cleaning junk data, electronic equipment and storage medium |
CN116303406B (en) * | 2023-05-16 | 2023-08-04 | 河北中废通网络技术有限公司 | Method and device for cleaning junk data, electronic equipment and storage medium |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200605 |