CN110619363A - Classification method for subclass names corresponding to long description of material data - Google Patents
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/08—Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
- G06Q10/087—Inventory or stock management, e.g. order filling, procurement or balancing against orders
Abstract
The invention discloses a classification method for the subclass names corresponding to long descriptions of material data. The classification of material-data subclasses can accurately analyze the problems in the data, such as inconsistent letter case, mixed full-width and half-width characters, connectors, non-uniform units, and similar pronunciations; apply a reasonable data preprocessing process to normalize and standardize the data; convert the data into feature-vector form; and classify it using logistic regression with L2 regularization and L-BFGS optimization.
Description
Technical Field
The invention relates to the technical field of material data classification, in particular to a classification method for subclasses corresponding to long description of material data.
Background
Material master data contain descriptions of all the materials an enterprise purchases, produces, and keeps in inventory. They form a material database that relates material records to material information (e.g., inventory levels) in the enterprise. Integrating all material data into a single database eliminates data redundancy and allows the purchasing department, as well as other departments (e.g., inventory management, material planning and control, invoice verification), to share the data. Material classification means grouping materials with the same natural attributes according to a certain ordering and combination. The classification process should follow the basic standard of classifying by natural attributes as far as possible; existing material classification is inefficient and prone to classification errors.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art. The invention therefore aims to provide a classification method for the subclass names corresponding to long descriptions of material data.
The classification method for the subclass names corresponding to long descriptions of material data comprises the following steps:
S1: raw material data: read in the raw material data;
S2: data preprocessing: preprocess the read-in raw material data and standardize it;
S3: category-to-number: encode the category column of the raw material data into numbers;
S4: sample-set division: divide the sample set into a training set and a test set;
S5: feature vectorization: convert the long material descriptions into feature vectors;
S6: classification: learn an objective function that maps each feature vector to a predefined class label;
S7: classification-result evaluation: evaluate the classification results by accuracy, recall, and F1 value.
S2 comprises the following steps:
S21: unify the units and connectors in the raw material data;
S22: remove brackets and slashes;
S23: perform Chinese word segmentation, then convert the text to pinyin;
S24: convert upper case to lower case and full-width characters to half-width.
The raw material data in S3 consist of the long material descriptions and the subclass names.
In S4, the sample set is divided so that the ratio of training-set samples to test-set samples is 7:3.
The feature-vectorization method in S5 is the tf-idf algorithm.
The long material description in S5 is material text data.
The classification methods in S6 include logistic regression, naive Bayes, decision trees, support vector machines, K-nearest neighbors, random forests, GBDT, XGBoost, neural networks, and the like.
The metrics for evaluating the classification results in S7 include accuracy, recall, and the F1 value.
The beneficial effects of the invention are as follows: the classification of material-data subclasses can accurately analyze the problems existing in the data, such as inconsistent letter case, mixed full-width and half-width characters, connectors, non-uniform units, and similar pronunciations; apply a reasonable data preprocessing process to normalize and standardize the data; convert the data into feature-vector form; and classify it with the logistic regression + L2 regularization + L-BFGS optimization method.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of the classification method for subclass names corresponding to long descriptions of material data provided by the present invention;
FIG. 2 is a flow chart of the data preprocessing in the classification method for subclass names corresponding to long descriptions of material data provided by the present invention;
FIG. 3 is a flow chart of an example of the data preprocessing in the classification method for subclass names corresponding to long descriptions of material data provided by the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic views, and merely illustrate the basic structure of the present invention, and therefore, they show only the components related to the present invention.
Referring to FIGS. 1-2, the method for classifying the subclass names corresponding to long descriptions of material data comprises the following steps:
S1: raw material data: read in the raw material data;
S2: data preprocessing: preprocess the read-in raw material data and standardize it;
S3: category-to-number: encode the category column of the raw material data into numbers;
S4: sample-set division: divide the sample set into a training set and a test set;
S5: feature vectorization: convert the long material descriptions into feature vectors;
S6: classification: learn an objective function that maps each feature vector to a predefined class label;
S7: classification-result evaluation: evaluate the classification results by accuracy, recall, and F1 value.
S2 comprises the following steps:
S21: unify the units and connectors in the raw material data;
S22: remove brackets and slashes;
S23: perform Chinese word segmentation, then convert the text to pinyin;
S24: convert upper case to lower case and full-width characters to half-width.
The raw material data in S3 are the long material descriptions and the subclass names.
In S4, the sample set is divided so that the ratio of training-set samples to test-set samples is 7:3.
The feature-vectorization method in S5 is the tf-idf algorithm.
The long material description in S5 is material text data.
The classification methods in S6 include logistic regression, naive Bayes, decision trees, support vector machines, K-nearest neighbors, random forests, GBDT, XGBoost, and neural networks.
The metrics for evaluating the classification results in S7 are accuracy, recall, and the F1 value.
Data preprocessing:
Because the material data suffer from inconsistent letter case, mixed full-width and half-width characters, non-uniform multiplier signs, spaces, underscores and dashes, non-uniform measurement units, varying input word order, similar pronunciations, and other problems, the data are preprocessed, normalized, and standardized before being converted into feature vectors.
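The normalization steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the patented implementation: the regular expressions and the NFKC full-width-to-half-width trick are my assumptions, and the unit table as well as the Chinese-word-segmentation and pinyin steps (which would need libraries such as jieba and pypinyin) are omitted.

```python
import re
import unicodedata

def preprocess(desc: str) -> str:
    """Normalize one long material description (illustrative only)."""
    s = unicodedata.normalize("NFKC", desc)  # full-width -> half-width
    s = s.lower()                            # upper case -> lower case
    s = s.replace("×", "*")                  # unify the multiplier sign
    s = s.replace("_", " ")                  # unify underscores
    s = re.sub(r"[()\[\]{}（）/\\]", " ", s)  # remove brackets and slashes
    s = re.sub(r"\s+", " ", s).strip()       # collapse repeated spaces
    return s
```

For example, `preprocess("Bearing（N40/50）")` yields `"bearing n40 50"` under these assumptions.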
Example 2.1:
The long material description is "radial bearing \ N40/50/20T6540 tilting pad"; the results of the preprocessing process are as follows:
example 2.2:
The long descriptions and subclass names of the original material data are as follows:
The preprocessed long material descriptions are as follows:
kebian danhuang zhijia df07kfa116 2327n 2747n 9↑q 321002jda zuhe jian
shimian xiangjiaodian pian cl300 dn25 xb350 gaf sh3401
wufeng santong dn50*dn50 sch120 sch120 sh t3408 15crmo gb9948
shourong redianou redianou wrp–131 0–1600s xing l=900
shourong ruhua beng yeya guan 32*5m
Category-to-number:
To facilitate the classification task, the category column is encoded into numbers.
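A minimal sketch of this encoding step. The actual code table used by the method is not given in the text, so the category names below are invented examples:

```python
def encode_labels(labels):
    """Map each distinct category string to a stable integer code."""
    mapping = {}
    for name in labels:
        if name not in mapping:
            mapping[name] = len(mapping)  # assign codes in order of first appearance
    codes = [mapping[name] for name in labels]
    return codes, mapping

# Hypothetical subclass names, for illustration only
codes, mapping = encode_labels(["gasket", "bearing", "gasket", "valve"])
```

Here `codes` becomes `[0, 1, 0, 2]`, and `mapping` can later be inverted to recover the predicted subclass name from a numeric label.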
Example 3.1:
The subclass names of the raw material data are encoded into numbers:
Sample-set division:
A test sample set is typically required to estimate the generalization error of a classifier. The sample set is therefore divided into a training set and a test set; after the classifier is trained on the training set, the test error on the test set serves as an approximation of the generalization error. In the invention, the ratio of training-set samples to test-set samples is 7:3.
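The 7:3 split can be sketched as follows. The fixed seed is an arbitrary choice for reproducibility, not part of the described method:

```python
import random

def split_samples(samples, train_ratio=0.7, seed=42):
    """Shuffle the sample indices, then split train_ratio : (1 - train_ratio)."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)         # local RNG so the split is repeatable
    cut = int(len(samples) * train_ratio)
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test

train, test = split_samples(list(range(100)))
```

With 100 samples this produces a 70-sample training set and a 30-sample test set, with no sample lost or duplicated.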
Feature vectorization:
The input of the classification task must be a real-valued vector, so the long material description (text data) is converted into a feature vector. The main text-vectorization methods are the bag-of-words model and the tf-idf algorithm. Considering the characteristics of material data, the invention adopts the tf-idf algorithm for feature vectorization.
The tf-idf algorithm is a statistical method for assessing the importance of a word to a document in a document set or corpus. Its main idea is: if a word occurs with high frequency (tf) in one document but rarely occurs in other documents, the word is considered to have good class-distinguishing ability and is suitable for classification. The tf-idf algorithm is widely applied in search engines, keyword extraction, text similarity, text summarization, and the like.
(1) The term frequency (tf) is the frequency with which a word occurs in a document:

tf_{i,j} = n_{i,j} / Σ_k n_{k,j}

where n_{i,j} is the number of occurrences of word i in document D_j, and Σ_k n_{k,j} is the total number of word occurrences in D_j.
(2) The inverse document frequency (idf) is the logarithm of the total number of documents divided by the number of documents containing the word:

idf_i = log( |D| / (1 + |{ j : w_i ∈ D_j }|) )

where |D| is the total number of documents in the corpus and |{ j : w_i ∈ D_j }| is the number of documents containing the word w_i. The 1 in the denominator prevents division by zero when the word does not occur in the corpus.
The fewer documents contain the word w, the larger its idf value, and the better its category-distinguishing ability.
(3) tf-idf=tf×idf
A term that occurs with high frequency in a particular document but has a low document frequency across the collection receives a high tf-idf weight. tf-idf therefore tends to filter out common words and preserve important ones.
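The two formulas can be transcribed directly into Python. This is an illustrative pure-Python sketch (the invention's actual implementation is not shown); it uses the +1-smoothed idf denominator from the text, which makes the idf of a word present in every document slightly negative:

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists -> list of {word: tf-idf weight} dicts."""
    N = len(docs)
    df = Counter()                        # number of documents containing each word
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())      # total word occurrences in D_j
        vec = {w: (n / total) * math.log(N / (1 + df[w]))
               for w, n in counts.items()}
        vectors.append(vec)
    return vectors

# Toy tokens loosely modelled on the preprocessed descriptions above
vecs = tfidf([["dn50", "sch120", "sch120"], ["dn50", "xb350"]])
```

A production system would more likely use a library vectorizer (e.g. scikit-learn's `TfidfVectorizer`, whose smoothing differs slightly), but the sketch matches the formulas as stated.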
Example 5.1:
Preprocessed material data:
kebian danhuang zhijia df07kfa116 2327n 2747n 9↑q 321002jda zuhe jian
shimian xiangjiaodian pian cl300 dn25 xb350 gaf sh3401
wufeng santong dn50*dn50 sch120 sch120 sh t3408 15crmo gb9948
shourong redianou redianou wrp–1310–1600s xing l=900
shourong ruhua beng yeya guan 32*5m
Expressed in the form of a feature vector:
[0 0 0 0 0 0 0 0 0 0 0 0.35355339 0 0 0.35355339 0 0.35355339 0 0 0 0 0 0.35355339 0 0 0 0 0 0 0 0.35355339 0.35355339 0 0 0 0 0.35355339 0.35355339 0 0 0 0]
[0.2811506 0.2811506 0 0.2811506 0 0 0 0 0 0 0.2811506 0 0 0 0 0 0 0 0 0 0 0 0.2811506 0 0 0.5623012 0 0.2811506 0 0 0 0 0 0.22683053 0 0.2811506 0 0 0 0.2811506 0 0 0]
[0 0 0 0 0 0 0.38775666 0 0.38775666 0 0 0.38775666 0 0 0 0 0 0 0 0.38775666 0 0 0 0 0 0 0.38775666 0 0 0 0 0 0 0.31283963 0 0 0 0 0 0 0.38775666 0 0]
[0 0 0.26726124 0 0 0 0 0 0 0 0 0 0 0 0 0 0.53452248 0 0.26726124 0 0 0 0 0 0 0 0 0 0.26726124 0.53452248 0.26726124 0 0 0 0.26726124 0 0.26726124 0 0 0 0 0 0 0]
[0 0 0 0 0.30151134 0.30151134 0 0.30151134 0 0.30151134 0 0 0 0.30151134 0.30151134 0 0 0 0 0 0.30151134 0.30151134 0 0 0.30151134 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.30151134 0.30151134]
Classification:
The classification task is to learn an objective function that maps each feature vector x to a predefined class label y_i.
The current mainstream classification methods include logistic regression, naive Bayes, decision trees, support vector machines, K-nearest neighbors, random forests, GBDT, XGBoost, neural networks, and the like. After fully considering the characteristics of the material data, the invention adopts logistic regression with an added L2 regularization term, solved iteratively with the L-BFGS algorithm.
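A compact way to reproduce this setup is to minimize the L2-regularized negative log-likelihood with SciPy's L-BFGS implementation. The sketch below is an illustration on a tiny synthetic data set, not the invention's code; the name `lam` stands in for the regularization weight λ, and for brevity the bias is penalized along with the weights:

```python
import numpy as np
from scipy.optimize import minimize

def fit_logreg(X, y, lam=1.0):
    """Binary logistic regression with an L2 penalty, solved by L-BFGS."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])       # append a bias column

    def loss_grad(w):
        z = Xb @ w
        p = 1.0 / (1.0 + np.exp(-z))           # sigmoid probabilities
        eps = 1e-12                            # guard the logs
        nll = -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        loss = nll + 0.5 * lam * (w @ w)       # NLL + (lam/2) ||w||^2
        grad = Xb.T @ (p - y) + lam * w
        return loss, grad

    res = minimize(loss_grad, np.zeros(d + 1), jac=True, method="L-BFGS-B")
    return res.x

# Tiny separable example: the label is 1 when the single feature exceeds 0
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = fit_logreg(X, y, lam=0.1)
preds = (1 / (1 + np.exp(-(np.hstack([X, np.ones((4, 1))]) @ w))) > 0.5)
```

scikit-learn's `LogisticRegression(penalty="l2", solver="lbfgs")` packages the same combination, extended to multi-class problems via one-vs-rest or multinomial loss.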
Classification-result evaluation:
The main metrics for evaluating the classification result are accuracy, recall, and the F1 value.
(1) Accuracy
Accuracy is, as the name implies, the proportion of correctly classified samples among all samples:

accuracy = (TP + TN) / (TP + TN + FP + FN)

(2) Recall
Recall (also called the recall rate) is the proportion of positive samples that are correctly classified:

recall = TP / (TP + FN)

where TP is the number of correctly classified positive samples and FN is the number of positive samples incorrectly classified as negative.
(3) F1 value
The F1 value is the harmonic mean of precision and recall:

F1 = 2 · precision · recall / (precision + recall), with precision = TP / (TP + FP)

where FP is the number of negative samples incorrectly classified as positive.
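The three metrics follow directly from the confusion counts TP, FP, FN, and TN. The pure-Python sketch below covers the binary case; multi-class evaluation would average these quantities per class, which the text does not detail:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# One TP, one FN, one TN, one FP -> every metric equals 0.5
acc, prec, rec, f1 = classification_metrics([1, 1, 0, 0], [1, 0, 0, 1])
```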
Example 7.1:
To evaluate and compare the classification effect of the methods on material data sets, logistic regression, naive Bayes, decision tree, support vector machine, K-nearest neighbors, random forest, and XGBoost classifiers were applied to a set of 50000 material data records (1995 subclass categories in total) and a set of 20564 material data records (1213 subclass categories in total); the classification metrics on the test sets are shown in the following tables.
| Method | Accuracy | Recall | F1 value |
|---|---|---|---|
| Logistic regression | 0.88 | 0.90 | 0.89 |
| Naive Bayes | 0.60 | 0.65 | 0.57 |
| Decision tree | 0.84 | 0.82 | 0.82 |
| Support vector machine | 0.06 | 0.13 | 0.17 |
| K-nearest neighbors | 0.84 | 0.82 | 0.82 |
| Random forest | 0.89 | 0.89 | 0.88 |
| XGBoost | 0.67 | 0.73 | 0.69 |

The table above compares the results of the different classification methods on the 50000-record material data set.
| Method | Accuracy | Recall | F1 value |
|---|---|---|---|
| Logistic regression | 0.88 | 0.90 | 0.89 |
| Naive Bayes | 0.64 | 0.73 | 0.65 |
| Decision tree | 0.87 | 0.89 | 0.87 |
| Support vector machine | 0.18 | 0.22 | 0.18 |
| K-nearest neighbors | 0.82 | 0.82 | 0.80 |
| Random forest | 0.86 | 0.84 | 0.84 |
| XGBoost | 0.69 | 0.73 | 0.71 |

The table above compares the results of the different classification methods on the 20564-record material data set.
The two tables show that the logistic regression + L2 regularization + L-BFGS method adopted by the invention outperforms the other classification methods on average.
The logistic regression model classifies by estimating probabilities. A latent variable y is assumed to represent the likelihood that the event under study occurs; it ranges over the whole real line, and the larger its value, the more likely the event. Logistic regression models are widely applied in economic forecasting, severe-weather prediction, and medical diagnosis support.
For the material data classification problem, the event under study is that a long material description belongs to a certain subclass. Logistic regression is used to analyze the association between the features of the material data (i.e., the words in the long description) and the subclass categories, so as to predict the subclass to which a record belongs.
For binary classification, let the independent variable x denote the features of the long description and let y_i indicate whether the description belongs to subclass i: y_i = 1 means the description belongs to the category and y_i = 0 means it does not.
Assuming the prediction is a linear combination of the features, the linear model relates the real-valued prediction z to the independent variable x by

z = w^T x + b

To convert the real-valued z into a value in (0, 1), z is passed through the logistic (sigmoid) function

y = 1 / (1 + e^{-z})

so the probability that the long description belongs to the category is

p(y = 1 | x) = 1 / (1 + e^{-(w^T x + b)})

The above formula can be rewritten as

ln( y / (1 - y) ) = w^T x + b

and obviously

p(y = 1 | x) = e^{w^T x + b} / (1 + e^{w^T x + b}),  p(y = 0 | x) = 1 / (1 + e^{w^T x + b})

The parameters w and b can be estimated by maximum likelihood. The likelihood function of logistic regression is

L(w, b) = Π_{i=1}^{N} p(y = 1 | x_i)^{y_i} · p(y = 0 | x_i)^{1 - y_i}

Taking the logarithm for convenience of calculation,

ℓ(w, b) = Σ_{i=1}^{N} [ y_i (w^T x_i + b) - ln(1 + e^{w^T x_i + b}) ]

Maximizing the likelihood is equivalent to minimizing the objective function

J(w, b) = Σ_{i=1}^{N} [ ln(1 + e^{w^T x_i + b}) - y_i (w^T x_i + b) ]

The maximum-likelihood estimate easily overfits, so a regularization term can be added to the objective function; the common choices are L1 and L2 regularization. Given the prior characteristics of the material data, an L2 regular term is added:

J_reg(w, b) = J(w, b) + (λ / 2) ||w||²
This is an unconstrained convex optimization problem; by convex-optimization theory it can be solved with the Newton-Raphson method. As the formula shows, the solution involves all training samples, and each Newton-Raphson iteration requires a matrix inversion. Because text features are high-dimensional, approximate algorithms such as L-BFGS or Newton-CG can be used instead to reduce the computation; the invention adopts the L-BFGS algorithm.
The L-BFGS algorithm is among the most common methods for unconstrained nonlinear optimization; it converges quickly and has low memory overhead, making it suitable for large-scale computation.
Assume the unconstrained problem is

min f(x), x ∈ R^n

The second-order Taylor expansion of f(x) at x^(k) is

f(x) ≈ f(x^(k)) + ∇f(x^(k))^T (x - x^(k)) + (1/2)(x - x^(k))^T H_k (x - x^(k))

where H_k = ∇²f(x^(k)) is the Hessian matrix. Since an extreme point of f(x) satisfies ∇f(x) = 0, neglecting the remainder and setting the gradient of the expansion to zero gives

∇f(x^(k)) + H_k (x - x^(k)) = 0

so the iterative formula of Newton's method is

x^(k+1) = x^(k) - H_k^{-1} ∇f(x^(k))
As the above equation shows, every iteration of Newton's method must compute the inverse of the Hessian matrix at x^(k), and the Hessian is not guaranteed to be positive definite. Quasi-Newton methods therefore approximate the inverse of the Hessian with a matrix that contains no second derivatives; different ways of constructing the approximation matrix yield different quasi-Newton methods.
The BFGS algorithm uses a matrix B_{k+1} to approximate the Hessian H_{k+1}. With

s_k = x^(k+1) - x^(k),  y_k = ∇f(x^(k+1)) - ∇f(x^(k))

the BFGS update formula is

B_{k+1} = B_k - (B_k s_k s_k^T B_k) / (s_k^T B_k s_k) + (y_k y_k^T) / (y_k^T s_k)

Let D_k denote the approximation of the inverse Hessian B_k^{-1} and ρ_k = 1 / (y_k^T s_k); applying the Sherman-Morrison formula, the update can be rewritten as

D_{k+1} = (I - ρ_k s_k y_k^T) D_k (I - ρ_k y_k s_k^T) + ρ_k s_k s_k^T

Rather than storing the dense matrix D_k, L-BFGS keeps only the most recent m pairs (s_k, y_k) and reconstructs the product D_k ∇f(x^(k)) from them at each iteration.
The pseudo-code for the L-BFGS algorithm is as follows:
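The original pseudo-code listing is not reproduced on this page. As a stand-in, the following is an illustrative Python sketch of the standard L-BFGS two-loop recursion with a simple backtracking (Armijo) line search; it is an assumption-laden teaching version, not the patent's exact listing:

```python
import numpy as np

def lbfgs(f_grad, x0, m=10, max_iter=100, tol=1e-8):
    """Minimize f via L-BFGS. f_grad(x) must return (f(x), grad f(x))."""
    x = np.asarray(x0, dtype=float)
    s_hist, y_hist = [], []                  # last m (s_k, y_k) pairs
    f, g = f_grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Two-loop recursion: build d = -(approx. inverse Hessian) @ g
        q = g.copy()
        alphas = []
        for s, yv in zip(reversed(s_hist), reversed(y_hist)):
            rho = 1.0 / (yv @ s)
            a = rho * (s @ q)
            q = q - a * yv
            alphas.append((rho, a))
        if s_hist:                           # initial scaling of H_0
            q = q * ((s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1]))
        for (s, yv), (rho, a) in zip(zip(s_hist, y_hist), reversed(alphas)):
            b = rho * (yv @ q)
            q = q + (a - b) * s
        d = -q
        # Backtracking line search on the Armijo condition
        t = 1.0
        while True:
            f_new, g_new = f_grad(x + t * d)
            if f_new <= f + 1e-4 * t * (g @ d) or t < 1e-12:
                break
            t *= 0.5
        s_new, y_new = t * d, g_new - g
        if s_new @ y_new > 1e-12:            # keep only curvature-positive pairs
            s_hist.append(s_new)
            y_hist.append(y_new)
            if len(s_hist) > m:              # discard the oldest pair
                s_hist.pop(0)
                y_hist.pop(0)
        x, f, g = x + t * d, f_new, g_new
    return x

# Quadratic test problem with its minimum at (1, 2)
fg = lambda x: (((x - np.array([1.0, 2.0])) ** 2).sum(),
                2 * (x - np.array([1.0, 2.0])))
xmin = lbfgs(fg, np.zeros(2))
```

In practice one would call a vetted implementation such as `scipy.optimize.minimize(..., method="L-BFGS-B")` rather than hand-rolling the recursion.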
The classification of material-data subclasses can accurately analyze the problems existing in the data, such as inconsistent letter case, mixed full-width and half-width characters, connectors, non-uniform units, and similar pronunciations; apply a reasonable data preprocessing process to normalize and standardize the data; convert the data into feature-vector form; and classify it with the logistic regression + L2 regularization + L-BFGS optimization method. The above description covers only a preferred embodiment of the present invention, but the scope of the invention is not limited thereto; equivalent alternatives or modifications of the technical solutions and the inventive concept by any person skilled in the art fall within the scope of the present invention.
Claims (7)
1. A method for classifying subclass names corresponding to long descriptions of material data, comprising the following steps:
S1: raw material data: read in the raw material data;
S2: data preprocessing: preprocess the read-in raw material data and standardize it;
S3: category-to-number: encode the category column of the raw material data into numbers;
S4: sample-set division: divide the sample set into a training set and a test set;
S5: feature vectorization: convert the long material descriptions into feature vectors;
S6: classification: learn an objective function that maps each feature vector to a predefined class label;
S7: classification-result evaluation: evaluate the classification results by the classification-result metrics.
2. The method for classifying subclass names corresponding to long descriptions of material data according to claim 1, wherein said S2 comprises the following steps:
S21: unify the units and connectors in the raw material data;
S22: remove brackets and slashes;
S23: perform Chinese word segmentation, then convert the text to pinyin;
S24: convert upper case to lower case and full-width characters to half-width.
3. The method for classifying subclass names corresponding to long descriptions of material data according to claim 1, wherein the raw material data in S3 are the long material descriptions and the subclass names.
4. The method for classifying subclass names corresponding to long descriptions of material data according to claim 1, wherein in S4 the sample set is divided so that the ratio of training-set samples to test-set samples is 7:3.
5. The method for classifying subclass names corresponding to long descriptions of material data according to claim 1, wherein the long material descriptions in S5 are material text data and the feature-vectorization method is the tf-idf algorithm.
6. The method for classifying subclass names corresponding to long descriptions of material data according to claim 1, wherein the classification methods in S6 include logistic regression, naive Bayes, decision trees, support vector machines, K-nearest neighbors, random forests, GBDT, XGBoost, neural networks, and the like.
7. The method for classifying subclass names corresponding to long descriptions of material data according to claim 1, wherein the metrics for evaluating the classification results in S7 include accuracy, recall, and the F1 value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910877234.7A CN110619363A (en) | 2019-09-17 | 2019-09-17 | Classification method for subclass names corresponding to long description of material data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910877234.7A CN110619363A (en) | 2019-09-17 | 2019-09-17 | Classification method for subclass names corresponding to long description of material data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110619363A true CN110619363A (en) | 2019-12-27 |
Family
ID=68923609
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910877234.7A Pending CN110619363A (en) | 2019-09-17 | 2019-09-17 | Classification method for subclass names corresponding to long description of material data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110619363A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11790033B2 (en) | 2020-09-16 | 2023-10-17 | International Business Machines Corporation | Accelerated Quasi-Newton methods on analog crossbar hardware |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108491390A (en) * | 2018-03-28 | 2018-09-04 | 江苏满运软件科技有限公司 | A kind of main line logistics goods title automatic recognition classification method |
CN108777674A (en) * | 2018-04-24 | 2018-11-09 | 东南大学 | A kind of detection method for phishing site based on multi-feature fusion |
CN109165294A (en) * | 2018-08-21 | 2019-01-08 | 安徽讯飞智能科技有限公司 | Short text classification method based on Bayesian classification |
CN109271517A (en) * | 2018-09-29 | 2019-01-25 | 东北大学 | IG TF-IDF Text eigenvector generates and file classification method |
CN109308485A (en) * | 2018-08-02 | 2019-02-05 | 中国矿业大学 | A kind of migration sparse coding image classification method adapted to based on dictionary domain |
-
2019
- 2019-09-17 CN CN201910877234.7A patent/CN110619363A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108491390A (en) * | 2018-03-28 | 2018-09-04 | 江苏满运软件科技有限公司 | A kind of main line logistics goods title automatic recognition classification method |
CN108777674A (en) * | 2018-04-24 | 2018-11-09 | 东南大学 | A kind of detection method for phishing site based on multi-feature fusion |
CN109308485A (en) * | 2018-08-02 | 2019-02-05 | 中国矿业大学 | A kind of migration sparse coding image classification method adapted to based on dictionary domain |
CN109165294A (en) * | 2018-08-21 | 2019-01-08 | 安徽讯飞智能科技有限公司 | Short text classification method based on Bayesian classification |
CN109271517A (en) * | 2018-09-29 | 2019-01-25 | 东北大学 | IG TF-IDF Text eigenvector generates and file classification method |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11790033B2 (en) | 2020-09-16 | 2023-10-17 | International Business Machines Corporation | Accelerated Quasi-Newton methods on analog crossbar hardware |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10685044B2 (en) | Identification and management system for log entries | |
CN107577785B (en) | Hierarchical multi-label classification method suitable for legal identification | |
CN108320171B (en) | Hot-sold commodity prediction method, system and device | |
WO2020199591A1 (en) | Text categorization model training method, apparatus, computer device, and storage medium | |
US9646262B2 (en) | Data intelligence using machine learning | |
JP2020115346A (en) | AI driven transaction management system | |
Park et al. | Explainability of machine learning models for bankruptcy prediction | |
CN107622326B (en) | User classification and available resource prediction method, device and equipment | |
US20170024662A1 (en) | Data driven classification and troubleshooting system and method | |
CN112527970B (en) | Data dictionary standardization processing method, device, equipment and storage medium | |
US20230334119A1 (en) | Systems and techniques to monitor text data quality | |
Abakarim et al. | Towards an efficient real-time approach to loan credit approval using deep learning | |
CN110619363A (en) | Classification method for subclass names corresponding to long description of material data | |
CN116245107B (en) | Electric power audit text entity identification method, device, equipment and storage medium | |
CN117290404A (en) | Method and system for rapidly searching and practical main distribution network fault processing method | |
Sana et al. | Data transformation based optimized customer churn prediction model for the telecommunication industry | |
Zhang et al. | Can sentiment analysis help mimic decision-making process of loan granting? A novel credit risk evaluation approach using GMKL model | |
GUMUS et al. | Stock market prediction by combining stock price information and sentiment analysis | |
CN114443840A (en) | Text classification method, device and equipment | |
Anastasopoulos et al. | Computational text analysis for public management research: An annotated application to county budgets | |
CN112100370B (en) | Picture-trial expert combination recommendation method based on text volume and similarity algorithm | |
Hepburn et al. | Proper losses for learning with example-dependent costs | |
AU2020104034A4 (en) | IML-Cloud Data Performance: Cloud Data Performance Improved using Machine Learning. | |
CN116932487B (en) | Quantized data analysis method and system based on data paragraph division | |
US11816427B1 (en) | Automated data classification error correction through spatial analysis using machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20191227 |
|
RJ01 | Rejection of invention patent application after publication |