CN112580686A - Aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance KNN algorithm - Google Patents

Aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance KNN algorithm

Info

Publication number
CN112580686A
Authority
CN
China
Prior art keywords
electrolytic capacitor
aluminum electrolytic
value
data
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011299561.8A
Other languages
Chinese (zh)
Other versions
CN112580686B (en)
Inventor
郑鑫
陈建琪
徐楠楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Mengdou Network Technology Co ltd
Original Assignee
Qingdao Mengdou Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Mengdou Network Technology Co ltd
Priority to CN202011299561.8A
Publication of CN112580686A
Application granted
Publication of CN112580686B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention provides an aluminum electrolytic capacitor purchase prediction method based on a Mahalanobis distance KNN algorithm, comprising the following steps. Step 1: confirming the key parameters of the aluminum electrolytic capacitor by analyzing its material descriptions; the key parameter items are preliminarily extracted with a frequent item set extraction method and finally confirmed from the parameter items that appear in the frequent item sets. Step 2: determining a parameter matching scheme according to the decomposition rules of the material description, and determining the range of usable products. Step 3: predicting whether the current user will purchase each product with a KNN algorithm based on the Mahalanobis distance, and preliminarily screening a list of products the user is likely to purchase. The method realizes product confirmation and purchase prediction for aluminum electrolytic capacitors, saving the user time, improving working efficiency, and improving the platform experience.

Description

Aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance KNN algorithm
Technical Field
The invention relates to the technical field of aluminum electrolytic capacitor decomposition and identification, in particular to an aluminum electrolytic capacitor purchase prediction method based on a KNN algorithm of Mahalanobis distance.
Background
In the electronic component industry, especially for components such as resistors and capacitors, the manufacturing threshold is low, so brands, series, and suppliers are numerous. At present, the product for an aluminum electrolytic capacitor request is generally determined either by locating it directly through the original factory model, or by directly comparing the material description against the strings stored in the database to judge whether a product is the one the user requires. Locating directly by the original factory model is relatively accurate. Material descriptions, however, are not suited to direct locating: when a user identifies a product by its material description, direct operations such as testing whether two descriptions are identical or whether one contains the other are neither accurate nor comprehensive enough, and they place high demands on how completely and consistently the product descriptions in the database are written.
To provide users with a better purchasing experience, a better understanding of their purchasing habits and purchasing criteria is needed. When a user has a purchase history on the platform and has no uniqueness requirement for the product (a uniqueness requirement means that the user specifies an original factory model; in that case no product determination is needed and the product can be located directly), products that better match the user's needs can be offered according to the user's past purchases.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the difficulty of decomposing and recommending aluminum electrolytic capacitors, the invention provides a parameter decomposition method for aluminum electrolytic capacitors and a purchase prediction method based on a KNN (K-nearest neighbor) algorithm using the Mahalanobis distance. The method predicts whether the current user will purchase a product and preliminarily screens a list of products the user is likely to purchase, which saves the user time, improves working efficiency, and maximizes the platform experience.
In order to solve the problems, the technical scheme adopted by the invention is as follows:
the aluminum electrolytic capacitor purchase prediction method based on the KNN algorithm of the Mahalanobis distance is characterized by comprising the following steps of:
step 1: confirming the key parameters of the aluminum electrolytic capacitor by analyzing the materials of the aluminum electrolytic capacitor, preliminarily extracting the key parameter items of the aluminum electrolytic capacitor by adopting a frequent item set extraction method, and finally confirming the key parameter items of the aluminum electrolytic capacitor by the parameter items appearing in the frequent item set;
step 2: determining a parameter matching scheme according to a decomposition rule of the material description, and determining a product range which can be used;
step 3: predicting whether the current user will purchase each product by a purchase prediction method based on the KNN algorithm with the Mahalanobis distance, and preliminarily screening a list of products the user is likely to purchase.
Further, in the step 1, the key parameters of the aluminum electrolytic capacitor are determined by analyzing the materials of the aluminum electrolytic capacitor, and the method specifically comprises the following steps:
step 1.1: collecting material description of the aluminum electrolytic capacitor, and constructing a material description data set of the aluminum electrolytic capacitor;
step 1.2: cleaning data: removing material descriptions that are blank or contain only Chinese characters, then converting the numerical expressions in the material descriptions into a uniform value; the cleaned aluminum electrolytic capacitor data set is denoted D;
step 1.3: collecting the set $W_1$ of all characters appearing in the aluminum electrolytic capacitor descriptions, and removing duplicates;
step 1.4: extracting a frequent item set S according to the principle: if a set of items is a frequent set of items, then all of its non-empty subsets are also frequent; conversely, if a set of items is infrequent, then all of its supersets are also infrequent;
step 1.5: and extracting the character string with higher frequency of occurrence to represent the parameters according to the character string extracted from the set S, and finally determining the key parameter item of the aluminum electrolytic capacitor.
Further, the step 1.4 of extracting the frequent item set S includes:
step 1.4.1: combining the characters in the character set $W_1$ pairwise to form a set $W_2$; if a character string in $W_2$ occurs in the data set D more than L times, it is a subset of a frequent item set, and these character strings form the set $T_2$; L is determined by the size of the data set, L = N × 10%, where N is the number of records in the data set; otherwise, no item set containing the character string is a frequent item set;
step 1.4.2: combining $T_2$ with $W_1$, with the character string from $T_2$ in front and the character from $W_1$ behind, to form a set $W_3$ of character strings; if a character string in $W_3$ occurs more than L times, it is a subset of a frequent item set, and these character strings form the set $T_3$; otherwise, no item set containing the character string is a frequent item set;
step 1.4.3: combining $T_3$ with $W_1$ and repeating step 1.4.2 to find frequent item sets, obtaining the character string set $T_4$; the loop continues in this way and ends when $T_n$ is empty;
step 1.4.4: forming the set of string items $W = [T_{n-1}, \dots, T_3, T_2, T_1, W_1]$;
step 1.4.5: counting the number of occurrences of each character string of W in the material descriptions to form a matrix FW, which contains the character strings and their corresponding numbers of occurrences;
step 1.4.6: merging character strings: merging starts from the shorter character strings and proceeds upward to the longer ones; for a short character string and the long character strings that contain it, if the number of occurrences of the long character strings is greater than or equal to that of the short character string, the short character string and its count are removed from FW; if it is less, the count of the short character string is changed to the number of occurrences of the short character string minus the number of occurrences of the long character strings; in both cases, the number of occurrences of the long character strings is the sum of the occurrences of all nearest superset character strings of the short character string;
step 1.4.7: sorting: the character strings are sorted from high to low by number of occurrences, and the character strings whose number of occurrences exceeds 30% of the total number N of records in the data set form the set S.
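The level-by-level growth in steps 1.4.1 to 1.4.3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `frequent_strings`, the `min_ratio` parameter and the sample descriptions are hypothetical, and the sketch assumes that a string's "number of occurrences" means the number of descriptions containing it as a contiguous substring.

```python
def frequent_strings(descriptions, min_ratio=0.10):
    """Grow frequent substrings level by level (steps 1.4.1 to 1.4.3).

    Assumption: a string's occurrence count is the number of
    descriptions that contain it as a contiguous substring.
    """
    n = len(descriptions)
    threshold = n * min_ratio                      # L = N x 10% by default
    alphabet = sorted(set("".join(descriptions)))  # W1, deduplicated

    def support(s):
        return sum(1 for d in descriptions if s in d)

    levels = []                                    # [T2, T3, ...]
    current = alphabet                             # seeds used to build W2
    while True:
        # Frequent string in front, single character from W1 behind.
        candidates = [prefix + ch for prefix in current for ch in alphabet]
        frequent = [c for c in candidates if support(c) > threshold]
        if not frequent:                           # Tn is empty: stop
            break
        levels.append(frequent)
        current = frequent
    return alphabet, levels


if __name__ == "__main__":
    docs = ["100UF25V", "220UF25V", "470UF16V", "100UF50V"]
    _, found = frequent_strings(docs, min_ratio=0.5)
    print(found)
```

Because an infrequent string is never extended, the candidate set shrinks quickly, which is the pruning that the principle in step 1.4 provides.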
Further, in the step 2, a parameter matching scheme is determined according to a decomposition rule described in the material, and a product range which can be used is determined, and the specific steps include:
step 2.1: determining whether the user input is an original factory model or a material description;
step 2.2: if the input is determined in step 2.1, through the platform database, to be an original factory model, the subsequent product determination steps are skipped and the product is located directly; if the input is determined in step 2.1 to be a material description, the process continues with the following steps;
step 2.3: confirming the product category: checking whether the material description contains a category name or alias from the corresponding data, and thereby confirming the category of the material description;
step 2.4: determining the parameters of the aluminum electrolytic capacitor: in the material description about the aluminum electrolytic capacitor accumulated by the platform, referring to step 1, the key parameters for determining the aluminum electrolytic capacitor product are obtained by statistics as follows:
(1) capacity value: float value, unified unit UF, unit not displayed;
(2) rated voltage: float value, unified unit V, unit not displayed;
(3) precision: float value, unified unit %, unit not displayed;
(4) service life: int value, unified unit HRS, unit not displayed;
(5) working temperature: character type; the low and high working temperature values are both int values, joined by the symbol '/', and the result is returned as a character string;
(6) diameter: float value, unified unit MM, unit not displayed;
(7) height: float value, unified unit MM, unit not displayed;
(8) foot distance: float value, unified unit MM, unit not displayed;
(9) installation mode: character type;
the output format of the parameter items is the same as the data format of the unified product parameter items in the database;
step 2.5: unifying symbols: some symbols are replaced uniformly to facilitate subsequent operations and parameter extraction;
step 2.6: extracting the installation mode: the installation mode corresponds to certain descriptive words and is extracted according to that correspondence; the installation mode and the words it was extracted from are recorded, and those words are deleted from the material description so that extracted information is not used twice;
step 2.7: extracting the precision; the extracted value and the characters that originally expressed it are recorded, and the material description is updated;
step 2.8: unifying distance notation: distance units are unified to MM, and the unified distance values are stored together with the original characters, i.e. the correspondence between the converted data and the original data is recorded;
step 2.9: extracting the diameter, height and foot distance; extraction is performed on the material description after it has been converted to upper case and every character other than digits and the letter X has been replaced by a space;
step 2.10: uniformly replacing the expressions for capacity value, voltage, temperature and time in the material description with their respective unified characters; the converted values and the corresponding original characters are stored;
step 2.11: extracting the capacity value, voltage and service life; updating the material description;
step 2.12: extracting the temperature range; updating the material description;
step 2.13: extracting a capacity value expressed with capacitor-specific notation; this step is entered only if the capacity value was not successfully extracted in step 2.11, and the material description is updated after successful extraction;
step 2.14: extracting a capacity value, voltage and precision expressed in scientific notation; this step is entered if the capacity value, the voltage or the precision has not been successfully extracted before this step;
step 2.15: extracting a capacity value expressed as a pure number; this step is entered if the capacity value has not been successfully extracted in the preceding steps; a pure number in the material description with no letters before or after it is output as the capacity value, and the material description is updated;
step 2.16: extracting a precision expressed by a letter; this step is entered if the precision has not been successfully extracted in the preceding steps; among the letters that stand alone in the material description, i.e. with no adjacent letters before or after them, the first to appear is taken as the precision, and the material description is updated;
step 2.17: extracting the foot distance; this step is entered if the foot distance has not been successfully extracted in the preceding steps; among the remaining numbers in the material description that have no adjacent letters before or after them, a value that lies in [1, 50] and can be represented with one decimal place is output as the foot distance;
step 2.18: confirming the parameter items; if all necessary parameter items have been successfully extracted, the process proceeds to the next step to confirm the product; otherwise, the user is reminded that parameters are missing and asked to complete the corresponding parameter items;
step 2.19: confirming the product; the products in the database that match the values of the extracted parameter items are determined to be purchasable products, and the likelihood that the user purchases each of them is then predicted by the KNN algorithm based on the Mahalanobis distance.
Further, the necessary parameter items of the aluminum electrolytic capacitor in step 2.18 include: capacity, rated voltage, precision, operating temperature, diameter/length, foot distance.
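As an illustration of the "extract, record, then update the description" pattern used in steps 2.10 and 2.11, the following sketch pulls a capacity value and a rated voltage out of a free-text description. It is only a sketch under stated assumptions: the regular expressions, the `CAP_UNITS` table, the function name and the output keys are hypothetical and are not the rules used by the patent.

```python
import re

# Hypothetical unit table: multipliers that convert a value to microfarads (UF).
CAP_UNITS = {"UF": 1.0, "NF": 1e-3, "PF": 1e-6}


def extract_cap_and_voltage(description):
    """Return (params, remaining_text) for one material description."""
    text = description.upper()
    params = {}

    # Capacity value: a number followed by a capacitance unit.
    cap = re.search(r"(\d+(?:\.\d+)?)\s*(UF|NF|PF)", text)
    if cap:
        params["capacitance_uf"] = float(cap.group(1)) * CAP_UNITS[cap.group(2)]
        # Delete the matched text so it cannot be extracted a second time.
        text = text[:cap.start()] + " " + text[cap.end():]

    # Rated voltage: a number followed by V that is not part of a longer word.
    volt = re.search(r"(\d+(?:\.\d+)?)\s*V(?![A-Z])", text)
    if volt:
        params["rated_voltage_v"] = float(volt.group(1))
        text = text[:volt.start()] + " " + text[volt.end():]

    return params, " ".join(text.split())


if __name__ == "__main__":
    print(extract_cap_and_voltage("Aluminum cap 100uF 25V 105C"))
```

Removing each matched span before the next extraction mirrors the requirement in step 2.6 that extracted information is not used repeatedly; the later fallback steps (2.13 to 2.17) would then operate only on whatever text remains.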
Further, the step 3 of predicting the purchase of the product at the current user by using the KNN algorithm based on the mahalanobis distance includes the specific steps of:
step 3.1: preparing data:
data preparation is carried out through the purchase records of the users, so that data samples of different categories, including purchase or non-purchase, are kept in a ratio of 1: 1; setting the number of samples as N, and simultaneously ensuring that the number of the samples is greater than the characteristic dimension L of the samples;
step 3.2: calculating the distance:
calculating the distance between the current sample to be classified and each sample in the classified samples in the training set; calculating the distance between the samples by adopting the Mahalanobis distance;
step 3.3: sorting distances:
sorting in ascending order the distances $D_i$, $i = 1, 2, \dots, N$, between the current sample to be classified and each sample in the training set;
step 3.4: determining neighbor samples:
selecting the first K samples in the sorted order as the neighbor samples of the current sample to be classified;
step 3.5: counting the category attributes of the neighbor samples:
counting the category attributes of the K neighbor samples, where $t_1$ is the number of samples of class $w_1$ and $t_2$ is the number of samples of class $w_2$;
step 3.6: determining the category attribute of the sample to be classified:
when $t_1 > t_2$, the category attribute of the sample to be classified is $w_1$, i.e., purchase;
when $t_1 < t_2$, the category attribute of the sample to be classified is $w_2$, i.e., not purchased;
step 3.7: and adding the prediction result of the purchasable products into the list recommended to be purchased for the user.
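Steps 3.3 to 3.6 amount to a majority vote among the K nearest training samples. The sketch below assumes the distances have already been computed; the function name `knn_vote` and the 0/1 label encoding are illustrative choices. The source does not say what happens when $t_1 = t_2$, so the sketch arbitrarily returns "not purchased" in that case, which cannot occur when K is odd and at least K samples are available.

```python
def knn_vote(distances, labels, k):
    """Classify by majority vote among the k nearest samples (steps 3.3 to 3.6).

    distances[i] is the distance from the query to training sample i.
    labels[i] is 1 for "purchase" (class w1) and 0 for "not purchased" (w2).
    """
    # Step 3.3: indices of the training samples in ascending distance order.
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    # Step 3.4: keep the first k as neighbours.
    neighbours = order[:k]
    # Step 3.5: count each class among the neighbours.
    t1 = sum(1 for i in neighbours if labels[i] == 1)
    t2 = len(neighbours) - t1
    # Step 3.6: majority decides; a tie falls back to 0 (assumption).
    return 1 if t1 > t2 else 0


if __name__ == "__main__":
    d = [0.5, 2.0, 0.1, 1.5, 0.9]
    y = [1, 0, 1, 0, 0]
    print(knn_vote(d, y, k=3))
```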
Further, calculating the distance between samples using the Mahalanobis distance in step 3.2 specifically includes: calculating the distance $D(Y_i, Y_j)$ between samples, where $D(Y_i, Y_j)$ denotes the Mahalanobis distance between samples $Y_i$ and $Y_j$, computed as follows:
step 3.2.1: centering the data;
$$X = Y - \bar{Y}$$
wherein $Y$ represents the raw data, $X$ represents the centered data, and $\bar{Y}$ represents the mean of the raw data;
step 3.2.2: solving the mapping matrix; the eigenvalues and the corresponding eigenvectors are solved from the covariance matrix of the centered data,
$$\Sigma_X = \frac{1}{N-1} X^{\mathsf{T}} X;$$
step 3.2.3: sorting the eigenvalues in a descending order; sorting the eigenvalues solved in the step 3.2.2 in a descending order, and further sorting the eigenvectors corresponding to the eigenvalues to form an eigenvector matrix V; the matrix V represents the complete principal component space;
step 3.2.4: obtaining the data in the rotated space; the centered data are mapped onto the principal component space, giving the rotated data $Z = XV$;
step 3.2.5: calculating the Euclidean distance in the new coordinate system, which corresponds to the Mahalanobis distance between the original data;
[formula image not recovered: definition of $D_i(z_0 - z_i)$]
wherein $D_i(z_0 - z_i)$ is the Mahalanobis distance between the current sample $Y_0$ and training-set sample $Y_i$; $z_0$ is the coordinate position of the current sample in the new coordinate system, and $z_i$ is the coordinate position of the $i$-th training-set sample $Y_i$ in the new coordinate system;
the smaller the Mahalanobis distance, the higher the similarity between samples; the larger the distance, the lower the similarity.
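Steps 3.2.1 to 3.2.5 can be sketched with NumPy as below. One point is an assumption rather than something the extracted text states: a pure rotation leaves Euclidean distances unchanged, so the sketch additionally divides each rotated axis by the square root of its eigenvalue, which is the usual way the rotated-space Euclidean distance becomes the Mahalanobis distance. The function name and the choice of the sample covariance (divisor N − 1) are likewise illustrative.

```python
import numpy as np


def mahalanobis_via_rotation(Y, y0):
    """Distances from y0 to every row of Y, following steps 3.2.1 to 3.2.5.

    Assumption: after rotating, each axis is divided by the square root of
    its eigenvalue; without that scaling the result would be the plain
    Euclidean distance.
    """
    Y = np.asarray(Y, dtype=float)
    mean = Y.mean(axis=0)
    X = Y - mean                                 # step 3.2.1: centre the data
    cov = X.T @ X / (len(Y) - 1)                 # sample covariance matrix
    eigvals, V = np.linalg.eigh(cov)             # step 3.2.2: eigen-decomposition
    order = np.argsort(eigvals)[::-1]            # step 3.2.3: descending order
    eigvals, V = eigvals[order], V[:, order]
    Z = X @ V                                    # step 3.2.4: rotate, Z = XV
    z0 = (np.asarray(y0, dtype=float) - mean) @ V
    scaled = (Z - z0) / np.sqrt(eigvals)         # per-axis standardisation
    return np.sqrt((scaled ** 2).sum(axis=1))    # step 3.2.5: Euclidean norm


if __name__ == "__main__":
    samples = [[100, 25], [220, 16], [470, 50], [330, 35], [150, 10]]
    print(mahalanobis_via_rotation(samples, [200, 30]))
```

The division by the eigenvalues requires a non-singular covariance matrix, which is one reason step 3.1 asks for more samples than feature dimensions.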
Further, the K value used when selecting the first K samples in step 3.4 is determined as follows:
the final K value is chosen according to the accuracy of a leave-one-out experiment on the current sample data; each time, one sample of known category is held out as the test sample and the remaining samples serve as the training set; for a candidate K value, the held-out sample is predicted by the KNN method and the accuracy is recorded; the K value with the highest accuracy is adopted;
$$P_K = \frac{1}{N} \sum_{i=1}^{N} \mathrm{KNN}(Y_i), \quad K \text{ is odd}$$
wherein $P_K$ represents the average accuracy for the current K value; $\mathrm{KNN}(Y_i)$ represents the prediction result when sample $Y_i$ is the test sample and the remaining samples are the training set: $\mathrm{KNN}(Y_i) = 1$ when the predicted category is the same as the category of $Y_i$, and $\mathrm{KNN}(Y_i) = 0$ when it differs; $\sum_{i=1}^{N} \mathrm{KNN}(Y_i)$ represents the number of correct predictions among the N samples; the K value for which $P_K$ is largest is taken as the final K value.
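The leave-one-out selection of K can be sketched as follows. To keep the example short it assumes a precomputed pairwise distance matrix `dist`, whereas a faithful version would recompute the Mahalanobis distance on each reduced training set; the function name, the candidate list and the rule that ties keep the smaller K are illustrative assumptions.

```python
def choose_k(dist, labels, candidates=(1, 3, 5, 7)):
    """Pick K by leave-one-out accuracy P_K = (1/N) * sum_i KNN(Y_i).

    dist[i][j] is the distance between samples i and j; labels are 0 or 1.
    """
    n = len(labels)
    best_k, best_acc = None, -1.0
    for k in candidates:
        correct = 0
        for i in range(n):
            # Hold out sample i; the k nearest of the others are its neighbours.
            others = sorted((j for j in range(n) if j != i),
                            key=lambda j: dist[i][j])[:k]
            votes = sum(labels[j] for j in others)
            predicted = 1 if votes > len(others) - votes else 0
            correct += int(predicted == labels[i])      # this is KNN(Y_i)
        acc = correct / n                                # this is P_K
        if acc > best_acc:                               # ties keep the smaller K
            best_k, best_acc = k, acc
    return best_k, best_acc


if __name__ == "__main__":
    points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    y = [1, 1, 1, 0, 0, 0]
    dist = [[abs(a - b) for b in points] for a in points]
    print(choose_k(dist, y, candidates=(1, 3, 5)))
```

Restricting the candidates to odd values follows the constraint stated with the formula and avoids tied votes in the two-class case.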
The technical scheme provided by the embodiments of the invention has at least the following beneficial effects:
(1) The key parameter items of the aluminum electrolytic capacitor are preliminarily extracted with a frequent item set extraction method and finally confirmed from the parameter items that appear in the frequent item sets. Counting, extracting and confirming the key parameters involves little manual participation, so the confirmed result is objective and accurate and better matches what customers actually require when selecting products. The symbols and rules for decomposing the parameters are determined for the confirmed parameters of the aluminum electrolytic capacitor.
(2) The prepared data consist of scores or measurements of different aspects, and the measurement units and scoring standards in the raw data differ. If distances were computed directly, some dimensions might contribute little or nothing to the distance while others might contribute heavily or even dominate it. The Mahalanobis distance is therefore used as the metric: it is not affected by the scale of each dimension, so the Mahalanobis distance between two samples is independent of the measurement units and scoring standards of the raw sample data. It also takes the correlation between features into account and thereby eliminates the interference of correlated variables. The Mahalanobis distance computed from normalized data is the same as that computed from centered data, so no calculation error is introduced.
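The unit-independence claimed here can be checked with a short derivation. As a sketch, assume samples are stored as rows, $\Sigma$ is the covariance matrix of the raw data, and a change of units or scoring scale is modeled by an invertible matrix $A$ (for example a diagonal matrix of conversion factors); $A$, $Y'$, $\Sigma'$ and $d'$ are symbols introduced only for this illustration.

```latex
\begin{aligned}
Y' &= YA, \qquad \Sigma' = A^{\mathsf T}\,\Sigma\,A,\\
d'^2 &= (y_i - y_j)\,A\,\bigl(A^{\mathsf T}\Sigma A\bigr)^{-1}A^{\mathsf T}\,(y_i - y_j)^{\mathsf T}\\
     &= (y_i - y_j)\,A\,A^{-1}\,\Sigma^{-1}\,A^{-\mathsf T}A^{\mathsf T}\,(y_i - y_j)^{\mathsf T}\\
     &= (y_i - y_j)\,\Sigma^{-1}\,(y_i - y_j)^{\mathsf T} = d^2 .
\end{aligned}
```

The rescaling cancels against the inverse covariance, so the distance between two samples is unchanged, whereas the plain Euclidean distance would change with $A$.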
(3) KNN classifies a sample by the distances between it and other samples. The idea is that if most of the K most similar samples in the feature space (i.e. the nearest samples) belong to a certain class, the sample also belongs to that class. The KNN method has low training time complexity, O(n); it makes no assumptions about the data, has high accuracy, and is insensitive to outliers; new data can be added directly to the data set without retraining, so data can be added flexibly and the model does not need to be replaced periodically.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
fig. 1 is a flowchart of an aluminum electrolytic capacitor purchase prediction method based on the KNN algorithm of mahalanobis distance disclosed in the embodiment of the present invention.
Detailed Description of Embodiments
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiment of the invention provides an aluminum electrolytic capacitor purchase prediction method based on a Mahalanobis distance KNN algorithm, which comprises three main steps: (1) confirming the parameters of the aluminum electrolytic capacitor; (2) determining the aluminum electrolytic capacitor product; (3) predicting purchases with the KNN algorithm based on the Mahalanobis distance. The key parameters of the aluminum electrolytic capacitor are confirmed by analyzing its material descriptions: the key parameter items are preliminarily extracted with a frequent item set extraction method and finally confirmed from the parameter items that appear in the frequent item sets. A parameter matching scheme is determined according to the decomposition rules of the material description, and the range of usable products is determined. A purchase prediction method based on the KNN algorithm with the Mahalanobis distance then predicts whether the current user will purchase each product and preliminarily screens a list of products the user is likely to purchase. The method is described in detail below.
(1) Parameter confirmation of aluminum electrolytic capacitor
Step 1.1: collecting material descriptions of aluminum electrolytic capacitors and constructing a material description data set.
Step 1.2: cleaning the data. Material descriptions that are blank or contain only Chinese characters are removed, and the numerical expressions in the material descriptions are then converted into a uniform value (different numbers combined with the same character express the same kind of quantity; for example, 25V and 50V both express a voltage value). The cleaned aluminum electrolytic capacitor data set is denoted D.
Step 1.3: collecting the set $W_1$ of all characters appearing in the aluminum electrolytic capacitor descriptions (Chinese and English punctuation are excluded, while mathematical symbols such as × and ± are retained), and removing duplicates.
Step 1.4: extracting the frequent item set S. The principle is the Apriori property: if an item set is frequent, then all of its non-empty subsets are also frequent; for example, if {0, 1} is frequent, then {0} and {1} must also be frequent.
Wherein, the step of extracting the frequent item set S comprises the following steps:
Step 1.4.1: combining the characters in the character set $W_1$ pairwise to form a set $W_2$. If a character string in $W_2$ occurs in D more than L times (L is determined by the size of the data set, L = N × 10%, where N is the number of records in the data set), it is a subset of a frequent item set, and these character strings form the set $T_2$; otherwise, no item set containing the character string is a frequent item set.
Step 1.4.2: combining $T_2$ with $W_1$, with the character string from $T_2$ in front and the character from $W_1$ behind, to form a set $W_3$ of character strings. If a character string in $W_3$ occurs more than L times, it is a subset of a frequent item set, and these character strings form the set $T_3$; otherwise, no item set containing the character string is a frequent item set.
Step 1.4.3: combining $T_3$ with $W_1$ and repeating step 1.4.2 to find frequent item sets, obtaining the character string set $T_4$. The loop continues in this way and ends when $T_n$ is empty.
Step 1.4.4: forming the set of string items $W = [T_{n-1}, \dots, T_3, T_2, T_1, W_1]$.
Step 1.4.5: counting occurrences. The number of occurrences of each character string of W in the material descriptions is counted to form a matrix FW, which contains the character strings and their corresponding numbers of occurrences.
Step 1.4.6: and merging the character strings. When merging character strings, the process proceeds from the character string with a shorter length to the upper side. A long character string and a subset character string, and if the number of occurrences of the long character string is greater than or equal to the number of occurrences of the short character string in FW, removing the short character string and the corresponding number of occurrences thereof (the number of occurrences of the long character string here is the sum of the number of occurrences of all superset character strings closest to the short character string); if the number of occurrences of the long character string is smaller than that of the short character string, the number of occurrences of the short character string is modified to be the number of occurrences of the short character string-the number of occurrences of the long character string (the number of occurrences of the long character string here is the sum of the numbers of occurrences of all superset character strings closest to the short character string). If 5pf, 5p, 5, 5V strings exist in FW, the nearest superset strings corresponding to short string 5 here are 5p and 5V, and the nearest superset string corresponding to short string 5p is 5 pf.
Step 1.4.7: and sorting, namely sorting the character strings from high to low according to the occurrence times of the character strings, wherein the occurrence times of the character strings exceed 30% of the total number N of the data sets to form a set S.
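The candidate growth in steps 1.4.1 to 1.4.4 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the names `occurrences` and `grow_frequent_strings` are hypothetical, occurrences are counted as non-overlapping matches summed over all records (an assumption), and the two sample records are invented.

```python
def occurrences(pattern, records):
    """Total non-overlapping occurrences of pattern over all records (assumed counting rule)."""
    return sum(record.count(pattern) for record in records)


def grow_frequent_strings(records, threshold):
    """Return (W1, [T2, T3, ...]) by extending frequent strings one character at a time."""
    alphabet = sorted({ch for record in records for ch in record})  # W1, deduplicated
    candidates = [a + b for a in alphabet for b in alphabet]        # W2
    levels = []
    while True:
        frequent = [s for s in candidates if occurrences(s, records) > threshold]
        if not frequent:  # Tn is empty, so the loop ends
            break
        levels.append(frequent)
        # Next candidate set: frequent string first, single character after it
        candidates = [s + ch for s in frequent for ch in alphabet]
    return alphabet, levels


records = ["5uF5V", "5uF5V5HRS"]   # invented, already-cleaned descriptions
threshold = len(records) * 0.10    # L = N x 10%
w1, levels = grow_frequent_strings(records, threshold)
print(w1)          # W1
print(levels[0])   # T2
print(levels[1])   # T3
```

Because a string can only be frequent if its prefix was frequent one level earlier, each level needs to test only extensions of the previous level rather than every possible string.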
Step 1.5: From the strings in the set S, extract the frequently occurring strings that express parameters (for example, 50PF expresses a capacitance value and 50V expresses a voltage), and thereby determine the key parameter items of the aluminum electrolytic capacitor.
The following example illustrates the steps of confirming the aluminum electrolytic capacitor parameters in part (1):
Example: because the real data set is large and a full demonstration would be complex, the example below only shows the key steps of parameter confirmation, using assumed data.
Step 1.1: Suppose there are three strings: '1000 uF/25V, 10 x 20, 105 ℃, 20%, 5', '2200 uF/35V, 16 x 25, 105 ℃, 20%, 7.5, 6000 HRS' and 'capacitance'.
Step 1.2: Clean the data; here every number is uniformly replaced by 5. The cleaned data are as follows:
Figure BDA0002786416250000111
Step 1.3: Collect the current character set W1:
W1={R,V,S,u,%,F,5,H,℃,*}
Step 1.4: Extract the frequent item set S.
Step 1.4.1: Obtain W2 from W1. Here L = 0.2, so T2 is obtained as follows:
W2={RR,RV,RS,Ru,R%,RF,R5,RH,R℃,R*,VR,VV,VS,Vu,……}
T2={RS,uF,5V,5u,5%,5H,5℃,5*,HR,*5}
Step 1.4.2: Obtain W3 from T2 and W1, and from it T3, as follows:
W3={RSR,RSV,RSS,RSu,RS%,RSF,RS5,RSH,RS℃,RS*,……}
T3={5uF,5HR,5*5,HRS}
Step 1.4.3: Obtain W4 from T3 and W1, and from it T4; continue in the same way until Tn is empty:
W4={5uFR,5uFV,5uFS,5uFu,5uF%,5uFF,5uF5,5uFH,5uF℃,5uF*,……}
T4={5HRS}
W5={5HRSR,5HRSV,5HRSS,5HRSu,5HRS%,5HRSF,5HRS5,5HRS℃,……}
T5={}
T5 is empty, so the loop ends.
Step 1.4.4: The set W of string item sets is obtained as follows:
W={5HRS,5uF,5HR,5*5,HRS,RS,uF,5V,5u,5%,5H,5℃,5*,HR,*5,R,V,S,u,%,F,5,H,℃,*}
Step 1.4.5: Count the occurrences of each string to obtain FW.
FW={{5HRS,1},{5uF,2},{5HR,1},{5*5,2},{HRS,1},{RS,1},{uF,2},{5V,2},{5u,2},{5%,2},{5H,1},{5℃,2},{5*,2},{HR,1},{*5,2},{R,1},{V,2},{S,1},{u,2},{%,2},{F,2},{5,15},{H,1},{℃,2},{*,2}}
Step 1.4.6: Merge strings. This step starts from the shortest strings. The example here follows only the main line 5 → F → u → 5u → uF → 5uF and does not work through every string as step 1.4.6 of the frequent item set extraction would.
(1) The string '5' occurs 15 times. Its nearest supersets are '5V', '5u', '5%', '5H', '5℃', '5*' and '*5', whose occurrences sum to 2+2+2+1+2+2+2 = 13, so the entry for '5' in FW is changed to {5, 2}.
(2) The string 'F' occurs 2 times. Its nearest superset is 'uF', which occurs 2 times, so 'F' and its count are removed from FW.
(3) The string 'u' occurs 2 times. Its nearest supersets are '5u' and 'uF', whose occurrences sum to 2+2 = 4, so 'u' and its count are removed from FW.
(4) The string 'uF' occurs 2 times. Its nearest superset is '5uF', which occurs 2 times, so 'uF' and its count are removed from FW.
After the strings are merged according to step 1.4.6 of the frequent item set extraction, FW is:
FW={{5HRS,1},{5uF,2},{5*5,2},{5V,2},{5%,2},{5℃,2},{5,2}}
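The merge rule of step 1.4.6 and the ranking of step 1.4.7 can be checked against this example with a short Python sketch. It is an illustration only: `merge_counts` is a hypothetical name, and "nearest supersets" is interpreted here as the shortest strings in FW that contain the short string, which is an assumption consistent with the 5 / 5p / 5pf example in the text.

```python
def merge_counts(fw):
    """Apply the step 1.4.6 merge rule to a {string: occurrences} mapping."""
    merged = {}
    for short in sorted(fw, key=len):  # start from the shortest strings
        supersets = [s for s in fw if len(s) > len(short) and short in s]
        if supersets:
            nearest_len = min(len(s) for s in supersets)
            covered = sum(fw[s] for s in supersets if len(s) == nearest_len)
        else:
            covered = 0
        if covered < fw[short]:
            # Keep the short string with the occurrences not explained by its supersets
            merged[short] = fw[short] - covered
        # Otherwise the short string is fully covered and is dropped
    return merged


fw = {"5HRS": 1, "5uF": 2, "5HR": 1, "5*5": 2, "HRS": 1, "RS": 1, "uF": 2,
      "5V": 2, "5u": 2, "5%": 2, "5H": 1, "5℃": 2, "5*": 2, "HR": 1, "*5": 2,
      "R": 1, "V": 2, "S": 1, "u": 2, "%": 2, "F": 2, "5": 15, "H": 1,
      "℃": 2, "*": 2}
merged = merge_counts(fw)

n_records = 2  # two records remain after cleaning in the example
ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
s = [string for string, count in ranked if count > 0.30 * n_records]
print(merged)
print(s)
```

Strings are looked up in the original `fw` rather than in `merged`, so removing a short string never changes the counts used for longer ones.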
Step 1.4.7: Sort. 30% of the total number N is 0.6, and every remaining string exceeds this threshold, so S = {{5uF, 2}, {5*5, 2}, {5V, 2}, {5%, 2}, {5℃, 2}, {5, 2}, {5HRS, 1}}.
Step 1.5: From the strings in the set S, extract the frequently occurring strings that express parameters and determine the meaning of each parameter item: 5uF represents capacitance; 5*5 represents diameter (length) and height (width); 5V represents voltage; 5% represents precision; 5℃ represents working temperature; 5HRS represents service life.
(2) Product determination of aluminum electrolytic capacitor
Step 2.1: Determine whether the user input is an original manufacturer part number or a material description. A part number input covers both an input consisting only of the manufacturer part number and an input combining the part number with a material description; "original manufacturer part number" below includes both cases. A material description input is one in which no manufacturer part number is recognized. This happens in two cases: the input is a plain material description, or it contains a manufacturer part number that is not recorded on the platform; "material description" below includes both cases.
Step 2.2: If step 2.1 identifies a manufacturer part number through the platform's database, the remaining product determination steps are skipped and the product is located directly. If step 2.1 identifies the input as a material description, continue with the following steps.
Step 2.3: Confirm the product category. Check whether the material description contains a category name, alias or similar term from the corresponding data, and thereby confirm the category of the material description. If it is the material description of an aluminum electrolytic capacitor, the following parameter decomposition process applies.
Step 2.4: Determine the parameters of the aluminum electrolytic capacitor. The key parameters that determine an aluminum electrolytic capacitor product are obtained statistically from the material descriptions accumulated by the platform (the statistical process is set out in part (1), confirmation of the aluminum electrolytic capacitor parameters) and are listed in the following table:
TABLE 1 aluminium electrolytic capacitor key parameter table
Figure BDA0002786416250000141
(1) Capacitance value: float. The uniform unit is UF (microfarads); the unit is not shown.
(2) Rated voltage: float. The uniform unit is V (volts); the unit is not shown.
(3) Precision: float. The uniform unit is %; the unit is not shown.
(4) Service life: int. The uniform unit is HRS (hours); the unit is not shown.
(5) Working temperature: character type. The low and high working temperature values are both int values; they are joined by the symbol '/' (low value first) and returned as a character string.
(6) Diameter (length): float. The uniform unit is MM (millimeters); the unit is not shown.
(7) Height (width): float. The uniform unit is MM (millimeters); the unit is not shown.
(8) Pin pitch: float. The uniform unit is MM (millimeters); the unit is not shown.
(9) Installation mode: character type.
The output format of the above parameter items is the same as the data format of the unified product parameter items in the database.
Step 2.5: Unify symbols. Certain symbols are replaced uniformly to simplify the subsequent parameter extraction and other operations. Symbols in the original material description are replaced according to the symbol correspondence table; the following table shows part of the correspondence.
TABLE 2 symbol mapping Table
Figure BDA0002786416250000142
Figure BDA0002786416250000151
Step 2.6: Extract the installation mode. Each installation mode corresponds to a set of descriptive words (see Table 3), and the installation mode is extracted according to this correspondence. The installation mode and the matching words are recorded (in the following steps this operation is referred to as recording the extracted value and its original characters), and the matching words are deleted from the material description so that extracted information is not used again (deleting the characters of extracted information from the material description is referred to below as updating the material description).
TABLE 3 corresponding relation (part) of installation mode and description vocabulary
Figure BDA0002786416250000152
Step 2.7: Extract the precision. When the description contains only one precision, such as 10% or ±10, '10.0' is extracted directly as the precision output. When it contains two or more precisions or a precision range, such as 10% 20%, 10-20% or ±10-20%, '10.0/20.0' is extracted as the output. Record the extracted value and its original characters, and update the material description.
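A regular-expression sketch of this precision rule is shown below. It is only an illustration: the function name `extract_precision` and the exact patterns are assumptions, and real descriptions would likely need more symbol variants than the ones handled here.

```python
import re

NUM = r"\d+(?:\.\d+)?"


def extract_precision(description):
    """Return the precision as '10.0' or '10.0/20.0', or None if nothing is found."""
    # Range form such as 10-20% or ±10-20%
    match = re.search(rf"(?:±|\+/-)?\s*({NUM})\s*%?\s*[-~]\s*({NUM})\s*%", description)
    if match:
        values = [match.group(1), match.group(2)]
    else:
        # Plain percentages first, then a bare ± value such as ±10
        values = (re.findall(rf"({NUM})\s*%", description)
                  or re.findall(rf"(?:±|\+/-)\s*({NUM})", description))
    if not values:
        return None
    return "/".join(str(float(v)) for v in values[:2])


for text in ("10%", "±10", "10%20%", "10-20%", "±10-20%"):
    print(text, "->", extract_precision(text))
```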
Step 2.8: Unify distance units. Distance units, whether written out or abbreviated (millimeter, centimeter, decimeter, meter, DM, CM, MM and the like), are unified into MM (millimeters). For example, a dimension of 5 mm × 3 mm, however its units were written, becomes 5MM × 3MM in the new material description, and the unified value (a float, for example 5.0) and its original characters (for example 5MM) are stored, which gives the correspondence between the recorded data and the original data.
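The unit unification can be sketched as a single substitution pass that also records each converted value with its original characters. The names `TO_MM` and `unify_distance` are hypothetical, only abbreviated units are handled, and writing the converted value as a float in the new description is an assumption.

```python
import re

# Conversion factors to millimeters (abbreviated units only, as an assumption)
TO_MM = {"MM": 1.0, "CM": 10.0, "DM": 100.0, "M": 1000.0}
DISTANCE = re.compile(r"(\d+(?:\.\d+)?)\s*(MM|CM|DM|M)\b", re.IGNORECASE)


def unify_distance(description):
    """Rewrite distances in MM and return (new_description, [(value_mm, original_text)])."""
    records = []

    def replace(match):
        value = float(match.group(1)) * TO_MM[match.group(2).upper()]
        records.append((value, match.group(0)))  # keep the link to the original characters
        return f"{value}MM"

    return DISTANCE.sub(replace, description), records


new_description, records = unify_distance("5mm*3cm")
print(new_description)
print(records)
```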
Step 2.9: Extract diameter (length), height (width) and pin pitch. Extraction works on a copy of the material description in which the characters are capitalized and everything except digits and the letter X is replaced by blanks. First, a field in the triple product form, such as 7X8X6, is extracted and its numeric parts are taken: the first value is the diameter (length), the second is the height (width) and the third is the pin pitch. If this first extraction fails, a field in the double product form, such as 7X8, is extracted and its numeric parts are taken: the first value is the diameter (length) and the second is the height (width); the pin pitch is then the value that follows 'PITCH', 'P-' or a similar marker. Extraction is performed on the capitalized description, and the corresponding characters in the original material description are identified through the position of the extracted characters. For example, for the original description '10pf, 50v, 7mm 8mm 6 mm', the processed description is '10, 50, 7x8x6'; extraction gives a diameter of 7.0, a height of 8.0 and a pin pitch of 6.0, and the corresponding original characters are '7mm 8mm 6 mm'.
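A simplified sketch of the two-stage dimension extraction follows. It assumes the multiplication sign has been written as '*' or '×' and normalizes it to X itself, rather than blanking all other characters as the text describes; `extract_dimensions` is a hypothetical name.

```python
import re

NUM = r"\d+(?:\.\d+)?"


def extract_dimensions(description):
    """Return (diameter, height, pin_pitch) as floats, with None for missing items."""
    text = description.upper().replace("*", "X").replace("×", "X")
    text = re.sub(r"(?<=\d)\s*MM", "", text)  # drop the unified MM unit after numbers
    diameter = height = pitch = None
    # Triple form 7X8X6 first; the third factor is optional, which covers 7X8
    match = re.search(rf"({NUM})\s*X\s*({NUM})(?:\s*X\s*({NUM}))?", text)
    if match:
        diameter, height = float(match.group(1)), float(match.group(2))
        if match.group(3):
            pitch = float(match.group(3))
    if pitch is None:
        # Fall back to an explicit pitch marker
        marker = re.search(rf"(?:PITCH|P-)\s*({NUM})", text)
        if marker:
            pitch = float(marker.group(1))
    return diameter, height, pitch


print(extract_dimensions("10pf,50v,7mm*8mm*6mm"))
print(extract_dimensions("7*8 pitch 3.5"))
```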
Step 2.10: Replace the expressions of capacitance, voltage, temperature and time in the material description with their respective unified units, in the same way as the distance units in step 2.8. For example, for '50uf, 80uf, 500V' the converted capacitance values are stored as [50.0, 80.0] and the original characters of the capacitance are ['50uf', '80uf']; the voltage value is stored as [500.0] and the original characters of the voltage are ['500V']; after the replacement the material description is updated to '50.0UF, 80.0UF, 500.0V'.
After conversion, the values and their corresponding characters are stored. The correspondence between the source units and the unified unit of each quantity is given in the following table.
TABLE 4 Correspondence (partial) between source units and unified units
Quantity | Unit | Unified unit representation
Capacitance value | MF, UF, NF, PF, F | UF
Temperature | degrees centigrade | ℃
Time | hour (written out), HOUR, HRS | HRS
Voltage | MV, UV, VAC, KV, V, VDC | V
Step 2.11: Extract capacitance, voltage and service life. Taking the capacitance value as the example: if step 2.10 produced values for the capacitance, the maximum is extracted and output as the capacitance value. The original characters corresponding to that value are extracted as the original representation of the capacitance, and the material description is updated (all parts concerning the capacitance are removed). For example, from [50.0, 80.0] the value 80.0 is returned, the original characters are '80uf', and the material description is updated to '500.0V'.
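Steps 2.10 and 2.11 can be sketched together for the capacitance value. This is a hedged illustration: `extract_capacitance` is a hypothetical name, only UF, NF and PF are converted (MF and F are left out because their intended scale is not stated in the text), and separators left behind are simply stripped.

```python
import re

# Factors to microfarads for the units handled in this sketch
TO_UF = {"UF": 1.0, "NF": 1e-3, "PF": 1e-6}
CAPACITANCE = re.compile(r"(\d+(?:\.\d+)?)\s*(UF|NF|PF)", re.IGNORECASE)


def extract_capacitance(description):
    """Return (max_value_uf, original_text, updated_description)."""
    found = [(float(m.group(1)) * TO_UF[m.group(2).upper()], m.group(0))
             for m in CAPACITANCE.finditer(description)]
    if not found:
        return None, None, description
    value, original = max(found, key=lambda item: item[0])  # the maximum is output
    # Update the description: remove every capacitance part
    remaining = CAPACITANCE.sub("", description).strip(" ,")
    return value, original, remaining


print(extract_capacitance("50uf, 80uf, 500V"))
```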
Step 2.12: Extract the temperature range. Case 1: if two temperature values were extracted in step 2.10, the first is taken as the lowest working temperature and the second as the highest working temperature, and they are joined with the symbol '/' for output. Case 2: if only one temperature value was extracted in step 2.10, check whether the material description contains a form such as '25-150℃', in which a number immediately precedes the extracted value; if so, that number is the lowest working temperature and the temperature range is output. If not, check for a form such as '25℃-150', in which a number immediately follows the extracted value; if so, that number is the highest working temperature and the temperature range is output. If neither form exists, the single temperature value is output. Record the original representation and update the material description.
Step 2.13: Extract capacitance values expressed with capacitor special symbols. If the capacitance value was not extracted in step 2.11, enter this step to extract it, and update the material description once it has been extracted. Representations such as '5U3', '5U' and 'U5' are extracted as the values 5.3, 5 and 0.5 respectively, in UF. The special letters and their corresponding units are shown in the following table:
TABLE 5 Correspondence between capacitor special symbols and capacitance units
No. | 1 | 2 | 3 | 4 | 5
Special character | P | N | U | M | R
Corresponding unit | PF | NF | UF | UF | PF
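The special-symbol notation places the unit letter where the decimal point would be. A minimal sketch, assuming the letter-to-unit mapping of Table 5 and a result expressed in UF (`parse_letter_code` and `UNIT_TO_UF` are hypothetical names):

```python
import re

# Table 5 units converted to microfarads: PF = 1e-6 UF, NF = 1e-3 UF
UNIT_TO_UF = {"P": 1e-6, "N": 1e-3, "U": 1.0, "M": 1.0, "R": 1e-6}


def parse_letter_code(token):
    """Decode tokens such as '5U3', '5U' or 'U5' into a capacitance in UF."""
    match = re.fullmatch(r"(\d*)([PNUMR])(\d*)", token.upper())
    if not match or not (match.group(1) or match.group(3)):
        return None  # no digits at all, so this is not a capacitance code
    # The letter stands in for the decimal point
    value = float(f"{match.group(1) or '0'}.{match.group(3) or '0'}")
    return value * UNIT_TO_UF[match.group(2)]


for token in ("5U3", "5U", "U5"):
    print(token, "->", parse_letter_code(token))
```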
Step 2.14: Extract capacitance value, voltage and precision written as scientific notation codes. If the capacitance value, the voltage or the precision has not been extracted before this step, enter this step. Fields such as '104K500', '104' and '500' are extracted, that is, combinations of three-digit integers with a letter, or three-digit integers alone. If a field contains a letter and the letter belongs to the set of precision letters, the field matches the first form: the integer is the scientific notation code of the capacitance value and the letter expresses the precision. If there is only one digit group, the capacitance value has not yet been extracted and the voltage has, the digit group is the code of the capacitance value. If there is only one digit group, the voltage has not yet been extracted and the capacitance value has, the digit group is the code of the voltage. If there are two or more digit groups, then of the first two the first is the code of the capacitance value and the second is the code of the voltage. A digit group is converted to a capacitance value as (value of the first two digits) × 10^(value of the last digit − 6), and to a voltage as (value of the first two digits) × 10^(value of the last digit).
The capacitance value, voltage or precision extracted in this step only supplements information that has not yet been extracted; it does not replace information already extracted. The correspondence between letters and precision is shown in the following table:
TABLE 6 Correspondence between letters and precision
No. | 1 | 2 | 3 | 4 | 5 | 6 | 7
Letter | J | K | M | F | G | S | Z
Precision | 5.0 | 10.0 | 20.0 | 1.0 | 2.0 | 20.0/50.0 | 20.0/80.0
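The code conversion and the letter lookup of Table 6 can be sketched as below. The sketch handles only the combined form digits + optional letter + optional digits and always reads the first digit group as the capacitance; the rules that depend on what has already been extracted are not modeled. All function and variable names are hypothetical.

```python
import re

# Table 6: precision letters and their outputs
TOLERANCE = {"J": "5.0", "K": "10.0", "M": "20.0", "F": "1.0",
             "G": "2.0", "S": "20.0/50.0", "Z": "20.0/80.0"}
FIELD = re.compile(r"(\d{3})([JKMFGSZ])?(\d{3})?")


def decode_capacitance(code):
    """Three-digit code to UF: first two digits x 10^(last digit - 6)."""
    return int(code[:2]) * 10.0 ** (int(code[2]) - 6)


def decode_voltage(code):
    """Three-digit code to V: first two digits x 10^(last digit)."""
    return float(int(code[:2]) * 10 ** int(code[2]))


def parse_field(field):
    """Decode a field such as '104K500' into the parameters it carries."""
    match = FIELD.fullmatch(field.upper())
    if match is None:
        return None
    first, letter, second = match.groups()
    result = {"capacitance_uf": decode_capacitance(first)}
    if letter:
        result["precision"] = TOLERANCE[letter]
    if second:
        result["voltage_v"] = decode_voltage(second)
    return result


print(parse_field("104K500"))
```

For '104K500' the capacitance code 104 reads as 10 × 10^(4 − 6) UF, the letter K maps to a precision of 10.0, and the voltage code 500 reads as 50 × 10^0 V.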
Step 2.15: Extract pure numerical values. If the capacitance value has not been extracted in the preceding steps, enter this step to extract it. Pure numbers in the material description are extracted; a number with no letter immediately before or after it is output as the capacitance value, and the material description is updated. For example, if the material description is '2.3 2.5s', 2.3 is extracted and output as the capacitance value.
Step 2.16: Extract the precision expressed by a letter. If the precision has not been extracted in the preceding steps, enter this step to extract it. If letters from Table 6 are present, the one that comes first in the order of Table 6 is output as the precision, and the material description is updated. For example, for the material description 'K J', 'J' comes earlier in Table 6, so the precision extracted from this description is '5.0', in %.
Step 2.17: Extract the pin pitch. If the pin pitch has not been extracted in the preceding steps, enter this step to extract it. The remaining numbers in the material description that have no letter immediately before or after them are extracted; if a number lies in the range [1, 50] and has at most one decimal place, it is output as the pin pitch.
Step 2.18: Confirm the parameter items. If all the necessary parameter items (shown in the following table) have been extracted, proceed to the next step to confirm the product; otherwise, remind the user that parameters are missing and ask for the corresponding parameter items to be completed.
TABLE 7 Necessary parameters of the aluminum electrolytic capacitor
No. | 1 | 2 | 3 | 4 | 5 | 6
Aluminum electrolytic capacitor | Capacitance value | Rated voltage | Precision | Working temperature | Diameter (length) | Pin pitch
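The completeness check of step 2.18 amounts to comparing the extracted items with the list in Table 7. A minimal sketch with hypothetical key names and an invented extraction result:

```python
# Necessary parameter items from Table 7 (key names are illustrative)
REQUIRED = ("capacitance", "rated_voltage", "precision",
            "working_temperature", "diameter", "pin_pitch")


def missing_parameters(extracted):
    """Return the necessary items that were not extracted, in Table 7 order."""
    return [name for name in REQUIRED if extracted.get(name) is None]


extracted = {"capacitance": 80.0, "rated_voltage": 500.0,
             "precision": "10.0/20.0", "diameter": 7.0, "pin_pitch": 6.0}
missing = missing_parameters(extracted)
if missing:
    print("Please complete:", ", ".join(missing))
else:
    print("All necessary parameters extracted")
```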
Step 2.19: Confirm the product. Products in the database that satisfy the parameter values of the extracted parameter items are determined to be purchasable products; these are the products whose likelihood of purchase is then predicted by the Mahalanobis-distance-based KNN algorithm.
(3) Purchase prediction with the Mahalanobis-distance-based KNN algorithm
Step 3.1: Prepare the data.
Data are prepared from the user's purchase records, keeping the data samples of the two classes (purchased, not purchased) at a 1:1 ratio. Let the number of samples be N, and ensure that the number of samples is greater than the feature dimension L of the samples.
Step 3.2: Calculate the distance.
Calculate the distance between the current sample to be classified and each classified sample in the training set. Because the Mahalanobis distance is independent of the measurement scale and removes the interference of correlations between variables, the Mahalanobis distance (which requires the total number of samples to exceed the sample dimension) is used here to calculate the distance D(Yi, Yj) between samples, where D(Yi, Yj) denotes the Mahalanobis distance between samples Yi and Yj. It is calculated as follows:
Mahalanobis distance calculation:
Step 3.2.1: Center the data.
$X = Y - \bar{Y}$
where Y denotes the raw data, X denotes the centered data, and $\bar{Y}$ denotes the mean of the raw data.
Step 3.2.2: Solve the mapping matrix. Compute the eigenvalues and the corresponding eigenvectors of the covariance matrix $\Sigma = \frac{1}{n} X^{T} X$, where n is the number of rows of X.
Step 3.2.3: Sort the eigenvalues in descending order. The eigenvalues solved in step 3.2.2 are arranged in descending order, and the eigenvectors are ordered accordingly to form the eigenvector matrix V. The matrix V represents the complete principal component space.
Step 3.2.4: Obtain the data in the rotated space. The centered data are mapped into the principal component space, giving the rotated data Z = XV.
Step 3.2.5: Calculate the Euclidean distance in the new coordinates, which is taken as the Mahalanobis distance between the original data.
$D_i(z_0 - z_i) = \sqrt{(z_0 - z_i)(z_0 - z_i)^{T}}$
The smaller the Mahalanobis distance, the higher the similarity between samples; the larger the distance, the lower the similarity between samples.
Here Di(z0 - zi) is the Mahalanobis distance between the current sample Y0 and the training set sample Yi, z0 is the coordinate position of the current sample in the new coordinate system, and zi is the coordinate position of the i-th training set sample Yi in the new coordinate system.
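Steps 3.2.1 to 3.2.5 can be sketched with NumPy as follows. The sketch assumes the 1/n covariance normalization (which matches the eigenvalues in the worked example of this document) and uses that example's data; `rotated_distances` is a hypothetical name. Since V is orthogonal, the distance computed in step 3.2.5 as written equals the Euclidean distance in the original space; the commonly used form of the Mahalanobis distance additionally divides each squared coordinate difference by the corresponding eigenvalue, and that variant is returned as `scaled` for comparison only.

```python
import numpy as np


def rotated_distances(train, query):
    """Return (eigenvalues, distances as in step 3.2.5, eigenvalue-scaled distances)."""
    y = np.vstack([train, query]).astype(float)
    x = y - y.mean(axis=0)                      # step 3.2.1: center the data
    cov = x.T @ x / len(x)                      # step 3.2.2: covariance, 1/n assumed
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix, so eigh applies
    order = np.argsort(eigvals)[::-1]           # step 3.2.3: descending eigenvalues
    eigvals, v = eigvals[order], eigvecs[:, order]
    z = x @ v                                   # step 3.2.4: Z = XV
    diffs = z[:-1] - z[-1]                      # training rows minus the query row
    plain = np.sqrt((diffs ** 2).sum(axis=1))   # step 3.2.5 as written
    scaled = np.sqrt((diffs ** 2 / eigvals).sum(axis=1))  # eigenvalue-scaled variant
    return eigvals, plain, scaled


train = [[60, 600, 1], [70, 600, 1], [60, 600, 0], [70, 600, 0]]
query = [60, 590, 1]
eigvals, plain, scaled = rotated_distances(train, query)
print(np.round(eigvals, 4))
print(np.round(plain, 4))
print(np.round(scaled, 4))
```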
Step 3.3: Sort the distances.
Sort the training set samples in ascending order of their distances Di (i = 1, 2, …, N) to the current sample to be classified.
Step 3.4: Determine the neighbor samples.
Select the first K samples as the neighbor samples of the current sample to be classified. K is an odd number, which rules out the case where the two classes have the same number of neighbors, and K does not exceed the number of samples in each of the two classes, that is, $1 \le K \le N/2$. K is provisionally set to 11; N and K are adjusted later according to actual use, by taking data of known class as a test set and choosing the values of N and K that give the best accuracy.
Step 3.5: Count the class attributes of the neighbor samples.
Count the class attributes of the K neighbor samples: the number of samples of class w1 is t1 and the number of samples of class w2 is t2.
Step 3.6: Determine the class attribute of the sample to be classified.
When t1 > t2, the class attribute of the sample to be classified is w1, i.e., purchased.
When t1 < t2, the class attribute of the sample to be classified is w2, i.e., not purchased.
Step 3.7: Add the purchasable products predicted as purchased to the list of products recommended to the user.
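Steps 3.3 to 3.6 reduce to sorting the distances and taking a majority vote among the K nearest samples. A minimal sketch, with the hypothetical name `knn_vote` and the distances and labels of the worked example in this document (1 = purchased, 0 = not purchased):

```python
from collections import Counter


def knn_vote(distances, labels, k):
    """Return the majority class among the k training samples with the smallest distance."""
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]  # step 3.3 and 3.4
    counts = Counter(labels[i] for i in nearest)                            # step 3.5
    return 1 if counts[1] > counts[0] else 0                                # step 3.6, k odd


distances = [10.0, 14.1421, 10.0499, 14.1774]
labels = [1, 0, 1, 0]
print(knn_vote(distances, labels, k=3))
```

With an odd K and two classes the vote cannot tie, which is why the sketch needs no tie-breaking rule.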
Note: determining the value of K.
The final value of K is chosen according to the accuracy of a leave-one-out experiment on the current sample data. One sample is held out as the test sample and all other samples are used as training samples. For a chosen K, the prediction is made with the KNN method above and the accuracy is recorded. The K with the highest accuracy is the value adopted.
$P_K = \frac{1}{N} \sum_{i=1}^{N} KNN(Y_i)$, $1 \le K \le N/2$,
and K is an odd number,
where P_K denotes the average accuracy for the current value of K. KNN(Yi) denotes the result of predicting sample Yi as the test sample with the remaining samples as the training set: KNN(Yi) = 1 when the predicted class is the same as the class of Yi, and KNN(Yi) = 0 when it is different.
$\sum_{i=1}^{N} KNN(Y_i)$
is the number of the N samples whose prediction is correct. The K for which P_K is largest is taken as the final value of K.
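The leave-one-out selection of K can be sketched as follows. For brevity the sketch uses the plain Euclidean distance from the standard library as a stand-in for the distance of step 3.2, takes odd K up to N/2 as candidates, and runs on invented two-cluster data; all names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, labels, query, k):
    """Majority class among the k nearest training samples (Euclidean stand-in)."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(samples, labels, k):
    """P_K: fraction of samples predicted correctly when each is held out in turn."""
    hits = 0
    for i, sample in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        hits += knn_predict(rest, rest_labels, sample, k) == labels[i]
    return hits / len(samples)


def choose_k(samples, labels):
    """Return the odd K in [1, N/2] with the highest leave-one-out accuracy."""
    candidates = range(1, len(samples) // 2 + 1, 2)
    return max(candidates, key=lambda k: loo_accuracy(samples, labels, k))


samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # invented data
labels = [1, 1, 1, 0, 0, 0]
print(loo_accuracy(samples, labels, 1), loo_accuracy(samples, labels, 3))
print(choose_k(samples, labels))
```

When several K values reach the same accuracy, `max` keeps the first (smallest) one; the text does not say how such ties should be resolved, so this is a choice of the sketch.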
Example for step 3: purchase prediction with the Mahalanobis-distance-based KNN algorithm
Step 3.1: Prepare the sample data.
Suppose there are four original samples, each containing score data in three dimensions, with the matrix [Y1, Y2, Y3, Y4] = [60, 600, 1; 70, 600, 1; 60, 600, 0; 70, 600, 0]. The corresponding classification matrix is [1, 0, 1, 0], where 1 indicates the class purchased and 0 indicates the class not purchased.
The data to be classified is Y0 = [60, 590, 1].
The raw data and the data to be classified then form the set Y:
Y = [Y1, Y2, Y3, Y4, Y0] = [60, 600, 1; 70, 600, 1; 60, 600, 0; 70, 600, 0; 60, 590, 1]
Step 3.2: Calculate the Mahalanobis distance between the sample to be classified and the original sample data.
Step 3.2.1: Center the data.
The mean of the raw data is calculated as
$\bar{Y} = [64, 598, 0.6]$
so X = [-4, 2, 0.4; 6, 2, 0.4; -4, 2, -0.6; 6, 2, -0.6; -4, -8, 0.4].
Step 3.2.2: Solve for the eigenvalues and corresponding eigenvectors of the covariance matrix of X, which are as follows:
Eigenvalues (to four decimal places): [28.9644, 11.0761, 0.1995]
Eigenvectors (to four decimal places):
Figure BDA0002786416250000214
Step 3.2.3: Sort the eigenvalues in descending order to obtain the eigenvector matrix V:
Figure BDA0002786416250000215
Step 3.2.4: Obtain the data Z in the feature space (shown to four decimal places):
Figure BDA0002786416250000221
Step 3.2.5: Calculate the distance between the sample to be classified and each original sample (shown to four decimal places).
D = [D1, D2, D3, D4] = [10.0, 14.1421, 10.0499, 14.1774]
That is, in order of decreasing similarity to the data to be classified, the distances rank as D1 -> D3 -> D2 -> D4.
Step 3.3: Sort in ascending order of distance.
D = [D1, D3, D2, D4] = [10.0, 10.0499, 14.1421, 14.1774]
Step 3.4: Determine the neighbor samples.
Assume K = 3 here (because the amount of data is small, the constraint $1 \le K \le N/2$, i.e. 1 ≤ K ≤ 2, is not followed here).
The neighbor samples are the original samples Y1, Y3 and Y2 corresponding to the distance values D1, D3 and D2; their class attributes are 1, 1 and 0 respectively.
Step 3.5: counting the number of class attributes of neighbor samples
The number of samples with the category of 1 is 2; the number of samples of class 0 is 1.
Step 3.6: determining class attributes of a sample to be classified
Since the number of samples with the category of 1 in the neighbor samples is greater than that of samples with the category of 0, the attribute category of the sample to be classified is classified as 1.
Step 3.7: Add the sample to be classified to the list of products recommended to the user for purchase.
It should be understood that the specific order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. Of course, the processor and the storage medium may reside as discrete components in a user terminal.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.
What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification of the claims is intended to mean a "non-exclusive or".

Claims (8)

1. The aluminum electrolytic capacitor purchase prediction method based on the KNN algorithm of the Mahalanobis distance is characterized by comprising the following steps of:
step 1: confirming the key parameters of the aluminum electrolytic capacitor by analyzing the materials of the aluminum electrolytic capacitor, preliminarily extracting the key parameter items of the aluminum electrolytic capacitor by adopting a frequent item set extraction method, and finally confirming the key parameter items of the aluminum electrolytic capacitor by the parameter items appearing in the frequent item set;
step 2: determining a parameter matching scheme according to a decomposition rule of the material description, and determining a product range which can be used;
step 3: providing a purchase prediction of the product for the current user by adopting a purchase prediction method based on the KNN algorithm of the Mahalanobis distance, and preliminarily screening a list of products that the user is likely to purchase.
2. The method for predicting the purchase of the aluminum electrolytic capacitor based on the Mahalanobis distance KNN algorithm as claimed in claim 1, wherein the key parameters of the aluminum electrolytic capacitor are confirmed by analyzing the materials of the aluminum electrolytic capacitor in step 1, and the specific steps comprise:
step 1.1: collecting material description of the aluminum electrolytic capacitor, and constructing a material description data set of the aluminum electrolytic capacitor;
step 1.2: cleaning data: removing material descriptions of the aluminum electrolytic capacitor that are blank or contain only Chinese characters, then converting the numeric expressions in the material descriptions into uniform numerical values; the cleaned aluminum electrolytic capacitor data set is denoted D;
step 1.3: counting all characters appearing in the aluminum electrolytic capacitor material descriptions and removing duplicates to form the character set W_1;
step 1.4: extracting a frequent item set S according to the principle: if a set of items is a frequent set of items, then all of its non-empty subsets are also frequent; conversely, if a set of items is infrequent, then all of its supersets are also infrequent;
step 1.5: from the character strings in the set S, selecting the character strings with higher occurrence frequency to represent the parameters, and finally determining the key parameter items of the aluminum electrolytic capacitor.
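As an illustration of steps 1.2 and 1.3 only, the following is a minimal sketch, not the claimed implementation. The function names `clean_descriptions` and `character_set` are hypothetical, the CJK range `\u4e00-\u9fff` is an assumed definition of "Chinese characters", and the conversion of numeric expressions into uniform values is left out because the claim does not specify its rules.

```python
import re


def clean_descriptions(raw_descriptions):
    """Step 1.2 (partial): drop blank or Chinese-only material descriptions."""
    cleaned = []
    for text in raw_descriptions:
        text = text.strip()
        # Assumption: "Chinese characters" means the basic CJK block.
        # If nothing is left after removing them and whitespace, skip the entry.
        if not re.sub(r"[\u4e00-\u9fff\s]", "", text):
            continue
        cleaned.append(text)
    return cleaned


def character_set(descriptions):
    """Step 1.3: deduplicated characters W_1 seen in the cleaned data set D."""
    return sorted({ch for text in descriptions for ch in text})
```

A call such as `character_set(clean_descriptions(rows))` would then give the starting alphabet for the frequent item set search.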
3. The method for predicting the purchase of the aluminum electrolytic capacitor based on the Mahalanobis distance KNN algorithm as claimed in claim 2, wherein the step 1.4 of extracting the frequent item set S comprises:
step 1.4.1: combining the characters in the character set W_1 two by two to form a set W_2; if a character string in W_2 occurs more than L times in the material descriptions of the data set D, the character string is a subset of a frequent item set, and the set of such character strings is T_2; if not, any item set containing the character string is not a frequent item set; L is determined by the size of the data set, L = N × 10%, where N is the number of entries in the data set;
step 1.4.2: combining T_2 with W_1, with the character string from T_2 in front and the character from W_1 behind, to form a character string set W_3; if a character string in W_3 occurs more than L times, it is a subset of a frequent item set, and the set of such character strings is T_3; if not, any item set containing the character string is not a frequent item set;
step 1.4.3: combining T_3 with W_1 and repeating the frequent item set search of step 1.4.2 to obtain the character string set T_4; continuing in this way until T_n is empty, at which point the loop ends;
step 1.4.4: forming the set of character string items W = [T_{n-1}, …, T_3, T_2, T_1, W_1];
step 1.4.5: counting the number of occurrences of each character string in W to form a matrix FW, which contains the character strings and their corresponding numbers of occurrences;
step 1.4.6: merging character strings: merging starts from the shorter character strings and proceeds upward; if the number of occurrences of the long character string is greater than or equal to that of the short character string, the short character string and its number of occurrences are removed from FW; if the number of occurrences of the long character string is less than that of the short character string, the number of occurrences of the short character string is changed to the number of occurrences of the short character string minus the number of occurrences of the long character string; in both cases, the number of occurrences of the long character string is the sum of the numbers of occurrences of all superset character strings closest to the short character string;
step 1.4.7: sorting: sorting the character strings from high to low by number of occurrences, and forming the set S from the character strings whose number of occurrences exceeds 30% of the total number N of entries in the data set.
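The level-wise search of steps 1.4.1 to 1.4.3 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions rather than the claimed implementation: the name `frequent_strings` is hypothetical, "occurrence frequency" is assumed to mean the number of material descriptions that contain the string, and the merging and sorting of steps 1.4.5 to 1.4.7 are not covered.

```python
def frequent_strings(descriptions, min_ratio=0.10):
    """Grow frequent strings one character at a time (steps 1.4.1 to 1.4.3).

    Returns a list of dicts [T_2, T_3, ...], each mapping a frequent string
    to the number of descriptions that contain it.
    """
    n = len(descriptions)
    threshold = n * min_ratio  # L = N x 10% by default
    # W_1: deduplicated characters of the data set D
    alphabet = sorted({ch for text in descriptions for ch in text})

    def support(s):
        # Assumption: count descriptions containing s, not raw occurrences.
        return sum(1 for text in descriptions if s in text)

    levels = []
    # W_2: pairwise combinations of characters from W_1
    candidates = [a + b for a in alphabet for b in alphabet]
    current = [s for s in candidates if support(s) > threshold]
    while current:  # stop when T_n is empty
        levels.append({s: support(s) for s in current})
        # T_k string in front, W_1 character behind
        candidates = [s + ch for s in current for ch in alphabet]
        current = [s for s in candidates if support(s) > threshold]
    return levels
```

Because an infrequent string is never extended, every superset of it is skipped automatically, which is the pruning principle stated in step 1.4.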
4. The method for predicting the purchase of the aluminum electrolytic capacitor based on the Mahalanobis distance KNN algorithm as claimed in claim 1, wherein the step 2 of determining the parameter matching scheme according to the decomposition rule of the material description and determining the range of products that can be used comprises the following specific steps:
step 2.1: determining whether the user input is an original manufacturer model or a material description;
step 2.2: if step 2.1 determines, through the database of the platform, that the input is an original manufacturer model, the following product determination steps are skipped and the product is located directly; if step 2.1 determines that the input is a material description, the process continues with the following steps;
step 2.3: confirming the product types: checking whether the material description contains the category name and the alias in the corresponding data or not, and finally confirming the category of the material description;
step 2.4: determining the parameters of the aluminum electrolytic capacitor: in the material description about the aluminum electrolytic capacitor accumulated by the platform, referring to step 1, the key parameters for determining the aluminum electrolytic capacitor product are obtained by statistics as follows:
(1) capacity value: float value, unified unit UF, unit not displayed;
(2) rated voltage: float value, unified unit V, unit not displayed;
(3) precision: float value, unified unit %, unit not displayed;
(4) service life: int value, unified unit HRS, unit not displayed;
(5) working temperature: character type; the low and high working temperature values are both int values, joined by the symbol '/', and the result is returned as a character type;
(6) diameter: float value, unified unit MM, unit not displayed;
(7) height: float value, unified unit MM, unit not displayed;
(8) foot distance: float value, unified unit MM, unit not displayed;
(9) installation mode: character type;
the output format of the parameter items is the same as the data format of the unified product parameter items in the database;
step 2.5: unifying symbols: some symbols are uniformly replaced to facilitate subsequent processing and parameter extraction;
step 2.6: extracting the installation mode: installation modes correspond to particular descriptive vocabulary, and the installation mode is extracted according to this correspondence; the installation mode and the vocabulary from which it was extracted are recorded, and that vocabulary is deleted from the material description so that extracted information is not reused;
step 2.7: extracting the precision; recording the extracted value and the characters that originally expressed it, and updating the material description;
step 2.8: unifying distance notation: the distance units are unified to MM, and the unified distance values are stored together with the original characters, that is, the correspondence between the converted data and the original data is recorded;
step 2.9: extracting the diameter, height and foot distance; extraction is performed on the material description after its characters are capitalized and everything other than digits and the letter X is replaced by spaces;
step 2.10: uniformly replacing the expressions of capacity value, voltage, temperature and time in the material description with their respective unified characters; storing the converted values and the corresponding characters;
step 2.11: extracting the capacity value, voltage and service life; updating the material description;
step 2.12: extracting the temperature range; updating the material description;
step 2.13: extracting a capacity value expressed with capacitor-specific notation; this step is entered only if the capacity value was not successfully extracted in step 2.11, and the material description is updated after a successful extraction;
step 2.14: extracting the capacity value, voltage and precision expressed in scientific notation; this step is entered if the capacity value, the voltage or the precision has not been successfully extracted before this step;
step 2.15: extracting a capacity value expressed as a pure number; this step is entered if the capacity value has not been successfully extracted in the preceding steps; a pure number in the material description with no letters directly before or after it is output as the capacity value, and the material description is updated;
step 2.16: extracting a precision expressed as a letter; this step is entered if the precision has not been successfully extracted in the preceding steps; among the single letters in the material description, that is, letters with no adjacent letter before or after them, the first to appear is output as the precision, and the material description is updated;
step 2.17: extracting the foot distance; this step is entered if the foot distance has not been successfully extracted in the preceding steps; among the remaining numbers in the material description that have no adjacent letters before or after them, a value that lies in [1, 50] and can be represented with a single decimal place is output as the foot distance;
step 2.18: confirming the parameter items; if all necessary parameter items have been successfully extracted, proceeding to the next step to confirm the product; otherwise, reminding the user that parameters are missing and asking the user to complete the corresponding parameter items;
step 2.19: confirming the product; the products in the database that match the parameter values of the extracted parameter items are determined to be purchasable products, that is, the products whose purchase likelihood is predicted by the KNN algorithm based on the Mahalanobis distance.
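To make the extract-then-delete pattern of steps 2.6 to 2.12 concrete, here is a heavily simplified sketch. The regular expressions, the dictionary keys and the function name `extract_parameters` are all assumptions made for illustration; the claim does not give the actual patterns, and the fallback steps 2.13 to 2.17 are omitted.

```python
import re

NUMBER = r"(\d+(?:\.\d+)?)"


def extract_parameters(description):
    """Extract a few key parameters from one material description."""
    text = description.upper()  # capitalize first, as in step 2.9
    params = {}
    # Hypothetical patterns: a number followed by its unified unit.
    patterns = [
        ("capacity_uf", NUMBER + r"\s*UF", float),
        ("rated_voltage_v", NUMBER + r"\s*V", float),
        ("precision_pct", NUMBER + r"\s*%", float),
        ("life_hrs", r"(\d+)\s*HRS", int),
    ]
    for name, pattern, cast in patterns:
        match = re.search(pattern, text)
        if match:
            params[name] = cast(match.group(1))
            # Delete the matched text so it cannot be extracted twice.
            text = text[:match.start()] + " " + text[match.end():]
    # Diameter x height such as "8X12" (unit assumed to be MM already).
    size = re.search(NUMBER + r"\s*X\s*" + NUMBER, text)
    if size:
        params["diameter_mm"] = float(size.group(1))
        params["height_mm"] = float(size.group(2))
    # Working temperature, returned as the character type "low/high".
    temp = re.search(r"(-?\d+)\s*~\s*\+?(\d+)\s*C", text)
    if temp:
        params["working_temperature"] = temp.group(1) + "/" + temp.group(2)
    return params
```

For an input such as `"100uF 25V ±20% 8x12mm 2000hrs -40~+105C"` this returns the capacity value, rated voltage, precision, service life, diameter, height and working temperature; a missing key would correspond to the "parameter missing" branch of step 2.18.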
5. The method for predicting the purchase of the aluminum electrolytic capacitor based on the Mahalanobis distance KNN algorithm as claimed in claim 4, wherein the necessary parameter items of the aluminum electrolytic capacitor in the step 2.18 include: capacity, rated voltage, precision, operating temperature, diameter/length, foot distance.
6. The method for predicting the purchase of the aluminum electrolytic capacitor based on the Mahalanobis distance KNN algorithm as claimed in claim 1, wherein the step 3 of providing a purchase prediction of the product for the current user by the purchase prediction method based on the Mahalanobis distance KNN algorithm comprises the following specific steps:
step 3.1: preparing data:
data preparation is carried out from the purchase records of the users, so that the data samples of the two categories, purchased and not purchased, are kept at a ratio of 1:1; the number of samples is N, and N must be greater than the feature dimension L of the samples;
step 3.2: calculating the distance:
calculating the distance between the current sample to be classified and each sample in the classified samples in the training set; calculating the distance between the samples by adopting the Mahalanobis distance;
step 3.3: sorting distances;
the samples are sorted according to the distances D_i, i = 1, 2, …, N, between the current sample to be classified and each sample in the training set;
step 3.4: determining the neighbor samples;
selecting the first K sample data as neighbor samples of the current sample to be classified according to the sorted distances;
step 3.5: counting the number of the category attributes of the neighbor samples;
counting the class attributes of the K neighbor samples, wherein the number of samples of class w_1 is t_1 and the number of samples of class w_2 is t_2;
step 3.6: determining the category attribute of the sample to be classified;
when t_1 > t_2, the class attribute of the sample to be classified is w_1, i.e., purchase;
when t_1 < t_2, the class attribute of the sample to be classified is w_2, i.e., not purchased;
step 3.7: and adding the prediction result of the purchasable products into the list recommended to be purchased for the user.
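Steps 3.3 to 3.6 amount to a majority vote among the K nearest training samples. The sketch below assumes the distances have already been computed and that the two classes are labelled with the strings "purchase" and "no purchase"; the function name `knn_predict` and the tie behaviour (falling back to "no purchase") are assumptions, since the claim only covers t_1 > t_2 and t_1 < t_2.

```python
def knn_predict(distances, labels, k):
    """Classify one sample from its distances to the training samples."""
    # Step 3.3: sort training samples by distance, nearest first.
    ranked = sorted(zip(distances, labels), key=lambda pair: pair[0])
    # Step 3.4: keep the first K as neighbor samples.
    neighbours = [label for _, label in ranked[:k]]
    # Step 3.5: count the neighbors of each class.
    t1 = sum(1 for label in neighbours if label == "purchase")
    t2 = len(neighbours) - t1
    # Step 3.6: the majority class wins (ties cannot occur when K is odd).
    return "purchase" if t1 > t2 else "no purchase"
```

With an odd K and two classes, one of the two counts is always strictly larger, so the tie branch is never reached in practice.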
7. The method for predicting the purchase of the aluminum electrolytic capacitor based on the Mahalanobis distance KNN algorithm as claimed in claim 6, wherein the step 3.2 of calculating the distance between the samples by using the Mahalanobis distance specifically comprises: calculating the distance D(Y_i, Y_j) between samples using the Mahalanobis distance, wherein D(Y_i, Y_j) represents the Mahalanobis distance between samples Y_i and Y_j, which is specifically calculated as follows:
step 3.2.1: centralizing data;
$$X = Y - \bar{Y}$$
wherein Y represents the raw data, X represents the centralized data, and $\bar{Y}$ represents the mean of the raw data;
step 3.2.2: solving a mapping matrix; by covariance matrix
Figure FDA0002786416240000061
Solving the eigenvalue and the corresponding eigenvector;
step 3.2.3: sorting the eigenvalues in a descending order; sorting the eigenvalues solved in the step 3.2.2 in a descending order, and further sorting the eigenvectors corresponding to the eigenvalues to form an eigenvector matrix V; the matrix V represents the complete principal component space;
step 3.2.4: obtaining data in the rotated space; mapping the centralized data to the principal component space to obtain the rotated data Z = XV;
step 3.2.5: calculating Euclidean distance under the new coordinate, namely corresponding Mahalanobis distance between the original data;
Figure FDA0002786416240000062
the smaller the mahalanobis distance is, the higher the similarity between samples is; the larger the distance, the smaller the similarity between samples;
wherein D_i(z_0 - z_i) is the Mahalanobis distance between the current sample Y_0 and the training set sample Y_i; z_0 is the coordinate position of the current sample in the new coordinate system, and z_i is the coordinate position of the i-th training set sample Y_i in the new coordinate system.
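The rotation-based calculation of steps 3.2.1 to 3.2.5 can be sketched with NumPy as below. This is a sketch under explicit assumptions, not the patented formula: the covariance is taken with the 1/(N-1) normalisation that `np.cov` uses by default, and each rotated coordinate is divided by the square root of its eigenvalue, which is the scaling that makes the Euclidean distance in the new coordinates equal to the Mahalanobis distance. The function name `mahalanobis_via_rotation` is hypothetical.

```python
import numpy as np


def mahalanobis_via_rotation(train, query):
    """Distances from `query` to every row of `train` (N samples x L features)."""
    train = np.asarray(train, dtype=float)
    query = np.asarray(query, dtype=float)
    mean = train.mean(axis=0)
    X = train - mean                          # step 3.2.1: centralize
    cov = np.cov(X, rowvar=False)             # step 3.2.2: covariance matrix
    eigvals, V = np.linalg.eigh(cov)          # eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]         # step 3.2.3: descending order
    eigvals, V = eigvals[order], V[:, order]
    scale = np.sqrt(eigvals)                  # assumed per-axis scaling
    Z = (X @ V) / scale                       # step 3.2.4: rotated data
    z0 = ((query - mean) @ V) / scale         # current sample, same mapping
    return np.sqrt(((Z - z0) ** 2).sum(axis=1))  # step 3.2.5
```

The requirement in step 3.1 that the number of samples exceed the feature dimension matters here: with too few samples the covariance matrix is singular, some eigenvalues are zero, and the scaling divides by zero.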
8. The method for predicting the purchase of the aluminum electrolytic capacitor based on the Mahalanobis distance KNN algorithm according to claim 6, wherein the K value determination method for selecting the first K sample data in the step 3.4 specifically includes:
selecting candidate K values and determining the final K value according to the accuracy of a leave-one-out experiment on the current sample data; each time, one sample of known category is held out as the test sample and the other sample data are used as training set samples; for each candidate K value, prediction is performed by the KNN method and the accuracy is counted; the K value with the highest accuracy is the K value that is set;
$$P_K = \frac{1}{N}\sum_{i=1}^{N} \mathrm{KNN}(Y_i)$$
and K is an odd number
wherein P_K represents the average accuracy for the current K value; KNN(Y_i) represents the prediction result when sample Y_i is used as the test sample and the remaining samples are used as training set samples: KNN(Y_i) = 1 when the predicted class is the same as the class of Y_i, and KNN(Y_i) = 0 when it differs;
$$\sum_{i=1}^{N} \mathrm{KNN}(Y_i)$$
represents the number of correct predictions among the N samples; the K value corresponding to the maximum P_K is used as the final K value.
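The leave-one-out selection of K can be sketched as follows. The function name `choose_k`, the candidate list, and the integer labels (1 for purchase, 0 for not purchased) are assumptions; the distance is passed in as a function so that a Mahalanobis implementation can be plugged in, and the first K to reach the best accuracy is kept when several K values tie.

```python
def choose_k(samples, labels, distance, candidates=(1, 3, 5, 7)):
    """Return (best K, its leave-one-out accuracy P_K)."""
    n = len(samples)
    best_k, best_accuracy = None, -1.0
    for k in candidates:  # odd values, so two-class votes cannot tie
        correct = 0
        for i in range(n):
            # Hold out sample i; all other samples form the training set.
            others = [j for j in range(n) if j != i]
            ranked = sorted(others, key=lambda j: distance(samples[i], samples[j]))
            votes = [labels[j] for j in ranked[:k]]
            t1 = sum(1 for v in votes if v == 1)
            predicted = 1 if t1 > len(votes) - t1 else 0
            correct += int(predicted == labels[i])  # KNN(Y_i) is 1 or 0
        accuracy = correct / n  # P_K
        if accuracy > best_accuracy:
            best_k, best_accuracy = k, accuracy
    return best_k, best_accuracy
```

Each candidate K costs N full rankings of the remaining samples, so this exhaustive loop is only practical for modest sample counts.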
CN202011299561.8A 2020-11-19 2020-11-19 KNN algorithm aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance Active CN112580686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011299561.8A CN112580686B (en) 2020-11-19 2020-11-19 KNN algorithm aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance

Publications (2)

Publication Number Publication Date
CN112580686A (en) 2021-03-30
CN112580686B (en) 2023-05-02

Family

ID=75123094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011299561.8A Active CN112580686B (en) 2020-11-19 2020-11-19 KNN algorithm aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance

Country Status (1)

Country Link
CN (1) CN112580686B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113534454A (en) * 2021-07-12 2021-10-22 北京邮电大学 Multi-core optical fiber channel damage equalization method and system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609869A (en) * 2012-02-03 2012-07-25 纽海信息技术(上海)有限公司 Commodity purchasing system and method
CN107784538A (en) * 2016-08-26 2018-03-09 佛山市顺德区美的电热电器制造有限公司 The recommendation method and device of household electrical appliance
CN107862566A (en) * 2017-10-17 2018-03-30 杨明 A kind of Method of Commodity Recommendation and system
US20180285959A1 (en) * 2017-03-30 2018-10-04 Crane Merchandising Systems, Inc. Product recommendation engine for consumer interface of unattended retail points of sale
CN108647811A (en) * 2018-04-26 2018-10-12 中国联合网络通信集团有限公司 Predict that user buys method, apparatus, equipment and the storage medium of equity commodity
CN109255567A (en) * 2018-08-08 2019-01-22 北京京东尚科信息技术有限公司 Commodity part type matching process, device, system, electronic equipment and readable medium
CN110674384A (en) * 2019-09-27 2020-01-10 厦门晶欣电子有限公司 Component model matching method
US20200027103A1 (en) * 2018-07-23 2020-01-23 Adobe Inc. Prioritization System for Products Using a Historical Purchase Sequence and Customer Features
CN111652671A (en) * 2020-04-24 2020-09-11 青岛檬豆网络科技有限公司 Purchasing mall suitable for buyer market environment and purchasing method thereof

Also Published As

Publication number Publication date
CN112580686B (en) 2023-05-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant