CN112580686B - KNN algorithm aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance - Google Patents


Info

Publication number
CN112580686B
CN112580686B (Application CN202011299561.8A)
Authority
CN
China
Prior art keywords
value
electrolytic capacitor
aluminum electrolytic
data
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011299561.8A
Other languages
Chinese (zh)
Other versions
CN112580686A (en)
Inventor
郑鑫
陈建琪
徐楠楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Mengdou Network Technology Co ltd
Original Assignee
Qingdao Mengdou Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Mengdou Network Technology Co ltd filed Critical Qingdao Mengdou Network Technology Co ltd
Priority to CN202011299561.8A priority Critical patent/CN112580686B/en
Publication of CN112580686A publication Critical patent/CN112580686A/en
Application granted granted Critical
Publication of CN112580686B publication Critical patent/CN112580686B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • Evolutionary Biology (AREA)
  • Human Resources & Organizations (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a Mahalanobis-distance-based KNN algorithm purchase prediction method for aluminum electrolytic capacitors, characterized by comprising the following steps. Step 1: confirm the key parameters of the aluminum electrolytic capacitor through material analysis of the aluminum electrolytic capacitor; the key parameter items are preliminarily extracted with a frequent-item-set extraction method and finally confirmed from the parameter items appearing in the frequent item sets. Step 2: determine a parameter matching scheme according to the decomposition rules of the material description, and determine the range of usable products. Step 3: adopt a purchase prediction method based on the Mahalanobis-distance KNN algorithm, give a purchase prediction of products for the current user, and preliminarily screen a list of products the user is likely to purchase. The method realizes the product confirmation and purchase prediction functions for aluminum electrolytic capacitors, saving the user time, improving work efficiency and improving the platform experience.

Description

KNN algorithm aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance
Technical Field
The invention relates to the technical field of aluminum electrolytic capacitor decomposition and identification, in particular to a KNN algorithm aluminum electrolytic capacitor purchase prediction method based on a Mahalanobis distance.
Background
In the electronic component industry, components such as resistors and capacitors have a low manufacturing threshold, so there are a great many brands, series and suppliers. At present, product determination for aluminum electrolytic capacitors is generally solved either by locating the product directly from the original factory model, or by directly comparing the material description of the aluminum electrolytic capacitor with the symbols in the capacitor database to judge whether it is the product the user requires. Direct location by the original factory model is the more accurate method. However, direct location is not suitable for material descriptions: operating directly on the description through identity or inclusion relations is neither accurate nor comprehensive enough for the user to determine the product, and it also places high demands on how comprehensively the product descriptions in the database are written and expressed.
To give the user a better purchase experience, the user's buying habits and buying criteria must be better understood. When the user has a purchase history on the platform and no unique requirement for the purchased product (a unique requirement means the user designates the original factory model; in that case the product can be located for the user directly, without product determination), products meeting the conditions can be offered to the user according to the user's past purchases.
Disclosure of Invention
The purpose of the invention: aiming at the difficulty of aluminum electrolytic capacitor decomposition and recommendation, the invention provides a parameter decomposition method for the aluminum electrolytic capacitor and a KNN (K-Nearest Neighbor, KNN for short) method based on the Mahalanobis distance, gives a purchase prediction of products for the current user, preliminarily screens a list of products the user is likely to purchase, saves the user time, improves working efficiency and maximizes the platform experience.
In order to solve the problems, the invention adopts the following technical scheme:
the KNN algorithm aluminum electrolytic capacitor purchase prediction method based on the Mahalanobis distance is characterized by comprising the following steps of:
step 1: confirm the key parameters of the aluminum electrolytic capacitor through material analysis of the aluminum electrolytic capacitor; for confirming the key parameter items, carry out a preliminary extraction with a frequent-item-set extraction method, and finally confirm the key parameter items through the parameter items appearing in the frequent item sets;
step 2: determine a parameter matching scheme according to the decomposition rules of the material description, and determine the range of usable products;
step 3: adopt a purchase prediction method based on the Mahalanobis-distance KNN algorithm, give a purchase prediction of products for the current user, and preliminarily screen a list of products the user is likely to purchase.
Further, in the step 1, the material analysis of the aluminum electrolytic capacitor is used for confirming the key parameters of the aluminum electrolytic capacitor, and the specific steps include:
step 1.1: collecting material description of the aluminum electrolytic capacitor, and constructing a material description data set of the aluminum electrolytic capacitor;
step 1.2: data cleaning: remove material descriptions that are blank or contain only Chinese characters, and convert the numeric representations in the material descriptions into uniform numerical values; the cleaned aluminum electrolytic capacitor data set is D;
step 1.3: collect the character set W1 of all aluminum electrolytic capacitor material descriptions and de-duplicate it;
step 1.4: extract the frequent item sets S, on the following principle: if an item set is a frequent item set, then all of its non-empty subsets are frequent as well; conversely, if an item set is infrequent, then all of its supersets are infrequent as well;
step 1.5: from the strings extracted into the set S, extract the strings that represent parameters and occur with higher frequency, and finally determine the key parameter items of the aluminum electrolytic capacitor.
Further, the step of extracting the frequent item set S in step 1.4 includes:
step 1.4.1: combine the characters in the character set W1 pairwise to form the set W2; if a string in W2 occurs more than L times in the data set D, it is a subset of a frequent item set, and the set of such strings is T2; L is determined by the size of the data set, L = N × 10%, where N is the number of records in the data set; otherwise, no item set containing that string is a frequent item set;
step 1.4.2: combine T2 with W1, with the string from T2 in front and the character from W1 behind, to form the string set W3; if a string in W3 occurs more than L times, it is a subset of a frequent item set, and the set of such strings is T3; otherwise, no item set containing that string is a frequent item set;
step 1.4.3: combine T3 with W1 and repeat the frequent-item-set search of step 1.4.2 to obtain the string set T4; continue in this way until Tn is empty, then end the loop;
step 1.4.4: form the set of string item sets W = [Tn-1, …, T3, T2, T1, W1];
step 1.4.5: count the occurrences of each string in W within the material descriptions to form the matrix FW, which contains the strings and their corresponding occurrence counts;
step 1.4.6: merge strings: merging starts from the shorter strings and proceeds upward; if a long string occurs at least as many times as a short one, remove the short string and its occurrence count from FW, where the occurrence count of the long string is the sum of the counts of all nearest superset strings of the short string; if the long string occurs fewer times than the short one, change the short string's count to the occurrences of the short string minus the occurrences of the long string, where the occurrence count of the long string is again the sum of the counts of all nearest superset strings of the short string;
step 1.4.7: sort the strings by occurrence count from high to low; the strings whose counts exceed 30% of the total number of records N form the set S.
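Steps 1.4.1-1.4.7 above can be sketched in Python as follows. This is a minimal illustration: the function name, the right-only string extension, and the simplified handling of the superset merge in step 1.4.6 are assumptions, not the patent's exact implementation.

```python
def extract_frequent_strings(descriptions, min_frac=0.10, report_frac=0.30):
    """Sketch of steps 1.4.1-1.4.7: grow frequent substrings one character
    at a time (Apriori-style pruning), then keep the most frequent ones.
    The superset merging of step 1.4.6 is simplified here."""
    n = len(descriptions)
    threshold = n * min_frac                    # L = N x 10%
    w1 = sorted({ch for d in descriptions for ch in d})   # step 1.3: character set

    def occurrences(s):                         # how many records contain s
        return sum(1 for d in descriptions if s in d)

    pool, prev = list(w1), list(w1)
    while prev:                                 # steps 1.4.1-1.4.3: extend on the right
        candidates = {p + c for p in prev for c in w1}
        prev = [s for s in sorted(candidates) if occurrences(s) > threshold]
        pool.extend(prev)                       # step 1.4.4: collect every tier

    fw = {s: occurrences(s) for s in pool}      # step 1.4.5: the matrix FW
    # step 1.4.6 (simplified): drop a short string when its one-character
    # right extensions are at least as frequent in total
    for s in sorted(fw, key=len):
        supers = [t for t in fw if len(t) == len(s) + 1 and t.startswith(s)]
        if supers and sum(fw[t] for t in supers) >= fw[s]:
            del fw[s]
    # step 1.4.7: keep strings occurring in more than 30% of all records
    return sorted((s for s in fw if occurrences(s) > n * report_frac),
                  key=lambda s: -occurrences(s))
```

On a toy data set of descriptions such as "25V", the extraction surfaces the shared unit character "V" first and then the most frequent value strings.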
Further, the determining a parameter matching scheme according to the decomposition rule of the material description in the step 2, and determining a usable product range specifically includes:
step 2.1: determining user input, and describing the model and materials of a former factory;
step 2.2: if the step 2.1 is determined to be the original factory model through the database of the platform, the following steps of product determination are directly skipped, and the product is directly positioned; if the input is determined to be the material description in the step 2.1, the process is performed downwards;
step 2.3: identifying the category: checking whether the material description contains a class name and an alias in the corresponding data, and finally confirming the class of the material description;
step 2.4: determining parameters of an aluminum electrolytic capacitor: in the material description of the aluminum electrolytic capacitor accumulated by the platform, referring to the step 1, the statistics shows that the key parameters for determining the aluminum electrolytic capacitor product are as follows:
(1) Capacitance: a float value, unified to the unit UF; the unit is not displayed;
(2) Rated voltage: a float value, unified to the unit V; the unit is not displayed;
(3) Precision: a float value, unified; the unit is not displayed;
(4) Lifetime: an int value, unified to the unit HRS; the unit is not displayed;
(5) Operating temperature: a character type; the high and low operating-temperature values are both int values, joined by the symbol '/' with the low value first, and finally returned as a character type;
(6) Diameter: a float value, unified to the unit MM; the unit is not displayed;
(7) Height: a float value, unified to the unit MM; the unit is not displayed;
(8) Foot distance: a float value, unified to the unit MM; the unit is not displayed;
(9) Mounting type: a character type;
the output format of the parameter items is the same as the unified data format of the product parameter items in the database;
step 2.5: unify symbols: replace certain symbols uniformly to facilitate the subsequent operations and parameter extraction;
step 2.6: extract the mounting type: the mounting types have a correspondence with descriptive words, and the mounting type is extracted according to this correspondence; the mounting type is retained while the corresponding vocabulary is deleted from the material description, ensuring that extracted information is not applied twice;
step 2.7: extract the precision; record the extracted value and the original characters it was expressed with, and update the material description;
step 2.8: unify distance symbols: unify the distance units to MM, and store both the unified value and the original characters, i.e., record the correspondence between the converted data and the original data;
step 2.9: extract the diameter, height and foot distance; the extraction operates on the material description after its characters have been capitalized, with everything except digits and the letter X replaced by spaces;
step 2.10: uniformly replace the descriptions of capacitance, voltage, temperature and time in the material description with their respective unified character forms; after conversion, store the numerical value and the corresponding characters;
step 2.11: extract the capacitance, voltage and lifetime, and update the material description;
step 2.12: extracting a temperature range; and updating the material description;
step 2.13: extract the capacitance expressed with the capacitor's special symbols; if the capacitance was not successfully extracted in step 2.11, enter this step to extract it further, and update the material description after a successful extraction;
step 2.14: extract capacitance, voltage and precision written in scientific notation; if the capacitance, voltage or precision was not successfully extracted before this step, enter this step to extract it;
step 2.15: extract a capacitance expressed as a bare number; if the preceding steps did not successfully extract the capacitance, enter this step: extract a pure number with no letters immediately before or after it in the material description, output it as the capacitance, and update the material description;
step 2.16: extract a precision expressed as a letter; if the preceding steps did not successfully extract the precision, enter this step: take the first letter in the material description that appears singly, i.e., with no adjacent letters before or after it, output it as the precision, and update the material description;
step 2.17: extract the foot distance; if the preceding steps did not successfully extract the foot distance, enter this step: among the numbers remaining in the description with no letters immediately before or after, if the value lies in [1,50] and has at most one decimal place, output it as the foot distance;
step 2.18: confirming a parameter item; if the necessary parameter items are extracted successfully, entering the next step to confirm the product; otherwise, reminding the user of parameter missing and complementing corresponding parameter items;
step 2.19: confirming a product; and determining the products meeting the parameter values corresponding to the extracted parameter items in the database as purchasable products, namely predicting the purchase possibility of the purchasable products by using a KNN algorithm based on the Mahalanobis distance.
Further, the necessary parameter items of the aluminum electrolytic capacitor in step 2.18 include: capacitance, rated voltage, accuracy, operating temperature, diameter/length, foot distance.
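As an illustration of the unit-normalizing extraction described in steps 2.4 and 2.10-2.16, the following is a minimal regex sketch. The patterns and the function name are hypothetical simplifications of the rules above, covering only capacitance, rated voltage and precision.

```python
import re

def extract_capacitor_params(description):
    """Illustrative extraction of a few key parameter items (step 2.4):
    capacitance (UF), rated voltage (V) and precision (%), returned as
    bare numbers with the unit stripped, as the method specifies."""
    text = description.upper()
    params = {}
    m = re.search(r"(\d+(?:\.\d+)?)\s*UF", text)      # capacitance, unified to UF
    if m:
        params["capacitance"] = float(m.group(1))
    m = re.search(r"(\d+(?:\.\d+)?)\s*V\b", text)     # rated voltage, unified to V
    if m:
        params["voltage"] = float(m.group(1))
    m = re.search(r"±?\s*(\d+(?:\.\d+)?)\s*%", text)  # precision
    if m:
        params["precision"] = float(m.group(1))
    return params
```

For example, a description such as "100UF 25V ±20%" would yield bare values 100, 25 and 20 with the units stripped, matching the unified output format described in step 2.4.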
Further, the predicting purchasing method of the KNN algorithm based on the mahalanobis distance in the step 3 provides a purchasing prediction of the product at the current user, and the specific steps include:
step 3.1: data preparation:
data preparation is carried out through purchase records of users, so that data samples of different categories including purchase or non-purchase are kept in a ratio of 1:1; setting the number of samples as N, and simultaneously ensuring that the number of samples is larger than the characteristic dimension L of the samples;
step 3.2: calculating the distance:
calculating the distance between the current sample to be classified and each sample in the classified samples in the training set; calculating the distance between samples by using the mahalanobis distance;
step 3.3: sorting the distances;
sort the samples in ascending order by the distance Di, i = 1, 2, …, N, between the current sample to be classified and each sample in the training set;
step 3.4: neighbor samples are determined.
Selecting the first K sample data as neighbor samples of the current sample to be classified according to the ordered distance;
step 3.5: counting the number of category attributes of the neighbor samples;
count the category attributes of the K neighbor samples: category w1 contains t1 samples and category w2 contains t2 samples;
step 3.6: determine the category attribute of the sample to be classified;
when t1 > t2, the category attribute of the sample to be classified is w1, i.e., purchased;
when t1 < t2, the category attribute of the sample to be classified is w2, i.e., not purchased;
step 3.7: and adding the product forecast result which can be purchased as a purchased product to a list of recommended purchases for the user.
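The classification loop of steps 3.2-3.6 can be sketched as follows. This is a hedged illustration: `knn_predict` is a hypothetical name, and the plain Euclidean metric passed in the usage line is only a placeholder for the Mahalanobis distance the patent defines in step 3.2.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k, distance):
    """Steps 3.2-3.6: rank the training samples by distance to the sample
    to be classified, take the K nearest, and vote between the two classes
    (1 = purchased, 0 = not purchased)."""
    dists = np.array([distance(query, x) for x in train_X])  # step 3.2
    nearest = np.argsort(dists)[:k]                          # steps 3.3-3.4
    votes = train_y[nearest]                                 # step 3.5: t1 vs t2
    return 1 if votes.sum() * 2 > k else 0                   # step 3.6: t1 > t2

# placeholder metric for the usage example below
euclidean = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
```

With K odd, as the method requires, the vote `votes.sum() * 2 > k` is exactly the comparison t1 > t2, so no tie-break is needed.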
Further, calculating the distance between samples using the Mahalanobis distance in step 3.2 specifically comprises: calculate the inter-sample distance D(Yi, Yj), where D(Yi, Yj) denotes the Mahalanobis distance between samples Yi and Yj, as follows:
step 3.2.1: data centering:

X = Y − Ȳ

where Y represents the original data, X the centered data, and Ȳ the average value of the original data;
step 3.2.2: solve the mapping matrix: from the covariance matrix

Σ = XᵀX / (N − 1),

where N is the number of samples, solve the eigenvalues and the corresponding eigenvectors;
step 3.2.3: sorting the characteristic values in a descending order; the feature values solved in the step 3.2.2 are arranged in a descending order, and feature vectors corresponding to the feature values are ordered to form a feature vector matrix V; matrix V represents the complete principal component space;
Step 3.2.4: obtaining data in a rotation space; mapping the centralized data into a principal component space, wherein Z=XV is used for obtaining rotated data Z;
step 3.2.5: compute the Euclidean distance in the new coordinates, scaled by the eigenvalues, which is the Mahalanobis distance between the corresponding original data points:

D(z0, zi) = √( Σ_{k=1}^{L} (z0,k − zi,k)² / λk )

where L is the feature dimension and λk the k-th eigenvalue;
the smaller the Mahalanobis distance, the higher the similarity between samples; the greater the distance, the lower the similarity;
here D(z0, zi) is the Mahalanobis distance between the current sample Y0 and the training-set sample Yi; z0 is the coordinate of the current sample in the new coordinate system, and zi is the coordinate of the i-th training-set sample Yi in the new coordinate system.
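Steps 3.2.1-3.2.5 can be sketched in Python as follows. This is an illustrative implementation; the per-axis division by the eigenvalue in the final step is the assumption that makes the rotated-space distance equal the Mahalanobis distance of the original data.

```python
import numpy as np

def mahalanobis_setup(Y):
    """Steps 3.2.1-3.2.4: center the data, eigendecompose the covariance
    matrix, and rotate into the principal-component space."""
    X = Y - Y.mean(axis=0)                 # step 3.2.1: centering
    cov = X.T @ X / (len(Y) - 1)           # step 3.2.2: covariance matrix
    eigvals, V = np.linalg.eigh(cov)       # eigenvalues / eigenvectors
    order = np.argsort(eigvals)[::-1]      # step 3.2.3: descending order
    eigvals, V = eigvals[order], V[:, order]
    Z = X @ V                              # step 3.2.4: rotated data Z = XV
    return Z, eigvals

def mahalanobis(z0, zi, eigvals):
    """Step 3.2.5: eigenvalue-scaled Euclidean distance in rotated space."""
    return float(np.sqrt(np.sum((z0 - zi) ** 2 / eigvals)))
```

Because (z0 − zi) = (y0 − yi)V and V diagonalizes the covariance, the scaled rotated-space distance equals the textbook form √((y0 − yi)ᵀ Σ⁻¹ (y0 − yi)).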
Further, the first K sample data are selected in the step 3.4, where the K value determining method specifically includes:
select and determine the K value: determine the final K value via the accuracy of a leave-one-out experiment on the current sample data; each time, take one sample of known class as the test sample and the remaining samples as the training set; for a chosen K value, predict with the KNN method above and record the accuracy; the K value with the highest accuracy is the K value to set:

P_K = (1/N) Σ_{i=1}^{N} KNN(Yi),  K odd

where P_K represents the average accuracy for the current value of K; KNN(Yi) represents the result when sample Yi is the test sample and the remaining samples form the training set: KNN(Yi) = 1 when the prediction equals the class of Yi, and KNN(Yi) = 0 when it differs; Σ_{i=1}^{N} KNN(Yi) is the number of correct predictions among the N samples. Select the K corresponding to the maximal P_K as the final K value.
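The leave-one-out selection of K described above can be sketched as follows. This is illustrative: `choose_k` is a hypothetical name, and a plain Euclidean metric stands in for the Mahalanobis distance of step 3.2.

```python
import numpy as np

def choose_k(X, y, k_candidates):
    """Leave-one-out selection of K: for each odd candidate K, predict every
    sample from the remaining N-1 samples and keep the K with the highest
    average accuracy P_K."""
    def knn(train_X, train_y, query, k):
        d = np.linalg.norm(train_X - query, axis=1)   # placeholder metric
        nearest = np.argsort(d)[:k]
        return 1 if train_y[nearest].sum() * 2 > k else 0

    n = len(X)
    best_k, best_acc = None, -1.0
    for k in k_candidates:
        if k % 2 == 0:                     # the method requires K to be odd
            continue
        correct = sum(
            knn(X[np.arange(n) != i], y[np.arange(n) != i], X[i], k) == y[i]
            for i in range(n)              # each sample once as the test set
        )
        acc = correct / n                  # average accuracy P_K
        if acc > best_acc:                 # keep the K with maximal P_K
            best_k, best_acc = k, acc
    return best_k, best_acc
```

On two well-separated clusters, the smallest odd K already achieves perfect leave-one-out accuracy and is therefore selected.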
The technical scheme provided by the embodiment of the invention has the beneficial effects that at least:
(1) For confirming the key parameter items of the aluminum electrolytic capacitor, a frequent-item-set extraction method performs the preliminary extraction, and the key parameter items are finally confirmed from the parameter items in the frequent item sets. The statistics and extraction of the key parameters involve little manual participation, so the confirmed parameters are objective and accurate and match what customers actually need when selecting products. For the confirmed parameters of the aluminum electrolytic capacitor, the decomposition symbols and rules of the parameters are determined.
(2) The prepared data consist of scores or measurements from different aspects with different measurement units or scoring standards; with a direct distance calculation, some dimensions would contribute little or nothing to the distance while others would contribute too much. The Mahalanobis distance is therefore used as the metric: it is not governed by dimension, and the Mahalanobis distance between two samples is independent of the measurement units and scoring standards of the original sample data. It also accounts for the correlations between the individual variables and eliminates their interference, and the Mahalanobis distance computed from standardized data equals that computed from centered data, so no calculation error is introduced.
(3) KNN classifies by the distances between samples. Its idea: if most of the K samples most similar to a given sample (i.e., nearest in the feature space) belong to a class, then the sample also belongs to that class. The training time complexity of the KNN method is low, only O(n); it makes no assumptions about the data, achieves high accuracy, and is insensitive to outliers; new data can be added to the data set directly without retraining, so data can be added flexibly and the data model does not need to be replaced regularly.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
Fig. 1 is a flowchart of an aluminum electrolytic capacitor purchase prediction method based on a KNN algorithm of mahalanobis distance according to an embodiment of the present invention.
DETAILED DESCRIPTION OF EMBODIMENT(S) OF INVENTION
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiment of the invention provides a Mahalanobis-distance-based KNN algorithm purchase prediction method for aluminum electrolytic capacitors, comprising three main steps: (1) parameter confirmation of the aluminum electrolytic capacitor; (2) product determination of the aluminum electrolytic capacitor; (3) purchase prediction with the Mahalanobis-distance-based KNN algorithm. The key parameters of the aluminum electrolytic capacitor are confirmed through material analysis; for confirming the key parameter items, a preliminary extraction is carried out with a frequent-item-set extraction method, and the key parameter items are finally confirmed from the parameter items appearing in the frequent item sets. A parameter matching scheme is determined according to the decomposition rules of the material description, and the range of usable products is determined. A purchase prediction method based on the Mahalanobis-distance KNN algorithm then gives a purchase prediction of products for the current user and preliminarily screens a list of products the user is likely to purchase. The method is described in detail below.
(1) Parameter confirmation of aluminum electrolytic capacitor
Step 1.1: and collecting material description of the aluminum electrolytic capacitor, and constructing a material description data set of the aluminum electrolytic capacitor.
Step 1.2: data cleaning. Remove material descriptions that are blank or contain only Chinese characters, and convert the numeric representations in the material descriptions into uniform numerical values (i.e., different numeric values combined with the same character carry the same meaning: 25V and 50V, for example, both represent voltage values). The cleaned aluminum electrolytic capacitor data set is D.
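The value-unification in this cleaning step can be sketched as follows (the regex and the function name are illustrative assumptions, not the patent's exact rule):

```python
import re

def unify_numeric_tokens(description):
    """Normalize every number-plus-letter token (e.g. '25 v', '0050V') to a
    canonical '<value><UNIT>' form, so that tokens differing only in the
    numeric value are represented the same way (step 1.2)."""
    def norm(match):
        value, unit = match.group(1), match.group(2).upper()
        return f"{float(value):g}{unit}"
    return re.sub(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)", norm, description)
```

After this pass, "25 v" and "0050V" both collapse to the same value-plus-unit shape, which is what lets the later frequent-item-set statistics treat all voltage tokens alike.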
Step 1.3: and counting a character set W1 (except Chinese or English symbols, reserved mathematical expression symbols such as the symbols, +/-and the like) of all the aluminum electrolytic capacitors, and performing duplication elimination.
Step 1.4: the frequent item sets S are extracted (the principle is that if a certain item set is a frequent item set, then all its non-empty subsets are frequent (a priori principle) i.e. if 0,1 is frequent, then 0,1 must be frequent, conversely, i.e. if a item set is not frequent, then all its supersets are also not frequent.
Wherein, the step of extracting the frequent item set S includes:
step 1.4.1: will character set W 1 The characters in the two are combined pairwise to form a set W 2 . Set W 2 If the number of occurrences is greater than L (L is determined according to the number of data sets, L=Nx10%, where N is the number of data pieces in the data set), it is a subset of frequent item sets, and its string set is T 2 The method comprises the steps of carrying out a first treatment on the surface of the If not, none of the item sets containing the character string is a frequent item set.
Step 1.4.2: will T 2 And W is 1 Combining, T 2 The character string in (a) is in front, W 1 Thereafter, a set W of character strings is formed 3 . Set W 3 If the number of occurrence times is larger than L, the subset of frequent item sets is the character string set of T 3 The method comprises the steps of carrying out a first treatment on the surface of the If not, none of the item sets containing the character string is a frequent item set.
Step 1.4.3: t (T) 3 And W is 1 Combining, repeating the operation of searching frequent item sets in the step 1.4.2 to obtain a character string set T 4 . In this way, circulate until T n When empty, the cycle is ended.
Step 1.4.4: set w= [ T ] of string item sets n-1 ,…,T 3 ,T 2 ,T 1 ,W 1 ]。
Step 1.4.5: counting the occurrence times. Counting the occurrence times of the character strings in the W in the character strings to form a matrix FW, wherein the matrix FW comprises the character strings and the corresponding occurrence times.
Step 1.4.6: and merging the character strings. When character strings are combined, character strings with shorter lengths start to be upwards performed. The method comprises the steps of removing a short string and the corresponding occurrence number (the occurrence number of the long string is the sum of the times of all the superset strings nearest to the short string) of the FW if the occurrence number of the long string is greater than or equal to the short string; if the number of occurrences of the long string is smaller than the short string, the corresponding number of occurrences of the short string is modified to be the number of occurrences of the short string-the number of occurrences of the long string (here, the number of occurrences of the long string is the sum of the numbers of occurrences of all the superset strings that the short string has latest). If the FW has 5pf,5p, 5V character strings, the most recent superset character string corresponding to the short character string 5 here is 5p and 5V, and the most recent superset character string corresponding to the short character string 5p is 5pf.
Step 1.4.7: and sequencing, namely sequencing the character strings from high to low according to the occurrence times of the character strings, wherein the occurrence times of the character strings exceed 30% of the total number N of the data lump, so as to form a set S.
Step 1.5: and extracting character strings (such as 50PF, which represents a capacitance value; 50V, which represents a voltage) representing parameters with higher occurrence frequency according to the character strings extracted by the set S, and finally determining key parameter items of the aluminum electrolytic capacitor.
The following parameter confirmation steps for the aluminum electrolytic capacitor of step (1) are exemplified as follows:
examples: because the data set is larger, the example display is more complex, and the next example is only displayed by displaying the keyword step or the assumed condition data of the aluminum electrolytic capacitor parameter confirmation.
Step 1.1: for example, there are three character strings of '1000uF/25v,10 x 20, 105 ℃,20%,5', '2200uF/35v,16 x 25, 105 ℃,20%,7.5, 6000HRS', 'capacitance';
step 1.2: the cleaning data, here the number is replaced collectively with 5. The data obtained after washing are:
Figure GDA0004047384000000121
Figure GDA0004047384000000131
step 1.3: statistics of the present character set W 1
W 1 ={R,V,S,u,%,F,5,H,℃,*}
Step 1.4: the frequent item set S is extracted.
Step 1.4.1: from W 1 Obtaining W 2 The value of L is 0.2, and T can be obtained 2 The following are provided:
W2 = {RR, RV, RS, Ru, R%, RF, R5, RH, R℃, R*, VR, VV, VS, Vu, ……}
T2 = {RS, uF, 5V, 5u, 5%, 5H, 5℃, 5*, HR, *5}
step 1.4.2: from T 2 And W is 1 Obtaining W 3 Thereby obtaining T 3 The following are provided:
W3 = {RSR, RSV, RSS, RSu, RS%, RSF, RS5, RSH, RS℃, RS*, ……}
T3 = {5uF, 5HR, 5*5, HRS}
step 1.4.3: from T 3 And W is 1 Obtaining W 4 Thereby obtaining T 4 Sequentially circulate until T n Is empty as follows:
W4 = {5uFR, 5uFV, 5uFS, 5uFu, 5uF%, 5uFF, 5uF5, 5uFH, 5uF℃, 5uF*, ……}
T4 = {5HRS}
W5 = {5HRSR, 5HRSV, 5HRSS, 5HRSu, 5HRS%, 5HRSF, 5HRS5, 5HRS℃, ……}
T5 = {}
T5 is empty, so the loop ends.
Step 1.4.4: a set W of string item sets is obtained as follows:
W={5HRS,5uF,5HR,5*5,HRS,RS,uF,5V,5u,5%,5H,5℃,5*,HR,*5,R,V,S,u,%,F,5,H,℃,*}
step 1.4.5: counting the occurrence times of the character strings to obtain FW.
FW={{5HRS,1},{5uF,2},{5HR,1},{5*5,2},{HRS,1},{RS,1},{uF,2},{5V,2},{5u,2},{5%,2},{5H,1},{5℃,2},{5*,2},{HR,1},{*5,2},{R,1},{V,2},{S,1},{u,2},{%,2},{F,2},{5,15},{H,1},{℃,2},{*,2}}
Step 1.4.6: and merging the character strings. This step should be performed starting from a shorter length string. The example here is the main line: 5→F→u→5u→uf→5uF are illustrated by way of example and do not proceed fully according to step 1.46 in the extraction frequent item set S.
(1) The string '5' occurs 15 times; its nearest supersets are '5V', '5u', '5%', '5H', '5℃', '5*', '*5', whose counts sum to 2+2+2+1+2+2+2 = 13, so '5' and its count in FW are modified to {5,2}.
(2) The string 'F' occurs 2 times; its nearest superset 'uF' also occurs 2 times, so 'F' and its count are removed from FW.
(3) The string 'u' occurs 2 times; its nearest supersets are 'uF' and '5u', whose counts sum to 2+2 = 4, so 'u' and its count are removed from FW.
(4) The string 'uF' occurs 2 times; its nearest superset '5uF' also occurs 2 times, so 'uF' and its count are removed from FW.
FW after merging the character strings according to the step 1.4.6 of extracting the frequent item set S is as follows:
FW={{5HRS,1},{5uF,2},{5*5,2},{5V,2},{5%,2},{5℃,2},{5,2}}
Step 1.4.7: Sort. 30% of the total record count N is 0.6, so strings occurring more than 0.6 times are sorted in descending order of count: S = {{5uF,2}, {5*5,2}, {5V,2}, {5%,2}, {5℃,2}, {5HRS,1}}.
Step 1.5: according to the character strings displayed in the set S, for example, the character strings representing the parameters with higher occurrence frequency are extracted, and the meaning of the parameter items is determined, for example: 5uF represents capacitance; 5*5 the diameter (length) and height (width); 5V represents a voltage; 5% represents the precision; 5 ℃ represents the working temperature; 5HRS indicates lifetime.
(2) Product determination of aluminum electrolytic capacitor
Step 2.1: user input is determined, a factory model (including a factory model only and a material description containing the factory model, the factory model mentioned below contains both cases), a material description (the material description mentioned below is a material description which does not succeed in identifying the factory model, and two cases exist, namely, the description is a simple material description, and the description is a description containing the factory model which is not contained in a platform).
Step 2.2: if the step 2.1 is determined to be the former factory model through the database of the platform, the following steps for determining the product can be directly skipped, and the product can be directly positioned. If it is determined in step 2.1 that the input is a material description, then proceed downwards.
Step 2.3: and (5) confirming the category. And checking whether the material description contains the category names, aliases and the like in the corresponding data, and finally confirming the category of the material description. If the material description of the aluminum electrolytic capacitor is adopted, the following parameter decomposition process is met.
Step 2.4: and determining parameters of the aluminum electrolytic capacitor. In the material description of the aluminum electrolytic capacitor accumulated by the platform, the statistics result in determining the critical parameters of the aluminum electrolytic capacitor product (the statistical process is described in the step of confirming the parameters of the aluminum electrolytic capacitor, namely 1), and the following table is shown:
TABLE 1 Key parameters table for aluminium electrolytic capacitor
1 2 3 4 5 6 7 8 9
Capacitance value Rated voltage Precision Lifespan Operating temperature Diameter (length) Height (width) Foot distance Installation mode
(1) Capacitance value: the float type number, in UF (microfarads) units, are not shown.
(2) Rated voltage: float type number. The unit is V (volts) and is not shown.
(3) Precision: float type number. The unified unit is% and the unit is not shown.
(4) Life span: int type value. The unit is HRS (hours) and the unit is not shown.
(5) Operating temperature: character type. The low and high operating-temperature values are both int type values; the low value and the high value are joined by the symbol '/', and the result is finally returned as a character type.
(6) Diameter (length): float type number. The unit is MM (millimeters) and is not shown.
(7) Height (width): float type number. The unit is MM (millimeters) and is not shown.
(8) Foot distance: float type number. The unit is MM (millimeters) and is not shown.
(9) The installation mode is as follows: character type.
The output format of the parameter items is the same as the data format of the product parameter items after unification in the database.
Step 2.5: unified symbols. And certain symbols are replaced and unified, so that subsequent operations, parameter extraction and the like are facilitated. And replacing the symbols in the original material description according to a symbol correspondence table, wherein the table is a part of symbol correspondence.
Table 2 symbol correspondence table
1 2 3 4 5 6 7 8 ……
Original character μ —— positive/negative sign 0hm -/+ -\+ +- _ ……
Replacement character u - ± ohm ± ± ± - ……
Step 2.6: and (5) extracting an installation mode. The installation modes have corresponding relations (shown in table 3) among the description words, and the installation modes are extracted according to the corresponding relations. The installation mode and the vocabulary corresponding to the installation mode are reserved (the operation is that the following steps are recorded and extracted with the original representation characters), meanwhile, the corresponding vocabulary in the material description is deleted, the extracted information is ensured to be no longer reused (the action of deleting the extracted information character in the material description is hereinafter collectively referred to as updating the material description).
Table 3 correspondence (part) between the installation mode and the description vocabulary
(The table content is shown as a figure in the original and is not reproduced here.)
Step 2.7: and (5) extracting accuracy. For descriptions where only one precision appears, such as descriptions containing only 10% or + -10, etc., directly extracting '10.0' as the output of the precision; for descriptions where two or more or ranges of precision appear in the description, such as 10%20% or 10-20% or + -10-20% or the like, the '10.0/20.0' is extracted as output. And recording, extracting and original representation characters, and updating material description.
Step 2.8: unify distance symbols. The MM, CM, DM, rice, DM, CM, MM are given equal distance units. After a uniform distance unit, a new material description 5mm by 3mm is obtained, and a value (float value, such as 5.0) and an original character (such as 5 mm) after the uniform distance are stored, namely, the corresponding relation between the recorded data and the original data is also stored.
Step 2.9: diameter (length), height (width), foot distance are extracted. The extracted material description is the material description after the characters are capitalized, the parts except the numbers and the letters X are replaced by spaces, 7X8X6 or 7X8X6 continuous multiplication form fields are extracted in the first step, numerical value parts are extracted, the first numerical value is the diameter (length), the second numerical value is the height (width) and the third numerical value is the foot distance. If the first extraction fails, extracting 7*8 or 7X8 as a continuous multiplication form field, and extracting numerical values in the continuous multiplication form field, wherein the numerical values extracted by the method are the diameter (length) and the height (width) of the first numerical value; extracting values immediately following ' pitch= ', PITCH- ', ' PITCH ', ' p= ', ' P- ' and the like, wherein the values are foot distances. And when the character is extracted, the corresponding character of the character in the original material description is confirmed through the position of the extracted corresponding character. The original description as entered at this step is: '10pf,50v,7mm x 8mm x 6mm', the treated material was described as '10, 50, 7x8x 6', the diameter after extraction was 7.0, the height was 8.0, the foot distance was 6.0, and the corresponding characters were 7mm x 8mm x 6mm.
Step 2.10: and uniformly replacing the descriptions about the capacity value, the voltage, the temperature and the time in the material description with the respective uniform characters for description. The same as the conversion of the distance unit in step 2.8. For example, '50uf,80uf,500v', the value of the transformed value is stored as [50.0,80.0], and the original character represented by the value is [ '50uf', '80uf' ]; the value for the voltage is stored as [500.0], the original character represented by the voltage is [ '500V' ]; the material description after converting the replacement character is updated to '50.0UF,80.0UF,500.0V'
After conversion, the numerical values and the corresponding characters are stored. The correspondence of the converted unit symbols of the respective units and the unified symbol is given in the following table.
Table 4 Correspondence (part) between units and the unified unit
Quantity Units Unified unit representation
Capacitance value MF, UF, NF, PF, F UF
Temperature ℃ (degrees centigrade) ℃
Time HOUR, HOURS, HRS HRS
Voltage MV, UV, VAC, KV, V, VDC V
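Step 2.10's unit unification can be sketched as follows. `normalize` and `UNIT_MAP` are assumed names; the temperature symbol ℃ and the numeric scaling between units (e.g. PF → UF), which the real step would also apply, are omitted for brevity.

```python
import re

# Table 4 aliases -> unified unit symbol (temperature and unit scaling omitted)
UNIT_MAP = {
    'capacitance': ({'MF', 'UF', 'NF', 'PF', 'F'}, 'UF'),
    'voltage':     ({'MV', 'UV', 'VAC', 'KV', 'V', 'VDC'}, 'V'),
    'time':        ({'HOUR', 'HOURS', 'HRS'}, 'HRS'),
}

def normalize(description):
    """Sketch of step 2.10: rewrite number+unit fields with unified unit
    symbols while remembering each value and its original characters."""
    values = {k: [] for k in UNIT_MAP}
    originals = {k: [] for k in UNIT_MAP}
    def repl(m):
        number, unit = m.group(1), m.group(2).upper()
        for kind, (aliases, unified) in UNIT_MAP.items():
            if unit in aliases:
                values[kind].append(float(number))
                originals[kind].append(m.group(0))   # keep the raw characters
                return f'{float(number)}{unified}'
        return m.group(0)                            # unknown unit: left untouched
    new_desc = re.sub(r'(\d+(?:\.\d+)?)([A-Za-z]+)', repl, description)
    return new_desc, values, originals
```

Running it on the worked example '50uf,80uf,500v' yields the updated description '50.0UF,80.0UF,500.0V' with the values and original characters recorded, as in the text.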
Step 2.11: extracting capacitance, voltage and service life. Taking the extraction capacity value as an example for explanation: if the value related to the capacity value exists in the step 2.10, the maximum value is extracted and output as the capacity value. And extracting the original character basis of the capacity value corresponding to the numerical value. Update the material description (update all about the part of the capacity). If the capacity value is extracted from [50.0,80.0], returning to 80.0; the original character is 80uf; the material description was updated to 500.0V.
Step 2.12: extraction temperature range. Case 1: in step 2.10, two values related to temperature are extracted, and the first value is taken as the lowest working temperature, the second value is taken as the highest working temperature, and the connection output is carried out by the symbol '/'. Case 2: in the step 2.10, only one value related to temperature is extracted, and whether the similar value 25-150 ℃ exists in the material description, namely, the value immediately before the similar value is the lowest working temperature, and if the similar value exists, the temperature range is output for representation; if not, inquiring the value similar to '25-150 ', namely, immediately adjacent to the value after '25-150 ℃, as the highest working temperature, and if so, outputting a working range representation; if not, outputting the value of the current temperature representation. And records its original representation. Updating the material description.
Step 2.13: the capacitance value of the capacitor special symbol representation is extracted. If the extraction of the capacity value is not successful in the step 2.11, the step is entered, the capacity value is further extracted, and after the extraction is successful, the material description is updated. Extracting the similar capacity value representations of '5U3', '5U', 'U5', and the like, and extracting the capacity value representations with the units of UF as numerical values of 5.3, 5 and 0.5 respectively. The letters representing the special symbols of the capacitor and the corresponding relation with the units are shown in the following table:
TABLE 5 correspondence of capacitance special symbols to capacitance units
1 2 3 4 5
Special character P N U M R
Corresponding units PF NF UF UF PF
Step 2.14: and extracting the capacitance, voltage and precision of the scientific counting method. If the capacity value or the voltage or the precision is not successfully extracted before the step, the step is entered for extraction. Fields like '104k 500', '104', '500', i.e. combinations of three-bit integers and letters or mere three-bit integer combinations, are extracted. Wherein for a combination of letters with a letter belonging to the letter set representing precision, then the first combination meeting the condition, the integer being a scientific count representation of the value, the letter being a letter representation of precision; if only one digital combination exists and only the capacitance value is not extracted currently and the voltage is extracted, the digital combination is a scientific counting method representation of the capacitance value; if only one digital combination exists and only the voltage is not extracted currently and the capacitance value is extracted, the digital combination is represented by a scientific counting method of the voltage; if the number of the digital combinations is greater than or equal to 2, the numerical value represented by the scientific counting method in the first two digital combinations is represented by a capacitance value, and the numerical value represented by the scientific counting method is represented by a voltage. The digital combination expressed by the scientific counting method comprises the following capacitance conversion modes: the first two digits of the digit combination represent a value (10 (the last digit of the digit combination represents a value of-6)); the voltage conversion mode is as follows: the first two digits of the numerical combination represent a value (the last digit of the 10-digit combination represents a value). The capacity value, voltage or precision extracted by the step only supplements the extracted information and does not replace the extracted information. 
The conversion of the letter and the precision representation is shown in the following table:
Table 6 correspondence represented by accuracy
1 2 3 4 5 6 7
Letter J K M F G S Z
Precision of 5.0 10.0 20.0 1.0 2.0 20.0/50.0 20.0/80.0
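The conversion formulas of step 2.14 can be written out as a short sketch (the function names are illustrative; the letter-to-precision mapping follows Table 6):

```python
# Letter -> precision (%) correspondence of Table 6
PRECISION_LETTERS = {'J': 5.0, 'K': 10.0, 'M': 20.0, 'F': 1.0,
                     'G': 2.0, 'S': '20.0/50.0', 'Z': '20.0/80.0'}

def decode_capacitance(code):
    """Three-digit code as capacitance in UF: first two digits x 10**(last - 6).
    E.g. '104' -> 10 * 10**(4 - 6) = 0.1 UF."""
    return int(code[:2]) * 10 ** (int(code[2]) - 6)

def decode_voltage(code):
    """Three-digit code as voltage in V: first two digits x 10**last.
    E.g. '500' -> 50 * 10**0 = 50 V."""
    return int(code[:2]) * 10 ** int(code[2])
```

So a field like '104K' decodes to a 0.1 UF capacitance with 10.0% precision, matching the standard three-digit capacitor marking convention.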
Step 2.15: the capacity value of the pure digital representation is extracted. If the above steps are not successful in extracting the capacity value, the process is entered into the step to extract the capacity value. And extracting a pure numerical value without letters before and after the numerical value in the material description, outputting the numerical value as a capacity value, and updating the material description. If the material description is '2.3.2.5 s', 2.3 is extracted as the value of the capacitance value to be output.
Step 2.16: the accuracy of the alphabetical representation is extracted. If the steps are not successful in extracting the precision, the step is entered to extract the precision. If letters in the sixth table exist, the first letter in the order of the table 6 appears first, namely the letter is input as precision, and the material description is updated. For the material description 'KJ', since 'J' is the front in table 6, i.e. the accuracy of extraction in this material description is '5.0', in '%'.
Step 2.17: extracting foot distance. If the step is not successful in extracting the pitch, the step is entered to extract the pitch. The remaining numbers in the extract description, and no immediately preceding or following letters, are output as pitches if their values are between [1,50] and at most only one decimal place can represent their sizes.
Step 2.18: and confirming the parameter item. If the necessary parameter items (shown in the following table) are extracted successfully, entering the next step to confirm the product; otherwise, reminding the user of parameter missing and complementing corresponding parameter items.
TABLE 7 necessary parameters table for aluminum electrolytic capacitor
1 2 3 4 5 6
Aluminum electrolytic capacitor Capacitance value Rated voltage Precision Operating temperature Diameter (length) Foot distance
Step 2.19: and (5) confirming the product. And determining the products meeting the parameter values corresponding to the extracted parameter items in the database as purchasable products, namely predicting the purchase possibility of the purchasable products by using a KNN algorithm based on the Mahalanobis distance.
(3) Purchase prediction for KNN algorithm based on Mahalanobis distance
Step 3.1: data preparation.
Data preparation is performed from the users' purchase records so that the data samples of the two categories (purchase and non-purchase) are kept at a 1:1 ratio. Let the number of samples be N, while ensuring that N is greater than the feature dimension L of the samples.
Step 3.2: the distance is calculated.
Calculate the distance between the current sample to be classified and each classified sample in the training set. Considering that the Mahalanobis distance is independent of dimensional units and can exclude correlation interference between variables, the Mahalanobis distance D(Yi, Yj) is used to calculate the distance between samples (the total number of samples being greater than the sample dimension). D(Yi, Yj) denotes the distance between samples Yi and Yj and is computed as follows:
Calculation of the Mahalanobis distance:
Step 3.2.1: Data centering.
X = Y − Ȳ
where Y represents the original data, X the centered data, and Ȳ = (1/N)ΣYi the average of the raw data.
Step 3.2.2: and solving a mapping matrix. From covariance matrix
Figure GDA0004047384000000213
And solving the eigenvalue and the corresponding eigenvector.
Step 3.2.3: the eigenvalues are sorted in descending order. And (3) arranging the eigenvalues solved in the step (3.2.2) in a descending order, and further sequencing eigenvectors corresponding to the eigenvalues to form an eigenvector matrix V. The matrix V represents the complete principal component space.
Step 3.2.4: data in the rotation space is obtained. The centered data is mapped to the principal component space, and z=xv is used to obtain rotated data Z.
Step 3.2.5: and calculating the Euclidean distance under the new coordinates, namely the corresponding Markov distance between the original data.
Figure GDA0004047384000000221
/>
The smaller the mahalanobis distance, the higher the similarity between samples; the greater the distance, the less similarity between samples.
where Di is the Mahalanobis distance between the current sample Y0 and training-set sample Yi; z0 is the coordinate position of the current sample in the new coordinate system, and zi is the coordinate position of the i-th training-set sample Yi in the new coordinate system.
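Steps 3.2.1–3.2.5 can be sketched with NumPy; this is an illustrative sketch and `rotated_distances` is an assumed name.

```python
import numpy as np

def rotated_distances(Y):
    """Sketch of steps 3.2.1-3.2.5: center the data, rotate it into the
    principal component space (eigenvectors of the covariance matrix sorted
    by descending eigenvalue), then measure Euclidean distance there.
    The last row of Y is taken as the sample to be classified."""
    X = Y - Y.mean(axis=0)                     # step 3.2.1: centering
    cov = X.T @ X / len(Y)                     # step 3.2.2: covariance (1/N form)
    vals, vecs = np.linalg.eigh(cov)
    V = vecs[:, np.argsort(vals)[::-1]]        # step 3.2.3: descending eigenvalues
    Z = X @ V                                  # step 3.2.4: rotated data
    z0, Zt = Z[-1], Z[:-1]
    return np.linalg.norm(Zt - z0, axis=1)     # step 3.2.5: distances D_i

# The worked example of step 3: four training samples plus Y0 as the last row
Y = np.array([[60, 600, 1], [70, 600, 1], [60, 600, 0],
              [70, 600, 0], [60, 590, 1]], dtype=float)
print(np.round(rotated_distances(Y), 4))       # -> [10. 14.1421 10.0499 14.1774]
```

Note that since V is orthogonal, the rotation alone leaves Euclidean distances unchanged (which is why the worked example's distances equal those of the centered data); a fully whitened Mahalanobis variant would additionally divide each rotated coordinate by the square root of its eigenvalue.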
Step 3.3: and (5) sorting the distances.
According to the distances Di, i = 1, 2, ……, N, between the current sample to be classified and each sample in the training set, arrange them in ascending order of distance.
Step 3.4: neighbor samples are determined.
According to the sorted distances, take the first K samples (K is odd, excludes a tie between the two classes, and is smaller than √N; K is temporarily taken as 11, after which N and K are adjusted according to later use: with known-category data as a test set, N and K are tuned by accuracy and the best values selected) as the neighbor samples of the current sample to be classified.
Step 3.5: and counting the number of category attributes of the neighbor samples.
Count the class attributes of the K neighbor samples: the number of samples of class w1 is t1, and the number of samples of class w2 is t2.
Step 3.6: and determining the category attribute of the sample to be classified.
When t1 > t2, the class attribute of the sample data to be classified is w1, i.e., purchased.
When t1 < t2, the class attribute of the sample data to be classified is w2, i.e., not purchased.
Step 3.7: and adding the product forecast result which can be purchased as a purchased product to a list of recommended purchases for the user.
Note that: and determining a K value.
And selecting and determining a K value, and determining a final K value through the accuracy of a leave-one-out experiment of the current sample data. Sample data of known classes are used as test set samples, and one sample is reserved at a time, and other sample data are used as training set samples. K value is selected, prediction is carried out through the KNN method, and accuracy is counted. The K value with the highest accuracy is the set K value.
P_K = (1/N) Σ_{i=1}^{N} KNN(Y_i), with K odd
where P_K represents the average accuracy of the current K value, and KNN(Y_i) denotes the result for sample Y_i when the remaining samples serve as the training set: KNN(Y_i) = 1 when the prediction equals the class of Y_i, and KNN(Y_i) = 0 otherwise.
Σ_{i=1}^{N} KNN(Y_i) indicates the number of correct predictions among the N samples. The K corresponding to the maximum P_K is selected as the final value of K.
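The leave-one-out selection of K described in the note can be sketched as follows (illustrative names; plain Euclidean distance is used here for brevity where the method uses the rotated-space distance):

```python
import numpy as np

def choose_k(Y, labels, k_candidates):
    """Sketch of the K-selection note: leave-one-out accuracy P_K for each
    candidate (odd) K; return the K with the highest accuracy."""
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    best_k, best_p = None, -1.0
    for k in k_candidates:
        correct = 0
        for i in range(len(Y)):                          # leave sample i out
            d = np.linalg.norm(np.delete(Y, i, 0) - Y[i], axis=1)
            rest = np.delete(labels, i)
            nn = rest[np.argsort(d)[:k]]                 # K nearest neighbours
            pred = 1 if nn.sum() > k - nn.sum() else 0
            correct += int(pred == labels[i])
        p_k = correct / len(Y)                           # P_K = (1/N) sum KNN(Y_i)
        if p_k > best_p:
            best_k, best_p = k, p_k
    return best_k, best_p
```

On well-separated toy data the smallest K already reaches accuracy 1.0 and is therefore selected.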
Step 3 example: purchase prediction for KNN algorithm based on Mahalanobis distance
Step 3.1: preparing sample data
Assume that there are four raw data samples, each comprising three dimensions of scoring data; the matrix is [Y1, Y2, Y3, Y4] = [60,600,1; 70,600,1; 60,600,0; 70,600,0], and the corresponding classification matrix is [1,0,1,0], where 1 indicates classification as purchase and 0 indicates classification as non-purchase.
The data to be classified is Y0 = [60,590,1].
Then the raw data and the data to be classified form the set Y:
Y = [Y1, Y2, Y3, Y4, Y0] = [60,600,1; 70,600,1; 60,600,0; 70,600,0; 60,590,1]
Step 3.2: calculating the mahalanobis distance between the sample to be classified and the original sample data
Step 3.2.1: Data centering.
The mean of Y is calculated as Ȳ = [64, 598, 0.6], so
X = [[-4, 2, 0.4], [6, 2, 0.4], [-4, 2, -0.6], [6, 2, -0.6], [-4, -8, 0.4]]
Step 3.2.2: Solve the eigenvalues and corresponding eigenvectors of the covariance matrix Σ = XᵀX/N = [[24, 8, -0.4], [8, 16, -0.8], [-0.4, -0.8, 0.24]]:
Eigenvalues (retaining four decimal places): [28.9644, 11.0761, 0.1995]
Eigenvectors: (shown as a figure in the original)
Step 3.2.3: Arrange the eigenvalues in descending order to obtain the eigenvector matrix V (shown as a figure in the original).
Step 3.2.4: Obtain the data Z = XV in the feature space (values retained to four decimal places; shown as a figure in the original).
Step 3.2.5: Calculate the distance between the sample to be classified and each original sample (retaining four decimal places):
D = [D1 D2 D3 D4] = [10.0 14.1421 10.0499 14.1774]
That is, the similarity between the data to be classified and the samples in the original data decreases in the order: D1 → D3 → D2 → D4.
Step 3.3: distance ascending order
D = [D1 D3 D2 D4] = [10.0 10.0499 14.1421 14.1774]
Step 3.4: determining neighbor samples
Assume here that K takes 3 (the amount of data is small, so the constraint K < √N, i.e. 1 ≤ K ≤ 2, is not followed here).
The neighbor samples are the original samples Y1, Y3, Y2 corresponding to the distance values D1, D3, D2, with class attributes 1, 1 and 0 respectively.
Step 3.5: counting the number of category attributes of neighbor samples
The number of samples with category 1 is 2; the number of samples of class 0 is 1.
Step 3.6: determining class attributes of a sample to be classified
Since the number of samples with the category of 1 in the neighbor samples is greater than the number of samples with the category of 0, the attribute category of the sample to be classified is classified as 1.
Step 3.7: and adding the sample to be classified into a list of recommended purchases for the user.
It should be understood that the specific order or hierarchy of steps in the processes disclosed are examples of exemplary approaches. Based on design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate preferred embodiment of this invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. Alternatively, the processor and the storage medium may reside as discrete components in a user terminal.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. These software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.
The foregoing description includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, as used in the specification or claims, the term "including" is intended to be inclusive in a manner similar to the term "comprising," as that term is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification or claims is intended to mean a non-exclusive "or".

Claims (6)

1. A Mahalanobis-distance-based KNN algorithm aluminum electrolytic capacitor purchase prediction method, characterized by comprising the following steps:
step 1: confirming the key parameters of the aluminum electrolytic capacitor through analysis of aluminum electrolytic capacitor material descriptions; candidate key parameter items of the aluminum electrolytic capacitor are first extracted with a frequent item set extraction method, and the key parameter items are finally confirmed from the parameter items appearing in the frequent item sets;
step 2: determining a parameter matching scheme according to the decomposition rules of the material description, and determining the range of usable products;
step 3: applying a Mahalanobis-distance-based KNN algorithm purchase prediction method to give a purchase prediction for the product for the current user, preliminarily screening a list of products the user is likely to purchase;
in step 2, the parameter matching scheme is determined according to the decomposition rules of the material description, and the range of usable products is determined; the specific steps comprise:
step 2.1: determining the user input, which is either an original-factory model number or a material description;
step 2.2: if step 2.1 determines, through the platform's database, that the input is an original-factory model number, the following product determination steps are skipped and the product is located directly; if step 2.1 determines that the input is a material description, the process continues downward;
step 2.3: identifying the category: checking whether the material description contains a category name or an alias from the corresponding data, and finally confirming the category of the material description;
step 2.4: determining the parameters of the aluminum electrolytic capacitor: from the material descriptions of aluminum electrolytic capacitors accumulated on the platform, and referring to step 1, statistics show that the key parameters determining an aluminum electrolytic capacitor product are as follows:
(1) Capacitance value: a float value, unified to UF; the unit is not displayed;
(2) Rated voltage: a float value, unified to V; the unit is not displayed;
(3) Precision: a float value; the unit is not displayed;
(4) Life span: an int value, unified to HRS; the unit is not displayed;
(5) Operating temperature: a character type; the low and high operating temperature values are both int values, connected by the symbol '/', and finally returned as a character type;
(6) Diameter: a float value, unified to MM; the unit is not displayed;
(7) Height: a float value, unified to MM; the unit is not displayed;
(8) Foot distance: a float value, unified to MM; the unit is not displayed;
(9) Installation mode: a character type;
the output format of these parameter items is the same as the unified data format of the product parameter items in the database;
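The unified output format of items (1)–(9) can be illustrated with a minimal sketch; the field names and values below are hypothetical examples for illustration, not identifiers prescribed by the claim:

```python
# Hypothetical normalized parameter record following the unified formats of
# step 2.4 (field names are illustrative; the claim prescribes types and
# units, not identifiers).
example_params = {
    "capacitance": 100.0,                # float, unified to UF, unit not shown
    "rated_voltage": 25.0,               # float, unified to V
    "precision": 20.0,                   # float (e.g. a 20% tolerance)
    "life": 2000,                        # int, unified to HRS
    "operating_temperature": "-40/105",  # str: low/high int values joined by '/'
    "diameter": 8.0,                     # float, unified to MM
    "height": 12.0,                      # float, unified to MM
    "foot_distance": 3.5,                # float, unified to MM
    "installation": "RADIAL",            # str (character type)
}

# The character-type temperature field can be split back into two int values:
low, high = (int(v) for v in example_params["operating_temperature"].split("/"))
```

Storing the temperature range as a single '/'-joined string keeps every parameter item directly comparable to the unified product records in the database.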
step 2.5: unifying symbols: some symbols are uniformly replaced to facilitate subsequent operations and parameter extraction;
step 2.6: extracting the installation mode: each installation mode corresponds to a set of description words, and the installation mode is extracted according to this correspondence; the extracted installation mode is retained while the corresponding vocabulary is deleted from the material description, ensuring that extracted information is not applied repeatedly;
step 2.7: extracting the precision, recording both the extracted value and its original representing characters, and updating the material description;
step 2.8: unifying distance symbols: the distance units are unified to MM, and the unified value is stored together with the original characters, i.e. the correspondence between the data and the original data is recorded;
step 2.9: extracting the diameter, height and foot distance; extraction operates on the material description after its characters have been capitalized and all parts other than numbers and the letter X have been replaced by spaces;
step 2.10: the descriptions of the capacitance value, voltage, temperature and time in the material description are each uniformly replaced with their respective unified characters; after conversion, the numerical value and the corresponding characters are stored;
step 2.11: extracting the capacitance value, voltage and service life, and updating the material description;
step 2.12: extracting the temperature range, and updating the material description;
step 2.13: extracting a capacitance value represented by the special capacitor symbol; if the capacitance value was not successfully extracted in step 2.11, this step is entered to extract it further, and after successful extraction the material description is updated;
step 2.14: extracting a capacitance value, voltage or precision written in scientific notation; if the capacitance value, voltage or precision was not successfully extracted before this step, this step is entered for extraction;
step 2.15: extracting a capacitance value represented by a pure number; if the preceding steps did not successfully extract the capacitance value, this step is entered; a pure numerical value with no letters before or after it in the material description is extracted, output as the capacitance value, and the material description is updated;
step 2.16: extracting a precision represented by a letter; if the preceding steps did not successfully extract the precision, this step is entered; among the letters appearing singly in the material description, i.e. letter symbols with no adjacent letters before or after them, the first such letter is output as the precision, and the material description is updated;
step 2.17: extracting the foot distance; if the preceding steps did not successfully extract the foot distance, this step is entered; among the numbers remaining in the material description with no immediately preceding or following letters, if a value lies in [1,50] and has at most one decimal place, that value is output as the foot distance;
step 2.18: confirming the parameter items; if all necessary parameter items have been extracted successfully, the next step of product confirmation is entered; otherwise, the user is reminded of the missing parameters and prompted to supplement the corresponding parameter items;
step 2.19: confirming the product; the products in the database meeting the parameter values corresponding to the extracted parameter items are determined as purchasable products, and the purchase possibility of these purchasable products is then predicted with the Mahalanobis-distance-based KNN algorithm.
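As a rough illustration of the symbol-unification and extraction flow of steps 2.5–2.17, the sketch below pulls the capacitance value, rated voltage and temperature range out of a material description with regular expressions. The patterns and the function name are assumptions for illustration; the claim's actual rule set is considerably richer:

```python
import re

def extract_params(desc: str) -> dict:
    """Illustrative sketch of steps 2.5-2.17: normalize a material
    description and pull out the capacitance (UF), rated voltage (V)
    and temperature range. Patterns are hypothetical simplifications."""
    out = {}
    # step 2.5: unify symbols (example replacements only)
    desc = desc.upper().replace("MFD", "UF")
    # step 2.11: capacitance value, then update the description
    m = re.search(r"(\d+(?:\.\d+)?)\s*UF", desc)
    if m:
        out["capacitance"] = float(m.group(1))
        desc = desc.replace(m.group(0), " ", 1)
    # rated voltage: digits followed by V not followed by another letter
    m = re.search(r"(\d+(?:\.\d+)?)\s*V(?![A-Z])", desc)
    if m:
        out["rated_voltage"] = float(m.group(1))
        desc = desc.replace(m.group(0), " ", 1)
    # step 2.12: temperature range, returned as low/high joined by '/'
    m = re.search(r"(-\d+)\s*/\s*\+?(\d+)", desc)
    if m:
        out["operating_temperature"] = f"{m.group(1)}/{m.group(2)}"
    return out

params = extract_params("100UF 25V -40/+105C RADIAL")
```

Each successful extraction removes the matched text, mirroring the "update the material description" steps that prevent a value from being applied twice.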
2. The Mahalanobis-distance-based KNN algorithm aluminum electrolytic capacitor purchase prediction method as claimed in claim 1, wherein in step 1 the key parameters of the aluminum electrolytic capacitor are confirmed through analysis of the aluminum electrolytic capacitor material descriptions, and the specific steps comprise:
step 1.1: collecting material descriptions of aluminum electrolytic capacitors and constructing an aluminum electrolytic capacitor material description data set;
step 1.2: cleaning the data: material descriptions that are blank or contain only Chinese characters are removed, and the numeric representations in the material descriptions are converted into uniform numerical values; the cleaned aluminum electrolytic capacitor data set is D;
step 1.3: counting the set W1 of all characters appearing in the aluminum electrolytic capacitor descriptions and de-duplicating it;
step 1.4: extracting the frequent item set S, based on the principle that if an item set is frequent, then all of its non-empty subsets are also frequent; conversely, if an item set is infrequent, then all of its supersets are also infrequent;
step 1.5: from the character strings extracted into the set S, extracting those representing parameters with higher occurrence frequency, and finally determining the key parameter items of the aluminum electrolytic capacitor.
3. The Mahalanobis-distance-based KNN algorithm aluminum electrolytic capacitor purchase prediction method as recited in claim 2, wherein the step of extracting the frequent item set S in step 1.4 comprises:
step 1.4.1: combining the characters in the character set W1 pairwise to form a set W2; if a character string in W2 occurs more than L times in the data set D, it is a subset of a frequent item set, and the set of such character strings is T2; L is determined by the size of the data set as L = N×10%, where N is the number of data pieces in the data set; if a character string does not occur more than L times, no item set containing it is a frequent item set;
step 1.4.2: combining T2 with W1, with the character string from T2 in front and the character from W1 behind, to form the character string set W3; if a character string in W3 occurs more than L times, it is a subset of a frequent item set, and the set of such character strings is T3; if not, no item set containing it is a frequent item set;
step 1.4.3: combining T3 with W1 and repeating the frequent-item-set search of step 1.4.2 to obtain the character string set T4; continuing the cycle in this way until Tn is empty, at which point the cycle ends;
step 1.4.4: forming the set of character string item sets W = [Tn-1, …, T3, T2, T1, W1];
step 1.4.5: counting the number of occurrences of each character string in W, forming a matrix FW containing the character strings and their corresponding occurrence counts;
step 1.4.6: merging character strings: merging starts from the shorter character strings and proceeds upward; if the occurrence count of the longer character string is greater than or equal to that of the shorter character string, the shorter character string and its occurrence count are removed from FW, the occurrence count of the longer character string being the sum of the counts of all the nearest superset character strings of the shorter one; if the occurrence count of the longer character string is smaller than that of the shorter character string, the count of the shorter character string is modified to: the occurrences of the short character string minus the occurrences of the long character string, where the occurrence count of the long character string is again the sum of the counts of all the nearest superset character strings of the short one;
step 1.4.7: sorting: the character strings whose occurrence counts exceed 30% of the total data count N are sorted from high to low by occurrence count, forming the set S.
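The level-wise growth of steps 1.4.1–1.4.3 follows the Apriori property stated in step 1.4: a string is only extended if it is already frequent, since every substring of a frequent string must itself be frequent. A minimal sketch (the merging of step 1.4.6 and the 30% sorting of step 1.4.7 are omitted for brevity; names are illustrative):

```python
def frequent_substrings(descriptions, min_ratio=0.10):
    """Sketch of steps 1.3-1.4.4: grow frequent character strings level by
    level, Apriori-style. A candidate at level k+1 is a frequent level-k
    string extended by one character from W1."""
    n = len(descriptions)
    threshold = n * min_ratio  # L = N x 10% (step 1.4.1)

    def support(s):
        # number of material descriptions containing the string s
        return sum(1 for d in descriptions if s in d)

    # W1: de-duplicated set of all characters (step 1.3)
    w1 = sorted({ch for d in descriptions for ch in d})
    levels = [[c for c in w1 if support(c) > threshold]]
    while levels[-1]:
        # Tk x W1 -> candidates one character longer (steps 1.4.1-1.4.3)
        candidates = [t + c for t in levels[-1] for c in w1]
        levels.append([s for s in candidates if support(s) > threshold])
    # flatten all frequent strings (step 1.4.4)
    return [s for level in levels for s in level]

strings = frequent_substrings(["100UF25V", "220UF16V", "47UF50V"])
```

On this toy data the parameter-bearing strings such as "UF" and "V" survive at every level, which is exactly what step 1.5 exploits to identify key parameter items.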
4. The Mahalanobis-distance-based KNN algorithm aluminum electrolytic capacitor purchase prediction method as claimed in claim 1, wherein the necessary parameter items of the aluminum electrolytic capacitor in step 2.18 include: capacitance value, rated voltage, precision, operating temperature, diameter/length, and foot distance.
5. The Mahalanobis-distance-based KNN algorithm aluminum electrolytic capacitor purchase prediction method as claimed in claim 1, wherein the Mahalanobis-distance-based KNN algorithm purchase prediction method in step 3 gives a purchase prediction for the product for the current user, and the specific steps comprise:
step 3.1: data preparation: data are prepared from the user's purchase records so that the data samples of the two categories, purchased and not purchased, are kept in a 1:1 ratio; the number of samples is set to N, while ensuring that the number of samples is larger than the feature dimension L of the samples;
step 3.2: calculating distances: the distance between the current sample to be classified and each classified sample in the training set is calculated, using the Mahalanobis distance as the inter-sample distance;
step 3.3: sorting the distances: the distances Di, i = 1, 2, …, N, between the current sample to be classified and each sample in the training set are arranged in ascending order;
step 3.4: determining the neighbor samples: according to the sorted distances, the first K samples are selected as the neighbor samples of the current sample to be classified;
step 3.5: counting the category attributes of the neighbor samples: among the K neighbor samples, the number of samples of category w1 is counted as t1, and the number of samples of category w2 as t2;
step 3.6: determining the category attribute of the sample to be classified: when t1 > t2, the category attribute of the sample to be classified is w1, i.e. purchased; when t1 < t2, the category attribute of the sample to be classified is w2, i.e. not purchased;
step 3.7: adding each product whose prediction result is "purchased" to the list of recommended purchases for the user.
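Steps 3.2–3.6 can be sketched as below; the function name and data are illustrative. The covariance matrix of the training features plays the role of S in the Mahalanobis distance d(a,b) = sqrt((a−b)ᵀ S⁻¹ (a−b)), which is also why step 3.1 requires more samples than feature dimensions — otherwise S is singular and cannot be inverted:

```python
import numpy as np

def mahalanobis_knn(train_X, train_y, x, k):
    """Sketch of steps 3.2-3.6: classify x by majority vote among its k
    nearest training samples under the Mahalanobis distance, with S the
    covariance matrix of the training features."""
    train_X = np.asarray(train_X, dtype=float)
    x = np.asarray(x, dtype=float)
    S_inv = np.linalg.inv(np.cov(train_X, rowvar=False))  # inverse covariance
    diffs = train_X - x
    # squared Mahalanobis distances (step 3.2); sqrt not needed for ranking
    d2 = np.einsum("ij,jk,ik->i", diffs, S_inv, diffs)
    order = np.argsort(d2)                        # step 3.3: ascending sort
    neighbors = [train_y[i] for i in order[:k]]   # step 3.4: k neighbors
    # steps 3.5-3.6: majority vote between the two categories
    return max(set(neighbors), key=neighbors.count)

label = mahalanobis_knn(
    [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]],
    ["purchased", "purchased", "purchased",
     "not purchased", "not purchased", "not purchased"],
    [0.2, 0.2], k=3)
```

Sorting by squared distance gives the same neighbor ranking as the distance itself, so the square root can be skipped.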
6. The Mahalanobis-distance-based KNN algorithm aluminum electrolytic capacitor purchase prediction method as claimed in claim 5, wherein the first K samples are selected in step 3.4, and the method for determining the K value specifically comprises:
selecting and determining the K value: the final K value is determined through the accuracy of a leave-one-out experiment on the current sample data; each time, one sample of known category is taken as the test-set sample, and the remaining samples are taken as the training-set samples; a K value is selected, prediction is performed by the KNN method, and the accuracy is counted; the K value with the highest accuracy is taken as the set K value;

P_K = (1/N) · Σ_{i=1}^{N} KNN(Y_i), where K is an odd number,

wherein P_K represents the average accuracy for the current K value; KNN(Y_i) represents the prediction result for sample Y_i when Y_i serves as the test-set sample and the remaining samples serve as the training-set samples: when the predicted result is the same as the category of Y_i, KNN(Y_i) = 1; when the predicted result differs from the category of Y_i, KNN(Y_i) = 0; Σ_{i=1}^{N} KNN(Y_i) thus represents the number of correct prediction results among the N samples;
the K value corresponding to the maximum P_K is selected as the final K value.
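The leave-one-out selection of K in claim 6 can be sketched as below. The helper names are hypothetical, and a Euclidean plug-in classifier is used purely to keep the example short — the patent itself specifies the Mahalanobis distance:

```python
import numpy as np

def loo_accuracy(X, y, k, classify):
    """P_K: leave-one-out accuracy for a given K. `classify` is any
    KNN-style function (train_X, train_y, x, k) -> label; in the patent
    it would be the Mahalanobis-distance KNN."""
    n = len(X)
    hits = 0
    for i in range(n):  # each sample once as the test-set sample
        train_X = [X[j] for j in range(n) if j != i]
        train_y = [y[j] for j in range(n) if j != i]
        hits += int(classify(train_X, train_y, X[i], k) == y[i])
    return hits / n  # (1/N) * sum of KNN(Y_i)

def choose_k(X, y, max_k, classify):
    """Evaluate odd K values only (K odd avoids ties in the two-class
    vote) and return the K maximizing P_K."""
    ks = range(1, max_k + 1, 2)
    return max(ks, key=lambda k: loo_accuracy(X, y, k, classify))

# Minimal plug-in classifier for demonstration (Euclidean distance here,
# purely to keep the sketch self-contained):
def euclid_knn(train_X, train_y, x, k):
    d = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    nb = [train_y[i] for i in np.argsort(d)[:k]]
    return max(set(nb), key=nb.count)

best_k = choose_k([[0], [1], [2], [10], [11], [12]],
                  ["buy", "buy", "buy", "no", "no", "no"], 5, euclid_knn)
```

With two well-separated clusters, small K already achieves full leave-one-out accuracy, so the smallest maximizing odd K is returned.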
CN202011299561.8A 2020-11-19 2020-11-19 KNN algorithm aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance Active CN112580686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011299561.8A CN112580686B (en) 2020-11-19 2020-11-19 KNN algorithm aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance

Publications (2)

Publication Number Publication Date
CN112580686A CN112580686A (en) 2021-03-30
CN112580686B true CN112580686B (en) 2023-05-02

Family

ID=75123094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011299561.8A Active CN112580686B (en) 2020-11-19 2020-11-19 KNN algorithm aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance

Country Status (1)

Country Link
CN (1) CN112580686B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113534454A (en) * 2021-07-12 2021-10-22 北京邮电大学 Multi-core optical fiber channel damage equalization method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609869A (en) * 2012-02-03 2012-07-25 纽海信息技术(上海)有限公司 Commodity purchasing system and method
CN107784538A (en) * 2016-08-26 2018-03-09 佛山市顺德区美的电热电器制造有限公司 The recommendation method and device of household electrical appliance
CN107862566A (en) * 2017-10-17 2018-03-30 杨明 A kind of Method of Commodity Recommendation and system
CN108647811A (en) * 2018-04-26 2018-10-12 中国联合网络通信集团有限公司 Predict that user buys method, apparatus, equipment and the storage medium of equity commodity
CN109255567A (en) * 2018-08-08 2019-01-22 北京京东尚科信息技术有限公司 Commodity part type matching process, device, system, electronic equipment and readable medium
CN110674384A (en) * 2019-09-27 2020-01-10 厦门晶欣电子有限公司 Component model matching method
CN111652671A (en) * 2020-04-24 2020-09-11 青岛檬豆网络科技有限公司 Purchasing mall suitable for buyer market environment and purchasing method thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180285959A1 (en) * 2017-03-30 2018-10-04 Crane Merchandising Systems, Inc. Product recommendation engine for consumer interface of unattended retail points of sale
US20200027103A1 (en) * 2018-07-23 2020-01-23 Adobe Inc. Prioritization System for Products Using a Historical Purchase Sequence and Customer Features

Also Published As

Publication number Publication date
CN112580686A (en) 2021-03-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant