CN112580686B - KNN algorithm aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance - Google Patents


Info

Publication number
CN112580686B
CN112580686B (Application CN202011299561.8A)
Authority
CN
China
Prior art keywords
value
electrolytic capacitor
aluminum electrolytic
data
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011299561.8A
Other languages
Chinese (zh)
Other versions
CN112580686A (en)
Inventor
郑鑫
陈建琪
徐楠楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Mengdou Network Technology Co ltd
Original Assignee
Qingdao Mengdou Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Mengdou Network Technology Co ltd filed Critical Qingdao Mengdou Network Technology Co ltd
Priority to CN202011299561.8A priority Critical patent/CN112580686B/en
Publication of CN112580686A publication Critical patent/CN112580686A/en
Application granted granted Critical
Publication of CN112580686B publication Critical patent/CN112580686B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • Evolutionary Biology (AREA)
  • Human Resources & Organizations (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a Mahalanobis-distance-based KNN algorithm purchase prediction method for aluminum electrolytic capacitors, characterized by comprising the following steps. Step 1: confirm the key parameters of the aluminum electrolytic capacitor through material analysis of the aluminum electrolytic capacitor; the key parameter items are preliminarily extracted with a frequent-item-set extraction method and finally confirmed from the parameter items appearing in the frequent item sets. Step 2: determine a parameter matching scheme according to the decomposition rules of the material description, and determine the range of usable products. Step 3: adopt a purchase prediction method based on the Mahalanobis-distance KNN algorithm, give a purchase prediction of products for the current user, and preliminarily screen a list of products the user is likely to purchase. The method realizes the product confirmation and purchase prediction functions for aluminum electrolytic capacitors, saving the user time, improving work efficiency and improving the platform experience.

Description

KNN algorithm aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance
Technical Field
The invention relates to the technical field of aluminum electrolytic capacitor decomposition and identification, in particular to a KNN algorithm aluminum electrolytic capacitor purchase prediction method based on a Mahalanobis distance.
Background
In the electronic component industry, components such as resistors and capacitors have a low manufacturing threshold, so there are a great many brands, series and suppliers. At present, product determination for aluminum electrolytic capacitors is generally solved either by locating the product directly from the original factory model, or by directly comparing the material description of the aluminum electrolytic capacitor with the symbols in the capacitor database to judge whether it is the product the user requires. Direct location by the original factory model is the more accurate method. However, direct location is not suitable for material descriptions: operating directly on the description through identity or inclusion relations is neither accurate nor comprehensive enough for the user to determine the product, and it also places high demands on how comprehensively the product descriptions in the database are written and expressed.
To give the user a better purchase experience, the user's buying habits and buying criteria must be better understood. When the user has a purchase history on the platform and no unique requirement for the purchased product (a unique requirement means the user designates the original factory model; in that case the product can be located for the user directly, without product determination), products meeting the conditions can be offered to the user according to the user's past purchases.
Disclosure of Invention
The purpose of the invention: aiming at the difficulty of aluminum electrolytic capacitor decomposition and recommendation, the invention provides a parameter decomposition method for the aluminum electrolytic capacitor and a KNN (K-Nearest Neighbor, KNN for short) method based on the Mahalanobis distance, gives a purchase prediction of products for the current user, preliminarily screens a list of products the user is likely to purchase, saves the user time, improves working efficiency and maximizes the platform experience.
In order to solve the problems, the invention adopts the following technical scheme:
the KNN algorithm aluminum electrolytic capacitor purchase prediction method based on the Mahalanobis distance is characterized by comprising the following steps of:
step 1: confirm the key parameters of the aluminum electrolytic capacitor through material analysis of the aluminum electrolytic capacitor; for confirming the key parameter items, carry out a preliminary extraction with a frequent-item-set extraction method, and finally confirm the key parameter items through the parameter items appearing in the frequent item sets;
step 2: determine a parameter matching scheme according to the decomposition rules of the material description, and determine the range of usable products;
step 3: adopt a purchase prediction method based on the Mahalanobis-distance KNN algorithm, give a purchase prediction of products for the current user, and preliminarily screen a list of products the user is likely to purchase.
Further, in the step 1, the material analysis of the aluminum electrolytic capacitor is used for confirming the key parameters of the aluminum electrolytic capacitor, and the specific steps include:
step 1.1: collecting material description of the aluminum electrolytic capacitor, and constructing a material description data set of the aluminum electrolytic capacitor;
step 1.2: data cleaning: remove material descriptions that are blank or contain only Chinese characters, and convert the numeric representations in the material descriptions into uniform numerical values; the cleaned aluminum electrolytic capacitor data set is D;
step 1.3: collect the character set W1 of all aluminum electrolytic capacitor material descriptions and de-duplicate it;
step 1.4: extract the frequent item sets S, on the following principle: if an item set is a frequent item set, then all of its non-empty subsets are frequent as well; conversely, if an item set is infrequent, then all of its supersets are infrequent as well;
step 1.5: from the strings extracted into the set S, extract the strings that represent parameters and occur with higher frequency, and finally determine the key parameter items of the aluminum electrolytic capacitor.
Further, the step of extracting the frequent item set S in step 1.4 includes:
step 1.4.1: combine the characters in the character set W1 pairwise to form the set W2; if a string in W2 occurs more than L times in the data set D, it is a subset of a frequent item set, and the set of such strings is T2; L is determined by the size of the data set, L = N × 10%, where N is the number of records in the data set; otherwise, no item set containing that string is a frequent item set;
step 1.4.2: combine T2 with W1, with the string from T2 in front and the character from W1 behind, to form the string set W3; if a string in W3 occurs more than L times, it is a subset of a frequent item set, and the set of such strings is T3; otherwise, no item set containing that string is a frequent item set;
step 1.4.3: combine T3 with W1 and repeat the frequent-item-set search of step 1.4.2 to obtain the string set T4; continue in this way until Tn is empty, then end the loop;
step 1.4.4: form the set of string item sets W = [Tn-1, …, T3, T2, T1, W1];
step 1.4.5: count the occurrences of each string in W within the material descriptions to form the matrix FW, which contains the strings and their corresponding occurrence counts;
step 1.4.6: merge strings: merging starts from the shorter strings and proceeds upward; if a long string occurs at least as many times as a short one, remove the short string and its occurrence count from FW, where the occurrence count of the long string is the sum of the counts of all nearest superset strings of the short string; if the long string occurs fewer times than the short one, change the short string's count to the occurrences of the short string minus the occurrences of the long string, where the occurrence count of the long string is again the sum of the counts of all nearest superset strings of the short string;
step 1.4.7: sort the strings by occurrence count from high to low; the strings whose counts exceed 30% of the total number of records N form the set S.
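Steps 1.4.1-1.4.7 above can be sketched in Python as follows. This is a minimal illustration: the function name, the right-only string extension, and the simplified handling of the superset merge in step 1.4.6 are assumptions, not the patent's exact implementation.

```python
def extract_frequent_strings(descriptions, min_frac=0.10, report_frac=0.30):
    """Sketch of steps 1.4.1-1.4.7: grow frequent substrings one character
    at a time (Apriori-style pruning), then keep the most frequent ones.
    The superset merging of step 1.4.6 is simplified here."""
    n = len(descriptions)
    threshold = n * min_frac                    # L = N x 10%
    w1 = sorted({ch for d in descriptions for ch in d})   # step 1.3: character set

    def occurrences(s):                         # how many records contain s
        return sum(1 for d in descriptions if s in d)

    pool, prev = list(w1), list(w1)
    while prev:                                 # steps 1.4.1-1.4.3: extend on the right
        candidates = {p + c for p in prev for c in w1}
        prev = [s for s in sorted(candidates) if occurrences(s) > threshold]
        pool.extend(prev)                       # step 1.4.4: collect every tier

    fw = {s: occurrences(s) for s in pool}      # step 1.4.5: the matrix FW
    # step 1.4.6 (simplified): drop a short string when its one-character
    # right extensions are at least as frequent in total
    for s in sorted(fw, key=len):
        supers = [t for t in fw if len(t) == len(s) + 1 and t.startswith(s)]
        if supers and sum(fw[t] for t in supers) >= fw[s]:
            del fw[s]
    # step 1.4.7: keep strings occurring in more than 30% of all records
    return sorted((s for s in fw if occurrences(s) > n * report_frac),
                  key=lambda s: -occurrences(s))
```

On a toy data set of descriptions such as "25V", the extraction surfaces the shared unit character "V" first and then the most frequent value strings.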
Further, the determining a parameter matching scheme according to the decomposition rule of the material description in the step 2, and determining a usable product range specifically includes:
step 2.1: determining user input, and describing the model and materials of a former factory;
step 2.2: if the step 2.1 is determined to be the original factory model through the database of the platform, the following steps of product determination are directly skipped, and the product is directly positioned; if the input is determined to be the material description in the step 2.1, the process is performed downwards;
step 2.3: identifying the category: checking whether the material description contains a class name and an alias in the corresponding data, and finally confirming the class of the material description;
step 2.4: determining parameters of an aluminum electrolytic capacitor: in the material description of the aluminum electrolytic capacitor accumulated by the platform, referring to the step 1, the statistics shows that the key parameters for determining the aluminum electrolytic capacitor product are as follows:
(1) Capacitance: a float value, unified to the unit UF; the unit is not displayed;
(2) Rated voltage: a float value, unified to the unit V; the unit is not displayed;
(3) Precision: a float value, unified; the unit is not displayed;
(4) Lifetime: an int value, unified to the unit HRS; the unit is not displayed;
(5) Operating temperature: a character type; the high and low operating-temperature values are both int values, joined by the symbol '/' with the low value first, and finally returned as a character type;
(6) Diameter: a float value, unified to the unit MM; the unit is not displayed;
(7) Height: a float value, unified to the unit MM; the unit is not displayed;
(8) Foot distance: a float value, unified to the unit MM; the unit is not displayed;
(9) Mounting type: a character type;
the output format of the parameter items is the same as the unified data format of the product parameter items in the database;
step 2.5: unify symbols: replace certain symbols uniformly to facilitate the subsequent operations and parameter extraction;
step 2.6: extract the mounting type: the mounting types have a correspondence with descriptive words, and the mounting type is extracted according to this correspondence; the mounting type is retained while the corresponding vocabulary is deleted from the material description, ensuring that extracted information is not applied twice;
step 2.7: extract the precision; record the extracted value and the original characters it was expressed with, and update the material description;
step 2.8: unify distance symbols: unify the distance units to MM, and store both the unified value and the original characters, i.e., record the correspondence between the converted data and the original data;
step 2.9: extract the diameter, height and foot distance; the extraction operates on the material description after its characters have been capitalized, with everything except digits and the letter X replaced by spaces;
step 2.10: uniformly replace the descriptions of capacitance, voltage, temperature and time in the material description with their respective unified character forms; after conversion, store the numerical value and the corresponding characters;
step 2.11: extract the capacitance, voltage and lifetime, and update the material description;
step 2.12: extracting a temperature range; and updating the material description;
step 2.13: extract the capacitance expressed with the capacitor's special symbols; if the capacitance was not successfully extracted in step 2.11, enter this step to extract it further, and update the material description after a successful extraction;
step 2.14: extract capacitance, voltage and precision written in scientific notation; if the capacitance, voltage or precision was not successfully extracted before this step, enter this step to extract it;
step 2.15: extract a capacitance expressed as a bare number; if the preceding steps did not successfully extract the capacitance, enter this step: extract a pure number with no letters immediately before or after it in the material description, output it as the capacitance, and update the material description;
step 2.16: extract a precision expressed as a letter; if the preceding steps did not successfully extract the precision, enter this step: take the first letter in the material description that appears singly, i.e., with no adjacent letters before or after it, output it as the precision, and update the material description;
step 2.17: extract the foot distance; if the preceding steps did not successfully extract the foot distance, enter this step: among the numbers remaining in the description with no letters immediately before or after, if the value lies in [1,50] and has at most one decimal place, output it as the foot distance;
step 2.18: confirming a parameter item; if the necessary parameter items are extracted successfully, entering the next step to confirm the product; otherwise, reminding the user of parameter missing and complementing corresponding parameter items;
step 2.19: confirming a product; and determining the products meeting the parameter values corresponding to the extracted parameter items in the database as purchasable products, namely predicting the purchase possibility of the purchasable products by using a KNN algorithm based on the Mahalanobis distance.
Further, the necessary parameter items of the aluminum electrolytic capacitor in step 2.18 include: capacitance, rated voltage, accuracy, operating temperature, diameter/length, foot distance.
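As an illustration of the unit-normalizing extraction described in steps 2.4 and 2.10-2.16, the following is a minimal regex sketch. The patterns and the function name are hypothetical simplifications of the rules above, covering only capacitance, rated voltage and precision.

```python
import re

def extract_capacitor_params(description):
    """Illustrative extraction of a few key parameter items (step 2.4):
    capacitance (UF), rated voltage (V) and precision (%), returned as
    bare numbers with the unit stripped, as the method specifies."""
    text = description.upper()
    params = {}
    m = re.search(r"(\d+(?:\.\d+)?)\s*UF", text)      # capacitance, unified to UF
    if m:
        params["capacitance"] = float(m.group(1))
    m = re.search(r"(\d+(?:\.\d+)?)\s*V\b", text)     # rated voltage, unified to V
    if m:
        params["voltage"] = float(m.group(1))
    m = re.search(r"±?\s*(\d+(?:\.\d+)?)\s*%", text)  # precision
    if m:
        params["precision"] = float(m.group(1))
    return params
```

For example, a description such as "100UF 25V ±20%" would yield bare values 100, 25 and 20 with the units stripped, matching the unified output format described in step 2.4.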
Further, the predicting purchasing method of the KNN algorithm based on the mahalanobis distance in the step 3 provides a purchasing prediction of the product at the current user, and the specific steps include:
step 3.1: data preparation:
data preparation is carried out through purchase records of users, so that data samples of different categories including purchase or non-purchase are kept in a ratio of 1:1; setting the number of samples as N, and simultaneously ensuring that the number of samples is larger than the characteristic dimension L of the samples;
step 3.2: calculating the distance:
calculating the distance between the current sample to be classified and each sample in the classified samples in the training set; calculating the distance between samples by using the mahalanobis distance;
step 3.3: sorting the distances;
sort the samples in ascending order by the distance Di, i = 1, 2, …, N, between the current sample to be classified and each sample in the training set;
step 3.4: neighbor samples are determined.
Selecting the first K sample data as neighbor samples of the current sample to be classified according to the ordered distance;
step 3.5: counting the number of category attributes of the neighbor samples;
count the category attributes of the K neighbor samples: category w1 contains t1 samples and category w2 contains t2 samples;
step 3.6: determine the category attribute of the sample to be classified;
when t1 > t2, the category attribute of the sample to be classified is w1, i.e., purchased;
when t1 < t2, the category attribute of the sample to be classified is w2, i.e., not purchased;
step 3.7: and adding the product forecast result which can be purchased as a purchased product to a list of recommended purchases for the user.
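The classification loop of steps 3.2-3.6 can be sketched as follows. This is a hedged illustration: `knn_predict` is a hypothetical name, and the plain Euclidean metric passed in the usage line is only a placeholder for the Mahalanobis distance the patent defines in step 3.2.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k, distance):
    """Steps 3.2-3.6: rank the training samples by distance to the sample
    to be classified, take the K nearest, and vote between the two classes
    (1 = purchased, 0 = not purchased)."""
    dists = np.array([distance(query, x) for x in train_X])  # step 3.2
    nearest = np.argsort(dists)[:k]                          # steps 3.3-3.4
    votes = train_y[nearest]                                 # step 3.5: t1 vs t2
    return 1 if votes.sum() * 2 > k else 0                   # step 3.6: t1 > t2

# placeholder metric for the usage example below
euclidean = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
```

With K odd, as the method requires, the vote `votes.sum() * 2 > k` is exactly the comparison t1 > t2, so no tie-break is needed.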
Further, calculating the distance between samples using the Mahalanobis distance in step 3.2 specifically comprises: calculate the inter-sample distance D(Yi, Yj), where D(Yi, Yj) denotes the Mahalanobis distance between samples Yi and Yj, as follows:
step 3.2.1: data centering:

X = Y − Ȳ

where Y represents the original data, X the centered data, and Ȳ the average value of the original data;
step 3.2.2: solve the mapping matrix: from the covariance matrix

Σ = XᵀX / (N − 1),

where N is the number of samples, solve the eigenvalues and the corresponding eigenvectors;
step 3.2.3: sorting the characteristic values in a descending order; the feature values solved in the step 3.2.2 are arranged in a descending order, and feature vectors corresponding to the feature values are ordered to form a feature vector matrix V; matrix V represents the complete principal component space;
Step 3.2.4: obtaining data in a rotation space; mapping the centralized data into a principal component space, wherein Z=XV is used for obtaining rotated data Z;
step 3.2.5: compute the Euclidean distance in the new coordinates, scaled by the eigenvalues, which is the Mahalanobis distance between the corresponding original data points:

D(z0, zi) = √( Σ_{k=1}^{L} (z0,k − zi,k)² / λk )

where L is the feature dimension and λk the k-th eigenvalue;
the smaller the Mahalanobis distance, the higher the similarity between samples; the greater the distance, the lower the similarity;
here D(z0, zi) is the Mahalanobis distance between the current sample Y0 and the training-set sample Yi; z0 is the coordinate of the current sample in the new coordinate system, and zi is the coordinate of the i-th training-set sample Yi in the new coordinate system.
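Steps 3.2.1-3.2.5 can be sketched in Python as follows. This is an illustrative implementation; the per-axis division by the eigenvalue in the final step is the assumption that makes the rotated-space distance equal the Mahalanobis distance of the original data.

```python
import numpy as np

def mahalanobis_setup(Y):
    """Steps 3.2.1-3.2.4: center the data, eigendecompose the covariance
    matrix, and rotate into the principal-component space."""
    X = Y - Y.mean(axis=0)                 # step 3.2.1: centering
    cov = X.T @ X / (len(Y) - 1)           # step 3.2.2: covariance matrix
    eigvals, V = np.linalg.eigh(cov)       # eigenvalues / eigenvectors
    order = np.argsort(eigvals)[::-1]      # step 3.2.3: descending order
    eigvals, V = eigvals[order], V[:, order]
    Z = X @ V                              # step 3.2.4: rotated data Z = XV
    return Z, eigvals

def mahalanobis(z0, zi, eigvals):
    """Step 3.2.5: eigenvalue-scaled Euclidean distance in rotated space."""
    return float(np.sqrt(np.sum((z0 - zi) ** 2 / eigvals)))
```

Because (z0 − zi) = (y0 − yi)V and V diagonalizes the covariance, the scaled rotated-space distance equals the textbook form √((y0 − yi)ᵀ Σ⁻¹ (y0 − yi)).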
Further, the first K sample data are selected in the step 3.4, where the K value determining method specifically includes:
select and determine the K value: determine the final K value via the accuracy of a leave-one-out experiment on the current sample data; each time, take one sample of known class as the test sample and the remaining samples as the training set; for a chosen K value, predict with the KNN method above and record the accuracy; the K value with the highest accuracy is the K value to set:

P_K = (1/N) Σ_{i=1}^{N} KNN(Yi),  K odd

where P_K represents the average accuracy for the current value of K; KNN(Yi) represents the result when sample Yi is the test sample and the remaining samples form the training set: KNN(Yi) = 1 when the prediction equals the class of Yi, and KNN(Yi) = 0 when it differs; Σ_{i=1}^{N} KNN(Yi) is the number of correct predictions among the N samples. Select the K corresponding to the maximal P_K as the final K value.
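The leave-one-out selection of K described above can be sketched as follows. This is illustrative: `choose_k` is a hypothetical name, and a plain Euclidean metric stands in for the Mahalanobis distance of step 3.2.

```python
import numpy as np

def choose_k(X, y, k_candidates):
    """Leave-one-out selection of K: for each odd candidate K, predict every
    sample from the remaining N-1 samples and keep the K with the highest
    average accuracy P_K."""
    def knn(train_X, train_y, query, k):
        d = np.linalg.norm(train_X - query, axis=1)   # placeholder metric
        nearest = np.argsort(d)[:k]
        return 1 if train_y[nearest].sum() * 2 > k else 0

    n = len(X)
    best_k, best_acc = None, -1.0
    for k in k_candidates:
        if k % 2 == 0:                     # the method requires K to be odd
            continue
        correct = sum(
            knn(X[np.arange(n) != i], y[np.arange(n) != i], X[i], k) == y[i]
            for i in range(n)              # each sample once as the test set
        )
        acc = correct / n                  # average accuracy P_K
        if acc > best_acc:                 # keep the K with maximal P_K
            best_k, best_acc = k, acc
    return best_k, best_acc
```

On two well-separated clusters, the smallest odd K already achieves perfect leave-one-out accuracy and is therefore selected.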
The technical scheme provided by the embodiment of the invention has the beneficial effects that at least:
(1) For confirming the key parameter items of the aluminum electrolytic capacitor, a frequent-item-set extraction method performs the preliminary extraction, and the key parameter items are finally confirmed from the parameter items in the frequent item sets. The statistics and extraction of the key parameters involve little manual participation, so the confirmed parameters are objective and accurate and match what customers actually need when selecting products. For the confirmed parameters of the aluminum electrolytic capacitor, the decomposition symbols and rules of the parameters are determined.
(2) The prepared data consist of scores or measurements from different aspects with different measurement units or scoring standards; with a direct distance calculation, some dimensions would contribute little or nothing to the distance while others would contribute too much. The Mahalanobis distance is therefore used as the metric: it is not governed by dimension, and the Mahalanobis distance between two samples is independent of the measurement units and scoring standards of the original sample data. It also accounts for the correlations between the individual variables and eliminates their interference, and the Mahalanobis distance computed from standardized data equals that computed from centered data, so no calculation error is introduced.
(3) KNN classifies by the distances between samples. Its idea: if most of the K samples most similar to a given sample (i.e., nearest in the feature space) belong to a class, then the sample also belongs to that class. The training time complexity of the KNN method is low, only O(n); it makes no assumptions about the data, achieves high accuracy, and is insensitive to outliers; new data can be added to the data set directly without retraining, so data can be added flexibly and the data model does not need to be replaced regularly.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
Fig. 1 is a flowchart of an aluminum electrolytic capacitor purchase prediction method based on a KNN algorithm of mahalanobis distance according to an embodiment of the present invention.
DETAILED DESCRIPTION OF EMBODIMENT(S) OF INVENTION
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiment of the invention provides a Mahalanobis-distance-based KNN algorithm purchase prediction method for aluminum electrolytic capacitors, comprising three main steps: (1) parameter confirmation of the aluminum electrolytic capacitor; (2) product determination of the aluminum electrolytic capacitor; (3) purchase prediction with the Mahalanobis-distance-based KNN algorithm. The key parameters of the aluminum electrolytic capacitor are confirmed through material analysis; for confirming the key parameter items, a preliminary extraction is carried out with a frequent-item-set extraction method, and the key parameter items are finally confirmed from the parameter items appearing in the frequent item sets. A parameter matching scheme is determined according to the decomposition rules of the material description, and the range of usable products is determined. A purchase prediction method based on the Mahalanobis-distance KNN algorithm then gives a purchase prediction of products for the current user and preliminarily screens a list of products the user is likely to purchase. The method is described in detail below.
(1) Parameter confirmation of aluminum electrolytic capacitor
Step 1.1: and collecting material description of the aluminum electrolytic capacitor, and constructing a material description data set of the aluminum electrolytic capacitor.
Step 1.2: data cleaning. Remove material descriptions that are blank or contain only Chinese characters, and convert the numeric representations in the material descriptions into uniform numerical values (i.e., different numeric values combined with the same character carry the same meaning: 25V and 50V, for example, both represent voltage values). The cleaned aluminum electrolytic capacitor data set is D.
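The value-unification in this cleaning step can be sketched as follows (the regex and the function name are illustrative assumptions, not the patent's exact rule):

```python
import re

def unify_numeric_tokens(description):
    """Normalize every number-plus-letter token (e.g. '25 v', '0050V') to a
    canonical '<value><UNIT>' form, so that tokens differing only in the
    numeric value are represented the same way (step 1.2)."""
    def norm(match):
        value, unit = match.group(1), match.group(2).upper()
        return f"{float(value):g}{unit}"
    return re.sub(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)", norm, description)
```

After this pass, "25 v" and "0050V" both collapse to the same value-plus-unit shape, which is what lets the later frequent-item-set statistics treat all voltage tokens alike.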
Step 1.3: and counting a character set W1 (except Chinese or English symbols, reserved mathematical expression symbols such as the symbols, +/-and the like) of all the aluminum electrolytic capacitors, and performing duplication elimination.
Step 1.4: the frequent item sets S are extracted (the principle is that if a certain item set is a frequent item set, then all its non-empty subsets are frequent (a priori principle) i.e. if 0,1 is frequent, then 0,1 must be frequent, conversely, i.e. if a item set is not frequent, then all its supersets are also not frequent.
Wherein, the step of extracting the frequent item set S includes:
step 1.4.1: will character set W 1 The characters in the two are combined pairwise to form a set W 2 . Set W 2 If the number of occurrences is greater than L (L is determined according to the number of data sets, L=Nx10%, where N is the number of data pieces in the data set), it is a subset of frequent item sets, and its string set is T 2 The method comprises the steps of carrying out a first treatment on the surface of the If not, none of the item sets containing the character string is a frequent item set.
Step 1.4.2: will T 2 And W is 1 Combining, T 2 The character string in (a) is in front, W 1 Thereafter, a set W of character strings is formed 3 . Set W 3 If the number of occurrence times is larger than L, the subset of frequent item sets is the character string set of T 3 The method comprises the steps of carrying out a first treatment on the surface of the If not, none of the item sets containing the character string is a frequent item set.
Step 1.4.3: t (T) 3 And W is 1 Combining, repeating the operation of searching frequent item sets in the step 1.4.2 to obtain a character string set T 4 . In this way, circulate until T n When empty, the cycle is ended.
Step 1.4.4: set w= [ T ] of string item sets n-1 ,…,T 3 ,T 2 ,T 1 ,W 1 ]。
Step 1.4.5: counting the occurrence times. Counting the occurrence times of the character strings in the W in the character strings to form a matrix FW, wherein the matrix FW comprises the character strings and the corresponding occurrence times.
Step 1.4.6: and merging the character strings. When character strings are combined, character strings with shorter lengths start to be upwards performed. The method comprises the steps of removing a short string and the corresponding occurrence number (the occurrence number of the long string is the sum of the times of all the superset strings nearest to the short string) of the FW if the occurrence number of the long string is greater than or equal to the short string; if the number of occurrences of the long string is smaller than the short string, the corresponding number of occurrences of the short string is modified to be the number of occurrences of the short string-the number of occurrences of the long string (here, the number of occurrences of the long string is the sum of the numbers of occurrences of all the superset strings that the short string has latest). If the FW has 5pf,5p, 5V character strings, the most recent superset character string corresponding to the short character string 5 here is 5p and 5V, and the most recent superset character string corresponding to the short character string 5p is 5pf.
Step 1.4.7: and sequencing, namely sequencing the character strings from high to low according to the occurrence times of the character strings, wherein the occurrence times of the character strings exceed 30% of the total number N of the data lump, so as to form a set S.
Step 1.5: and extracting character strings (such as 50PF, which represents a capacitance value; 50V, which represents a voltage) representing parameters with higher occurrence frequency according to the character strings extracted by the set S, and finally determining key parameter items of the aluminum electrolytic capacitor.
The following parameter confirmation steps for the aluminum electrolytic capacitor of step (1) are exemplified as follows:
examples: because the data set is larger, the example display is more complex, and the next example is only displayed by displaying the keyword step or the assumed condition data of the aluminum electrolytic capacitor parameter confirmation.
Step 1.1: for example, there are three character strings of '1000uF/25v,10 x 20, 105 ℃,20%,5', '2200uF/35v,16 x 25, 105 ℃,20%,7.5, 6000HRS', 'capacitance';
step 1.2: the cleaning data, here the number is replaced collectively with 5. The data obtained after washing are:
Figure GDA0004047384000000121
Figure GDA0004047384000000131
step 1.3: statistics of the present character set W 1
W 1 ={R,V,S,u,%,F,5,H,℃,*}
Step 1.4: the frequent item set S is extracted.
Step 1.4.1: from W 1 Obtaining W 2 The value of L is 0.2, and T can be obtained 2 The following are provided:
W2 = {RR, RV, RS, Ru, R%, RF, R5, RH, R℃, R*, VR, VV, VS, Vu, ……}
T2 = {RS, uF, 5V, 5u, 5%, 5H, 5℃, 5*, HR, *5}
step 1.4.2: from T 2 And W is 1 Obtaining W 3 Thereby obtaining T 3 The following are provided:
W3 = {RSR, RSV, RSS, RSu, RS%, RSF, RS5, RSH, RS℃, RS*, ……}
T3 = {5uF, 5HR, 5*5, HRS}
step 1.4.3: from T 3 And W is 1 Obtaining W 4 Thereby obtaining T 4 Sequentially circulate until T n Is empty as follows:
W4 = {5uFR, 5uFV, 5uFS, 5uFu, 5uF%, 5uFF, 5uF5, 5uFH, 5uF℃, 5uF*, ……}
T4 = {5HRS}
W5 = {5HRSR, 5HRSV, 5HRSS, 5HRSu, 5HRS%, 5HRSF, 5HRS5, 5HRS℃, ……}
T5 = {}
T5 is empty, so the loop ends.
Step 1.4.4: a set W of string item sets is obtained as follows:
W={5HRS,5uF,5HR,5*5,HRS,RS,uF,5V,5u,5%,5H,5℃,5*,HR,*5,R,V,S,u,%,F,5,H,℃,*}
step 1.4.5: counting the occurrence times of the character strings to obtain FW.
FW={{5HRS,1},{5uF,2},{5HR,1},{5*5,2},{HRS,1},{RS,1},{uF,2},{5V,2},{5u,2},{5%,2},{5H,1},{5℃,2},{5*,2},{HR,1},{*5,2},{R,1},{V,2},{S,1},{u,2},{%,2},{F,2},{5,15},{H,1},{℃,2},{*,2}}
Step 1.4.6: and merging the character strings. This step should be performed starting from a shorter length string. The example here is the main line: 5→F→u→5u→uf→5uF are illustrated by way of example and do not proceed fully according to step 1.46 in the extraction frequent item set S.
(1) The string '5' occurs 15 times; its nearest supersets are '5V', '5u', '5%', '5H', '5℃', '5*', '*5', whose counts sum to 2+2+2+1+2+2+2 = 13, so '5' and its count in FW are modified to {5,2}.
(2) The string 'F' occurs 2 times; its nearest superset 'uF' also occurs 2 times, so 'F' and its count are removed from FW.
(3) The string 'u' occurs 2 times; its nearest supersets are 'uF' and '5u', whose counts sum to 2+2 = 4, so 'u' and its count are removed from FW.
(4) The string 'uF' occurs 2 times; its nearest superset '5uF' also occurs 2 times, so 'uF' and its count are removed from FW.
FW after merging the character strings according to the step 1.4.6 of extracting the frequent item set S is as follows:
FW={{5HRS,1},{5uF,2},{5*5,2},{5V,2},{5%,2},{5℃,2},{5,2}}
Step 1.4.7: Sort. 30% of the total record count N is 0.6, so strings occurring more than 0.6 times are sorted in descending order of count: S = {{5uF,2}, {5*5,2}, {5V,2}, {5%,2}, {5℃,2}, {5HRS,1}}.
Step 1.5: according to the character strings displayed in the set S, for example, the character strings representing the parameters with higher occurrence frequency are extracted, and the meaning of the parameter items is determined, for example: 5uF represents capacitance; 5*5 the diameter (length) and height (width); 5V represents a voltage; 5% represents the precision; 5 ℃ represents the working temperature; 5HRS indicates lifetime.
(2) Product determination of aluminum electrolytic capacitor
Step 2.1: user input is determined, a factory model (including a factory model only and a material description containing the factory model, the factory model mentioned below contains both cases), a material description (the material description mentioned below is a material description which does not succeed in identifying the factory model, and two cases exist, namely, the description is a simple material description, and the description is a description containing the factory model which is not contained in a platform).
Step 2.2: if the step 2.1 is determined to be the former factory model through the database of the platform, the following steps for determining the product can be directly skipped, and the product can be directly positioned. If it is determined in step 2.1 that the input is a material description, then proceed downwards.
Step 2.3: and (5) confirming the category. And checking whether the material description contains the category names, aliases and the like in the corresponding data, and finally confirming the category of the material description. If the material description of the aluminum electrolytic capacitor is adopted, the following parameter decomposition process is met.
Step 2.4: and determining parameters of the aluminum electrolytic capacitor. In the material description of the aluminum electrolytic capacitor accumulated by the platform, the statistics result in determining the critical parameters of the aluminum electrolytic capacitor product (the statistical process is described in the step of confirming the parameters of the aluminum electrolytic capacitor, namely 1), and the following table is shown:
TABLE 1 Key parameters table for aluminium electrolytic capacitor
1 2 3 4 5 6 7 8 9
Capacitance value Rated voltage Precision Lifespan Operating temperature Diameter (length) Height (width) Foot distance Installation mode
(1) Capacitance value: the float type number, in UF (microfarads) units, are not shown.
(2) Rated voltage: float type number. The unit is V (volts) and is not shown.
(3) Precision: float type number. The unified unit is% and the unit is not shown.
(4) Life span: int type value. The unit is HRS (hours) and the unit is not shown.
(5) Operating temperature: character type. The low and high operating-temperature values are both int type values; the low value and the high value are joined by the symbol '/', and the result is finally returned as a character type.
(6) Diameter (length): float type number. The unit is MM (millimeters) and is not shown.
(7) Height (width): float type number. The unit is MM (millimeters) and is not shown.
(8) Foot distance: float type number. The unit is MM (millimeters) and is not shown.
(9) The installation mode is as follows: character type.
The output format of the parameter items is the same as the data format of the product parameter items after unification in the database.
Step 2.5: unified symbols. And certain symbols are replaced and unified, so that subsequent operations, parameter extraction and the like are facilitated. And replacing the symbols in the original material description according to a symbol correspondence table, wherein the table is a part of symbol correspondence.
Table 2 symbol correspondence table
1 2 3 4 5 6 7 8 ……
Original character μ —— positive/negative sign 0hm -/+ -\+ +- _ ……
Replacement character u - ± ohm ± ± ± - ……
Step 2.6: and (5) extracting an installation mode. The installation modes have corresponding relations (shown in table 3) among the description words, and the installation modes are extracted according to the corresponding relations. The installation mode and the vocabulary corresponding to the installation mode are reserved (the operation is that the following steps are recorded and extracted with the original representation characters), meanwhile, the corresponding vocabulary in the material description is deleted, the extracted information is ensured to be no longer reused (the action of deleting the extracted information character in the material description is hereinafter collectively referred to as updating the material description).
Table 3 correspondence (part) between the installation mode and the description vocabulary
(The table content is shown as a figure in the original and is not reproduced here.)
Step 2.7: and (5) extracting accuracy. For descriptions where only one precision appears, such as descriptions containing only 10% or + -10, etc., directly extracting '10.0' as the output of the precision; for descriptions where two or more or ranges of precision appear in the description, such as 10%20% or 10-20% or + -10-20% or the like, the '10.0/20.0' is extracted as output. And recording, extracting and original representation characters, and updating material description.
Step 2.8: unify distance symbols. The MM, CM, DM, rice, DM, CM, MM are given equal distance units. After a uniform distance unit, a new material description 5mm by 3mm is obtained, and a value (float value, such as 5.0) and an original character (such as 5 mm) after the uniform distance are stored, namely, the corresponding relation between the recorded data and the original data is also stored.
Step 2.9: diameter (length), height (width), foot distance are extracted. The extracted material description is the material description after the characters are capitalized, the parts except the numbers and the letters X are replaced by spaces, 7X8X6 or 7X8X6 continuous multiplication form fields are extracted in the first step, numerical value parts are extracted, the first numerical value is the diameter (length), the second numerical value is the height (width) and the third numerical value is the foot distance. If the first extraction fails, extracting 7*8 or 7X8 as a continuous multiplication form field, and extracting numerical values in the continuous multiplication form field, wherein the numerical values extracted by the method are the diameter (length) and the height (width) of the first numerical value; extracting values immediately following ' pitch= ', PITCH- ', ' PITCH ', ' p= ', ' P- ' and the like, wherein the values are foot distances. And when the character is extracted, the corresponding character of the character in the original material description is confirmed through the position of the extracted corresponding character. The original description as entered at this step is: '10pf,50v,7mm x 8mm x 6mm', the treated material was described as '10, 50, 7x8x 6', the diameter after extraction was 7.0, the height was 8.0, the foot distance was 6.0, and the corresponding characters were 7mm x 8mm x 6mm.
Step 2.10: and uniformly replacing the descriptions about the capacity value, the voltage, the temperature and the time in the material description with the respective uniform characters for description. The same as the conversion of the distance unit in step 2.8. For example, '50uf,80uf,500v', the value of the transformed value is stored as [50.0,80.0], and the original character represented by the value is [ '50uf', '80uf' ]; the value for the voltage is stored as [500.0], the original character represented by the voltage is [ '500V' ]; the material description after converting the replacement character is updated to '50.0UF,80.0UF,500.0V'
After conversion, the numerical values and the corresponding characters are stored. The correspondence of the converted unit symbols of the respective units and the unified symbol is given in the following table.
Table 4 Correspondence (part) between units and the unified unit
Quantity Units Unified unit representation
Capacitance value MF, UF, NF, PF, F UF
Temperature ℃ (degrees centigrade) ℃
Time HOUR, HOURS, HRS HRS
Voltage MV, UV, VAC, KV, V, VDC V
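Step 2.10's unit unification can be sketched as follows. `normalize` and `UNIT_MAP` are assumed names; the temperature symbol ℃ and the numeric scaling between units (e.g. PF → UF), which the real step would also apply, are omitted for brevity.

```python
import re

# Table 4 aliases -> unified unit symbol (temperature and unit scaling omitted)
UNIT_MAP = {
    'capacitance': ({'MF', 'UF', 'NF', 'PF', 'F'}, 'UF'),
    'voltage':     ({'MV', 'UV', 'VAC', 'KV', 'V', 'VDC'}, 'V'),
    'time':        ({'HOUR', 'HOURS', 'HRS'}, 'HRS'),
}

def normalize(description):
    """Sketch of step 2.10: rewrite number+unit fields with unified unit
    symbols while remembering each value and its original characters."""
    values = {k: [] for k in UNIT_MAP}
    originals = {k: [] for k in UNIT_MAP}
    def repl(m):
        number, unit = m.group(1), m.group(2).upper()
        for kind, (aliases, unified) in UNIT_MAP.items():
            if unit in aliases:
                values[kind].append(float(number))
                originals[kind].append(m.group(0))   # keep the raw characters
                return f'{float(number)}{unified}'
        return m.group(0)                            # unknown unit: left untouched
    new_desc = re.sub(r'(\d+(?:\.\d+)?)([A-Za-z]+)', repl, description)
    return new_desc, values, originals
```

Running it on the worked example '50uf,80uf,500v' yields the updated description '50.0UF,80.0UF,500.0V' with the values and original characters recorded, as in the text.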
Step 2.11: extracting capacitance, voltage and service life. Taking the extraction capacity value as an example for explanation: if the value related to the capacity value exists in the step 2.10, the maximum value is extracted and output as the capacity value. And extracting the original character basis of the capacity value corresponding to the numerical value. Update the material description (update all about the part of the capacity). If the capacity value is extracted from [50.0,80.0], returning to 80.0; the original character is 80uf; the material description was updated to 500.0V.
Step 2.12: extraction temperature range. Case 1: in step 2.10, two values related to temperature are extracted, and the first value is taken as the lowest working temperature, the second value is taken as the highest working temperature, and the connection output is carried out by the symbol '/'. Case 2: in the step 2.10, only one value related to temperature is extracted, and whether the similar value 25-150 ℃ exists in the material description, namely, the value immediately before the similar value is the lowest working temperature, and if the similar value exists, the temperature range is output for representation; if not, inquiring the value similar to '25-150 ', namely, immediately adjacent to the value after '25-150 ℃, as the highest working temperature, and if so, outputting a working range representation; if not, outputting the value of the current temperature representation. And records its original representation. Updating the material description.
Step 2.13: the capacitance value of the capacitor special symbol representation is extracted. If the extraction of the capacity value is not successful in the step 2.11, the step is entered, the capacity value is further extracted, and after the extraction is successful, the material description is updated. Extracting the similar capacity value representations of '5U3', '5U', 'U5', and the like, and extracting the capacity value representations with the units of UF as numerical values of 5.3, 5 and 0.5 respectively. The letters representing the special symbols of the capacitor and the corresponding relation with the units are shown in the following table:
TABLE 5 correspondence of capacitance special symbols to capacitance units
1 2 3 4 5
Special character P N U M R
Corresponding units PF NF UF UF PF
Step 2.14: and extracting the capacitance, voltage and precision of the scientific counting method. If the capacity value or the voltage or the precision is not successfully extracted before the step, the step is entered for extraction. Fields like '104k 500', '104', '500', i.e. combinations of three-bit integers and letters or mere three-bit integer combinations, are extracted. Wherein for a combination of letters with a letter belonging to the letter set representing precision, then the first combination meeting the condition, the integer being a scientific count representation of the value, the letter being a letter representation of precision; if only one digital combination exists and only the capacitance value is not extracted currently and the voltage is extracted, the digital combination is a scientific counting method representation of the capacitance value; if only one digital combination exists and only the voltage is not extracted currently and the capacitance value is extracted, the digital combination is represented by a scientific counting method of the voltage; if the number of the digital combinations is greater than or equal to 2, the numerical value represented by the scientific counting method in the first two digital combinations is represented by a capacitance value, and the numerical value represented by the scientific counting method is represented by a voltage. The digital combination expressed by the scientific counting method comprises the following capacitance conversion modes: the first two digits of the digit combination represent a value (10 (the last digit of the digit combination represents a value of-6)); the voltage conversion mode is as follows: the first two digits of the numerical combination represent a value (the last digit of the 10-digit combination represents a value). The capacity value, voltage or precision extracted by the step only supplements the extracted information and does not replace the extracted information. 
The conversion of the letter and the precision representation is shown in the following table:
Table 6 correspondence represented by accuracy
1 2 3 4 5 6 7
Letter J K M F G S Z
Precision of 5.0 10.0 20.0 1.0 2.0 20.0/50.0 20.0/80.0
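The conversion formulas of step 2.14 can be written out as a short sketch (the function names are illustrative; the letter-to-precision mapping follows Table 6):

```python
# Letter -> precision (%) correspondence of Table 6
PRECISION_LETTERS = {'J': 5.0, 'K': 10.0, 'M': 20.0, 'F': 1.0,
                     'G': 2.0, 'S': '20.0/50.0', 'Z': '20.0/80.0'}

def decode_capacitance(code):
    """Three-digit code as capacitance in UF: first two digits x 10**(last - 6).
    E.g. '104' -> 10 * 10**(4 - 6) = 0.1 UF."""
    return int(code[:2]) * 10 ** (int(code[2]) - 6)

def decode_voltage(code):
    """Three-digit code as voltage in V: first two digits x 10**last.
    E.g. '500' -> 50 * 10**0 = 50 V."""
    return int(code[:2]) * 10 ** int(code[2])
```

So a field like '104K' decodes to a 0.1 UF capacitance with 10.0% precision, matching the standard three-digit capacitor marking convention.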
Step 2.15: the capacity value of the pure digital representation is extracted. If the above steps are not successful in extracting the capacity value, the process is entered into the step to extract the capacity value. And extracting a pure numerical value without letters before and after the numerical value in the material description, outputting the numerical value as a capacity value, and updating the material description. If the material description is '2.3.2.5 s', 2.3 is extracted as the value of the capacitance value to be output.
Step 2.16: the accuracy of the alphabetical representation is extracted. If the steps are not successful in extracting the precision, the step is entered to extract the precision. If letters in the sixth table exist, the first letter in the order of the table 6 appears first, namely the letter is input as precision, and the material description is updated. For the material description 'KJ', since 'J' is the front in table 6, i.e. the accuracy of extraction in this material description is '5.0', in '%'.
Step 2.17: extracting foot distance. If the step is not successful in extracting the pitch, the step is entered to extract the pitch. The remaining numbers in the extract description, and no immediately preceding or following letters, are output as pitches if their values are between [1,50] and at most only one decimal place can represent their sizes.
Step 2.18: and confirming the parameter item. If the necessary parameter items (shown in the following table) are extracted successfully, entering the next step to confirm the product; otherwise, reminding the user of parameter missing and complementing corresponding parameter items.
TABLE 7 necessary parameters table for aluminum electrolytic capacitor
1 2 3 4 5 6
Aluminum electrolytic capacitor Capacitance value Rated voltage Precision Operating temperature Diameter (length) Foot distance
Step 2.19: and (5) confirming the product. And determining the products meeting the parameter values corresponding to the extracted parameter items in the database as purchasable products, namely predicting the purchase possibility of the purchasable products by using a KNN algorithm based on the Mahalanobis distance.
(3) Purchase prediction for KNN algorithm based on Mahalanobis distance
Step 3.1: data preparation.
Data preparation is performed from the users' purchase records so that the data samples of the two categories (purchase and non-purchase) are kept at a 1:1 ratio. Let the number of samples be N, while ensuring that N is greater than the feature dimension L of the samples.
Step 3.2: the distance is calculated.
Calculate the distance between the current sample to be classified and each classified sample in the training set. Considering that the Mahalanobis distance is independent of dimensional units and can exclude correlation interference between variables, the Mahalanobis distance D(Yi, Yj) is used to calculate the distance between samples (the total number of samples being greater than the sample dimension). D(Yi, Yj) denotes the distance between samples Yi and Yj and is computed as follows:
Calculation of the Mahalanobis distance:
Step 3.2.1: Data centering.
X = Y − Ȳ
where Y represents the original data, X the centered data, and Ȳ = (1/N)ΣYi the average of the raw data.
Step 3.2.2: and solving a mapping matrix. From covariance matrix
Figure GDA0004047384000000213
And solving the eigenvalue and the corresponding eigenvector.
Step 3.2.3: the eigenvalues are sorted in descending order. And (3) arranging the eigenvalues solved in the step (3.2.2) in a descending order, and further sequencing eigenvectors corresponding to the eigenvalues to form an eigenvector matrix V. The matrix V represents the complete principal component space.
Step 3.2.4: data in the rotation space is obtained. The centered data is mapped to the principal component space, and z=xv is used to obtain rotated data Z.
Step 3.2.5: and calculating the Euclidean distance under the new coordinates, namely the corresponding Markov distance between the original data.
Figure GDA0004047384000000221
/>
The smaller the mahalanobis distance, the higher the similarity between samples; the greater the distance, the less similarity between samples.
where Di is the Mahalanobis distance between the current sample Y0 and training-set sample Yi; z0 is the coordinate position of the current sample in the new coordinate system, and zi is the coordinate position of the i-th training-set sample Yi in the new coordinate system.
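Steps 3.2.1–3.2.5 can be sketched with NumPy; this is an illustrative sketch and `rotated_distances` is an assumed name.

```python
import numpy as np

def rotated_distances(Y):
    """Sketch of steps 3.2.1-3.2.5: center the data, rotate it into the
    principal component space (eigenvectors of the covariance matrix sorted
    by descending eigenvalue), then measure Euclidean distance there.
    The last row of Y is taken as the sample to be classified."""
    X = Y - Y.mean(axis=0)                     # step 3.2.1: centering
    cov = X.T @ X / len(Y)                     # step 3.2.2: covariance (1/N form)
    vals, vecs = np.linalg.eigh(cov)
    V = vecs[:, np.argsort(vals)[::-1]]        # step 3.2.3: descending eigenvalues
    Z = X @ V                                  # step 3.2.4: rotated data
    z0, Zt = Z[-1], Z[:-1]
    return np.linalg.norm(Zt - z0, axis=1)     # step 3.2.5: distances D_i

# The worked example of step 3: four training samples plus Y0 as the last row
Y = np.array([[60, 600, 1], [70, 600, 1], [60, 600, 0],
              [70, 600, 0], [60, 590, 1]], dtype=float)
print(np.round(rotated_distances(Y), 4))       # -> [10. 14.1421 10.0499 14.1774]
```

Note that since V is orthogonal, the rotation alone leaves Euclidean distances unchanged (which is why the worked example's distances equal those of the centered data); a fully whitened Mahalanobis variant would additionally divide each rotated coordinate by the square root of its eigenvalue.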
Step 3.3: and (5) sorting the distances.
According to the distances Di, i = 1, 2, ……, N, between the current sample to be classified and each sample in the training set, arrange them in ascending order of distance.
Step 3.4: neighbor samples are determined.
According to the sorted distances, take the first K samples (K is odd, excludes a tie between the two classes, and is smaller than √N; K is temporarily taken as 11, after which N and K are adjusted according to later use: with known-category data as a test set, N and K are tuned by accuracy and the best values selected) as the neighbor samples of the current sample to be classified.
Step 3.5: and counting the number of category attributes of the neighbor samples.
Count the class attributes of the K neighbor samples: the number of samples of class w1 is t1, and the number of samples of class w2 is t2.
Step 3.6: and determining the category attribute of the sample to be classified.
When t1 > t2, the class attribute of the sample data to be classified is w1, i.e., purchased.
When t1 < t2, the class attribute of the sample data to be classified is w2, i.e., not purchased.
Step 3.7: and adding the product forecast result which can be purchased as a purchased product to a list of recommended purchases for the user.
Note that: and determining a K value.
And selecting and determining a K value, and determining a final K value through the accuracy of a leave-one-out experiment of the current sample data. Sample data of known classes are used as test set samples, and one sample is reserved at a time, and other sample data are used as training set samples. K value is selected, prediction is carried out through the KNN method, and accuracy is counted. The K value with the highest accuracy is the set K value.
P_K = (1/N) Σ_{i=1}^{N} KNN(Y_i), with K odd
where P_K represents the average accuracy of the current K value, and KNN(Y_i) denotes the result for sample Y_i when the remaining samples serve as the training set: KNN(Y_i) = 1 when the prediction equals the class of Y_i, and KNN(Y_i) = 0 otherwise.
Σ_{i=1}^{N} KNN(Y_i) indicates the number of correct predictions among the N samples. The K corresponding to the maximum P_K is selected as the final value of K.
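The leave-one-out selection of K described in the note can be sketched as follows (illustrative names; plain Euclidean distance is used here for brevity where the method uses the rotated-space distance):

```python
import numpy as np

def choose_k(Y, labels, k_candidates):
    """Sketch of the K-selection note: leave-one-out accuracy P_K for each
    candidate (odd) K; return the K with the highest accuracy."""
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    best_k, best_p = None, -1.0
    for k in k_candidates:
        correct = 0
        for i in range(len(Y)):                          # leave sample i out
            d = np.linalg.norm(np.delete(Y, i, 0) - Y[i], axis=1)
            rest = np.delete(labels, i)
            nn = rest[np.argsort(d)[:k]]                 # K nearest neighbours
            pred = 1 if nn.sum() > k - nn.sum() else 0
            correct += int(pred == labels[i])
        p_k = correct / len(Y)                           # P_K = (1/N) sum KNN(Y_i)
        if p_k > best_p:
            best_k, best_p = k, p_k
    return best_k, best_p
```

On well-separated toy data the smallest K already reaches accuracy 1.0 and is therefore selected.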
Step 3 example: purchase prediction for KNN algorithm based on Mahalanobis distance
Step 3.1: preparing sample data
Assume that there are four raw data samples, each comprising three dimensions of scoring data; the matrix is [Y1, Y2, Y3, Y4] = [60,600,1; 70,600,1; 60,600,0; 70,600,0], and the corresponding classification matrix is [1,0,1,0], where 1 indicates classification as purchase and 0 indicates classification as non-purchase.
The data to be classified is Y0 = [60,590,1].
Then the raw data and the data to be classified form the set Y:
Y = [Y1, Y2, Y3, Y4, Y0] = [60,600,1; 70,600,1; 60,600,0; 70,600,0; 60,590,1]
Step 3.2: calculating the mahalanobis distance between the sample to be classified and the original sample data
Step 3.2.1: Data centering.
The mean of Y is calculated as Ȳ = [64, 598, 0.6], so
X = [[-4, 2, 0.4], [6, 2, 0.4], [-4, 2, -0.6], [6, 2, -0.6], [-4, -8, 0.4]]
Step 3.2.2: Solve the eigenvalues and corresponding eigenvectors of the covariance matrix Σ = XᵀX/N = [[24, 8, -0.4], [8, 16, -0.8], [-0.4, -0.8, 0.24]]:
Eigenvalues (retaining four decimal places): [28.9644, 11.0761, 0.1995]
Eigenvectors: (shown as a figure in the original)
Step 3.2.3: Arrange the eigenvalues in descending order to obtain the eigenvector matrix V (shown as a figure in the original).
Step 3.2.4: Obtain the data Z = XV in the feature space (values retained to four decimal places; shown as a figure in the original).
Step 3.2.5: Calculate the distance between the sample to be classified and each original sample (retaining four decimal places):
D = [D1 D2 D3 D4] = [10.0 14.1421 10.0499 14.1774]
That is, the similarity between the data to be classified and the samples in the original data decreases in the order: D1 → D3 → D2 → D4.
Step 3.3: distance ascending order
D = [D1 D3 D2 D4] = [10.0 10.0499 14.1421 14.1774]
Step 3.4: determining neighbor samples
Assume here that K takes 3 (the amount of data is small, so the constraint K < √N, i.e. 1 ≤ K ≤ 2, is not followed here).
The neighbor samples are the original samples Y1, Y3, Y2 corresponding to the distance values D1, D3, D2, with class attributes 1, 1 and 0 respectively.
Step 3.5: counting the number of category attributes of neighbor samples
The number of samples with category 1 is 2; the number of samples of class 0 is 1.
Step 3.6: determining class attributes of a sample to be classified
Since the number of samples with the category of 1 in the neighbor samples is greater than the number of samples with the category of 0, the attribute category of the sample to be classified is classified as 1.
Step 3.7: and adding the sample to be classified into a list of recommended purchases for the user.
It should be understood that the specific order or hierarchy of steps in the processes disclosed are examples of exemplary approaches. Based on design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate preferred embodiment of this invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. Alternatively, the processor and the storage medium may reside as discrete components in a user terminal.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. These software codes may be stored in memory units and executed by processors. The memory unit may be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.
The foregoing description includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, as used in the specification or claims, the term "including" is intended to be inclusive in a manner similar to the term "comprising," as that term is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification or claims is intended to mean a non-exclusive "or".

Claims (6)

1. A Mahalanobis-distance-based KNN algorithm aluminum electrolytic capacitor purchase prediction method, characterized by comprising the following steps:
step 1: confirming the key parameters of the aluminum electrolytic capacitor through analysis of aluminum electrolytic capacitor material descriptions; candidate key parameter items of the aluminum electrolytic capacitor are first extracted with a frequent item set extraction method, and the key parameter items are finally confirmed from the parameter items appearing in the frequent item sets;
step 2: determining a parameter matching scheme according to the decomposition rules of the material description, and determining the range of usable products;
step 3: applying a Mahalanobis-distance-based KNN algorithm purchase prediction method to give a purchase prediction for the product for the current user, preliminarily screening a list of products the user is likely to purchase;
in step 2, the parameter matching scheme is determined according to the decomposition rules of the material description, and the range of usable products is determined; the specific steps comprise:
step 2.1: determining the user input, which is either an original-factory model number or a material description;
step 2.2: if step 2.1 determines, through the platform's database, that the input is an original-factory model number, the following product determination steps are skipped and the product is located directly; if step 2.1 determines that the input is a material description, the process continues downward;
step 2.3: identifying the category: checking whether the material description contains a category name or an alias from the corresponding data, and finally confirming the category of the material description;
step 2.4: determining the parameters of the aluminum electrolytic capacitor: from the material descriptions of aluminum electrolytic capacitors accumulated on the platform, and referring to step 1, statistics show that the key parameters determining an aluminum electrolytic capacitor product are as follows:
(1) Capacitance value: a float value, unified to UF; the unit is not displayed;
(2) Rated voltage: a float value, unified to V; the unit is not displayed;
(3) Precision: a float value; the unit is not displayed;
(4) Life span: an int value, unified to HRS; the unit is not displayed;
(5) Operating temperature: a character type; the low and high operating temperature values are both int values, connected by the symbol '/', and finally returned as a character type;
(6) Diameter: a float value, unified to MM; the unit is not displayed;
(7) Height: a float value, unified to MM; the unit is not displayed;
(8) Foot distance: a float value, unified to MM; the unit is not displayed;
(9) Installation mode: a character type;
the output format of these parameter items is the same as the unified data format of the product parameter items in the database;
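The unified output format of items (1)–(9) can be illustrated with a minimal sketch; the field names and values below are hypothetical examples for illustration, not identifiers prescribed by the claim:

```python
# Hypothetical normalized parameter record following the unified formats of
# step 2.4 (field names are illustrative; the claim prescribes types and
# units, not identifiers).
example_params = {
    "capacitance": 100.0,                # float, unified to UF, unit not shown
    "rated_voltage": 25.0,               # float, unified to V
    "precision": 20.0,                   # float (e.g. a 20% tolerance)
    "life": 2000,                        # int, unified to HRS
    "operating_temperature": "-40/105",  # str: low/high int values joined by '/'
    "diameter": 8.0,                     # float, unified to MM
    "height": 12.0,                      # float, unified to MM
    "foot_distance": 3.5,                # float, unified to MM
    "installation": "RADIAL",            # str (character type)
}

# The character-type temperature field can be split back into two int values:
low, high = (int(v) for v in example_params["operating_temperature"].split("/"))
```

Storing the temperature range as a single '/'-joined string keeps every parameter item directly comparable to the unified product records in the database.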
step 2.5: unifying symbols: some symbols are uniformly replaced to facilitate subsequent operations and parameter extraction;
step 2.6: extracting the installation mode: each installation mode corresponds to a set of description words, and the installation mode is extracted according to this correspondence; the extracted installation mode is retained while the corresponding vocabulary is deleted from the material description, ensuring that extracted information is not applied repeatedly;
step 2.7: extracting the precision, recording both the extracted value and its original representing characters, and updating the material description;
step 2.8: unifying distance symbols: the distance units are unified to MM, and the unified value is stored together with the original characters, i.e. the correspondence between the data and the original data is recorded;
step 2.9: extracting the diameter, height and foot distance; extraction operates on the material description after its characters have been capitalized and all parts other than numbers and the letter X have been replaced by spaces;
step 2.10: the descriptions of the capacitance value, voltage, temperature and time in the material description are each uniformly replaced with their respective unified characters; after conversion, the numerical value and the corresponding characters are stored;
step 2.11: extracting the capacitance value, voltage and service life, and updating the material description;
step 2.12: extracting the temperature range, and updating the material description;
step 2.13: extracting a capacitance value represented by the special capacitor symbol; if the capacitance value was not successfully extracted in step 2.11, this step is entered to extract it further, and after successful extraction the material description is updated;
step 2.14: extracting a capacitance value, voltage or precision written in scientific notation; if the capacitance value, voltage or precision was not successfully extracted before this step, this step is entered for extraction;
step 2.15: extracting a capacitance value represented by a pure number; if the preceding steps did not successfully extract the capacitance value, this step is entered; a pure numerical value with no letters before or after it in the material description is extracted, output as the capacitance value, and the material description is updated;
step 2.16: extracting a precision represented by a letter; if the preceding steps did not successfully extract the precision, this step is entered; among the letters appearing singly in the material description, i.e. letter symbols with no adjacent letters before or after them, the first such letter is output as the precision, and the material description is updated;
step 2.17: extracting the foot distance; if the preceding steps did not successfully extract the foot distance, this step is entered; among the numbers remaining in the material description with no immediately preceding or following letters, if a value lies in [1,50] and has at most one decimal place, that value is output as the foot distance;
step 2.18: confirming the parameter items; if all necessary parameter items have been extracted successfully, the next step of product confirmation is entered; otherwise, the user is reminded of the missing parameters and prompted to supplement the corresponding parameter items;
step 2.19: confirming the product; the products in the database meeting the parameter values corresponding to the extracted parameter items are determined as purchasable products, and the purchase possibility of these purchasable products is then predicted with the Mahalanobis-distance-based KNN algorithm.
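As a rough illustration of the symbol-unification and extraction flow of steps 2.5–2.17, the sketch below pulls the capacitance value, rated voltage and temperature range out of a material description with regular expressions. The patterns and the function name are assumptions for illustration; the claim's actual rule set is considerably richer:

```python
import re

def extract_params(desc: str) -> dict:
    """Illustrative sketch of steps 2.5-2.17: normalize a material
    description and pull out the capacitance (UF), rated voltage (V)
    and temperature range. Patterns are hypothetical simplifications."""
    out = {}
    # step 2.5: unify symbols (example replacements only)
    desc = desc.upper().replace("MFD", "UF")
    # step 2.11: capacitance value, then update the description
    m = re.search(r"(\d+(?:\.\d+)?)\s*UF", desc)
    if m:
        out["capacitance"] = float(m.group(1))
        desc = desc.replace(m.group(0), " ", 1)
    # rated voltage: digits followed by V not followed by another letter
    m = re.search(r"(\d+(?:\.\d+)?)\s*V(?![A-Z])", desc)
    if m:
        out["rated_voltage"] = float(m.group(1))
        desc = desc.replace(m.group(0), " ", 1)
    # step 2.12: temperature range, returned as low/high joined by '/'
    m = re.search(r"(-\d+)\s*/\s*\+?(\d+)", desc)
    if m:
        out["operating_temperature"] = f"{m.group(1)}/{m.group(2)}"
    return out

params = extract_params("100UF 25V -40/+105C RADIAL")
```

Each successful extraction removes the matched text, mirroring the "update the material description" steps that prevent a value from being applied twice.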
2. The Mahalanobis-distance-based KNN algorithm aluminum electrolytic capacitor purchase prediction method as claimed in claim 1, wherein in step 1 the key parameters of the aluminum electrolytic capacitor are confirmed through analysis of the aluminum electrolytic capacitor material descriptions, and the specific steps comprise:
step 1.1: collecting material descriptions of aluminum electrolytic capacitors and constructing an aluminum electrolytic capacitor material description data set;
step 1.2: cleaning the data: material descriptions that are blank or contain only Chinese characters are removed, and the numeric representations in the material descriptions are converted into uniform numerical values; the cleaned aluminum electrolytic capacitor data set is D;
step 1.3: counting the set W1 of all characters appearing in the aluminum electrolytic capacitor descriptions and de-duplicating it;
step 1.4: extracting the frequent item set S, based on the principle that if an item set is frequent, then all of its non-empty subsets are also frequent; conversely, if an item set is infrequent, then all of its supersets are also infrequent;
step 1.5: from the character strings extracted into the set S, extracting those representing parameters with higher occurrence frequency, and finally determining the key parameter items of the aluminum electrolytic capacitor.
3. The Mahalanobis-distance-based KNN algorithm aluminum electrolytic capacitor purchase prediction method as recited in claim 2, wherein the step of extracting the frequent item set S in step 1.4 comprises:
step 1.4.1: combining the characters in the character set W1 pairwise to form a set W2; if a character string in W2 occurs more than L times in the data set D, it is a subset of a frequent item set, and the set of such character strings is T2; L is determined by the size of the data set as L = N×10%, where N is the number of data pieces in the data set; if a character string does not occur more than L times, no item set containing it is a frequent item set;
step 1.4.2: combining T2 with W1, with the character string from T2 in front and the character from W1 behind, to form the character string set W3; if a character string in W3 occurs more than L times, it is a subset of a frequent item set, and the set of such character strings is T3; if not, no item set containing it is a frequent item set;
step 1.4.3: combining T3 with W1 and repeating the frequent-item-set search of step 1.4.2 to obtain the character string set T4; continuing the cycle in this way until Tn is empty, at which point the cycle ends;
step 1.4.4: forming the set of character string item sets W = [Tn-1, …, T3, T2, T1, W1];
step 1.4.5: counting the number of occurrences of each character string in W, forming a matrix FW containing the character strings and their corresponding occurrence counts;
step 1.4.6: merging character strings: merging starts from the shorter character strings and proceeds upward; if the occurrence count of the longer character string is greater than or equal to that of the shorter character string, the shorter character string and its occurrence count are removed from FW, the occurrence count of the longer character string being the sum of the counts of all the nearest superset character strings of the shorter one; if the occurrence count of the longer character string is smaller than that of the shorter character string, the count of the shorter character string is modified to: the occurrences of the short character string minus the occurrences of the long character string, where the occurrence count of the long character string is again the sum of the counts of all the nearest superset character strings of the short one;
step 1.4.7: sorting: the character strings whose occurrence counts exceed 30% of the total data count N are sorted from high to low by occurrence count, forming the set S.
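The level-wise growth of steps 1.4.1–1.4.3 follows the Apriori property stated in step 1.4: a string is only extended if it is already frequent, since every substring of a frequent string must itself be frequent. A minimal sketch (the merging of step 1.4.6 and the 30% sorting of step 1.4.7 are omitted for brevity; names are illustrative):

```python
def frequent_substrings(descriptions, min_ratio=0.10):
    """Sketch of steps 1.3-1.4.4: grow frequent character strings level by
    level, Apriori-style. A candidate at level k+1 is a frequent level-k
    string extended by one character from W1."""
    n = len(descriptions)
    threshold = n * min_ratio  # L = N x 10% (step 1.4.1)

    def support(s):
        # number of material descriptions containing the string s
        return sum(1 for d in descriptions if s in d)

    # W1: de-duplicated set of all characters (step 1.3)
    w1 = sorted({ch for d in descriptions for ch in d})
    levels = [[c for c in w1 if support(c) > threshold]]
    while levels[-1]:
        # Tk x W1 -> candidates one character longer (steps 1.4.1-1.4.3)
        candidates = [t + c for t in levels[-1] for c in w1]
        levels.append([s for s in candidates if support(s) > threshold])
    # flatten all frequent strings (step 1.4.4)
    return [s for level in levels for s in level]

strings = frequent_substrings(["100UF25V", "220UF16V", "47UF50V"])
```

On this toy data the parameter-bearing strings such as "UF" and "V" survive at every level, which is exactly what step 1.5 exploits to identify key parameter items.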
4. The Mahalanobis-distance-based KNN algorithm aluminum electrolytic capacitor purchase prediction method as claimed in claim 1, wherein the necessary parameter items of the aluminum electrolytic capacitor in step 2.18 include: capacitance value, rated voltage, precision, operating temperature, diameter/length, and foot distance.
5. The Mahalanobis-distance-based KNN algorithm aluminum electrolytic capacitor purchase prediction method as claimed in claim 1, wherein the Mahalanobis-distance-based KNN algorithm purchase prediction method in step 3 gives a purchase prediction for the product for the current user, and the specific steps comprise:
step 3.1: data preparation: data are prepared from the user's purchase records so that the data samples of the two categories, purchased and not purchased, are kept in a 1:1 ratio; the number of samples is set to N, while ensuring that the number of samples is larger than the feature dimension L of the samples;
step 3.2: calculating distances: the distance between the current sample to be classified and each classified sample in the training set is calculated, using the Mahalanobis distance as the inter-sample distance;
step 3.3: sorting the distances: the distances Di, i = 1, 2, …, N, between the current sample to be classified and each sample in the training set are arranged in ascending order;
step 3.4: determining the neighbor samples: according to the sorted distances, the first K samples are selected as the neighbor samples of the current sample to be classified;
step 3.5: counting the category attributes of the neighbor samples: among the K neighbor samples, the number of samples of category w1 is counted as t1, and the number of samples of category w2 as t2;
step 3.6: determining the category attribute of the sample to be classified: when t1 > t2, the category attribute of the sample to be classified is w1, i.e. purchased; when t1 < t2, the category attribute of the sample to be classified is w2, i.e. not purchased;
step 3.7: adding each product whose prediction result is "purchased" to the list of recommended purchases for the user.
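Steps 3.2–3.6 can be sketched as below; the function name and data are illustrative. The covariance matrix of the training features plays the role of S in the Mahalanobis distance d(a,b) = sqrt((a−b)ᵀ S⁻¹ (a−b)), which is also why step 3.1 requires more samples than feature dimensions — otherwise S is singular and cannot be inverted:

```python
import numpy as np

def mahalanobis_knn(train_X, train_y, x, k):
    """Sketch of steps 3.2-3.6: classify x by majority vote among its k
    nearest training samples under the Mahalanobis distance, with S the
    covariance matrix of the training features."""
    train_X = np.asarray(train_X, dtype=float)
    x = np.asarray(x, dtype=float)
    S_inv = np.linalg.inv(np.cov(train_X, rowvar=False))  # inverse covariance
    diffs = train_X - x
    # squared Mahalanobis distances (step 3.2); sqrt not needed for ranking
    d2 = np.einsum("ij,jk,ik->i", diffs, S_inv, diffs)
    order = np.argsort(d2)                        # step 3.3: ascending sort
    neighbors = [train_y[i] for i in order[:k]]   # step 3.4: k neighbors
    # steps 3.5-3.6: majority vote between the two categories
    return max(set(neighbors), key=neighbors.count)

label = mahalanobis_knn(
    [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]],
    ["purchased", "purchased", "purchased",
     "not purchased", "not purchased", "not purchased"],
    [0.2, 0.2], k=3)
```

Sorting by squared distance gives the same neighbor ranking as the distance itself, so the square root can be skipped.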
6. The Mahalanobis-distance-based KNN algorithm aluminum electrolytic capacitor purchase prediction method as claimed in claim 5, wherein the first K samples are selected in step 3.4, and the method for determining the K value specifically comprises:
selecting and determining the K value: the final K value is determined through the accuracy of a leave-one-out experiment on the current sample data; each time, one sample of known category is taken as the test-set sample, and the remaining samples are taken as the training-set samples; a K value is selected, prediction is performed by the KNN method, and the accuracy is counted; the K value with the highest accuracy is taken as the set K value;

P_K = (1/N) · Σ_{i=1}^{N} KNN(Y_i), where K is an odd number,

wherein P_K represents the average accuracy for the current K value; KNN(Y_i) represents the prediction result for sample Y_i when Y_i serves as the test-set sample and the remaining samples serve as the training-set samples: when the predicted result is the same as the category of Y_i, KNN(Y_i) = 1; when the predicted result differs from the category of Y_i, KNN(Y_i) = 0; Σ_{i=1}^{N} KNN(Y_i) thus represents the number of correct prediction results among the N samples;
the K value corresponding to the maximum P_K is selected as the final K value.
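The leave-one-out selection of K in claim 6 can be sketched as below. The helper names are hypothetical, and a Euclidean plug-in classifier is used purely to keep the example short — the patent itself specifies the Mahalanobis distance:

```python
import numpy as np

def loo_accuracy(X, y, k, classify):
    """P_K: leave-one-out accuracy for a given K. `classify` is any
    KNN-style function (train_X, train_y, x, k) -> label; in the patent
    it would be the Mahalanobis-distance KNN."""
    n = len(X)
    hits = 0
    for i in range(n):  # each sample once as the test-set sample
        train_X = [X[j] for j in range(n) if j != i]
        train_y = [y[j] for j in range(n) if j != i]
        hits += int(classify(train_X, train_y, X[i], k) == y[i])
    return hits / n  # (1/N) * sum of KNN(Y_i)

def choose_k(X, y, max_k, classify):
    """Evaluate odd K values only (K odd avoids ties in the two-class
    vote) and return the K maximizing P_K."""
    ks = range(1, max_k + 1, 2)
    return max(ks, key=lambda k: loo_accuracy(X, y, k, classify))

# Minimal plug-in classifier for demonstration (Euclidean distance here,
# purely to keep the sketch self-contained):
def euclid_knn(train_X, train_y, x, k):
    d = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    nb = [train_y[i] for i in np.argsort(d)[:k]]
    return max(set(nb), key=nb.count)

best_k = choose_k([[0], [1], [2], [10], [11], [12]],
                  ["buy", "buy", "buy", "no", "no", "no"], 5, euclid_knn)
```

With two well-separated clusters, small K already achieves full leave-one-out accuracy, so the smallest maximizing odd K is returned.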
CN202011299561.8A 2020-11-19 2020-11-19 KNN algorithm aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance Active CN112580686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011299561.8A CN112580686B (en) 2020-11-19 2020-11-19 KNN algorithm aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance

Publications (2)

Publication Number Publication Date
CN112580686A CN112580686A (en) 2021-03-30
CN112580686B true CN112580686B (en) 2023-05-02

Family

ID=75123094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011299561.8A Active CN112580686B (en) 2020-11-19 2020-11-19 KNN algorithm aluminum electrolytic capacitor purchase prediction method based on Mahalanobis distance

Country Status (1)

Country Link
CN (1) CN112580686B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113534454A (en) * 2021-07-12 2021-10-22 北京邮电大学 Multi-core optical fiber channel damage equalization method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609869A (en) * 2012-02-03 2012-07-25 纽海信息技术(上海)有限公司 Commodity purchasing system and method
CN107784538A (en) * 2016-08-26 2018-03-09 佛山市顺德区美的电热电器制造有限公司 The recommendation method and device of household electrical appliance
CN107862566A (en) * 2017-10-17 2018-03-30 杨明 A kind of Method of Commodity Recommendation and system
CN108647811A (en) * 2018-04-26 2018-10-12 中国联合网络通信集团有限公司 Predict that user buys method, apparatus, equipment and the storage medium of equity commodity
CN109255567A (en) * 2018-08-08 2019-01-22 北京京东尚科信息技术有限公司 Commodity part type matching process, device, system, electronic equipment and readable medium
CN110674384A (en) * 2019-09-27 2020-01-10 厦门晶欣电子有限公司 Component model matching method
CN111652671A (en) * 2020-04-24 2020-09-11 青岛檬豆网络科技有限公司 Purchasing mall suitable for buyer market environment and purchasing method thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180285959A1 (en) * 2017-03-30 2018-10-04 Crane Merchandising Systems, Inc. Product recommendation engine for consumer interface of unattended retail points of sale
US20200027103A1 (en) * 2018-07-23 2020-01-23 Adobe Inc. Prioritization System for Products Using a Historical Purchase Sequence and Customer Features

Also Published As

Publication number Publication date
CN112580686A (en) 2021-03-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant