CN117807461B - Whisky origin tracing method and system based on big data - Google Patents

Publication number
CN117807461B
Authority
CN
China
Prior art keywords
whiskey
similarity
inorganic element
inorganic
dimension
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410204907.3A
Other languages
Chinese (zh)
Other versions
CN117807461A (en
Inventor
李潇
赵博
陈欢欢
逯海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Institute of Metrology
Original Assignee
National Institute of Metrology
Application filed by National Institute of Metrology
Priority to CN202410204907.3A
Publication of CN117807461A
Application granted
Publication of CN117807461B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data preprocessing, and in particular to a big-data-based method and system for tracing the origin of whiskey. Similarity analysis is performed on the distribution of the content data values of the inorganic elements in each whiskey sample to obtain the content distribution similarity between inorganic elements. All inorganic elements are divided into inorganic element similarity sets according to the content distribution similarity, which reduces the data dimension, and a dimension-reduced data value is obtained for each set from the content distribution similarities and content data values within it. All whiskey samples are then clustered according to how similarly their dimension-reduced data values are distributed, which improves both the efficiency of the cluster analysis and the accuracy of its result. This yields a more accurate whiskey origin tracing model and therefore better origin tracing.

Description

Whisky origin tracing method and system based on big data
Technical Field
The invention relates to the technical field of data preprocessing, in particular to a whisky wine producing area tracing method and system based on big data.
Background
Whiskey is an alcoholic product with regional characteristics: the taste, quality and flavor of whiskeys from different places of origin differ to some extent. As the whiskey market expands and production technology advances, places of origin have become more numerous and harder for consumers to identify, while origin information strongly affects a whiskey's market competitiveness. Tracing the origin of whiskey is therefore important to both consumers and whiskey brands.
In the prior art, cluster analysis is generally performed on whiskey samples according to component information such as inorganic element content and organic matter content, which yields the whiskey sample set to which each sample belongs. The traced place of origin of each sample set is determined from the places of origin of the samples it contains, and a whiskey origin tracing model is constructed. To trace a whiskey of unknown origin, cluster analysis is performed on its component information on the basis of the model to determine the sample set it belongs to, and the traced origin of that set is taken as its origin.
However, whiskey carries many kinds of component information; the inorganic elements alone include metals such as zinc, copper, iron and manganese. Clustering the samples to build the tracing model, and clustering the component information of the whiskey to be traced, therefore involves a large number of data dimensions per sample, that is, the curse of dimensionality. The computational complexity and workload are high, the clustering result is easily affected by noisy data, and both the efficiency of the cluster analysis and the accuracy of its result are low. Consequently, a tracing model built directly on all component information of all whiskey samples, as in the prior art, traces origin poorly.
Disclosure of Invention
To solve the technical problem that a whiskey origin tracing model built directly on the component information of all whiskey samples, as in the prior art, traces origin poorly, the invention provides a big-data-based whiskey origin tracing method and system. The technical solution adopted is as follows:
the invention provides a whisky origin tracing method based on big data, which comprises the following steps:
acquiring the place of origin of each whiskey sample and the content data value of each inorganic element in the sample;
in each whiskey sample, obtaining the content distribution similarity between each inorganic element and every other inorganic element according to how similarly their content data values are distributed among all the content data values;
dividing all inorganic elements into at least one inorganic element similarity set according to all the content distribution similarities in each whiskey sample; obtaining a dimension-reduced data value for each inorganic element similarity set according to the content distribution similarities among the inorganic elements in the set and the distribution of the corresponding content data values;
performing cluster analysis on all whiskey samples according to how similarly the dimension-reduced data values are distributed between each whiskey sample and the other whiskey samples, to obtain a whiskey origin tracing model; and tracing the origin of whiskey according to the model in combination with the place of origin of each whiskey sample.
Further, the method for obtaining the content distribution similarity comprises the following steps:
$$S_i^{a,b}=\frac{1}{2}\left[\frac{\exp\left(-\frac{\left|x_i^{a}-x_i^{b}\right|^{2}}{2\sigma_i^{2}}\right)}{\sum_{k=1}^{N_i^{a}}\exp\left(-\frac{\left|x_i^{a}-x_i^{a,k}\right|^{2}}{2\sigma_i^{2}}\right)}+\frac{\exp\left(-\frac{\left|x_i^{a}-x_i^{b}\right|^{2}}{2\sigma_i^{2}}\right)}{\sum_{k=1}^{N_i^{b}}\exp\left(-\frac{\left|x_i^{b}-x_i^{b,k}\right|^{2}}{2\sigma_i^{2}}\right)}\right]$$
(The formula and symbol names are reconstructed from the definitions below and from the explanation of the numerator, denominator and averaging in the Detailed Description; the exact form of the Gaussian kernel exponent is an assumption.)
wherein $S_i^{a,b}$ is the content distribution similarity between the $a$-th inorganic element and the $b$-th inorganic element of the $i$-th whiskey sample; $x_i^{a}$ is the content data value of the $a$-th inorganic element of the $i$-th whiskey sample; $x_i^{b}$ is the content data value of the $b$-th inorganic element of the $i$-th whiskey sample; $N_i^{a}$ is the number of inorganic element species of the $i$-th whiskey sample other than the $a$-th inorganic element; $N_i^{b}$ is the number of inorganic element species of the $i$-th whiskey sample other than the $b$-th inorganic element; $x_i^{a,k}$ is the content data value of the $k$-th inorganic element other than the $a$-th inorganic element of the $i$-th whiskey sample; $x_i^{b,k}$ is the content data value of the $k$-th inorganic element other than the $b$-th inorganic element of the $i$-th whiskey sample; $\sigma_i$ is the standard deviation of the content data values of all kinds of inorganic elements of the $i$-th whiskey sample; $\exp(\cdot)$ is the exponential function with the natural constant as its base; $|\cdot|$ is the absolute value sign.
Further, the method for obtaining the inorganic element similarity set comprises the following steps:
taking each inorganic element of each whiskey sample as a clustering data point; randomly selecting clustering data points as seed points, and performing region growing on all clustering data points of each whiskey sample from each seed point to obtain at least one inorganic element similarity set; the growing rule is to take the preset number of clustering data points whose content distribution similarity with the seed point's inorganic element is largest as new seed points; the stopping condition is that every clustering data point belongs to an inorganic element similarity set.
Further, the method for acquiring the dimension reduction data value comprises the following steps:
For any one of the inorganic element similarity sets:
calculating the mean of the content data values of all inorganic elements in the set; taking the inorganic element whose content data value is closest to that mean as the central inorganic element of the set; taking the inorganic elements other than the central inorganic element as dimension-reducing inorganic elements; taking the content distribution similarity between each dimension-reducing inorganic element and the central inorganic element as the dimension-reduction similarity of that element;
Calculating the dimension reduction similarity accumulated values of all dimension reduction inorganic elements, and taking the ratio of the dimension reduction similarity of each dimension reduction inorganic element to the dimension reduction similarity accumulated value as the dimension reduction weight of each dimension reduction inorganic element;
Obtaining a weighted data value of each dimension-reducing inorganic element according to the content data value of each dimension-reducing inorganic element and the dimension-reducing weight;
and taking the accumulated value of the weighted data values of all the dimension-reducing inorganic elements as the dimension-reducing data value of the inorganic element similarity set.
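The weighting steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `dimension_reduced_value`, the `similarity` callable, the element names and all numbers are hypothetical, and returning the single value of a one-element set is an assumption, since the text does not cover that case.

```python
def dimension_reduced_value(values, similarity):
    """Collapse one inorganic element similarity set into a single value.

    values: dict mapping element name -> content data value (one sample, one set).
    similarity: callable (a, b) -> content distribution similarity (assumed given).
    """
    elements = list(values)
    if len(elements) == 1:
        # Assumption: a one-element set is represented by its own value.
        return values[elements[0]]
    mean = sum(values.values()) / len(elements)
    # Central element: the one whose content data value is closest to the mean.
    centre = min(elements, key=lambda e: abs(values[e] - mean))
    reducers = [e for e in elements if e != centre]
    # Dimension-reduction similarity of each remaining element to the centre.
    sims = {e: similarity(e, centre) for e in reducers}
    total = sum(sims.values())
    # Weight = similarity / accumulated similarity; result = sum of weight * value.
    return sum(sims[e] / total * values[e] for e in reducers)


# Hypothetical set of three elements with made-up similarities.
values = {"Zn": 0.10, "Cu": 0.20, "Fe": 0.30}
table = {frozenset(("Zn", "Cu")): 0.6, frozenset(("Fe", "Cu")): 0.2}
result = dimension_reduced_value(values, lambda a, b: table[frozenset((a, b))])
```

Here Cu is the central element, so Zn and Fe receive weights 0.75 and 0.25. Note that, as described, the central element's own content data value does not enter the weighted sum.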
Further, the method for acquiring the whisky origin tracing model comprises the following steps:
arranging the dimension-reduced data values of all inorganic element similarity sets of each whiskey sample in descending order to obtain the dimension-reduced data sequence of that sample; calculating the DTW distance between the dimension-reduced data sequence of each whiskey sample and those of the other whiskey samples using the dynamic time warping (DTW) algorithm; calculating the Pearson correlation coefficient between the dimension-reduced data sequence of each whiskey sample and those of the other whiskey samples;
obtaining the clustering similarity degree between each whiskey sample and every other whiskey sample from the DTW distance and the Pearson correlation coefficient; the DTW distance is negatively correlated with the clustering similarity degree, and the Pearson correlation coefficient is positively correlated with it;
Performing cluster analysis on all whiskey samples through a k-means clustering algorithm according to all the cluster similarity degrees to obtain whiskey sample sets where each whiskey sample is located; and obtaining a whiskey origin tracing model according to the whiskey sample set of each whiskey sample.
Further, tracing the origin of whiskey according to the whiskey origin tracing model in combination with the place of origin of each whiskey sample comprises:
taking the place of origin with the most whiskey samples in each whiskey sample set as the traced place of origin of that set;
on the basis of the whiskey origin tracing model, calculating the clustering similarity degree between the whiskey to be traced and each whiskey sample; clustering the whiskey to be traced into the corresponding whiskey sample set by the k-means clustering algorithm according to the clustering similarity degree; and taking the traced place of origin of the whiskey sample set containing the whiskey to be traced as its traced place of origin.
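The two tracing steps above might look like the following Python sketch. It is an illustration under stated assumptions: the text assigns the whiskey to be traced by k-means, whereas this sketch substitutes a simpler rule (pick the sample set with the highest mean clustering similarity), and the names `set_origin` and `trace_origin`, the sample identifiers, origins and similarity values are all hypothetical.

```python
from collections import Counter


def set_origin(origins):
    """Traced origin of one sample set: the most frequent origin among its samples."""
    return Counter(origins).most_common(1)[0][0]


def trace_origin(similarity_to_sample, sample_sets, sample_origin):
    """Assign a whiskey to a sample set and return that set's traced origin.

    similarity_to_sample: dict sample id -> clustering similarity degree between
        the whiskey to be traced and that sample (assumed already computed).
    sample_sets: list of lists of sample ids (the clusters of the tracing model).
    sample_origin: dict sample id -> known place of origin.
    Assumption: nearest set by mean similarity stands in for the k-means step.
    """
    def mean_similarity(members):
        return sum(similarity_to_sample[m] for m in members) / len(members)

    best = max(sample_sets, key=mean_similarity)
    return set_origin([sample_origin[m] for m in best])


# Hypothetical model with two sample sets.
sample_sets = [["s1", "s2", "s3"], ["s4", "s5"]]
sample_origin = {"s1": "Scotland", "s2": "Scotland", "s3": "Ireland",
                 "s4": "Japan", "s5": "Japan"}
similarity_to_sample = {"s1": 0.8, "s2": 0.7, "s3": 0.6, "s4": 0.2, "s5": 0.1}
traced = trace_origin(similarity_to_sample, sample_sets, sample_origin)
```

Majority voting means a set containing samples from several origins still reports a single traced origin, so the purity of the sets limits the accuracy of the result.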
Further, the method for acquiring the weighted data value comprises the following steps:
and taking the product of the dimension reduction weight of each dimension reduction inorganic element and the corresponding content data value as the weighted data value of each dimension reduction inorganic element.
Further, the method for obtaining the clustering similarity degree comprises the following steps:
And taking a normalized value of the product of the negative correlation mapping value of the DTW distance and the pearson correlation coefficient as the clustering similarity degree between each whiskey sample and each other whiskey sample.
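A possible reading of this clustering similarity degree is sketched below in Python. Several choices are assumptions rather than statements of the source: `exp(-d)` is used as the negative correlation mapping of the DTW distance, min-max scaling over all sample pairs is used as the normalization, the absolute difference is the local DTW cost, and the sequences are assumed to have equal length so that the Pearson coefficient is defined. All function names and example sequences are hypothetical.

```python
import math


def dtw_distance(s, t):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(s), len(t)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s[i - 1] - t[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def pearson(s, t):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    ms, mt = sum(s) / len(s), sum(t) / len(t)
    cov = sum((a - ms) * (b - mt) for a, b in zip(s, t))
    vs = sum((a - ms) ** 2 for a in s)
    vt = sum((b - mt) ** 2 for b in t)
    return cov / math.sqrt(vs * vt)


def clustering_similarities(sequences):
    """Min-max normalised exp(-DTW) * Pearson for every pair of samples."""
    n = len(sequences)
    raw = {
        (i, j): math.exp(-dtw_distance(sequences[i], sequences[j]))
        * pearson(sequences[i], sequences[j])
        for i in range(n) for j in range(i + 1, n)
    }
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    return {pair: (v - lo) / span if span else 1.0 for pair, v in raw.items()}


# Hypothetical dimension-reduced data sequences, already sorted in descending order.
sequences = [[0.50, 0.30, 0.20], [0.52, 0.29, 0.19], [0.90, 0.06, 0.04]]
sims = clustering_similarities(sequences)
```

With this normalization the most similar pair maps to 1 and the least similar pair to 0, so the values are comparable only within one set of samples.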
Further, the preset number is set to 3.
The invention also provides a whiskey wine producing place tracing system based on big data, which comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein the processor realizes any step of the whiskey wine producing place tracing method based on big data when executing the computer program.
The invention has the following beneficial effects:
Origin tracing performs poorly in the prior art because the component information of whiskey has many data dimensions, which degrades the accuracy and efficiency of cluster analysis. Improving origin tracing therefore requires improving the efficiency of the cluster analysis used to build the tracing model and the accuracy of its result. Since the key factor is the number of component information dimensions, the invention reduces the dimension of the component information, namely the inorganic element dimensions.
Dimension reduction extracts principal component information, or effective information. Cluster analysis groups similar data into one cluster, and the effective information of the data within each cluster is consistent on the whole, so dimension reduction can be carried out by means of cluster analysis. The invention therefore performs cluster analysis on all data dimensions of each whiskey sample, that is, on the content data values of all kinds of inorganic elements. Conventional clustering algorithms work from the distance, or similarity, between data points: the smaller the distance, the more likely two points belong to one cluster. It is thus necessary to calculate the degree of similarity between the different inorganic elements in each whiskey sample and to cluster the inorganic elements accordingly.
Each inorganic element of a whiskey sample corresponds to a single content data value. Characterizing similarity only by the difference between content data values represents the overall similarity between inorganic elements poorly and cannot retain the local characteristics of the content data values. To retain those local characteristics, the content distribution similarity between each inorganic element and every other inorganic element is obtained according to how similarly their content data values are distributed among all the content data values.
Dividing the inorganic elements according to content distribution similarity yields inorganic element similarity sets in which the local characteristics and content data values of the elements are similar, so treating each set as a single dimension achieves dimension reduction. A representative value of each set, namely the dimension-reduced data value, is therefore needed for the subsequent similarity analysis. Within a set, the larger the content data values of the elements and the higher their overall content distribution similarity, the more reliable the corresponding principal component value and the larger the dimension-reduced data value. The invention therefore obtains the dimension-reduced data value of each set from the content distribution similarities among its inorganic elements and the distribution of the corresponding content data values.
Once the dimension-reduced data values are obtained, each whiskey sample has fewer data dimensions, which avoids the curse of dimensionality and reduces the computational complexity, the workload and the influence of noisy data in the subsequent cluster analysis. Clustering all whiskey samples according to how similarly their dimension-reduced data values are distributed is therefore more efficient and more accurate, which yields a more accurate whiskey origin tracing model and better origin tracing.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a whisky origin tracing method based on big data according to an embodiment of the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description is given of a whiskey production area tracing method and system based on big data according to the present invention, and the detailed description of the specific implementation, structure, characteristics and effects thereof is given below with reference to the accompanying drawings and the preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The detailed scheme of the whisky origin tracing method and system based on big data is specifically described below with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a whisky origin tracing method based on big data according to an embodiment of the present invention is shown, where the method includes:
Step S1: obtain the place of origin of each whiskey sample and the content data value of each inorganic element in it.
The embodiment of the invention aims to provide a whiskey wine producing place tracing method based on big data, which is used for analyzing according to the inorganic element content data of each whiskey wine sample and the corresponding wine producing place, and the whiskey wine producing place tracing model is constructed to be more accurate by carrying out clustering analysis after carrying out dimension reduction on the content data values of the inorganic elements in all dimensions, so that the whiskey wine producing place tracing effect is better by finally combining the wine producing places according to the whiskey wine producing place tracing model.
Therefore, the embodiment of the invention first acquires the place of origin of each whiskey sample and the content data value of each inorganic element. In the embodiment, 10 whiskey samples are randomly selected from each place of origin to form the full set of samples for analysis, so each sample has a known place of origin. The content of each inorganic element in each sample is measured by an inorganic element analysis method, and the ratio of that content to the total content of all inorganic elements in the sample is taken as the element's content data value. Expressing the content data value as a proportion avoids errors caused by differing sample volumes. The embodiment analyzes the content data values of all kinds of inorganic elements in the whiskey; an implementer may select a subset of inorganic elements according to the specific implementation environment, which is not described further here. It should be noted that, besides inorganic element analysis, the implementer may analyze the content data value of each stable isotope in the whiskey sample by a stable isotope analysis method, the stable isotope species including but not limited to carbon, hydrogen, oxygen and nitrogen; or the content data value of each marker organic compound by a marker organic compound analysis method, the marker organic compounds including but not limited to alcohols, acids and esters. Stable isotopes and marker organic compounds are processed in the same way as inorganic elements, which is not described further here.
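The proportion-based content data value described in Step S1 can be illustrated with a short Python sketch. The function name `content_data_values`, the element list and the measured contents are hypothetical; only the ratio itself comes from the text.

```python
def content_data_values(contents):
    """Convert the measured element contents of one sample into proportions.

    contents: dict mapping element name -> measured content (any common unit).
    Returns a dict mapping element name -> share of the total inorganic content.
    """
    total = sum(contents.values())
    if total <= 0:
        raise ValueError("total inorganic element content must be positive")
    return {element: amount / total for element, amount in contents.items()}


# Hypothetical measurements for one whiskey sample, in arbitrary units.
sample = {"Zn": 2.0, "Cu": 1.0, "Fe": 5.0, "Mn": 2.0}
values = content_data_values(sample)
```

Because every value is divided by the sample's own total, two samples of different volume but identical composition give the same content data values.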
Step S2: in each whiskey sample, obtain the content distribution similarity between each inorganic element and every other inorganic element according to how similarly their content data values are distributed among all the content data values.
Origin tracing performs poorly in the prior art because the component information of whiskey has many data dimensions, which degrades the accuracy and efficiency of cluster analysis. Improving origin tracing therefore requires improving the efficiency of the cluster analysis used to build the tracing model and the accuracy of its result. Since the key factor is the number of component information dimensions, the invention reduces the dimension of the component information, namely the inorganic element dimensions.
Dimension reduction extracts principal component information, or effective information. Cluster analysis groups similar data into one cluster, and the effective information of the data within each cluster is consistent on the whole, so dimension reduction can be carried out by means of cluster analysis. The invention therefore performs cluster analysis on all data dimensions of each whiskey sample, that is, on the content data values of all kinds of inorganic elements. Conventional clustering algorithms work from the distance, or similarity, between data points: the smaller the distance, the more likely two points belong to one cluster. It is thus necessary to calculate the degree of similarity between the different inorganic elements in each whiskey sample and to cluster the inorganic elements accordingly. Each inorganic element of a whiskey sample corresponds to a single content data value. Characterizing similarity only by the difference between content data values represents the overall similarity between inorganic elements poorly and cannot retain the local characteristics of the content data values, so a similarity measure that better retains those local characteristics is needed.
Preferably, each inorganic element of each whiskey sample is taken in turn as the $a$-th inorganic element of the $i$-th whiskey sample, and every inorganic element other than the $a$-th is taken as the $b$-th inorganic element. The content distribution similarity between the $a$-th and $b$-th inorganic elements of the $i$-th whiskey sample is obtained as follows (the formula and symbol names are reconstructed from the definitions and the explanation that follow; the exact form of the Gaussian kernel exponent is an assumption):
$$S_i^{a,b}=\frac{1}{2}\left[\frac{\exp\left(-\frac{\left|x_i^{a}-x_i^{b}\right|^{2}}{2\sigma_i^{2}}\right)}{\sum_{k=1}^{N_i^{a}}\exp\left(-\frac{\left|x_i^{a}-x_i^{a,k}\right|^{2}}{2\sigma_i^{2}}\right)}+\frac{\exp\left(-\frac{\left|x_i^{a}-x_i^{b}\right|^{2}}{2\sigma_i^{2}}\right)}{\sum_{k=1}^{N_i^{b}}\exp\left(-\frac{\left|x_i^{b}-x_i^{b,k}\right|^{2}}{2\sigma_i^{2}}\right)}\right]$$
wherein $S_i^{a,b}$ is the content distribution similarity between the $a$-th inorganic element and the $b$-th inorganic element of the $i$-th whiskey sample; $x_i^{a}$ is the content data value of the $a$-th inorganic element of the $i$-th whiskey sample; $x_i^{b}$ is the content data value of the $b$-th inorganic element of the $i$-th whiskey sample; $N_i^{a}$ is the number of inorganic element species of the $i$-th whiskey sample other than the $a$-th inorganic element; $N_i^{b}$ is the number of inorganic element species of the $i$-th whiskey sample other than the $b$-th inorganic element; $x_i^{a,k}$ is the content data value of the $k$-th inorganic element other than the $a$-th inorganic element of the $i$-th whiskey sample; $x_i^{b,k}$ is the content data value of the $k$-th inorganic element other than the $b$-th inorganic element of the $i$-th whiskey sample; $\sigma_i$ is the standard deviation of the content data values of all kinds of inorganic elements of the $i$-th whiskey sample; $\exp(\cdot)$ is the exponential function with the natural constant as its base; $|\cdot|$ is the absolute value sign.
Because the local characteristics of the content data values must be retained, the Euclidean distance between two content data values cannot be used directly. A Gaussian kernel function evaluates local structure on the basis of the Euclidean distance between data, so the numerator of the formula uses a Gaussian kernel to measure the distribution similarity between the $a$-th and $b$-th inorganic elements. To remove the influence of differing scales, the accumulated Gaussian-kernel similarity between an inorganic element and all other elements is used as the denominator. This bounds the value range, so that similarities of data values on different scales are expressed on a common scale, which simplifies the subsequent dimension reduction.
Furthermore, the $a$-th and $b$-th inorganic elements are distributed differently overall, so the two possible denominators differ: the accumulated similarity between the $a$-th inorganic element and the other elements is not the same as that between the $b$-th inorganic element and the other elements. The data similarity normalized by the $a$-th element's accumulated value therefore differs from the one normalized by the $b$-th element's accumulated value. To characterize the similarity between the two inorganic elements more accurately, the content distribution similarity between the $a$-th and $b$-th inorganic elements of the $i$-th whiskey sample is taken as the mean of these two data similarities.
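The construction explained above (Gaussian kernel numerator, accumulated-kernel denominator, mean of the two normalized values) can be sketched in Python as follows. This is one plausible reading, not a definitive implementation: the kernel exponent `(x - y)^2 / (2 * sigma^2)` and the use of the population standard deviation are assumptions, and the function names and the example values are hypothetical.

```python
import math
import statistics


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel of two scalar content data values (assumed bandwidth form)."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def content_distribution_similarity(values, a, b):
    """Similarity between elements a and b (indices) within one sample.

    values: list of content data values of all inorganic elements of the sample.
    """
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("content data values must not all be equal")
    pair = gaussian_kernel(values[a], values[b], sigma)
    # Accumulated kernel similarity of each element to every other element.
    sum_a = sum(gaussian_kernel(values[a], values[k], sigma)
                for k in range(len(values)) if k != a)
    sum_b = sum(gaussian_kernel(values[b], values[k], sigma)
                for k in range(len(values)) if k != b)
    # Mean of the two normalised similarities, which makes the result symmetric.
    return 0.5 * (pair / sum_a + pair / sum_b)


# Hypothetical content data values (proportions) of four elements in one sample.
values = [0.10, 0.12, 0.50, 0.28]
s01 = content_distribution_similarity(values, 0, 1)
s02 = content_distribution_similarity(values, 0, 2)
```

Averaging the two normalized terms makes the result independent of which element is taken as the $a$-th, which a single one-sided normalization would not guarantee.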
Step S3: dividing all inorganic elements into at least one inorganic element similarity set according to all content distribution similarities in each whiskey sample; and obtaining the dimensionality reduction data value of each inorganic element similar set according to the content distribution similarity among the inorganic elements in each inorganic element similar set and the distribution condition of the corresponding content data value.
The inorganic element similarity sets can be divided by further classifying and dividing according to the content distribution similarity, namely, the local characteristics and the content data values of the inorganic elements in each inorganic element similarity set have certain similarity, and the dimension reduction can be realized by taking the whole inorganic element similarity set as a dimension; therefore, the dimensionality-reduced data representative value corresponding to each inorganic element similarity set is required to be obtained, namely, the dimensionality-reduced data value is required to be subjected to subsequent similarity analysis; for the inorganic element similar set, the larger the content data value of each corresponding inorganic element is, the higher the content distribution similarity of the whole is, which means that the higher the reliability of the corresponding main component value is, the larger the corresponding dimension reduction data value is; therefore, according to the embodiment of the invention, all inorganic elements are divided into at least one inorganic element similarity set according to the similarity of all content distribution in each whiskey sample; and after the inorganic element similarity sets with similarity between the local characteristics of the inorganic elements and the content data values are obtained, further taking the whole inorganic element similarity sets as a dimension to realize dimension reduction. According to the embodiment of the invention, the dimensionality reduction data value of each inorganic element similar set is obtained according to the content distribution similarity among the inorganic elements in each inorganic element similar set and the distribution condition of the corresponding content data value.
Preferably, the method for obtaining the inorganic element similarity set includes:
Within an inorganic element similarity set, both the local characteristics and the content data values of the inorganic elements should be similar. A clustering algorithm that groups elements only by their content data values cannot guarantee similarity of local characteristics, and clustering directly on all pairwise content distribution similarities is computationally expensive because there are many such similarities. A region growing algorithm based on the content distribution similarity emphasizes local similarity and has lower computational complexity than cluster analysis over all content distribution similarities. Therefore, in the embodiment of the invention, each inorganic element of each whiskey sample is taken as a clustering data point; clustering data points are randomly selected as seed points, and region growing is performed on all clustering data points of each whiskey sample from each seed point to obtain at least one inorganic element similarity set. The region growing rule is to take, as new seed points, the preset number of clustering data points whose inorganic elements have the greatest content distribution similarity to the inorganic element of the seed point. The growth stops when all clustering data points belong to an inorganic element similarity set. Preferably, the preset number is set to 3. It should be noted that the implementer can adjust the preset number according to the specific implementation environment, which is not described further here.
For example, for inorganic element A, suppose the three inorganic elements with the greatest content distribution similarity are B, C and D. The three inorganic elements with the greatest content distribution similarity to each of B, C and D are then analyzed: for example, B corresponds to B1, B2 and B3, C corresponds to C1, C2 and C3, and D corresponds to D1, D2 and D3. At this point, the inorganic element similarity set includes A, B, C, D, B1, B2, B3, C1, C2, C3, D1, D2 and D3; if any element is repeated, only one copy is retained. The expansion then continues with B1, B2, B3, C1, C2, C3, D1, D2 and D3 as new growth points until no new inorganic elements are obtained, which yields the corresponding inorganic element similarity set.
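The region growing procedure described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `grow_similarity_sets`, the matrix layout and the `seed_order` parameter are hypothetical, and the sketch assumes that an element already placed in a set is never moved to another one, which the text does not state explicitly.

```python
def grow_similarity_sets(similarity, preset_number=3, seed_order=None):
    """Partition element indices into similarity sets by region growing.

    similarity: square symmetric list of lists; similarity[a][b] is the
        content distribution similarity between elements a and b.
    preset_number: how many most-similar elements each seed point pulls in.
    seed_order: order in which unassigned elements are tried as seeds
        (hypothetical parameter; the text selects seeds at random).
    """
    n = len(similarity)
    order = list(range(n)) if seed_order is None else list(seed_order)
    assigned = set()
    sets = []
    for seed in order:
        if seed in assigned:
            continue
        current = [seed]
        assigned.add(seed)
        frontier = [seed]  # seed points that still have to be expanded
        while frontier:
            point = frontier.pop(0)
            # Rank every other element by its similarity to the seed point.
            ranked = sorted(
                (e for e in range(n) if e != point),
                key=lambda e: similarity[point][e],
                reverse=True,
            )
            for neighbour in ranked[:preset_number]:
                # Assumption: elements already in a set are skipped.
                if neighbour not in assigned:
                    assigned.add(neighbour)
                    current.append(neighbour)
                    frontier.append(neighbour)
        sets.append(sorted(current))
    return sets
```

Growth from one seed ends when its frontier is empty, and the outer loop ends when every element belongs to a set, which matches the stated stop condition. In practice `seed_order` could be a random permutation of the element indices.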
Preferably, the method for acquiring the dimension reduction data value comprises the following steps:
For any one of the inorganic element similarity sets:
Calculating the average value of the content data values of all kinds of inorganic elements in the set; taking the inorganic element whose content data value is closest to the average value as the central inorganic element of the inorganic element similarity set; taking the inorganic elements other than the central inorganic element as dimension-reducing inorganic elements; taking the content distribution similarity between each dimension-reducing inorganic element and the central inorganic element as the dimension reduction similarity of that dimension-reducing inorganic element; calculating the accumulated value of the dimension reduction similarities of all dimension-reducing inorganic elements, and taking the ratio of the dimension reduction similarity of each dimension-reducing inorganic element to the accumulated value as the dimension reduction weight of that dimension-reducing inorganic element.
Because the inorganic elements in each inorganic element similarity set are similar to one another, the average of their content data values characterizes, to a certain extent, the overall content level of the set. The inorganic element whose content data value is closest to this average is therefore taken as the central inorganic element, that is, the representative element of the set. The content distribution similarity between each dimension-reducing inorganic element and the central inorganic element differs. If only the content data value of the central inorganic element were used as the dimension reduction data value, the local characteristics of the content data values would not be represented; if the plain average of the content data values of all inorganic elements were used, the influence of the content distribution similarity between elements on the dimension reduction result would be ignored. Therefore, the central inorganic element is used as the representative element of the inorganic element similarity set, and the content data values are averaged with the content distribution similarity between each dimension-reducing inorganic element and the central inorganic element as the weight. The resulting dimension reduction data value reflects both the local characteristics of the content data values and the content distribution similarity between elements: the higher the content distribution similarity between a dimension-reducing inorganic element and the central inorganic element, the larger its dimension reduction weight.
Therefore, the weighted data value of each dimension-reducing inorganic element is further obtained according to its content data value and dimension reduction weight. Preferably, the method for acquiring the weighted data value includes: taking the product of the dimension reduction weight of each dimension-reducing inorganic element and the corresponding content data value as the weighted data value of that element; the larger the weighted data values, the larger the dimension reduction data value of the corresponding inorganic element similarity set.
Because the dimension reduction weights are proportions that sum to 1, accumulating the weighted data values yields a dimension reduction data value with the same unit and scale as the content data values. The embodiment of the invention therefore takes the accumulated value of the weighted data values of all dimension-reducing inorganic elements as the dimension reduction data value of the inorganic element similarity set.
In the embodiment of the invention, each inorganic element similarity set of each whiskey sample is taken in turn as the $u$-th inorganic element similarity set, and the method for acquiring the dimension reduction data value of the $u$-th inorganic element similarity set is expressed by the following formula:

$$J_u = \sum_{v=1}^{N_u} \frac{S_{u,v}}{\sum_{w=1}^{N_u} S_{u,w}} \times H_{u,v}$$

wherein $J_u$ is the dimension reduction data value of the $u$-th inorganic element similarity set; $N_u$ is the number of dimension-reducing inorganic elements in the $u$-th inorganic element similarity set; $S_{u,v}$ is the dimension reduction similarity of the $v$-th dimension-reducing inorganic element in the $u$-th inorganic element similarity set; $\sum_{w=1}^{N_u} S_{u,w}$ is the accumulated value of the dimension reduction similarities of all dimension-reducing inorganic elements in the $u$-th inorganic element similarity set; $\frac{S_{u,v}}{\sum_{w=1}^{N_u} S_{u,w}}$ is the dimension reduction weight of the $v$-th dimension-reducing inorganic element in the $u$-th inorganic element similarity set; $H_{u,v}$ is the content data value of the $v$-th dimension-reducing inorganic element in the $u$-th inorganic element similarity set; and $\frac{S_{u,v}}{\sum_{w=1}^{N_u} S_{u,w}} \times H_{u,v}$ is the weighted data value of the $v$-th dimension-reducing inorganic element in the $u$-th inorganic element similarity set.
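As a hedged illustration of the weighted accumulation described above, the sketch below picks the central element, turns the similarities to it into proportional weights, and sums the weighted content data values. The function name `reduced_data_value` and the input layout are hypothetical. Two cases the text leaves open are handled by assumption: a set with a single element keeps its own content data value, and the similarities to the central element are assumed to be positive so that their sum is non-zero.

```python
def reduced_data_value(contents, similarity):
    """Dimension reduction data value of one inorganic element similarity set.

    contents: content data values of the elements in the set.
    similarity: square matrix; similarity[a][b] is the content distribution
        similarity between elements a and b of the set.
    """
    n = len(contents)
    if n == 1:
        # Assumption: a one-element set has no dimension-reducing elements.
        return contents[0]
    mean = sum(contents) / n
    # Central element: content data value closest to the set average.
    center = min(range(n), key=lambda k: abs(contents[k] - mean))
    others = [k for k in range(n) if k != center]
    # Accumulated dimension reduction similarity (assumed to be positive).
    total = sum(similarity[center][k] for k in others)
    # Weight = similarity share; weighted value = weight * content value.
    return sum(similarity[center][k] / total * contents[k] for k in others)
```

For contents `[1.0, 2.0, 6.0]` the average is 3.0, so the second element is central; with similarities 0.75 and 0.25 to the other two elements, the result is 0.75 × 1.0 + 0.25 × 6.0 = 2.25.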
Step S4: according to the condition that the distribution of the reduced data values between each whiskey sample and the rest whiskey samples is similar, carrying out cluster analysis on all whiskey samples to obtain a whiskey production place tracing model; and carrying out whiskey production place tracing according to the whiskey production place tracing model by combining the wine production place of each whiskey sample.
After the dimension reduction data values are obtained, the inorganic element data of each whiskey sample has a lower dimension, which avoids the curse of dimensionality and reduces the computational complexity and computation amount of the subsequent cluster analysis as well as the influence of noisy data. Therefore, in the embodiment of the invention, cluster analysis is performed on all whiskey samples according to how similarly the dimension reduction data values are distributed between each whiskey sample and each other whiskey sample, so that the whiskey origin tracing model traces origins more accurately and efficiently.
Preferably, the method for acquiring the whisky origin tracing model comprises the following steps:
The dimension reduction data values of all inorganic element similarity sets of each whiskey sample are arranged in descending order to obtain the dimension reduction data sequence of each whiskey sample. The dimension reduction data sequence gives every sample the same ordering convention, so that the subsequent sequence similarity and correlation calculations are not affected by arbitrary ordering. The implementer may also arrange the inorganic element similarity sets of each whiskey sample in another order according to the specific implementation environment, for example in ascending order of the dimension reduction data values, which is not described further here.
The number of inorganic element similarity sets may differ between whiskey samples, so the sequences being compared may differ in length. The dynamic time warping (DTW) algorithm can measure the similarity between two sequences of different lengths as the DTW distance; the smaller the DTW distance, the more similar the two sequences. Therefore, the embodiment of the invention calculates, through the dynamic time warping algorithm, the DTW distance between the dimension reduction data sequence of each whiskey sample and the dimension reduction data sequence of each other whiskey sample.
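A minimal sketch of the DTW distance between two sequences of possibly different lengths is given below. It assumes the absolute difference as the local cost and no warping window, neither of which is specified in the text; the function name `dtw_distance` is illustrative.

```python
def dtw_distance(seq_a, seq_b):
    """Classic dynamic-programming DTW distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] is the best alignment cost of seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(seq_a[i - 1] - seq_b[j - 1])  # assumed local cost
            cost[i][j] = step + min(
                cost[i - 1][j],      # seq_a advances, seq_b element repeats
                cost[i][j - 1],      # seq_b advances, seq_a element repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Sequences are sorted in descending order first, as described above.
sample_1 = sorted([1.0, 3.0, 2.0], reverse=True)
sample_2 = sorted([2.0, 1.0, 3.0, 2.0], reverse=True)
distance = dtw_distance(sample_1, sample_2)
```

Here the two sequences have lengths 3 and 4, and the repeated value 2.0 in the second sequence is absorbed by the warping, so the distance is 0.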
The correlation between two sequences is typically characterized by the Pearson correlation coefficient, so the embodiment of the invention calculates the Pearson correlation coefficient between the dimension reduction data sequence of each whiskey sample and that of each other whiskey sample. When the Pearson correlation coefficient is calculated, the two sequences must have the same length, that is, the same number of elements. The larger the Pearson correlation coefficient, the more strongly correlated the dimension reduction data sequences of the two whiskey samples, and the more likely the two whiskey samples belong to the same whiskey sample set. It should be noted that the Pearson correlation coefficient and the dynamic time warping algorithm are well known in the art and are not further defined or described here.
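For reference, the Pearson correlation coefficient of two equal-length sequences can be computed as in the sketch below. The function name `pearson_correlation` is illustrative. The sketch rejects sequences of different lengths rather than guessing how they should be aligned, and it returns 0.0 for a constant sequence, which is an assumption made here because the coefficient is undefined in that case.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same number of elements")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: treat a constant sequence as uncorrelated.
        return 0.0
    return covariance / math.sqrt(var_x * var_y)
```

The result lies between -1 and 1; values near 1 indicate that the two dimension reduction data sequences rise and fall together.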
In a clustering algorithm, two data points in the same cluster set generally have high similarity and correlation. The higher the similarity and correlation between two whiskey samples, the higher their clustering similarity degree and the more likely they belong to the same cluster set. The clustering similarity degree between each whiskey sample and each other whiskey sample is therefore obtained according to the DTW distance and the Pearson correlation coefficient; the DTW distance is negatively correlated with the clustering similarity degree, and the Pearson correlation coefficient is positively correlated with it. Preferably, the method for acquiring the clustering similarity degree comprises the following steps:
And taking a normalized value of the product of the negative correlation mapping value of the DTW distance and the pearson correlation coefficient as the clustering similarity degree between each whiskey sample and each other whiskey sample.
In the embodiment of the invention, each whiskey sample is taken in turn as the $a$-th whiskey sample, and each whiskey sample other than the $a$-th whiskey sample is taken in turn as the $b$-th whiskey sample; the method for obtaining the clustering similarity degree between the $a$-th whiskey sample and the $b$-th whiskey sample is expressed by the following formula:

$$R_{a,b} = \mathrm{Norm}\left(\exp\left(-D_{a,b}\right) \times P_{a,b}\right)$$

wherein $R_{a,b}$ is the clustering similarity degree between the $a$-th whiskey sample and the $b$-th whiskey sample; $D_{a,b}$ is the DTW distance between the dimension reduction data sequence of the $a$-th whiskey sample and the dimension reduction data sequence of the $b$-th whiskey sample; $P_{a,b}$ is the Pearson correlation coefficient between the dimension reduction data sequence of the $a$-th whiskey sample and the dimension reduction data sequence of the $b$-th whiskey sample; $\exp(\cdot)$ is the exponential function with the natural constant as its base; and $\mathrm{Norm}(\cdot)$ is the normalization function. The embodiment of the invention adopts linear normalization; the implementer may use other normalization methods according to the specific implementation environment, and may perform the negative correlation mapping by methods other than the exponential function with the natural constant as its base, such as the reciprocal, which are not further described here.
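The combination of the two measures can be sketched as follows, taking precomputed DTW distances and Pearson coefficients as input. The function name `clustering_similarity` and the dictionary layout keyed by sample pairs are hypothetical, and "linear normalization" is assumed here to mean min-max scaling over all sample pairs, which the text does not spell out.

```python
import math


def clustering_similarity(dtw, pearson):
    """Normalized product of exp(-DTW distance) and the Pearson coefficient.

    dtw, pearson: dicts keyed by (a, b) sample-index pairs.
    Returns a dict with the same keys and values scaled into [0, 1].
    """
    # Negative correlation mapping of the DTW distance times the coefficient.
    raw = {pair: math.exp(-dtw[pair]) * pearson[pair] for pair in dtw}
    lowest, highest = min(raw.values()), max(raw.values())
    if highest == lowest:
        # Assumption: identical raw values carry no ranking information.
        return {pair: 0.0 for pair in raw}
    # Assumed linear (min-max) normalization across all pairs.
    return {pair: (v - lowest) / (highest - lowest) for pair, v in raw.items()}
```

A pair with zero DTW distance and a Pearson coefficient of 1 receives the largest raw value, so it maps to 1 after normalization, while the least similar pair maps to 0.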
In another embodiment of the invention, the cosine similarity and the structural similarity coefficient between the dimension reduction data sequence of each whiskey sample and that of each other whiskey sample are calculated, and the normalized value of their product is used as the corresponding clustering similarity degree. When the cosine similarity and the structural similarity coefficient are calculated, the two dimension reduction data sequences are brought to the same length by interpolation or zero-padding. The calculation methods of the cosine similarity and the structural similarity coefficient are prior art well known to those skilled in the art and are not further described here.
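The zero-padding variant of the cosine similarity can be illustrated with the sketch below. It covers only the cosine part of this alternative embodiment; the structural similarity coefficient is not shown. The function names are illustrative, and returning 0.0 for an all-zero sequence is an assumption.

```python
import math


def pad_with_zeros(sequence, length):
    """Extend a sequence with trailing zeros up to the requested length."""
    return list(sequence) + [0.0] * (length - len(sequence))


def cosine_similarity(x, y):
    """Cosine similarity after zero-padding the shorter sequence."""
    length = max(len(x), len(y))
    x = pad_with_zeros(x, length)
    y = pad_with_zeros(y, length)
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumption: an all-zero sequence is treated as dissimilar.
        return 0.0
    return dot / (norm_x * norm_y)
```

Because the sequences are sorted in descending order, trailing zeros extend the shorter sequence at its small-value end.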
Further, the embodiment of the invention constructs a clustering model for tracing the origin of whiskey: according to all clustering similarity degrees, cluster analysis is performed on all whiskey samples through a k-means clustering algorithm to obtain the whiskey sample set in which each whiskey sample is located, and the whiskey origin tracing model is obtained from these whiskey sample sets. That is, the whiskey origin tracing model is the clustering result obtained by taking the whiskey samples as sample points and the clustering similarity degree as the distance. In the model, each whiskey sample has a corresponding cluster set, so a newly added whiskey can be clustered, and its origin traced, by calculating the clustering similarity degree between the newly added whiskey and each whiskey sample. It should be noted that the number of whiskey sample sets in the whiskey origin tracing model is determined by the k value of the k-means clustering algorithm; in the embodiment of the invention, k is set to the number of distinct wine producing places among all whiskey samples. It should be noted that the k-means clustering algorithm is well known in the art and is not further defined or described here.
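The sketch below illustrates clustering from pairwise values only. Standard k-means needs coordinates in order to compute centroids, so this illustration swaps in a k-medoids style loop, where each cluster is represented by one of its own samples; this is a substitution made for the example and not the algorithm named in the text. The distance is assumed to be 1 minus the normalized clustering similarity degree, and the function names and the `initial_medoids` parameter are hypothetical.

```python
def assign_to_medoids(distance, medoids):
    """Label each sample with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: distance[p][medoids[c]])
        for p in range(len(distance))
    ]


def k_medoids(distance, k, initial_medoids=None, max_iter=100):
    """Cluster samples from a pairwise distance matrix (k-medoids style)."""
    n = len(distance)
    medoids = list(range(k)) if initial_medoids is None else list(initial_medoids)
    for _ in range(max_iter):
        labels = assign_to_medoids(distance, medoids)
        updated = []
        for c in range(k):
            members = [p for p in range(n) if labels[p] == c]
            if not members:
                updated.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            # New medoid: member with the smallest total distance to the rest.
            updated.append(
                min(members, key=lambda m: sum(distance[m][q] for q in members))
            )
        if updated == medoids:
            break
        medoids = updated
    return assign_to_medoids(distance, medoids), medoids


# Hypothetical similarity degrees for four samples from two origins.
similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
distance = [[1.0 - s for s in row] for row in similarity]
labels, medoids = k_medoids(distance, k=2)
```

With k equal to the number of distinct origins (2 in this toy example), the first two samples end up in one set and the last two in the other.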
Finally, the embodiment of the invention performs whiskey production place tracing according to the whiskey production place tracing model. Preferably, the method for tracing the whisky wine producing place according to the whisky wine producing place tracing model comprises the following steps:
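The content data values of the inorganic elements of whiskey samples from the same wine producing place are consistent or similar, and the whiskey samples in each whiskey sample set are similar to one another, so each whiskey sample set corresponds, as a whole, to one wine producing place. The wine producing place with the most whiskey samples in each whiskey sample set is therefore taken as the tracing wine producing place of that whiskey sample set.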
Because the whiskey origin tracing model is a clustering result, tracing through the model requires a further cluster assignment based on that result. The embodiment of the invention therefore calculates, on the basis of the whiskey origin tracing model, the clustering similarity degree between the whiskey to be traced and each whiskey sample, and clusters the whiskey to be traced into the corresponding whiskey sample set through the k-means clustering algorithm according to the clustering similarity degree. That is, each whiskey to be traced is assigned to the whiskey sample set to which it is most similar, and the tracing wine producing place of that whiskey sample set is taken as the tracing wine producing place of the whiskey to be traced.
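The tracing step can be illustrated with the sketch below, which labels each whiskey sample set with its most frequent origin and assigns a new whiskey to the set it resembles most. "Most similar set" is interpreted here as the set with the highest mean clustering similarity degree to the new whiskey, which is an assumption; the function names and the origin names in the example are hypothetical.

```python
from collections import Counter


def set_origins(labels, origins):
    """Map each whiskey sample set to the origin of most of its samples."""
    grouped = {}
    for label, origin in zip(labels, origins):
        grouped.setdefault(label, []).append(origin)
    return {
        label: Counter(items).most_common(1)[0][0]
        for label, items in grouped.items()
    }


def trace_origin(similarity_to_samples, labels, origins):
    """Origin of the set with the highest mean similarity to the new whiskey."""
    totals, counts = {}, {}
    for value, label in zip(similarity_to_samples, labels):
        totals[label] = totals.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    # Assumption: compare sets by their mean similarity to the new whiskey.
    best = max(totals, key=lambda label: totals[label] / counts[label])
    return set_origins(labels, origins)[best]


# Hypothetical example: five reference samples in two sample sets.
labels = [0, 0, 0, 1, 1]
origins = ["Scotland", "Scotland", "Ireland", "Japan", "Japan"]
new_whiskey_similarity = [0.2, 0.1, 0.3, 0.9, 0.8]
traced = trace_origin(new_whiskey_similarity, labels, origins)
```

In this example the new whiskey is far more similar to the second set, whose majority origin is "Japan", so that origin is returned.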
In summary, the invention performs similarity analysis on the distribution of the content data values of the inorganic elements of the whiskey samples to obtain the content distribution similarity among the inorganic elements; divides all inorganic elements into inorganic element similarity sets according to the content distribution similarity to reduce the data dimension, and obtains the dimension reduction data values according to the content distribution similarity and the content data values in each inorganic element similarity set; and performs cluster analysis on all whiskey samples according to how similarly the dimension reduction data values are distributed, so that the cluster analysis is more efficient and its result more accurate. A more accurate whiskey origin tracing model is thereby obtained, and whiskey origin tracing based on the model is more effective.
The invention also provides a whiskey wine producing place tracing system based on big data, which comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein the processor realizes any step of the whiskey wine producing place tracing method based on big data when executing the computer program.
It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.

Claims (9)

1. The whisky origin tracing method based on big data is characterized by comprising the following steps of:
acquiring the wine producing place of each whiskey sample and the content data value of each inorganic element in the whiskey sample;
In each whiskey sample, according to how similarly the content data value of each inorganic element and the content data value of each other inorganic element are distributed among all content data values, obtaining the content distribution similarity between each inorganic element and each other inorganic element;
Dividing all inorganic elements into at least one inorganic element similarity set according to all content distribution similarities in each whiskey sample; obtaining a dimensionality reduction data value of each inorganic element similar set according to the content distribution similarity among the inorganic elements in each inorganic element similar set and the distribution condition of the corresponding content data value;
according to the condition that the distribution of the reduced data values between each whiskey sample and the rest whiskey samples is similar, carrying out cluster analysis on all whiskey samples to obtain a whiskey production place tracing model; performing whiskey production place tracing according to the whiskey production place tracing model by combining the wine production place of each whiskey sample;
The method for acquiring the content distribution similarity is expressed by a formula (the formula itself is not preserved in this text) whose terms are defined as follows:
wherein $S_{i,a,b}$ is the content distribution similarity between the $a$-th inorganic element and the $b$-th inorganic element corresponding to the $i$-th whiskey sample; $x_{i,a}$ is the content data value of the $a$-th inorganic element corresponding to the $i$-th whiskey sample; $x_{i,b}$ is the content data value of the $b$-th inorganic element corresponding to the $i$-th whiskey sample; $N_{i,a}$ is the number of inorganic element species other than the $a$-th inorganic element corresponding to the $i$-th whiskey sample; $N_{i,b}$ is the number of inorganic element species other than the $b$-th inorganic element corresponding to the $i$-th whiskey sample; $x_{i,a,c}$ is the content data value of the $c$-th inorganic element other than the $a$-th inorganic element corresponding to the $i$-th whiskey sample; $x_{i,b,d}$ is the content data value of the $d$-th inorganic element other than the $b$-th inorganic element corresponding to the $i$-th whiskey sample; $\sigma_i$ is the standard deviation of the content data values of all kinds of inorganic elements corresponding to the $i$-th whiskey sample; $\exp(\cdot)$ is the exponential function with the natural constant as its base; and $|\cdot|$ is the absolute value sign.
2. The whisky origin tracing method based on big data according to claim 1, wherein the method for acquiring the inorganic element similarity set comprises the following steps:
taking each inorganic element corresponding to each whiskey sample as each clustering data point; randomly selecting clustered data points as seed points, and carrying out region growth on all clustered data points of each whiskey sample according to each seed point to obtain at least one inorganic element similarity set; the region growing rule of the region growing is to take a preset number of clustering data points with the maximum content distribution similarity among inorganic elements corresponding to the seed points as new seed points; the growth stopping condition of the region growth is that all clustered data points are in the corresponding inorganic element similarity set.
3. The whisky origin tracing method based on big data according to claim 1, wherein the method for acquiring the dimension reduction data value comprises the following steps:
For any one of the inorganic element similarity sets:
Calculating the average value of the content data values of all kinds of inorganic elements; the corresponding content data value and the inorganic element type closest to the average value of the content data value are used as the central inorganic element of the inorganic element similar set; taking other inorganic elements except the central inorganic element as dimension-reducing inorganic elements; taking the content distribution similarity between each dimension-reducing inorganic element and the central inorganic element as the dimension-reducing similarity of each dimension-reducing inorganic element;
Calculating the dimension reduction similarity accumulated values of all dimension reduction inorganic elements, and taking the ratio of the dimension reduction similarity of each dimension reduction inorganic element to the dimension reduction similarity accumulated value as the dimension reduction weight of each dimension reduction inorganic element;
Obtaining a weighted data value of each dimension-reducing inorganic element according to the content data value of each dimension-reducing inorganic element and the dimension-reducing weight;
and taking the accumulated value of the weighted data values of all the dimension-reducing inorganic elements as the dimension-reducing data value of the inorganic element similarity set.
4. The whisky origin tracing method based on big data according to claim 1, wherein the whisky origin tracing model obtaining method comprises the following steps:
Arranging the dimensionality reduction data values of all the inorganic element similar sets corresponding to each whiskey sample in a sequence from large to small to obtain a dimensionality reduction data sequence of each whiskey sample; calculating the DTW distance between the reduced-dimension data sequence of each whiskey sample and the reduced-dimension data sequences of the rest whiskey samples through a dynamic time warping algorithm; calculating Pearson correlation coefficients between the reduced-dimension data sequences of each whiskey sample and the reduced-dimension data sequences of the rest whiskey samples;
Obtaining the clustering similarity degree between each whiskey sample and each other whiskey samples according to the DTW distance and the Pearson correlation coefficient; the DTW distance and the clustering similarity degree are in a negative correlation relationship, and the pearson correlation coefficient and the clustering similarity degree are in a positive correlation relationship;
Performing cluster analysis on all whiskey samples through a k-means clustering algorithm according to all the cluster similarity degrees to obtain whiskey sample sets where each whiskey sample is located; and obtaining a whiskey origin tracing model according to the whiskey sample set of each whiskey sample.
5. The method for tracing the whisky place of origin based on big data according to claim 4, wherein the method for tracing the whisky place of origin by combining the place of origin of each whisky sample according to the whisky place of origin tracing model comprises the following steps:
Taking the wine producing place with the most whisky samples in each whisky sample set as the tracing wine producing place of each whisky sample set;
On the basis of the whiskey production place tracing model, calculating the clustering similarity degree between the whiskey to be traced and each whiskey sample; clustering the whiskey to be traced into a corresponding whiskey sample set through a k-means clustering algorithm according to the clustering similarity degree; and taking the tracing wine producing place corresponding to the whiskey sample set where the whiskey to be traced is located as the tracing wine producing place of the whiskey to be traced.
6. A whiskey origin tracing method based on big data according to claim 3, wherein said weighted data value obtaining method comprises:
and taking the product of the dimension reduction weight of each dimension reduction inorganic element and the corresponding content data value as the weighted data value of each dimension reduction inorganic element.
7. The whisky origin tracing method based on big data according to claim 4, wherein the clustering similarity obtaining method comprises the following steps:
And taking a normalized value of the product of the negative correlation mapping value of the DTW distance and the pearson correlation coefficient as the clustering similarity degree between each whiskey sample and each other whiskey sample.
8. The whiskey origin tracing method based on big data according to claim 2, wherein said preset number is set to 3.
9. A whiskey origin tracing system based on big data, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1-8 when executing the computer program.
Application number: CN202410204907.3A; priority date: 2024-02-26; filing date: 2024-02-26; title: Whisky origin tracing method and system based on big data; status: Active; publication: CN117807461B (en)


Publications (2)

Publication Number Publication Date
CN117807461A CN117807461A (en) 2024-04-02
CN117807461B true CN117807461B (en) 2024-04-26

Family

ID=90431682



Also Published As

Publication number Publication date
CN117807461A (en) 2024-04-02

Similar Documents

Publication Publication Date Title
CN109376772B (en) Power load combination prediction method based on neural network model
CN111899882B (en) Method and system for predicting cancer
CN104572886B (en) Financial time series similarity query method based on K-line chart representation
CN112434662B (en) Tea leaf scab automatic identification algorithm based on multi-scale convolutional neural network
CN116414076B (en) Intelligent monitoring system for recovered alcohol production data
US20130304783A1 (en) Computer-implemented method for analyzing multivariate data
CN112270596A (en) Risk control system and method based on user portrait construction
CN115800245A (en) Short-term load prediction method based on SARIMA-random forest combined model
CN112036476A (en) Data feature selection method and device based on two-classification service and computer equipment
CN115545790B (en) Price data prediction method, price data prediction device, electronic equipment and storage medium
CN114003636A (en) Multivariate time sequence similarity searching method based on variable correlation
CN109597901B (en) Data analysis method based on biological data
CN117807461B (en) Whisky origin tracing method and system based on big data
Xie et al. A two-stage method for improving discrimination and variable selection in DEA models
CN116975535A (en) Multi-parameter data analysis method based on soil environment monitoring data
CN115935081A (en) Expert recommendation method based on user portrait and content collaborative filtering
CN112465054B (en) FCN-based multivariate time series data classification method
CN113792749A (en) Time series data anomaly detection method, device, equipment and storage medium
CN111488520B (en) Crop planting type recommendation information processing device, method and storage medium
CN112085522B (en) Construction cost data processing method, system, device and medium for engineering project
CN115206437A (en) Intelligent screening system for mitochondrial effect molecules and construction method and application thereof
CN114757495A (en) Membership value quantitative evaluation method based on logistic regression
CN117828002B (en) Intelligent management method and system for land resource information data
CN111310842A (en) Density self-adaptive rapid clustering method
Ahmad et al. Feature weighing for efficient clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant