Disclosure of Invention
In view of the above, it is necessary to provide a method and a system for comparing sensory quality of tea leaves based on an evaluation comment, which can improve accuracy and stability of a result of comparison of the quality of tea leaves.
A method for comparing sensory quality of tea based on an assessment comment comprises the following steps:
presetting a comment word and corresponding assignment thereof, wherein the comment word comprises primitive terms and degree adverbs, the corresponding assignment of the primitive terms comprises weight coefficients, and the corresponding assignment of the degree adverbs comprises scale values;
respectively setting a term weight vector and a scale vector of the batch according to the weight coefficient of the primitive term and the scale value of the degree adverb contained in the comment of each sample of the batch;
acquiring a Hadamard product of the term weight vector and the scale vector as a comment vector;
extracting, from the comment vectors of the samples in the batch and according to a preset method, the maximum value and the minimum value of each dimension across the comment vectors to serve as the maximum value comment vector and the minimum value comment vector; calculating the maximum similarity and the minimum similarity according to a preset similarity formula from each comment vector, the maximum value comment vector and the minimum value comment vector;
calculating a sample score for each sample based on the maximum similarity and the minimum similarity;
and judging whether the sensory quality of each sample is superior or inferior according to the sample score.
In one embodiment, the corresponding assignment of the primitive term includes, in addition to the weight coefficient, a commendatory and derogatory coefficient;
the step of respectively setting the term weight vector and the scale vector of the batch according to the weight coefficient of the primitive term and the scale value of the degree adverb contained in the comment of each sample of the batch further comprises:
determining the commendatory and derogatory coefficient vector of the batch according to the commendatory and derogatory coefficients of the primitive terms contained in the comments of the samples of the batch;
the step of acquiring the Hadamard product of the term weight vector and the scale vector as the comment vector specifically includes acquiring the Hadamard product of the term weight vector, the scale vector and the commendatory and derogatory coefficient vector as the comment vector.
In one embodiment, the step of presetting the comment words and corresponding assignments thereof specifically includes:
defining comment words and corresponding assignments according to preset sensory evaluation categories; the sensory evaluation category comprises at least one of taste, aroma, dry tea appearance, liquor color and leaf bottom.
In one embodiment, the step of calculating the maximum similarity and the minimum similarity according to a preset similarity formula includes:
calculating the maximum similarity and the minimum similarity according to a cosine similarity formula and/or a Jaccard coefficient formula.
In one embodiment, after the step of judging the sensory quality of each sample, the method further comprises:
calculating a feature vector of each sample based on the maximum similarity and the minimum similarity;
clustering the feature vectors of the samples into groups according to a preset clustering method and parameters;
grading the sensory quality of the samples of the batch according to the magnitude of the average of the sample scores within each group.
In one embodiment, after the step of judging the sensory quality of each sample, the method further comprises:
sorting the samples by sensory quality according to the sample scores.
Accordingly, a system for comparing sensory quality of tea leaves based on an evaluation comment, comprising:
the thesaurus unit is used for presetting a comment word and corresponding assignment thereof, wherein the comment word comprises primitive terms and degree adverbs, the corresponding assignment of the primitive terms comprises weight coefficients, and the corresponding assignment of the degree adverbs comprises scale values;
the vector conversion unit is used for respectively setting a term weight vector and a scale vector of the batch according to the weight coefficient of the primitive term and the scale value of the degree adverb contained in the comment of each sample of the batch;
the comment vector unit is used for acquiring a Hadamard product of the term weight vector and the scale vector as a comment vector;
the similarity calculation unit is used for extracting, from the comment vectors of the samples in the batch and according to a preset method, the maximum value and the minimum value of each dimension across the comment vectors to serve as the maximum value comment vector and the minimum value comment vector; and for calculating the maximum similarity and the minimum similarity according to a preset similarity formula from each comment vector, the maximum value comment vector and the minimum value comment vector;
a score calculating unit for calculating a sample score of each sample based on the maximum similarity and the minimum similarity;
and the result judging unit is used for judging whether the sensory quality of each sample is superior or inferior according to the sample score.
Accordingly, in one embodiment, the thesaurus unit includes: an assignment setting subunit;
the assignment setting subunit is configured to preset a comment word and a corresponding assignment thereof, where the comment word includes a primitive term and a degree adverb, the corresponding assignment of the primitive term includes a weight coefficient and a commendatory and derogatory coefficient, and the corresponding assignment of the degree adverb includes a scale value;
the vector conversion unit is further used for setting the commendatory and derogatory coefficient vector of the batch according to the commendatory and derogatory coefficients of the primitive terms contained in the comments of the samples of the batch;
the comment vector unit is further used for acquiring a Hadamard product of the term weight vector, the scale vector and the commendatory and derogatory coefficient vector as a comment vector.
Accordingly, in one embodiment, the thesaurus unit includes: a category setting subunit;
the category setting subunit is used for defining comment words and corresponding assignments thereof according to preset sensory evaluation categories; the sensory evaluation category comprises at least one of taste, aroma, dry tea appearance, liquor color and leaf bottom.
Accordingly, in one embodiment, the similarity calculation unit includes: a formula setting subunit, which is used for calculating the maximum similarity and the minimum similarity according to a cosine similarity formula and/or a Jaccard coefficient formula;
the result judging unit includes: a grading subunit and/or a sorting subunit; wherein the grading subunit is used for calculating a feature vector of each sample based on the maximum similarity and the minimum similarity, clustering the feature vectors of the samples into groups according to a preset clustering method and parameters, and grading the sensory quality of the samples of the batch according to the magnitude of the average of the sample scores within each group; and the sorting subunit is used for sorting the samples by sensory quality according to the sample scores.
The invention has the following beneficial effects:
The method realizes quality comparison based on tea evaluation comments and can ensure that the comparison result is consistent with the textual comments. The comment vector in the method restores the quality description information in the evaluation comment. The arrangement of primitive terms and degree adverbs is consistent with the "sensory characteristic + sensory intensity" mode of quality description used in tea sensory evaluation comments. The weight coefficients accommodate the tea evaluation practice in which different sensory quality characteristics influence the overall quality to different degrees. The range of the degree adverb scale values matches the range of sensory intensity described in the textual comments. Therefore, the comment vector composed of the weight coefficients and the scale values largely avoids the loss of information in the comment and ensures that most of the quality description information in the comment participates directly in the comparison between samples. Comparing the comment vectors mines the quality difference information among the samples. The degree to which each sample is superior or inferior is reflected by comparing its comment vector with the extreme-value comment vectors, namely the maximum value comment vector and the minimum value comment vector. Thus, the sample score calculated from the maximum similarity and the minimum similarity of each sample can effectively reflect the differences between the original evaluation comments.
Detailed Description
In order to facilitate an understanding of the invention, specific embodiments thereof will be described in more detail below with reference to the accompanying drawings.
Unless specifically stated or otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In the case of combining the technical solutions of the present invention in a realistic scenario, all technical and scientific terms used herein may also have meanings corresponding to the purpose of achieving the technical solutions of the present invention.
As used herein, unless otherwise specified or defined, "first" and "second" … are used merely for name differentiation and do not denote any particular quantity or order.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items, unless specified or otherwise defined.
Fig. 1 is a flowchart of a method for comparing sensory quality of tea leaves based on an assessment comment according to the present invention, including:
S101: presetting a comment word and corresponding assignment thereof, wherein the comment word comprises primitive terms and degree adverbs, the corresponding assignment of the primitive terms comprises weight coefficients, and the corresponding assignment of the degree adverbs comprises scale values;
S102: respectively setting a term weight vector and a scale vector of the batch according to the weight coefficient of the primitive term and the scale value of the degree adverb contained in the comment of each sample of the batch;
S103: acquiring a Hadamard product of the term weight vector and the scale vector as a comment vector;
S104: extracting the extreme-value comment vectors from the comment vectors of all samples in the batch; calculating the maximum similarity and the minimum similarity from each comment vector and the extreme-value comment vectors;
specifically, from the comment vectors of all samples in the batch, the maximum value and the minimum value of each dimension across the comment vectors are extracted according to a preset method to serve as the maximum value comment vector and the minimum value comment vector; the maximum similarity and the minimum similarity are then calculated according to a preset similarity formula from each comment vector, the maximum value comment vector and the minimum value comment vector;
S105: calculating a sample score for each sample based on the maximum similarity and the minimum similarity;
S106: judging whether the sensory quality of each sample is superior or inferior according to the sample score.
The method realizes quality comparison based on tea evaluation comments and can ensure that the comparison result is consistent with the textual comments. The comment vector in the method restores the quality description information in the evaluation comment. The arrangement of primitive terms and degree adverbs is consistent with the "sensory characteristic + sensory intensity" mode of quality description used in tea sensory evaluation comments. The weight coefficients accommodate the tea evaluation practice in which different sensory quality characteristics influence the overall quality to different degrees. The range of the degree adverb scale values matches the range of sensory intensity described in the textual comments. The commendatory and derogatory coefficients reflect the appraisers' likes and dislikes regarding the sensory characteristics. The comment vector composed of the weight coefficients, the scale values and the commendatory and derogatory coefficients thus largely avoids the loss of information in the evaluation comment and ensures that most of the quality description information in the comment participates directly in the comparison between samples. Comparing the comment vectors mines the quality difference information among the samples. The degree to which each sample is superior or inferior is reflected by comparing its comment vector with the extreme-value comment vectors, namely the maximum value comment vector and the minimum value comment vector. The maximum similarity and the minimum similarity of each sample are measured by the cosine similarity and/or the Jaccard coefficient, which take into account the differences in direction and effective length of the comment vectors. Thus, the sample score calculated from the maximum similarity and the minimum similarity of each sample can effectively reflect the differences between the original evaluation comments.
Fig. 2 is a flowchart illustrating a method for comparing sensory qualities of tea leaves according to a first embodiment of the present invention.
S201: presetting a comment word and corresponding assignment thereof, wherein the comment word comprises primitive terms and degree adverbs, the corresponding assignment of the primitive terms comprises weight coefficients and commendatory and derogatory coefficients, and the corresponding assignment of the degree adverbs comprises scale values;
S202: respectively setting a term weight vector and a scale vector of the batch according to the weight coefficient of the primitive term and the scale value of the degree adverb contained in the comment of each sample of the batch;
S203: determining the commendatory and derogatory coefficient vector of the batch according to the commendatory and derogatory coefficients of the primitive terms contained in the comments of the samples of the batch;
S204: acquiring the Hadamard product of the term weight vector, the scale vector and the commendatory and derogatory coefficient vector as the comment vector;
S205: extracting the extreme-value comment vectors from the comment vectors of all samples in the batch; calculating the maximum similarity and the minimum similarity from each comment vector and the extreme-value comment vectors;
specifically, from the comment vectors of all samples in the batch, the maximum value and the minimum value of each dimension across the comment vectors are extracted according to a preset method to serve as the maximum value comment vector and the minimum value comment vector; the maximum similarity and the minimum similarity are then calculated according to a preset similarity formula from each comment vector, the maximum value comment vector and the minimum value comment vector;
S206: calculating a sample score for each sample based on the maximum similarity and the minimum similarity;
S207: judging whether the sensory quality of each sample is superior or inferior according to the sample score;
S208: calculating a feature vector of each sample based on the maximum similarity and the minimum similarity; clustering the feature vectors of the samples into groups according to a preset clustering method and parameters; grading the sensory quality of the samples of the batch according to the magnitude of the average of the sample scores within each group.
In this embodiment, the number of samples whose sensory quality is to be compared is determined, all of these samples are defined as one batch, and the primitive terms, each corresponding to a single sensory quality, are extracted from the evaluation comments of all samples in the batch. If an evaluation comment contains a combination term corresponding to several sensory qualities, the combination term is split into several primitive terms each corresponding to only one sensory quality, and the degree adverb modifying the combination term is retained before each of the resulting primitive terms. For evaluation comments obtained by a standardized evaluation method, the primitive terms can be selected, according to the type of tea sample and the object described, by referring to the national standard of the People's Republic of China "Tea Sensory Evaluation Terms" (GB/T 14487). GB/T 14487 includes most of the terms commonly used in standardized evaluation, covers all tea types, and is a complete primitive term library. For a term not included in GB/T 14487, it should be determined whether its meaning is the same as or similar to that of a term recorded in GB/T 14487; if so, the term is regarded as a primitive term; if not, the term is excluded. For evaluation comments obtained without a standardized evaluation method, the primitive terms are selected according to how the appraisers who evaluated the batch of samples distinguish and summarize each sensory quality.
The extracted primitive terms are sorted, for example from high to low by their corresponding weight coefficients.
It should be added that, in step S201, the corresponding assignment of the primitive terms includes weight coefficients. For example, if one primitive term in a batch of tea evaluation comments contributes more to the determination of the overall quality of the tea than the other primitive terms, the weight coefficient of that term may be set to a different value from the others.
In step S201, the corresponding assignment of the primitive terms further includes the commendatory and derogatory coefficients. Different primitive terms correlate differently with the overall quality of the tea: some correlate positively, some negatively, and others not at all. The commendatory and derogatory coefficients reflect these differences in correlation between the primitive terms and the overall quality of the tea. In this embodiment, step S203 defines the commendatory and derogatory coefficient vector of the primitive terms according to whether each primitive term is commendatory or derogatory. A "commendatory primitive term" denotes a term describing a sensory quality advantage, a "derogatory primitive term" denotes a term describing a sensory quality disadvantage, and a "neutral primitive term" denotes a term describing a sensory quality feature with neither a commendatory nor a derogatory meaning. The commendatory and derogatory coefficients reflect the appraisers' likes and dislikes regarding the sensory characteristics. The commendatory and derogatory coefficient vector of the primitive terms is as follows:
$$s := [s_1 \;\dots\; s_i \;\dots\; s_n]^T \quad (1 \le i \le n) \tag{1}$$
where $s$ includes n elements, each element being the commendatory and derogatory coefficient of one primitive term, in the same order as the ordering of the primitive terms. Each element $s_i \in \{-1, 0, 1\}$: the element corresponding to a commendatory primitive term is set to 1, the element corresponding to a derogatory primitive term is set to -1, and the element corresponding to a neutral primitive term is set to 0.
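As an illustrative sketch of how such a vector could be assembled, the following maps each primitive term to 1, 0 or -1 in the fixed term ordering. The term names and their labels are hypothetical examples, not entries taken from the term library.

```python
# Illustrative sketch only: the term names and their polarity labels are hypothetical.
POLARITY_VALUE = {"commendatory": 1, "neutral": 0, "derogatory": -1}


def polarity_vector(ordered_terms, polarity_of):
    """Return s = [s_1, ..., s_n] in the fixed ordering of the primitive terms."""
    return [POLARITY_VALUE[polarity_of[term]] for term in ordered_terms]


ordered_terms = ["mellow", "brown", "coarse"]  # hypothetical primitive terms
polarity_of = {"mellow": "commendatory", "brown": "neutral", "coarse": "derogatory"}
s = polarity_vector(ordered_terms, polarity_of)
print(s)  # [1, 0, -1]
```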
The term weight vector is defined according to the importance of each sensory quality in the comparison of samples. The weight of each primitive term is determined by the evaluation factor to which the sensory quality belongs, the evaluation factors comprising five aspects: appearance, liquor color, aroma, taste and leaf bottom. Primitive terms belonging to the same evaluation factor may share the same weight value, or their weights may be set individually according to the importance of each term in the specific quality comparison. The primitive term weight vector is as follows:
$$w := [w_1 \;\dots\; w_i \;\dots\; w_n]^T \quad (1 \le i \le n) \tag{2}$$
where $w$ includes n elements, each element being the weight value of one primitive term, in the same order as the ordering of the primitive terms. Each element $w_i \in (0, 1]$; its value may be taken from the coefficient values in the "evaluation coefficients of quality factors of various teas" in the national standard of the People's Republic of China "Tea Sensory Evaluation Method" (GB/T 23776), each coefficient value being divided by 100 and the square root then being taken to obtain the final value.
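The conversion from an evaluation coefficient to a weight can be sketched as follows. The function name and the example coefficients are assumptions chosen for illustration; the placeholder values are not quoted from GB/T 23776.

```python
import math


def weight_from_coefficient(coefficient):
    """Map an evaluation coefficient in (0, 100] to a weight w_i in (0, 1]."""
    if not 0 < coefficient <= 100:
        raise ValueError("coefficient must lie in (0, 100]")
    return math.sqrt(coefficient / 100)


# Placeholder coefficients for illustration; they are NOT taken from GB/T 23776.
example_coefficients = {"factor_a": 25, "factor_b": 100}
weights = {name: weight_from_coefficient(c) for name, c in example_coefficients.items()}
print(weights)
```

Under this mapping a coefficient of 25 gives a weight of 0.5 and a coefficient of 100 gives a weight of 1, so every weight falls in the interval (0, 1].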
It should be added that the weight coefficient values may also be adjusted according to the importance of the primitive terms in the overall quality evaluation of the tea. For example, if a quality feature that contributes significantly to the overall quality is specified in a particular evaluation process, the weight coefficient of the primitive term corresponding to that quality feature may be increased.
And constructing a table by taking the samples as rows and the primitive terms as columns, and counting the distribution of the primitive terms in each sample. The primitive term columns in the table are consistent with the specified ordering. Corresponding to each primitive term, extracting a phrase containing the term from the evaluation comment of each sample, and filling the phrase into the constructed table in a one-to-one correspondence manner. The extracted phrases reflect the sensory qualities referred to by the corresponding primitive terms and include degree adverbs that modify the intensity of the sensory qualities. If no corresponding primitive term is present in the sample, it is noted as "-" in the table.
Based on the statistics of the distribution of primitive terms in the batch of samples, the types of phrases corresponding to each primitive term are summarized according to the different degree adverbs modifying that term, and are arranged in order of the strength of the degree adverb from weak to strong.
For evaluation comments obtained by a standardized evaluation method, the phrases can be sorted based on the correspondence between degree adverbs and sensory intensity in Table t1; degree adverbs not listed in Table t1 should be ranked according to the sensory intensity they actually express in the comment. For evaluation comments obtained without a standardized evaluation method, the correspondence between degree adverbs and sensory intensity is determined according to the actual perception of the appraiser.
Table t1. Correspondence between degree adverbs and sensory intensity
The degree adverbs are then scaled equidistantly. Following the ordering of the degree adverbs of each primitive term, one scale level is assigned to each degree from weak (or absent) to strong: the level denoting the weakest intensity, or the absence of the primitive term, is assigned 0, and each subsequent level is assigned a value 1 greater than the previous level, until every degree adverb has a corresponding scale value.
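The equidistant scaling can be sketched as a simple enumeration over the ordered degree levels. The adverbs listed here are hypothetical and merely stand in for the ordering established for one primitive term; "-" is used for a sample in which the term does not appear.

```python
def assign_scale_values(degrees_weak_to_strong):
    """Assign 0 to the weakest (or absent) level and add 1 for each stronger level."""
    return {degree: value for value, degree in enumerate(degrees_weak_to_strong)}


# Hypothetical ordering for one primitive term; "-" marks a sample in which the term is absent.
degrees = ["-", "slightly", "fairly", "very"]
scale = assign_scale_values(degrees)
print(scale)  # {'-': 0, 'slightly': 1, 'fairly': 2, 'very': 3}
```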
Step S204 substitutes the scale values corresponding to the degree adverbs of the primitive terms to form the scale vectors. Based on the correspondence between degree adverbs and scale values, the scale values are substituted into the primitive term distribution table. The contents of the table form a scale matrix in which each row corresponds to the scale vector of one sample. The scale matrix and scale vector are as follows:
$(1 \le j \le m,\ 1 \le i \le n)$
In the formula, the scale matrix $D$ is a matrix of m rows and n columns, each row corresponding to a sample and each column corresponding to a primitive term. Each row of the scale matrix $D$ is the scale vector of one sample, denoted by $d^{(j)}$; each element of the scale matrix is denoted by $d_{j,i}$.
First, Min-Max normalization is performed on each column of the scale matrix $D$, forming a normalized scale matrix $D'$. Each row of the normalized scale matrix is then a normalized scale vector, denoted by $d'^{(j)}$. The process is given by formulas (4) and (5).
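As an illustrative sketch of this step, the following assumes the usual Min-Max definition, in which each entry becomes (value - column minimum) / (column maximum - column minimum). Mapping a constant column to 0 is an additional assumption made here to avoid division by zero; it is not stated in the text.

```python
def min_max_normalize_columns(D):
    """Normalize each column of the m x n scale matrix D to the range [0, 1]."""
    m, n = len(D), len(D[0])
    normalized = [[0.0] * n for _ in range(m)]
    for i in range(n):
        column = [D[j][i] for j in range(m)]
        low, high = min(column), max(column)
        span = high - low
        for j in range(m):
            # Assumption: a constant column (span == 0) is mapped to 0.0.
            normalized[j][i] = (D[j][i] - low) / span if span else 0.0
    return normalized


D = [[0, 2], [1, 2], [2, 2]]  # 3 samples (rows) x 2 primitive terms (columns)
D_norm = min_max_normalize_columns(D)
print(D_norm)  # [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
```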
Second, the normalized scale vector $d'^{(j)}$ (each row of the normalized scale matrix $D'$), the term weight vector $w$ and the commendatory and derogatory coefficient vector $s$ are multiplied element by element, and the resulting Hadamard products form the comment matrix, denoted by $X$. Each row of the comment matrix $X$ is the comment vector of one sample, denoted by $x^{(j)}$; each element of the comment matrix is denoted by $x_{j,i}$. The process is shown in formulas (6) and (7):
$$X = D' \odot (s \cdot J_{1,m})^T \odot (w \cdot J_{1,m})^T \tag{6}$$
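Reading $J_{1,m}$ as a 1 x m matrix of ones (an assumption about the notation), formula (6) repeats $s$ and $w$ across all m rows, so each element is the product of the normalized scale value, the commendatory and derogatory coefficient and the weight for that sample and term. A minimal sketch with made-up inputs:

```python
def comment_matrix(D_norm, s, w):
    """Element-wise product of each normalized scale row with s and w."""
    return [
        [d * s_i * w_i for d, s_i, w_i in zip(row, s, w)]
        for row in D_norm
    ]


# Made-up inputs: 2 samples, 3 primitive terms.
D_norm = [[1.0, 0.5, 0.0], [0.0, 1.0, 1.0]]
s = [1, 0, -1]        # commendatory, neutral, derogatory
w = [0.5, 0.5, 1.0]   # illustrative weights in (0, 1]
X = comment_matrix(D_norm, s, w)
print(X)
```

In this example the neutral term contributes 0 to every comment vector, and the derogatory term contributes a negative value wherever it is present.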
Finally, steps S205 to S208 perform clustering and quality grade determination based on the comment vectors, as shown in the following steps S1 to S3.
S1: extracting the extreme-value comment vectors from the weighted comment matrix.
The maximum value and the minimum value of each dimension in the comment matrix $X$ are extracted as the maximum value comment vector and the minimum value comment vector, denoted by $x_{\max}$ and $x_{\min}$ respectively. The process is given by formulas (8) and (9).
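A minimal sketch of this extraction, taking the maximum and minimum of each column of the comment matrix; the matrix values are made up for illustration.

```python
def extreme_comment_vectors(X):
    """Return (x_max, x_min): the per-dimension maximum and minimum over all samples."""
    columns = list(zip(*X))  # transpose rows (samples) into columns (dimensions)
    x_max = [max(column) for column in columns]
    x_min = [min(column) for column in columns]
    return x_max, x_min


X = [[0.5, 0.0, -1.0], [0.2, 0.3, 0.0]]  # made-up comment matrix, 2 samples x 3 terms
x_max, x_min = extreme_comment_vectors(X)
print(x_max)  # [0.5, 0.3, 0.0]
print(x_min)  # [0.2, 0.0, -1.0]
```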
S2-1: calculating the similarity according to the cosine similarity.
The cosine similarities between the comment vector $x^{(j)}$ of each sample and the maximum value and minimum value comment vectors are calculated respectively, denoted by $c_{\max,j}$ and $c_{\min,j}$. The cosine similarity values with respect to the maximum value comment vector and the minimum value comment vector are assembled into vectors, denoted by $c_{\max}$ and $c_{\min}$ respectively, in which the similarity values are arranged in the same order as the samples in the comment matrix. The process is shown in formulas (10) to (13):
$$c_{\max} := [c_{\max,1} \;\dots\; c_{\max,j} \;\dots\; c_{\max,m}]^T \tag{12}$$
$$c_{\min} := [c_{\min,1} \;\dots\; c_{\min,j} \;\dots\; c_{\min,m}]^T \tag{13}$$
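The following sketch assumes the standard cosine similarity, the dot product divided by the product of the Euclidean norms. Returning 0 when either vector has zero length is an assumption made here, since the text does not address that case.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity; returns 0.0 for a zero-length vector (assumption)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_vectors(X, x_max, x_min):
    """Return (c_max, c_min), ordered like the rows (samples) of X."""
    c_max = [cosine_similarity(row, x_max) for row in X]
    c_min = [cosine_similarity(row, x_min) for row in X]
    return c_max, c_min


X = [[1.0, 0.0], [0.0, -1.0]]  # made-up comment vectors
c_max, c_min = similarity_vectors(X, x_max=[1.0, 0.0], x_min=[0.0, -1.0])
print(c_max, c_min)
```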
S2-2: alternatively, calculating the similarity according to the Jaccard coefficient formula.
A unit step function $f(x)$ is used to convert the comment matrix $X$, the maximum value comment vector $x_{\max}$ and the minimum value comment vector $x_{\min}$ into a matrix $X'$ and vectors $x'_{\max}$ and $x'_{\min}$ containing only 0 and 1. The process is shown in formulas (14) to (17):
$$X' = f(X) \tag{15}$$
$$x'_{\max} = f(x_{\max}) \tag{16}$$
$$x'_{\min} = f(x_{\min}) \tag{17}$$
Each row of the matrix $X'$ is the vector of one sample, denoted by $x'^{(j)}$. The Jaccard coefficients between each sample vector $x'^{(j)}$ and the vectors $x'_{\max}$ and $x'_{\min}$ are calculated, denoted by $j_{\max,j}$ and $j_{\min,j}$. The Jaccard coefficients $j_{\max,j}$ and $j_{\min,j}$ of the samples are assembled into vectors, denoted by $j_{\max}$ and $j_{\min}$ respectively, in which the Jaccard coefficients are arranged in the same order as the samples in the comment matrix. The process is shown in formulas (18) to (22):
$$j_{\max} := [j_{\max,1} \;\dots\; j_{\max,j} \;\dots\; j_{\max,m}]^T \tag{21}$$
$$j_{\min} := [j_{\min,1} \;\dots\; j_{\min,j} \;\dots\; j_{\min,m}]^T \tag{22}$$
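A hedged sketch of the binarization and the Jaccard coefficient on 0/1 vectors follows. The exact form of the unit step function is not visible in the text, so the value assigned at zero is exposed as a parameter and is purely an assumption, as is the choice to return 0 when both vectors are all zeros.

```python
def unit_step(values, includes_zero=False):
    """Binarize values; whether f(0) is 0 or 1 is an assumption set by includes_zero."""
    if includes_zero:
        return [1 if v >= 0 else 0 for v in values]
    return [1 if v > 0 else 0 for v in values]


def jaccard(a, b):
    """Jaccard coefficient of two 0/1 vectors: |a AND b| / |a OR b|."""
    intersection = sum(1 for p, q in zip(a, b) if p and q)
    union = sum(1 for p, q in zip(a, b) if p or q)
    return intersection / union if union else 0.0  # empty union mapped to 0.0 (assumption)


x = [0.5, 0.0, -1.0]      # made-up comment vector of one sample
x_max = [0.5, 0.3, 0.0]   # made-up maximum value comment vector
j_max_value = jaccard(unit_step(x), unit_step(x_max))
print(j_max_value)  # 0.5
```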
S3: clustering the samples and judging their quality.
Further, step S208 grades the sensory quality of the batch of samples.
$c_{\max}$, $c_{\min}$, $j_{\max}$ and $j_{\min}$ are concatenated into a matrix $T$, each row of which is the similarity feature vector of one sample, denoted by $t^{(j)}$.
The matrix is then taken as a data set in which each feature vector is an input variable, denoted by $\{t^{(1)}, \dots, t^{(j)}, \dots, t^{(m)}\}$. In addition, the number of grades into which the samples are to be divided, denoted by $K$, is determined. k-means clustering is executed on the data set $\{t^{(1)}, \dots, t^{(j)}, \dots, t^{(m)}\}$ with the following parameters: the number of clusters is $K$, the center points are initialized by the "k-means++" method, the k-means clustering is repeated 50 times, and the maximum number of iterations of a single k-means run is 500.
To judge the quality of each cluster, the sample score of each sample is defined according to formula (24) and denoted by $r_j$. The sample scores of all samples are assembled into a vector, denoted by $r$, in which the sample scores are arranged in the same order as the samples in the comment matrix.
$$r_j := (c_{\max,j} + j_{\max,j}) - (c_{\min,j} + j_{\min,j}) \tag{24}$$
$$r := [r_1 \;\dots\; r_j \;\dots\; r_m]^T \tag{25}$$
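Formula (24) can be sketched directly: for each sample, the similarities to the maximum value comment vector are added and the similarities to the minimum value comment vector are subtracted. The input values are made up for illustration.

```python
def sample_scores(c_max, j_max, c_min, j_min):
    """Compute r_j = (c_max_j + j_max_j) - (c_min_j + j_min_j) for every sample j."""
    return [
        (cm + jm) - (cn + jn)
        for cm, jm, cn, jn in zip(c_max, j_max, c_min, j_min)
    ]


# Made-up similarity vectors for two samples.
r = sample_scores(
    c_max=[1.0, 0.5],
    j_max=[1.0, 0.5],
    c_min=[0.0, 0.5],
    j_min=[0.0, 0.25],
)
print(r)  # [2.0, 0.25]
```

A sample that resembles the maximum value comment vector more than the minimum value comment vector therefore receives a positive score.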
From the k-means clustering results, the average of all sample scores in each cluster is calculated, and the clusters are ranked from high to low by this average. The larger the average sample score, the better the quality of the samples in the cluster and the higher the grade; conversely, the smaller the average sample score, the worse the quality of the samples in the cluster and the lower the grade. Because the differences between samples can be quantified and measured relatively well, the samples can be distinguished on the basis of these differences, and the grading of the tea samples is thereby completed.
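Given cluster labels from any clustering step and the sample scores, the grading by cluster average can be sketched as follows. The labels and scores are made up, and numbering the best grade as 1 is a convention assumed here rather than one stated in the text.

```python
from collections import defaultdict


def grade_clusters(labels, scores):
    """Rank clusters by mean sample score; grade 1 is the highest mean (assumed convention)."""
    grouped = defaultdict(list)
    for label, score in zip(labels, scores):
        grouped[label].append(score)
    means = {label: sum(values) / len(values) for label, values in grouped.items()}
    ordered = sorted(means, key=means.get, reverse=True)
    return {label: grade for grade, label in enumerate(ordered, start=1)}


labels = [0, 1, 0, 2]            # made-up cluster label of each sample
scores = [0.5, -0.5, 1.5, 0.0]   # made-up sample scores r_j
grades = grade_clusters(labels, scores)
print(grades)  # {0: 1, 2: 2, 1: 3}
```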
In conclusion, the method converts the evaluation comment into a comment vector usable for comparison by taking the primitive terms as the elements of the vector and the degree adverbs as their magnitudes, thereby reducing the loss of comment information during conversion and ensuring that most of the evaluation information contributes directly to the comparison among samples. Clustering is carried out on the cosine similarity and/or the Jaccard coefficient between each sample comment vector and the extreme-value comment vectors, so that the information describing the category and intensity of sensory quality in the comment is used comprehensively, the quality difference information among samples can be mined, and the grading accuracy can be improved. The invention can realize grading based on the differences among the evaluation results of the tea samples, without requiring physical standard samples for tea grading in the process.
Fig. 3 is a flowchart illustrating a method for comparing sensory qualities of tea leaves according to a second embodiment of the present invention.
S301: defining comment words and corresponding assignments according to preset sensory evaluation categories; the sensory evaluation category comprises at least one of taste, aroma, dry tea appearance, liquor color and leaf bottom. Wherein the comment words comprise primitive terms, degree adverbs, respective assignments of the primitive terms comprising weight coefficients and commendatory and derogatory coefficients, respective assignments of the degree adverbs comprising scale values;
s302: respectively setting a term weight vector and a scale vector of the batch according to the weight coefficient of the primitive term and the scale value of the degree adverb contained in the comment of each sample of the batch;
s303: determining the commendatory and derogatory coefficient vector of the batch according to the commendatory and derogatory coefficients of the primitive terms contained in the comments of the samples of the batch;
s304: the hadamard products of the term weight vector, the scale vector, the commendatory and the derogatory coefficient vector are obtained as the comment vector.
S305: extracting the extreme-value comment vectors from the comment vectors of all samples in the batch, and calculating the maximum similarity and the minimum similarity from each comment vector and the extreme-value comment vectors;
specifically, from the comment vectors of all samples in the batch, the maximum value and the minimum value of each dimension across the comment vectors are extracted according to a preset method to serve as the maximum-value comment vector and the minimum-value comment vector; the maximum similarity and the minimum similarity are then calculated according to a preset similarity formula from the comment vector, the maximum-value comment vector and the minimum-value comment vector;
S306: calculating a sample score for each sample based on the maximum similarity and the minimum similarity;
S307: judging the relative sensory quality of the samples according to the sample scores;
S308: calculating a feature vector of each sample based on the maximum similarity and the minimum similarity; clustering the feature vectors of the samples into groups according to a preset clustering method and parameters; and grading the sensory quality of the samples of the batch according to the average of the sample scores within each group;
S309: ranking the sensory quality of the samples according to the sample scores.
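The vector construction in steps S302 to S305 can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the function names `hadamard` and `extreme_vectors`, the three-term vocabulary, and all numeric values are hypothetical, and the use of +1 and -1 as commendatory and derogatory coefficients is an assumption made only for this example.

```python
from typing import List, Tuple

Vector = List[float]


def hadamard(*vectors: Vector) -> Vector:
    """Element-wise (Hadamard) product of equally long vectors."""
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    result = [1.0] * length
    for v in vectors:
        result = [r * x for r, x in zip(result, v)]
    return result


def extreme_vectors(comment_vectors: List[Vector]) -> Tuple[Vector, Vector]:
    """Per-dimension maximum and minimum over all comment vectors of a batch."""
    columns = list(zip(*comment_vectors))
    return [max(c) for c in columns], [min(c) for c in columns]


# Hypothetical batch: 3 primitive terms, 2 samples.
weights = [0.5, 0.3, 0.2]   # term weight vector w (illustrative values)
signs = [1.0, -1.0, 1.0]    # assumed coding: +1 commendatory, -1 derogatory
scales = [
    [1.0, 0.5, 0.0],        # scale vector of sample 1
    [0.5, 0.0, 1.0],        # scale vector of sample 2
]

# One comment vector per sample: weight * scale * sign, dimension by dimension.
comments = [hadamard(weights, row, signs) for row in scales]
v_max, v_min = extreme_vectors(comments)
```

Here `v_max` and `v_min` play the roles of the maximum-value and minimum-value comment vectors against which each sample is later compared.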
To better illustrate the implementation of the present invention, samples are graded below based on the sensory evaluation results of oolong tea.
(1) A total of 19 oolong tea samples to be compared were identified and grouped into one analysis batch; the evaluation comment of each sample is shown in Table 1:
TABLE 1 tea sample sensory evaluation comment
(2) Based on the evaluation comments of the 19 samples, 33 primitive terms, each corresponding to a single sensory quality, were extracted. Among them, "brown (dark/yellowish brown)" for dry tea appearance and "orange" for liquor color cannot be determined to be commendatory or derogatory, so their commendatory and derogatory coefficient is set to 0; they therefore do not affect the similarity, score and grading calculations in the subsequent steps, and are excluded and not shown in the table.
TABLE 2 Summary of primitive terms with their commendatory and derogatory coefficients and weight coefficients
As can be seen from the data in Table 2, the commendatory and derogatory coefficient vector s and the term weight vector w of the primitive terms are:
(3) The distribution of each primitive term among the 19 samples was counted, and the results are shown in the following table:
TABLE 3 distribution of primitive terms in respective sample comments
Note: "-" indicates no relevant primitive terms in the sample evaluation results
(4) Based on the distribution of each primitive term among the samples, the degree adverbs modifying the primitive terms are collected and arranged in order of modification strength from weak to strong. The correspondence between the degree adverbs and the scale values is then defined, as shown in Table 4 below.
TABLE 4 corresponding relationship between degree adverbs and scale values
(5) According to the correspondence between degree adverbs and scale values, the entry for each primitive term in each sample is replaced with the corresponding scale value to form the scale vectors. Each row in Table 5 represents the scale vector of one sample.
TABLE 5 Distribution of primitive terms in each sample comment after replacement by scale values
As can be seen from table 5, the scale matrix and scale vector are as follows:
(6) Min-Max normalization is performed on each column of the scale matrix to form the normalized scale matrix:
(7) The commendatory and derogatory coefficients and the term weight coefficients are applied to the normalized scale matrix to form the comment matrix.
(8) The maximum-value comment vector and the minimum-value comment vector are extracted from the comment matrix. The two vectors are as follows:
(9) The cosine similarity between each sample comment vector and the extreme-value (maximum-value and minimum-value) comment vectors is calculated.
(10) The Jaccard coefficient between each sample comment vector and the extreme-value (maximum-value and minimum-value) comment vectors is calculated.
It should be added that the difference between comment vectors is mainly reflected in two aspects: the vector direction and the effective length (the number of non-zero dimensions). Cosine similarity measures the difference in the direction of the vectors, and the Jaccard coefficient measures the difference in their effective length. These two similarity calculation methods are therefore preferred. Other similarity or distance measures that may be used here are: (1) Euclidean distance, (2) Hamming distance, and (3) Pearson correlation coefficient.
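The two preferred measures can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the Jaccard coefficient is computed over the sets of non-zero dimensions (an assumption consistent with the "effective length" described above), and the handling of all-zero vectors is likewise an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (compares direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a: Sequence[float], b: Sequence[float]) -> float:
    """Jaccard coefficient of the sets of non-zero dimensions of a and b."""
    support_a = {i for i, x in enumerate(a) if x != 0}
    support_b = {i for i, y in enumerate(b) if y != 0}
    union = support_a | support_b
    if not union:
        return 1.0  # assumption: two all-zero vectors count as identical
    return len(support_a & support_b) / len(union)


# Hypothetical sample comment vector and maximum-value comment vector.
sample = [0.5, -0.15, 0.0]
v_max = [0.5, 0.0, 0.2]
cos_to_max = cosine_similarity(sample, v_max)
jac_to_max = jaccard_coefficient(sample, v_max)  # 1 shared of 3 non-zero dimensions
```

Applying both functions to the maximum-value and the minimum-value comment vectors gives, for each sample, the similarity values on which the subsequent clustering operates.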
(11) k-means clustering is performed on the cosine similarity and the Jaccard coefficient between each sample comment vector and the extreme-value (maximum-value and minimum-value) comment vectors. The k-means parameters were set as follows: the number of clusters is 3; the "k-means++" method is used to initialize the center points; the k-means clustering is repeated 50 times; and the maximum number of iterations of a single k-means run is 500.
(12) Based on the k-means clustering result, the average sample score of the samples in each cluster is calculated. The clusters are ordered from high to low by average score to determine the grades, and the rank of each sample within its grade is determined.
(13) The grading was completed: the oolong tea samples were divided into 3 grades, with the results given in Table 6 below:
TABLE 6 oolong tea sample grading and ranking results
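Steps (12) and (13) can be sketched as follows, assuming that the sample scores and the cluster labels from step (11) are already available. The function name `grade_and_rank`, the sample names, the scores and the labels are all hypothetical and are not taken from Table 6.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, Tuple


def grade_and_rank(
    scores: Dict[str, float], labels: Dict[str, Hashable]
) -> Dict[str, Tuple[int, int]]:
    """Map each sample to (grade, rank within grade).

    Grade 1 is the cluster with the highest average sample score;
    rank 1 is the highest-scoring sample inside its cluster.
    """
    groups = defaultdict(list)
    for sample, label in labels.items():
        groups[label].append(sample)

    # Order the clusters by their average sample score, best first.
    ordered = sorted(
        groups.values(),
        key=lambda members: mean(scores[s] for s in members),
        reverse=True,
    )

    result = {}
    for grade, members in enumerate(ordered, start=1):
        ranked = sorted(members, key=lambda s: scores[s], reverse=True)
        for rank, sample in enumerate(ranked, start=1):
            result[sample] = (grade, rank)
    return result


# Hypothetical scores and cluster labels for four samples.
scores = {"A": 0.9, "B": 0.4, "C": 0.8, "D": 0.1}
labels = {"A": 2, "B": 0, "C": 2, "D": 0}
result = grade_and_rank(scores, labels)
```

Because the cluster labels produced by a clustering method are arbitrary identifiers, the grade is derived from the average score of each cluster rather than from the label itself.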
It should be added that the sample scores calculated using only the cosine similarity formula, and the resulting grading, are shown in Table 6-1 below:
TABLE 6-1 Oolong tea sample grading and ranking results based on cosine similarity
From the sample scores, the relative sensory quality of any two samples can be judged.
Further, when k-means clustering is performed on the cosine similarity only, with the number of clusters set to 3, the batch of tea samples is divided into three groups, as shown in Table 6-1, where the groups are separated by solid lines. The groups are ordered by the average of the sample scores within each group, giving grades 1 to 3 in Table 6-1.
Further, the samples within a group are ranked by score, which yields the ranking of the samples.
The sample scores calculated using only the Jaccard coefficient formula, and the resulting grading, are shown in Table 6-2.
TABLE 6-2 Oolong tea sample grading and ranking results based on the Jaccard coefficient
From the sample scores, the relative sensory quality of any two samples can be judged.
Further, when k-means clustering is performed on the Jaccard coefficient only, with the number of clusters set to 3, the batch of tea samples is divided into three groups, as shown in Table 6-2, where the groups are separated by solid lines. The groups are ordered by the average of the sample scores within each group, giving grades 1 to 3 in Table 6-2.
Further, the samples within a group are ranked by score, which yields the ranking of the samples.
It should be added that the classification effects of the three approaches above, namely grading based on "cosine similarity", on the "Jaccard coefficient", and on "cosine similarity and the Jaccard coefficient", were compared. The Davies-Bouldin index is used to measure the classification effect; the smaller the index value, the better the classification effect. As Table 7 shows, the classification based on "cosine similarity and the Jaccard coefficient" is the best:
TABLE 7 Davies-Bouldin index comparison
Example | Davies-Bouldin index
Based on cosine similarity (see Table 6-1) | 0.734
Based on the Jaccard coefficient (see Table 6-2) | 0.603
Based on cosine similarity and the Jaccard coefficient (see Table 6) | 0.555
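For reference, the Davies-Bouldin index used in Table 7 can be computed as in the following sketch, which follows the standard definition of the index with Euclidean distance. The function name and the four example points are hypothetical; the sketch does not reproduce the values in Table 7, which depend on the feature vectors of the 19 samples.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence


def davies_bouldin(
    points: Sequence[Sequence[float]], labels: Sequence[Hashable]
) -> float:
    """Davies-Bouldin index of a clustering; smaller means better separation."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    # Centroid of each cluster: the per-dimension mean of its members.
    centroids = {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter of each cluster: mean distance of its members to the centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    keys = list(clusters)
    total = 0.0
    for i in keys:
        # Worst-case ratio of combined scatter to centroid separation.
        # Note: this divides by zero if two centroids coincide.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
    return total / len(keys)


# Hypothetical feature vectors forming two well-separated clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
index = davies_bouldin(points, labels)
```

In this toy example each cluster has a scatter of 1 and the centroids are 10 apart, so the index is (1 + 1) / 10 = 0.2.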
In summary, this embodiment shows the following. The similarity between samples can be calculated with the cosine similarity formula and/or the Jaccard coefficient formula, or with one or more further similarity calculation methods in addition to them. The clustering method and its parameter settings are optional. When the samples do not need to be graded, the sensory quality of the samples can be ranked based on the sample scores alone. When the samples are to be divided into 3 grades, as in this embodiment, the number of clusters can be set to 3; when the samples are to be divided into 5 grades, the number of clusters can be set to 5; in either case the samples within each grade can be ranked by sensory quality at the same time. The other clustering parameters are likewise set as required and are not described in detail. The method therefore achieves comparison, grading and ranking of the sensory quality of tea samples, with stable and accurate comparison results.
Fig. 4 is a schematic diagram of a comparison system of sensory quality of tea leaves based on an evaluation comment according to the present invention, including:
the word stock unit is used for presetting a comment word and corresponding assignment thereof, wherein the comment word comprises primitive terms and degree adverbs, the corresponding assignment of the primitive terms comprises weight coefficients, and the corresponding assignment of the degree adverbs comprises scale values;
the vector conversion unit is used for respectively setting a term weight vector and a scale vector of the batch according to the weight coefficient of the primitive term and the scale value of the degree adverb contained in the comment of each sample of the batch;
the comment vector unit is used for acquiring a Hadamard product of the term weight vector and the scale vector as a comment vector;
the similarity calculation unit is used for extracting, from the comment vectors of the samples in the batch, the maximum value and the minimum value of each dimension across the comment vectors according to a preset method to serve as the maximum-value comment vector and the minimum-value comment vector; and for calculating the maximum similarity and the minimum similarity according to a preset similarity formula from the comment vector, the maximum-value comment vector and the minimum-value comment vector;
a score calculating unit for calculating a sample score of each sample based on the maximum similarity and the minimum similarity;
and the result judging unit is used for judging the relative sensory quality of the samples according to the sample scores.
Fig. 5 is a schematic view of a first embodiment of the system for comparing sensory quality of tea leaves according to the present invention.
In the embodiment shown in fig. 5, the thesaurus unit includes: an assignment setting subunit;
the assignment setting subunit is configured to preset a comment word and a corresponding assignment thereof, where the comment word includes a primitive term and a degree adverb, the corresponding assignment of the primitive term includes a weight coefficient and a commendatory and derogatory coefficient, and the corresponding assignment of the degree adverb includes a scale value;
the vector conversion unit is further used for setting the commendatory and derogatory coefficient vector of the batch according to the commendatory and derogatory coefficients of the primitive terms contained in the comments of the samples of the batch;
the comment vector unit is further used for acquiring a Hadamard product of the term weight vector, the scale vector and the commendatory and derogatory coefficient vector as a comment vector.
In the embodiment shown in fig. 5, the similarity calculation unit includes a formula setting subunit, which is used for calculating the maximum similarity and the minimum similarity according to the cosine similarity formula and/or the Jaccard coefficient formula.
In the embodiment shown in fig. 5, the result determining unit includes a grading subunit, which is configured to calculate a feature vector of each sample based on the maximum similarity and the minimum similarity; cluster the feature vectors of the samples into groups according to a preset clustering method and parameters; and grade the sensory quality of the samples of the batch according to the average of the sample scores within each group.
Fig. 6 is a schematic diagram of a second embodiment of the system for comparing the sensory quality of tea leaves according to the present invention.
In the embodiment shown in fig. 6, the thesaurus unit includes: a category setting subunit;
the category setting subunit is used for defining comment words and corresponding assignments thereof according to preset sensory evaluation categories; the sensory evaluation category comprises at least one of taste, aroma, dry tea appearance, liquor color and leaf bottom.
In the embodiment shown in fig. 6, the result determining unit includes a sorting subunit, which is used for ranking the sensory quality of the samples according to the sample scores.
The units of the above system for comparing the sensory quality of tea based on evaluation comments correspond one to one to the steps of the above method; the corresponding descriptions are given above and are not repeated here.
The above embodiments are provided to illustrate, reproduce and deduce the technical solutions of the present invention, and to fully describe the technical solutions, the objects and the effects of the present invention, so as to make the public more thoroughly and comprehensively understand the disclosure of the present invention, and not to limit the protection scope of the present invention.
The above examples are not intended to be exhaustive of the invention and there may be many other embodiments not listed. Any alterations and modifications without departing from the spirit of the invention are within the scope of the invention.