CN113177103A

CN113177103A - Evaluation comment-based tea sensory quality comparison method and system

Info

Publication number: CN113177103A
Application number: CN202110425809.9A
Authority: CN
Inventors: 陈维; 马成英; 苗爱清; 胡蝶; 乔小燕
Original assignee: Tea Research Institute Chinese Academy of Agricultural Sciences
Current assignee: Tea Research Institute Chinese Academy of Agricultural Sciences
Priority date: 2021-04-13
Filing date: 2021-04-20
Publication date: 2021-07-27
Anticipated expiration: 2041-04-20
Also published as: CN113177103B

Abstract

The invention discloses a method and system for comparing the sensory quality of tea based on evaluation comments. The method includes: presetting comment words and their corresponding assignments; setting a term weight vector and a scale vector; taking the Hadamard product of the term weight vector and the scale vector as the comment vector; extracting the extreme-value (maximum and minimum) comment vectors from the comment vectors of all samples in the batch; calculating the maximum similarity and the minimum similarity of each comment vector to the extreme-value comment vectors; calculating a sample score for each sample from its maximum and minimum similarity; and judging the relative sensory quality of the samples by the magnitude of their sample scores. The invention improves the accuracy and stability of tea quality comparison results.

Description

Evaluation comment-based tea sensory quality comparison method and system

Technical Field

The invention relates to natural language processing technology, and in particular to a method and system for comparing the sensory quality of tea based on evaluation comments.

Background

The quality of tea is expressed mainly in the color, aroma, taste and shape of the tea liquor and the dry tea. Sensory evaluation is the most intuitive, convenient and widely used way to assess tea quality, and its results directly affect judgments of the commercial value of the tea.

The evaluation comment records the result of tea sensory evaluation and is also the basis for comparing tea quality. A standardized evaluation comment generally characterizes the main sensory attributes of the tea's appearance and intrinsic quality and describes their intensity. However, because comments are textual and contain many kinds of sensory descriptions, comparing them directly is difficult, especially for large batches of samples of varied quality. Accurately 'translating' evaluation comments into a comparable form is therefore the key difficulty in comparing tea quality.

At present, the most widely used approach is to convert evaluation comments into percentage (hundred-point) scores and compare tea quality by these scores. The rules are simple and easy to apply, but the drawbacks are evident. On the one hand, collapsing a comment that describes several quality dimensions, such as appearance, taste and aroma, into a single score inevitably loses detail. On the other hand, the hundred-point scale is finer than the sensory quality differences that tea evaluation can actually express, so the assigned scores are highly arbitrary. Both defects amplify the influence of subjective factors when comments are converted to scores, and reduce the accuracy and stability of tea quality comparison results.

Disclosure of Invention

In view of the above, it is necessary to provide a method and system for comparing the sensory quality of tea based on evaluation comments that improve the accuracy and stability of tea quality comparison results.

A method for comparing the sensory quality of tea based on evaluation comments comprises the following steps:

presetting comment words and their corresponding assignments, wherein the comment words comprise primitive terms and degree adverbs, the corresponding assignments of the primitive terms comprise weight coefficients, and the corresponding assignments of the degree adverbs comprise scale values;

setting a term weight vector and a scale vector for the batch according to the weight coefficients of the primitive terms and the scale values of the degree adverbs contained in the comment of each sample in the batch;

obtaining the Hadamard product of the term weight vector and the scale vector as the comment vector;

extracting, by a preset method, the maximum and the minimum of each dimension across the comment vectors of all samples in the batch to form a maximum-value comment vector and a minimum-value comment vector; and calculating, by a preset similarity formula, the maximum similarity and the minimum similarity of each comment vector to the maximum-value and minimum-value comment vectors respectively;

calculating a sample score for each sample based on its maximum similarity and minimum similarity;

and judging the relative sensory quality of each sample according to its sample score.

In one embodiment, the corresponding assignments of the primitive terms include commendatory and derogatory coefficients in addition to the weight coefficients;

the step of setting the term weight vector and the scale vector of the batch according to the weight coefficients of the primitive terms and the scale values of the degree adverbs contained in the comment of each sample further comprises:

determining the commendatory and derogatory coefficient vector of the batch according to the commendatory and derogatory coefficients of the primitive terms contained in the comments of the samples in the batch;

and the step of obtaining the Hadamard product of the term weight vector and the scale vector as the comment vector specifically comprises obtaining the Hadamard product of the term weight vector, the scale vector and the commendatory and derogatory coefficient vector as the comment vector.

In one embodiment, the step of presetting the comment words and corresponding assignments thereof specifically includes:

defining comment words and corresponding assignments according to preset sensory evaluation categories; the sensory evaluation category comprises at least one of taste, aroma, dry tea appearance, liquor color and leaf bottom.

In one embodiment, the step of calculating the maximum similarity and the minimum similarity according to a preset similarity formula includes:

calculating the maximum similarity and the minimum similarity according to the cosine similarity formula and/or the Jaccard coefficient formula.

In one embodiment, after the step of judging the relative sensory quality of each sample, the method further comprises:

calculating a feature vector for each sample based on its maximum similarity and minimum similarity;

clustering the feature vectors of the samples into groups according to a preset clustering method and parameters;

grading the sensory quality of the samples in the batch according to the average sample score within each group;

and/or ranking the samples by sensory quality according to their sample scores.

Correspondingly, a system for comparing the sensory quality of tea based on evaluation comments comprises:

a thesaurus unit, configured to preset comment words and their corresponding assignments, wherein the comment words comprise primitive terms and degree adverbs, the corresponding assignments of the primitive terms comprise weight coefficients, and the corresponding assignments of the degree adverbs comprise scale values;

a vector conversion unit, configured to set a term weight vector and a scale vector for the batch according to the weight coefficients of the primitive terms and the scale values of the degree adverbs contained in the comment of each sample in the batch;

a comment vector unit, configured to obtain the Hadamard product of the term weight vector and the scale vector as the comment vector;

a similarity calculation unit, configured to extract, by a preset method, the maximum and the minimum of each dimension across the comment vectors of all samples in the batch as the maximum-value comment vector and the minimum-value comment vector, and to calculate, by a preset similarity formula, the maximum similarity and the minimum similarity of each comment vector to these two vectors;

a score calculation unit, configured to calculate a sample score for each sample based on its maximum similarity and minimum similarity;

and a result judgment unit, configured to judge the relative sensory quality of each sample according to its sample score.

Correspondingly, in one embodiment, the thesaurus unit includes an assignment setting subunit;

the assignment setting subunit is configured to preset comment words and their corresponding assignments, wherein the comment words include primitive terms and degree adverbs, the corresponding assignments of the primitive terms include weight coefficients and commendatory and derogatory coefficients, and the corresponding assignments of the degree adverbs include scale values;

the vector conversion unit is further configured to set the commendatory and derogatory coefficient vector of the batch according to the commendatory and derogatory coefficients of the primitive terms contained in the comments of the samples in the batch;

the comment vector unit is further configured to obtain the Hadamard product of the term weight vector, the scale vector and the commendatory and derogatory coefficient vector as the comment vector.

Correspondingly, in one embodiment, the thesaurus unit includes a category setting subunit;

the category setting subunit is configured to define comment words and their corresponding assignments according to preset sensory evaluation categories; the sensory evaluation categories include at least one of taste, aroma, dry tea appearance, liquor color and leaf bottom.

Correspondingly, in one embodiment, the similarity calculation unit includes a formula setting subunit, configured to calculate the maximum similarity and the minimum similarity according to the cosine similarity formula and/or the Jaccard coefficient formula;

the result judgment unit includes a grading subunit and/or a sorting subunit, wherein the grading subunit is configured to calculate a feature vector for each sample based on its maximum similarity and minimum similarity, cluster the feature vectors of the samples into groups according to a preset clustering method and parameters, and grade the sensory quality of the samples in the batch according to the average sample score within each group; and the sorting subunit is configured to rank the samples by sensory quality according to their sample scores.

The invention has the following beneficial effects:

The method compares quality directly on the basis of tea evaluation comments and thus keeps the comparison results consistent with the written comments. The comment vector restores the quality description information contained in the evaluation comment. Pairing primitive terms with degree adverbs mirrors the 'sensory characteristic + sensory intensity' pattern in which tea sensory evaluation comments describe quality. The weight coefficients reflect the evaluation practice that different sensory quality characteristics influence overall quality to different degrees. The range of the degree adverb scale values matches the range of sensory intensities described in written comments. The comment vector built from the weight coefficients and scale values therefore preserves most of the information in the comment, so that most of its quality description takes part directly in the comparison between samples. Comparing comment vectors extracts the quality differences between samples, and comparing each comment vector with the extreme-value (maximum and minimum) comment vectors reflects the degree of each sample's strengths and weaknesses. The sample score calculated from the maximum and minimum similarities therefore effectively reflects the differences between the original evaluation comments.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles and effects of the invention.

Unless otherwise specified or defined, the same reference numerals in different figures refer to the same or similar features, and different reference numerals may be used for the same or similar features.

FIG. 1 is a flowchart of the method for comparing the sensory quality of tea based on evaluation comments according to the present invention;

FIG. 2 is a flowchart of the method for comparing the sensory quality of tea based on evaluation comments according to a first embodiment of the present invention;

FIG. 3 is a flowchart of the method for comparing the sensory quality of tea based on evaluation comments according to a second embodiment of the present invention;

FIG. 4 is a schematic diagram of the system for comparing the sensory quality of tea based on evaluation comments according to the present invention;

FIG. 5 is a schematic diagram of a first embodiment of the system for comparing the sensory quality of tea based on evaluation comments according to the present invention;

FIG. 6 is a schematic diagram of a second embodiment of the system for comparing the sensory quality of tea based on evaluation comments according to the present invention.

Detailed Description

In order to facilitate an understanding of the invention, specific embodiments thereof will be described in more detail below with reference to the accompanying drawings.

Unless specifically stated or otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In the case of combining the technical solutions of the present invention in a realistic scenario, all technical and scientific terms used herein may also have meanings corresponding to the purpose of achieving the technical solutions of the present invention.

As used herein, unless otherwise specified or defined, "first" and "second" … are used merely for name differentiation and do not denote any particular quantity or order.

As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items, unless specified or otherwise defined.

Fig. 1 is a flowchart of the method for comparing the sensory quality of tea based on evaluation comments according to the present invention, which includes:

S101: presetting comment words and their corresponding assignments, wherein the comment words comprise primitive terms and degree adverbs, the corresponding assignments of the primitive terms comprise weight coefficients, and the corresponding assignments of the degree adverbs comprise scale values;

S102: setting a term weight vector and a scale vector for the batch according to the weight coefficients of the primitive terms and the scale values of the degree adverbs contained in the comment of each sample in the batch;

S103: obtaining the Hadamard product of the term weight vector and the scale vector as the comment vector;

S104: extracting the extreme-value comment vectors from the comment vectors of all samples in the batch, and calculating the maximum similarity and the minimum similarity from each comment vector and the extreme-value comment vectors;

specifically, the maximum and the minimum of each dimension across the comment vectors of all samples in the batch are extracted by a preset method as the maximum-value comment vector and the minimum-value comment vector, and the maximum similarity and the minimum similarity of each comment vector to these two vectors are calculated by a preset similarity formula.

S105: calculating a sample score for each sample based on its maximum similarity and minimum similarity;

S106: judging the relative sensory quality of each sample according to its sample score.

The method compares quality directly on the basis of tea evaluation comments and thus keeps the comparison results consistent with the written comments. The comment vector restores the quality description information contained in the evaluation comment. Pairing primitive terms with degree adverbs mirrors the 'sensory characteristic + sensory intensity' pattern in which tea sensory evaluation comments describe quality. The weight coefficients reflect the evaluation practice that different sensory quality characteristics influence overall quality to different degrees. The range of the degree adverb scale values matches the range of sensory intensities described in written comments. The commendatory and derogatory coefficients reflect the evaluators' likes and dislikes regarding the sensory characteristics. The comment vector built from the weight coefficients, scale values and commendatory and derogatory coefficients therefore preserves most of the information in the evaluation comment, so that most of its quality description takes part directly in the comparison between samples. Comparing comment vectors extracts the quality differences between samples, and comparing each comment vector with the extreme-value (maximum and minimum) comment vectors reflects the degree of each sample's strengths and weaknesses. Measuring each sample's maximum and minimum similarity with cosine similarity and/or the Jaccard coefficient takes into account both the direction and the effective length of the comment vector. The sample score calculated from the maximum and minimum similarities therefore effectively reflects the differences between the original evaluation comments.

Fig. 2 is a flowchart illustrating a method for comparing sensory qualities of tea leaves according to a first embodiment of the present invention.

S201: presetting comment words and their corresponding assignments, wherein the comment words comprise primitive terms and degree adverbs, the corresponding assignments of the primitive terms comprise weight coefficients and commendatory and derogatory coefficients, and the corresponding assignments of the degree adverbs comprise scale values;

S202: setting a term weight vector and a scale vector for the batch according to the weight coefficients of the primitive terms and the scale values of the degree adverbs contained in the comment of each sample in the batch;

S203: determining the commendatory and derogatory coefficient vector of the batch according to the commendatory and derogatory coefficients of the primitive terms contained in the comments of the samples in the batch;

S204: obtaining the Hadamard product of the term weight vector, the scale vector and the commendatory and derogatory coefficient vector as the comment vector;

S205: extracting the extreme-value comment vectors from the comment vectors of all samples in the batch, and calculating the maximum similarity and the minimum similarity from each comment vector and the extreme-value comment vectors;

specifically, the maximum and the minimum of each dimension across the comment vectors of all samples in the batch are extracted by a preset method as the maximum-value comment vector and the minimum-value comment vector; then the maximum similarity and the minimum similarity of each comment vector to these two vectors are calculated by a preset similarity formula;

S206: calculating a sample score for each sample based on its maximum similarity and minimum similarity;

S207: judging the relative sensory quality of each sample according to its sample score;

S208: calculating a feature vector for each sample based on its maximum similarity and minimum similarity; clustering the feature vectors of the samples into groups according to a preset clustering method and parameters; and grading the sensory quality of the samples in the batch according to the average sample score within each group.

In this embodiment, the samples whose sensory quality is to be compared are first determined and defined as one batch, and the primitive terms, each corresponding to a single sensory quality, are extracted from the evaluation comments of all samples in the batch. If an evaluation comment contains a combined term that covers several sensory qualities, the combined term is split into several primitive terms that each correspond to only one sensory quality, and the degree adverb modifying the combined term is kept in front of each resulting primitive term. For evaluation comments produced by the standardized evaluation method, primitive terms can be selected according to the tea type and the object described, with reference to the national standard of the People's Republic of China 'Tea sensory evaluation terms' (GB/T 14487). GB/T 14487 covers most of the terms commonly used in standardized evaluation across all tea types and serves as a complete primitive term library. For a term not included in GB/T 14487, it should be determined whether its meaning is the same as or similar to that of a term recorded in GB/T 14487; if so, the term is treated as a primitive term; otherwise, it is excluded. For evaluation comments not produced by the standardized evaluation method, primitive terms are selected according to how the evaluators who assessed this batch distinguish and summarize each sensory quality.

The extracted primitive terms are then arranged in a fixed order, for example in descending order of their weight coefficients.

It should be added that, in step S201, the corresponding assignments of the primitive terms include weight coefficients. For example, if one primitive term in a batch of tea evaluation comments contributes more than the others to the judgment of overall tea quality, its weight coefficient can be set differently from those of the other terms.

In step S201, the corresponding assignments of the primitive terms also include commendatory and derogatory coefficients. Different primitive terms relate differently to overall tea quality: some are positively correlated, some negatively, and some not at all. The commendatory and derogatory coefficients capture these differences. In this embodiment, step S203 defines the commendatory and derogatory coefficient vector according to the commendatory or derogatory character of each primitive term. A "commendatory primitive term" describes a sensory quality strength, a "derogatory primitive term" describes a sensory quality defect, and a "neutral primitive term" describes a sensory quality feature with neither positive nor negative meaning. These coefficients reflect the evaluators' likes and dislikes regarding the sensory characteristics. The commendatory and derogatory coefficient vector of the primitive terms is:

$$\mathbf{s} := [s_1 \ \cdots \ s_i \ \cdots \ s_n]^{T}, \qquad 1 \le i \le n \qquad (1)$$

where $\mathbf{s}$ has n elements, each being the commendatory and derogatory coefficient of one primitive term, in the same order as the primitive terms. Each element $s_i \in \{-1, 0, 1\}$: the element for a commendatory primitive term is set to 1, that for a derogatory primitive term to -1, and that for a neutral primitive term to 0.
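As a minimal sketch, assuming each primitive term has already been labelled commendatory, derogatory or neutral, the vector $\mathbf{s}$ of formula (1) is a direct lookup. The label strings and the function name `polarity_vector` are hypothetical; only the values 1, -1 and 0 come from the text above.

```python
# Hypothetical label strings; the text only fixes the values 1, -1 and 0.
POLARITY_COEFFICIENT = {"commendatory": 1, "derogatory": -1, "neutral": 0}


def polarity_vector(term_polarities):
    """Return s = [s_1 ... s_n] for primitive terms given in their fixed order."""
    return [POLARITY_COEFFICIENT[label] for label in term_polarities]


s = polarity_vector(["commendatory", "neutral", "derogatory", "commendatory"])
print(s)  # [1, 0, -1, 1]
```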

The term weight vector is defined according to the importance of each sensory quality in the comparison of samples. The weight of each primitive term is determined by the evaluation factor to which its sensory quality belongs; the evaluation factors are appearance, liquor color, aroma, taste and leaf bottom. Primitive terms under the same evaluation factor may share the same weight, or may be weighted individually according to their importance in the specific quality comparison. The primitive term weight vector is:

$$\mathbf{w} := [w_1 \ \cdots \ w_i \ \cdots \ w_n]^{T}, \qquad 1 \le i \le n \qquad (2)$$

where $\mathbf{w}$ has n elements, each being the weight of one primitive term, in the same order as the primitive terms. Each element $w_i \in (0, 1]$. Its value can be derived from the coefficient listed in 'Evaluation coefficients of quality factors for various teas' in the national standard of the People's Republic of China 'Tea sensory evaluation method' (GB/T 23776): divide the coefficient by 100 and take the square root.
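The conversion rule above, $w_i = \sqrt{a/100}$ for the coefficient $a$ of the term's evaluation factor, can be sketched as follows. The coefficient values in the example are hypothetical placeholders chosen for easy arithmetic, not figures from GB/T 23776, and `term_weight_vector` is an illustrative name.

```python
import math


def term_weight_vector(term_factors, factor_coefficients):
    """w_i = sqrt(coefficient / 100) for the evaluation factor of primitive term i."""
    return [math.sqrt(factor_coefficients[factor] / 100.0) for factor in term_factors]


# Hypothetical percentage coefficients for illustration only; they are NOT values
# taken from GB/T 23776.
coefficients = {"appearance": 25, "aroma": 36, "taste": 36}
# Evaluation factor of each primitive term, in the fixed term order (hypothetical).
w = term_weight_vector(["taste", "aroma", "appearance", "taste"], coefficients)
```

Every coefficient between 0 and 100 maps into $(0, 1]$, which matches the stated range of $w_i$.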

It should be added that the weight coefficients can also be adjusted according to the importance of individual primitive terms in the overall evaluation of tea quality. For example, if a quality feature that contributes significantly to overall quality is identified during the evaluation, the weight coefficient of the corresponding primitive term may be increased.

A table is then constructed with samples as rows and primitive terms as columns to record how the primitive terms are distributed across the samples; its term columns follow the order specified above. For each primitive term, the phrase containing that term is extracted from the evaluation comment of each sample and entered in the corresponding cell. Each extracted phrase indicates the sensory quality referred to by the primitive term and includes any degree adverb modifying the intensity of that quality. If a sample does not contain the primitive term, the cell is marked "-".

From the resulting distribution statistics for the batch, the phrases recorded for each primitive term are grouped by the degree adverb that modifies the term, and the groups are arranged in order of adverb strength from weak to strong.

For evaluation comments produced by the standardized evaluation method, the degree adverbs can be ordered according to the correspondence between degree adverbs and sensory intensity in Table t1; degree adverbs not listed in Table t1 should be ordered according to the sensory degree they actually express in the comment. For evaluation comments not produced by the standardized evaluation method, the correspondence between degree adverbs and sensory intensity is determined according to the evaluators' actual perception.

Table t1: Correspondence between degree adverbs and sensory intensity

Each degree adverb is then assigned an equally spaced scale value. Following the weak-to-strong ordering of the degree adverbs for each primitive term, the weakest level (or the absence of the term) is assigned 0, and each successively stronger level is assigned 1 more than the previous one, until every degree adverb has a scale value.
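The equal-spacing rule amounts to numbering the ordered levels from 0. A minimal sketch, assuming the weak-to-strong ordering for one primitive term is already known; the adverb labels and the function name are hypothetical.

```python
def scale_values(ordered_levels):
    """Map degree levels ordered weak (or absent) -> strong to equally spaced 0, 1, 2, ..."""
    return {level: value for value, level in enumerate(ordered_levels)}


# Hypothetical adverb labels for one primitive term; "-" marks an absent term.
term_scale = scale_values(["-", "slightly", "fairly", "very"])
print(term_scale)  # {'-': 0, 'slightly': 1, 'fairly': 2, 'very': 3}
```

Each primitive term gets its own mapping, because the set of adverbs observed for one term can differ from another's.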

In step S202, the scale value of each degree adverb modifying a primitive term is substituted to form the scale vectors: based on the correspondence between degree adverbs and scale values, the scale values are entered into the primitive term distribution table. The table then forms a scale matrix in which each row is the scale vector of one sample. The scale matrix and scale vectors are as follows:

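$$D := \begin{bmatrix} \mathbf{d}^{(1)} \\ \vdots \\ \mathbf{d}^{(j)} \\ \vdots \\ \mathbf{d}^{(m)} \end{bmatrix} = \begin{bmatrix} d_{1,1} & \cdots & d_{1,i} & \cdots & d_{1,n} \\ \vdots & & \vdots & & \vdots \\ d_{j,1} & \cdots & d_{j,i} & \cdots & d_{j,n} \\ \vdots & & \vdots & & \vdots \\ d_{m,1} & \cdots & d_{m,i} & \cdots & d_{m,n} \end{bmatrix}, \qquad 1 \le j \le m,\ 1 \le i \le n \qquad (3)$$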

where the scale matrix $D$ has m rows and n columns, each row corresponding to a sample and each column to a primitive term. Each row of $D$ is the scale vector of one sample, denoted $\mathbf{d}^{(j)}$, and each element of $D$ is denoted $d_{j,i}$.
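A minimal sketch of filling the scale matrix $D$ from the distribution table, assuming each cell already holds the degree-adverb label found for that term (or "-") and each term has its own label-to-scale mapping. The function name, labels and data are hypothetical.

```python
import numpy as np


def scale_matrix(table, term_scales):
    """Build the m x n scale matrix D.

    table[j][i]    -- degree label of primitive term i in sample j ("-" if absent)
    term_scales[i] -- dict mapping each label of term i to its scale value
    """
    return np.array(
        [[term_scales[i][label] for i, label in enumerate(row)] for row in table],
        dtype=float,
    )


# Hypothetical batch: 3 samples (rows), 2 primitive terms (columns).
scales = [{"-": 0, "slightly": 1, "very": 2}, {"-": 0, "fairly": 1}]
table = [["very", "-"], ["slightly", "fairly"], ["-", "fairly"]]
D = scale_matrix(table, scales)
```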

First, Min-Max normalization is applied to each column of the scale matrix $D$ to form the normalized scale matrix $D'$, each row of which is a normalized scale vector, denoted $\mathbf{d}'^{(j)}$. This process corresponds to formulas (4) and (5); element-wise, each entry of column i is rescaled as $d'_{j,i} = (d_{j,i} - \min_k d_{k,i}) / (\max_k d_{k,i} - \min_k d_{k,i})$.
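Column-wise Min-Max normalization can be sketched with NumPy as below. The text does not say what happens when a column is constant (zero range); mapping such a column to 0 here is an assumption made only to avoid division by zero.

```python
import numpy as np


def min_max_columns(D):
    """Column-wise Min-Max normalization of the scale matrix D -> D'."""
    D = np.asarray(D, dtype=float)
    col_min = D.min(axis=0)
    col_range = D.max(axis=0) - col_min
    # Assumption: a constant column carries no between-sample difference, so it is
    # set to 0 (this also avoids dividing by zero).
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return np.where(col_range == 0, 0.0, (D - col_min) / safe_range)


D_norm = min_max_columns([[2, 0, 1], [1, 1, 1], [0, 1, 1]])
```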

Second, each normalized scale vector $\mathbf{d}'^{(j)}$ (each row of $D'$) is multiplied element by element by the commendatory and derogatory coefficient vector $\mathbf{s}$ and the term weight vector $\mathbf{w}$; the resulting Hadamard products form the comment matrix, denoted $X$. Each row of $X$ is the comment vector of one sample, denoted $\mathbf{x}^{(j)} = \mathbf{d}'^{(j)} \odot \mathbf{s}^{T} \odot \mathbf{w}^{T}$, and each element of $X$ is denoted $x_{j,i}$. In matrix form, with $J_{1,m}$ the $1 \times m$ all-ones matrix so that $(\mathbf{s} J_{1,m})^{T}$ repeats $\mathbf{s}^{T}$ in every row, the process is given by formulas (6) and (7):

$$X = D' \odot (\mathbf{s} J_{1,m})^{T} \odot (\mathbf{w} J_{1,m})^{T} \qquad (6)$$
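Formula (6) can be sketched literally in NumPy, building $(\mathbf{s} J_{1,m})^T$ and $(\mathbf{w} J_{1,m})^T$ explicitly. The function name and the small input values are illustrative only.

```python
import numpy as np


def comment_matrix(D_norm, s, w):
    """X = D' ⊙ (s J_{1,m})^T ⊙ (w J_{1,m})^T, following formula (6)."""
    D_norm = np.asarray(D_norm, dtype=float)
    m = D_norm.shape[0]
    ones_1m = np.ones((1, m))                          # J_{1,m}
    s_col = np.asarray(s, dtype=float).reshape(-1, 1)  # s as an n x 1 column
    w_col = np.asarray(w, dtype=float).reshape(-1, 1)  # w as an n x 1 column
    return D_norm * (s_col @ ones_1m).T * (w_col @ ones_1m).T


X = comment_matrix([[1.0, 0.0], [0.5, 1.0]], s=[1, -1], w=[0.6, 0.5])
```

Because NumPy broadcasts a length-n vector across the rows of an m x n array, `D_norm * np.asarray(s) * np.asarray(w)` gives the same result; the explicit $J_{1,m}$ form is kept here only to mirror formula (6).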

Finally, steps S205 to S208 perform clustering and quality grading based on the comment vectors, as described in steps S1 to S3 below.

S1: Extract the extreme-value comment vectors from the weighted comment matrix.

The maximum and the minimum of each dimension (column) of the comment matrix $X$ are extracted as the maximum-value comment vector and the minimum-value comment vector, denoted $\mathbf{x}_{\max}$ and $\mathbf{x}_{\min}$ respectively; their i-th elements are $\max_j x_{j,i}$ and $\min_j x_{j,i}$. This process corresponds to formulas (8) and (9).
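Extracting $\mathbf{x}_{\max}$ and $\mathbf{x}_{\min}$ is a per-column reduction over the comment matrix. A minimal NumPy sketch with an illustrative function name and hypothetical data:

```python
import numpy as np


def extreme_comment_vectors(X):
    """Return (x_max, x_min): the per-dimension maximum and minimum over all samples."""
    X = np.asarray(X, dtype=float)
    return X.max(axis=0), X.min(axis=0)


x_max, x_min = extreme_comment_vectors([[0.6, 0.0], [0.3, -0.5], [0.0, -0.2]])
```

Note that $\mathbf{x}_{\max}$ need not equal any single sample's comment vector: it combines the best value seen in each dimension.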

S2-1: Calculate the similarities using cosine similarity.

The cosine similarity of each sample's comment vector $\mathbf{x}^{(j)}$ to the maximum-value and minimum-value comment vectors is calculated, denoted $c_{\max,j}$ and $c_{\min,j}$; for example, $c_{\max,j} = \mathbf{x}^{(j)} \cdot \mathbf{x}_{\max} / (\lVert \mathbf{x}^{(j)} \rVert \, \lVert \mathbf{x}_{\max} \rVert)$. The cosine similarities to the maximum-value and minimum-value comment vectors are collected into the vectors $\mathbf{c}_{\max}$ and $\mathbf{c}_{\min}$, with the similarity values in the same order as the samples in the comment matrix. This process corresponds to formulas (10) to (13):

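$$\mathbf{c}_{\max} := [c_{\max,1} \ \cdots \ c_{\max,j} \ \cdots \ c_{\max,m}]^{T} \qquad (12)$$

$$\mathbf{c}_{\min} := [c_{\min,1} \ \cdots \ c_{\min,j} \ \cdots \ c_{\min,m}]^{T} \qquad (13)$$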
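The cosine step can be sketched in NumPy as follows. The cosine is undefined when either vector has zero length, which can happen here (for example, $\mathbf{x}_{\min}$ is all zeros when no derogatory term occurs). Returning 0 in that case is an assumption of this sketch, not something the text specifies.

```python
import numpy as np


def cosine_to(X, ref):
    """Cosine similarity of every row of X to the reference vector ref."""
    X = np.asarray(X, dtype=float)
    ref = np.asarray(ref, dtype=float)
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(ref)
    # Assumption: the similarity is taken as 0 when either vector has zero length,
    # because the cosine is undefined there.
    safe = np.where(norms == 0, 1.0, norms)
    return np.where(norms == 0, 0.0, (X @ ref) / safe)


def cosine_similarities(X):
    """Return (c_max, c_min) as in formulas (12) and (13)."""
    X = np.asarray(X, dtype=float)
    return cosine_to(X, X.max(axis=0)), cosine_to(X, X.min(axis=0))


c_max, c_min = cosine_similarities([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
```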

Alternatively or additionally, S2-2: Calculate the similarities using the Jaccard coefficient.

The unit step function $f(\cdot)$ converts the comment matrix $X$, the maximum-value comment vector $\mathbf{x}_{\max}$ and the minimum-value comment vector $\mathbf{x}_{\min}$ into the 0/1 matrix $X'$ and the 0/1 vectors $\mathbf{x}'_{\max}$ and $\mathbf{x}'_{\min}$. This process corresponds to formulas (14) to (17):

$$X' = f(X) \qquad (15)$$

$$\mathbf{x}'_{\max} = f(\mathbf{x}_{\max}) \qquad (16)$$

$$\mathbf{x}'_{\min} = f(\mathbf{x}_{\min}) \qquad (17)$$

Each row of X′ is the binary vector of one sample, denoted x′^(j). Calculate the Jaccard coefficients between each sample vector x′^(j) and the vectors x′_max and x′_min, denoted j_max,j and j_min,j respectively. The Jaccard coefficients of all samples are collected into the vectors j_max and j_min, arranged in the same order as the samples in the comment matrix. This process is given by formulas (18)-(22):

$j_{\max} := [j_{\max,1}\ \cdots\ j_{\max,j}\ \cdots\ j_{\max,m}]^{\mathsf{T}}$ (21)

$j_{\min} := [j_{\min,1}\ \cdots\ j_{\min,j}\ \cdots\ j_{\min,m}]^{\mathsf{T}}$ (22)
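Formulas (14) and (18)-(20) are not reproduced in this text. The sketch below assumes the strict unit step f(x) = 1 for x > 0 and 0 otherwise, together with the usual Jaccard coefficient of binary vectors, |a AND b| / |a OR b|; returning 0 when the union is empty is also an assumption. Under this choice of f, derogatory (negative) entries map to 0, so a different definition of f in formula (14) would change j_min. The helper names are hypothetical.

```python
import numpy as np

def unit_step(A) -> np.ndarray:
    """Assumed unit step function: 1 where an entry is > 0, otherwise 0."""
    return (np.asarray(A) > 0).astype(int)

def jaccard_to(B: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Jaccard coefficient between every binary row of B and the binary vector b."""
    inter = np.logical_and(B, b).sum(axis=1)
    union = np.logical_or(B, b).sum(axis=1)
    # Empty union (both vectors all zeros): assign 0 (assumption)
    return np.where(union > 0, inter / np.maximum(union, 1), 0.0)

# Hypothetical comment matrix and its extreme-value comment vectors
X = np.array([[0.30, 0.10,  0.00,  0.00],
              [0.00, 0.20, -0.30,  0.00],
              [0.15, 0.00, -0.15, -0.20]])
x_max, x_min = X.max(axis=0), X.min(axis=0)

X_bin = unit_step(X)                          # X'
j_max = jaccard_to(X_bin, unit_step(x_max))   # formula (21)
j_min = jaccard_to(X_bin, unit_step(x_min))   # formula (22)
```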

S3: Cluster the samples and judge their relative quality.

Step S208 then grades the sensory quality of the batch of samples.

Concatenate c_max, c_min, j_max and j_min as the columns of a matrix T; each row of T is the similarity feature vector of one sample, denoted t^(j).

The matrix T is taken as the data set {t^(1), …, t^(j), …, t^(m)}, with each feature vector as one input. The number of grades into which the samples are to be divided is also fixed, denoted K. k-means clustering is then run on this data set with the following parameters: K clusters, centroids initialized by the k-means++ method, 50 repeated runs of k-means, and at most 500 iterations per run.
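The sketch below stacks c_max, c_min, j_max and j_min into T and clusters its rows with the parameters listed above. In practice a library routine such as scikit-learn's `KMeans(n_clusters=K, init="k-means++", n_init=50, max_iter=500)` matches this configuration; the small NumPy implementation here (k-means++ seeding, keeping the run with the lowest within-cluster sum of squares) only avoids the dependency. The function name `kmeans_pp`, the random seed and the similarity values are hypothetical.

```python
import numpy as np

def kmeans_pp(T, K, n_init=50, max_iter=500, seed=0):
    """k-means with k-means++ seeding, repeated n_init times; returns the best labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++: first center uniformly, later ones weighted by squared distance
        centers = [T[rng.integers(len(T))]]
        for _ in range(1, K):
            d2 = ((T[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
            p = d2 / d2.sum() if d2.sum() > 0 else np.full(len(T), 1 / len(T))
            centers.append(T[rng.choice(len(T), p=p)])
        centers = np.array(centers)
        for _ in range(max_iter):  # Lloyd iterations
            labels = ((T[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
            new = np.array([T[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                            for k in range(K)])
            if np.allclose(new, centers):
                break
            centers = new
        # Final assignment and within-cluster sum of squares for this run
        labels = ((T[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        inertia = ((T - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

# Hypothetical similarity vectors for 6 samples (in comment-matrix order)
c_max = np.array([0.95, 0.93, 0.60, 0.62, 0.20, 0.22])
c_min = np.array([0.10, 0.12, 0.50, 0.48, 0.90, 0.88])
j_max = np.array([0.90, 0.88, 0.50, 0.52, 0.10, 0.12])
j_min = np.array([0.05, 0.07, 0.40, 0.42, 0.80, 0.78])

T = np.column_stack([c_max, c_min, j_max, j_min])  # row j is t^(j)
labels = kmeans_pp(T, K=3)
```

Cluster label numbers are arbitrary; they acquire a grade meaning only after the clusters are ranked by mean sample score.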

To judge the quality of each cluster, the sample score r_j of each sample is defined by formula (24). The sample scores are collected into a vector r, arranged in the same order as the samples in the comment matrix.

$r_j := (c_{\max,j} + j_{\max,j}) - (c_{\min,j} + j_{\min,j})$ (24)

$r := [r_1\ \cdots\ r_j\ \cdots\ r_m]^{\mathsf{T}}$ (25)
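Formula (24) rewards similarity to the maximum-value comment vector and penalizes similarity to the minimum-value comment vector. Vectorized over all samples it is a one-liner; the function name and input values below are hypothetical.

```python
import numpy as np

def sample_scores(c_max, c_min, j_max, j_min) -> np.ndarray:
    """Formula (24) for all samples at once: r = (c_max + j_max) - (c_min + j_min)."""
    c_max, c_min, j_max, j_min = map(np.asarray, (c_max, c_min, j_max, j_min))
    return (c_max + j_max) - (c_min + j_min)

# Hypothetical similarity values for 3 samples
r = sample_scores([0.95, 0.60, 0.20], [0.10, 0.50, 0.90],
                  [0.90, 0.50, 0.10], [0.05, 0.40, 0.80])
```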

Using the k-means clustering result, the mean sample score of each cluster is calculated, and the clusters are ranked from high to low by this mean. The larger the mean sample score, the better the quality of the samples in that cluster and the higher its grade; conversely, the smaller the mean sample score, the poorer the quality of the samples in that cluster and the lower its grade. Because the differences between samples can be quantified and measured relatively well, the samples can be distinguished by these differences, which completes the grading of the tea samples.
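Ranking clusters by mean sample score, and then samples within each grade by their own score, can be sketched as below. The function name `grade_and_rank`, the cluster labels and the scores are hypothetical; grade 1 is taken to be the best grade.

```python
import numpy as np

def grade_and_rank(labels, r):
    """Grade clusters by mean sample score (grade 1 = highest mean) and rank samples.

    Returns (grades, ranking): grades[j] is the grade of sample j; ranking lists
    sample indices from best to worst, ordered by grade, then by score within a grade.
    """
    labels, r = np.asarray(labels), np.asarray(r, dtype=float)
    clusters = np.unique(labels)
    means = np.array([r[labels == k].mean() for k in clusters])
    order = clusters[np.argsort(-means)]              # clusters, highest mean first
    grade_of = {k: g + 1 for g, k in enumerate(order)}
    grades = np.array([grade_of[k] for k in labels])
    ranking = np.lexsort((-r, grades))                # primary key: grade; then score descending
    return grades, ranking

# Hypothetical k-means labels and sample scores for 6 samples
grades, ranking = grade_and_rank([2, 2, 0, 0, 1, 1],
                                 [1.70, 1.62, 0.20, 0.24, -1.40, -1.32])
```

`np.lexsort` treats its last key as the primary one, which is why the grade array is passed last.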

In summary, the method converts evaluation comments into comparable comment vectors by taking primitive terms as the vector elements and degree adverbs as their magnitudes. This reduces the loss of comment information during conversion and lets most of the evaluation information contribute directly to the comparison between samples. Clustering on the cosine similarity and/or Jaccard coefficient between each sample comment vector and the extreme-value comment vectors makes combined use of the information in the comments describing both the category and the intensity of sensory qualities, so quality differences between samples can be mined and grading accuracy improved. The invention grades tea on the basis of differences in evaluation results among the samples themselves, without requiring physical grading standard samples of tea.

Fig. 3 is a flowchart of a method for comparing the sensory quality of tea according to a second embodiment of the present invention.

S301: Define comment words and their corresponding assignments according to preset sensory evaluation categories, the sensory evaluation categories comprising at least one of taste, aroma, dry tea appearance, liquor color and infused leaf. The comment words comprise primitive terms and degree adverbs; the assignment of each primitive term comprises a weight coefficient and a commendatory/derogatory coefficient, and the assignment of each degree adverb comprises a scale value;

S302: Set the term weight vector and the scale vector of the batch from the weight coefficients of the primitive terms and the scale values of the degree adverbs contained in the comments on each sample in the batch;

S303: Determine the commendatory/derogatory coefficient vector of the batch from the commendatory/derogatory coefficients of the primitive terms contained in the comments on each sample in the batch;

S304: Take the Hadamard product of the term weight vector, the scale vector and the commendatory/derogatory coefficient vector as the comment vector;

S305: Extract the extreme-value (maximum-value and minimum-value) comment vectors from the comment vectors of all samples in the batch, and calculate the maximum similarity and the minimum similarity between each comment vector and the extreme-value comment vectors;

S306: Calculate a sample score for each sample from the maximum similarity and the minimum similarity;

S307: Judge the relative sensory quality of the samples by their sample scores;

S308: Calculate a feature vector for each sample from the maximum similarity and the minimum similarity; cluster the feature vectors of the samples into groups using a preset clustering method and parameters; and grade the sensory quality of the batch according to the mean sample score within each group;

S309: Rank the sensory quality of the samples by their sample scores.

To illustrate the implementation of the invention, the following example grades samples based on the sensory evaluation results of oolong tea.

(1) Nineteen oolong tea samples to be compared were grouped into one analysis batch. The sensory evaluation comments on each sample are given in Table 1:

TABLE 1 Sensory evaluation comments on the tea samples

(2) From the comments on the 19 samples, 33 primitive terms, each corresponding to a single sensory quality, were extracted. The terms "brown (dark/yellowish brown)" for dry tea appearance and "orange" for liquor color have no definite commendatory or derogatory sense, so their commendatory/derogatory coefficient is set to 0. Because such terms do not affect the subsequent similarity, score and grading calculations, they are excluded and not shown in the table.

TABLE 2 Primitive terms with their commendatory/derogatory coefficients and weight coefficients

From the data in Table 2, the commendatory/derogatory coefficient vector s and the term weight vector w of the primitive terms are:

(3) The occurrence of each primitive term in the comments on the 19 samples was tallied; the results are shown in the following table:

TABLE 3 Distribution of primitive terms in the sample comments

Note: "-" indicates that the primitive term does not appear in the sample's evaluation results.

(4) Based on the distribution of the primitive terms among the samples, the degree adverbs modifying them are tallied and arranged from weakest to strongest modifying strength. The correspondence between degree adverbs and scale values is then defined as shown in Table 4 below.

TABLE 4 Correspondence between degree adverbs and scale values

(5) Using the correspondence between degree adverbs and scale values, the primitive terms in each sample are replaced by their scale values to form the scale vectors. Each row of Table 5 is the scale vector of one sample.

TABLE 5 Distribution of primitive terms in each sample comment after replacement by scale values
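Step (5) can be sketched as follows. Table 4 is not reproduced in this text, so the English adverbs, their scale values, the four primitive terms, the helper name `scale_matrix` and the default value used when a term appears without an adverb are all hypothetical; each comment is assumed to be already parsed into {primitive term: degree adverb} pairs.

```python
import numpy as np

# Hypothetical vocabulary and adverb scale (Table 4 is not reproduced in this text)
TERMS = ["floral", "mellow", "astringent", "coarse"]
SCALE = {"slightly": 1, "fairly": 2, "very": 3}   # weakest to strongest
DEFAULT_SCALE = 2  # assumed value when a term appears without a degree adverb

def scale_matrix(comments):
    """comments: one dict {primitive term: degree adverb or None} per sample.
    Terms absent from a comment ("-" in Table 3) are encoded as 0."""
    D = np.zeros((len(comments), len(TERMS)))
    for j, parsed in enumerate(comments):
        for term, adverb in parsed.items():
            i = TERMS.index(term)
            D[j, i] = SCALE[adverb] if adverb is not None else DEFAULT_SCALE
    return D

D = scale_matrix([{"floral": "very", "mellow": "fairly"},
                  {"mellow": None, "astringent": "slightly"},
                  {"coarse": "very"}])
```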

From Table 5, the scale matrix and the scale vectors are as follows:

(6) Apply Min-Max normalization to each column of the scale matrix to form the normalized scale matrix:
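The column-wise Min-Max normalization of step (6) maps each primitive term's scale values onto [0, 1] across the batch. A minimal sketch on a hypothetical scale matrix follows; mapping a constant column (max equal to min) to 0 is an assumption, since the text does not say how that case is handled.

```python
import numpy as np

def minmax_columns(D) -> np.ndarray:
    """Min-Max normalize each column of D to [0, 1].
    Constant columns (max == min) are mapped to 0 here; this choice is an assumption."""
    D = np.asarray(D, dtype=float)
    lo, hi = D.min(axis=0), D.max(axis=0)
    span = hi - lo
    return np.where(span > 0, (D - lo) / np.where(span > 0, span, 1.0), 0.0)

# Hypothetical scale matrix: rows = samples, columns = primitive terms
D_norm = minmax_columns([[3, 2, 0, 0],
                         [0, 2, 1, 0],
                         [0, 0, 0, 3]])
```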

(7) Apply the commendatory/derogatory coefficients and the term weight coefficients to the normalized scale matrix to form the comment matrix.

(8) Extract the maximum-value comment vector and the minimum-value comment vector from the comment matrix. The two vectors are as follows:

(9) Calculate the cosine similarity between each sample comment vector and the extreme-value comment vectors.

(10) Calculate the Jaccard coefficient between each sample comment vector and the extreme-value comment vectors.

It should be noted that differences between comment vectors show up mainly in two respects: vector direction and effective length (the number of non-zero dimensions). Cosine similarity measures differences in direction, and the Jaccard coefficient measures differences in effective length, so these two similarity measures are preferred. Other similarity or distance measures that could be used here include (1) Euclidean distance, (2) Hamming distance and (3) the Pearson correlation coefficient.
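A small numeric illustration of this point, using hypothetical vectors and hypothetical helper names: here the Jaccard coefficient is taken over the sets of non-zero dimensions, matching the definition of effective length given in the paragraph above.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def jaccard_nonzero(u, v):
    """Jaccard coefficient of the sets of non-zero dimensions of u and v."""
    U, V = u != 0, v != 0
    return float((U & V).sum() / (U | V).sum())

a = np.array([0.3, 0.1, 0.0, 0.0])
b = np.array([0.1, 0.3, 0.0, 0.0])   # same non-zero dimensions, different direction
c = np.array([0.3, 0.1, 0.2, 0.1])   # extra non-zero dimensions

# Cosine separates a from b, while Jaccard treats them as identical
print(cosine(a, b), jaccard_nonzero(a, b))
# Jaccard registers the extra non-zero dimensions of c
print(cosine(a, c), jaccard_nonzero(a, c))
```

Each measure is blind to the difference the other one captures, which is the reason the text gives for combining them.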

(11) Perform k-means clustering based on the cosine similarities and Jaccard coefficients between each sample comment vector and the extreme-value comment vectors. The k-means parameters were set as follows: 3 clusters, centroids initialized by the k-means++ method, 50 repeated runs of k-means, and at most 500 iterations per run.

(12) Based on the k-means clustering result, calculate the mean sample score of each cluster. The clusters are ranked from high to low by mean score, and the rank of each sample within its cluster is determined.

(13) Grading is complete: the oolong tea samples were divided into 3 grades, with the results given in Table 6 below:

TABLE 6 Oolong tea sample grading and ranking results

It should be added that the results obtained when the sample score of each sample is calculated using only the cosine similarity formula are shown in Table 6-1 below:

TABLE 6-1 Oolong tea sample grading and ranking results based on cosine similarity

The relative sensory quality of any two samples can be judged from their sample scores.

Further, performing k-means clustering on cosine similarity alone, with 3 clusters, divides the batch of tea into the three groups shown in Table 6-1, separated by solid lines. The groups are ranked by the mean sample score within each group, giving grades 1 to 3 in Table 6-1.

Further, ranking the samples within each group by their scores yields the ranking of the samples.

The grading results obtained when the sample scores of all samples are calculated using only the Jaccard coefficient are shown in Table 6-2.

TABLE 6-2 Oolong tea sample grading and ranking results based on the Jaccard coefficient

Further, performing k-means clustering on the Jaccard coefficient alone, with 3 clusters, divides the batch of tea into the three groups shown in Table 6-2, separated by solid lines. The groups are ranked by the mean sample score within each group, giving grades 1 to 3 in Table 6-2.

It should be added that the grading performance obtained with "cosine similarity", "Jaccard coefficient" and "cosine similarity and Jaccard coefficient" was compared using the Davies-Bouldin index, where a smaller value indicates better grading. As Table 7 shows, grading based on both cosine similarity and the Jaccard coefficient performs best:

TABLE 7 Davies-Bouldin index comparison

Method	Davies-Bouldin index
Cosine similarity only (Table 6-1)	0.734
Jaccard coefficient only (Table 6-2)	0.603
Cosine similarity and Jaccard coefficient (Table 6)	0.555
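Assuming the standard Euclidean definition, the Davies-Bouldin index averages, over clusters, the worst ratio (s_i + s_k) / d(c_i, c_k), where s_i is the mean distance of cluster i's members to its centroid c_i. A minimal sketch follows; scikit-learn's `sklearn.metrics.davies_bouldin_score(X, labels)` computes the same index. The function name and toy data are hypothetical and do not reproduce the values in Table 7.

```python
import numpy as np

def davies_bouldin(T, labels) -> float:
    """Davies-Bouldin index with Euclidean distances (lower is better)."""
    T, labels = np.asarray(T, dtype=float), np.asarray(labels)
    ks = np.unique(labels)
    cents = np.array([T[labels == k].mean(axis=0) for k in ks])
    # s_i: mean distance from the members of cluster i to its centroid
    s = np.array([np.linalg.norm(T[labels == k] - c, axis=1).mean()
                  for k, c in zip(ks, cents)])
    worst = []
    for i in range(len(ks)):
        ratios = [(s[i] + s[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(len(ks)) if j != i]
        worst.append(max(ratios))  # least separated neighbour of cluster i
    return float(np.mean(worst))

# Hypothetical 1-D feature vectors forming two tight, well-separated clusters
T = np.array([[0.0], [2.0], [10.0], [12.0]])
db = davies_bouldin(T, [0, 0, 1, 1])
```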

In summary, this embodiment shows the following. The similarity between samples can be calculated with the cosine similarity formula and/or the Jaccard coefficient formula, or with one or more other similarity measures in addition to them. The clustering method and its parameters are optional. When the samples do not need to be graded or classified, their sensory quality can simply be ranked by sample score. When the samples are to be divided into 3 grades, as in this embodiment, the number of clusters is set to 3; when they are to be divided into 5 grades, the number of clusters can be set to 5, and the samples within each grade can be ranked by sensory quality at the same time. The other clustering parameters are likewise set as required and are not described further. The method can therefore compare, grade and rank the sensory quality of tea samples, and the comparison results are stable and accurate.

Fig. 4 is a schematic diagram of a system for comparing the sensory quality of tea based on evaluation comments according to the present invention, comprising:

Fig. 5 is a schematic diagram of a first embodiment of the system for comparing the sensory quality of tea according to the present invention.

In the embodiment shown in Fig. 5, the thesaurus unit includes an assignment setting subunit;

In the embodiment shown in Fig. 5, the similarity calculation unit includes a formula setting subunit, which calculates the maximum similarity and the minimum similarity using the cosine similarity and/or Jaccard coefficient formula.

In the embodiment shown in Fig. 5, the result determining unit includes a grading subunit, which calculates a feature vector for each sample from the maximum similarity and the minimum similarity, clusters the feature vectors of the samples into groups using a preset clustering method and parameters, and grades the sensory quality of the batch according to the mean sample score within each group.

In the embodiment shown in Fig. 6, the thesaurus unit includes a category setting subunit;

In the embodiment shown in Fig. 6, the result determining unit includes a sorting subunit, which ranks the sensory quality of the samples by their sample scores.

The above system for comparing the sensory quality of tea based on evaluation comments corresponds one-to-one to the above method; the corresponding description is given above and is not repeated here.

The above embodiments illustrate, reproduce and derive the technical solutions of the invention and fully describe its technical solutions, objects and effects so that the public can understand the disclosure more thoroughly and comprehensively; they do not limit the scope of protection of the invention.

The above examples are not exhaustive, and there may be many other embodiments not listed. Any alterations and modifications made without departing from the spirit of the invention fall within its scope.

Claims

1. A method for comparing the sensory quality of tea based on evaluation comments, characterized by comprising:

presetting comment words and their corresponding assignments, wherein the comment words include primitive terms and degree adverbs, the assignment of each primitive term includes a weight coefficient, and the assignment of each degree adverb includes a scale value;

setting the term weight vector and the scale vector of the batch from the weight coefficients of the primitive terms and the scale values of the degree adverbs contained in the comments on each sample in the batch;

obtaining the Hadamard product of the term weight vector and the scale vector as the comment vector;

extracting, by a preset method, the maximum value and the minimum value of each dimension across the comment vectors of the samples in the batch as the maximum-value comment vector and the minimum-value comment vector, and then calculating, by a preset similarity formula, the maximum similarity and the minimum similarity from each comment vector, the maximum-value comment vector and the minimum-value comment vector;

calculating a sample score for each sample based on the maximum similarity and the minimum similarity; and

judging the sensory quality of each sample according to the sample scores.

2. The method for comparing the sensory quality of tea based on evaluation comments according to claim 1, characterized in that the assignment of each primitive term further comprises, in addition to the weight coefficient, a commendatory/derogatory coefficient;

the step of setting the term weight vector and the scale vector of the batch from the weight coefficients of the primitive terms and the scale values of the degree adverbs contained in the comments on each sample in the batch further comprises:

setting the commendatory/derogatory coefficient vector of the batch from the commendatory/derogatory coefficients of the primitive terms contained in the comments on each sample in the batch; and

the step of obtaining the Hadamard product of the term weight vector and the scale vector as the comment vector specifically comprises: obtaining the Hadamard product of the term weight vector, the scale vector and the commendatory/derogatory coefficient vector as the comment vector.

3. The method for comparing the sensory quality of tea based on evaluation comments according to claim 1, characterized in that the step of presetting comment words and their corresponding assignments specifically comprises:

defining comment words and their corresponding assignments according to preset sensory evaluation categories, the sensory evaluation categories including at least one of taste, aroma, dry tea appearance, liquor color and infused leaf.

4. The method for comparing the sensory quality of tea based on evaluation comments according to claim 1, characterized in that the step of calculating the maximum similarity and the minimum similarity by a preset similarity formula comprises:

calculating the maximum similarity and the minimum similarity using the cosine similarity and/or Jaccard coefficient formula.

5. The method for comparing the sensory quality of tea based on evaluation comments according to any one of claims 1-4, characterized by further comprising, after the step of judging the sensory quality of each sample:

calculating a feature vector for each sample based on the maximum similarity and the minimum similarity;

clustering the feature vectors of the samples into groups according to a preset clustering method and parameters; and

grading the sensory quality of the samples according to the mean sample score within each group.

6. The method for comparing the sensory quality of tea based on evaluation comments according to any one of claims 1-5, characterized by further comprising, after the step of judging the sensory quality of each sample:

ranking the sensory quality of the samples according to their sample scores.

7. A system for comparing the sensory quality of tea based on evaluation comments, characterized by comprising:

a thesaurus unit, configured to preset comment words and their corresponding assignments, wherein the comment words include primitive terms and degree adverbs, the assignment of each primitive term includes a weight coefficient, and the assignment of each degree adverb includes a scale value;

a vector conversion unit, configured to set the term weight vector and the scale vector of the batch from the weight coefficients of the primitive terms and the scale values of the degree adverbs contained in the comments on each sample in the batch;

a comment vector unit, configured to obtain the Hadamard product of the term weight vector and the scale vector as the comment vector;

a similarity calculation unit, configured to extract, by a preset method, the maximum value and the minimum value of each dimension across the comment vectors of the samples in the batch as the maximum-value comment vector and the minimum-value comment vector, and then to calculate, by a preset similarity formula, the maximum similarity and the minimum similarity from each comment vector, the maximum-value comment vector and the minimum-value comment vector;

a score calculation unit, configured to calculate the sample score of each sample based on the maximum similarity and the minimum similarity; and

a result judging unit, configured to judge the sensory quality of each sample based on the sample scores.

8. The system for comparing the sensory quality of tea based on evaluation comments according to claim 7, characterized in that the thesaurus unit comprises an assignment setting subunit;

the assignment setting subunit is configured to preset comment words and their corresponding assignments, wherein the comment words include primitive terms and degree adverbs, the assignment of each primitive term includes a weight coefficient and a commendatory/derogatory coefficient, and the assignment of each degree adverb includes a scale value;

the vector conversion unit is further configured to set the commendatory/derogatory coefficient vector of the batch from the commendatory/derogatory coefficients of the primitive terms contained in the comments on each sample in the batch; and

the comment vector unit is further configured to obtain the Hadamard product of the term weight vector, the scale vector and the commendatory/derogatory coefficient vector as the comment vector.

9. The system for comparing the sensory quality of tea based on evaluation comments according to claim 7, characterized in that the thesaurus unit comprises a category setting subunit;

the category setting subunit is configured to define comment words and their corresponding assignments according to preset sensory evaluation categories, the sensory evaluation categories including at least one of taste, aroma, dry tea appearance, liquor color and infused leaf.

10. The system for comparing the sensory quality of tea based on evaluation comments according to any one of claims 7-9, characterized in that:

the similarity calculation unit comprises a formula setting subunit configured to calculate the maximum similarity and the minimum similarity using the cosine similarity and/or Jaccard coefficient formula; and

the result judging unit comprises a grading subunit and/or a sorting subunit, wherein the grading subunit is configured to calculate a feature vector for each sample based on the maximum similarity and the minimum similarity, cluster the feature vectors of the samples into groups according to a preset clustering method and parameters, and grade the sensory quality of the batch of samples according to the mean sample score within each group; and the sorting subunit is configured to rank the sensory quality of the samples according to their sample scores.