CN112269880A

CN112269880A - Sweet text classification matching system based on linear function

Info

Publication number: CN112269880A
Application number: CN202011217922.XA
Authority: CN
Inventors: 杜登斌; 杜小军; 杜乐
Original assignee: Wuzheng Intelligent Technology Beijing Co ltd
Current assignee: Wuzheng Intelligent Technology Beijing Co ltd
Priority date: 2020-11-04
Filing date: 2020-11-04
Publication date: 2021-01-26
Anticipated expiration: 2040-11-04
Also published as: CN112269880B

Abstract

The invention provides a linear function-based sweet text classification and matching system. The system comprises: an acquisition module, used for acquiring the characteristic information of the sweet and establishing a sweet feature vector set according to that information; a classification module, used for establishing a linear function classification method, classifying the sweet feature vector set according to the linear function classification method, establishing a traditional Chinese medicine sweet feature vector set and a western medicine sweet feature vector set, and combining the two sets into a sweet feature vector matching model; a calculation module, used for establishing a TF-IDF algorithm, acquiring the sweet text information to be matched, selecting sweet feature words from it, and establishing a sweet feature vector set to be matched; and a matching module, used for calculating the similarity between the sweet feature vector matching model and the sweet feature vector set to be matched through the Jaccard similarity coefficient and generating a matching report according to the similarity. By combining the linear function classification method, the TF-IDF algorithm and the Jaccard similarity coefficient, the text information can be matched accurately, which improves the accuracy of the whole matching process.

Description

Sweet text classification matching system based on linear function

Technical Field

The invention relates to the field of artificial intelligence, in particular to a sweet text classification matching system based on a linear function.

Background

As the saying goes, "the nose distinguishes smells and the tongue tastes the five flavors". Sour, sweet, bitter, spicy and salty taste information is transmitted by the small papillae densely distributed on the surface of the tongue and by the taste cells known as taste buds; the taste center of the cerebral cortex is then excited, and the whole taste analysis is completed through the neurohumoral feedback loop. However, some people perceive an abnormal taste in the mouth while eating, or even when not eating, which often indicates that a disease may be present.

At present, matching sweet text information to the corresponding disease information in medical practice usually relies on a clinician collecting the sweet text and then selecting it by operating a computer. When matching information, however, such prior-art approaches usually traverse and match a large amount of information, which consumes considerable resources and takes a long time, so the existing scheme urgently needs improvement.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

In view of this, the invention provides a system for classifying and matching a sweet text based on a linear function, and aims to solve the technical problem that the prior art cannot classify the sweet text information through the linear function so as to reduce the resource consumed by data processing.

The technical scheme of the invention is realized as follows:

in one aspect, the present invention provides a linear function-based sweet text classification matching system, including:

the acquisition module is used for acquiring the characteristic information of the sweet, and establishing a sweet characteristic vector set according to the characteristic information of the sweet;

the classification module is used for establishing a linear function classification method, classifying the sweet feature vector set according to the linear function classification method, establishing a traditional Chinese medicine sweet feature vector set and a western medicine sweet feature vector set, and merging the traditional Chinese medicine sweet feature vector set and the western medicine sweet feature vector set into a sweet feature vector matching model;

the calculating module is used for establishing a TF-IDF algorithm, acquiring the sweet text information to be matched, selecting sweet feature words from the sweet text information to be matched through the TF-IDF algorithm, and establishing a sweet feature vector set to be matched;

and the matching module is used for calculating the similarity between the sweet feature vector matching model and the sweet feature vector set to be matched through the Jaccard similarity coefficient and generating a matching report according to the similarity.

On the basis of the above technical solution, preferably, the acquisition module includes a processing module, configured to acquire the feature information of the sweet, where the feature information of the sweet is the feature information of the symptoms accompanying the sweet, establish a feature information integrity verification rule, verify the feature information of the symptoms accompanying the sweet according to the feature information integrity verification rule, and, when the verification is passed, establish a sweet feature vector set according to the feature information of the symptoms accompanying the sweet.

On the basis of the above technical solution, preferably, the acquisition module includes an adding module, configured to acquire historical feature information of the symptoms accompanying the sweet, compare it with the current feature information of the symptoms accompanying the sweet, screen out the historical feature information that is not duplicated, and add that historical feature information to the sweet feature vector set.

On the basis of the above technical solution, preferably, the classification module includes a classification calculation module, configured to establish a linear classification function and set two classification categories: traditional Chinese medicine sweet and western medicine sweet. The sweet feature vector set is used as the function vector, the classification categories are used as classification marks, the traditional Chinese medicine sweet feature vector set and the western medicine sweet feature vector set are established by using the linear classification function, and the two sets are combined into a sweet feature vector matching model.

On the basis of the technical scheme, preferably, the calculation module comprises an algorithm module for establishing a TF-IDF algorithm to obtain the sweet text information to be matched, calculating the word frequency of each word in the sweet text information to be matched through the TF-IDF algorithm, and taking the word with the calculated word frequency as the word to be screened.

On the basis of the above technical scheme, preferably, the calculation module includes a feature word processing module, which sets a common word bank and a word frequency threshold, screens words to be screened according to the common word bank, after screening out the common words, compares the word frequency of the remaining words to be screened with the word frequency threshold, selects the words to be screened which meet the word frequency threshold as the feature words of the sweetness, and establishes a feature vector set of the sweetness to be matched.

On the basis of the above technical solution, preferably, the matching module includes a matching report generating module for establishing a Jaccard similarity coefficient, calculating the similarity between the sweet feature vector matching model and the sweet feature vector set to be matched by the Jaccard similarity coefficient, and generating a corresponding matching report according to the similarity.

Still further preferably, the linear function-based sweet text classification matching device comprises:

the acquiring unit is used for acquiring the characteristic information of the sweet and the characteristic information of the disease, and respectively establishing a sweet characteristic vector set and a disease characteristic vector set according to the characteristic information of the sweet and the characteristic information of the disease;

the classification unit is used for establishing a linear function classification method, classifying the sweet feature vector set according to the linear function classification method, establishing a traditional Chinese medicine sweet feature vector set and a western medicine sweet feature vector set, and merging the traditional Chinese medicine sweet feature vector set and the western medicine sweet feature vector set into a sweet feature vector matching model;

the calculating unit is used for establishing a TF-IDF algorithm, acquiring the sweet text information to be matched, selecting sweet feature words from the sweet text information to be matched through the TF-IDF algorithm, and establishing a sweet feature vector set to be matched;

and the matching unit is used for calculating the similarity between the sweet feature vector matching model and the sweet feature vector set to be matched through the Jaccard similarity coefficient and generating a matching report according to the similarity.

Compared with the prior art, the sweet text classification matching system based on the linear function has the following beneficial effects:

(1) the linear function classification method and the TF-IDF algorithm are used for extracting the feature words, so that the accuracy of the extracted feature words can be improved, the matching of subsequent information is facilitated, meanwhile, the feature vector set is classified through the linear function classification method, the resource consumption during information matching is greatly reduced, and the resource matching speed is improved;

(2) the similarity of the information text is calculated by using the Jaccard similarity coefficient, so that both the accuracy and the speed of information matching can be improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a block diagram of a first embodiment of a linear function based sweet text classification matching system according to the present invention;

FIG. 2 is a block diagram of a second embodiment of the system for matching classified sweet text based on linear function according to the present invention;

FIG. 3 is a block diagram illustrating a third embodiment of the system for matching classified sweet text based on linear function according to the present invention;

FIG. 4 is a block diagram illustrating a fourth embodiment of the system for matching classified sweet text based on linear function according to the present invention;

FIG. 5 is a block diagram illustrating a fifth embodiment of the system for matching classified sweet text based on linear function according to the present invention;

FIG. 6 is a block diagram of the device for classifying and matching the sweet text based on linear function according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

Referring to fig. 1, fig. 1 is a block diagram illustrating a first embodiment of a linear function-based classification and matching system for a sweet text according to the present invention. Wherein the sweet text classification matching system based on the linear function comprises: an acquisition module 10, a classification module 20, a calculation module 30 and a matching module 40.

The acquisition module 10 is configured to acquire the characteristics information of the sweet, and establish a sweet characteristic vector set according to the characteristics information of the sweet;

the classification module 20 is configured to establish a linear function classification method, classify the sweet feature vector set according to the linear function classification method, establish a traditional Chinese medicine sweet feature vector set and a western medicine sweet feature vector set, and merge the traditional Chinese medicine sweet feature vector set and the western medicine sweet feature vector set into a sweet feature vector matching model;

the calculating module 30 is used for establishing a TF-IDF algorithm, acquiring the sweet text information to be matched, selecting sweet feature words from the sweet text information to be matched through the TF-IDF algorithm, and establishing a sweet feature vector set to be matched;

and the matching module 40 is used for calculating the similarity between the sweet feature vector matching model and the sweet feature vector set to be matched through the Jaccard similarity coefficient, and generating a matching report according to the similarity.

Further, as shown in fig. 2, a second embodiment of the linear function-based sweet text classification matching system of the present invention is provided on the basis of the above embodiment. In this embodiment, the acquisition module 10 further includes:

the processing module 101 is configured to obtain feature information of the sweet, where the feature information is feature information of symptom accompanied by the sweet, establish a feature information integrity verification rule, verify the feature information of symptom accompanied by the sweet according to the feature information integrity verification rule, and establish a sweet feature vector set according to the feature information of symptom accompanied by the sweet when the verification is passed.

And the adding module 102 is used for acquiring historical characteristic information of the symptoms accompanying the sweet, comparing it with the current characteristic information of the symptoms accompanying the sweet, screening out the historical characteristic information that is not duplicated, and adding that historical characteristic information into the sweet characteristic vector set.

It should be understood that, in this embodiment, the system acquires the feature information of the sweet, which is the feature information of the symptoms accompanying the sweet, establishes a feature information integrity verification rule, and verifies the feature information against that rule. When the verification is passed, the system establishes a sweet feature vector set according to the feature information. The feature information is thus checked in advance so that it can be matched directly at matching time, which avoids matching failures caused by incomplete feature information.
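The integrity check described above can be illustrated with a minimal Python sketch. The description does not state what the verification rule contains, so the required fields below (`symptom` and `description`) and the helper names are purely hypothetical assumptions; the sketch only shows the idea of rejecting incomplete records before the feature vector set is built.

```python
from typing import Dict, List

# Hypothetical rule: the description does not list the required fields.
REQUIRED_FIELDS = ("symptom", "description")


def is_complete(record: Dict[str, str]) -> bool:
    """Return True when every required field is present and non-blank."""
    return all(record.get(field, "").strip() for field in REQUIRED_FIELDS)


def build_feature_set(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the records that pass the integrity check."""
    return [record for record in records if is_complete(record)]


records = [
    {"symptom": "dry mouth", "description": "sweet taste with little desire to drink"},
    {"symptom": "fatigue", "description": ""},   # rejected: blank description
    {"description": "abdominal distension"},     # rejected: missing symptom
]
valid = build_feature_set(records)
```

Under these assumptions only the first record is kept, so later matching never encounters a partially filled record.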

It is understood that the symptoms accompanying the sweet generally manifest as a sweet taste with a dry mouth and little desire to drink, shortness of breath, tiredness, poor appetite, abdominal distension, and dry and soft stools. Taste recovery takes at least 10 days, since taste bud cells are all renewed from the surrounding epithelial cells. Treatment should nevertheless be sought early, preferably within a month after the onset of the taste disturbance.

It should be understood that, in this embodiment, historical feature information of the symptoms accompanying the sweet is also acquired and compared with the current feature information of the symptoms accompanying the sweet. The historical feature information that does not duplicate the current information is screened out and added to the sweet feature vector set, which enlarges the set and thereby increases the reliability of information matching.
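The de-duplication step can be sketched as follows. How two pieces of feature information are judged to be duplicates is not specified, so this sketch assumes a simple case-insensitive, whitespace-normalized string comparison; `normalize` and `merge_history` are hypothetical helper names.

```python
from typing import Iterable, List


def normalize(feature: str) -> str:
    """Lower-case the text and collapse runs of whitespace (assumed duplicate key)."""
    return " ".join(feature.lower().split())


def merge_history(current: List[str], history: Iterable[str]) -> List[str]:
    """Append historical features that are not already present, keeping order."""
    seen = {normalize(feature) for feature in current}
    merged = list(current)
    for feature in history:
        key = normalize(feature)
        if key and key not in seen:
            seen.add(key)
            merged.append(feature.strip())
    return merged


current = ["dry mouth", "fatigue"]
history = ["Fatigue", "poor appetite", "dry  mouth", "poor appetite"]
merged = merge_history(current, history)
```

Here only "poor appetite" is new, so it is the single historical item appended to the set.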

It should be understood that, in this embodiment, all disease and disease-symptom characteristic information corresponding to the characteristic information of the symptoms accompanying the sweet is also obtained, and a vector set of that disease and disease-symptom characteristic information is established. For example, traditional Chinese medicine holds that the sweet is mostly caused by dysfunction of the spleen and stomach. Clinically, it is divided into the sweet due to heat in the spleen and stomach and the sweet due to deficiency of qi and yin in the spleen and stomach. The former is mostly caused by excessive intake of pungent, spicy and rich food, which generates internal heat, or by exogenous pathogenic heat accumulating in the spleen and stomach, and mostly presents as damp-heat in the spleen and stomach. It is seen in diabetes patients who favor sweet, fatty and rich food. Its symptoms include a sweet taste with thirst, a preference for drinking water, polyphagia with frequent hunger, sores on the lips and tongue, dry stool, a red tongue with dry coating, and a rapid and forceful pulse. The latter is caused by impairment of the spleen and stomach due to aging or chronic disease, which damages both qi and yin, generates deficiency heat internally and scorches the spleen fluid; it commonly manifests as a sweet taste with dry mouth, little desire to drink, shortness of breath, fatigue, poor appetite, abdominal distension, and dry and soft stools.

Further, as shown in fig. 3, a third embodiment of the linear function-based sweet text classification matching system of the present invention is provided on the basis of the above embodiments. In this embodiment, the classification module 20 further includes:

the classification calculation module 201 is configured to establish a linear classification function and set two classification categories: traditional Chinese medicine sweet and western medicine sweet. The sweet feature vector set is used as the function vector, the classification categories are used as classification marks, the traditional Chinese medicine sweet feature vector set and the western medicine sweet feature vector set are established by using the linear classification function, and the two sets are combined into a sweet feature vector matching model.

It should be understood that, in this embodiment, a linear classification function is established to classify the sweet into two categories according to its cause of onset, and the diseases are classified according to the characteristic information of their symptoms. The two categories are traditional Chinese medicine and western medicine (for example, diabetes). Each sample consists of a vector (the text feature vector) and a label (indicating which category the sample belongs to). The classification categories are then used as classification marks, a traditional Chinese medicine sweet feature vector set and a western medicine sweet feature vector set are established by using the linear classification function, and the two sets are combined into a sweet feature vector matching model.
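The description does not give the form of the linear classification function or say how it is fitted. As one possible illustration only, the sketch below assumes a decision function f(x) = w·x + b trained with the classic perceptron rule on labeled samples (+1 for the traditional Chinese medicine category, -1 for the western medicine category). The toy vectors, the label encoding and the dictionary layout of the matching model are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def score(w: List[float], b: float, x: Vector) -> float:
    """Linear decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_perceptron(samples: List[Tuple[Vector, int]],
                     epochs: int = 20) -> Tuple[List[float], float]:
    """Fit w and b with the perceptron update rule (labels are +1 or -1)."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            if y * score(w, b, x) <= 0:          # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:                          # every sample is separated
            break
    return w, b


def build_matching_model(vectors: List[Vector], w: List[float],
                         b: float) -> Dict[str, List[Vector]]:
    """Split the vectors into the two category sets and keep both in one model."""
    model: Dict[str, List[Vector]] = {"tcm": [], "western": []}
    for x in vectors:
        model["tcm" if score(w, b, x) > 0 else "western"].append(x)
    return model


# Toy, linearly separable samples: (feature vector, label).
samples = [([1, 0, 1], 1), ([1, 0, 0], 1), ([0, 1, 1], -1), ([0, 1, 0], -1)]
w, b = train_perceptron(samples)
model = build_matching_model([x for x, _ in samples], w, b)
```

The perceptron is chosen here only because it is the simplest trainable linear classifier; any other linear model would fit the same description.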

Further, as shown in fig. 4, a fourth embodiment of the linear function-based sweet text classification matching system of the present invention is provided on the basis of the above embodiments. In this embodiment, the calculating module 30 includes:

the algorithm module 301 is configured to establish a TF-IDF algorithm, acquire the sweet text information to be matched, calculate the word frequency of each word in the sweet text information to be matched through the TF-IDF algorithm, and use the word with the calculated word frequency as a word to be screened.

The feature word processing module 302 sets a common word bank and a word frequency threshold, screens words to be screened according to the common word bank, compares the word frequency of the remaining words to be screened with the word frequency threshold after screening out the common words, selects the words to be screened meeting the word frequency threshold as the features of the sweetness, and establishes a feature vector set of the sweetness to be matched.

It should be understood that, in this embodiment, a TF-IDF algorithm is further established to obtain the sweet text information to be matched, the word frequency of each word in the sweet text information to be matched is calculated through the TF-IDF algorithm, and the word with the calculated word frequency is used as the word to be filtered.

It should be understood that the main idea of TF-IDF is as follows: if a word appears with a high term frequency (TF) in one article and rarely appears in other articles, the word or phrase is considered to have good discriminating power and is suitable for classification. The term frequency (TF) represents how often a term (keyword) appears in a text. This count is typically normalized (usually by dividing it by the total word count of the article) to prevent a bias towards long documents.
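The idea can be made concrete with a small sketch. The description only defines the normalized term frequency; the inverse document frequency below uses the common textbook form log(N / df), which is an assumption, as are the toy corpus and the function names.

```python
import math
from collections import Counter
from typing import Dict, List


def term_frequency(tokens: List[str]) -> Dict[str, float]:
    """Occurrences of each term divided by the total number of tokens."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {term: count / len(tokens) for term, count in counts.items()}


def inverse_document_frequency(corpus: List[List[str]]) -> Dict[str, float]:
    """log(N / df), where df is the number of documents containing the term."""
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(len(corpus) / df) for term, df in doc_freq.items()}


def tf_idf(tokens: List[str], corpus: List[List[str]]) -> Dict[str, float]:
    """TF-IDF weight of every term in `tokens` relative to `corpus`."""
    tf = term_frequency(tokens)
    idf = inverse_document_frequency(corpus)
    return {term: tf[term] * idf.get(term, 0.0) for term in tf}


corpus = [
    ["sweet", "mouth", "thirst"],
    ["sweet", "mouth", "fatigue"],
    ["sweet", "dry", "stool"],
]
scores = tf_idf(corpus[0], corpus)
```

In this toy corpus "sweet" occurs in every document and therefore receives a weight of zero, while "thirst", which occurs in only one document, receives the highest weight.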

It should be understood that, in order to select the feature words, the system further sets a common word bank and a word frequency threshold, and screens the words to be screened according to the common word bank. The common word bank contains items such as conjunctions, modal particles and punctuation marks. After the common words are screened out, the word frequency of the remaining words to be screened is compared with the word frequency threshold, the words that meet the threshold are selected as the sweet feature words, and a sweet feature vector set to be matched is established.

Further, as shown in fig. 5, a fifth embodiment of the linear function-based sweet text classification matching system of the present invention is provided on the basis of the above embodiments. In this embodiment, the matching module 40 includes:

the matching report generating module 401 is configured to establish a Jaccard similarity coefficient, calculate the similarity between the sweet feature vector matching model and the sweet feature vector set to be matched according to the Jaccard similarity coefficient, and generate a corresponding matching report according to the similarity.

It should be understood that, finally, the system establishes the Jaccard similarity coefficient, calculates the similarity between the sweet feature vector matching model and the sweet feature vector set to be matched through the Jaccard similarity coefficient, sets a corresponding similarity range, compares the calculated similarity with the similarity range, and finally generates a corresponding matching report. For example, the report may state that a sweet taste in the mouth is usually caused by diabetes, or possibly by dysfunction of the spleen and stomach; that the sensation is more pronounced in the morning; and that even plain boiled water can taste sweet.
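A minimal sketch of this screening step is given below. The contents of the common word bank, the threshold value of 0.15 and the reading of "meets the threshold" as "greater than or equal to" are all assumptions made for illustration; none of them is specified in the description.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical common word bank: conjunctions, particles and punctuation.
COMMON_WORDS = frozenset({"and", "or", "the", "with", "oh", ",", "."})
# Hypothetical word frequency threshold.
TF_THRESHOLD = 0.15


def select_feature_words(tokens: List[str],
                         common_words: Iterable[str] = COMMON_WORDS,
                         threshold: float = TF_THRESHOLD) -> List[str]:
    """Drop common words, then keep words whose frequency reaches the threshold."""
    if not tokens:
        return []
    common = set(common_words)
    counts = Counter(tokens)
    total = len(tokens)
    return sorted(
        word for word, count in counts.items()
        if word not in common and count / total >= threshold
    )


tokens = ["sweet", "mouth", "and", "dry", "mouth", ",",
          "sweet", "thirst", "and", "fatigue"]
features = select_feature_words(tokens)
```

With these ten tokens, "and" is removed as a common word even though it is frequent, and only "mouth" and "sweet" (frequency 0.2 each) reach the threshold.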
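The Jaccard similarity coefficient of two sets A and B is |A ∩ B| / |A ∪ B|. The sketch below applies it to sets of feature words. Representing the matching model as one word set per category, using a single lower bound of 0.3 in place of the unspecified similarity range, and returning the report as a ranked list are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A intersect B| / |A union B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def matching_report(model: Dict[str, Set[str]], query: Set[str],
                    min_similarity: float = 0.3) -> List[Tuple[str, float]]:
    """Rank the categories whose similarity to the query reaches the lower bound."""
    scored = [(jaccard(features, query), label) for label, features in model.items()]
    kept = sorted((item for item in scored if item[0] >= min_similarity), reverse=True)
    return [(label, similarity) for similarity, label in kept]


# Hypothetical matching model: one feature word set per category.
model = {
    "tcm": {"sweet", "dry mouth", "fatigue", "poor appetite"},
    "western": {"sweet", "thirst", "polyphagia"},
}
query = {"sweet", "fatigue", "poor appetite"}
report = matching_report(model, query)
```

For this query the traditional Chinese medicine set shares 3 of 4 distinct words (similarity 0.75), while the western medicine set shares 1 of 5 (similarity 0.2) and falls below the assumed bound.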

The above description is only for illustrative purposes and does not limit the technical solutions of the present application in any way.

As can be seen from the above description, the present embodiment provides a linear function-based sweet text classification matching system, including: the acquisition module, used for acquiring the characteristic information of the sweet and establishing a sweet characteristic vector set according to the characteristic information of the sweet; the classification module, used for establishing a linear function classification method, classifying the sweet feature vector set according to the linear function classification method, establishing a traditional Chinese medicine sweet feature vector set and a western medicine sweet feature vector set, and merging the two sets into a sweet feature vector matching model; the calculating module, used for establishing a TF-IDF algorithm, acquiring the sweet text information to be matched, selecting sweet feature words from the sweet text information to be matched through the TF-IDF algorithm, and establishing a sweet feature vector set to be matched; and the matching module, used for calculating the similarity between the sweet feature vector matching model and the sweet feature vector set to be matched through the Jaccard similarity coefficient and generating a matching report according to the similarity. Through the linear function classification method, the TF-IDF algorithm and the Jaccard similarity coefficient, the embodiment can accurately match the text information and improves the accuracy of the whole matching process.

In addition, the embodiment of the invention also provides a device for classifying and matching the sweet texts based on the linear function. As shown in fig. 6, the linear function-based sweet text classification matching apparatus includes: an acquisition unit 10, a classification unit 20, a calculation unit 30 and a matching unit 40.

An obtaining unit 10, configured to obtain the characteristic information of the sweet and the characteristic information of the disease, and respectively establish a sweet characteristic vector set and a disease characteristic vector set according to the characteristic information of the sweet and the characteristic information of the disease;

the classification unit 20 is configured to establish a linear function classification method, classify the sweet feature vector set according to the linear function classification method, establish a traditional Chinese medicine sweet feature vector set and a western medicine sweet feature vector set, and merge the traditional Chinese medicine sweet feature vector set and the western medicine sweet feature vector set into a sweet feature vector matching model;

the calculating unit 30 is used for establishing a TF-IDF algorithm, acquiring the sweet text information to be matched, selecting sweet feature words from the sweet text information to be matched through the TF-IDF algorithm, and establishing a sweet feature vector set to be matched;

and the matching unit 40 is used for calculating the similarity between the sweet feature vector matching model and the sweet feature vector set to be matched through the Jaccard similarity coefficient, and generating a matching report according to the similarity.

In addition, it should be noted that the above-described embodiments of the apparatus are merely illustrative, and do not limit the scope of the present invention, and in practical applications, a person skilled in the art may select some or all of the modules to implement the purpose of the embodiments according to actual needs, and the present invention is not limited herein.

In addition, for technical details that are not described in detail in this embodiment, reference may be made to the linear function-based sweet text classification and matching system provided in any embodiment of the present invention; they are not described here again.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A linear function based sweet text classification matching system, comprising:

2. The linear function based sweet text classification matching system of claim 1, characterized by: the acquisition module comprises a processing module and is used for acquiring the characteristic information of the sweet, wherein the characteristic information of the sweet is characteristic information of the symptom accompanied by the sweet, establishing a characteristic information integrity verification rule, verifying the characteristic information of the symptom accompanied by the sweet according to the characteristic information integrity verification rule, and establishing a sweet characteristic vector set according to the characteristic information of the symptom accompanied by the sweet when the verification is passed.

3. The linear function based sweet text classification matching system of claim 2, characterized by: the acquisition module comprises an adding module which is used for acquiring historical sweet accompanying symptom characteristic information, comparing the historical sweet accompanying symptom characteristic information with the sweet accompanying symptom characteristic information, screening out non-repeated historical sweet accompanying symptom characteristic information, and adding the non-repeated historical sweet accompanying symptom characteristic information into the sweet characteristic vector set.

4. The linear function based sweet text classification matching system of claim 3, characterized by: the classification module comprises a classification calculation module for establishing a linear classification function and setting two classification categories, namely traditional Chinese medicine sweet and western medicine sweet; the sweet feature vector set is used as the function vector, the classification categories are used as classification marks, the traditional Chinese medicine sweet feature vector set and the western medicine sweet feature vector set are established by utilizing the linear classification function, and the traditional Chinese medicine sweet feature vector set and the western medicine sweet feature vector set are combined into a sweet feature vector matching model.

5. The linear function based sweet text classification matching system of claim 4, characterized by: the calculation module comprises an algorithm module used for establishing a TF-IDF algorithm to obtain the sweet text information to be matched, calculating the word frequency of each word in the sweet text information to be matched through the TF-IDF algorithm, and taking the word with the calculated word frequency as the word to be screened.

6. The linear function-based sweet text classification matching system of claim 5, characterized by: the calculation module comprises a feature word processing module, a common word bank and a word frequency threshold are set, words to be screened are screened according to the common word bank, after the common words are screened out, the word frequency of the remaining words to be screened is compared with the word frequency threshold, the words to be screened meeting the word frequency threshold are selected as the feature words of the sweet taste, and a feature vector set of the sweet taste to be matched is established.

7. The linear function-based sweet text classification matching system of claim 6, characterized by: the matching module comprises a matching report generating module which is used for establishing Jaccard similarity coefficients, calculating the similarity between the sweet feature vector matching model and the sweet feature vector set to be matched through the Jaccard similarity coefficients, and generating a corresponding matching report according to the similarity.

8. A linear function-based sweet text classification matching device is characterized by comprising: