CN116881799A - Method for classifying cigarette production data - Google Patents


Info

Publication number
CN116881799A
Authority
CN
China
Prior art keywords
data field
service system
words
distance
tobacco
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310862754.7A
Other languages
Chinese (zh)
Inventor
李新建
邹鑫灏
陈小虎
严智
谢超
郭著松
崔书方
潘伟
刘艳超
侯毓
程婉君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Tobacco Hubei Industrial LLC
Original Assignee
China Tobacco Hubei Industrial LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-07-13
Filing date
2023-07-13
Publication date
2023-10-13
Application filed by China Tobacco Hubei Industrial LLC
Priority to CN202310862754.7A
Publication of CN116881799A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for classifying cigarette production data, comprising the following steps: S1, extracting representative words of each business system as core words; S2, forming business-system core word vectors for computing the association distance between every two business systems, using a bag-of-words (BOW) model in which each business system's vector records how many times each core word occurs in that system; S3, normalizing the business-system core word vectors; S4, calculating the association distance between business systems; S5, drawing a two-dimensional distance distribution diagram of the business systems; S6, clustering the business systems with a K-means algorithm based on Euclidean distance, yielding K classes on the resulting two-dimensional plane; S7, selecting an appropriate K value to determine the classification result. Instead of classifying fields in isolation, the method applies horizontal (cross-field) association analysis, performs distance calculation and clustering on a two-dimensional plane, and displays the classification regions visually on that plane as a basis for choosing the final K value and classification result.

Description

Method for classifying cigarette production data
Technical Field
The invention relates to the technical field of data security, and in particular to a method for classifying cigarette production data.
Background
Data classification is the primary task of data security management, and each industry classifies its data according to its own business characteristics. As tobacco manufacturing enterprises undergo digital transformation, their main business processes are being informatized and digitized, and the information they generate is gradually becoming, in various forms, an important digital asset of the enterprise. At the same time, industrial data grows more complex and diverse as application scenarios multiply, and transferring industrial data between businesses without classified management exposes enterprises to security risks at both the technical and the management level. Security threats such as data leakage at tobacco manufacturing enterprises affect not only the enterprises' own interests but also social production and national security. Guiding cigarette manufacturers to standardize the classified management of industrial data and effectively safeguard its security is therefore an urgent problem.
A method for classifying cigarette production data is accordingly provided.
Disclosure of Invention
The invention aims to provide a method for classifying cigarette production data that addresses the insufficient automation and granularity of data classification in tobacco enterprises and the heavy manual effort it requires.
To achieve the above purpose, the present invention provides the following technical solution: a method for classifying cigarette production data, characterized by comprising the following steps:
S1, extracting representative words of each business system as core words;
S2, forming business-system core word vectors for calculating the association distance between every two business systems, using a bag-of-words (BOW) model in which each business system's core word vector records the number of times each core word occurs in that business system;
S3, normalizing the business-system core word vectors, retaining only valuable words; the weight $d_i$ of word $i$ is

$$d_i = \frac{c_i}{\sum_{j} c_j}$$

where $c_i$ is the number of times word $i$ appears in the fields of the business system and the denominator is the total number of occurrences of all useful words;
S4, calculating the association distance between business systems as the Euclidean distance between their core word vectors:

$$D(X_i, X_j) = \sqrt{\sum_{k} (x_{ik} - x_{jk})^2}$$

where $x_{ik}$ is the $k$-th component of the normalized core word vector $X_i$;
S5, drawing a two-dimensional distance distribution diagram of the business systems: one system is selected as the center point of the plane; the distance between each other system and this system is used as the length of its connecting line; points at equal distance are arranged on the same circle, with more distant points placed farther out, until all systems are laid out on the plane; each point represents a business system, and each line represents the association distance between two business systems;
S6, clustering the business systems using a K-means clustering algorithm based on Euclidean distance, finally obtaining K classes on the resulting two-dimensional plane;
S7, determining the classification result by selecting an appropriate K value.
Further, the representative words associated with a business system, such as time, place, person, action and result, are used as core words.
Further, in step S2, the representative words of business system 1 include name, mobile phone number, tobacco producing area, tobacco price, cut-tobacco making, temperature and logistics; the representative words of business system 2 include name, home address, mobile phone number, tobacco producing area, tobacco price, rolling and packing, and activity;
the representative words of business system 3 include mobile phone number, tobacco producing area, tobacco price, rolling and packing, and activity;
the representative words of business system 4 include name, mobile phone number, tobacco producing area, tobacco price, and cut-tobacco making (occurring twice);
the representative words of business system 5 include name, mobile phone number, rolling and packing, and activity;
in step S2, any two of the five business systems may be selected.
Further, in step S2, a word bag is constructed from two business systems, namely business system 1 and business system 2:
dictionary = {1: "name", 2: "mobile phone number", 3: "tobacco producing area", 4: "tobacco price", 5: "cut-tobacco making", 6: "home address", 7: "rolling and packing", 8: "activity", 9: "temperature", 10: "logistics"}.
Further, in step S6:
when K = 5 is chosen, clustering yields: 1) a development data field, 2) a production data field, 3) a management data field, 4) an operation and maintenance data field, 5) an external data field;
when K = 6 is chosen, clustering yields: 1) a development data field, 2) a production data field, 3) an administration data field, 4) a management data field, 5) a support data field, 6) an external data field;
when K = 7 is chosen, clustering yields: 1) a development data field, 2) a production data field, 3) a production data field, 4) an administration data field, 5) quality process control data, 6) plant operation control data, 7) production process control data.
Further, in step S7, K = 6 is finally selected based on business judgment, and clustering yields: 1) a development data field, 2) a production data field, 3) an administration data field, 4) a management data field, 5) a support data field, 6) an external data field.
Further, in step S5, if some connecting lines cannot all be drawn on the plane at the same time while connecting several points, the connection with the larger value is discarded.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method departs from the traditional approach of classifying each field definition in isolation: it applies horizontal (cross-field) association analysis, performs distance calculation, clustering and classification on a two-dimensional plane, and presents the result visually.
(2) The K-means clustering algorithm based on Euclidean distance works on the relative distances between the core points, so choosing a different system as the center point of the two-dimensional plot does not produce a different result.
(3) Adjusting the K value yields different classification results; the classification regions can be displayed intuitively on the two-dimensional plane, providing a reference for determining the final K value and classification result.
Drawings
FIG. 1 is a flow chart of the classification of production data according to an embodiment of the present invention;
FIG. 2 is a two-dimensional distance distribution diagram of a business system according to an embodiment of the present invention;
FIG. 3 is a diagram of normalized core word vectors of a business system according to an embodiment of the present invention;
FIG. 4 is a business system associated distance graph of an embodiment of the present invention;
FIG. 5 is a two-dimensional distance distribution diagram of a business system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to FIGS. 1 to 5, the present invention provides a technical solution: a method for classifying cigarette production data, comprising the following steps.
As shown in FIG. 1, the steps are as follows:
1. Extract representative words of each business system as core words. Fields with non-entity content, such as yes/no ("whether") flags and status fields, are removed; representative words associated with the business system, such as time, place, person, action and result, are kept as core words. The extraction objects are data fields taken from the business system's data descriptions, such as database table fields, transmission interface (API) fields, message fields, JSON or XML data packets, and data files; examples include name, age, tobacco location and baking time.
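The text does not state a concrete rule for recognizing non-entity fields. A minimal sketch, assuming that yes/no flags are named with an `is_`/`has_` prefix and that state fields contain `status` or `flag`; these markers and the function names `is_non_entity` and `extract_core_fields` are hypothetical and would need to be adapted to each business system's naming conventions.

```python
# Hypothetical naming markers for non-entity fields; the text gives no exact rule.
FLAG_PREFIXES = ("is_", "has_")        # yes/no ("whether ...") fields
STATE_SUBSTRINGS = ("status", "flag")  # status / state-code fields


def is_non_entity(field_name: str) -> bool:
    """True for fields that only carry a yes/no flag or a state code (assumed naming)."""
    lowered = field_name.lower()
    return lowered.startswith(FLAG_PREFIXES) or any(s in lowered for s in STATE_SUBSTRINGS)


def extract_core_fields(field_names: list[str]) -> list[str]:
    """Keep entity-like fields (name, place, time, ...) as core-word candidates."""
    return [name for name in field_names if not is_non_entity(name)]


if __name__ == "__main__":
    fields = ["name", "age", "is_deleted", "order_status", "tobacco_origin", "baking_time"]
    print(extract_core_fields(fields))  # ['name', 'age', 'tobacco_origin', 'baking_time']
```

A prefix test is used for flags so that names such as `analysis_result` are not dropped merely because they contain the letters `is_`.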
2. Calculate the association distance between every two business systems by forming business-system core word vectors. A bag-of-words (BOW) model is used: each business system's core word vector records how many times each core word occurs in that business system.
For example, consider two business systems:
Business system 1 representative words: name, mobile phone number, tobacco producing area, tobacco price, cut-tobacco making, temperature, mobile phone number, logistics.
Business system 2 representative words: name, home address, mobile phone number, tobacco producing area, tobacco price, rolling and packing, activity.
Based on these two business systems, a dictionary (the word bag) is constructed:
dictionary = {1: "name", 2: "mobile phone number", 3: "tobacco producing area", 4: "tobacco price", 5: "cut-tobacco making", 6: "home address", 7: "rolling and packing", 8: "activity", 9: "temperature", 10: "logistics"}.
The dictionary contains 10 distinct words. Using the dictionary's index numbers, each of the two business systems can be represented by a 10-dimensional vector in which each element is an integer from 0 to n (n a positive integer) giving the number of times the corresponding word appears in the business system's fields:
1) X1 = [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
2) X2 = [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
Each element of the vector gives the number of times the corresponding dictionary word appears in the business system. Note that the vector does not preserve the order in which words appear in the original system; only occurrence counts are recorded.
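The bag-of-words counting above can be sketched in a few lines of Python. This is an illustration only: the English dictionary labels are translations chosen for the example, and `bow_vector` is a hypothetical helper name, not part of the patent.

```python
from collections import Counter

# The 10-word dictionary from the example (indices 1..10 in the text, 0..9 here).
DICTIONARY = [
    "name", "mobile phone number", "tobacco producing area", "tobacco price",
    "cut-tobacco making", "home address", "rolling and packing", "activity",
    "temperature", "logistics",
]


def bow_vector(words: list[str], dictionary: list[str] = DICTIONARY) -> list[int]:
    """Count how often each dictionary word occurs; word order is discarded."""
    counts = Counter(words)
    return [counts[term] for term in dictionary]


system_1 = ["name", "mobile phone number", "tobacco producing area", "tobacco price",
            "cut-tobacco making", "temperature", "mobile phone number", "logistics"]
system_2 = ["name", "home address", "mobile phone number", "tobacco producing area",
            "tobacco price", "rolling and packing", "activity"]

if __name__ == "__main__":
    print(bow_vector(system_1))  # [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
    print(bow_vector(system_2))  # [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
```

`Counter` returns 0 for missing keys, so words absent from a system simply contribute a zero component.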
3. Normalize the business-system core word vectors, retaining only valuable words. The weight $d_i$ of word $i$ is

$$d_i = \frac{c_i}{\sum_{j} c_j}$$

where $c_i$ is the number of times word $i$ appears in the business system's fields and the denominator is the total number of occurrences of all useful words.
Normalizing the two example core word vectors above gives:
1) X1 = [0.125, 0.25, 0.125, 0.125, 0.125, 0, 0, 0, 0.125, 0.125]
2) X2 = [0.14, 0.14, 0.14, 0.14, 0.00, 0.14, 0.14, 0.14, 0.00, 0.00]
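The normalization $d_i = c_i / \sum_j c_j$ can be checked against the two example vectors with the short sketch below. The zero-total guard is an added assumption (the text does not discuss a system with no core words), and `normalize` is a hypothetical helper name.

```python
def normalize(vector: list[int]) -> list[float]:
    """d_i = c_i / sum_j c_j: each count divided by the total core-word occurrences."""
    total = sum(vector)
    if total == 0:
        return [0.0] * len(vector)  # assumed handling: a system with no core words
    return [count / total for count in vector]


x1 = normalize([1, 2, 1, 1, 1, 0, 0, 0, 1, 1])
x2 = normalize([1, 1, 1, 1, 0, 1, 1, 1, 0, 0])

if __name__ == "__main__":
    print(x1)                         # [0.125, 0.25, 0.125, 0.125, 0.125, 0.0, 0.0, 0.0, 0.125, 0.125]
    print([round(v, 2) for v in x2])  # [0.14, 0.14, 0.14, 0.14, 0.0, 0.14, 0.14, 0.14, 0.0, 0.0]
```

X1 has 8 occurrences in total and X2 has 7, which is why their entries are 1/8 and 1/7 (about 0.14) respectively.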
4. Calculate the association distance between business systems as the Euclidean distance between their core word vectors.
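A minimal sketch of the pairwise Euclidean distance $D(X_i, X_j) = \sqrt{\sum_k (x_{ik} - x_{jk})^2}$, applied to the normalized core word vectors of the five-system example given later in this description. The helper names are hypothetical, and the printed values come from this sketch rather than from the patent's FIG. 4.

```python
import math


def normalize(vector):
    """Step 3: d_i = c_i / sum_j c_j (all zeros if the system has no core words)."""
    total = sum(vector)
    return [c / total for c in vector] if total else [0.0] * len(vector)


def euclidean(a, b):
    """Step 4: D(X_i, X_j) = sqrt(sum_k (x_ik - x_jk)^2)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distance_matrix(vectors):
    """Symmetric matrix of pairwise association distances."""
    return [[euclidean(a, b) for b in vectors] for a in vectors]


# Core word counts X1..X5 from the five-system example.
COUNTS = [
    [1, 2, 1, 1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 2, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 1, 1, 0, 0],
]
D = distance_matrix([normalize(c) for c in COUNTS])

if __name__ == "__main__":
    for i, row in enumerate(D, start=1):
        print(f"X{i}", " ".join(f"{d:.3f}" for d in row))
```

With these inputs the smallest off-diagonal entry is D(X2, X3) = sqrt(2/35), about 0.239, so X2 and X3 are the most similar pair.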
5. Draw a two-dimensional distance distribution diagram of the business systems. One system is selected as the center point of the plane. The distance from each other system to this system is used as the length of its connecting line; points at equal distance are arranged on the same circle, with more distant points placed farther out, until all systems are laid out on the plane, as shown in FIG. 2. Points represent business systems, and lines represent the association distances between them. (If some connecting lines cannot all be drawn on the plane at the same time while connecting several points, the connection with the larger value is discarded.)
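The layout rule above leaves open how points on the same circle are spaced. A minimal sketch, assuming the other systems are spread at evenly spaced angles in order of increasing distance; `radial_layout` is a hypothetical name, and this is one possible reading of step 5, not the patent's exact procedure. It preserves only the distances to the center point, which is consistent with the note that some connecting lines between other points cannot all be drawn exactly.

```python
import math


def radial_layout(dist, center=0):
    """Place `center` at the origin and each other system on a circle whose radius is
    its association distance to the center; angles are evenly spaced (assumption)."""
    others = sorted((j for j in range(len(dist)) if j != center), key=lambda j: dist[center][j])
    coords = {center: (0.0, 0.0)}
    for rank, j in enumerate(others):
        angle = 2 * math.pi * rank / len(others)
        radius = dist[center][j]  # farther systems land farther out
        coords[j] = (radius * math.cos(angle), radius * math.sin(angle))
    return coords


if __name__ == "__main__":
    dist = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.5], [2.0, 1.5, 0.0]]
    for system, (x, y) in sorted(radial_layout(dist).items()):
        print(system, round(x, 3), round(y, 3))
```

Systems with equal distance to the center automatically share a circle; the distance between two non-center points in the plot generally differs from their computed association distance.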
6. Cluster the business systems using the K-means clustering algorithm based on Euclidean distance: the closer two targets are, the more similar they are. K classes are finally obtained on the two-dimensional plane of FIG. 2.
When K = 5 is chosen, clustering yields: 1) a development data field, 2) a production data field, 3) a management data field, 4) an operation and maintenance data field, 5) an external data field.
When K = 6 is chosen, clustering yields: 1) a development data field, 2) a production data field, 3) an administration data field, 4) a management data field, 5) a support data field, 6) an external data field.
When K = 7 is chosen, clustering yields: 1) a development data field, 2) a production data field, 3) a production data field, 4) an administration data field, 5) quality process control data, 6) plant operation control data, 7) production process control data.
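A minimal sketch of K-means with Euclidean distance (Lloyd's algorithm) as used in step 6. The text does not specify how initial centers are chosen; this sketch assumes a deterministic farthest-first seeding so that results are repeatable, and the function name `kmeans` is illustrative.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with Euclidean distance and deterministic farthest-first seeding.

    Returns (labels, centers): labels[i] is the cluster index of points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Seeding (assumed): start from the first point, then repeatedly add the
    # point farthest from its nearest existing center.
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centers[c]))

    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p) for p in points]  # assignment step
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    labels, centers = kmeans(pts, k=2)
    print(labels)   # [0, 0, 1, 1]
    print(centers)  # [[0.0, 0.5], [10.0, 10.5]]
```

Running the same function with different K values is how the alternative groupings for K = 5, 6 and 7 would be produced and compared.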
7. Determine the classification result. K = 6 is finally selected based on business judgment, and clustering yields: 1) a development data field, 2) a production data field, 3) an administration data field, 4) a management data field, 5) a support data field, 6) an external data field.
The method is characterized by the following:
Classification by business-system clustering: (1) extract representative words of each business system as core words; (2) form word vectors with a bag-of-words model; (3) cluster with the K-means algorithm; (4) examine the classification results for different K values.
Visual display of the classification result on a two-dimensional plane: (1) select one system as the center point of the plane; the other systems are placed at their distance from this system, points at equal distance are arranged on the same circle, and more distant points are placed farther out; points represent business systems and lines represent the association distances between them; (2) select different K values to obtain different clustering results and present them visually.
For example, consider the following five business systems:
Step 1: Extract representative words of each business system as core words.
Business system 1 representative words: name, mobile phone number, tobacco producing area, tobacco price, cut-tobacco making, temperature, mobile phone number, logistics;
Business system 2 representative words: name, home address, mobile phone number, tobacco producing area, tobacco price, rolling and packing, activity;
Business system 3 representative words: mobile phone number, tobacco producing area, tobacco price, rolling and packing, activity;
Business system 4 representative words: name, mobile phone number, tobacco producing area, tobacco price, cut-tobacco making, cut-tobacco making;
Business system 5 representative words: name, mobile phone number, rolling and packing, activity;
Any two of the five business systems may be selected for comparison.
Step 2: Calculate the association distance between every two business systems by forming the business-system core word vectors.
Based on business systems 1 and 2, a dictionary (the word bag) is constructed; business systems 3 to 5 introduce no new words:
dictionary = {1: "name", 2: "mobile phone number", 3: "tobacco producing area", 4: "tobacco price", 5: "cut-tobacco making", 6: "home address", 7: "rolling and packing", 8: "activity", 9: "temperature", 10: "logistics"}.
1) X1 = [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
2) X2 = [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
3) X3 = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
4) X4 = [1, 1, 1, 1, 2, 0, 0, 0, 0, 0]
5) X5 = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
Step 3: Normalize the business-system core word vectors; the results are shown in FIG. 3.
Step 4: Calculate the association distance D(X_i, X_j) between business systems; the results are shown in FIG. 4.
Step 5: Draw the two-dimensional distance distribution diagram of the business systems, as shown in FIG. 5.
Step 6: Cluster the business systems. With K = 1, all systems fall into a single class. With K = 4, X2 and X3 fall into the same class and each of the other three systems forms a class of its own.
Step 7: Determine the classification result by selecting an appropriate K value. This example recommends K = 4.
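The five-system example can be reproduced end to end with the sketch below. Two assumptions differ from the text and are stated plainly: it clusters the normalized 10-dimensional core word vectors directly, because the two-dimensional coordinates of FIG. 5 are not given, and it uses a farthest-first seeding that the patent does not specify. `cluster_systems` is a hypothetical helper name.

```python
import math

# Core word counts X1..X5 from the five-system example (dictionary order as in the text).
COUNTS = {
    "X1": [1, 2, 1, 1, 1, 0, 0, 0, 1, 1],
    "X2": [1, 1, 1, 1, 0, 1, 1, 1, 0, 0],
    "X3": [0, 1, 1, 1, 0, 0, 1, 1, 0, 0],
    "X4": [1, 1, 1, 1, 2, 0, 0, 0, 0, 0],
    "X5": [1, 1, 0, 0, 0, 0, 1, 1, 0, 0],
}


def normalize(vector):
    """Step 3: d_i = c_i / sum_j c_j."""
    total = sum(vector)
    return [c / total for c in vector] if total else [0.0] * len(vector)


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means (Euclidean) with deterministic farthest-first seeding (assumed)."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(math.dist(p, c) for c in centers))))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels


def cluster_systems(counts, k):
    """Group business systems into k clusters of their normalized core word vectors."""
    names = list(counts)
    labels = kmeans([normalize(counts[n]) for n in names], k)
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(label, []).append(name)
    return sorted(groups.values())


if __name__ == "__main__":
    for k in (1, 4):
        print(k, cluster_systems(COUNTS, k))
    # 1 [['X1', 'X2', 'X3', 'X4', 'X5']]
    # 4 [['X1'], ['X2', 'X3'], ['X4'], ['X5']]
```

Under these assumptions the K = 4 grouping matches the one reported above (X2 and X3 together), since X2 and X3 are the closest pair in the normalized space.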
Conventional classification labels data fields by looking at each field's attributes in isolation, that is, at a single field point. It does not consider that sensitive data can arise from combining several fields horizontally, or from data volume accumulating vertically until it reaches a sensitive level. This method instead performs association analysis horizontally across multiple fields, constructs business-system core word vectors, calculates the similarity distances between business systems, and performs clustering and classification on a two-dimensional plane.
The method classifies tobacco enterprise data through field semantic association analysis: based on a large model, it detects and computes associations among general personnel data and tobacco rolling-and-packing and cut-tobacco making data. This classification is finer-grained than traditional classification and more conducive to the sharing and flow of business data.
The method departs from the traditional approach of classifying field definitions in isolation. Based on a large model, it combines horizontal semantic association analysis across fields with weighting of vertically accumulated fields for external sharing, and inherits the manually assigned classification baseline along a third dimension, so that clustering and classification are performed in three-dimensional space.
Traditional manual classification labeling involves a huge workload, is hard to memorize, and lacks granularity. It looks at each field separately and relies on business staff's interpretation and understanding of the fields. However, the data of all business systems carries meaning much as words do in sentences, so fields cannot be viewed in isolation: data is purposefully and meaningfully selected and presented during generation, warehousing and sharing. For example, consider a query for the product pass rate of the cut-tobacco making workshop of a cigarette factory for 2 months. Although the final output is a single percentage, it involves associated fields such as time, total product quantity and qualified product quantity. Traditional classification labels look only at field definitions and do not perform weighted analysis of related fields.
Primary classification: the cigarette production data undergoes a horizontal multi-field association analysis process, as shown in FIG. 1. Through multi-field association analysis, a clustering algorithm extracts representative words (each associated with several words) of the main business systems in the tobacco industry as core words. Starting from these word associations, first-, second- and third-order associations are then extended: for example, if A is associated with B, B with C, and C with D, the attenuation of the weights along the chain is calculated to obtain the association between A and D. All words are aggregated toward the core words as far as possible to build an industry word list, and the clustering algorithm is applied to the industry word list to obtain the data classification.
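The text does not define how weights attenuate along an association chain. A minimal sketch, assuming each link has a weight in (0, 1], a chain's score is the product of its link weights, and chains are limited to three hops (primary, secondary and tertiary association); the function name `propagate_associations` and the example weights are hypothetical.

```python
def propagate_associations(edges, source, max_hops=3):
    """Strongest attenuated association from `source` to each word within `max_hops` links.

    edges: (word_a, word_b, weight) with weight in (0, 1]; a chain's score is the
    product of its link weights, so every extra hop attenuates the association.
    """
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))  # associations treated as symmetric
    best = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(max_hops):  # hop 1 = primary, 2 = secondary, 3 = tertiary
        next_frontier = {}
        for word, score in frontier.items():
            for neighbor, weight in graph.get(word, []):
                candidate = score * weight
                if candidate > best.get(neighbor, 0.0):
                    best[neighbor] = candidate
                    next_frontier[neighbor] = candidate
        frontier = next_frontier
    best.pop(source)
    return best


if __name__ == "__main__":
    chain = [("A", "B", 0.8), ("B", "C", 0.5), ("C", "D", 0.5)]
    print(propagate_associations(chain, "A"))  # A-D is reached through B and C
```

With the example chain, the A-D association is 0.8 x 0.5 x 0.5 = 0.2; with `max_hops=2`, D is not reached at all.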
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A method for classifying cigarette production data, characterized by comprising the following steps:
S1, extracting representative words of each business system as core words;
S2, forming business-system core word vectors for calculating the association distance between every two business systems, using a bag-of-words (BOW) model in which each business system's core word vector records the number of times each core word occurs in that business system;
S3, normalizing the business-system core word vectors, wherein
valuable words are retained and the weight $d_i$ of word $i$ is

$$d_i = \frac{c_i}{\sum_{j} c_j}$$

where $c_i$ is the number of times word $i$ appears in the fields of the business system and the denominator is the total number of occurrences of all useful words;
S4, calculating the association distance between business systems as the Euclidean distance between their core word vectors:

$$D(X_i, X_j) = \sqrt{\sum_{k} (x_{ik} - x_{jk})^2}$$

where $x_{ik}$ is the $k$-th component of the normalized core word vector $X_i$;
S5, drawing a two-dimensional distance distribution diagram of the business systems: one system is selected as the center point of the plane; the distance between each other system and this system is used as the length of its connecting line; points at equal distance are arranged on the same circle, with more distant points placed farther out, until all systems are laid out on the plane; each point represents a business system, and each line represents the association distance between two business systems;
S6, clustering the business systems using a K-means clustering algorithm based on Euclidean distance, finally obtaining K classes on the resulting two-dimensional plane;
S7, determining the classification result by selecting an appropriate K value.
2. The method for classifying cigarette production data according to claim 1, wherein representative words associated with a business system, such as time, place, person, action and result, are used as core words.
3. The method for classifying cigarette production data according to claim 1 or 2, wherein in step S2, the representative words of business system 1 include name, mobile phone number, tobacco producing area, tobacco price, cut-tobacco making, temperature and logistics; the representative words of business system 2 include name, home address, mobile phone number, tobacco producing area, tobacco price, rolling and packing, and activity;
the representative words of business system 3 include mobile phone number, tobacco producing area, tobacco price, rolling and packing, and activity;
the representative words of business system 4 include name, mobile phone number, tobacco producing area, tobacco price, and cut-tobacco making (occurring twice);
the representative words of business system 5 include name, mobile phone number, rolling and packing, and activity;
in step S2, any two of the five business systems may be selected.
4. The method for classifying cigarette production data according to claim 3, wherein in step S2, a word bag is constructed from two business systems, namely business system 1 and business system 2:
dictionary = {1: "name", 2: "mobile phone number", 3: "tobacco producing area", 4: "tobacco price", 5: "cut-tobacco making", 6: "home address", 7: "rolling and packing", 8: "activity", 9: "temperature", 10: "logistics"}.
5. The method for classifying cigarette production data according to claim 3, wherein in step S6:
when K = 5 is chosen, clustering yields: 1) a development data field, 2) a production data field, 3) a management data field, 4) an operation and maintenance data field, 5) an external data field;
when K = 6 is chosen, clustering yields: 1) a development data field, 2) a production data field, 3) an administration data field, 4) a management data field, 5) a support data field, 6) an external data field;
when K = 7 is chosen, clustering yields: 1) a development data field, 2) a production data field, 3) a production data field, 4) an administration data field, 5) quality process control data, 6) plant operation control data, 7) production process control data.
6. The method for classifying cigarette production data according to claim 5, wherein in step S7, K = 6 is finally selected based on business judgment, and clustering yields: 1) a development data field, 2) a production data field, 3) an administration data field, 4) a management data field, 5) a support data field, 6) an external data field.
7. The method for classifying cigarette production data according to claim 1, wherein in step S5, if some connecting lines cannot all be drawn on the plane at the same time while connecting several points, the connection with the larger value is discarded.
CN202310862754.7A 2023-07-13 2023-07-13 Method for classifying cigarette production data Pending CN116881799A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310862754.7A CN116881799A (en) 2023-07-13 2023-07-13 Method for classifying cigarette production data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310862754.7A CN116881799A (en) 2023-07-13 2023-07-13 Method for classifying cigarette production data

Publications (1)

Publication Number Publication Date
CN116881799A true CN116881799A (en) 2023-10-13

Family

ID=88254404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310862754.7A Pending CN116881799A (en) 2023-07-13 2023-07-13 Method for classifying cigarette production data

Country Status (1)

Country Link
CN (1) CN116881799A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117093935A (en) * 2023-10-16 2023-11-21 深圳海云安网络安全技术有限公司 Classification method and system for service system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117093935A (en) * 2023-10-16 2023-11-21 深圳海云安网络安全技术有限公司 Classification method and system for service system
CN117093935B (en) * 2023-10-16 2024-03-19 深圳海云安网络安全技术有限公司 Classification method and system for service system

Similar Documents

Publication Publication Date Title
CN109359244B (en) Personalized information recommendation method and device
CN107515873B (en) Junk information identification method and equipment
CN109408712B (en) Construction method of multidimensional information portrait of travel agency user
CN106997549A (en) The method for pushing and system of a kind of advertising message
CN107578270A (en) A kind of construction method, device and the computing device of financial label
CN104077407B (en) A kind of intelligent data search system and method
CN109460519B (en) Browsing object recommendation method and device, storage medium and server
CN111161021B (en) Quick secondary sorting method for recommended commodities based on real-time characteristics
CN105843796A (en) Microblog emotional tendency analysis method and device
CN116881799A (en) Method for classifying cigarette production data
CN106897359A (en) Internet information is collected and correlating method
CN109582783B (en) Hot topic detection method and device
CN106919699A (en) A kind of recommendation method for personalized information towards large-scale consumer
CN114266443A (en) Data evaluation method and device, electronic equipment and storage medium
CN112256865B (en) Chinese text classification method based on classifier
Wei et al. Online education recommendation model based on user behavior data analysis
Efendi et al. Sentiment Analysis of Food Order Tweets to Find Out Demographic Customer Profile Using SVM
CN105354720B (en) A method of mixed recommendation is carried out to consumption place based on visual cluster
CN112434126B (en) Information processing method, device, equipment and storage medium
CN111444337B (en) Topic tracking method based on improved KL divergence
CN113836434A (en) Web page data processing method based on database
Guo et al. EC-Structure: Establishing consumption structure through mining e-commerce data to discover consumption upgrade
CN108287902B (en) Recommendation system method based on data non-random missing mechanism
CN112989165A (en) Method for calculating public opinion entity relevance
He Statistical Interpretation and Modeling Analysis of Multidimensional Complicated Computer Data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication