CN110347828A - Method and system for dynamically acquiring metro passenger demand - Google Patents
- Publication number: CN110347828A
- Application number: CN201910561357.XA
- Authority: CN (China)
- Prior art keywords: text, cluster, demand, word, follows
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Abstract
The invention discloses a method and system for dynamically acquiring metro passenger demand. The method comprises the following steps. Step 1: build a demand dictionary and obtain user posts from a social network platform. Step 2: preprocess the acquired data. Step 3: use an SVM classifier to filter out text unrelated to metro passenger demand. Step 4: cluster the text by relevance. Step 5: assign each cluster a label as its demand item and calculate the importance of the demand item. Step 6: first check whether the demand item already exists in the demand dictionary; if so, exit; otherwise check whether its importance and its relative propagation persistence both meet preset thresholds; if they do, a new demand item has been found and is added to the demand dictionary; if not, exit. The invention can process large volumes of user posts, improves the efficiency of acquiring user demand, and has low subjectivity; demand preferences and potential user demand can be obtained in real time from massive numbers of user posts.
Description
Technical field
The invention relates to the acquisition of metro passenger demand, and in particular to a method for dynamically acquiring metro passenger demand and a system implementing it.
Background art
Over the past decade or so, railway transport capacity has grown steadily and passenger volume has risen with it. Growth in metro and high-speed rail passenger volume and turnover will further increase track mileage and the number of rail vehicles ordered. This presents metro vehicle manufacturers with both opportunities and challenges. The customers of a rail vehicle manufacturer include both operating companies and passengers. However, manufacturers currently focus mainly on the needs of operating companies and lack analysis of passenger demand, which lowers end customers' satisfaction with the manufacturer's products and hinders the manufacturer's market competitiveness.
Passenger demand comprises passenger demand items and their importance, both of which change dynamically over time. Existing demand acquisition methods, such as questionnaires, require considerable manpower to capture dynamic passenger demand and are also quite subjective. Both factors constrain rail vehicle manufacturers' ability to analyze passenger demand.
Summary of the invention
The present invention provides a method and system for dynamically acquiring metro passenger demand that acquires data efficiently and with low subjectivity.
The technical solution adopted by the present invention is a method for dynamically acquiring metro passenger demand, comprising the following steps:
Step 1: build a demand dictionary, and obtain user posts from a social network platform according to the demand dictionary;
Step 2: preprocess the data obtained in step 1;
Step 3: use an SVM classifier to filter out text unrelated to metro passenger demand;
Step 4: cluster the text filtered in step 3 by relevance, using K-means clustering corrected by the silhouette coefficient;
Step 5: for each cluster from step 4, assign a label as the demand item and calculate the importance of the demand item;
Step 6: for each demand item obtained in step 5, first check whether it already exists in the demand dictionary; if so, exit; otherwise check whether its importance and relative propagation persistence both meet preset thresholds; if they do, a new demand item has been found and is added to the demand dictionary; if not, exit.
Further, the data in step 1 is obtained as follows:
The words in the demand dictionary are used as keywords to search the social network platform for user posts, and the text data is then collected by a web crawler.
Further, step 3 proceeds as follows:
S11: randomly sample the text preprocessed in step 2 to generate training samples and test samples;
S12: label relevant and irrelevant text in the training samples and determine their feature words: compute the information entropy of the training samples and the information gain of each word, and take the words whose gain exceeds a set threshold as feature words;
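The information entropy IG(X) of the training samples is calculated as follows:
$$IG(X) = -\frac{N_1}{N_1+N_2}\log_2\frac{N_1}{N_1+N_2} - \frac{N_2}{N_1+N_2}\log_2\frac{N_2}{N_1+N_2}$$
where X is the training sample set, and $N_1$ and $N_2$ are the numbers of relevant and irrelevant texts, respectively;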
The information gain IG(word) of each word is calculated as follows:
$$IG(word) = IG(X) + \frac{A+B}{N_1+N_2}\left[\frac{A}{A+B}\log_2\frac{A}{A+B} + \frac{B}{A+B}\log_2\frac{B}{A+B}\right] + \frac{C+D}{N_1+N_2}\left[\frac{C}{C+D}\log_2\frac{C}{C+D} + \frac{D}{C+D}\log_2\frac{D}{C+D}\right]$$
where word is a word in the training sample set, A and B are the numbers of relevant and irrelevant texts in which the word appears, and C and D are the numbers of relevant and irrelevant texts in which it does not appear;
S13: compute the feature value of each feature word in each text, and represent each text as a feature value vector;
S14: build an SVM classifier from the training samples and refine it with the test samples;
S15: use the support vector classifier obtained in step S14 to classify the data into demand-relevant text and irrelevant text, and remove the irrelevant text.
Further, the silhouette-coefficient-corrected K-means clustering in step 4 first performs K-means clustering and then determines the optimal number of clusters k from the silhouette coefficient;
The K-means clustering process is as follows:
Determine the sum of squared distances $dist(S_k)$ from each point in a cluster to the cluster center:
$$dist(S_k) = \sum_{i=1}^{n_s} \lVert x_i - u_k \rVert^2$$
where $S_k$ is the text set of each cluster, $x_i$ is the feature value vector of a text in cluster $S_k$, $n_s$ is the number of texts in cluster $S_k$, $u_k$ is the cluster center of $S_k$, and i is the index of a text within the cluster;
where $u_k$ is:
$$u_k = \frac{1}{n_s}\sum_{i=1}^{n_s} x_i$$
The sum of squared distances dist(S) from all samples in the clustering domain to their cluster centers is:
$$dist(S) = \sum_{j=1}^{k} \sum_{x_i \in S_j} \lVert x_i - u_j \rVert^2$$
where k is the number of clusters, S is the total text collection, and j is the index of each cluster in the text collection;
The silhouette coefficient $L(x_i)$ is:
$$L(x_i) = \frac{b(x_i) - a(x_i)}{\max\{a(x_i),\, b(x_i)\}}$$
where $a(x_i)$ is the average distance between text $x_i$ and all other texts in its own cluster, and $b(x_i)$ is the average distance between $x_i$ and all texts in a cluster other than its own;
The mean silhouette coefficient $L(x)_k$ is:
$$L(x)_k = \frac{1}{N}\sum_{i=1}^{N} L(x_i)$$
where N is the number of texts in the whole text set;
The number of clusters k at which the mean silhouette coefficient is largest is the optimal number of clusters.
Further, the importance in step 5 is calculated as follows:
S21: the propagation heat $r_k$ is:
where $n_s$ is the number of texts in each cluster, $Z_i$ is the repost count of the i-th text in the cluster, $D_i$ is the like count of the i-th text in the cluster, $P_i$ is the comment count of the i-th text in the cluster, $w_1$, $w_2$ and $w_3$ are constants, and k is the cluster number;
S22: correct the propagation heat by the propagation range:
$$r'_k = r_k \times g_k$$
where $r'_k$ is the corrected propagation heat, $g_k$ is the propagation range, $g_k = l_s / n_s$, and $l_s$ is the number of users who posted in the cluster;
S23: the importance $R_k$ is calculated as follows:
where S is the total text collection, $r'_i$ is the corrected propagation heat of the i-th demand item, and i is the demand item index.
Further, the relative propagation persistence in step 6 is calculated as follows:
S31: the propagation persistence $j_k$ is:
where $r'_{k0}$, $r'_{k1}$ and $r'_{k2}$ are the propagation heats obtained in three consecutive periods, $r'_{k0}$ being the propagation heat obtained in the current period;
S32: the relative propagation persistence $J_k$ is:
where S is the total text collection, $j_i$ is the propagation persistence of the i-th demand item, and i is the demand item index.
Further, the feature value in step S13 is measured by term frequency-inverse document frequency (TF-IDF), which is calculated as follows:
TF-IDF(word) = TF(word) × IDF(word)
where TF(word) is the frequency with which a word occurs in one text, and IDF(word) is the inverse document frequency of the word in the text set, reflecting how often it occurs in other texts;
TF(word) and IDF(word) are computed as follows:
where W(word) is the number of occurrences of the word in a text, W is the total number of words in that text, F is the total number of words in the training samples, and F(word) is the number of occurrences of the word in the training samples.
A system for dynamically acquiring metro passenger demand, characterized in that it comprises a data acquisition module, a text preprocessing module, a text filtering module, a text clustering module, a demand extraction module, a new demand evaluation module and a demand dictionary;
The demand dictionary stores demand items related to metro vehicle passenger demand;
The data acquisition module obtains post data from the social network platform;
The text preprocessing module preprocesses the acquired text;
The text filtering module filters out text unrelated to passenger demand;
The text clustering module clusters the filtered text data by relevance;
The demand extraction module extracts the demand item of each cluster;
The new demand evaluation module determines whether a demand item is already in the demand dictionary and updates the demand dictionary.
The beneficial effects of the present invention are:
(1) the invention obtains a large number of user posts through a web crawler and derives passenger demand from them, which improves the efficiency of acquiring user demand and reduces subjectivity;
(2) the invention can analyze the dynamic demand of massive numbers of users in real time, continuously capture passenger demand preferences, and derive valid passenger demand importance from them;
(3) the invention can discover emerging and potential user demand automatically and in real time.
Brief description of the drawings
Fig. 1 is a flow diagram of the method of the present invention.
Fig. 2 is a schematic diagram of the silhouette coefficient calculation results in an embodiment of the present invention.
Fig. 3 is a structural diagram of the system of the present invention.
Fig. 4 is a schematic diagram of passenger demand trends in an embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and specific embodiments.
As shown in Fig. 1, a method for dynamically acquiring metro passenger demand comprises the following steps:
Step 1: build a demand dictionary, and obtain user posts from a social network platform according to the demand dictionary;
User posts are obtained from the social network platform based on the demand dictionary. The demand dictionary is a set of words related to metro vehicle passenger demand, including passenger demand items, rail vehicle product names and so on. Words from the demand dictionary, such as "subway speed", are used as keywords to search the social network platform for related user posts, and the text data is then collected with web crawler technology. This embodiment searches with keywords such as "subway wifi", "subway speed" and "subway smoothness".
The words stored in the demand dictionary, such as passenger demand items (e.g., speed) and metro vehicle product names (e.g., subway), are predefined according to actual usage and are continuously enriched by the subsequent steps of the technical solution.
Step 2: preprocess the data obtained in step 1;
Preprocessing includes preliminary filtering, word segmentation and part-of-speech tagging of the acquired posts. It is carried out in three steps:
1) Formulate filtering rules based on the characteristics of posts on the social platform, then preliminarily filter the text according to those rules. The filtering rules, which are the basis of the preliminary filtering, are written as production rules. Whether a text is filtered is decided by checking whether it contains noise characters (e.g., #, []).
2) Segment the preliminarily filtered text into words and tag their parts of speech. Word segmentation splits the text into individual words; part-of-speech tagging labels each resulting word as a noun, verb and so on.
3) Filter out words with no substantive meaning. This has two parts: first, stop words are filtered using an existing stop word list; second, words other than nouns, verbs and adjectives, such as adverbs and pronouns, are filtered by part of speech.
Step 3: use an SVM classifier to filter out text unrelated to metro passenger demand;
After steps 1 and 2 the noise has been preliminarily filtered, but a large amount of noise remains. This noisy text is text whose main subject is not the metro vehicle but which nevertheless contains a keyword used for retrieval in step 1. For example, "the speed at which aunties grab seats on the subway astonished me" does not reflect any passenger demand regarding the metro vehicle. Filtering this noisy text can be treated as a binary classification of the text, carried out in the following steps:
S11: randomly sample the text preprocessed in step 2 to generate training samples and test samples;
The preprocessed text is randomly sampled, and the training and test samples are generated manually. Sampling must follow two principles: first, the samples must cover the content retrieved by every keyword in step 1; second, the number of samples drawn from the content retrieved by each keyword must be proportional to the amount of content that keyword retrieved.
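The two sampling principles in S11 can be sketched as follows. This is a minimal illustration, not the patented procedure: the function names, the rounding rule and the minimum of one sample per keyword are assumptions introduced here.

```python
import random

def proportional_sample_sizes(retrieved_counts, total_samples):
    """Allocate samples to keywords in proportion to how many posts each retrieved.

    Assumption: every keyword gets at least one sample so that all keywords
    are covered; because of rounding the sizes may not sum exactly to
    total_samples.
    """
    total = sum(retrieved_counts.values())
    if total == 0:
        return {keyword: 0 for keyword in retrieved_counts}
    return {
        keyword: max(1, round(total_samples * count / total))
        for keyword, count in retrieved_counts.items()
    }

def draw_samples(posts_by_keyword, total_samples, seed=0):
    """Randomly draw the allocated number of posts for each keyword."""
    rng = random.Random(seed)
    sizes = proportional_sample_sizes(
        {k: len(v) for k, v in posts_by_keyword.items()}, total_samples
    )
    return {
        k: rng.sample(v, min(sizes[k], len(v)))
        for k, v in posts_by_keyword.items()
    }
```

The drawn posts would then be labeled by hand as relevant or irrelevant before being split into training and test samples.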
S12: label relevant and irrelevant text in the training samples and determine their feature words: compute the information entropy of the training samples and the information gain of each word, and take the words whose gain exceeds a set threshold as feature words;
Based on the training samples, feature words that can distinguish relevant text from irrelevant text are selected, such as "auntie" and "grab seat". Feature selection by information gain is used:
Information gain is a feature selection method that determines feature words according to the amount of information a word carries. The amount of information is expressed by the information entropy IG(X), calculated as follows:
$$IG(X) = -\frac{N_1}{N_1+N_2}\log_2\frac{N_1}{N_1+N_2} - \frac{N_2}{N_1+N_2}\log_2\frac{N_2}{N_1+N_2}$$
where X is the training sample set, and $N_1$ and $N_2$ are the numbers of relevant and irrelevant texts, respectively;
The information gain IG(word) of each word is calculated as follows:
$$IG(word) = IG(X) + \frac{A+B}{N_1+N_2}\left[\frac{A}{A+B}\log_2\frac{A}{A+B} + \frac{B}{A+B}\log_2\frac{B}{A+B}\right] + \frac{C+D}{N_1+N_2}\left[\frac{C}{C+D}\log_2\frac{C}{C+D} + \frac{D}{C+D}\log_2\frac{D}{C+D}\right]$$
where word is a word in the training sample set, A and B are the numbers of relevant and irrelevant texts in which the word appears, and C and D are the numbers of relevant and irrelevant texts in which it does not appear.
The words are sorted by information gain in descending order, and those with larger values are selected as feature words. Table 1 shows part of the results for this embodiment:
Table 1. Words ranked by information gain
Rank | Word | Information gain |
1 | Grab seat | 0.9744340029 |
2 | Auntie | 0.9631205685 |
3 | Check in | 0.8819280948 |
4 | Sprint | 0.8529583405 |
5 | Transfer | 0.8329984805 |
… | … | … |
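The feature selection in S12 can be sketched as follows. The sketch assumes the standard binary entropy and information gain definitions with base-2 logarithms, written in terms of the counts A, B, C and D defined above; the function names and the threshold argument are hypothetical.

```python
import math

def entropy(n1, n2):
    """Binary entropy (base 2) of a set with n1 relevant and n2 irrelevant texts."""
    total = n1 + n2
    h = 0.0
    for n in (n1, n2):
        if n > 0:
            p = n / total
            h -= p * math.log2(p)
    return h

def information_gain(a, b, c, d):
    """Information gain of one word.

    a, b: relevant / irrelevant texts that contain the word
    c, d: relevant / irrelevant texts that do not contain it
    """
    n = a + b + c + d
    if n == 0:
        return 0.0
    h_all = entropy(a + c, b + d)            # entropy of the whole training set
    h_with = (a + b) / n * entropy(a, b)     # weighted entropy where the word occurs
    h_without = (c + d) / n * entropy(c, d)  # weighted entropy where it does not
    return h_all - h_with - h_without

def select_feature_words(word_counts, threshold):
    """Keep words whose gain exceeds the threshold, highest gain first."""
    gains = {w: information_gain(*counts) for w, counts in word_counts.items()}
    kept = [w for w in gains if gains[w] > threshold]
    return sorted(kept, key=gains.get, reverse=True)
```

A word that appears only in irrelevant texts, or only in relevant ones, gets a gain close to the entropy of the whole set, which is why noise-specific words such as those in Table 1 rank highly.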
S13: compute the feature value of each feature word, and represent each text as a feature value vector;
Term frequency-inverse document frequency (TF-IDF) is a feature value calculation that takes into account both how often a word occurs in one text (TF) and how often it occurs in other texts (IDF). It is calculated as follows:
TF-IDF(word) = TF(word) × IDF(word)
where TF(word) is the frequency with which a word occurs in one text, and IDF(word) is the inverse document frequency of the word in the text set;
TF(word) and IDF(word) are computed as follows:
where W(word) is the number of occurrences of the word in a text, W is the total number of words in that text, F is the total number of words in the training samples, and F(word) is the number of occurrences of the word in the training samples.
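The vectorization in S13 can be sketched as follows. The exact TF and IDF formulas are not reproduced in this text, so the sketch assumes TF(word) = W(word) / W and IDF(word) = ln(F / F(word)), which is one form consistent with the symbol definitions above; the function name and arguments are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vector(text_tokens, feature_words, corpus_counts, corpus_total):
    """Represent one text as a vector of TF-IDF values, one per feature word.

    text_tokens:   words of the text after preprocessing
    feature_words: ordered feature words chosen in S12
    corpus_counts: F(word), occurrences of each word in the training samples
    corpus_total:  F, total number of words in the training samples
    """
    counts = Counter(text_tokens)  # W(word) for this text
    w_total = len(text_tokens)     # W for this text
    vector = []
    for word in feature_words:
        tf = counts[word] / w_total if w_total else 0.0
        f_word = corpus_counts.get(word, 0)
        # Assumed IDF form; a word unseen in training contributes 0.
        idf = math.log(corpus_total / f_word) if f_word else 0.0
        vector.append(tf * idf)
    return vector
```

The resulting vectors are what the SVM classifier in S14 and the clustering in step 4 would take as input.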
S14: build an SVM classifier from the training samples, and classify the test samples to validate and further train it;
Based on the result for each test sample, the training samples are expanded to increase their coverage of different kinds of noise, thereby improving the classifier.
S15: use the support vector classifier obtained in step S14 to classify the data into demand-relevant text and irrelevant text, and remove the irrelevant text.
Step 4: cluster the text filtered in step 3 by relevance, using K-means clustering corrected by the silhouette coefficient;
K-means clustering is applied to the data filtered in step 3, and the optimal number of clusters k is determined from the silhouette coefficient.
The K-means clustering process is as follows:
K-means groups texts according to the distance between them. The distance between texts measures their degree of relevance and is computed as the Euclidean distance. The sum of squared distances $dist(S_k)$ from each point in a cluster to the cluster center is:
$$dist(S_k) = \sum_{i=1}^{n_s} \lVert x_i - u_k \rVert^2$$
where $S_k$ is the text set of each cluster, $x_i$ is the feature value vector of a text in cluster $S_k$, $n_s$ is the number of texts in cluster $S_k$, $u_k$ is the cluster center of $S_k$, and i is the index of a text within the cluster;
where $u_k$ is:
$$u_k = \frac{1}{n_s}\sum_{i=1}^{n_s} x_i$$
The objective of K-means clustering is to minimize the sum of squared distances from all samples in the clustering domain to their cluster centers. This sum, dist(S), is:
$$dist(S) = \sum_{j=1}^{k} \sum_{x_i \in S_j} \lVert x_i - u_j \rVert^2$$
where k is the number of clusters, S is the total text collection, and j is the index of each cluster in the text collection;
The silhouette coefficient evaluates a clustering result by combining two factors, cohesion and separation. The larger the silhouette coefficient, the better the clustering, and vice versa. It is calculated as follows:
$$L(x_i) = \frac{b(x_i) - a(x_i)}{\max\{a(x_i),\, b(x_i)\}}$$
where $a(x_i)$ is the average distance between text $x_i$ and all other texts in its own cluster, which quantifies cohesion within the cluster; $b(x_i)$ is the average distance between $x_i$ and all texts in a cluster other than its own, taking the smallest such average over all other clusters, which quantifies separation between clusters.
The number of clusters is determined by the mean silhouette coefficient of the whole text set. The mean silhouette coefficient $L(x)_k$ is:
$$L(x)_k = \frac{1}{N}\sum_{i=1}^{N} L(x_i)$$
where N is the number of texts in the whole text set;
The number of clusters k at which the mean silhouette coefficient is largest is the optimal number of clusters.
Fig. 2 shows part of the results for this embodiment. As Fig. 2 shows, the mean silhouette coefficient is largest at k = 4, i.e., K-means clustering gives the best result there.
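Choosing k by the mean silhouette coefficient can be sketched in plain Python as follows. This is an illustration under stated assumptions: the deterministic farthest-point initialization, the iteration cap and the convention that a single-member cluster contributes 0 are choices made here, not details given in the patent.

```python
import math

def kmeans(points, k, iters=50):
    """Lloyd's K-means with Euclidean distance; returns one label per point.

    Assumption: centers start from a farthest-point rule so that the
    sketch is reproducible without a random seed.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def mean_silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # single-member cluster contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def best_k(points, candidates):
    """Return the candidate k with the largest mean silhouette, plus all scores."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores
```

In the method, `points` would be the feature value vectors of the filtered texts, and the candidate range for k is a tuning choice.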
Step 5: for each cluster from step 4, assign a label as the demand item and calculate the importance of the demand item;
The label is extracted from the words in each cluster. The words in a cluster are sorted by number of occurrences in descending order, the most frequent words are recommended to an engineer, and the engineer summarizes from them a label for the cluster, which is the demand item. Table 2 shows part of the results for one cluster in this embodiment; "subway noise" can be chosen as its demand item.
Table 2. Word occurrence counts
Rank | Word | Occurrences |
1 | Subway | 541 |
2 | Ear | 426 |
3 | Noise | 346 |
4 | Sound | 312 |
… | … | … |
The importance of a passenger demand item is measured by the relative propagation heat of that demand item. The propagation heat is calculated as:
where $n_s$ is the number of texts in each cluster, $Z_i$ is the repost count of the i-th text in the cluster, $D_i$ is the like count of the i-th text in the cluster, $P_i$ is the comment count of the i-th text in the cluster, $w_1$, $w_2$ and $w_3$ are constants, and k is the cluster number;
$w_1$, $w_2$ and $w_3$ are the weights of reposts, likes and comments respectively, and satisfy $w_1 + w_2 + w_3 = 1$.
To prevent repeated posts by the same user from skewing the result, the propagation heat is corrected by the propagation range $g_k = l_s / n_s$, where $l_s$ is the number of users who posted in the cluster. The corrected propagation heat is:
$$r'_k = r_k \times g_k$$
where $r'_k$ is the corrected propagation heat, $g_k$ is the propagation range, and $l_s$ is the number of users who posted in the cluster;
The relative propagation heat, i.e., the importance, is calculated as:
where S is the total text collection, $r'_i$ is the corrected propagation heat of the i-th demand item, and i is the demand item index.
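The importance calculation can be sketched as follows. The formulas for the propagation heat and the relative propagation heat are not reproduced in this text, so the sketch assumes that the heat of a cluster is the weighted sum of reposts, likes and comments over its posts, and that the relative heat is each corrected heat divided by the sum over all demand items. The weight values, dictionary keys and function names are hypothetical; only the correction $g_k = l_s / n_s$ and the constraint $w_1 + w_2 + w_3 = 1$ come from the text.

```python
def propagation_heat(posts, w1=0.5, w2=0.25, w3=0.25):
    """Assumed form of r_k: weighted sum of reposts, likes and comments."""
    return sum(w1 * p["reposts"] + w2 * p["likes"] + w3 * p["comments"] for p in posts)

def corrected_heat(posts, w1=0.5, w2=0.25, w3=0.25):
    """r'_k = r_k * g_k, where g_k = (distinct posting users) / (posts in cluster)."""
    if not posts:
        return 0.0
    n_s = len(posts)
    l_s = len({p["user"] for p in posts})
    return propagation_heat(posts, w1, w2, w3) * (l_s / n_s)

def relative_importance(clusters):
    """Assumed form of R_k: corrected heat normalized over all demand items."""
    heats = {item: corrected_heat(posts) for item, posts in clusters.items()}
    total = sum(heats.values())
    return {item: (h / total if total else 0.0) for item, h in heats.items()}
```

With these assumptions, a cluster in which one user posted the same complaint twice scores half as much as a cluster with the same engagement spread over two users, which is the effect the propagation range is meant to have.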
Step 6: for each demand item obtained in step 5, first check whether it already exists in the demand dictionary; if so, exit; otherwise check whether its importance and relative propagation persistence both meet preset thresholds; if they do, add it to the demand dictionary; if not, exit.
The acquired demand is evaluated using the propagation heat and propagation persistence of the demand item to decide whether it is a new demand. The acquired passenger demand items may include demand items not yet in the demand dictionary; these are judged by propagation heat and propagation persistence to decide whether they can be added to the demand dictionary as new demand items. This is done in two steps:
1) match the acquired demand items against the existing demand items in the demand dictionary to determine whether any of them are not yet in the dictionary;
2) compare the relative propagation heat and relative propagation persistence of the demand item with preset thresholds. Propagation persistence measures how long the new demand continues to propagate.
The propagation persistence $j_k$ is:
where $r'_{k0}$, $r'_{k1}$ and $r'_{k2}$ are the propagation heats obtained in three consecutive periods, $r'_{k0}$ being the propagation heat obtained in the current period. Acquisition is dynamic, i.e., data is obtained from the social network platform automatically at regular intervals. The current period is the acquisition period in which the emerging, potential demand was discovered; $r'_{k1}$ and $r'_{k2}$ are the propagation heats in the first and second acquisition periods that follow it.
The relative propagation persistence $J_k$ is:
where S is the total text collection, $j_i$ is the propagation persistence of the i-th demand item, and i is the demand item index.
Only when both the relative propagation heat and the relative propagation persistence of a new demand exceed the set thresholds does it become a candidate new demand, which is then judged manually.
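The decision logic of step 6 can be sketched as follows. The formula for propagation persistence is not reproduced in this text, so the sketch takes the relative propagation heat and relative propagation persistence as already computed inputs; the function name, argument names and return labels are hypothetical.

```python
def evaluate_demand_item(item, demand_dictionary, relative_heat, relative_persistence,
                         heat_threshold, persistence_threshold):
    """Classify a demand item as 'known', 'candidate' or 'rejected'.

    'candidate' means the item is not yet in the dictionary and both measures
    exceed their thresholds; per the text, an engineer still confirms it
    before it is added to the dictionary.
    """
    if item in demand_dictionary:
        return "known"
    if relative_heat > heat_threshold and relative_persistence > persistence_threshold:
        return "candidate"
    return "rejected"

def update_dictionary(demand_dictionary, item, status, approved_by_engineer):
    """Add a candidate to the dictionary only after manual approval."""
    if status == "candidate" and approved_by_engineer:
        demand_dictionary.add(item)
    return demand_dictionary
```

Keeping the threshold check separate from the dictionary update mirrors the text, where passing both thresholds only makes an item a candidate.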
A system for dynamically acquiring metro passenger demand can be built according to the above method. It comprises a data acquisition module, a text preprocessing module, a text filtering module, a text clustering module, a demand extraction module, a new demand evaluation module and a demand dictionary. For better management, it further comprises a demand viewing module and a demand dictionary management module, through which engineers use and maintain the system.
The demand dictionary stores demand items related to metro vehicle passenger demand, specifically words related to metro passenger demand.
The data acquisition module obtains post data from the social network platform, using the demand items in the demand dictionary as keywords to crawl related posts. In addition, setting the module's acquisition frequency allows data to be obtained in real time.
The text preprocessing module preprocesses the acquired text. It preliminarily filters the acquired text using the filtering method described above, segments the filtered text and tags parts of speech, and then filters out words with no substantive meaning based on the stop word list and part of speech.
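The filtering performed by the text preprocessing module can be sketched as follows. This is a minimal illustration: it assumes that a post containing any noise character is dropped, and that word segmentation and part-of-speech tagging have already been done by an external tool that yields (word, tag) pairs. The tag names and function names are hypothetical.

```python
NOISE_CHARS = {"#", "[", "]"}             # noise characters named in the text
KEPT_POS = {"noun", "verb", "adjective"}  # parts of speech that are retained

def passes_primary_filter(post):
    """Production-style rule: reject a post that contains any noise character."""
    return not any(ch in NOISE_CHARS for ch in post)

def filter_tokens(tagged_tokens, stop_words):
    """Drop stop words and any word whose part of speech is not retained.

    tagged_tokens: (word, part_of_speech) pairs from a segmenter and tagger.
    """
    return [word for word, pos in tagged_tokens
            if word not in stop_words and pos in KEPT_POS]
```

A real rule set would likely hold several production rules; the single character check here only illustrates the shape of one.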
The text filtering module filters out text unrelated to passenger demand. It obtains feature words that can identify the text type by information-gain feature selection, computes their feature values by TF-IDF, vectorizes the feature values of each text as input to the SVM classifier, and outputs the filtered text.
The text clustering module clusters the filtered text data by relevance, using the K-means algorithm and determining the number of clusters from the mean silhouette coefficient.
The demand extraction module extracts the demand item, i.e., the passenger demand, of each cluster. It counts how often each word occurs in each cluster and recommends the most frequent words to an engineer, who assigns the demand item name. The importance of a demand item is determined by its relative propagation heat.
The new demand evaluation module determines whether a demand item is already in the demand dictionary and updates the demand dictionary. It compares the relative propagation heat and relative propagation persistence of a new demand with the set thresholds. Demands that meet the thresholds are recommended to an engineer, who judges whether each is a new demand; new demands are stored in the demand dictionary, thereby updating it.
A demand viewing module and a demand dictionary management module may also be provided. The demand viewing module uses a display to provide a visual interface for extracting, evaluating and viewing passenger demand. Demand extraction and evaluation are implemented as in the corresponding steps above and are not repeated here. In addition, the demand items extracted by the demand extraction module and their calculated importance are displayed as curves; for a metro vehicle, for example, they are displayed as shown in Fig. 4. In the figure, curve A is the importance of subway wifi over time, B is that of subway smoothness, C is that of subway speed, and D is that of subway noise.
The demand dictionary management module maintains the demand dictionary: it continuously enriches the dictionary with newly acquired demands, and also allows demands to be modified and deleted.
The present invention addresses the problem that current methods of acquiring metro vehicle passenger demand both require considerable manpower and are highly subjective. It proposes a method and system, based on a social network platform, for dynamically acquiring metro passenger demand. Text mining techniques from data mining are used to extract, from the posts of social network platform users, passengers' demands regarding metro vehicles. Compared with traditional methods, the invention can automatically analyze large numbers of user posts to obtain potential passenger demand, improving the efficiency of acquiring user data and reducing the influence of subjectivity. It can analyze the dynamic demand of massive numbers of users in real time, continuously capture passenger demand preferences and extract valid passenger demand importance from them. It can also discover emerging and potential user demand automatically and in real time, as a driver for metro vehicle research and development.
Claims (8)
1. A method for dynamically acquiring subway passenger demands, characterized by comprising the following steps:
Step 1: build a demand dictionary, and obtain user post data from a social network platform according to the demand dictionary;
Step 2: preprocess the data obtained in step 1;
Step 3: use an SVM classifier to filter out text unrelated to subway passenger demands;
Step 4: cluster the text filtered in step 3 by relevance using a K-means clustering method corrected by the silhouette coefficient;
Step 5: assign a label to each cluster obtained in step 4 as a demand item, and calculate the importance of the demand item;
Step 6: for each demand item obtained in step 5, first determine whether it already exists in the demand dictionary; if so, exit; if not, determine whether its importance and its relative propagation persistence both meet preset thresholds; if they do, a new demand item has been found and is added to the demand dictionary; otherwise, exit.
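The decision in step 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `DemandItem` and `evaluate_demand_item` are hypothetical, the demand dictionary is modeled as a plain set of labels, and "meeting a threshold" is assumed to mean greater than or equal to it, which the claim does not specify.

```python
from dataclasses import dataclass


@dataclass
class DemandItem:
    # Hypothetical container for one labelled cluster from step 5.
    label: str                   # label assigned to the cluster
    importance: float            # importance of the demand item
    relative_persistence: float  # relative propagation persistence


def evaluate_demand_item(item, dictionary, importance_threshold, persistence_threshold):
    """Apply the step 6 decision; return True only when a new item is added."""
    if item.label in dictionary:
        return False  # already in the demand dictionary: exit
    # Assumption: a threshold is "met" when the value is >= the threshold.
    if (item.importance >= importance_threshold
            and item.relative_persistence >= persistence_threshold):
        dictionary.add(item.label)  # new demand item found
        return True
    return False  # at least one threshold is not met: exit


demand_dictionary = {"wifi", "noise"}
candidate = DemandItem("seat comfort", importance=0.3, relative_persistence=0.4)
added = evaluate_demand_item(candidate, demand_dictionary, 0.2, 0.3)
```

Both thresholds must hold at the same time, so an item that is briefly very popular but does not persist is not added.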
2. The method for dynamically acquiring subway passenger demands according to claim 1, characterized in that the data acquisition process in step 1 is as follows:
the words in the demand dictionary are used as keywords to search the social network platform for user posts, and the text data is obtained by a web crawler.
3. The method for dynamically acquiring subway passenger demands according to claim 1, characterized in that the specific process of step 3 is as follows:
S11: randomly sample the text preprocessed in step 2 to generate training samples and test samples;
S12: identify the relevant text and the irrelevant text in the training samples and determine their feature words: calculate the information entropy of the training samples and the information gain value of each word, and take the words whose information gain value is greater than a set threshold as feature words;
The information entropy IG(X) of the training samples is calculated as follows:
In the formula: X is the training sample set, and N_1 and N_2 denote the number of relevant texts and the number of irrelevant texts, respectively;
The information gain value IG(word) of each word is calculated as follows:
In the formula: word is a word in the training sample set; A and B are the frequencies with which the word occurs in relevant text and irrelevant text, respectively; C and D are the frequencies with which the word does not occur in relevant text and irrelevant text, respectively;
S13: calculate the feature value of each feature word in each text, and represent the text as a feature value vector;
S14: construct the SVM classifier from the training samples, and refine the classifier with the test samples;
S15: classify the data with the support vector classifier obtained in step S14 into demand-relevant text and irrelevant text, and remove the irrelevant text.
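The formulas for the information entropy and the information gain in S12 are not reproduced in this text. As a rough sketch, the textbook two-class definitions fit the symbols defined there. In the code below A, B, C and D are read as counts of texts (an assumption), so that N_1 = A + C and N_2 = B + D, and base-2 logarithms are also an assumption. The function names are hypothetical.

```python
import math


def _plogp(p):
    # p * log2(p), with the usual convention that 0 * log 0 = 0.
    return p * math.log2(p) if p > 0 else 0.0


def entropy(n_relevant, n_irrelevant):
    """Two-class entropy of a set with N1 relevant and N2 irrelevant texts."""
    total = n_relevant + n_irrelevant
    return -(_plogp(n_relevant / total) + _plogp(n_irrelevant / total))


def information_gain(a, b, c, d):
    """Textbook information gain of one word.

    a, b: relevant / irrelevant texts that contain the word
    c, d: relevant / irrelevant texts that do not contain it
    """
    total = a + b + c + d
    gain = entropy(a + c, b + d)  # entropy of the whole training set
    for x, y in ((a, b), (c, d)):  # "word present" and "word absent" subsets
        if x + y > 0:
            gain -= (x + y) / total * entropy(x, y)
    return gain


# A word that appears in every relevant text and in no irrelevant text
# separates the two classes perfectly.
perfect = information_gain(5, 0, 0, 5)
```

Words whose gain exceeds the chosen threshold would then be kept as feature words, as S12 describes.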
4. The method for dynamically acquiring subway passenger demands according to claim 3, characterized in that the silhouette-coefficient-corrected K-means clustering method in step 4 first performs K-means clustering and then determines the optimal number of clusters k by the silhouette coefficient;
The K-means clustering process is as follows:
Determine the sum of squared distances dist(S_k) from each point in a cluster to the cluster center:
$$\mathrm{dist}(S_k) = \sum_{i=1}^{n_s} \lVert x_i - u_k \rVert^2$$
In the formula: S_k is the text set of each cluster, x_i is the feature value vector of a text in cluster S_k, n_s is the number of texts in cluster S_k, u_k is the cluster center of cluster S_k, and i is the index of a text within the cluster;
where u_k is as follows:
$$u_k = \frac{1}{n_s} \sum_{i=1}^{n_s} x_i$$
The sum of squared distances dist(S) from all samples in the clustering domain to their cluster centers is:
$$\mathrm{dist}(S) = \sum_{j=1}^{k} \mathrm{dist}(S_j)$$
In the formula: k is the number of clusters, S is the total text collection, and j is the index of each cluster in the text collection;
The silhouette coefficient L(x_i) is as follows:
$$L(x_i) = \frac{b(x_i) - a(x_i)}{\max\{a(x_i),\, b(x_i)\}}$$
In the formula: a(x_i) is the average distance between text x_i and all other texts in the same cluster, and b(x_i) is the average distance between text x_i and all texts in a cluster other than its own;
The mean silhouette coefficient L(x)_k is:
$$L(x)_k = \frac{1}{N} \sum_{i=1}^{N} L(x_i)$$
In the formula: N is the number of texts in the entire text set;
When the mean silhouette coefficient is at its maximum, the corresponding number of clusters k is the optimal number of clusters.
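A minimal, dependency-free sketch of choosing k by the mean silhouette coefficient follows. It is an illustration, not the patented implementation. The function names, the deterministic seeding of K-means with evenly spaced points, and the convention that a single-text cluster contributes 0 are all assumptions. For b(x_i) it uses the standard definition, the smallest average distance to any other cluster.

```python
import math


def centroid(points):
    # Component-wise mean of a list of equal-length vectors.
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iterations=50):
    """Plain Lloyd iteration, seeded with evenly spaced points (assumption)."""
    centers = [list(points[(j * len(points)) // k]) for j in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = centroid(members)
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; requires at least two non-empty clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a single-text cluster contributes 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, candidates):
    # Pick the candidate k whose clustering has the largest mean silhouette.
    return max(candidates, key=lambda k: mean_silhouette(points, kmeans(points, k)))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
k_opt = best_k(points, [2, 3, 4])
```

Candidate values of k must be at least 2, because the silhouette coefficient is undefined when there is only one cluster.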
5. The method for dynamically acquiring subway passenger demands according to claim 1, characterized in that the importance in step 5 is calculated as follows:
S21: the propagation heat r_k is as follows:
In the formula: n_s is the number of texts in each cluster; Z_i, D_i and P_i are the number of reposts, the number of likes and the number of comments of the i-th text in each cluster, respectively; w_1, w_2 and w_3 are constants; k is the number of clusters;
S22: correct the propagation heat with the propagation breadth:
r′_k = r_k × g_k
In the formula: r′_k is the corrected propagation heat, g_k is the propagation breadth, g_k = l_s / n_s, and l_s is the number of users who posted in each cluster;
S23: the importance R_k is calculated as follows:
In the formula: S is the total number of text collections, r′_i is the corrected propagation heat of the i-th demand, and i is the index of the demand item.
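The formulas for r_k and R_k are not reproduced in this text, so the sketch below fills them with plausible forms that are labeled as assumptions: a weighted sum of reposts, likes and comments over the texts of a cluster for r_k, and a normalization of the corrected heats so that they sum to 1 for R_k. Only the correction r′_k = r_k × g_k with g_k = l_s / n_s is taken directly from S22. All function names are hypothetical.

```python
def propagation_heat(texts, w1=1.0, w2=1.0, w3=1.0):
    """texts: list of (reposts Z_i, likes D_i, comments P_i) for one cluster.

    ASSUMED form: r_k = sum_i (w1 * Z_i + w2 * D_i + w3 * P_i).
    """
    return sum(w1 * z + w2 * d + w3 * p for z, d, p in texts)


def corrected_heat(heat, n_users, n_texts):
    """r'_k = r_k * g_k with g_k = l_s / n_s, as stated in S22."""
    return heat * (n_users / n_texts)


def importance(corrected_heats):
    """ASSUMED normalization: R_k = r'_k / sum_i r'_i."""
    total = sum(corrected_heats)
    return [h / total for h in corrected_heats]


# Two clusters: one with two posts by a single user, one with one post.
heat_a = corrected_heat(propagation_heat([(1, 2, 3), (4, 5, 6)]), n_users=1, n_texts=2)
heat_b = corrected_heat(propagation_heat([(2, 1, 0.5)]), n_users=1, n_texts=1)
ranks = importance([heat_a, heat_b])
```

The breadth factor g_k reduces the weight of a cluster whose posts come from only a few users, so that one very active account does not dominate the ranking.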
6. The method for dynamically acquiring subway passenger demands according to claim 5, characterized in that the relative propagation persistence in step 6 is calculated as follows:
S31: the propagation persistence j_k is as follows:
In the formula: r′_k0, r′_k1 and r′_k2 are the propagation heats obtained in three consecutive periods, where r′_k0 is the propagation heat obtained in the current period;
S32: the relative propagation persistence J_k is:
In the formula: S is the total number of text collections, j_i is the propagation persistence of the i-th demand, and i is the index of the demand item.
7. The method for dynamically acquiring subway passenger demands according to claim 3, characterized in that the feature value in step S13 is measured by term frequency-inverse document frequency (TF-IDF), which is calculated as follows:
TF-IDF(word) = TF(word) × IDF(word)
In the formula: TF is the frequency with which a word occurs in a text, and IDF is the frequency with which the word occurs in other texts; TF(word) is the frequency with which a given word occurs in a text, and IDF(word) is the inverse document frequency of the word in the text collection;
Wherein:
In the formula: W(word) is the number of occurrences of the word in a text, W is the total number of words in the text where the word occurs, F is the total number of words in the training samples, and F(word) is the number of occurrences of the word in the training samples.
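The expressions for TF(word) and IDF(word) are not reproduced in this text. The sketch below therefore uses the common textbook form, which is a swapped-in assumption rather than the claimed formula: TF is the share of a text's words that are the given word, W(word) / W, and IDF is the natural logarithm of the number of texts divided by the number of texts containing the word. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(tokenized_texts):
    """Return one {word: TF-IDF value} dict per text (textbook TF-IDF)."""
    n_texts = len(tokenized_texts)
    # Number of texts in which each word appears at least once.
    doc_freq = Counter(word for text in tokenized_texts for word in set(text))
    vectors = []
    for text in tokenized_texts:
        counts = Counter(text)
        vectors.append({
            # TF = count / text length; IDF = ln(n_texts / document frequency)
            word: (count / len(text)) * math.log(n_texts / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


texts = [["wifi", "slow", "wifi"], ["noise", "slow"]]
vectors = tf_idf(texts)
```

A word that occurs in every text, such as "slow" here, gets a value of 0 and so contributes nothing to the feature value vector used in S13.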
8. An acquisition system using the method for dynamically acquiring subway passenger demands according to any one of claims 1 to 7, characterized by comprising a data acquisition module, a text preprocessing module, a text filtering module, a text clustering module, a demand extraction module, a new demand evaluation module and a demand dictionary;
the demand dictionary is used to store the demand items related to subway vehicle passenger demands;
the data acquisition module is used to obtain post data from the social network platform;
the text preprocessing module is used to preprocess the acquired text;
the text filtering module is used to filter out text unrelated to passenger demands;
the text clustering module is used to cluster the filtered text data by relevance;
the demand extraction module is used to extract the demand item in each cluster;
the new demand evaluation module is used to determine whether a demand item should be included in the demand dictionary and to update the demand dictionary.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910561357.XA CN110347828B (en) | 2019-06-26 | 2019-06-26 | Subway passenger demand dynamic acquisition method and acquisition system thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110347828A (en) | 2019-10-18 |
CN110347828B (en) | 2022-03-15 |
Family
ID=68183218
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910561357.XA Active CN110347828B (en) | 2019-06-26 | 2019-06-26 | Subway passenger demand dynamic acquisition method and acquisition system thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110347828B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114297401A (en) * | 2021-12-14 | 2022-04-08 | 中航机载系统共性技术有限公司 | System knowledge extraction method based on clustering algorithm |
CN114445141A (en) * | 2022-01-26 | 2022-05-06 | 西南交通大学 | Customer demand obtaining method |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110137845A1 (en) * | 2009-12-09 | 2011-06-09 | Zemoga, Inc. | Method and apparatus for real time semantic filtering of posts to an internet social network |
US20130080212A1 (en) * | 2011-09-26 | 2013-03-28 | Xerox Corporation | Methods and systems for measuring engagement effectiveness in electronic social media |
CN103678564A (en) * | 2013-12-09 | 2014-03-26 | 国家计算机网络与信息安全管理中心 | Internet product research system based on data mining |
CN104484343A (en) * | 2014-11-26 | 2015-04-01 | 无锡清华信息科学与技术国家实验室物联网技术中心 | Topic detection and tracking method for microblog |
CN107909478A (en) * | 2017-11-27 | 2018-04-13 | 苏州点对点信息科技有限公司 | FOF mutual fund portfolio system and methods based on social network clustering and information gain entropy index |
CN107908753A (en) * | 2017-11-20 | 2018-04-13 | 合肥工业大学 | Customer demand method for digging and device based on social media comment data |
CN108388660A (en) * | 2018-03-08 | 2018-08-10 | 中国计量大学 | A kind of improved electric business product pain spot analysis method |
CN109165996A (en) * | 2018-07-18 | 2019-01-08 | 浙江大学 | Product function feature importance analysis method based on online user's comment |
CN109829166A (en) * | 2019-02-15 | 2019-05-31 | 重庆师范大学 | People place customer input method for digging based on character level convolutional neural networks |
Non-Patent Citations (1)
Title |
---|
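Zheng Zhihao et al.: "Traffic sensing and analysis system based on social media big data" (基于社交媒体大数据的交通感知分析系统), Acta Automatica Sinica (《自动化学报》) * |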
Also Published As
Publication number | Publication date |
---|---|
CN110347828B (en) | 2022-03-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Keneshloo et al. | Predicting the popularity of news articles | |
CN103246670B (en) | Microblogging sequence, search, methods of exhibiting and system | |
CN109829166B (en) | People and host customer opinion mining method based on character-level convolutional neural network | |
CN105550269A (en) | Product comment analyzing method and system with learning supervising function | |
CN108763484A (en) | A kind of law article recommendation method based on LDA topic models | |
CN103116637A (en) | Text sentiment classification method facing Chinese Web comments | |
US10387805B2 (en) | System and method for ranking news feeds | |
CN109670039A (en) | Sentiment analysis method is commented on based on the semi-supervised electric business of tripartite graph and clustering | |
CN103150333A (en) | Opinion leader identification method in microblog media | |
CN105225135B (en) | Potential customer identification method and device | |
CN111309900B (en) | Legal class similarity judging and pushing method | |
CN102156747B (en) | Method and device for forecasting collaborative filtering mark by introduction of social tag | |
CN102880631A (en) | Chinese author identification method based on double-layer classification model, and device for realizing Chinese author identification method | |
CN104538036A (en) | Speaker recognition method based on semantic cell mixing model | |
CN110717654A (en) | Product quality evaluation method and system based on user comments | |
CN110347828A (en) | A kind of Metro Passenger demand dynamic acquisition method and its obtain system | |
CN110889092A (en) | Short-time large-scale activity peripheral track station passenger flow volume prediction method based on track transaction data | |
KR20180131146A (en) | Apparatus and Method for Identifying Core Issues of Each Evaluation Criteria from User Reviews | |
CN109961311A (en) | Lead referral method, apparatus calculates equipment and storage medium | |
CN110109902A (en) | A kind of electric business platform recommender system based on integrated learning approach | |
CN110992988B (en) | Speech emotion recognition method and device based on domain confrontation | |
De Oña et al. | Analyzing transit service quality evolution using decision trees and gender segmentation | |
CN104361015A (en) | Mail classification and recognition method | |
Qi et al. | Investigation of the influence of Twitter user habits on sentiment of their opinions towards transportation services | |
KR100913049B1 (en) | Method and system for providing positive / negative search result using user preference |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||