CN110347828A

CN110347828A - Method and system for dynamically acquiring metro passenger demand

Info

Publication number: CN110347828A
Application number: CN201910561357.XA
Authority: CN
Inventors: 黎荣; 黎伟洋; 王建; 丁国富; 张义军; 韩鑫; 郑宇飞
Original assignee: Southwest Jiaotong University
Current assignee: Southwest Jiaotong University
Priority date: 2019-06-26
Filing date: 2019-06-26
Publication date: 2019-10-18
Anticipated expiration: 2039-06-26
Also published as: CN110347828B

Abstract

The invention discloses a method and system for dynamically acquiring metro passenger demand. The method comprises the following steps. Step 1: build a demand dictionary and obtain user post data from a social network platform. Step 2: preprocess the acquired data. Step 3: use an SVM classifier to filter out texts unrelated to metro passenger demand. Step 4: cluster the texts by relevance. Step 5: assign each cluster a label as a demand item, and calculate the importance of the demand item. Step 6: first determine whether the demand item already exists in the demand dictionary; if so, exit; if not, determine whether its importance and its relative propagation persistence both meet preset thresholds; if they do, a new demand item has been found and is added to the demand dictionary; otherwise, exit. The invention can process a large volume of user posts, improves the efficiency of demand acquisition, and is less subjective; demand preferences and latent user demands can be obtained in real time from massive numbers of user posts.

Description

Method and system for dynamically acquiring metro passenger demand

Technical field

The invention relates to the acquisition of metro passenger demand, and in particular to a method and system for dynamically acquiring metro passenger demand.

Background art

Over the past ten-plus years, the transport capacity of railways has gradually increased, and passenger volume has risen with it. Growth in subway and high-speed rail passenger volume and turnover will further increase passenger rail line mileage and the number of rail vehicles ordered. This presents both opportunities and challenges for metro vehicle manufacturers. The customers of a rail vehicle manufacturer include operating companies and passengers. However, manufacturers currently focus mainly on the demands of operating companies and lack analysis of passenger demand, which affects end customers' satisfaction with their products and does not help improve the manufacturer's market competitiveness.

Passenger demand comprises passenger demand items and their importance, both of which change dynamically over time. Existing demand acquisition methods, such as questionnaires, not only consume a great deal of manpower when capturing dynamic passenger demand but are also highly subjective, which constrains rail vehicle manufacturers' analysis of passenger demand.

Summary of the invention

The invention provides a method and system for dynamically acquiring metro passenger demand with high data acquisition efficiency and low subjectivity.

The technical solution adopted by the invention is a method for dynamically acquiring metro passenger demand, comprising the following steps:

Step 1: build a demand dictionary, and obtain user post data from a social network platform according to the demand dictionary;

Step 2: preprocess the data obtained in step 1;

Step 3: use an SVM classifier to filter out texts unrelated to metro passenger demand;

Step 4: cluster the texts filtered in step 3 by relevance, using K-means clustering corrected by the silhouette coefficient;

Step 5: assign each cluster from step 4 a label as a demand item, and calculate the importance of the demand item;

Step 6: for each demand item obtained in step 5, first determine whether it already exists in the demand dictionary; if so, exit; if not, determine whether its importance and its relative propagation persistence both meet preset thresholds; if they do, a new demand item has been found and is added to the demand dictionary; otherwise, exit.

Further, the data acquisition in step 1 proceeds as follows:

Words in the demand dictionary are used as keywords to search the social network platform for user posts, and the text data are obtained by a web crawler.

Further, step 3 proceeds as follows:

S11: randomly sample the texts preprocessed in step 2 to generate training samples and test samples;

S12: label related and unrelated texts in the training samples and determine the feature words: calculate the information entropy of the training samples and the information gain of each word, and take words whose gain exceeds a set threshold as feature words;

The information entropy IG(X) of the training samples is calculated as follows:

where X is the training sample set, and N₁ and N₂ are the numbers of related and unrelated texts, respectively;
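
The entropy formula itself does not survive in this text. Assuming the standard two-class Shannon entropy, which matches the symbols N₁ and N₂ defined above, it would read as follows (the base-2 logarithm is an assumption):

```latex
IG(X) = -\frac{N_1}{N_1+N_2}\log_2\frac{N_1}{N_1+N_2}
        -\frac{N_2}{N_1+N_2}\log_2\frac{N_2}{N_1+N_2}
```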

The information gain IG(word) of each word is calculated as follows:

where word is a word in the training sample set, A and B are the frequencies with which the word appears in related and unrelated texts, respectively, and C and D are the frequencies with which the word does not appear in related and unrelated texts, respectively;
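
As an illustration of how A, B, C and D can feed an information gain score, the sketch below uses the textbook definition: the entropy of the training set minus the weighted entropies of the texts with and without the word. It assumes A to D are counts of texts, and the function names are hypothetical; the patent's own formula is not reproduced in this text and may differ in detail.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts; empty classes add 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(a, b, c, d):
    """Textbook information gain of one word (assumed form, not the patent's).

    a, b: related / unrelated texts that contain the word
    c, d: related / unrelated texts that do not contain the word
    """
    n = a + b + c + d
    base = entropy([a + c, b + d])               # entropy of the training set
    with_word = (a + b) / n * entropy([a, b])    # texts containing the word
    without_word = (c + d) / n * entropy([c, d]) # texts lacking the word
    return base - with_word - without_word


# A word that appears only in related texts separates the classes perfectly.
perfect = information_gain(50, 0, 0, 50)
# A word spread evenly over both classes carries no information.
useless = information_gain(25, 25, 25, 25)
```

Words would then be ranked by this score and those above the chosen threshold kept as feature words.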

S13: calculate the feature value of each feature word in each text, and represent each text as a feature value vector;

S14: build the SVM classifier from the training samples, and refine the classifier with the test samples;

S15: use the support vector classifier obtained in step S14 to classify the data into demand-related texts and unrelated texts, and remove the unrelated texts.

Further, the silhouette-coefficient-corrected K-means clustering in step 4 first performs K-means clustering and then determines the optimal number of clusters k by the silhouette coefficient;

The K-means clustering process is as follows:

Determine the sum of squared distances dist(S_k) from each point in a cluster to the cluster centre:

where S_k is the set of texts in a cluster, x_i is the feature value vector of a text in cluster S_k, n_s is the number of texts in cluster S_k, u_k is the centre of cluster S_k, and i indexes the texts in the cluster;

where u_k is:

The sum of squared distances dist(S) from all samples in the clustering domain to their cluster centres is:

where k is the number of clusters, S is the whole text collection, and j indexes the clusters in the text collection;

The silhouette coefficient L(x_i) is:

where a(x_i) is the average distance between text x_i and all other texts in its own cluster, and b(x_i) is the average distance between text x_i and all texts in a cluster other than its own;

The mean silhouette coefficient L(x)_k is:

where N is the number of texts in the whole text set;

The number of clusters k at which the mean silhouette coefficient is largest is the optimal number of clusters.
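
The formulas referred to in this step are not reproduced in this text. A reconstruction from the symbol definitions above, assuming the standard K-means objective and the standard silhouette definition, reads:

```latex
dist(S_k) = \sum_{i=1}^{n_s} \lVert x_i - u_k \rVert^2
\qquad
u_k = \frac{1}{n_s} \sum_{i=1}^{n_s} x_i
\qquad
dist(S) = \sum_{j=1}^{k} dist(S_j)

L(x_i) = \frac{b(x_i) - a(x_i)}{\max\{a(x_i),\, b(x_i)\}}
\qquad
L(x)_k = \frac{1}{N} \sum_{i=1}^{N} L(x_i)
```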

Further, the importance in step 5 is calculated as follows:

S21: the propagation heat r_k is:

where n_s is the number of texts in each cluster, Z_i is the number of forwards of the i-th text in the cluster, D_i is the number of likes of the i-th text in the cluster, P_i is the number of comments on the i-th text in the cluster, w₁, w₂ and w₃ are constants, and k is the number of clusters;

S22: the propagation heat is corrected by the propagation range:

r′_k=r_k×g_k

where r′_k is the corrected propagation heat, g_k is the propagation range, g_k=l_s/n_s, and l_s is the number of users who posted in the cluster;

S23: the importance R_k is calculated as follows:

where S is the whole text collection, r′_i is the corrected propagation heat of the i-th demand item, and i indexes the demand items.
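
The sketch below walks through S21 to S23 on toy data. The formulas for r_k and R_k are not reproduced in this text, so two assumptions are made and labelled in the code: r_k is taken as the weighted sum of forwards, likes and comments over the texts of a cluster, and R_k as each cluster's corrected heat divided by the sum over all clusters. The weights and field names are hypothetical.

```python
# Hypothetical weights for forwards, likes and comments (they sum to 1).
W1, W2, W3 = 0.4, 0.3, 0.3


def propagation_heat(posts):
    """r_k, ASSUMED to be a weighted sum over the posts of one cluster."""
    return sum(
        W1 * p["forwards"] + W2 * p["likes"] + W3 * p["comments"] for p in posts
    )


def corrected_heat(posts):
    """r'_k = r_k * g_k, with g_k = l_s / n_s (distinct users / posts)."""
    n_s = len(posts)
    l_s = len({p["user"] for p in posts})
    return propagation_heat(posts) * l_s / n_s


def importance(clusters):
    """R_k, ASSUMED to be r'_k divided by the sum of r'_i over all clusters."""
    heats = {name: corrected_heat(posts) for name, posts in clusters.items()}
    total = sum(heats.values())
    return {name: (h / total if total else 0.0) for name, h in heats.items()}


clusters = {
    "metro noise": [
        {"user": "a", "forwards": 10, "likes": 20, "comments": 10},
        {"user": "a", "forwards": 0, "likes": 0, "comments": 0},
    ],
    "subway wifi": [
        {"user": "b", "forwards": 10, "likes": 0, "comments": 0},
    ],
}
scores = importance(clusters)
```

Here the second post by user "a" halves the heat of the "metro noise" cluster, which is the effect the propagation range is meant to have on repeated posting.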

Further, the relative propagation persistence in step 6 is calculated as follows:

S31: the propagation persistence j_k is:

where r′_k0, r′_k1 and r′_k2 are the propagation heats obtained in three consecutive time periods, r′_k0 being the propagation heat obtained this time;

S32: the relative propagation persistence J_k is:

where S is the whole text collection, j_i is the propagation persistence of the i-th demand item, and i indexes the demand items.

Further, the feature value in step S13 is measured by term frequency-inverse document frequency (TF-IDF), calculated as follows:

TF-IDF(word)=TF(word)×IDF(word)

where TF(word) is the frequency with which a word occurs within one text, and IDF(word) is the inverse document frequency of that word across the text collection;

Wherein:

where W(word) is the number of times the word occurs in a text, W is the total number of words in that text, F is the total for the training samples, and F(word) is the number of occurrences of the word in the training samples.
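
A minimal sketch of the feature value vector, assuming TF(word) = W(word)/W and IDF(word) = log(F/F(word)). The exact expressions are not reproduced in this text, and the wording for F and F(word) is ambiguous; here F is read as the number of training texts and F(word) as the number of those texts that contain the word, which is the usual IDF definition. The function name and toy corpus are hypothetical.

```python
import math


def tf_idf_vector(tokens, feature_words, corpus):
    """Represent one tokenized text as a TF-IDF vector over the feature words."""
    vector = []
    for word in feature_words:
        # TF: occurrences of the word divided by the text's word count.
        tf = tokens.count(word) / len(tokens) if tokens else 0.0
        # IDF (assumed form): log of training texts over texts with the word.
        containing = sum(1 for doc in corpus if word in doc)
        idf = math.log(len(corpus) / containing) if containing else 0.0
        vector.append(tf * idf)
    return vector


corpus = [
    ["subway", "noise", "loud"],
    ["subway", "wifi"],
    ["subway", "speed", "fast"],
    ["noise", "ear"],
]
features = ["subway", "noise", "wifi"]
vec = tf_idf_vector(corpus[0], features, corpus)
```

The resulting vectors are what the SVM classifier in S14 and the K-means clustering in step 4 would consume.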

A system for dynamically acquiring metro passenger demand, characterized by comprising a data acquisition module, a text preprocessing module, a text filtering module, a text clustering module, a demand extraction module, a new demand evaluation module and a demand dictionary;

The demand dictionary stores demand items related to metro vehicle passenger demand;

The data acquisition module obtains post data from the social network platform;

The text preprocessing module preprocesses the acquired texts;

The text filtering module filters out texts unrelated to passenger demand;

The text clustering module clusters the filtered text data by relevance;

The demand extraction module extracts the demand item in each cluster;

The new demand evaluation module judges whether a demand item is already in the demand dictionary, and updates the demand dictionary.

The beneficial effects of the invention are:

(1) The invention obtains a large number of user posts by web crawler and derives passenger demand from them, which improves the efficiency of demand acquisition and reduces subjectivity;

(2) The invention can analyse the dynamic demands of massive numbers of users in real time, continuously capture passenger demand preferences, and derive effective passenger demand importance from them;

(3) The invention can discover emerging and latent user demands automatically and in real time.

Brief description of the drawings

Fig. 1 is a flow chart of the method of the invention.

Fig. 2 shows the silhouette coefficient calculation results in an embodiment of the invention.

Fig. 3 is a structural diagram of the system of the invention.

Fig. 4 shows the trend of passenger demand in an embodiment of the invention.

Detailed description of the embodiments

The invention is further described below with reference to the drawings and specific embodiments.

As shown in Fig. 1, a method for dynamically acquiring metro passenger demand comprises the following steps:

Step 1: user posts are obtained from a social network platform based on the demand dictionary. The demand dictionary is a set of words related to metro vehicle passenger demand, including passenger demand items, rail vehicle product names and so on. Words in the demand dictionary, such as "subway speed", are used as keywords to search the social network platform for related user posts, and the text data are then obtained by web crawling. This embodiment searches with keywords such as "subway wifi", "subway speed" and "subway smoothness".

The passenger demand items (e.g., speed) and metro vehicle product names (e.g., subway) stored in the demand dictionary are predefined according to terms in actual use, and can be continually enriched by the subsequent steps of the technical solution.

Step 2: preprocess the data obtained in step 1;

Preprocessing comprises preliminary filtering, word segmentation and part-of-speech tagging of the acquired posts. It is carried out in three steps:

1) Filtering rules are formulated according to the characteristics of posts on the social platform, and the texts are preliminarily filtered according to those rules. The filtering rules, which are the basis of the preliminary filtering, are written as production rules. Whether a text is filtered is decided by checking whether it contains noise characters (e.g., #, []).

2) The preliminarily filtered texts are segmented into words and tagged with parts of speech. Word segmentation splits a text into individual words; part-of-speech tagging attaches a label such as noun or verb to each word obtained.

3) Words without substantive meaning are filtered out in two parts: first, stop words are removed using an existing stop-word list; second, words other than nouns, verbs and adjectives, such as adverbs and pronouns, are removed according to their part of speech.

Step 3: after step 1 and step 2 the noise has been preliminarily filtered, but a large amount of noise remains. These noisy texts are ones whose main subject is not the metro vehicle but which contain a keyword used for retrieval in step 1. For example, "The speed at which the auntie grabbed a seat on the subway astonished me" does not reflect any passenger demand on metro vehicles. Filtering out this noise can be treated as binary classification of the texts, and proceeds in the following steps:

S11: randomly sample the preprocessed texts and manually label them to produce training samples and test samples. Sampling follows two principles: first, the sample must cover the content retrieved by every keyword in step 1; second, the number of texts sampled from each keyword's results must be proportional to the number of texts that keyword retrieved.
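
The two sampling principles amount to proportional stratified sampling by keyword. A minimal sketch follows; the function name, the fixed seed and the rule of taking at least one text per non-empty keyword are assumptions made for illustration.

```python
import random


def proportional_sample(texts_by_keyword, sample_size, seed=0):
    """Draw texts per keyword in proportion to how many each keyword retrieved."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    total = sum(len(texts) for texts in texts_by_keyword.values())
    if total == 0:
        return []
    sample = []
    for keyword, texts in texts_by_keyword.items():
        if not texts:
            continue
        # At least one text per keyword so every keyword is represented.
        quota = max(1, round(sample_size * len(texts) / total))
        sample.extend(rng.sample(texts, min(quota, len(texts))))
    return sample


retrieved = {
    "subway wifi": [f"wifi-{i}" for i in range(60)],
    "subway speed": [f"speed-{i}" for i in range(30)],
    "subway smoothness": [f"smooth-{i}" for i in range(10)],
}
sample = proportional_sample(retrieved, 10)
```

The sampled texts would then be labelled by hand as related or unrelated and split into training and test sets.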

S12: based on the training samples, select feature words that can distinguish related texts from unrelated texts, such as "auntie" and "grab a seat". Information-gain feature selection is used:

Information gain is a feature selection method that determines feature words according to the amount of information a word carries. The amount of information is expressed by information entropy, which is calculated as follows:

The information gain IG(word) of each word is calculated as follows:

where word is a word in the training sample set, A and B are the frequencies with which the word appears in related and unrelated texts, respectively, and C and D are the frequencies with which the word does not appear in related and unrelated texts, respectively.

The words are sorted by information gain in descending order, and those with larger values are selected as feature words. Table 1 shows some results from this embodiment:

Table 1. Ranking by information gain value

Rank	Word	Information gain value
1	Grab a seat	0.9744340029
2	Auntie	0.9631205685
3	Check in	0.8819280948
4	Sprint	0.8529583405
5	Transfer	0.8329984805
…	…	…

S13: calculate the feature values of the feature words, and represent each text as a feature value vector;

Term frequency-inverse document frequency (TF-IDF) is a feature value calculation that jointly considers the frequency of a word within one text (TF) and its frequency in other texts (IDF). It is calculated as follows:

TF-IDF(word)=TF(word)×IDF(word)

Wherein:

S14: build the SVM classifier from the training samples, and classify the test samples to train it;

According to the test result for each test sample, the training samples are expanded to increase their coverage of different kinds of noise, thereby improving the classifier.

Step 4: perform K-means clustering on the data filtered in step 3, and determine the optimal number of clusters k by the silhouette coefficient.

The K-means clustering process is as follows:

K-means groups texts according to the distance between them. The distance between texts, which represents their degree of relatedness, is measured by Euclidean distance. Determine the sum of squared distances dist(S_k) from each point in a cluster to the cluster centre:

where u_k is:

The objective of K-means clustering is to minimize the sum of squared distances from all samples in the clustering domain to their cluster centres. That sum, dist(S), is:

The silhouette coefficient measures a clustering result by combining two factors, cohesion and separation. The larger the silhouette coefficient, the better the clustering, and vice versa. It is calculated as follows:

where a(x_i) is the average distance between text x_i and all other texts in its own cluster, which quantifies cohesion within the cluster; b(x_i) is the average distance between text x_i and all texts in another cluster, obtained by traversing every other cluster and taking the nearest average distance, which quantifies separation between clusters.

The number of clusters is determined from the mean silhouette coefficient of the whole text set, L(x)_k:

where N is the number of texts in the whole text set;

Fig. 2 shows some results from this embodiment. As can be seen from Fig. 2, the mean silhouette coefficient is largest when k = 4, i.e. the K-means clustering result is best.
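
Choosing k by the mean silhouette coefficient can be sketched in plain Python as below. The Euclidean distance and the silhouette definition follow the description; the deterministic farthest-point initialization, the iteration cap and the toy 2-D points (standing in for TF-IDF vectors) are assumptions made so the sketch is repeatable.

```python
import math


def init_centers(points, k):
    """Farthest-point initialization (an assumption; the text names none)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=50):
    """Plain K-means with Euclidean distance; returns one label per point."""
    centers = init_centers(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster happens to become empty.
            new_centers.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def mean_silhouette(points, labels):
    """Average of (b - a) / max(a, b) over all points."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], [])
        if not own or not dists:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_k(points, candidates):
    """Return the candidate k with the largest mean silhouette coefficient."""
    return max(candidates, key=lambda k: mean_silhouette(points, kmeans(points, k)))


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
chosen = best_k(points, [2, 3, 4])
```

In the embodiment the same comparison over candidate values of k is what Fig. 2 plots, with the maximum at k = 4.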

Step 5: assign each cluster from step 4 a label as a demand item, and calculate the importance of the demand item;

The label, which serves as the demand item, is extracted from the words appearing in each cluster: the words in a cluster are sorted by number of occurrences in descending order, the most frequent ones are recommended to an engineer, and the engineer summarizes from them a label for the cluster, which is the demand item. Table 2 shows some results for one cluster in this embodiment; "metro noise" can be chosen as the demand item.

Table 2. Word occurrence counts

Rank	Word	Occurrences
1	Subway	541
2	Ear	426
3	Noise	346
4	Sound	312
…	…	…
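
A ranking like Table 2 can be produced with a simple word count over the tokenized texts of one cluster. The sketch below uses toy data and a hypothetical function name; the engineer's choice of the final label is not automated.

```python
from collections import Counter


def top_words(cluster_texts, n):
    """Count word occurrences across one cluster and return the n most frequent."""
    counts = Counter(word for tokens in cluster_texts for word in tokens)
    return counts.most_common(n)


cluster = [
    ["subway", "noise", "ear"],
    ["subway", "noise"],
    ["subway"],
]
ranking = top_words(cluster, 2)
```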

The importance of a passenger demand item is measured by the relative propagation heat of that demand item. The propagation heat is calculated as:

w₁, w₂ and w₃ are the weights of forwards, likes and comments, respectively, and satisfy w₁+w₂+w₃=1.

To prevent repeated posts by the same user from distorting the result, the propagation heat is corrected by the propagation range g_k=l_s/n_s, where l_s is the number of users who posted in the cluster. The corrected propagation heat is:

r′_k=r_k×g_k

where r′_k is the corrected propagation heat, g_k is the propagation range, and l_s is the number of users who posted in the cluster;

The relative propagation heat, i.e. the importance, is calculated as:

Step 6: for each demand item obtained in step 5, first determine whether it already exists in the demand dictionary; if so, exit; if not, determine whether its importance and its relative propagation persistence both meet preset thresholds; if they do, add it to the demand dictionary; otherwise, exit.

The acquired demand is evaluated using the propagation heat and propagation persistence of the demand item to judge whether it is a new demand. The acquired passenger demand items may include items that are not in the demand dictionary; these are judged by their propagation heat and propagation persistence to decide whether they can be added to the demand dictionary as new demand items. This is done in two steps:

1) The acquired demand items are matched against the demand items already in the demand dictionary to determine whether any of them is missing from the dictionary;

2) The relative propagation heat and relative propagation persistence of the demand item are compared with preset thresholds. Propagation persistence measures how long a new demand continues to propagate.

The propagation persistence j_k is:

where r′_k0, r′_k1 and r′_k2 are the propagation heats obtained in three consecutive time periods, r′_k0 being the propagation heat obtained this time. Acquisition is dynamic: data are obtained automatically from the social network platform at regular intervals. "This time" refers to the acquisition period in which the emerging, latent demand was discovered; r′_k1 and r′_k2 are the propagation heats in the first and second acquisition periods that follow.

The relative propagation persistence J_k is:

Only when the relative propagation heat and the relative propagation persistence of a new demand both exceed the set thresholds can it become a candidate new demand, which is then judged manually.
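
The decision logic of step 6 can be sketched as a small function. The threshold values, the function name and the returned labels are hypothetical; the relative propagation heat and persistence are assumed to have been computed already, and a "candidate" result still goes to an engineer for the final judgement.

```python
def evaluate_demand_item(item, heat, persistence, dictionary,
                         heat_threshold, persistence_threshold):
    """Classify a demand item as known, candidate new demand, or rejected."""
    if item in dictionary:
        return "known"  # already in the demand dictionary: nothing to do
    if heat > heat_threshold and persistence > persistence_threshold:
        return "candidate"  # both thresholds exceeded: pass to manual review
    return "rejected"


# Hypothetical dictionary contents and threshold values.
demand_dictionary = {"subway speed", "subway wifi"}
result_known = evaluate_demand_item(
    "subway wifi", 0.30, 0.40, demand_dictionary, 0.10, 0.20)
result_new = evaluate_demand_item(
    "metro noise", 0.25, 0.35, demand_dictionary, 0.10, 0.20)
result_weak = evaluate_demand_item(
    "seat colour", 0.25, 0.05, demand_dictionary, 0.10, 0.20)
```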

A system for dynamically acquiring metro passenger demand can be built according to the above method. It comprises a data acquisition module, a text preprocessing module, a text filtering module, a text clustering module, a demand extraction module, a new demand evaluation module and a demand dictionary. For better management it further includes a demand viewing module and a demand dictionary management module, through which engineers use and maintain the system.

The demand dictionary stores demand items related to metro vehicle passenger demand, specifically words related to metro passenger demand.

The data acquisition module obtains post data from the social network platform, using the demand items in the demand dictionary as keywords to crawl related posts. The acquisition frequency of the module can also be set so that data are obtained in real time.

The text preprocessing module preprocesses the acquired texts: it preliminarily filters them with the filtering method described above, performs word segmentation and part-of-speech tagging on the filtered texts, and then filters out words without substantive meaning based on the stop-word list and part of speech.

The text filtering module filters out texts unrelated to passenger demand. It uses information-gain feature selection to obtain feature words that can identify the text type, computes their feature values with TF-IDF, vectorizes the feature values of each text as input to the SVM classifier, and outputs the filtered texts.

The text clustering module clusters the filtered text data by relevance, using the K-means algorithm and determining the number of clusters by the mean silhouette coefficient.

The demand extraction module extracts the passenger demand item in each cluster. It counts how often each word occurs in each cluster and recommends the more frequent words to an engineer, who assigns the demand item name. The importance of a demand item is determined by its relative propagation heat.

The new demand evaluation module judges whether a demand item is already in the demand dictionary, and updates the demand dictionary. The relative propagation heat and relative propagation persistence of a new demand are compared with the set thresholds; demands that meet the thresholds are recommended to an engineer, who judges whether each is a new demand, and new demands are stored in the demand dictionary to update it.

A demand viewing module and a demand dictionary management module can also be provided. The demand viewing module uses a display to provide a visual interface for extracting, evaluating and viewing passenger demand. Demand extraction and evaluation are carried out as in the corresponding steps and are not repeated here. In addition, the demand items extracted by the demand extraction module and their calculated importance are displayed as curves, for example for metro vehicles in the form shown in Fig. 4. In the figure, curve A is the importance of subway wifi over time, B that of subway smoothness, C that of subway speed, and D that of metro noise.

The demand dictionary management module maintains the demand dictionary: it continually enriches the dictionary with newly acquired demands, and demands can also be modified and deleted.

Current methods of acquiring metro vehicle passenger demand consume a great deal of manpower and are highly subjective. To address this, the invention proposes a method and system for dynamically acquiring metro passenger demand based on a social network platform. Text mining techniques from data mining are used to extract, from the posts of social network platform users, texts that reflect passengers' demands on metro vehicles. Compared with traditional methods, the invention can automatically analyse a large volume of user posts, obtain latent passenger demands, improve the efficiency of user data acquisition and reduce the influence of subjectivity. It can analyse the dynamic demands of massive numbers of users in real time, continuously capture passenger demand preferences and extract effective passenger demand importance from them. It can also discover emerging and latent user demands automatically and in real time, as driving factors for metro vehicle research and development.

Claims

1. A method for dynamically acquiring metro passenger demand, characterized by comprising the following steps:

Step 2: preprocess the data obtained in step 1;

Step 4: cluster the texts filtered in step 3 by relevance, using K-means clustering corrected by the silhouette coefficient;

Step 6: for each demand item obtained in step 5, first determine whether it already exists in the demand dictionary; if so, exit; if not, determine whether its importance and its relative propagation persistence both meet preset thresholds; if they do, a new demand item has been found and is added to the demand dictionary; otherwise, exit.

2. The method for dynamically acquiring metro passenger demand according to claim 1, characterized in that the data acquisition in step 1 proceeds as follows:

Words in the demand dictionary are used as keywords to search the social network platform for user posts, and the text data are obtained by a web crawler.

3. The method for dynamically acquiring metro passenger demand according to claim 1, characterized in that step 3 proceeds as follows:

S12: label related and unrelated texts in the training samples and determine the feature words: calculate the information entropy of the training samples and the information gain of each word, and take words whose gain exceeds a set threshold as feature words;

The information entropy IG(X) of the training samples is calculated as follows:

where word is a word in the training sample set, A and B are the frequencies with which the word appears in related and unrelated texts, respectively, and C and D are the frequencies with which the word does not appear in related and unrelated texts, respectively;

S15: use the support vector classifier obtained in step S14 to classify the data into demand-related texts and unrelated texts, and remove the unrelated texts.

4. The method for dynamically acquiring metro passenger demand according to claim 3, characterized in that the silhouette-coefficient-corrected K-means clustering in step 4 first performs K-means clustering and then determines the optimal number of clusters k by the silhouette coefficient;

The K-means clustering process is as follows:

where S_k is the set of texts in a cluster, x_i is the feature value vector of a text in cluster S_k, n_s is the number of texts in cluster S_k, u_k is the centre of cluster S_k, and i indexes the texts in the cluster;

where u_k is:

The silhouette coefficient L(x_i) is:

where a(x_i) is the average distance between text x_i and all other texts in its own cluster, and b(x_i) is the average distance between text x_i and all texts in a cluster other than its own;

The mean silhouette coefficient L(x)_k is:

where N is the number of texts in the whole text set;

5. The method for dynamically acquiring metro passenger demand according to claim 1, characterized in that the importance in step 5 is calculated as follows:

S21: the propagation heat r_k is:

where n_s is the number of texts in each cluster, Z_i is the number of forwards of the i-th text in the cluster, D_i is the number of likes of the i-th text in the cluster, P_i is the number of comments on the i-th text in the cluster, w₁, w₂ and w₃ are constants, and k is the number of clusters;

S22: the propagation heat is corrected by the propagation range:

r′_k=r_k×g_k

where r′_k is the corrected propagation heat, g_k is the propagation range, g_k=l_s/n_s, and l_s is the number of users who posted in the cluster;

S23: the importance R_k is calculated as follows:

where S is the whole text collection, r′_i is the corrected propagation heat of the i-th demand item, and i indexes the demand items.

6. The method for dynamically acquiring metro passenger demand according to claim 5, characterized in that the relative propagation persistence in step 6 is calculated as follows:

S31: the propagation persistence j_k is:

where r′_k0, r′_k1 and r′_k2 are the propagation heats obtained in three consecutive time periods, r′_k0 being the propagation heat obtained this time;

S32: the relative propagation persistence J_k is:

7. The method for dynamically acquiring metro passenger demand according to claim 3, characterized in that the feature value in step S13 is measured by term frequency-inverse document frequency (TF-IDF), calculated as follows:

TF-IDF(word)=TF(word)×IDF(word)

where TF(word) is the frequency with which a word occurs within one text, and IDF(word) is the inverse document frequency of that word across the text collection;

Wherein:

where W(word) is the number of times the word occurs in a text, W is the total number of words in that text, F is the total for the training samples, and F(word) is the number of occurrences of the word in the training samples.

8. An acquisition system using the method for dynamically acquiring metro passenger demand according to any one of claims 1 to 7, characterized by comprising a data acquisition module, a text preprocessing module, a text filtering module, a text clustering module, a demand extraction module, a new demand evaluation module and a demand dictionary;

The text preprocessing module preprocesses the acquired texts;