CN110347828A - Method and system for dynamically acquiring metro passenger demand

Method and system for dynamically acquiring metro passenger demand

Info

Publication number
CN110347828A
Authority
CN
China
Prior art keywords
text
cluster
demand
word
follows
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910561357.XA
Other languages
Chinese (zh)
Other versions
CN110347828B (en)
Inventor
黎荣
黎伟洋
王建
丁国富
张义军
韩鑫
郑宇飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Jiaotong University
Priority to CN201910561357.XA
Publication of CN110347828A
Application granted
Publication of CN110347828B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention discloses a method and a system for dynamically acquiring metro passenger demand. The method comprises the following steps. Step 1: build a demand dictionary and obtain user post data from a social network platform. Step 2: preprocess the obtained data. Step 3: use an SVM classifier to filter out texts unrelated to metro passenger demand. Step 4: cluster the remaining texts by relevance. Step 5: assign each cluster a label that serves as a demand item, and calculate the importance of the demand item. Step 6: first determine whether the demand item is already in the demand dictionary; if so, exit; if not, determine whether its importance and its relative propagation persistence both meet preset thresholds; if they do, a new demand item has been found and is added to the demand dictionary; otherwise, exit. The invention can process a large volume of user posts, improves the efficiency of acquiring user demand, and has low subjectivity; demand preferences and potential user demand can be obtained in real time from massive numbers of user posts.

Description

Method and system for dynamically acquiring metro passenger demand
Technical field
The invention relates to the acquisition of metro passenger demand, and in particular to a method and a system for dynamically acquiring metro passenger demand.
Background
Over the past ten-plus years, railway transport capacity has grown steadily and passenger volume has risen with it. Growth in metro and high-speed-rail passenger volume and turnover will further extend the mileage of rail lines and increase orders for rail vehicles, which brings both opportunities and challenges to metro vehicle manufacturers. The customers of a rail vehicle manufacturer include both operating companies and passengers. At present, however, manufacturers focus mainly on the requirements of operating companies and rarely analyze passenger demand, which lowers end customers' satisfaction with the manufacturers' products and weakens the manufacturers' market competitiveness.
Passenger demand comprises passenger demand items and their importance, both of which change over time. Existing demand acquisition methods, such as questionnaires, require a great deal of labor to capture this changing demand and are highly subjective, which limits the ability of rail vehicle manufacturers to analyze passenger demand.
Summary of the invention
The invention provides a method and a system for dynamically acquiring metro passenger demand that acquire data efficiently and with low subjectivity.
The technical solution adopted by the invention is a method for dynamically acquiring metro passenger demand, comprising the following steps:
Step 1: build a demand dictionary, and obtain user post data from a social network platform according to the demand dictionary;
Step 2: preprocess the data obtained in step 1;
Step 3: use an SVM classifier to filter out texts unrelated to metro passenger demand;
Step 4: cluster the texts retained in step 3 by relevance, using K-means clustering corrected by the silhouette coefficient;
Step 5: assign each cluster from step 4 a label that serves as a demand item, and calculate the importance of the demand item;
Step 6: for each demand item obtained in step 5, first determine whether it is already in the demand dictionary; if so, exit; if not, determine whether its importance and its relative propagation persistence both meet preset thresholds; if they do, a new demand item has been found and is added to the demand dictionary; otherwise, exit.
Further, data is obtained in step 1 as follows:
Words in the demand dictionary are used as keywords to search the social network platform for user posts, and the text data of those posts is collected by a web crawler.
Further, step 3 proceeds as follows:
S11: randomly sample the texts preprocessed in step 2 to generate training samples and test samples;
S12: identify the relevant and irrelevant texts in the training samples and determine their feature words: calculate the information entropy of the training samples and the information gain of each word, and take the words whose information gain exceeds a set threshold as feature words;
The information entropy IG(X) of the training samples is calculated as follows:
$$IG(X) = -\frac{N_1}{N_1+N_2}\log_2\frac{N_1}{N_1+N_2} - \frac{N_2}{N_1+N_2}\log_2\frac{N_2}{N_1+N_2}$$
where X is the training sample set, and N_1 and N_2 are the numbers of relevant and irrelevant texts, respectively;
The information gain IG(word) of each word is calculated as follows:
$$IG(word) = IG(X) + \frac{A+B}{N_1+N_2}\left(\frac{A}{A+B}\log_2\frac{A}{A+B} + \frac{B}{A+B}\log_2\frac{B}{A+B}\right) + \frac{C+D}{N_1+N_2}\left(\frac{C}{C+D}\log_2\frac{C}{C+D} + \frac{D}{C+D}\log_2\frac{D}{C+D}\right)$$
where word is a word in the training sample set, A and B are the numbers of relevant and irrelevant texts in which the word appears, and C and D are the numbers of relevant and irrelevant texts in which the word does not appear;
S13: calculate the feature value of each feature word in each text, and represent each text as a feature value vector;
S14: build an SVM classifier from the training samples, and improve the classifier using the test samples;
S15: classify the data with the support vector classifier obtained in step S14 into demand-relevant texts and irrelevant texts, and remove the irrelevant texts.
Further, the silhouette-corrected K-means clustering in step 4 first clusters the texts by K-means and then determines the optimal number of clusters k from the silhouette coefficient;
The K-means clustering proceeds as follows:
The sum of squared distances dist(S_k) from each point in a cluster to the cluster centre is:
$$dist(S_k) = \sum_{i=1}^{n_s} \lVert x_i - u_k \rVert^2$$
where S_k is the text set of a cluster, x_i is the feature value vector of a text in cluster S_k, n_s is the number of texts in cluster S_k, u_k is the centre of cluster S_k, and i is the index of a text within the cluster;
The cluster centre u_k is:
$$u_k = \frac{1}{n_s}\sum_{i=1}^{n_s} x_i$$
The sum of squared distances dist(S) from all samples in the clustering domain to their cluster centres is:
$$dist(S) = \sum_{j=1}^{k} dist(S_j)$$
where k is the number of clusters, S is the total text set, and j is the index of a cluster in the text set;
The silhouette coefficient L(x_i) is:
$$L(x_i) = \frac{b(x_i) - a(x_i)}{\max\{a(x_i),\, b(x_i)\}}$$
where a(x_i) is the average distance between text x_i and all other texts in its own cluster, and b(x_i) is the average distance between text x_i and all texts in a cluster other than its own;
The mean silhouette coefficient L(x)_k is:
$$L(x)_k = \frac{1}{N}\sum_{i=1}^{N} L(x_i)$$
where N is the number of texts in the whole text set;
The number of clusters k at which the mean silhouette coefficient is largest is the optimal number of clusters.
Further, the importance in step 5 is calculated as follows:
S21: the propagation heat r_k is calculated as follows:
where n_s is the number of texts in each cluster, Z_i, D_i and P_i are the numbers of reposts, likes and comments of the i-th text in the cluster, w_1, w_2 and w_3 are constants, and k is the number of clusters;
S22: the propagation heat is corrected by the propagation breadth:
$$r'_k = r_k \times g_k$$
where r'_k is the corrected propagation heat, g_k is the propagation breadth, g_k = l_s / n_s, and l_s is the number of users who posted in each cluster;
S23: the importance R_k is calculated as follows:
where S is the total text set, r'_i is the corrected propagation heat of the i-th demand item, and i is the index of a demand item.
Further, the relative propagation persistence in step 6 is calculated as follows:
S31: the propagation persistence j_k is calculated as follows:
where r'_{k0}, r'_{k1} and r'_{k2} are the propagation heats obtained in three consecutive periods, r'_{k0} being the propagation heat obtained in the current period;
S32: the relative propagation persistence J_k is:
where S is the total text set, j_i is the propagation persistence of the i-th demand item, and i is the index of a demand item.
Further, the feature value in step S13 is measured by term frequency-inverse document frequency (TF-IDF), which is calculated as follows:
$$TF\text{-}IDF(word) = TF(word) \times IDF(word)$$
where TF(word) is the frequency with which a word occurs in a text, and IDF(word) is the inverse document frequency of the word in the text set;
Wherein:
where W(word) is the number of times the word occurs in a text, W is the total number of words in that text, F is the total number of words in the training samples, and F(word) is the number of times the word occurs in the training samples.
A system for dynamically acquiring metro passenger demand, characterized by comprising a data acquisition module, a text preprocessing module, a text filtering module, a text clustering module, a demand extraction module, a new demand evaluation module and a demand dictionary;
The demand dictionary stores demand items related to metro vehicle passenger demand;
The data acquisition module obtains post data from the social network platform;
The text preprocessing module preprocesses the obtained texts;
The text filtering module filters out texts unrelated to passenger demand;
The text clustering module clusters the filtered text data by relevance;
The demand extraction module extracts the demand item of each cluster;
The new demand evaluation module determines whether a demand item is already in the demand dictionary and updates the demand dictionary.
The beneficial effects of the invention are:
(1) The invention collects a large volume of user posts with a web crawler and derives passenger demand from them, which improves the efficiency of acquiring user demand and keeps subjectivity low;
(2) The invention can analyze the changing demand of a massive number of users in real time, continuously capture passenger demand preferences, and derive effective passenger demand importance from them;
(3) The invention can discover emerging and potential user demand automatically and in real time.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Fig. 2 shows the silhouette coefficient results in an embodiment of the invention.
Fig. 3 is a structural diagram of the system of the invention.
Fig. 4 shows the trend of passenger demand in an embodiment of the invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific embodiments.
As shown in Fig. 1, a method for dynamically acquiring metro passenger demand comprises the following steps:
Step 1: build a demand dictionary, and obtain user post data from a social network platform according to the demand dictionary;
User posts are obtained from the social network platform on the basis of the demand dictionary. The demand dictionary is a set of words related to metro vehicle passenger demand, including passenger demand items, rail vehicle product names and the like. Words in the demand dictionary, such as "subway speed", are used as keywords to search the social network platform for related user posts, and the text data of those posts is then collected by a web crawler. This embodiment searches with keywords such as "subway wifi", "subway speed" and "subway steady".
The passenger demand items (e.g., speed) and metro vehicle product names (e.g., subway) stored in the demand dictionary are predefined according to actual usage, and can be continually enriched by the subsequent steps of the technical solution of the invention.
Step 2: preprocess the data obtained in step 1;
Preprocessing includes preliminary filtering, word segmentation and part-of-speech tagging of the obtained posts, and is carried out in three steps:
1) Filtering rules are formulated according to the characteristics of posts on the social platform, and the texts are preliminarily filtered according to these rules. The filtering rules, which are the basis of the preliminary filtering, are written as production rules; whether a text is filtered out is decided by checking whether it contains noise characters (e.g., #, []).
2) The preliminarily filtered texts are segmented into words and tagged with parts of speech. Word segmentation splits a text into individual words; part-of-speech tagging labels each resulting word as a noun, verb and so on.
3) Words with no substantive meaning are filtered out in two parts: first, stop words, such as " " and " ", are removed using an existing stop-word list; second, words other than nouns, verbs and adjectives, such as adverbs and pronouns, are removed on the basis of part of speech.
Step 3: use an SVM classifier to filter out texts unrelated to metro passenger demand;
After steps 1 and 2, some noise has been filtered out, but the texts still contain a great deal of noise. These noisy texts contain the keywords used for retrieval in step 1, but their main subject is not the metro vehicle. For example, "the speed at which aunties on the subway grab seats astonished me" does not reflect any passenger demand on the metro vehicle. Filtering out such texts can be treated as binary text classification, carried out in the following steps:
S11: randomly sample the texts preprocessed in step 2 to generate training samples and test samples;
The preprocessed texts are randomly sampled, and the training and test samples are labelled manually. Sampling must follow two principles: first, the sample must cover the content retrieved by every keyword in step 1; second, the number of texts sampled from the content retrieved by each keyword must be proportional to the amount of content that keyword retrieved.
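The two sampling principles can be sketched as proportional allocation across keywords. This is only an illustration: the function name, the rounding rule and the minimum of one text per keyword are assumptions, and the manual labelling that follows is outside the sketch.

```python
import random


def proportional_sample(results_by_keyword, sample_size, seed=0):
    """Draw texts from every keyword's results, in proportion to their volume.

    results_by_keyword maps a keyword to the list of texts it retrieved.
    Each keyword contributes at least one text so that all keywords are covered.
    """
    rng = random.Random(seed)
    total = sum(len(texts) for texts in results_by_keyword.values())
    sample = []
    for texts in results_by_keyword.values():
        quota = max(1, round(sample_size * len(texts) / total))
        sample.extend(rng.sample(texts, min(quota, len(texts))))
    return sample


results = {
    "subway wifi": [f"wifi post {i}" for i in range(60)],
    "subway speed": [f"speed post {i}" for i in range(30)],
    "subway steady": [f"steady post {i}" for i in range(10)],
}
sample = proportional_sample(results, sample_size=10)
```

With 60, 30 and 10 retrieved texts and a sample size of 10, the quotas are 6, 3 and 1.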
S12: identify the relevant and irrelevant texts in the training samples and determine their feature words: calculate the information entropy of the training samples and the information gain of each word, and take the words whose information gain exceeds a set threshold as feature words;
Based on the training samples, feature words that can distinguish relevant texts from irrelevant texts, such as "auntie" and "grab seat", are selected by information-gain feature selection:
Information gain is a feature selection method that determines feature words according to the amount of information a word carries. The amount of information is expressed as information entropy, which is calculated as follows:
$$IG(X) = -\frac{N_1}{N_1+N_2}\log_2\frac{N_1}{N_1+N_2} - \frac{N_2}{N_1+N_2}\log_2\frac{N_2}{N_1+N_2}$$
where X is the training sample set, and N_1 and N_2 are the numbers of relevant and irrelevant texts, respectively;
The information gain IG(word) of each word is calculated as follows:
$$IG(word) = IG(X) + \frac{A+B}{N_1+N_2}\left(\frac{A}{A+B}\log_2\frac{A}{A+B} + \frac{B}{A+B}\log_2\frac{B}{A+B}\right) + \frac{C+D}{N_1+N_2}\left(\frac{C}{C+D}\log_2\frac{C}{C+D} + \frac{D}{C+D}\log_2\frac{D}{C+D}\right)$$
where word is a word in the training sample set, A and B are the numbers of relevant and irrelevant texts in which the word appears, and C and D are the numbers of relevant and irrelevant texts in which the word does not appear.
The words are sorted by information gain in descending order, and those with the largest values are selected as feature words. Table 1 shows part of the results of this embodiment:
Table 1. Ranking by information gain
Rank    Word         Information gain
1       Grab seat    0.9744340029
2       Auntie       0.9631205685
3       Check in     0.8819280948
4       Sprint       0.8529583405
5       Transfer     0.8329984805
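As a sketch of information-gain feature selection for the two-class case, the function below computes the gain of one word from the four counts A, B, C and D. It assumes these are counts of texts (relevant or irrelevant, containing or not containing the word) and uses base-2 logarithms; the function names are illustrative.

```python
import math


def entropy(n1: int, n2: int) -> float:
    """Two-class information entropy in bits; empty classes contribute 0."""
    total = n1 + n2
    h = 0.0
    for n in (n1, n2):
        if n > 0:
            p = n / total
            h -= p * math.log2(p)
    return h


def information_gain(a: int, b: int, c: int, d: int) -> float:
    """Information gain of a word.

    a, b: relevant / irrelevant texts that contain the word
    c, d: relevant / irrelevant texts that do not contain the word
    """
    total = a + b + c + d
    conditional = 0.0
    if a + b:
        conditional += (a + b) / total * entropy(a, b)
    if c + d:
        conditional += (c + d) / total * entropy(c, d)
    return entropy(a + c, b + d) - conditional


# A word that occurs only in irrelevant texts separates the classes perfectly.
perfect = information_gain(0, 50, 50, 0)
# A word that occurs equally often in both classes carries no information.
useless = information_gain(25, 25, 25, 25)
```

Words would then be ranked by this value and those above the chosen threshold kept as feature words.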
S13: calculate the feature value of each feature word, and represent each text as a feature value vector;
Term frequency-inverse document frequency (TF-IDF) is a feature value calculation method that jointly considers how often a word occurs in one text (TF) and how often it occurs in the other texts (IDF). It is calculated as follows:
$$TF\text{-}IDF(word) = TF(word) \times IDF(word)$$
where TF(word) is the frequency with which a word occurs in a text, and IDF(word) is the inverse document frequency of the word in the text set;
Wherein:
where W(word) is the number of times the word occurs in a text, W is the total number of words in that text, F is the total number of words in the training samples, and F(word) is the number of times the word occurs in the training samples.
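The sketch below turns a text into a feature value vector using one common TF-IDF variant. It is an assumption, not the patent's exact definition: TF is taken as the word's count divided by the text length, and IDF as the natural logarithm of the number of texts divided by the number of texts containing the word.

```python
import math


def tf_idf(word, text_words, corpus):
    """TF-IDF of a word for one text, given the corpus as lists of words."""
    tf = text_words.count(word) / len(text_words)
    containing = sum(1 for doc in corpus if word in doc)
    idf = math.log(len(corpus) / containing) if containing else 0.0
    return tf * idf


corpus = [
    ["subway", "noise", "loud"],
    ["subway", "fast"],
    ["subway", "noise"],
    ["seat", "grab"],
]
feature_words = ["noise", "seat"]
# Feature value vector of the first text over the chosen feature words.
vector = [tf_idf(w, corpus[0], corpus) for w in feature_words]
```

The resulting vectors, one per text, are what the classifier and the clustering step would take as input.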
S14: build an SVM classifier from the training samples, and classify the test samples to evaluate it;
According to the test result for each test sample, the training samples are expanded so that they cover more kinds of noise, which improves the classifier.
S15: classify the data with the support vector classifier obtained in step S14 into demand-relevant texts and irrelevant texts, and remove the irrelevant texts.
Step 4: cluster the texts retained in step 3 by relevance, using K-means clustering corrected by the silhouette coefficient;
K-means clustering is applied to the data retained in step 3, and the optimal number of clusters k is determined from the silhouette coefficient.
The K-means clustering proceeds as follows:
K-means groups texts according to the distance between them. The distance between texts represents their degree of relatedness and is measured by the Euclidean distance. The sum of squared distances dist(S_k) from each point in a cluster to the cluster centre is:
$$dist(S_k) = \sum_{i=1}^{n_s} \lVert x_i - u_k \rVert^2$$
where S_k is the text set of a cluster, x_i is the feature value vector of a text in cluster S_k, n_s is the number of texts in cluster S_k, u_k is the centre of cluster S_k, and i is the index of a text within the cluster;
The cluster centre u_k is:
$$u_k = \frac{1}{n_s}\sum_{i=1}^{n_s} x_i$$
The goal of K-means clustering is to minimize the sum of squared distances from all samples in the clustering domain to their cluster centres. This sum, dist(S), is:
$$dist(S) = \sum_{j=1}^{k} dist(S_j)$$
where k is the number of clusters, S is the total text set, and j is the index of a cluster in the text set;
The silhouette coefficient measures a clustering result by combining two factors, cohesion and separation. The larger the silhouette coefficient, the better the clustering, and vice versa. It is calculated as follows:
$$L(x_i) = \frac{b(x_i) - a(x_i)}{\max\{a(x_i),\, b(x_i)\}}$$
where a(x_i) is the average distance between text x_i and all other texts in its own cluster, which quantifies cohesion within the cluster; b(x_i) is the average distance between text x_i and all texts in another cluster, taken as the smallest such average over all other clusters, which quantifies separation between clusters.
The number of clusters is determined from the mean silhouette coefficient of the whole text set, L(x)_k:
$$L(x)_k = \frac{1}{N}\sum_{i=1}^{N} L(x_i)$$
where N is the number of texts in the whole text set;
The number of clusters k at which the mean silhouette coefficient is largest is the optimal number of clusters.
Fig. 2 shows part of the results of this embodiment. As can be seen from Fig. 2, the mean silhouette coefficient is largest when k is 4, that is, the K-means result is best at that value.
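The selection of k by mean silhouette coefficient can be sketched in plain Python as follows. The source does not say how the initial cluster centres are chosen; this sketch seeds them with evenly spaced points from the input, which is an assumption, and it treats the silhouette of a single-member cluster as 0. The toy 2-D points stand in for feature value vectors.

```python
import math


def dist(p, q):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, k, iterations=50):
    """Plain Lloyd iteration; returns the cluster label of every point."""
    # Assumed seeding: k evenly spaced points from the input list.
    centres = [list(points[i * len(points) // k]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist(p, centres[j])) for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centres[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def mean_silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points."""
    total = 0.0
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            continue  # single-member cluster: silhouette taken as 0
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, label in zip(points, labels) if label == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, candidates):
    """Candidate k with the largest mean silhouette coefficient."""
    return max(candidates, key=lambda k: mean_silhouette(points, k_means(points, k)))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
chosen_k = best_k(points, [2, 3, 4])
```

On this toy data, which has three well-separated groups, the mean silhouette coefficient peaks at k = 3.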
Step 5: assign each cluster from step 4 a label that serves as a demand item, and calculate the importance of the demand item;
A label is extracted from each cluster to serve as its demand item. The words in the cluster are sorted by number of occurrences in descending order, the most frequent words are recommended to an engineer, and the engineer summarizes them into a label for the cluster, which is the demand item. Table 2 shows part of the results for one cluster in this embodiment, for which "metro noise" can be chosen as the demand item.
Table 2. Word occurrence counts
Rank    Word      Occurrences
1       Subway    541
2       Ear       426
3       Noise     346
4       Sound     312
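Recommending the most frequent words of a cluster to the engineer amounts to a frequency count. A minimal sketch, assuming each text in the cluster is already a list of words (the function name is illustrative):

```python
from collections import Counter


def recommend_label_words(cluster_texts, top_n=4):
    """Return the top_n most frequent words in a cluster with their counts."""
    counts = Counter(word for text in cluster_texts for word in text)
    return counts.most_common(top_n)


cluster = [
    ["subway", "noise", "ear"],
    ["subway", "noise"],
    ["subway"],
]
top_words = recommend_label_words(cluster, top_n=2)
```

The final label, such as "metro noise", is still chosen by a person from the recommended words.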
The importance of a passenger demand item is measured by the relative propagation heat of that demand item. The propagation heat is calculated as follows:
where n_s is the number of texts in each cluster, Z_i, D_i and P_i are the numbers of reposts, likes and comments of the i-th text in the cluster, w_1, w_2 and w_3 are constants, and k is the number of clusters;
w_1, w_2 and w_3 are the weights of reposts, likes and comments, respectively, and satisfy w_1 + w_2 + w_3 = 1.
To prevent repeated posts by the same user from distorting the result, the propagation heat is corrected by the propagation breadth g_k = l_s / n_s, where l_s is the number of users who posted in each cluster. The corrected propagation heat is:
$$r'_k = r_k \times g_k$$
where r'_k is the corrected propagation heat, g_k is the propagation breadth, and l_s is the number of users who posted in each cluster;
The relative propagation heat, which is the importance, is calculated as follows:
where S is the total text set, r'_i is the corrected propagation heat of the i-th demand item, and i is the index of a demand item.
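The importance calculation can be sketched as below. The exact formulas for the propagation heat and the relative propagation heat are not reproduced in this text, so the sketch assumes the heat is the weighted sum of reposts, likes and comments over all texts in a cluster, and the relative heat is each cluster's corrected heat divided by the sum over all clusters. The weights and field names are hypothetical.

```python
def corrected_heat(posts, w1=0.5, w2=0.3, w3=0.2):
    """Corrected propagation heat r' = r * g for one cluster.

    posts: list of dicts with keys 'user', 'reposts', 'likes', 'comments'.
    Assumed form of r: weighted sum of interactions over all posts.
    g = (number of distinct users) / (number of posts), as in the text.
    """
    raw = sum(
        w1 * p["reposts"] + w2 * p["likes"] + w3 * p["comments"] for p in posts
    )
    breadth = len({p["user"] for p in posts}) / len(posts)
    return raw * breadth


def relative(values):
    """Assumed normalisation: each value divided by the sum of all values."""
    total = sum(values)
    return [v / total for v in values]


# One user posting twice counts half as much per post as two distinct users.
cluster_a = [
    {"user": "u1", "reposts": 10, "likes": 10, "comments": 10},
    {"user": "u1", "reposts": 10, "likes": 10, "comments": 10},
]
cluster_b = [{"user": "u2", "reposts": 10, "likes": 10, "comments": 10}]

heats = [corrected_heat(cluster_a), corrected_heat(cluster_b)]
importance = relative(heats)
```

In this example the repeated posts in the first cluster are discounted by the propagation breadth, so both clusters end up with the same corrected heat.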
Step 6: for each demand item obtained in step 5, first determine whether it is already in the demand dictionary; if so, exit; if not, determine whether its importance and its relative propagation persistence both meet preset thresholds; if they do, add it to the demand dictionary; otherwise, exit.
The acquired demand is evaluated by combining the propagation heat and the propagation persistence of the demand item, to decide whether it is a new demand. The acquired passenger demand items may include items that are not in the demand dictionary. These items are judged by their propagation heat and propagation persistence to decide whether they can be added to the demand dictionary as new demand items. This is done in two steps:
1) The acquired demand items are matched against the demand items already in the demand dictionary to determine whether any of them is not yet in the dictionary;
2) The relative propagation heat and the relative propagation persistence of the demand item are compared with preset thresholds. Propagation persistence measures how long a new demand continues to propagate.
The propagation persistence j_k is calculated as follows:
where r'_{k0}, r'_{k1} and r'_{k2} are the propagation heats obtained in three consecutive periods, r'_{k0} being the propagation heat obtained in the current period. Acquisition is dynamic: data is obtained from the social network platform automatically at regular intervals. The current period is the acquisition period in which the emerging, potential demand was found; r'_{k1} and r'_{k2} are the propagation heats of the first and second acquisition periods that immediately follow it.
The relative propagation persistence J_k is:
where S is the total text set, j_i is the propagation persistence of the i-th demand item, and i is the index of a demand item.
Only when the relative propagation heat and the relative propagation persistence of a new demand both exceed the set thresholds does it become a candidate new demand, which is then judged manually.
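The decision logic of step 6 can be sketched as a small function. The threshold values and the returned status strings are hypothetical, and the persistence value is taken as an input because its formula is not reproduced in this text.

```python
def evaluate_demand(item, importance, persistence, dictionary,
                    importance_threshold=0.2, persistence_threshold=0.2):
    """Classify a demand item as 'known', 'candidate' or 'rejected'.

    'candidate' items still need to be confirmed by an engineer before
    they are added to the demand dictionary.
    """
    if item in dictionary:
        return "known"
    if importance > importance_threshold and persistence > persistence_threshold:
        return "candidate"
    return "rejected"


demand_dictionary = {"subway speed", "subway wifi"}

status_known = evaluate_demand("subway wifi", 0.9, 0.9, demand_dictionary)
status_new = evaluate_demand("metro noise", 0.4, 0.3, demand_dictionary)
status_weak = evaluate_demand("metro smell", 0.4, 0.1, demand_dictionary)

# After manual confirmation, the candidate is added to the dictionary.
if status_new == "candidate":
    demand_dictionary.add("metro noise")
```

Keeping the manual confirmation outside the function mirrors the text, where the thresholds only produce candidates and the final judgement is made by a person.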
A kind of Metro Passenger demand dynamic acquisition system, including Data Acquisition Model, text can be constructed according to the above method Preprocessing module, text filtering module, text cluster module, requirement extract module, new demand evaluation module and demand dictionary;For Preferably be managed further include demand check module and demand dictionary management module realize engineer to systematic difference and Maintenance.
For demand dictionary for storing the relevant requirement item of railcar passenger demand, specially Metro Passenger demand is relevant Word.
Data acquisition module is for obtaining the dispatch data being used in social network-i i-platform;By the requirement item in demand dictionary Related dispatch data are grabbed in social network-i i-platform as keyword.Additionally it is possible to pass through the acquisition frequency that the module is arranged Rate obtains data in real time.
Text Pretreatment module is for pre-processing the text of acquisition;According to above-mentioned filter method to obtain text into Row primary filtration;Filtered text is subjected to participle and part-of-speech tagging, then based on deactivated vocabulary and part of speech filtering incorporeity meaning The word of justice.
Text filtering module is used to filter out in text and the incoherent text of passenger demand;With information gain feature selecting Method obtains to identify the Feature Words of text type, then obtains Feature Words with term frequency-inverse document words-frequency feature value calculating method Each text feature value vectorization is exported filtered text by SVM classifier as input by characteristic value.
Text cluster module is used to carry out correlation cluster to filtered text data;For by filtered text into Row cluster using K mean cluster algorithm, and determines with mean profile coefficient the quantity of clustering cluster.
Requirement extract module is used to extract the requirement item in each clustering cluster;It is needed for extracting the passenger in each clustering cluster It asks;The biggish word of frequency is recommended engineer, by it by calculating the frequency that each word occurs in each cluster by the module Given requirement item title.Requirement item different degree is determined by counterpropagate temperature.
New demand evaluation module is updated demand dictionary for judging whether requirement item is included in demand dictionary. By the threshold value comparison of the counterpropagate temperature of new demand and counterpropagate persistence and setting, meet threshold value recommends engineering Teacher allows engineer to judge whether it is new demand, and new demand is stored in demand dictionary, realizes the update of dictionary.
A demand viewing module and a demand dictionary management module may also be provided. The demand viewing module uses a display to provide a visual interface for extracting, evaluating and viewing passenger demands. Demand extraction and evaluation are implemented in the same way as the corresponding method steps and are not repeated here. In addition, the demand items extracted by the demand extraction module and their calculated importance are shown as curves. For a metro vehicle, for example, they are shown in the form of Fig. 4, in which curve A is the importance curve of metro Wi-Fi, B is the importance curve of metro ride smoothness, C is the importance curve of metro speed, and D is the importance curve of metro noise.
The demand dictionary management module maintains the demand dictionary. It continually enriches the dictionary with newly acquired demands, and demands can also be modified and deleted.
The present invention addresses the problem that current methods of acquiring metro vehicle passenger demand both consume a great deal of manpower and are highly subjective. It proposes a metro passenger demand dynamic acquisition method and system based on social network platforms. Using text mining techniques from data mining, it mines the posts of social network platform users for passenger demands on metro vehicles. Compared with traditional methods, it can automatically analyze a large volume of user posts, obtain latent passenger demands, improve the efficiency of acquiring user data, and reduce the influence of subjectivity. It can analyze the dynamic demands of a large number of users in real time, continuously capture passenger demand preferences, and extract valid passenger demand importance from them. It can also discover emerging and latent user demands automatically and in real time, as driving factors for metro vehicle research and development.

Claims (8)

1. A metro passenger demand dynamic acquisition method, characterized by comprising the following steps:
Step 1: build a demand dictionary, and acquire user post data from a social network platform according to the demand dictionary;
Step 2: preprocess the data acquired in step 1;
Step 3: use an SVM classifier to filter out texts unrelated to metro passenger demand;
Step 4: perform correlation clustering on the texts filtered in step 3 by a K-means clustering method corrected by the silhouette coefficient;
Step 5: for each cluster from step 4, assign a label as the demand item, and calculate the importance of the demand item;
Step 6: for each demand item obtained in step 5, first judge whether it exists in the demand dictionary; if so, exit; if not, judge whether its importance and relative propagation persistence both meet preset thresholds; if they do, a new demand item has been found and is added to the demand dictionary; otherwise, exit.
2. The metro passenger demand dynamic acquisition method according to claim 1, characterized in that the data acquisition process of step 1 is as follows:
The words in the demand dictionary are used as keywords to search the social network platform and obtain user posts; the text data is acquired by a web crawler.
3. The metro passenger demand dynamic acquisition method according to claim 1, characterized in that the specific process of step 3 is as follows:
S11: randomly sample the texts preprocessed in step 2 to generate training samples and test samples;
S12: determine the related texts and unrelated texts in the training samples and determine their feature words: calculate the information entropy of the training samples and the information gain value of each word, and take the words whose information gain value is greater than a set threshold as feature words;
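The information entropy IG(X) of the training samples is calculated as follows:
$$IG(X) = -\frac{N_1}{N_1+N_2}\log\frac{N_1}{N_1+N_2} - \frac{N_2}{N_1+N_2}\log\frac{N_2}{N_1+N_2}$$
where X is the training sample set, and N_1 and N_2 denote the number of related texts and the number of unrelated texts, respectively;
The information gain value IG(word) of each word is calculated as follows:
$$IG(word) = IG(X) + \frac{A+B}{N_1+N_2}\left(\frac{A}{A+B}\log\frac{A}{A+B} + \frac{B}{A+B}\log\frac{B}{A+B}\right) + \frac{C+D}{N_1+N_2}\left(\frac{C}{C+D}\log\frac{C}{C+D} + \frac{D}{C+D}\log\frac{D}{C+D}\right)$$
where word is a word in the training sample set, A and B are the frequencies with which the word appears in related texts and unrelated texts, respectively, and C and D are the frequencies with which the word does not appear in related texts and unrelated texts, respectively;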
S13: calculate the feature value of the feature words in each text, and represent each text as a feature value vector;
S14: build an SVM classifier from the training samples, and refine the classifier with the test samples;
S15: use the support vector classifier obtained in step S14 to classify the data into demand-related texts and unrelated texts, and remove the unrelated texts.
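The feature-word selection of S12 can be sketched as follows. This is a minimal illustration under stated assumptions: A, B, C and D are taken to be counts of texts (so that N_1 = A + C and N_2 = B + D), logarithms are base 2, and information gain is the textbook definition, class entropy minus the entropy conditioned on whether the word is present. The function names and the toy counts are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(a, b, c, d):
    """Information gain of a word from its 2x2 contingency counts.

    a, b: related / unrelated texts that contain the word
    c, d: related / unrelated texts that do not contain it
    """
    n = a + b + c + d
    class_entropy = entropy([a + c, b + d])
    conditional = ((a + b) / n * entropy([a, b])
                   + (c + d) / n * entropy([c, d]))
    return class_entropy - conditional

def select_feature_words(word_counts, threshold):
    """Keep the words whose information gain exceeds the threshold."""
    return [word for word, (a, b, c, d) in word_counts.items()
            if information_gain(a, b, c, d) > threshold]

# Hypothetical counts over 10 related and 10 unrelated training texts.
word_counts = {"wifi": (8, 1, 2, 9), "today": (5, 5, 5, 5)}
selected = select_feature_words(word_counts, threshold=0.1)
```

A word spread evenly over both classes, such as "today" here, has zero gain and is dropped, while a word concentrated in related texts is kept.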
4. The metro passenger demand dynamic acquisition method according to claim 3, characterized in that the silhouette-coefficient-corrected K-means clustering method of step 4 first performs K-means clustering and then determines the optimal number of clusters k by the silhouette coefficient;
The K-means clustering process is as follows:
Determine the sum of squared distances dist(S_k) from each point in a cluster to the cluster center:
$$dist(S_k) = \sum_{i=1}^{n_s} \lVert x_i - u_k \rVert^2$$
where S_k is the text set of each cluster, x_i is the feature value vector of a text in cluster S_k, n_s is the number of texts in cluster S_k, u_k is the cluster center of cluster S_k, and i is the text index within the cluster;
where u_k is:
$$u_k = \frac{1}{n_s}\sum_{i=1}^{n_s} x_i$$
The sum of squared distances dist(S) from all samples in the clustering domain to their cluster centers is:
$$dist(S) = \sum_{j=1}^{k} dist(S_j)$$
where k is the number of clusters, S is the total text set, and j is the index of each cluster in the text set;
The silhouette coefficient L(x_i) is:
$$L(x_i) = \frac{b(x_i) - a(x_i)}{\max\{a(x_i),\, b(x_i)\}}$$
where a(x_i) is the average distance between text x_i and all other texts in its own cluster, and b(x_i) is the average distance between text x_i and all texts in a cluster other than its own;
The average silhouette coefficient L(x)_k is:
$$L(x)_k = \frac{1}{N}\sum_{i=1}^{N} L(x_i)$$
where N is the number of texts in the entire text set;
The number of clusters k at which the average silhouette coefficient is largest is the optimal number of clusters.
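Choosing k by the average silhouette coefficient can be sketched as below. This is an illustrative sketch, not the patented implementation: it assumes Euclidean distance, takes b(x_i) as the average distance to the nearest other cluster (the usual silhouette convention), assigns a silhouette of 0 to texts in single-member clusters, and receives the candidate labelings ready-made instead of running K-means itself. All names and the toy points are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_silhouette(points, labels):
    """Average silhouette coefficient; needs at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # single-member cluster: silhouette taken as 0
        # a: mean distance to the other members of the same cluster
        a = sum(euclidean(point, o) for o in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(euclidean(point, o) for o in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def best_cluster_count(points, labelings):
    """Pick the k whose labeling has the largest average silhouette."""
    return max(labelings, key=lambda k: mean_silhouette(points, labelings[k]))

# Hypothetical feature vectors and labelings, as K-means might return.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labelings = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
best_k = best_cluster_count(points, labelings)
```

In a full pipeline, each entry of `labelings` would come from one K-means run with that value of k.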
5. The metro passenger demand dynamic acquisition method according to claim 1, characterized in that the importance calculation process of step 5 is as follows:
S21: the propagation heat r_k is as follows:
where n_s is the number of texts in each cluster, Z_i is the number of reposts of the i-th text in each cluster, D_i is the number of likes of the i-th text in each cluster, P_i is the number of comments on the i-th text in each cluster, w_1, w_2 and w_3 are constants, and k is the number of clusters;
S22: the propagation heat is corrected by the propagation breadth:
r'_k = r_k × g_k
where r'_k is the corrected propagation heat, g_k is the propagation breadth, g_k = l_s / n_s, and l_s is the number of users who posted in each cluster;
S23: the importance R_k is calculated as follows:
where S is the total number of text sets, r'_i is the corrected propagation heat of the i-th demand, and i is the demand item index.
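One way the importance calculation of S21 to S23 could look is sketched below. Only the correction r'_k = r_k × g_k with g_k = l_s / n_s is taken from the text. The form of the propagation heat (a weighted sum of reposts, likes and comments over the texts of a cluster) and of the importance (the corrected heat of a cluster divided by the sum over all clusters) are assumptions, as are the weight values, field names and toy data.

```python
def propagation_heat(posts, w1=1.0, w2=1.0, w3=1.0):
    """ASSUMED form of r_k: weighted sum of interactions over a cluster."""
    return sum(w1 * p["reposts"] + w2 * p["likes"] + w3 * p["comments"]
               for p in posts)

def corrected_heat(posts):
    """r'_k = r_k * g_k, with g_k = l_s / n_s as stated in S22."""
    n_s = len(posts)                      # texts in the cluster
    l_s = len({p["user"] for p in posts})  # distinct posting users
    return propagation_heat(posts) * (l_s / n_s)

def importance(clusters):
    """ASSUMED form of R_k: corrected heat normalized over all clusters."""
    heats = {name: corrected_heat(posts) for name, posts in clusters.items()}
    total = sum(heats.values())
    if total == 0:
        return {name: 0.0 for name in heats}
    return {name: heat / total for name, heat in heats.items()}

# Hypothetical clusters of posts with interaction counts.
clusters = {
    "wifi": [
        {"user": "u1", "reposts": 2, "likes": 3, "comments": 1},
        {"user": "u2", "reposts": 0, "likes": 1, "comments": 1},
    ],
    "noise": [
        {"user": "u3", "reposts": 1, "likes": 0, "comments": 1},
        {"user": "u3", "reposts": 0, "likes": 1, "comments": 1},
    ],
}
scores = importance(clusters)
```

In this toy data the "noise" cluster is halved by the breadth correction because both of its posts come from the same user, which is the effect the correction in S22 is meant to capture.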
6. The metro passenger demand dynamic acquisition method according to claim 5, characterized in that the relative propagation persistence in step 6 is calculated as follows:
S31: the propagation persistence j_k is as follows:
where r'_k0, r'_k1 and r'_k2 are the propagation heats obtained in three consecutive periods, r'_k0 being the propagation heat obtained this time;
S32: the relative propagation persistence J_k is:
where S is the total number of text sets, j_i is the propagation persistence of the i-th demand, and i is the demand item index.
7. The metro passenger demand dynamic acquisition method according to claim 3, characterized in that in step S13 the feature value is measured by term frequency-inverse document frequency (TF-IDF), which is calculated as follows:
TF-IDF(word) = TF(word) × IDF(word)
where TF is the frequency with which a word occurs within one text and IDF reflects how frequently the word occurs in other texts; TF(word) is the frequency of occurrence of a given word in one text, and IDF(word) is the inverse document frequency of that word in the text set;
Wherein:
where W(word) is the number of occurrences of the word in a text, W is the total number of words in the text containing it, F is the total word count of the training samples, and F(word) is the number of occurrences of the word in the training samples.
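The feature-value vectorization of S13 can be sketched as follows. TF(word) = W(word) / W follows the definitions above. For IDF the sketch assumes the common form log(F / F(word)) with F read as the number of training texts and F(word) as the number of training texts containing the word, using the natural logarithm and no smoothing; that reading, the function names and the toy corpus are assumptions.

```python
import math

def tf(word, text):
    """Term frequency: occurrences of word divided by words in the text."""
    return text.count(word) / len(text)

def idf(word, corpus):
    """ASSUMED inverse document frequency: log(F / F(word))."""
    containing = sum(1 for text in corpus if word in text)
    if containing == 0:
        return 0.0  # word never seen in the training texts
    return math.log(len(corpus) / containing)

def tf_idf_vector(text, feature_words, corpus):
    """Represent a text as the TF-IDF values of the feature words."""
    return [tf(word, text) * idf(word, corpus) for word in feature_words]

# Hypothetical segmented training texts and feature words.
corpus = [["wifi", "slow", "wifi"], ["noise", "loud"], ["wifi", "noise"]]
feature_words = ["wifi", "noise", "loud"]
vector = tf_idf_vector(corpus[0], feature_words, corpus)
```

Vectors built this way have one component per feature word, so every text has the same dimension and can be passed to the classifier.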
8. An acquisition system using the metro passenger demand dynamic acquisition method according to any one of claims 1 to 7, characterized by comprising a data acquisition module, a text preprocessing module, a text filtering module, a text clustering module, a demand extraction module, a new-demand evaluation module and a demand dictionary;
The demand dictionary stores the demand items related to metro vehicle passenger demand;
The data acquisition module acquires post data from the social network platform;
The text preprocessing module preprocesses the acquired texts;
The text filtering module filters out texts that are unrelated to passenger demand;
The text clustering module performs correlation clustering on the filtered text data;
The demand extraction module extracts the demand item in each cluster;
The new-demand evaluation module judges whether a demand item is contained in the demand dictionary and updates the demand dictionary.
CN201910561357.XA 2019-06-26 2019-06-26 Subway passenger demand dynamic acquisition method and acquisition system thereof Active CN110347828B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910561357.XA CN110347828B (en) 2019-06-26 2019-06-26 Subway passenger demand dynamic acquisition method and acquisition system thereof

Publications (2)

Publication Number Publication Date
CN110347828A true CN110347828A (en) 2019-10-18
CN110347828B CN110347828B (en) 2022-03-15

Family

ID=68183218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910561357.XA Active CN110347828B (en) 2019-06-26 2019-06-26 Subway passenger demand dynamic acquisition method and acquisition system thereof

Country Status (1)

Country Link
CN (1) CN110347828B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114297401A (en) * 2021-12-14 2022-04-08 中航机载系统共性技术有限公司 System knowledge extraction method based on clustering algorithm
CN114445141A (en) * 2022-01-26 2022-05-06 西南交通大学 Customer demand obtaining method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110137845A1 (en) * 2009-12-09 2011-06-09 Zemoga, Inc. Method and apparatus for real time semantic filtering of posts to an internet social network
US20130080212A1 (en) * 2011-09-26 2013-03-28 Xerox Corporation Methods and systems for measuring engagement effectiveness in electronic social media
CN103678564A (en) * 2013-12-09 2014-03-26 国家计算机网络与信息安全管理中心 Internet product research system based on data mining
CN104484343A (en) * 2014-11-26 2015-04-01 无锡清华信息科学与技术国家实验室物联网技术中心 Topic detection and tracking method for microblog
CN107909478A (en) * 2017-11-27 2018-04-13 苏州点对点信息科技有限公司 FOF mutual fund portfolio system and methods based on social network clustering and information gain entropy index
CN107908753A (en) * 2017-11-20 2018-04-13 合肥工业大学 Customer demand method for digging and device based on social media comment data
CN108388660A (en) * 2018-03-08 2018-08-10 中国计量大学 A kind of improved electric business product pain spot analysis method
CN109165996A (en) * 2018-07-18 2019-01-08 浙江大学 Product function feature importance analysis method based on online user's comment
CN109829166A (en) * 2019-02-15 2019-05-31 重庆师范大学 People place customer input method for digging based on character level convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郑治豪等: "基于社交媒体大数据的交通感知分析系统", 《自动化学报》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant