CN108304586A - A task-oriented data availability improvement method - Google Patents
A task-oriented data availability improvement method
- Publication number
- CN108304586A (application CN201810186852.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- task
- attribute
- source
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
Abstract
The invention discloses a task-oriented data availability improvement method. A task-oriented potentially-useful-attribute mining model is built from bipartite graph theory and the correlation between data attributes and task attributes, and a task-oriented mining model for multi-source data with complementary attributes is built from bipartite graph theory and task-oriented data attribute correlation. The potentially useful attributes of the existing data set and complementary multi-source data are then mined with the potentially-useful-attribute mining model, and other multi-source data sets whose attributes complement the existing data set are mined with the complementary multi-source data mining model. For a particular task, the potential attribute mining model and the complementary multi-source data mining model proposed by the invention can screen out more usable attributes and multi-source data than the user expects, which in turn can improve the efficiency with which the task is accomplished.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a task-oriented data availability improvement method.
Background
With the development of information technology, data acquisition capabilities have improved greatly, and massive, multi-source, heterogeneous data can now be obtained in a timely manner. For a specific decision or prediction task, however, the relevant data is very noisy; that is, many of the data attributes that can currently be obtained are unrelated to the goal of the task. On the other hand, because of information silos, data privacy and security, and similar constraints, many data attributes that are relevant to a specific prediction or decision task cannot be obtained in time.
An inherent contradiction therefore arises when analyzing data availability for a particular task: the task needs specific data attributes that cannot be obtained from the existing data, while the available data has many attributes that are not directly related to the task. The former problem is "task demand exceeds data supply": the requirements of the task cannot be matched by the attributes of the available data, however many attributes that data has. The latter problem is "data supply exceeds task demand": many data attributes are available, but they are unrelated to the attributes the task requires. The availability of existing data, that is, the relevance and completeness of data with respect to a particular task, has long been a focus and a difficulty for both researchers and practitioners.
At present, the shortcomings of the prior art mainly include the following:
(1) Research on data quality concentrates mainly on the accuracy and relevance of data; there is little research on data availability for a specific task.
(2) Research on data set relevance concentrates mainly on information retrieval, with applications chiefly in precision marketing and personalized recommendation for e-commerce. Current relevance research focuses on the correlation between existing data attributes and task requirements, and still lacks relevance mining for the potentially valuable attributes of data.
(3) Research on data set completeness concentrates mainly on data fusion, with applications chiefly in data trading. Current completeness research focuses on the integrity of existing data attributes, and still lacks mining of the complementarity of multi-source heterogeneous data attributes for specific task requirements.
Summary of the invention
To solve the above problems, the present invention provides a task-oriented data availability improvement method.
To achieve the above object, the present invention adopts the following technical solution:
A task-oriented data availability improvement method, comprising the following steps:
S1. Based on the correlation and completeness of data attributes with respect to task attributes, formulate a task-oriented quantitative evaluation index system for data availability;
S2. Based on bipartite graph theory and the correlation between data attributes and task attributes, build a task-oriented potentially-useful-attribute mining model;
S3. Based on bipartite graph theory and task-oriented data attribute correlation, build a task-oriented mining model for multi-source data with complementary attributes;
S4. Mine the potentially useful attributes of the existing data set and complementary multi-source data using the task-oriented potentially-useful-attribute mining model;
S5. Mine other multi-source data sets whose attributes complement the existing data set using the task-oriented mining model for multi-source data with complementary attributes.
The task-oriented potentially-useful-attribute mining model is built by the following steps:
Input: data attribute matrix M_DF, task attribute matrix M_TF;
Output: matching matrix of data sources D_j and tasks T_i with potential-availability matching values;
Step 1: Compute the data-task matrix M_DT based on bipartite graph theory;
Step 2: For a particular task T_i in the data-task matrix M_DT, select the data source D_j with the maximum matching value;
Step 3: For the particular task T_i and the particular data source D_j, compute the potential availability of D_j based on the data attribute matrix M_DF and the task attribute matrix M_TF;
Step 3.1: Compute the correlation C_F between each attribute of data source D_j and each attribute of task T_i;
Step 3.2: Based on a given correlation threshold, select the attributes with higher correlation and add them to the attribute set of task T_i;
Step 3.3: Based on the new attributes of task T_i, compute the potential availability of data source D_j from the data attribute matrix M_DF and the new task attribute matrix M_TF;
Step 4: Repeat the above steps until all tasks have been traversed.
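The steps above can be sketched in a few lines of Python. This is only an illustrative reading of the procedure, not the patented implementation: it assumes the data attribute matrix and the task attribute matrix are 0/1 matrices over the same attribute columns, that the data-task matrix counts the attributes a source and a task share, and that an attribute-to-attribute correlation table `corr` is supplied by the caller. The function names, `corr`, and the default threshold are hypothetical.

```python
def data_task_matrix(m_df, m_tf):
    """Step 1 (assumed form): count attributes shared by each source and task.

    For 0/1 inputs this equals the matrix product M_DF x M_TF^T.
    """
    return [[sum(d * t for d, t in zip(d_row, t_row)) for t_row in m_tf]
            for d_row in m_df]


def mine_potential_attributes(m_df, m_tf, corr, threshold=0.6):
    """Return (best source per task, extended task matrix, new matching matrix)."""
    m_dt = data_task_matrix(m_df, m_tf)
    new_m_tf = [row[:] for row in m_tf]          # leave the input untouched
    best_source = []
    for i, t_row in enumerate(m_tf):             # Step 4: traverse all tasks
        # Step 2: data source with the maximum matching value for task i.
        j = max(range(len(m_df)), key=lambda k: m_dt[k][i])
        best_source.append(j)
        task_attrs = [a for a, v in enumerate(t_row) if v]
        for a, has in enumerate(m_df[j]):
            # Only attributes the source has and the task does not yet list.
            if not has or t_row[a] or not task_attrs:
                continue
            # Step 3.1: strongest correlation with any attribute of the task.
            cf = max(corr[a][b] for b in task_attrs)
            # Step 3.2: add sufficiently correlated attributes to the task.
            if cf >= threshold:
                new_m_tf[i][a] = 1
    # Step 3.3: recompute matching values with the extended task attributes.
    potential = data_task_matrix(m_df, new_m_tf)
    return best_source, new_m_tf, potential
```

Taking the maximum correlation over the task's attributes is one of several reasonable choices; a mean or a weighted score would fit the same step description.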
The step S4 specifically comprises the following steps:
Input: data attribute matrix M_DF, task attribute matrix M_TF;
Output: matching matrix of data sources D_j and tasks T_i with complementary-availability matching values;
Step 1: Compute the data-task matrix M_DT based on bipartite graph theory;
Step 2: For a particular task T_i in the data-task matrix M_DT, select the data source D_j with the maximum matching value;
Step 3: For the particular task T_i and the set of data sources D (which contains D_j), compute the complementarity of availability between the particular data source D_j and the other data sources in D according to the data attribute matrix M_DF and the task attribute matrix M_TF;
Step 3.1: Compute the similarity S_D between each data source and the particular data source D_j;
Step 3.2: Based on a given similarity threshold, select the data sources with lower similarity;
Step 3.3: Aggregate the selected data sources (including the particular data source D_j) into a combined data source D;
Step 3.4: Based on the new data source D, the data attribute matrix M_DF and the new task attribute matrix M_TF, compute the complementarity of availability of data source D_j;
Step 4: Repeat the above steps until all tasks have been traversed.
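A minimal Python sketch of this second procedure follows. It is written under stated assumptions rather than taken from the patent: the source-to-source similarity is assumed to be the Jaccard similarity of attribute sets, aggregation is assumed to be the union of attributes, and complementarity is reported as the share of the task's attributes covered before and after aggregation. All function names and the default threshold are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two 0/1 attribute rows (assumed similarity measure)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def coverage(source_row, task_row):
    """Fraction of the task's attributes that the source provides."""
    need = sum(task_row)
    met = sum(1 for s, t in zip(source_row, task_row) if s and t)
    return met / need if need else 0.0


def mine_complementary_sources(m_df, m_tf, sim_threshold=0.5):
    results = []
    for i, t_row in enumerate(m_tf):             # Step 4: traverse all tasks
        # Steps 1-2: shared-attribute counts, then the best-matching source.
        shared = [sum(d * t for d, t in zip(d_row, t_row)) for d_row in m_df]
        j = max(range(len(m_df)), key=lambda k: shared[k])
        # Steps 3.1-3.2: keep sources that are dissimilar to the chosen one.
        chosen = [k for k, d_row in enumerate(m_df)
                  if k != j and jaccard(m_df[j], d_row) < sim_threshold]
        # Step 3.3: aggregate the chosen sources with D_j (attribute union).
        merged = [int(any(m_df[k][a] for k in [j] + chosen))
                  for a in range(len(t_row))]
        # Step 3.4: compare task coverage before and after aggregation.
        results.append({
            "task": i,
            "source": j,
            "complements": chosen,
            "coverage_before": coverage(m_df[j], t_row),
            "coverage_after": coverage(merged, t_row),
        })
    return results
```

Note that the similarity filter alone does not guarantee that a selected source contributes attributes the task needs; the before and after coverage values make that visible.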
In the above scheme, for a given particular task and based on a given correlation threshold, the potentially useful attributes of the available data are mined, along with multi-source data that can potentially supplement the existing data set. Specifically:
(1) With the potentially-useful-attribute mining model, suitable potential attributes of the available data sets can be selected for a particular task, which effectively improves the availability of the existing data. This helps increase the usable value of the existing data and can also reduce its sunk cost.
(2) The multi-source complementary data mining model can be applied to select multi-source complementary data sets for a particular task, so that the task can be accomplished more effectively. This helps increase the realized value of the task and can also reduce the opportunity cost incurred when missing data leaves part of the task's attribute requirements unmet.
(3) When both the existing data and the particular task are taken into account, the related approaches all improve the validity of the existing data for accomplishing the task.
In short, the potential attribute mining model and the complementary multi-source data mining model proposed by the invention can screen out, for a particular task, more usable attributes and multi-source data than the user expects, and can thereby improve the efficiency with which the task is accomplished.
Description of the drawings
Fig. 1 is a schematic diagram of the usable value and sunk cost of the existing data set under different data set availability improvement approaches;
In the figure: (a) usable value of the data set attributes; (b) sunk cost of the data set attributes.
Fig. 2 shows the realized value and opportunity cost of a particular task under different data availability improvement approaches;
In the figure: (a) realized value of the particular task; (b) opportunity cost of the particular task.
Fig. 3 shows the validity of the existing data for a particular task under different data availability improvement approaches.
Detailed description
To make the objects and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
Embodiment
Table 1. Matching matrix of task attributes and data attributes
(1) Potential value of the data set: PV = 13/(13+36) = 26.53%
(2) Sunk cost of the data set: SC = 36/(13+36) = 73.47%
(3) Realization value of the particular task: RV = 13/(13+28) = 31.71%
(4) Opportunity cost of the particular task: OC = 28/(13+28) = 68.29%
(5) Validity of the data set for the particular task: V = 13/(36+13+28) = 16.88%
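The five indicators above can be reproduced with a short Python sketch. The reading of the three counts is an interpretation inferred from the formulas, since Table 1 itself is not reproduced here: 13 is taken as the number of attributes matched between data and task, 36 as data attributes the task does not use, and 28 as task attributes the data does not supply. The function and parameter names are hypothetical.

```python
def availability_metrics(matched, unused_data, unmet_task):
    """Compute PV, SC, RV, OC and V from three attribute counts.

    matched     -- attributes present in both the data set and the task
    unused_data -- data set attributes the task does not need (assumed meaning)
    unmet_task  -- task attributes the data set lacks (assumed meaning)
    """
    data_total = matched + unused_data      # all attributes of the data set
    task_total = matched + unmet_task       # all attributes the task requires
    return {
        "PV": matched / data_total,         # potential value of the data set
        "SC": unused_data / data_total,     # sunk cost of the data set
        "RV": matched / task_total,         # realization value of the task
        "OC": unmet_task / task_total,      # opportunity cost of the task
        "V": matched / (matched + unused_data + unmet_task),  # validity
    }


metrics = availability_metrics(matched=13, unused_data=36, unmet_task=28)
for name, value in metrics.items():
    print(f"{name} = {value:.2%}")
```

By construction PV and SC sum to 1, as do RV and OC, so raising the matched count through attribute mining or complementary sources moves each pair in opposite directions.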
Example: user travel demand forecasting
(1) Description of the particular task
Customized public transport can provide passengers with fast and efficient service while helping to relieve traffic congestion and the demand for driving. At present, customized public transport systems in China include "Panda Public Transport" in Dalian, "Heart Enjoys Bus" in Hangzhou, and "Real-time Public Transport" in Nanjing.
Customized public transport is a demand-oriented public transport service. Based on users' travel needs, it can provide flexible bus service at a "specific time", at a "specific place", and with "one seat per person". Forecasting user travel demand is the premise of, and the key to, effective customized bus service.
In customized bus service, the task of forecasting user travel demand requires many relevant data attributes. The existing multi-source data, however, has only some of the relevant attributes, and also contains other redundant attributes that do not match the attributes the task demands.
(2) Description of the existing data
The relevant data has the following characteristics:
(1) Real-time: collected via the internet, telephone, mobile phone and smartphone;
(2) Multi-source: social media, smart cards, point-of-interest maps, GPS, location-based services, video surveillance, RFID;
(3) Heterogeneous: transfer information and travel information across different transport modes, such as IC card records for buses, taxis, subways and bicycles;
(4) High-dimensional: ID card, card type, travel permit, departure time, arrival time, origin station and destination;
(5) Hierarchical: urban district, street, and so on.
Table 2 outlines the relevant data sets and their attributes.
Table 2. Data sets and their attributes for user travel demand forecasting
The above data can be shared with partners under confidentiality agreements; we cooperate with Nanjing Ya Gao Bus Group and have a data confidentiality agreement with it. We collected the relevant user travel demand data through data interfaces. In addition, we obtained historical user travel data, comprising real-time GPS data and passenger IC card data, from the public transport company in Nanjing. In short, the 7 particular tasks, 13 data sets and 23 related attributes in Table 3 can be used with the potential attribute mining model and the complementary multi-source data mining model of this study to improve the efficiency of the user travel demand forecasting task.
Table 3. Available data sets, particular tasks and related attributes
(3) For the available data sets: evaluation of the results of the task-oriented potential attribute mining model and complementary data mining model
For different task requirements, the attribute set of a given data set has different usable value and sunk cost. Mining the potentially useful attributes of the existing data set and complementary multi-source data with the models of this study, and comparing the result with the initial state, shows that the usable value of the existing data attributes increases markedly and their sunk cost decreases markedly. The experimental results are shown in Fig. 1.
(4) For the particular task: evaluation of the results of the task-oriented potential attribute mining model and complementary data mining model
For different existing data sets and their attributes, a particular task has different realized value and different opportunity cost caused by missing data attributes. Mining the potentially useful attributes of the existing data set and complementary multi-source data with the models of this study, and comparing the result with the initial state, shows that the realized value of the particular task increases markedly and its opportunity cost decreases markedly. The experimental results are shown in Fig. 2.
(5) For the particular task and the existing data together: evaluation of the results of the task-oriented potential attribute mining model and complementary data mining model
Taking into account both the attribute set of the existing data and the attribute set of the particular task, mining the potentially useful attributes of the existing data set and complementary multi-source data with the models of this study, and comparing the result with the initial state, shows that the validity of the existing data set for the particular task increases markedly. The experimental results are shown in Fig. 3.
(6) Screening potential attributes and complementary multi-source data for the particular task and the existing data sets
For a specific task, the data attributes best suited to the task can be found in the initial state. Based on the potential attribute mining model, its potentially useful attributes can be screened out further. Based on the complementary multi-source data mining model, other multi-source data sets whose attributes complement the existing data set can be screened out as well. Our experimental results are shown in Table 4.
Table 4. Potentially useful attributes and complementary data sets for the particular tasks
To sum up, the embodiment of the present invention considers potentially useful attributes and multi-source complementary data according to the particular task, which can help realize added value for that task. For example, during decision-making, the sparsity of data value should be addressed on the basis of the task-related features among data attributes. During data acquisition, other multi-source data sets whose attributes complement the existing data set should also be considered, which can reduce the problem of missing data attributes caused by constraints such as information silos and data privacy and security. In short, the potential attribute mining model and the complementary multi-source data mining model proposed by the invention can screen out, for a particular task, more usable attributes and multi-source data than the user expects, and can thereby improve the efficiency with which the task is accomplished.
The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principle of the invention, and these improvements and refinements shall also be regarded as falling within the scope of protection of the invention.
Claims (3)
1. A task-oriented data availability improvement method, characterized by comprising the following steps:
S1. Based on the correlation and completeness of data attributes with respect to task attributes, formulate a task-oriented quantitative evaluation index system for data availability;
S2. Based on bipartite graph theory and the correlation between data attributes and task attributes, build a task-oriented potentially-useful-attribute mining model;
S3. Based on bipartite graph theory and task-oriented data attribute correlation, build a task-oriented mining model for multi-source data with complementary attributes;
S4. Mine the potentially useful attributes of the existing data set and complementary multi-source data using the task-oriented potentially-useful-attribute mining model;
S5. Mine other multi-source data sets whose attributes complement the existing data set using the task-oriented mining model for multi-source data with complementary attributes.
2. The task-oriented data availability improvement method according to claim 1, characterized in that the task-oriented potentially-useful-attribute mining model is built by the following steps:
Input: data attribute matrix M_DF, task attribute matrix M_TF;
Output: matching matrix of data sources D_j and tasks T_i with potential-availability matching values;
Step 1: Compute the data-task matrix M_DT based on bipartite graph theory;
Step 2: For a particular task T_i in the data-task matrix M_DT, select the data source D_j with the maximum matching value;
Step 3: For the particular task T_i and the particular data source D_j, compute the potential availability of D_j based on the data attribute matrix M_DF and the task attribute matrix M_TF;
Step 3.1: Compute the correlation C_F between each attribute of data source D_j and each attribute of task T_i;
Step 3.2: Based on a given correlation threshold, select the attributes with higher correlation and add them to the attribute set of task T_i;
Step 3.3: Based on the new attributes of task T_i, compute the potential availability of data source D_j from the data attribute matrix M_DF and the new task attribute matrix M_TF;
Step 4: Repeat the above steps until all tasks have been traversed.
3. The task-oriented data availability improvement method according to claim 1, characterized in that the step S4 specifically comprises the following steps:
Input: data attribute matrix M_DF, task attribute matrix M_TF;
Output: matching matrix of data sources D_j and tasks T_i with complementary-availability matching values;
Step 1: Compute the data-task matrix M_DT based on bipartite graph theory;
Step 2: For a particular task T_i in the data-task matrix M_DT, select the data source D_j with the maximum matching value;
Step 3: For the particular task T_i and the set of data sources D (which contains D_j), compute the complementarity of availability between the particular data source D_j and the other data sources in D according to the data attribute matrix M_DF and the task attribute matrix M_TF;
Step 3.1: Compute the similarity S_D between each data source and the particular data source D_j;
Step 3.2: Based on a given similarity threshold, select the data sources with lower similarity;
Step 3.3: Aggregate the selected data sources (including the particular data source D_j) into a combined data source D;
Step 3.4: Based on the new data source D, the data attribute matrix M_DF and the new task attribute matrix M_TF, compute the complementarity of availability of data source D_j;
Step 4: Repeat the above steps until all tasks have been traversed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810186852.2A CN108304586A (en) | 2018-03-07 | 2018-03-07 | A task-oriented data availability improvement method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108304586A true CN108304586A (en) | 2018-07-20 |
Family
ID=62849389
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810186852.2A Pending CN108304586A (en) | 2018-03-07 | 2018-03-07 | A task-oriented data availability improvement method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108304586A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104820716A (en) * | 2015-05-21 | 2015-08-05 | 中国人民解放军海军工程大学 | Equipment reliability evaluation method based on data mining |
CN105787020A (en) * | 2016-02-24 | 2016-07-20 | 鄞州浙江清华长三角研究院创新中心 | Graph data partitioning method and device |
US20170053019A1 (en) * | 2015-08-17 | 2017-02-23 | Critical Informatics, Inc. | System to organize search and display unstructured data |
Non-Patent Citations (1)
Title |
---|
王振涛: "基于二分图的RDF关键词扩展查询算法研究与实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180720 |