CN108304586A - A task-oriented method for improving data availability - Google Patents

A task-oriented method for improving data availability

Info

Publication number
CN108304586A
Authority
CN
China
Prior art keywords
data
task
attribute
source
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810186852.2A
Other languages
Chinese (zh)
Inventor
李保珍
韩占校
张亭亭
余臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANJING AUDIT UNIVERSITY
Original Assignee
NANJING AUDIT UNIVERSITY
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NANJING AUDIT UNIVERSITY
Priority to CN201810186852.2A
Publication of CN108304586A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Abstract

The invention discloses a task-oriented method for improving data availability. Based on bipartite graph theory and the correlation between data attributes and task attributes, a task-oriented mining model for potentially useful attributes is built. Based on bipartite graph theory and task-oriented data attribute correlation, a task-oriented mining model for multi-source data with complementary attributes is built. The potentially useful attributes of an existing data set and complementary multi-source data are then mined with the potentially-useful-attribute mining model, and other multi-source data sets whose attributes complement the existing data set are mined with the complementary multi-source data mining model. For a particular task, the proposed potential attribute mining model and complementary multi-source data mining model can screen out more useful attributes and multi-source data than the user expected, which improves the efficiency with which the task is accomplished.

Description

A task-oriented method for improving data availability
Technical field
The present invention relates to the field of data processing, and in particular to a task-oriented method for improving data availability.
Background
With the development of information technology, data acquisition capability has improved greatly, and massive, multi-source, heterogeneous data can be obtained in a timely manner. For a specific decision or prediction task, however, the related data is very noisy: many of the data attributes that can currently be obtained are unrelated to the goal of the task. On the other hand, because of information silos, data privacy and security, and similar reasons, many data attributes that are relevant to a particular prediction or decision task cannot be obtained in time.
An inherent contradiction therefore arises when analyzing data availability for a particular task: the task needs specific data attributes, but these cannot be obtained from the existing data; the available data has many attributes, but these are not directly related to the task. The former problem is "task demand exceeds data supply": the attributes the task requires cannot be matched by the available data, even though that data has many attributes. The latter problem is "data supply exceeds task demand": many data attributes are available, but they are unrelated to the attributes the task requires. The availability of existing data, that is, the relevance and completeness of data with respect to a particular task, has long been a hot and difficult topic in both research and practice.
The main shortcomings of the prior art are as follows:
(1) Research on data quality concentrates on the accuracy and relevance of data, and there is little research on data availability for a specific task;
(2) Research on data set relevance concentrates on information retrieval, with applications mainly in precision marketing and personalized recommendation for e-commerce. Current research on data relevance focuses on the correlation between existing data attributes and task requirements, and still lacks relevance mining for potentially valuable data attributes;
(3) Research on data set completeness concentrates on data fusion, with applications mainly in data trading. Current research on data completeness focuses on the integrity of existing data attributes, and still lacks mining of the complementarity of multi-source heterogeneous data attributes for specific task requirements.
Summary of the invention
To solve the above problems, the present invention provides a task-oriented method for improving data availability.
To achieve this, the invention adopts the following technical solution:
A task-oriented method for improving data availability, comprising the following steps:
S1. Based on the relevance and completeness of data attributes with respect to task attributes, formulate a task-oriented quantitative evaluation index system for data availability;
S2. Based on bipartite graph theory and the correlation between data attributes and task attributes, build a task-oriented mining model for potentially useful attributes;
S3. Based on bipartite graph theory and task-oriented data attribute correlation, build a task-oriented mining model for multi-source data with complementary attributes;
S4. Mine the potentially useful attributes of the existing data set and complementary multi-source data with the task-oriented potentially-useful-attribute mining model that was built;
S5. Mine other multi-source data sets whose attributes complement the existing data set with the task-oriented complementary multi-source data mining model that was built.
The task-oriented potentially-useful-attribute mining model is built through the following steps:
Input: data attribute matrix M_DF, task attribute matrix M_TF
Output: matching matrix of data sources D_j and tasks T_i with potential-availability matching values;
Step 1: Compute the data-task matrix M_DT based on bipartite graph theory;
Step 2: For a particular task T_i in the data-task matrix M_DT, select the data source D_j with the largest matching value;
Step 3: For the particular task T_i and the particular data source D_j, compute the potential availability of D_j based on the data attribute matrix M_DF and the task attribute matrix M_TF:
Step 3.1: Compute the correlation C_F between each attribute of data source D_j and each attribute of task T_i;
Step 3.2: Using a given correlation threshold, select the attributes with higher correlation and add them to the attribute set of task T_i;
Step 3.3: With the new attributes of task T_i, compute the potential availability of data source D_j from the data attribute matrix M_DF and the new task attribute matrix M_TF;
Step 4: Repeat the above steps until all tasks have been traversed.
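The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patent's implementation: the source does not reproduce the formula for the data-task matrix and does not define the correlation C_F, so the sketch assumes binary matrices, takes the data-task matrix to be the product of the data attribute matrix and the transposed task attribute matrix, and uses cosine similarity between attribute columns as a stand-in for C_F. All function names and the threshold value are hypothetical.

```python
import numpy as np


def data_task_matrix(m_df, m_tf):
    """ASSUMPTION: bipartite projection that counts, for every
    (data source, task) pair, the attributes they share."""
    return m_df @ m_tf.T


def attribute_correlation(m_df):
    """ASSUMPTION: C_F is the cosine similarity between attribute
    columns of the data attribute matrix."""
    norms = np.linalg.norm(m_df, axis=0)
    norms[norms == 0] = 1.0          # avoid division by zero for unused attributes
    unit = m_df / norms
    return unit.T @ unit


def mine_potential_attributes(m_df, m_tf, threshold=0.5):
    m_df = np.asarray(m_df, dtype=float)   # rows: data sources, columns: attributes
    m_tf = np.asarray(m_tf, dtype=float)   # rows: tasks, columns: attributes
    m_dt = data_task_matrix(m_df, m_tf)    # Step 1
    c_f = attribute_correlation(m_df)
    new_m_tf = m_tf.copy()
    best_source = {}
    for i in range(m_tf.shape[0]):         # Step 4: traverse all tasks
        j = int(np.argmax(m_dt[:, i]))     # Step 2: source with the largest match
        best_source[i] = j
        source_attrs = np.flatnonzero(m_df[j])
        task_attrs = np.flatnonzero(m_tf[i])
        for a in source_attrs:             # Steps 3.1 and 3.2
            if task_attrs.size and c_f[a, task_attrs].max() >= threshold:
                new_m_tf[i, a] = 1.0       # add the correlated attribute to the task
    # Step 3.3: recompute the matching values with the extended task attributes
    return best_source, data_task_matrix(m_df, new_m_tf), new_m_tf


# Toy input: 3 data sources, 2 tasks, 4 attributes (made-up values)
M_DF = [[1, 1, 0, 0], [0, 1, 1, 1], [0, 0, 0, 1]]
M_TF = [[1, 0, 0, 0], [0, 0, 1, 0]]
best, new_m_dt, new_m_tf = mine_potential_attributes(M_DF, M_TF)
```

In this toy input each task starts with one attribute; after mining, the attributes of its best-matching data source that correlate with that attribute are added to the task, and the matching values rise accordingly.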
Step S4 specifically comprises the following steps:
Input: data attribute matrix M_DF, task attribute matrix M_TF
Output: matching matrix of data sources D_j and tasks T_i with complementary-availability matching values;
Step 1: Compute the data-task matrix M_DT based on bipartite graph theory;
Step 2: For a particular task T_i in the data-task matrix M_DT, select the data source D_j with the largest matching value;
Step 3: For the particular task T_i and the set of data sources D (which contains D_j), compute, from the data attribute matrix M_DF and the task attribute matrix M_TF, the complementarity of availability between the particular data source D_j and the data sources in D:
Step 3.1: Compute the similarity S_D between each data source and the particular data source D_j;
Step 3.2: Using a given similarity threshold, select the data sources with lower similarity;
Step 3.3: Aggregate the selected data sources (including the particular data source D_j) into a combined data source D;
Step 3.4: Based on the new data source D, the data attribute matrix M_DF and the new task attribute matrix M_TF, compute the complementarity of availability of data source D_j;
Step 4: Repeat the above steps until all tasks have been traversed.
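A minimal Python sketch of the complementary-source steps above follows. Several choices are assumptions, because the source does not define them: the similarity S_D is taken to be the Jaccard similarity between binary attribute vectors, aggregation is an attribute-wise union, and the "complementarity of availability" is reported as the share of the task's attributes covered by the pooled sources. The function names and threshold are hypothetical.

```python
import numpy as np


def jaccard(a, b):
    """ASSUMPTION: S_D is the Jaccard similarity of two binary attribute vectors."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def mine_complementary_sources(m_df, m_tf, task, threshold=0.5):
    m_df = np.asarray(m_df)                 # rows: data sources, columns: attributes
    m_tf = np.asarray(m_tf)                 # rows: tasks, columns: attributes
    m_dt = m_df @ m_tf.T                    # Step 1 (assumed bipartite projection)
    j = int(np.argmax(m_dt[:, task]))       # Step 2: best-matching source D_j
    selected = [j]
    for k in range(m_df.shape[0]):          # Steps 3.1 and 3.2
        if k != j and jaccard(m_df[k], m_df[j]) < threshold:
            selected.append(k)              # low similarity: candidate complement
    pooled = m_df[selected].max(axis=0)     # Step 3.3: attribute-wise union
    required = m_tf[task].astype(bool)
    # Step 3.4 (ASSUMPTION): share of required task attributes the pool covers
    coverage = float(pooled[required].sum() / required.sum()) if required.any() else 0.0
    return j, selected, coverage


# Toy input: 3 data sources, 1 task, 4 attributes (made-up values)
M_DF = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
M_TF = [[1, 1, 0, 1]]
j, selected, coverage = mine_complementary_sources(M_DF, M_TF, task=0)
```

Here the best-matching source alone covers two of the task's three attributes; pooling it with the dissimilar third source covers all three. Low similarity by itself does not guarantee relevance to the task, so a practical version would likely also check the pooled attributes against the task's requirements before accepting a source.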
With the above scheme, the potentially useful attributes of the available data are mined for a given task using a correlation threshold, and multi-source data that can supplement the existing data set is mined as well. Specifically:
(1) With the potentially-useful-attribute mining model, suitable potential attributes of the available data set can be selected for a particular task, which effectively improves the availability of the existing data. This both increases the usable value of the existing data and reduces its sunk cost.
(2) The multi-source complementary data mining model can be used to select multi-source complementary data sets for a particular task, so that the task can be accomplished more effectively. This both increases the realized value of the task and reduces the opportunity cost incurred when missing data leaves part of the task's attribute requirements unmet.
(3) When the existing data and the particular task are considered together, both approaches improve the validity of the existing data for accomplishing the task.
In short, for a particular task, the potential attribute mining model and the complementary multi-source data mining model proposed by the invention can screen out more useful attributes and multi-source data than the user expected, which improves the efficiency with which the task is accomplished.
Brief description of the drawings
Fig. 1 is a schematic diagram of the usable value and sunk cost of the existing data set under different approaches to improving data set availability;
In the figure: (a) usable value of the data set attributes; (b) sunk cost of the data set attributes.
Fig. 2 shows the realized value and opportunity cost of the particular task under different approaches to improving data availability;
In the figure: (a) realized value of the particular task; (b) opportunity cost of the particular task.
Fig. 3 shows the validity of the existing data for the particular task under different approaches to improving data availability.
Detailed description
To make the objects and advantages of the invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
Embodiment
Table 1. Matching matrix of task attributes and data attributes
(1) Potential value of the data set: PV = 13/(13+36) = 26.53%
(2) Sunk cost of the data set: SC = 36/(13+36) = 73.47%
(3) Realization value of the particular task: RV = 13/(13+28) = 31.71%
(4) Opportunity cost of the particular task: OC = 28/(13+28) = 68.29%
(5) Validity of the data set for the particular task: V = 13/(36+13+28) = 16.88%
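The five indicators above can be reproduced with a few lines of Python. The meaning of the three counts is inferred from the formulas, since the body of Table 1 is not available here: 13 is read as the number of matched attribute entries, 36 as data attribute entries the task does not need, and 28 as task attribute entries the data does not supply. The function and parameter names are hypothetical.

```python
def availability_metrics(matched, data_only, task_only):
    """Compute the five indicators from three counts.

    ASSUMED reading of the counts (not stated explicitly in the source):
      matched   - attribute entries shared by data set and task
      data_only - data attribute entries the task does not need
      task_only - task attribute entries the data does not supply
    """
    return {
        "PV": matched / (matched + data_only),             # potential value
        "SC": data_only / (matched + data_only),           # sunk cost
        "RV": matched / (matched + task_only),             # realization value
        "OC": task_only / (matched + task_only),           # opportunity cost
        "V": matched / (matched + data_only + task_only),  # validity
    }


metrics = availability_metrics(matched=13, data_only=36, task_only=28)
percentages = {name: round(value * 100, 2) for name, value in metrics.items()}
```

By construction PV and SC sum to 1, as do RV and OC, so each pair describes how the same total splits into a useful and a wasted share.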
Example: user travel demand forecasting
(1) Description of the particular task
To relieve traffic congestion and meet travel demand, customized public transport can provide passengers with fast and efficient service. Customized bus systems currently operating in China include "Panda Bus" in Dalian, "Heart Enjoys Bus" in Hangzhou and "Real-time Bus" in Nanjing.
Customized public transport is a demand-oriented public transport service. According to users' travel needs, it provides flexible bus service at a "specific time" and "specific place" with "one seat per person". User travel demand forecasting is the prerequisite for, and the key to, effective customized bus service.
In customized bus service, the task of forecasting user travel demand requires many related data attributes. The existing multi-source data, however, contains only some of the relevant attributes, and also contains redundant attributes that do not match what the task requires.
(2) Description of the existing data
The related data has the following characteristics:
(1) Real-time: collected through the internet, telephones, mobile phones and smartphones;
(2) Multi-source: social media, smart cards, point-of-interest maps, GPS, location-based services, video surveillance, RFID;
(3) Heterogeneous: transfer information and travel information from different modes of transport, such as IC card records for buses, taxis, subways and bicycles;
(4) High-dimensional: ID card, card type, travel permit, departure time, arrival time, origin station and destination;
(5) Hierarchical: district, street, and so on.
Table 2 summarizes the related data sets and their attributes.
Table 2. Data sets and attributes for user travel demand forecasting
The above data can be shared with partners under confidentiality agreements; we cooperate with Nanjing Ya Gao Bus Group under a data confidentiality agreement. We collected the relevant user travel demand data through data interfaces. In addition, we obtained historical user travel data, consisting of real-time GPS data and passenger IC card data, from the public transport company in Nanjing. In short, for the 7 particular tasks, 13 data sets and 23 related attributes in Table 3, the potential attribute mining model and the complementary multi-source data mining model of this study can improve the efficiency with which the user travel demand forecasting task is accomplished.
Table 3. Available data sets, particular tasks and related attributes
(3) Evaluation of the task-oriented potential attribute mining model and complementary data mining model with respect to the available data sets
For different task requirements, the attribute set of a given data set has different usable value and sunk cost. When the potentially useful attributes of the existing data set and complementary multi-source data are mined with the models of this study and compared with the initial state, the usable value of the existing data attributes increases markedly and their sunk cost decreases markedly. The experimental results are shown in Fig. 1.
(4) Evaluation of the task-oriented potential attribute mining model and complementary data mining model with respect to the particular task
For different existing data sets and attributes, a particular task has different realized value and different opportunity cost caused by missing data attributes. When the potentially useful attributes of the existing data set and complementary multi-source data are mined with the models of this study and compared with the initial state, the realized value of the particular task increases markedly and its opportunity cost decreases markedly. The experimental results are shown in Fig. 2.
(5) Evaluation of the task-oriented potential attribute mining model and complementary data mining model considering both the particular task and the existing data
Considering the attribute set of the existing data and the attribute set of the particular task together, when the potentially useful attributes of the existing data set and complementary multi-source data are mined with the models of this study and compared with the initial state, the validity of the existing data set for the particular task increases markedly. The experimental results are shown in Fig. 3.
(6) Screening potential attributes and complementary multi-source data for particular tasks and existing data sets
For a particular task, we can find the data attributes best suited to the task in the initial state. With the potential attribute mining model, we can further screen out its potentially useful attributes. With the complementary multi-source data mining model, we can further screen out other multi-source data sets whose attributes complement the existing data set. Our experimental results are shown in Table 4.
Table 4. Potentially useful attributes and complementary data sets for particular tasks
In summary, the embodiment of the invention takes potentially useful attributes and multi-source complementary data into account for a particular task, which helps realize added value for that task. For example, in decision making, the task-related associations between data attributes should be used to compensate for the sparseness of data value; and in data acquisition, other multi-source data sets whose attributes complement the existing data set should be considered, which mitigates the problem of missing data attributes caused by information silos, data privacy and security restrictions, and the like. In short, for a particular task, the potential attribute mining model and the complementary multi-source data mining model proposed by the invention can screen out more useful attributes and multi-source data than the user expected, which improves the efficiency with which the task is accomplished.
The above is only a preferred embodiment of the invention. It should be noted that those of ordinary skill in the art may make improvements and refinements without departing from the principle of the invention, and such improvements and refinements shall also be regarded as falling within the scope of protection of the invention.

Claims (3)

1. A task-oriented method for improving data availability, characterized by comprising the following steps:
S1. Based on the relevance and completeness of data attributes with respect to task attributes, formulate a task-oriented quantitative evaluation index system for data availability;
S2. Based on bipartite graph theory and the correlation between data attributes and task attributes, build a task-oriented mining model for potentially useful attributes;
S3. Based on bipartite graph theory and task-oriented data attribute correlation, build a task-oriented mining model for multi-source data with complementary attributes;
S4. Mine the potentially useful attributes of the existing data set and complementary multi-source data with the task-oriented potentially-useful-attribute mining model that was built;
S5. Mine other multi-source data sets whose attributes complement the existing data set with the task-oriented complementary multi-source data mining model that was built.
2. The task-oriented method for improving data availability according to claim 1, characterized in that the task-oriented potentially-useful-attribute mining model is built through the following steps:
Input: data attribute matrix M_DF, task attribute matrix M_TF
Output: matching matrix of data sources D_j and tasks T_i with potential-availability matching values;
Step 1: Compute the data-task matrix M_DT based on bipartite graph theory;
Step 2: For a particular task T_i in the data-task matrix M_DT, select the data source D_j with the largest matching value;
Step 3: For the particular task T_i and the particular data source D_j, compute the potential availability of D_j based on the data attribute matrix M_DF and the task attribute matrix M_TF:
Step 3.1: Compute the correlation C_F between each attribute of data source D_j and each attribute of task T_i;
Step 3.2: Using a given correlation threshold, select the attributes with higher correlation and add them to the attribute set of task T_i;
Step 3.3: With the new attributes of task T_i, compute the potential availability of data source D_j from the data attribute matrix M_DF and the new task attribute matrix M_TF;
Step 4: Repeat the above steps until all tasks have been traversed.
3. The task-oriented method for improving data availability according to claim 1, characterized in that step S4 specifically comprises the following steps:
Input: data attribute matrix M_DF, task attribute matrix M_TF
Output: matching matrix of data sources D_j and tasks T_i with complementary-availability matching values;
Step 1: Compute the data-task matrix M_DT based on bipartite graph theory;
Step 2: For a particular task T_i in the data-task matrix M_DT, select the data source D_j with the largest matching value;
Step 3: For the particular task T_i and the set of data sources D (which contains D_j), compute, from the data attribute matrix M_DF and the task attribute matrix M_TF, the complementarity of availability between the particular data source D_j and the data sources in D:
Step 3.1: Compute the similarity S_D between each data source and the particular data source D_j;
Step 3.2: Using a given similarity threshold, select the data sources with lower similarity;
Step 3.3: Aggregate the selected data sources (including the particular data source D_j) into a combined data source D;
Step 3.4: Based on the new data source D, the data attribute matrix M_DF and the new task attribute matrix M_TF, compute the complementarity of availability of data source D_j;
Step 4: Repeat the above steps until all tasks have been traversed.
CN201810186852.2A 2018-03-07 2018-03-07 A task-oriented method for improving data availability Pending CN108304586A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810186852.2A CN108304586A (en) 2018-03-07 2018-03-07 A task-oriented method for improving data availability

Publications (1)

Publication Number Publication Date
CN108304586A true CN108304586A (en) 2018-07-20

Family

ID=62849389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810186852.2A Pending CN108304586A (en) 2018-03-07 2018-03-07 A task-oriented method for improving data availability

Country Status (1)

Country Link
CN (1) CN108304586A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104820716A (en) * 2015-05-21 2015-08-05 中国人民解放军海军工程大学 Equipment reliability evaluation method based on data mining
CN105787020A (en) * 2016-02-24 2016-07-20 鄞州浙江清华长三角研究院创新中心 Graph data partitioning method and device
US20170053019A1 (en) * 2015-08-17 2017-02-23 Critical Informatics, Inc. System to organize search and display unstructured data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
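王振涛: "Research and implementation of a bipartite-graph-based RDF keyword expansion query algorithm", China Master's Theses Full-text Database, Information Science and Technology *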


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180720