CN110110224A - Data migration method and system based on multiple data labels - Google Patents


Info

Publication number
CN110110224A
Authority
CN
China
Prior art keywords
data
label
application
metadata tag
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910305239.2A
Other languages
Chinese (zh)
Inventor
王石
邢国贤
赵学豪
张俊曦
吴坤鹏
黄蓉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Jin Lian (beijing) Science And Technology Co Ltd
Original Assignee
Zhongke Jin Lian (beijing) Science And Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Jin Lian (beijing) Science And Technology Co Ltd
Priority to CN201910305239.2A
Publication of CN110110224A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a data migration method and system based on multiple data labels. The method comprises: acquiring data; establishing metadata labels for the data based on a data dictionary to construct a metadata label set; establishing application labels for the data based on the data dictionary and the application scenarios to construct an application label set; combining the metadata label set and the application label set to form a multiple data label; and performing data analysis in different application systems using the same multiple data label. Because a unified multiple data label is established for the data and each application scenario corresponds to a unique application label, the acquired data can be conveniently migrated to different application systems for data mining, which improves the universality of the data and, in turn, the accuracy and efficiency of data mining.

Description

Data migration method and system based on multiple data labels
Technical field
The invention belongs to the technical field of data migration, and in particular relates to a data migration method and system based on multiple data labels.
Background
The development of the Internet era, the growth in the number of users, and the expansion of application functions have produced massive amounts of data. By making full and effective use of this big data, one can grasp the information hidden behind it, such as the laws governing how the data evolves, and thereby formulate corresponding strategies more precisely. Data mining is an effective means of big data analysis: it obtains the information hidden in data through methods such as statistics, analysis, retrieval, machine learning, expert systems, and pattern recognition.
Typically, data mining is carried out by attaching labels to data. For example, a video file can be labelled according to the people, events, and locations involved in its content. However, data labels are used rather arbitrarily, and this arbitrariness is especially pronounced for user-defined labels. Because of it, data labels lack a unique correspondence, so the results of data mining become biased.
The prior art proposes establishing a standard label library and string-matching user-defined labels against the standard labels; if the match succeeds, the user-defined label is mapped onto the standard label. Standardizing labels in this way improves the accuracy of data mining to a certain extent. However, data mining has a wide range of applications and can be used in many different types of application systems, each of which has one or more application scenarios, so a single standard label can hardly meet the needs of different application systems. For example, suppose data carrying labels suited to a public opinion monitoring system is collected from a microblog application and used for public opinion monitoring. If the same data is later needed for an interest recommendation system, it cannot be migrated because it lacks the labels that interest recommendation requires. The collected data therefore has poor universality, and the efficiency and accuracy of data mining are correspondingly low.
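The prior-art approach described above, mapping a user-defined label onto a standard label by string matching, might look roughly like the following sketch. The contents of `STANDARD_LABELS` and the normalization rule (lower-casing and collapsing whitespace) are assumptions made for illustration; the source does not specify how the matching is performed.

```python
from typing import Optional

# Hypothetical standard label library; the entries are illustrative only.
STANDARD_LABELS = {"sports", "finance", "public opinion"}


def normalize(label: str) -> str:
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(label.lower().split())


def map_to_standard(custom_label: str) -> Optional[str]:
    """Return the matching standard label, or None when string matching fails."""
    candidate = normalize(custom_label)
    return candidate if candidate in STANDARD_LABELS else None
```

A label that fails to match is simply left unmapped, which illustrates the limitation discussed above: a single library can only serve the scenarios whose labels it already contains.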
Summary of the invention
To solve the technical problems described above, namely the poor universality of a single standard label and the low efficiency and accuracy of data mining, the invention proposes a data migration method and system based on multiple data labels.
A data migration method based on multiple data labels comprises: acquiring data; establishing metadata labels for the data based on a data dictionary to construct a metadata label set; establishing application labels for the data based on the data dictionary and the application scenarios to construct an application label set; combining the metadata label set and the application label set to form a multiple data label; and performing data analysis in different application systems using the same multiple data label.
Further, the data dictionary is preset and is a keyword set comprising metadata keywords and application environment keywords.
Further, the metadata label set comprises one or more metadata labels, each selected from the metadata keywords in the data dictionary.
Further, the application label set comprises one or more application labels, each selected from the application environment keywords in the data dictionary.
Further, the application label describes a characteristic of the data in an application scenario, and each application scenario corresponds to one application label.
A data migration system based on multiple data labels comprises: a data acquisition module for acquiring data; a metadata label set establishing module for establishing metadata labels for the data based on a data dictionary and constructing a metadata label set; an application label set establishing module for establishing application labels for the data based on the data dictionary and the application scenarios and constructing an application label set; a combination module for combining the metadata label set and the application label set to form a multiple data label; and a data analysis module for performing data analysis in different application systems using the same multiple data label.
Further, the data dictionary is preset and is a keyword set comprising metadata keywords and application environment keywords.
Further, the metadata label set comprises one or more metadata labels, each selected from the metadata keywords in the data dictionary.
Further, the application label set comprises one or more application labels, each selected from the application environment keywords in the data dictionary.
Further, the application label describes a characteristic of the data in an application scenario, and each application scenario corresponds to one application label.
Beneficial effects of the invention: because a unified multiple data label is established for the data and each application scenario corresponds to a unique application label, the acquired data can be conveniently migrated to different application systems for data mining, which improves the universality of the data and, in turn, the accuracy and efficiency of data mining. The method and system proposed by the embodiments of the invention can be used to analyze and manage network data, for example to provide content that may interest a user based on the user's page access records or access history on the network.
Brief description of the drawings
Fig. 1 is a flow chart of the data migration method based on multiple data labels according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the multiple data label according to an embodiment of the invention;
Fig. 3 is a structural diagram of the data migration system based on multiple data labels according to an embodiment of the invention.
Detailed description
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings. As those skilled in the art will appreciate, however, the invention is not limited to the drawings and the following embodiments.
An embodiment of the invention proposes a data migration method based on multiple data labels.
Fig. 1 shows a data migration method based on multiple data labels according to an embodiment of the invention. As shown in Fig. 1, in step 110, data is acquired. The data can be obtained, for example, by crawling web pages with Python or by having users upload data.
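As a rough illustration of step 110, the sketch below turns an already downloaded web page into a raw data record using only the Python standard library. It assumes the HTML has been fetched elsewhere; the record fields `source`, `name`, and `content`, and the helper name `collect_record`, are hypothetical and do not come from the source.

```python
from html.parser import HTMLParser


class TitleAndTextParser(HTMLParser):
    """Collect the <title> text and the remaining text nodes of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        stripped = data.strip()
        if not stripped:
            return
        if self.in_title:
            self.title += stripped
        else:
            self.text_parts.append(stripped)


def collect_record(html: str, source: str) -> dict:
    """Build a raw data record (hypothetical schema) from one HTML page."""
    parser = TitleAndTextParser()
    parser.feed(html)
    parser.close()
    return {
        "source": source,
        "name": parser.title,
        "content": " ".join(parser.text_parts),
    }
```

This sketch keeps every text node, including any script or style text, so a real crawler would likely need additional filtering.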
Definitions: the data dictionary is denoted $DD_r$, a metadata keyword $MD_i$, and an application environment keyword $AE_j$, where $r, i, j = 1, \dots, n$.
In step 120, metadata labels are established for the data based on the data dictionary DD, and the metadata label set MD is constructed. The data dictionary DD is preset and is a data application knowledge base containing metadata keywords and application environment keywords.
Data labels are taken from the keywords in the data dictionary. The metadata label set MD is the label set that describes the basic characteristics of the data; it is the keyword set formed by selecting, from the metadata keywords of the data dictionary, the keywords related to the metadata characteristics. A data item with n metadata attributes corresponds to n keywords in the data dictionary, and the set of these keywords constitutes a metadata label set. The metadata label set contains one or more labels, including but not limited to data source, category, name, creator, and update time.
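Step 120 can be pictured with a minimal sketch in which the data dictionary is a preset mapping with two keyword groups and the metadata label set keeps only those attributes of a record whose names are metadata keywords. The dictionary contents, the attribute-name matching rule, and the function name `build_metadata_label_set` are assumptions for illustration; the source does not prescribe a concrete data structure.

```python
# Hypothetical preset data dictionary DD, split into its two keyword groups.
DATA_DICTIONARY = {
    "metadata": {"source", "category", "name", "creator", "update_time"},
    "application_environment": {"sentiment", "interest_topic"},
}


def build_metadata_label_set(record: dict) -> dict:
    """Keep only the record attributes whose names are metadata keywords in DD."""
    allowed = DATA_DICTIONARY["metadata"]
    return {key: value for key, value in record.items() if key in allowed}
```

Attributes that are not metadata keywords, such as a like count, are dropped, so every label in the resulting set is guaranteed to come from the dictionary.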
In step 130, application labels are established for the data based on the data dictionary DD and the application scenarios, and the application label set AE is constructed. Data has different characteristics in different application environments, and application labels describe the characteristics of the data in different application scenarios. For each application scenario, the corresponding keyword is selected from the application environment keywords of the data dictionary as the application label, so each application scenario corresponds to one application label. A data item that can be applied to n application scenarios has n application labels, and these n application labels constitute the application label set AE.
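A minimal sketch of step 130, assuming a fixed lookup from each application scenario to exactly one application environment keyword. The scenario names, the keywords, and the function name `build_application_label_set` are hypothetical; the source only requires that each scenario correspond to one application label taken from the data dictionary.

```python
# Hypothetical application environment keywords of the data dictionary.
APPLICATION_KEYWORDS = {"sentiment", "interest_topic", "risk_level"}

# Hypothetical mapping: each application scenario corresponds to one keyword.
SCENARIO_TO_KEYWORD = {
    "public_opinion_monitoring": "sentiment",
    "interest_recommendation": "interest_topic",
}


def build_application_label_set(scenarios):
    """Return one application label per scenario, checked against the dictionary."""
    labels = {}
    for scenario in scenarios:
        keyword = SCENARIO_TO_KEYWORD[scenario]
        if keyword not in APPLICATION_KEYWORDS:
            raise ValueError(f"{keyword!r} is not an application environment keyword")
        labels[scenario] = keyword
    return labels
```

Keying the result by scenario makes the one-label-per-scenario property explicit: n scenarios always yield n entries.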
In step 140, the metadata label set MD and the application label set AE are combined to form the multiple data label {MD, AE}. Fig. 2 is a schematic diagram of a multiple data label according to an embodiment of the invention. As shown in Fig. 2, the multiple data label consists of the metadata label set together with the application label set, which is itself formed from multiple application labels.
$\{MD, AE\} = \sum_{i} MD_i + \sum_{j} AE_j$
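The combination in step 140 could be represented as a small immutable structure that keeps the two label sets side by side. Reading the sum in the formula above as a set union is an interpretation, not something the source states, and the names `MultipleLabel` and `combine` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultipleLabel:
    """Multiple data label {MD, AE}: metadata labels plus application labels."""

    metadata_labels: frozenset
    application_labels: frozenset

    def all_labels(self) -> frozenset:
        # Interpreting the "sum" of both label sets as their union (assumption).
        return self.metadata_labels | self.application_labels


def combine(metadata_labels, application_labels) -> MultipleLabel:
    """Combine the two label sets without merging them irreversibly."""
    return MultipleLabel(frozenset(metadata_labels), frozenset(application_labels))
```

Keeping the two sets as separate fields lets a consuming system still tell which labels describe the data itself and which describe its use in a scenario.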
In step 150, data analysis is performed in different application systems using the same multiple data label. Because the same multiple data label can be used in every application system, the acquired data can be conveniently migrated to different application systems for data mining, which improves the universality of the data and, in turn, the accuracy and efficiency of data mining.
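To make step 150 concrete, the sketch below lets two toy application systems, a public opinion monitor and an interest recommender, consume the same labelled records without relabelling. The sample records, label names, and both functions are invented for illustration and are not part of the source.

```python
# Hypothetical records that already carry a multiple data label.
LABELLED_RECORDS = [
    {
        "content": "Great match last night",
        "labels": {"source": "microblog", "sentiment": "positive", "interest_topic": "sports"},
    },
    {
        "content": "Prices are rising again",
        "labels": {"source": "microblog", "sentiment": "negative", "interest_topic": "finance"},
    },
]


def monitor_public_opinion(records):
    """Toy public opinion monitoring: count records per sentiment label."""
    counts = {}
    for record in records:
        sentiment = record["labels"]["sentiment"]
        counts[sentiment] = counts.get(sentiment, 0) + 1
    return counts


def recommend_by_interest(records, topic):
    """Toy interest recommendation: return content whose topic label matches."""
    return [r["content"] for r in records if r["labels"]["interest_topic"] == topic]
```

Each system reads only the application label relevant to its scenario, which is how the same data can move between systems in this sketch.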
An embodiment of the invention also proposes a data migration system based on multiple data labels.
Fig. 3 shows a data migration system based on multiple data labels according to an embodiment of the invention. As shown in Fig. 3, the data migration system 300 includes a data acquisition module 310 for acquiring data. The data can be obtained, for example, by crawling web pages with Python or by having users upload data.
The data migration system 300 further includes a metadata label set establishing module 320, which establishes metadata labels for the data based on the data dictionary and constructs the metadata label set. The data dictionary is preset and is a data application knowledge base containing metadata keywords and application environment keywords. Data labels are taken from the keywords in the data dictionary. The metadata label set is the label set that describes the basic characteristics of the data; it is the keyword set formed by selecting, from the metadata keywords of the data dictionary, the keywords related to the metadata characteristics. A data item with n metadata attributes corresponds to n keywords in the data dictionary, and the set of these keywords constitutes a metadata label set. The metadata label set contains one or more labels, including but not limited to data source, category, name, creator, and update time.
The data migration system 300 further includes an application label set establishing module 330, which establishes application labels for the data based on the data dictionary and the application scenarios and constructs the application label set. Data has different characteristics in different application environments, and application labels describe the characteristics of the data in different application scenarios. For each application scenario, the corresponding keyword is selected from the application environment keywords of the data dictionary as the application label, so each application scenario corresponds to one application label. A data item that can be applied to n application scenarios has n application labels, and these n application labels constitute the application label set.
The data migration system 300 further includes a combination module 340 for combining the metadata label set and the application label set to form the multiple data label.
The data migration system 300 further includes a data analysis module 350, which performs data analysis in different application systems using the same multiple data label. Because the same multiple data label can be used in every application system, the acquired data can be conveniently migrated to different application systems for data mining, which improves the universality of the data and, in turn, the accuracy and efficiency of data mining. An embodiment of the invention also proposes a computer-readable storage medium on which a computer program is stored; the program implements the steps of the above method when executed by a processor.
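The module structure of system 300 might be wired together as in the following sketch, where each method stands in for one module (310 to 350). The class name `DataMigrationSystem`, its constructor arguments, and the simple set-based label representation are all assumptions; the source describes the modules only functionally.

```python
class DataMigrationSystem:
    """Toy counterpart of system 300; each method stands in for one module."""

    def __init__(self, metadata_keywords, scenario_to_keyword):
        # Both arguments play the role of the preset data dictionary (assumption).
        self.metadata_keywords = set(metadata_keywords)
        self.scenario_to_keyword = dict(scenario_to_keyword)

    def acquire(self, raw_record):  # data acquisition module 310
        return dict(raw_record)

    def build_metadata_labels(self, record):  # metadata label set module 320
        return {key for key in record if key in self.metadata_keywords}

    def build_application_labels(self, scenarios):  # application label set module 330
        return {self.scenario_to_keyword[s] for s in scenarios}

    def combine(self, md, ae):  # combination module 340
        return {"MD": md, "AE": ae}

    def analyze(self, labelled_records, scenario):  # data analysis module 350
        keyword = self.scenario_to_keyword[scenario]
        return [rec for rec, label in labelled_records if keyword in label["AE"]]

    def label(self, raw_record, scenarios):
        """Run modules 310 to 340 and return (record, multiple data label)."""
        record = self.acquire(raw_record)
        md = self.build_metadata_labels(record)
        ae = self.build_application_labels(scenarios)
        return record, self.combine(md, ae)
```

In this sketch, `analyze` selects the records usable in a given scenario purely from their application labels, so adding a new consuming system requires no change to the stored records.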
A computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the steps of the above method when executing the program.
Those skilled in the art will understand that the logic and/or steps represented in the flow charts or otherwise described herein, which may for example be regarded as an ordered list of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium can even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. Alternatively, if implemented in hardware, as in another embodiment, they can be implemented using any one or a combination of the following technologies known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described can be combined in any suitable manner in any one or more embodiments or examples.
The embodiments of the invention have been described above. However, the invention is not limited to these embodiments. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (10)

1. A data migration method based on multiple data labels, characterized by comprising:
acquiring data;
establishing metadata labels for the data based on a data dictionary, and constructing a metadata label set;
establishing application labels for the data based on the data dictionary and application scenarios, and constructing an application label set;
combining the metadata label set and the application label set to form a multiple data label; and
performing data analysis in different application systems using the same multiple data label.
2. The data migration method according to claim 1, characterized in that the data dictionary is preset and is a keyword set comprising metadata keywords and application environment keywords.
3. The data migration method according to claim 2, characterized in that the metadata label set comprises one or more metadata labels, each selected from the metadata keywords in the data dictionary.
4. The data migration method according to claim 2, characterized in that the application label set comprises one or more application labels, each selected from the application environment keywords in the data dictionary.
5. The data migration method according to claim 1, characterized in that the application label describes a characteristic of the data in an application scenario, and each application scenario corresponds to one application label.
6. A data migration system based on multiple data labels, characterized by comprising:
a data acquisition module for acquiring data;
a metadata label set establishing module for establishing metadata labels for the data based on a data dictionary and constructing a metadata label set;
an application label set establishing module for establishing application labels for the data based on the data dictionary and application scenarios and constructing an application label set;
a combination module for combining the metadata label set and the application label set to form a multiple data label; and
a data analysis module for performing data analysis in different application systems using the same multiple data label.
7. The data migration system according to claim 6, characterized in that the data dictionary is preset and is a keyword set comprising metadata keywords and application environment keywords.
8. The data migration system according to claim 7, characterized in that the metadata label set comprises one or more metadata labels, each selected from the metadata keywords in the data dictionary.
9. The data migration system according to claim 7, characterized in that the application label set comprises one or more application labels, each selected from the application environment keywords in the data dictionary.
10. The data migration system according to claim 6, characterized in that the application label describes a characteristic of the data in an application scenario, and each application scenario corresponds to one application label.
Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN201910305239.2A | CN110110224A (en) | 2019-04-16 | 2019-04-16 | Data migration method and system based on multiple data labels

Publications (1)

Publication Number | Publication Date
CN110110224A | 2019-08-09

Family

ID=67485535

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201910305239.2A | Pending | CN110110224A (en) | 2019-04-16 | 2019-04-16 | Data migration method and system based on multiple data labels

Country Status (1)

Country | Link
CN (1) | CN110110224A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110082825A1 (en) * 2009-10-05 2011-04-07 Nokia Corporation Method and apparatus for providing a co-creation platform
CN102982076A (en) * 2012-10-30 2013-03-20 新华通讯社 Multi-dimensionality content labeling method based on semanteme label database
CN108153465A (en) * 2016-12-05 2018-06-12 百度在线网络技术(北京)有限公司 Label setting method and device based on enterprise SaaS applications
CN108197197A (en) * 2017-12-27 2018-06-22 北京百度网讯科技有限公司 Entity description type label method for digging, device and terminal device
CN109471904A (en) * 2018-11-01 2019-03-15 杭州数澜科技有限公司 A kind of method and system for tissue label
CN109522333A (en) * 2018-11-23 2019-03-26 北京锐安科技有限公司 Data analysing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190809