CN110110224A - Data migration method and system based on a data multi-label - Google Patents
Data migration method and system based on a data multi-label
- Publication number
- CN110110224A (application number CN201910305239.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- label
- application
- metadata tag
- dictionary
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Fuzzy Systems (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a data migration method and system based on a data multi-label. The method comprises: acquiring data; establishing metadata tags for the data based on a data dictionary and constructing a metadata tag set; establishing application labels for the data based on the data dictionary and the application scenarios and constructing an application label set; combining the metadata tag set and the application label set to form the data multi-label; and performing data analysis in different application systems using the same data multi-label. Because a unified data multi-label is established for the data and each application scenario corresponds to a unique data application label, the acquired data can easily be migrated into different application systems for data mining, which improves the universality of data application and, in turn, the accuracy and efficiency of data mining.
Description
Technical field
The invention belongs to the field of data migration technology, and in particular relates to a data migration method and system based on a data multi-label.
Background art
With the development of the Internet era, the growth in the number of users and the expansion of application functions have produced massive amounts of data. Making full and effective use of this big data makes it possible to grasp the information hidden behind the data, such as how the data evolves, and thus to formulate corresponding strategies more precisely. Data mining is an effective means of big data analysis: it obtains the information hidden in data through methods such as statistics, analysis, retrieval, machine learning, expert systems and pattern recognition.
Typically, data mining is carried out by adding labels to data. For example, a video file can be labeled according to the persons, events, locations and so on involved in its content. However, data labels are used rather arbitrarily, and this arbitrariness is especially pronounced for user-defined labels. Because of it, data labels lack a unique correspondence, so the results of data mining become biased.
The prior art proposes establishing a standard label library and string-matching user-defined labels against the standard labels; if the match succeeds, the user-defined label is mapped onto the standard label. Standardizing labels in this way improves the accuracy of data mining to a certain extent. However, data mining has a wide range of uses and can serve many different types of application systems, each of which has one or more application scenarios, so a single standard label can hardly meet the needs of different application systems. For example, suppose labeled data suitable for a public opinion monitoring system has been collected from a microblog application and used for public opinion monitoring. If the same data is then needed by an interest recommendation system, it cannot be migrated because it lacks the labels relevant to interest recommendation. The acquired data therefore has poor universality, and the efficiency and accuracy of data mining are low.
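The prior-art mapping described above can be sketched roughly as follows. This is only an illustration: the standard label library `STANDARD_LABELS`, the function name `map_to_standard`, and the case-insensitive exact match are all assumptions, since the source does not specify the matching rule.

```python
from typing import Optional

# Hypothetical standard label library (illustrative values, not from the patent).
STANDARD_LABELS = {"person", "event", "location"}


def map_to_standard(user_label: str) -> Optional[str]:
    """Map a user-defined label onto a standard label by string matching.

    Returns the matching standard label, or None when nothing matches.
    """
    candidate = user_label.strip().lower()
    return candidate if candidate in STANDARD_LABELS else None


mapped = map_to_standard(" Location ")   # matches the standard label "location"
unmapped = map_to_standard("mood")       # no counterpart in the library
```

A label that only a second application system cares about (here the made-up `mood`) has no counterpart in a single standard library, which is the limitation this section points out.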
Summary of the invention
To solve the technical problems described above, namely the poor universality of a single standard label and the low efficiency and accuracy of data mining, the invention proposes a data migration method and system based on a data multi-label.
A data migration method based on a data multi-label comprises: acquiring data; establishing metadata tags for the data based on a data dictionary, and constructing a metadata tag set; establishing application labels for the data based on the data dictionary and application scenarios, and constructing an application label set; combining the metadata tag set and the application label set to form a data multi-label; and performing data analysis in different application systems using the same data multi-label.
Further, the data dictionary is preset and is a keyword set that includes metadata keywords and application environment keywords.
Further, the metadata tag set includes one or more metadata tags, and the metadata tags are selected from the metadata keywords in the data dictionary.
Further, the application label set includes one or more application labels, and the application labels are selected from the application environment keywords in the data dictionary.
Further, an application label describes a characteristic of the data in an application scenario, and each application scenario corresponds to one application label.
A data migration system based on a data multi-label comprises: a data acquisition module for acquiring data; a metadata tag set establishing module for establishing metadata tags for the data based on a data dictionary and constructing a metadata tag set; an application label set establishing module for establishing application labels for the data based on the data dictionary and application scenarios and constructing an application label set; a combination module for combining the metadata tag set and the application label set to form a data multi-label; and a data analysis module for performing data analysis in different application systems using the same data multi-label.
Further, the data dictionary is preset and is a keyword set that includes metadata keywords and application environment keywords.
Further, the metadata tag set includes one or more metadata tags, and the metadata tags are selected from the metadata keywords in the data dictionary.
Further, the application label set includes one or more application labels, and the application labels are selected from the application environment keywords in the data dictionary.
Further, an application label describes a characteristic of the data in an application scenario, and each application scenario corresponds to one application label.
Beneficial effects of the invention: by establishing a unified data multi-label for the data, with each application scenario corresponding to a unique data application label, the acquired data can easily be migrated into different application systems for data mining, which improves the universality of data application and, in turn, the accuracy and efficiency of data mining. The method and system proposed in the embodiments of the invention can be used to analyze and manage network data, for example to provide content a user may be interested in according to the user's page visit records or access history on the network.
Brief description of the drawings
Fig. 1 is a flowchart of the data migration method based on a data multi-label according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the data multi-label according to an embodiment of the invention;
Fig. 3 is a structural schematic diagram of the data migration system based on a data multi-label according to an embodiment of the invention.
Specific embodiments
To make the objectives, technical solutions and advantages of the invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings. As those skilled in the art will appreciate, however, the invention is not limited to the drawings and the following embodiments.
An embodiment of the invention proposes a data migration method based on a data multi-label.
Fig. 1 shows a data migration method based on a data multi-label according to an embodiment of the invention. As shown in Fig. 1, in step 110, data is acquired. The data can be obtained, for example, by crawling web page data with Python or by having users upload data.
Definitions: the data dictionary is denoted DD[r], a metadata keyword is denoted MD[i], and an application environment keyword is denoted AE[j], where r, i, j = 1, …, n.
In step 120, metadata tags are established for the data based on the data dictionary DD, and the metadata tag set MD is constructed. The data dictionary DD is preset and is a data application knowledge base containing metadata keywords and application environment keywords. Data labels are selected from the keywords in the data dictionary. The metadata tag set MD is a tag set describing the basic characteristics of the data; it is the keyword set formed by choosing, from the metadata keywords of the data dictionary, the keywords relevant to the metadata characteristics: MD = {MD[1], …, MD[n]}. A data item that has n metadata attributes corresponds to n keywords in the data dictionary, and the set of these keywords constitutes a metadata tag set. The metadata tag set includes one or more tags, including but not limited to data source, classification, name, creator, update time and so on.
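Step 120 can be illustrated with a minimal sketch. The class `DataDictionary`, the function `build_metadata_tag_set`, the keyword values, and the choice to store each tag as a keyword-to-value pair are all assumptions made for illustration; the source only states that metadata tags are chosen from the metadata keywords of the preset data dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataDictionary:
    """Preset keyword set DD: metadata keywords plus application environment keywords."""
    metadata_keywords: frozenset
    app_env_keywords: frozenset


def build_metadata_tag_set(attributes: dict, dd: DataDictionary) -> dict:
    """Keep only the attributes whose names are metadata keywords in DD."""
    return {key: value for key, value in attributes.items()
            if key in dd.metadata_keywords}


# Hypothetical dictionary contents and data attributes.
dd = DataDictionary(
    metadata_keywords=frozenset({"source", "category", "name", "creator", "updated"}),
    app_env_keywords=frozenset({"negative_sentiment", "sports"}),
)
attributes = {"source": "microblog", "creator": "user_42", "length": 140}
md = build_metadata_tag_set(attributes, dd)
```

Here `length` is dropped because it is not a metadata keyword in the dictionary, so only dictionary-backed tags end up in the set.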
In step 130, application labels are established for the data based on the data dictionary DD and the application scenarios, and the application label set AE is constructed. Data has different characteristics in different application environments, and application labels describe the characteristics of the data in the different application scenarios. For each application scenario, the corresponding keyword is chosen from the application environment keywords of the data dictionary as the application label, so that each application scenario corresponds to one application label. A data item that can be applied to n application scenarios has n application labels, and these n application labels constitute the application label set AE = {AE[1], …, AE[n]}.
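A small sketch of step 130 follows. It assumes, hypothetically, that the label chosen for each scenario is supplied by the caller and only needs to be checked against the application environment keywords of the data dictionary; the scenario names, keyword values and the function `build_application_label_set` are illustrative and not taken from the source.

```python
def build_application_label_set(chosen: dict, app_env_keywords: set) -> dict:
    """Return one application label per scenario, validated against the dictionary.

    `chosen` maps scenario name -> label, so each scenario has exactly one label.
    """
    invalid = {scenario: label for scenario, label in chosen.items()
               if label not in app_env_keywords}
    if invalid:
        raise ValueError(f"labels not in data dictionary: {invalid}")
    return dict(chosen)


# Hypothetical application environment keywords of the data dictionary.
AE_KEYWORDS = {"negative_sentiment", "positive_sentiment", "sports", "music"}

ae = build_application_label_set(
    {
        "public_opinion_monitoring": "negative_sentiment",
        "interest_recommendation": "sports",
    },
    AE_KEYWORDS,
)
```

Using a mapping keyed by scenario is one simple way to enforce the one-label-per-scenario rule stated in the text.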
In step 140, the metadata tag set MD and the application label set AE are combined to form the data multi-label {MD, AE}. Fig. 2 is a schematic diagram of a data multi-label according to an embodiment of the invention. As shown in Fig. 2, the metadata tag set and the application label set, which is formed from multiple application labels, constitute the data multi-label:
{MD, AE} = ∑MD + ∑AE
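One possible reading of the combination in step 140 is sketched below. It assumes that the "+" in the formula means collecting all labels of both sets together (a set union); the function name `combine_labels` and the sample labels are hypothetical.

```python
def combine_labels(md, ae) -> dict:
    """Form the data multi-label {MD, AE} and the collection of all its labels."""
    md, ae = frozenset(md), frozenset(ae)
    return {"MD": md, "AE": ae, "all": md | ae}


multi_label = combine_labels({"source", "name"}, {"negative_sentiment", "sports"})
```

Keeping `MD` and `AE` as separate entries preserves which labels describe the data itself and which describe its use in a scenario, while `all` gives the combined view.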
In step 150, data analysis is performed in different application systems using the same data multi-label. Because the same data multi-label can be used in different application systems, the acquired data can easily be migrated into different application systems for data mining, which improves the universality of data application and, in turn, the accuracy and efficiency of data mining.
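How two application systems might read the same multi-labeled data in step 150 can be sketched as follows. The record layout, the scenario names and the helper `view_for_scenario` are assumptions for illustration only; the source does not describe a concrete data format.

```python
def view_for_scenario(record: dict, scenario: str) -> dict:
    """Return the parts of a multi-labeled record that one application system needs."""
    labels = record["multi_label"]["AE"]
    if scenario not in labels:
        raise KeyError(f"record has no application label for scenario {scenario!r}")
    return {
        "payload": record["payload"],
        "metadata": record["multi_label"]["MD"],
        "application_label": labels[scenario],
    }


# One acquired record carrying a single data multi-label (hypothetical values).
record = {
    "payload": "post text",
    "multi_label": {
        "MD": {"source": "microblog", "creator": "user_42"},
        "AE": {
            "public_opinion_monitoring": "negative_sentiment",
            "interest_recommendation": "sports",
        },
    },
}

monitoring_view = view_for_scenario(record, "public_opinion_monitoring")
recommend_view = view_for_scenario(record, "interest_recommendation")
```

Both systems work from the same record and the same metadata tags; only the application label they pick differs, so no relabeling is needed when the data moves from one system to the other.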
An embodiment of the invention also proposes a data migration system based on a data multi-label.
Fig. 3 shows a data migration system based on a data multi-label according to an embodiment of the invention. As shown in Fig. 3, the data migration system 300 includes a data acquisition module 310 for acquiring data. The data can be obtained, for example, by crawling web page data with Python or by having users upload data.
The data migration system 300 further includes a metadata tag set establishing module 320, which establishes metadata tags for the data based on the data dictionary and constructs the metadata tag set. The data dictionary is preset and is a data application knowledge base containing metadata keywords and application environment keywords. Data labels are selected from the keywords in the data dictionary. The metadata tag set is a tag set describing the basic characteristics of the data; it is the keyword set formed by choosing, from the metadata keywords of the data dictionary, the keywords relevant to the metadata characteristics. A data item that has n metadata attributes corresponds to n keywords in the data dictionary, and the set of these keywords constitutes a metadata tag set. The metadata tag set includes one or more tags, including but not limited to data source, classification, name, creator, update time and so on.
The data migration system 300 further includes an application label set establishing module 330, which establishes application labels for the data based on the data dictionary and the application scenarios and constructs the application label set. Data has different characteristics in different application environments, and application labels describe the characteristics of the data in the different application scenarios. For each application scenario, the corresponding keyword is chosen from the application environment keywords of the data dictionary as the application label, so that each application scenario corresponds to one application label. A data item that can be applied to n application scenarios has n application labels, and these n application labels constitute the application label set.
The data migration system 300 further includes a combination module 340 for combining the metadata tag set and the application label set to form the data multi-label.
The data migration system 300 further includes a data analysis module 350, which performs data analysis in different application systems using the same data multi-label. Because the same data multi-label can be used in different application systems, the acquired data can easily be migrated into different application systems for data mining, which improves the universality of data application and, in turn, the accuracy and efficiency of data mining.
An embodiment of the invention also proposes a computer-readable storage medium on which a computer program is stored, the program implementing the steps of the above method when executed by a processor.
A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the program.
Those skilled in the art will understand that the logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate or transport a program for use by, or in connection with, such an instruction execution system, apparatus or device.
More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium can even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or, if necessary, otherwise suitably processing it, and then stored in a computer memory.
It should be understood that each part of the invention can be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following technologies known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
In the description of this specification, a reference to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described can be combined in any suitable manner in any one or more embodiments or examples.
Embodiments of the invention have been described above. However, the invention is not limited to the above embodiments. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (10)
1. A data migration method based on a data multi-label, characterized by comprising:
acquiring data;
establishing metadata tags for the data based on a data dictionary, and constructing a metadata tag set;
establishing application labels for the data based on the data dictionary and application scenarios, and constructing an application label set;
combining the metadata tag set and the application label set to form a data multi-label; and
performing data analysis in different application systems using the same data multi-label.
2. The data migration method according to claim 1, characterized in that the data dictionary is preset and is a keyword set including metadata keywords and application environment keywords.
3. The data migration method according to claim 2, characterized in that the metadata tag set includes one or more metadata tags, and the metadata tags are selected from the metadata keywords in the data dictionary.
4. The data migration method according to claim 2, characterized in that the application label set includes one or more application labels, and the application labels are selected from the application environment keywords in the data dictionary.
5. The data migration method according to claim 1, characterized in that the application label describes a characteristic of the data in an application scenario, and one application scenario corresponds to one application label.
6. A data migration system based on a data multi-label, characterized by comprising:
a data acquisition module for acquiring data;
a metadata tag set establishing module for establishing metadata tags for the data based on a data dictionary and constructing a metadata tag set;
an application label set establishing module for establishing application labels for the data based on the data dictionary and application scenarios and constructing an application label set;
a combination module for combining the metadata tag set and the application label set to form a data multi-label; and
a data analysis module for performing data analysis in different application systems using the same data multi-label.
7. The data migration system according to claim 6, characterized in that the data dictionary is preset and is a keyword set including metadata keywords and application environment keywords.
8. The data migration system according to claim 7, characterized in that the metadata tag set includes one or more metadata tags, and the metadata tags are selected from the metadata keywords in the data dictionary.
9. The data migration system according to claim 7, characterized in that the application label set includes one or more application labels, and the application labels are selected from the application environment keywords in the data dictionary.
10. The data migration system according to claim 6, characterized in that the application label describes a characteristic of the data in an application scenario, and one application scenario corresponds to one application label.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910305239.2A CN110110224A (en) | 2019-04-16 | 2019-04-16 | Data migration method and system based on a data multi-label |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910305239.2A CN110110224A (en) | 2019-04-16 | 2019-04-16 | Data migration method and system based on a data multi-label |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110110224A true CN110110224A (en) | 2019-08-09 |
Family
ID=67485535
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910305239.2A Pending CN110110224A (en) | 2019-04-16 | 2019-04-16 | Data migration method and system based on a data multi-label |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110110224A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110082825A1 (en) * | 2009-10-05 | 2011-04-07 | Nokia Corporation | Method and apparatus for providing a co-creation platform |
CN102982076A (en) * | 2012-10-30 | 2013-03-20 | 新华通讯社 | Multi-dimensionality content labeling method based on semanteme label database |
CN108153465A (en) * | 2016-12-05 | 2018-06-12 | 百度在线网络技术(北京)有限公司 | Label setting method and device based on enterprise SaaS applications |
CN108197197A (en) * | 2017-12-27 | 2018-06-22 | 北京百度网讯科技有限公司 | Entity description type label method for digging, device and terminal device |
CN109471904A (en) * | 2018-11-01 | 2019-03-15 | 杭州数澜科技有限公司 | A kind of method and system for tissue label |
CN109522333A (en) * | 2018-11-23 | 2019-03-26 | 北京锐安科技有限公司 | Data analysing method, device, equipment and medium |
-
2019
- 2019-04-16 CN CN201910305239.2A patent/CN110110224A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8374914B2 (en) | Advertising using image comparison | |
AU2014201827B2 (en) | Scoring concept terms using a deep network | |
US9449271B2 (en) | Classifying resources using a deep network | |
US11574145B2 (en) | Cross-modal weak supervision for media classification | |
WO2016161976A1 (en) | Method and device for selecting data content to be pushed to terminals | |
CN102171689A (en) | Providing posts to discussion threads in response to a search query | |
US10282752B2 (en) | Computerized system and method for displaying a map system user interface and digital content | |
US11755676B2 (en) | Systems and methods for generating real-time recommendations | |
US9135357B2 (en) | Using scenario-related information to customize user experiences | |
CN102483745A (en) | Co-selected image classification | |
CN108021660B (en) | Topic self-adaptive microblog emotion analysis method based on transfer learning | |
CN103902697A (en) | Combinatorial search method, client and server | |
CN111310074B (en) | Method and device for optimizing labels of interest points, electronic equipment and computer readable medium | |
US20230388261A1 (en) | Determining topic cohesion between posted and linked content | |
CN104268192A (en) | Webpage information extracting method, device and terminal | |
CN113343091A (en) | Industrial and enterprise oriented science and technology service recommendation calculation method, medium and program | |
CN114692007B (en) | Method, device, equipment and storage medium for determining representation information | |
CN114201516A (en) | User portrait construction method, information recommendation method and related device | |
CN113569118B (en) | Self-media pushing method, device, computer equipment and storage medium | |
CN106462588B (en) | Content creation from extracted content | |
JP2023517518A (en) | Vector embedding model for relational tables with null or equivalent values | |
CN115131052A (en) | Data processing method, computer equipment and storage medium | |
CN116977701A (en) | Video classification model training method, video classification method and device | |
Abebe et al. | Overview of event-based collective knowledge management in multimedia digital ecosystems | |
CN110516162A (en) | A kind of information recommendation method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190809 |