CN109408688A - A kind of unstructured data marking management method and system - Google Patents

Publication number
CN109408688A
Authority
CN
China
Legal status
Pending
Application number
CN201811208798.3A
Other languages
Chinese (zh)
Inventor
邓炽成
Current Assignee
Zhuhai Zhitu Digital Research Information Technology Co Ltd
Original Assignee
Zhuhai Zhitu Digital Research Information Technology Co Ltd
Application filed by Zhuhai Zhitu Digital Research Information Technology Co Ltd
Priority to CN201811208798.3A
Publication of CN109408688A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an unstructured data tagging management method and system comprising a tag management platform. The tag management platform includes a feature extraction module, a storage management module, a conversion loading module, a data tag module, an access interface module, and a query processing module, and a business abstraction module and a manual processing module are connected to the platform. The data tag module includes a tag creation module, a tag marking module, and a tag storage module. The feature extraction module includes a text extraction module, an image extraction module, an audio extraction module, and a video extraction module. The text extraction module extracts stop words, TF-IDF features, and keywords from text. By building the tag management platform and using the data tag module, the invention achieves "one-stop" management of the creation, conversion, and storage of data tags that describe business attributes, which increases the usable value of big data and improves the level of intelligent data analysis for line-of-business work.

Description

Unstructured data tagging management method and system
Technical field
The present invention relates to the field of data management, and in particular to an unstructured data tagging management method and system.
Background
Much of the information collected by web crawlers consists of unformatted data of many kinds that is subject to no formatting or standardization requirements. Such data is incomplete in many respects for line-of-business analysis, which directly affects the analysis results.
It is therefore necessary to provide an unstructured data tagging management method and system that solves the above problems.
Summary of the invention
The purpose of the present invention is to provide an unstructured data tagging management method and system that, by building a tag management platform and using a data tag module, achieves "one-stop" management of the creation, conversion, and storage of data tags that describe business attributes, increases the usable value of big data, and improves the level of intelligent data analysis for line-of-business work, thereby solving the problems described in the background.
To achieve the above purpose, the invention provides the following technical solution: an unstructured data tagging management method and system, including a tag management platform. The tag management platform includes a feature extraction module, a storage management module, a conversion loading module, a data tag module, an access interface module, and a query processing module, and a business abstraction module and a manual processing module are connected to the tag management platform;
The data tag module includes a tag creation module, a tag marking module, and a tag storage module;
The feature extraction module includes a text extraction module;
The text extraction module extracts stop words, TF-IDF features, and keywords from text;
The storage management module provides storage modeling functions and can insert, modify, and delete unstructured data;
The conversion loading module automatically processes text, image, audio, and video data in common formats according to the extracted features;
The access interface module handles the interface requirements of query-language, application, and Web service access;
The query processing module provides result ranking and batch return functions and supports range queries, full-text queries, query by example, and semantic queries, so that the unstructured data in the management system can be queried in several ways;
The business abstraction module abstracts the business's unstructured data and formulates data standards that meet business requirements;
The manual processing module is a user terminal at which staff manually process and adjust the data according to the unstructured data processing requirements so that it conforms to the data standards that meet business requirements;
The tag creation module, tag conversion module, and tag storage module together implement "one-stop" management of the creation, conversion, and storage of data tags that describe business attributes.
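The text extraction step named above (stop words, TF-IDF features, keywords) can be illustrated with a minimal sketch. The patent does not specify a tokenizer, a stop-word list, or a TF-IDF variant, so the names `STOP_WORDS`, `tokenize`, `tf_idf`, and `top_keywords` are hypothetical, the sketch assumes stop words are identified in order to be filtered out, and the weighting is the plain term-frequency times log inverse-document-frequency form.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real deployment would use a curated one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_keywords(doc_weights, k=3):
    """Pick the k highest-weighted terms of one document as its keywords."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Terms that occur in every document receive a weight of zero under this formula, which is why the rarer terms surface as keywords.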
Preferably, the storage management module supports basic data types such as integer, floating point, Boolean, string, and date.
Preferably, the conversion loading module performs preliminary automatic structuring of unstructured data.
Preferably, the query processing module supports querying unstructured data.
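As a rough illustration of the query modes attributed to the query processing module, the sketch below shows a range query, a ranked full-text query, and batch return over an in-memory list. The `Record` layout, the function names, and the term-count ranking are assumptions made for the example; the patent does not describe how ranking or batching is implemented, and query by example and semantic queries are not covered here.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: int
    text: str
    attributes: dict = field(default_factory=dict)


def range_query(records, attr, low, high):
    """Keep records whose attribute value lies in the closed range [low, high]."""
    return [
        r for r in records
        if attr in r.attributes and low <= r.attributes[attr] <= high
    ]


def full_text_query(records, query):
    """Rank records by how often the query terms occur in their text."""
    terms = query.lower().split()
    scored = []
    for r in records:
        words = r.text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            scored.append((score, r))
    # Highest score first; ties are broken by record id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1].record_id))
    return [r for _, r in scored]


def in_batches(results, batch_size):
    """Yield the results in fixed-size batches."""
    for start in range(0, len(results), batch_size):
        yield results[start:start + batch_size]
```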
The invention also discloses an unstructured data tagging management method, which specifically includes the following steps:
Step 1: the tag management platform is operated. When unstructured data is transmitted to the tag management platform, the storage management module builds a storage model from the raw data, basic attributes, low-level features, and semantic features of the unstructured data, so that the unstructured data is converted and stored in the tag management platform in a form that the functional modules inside the platform can process;
Step 2: the unstructured data entering the tag management platform is also processed by the business abstraction module, which abstracts the business's unstructured data and formulates data standards that meet business requirements;
Step 3: the feature extraction module extracts the specific information and features of the text, image, audio, and video data in the unstructured data, and the conversion loading module performs preliminary automated processing on the features extracted from the text data;
Step 4: at the user terminal, staff manually process and adjust the data according to the unstructured data processing requirements so that it conforms to the data standards that meet business requirements. By manual means, the downloaded unstructured data is converted into standard data that can describe business attributes, finally producing standardized data as output;
Step 5: the text, image, audio, and video information in the converted and loaded unstructured data is classified, and the classified data is processed by the data tag module, which manages the creation, conversion, and storage of the data tags that describe business attributes; the managed data is then transmitted to the user terminal.
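The five steps above can be pictured as a small pipeline. The following is only a schematic sketch: every function name, the dictionary record layout, the "key: value" structuring rule, and the single required field are hypothetical stand-ins, since the patent describes the steps at the module level and gives no concrete algorithms.

```python
def store(raw):
    """Step 1: wrap raw input in a storage record (hypothetical layout)."""
    return {"raw": raw, "features": {}, "fields": {}, "tags": []}


def business_standard():
    """Step 2: an assumed data standard, here just a list of required fields."""
    return {"required": ["title"]}


def extract_and_load(record):
    """Step 3: extract a trivial feature and attempt automatic structuring."""
    record["features"]["word_count"] = len(record["raw"].split())
    if ":" in record["raw"]:
        key, value = record["raw"].split(":", 1)
        record["fields"][key.strip().lower()] = value.strip()
    return record


def manual_fix(record, standard, operator_input):
    """Step 4: an operator fills in fields the automatic step could not produce."""
    for name in standard["required"]:
        if name not in record["fields"]:
            record["fields"][name] = operator_input[name]
    return record


def classify_and_tag(record):
    """Step 5: classify by media type (text only here) and attach a tag."""
    record["tags"].append("type:text")
    return record


def run_pipeline(raw, operator_input):
    """Run steps 1 to 5 in the order given in the summary."""
    standard = business_standard()
    record = extract_and_load(store(raw))
    record = manual_fix(record, standard, operator_input)
    return classify_and_tag(record)
```

In this sketch the operator input is consulted only when automatic structuring leaves a required field empty, which mirrors the split between automated and manual processing described in the steps.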
Technical effects and advantages of the invention:
1. By building the tag management platform and using the data tag module, the invention achieves "one-stop" management of the creation, conversion, and storage of data tags that describe business attributes, which increases the usable value of big data and improves the level of intelligent data analysis for line-of-business work;
2. By abstracting the business and formulating data standards that meet business requirements, and by converting downloaded unstructured data through automated and manual means into standard data that can describe business attributes, the operating efficiency of the unstructured data management system is greatly improved.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall structure of the invention.
Fig. 2 is a schematic structural diagram of the feature extraction module of the invention.
Fig. 3 is a schematic structural diagram of the data tag module of the invention.
In the figures: 1 tag management platform, 2 feature extraction module, 3 storage management module, 4 conversion loading module, 5 data tag module, 6 access interface module, 7 query processing module, 8 business abstraction module, 9 manual processing module.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the invention without creative effort fall within the protection scope of the invention.
An unstructured data tagging management system, as shown in Figs. 1-3, includes a tag management platform 1. The tag management platform 1 includes a feature extraction module 2, a storage management module 3, a conversion loading module 4, a data tag module 5, an access interface module 6, and a query processing module 7, and a business abstraction module 8 and a manual processing module 9 are connected to the tag management platform 1;
The data tag module 5 includes a tag creation module, a tag marking module, and a tag storage module;
The feature extraction module 2 includes a text extraction module;
The text extraction module extracts stop words, TF-IDF features, and keywords from text;
The storage management module 3 provides conceptual-layer storage modeling that covers raw data, basic attributes, low-level features, and semantic features, provides logical-layer storage modeling, and can insert, modify, and delete unstructured data in the storage instances that have been created;
The conversion loading module 4 automatically processes text, image, audio, and video data in common formats according to the extracted features;
The access interface module 6 handles the interface requirements of query-language, application, and Web service access;
The query processing module 7 provides result ranking and batch return functions and supports range queries, full-text queries, query by example, and semantic queries, so that the unstructured data in the management system can be queried in several ways;
The business abstraction module 8 abstracts the business's unstructured data and formulates data standards that meet business requirements;
The manual processing module 9 is a user terminal at which staff manually process and adjust the data according to the unstructured data processing requirements so that it conforms to the data standards that meet business requirements;
The tag creation module, tag marking module, and tag storage module together implement "one-stop" management of the creation, conversion, and storage of data tags that describe business attributes.
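One way to picture the three tag sub-modules working together is a single in-memory store that creates tag definitions, marks records with them, and keeps the assignments. The class `TagStore` and its methods are hypothetical; the patent names the sub-modules but does not define their interfaces or a persistence layer.

```python
class TagStore:
    """In-memory stand-in for the tag creation, marking, and storage modules."""

    def __init__(self):
        self._definitions = {}   # tag name -> description of the business attribute
        self._assignments = {}   # record id -> set of tag names

    def create_tag(self, name, description):
        """Tag creation: register a new tag that describes a business attribute."""
        if name in self._definitions:
            raise ValueError(f"tag already exists: {name}")
        self._definitions[name] = description

    def mark(self, record_id, name):
        """Tag marking: attach an existing tag to a record."""
        if name not in self._definitions:
            raise KeyError(f"unknown tag: {name}")
        self._assignments.setdefault(record_id, set()).add(name)

    def tags_of(self, record_id):
        """Tag storage lookup: the tags attached to one record, sorted by name."""
        return sorted(self._assignments.get(record_id, set()))

    def records_with(self, name):
        """Tag storage lookup: the ids of all records carrying a given tag."""
        return sorted(rid for rid, tags in self._assignments.items() if name in tags)
```

Rejecting marks that use an undefined tag keeps creation and marking as separate steps, which is one plausible reading of the module split.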
The storage management module 3 supports basic data types such as integer, floating point, Boolean, string, and date; the conversion loading module 4 automatically processes text data in common formats according to the extracted features; and the query processing module 7 supports querying unstructured data.
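The storage model described for the storage management module (raw data, basic attributes, low-level features, semantic features, with typed basic attributes) might look like the following sketch. `StoredObject`, `StorageInstance`, and the mapping of the listed data types to Python's `int`, `float`, `bool`, `str`, and `date` are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import date

# Basic attribute types named in the text: integer, float, Boolean, string, date.
ALLOWED_TYPES = (int, float, bool, str, date)


@dataclass
class StoredObject:
    raw: bytes                                           # original content
    basic_attributes: dict = field(default_factory=dict)
    low_level_features: dict = field(default_factory=dict)
    semantic_features: dict = field(default_factory=dict)


class StorageInstance:
    """A storage instance supporting insert, modify, and delete."""

    def __init__(self):
        self._objects = {}
        self._next_id = 1

    @staticmethod
    def _check(attributes):
        # Reject attribute values outside the supported basic types.
        for name, value in attributes.items():
            if not isinstance(value, ALLOWED_TYPES):
                raise TypeError(f"unsupported type for attribute {name!r}")

    def insert(self, obj):
        self._check(obj.basic_attributes)
        object_id = self._next_id
        self._next_id += 1
        self._objects[object_id] = obj
        return object_id

    def modify(self, object_id, **attributes):
        self._check(attributes)
        self._objects[object_id].basic_attributes.update(attributes)

    def delete(self, object_id):
        del self._objects[object_id]

    def get(self, object_id):
        return self._objects[object_id]
```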
Embodiment 2
An unstructured data tagging management method, as shown in Figs. 1-3, specifically includes the following steps:
Step 1: the tag management platform 1 is operated. When unstructured data is transmitted to the tag management platform 1, the storage management module 3 builds a storage model from the raw data, basic attributes, low-level features, and semantic features of the unstructured data, so that the unstructured data is converted and stored in the tag management platform 1 in a form that the functional modules inside the tag management platform 1 can process;
Step 2: the unstructured data entering the tag management platform 1 is also processed by the business abstraction module 8, which abstracts the business's unstructured data and formulates data standards that meet business requirements;
Step 3: the feature extraction module 2 extracts the specific information and features of the text, image, audio, and video data in the unstructured data, and the conversion loading module 4 performs preliminary automated processing on the features extracted from the text data;
Step 4: the text, image, audio, and video information in the unstructured data that could not be processed automatically is classified, and the classified data is processed by the data tag module 5, which manages the creation, conversion, and storage of the data tags that describe business attributes; the managed data is then transmitted to the user terminal;
Step 5: at the user terminal, staff manually process and adjust the data according to the unstructured data processing requirements so that it conforms to the data standards that meet business requirements. By automated and manual means, the downloaded unstructured data is converted into standard data that can describe business attributes, finally producing standardized data as output.
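The hand-off between automated and manual processing in this embodiment can be sketched as a check against the data standard: records that conform become standardized output, and the rest are queued for the user terminal. The `DATA_STANDARD` fields and the function names are hypothetical, since the patent does not say how a data standard is represented or enforced.

```python
from datetime import date

# Hypothetical data standard: field name -> required Python type.
DATA_STANDARD = {"title": str, "page_count": int, "received": date}


def violations(record, standard=DATA_STANDARD):
    """List the fields that are missing or have the wrong type."""
    problems = []
    for name, expected in standard.items():
        if name not in record:
            problems.append(f"missing: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type: {name}")
    return problems


def route(records, standard=DATA_STANDARD):
    """Split records into standardized output and a manual-review queue."""
    accepted, manual_queue = [], []
    for record in records:
        problems = violations(record, standard)
        if problems:
            # Keep the reasons so the operator knows what to adjust.
            manual_queue.append((record, problems))
        else:
            accepted.append(record)
    return accepted, manual_queue
```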
Finally, it should be noted that the above is only a preferred embodiment of the invention and is not intended to limit the invention. Although the invention has been described in detail with reference to the foregoing embodiments, a person skilled in the art can still modify the technical solutions described in those embodiments or replace some of their technical features with equivalents. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its protection scope.

Claims (5)

1. An unstructured data tagging management system, including a tag management platform (1), characterized in that: the tag management platform (1) includes a feature extraction module (2), a storage management module (3), a conversion loading module (4), a data tag module (5), an access interface module (6), and a query processing module (7), and a business abstraction module (8) and a manual processing module (9) are connected to the tag management platform (1);
The data tag module (5) includes a tag creation module, a tag marking module, and a tag storage module;
The feature extraction module (2) includes a text extraction module;
The text extraction module extracts stop words, TF-IDF features, and keywords from text;
The storage management module (3) provides storage modeling functions and can insert, modify, and delete unstructured data;
The conversion loading module (4) automatically processes text, image, audio, and video data in common formats according to the extracted features;
The access interface module (6) handles the interface requirements of query-language, application, and Web service access;
The query processing module (7) provides result ranking and batch return functions and supports range queries, full-text queries, query by example, and semantic queries, so that the unstructured data in the management system can be queried in several ways;
The business abstraction module (8) abstracts the business's unstructured data and formulates data standards that meet business requirements;
The manual processing module (9) is a user terminal at which staff manually process and adjust the data according to the unstructured data processing requirements so that it conforms to the data standards that meet business requirements;
The tag creation module, tag conversion module, and tag storage module together implement "one-stop" management of the creation, conversion, and storage of data tags that describe business attributes.
2. The unstructured data tagging management system according to claim 1, characterized in that: the storage management module (3) supports basic data types such as integer, floating point, Boolean, string, and date.
3. The unstructured data tagging management system according to claim 1, characterized in that: the conversion loading module (4) performs preliminary automatic structuring of unstructured data.
4. The unstructured data tagging management system according to claim 1, characterized in that: the query processing module (7) supports querying unstructured data.
5. An unstructured data tagging management method according to any one of claims 1-4, characterized by specifically including the following steps:
Step 1: the tag management platform (1) is operated. When unstructured data is transmitted to the tag management platform (1), the storage management module (3) builds a storage model from the raw data, basic attributes, low-level features, and semantic features of the unstructured data, so that the unstructured data is converted and stored in the tag management platform (1) in a form that the functional modules inside the tag management platform (1) can process;
Step 2: the unstructured data entering the tag management platform (1) is also processed by the business abstraction module (8), which abstracts the business's unstructured data and formulates data standards that meet business requirements;
Step 3: the feature extraction module (2) extracts the specific information and features of the text, image, audio, and video data in the unstructured data, and the conversion loading module (4) performs preliminary automated processing on the features extracted from the text data;
Step 4: at the user terminal, staff manually process and adjust the data according to the unstructured data processing requirements so that it conforms to the data standards that meet business requirements. By manual means, the downloaded unstructured data is converted into standard data that can describe business attributes, finally producing standardized data as output;
Step 5: the text, image, audio, and video information in the converted and loaded unstructured data is classified, and the classified data is processed by the data tag module (5), which manages the creation, conversion, and storage of the data tags that describe business attributes; the managed data is then transmitted to the user terminal.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811208798.3A CN109408688A (en) 2018-10-17 2018-10-17 A kind of unstructured data marking management method and system

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399988A (en) * 2019-07-31 2019-11-01 中国工商银行股份有限公司 Equipment portrait generation method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060248129A1 (en) * 2005-04-29 2006-11-02 Wonderworks Llc Method and device for managing unstructured data
US20080235289A1 (en) * 2005-04-29 2008-09-25 Wonderworks Llc Method and device for managing unstructured data
CN102591896A (en) * 2011-01-05 2012-07-18 北京大用科技有限责任公司 System, implementation, application, and query language for a tetrahedral data model for unstructured data
CN104217003A (en) * 2014-09-15 2014-12-17 国家电网公司 Data modeling system
CN106202292A (en) * 2016-06-30 2016-12-07 中国电力科学研究院 A kind of standard information based on structural data model analyzes method
CN108021632A (en) * 2017-11-23 2018-05-11 中国移动通信集团河南有限公司 Unstructured data and the mutual conversion process method of structural data



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190301