CN109408688A - Unstructured data labeling management method and system - Google Patents
- Publication number
- CN109408688A (application CN201811208798.3A)
- Authority
- CN
- China
- Prior art keywords
- module
- data
- label
- unstructured data
- unstructured
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an unstructured data labeling management method and system comprising a tag management platform. The tag management platform includes a feature extraction module, a storage management module, a conversion loading module, a data label module, an access interface module and a query processing module, and a business abstraction module and a manual processing module are connected to the platform. The data label module includes a tag creation module, a label marking module and a label storage module. The feature extraction module includes a text extraction module, an image extraction module, an audio extraction module and a video extraction module; the text extraction module extracts stop words, TF-IDF features and keywords from text. By building the tag management platform and using the data label module, the invention provides "one-stop" management of the creation, conversion and storage of data labels that describe business attributes, which increases the usable value of big data and improves the level of intelligent data analysis for online business.
Description
Technical field
The present invention relates to the field of data management, and in particular to an unstructured data labeling management method and system.
Background art
Most of the information collected by web crawlers is unformatted data that follows no formatting or standardization requirements. Such data is incomplete in many respects for online business analysis and directly affects the analysis results.
Therefore, an unstructured data labeling management method and system is needed to solve the above problems.
Summary of the invention
The purpose of the present invention is to provide an unstructured data labeling management method and system. By building a tag management platform and using a data label module, it provides "one-stop" management of the creation, conversion and storage of data labels that describe business attributes, increases the usable value of big data and improves the level of intelligent data analysis for online business, thereby solving the problems described in the background art above.
To achieve the above purpose, the invention provides the following technical solution: an unstructured data labeling management method and system comprising a tag management platform. The tag management platform includes a feature extraction module, a storage management module, a conversion loading module, a data label module, an access interface module and a query processing module; a business abstraction module and a manual processing module are connected to the tag management platform;
The data label module includes a tag creation module, a label marking module and a label storage module;
The feature extraction module includes a text extraction module;
The text extraction module extracts stop words, TF-IDF features and keywords from text;
The storage management module provides a storage modeling function and can insert, modify and delete unstructured data;
The conversion loading module automatically processes text, image, audio and video data in common formats according to the extracted features;
The access interface module handles the interface requirements of query languages, application programs and Web service access;
The query processing module provides result ranking and batch return, and supports range queries, full-text queries, query by example and semantic queries, so that the unstructured data in the management system can be queried in several ways;
The business abstraction module abstracts the unstructured business data and formulates a data standard that meets business needs;
The manual processing module is a user terminal at which staff manually process and adjust the data according to the processing requirements for the unstructured data, so that the data conforms to the data standard that meets business needs;
The tag creation module, label conversion module and label storage module together provide "one-stop" management of the creation, conversion and storage of data labels that describe business attributes.
Preferably, the storage management module supports basic data types such as integer, floating point, Boolean, string and date.
Preferably, the conversion loading module performs preliminary automatic structuring of the unstructured data.
Preferably, the query processing module supports querying unstructured data.
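The query functions named above (result ranking, batch return, range queries and full-text queries) can be illustrated with a minimal Python sketch. The patent does not disclose an implementation; the `Record` type, the function names and the term-count ranking rule are all assumptions made here for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterator, List


@dataclass
class Record:
    """Hypothetical stored item: an id, a creation date and its extracted text."""
    record_id: int
    created: date
    text: str


def full_text_query(records: List[Record], terms: List[str]) -> List[Record]:
    """Return records containing any query term, ranked by total term occurrences."""
    scored = []
    for rec in records:
        words = rec.text.lower().split()
        score = sum(words.count(term.lower()) for term in terms)
        if score > 0:
            scored.append((score, rec))
    # Highest score first; ties are broken by record id so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1].record_id))
    return [rec for _, rec in scored]


def range_query(records: List[Record], start: date, end: date) -> List[Record]:
    """Return records whose creation date lies in the closed interval [start, end]."""
    return [rec for rec in records if start <= rec.created <= end]


def in_batches(results: list, size: int) -> Iterator[list]:
    """Yield the results in consecutive batches of at most `size` items."""
    for i in range(0, len(results), size):
        yield results[i:i + size]
```

Query by example and semantic queries are left out of the sketch because they depend on feature representations that the text does not specify.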
The invention also discloses an unstructured data labeling management method, which specifically includes the following steps:
Step 1: the tag management platform is operated. When unstructured data is transmitted to the tag management platform, the storage management module builds a storage model from the raw data, basic attributes, low-level features and semantic features of the unstructured data, so that the data is converted and stored in the platform in a form that the platform's internal functional modules can process;
Step 2: the unstructured data entering the tag management platform is also processed by the business abstraction module, which abstracts the unstructured business data and formulates a data standard that meets business needs;
Step 3: the feature extraction module extracts the specific information and features of the text, image, audio and video data in the unstructured data, and the conversion loading module performs preliminary automatic processing based on the features extracted from the text data;
Step 4: at the user terminal, staff manually process and adjust the data according to the processing requirements for the unstructured data so that it conforms to the data standard that meets business needs; by manual means, the downloaded unstructured data is converted into standard data that can describe business attributes, and standardized data is finally output;
Step 5: the converted and loaded unstructured data is classified according to its text, image, audio and video information, and the classified data is processed by the data label module, which creates, converts and stores data labels for the data that describes business attributes; the managed data is then transmitted to the user terminal.
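The label handling in the last step (creating labels, attaching them to classified data and storing them) might look like the following Python sketch. It is an in-memory stand-in written under assumptions: the class `LabelStore`, its method names and the use of integer item ids are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class LabelStore:
    """Hypothetical in-memory stand-in for label creation, marking and storage."""
    labels: Dict[str, str] = field(default_factory=dict)            # label name -> description
    assignments: Dict[int, Set[str]] = field(default_factory=dict)  # item id -> label names

    def create_label(self, name: str, description: str) -> None:
        """Register a new label that describes a business attribute."""
        if name in self.labels:
            raise ValueError(f"label already exists: {name}")
        self.labels[name] = description

    def mark(self, item_id: int, name: str) -> None:
        """Attach an existing label to a data item."""
        if name not in self.labels:
            raise KeyError(name)
        self.assignments.setdefault(item_id, set()).add(name)

    def items_with(self, name: str) -> List[int]:
        """Return the ids of all items carrying the given label, in ascending order."""
        return sorted(i for i, names in self.assignments.items() if name in names)
```

Requiring a label to be created before it can be attached keeps the label vocabulary controlled, which is one plausible reading of managing labels against a business data standard.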
Technical effects and advantages of the invention:
1. By building the tag management platform and using the data label module, the invention provides "one-stop" management of the creation, conversion and storage of data labels that describe business attributes, which increases the usable value of big data and improves the level of intelligent data analysis for online business;
2. By abstracting the business and formulating a data standard that meets business needs, and by converting the downloaded unstructured data into standard data that can describe business attributes through automated and manual means, the invention greatly improves the operating efficiency of the unstructured data management system.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall structure of the invention.
Fig. 2 is a schematic structural diagram of the feature extraction module of the invention.
Fig. 3 is a schematic structural diagram of the data label module of the invention.
In the figures: 1 tag management platform, 2 feature extraction module, 3 storage management module, 4 conversion loading module, 5 data label module, 6 access interface module, 7 query processing module, 8 business abstraction module, 9 manual processing module.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative effort fall within the protection scope of the invention.
An unstructured data labeling management system, as shown in Figs. 1-3, comprises a tag management platform 1. The tag management platform 1 includes a feature extraction module 2, a storage management module 3, a conversion loading module 4, a data label module 5, an access interface module 6 and a query processing module 7; a business abstraction module 8 and a manual processing module 9 are connected to the tag management platform 1;
The data label module 5 includes a tag creation module, a label marking module and a label storage module;
The feature extraction module 2 includes a text extraction module;
The text extraction module extracts stop words, TF-IDF features and keywords from text;
The storage management module 3 provides a conceptual-level storage modeling function covering raw data, basic attributes, low-level features and semantic features, and a logical-level storage modeling function; unstructured data can be inserted, modified and deleted in the storage instances built from these models;
The conversion loading module 4 automatically processes text, image, audio and video data in common formats according to the extracted features;
The access interface module 6 handles the interface requirements of query languages, application programs and Web service access;
The query processing module 7 provides result ranking and batch return, and supports range queries, full-text queries, query by example and semantic queries, so that the unstructured data in the management system can be queried in several ways;
The business abstraction module 8 abstracts the unstructured business data and formulates a data standard that meets business needs;
The manual processing module 9 is a user terminal at which staff manually process and adjust the data according to the processing requirements for the unstructured data, so that the data conforms to the data standard that meets business needs;
The tag creation module, label marking module and label storage module together provide "one-stop" management of the creation, conversion and storage of data labels that describe business attributes.
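The text features mentioned in the module list above (stop words, TF-IDF features and keywords) can be sketched in plain Python. This is a minimal illustration under assumptions, not the patented method: the stop-word list is a made-up sample, the sketch treats stop words as words to filter out before scoring, and it uses the common weighting tf(t, d) * log(N / df(t)), which the patent does not specify.

```python
import math
from collections import Counter
from typing import Dict, List

# Illustrative stop-word list; a real system would use a language-specific list.
STOP_WORDS = {"the", "a", "of", "and", "is", "to"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    """Compute a TF-IDF score for every remaining term of every document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def keywords(score: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(score.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]
```

With this weighting, a term that occurs in every document scores zero, so the keywords of a document are the terms that distinguish it from the rest of the collection.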
The storage management module 3 supports basic data types such as integer, floating point, Boolean, string and date; the conversion loading module 4 automatically processes text data in common formats according to the extracted features; the query processing module 7 supports querying unstructured data.
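The storage model described above has four parts (raw data, basic attributes, low-level features and semantic features), supports insert, modify and delete operations, and restricts attributes to basic data types. The following Python sketch shows one way this could be arranged. The names `StoredObject` and `Storage`, the dictionary layout and the integer ids are assumptions for illustration; the patent gives no schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Union

# The basic data types named in the text: integer, floating point, Boolean, string, date.
BasicValue = Union[int, float, bool, str, date]


@dataclass
class StoredObject:
    """Hypothetical storage model with the four parts named in the text."""
    raw_data: bytes
    basic_attributes: Dict[str, BasicValue] = field(default_factory=dict)
    low_level_features: Dict[str, float] = field(default_factory=dict)
    semantic_features: Dict[str, str] = field(default_factory=dict)


class Storage:
    """In-memory store supporting insert, modify and delete."""

    def __init__(self) -> None:
        self._objects: Dict[int, StoredObject] = {}
        self._next_id = 1

    def insert(self, obj: StoredObject) -> int:
        """Store an object and return its newly assigned id."""
        object_id = self._next_id
        self._objects[object_id] = obj
        self._next_id += 1
        return object_id

    def modify(self, object_id: int, **attributes: BasicValue) -> None:
        """Update basic attributes, accepting only the basic data types."""
        for name, value in attributes.items():
            if not isinstance(value, (int, float, bool, str, date)):
                raise TypeError(f"unsupported type for attribute {name!r}")
        self._objects[object_id].basic_attributes.update(attributes)

    def delete(self, object_id: int) -> None:
        """Remove an object; raises KeyError if the id is unknown."""
        del self._objects[object_id]

    def get(self, object_id: int) -> StoredObject:
        """Return the stored object; raises KeyError if the id is unknown."""
        return self._objects[object_id]
```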
Embodiment 2
An unstructured data labeling management method, as shown in Figs. 1-3, specifically includes the following steps:
Step 1: the tag management platform 1 is operated. When unstructured data is transmitted to the tag management platform 1, the storage management module 3 builds a storage model from the raw data, basic attributes, low-level features and semantic features of the unstructured data, so that the data is converted and stored in the tag management platform 1 in a form that the platform's internal functional modules can process;
Step 2: the unstructured data entering the tag management platform 1 is also processed by the business abstraction module 8, which abstracts the unstructured business data and formulates a data standard that meets business needs;
Step 3: the feature extraction module 2 extracts the specific information and features of the text, image, audio and video data in the unstructured data, and the conversion loading module 4 performs preliminary automatic processing based on the features extracted from the text data;
Step 4: the unstructured data that could not be processed automatically is classified according to its text, image, audio and video information, and the classified data is processed by the data label module 5, which creates, converts and stores data labels for the data that describes business attributes; the managed data is then transmitted to the user terminal;
Step 5: at the user terminal, staff manually process and adjust the data according to the processing requirements for the unstructured data so that it conforms to the data standard that meets business needs; by automated and manual means, the downloaded unstructured data is converted into standard data that can describe business attributes, and standardized data is finally output.
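The flow of the steps above, in which automatic conversion is tried first and items that fail it are routed to manual handling, can be summarized in a short Python sketch. Everything here is a simplifying assumption: the data standard is reduced to a list of required field names, and `run_pipeline`, `parse_key_values` and the callback signatures are hypothetical names, not part of the patent.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


def parse_key_values(text: str) -> Optional[Record]:
    """Hypothetical automatic converter: parse 'key=value;key=value' text."""
    record: Record = {}
    for part in text.split(";"):
        if "=" not in part:
            return None  # not in the expected form, so automatic processing fails
        key, value = part.split("=", 1)
        record[key.strip()] = value.strip()
    return record


def run_pipeline(
    raw_items: List[str],
    required_fields: List[str],
    auto_convert: Callable[[str], Optional[Record]],
    manual_fix: Callable[[str], Record],
) -> List[Record]:
    """Convert raw items into records that satisfy the data standard."""
    standardized: List[Record] = []
    needs_review: List[str] = []
    for item in raw_items:
        record = auto_convert(item)  # automatic processing
        if record is not None and all(name in record for name in required_fields):
            standardized.append(record)
        else:
            needs_review.append(item)  # failed automatic processing
    for item in needs_review:
        standardized.append(manual_fix(item))  # manual processing at the user terminal
    return standardized
```

In a usage such as `run_pipeline(items, ["id", "name"], parse_key_values, ask_operator)`, the `ask_operator` callback would stand for whatever interface the user terminal provides; it is likewise hypothetical.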
Finally, it should be noted that the above is only a preferred embodiment of the invention and is not intended to limit the invention. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or replace some of their technical features with equivalents. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the protection scope of the invention.
Claims (5)
1. An unstructured data labeling management system comprising a tag management platform (1), characterized in that: the tag management platform (1) includes a feature extraction module (2), a storage management module (3), a conversion loading module (4), a data label module (5), an access interface module (6) and a query processing module (7), and a business abstraction module (8) and a manual processing module (9) are connected to the tag management platform (1);
The data label module (5) includes a tag creation module, a label marking module and a label storage module;
The feature extraction module (2) includes a text extraction module;
The text extraction module extracts stop words, TF-IDF features and keywords from text;
The storage management module (3) provides a storage modeling function and can insert, modify and delete unstructured data;
The conversion loading module (4) automatically processes text, image, audio and video data in common formats according to the extracted features;
The access interface module (6) handles the interface requirements of query languages, application programs and Web service access;
The query processing module (7) provides result ranking and batch return, and supports range queries, full-text queries, query by example and semantic queries, so that the unstructured data in the management system can be queried in several ways;
The business abstraction module (8) abstracts the unstructured business data and formulates a data standard that meets business needs;
The manual processing module (9) is a user terminal at which staff manually process and adjust the data according to the processing requirements for the unstructured data, so that the data conforms to the data standard that meets business needs;
The tag creation module, label conversion module and label storage module together provide "one-stop" management of the creation, conversion and storage of data labels that describe business attributes.
2. The unstructured data labeling management system according to claim 1, characterized in that: the storage management module (3) supports basic data types such as integer, floating point, Boolean, string and date.
3. The unstructured data labeling management system according to claim 1, characterized in that: the conversion loading module (4) performs preliminary automatic structuring of the unstructured data.
4. The unstructured data labeling management system according to claim 1, characterized in that: the query processing module (7) supports querying unstructured data.
5. An unstructured data labeling management method according to any one of claims 1-4, characterized in that it specifically includes the following steps:
Step 1: the tag management platform (1) is operated. When unstructured data is transmitted to the tag management platform (1), the storage management module (3) builds a storage model from the raw data, basic attributes, low-level features and semantic features of the unstructured data, so that the data is converted and stored in the tag management platform (1) in a form that the platform's internal functional modules can process;
Step 2: the unstructured data entering the tag management platform (1) is also processed by the business abstraction module (8), which abstracts the unstructured business data and formulates a data standard that meets business needs;
Step 3: the feature extraction module (2) extracts the specific information and features of the text, image, audio and video data in the unstructured data, and the conversion loading module (4) performs preliminary automatic processing based on the features extracted from the text data;
Step 4: at the user terminal, staff manually process and adjust the data according to the processing requirements for the unstructured data so that it conforms to the data standard that meets business needs; by manual means, the downloaded unstructured data is converted into standard data that can describe business attributes, and standardized data is finally output;
Step 5: the converted and loaded unstructured data is classified according to its text, image, audio and video information, and the classified data is processed by the data label module (5), which creates, converts and stores data labels for the data that describes business attributes; the managed data is then transmitted to the user terminal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811208798.3A CN109408688A (en) | 2018-10-17 | 2018-10-17 | A kind of unstructured data marking management method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109408688A (en) | 2019-03-01 |
Family
ID=65468294
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811208798.3A Pending CN109408688A (en) | 2018-10-17 | 2018-10-17 | A kind of unstructured data marking management method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109408688A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110399988A (en) * | 2019-07-31 | 2019-11-01 | 中国工商银行股份有限公司 | Equipment portrait generation method and system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060248129A1 (en) * | 2005-04-29 | 2006-11-02 | Wonderworks Llc | Method and device for managing unstructured data |
US20080235289A1 (en) * | 2005-04-29 | 2008-09-25 | Wonderworks Llc | Method and device for managing unstructured data |
CN102591896A (en) * | 2011-01-05 | 2012-07-18 | 北京大用科技有限责任公司 | System, implementation, application, and query language for a tetrahedral data model for unstructured data |
CN104217003A (en) * | 2014-09-15 | 2014-12-17 | 国家电网公司 | Data modeling system |
CN106202292A (en) * | 2016-06-30 | 2016-12-07 | 中国电力科学研究院 | A kind of standard information based on structural data model analyzes method |
CN108021632A (en) * | 2017-11-23 | 2018-05-11 | 中国移动通信集团河南有限公司 | Unstructured data and the mutual conversion process method of structural data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101937430B (en) | Method for extracting event sentence pattern from Chinese sentence | |
CN103631882B (en) | Semantization service generation system and method based on graph mining technique | |
CN104679867B (en) | Address method of knowledge processing and device based on figure | |
CN104361018B (en) | Electronic archives information reorganization method and device | |
CN106202292B (en) | Standard information analysis method based on structured data model | |
CN105095320A (en) | System for identifying, correlating, searching and displaying documents based on relationship superposition and combination | |
CN102184217A (en) | Emergency plan generating system and method | |
CN113298435A (en) | Intelligent construction scheme compiling method and system for building industry | |
CN110334214A (en) | A kind of method of false lawsuit in automatic identification case | |
CN109033523A (en) | A kind of Assembly process specification generation System and method for based on three-dimensional CAD model | |
CN112905685A (en) | Framework management and control system and equipment for informatization construction | |
CN112084248A (en) | Intelligent data retrieval, lookup and model acquisition method based on graph database | |
CN112182241A (en) | Automatic construction method of knowledge graph in field of air traffic control | |
CN109408688A (en) | A kind of unstructured data marking management method and system | |
CN102722368A (en) | Plug-in software designing method based on document tree and message pump | |
CN113569543B (en) | Implementation method of nuclear power engineering automatic report generation technology | |
CN110210025A (en) | A kind of conversion method based on Text Feature Extraction | |
CN105468792B (en) | A kind of fuzzy query method and system based on big data | |
CN101866370B (en) | Processing method of subgrade engineering cross section designing template | |
CN109271479A (en) | A kind of resume structuring processing method | |
CN115374765B (en) | Computing power network 5G data analysis system and method based on natural language processing | |
CN116090101B (en) | CATIA-based symmetrical part rapid modeling method | |
CN116861337A (en) | Electric power engineering label draws and discernment platform based on fuse LSTM | |
CN116152451A (en) | Multidimensional parameterized city information model construction method, system and computer equipment | |
CN108205564B (en) | Knowledge system construction method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190301 |