CN110442671A - Method and system for unstructured data processing - Google Patents

Method and system for unstructured data processing

Info

Publication number
CN110442671A
Authority
CN
China
Prior art keywords
data
keyword
unstructured data
unstructured
normalization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910709505.8A
Other languages
Chinese (zh)
Inventor
陈万林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Baisheng Industrial Electronic Commerce Platform Development Co Ltd
Original Assignee
Shenzhen Baisheng Industrial Electronic Commerce Platform Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Baisheng Industrial Electronic Commerce Platform Development Co Ltd
Priority to CN201910709505.8A
Publication of CN110442671A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/35 Clustering; Classification

Abstract

The present invention provides a method and system for unstructured data processing. The method comprises: obtaining unstructured data; extracting keywords from the unstructured data according to preset parsing rules; judging whether each keyword already exists in a keyword library and, if it does not, adding it to the keyword library; normalizing the unstructured data to obtain normalized data in a uniform format, and storing the normalized data in a normalized value library corresponding to the keyword library; and converting the normalized data into the format the user requires and outputting the converted data. By continually supplementing and improving the keyword library in the database, the invention improves the flexibility of unstructured data analysis and processing; by normalizing the unstructured data, it improves the query efficiency and utilization efficiency of unstructured data.

Description

Method and system for unstructured data processing
Technical field
The present invention relates to the field of big data processing, and in particular to a method and system for unstructured data processing.
Background
Big data is generally characterized by large volume, high dispersion, heavy noise, complex types, and multiple data sources, so preprocessing is essential. If preprocessing is flawed, the later utilization efficiency and value of the data suffer directly; if data is not structured, it cannot be stored and analyzed effectively, and its availability and utilization cannot be fully improved.
Specifically, big data falls into three types: structured data, semi-structured data, and unstructured data. Unstructured data now makes up the major part of big data, and its volume will keep growing as big data technology develops rapidly. The data generated across industries today is varied and mostly disorganized, and such data is unstructured. Because unstructured data does not necessarily follow a standard data structure (such as the rows and columns defined by a schema), computer programs cannot easily understand and use it directly.
At present, methods for analyzing and processing unstructured data usually rely on keywords predefined in the database according to requirements. Predefined keywords are inflexible and cannot adapt to the variety of unstructured data now encountered, so they limit the analysis and processing of unstructured data.
Summary of the invention
The object of the present invention is to solve the problem that existing methods for analyzing and processing unstructured data are inflexible, and thereby to improve the query efficiency and utilization efficiency of unstructured data.
To this end, according to a first aspect, an embodiment of the present invention discloses a method for unstructured data processing, comprising:
Obtaining unstructured data, where the unstructured data is obtained by manual input or by electronic device acquisition;
Obtaining keywords of the unstructured data from the unstructured data, which includes extracting the keywords from the unstructured data according to preset parsing rules;
Judging whether a keyword already exists in the keyword library and, if it does not, adding the keyword to the keyword library;
Normalizing the unstructured data to obtain normalized data in a uniform format, and storing the normalized data in a normalized value library corresponding to the keyword library;
Converting the normalized data into the format required by the user to obtain format-converted data, and outputting the format-converted data.
Optionally, after obtaining the unstructured data, the method further comprises:
Obtaining status information and environmental information related to the unstructured data, where the status information and environmental information correspond to the keywords;
Storing the status information and environmental information in an environmental state information library corresponding to the keyword library.
Optionally, normalizing the unstructured data to obtain normalized data in a uniform format specifically comprises:
Splitting and filtering the unstructured data to obtain independent data fields;
Classifying the independent data fields to obtain classified data fields;
Converting the format of the classified data fields to obtain normalized data in a uniform format.
Optionally, classifying the independent data fields to obtain classified data fields specifically comprises:
Obtaining the keywords of the independent data fields;
Performing deduplication, association, and enhancement on the keywords of the independent data fields;
Matching the keywords of the independent data fields one by one against the keywords in the keyword library to obtain the classified data fields.
According to a second aspect, an embodiment of the present invention provides a system for unstructured data processing, comprising:
A data acquisition module, configured to obtain unstructured data, where the unstructured data is obtained by manual input or by electronic device acquisition;
A keyword acquisition module, configured to obtain keywords of the unstructured data from the unstructured data, which includes extracting the keywords from the unstructured data according to preset parsing rules;
A keyword judgment module, configured to judge whether a keyword already exists in the keyword library and, if it does not, to add the keyword to the keyword library;
A data normalization module, configured to normalize the unstructured data to obtain normalized data in a uniform format, and to store the normalized data in a normalized value library corresponding to the keyword library;
A data conversion and output module, configured to convert the normalized data into the format required by the user to obtain format-converted data, and to output the format-converted data.
Optionally, the system further comprises:
A related information acquisition module, configured to obtain status information and environmental information related to the unstructured data, where the status information and environmental information correspond to the keywords;
A related information storage module, configured to store the status information and environmental information in an environmental state information library corresponding to the keyword library.
Optionally, the data normalization module comprises:
A splitting and filtering unit, configured to split and filter the unstructured data to obtain independent data fields;
A data classification unit, configured to classify the independent data fields to obtain classified data fields;
A format conversion unit, configured to convert the format of the classified data fields to obtain normalized data in a uniform format.
Optionally, the data classification unit comprises:
A first operation subunit, configured to obtain the keywords of the independent data fields;
A second operation subunit, configured to perform deduplication, association, and enhancement on the keywords of the independent data fields;
A third operation subunit, configured to match the keywords of the independent data fields one by one against the keywords in the keyword library to obtain the classified data fields.
According to a third aspect, the present invention provides a computer terminal comprising a processor, the processor being configured to execute a computer program stored in a memory to implement the method of any implementation of the first aspect.
According to a fourth aspect, the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program being used to implement the method of any implementation of the first aspect.
The beneficial effects of the present invention are as follows:
In the method and system for unstructured data processing of the present invention, the method comprises the following steps: obtaining unstructured data, where the unstructured data is obtained by manual input or by electronic device acquisition; obtaining keywords of the unstructured data, which includes extracting the keywords from the unstructured data according to preset parsing rules; judging whether a keyword already exists in the keyword library and, if it does not, adding the keyword to the keyword library; normalizing the unstructured data to obtain normalized data in a uniform format, and storing the normalized data in a normalized value library corresponding to the keyword library; and converting the normalized data into the format required by the user and outputting the format-converted data. The technical solution first obtains the keywords of the unstructured data and then judges whether each keyword already exists in the database; if it does not, the keyword is added to the database. The keyword library in the database is thus continually supplemented and improved, which makes the analysis and processing of unstructured data more flexible and able to adapt to the variety of unstructured data now encountered. In addition, normalizing the unstructured data helps to structure, format, and standardize it, which improves the query efficiency and utilization efficiency of unstructured data.
Brief description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a structural block diagram of the unstructured database in an embodiment of the present invention;
Fig. 2 is a flowchart of a method for unstructured data processing in an embodiment of the present invention;
Fig. 3 is a first structural schematic diagram of a system for unstructured data processing in an embodiment of the present invention;
Fig. 4 is a second structural schematic diagram of a system for unstructured data processing in an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of the data normalization module in an embodiment of the present invention;
Fig. 6 is a structural schematic diagram of the data classification unit in an embodiment of the present invention.
Detailed description
The technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Unstructured data is data that is inconvenient to represent in two-dimensional database tables. It is information generated by computers or by people, and it does not necessarily follow a standard data structure (such as the rows and columns defined by a schema), so computer programs cannot easily understand and use it directly. Unstructured data includes text in all formats, pictures, reports of all kinds, images, and audio and video information, and it may be obtained from any industry in a variety of ways.
In an embodiment of the present invention, referring to Fig. 1, which is a structural block diagram of the unstructured database, the unstructured database includes a keyword library 10, a normalized value library 11, and an environmental state information library 12, and the three libraries correspond to one another one-to-one. The keyword library 10 stores the keywords of the unstructured data; the normalized value library 11 stores the normalized data obtained by normalizing the unstructured data; the environmental state information library 12 stores the status information and environmental information related to the unstructured data at the time it was generated.
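The three-library layout can be pictured with a small in-memory sketch. The patent does not specify a storage engine or a linking key, so the class name `UnstructuredDatabase`, the use of a shared keyword code as the link, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UnstructuredDatabase:
    """Three stores linked one-to-one by a shared keyword code (hypothetical layout)."""
    keyword_library: dict = field(default_factory=dict)    # keyword code -> keyword
    normalized_values: dict = field(default_factory=dict)  # keyword code -> normalized record
    environment_state: dict = field(default_factory=dict)  # keyword code -> status/environment info

    def store(self, code, keyword, record, context):
        # Writing all three stores under the same code keeps them in one-to-one correspondence.
        self.keyword_library[code] = keyword
        self.normalized_values[code] = record
        self.environment_state[code] = context

    def lookup(self, keyword):
        """Search only the keyword library, then follow the links to the other two stores."""
        return [
            (self.normalized_values[code], self.environment_state[code])
            for code, stored in self.keyword_library.items()
            if stored == keyword
        ]


db = UnstructuredDatabase()
# Sample values are invented for the sketch.
db.store("F20001", "fog spot", {"quantitative_value": 16000}, {"recorder": "Zhang San"})
```

A retrieval such as `db.lookup("fog spot")` touches only the keyword library before following the links, which is the behaviour the description attributes to keyword-based retrieval.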
The method for unstructured data processing provided by the present invention is explained in detail below. Referring to Fig. 2, which is a flowchart of the method in an embodiment of the present invention, the method comprises:
S101: obtain unstructured data, where the unstructured data is obtained by manual input or by electronic device acquisition.
In an embodiment of the present invention, the unstructured data must be obtained before its keywords can be extracted. The data may be in various formats and entered through human-computer interaction, or it may be collected in various formats by calling sensors or detection devices; the present invention places no limitation on this.
S102: obtain keywords of the unstructured data from the unstructured data, which includes extracting the keywords from the unstructured data according to preset parsing rules.
In an embodiment of the present invention, the preset parsing rules include parsing rules customized in advance by the user and parsing rules preconfigured by the system. A parsing rule may be a regular expression or any other form of rule capable of extracting key fields from unstructured data; the parsing rule defines the operations used to extract those key fields.
It should be noted that, to improve parsing efficiency, the unstructured data processing system may first parse the unstructured data with the parsing rules preconfigured by the system to obtain its keywords. Only if the preconfigured rules cannot complete the parsing does the system fall back to the parsing rules customized in advance by the user.
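The fallback order described here (system rules first, user rules only when the system rules fail) can be sketched as follows. The rule sets `SYSTEM_RULES` and `USER_RULES`, the function name `extract_keywords`, and the convention that each rule has exactly one capture group are assumptions for illustration; the patent only says that a rule may be a regular expression.

```python
import re

# Hypothetical rule sets: each rule is a regular expression with one capture group.
SYSTEM_RULES = [re.compile(r"\b(temperature|pressure|humidity)\b", re.IGNORECASE)]
USER_RULES = [re.compile(r"\b(fog spot)s?\b", re.IGNORECASE)]


def extract_keywords(text, system_rules=SYSTEM_RULES, user_rules=USER_RULES):
    """Try the system rules first; use the user rules only if nothing matched."""
    for rules in (system_rules, user_rules):
        found = []
        for rule in rules:
            for match in rule.finditer(text):
                keyword = match.group(1).lower()
                if keyword not in found:  # keep first occurrence, preserve order
                    found.append(keyword)
        if found:
            return found
    return []
```

In this sketch "cannot complete the parsing" is interpreted as "no rule matched"; a real system might use a stricter criterion.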
It should be noted that extracting the keywords of unstructured data and storing them in the keyword library helps the user manage and retrieve the unstructured data. When the user later calls up unstructured data whose keywords are stored in the keyword library, the system does not need to search all the unstructured data saved in the database; it only needs a simple retrieval by keyword and can then obtain the complete unstructured data through the one-to-one correspondence among the keyword library, the normalized value library, and the environmental state information library.
S103: judge whether a keyword already exists in the keyword library and, if it does not, add the keyword to the keyword library.
In an embodiment of the present invention, the unstructured data processing system compares each keyword of the unstructured data one by one against all keywords in the keyword library. If the keyword library already contains an identical keyword, the system does not add the keyword again; if it does not, the system adds the keyword to the keyword library. The keyword library in the database is thus continually supplemented and improved, which makes the analysis and processing of unstructured data more flexible and able to adapt to the variety of unstructured data now encountered.
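Step S103 amounts to an add-if-absent operation. The sketch below assumes the keyword library can be held as a Python set, so the one-by-one comparison is replaced by a hash lookup; the function name `add_if_missing` and the sample keywords are hypothetical.

```python
def add_if_missing(keyword_library, keyword):
    """Add the keyword only when the library does not already hold it.

    Returns True if the keyword was added, False if it was already present.
    """
    if keyword in keyword_library:
        return False
    keyword_library.add(keyword)
    return True


library = {"temperature", "pressure"}
added_first = add_if_missing(library, "fog spot")   # new keyword, so it is added
added_again = add_if_missing(library, "fog spot")   # already present, so it is skipped
```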
S104: normalize the unstructured data to obtain normalized data in a uniform format, and store the normalized data in the normalized value library corresponding to the keyword library.
In an embodiment of the present invention, the unstructured data is normalized according to its keywords and the keywords already in the keyword library to obtain normalized data in a uniform format, and the normalized data is stored in the normalized value library corresponding to the keyword library. The specific normalization process is given in the embodiments below.
It should be noted that normalizing unstructured data refers to the whole process of splitting and filtering the unstructured data to obtain independent data fields and then classifying and format-converting those fields. The normalized data obtained in this way is structured data, which can be stored in a database and is convenient for computers to operate on.
S105: convert the normalized data into the format required by the user to obtain format-converted data, and output the format-converted data.
In an embodiment of the present invention, when the user needs to view unstructured data again, the data can be retrieved by keyword and called up through the keyword search. It should be noted that, through format conversion, the data stored in the normalized value library can be converted into whatever data format the user requires, such as text, audio, or video format, before it is output.
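As a rough illustration of S105 for text output only, the sketch below converts one normalized record into JSON or CSV. The choice of these two formats, the function name `convert`, and the sample record are assumptions; the patent names text, audio, and video formats without describing the conversion itself.

```python
import csv
import io
import json


def convert(record, fmt):
    """Serialize one normalized record into the text format the user asks for."""
    if fmt == "json":
        return json.dumps(record, ensure_ascii=False, sort_keys=True)
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(record), lineterminator="\n")
        writer.writeheader()
        writer.writerow(record)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


# Sample record using two values from the patent's worked example.
record = {"keyword_code": "F20001", "quantitative_value": 16000}
```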
Further, as an optional implementation of this embodiment, after the unstructured data is obtained in step S101, the method further comprises the following steps:
S201: obtain status information and environmental information related to the unstructured data, where the status information and environmental information correspond to the keywords.
In an embodiment of the present invention, before the keywords of the unstructured data are extracted, the status information and environmental information related to the unstructured data must also be obtained. The status information includes time information, longitude information, latitude information, altitude information, related personnel information, related event information, and so on; the environmental information includes temperature information, air pressure information, and so on.
For example, if the unstructured data obtained is event information, the data acquisition phase must also obtain the specific time at which the event occurred, the location information (longitude, latitude, and altitude), and the temperature and air pressure when the event occurred, together with the related personnel information and related event information for that event.
S202: store the status information and environmental information in the environmental state information library corresponding to the keyword library.
In an embodiment of the present invention, the obtained status information and environmental information are stored in the environmental state information library corresponding to the keyword library, which facilitates the subsequent analysis of the normalized data and later retrieval by the user.
Further, as an optional implementation of this embodiment, step S104, normalizing the unstructured data to obtain normalized data in a uniform format, specifically comprises the following steps:
S301: split and filter the unstructured data to obtain independent data fields.
It should be noted that unstructured data is highly complex, so it must be split and filtered during processing; once the unstructured data has been converted into independent data fields, its complexity is greatly reduced.
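A minimal sketch of splitting and filtering, assuming the raw input is text whose useful fragments look like `name: value` and are separated by semicolons or line breaks. The patent does not fix an input layout, so the separators, the filter criterion, and the function name `split_and_filter` are all hypothetical.

```python
import re


def split_and_filter(raw):
    """Split raw text into fragments and keep only well-formed 'name: value' fields."""
    fields = []
    for fragment in re.split(r"[;\n]+", raw):
        cleaned = fragment.strip()
        if ":" not in cleaned:
            continue  # filter out noise that carries no name/value pair
        name, value = cleaned.split(":", 1)
        name, value = name.strip(), value.strip()
        if name and value:
            fields.append((name, value))
    return fields
```

Each returned pair is one independent data field: a name that can later serve as its keyword and the data value attached to it.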
S302: classify the independent data fields to obtain classified data fields.
In an embodiment of the present invention, after the independent data fields are obtained, the keyword of each independent data field is compared one by one against the keyword library, and the data values of independent data fields with the same or similar keywords are stored in the normalized value library corresponding to the keyword library, which completes the classification of the independent data fields.
For example, the independent data fields that the unstructured data processing system obtains by splitting and filtering the unstructured data are still unordered, so they must also be classified by keyword to facilitate subsequent storage and retrieval.
S303: convert the format of the classified data fields to obtain normalized data in a uniform format.
In an embodiment of the present invention, after the independent data fields have been classified, the classified fields must also be format-converted; here this usually means converting the independent data fields into a format that is convenient for the normalized value library to store.
For example, one normalized database is as follows:
Table 1: A normalized database
Article or thing code | Data result code | Longitude | Latitude | Altitude | Time of occurrence | Recorder | Related person | Related event code | Environmental condition code | Keyword code | Unit of measurement | Quantitative value | Quantitative range start | Quantitative range end | Qualitative value | Number of occurrences | Risk coefficient | Severity | Probability of occurrence | Implicitness probability | Scene description | Cause analysis | Consequence analysis
In the normalized database shown in Table 1, every record has 24 dimensions, and the fact that every record has the same dimensions is what it means for the data to be normalized. In this embodiment, the independent data fields obtained by normalizing the unstructured data are stored in the corresponding positions of the normalized database.
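The fixed 24-column layout of Table 1 can be written down as a schema so that every record is padded to the same width. The English field identifiers below are illustrative translations of the column headings, and the helper `to_row` is hypothetical.

```python
# The 24 columns of Table 1, in order (identifier names are illustrative translations).
NORMALIZED_FIELDS = (
    "article_or_thing_code", "data_result_code", "longitude", "latitude",
    "altitude", "time_of_occurrence", "recorder", "related_person",
    "related_event_code", "environmental_condition_code", "keyword_code",
    "unit_of_measurement", "quantitative_value", "quantitative_range_start",
    "quantitative_range_end", "qualitative_value", "number_of_occurrences",
    "risk_coefficient", "severity", "probability_of_occurrence",
    "implicitness_probability", "scene_description", "cause_analysis",
    "consequence_analysis",
)


def to_row(record):
    """Lay a record out in Table 1 column order; missing fields become None."""
    return [record.get(name) for name in NORMALIZED_FIELDS]
```

Because `to_row` always returns one value per column, records with missing fields still have the same dimensions as complete ones.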
For example, after a video conference display product was delivered to a certain location, fog spots were found on its screen. The various pieces of information related to this event were collected; because their formats differ, together they can be regarded as unstructured data. Normalizing this unstructured data with the technical solution of the present invention gives the following result:
Article or thing code: A100009
Data result code: R1000078
Longitude: 114.93
Latitude: 25.83
Altitude: 109 meters
Time of occurrence: 15:10:40 on July 10, 2018
Recorder: Zhang San
Related person: Li Si
Related event code: A30008801
Environmental condition code: C50000002
Keyword code: F20001
Unit of measurement: square millimeter
Quantitative value: 16000
Quantitative range start: 9000
Quantitative range end: 21000
Qualitative value: dense fog spots
Number of occurrences: 2
Risk coefficient: 0.7 × 0.6 × 0.1 = 0.042
Severity: 0.7
Probability of occurrence: 0.6
Implicitness probability: 0.1
Scene description: after a video conference display product was delivered to location A, fog spots were found on the screen
Cause analysis: the high humidity of the climate at location A may have caused fog spots inside the screen
Consequence analysis: the fog spots affect the display quality of the video conference display screen
In the normalized result above, the scene description states that this unstructured data comes from a video conference display product on whose screen fog spots were found after delivery to location A. The article or thing, keyword, environmental condition, related event, and data result are encoded to make the normalized data easier to store. The quantitative value is the actual area of the fog spots on the display screen, and the qualitative value states that the fog spots are dense. The quantitative range start and end give the range within which the fog spot area may fall. Severity, probability of occurrence, and implicitness probability are obtained statistically, and each takes a value between 0 and 1. The risk coefficient is the product of severity, probability of occurrence, and implicitness probability. Cause analysis: the high humidity of the climate at location A may have caused fog spots inside the screen. Consequence analysis: the fog spots affect the display quality of the video conference display screen.
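The risk coefficient in the example record is a plain product of three values in the range 0 to 1, which the following sketch reproduces. The function name `risk_coefficient` and the range check are additions for illustration; the three input values are taken from the record above.

```python
import math


def risk_coefficient(severity, occurrence_probability, implicitness_probability):
    """Product of the three statistical factors, each expected to lie in [0, 1]."""
    for value in (severity, occurrence_probability, implicitness_probability):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor out of range [0, 1]: {value}")
    return severity * occurrence_probability * implicitness_probability


# Values from the example record: severity 0.7, probability 0.6, implicitness 0.1.
risk = risk_coefficient(0.7, 0.6, 0.1)
# Floating-point multiplication may not give exactly 0.042, so compare with a tolerance.
matches_example = math.isclose(risk, 0.042)
```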
Further, as an optional implementation of this embodiment, step S302, classifying the independent data fields to obtain classified data fields, specifically comprises the following steps:
S401: obtain the keywords of the independent data fields.
In an embodiment of the present invention, the keyword of each independent data field is extracted according to the parsing rules preset in S102. It should be noted that each independent data field corresponds to one keyword.
S402: perform deduplication, association, and enhancement on the keywords of the independent data fields.
In an embodiment of the present invention, independent data fields with the same keyword undergo keyword deduplication, independent data fields with similar keywords undergo keyword association, and independent data fields with different keywords undergo keyword enhancement, which highlights the differences between independent data fields and facilitates their subsequent classification.
For example, if several independent data fields related to "temperature" are split out of some unstructured data, the keywords extracted from all of them relate to "temperature", so the keywords of the independent data fields must be deduplicated and only one keyword retained.
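The deduplication part of S402 can be sketched as an order-preserving removal of repeated keywords. Treating keywords that differ only in letter case or surrounding spaces as the same keyword is an assumption, as is the function name `dedupe_keywords`; association and enhancement are not covered because the patent does not describe how they are computed.

```python
def dedupe_keywords(keywords):
    """Keep one copy of each keyword, in first-seen order.

    Keywords are compared after stripping spaces and lower-casing (an assumption).
    """
    normalized = (keyword.strip().lower() for keyword in keywords)
    return list(dict.fromkeys(normalized))


# Several fields that all relate to temperature collapse to a single keyword.
field_keywords = ["temperature", "Temperature", " temperature ", "pressure"]
unique_keywords = dedupe_keywords(field_keywords)
```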
S403: match the keywords of the independent data fields one by one against the keywords in the keyword library to obtain the classified data fields.
In an embodiment of the present invention, the keywords of the independent data fields are matched one by one against the keywords in the keyword library, the independent data fields are classified according to the matching results, and the data values of the independent data fields are stored in the normalized value library corresponding to the keyword library. At this point, normalization has turned the unstructured data into structured data. It should be noted that each independent data field corresponds to one keyword and one data value.
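The matching step can be sketched as grouping field values under the library keyword they match. Exact matching after lower-casing stands in for the unspecified matching criterion, and the function name `classify_fields` and the separate list of unmatched fields are assumptions.

```python
def classify_fields(fields, keyword_library):
    """Group (keyword, value) fields under matching library keywords.

    Returns the classified values and the fields whose keyword found no match.
    """
    classified, unmatched = {}, []
    for keyword, value in fields:
        key = keyword.strip().lower()
        if key in keyword_library:
            classified.setdefault(key, []).append(value)
        else:
            unmatched.append((keyword, value))
    return classified, unmatched


fields = [("temperature", "30 C"), ("Temperature", "31 C"), ("colour", "grey")]
classified, unmatched = classify_fields(fields, {"temperature", "pressure"})
```

Under S103, an unmatched keyword would normally have been added to the keyword library earlier, so the `unmatched` list is expected to be empty in a full pipeline.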
In summary, the detailed process of normalizing unstructured data is as follows:
First, the unstructured data is split and filtered to obtain independent data fields. Second, the keyword of each independent data field is extracted according to the parsing rules preset in S102. Third, the keywords extracted from the unstructured data undergo deduplication, association, and enhancement. Finally, the keywords of the independent data fields are matched one by one against the keywords in the keyword library, the independent data fields are classified according to the matching results, and the data values of the independent data fields are stored in the normalized value library corresponding to the keyword library.
Further, as an optional implementation of this embodiment, after step S104 normalizes the unstructured data to obtain normalized data in a uniform format and stores the normalized data in the normalized value library corresponding to the keyword library, the method further comprises the following step:
S501: perform data analysis on the normalized data, where the data analysis includes scene description, cause analysis, severity analysis, occurrence probability calculation, implicitness analysis, risk coefficient analysis, potential failure mode and consequence analysis, and related personnel analysis.
In an embodiment of the present invention, data analysis of the normalized data must combine the status information and environmental information related to the unstructured data with the information in the normalized data itself.
For example, if the unstructured data obtained is event information, then when data analysis is performed on the normalized data, the unstructured data processing system can use the status information, the environmental information, and the information in the normalized data itself to analyze the specific scene in which the event occurred, its cause, its probability of occurrence, the risks it poses, and its consequences. The analysis must also take the related personnel information and related event information into account to produce a comprehensive analysis result, which is stored in the database for the user to call up.
In conclusion this method is non-structural by obtaining the invention discloses a kind of method of unstructured data processing Change data, the acquisition of unstructured data includes being manually entered or electronic equipment acquisition;According to unstructured data, obtain non- The keyword of structural data, according to unstructured data, the keyword for obtaining unstructured data includes that basis is preset Resolution rules, the keyword extracted from unstructured data;Judge whether keyword is already present in key word library, if Keyword does not exist in key word library, then keyword is added in key word library;Unstructured data is normalized Processing, obtains the normalization data of uniform format, and normalization data is stored in normalization numerical value corresponding with key word library In library;Normalization data is formatted by user demand, the data after obtaining format conversion, the number after format is converted According to being exported.Technical solution of the present invention passes through the keyword for obtaining unstructured data first, then will judge the keyword Whether it is already present in database, if not there is no the keyword in database, which is added to database and is worked as In, can be continuously replenished and improve in this way the key word library in database, improve to unstructured data analyze and handle Flexibility adapts to various unstructured datas at present, and the present invention has by the way that unstructured data is normalized Help unstructured data carrying out structuring, formatting, standardization, improve the search efficiency of unstructured data and utilizes effect Rate.
The optional implementations of the embodiments of the present invention are described in detail above. However, the embodiments of the present invention are not limited to the specific details of those implementations. Within the scope of the technical concept of the embodiments, a variety of simple variations may be made to the technical solutions, and these simple variations all fall within the protection scope of the embodiments of the present invention.
Referring to FIG. 3, FIG. 3 is the first structural schematic diagram of a system of unstructured data processing in an embodiment of the present invention. The unstructured data processing system includes:
Data acquisition module 101, configured to obtain unstructured data, where the unstructured data is acquired by manual entry or by electronic equipment;
Keyword acquisition module 102, configured to obtain the keyword of the unstructured data according to the unstructured data, where the keyword is extracted from the unstructured data according to preset parsing rules;
Keyword judgment module 103, configured to judge whether the keyword already exists in the keyword library and, if the keyword does not exist in the keyword library, to add the keyword to the keyword library;
Data normalization module 104, configured to normalize the unstructured data to obtain normalized data in a uniform format, and to store the normalized data in the normalized value library corresponding to the keyword library;
Data conversion output module 105, configured to convert the format of the normalized data according to user requirements to obtain format-converted data, and to output the format-converted data.
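As a rough illustration of what the keyword judgment module and the data conversion output module might do, the following Python sketch adds unseen keywords to a keyword library and renders normalized data in a user-selected format. The function names update_keyword_library and convert_output, the use of a set as the keyword library, and the choice of JSON and CSV as output formats are assumptions; the text only states that the format is converted according to user requirements.

```python
import json


def update_keyword_library(keywords, keyword_library):
    """Add keywords not yet in the library; return the newly added ones."""
    added = []
    for kw in keywords:
        if kw not in keyword_library:
            keyword_library.add(kw)
            added.append(kw)
    return added


def convert_output(normalized, fmt="json"):
    """Render normalized data (keyword -> value) in the requested format."""
    if fmt == "json":
        return json.dumps(normalized, sort_keys=True, ensure_ascii=False)
    if fmt == "csv":
        # Naive CSV for illustration only: values are not quoted or escaped.
        lines = ["keyword,value"]
        for kw in sorted(normalized):
            lines.append(f"{kw},{normalized[kw]}")
        return "\n".join(lines)
    raise ValueError(f"unsupported format: {fmt}")
```

Because the library grows whenever an unseen keyword arrives, later inputs that use the same keyword are matched directly, which is the flexibility the description attributes to the continuously supplemented keyword library.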
Referring to FIG. 4, FIG. 4 is the second structural schematic diagram of a system of unstructured data processing in an embodiment of the present invention. The system of unstructured data processing further includes:
Related information acquisition module 106, configured to obtain status information and environmental information related to the unstructured data, where the status information and environmental information correspond to the keyword;
Related information storage module 107, configured to store the status information and environmental information in the environment state information library corresponding to the keyword library.
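One simple way to picture the environment state information library is as a store keyed by keyword. The Python sketch below is only an assumption about how such a store could look; the class names ContextRecord and EnvironmentStateStore, the in-memory dictionary, and the example field names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ContextRecord:
    """Status and environmental information tied to one keyword."""
    keyword: str
    status: dict = field(default_factory=dict)
    environment: dict = field(default_factory=dict)


class EnvironmentStateStore:
    """In-memory stand-in for the environment state information library."""

    def __init__(self):
        self._records = {}

    def save(self, keyword, status, environment):
        """Merge new status and environment details into the keyword's record."""
        record = self._records.setdefault(keyword, ContextRecord(keyword))
        record.status.update(status)
        record.environment.update(environment)
        return record

    def lookup(self, keyword):
        """Return the record for a keyword, or None if nothing was stored."""
        return self._records.get(keyword)
```

Keying the store by keyword keeps the status and environmental information aligned with the entries of the keyword library, so a later analysis step can fetch the context for a normalized value by its keyword.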
Referring to FIG. 5, FIG. 5 is a structural schematic diagram of the data normalization module in an embodiment of the present invention. The data normalization module 104 includes:
Segmentation and filtering unit 411, configured to segment and filter the unstructured data to obtain independent data fields;
Data sorting unit 412, configured to classify the independent data fields to obtain classified data fields;
Format conversion unit 413, configured to convert the format of the classified data fields to obtain normalized data in a uniform format.
Referring to FIG. 6, FIG. 6 is a structural schematic diagram of the data sorting unit in an embodiment of the present invention. The data sorting unit 412 includes:
First operation subunit 421, configured to obtain the keywords of the independent data fields;
Second operation subunit 422, configured to perform deduplication, association, and enhancement processing on the keywords of the independent data fields;
Third operation subunit 423, configured to match the keywords of the independent data fields one by one against the keywords in the keyword library to obtain the classified data fields.
For other implementation details and beneficial effects of the system of unstructured data processing of the embodiments of the present invention, refer to the method embodiments of unstructured data processing described above; they are not repeated here.
The optional implementations of the embodiments of the present invention are described in detail above with reference to the drawings. However, the embodiments of the present invention are not limited to the specific details of those implementations. Within the scope of the technical concept of the embodiments, a variety of simple variations may be made to the technical solutions, and these simple variations all fall within the protection scope of the embodiments of the present invention.
In addition, an embodiment of the present invention further provides a computer device whose processor executes computer instructions to implement the following method:
Obtaining unstructured data, where the unstructured data is acquired by manual entry or by electronic equipment; obtaining the keyword of the unstructured data according to the unstructured data, where the keyword is extracted from the unstructured data according to preset parsing rules; judging whether the keyword already exists in the keyword library and, if the keyword does not exist in the keyword library, adding the keyword to the keyword library; normalizing the unstructured data to obtain normalized data in a uniform format, and storing the normalized data in the normalized value library corresponding to the keyword library; converting the format of the normalized data according to user requirements to obtain format-converted data, and outputting the format-converted data.
Those skilled in the art will understand that all or part of the processes in the above method embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like. A computer processor executes the computer program stored in the storage medium to implement the following method:
Obtaining unstructured data, where the unstructured data is acquired by manual entry or by electronic equipment; obtaining the keyword of the unstructured data according to the unstructured data, where the keyword is extracted from the unstructured data according to preset parsing rules; judging whether the keyword already exists in the keyword library and, if the keyword does not exist in the keyword library, adding the keyword to the keyword library; normalizing the unstructured data to obtain normalized data in a uniform format, and storing the normalized data in the normalized value library corresponding to the keyword library; converting the format of the normalized data according to user requirements to obtain format-converted data, and outputting the format-converted data.
What is described above is only an embodiment of the present invention; common knowledge such as well-known specific structures and characteristics is not described in detail here. It should be pointed out that those skilled in the art can make several modifications and improvements without departing from the structure of the invention. These should also be regarded as falling within the protection scope of the invention, and they will not affect the effect of implementing the invention or the practicability of the patent. The scope of protection claimed by this application shall be determined by the content of the claims, and the specific embodiments and other records in the description may be used to interpret the content of the claims.

Claims (10)

1. A method of unstructured data processing, characterized in that the method comprises:
Obtaining unstructured data, wherein the unstructured data is acquired by manual entry or by electronic equipment;
Obtaining a keyword of the unstructured data according to the unstructured data, wherein obtaining the keyword of the unstructured data comprises extracting the keyword from the unstructured data according to preset parsing rules;
Judging whether the keyword already exists in a keyword library, and if the keyword does not exist in the keyword library, adding the keyword to the keyword library;
Normalizing the unstructured data to obtain normalized data in a uniform format, and storing the normalized data in a normalized value library corresponding to the keyword library;
Converting the format of the normalized data according to user requirements to obtain format-converted data, and outputting the format-converted data.
2. The method of unstructured data processing according to claim 1, characterized in that, after obtaining the unstructured data, the method further comprises:
Obtaining status information and environmental information related to the unstructured data, wherein the status information and environmental information related to the unstructured data correspond to the keyword;
Storing the status information and environmental information in an environment state information library corresponding to the keyword library.
3. The method of unstructured data processing according to claim 1, characterized in that normalizing the unstructured data to obtain normalized data in a uniform format specifically comprises:
Segmenting and filtering the unstructured data to obtain independent data fields;
Classifying the independent data fields to obtain classified data fields;
Converting the format of the classified data fields to obtain the normalized data in a uniform format.
4. The method of unstructured data processing according to claim 3, characterized in that classifying the independent data fields to obtain classified data fields specifically comprises:
Obtaining the keywords of the independent data fields;
Performing deduplication, association, and enhancement processing on the keywords of the independent data fields;
Matching the keywords of the independent data fields one by one against the keywords in the keyword library to obtain the classified data fields.
5. A system of unstructured data processing, characterized in that the system comprises:
A data acquisition module, configured to obtain unstructured data, wherein the unstructured data is acquired by manual entry or by electronic equipment;
A keyword acquisition module, configured to obtain a keyword of the unstructured data according to the unstructured data, wherein obtaining the keyword of the unstructured data comprises extracting the keyword from the unstructured data according to preset parsing rules;
A keyword judgment module, configured to judge whether the keyword already exists in a keyword library, and if the keyword does not exist in the keyword library, to add the keyword to the keyword library;
A data normalization module, configured to normalize the unstructured data to obtain normalized data in a uniform format, and to store the normalized data in a normalized value library corresponding to the keyword library;
A data conversion output module, configured to convert the format of the normalized data according to user requirements to obtain format-converted data, and to output the format-converted data.
6. The system of unstructured data processing according to claim 5, characterized in that the system further comprises:
A related information acquisition module, configured to obtain status information and environmental information related to the unstructured data, wherein the status information and environmental information related to the unstructured data correspond to the keyword;
A related information storage module, configured to store the status information and environmental information in an environment state information library corresponding to the keyword library.
7. The system of unstructured data processing according to claim 6, characterized in that the data normalization module comprises:
A segmentation and filtering unit, configured to segment and filter the unstructured data to obtain independent data fields;
A data sorting unit, configured to classify the independent data fields to obtain classified data fields;
A format conversion unit, configured to convert the format of the classified data fields to obtain the normalized data in a uniform format.
8. The system of unstructured data processing according to claim 6, characterized in that the data sorting unit comprises:
A first operation subunit, configured to obtain the keywords of the independent data fields;
A second operation subunit, configured to perform deduplication, association, and enhancement processing on the keywords of the independent data fields;
A third operation subunit, configured to match the keywords of the independent data fields one by one against the keywords in the keyword library to obtain the classified data fields.
9. A computing terminal, characterized in that the computing terminal comprises a processor, and the processor is configured to execute a computer program stored in a memory to implement the method of unstructured data processing according to any one of claims 1-4.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the storage medium is configured to store the computer program to implement the method of unstructured data processing according to any one of claims 1-4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910709505.8A CN110442671A (en) 2019-08-02 2019-08-02 A kind of method and system of unstructured data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910709505.8A CN110442671A (en) 2019-08-02 2019-08-02 A kind of method and system of unstructured data processing

Publications (1)

Publication Number Publication Date
CN110442671A true CN110442671A (en) 2019-11-12

Family

ID=68432928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910709505.8A Pending CN110442671A (en) 2019-08-02 2019-08-02 A kind of method and system of unstructured data processing

Country Status (1)

Country Link
CN (1) CN110442671A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241177A (en) * 2019-12-31 2020-06-05 中国联合网络通信集团有限公司 Data acquisition method, system and network equipment
CN113703409A (en) * 2021-08-31 2021-11-26 中冶华天南京工程技术有限公司 Belt flow data acquisition and control system for iron and steel enterprise

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060200478A1 (en) * 2005-03-02 2006-09-07 Egon Pasztor Generating structured information
CN104239506A (en) * 2014-09-12 2014-12-24 北京优特捷信息技术有限公司 Unstructured data processing method and device
CN105183916A (en) * 2015-10-16 2015-12-23 辽宁工程技术大学 Device and method for managing unstructured data
CN109033319A (en) * 2018-07-18 2018-12-18 长扬科技(北京)有限公司 A kind of big data log method for normalizing and tool

Similar Documents

Publication Publication Date Title
Thom et al. Spatiotemporal anomaly detection through visual analysis of geolocated twitter messages
CN110428091B (en) Risk identification method based on data analysis and related equipment
CN110740141A (en) integration network security situation perception method, device and computer equipment
CN110275965B (en) False news detection method, electronic device and computer readable storage medium
US11221904B2 (en) Log analysis system, log analysis method, and log analysis program
CN109582551A (en) Daily record data analytic method, device, computer equipment and storage medium
US20180349250A1 (en) Content-level anomaly detector for systems with limited memory
CN111045847A (en) Event auditing method and device, terminal equipment and storage medium
US20150205862A1 (en) Method and device for recognizing and labeling peaks, increases, or abnormal or exceptional variations in the throughput of a stream of digital documents
CN113254255B (en) Cloud platform log analysis method, system, device and medium
CN110442671A (en) A kind of method and system of unstructured data processing
CN106202126B (en) A kind of data analysing method and device for logistics monitoring
CN107526820A (en) A kind of more storehouse enterprise innovation monitoring big data normal data base construction methods of multi-source
CN114896305A (en) Smart internet security platform based on big data technology
CN111866196A (en) Domain name traffic characteristic extraction method, device, equipment and readable storage medium
US20210350160A1 (en) System And Method For An Activity Based Intelligence Contextualizer
CN110889451B (en) Event auditing method, device, terminal equipment and storage medium
CN112433874A (en) Fault positioning method, system, electronic equipment and storage medium
CN111753070A (en) System and method for processing server monitoring log
Apostol et al. ContCommRTD: A distributed content-based misinformation-aware community detection system for real-time disaster reporting
CN110874366A (en) Data processing and query method and device
CN113015171A (en) System with network public opinion monitoring and analyzing functions
Girish et al. Extreme event detection and management using twitter data analysis
CN112306820A (en) Log operation and maintenance root cause analysis method and device, electronic equipment and storage medium
CN113139043A (en) Question and answer sample generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination