CN110442671A - Method and system for unstructured data processing - Google Patents
- Publication number: CN110442671A (CN201910709505.8A)
- Authority: CN (China)
- Prior art keywords: data, keyword, unstructured data, unstructured, normalization
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/35—Clustering; Classification
Abstract
The present invention discloses a method and system for unstructured data processing. The method comprises: obtaining unstructured data; extracting keywords from the unstructured data according to preset parsing rules; determining whether each keyword already exists in a keyword library and, if it does not, adding it to the keyword library; normalizing the unstructured data to obtain normalized data in a uniform format, and storing the normalized data in a normalized value library corresponding to the keyword library; and converting the normalized data into the format required by the user and outputting the converted data. By continuously supplementing and improving the keyword library in the database, the invention makes the analysis and processing of unstructured data more flexible; by normalizing the unstructured data, it improves the query efficiency and utilization efficiency of unstructured data.
Description
Technical field
The present invention relates to the field of big data processing, and in particular to a method and system for unstructured data processing.
Background
Big data is generally characterized by large volume, high dispersion, heavy noise, complex types, and diverse sources, so the preprocessing of big data is crucial. If preprocessing is flawed, the utilization efficiency and value of the data at later stages suffer directly; and if certain data is not structured, it cannot be stored and analyzed effectively, so the availability and utilization rate of the data cannot be fully improved.
Specifically, big data is broadly divided into structured data, semi-structured data, and unstructured data. Unstructured data now makes up the major part of big data, and with the rapid development of big data technology its volume will keep growing. The data generated today across all industries is highly varied and mostly disorganized, and such data is unstructured data. Because unstructured data does not necessarily follow a standard data structure (such as the rows and columns defined by a schema), it is not easy for computer programs to understand and use directly.
At present, methods for analyzing and processing unstructured data usually rely on keywords predefined in a database according to requirements. Predefined keywords, however, offer poor flexibility when analyzing and processing unstructured data and cannot adapt to the wide variety of unstructured data that exists today, so they impose certain limitations.
Summary of the invention
The object of the present invention is to solve the problem that existing methods for analyzing and processing unstructured data are insufficiently flexible, and thereby to improve the query efficiency and utilization efficiency of unstructured data.
To this end, according to a first aspect, an embodiment of the present invention discloses a method for unstructured data processing, comprising:
Obtaining unstructured data, where the unstructured data is obtained by manual input or by acquisition with an electronic device;
Obtaining keywords of the unstructured data from the unstructured data, which includes extracting the keywords from the unstructured data according to preset parsing rules;
Determining whether a keyword already exists in the keyword library and, if it does not, adding the keyword to the keyword library;
Normalizing the unstructured data to obtain normalized data in a uniform format, and storing the normalized data in a normalized value library corresponding to the keyword library;
Converting the normalized data into the format required by the user to obtain format-converted data, and outputting the format-converted data.
Optionally, after the unstructured data is obtained, the method further comprises:
Obtaining state information and environment information related to the unstructured data, where the state information and environment information correspond to the keywords;
Storing the state information and environment information in an environment-state information library corresponding to the keyword library.
Optionally, normalizing the unstructured data to obtain normalized data in a uniform format specifically comprises:
Segmenting and filtering the unstructured data to obtain independent data fields;
Classifying the independent data fields to obtain classified data fields;
Converting the format of the classified data fields to obtain normalized data in a uniform format.
Optionally, classifying the independent data fields to obtain classified data fields specifically comprises:
Obtaining the keywords of the independent data fields;
Performing deduplication, association, and enhancement on the keywords of the independent data fields;
Matching the keywords of the independent data fields one by one against the keywords in the keyword library to obtain the classified data fields.
According to a second aspect, an embodiment of the present invention provides a system for unstructured data processing, comprising:
A data acquisition module, configured to obtain unstructured data, where the unstructured data is obtained by manual input or by acquisition with an electronic device;
A keyword acquisition module, configured to obtain keywords of the unstructured data from the unstructured data, which includes extracting the keywords from the unstructured data according to preset parsing rules;
A keyword judgment module, configured to determine whether a keyword already exists in the keyword library and, if it does not, to add the keyword to the keyword library;
A data normalization module, configured to normalize the unstructured data to obtain normalized data in a uniform format, and to store the normalized data in a normalized value library corresponding to the keyword library;
A data conversion and output module, configured to convert the normalized data into the format required by the user to obtain format-converted data, and to output the format-converted data.
Optionally, the system further comprises:
A related-information acquisition module, configured to obtain state information and environment information related to the unstructured data, where the state information and environment information correspond to the keywords;
A related-information storage module, configured to store the state information and environment information in the environment-state information library corresponding to the keyword library.
Optionally, the data normalization module comprises:
A segmentation and filtering unit, configured to segment and filter the unstructured data to obtain independent data fields;
A data classification unit, configured to classify the independent data fields to obtain classified data fields;
A format conversion unit, configured to convert the format of the classified data fields to obtain normalized data in a uniform format.
Optionally, the data classification unit comprises:
A first operation subunit, configured to obtain the keywords of the independent data fields;
A second operation subunit, configured to perform deduplication, association, and enhancement on the keywords of the independent data fields;
A third operation subunit, configured to match the keywords of the independent data fields one by one against the keywords in the keyword library to obtain the classified data fields.
According to a third aspect, the present invention provides a computing terminal comprising a processor, the processor being configured to execute a computer program stored in a memory to implement the method of any implementation of the first aspect.
According to a fourth aspect, the present invention provides a computer-readable storage medium on which a computer program is stored, the storage medium storing the computer program used to implement the method of any implementation of the first aspect.
The beneficial effects of the present invention are as follows:
In the method and system for unstructured data processing of the present invention, the method comprises the following steps: obtaining unstructured data, where the unstructured data is obtained by manual input or by acquisition with an electronic device; obtaining keywords of the unstructured data from the unstructured data, which includes extracting the keywords from the unstructured data according to preset parsing rules; determining whether a keyword already exists in the keyword library and, if it does not, adding the keyword to the keyword library; normalizing the unstructured data to obtain normalized data in a uniform format, and storing the normalized data in a normalized value library corresponding to the keyword library; and converting the normalized data into the format required by the user and outputting the format-converted data.
The technical solution first obtains the keywords of the unstructured data and then determines whether each keyword already exists in the database; if it does not, the keyword is added to the database. In this way the keyword library in the database is continuously supplemented and improved, which makes the analysis and processing of unstructured data more flexible and able to adapt to the wide variety of unstructured data that exists today. In addition, normalizing the unstructured data helps to structure, format, and standardize it, which improves the query efficiency and utilization efficiency of unstructured data.
Brief description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a structural block diagram of the unstructured database in an embodiment of the present invention;
Fig. 2 is a flowchart of a method for unstructured data processing in an embodiment of the present invention;
Fig. 3 is a first structural schematic diagram of a system for unstructured data processing in an embodiment of the present invention;
Fig. 4 is a second structural schematic diagram of the system for unstructured data processing in an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of the data normalization module in an embodiment of the present invention;
Fig. 6 is a structural schematic diagram of the data classification unit in an embodiment of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Unstructured data refers to data that is inconvenient to represent with two-dimensional logical database tables. It is information generated by a computer or a person; the data does not necessarily follow a standard data structure (such as the rows and columns defined by a schema) and is not easy for computer programs to understand and use directly. Unstructured data includes text in all formats, pictures, reports of all kinds, images, audio and video information, and so on, and may be obtained from any industry in a variety of ways.
In an embodiment of the present invention, refer to Fig. 1, which is a structural block diagram of the unstructured database. The unstructured database includes a keyword library 10, a normalized value library 11, and an environment-state information library 12, and the three libraries correspond to one another one to one. The keyword library 10 stores the keywords of the unstructured data; the normalized value library 11 stores the normalized data obtained after the unstructured data is normalized; and the environment-state information library 12 stores the state information and environment information related to the unstructured data at the time the data was generated.
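The three-library layout described above can be sketched as follows. This is a minimal illustration only: the class name `UnstructuredDatabase`, the shared `record_id` used to express the one-to-one correspondence, and the method names are all assumptions, not part of the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class UnstructuredDatabase:
    """Three libraries linked by a shared record ID (hypothetical layout)."""
    keyword_library: dict = field(default_factory=dict)     # record_id -> set of keywords
    normalized_library: dict = field(default_factory=dict)  # record_id -> normalized fields
    env_state_library: dict = field(default_factory=dict)   # record_id -> state/environment info

    def store(self, record_id, keywords, normalized, env_state):
        # Writing all three libraries under one ID keeps them in one-to-one correspondence.
        self.keyword_library[record_id] = set(keywords)
        self.normalized_library[record_id] = dict(normalized)
        self.env_state_library[record_id] = dict(env_state)

    def lookup(self, keyword):
        # Search only the keyword library, then follow the shared ID into the other two.
        return [
            (rid, self.normalized_library[rid], self.env_state_library[rid])
            for rid, keywords in self.keyword_library.items()
            if keyword in keywords
        ]


db = UnstructuredDatabase()
db.store("R1", ["mist spot"], {"quantitative_value": 16000}, {"longitude": 114.93})
```

Calling `db.lookup("mist spot")` retrieves the normalized data and the environment-state information for the matching record without scanning the stored data itself.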
The method for unstructured data processing provided by the present invention is explained in detail below. Refer to Fig. 2, which is a flowchart of a method for unstructured data processing in an embodiment of the present invention. The method comprises:
S101: obtain unstructured data, where the unstructured data is obtained by manual input or by acquisition with an electronic device.
In an embodiment of the present invention, the unstructured data must be obtained before its keywords are extracted. Data in various formats may be entered through human-computer interaction, or various sensors or detection devices may be called to collect data in various formats; the present invention places no limitation on this.
S102: obtain keywords of the unstructured data from the unstructured data, which includes extracting the keywords from the unstructured data according to preset parsing rules.
In an embodiment of the present invention, the preset parsing rules include parsing rules customized in advance by the user and parsing rules preconfigured by the system. A parsing rule may be a regular-expression rule or any other form of rule capable of extracting key fields from unstructured data; it defines the operations for extracting key fields from the unstructured data.
It should be noted that, to improve parsing efficiency, the unstructured data processing system first parses the unstructured data using the parsing rules preconfigured by the system to obtain the keywords of the unstructured data. If the preconfigured rules cannot parse the unstructured data, the system then uses the parsing rules customized in advance by the user to obtain the keywords.
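The rule order described in S102 (system-preconfigured rules first, user-defined rules as a fallback) might look like the sketch below. The rule tables `SYSTEM_RULES` and `USER_RULES`, their patterns, and the function names are hypothetical examples; the patent only states that rules may be regular expressions.

```python
import re

# Hypothetical rule sets: each rule maps a keyword to a regular expression
# whose first capture group is the extracted value.
SYSTEM_RULES = {"temperature": re.compile(r"(-?\d+(?:\.\d+)?)\s*°C")}
USER_RULES = {"humidity": re.compile(r"humidity\s+(\d+)\s*%", re.IGNORECASE)}


def apply_rules(rules, text):
    """Return {keyword: value} for every rule that matches the text."""
    found = {}
    for keyword, pattern in rules.items():
        match = pattern.search(text)
        if match:
            found[keyword] = match.group(1)
    return found


def extract_keywords(text):
    """Try the system rules first; fall back to user rules if nothing is parsed."""
    result = apply_rules(SYSTEM_RULES, text)
    if not result:
        result = apply_rules(USER_RULES, text)
    return result
```

Here "cannot parse" is interpreted as "no system rule matched", which is one possible reading of the text rather than a stated requirement.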
It should be noted that extracting the keywords of the unstructured data and storing them in the keyword library helps the user manage and call up the unstructured data. When the user later calls up unstructured data whose keywords have been stored in the keyword library, the system does not need to search all the unstructured data saved in the database. It only needs to perform a simple retrieval by keyword and then, using the one-to-one correspondence among the keyword library, the normalized value library, and the environment-state information library, obtain the complete unstructured data.
S103: determine whether a keyword already exists in the keyword library and, if it does not, add the keyword to the keyword library.
In an embodiment of the present invention, the unstructured data processing system compares the keyword of the unstructured data one by one with all the keywords in the keyword library. If the keyword library already contains a keyword identical to the keyword of the unstructured data, the system does not add the keyword again; if it does not, the system adds the keyword to the keyword library. In this way the keyword library in the database is continuously supplemented and improved, which makes the analysis and processing of unstructured data more flexible and able to adapt to the wide variety of unstructured data that exists today.
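A minimal sketch of the S103 check follows. It uses a Python set, so the one-by-one comparison described above is replaced by a hash-based membership test; the function name `add_if_absent` and the sample keywords are assumptions for illustration.

```python
def add_if_absent(keyword_library, keyword):
    """Add the keyword unless it is already present; return True if it was added."""
    if keyword in keyword_library:
        return False
    keyword_library.add(keyword)
    return True


library = {"temperature", "pressure"}
added_new = add_if_absent(library, "mist spot")     # new keyword, library grows
added_dup = add_if_absent(library, "temperature")   # already present, library unchanged
```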
S104: normalize the unstructured data to obtain normalized data in a uniform format, and store the normalized data in the normalized value library corresponding to the keyword library.
In an embodiment of the present invention, the unstructured data is normalized according to its keywords and the keywords already present in the keyword library, yielding normalized data in a uniform format, which is stored in the normalized value library corresponding to the keyword library. The specific normalization process is given in the embodiments below.
It should be noted that the normalization of unstructured data refers to the whole process of segmenting and filtering the unstructured data to obtain independent data fields and then classifying and format-converting those fields. After normalization, the resulting normalized data is structured data, which can be stored by a database and is also convenient for a computer to operate on.
S105: convert the normalized data into the format required by the user to obtain format-converted data, and output the format-converted data.
In an embodiment of the present invention, when the user needs to view the unstructured data again, retrieval can be performed based on keywords, and the user can call up the unstructured data through the keyword search. It should be noted that, through format conversion, the data stored in the normalized value library can be converted into whatever data format the user requires before it is output, such as a text format, an audio format, or a video format.
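The output conversion in S105 can be illustrated with two simple text-based targets. The formats shown here (`"json"` and `"text"`), the function name `convert`, and the sample record are assumptions chosen for brevity; the patent itself names text, audio, and video formats without specifying how conversion is done.

```python
import json


def convert(record, fmt):
    """Render one normalized record in the format the user asked for."""
    if fmt == "json":
        return json.dumps(record, sort_keys=True)
    if fmt == "text":
        return "\n".join(f"{name}: {value}" for name, value in record.items())
    raise ValueError(f"unsupported format: {fmt}")


record = {"keyword": "mist spot", "quantitative_value": 16000}
as_text = convert(record, "text")
as_json = convert(record, "json")
```

Because the stored record is already in a uniform structure, each additional output format only needs one more branch.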
Further, as an optional implementation of this embodiment, after the unstructured data is obtained in step S101, the method further comprises the following steps:
S201: obtain state information and environment information related to the unstructured data, where the state information and environment information correspond to the keywords.
In an embodiment of the present invention, before the keywords of the unstructured data are extracted, the state information and environment information related to the unstructured data must also be obtained. The state information includes time information, longitude information, latitude information, altitude information, related-personnel information, related-event information, and so on; the environment information includes temperature information, pressure information, and so on.
For example, if the unstructured data obtained is event information, then the data acquisition stage must also obtain the specific time at which the event occurred, the position and altitude information (longitude, latitude), and the temperature, atmospheric pressure, and other environmental conditions at that time, as well as information on the personnel and events related to the event.
S202: store the state information and environment information in the environment-state information library corresponding to the keyword library.
In an embodiment of the present invention, the obtained state information and environment information must be stored in the environment-state information library corresponding to the keyword library, which facilitates the subsequent analysis of the normalized data and later calls by the user.
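One way to hold the state and environment information from S201 and file it under a keyword, as in S202, is sketched below. The `EnvState` field names, the units, and the choice of keying the library by keyword are assumptions; the sample coordinates and time are taken from the worked example later in this description, and the temperature and pressure are left empty because the source gives no values for them.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class EnvState:
    """State and environment information captured with one piece of data."""
    time: str
    longitude: float
    latitude: float
    altitude_m: float
    temperature_c: Optional[float] = None   # environment info, unknown here
    pressure_kpa: Optional[float] = None    # environment info, unknown here


env_state_library = {}


def store_env_state(keyword, info):
    # File the record under its keyword so it can be found from the keyword library.
    env_state_library.setdefault(keyword, []).append(asdict(info))


store_env_state("mist spot", EnvState("2018-07-10T15:10:40", 114.93, 25.83, 109.0))
```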
Further, as an optional implementation of this embodiment, normalizing the unstructured data in step S104 to obtain normalized data in a uniform format specifically comprises the following steps:
S301: segment and filter the unstructured data to obtain independent data fields.
It should be noted that unstructured data is highly complex, so it must be segmented and filtered during processing. Once the unstructured data has been converted into independent data fields, the complexity of the data is greatly reduced.
S302: classify the independent data fields to obtain classified data fields.
In an embodiment of the present invention, after the independent data fields are obtained, their keywords are compared one by one against the keyword library, and the data values of independent data fields with the same or similar keywords are stored in the normalized value library corresponding to the keyword library, which completes the classification of the independent data fields.
For example, the independent data fields that the unstructured data processing system obtains after segmenting and filtering the unstructured data are still unordered. After obtaining them, the system therefore also needs to classify them by keyword to facilitate subsequent storage and retrieval.
S303: convert the format of the classified data fields to obtain normalized data in a uniform format.
In an embodiment of the present invention, after the independent data fields have been classified, the classified fields must also undergo format conversion. This conversion usually changes the format of the independent data fields into one that is convenient for the normalized value library to store.
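Steps S301 and S303 can be sketched on a simple delimited string. The input layout (`name: value` fragments separated by semicolons), the filtering criterion, and the function names are assumptions; real unstructured input would need rules suited to its own format.

```python
def split_and_filter(raw):
    """S301 sketch: split raw text into (name, value) fields and drop unusable fragments."""
    fields = []
    for part in raw.split(";"):
        part = part.strip()
        if not part or ":" not in part:
            continue  # filter out empty or unlabeled fragments
        name, value = part.split(":", 1)
        fields.append((name.strip().lower(), value.strip()))
    return fields


def to_uniform(fields):
    """S303 sketch: store numeric values as floats and everything else as text."""
    record = {}
    for name, value in fields:
        try:
            record[name] = float(value)
        except ValueError:
            record[name] = value
    return record


fields = split_and_filter("Longitude: 114.93; ; noise; Recorder: Zhang San")
record = to_uniform(fields)
```

The fragment `noise` carries no field name and is discarded, while the two labeled fragments survive as independent data fields.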
For example, one normalized database is as follows:
Table 1: A normalized database
Article or thing code | Data result code | Longitude | Latitude | Altitude | Time of occurrence | Recorder | Related person | Related event code | Environmental condition code | Keyword code | Unit of measurement | Quantitative value | Quantitative range start | Quantitative range end | Qualitative value | Number of occurrences | Risk factor | Severity | Probability of occurrence | Implicit probability | Scene description | Cause analysis | Consequence analysis
In the normalized database shown in Table 1, every record has 24 dimensions; that every record has the same dimensions is precisely what shows the data has been normalized. In this embodiment, the independent data fields obtained by normalizing the unstructured data are stored in the corresponding positions of the normalized database.
For example, after a video conference display product was delivered to a certain location, mist spots were found on its screen. Various pieces of information related to this event were collected; because their formats differ, they can be regarded as unstructured data. After this unstructured data is normalized using the technical solution of the present invention, the following result is obtained:
Article or thing code: A100009
Data result code: R1000078
Longitude: 114.93
Latitude: 25.83
Altitude: 109 meters
Time of occurrence: 15:10:40, July 10, 2018
Recorder: Zhang San
Related person: Li Si
Related event code: A30008801
Environmental condition code: C50000002
Keyword code: F20001
Unit of measurement: square millimeter
Quantitative value: 16000
Quantitative range start: 9000
Quantitative range end: 21000
Qualitative value: thick mist spot
Number of occurrences: 2
Risk factor: 0.7 × 0.6 × 0.1 = 0.042
Severity: 0.7
Probability of occurrence: 0.6
Implicit probability: 0.1
Scene description: after a video conference display product was delivered to location A, mist spots were found on the screen
Cause analysis: the humid climate at location A may have caused mist spots inside the screen
Consequence analysis: the mist spots affect the display quality of the video conference display screen
In the normalized result above, the scene description states that this unstructured data comes from a video conference display product on whose screen mist spots were found after delivery to location A. The article or thing, keyword, environmental condition, related event, and data result are encoded to make the normalized data easier to store. The quantitative value is the actual area of the mist spot on this display screen, and the qualitative value indicates that the mist spot is a thick mist spot. The quantitative range start and end give the range within which the mist-spot area may fall. Severity, probability of occurrence, and implicit probability are obtained statistically, each with a value between 0 and 1. The risk factor is the product of severity, probability of occurrence, and implicit probability. Cause analysis: the humid climate at location A may have caused mist spots inside the screen. Consequence analysis: the mist spots affect the display quality of the video conference display screen.
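The risk factor calculation described above is a plain product of three values in the range 0 to 1. The sketch below reproduces it with the figures from the example; the function name `risk_factor` and the range check are illustrative assumptions.

```python
def risk_factor(severity, probability, implicit_probability):
    """Risk factor = severity x probability of occurrence x implicit probability."""
    for value in (severity, probability, implicit_probability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("each factor must lie between 0 and 1")
    return severity * probability * implicit_probability


# Values from the worked example: 0.7 x 0.6 x 0.1, i.e. about 0.042
rf = risk_factor(0.7, 0.6, 0.1)
```

Because the result is computed in floating point, it should be compared with 0.042 using a small tolerance rather than exact equality.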
Further, as an optional implementation of this embodiment, classifying the independent data fields in step S302 to obtain classified data fields specifically comprises the following steps:
S401: obtain the keywords of the independent data fields.
In an embodiment of the present invention, the keyword of each independent data field is extracted according to the parsing rules preset in S102. It should be noted that each independent data field corresponds to one keyword.
S402: perform deduplication, association, and enhancement on the keywords of the independent data fields.
In an embodiment of the present invention, keyword deduplication is performed on independent data fields with the same keyword, keyword association is performed on independent data fields with keywords of the same kind, and keyword enhancement is performed on independent data fields with different keywords, so as to highlight the differences among the independent data fields and facilitate their subsequent classification.
For example, if several independent data fields related to "temperature" are segmented out of a piece of unstructured data, the keywords extracted from these fields are all related to "temperature". The keywords of these independent data fields must therefore be deduplicated so that only one keyword is kept.
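The deduplication part of S402 can be sketched as follows. Treating keywords as equal after trimming whitespace and lowercasing is an assumption made for this example; association and enhancement are not shown because the text does not define them precisely.

```python
def dedupe_keywords(keywords):
    """Keep the first occurrence of each keyword, comparing case-insensitively."""
    seen = set()
    unique = []
    for keyword in keywords:
        key = keyword.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


unique_keywords = dedupe_keywords(["Temperature", "temperature ", "pressure", "Temperature"])
```

In the "temperature" example above, the three temperature-related keywords collapse into a single entry while the order of first appearance is preserved.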
S403: match the keywords of the independent data fields one by one against the keywords in the keyword library to obtain classified data fields.
In an embodiment of the present invention, the keywords of the independent data fields are matched one by one against the keywords in the keyword library, the independent data fields are classified according to the matching results, and the data values of the independent data fields are stored in the normalized value library corresponding to the keyword library. At this point, the unstructured data has been converted into structured data through normalization. It should be noted that each independent data field corresponds to one keyword and one data value.
In summary, the detailed process of normalizing unstructured data is as follows:
First, the unstructured data is segmented and filtered to obtain independent data fields. Second, the keyword of each independent data field is extracted according to the parsing rules preset in S102. Third, the keywords extracted from the unstructured data undergo deduplication, association, and enhancement. Finally, the keywords of the independent data fields are matched one by one against the keywords in the keyword library, the independent data fields are classified according to the matching results, and the data values of the independent data fields are stored in the normalized value library corresponding to the keyword library.
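The final matching step can be sketched as a grouping of field values by keyword. The function name `classify_fields`, the sample data, and in particular the handling of fields whose keyword is not in the library (returned separately here) are assumptions; the text does not say what happens to unmatched fields at this stage.

```python
def classify_fields(fields, keyword_library):
    """Group field values under their matched keyword; collect unmatched fields separately."""
    normalized = {}
    unmatched = []
    for keyword, value in fields:
        if keyword in keyword_library:
            normalized.setdefault(keyword, []).append(value)
        else:
            unmatched.append((keyword, value))
    return normalized, unmatched


sample_fields = [("temperature", "31"), ("humidity", "85"), ("temperature", "29")]
normalized, unmatched = classify_fields(sample_fields, {"temperature"})
```

Values that share a keyword end up in the same slot of the normalized store, which is what makes later keyword-based retrieval a simple lookup.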
Further, as a kind of optional embodiment of the present embodiment, step S104 carries out unstructured data
Normalized obtains the normalization data of uniform format, and normalization data is stored in normalizing corresponding with key word library
It is further comprising the steps of after changing in numerical value library:
S501 carries out data analysis to normalization data, and data analysis includes scene description, the analysis of causes, severity analysis, hair
Raw probability calculation, implicity analysis, risk factor analysis, potential failure mode and consequences analysis and related personnel's analysis.
In an embodiment of the present invention, the data analysis of the normalized data needs to combine the status information and environmental information related to the unstructured data with the information contained in the normalized data itself.
For example, assume that the acquired unstructured data is event information. When analyzing the normalized data, the unstructured data processing system can use the status information, the environmental information, and the information in the normalized data itself to analyze the specific scene in which the event occurred, its cause, its probability of occurrence, the risks it poses, and its consequences. The analysis also needs to take related-personnel information and related-event information into account. The result is a comprehensive analysis, which is stored in the database so that users can retrieve it conveniently.
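The source does not give a formula for combining these analyses. One conventional option from failure mode and effects analysis is the risk priority number, the product of three ordinal scores. The sketch below assumes 1 to 10 scales and treats the source's implicitness analysis as a "how hard is it to detect" score; both choices, and the names `EventAssessment` and `risk_priority`, are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class EventAssessment:
    severity: int     # 1 (negligible) to 10 (catastrophic), assumed scale
    occurrence: int   # 1 (rare) to 10 (almost certain), assumed scale
    concealment: int  # 1 (obvious) to 10 (very hard to detect), assumed scale


def risk_priority(assessment: EventAssessment) -> int:
    """Return severity * occurrence * concealment after range checks."""
    for name in ("severity", "occurrence", "concealment"):
        score = getattr(assessment, name)
        if not 1 <= score <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {score}")
    return assessment.severity * assessment.occurrence * assessment.concealment


assessment = EventAssessment(severity=7, occurrence=3, concealment=4)
rpn = risk_priority(assessment)  # 7 * 3 * 4 = 84
```

A higher number marks an event that deserves attention first; the stored analysis result could carry this number alongside the scene, cause, and personnel findings.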
In conclusion this method is non-structural by obtaining the invention discloses a kind of method of unstructured data processing
Change data, the acquisition of unstructured data includes being manually entered or electronic equipment acquisition;According to unstructured data, obtain non-
The keyword of structural data, according to unstructured data, the keyword for obtaining unstructured data includes that basis is preset
Resolution rules, the keyword extracted from unstructured data;Judge whether keyword is already present in key word library, if
Keyword does not exist in key word library, then keyword is added in key word library;Unstructured data is normalized
Processing, obtains the normalization data of uniform format, and normalization data is stored in normalization numerical value corresponding with key word library
In library;Normalization data is formatted by user demand, the data after obtaining format conversion, the number after format is converted
According to being exported.Technical solution of the present invention passes through the keyword for obtaining unstructured data first, then will judge the keyword
Whether it is already present in database, if not there is no the keyword in database, which is added to database and is worked as
In, can be continuously replenished and improve in this way the key word library in database, improve to unstructured data analyze and handle
Flexibility adapts to various unstructured datas at present, and the present invention has by the way that unstructured data is normalized
Help unstructured data carrying out structuring, formatting, standardization, improve the search efficiency of unstructured data and utilizes effect
Rate.
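Two of the steps summarized above, supplementing the keyword library and converting the normalized data on output, can be sketched as follows. The function names `register_keywords` and `convert` and the choice of JSON and CSV as output formats are assumptions for illustration; the source only says that the format is converted according to user demand.

```python
import csv
import io
import json


def register_keywords(extracted: list[str], keyword_library: set[str]) -> list[str]:
    """Add unseen keywords to the library and return the ones that were added."""
    added = []
    for keyword in extracted:
        if keyword not in keyword_library:
            keyword_library.add(keyword)
            added.append(keyword)
    return added


def convert(records: list[dict[str, str]], fmt: str) -> str:
    """Render normalized records as JSON or CSV, as the user requests."""
    if fmt == "json":
        return json.dumps(records, ensure_ascii=False, sort_keys=True)
    if fmt == "csv":
        fields = sorted({key for record in records for key in record})
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


library = {"location"}
new_keywords = register_keywords(["location", "event", "event"], library)
records = [{"location": "pump room", "event": "pressure alarm"}]
as_csv = convert(records, "csv")
```

After the call, `library` also contains `"event"`, so later data that uses that keyword can be classified without any change to the rules.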
The optional implementations of the embodiments of the present invention are described in detail above. However, the embodiments of the present invention are not limited to the specific details of the above implementations. Within the scope of the technical concept of the embodiments of the present invention, a variety of simple variations can be made to the technical solutions of the embodiments, and these simple variations all fall within the protection scope of the embodiments of the present invention.
Referring to FIG. 3, FIG. 3 is the first structural schematic diagram of a system of unstructured data processing in an embodiment of the present invention. The unstructured data processing system includes:
Data acquisition module 101, for obtaining unstructured data, where the unstructured data is acquired by manual entry or by electronic equipment;
Keyword obtaining module 102, for obtaining the keywords of the unstructured data from the unstructured data, which includes extracting keywords from the unstructured data according to preset parsing rules;
Keyword judgment module 103, for judging whether a keyword already exists in the keyword library and, if the keyword does not exist in the keyword library, adding the keyword to the keyword library;
Data normalization module 104, for normalizing the unstructured data to obtain normalized data in a uniform format, and storing the normalized data in the normalized value library corresponding to the keyword library;
Data conversion output module 105, for converting the format of the normalized data according to user demand and outputting the converted data.
Referring to FIG. 4, FIG. 4 is the second structural schematic diagram of a system of unstructured data processing in an embodiment of the present invention. The system of unstructured data processing further includes:
Relevant information obtaining module 106, for obtaining status information and environmental information related to the unstructured data, where the status information and environmental information related to the unstructured data correspond to the keywords;
Relevant information storage module 107, for storing the status information and environmental information in the environmental state information library corresponding to the keyword library.
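A minimal sketch of storing status and environmental information under the keyword it corresponds to is given here. The in-memory dictionary and the names `Context` and `EnvironmentStateLibrary` are assumptions for illustration; the source does not describe how the library is implemented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    state: str        # e.g. equipment status when the data was captured
    environment: str  # e.g. ambient conditions at that moment


class EnvironmentStateLibrary:
    """Keeps status and environmental information under its keyword."""

    def __init__(self) -> None:
        self._by_keyword: dict[str, list[Context]] = defaultdict(list)

    def store(self, keyword: str, state: str, environment: str) -> None:
        self._by_keyword[keyword].append(Context(state, environment))

    def lookup(self, keyword: str) -> list[Context]:
        # .get avoids creating an empty entry for unknown keywords
        return list(self._by_keyword.get(keyword, []))


env_library = EnvironmentStateLibrary()
env_library.store("pressure alarm", state="pump running", environment="ambient 35 C")
```

Keying both this library and the value library by the same keywords lets a later analysis step fetch the values and their context with one lookup key.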
Referring to FIG. 5, FIG. 5 is a structural schematic diagram of the data normalization module in an embodiment of the present invention. The data normalization module 104 includes:
Splitting and filtering unit 411, for splitting and filtering the unstructured data to obtain independent data fields;
Data classification unit 412, for classifying the independent data fields to obtain classified data fields;
Format conversion unit 413, for converting the format of the classified data fields to obtain normalized data in a uniform format.
Referring to FIG. 6, FIG. 6 is a structural schematic diagram of the data classification unit in an embodiment of the present invention. The data classification unit 412 includes:
First operation subunit 421, for obtaining the keywords of the independent data fields;
Second operation subunit 422, for deduplicating, associating, and enhancing the keywords of the independent data fields;
Third operation subunit 423, for matching the keywords of the independent data fields one by one against the keywords in the keyword library to obtain classified data fields.
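The work of the second and third subunits can be sketched as follows. The source does not define "association" or "enhancement"; this example interprets association as mapping aliases to a canonical keyword through a hypothetical `ALIASES` table and leaves enhancement out. Aliases are resolved before deduplication so that an alias and its canonical form collapse into one keyword, which is a design choice of the sketch.

```python
# Hypothetical alias table used for the "association" step.
ALIASES = {"temp": "temperature", "loc": "location"}


def prepare_keywords(keywords: list[str]) -> list[str]:
    """Canonicalize keywords, then remove duplicates while keeping order."""
    canonical = []
    for keyword in keywords:
        cleaned = keyword.strip().lower()
        canonical.append(ALIASES.get(cleaned, cleaned))
    return list(dict.fromkeys(canonical))  # dict keys preserve insertion order


def classify(keywords: list[str], keyword_library: set[str]) -> tuple[list[str], list[str]]:
    """Match prepared keywords one by one against the keyword library."""
    prepared = prepare_keywords(keywords)
    matched = [k for k in prepared if k in keyword_library]
    unmatched = [k for k in prepared if k not in keyword_library]
    return matched, unmatched


matched, unmatched = classify(
    ["Temp", "temperature", "LOC", "operator"],
    {"temperature", "location"},
)
```

Keywords that end up in `unmatched` are candidates for being added to the keyword library, as the keyword judgment module describes.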
For other implementation details and beneficial effects of the system of unstructured data processing in the embodiments of the present invention, refer to the method embodiments of unstructured data processing described above; they are not repeated here.
The optional implementations of the embodiments of the present invention are described in detail above with reference to the accompanying drawings. However, the embodiments of the present invention are not limited to the specific details of the above implementations. Within the scope of the technical concept of the embodiments of the present invention, a variety of simple variations can be made to the technical solutions of the embodiments, and these simple variations all fall within the protection scope of the embodiments of the present invention.
In addition, an embodiment of the present invention also provides a computer device whose processor executes computer instructions to implement the following method:
Obtaining unstructured data, where the unstructured data is acquired by manual entry or by electronic equipment; obtaining the keywords of the unstructured data from the unstructured data, which includes extracting keywords from the unstructured data according to preset parsing rules; judging whether a keyword already exists in the keyword library and, if the keyword does not exist in the keyword library, adding the keyword to the keyword library; normalizing the unstructured data to obtain normalized data in a uniform format, and storing the normalized data in the normalized value library corresponding to the keyword library; converting the format of the normalized data according to user demand and outputting the converted data.
Those skilled in the art will understand that all or part of the processes in the above method embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like. A computer processor executes the computer program stored in the storage medium to implement the following method:
Obtaining unstructured data, where the unstructured data is acquired by manual entry or by electronic equipment; obtaining the keywords of the unstructured data from the unstructured data, which includes extracting keywords from the unstructured data according to preset parsing rules; judging whether a keyword already exists in the keyword library and, if the keyword does not exist in the keyword library, adding the keyword to the keyword library; normalizing the unstructured data to obtain normalized data in a uniform format, and storing the normalized data in the normalized value library corresponding to the keyword library; converting the format of the normalized data according to user demand and outputting the converted data.
What has been described above is only an embodiment of the present invention; common knowledge such as well-known specific structures and characteristics is not described in detail here. It should be pointed out that those skilled in the art can make several modifications and improvements without departing from the structure of the invention. These should also be regarded as falling within the protection scope of the present invention, and they do not affect the effect of implementing the invention or the practicability of the patent. The scope of protection claimed by this application is defined by the content of the claims, and the specific embodiments and other records in the specification can be used to interpret the content of the claims.
Claims (10)
1. A method of unstructured data processing, characterized in that the method comprises:
obtaining unstructured data, wherein the unstructured data is acquired by manual entry or by electronic equipment;
obtaining keywords of the unstructured data from the unstructured data, which includes extracting keywords from the unstructured data according to preset parsing rules;
judging whether the keyword already exists in a keyword library and, if the keyword does not exist in the keyword library, adding the keyword to the keyword library;
normalizing the unstructured data to obtain normalized data in a uniform format, and storing the normalized data in a normalized value library corresponding to the keyword library;
converting the format of the normalized data according to user demand to obtain format-converted data, and outputting the format-converted data.
2. the method for unstructured data processing as described in claim 1, which is characterized in that obtain unstructured number described
According to later further include:
Obtain relevant to unstructured data status information and environmental information, the relevant state of the unstructured data
Information and environmental information are corresponding with the keyword;
By the status information and environmental information storage into environmental state information corresponding with key word library library.
3. the method for unstructured data as described in claim 1 processing, which is characterized in that the unstructured data into
Row normalized obtains the normalization data of uniform format specifically:
The unstructured data is split and is filtered, independent data field is obtained;
Classify to the independent data field, obtains classification data field;
The classification data field is subjected to format conversion, obtains the normalization data of uniform format.
4. the method for unstructured data processing as claimed in claim 3, which is characterized in that described to the independent data word
Duan Jinhang classification, obtains classification data field and specifically includes:
Obtain the keyword of the independent data field;
The keyword of the independent data field is subjected to duplicate removal, association and enhancing processing;
The keyword of the independent data field is matched one by one with the keyword in the key word library, obtains institute
State classification data field.
5. A system of unstructured data processing, characterized in that the system comprises:
a data acquisition module, for obtaining unstructured data, wherein the unstructured data is acquired by manual entry or by electronic equipment;
a keyword obtaining module, for obtaining keywords of the unstructured data from the unstructured data, which includes extracting keywords from the unstructured data according to preset parsing rules;
a keyword judgment module, for judging whether the keyword already exists in a keyword library and, if the keyword does not exist in the keyword library, adding the keyword to the keyword library;
a data normalization module, for normalizing the unstructured data to obtain normalized data in a uniform format, and storing the normalized data in a normalized value library corresponding to the keyword library;
a data conversion output module, for converting the format of the normalized data according to user demand to obtain format-converted data, and outputting the format-converted data.
6. The system of unstructured data processing according to claim 5, characterized in that the system further comprises:
a relevant information obtaining module, for obtaining status information and environmental information related to the unstructured data, wherein the status information and environmental information related to the unstructured data correspond to the keyword;
a relevant information storage module, for storing the status information and the environmental information in an environmental state information library corresponding to the keyword library.
7. The system of unstructured data processing according to claim 6, characterized in that the data normalization module comprises:
a splitting and filtering unit, for splitting and filtering the unstructured data to obtain independent data fields;
a data classification unit, for classifying the independent data fields to obtain classified data fields;
a format conversion unit, for converting the format of the classified data fields to obtain the normalized data in a uniform format.
8. The system of unstructured data processing according to claim 6, characterized in that the data classification unit comprises:
a first operation subunit, for obtaining the keywords of the independent data fields;
a second operation subunit, for deduplicating, associating, and enhancing the keywords of the independent data fields;
a third operation subunit, for matching the keywords of the independent data fields one by one against the keywords in the keyword library to obtain the classified data fields.
9. A computing terminal, characterized in that the computing terminal comprises a processor, and the processor is configured to execute a computer program stored in a memory to implement the method of unstructured data processing according to any one of claims 1-4.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the storage medium is configured to store the computer program to implement the method of unstructured data processing according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910709505.8A CN110442671A (en) | 2019-08-02 | 2019-08-02 | A kind of method and system of unstructured data processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910709505.8A CN110442671A (en) | 2019-08-02 | 2019-08-02 | A kind of method and system of unstructured data processing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110442671A (en) | 2019-11-12 |
Family
ID=68432928
Family Applications (1)
Application Number | Title | Priority Date | Filing Date | Status |
---|---|---|---|---|
CN201910709505.8A (published as CN110442671A (en)) | A kind of method and system of unstructured data processing | 2019-08-02 | 2019-08-02 | Pending |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110442671A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111241177A (en) * | 2019-12-31 | 2020-06-05 | 中国联合网络通信集团有限公司 | Data acquisition method, system and network equipment |
CN113703409A (en) * | 2021-08-31 | 2021-11-26 | 中冶华天南京工程技术有限公司 | Belt flow data acquisition and control system for iron and steel enterprise |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060200478A1 (en) * | 2005-03-02 | 2006-09-07 | Egon Pasztor | Generating structured information |
CN104239506A (en) * | 2014-09-12 | 2014-12-24 | 北京优特捷信息技术有限公司 | Unstructured data processing method and device |
CN105183916A (en) * | 2015-10-16 | 2015-12-23 | 辽宁工程技术大学 | Device and method for managing unstructured data |
CN109033319A (en) * | 2018-07-18 | 2018-12-18 | 长扬科技(北京)有限公司 | A kind of big data log method for normalizing and tool |
2019-08-02: CN application CN201910709505.8A, published as CN110442671A (en), status: Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Thom et al. | Spatiotemporal anomaly detection through visual analysis of geolocated twitter messages | |
CN110428091B (en) | Risk identification method based on data analysis and related equipment | |
CN110740141A (en) | integration network security situation perception method, device and computer equipment | |
CN110275965B (en) | False news detection method, electronic device and computer readable storage medium | |
US11221904B2 (en) | Log analysis system, log analysis method, and log analysis program | |
CN109582551A (en) | Daily record data analytic method, device, computer equipment and storage medium | |
US20180349250A1 (en) | Content-level anomaly detector for systems with limited memory | |
CN111045847A (en) | Event auditing method and device, terminal equipment and storage medium | |
US20150205862A1 (en) | Method and device for recognizing and labeling peaks, increases, or abnormal or exceptional variations in the throughput of a stream of digital documents | |
CN113254255B (en) | Cloud platform log analysis method, system, device and medium | |
CN110442671A (en) | A kind of method and system of unstructured data processing | |
CN106202126B (en) | A kind of data analysing method and device for logistics monitoring | |
CN107526820A (en) | A kind of more storehouse enterprise innovation monitoring big data normal data base construction methods of multi-source | |
CN114896305A (en) | Smart internet security platform based on big data technology | |
CN111866196A (en) | Domain name traffic characteristic extraction method, device, equipment and readable storage medium | |
US20210350160A1 (en) | System And Method For An Activity Based Intelligence Contextualizer | |
CN110889451B (en) | Event auditing method, device, terminal equipment and storage medium | |
CN112433874A (en) | Fault positioning method, system, electronic equipment and storage medium | |
CN111753070A (en) | System and method for processing server monitoring log | |
Apostol et al. | ContCommRTD: A distributed content-based misinformation-aware community detection system for real-time disaster reporting | |
CN110874366A (en) | Data processing and query method and device | |
CN113015171A (en) | System with network public opinion monitoring and analyzing functions | |
Girish et al. | Extreme event detection and management using twitter data analysis | |
CN112306820A (en) | Log operation and maintenance root cause analysis method and device, electronic equipment and storage medium | |
CN113139043A (en) | Question and answer sample generation method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||