CN104239506A - Unstructured data processing method and device - Google Patents

Info

Publication number
CN104239506A
Authority
CN
China
Prior art keywords
data
resolution rules
unstructured data
critical field
user defined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410466111.1A
Other languages
Chinese (zh)
Inventor
陈军
梁玫娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING YOUTEJIE INFORMATION TECHNOLOGY Co Ltd
Original Assignee
BEIJING YOUTEJIE INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING YOUTEJIE INFORMATION TECHNOLOGY Co Ltd
Priority to CN201410466111.1A
Publication of CN104239506A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/258Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an unstructured data processing method and device for converting unstructured data into structured data. The method comprises the following steps: acquiring a parsing rule for extracting key fields from unstructured data; extracting the key fields from the unstructured data by using the parsing rule; and naming each extracted key field with a preset parameter and assigning the extracted key field to that preset parameter as its value, thereby generating structured data. With this technical scheme, unstructured data can be converted into structured data, which facilitates query and statistics and saves computing space and query time.

Description

An unstructured data processing method and device
Technical field
The present invention relates to the technical field of unstructured data processing, and in particular to an unstructured data processing method and device.
Background
Today, with information technology developing rapidly, people generate large amounts of digital information in all kinds of social and economic activity. Enterprise IT infrastructure keeps growing in scale, and IT monitoring and operation-and-maintenance systems are widely used. At the same time, the data produced by sensors and smart appliances, and by transaction systems (securities trading systems, e-commerce transaction systems), is huge in volume and varied in format, and is therefore difficult to put to use.
Unstructured data is text information generated by computers or by people. The data in it does not necessarily follow a standard data structure (such as the rows and columns of a defined schema) and is not easy for computer programs to understand and use directly. Once unstructured data has been converted into structured data, it can be stored in systems such as search engines, SQL (Structured Query Language) databases, and NoSQL (Not Only SQL, non-relational) databases for further analysis. Most business intelligence (Business Intelligence) software can only analyze structured data held in a database. Oracle Database, for example, combines intelligent data types and optimized data structures by means of operators in order to analyze and manipulate unstructured data such as XML (Extensible Markup Language) documents, multimedia content, text, and geospatial information.
Unstructured data takes many forms. Documents, images, and media content stored in files can be catalogued and referenced from a database by a "pointer-based" method. Semi-structured data can be organized and stored in XML, with different categories of information kept in different XML nodes, but query efficiency is low and query statistics have to be computed through XPath (XML Path Language). A further drawback of storing unstructured data in a database is that a schema, that is, the database table structure, must be defined in advance and is hard to change afterwards, which makes the approach inflexible and unable to adapt to today's varied unstructured data.
Unstructured data of every kind shares these characteristics, and therefore also suffers from the problems described above: it is hard to query and aggregate, and hard to store.
Summary of the invention
To overcome the problems in the related art, embodiments of the present invention provide an unstructured data processing method and device for converting unstructured data into structured data.
According to a first aspect of the embodiments of the present invention, an unstructured data processing method is provided, comprising:
obtaining a parsing rule for extracting key fields from unstructured data;
extracting the key fields from the unstructured data by using the parsing rule;
naming each extracted key field with a preset parameter, and assigning the extracted key field to the preset parameter as its value, to generate structured data.
In one embodiment, obtaining the parsing rule for extracting key fields from unstructured data comprises: searching for a user-defined parsing rule according to information about the application that generated the unstructured data;
extracting the key fields from the unstructured data by using the parsing rule comprises: extracting the key fields from the unstructured data by using the user-defined parsing rule; when the user-defined parsing rule is not found, or the user-defined parsing rule does not match the unstructured data, searching for a system built-in parsing rule; and extracting the key fields from the unstructured data by using the system built-in parsing rule.
In one embodiment, searching for a user-defined parsing rule according to information about the application that generated the unstructured data comprises: according to that application information, searching for a user-defined parsing rule configured in advance for the unstructured data;
extracting the key fields from the unstructured data by using the user-defined parsing rule comprises: extracting the key fields by using the user-defined parsing rule configured in advance for the unstructured data.
In one embodiment, extracting the key fields from the unstructured data by using the user-defined parsing rule comprises:
when there are multiple user-defined parsing rules, using each user-defined parsing rule in turn to extract the key fields from the unstructured data.
In one embodiment, extracting the key fields from the unstructured data by using the system built-in parsing rule comprises:
when there are multiple system built-in parsing rules, using each system built-in parsing rule in turn to extract the key fields from the unstructured data.
In one embodiment, the method further comprises:
judging whether the value of a preset parameter in the structured data meets a preset alarm condition;
when the value of the preset parameter in the structured data meets the preset alarm condition, sending an alarm and/or blocking the operation corresponding to the preset parameter.
In one embodiment, the method further comprises:
through a data exchange interface, searching a third-party database for data that matches the structured data, where the data in the third-party database comes from the same source as the structured data; or, through a data exchange interface, importing the data in a third-party database, where the data in the third-party database comes from the same source as the structured data, and searching the imported data for data that matches the structured data;
performing visualization processing on the data that matches the structured data.
In one embodiment, the method further comprises:
importing the structured data into a third-party database, so as to update the data in the third-party database.
According to a second aspect of the embodiments of the present invention, an unstructured data processing device is provided, comprising:
a rule acquisition module, configured to obtain a parsing rule for extracting key fields from unstructured data;
a field extraction module, configured to extract the key fields from the unstructured data by using the parsing rule;
a data generation module, configured to name each extracted key field with a preset parameter and assign the extracted key field to the preset parameter as its value, to generate structured data.
In one embodiment, the rule acquisition module may comprise:
a first search submodule, configured to search for a user-defined parsing rule according to information about the application that generated the unstructured data.
The field extraction module comprises:
a first extraction submodule, configured to extract the key fields from the unstructured data by using the user-defined parsing rule;
a second search submodule, configured to search for a system built-in parsing rule when the first search submodule does not find the user-defined parsing rule or the user-defined parsing rule does not match the unstructured data;
a second extraction submodule, configured to extract the key fields from the unstructured data by using the system built-in parsing rule.
In one embodiment, the first search submodule may comprise:
a search unit, configured to search, according to information about the application that generated the unstructured data, for a user-defined parsing rule configured in advance for the unstructured data.
The first extraction submodule comprises:
a first extraction unit, configured to extract the key fields from the unstructured data by using the user-defined parsing rule configured in advance for the unstructured data.
In one embodiment, the first extraction submodule may comprise:
a second extraction unit, configured to use each user-defined parsing rule in turn to extract the key fields from the unstructured data when there are multiple user-defined parsing rules.
In one embodiment, the second extraction submodule may comprise:
a third extraction unit, configured to use each system built-in parsing rule in turn to extract the key fields from the unstructured data when there are multiple system built-in parsing rules.
In one embodiment, the device further comprises:
a judging module, configured to judge whether the value of a preset parameter in the structured data meets a preset alarm condition;
a first processing module, configured to send an alarm and/or block the operation corresponding to the preset parameter when the value of the preset parameter in the structured data meets the preset alarm condition.
In one embodiment, the device further comprises:
a first search module, configured to search a third-party database, through a data exchange interface, for data that matches the structured data, where the data in the third-party database comes from the same source as the structured data;
a first import module, configured to import the data in a third-party database through a data exchange interface, where the data in the third-party database comes from the same source as the structured data;
a second search module, configured to search the imported data for data that matches the structured data;
a second processing module, configured to perform visualization processing on the data that matches the structured data.
In one embodiment, the device further comprises:
a second import module, configured to import the structured data into a third-party database, so as to update the data in the third-party database.
The technical scheme provided by the embodiments of the present invention can have the following beneficial effects:
The method provided by the embodiments of the present invention converts unstructured data into structured data, which facilitates query and statistics and saves computing space and query time. Once converted, the structured data can be imported into other systems in real time, in batches, or as a real-time stream. It can also be stored in systems such as search engines, SQL databases, and NoSQL databases, used for data visualization, or analyzed by business intelligence (Business Intelligence) software.
It should be understood that the general description above and the detailed description below are only exemplary and explanatory, and do not limit the present invention.
Brief description of the drawings
The accompanying drawings are incorporated into and form part of this specification. They show embodiments consistent with the present invention and, together with the specification, serve to explain its principles.
Fig. 1 is a flowchart of an unstructured data processing method provided by an embodiment of the present invention.
Fig. 2 is a flowchart of an unstructured data processing method provided by specific embodiment one.
Fig. 3 is a structural diagram of an unstructured data processing device provided by an embodiment of the present invention.
Fig. 4 is a structural diagram of another unstructured data processing device provided by an embodiment of the present invention.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.
Fig. 1 is a flowchart of an unstructured data processing method according to an exemplary embodiment. The method can be applied to a data processing device or a data processor. As shown in Fig. 1, the method comprises the following steps S101-S103:
Step S101: obtain a parsing rule for extracting key fields from unstructured data.
The parsing rule may be a regular expression rule, or a rule of any other form that can extract key fields from unstructured data. The parsing rule defines the operations by which key fields are extracted from the unstructured data.
Step S102: extract the key fields from the unstructured data by using the parsing rule.
Step S103: name each extracted key field with a preset parameter, and assign the extracted key field to the preset parameter as its value, to generate structured data.
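Steps S101-S103 can be illustrated with a minimal Python sketch. It assumes the parsing rule is a regular expression whose named groups serve as the preset parameters; the rule, the field names `user` and `action`, and the sample text are hypothetical and do not come from the patent.

```python
import re
from typing import Optional

# S101: obtain a parsing rule. Here it is a hypothetical regular expression
# whose named groups act as the preset parameter names (an assumption).
PARSING_RULE = re.compile(r"user=(?P<user>\w+) action=(?P<action>\w+)")


def to_structured(raw: str, rule: "re.Pattern[str]" = PARSING_RULE) -> Optional[dict]:
    """S102-S103: extract key fields and assign them to preset parameters."""
    match = rule.search(raw)
    if match is None:
        # The rule does not match this piece of unstructured data.
        return None
    # groupdict() maps each preset parameter name to its extracted key field.
    return match.groupdict()


record = to_structured("2014-09-12 session opened user=alice action=login")
```

Using named groups keeps the naming step (S103) inside the rule itself; a rule format that stores names separately from patterns would work equally well.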
In the method above, the parsing rules comprise user-defined parsing rules, which the user defines in advance, and system built-in parsing rules, which are preconfigured in the system. To improve parsing efficiency, the method can first use the user-defined parsing rules to extract the key fields from the unstructured data and, when the user-defined parsing rules cannot extract the key fields successfully, then use the system built-in parsing rules. Subsequently, an index can be built on the extracted key fields to provide a search service, or the fields can be stored in a database to provide a query service. Once the unstructured data has been converted into structured data, it can be supplied to business intelligence (Business Intelligence) software for analysis and data visualization. See the following specific embodiment for details:
Embodiment one
In embodiment one, the user-defined parsing rules are used first to extract the key fields from the unstructured data; when they cannot extract the key fields successfully, the system built-in parsing rules are used instead. As shown in Fig. 2, the method comprises:
Step S201: search for a user-defined parsing rule according to information about the application that generated the unstructured data (this is one implementation of step S101 above). When a user-defined parsing rule is found, continue with step S202; when none is found, continue with step S203.
The application information may be an identifier of the application, such as the application name.
Step S202: extract the key fields from the unstructured data by using the user-defined parsing rule, then continue with step S205.
In one embodiment, step S201 may be implemented as: according to information about the application that generated the unstructured data, search for a user-defined parsing rule configured in advance for the unstructured data. In that case, step S202 may be implemented as: extract the key fields from the unstructured data by using the user-defined parsing rule configured in advance for the unstructured data. The benefit of doing so is improved extraction efficiency.
In another embodiment, step S202 may also be implemented as: when there are multiple user-defined parsing rules, use each user-defined parsing rule in turn to extract the key fields from the unstructured data.
Step S203: when no user-defined parsing rule is found, or the user-defined parsing rule does not match the unstructured data (that is, the key fields cannot be extracted successfully with the user-defined parsing rule), search for a system built-in parsing rule, then continue with step S204.
Step S204: extract the key fields from the unstructured data by using the system built-in parsing rule (steps S202-S204 are one implementation of step S102 above), then continue with step S205.
In one embodiment, step S204 may be implemented as: when there are multiple system built-in parsing rules, use each system built-in parsing rule in turn to extract the key fields from the unstructured data.
Step S205: name each extracted key field with a preset parameter, and assign the extracted key field to the preset parameter as its value, to generate structured data, then continue with step S206.
For example: name the extracted key field field_name and establish the correspondence field_name=field_value (that is, assign field_value to field_name), where field_value is the extracted key field. This generates the structured data.
Step S206: apply the structured data, for example by building an index on the extracted key fields to provide a search service, or by storing the data in a database to provide a query service.
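The lookup and fallback order of steps S201-S205 can be sketched as follows. This is only an illustration under stated assumptions: the rule stores, the application name `billing`, and all patterns are hypothetical, and "use each rule in turn" is interpreted here as stopping at the first rule that matches, which the patent does not specify.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rule stores; application names and patterns are assumptions.
USER_DEFINED_RULES: Dict[str, List["re.Pattern[str]"]] = {
    "billing": [re.compile(r"order=(?P<order_id>\d+) amount=(?P<amount>[\d.]+)")],
}
BUILT_IN_RULES: List["re.Pattern[str]"] = [
    re.compile(r"(?P<level>INFO|WARN|ERROR) (?P<message>.+)"),
]


def try_rules(raw: str, rules: List["re.Pattern[str]"]) -> Optional[dict]:
    """Apply each rule in turn; return the first successful extraction."""
    for rule in rules:
        match = rule.search(raw)
        if match:
            return match.groupdict()
    return None


def parse(app_name: str, raw: str) -> Optional[dict]:
    # S201/S202: look up and apply the user-defined rules for this application.
    fields = try_rules(raw, USER_DEFINED_RULES.get(app_name, []))
    if fields is None:
        # S203/S204: no user-defined rule found or none matched, so fall back
        # to the system built-in rules.
        fields = try_rules(raw, BUILT_IN_RULES)
    # S205: the result maps each preset parameter to its extracted key field.
    return fields
```

For instance, `parse("billing", "order=42 amount=9.50")` is handled by the user-defined rule, while `parse("billing", "ERROR disk full")` falls through to the built-in rule.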
For example, consider the following piece of unstructured data from an Apache server:
114.249.30.56 - - [12/Sep/2011:21:00:42 +0800] "GET /zabbix/images/gradients/button.gif HTTP/1.1" 200 1706 "http://map.so.com/?ie=utf-8&t=map&k=%E5%8D%97%E6%98%8C%E6%B4%AA%E9%83%BD%E5%A4%A7%E5%B8%82&c=%EF%BF%BD%D0%B9%EF%BF%BD&src=360se6_search" "Mozilla/5.0 (Windows; U; Windows NT 6.1;) AppleWebKit/534.12 (KHTML, like Gecko) Maxthon/3.0 Safari/534.12"
The key fields extracted by the method provided by the embodiments of the present invention are listed below. In each line, the text in quotation marks before the colon is the preset parameter used to name the extracted key field, and the text in quotation marks after the colon is the extracted key field itself:
{
"clientip":"114.249.30.56",
"ident":"-",
"auth":"-",
"timestamp":"12/Sep/2011:21:00:42 +0800",
"verb":"GET",
"request":"/zabbix/images/gradients/button.gif",
"httpversion":"1.1",
"response":"200",
"bytes":"1706",
"referrer":"\"http://map.so.com/?ie=utf-8&t=map&k=%E5%8D%97%E6%98%8C%E6%B4%AA%E9%83%BD%E5%A4%A7%E5%B8%82&c=%EF%BF%BD%D0%B9%EF%BF%BD&src=360se6_search\"",
"agent":"\"Mozilla/5.0 (Windows; U; Windows NT 6.1;) AppleWebKit/534.12 (KHTML, like Gecko) Maxthon/3.0 Safari/534.12\""
}
As can be seen, the unstructured data of this Apache server has been converted into structured data.
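A conversion of this kind could be sketched in Python with a single regular expression for the Apache combined log format. The pattern below is an assumption written for illustration, not a rule taken from the patent. The sample line uses a shortened, hypothetical referrer, and unlike the listing above the sketch captures the referrer and agent without their surrounding quotation marks.

```python
import re

# Hypothetical parsing rule for the Apache combined log format; the named
# groups reuse the preset parameter names from the listing above.
APACHE_COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Sample line with a shortened referrer (an assumption for readability).
line = (
    '114.249.30.56 - - [12/Sep/2011:21:00:42 +0800] '
    '"GET /zabbix/images/gradients/button.gif HTTP/1.1" 200 1706 '
    '"http://map.so.com/?ie=utf-8" '
    '"Mozilla/5.0 (Windows; U; Windows NT 6.1;) AppleWebKit/534.12 '
    '(KHTML, like Gecko) Maxthon/3.0 Safari/534.12"'
)

match = APACHE_COMBINED.match(line)
# Each preset parameter is assigned the key field extracted for it.
fields = match.groupdict() if match else {}
```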
The method provided by the embodiments of the present invention converts unstructured data into structured data, which facilitates query and statistics and saves computing space and query time. Once converted, the structured data can be imported into other systems in real time, in batches, or as a real-time stream. It can also be stored in systems such as search engines, SQL databases, and NoSQL databases, used for data visualization, or analyzed by business intelligence (Business Intelligence) software.
The unstructured data described in the embodiments of the present invention can be unstructured data of any form, such as unstructured logs.
The structured data generated by the embodiments of the present invention can be applied in various data application systems. In that case, the unstructured data processing method may further comprise the following steps A1-A2:
Step A1: judge whether the value of a preset parameter in the structured data meets a preset alarm condition.
Step A2: when the value of the preset parameter in the structured data meets the preset alarm condition, send an alarm and/or block the operation corresponding to the preset parameter.
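Steps A1-A2 can be sketched as a table of conditions keyed by preset parameter. Everything in the sketch is an assumption for illustration: the `response` parameter, the threshold of 500, and the `send_alarm` and `block` callbacks are hypothetical stand-ins for whatever alarm and blocking mechanisms a real system provides.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical preset alarm conditions, keyed by preset parameter name.
ALARM_CONDITIONS: Dict[str, Callable[[str], bool]] = {
    "response": lambda value: int(value) >= 500,
}


def check_alarms(record: Dict[str, str]) -> List[str]:
    """Step A1: list the preset parameters whose values meet an alarm condition."""
    return [
        name
        for name, condition in ALARM_CONDITIONS.items()
        if name in record and condition(record[name])
    ]


def handle(
    record: Dict[str, str],
    send_alarm: Callable[[str], None] = print,
    block: Optional[Callable[[str], None]] = None,
) -> List[str]:
    """Step A2: send an alarm and, if a blocking means exists, block the operation."""
    triggered = check_alarms(record)
    for name in triggered:
        send_alarm(f"alarm: {name}={record[name]}")
        if block is not None:
            block(name)
    return triggered
```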
Steps S101-S103 and A1-A2 can be performed in real time. That is, each time a piece of unstructured data is generated, steps S101-S103 are performed immediately to convert it into structured data, and steps A1-A2 are then performed immediately. Data processing and safe operation are thus achieved in real time and intelligently, and the method can be applied to various information systems to realize different functions.
The method is described below for information systems with different characteristics.
Internet of Vehicles
The method provided by the embodiments of the present invention can be used for the Internet of Vehicles, which is introduced first. The Internet of Vehicles is a product of the mobile Internet era: cars are connected to the network, and each car is fitted with a large number of sensors and microprocessors that generate huge amounts of data over time. From receiving driving data, to sending the data for analysis, to feeding the results back to the car owner, the Internet of Vehicles has massive data available to it. Every group of data uploaded by a vehicle carries position and time information, can be regarded as time series data, and readily accumulates into massive data. Many readings, such as engine speed and axle rotation, repeat to some extent in their numerical values, but the position and time at which they were generated differ; position and time information is therefore an important component of Internet of Vehicles data. If the data is complete and accurate, the driving behavior of the driver can be analyzed. On-board diagnostics (OBD) products are already on the market; their main task is to provide the car's electronic control unit (ECU) with data such as engine and ambient temperature, vehicle speed, and intake air volume. Through the OBD interface, data including engine faults, automotive electronic circuits, tire pressure, and in-car air quality can be obtained. The shortcomings of OBD-based Internet of Vehicles products and ordinary GPS products are that their functions are not intelligent enough and their real-time performance is insufficient, so information cannot be processed and analysis results fed back to the user in time. For a driver, knowing the condition of the vehicle is not enough; understanding one's own driving habits is also crucial. Drivers want to receive safety reminders in real time while driving.
Therefore, so that drivers can receive safety reminders in real time, the embodiments of the present invention can build on the foregoing method and use the generated structured data to deliver safety reminders to the driver. In this case, the unstructured data in the method can be the data produced by the Internet of Vehicles, and the preset parameters in the generated structured data can be defined as any one or more items of driving data such as vehicle speed, continuous driving duration, driving behavior parameters, mileage, and vehicle condition. Each preset parameter corresponds to a preset alarm condition that is set in advance. The setting process may consist of analyzing historical Internet of Vehicles data beforehand, deriving from it reference values of driving data that define whether the driving state is safe, and determining the preset alarm conditions from those reference values. Whether to send an alarm is decided by whether the value of a preset parameter in the structured data meets its preset alarm condition. Examples follow:
When the preset parameter is the vehicle speed, the preset alarm condition may be that the current speed exceeds a preset safe speed by 20%, or some other condition. In the safety alarm operation, it is first judged whether the current speed meets the preset alarm condition; if it does, an alarm is sent to inform the driver of speeding and remind the driver to slow down.
When the preset parameter is the continuous driving duration, the preset alarm condition may be that the current continuous driving duration is equal to or greater than a preset maximum duration for fatigue driving, or some other condition. In the safety alarm operation, it is first judged whether the current continuous driving duration meets the preset alarm condition; if it does, an alarm is sent to inform the driver that he or she is driving while fatigued and to remind the driver to stop and rest.
When the preset parameter is a driving behavior parameter, it may specifically be the hard-braking situation, sharp-turn situation, rapid-acceleration situation, speeding behavior, current location, and so on. A corresponding preset alarm condition can be set in advance for each driving behavior parameter. For hard braking, for example, the preset alarm condition may be that the current hard-braking frequency is equal to or greater than a hard-braking frequency threshold; when it is, an alarm is sent to inform the driver that he or she is braking hard too frequently and should take care.
When the preset parameter is the vehicle condition, it may specifically be the remaining gasoline, the remaining engine oil, the running condition of components, and so on. When the values of these parameters meet the preset alarm conditions, an alarm is sent to remind the driver to refuel, add engine oil, or carry out maintenance and repairs in time.
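The three threshold-style conditions described above could be expressed as small predicates, as in the following sketch. The function names, units, and sample values are hypothetical; only the 20% margin and the "equal to or greater than" comparisons come from the text.

```python
def speed_alarm(current_speed: float, safe_speed: float, margin: float = 0.20) -> bool:
    """True when the current speed exceeds the preset safe speed by more than the margin."""
    return current_speed > safe_speed * (1 + margin)


def fatigue_alarm(continuous_minutes: float, max_minutes: float) -> bool:
    """True when continuous driving time reaches the preset fatigue-driving maximum."""
    return continuous_minutes >= max_minutes


def hard_braking_alarm(frequency: float, threshold: float) -> bool:
    """True when the hard-braking frequency reaches the preset threshold."""
    return frequency >= threshold


# Hypothetical structured record generated from Internet of Vehicles data.
record = {"speed": 130.0, "continuous_minutes": 90.0, "hard_brakes_per_hour": 2.0}
alarms = {
    "speeding": speed_alarm(record["speed"], safe_speed=100.0),
    "fatigue": fatigue_alarm(record["continuous_minutes"], max_minutes=240.0),
    "hard_braking": hard_braking_alarm(record["hard_brakes_per_hour"], threshold=5.0),
}
```

In practice the reference values (`safe_speed`, `max_minutes`, `threshold`) would be derived from historical data, as the text describes, rather than hard-coded.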
The method brings convenience to drivers. If dangerous driving behavior such as fatigue driving or speeding occurs while the vehicle is moving, the driver receives a reminder in real time. Driving safety is monitored in real time, and warning information is sent to the user when the driving time or driving behavior is abnormal. The present invention also monitors the condition of the vehicle in real time and reminds the driver to carry out inspection and maintenance in time, before vehicle components fail.
Auditing system
At present, having a lot of industry field as government affairs department, financial department etc., whether meet internal security standard and process requirements, identify potential security risk etc. if all applying auditing system to supervise operation in industry.Common flow process is the daily record of work (as operation behavior daily record, admin log etc.) of each information equipment in register system, by monitoring the daily record of work of each information equipment in system, thus independent evaluations can be made to the compliance of internal process and security, effectively avoid the loss that system or human error cause, and guarantee the reliability of carrying out system needed for operational decision making in time.At present, normally carry out manual audit by safety manager or auditor, therefore, generally need the daily record data of each information equipment of centralized collection, secondly the risk of instrument or artificial cognition behavior is passed through, the behaviorist risk gone out by artificial cognition is advised provision, safe operation flow process or the management system of auditing with conjunction and is compared, the violation event of auditing out real.This manual audit's mode needs to drop into a large amount of human resources and system resource for the collection of data and risk identification, expend energy on is also needed to carry out closing the comparison of rule requirement, be easy to occur leaking problems such as examining, mistake is examined, thus audit of information security accurately can not be accomplished in management, there is administrative vulnerability.And be timing, focus on, real-time is poor, can not Timeliness coverage problem.
Meet the good conjunction ruleization audit technique of real-time, intellectuality and security to provide a kind of to user, the said method that the embodiment of the present invention provides can be applied to auditing system simultaneously:
The non-structured daily record of work that each information equipment in auditing system produces is converted to structural data, and recycling structural data realizes the audit process of intelligence.
Now, unstructured data in said method can be the daily record of work that in auditing system, each information equipment produces, the structural data utilizing said method to generate is structuring daily record, and parameter preset wherein can be defined as any one in the running parameters such as operand, running time, operation place, action type, authorization mechanism, DB amount or multinomial; Wherein, the corresponding default alarm conditions of each parameter preset, these default alarm conditions are what pre-set, setting up procedure can be make default alarm conditions according to historical auditing data, conjunction rule provision, safe operation flow process or management system etc., when the value of parameter preset meets default alarm conditions, sends alarm, alarm does not conform to the behavior of rule for informing, meanwhile, if there are the means blocking and do not conform to rule behavior, this means can also be started simultaneously.Illustrate below:
When the preset parameter is the operation object, suppose the corresponding preset alarm condition is that the operation object is not a preset operation object. If the value of the preset parameter shows that the current operation object is not a preset operation object, an alarm can be sent to report that the current operation object is not a legitimate operation object, and the current operation object can be prevented from continuing to operate. When the preset parameter is the data amount, suppose the corresponding preset alarm condition is that the data amount is equal to or greater than a preset data amount. If the value of the preset parameter shows that the current data amount is equal to or greater than the preset data amount, an alarm can be sent to report that the current data amount exceeds the limit, and further operations on the data amount can be prevented.
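The two alarm conditions described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `AlarmRule`, `ALLOWED_OBJECTS`, `AMOUNT_LIMIT`, and `audit`, as well as the concrete values, are hypothetical and do not appear in the patent text.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical configuration: which operation objects are legitimate and
# which data amount triggers an alarm. Both values are made up.
ALLOWED_OBJECTS = {"alice", "bob"}
AMOUNT_LIMIT = 10_000


@dataclass
class AlarmRule:
    parameter: str                     # name of the preset parameter
    condition: Callable[[Any], bool]   # True when the alarm condition is met
    message: str                       # text reported with the alarm
    block: bool = False                # whether the operation should be blocked


RULES = [
    AlarmRule("operation_object",
              lambda v: v not in ALLOWED_OBJECTS,
              "operation object is not a legitimate operation object",
              block=True),
    AlarmRule("data_amount",
              lambda v: v >= AMOUNT_LIMIT,
              "data amount exceeds the limit",
              block=True),
]


def audit(record: Dict[str, Any], rules: List[AlarmRule] = RULES) -> List[dict]:
    """Return one alarm per preset parameter whose alarm condition is met."""
    alarms = []
    for rule in rules:
        if rule.parameter in record and rule.condition(record[rule.parameter]):
            alarms.append({"parameter": rule.parameter,
                           "message": rule.message,
                           "block": rule.block})
    return alarms
```

A caller would pass each structured log record to `audit` and, for every returned alarm whose `block` flag is set, invoke whatever blocking means the deployment provides.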
Applying the above method in an audit system can greatly improve audit efficiency and save time and labor cost. It also enables real-time auditing, which is far more timely than the current scheduled auditing, and violating operations can be cut off automatically when a violation is found.
In addition, the structured data generated by the embodiment of the present invention can also be used to exchange data with a third-party database. In this case, after steps S101-S103 have been carried out, the above unstructured data processing method may further be implemented in either of the following two modes:
Mode one (comprising steps B1-B2)
Step B1: through a data exchange interface, search the third-party database for data matching the structured data, where the data in the third-party database have the same source as the structured data;
Step B2: perform visualization processing on the data matching the structured data.
Mode two (comprising steps C1-C3)
Step C1: through a data exchange interface, import the data in the third-party database, where the data in the third-party database have the same source as the structured data;
Step C2: search the imported data for data matching the structured data;
Step C3: perform visualization processing on the data matching the structured data.
In the above method, the data exchange interface may be a RESTful API (Application Programming Interface), and the exchanged data may be in any exchangeable data format, such as JSON (JavaScript Object Notation), XML, CSV (Comma-Separated Values), TSV, or Protocol Buffers.
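As a rough illustration of two of the exchange formats listed above, the following Python sketch serializes one structured record to JSON and to CSV using only the standard library. The record layout (a parameter name mapped to a list of values) and the function names `to_json` and `to_csv` are assumptions made for the example, not something the patent specifies.

```python
import csv
import io
import json

# Illustrative structured record: preset parameter name -> list of values.
record = {"ip": ["124.230.159.131"]}


def to_json(rec: dict) -> str:
    """Serialize the record as a JSON object."""
    return json.dumps(rec, ensure_ascii=False)


def to_csv(rec: dict) -> str:
    """Serialize the record as CSV with one (parameter, value) pair per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["parameter", "value"])
    for name, values in rec.items():
        for value in values:
            writer.writerow([name, value])
    return buf.getvalue()
```

Either string could then be sent through whatever data exchange interface the system exposes; the choice of format only affects how the receiving side parses it.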
For example, the above unstructured data processing method is applied to the Internet, and the generated structured data is: "ip": ["124.230.159.131"]. The preset parameter is the IP address. Through the data exchange interface, the system is therefore docked with a third-party database that has the same source as the structured data, namely a national IP address database. The IP address is either looked up directly in the national IP address database, or the data in the national IP address database are imported into the system and the IP address is then looked up in the imported data; the query result for this IP address might be, for example, "XX Province, XX City, Telecom". For the results of multiple structured-data queries, the system performs statistical analysis or visualization processing on the query results (for example, producing statistics or analysis charts) and presents the results to the user, so that the user only needs to upload data to obtain an analysis report or a visualization result, which greatly improves the user experience.
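The IP address example and the two lookup modes can be sketched as below. The in-memory dictionary `IP_ADDRESS_DB` is a hypothetical stand-in for the third-party national IP address database, and all function names are assumptions; a real system would reach the database through its data exchange interface instead.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical stand-in for the third-party national IP address database.
IP_ADDRESS_DB: Dict[str, str] = {
    "124.230.159.131": "XX Province, XX City, Telecom",
}


def query_remote(ip: str) -> Optional[str]:
    """Mode one, step B1: ask the third-party database for a single match."""
    return IP_ADDRESS_DB.get(ip)


def import_all() -> Dict[str, str]:
    """Mode two, step C1: import the third-party data into the system."""
    return dict(IP_ADDRESS_DB)


def lookup_mode_one(records: List[dict]) -> List[Optional[str]]:
    """Look up every IP address with one remote query per address."""
    return [query_remote(ip) for rec in records for ip in rec["ip"]]


def lookup_mode_two(records: List[dict]) -> List[Optional[str]]:
    """Import once (C1), then search the imported copy locally (C2)."""
    local = import_all()
    return [local.get(ip) for rec in records for ip in rec["ip"]]


def summarize(results: List[Optional[str]]) -> Counter:
    """Count query results per location as input for a chart or report."""
    return Counter(r if r is not None else "unknown" for r in results)
```

Both modes return the same matches; mode one avoids holding a local copy, while mode two trades an up-front import for purely local lookups afterwards.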
In one embodiment, the structured data accumulated in the system may also be imported into the third-party database to update the data in the third-party database, for use by the third-party database.
Corresponding to the above method provided by the embodiment of the present invention, the embodiment of the present invention further provides an unstructured data processing device which, as shown in Figure 3, comprises:
Rule acquisition module 31, configured to obtain a resolution rule for extracting critical fields from unstructured data;
Field extraction module 32, configured to extract the critical fields from the unstructured data using the resolution rule;
Data generation module 33, configured to name each extracted critical field as a preset parameter, assign the extracted critical field as the value of the preset parameter, and generate structured data.
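The three modules can be pictured as three small functions. The sketch below assumes, purely for illustration, that a resolution rule is a regular expression whose named groups are the preset parameter names; the patent does not state how rules are represented, and the application name `sshd`, the sample log line, and all function names are hypothetical.

```python
import re
from typing import Dict, Optional, Pattern

# Hypothetical rule store: application name -> resolution rule (a regex
# whose named groups become the preset parameter names).
RULES: Dict[str, Pattern[str]] = {
    "sshd": re.compile(
        r"Failed password for (?P<user>\S+) "
        r"from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
    ),
}


def acquire_rule(application: str) -> Optional[Pattern[str]]:
    """Role of rule acquisition module 31: obtain the resolution rule."""
    return RULES.get(application)


def extract_fields(rule: Pattern[str], raw: str) -> Optional[Dict[str, str]]:
    """Role of field extraction module 32: extract the critical fields."""
    match = rule.search(raw)
    return match.groupdict() if match else None


def generate_structured(fields: Dict[str, str]) -> Dict[str, list]:
    """Role of data generation module 33: name each field as a preset
    parameter and assign the extracted value to it."""
    return {name: [value] for name, value in fields.items()}
```

Chaining the three calls on one log line yields a record of the same shape as the "ip" example given in the description.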
In one embodiment, as shown in Figure 4, the above rule acquisition module 31 may comprise:
First search submodule 41, configured to search for user-defined resolution rules according to information about the application that generated the unstructured data;
The field extraction module 32 may comprise:
First extraction submodule 42, configured to extract the critical fields from the unstructured data using the user-defined resolution rules;
Second search submodule 43, configured to search for system built-in resolution rules when the first search submodule finds no user-defined resolution rule or when the user-defined resolution rules do not match the unstructured data;
Second extraction submodule 44, configured to extract the critical fields from the unstructured data using the system built-in resolution rules.
In one embodiment, the above first search submodule may comprise:
A search unit, configured to search, according to information about the application that generated the unstructured data, for the user-defined resolution rule configured in advance for the unstructured data;
The first extraction submodule comprises:
A first extraction unit, configured to extract the critical fields from the unstructured data using the user-defined resolution rule configured in advance for the unstructured data.
In one embodiment, the first extraction submodule may comprise:
A second extraction unit, configured to use each user-defined resolution rule in turn to extract the critical fields from the unstructured data when there are multiple user-defined resolution rules.
In one embodiment, the second extraction submodule may comprise:
A third extraction unit, configured to use each system built-in resolution rule in turn to extract the critical fields from the unstructured data when there are multiple system built-in resolution rules.
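The fallback from user-defined rules to system built-in rules, with each rule tried in turn, might look like the following Python sketch. It assumes that rules are regular expressions with named groups and that trying stops at the first rule that matches; neither detail is stated in the patent. The rule contents and the names `USER_RULES`, `BUILTIN_RULES`, `first_match`, and `extract` are hypothetical.

```python
import re
from typing import Dict, List, Optional, Pattern

# Hypothetical user-defined rules, keyed by the generating application.
USER_RULES: Dict[str, List[Pattern[str]]] = {
    "billing": [re.compile(r"order=(?P<order_id>\d+) amount=(?P<amount>\d+)")],
}

# Hypothetical system built-in rules, shared by all applications.
BUILTIN_RULES: List[Pattern[str]] = [
    re.compile(r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"),
    re.compile(r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"),
]


def first_match(rules: List[Pattern[str]], raw: str) -> Optional[Dict[str, str]]:
    """Try each rule in turn; return the fields of the first rule that matches."""
    for rule in rules:
        match = rule.search(raw)
        if match:
            return match.groupdict()
    return None


def extract(application: str, raw: str) -> Optional[Dict[str, str]]:
    """Use user-defined rules first; fall back to built-in rules when no
    user-defined rule exists or none of them matches the data."""
    fields = first_match(USER_RULES.get(application, []), raw)
    if fields is None:
        fields = first_match(BUILTIN_RULES, raw)
    return fields
```

Keeping the two rule sets separate means a user can override parsing for one application without touching the built-in defaults that serve every other source.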
In one embodiment, the above device may further comprise:
A judgment module, configured to judge whether the value of a preset parameter in the structured data satisfies a preset alarm condition;
A first processing module, configured to send an alarm and/or block the operation corresponding to the preset parameter when the value of the preset parameter in the structured data satisfies the preset alarm condition.
In one embodiment, the above device may further comprise:
A first search module, configured to search, through a data exchange interface, a third-party database for data matching the structured data, where the data in the third-party database have the same source as the structured data;
A first import module, configured to import, through a data exchange interface, the data in the third-party database, where the data in the third-party database have the same source as the structured data;
A second search module, configured to search the imported data for data matching the structured data;
A second processing module, configured to perform visualization processing on the data matching the structured data.
In one embodiment, the above device may further comprise:
A second import module, configured to import the structured data into the third-party database to update the data in the third-party database.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (14)

1. An unstructured data processing method, characterized by comprising:
obtaining a resolution rule for extracting critical fields from unstructured data;
extracting the critical fields from the unstructured data using the resolution rule;
naming each extracted critical field as a preset parameter, assigning the extracted critical field as the value of the preset parameter, and generating structured data.
2. The method of claim 1, characterized in that
the obtaining of a resolution rule for extracting critical fields from unstructured data comprises: searching for user-defined resolution rules according to information about the application that generated the unstructured data;
the extracting of the critical fields from the unstructured data using the resolution rule comprises: extracting the critical fields from the unstructured data using the user-defined resolution rules; searching for system built-in resolution rules when no user-defined resolution rule is found or when the user-defined resolution rules do not match the unstructured data; and extracting the critical fields from the unstructured data using the system built-in resolution rules.
3. The method of claim 2, characterized in that
the searching for user-defined resolution rules according to information about the application that generated the unstructured data comprises: searching, according to information about the application that generated the unstructured data, for the user-defined resolution rule configured in advance for the unstructured data;
the extracting of the critical fields from the unstructured data using the user-defined resolution rules comprises: extracting the critical fields from the unstructured data using the user-defined resolution rule configured in advance for the unstructured data.
4. The method of claim 2, characterized in that
the extracting of the critical fields from the unstructured data using the user-defined resolution rules comprises: when there are multiple user-defined resolution rules, using each user-defined resolution rule in turn to extract the critical fields from the unstructured data; or
the extracting of the critical fields from the unstructured data using the system built-in resolution rules comprises: when there are multiple system built-in resolution rules, using each system built-in resolution rule in turn to extract the critical fields from the unstructured data.
5. The method of claim 1, characterized in that the method further comprises:
judging whether the value of a preset parameter in the structured data satisfies a preset alarm condition;
when the value of the preset parameter in the structured data satisfies the preset alarm condition, sending an alarm and/or blocking the operation corresponding to the preset parameter.
6. The method of claim 1, characterized in that the method further comprises:
searching, through a data exchange interface, a third-party database for data matching the structured data, where the data in the third-party database have the same source as the structured data; or importing, through a data exchange interface, the data in a third-party database, where the data in the third-party database have the same source as the structured data, and searching the imported data for data matching the structured data;
performing visualization processing on the data matching the structured data.
7. The method of claim 6, characterized in that the method further comprises:
importing the structured data into the third-party database to update the data in the third-party database.
8. An unstructured data processing device, characterized by comprising:
a rule acquisition module, configured to obtain a resolution rule for extracting critical fields from unstructured data;
a field extraction module, configured to extract the critical fields from the unstructured data using the resolution rule;
a data generation module, configured to name each extracted critical field as a preset parameter, assign the extracted critical field as the value of the preset parameter, and generate structured data.
9. The device of claim 8, characterized in that
the rule acquisition module comprises:
a first search submodule, configured to search for user-defined resolution rules according to information about the application that generated the unstructured data;
the field extraction module comprises:
a first extraction submodule, configured to extract the critical fields from the unstructured data using the user-defined resolution rules;
a second search submodule, configured to search for system built-in resolution rules when the first search submodule finds no user-defined resolution rule or when the user-defined resolution rules do not match the unstructured data;
a second extraction submodule, configured to extract the critical fields from the unstructured data using the system built-in resolution rules.
10. The device of claim 9, characterized in that
the first search submodule comprises:
a search unit, configured to search, according to information about the application that generated the unstructured data, for the user-defined resolution rule configured in advance for the unstructured data;
the first extraction submodule comprises:
a first extraction unit, configured to extract the critical fields from the unstructured data using the user-defined resolution rule configured in advance for the unstructured data.
11. The device of claim 9, characterized in that
the first extraction submodule comprises: a second extraction unit, configured to use each user-defined resolution rule in turn to extract the critical fields from the unstructured data when there are multiple user-defined resolution rules; or
the second extraction submodule comprises: a third extraction unit, configured to use each system built-in resolution rule in turn to extract the critical fields from the unstructured data when there are multiple system built-in resolution rules.
12. The device of claim 8, characterized in that the device further comprises:
a judgment module, configured to judge whether the value of a preset parameter in the structured data satisfies a preset alarm condition;
a first processing module, configured to send an alarm and/or block the operation corresponding to the preset parameter when the value of the preset parameter in the structured data satisfies the preset alarm condition.
13. The device of claim 8, characterized in that the device further comprises:
a first search module, configured to search, through a data exchange interface, a third-party database for data matching the structured data, where the data in the third-party database have the same source as the structured data;
a first import module, configured to import, through a data exchange interface, the data in a third-party database, where the data in the third-party database have the same source as the structured data;
a second search module, configured to search the imported data for data matching the structured data;
a second processing module, configured to perform visualization processing on the data matching the structured data.
14. The device of claim 13, characterized in that the device further comprises:
a second import module, configured to import the structured data into the third-party database to update the data in the third-party database.
CN201410466111.1A 2014-09-12 2014-09-12 Unstructured data processing method and device Pending CN104239506A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410466111.1A CN104239506A (en) 2014-09-12 2014-09-12 Unstructured data processing method and device

Publications (1)

Publication Number Publication Date
CN104239506A true CN104239506A (en) 2014-12-24

Family

ID=52227565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410466111.1A Pending CN104239506A (en) 2014-09-12 2014-09-12 Unstructured data processing method and device

Country Status (1)

Country Link
CN (1) CN104239506A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106066848A (en) * 2016-05-24 2016-11-02 辽宁蓝卡医疗投资管理有限公司 Data processing method, apparatus and system
CN106294873A (en) * 2016-08-24 2017-01-04 北京互利科技有限公司 The analytical equipment of a kind of machine data and the method for analysis
CN106503191A (en) * 2016-10-26 2017-03-15 冯村 A kind of data management apparatus and method
CN106557569A (en) * 2016-11-14 2017-04-05 用友网络科技股份有限公司 Introduction method and gatherer based on the non-structured document of meta-model
CN106815268A (en) * 2015-12-01 2017-06-09 中广核工程有限公司 The structuring processing method and system of magnanimity destructuring e-file
CN107251010A (en) * 2015-03-24 2017-10-13 英特尔公司 Unstructured UI
CN107436895A (en) * 2016-05-26 2017-12-05 中国移动通信集团云南有限公司 A kind of method and apparatus of unstructured data identification
CN108228664A (en) * 2016-12-22 2018-06-29 中国移动通信集团上海有限公司 Unstructured data processing method and processing device
CN108846003A (en) * 2018-04-20 2018-11-20 广东电网有限责任公司 A kind of unstructured machine data processing method and processing device
CN109063136A (en) * 2018-08-03 2018-12-21 北京大米未来科技有限公司 Non-relational database inquiry system and method
CN109359143A (en) * 2018-10-31 2019-02-19 新华三信息安全技术有限公司 A kind of report generation method and device
CN109710413A (en) * 2018-12-29 2019-05-03 重庆誉存大数据科技有限公司 A kind of integral Calculation Method of the rule engine system of semi-structured text data
CN109885607A (en) * 2019-01-11 2019-06-14 中广核工程有限公司 A kind of industry magnanimity unstructured data processing method and system
CN110442671A (en) * 2019-08-02 2019-11-12 深圳百胜扬工业电子商务平台发展有限公司 A kind of method and system of unstructured data processing
CN111625616A (en) * 2020-05-11 2020-09-04 苏州盈数智能科技有限公司 Enterprise-level data management system capable of realizing mass storage
CN112527862A (en) * 2020-12-10 2021-03-19 国网河北省电力有限公司雄安新区供电公司 Time sequence data processing method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070011175A1 (en) * 2005-07-05 2007-01-11 Justin Langseth Schema and ETL tools for structured and unstructured data
CN101055578A (en) * 2006-04-12 2007-10-17 龙搜(北京)科技有限公司 File content dredger based on rule
CN101231661A (en) * 2008-02-19 2008-07-30 上海估家网络科技有限公司 Method and system for digging object grade knowledge

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
丁杰等: "面向多级调度管理的融合型搜索引擎", 《电力系统自动化》 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107251010A (en) * 2015-03-24 2017-10-13 英特尔公司 Unstructured UI
US10922474B2 (en) 2015-03-24 2021-02-16 Intel Corporation Unstructured UI
CN106815268A (en) * 2015-12-01 2017-06-09 中广核工程有限公司 The structuring processing method and system of magnanimity destructuring e-file
CN106066848A (en) * 2016-05-24 2016-11-02 辽宁蓝卡医疗投资管理有限公司 Data processing method, apparatus and system
CN107436895B (en) * 2016-05-26 2020-12-04 中国移动通信集团云南有限公司 Method and device for identifying unstructured data
CN107436895A (en) * 2016-05-26 2017-12-05 中国移动通信集团云南有限公司 A kind of method and apparatus of unstructured data identification
CN106294873A (en) * 2016-08-24 2017-01-04 北京互利科技有限公司 The analytical equipment of a kind of machine data and the method for analysis
CN106503191A (en) * 2016-10-26 2017-03-15 冯村 A kind of data management apparatus and method
CN106557569A (en) * 2016-11-14 2017-04-05 用友网络科技股份有限公司 Introduction method and gatherer based on the non-structured document of meta-model
CN106557569B (en) * 2016-11-14 2020-07-03 用友网络科技股份有限公司 Method and device for importing unstructured document based on meta-model
CN108228664A (en) * 2016-12-22 2018-06-29 中国移动通信集团上海有限公司 Unstructured data processing method and processing device
CN108846003A (en) * 2018-04-20 2018-11-20 广东电网有限责任公司 A kind of unstructured machine data processing method and processing device
CN109063136A (en) * 2018-08-03 2018-12-21 北京大米未来科技有限公司 Non-relational database inquiry system and method
CN109359143A (en) * 2018-10-31 2019-02-19 新华三信息安全技术有限公司 A kind of report generation method and device
CN109359143B (en) * 2018-10-31 2022-03-22 新华三信息安全技术有限公司 Report generation method and device
CN109710413A (en) * 2018-12-29 2019-05-03 重庆誉存大数据科技有限公司 A kind of integral Calculation Method of the rule engine system of semi-structured text data
CN109885607A (en) * 2019-01-11 2019-06-14 中广核工程有限公司 A kind of industry magnanimity unstructured data processing method and system
CN110442671A (en) * 2019-08-02 2019-11-12 深圳百胜扬工业电子商务平台发展有限公司 A kind of method and system of unstructured data processing
CN111625616A (en) * 2020-05-11 2020-09-04 苏州盈数智能科技有限公司 Enterprise-level data management system capable of realizing mass storage
CN111625616B (en) * 2020-05-11 2024-02-06 苏州盈数智能科技有限公司 Enterprise-level data management system capable of mass storage
CN112527862A (en) * 2020-12-10 2021-03-19 国网河北省电力有限公司雄安新区供电公司 Time sequence data processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141224