CN107871009A - A kind of method and device for gathering directory metadata - Google Patents

A method and device for gathering directory metadata

Info

Publication number
CN107871009A
Authority
CN
China
Prior art keywords
definition
acquisition tasks
information
directory metadata
collected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711146068.0A
Other languages
Chinese (zh)
Inventor
王震
李连伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Cloud Service Information Technology Co Ltd
Original Assignee
Shandong Inspur Cloud Service Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Cloud Service Information Technology Co Ltd
Priority to CN201711146068.0A
Publication of CN107871009A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata

Abstract

The invention provides a method and device for gathering directory metadata. The method includes: presetting at least one acquisition task definition, together with the acquisition system definition and collection rule definition corresponding to each acquisition task definition; and, for each acquisition task definition: periodically logging in to the information source system according to the task execution frequency in the current acquisition task definition and the acquisition system definition corresponding to it; determining the fields to be collected in the information source system according to the collection rule definition corresponding to the current acquisition task definition; and gathering the directory metadata information corresponding to the fields to be collected. With the preset acquisition task definitions, acquisition system definitions and collection rule definitions, distributed automatic gathering of directory metadata information can be achieved. The scheme can therefore improve the efficiency of gathering directory metadata.

Description

A method and device for gathering directory metadata
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for gathering directory metadata.
Background
Society has entered the era of big data, and data sharing between service units is an inevitable requirement of social development. Because service units are isolated from one another, their business systems are isolated as well, so data processing is needed to achieve data sharing between service units.
At present, staff manually transcribe information such as the directory metadata in each business system and then organize it afterwards.
However, the number of service units is usually large, so gathering directory metadata by hand is inefficient.
Summary of the invention
The invention provides a method and device for gathering directory metadata that can improve the efficiency of gathering directory metadata.
To achieve the above object, the present invention is realized through the following technical solutions:
In one aspect, the invention provides a method for gathering directory metadata, in which at least one acquisition task definition is preset, together with the acquisition system definition and collection rule definition corresponding to each acquisition task definition. The method further includes:
For each acquisition task definition: periodically logging in to the information source system according to the task execution frequency in the current acquisition task definition and the acquisition system definition corresponding to the current acquisition task definition;
Determining the fields to be collected in the information source system according to the target collection rule definition corresponding to the current acquisition task definition;
Gathering the directory metadata information corresponding to the fields to be collected.
Further, the acquisition system definition includes: a system access address and/or system access login information;
The collection rule definition includes: page information, page area information and field information;
Determining the fields to be collected in the information source system includes: matching the pages to be collected in the information source system with a regular expression, according to the page information in the target collection rule definition; matching the page areas to be collected in the pages to be collected with any one of a regular expression, XPath (XML Path Language) or a CSS (Cascading Style Sheets) selector, according to the page area information in the target collection rule definition; and matching the fields to be collected in the page areas to be collected with any one of a regular expression, XPath or a CSS selector, according to the field information in the target collection rule definition.
Further, the method also includes: judging whether materialization is defined in the current acquisition task definition and, if so, materializing the directory metadata information into the database corresponding to the data source definition in the current acquisition task definition;
Wherein the data source definition includes any one or more of: a data source address, data source access login information, and a data source type;
The directory metadata information is saved in the form of a JSON (JavaScript Object Notation) string.
Further, the method also includes: when it is determined that materialization is not defined in the current acquisition task definition, publishing the directory metadata information through a REST API (Representational State Transfer Application Programming Interface) or through a SOAP-based (Simple Object Access Protocol) web service.
Further, the acquisition task definition also includes any one or more of: a task name, task monitoring, failover, and a task description.
In another aspect, the invention provides a device for gathering directory metadata, including:
A setting unit, for presetting at least one acquisition task definition, together with the acquisition system definition and collection rule definition corresponding to each acquisition task definition;
A first processing unit, for performing the following for each acquisition task definition: periodically logging in to the information source system according to the task execution frequency in the current acquisition task definition and the acquisition system definition corresponding to the current acquisition task definition;
A determining unit, for determining the fields to be collected in the information source system according to the target collection rule definition corresponding to the current acquisition task definition;
A collecting unit, for gathering the directory metadata information corresponding to the fields to be collected.
Further, the acquisition system definition includes: a system access address and/or system access login information;
The collection rule definition includes: page information, page area information and field information;
The determining unit is specifically configured to: match the pages to be collected in the information source system with a regular expression, according to the page information in the target collection rule definition; match the page areas to be collected in the pages to be collected with any one of a regular expression, XPath or a CSS selector, according to the page area information in the target collection rule definition; and match the fields to be collected in the page areas to be collected with any one of a regular expression, XPath or a CSS selector, according to the field information in the target collection rule definition.
Further, the device for gathering directory metadata also includes: a second processing unit and a materialization unit;
The second processing unit is configured to judge whether materialization is defined in the current acquisition task definition and, if so, to trigger the materialization unit;
The materialization unit is configured to materialize the directory metadata information into the database corresponding to the data source definition in the current acquisition task definition;
Wherein the data source definition includes any one or more of: a data source address, data source access login information, and a data source type;
The directory metadata information is saved in the form of a JSON string.
Further, the device for gathering directory metadata also includes: a release unit;
The second processing unit is further configured to trigger the release unit when it is determined that materialization is not defined in the current acquisition task definition;
The release unit is configured to publish the directory metadata information through a REST API or through a SOAP-based web service.
Further, the acquisition task definition also includes any one or more of: a task name, task monitoring, failover, and a task description.
The invention provides a method and device for gathering directory metadata. The method includes: presetting at least one acquisition task definition, together with the acquisition system definition and collection rule definition corresponding to each acquisition task definition; and, for each acquisition task definition: periodically logging in to the information source system according to the task execution frequency in the current acquisition task definition and the acquisition system definition corresponding to it; determining the fields to be collected in the information source system according to the collection rule definition corresponding to the current acquisition task definition; and gathering the directory metadata information corresponding to the fields to be collected. With the preset acquisition task definitions, acquisition system definitions and collection rule definitions, distributed automatic gathering of directory metadata information can be achieved. The invention can therefore improve the efficiency of gathering directory metadata.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for gathering directory metadata provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another method for gathering directory metadata provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a device for gathering directory metadata provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of another device for gathering directory metadata provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, an embodiment of the present invention provides a method for gathering directory metadata, which can include the following steps:
Step 101: Preset at least one acquisition task definition, together with the acquisition system definition and collection rule definition corresponding to each acquisition task definition.
Step 102: For each acquisition task definition: periodically log in to the information source system according to the task execution frequency in the current acquisition task definition and the acquisition system definition corresponding to the current acquisition task definition.
Step 103: Determine the fields to be collected in the information source system according to the target collection rule definition corresponding to the current acquisition task definition.
Step 104: Gather the directory metadata information corresponding to the fields to be collected.
In this embodiment, the acquisition task definitions, acquisition system definitions and collection rule definitions are all set in advance, and Steps 102 to 104 are then carried out for every acquisition task definition without manual work. Distributed automatic gathering of directory metadata information can thus be achieved, so the embodiment can improve the efficiency of gathering directory metadata.
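The three definitions of Step 101 and the per-task flow of Steps 102 to 104 can be sketched as follows. This is a minimal illustration only: the class names, attribute names and the `run_once` helper are assumptions introduced here, and the three callables stand in for logic the patent does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcquisitionSystemDefinition:
    access_url: str                  # system access address
    username: Optional[str] = None   # system access login information (optional)
    password: Optional[str] = None


@dataclass
class CollectionRuleDefinition:
    page_pattern: str     # page information, e.g. a regular expression
    region_selector: str  # page area information, e.g. a CSS selector
    field_selector: str   # field information, e.g. a CSS selector


@dataclass
class AcquisitionTaskDefinition:
    name: str
    interval_seconds: int  # task execution frequency, expressed here as an interval
    system: AcquisitionSystemDefinition
    rule: CollectionRuleDefinition


def run_once(task, login, find_fields, collect):
    """Run Steps 102-104 once for one task.

    login, find_fields and collect are hypothetical callables supplied
    by the caller; they are placeholders, not part of the patent text.
    """
    session = login(task.system)              # Step 102: log in
    fields = find_fields(session, task.rule)  # Step 103: determine fields
    return collect(session, fields)           # Step 104: gather metadata
```

Passing the three stages in as callables keeps the sketch independent of any particular HTTP client or HTML parser.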
In one embodiment of the invention, the acquisition system definition includes: a system access address and/or system access login information;
The collection rule definition includes: page information, page area information and field information;
Determining the fields to be collected in the information source system includes: matching the pages to be collected in the information source system with a regular expression, according to the page information in the target collection rule definition; matching the page areas to be collected in the pages to be collected with any one of a regular expression, XPath or a CSS selector, according to the page area information in the target collection rule definition; and matching the fields to be collected in the page areas to be collected with any one of a regular expression, XPath or a CSS selector, according to the field information in the target collection rule definition.
In detail, the system access login information can be the account and password used to log in to the system. The acquisition system definition describes the system from which the directory metadata information originates and provides what is needed to log in to it.
In the embodiments of the present invention there are usually multiple information source systems, and each information source system contains the directory metadata information that some acquisition task definitions need to gather.
In an embodiment of the invention, different information source systems can correspond to different databases, and different databases can be hosted on different servers. In this way, distributed gathering of directory metadata information can be achieved: tasks are sharded across multiple servers, which provides load balancing, failover and state supervision.
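One way to shard tasks across servers is sketched below. The patent does not say how tasks are assigned, so the hash-based assignment, the function names and the server names are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def assign_server(task_name: str, servers: List[str]) -> str:
    """Pick a server for a task from a stable hash of the task name."""
    digest = hashlib.sha256(task_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def shard_tasks(task_names: List[str], servers: List[str]) -> Dict[str, List[str]]:
    """Group task names by the server that should run them."""
    shards: Dict[str, List[str]] = {server: [] for server in servers}
    for name in task_names:
        shards[assign_server(name, servers)].append(name)
    return shards
```

Under this scheme, failover amounts to calling `shard_tasks` again with the failed server removed from the list, so that every task is reassigned to a server that is still running.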
In an embodiment of the invention, a collection rule definition can define pages, page areas and fields. Within a single collection rule definition, at least one page can be defined, at least one page area can be defined in each page, and at least one field can be defined in each page area.
Pages can be matched with a regular expression, while page areas and fields can each be matched with any one of a regular expression, XPath or a CSS selector.
For example, a collection rule definition can be divided into three steps:
Step 1: Define the page;
Step 2: Define the page area;
Step 3: Define the field.
For example, for Step 1 above: the access address of the page containing the directory metadata information can be defined, and more than one page can be defined, for example:
Page A: http://127.0.0.1:8080/example/index1.html;
Page B: http://127.0.0.1:8080/example/index2.html;
Page address definitions support regular expressions, so the definitions of page A and page B can be reduced to: http://127.0.0.1:8080/example/index*.html.
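Page matching of this kind can be sketched with Python's `re` module. Note that `index*.html` in the text reads as wildcard shorthand; taken literally as a regular expression it would not match `index1.html`. The stricter pattern below, which accepts one or more digits after `index`, is an assumption about the intent, and `pages_to_collect` is a hypothetical helper.

```python
import re

# Assumed strict form of the shorthand "index*.html": the dots are escaped
# and the wildcard is read as "one or more digits".
PAGE_PATTERN = re.compile(r"http://127\.0\.0\.1:8080/example/index\d+\.html")


def pages_to_collect(urls):
    """Return only the URLs that match the page definition in full."""
    return [url for url in urls if PAGE_PATTERN.fullmatch(url)]
```

Using `fullmatch` rather than `search` avoids accepting URLs that merely contain the pattern somewhere inside a longer address.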
For Step 2 above: the area containing the directory metadata information can be selected on the page. Page area selection supports regular expressions, CSS selectors and XPath. For example, if page A and page B each contain a form whose name is form1, the form can be selected with the CSS selector form[name='form1'].
For Step 3 above: a page area may contain many fields, not all of which are wanted, so rules for selecting fields can be defined as well. If the wanted fields have something in common, for example they all carry the CSS class cs1, they can be selected with the selector .cs1.
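Steps 2 and 3 can be sketched with the limited XPath support in Python's standard `xml.etree.ElementTree`, used here in place of CSS selectors because the standard library has no CSS selector engine. The sample page, its field names and the `collect_fields` helper are invented for illustration, and the parser requires well-formed markup, which real HTML pages often are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; real pages may need an HTML parser.
PAGE = """
<html><body>
  <form name="form1">
    <input class="cs1" name="enterprise_name" value="Example Co."/>
    <input class="cs1" name="application_time" value="2017-11-17"/>
    <input class="other" name="csrf_token" value="abc"/>
  </form>
  <form name="form2"><input class="cs1" name="ignored" value="x"/></form>
</body></html>
"""


def collect_fields(page_source):
    """Select the form1 area, then the fields in it whose class is cs1."""
    root = ET.fromstring(page_source.strip())
    # XPath counterpart of the CSS selector form[name='form1'].
    region = root.find(".//form[@name='form1']")
    if region is None:
        return {}
    # Rough counterpart of .cs1; this compares the whole class attribute,
    # so an element with class="cs1 big" would not match.
    matches = region.findall(".//*[@class='cs1']")
    return {element.get("name"): element.get("value") for element in matches}
```

Selecting the area first and the fields second mirrors the two-level rule definition: the field in `form2` carries the same class but is skipped because it lies outside the chosen area.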
In an embodiment of the invention, the open-source HtmlUnit technology, which is based on an open-source browser engine framework, can be used to match pages, page areas and fields against regular expressions, XPath expressions and CSS selectors.
In the embodiments of the present invention, depending on the practical application requirements, the gathered directory metadata information can be handled afterwards in at least the following two ways:
Mode 1: Materialize it to a database;
Mode 2: Open it to third parties.
In detail, corresponding to Mode 1 above:
In one embodiment of the invention, the method may further include: judging whether materialization is defined in the current acquisition task definition and, if so, materializing the directory metadata information into the database corresponding to the data source definition in the current acquisition task definition;
Wherein the data source definition includes any one or more of: a data source address, data source access login information, and a data source type;
The directory metadata information is saved in the form of a JSON string.
In detail, after the directory metadata information has been acquired, it can be written to the defined database in preparation for subsequent directory data collection.
In detail, when an acquisition task definition is set, it can be decided whether the directory metadata information gathered by the task needs to be materialized. For example, when staff consider materialization necessary, they can tick the "whether to materialize" task attribute. If materialization is needed, the acquisition task definition can also include a data source definition, which can include a data source address, data source access login information, a data source type and so on.
When materialization is needed, a specific data source, such as a database on a particular server, can be determined from the data source definition, and the gathered directory metadata information can then be stored as a database table. In another embodiment of the invention, the gathered directory metadata information can instead be stored as a file.
In an embodiment of the invention, each item of directory metadata information can be saved under a name formed from the uppercase initials of its Chinese pinyin, and the data is saved as a JSON string.
As an example, suppose the directory metadata information of a task named "approval application" has been gathered and comprises five items: enterprise name, application time, social credit code, enterprise address and enterprise legal representative. If materialization is defined, the system, for example the system running on a device for gathering directory metadata, can materialize these five items directly into a table structure and generate a table containing these five fields in the defined database.
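The materialization of the five items can be sketched with an in-process SQLite database. Everything concrete here is an assumption: the patent names neither the database product nor the table, the column names are a guess at what uppercase pinyin initials could look like (for instance QYMC for enterprise name), and the values are made-up sample data.

```python
import json
import sqlite3

# Hypothetical gathered record, saved as a JSON string. The keys are assumed
# pinyin-initial names for the five items; the values are invented.
RECORD_JSON = json.dumps({
    "QYMC": "Example Co.",       # enterprise name
    "SQSJ": "2017-11-17",        # application time
    "SHXYDM": "000000000000",    # social credit code
    "QYDZ": "1 Example Road",    # enterprise address
    "QYFR": "Zhang San",         # enterprise legal representative
})


def materialize(conn, table, record_json):
    """Create a table with one column per item and insert the record."""
    record = json.loads(record_json)
    columns = list(record)
    # Table and column names cannot be bound as SQL parameters, so they are
    # validated before being placed into the statement text.
    for name in [table] + columns:
        if not name.isidentifier():
            raise ValueError("unsafe identifier: %r" % name)
    column_defs = ", ".join('"%s" TEXT' % c for c in columns)
    conn.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, column_defs))
    column_list = ", ".join('"%s"' % c for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        'INSERT INTO "%s" (%s) VALUES (%s)' % (table, column_list, placeholders),
        [str(record[c]) for c in columns],
    )
    conn.commit()
```

A call such as `materialize(sqlite3.connect(":memory:"), "approval_application", RECORD_JSON)` would create the five-column table and store one row; a real deployment would connect to the database named in the data source definition instead.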
In detail, corresponding to Mode 2 above:
In one embodiment of the invention, the method may further include: when it is determined that materialization is not defined in the current acquisition task definition, publishing the directory metadata information through a REST API or through a SOAP-based web service.
In detail, as an alternative to materialization, the system in the embodiments of the present invention can also act purely as a tool offered to third parties. In this case the system provides the gathered directory metadata information directly, and the third party processes the result further after obtaining it. For example, the third party can modify the information and then materialize it, or store it directly in a big data file system.
In detail, there are two ways of providing the information: one is a SOAP web service and the other is a REST API. Preferably, the REST API can be used, that is, an address is exposed that a third party calls directly to obtain the result.
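Exposing an address for third parties can be sketched with Python's built-in `http.server`. The `/metadata/<task name>` URL layout, the in-memory `COLLECTED` store, the port and the helper names are assumptions for illustration; the patent only states that an address is opened for direct calls.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store of gathered results, keyed by task name.
COLLECTED = {"approval_application": [{"QYMC": "Example Co."}]}


def build_response(path, store):
    """Map a request path to an HTTP status code and a JSON body."""
    prefix = "/metadata/"
    if not path.startswith(prefix):
        return 404, json.dumps({"error": "not found"})
    task_name = path[len(prefix):]
    if task_name not in store:
        return 404, json.dumps({"error": "unknown task"})
    return 200, json.dumps(store[task_name], ensure_ascii=False)


class MetadataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = build_response(self.path, COLLECTED)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the server; this call blocks until the process is interrupted."""
    HTTPServer(("127.0.0.1", port), MetadataHandler).serve_forever()
```

After `serve()` is called, a third party could fetch `http://127.0.0.1:8000/metadata/approval_application` to obtain the JSON result. Keeping the routing in `build_response` lets it be exercised without opening a socket.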
In the embodiments of the present invention, distributed tasks and the open-source HtmlUnit technology can be used to gather the directory, and the results are published as a service so that they can be shared.
In one embodiment of the invention, the acquisition task definition also includes any one or more of: a task name, task monitoring, failover, and a task description.
In detail, acquisition task definitions are set in advance in order to gather directory metadata. An acquisition task definition can include not only the task execution frequency and whether to materialize, plus the data source definition required when materialization is needed, but also a task name, task monitoring, failover, a task description and so on.
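How the task execution frequency could drive periodic runs is sketched below. The patent does not describe a scheduler, so expressing the frequency as an interval in seconds and the `due_tasks` helper are both assumptions.

```python
from typing import Dict, List


def due_tasks(intervals: Dict[str, float],
              last_run: Dict[str, float],
              now: float) -> List[str]:
    """Return the names of tasks whose interval has elapsed.

    intervals maps task name to its execution interval in seconds;
    last_run maps task name to the time of its previous run. A task
    that has never run is always due.
    """
    due = []
    for name, interval in intervals.items():
        previous = last_run.get(name)
        if previous is None or now - previous >= interval:
            due.append(name)
    return due
```

A driver loop would call `due_tasks` with the current time, run each returned task, and record the new run time in `last_run`.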
As shown in Fig. 2, an embodiment of the present invention provides another method for gathering directory metadata, which specifically includes the following steps:
Step 201: Preset at least one acquisition task definition, together with the acquisition system definition and collection rule definition corresponding to each acquisition task definition.
In detail, an acquisition task definition can include a task name, task execution frequency, task monitoring, failover, a task description, whether to materialize, a data source definition and so on.
In detail, an acquisition system definition can include a system access address and system access login information.
In detail, a collection rule definition can include page information, page area information and field information.
Step 202: For each acquisition task definition: periodically log in to the information source system according to the task execution frequency in the current acquisition task definition and the acquisition system definition corresponding to the current acquisition task definition.
Step 203: Determine the target collection rule definition corresponding to the current acquisition task definition.
Step 204: According to the page information in the target collection rule definition, match the pages to be collected in the information source system with a regular expression.
In detail, a collection rule definition can include information about pages, page areas and fields.
Step 205: According to the page area information in the target collection rule definition, match the page areas to be collected in the pages to be collected with a CSS selector.
Step 206: According to the field information in the target collection rule definition, match the fields to be collected in the page areas to be collected with a CSS selector.
Step 207: Gather the directory metadata information corresponding to the fields to be collected.
Step 208: Judge whether materialization is defined in the current acquisition task definition. If so, materialize the directory metadata information into the database corresponding to the data source definition in the current acquisition task definition; otherwise, publish the directory metadata information through a REST API.
In another embodiment of the invention, the directory metadata information can instead be published through a SOAP-based web service.
In detail, a data source definition can include any one or more of: a data source address, data source access login information, and a data source type.
In detail, the directory metadata information is saved in the form of a JSON string.
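The branch in Step 208 can be sketched as a small dispatch function. The dictionary keys `materialize`, `data_source` and `name`, and the two callables, are hypothetical stand-ins; only the decision itself (materialize if defined, otherwise publish) comes from the text.

```python
import json


def handle_result(task, metadata, materialize, publish):
    """Step 208: materialize the result if the task defines it, else publish.

    task is a plain dict standing in for an acquisition task definition;
    materialize and publish are caller-supplied callables.
    """
    # The gathered information is saved in the form of a JSON string.
    payload = json.dumps(metadata, ensure_ascii=False)
    if task.get("materialize"):
        materialize(task["data_source"], payload)
        return "materialized"
    publish(task["name"], payload)
    return "published"
```

Returning the chosen branch as a string is only a convenience for logging or task monitoring.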
In summary, the method for gathering directory metadata described in the embodiments of the present invention involves several parts: the acquisition task definition, the acquisition system definition, the collection rule definition, the data source definition, and the directory metadata information release definition.
The acquisition task definition configures the rules of the acquisition task engine so that the system captures information automatically. The acquisition system definition describes the system from which the directory metadata information originates. The collection rule definition specifies the rules for extracting directory metadata information, and the system extracts the information automatically according to the defined rules. The data source definition describes the data source into which the gathered directory metadata information is to be materialized; after gathering the information, the system can materialize the directory metadata items into the specified data source as needed. The directory metadata information release definition publishes the gathered directory metadata information so that third-party systems can use the results of the method.
Based on the above, the method for gathering directory metadata described in the embodiments of the present invention needs to be configured only once and requires no manual intervention. It gathers directory metadata information automatically, saves labor cost, helps improve the efficiency and accuracy of gathering directory metadata, and helps data sharing work proceed smoothly.
As shown in Fig. 3, an embodiment of the present invention provides a device for gathering directory metadata, including:
A setting unit 301, for presetting at least one acquisition task definition, together with the acquisition system definition and collection rule definition corresponding to each acquisition task definition;
A first processing unit 302, for performing the following for each acquisition task definition: periodically logging in to the information source system according to the task execution frequency in the current acquisition task definition and the acquisition system definition corresponding to the current acquisition task definition;
A determining unit 303, for determining the fields to be collected in the information source system according to the target collection rule definition corresponding to the current acquisition task definition;
A collecting unit 304, for gathering the directory metadata information corresponding to the fields to be collected.
In an embodiment of the invention, the acquisition system definition includes: a system access address and/or system access login information;
The collection rule definition includes: page information, page area information and field information;
The determining unit 303 is specifically configured to: match the pages to be collected in the information source system with a regular expression, according to the page information in the target collection rule definition; match the page areas to be collected in the pages to be collected with any one of a regular expression, XPath or a CSS selector, according to the page area information in the target collection rule definition; and match the fields to be collected in the page areas to be collected with any one of a regular expression, XPath or a CSS selector, according to the field information in the target collection rule definition.
In an embodiment of the invention, referring to Fig. 4, the device for gathering directory metadata can also include: a second processing unit 401 and a materialization unit 402;
The second processing unit 401 is configured to judge whether materialization is defined in the current acquisition task definition and, if so, to trigger the materialization unit 402;
The materialization unit 402 is configured to materialize the directory metadata information into the database corresponding to the data source definition in the current acquisition task definition;
Wherein the data source definition includes any one or more of: a data source address, data source access login information, and a data source type;
The directory metadata information is saved in the form of a JSON string.
In an embodiment of the invention, referring to Fig. 4, the device for gathering directory metadata may further include a release unit 403;
The second processing unit 401 is further configured to trigger the release unit 403 when it judges that materialization is not defined in the current acquisition task definition;
The release unit 403 is configured to publish the directory metadata information through a REST API or through a SOAP-based web service.
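The choice between materializing and publishing can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the key `materialization`, the function name `deliver`, and the two callables passed in are hypothetical, and the actual REST or SOAP transport is left to the `publish` callable.

```python
import json


def deliver(task_definition, records, materialize, publish):
    """Route collected records according to the acquisition task definition."""
    # "materialization" is an assumed key holding the data source definition.
    data_source = task_definition.get("materialization")
    if data_source is not None:
        # Materialization is defined: hand the records to the database writer.
        materialize(records, data_source)
        return "materialized"
    # Materialization is not defined: publish the records as a JSON string.
    publish(json.dumps(records, ensure_ascii=False))
    return "published"
```

Keeping the transport behind a callable lets the same branch serve either publishing mode without changing the decision logic.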
In an embodiment of the invention, the acquisition task definition further includes any one or more of a task name, task monitoring, failover, and a task description.
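One way to picture the three definitions and their optional attributes is as plain data classes. The class and attribute names below are hypothetical and chosen only to mirror the terms used in the description; the patent does not prescribe a data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcquisitionSystemDefinition:
    access_address: str                 # system access address
    login_info: Optional[dict] = None   # system access login information


@dataclass
class CollectionRuleDefinition:
    page_info: str        # rule for selecting pages
    page_area_info: str   # rule for selecting page areas
    field_info: dict      # rules for selecting fields


@dataclass
class AcquisitionTaskDefinition:
    system: AcquisitionSystemDefinition
    rule: CollectionRuleDefinition
    frequency_seconds: int               # task execution frequency (assumed unit)
    name: Optional[str] = None           # task name
    monitoring: bool = False             # task monitoring
    failover: bool = False               # failover
    description: Optional[str] = None    # task description
    data_source: Optional[dict] = None   # None means materialization is not defined
```

A task is then built by nesting the definitions, for example `AcquisitionTaskDefinition(system=..., rule=..., frequency_seconds=3600, name="daily-catalog")`.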
The information exchange between the units of the above device, their execution process, and similar details are based on the same concept as the method embodiments of the invention; for specifics, refer to the description of the method embodiments, which is not repeated here.
In summary, the embodiments of the invention have at least the following advantages:
1. In the embodiments of the invention, at least one acquisition task definition is preset, together with the acquisition system definition and collection rule definition corresponding to each acquisition task definition. For each acquisition task definition, the following is performed: according to the task execution frequency in the current acquisition task definition, periodically log in to the information source system according to the acquisition system definition corresponding to the current acquisition task definition; determine the fields to be collected in the information source system according to the collection rule definition corresponding to the current acquisition task definition; and collect the directory metadata information corresponding to the fields to be collected. With the preset acquisition task, acquisition system, and collection rule definitions, distributed automatic collection of directory metadata information can be achieved. The embodiments of the invention can therefore improve the efficiency of directory metadata collection.
2. The method for gathering directory metadata described in the embodiments of the invention needs to be configured only once and then collects directory metadata information automatically, without manual intervention. This saves labor costs, helps improve the efficiency and accuracy of directory metadata collection, and facilitates the smooth progress of data sharing work.
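The periodic execution driven by each task's execution frequency can be sketched as a simulated schedule. This is a single-process illustration only, not the distributed scheduler the description refers to; the function name `run_schedule` and the use of seconds as the period unit are assumptions.

```python
import heapq


def run_schedule(tasks, horizon):
    """Return (time, task_name) pairs for every run up to `horizon` seconds.

    `tasks` maps a task name to its execution period in seconds.
    """
    if any(period <= 0 for period in tasks.values()):
        raise ValueError("periods must be positive")
    # Each task first becomes due one full period after the start.
    queue = [(period, name) for name, period in tasks.items()]
    heapq.heapify(queue)
    runs = []
    while queue and queue[0][0] <= horizon:
        due, name = heapq.heappop(queue)
        # A real system would log in to the source system and collect here.
        runs.append((due, name))
        # Re-queue the task for its next period.
        heapq.heappush(queue, (due + tasks[name], name))
    return runs
```

For example, with periods of 10 and 15 seconds and a 30-second horizon, the first task runs at 10, 20, and 30 seconds and the second at 15 and 30 seconds. A priority queue keeps the next due task at the front regardless of how many task definitions are configured.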
It should be noted that herein, such as first and second etc relational terms are used merely to an entity Or operation makes a distinction with another entity or operation, and not necessarily require or imply and exist between these entities or operation Any this actual relation or order.Moreover, term " comprising ", "comprising" or its any other variant be intended to it is non- It is exclusive to include, so that process, method, article or equipment including a series of elements not only include those key elements, But also the other element including being not expressly set out, or also include solid by this process, method, article or equipment Some key elements.In the absence of more restrictions, by sentence " including the key element that a 〃 〃 " is limited, it is not excluded that Other identical factor in the process including the key element, method, article or equipment also be present.
One of ordinary skill in the art will appreciate that:Realizing all or part of step of above method embodiment can pass through Programmed instruction related hardware is completed, and foregoing program can be stored in computer-readable storage medium, the program Upon execution, the step of execution includes above method embodiment;And foregoing storage medium includes:ROM, RAM, magnetic disc or light Disk etc. is various can be with the medium of store program codes.
It is last it should be noted that:Presently preferred embodiments of the present invention is the foregoing is only, is merely to illustrate the skill of the present invention Art scheme, is not intended to limit the scope of the present invention.Any modification for being made within the spirit and principles of the invention, Equivalent substitution, improvement etc., are all contained in protection scope of the present invention.

Claims (10)

  1. A method for gathering directory metadata, characterized in that at least one acquisition task definition is preset, together with an acquisition system definition and a collection rule definition corresponding to each acquisition task definition; the method further comprising:
    performing the following for each acquisition task definition: according to the task execution frequency in the current acquisition task definition, periodically logging in to an information source system according to the acquisition system definition corresponding to the current acquisition task definition;
    determining the fields to be collected in the information source system according to the target collection rule definition corresponding to the current acquisition task definition;
    collecting the directory metadata information corresponding to the fields to be collected.
  2. The method according to claim 1, characterized in that
    the acquisition system definition includes a system access address and/or system access login information;
    the collection rule definition includes page information, page area information, and field information;
    determining the fields to be collected in the information source system comprises: matching the pages to be collected in the information source system using a regular expression, according to the page information in the target collection rule definition; matching the page areas to be collected within the pages to be collected using any one of a regular expression, XML Path Language (XPATH), or a Cascading Style Sheets (CSS) selector, according to the page area information in the target collection rule definition; and matching the fields to be collected within the page areas to be collected using any one of a regular expression, XPATH, or a CSS selector, according to the field information in the target collection rule definition.
  3. The method according to claim 1, characterized in that
    the method further comprises: judging whether materialization is defined in the current acquisition task definition and, if so, materializing the directory metadata information into the database corresponding to the data source definition, according to the data source definition in the current acquisition task definition;
    wherein the data source definition includes any one or more of a data source address, data source access login information, and a data source type;
    the directory metadata information is stored as a JavaScript Object Notation (JSON) string.
  4. The method according to claim 3, characterized in that
    the method further comprises: when it is judged that materialization is not defined in the current acquisition task definition, publishing the directory metadata information through a RESTful application programming interface (REST API) or through a web service based on the Simple Object Access Protocol (SOAP).
  5. The method according to any one of claims 1 to 4, characterized in that
    the acquisition task definition further includes any one or more of a task name, task monitoring, failover, and a task description.
  6. A device for gathering directory metadata, characterized by comprising:
    a setting unit, configured to preset at least one acquisition task definition, together with an acquisition system definition and a collection rule definition corresponding to each acquisition task definition;
    a first processing unit, configured to perform the following for each acquisition task definition: according to the task execution frequency in the current acquisition task definition, periodically log in to an information source system according to the acquisition system definition corresponding to the current acquisition task definition;
    a determining unit, configured to determine the fields to be collected in the information source system according to the target collection rule definition corresponding to the current acquisition task definition;
    a collecting unit, configured to collect the directory metadata information corresponding to the fields to be collected.
  7. The device for gathering directory metadata according to claim 6, characterized in that
    the acquisition system definition includes a system access address and/or system access login information;
    the collection rule definition includes page information, page area information, and field information;
    the determining unit is specifically configured to: match the pages to be collected in the information source system using a regular expression, according to the page information in the target collection rule definition; match the page areas to be collected within the pages to be collected using any one of a regular expression, XML Path Language (XPATH), or a Cascading Style Sheets (CSS) selector, according to the page area information in the target collection rule definition; and match the fields to be collected within the page areas to be collected using any one of a regular expression, XPATH, or a CSS selector, according to the field information in the target collection rule definition.
  8. The device for gathering directory metadata according to claim 6, characterized in that
    the device further comprises a second processing unit and a materialization unit;
    the second processing unit is configured to judge whether materialization is defined in the current acquisition task definition and, if so, to trigger the materialization unit;
    the materialization unit is configured to materialize the directory metadata information into the database corresponding to the data source definition, according to the data source definition in the current acquisition task definition;
    wherein the data source definition includes any one or more of a data source address, data source access login information, and a data source type;
    the directory metadata information is stored as a JavaScript Object Notation (JSON) string.
  9. The device for gathering directory metadata according to claim 8, characterized in that
    the device further comprises a release unit;
    the second processing unit is further configured to trigger the release unit when it judges that materialization is not defined in the current acquisition task definition;
    the release unit is configured to publish the directory metadata information through a RESTful application programming interface (REST API) or through a web service based on the Simple Object Access Protocol (SOAP).
  10. The device for gathering directory metadata according to any one of claims 6 to 9, characterized in that
    the acquisition task definition further includes any one or more of a task name, task monitoring, failover, and a task description.
CN201711146068.0A 2017-11-17 2017-11-17 A kind of method and device for gathering directory metadata Pending CN107871009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711146068.0A CN107871009A (en) 2017-11-17 2017-11-17 A kind of method and device for gathering directory metadata

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711146068.0A CN107871009A (en) 2017-11-17 2017-11-17 A kind of method and device for gathering directory metadata

Publications (1)

Publication Number Publication Date
CN107871009A true CN107871009A (en) 2018-04-03

Family

ID=61754063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711146068.0A Pending CN107871009A (en) 2017-11-17 2017-11-17 A kind of method and device for gathering directory metadata

Country Status (1)

Country Link
CN (1) CN107871009A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109360636A (en) * 2018-11-16 2019-02-19 江苏盛益医疗科技有限公司 A kind of hospital information management system
CN112988730A (en) * 2021-03-29 2021-06-18 国网宁夏电力有限公司电力科学研究院 Metadata collection method based on enterprise data inventory

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361022A (en) * 2014-10-22 2015-02-18 浪潮软件集团有限公司 Method based on collected data statistics and foreground display
CN104484424A (en) * 2014-12-19 2015-04-01 浪潮通用软件有限公司 Establishing method for resource price information base of construction enterprise based on internet
US20150186521A1 (en) * 2013-12-31 2015-07-02 Clicktale Ltd. Method and system for tracking and gathering multivariate testing data
CN104794161A (en) * 2015-03-24 2015-07-22 浪潮集团有限公司 Method for monitoring network public opinions
CN104915259A (en) * 2015-06-15 2015-09-16 浪潮软件集团有限公司 Task scheduling method applied to distributed acquisition system
CN106096056A (en) * 2016-06-30 2016-11-09 西南石油大学 A kind of based on distributed public sentiment data real-time collecting method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180403
