CN107871009A - A kind of method and device for gathering directory metadata - Google Patents
A method and device for gathering directory metadata
- Publication number
- CN107871009A (CN201711146068.0A)
- Authority
- CN
- China
- Prior art keywords
- definition
- acquisition tasks
- information
- directory metadata
- collected
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24573—Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
Abstract
The invention provides a method and device for gathering directory metadata. The method includes: presetting at least one acquisition task definition, together with an acquisition system definition and a collection rule definition corresponding to each acquisition task definition; and performing, for each acquisition task definition: logging in to the information source system periodically, at the task execution frequency in the current acquisition task definition and according to the acquisition system definition corresponding to the current acquisition task definition; determining the field to be collected in the information source system according to the collection rule definition corresponding to the current acquisition task definition; and collecting the directory metadata information corresponding to the field to be collected. With the acquisition task definition, acquisition system definition, and collection rule definition set in advance, directory metadata information can be collected automatically and in a distributed manner. The scheme can therefore improve the efficiency of gathering directory metadata.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for gathering directory metadata.
Background
Society has entered the era of big data, and sharing data between business units is an inevitable requirement of social development. Because the business units are isolated from one another, so are their business systems; data processing is therefore needed to achieve data sharing between business units.
At present, staff manually transcribe information such as the directory metadata in each business system and then organize it afterwards.
However, the number of business units is usually large, which makes manual gathering of directory metadata inefficient.
Summary of the invention
The invention provides a method and device for gathering directory metadata that can improve the efficiency of gathering directory metadata.
To achieve the above object, the invention is realized through the following technical solutions:
In one aspect, the invention provides a method for gathering directory metadata, in which at least one acquisition task definition is preset, together with an acquisition system definition and a collection rule definition corresponding to each acquisition task definition. The method further includes:
Performing, for each acquisition task definition: logging in to the information source system periodically, at the task execution frequency in the current acquisition task definition and according to the acquisition system definition corresponding to the current acquisition task definition;
Determining the field to be collected in the information source system according to the target collection rule definition corresponding to the current acquisition task definition;
Collecting the directory metadata information corresponding to the field to be collected.
Further, the acquisition system definition includes: a system access address and/or system access login information;
The collection rule definition includes: page information, page area information, and field information;
Determining the field to be collected in the information source system includes: matching the page to be collected in the information source system with a regular expression, according to the page information in the target collection rule definition; matching the page area to be collected in the page to be collected with any one of a regular expression, XPath (XML Path Language), or a CSS (Cascading Style Sheets) selector, according to the page area information in the target collection rule definition; and matching the field to be collected in the page area to be collected with any one of a regular expression, XPath, or a CSS selector, according to the field information in the target collection rule definition.
Further, the method also includes: judging whether materialization is defined in the current acquisition task definition and, if so, materializing the directory metadata information into the database corresponding to the data source definition, according to the data source definition in the current acquisition task definition;
Wherein the data source definition includes any one or more of: a data source address, data source access login information, and a data source type;
The directory metadata information is saved as a JSON (JavaScript Object Notation) string.
Further, the method also includes: when it is judged that materialization is not defined in the current acquisition task definition, publishing the directory metadata information through a REST API (Representational State Transfer Application Programming Interface) or through a web service based on SOAP (Simple Object Access Protocol).
Further, the acquisition task definition also includes any one or more of: a task name, task monitoring, failover, and a task description.
In another aspect, the invention provides a device for gathering directory metadata, including:
A setting unit, configured to preset at least one acquisition task definition, together with an acquisition system definition and a collection rule definition corresponding to each acquisition task definition;
A first processing unit, configured to perform, for each acquisition task definition: logging in to the information source system periodically, at the task execution frequency in the current acquisition task definition and according to the acquisition system definition corresponding to the current acquisition task definition;
A determining unit, configured to determine the field to be collected in the information source system according to the target collection rule definition corresponding to the current acquisition task definition;
A collecting unit, configured to collect the directory metadata information corresponding to the field to be collected.
Further, the acquisition system definition includes: a system access address and/or system access login information;
The collection rule definition includes: page information, page area information, and field information;
The determining unit is specifically configured to: match the page to be collected in the information source system with a regular expression, according to the page information in the target collection rule definition; match the page area to be collected in the page to be collected with any one of a regular expression, XPath, or a CSS selector, according to the page area information in the target collection rule definition; and match the field to be collected in the page area to be collected with any one of a regular expression, XPath, or a CSS selector, according to the field information in the target collection rule definition.
Further, the device for gathering directory metadata also includes: a second processing unit and a materialization unit;
The second processing unit is configured to judge whether materialization is defined in the current acquisition task definition and, if so, to trigger the materialization unit;
The materialization unit is configured to materialize the directory metadata information into the database corresponding to the data source definition, according to the data source definition in the current acquisition task definition;
Wherein the data source definition includes any one or more of: a data source address, data source access login information, and a data source type;
The directory metadata information is saved as a JSON string.
Further, the device for gathering directory metadata also includes: a publishing unit;
The second processing unit is further configured to trigger the publishing unit when it is judged that materialization is not defined in the current acquisition task definition;
The publishing unit is configured to publish the directory metadata information through a REST API or through a SOAP-based web service.
Further, the acquisition task definition also includes any one or more of: a task name, task monitoring, failover, and a task description.
The invention provides a method and device for gathering directory metadata. The method includes: presetting at least one acquisition task definition, together with an acquisition system definition and a collection rule definition corresponding to each acquisition task definition; and performing, for each acquisition task definition: logging in to the information source system periodically, at the task execution frequency in the current acquisition task definition and according to the acquisition system definition corresponding to the current acquisition task definition; determining the field to be collected in the information source system according to the collection rule definition corresponding to the current acquisition task definition; and collecting the directory metadata information corresponding to the field to be collected. With the acquisition task definition, acquisition system definition, and collection rule definition set in advance, directory metadata information can be collected automatically and in a distributed manner. The invention can therefore improve the efficiency of gathering directory metadata.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for gathering directory metadata provided by an embodiment of the invention;
Fig. 2 is a flow chart of another method for gathering directory metadata provided by an embodiment of the invention;
Fig. 3 is a schematic diagram of a device for gathering directory metadata provided by an embodiment of the invention;
Fig. 4 is a schematic diagram of another device for gathering directory metadata provided by an embodiment of the invention.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, an embodiment of the invention provides a method for gathering directory metadata, which can include the following steps:
Step 101: Preset at least one acquisition task definition, together with an acquisition system definition and a collection rule definition corresponding to each acquisition task definition.
Step 102: Perform, for each acquisition task definition: log in to the information source system periodically, at the task execution frequency in the current acquisition task definition and according to the acquisition system definition corresponding to the current acquisition task definition.
Step 103: Determine the field to be collected in the information source system according to the target collection rule definition corresponding to the current acquisition task definition.
Step 104: Collect the directory metadata information corresponding to the field to be collected.
In the method for gathering directory metadata provided by the embodiment of the invention, at least one acquisition task definition is preset, together with an acquisition system definition and a collection rule definition corresponding to each acquisition task definition. For each acquisition task definition, the method logs in to the information source system periodically, at the task execution frequency in the current acquisition task definition and according to the corresponding acquisition system definition; determines the field to be collected in the information source system according to the corresponding collection rule definition; and collects the directory metadata information corresponding to the field to be collected. With the acquisition task definition, acquisition system definition, and collection rule definition set in advance, directory metadata information can be collected automatically and in a distributed manner. The embodiment of the invention can therefore improve the efficiency of gathering directory metadata.
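Steps 101 to 104 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary keys, the `login` and `read_field` callables, and the in-memory fake source are all hypothetical names introduced here.

```python
from typing import Callable, Dict, List

# Step 101: preset definitions (all keys and values are illustrative assumptions).
TASKS: List[Dict] = [
    {
        "name": "approval-application",
        "frequency_seconds": 3600,  # task execution frequency
        "system": {"address": "http://127.0.0.1:8080/example/",
                   "username": "collector", "password": "secret"},
        "rule": {"fields": ["enterprise_name", "application_time"]},
    }
]


def run_task_once(task: Dict,
                  login: Callable[[Dict], object],
                  read_field: Callable[[object, str], str]) -> Dict[str, str]:
    """Run one cycle of a task: log in, determine the fields, collect values."""
    session = login(task["system"])                # step 102
    fields = list(task["rule"]["fields"])          # step 103
    return {name: read_field(session, name) for name in fields}  # step 104


# Hypothetical stand-ins for a real information source system.
FAKE_SOURCE = {"enterprise_name": "Example Co.",
               "application_time": "2017-01-01",
               "ignored": "x"}


def fake_login(system: Dict) -> Dict[str, str]:
    if not system.get("address"):
        raise ValueError("acquisition system definition needs an address")
    return FAKE_SOURCE


def fake_read_field(session: Dict[str, str], name: str) -> str:
    return session[name]
```

Calling `run_task_once(TASKS[0], fake_login, fake_read_field)` returns only the two fields named in the rule; a scheduler would repeat the call at the interval given by `frequency_seconds`.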
In an embodiment of the invention, the acquisition system definition includes: a system access address and/or system access login information;
The collection rule definition includes: page information, page area information, and field information;
Determining the field to be collected in the information source system includes: matching the page to be collected in the information source system with a regular expression, according to the page information in the target collection rule definition; matching the page area to be collected in the page to be collected with any one of a regular expression, XPath, or a CSS selector, according to the page area information in the target collection rule definition; and matching the field to be collected in the page area to be collected with any one of a regular expression, XPath, or a CSS selector, according to the field information in the target collection rule definition.
In detail, the system access login information can be the account and password used to log in to the system. The acquisition system definition defines the system from which the directory metadata information originates, and so provides the conditions needed for system login.
In the embodiment of the invention, there are usually multiple information source systems, and each of them contains the directory metadata information that some acquisition task definitions require.
In an embodiment of the invention, the databases corresponding to different information source systems can differ, and the servers hosting different databases can differ. Directory metadata information can thus be collected in a distributed manner: tasks are sharded across multiple servers to achieve load balancing, failover, and status monitoring.
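The source does not say how tasks are sharded or how failover is carried out. The sketch below assumes the simplest possible scheme, round-robin assignment plus reassignment to the least-loaded surviving server, purely to illustrate the idea; the function names and the scheme itself are assumptions.

```python
from typing import Dict, Iterable, List


def shard_tasks(task_names: Iterable[str], servers: List[str]) -> Dict[str, List[str]]:
    """Spread tasks over servers round-robin so each server gets a similar load."""
    if not servers:
        raise ValueError("at least one server is required")
    shards: Dict[str, List[str]] = {server: [] for server in servers}
    for index, name in enumerate(sorted(task_names)):
        shards[servers[index % len(servers)]].append(name)
    return shards


def fail_over(shards: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Reassign the tasks of a failed server to the surviving servers."""
    survivors = [server for server in shards if server != failed]
    if not survivors:
        raise RuntimeError("no surviving server to take over the tasks")
    result = {server: list(shards[server]) for server in survivors}
    for name in shards.get(failed, []):
        # Give each orphaned task to the currently least-loaded survivor.
        target = min(survivors, key=lambda server: len(result[server]))
        result[target].append(name)
    return result
```

A production system would more likely rely on a distributed scheduler for this; the sketch only shows that sharding and failover reduce to bookkeeping over a task-to-server map.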
In an embodiment of the invention, a collection rule definition can define pages, page areas, and fields. Within the same collection rule definition, at least one page can be defined, at least one page area can be defined in each page, and at least one field can be defined in each page area.
Pages can be matched with a regular expression; page areas and fields can be matched with any one of a regular expression, XPath, or a CSS selector.
For example, a collection rule definition can be divided into three steps:
1st step: Define the page;
2nd step: Define the page area;
3rd step: Define the field.
For example, for the 1st step above: the access address of the page where the directory metadata information is located can be defined, and more than one can be defined, such as:
Page A: http://127.0.0.1:8080/example/index1.html;
Page B: http://127.0.0.1:8080/example/index2.html;
The page address definition supports regular expressions, so the definitions of page A and page B can be reduced to: http://127.0.0.1:8080/example/index*.html.
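The shorthand `index*.html` looks like a wildcard pattern. The sketch below assumes an explicit regular expression with the same intent, `index\d+\.html`; that pattern and the helper name `pages_to_collect` are assumptions made for illustration.

```python
import re

# Assumed strict-regex form of http://127.0.0.1:8080/example/index*.html
PAGE_PATTERN = re.compile(r"http://127\.0\.0\.1:8080/example/index\d+\.html")


def pages_to_collect(urls):
    """Return the URLs that the page definition selects, in input order."""
    return [url for url in urls if PAGE_PATTERN.fullmatch(url)]


candidates = [
    "http://127.0.0.1:8080/example/index1.html",  # page A
    "http://127.0.0.1:8080/example/index2.html",  # page B
    "http://127.0.0.1:8080/example/about.html",   # not selected
]
```

Using `fullmatch` rather than `search` keeps the rule from accidentally selecting longer URLs that merely contain the pattern.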
For the 2nd step above: the area of the page where the directory metadata information is located can be chosen. Page area selection supports regular expressions, CSS selectors, and XPath. For example, if page A and page B both contain a form named form1, the form can be selected with the CSS selector: form[name='form1'].
For the 3rd step above: a page area may contain many fields, not all of which are wanted, so a rule for choosing fields can be defined as well. If the wanted fields share a common trait, for instance the CSS class cs1, they can be selected with: .cs1
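The 2nd and 3rd steps can be sketched with the XPath route the text also allows. The example uses the limited XPath subset of Python's standard `xml.etree.ElementTree` and assumes a well-formed page fragment; the page content and the function name `collect_fields` are hypothetical, and real pages would normally need an HTML-aware engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html><body>
  <form name="form1">
    <input class="cs1" name="enterprise_name" value="Example Co."/>
    <input class="cs1" name="application_time" value="2017-01-01"/>
    <input class="other" name="csrf_token" value="abc"/>
  </form>
  <form name="form2"><input class="cs1" name="unrelated" value="x"/></form>
</body></html>
"""


def collect_fields(page_source: str) -> dict:
    """Select the form1 area, then the cs1 fields inside it."""
    root = ET.fromstring(page_source.strip())
    # 2nd step: page area, the XPath counterpart of form[name='form1'].
    area = root.find(".//form[@name='form1']")
    if area is None:
        return {}
    # 3rd step: fields, the XPath counterpart of .cs1 (exact class match only).
    return {node.get("name"): node.get("value")
            for node in area.findall(".//*[@class='cs1']")}
```

Note that `[@class='cs1']` matches only elements whose class attribute is exactly `cs1`, whereas the CSS selector `.cs1` also matches elements carrying several classes.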
In an embodiment of the invention, the open-source HtmlUnit technology, which is based on an open-source browser engine framework, can be used to match pages, page areas, and fields against regular expressions, XPath, and CSS selectors.
In the embodiment of the invention, depending on the practical application requirements, the collected directory metadata information can be handled afterwards in at least the following two ways:
Mode 1: Materialize to a database;
Mode 2: Open to third parties.
In detail, corresponding to mode 1 above:
In an embodiment of the invention, the method may further include: judging whether materialization is defined in the current acquisition task definition and, if so, materializing the directory metadata information into the database corresponding to the data source definition, according to the data source definition in the current acquisition task definition;
Wherein the data source definition includes any one or more of: a data source address, data source access login information, and a data source type;
The directory metadata information is saved as a JSON string.
In detail, after the directory metadata information is acquired, it can be generated as tables in the defined database, in preparation for subsequent directory data collection.
In detail, when an acquisition task definition is set, it can be decided whether the directory metadata information collected by each acquisition task needs to be materialized. For example, when staff consider materialization necessary, they can tick the task attribute "whether to materialize". If materialization is needed, the acquisition task definition can also include a data source definition, which can include a data source address, data source access login information, a data source type, and so on.
When materialization is needed, a specific data source, such as a database on a certain server, can be determined from the data source definition, and the collected directory metadata information can then be stored as database tables. In another embodiment of the invention, the collected directory metadata information can equally be stored as files.
In an embodiment of the invention, each item of directory metadata information can be saved under the uppercase initials of its Chinese pinyin, and the data is saved as a JSON string.
For example, suppose the directory metadata information of a task named "approval application" has just been collected, containing five items: enterprise name, application time, social credit code, enterprise address, and enterprise legal person. If materialization is determined, the system, for example a system running on a device for gathering directory metadata, can materialize these five items directly into a table structure and generate a table containing these five fields in the defined database.
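The five-field example can be sketched with an in-memory SQLite database standing in for the defined data source. The table name, the English column names, the sample values, and the `materialize` helper are assumptions made for illustration; the source does not specify a schema or a database product.

```python
import json
import sqlite3

# Hypothetical collected record for the "approval application" task.
record = {
    "enterprise_name": "Example Co.",
    "application_time": "2017-01-01",
    "social_credit_code": "000000000000000000",
    "enterprise_address": "1 Example Road",
    "legal_person": "Zhang San",
}


def materialize(conn: sqlite3.Connection, table: str, row: dict) -> None:
    """Create a table with one TEXT column per collected field and insert the row."""
    if not table.isidentifier() or not all(key.isidentifier() for key in row):
        raise ValueError("table and column names must be plain identifiers")
    columns = ", ".join(f"{key} TEXT" for key in row)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
    placeholders = ", ".join("?" for _ in row)
    column_list = ", ".join(row)
    conn.execute(
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})",
        list(row.values()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the defined data source
materialize(conn, "approval_application", record)
json_string = json.dumps(record, ensure_ascii=False)  # JSON-string save format
```

Values are passed as bound parameters; only the table and column names are interpolated, and those are checked to be plain identifiers first.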
In detail, corresponding to mode 2 above:
In an embodiment of the invention, the method may further include: when it is judged that materialization is not defined in the current acquisition task definition, publishing the directory metadata information through a REST API or through a SOAP-based web service.
In detail, unlike materialization, the system in the embodiment of the invention can also serve merely as a tool provided to third parties. In this case the system directly provides the collected directory metadata information, and the third party takes the result and processes it further. For example, the third party can modify the information and then materialize it, or store it directly in a big data file system.
In detail, there are two ways of providing the information: one is a SOAP web service, the other is a REST API. Preferably, the REST API is used, that is, an address is opened that third parties call directly to obtain the result.
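Opening an address for third parties can be sketched with Python's standard `http.server`. The endpoint path `/directory-metadata`, the helper names, and the response shape are assumptions; the source only states that a REST-style address is exposed.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

ENDPOINT = "/directory-metadata"  # hypothetical address opened to third parties


def build_response(path, records):
    """Return (status, body_bytes) for a GET request on the given path."""
    if path != ENDPOINT:
        return 404, b'{"error": "not found"}'
    return 200, json.dumps(records, ensure_ascii=False).encode("utf-8")


def make_handler(records):
    """Build a request handler class that serves the collected records as JSON."""

    class MetadataHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = build_response(self.path, records)
            self.send_response(status)
            self.send_header("Content-Type", "application/json; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):  # keep the sketch quiet
            pass

    return MetadataHandler


def make_server(records, port=0):
    """Create the server; port 0 lets the operating system pick a free port."""
    return HTTPServer(("127.0.0.1", port), make_handler(records))
```

Calling `make_server(records).serve_forever()` would start serving; a third party then issues a GET request on the endpoint to obtain the collected result.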
In the embodiment of the invention, distributed tasks can be used together with the open-source HtmlUnit technology to gather the directory, and the collection results are published as a service to achieve sharing.
In an embodiment of the invention, the acquisition task definition also includes any one or more of: a task name, task monitoring, failover, and a task description.
In detail, an acquisition task definition can be set in advance to enable the gathering of directory metadata. Besides the task execution frequency, whether to materialize, and the data source definition required when materializing, the acquisition task definition can also include a task name, task monitoring, failover, a task description, and so on.
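The contents of an acquisition task definition can be sketched as a pair of data classes. The field names, types, and defaults are assumptions chosen to mirror the attributes listed above; `is_due` is a hypothetical helper showing how the task execution frequency might drive periodic execution.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DataSourceDefinition:
    address: str              # data source address
    login: str = ""           # data source access login information
    source_type: str = ""     # data source type


@dataclass
class AcquisitionTaskDefinition:
    name: str                                   # task name
    frequency: timedelta                        # task execution frequency
    materialize: bool = False                   # whether to materialize
    data_source: Optional[DataSourceDefinition] = None
    monitored: bool = True                      # task monitoring
    failover: bool = True                       # failover
    description: str = ""                       # task description

    def __post_init__(self):
        # Materialization needs somewhere to write to.
        if self.materialize and self.data_source is None:
            raise ValueError("materialization requires a data source definition")

    def is_due(self, last_run: Optional[datetime], now: datetime) -> bool:
        """A task is due if it never ran or its interval has elapsed."""
        return last_run is None or now - last_run >= self.frequency
```

Validating the materialization requirement at construction time means a misconfigured task is rejected before it is ever scheduled.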
As shown in Fig. 2, an embodiment of the invention provides another method for gathering directory metadata, which specifically includes the following steps:
Step 201: Preset at least one acquisition task definition, together with an acquisition system definition and a collection rule definition corresponding to each acquisition task definition.
In detail, an acquisition task definition can include a task name, task execution frequency, task monitoring, failover, a task description, whether to materialize, a data source definition, and so on.
In detail, an acquisition system definition can include a system access address and system access login information.
In detail, a collection rule definition can include page information, page area information, and field information.
Step 202: Perform, for each acquisition task definition: log in to the information source system periodically, at the task execution frequency in the current acquisition task definition and according to the acquisition system definition corresponding to the current acquisition task definition.
Step 203: Determine the target collection rule definition corresponding to the current acquisition task definition.
Step 204: According to the page information in the target collection rule definition, match the page to be collected in the information source system with a regular expression.
In detail, a collection rule definition can include information about pages, page areas, and fields.
Step 205: According to the page area information in the target collection rule definition, match the page area to be collected in the page to be collected with a CSS selector.
Step 206: According to the field information in the target collection rule definition, match the field to be collected in the page area to be collected with a CSS selector.
Step 207: Collect the directory metadata information corresponding to the field to be collected.
Step 208: Judge whether materialization is defined in the current acquisition task definition; if so, materialize the directory metadata information into the database corresponding to the data source definition, according to the data source definition in the current acquisition task definition; otherwise, publish the directory metadata information through a REST API.
Of course, in another embodiment of the invention, the directory metadata information can equally be published through a SOAP-based web service.
In detail, the data source definition can include any one or more of: a data source address, data source access login information, and a data source type.
In detail, the directory metadata information is saved as a JSON string.
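The branch in step 208 can be sketched as a small dispatcher. The task dictionary keys and the `materialize` and `publish` callables are hypothetical; they stand in for whatever database writer and publishing service a real deployment provides.

```python
import json
from typing import Callable, Dict


def handle_collected(task: Dict, record: Dict,
                     materialize: Callable[[object, str], None],
                     publish: Callable[[str], None]) -> str:
    """Step 208: materialize if the task defines it, otherwise publish."""
    payload = json.dumps(record, ensure_ascii=False)  # JSON-string save format
    if task.get("materialize"):
        # The data source definition tells the writer where to store the payload.
        materialize(task["data_source"], payload)
        return "materialized"
    publish(payload)
    return "published"
```

Passing the two actions in as callables keeps the decision logic independent of any particular database or web framework.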
In summary, the method for gathering directory metadata described in the embodiment of the invention can involve the following parts: the acquisition task definition, the acquisition system definition, the collection rule definition, the data source definition, and the directory metadata information publishing definition.
The acquisition task definition configures the rules of the acquisition task engine so that the system captures data automatically. The acquisition system definition defines the system from which the directory metadata information originates. The collection rule definition specifies the rules for extracting directory metadata information, so that the system extracts it automatically according to the defined rules. The data source definition defines the data source into which the collected directory metadata information is to be materialized; after collecting the directory metadata information, the system can materialize the information items into the specified data source as needed. The directory metadata information publishing definition publishes the collected directory metadata information, making it easy for third-party systems to use the results of this method embodiment.
Based on the above, the method for gathering directory metadata described in the embodiment of the invention needs to be configured only once and then collects directory metadata information automatically without manual intervention. This saves labor costs, helps improve the efficiency and accuracy of gathering directory metadata, and promotes the smooth execution of data sharing work.
As shown in Fig. 3, an embodiment of the invention provides a device for gathering directory metadata, including:
A setting unit 301, configured to preset at least one acquisition task definition, together with an acquisition system definition and a collection rule definition corresponding to each acquisition task definition;
A first processing unit 302, configured to perform, for each acquisition task definition: logging in to the information source system periodically, at the task execution frequency in the current acquisition task definition and according to the acquisition system definition corresponding to the current acquisition task definition;
A determining unit 303, configured to determine the field to be collected in the information source system according to the target collection rule definition corresponding to the current acquisition task definition;
A collecting unit 304, configured to collect the directory metadata information corresponding to the field to be collected.
In an embodiment of the invention, the acquisition system definition includes: a system access address and/or system access login information;
The collection rule definition includes: page information, page area information, and field information;
The determining unit 303 is specifically configured to: match the page to be collected in the information source system with a regular expression, according to the page information in the target collection rule definition; match the page area to be collected in the page to be collected with any one of a regular expression, XPath, or a CSS selector, according to the page area information in the target collection rule definition; and match the field to be collected in the page area to be collected with any one of a regular expression, XPath, or a CSS selector, according to the field information in the target collection rule definition.
In an embodiment of the invention, referring to Fig. 4, the device for gathering directory metadata can also include: a second processing unit 401 and a materialization unit 402;
The second processing unit 401 is configured to judge whether materialization is defined in the current acquisition task definition and, if so, to trigger the materialization unit 402;
The materialization unit 402 is configured to materialize the directory metadata information into the database corresponding to the data source definition, according to the data source definition in the current acquisition task definition;
Wherein the data source definition includes any one or more of: a data source address, data source access login information, and a data source type;
The directory metadata information is saved as a JSON string.
In an embodiment of the invention, referring to Fig. 4, the device for gathering directory metadata can also include: a publishing unit 403;
The second processing unit 401 is further configured to trigger the publishing unit 403 when it is judged that materialization is not defined in the current acquisition task definition;
The publishing unit 403 is configured to publish the directory metadata information through a REST API or through a SOAP-based web service.
In an embodiment of the invention, the acquisition task definition also includes any one or more of: a task name, task monitoring, failover, and a task description.
The contents such as the information exchange between each unit, implementation procedure in said apparatus, due to implementing with the inventive method
Example is based on same design, and particular content can be found in the narration in the inventive method embodiment, and here is omitted.
In summary, the embodiments of the invention have at least the following advantages:
1. In the embodiments of the invention, at least one acquisition task definition is preset, together with
an acquisition system definition and a collection rule definition corresponding to each acquisition task
definition. For each acquisition task definition, the following is performed: the information source
system is logged in to periodically, according to the task execution frequency in the current acquisition
task definition and the corresponding acquisition system definition; the field to be collected in the
information source system is determined according to the corresponding collection rule definition; and
the directory metadata information corresponding to the field to be collected is collected. With the
acquisition task definitions, acquisition system definitions, and collection rule definitions set in
advance, distributed automatic collection of directory metadata information can be achieved. The
embodiments of the invention can therefore improve the efficiency of collecting directory metadata.
2. The method for gathering directory metadata described in the embodiments of the invention needs to be
configured only once and then collects directory metadata information automatically, without manual
intervention. This saves labor cost, helps improve the efficiency and accuracy of directory metadata
collection, and supports the smooth progress of data sharing work.
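The periodic flow summarized above (check the frequency, log in, determine the fields, collect) can be sketched as follows. `Task`, `FakeSource`, `due`, `run_once`, and `tick` are hypothetical names, the source system is simulated by a dictionary, and time is passed in explicitly so the example stays deterministic; a real deployment would use a scheduler and real network access.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    frequency_seconds: int
    system: dict  # acquisition system definition: address and login information
    rule: dict    # collection rule definition: here just the field names
    last_run: float = float("-inf")


class FakeSource:
    """Stand-in for an information source system."""

    def __init__(self, values):
        self.values = values
        self.logged_in = False

    def login(self, system):
        self.logged_in = system["user"] == "collector"

    def read(self, field_name):
        if not self.logged_in:
            raise PermissionError("not logged in")
        return self.values.get(field_name)


def due(task, now):
    """A task is due once its configured period has elapsed."""
    return now - task.last_run >= task.frequency_seconds


def run_once(task, source, now):
    source.login(task.system)             # log in per the system definition
    fields = task.rule["fields"]          # determine the fields to collect
    task.last_run = now
    return {name: source.read(name) for name in fields}  # collect the metadata


def tick(tasks, source, now):
    """Run every task that is due at time `now`."""
    return {t.name: run_once(t, source, now) for t in tasks if due(t, now)}


source = FakeSource({"title": "Population by region", "owner": "Statistics Bureau", "noise": "x"})
task = Task(
    "daily",
    60,
    {"address": "https://portal.example", "user": "collector"},
    {"fields": ["title", "owner"]},
)
first = tick([task], source, 0.0)
second = tick([task], source, 30.0)
third = tick([task], source, 60.0)
```

Once the task is configured, each later call to `tick` needs no further input, which mirrors the "configure once" property claimed in advantage 2.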
It should be noted that herein, such as first and second etc relational terms are used merely to an entity
Or operation makes a distinction with another entity or operation, and not necessarily require or imply and exist between these entities or operation
Any this actual relation or order.Moreover, term " comprising ", "comprising" or its any other variant be intended to it is non-
It is exclusive to include, so that process, method, article or equipment including a series of elements not only include those key elements,
But also the other element including being not expressly set out, or also include solid by this process, method, article or equipment
Some key elements.In the absence of more restrictions, by sentence " including the key element that a 〃 〃 " is limited, it is not excluded that
Other identical factor in the process including the key element, method, article or equipment also be present.
A person of ordinary skill in the art will understand that all or part of the steps of the above method
embodiments may be carried out by hardware under the control of program instructions. The program may be
stored in a computer-readable storage medium and, when executed, performs the steps of the above method
embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a
magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing are merely preferred embodiments of the invention. They
are intended only to illustrate its technical solutions and not to limit its scope of protection. Any
modification, equivalent substitution, or improvement made within the spirit and principles of the
invention falls within the scope of protection of the invention.
Claims (10)
- 1. A method for gathering directory metadata, characterized in that at least one acquisition task definition is preset, together with an acquisition system definition and a collection rule definition corresponding to each acquisition task definition; the method further comprising: performing, for each acquisition task definition: periodically logging in to an information source system according to the task execution frequency in the current acquisition task definition and the acquisition system definition corresponding to the current acquisition task definition; determining the field to be collected in the information source system according to the target collection rule definition corresponding to the current acquisition task definition; and collecting the directory metadata information corresponding to the field to be collected.
- 2. The method according to claim 1, characterized in that the acquisition system definition includes: a system access address and/or system access login information; the collection rule definition includes: page information, page area information, and field information; and determining the field to be collected in the information source system comprises: matching the page to be collected in the information source system using a regular expression, according to the page information in the target collection rule definition; matching the page area to be collected in the page to be collected using any one of a regular expression, XML Path Language (XPath), or a Cascading Style Sheets (CSS) selector, according to the page area information in the target collection rule definition; and matching the field to be collected in the page area to be collected using any one of a regular expression, XPath, or a CSS selector, according to the field information in the target collection rule definition.
- 3. The method according to claim 1, characterized by further comprising: judging whether materialization is defined in the current acquisition task definition and, if so, materializing the directory metadata information into the database corresponding to the data source definition, according to the data source definition in the current acquisition task definition; wherein the data source definition includes any one or more of: data source address, data source access login information, and data source type; and the directory metadata information is stored as a JavaScript Object Notation (JSON) string.
- 4. The method according to claim 3, characterized by further comprising: upon judging that materialization is not defined in the current acquisition task definition, publishing the directory metadata information through a RESTful application programming interface (REST API) or through a web service based on the Simple Object Access Protocol (SOAP).
- 5. The method according to any one of claims 1 to 4, characterized in that the acquisition task definition further includes any one or more of: task name, task monitoring, failover, and task description.
- 6. A device for gathering directory metadata, characterized by comprising: a setting unit, configured to preset at least one acquisition task definition, together with an acquisition system definition and a collection rule definition corresponding to each acquisition task definition; a first processing unit, configured to perform, for each acquisition task definition: periodically logging in to an information source system according to the task execution frequency in the current acquisition task definition and the acquisition system definition corresponding to the current acquisition task definition; a determining unit, configured to determine the field to be collected in the information source system according to the target collection rule definition corresponding to the current acquisition task definition; and a collecting unit, configured to collect the directory metadata information corresponding to the field to be collected.
- 7. The device for gathering directory metadata according to claim 6, characterized in that the acquisition system definition includes: a system access address and/or system access login information; the collection rule definition includes: page information, page area information, and field information; and the determining unit is specifically configured to: match the page to be collected in the information source system using a regular expression, according to the page information in the target collection rule definition; match the page area to be collected in the page to be collected using any one of a regular expression, XML Path Language (XPath), or a Cascading Style Sheets (CSS) selector, according to the page area information in the target collection rule definition; and match the field to be collected in the page area to be collected using any one of a regular expression, XPath, or a CSS selector, according to the field information in the target collection rule definition.
- 8. The device for gathering directory metadata according to claim 6, characterized by further comprising: a second processing unit and a materialization unit; the second processing unit being configured to judge whether materialization is defined in the current acquisition task definition and, if so, to trigger the materialization unit; the materialization unit being configured to materialize the directory metadata information into the database corresponding to the data source definition, according to the data source definition in the current acquisition task definition; wherein the data source definition includes any one or more of: data source address, data source access login information, and data source type; and the directory metadata information is stored as a JavaScript Object Notation (JSON) string.
- 9. The device for gathering directory metadata according to claim 8, characterized by further comprising: a release unit; the second processing unit being further configured to trigger the release unit upon judging that materialization is not defined in the current acquisition task definition; and the release unit being configured to publish the directory metadata information through a RESTful application programming interface (REST API) or through a web service based on the Simple Object Access Protocol (SOAP).
- 10. The device for gathering directory metadata according to any one of claims 6 to 9, characterized in that the acquisition task definition further includes any one or more of: task name, task monitoring, failover, and task description.
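The three-level matching recited in claims 2 and 7 (page, then page area, then field) can be illustrated with regular expressions alone. This is a hedged sketch: the `rule` layout, the URLs, and the HTML are invented for the example, and the XPath and CSS selector options named in the claims are not shown because they would need a third-party parser.

```python
import re

# Hypothetical collection rule definition: page, page area, and field patterns.
rule = {
    "page": r"^https://portal\.example/catalog/\d+$",
    "region": r'<div id="meta">(.*?)</div>',
    "fields": {
        "title": r"<h1>(.*?)</h1>",
        "owner": r'<span class="owner">(.*?)</span>',
    },
}

# Invented pages standing in for an information source system.
pages = {
    "https://portal.example/catalog/17": (
        '<div id="nav"><h1>Menu</h1></div>'
        '<div id="meta"><h1>Population by region</h1>'
        '<span class="owner">Statistics Bureau</span></div>'
    ),
    "https://portal.example/about": '<div id="meta"><h1>About</h1></div>',
}


def collect(rule, pages):
    """Apply page, page area, and field patterns in that order."""
    page_re = re.compile(rule["page"])
    region_re = re.compile(rule["region"], re.S)
    results = {}
    for url, html in pages.items():
        if not page_re.search(url):        # level 1: page to be collected
            continue
        region = region_re.search(html)    # level 2: page area to be collected
        if region is None:
            continue
        block = region.group(1)
        record = {}
        for name, pattern in rule["fields"].items():  # level 3: fields
            match = re.search(pattern, block, re.S)
            if match:
                record[name] = match.group(1).strip()
        results[url] = record
    return results


collected = collect(rule, pages)
```

Because fields are searched only inside the matched area, the `<h1>Menu</h1>` heading in the navigation block is never mistaken for the title.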
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711146068.0A CN107871009A (en) | 2017-11-17 | 2017-11-17 | A kind of method and device for gathering directory metadata |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107871009A true CN107871009A (en) | 2018-04-03 |
Family
ID=61754063
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711146068.0A Pending CN107871009A (en) | 2017-11-17 | 2017-11-17 | A kind of method and device for gathering directory metadata |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107871009A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109360636A (en) * | 2018-11-16 | 2019-02-19 | 江苏盛益医疗科技有限公司 | A kind of hospital information management system |
CN112988730A (en) * | 2021-03-29 | 2021-06-18 | 国网宁夏电力有限公司电力科学研究院 | Metadata collection method based on enterprise data inventory |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104361022A (en) * | 2014-10-22 | 2015-02-18 | 浪潮软件集团有限公司 | Method based on collected data statistics and foreground display |
CN104484424A (en) * | 2014-12-19 | 2015-04-01 | 浪潮通用软件有限公司 | Establishing method for resource price information base of construction enterprise based on internet |
US20150186521A1 (en) * | 2013-12-31 | 2015-07-02 | Clicktale Ltd. | Method and system for tracking and gathering multivariate testing data |
CN104794161A (en) * | 2015-03-24 | 2015-07-22 | 浪潮集团有限公司 | Method for monitoring network public opinions |
CN104915259A (en) * | 2015-06-15 | 2015-09-16 | 浪潮软件集团有限公司 | Task scheduling method applied to distributed acquisition system |
CN106096056A (en) * | 2016-06-30 | 2016-11-09 | 西南石油大学 | A kind of based on distributed public sentiment data real-time collecting method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106484828B (en) | Distributed internet data rapid acquisition system and acquisition method | |
TWI290698B (en) | System and method for updating and displaying patent citation information | |
US20070198907A1 (en) | System and method for enabling persistent values when navigating in electronic documents | |
DE102017111438A1 (en) | API LEARNING | |
CN106897322A (en) | The access method and device of a kind of database and file system | |
CN107544984A (en) | A kind of method and apparatus of data processing | |
KR102222287B1 (en) | Web Crawler System for Collecting a Structured and Unstructured Data in Hidden URL | |
CN107092639A (en) | A kind of search engine system | |
CN110321383A (en) | Big data platform method of data synchronization, device, computer equipment and storage medium | |
CN104933168B (en) | A kind of web page contents automatic acquiring method | |
CN107506464A (en) | A kind of method that HBase secondary indexs are realized based on ES | |
CN105528218B (en) | Data drawing list Cascading Methods and data drawing list cascade system | |
CN110245145A (en) | Structure synchronization method and apparatus of the relevant database to Hadoop database | |
Suzanti et al. | REST API implementation on android based monitoring application | |
CN109150585A (en) | A kind of network O&M failure solution, system, device and storage medium | |
CN107871009A (en) | A kind of method and device for gathering directory metadata | |
CN103914486B (en) | Document search and display system | |
US8341168B1 (en) | System for displaying hierarchical data | |
Luo et al. | Efficacy of transcatheter aortic valve implantation in patients with aortic stenosis and reduced LVEF | |
DE112016004967T5 (en) | Automated discovery of information | |
CN113130086A (en) | Health medical big data platform | |
CN104331512A (en) | Automatic BBS (bulletin board system) page acquisition method | |
US20150066949A1 (en) | Computerized systems and methods for social networking | |
CN108073637A (en) | A kind of method for establishing left-eyed flounder economic characters information database | |
EP3523732A1 (en) | Systems and methods for efficiently distributing alert messages |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180403 |