CN107341217A - Data acquisition method and device - Google Patents

Data acquisition method and device

Info

Publication number
CN107341217A
Authority
CN
China
Prior art keywords
elasticsearch
data
search engine
methods
data acquisition
Legal status
Granted
Application number
CN201710501301.6A
Other languages
Chinese (zh)
Other versions
CN107341217B (en)
Inventor
支猛
张文明
陈少杰
Current Assignee
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Application filed by Wuhan Douyu Network Technology Co Ltd
Priority to CN201710501301.6A (CN107341217B)
Publication of CN107341217A
Priority to PCT/CN2017/120216 (WO2019000897A1)
Application granted
Publication of CN107341217B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • User Interface Of Digital Computer (AREA)
  • Stored Programmes (AREA)

Abstract

The present invention provides a data acquisition method and device. The method includes: supplying configured data query conditions, together with rules for parsing the data returned by ElasticSearch, to a predefined data acquisition component; and calling the data acquisition component to send a scroll query request to the search engine ElasticSearch and obtain ElasticSearch's parsed result for the scroll query request. By calling a custom data acquisition component to fetch large volumes of data from ElasticSearch, the proposed method and device make data acquisition more reliable, ordered, real-time, and free of duplicates than using the ElasticSearch Scroll API directly.

Description

Data acquisition method and device
Technical field
The present invention relates to the field of software engineering, and more particularly to a data acquisition method and device.
Background
ElasticSearch is an excellent open-source distributed search engine. Beyond search, it is also a powerful tool for log storage and for offline data analysis and mining. With ElasticSearch, the logs that online applications write to disk while running can be collected centrally in real time, and the collected logs can be stored in an ElasticSearch cluster.
Logs stored in an ElasticSearch cluster serve two application scenarios. First, on a purpose-built log center platform, developers set search conditions to view the various logs output by applications, which helps them understand how online applications are running and quickly locate problems in them. Second, a Storm cluster can pull logs from the ElasticSearch cluster in batches and in real time to perform complex aggregation computations, such as distributed call-chain computation. Both scenarios require fetching large amounts of data from the ElasticSearch cluster quickly, continuously, and in real time. ElasticSearch provides the Scroll API (scrolling search) so that it can execute large-batch queries quickly and efficiently.
However, the Scroll API is intended for processing large amounts of data, not for real-time user requests, and whenever an application initiates a new Scroll API call, ElasticSearch returns data from the beginning, so the client receives duplicate data. Using the Scroll API provided by ElasticSearch directly therefore causes the following problem for applications: they cannot be guaranteed to obtain large batches of data reliably, in order, in real time, and without duplicates.
Summary
To overcome the problem that directly using the Scroll API provided by ElasticSearch cannot obtain large batches of data reliably, in order, in real time, and without duplicates, the present invention provides a data acquisition method and device.
According to one aspect of the present invention, a data acquisition method is provided, including:
S1: supplying configured data query conditions, together with rules for parsing the data returned by ElasticSearch, to a predefined data acquisition component;
S2: calling the data acquisition component to send a scroll query request to the search engine ElasticSearch, and obtaining ElasticSearch's parsed result for the scroll query request.
Before step S1, the method further includes:
S0: implementing the data acquisition component based on the ElasticSearch Scroll API.
The data acquisition component specifically includes a prepare-query interface class and a scroll query component class.
The prepare-query interface class includes a prepare method and a parseResult method. The prepare method provides the query conditions set by the developer to the data acquisition component, and the parseResult method provides the developer-defined rules for parsing the data obtained from ElasticSearch to the data acquisition component.
The scroll query component class includes a doScrollSearch method, which obtains data from ElasticSearch by way of the ElasticSearch Scroll API. The input parameter of doScrollSearch is an instance of the prepare-query interface class.
Step S1 further includes:
S11: instantiating the prepare-query interface class to obtain an instance object of the prepare-query interface class;
S12: passing the instance object to the doScrollSearch method of the scroll query component.
Step S2 further includes:
S21: in the doScrollSearch method, calling back the prepare method to obtain the query conditions set by the developer, and sending a scroll query request to ElasticSearch;
S22: when ElasticSearch's result for the scroll query request is received, calling back the parseResult method to parse the result and obtain the parsed ElasticSearch data;
S23: returning the parsed ElasticSearch data.
The scroll query request includes the query conditions set by the developer, a request context ID, an offset query parameter, and the index accessed last time.
After the step of sending the scroll query request to ElasticSearch in step S21, the method further includes:
causing ElasticSearch to sort the data in ascending order by the offset field.
Step S22 further includes:
if it is learned, from the request context ID, that ElasticSearch can no longer obtain data corresponding to the request context, sending a new scroll query request to ElasticSearch.
According to another aspect of the present invention, a data acquisition device is provided, including a memory, a processor, and a bus,
the processor and the memory communicating with each other through the bus;
the memory storing program instructions executable by the processor, and the processor calling the program instructions in the memory to perform the foregoing data acquisition method.
According to a further aspect of the present invention, a non-transitory computer-readable storage medium is provided. The storage medium stores computer instructions that cause a computer to perform the foregoing data acquisition method.
With the data acquisition method and device proposed by the present invention, a custom data acquisition component is called to fetch large volumes of data from ElasticSearch, making data acquisition more reliable, ordered, real-time, and free of duplicates than using the ElasticSearch Scroll API directly.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a data acquisition method according to one embodiment of the present invention;
Fig. 2 is a schematic flowchart of step S2 in Fig. 1 according to another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data acquisition device according to another embodiment of the present invention.
Detailed description
Specific implementations of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following embodiments illustrate the present invention but do not limit its scope.
Fig. 1 is a schematic flowchart of a data acquisition method according to one embodiment of the present invention. The method includes:
S1: supplying configured data query conditions, together with rules for parsing the data returned by ElasticSearch, to a predefined data acquisition component;
S2: calling the data acquisition component to send a scroll query request to the search engine ElasticSearch, and obtaining ElasticSearch's parsed result for the scroll query request.
Specifically, to obtain large batches of data from ElasticSearch quickly, continuously, and in real time for a specific business scenario, an application can call the predefined data acquisition component to send scroll query requests (Scroll API requests) to ElasticSearch. The data acquisition component replaces the direct Scroll API interaction with ElasticSearch used in the prior art, retrieves the data in ElasticSearch that satisfy the query conditions, and can parse the retrieved ElasticSearch data into the object types required by the specific business scenario. Before calling the component, the developer only needs to supply it with the configured data query conditions and the rules for parsing the data returned by ElasticSearch; calling the component then yields the parsed ElasticSearch data.
The scroll query request that the data acquisition component sends to ElasticSearch contains the developer's query condition of type QueryBuilder, to which the component adds an offset (data offset value) query parameter, a scrollId parameter (request context ID), and the index accessed last time. By contrast, an existing scroll query request sent through the Scroll API usually contains only the developer's query condition. The component relies on the scrollId parameter to make data acquisition real-time and free of duplicates, and on the offset mechanism to keep it ordered.
With the data acquisition method provided by this embodiment of the present invention, a custom data acquisition component is called to fetch large volumes of data from ElasticSearch, making data acquisition more reliable, ordered, real-time, and free of duplicates than using the ElasticSearch Scroll API directly.
In another embodiment of the present invention, on the basis of the above embodiment, the method further includes, before step S1:
S0: implementing the data acquisition component based on the ElasticSearch Scroll API;
The data acquisition component specifically includes a prepare-query interface class and a scroll query component class.
The prepare-query interface class includes a prepare method and a parseResult method. The prepare method provides the query conditions set by the developer to the data acquisition component, and the parseResult method provides the developer-defined rules for parsing the data obtained from ElasticSearch to the data acquisition component.
The scroll query component class includes a doScrollSearch method, which obtains data from ElasticSearch by way of the ElasticSearch Scroll API. The input parameter of doScrollSearch is an instance of the prepare-query interface class.
Specifically, implementing the data acquisition component based on the ElasticSearch Scroll API involves constructing the prepare-query interface class (the IPrepareSearch<T> interface) and constructing the scroll query component class (the ScrollSearchComponent class).
1) Constructing the IPrepareSearch<T> interface
This interface lets the application (developer) configure the query conditions and the rules for parsing the data obtained from ElasticSearch.
The IPrepareSearch<T> interface is defined as follows:
The interface consists of two methods. The first, prepare, lets the application developer provide query conditions to the data acquisition component; SearchRequestVo is defined here to describe the query data the developer provides to the component. The second, parseResult, lets the developer set the rules for parsing the data obtained from ElasticSearch. Its input parameter, source, is one record that the component obtained from ElasticSearch, and its return type is generic, fixed when the developer implements IPrepareSearch<T>. The source parameter is of type Map<String,Object>, a fairly raw data type, whereas the developer ultimately needs objects of the type required by the specific business scenario; parseResult is therefore needed to parse the retrieved ElasticSearch data.
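The original interface listing is not reproduced in this text. The following Java sketch reconstructs IPrepareSearch<T> only from the description above; the QueryBuilder stand-in and the minimal SearchRequestVo placeholder are assumptions made so the sketch compiles without the Elasticsearch client, and are not the patent's actual definitions.

```java
import java.util.Map;

// Hypothetical stand-in for the QueryBuilder type of the Elasticsearch Java client;
// it exists only so this sketch compiles without the client library.
interface QueryBuilder {
    String describe();
}

// Minimal placeholder for SearchRequestVo (its fields are described separately).
class SearchRequestVo {
    QueryBuilder queryBuilder;

    SearchRequestVo(QueryBuilder queryBuilder) {
        this.queryBuilder = queryBuilder;
    }
}

// Sketch of the prepare-query interface, reconstructed from the prose.
interface IPrepareSearch<T> {
    // Supplies the developer's query condition to the data acquisition component.
    SearchRequestVo prepare();

    // Converts one raw record returned by ElasticSearch (a Map<String, Object>)
    // into the business type T chosen by the implementer.
    T parseResult(Map<String, Object> source);
}
```

The generic parameter T is what lets one component serve many business scenarios: each caller fixes the target type in its own parseResult.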
SearchRequestVo is defined as follows:
Here, scrollId is the request context ID that ElasticSearch creates for each scroll query request (Scroll API request). When an application calls the ElasticSearch Scroll API for the first time, ElasticSearch creates a request context for it. This context is time-limited: it expires after a specified period. While the context is still valid, when the application calls the Scroll API again it only needs to pass the scrollId to ElasticSearch, and ElasticSearch continues from the previous query and returns the remaining data, which ensures that data are not fetched twice.
scrollWindow is the expiry time that ElasticSearch sets for the request context, in milliseconds. The value 180000 is an expiry duration set by the developer and is only an example; other values can be set as needed. scrollWindow has a default value, which applies if the developer does not set it.
offset is the offset value of the last record the application accessed within the validity period of the previous request context. It plays an important role: it ensures that the application does not fetch data from ElasticSearch out of order. ElasticSearch does not automatically add an offset field to the data stored in it, so the application and its developers must ensure that every record stored in the ElasticSearch cluster has an offset field, and that this field is globally unique and monotonically increasing.
queryBuilder is of a type defined by a class in the Java client library provided by ElasticSearch, and represents the query condition constructed by the developer.
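As a hedged illustration, the four fields described above could be gathered into a class like the following. The field names follow the text; the QueryBuilder stand-in, the isFirstRequest helper, and the use of 180000 ms as the default (the text only states that a default exists and cites 180000 as an example) are assumptions.

```java
// Hypothetical stand-in for the Elasticsearch client's QueryBuilder (compile-only).
interface QueryBuilder {
    String describe();
}

// Sketch of SearchRequestVo reconstructed from the field descriptions.
class SearchRequestVo {
    // Assumed default; the source gives 180000 ms only as an example value.
    static final long DEFAULT_SCROLL_WINDOW_MS = 180_000L;

    String scrollId;                               // null until ElasticSearch has created a context
    long scrollWindow = DEFAULT_SCROLL_WINDOW_MS;  // context expiry, in milliseconds
    Long offset;                                   // offset of the last record read; null before the first page
    QueryBuilder queryBuilder;                     // developer-supplied query condition

    SearchRequestVo(QueryBuilder queryBuilder) {
        this.queryBuilder = queryBuilder;
    }

    // True when no request context exists yet, i.e. this is the first scroll request.
    boolean isFirstRequest() {
        return scrollId == null;
    }
}
```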
2) Constructing the ScrollSearchComponent class
ScrollSearchComponent is the core class of the data acquisition component. Through its doScrollSearch method, the developer ultimately obtains the data in the ElasticSearch cluster by way of the ElasticSearch Scroll API. The signature of doScrollSearch is as follows:
public <T> SearchResponseVo<T> doScrollSearch(IPrepareSearch<T> prepareSearch)
The input parameter of this method is of the IPrepareSearch<T> interface type. Inside doScrollSearch, the prepare method of prepareSearch is called back to determine the query parameters of the current request, and the parseResult method of prepareSearch is called automatically to parse every record obtained from ElasticSearch. Finally, the query result is returned to the application. The result is defined by SearchResponseVo as follows:
Here, scrollId, offset, and scrollWindow have the same meaning as in the SearchRequestVo class, and content holds the data produced by calling the parseResult method of prepareSearch on the ElasticSearch data.
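A sketch of SearchResponseVo<T> consistent with that description follows. Treating offset as the offset of the last returned record, making content read-only, and the isEmpty helper are assumptions of this sketch rather than details given in the text.

```java
import java.util.Collections;
import java.util.List;

// Sketch of SearchResponseVo<T>: scroll bookkeeping mirrors SearchRequestVo,
// and content holds records already converted by parseResult.
class SearchResponseVo<T> {
    final String scrollId;     // request context ID to send with the next request
    final long scrollWindow;   // context expiry, in milliseconds
    final Long offset;         // offset of the last record in content (null if none)
    final List<T> content;     // parsed business objects

    SearchResponseVo(String scrollId, long scrollWindow, Long offset, List<T> content) {
        this.scrollId = scrollId;
        this.scrollWindow = scrollWindow;
        this.offset = offset;
        this.content = Collections.unmodifiableList(content);
    }

    // An empty page signals that the current request context has been drained.
    boolean isEmpty() {
        return content.isEmpty();
    }
}
```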
In another embodiment of the present invention, on the basis of the above embodiments, step S1 further includes:
S11: instantiating the prepare-query interface class to obtain an instance object of the prepare-query interface class;
S12: passing the instance object to the doScrollSearch method of the scroll query component class.
Specifically, after the data acquisition component has been implemented and before calling it to obtain data from ElasticSearch, the developer needs to implement the prepare-query interface class (the IPrepareSearch<T> interface), that is, its prepare and parseResult methods, and instantiate the implementation to obtain an instance of the prepare-query interface class. Through prepare, the developer supplies the component with a query condition written according to business rules; this condition is of type QueryBuilder. Through parseResult, the developer tells the component how to convert data of type Map<String,Object> into the domain model type the developer wants.
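For example, a developer collecting error logs might implement the interface roughly as below. LogEvent, ErrorLogPrepareSearch, the map keys offset, level, and message, and the query text are all hypothetical, and QueryBuilder is again a compile-only stand-in for the Elasticsearch client type.

```java
import java.util.Map;

// Hypothetical stand-in for the Elasticsearch client's QueryBuilder (compile-only).
interface QueryBuilder {
    String describe();
}

// Trimmed SearchRequestVo: only the query condition is needed here.
class SearchRequestVo {
    final QueryBuilder queryBuilder;

    SearchRequestVo(QueryBuilder queryBuilder) {
        this.queryBuilder = queryBuilder;
    }
}

interface IPrepareSearch<T> {
    SearchRequestVo prepare();
    T parseResult(Map<String, Object> source);
}

// Hypothetical domain model required by the business scenario.
class LogEvent {
    final long offset;
    final String level;
    final String message;

    LogEvent(long offset, String level, String message) {
        this.offset = offset;
        this.level = level;
        this.message = message;
    }
}

// S11: implement the prepare-query interface; an instance of this class
// is what S12 passes to doScrollSearch.
class ErrorLogPrepareSearch implements IPrepareSearch<LogEvent> {
    @Override
    public SearchRequestVo prepare() {
        // Business rule written by the developer; the query text is illustrative.
        return new SearchRequestVo(() -> "level:ERROR");
    }

    @Override
    public LogEvent parseResult(Map<String, Object> source) {
        // Raw JSON numbers may arrive as Integer or Long, so go through Number.
        long offset = ((Number) source.get("offset")).longValue();
        return new LogEvent(offset, (String) source.get("level"), (String) source.get("message"));
    }
}
```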
Step S12 uses the instance object as the input parameter of the doScrollSearch method of the scroll query component class (ScrollSearchComponent).
Fig. 2 is a schematic flowchart of step S2 according to another embodiment of the present invention, on the basis of the above embodiments. Step S2 includes:
S21: in the doScrollSearch method, calling back the prepare method to obtain the query conditions set by the developer, and sending a scroll query request to ElasticSearch;
S22: when ElasticSearch's result for the scroll query request is received, calling back the parseResult method to parse the result and obtain the parsed ElasticSearch data;
S23: returning the parsed ElasticSearch data.
Specifically, the step of calling the predefined data acquisition component to send a scroll query request to ElasticSearch and obtaining ElasticSearch's parsed result for that request proceeds as follows.
The data acquisition component obtains the query request set by the developer by calling back the prepare method, and then sends a scroll query request to ElasticSearch from within the component. As described in the above embodiment, prepare returns a SearchRequestVo containing four parameters: the request context ID scrollId; the expiry duration scrollWindow of the request context identified by that ID; the offset value offset of the last record obtained; and the developer's query condition queryBuilder. By calling back prepare, the component obtains each of these parameters. With the query condition ready, the component sends the scroll query request to ElasticSearch.
When the component sends its first scroll query request to ElasticSearch, scrollId and offset are empty or default values; ElasticSearch creates a time-limited request context and returns the scrollId associated with it. When ElasticSearch finishes the query and returns data to the component, the component retains the offset field of the last record obtained. When the component sends the same scroll query request to ElasticSearch again (with the same scrollId as the previous request), if that scrollId is still valid, ElasticSearch returns data starting from the offset field, which ensures that the data returned by successive scroll query requests are continuous and non-repeating. If the request context corresponding to the scrollId has become invalid, ElasticSearch informs the component that the scrollId is invalid.
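The validity check on scrollId can be modeled with a toy, in-memory registry such as the one below. It is not ElasticSearch code; it assumes, as ElasticSearch does for its scroll keep-alive, that each successful scroll call renews the expiry window, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Toy model of server-side scroll contexts keyed by scrollId (not ElasticSearch code).
class ScrollContextRegistry {
    private static final class Context {
        long expiresAtMs;

        Context(long expiresAtMs) {
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final Map<String, Context> contexts = new HashMap<>();

    // Opens a new context and returns its id, playing the role of the scrollId.
    String open(long nowMs, long scrollWindowMs) {
        String id = UUID.randomUUID().toString();
        contexts.put(id, new Context(nowMs + scrollWindowMs));
        return id;
    }

    // Returns true and renews the keep-alive if the context is still live;
    // returns false (and forgets the id) if it is unknown or has expired.
    boolean touch(String scrollId, long nowMs, long scrollWindowMs) {
        Context c = contexts.get(scrollId);
        if (c == null) {
            return false;
        }
        if (nowMs >= c.expiresAtMs) {
            contexts.remove(scrollId);
            return false;
        }
        c.expiresAtMs = nowMs + scrollWindowMs;
        return true;
    }
}
```

When touch returns false, the caller is in the "scrollId is invalid" case described above and has to open a new context.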
The scrollId and offset values in the prepare method are updated automatically through the interaction between the component and ElasticSearch. This interaction is implemented with the Java client provided by ElasticSearch, which accesses the ElasticSearch server port over TCP.
When the component receives ElasticSearch's result for the current query request, the data type of the result is Map<String,Object>. The component calls back parseResult, passing it the unparsed ElasticSearch results, and parseResult parses them according to the rules set by the developer. Finally, the component returns the ElasticSearch data parsed by parseResult to the caller, i.e. the developer.
The entire process of step S2 takes place within the doScrollSearch method of the ScrollSearchComponent class.
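The callback flow of steps S21 to S23 can be sketched as follows. ScrollBackend and RawPage are hypothetical abstractions standing in for the ElasticSearch Java client, the supporting types are trimmed to the fields the flow touches, and keeping scrollId and the last offset as component fields is one possible reading of the text, not the patented implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Elasticsearch client's QueryBuilder (compile-only).
interface QueryBuilder {
    String describe();
}

class SearchRequestVo {
    String scrollId;
    long scrollWindow = 180_000L; // assumed default, in ms
    Long offset;
    QueryBuilder queryBuilder;

    SearchRequestVo(QueryBuilder queryBuilder) {
        this.queryBuilder = queryBuilder;
    }
}

class SearchResponseVo<T> {
    final String scrollId;
    final long scrollWindow;
    final Long offset;
    final List<T> content;

    SearchResponseVo(String scrollId, long scrollWindow, Long offset, List<T> content) {
        this.scrollId = scrollId;
        this.scrollWindow = scrollWindow;
        this.offset = offset;
        this.content = Collections.unmodifiableList(content);
    }
}

interface IPrepareSearch<T> {
    SearchRequestVo prepare();
    T parseResult(Map<String, Object> source);
}

// One page of raw hits plus the scroll id the server associated with them (hypothetical).
class RawPage {
    final String scrollId;
    final List<Map<String, Object>> hits;

    RawPage(String scrollId, List<Map<String, Object>> hits) {
        this.scrollId = scrollId;
        this.hits = hits;
    }
}

// Hypothetical transport abstraction; a real version would wrap the ElasticSearch Java client.
interface ScrollBackend {
    RawPage scroll(SearchRequestVo request);
}

class ScrollSearchComponent {
    private final ScrollBackend backend;
    private String scrollId;  // null until the first request creates a context
    private Long lastOffset;  // offset of the last record handed back so far

    ScrollSearchComponent(ScrollBackend backend) {
        this.backend = backend;
    }

    public <T> SearchResponseVo<T> doScrollSearch(IPrepareSearch<T> prepareSearch) {
        // S21: call back prepare() for the developer's query, then attach scroll state.
        SearchRequestVo request = prepareSearch.prepare();
        request.scrollId = scrollId;
        request.offset = lastOffset;
        RawPage page = backend.scroll(request);
        scrollId = page.scrollId;

        // S22: call back parseResult() on every raw hit. Hits are assumed to
        // arrive sorted by ascending offset, so the last one seen is the largest.
        List<T> parsed = new ArrayList<>();
        for (Map<String, Object> hit : page.hits) {
            parsed.add(prepareSearch.parseResult(hit));
            lastOffset = ((Number) hit.get("offset")).longValue();
        }

        // S23: return the parsed data together with the updated scroll state.
        return new SearchResponseVo<>(scrollId, request.scrollWindow, lastOffset, parsed);
    }
}
```

Keeping the transport behind an interface makes the callback order (prepare, send, parseResult per hit, return) testable without a cluster.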
Based on the above embodiments, the scroll query request includes the query conditions set by the developer, a request context ID, an offset query parameter, and the index accessed last time.
The scroll query request is likewise of type QueryBuilder. Its content adds the offset query parameter, the scrollId parameter, and the index used last time to the developer's query condition queryBuilder. ElasticSearch uses this content to query its servers for data that satisfy the query condition.
Based on the above embodiments, after the step of sending the scroll query request to ElasticSearch in step S21, the method further includes:
causing ElasticSearch to sort the data in ascending order by the offset field.
Specifically, every time an application initiates a new Scroll API call, ElasticSearch returns data from the beginning, so the client receives duplicate data. To solve this, the data acquisition component requires the application developer to ensure that all data stored in ElasticSearch contain an offset field that is globally unique and monotonically increasing (the industry has many ways to meet this requirement). Then, after each new Scroll API call request the application makes, the component asks ElasticSearch to sort the data in ascending order of the offset field and retains the offset field of the last record in the retrieved data set. This ensures that each new round of Scroll API requests continues from where the previous request left off.
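The effect of the ascending offset sort plus the retained last offset can be illustrated in memory, without a cluster. The sketch below uses hypothetical names and is only an illustration; against a real cluster the same intent would presumably be expressed as a range filter on the offset field combined with an ascending sort on it, but the text does not spell out the exact query used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// In-memory illustration of resuming from an offset (not ElasticSearch code):
// keep only records past the last offset seen, in ascending offset order.
final class OffsetResume {
    private OffsetResume() {
    }

    static long offsetOf(Map<String, Object> doc) {
        return ((Number) doc.get("offset")).longValue();
    }

    // lastOffset == null means nothing has been read yet.
    static List<Map<String, Object>> nextBatch(List<Map<String, Object>> docs, Long lastOffset, int size) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> d : docs) {
            if (lastOffset == null || offsetOf(d) > lastOffset) {
                out.add(d);
            }
        }
        // Ascending sort on the globally unique, monotonically increasing offset field.
        out.sort(Comparator.comparingLong(OffsetResume::offsetOf));
        return out.size() > size ? new ArrayList<>(out.subList(0, size)) : out;
    }
}
```

Because offsets are unique and increasing, "greater than the last offset, sorted ascending" yields each record exactly once and in order, whatever order the records were stored in.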
Requiring ElasticSearch to sort the data by offset on every new Scroll API request guarantees the ordering of the data. The component relies on ElasticSearch and the offset mechanism to guarantee continuity and non-repetition.
Based on the above embodiments, step S22 further includes:
if it is learned, from the request context ID, that ElasticSearch can no longer obtain data corresponding to the request context, sending a new scroll query request to ElasticSearch.
Specifically, the results returned by an ElasticSearch Scroll API request reflect the state of the index at the time the initial search request was made, much like a point-in-time snapshot: subsequent changes to documents (inserts, updates, or deletes) affect only later requests. In other words, after ElasticSearch creates a request context for a new Scroll API request, data added, deleted, or updated in ElasticSearch afterwards do not affect the Scroll requests made under that context. To let the component obtain, in real time, data added after a Scroll request context was created, the component works as follows: when it learns via the scrollId that ElasticSearch can no longer obtain data corresponding to the request context (indicating that all of that context's data have been fetched), it sends a new round of scroll query requests to ElasticSearch.
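The snapshot behaviour and the restart rule can be simulated with the toy classes below. Records are reduced to their offsets, each ToySnapshot freezes the matching records when it is opened (as a scroll context does), and RestartingPoller opens a new snapshot after the last seen offset once the current one returns nothing. All names are hypothetical and no ElasticSearch API is involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy index: records are represented only by their offsets.
class ToyIndex {
    final List<Long> offsets = new ArrayList<>();
}

// Toy scroll context: a frozen, ascending snapshot taken when it is opened,
// so records added later are invisible to it.
class ToySnapshot {
    private final List<Long> snapshot;
    private int cursor = 0;

    ToySnapshot(ToyIndex index, Long afterOffset) {
        List<Long> s = new ArrayList<>();
        for (long o : index.offsets) {
            if (afterOffset == null || o > afterOffset) {
                s.add(o);
            }
        }
        Collections.sort(s);
        this.snapshot = s;
    }

    List<Long> next(int size) {
        int end = Math.min(cursor + size, snapshot.size());
        List<Long> page = new ArrayList<>(snapshot.subList(cursor, end));
        cursor = end;
        return page;
    }
}

// Poller that opens a fresh scroll, starting after the last offset it saw,
// whenever the current one has run dry.
class RestartingPoller {
    private final ToyIndex index;
    private ToySnapshot current;
    private Long lastOffset;

    RestartingPoller(ToyIndex index) {
        this.index = index;
    }

    List<Long> poll(int size) {
        if (current == null) {
            current = new ToySnapshot(index, lastOffset);
        }
        List<Long> page = current.next(size);
        if (page.isEmpty()) {
            // The old context is exhausted; a new one makes later additions visible.
            current = new ToySnapshot(index, lastOffset);
            page = current.next(size);
        }
        if (!page.isEmpty()) {
            lastOffset = page.get(page.size() - 1);
        }
        return page;
    }
}
```

A record added after the first snapshot is not returned by it, but it is picked up by the restarted scroll without any earlier record being repeated.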
Fig. 3 is a schematic structural diagram of a data acquisition device according to another embodiment of the present invention. The device includes a memory 31, a processor 32, and a bus 33.
The processor 32 and the memory 31 communicate with each other through the bus 33.
The memory 31 stores program instructions executable by the processor 32, and the processor 32 calls the program instructions in the memory 31 to perform the data acquisition method described in the above embodiments, for example: supplying configured data query conditions, together with rules for parsing the data returned by ElasticSearch, to a predefined data acquisition component; and calling the data acquisition component to send a scroll query request to ElasticSearch and obtaining ElasticSearch's parsed result for the scroll query request.
A further embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the data acquisition method described in the above embodiments, for example: supplying configured data query conditions, together with rules for parsing the data returned by ElasticSearch, to a predefined data acquisition component; and calling the data acquisition component to send a scroll query request to ElasticSearch and obtaining ElasticSearch's parsed result for the scroll query request.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be completed by hardware instructed by a program. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions in the embodiments. A person of ordinary skill in the art can understand and implement them without creative effort.
From the description of the above implementations, a person skilled in the art can clearly understand that each implementation can be realized by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solutions, or the part of them that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
The data capture method and equipment that the various embodiments described above of the present invention propose, by calling customized data acquisition group Part obtains high-volume data to search engine ElasticSearch so that data acquisition is more directly using ElasticSearch's ScrollAPI is more reliable, in order, in real time and not repeatedly.
Finally, the above describes only preferred embodiments of the present invention and is not intended to limit its scope. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

  1. A data acquisition method, characterized by comprising:
    S1, providing a predefined data acquisition component with the configured data query conditions and the rules for parsing the data returned by ElasticSearch;
    S2, calling the data acquisition component to initiate a scroll query request to the search engine ElasticSearch, and obtaining the parsed result returned by the search engine ElasticSearch for the scroll query request.
  2. The method according to claim 1, characterized by further comprising, before step S1:
    S0, implementing the data acquisition component based on the ElasticSearch ScrollAPI.
  3. The method according to claim 2, characterized in that the data acquisition component specifically comprises a prepare-query interface class and a scroll query component class;
    the prepare-query interface class comprises a prepare method and a parseResult method, wherein the prepare method is used to provide the data acquisition component with the query conditions set by a developer, and the parseResult method is used to provide the data acquisition component with the developer-defined rules for parsing the data obtained from the search engine ElasticSearch;
    the scroll query component class comprises a doScrollSearch method, which obtains data from the search engine ElasticSearch by means of the ElasticSearch ScrollAPI and whose input parameter is an instance of the prepare-query interface class.
  4. The method according to claim 3, characterized in that step S1 further comprises:
    S11, instantiating the prepare-query interface class to obtain an instance object of the prepare-query interface class;
    S12, passing the instance object to the doScrollSearch method of the scroll query component.
  5. The method according to claim 3, characterized in that step S2 further comprises:
    S21, calling back the prepare method within the doScrollSearch method to obtain the query conditions set by the developer, and initiating a scroll query request to the search engine ElasticSearch;
    S22, upon receiving the result returned by the search engine ElasticSearch for the scroll query request, calling back the parseResult method to parse the returned result and obtain the parsed ElasticSearch data;
    S23, returning the parsed ElasticSearch data.
  6. The method according to claim 5, characterized in that the scroll query request comprises: the query conditions set by the developer, a request context ID, an offset query parameter, and the index accessed last time.
  7. The method according to claim 5, characterized by further comprising, after the step in S21 of initiating the scroll query request to the search engine ElasticSearch:
    causing the search engine ElasticSearch to sort the data in ascending order of the offset field.
  8. The method according to claim 6, characterized in that step S22 further comprises:
    if it is learned that the search engine ElasticSearch cannot obtain the data corresponding to the request context by means of the request context ID, initiating a new scroll query request to the search engine ElasticSearch.
  9. A data acquisition device, characterized by comprising a memory, a processor and a bus, wherein
    the processor and the memory communicate with each other via the bus;
    the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the method according to any one of claims 1 to 8.
  10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions that cause a computer to perform the method according to any one of claims 1 to 8.
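The dependent claims add three details to the scroll request: it carries an offset parameter alongside the query conditions and context ID, results are sorted in ascending order of an offset field, and a new scroll request is issued when the engine can no longer find the request context. A minimal Python sketch of how those pieces could fit together follows. It assumes every document has a unique, non-negative, increasing integer `offset` field; `FakeExpiringEngine`, `ScrollContextMissing` and `resumable_scroll` are hypothetical names, and the engine is an in-memory stand-in that deliberately drops its scroll context after a fixed number of pages (against a real cluster, an expired scroll typically surfaces as a `search_context_missing_exception`). Because each restart filters on `offset > last_offset` and sorts ascending, the caller sees every document once and in order.

```python
from typing import Any, Callable, Dict, List


class ScrollContextMissing(Exception):
    """Hypothetical error: the engine no longer holds the scroll context."""


class FakeExpiringEngine:
    """Hypothetical in-memory engine whose scroll contexts expire after
    `pages_per_context` pages, imitating an expired ElasticSearch scroll."""

    def __init__(self, docs: List[Dict[str, Any]], pages_per_context: int):
        self.docs = docs
        self.pages_per_context = pages_per_context
        self.contexts: Dict[str, Dict[str, Any]] = {}
        self.searches = 0  # how many scroll requests have been opened

    def search(self, index: str, query: Dict[str, Any],
               after_offset: int, size: int) -> Dict[str, Any]:
        self.searches += 1
        # Developer conditions plus the offset parameter, ascending by offset.
        hits = sorted(
            (d for d in self.docs
             if d["offset"] > after_offset
             and all(d.get(k) == v for k, v in query.items())),
            key=lambda d: d["offset"])
        scroll_id = f"{index}-ctx-{self.searches}"
        self.contexts[scroll_id] = {"hits": hits, "pos": 0, "size": size, "served": 0}
        return self.scroll(scroll_id)

    def scroll(self, scroll_id: str) -> Dict[str, Any]:
        ctx = self.contexts.get(scroll_id)
        if ctx is None:
            raise ScrollContextMissing(scroll_id)
        page = ctx["hits"][ctx["pos"]:ctx["pos"] + ctx["size"]]
        ctx["pos"] += len(page)
        ctx["served"] += 1
        if ctx["served"] >= self.pages_per_context:
            del self.contexts[scroll_id]  # the context expires after this page
        return {"_scroll_id": scroll_id, "hits": page}


def resumable_scroll(engine: FakeExpiringEngine, index: str,
                     query: Dict[str, Any],
                     parse: Callable[[Dict[str, Any]], Any],
                     size: int = 2, last_offset: int = -1) -> List[Any]:
    results: List[Any] = []
    response = engine.search(index, query, last_offset, size)
    while response["hits"]:
        for doc in response["hits"]:
            results.append(parse(doc))
            last_offset = doc["offset"]  # progress marker for a restart
        try:
            response = engine.scroll(response["_scroll_id"])
        except ScrollContextMissing:
            # Context is gone: open a new scroll that resumes after last_offset.
            response = engine.search(index, query, last_offset, size)
    return results


docs = [{"offset": o, "category": "game"} for o in [5, 1, 4, 2, 3, 6, 0]]
engine = FakeExpiringEngine(docs, pages_per_context=1)
print(resumable_scroll(engine, "rooms", {"category": "game"},
                       parse=lambda d: d["offset"]))
# [0, 1, 2, 3, 4, 5, 6]
print(engine.searches)  # 5
```

With `pages_per_context=1` every context expires after a single page, so the sketch issues five searches in total yet still returns each offset exactly once, in ascending order. The design trades an extra search per expiry for not restarting the whole export; it would skip documents if two of them shared an offset, which is why the unique-offset assumption matters.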
CN201710501301.6A 2017-06-27 2017-06-27 Data acquisition method and equipment Active CN107341217B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710501301.6A CN107341217B (en) 2017-06-27 2017-06-27 Data acquisition method and equipment
PCT/CN2017/120216 WO2019000897A1 (en) 2017-06-27 2017-12-29 Data acquisition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710501301.6A CN107341217B (en) 2017-06-27 2017-06-27 Data acquisition method and equipment

Publications (2)

Publication Number Publication Date
CN107341217A true CN107341217A (en) 2017-11-10
CN107341217B CN107341217B (en) 2020-02-07

Family

ID=60221638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710501301.6A Active CN107341217B (en) 2017-06-27 2017-06-27 Data acquisition method and equipment

Country Status (2)

Country Link
CN (1) CN107341217B (en)
WO (1) WO2019000897A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457189A (en) * 2019-07-02 2019-11-15 平安科技(深圳)有限公司 Log management method and system for an application program, and related device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103399887A (en) * 2013-07-19 2013-11-20 蓝盾信息安全技术股份有限公司 Query and statistical analysis system for mass logs
US20160203548A1 (en) * 2007-02-09 2016-07-14 Xcira, Inc. Integrated auctioning environment platform
CN106126731A (en) * 2016-07-01 2016-11-16 百势软件(北京)有限公司 Method and device for obtaining Elasticsearch paged data
CN106528797A (en) * 2016-11-10 2017-03-22 上海轻维软件有限公司 DSL query method based on Elasticsearch

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341217B (en) * 2017-06-27 2020-02-07 武汉斗鱼网络科技有限公司 Data acquisition method and equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019000897A1 (en) * 2017-06-27 2019-01-03 武汉斗鱼网络科技有限公司 Data acquisition method and device
CN113407785A (en) * 2021-06-11 2021-09-17 西北工业大学 Data processing method and system based on distributed storage system
CN113407785B (en) * 2021-06-11 2023-02-28 西北工业大学 Data processing method and system based on distributed storage system

Also Published As

Publication number Publication date
WO2019000897A1 (en) 2019-01-03
CN107341217B (en) 2020-02-07

Similar Documents

Publication Publication Date Title
CN104838377B (en) It is handled using mapping reduction integration events
EP3502928A1 (en) Intelligent natural language query processor
US7958104B2 (en) Context based data searching
Bosco et al. Discovering automatable routines from user interaction logs
US8443346B2 (en) Server evaluation of client-side script
US8418142B2 (en) Architecture for data validation
EP2469420A1 (en) CEP engine and method for processing CEP queries
US10922282B2 (en) On-demand collaboration user interfaces
US10599654B2 (en) Method and system for determining unique events from a stream of events
EP1811447A1 (en) Declarative adaptation of software entities stored in an object repository
US7836429B2 (en) Data synchronization mechanism for change-request-management repository interoperation
CN110990447B (en) Data exploration method, device, equipment and storage medium
CN109656963A (en) Metadata acquisition methods, device, equipment and computer readable storage medium
Alchin Pro Django
CN107341217A (en) A kind of data capture method and equipment
Gilmore Beginning PHP and MySQL 5: From novice to professional
CN109299913B (en) Employee salary scheme generation method and device
CN108268468A (en) The analysis method and system of a kind of big data
CN107085613A (en) Enter the filter method and device of library file
CN108733543A (en) A kind of method, apparatus of log analysis, electronic equipment and readable storage medium storing program for executing
Orlovskyi et al. Enterprise architecture modeling support based on data extraction from business process models.
CN109344173A (en) Data managing method and device, data structure
US10983989B2 (en) Issue rank management in an issue tracking system
US7499932B2 (en) Accessing data in an interlocking trees data structure using an application programming interface
US8875137B2 (en) Configurable mass data portioning for parallel processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant