WO2019000897A1 - Data acquisition method and device - Google Patents

Data acquisition method and device

Info

Publication number
WO2019000897A1
Authority
WO
WIPO (PCT)
Prior art keywords
elasticsearch
data
query
data acquisition
search engine
Prior art date
Application number
PCT/CN2017/120216
Other languages
English (en)
French (fr)
Inventor
支猛
张文明
陈少杰
Original Assignee
武汉斗鱼网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 武汉斗鱼网络科技有限公司
Publication of WO2019000897A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Definitions

  • The present invention relates to the field of software engineering, and in particular to a data acquisition method and device.
  • ElasticSearch is an excellent open-source distributed search engine. In addition to search, ElasticSearch is also used for log storage and for offline data analysis and mining. It can collect, in real time, the logs that online applications write to disk while running, and store those logs in an ElasticSearch cluster.
  • Logs stored in the ElasticSearch cluster serve two application scenarios. In the first, developers use a purpose-built log center platform to set search conditions and query the logs output by online applications, which helps them understand how those applications are running and quickly locate problems.
  • In the second, a Storm cluster pulls logs from the ElasticSearch cluster in real time and in batches to perform complex aggregation calculations, such as distributed call chain calculation. Both scenarios require fast, continuous, real-time access to large amounts of data from the ElasticSearch cluster.
  • ElasticSearch provides the Scroll API (scrolling search) so that large batches of data can be queried quickly and efficiently.
  • However, the Scroll API is suited to processing large amounts of data rather than serving real-time user requests, and whenever an application initiates a new Scroll API call, ElasticSearch returns data from the beginning, so the client receives duplicate data.
  • Using the Scroll API provided by ElasticSearch directly therefore leaves applications with the following problem: there is no guarantee that large amounts of data are obtained on the application side reliably, in order, in real time, and without repetition.
  • To overcome this problem, the present invention provides a data acquisition method and device.
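  • According to one aspect of the present invention, a data acquisition method is provided, including:
  • S1: providing the data query conditions set by a developer, and the rules for parsing the data returned by the search engine ElasticSearch, to a predefined data acquisition component;
  • S2: calling the data acquisition component to initiate a scroll query request to the search engine ElasticSearch, and obtaining the parsed result returned by ElasticSearch for the scroll query request.
  • Before step S1, the method further includes:
  • S0: implementing the data acquisition component based on the ElasticSearch Scroll API.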
  • The data acquisition component specifically includes a preparation query interface class and a scroll query component class, wherein:
  • the preparation query interface class includes a prepare method and a parseResult method; the prepare method provides the data acquisition component with the query conditions set by the developer, and the parseResult method provides it with the developer-defined rules for parsing the data obtained from the search engine ElasticSearch;
  • the scroll query component class includes a doScrollSearch method, which acquires data from the search engine ElasticSearch by way of the ElasticSearch Scroll API; the input parameter of the doScrollSearch method is an instance of the preparation query interface class.
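  • Step S1 further includes:
  • S11: instantiating the preparation query interface class to obtain an instance object of that class;
  • S12: passing the instance object to the doScrollSearch method of the scroll query component class.
  • Step S2 further includes:
  • S21: calling back the prepare method inside the doScrollSearch method to obtain the data query conditions set by the developer, and initiating a scroll query request to the search engine ElasticSearch;
  • S22: on receiving the result returned by ElasticSearch for the scroll query request, calling back the parseResult method to parse it and obtain the parsed ElasticSearch data;
  • S23: returning the parsed ElasticSearch data.
  • The information carried by the scroll query request includes: the data query conditions set by the developer, a request context ID, an offset query parameter, and the index information of the last access.
  • After the scroll query request is initiated in step S21, the method further includes:
  • causing the search engine ElasticSearch to sort the data in ascending order by the offset field.
  • Step S22 further includes:
  • if it is learned that ElasticSearch cannot obtain the data corresponding to the request context from the request context ID, initiating a new scroll query request to ElasticSearch.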
  • According to another aspect of the present invention, a data acquisition device is provided, including a memory, a processor, and a bus;
  • the processor and the memory communicate with each other through the bus;
  • the memory stores program instructions executable by the processor, and the processor invokes the program instructions in the memory to perform the data acquisition method described above.
  • According to yet another aspect of the present invention, a non-transitory computer-readable storage medium is provided that stores computer instructions causing a computer to perform the data acquisition method described above.
  • The data acquisition method and device provided by the invention obtain large batches of data from the search engine ElasticSearch by calling a custom data acquisition component, which makes data acquisition more reliable, ordered, real-time, and non-repeating than using ElasticSearch's Scroll API directly.
  • FIG. 1 is a schematic flowchart of a data acquisition method according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of step S2 in FIG. 1 according to another embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a data acquisition device according to another embodiment of the present invention.
  • As shown in FIG. 1, a data acquisition method according to an embodiment of the present invention includes steps S1 and S2 described above.
  • Specifically, to obtain large batches of data from the search engine ElasticSearch quickly, continuously, and in real time for a specific business scenario, the application can initiate a scroll query request (Scroll API request) to ElasticSearch by calling the predefined data acquisition component. The data acquisition component takes the place of the prior-art Scroll API in interacting with ElasticSearch to obtain the data that meets the query conditions, and parses the obtained ElasticSearch data into the type objects required by the business scenario.
  • Before calling the predefined data acquisition component, the developer only needs to provide it with the data query conditions and the rules for parsing the data returned by ElasticSearch; the parsed ElasticSearch data is then obtained by calling the component.
  • The scroll query request that the data acquisition component sends to the search engine ElasticSearch consists of the QueryBuilder-type query conditions set by the developer, to which are added an offset (data offset value) query parameter, a scrollId parameter (request context ID), and the index of the last access. By contrast, a scroll query request sent directly through the Scroll API usually contains only the query conditions set by the developer.
  • The data acquisition component relies on the scrollId parameter to keep data acquisition real-time and non-repeating, and on the offset mechanism to keep it ordered.
  • The data acquisition method provided by this embodiment obtains large batches of data from the search engine ElasticSearch by calling a custom data acquisition component, which makes data acquisition more reliable, ordered, real-time, and non-repeating than using ElasticSearch's Scroll API directly.
  • In another embodiment of the present invention, on the basis of the above embodiment, the method further includes, before step S1, a step S0 of implementing the data acquisition component based on the ElasticSearch Scroll API.
  • The data acquisition component specifically includes a preparation query interface class and a scroll query component class;
  • the preparation query interface class includes a prepare method and a parseResult method; the prepare method provides the data acquisition component with the query conditions set by the developer, and the parseResult method provides it with the developer-defined rules for parsing the data obtained from the search engine ElasticSearch;
  • the scroll query component class includes a doScrollSearch method, which acquires data from the search engine ElasticSearch by way of the ElasticSearch Scroll API; the input parameter of the doScrollSearch method is an instance of the preparation query interface class.
  • Specifically, the data acquisition component is based on the ElasticSearch Scroll API, and implementing it involves constructing a preparation query interface class (the IPrepareSearch<T> interface class) and a scroll query component class (the ScrollSearchComponent class).
  • 1) Constructing the IPrepareSearch<T> interface class. This interface lets the application (developer) set the query conditions and the rules for parsing the data obtained from ElasticSearch.
  • The IPrepareSearch<T> interface class is defined as follows:
  • The interface class consists of two methods. The first is the prepare method, through which the application developer supplies the prepared query conditions to the data acquisition component. SearchRequestVo is defined to describe the query data that the developer provides to the data acquisition component.
  • The second is the parseResult method, through which the developer sets the rules for parsing the data retrieved from ElasticSearch.
  • Its input parameter, source, represents one piece of data that the data acquisition component retrieved from ElasticSearch.
  • Its return type is generic and is supplied by the developer when implementing the IPrepareSearch<R> interface. The type of source is Map<String, Object>, which is a fairly raw data type.
  • The data that the developer ultimately needs is the type object required by the specific business scenario, so the retrieved ElasticSearch data must be parsed by parseResult.
  • SearchRequestVo is defined as follows:
  • scrollId is the request context ID that ElasticSearch creates for each scroll query request (Scroll API request).
  • When an application calls the ElasticSearch Scroll API for the first time, ElasticSearch creates a request context for it. The context is time-limited: it expires after a specified time.
  • If the application calls the Scroll API again while the context is still valid and passes the scrollId to ElasticSearch, ElasticSearch continues from the previous query result and returns the remaining data, which ensures that data is not fetched twice.
  • scrollWindow is the expiration time that ElasticSearch sets for the request context, in milliseconds. The value 180000 is an expiration time set by the developer and is only an example; other values can be set as needed.
  • scrollWindow has a default value, which applies if the developer does not set one.
  • offset is the most recent data offset value that the application accessed while the previous request context was valid. It is important because it ensures that the application does not obtain the data in ElasticSearch out of order. ElasticSearch does not automatically add an offset field to the data stored in it, so application developers must ensure that the data stored in the ElasticSearch cluster has an offset field that is globally unique and monotonically increasing.
  • The type of queryBuilder is defined by a class in the Java client library provided by ElasticSearch; it represents the query conditions built by the developer.
  • 2) Constructing the ScrollSearchComponent class. The ScrollSearchComponent class is the core class of the data acquisition component.
  • Developers use the doScrollSearch method of this class to obtain the data in the ElasticSearch cluster by way of the ElasticSearch Scroll API.
  • The signature of the doScrollSearch method is: public <T> SearchResponseVo<T> doScrollSearch(IPrepareSearch<T> prepareSearch)
  • The input parameter of the method is of the IPrepareSearch<T> interface type.
  • Internally, the doScrollSearch method calls back the prepare method of prepareSearch to determine the query parameters of the current request, and automatically calls the parseResult method of prepareSearch to parse each piece of data obtained from ElasticSearch.
  • Finally, the result of this query is returned to the application. The result is defined by SearchResponseVo, whose content is as follows:
  • scrollId, offset, and scrollWindow have the same meanings as in the SearchRequestVo class.
  • content is the data produced by calling the parseResult method of prepareSearch on the ElasticSearch data.
  • In another embodiment of the present invention, on the basis of the above embodiment, step S1 further includes:
  • S11: instantiating the preparation query interface class to obtain an instance object of that class;
  • S12: passing the instance object to the doScrollSearch method of the scroll query component class.
  • Specifically, after implementing the data acquisition component and before calling it to obtain data from the search engine ElasticSearch, the developer needs to implement the preparation query interface class (the IPrepareSearch<T> interface class), that is, instantiate it to obtain an instance, thereby implementing its prepare and parseResult methods.
  • Through the prepare method, the developer provides the data acquisition component with query conditions written according to business rules.
  • The query conditions are of the QueryBuilder type.
  • Through the parseResult method, the developer tells the data acquisition component how to convert data of type Map<String, Object> into the domain model type that the developer wants.
  • Step S12 means passing the instance object as the input parameter of the doScrollSearch method of the scroll query component class (the ScrollSearchComponent class).
  • As shown in FIG. 2, in another embodiment of the present invention, step S2 includes steps S21, S22, and S23 described above.
  • Specifically, calling the predefined data acquisition component to initiate a scroll query request to the search engine ElasticSearch, and obtaining the parsed result returned for that request, proceeds as follows.
  • The data acquisition component obtains the query request set by the developer by calling back the prepare method, and then initiates a scroll query request to ElasticSearch from inside the component.
  • As described above, the return type of the prepare method is SearchRequestVo, which contains four parameters: the request context ID scrollId, the expiration duration scrollWindow of the request context identified by that ID, the offset of the most recently obtained data, and the query conditions queryBuilder set by the developer.
  • The data acquisition component obtains these parameters by calling back the prepare method. Once the query conditions are ready, it initiates a scroll query request to the search engine ElasticSearch.
  • When the data acquisition component initiates a scroll query request to ElasticSearch for the first time, both scrollId and offset are empty or hold default values; ElasticSearch creates a time-limited request context and returns the scrollId associated with it.
  • When ElasticSearch finishes the query and returns data to the data acquisition component, the component retains the offset field of the last piece of data obtained.
  • When the data acquisition component sends the same scroll query request again (that is, one with the same scrollId value as before) and the scrollId is still valid, ElasticSearch can start returning data according to the offset field, which ensures that the data returned by multiple scroll query requests is continuous and non-repeating. If the request context corresponding to the scrollId is no longer valid, ElasticSearch notifies the data acquisition component that the scrollId is invalid.
  • The scrollId and offset values in the prepare method are updated automatically through the interaction between the data acquisition component and ElasticSearch.
  • This interaction is implemented through the Java client provided by ElasticSearch, which accesses the port of the ElasticSearch server over TCP.
  • When the data acquisition component receives the result that ElasticSearch returns for the query request, the data type of the result is Map<String, Object>. The component calls back the parseResult method, passing it the unparsed result, and the result is parsed according to the specific rules set by the developer. Finally, the component returns the ElasticSearch data parsed by parseResult to the caller, that is, the developer.
  • The entire process of step S2 takes place in the doScrollSearch method of the ScrollSearchComponent class.
  • The information carried by the scroll query request includes: the data query conditions set by the developer, a request context ID, an offset query parameter, and the index information of the last access.
  • The scroll query request is also of the QueryBuilder type. Its content is the developer's queryBuilder query conditions with the offset query parameter, the scrollId parameter, and the index of the last access added. The search engine ElasticSearch queries its server for data that meets the query conditions according to this content.
  • After the scroll query request is initiated in step S21, the method further includes:
  • causing the search engine ElasticSearch to sort the data in ascending order by the offset field.
  • Specifically, each time the application initiates a new Scroll API call, ElasticSearch returns data from the beginning, which causes the client to receive duplicate data.
  • To solve this problem, the data acquisition component requires the application developer to ensure that the data stored in ElasticSearch contains an offset field that is globally unique and monotonically increasing (many schemes exist in the industry for achieving this). After each new Scroll API call, the component asks ElasticSearch to sort the data in ascending order on the offset field and retains the offset field of the last piece of data in the returned set, so that the next round of Scroll API requests continues from where the previous one stopped.
  • Requiring ElasticSearch to sort the data by offset each time a new Scroll API request is executed guarantees the order of the data.
  • The data acquisition component relies on ElasticSearch and the offset mechanism to ensure continuity and non-repetition.
  • Step S22 further includes: if it is learned that ElasticSearch cannot obtain the data corresponding to the request context from the request context ID, initiating a new scroll query request to ElasticSearch.
  • Specifically, the results returned by an ElasticSearch Scroll API request reflect the state of the index at the time the initial search request was made. It is like a snapshot: subsequent changes to documents (inserts, updates, or deletes) affect only later requests. In other words, once ElasticSearch has created a request context for a new Scroll API request, data subsequently added to, deleted from, or updated in ElasticSearch does not affect the Scroll requests made under that context.
  • To let the data acquisition component obtain data added after the Scroll request context was created, and thus stay real-time, the component initiates a new round of scroll query requests to ElasticSearch whenever it learns that ElasticSearch can no longer obtain data for the request context through the scrollId (which indicates that the data has been fully retrieved).
  • As shown in FIG. 3, a data acquisition device according to another embodiment of the present invention includes a memory 31, a processor 32, and a bus 33.
  • The processor 32 and the memory 31 communicate with each other through the bus 33.
  • The memory 31 stores program instructions executable by the processor 32, and the processor 32 calls the program instructions in the memory 31 to perform the data acquisition method described in the above embodiments, for example: providing the set data query conditions and the rules for parsing the data returned by ElasticSearch to the predefined data acquisition component; and calling the data acquisition component to initiate a scroll query request to the search engine ElasticSearch and obtaining the parsed result returned by ElasticSearch for the scroll query request.
  • This embodiment discloses a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program comprises program instructions which, when executed by a computer, enable the computer to perform the method provided by the foregoing method embodiments, for example: S1, acquiring live list data containing a 3D Touch configuration; S2, based on the live list data, performing 3D Touch interaction proxy configuration on the live list unit items that support 3D Touch preview, and registering each such item as a 3D Touch interaction recognition response object; S3, in response to an event in which the user presses a target live list unit item that supports 3D Touch preview, executing the callback method of the corresponding press response logic processing class on the 3D Touch interaction recognition response object of the target item, thereby implementing 3D Touch based browsing interaction for the live list.
  • A further embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the data acquisition method described in the above embodiments, for example: providing the set data query conditions and the rules for parsing the data returned by ElasticSearch to the predefined data acquisition component; and calling the data acquisition component to initiate a scroll query request to the search engine ElasticSearch and obtaining the parsed result returned by ElasticSearch for the scroll query request.
  • The foregoing program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
  • The embodiments of the data acquisition device described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they can be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
  • The data acquisition method and device provided by the foregoing embodiments of the present invention obtain large batches of data from the search engine ElasticSearch by calling a custom data acquisition component, which makes data acquisition more reliable, ordered, real-time, and non-repeating than using ElasticSearch's Scroll API directly.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • User Interface Of Digital Computer (AREA)
  • Stored Programmes (AREA)

Abstract

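A data acquisition method and device. The method includes: providing the data query conditions set by a developer, and the rules for parsing the data returned by the search engine ElasticSearch, to a predefined data acquisition component (S1); and calling the data acquisition component to initiate a scroll query request to the search engine ElasticSearch and obtaining the parsed result returned by ElasticSearch for the scroll query request (S2). By calling a custom data acquisition component to obtain large batches of data from ElasticSearch, the method makes data acquisition more reliable, ordered, real-time, and non-repeating.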

Description

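Data acquisition method and device
Cross-reference
This application cites Chinese patent application No. 2017105013016, entitled "Data acquisition method and device" and filed on June 27, 2017, which is incorporated into this application by reference in its entirety.
Technical field
The present invention relates to the field of software engineering, and more specifically to a data acquisition method and device.
Background
ElasticSearch is an excellent open-source distributed search engine. Besides search, ElasticSearch is also a powerful tool for log storage and for offline data analysis and mining. With ElasticSearch, the logs that online applications write to disk while running can be collected in real time and stored in an ElasticSearch cluster.
Logs stored in the ElasticSearch cluster serve two application scenarios. In the first, developers use a purpose-built log center platform to set search conditions and query the logs output by online applications, which helps them understand how those applications are running and quickly locate problems. In the second, a Storm cluster pulls logs from the ElasticSearch cluster in real time and in batches to perform complex aggregation calculations, such as distributed call chain calculation. Both scenarios require fast, continuous, real-time access to large amounts of data from the ElasticSearch cluster. ElasticSearch provides the Scroll API (scrolling search) so that large batches of data can be queried quickly and efficiently.
However, the Scroll API is suited to processing large amounts of data rather than serving real-time user requests, and whenever an application initiates a new Scroll API call, ElasticSearch returns data from the beginning, so the client receives duplicate data. Using the Scroll API provided by ElasticSearch directly therefore leaves applications with the following problem: there is no guarantee that large batches of data are obtained on the application side reliably, in order, in real time, and without repetition.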
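Summary of the invention
To overcome the problem that large batches of data cannot be obtained reliably, in order, in real time, and without repetition when the Scroll API provided by ElasticSearch is used directly, the present invention provides a data acquisition method and device.
According to one aspect of the present invention, a data acquisition method is provided, including:
S1: providing the data query conditions set by a developer, and the rules for parsing the data returned by the search engine ElasticSearch, to a predefined data acquisition component;
S2: calling the data acquisition component to initiate a scroll query request to the search engine ElasticSearch, and obtaining the parsed result returned by ElasticSearch for the scroll query request.
Before step S1, the method further includes:
S0: implementing the data acquisition component based on the ElasticSearch Scroll API.
The data acquisition component specifically includes a preparation query interface class and a scroll query component class, wherein:
the preparation query interface class includes a prepare method and a parseResult method; the prepare method provides the data acquisition component with the query conditions set by the developer, and the parseResult method provides it with the developer-defined rules for parsing the data obtained from the search engine ElasticSearch;
the scroll query component class includes a doScrollSearch method, which acquires data from the search engine ElasticSearch by way of the ElasticSearch Scroll API; the input parameter of the doScrollSearch method is an instance of the preparation query interface class.
Step S1 further includes:
S11: instantiating the preparation query interface class to obtain an instance object of that class;
S12: passing the instance object to the doScrollSearch method of the scroll query component.
Step S2 further includes:
S21: calling back the prepare method inside the doScrollSearch method to obtain the data query conditions set by the developer, and initiating a scroll query request to the search engine ElasticSearch;
S22: on receiving the result returned by ElasticSearch for the scroll query request, calling back the parseResult method to parse it and obtain the parsed ElasticSearch data;
S23: returning the parsed ElasticSearch data.
The information carried by the scroll query request includes: the data query conditions set by the developer, a request context ID, an offset query parameter, and the index information of the last access.
After the scroll query request is initiated in step S21, the method further includes:
causing the search engine ElasticSearch to sort the data in ascending order by the offset field.
Step S22 further includes:
if it is learned that ElasticSearch cannot obtain the data corresponding to the request context from the request context ID, initiating a new scroll query request to ElasticSearch.
According to another aspect of the present invention, a data acquisition device is provided, including a memory, a processor, and a bus;
the processor and the memory communicate with each other through the bus;
the memory stores program instructions executable by the processor, and the processor invokes the program instructions in the memory to perform the data acquisition method described above.
According to yet another aspect of the present invention, a non-transitory computer-readable storage medium is provided that stores computer instructions causing a computer to perform the data acquisition method described above.
The data acquisition method and device proposed by the present invention obtain large batches of data from the search engine ElasticSearch by calling a custom data acquisition component, which makes data acquisition more reliable, ordered, real-time, and non-repeating than using ElasticSearch's Scroll API directly.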
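Brief description of the drawings
FIG. 1 is a schematic flowchart of a data acquisition method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of step S2 in FIG. 1 according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data acquisition device according to another embodiment of the present invention.
Detailed description
Specific implementations of the present invention are described in further detail below with reference to the drawings and embodiments. The following embodiments illustrate the present invention but do not limit its scope.
As shown in FIG. 1, a data acquisition method according to an embodiment of the present invention includes:
S1: providing the data query conditions set by a developer, and the rules for parsing the data returned by the search engine ElasticSearch, to a predefined data acquisition component;
S2: calling the data acquisition component to initiate a scroll query request to the search engine ElasticSearch, and obtaining the parsed result returned by ElasticSearch for the scroll query request.
Specifically, to obtain large batches of data from the search engine ElasticSearch quickly, continuously, and in real time for a specific business scenario, the application can initiate a scroll query request (Scroll API request) to ElasticSearch by calling the predefined data acquisition component. The data acquisition component takes the place of the prior-art Scroll API in interacting with ElasticSearch to obtain the data that meets the query conditions, and parses the obtained ElasticSearch data into the type objects required by the business scenario. Before calling the predefined data acquisition component, the developer only needs to provide it with the data query conditions and the rules for parsing the data returned by ElasticSearch; the parsed ElasticSearch data is then obtained by calling the component.
The scroll query request that the data acquisition component sends to the search engine ElasticSearch consists of the QueryBuilder-type query conditions set by the developer, to which are added an offset (data offset value) query parameter, a scrollId parameter (request context ID), and the index of the last access. By contrast, an existing scroll query request sent through the Scroll API usually contains only the query conditions set by the developer. The data acquisition component relies on the scrollId parameter to keep data acquisition real-time and non-repeating, and on the offset mechanism to keep it ordered.
The data acquisition method provided by this embodiment obtains large batches of data from the search engine ElasticSearch by calling a custom data acquisition component, which makes data acquisition more reliable, ordered, real-time, and non-repeating than using ElasticSearch's Scroll API directly.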
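In another embodiment of the present invention, on the basis of the above embodiment, the method further includes before step S1:
S0: implementing the data acquisition component based on the ElasticSearch Scroll API;
wherein the data acquisition component specifically includes a preparation query interface class and a scroll query component class;
the preparation query interface class includes a prepare method and a parseResult method; the prepare method provides the data acquisition component with the query conditions set by the developer, and the parseResult method provides it with the developer-defined rules for parsing the data obtained from the search engine ElasticSearch;
the scroll query component class includes a doScrollSearch method, which acquires data from the search engine ElasticSearch by way of the ElasticSearch Scroll API; the input parameter of the doScrollSearch method is an instance of the preparation query interface class.
Specifically, the data acquisition component is based on the ElasticSearch Scroll API, and implementing it involves constructing a preparation query interface class (the IPrepareSearch<T> interface class) and a scroll query component class (the ScrollSearchComponent class).
1) Constructing the IPrepareSearch<T> interface class
This interface lets the application (developer) set the query conditions and the rules for parsing the data obtained from ElasticSearch.
The IPrepareSearch<T> interface class is defined as follows:
Figure PCTCN2017120216-appb-000001
The interface class consists of two methods. The first is the prepare method, through which the application developer supplies the prepared query conditions to the data acquisition component. SearchRequestVo is defined to describe the query data that the developer provides to the data acquisition component. The second is the parseResult method, through which the developer sets the rules for parsing the data obtained from ElasticSearch. Its input parameter, source, represents one piece of data that the data acquisition component obtained from ElasticSearch. Its return type is generic and is supplied by the developer when implementing the IPrepareSearch<R> interface. The type of source is Map<String,Object>, which is a fairly raw data type; the data that the developer ultimately needs is the type object required by the specific business scenario, so the obtained ElasticSearch data must be parsed by parseResult.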
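Since the figure with the interface definition is not reproduced here, the following is only a minimal sketch of what such an interface could look like, reconstructed from the description above. The method names prepare and parseResult and the Map<String, Object> input come from the text; the empty SearchRequestVo placeholder, the MessagePrepareSearch class, and the "message" field are hypothetical and exist only to make the sketch compile on its own.

```java
import java.util.Map;

/** Placeholder for the request object described in the text; its fields are omitted here. */
class SearchRequestVo {
}

/** Sketch of the preparation query interface: T is the business type the caller wants back. */
interface IPrepareSearch<T> {
    /** Supplies the query conditions for the next scroll query request. */
    SearchRequestVo prepare();

    /** Converts one raw ElasticSearch document into a business object. */
    T parseResult(Map<String, Object> source);
}

/** Hypothetical implementation that extracts an assumed "message" field as a String. */
class MessagePrepareSearch implements IPrepareSearch<String> {
    @Override
    public SearchRequestVo prepare() {
        return new SearchRequestVo();
    }

    @Override
    public String parseResult(Map<String, Object> source) {
        return String.valueOf(source.get("message"));
    }
}
```

Keeping the parsing rule behind a generic interface means the component never needs to know the business type; it only passes each raw map to parseResult and collects whatever comes back.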
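SearchRequestVo is defined as follows:
Figure PCTCN2017120216-appb-000002
Here, scrollId is the request context ID that ElasticSearch creates for each scroll query request (Scroll API request). When an application calls the ElasticSearch Scroll API for the first time, ElasticSearch creates a request context for it. The context is time-limited: it expires after a specified time. If the application calls the ElasticSearch Scroll API again while the context is still valid and passes the scrollId to ElasticSearch, ElasticSearch continues from the previous query result and returns the remaining data, which ensures that data is not fetched twice.
scrollWindow is the expiration time that ElasticSearch sets for the request context, in milliseconds. The value 180000 is an expiration duration set by the developer and is only an example; other values can be set as needed. scrollWindow has a default value, which applies if the developer does not set one.
offset is the most recent data offset value that the application accessed while the previous request context was valid. It is important because it ensures that the application does not obtain the data in ElasticSearch out of order. ElasticSearch does not automatically add an offset field to the data stored in it, so application developers must ensure that the data stored in the ElasticSearch cluster has an offset field that is globally unique and monotonically increasing.
The type of queryBuilder is defined by a class in the Java client library provided by ElasticSearch; it represents the query conditions built by the developer.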
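The four fields described above could be carried by a plain value class along the following lines. This is a sketch under stated assumptions, not the definition from the missing figure: the field names follow the text, but the long type for offset, the -1 sentinel for "nothing read yet", the use of 180000 ms as the default window (the text gives it only as an example value), and the QueryCondition stand-in for ElasticSearch's QueryBuilder type are all assumptions.

```java
/** Stand-in for ElasticSearch's QueryBuilder so the sketch compiles without the client library. */
interface QueryCondition {
}

class SearchRequestVo {
    /** Assumed default; the text gives 180000 ms only as an example value. */
    static final long DEFAULT_SCROLL_WINDOW_MS = 180000L;

    private String scrollId;                               // request context ID; null before the first request
    private long scrollWindow = DEFAULT_SCROLL_WINDOW_MS;  // context expiration time in milliseconds
    private long offset = -1L;                             // offset of the last record read; -1 is an assumed sentinel
    private QueryCondition queryBuilder;                   // query conditions built by the developer

    String getScrollId() { return scrollId; }
    void setScrollId(String scrollId) { this.scrollId = scrollId; }

    long getScrollWindow() { return scrollWindow; }
    void setScrollWindow(long scrollWindow) { this.scrollWindow = scrollWindow; }

    long getOffset() { return offset; }
    void setOffset(long offset) { this.offset = offset; }

    QueryCondition getQueryBuilder() { return queryBuilder; }
    void setQueryBuilder(QueryCondition queryBuilder) { this.queryBuilder = queryBuilder; }

    /** True when no request context has been created yet. */
    boolean isFirstRequest() { return scrollId == null; }
}
```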
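2) Constructing the ScrollSearchComponent class
The ScrollSearchComponent class is the core class of the data acquisition component. Developers use the doScrollSearch method of this class to obtain the data in the ElasticSearch cluster by way of the ElasticSearch Scroll API. The signature of the doScrollSearch method is:
public <T> SearchResponseVo<T>
doScrollSearch(IPrepareSearch<T> prepareSearch)
The input parameter of the method is of the IPrepareSearch<T> interface type. Internally, the doScrollSearch method calls back the prepare method of prepareSearch to determine the query parameters of the current request, and automatically calls the parseResult method of prepareSearch to parse each piece of data obtained from ElasticSearch. Finally, the result of this query is returned to the application. The result is defined by SearchResponseVo, whose content is as follows:
Figure PCTCN2017120216-appb-000003
Here scrollId, offset, and scrollWindow have the same meanings as in the SearchRequestVo class, and content is the data produced by calling the parseResult method of prepareSearch on the ElasticSearch data.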
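A response object with the four members named above might be sketched as follows. The member names come from the text; the concrete types, the constructor, and the choice to make the content list read-only are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch of the response returned by doScrollSearch; T is the parsed business type. */
class SearchResponseVo<T> {
    private final String scrollId;   // request context ID to reuse in the next request
    private final long offset;       // offset of the last record in this batch (assumed to be a long)
    private final long scrollWindow; // context expiration time in milliseconds
    private final List<T> content;   // records already parsed by parseResult

    SearchResponseVo(String scrollId, long offset, long scrollWindow, List<T> content) {
        this.scrollId = scrollId;
        this.offset = offset;
        this.scrollWindow = scrollWindow;
        // Copy defensively so later changes to the caller's list do not leak in.
        this.content = Collections.unmodifiableList(new ArrayList<>(content));
    }

    String getScrollId() { return scrollId; }
    long getOffset() { return offset; }
    long getScrollWindow() { return scrollWindow; }
    List<T> getContent() { return content; }
}
```

Returning scrollId and offset alongside the content lets the caller feed them back into the next prepare call, which is how the text describes consecutive requests continuing from one another.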
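In another embodiment of the present invention, on the basis of the above embodiment, step S1 further includes:
S11: instantiating the preparation query interface class to obtain an instance object of that class;
S12: passing the instance object to the doScrollSearch method of the scroll query component class.
Specifically, after implementing the data acquisition component and before calling it to obtain data from the search engine ElasticSearch, the developer needs to implement the preparation query interface class (the IPrepareSearch<T> interface class), that is, instantiate it to obtain an instance, thereby implementing the prepare and parseResult methods of the IPrepareSearch<T> interface class. Through the prepare method, the developer provides the data acquisition component with query conditions written according to business rules; these query conditions are of the QueryBuilder type. Through the parseResult method, the developer tells the data acquisition component how to convert data of type Map<String,Object> into the domain model type that the developer wants.
Step S12 means passing the instance object as the input parameter of the doScrollSearch method of the scroll query component class (the ScrollSearchComponent class).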
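FIG. 2 is a schematic flowchart of step S2 in another embodiment of the present invention, on the basis of the above embodiment, including:
S21, in the doScrollSearch method, calling back the prepare method to obtain the data query condition set by the developer, and issuing a scroll query request to the search engine ElasticSearch;
S22, upon receiving the result returned by the search engine ElasticSearch for the scroll query request, calling back the parseResult method to parse the returned result and obtain parsed ElasticSearch data;
S23, returning the parsed ElasticSearch data.
Specifically, the step of calling the predefined data acquisition component to issue a scroll query request to the search engine ElasticSearch and obtaining the parsed result returned by ElasticSearch for that request includes the following:
The data acquisition component obtains the query request set by the developer by calling back the prepare method, and then issues a scroll query request to ElasticSearch from inside the component. As the above embodiments show, the return type of the prepare method is SearchRequestVo, which contains four parameters: the request context ID scrollId; the expiry duration scrollWindow of the request context identified by that ID; the offset value offset of the most recently fetched data; and the query condition queryBuilder set by the developer. By calling back the prepare method, the data acquisition component obtains all of these parameters. Once the query condition is ready, the data acquisition component issues the scroll query request to the search engine ElasticSearch.
When the data acquisition component issues a scroll query request to ElasticSearch for the first time, scrollId and offset are both empty or set to default values; ElasticSearch creates a time-limited request context and returns the scrollId associated with it. When ElasticSearch finishes the query and returns data to the data acquisition component, the component retains the offset field of the last record fetched. When the component sends the same scroll query request again (that is, one with the same scrollId value as the previous request) and the scrollId is still valid, ElasticSearch can resume returning data from the offset field, which ensures that the data returned across multiple scroll query requests is contiguous and free of duplicates. If the request context corresponding to the scrollId is no longer valid, ElasticSearch notifies the data acquisition component that the scrollId is invalid.
The scrollId and offset values in the prepare method are updated automatically through the interaction between the data acquisition component and ElasticSearch. This interaction is carried out through the Java client provided by ElasticSearch, which accesses the port of the ElasticSearch server over TCP.
When the data acquisition component receives the result returned by ElasticSearch for the query just issued, the data type of that result is Map<String,Object>. The component calls back the parseResult method, that is, it passes the unparsed result returned by ElasticSearch to parseResult, which parses it according to the specific parsing rules set by the developer. Finally, the data acquisition component returns the ElasticSearch data parsed by parseResult to the caller, i.e., the developer.
The whole of step S2 takes place inside the doScrollSearch method of the ScrollSearchComponent class.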
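The S21 to S23 flow described above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the actual component: ScrollBackend is a hypothetical abstraction that stands in for the Elasticsearch Java client, the value-object fields are assumed shapes, and the query condition is typed as Object so that the sketch compiles without the Elasticsearch library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Assumed shapes of the request and response holders named in the text.
class SearchRequestVo {
    String scrollId;
    long offset;
    long scrollWindow;
    Object queryBuilder; // stand-in for the Elasticsearch QueryBuilder type
}

class SearchResponseVo<T> {
    String scrollId;
    long offset;
    long scrollWindow;
    final List<T> content = new ArrayList<>();
}

interface IPrepareSearch<T> {
    SearchRequestVo prepare();
    T parseResult(Map<String, Object> source);
}

// Hypothetical abstraction over the Elasticsearch Java client.
interface ScrollBackend {
    List<Map<String, Object>> scroll(SearchRequestVo request);
}

class ScrollSearchComponent {
    private final ScrollBackend backend;

    ScrollSearchComponent(ScrollBackend backend) {
        this.backend = backend;
    }

    public <T> SearchResponseVo<T> doScrollSearch(IPrepareSearch<T> prepareSearch) {
        // S21: call back prepare, then issue the scroll query request.
        SearchRequestVo request = prepareSearch.prepare();
        List<Map<String, Object>> hits = backend.scroll(request);

        // S22: call back parseResult for every raw record returned.
        SearchResponseVo<T> response = new SearchResponseVo<>();
        response.scrollId = request.scrollId;
        response.scrollWindow = request.scrollWindow;
        response.offset = request.offset;
        for (Map<String, Object> hit : hits) {
            response.content.add(prepareSearch.parseResult(hit));
            // Retain the offset of the last record fetched.
            Object value = hit.get("offset");
            if (value instanceof Number) {
                response.offset = ((Number) value).longValue();
            }
        }

        // S23: return the parsed data to the caller.
        return response;
    }
}
```

Keeping the backend behind an interface is a choice made for this sketch only; it lets the callback order be exercised without a running cluster.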
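Based on the above embodiment, the information carried by the scroll query request includes: the data query condition set by the developer, the request context ID, the offset query parameter, and the index information of the previous access.
The scroll query request is also of type QueryBuilder. Its content is the developer's query condition queryBuilder with the offset query parameter, the scrollId parameter and the previously accessed index added. The search engine ElasticSearch searches its servers for data that matches the query condition according to the content of the scroll query request.
Based on the above embodiment, after the step of issuing a scroll query request to the search engine ElasticSearch in step S21, the method further includes:
causing the search engine ElasticSearch to sort the data in ascending order by the offset field.
Specifically, whenever the application issues a new Scroll API call, ElasticSearch returns data from the beginning, which causes the client to receive duplicate data. To solve this problem, the data acquisition component requires application developers to ensure that the data stored in ElasticSearch contains an offset field that is globally unique and monotonically increasing (many schemes in the industry can meet this requirement). Each time the application executes a new Scroll API request, the data acquisition component asks ElasticSearch to sort the data in ascending order on the offset field and retains the offset field of the last record in the fetched data set, thereby ensuring that the new round of Scroll API requests continues fetching data from where the previous round left off.
Requiring ElasticSearch to sort the data by offset on every new Scroll API request guarantees the ordering of the data. The data acquisition component relies on ElasticSearch and the offset mechanism to guarantee contiguity and the absence of duplicates.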
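The effect of the offset mechanism can be illustrated with a small in-memory simulation. This is only a sketch of the idea, not Elasticsearch code: OffsetPager and nextBatch are hypothetical names, and the list of stored offsets stands in for the indexed data. Each new round keeps only records whose offset is greater than the last retained offset, sorts them in ascending order, and takes one batch.

```java
import java.util.List;
import java.util.stream.Collectors;

class OffsetPager {
    // Simulates what the component asks of the search engine on each new round:
    // skip everything up to lastOffset, sort ascending by offset, take one batch.
    static List<Long> nextBatch(List<Long> storedOffsets, long lastOffset, int batchSize) {
        return storedOffsets.stream()
                .filter(offset -> offset > lastOffset)
                .sorted()
                .limit(batchSize)
                .collect(Collectors.toList());
    }
}
```

Because the offset field is assumed to be globally unique and monotonically increasing, feeding the last offset of one batch into the next call yields batches that neither overlap nor skip records.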
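Based on the above embodiment, step S22 further includes:
if it is learned that the search engine ElasticSearch cannot obtain the data corresponding to the request context from the request context ID, issuing a new scroll query request to the search engine ElasticSearch.
Specifically, the results returned by an ElasticSearch Scroll API request reflect the state of the index at the time the initial search request was made. It is like a snapshot in time: subsequent changes to documents (insertions, updates or deletions) affect only later requests. In other words, once ElasticSearch has created a request context for a new Scroll API request, data subsequently added to, deleted from or updated in ElasticSearch does not affect the multiple Scroll requests made under that context. So that the data acquisition component can, for the sake of timeliness, fetch data added after the Scroll request context was created, the component issues a new round of scroll query requests to ElasticSearch whenever it learns that ElasticSearch can no longer obtain data for the request context through the scrollId (which indicates that the data has been fully fetched).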
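The re-initiation behaviour can be sketched with an in-memory model. FakeIndex and RescrollingReader are hypothetical names, and the snapshot list only imitates a scroll request context; no Elasticsearch API is used here. Records added after a snapshot is taken stay invisible until the current context is exhausted, at which point a new round starts from the last retained offset.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// In-memory stand-in for an index; not the Elasticsearch API.
class FakeIndex {
    final List<Long> offsets = new ArrayList<>();

    // Opening a context freezes the view of the data, like a snapshot.
    List<Long> openSnapshot(long afterOffset) {
        List<Long> snapshot = new ArrayList<>();
        for (long offset : offsets) {
            if (offset > afterOffset) {
                snapshot.add(offset);
            }
        }
        Collections.sort(snapshot); // ascending order by offset
        return snapshot;
    }
}

class RescrollingReader {
    private final FakeIndex index;
    private List<Long> snapshot = new ArrayList<>();
    private int position = 0;
    private long lastOffset = 0;

    RescrollingReader(FakeIndex index) {
        this.index = index;
    }

    List<Long> nextBatch(int batchSize) {
        if (position >= snapshot.size()) {
            // The current context has no more data: start a new round
            // that continues after the last offset already delivered.
            snapshot = index.openSnapshot(lastOffset);
            position = 0;
        }
        int end = Math.min(position + batchSize, snapshot.size());
        List<Long> batch = new ArrayList<>(snapshot.subList(position, end));
        position = end;
        if (!batch.isEmpty()) {
            lastOffset = batch.get(batch.size() - 1);
        }
        return batch;
    }
}
```

In this model a record written while a context is open is delivered exactly once, by the round that follows the exhausted context.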
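FIG. 3 is a schematic structural diagram of a data acquisition device provided by another embodiment of the present invention, including a memory 31, a processor 32 and a bus 33,
the processor 32 and the memory 31 communicating with each other through the bus 33;
the memory 31 storing program instructions executable by the processor 32, and the processor 32 calling the program instructions in the memory 31 to perform the data acquisition method described in the above embodiments, for example including: providing the set data query condition and the rules for parsing the data returned by ElasticSearch to a predefined data acquisition component; and calling the data acquisition component to issue a scroll query request to the search engine ElasticSearch and obtaining the parsed result returned by the search engine ElasticSearch for the scroll query request.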
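This embodiment discloses a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer can perform the methods provided by the above method embodiments, for example including: S1, providing the set data query condition and the rules for parsing the data returned by ElasticSearch to a predefined data acquisition component; S2, calling the data acquisition component to issue a scroll query request to the search engine ElasticSearch and obtaining the parsed result returned by the search engine ElasticSearch for the scroll query request.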
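Yet another embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform the data acquisition method described in the above embodiments, for example including: providing the set data query condition and the rules for parsing the data returned by ElasticSearch to a predefined data acquisition component; and calling the data acquisition component to issue a scroll query request to the search engine ElasticSearch and obtaining the parsed result returned by the search engine ElasticSearch for the scroll query request.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.
The embodiment of the data acquisition device described above is merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or alternatively by hardware. Based on this understanding, the essence of the above technical solution, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
The data acquisition method and device proposed in the above embodiments of the present invention fetch large volumes of data from the search engine ElasticSearch by calling a custom data acquisition component, making data acquisition more reliable, ordered, timely and free of duplicates than using the ElasticSearch Scroll API directly.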

Claims (11)

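  1. A data acquisition method, characterized by comprising:
    S1, providing a data query condition set by a developer and rules for parsing data returned by the search engine ElasticSearch to a predefined data acquisition component;
    S2, calling the data acquisition component to issue a scroll query request to the search engine ElasticSearch, and obtaining the parsed result returned by the search engine ElasticSearch for the scroll query request.
  2. The method according to claim 1, characterized in that, before step S1, the method further comprises:
    S0, implementing a data acquisition component based on the ElasticSearch Scroll API.
  3. The method according to claim 2, characterized in that the data acquisition component specifically comprises a prepare-search interface class and a scroll-search component class; wherein
    the prepare-search interface class comprises a prepare method and a parseResult method, the prepare method being used to provide the data acquisition component with the data query condition set by the developer, and the parseResult method being used to provide the data acquisition component with the rules set by the developer for parsing the data fetched from the search engine ElasticSearch;
    the scroll-search component class comprises a doScrollSearch method, the doScrollSearch method being used to fetch data from the search engine ElasticSearch by way of the ElasticSearch Scroll API, and the parameter of the doScrollSearch method being an instance of the prepare-search interface class.
  4. The method according to claim 3, characterized in that step S1 further comprises:
    S11, instantiating the prepare-search interface class to obtain an instance object of the prepare-search interface class;
    S12, passing the instance object to the doScrollSearch method of the scroll-search component class.
  5. The method according to claim 3, characterized in that step S2 further comprises:
    S21, in the doScrollSearch method, calling back the prepare method to obtain the data query condition set by the developer, and issuing a scroll query request to the search engine ElasticSearch;
    S22, upon receiving the result returned by the search engine ElasticSearch for the scroll query request, calling back the parseResult method to parse the returned result and obtain parsed ElasticSearch data;
    S23, returning the parsed ElasticSearch data.
  6. The method according to claim 5, characterized in that the information carried by the scroll query request comprises: the data query condition set by the developer, a request context ID, an offset query parameter, and index information of the previous access.
  7. The method according to claim 5, characterized in that, after the step of issuing a scroll query request to the search engine ElasticSearch in step S21, the method further comprises:
    causing the search engine ElasticSearch to sort the data in ascending order by the offset field.
  8. The method according to claim 6, characterized in that step S22 further comprises:
    if it is learned that the search engine ElasticSearch cannot obtain the data corresponding to the request context from the request context ID, issuing a new scroll query request to the search engine ElasticSearch.
  9. A data acquisition device, characterized by comprising a memory, a processor and a bus,
    the processor and the memory communicating with each other through the bus;
    the memory storing program instructions executable by the processor, and the processor calling the program instructions in the memory to perform the method according to any one of claims 1 to 8.
  10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions that cause the computer to perform the method according to any one of claims 1 to 8.
  11. A computer program product, characterized in that the computer program product comprises a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 8.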
PCT/CN2017/120216 2017-06-27 2017-12-29 Data acquisition method and device WO2019000897A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710501301.6 2017-06-27
CN201710501301.6A CN107341217B (zh) 2017-06-27 2017-06-27 Data acquisition method and device

Publications (1)

Publication Number Publication Date
WO2019000897A1 (zh) 2019-01-03

Family

ID=60221638

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/120216 WO2019000897A1 (zh) 2017-06-27 2017-12-29 Data acquisition method and device

Country Status (2)

Country Link
CN (1) CN107341217B (zh)
WO (1) WO2019000897A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457189A (zh) * 2019-07-02 2019-11-15 平安科技(深圳)有限公司 Log management method and system for application programs, and related device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341217B (zh) * 2017-06-27 2020-02-07 武汉斗鱼网络科技有限公司 Data acquisition method and device
CN113407785B (zh) * 2021-06-11 2023-02-28 西北工业大学 Data processing method and system based on a distributed storage system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103399887A (zh) * 2013-07-19 2013-11-20 蓝盾信息安全技术股份有限公司 Query and statistical analysis system for massive logs
CN106528797A (zh) * 2016-11-10 2017-03-22 上海轻维软件有限公司 Elasticsearch-based DSL query method
CN107341217A (zh) * 2017-06-27 2017-11-10 武汉斗鱼网络科技有限公司 Data acquisition method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10467686B2 (en) * 2007-02-09 2019-11-05 Xcira, Inc. Integrated auctioning environment platform
CN106126731B (zh) * 2016-07-01 2020-02-14 百势软件(北京)有限公司 Method and device for obtaining Elasticsearch paged data

Also Published As

Publication number Publication date
CN107341217B (zh) 2020-02-07
CN107341217A (zh) 2017-11-10

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17915973

Country of ref document: EP

Kind code of ref document: A1