CN111814020A

CN111814020A - Data acquisition method and device

Info

Publication number: CN111814020A
Application number: CN202010581970.0A
Authority: CN
Inventors: 司翔; 史忠伟
Original assignee: Wuba Co Ltd
Current assignee: Wuba Co Ltd
Priority date: 2020-06-23
Filing date: 2020-06-23
Publication date: 2020-10-23

Abstract

The embodiments of the invention provide a data acquisition method and a data acquisition device. A data acquisition request sent by a user terminal is received; a target index server corresponding to the data acquisition request is determined from at least one index server; a first identifier for the data acquisition request is then determined from the target index server; a target storage server corresponding to the first identifier is selected from at least one storage server; and target data is acquired from the target storage server.

Description

Data acquisition method and device

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a data acquisition method and a data acquisition apparatus.

Background

As business logic grows more complex and the number of data dimensions increases, problems have emerged that conventional relational databases cannot solve, including shortcomings in data modeling and limits on horizontal scaling across large data volumes and multiple servers. For example, the data generated by enterprise users, data systems and clients grows exponentially and keeps increasing, so conventional relational databases can no longer meet current requirements for storing and processing big data. In addition, scenarios such as social networks, intelligent recommendation and knowledge graphs have emerged on a large scale, creating an urgent demand for processing relationship data; when handling these problems, the performance of traditional databases tends to be inadequate, which affects data processing.

Disclosure of Invention

The embodiment of the invention provides a data acquisition method, which aims to solve the problem that the prior art cannot meet the requirements for storing and processing complex relationship data.

Correspondingly, the embodiment of the invention also provides a data acquisition device, which is used for ensuring the realization and the application of the method.

In order to solve the above problem, an embodiment of the present invention discloses a data acquisition method, including:

receiving a data acquisition request;

determining a target index server corresponding to the data acquisition request;

acquiring a first identifier matched with the data acquisition request from the target index server;

and acquiring target data aiming at the data acquisition request from a target storage server matched with the first identifier.

Optionally, the data obtaining request includes a target data identifier, and the determining a target index server corresponding to the data obtaining request includes:

acquiring a second identifier matched with the target data identifier from a preset index mapping relation, wherein the index mapping relation is a corresponding relation between the data identifier and an index server;

and selecting a target index server matched with the second identifier from at least one preset index server.

Optionally, the obtaining, from the target index server, the first identifier matching the data obtaining request includes:

acquiring a target storage mapping relation corresponding to the target data identifier from the target index server;

and determining a first identifier corresponding to the target data identifier by adopting the target storage mapping relation.

Optionally, the method further comprises:

and sending the target data to a user terminal, wherein the user terminal is used for carrying out data processing on the target data.

Optionally, the method further comprises:

acquiring original data and an original data identifier of the original data;

determining at least one raw storage server for storing the raw data;

establishing a storage mapping relation aiming at the original data by adopting a storage server identifier of the at least one original storage server and the original data;

determining at least one original index server for storing the storage mapping relation;

and establishing an index mapping relation aiming at the original data by adopting the index server identification of the at least one original index server and the original data identification.

The embodiment of the invention also discloses a data acquisition device, which comprises:

the request receiving module is used for receiving a data acquisition request;

the index server determining module is used for determining a target index server corresponding to the data acquisition request;

the first identifier acquisition module is used for acquiring a first identifier matched with the data acquisition request from the target index server;

and the target data acquisition module is used for acquiring target data aiming at the data acquisition request from a target storage server matched with the first identifier.

Optionally, the data obtaining request includes a target data identifier, and the index server determining module includes:

the second identifier obtaining submodule is used for obtaining a second identifier matched with the target data identifier from a preset index mapping relation, wherein the index mapping relation is a corresponding relation between the data identifier and the index server;

and the index server selection submodule is used for selecting a target index server matched with the second identifier from at least one preset index server.

Optionally, the first identifier obtaining module includes:

a storage mapping relation obtaining submodule, configured to obtain, from the target index server, a target storage mapping relation corresponding to the target data identifier;

and the first identifier acquisition submodule is used for determining a first identifier corresponding to the target data identifier by adopting the target storage mapping relation.

Optionally, the device further comprises:

and the data sending module is used for sending the target data to a user terminal, and the user terminal is used for carrying out data processing on the target data.

Optionally, the device further comprises:

an original data acquisition module, configured to acquire original data and an original data identifier of the original data;

a first server determination submodule for determining at least one original storage server for storing the original data;

a storage mapping relationship generation module, configured to establish a storage mapping relationship for the original data by using the storage server identifier of the at least one original storage server and the original data;

the second server determination submodule is used for determining at least one original index server for storing the storage mapping relation;

and the index mapping relation generating module is used for establishing an index mapping relation aiming at the original data by adopting the index server identifier of the at least one original index server and the original data identifier.

The embodiment of the invention also discloses an electronic device, which comprises:

one or more processors; and

one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the electronic device to perform one or more methods as described above.

Embodiments of the invention also disclose one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause the processors to perform one or more of the methods described above.

The embodiment of the invention has the following advantages:

in the embodiment of the invention, a data acquisition request sent by a user terminal is received, a target index server corresponding to the data acquisition request is determined from at least one index server, and a first identifier for the data acquisition request is then determined from the target index server, so that the target storage server corresponding to the first identifier is selected from at least one storage server and the target data is acquired from it. Because the servers are deployed in a distributed manner and are independent of one another, the flexibility of data processing is improved, users can conveniently query or process offline the data stored in the database, and the method applies to a variety of data processing scenarios, which broadens the general applicability of data processing.

Drawings

FIG. 1 is a flow chart of the steps of one embodiment of a method of data acquisition of the present invention;

FIG. 2 is a block diagram of a data processing architecture according to an embodiment of the present invention;

FIG. 3 is a flow chart of the steps of one embodiment of a method for data acquisition of the present invention;

FIG. 4 is a schematic diagram of a data storage architecture in an embodiment of the present invention;

FIG. 5 is a diagram of an index building architecture in an embodiment of the invention;

FIG. 6 is a diagram illustrating the JanusGraph configuration in an embodiment of the present invention;

FIG. 7 is a diagram of a data architecture in an embodiment of the invention;

FIG. 8 is a block diagram of an embodiment of a data acquisition apparatus according to the present invention.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

The data processing server may be a distributed, open-source, scalable graph database that supports storing and querying graph data containing hundreds of millions of nodes and edges distributed across multiple server clusters; for example, the data processing server may be JanusGraph.

The storage server may be a NoSQL database, which need not conform to the relational database model, need not use SQL as its query language, does not require a fixed table storage schema, and is horizontally scalable, such as HBase.

The index server may be an open-source distributed search engine that supports storing, searching and analyzing large amounts of data in a very short time, such as Elasticsearch.

A graph data structure, i.e., a graph, is a collection of vertices connected by a series of edges. For example, a graph may represent a social network where each person is a vertex and the people that know each other are connected by edges. In the data processing process, related index relationships can be established for the same type of data and stored in the same or different servers, so that a graph data structure is established.
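As a minimal illustration of the graph structure just described, the following sketch stores a small social network as an adjacency mapping. The function name `build_graph` and the example people are hypothetical and are not taken from the embodiment.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as a mapping: vertex -> set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        # An undirected edge is recorded on both of its endpoints.
        graph[a].add(b)
        graph[b].add(a)
    return graph


# Each person is a vertex; an edge joins two people who know each other.
social = build_graph([("alice", "bob"), ("bob", "carol")])
print(sorted(social["bob"]))
```

A graph database keeps the same kind of vertex and edge information, but distributes it across servers instead of holding it in one in-process mapping.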

Therefore, one of the core ideas of the embodiments of the present invention is to establish a distributed relationship among the data processing server, the index server and the storage server, so that data processing capability scales linearly while the servers remain independent of one another; this also addresses the storage and querying of complex data and broadens the general applicability of data processing.

Referring to fig. 1, a flowchart illustrating steps of an embodiment of a data obtaining method according to the present invention is shown, which may specifically include the following steps:

step 101, receiving a data acquisition request;

as an example, fig. 2 shows a schematic diagram of a data processing architecture in the embodiment of the present invention. The storage server may be integrated with the Hadoop ecosystem to ensure strictly consistent reads and writes, while the linear scalability of the external storage server cluster is used to enlarge the capacity available for storing data. The index server may store the mapping relationships of different data, from which the node (i.e., the storage server) where each piece of data is stored in the graph data structure can be obtained. Each component is isolated on a different server, so that the servers can be scaled and managed independently of one another, which improves the flexibility of data processing.

In a specific implementation, the data processing server may communicate with at least one user terminal, at least one index server, at least one storage server, and the like, and optionally, the data processing server may also establish a communication connection with other extended function servers.

In one example, an application program corresponding to the data processing server may be installed in the user terminal, so that a user may access the data processing server through the application program, so that the data processing server performs communication interaction with the index server and the storage server after receiving a data acquisition request sent by the user terminal, so as to acquire target data corresponding to the data acquisition request.

The user terminal may be a mobile device such as a mobile phone, a PDA (Personal Digital Assistant), a laptop computer, a palmtop computer or a smart wearable device (such as a smart bracelet, smart glasses or a smart headband), or a fixed device such as a vehicle-mounted terminal or a smart home device. The terminal may run an operating system such as Windows, Android, iOS or Windows Phone, or an embedded system, and can generally run application programs for acquiring, receiving and processing data, which is not limited in this embodiment of the present invention.

Step 102, determining a target index server corresponding to the data acquisition request;

in a specific implementation, the data acquisition request sent by the user terminal may include a target data identifier, and the target data identifier may be a data identifier of data that the user needs to acquire, so that the data processing server may obtain the target data identifier by parsing the data acquisition request, so as to determine a corresponding target index server from at least one index server according to the target data identifier.
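The parsing step can be sketched as follows. The embodiment does not specify a wire format, so the JSON body and the `data_id` field name used here are assumptions made purely for illustration.

```python
import json


def parse_request(raw: bytes) -> str:
    """Extract the target data identifier from a raw request body.

    Assumption: the request is a JSON object with a string field "data_id".
    """
    payload = json.loads(raw.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    data_id = payload.get("data_id")
    if not isinstance(data_id, str) or not data_id:
        raise ValueError("request does not carry a target data identifier")
    return data_id


print(parse_request(b'{"data_id": "id-1"}'))
```

Rejecting a request that has no identifier at this point keeps the later lookups from running with an undefined key.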

Step 103, acquiring a first identifier matched with the data acquisition request from the target index server;

in a specific implementation, mapping relationships of different data may be stored in the index server, and a node (i.e., a storage server) where each data is stored in the graph data structure may be obtained through the mapping relationships, so that after the data processing server determines a target index server, the data processing server may obtain a first identifier matching the data obtaining request from the index server.

The first identifier may be a storage server identifier, for example, an ID (identity identification number) of the storage server, an IP address, and the like, and the first identifier may be used to accurately determine a target storage server from at least one storage server connected to the data processing server so as to obtain corresponding data.

Step 104, acquiring target data for the data acquisition request from the target storage server matched with the first identifier.

When the data processing server has determined, from at least one storage server, the target storage server matching the first identifier, it can acquire the target data for the data acquisition request from that server. Because the servers are deployed in a distributed manner and are independent of one another, the flexibility of data processing is improved, users can conveniently query or process offline the data stored in the database, and the method applies to a variety of data processing scenarios, which broadens the general applicability of data processing.
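Steps 101 to 104 can be sketched end to end with in-memory dictionaries standing in for the servers. All names and sample values below (`INDEX_MAPPING`, `INDEX_SERVERS`, `STORAGE_SERVERS`, `acquire`, the identifiers) are hypothetical; a real deployment would replace each dictionary lookup with a network call.

```python
# Hypothetical in-memory stand-ins for the three kinds of server.
# data identifier -> index server identifier
INDEX_MAPPING = {"id-1": "index-A"}
# index server identifier -> (data identifier -> storage server identifier)
INDEX_SERVERS = {"index-A": {"id-1": "store-1"}}
# storage server identifier -> (data identifier -> stored data)
STORAGE_SERVERS = {"store-1": {"id-1": {"name": "example vertex"}}}


def acquire(request: dict):
    """Follow steps 101-104 for a single data acquisition request."""
    data_id = request["data_id"]                    # step 101: receive and parse
    index_server = INDEX_SERVERS[INDEX_MAPPING[data_id]]  # step 102: target index server
    first_identifier = index_server[data_id]        # step 103: storage server identifier
    return STORAGE_SERVERS[first_identifier][data_id]     # step 104: target data


print(acquire({"data_id": "id-1"}))
```

The data processing server holds only the small first-level mapping; the per-item storage locations live on the index servers, which is what lets each tier scale separately.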

Referring to fig. 3, a flowchart illustrating steps of an embodiment of a data obtaining method according to the present invention is shown, which may specifically include the following steps:

step 301, receiving a data acquisition request;

in one example, an application program corresponding to the data processing server may be installed in the user terminal, so that a user may access the data processing server through the application program, so that the data processing server performs communication interaction with the index server and the storage server after receiving a data acquisition request sent by the user terminal, so as to acquire target data corresponding to the data acquisition request. The data acquisition request may include a target data identifier, and the target data identifier may be a data identifier of data that needs to be processed or acquired by a user.

Step 302, determining a target index server corresponding to the data acquisition request;

in a specific implementation, when the data processing server receives a data acquisition request sent by the user terminal, it may parse the request to obtain the target data identifier. The data processing server may then obtain a second identifier matching the target data identifier from the index mapping relationship, and select a target index server matching the second identifier from at least one preset index server. The index mapping relationship is the correspondence between data identifiers and the server identifiers of index servers.

It should be noted that the first identifier may be a storage server identifier, for example, an ID (such as the serial number of the server) or an IP address of the storage server; the second identifier may be an index server identifier, for example, an ID or an IP address of the index server. When the first identifier is an ID, the second identifier may also be an ID; when the first identifier is an IP address, the second identifier may also be an IP address. Optionally, the identifiers may be set in other manners, which is not limited in the present invention.

The index mapping relationship may be a corresponding relationship between the data identifier and a server identifier of the index server, which indicates in which index server or index servers the index information corresponding to the data identifier is stored, for example, as shown in table 1:

Data identifier	Index server
Identifier ①	Server A
Identifier ②	Server B
Identifier ③	Server C
Identifier ④	Server A, Server B

TABLE 1

The index server corresponding to data identifier ① may be server A; the index server corresponding to data identifier ② may be server B; the index server corresponding to data identifier ③ may be server C; the index servers corresponding to data identifier ④ may be server A and server B; and so on.

Therefore, after the data processing server obtains the data identifier, the target index server corresponding to the data identifier can be determined from the index mapping relation of the local storage, so that the storage condition corresponding to the data identifier can be determined from the target index server.
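The lookup against the locally stored index mapping relationship can be sketched as below, using the entries of Table 1. The key strings (`"id-1"`, `"server-A"`, and so on) and the function `select_index_servers` are hypothetical names chosen for illustration.

```python
# Index mapping relationship mirroring Table 1:
# data identifier -> identifiers of the index servers holding its index information.
INDEX_MAPPING = {
    "id-1": ["server-A"],
    "id-2": ["server-B"],
    "id-3": ["server-C"],
    "id-4": ["server-A", "server-B"],
}


def select_index_servers(data_id, preset_servers):
    """Return the preset index servers whose identifier matches the mapping."""
    second_identifiers = INDEX_MAPPING.get(data_id, [])
    # Keep only identifiers that correspond to a preset (known) index server.
    return [sid for sid in second_identifiers if sid in preset_servers]


preset = {"server-A", "server-B", "server-C"}
print(select_index_servers("id-4", preset))
```

A data identifier may map to more than one index server, as in the last row of Table 1, so the function returns a list rather than a single server.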

In an optional embodiment of the present invention, the mapping relationship between the data and the index server and between the data and the storage server may be established in advance. Specifically, the data processing server may obtain the original data and an original data identifier of the original data, determine at least one original storage server for storing the original data, then establish a storage mapping relationship for the original data by using the storage server identifier of the at least one original storage server and the original data, then determine at least one original index server for storing the storage mapping relationship, and establish an index mapping relationship for the original data by using an index server identifier of the at least one original index server and the original data identifier.

In a specific implementation, data of the same type or in the same data table may be stored in one storage server or in several different storage servers. After the data processing server obtains the original data identifier of the original data, it may obtain the server identifiers of all storage servers storing the original data, and establish a storage mapping relationship between the original data identifier and the storage server identifiers, for example, as shown in table 2:

Data identifier	Storage server
Identifier ①	Server 1
Identifier ①	Server 2
Identifier ①	Server 3
Identifier ②	Server 1
Identifier ③	Server 1
Identifier ③	Server 3

TABLE 2

That is, original data ① is stored in storage server 1, storage server 2 and storage server 3; original data ② is stored in storage server 1; original data ③ is stored in storage server 1 and storage server 3; and so on.
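One way to hold the rows of Table 2 in memory is to group them by data identifier, as in the sketch below. The row tuples and the function `build_storage_mapping` are hypothetical; the embodiment does not prescribe a particular representation.

```python
from collections import defaultdict

# Rows of Table 2 as (data identifier, storage server identifier) pairs.
ROWS = [
    ("id-1", "server-1"),
    ("id-1", "server-2"),
    ("id-1", "server-3"),
    ("id-2", "server-1"),
    ("id-3", "server-1"),
    ("id-3", "server-3"),
]


def build_storage_mapping(rows):
    """Group rows into: data identifier -> ordered list of storage servers."""
    mapping = defaultdict(list)
    for data_id, server_id in rows:
        if server_id not in mapping[data_id]:  # ignore duplicate rows
            mapping[data_id].append(server_id)
    return dict(mapping)


storage_mapping = build_storage_mapping(ROWS)
print(storage_mapping["id-3"])
```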

After the data processing server establishes the storage mapping relationship, it may determine an appropriate index server, store the storage mapping relationship in that index server, and obtain the index server identifier of the index server. It then establishes an index mapping relationship between the index server identifier and the original data identifier, and stores the index mapping relationship locally for use in subsequent data searches. For example, as shown in table 3, different index servers may store different or identical storage mapping relationships:

TABLE 3

It should be noted that the embodiment of the present invention includes but is not limited to the above examples, and it is understood that, under the guidance of the idea of the embodiment of the present invention, a person skilled in the art can set the method according to practical situations, and the present invention is not limited to this.
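The registration sequence described above (storage mapping first, then index mapping) can be sketched as follows. How the storage servers and index servers are chosen is not specified in the text, so the caller supplies them; `register` and its parameters are hypothetical names.

```python
def register(data_id, storage_ids, index_ids, index_servers, index_mapping):
    """Record where `data_id` is stored and which index servers hold that record.

    index_servers: index server id -> (data id -> list of storage server ids)
    index_mapping: data id -> list of index server ids (kept locally)
    """
    # 1. Storage mapping relationship: data identifier -> storage servers.
    storage_mapping = {data_id: list(storage_ids)}
    # 2. Store the storage mapping on each chosen index server.
    for index_id in index_ids:
        index_servers.setdefault(index_id, {})[data_id] = list(storage_ids)
    # 3. Index mapping relationship: data identifier -> index servers.
    index_mapping[data_id] = list(index_ids)
    return storage_mapping


index_servers, index_mapping = {}, {}
register("id-1", ["server-1", "server-3"], ["server-A"], index_servers, index_mapping)
print(index_mapping, index_servers)
```

Writing the storage mapping before the index mapping means a reader that finds an entry in the local index mapping can expect the referenced index server to already hold the storage locations.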

Step 303, acquiring a first identifier matched with the data acquisition request from the target index server;

in a specific implementation, after the data processing server determines the target index server, it may send a mapping relationship acquisition request, which may include the data identifier, to the target index server. The target index server parses the mapping relationship acquisition request to obtain the data identifier, queries its locally stored storage mapping relationship according to the data identifier, and determines the target server identifier corresponding to the data identifier. It then packages the target server identifier into mapping relationship reply information and returns the reply information to the data processing server.
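The index server side of this exchange can be sketched as a small request handler. The JSON message layout and the field names `data_id` and `storage_servers` are assumptions; the embodiment does not define the format of the request or of the reply information.

```python
import json

# Storage mapping relationship held locally by this (hypothetical) index server.
STORAGE_MAPPING = {"id-1": ["server-1", "server-2"]}


def handle_mapping_request(raw: str) -> str:
    """Parse a mapping relationship acquisition request and build the reply."""
    data_id = json.loads(raw)["data_id"]           # parse out the data identifier
    targets = STORAGE_MAPPING.get(data_id, [])     # query the local storage mapping
    reply = {"data_id": data_id, "storage_servers": targets}
    return json.dumps(reply)                       # packaged reply information


print(handle_mapping_request('{"data_id": "id-1"}'))
```

An unknown identifier yields an empty server list here, leaving it to the data processing server to decide how to report the miss.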

Step 304, acquiring target data aiming at the data acquisition request from a target storage server matched with the first identifier;

step 305, sending the target data to a user terminal, where the user terminal is configured to perform data processing on the target data.

In a specific implementation, after receiving the mapping relationship reply information returned by the target index server, the data processing server parses it to obtain the target server identifier (i.e., the first identifier), and then determines the target storage server from among all the storage servers.

After determining the target storage servers, the data processing server may send a data acquisition request to each target storage server to request the corresponding data. After receiving the data acquisition request, a target storage server may extract the corresponding target data according to the data identifier and send it to the data processing server, which returns the target data to the user terminal so that the user can process it, for example, by modifying, adding to or deleting it. Because the servers are deployed in a distributed manner and are independent of one another, the flexibility of data processing is improved, users can conveniently query or process offline the data stored in the database, and the method applies to a variety of data processing scenarios, which broadens the general applicability of data processing.
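The request to each target storage server can be sketched as a loop that collects the replies. Dictionaries stand in for the storage servers, and `fetch_target_data` is a hypothetical helper; a real system would issue network requests and would need its own policy for merging replies.

```python
def fetch_target_data(data_id, target_ids, storage_servers):
    """Request `data_id` from each target storage server and collect the replies.

    storage_servers: storage server id -> (data id -> stored data)
    Returns: storage server id -> data returned by that server.
    """
    results = {}
    for server_id in target_ids:
        store = storage_servers.get(server_id)
        # Skip servers that are unknown or do not hold the requested data.
        if store is not None and data_id in store:
            results[server_id] = store[data_id]
    return results


servers = {
    "server-1": {"id-3": ["row a"]},
    "server-3": {"id-3": ["row b"]},
}
print(fetch_target_data("id-3", ["server-1", "server-3"], servers))
```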

In order to make those skilled in the art better understand the technical solutions of the embodiments of the present invention, the following description and explanation are made by way of an example.

As shown in fig. 4, which illustrates a schematic diagram of a data storage architecture in an embodiment of the present invention, the data processing server may be JanusGraph and the storage server may be HBase; distributed JanusGraph may be communicatively connected to distributed HBase through its Storage Interface.

As shown in fig. 5, which illustrates a schematic diagram of an index building architecture in the embodiment of the present invention, JanusGraph may be communicatively connected to an index server through its Index Interface, and the index server may be Elasticsearch. The creation settings for the Elasticsearch index are embedded in JanusGraph, so that constrained traversals become significantly faster once the index is built. In JanusGraph, indexes such as composite indexes and unique indexes can be maintained in memory by JanusGraph itself and used without a third-party component as index storage; for queries involving fuzzy matching, full-text retrieval, predicate-based search and the like, a mixed index needs to be constructed. Specifically, the mixed index may be built from the identification information and function information of the nodes and their position information in the graph data structure.

As shown in fig. 6, which shows a schematic diagram of a JanusGraph configuration in the embodiment of the present invention, a Gremlin server may be packaged into JanusGraph, forming a JanusGraph server. At startup, the server is initialized from a properties file, which binds the logical graph to its storage component and index component; it then manages and operates the maintained graph instances by listening for external socket connections or HTTP requests. Because the client and the server communicate over a socket connection, and network bandwidth and request latency may add overhead in between, the JanusGraph server is used as a management entry point and does not handle large-batch data transaction operations.

In addition, JanusGraph can be embedded into a general service program, so that a user can, through an application program, interact with JanusGraph directly using the Gremlin query language within the same Java virtual machine as JanusGraph. Fig. 7 shows a schematic diagram of a data architecture in the embodiment of the invention: multi-user and multi-graph configurations can be isolated by maintaining common configuration externally; transaction processing capability can be increased linearly by adding JanusGraph instances using the capabilities of a cloud platform; and a unified external service interface is provided through encapsulation, so that multiple application programs can call the service directly to complete data processing.

Specifically, a user can initiate a data query request through an application program of a user terminal. After receiving the data query request, JanusGraph parses it to obtain a data identifier, looks up the target index mapping relationship corresponding to the data identifier locally, and then determines the target Elasticsearch server. JanusGraph sends a mapping relationship acquisition request, which may include the data identifier, to the target Elasticsearch server. The target Elasticsearch server parses the request to obtain the data identifier, queries its locally stored storage mapping relationship according to the data identifier, determines the target server identifier corresponding to the data identifier, packages the target server identifier into mapping relationship reply information, and returns it to JanusGraph. JanusGraph then sends a data query request to the HBase server corresponding to the target server identifier, receives the target data returned by HBase, and sends the target data to the user terminal, which completes the data query process. Because the servers are deployed in a distributed manner and are independent of one another, the flexibility of data processing is improved, users can conveniently query or process offline the data stored in the database, and the method applies to a variety of data processing scenarios, which broadens the general applicability of data processing.

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, as some steps may be performed in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required by the present invention.

Referring to fig. 8, a block diagram of a data obtaining apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:

a request receiving module 801, configured to receive a data obtaining request;

an index server determining module 802, configured to determine a target index server corresponding to the data obtaining request;

a first identifier obtaining module 803, configured to obtain, from the target index server, a first identifier matching the data obtaining request;

a target data obtaining module 804, configured to obtain target data for the data obtaining request from a target storage server matched with the first identifier.

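In an optional embodiment of the present invention, the data obtaining request includes a target data identifier, and the index server determining module 802 includes:

a second identifier obtaining submodule, configured to obtain a second identifier matching the target data identifier from a preset index mapping relationship, where the index mapping relationship is a correspondence between the data identifier and the index server;

an index server selection submodule, configured to select a target index server matching the second identifier from at least one preset index server.

In an optional embodiment of the present invention, the first identifier obtaining module 803 includes:

a storage mapping relationship obtaining submodule, configured to obtain, from the target index server, a target storage mapping relationship corresponding to the target data identifier;

a first identifier obtaining submodule, configured to determine a first identifier corresponding to the target data identifier by using the target storage mapping relationship.

In an optional embodiment of the present invention, further comprising:

a data sending module, configured to send the target data to a user terminal, where the user terminal is configured to perform data processing on the target data.

In an optional embodiment of the present invention, further comprising:

an original data acquisition module, configured to acquire original data and an original data identifier of the original data;

a first server determination submodule, configured to determine at least one original storage server for storing the original data;

a storage mapping relationship generation module, configured to establish a storage mapping relationship for the original data by using the storage server identifier of the at least one original storage server and the original data;

a second server determination submodule, configured to determine at least one original index server for storing the storage mapping relationship;

an index mapping relationship generation module, configured to establish an index mapping relationship for the original data by using the index server identifier of the at least one original index server and the original data identifier.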

Since the device embodiment is basically similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.

An embodiment of the present invention further provides an electronic device, including:

one or more processors; and

one or more machine-readable media having instructions stored thereon, which when executed by the one or more processors, cause the electronic device to perform methods as described in embodiments of the invention.

Embodiments of the invention also provide one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause the processors to perform the methods described in embodiments of the invention.

The embodiments in the present specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Finally, it should also be noted that relational terms such as "first" and "second" are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or terminal that comprises the element.

The data acquisition method and the data acquisition device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to aid understanding of the method and its core idea. Meanwhile, a person skilled in the art may, following the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A method for acquiring data, comprising:

receiving a data acquisition request;

2. The method of claim 1, wherein the data acquisition request includes a target data identifier, and wherein determining the target index server corresponding to the data acquisition request comprises:

3. The method of claim 2, wherein obtaining, from the target index server, the first identifier matching the data acquisition request comprises:

4. The method of claim 1, further comprising:

5. The method of claim 1, further comprising:

acquiring original data and an original data identifier of the original data;

determining at least one raw storage server for storing the raw data;

6. An apparatus for acquiring data, comprising:

a request receiving module, configured to receive a data acquisition request;

7. The apparatus of claim 6, wherein the data acquisition request comprises a target data identifier, and wherein the index server determination module comprises:

8. The apparatus of claim 7, wherein the first identifier acquisition module comprises:

9. The apparatus of claim 6, further comprising:

10. The apparatus of claim 6, further comprising:

11. An electronic device, comprising:

one or more processors; and

one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the electronic device to perform the method of any of claims 1-5.

12. One or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause the processors to perform the method of any one of claims 1-5.