CN113157742A - Data lake management method and system for intelligent bus - Google Patents

Info

Publication number
CN113157742A
Authority
CN
China
Prior art keywords
data
pool
query
lake
packet
Legal status
Pending
Application number
CN202110457293.6A
Other languages
Chinese (zh)
Inventor
张世强
孙宏飞
钱贵涛
李峰巍
赵岩
Current Assignee
Hualu Zhida Technology Co Ltd
Original Assignee
Hualu Zhida Technology Co Ltd
Application filed by Hualu Zhida Technology Co Ltd
Priority to CN202110457293.6A
Publication of CN113157742A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2455: Query execution
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/2282: Tablespace storage structures; Management thereof

Abstract

The invention discloses a data lake management method and system for intelligent buses. Data produced during management of a bus system is classified, the data lake is divided into a plurality of data pools according to data type, and the classified data is stored in the data pool of the corresponding category. A data set that standardizes the data is established in each data pool. When a user needs to query data, the data set is parsed according to the query request to obtain query conditions, and a query index list is generated based on the query conditions, which facilitates index lookup when the data is retrieved. The invention thereby provides a strategy for applying a data lake, instead of a relational database, to bus management so as to improve the utilization of bus management data.

Description

Data lake management method and system for intelligent bus
Technical Field
The invention relates to the technical field of intelligent bus operation management, in particular to a data lake management method and system for an intelligent bus.
Background
With the continuous development of big data analysis technology, data has become an important asset for public transportation enterprises and organizations. To manage data effectively, most users currently adopt a big data platform. However, existing big data platforms face challenges in storing, effectively managing and centrally managing raw data, and in particular in tracing and retrieving data. A form of data management better suited to intelligent buses is therefore needed, one that provides the storage and computing capacity required to process large-scale data and offers users multi-modal data processing capabilities. In addition, most data lakes in use today are unidirectional: they only store data, and the data in them is neither classified nor integrated, so it cannot be extracted and used.
Disclosure of Invention
The invention provides a data lake management method and a data lake management system for intelligent buses, which aim to overcome the above technical problems.
The invention discloses a data lake management method of an intelligent bus, which comprises the following steps:
acquiring a data packet uploaded by a public transport system, and classifying data in the data packet into different data types; the data packet is a data set generated in the management process of the public transportation system; the data types in the data packet comprise: structured data, semi-structured data, and unstructured data;
dividing the data lake into different data pools according to different data types, and storing the data in the data packet into the corresponding data pools according to different data types;
establishing a data set in a data pool according to the data in the data packet; the data set, comprising: target data, pool metadata, a meta-processing procedure, data transformation standards, pool descriptions and pool targets;
after a user initiates a query request, analyzing the data set according to the query request to obtain a query condition, and generating a query index list based on the query condition;
and judging whether matched data exist or not based on the query index list, if so, packaging the matched data and sending the packaged matched data to a user, and otherwise, feeding back query failure to the user.
Further, the dividing the data lake into different data pools according to different data types includes: dividing a data lake into a structured data pool, a semi-structured data pool and an unstructured data pool; the structured data pool is used for storing bus basic data, bus configuration data, driving area region data and user personal information data; the semi-structured data pool is used for storing HTML page files and log files with file formats of CSV, XML and JSON; the unstructured data pool is used for storing e-mails, documents, graphics, audios and videos, and message and instruction data in the public transport office system.
Further, the storing the data in the data packet into the corresponding data pool according to the different data types includes:
splitting the data packet, wherein the splitting principle is that the data packet is split into at least one sub data packet based on the data type;
carrying out type attribute information identification on the split sub-data packets one by one, and forming a plurality of primary data storage forms after adding time authentication information;
setting a plurality of storage position forms stored in corresponding data pool positions;
acquiring and storing a storage position mapping table of each primary data storage form; the storage location mapping table is used for representing the storage location of the primary data storage form on the storage location form.
Further, the target data is data which is stored in the data pool and can be really analyzed and used; the pool metadata is data describing physical characteristics of data in the data pool; the meta-processing process is a file that illustrates the steps of converting raw data in a data pool into usable standardized data;
converting the original data in the data pool into a file of available standardized data by formula (1);
[Formula (1): equation image not available in this text]
in the formula, I is the input original data, and a is the file of available standardized data; n is the number of times the data is processed, with n = 3; [symbol: image not available in this text] denotes processing the data for the t-th time using a zipper algorithm; W is a linear regression matrix; ω is the weight; and f(I) denotes converting the data by convolution;
the data conversion standard is a file which indicates a standard to be followed when converting the original data; the pool description includes: external and internal descriptions of the data pool; the pool target is a file representing the direction of application of the data.
Further, the target data is found in the data lake by machine learning and concept search, and once the target data is removed, what remains is data with unclear standards;
finding the target data in the data lake by formula (2);
[Formula (2): equation image not available in this text]
in the formula, t is the target data, and f is the data lake; m is the total amount of data in the data lake; l(·) denotes the features extracted by the convolutional network; f(·) denotes serialization of the data; po is the likelihood that the data is target data, and a po value larger than a preset threshold means that the data is target data.
Further, the determining whether there is matched data based on the query index list includes:
judging whether matched data exists by formula (3);
[Formula (3): equation image not available in this text]
in the formula, x is the data set, and y is the query index list; V_x and V_y are confidence coefficients with values in the range [0, 1]; l(·) denotes the features extracted by the convolutional network; f(·) denotes serialization of the data; dis is the degree of match, and the match is counted as successful when dis is larger than a set threshold.
Further, after the user initiates a query request, parsing the data set according to the query request to obtain a query condition, and generating a query index list based on the query condition, including:
a user initiates a query request;
resolving the query request into a plurality of fields to form a plurality of query conditions; generating a query index list based on the query condition; the query index list at least comprises the analyzed field and the matched type attribute information corresponding to the field; the type attribute information is obtained based on a fuzzy matching algorithm.
Further, establishing a data pool to be processed; and storing the data with unclear standards into the data pool to be processed, and using the data after the data is standardized again.
The invention further provides a data lake management system for intelligent buses, comprising:
the system comprises a data packet processing unit, a data pool processing unit and a data query unit;
the data packet processing unit is used for acquiring data packets uploaded by a public transport system and classifying data in the data packets into different data types; the data packet is a data set generated in the management process of the public transportation system; the data types in the data packet comprise: structured data, semi-structured data, and unstructured data;
the data pool processing unit is used for dividing the data lake into different data pools according to different data types and storing the data in the data packet into the corresponding data pools according to different data types; establishing a data set in a data pool according to the data in the data packet; the data set, comprising: target data, pool metadata, a meta-processing procedure, data transformation standards, pool descriptions and pool targets;
the data query unit is used for analyzing the data set according to the query request to obtain a query condition after a user initiates the query request, and generating a query index list based on the query condition; and judging whether matched data exist or not based on the query index list, if so, packaging the matched data and sending the packaged matched data to a user, and otherwise, feeding back query failure to the user.
The invention classifies the data generated during management of the public transportation system, divides the data lake into a plurality of data pools according to data type, and stores the classified data in the data pool of the corresponding category. A data set that standardizes the data is then established in each data pool. When a user needs to query data, the data set is parsed according to the query request to obtain query conditions, and a query index list is generated based on the query conditions, which facilitates index lookup when the data is retrieved. The invention thereby provides a strategy for applying a data lake, instead of a relational database, to public transportation management: data islands are eliminated, data standards are unified, data change is accelerated, and the utilization of public transportation management data is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of the data lake management method for intelligent buses;
FIG. 2 is a schematic structural diagram of the data lake management system for intelligent buses.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, the present embodiment provides a data lake management method for intelligent buses, including:
101. acquiring a data packet uploaded by a public transport system, and classifying data in the data packet into different data types; the data packet is a data set generated in the management process of the public transportation system; types of data in the data packet, including: structured data, semi-structured data, and unstructured data;
Specifically, the data lake can be built using Blu-ray storage (optical-magnetic fusion storage) technology or a cloud platform, and a public transportation system data lake operation platform is built on it for applications such as data integration, data cleaning, data management and intelligent services. Currently, a common means of implementing a data lake is Hadoop. The evolved Hadoop data management architecture relies on the Apache Falcon data management platform, which links data sets with programs, operation rules, displays and history records to achieve the intended use of the data lake. The data uploaded by the public transportation system includes structured, semi-structured and unstructured data, all of which is stored in the data lake as its water source.
Structured data is data that can be represented with a uniform structure. It can generally be expressed logically as a two-dimensional table; the data stored in relational databases in the public transportation system is structured data. Semi-structured data lies between strictly defined structured data and completely unstructured data, and mainly comprises HTML page files and log files in CSV, XML and JSON formats. Unstructured data is data that cannot conveniently be represented in a two-dimensional database table, including office documents in all formats, text, pictures, XML (a subset of Standard Generalized Markup Language), reports, images, and audio/video information.
102. Dividing the data lake into different data pools according to different data types, and storing the data in the data packet into corresponding data pools according to different data types;
Specifically, if the data in the data lake is not classified or integrated, it cannot be extracted and used. The strategy adopted by the method is to divide the data lake into a structured data pool, a semi-structured data pool and an unstructured data pool. The data pools in the data lake are closely connected: once a piece of data enters the data lake, it is assigned to a data pool according to its data type, so that each pool stores one type of data, and relationships are established among the different types of data to share information. The structured data pool is used for storing bus basic data, bus configuration data, driving area region data and user personal information data. The bus basic data mainly comprises groups of basic data whose values do not change, such as vehicle numbers, line names, line numbers, and the IP addresses and ports of vehicle-mounted terminals. The bus configuration data mainly comprises information such as the configuration parameters of the vehicle-mounted terminal system (IP address and port) and engine parameters. The driving area region data mainly comprises the bus stops on each line and their longitude and latitude. The user personal information mainly comprises driver information, service personnel information and other staff information.
The semi-structured data pool is used for storing HTML page files and log files, that is, data obtained through application programming interfaces (APIs), such as the running logs and scheduling logs of the vehicle-mounted terminal system; the file formats can be CSV (comma-separated values), XML (Extensible Markup Language) and JSON (JavaScript Object Notation).
The unstructured data pool is used for storing messages and instructions issued in the public transportation office system, such as e-mails, documents and PDFs, as well as graphics, audio and video collected on bus operating status, such as images and videos of passengers in the carriage and of road conditions.
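The assignment of incoming items to the three pools can be illustrated with a minimal Python sketch. It is not the patented classifier: the extension sets, the function name `choose_pool`, and the rule that unknown file types fall back to the unstructured pool are all assumptions made for illustration.

```python
from pathlib import PurePath

# Hypothetical extension sets, loosely following the pool contents listed
# in the description; a real system would likely inspect content as well.
SEMI_STRUCTURED_EXT = {".html", ".htm", ".csv", ".xml", ".json", ".log"}
UNSTRUCTURED_EXT = {".eml", ".doc", ".docx", ".pdf", ".png", ".jpg", ".mp3", ".mp4"}


def choose_pool(item):
    """Return the name of the data pool for one uploaded item.

    `item` is either a dict (a row taken from a relational table)
    or a file name given as a string.
    """
    if isinstance(item, dict):
        return "structured"
    suffix = PurePath(item).suffix.lower()
    if suffix in SEMI_STRUCTURED_EXT:
        return "semi_structured"
    # Assumption: anything that is not a known semi-structured format
    # (including the known unstructured extensions) goes to the
    # unstructured pool.
    return "unstructured"


if __name__ == "__main__":
    for sample in ({"vehicle_no": "B-1024"}, "dispatch.log", "cabin_12.mp4"):
        print(sample, "->", choose_pool(sample))
```

Classifying by file extension keeps the sketch short; it says nothing about how the public transportation system actually labels its uploads.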
The steps for storing the data in the data packet into the data lake are as follows:
1. splitting the data packet, wherein the splitting principle is that the data packet is split into at least one sub data packet based on the data type;
2. carrying out type attribute information identification on the split sub-data packets one by one, and forming a plurality of primary data storage forms after adding time authentication information;
3. setting a plurality of storage position forms stored in corresponding data pool positions;
4. acquiring and storing a storage position mapping table of each primary data storage form; the storage location mapping table is used for representing the storage location of the primary data storage form on the storage location form.
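The four storage steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a packet is modelled as a list of `(data_type, payload)` tuples, each pool's "storage position form" is a plain list, and the record-id scheme and the name `store_packet` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def store_packet(packet, pools, location_map, now=None):
    """Split a packet by data type and store each sub-packet in its pool.

    packet:       list of (data_type, payload) tuples
    pools:        dict mapping data_type -> list (stand-in for a storage position form)
    location_map: dict mapping record id -> (pool name, position in that pool)
    """
    now = now or datetime.now(timezone.utc)

    # Step 1: split the packet into sub-packets, one per data type.
    sub_packets = defaultdict(list)
    for data_type, payload in packet:
        sub_packets[data_type].append(payload)

    for data_type, payloads in sub_packets.items():
        for payload in payloads:
            # Step 2: attach the type attribute and a timestamp,
            # giving one "primary data storage form" per payload.
            record = {
                "type": data_type,
                "stored_at": now.isoformat(),
                "payload": payload,
            }
            # Step 3: place the record at the next position in the matching pool.
            slots = pools.setdefault(data_type, [])
            slots.append(record)
            # Step 4: record where it went in the storage location mapping table.
            position = len(slots) - 1
            location_map[f"{data_type}-{position}"] = (data_type, position)
    return location_map
```

With empty `pools` and `location_map` dictionaries, storing two structured rows and one video file yields the ids `structured-0`, `structured-1` and `unstructured-0`, each mapped to its pool and position.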
103. Establishing a data set in a data pool according to data in the data packet; a data set comprising: target data, pool metadata, a meta-processing procedure, data transformation standards, pool descriptions and pool targets;
Specifically, target data can be found in the data lake by machine learning and concept search; once the target data is removed, what remains is data with unclear standards. There are many ways to find the data: for example, first identify the limiting factors of the data, then check the data tags, and finally retrieve the data in bulk.
The target data is data stored in the data pool that can actually be analyzed and used, and it can be used directly without processing. The pool metadata is data describing the physical characteristics of the data in the data pool. The meta-processing procedure is a file describing the steps for converting raw data in the data pool into usable standardized data. The data conversion standard is a file describing the standard to be followed when converting the raw data. The pool description includes an external description and an internal description of the data pool: the external description comprises the function and size of the data pool, and the internal description comprises the source, volume, update frequency, extraction, conversion and standard of the data in the data pool, and the relationships among the data. The pool target is a file representing the direction in which the data is to be applied. Using the pool metadata, the meta-processing procedure, the data conversion standard, the pool description and the pool target, the public transportation system data lake operation platform converts non-target data into usable target data through its data cleaning function and stores it in a uniform standard format.
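As a rough illustration of how the six components of a pool's data set might be held together, the following Python sketch models them as a dataclass and applies the meta-processing steps in order. The class name `PoolDataSet`, the field types, and the `clean` helper are assumptions for illustration; the sketch does not implement formula (1).

```python
from dataclasses import dataclass, field


@dataclass
class PoolDataSet:
    """The six components the description lists for a pool's data set."""
    target_data: list = field(default_factory=list)          # directly usable records
    pool_metadata: dict = field(default_factory=dict)        # physical characteristics
    meta_process: list = field(default_factory=list)         # ordered conversion steps
    conversion_standard: dict = field(default_factory=dict)  # rules the conversion follows
    pool_description: dict = field(default_factory=dict)     # external and internal description
    pool_target: str = ""                                    # intended application of the data


def clean(raw_records, data_set):
    """Run every meta-processing step on each raw record, in order,
    and keep the converted records as target data."""
    for record in raw_records:
        for step in data_set.meta_process:
            record = step(record)
        data_set.target_data.append(record)
    return data_set.target_data
```

For example, with `meta_process=[str.strip, str.upper]`, the raw string `"  line 101 "` is stored as `"LINE 101"`.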
Converting the original data in the data pool into a file of available standardized data by formula (1);
[Formula (1): equation image not available in this text]
in the formula, I is the input original data, and a is the file of available standardized data; n is the number of times the data is processed, with n = 3; [symbol: image not available in this text] denotes processing the data for the t-th time using a zipper algorithm; W is a linear regression matrix; ω is the weight; and f(I) denotes converting the data by convolution;
finding the target data in the data lake by formula (2);
[Formula (2): equation image not available in this text]
in the formula, t is the target data, and f is the data lake; m is the total amount of data in the data lake; l(·) denotes the features extracted by the convolutional network; f(·) denotes serialization of the data; po is the likelihood that the data is target data, and a po value larger than a preset threshold means that the data is target data.
In addition, even after cleaning, much of the data still cannot be used because its standards are unclear, and such data cannot be recovered once it is discarded. A pending data pool can therefore be established to store the data with unclear standards, so that the data can be used after it has been standardized again.
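The split between target data and the pending data pool can be sketched as a threshold test on a likelihood score. In the Python sketch below, the scoring function `completeness` is a deliberately simple stand-in (the fraction of required fields present) and is not the convolution-based score of formula (2); the field names and the threshold of 0.5 are hypothetical.

```python
REQUIRED_FIELDS = ("vehicle_no", "line_no", "timestamp")  # hypothetical field names


def completeness(record):
    """Stand-in likelihood score: fraction of required fields that are filled in."""
    present = sum(1 for name in REQUIRED_FIELDS if record.get(name) not in (None, ""))
    return present / len(REQUIRED_FIELDS)


def route_by_score(records, score, threshold=0.5):
    """Send records scoring above the threshold to target data,
    and everything else to the pending data pool."""
    target, pending = [], []
    for record in records:
        if score(record) > threshold:
            target.append(record)
        else:
            pending.append(record)
    return target, pending
```

Records left in `pending` are kept rather than discarded, so they can be scored again after they have been re-standardized.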
104. After a user initiates a query request, analyzing the data set according to the query request to obtain a query condition, and generating a query index list based on the query condition;
specifically, a user initiates a query request, and then the query request is analyzed into a plurality of fields to form a plurality of query conditions; generating a query index list based on the query condition; the query index list at least comprises the analyzed field and the matched type attribute information corresponding to the field; the type attribute information is obtained based on a fuzzy matching algorithm.
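A minimal Python sketch of building a query index list is given below. It assumes a query written as `field=value` pairs separated by semicolons and a small hypothetical catalogue `FIELD_TYPES` that maps known field names to type attributes. Fuzzy matching is done with the standard library's `difflib.get_close_matches`, which is one possible choice; the description does not say which fuzzy matching algorithm is used.

```python
import difflib

# Hypothetical catalogue: known field name -> type attribute (the pool that holds it).
FIELD_TYPES = {
    "vehicle_number": "structured",
    "line_name": "structured",
    "dispatch_log": "semi_structured",
    "cabin_video": "unstructured",
}


def build_query_index(query):
    """Parse 'field=value;field=value' into a list of index entries."""
    index = []
    for part in query.split(";"):
        if "=" not in part:
            continue  # skip fragments that are not field=value pairs
        name, value = (s.strip() for s in part.split("=", 1))
        # Fuzzy-match the user's field name against the known field names.
        close = difflib.get_close_matches(name.lower(), list(FIELD_TYPES), n=1, cutoff=0.6)
        matched = close[0] if close else None
        index.append({
            "field": name,
            "value": value,
            "matched_field": matched,
            "type_attribute": FIELD_TYPES.get(matched),
        })
    return index
```

A misspelled field such as `vehicle_numbr` is still resolved to `vehicle_number` and its `structured` type attribute, while a field with no close match gets `None` for both.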
105. Judging whether matched data exists based on the query index list; if so, packaging the matched data and sending it to the user; otherwise, feeding back a query failure to the user.
Specifically, whether matched data exists is judged by formula (3);
[Formula (3): equation image not available in this text]
in the formula, x is the data set, and y is the query index list; V_x and V_y are confidence coefficients with values in the range [0, 1]; l(·) denotes the features extracted by the convolutional network; f(·) denotes serialization of the data; dis is the degree of match, and the match is counted as successful when dis is larger than a set threshold.
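The decision in step 105 (return packaged matches, or report a query failure) can be sketched in Python as follows. The scoring function `fraction_matched` is a stand-in for the degree of match and is not formula (3); the entry layout (`field`, `value`), the JSON packaging, and the threshold of 0.8 are assumptions made for illustration.

```python
import json


def fraction_matched(record, index_entries):
    """Stand-in degree of match: share of query entries the record satisfies."""
    if not index_entries:
        return 0.0
    hits = sum(
        1 for entry in index_entries
        if str(record.get(entry["field"])) == entry["value"]
    )
    return hits / len(index_entries)


def answer_query(index_entries, records, score, threshold=0.8):
    """Package the records whose score exceeds the threshold,
    or report a query failure if there are none."""
    matches = [r for r in records if score(r, index_entries) > threshold]
    if not matches:
        return {"status": "failed", "reason": "no matching data"}
    return {"status": "ok", "payload": json.dumps(matches, sort_keys=True)}
```

Passing the score function as a parameter keeps the threshold logic separate from how the degree of match is computed, so a different scoring method can be substituted without changing `answer_query`.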
In addition, the present invention describes only the query method of the client and does not describe management-side operations such as adding, deleting, querying and modifying data, because management-side operations are not the focus of the present invention; they can, however, be studied as an extension of it.
As shown in FIG. 2, this embodiment provides a data lake management system for intelligent buses, including:
the system comprises a data packet processing unit, a data pool processing unit and a data query unit;
the data packet processing unit is used for acquiring data packets uploaded by the public transportation system and classifying data in the data packets into different data types; the data packet is a data set generated in the management process of the public transportation system; types of data in the data packet, including: structured data, semi-structured data, and unstructured data;
the data pool processing unit is used for dividing the data lake into different data pools according to different data types and storing the data in the data packet into the corresponding data pools according to different data types; establishing a data set in the data pool according to the data in the data packet; a data set comprising: target data, pool metadata, a meta-processing procedure, data transformation standards, pool descriptions and pool targets;
the data query unit is used for analyzing the data set according to the query request to obtain a query condition after a user initiates the query request, and generating a query index list based on the query condition; and judging whether matched data exist or not based on the query index list, if so, packaging the matched data and sending the packaged matched data to the user, and otherwise, feeding back the query failure to the user.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A data lake management method of an intelligent bus is characterized by comprising the following steps:
acquiring a data packet uploaded by a public transport system, and classifying data in the data packet into different data types; the data packet is a data set generated in the management process of the public transportation system; the data types in the data packet comprise: structured data, semi-structured data, and unstructured data;
dividing the data lake into different data pools according to different data types, and storing the data in the data packet into the corresponding data pools according to different data types;
establishing a data set in a data pool according to the data in the data packet; the data set, comprising: target data, pool metadata, a meta-processing procedure, data transformation standards, pool descriptions and pool targets;
after a user initiates a query request, analyzing the data set according to the query request to obtain a query condition, and generating a query index list based on the query condition;
and judging whether matched data exist or not based on the query index list, if so, packaging the matched data and sending the packaged matched data to a user, and otherwise, feeding back query failure to the user.
2. The method for managing the data lake of the intelligent bus according to claim 1, wherein the dividing the data lake into different data pools according to different data types comprises:
dividing a data lake into a structured data pool, a semi-structured data pool and an unstructured data pool;
the structured data pool is used for storing bus basic data, bus configuration data, driving area region data and user personal information data;
the semi-structured data pool is used for storing HTML page files and log files with file formats of CSV, XML and JSON;
the unstructured data pool is used for storing e-mails, documents, graphics, audios and videos, and message and instruction data in the public transport office system.
3. The method for managing the data lake of the intelligent bus according to claim 2, wherein the step of storing the data in the data packet into the corresponding data pool according to different data types comprises:
splitting the data packet, wherein the splitting principle is that the data packet is split into at least one sub data packet based on the data type;
carrying out type attribute information identification on the split sub-data packets one by one, and forming a plurality of primary data storage forms after adding time authentication information;
setting a plurality of storage position forms stored in corresponding data pool positions;
acquiring and storing a storage position mapping table of each primary data storage form; the storage location mapping table is used for representing the storage location of the primary data storage form on the storage location form.
4. The data lake management method of the intelligent bus according to claim 3,
the target data is data which is stored in the data pool and can be really analyzed and used;
the pool metadata is data describing physical characteristics of data in the data pool;
the meta-processing process is a file that illustrates the steps of converting raw data in a data pool into usable standardized data;
converting the original data in the data pool into a file of available standardized data by formula (1);
[Formula (1): equation image not available in this text]
in the formula, I is the input original data, and a is the file of available standardized data; n is the number of times the data is processed, with n = 3; [symbol: image not available in this text] denotes processing the data for the t-th time using a zipper algorithm; W is a linear regression matrix; ω is the weight; and f(I) denotes converting the data by convolution;
the data conversion standard is a file which indicates a standard to be followed when converting the original data;
the pool description includes: external and internal descriptions of the data pool;
the pool target is a file representing the direction of application of the data.
5. The data lake management method of the intelligent bus according to claim 4, wherein the target data is found in the data lake by machine learning and concept search, and data with unclear standards is obtained after the target data is removed;
finding the target data in the data lake by formula (2);
[Formula (2): equation image not available in this text]
in the formula, t is the target data, and f is the data lake; m is the total amount of data in the data lake; l(·) denotes the features extracted by the convolutional network; f(·) denotes serialization of the data; po is the likelihood that the data is target data, and a po value larger than a preset threshold means that the data is target data.
6. The method as claimed in claim 5, wherein the step of determining whether there is matched data based on the query index list comprises:
judging whether matched data exists by formula (3);
[Formula (3): equation image not available in this text]
in the formula, x is the data set, and y is the query index list; V_x and V_y are confidence coefficients with values in the range [0, 1]; l(·) denotes the features extracted by the convolutional network; f(·) denotes serialization of the data; dis is the degree of match, and the match is counted as successful when dis is larger than a set threshold.
7. The method for managing the data lake of the intelligent bus according to claim 1, wherein after the user initiates a query request, the data set is analyzed according to the query request to obtain a query condition, and a query index list is generated based on the query condition, comprising:
a user initiates a query request;
parsing the query request into a plurality of fields to form a plurality of query conditions; generating a query index list based on the query conditions; the query index list comprises at least the parsed fields and the matched type attribute information corresponding to each field; the type attribute information is obtained based on a fuzzy matching algorithm.
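The parsing step of claim 7 can be sketched as follows. The claim does not name a fuzzy matching algorithm, so this sketch uses `difflib.get_close_matches` from the Python standard library as a stand-in. The `field=value;field=value` request syntax, the `KNOWN_ATTRIBUTES` table and the function name `build_query_index_list` are hypothetical.

```python
import difflib
from typing import Dict, List, Optional

# Hypothetical catalogue of attribute names and their type information.
KNOWN_ATTRIBUTES: Dict[str, str] = {
    "route_number": "integer",
    "departure_date": "date",
    "driver_name": "string",
}


def build_query_index_list(request: str) -> List[Dict[str, Optional[str]]]:
    """Parse a request into fields and attach fuzzily matched type info."""
    index_list: List[Dict[str, Optional[str]]] = []
    for part in request.split(";"):
        if "=" not in part:
            continue  # skip fragments that are not field=value pairs
        field, value = (s.strip() for s in part.split("=", 1))
        # Fuzzy match the parsed field name against the known attributes.
        close = difflib.get_close_matches(
            field, list(KNOWN_ATTRIBUTES), n=1, cutoff=0.6
        )
        matched = close[0] if close else None
        index_list.append(
            {
                "field": field,
                "value": value,
                "matched_attribute": matched,
                "type": KNOWN_ATTRIBUTES[matched] if matched else None,
            }
        )
    return index_list


index_list = build_query_index_list("route_num=12; depart_date=2021-04-27")
```

Each entry holds the parsed field together with the type attribute information of its closest known attribute, or `None` when nothing is close enough.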
8. The data lake management method of the intelligent bus according to claim 5, wherein a to-be-processed data pool is established; and storing the data with unclear standards into the data pool to be processed, and using the data after the data is standardized again.
9. A data lake management system for an intelligent bus, characterized by comprising:
a data packet processing unit, a data pool processing unit and a data query unit;
the data packet processing unit is used for acquiring data packets uploaded by a public transport system and classifying data in the data packets into different data types; the data packet is a data set generated in the management process of the public transportation system; the data types in the data packet comprise: structured data, semi-structured data, and unstructured data;
the data pool processing unit is used for dividing the data lake into different data pools according to different data types and storing the data in the data packet into the corresponding data pools according to different data types; establishing a data set in a data pool according to the data in the data packet; the data set comprises: target data, pool metadata, a meta-processing procedure, a data conversion standard, a pool description and a pool target;
the data query unit is used for analyzing the data set according to the query request to obtain a query condition after a user initiates the query request, and generating a query index list based on the query condition; and judging whether matched data exist or not based on the query index list, if so, packaging the matched data and sending the packaged matched data to a user, and otherwise, feeding back query failure to the user.
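The three units of claim 9 can be sketched as one small class. The classification heuristics (a `dict` counts as structured, a JSON string as semi-structured, anything else as unstructured), the class name `DataLake` and the response format are assumptions for illustration. The claim does not specify how data types are detected or how results are packaged.

```python
import json
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, Iterable, List


def classify(item: Any) -> str:
    """Assumed heuristic for assigning one of the three claimed data types."""
    if isinstance(item, dict):
        return "structured"
    if isinstance(item, str):
        try:
            json.loads(item)
            return "semi-structured"
        except json.JSONDecodeError:
            return "unstructured"
    return "unstructured"


class DataLake:
    def __init__(self) -> None:
        # One data pool per data type.
        self.pools: DefaultDict[str, List[Any]] = defaultdict(list)

    def ingest(self, packet: Iterable[Any]) -> None:
        """Packet processing and pool processing: classify, then store."""
        for item in packet:
            self.pools[classify(item)].append(item)

    def query(self, predicate: Callable[[Any], bool]) -> Dict[str, Any]:
        """Query unit: return packaged matches, or report query failure."""
        matches = [
            item
            for pool in self.pools.values()
            for item in pool
            if predicate(item)
        ]
        if matches:
            return {"status": "ok", "data": matches}
        return {"status": "query failed"}


lake = DataLake()
lake.ingest([{"route": 12}, '{"stop": "A"}', b"\x00\x01"])
result = lake.query(lambda i: isinstance(i, dict) and i.get("route") == 12)
```

Keeping one pool per data type means a query only ever sees data that has already been classified.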
CN202110457293.6A 2021-04-27 2021-04-27 Data lake management method and system for intelligent bus Pending CN113157742A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110457293.6A CN113157742A (en) 2021-04-27 2021-04-27 Data lake management method and system for intelligent bus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110457293.6A CN113157742A (en) 2021-04-27 2021-04-27 Data lake management method and system for intelligent bus

Publications (1)

Publication Number Publication Date
CN113157742A true CN113157742A (en) 2021-07-23

Family

ID=76871135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110457293.6A Pending CN113157742A (en) 2021-04-27 2021-04-27 Data lake management method and system for intelligent bus

Country Status (1)

Country Link
CN (1) CN113157742A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115809249A (en) * 2023-02-03 2023-03-17 杭州比智科技有限公司 Data lake management method and system based on proprietary data set

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109298840A (en) * 2018-11-19 2019-02-01 平安科技(深圳)有限公司 Data integrating method, server and storage medium based on data lake
CN111666263A (en) * 2020-05-12 2020-09-15 埃睿迪信息技术(北京)有限公司 Method for realizing heterogeneous data management in data lake environment
CN112052259A (en) * 2020-09-28 2020-12-08 深圳前海微众银行股份有限公司 Data processing method, device, equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210723
