WO2021068549A1 - Data processing method, platform and system - Google Patents

Info

Publication number
WO2021068549A1
WO2021068549A1 PCT/CN2020/096999 CN2020096999W WO2021068549A1 WO 2021068549 A1 WO2021068549 A1 WO 2021068549A1 CN 2020096999 W CN2020096999 W CN 2020096999W WO 2021068549 A1 WO2021068549 A1 WO 2021068549A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
query
index
relational database
database
Prior art date
Application number
PCT/CN2020/096999
Other languages
French (fr)
Chinese (zh)
Inventor
万鹏程
吕勇
李春生
贾洪园
Original Assignee
苏宁易购集团股份有限公司
苏宁云计算有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏宁易购集团股份有限公司, 苏宁云计算有限公司
Priority to CA3154438A (CA3154438A1)
Publication of WO2021068549A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]

Definitions

  • This application relates to the field of business data calculation and query, and in particular to a data processing method, platform and system.
  • This application provides a data processing method, platform, and system to solve the problem of low efficiency in calculating and querying product content data in the prior art.
  • a data processing method includes:
  • index data is created according to the original product content data and stored in an index database;
  • the index data includes keyword fields and query dimension identification data corresponding to each keyword field;
  • the method further includes:
  • At least part of the calculation result data is associated with the query dimension identification data and stored in the index database.
  • the calculation result data obtained by invoking the calculation program to calculate the original product content data includes:
  • the storing the calculation result data in association with the query dimension identification data in the first relational database includes:
  • the storing at least part of the calculation result data in association with the query dimension identification data in the index database includes:
  • the total quality score of each commodity is associated with the query dimension identification data and stored in the index database.
  • the query dimension identification data is a product code and/or a merchant code.
  • the method further includes:
  • the receiving the original product content data and storing it in a second relational database in clusters, databases and tables includes:
  • the original product content data is received and stored in the second relational database, split into clusters, databases and tables according to the product code.
  • the first relational database is HBase;
  • the second relational database is MySQL;
  • the calculation program is Spark;
  • the index database is Elasticsearch.
  • Another aspect of the present application also provides a data processing platform, which includes a data storage layer and a data calculation layer;
  • the data storage layer is used to store the original product content data in clusters, databases and tables in a first relational database, and build index data based on the original product content data and store it in the index database;
  • the index data includes keyword fields and the query dimension identification data corresponding to each keyword field;
  • the data calculation layer is used to call a calculation program to calculate the original product content data to obtain calculation result data and store the calculation result data in association with the query dimension identification data in the first relational database.
  • a computer system including:
  • one or more processors; and
  • a memory associated with the one or more processors where the memory is used to store program instructions, and when the program instructions are read and executed by the one or more processors, perform the following operations:
  • index data is created according to the original product content data and stored in an index database;
  • the index data includes keyword fields and query dimension identification data corresponding to each keyword field;
  • a calculation program is called to calculate the original product content data to obtain calculation result data, and the calculation result data is associated with the query dimension identification data and stored in the first relational database.
  • the technical solution of this application improves calculation efficiency by storing the original product data in a relational database in clusters, databases, and tables and calling a calculation program on it. It also establishes an index database according to the query dimensions, so that subsequent queries go through the index first, which improves query efficiency.
  • this solution can quickly provide multi-dimensional query of calculation result data, and avoids the problem of low efficiency caused by direct query in a relational database.
  • FIG. 1 is a structural diagram of a data processing platform provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a cluster database provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of original product content data synchronization provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of the product content quality score query provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of a data processing method provided by an embodiment of the present application.
  • FIG. 6 is an architecture diagram of a computer system provided by an embodiment of the present application.
  • This application aims to provide a method for processing product content data.
  • the original product content data is stored in a relational database in clusters, databases, and tables, and the calculation program is then called to perform parallel calculations on the data in the relational database to improve calculation efficiency. Index data is established according to the required query dimensions, so that a subsequent query first matches the identification data in the index database and then looks up the relational database.
  • This solution can quickly provide multi-dimensional queries of the calculation result data and improve the query efficiency.
  • the data processing platform of one embodiment of this application includes a MySQL database, an HBase database, a Spark calculation program, the search engine Elasticsearch, the remote service framework RSF, and the merchants that perform queries.
  • the MySQL database is used to receive the original product content data, and it stores the large amount of original product content data split into clusters, sub-databases and sub-tables. Specifically, the data can be split into clusters, databases and tables according to the product code; the specific operations are described in detail later.
  • the HBase database is synchronized from the data in the MySQL database. Specifically, synchronization can be accomplished through the data replication system and the data exchange platform. After synchronization, the HBase database stores the original product content data in clusters, databases, and tables.
  • the original product content data can also be stored directly in the HBase database without going through the MySQL database.
  • going through the MySQL database serves two purposes: one is the stability of data backup, and the other is that other business processes rely on the MySQL database for their operations.
  • the index database Elasticsearch stores keyword fields for query, such as product brands, and identification data corresponding to the keyword fields, such as product codes. Based on the index, the query keywords entered by the user (merchant) can be matched to the corresponding product code.
  • the Spark calculation program is used to perform MapReduce (a programming model for parallel operations on large-scale data sets, larger than 1 TB) on the original product content data of each cluster, according to the expression rules and by number segment of the product code, to obtain the calculation result, such as the quality score of the product content.
  • the calculation result and identification data such as the product code are stored in the Hbase database.
  • when a user enters a query keyword, RSF first searches the index to determine the matching identification data, such as a product code, and then retrieves the calculation result data from the HBase database according to the product code.
  • the establishment of the aforementioned index can be independent of the establishment of the calculation process.
  • at least a part of the calculation result can also be stored in the index database.
  • the processes involved are introduced in detail below: storing the original product content data in clusters, databases and tables; synchronizing the product content data; calculating the product content quality score; synchronizing the product content quality score; building the index; and querying the product content quality score:
  • the original product content data is stored in clusters, databases, and tables:
  • the original product content data is stored in 4 clusters of MySQL according to the number segment of the product code. Within each cluster it is stored in 10 sub-databases according to the result of a modulo-10 operation on the last two digits of the product code, and within each sub-database it is stored in 10 sub-tables according to the result of a modulo-10 operation on the last digit of the product code, so that more than one billion product content records are scattered into hundreds of sub-tables.
  • FIG. 2 shows a schematic diagram of the clusters and sub-databases.
  • the product data of the segment from 000000000000000000 to 000000000500000000 is stored in cluster 1; the product data of the segment from 000000000500000001 to 000000001000000000 is stored in cluster 2; the product data of the segment from 000000001000000001 to 000000001500000000 is stored in cluster 3; the product data of the segment from 000000001500000001 to 000000002000000000 is stored in cluster 4.
  • for example, the product code 000000001500000023 belongs to cluster 4, sub-database 3 and sub-table 4.
  • RDRS RealTime Data Replication System
  • IDE data exchange platform
  • the RDRS platform synchronizes product content data to HBase by analyzing the binlog information of the Mysql database cluster in quasi-real time.
  • the data exchange platform synchronizes the incremental data of product content information to HBase every day, and compares and corrects it with the quasi-real-time HBase product content data.
  • the data exchange platform synchronizes the full amount of product data to HBase every week, and compares and corrects it with the current HBase product content data.
  • the quality of product content is mainly affected by seven content dimensions: basic information, parameter information, category information, main image information, title information, selling point information, and detailed information.
  • the Spark program calculates, in parallel for each sub-database and based on the expression rules, the basic information, parameter information, category information, main image information, title information, selling point information, and detailed information scores of all products in that sub-database; finally, the dimension scores are summarized and written into Hive (a data warehouse tool of Hadoop). Specifically:
  • MapReduce is used to calculate the basic information, parameter information, category information, main image information, title information, selling point information, and detailed information scores of all products, sub-database by sub-database. Calculating per sub-database mainly reduces data skew, thereby improving calculation efficiency.
  • the corresponding total scores are calculated by summarizing the scores, such as the total score of a certain product or the total score of a certain merchant.
  • other dimensions are also possible.
  • data such as the product content quality score of each dimension, the total product content quality score, and the score aggregated according to the set query dimensions are synchronized to HBase.
  • establish index data according to query dimensions such as product code and merchant code.
  • the index data includes keyword fields and the corresponding query dimension identification data, such as a product brand and the corresponding product code.
  • the index can be established based on the synchronization process of the product content quality score data.
  • while the product content quality score data is calculated and synchronized to HBase, the correspondence between the keyword fields in the original product data and the query dimension identification data is established, and the total score data obtained according to query dimensions such as product code and merchant code is synchronized to the index data.
  • the relevant calculation result data in Elasticsearch and HBase is updated incrementally.
  • RSF Remote Service Framework
  • SQLService query services
  • FIG. 4 is a flowchart of the product content quality score query process, including the following steps:
  • the client sends a query service request for the product quality score;
  • the query server performs expression analysis on the product quality score query service request sent by the client;
  • the query server submits the parsed query request to an Elasticsearch cluster; in this embodiment, Elasticsearch is deployed as a cluster to prevent a single point of failure;
  • the Elasticsearch cluster returns the query result (product code + merchant code) to the query server;
  • the query server submits a query request to the HBase cluster according to the query result returned by the Elasticsearch cluster;
  • the HBase cluster returns the final query result corresponding to the product code and merchant code to the query server;
  • the query server returns the final query result to the client.
  • the aforementioned databases and the Spark calculation program can be replaced by functionally similar modules, and the calculation results can also be data other than the product content quality score, according to user needs.
  • the first embodiment of the present application provides a data processing method, as shown in FIG. 5, including the following steps:
  • S52 Establish index data according to the original product content data and store it in an index database;
  • the index data includes keyword fields and query dimension identification data corresponding to each keyword field;
  • S53 Invoke a calculation program to calculate the original product content data to obtain calculation result data, and store the calculation result data in association with the query dimension identification data in the first relational database.
  • the method also includes:
  • the method may further include:
  • At least part of the calculation result data is associated with the query dimension identification data and stored in the index database.
  • the method further includes: receiving the original product content data and storing it in a second relational database in clusters, databases, and tables; specifically, the data is split into clusters, databases and tables according to the product code;
  • this application also provides a data processing platform, which includes a data storage layer and a data calculation layer;
  • the data storage layer is used to store the original product content data in the first relational database in clusters, databases and tables, and to build index data according to the original product content data and store it in the index database;
  • the index data includes keyword fields and the query dimension identification data corresponding to each keyword field;
  • the data calculation layer is used to call a calculation program to calculate the original product content data to obtain calculation result data and store the calculation result data in association with the query dimension identification data in the first relational database.
  • the data processing platform further includes a data application layer, which is used to receive a user's query request, parse it to obtain the keyword to be queried, query the index database to obtain the query dimension identification data corresponding to the keyword to be queried as a target identification, and query the first relational database to obtain the calculation result data corresponding to the target identification, so as to return the result data to the user.
  • the storage layer is further configured to store at least part of the calculation result data in association with the query dimension identification data in the index database.
  • the storage layer is further configured to receive the original product content data and store it in a second relational database in clusters, databases and tables, and to synchronize the original product content data in the second relational database to the first relational database.
  • Embodiment 3 of the present application provides a computer system, including:
  • one or more processors; and
  • a memory associated with the one or more processors where the memory is used to store program instructions, and when the program instructions are read and executed by the one or more processors, perform the following operations:
  • the original product content data is stored in the first relational database in clusters, databases and tables;
  • index data is created according to the original product content data and stored in an index database;
  • the index data includes keyword fields and query dimension identification data corresponding to each keyword field;
  • the original product content data is calculated by a calculation program to obtain calculation result data, and the calculation result data is associated with the query dimension identification data and stored in the first relational database.
  • FIG. 6 exemplarily shows the architecture of the computer system, which may specifically include a processor 1510, a video display adapter 1511, a disk drive 1512, an input/output interface 1513, a network interface 1514, and a memory 1520.
  • the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520 may be communicatively connected through the communication bus 1530.
  • the processor 1510 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute the relevant programs to realize the technical solutions provided in this application.
  • the memory 1520 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, etc.
  • the memory 1520 may store an operating system 1521 used to control the operation of the computer system 1500, and a basic input output system (BIOS) used to control low-level operations of the computer system 1500.
  • a web browser 1523, a data storage management system 1524, and an icon font processing system 1525 can also be stored.
  • the foregoing icon font processing system 1525 may be an application program that specifically implements the foregoing steps in the embodiment of the present application.
  • the related program code is stored in the memory 1520, and is called and executed by the processor 1510.
  • the input/output interface 1513 is used to connect input/output modules to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or it can be connected to the device externally to provide the corresponding functions.
  • the input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and an output device may include a display, a speaker, a vibrator, and an indicator light.
  • the network interface 1514 is used to connect a communication module (not shown in the figure) to realize the communication interaction between the device and other devices.
  • the communication module can realize communication through wired means (such as USB, network cable, etc.), or through wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • the bus 1530 includes a path to transmit information between various components of the device (for example, the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520).
  • the computer system 1500 can also obtain information about specific receiving conditions from the virtual resource object receiving condition information database 1541 for condition determination, and so on.
  • although the above device only shows the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, the memory 1520, the bus 1530, etc., in a specific implementation the device may also include other components necessary for normal operation.
  • the above-mentioned device may also include only the components necessary to implement the solution of the present application, and not necessarily include all the components shown in the figure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Computational Linguistics (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A data processing method, platform and system. The method comprises: storing original commodity content data in a first relational database by cluster, by library and by table (S51); establishing index data according to the original commodity content data and storing the index data in an index database (S52), the index data comprising keyword fields and query dimension identification data corresponding to each keyword field; and computing the original commodity content data by means of a computing program to obtain computing result data, and associatively storing the computing result data and the query dimension identification data in the first relational database (S53). The computing efficiency is improved, and an index database is established according to query dimensions to perform indexing in advance during a subsequent query, which inevitably improves the querying efficiency.

Description

Data processing method, platform and system
Technical field
This application relates to the field of business data calculation and query, and in particular to a data processing method, platform and system.
Background
When merchants sell goods, they often need analysis data as a basis for guiding operations. This analysis data is mostly obtained by the platform analyzing and calculating a large amount of product content data. For example, the product content quality score, which characterizes the quality of product description information, can provide operational guidance for merchants selling physical goods. This data is obtained by the platform summarizing, analyzing and calculating the product content data of many products from many merchants. At present, this summary analysis and calculation is mostly implemented with Java and the relational database MySQL. When merchants need to query the calculation result data, they query MySQL directly.
However, with the rapid development of e-commerce, product content data is generated in massive volumes, and the volume grows sharply during platform-wide promotions such as "Double Eleven", "618", "818" and "Double Twelve". The approach based on Java and the relational database MySQL is inefficient at data calculation, and it also makes merchants' queries of the calculation result data inefficient. For complex query conditions in particular, query times are typically on the order of seconds.
Summary of the invention
This application provides a data processing method, platform, and system to solve the problem of low efficiency in calculating and querying product content data in the prior art.
This application provides the following solutions:
In one aspect, a data processing method is provided, and the method includes:
storing the original product content data in a first relational database in clusters, databases and tables;
creating index data according to the original product content data and storing it in an index database, the index data including keyword fields and query dimension identification data corresponding to each keyword field;
calling a calculation program to calculate the original product content data to obtain calculation result data, and storing the calculation result data in association with the query dimension identification data in the first relational database.
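The index data described here pairs each keyword field with query dimension identification data. As a rough illustration only, the Python sketch below builds such a mapping in memory. The record layout, the field names (`product_code`, `merchant_code`, `brand`) and the `build_index` helper are assumptions made for this example; the application itself names an index database for this role, not a Python dictionary.

```python
from collections import defaultdict

# Hypothetical product records; the field names are illustrative assumptions.
PRODUCTS = [
    {"product_code": "000000000000000011", "merchant_code": "M001", "brand": "Acme"},
    {"product_code": "000000001500000023", "merchant_code": "M002", "brand": "Acme"},
    {"product_code": "000000000500000007", "merchant_code": "M001", "brand": "Globex"},
]


def build_index(records, keyword_field):
    """Map each value of a keyword field to the identification data that carries it.

    The identification data here is a (product_code, merchant_code) pair,
    standing in for the "query dimension identification data" of the text.
    """
    index = defaultdict(list)
    for record in records:
        ids = (record["product_code"], record["merchant_code"])
        index[record[keyword_field]].append(ids)
    return dict(index)


brand_index = build_index(PRODUCTS, "brand")
print(brand_index["Acme"])
```

Keeping only identifiers in the index keeps it small; the calculation results themselves stay in the first relational database and are looked up by those identifiers.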
Preferably, the method further includes:
receiving a user's query request;
parsing the query request to obtain the keyword to be queried;
querying the index database to obtain the query dimension identification data corresponding to the keyword to be queried as a target identification;
querying the first relational database to obtain the calculation result data corresponding to the target identification.
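The four query steps above amount to a two-stage lookup: keyword to identifier in the index, then identifier to result in the relational store. The sketch below shows that flow with plain dictionaries standing in for both databases. The request format `brand=<keyword>`, the helper names and the sample scores are all assumptions for illustration, not part of the application.

```python
# In-memory stand-ins (assumptions) for the index database and the
# first relational database; a real deployment would query external stores.
INDEX_DB = {"Acme": ["000000000000000011", "000000001500000023"]}
RESULT_DB = {
    "000000000000000011": {"total_score": 83},
    "000000001500000023": {"total_score": 71},
}


def parse_query(request: str) -> str:
    """Parse a request of the assumed form 'brand=<keyword>' and return the keyword."""
    _field, _sep, keyword = request.partition("=")
    return keyword.strip()


def query_results(request, index_db, result_db):
    """Resolve a request to calculation results via the two-stage lookup."""
    keyword = parse_query(request)            # parse the request
    target_ids = index_db.get(keyword, [])    # stage 1: index lookup
    # Stage 2: fetch the stored result for each target identification.
    return {pid: result_db[pid] for pid in target_ids if pid in result_db}


results = query_results("brand=Acme", INDEX_DB, RESULT_DB)
print(results)
```

An unknown keyword simply yields an empty result, because stage 1 returns no target identifications.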
Preferably, the method further includes:
storing at least part of the calculation result data in association with the query dimension identification data in the index database.
Preferably,
calling the calculation program to calculate the original product content data to obtain calculation result data includes:
calling the calculation program to calculate, from the original product content data, a content quality score of each product in each of at least two content dimensions, and calculating a total content quality score of each product from the dimension scores;
storing the calculation result data in association with the query dimension identification data in the first relational database includes:
storing the per-dimension content quality scores and the total content quality score of each product in association with the query dimension identification data in the first relational database;
storing at least part of the calculation result data in association with the query dimension identification data in the index database includes:
storing the total content quality score of each product in association with the query dimension identification data in the index database.
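The text states that each product is scored in at least two content dimensions and that a total is derived from the dimension scores, but it does not give the scoring rules or the aggregation formula. The sketch below is therefore purely illustrative: the three rules, their point values, and the choice of a plain sum for the total are assumptions, as are the field names of the sample product.

```python
from typing import Callable, Dict

# Hypothetical scoring rules, one per content dimension. The point values
# are invented for this example and are not taken from the application.
RULES: Dict[str, Callable[[dict], int]] = {
    "title": lambda p: 10 if len(p.get("title", "")) >= 10 else (5 if p.get("title") else 0),
    "main_image": lambda p: min(len(p.get("images", [])), 5) * 2,
    "selling_point": lambda p: 10 if p.get("selling_points") else 0,
}


def score_product(product: dict) -> Dict[str, int]:
    """Return the per-dimension scores plus a total (assumed here to be their sum)."""
    scores = {dimension: rule(product) for dimension, rule in RULES.items()}
    scores["total"] = sum(scores.values())
    return scores


product = {
    "product_code": "000000001500000023",
    "title": "Stainless steel kettle 1.5 L",
    "images": ["front.jpg", "side.jpg"],
    "selling_points": [],
}

# Store the result keyed by the query dimension identification data
# (the product code), mirroring the association described in the text.
result_store = {product["product_code"]: score_product(product)}
print(result_store)
```

Keying the stored scores by product code is what lets a later index lookup resolve directly to the result row.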
Preferably, the query dimension identification data is a product code and/or a merchant code.
Preferably, the method further includes:
receiving the original product content data and storing it in a second relational database, sharded by cluster, database, and table;
synchronizing the original product content data in the second relational database to the first relational database.
Preferably, receiving the original product content data and storing it in the second relational database, sharded by cluster, database, and table, includes:
receiving the original product content data and storing it in the second relational database, sharded by cluster, database, and table according to the product code.
Preferably,
the first relational database is HBase, the second relational database is MySQL, the calculation program is Spark, and the index database is Elasticsearch.
In another aspect, the present application also provides a data processing platform, which includes a data storage layer and a data calculation layer;
the data storage layer is used to store the original product content data in a first relational database, sharded by cluster, database, and table, and to build index data from the original product content data and store it in an index database; the index data includes keyword fields and the query dimension identification data corresponding to each keyword field;
the data calculation layer is used to invoke a calculation program to calculate the original product content data to obtain calculation result data, and to store the calculation result data in the first relational database in association with the query dimension identification data.
In yet another aspect, the present application also provides a computer system, including:
one or more processors; and
a memory associated with the one or more processors, the memory being used to store program instructions which, when read and executed by the one or more processors, perform the following operations:
storing the original product content data in a first relational database, sharded by cluster, database, and table;
building index data from the original product content data and storing it in an index database; the index data includes keyword fields and the query dimension identification data corresponding to each keyword field;
invoking a calculation program to calculate the original product content data to obtain calculation result data, and storing the calculation result data in the first relational database in association with the query dimension identification data.
According to the specific embodiments provided in this application, this application discloses the following technical effects:
In the technical solution of this application, the original product data is stored in a relational database, sharded by cluster, database, and table, and a calculation program is invoked to perform the calculation, which improves calculation efficiency. An index database is built according to the query dimensions, and subsequent queries consult the index first, which improves query efficiency. Compared with the prior art, the solution can quickly serve multi-dimensional queries over the calculation result data and avoids the low efficiency of querying the relational database directly.
Of course, any product implementing this application does not necessarily need to achieve all of the above advantages at the same time.
Description of the Drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are only some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a structural diagram of a data processing platform provided by an embodiment of this application;
Fig. 2 is a schematic diagram of cluster sub-databases provided by an embodiment of this application;
Fig. 3 is a flowchart of original product content data synchronization provided by an embodiment of this application;
Fig. 4 is a flowchart of a product content quality score query provided by an embodiment of this application;
Fig. 5 is a flowchart of a data processing method provided by an embodiment of this application;
Fig. 6 is an architecture diagram of a computer system provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application fall within the scope of protection of this application.
This application aims to provide a method for processing product content data. The original product content data is stored in a relational database, sharded by cluster, database, and table, and a calculation program is then invoked to compute over the sub-databases in parallel, which improves calculation efficiency. Index data is built according to the required query dimensions, so that a subsequent query can first match identification data in the index database and then query the relational database. The solution can quickly serve multi-dimensional queries over the calculation result data and improves query efficiency.
Fig. 1 shows the structure of the data processing platform of one embodiment of this application, which includes a MySQL database, an HBase database, a Spark calculation program for computation, the search engine Elasticsearch, the remote service framework RSF, and the merchants who issue queries.
The MySQL database receives the original product content data and stores this massive volume of data in itself, sharded by cluster, database, and table. Specifically, the sharding can be done by product code; the details are described later.
The HBase database is synchronized from the data in the MySQL database. Specifically, synchronization can be performed through a data replication platform and a data exchange platform. After synchronization, the HBase database stores the original product content data sharded by cluster, database, and table.
In other embodiments of this application, the original product content data can be stored directly in the HBase database without passing through the MySQL database. Passing through the MySQL database is chosen for two reasons: first, the stability of data backup; second, other business processes depend on the MySQL database for their operations.
The result data to be queried later is calculated, an index is built, and an association is established between the index and the calculated result data, so that the result data can be reached from the index data:
The index database Elasticsearch stores keyword fields used for queries, such as product brand, and the identification data corresponding to each keyword field, such as product code. With this index, a query keyword entered by a user (merchant) can be matched to the corresponding product code.
The Spark calculation program applies MapReduce (a programming model for parallel computation over large data sets, larger than 1 TB) to the original product content data of each cluster, by product-code range and according to expression rules, to obtain calculation results such as product content quality scores. Once obtained, the calculation results are stored in the HBase database together with identification data such as the product code.
Through the above steps, the identification data links the index data in Elasticsearch to the calculation result data in the HBase database.
When a user enters a query keyword, RSF first queries the index to determine the matching identification data, such as a product code, and then uses the product code to locate the calculation result data in the HBase database.
The index can be built independently of the calculation process. Alternatively, at least part of the calculation results can also be stored in the index database, in which case queries for that part of the results can be answered by Elasticsearch alone, with no further query against the HBase database.
It should be noted that the MySQL database, HBase database, Spark calculation program, and search engine Elasticsearch described above can each be replaced by a module with similar functions; Fig. 1 is only one specific system structure of this application.
Taking the system shown in Fig. 1 and the calculation of product content quality scores as an example, the following processes are described in detail: sharded storage of the original product content data by cluster, database, and table; product content data synchronization; product content quality score calculation; product content quality score synchronization; index building; and product content quality score query:
Sharded storage of the original product content data by cluster, database, and table:
The original product content data is stored in 4 MySQL clusters according to product-code range; within each cluster it is stored in one of 10 sub-databases according to the last two digits of the product code modulo 10; and within each sub-database it is stored in one of 10 sub-tables according to the last digit of the product code modulo 10. In this way more than a billion product content records are spread across several hundred sub-tables. Fig. 2 shows a schematic diagram of the cluster sub-databases.
For example, the product-code range stored in each cluster is defined as follows: product data with codes from 000000000000000000 to 000000000500000000 is stored in cluster 1; from 000000000500000001 to 000000001000000000 in cluster 2; from 000000001000000001 to 000000001500000000 in cluster 3; and from 000000001500000001 to 000000002000000000 in cluster 4.
The sub-database within a product's cluster is defined as follows: the sub-database is selected by the result of the last two digits of the product code modulo 10.
The sub-table within a product's sub-database is defined as follows: the sub-table is selected by the remainder of the last digit of the product code divided by 10.
For example, product code 000000001500000023 belongs to cluster 4, sub-database 3, sub-table 4.
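The routing rule above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the platform's actual code: the function name `route` and the constants are hypothetical, and the function returns the raw remainders (0 to 9). The text does not say how a remainder maps to a sub-database or sub-table number; its own example lists sub-table 4 for a final digit of 3, which suggests a numbering offset that is not spelled out, so that mapping is left to the caller.

```python
CLUSTER_SPAN = 500_000_000  # codes per cluster, taken from the ranges in the text
NUM_CLUSTERS = 4


def route(product_code: str) -> tuple[int, int, int]:
    """Return (cluster, sub_database_remainder, sub_table_remainder).

    Hypothetical helper: cluster numbers are 1-based as in the text,
    the two remainders are returned unmodified (0..9).
    """
    if not product_code.isdigit():
        raise ValueError("product code must contain only digits")
    value = int(product_code)
    if value > CLUSTER_SPAN * NUM_CLUSTERS:
        raise ValueError("product code is outside the defined ranges")

    # Cluster 1 holds 0..500000000, cluster 2 holds 500000001..1000000000, etc.
    cluster = 1 if value == 0 else (value - 1) // CLUSTER_SPAN + 1
    # Sub-database: last two digits of the code, modulo 10.
    sub_database = int(product_code[-2:]) % 10
    # Sub-table: last digit of the code, modulo 10.
    sub_table = int(product_code[-1]) % 10
    return cluster, sub_database, sub_table


if __name__ == "__main__":
    print(route("000000001500000023"))
```

Keeping the code as a string preserves the leading zeros, so the "last two digits" are read from the code as written rather than from its integer value.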
Original product content data synchronization:
Original product content data is synchronized in three ways: near-real-time incremental updates, a daily incremental update, and a weekly full update. The daily incremental update and the weekly full update both exist for fault tolerance.
As shown in Fig. 3, a real-time data replication platform RDRS (RealTime Data Replication System) can be defined to synchronize MySQL data to HBase in near real time, and a data exchange platform IDE can be defined to synchronize MySQL data to HBase incrementally every day and in full every week:
The RDRS platform synchronizes product content data to HBase by parsing the binlog of the MySQL database clusters in near real time.
Every day, the data exchange platform synchronizes the incremental product content data to HBase and compares it against the near-real-time product content data in HBase, correcting any differences.
Every week, the data exchange platform synchronizes the full product data to HBase and compares it against the current product content data in HBase, correcting any differences.
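The "compare and correct" step of the daily and weekly synchronization can be illustrated as follows. This is only a sketch under assumptions: plain dictionaries stand in for the MySQL batch and the HBase table, and the function name `reconcile` and its `full` flag are hypothetical. The text does not describe how the platform performs the comparison.

```python
def reconcile(source_rows: dict, target_rows: dict, full: bool = False) -> dict:
    """Bring target_rows in line with source_rows, in place.

    source_rows: rows read from the source side (stand-in for MySQL).
    target_rows: rows currently held by the target (stand-in for HBase).
    full: True for a full sync, where rows missing from the source are
          also removed from the target; False for an incremental batch.
    Returns the keys that were inserted, updated, or deleted.
    """
    changes = {"inserted": [], "updated": [], "deleted": []}
    for key, row in source_rows.items():
        if key not in target_rows:
            target_rows[key] = dict(row)
            changes["inserted"].append(key)
        elif target_rows[key] != row:
            target_rows[key] = dict(row)
            changes["updated"].append(key)
    if full:
        # Only a full snapshot can tell us that a row no longer exists.
        for key in list(target_rows):
            if key not in source_rows:
                del target_rows[key]
                changes["deleted"].append(key)
    return changes


if __name__ == "__main__":
    mysql_rows = {"a": {"title": "new"}, "b": {"title": "y"}}
    hbase_rows = {"a": {"title": "old"}, "c": {"title": "z"}}
    print(reconcile(mysql_rows, hbase_rows, full=True))
```

An incremental batch only carries changed rows, so deletion of unmatched rows is restricted to the full run in this sketch.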
Product content quality score calculation:
Product content quality is mainly determined by seven content dimensions: basic information, parameter information, category information, main image information, title information, selling point information, and detail information. Based on expression rules, the Spark program computes over every sub-database in parallel, producing the basic information, parameter information, category information, main image information, title information, selling point information, and detail information scores for the products in all sub-databases, and finally aggregates the scores of all dimensions and writes them to Hive (a data warehouse tool for Hadoop). Specifically:
First, MapReduce is run per sub-database to compute the basic information, parameter information, category information, main image information, title information, selling point information, and detail information scores for all sub-databases. Computing per sub-database mainly serves to reduce data skew and thereby improve calculation efficiency.
The basic information, parameter information, category information, main image information, title information, selling point information, and detail information scores are then combined to obtain the total score.
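The two steps above (score each sub-database in parallel, then combine the seven dimension scores into a total) can be sketched in plain Python. A thread pool stands in for Spark here, and everything about the scoring itself is an assumption made for illustration: the field names, the rule "1 point if the field is non-empty", and the use of a plain sum for the total are hypothetical, since the text does not disclose the expression rules.

```python
from concurrent.futures import ThreadPoolExecutor

# The seven content dimensions named in the text (field names are hypothetical).
DIMENSIONS = (
    "basic", "parameters", "category", "main_image",
    "title", "selling_points", "details",
)


def score_dimension(product: dict, dimension: str) -> int:
    # Hypothetical rule: 1 point when the field is present and non-empty.
    return 1 if product.get(dimension) else 0


def score_shard(products: list) -> dict:
    """Score every product of one sub-database."""
    result = {}
    for product in products:
        scores = {d: score_dimension(product, d) for d in DIMENSIONS}
        scores["total"] = sum(scores.values())  # assumed: total is a plain sum
        result[product["code"]] = scores
    return result


def score_all(shards: dict) -> dict:
    """Score all sub-databases in parallel and merge the results."""
    merged = {}
    with ThreadPoolExecutor() as pool:
        for shard_result in pool.map(score_shard, shards.values()):
            merged.update(shard_result)
    return merged


if __name__ == "__main__":
    shards = {
        0: [{"code": "10", "title": "T", "main_image": "img.png"}],
        3: [{"code": "23", **{d: "x" for d in DIMENSIONS}}],
    }
    print(score_all(shards))
```

Each sub-database is an independent unit of work, which is what lets the computation be spread evenly instead of piling onto one large table.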
The following is a test of the calculation efficiency of this application and of the prior art:
1 million, 10 million, and 100 million records to be calculated were inserted into the to-be-calculated table for product quality evaluation. The calculation was then run with Java+MySQL and with Spark+HBase. The test results are recorded in Table 1.
Table 1. Comparison of calculation efficiency between Spark+HBase and Java+MySQL

| Records in table | Spark+HBase | Java+MySQL |
|---|---|---|
| 1 million | 30 minutes | 8 hours |
| 10 million | 2 hours | 3 days |
| 100 million | 5 hours | 30 days |

The test results show that the Spark+HBase calculation is far faster: about 16 times at 1 million records, 36 times at 10 million, and 144 times at 100 million. The advantage holds, and widens, as the number of records grows tenfold at each step.
Product content quality score data synchronization:
The scores are aggregated by the configured query dimensions, such as product and merchant, to obtain the corresponding total score, for example the total score of a given product or of a given merchant. Other dimensions are of course possible. The per-dimension product content quality scores, the total product content quality score, and the scores aggregated by the configured query dimensions are then synchronized to HBase.
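Aggregating scores by a query dimension amounts to a group-by. The sketch below assumes that "total" means a plain sum per group and that each record carries `product_code`, `merchant_code`, and `total` fields; these names and the helper `totals_by` are hypothetical and not taken from the text.

```python
from collections import defaultdict


def totals_by(records: list, dimension: str) -> dict:
    """Sum the 'total' score of all records sharing the same dimension value."""
    totals = defaultdict(int)
    for record in records:
        totals[record[dimension]] += record["total"]
    return dict(totals)


if __name__ == "__main__":
    records = [
        {"product_code": "P1", "merchant_code": "M1", "total": 5},
        {"product_code": "P2", "merchant_code": "M1", "total": 7},
        {"product_code": "P3", "merchant_code": "M2", "total": 4},
    ]
    print(totals_by(records, "merchant_code"))
```

The same function serves any other dimension by passing a different field name.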
Query dimension index building:
Index data is built according to query dimensions such as product code and merchant code. The index data includes keyword fields and the corresponding query dimension identification data, for example a product brand and the corresponding product codes.
The index can be built as part of the product content quality score synchronization: when the product content quality score data is calculated and synchronized to HBase, the correspondence between the keyword fields in the original product data and the query dimension identification data is established, and the total score data aggregated by query dimensions such as product code and merchant code is synchronized into the index data.
The calculation result data in both Elasticsearch and HBase, such as the product content quality score data, is updated incrementally.
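The shape of such an index entry can be illustrated with plain dictionaries. This sketch does not call Elasticsearch; the field names (`brand`, `product_code`, `merchant_code`, `total_score`) and the helper `build_index_docs` are assumptions chosen to mirror the description: a keyword field, the identification data it maps to, and optionally the aggregated total score.

```python
def build_index_docs(products: list, total_scores: dict) -> list:
    """Build one index document per product.

    products: original product rows (hypothetical field names).
    total_scores: product_code -> total score, if already calculated.
    """
    docs = []
    for product in products:
        doc = {
            "brand": product["brand"],              # keyword field
            "product_code": product["code"],        # identification data
            "merchant_code": product["merchant"],   # identification data
        }
        if product["code"] in total_scores:
            # Optional: keep the total score in the index so that simple
            # score queries need no second lookup.
            doc["total_score"] = total_scores[product["code"]]
        docs.append(doc)
    return docs


if __name__ == "__main__":
    products = [
        {"code": "P1", "merchant": "M1", "brand": "Acme"},
        {"code": "P2", "merchant": "M2", "brand": "Acme"},
    ]
    print(build_index_docs(products, {"P1": 52}))
```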
Product content quality score query:
Different types of query conditions require corresponding query interfaces and request parameters. Given the query conditions, the system first goes to Elasticsearch to obtain the matching product codes and merchant codes, then uses those product codes and merchant codes to fetch the required data from HBase, and finally consolidates and filters the matching data and returns it to the user. Specifically:
First, a remote service framework RSF (Remote Service Framework) is defined to provide remote query services to the querier component, and a query service (QueryService) is defined to process merchant queries. Based on the query conditions entered by the merchant, the query service calls the RSF service to perform the various types of iterative queries and then takes the intersection of the results of the sub-query conditions; the sub-queries run concurrently.
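Running the sub-queries concurrently and intersecting their results can be sketched as below. The function name `run_query` is hypothetical and each sub-query is modelled as a zero-argument callable returning product codes; in the described system these would be RSF calls, whose interface the text does not give.

```python
from concurrent.futures import ThreadPoolExecutor


def run_query(sub_queries: list) -> set:
    """Run every sub-query concurrently and return the codes common to all."""
    if not sub_queries:
        return set()
    with ThreadPoolExecutor() as pool:
        # Each sub-query returns an iterable of codes; convert to a set.
        results = list(pool.map(lambda query: set(query()), sub_queries))
    return set.intersection(*results)


if __name__ == "__main__":
    by_brand = lambda: ["P1", "P2", "P3"]     # e.g. brand matches
    by_merchant = lambda: ["P2", "P3", "P4"]  # e.g. merchant matches
    by_score = lambda: ["P3", "P2"]           # e.g. score above a threshold
    print(sorted(run_query([by_brand, by_merchant, by_score])))
```

Because the sub-queries are independent, the overall latency is close to that of the slowest sub-query rather than the sum of all of them.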
Fig. 4 is a flowchart of the product content quality score query process, which includes the following steps:
The client sends a query service request for product quality scores;
The query server parses the expression in the product quality score query service request sent by the client;
The query server submits the parsed query request to the Elasticsearch cluster (Elasticsearch Cluster); in this embodiment Elasticsearch is deployed as a cluster to avoid a single point of failure;
The Elasticsearch cluster returns the query result (product code + merchant code) to the query server;
The query server submits a query request to the HBase cluster (HBase Cluster) based on the query result returned by the Elasticsearch cluster;
The HBase cluster returns to the query server the final query result corresponding to the product code and merchant code;
The query server returns the final query result to the client.
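The steps above can be sketched end to end with in-memory stand-ins. This is an illustration under assumptions, not the platform's code: a list of dictionaries plays the role of the Elasticsearch index, a dictionary keyed by (product code, merchant code) plays the role of HBase, and the request syntax `field=value` and all function names are hypothetical.

```python
# Stand-in for the Elasticsearch index: keyword fields plus identifiers.
INDEX = [
    {"brand": "Acme", "product_code": "000000001500000023", "merchant_code": "M001"},
    {"brand": "Zeta", "product_code": "000000000000000042", "merchant_code": "M002"},
]

# Stand-in for HBase: result rows keyed by (product_code, merchant_code).
RESULTS = {
    ("000000001500000023", "M001"): {"title": 8, "total": 52},
    ("000000000000000042", "M002"): {"title": 5, "total": 40},
}


def parse(request: str) -> dict:
    """Parse a request such as 'brand=Acme' into {'brand': 'Acme'}."""
    field, _, value = request.partition("=")
    return {field.strip(): value.strip()}


def search_index(criteria: dict) -> list:
    """Step 1: match keyword fields, return (product_code, merchant_code) pairs."""
    return [
        (doc["product_code"], doc["merchant_code"])
        for doc in INDEX
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def fetch_results(keys: list) -> dict:
    """Step 2: look up the calculation results for the matched identifiers."""
    return {key: RESULTS[key] for key in keys if key in RESULTS}


def handle(request: str) -> dict:
    """Parse the request, query the index, then query the result store."""
    return fetch_results(search_index(parse(request)))


if __name__ == "__main__":
    print(handle("brand=Acme"))
```

The index lookup narrows the request to a handful of exact keys, so the second step is a direct key lookup rather than a scan.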
The following is a test of the query efficiency of this application and of the prior art:
1 million, 10 million, 100 million, and 1 billion records, each with 15 fields, were inserted into separate tables in Elasticsearch and HBase. Queries were then run with Java+MySQL and with Elasticsearch+HBase.
The test results are recorded in Table 2.
Table 2. Comparison of query efficiency between Elasticsearch+HBase and Java+MySQL

| Records in table | Elasticsearch+HBase | Java+MySQL |
|---|---|---|
| 1 million | 125 ms | 0.564 s |
| 10 million | 140 ms | 2.543 s |
| 100 million | 162 ms | Timeout exception |
| 1 billion | 190 ms | Timeout exception |

The test results show that Elasticsearch+HBase queries are much faster: about 4.5 times at 1 million records and about 18 times at 10 million, while Java+MySQL times out at 100 million records and above. Query latency rises only from 125 ms to 190 ms as the data grows from 1 million to 1 billion records.
Embodiment 1
As noted above, each of the databases and the Spark calculation program can be replaced by modules with similar functions, and the calculation result can be set, according to user needs, to data other than the product content quality score. On this basis, Embodiment 1 of this application provides a data processing method which, as shown in Fig. 5, includes the following steps:
S51. storing the original product content data in a first relational database, sharded by cluster, database, and table;
S52. building index data from the original product content data and storing it in an index database; the index data includes keyword fields and the query dimension identification data corresponding to each keyword field;
S53. invoking a calculation program to calculate the original product content data to obtain calculation result data, and storing the calculation result data in the first relational database in association with the query dimension identification data.
Preferably,
the method further includes:
receiving a query request from a user;
parsing the query request to obtain a keyword to be queried;
querying the index database to obtain, as a target identifier, the query dimension identification data corresponding to the keyword to be queried;
querying the first relational database to obtain the calculation result data corresponding to the target identifier.
In addition, the method may further include:
storing at least part of the calculation result data in the index database in association with the query dimension identification data.
In another preferred embodiment, the method further includes: receiving the original product content data and storing it in a second relational database, sharded by cluster, database, and table; specifically, the sharding can be done by product code;
synchronizing the original product content data in the second relational database to the first relational database.
Embodiment 2
Corresponding to the above method, this application also provides a data processing platform, which includes a data storage layer and a data calculation layer;
the data storage layer is used to store the original product content data in a first relational database, sharded by cluster, database, and table, and to build index data from the original product content data and store it in an index database; the index data includes keyword fields and the query dimension identification data corresponding to each keyword field;
the data calculation layer is used to invoke a calculation program to calculate the original product content data to obtain calculation result data, and to store the calculation result data in the first relational database in association with the query dimension identification data.
In a preferred embodiment, the data processing platform further includes a data application layer, which is used to receive a user's query request and parse it to obtain a keyword to be queried, to query the index database to obtain, as a target identifier, the query dimension identification data corresponding to the keyword to be queried, and to query the first relational database to obtain the calculation result data corresponding to the target identifier, so that the result data can be returned to the user.
In a preferred embodiment, the storage layer is further used to store at least part of the calculation result data in the index database in association with the query dimension identification data.
In a preferred embodiment, the storage layer is further used to receive the original product content data and store it in a second relational database, sharded by cluster, database, and table, and to synchronize the original product content data in the second relational database to the first relational database.
Embodiment 3
Corresponding to the above method and platform, Embodiment 3 of this application provides a computer system, including:
one or more processors; and
a memory associated with the one or more processors, the memory being used to store program instructions which, when read and executed by the one or more processors, perform the following operations:
storing the original product content data in a first relational database, sharded by cluster, database, and table;
building index data from the original product content data and storing it in an index database; the index data includes keyword fields and the query dimension identification data corresponding to each keyword field;
calculating the original product content data with a calculation program to obtain calculation result data, and storing the calculation result data in the first relational database in association with the query dimension identification data.
FIG. 6 shows an exemplary architecture of the computer system, which may include a processor 1510, a video display adapter 1511, a disk drive 1512, an input/output interface 1513, a network interface 1514, and a memory 1520. The processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520 may be communicatively connected through a communication bus 1530.
The processor 1510 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to implement the technical solutions provided in this application.
The memory 1520 may be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, and so on. The memory 1520 may store an operating system 1521 for controlling the operation of the computer system 1500 and a basic input/output system (BIOS) for controlling low-level operations of the computer system 1500. It may also store a web browser 1523, a data storage management system 1524, an icon font processing system 1525, and so on. The icon font processing system 1525 may be the application program that implements the foregoing steps in the embodiments of this application. In short, when the technical solutions provided by this application are implemented in software or firmware, the relevant program code is stored in the memory 1520 and is called and executed by the processor 1510.
The input/output interface 1513 is used to connect input/output modules for information input and output. An input/output module may be built into the device as a component (not shown in the figure) or attached externally to provide the corresponding functions. Input devices may include a keyboard, mouse, touch screen, microphone, and various sensors; output devices may include a display, speaker, vibrator, and indicator light.
The network interface 1514 is used to connect a communication module (not shown in the figure) so that the device can communicate with other devices. The communication module may communicate over a wired connection (for example USB or a network cable) or a wireless connection (for example a mobile network, WIFI, or Bluetooth).
The bus 1530 includes a path that carries information between the components of the device (for example the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520).
In addition, the computer system 1500 may obtain information about specific receiving conditions from the virtual resource object receiving condition information database 1541, for use in condition determination and the like.
It should be noted that although the above device shows only the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, the memory 1520, the bus 1530, and so on, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the device may include only the components necessary to implement the solution of this application, and need not include all the components shown in the figure.
From the description of the foregoing embodiments, those skilled in the art will clearly understand that this application can be implemented by software plus a necessary general-purpose hardware platform. On that understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product can be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a cloud server, a network device, or the like) to perform the methods described in the embodiments, or in certain parts of the embodiments, of this application.
The embodiments in this specification are described progressively; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on how it differs from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described more briefly, and the relevant parts of the method embodiments may be consulted. The system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected as needed to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The data processing method, platform, and system provided by this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is intended only to aid understanding of the method and its core idea. For those of ordinary skill in the art, the specific implementation and the scope of application may vary in accordance with the idea of this application. In summary, the content of this specification should not be construed as limiting this application.

Claims (10)

  1. A data processing method, characterized in that the method comprises:
    storing original product content data in a first relational database, partitioned by cluster, database, and table;
    building index data from the original product content data and storing it in an index database, the index data including keyword fields and the query dimension identification data corresponding to each keyword field;
    calculating the original product content data by means of a calculation program to obtain calculation result data, and storing the calculation result data in the first relational database in association with the query dimension identification data.
  2. The data processing method according to claim 1, characterized in that the method further comprises:
    receiving a query request from a user;
    parsing the query request to obtain a keyword to be queried;
    querying the index database to obtain, as a target identifier, the query dimension identification data corresponding to the keyword to be queried;
    querying the first relational database to obtain the calculation result data corresponding to the target identifier.
  3. The data processing method according to claim 1, characterized in that the method further comprises:
    storing at least part of the calculation result data in the index database in association with the query dimension identification data.
  4. The data processing method according to claim 3, characterized in that:
    said invoking the calculation program to calculate the original product content data to obtain calculation result data comprises:
    invoking the calculation program to calculate, from the original product content data and in at least two content dimensions, a content quality score for each product in each dimension, and calculating a total content quality score for each product from its per-dimension content quality scores;
    said storing the calculation result data in the first relational database in association with the query dimension identification data comprises:
    storing the per-dimension content quality scores and the total content quality score of each product in the first relational database in association with the identification data;
    said storing at least part of the calculation result data in the index database in association with the query dimension identification data comprises:
    storing the total content quality score of each product in the index database in association with the query dimension identification data.
  5. The data processing method according to any one of claims 1 to 4, characterized in that the identification data is a product code and/or a merchant code.
  6. The data processing method according to any one of claims 1 to 4, characterized in that the method further comprises:
    receiving the original product content data and storing it in a second relational database, partitioned by cluster, database, and table;
    synchronizing the original product content data in the second relational database to the first relational database.
  7. The data processing method according to claim 6, characterized in that said receiving the original product content data and storing it in a second relational database, partitioned by cluster, database, and table, comprises:
    receiving the original product content data and storing it in the second relational database, partitioned by cluster, database, and table according to product code.
  8. The data processing method according to claim 6, characterized in that:
    the first relational database is HBase, the second relational database is MySQL, the calculation program is Spark, and the index database is Elasticsearch.
  9. A data processing platform, characterized in that the platform comprises a data storage layer and a data calculation layer;
    the data storage layer is configured to store original product content data in a first relational database, partitioned by cluster, database, and table, and to build index data from the original product content data and store it in an index database, the index data including keyword fields and the query dimension identification data corresponding to each keyword field;
    the data calculation layer is configured to invoke a calculation program to calculate the original product content data to obtain calculation result data, and to store the calculation result data in the first relational database in association with the query dimension identification data.
  10. A computer system, characterized in that it comprises:
    one or more processors; and
    a memory associated with the one or more processors, the memory storing program instructions that, when read and executed by the one or more processors, perform the following operations:
    storing original product content data in a first relational database, partitioned by cluster, database, and table;
    building index data from the original product content data and storing it in an index database, the index data including keyword fields and the identification data corresponding to each keyword field;
    invoking a calculation program to calculate the original product content data to obtain calculation result data, and storing the calculation result data in the first relational database in association with the identification data.
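
As an illustration only, and not part of the claims, the two-step query of claim 2 and the scoring of claims 3 and 4 might be sketched as follows. The scoring rubric, the field names `title` and `image_count`, and the helpers `score_product`, `ingest`, and `query` are hypothetical; plain dictionaries stand in for the first relational database and the index database.

```python
def score_product(record):
    """Score a product in two content dimensions and derive a total (assumed rubric)."""
    dimensions = {
        "title": min(len(record.get("title", "")) * 5, 100),
        "images": min(record.get("image_count", 0) * 20, 100),
    }
    total = sum(dimensions.values()) / len(dimensions)
    return {"dimensions": dimensions, "total": total}


result_store = {}  # identifier -> full scores; stands in for the first relational database
index_store = {}   # keyword -> {identifier: total score}; stands in for the index database


def ingest(record):
    """Store full scores by identifier and mirror only the total into the index."""
    code = record["product_code"]
    scores = score_product(record)
    result_store[code] = scores
    for keyword in record["title"].lower().split():
        index_store.setdefault(keyword, {})[code] = scores["total"]


def query(request):
    """Parse the request into a keyword, resolve identifiers, then fetch results."""
    keyword = request.strip().lower()          # parse the query request
    target_ids = index_store.get(keyword, {})  # step 1: index lookup yields target identifiers
    return {code: result_store[code] for code in target_ids}  # step 2: fetch by identifier


ingest({"product_code": "P001", "title": "Red Cotton Shirt", "image_count": 5})
ingest({"product_code": "P002", "title": "Blue Jeans", "image_count": 1})
print(query(" Cotton "))
```

Keeping only the total score in the index lets a keyword lookup return a summary directly, while the per-dimension detail is fetched from the result store by identifier when needed.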
PCT/CN2020/096999 2019-10-10 2020-06-19 Data processing method, platform and system WO2021068549A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3154438A CA3154438A1 (en) 2019-10-10 2020-06-19 Commodity content data processing method,platform and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910959014.9 2019-10-10
CN201910959014.9A CN110837520A (en) 2019-10-10 2019-10-10 Data processing method, platform and system

Publications (1)

Publication Number Publication Date
WO2021068549A1 (en) 2021-04-15

Family

ID=69575186

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/096999 WO2021068549A1 (en) 2019-10-10 2020-06-19 Data processing method, platform and system

Country Status (3)

Country Link
CN (1) CN110837520A (en)
CA (1) CA3154438A1 (en)
WO (1) WO2021068549A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407445A (en) * 2023-10-27 2024-01-16 上海势航网络科技有限公司 Data storage method, system and storage medium for Internet of vehicles data platform
CN117407445B (en) * 2023-10-27 2024-06-04 上海势航网络科技有限公司 Data storage method, system and storage medium for Internet of vehicles data platform

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837520A (en) * 2019-10-10 2020-02-25 苏宁云计算有限公司 Data processing method, platform and system
CN111459985B (en) * 2020-03-31 2023-10-27 美的集团股份有限公司 Identification information processing method and device
CN111651479A (en) * 2020-04-15 2020-09-11 山东中创软件工程股份有限公司 Article evaluation method, device and related equipment
CN112069021B (en) * 2020-08-21 2024-02-20 北京五八信息技术有限公司 Flow data storage method and device, electronic equipment and storage medium
CN112380276B (en) * 2021-01-15 2021-09-07 四川新网银行股份有限公司 Method for querying data by non-fragment key fields after database division and table division of distributed system
CN113961580A (en) * 2021-12-22 2022-01-21 联通智网科技股份有限公司 Data query method, service system and electronic equipment
CN115455149B (en) * 2022-09-20 2023-05-30 城云科技(中国)有限公司 Database construction method based on coding query mode and application thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930054A (en) * 2012-11-19 2013-02-13 北京奇虎科技有限公司 Data search method and data search system
CN105719105A (en) * 2014-12-03 2016-06-29 镇江雅迅软件有限责任公司 Inventory quick lookup method based on keywords
US20170078251A1 (en) * 2015-09-11 2017-03-16 Skyhigh Networks, Inc. Wildcard search in encrypted text using order preserving encryption
CN108874971A (en) * 2018-06-07 2018-11-23 北京赛思信安技术股份有限公司 A kind of tool and method applied to the storage of magnanimity labeling solid data
CN110837520A (en) * 2019-10-10 2020-02-25 苏宁云计算有限公司 Data processing method, platform and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156088B (en) * 2015-04-01 2020-02-04 阿里巴巴集团控股有限公司 Index data processing method, data query method and device

Also Published As

Publication number Publication date
CN110837520A (en) 2020-02-25
CA3154438A1 (en) 2021-04-15

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20874647; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 3154438; Country of ref document: CA)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20874647; Country of ref document: EP; Kind code of ref document: A1)
