WO2020147335A1 - Method and system for clustering member data on electronic commerce platform - Google Patents

Method and system for clustering member data on electronic commerce platform

Publication number
WO2020147335A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
bitmap
consumption
query
unit
Prior art date
Application number
PCT/CN2019/106863
Other languages
French (fr)
Chinese (zh)
Inventor
范东
孙迁
汪金忠
Original Assignee
苏宁云计算有限公司
苏宁易购集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏宁云计算有限公司, 苏宁易购集团股份有限公司
Priority to CA3168300A (CA3168300A1)
Publication of WO2020147335A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • the present invention relates to the technical field of data processing, and in particular to a method and system for circle selection of member data for an e-commerce platform.
  • the analysis result of member consumption data is an important reference for planning promotional activities.
  • multi-dimensional circle selection of data is a key step in obtaining member consumption data; in the prior art, commonly used data circle selection methods include the pre-aggregation circle selection method (OLAP-Druid) and the distributed in-memory computing circle selection method (SPARK).
  • the OLAP-Druid circle selection method mainly uses the HLL algorithm to calculate and analyze member consumption data. The algorithm often loses accuracy in deduplication business scenarios, so the circle selection results for member consumption data are inaccurate.
  • the SPARK circle selection method obtains the members' original consumption data and, by pulling detail records, places the original consumption data in the memory of distributed nodes for logical calculation, finally obtaining the circle selection result for member consumption data.
  • the latter circle selection method can ensure the accuracy of the circle selection results for member consumption data, but it consumes more computing and memory resources, so it suffers from low circle selection efficiency and poor user experience.
  • the purpose of the present invention is to provide a member data circle selection method and system for an e-commerce platform, which can reduce the consumption of memory and computing resources while ensuring the accuracy of member consumption data circle selection results, and significantly improve the circle selection efficiency of member consumption data.
  • one aspect of the present invention provides a member data circle selection method for an e-commerce platform, including:
  • the method further includes:
  • the field data in the bitmap is regularly supplemented and updated to generate a bitmap corresponding to the current time node.
  • the step of periodically supplementing and updating the field data in the bitmap, and generating a bitmap corresponding to the current time node includes:
  • the corresponding field data in the newly added member consumption data is regularly supplemented into the bitmap according to the mapping relationship, realizing the supplementary update of the bitmap.
  • after periodically supplementing and updating the field data in the bitmap and generating the bitmap corresponding to the current time node, the method further includes:
  • the bitmap is cleaned, and irrelevant dimension field data is removed.
  • before the step of performing bit operations on the multiple dimension consumption fields in the bitmap through the integer identifiers according to the user's query instruction and outputting the circle selection result, the method further includes:
  • the method of performing bit operations on the multiple dimension consumption fields in the bitmap through the integer identifiers and outputting the circle selection result includes:
  • the member data circle selection method for e-commerce platforms provided by the present invention has the following beneficial effects:
  • the member data circle selection method for e-commerce platforms first obtains member consumption data from a data warehouse to create a data model.
  • the data model includes member codes, multiple dimension consumption fields, and consumption dates.
  • the member codes are converted one by one into integer identifiers, and the mapping relationship between the integer identifiers and the member codes is stored in the dictionary table; the integer identifiers, consumption fields, and consumption dates are then used to construct a bitmap; after the user's query instruction is obtained, the integer identifiers are called to perform logical bit operations on the multiple dimension consumption fields in the bitmap, finally obtaining the circle selection result.
  • the member code is replaced by an integer identifier and the member consumption data is represented by a bitmap (bitmap table), so that the circle selection of member data can be obtained solely through set bit operations in the bitmap table, which significantly improves operation efficiency while reducing computing and storage resources.
  • Another aspect of the present invention provides a member data circle selection system for an e-commerce platform, which is applied to the member data circle selection method for an e-commerce platform described in the above technical solution, and the system includes:
  • the data model creation unit is used to synchronize member consumption data from the data warehouse to create multiple data models
  • the dictionary table creation unit generates multiple different integer identifiers based on the member code in the data model, and saves the mapping relationship between the member code and the integer identifier in the dictionary table;
  • the bitmap generating unit is used to generate a bitmap by one-to-one correspondence between the integer identifiers and the multiple dimension consumption fields of the member consumption data;
  • the query output unit is used to perform bit operations on the multiple dimension consumption fields in the bitmap through the integer identifiers according to the user's query instruction, and output the circle selection result.
  • the system further includes a bitmap updating unit connected to the bitmap generating unit;
  • the bitmap update unit is used to periodically supplement and update the field data in the bitmap, and generate a bitmap corresponding to the current time node.
  • the bitmap update unit includes:
  • the data acquisition module is used to acquire new member consumption data from the data warehouse based on the current time node and synchronize it to the data model;
  • the bitmap update module is used to regularly supplement the corresponding field data in the newly added member consumption data into the bitmap according to the mapping relationship of the member codes in the dictionary table, realizing the supplementary update of the bitmap.
  • the system further includes a data cleaning unit provided between the bitmap generating unit and the query output unit;
  • the data cleaning unit is used to clean the bitmap and eliminate irrelevant dimension field data.
  • the system further includes a pre-circle selection unit and a storage unit; the input end of the pre-circle selection unit is connected to the output end of the data cleaning unit, and the output end of the storage unit is connected to the input end of the query output unit;
  • the pre-circle selection unit is used to preset a variety of query instructions, and perform bit operations on the cleaned bitmap in advance to obtain pre-circle selection results that match the multiple query instructions;
  • the storage unit is used to store the various pre-circle selection results in a temporary result table for users to query.
  • the query output unit includes:
  • the judgment module is used to receive a user's query instruction and determine whether the query instruction is a preset query instruction;
  • the output module is used to directly output the matching pre-circle selection result from the temporary result table when the judgment result is yes, and, when the judgment result is no, to perform logical operations through the integer identifiers on the multiple dimension consumption fields of the supplemented and updated bitmap and output the circle selection result.
  • the beneficial effects of the member data circle selection system for e-commerce platforms provided by the present invention are the same as the beneficial effects of the member data circle selection method for e-commerce platforms provided by the above technical solutions, and are not repeated here.
  • FIG. 1 is a schematic flowchart of a member data circle selection method for an e-commerce platform in Embodiment 1 of the present invention;
  • FIG. 2 is an example diagram of circle selection results obtained by using the member data circle selection method for an e-commerce platform in Embodiment 1 of the present invention;
  • FIG. 3 is an example diagram of the bitmap_table_A statistical representation in Embodiment 1 of the present invention;
  • FIG. 4 is an example diagram of the bitmap_table_B statistical representation in Embodiment 1 of the present invention;
  • FIG. 5 is a structural block diagram of a member data circle selection system for an e-commerce platform in Embodiment 2 of the present invention.
  • reference numerals: 51, data acquisition module; 52, bitmap update module.
  • this embodiment provides a method for circle selection of member data for an e-commerce platform, including:
  • the integer identifiers and the multiple dimension consumption fields of the member consumption data are placed in one-to-one correspondence to generate a bitmap; according to the user's query instruction, bit operations are performed on the multiple dimension consumption fields in the bitmap through the integer identifiers to output the circle selection result.
  • the member data circle selection method for e-commerce platforms first obtains member consumption data from a data warehouse to create a data model.
  • the data model includes member codes, multiple dimension consumption fields, and consumption dates.
  • the member codes are converted one by one into integer identifiers, and the mapping relationship between the integer identifiers and the member codes is stored in the dictionary table; the integer identifiers, consumption fields, and consumption dates are then used to construct a bitmap.
  • after the user's query instruction is obtained, the integer identifiers are called to perform logical bit operations on the multiple dimension consumption fields in the bitmap, finally obtaining the circle selection result.
  • the member code is replaced by an integer identifier and the member consumption data is represented by a bitmap table, so that the circle selection of member data can be obtained solely through set bit operations in the bitmap table, which significantly improves operation efficiency while reducing computing and storage resources. It is especially suitable for circle selection operations on massive member data.
  • after placing the integer identifiers in one-to-one correspondence with the multiple dimension consumption fields of the member consumption data and generating the bitmap, this embodiment further includes: regularly supplementing and updating the field data in the bitmap, and generating a bitmap corresponding to the current time node.
  • the method for periodically supplementing and updating field data in the bitmap in the foregoing embodiment, and generating a bitmap corresponding to the current time node includes:
  • the updated member consumption data is periodically obtained from the data warehouse, and a specified supplement function then performs the supplement operation based on the mapping relationship in the dictionary table, realizing the supplementary update of the field data in the bitmap.
  • for ease of understanding the specific process of the supplementary update, an example is given below.
  • after the field data in the bitmap is regularly supplemented and updated and the bitmap corresponding to the current time node is generated, the method further includes: cleaning the bitmap to eliminate irrelevant dimension field data.
  • this step is equivalent to creating a CUBE model: excluding field data of irrelevant dimensions from the bitmap reduces the cardinality of the data after GROUP BY, which improves query efficiency and speeds up circle selection.
  • before performing bit operations on the multiple dimension consumption fields in the bitmap through the integer identifiers according to the user's query instruction and outputting the circle selection result, this embodiment further includes:
  • presetting a variety of query instructions and performing bit operations on the cleaned bitmap in advance to obtain pre-circle selection results matching the query instructions; storing the pre-circle selection results in a temporary result table for user queries.
  • this embodiment pre-stores commonly used query instructions so that the system can perform bit operations on the cleaned bitmap in advance according to these query instructions, obtain the corresponding pre-circle selection results, and store them in the temporary result table so that they can be called directly when the user queries.
  • the method of performing bit operations on the multiple dimension consumption fields in the bitmap through the integer identifiers according to the user's query instruction and outputting the circle selection result includes:
  • the updated bitmap can be logically calculated based on the query instruction to output the circle selection result; that is, the circle selection result can be output through real-time calculation, which expands the user's query range and supports user-defined multi-dimensional circle selection.
  • this embodiment takes the circle selection process of new and old member consumption data as an example.
  • the member consumption data dated 0826 and 0827 is obtained from the data warehouse to create a data model.
  • the model includes consumption fields of multiple dimensions, such as member code, shopping channel, shopping category, and shopping date.
  • each member code is converted into an integer identifier that is convenient for bitmap table operations; here the identifiers are represented by the letters A, B, C, and D.
  • the consumption data bitmap_table_A of new and old members of different shopping channels are calculated.
  • the bitmap table collection secondly, you need to count the consumption data bitmap_table_B of new and old members whose la
  • Scenario 1: bitmap_table_A, corresponding to new online buyers whose shopping date is 0827, is shown in Figure 3.
  • the circle selection process is in fact an rb_andnot_cardinality bitmap operation on the bitmap set {A, D} and the bitmap set {A, C}.
  • the circle selection result is {D}, and the number of new buyers counted is 1.
  • Scenario 2: bitmap_table_B, corresponding to new online air conditioner buyers whose shopping date is 0827, is shown in Figure 4.
  • to count the new online air conditioner buyers whose shopping date is 0827, the rb_andnot_cardinality bit operation is performed on the bitmap set {C, A} and the bitmap set {C}; the circle selection result is {A}, and the number of new buyers counted is 1.
  • for the new online ice wash buyers whose shopping date is 0827, the rb_andnot_cardinality bit operation is performed on the bitmap set {D} and the empty bitmap set; the circle selection result is {D}, and the number of new buyers counted is 1.
  • bitmap_table_A and bitmap_table_B are combined using rb_and_cardinality operations, that is, the rb_andnot_cardinality bit operation is performed on the bitmap set {A} and the bitmap set {A, C}; the circle selection result is the empty set, and the number of new buyers counted is 0; similarly, to count how many of the new online ice wash buyers whose shopping date is 0827 are new buyers, the rb_andnot_cardinality bitmap operation is performed on the bitmap set {D} and the bitmap set {A, C}; the circle selection result is {D}, and the number of new buyers counted is 1.
  • this embodiment provides a member data circle selection system for an e-commerce platform, including:
  • the data model creation unit 1 is used to synchronize member consumption data from the data warehouse to create multiple data models;
  • the dictionary table creation unit 2 generates multiple different integer identifiers based on the member codes in the data model, and saves the mapping relationship between the member codes and the integer identifiers in the dictionary table;
  • the bitmap generating unit 3 is used to generate a bitmap by one-to-one correspondence between the integer identifiers and the multiple dimension consumption fields of the member consumption data;
  • the query output unit 4 is configured to perform bit operations on the multiple dimension consumption fields in the bitmap through the integer identifiers according to the user's query instruction, and output the circle selection result.
  • the system further includes a bitmap updating unit 5 connected to the bitmap generating unit 3, and the bitmap updating unit 5 is configured to periodically supplement and update the field data in the bitmap to generate a bitmap corresponding to the current time node.
  • the bitmap updating unit 5 includes:
  • the data acquisition module 51 is used to acquire new member consumption data from the data warehouse based on the current time node, and synchronize it to the data model;
  • the bitmap update module 52 is used to periodically supplement the corresponding field data in the newly added member consumption data into the bitmap according to the mapping relationship of the member codes in the dictionary table, so as to realize the supplementary update of the bitmap.
  • the system further includes a data cleaning unit 6 provided between the bitmap generating unit 3 and the query output unit 4;
  • the data cleaning unit 6 is used to clean the bitmap and eliminate irrelevant dimension field data.
  • the system further includes a pre-circle selection unit 7 and a storage unit 8; the input end of the pre-circle selection unit 7 is connected to the output end of the data cleaning unit 6, and the output end of the storage unit 8 is connected to the input end of the query output unit 4;
  • the pre-circle selection unit 7 is used to preset a variety of query instructions, and perform bit operations on the cleaned bit chart in advance to obtain pre-circle selection results that match the multiple query instructions;
  • the storage unit 8 is used to store various pre-circle selection results in a temporary result table for users to query.
  • the query output unit 4 includes:
  • the judgment module 41 is used to receive a user's query instruction and determine whether the query instruction is a preset query instruction;
  • the output module 42 is used to directly output the matching pre-circle selection result from the temporary result table when the judgment result is yes, and, when the judgment result is no, to perform logical operations through the integer identifiers on the multiple dimension consumption fields of the supplemented and updated bitmap and then output the circle selection result.
  • the beneficial effects of the member data circle selection system for e-commerce platforms provided by the embodiment of the present invention are the same as the beneficial effects of the member data circle selection method for e-commerce platforms provided in the first embodiment, and are not repeated here.
  • the above-mentioned inventive method can be implemented by a program instructing relevant hardware.
  • the above-mentioned program can be stored in a computer readable storage medium. When executed, the program performs each step of the method in the foregoing embodiment; the storage medium may be ROM/RAM, a magnetic disk, an optical disk, a memory card, etc.
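The new-buyer scenarios in the definitions above can be reproduced with a short Python sketch. This is only an illustration under stated assumptions: plain Python integers stand in for the bitmap sets, the helper names `to_bitmap`, `andnot_members`, and `andnot_cardinality` are hypothetical, and the helpers merely mimic the behaviour the text attributes to `rb_andnot_cardinality` (members in the first set that are absent from the second).

```python
# Integer identifiers for the four members used in the example (A, B, C, D).
IDS = {"A": 0, "B": 1, "C": 2, "D": 3}

def to_bitmap(members):
    """Encode a collection of member letters as an int with one bit per member."""
    bitmap = 0
    for m in members:
        bitmap |= 1 << IDS[m]
    return bitmap

def andnot_members(left, right):
    """Return the members present in `left` but absent from `right`."""
    diff = left & ~right
    return sorted(m for m, i in IDS.items() if (diff >> i) & 1)

def andnot_cardinality(left, right):
    """Count the members present in `left` but absent from `right`."""
    return bin(left & ~right).count("1")

# Scenario 1: {A, D} and-not {A, C}
scenario_1 = andnot_members(to_bitmap(["A", "D"]), to_bitmap(["A", "C"]))
# Scenario 2, air conditioner: {C, A} and-not {C}
scenario_2 = andnot_members(to_bitmap(["C", "A"]), to_bitmap(["C"]))
# Combined case: {A} and-not {A, C}
combined = andnot_cardinality(to_bitmap(["A"]), to_bitmap(["A", "C"]))
```

Each scenario reduces to a single AND-NOT over two small integers, which is why the text describes the selection as requiring only set bit operations.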

Abstract

Provided are a method and system for clustering member data on an electronic commerce platform, capable of reducing consumption of memory and computation resources while ensuring precision of clustered member consumption data, thereby significantly enhancing the efficiency of clustering member consumption data. The method comprises: synchronizing member consumption data in a data warehouse to create multiple data models; generating multiple mutually different integer identifiers on the basis of member serial numbers in the data model, and storing mapping relationships between the member serial numbers and the integer identifiers in a dictionary table; generating a bitmap on the basis of one-to-one correspondence relationships between the integer identifiers and multiple dimension consumption fields in the member consumption data; and using the integer identifiers to perform, according to a query command of a user, a bit operation with respect to the multiple dimension consumption fields in the bitmap, and outputting a clustered result. The system applies the method proposed in the above solution.

Description

Member data circle selection method and system for e-commerce platform
Technical field
The present invention relates to the technical field of data processing, and in particular to a member data circle selection method and system for an e-commerce platform.
Background
For e-commerce platforms, the analysis result of member consumption data is an important reference for planning promotional activities. In analyzing member consumption data, multi-dimensional circle selection of data is a key step in obtaining member consumption data. In the prior art, commonly used data circle selection methods include the pre-aggregation circle selection method (OLAP-Druid) and the distributed in-memory computing circle selection method (SPARK). The OLAP-Druid method mainly uses the HLL algorithm to calculate and analyze members' consumption data; the algorithm often loses accuracy in deduplication business scenarios, so the circle selection results for member consumption data are inaccurate. The SPARK method obtains the members' original consumption data and, by pulling detail records, places the original consumption data in the memory of distributed nodes for logical calculation, finally obtaining the circle selection result for member consumption data. In practice, because the volume of members' original consumption data is extremely large, the latter method can guarantee the accuracy of the circle selection results but consumes considerable computing and memory resources, so it suffers from low circle selection efficiency and poor user experience.
Summary of the invention
The purpose of the present invention is to provide a member data circle selection method and system for an e-commerce platform that can reduce the consumption of memory and computing resources while ensuring the accuracy of member consumption data circle selection results, significantly improving the circle selection efficiency of member consumption data.
To achieve the above objective, one aspect of the present invention provides a member data circle selection method for an e-commerce platform, including:
synchronizing member consumption data from the data warehouse to create multiple data models;
generating a plurality of mutually different integer identifiers based on the member codes in the data model, and storing the mapping relationship between the member codes and the integer identifiers in a dictionary table;
generating a bitmap by one-to-one correspondence between the integer identifiers and the multiple dimension consumption fields of the member consumption data;
performing bit operations on the multiple dimension consumption fields in the bitmap through the integer identifiers according to the user's query instruction, and outputting the circle selection result.
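The four steps above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: plain Python integers stand in for the bitmaps, and the function names (`build_dictionary`, `build_bitmaps`, `select`), the field names, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict

def build_dictionary(member_codes):
    """Assign each distinct member code a dense integer identifier (0, 1, 2, ...)."""
    dictionary = {}
    for code in member_codes:
        if code not in dictionary:
            dictionary[code] = len(dictionary)
    return dictionary

def build_bitmaps(records, dictionary):
    """Map each (field, value) pair to an int whose set bits are member identifiers."""
    bitmaps = defaultdict(int)
    for rec in records:
        bit = 1 << dictionary[rec["member_code"]]
        for field, value in rec.items():
            if field != "member_code":
                bitmaps[(field, value)] |= bit
    return bitmaps

def select(bitmaps, dictionary, conditions):
    """AND the bitmaps of all (field, value) conditions and decode the member codes."""
    result = (1 << len(dictionary)) - 1  # start from "all members"
    for cond in conditions:
        result &= bitmaps.get(cond, 0)
    reverse = {i: code for code, i in dictionary.items()}
    return sorted(reverse[i] for i in range(len(dictionary)) if (result >> i) & 1)

# Hypothetical sample data: one record per purchase.
records = [
    {"member_code": "M001", "channel": "online", "category": "air_conditioner", "date": "0827"},
    {"member_code": "M002", "channel": "offline", "category": "air_conditioner", "date": "0827"},
    {"member_code": "M003", "channel": "online", "category": "washer", "date": "0826"},
]
dictionary = build_dictionary(r["member_code"] for r in records)
bitmaps = build_bitmaps(records, dictionary)
online_0827 = select(bitmaps, dictionary, [("channel", "online"), ("date", "0827")])
```

One design caveat of this simplified sketch: with one bitmap per single field value, a member matches when each condition holds in some record, not necessarily the same record. Keying the bitmaps by combinations of fields avoids that, at the cost of more bitmaps.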
Preferably, after the step of generating a bitmap by one-to-one correspondence between the integer identifiers and the multiple dimension consumption fields of the member consumption data, the method further includes:
periodically supplementing and updating the field data in the bitmap to generate a bitmap corresponding to the current time node.
Specifically, the step of periodically supplementing and updating the field data in the bitmap to generate a bitmap corresponding to the current time node includes:
obtaining new member consumption data from the data warehouse based on the current time node and synchronizing it to the data model;
periodically supplementing the corresponding field data in the new member consumption data into the bitmap according to the mapping relationship of the member codes in the dictionary table, realizing the supplementary update of the bitmap.
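The supplementary update described above might look like the following sketch. It assumes bitmaps are stored as Python integers keyed by `(field, value)` pairs, and it assumes that a member code not yet in the dictionary table simply receives the next free integer identifier; the source does not spell out how unseen members are handled, and the function name `supplement` is hypothetical.

```python
def supplement(bitmaps, dictionary, new_records):
    """Merge newly synchronized consumption records into the existing bitmaps in place."""
    for rec in new_records:
        code = rec["member_code"]
        # Assumption: unseen members get the next free integer identifier.
        member_id = dictionary.setdefault(code, len(dictionary))
        bit = 1 << member_id
        for field, value in rec.items():
            if field != "member_code":
                key = (field, value)
                bitmaps[key] = bitmaps.get(key, 0) | bit
    return bitmaps

# Existing state as of the previous time node (hypothetical data).
dictionary = {"M001": 0, "M002": 1}
bitmaps = {("date", "0826"): 0b011}

# New records arriving at the current time node.
supplement(bitmaps, dictionary, [
    {"member_code": "M003", "date": "0827"},
    {"member_code": "M001", "date": "0827"},
])
```

Because each update is a bitwise OR, re-applying the same records leaves the bitmaps unchanged, so a periodic job can safely be retried.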
Preferably, after periodically supplementing and updating the field data in the bitmap and generating the bitmap corresponding to the current time node, the method further includes:
cleaning the bitmap to remove irrelevant dimension field data.
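One possible reading of this cleaning step is a roll-up: rows of the bitmap table that differ only in an irrelevant dimension are merged by OR-ing their bitmaps. The sketch below illustrates that interpretation only; the table layout (tuples of dimension values as keys), the function name `drop_dimensions`, and the sample data are assumptions, not details taken from the source.

```python
def drop_dimensions(table, dimensions, keep):
    """Roll a bitmap table up to the `keep` dimensions by OR-ing rows that collapse together."""
    idx = [dimensions.index(d) for d in keep]
    rolled = {}
    for key, bitmap in table.items():
        new_key = tuple(key[i] for i in idx)
        rolled[new_key] = rolled.get(new_key, 0) | bitmap
    return rolled

# Hypothetical bitmap table keyed by (channel, category, date).
dimensions = ("channel", "category", "date")
table = {
    ("online", "air_conditioner", "0827"): 0b0101,
    ("online", "washer", "0827"): 0b1000,
    ("offline", "washer", "0826"): 0b0010,
}

# Treat "category" as irrelevant for the queries of interest.
by_channel_date = drop_dimensions(table, dimensions, keep=("channel", "date"))
```

Here three rows shrink to two, which shows how removing a dimension lowers the number of groups a later query has to scan.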
Further, before the step of performing bit operations on the multiple dimension consumption fields in the bitmap through the integer identifiers according to the user's query instruction and outputting the circle selection result, the method further includes:
presetting a variety of query instructions, and performing bit operations on the cleaned bitmap in advance to obtain pre-circle selection results matching the query instructions;
storing the pre-circle selection results in a temporary result table for user queries.
Preferably, the method of performing bit operations on the multiple dimension consumption fields in the bitmap through the integer identifiers according to the user's query instruction and outputting the circle selection result includes:
receiving a user's query instruction and determining whether the query instruction is a preset query instruction;
when the judgment result is yes, directly outputting the matching pre-circle selection result from the temporary result table; when the judgment result is no, performing logical operations through the integer identifiers on the multiple dimension consumption fields of the supplemented and updated bitmap and then outputting the circle selection result.
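The preset-query branch described above can be sketched as a small lookup in front of the real-time computation. This is an assumed design for illustration: a query is modelled as a list of `(field, value)` conditions combined with AND, the temporary result table is an in-memory dict, and the name `make_query_unit` is hypothetical.

```python
def make_query_unit(bitmaps, preset_queries):
    """Precompute preset queries; answer all other queries by real-time bit operations."""
    def evaluate(conditions):
        values = [bitmaps.get(c, 0) for c in conditions]
        result = values[0] if values else 0
        for v in values[1:]:
            result &= v
        return result

    # Temporary result table: preset query -> pre-circle selection result.
    temp_results = {frozenset(q): evaluate(q) for q in preset_queries}

    def query(conditions):
        key = frozenset(conditions)
        if key in temp_results:          # judgment result: yes
            return temp_results[key], "precomputed"
        return evaluate(conditions), "real-time"  # judgment result: no

    return query

# Hypothetical bitmaps keyed by (field, value).
bitmaps = {
    ("channel", "online"): 0b1101,
    ("date", "0827"): 0b0111,
    ("category", "washer"): 0b1000,
}
query = make_query_unit(bitmaps, [[("channel", "online"), ("date", "0827")]])
hit = query([("date", "0827"), ("channel", "online")])
miss = query([("channel", "online"), ("category", "washer")])
```

Using a `frozenset` as the key makes the lookup independent of the order in which the user lists the conditions. A real system would also need to refresh the temporary result table whenever the bitmaps are updated, which this sketch omits.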
Compared with the prior art, the member data circle selection method for e-commerce platforms provided by the present invention has the following beneficial effects:
The member data circle selection method for e-commerce platforms provided by the present invention first obtains member consumption data from a data warehouse to create a data model, where the data model includes member codes, multiple dimension consumption fields, and consumption dates. The member codes are converted one by one into integer identifiers, and the mapping relationship between the integer identifiers and the member codes is stored in the dictionary table. The integer identifiers, consumption fields, and consumption dates are then used to construct a bitmap. After the user's query instruction is obtained, the integer identifiers are called to perform logical bit operations on the multiple dimension consumption fields in the bitmap, finally obtaining the circle selection result.
It can be seen that, with the member data circle selection method for e-commerce platforms provided by the present invention, the member code is replaced by an integer identifier and the member consumption data is represented by a bitmap (bitmap table), so that the circle selection of member data can be obtained solely through set bit operations in the bitmap table. This significantly improves operation efficiency while reducing computing and storage resources, and is especially suitable for circle selection operations on massive member data.
Another aspect of the present invention provides a member data circle selection system for an e-commerce platform, applied to the member data circle selection method for an e-commerce platform described in the above technical solution, the system comprising:
a data model creation unit, configured to synchronize member consumption data from a data warehouse to create a multi-data model;
a dictionary table creation unit, configured to generate a plurality of mutually distinct integer identifiers based on the member codes in the data model, and to store the mapping relationship between the member codes and the integer identifiers in a dictionary table;
a bitmap table generation unit, configured to generate a bitmap table by placing the integer identifiers in one-to-one correspondence with consumption fields of multiple dimensions of the member consumption data;
a query output unit, configured to perform, according to a user's query instruction, bit operations on the consumption fields of multiple dimensions in the bitmap table through the integer identifiers, and to output a circle selection result.
Preferably, the system further comprises a bitmap table update unit connected to the bitmap table generation unit;
the bitmap table update unit is configured to periodically supplement and update the field data in the bitmap table to generate a bitmap table corresponding to the current time node.
Preferably, the bitmap table update unit comprises:
a data acquisition module, configured to acquire newly added member consumption data from the data warehouse based on the current time node and synchronize it to the data model;
a bitmap table update module, configured to periodically supplement, according to the mapping relationship of the member codes in the dictionary table, the corresponding field data in the newly added member consumption data into the bitmap table, thereby achieving the supplementary update of the bitmap table.
Preferably, the system further comprises a data cleaning unit arranged between the bitmap table generation unit and the query output unit;
the data cleaning unit is configured to clean the bitmap table and remove field data of irrelevant dimensions.
Preferably, the system further comprises a pre-circle-selection unit and a storage unit, where the input of the pre-circle-selection unit is connected to the output of the data cleaning unit, and the output of the storage unit is connected to the input of the query output unit;
the pre-circle-selection unit is configured to preset a plurality of query instructions and to perform bit operations on the cleaned bitmap table in advance, obtaining pre-circle-selection results that match the plurality of query instructions;
the storage unit is configured to store the plurality of pre-circle-selection results in a temporary result table for user queries.
Preferably, the query output unit comprises:
a judgment module, configured to receive a user's query instruction and determine whether the query instruction is a preset query instruction;
an output module, configured to output the matching pre-circle-selection result directly from the temporary result table when the judgment result is yes, and, when the judgment result is no, to perform logical operations, through the integer identifiers, on the consumption fields of multiple dimensions in the supplemented and updated bitmap table and then output the circle selection result.
Compared with the prior art, the beneficial effects of the member data circle selection system for an e-commerce platform provided by the present invention are the same as those of the member data circle selection method provided by the above technical solution, and are not repeated here.
Brief Description of the Drawings
The drawings described here provide a further understanding of the present invention and constitute a part of it. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
FIG. 1 is a schematic flowchart of the member data circle selection method for an e-commerce platform in Embodiment 1 of the present invention;
FIG. 2 is an example diagram of circle selection results obtained with the member data circle selection method for an e-commerce platform in Embodiment 1 of the present invention;
FIG. 3 is an example diagram of the bitmap_table_A statistics table in Embodiment 1 of the present invention;
FIG. 4 is an example diagram of the bitmap_table_B statistics table in Embodiment 1 of the present invention;
FIG. 5 is a structural block diagram of the member data circle selection system for an e-commerce platform in Embodiment 2 of the present invention.
Reference numerals:
1 - data model creation unit,                     2 - dictionary table creation unit;
3 - bitmap table generation unit,                 4 - query output unit;
5 - bitmap table update unit,                     6 - data cleaning unit;
7 - pre-circle-selection unit,                    8 - storage unit;
41 - judgment module,                             42 - output module;
51 - data acquisition module,                     52 - bitmap table update module.
Detailed Description
To make the above objectives, features, and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment 1
Referring to FIG. 1, this embodiment provides a member data circle selection method for an e-commerce platform, comprising:
synchronizing member consumption data from a data warehouse to create a multi-data model; generating a plurality of mutually distinct integer identifiers based on the member codes in the data model, and storing the mapping relationship between the member codes and the integer identifiers in a dictionary table; generating a bitmap table by placing the integer identifiers in one-to-one correspondence with consumption fields of multiple dimensions of the member consumption data; and performing, according to a user's query instruction, bit operations on the consumption fields of multiple dimensions in the bitmap table through the integer identifiers, and outputting a circle selection result.
In the member data circle selection method provided by this embodiment, member consumption data is first obtained from the data warehouse to create a data model, where the data model includes member codes, consumption fields of multiple dimensions, and consumption dates. The member codes are converted one by one into integer identifiers, and the mapping relationship between the integer identifiers and the member codes is stored in the dictionary table. A bitmap table is then built from the integer identifiers, the consumption fields, and the consumption dates. After the user's query instruction is received, the integer identifiers are used to perform logical bit operations on the consumption fields of multiple dimensions in the bitmap table, which yields the circle selection result.
It can be seen that, with the method provided by this embodiment, member codes are replaced by integer identifiers and member consumption data is represented in a bitmap table, so that circle selection of member data requires only set bit operations on the bitmap table. This reduces computing and storage resources while significantly improving computational efficiency, and is particularly suitable for circle selection over massive member data.
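The flow described above (dictionary encoding, bitmap construction, bitwise query) can be illustrated with a minimal Python sketch. It is not the implementation of this embodiment: plain Python integers stand in for the bitmap type, and the record layout, the member codes such as `M-1001`, and the function names are all hypothetical.

```python
from collections import defaultdict


def build_dictionary(member_codes):
    """Assign each distinct member code a dense integer identifier (hypothetical helper)."""
    dictionary = {}
    for code in member_codes:
        if code not in dictionary:
            dictionary[code] = len(dictionary)
    return dictionary


def build_bitmaps(records, dictionary):
    """For every (field, value) pair, set the bit whose position is the member's identifier."""
    bitmaps = defaultdict(int)
    for rec in records:
        bit = 1 << dictionary[rec["member_code"]]
        for field in ("channel", "category", "date"):
            bitmaps[(field, rec[field])] |= bit
    return bitmaps


def decode(bitmap, dictionary):
    """Translate the set bits of a bitmap back into member codes."""
    reverse = {ident: code for code, ident in dictionary.items()}
    return sorted(reverse[i] for i in range(bitmap.bit_length()) if (bitmap >> i) & 1)


# Made-up sample data, for illustration only.
records = [
    {"member_code": "M-1001", "channel": "online", "category": "air_conditioner", "date": "0827"},
    {"member_code": "M-1002", "channel": "offline", "category": "air_conditioner", "date": "0827"},
    {"member_code": "M-1003", "channel": "online", "category": "washer", "date": "0826"},
]
dictionary = build_dictionary(r["member_code"] for r in records)
bitmaps = build_bitmaps(records, dictionary)

# Circle selection "online AND bought on 0827" is a single bitwise AND.
selected = bitmaps[("channel", "online")] & bitmaps[("date", "0827")]
result = decode(selected, dictionary)
```

The query touches only two integers regardless of how many consumption records were loaded, which is the efficiency argument the text makes for the bitmap representation.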
Still referring to FIG. 1, member consumption data is updated continuously every day. To keep the bitmap table data from lagging behind, this embodiment further comprises, after the step of generating a bitmap table by placing the integer identifiers in one-to-one correspondence with the consumption fields of multiple dimensions of the member consumption data: periodically supplementing and updating the field data in the bitmap table to generate a bitmap table corresponding to the current time node.
Specifically, in the above embodiment, the method of periodically supplementing and updating the field data in the bitmap table to generate a bitmap table corresponding to the current time node comprises:
acquiring newly added member consumption data from the data warehouse based on the current time node and synchronizing it to the data model; and periodically supplementing, according to the mapping relationship of the member codes in the dictionary table, the corresponding field data in the newly added member consumption data into the bitmap table, thereby achieving the supplementary update of the bitmap table.
In a specific implementation, updated member consumption data is obtained from the data warehouse at regular intervals, and a designated supplement function then performs the supplement operation based on the mapping relationship in the dictionary table, so that the field data in the bitmap table is supplemented and updated. To make the process easier to follow, an example is given below. First, the updated member consumption data is obtained from the data model, and the member codes whose consumption data has been updated are identified. The member codes are then converted through the dictionary table into the matching integer identifiers. Next, the consumption field data updated on the current day (flag=1) that corresponds to those integer identifiers is obtained and stored in the bitmap table. The consumption field data counted for the current day (flag=1) is merged with the consumption field data counted up to the previous day (flag=2), and the merged data is inserted into the bitmap table as the current (flag=2) consumption field data, which completes the supplementary update of the bitmap table. Note that flag=1 denotes only the consumption field data updated on the current day, whereas flag=2 denotes all current consumption field data, comprising the data updated on the current day and all consumption field data from before that day. As the above process shows, this embodiment does not need to recompute all historical data; it only needs to keep updating the consumption field data in the bitmap table by superposition and merging, which reduces the amount of computation while keeping the circle selection results accurate.
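The flag=1 / flag=2 bookkeeping described above can be sketched as follows. This is an illustrative reading, not the designated supplement function of the embodiment: the table is modeled as a Python dict keyed by (dimension value, flag, date), integers serve as bitmaps, and the name `supplement` and the identifiers 0 to 3 (standing for members A to D) are assumptions.

```python
def supplement(table, key, day, prev_day, new_ids):
    """Store the day's bitmap (flag=1) and the merged cumulative bitmap (flag=2)."""
    daily = 0
    for member_id in new_ids:
        daily |= 1 << member_id
    # Cumulative bitmap up to the previous day; empty if none exists yet.
    previous_total = table.get((key, 2, prev_day), 0)
    table[(key, 1, day)] = daily
    # Merging is a single OR, so older history is never rescanned.
    table[(key, 2, day)] = previous_total | daily
    return table


table = {}
supplement(table, "online", "0826", "0825", [0, 2])  # members A and C buy online on 0826
supplement(table, "online", "0827", "0826", [0, 3])  # members A and D buy online on 0827
```

After the second call, the flag=2 entry for 0827 holds the union of both days, while each flag=1 entry still records only that day's buyers.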
Optionally, still referring to FIG. 1, the above embodiment further comprises, after the step of periodically supplementing and updating the field data in the bitmap table to generate a bitmap table corresponding to the current time node: cleaning the bitmap table to remove field data of irrelevant dimensions.
In a specific implementation, this step is equivalent to creating a CUBE model. Removing the field data of irrelevant dimensions from the bitmap table lowers the data cardinality after GROUP BY, which improves query efficiency and speeds up circle selection.
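One plausible reading of this cleaning step is a roll-up: bitmaps that differ only in a dropped dimension are merged with OR, so fewer groups remain. The sketch below assumes that reading; the `store` dimension, the key layout, and the function name `roll_up` are hypothetical and do not come from the embodiment.

```python
from collections import defaultdict


def roll_up(bitmaps, keep):
    """Merge bitmaps whose keys agree on the dimensions listed in `keep`."""
    rolled = defaultdict(int)
    for key, bitmap in bitmaps.items():
        # Drop every (dimension, value) pair whose dimension is not kept.
        reduced = tuple((dim, val) for dim, val in key if dim in keep)
        rolled[reduced] |= bitmap
    return dict(rolled)


# Three detailed groups; "store" plays the role of an irrelevant dimension.
detailed = {
    (("channel", "online"), ("category", "air_conditioner"), ("store", "S1")): 0b0001,
    (("channel", "online"), ("category", "air_conditioner"), ("store", "S2")): 0b0100,
    (("channel", "online"), ("category", "washer"), ("store", "S1")): 0b1000,
}
cube = roll_up(detailed, keep={"channel", "category"})
```

Here three groups collapse into two, which is the kind of cardinality reduction the text attributes to the cleaning step.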
To further speed up circle selection, this embodiment further comprises, before the step of performing bit operations on the consumption fields of multiple dimensions in the bitmap table through the integer identifiers according to the user's query instruction and outputting the circle selection result:
presetting a plurality of query instructions, and performing bit operations on the cleaned bitmap table in advance to obtain pre-circle-selection results that match the plurality of query instructions; and storing the plurality of pre-circle-selection results in a temporary result table for user queries.
In a specific implementation, the volume of field data in the bitmap table is very large, so computing results in real time would delay the output of the circle selection result. For this reason, this embodiment stores commonly used query instructions in advance, so that the system can perform bit operations on the cleaned bitmap table ahead of time according to these query instructions, obtain the corresponding pre-circle-selection results, and store them in the temporary result table, from which they can be retrieved directly when the user queries.
Further, in the above embodiment, the step of performing bit operations on the consumption fields of multiple dimensions in the bitmap table through the integer identifiers according to the user's query instruction and outputting the circle selection result comprises:
receiving the user's query instruction and determining whether the query instruction is a pre-stored query instruction; when the judgment result is yes, outputting the matching pre-circle-selection result directly from the temporary result table; and when the judgment result is no, performing logical operations, through the integer identifiers, on the consumption fields of multiple dimensions in the supplemented and updated bitmap table and then outputting the circle selection result. With these two computation modes, when the query instruction issued by the user matches a pre-stored query instruction, the pre-circle-selection result is retrieved directly from the temporary result table and output, which shortens the computation waiting time. When the query instruction issued by the user does not match any pre-stored query instruction, logical operations are performed on the supplemented and updated bitmap table directly according to the query instruction, that is, the circle selection result is computed in real time. This widens the range of queries available to the user and supports user-defined multi-dimensional circle selection.
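The two computation modes above can be sketched as a lookup followed by a fallback. The sketch assumes that a query instruction can be modeled as a tuple of bitmap keys to be ANDed together and that the temporary result table behaves like a dict; both are simplifications, and the names `evaluate`, `precompute`, and `answer` are hypothetical.

```python
from functools import reduce
from operator import and_


def evaluate(query, bitmaps):
    """AND together the bitmaps named by the query (a non-empty tuple of keys)."""
    return reduce(and_, (bitmaps[key] for key in query))


def precompute(preset_queries, bitmaps):
    """Build the temporary result table for the preset query instructions."""
    return {query: evaluate(query, bitmaps) for query in preset_queries}


def answer(query, temp_results, bitmaps):
    """Return the stored result when the query is preset, else compute it in real time."""
    if query in temp_results:
        return temp_results[query], "precomputed"
    return evaluate(query, bitmaps), "real-time"


# Made-up bitmaps, for illustration only.
bitmaps = {"online": 0b1101, "0827": 0b1001, "air_conditioner": 0b0101}
temp_results = precompute([("online", "0827")], bitmaps)

hit = answer(("online", "0827"), temp_results, bitmaps)
miss = answer(("online", "air_conditioner"), temp_results, bitmaps)
```

In this sketch the temporary results would need to be rebuilt after each supplementary update of the bitmap table; the text does not say how the embodiment handles that.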
For ease of understanding, this embodiment takes the circle selection of new and old member consumption data as an example. As shown in FIG. 2, member consumption data dated 0826 and 0827 is obtained from the data warehouse to create a data model that includes consumption fields of multiple dimensions, such as member code, shopping channel, shopping category, and shopping date. The mapping relationship in the dictionary table is used to convert the member codes into integer identifiers suited to bitmap table operations; for ease of reading they are denoted here by the letters A, B, C, and D. After the bitmap table is cleaned, the daily consumption data of new and old members in each shopping channel is collected as bitmap_table_A, whose dimensions are shopping channel (online or offline) + tag (flag=1 or flag=2) + shopping date (0826 or 0827). Each day yields two records: one is the bitmap set of that day's members (flag=1), and the other is the bitmap set of the current members (flag=2). In addition, the consumption data of new and old members is collected as bitmap_table_B, whose dimensions are shopping category + shopping channel + tag (flag=1 or flag=2) + shopping date (0826 or 0827). Again, each day yields two records: the bitmap set of that day's members (flag=1) and the bitmap set of the current members (flag=2). Circle selection is then performed for the following three scenarios according to the query instruction func():
Scenario 1: new online buyers whose shopping date is 0827. The corresponding bitmap_table_A is shown in FIG. 3. The circle selection amounts to applying the rb_andnot_cardinality bit operation to bitmap set {A, D} and bitmap set {A, C}; the circle selection result is {D}, and the number of new buyers is 1.
Scenario 2: new online air conditioner buyers whose shopping date is 0827. The corresponding bitmap_table_B is shown in FIG. 4. Counting the new online air conditioner buyers whose shopping date is 0827 amounts to applying the rb_andnot_cardinality bit operation to bitmap set {C, A} and bitmap set {C}; the circle selection result is {A}, and the number of new buyers is 1. Similarly, counting the new online refrigerator and washing machine buyers whose shopping date is 0827 amounts to applying the rb_andnot_cardinality bit operation to bitmap set {D} and the empty bitmap set; the circle selection result is {D}, and the number of new buyers is 1.
Scenario 3: how many of the new online air conditioner buyers whose shopping date is 0827 are also new online buyers. This amounts to an rb_and_cardinality operation across bitmap_table_A and bitmap_table_B, that is, applying the rb_andnot_cardinality bit operation to bitmap set {A} and bitmap set {A, C}; the circle selection result is the empty set, and the number of new buyers is 0. Similarly, counting how many of the new online refrigerator and washing machine buyers whose shopping date is 0827 are also new online buyers amounts to applying the rb_andnot_cardinality bit operation to bitmap set {D} and bitmap set {A, C}; the circle selection result is {D}, and the number of new buyers is 1.
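The three scenarios can be replayed with Python sets standing in for the bitmap sets. In this sketch the first operand of each operation is read as the 0827 day set (flag=1) and the second as the cumulative set up to 0826 (flag=2); that reading of FIG. 3 and FIG. 4 is an assumption, and `andnot` and `andnot_cardinality` are local stand-ins for the operation the text calls rb_andnot_cardinality, not calls into any real library.

```python
def andnot(left, right):
    """Members present in `left` but absent from `right`."""
    return left - right


def andnot_cardinality(left, right):
    """Number of members present in `left` but absent from `right`."""
    return len(andnot(left, right))


# Scenario 1: online buyers on 0827 versus online buyers up to 0826.
online_day_0827 = {"A", "D"}
online_total_0826 = {"A", "C"}
scenario_1 = andnot(online_day_0827, online_total_0826)

# Scenario 2: the same comparison within a single shopping category.
aircon_new = andnot({"C", "A"}, {"C"})
fridge_washer_new = andnot({"D"}, set())

# Scenario 3: category-level new buyers who are also new to the online channel.
aircon_and_channel_new = andnot(aircon_new, online_total_0826)
fridge_washer_and_channel_new = andnot(fridge_washer_new, online_total_0826)

counts = [
    andnot_cardinality(online_day_0827, online_total_0826),
    len(aircon_new),
    len(fridge_washer_new),
    len(aircon_and_channel_new),
    len(fridge_washer_and_channel_new),
]
```

The resulting sets and counts match the values stated in the three scenarios: {D}, {A}, {D}, the empty set, and {D}.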
Embodiment 2
Referring to FIG. 1 and FIG. 5, this embodiment provides a member data circle selection system for an e-commerce platform, comprising:
a data model creation unit 1, configured to synchronize member consumption data from a data warehouse to create a multi-data model;
a dictionary table creation unit 2, configured to generate a plurality of mutually distinct integer identifiers based on the member codes in the data model, and to store the mapping relationship between the member codes and the integer identifiers in a dictionary table;
a bitmap table generation unit 3, configured to generate a bitmap table by placing the integer identifiers in one-to-one correspondence with consumption fields of multiple dimensions of the member consumption data;
a query output unit 4, configured to perform, according to a user's query instruction, bit operations on the consumption fields of multiple dimensions in the bitmap table through the integer identifiers, and to output a circle selection result.
Preferably, the system further comprises a bitmap table update unit 5 connected to the bitmap table generation unit 3; the bitmap table update unit 5 is configured to periodically supplement and update the field data in the bitmap table to generate a bitmap table corresponding to the current time node.
Preferably, the bitmap table update unit 5 comprises:
a data acquisition module 51, configured to acquire newly added member consumption data from the data warehouse based on the current time node and synchronize it to the data model;
a bitmap table update module 52, configured to periodically supplement, according to the mapping relationship of the member codes in the dictionary table, the corresponding field data in the newly added member consumption data into the bitmap table, thereby achieving the supplementary update of the bitmap table.
Preferably, the system further comprises a data cleaning unit 6 arranged between the bitmap table generation unit 3 and the query output unit 4;
the data cleaning unit 6 is configured to clean the bitmap table and remove field data of irrelevant dimensions.
Preferably, the system further comprises a pre-circle-selection unit 7 and a storage unit 8, where the input of the pre-circle-selection unit 7 is connected to the output of the data cleaning unit 6, and the output of the storage unit 8 is connected to the input of the query output unit 4;
the pre-circle-selection unit 7 is configured to preset a plurality of query instructions and to perform bit operations on the cleaned bitmap table in advance, obtaining pre-circle-selection results that match the plurality of query instructions;
the storage unit 8 is configured to store the plurality of pre-circle-selection results in a temporary result table for user queries.
Preferably, the query output unit 4 comprises:
a judgment module 41, configured to receive a user's query instruction and determine whether the query instruction is a preset query instruction;
an output module 42, configured to output the matching pre-circle-selection result directly from the temporary result table when the judgment result is yes, and, when the judgment result is no, to perform logical operations, through the integer identifiers, on the consumption fields of multiple dimensions in the supplemented and updated bitmap table and then output the circle selection result.
Compared with the prior art, the beneficial effects of the member data circle selection system for an e-commerce platform provided by this embodiment are the same as those of the member data circle selection method provided by Embodiment 1, and are not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the above method may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method of the above embodiments. The storage medium may be a ROM/RAM, a magnetic disk, an optical disc, a memory card, or the like.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.

Claims (12)

  1. A member data circle selection method for an e-commerce platform, characterized by comprising:
    synchronizing member consumption data from a data warehouse to create a multi-data model;
    generating a plurality of mutually distinct integer identifiers based on the member codes in the data model, and storing the mapping relationship between the member codes and the integer identifiers in a dictionary table;
    generating a bitmap table by placing the integer identifiers in one-to-one correspondence with consumption fields of multiple dimensions of the member consumption data;
    performing, according to a user's query instruction, bit operations on the consumption fields of multiple dimensions in the bitmap table through the integer identifiers, and outputting a circle selection result.
  2. The method according to claim 1, characterized in that, after the step of generating a bitmap table by placing the integer identifiers in one-to-one correspondence with the consumption fields of multiple dimensions of the member consumption data, the method further comprises:
    periodically supplementing and updating the field data in the bitmap table to generate a bitmap table corresponding to the current time node.
  3. The method according to claim 2, characterized in that the step of periodically supplementing and updating the field data in the bitmap table to generate a bitmap table corresponding to the current time node comprises:
    acquiring newly added member consumption data from the data warehouse based on the current time node and synchronizing it to the data model;
    periodically supplementing, according to the mapping relationship of the member codes in the dictionary table, the corresponding field data in the newly added member consumption data into the bitmap table, thereby achieving the supplementary update of the bitmap table.
  4. The method according to claim 2, characterized in that, after the step of periodically supplementing and updating the field data in the bitmap table to generate a bitmap table corresponding to the current time node, the method further comprises:
    cleaning the bitmap table to remove field data of irrelevant dimensions.
  5. The method according to claim 4, characterized in that, before the step of performing bit operations on the consumption fields of multiple dimensions in the bitmap table through the integer identifiers according to the user's query instruction and outputting the circle selection result, the method further comprises:
    presetting a plurality of query instructions, and performing bit operations on the cleaned bitmap table in advance to obtain pre-circle-selection results that match the plurality of query instructions;
    storing the plurality of pre-circle-selection results in a temporary result table for user queries.
  6. The method according to claim 5, characterized in that the step of performing bit operations on the consumption fields of multiple dimensions in the bitmap table through the integer identifiers according to the user's query instruction and outputting the circle selection result comprises:
    receiving the user's query instruction, and determining whether the query instruction is a preset query instruction;
    when the judgment result is yes, outputting the matching pre-circle-selection result directly from the temporary result table, and when the judgment result is no, performing logical operations, through the integer identifiers, on the consumption fields of multiple dimensions in the supplemented and updated bitmap table and then outputting the circle selection result.
  7. A member data selection system for an e-commerce platform, characterized by comprising:
    a data model creation unit, configured to synchronize member consumption data from a data warehouse to create multiple data models;
    a dictionary table creation unit, configured to generate multiple mutually distinct integer identifiers based on the member codes in the data models, and to save the mapping between member codes and integer identifiers in a dictionary table;
    a bitmap table generation unit, configured to generate a bitmap table by placing the integer identifiers in one-to-one correspondence with the multiple dimension consumption fields of the member consumption data;
    a query output unit, configured to perform bit operations on the multiple dimension consumption fields in the bitmap table by means of the integer identifiers according to a user's query instruction, and to output the selection result.
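The units of claim 7 can be illustrated with a minimal sketch. It is not the applicant's implementation: the class name `MemberBitmaps`, the use of one Python integer per (field, value) pair as a bitmap, and the sample member codes and fields are all assumptions made for illustration.

```python
from collections import defaultdict


class MemberBitmaps:
    """Toy model: one Python int per (field, value) pair serves as a bitmap."""

    def __init__(self):
        self.dictionary = {}             # dictionary table: member code -> integer identifier
        self.bitmaps = defaultdict(int)  # bitmap table: (field, value) -> bitmap

    def member_id(self, code):
        # Hand out the next unused integer, so identifiers are pairwise distinct.
        return self.dictionary.setdefault(code, len(self.dictionary))

    def add(self, code, field, value):
        # Set the member's bit in the bitmap of this dimension field value.
        self.bitmaps[(field, value)] |= 1 << self.member_id(code)

    def members(self, bitmap):
        # Map the set bits back to member codes through the dictionary table.
        return sorted(c for c, i in self.dictionary.items() if (bitmap >> i) & 1)


store = MemberBitmaps()
store.add("M001", "category", "phone")
store.add("M002", "category", "phone")
store.add("M002", "channel", "app")
store.add("M003", "channel", "app")

# Query: members who bought phones AND ordered through the app.
both = store.bitmaps[("category", "phone")] & store.bitmaps[("channel", "app")]
print(store.members(both))  # ['M002']
```

A bitwise OR of the same two bitmaps would instead select members matching either condition. A production system would more likely use a compressed bitmap structure than plain integers.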
  8. The system according to claim 7, characterized by further comprising a bitmap table update unit connected to the bitmap table generation unit;
    the bitmap table update unit is configured to periodically supplement and update the field data in the bitmap table, generating a bitmap table corresponding to the current time node.
  9. The system according to claim 8, characterized in that the bitmap table update unit comprises:
    a data acquisition module, configured to acquire newly added member consumption data from the data warehouse based on the current time node and synchronize it to the data models;
    a bitmap table update module, configured to periodically add the corresponding field data of the newly added member consumption data to the bitmap table according to the member code mapping in the dictionary table, thereby supplementing and updating the bitmap table.
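The supplementing update of claims 8 and 9 might look like the following sketch. The function name `apply_increment`, the row layout `(member code, field, value)`, and the sample data are hypothetical; the claims do not specify how new members receive identifiers, so assigning the next unused integer is an assumption.

```python
def apply_increment(dictionary, bitmaps, new_rows):
    """Merge newly added consumption rows into existing bitmaps in place.

    dictionary: member code -> integer identifier (the dictionary table)
    bitmaps:    (field, value) -> int used as a bitmap (the bitmap table)
    new_rows:   iterable of (member code, field, value) tuples
    """
    for code, field, value in new_rows:
        # Reuse the existing identifier, or assign the next unused one (assumed policy).
        member_id = dictionary.setdefault(code, len(dictionary))
        key = (field, value)
        # OR-ing keeps every bit that was already set, so the update is additive.
        bitmaps[key] = bitmaps.get(key, 0) | (1 << member_id)
    return bitmaps


dictionary = {"M001": 0, "M002": 1}
bitmaps = {("category", "phone"): 0b01}
apply_increment(
    dictionary,
    bitmaps,
    [("M002", "category", "phone"), ("M003", "category", "laptop")],
)
print(bin(bitmaps[("category", "phone")]))   # 0b11
print(bin(bitmaps[("category", "laptop")]))  # 0b100
```

Because the update only sets bits, applying the same batch twice leaves the bitmaps unchanged.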
  10. The system according to claim 8, characterized by further comprising a data cleaning unit arranged between the bitmap table generation unit and the query output unit;
    the data cleaning unit is configured to clean the bitmap table to remove field data of irrelevant dimensions.
  11. The system according to claim 10, characterized by further comprising a pre-selection unit and a storage unit, wherein the input of the pre-selection unit is connected to the output of the data cleaning unit, and the output of the storage unit is connected to the input of the query output unit;
    the pre-selection unit is configured to preset multiple query instructions and to perform bit operations on the cleaned bitmap table in advance to obtain pre-selection results matching the preset query instructions;
    the storage unit is configured to store the pre-selection results in a temporary result table for user queries.
  12. The system according to claim 11, characterized in that the query output unit comprises:
    a judgment module, configured to receive the user's query instruction and determine whether the query instruction is a preset query instruction;
    an output module, configured to, if so, directly match and output the corresponding pre-selection result from the temporary result table, and, if not, perform logical operations, by means of the integer identifiers, on the multiple dimension consumption fields in the supplemented and updated bitmap table and then output the selection result.
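The pre-selection flow of claims 5, 6, 11 and 12 can be sketched as a lookup in a table of precomputed results with a fallback to live bit operations. This is an illustration under assumptions: a query is modeled as a list of (field, value) keys combined with AND only, and the names `run_query`, `precompute` and `answer` are hypothetical.

```python
def run_query(bitmaps, keys):
    """AND together the bitmaps for the given (field, value) keys."""
    keys = list(keys)
    if not keys:
        return 0
    result = bitmaps.get(keys[0], 0)
    for key in keys[1:]:
        result &= bitmaps.get(key, 0)
    return result


def precompute(bitmaps, preset_queries):
    """Build the temporary result table: preset query -> pre-selection result."""
    # frozenset makes the lookup independent of the order of conditions.
    return {frozenset(q): run_query(bitmaps, q) for q in preset_queries}


def answer(bitmaps, temp_results, query):
    """Serve a preset query from the table, otherwise compute it on the bitmaps."""
    key = frozenset(query)
    if key in temp_results:
        return temp_results[key], "precomputed"
    return run_query(bitmaps, query), "computed"


phone, app, city = ("category", "phone"), ("channel", "app"), ("city", "nanjing")
bitmaps = {phone: 0b011, app: 0b110, city: 0b101}
temp_results = precompute(bitmaps, [[phone, app]])

print(answer(bitmaps, temp_results, [app, phone]))  # (2, 'precomputed')
print(answer(bitmaps, temp_results, [app, city]))   # (4, 'computed')
```

The claims do not say when the temporary result table is refreshed; a real system would presumably need to recompute it after each bitmap table update so that pre-selection results do not go stale.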
PCT/CN2019/106863 2019-01-16 2019-09-20 Method and system for clustering member data on electronic commerce platform WO2020147335A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3168300A CA3168300A1 (en) 2019-01-16 2019-09-20 Method for selecting member data for e-commerce platforms and system thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910040702.5A CN111444165B (en) 2019-01-16 2019-01-16 Member data selection method and system for e-commerce platform
CN201910040702.5 2019-01-16

Publications (1)

Publication Number Publication Date
WO2020147335A1 (en) 2020-07-23

Family

ID=71614009

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/106863 WO2020147335A1 (en) 2019-01-16 2019-09-20 Method and system for clustering member data on electronic commerce platform

Country Status (3)

Country Link
CN (1) CN111444165B (en)
CA (1) CA3168300A1 (en)
WO (1) WO2020147335A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112540972A (en) * 2020-12-16 2021-03-23 中盈优创资讯科技有限公司 Roaring bitmap-based massive user efficient selection method and device
CN115982206B (en) * 2023-02-09 2023-08-29 中国证券登记结算有限责任公司 Method and device for processing data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663114A (en) * 2012-04-17 2012-09-12 中国人民大学 Database inquiry processing method facing concurrency OLAP (On Line Analytical Processing)
CN104715073A (en) * 2015-04-03 2015-06-17 江苏物联网研究发展中心 Association rule mining system based on improved Apriori algorithm
CN105260442A (en) * 2015-10-08 2016-01-20 西安培华学院 Bit operation and inverted index based association rule mining algorithm
CN107273483A (en) * 2017-06-06 2017-10-20 贵州易鲸捷信息技术有限公司 The access method and system of sparse data
CN107291842A (en) * 2017-06-01 2017-10-24 武汉理工大学 The track querying method encoded based on track
US10009832B1 (en) * 2017-08-11 2018-06-26 At&T Intellectual Property I, L.P. Facilitating compact signaling design for reserved resource configuration in wireless communication systems

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108269107B (en) * 2016-12-30 2021-12-14 阿里巴巴集团控股有限公司 User information processing method and device
CN106934636A (en) * 2017-02-28 2017-07-07 杭州搜娱科技有限公司 Integrated management approach and system
CN108415978B (en) * 2018-02-09 2021-04-09 北京腾云天下科技有限公司 User tag storage method, user portrait calculation method and calculation equipment

Also Published As

Publication number Publication date
CA3168300A1 (en) 2020-07-23
CN111444165A (en) 2020-07-24
CN111444165B (en) 2022-12-02

Similar Documents

Publication Publication Date Title
EP3380954B1 (en) Storing and retrieving data of a data cube
US20160173122A1 (en) System That Reconfigures Usage of a Storage Device and Method Thereof
CN105808653B (en) A kind of data processing method and device based on user tag system
WO2020147335A1 (en) Method and system for clustering member data on electronic commerce platform
CN105787058B (en) A kind of user tag system and the data delivery system based on user tag system
JP7098327B2 (en) Information processing system, function creation method and function creation program
JP2018116706A (en) Data multidimensional model generation system and data multidimensional model generation method
CN112818048A (en) Hierarchical construction method and device of data warehouse, electronic equipment and storage medium
US20230153281A1 (en) Maintaining a dataset based on periodic cleansing of raw source data
CN110727690A (en) Data updating method
CN107735781B (en) Method and device for storing query result and computing equipment
CN108009847B (en) Method for extracting imbedding characteristics of shop under takeaway scene
CN111737537B (en) POI recommendation method, device and medium based on graph database
CN114385663B (en) Data processing method and device
CN115345678A (en) Freight rate determination method and related device
Zhang et al. A new parameter reduction method based on soft set theory
CN104573095A (en) Large-scale object recognition method based on Hadoop frame
CN111652281B (en) Information data classification method, device and readable storage medium
CN114022188A (en) Target crowd circling method, device, equipment and storage medium
CN112269806B (en) Data query method, device, equipment and computer storage medium
CN111107493A (en) Method and system for predicting position of mobile user
CN112199583B (en) Network public opinion information intelligent processing method and system based on multi-rule association analysis
CN107249029A (en) Actively get method, working node, system and the storage medium of task
JP3006132B2 (en) Large-scale graph decomposition device
CN107609746B (en) Intelligent bidding method based on data OLAP analysis and matched retrieval

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19910635

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19910635

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3168300

Country of ref document: CA

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 160222)
