CN115829615A

CN115829615A - User grouping method, system and storage medium based on multiple databases

Info

Publication number: CN115829615A
Application number: CN202310011133.8A
Authority: CN
Inventors: 孙武祥
Original assignee: Lingchuang Beijing Technology Co ltd
Current assignee: Lingchuang Beijing Technology Co ltd
Priority date: 2023-01-05
Filing date: 2023-01-05
Publication date: 2023-03-21

Abstract

The embodiment of the invention discloses a user grouping method based on multiple databases. The method comprises the steps of storing the pulled user data by a storage method suitable for subsequent use and query requirements, splicing a plurality of query conditions by using logic judgment to realize grouping of users with the same characteristics, using different marketing strategies for different user groups, and dynamically adjusting the users and the marketing strategies of all the user groups.

Description

User grouping method, system and storage medium based on multiple databases

Technical Field

The embodiment of the invention relates to the technical field of storing user purchasing habits in a plurality of databases with different characteristics according to the nature of the data and analyzing the data in those databases, and in particular to a user grouping method, a user grouping system and a storage medium.

Background

In the network era, users' consumption behavior and purchasing patterns change over time. With the rapid development of "internet +", the volume of enterprise data keeps growing and the demand for precision marketing keeps increasing. To make enterprise marketing more accurate and more effective, big-data precision marketing has become a core competitive strength of major enterprises.

In the traditional retail industry, the transaction data of everyday customers is merely retained, and sellers cannot analyze it. Precision marketing arose to address this. Precision marketing analyzes and summarizes users' behavior data and basic data into labels, so that users can be grouped conveniently and marketing suited to each group can be applied.

The technical core of precision marketing in the industry at present lies in combining and screening on a wide variety of attribute conditions, such as commodities, transactions, RFMG (R value: most recent consumption; F value: consumption frequency; M value: consumption amount; G value: gross profit contribution in the last year), hundreds of member portrait labels, point coupons, maintenance record grades, value degrees and the like.

In practical use, a conventional scheme that relies only on column storage may have the following problem: with hundreds of user tags and more than ten categories of commodities purchased by users, putting the tags into column storage requires breaking them into one record per tag, with an added field marking which tag it is (such as a VIP tag, a spending power tag, or an active/inactive tag). As a result, the data types are complex, the volume of stored data is large and query performance is poor.

Disclosure of Invention

To solve the above problems, the present invention provides a user grouping method, system and storage medium based on multiple databases. Specifically, the invention stores data in different databases chosen according to the query requirements and the characteristics of the data. During query, the invention combines OLAP (On-line Analytical Processing) and OLTP (On-line Transaction Processing) to query the data and groups users based on the query results. This improves the timeliness of data query and data support as well as query efficiency.

In a first aspect, the multi-database user grouping method provided by the present invention includes: for different condition characteristics, using the database best suited to each condition for storage and efficient query, aggregating a plurality of databases with different storage characteristics into a database set, and performing cross-database queries at query time; splicing multiple screening conditions with the logical operators AND, OR and NOT to improve the accuracy of user grouping and the query efficiency; dividing users with an OLAP multi-dimensional model, in which OLAP works on the updated data to subdivide and summarize it from different angles and at different levels so as to meet different analysis requirements; and adopting different marketing strategies for different user groups and adjusting the composition of each user group as user data is updated, which improves the suitability of the marketing strategy within each group and lets the groups optimize themselves as user data is updated during use.

In a second aspect, the present invention provides a user grouping system based on multiple databases, including: a data pulling module for obtaining behavior data of users; a data storage module for storing the user data using different storage methods according to the characteristics of the data; a user screening module for screening users with the same characteristics into a user group according to conditions set by an administrator; a group updating module for updating the user groups according to changes in user behavior; and a marketing strategy module for updating or adding the marketing strategies used for the user groups, including simulated promotion and adjustment of the marketing strategies.

In a third aspect, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a computer processor, implements the multi-database based user clustering method according to any of the embodiments of the present invention.

According to the invention, user data is stored adaptively, and user tags or data fields describing user purchasing behavior are used as screening conditions; OLAP analysis is performed on the data fields used as screening conditions, and the users are grouped accordingly. This focuses on the relevant users and saves resources. By grouping users and applying different marketing strategies to the resulting user groups, the method recommends suitable commodities to users at suitable times and provides multi-dimensional grouping and screening of users.

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. for different condition characteristics, the invention stores data in the database best suited to the condition, forming a database set that aggregates several storage modes and improving the efficiency of querying, updating and screening different types of user data;

2. when grouping users, the invention splices the screening conditions with the logical operators AND, OR and NOT and automatically generates the database query statements from these elements, so user grouping conditions are flexible to configure and highly practical. The OLAP analysis used in grouping subdivides and summarizes data from different angles and at different levels, meeting different analysis requirements;

3. different marketing strategies are adopted for different user groups, and either the composition of a user group or the marketing strategy applied to users and user groups is adjusted dynamically according to user data, which provides the marketing advantages of accurate pushing and local adjustment and meets the requirements of practical application.

Drawings

FIG. 1 is a schematic diagram of row-wise storage and column-wise storage;

FIG. 2 is a system architecture diagram used by embodiments of the present invention;

FIG. 3 is a prototype diagram of a precision marketing page used in an embodiment of the present invention;

FIG. 4 is a schematic diagram of a page for selecting screening conditions and the logical operators "AND, OR, NOT" used for conditional splicing in an embodiment of the present invention;

FIG. 5 is a prototype view of a user group page used in an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It should be further noted that, for the convenience of description, only a part related to the present invention is shown in the drawings, and not all structures are shown. The scope of the present invention is not limited by the following description of the embodiments, but is defined only by the scope of the claims, and includes all modifications having the same meaning as and within the scope of the claims.

Under the traditional marketing mode, data comes from an enterprise's marketing records. With the development of networks and e-commerce, consumers' online traces are also converted into data and have become one of the sources of enterprise marketing data. To construct the data required for a "user portrait", in addition to data obtained from databases, data collected by the marketing platform and historical data, data such as consumers' videos and pictures in internet communities and geospatial information are obtained as far as possible. Such data reflects consumers' attitudes and preferences indirectly. The users are pre-grouped using this information, and corresponding marketing strategies are then applied to the user groups, so that marketing is more accurate and marketing effort is saved. Many databases can store the data needed for user grouping; the described embodiment selects the SQL Server database. This database platform can be applied to large-scale online transaction processing, data warehousing, e-commerce and the like, lets the enterprise share a server for querying various data, and can reduce management and production costs. The user groups are constructed on the basis of the 4C theory, combined with the demand characteristics of consumers, by consulting literature, research and interviews, and with expert advice. The "4C" theory comprises consumer, cost, convenience and communication. 
Consumer refers to the basic information of the consumer, including age, sex, occupation and the like. Cost refers not only to the price the consumer is willing to pay but also to time, physical effort and mental cost. Convenience mainly refers to how convenient purchase and use are for the consumer. Cost and convenience can be understood from the perspective of consumption behavior, including information-gathering behavior before purchase and purchase records, such as web page browsing time and click records. Communication links the consumer and the enterprise interactively, so that the enterprise can learn the consumer's loyalty and satisfaction; it can be understood from the perspective of post-purchase evaluation and communication platforms. The visual representation of a user group may be a "user portrait". Since the "user portrait" belongs to the information layer, it needs to be designed in database terms, that is, by means of an entity-relationship diagram (E-R diagram). The E-R diagram provides a way to represent entities (i.e., data objects), attributes and associations in order to describe a conceptual model of the real world. The E-R diagram of the embodiment of the present invention is shown in FIG. 2. Kudu is used as the underlying storage. It keeps good Scan performance, which in theory allows it to serve both OLTP (On-line Transaction Processing) workloads, i.e. high-concurrency, low-latency insert, delete, update and select operations, and OLAP (On-line Analytical Processing) and BI (Business Intelligence) queries. 
Impala is a well-established SQL parsing engine whose stability and speed on ad-hoc query requests have been widely verified in the industry. Impala has no storage engine of its own; it is responsible for parsing SQL and connecting to the underlying storage engine. Sales details, point information and coupon information are stored using Kudu and Impala. The "es" in the figure is Elasticsearch storage, used for fields such as commodity subclasses and disease tags that are searched repeatedly in actual use. The method exploits the characteristics of Elasticsearch as a search engine to process search tasks efficiently and to compute statistics over the search results, and its aggregation function makes the statistical results convenient to process. A Greenplum database stores the member portrait tags and supports the user tag system. Greenplum is a relational distributed database, described here as the fastest and most cost-effective in the industry; it adopts an MPP (Massively Parallel Processing) architecture on the basis of PostgreSQL, has strong capacity for large-scale data analysis tasks, and can group users along multiple dimensions. Impala, Elasticsearch and the user tag system are then aggregated into a fused database application, bdp-application, for condition management, and the data sources in the application are associated with the conditions in the condition management library. "scrm" in the figure is the user grouping system of the present invention. The administrator, as the user of the whole system, only needs to pass screening conditions in through the user grouping system to obtain condition retrieval results that span multiple databases and data sources.

According to the user grouping method disclosed by the embodiment of the invention, the database best suited to each condition characteristic is used for storage and efficient query; multiple screening conditions are spliced with the logical operators AND, OR and NOT to improve the accuracy of user grouping and the query efficiency; and OLAP (On-line Analytical Processing) technology is used to support complex analysis operations, with emphasis on decision support for decision makers and senior managers. OLAP is one of the ways in which a data warehouse outputs data, and comprises interactive ad-hoc queries and data analysis queries (Data Analytics). Different marketing strategies are adopted for different user groups, and the composition of each user group is adjusted as user data is updated, which improves the suitability of the marketing strategy within each group and lets the groups optimize themselves as user data is updated during use.

Step one, data storage

First, the collected user behavior data needs to be stored, as shown in fig. 1. A specific data storage method is used according to the characteristics of the condition, for example simple label conditions such as "yes/no", time attribute conditions such as "business flow events", and "array" type conditions.

Different storage structures are adopted for data with different characteristics, mainly row storage and column storage. The reading order and results of the two are shown in fig. 1. In row storage the data of one table row is kept together, whereas in column storage each column is stored separately. Row storage therefore makes inserting (insert) and updating (update) data easy, but when conditions are screened (select ... where involving only some columns of the table), all the data in the table must be read, which consumes considerable storage and computing resources. Column storage reads only the columns involved in the query, is efficient for data projection (project), and allows any column to serve as an index; however, once selection is complete the selected columns must be reassembled into rows, and inserting and updating data is cumbersome. The embodiment of the invention therefore selects row storage or column storage according to the query, insertion and update requirements of the data.
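The trade-off between the two layouts can be illustrated with a minimal Python sketch. The sample records and field names (`member_id`, `age`, `city`) are hypothetical and are not taken from the embodiment; the sketch only contrasts which data each layout touches when a condition involves a single column.

```python
from typing import Any

# Hypothetical sample records; field names are illustrative only.
rows = [
    {"member_id": 1, "age": 34, "city": "Beijing"},
    {"member_id": 2, "age": 51, "city": "Tianjin"},
    {"member_id": 3, "age": 47, "city": "Beijing"},
]


def to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot a row-wise list of records into a column-wise layout."""
    columns: dict[str, list[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)


def select_row_store(records, field, predicate):
    # Row layout: every full record is visited even though one field is tested.
    return [r["member_id"] for r in records if predicate(r[field])]


def select_column_store(cols, field, predicate):
    # Column layout: only the tested column and the id column are read,
    # then the matching positions are reassembled into member ids.
    return [cols["member_id"][i] for i, v in enumerate(cols[field]) if predicate(v)]


in_beijing = lambda city: city == "Beijing"
print(select_row_store(rows, "city", in_beijing))
print(select_column_store(columns, "city", in_beijing))
```

Both functions return the same member ids; the difference lies in how much data must be read, and in the fact that appending a new member is a single append in the row layout but one append per column in the column layout.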

A Kudu data table is divided into a plurality of tablets by hash partitioning or range partitioning. Each tablet comprises metadata information and RowSet information representing the actual data, and the RowSets comprise one MemRowSet and zero or more DiskRowSets. The MemRowSet uses row storage and is ordered by primary key; the DiskRowSet uses column storage, which is the basis on which Kudu supports analytical queries. Impala accesses data stored in a third-party storage engine, so the data storage format of Impala depends on that engine. Elasticsearch exposes the concept of an index, which can be compared to a database; user queries are executed against indexes, and each index is composed of a plurality of shards, which gives it distributed and scalable capability. The shard is the minimum unit of Elasticsearch data storage; the storage capacity of an index is the sum of the storage capacities of all its shards, and the storage capacity of an Elasticsearch cluster is the sum of the storage capacities of all its indexes. The columnar store used in Elasticsearch is called doc values: in addition to the original document store and the inverted index, Elasticsearch also stores a copy of doc values for analysis and sorting. Greenplum supports both row storage and column storage; column storage is often used in order to increase the data compression ratio.

When the value to be stored is a simple numeric or boolean value, such as 0, 1, yes or no, it is suitable to store the data by column, that is, to use column storage, for example by writing the data into a bitmap. A bitmap index uses each indexed field value as an index key, and each bit represents one row: when the row contains the target value, the bit is set to 1, otherwise it is set to 0. From this structure it follows that, on the indexed column, each index key corresponds to one bit string whose length is the number of records. The indexed column of a bitmap index therefore must not have too many distinct keys, preferably 100 to 100,000, i.e., the bitmap index of such a table has 100 to 100,000 bit strings.
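The bitmap structure described above can be sketched in a few lines of Python. This is an illustrative toy, not the index implementation of any particular database: each distinct value maps to a Python integer used as a bit string, and the tag columns are hypothetical.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is 1 when row i holds that value."""
    index = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_of(bitmap):
    """Return the row numbers whose bit is set in the bit string."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical tag columns for five members (rows 0..4).
is_vip = build_bitmap_index(["yes", "no", "yes", "yes", "no"])
is_active = build_bitmap_index(["yes", "yes", "no", "yes", "no"])

all_rows = (1 << 5) - 1                            # mask covering the five rows
vip_and_active = is_vip["yes"] & is_active["yes"]  # AND: bitwise intersection
vip_or_active = is_vip["yes"] | is_active["yes"]   # OR: bitwise union
not_vip = all_rows & ~is_vip["yes"]                # NOT: complement within the table

print(rows_of(vip_and_active))  # [0, 3]
print(rows_of(vip_or_active))   # [0, 1, 2, 3]
print(rows_of(not_vip))         # [1, 4]
```

Because every condition reduces to one bitwise operation over bit strings of equal length, combining many tag conditions stays cheap, which is why the number of distinct keys, and hence of bit strings, is the main cost to control.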

Business flow data such as sales details, coupons and points are behavior records with a time range attribute. Column storage is also suitable for them, so that interval screening can be performed on the time column by time period; for example, Impala + Kudu is used to support screening over custom time intervals. Kudu, as the underlying storage, supports high-concurrency, low-latency KV queries (key-value queries, namely queries for data meeting a certain condition) while keeping good Scan performance. Impala is an SQL parsing engine whose stability and speed on ad-hoc query requests have been widely verified in the industry; it has no storage engine of its own and is responsible for parsing SQL (Structured Query Language) and connecting to the underlying storage engine. In actual use, temporarily spliced query statements are often run against business flow data such as sales details, coupons and points, and query speed matters, so storing and querying this data with the Impala + Kudu architecture meets the practical requirements.

Because data sources such as Kudu and bitmaps do not support array search, array-format data such as commodity subclasses and disease labels requires row storage, for example Elasticsearch. Elasticsearch is a distributed, highly scalable, near-real-time search and data analysis engine. It makes it convenient to search, analyze and explore large amounts of data, and its horizontal scalability makes the data more valuable in a production environment. Elasticsearch works roughly as follows: data is submitted to Elasticsearch, a tokenizer segments the corresponding text into terms, and the weights and segmentation results are stored with the data; when the data is searched, the results are scored and ranked according to the weights, and the query results are then presented. In actual use, therefore, the administrator needs to set a corresponding weight for each keyword or condition used for retrieval in order to obtain more accurate query results.

In the embodiment of the invention, when commodity classification and disease label purchase data are calculated from big data, xxl-job scheduled tasks fetch data from the Hive data warehouse every day and load it into Kudu; the data in the data warehouse originates from the individual business databases and is captured at regular intervals. Multiple data sources are then queried concurrently (the query for each attribute condition runs in its corresponding data source) and the results are aggregated in memory (a secondary calculation is performed on the results returned by each data source), which greatly improves query and screening efficiency. xxl-job schedules a plurality of executors to run tasks through a central scheduling platform, and the scheduling center ensures the consistency of cluster distributed scheduling through a database lock. The user's commodity classification and disease label purchase data calculated from big data is then written into Elasticsearch: array-format data such as commodity subclasses and disease labels is stored as an Object array using the Elasticsearch Nested structure. For example, to search an array for entries containing a given item, such as screening out users who have purchased commodity subclass A more than 2 times, xxl-job periodically triggers Spark every day to calculate information such as each member's purchased commodities and disease labels and write it into Elasticsearch. 
A Nested field is in effect an Object array. Because Elasticsearch natively supports the Object type, any field can hold multiple objects; in Elasticsearch all fields are multi-valued, that is, any field can be a list. With the Nested structure, Elasticsearch stores each entry of the list as a sub-document, so the specific structure of each entry is known at query time. Storing and querying array data with the Elasticsearch Nested structure is therefore very fast.
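As an illustration of the Nested structure, the following Python sketch builds an Elasticsearch mapping and a nested query as plain dictionaries in the Elasticsearch query DSL. The field names (`purchases`, `subclass`, `count`, `member_id`) are assumptions made for the example and do not come from the embodiment; no client library is used, so the dictionaries would still have to be sent to a cluster by whatever client the deployment uses.

```python
import json


def nested_mapping(path, keyword_field, count_field):
    """Index mapping with one nested array of {keyword, integer} objects."""
    return {
        "mappings": {
            "properties": {
                "member_id": {"type": "keyword"},
                path: {
                    "type": "nested",
                    "properties": {
                        keyword_field: {"type": "keyword"},
                        count_field: {"type": "integer"},
                    },
                },
            }
        }
    }


def nested_count_query(path, keyword_field, value, count_field, min_exclusive):
    """Match documents having ONE nested entry that satisfies both clauses."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "must": [
                            {"term": {f"{path}.{keyword_field}": value}},
                            {"range": {f"{path}.{count_field}": {"gt": min_exclusive}}},
                        ]
                    }
                },
            }
        }
    }


# "Users who purchased commodity subclass A more than 2 times" (hypothetical field names).
mapping = nested_mapping("purchases", "subclass", "count")
query = nested_count_query("purchases", "subclass", "A", "count", 2)
print(json.dumps(query, indent=2))
```

The point of the nested type is that both clauses must hold for the same array entry; with a plain object array the subclass of one entry could be matched against the count of another.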

Step two, user screening

In addition, the embodiment of the invention can configure different screening conditions for different scenes, thereby obtaining different user groups.

Specifically, when a screening condition is set, the embodiment of the present invention supports associating the condition with a database, a data table and a field in the data table. As shown in FIG. 4, the "buy goods" screening field is associated with the sales detail table stored using the Impala + Kudu architecture, and the "Chronic regular prescriptive crowd" field is associated with a user tag table in the Greenplum database indexed using bitmaps. The user tag table is calculated from the sales detail table by big data processing, which realizes the association between databases and data tables.

Then, the database query statements are spliced according to the selected screening conditions and the chosen logical operators (AND, OR, NOT). As shown in fig. 4, the conditions are split according to the data source each condition is associated with; conditions that share a data source are displayed close together on the page, which makes splitting easier. For example, if conditions A, D and E can all be queried on the Elasticsearch data source, the three conditions are adjacent when the page is displayed. The AND, OR and NOT relations between the per-data-source condition groups are then recorded after splitting, for example: (conditions A and B on the Kudu data source) AND (conditions C and D on the Elasticsearch data source) OR (conditions E and F on the bitmap data source). Each data source returns its calculation result, and the results are then combined in sequence according to the AND, OR and NOT relations between the condition groups to obtain the final result. Within each data source, the corresponding database query statement is assembled from the condition group according to the AND, OR and NOT operators. Because the data is stored with different storage methods and storage structures, the method of assembling the query statement differs as well. For example, queries in Kudu and Greenplum use SQL statements, while queries in Elasticsearch use QueryBuilder.
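The splitting and sequential combination described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Condition` class, the source names and the member ids are hypothetical, each data source is assumed to have already returned a set of member ids for its condition group, and NOT is interpreted as "left AND NOT right".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    name: str
    source: str  # e.g. "kudu", "elasticsearch", "bitmap"


def split_by_source(conditions):
    """Group condition names by the data source they are associated with."""
    groups = defaultdict(list)
    for condition in conditions:
        groups[condition.source].append(condition.name)
    return dict(groups)


def combine(left, operator, right):
    """Combine two member-id sets with one logical operator."""
    if operator == "AND":
        return left & right
    if operator == "OR":
        return left | right
    if operator == "NOT":  # interpreted here as: left AND NOT right
        return left - right
    raise ValueError(f"unsupported operator: {operator}")


def fold(first, steps):
    """Apply (operator, member_set) steps to the first result, in sequence."""
    result = set(first)
    for operator, members in steps:
        result = combine(result, operator, set(members))
    return result


conditions = [
    Condition("A", "kudu"), Condition("B", "kudu"),
    Condition("C", "elasticsearch"), Condition("D", "elasticsearch"),
    Condition("E", "bitmap"), Condition("F", "bitmap"),
]
print(split_by_source(conditions))

# Hypothetical per-source results (sets of member ids).
kudu_ab, es_cd, bitmap_ef = {1, 2, 3, 4}, {2, 3, 5}, {7}
print(fold(kudu_ab, [("AND", es_cd), ("OR", bitmap_ef)]))  # {2, 3, 7}
```

Evaluating strictly in sequence mirrors the text; a system that needed operator precedence or parentheses would have to carry an expression tree instead of a flat list of steps.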

It should be noted that, when a data field stored in Elasticsearch is selected, because of the way Elasticsearch scores screening queries, the weight corresponding to that field may be set manually on the screening condition selection page, or Elasticsearch may preload a weight table for each field before the screening query is performed, so as to obtain an accurate screening result.

A new query field, for example a "product category" query field, needs to be set up by the administrator in the background, and it is checked whether the field actually exists in the database. The configuration specifies the field type (enumeration, time range, normal range (e.g., = 50), or set), the data source (bitmap, Kudu or Elasticsearch), the database name, table name and field name associated with the field, and the storage format (e.g., JSON).
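One possible shape for such a field configuration, together with the existence check, is sketched below. The class, its attribute names and the `catalog` structure are hypothetical; the embodiment does not specify how the configuration is represented or how the database metadata is obtained.

```python
from dataclasses import dataclass

ALLOWED_TYPES = {"enumeration", "time_range", "normal_range", "set"}
ALLOWED_SOURCES = {"bitmap", "kudu", "elasticsearch"}


@dataclass(frozen=True)
class FilterField:
    label: str            # name shown on the screening page
    field_type: str       # one of ALLOWED_TYPES
    source: str           # one of ALLOWED_SOURCES
    database: str
    table: str
    column: str
    storage_format: str = "JSON"


def validate(field, catalog):
    """Return a list of problems; an empty list means the field can be registered.

    catalog is a hypothetical metadata snapshot: {(database, table): {column, ...}}.
    """
    errors = []
    if field.field_type not in ALLOWED_TYPES:
        errors.append(f"unknown field type: {field.field_type}")
    if field.source not in ALLOWED_SOURCES:
        errors.append(f"unknown data source: {field.source}")
    columns = catalog.get((field.database, field.table))
    if columns is None:
        errors.append(f"table not found: {field.database}.{field.table}")
    elif field.column not in columns:
        errors.append(f"column not found: {field.column}")
    return errors


catalog = {("retail", "sales_detail"): {"member_id", "product_category"}}
field = FilterField("product category", "enumeration", "kudu",
                    "retail", "sales_detail", "product_category")
print(validate(field, catalog))  # []
```

Returning a list of problems rather than raising on the first one lets the administrator page show every configuration error at once.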

The user labels are stored in a Greenplum database with bitmap indexing and are calculated from big data; the number of user label records can reach millions to billions. Because member labels are not repeated and several conditions usually need to be queried for a user, with a bitmap index the more conditions a query uses, the more data is filtered out by the AND and OR conditions and the smaller the returned result set. Such queries on the tag fields of member labels stored with bitmaps are therefore highly efficient. Similarly, the ad-hoc query and high-concurrency characteristics of Kudu and the horizontal scalability of Elasticsearch greatly improve the ability to search, analyze and explore large volumes of data.

In addition, query condition fields are selected on a page, which improves the usability of database queries. The AND, OR and NOT logical operators are also specified directly on the page and used to splice the query statements, which reduces the number of queries and yields accurate user groups under the combined action of several screening conditions. This facilitates further feedback adjustment of the user portraits and the adoption of corresponding marketing strategies for user groups with different characteristics.

In use, the administrator selects the conditions and the AND, OR and NOT relations in the management platform and submits them to scrm, which calls the precision marketing service bdp-application. The precision marketing service can be called by any upper-layer service that needs it; as long as the screening conditions are passed in according to the specification, the screened member list is returned. bdp-application matches the conditions to data sources, splices the multiple conditions into queries and runs them in the respective data sources. If the condition group of a data source contains a NOT relation, the query is split into two queries, and the NOT operation is performed in JVM memory. Finally, the results are aggregated in bdp-application and the result set meeting the conditions is returned.
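The fan-out to several data sources, the two-query handling of NOT and the in-memory aggregation can be sketched in Python as follows. The embodiment performs this inside a JVM service; the sketch only mirrors the control flow, and the stub queries, their names and the member ids are hypothetical stand-ins for real data source calls.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all(queries):
    """Run every data source query concurrently and collect the results.

    queries maps a label to a zero-argument callable returning a set of member ids.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        futures = {name: pool.submit(fn) for name, fn in queries.items()}
        return {name: future.result() for name, future in futures.items()}


def with_not(positive, negated):
    """NOT handled in memory: remove members matching the negated condition."""
    return positive - negated


# Stub queries standing in for calls to Kudu and Elasticsearch.
stub_queries = {
    "kudu_bought_goods": lambda: {1, 2, 3, 4},
    "kudu_used_coupon": lambda: {2},        # the negated condition, queried separately
    "es_subclass_a": lambda: {1, 2, 3, 9},
}

results = query_all(stub_queries)
kudu_group = with_not(results["kudu_bought_goods"], results["kudu_used_coupon"])
final_members = kudu_group & results["es_subclass_a"]  # aggregate across sources
print(sorted(final_members))  # [1, 3]
```

Running the negated condition as its own positive query and subtracting in memory keeps each data source query simple, at the cost of transferring the negated set to the service.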

On this basis, the method can also form, through AND relations in the grouping process, a multi-dimensional analysis cube that satisfies several conditions simultaneously. Suppose each top-level query field is taken as a coordinate axis, for example "commodity category" as the X axis, "user" as the Y axis and "region" as the Z axis. The query "highly active middle-aged users in Beijing who buy Changyanning" then involves three dimensions: "Changyanning", "North China" and "highly active middle-aged users". The more query dimensions there are, the longer the query takes and the lower its efficiency. "Changyanning" belongs to the sub-label "digestive tract medicines" under "commodity category", so there is a hierarchy of query label fields between "Changyanning" and "commodity category". When processing this query, the invention does not directly traverse the data of all commodity categories. It first consults the field dictionary for "Changyanning" to determine which commodity subdivision under "commodity category" it belongs to, and after determining that it belongs to "digestive tract medicines", traverses only the medicine name data under that subdivision. "Changyanning" then becomes the new X1 axis. For "user" as the Y axis, the traditional method traverses all users once to find the middle-aged users, and then traverses the middle-aged users again to find the highly active ones, which takes a long time. 
With the present technical scheme, the user data is traversed only once: the two fields "middle-aged user" and "highly active user" are extracted by column and intersected with AND, giving the "highly active middle-aged users" result, which serves as the new Y1 axis. The Z axis (region) is searched in the same way as the X axis (commodity category): only the data under "North China", the level above "Beijing", is traversed, which reduces the amount of data to traverse, and the resulting "Beijing" becomes the new Z1 axis. The X1, Y1 and Z1 axes then form a new multi-dimensional analysis cube satisfying the condition of highly active middle-aged users in the Beijing region who purchase the medicine Changyanning. This cube can be exported directly as a user group.
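The three-axis narrowing can be sketched with in-memory Python structures. All data here is invented for illustration: the field dictionary, the partitioning of purchases by commodity subdivision and of members by region, and the two tag sets are assumptions about how the lookups might be organized, not the data model of the embodiment.

```python
# Hypothetical field dictionary: product name -> commodity subdivision.
FIELD_DICTIONARY = {
    "Changyanning": "digestive tract medicines",
    "Cough syrup X": "respiratory medicines",
}
# Purchases partitioned by subdivision: (member_id, product).
PURCHASES_BY_SUBDIVISION = {
    "digestive tract medicines": [(1, "Changyanning"), (2, "Antacid Y"), (3, "Changyanning")],
    "respiratory medicines": [(4, "Cough syrup X")],
}
# Region hierarchy and members partitioned by upper-level region: (member_id, city).
REGION_PARENT = {"Beijing": "North China", "Tianjin": "North China"}
MEMBERS_BY_REGION = {
    "North China": [(1, "Beijing"), (2, "Beijing"), (3, "Tianjin"), (4, "Beijing")],
    "East China": [(5, "Shanghai")],
}
# Tag columns already extracted as sets of member ids.
MIDDLE_AGED = {1, 3, 4, 5}
HIGH_ACTIVITY = {1, 2, 4}


def buyers_of(product):
    """X1 axis: scan only the subdivision the dictionary points to."""
    subdivision = FIELD_DICTIONARY[product]
    return {m for m, p in PURCHASES_BY_SUBDIVISION[subdivision] if p == product}


def members_in(city):
    """Z1 axis: scan only the upper-level region that contains the city."""
    region = REGION_PARENT[city]
    return {m for m, c in MEMBERS_BY_REGION[region] if c == city}


def cube(product, city):
    x1 = buyers_of(product)
    y1 = MIDDLE_AGED & HIGH_ACTIVITY  # Y1 axis: one AND over two tag columns
    z1 = members_in(city)
    return x1 & y1 & z1               # members satisfying all three dimensions


print(cube("Changyanning", "Beijing"))  # {1}
```

Each axis is narrowed before the final intersection, so the AND over the three axes operates on small sets rather than on the full tables.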

As shown in fig. 5, a user group is created by first clicking the user grouping control (1 in the figure) on the page. Grouping can proceed while the conditions of other user groups are being set and those groups computed. After the grouping conditions are set, the status of the user groups can be viewed in the box labeled 2. Specifically, user groups are shown under three states: "all" lists all user groups, "in computation" lists user groups whose condition screening is still in progress, and "completed" lists user groups whose condition screening has finished. All user grouping results are shown in the box labeled 3.

In addition, in the user grouping system that uses the user grouping method provided by the invention, the constituent databases are loosely coupled and are deployed as one integrated database system through bdp-application. This modular architecture makes it possible to adopt row-based or column-based storage for different data and gives the greatest flexibility in finding a better storage mode for each kind of data. It also facilitates integration with future cloud computing and cloud-native technologies, yielding a loosely coupled system that is compute-intensive, resilient, manageable, and observable. The bdp-application of the invention has two stateless SQL layers. The first is responsible for all online transaction processing (OLTP) work and 80% of conventional online analytical processing (OLAP) work; it works in a distributed manner, with a coprocessor dispatching parts of each request, that is, the specific query, insert, delete, and update operations, to the different underlying databases (e.g., Kudu, Elasticsearch, Greenplum). More complex OLAP work, that is, grouping different users, establishing marketing-strategy mappings for multiple grouping results, or further iterative training of machine learning models and real-time intelligent services, is handled by the second stateless SQL layer.
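As a rough illustration of this division of labor, the following Python sketch routes a request to one of the two SQL layers and lists the underlying stores it would touch. The request kinds, the table-to-store mapping, and the layer names are assumptions made for the example; the source does not specify how bdp-application classifies or dispatches requests.

```python
from dataclasses import dataclass

# Hypothetical request kinds treated as "complex OLAP" in this sketch.
COMPLEX_KINDS = {"user_grouping", "strategy_mapping", "model_training"}

# Hypothetical mapping from logical table to underlying store.
TABLE_TO_STORE = {
    "user_profile": "kudu",
    "behavior_log": "elasticsearch",
}


@dataclass(frozen=True)
class Request:
    kind: str       # e.g. "point_update", "simple_report", "user_grouping"
    tables: tuple   # logical tables the request touches


def route(request):
    """Return (layer, stores) for a request.

    Layer 1 takes OLTP and conventional OLAP work; layer 2 takes the
    complex analytical work. Both layers are stateless, so the routing
    decision depends only on the request itself.
    """
    layer = "sql_layer_2" if request.kind in COMPLEX_KINDS else "sql_layer_1"
    stores = sorted({TABLE_TO_STORE[table] for table in request.tables})
    return layer, stores
```

For example, `route(Request("user_grouping", ("user_profile", "behavior_log")))` selects the second layer and both stores, while a point update on `user_profile` stays on the first layer.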

Further, since drug purchasers do not change frequently over a short period, the user grouping results of the invention may be stored or backed up on a plurality of sub-servers. Owing to the characteristics of the user data it stores, a user group in the form of a multidimensional analysis cube obtained by multi-condition screening can be screened again with further conditions into a smaller cube. Because the cube of a user grouping result is significantly smaller than the data space formed by all the databases, queries on it can involve more complex logic. For example, analysis-oriented operations such as the aggregate functions COUNT, SUM, and AVG may be performed.

Further analysis based on the user grouping result of the invention may comprise drill-down, roll-up, slice, dice, and pivot (rotation). Drill-down means moving between levels of a dimension, from an upper level to a finer lower level, or splitting summarized data into more detailed data. For example, under a given user group one may look up the total sales for a quarter of 2020 and then the three top-selling drugs, or drill down within Zhejiang Province to the sales data from March to May in Hangzhou, Ningbo, and Wenzhou. Roll-up is the inverse of drill-down and aggregates the user grouping results towards higher levels, for example summarizing the purchase data of high-activity users in Beijing, Tianjin, and Hebei to view sales for the Beijing-Tianjin-Hebei region. Slicing means selecting a specific interval from the cube of user grouping results for analysis, such as the sales data for the first quarter of 2020, or the purchase data of users who buy digestive tract medicines and cardiovascular medicines. Pivoting means exchanging the positions of coordinate axes, such as swapping the commodity category axis and the region axis.
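These operations can be sketched on a tiny in-memory cube. The fact rows, dimension names, and sales figures below are invented for illustration, and plain Python lists stand in for whatever OLAP engine the system uses. Dice is shown in its usual OLAP sense of restricting several dimensions at once.

```python
from collections import defaultdict

# Hypothetical fact rows of one user group's cube.
FACTS = [
    {"region": "Beijing-Tianjin-Hebei", "city": "Beijing", "quarter": "2020Q1",
     "category": "digestive", "sales": 120},
    {"region": "Beijing-Tianjin-Hebei", "city": "Tianjin", "quarter": "2020Q1",
     "category": "cardiovascular", "sales": 80},
    {"region": "Beijing-Tianjin-Hebei", "city": "Beijing", "quarter": "2020Q2",
     "category": "digestive", "sales": 50},
    {"region": "Zhejiang", "city": "Hangzhou", "quarter": "2020Q1",
     "category": "digestive", "sales": 70},
    {"region": "Zhejiang", "city": "Ningbo", "quarter": "2020Q2",
     "category": "cardiovascular", "sales": 30},
]


def aggregate(facts, dims):
    """SUM of sales grouped by the given dimensions.

    Fewer or coarser dimensions give a roll-up; adding a finer
    dimension gives a drill-down.
    """
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[d] for d in dims)] += fact["sales"]
    return dict(totals)


def slice_cube(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    return [f for f in facts if f[dim] == value]


def dice_cube(facts, allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [f for f in facts
            if all(f[d] in values for d, values in allowed.items())]


def pivot(table):
    """Pivot: swap the two axes of a table keyed by (a, b)."""
    return {(b, a): total for (a, b), total in table.items()}


rolled_up = aggregate(FACTS, ["region"])
drilled_down = aggregate(FACTS, ["region", "city"])
first_quarter = slice_cube(FACTS, "quarter", "2020Q1")
diced = dice_cube(FACTS, {"quarter": {"2020Q1"}, "category": {"digestive"}})
by_region_then_category = pivot(aggregate(FACTS, ["category", "region"]))
```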

Compared with traditional OLAP, OLAP combined with OLTP retains the advantages of the multi-view, multi-level data organization of the traditional OLAP multidimensional model, while the flexible database storage mode allows OLAP analysis results to be refreshed as the data are updated instead of being stored only on the basis of immutable data. This multidimensional organization of data resembles the way people ordinarily look at things and lets a decision maker analyze data from multiple views and at multiple levels. OLAP formally applies this way of thinking to big data analysis, which requires continuously updated data to be matched against the user groups defined in the OLAP multidimensional model. Subdivided and updated data improve the flexibility of analysis: the data can be subdivided and summarized from different angles and at different levels, so that different analysis requirements are met.

Step three, feedback analysis

Finally, the embodiment of the invention provides a feedback verification function: after different marketing measures have been applied to different user groups, the behavioral feedback of the users in each group under those measures is analyzed again, so as to calibrate the user groups and the marketing measures corresponding to them.

Specifically, after the user screening of step two has been performed several times under different conditions, several screening results are obtained, as shown in fig. 5, each being a user group consisting of users with similar consumption characteristics. The user grouping result can then be imported into the user marketing module and a new user group created. Because sales are influenced by many everyday factors, the embodiment of the invention takes 24:00 each day as the time node for automatically updating all user groups and updates the user group information periodically, so that the user groups are adjusted dynamically and users in different groups receive more timely service. The data update is a delta (incremental) update: the most recently stored behavior data of each user are compared with the newly acquired behavior data, and only the changed part is updated, which reduces the volume of update transactions on the database.
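A minimal Python sketch of such a delta update follows, assuming behavior data are held as one record per user ID. The record layout and function names are hypothetical, and the daily scheduling itself is left out.

```python
def compute_delta(stored, fresh):
    """Return only the records that are new or differ from the stored copy."""
    return {uid: record for uid, record in fresh.items()
            if stored.get(uid) != record}


def apply_delta(stored, delta):
    """Write the changed records and report how many writes were needed."""
    stored.update(delta)
    return len(delta)


# Hypothetical stored and newly pulled behavior data.
stored_behavior = {"u1": {"purchases": 3}, "u2": {"purchases": 1}}
fresh_behavior = {"u1": {"purchases": 3}, "u2": {"purchases": 2},
                  "u3": {"purchases": 1}}

delta = compute_delta(stored_behavior, fresh_behavior)
writes = apply_delta(stored_behavior, delta)
```

Here the unchanged record of `u1` causes no write; only `u2` and `u3` are updated.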

Similarly, the screening conditions of a given user group and the corresponding number of users can also be viewed on the page. To update the marketing strategy corresponding to a user group, the administrator can click the update link and, when importing the new marketing strategy, select simulated user feedback. The newly uploaded strategy is then adjusted online against a virtual user model fitted to historical user behavior data; the virtual user model can be stored on a virtual server. The uploaded strategy is not issued to users immediately. Instead, the virtual user model predicts the result of implementing it, so that a more scientific marketing strategy that better fits the actual situation is obtained. After one administrator confirms that the strategy should be released, it is pushed to the remaining administrator or administrators; only after more than half of the administrators have confirmed the release can the strategy enter the server that actually serves users and be sent on to the users.
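The release rule at the end of this paragraph amounts to a strict-majority check, which might be sketched in Python as follows. The function name and the administrator identifiers are hypothetical.

```python
def may_release(admins, approvals):
    """True only if more than half of the administrators have confirmed.

    Confirmations from unknown accounts are ignored, and repeated
    confirmations from the same administrator count once.
    """
    valid = set(approvals) & set(admins)
    return 2 * len(valid) > len(admins)


admins = {"admin_a", "admin_b", "admin_c", "admin_d"}
```

With four administrators, two confirmations are exactly half and therefore not enough; a third confirmation allows the strategy to reach the production server.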

When different marketing strategies are used for different user groups, user labels are updated according to the behavior of the users in each group, and the users' positions among the user groups are adjusted accordingly. A user group update threshold is set; when the number of users adjusted in a user group reaches this threshold, the marketing strategy corresponding to that group is checked for reasonableness. Typically, the review and adjustment of marketing strategies is carried out by a human marketing team, but it may also be handed over to artificial intelligence. The artificial intelligence is a dynamic model built by deep learning of the relationship between user behavior and marketing strategies, and it is granted the authority to change the composition of user groups in the database and the corresponding strategies.
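The threshold check can be sketched in Python as below. The sketch assumes that an "adjusted" user is one who has left the group after the label update; the source does not state how adjustments are counted, and the group names and user IDs are invented.

```python
from collections import Counter


def count_adjusted(old_assignment, new_assignment):
    """Count, per original group, the users whose group has changed."""
    moved = Counter()
    for uid, old_group in old_assignment.items():
        if new_assignment.get(uid) != old_group:
            moved[old_group] += 1
    return moved


def groups_to_review(old_assignment, new_assignment, update_threshold):
    """Groups whose number of adjusted users has reached the threshold."""
    moved = count_adjusted(old_assignment, new_assignment)
    return sorted(group for group, n in moved.items() if n >= update_threshold)


# Hypothetical group membership before and after a label update.
before = {"u1": "A", "u2": "A", "u3": "A", "u4": "B"}
after = {"u1": "B", "u2": "B", "u3": "A", "u4": "B"}
```

With these sample assignments, two users have left group A, so a threshold of 2 flags the strategy of group A for review, while a threshold of 3 flags nothing.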

The embodiment of the invention also provides a user grouping system comprising: a data pulling module for obtaining user behavior data; a data storage module for storing the user data with different storage methods according to the characteristics of the data; a user screening module for screening users with the same characteristics into a user group according to conditions set by an administrator; a grouping update module for delta-updating changed user behavior data; and a marketing strategy module for updating or adding the marketing strategies used by user groups, including simulated rollout and adjustment of marketing strategies.

An embodiment of the present invention further provides a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for user grouping as described above.

Those of skill in the art will understand that the logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be viewed as implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Further, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, any one or a combination of the following techniques, which are well known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

All possible combinations of the technical features of the above embodiments may not be described for the sake of brevity, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments only express one embodiment of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application.

Claims

1. A user grouping method based on multiple databases is characterized by comprising the following steps:

storing the user data pulled from the data terminal into a service database, wherein the storing uses row-based storage or column-based storage, whichever suits the data characteristics and use requirements;

screening and grouping users by a plurality of conditions combined with logical operators, according to the user data in the service database, thereby obtaining a plurality of user groups;

setting different marketing strategies for the plurality of user groups;

and performing, for the users in the user groups, updating of the user composition based on user behavior data and updating of the marketing strategies corresponding to the user groups.

2. The method of claim 1, wherein:

the user data is pulled by a timing data pulling task, data is pulled from a data warehouse every day and is filled into a business database, and the data of the data warehouse is sourced from each business data terminal.

3. The method of claim 1, wherein:

and the user data is pulled according to the request of the corresponding manager terminal, after the request of the manager terminal is received, one or more columns of data specified by a manager are pulled from the data warehouse and filled into the service database, and the data of the data warehouse is sourced from each service data terminal.

4. The method of claim 1, wherein:

the data characteristics refer to data types and data attributes, including simple numeric classes, boolean value classes, arrays or time range attributes; the use requirement refers to the requirement of database query or update, and comprises interval screening according to time periods and screening according to commodity subclasses or label categories.

5. The method of claim 1, wherein:

the logical operators include AND, OR, NOT.

6. The method of claim 1, wherein:

the screening and grouping by conditions comprises first setting screening conditions, each screening condition being associated with a database and with a data table and a field of that data table in the database; and second, splicing a plurality of screening conditions with the logical operators, wherein the splicing follows the query-statement syntax of the database in which the corresponding data reside.

7. The method of claim 6, wherein:

adding a new screening condition requires an administrator to set it and to check whether the field actually exists in the database, and to specify the database name, the table name, the field name, and the storage format in the database for the field.

8. The method of claim 1, wherein:

the updating of the marketing strategy includes testing and adjusting the marketing strategy in a model modeled from historical user data.

9. The method of claim 1, wherein:

the updating of the user composition comprises updating the user composition of one or more user groups according to the updated user data, or updating the user composition of one or more user groups after the number of adjusted users in the one or more user groups reaches the update threshold of the user group.

10. A user clustering system applying the method of any one of claims 1-9, comprising:

a data pulling module for obtaining user behavior data; a data storage module for storing the user data with different storage methods according to the characteristics of the data; a user screening module for screening users with the same characteristics into a user group according to conditions set by an administrator; a grouping update module for delta-updating changed user behavior data; and a marketing strategy module for updating or adding the marketing strategies used by user groups, including simulated rollout and adjustment of marketing strategies.

11. The user clustering system according to claim 10, wherein:

the user grouping system can be an independent system on the server control host, or any terminal with the authority to access user data.

12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is adapted to carry out a method for user grouping according to any one of claims 1 to 9.